Compare commits

633 Commits

Author SHA1 Message Date
Timothy @aden 0c85406bc2 Add TimothyZhang7 to contributors list 2026-03-05 10:36:19 -08:00
Timothy @aden 1051134594 Merge pull request #5878 from aden-hive/feature/hive-as-a-game
chore: fix repo owner
2026-03-05 10:32:20 -08:00
Timothy c7f0ab0444 chore: fix repo owner 2026-03-05 09:33:02 -08:00
Timothy @aden 93bf373a5b Merge pull request #5869 from aden-hive/feature/hive-as-a-game
feat: integration bounty program with Lurkr XP, Discord roles, and automated tracking
2026-03-05 09:29:48 -08:00
Timothy 2d87042a70 fix: bad chars 2026-03-05 09:00:51 -08:00
Timothy 8a28abb7b8 fix: github actions 2026-03-05 09:00:06 -08:00
Emmanuel Nwanguma 0cdfbac5a1 docs(tools): add README for brevo, csv, runtime_logs, account_info tools (#5602)
* docs(tools): add README for brevo, csv, runtime_logs, account_info tools

- brevo_tool: Transactional email/SMS and contact management via Brevo API
- csv_tool: Read, write, query CSV files with DuckDB SQL support
- runtime_logs_tool: Query three-level runtime logging system
- account_info_tool: Query connected accounts and identities

* docs: fix runtime_logs_tool README to match implementation

- query_runtime_logs: add missing status values (degraded, in_progress, needs_attention)
- query_runtime_log_details: add missing needs_attention_only parameter
- query_runtime_log_raw: fix step_type -> step_index (int, not str)
- Fix file names: nodes.jsonl -> details.jsonl, steps.jsonl -> tool_logs.jsonl
- Fix error handling examples to match actual code

---------

Co-authored-by: hundao <alchemy_wimp@hotmail.com>
2026-03-05 18:51:47 +08:00
alidevh 29a3ae471f fix(config): add logging for config parse errors (#4955)
Co-authored-by: alihassan <239741857+alidevh@users.noreply.github.com>
2026-03-05 18:22:34 +08:00
singhhnitin 9c0f56f027 Improve indirect variable expansion for provider API key detection (#5504)
Co-authored-by: Nitin Singh <nitinsingh3323@gmail.com>
2026-03-05 18:12:14 +08:00
Hundao 462e303a6e ci: skip POSIX permission tests on Windows (temporary, see #5842) (#5847)
Windows does not support POSIX file permissions, causing 4 test
failures on Windows CI. Skip these tests until the proper
ReplaceFileW fix lands.
2026-03-05 17:50:05 +08:00
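A platform-conditional skip like the one described above might look roughly like this. This is a minimal sketch using the standard library's `unittest`; the test class and its contents are hypothetical, and the project's own tests may use a different framework or skip marker.

```python
import os
import stat
import sys
import tempfile
import unittest


# Hypothetical test case: skipped as a whole on Windows, where POSIX
# permission bits are not supported.
@unittest.skipIf(sys.platform == "win32", "POSIX file permissions are not supported on Windows")
class PermissionTests(unittest.TestCase):
    def test_owner_only_mode(self):
        with tempfile.NamedTemporaryFile() as fh:
            os.chmod(fh.name, 0o600)
            mode = stat.S_IMODE(os.stat(fh.name).st_mode)
            self.assertEqual(mode, 0o600)
```

A skipped test is reported as skipped rather than failed, so the run stays green on Windows while the skip reason stays visible in the test output.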
Hundao a84b3c7867 fix: validate agent.json before parsing in AgentRunner.load() (#5846)
Use is_file() instead of exists() to reject directories, and check
for empty content before passing to json parser. Prevents raw
tracebacks on invalid agent.json inputs.

Fixes #5787
2026-03-05 17:23:46 +08:00
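The validation order described in this commit can be sketched as follows. `load_agent_config` is a hypothetical helper written for illustration; it is not the body of `AgentRunner.load()`, which this log does not show.

```python
import json
from pathlib import Path


def load_agent_config(path: Path) -> dict:
    """Hypothetical helper: validate an agent.json path before parsing it."""
    # is_file() is False for directories, whereas exists() would be True.
    if not path.is_file():
        raise ValueError(f"{path} is not a file")
    text = path.read_text(encoding="utf-8")
    # Reject empty or whitespace-only content before it reaches the parser.
    if not text.strip():
        raise ValueError(f"{path} is empty")
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        # Surface a short message instead of a raw traceback.
        raise ValueError(f"{path} is not valid JSON: {exc}") from exc
```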
Anushka Punekar 606267d053 fix(cli): validate --output path before agent execution in cmd_run (#5838)
* fix(cli): validate --output path before agent execution in cmd_run

* style: fix indentation and formatting

---------

Co-authored-by: hundao <alchemy_wimp@hotmail.com>
2026-03-05 16:51:33 +08:00
Timothy @aden 35791ae478 Merge pull request #5834 from aden-hive/fix/quickstart-tweaks
chore(micro-fix): tweak quickstart
2026-03-04 20:06:40 -08:00
Timothy 10f0002080 chore: tweak quickstart 2026-03-04 20:05:16 -08:00
Bryan @ Aden 60bff4107d Merge pull request #5833 from aden-hive/feat/google-scopes
micro-fix: quickstart build failing
2026-03-05 04:03:55 +00:00
bryan be11fa4b29 fix: quickstart build failing 2026-03-04 20:02:23 -08:00
Bryan @ Aden da8bc796d3 Merge pull request #5832 from aden-hive/feat/google-scopes
(micro-fix): chore: updating tool tests
2026-03-05 03:53:25 +00:00
bryan 429619379e fix: linter issues 2026-03-04 19:50:25 -08:00
bryan 0fecedbbbf chore: updating tool tests 2026-03-04 19:47:55 -08:00
Timothy @aden a2244ada75 Merge pull request #5764 from aden-hive/feat/google-scopes
Feat/google scopes
2026-03-04 19:43:30 -08:00
bryan 7608ba9290 Merge branch 'main' into feat/google-scopes 2026-03-04 19:40:46 -08:00
bryan f5f3396d5c chore: update icons of sample agents 2026-03-04 19:38:14 -08:00
bryan ed80ae80f0 feat: twitter news sample agent 2026-03-04 19:37:34 -08:00
Timothy c7a47c71f0 fix: simplify game plan 2026-03-04 19:15:27 -08:00
Timothy @aden b14b8f8c52 Merge pull request #5815 from levxn/bug/agent-sessions
Restoring session during server restart | smooth conversation picked from where left off | fix unhandled error in event routes |
2026-03-04 19:10:38 -08:00
bryan df1a83d475 feat/local-business-sample-agent 2026-03-04 19:09:02 -08:00
bryan 5b7727cfd1 fix: permanent top bar 2026-03-04 19:08:20 -08:00
Timothy 93e270dafb fix: change initial plan 2026-03-04 19:01:24 -08:00
Timothy be675dbb17 fix: restructure docs 2026-03-04 18:59:20 -08:00
Timothy 1c24848db3 feat: implement hive github repo and discord as a connected game 2026-03-04 18:52:42 -08:00
Timothy @aden 4b5ec796bc Merge pull request #5829 from aden-hive/feat/remove-old-session-status-tools
fix: remove the reference in the coder agent init
2026-03-04 17:42:34 -08:00
Richard Tang 24df4729ca fix: remove the reference in the coder agent init 2026-03-04 17:40:28 -08:00
Timothy @aden 1e6538efac Merge pull request #5828 from aden-hive/feat/remove-old-session-status-tools
Remove deprecated get_agent_session_state and get_agent_session_memory tools
2026-03-04 17:29:35 -08:00
Richard Tang f9e53f58af refactor: remove old get_agent_session_state and get_agent_session_memory tools 2026-03-04 17:23:10 -08:00
Timothy 41388efc31 fix: Windows compat — guard os.fchmod and remove deleted LLM_CREDENTIALS import
os.fchmod does not exist on Windows; guard with hasattr check.
Remove LLM_CREDENTIALS reference from test (module deleted in e1db3a4).
2026-03-04 17:22:21 -08:00
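The `hasattr` guard mentioned above can be illustrated with a small sketch. `restrict_permissions` is a hypothetical name, and the default mode is an assumption chosen for the example.

```python
import os


def restrict_permissions(fd: int, mode: int = 0o600) -> bool:
    """Apply `mode` to an open file descriptor where the platform allows it.

    Returns True if os.fchmod was called, False if it is unavailable.
    """
    # os.fchmod is not defined on every platform, so check before calling.
    if hasattr(os, "fchmod"):
        os.fchmod(fd, mode)
        return True
    return False
```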
Timothy @aden fab5ce6fd0 Merge pull request #5824 from aden-hive/chore/fix-tool-tests
chore(micro-fix): fix test
2026-03-04 17:16:10 -08:00
Timothy 207d6baee5 chore: fix test 2026-03-04 16:49:39 -08:00
Timothy @aden fec72bb2b6 Merge pull request #5294 from Antiarin/feat/hashline-edit-tool
[Integration]feat: add hashline anchor-based file editing tool
2026-03-04 16:38:13 -08:00
Timothy c4c4c24c59 Merge branch 'main' into feat/hashline-edit-tool 2026-03-04 16:23:07 -08:00
bryan 917c7706ea chore: lint fix 2026-03-04 16:14:56 -08:00
bryan 8fadcd5b21 Merge branch 'main' into feat/google-scopes 2026-03-04 16:12:31 -08:00
Timothy @aden 2005ba2dca Merge pull request #5823 from aden-hive/micro-fix/lint
chore(micro-fix): lint
2026-03-04 16:11:51 -08:00
Timothy 557d5fd6e5 chore: lint 2026-03-04 16:10:35 -08:00
Timothy @aden 79d2a15f95 Merge pull request #5814 from fermano/feature/windows-filesysten
Feature/windows filesystem
2026-03-04 16:07:33 -08:00
Timothy ab32e44128 style: ruff format fixes
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-04 16:00:13 -08:00
Bryan @ Aden 047059f85f Merge pull request #5774 from kostasuser01gr/docs/4780-roadmap-updates
docs: update roadmap to reflect completed features (refs #4780)
2026-03-04 23:59:56 +00:00
Timothy e8364f616d Merge remote-tracking branch 'origin/main' into feature/windows-filesysten 2026-03-04 15:59:49 -08:00
Bryan @ Aden 9098c9b6c6 Merge pull request #5785 from code-Miracle49/fix/remove-duplicate-execute-subagent
micro-fix: remove duplicate _execute_subagent method in EventLoopNode
2026-03-04 23:55:36 +00:00
Timothy 84fd9ebac8 style: fix E501 line-too-long lint errors
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-04 15:53:45 -08:00
Timothy @aden 23d5d76d56 Merge pull request #5822 from aden-hive/fix/anthropic-vendor-issue
fix: remove hardcoded Anthropic dependencies from core framework
2026-03-04 15:46:16 -08:00
Timothy b0c86588b6 chore: lint 2026-03-04 15:44:11 -08:00
Timothy 5aff1f9489 chore: lint 2026-03-04 15:43:59 -08:00
Timothy Zhang 199cb3d8cc fix: stdin conflicts 2026-03-04 14:45:02 -08:00
Fernando Mano a98a4ca0b6 feature(WindowsFilesystemSupport): #5677 - Windows File System Support and Testing -- fixing lint issues 2026-03-04 21:22:11 -03:00
Fernando Mano c4f49aadfa feature(WindowsFilesystemSupport): #5677 - Windows File System Support and Testing -- fixing lint issues 2026-03-04 21:20:33 -03:00
Fernando Mano ca5ac389cf feature(WindowsFilesystemSupport): #5677 - Windows File System Support and Testing -- fixing lint issues 2026-03-04 21:15:08 -03:00
Fernando Mano 7a658f7953 feature(WindowsFilesystemSupport): #5677 - Windows File System Support and Testing -- fixing lint issues 2026-03-04 21:05:49 -03:00
Timothy e05fc99da7 Merge branch 'main' into fix/anthropic-vendor-issue 2026-03-04 14:43:27 -08:00
Bryan @ Aden 787090667e Merge pull request #5816 from aden-hive/fix/pause-stop-worker
(micro-fix): update pause in pipeline uses stop_worker like queen
2026-03-04 21:50:23 +00:00
Antiarin 80b36b4052 fix: CRLF double-conversion in hashline edit and add large file skip reporting
- Replace joined.replace("\n", "\r\n") with re.sub(r"(?<!\r)\n", "\r\n", joined) to prevent \r\n in replace op new_content from becoming \r\r\n (fixed in both hashline_edit.py and file_ops.py)
- Track and report skipped large files in grep_search instead of silently skipping them
- Extract HASHLINE_MAX_FILE_BYTES constant to hashline.py as single source of truth, imported by view_file, grep_search, hashline_edit, and file_ops
- Add tests for CRLF replace op (both copies) and large file skip reporting
2026-03-05 03:12:35 +05:30
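The double-conversion problem and the lookbehind fix can be reproduced in isolation. The regular expression is the one quoted in the commit message; the function name `to_crlf` and the sample string are illustrative assumptions.

```python
import re


def to_crlf(text: str) -> str:
    """Convert bare LF to CRLF, leaving existing CRLF pairs untouched."""
    # The negative lookbehind skips any \n that already follows a \r.
    return re.sub(r"(?<!\r)\n", "\r\n", text)


mixed = "a\r\nb\n"

# Naive replacement converts the \n inside the existing \r\n a second time.
naive = mixed.replace("\n", "\r\n")   # "a\r\r\nb\r\n"
fixed = to_crlf(mixed)                # "a\r\nb\r\n"
```

Applying `to_crlf` twice gives the same result as applying it once, which is what makes it safe on content that is already partly CRLF.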
bryan 0b8ed521c0 fix:update pause in pipeline uses stop_worker like queen 2026-03-04 13:42:20 -08:00
levxn 1ec7c5545f fixing lints and formatting 2026-03-05 02:59:57 +05:30
levxn cc6b6760c3 enables resume from where it was left off 2026-03-05 02:34:23 +05:30
Levin 26aed90ab2 Merge branch 'aden-hive:main' into bug/agent-sessions 2026-03-05 02:32:56 +05:30
Timothy 1c58ccb0c1 chore: lint 2026-03-04 12:45:27 -08:00
Timothy 79b80fe817 feat: coder tools to also support hashline editing 2026-03-04 12:41:07 -08:00
Antiarin c0f3841af7 feat: add file size check in grep_search to skip large files and switch case in hashline edit
- Implemented a check to skip files larger than 10MB in the grep_search function to optimize memory usage.
2026-03-04 12:41:07 -08:00
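A size check of this kind might be sketched as below. The names `MAX_FILE_BYTES` and `partition_by_size` are hypothetical, and reading "10MB" as 10 * 1024 * 1024 bytes is an assumption; the log does not give the exact value used in grep_search.

```python
import os
from typing import Iterable, List, Tuple

# Assumed interpretation of "10MB"; the real constant may differ.
MAX_FILE_BYTES = 10 * 1024 * 1024


def partition_by_size(
    paths: Iterable[str], limit: int = MAX_FILE_BYTES
) -> Tuple[List[str], List[str]]:
    """Split paths into (searchable, skipped) based on file size in bytes."""
    searchable: List[str] = []
    skipped: List[str] = []
    for path in paths:
        if os.path.getsize(path) > limit:
            skipped.append(path)
        else:
            searchable.append(path)
    return searchable, skipped
```

Returning the skipped paths alongside the searchable ones lets the caller report them instead of dropping them silently.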
Antiarin 2b7d9bc471 feat:Updating the docs 2026-03-04 12:41:07 -08:00
Antiarin 98dc493a39 feat: Add cross-tool hashing anchor, grep search, and all viewfile 2026-03-04 12:41:07 -08:00
Antiarin cfaa57b28d feat:Add hashing tool 2026-03-04 12:40:25 -08:00
Timothy @aden 219e603de6 Merge pull request #5813 from aden-hive/refactor/quickstart
Refactor/quickstart
2026-03-04 12:27:45 -08:00
Timothy @aden 7663a5bce8 Merge pull request #5797 from Waryjustice/fix/windows-browser-auto-open
fix: browser auto-open after quickstart does not work on Windows
2026-03-04 12:27:35 -08:00
Timothy f2841b945d chore: lint 2026-03-04 12:24:08 -08:00
bryan faff64c413 chore: agents.md update 2026-03-04 12:12:27 -08:00
Timothy 6fbcdc1d87 fix: auto install node 20 2026-03-04 12:11:29 -08:00
bryan 69a11af949 chore: best effort alignment of windows quickstart 2026-03-04 11:43:50 -08:00
bryan 9ef272020e chore: added llm key health check 2026-03-04 11:35:12 -08:00
bryan 258cfe7de5 chore: added easy way to update llm provider key 2026-03-04 10:42:57 -08:00
bryan 0d53b21133 chore: doc updates about hive open 2026-03-04 10:33:34 -08:00
Fernando Mano 704a0fd63a feature(WindowsFilesystemSupport): #5677 - Windows File System Support and Testing -- remove testing code and prepare for PR 2026-03-04 15:09:06 -03:00
bryan 0ccb28ffab fix: enter to use previously configured 2026-03-04 10:05:59 -08:00
Fernando Mano bf4101ac38 feature(WindowsFilesystemSupport): #5677 - Windows File System Support and Testing -- remove testing code and prepare for PR 2026-03-04 15:03:02 -03:00
bryan b30b571b44 chore: update recommended models 2026-03-04 09:54:29 -08:00
bryan bc44c3a401 chore: make gcu enabled by default 2026-03-04 09:52:42 -08:00
bryan 7fbf57cbb7 fix: linter update 2026-03-04 09:52:16 -08:00
Fernando Mano bc349e8fde feature(WindowsFilesystemSupport): #5677 - Windows File System Support and Testing -- remove testing code and prepare for PR 2026-03-04 14:41:42 -03:00
bryan 67d094f51a fix: tool tests 2026-03-04 09:22:34 -08:00
bryan 873af04c6e fix: utilize mac keychain for claude code subscription 2026-03-04 09:22:12 -08:00
Shaurya Singh 2f0439dca8 Merge branch 'main' into fix/windows-browser-auto-open 2026-03-04 22:50:39 +05:30
Fernando Mano 8470c6a980 feature(WindowsFilesystemSupport): #5677 - Windows File System Support and Testing 2026-03-04 14:16:48 -03:00
Levin 43092ba1d7 Merge branch 'aden-hive:main' into bug/agent-sessions 2026-03-04 22:33:40 +05:30
bryan 1920192656 feat: hive open cmd 2026-03-04 08:55:18 -08:00
bryan 61487db481 chore: linter fixes 2026-03-04 08:44:04 -08:00
Waryjustice f56feaf821 fix: browser auto-open after quickstart does not work on Windows 2026-03-04 22:12:53 +05:30
Timothy @aden 4cbd5a4c6c Merge pull request #5786 from osb910/fix/charmap-decode-error
fix(core): add utf-8 encoding to backend open calls (micro-fix)
2026-03-04 08:39:10 -08:00
Timothy 65aa5629e8 chore: fix lint 2026-03-04 08:34:01 -08:00
bryan c42c8ba505 Merge branch 'main' into feat/google-scopes 2026-03-04 08:25:29 -08:00
Omar Shareef 7193d09bed formatting warning fix 2026-03-04 16:43:46 +02:00
Omar Shareef 49f8fae0b4 fix: systematically enforce UTF-8 encoding across tools and core to fix Windows charmap decode errors 2026-03-04 16:04:53 +02:00
Omar Shareef e1a490756e fix: systematically enforce UTF-8 encoding across tools and core to fix Windows charmap decode errors 2026-03-04 15:58:03 +02:00
code-Miracle49 c313ea7ee2 micro-fix: remove duplicate _execute_subagent method in EventLoopNode 2026-03-04 12:54:43 +01:00
Omar Shareef 91bfaf36e3 fix(core): add utf-8 encoding to backend open calls
This fixes a charmap decoding error on Windows when opening agent files without explicitly specifying the encoding.
2026-03-04 13:32:59 +02:00
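The effect of passing the encoding explicitly can be shown with a short sketch. `read_text_utf8` is a hypothetical helper; the point is only that `open()` without `encoding=` falls back to a locale-dependent default, which on Windows is often a legacy code page.

```python
def read_text_utf8(path: str) -> str:
    """Read a text file as UTF-8 regardless of the platform's default encoding."""
    # Without encoding=..., open() uses the locale's preferred encoding,
    # which can raise a charmap decode error on non-ASCII content.
    with open(path, encoding="utf-8") as fh:
        return fh.read()
```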
levxn e3ea9212dd latest upstream and MC resolved in workspace.tsx 2026-03-04 12:08:14 +05:30
kostasuser01gr 99d41d8cc6 docs: update roadmap to reflect completed features (refs #4780) 2026-03-04 08:37:14 +02:00
levxn 8988c1e760 session management and ability to converse from where the chat was left off, fix v1 2026-03-04 11:40:44 +05:30
Timothy @aden 465adf5b1f Merge pull request #5767 from aden-hive/feat/integrations
Feat/integrations
2026-03-03 22:04:08 -08:00
RichardTang-Aden 132d00d166 Merge pull request #5769 from aden-hive/queen-mode-separation
Queen mode separation: building, staging, and running modes
2026-03-03 21:31:23 -08:00
bryan b1a5f8e730 chore: tool test fixes 2026-03-03 21:01:19 -08:00
Richard Tang a604fee3aa chore: mode label update 2026-03-03 20:47:35 -08:00
Timothy 8018325923 style: fix all ruff lint errors (E501, E722, E741, F841)
- Break long lines (E501) across 25+ files
- Replace bare except with except Exception (E722)
- Rename ambiguous variable `l` to `item` (E741)
- Prefix unused variables with underscore (F841)
2026-03-03 20:42:30 -08:00
Richard Tang 3f86bd4009 chore: lint fix 2026-03-03 20:39:04 -08:00
Timothy b4cf10214b chore: lint issues 2026-03-03 20:38:30 -08:00
Bryan @ Aden c7818c2c33 Merge pull request #5766 from aden-hive/fix/credential-modal-delete
(micro-fix): Fix/credential modal delete
2026-03-04 04:38:23 +00:00
Timothy e421bcc326 chore: lint issues 2026-03-03 20:36:28 -08:00
Richard Tang 09e5a4dcc0 chore: frontend verbiage 2026-03-03 20:31:26 -08:00
Richard Tang ce08c44235 feat: improve ui indicator 2026-03-03 20:28:32 -08:00
Richard Tang e743234324 fix: strengthen prompt to collect user intent 2026-03-03 20:23:53 -08:00
Timothy 9b76ac48b7 chore: new dependency 2026-03-03 20:23:10 -08:00
bryan 06a9adb051 chore: linter fix 2026-03-03 20:15:42 -08:00
Richard Tang 6ae16345a8 fix: reference err from merging 2026-03-03 20:15:37 -08:00
Richard Tang 8daaf000b1 Merge remote-tracking branch 'origin/feat/question-widget' into queen-mode-separation 2026-03-03 20:09:10 -08:00
bryan 9ce753055c feat: meeting scheduler agent 2026-03-03 20:01:58 -08:00
bryan 0ce87b5155 refactor: update calendar list events tool 2026-03-03 20:01:42 -08:00
Richard Tang 273f411eee feat: replace the reload agent to stop worker 2026-03-03 20:01:27 -08:00
Richard Tang 6929cecf8a fix: tag for frontend 2026-03-03 19:53:18 -08:00
Richard Tang 9221a7ff03 Merge remote-tracking branch 'origin/queen-mode-separation' into queen-mode-separation 2026-03-03 19:43:33 -08:00
Richard Tang a6089c5b3b feat: returning queen bee status when starting session 2026-03-03 19:43:04 -08:00
Richard Tang a7ee972b32 feat: enable the frontend to cancel the current queen run and sync queen mode 2026-03-03 19:30:55 -08:00
Richard Tang c817989b99 feat: allow frontend change to control mode 2026-03-03 19:29:33 -08:00
Richard Tang 2272a6854c refactor: consolidate discorver_mcp_tools and list_agent_tools 2026-03-03 19:08:58 -08:00
Timothy 040fc1ee8d feat: corrected agent generation guidelines 2026-03-03 18:53:40 -08:00
Richard Tang f00b8d7b8c fix: update the initial state condition 2026-03-03 18:35:24 -08:00
Timothy @aden 6c8c6d7048 Merge pull request #5234 from Antiarin/fix/guardian-self-trigger-loop
fix(tui): fix pause/stop to cancel all running tasks across all graphs
2026-03-03 18:17:15 -08:00
Richard Tang f27ef52c7a feat: update queen initial state 2026-03-03 18:15:51 -08:00
Richard Tang 0a2ff1db97 feat: new queen stages and tools 2026-03-03 18:07:47 -08:00
Timothy 6da48eac6f feat: split tool loading into verified and unverified tiers
register_all_tools() now only loads verified (stable) tools by default.
Pass include_unverified=True to also load new/community integrations.
This prevents unverified tools from being loaded in production.

Also fixes duplicate register_brevo and register_pushover calls.
2026-03-03 17:54:45 -08:00
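The tiered loading described above could be modelled as in the following toy sketch. Only the function name `register_all_tools` and the `include_unverified` flag come from the commit message; the registries, tool names, and return value are hypothetical stand-ins, not the aden_tools implementation.

```python
from typing import Callable, Dict, List

# Hypothetical registries for illustration only.
VERIFIED_TOOLS: Dict[str, Callable[[], None]] = {
    "csv_tool": lambda: None,
}
UNVERIFIED_TOOLS: Dict[str, Callable[[], None]] = {
    "community_tool": lambda: None,
}


def register_all_tools(include_unverified: bool = False) -> List[str]:
    """Toy stand-in: register tools by tier and return the loaded names."""
    selected = dict(VERIFIED_TOOLS)
    if include_unverified:
        selected.update(UNVERIFIED_TOOLS)
    loaded: List[str] = []
    # Keying by tool name means each tool is registered at most once.
    for name, register in selected.items():
        register()
        loaded.append(name)
    return loaded
```

Making the unverified tier opt-in keeps the default call safe for production, and a caller who wants community integrations has to ask for them explicitly.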
Timothy 638ff04e24 fix: remove duplicate community tool directories and fix credential wiring
- Remove s3_tool (duplicate of aws_s3_tool), power_bi_tool (duplicate of
  powerbi_tool), x_tool (duplicate of twitter_tool)
- Remove integrations/plaid (duplicate of plaid_tool), integrations/sap_s4hana
  (duplicate of sap_tool), stray tools/mssql.py
- Add help key to credential error responses across 14 tool modules
- Fix health checker registry keys (calendly -> calendly_pat, lusha -> lusha_api_key)
- Add health_check_endpoint to calendly and lusha credential specs
- Fix Trello env var (TRELLO_TOKEN -> TRELLO_API_TOKEN) and remove duplicate
  Trello specs from hubspot.py
- Add credential_group="aws" to AWS S3 and Redshift specs sharing env vars
- Update conftest UNREGISTERED_COMMUNITY_MODULES to only contain mssql_tool
2026-03-03 17:46:28 -08:00
Timothy d7075b459b fix: cleanse llm conversations 2026-03-03 17:44:21 -08:00
bryan d0e7aa14b6 fix: hide delete button for Aden-managed credentials 2026-03-03 17:36:04 -08:00
bryan 59fee56c54 fix: share server credential store with runner to avoid redundant Aden syncs 2026-03-03 17:35:24 -08:00
bryan 2207306169 fix: resolve MCP server cwd from repo root instead of agent path 2026-03-03 17:34:51 -08:00
Richard Tang 8ff2e91f2d feat: add queen agent building and running mode switching 2026-03-03 16:01:41 -08:00
bryan 730370a007 test: update calendar and health check tests 2026-03-03 15:42:22 -08:00
bryan f87909109c refactor: simplify health check system 2026-03-03 15:42:07 -08:00
bryan d6a6d8b5ef refactor: unify Google OAuth to single credential 2026-03-03 15:41:53 -08:00
bryan 57563abfa7 feat: add Google Sheets tool 2026-03-03 15:41:26 -08:00
Richard Tang 61afaa4c8b feat: add uv instruction to agents 2026-03-03 14:51:58 -08:00
Richard Tang 0de47dbc3f feat: agents.md for agent collaboration 2026-03-03 14:51:58 -08:00
Richard Tang 676ef56134 fix: mcp path 2026-03-03 14:51:58 -08:00
Richard Tang f0899bb35d feat: use send instead of draft for email reply agent 2026-03-03 14:51:58 -08:00
Richard Tang f490038e36 chore: move the email reply sample agent 2026-03-03 14:51:58 -08:00
Richard Tang cbf220eb00 feat: email reply sample agent 2026-03-03 14:51:58 -08:00
Richard Tang bf0d80ea20 docs: reorder section in documentation 2026-03-03 14:51:58 -08:00
Richard Tang 3ae889a6f8 docs: add running screenshot and update the coding agent instruction 2026-03-03 14:51:58 -08:00
Richard Tang 03ca1067ac docs: sync all i18n READMEs with primary README 2026-03-03 14:51:58 -08:00
Richard Tang 3cda30a40a docs: update the latest features from recent changes 2026-03-03 14:51:58 -08:00
Richard Tang 26934527b9 docs: update readme instructions 2026-03-03 14:51:58 -08:00
Richard Tang 2619acde22 docs: remove TUI in the readme 2026-03-03 14:51:58 -08:00
Richard Tang b983d3cfd2 chore: ignore local dev skills 2026-03-03 14:51:58 -08:00
Richard Tang 87a9dd15fe fix: load-new-session from home 2026-03-03 14:51:58 -08:00
RichardTang-Aden 4066962ade Merge pull request #5751 from aden-hive/load-new-session-from-home
Fix new session from home and add email reply agent template
2026-03-03 14:48:17 -08:00
Richard Tang 0f26e34f09 fix: improve the reply template 2026-03-03 14:45:07 -08:00
Richard Tang d76e436e3d fix: new session should have their own id 2026-03-03 14:44:51 -08:00
Timothy 4ff531dec7 fix: update expected health checkers set (add calendly, zoho_crm) 2026-03-03 14:10:34 -08:00
Timothy 4f8b3d7aff fix: update credential specs for community Linear/Trello tools, skip unregistered community modules 2026-03-03 14:09:04 -08:00
Timothy 210fa9c474 fix: use community Brevo implementation (6 tools), remove orphaned x_tool test 2026-03-03 14:06:00 -08:00
Timothy 25361cac8c fix: align tests with community implementations, revert Reddit to httpx (praw unavailable) 2026-03-03 14:02:33 -08:00
Timothy 28defebd6d fix: remove community youtube_transcript tool.py requiring uninstalled SDK 2026-03-03 13:58:45 -08:00
Timothy c74381619e Merge branch 'feature/queen-worker-comm' into feat/question-widget 2026-03-03 13:57:52 -08:00
Timothy d58f3103dd fix: guard register_tools for s3_tool and mssql_tool when SDK not available 2026-03-03 13:54:46 -08:00
Timothy 5d1ed35660 fix: remove shell heredoc artifacts from community power_bi_tool 2026-03-03 13:52:20 -08:00
Timothy 1f3e305534 fix: guard optional SDK imports (boto3, pyodbc) and remove s3_tool registration 2026-03-03 13:51:04 -08:00
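One common way to guard an optional SDK is to check for the module before registering anything that depends on it. This is a minimal sketch under that assumption; `sdk_available` and `register_if_available` are hypothetical helpers, and the commits above may instead wrap the import in try/except.

```python
import importlib.util
from typing import Callable


def sdk_available(module_name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


def register_if_available(module_name: str, register: Callable[[], None]) -> bool:
    """Call `register` only when the optional dependency is installed."""
    if not sdk_available(module_name):
        return False
    register()
    return True
```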
Timothy 7d8fdd279c fix: revert Asana to httpx-based implementation (asana SDK not available) 2026-03-03 13:33:35 -08:00
Timothy cacae9f290 fix: compaction logic 2026-03-03 13:33:01 -08:00
Timothy bb061b770f merge: incorporate QuickBooks community PR #4158
# Conflicts:
#	examples/templates/deep_research_agent/config.py
#	examples/templates/tech_news_reporter/config.py
#	tools/README.md
#	tools/src/aden_tools/credentials/__init__.py
#	tools/src/aden_tools/credentials/quickbooks.py
#	tools/src/aden_tools/tools/__init__.py
#	tools/src/aden_tools/tools/quickbooks_tool/__init__.py
#	tools/src/aden_tools/tools/quickbooks_tool/quickbooks_tool.py
#	tools/tests/tools/test_quickbooks_tool.py
2026-03-03 13:27:04 -08:00
Timothy a8768b9ed6 merge: incorporate MSSQL community PR #4200
# Conflicts:
#	tools/pyproject.toml
#	tools/src/aden_tools/credentials/integrations.py
#	tools/src/aden_tools/tools/__init__.py
2026-03-03 13:26:36 -08:00
Timothy b437aa5f6c merge: incorporate Linear community PR #3585
# Conflicts:
#	.claude/skills/hive-credentials/SKILL.md
#	tools/README.md
#	tools/src/aden_tools/tools/__init__.py
#	tools/src/aden_tools/tools/linear_tool/__init__.py
#	tools/src/aden_tools/tools/linear_tool/linear_tool.py
2026-03-03 13:24:57 -08:00
Timothy 9248182570 merge: incorporate Trello community PR #3376
# Conflicts:
#	tools/README.md
#	tools/src/aden_tools/tools/__init__.py
#	tools/src/aden_tools/tools/trello_tool/__init__.py
#	tools/src/aden_tools/tools/trello_tool/trello_tool.py
#	tools/tests/tools/test_trello_tool.py
2026-03-03 13:24:23 -08:00
bryan 511c1a6ed5 fix: update queen prompt around ask_user 2026-03-03 13:22:59 -08:00
Timothy 7c77c7170f merge: incorporate YouTube Transcript community PR #3520
# Conflicts:
#	tools/pyproject.toml
#	tools/src/aden_tools/tools/__init__.py
2026-03-03 13:22:46 -08:00
Timothy 85fcb6516c merge: incorporate Redshift community PR #3533
# Conflicts:
#	tools/pyproject.toml
#	tools/src/aden_tools/tools/__init__.py
#	tools/src/aden_tools/tools/redshift_tool/__init__.py
#	tools/src/aden_tools/tools/redshift_tool/redshift_tool.py
#	tools/tests/tools/test_redshift_tool.py
2026-03-03 13:17:41 -08:00
Timothy e8e76d85f7 merge: incorporate Pushover community PR #5424
# Conflicts:
#	tools/src/aden_tools/tools/pushover_tool/__init__.py
#	tools/src/aden_tools/tools/pushover_tool/pushover_tool.py
2026-03-03 13:17:18 -08:00
Timothy 5aaa5ae4d5 merge: incorporate Twitter/X community PR #3807
# Conflicts:
#	tools/src/aden_tools/credentials/__init__.py
#	tools/src/aden_tools/tools/__init__.py
#	tools/tests/test_credentials.py
2026-03-03 13:16:45 -08:00
Timothy c3a8ee9c7b merge: incorporate Calendly community PR #3947
# Conflicts:
#	tools/src/aden_tools/credentials/__init__.py
#	tools/src/aden_tools/credentials/calendly.py
#	tools/src/aden_tools/tools/__init__.py
#	tools/src/aden_tools/tools/calendly_tool/__init__.py
#	tools/src/aden_tools/tools/calendly_tool/calendly_tool.py
#	tools/tests/test_health_checks.py
#	tools/tests/tools/test_calendly_tool.py
2026-03-03 13:14:20 -08:00
Timothy 5d07a8aba5 merge: incorporate Airtable community PR #3953
# Conflicts:
#	tools/src/aden_tools/credentials/__init__.py
#	tools/src/aden_tools/credentials/airtable.py
#	tools/src/aden_tools/credentials/health_check.py
#	tools/src/aden_tools/tools/__init__.py
#	tools/src/aden_tools/tools/airtable_tool/__init__.py
#	tools/src/aden_tools/tools/airtable_tool/airtable_tool.py
#	tools/tests/test_health_checks.py
#	tools/tests/tools/test_airtable_tool.py
2026-03-03 13:13:47 -08:00
Timothy d18e0594b8 merge: incorporate Reddit community PR #3963
# Conflicts:
#	tools/pyproject.toml
#	tools/src/aden_tools/credentials/__init__.py
#	tools/src/aden_tools/credentials/health_check.py
#	tools/src/aden_tools/credentials/reddit.py
#	tools/src/aden_tools/tools/__init__.py
#	tools/src/aden_tools/tools/reddit_tool/__init__.py
#	tools/src/aden_tools/tools/reddit_tool/reddit_tool.py
#	tools/tests/tools/test_reddit_tool.py
#	uv.lock
2026-03-03 13:12:55 -08:00
Timothy 26dcc86a24 merge: incorporate Zoho CRM community PR #4713
# Conflicts:
#	tools/src/aden_tools/credentials/__init__.py
#	tools/src/aden_tools/tools/__init__.py
#	tools/src/aden_tools/tools/zoho_crm_tool/__init__.py
#	tools/src/aden_tools/tools/zoho_crm_tool/zoho_crm_tool.py
#	tools/tests/test_health_checks.py
2026-03-03 13:11:51 -08:00
Timothy e928ad19e5 merge: incorporate Lusha community PR #4714
# Conflicts:
#	tools/src/aden_tools/credentials/__init__.py
#	tools/src/aden_tools/credentials/lusha.py
#	tools/src/aden_tools/tools/__init__.py
#	tools/src/aden_tools/tools/lusha_tool/__init__.py
#	tools/src/aden_tools/tools/lusha_tool/lusha_tool.py
#	tools/tests/tools/test_lusha_tool.py
2026-03-03 13:11:33 -08:00
Timothy 6768aaa575 merge: incorporate Apify community PR #4770
# Conflicts:
#	tools/src/aden_tools/credentials/__init__.py
#	tools/src/aden_tools/credentials/apify.py
#	tools/src/aden_tools/tools/__init__.py
#	tools/src/aden_tools/tools/apify_tool/__init__.py
#	tools/src/aden_tools/tools/apify_tool/apify_tool.py
#	tools/tests/tools/test_apify_tool.py
2026-03-03 13:10:45 -08:00
Timothy f561aacbfc merge: incorporate Attio community PR #4832
# Conflicts:
#	tools/src/aden_tools/credentials/__init__.py
#	tools/src/aden_tools/credentials/attio.py
#	tools/src/aden_tools/tools/__init__.py
#	tools/src/aden_tools/tools/attio_tool/__init__.py
#	tools/src/aden_tools/tools/attio_tool/attio_tool.py
2026-03-03 13:10:09 -08:00
RichardTang-Aden af1ece40c2 Merge pull request #5742 from aden-hive/load-new-session-from-home
Load new session from home
2026-03-03 13:09:44 -08:00
Timothy d9edd7adf7 merge: incorporate Asana community PR #4857
# Conflicts:
#	tools/src/aden_tools/credentials/__init__.py
#	tools/src/aden_tools/credentials/asana.py
#	tools/src/aden_tools/tools/__init__.py
#	tools/src/aden_tools/tools/asana_tool/__init__.py
#	tools/tests/tools/test_asana_tool.py
2026-03-03 13:08:30 -08:00
Richard Tang 3541fab363 feat: add uv instruction to agents 2026-03-03 13:06:50 -08:00
Richard Tang 1160dceeff feat: agents.md for agent collaboration 2026-03-03 13:06:09 -08:00
bryan bbe8efeba2 fix: prevent queen auto-block from overwriting pending worker questions 2026-03-03 13:04:33 -08:00
Timothy b4a5323009 merge: incorporate Brevo community PR #5136
# Conflicts:
#	tools/src/aden_tools/credentials/__init__.py
#	tools/src/aden_tools/credentials/brevo.py
#	tools/src/aden_tools/tools/brevo_tool/__init__.py
#	tools/src/aden_tools/tools/brevo_tool/brevo_tool.py
2026-03-03 13:04:29 -08:00
Timothy ade8b5b9a7 merge: incorporate Databricks community PR #5428
# Conflicts:
#	tools/src/aden_tools/credentials/__init__.py
#	tools/src/aden_tools/credentials/databricks.py
#	tools/src/aden_tools/tools/__init__.py
#	tools/src/aden_tools/tools/databricks_tool/__init__.py
#	tools/src/aden_tools/tools/databricks_tool/databricks_tool.py
#	tools/tests/tools/test_databricks_tool.py
2026-03-03 13:02:30 -08:00
Timothy e4ace3d484 merge: incorporate YouTube community PR #5673 (resolve conflicts, preserve README) 2026-03-03 12:29:32 -08:00
Timothy f3dd25adc5 merge: incorporate Power BI community PR #4341 2026-03-03 12:27:06 -08:00
Timothy ec251f8168 merge: incorporate SAP S/4HANA community PR #5519 2026-03-03 12:27:02 -08:00
Timothy 1bb9579dc5 merge: incorporate Plaid community PR #5518 2026-03-03 12:26:56 -08:00
Timothy 7ebf4146ce merge: incorporate AWS S3 community PR #5521 2026-03-03 12:26:50 -08:00
Richard Tang a8db4cb2f5 fix: mcp path 2026-03-03 12:19:32 -08:00
Richard Tang 24433396dd feat: use send instead of draft for email reply agent 2026-03-03 12:04:44 -08:00
Richard Tang 02bdf17641 chore: move the email reply sample agent 2026-03-03 11:59:14 -08:00
Timothy e0e05f3488 chore: register Obsidian tool in tool/credential registries 2026-03-03 11:55:12 -08:00
Timothy c92f2510c8 test: add Obsidian tool unit tests (read, write, append, search, list, active) 2026-03-03 11:55:12 -08:00
Timothy ea1fbe9ee1 chore: add Obsidian credential spec (REST API key) 2026-03-03 11:55:11 -08:00
Timothy 84a0be0179 feat: add Obsidian knowledge management integration (#3741)
6 tools: obsidian_read_note, obsidian_write_note, obsidian_append_note,
obsidian_search, obsidian_list_files, obsidian_get_active.
Uses Local REST API plugin with Bearer token auth. Supports vault
browsing, full-text search, and note CRUD with frontmatter metadata.
2026-03-03 11:55:04 -08:00
RichardTang-Aden 54f5c0dc91 Merge pull request #5735 from aden-hive/docs/readme/v6
docs: reorder section in documentation
2026-03-03 11:54:09 -08:00
Richard Tang adf1a10318 docs: reorder section in documentation 2026-03-03 11:53:05 -08:00
RichardTang-Aden e2a679a265 Merge pull request #5734 from aden-hive/docs/readme/v6
docs: add running screenshot and update the coding agent instruction
2026-03-03 11:50:56 -08:00
Richard Tang a3916a6932 docs: add running screenshot and update the coding agent instruction 2026-03-03 11:49:19 -08:00
Timothy 1b5780461e chore: register Langfuse tool in tool/credential registries 2026-03-03 11:42:49 -08:00
Timothy c8d35b63a4 test: add Langfuse tool unit tests (traces, scores, prompts) 2026-03-03 11:42:49 -08:00
Timothy feb1ebae04 chore: add Langfuse credential specs (public key, secret key) 2026-03-03 11:42:48 -08:00
Timothy efe49d0a5b feat: add Langfuse LLM observability integration (#5322)
6 tools: langfuse_list_traces, langfuse_get_trace, langfuse_list_scores,
langfuse_create_score, langfuse_list_prompts, langfuse_get_prompt.
Uses HTTP Basic Auth with public/secret key pair. Supports cloud and
self-hosted instances with offset-based pagination.
2026-03-03 11:41:11 -08:00
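HTTP Basic Auth with a public/secret key pair reduces to a single header. The sketch below shows the standard encoding; `basic_auth_header` is a hypothetical helper, and treating the public key as the username and the secret key as the password is an assumption about how the pair is mapped.

```python
import base64
from typing import Dict


def basic_auth_header(public_key: str, secret_key: str) -> Dict[str, str]:
    """Build an HTTP Basic Authorization header from a key pair."""
    # Basic auth is base64("username:password"); assumed mapping:
    # public key as username, secret key as password.
    raw = f"{public_key}:{secret_key}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```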
Timothy e50a5ea22a chore: register Zoom and n8n tools in tool/credential registries 2026-03-03 11:31:25 -08:00
Timothy 6382c94d0a test: add n8n tool unit tests (workflows, executions, activate/deactivate) 2026-03-03 11:31:21 -08:00
Timothy 58ce84c9cc chore: add n8n credential specs (API key, base URL) 2026-03-03 11:31:20 -08:00
Timothy 08fd6ff765 feat: add n8n workflow automation integration (#2931)
6 tools: n8n_list_workflows, n8n_get_workflow, n8n_activate_workflow,
n8n_deactivate_workflow, n8n_list_executions, n8n_get_execution.
Uses X-N8N-API-KEY header auth with configurable base URL.
Supports cursor-based pagination and execution status filtering.
2026-03-03 11:31:15 -08:00
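The cursor-based pagination mentioned in the n8n commit above can be sketched as a generic loop. This is an illustration only, not the repository's implementation: `fetch_page` is a hypothetical callable standing in for the HTTP request, and the `data` and `nextCursor` response keys are assumed.

```python
from typing import Callable, Iterator, Optional


def iter_all_items(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every item across pages until the API stops returning a cursor.

    `fetch_page` is a hypothetical function: it takes the current cursor
    (None for the first page) and returns a dict with an assumed shape of
    {"data": [...], "nextCursor": "<opaque string or None>"}.
    """
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        yield from page.get("data", [])
        cursor = page.get("nextCursor")
        if not cursor:
            # No cursor means the last page has been reached.
            break
```

Passing the fetcher in as a callable keeps the loop testable without network access.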
Timothy a9cb79909c test: add Zoom tool unit tests (user, meetings, recordings) 2026-03-03 11:31:07 -08:00
Timothy 852f8ccd94 chore: add Zoom credential spec (Server-to-Server OAuth token) 2026-03-03 11:31:07 -08:00
Timothy 9388ef3e99 feat: add Zoom meeting management integration (#2867)
6 tools: zoom_get_user, zoom_list_meetings, zoom_get_meeting,
zoom_create_meeting, zoom_delete_meeting, zoom_list_recordings.
Uses Server-to-Server OAuth Bearer token. Supports token-based
pagination and cloud recording retrieval by date range.
2026-03-03 11:31:00 -08:00
Timothy 04afb0c4bb chore: register Salesforce and Shopify tools in tool/credential registries 2026-03-03 11:22:40 -08:00
Timothy a07fd44de3 test: add Shopify tool unit tests (orders, products, customers, search) 2026-03-03 11:22:35 -08:00
Timothy f6c1b13846 chore: add Shopify credential specs (access token, store name) 2026-03-03 11:22:35 -08:00
Timothy 654fa3dd1f feat: add Shopify Admin REST API integration - orders, products, customers (#2984)
6 tools: shopify_list_orders, shopify_get_order, shopify_list_products,
shopify_get_product, shopify_list_customers, shopify_search_customers.
Uses X-Shopify-Access-Token header auth with store subdomain.
2026-03-03 11:22:29 -08:00
Timothy 8183449d27 test: add Salesforce CRM tool unit tests (SOQL, CRUD, describe, list objects) 2026-03-03 11:22:16 -08:00
Timothy a9acfb86ad chore: add Salesforce credential specs (access token, instance URL) 2026-03-03 11:22:15 -08:00
Timothy d7d070ac5f feat: add Salesforce CRM integration - SOQL, records, and metadata (#2916)
6 tools: salesforce_soql_query, salesforce_get_record, salesforce_create_record,
salesforce_update_record, salesforce_describe_object, salesforce_list_objects.
Uses OAuth2 Bearer token auth with instance URL. Supports pagination via
nextRecordsUrl and field-level describe with picklist values.
2026-03-03 11:22:08 -08:00
RichardTang-Aden ead51f1eb6 Merge pull request #5732 from aden-hive/docs/readme/v6
docs: update README and sync all i18n translations
2026-03-03 11:19:06 -08:00
Timothy 8c01b573ce chore: register Redshift and SAP S/4HANA in tool/credential registries 2026-03-03 11:11:12 -08:00
Timothy 7744f21b9d test: add SAP S/4HANA tool unit tests (POs, partners, products, sales orders) 2026-03-03 11:11:08 -08:00
Timothy 9ed23a235f chore: add SAP S/4HANA credential specs (base URL, username, password) 2026-03-03 11:11:07 -08:00
Timothy e88328321f feat: add SAP S/4HANA Cloud read-only procurement integration (#3182) 2026-03-03 11:11:06 -08:00
Timothy a4c516bea1 test: add Redshift tool unit tests (execute, describe, results, databases, tables) 2026-03-03 11:11:00 -08:00
Timothy 1c932a04ef chore: add Redshift credential specs (AWS access key, secret key) 2026-03-03 11:11:00 -08:00
Timothy 76d34be4c2 feat: add Amazon Redshift Data API integration - SQL and schema browsing (#3267) 2026-03-03 11:10:59 -08:00
bryan cb0e9ff9ec chore: fixing tests 2026-03-03 11:07:49 -08:00
Timothy d6e8afe316 chore: register Azure SQL and Kafka in tool/credential registries 2026-03-03 11:03:31 -08:00
Timothy a04f2bcf99 test: add Kafka tool unit tests (topics, produce, consumer groups) 2026-03-03 11:03:27 -08:00
Timothy c138e7c638 chore: add Kafka credential specs (REST URL, cluster ID) 2026-03-03 11:03:27 -08:00
Timothy fc08c7007f feat: add Apache Kafka integration via Confluent REST Proxy (#4774) 2026-03-03 11:03:26 -08:00
Timothy d559bb3446 test: add Azure SQL tool unit tests (servers, databases, firewall rules) 2026-03-03 11:03:18 -08:00
Timothy 55a8c39e4b chore: add Azure SQL credential specs (token, subscription ID) 2026-03-03 11:03:17 -08:00
Timothy 02d6f10e5f feat: add Azure SQL Database management integration (#3377) 2026-03-03 11:03:16 -08:00
Timothy 77428a91cc chore: register Power BI and Snowflake in tool/credential registries 2026-03-03 10:56:46 -08:00
Timothy 51403dc276 test: add Snowflake tool unit tests (execute, status, cancel) 2026-03-03 10:56:43 -08:00
Timothy 914a07a35d chore: add Snowflake credential specs (account, token) 2026-03-03 10:56:42 -08:00
Timothy 3c70d7b424 feat: add Snowflake SQL REST API integration (#3230) 2026-03-03 10:56:41 -08:00
Timothy ce1ee4ff17 test: add Power BI tool unit tests (workspaces, datasets, reports, refresh) 2026-03-03 10:56:35 -08:00
Timothy fca41d9bda chore: add Power BI credential spec (POWERBI_ACCESS_TOKEN) 2026-03-03 10:56:34 -08:00
Timothy ff889e02f7 feat: add Power BI integration - workspaces, datasets, reports (#3973) 2026-03-03 10:56:34 -08:00
Richard Tang cbd2c86bbf docs: sync all i18n READMEs with primary README 2026-03-03 10:53:11 -08:00
Timothy 43ab460462 chore: register Terraform Cloud and Lusha in tool/credential registries 2026-03-03 10:49:21 -08:00
Timothy caa06e266b test: add Lusha tool unit tests (enrich, search, usage) 2026-03-03 10:49:17 -08:00
Timothy 3622ca78ee chore: add Lusha credential spec (LUSHA_API_KEY) 2026-03-03 10:49:17 -08:00
Timothy 019e3f9659 feat: add Lusha B2B contact and company enrichment integration (#3461) 2026-03-03 10:49:16 -08:00
Timothy 208cb579a2 test: add Terraform Cloud tool unit tests (workspaces, runs) 2026-03-03 10:49:09 -08:00
Timothy 17de7e4485 chore: add Terraform Cloud credential spec (TFC_TOKEN) 2026-03-03 10:49:08 -08:00
Timothy 810616eee1 feat: add Terraform Cloud integration - workspaces and runs (#4773) 2026-03-03 10:48:41 -08:00
Timothy 191f583669 chore: register Twitter/X and Tines in tool/credential registries 2026-03-03 10:35:46 -08:00
Timothy 1d638cc18e test: add Tines tool unit tests (stories, actions, logs) 2026-03-03 10:35:42 -08:00
Timothy 3efa1f3b88 chore: add Tines credential specs (domain, api_key) 2026-03-03 10:35:42 -08:00
Timothy 4daa33db09 feat: add Tines integration - security automation stories and actions
Implements 5 tools via Tines REST API:
- tines_list_stories: List workflow stories with search/filter
- tines_get_story: Get story details including entry/exit agents
- tines_list_actions: List actions (agents) in stories
- tines_get_action: Get action details with sources/receivers
- tines_get_action_logs: Get action execution logs by level

Uses Bearer token auth with tenant domain.
2026-03-03 10:35:37 -08:00
Timothy fab2fb0056 test: add Twitter/X tool unit tests (search, user, timeline, tweet) 2026-03-03 10:35:29 -08:00
Timothy ce885c120e chore: add Twitter/X credential spec (bearer_token) 2026-03-03 10:35:28 -08:00
Timothy 75b53c47ff feat: add Twitter/X integration - tweet search and user lookup via API v2
Implements 4 tools via X API v2:
- twitter_search_tweets: Search recent tweets with query operators
- twitter_get_user: Get user profile by username
- twitter_get_user_tweets: Get user timeline
- twitter_get_tweet: Get tweet details by ID

Uses Bearer token auth (app-only, read access).
2026-03-03 10:35:21 -08:00
Timothy 2936f73707 chore: register AWS S3 and QuickBooks in tool/credential registries 2026-03-03 10:22:46 -08:00
Timothy e26426b138 test: add QuickBooks tool unit tests (query, entities, invoices) 2026-03-03 10:22:42 -08:00
Timothy 62cacb8e28 chore: add QuickBooks credential specs (access_token, realm_id) 2026-03-03 10:22:42 -08:00
Timothy f3e37190ce feat: add QuickBooks Online integration - accounting API
Implements 5 tools via QuickBooks Online API v3:
- quickbooks_query: Query entities with SQL-like syntax
- quickbooks_get_entity: Get entity by type and ID
- quickbooks_create_customer: Create customers
- quickbooks_create_invoice: Create invoices with line items
- quickbooks_get_company_info: Get company details

Uses OAuth 2.0 Bearer token auth. Supports sandbox mode.
2026-03-03 10:22:35 -08:00
Timothy 0863bbbd2f test: add AWS S3 tool unit tests (buckets, objects, get, put, delete) 2026-03-03 10:22:25 -08:00
Timothy b23fa1daad chore: add AWS S3 credential specs (access_key_id, secret_access_key) 2026-03-03 10:22:24 -08:00
Timothy 05cc1ce599 feat: add AWS S3 integration - object storage via REST API with SigV4
Implements 5 tools via AWS S3 REST API:
- s3_list_buckets: List all buckets in the account
- s3_list_objects: List objects with prefix/delimiter filtering
- s3_get_object: Get object content and metadata
- s3_put_object: Upload text objects
- s3_delete_object: Delete objects

Uses AWS Signature V4 signing (no boto3 dependency).
2026-03-03 10:22:16 -08:00
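For readers unfamiliar with Signature V4, the key-derivation step that signing without boto3 relies on can be sketched with the standard library alone. This is a minimal sketch of the published SigV4 HMAC chain, not the code from the commit above; the function names are hypothetical, and building the canonical request and string-to-sign is left out.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, message: str) -> bytes:
    """Return the raw HMAC-SHA256 digest of `message` under `key`."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(
    secret_access_key: str, date_stamp: str, region: str, service: str = "s3"
) -> bytes:
    """Derive the SigV4 signing key (hypothetical helper name).

    `date_stamp` is the request date in YYYYMMDD form.
    """
    k_date = _hmac_sha256(("AWS4" + secret_access_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sign(signing_key: bytes, string_to_sign: str) -> str:
    """Return the hex signature that goes into the Authorization header."""
    return hmac.new(
        signing_key, string_to_sign.encode("utf-8"), hashlib.sha256
    ).hexdigest()
```

The derived key is scoped to one date, region, and service, so it must be recomputed when any of those change.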
RichardTang-Aden a1c045fd91 Merge pull request #5727 from aden-hive/docs/readme/v6
Docs: Remove TUI references from README
2026-03-03 10:14:13 -08:00
Timothy e6939f8d51 chore: register PagerDuty and Calendly in tool/credential registries 2026-03-03 10:13:18 -08:00
Timothy 801fef12e1 test: add Calendly tool unit tests (user, events, invitees) 2026-03-03 10:13:14 -08:00
Timothy 5845629175 chore: add Calendly credential spec (personal_access_token) 2026-03-03 10:13:13 -08:00
Timothy 11b916301a feat: add Calendly integration - scheduling events and invitees
Implements 5 tools via Calendly API v2:
- calendly_get_current_user: Get user URI and profile info
- calendly_list_event_types: List meeting templates
- calendly_list_scheduled_events: List booked meetings with date filters
- calendly_get_scheduled_event: Get event details by URI
- calendly_list_invitees: List invitees for an event

Uses Bearer token auth (Personal Access Token).
2026-03-03 10:13:07 -08:00
Timothy aa5d80b1d2 test: add PagerDuty tool unit tests (incidents, services) 2026-03-03 10:13:02 -08:00
Timothy aa5f990acd chore: add PagerDuty credential specs (api_key, from_email) 2026-03-03 10:13:01 -08:00
Timothy 9764c82c2a feat: add PagerDuty integration - incident management and services
Implements 5 tools via PagerDuty REST API v2:
- pagerduty_list_incidents: List incidents with status/urgency/date filters
- pagerduty_get_incident: Get incident details by ID
- pagerduty_create_incident: Create incidents on a service
- pagerduty_update_incident: Acknowledge or resolve incidents
- pagerduty_list_services: List services with name search

Uses Token auth header, From header for write operations.
2026-03-03 10:12:55 -08:00
Richard Tang f921846879 docs: update the latest features from recent changes 2026-03-03 10:12:43 -08:00
Richard Tang a370403b16 docs: update readme instructions 2026-03-03 10:06:13 -08:00
Timothy 543a71eb6c chore: register MongoDB and Airtable in tool/credential registries 2026-03-03 10:06:12 -08:00
Timothy 8285593c13 test: add Airtable tool unit tests (records, bases, schema) 2026-03-03 10:06:08 -08:00
Timothy 6fbfe773fb chore: add Airtable credential spec (personal_access_token) 2026-03-03 10:06:07 -08:00
Timothy a8c54b1e5f feat: add Airtable integration - record CRUD and base metadata
Implements 6 tools via Airtable Web API:
- airtable_list_records: List records with filters, sort, field selection
- airtable_get_record: Get a single record by ID
- airtable_create_records: Create up to 10 records per request
- airtable_update_records: Partial update up to 10 records per request
- airtable_list_bases: List accessible bases
- airtable_get_base_schema: Get table and field schema for a base

Uses Bearer token auth (Personal Access Token).
2026-03-03 10:06:03 -08:00
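The limit of 10 records per request noted above means callers with larger lists need to batch. A minimal sketch of that batching, assuming the caller sends one request per batch; `chunk_records` is a hypothetical helper, not a function from the commit.

```python
from typing import Iterator

# Per-request limit as stated in the commit message above.
MAX_RECORDS_PER_REQUEST = 10


def chunk_records(
    records: list[dict], size: int = MAX_RECORDS_PER_REQUEST
) -> Iterator[list[dict]]:
    """Split `records` into consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(records), size):
        yield records[start : start + size]
```

Each yielded batch could then be passed to a single create or update call.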
Timothy a5323abfca test: add MongoDB tool unit tests (find, insert, update, delete, aggregate) 2026-03-03 10:05:53 -08:00
Timothy ba4df2d2c4 chore: add MongoDB credential specs (data_api_url, api_key, data_source) 2026-03-03 10:05:52 -08:00
Timothy 6510633a8c feat: add MongoDB Atlas Data API integration - document CRUD and aggregation
Implements 6 tools via MongoDB Atlas Data API:
- mongodb_find: Find documents with filters, projection, sort, limit
- mongodb_find_one: Find a single document
- mongodb_insert_one: Insert a document
- mongodb_update_one: Update a document with MongoDB operators
- mongodb_delete_one: Delete a document
- mongodb_aggregate: Run aggregation pipelines

Uses API key auth header. All endpoints are POST.
2026-03-03 10:05:42 -08:00
Timothy 9172e5f46b chore: register Twilio and Zendesk in tool/credential registries 2026-03-03 09:56:14 -08:00
Timothy ed3e3848c0 test: add Zendesk tool unit tests (list, get, create, update, search) 2026-03-03 09:56:10 -08:00
Timothy ee90185d5c chore: add Zendesk credential specs (subdomain, email, api_token) 2026-03-03 09:56:09 -08:00
Timothy 6eb2633677 feat: add Zendesk integration - ticket management and search
Implements 5 tools via Zendesk Support API v2:
- zendesk_list_tickets: List tickets with status/sort filters
- zendesk_get_ticket: Get ticket details by ID
- zendesk_create_ticket: Create tickets with priority/type/tags
- zendesk_update_ticket: Update ticket fields and add comments
- zendesk_search_tickets: Search tickets with Zendesk query syntax

Uses Basic auth (email/token:api_token).
2026-03-03 09:56:00 -08:00
Timothy c1f215dcf2 test: add Twilio tool unit tests (SMS, WhatsApp, list, get) 2026-03-03 09:55:50 -08:00
Timothy 97cc9a1045 chore: add Twilio credential specs (account_sid, auth_token) 2026-03-03 09:55:49 -08:00
Timothy 5f7b02a4b7 feat: add Twilio integration - SMS and WhatsApp messaging
Implements 4 tools via Twilio REST API:
- twilio_send_sms: Send SMS messages
- twilio_send_whatsapp: Send WhatsApp messages
- twilio_list_messages: List message history with filters
- twilio_get_message: Get message details by SID

Uses Basic auth (AccountSID:AuthToken), form-urlencoded POST.
2026-03-03 09:55:43 -08:00
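The combination of Basic auth and a form-urlencoded POST described above can be illustrated without sending anything over the network. This sketch uses only the standard library and is not the code from the commit; `build_sms_request` is a hypothetical name, and the URL and the `To`, `From`, and `Body` fields follow Twilio's public Messages endpoint.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request


def build_sms_request(
    account_sid: str, auth_token: str, to: str, from_: str, body: str
) -> Request:
    """Build (but do not send) a form-encoded POST with a Basic auth header."""
    # Basic auth is base64("AccountSID:AuthToken").
    credentials = f"{account_sid}:{auth_token}".encode("utf-8")
    auth_header = "Basic " + base64.b64encode(credentials).decode("ascii")
    # The body is form-urlencoded, not JSON.
    form = urlencode({"To": to, "From": from_, "Body": body}).encode("ascii")
    url = f"https://api.twilio.com/2010-04-01/Accounts/{account_sid}/Messages.json"
    return Request(
        url,
        data=form,
        method="POST",
        headers={
            "Authorization": auth_header,
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )
```

Keeping request construction separate from sending makes the auth and encoding logic easy to unit test with mocks.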
Richard Tang ad6d504ea4 docs: remove TUI in the readme 2026-03-03 09:52:06 -08:00
Timothy e696b41a0e chore: register GitLab and Google Sheets in tool/credential registries 2026-03-03 09:49:23 -08:00
Timothy 1f9acc6135 test: add Google Sheets tool unit tests (metadata, read, batch read) 2026-03-03 09:49:23 -08:00
Timothy 7e8699cb4b chore: add Google Sheets credential spec (api_key) 2026-03-03 09:49:22 -08:00
Timothy fd4fc657d6 feat: add Google Sheets integration - read spreadsheet data via API v4
3 tools: sheets_get_spreadsheet, sheets_read_range, sheets_batch_read.
Uses API key auth for read-only access to public spreadsheets.
2026-03-03 09:49:21 -08:00
Timothy 34403648b9 test: add GitLab tool unit tests (projects, issues, MRs) 2026-03-03 09:49:15 -08:00
Timothy 3795d50eb9 chore: add GitLab credential spec (personal access token) 2026-03-03 09:49:14 -08:00
Timothy 80515dde5a feat: add GitLab integration - projects, issues, merge requests
6 tools: gitlab_list_projects, gitlab_get_project, gitlab_list_issues,
gitlab_get_issue, gitlab_create_issue, gitlab_list_merge_requests.
Supports GitLab.com and self-hosted via configurable base URL.
2026-03-03 09:49:13 -08:00
Timothy b59094d35f fix: queen should not return on empty stream 2026-03-03 09:44:15 -08:00
Timothy efcd296d83 chore: register Notion and Jira tools in tool/credential registries 2026-03-03 09:43:32 -08:00
Timothy 802cb292b0 test: add Jira tool unit tests (issues, projects, comments) 2026-03-03 09:43:32 -08:00
Timothy 8e55f74d73 chore: add Jira credential specs (domain, email, api_token) 2026-03-03 09:43:31 -08:00
Timothy 3d810485a0 feat: add Jira integration - issues, projects, comments via REST API v3
6 tools: jira_search_issues, jira_get_issue, jira_create_issue,
jira_list_projects, jira_get_project, jira_add_comment. Uses Basic auth
with email + API token and Atlassian Document Format for text fields.
2026-03-03 09:43:30 -08:00
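Atlassian Document Format, mentioned above, wraps text in a small JSON tree rather than accepting a plain string. A minimal sketch of converting plain text into an ADF document with one paragraph per blank-line-separated block; `text_to_adf` is a hypothetical helper, not the function used in the commit.

```python
def text_to_adf(text: str) -> dict:
    """Wrap plain text in a minimal Atlassian Document Format document.

    Blank lines separate paragraphs; empty paragraphs are skipped because
    ADF text nodes must not be empty.
    """
    paragraphs = [block for block in text.split("\n\n") if block.strip()]
    return {
        "type": "doc",
        "version": 1,
        "content": [
            {
                "type": "paragraph",
                "content": [{"type": "text", "text": block}],
            }
            for block in paragraphs
        ],
    }
```

The resulting dict could be used as the value of a rich-text field such as an issue description or comment body.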
Timothy 94cfd48661 test: add Notion tool unit tests (search, pages, databases) 2026-03-03 09:43:16 -08:00
Timothy 87c8e741f3 chore: add Notion credential spec (api_token) 2026-03-03 09:43:15 -08:00
Timothy d0e92ed18d feat: add Notion integration - pages, databases, and search
5 tools: notion_search, notion_get_page, notion_create_page,
notion_query_database, notion_get_database. Uses Bearer auth
with Notion internal integration token.
2026-03-03 09:43:14 -08:00
Richard Tang 88640f9222 feat: email reply sample agent 2026-03-03 09:41:20 -08:00
Timothy 1927045519 chore: register Greenhouse and YouTube Transcript in tool/credential registries 2026-03-03 09:36:47 -08:00
Timothy 68cffb86c9 test: add YouTube Transcript tool unit tests (get, list transcripts) 2026-03-03 09:36:47 -08:00
Timothy 5bec989647 feat: add YouTube Transcript integration - captions and transcript retrieval
2 tools: youtube_get_transcript, youtube_list_transcripts.
Uses youtube-transcript-api library, no API key required.
2026-03-03 09:36:46 -08:00
Timothy 66f5d2f36c test: add Greenhouse tool unit tests (jobs, candidates, applications) 2026-03-03 09:36:40 -08:00
Timothy 941f815254 chore: add Greenhouse credential spec (api_token) 2026-03-03 09:36:39 -08:00
Timothy 42afd10518 feat: add Greenhouse integration - ATS jobs, candidates, applications
6 tools: greenhouse_list_jobs, greenhouse_get_job, greenhouse_list_candidates,
greenhouse_get_candidate, greenhouse_list_applications, greenhouse_get_application.
Uses Harvest API v1 with Basic auth (API token).
2026-03-03 09:36:38 -08:00
Timothy 3efa285a59 chore: register Cloudinary and Reddit tools in tool/credential registries 2026-03-03 09:31:22 -08:00
Timothy 4f2b4172b4 test: add Reddit tool unit tests (search, posts, comments, user) 2026-03-03 09:31:18 -08:00
Timothy 0d7de71b94 chore: add Reddit credential specs (client_id, client_secret) 2026-03-03 09:31:17 -08:00
Timothy f0f5b4bede feat: add Reddit integration - search, posts, comments, user info
4 tools: reddit_search, reddit_get_posts, reddit_get_comments, reddit_get_user.
Uses OAuth2 client_credentials flow for app-only access.
2026-03-03 09:31:17 -08:00
Timothy bfd27e97d3 test: add Cloudinary tool unit tests (upload, list, get, delete, search) 2026-03-03 09:31:10 -08:00
Timothy f2def27390 chore: add Cloudinary credential specs (cloud_name, api_key, api_secret) 2026-03-03 09:31:10 -08:00
Timothy b3f7bd6cc0 feat: add Cloudinary integration - upload, manage, search media assets
5 tools: cloudinary_upload, cloudinary_list_resources, cloudinary_get_resource,
cloudinary_delete_resource, cloudinary_search. Uses Basic auth with
API key/secret and supports image, video, and raw resource types.
2026-03-03 09:31:09 -08:00
Timothy 0e8e78dc5b chore: register Trello and Confluence tools in tool/credential registries 2026-03-03 09:22:03 -08:00
Timothy b259d85776 test: add Confluence tool tests (9 tests) 2026-03-03 09:22:02 -08:00
Timothy 175d9c3b7c feat: add Confluence credential spec with Basic auth (email + API token) 2026-03-03 09:21:55 -08:00
Timothy a2a810aabf feat: add Confluence integration - spaces, pages, content search via CQL 2026-03-03 09:21:54 -08:00
Timothy 175c7cfd51 test: add Trello tool tests (12 tests) 2026-03-03 09:21:47 -08:00
Timothy 5ada973d38 feat: add Trello credential spec with API key and token auth 2026-03-03 09:21:39 -08:00
Timothy 0103276136 feat: add Trello integration - boards, lists, cards management 2026-03-03 09:21:37 -08:00
Timothy 1d9e8ec138 chore: register HuggingFace tool in tool/credential registries 2026-03-03 09:11:59 -08:00
Timothy 83ac2e71bb test: add HuggingFace tool tests (10 tests) 2026-03-03 09:11:56 -08:00
Timothy 0b35a729a7 feat: add HuggingFace credential spec with token auth 2026-03-03 09:11:55 -08:00
Timothy 56723a519a feat: add HuggingFace Hub integration - models, datasets, spaces search 2026-03-03 09:11:49 -08:00
Timothy ebff394c76 chore: register Plaid tool in tool/credential registries 2026-03-03 09:08:44 -08:00
Timothy ceecc97bc8 test: add Plaid tool tests (13 tests) 2026-03-03 09:08:40 -08:00
Timothy 313154f880 feat: add Plaid credential spec with client_id and secret auth 2026-03-03 09:08:38 -08:00
Timothy 3eb6417cdc feat: add Plaid integration - accounts, balances, transactions, institutions 2026-03-03 09:08:29 -08:00
Timothy 1b35d6ca0a chore: register Pinecone tool in tool/credential registries 2026-03-03 09:05:20 -08:00
Timothy 1d89f0ba9d test: add Pinecone tool tests (18 tests) 2026-03-03 09:05:16 -08:00
Timothy 864df0e21a feat: add Pinecone credential spec with API key auth 2026-03-03 09:05:14 -08:00
Timothy 3f626decc4 feat: add Pinecone vector database integration - indexes, vectors, queries 2026-03-03 09:05:06 -08:00
Timothy bf1760b1a9 chore: register DuckDuckGo tool in tool registry 2026-03-03 08:56:06 -08:00
Timothy 8a58ea6344 test: add DuckDuckGo tool tests (6 tests) 2026-03-03 08:56:06 -08:00
Timothy 662ff4c35f feat: add DuckDuckGo search integration - web search, news, images 2026-03-03 08:56:01 -08:00
Timothy af02352b49 chore: register Linear tool in tool/credential registries 2026-03-03 08:43:41 -08:00
Timothy db9f987d46 test: add Linear tool tests (10 tests) 2026-03-03 08:43:41 -08:00
Timothy 8490ce1389 feat: add Linear credential spec with API key auth 2026-03-03 08:43:41 -08:00
Timothy 55ea9a56a4 feat: add Linear integration - issues, projects, teams, search via GraphQL 2026-03-03 08:43:41 -08:00
Timothy bd2381b10d chore: register Asana tool in tool/credential registries 2026-03-03 08:40:02 -08:00
Timothy 443de755bd test: add Asana tool tests (12 tests) 2026-03-03 08:40:02 -08:00
Timothy 55ec5f14ee feat: add Asana credential spec with PAT auth 2026-03-03 08:40:02 -08:00
Timothy 2e019302c9 feat: add Asana integration - tasks, projects, workspaces, search 2026-03-03 08:40:02 -08:00
Timothy b1e829644b chore: register Yahoo Finance tool in tool registry 2026-03-03 08:36:20 -08:00
Timothy 18f773e91b test: add Yahoo Finance tool tests (8 tests) 2026-03-03 08:36:19 -08:00
Timothy 987cfee930 feat: add Yahoo Finance integration - quotes, history, financials, company info 2026-03-03 08:36:19 -08:00
Timothy 57f6b8498a chore: register Google Search Console tool in tool/credential registries 2026-03-03 08:34:30 -08:00
Timothy 9f0d35977c test: add Google Search Console tool tests (10 tests) 2026-03-03 08:34:30 -08:00
Timothy e5910bbf2f feat: add Google Search Console credential spec with OAuth2 auth 2026-03-03 08:34:30 -08:00
Timothy 0015bf7b38 feat: add Google Search Console integration - analytics, sitemaps, URL inspection 2026-03-03 08:34:30 -08:00
Timothy a6b9234abb chore: register Zoho CRM tool in tool/credential registries 2026-03-03 08:32:13 -08:00
Timothy 086f3942b8 test: add Zoho CRM tool tests (12 tests) 2026-03-03 08:32:13 -08:00
Timothy 924f4abede feat: add Zoho CRM credential spec with OAuth token auth 2026-03-03 08:32:13 -08:00
Timothy 02be91cb08 feat: add Zoho CRM integration - leads, contacts, deals, accounts, notes 2026-03-03 08:32:13 -08:00
Timothy c2298393ab chore: register Apify tool in tool/credential registries 2026-03-03 08:29:33 -08:00
Timothy 4b8c63bf6e test: add Apify tool tests (11 tests) 2026-03-03 08:29:33 -08:00
Timothy e089c3b72c feat: add Apify credential spec with API token auth 2026-03-03 08:29:33 -08:00
Timothy a93983b5db feat: add Apify integration - actors, runs, datasets, key-value stores 2026-03-03 08:29:27 -08:00
Timothy 20f6329004 chore: register Attio tool in tool/credential registries 2026-03-03 08:25:12 -08:00
Timothy 3c2cf71c47 test: add Attio tool tests (14 tests) 2026-03-03 08:25:08 -08:00
Timothy 56288c3137 feat: add Attio credential spec with API key auth 2026-03-03 08:25:04 -08:00
Timothy 79188921a5 feat: add Attio CRM integration - records, lists, notes, tasks 2026-03-03 08:24:58 -08:00
RichardTang-Aden 65962ddf58 Merge pull request #5709 from aden-hive/load-new-session-from-home
Fix new session creation when submitting prompt from home page
2026-03-03 08:20:20 -08:00
Timothy 5ab66008ae chore: register Pipedrive tool in tool/credential registries 2026-03-03 08:18:45 -08:00
Timothy f38c9ee049 test: add Pipedrive tool tests (16 tests) 2026-03-03 08:18:41 -08:00
Timothy 86f5e71ec2 feat: add Pipedrive credential spec with API token auth 2026-03-03 08:18:29 -08:00
Timothy 1e15cc8495 feat: add Pipedrive CRM integration - deals, contacts, orgs, activities, pipelines 2026-03-03 08:18:24 -08:00
Richard Tang bba44430c4 chore: ignore local dev skills 2026-03-03 08:17:32 -08:00
Timothy 077d82ad82 chore: register Docker Hub tool in tool/credential registries 2026-03-03 08:14:27 -08:00
Timothy e4cf7f3da2 test: add Docker Hub tool tests (9 tests) 2026-03-03 08:14:24 -08:00
Timothy e3bdc9e8d7 feat: add Docker Hub credential spec with PAT auth 2026-03-03 08:14:20 -08:00
Timothy f1c1c9aab3 feat: add Docker Hub integration - search, repos, tags, image details 2026-03-03 08:14:15 -08:00
Timothy 97cbcf7658 fix: adapt path guarantee 2026-03-03 08:11:49 -08:00
Richard Tang 69c71d77fb fix: load-new-session from home 2026-03-03 08:09:22 -08:00
Timothy 4860739a2f chore: register Vercel in tool/credential registries (#5044) 2026-03-03 08:08:16 -08:00
Timothy 791ee40cd6 test: add Vercel tool unit tests (#5044) 2026-03-03 08:08:12 -08:00
Timothy e0191ac52b feat: add Vercel credential spec (#5044) 2026-03-03 08:08:07 -08:00
Timothy e0724df196 feat: add Vercel tool - deployments, projects, domains, env vars (#5044) 2026-03-03 08:08:00 -08:00
Timothy 2a56294638 chore: register Databricks in tool/credential registries (#5167) 2026-03-03 08:05:25 -08:00
Timothy d5cd557013 test: add Databricks tool unit tests (#5167) 2026-03-03 08:05:21 -08:00
Timothy 2a43f23a3d feat: add Databricks credential spec (#5167) 2026-03-03 08:05:03 -08:00
Timothy 69af8f569a feat: add Databricks tool - SQL, jobs, clusters, workspace (#5167) 2026-03-03 08:04:34 -08:00
bryan dcc11c9ea3 chore: move test deps to testing extra and dev group 2026-03-03 08:03:02 -08:00
Timothy 4b4abb47b0 Merge branch 'feature/queen-worker-comm' into fix/queen-recovery 2026-03-03 08:02:59 -08:00
Timothy 0e86dbcc9b chore: register Redis tool in tool/credential registries (#5370) 2026-03-03 08:01:43 -08:00
Timothy 92c75aa6f5 test: add Redis tool unit tests (#5370) 2026-03-03 08:01:37 -08:00
Timothy be41d848e5 feat: add Redis credential spec (#5370) 2026-03-03 08:01:32 -08:00
Timothy f7c299f6f0 feat: add Redis tool implementation - KV, hash, list, pub/sub (#5370) 2026-03-03 08:01:25 -08:00
Timothy b6a0f65a09 feat: add Pushover push notification integration (#5415)
4 tools: pushover_send, pushover_validate_user, pushover_list_sounds,
pushover_check_receipt. Supports priority levels, HTML, sounds, TTL.
All 12 unit tests and 13 conformance tests passing.
2026-03-03 07:58:29 -08:00
Timothy 1e7b0068ed chore: register Supabase tool in tool/credential registries 2026-03-03 07:54:34 -08:00
bryan 207d2fb911 feat: wire QuestionWidget into ChatPanel and workspace 2026-03-03 07:54:32 -08:00
Timothy de5105f313 feat: add Supabase integration - DB, Auth, Edge Functions (#5489)
7 tools: supabase_select, supabase_insert, supabase_update, supabase_delete,
supabase_auth_signup, supabase_auth_signin, supabase_edge_invoke.
All 19 unit tests and 13 conformance tests passing.
2026-03-03 07:54:27 -08:00
bryan c65a99c87d feat: add QuestionWidget component 2026-03-03 07:54:21 -08:00
bryan b4d7e57250 feat: update queen prompt for structured ask_user 2026-03-03 07:53:35 -08:00
bryan 63845a07aa feat: add queen-context endpoint and SSE replay 2026-03-03 07:53:22 -08:00
bryan 68ac73aa55 feat: add options support to ask_user tool 2026-03-03 07:53:05 -08:00
Timothy 6d32f1bb36 chore: register YouTube and Microsoft Graph tools in tool/credential registries 2026-03-03 07:51:33 -08:00
Timothy 9c316cee28 feat: add Microsoft Graph integration - Outlook, Teams, OneDrive (#5601)
11 tools: outlook_list_messages, outlook_get_message, outlook_send_mail,
teams_list_teams, teams_list_channels, teams_send_channel_message,
teams_get_channel_messages, onedrive_search_files, onedrive_list_files,
onedrive_download_file, onedrive_upload_file.
All 15 unit tests and 13 conformance tests passing.
2026-03-03 07:47:49 -08:00
Timothy 6af4f2d6e6 feat: add YouTube Data API integration (#5603)
8 tools: search_videos, get_video_details, get_channel, list_channel_videos,
get_playlist, search_channels, get_video_comments, get_video_categories.
All 17 unit tests and 13 conformance tests passing.
2026-03-03 07:47:34 -08:00
Timothy bc9a43d5a9 fix: execution recovery 2026-03-03 07:43:05 -08:00
levxn 7c7b60a5e9 every session loads properly without any issue 2026-03-03 19:46:27 +05:30
levxn 3f0b8bff5b fixes a minor unhandled error in event routes 2026-03-03 18:53:43 +05:30
Amdev-5 57651900f1 Merge remote-tracking branch 'origin/main' into lusha 2026-03-03 18:46:12 +05:30
Amdev-5 46b0617018 Merge remote-tracking branch 'origin/main' into lusha
# Conflicts:
#	tools/src/aden_tools/credentials/health_check.py
#	tools/src/aden_tools/tools/__init__.py
#	tools/tests/test_health_checks.py
2026-03-03 18:34:54 +05:30
levxn 91190cf82d restarts with previous session continuity 2026-03-03 17:48:01 +05:30
RichardTang-Aden 7b98a6613a Merge pull request #5656 from aden-hive/feature/queen-worker-comm
Feature/queen worker comm
2026-03-02 22:50:13 -08:00
Richard Tang 26481e27a6 fix: fix tests and lint 2026-03-02 22:46:38 -08:00
Aaryann Chandola 87a26db779 Merge branch 'aden-hive:main' into fix/guardian-self-trigger-loop 2026-03-03 11:56:15 +05:30
Richard Tang bb227b3d73 chore: ruff lint 2026-03-02 21:30:07 -08:00
Richard Tang 8a0cf5e0ae Merge remote-tracking branch 'origin/feature/queen-worker-comm' into feature/queen-worker-comm 2026-03-02 21:27:22 -08:00
Timothy 69218d5699 chore: lint codes 2026-03-02 20:16:34 -08:00
Timothy 7d1433af21 fix: queen agent flakiness 2026-03-02 19:57:18 -08:00
Richard Tang 0bfbf1e9c5 fix: unused /hive-credentials prompts in the validation 2026-03-02 19:53:57 -08:00
Richard Tang 1ca4f5b22b refactor: update the preload_validation logic 2026-03-02 19:46:50 -08:00
Richard Tang 0984e4c1e8 feat: add gcu subagent validation and refactor the prestart validation steps 2026-03-02 18:35:25 -08:00
P Gokul Sree Chandra 7d9bd2e86b feat(tools): add YouTube Data API integration
- Implement 6 YouTube API tools (search videos, get video/channel details, list channel videos, get playlist items, search channels)
- Add YOUTUBE_API_KEY credential spec with help_url and description
- Register YouTube tool in tools/__init__.py
- Add comprehensive test coverage (18 tests) with mocking
- Add detailed README with setup instructions and examples
- Use httpx for HTTP requests to YouTube Data API v3
- Verified with real API integration testing

Implements #5603
2026-03-03 07:35:04 +05:30
Sarthak Karode 4cbf5a7434 feat(core): add pytest framework testing integration with helpful error messages (#5485) 2026-03-03 10:01:33 +08:00
Hundao b33178c5be fix(graph): move auto-block grace period check before _await_user_input (#5672)
The grace period logic for client-facing auto-blocks was placed after
_await_user_input(), which blocks forever since no inject_event is
scheduled for text-only turns. This caused
test_text_after_user_input_goes_to_judge to hang indefinitely, blocking CI framework tests.

Move the grace period check before the blocking call so that within
the grace window, auto-blocks with missing outputs skip blocking
entirely and continue to the next LLM turn for judge RETRY pressure.

Also adds an _auto_missing check: nodes with no missing outputs
(e.g. queen monitoring with output_keys=[]) should still block
as their text-only output is legitimate conversation.

Fixes #5633
2026-03-03 09:39:14 +08:00
Richard Tang dc6a336c60 fix: removed the unused build_capability_summary 2026-03-02 16:26:47 -08:00
Antiarin 20ef5cb14f test(runtime): add async test for canceling multiple tasks across streams 2026-03-03 05:54:42 +05:30
Antiarin 2c3ec7e74c fix(tui): fix pause/stop to cancel all running tasks across all graphs 2026-03-03 05:30:20 +05:30
Richard Tang b855336448 chore: ruff format issue 2026-03-02 15:47:30 -08:00
Richard Tang de021977fd Merge remote-tracking branch 'origin/main' into feature/queen-worker-comm 2026-03-02 15:39:15 -08:00
Timothy cd2b3fcd16 Merge branch 'feature/new-inbox-management-agent' into feature/queen-worker-comm 2026-03-02 14:46:14 -08:00
Timothy b64024ede5 fix: gcu error log throwing 2026-03-02 14:45:57 -08:00
bryan a280d23113 fix: removing escalate to coder from worker tools 2026-03-02 12:02:35 -08:00
Timothy 41785abdba fix: rephrasing 2026-03-02 11:54:22 -08:00
Timothy de494c7e55 Merge branch 'feature/queen-worker-comm' into feature/new-inbox-management-agent 2026-03-02 11:44:08 -08:00
Timothy 5fa0903ea8 fix: teach email agent to search emails 2026-03-02 11:43:40 -08:00
Timothy 7bd99fe074 fix: email inbox management agent 2026-03-02 11:01:21 -08:00
bryan c838e1ca6d feat: agent building animation 2026-03-02 10:54:57 -08:00
bryan f475923353 feat: subagents populate node panel 2026-03-02 09:59:24 -08:00
Timothy 43f43c92e3 Merge branch 'feature/queen-worker-comm' into feature/new-inbox-management-agent 2026-03-02 09:40:55 -08:00
Timothy 5463134322 fix: inbox management template v2 2026-03-02 09:40:36 -08:00
Timothy 3fbb392103 fix: add credentials to queen lifecycle tools 2026-03-02 09:39:38 -08:00
RichardTang-Aden a162da17e1 Merge pull request #5639 from RichardTang-Aden/main
feat: support Gemini 3.1 pro
2026-03-02 09:24:27 -08:00
Richard Tang b565134d57 chore: fix the ruff lint 2026-03-02 09:23:02 -08:00
Richard Tang 3aafc89912 feat: support Gemini 3.1 pro 2026-03-02 09:20:48 -08:00
bryan 93449f92fe fix: clear build cache in quickstart 2026-03-02 09:00:48 -08:00
Bryan @ Aden d766e68d42 Merge pull request #5494 from Antiarin/security/harden-validate-agent-path
[Bug][Security]: agent_path accepts arbitrary filesystem paths with no validation
2026-03-02 16:57:51 +00:00
Hundao 1d8b1f9774 fix: enforce 0600 permissions on OAuth token files (#5631)
* fix: enforce 0600 permissions on OAuth token files

Credential files were written with default umask permissions.
Use os.open with explicit 0o600 mode to ensure token files
are always owner-read/write only, regardless of umask.

Fixes #5530

* style: fix line too long in checkpoint_store.py
2026-03-02 18:30:40 +08:00
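The permission fix described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the function name `write_private_file` is hypothetical, and the extra `os.fchmod` call (Unix-only) is an assumption added here so that a file that already existed with looser permissions is tightened as well.

```python
import os


def write_private_file(path: str, data: str) -> None:
    """Write data to path so that only the owner can read or write it."""
    # Passing an explicit mode to os.open sets the permission bits when the
    # file is created; the umask can only clear bits, never add group/other.
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    fd = os.open(path, flags, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        # Assumption: also reset the mode explicitly, since a file that
        # already existed keeps whatever mode it had before.
        os.fchmod(handle.fileno(), 0o600)
        handle.write(data)
```

Setting the mode at creation time avoids a window in which the token file briefly exists with broader permissions, which a separate `chmod` after a plain `open()` would leave open.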
Rajneesh Chaudhary 5ea9abae83 fix(core): prevent sse critical event queue from blocking event bus (#5533) (#5536)
Disconnects slow clients instead of blocking the publisher task.

Signed-off-by: Rajneesh180 <rajneeshrehsaan48@gmail.com>
2026-03-02 17:57:52 +08:00
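One way to picture "disconnect slow clients instead of blocking the publisher" is the sketch below. It is an assumption-laden illustration rather than the project's event bus: `Subscriber`, `publish`, and `max_pending` are hypothetical names, and each client is modelled as a bounded `asyncio.Queue`.

```python
import asyncio


class Subscriber:
    """One SSE client with a bounded queue of pending events (hypothetical)."""

    def __init__(self, max_pending: int = 100) -> None:
        self.queue: asyncio.Queue = asyncio.Queue(maxsize=max_pending)
        self.connected = True


def publish(subscribers: list, event: str) -> None:
    """Hand an event to every subscriber without ever awaiting."""
    # Iterate over a copy so subscribers can be removed during the loop.
    for sub in list(subscribers):
        try:
            # put_nowait never blocks; a full queue raises immediately.
            sub.queue.put_nowait(event)
        except asyncio.QueueFull:
            # The client is not draining its queue: drop it rather than
            # stall the publisher for everyone else.
            sub.connected = False
            subscribers.remove(sub)
```

The trade-off is that a slow client loses its connection and has to reconnect, while the publisher task and all other clients keep making progress.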
ArshpreetSingh04 15957499c5 docs(core): fix outdated goal-agent path reference in README (#5629)
Update the MCP client configuration example in core/README.md to replace the outdated `goal-agent` path with the correct `hive/core` path.

Fixes #5628
2026-03-02 17:07:25 +08:00
Timothy 0b50d9e874 fix: block idle event 2026-03-01 21:01:59 -08:00
Amdev-5 cce073dbdb fix(lusha): add pagination and empty filter validation
- Expose page parameter on search_people and search_companies
  (client + MCP tool) enabling access beyond the first 50 results
- Add guard requiring at least one filter on both search endpoints
  to prevent broad requests that burn API credits
- Add unit tests for pagination and empty filter validation
2026-03-02 10:20:08 +05:30
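The empty-filter guard and `page` parameter from the commit above might look something like this. The function name `build_search_request` and the request shape are hypothetical and do not reflect the real Lusha payload; only the idea of rejecting a search with no filters and returning an error dict comes from the commit history.

```python
from typing import Any, Dict


def build_search_request(filters: Dict[str, Any], page: int = 0) -> Dict[str, Any]:
    """Build a search request, refusing searches that have no active filter."""
    # Treat None and empty containers/strings as "filter not provided".
    active = {k: v for k, v in filters.items() if v not in (None, "", [], {})}
    if not active:
        # Error dict instead of an exception, so the caller gets a normal
        # tool response and no API credits are spent on a broad query.
        return {"error": "At least one search filter is required"}
    return {"filters": active, "page": page}
```

Validating before the HTTP call keeps the check cheap and makes it easy to unit test without mocking the API.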
Timothy a1e54922bd fix: timer count down update 2026-03-01 20:22:46 -08:00
Timothy 63c0ca34ea Merge branch 'feature/agent-runtime-idling' into feature/queen-worker-comm 2026-03-01 20:14:46 -08:00
Timothy 135477e516 feat: agent idling detection 2026-03-01 20:14:35 -08:00
Timothy 8cac49cd91 feat: frontend display of scheduler count down 2026-03-01 20:13:21 -08:00
Timothy 28dce63682 fix: conversation ordering 2026-03-01 18:56:41 -08:00
Timothy 313ac952e0 Merge branch 'feature/tool-pill-v2' into feature/queen-worker-comm 2026-03-01 18:33:54 -08:00
Timothy 0633d5130b fix: command line refresh frontend build 2026-03-01 18:33:43 -08:00
Timothy 995e487b49 Merge branch 'feature/tool-pill-v2' into feature/queen-worker-comm 2026-03-01 18:26:49 -08:00
Timothy 64b58b57e0 fix: remove reddish color 2026-03-01 18:26:27 -08:00
Timothy c6465908df feat: colorful tool pills 2026-03-01 18:11:57 -08:00
Timothy ca96bcc09f fix: add pending question content to worker status 2026-03-01 18:11:15 -08:00
Timothy 65ee628fae fix: tool pill turn id 2026-03-01 17:58:31 -08:00
Timothy 02043614e5 feat: consolidate worker status report, fix conversation order 2026-03-01 17:56:27 -08:00
Timothy 212b9bf9d4 fix: load agent 2026-03-01 16:26:55 -08:00
Timothy 6070c30a88 Merge branch 'feat/open-hive' into feature/queen-worker-comm 2026-03-01 16:06:43 -08:00
Timothy 8a653e51bc feat: separate worker and queen input 2026-03-01 15:50:28 -08:00
Vasu Bansal 6a92588264 fix(plaid): update v0.6 credential compatibility and stabilize tests 2026-03-01 01:16:16 +05:30
Vasu Bansal 276aad6f0d feat: add Plaid banking integration
- Implement Plaid connector for account balances
- Add transaction history retrieval
- Include GL reconciliation functionality
- Add institution metadata lookup
- Include comprehensive tests and documentation

Closes #4016
2026-03-01 01:16:16 +05:30
Vasu Bansal 10620bda4f fix(sap): update credential-store compatibility and test imports 2026-03-01 01:07:00 +05:30
Vasu Bansal c214401a00 feat(integration): add SAP S/4HANA connector
Add complete SAP S/4HANA integration with:
- Connector for OData API access
- Credential management following Hive patterns
- Unit tests with mocked responses
- Documentation and usage examples

Refs #3182
2026-03-01 01:07:00 +05:30
Vasu Bansal 260ac33324 fix(s3): support v0.6 credential refs and register S3 tools 2026-03-01 00:56:22 +05:30
Vasu Bansal d4cd643860 feat: add AWS S3 integration for cloud object storage
- Add S3Storage class with upload, download, list, delete operations
- Support IAM roles, environment variables, and credential store
- Implement retry logic with adaptive backoff
- Add MCP tools: s3_upload, s3_download, s3_list, s3_delete, s3_check_credentials
- Include comprehensive tests with moto mocking
- Add documentation for setup and IAM permissions

Closes #3012
2026-03-01 00:54:57 +05:30
IamSayeed dc16cfda21 Merge branch 'main' into feature/add-asana-integration 2026-02-28 11:28:43 +05:30
RichardTang-Aden d562670425 Merge pull request #5501 from aden-hive/feat/open-hive
Feat: v6 windows compatibility support
2026-02-27 19:58:48 -08:00
Timothy Zhang 677bee6fe5 Merge branch 'feat/open-hive' of https://github.com/adenhq/hive into feat/open-hive 2026-02-27 19:55:54 -08:00
Timothy Zhang de27bfe76f fix: windows compatibility 2026-02-27 19:55:48 -08:00
Timothy 1c1dcb9c33 chore: new architecture 2026-02-27 19:55:05 -08:00
RichardTang-Aden 4ba950f155 Merge pull request #5499 from aden-hive/feat/open-hive
feat: tool call revamp, Intercom & GA integrations, credential improvements
2026-02-27 19:41:11 -08:00
bryan 9c3a11d7bb chore: remove load agent 2026-02-27 19:14:35 -08:00
Richard Tang b7d357aea2 Merge remote-tracking branch 'upstream/feat/open-hive' into feat/sub-agent-framework 2026-02-27 19:07:45 -08:00
bryan b2fed68346 chore: fix linter 2026-02-27 18:57:52 -08:00
bryan 0e996928be fix: load credentials check into new agent session 2026-02-27 18:50:03 -08:00
Timothy 6ff4ec3643 Merge branch 'feature/tool-call-revamp' into feat/open-hive 2026-02-27 18:45:35 -08:00
bryan 099f9514ef Merge branch 'main' into feat/open-hive 2026-02-27 18:10:42 -08:00
Bryan @ Aden 296aab6ecb Merge pull request #5171 from Ttian18/feat/tina/intercom-tool-4256
feat(tools): add Intercom tool integration (#4256)
2026-02-28 02:01:57 +00:00
Richard Tang 14182c45fc refactor: reorganized file tools 2026-02-27 17:52:21 -08:00
Richard Tang 2fa8f4283c Merge remote-tracking branch 'upstream/feat/open-hive' into feat/sub-agent-framework 2026-02-27 17:51:43 -08:00
Bryan @ Aden ad3cec2361 Merge pull request #4239 from Ttian18/feat/tina-google-analytics-tool
[Integration]: Google Analytics - Website Traffic & Marketing Performance #3727
2026-02-28 01:50:07 +00:00
Adam Albarghouthi bc836db0f9 micro-fix: fix incorrect CLI commands and docstring in core docs (#5457)
- Replace non-existent CLI commands (calculate, interactive, analyze)
  with actual commands (run, shell, info) in core/README.md
- Fix test-list argument from <goal_id> to <agent_path> in core/README.md
- Fix misleading docstring on MockProvider.complete_with_tools()

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: hundao <alchemy_wimp@hotmail.com>
2026-02-28 08:40:58 +08:00
Adam Albarghouthi 7f28474967 micro-fix: fix wrong credential path and env var in docs (#5458)
* micro-fix: fix wrong credential path and env var in docs

Both docs/configuration.md and docs/environment-setup.md reference a
non-existent ADEN_CREDENTIALS_PATH env var and wrong default path
(~/.aden/credentials). The actual env var is HIVE_CREDENTIAL_KEY and
the default path is ~/.hive/credentials (see storage.py:119,125).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* micro-fix: clarify HIVE_CREDENTIAL_KEY comment wording

Reword comment to avoid implying the env var controls the path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 08:01:16 +08:00
wlkjyy 5d8ba1e49c micro-fix: tests: use unified session_* run IDs in runtime logging tests (#5480)
* tests: use session_* run IDs in runtime logging tests

* refactor: extract _sid() helper for session IDs in runtime logger tests
2026-02-28 07:54:59 +08:00
Richard Tang ccb394675b Merge remote-tracking branch 'upstream/feat/open-hive' into feat/sub-agent-framework 2026-02-27 14:48:47 -08:00
Richard Tang 931487a7d4 feat: clean the options for browser open tools that should not be used by LLM 2026-02-27 14:48:31 -08:00
Richard Tang fb28280ced feat: human-friendly LLM and tool calls logs 2026-02-27 14:45:12 -08:00
Richard Tang 52f16d5bb6 Merge remote-tracking branch 'upstream/feat/open-hive' into feat/sub-agent-framework 2026-02-27 13:49:14 -08:00
Antiarin e5b6c8581a feat: implement agent path validation and restrict loading to allowed directories 2026-02-28 02:56:31 +05:30
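A path check of the kind this commit describes could be sketched as below. The helper name `validate_agent_path` and the `allowed_roots` parameter are hypothetical, and the actual implementation may differ; the sketch assumes Python 3.9+ for `Path.is_relative_to`.

```python
from pathlib import Path
from typing import List


def validate_agent_path(agent_path: str, allowed_roots: List[Path]) -> Path:
    """Return the resolved path if it lies inside one of the allowed roots."""
    # resolve() collapses ".." segments and follows symlinks, so the check
    # runs against the real location rather than the string the caller sent.
    resolved = Path(agent_path).expanduser().resolve()
    for root in allowed_roots:
        if resolved.is_relative_to(root.resolve()):
            return resolved
    raise ValueError(f"agent_path is outside the allowed directories: {agent_path}")
```

Resolving before comparing is the important step: a plain string-prefix check would accept inputs such as `allowed/../../etc`.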
Timothy e1db3a4af9 fix: remove hardcoded anthropic logics 2026-02-27 10:23:59 -08:00
Zhang 890b906f15 fix(tools): address review feedback on Google Analytics tool
- Use Credentials.from_service_account_file() instead of mutating os.environ
- Remove unused dimensions param from _format_report_response
- Remove unused metrics param from _format_realtime_response
- Extract duplicated property_id/limit validation into _validate_inputs helper
- Add credential_group="google_cloud" to GA and BigQuery specs
- Update tests to mock Credentials class
2026-02-26 20:46:20 -08:00
Zhang 335a9603e8 feat(tools): add Google Analytics 4 integration (#3727)
Add read-only GA4 Data API v1 tools: ga_run_report, ga_get_realtime,
ga_get_top_pages, and ga_get_traffic_sources. Includes credential spec,
unit tests, and README.
2026-02-26 20:22:12 -08:00
Zhang 5e8a6202e7 fix(credentials): add Intercom health checker to registry (#4256)
Add IntercomHealthChecker (subclass of OAuthBearerHealthChecker) and
register it in HEALTH_CHECKERS so the credential registry completeness
test passes in CI.
2026-02-26 20:01:43 -08:00
Zhang 55a4cdefd7 fix(tools): pass assignee_type through to Intercom API and add README (#4256)
- Pass assignee_type from intercom_assign_conversation tool function
  through to _IntercomClient.assign_conversation() and into the API payload
- Add tests for assignee_type="team" passthrough at client and tool levels
- Add tool README with setup, usage examples, and error handling

Addresses PR #5171 review feedback from @bryanadenhq
2026-02-26 19:56:36 -08:00
Richard Tang 2b63135afb Merge remote-tracking branch 'upstream/feat/open-hive' into feat/sub-agent-framework 2026-02-26 19:33:24 -08:00
Richard Tang 779b376c6e Merge remote-tracking branch 'upstream/feat/open-hive' into feat/sub-agent-framework 2026-02-26 19:02:35 -08:00
Richard Tang b1f3d6b155 Merge remote-tracking branch 'upstream/feat/open-hive' into feat/sub-agent-framework 2026-02-26 17:59:15 -08:00
Richard Tang e7da62e61c Merge remote-tracking branch 'upstream/feat/open-hive' into feat/sub-agent-framework 2026-02-26 17:17:37 -08:00
Richard Tang 7176745e1c feat: GCU enabled in the quickstart menu 2026-02-26 17:15:37 -08:00
Richard Tang 20efd523c9 Merge remote-tracking branch 'upstream/feature/llm-turn-logging' into feat/sub-agent-framework 2026-02-26 16:16:37 -08:00
Richard Tang edf51e6996 feat: prompts for GCU 2026-02-26 15:45:03 -08:00
Richard Tang 6b867883ce chore: ruff lint 2026-02-26 15:03:06 -08:00
Richard Tang 35a05f4120 Merge remote-tracking branch 'upstream/feat/open-hive' into feat/sub-agent-framework 2026-02-26 14:59:48 -08:00
Richard Tang e0e78a97ce refactor: reorganize all the browser tools and make them built-in for the gcu node type 2026-02-26 12:51:10 -08:00
Navya Bijoy ddd30a950d Integration: add Databricks MCP tool integration
Implements the Databricks MCP tool integration for the Hive agent framework
2026-02-26 21:01:59 +05:30
KRYSTALM7 3ca0e63d54 feat(tools): add Pushover push notification integration
Closes #5415
2026-02-26 13:54:34 +00:00
Richard Tang 214098aaae fix: remove the run_command tool from the predefined engineering tool set for worker agent 2026-02-25 18:36:00 -08:00
Richard Tang 754e33a1ae feat: browser tools optimization 2026-02-24 14:05:26 -08:00
Richard Tang b11b43bbe1 feat: reorganized the log structure for subagents 2026-02-24 10:41:13 -08:00
Richard Tang 86f4645d1c fix: inherit the tool call overflow margin for subagent 2026-02-24 08:20:08 -08:00
Richard Tang 2d05e96cd5 fix: spillover for subagent 2026-02-24 08:18:52 -08:00
Richard Tang 9c44d3b793 feat: add the upgraded file operation tools 2026-02-23 20:25:25 -08:00
Richard Tang 9b89ac694e feat: new snapshot tools 2026-02-23 19:34:42 -08:00
Richard Tang 630d8208cf fix: avoid using headless browser 2026-02-23 19:09:18 -08:00
Richard Tang 9b342dc593 feat: add health check for the browser start 2026-02-23 18:28:59 -08:00
Richard Tang ad879de6ff feat: clean the browser snapshot tool 2026-02-23 17:56:05 -08:00
Richard Tang 795266aab4 feat: store the subagent logs in the node logs folder 2026-02-23 16:02:39 -08:00
Richard Tang 4e4ef121f9 feat: Progressive feedback in SubagentJudge 2026-02-23 15:48:34 -08:00
Richard Tang ddb9126955 fix: resolve the bug of calling the snapshot tool too many times 2026-02-23 15:38:04 -08:00
Richard Tang bac6d6dd68 feat: subagent ending judge and communication 2026-02-23 15:25:59 -08:00
Richard Tang 3451570541 feat: enable subagent to talk back to the parent via tools 2026-02-23 12:31:51 -08:00
Richard Tang e5e939f344 feat: add a basic test tool for checking the validity of the browser control tools 2026-02-23 11:08:08 -08:00
Richard Tang 0d51d25482 feat: highlight interactive actions 2026-02-23 11:03:19 -08:00
Richard Tang a0a5b10df0 fix: remove the max subagent logic 2026-02-23 10:35:55 -08:00
Richard Tang 04bac93c14 feat: fix tool bugs and add background tabs option 2026-02-23 10:20:52 -08:00
Richard Tang 047f4a1a0c Merge branch 'main' into feat/sub-agent-framework 2026-02-22 18:31:47 -08:00
Richard Tang 7994b90dfa feat: add the max_sub_agents config and constrain 2026-02-22 18:23:52 -08:00
Richard Tang 04b6a80370 feat: shared agent profile 2026-02-22 18:17:40 -08:00
Shivam Shahi– oss/acc 0f8627f17a format 2026-02-22 00:25:15 +05:30
Zhang fc0c3e169f feat(tools): add Intercom tool with conversations, contacts, and tags (#4256) 2026-02-20 17:14:30 -08:00
Zhang 4760f95bda feat(credentials): add Intercom credential spec (#4256)
Register INTERCOM_ACCESS_TOKEN in INTEGRATION_CREDENTIALS for the
8 Intercom tools (search/get conversations, contacts, notes, tags,
assignment, teams). Tool implementation follows in subsequent commits.
2026-02-20 17:13:54 -08:00
Utkarsh Singh cd0cf69099 feat(tools): add Brevo transactional email and SMS integration
- Add brevo_tool with 6 MCP tools: brevo_send_email, brevo_send_sms,
  brevo_create_contact, brevo_get_contact, brevo_update_contact,
  brevo_get_email_stats
- Add CredentialSpec for BREVO_API_KEY in credentials/brevo.py
- Register brevo_tool in tools/__init__.py and credentials/__init__.py
- Add README with setup instructions and usage examples
- Add 34 unit tests covering all tools, validation and error handling

Closes #5127
2026-02-20 13:19:07 +05:30
Richard Tang a04a8a866d fix: sub-agents reachability check 2026-02-19 11:33:32 -08:00
Richard Tang 8c9baa62b0 feat: create default hive profile for browser use 2026-02-18 18:10:37 -08:00
Richard Tang 262eaa6d84 feat: mcp dependencies for gcu 2026-02-18 16:34:19 -08:00
Richard Tang fc1a48f3bc feat: breaking the browser use tools by types 2026-02-18 16:10:17 -08:00
Richard Tang 060f320cd1 feat(wip): gcu node and basic browser tools 2026-02-18 15:52:46 -08:00
Richard Tang bff32bcaa3 feat: allow sub_agent in the agent framework 2026-02-18 14:43:01 -08:00
Amdev-5 9744363342 fix(lusha): address PR review round 2 — structured filters, pagination, correct types
- search_people: replaced freetext searchText concatenation with proper
  structured Lusha API filters (jobTitles, seniority as list[int],
  departments, locations as dict, company_names, industry_ids, search_text)
- search_companies: added locations, company_names, search_text params;
  made all params optional for flexible queries
- Pagination: exposed limit param (clamped 10-50 per Lusha API constraints)
  on both search tools, replacing hardcoded size=25
- get_signals: changed ids from list[str] to list[int], removed internal
  str-to-int conversion as Lusha IDs are always numeric
- seniority type corrected to list[int] (API rejects string-encoded values
  despite OpenAPI spec suggesting strings — verified via live integration)
- Unit tests updated for all changes (19/19 pass)

Verified against live Lusha API: all 6 tools return correct responses.
2026-02-17 22:00:09 +05:30
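The `limit` clamping mentioned in the commit above (10 to 50) amounts to a one-line bound check. The helper name `clamp_limit` is hypothetical; the bounds are the ones stated in the commit message.

```python
def clamp_limit(limit: int, lower: int = 10, upper: int = 50) -> int:
    """Force a requested page size into the range the API accepts."""
    # Out-of-range values are silently adjusted instead of rejected, so a
    # caller asking for 5 or 500 still gets a valid request.
    return max(lower, min(upper, limit))
```

Clamping rather than raising keeps the tool forgiving for LLM callers, at the cost of returning a different number of results than was requested.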
Amdev-5 6fe8439e94 fix(lusha): use mainIndustriesIds for company search, safer credential handling
- search_companies: replace names filter with mainIndustriesIds (numeric
  industry IDs) per Lusha API schema. Parameter changed from
  industry: str to industry_ids: list[int] | None.
- _get_api_key: return None instead of raising TypeError on unexpected
  credential type. Lets _get_client handle it with the standard error dict
  pattern used across all tools.
- Updated unit tests for new industry_ids parameter and added test for
  non-string credential handling.
2026-02-17 21:33:02 +05:30
Amdev-5 8e61ffe377 fix(tools): remove invalid searchText field from Lusha prospecting filters
Lusha API rejects filters.companies.include.searchText (HTTP 400).
Replaced with valid 'names' field in search_companies and removed
redundant company searchText from search_people. Updated unit tests.
2026-02-17 21:33:02 +05:30
Amdev-5 723476f7a7 feat(tools): add Lusha MCP integration with credentials and health checks 2026-02-17 21:33:02 +05:30
IamSayeed 0f253027ae Merge branch 'main' into feature/add-asana-integration 2026-02-17 12:20:01 +05:30
Sayeed Rizwan 6053895a82 fix(asana): resolve from PR feedback - refactor client, fix specs, add tests 2026-02-17 12:18:06 +05:30
Shivam Shahi– oss/acc ceffa38717 Merge branch 'main' into feat/zoho-crm 2026-02-17 02:46:29 +05:30
Your hh3538962 ae205fa3f2 fix(tools): address Power BI integration code review feedback
- Fix export endpoint: /Export -> /ExportTo
- Add 202 Accepted response handling
- Add notifyOption to refresh_dataset API call
- Rename format parameter to export_format (avoid shadowing builtin)
- Add PNG support to export formats
- All critical API issues from review addressed
2026-02-16 14:00:09 +05:00
Shivam Shahi– oss/acc 669a05892b Merge branch 'main' into feat/zoho-crm 2026-02-15 21:47:52 +05:30
IamSayeed 4898a9759a Merge branch 'main' into feature/add-asana-integration 2026-02-15 13:07:15 +05:30
Sayeed Rizwan 2c2fa25580 fix: Resolve merge conflicts in credential and tool registries 2026-02-15 13:00:23 +05:30
Sayeed Rizwan 56496d7dbd feat: Add Asana integration for project management automation
- Implement 25 MCP tools for comprehensive Asana operations
  - Task management (create, update, search, delete, complete, comment, subtask)
  - Project management (create, update, list, get tasks)
  - Workspace & team operations (list workspaces, get users)
  - Section management for Kanban workflows
  - Tag and custom field support

- Add Personal Access Token (PAT) authentication
- Use official asana>=3.2.0 Python SDK (v5+ API)
- Include comprehensive error handling with ApiException
- Add 5 unit tests with 100% pass rate
- Provide detailed documentation and usage examples

Technical Details:
- Uses asana.ApiClient with Configuration pattern
- Implements workspace resolution by name or GID
- Handles paginated responses automatically
- Follows CredentialStoreAdapter pattern
- Matches existing tool structure (slack_tool, github_tool)

Closes #4156
2026-02-15 11:33:17 +05:30
y0sif dd0696e44d chore: resolve merge conflicts with main 2026-02-14 21:38:44 +02:00
y0sif dcda273e0b chore: resolve merge conflicts with main 2026-02-14 21:32:33 +02:00
y0sif f3b159c650 docs(tools): document Attio CRM in README 2026-02-14 21:23:47 +02:00
y0sif 06df037e28 chore: add Attio credentials to test spec file 2026-02-14 21:22:55 +02:00
y0sif e814e516d1 chore: add Attio credentials to init file 2026-02-14 21:21:37 +02:00
y0sif 0375e068ed test(tools): add Attio tool tests 2026-02-14 21:20:03 +02:00
y0sif 34ffc533d3 feat(tools): add Attio CRM integration 2026-02-14 21:19:14 +02:00
mubarakar95 ea2ea1a4ae Merge branch 'main' into integration/apify 2026-02-14 17:53:39 +05:30
mubarakar95 9e11947687 style: apply ruff formatting to apify_tool.py 2026-02-14 17:22:35 +05:30
mubarakar95 47117281e1 fix(test): resolve E501 line too long in test_apify_tool.py 2026-02-14 17:22:33 +05:30
mubarakar95 032dd13f5a feat(tools): implement Apify integration with 4 tools and comprehensive tests
- Added credential spec with health check endpoint
- Implemented apify_run_actor (sync/async execution)
- Implemented apify_get_dataset (result retrieval)
- Implemented apify_get_run (status checking)
- Implemented apify_search_actors (marketplace search)
- Created comprehensive README with examples and use cases
- Added 24 unit tests with mocked API responses
- All tests passing, conformance validated, linting clean

Resolves: #4510
2026-02-14 17:22:25 +05:30
mubarakar95 13d8ebbeff feat: Add Apify integration (issue #4510)
Implements comprehensive Apify integration for web scraping and automation:

- Added 4 new tools: apify_run_actor, apify_get_dataset, apify_get_run, apify_search_actors
- Credential management for APIFY_API_TOKEN with health check
- Support for synchronous (wait=True) and asynchronous (wait=False) actor execution
- Actor ID validation and comprehensive error handling
- Full test coverage (26 tests passing)
- README with usage examples and documentation

Addresses #4510
2026-02-14 11:53:56 +05:30
Shivam Shahi– oss/acc 2efa0e01df ruff format fix 2026-02-14 00:35:30 +05:30
Shivam Shahi– oss/acc 6044369fdf feat(tools): add Zoho CRM v8 integration with OAuth2 and MCP tools
Add Zoho CRM MCP integration for lead/contact/account/deal workflows with notes support. Implements 5 MCP tools:
- zoho_crm_search: Search Leads/Contacts/Accounts/Deals by criteria or word with pagination
- zoho_crm_get_record: Fetch a single record by module and ID
- zoho_crm_create_record: Create records with pass-through field payloads
- zoho_crm_update_record: Update records by ID with partial field payloads
- zoho_crm_add_note: Create notes linked to CRM records via Parent_Id mapping

Features:
- Zoho OAuth2 provider added in core credentials (refresh-token flow)
- Zoho auth format: Authorization: Zoho-oauthtoken <token>
- Region/DC-aware routing using accounts domain/region + api_domain usage
- Persisted DC metadata on refresh (api_domain/accounts_domain/location)
- Credential spec and health check registration for zoho_crm
- Tool registration and allowed-tool list updates
- Normalized tool responses with retriable 429 handling
- README with setup, auth modes, usage, and testing instructions
- Comprehensive unit/integration coverage updates for tool, provider, and health checks

Validation:
- Scoped ruff lint/format checks passed
- Targeted test suite passed: 563 passed, 18 skipped

Closes #4418
2026-02-13 18:28:12 +05:30
RichardTang-Aden 97440f9e8a Merge branch 'main' into feature/x-twitter-integration 2026-02-11 17:13:33 -08:00
Your hh3538962 765f7cae58 feat(tools): add get_datasets, get_reports, and export_report functions to Power BI integration 2026-02-11 22:19:51 +05:00
Your hh3538962 b455c8a2ad Merge remote-tracking branch 'origin/main' into feat/power-bi-integration 2026-02-11 22:07:00 +05:00
Sapna vishnoi da25e0ffa5 Merge branch 'main' into feat/redshift-integration 2026-02-11 13:42:26 +05:30
Your hh3538962 e07703c01f feat(tools): add Power BI integration - initial structure with workspace and dataset refresh functions 2026-02-10 13:23:32 +05:00
mishrapravin114 a4abf3eb2b Merge upstream/main: resolve conflicts with Apollo integration
- Keep both APOLLO_CREDENTIALS and AIRTABLE_CREDENTIALS
- Keep both apollo_tool and airtable_tool imports (alphabetical)

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-10 00:25:17 +05:30
mishrapravin114 269d72d073 Merge upstream/main: resolve conflicts with Apollo integration
- Keep both APOLLO_CREDENTIALS and CALENDLY_CREDENTIALS
- Keep both apollo_tool and calendly_tool imports (alphabetical)

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-10 00:20:17 +05:30
mishrapravin114 c8f5dccbd2 docs(airtable): add rate limit section to README
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-10 00:17:49 +05:30
mishrapravin114 8b797ee73f feat(airtable): add rate limit retry and retry_after
- Add 429 handling with retry_after from Retry-After header
- Add _request_with_retry (2 retries) for all API calls
- Update tests to use httpx.request

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-10 00:17:37 +05:30
mishrapravin114 de38adb1e4 feat(calendly): add rate limit handling, retry, 7-day validation
- Add 429 handling with retry_after from Retry-After header
- Add _request_with_retry (2 retries) for all API calls
- Validate get_availability date range <= 7 days
- Update tests to use httpx.request

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-10 00:16:37 +05:30
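The 429 handling described in the Airtable and Calendly commits above can be illustrated with the sketch below. It is not the tools' `_request_with_retry`: the function name `request_with_retry`, the injected `send` callable, and the returned dict shape are assumptions made so the example runs without an HTTP library.

```python
import time
from typing import Callable, Mapping, Tuple

# (status code, response headers, body) as returned by the hypothetical send().
Response = Tuple[int, Mapping[str, str], str]


def request_with_retry(
    send: Callable[[], Response],
    max_retries: int = 2,
    default_wait: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call send(), retrying on HTTP 429 up to max_retries times."""
    retry_after = default_wait
    for attempt in range(max_retries + 1):
        status, headers, body = send()
        if status != 429:
            return {"status": status, "body": body}
        # Retry-After may be absent or an HTTP date; fall back to the default.
        try:
            retry_after = float(headers.get("Retry-After", default_wait))
        except ValueError:
            retry_after = default_wait
        if attempt < max_retries:
            sleep(retry_after)
    # Retries exhausted: surface retry_after so the caller can back off.
    return {"error": "rate_limited", "retry_after": retry_after}
```

Passing `sleep` as a parameter keeps tests fast, since a test can record the requested waits instead of actually sleeping.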
Sapna vishnoi c169bcc5d8 Merge branch 'main' into feat/redshift-integration 2026-02-09 23:32:08 +05:30
kubrakaradirek 80ea286beb fix: resolve complex merge conflicts and restore integrations 2026-02-09 16:09:43 +03:00
kubrakaradirek 3499be782e feat: implement MSSQL tool with schema discovery closes #3377 2026-02-09 15:32:57 +03:00
Gordon Ng 16603ae49c Test MCP 2026-02-09 01:48:49 -05:00
Gordon Ng bf6bd9ce7f test mcp 2026-02-09 01:48:46 -05:00
Gordon Ng a54c0f6f46 update 2026-02-09 01:20:25 -05:00
Gordon Ng beeed11d48 update 2026-02-09 01:11:33 -05:00
Manas Dutta 25331590a7 feat(reddit): add Reddit health checker and update tool functions 2026-02-08 19:26:01 +05:30
GastonAQS bff9f8976e Merge branch 'main' into feature/add-trello-integration 2026-02-07 15:57:48 -03:00
Manas Dutta b71628e211 Merge branch 'main' into feature/reddit-integration 2026-02-07 19:35:02 +05:30
Manas Dutta 8c1cb1f55b feat: add Reddit integration with 18 MCP tools
Implements Reddit API integration for community management and content monitoring.

Features:
- Search & Monitoring: search posts/comments, get subreddit feeds (new/hot), get posts/comments (6 tools)
- Content Creation: submit posts, reply, edit, delete comments (5 tools)
- User Engagement: get profiles, upvote, downvote, save posts (4 tools)
- Moderation: remove/approve posts, ban users (3 tools)

Implementation:
- OAuth 2.0 authentication via REDDIT_CREDENTIALS
- PRAW library for Reddit API integration
- Comprehensive error handling and validation
- Full test coverage (25 tests passing)

Resolves #3595
2026-02-07 18:38:59 +05:30
mishrapravin114 66214384a9 fix: add register_airtable import and fix ruff I001 import order
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-07 17:18:26 +05:30
mishrapravin114 6d6646887c feat(tools): add Airtable bases and records integration
- Add Airtable tool with 5 MCP tools:
  - airtable_list_bases
  - airtable_list_tables
  - airtable_list_records (with filter/sort)
  - airtable_create_record
  - airtable_update_record
- Add AIRTABLE_CREDENTIALS with credentialSpec + credentialStore
- Add AirtableHealthChecker for token validation
- Add README with setup and usage
- Add unit tests (9 tests total)

Fixes #2911

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-07 17:14:46 +05:30
mishrapravin114 6f8db0ed08 style: apply ruff format to calendly and health check files
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-07 17:00:05 +05:30
mishrapravin114 6aaf6836ea fix(calendly): resolve ruff lint errors (UP017, E501)
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-07 16:58:48 +05:30
mishrapravin114 4f2348f50e feat(tools): add Calendly scheduling integration
- Add Calendly tool with 4 MCP tools:
  - calendly_list_event_types
  - calendly_get_availability
  - calendly_get_booking_link
  - calendly_cancel_event
- Add CALENDLY_CREDENTIALS with credentialSpec + credentialStore
- Add CalendlyHealthChecker for token validation
- Add README with setup and usage
- Add unit tests (12 tests total)

Fixes #2930

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-07 16:51:27 +05:30
RichardTang-Aden deb7f2f72a Merge pull request #3814 from Amdev-5/feature/x-twitter-integration
fix(tests): update credential group test for X integration
2026-02-06 09:16:42 -08:00
Amdev-5 d989d9c65a fix(tests): update credential group test for X integration
Add test_x_credentials_share_credential_group to verify all X credentials
share the 'x' credential group. Update test_credential_group_default_empty
to account for X credentials alongside existing Google exceptions.
2026-02-06 22:17:40 +05:30
bryan 4173c606ab Merge feature/x-twitter-final-integration from Amdev-5/hive - X (Twitter) tool with DM support 2026-02-06 08:03:43 -08:00
Amdev-5 a01430d20f Merge verification fixes into PR branch 2026-02-06 16:42:56 +05:30
Amdev-5 2a8f775732 feat(tools): enhance X tool with DM support and robust error handling
- Added `x_send_dm` tool using v2 endpoint (`POST /dm_conversations/with/:id/messages`) for reliable 1:1 messaging.
- Fixed 403 Forbidden payload validation errors by simplifying DM payload structure.
- Enhanced `_handle_response` to verify `x_tool.py` returns raw API error details for 403/400 responses, aiding in permission debugging.
- Updated `demo_x_tools.py` to support standard `.env` variable names (e.g., `X_API_KEY`) and added user lookup for DM testing.
- Added unit tests covering new DM functionality and payload verification in `test_x_tool.py`.
- Audited credential handling: Read-only tools (Search/Mentions) correctly use Bearer Token, while Write tools (Post/Reply/Delete/DM) enforce OAuth 1.0a User Context.

Verified with live API tests (see PR description for logs).
2026-02-06 15:48:20 +05:30
Sapna vishnoi 4a0d9b2855 Merge branch 'main' into feat/redshift-integration 2026-02-05 11:44:09 +05:30
y0sif 92c65d69ea chore: resolve merge conflicts with main 2026-02-05 07:13:36 +02:00
Yosif Soliman 910a8968c4 fix(linear): correct GraphQL variable type for workflow states query 2026-02-05 07:00:28 +02:00
Sapna vishnoi cdb4679c5a Merge branch 'main' into feat/redshift-integration 2026-02-05 00:05:38 +05:30
Sapna.Vishnoi 1a9dce89b4 feat(tools): Add Amazon Redshift integration
- Implement 5 core functions for data warehouse querying
- Add boto3 integration with Redshift Data API
- Security: Read-only SELECT queries by default
- Full credential store support
- 26/26 tests passing (100% coverage)
- Complete documentation with examples
2026-02-04 23:58:35 +05:30
Aneesh cf1e4d7f88 Merge remote-tracking branch 'origin/main' into feature/youtube-transcript 2026-02-04 19:46:52 +05:30
Aneesh f2f0b4fc61 feat(tools): add youtube transcript integration via youtube-transcript-api 2026-02-04 19:24:40 +05:30
y0sif b21dd25181 fix(linear): handle credential decryption errors gracefully, handle mcp tool issue with credentials 2026-02-04 05:21:23 +02:00
y0sif 04a18bcbe5 docs(tools): document Linear integration in README and setup credentials claude skill 2026-02-04 04:05:15 +02:00
y0sif 7f66dd67eb feat(linear): add OAuth setup instructions 2026-02-04 04:03:37 +02:00
y0sif cfa03b89c8 test(tools): add comprehensive Linear tool tests 2026-02-04 03:47:28 +02:00
y0sif 9866d7a22b feat(tools): add Linear project management integration 2026-02-04 03:47:03 +02:00
GastonAQS 331a6e442f feat: add Trello integration tools and API client 2026-02-03 10:32:25 -03:00
Sashank Thapa 1c2295b2b5 Merge branch 'adenhq:main' into feature/twitter-x-mcp-tool 2026-02-03 16:20:45 +05:30
Sashank Thapa fa43ca3785 Merge branch 'adenhq:main' into feature/twitter-x-mcp-tool 2026-01-31 16:26:39 +05:30
kozuedoingregression b4a2c3bd14 ruff formatting and lint fixes 2026-01-31 16:18:16 +05:30
kozuedoingregression 2d4ec4f462 lint fix 2026-01-31 16:14:25 +05:30
kozuedoingregression 1e8b933da0 add X (Twitter) integration tool 2026-01-31 15:49:16 +05:30
Aneesh 48b1e0e038 Docs: clarify agent creation assumptions in Getting Started 2026-01-28 22:49:30 +05:30
515 changed files with 76094 additions and 5859 deletions
@@ -0,0 +1,89 @@
name: Integration Bounty
description: A bounty task for the integration contribution program
title: "[Bounty]: "
labels: []
body:
  - type: markdown
    attributes:
      value: |
        ## Integration Bounty
        This issue is part of the [Integration Bounty Program](../../docs/bounty-program/README.md).
        **Claim this bounty** by commenting below — a maintainer will assign you within 24 hours.
  - type: dropdown
    id: bounty-type
    attributes:
      label: Bounty Type
      options:
        - "Test a Tool (20 pts)"
        - "Write Docs (20 pts)"
        - "Code Contribution (30 pts)"
        - "New Integration (75 pts)"
    validations:
      required: true
  - type: dropdown
    id: difficulty
    attributes:
      label: Difficulty
      options:
        - Easy
        - Medium
        - Hard
    validations:
      required: true
  - type: input
    id: tool-name
    attributes:
      label: Tool Name
      description: The integration this bounty targets (e.g., `airtable`, `salesforce`)
      placeholder: e.g., airtable
    validations:
      required: true
  - type: textarea
    id: description
    attributes:
      label: Description
      description: What needs to be done to complete this bounty.
      placeholder: |
        Describe the specific task, including:
        - What the contributor needs to do
        - Links to relevant files in the repo
        - Any setup requirements (API keys, accounts, etc.)
    validations:
      required: true
  - type: textarea
    id: acceptance-criteria
    attributes:
      label: Acceptance Criteria
      description: What "done" looks like. The PR or report must meet all criteria.
      placeholder: |
        - [ ] Criterion 1
        - [ ] Criterion 2
        - [ ] CI passes
    validations:
      required: true
  - type: textarea
    id: relevant-files
    attributes:
      label: Relevant Files
      description: Links to tool directory, credential spec, health check file, etc.
      placeholder: |
        - Tool: `tools/src/aden_tools/tools/{tool_name}/`
        - Credential spec: `tools/src/aden_tools/credentials/{category}.py`
        - Health checks: `tools/src/aden_tools/credentials/health_check.py`
  - type: textarea
    id: resources
    attributes:
      label: Resources
      description: Links to API docs, examples, or guides that will help the contributor.
      placeholder: |
        - [Building Tools Guide](../../tools/BUILDING_TOOLS.md)
        - [Tool README Template](../../docs/bounty-program/templates/tool-readme-template.md)
        - API docs: https://...
+37
@@ -0,0 +1,37 @@
name: Bounty completed
description: Awards points and notifies Discord when a bounty PR is merged
on:
  pull_request:
    types: [closed]
jobs:
  bounty-notify:
    if: >
      github.event.pull_request.merged == true &&
      contains(join(github.event.pull_request.labels.*.name, ','), 'bounty:')
    runs-on: ubuntu-latest
    timeout-minutes: 5
    permissions:
      contents: read
      pull-requests: read
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
      - name: Setup Bun
        uses: oven-sh/setup-bun@v2
        with:
          bun-version: latest
      - name: Award XP and notify Discord
        run: bun run scripts/bounty-tracker.ts notify
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          GITHUB_REPOSITORY_OWNER: ${{ github.repository_owner }}
          GITHUB_REPOSITORY_NAME: ${{ github.event.repository.name }}
          DISCORD_WEBHOOK_URL: ${{ secrets.DISCORD_BOUNTY_WEBHOOK_URL }}
          LURKR_API_KEY: ${{ secrets.LURKR_API_KEY }}
          LURKR_GUILD_ID: ${{ secrets.LURKR_GUILD_ID }}
          PR_NUMBER: ${{ github.event.pull_request.number }}
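The job's `if:` condition joins every label name into one comma-separated string and then does a substring check for `bounty:`. A minimal Python sketch of that same logic follows, for reasoning about which PRs would trigger the job. The function name and the example label names are illustrative assumptions, not taken from the repository; the lowercase comparison mirrors the fact that `contains()` in GitHub Actions expressions is not case sensitive.

```python
def has_bounty_label(label_names: list[str]) -> bool:
    """Approximate contains(join(labels.*.name, ','), 'bounty:')."""
    # join(...) flattens the label names into a single string.
    joined = ",".join(label_names)
    # contains() on a string is a case-insensitive substring test,
    # so any label that merely contains "bounty:" also matches.
    return "bounty:" in joined.lower()


# Hypothetical label sets, for illustration only.
print(has_bounty_label(["bug", "bounty:docs"]))
print(has_bounty_label(["bug", "enhancement"]))
```

Because this is a substring match on the joined string, a label such as `not-a-bounty:x` would also satisfy the condition; a stricter check would test each label's prefix individually.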
+5 -2
@@ -62,8 +62,11 @@ jobs:
uv run pytest tests/ -v
test-tools:
name: Test Tools
runs-on: ubuntu-latest
name: Test Tools (${{ matrix.os }})
runs-on: ${{ matrix.os }}
strategy:
matrix:
os: [ubuntu-latest, windows-latest]
steps:
- uses: actions/checkout@v4
+40
@@ -0,0 +1,40 @@
name: Weekly bounty leaderboard
description: Posts the integration bounty leaderboard to Discord every Monday
on:
  schedule:
    # Every Monday at 9:00 UTC
    - cron: "0 9 * * 1"
  workflow_dispatch:
    inputs:
      since_date:
        description: "Only count PRs merged after this date (YYYY-MM-DD). Leave empty for all-time."
        required: false
jobs:
  leaderboard:
    runs-on: ubuntu-latest
    timeout-minutes: 5
    permissions:
      contents: read
      pull-requests: read
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
      - name: Setup Bun
        uses: oven-sh/setup-bun@v2
        with:
          bun-version: latest
      - name: Post leaderboard to Discord
        run: bun run scripts/bounty-tracker.ts leaderboard
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          GITHUB_REPOSITORY_OWNER: ${{ github.repository_owner }}
          GITHUB_REPOSITORY_NAME: ${{ github.event.repository.name }}
          DISCORD_WEBHOOK_URL: ${{ secrets.DISCORD_BOUNTY_WEBHOOK_URL }}
          LURKR_API_KEY: ${{ secrets.LURKR_API_KEY }}
          LURKR_GUILD_ID: ${{ secrets.LURKR_GUILD_ID }}
          SINCE_DATE: ${{ github.event.inputs.since_date || '' }}
+2
@@ -70,6 +70,7 @@ exports/*
.agent-builder-sessions/*
.claude/settings.local.json
.claude/skills/ship-it/
.venv
@@ -78,3 +79,4 @@ core/tests/*dumps/*
screenshots/*
.gemini/*
+34
@@ -0,0 +1,34 @@
# Repository Guidelines
Shared agent instructions for this workspace.
## Deprecations
- **TUI is deprecated.** The terminal UI (`hive tui`) is no longer maintained. Use the browser-based interface (`hive open`) instead.
## Coding Agent Notes
-
- When working on a GitHub Issue or PR, print the full URL at the end of the task.
- When answering questions, respond with high-confidence answers only: verify in code; do not guess.
- Do not update dependencies casually. Version bumps, patched dependencies, overrides, or vendored dependency changes require explicit approval.
- Add brief comments for tricky logic. Keep files reasonably small when practical; split or refactor large files instead of growing them indefinitely.
- If shared guardrails are available locally, review them; otherwise follow this repo's guidance.
- Use `uv` for Python execution and package management. Do not use `python` or `python3` directly unless the user explicitly asks for it.
- Prefer `uv run` for scripts and tests, and `uv pip` for package operations.
## Multi-Agent Safety
- Do not create, apply, or drop `git stash` entries unless explicitly requested.
- Do not create, remove, or modify `git worktree` checkouts unless explicitly requested.
- Do not switch branches or check out a different branch unless explicitly requested.
- When the user says `push`, you may `git pull --rebase` to integrate latest changes, but never discard other in-progress work.
- When the user says `commit`, commit only your changes. When the user says `commit all`, commit everything in grouped chunks.
- When you see unrecognized files or unrelated changes, keep going and focus on your scoped changes.
## Change Hygiene
- If staged and unstaged diffs are formatting-only, resolve them without asking.
- If a commit or push was already requested, include formatting-only follow-up changes in that same commit when practical.
- Only stop to ask for confirmation when changes are semantic and may alter behavior.
Symlink
+1
@@ -0,0 +1 @@
AGENTS.md
+13 -1
@@ -20,8 +20,20 @@ check: ## Run all checks without modifying files (CI-safe)
cd core && ruff format --check .
cd tools && ruff format --check .
test: ## Run all tests
test: ## Run all tests (core + tools, excludes live)
cd core && uv run python -m pytest tests/ -v
cd tools && uv run python -m pytest -v
test-tools: ## Run tool tests only (mocked, no credentials needed)
cd tools && uv run python -m pytest -v
test-live: ## Run live integration tests (requires real API credentials)
cd tools && uv run python -m pytest -m live -s -o "addopts=" --log-cli-level=INFO
test-all: ## Run everything including live tests
cd core && uv run python -m pytest tests/ -v
cd tools && uv run python -m pytest -v
cd tools && uv run python -m pytest -m live -s -o "addopts=" --log-cli-level=INFO
install-hooks: ## Install pre-commit hooks
uv pip install pre-commit
+22 -87
@@ -37,11 +37,11 @@
## Overview
Build autonomous, reliable, self-improving AI agents without hardcoding workflows. Define your goal through conversation with a coding agent, and the framework generates a node graph with dynamically created connection code. When things break, the framework captures failure data, evolves the agent through the coding agent, and redeploys. Built-in human-in-the-loop nodes, credential management, and real-time monitoring give you control without sacrificing adaptability.
Build autonomous, reliable, self-improving AI agents without hardcoding workflows. Define your goal through conversation with the Hive coding agent (the queen), and the framework generates a node graph with dynamically created connection code. When things break, the framework captures failure data, evolves the agent through the coding agent, and redeploys. Built-in human-in-the-loop nodes, credential management, and real-time monitoring give you control without sacrificing adaptability.
Visit [adenhq.com](https://adenhq.com) for complete documentation, examples, and guides.
https://github.com/user-attachments/assets/846c0cc7-ffd6-47fa-b4b7-495494857a55
[![Hive Demo](https://img.youtube.com/vi/XDOG9fOaLjU/maxresdefault.jpg)](https://www.youtube.com/watch?v=XDOG9fOaLjU)
## Who Is Hive For?
@@ -50,7 +50,7 @@ Hive is designed for developers and teams who want to build **production-grade A
Hive is a good fit if you:
- Want AI agents that **execute real business processes**, not demos
- Prefer **goal-driven development** over hardcoded workflows
- Need **fast or high-volume agent execution** over open workflows
- Need **self-healing and adaptive agents** that improve over time
- Require **human-in-the-loop control**, observability, and cost limits
- Plan to run agents in **production environments**
@@ -81,7 +81,8 @@ Use Hive when you need:
### Prerequisites
- Python 3.11+ for agent development
- Claude Code, Codex CLI, or Cursor for utilizing agent skills
- An LLM provider that powers the agents
- **ripgrep (optional, recommended on Windows):** The `search_files` tool uses ripgrep for faster file search. If not installed, a Python fallback is used. On Windows: `winget install BurntSushi.ripgrep` or `scoop install ripgrep`
> **Note for Windows Users:** It is strongly recommended to use **WSL (Windows Subsystem for Linux)** or **Git Bash** to run this framework. Some core automation scripts may not execute correctly in standard Command Prompt or PowerShell.
@@ -110,71 +111,38 @@ This sets up:
- **LLM provider** - Interactive default model configuration
- All required Python dependencies with `uv`
- Finally, it opens the Hive interface in your browser
> **Tip:** To reopen the dashboard later, run `hive open` from the project directory.
<img width="2500" height="1214" alt="home-screen" src="https://github.com/user-attachments/assets/134d897f-5e75-4874-b00b-e0505f6b45c4" />
### Build Your First Agent
```bash
# Build an agent using Claude Code
claude> /hive
Type the agent you want to build in the home input box
# Test your agent
claude> /hive-debugger
<img width="2500" height="1214" alt="Image" src="https://github.com/user-attachments/assets/1ce19141-a78b-46f5-8d64-dbf987e048f4" />
# (at separate terminal) Launch the interactive dashboard
hive tui
### Use Template Agents
# Or run directly
hive run exports/your_agent_name --input '{"key": "value"}'
```
Click "Try a sample agent" and browse the templates. You can run a template directly or build your own version on top of it.
## Coding Agent Support
### Run Agents
### Codex CLI
You can now run an agent by selecting it (either an existing agent or an example agent). Click the Run button in the top left, or ask the queen agent to run it for you.
Hive includes native support for [OpenAI Codex CLI](https://github.com/openai/codex) (v0.101.0+).
1. **Config:** `.codex/config.toml` with `agent-builder` MCP server (tracked in git)
2. **Skills:** `.agents/skills/` symlinks to Hive skills (tracked in git)
3. **Launch:** Run `codex` in the repo root, then type `use hive`
Example:
```
codex> use hive
```
### Opencode
Hive includes native support for [Opencode](https://github.com/opencode-ai/opencode).
1. **Setup:** Run the quickstart script
2. **Launch:** Open Opencode in the project root.
3. **Activate:** Type `/hive` in the chat to switch to the Hive Agent.
4. **Verify:** Ask the agent _"List your tools"_ to confirm the connection.
The agent has access to all Hive skills and can scaffold agents, add tools, and debug workflows directly from the chat.
**[📖 Complete Setup Guide](docs/environment-setup.md)** - Detailed instructions for agent development
### Antigravity IDE Support
Skills and MCP servers are also available in [Antigravity IDE](https://antigravity.google/) (Google's AI-powered IDE). **Easiest:** open a terminal in the hive repo folder and run (use `./` — the script is inside the repo):
```bash
./scripts/setup-antigravity-mcp.sh
```
**Important:** Always restart/refresh Antigravity IDE after running the setup script—MCP servers only load on startup. After restart, **agent-builder** and **tools** MCP servers should connect. Skills are under `.agent/skills/` (symlinks to `.claude/skills/`). See [docs/antigravity-setup.md](docs/antigravity-setup.md) for manual setup and troubleshooting.
<img width="2500" height="1214" alt="Image" src="https://github.com/user-attachments/assets/71c38206-2ad5-49aa-bde8-6698d0bc55f5" />
## Features
- **[Goal-Driven Development](docs/key_concepts/goals_outcome.md)** - Define objectives in natural language; the coding agent generates the agent graph and connection code to achieve them
- **Browser-Use** - Control the browser on your computer to achieve hard tasks
- **Parallel Execution** - Execute the generated graph in parallel, so multiple agents can complete jobs for you at the same time
- **[Goal-Driven Generation](docs/key_concepts/goals_outcome.md)** - Define objectives in natural language; the coding agent generates the agent graph and connection code to achieve them
- **[Adaptiveness](docs/key_concepts/evolution.md)** - Framework captures failures, calibrates according to the objectives, and evolves the agent graph
- **[Dynamic Node Connections](docs/key_concepts/graph.md)** - No predefined edges; connection code is generated by any capable LLM based on your goals
- **SDK-Wrapped Nodes** - Every node gets shared memory, local RLM memory, monitoring, tools, and LLM access out of the box
- **[Human-in-the-Loop](docs/key_concepts/graph.md#human-in-the-loop)** - Intervention nodes that pause execution for human input with configurable timeouts and escalation
- **Real-time Observability** - WebSocket streaming for live monitoring of agent execution, decisions, and node-to-node communication
- **Interactive TUI Dashboard** - Terminal-based dashboard with live graph view, event log, and chat interface for agent interaction
- **Cost & Budget Control** - Set spending limits, throttles, and automatic model degradation policies
- **Production-Ready** - Self-hostable, built for scale and reliability
## Integration
@@ -240,35 +208,10 @@ flowchart LR
4. **Control Plane Monitors** → Real-time metrics, budget enforcement, policy management
5. **[Adaptiveness](docs/key_concepts/evolution.md)** → On failure, the system evolves the graph and redeploys automatically
## Run Agents
The `hive` CLI is the primary interface for running agents.
```bash
# Browse and run agents interactively (Recommended)
hive tui
# Run a specific agent directly
hive run exports/my_agent --input '{"task": "Your input here"}'
# Run a specific agent with the TUI dashboard
hive run exports/my_agent --tui
# Interactive REPL
hive shell
```
The TUI scans both `exports/` and `examples/templates/` for available agents.
> **Using Python directly (alternative):** You can also run agents with `PYTHONPATH=exports uv run python -m agent_name run --input '{...}'`
See [environment-setup.md](docs/environment-setup.md) for complete setup instructions.
## Documentation
- **[Developer Guide](docs/developer-guide.md)** - Comprehensive guide for developers
- [Getting Started](docs/getting-started.md) - Quick setup instructions
- [TUI Guide](docs/tui-selection-guide.md) - Interactive dashboard usage
- [Configuration Guide](docs/configuration.md) - All configuration options
- [Architecture Overview](docs/architecture/README.md) - System design and structure
@@ -435,7 +378,7 @@ This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENS
**Q: What LLM providers does Hive support?**
Hive supports 100+ LLM providers through LiteLLM integration, including OpenAI (GPT-4, GPT-4o), Anthropic (Claude models), Google Gemini, DeepSeek, Mistral, Groq, and many more. Simply set the appropriate API key environment variable and specify the model name.
Hive supports 100+ LLM providers through LiteLLM integration, including OpenAI (GPT-4, GPT-4o), Anthropic (Claude models), Google Gemini, DeepSeek, Mistral, Groq, and many more. Simply set the appropriate API key environment variable and specify the model name. We recommend using Claude, GLM and Gemini as they have the best performance.
**Q: Can I use Hive with local AI models like Ollama?**
@@ -477,14 +420,6 @@ Visit [docs.adenhq.com](https://docs.adenhq.com/) for complete guides, API refer
Contributions are welcome! Fork the repository, create your feature branch, implement your changes, and submit a pull request. See [CONTRIBUTING.md](CONTRIBUTING.md) for detailed guidelines.
**Q: When will my team start seeing results from Aden's adaptive agents?**
Aden's adaptation loop begins working from the first execution. When an agent fails, the framework captures the failure data, helping developers evolve the agent graph through the coding agent. How quickly this translates to measurable results depends on the complexity of your use case, the quality of your goal definitions, and the volume of executions generating feedback.
**Q: How does Hive compare to other agent frameworks?**
Hive focuses on generating agents that run real business processes, rather than generic agents. This vision emphasizes outcome-driven design, adaptability, and an easy-to-use set of tools and integrations.
---
<p align="center">
+31
@@ -0,0 +1,31 @@
perf: reduce subprocess spawning in quickstart scripts (#4427)
## Problem
Windows process creation (CreateProcess) is 10-100x slower than Linux fork/exec.
The quickstart scripts were spawning 4+ separate `uv run python -c "import X"`
processes to verify imports, adding ~600ms overhead on Windows.
## Solution
Consolidated all import checks into a single batch script that checks multiple
modules in one subprocess call, reducing spawn overhead by ~75%.
## Changes
- **New**: `scripts/check_requirements.py` - Batched import checker
- **New**: `scripts/test_check_requirements.py` - Test suite
- **New**: `scripts/benchmark_quickstart.ps1` - Performance benchmark tool
- **Modified**: `quickstart.ps1` - Updated import verification (2 sections)
- **Modified**: `quickstart.sh` - Updated import verification
## Performance Impact
**Benchmark results on Windows:**
- Before: ~19.8 seconds for import checks
- After: ~4.9 seconds for import checks
- **Improvement: 14.9 seconds saved (75.2% faster)**
## Testing
- ✅ All functional tests pass (`scripts/test_check_requirements.py`)
- ✅ Quickstart scripts work correctly on Windows
- ✅ Error handling verified (invalid imports reported correctly)
- ✅ Performance benchmark confirms 75%+ improvement
Fixes #4427
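The batching idea described above can be sketched as follows: instead of one interpreter launch per module, a single process imports every module and reports the failures together. This is a minimal illustration under stated assumptions, not the contents of `scripts/check_requirements.py`; the names `check_imports` and `main` are hypothetical.

```python
import importlib


def check_imports(modules):
    """Try to import each module in this one process.

    Returns a dict mapping module name to None on success,
    or to a short error string on failure.
    """
    results = {}
    for name in modules:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:  # report any import-time failure
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


def main(argv):
    """Return 0 if every module imports, 1 otherwise (shell-style exit code)."""
    failures = {m: err for m, err in check_imports(argv).items() if err}
    for module, err in failures.items():
        print(f"FAILED {module}: {err}")
    return 1 if failures else 0
```

A quickstart script could then run one `uv run python` invocation that passes all module names at once, paying the process-creation cost a single time.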
+27
@@ -0,0 +1,27 @@
# Identity mapping: GitHub username -> Discord ID
#
# This file links GitHub accounts to Discord accounts for the
# Integration Bounty Program. When a bounty PR is merged, the
# GitHub Action uses this file to ping the contributor on Discord.
#
# HOW TO ADD YOURSELF:
# 1. Fork this repo
# 2. Add your entry below (keep alphabetical order)
# 3. Submit a PR with title: "docs: link @your-github to Discord"
#
# To find your Discord ID:
# 1. Open Discord Settings > Advanced > Enable Developer Mode
# 2. Right-click your name > Copy User ID
#
# Format:
#   - github: your-github-username
#     discord: "your-discord-id"  # quotes required (it's a number)
#     name: Your Display Name     # optional
contributors:
  # - github: example-user
  #   discord: "123456789012345678"
  #   name: Example User
  - github: TimothyZhang7
    discord: "408460790061072384"
    name: Timothy@Aden
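Once this mapping is parsed, resolving a PR author to a Discord ID is a simple lookup. The sketch below assumes the YAML has already been loaded into a list of dicts; the function name `discord_id_for` is hypothetical, and the actual tracker is a TypeScript script whose implementation is not shown here. The comparison is case-insensitive because GitHub usernames are.

```python
def discord_id_for(contributors, github_username):
    """Return the Discord ID string for a GitHub user, or None if unmapped."""
    wanted = github_username.lower()
    for entry in contributors:
        if entry.get("github", "").lower() == wanted:
            return entry.get("discord")
    return None


# Parsed form of the `contributors:` list above.
contributors = [
    {
        "github": "TimothyZhang7",
        "discord": "408460790061072384",
        "name": "Timothy@Aden",
    },
]

print(discord_id_for(contributors, "timothyzhang7"))
```

Keeping the ID as a string, as the quoting rule in the file requires, avoids precision problems when such large numbers pass through JSON or JavaScript number types.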
+9 -9
@@ -64,7 +64,7 @@ To use the agent builder with Claude Desktop or other MCP clients, add this to y
"agent-builder": {
"command": "python",
"args": ["-m", "framework.mcp.agent_builder_server"],
"cwd": "/path/to/goal-agent"
"cwd": "/path/to/hive/core"
}
}
}
@@ -85,14 +85,14 @@ The MCP server provides tools for:
Run an LLM-powered calculator:
```bash
# Single calculation
uv run python -m framework calculate "2 + 3 * 4"
# Run an exported agent
uv run python -m framework run exports/calculator --input '{"expression": "2 + 3 * 4"}'
# Interactive mode
uv run python -m framework interactive
# Interactive shell session
uv run python -m framework shell exports/calculator
# Analyze runs with Builder
uv run python -m framework analyze calculator
# Show agent info
uv run python -m framework info exports/calculator
```
### Using the Runtime
@@ -141,8 +141,8 @@ uv run python -m framework test-run <agent_path> --goal <goal_id> --parallel 4
# Debug failed tests
uv run python -m framework test-debug <agent_path> <test_name>
# List tests for a goal
uv run python -m framework test-list <goal_id>
# List tests for an agent
uv run python -m framework test-list <agent_path>
```
For detailed testing workflows, see the [hive-test skill](../.claude/skills/hive-test/SKILL.md).
+4 -2
@@ -15,6 +15,7 @@ import base64
import hashlib
import http.server
import json
import os
import platform
import secrets
import subprocess
@@ -150,8 +151,9 @@ def save_credentials(token_data: dict, account_id: str) -> None:
if "id_token" in token_data:
auth_data["tokens"]["id_token"] = token_data["id_token"]
CODEX_AUTH_FILE.parent.mkdir(parents=True, exist_ok=True)
with open(CODEX_AUTH_FILE, "w") as f:
CODEX_AUTH_FILE.parent.mkdir(parents=True, exist_ok=True, mode=0o700)
fd = os.open(CODEX_AUTH_FILE, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
with os.fdopen(fd, "w") as f:
json.dump(auth_data, f, indent=2)
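The change above replaces a plain `open(..., "w")` with `os.open` so the credentials file is created with owner-only permissions from the start, rather than being created with default permissions and tightened afterwards. A self-contained sketch of the same pattern is shown below; the helper name `write_private_json` is an assumption for illustration.

```python
import json
import os
from pathlib import Path


def write_private_json(path: Path, data: dict) -> None:
    """Write JSON so that only the owner can read or write the file (POSIX)."""
    # mode applies to the final directory only; missing parents get default
    # permissions, and the umask can still remove bits.
    path.parent.mkdir(parents=True, exist_ok=True, mode=0o700)
    # 0o600 is applied only when the file is created; an existing file
    # keeps whatever mode it already had.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        json.dump(data, f, indent=2)
```

On Windows the mode bits are largely ignored, which is consistent with the commit earlier in this list that skips POSIX permission tests on that platform.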
+1 -1
@@ -69,7 +69,7 @@ goal = Goal(
id="dynamic-tool-discovery",
description=(
"Always discover available tools dynamically via "
"discover_mcp_tools before referencing tools in agent designs"
"list_agent_tools before referencing tools in agent designs"
),
constraint_type="hard",
category="correctness",
+2 -2
@@ -10,7 +10,7 @@ def _load_preferred_model() -> str:
config_path = Path.home() / ".hive" / "configuration.json"
if config_path.exists():
try:
with open(config_path) as f:
with open(config_path, encoding="utf-8") as f:
config = json.load(f)
llm = config.get("llm", {})
if llm.get("provider") and llm.get("model"):
@@ -24,7 +24,7 @@ def _load_preferred_model() -> str:
class RuntimeConfig:
model: str = field(default_factory=_load_preferred_model)
temperature: float = 0.7
max_tokens: int = 40000
max_tokens: int = 8000
api_key: str | None = None
api_base: str | None = None
+288 -134
@@ -7,19 +7,38 @@ from framework.graph import NodeSpec
# Load reference docs at import time so they're always in the system prompt.
# No voluntary read_file() calls needed — the LLM gets everything upfront.
_ref_dir = Path(__file__).parent.parent / "reference"
_framework_guide = (_ref_dir / "framework_guide.md").read_text()
_file_templates = (_ref_dir / "file_templates.md").read_text()
_anti_patterns = (_ref_dir / "anti_patterns.md").read_text()
_framework_guide = (_ref_dir / "framework_guide.md").read_text(encoding="utf-8")
_file_templates = (_ref_dir / "file_templates.md").read_text(encoding="utf-8")
_anti_patterns = (_ref_dir / "anti_patterns.md").read_text(encoding="utf-8")
_gcu_guide_path = _ref_dir / "gcu_guide.md"
_gcu_guide = _gcu_guide_path.read_text(encoding="utf-8") if _gcu_guide_path.exists() else ""
def _is_gcu_enabled() -> bool:
    try:
        from framework.config import get_gcu_enabled
        return get_gcu_enabled()
    except Exception:
        return False

def _build_appendices() -> str:
    parts = (
        "\n\n# Appendix: Framework Reference\n\n"
        + _framework_guide
        + "\n\n# Appendix: File Templates\n\n"
        + _file_templates
        + "\n\n# Appendix: Anti-Patterns\n\n"
        + _anti_patterns
    )
    if _is_gcu_enabled() and _gcu_guide:
        parts += "\n\n# Appendix: GCU Browser Automation Guide\n\n" + _gcu_guide
    return parts
# Shared appendices — appended to every coding node's system prompt.
_appendices = (
"\n\n# Appendix: Framework Reference\n\n"
+ _framework_guide
+ "\n\n# Appendix: File Templates\n\n"
+ _file_templates
+ "\n\n# Appendix: Anti-Patterns\n\n"
+ _anti_patterns
)
_appendices = _build_appendices()
# Tools available to both coder (worker) and queen.
_SHARED_TOOLS = [
@@ -27,23 +46,62 @@ _SHARED_TOOLS = [
"read_file",
"write_file",
"edit_file",
"hashline_edit",
"list_directory",
"search_files",
"run_command",
"undo_changes",
# Meta-agent
"list_agent_tools",
"discover_mcp_tools",
"validate_agent_tools",
"list_agents",
"list_agent_sessions",
"get_agent_session_state",
"get_agent_session_memory",
"list_agent_checkpoints",
"get_agent_checkpoint",
"run_agent_tests",
]
# Queen mode-specific tool sets.
# Building mode: full coding + agent construction tools.
_QUEEN_BUILDING_TOOLS = _SHARED_TOOLS + [
"load_built_agent",
"list_credentials",
]
# Staging mode: agent loaded but not yet running — inspect, configure, launch.
_QUEEN_STAGING_TOOLS = [
# Read-only (inspect agent files, logs)
"read_file",
"list_directory",
"search_files",
"run_command",
# Agent inspection
"list_credentials",
"get_worker_status",
# Launch or go back
"run_agent_with_input",
"stop_worker_and_edit",
]
# Running mode: worker is executing — monitor and control.
_QUEEN_RUNNING_TOOLS = [
# Read-only coding (for inspecting logs, files)
"read_file",
"list_directory",
"search_files",
"run_command",
# Credentials
"list_credentials",
# Worker lifecycle
"stop_worker",
"stop_worker_and_edit",
"get_worker_status",
"inject_worker_message",
# Monitoring
"get_worker_health_summary",
"notify_operator",
]
# ---------------------------------------------------------------------------
# Shared agent-building knowledge: core mandates, tool docs, meta-agent
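The three queen tool lists above amount to a mode-to-tools table: building gets the full coding set, while staging and running are restricted to read-only inspection plus lifecycle controls. One way such a table could be consulted is sketched below. The dict `QUEEN_TOOLS_BY_MODE` and the function `tools_for_mode` are hypothetical names, the building list is abbreviated, and how the framework actually selects tools per mode is not shown in this diff.

```python
# Tool names copied from the lists above; "building" is abbreviated.
QUEEN_TOOLS_BY_MODE = {
    "building": [
        "read_file", "write_file", "edit_file", "hashline_edit",
        "list_directory", "search_files", "run_command",
        "load_built_agent", "list_credentials",
    ],
    "staging": [
        "read_file", "list_directory", "search_files", "run_command",
        "list_credentials", "get_worker_status",
        "run_agent_with_input", "stop_worker_and_edit",
    ],
    "running": [
        "read_file", "list_directory", "search_files", "run_command",
        "list_credentials", "stop_worker", "stop_worker_and_edit",
        "get_worker_status", "inject_worker_message",
        "get_worker_health_summary", "notify_operator",
    ],
}


def tools_for_mode(mode: str) -> list[str]:
    """Return a copy of the tool names allowed in the given queen mode."""
    try:
        return list(QUEEN_TOOLS_BY_MODE[mode])
    except KeyError:
        raise ValueError(f"unknown queen mode: {mode!r}") from None
```

Note that `write_file` and `edit_file` appear only in building mode, so a staged or running agent's files cannot be edited without first going back through `stop_worker_and_edit`.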
@@ -72,26 +130,35 @@ errors yourself. Don't declare success until validation passes.
# Tools
## Paths (MANDATORY)
**Always use RELATIVE paths**
(e.g. `exports/agent_name/config.py`, `exports/agent_name/nodes/__init__.py`).
**Never use absolute paths** like `/mnt/data/...` or `/workspace/...`; they fail.
The project root is implicit.
## File I/O
- read_file(path, offset?, limit?) read with line numbers
- read_file(path, offset?, limit?, hashline?) read with line numbers; \
hashline=True for N:hhhh|content anchors (use with hashline_edit)
- write_file(path, content) create/overwrite, auto-mkdir
- edit_file(path, old_text, new_text, replace_all?) fuzzy-match edit
- hashline_edit(path, edits, auto_cleanup?, encoding?) anchor-based \
editing using N:hhhh refs from read_file(hashline=True). Ops: set_line, \
replace_lines, insert_after, insert_before, replace, append
- list_directory(path, recursive?) list contents
- search_files(pattern, path?, include?) regex search
- search_files(pattern, path?, include?, hashline?) regex search; \
hashline=True for anchors in results
- run_command(command, cwd?, timeout?) shell execution
- undo_changes(path?) restore from git snapshot
## Meta-Agent
- list_agent_tools(server_config_path?) list all tool names available \
for agent building, grouped by category. Call this FIRST before designing.
- discover_mcp_tools(server_config_path?) connect to MCP servers \
and list all available tools with full schemas. Use for parameter details.
- list_agent_tools(server_config_path?, output_schema?, group?) discover \
available tools grouped by category. output_schema: "simple" (default) or \
"full" (includes input_schema). group: "all" (default) or a prefix like \
"gmail". Call FIRST before designing.
- validate_agent_tools(agent_path) validate that all tools declared \
in an agent's nodes actually exist. Call after building.
- list_agents() list all agent packages in exports/ with session counts
- list_agent_sessions(agent_name, status?, limit?) list sessions
- get_agent_session_state(agent_name, session_id) full session state
- get_agent_session_memory(agent_name, session_id, key?) memory data
- list_agent_checkpoints(agent_name, session_id) list checkpoints
- get_agent_checkpoint(agent_name, session_id, checkpoint_id?) load checkpoint
- run_agent_tests(agent_name, test_types?, fail_fast?) run pytest with parsing
@@ -102,15 +169,14 @@ You are not just a file writer. You have deep integration with the \
Hive framework:
## Tool Discovery (MANDATORY before designing)
Before designing any agent, run list_agent_tools() to get all \
available tool names. ONLY use tools from this list in your node \
definitions. NEVER guess or fabricate tool names from memory.
Before designing any agent, run list_agent_tools() to discover all \
available tools. ONLY use tools from this list in your node definitions. \
NEVER guess or fabricate tool names from memory.
For full parameter schemas when you need details:
discover_mcp_tools()
To check a specific agent's configured tools:
list_agent_tools("exports/{agent_name}/mcp_servers.json")
list_agent_tools() # names + descriptions
list_agent_tools(output_schema="full") # include input_schema
list_agent_tools(group="gmail") # only gmail_* tools
list_agent_tools("exports/{agent_name}/mcp_servers.json") # specific agent
## Agent Awareness
Run list_agents() to see what agents already exist. Read their code \
@@ -127,8 +193,7 @@ After writing agent code, validate structurally AND run tests:
## Debugging Built Agents
When a user says "my agent is failing" or "debug this agent":
1. list_agent_sessions("{agent_name}") find the session
2. get_agent_session_state("{agent_name}", "{session_id}") see status
3. get_agent_session_memory("{agent_name}", "{session_id}") inspect data
2. get_worker_status
4. list_agent_checkpoints / get_agent_checkpoint trace execution
# Agent Building Workflow
@@ -227,11 +292,12 @@ explicitly requests a one-shot/batch agent. Forever-alive agents loop \
continuously the user exits by closing the TUI. This is the standard \
pattern for all interactive agents.
### Node Count Rules (HARD LIMITS)
### Node Design Rules
**2-4 nodes** for all agents. Never exceed 4 unless the user explicitly \
requests more. Each node boundary serializes outputs to shared memory \
and DESTROYS all in-context information (tool results, reasoning, history).
Each node boundary serializes outputs to shared memory \
and DESTROYS all in-context information (tool results, reasoning, history). \
Use as many nodes as the use case requires, but don't create nodes without \
tools merge them into nodes that do real work.
**MERGE nodes when:**
- Node has NO tools (pure LLM reasoning) merge into predecessor/successor
@@ -245,10 +311,11 @@ and DESTROYS all in-context information (tool results, reasoning, history).
- Fundamentally different tool sets
- Fan-out parallelism (parallel branches MUST be separate)
**Typical patterns:**
- 2 nodes: `interact (client-facing) → process (autonomous) → interact`
- 3 nodes: `intake (CF) → process (auto) → review (CF) → intake`
**Typical patterns (queen manages intake; NO client-facing intake node):**
- 2 nodes: `process (autonomous) → review (client-facing) → process`
- 1 node: `process (autonomous)`: simplest; queen handles all interaction
- WRONG: 7 nodes where half have no tools and just do LLM reasoning
- WRONG: Intake node that asks the user for requirements; the queen does intake
Read reference agents before designing:
list_agents()
@@ -261,20 +328,27 @@ use box-drawing characters and clear flow arrows:
```
intake (client-facing)
tools: set_output
on_success
process (autonomous)
in: user_request
tools: web_search,
save_data
on_success
back to intake
review (client-facing)
tools: set_output
on_success
back to process
```
The queen owns intake: she gathers user requirements, then calls \
`run_agent_with_input(task)` with a structured task description. \
When building the agent, design the entry node's `input_keys` to \
match what the queen will provide at run time. No client-facing \
intake node in the worker.
Follow the graph with a brief summary of each node's purpose. \
Get user approval before implementing.
@@ -337,8 +411,9 @@ from .agent import (
```
**entry_points**: `{"start": "first-node-id"}`
For agents with multiple entry points (e.g. a reminder trigger), \
add them: `{"start": "intake", "reminder": "reminder"}`
The first node should be an autonomous processing node (NOT a \
client-facing intake). For agents with multiple entry points, \
add them: `{"start": "process", "reminder": "check"}`
**conversation_mode**: ONLY two valid values:
- `"continuous"`: recommended for interactive agents (context carries \
@@ -372,7 +447,8 @@ NO "mcpServers" wrapper. cwd "../../tools". command "uv".
**Storage**: `Path.home() / ".hive" / "agents" / "{name}"`
**Client-facing system prompts**: STEP 1/STEP 2 pattern:
**Client-facing system prompts** (review/approval nodes only, NOT intake): \
STEP 1/STEP 2 pattern:
```
STEP 1 - Present to user (text only, NO tool calls):
[instructions]
@@ -380,6 +456,9 @@ STEP 1 — Present to user (text only, NO tool calls):
STEP 2 - After user responds, call set_output:
[set_output calls]
```
The queen manages intake. Workers should NOT have a client-facing node \
that asks for requirements. Use client_facing=True only for review or \
approval checkpoints mid-execution.
**Autonomous system prompts**: set_output in a SEPARATE turn.
@@ -389,9 +468,15 @@ If list_agent_tools() shows these don't exist, use alternatives \
(e.g. save_data/load_data for data persistence).
**Node rules**:
- **2-4 nodes MAX.** Never exceed 4. Merge thin nodes aggressively.
- **NO intake nodes.** The queen owns intake. She defines the entry \
node's input_keys at build time and fills them via \
`run_agent_with_input(task)` at run time.
- Don't abuse nodes without tools — merge them into a node that does work.
- A node with 0 tools is NOT a real node; merge it.
- node_type always "event_loop"
- node_type "event_loop" for all regular graph nodes. Use "gcu" ONLY for
browser automation subagents (see GCU appendix). GCU nodes MUST be in a
parent node's sub_agents list, NEVER connected via edges, and NEVER used
as entry/terminal nodes.
- max_node_visits default is 0 (unbounded), which is correct for forever-alive. \
Only set >0 in one-shot agents with bounded feedback loops.
- Feedback inputs: nullable_output_keys
@@ -520,45 +605,89 @@ start_agent("{name}") # triggers default entry point
_queen_tools_docs = """
## Worker Lifecycle
- start_worker(task): Start the worker with a task description. The \
worker runs autonomously until it finishes or asks the user a question.
- stop_worker(): Cancel the worker's current execution.
- get_worker_status(): Check if the worker is idle, running, or waiting \
for user input. Returns execution details.
- inject_worker_message(content): Send a message to the running worker. \
Use this to relay user instructions or concerns.
## Operating Modes
## Monitoring
- get_worker_health_summary(): Read the latest health data from the judge.
- notify_operator(ticket_id, analysis, urgency): Alert the user about a \
critical issue. Use sparingly.
You operate in one of three modes. Your available tools change based on the \
mode. The system notifies you when a mode change occurs.
## Agent Loading
- load_built_agent(agent_path): Load a newly built agent as the worker in \
this session. If a worker is already loaded, it is automatically unloaded \
first. Call after building and validating an agent to make it available \
immediately.
### BUILDING mode (default)
You have full coding tools for building and modifying agents:
- File I/O: read_file, write_file, edit_file, list_directory, search_files, \
run_command, undo_changes
- Meta-agent: list_agent_tools, validate_agent_tools, \
list_agents, list_agent_sessions, \
list_agent_checkpoints, get_agent_checkpoint, run_agent_tests
- load_built_agent(agent_path): Load the agent and switch to STAGING mode
- list_credentials(credential_id?): List authorized credentials
When you finish building an agent, call load_built_agent(path) to stage it.
### STAGING mode (agent loaded, not yet running)
The agent is loaded and ready to run. You can inspect it and launch it:
- Read-only: read_file, list_directory, search_files, run_command
- list_credentials(credential_id?): Verify credentials are configured
- get_worker_status(): Check the loaded worker
- run_agent_with_input(task): Start the worker and switch to RUNNING mode
- stop_worker_and_edit(): Go back to BUILDING mode
In STAGING mode you do NOT have write tools. If you need to modify the agent, \
call stop_worker_and_edit() to go back to BUILDING mode.
### RUNNING mode (worker is executing)
The worker is running. You have monitoring and lifecycle tools:
- Read-only: read_file, list_directory, search_files, run_command
- get_worker_status(): Check worker status (idle, running, waiting)
- inject_worker_message(content): Send a message to the running worker
- get_worker_health_summary(): Read the latest health data
- notify_operator(ticket_id, analysis, urgency): Alert the user (use sparingly)
- stop_worker(): Stop the worker and return to STAGING mode, then ask the user what to do next
- stop_worker_and_edit(): Stop the worker and switch back to BUILDING mode
In RUNNING mode you do NOT have write tools or agent construction tools. \
If you need to modify the agent, call stop_worker_and_edit() to switch back \
to BUILDING mode. To stop the worker and ask the user what to do next, call \
stop_worker() to return to STAGING mode.
### Mode transitions
- load_built_agent(path) → switches to STAGING mode
- run_agent_with_input(task) → starts worker, switches to RUNNING mode
- stop_worker() → stops worker, switches to STAGING mode (ask user: re-run or edit?)
- stop_worker_and_edit() → stops worker (if running), switches to BUILDING mode
"""
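The mode transitions listed in the prompt above can be read as a small state machine. The following is a minimal sketch, assuming a hypothetical `QueenMode` enum and `next_mode` helper; neither name comes from the Hive codebase, and only the tool names and the transitions themselves are taken from the prompt text.

```python
from enum import Enum


class QueenMode(Enum):
    """Hypothetical enum for the three modes named in the prompt."""

    BUILDING = "building"
    STAGING = "staging"
    RUNNING = "running"


# (current mode, tool called) -> next mode, following the "Mode transitions" list.
# Pairs that are not listed are treated as invalid here, which is an assumption.
TRANSITIONS = {
    (QueenMode.BUILDING, "load_built_agent"): QueenMode.STAGING,
    (QueenMode.STAGING, "run_agent_with_input"): QueenMode.RUNNING,
    (QueenMode.STAGING, "stop_worker_and_edit"): QueenMode.BUILDING,
    (QueenMode.RUNNING, "stop_worker"): QueenMode.STAGING,
    (QueenMode.RUNNING, "stop_worker_and_edit"): QueenMode.BUILDING,
}


def next_mode(current: QueenMode, tool: str) -> QueenMode:
    """Return the mode after calling `tool`, or raise if no transition is listed."""
    try:
        return TRANSITIONS[(current, tool)]
    except KeyError:
        raise ValueError(
            f"{tool} does not change mode from {current.name}"
        ) from None
```

Keeping the transitions in one table makes it easy to compare the prompt text against whatever the runtime actually enforces.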
_queen_behavior = """
# Behavior
## CRITICAL RULE — ask_user tool
Every response that ends with a question, a prompt, or expects user \
input MUST finish with a call to ask_user(prompt, options). This is \
NON-NEGOTIABLE. The system CANNOT detect that you are waiting for \
input unless you call ask_user. You MUST call ask_user as the LAST \
action in your response.
NEVER end a response with a question in text without calling ask_user. \
NEVER rely on the user seeing your text and replying; call ask_user.
Always provide 2-4 short options that cover the most likely answers. \
The user can always type a custom response.
Examples:
- ask_user("What do you need?",
["Build a new agent", "Run the loaded worker", "Help with code"])
- ask_user("Which pattern?",
["Simple 2-node", "Rich with feedback", "Custom"])
- ask_user("Ready to proceed?",
["Yes, go ahead", "Let me change something"])
## Greeting and identity
When the user greets you ("hi", "hello") or asks what you can do / \
what you are, respond concisely. DO NOT list internal processes \
(validation steps, AgentRunner.load, tool discovery). Focus on \
user-facing capabilities:
1. Direct capabilities: file operations, shell commands, coding, \
agent building & debugging.
2. Delegation: describe what the loaded worker does in one sentence \
(read the Worker Profile at the end of this prompt). If no worker \
is loaded, say so.
3. End with a short prompt: "What do you need?"
Keep it under 10 lines. No bullet-point dumps of every tool you have.
When the user greets you or asks what you can do, respond concisely \
(under 10 lines). DO NOT list internal processes. Focus on:
1. Direct capabilities: coding, agent building & debugging.
2. What the loaded worker does (one sentence from Worker Profile). \
If no worker is loaded, say so.
3. THEN call ask_user to prompt them; do NOT just write text.
## Direct coding
You can do any coding task directly: reading files, writing code, running \
@@ -569,7 +698,8 @@ The worker is a specialized agent (see Worker Profile at the end of this \
prompt). It can ONLY do what its goal and tools allow.
**Decision rule (read the Worker Profile first):**
- The user's request directly matches the worker's goal → start_worker(task)
- The user's request directly matches the worker's goal → use \
run_agent_with_input(task) (if in staging) or load then run (if in building)
- Anything else → do it yourself. Do NOT reframe user requests into \
subtasks to justify delegation.
- Building, modifying, or configuring agents is ALWAYS your job. Never \
@@ -577,30 +707,72 @@ delegate agent construction to the worker, even as a "research" subtask.
## When the user says "run", "execute", or "start" (without specifics)
The loaded worker is described in the Worker Profile below. Ask what \
task or topic they want; do NOT call list_agents() or list directories. \
The worker is already loaded. Just ask for the input the worker needs \
(e.g., a research topic, a target domain, a job description).
The loaded worker is described in the Worker Profile below. You MUST \
ask the user what task or input they want using ask_user; do NOT \
invent a task, do NOT call list_agents() or list directories. \
The worker is already loaded. Just ask for the specific input the \
worker needs (e.g., a research topic, a target domain, a job description). \
NEVER call run_agent_with_input until the user has provided their input.
If NO worker is loaded, say so and offer to build one.
## When in staging mode (agent loaded, not running):
- Tell the user the agent is loaded and ready.
- For tasks matching the worker's goal: ALWAYS ask the user for their \
specific input BEFORE calling run_agent_with_input(task). NEVER make up \
or assume what the user wants. Use ask_user to collect the task details \
(e.g., topic, target, requirements). Once you have the user's answer, \
compose a structured task description from their input and call \
run_agent_with_input(task). The worker has no intake node; it receives \
your task and starts processing.
- If the user wants to modify the agent, call stop_worker_and_edit().
## When idle (worker not running):
- Greet the user. Mention what the worker can do in one sentence.
- For tasks matching the worker's goal, call start_worker(task).
- For tasks matching the worker's goal, use run_agent_with_input(task) \
(if in staging) or load the agent first (if in building).
- For everything else, do it directly.
## When worker is running:
- If the user asks about progress, call get_worker_status().
- If the user has a concern or instruction for the worker, call \
inject_worker_message(content) to relay it.
- You can still do coding tasks directly while the worker runs.
- If an escalation ticket arrives from the judge, assess severity:
- Low/transient: acknowledge silently, do not disturb the user.
- High/critical: notify the user with a brief analysis and suggested action.
## When the user clicks Run (external event notification)
When you receive an event that the user clicked Run:
- If the worker started successfully, briefly acknowledge it; do NOT \
repeat the full status. The user can see the graph is running.
- If the worker failed to start (credential or structural error), \
explain the problem clearly and help fix it. For credential errors, \
guide the user to set up the missing credentials. For structural \
issues, offer to fix the agent graph directly.
## When worker asks user a question:
- The system will route the user's response directly to the worker. \
You do not need to relay it. The user will come back to you after responding.
## When worker is running — GO SILENT
Once you call start_worker(), your job is DONE. Do NOT call ask_user, \
do NOT call get_worker_status(), do NOT emit any text. Just stop. \
The worker owns the conversation now; it has its own client-facing \
nodes that talk to the user directly.
**After start_worker, your ENTIRE response should be ONE short \
confirmation sentence with NO tool calls.** Example: \
"Started the vulnerability assessment." That's it. No ask_user, \
no get_worker_status, no follow-up questions.
You only wake up again when:
- The user explicitly addresses you (not answering a worker question)
- A worker question is forwarded to you for relay
- An escalation ticket arrives from the judge
- The worker finishes
If the user explicitly asks about progress, call get_worker_status() \
ONCE and report. Do NOT poll or check proactively.
For escalation tickets: low/transient: acknowledge silently. \
High/critical: notify the user with a brief analysis.
## When the worker asks the user a question:
- The user's answer is routed to you with context: \
[Worker asked: "...", Options: ...] User answered: "...".
- If the user is answering the worker's question normally, relay it \
using inject_worker_message(answer_text). Then go silent again.
- If the user is rejecting the approach, asking to stop, or giving \
you an instruction, handle it yourself; do NOT relay.
## Showing or describing the loaded worker
@@ -616,16 +788,18 @@ building something new.
When the user asks to change, modify, or update the loaded worker \
(e.g., "change the report node", "add a node", "delete node X"):
1. Use the **Path** from the Worker Profile to locate the agent files.
2. Read the relevant files (nodes/__init__.py, agent.py, etc.).
3. Make the requested changes using edit_file / write_file.
4. Run validation (default_agent.validate(), AgentRunner.load(), \
1. Call stop_worker_and_edit(); this stops the worker and gives you \
coding tools (switches to BUILDING mode).
2. Use the **Path** from the Worker Profile to locate the agent files.
3. Read the relevant files (nodes/__init__.py, agent.py, etc.).
4. Make the requested changes using edit_file / write_file.
5. Run validation (default_agent.validate(), AgentRunner.load(), \
validate_agent_tools()).
5. **Reload the modified worker**: call load_built_agent("{path}") \
so the changes take effect immediately. If a worker is already loaded, \
stop it first, then reload.
6. **Reload the modified worker**: call load_built_agent("{path}") \
so the changes take effect immediately (switches to STAGING mode). \
Then call run_agent_with_input(task) to restart execution.
Do NOT skip step 5; without reloading, the user will still be \
Do NOT skip step 6; without reloading, the user will still be \
interacting with the old version.
"""
@@ -634,9 +808,9 @@ _queen_phase_7 = """
After building and verifying, load the agent into the current session:
load_built_agent("exports/{name}")
This makes the agent available immediately: the user sees its graph, \
the tab name updates, and you can delegate to it via start_worker(). \
Do NOT tell the user to run `python -m {name} run`; load it here.
This switches to STAGING mode: the user sees the agent's graph and \
the tab name updates. Then call run_agent_with_input(task) to start it. \
Do NOT tell the user to run `python -m {name} run`; load and run it here.
"""
_queen_style = """
@@ -766,19 +940,7 @@ queen_node = NodeSpec(
"User's intent is understood, coding tasks are completed correctly, "
"and the worker is managed effectively when delegated to."
),
tools=_SHARED_TOOLS
+ [
# Worker lifecycle
"start_worker",
"stop_worker",
"get_worker_status",
"inject_worker_message",
# Monitoring
"get_worker_health_summary",
"notify_operator",
# Agent loading
"load_built_agent",
],
tools=sorted(set(_QUEEN_BUILDING_TOOLS + _QUEEN_STAGING_TOOLS + _QUEEN_RUNNING_TOOLS)),
system_prompt=(
"You are the Queen — the user's primary interface. You are a coding agent "
"with the same capabilities as the Hive Coder worker, PLUS the ability to "
@@ -792,18 +954,7 @@ queen_node = NodeSpec(
),
)
ALL_QUEEN_TOOLS = _SHARED_TOOLS + [
# Worker lifecycle
"start_worker",
"stop_worker",
"get_worker_status",
"inject_worker_message",
# Monitoring
"get_worker_health_summary",
"notify_operator",
# Agent loading
"load_built_agent",
]
ALL_QUEEN_TOOLS = sorted(set(_QUEEN_BUILDING_TOOLS + _QUEEN_STAGING_TOOLS + _QUEEN_RUNNING_TOOLS))
__all__ = [
"coder_node",
@@ -811,4 +962,7 @@ __all__ = [
"queen_node",
"ALL_QUEEN_TRIAGE_TOOLS",
"ALL_QUEEN_TOOLS",
"_QUEEN_BUILDING_TOOLS",
"_QUEEN_STAGING_TOOLS",
"_QUEEN_RUNNING_TOOLS",
]
@@ -48,11 +48,11 @@ profile_setup → daily_intake → update_tracker → analyze_progress → gener
```
`analyze_progress` has no tools. `schedule_reminders` just sets one boolean. `report` just presents analysis. `update_tracker` and `generate_plan` are sequential autonomous work.
**Good example** (3 nodes):
**Good example** (2 nodes):
```
intake (client-facing) → process (autonomous: track + analyze + plan) → intake (loop back)
process (autonomous: track + analyze + plan) → review (client-facing) → process (loop back)
```
One client-facing node handles ALL user interaction (setup, logging, reports). One autonomous node handles ALL backend work (CSV update, analysis, plan generation) with tools and context preserved.
The queen handles intake (gathering requirements from the user) and passes the task via `run_agent_with_input(task)`. One autonomous node handles ALL backend work (CSV update, analysis, plan generation) with tools and context preserved. One client-facing node handles review/approval when needed.
12. **Adding framework gating for LLM behavior** — Don't add output rollback, premature rejection, or interaction protocol injection. Fix with better prompts or custom judges.
@@ -105,3 +105,9 @@ def test_research_routes_back_to_interact(self):
23. **Forgetting sys.path setup in conftest.py** — Tests need `exports/` and `core/` on sys.path.
24. **Not using auto_responder for client-facing nodes** — Tests with client-facing nodes hang without an auto-responder that injects input. But note: even WITH auto_responder, forever-alive agents still hang because the graph never terminates. Auto-responder only helps for agents with terminal nodes.
25. **Manually wiring browser tools on event_loop nodes** — If the agent needs browser automation, use `node_type="gcu"` which auto-includes all browser tools and prepends best-practices guidance. Do NOT manually list browser tool names on event_loop nodes — they may not exist in the MCP server or may be incomplete. See the GCU Guide appendix.
26. **Using GCU nodes as regular graph nodes** — GCU nodes (`node_type="gcu"`) are exclusively subagents. They must ONLY appear in a parent node's `sub_agents=["gcu-node-id"]` list and be invoked via `delegate_to_sub_agent()`. They must NEVER be connected via edges, used as entry nodes, or used as terminal nodes. If a GCU node appears as an edge source or target, the graph will fail pre-load validation.
27. **Adding a client-facing intake node to worker agents** — The queen owns intake. She defines the entry node's `input_keys` at build time and fills them via `run_agent_with_input(task)` at run time. Worker agents should start with an autonomous processing node, NOT a client-facing intake node that asks the user for requirements. Client-facing nodes in workers are for mid-execution review/approval only.
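Items 26 and 27 describe structural rules that can be checked mechanically before loading a graph. The following is a minimal sketch, not the framework's pre-load validation: `Node` is a hypothetical stand-in for `NodeSpec` that carries only the fields inspected here, edges are plain `(source, target)` tuples, and `find_structure_problems` is an illustrative name.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for NodeSpec with only the fields checked here."""

    id: str
    node_type: str = "event_loop"
    client_facing: bool = False
    sub_agents: list[str] = field(default_factory=list)


def find_structure_problems(nodes, edges, entry_node, terminal_nodes=()):
    """Return human-readable problems for anti-patterns 26 and 27.

    `edges` is a list of (source_id, target_id) pairs.
    """
    by_id = {n.id: n for n in nodes}
    gcu_ids = {n.id for n in nodes if n.node_type == "gcu"}
    delegated = {sub for n in nodes for sub in n.sub_agents}
    problems = []

    # Rule 26: GCU nodes must never be an edge source or target.
    for source, target in edges:
        for endpoint in (source, target):
            if endpoint in gcu_ids:
                problems.append(
                    f"GCU node '{endpoint}' is used in edge {source}->{target}"
                )

    # Rule 26: GCU nodes are subagents only, never entry or terminal nodes.
    for gcu_id in sorted(gcu_ids):
        if gcu_id == entry_node or gcu_id in terminal_nodes:
            problems.append(f"GCU node '{gcu_id}' is an entry or terminal node")
        if gcu_id not in delegated:
            problems.append(f"GCU node '{gcu_id}' is not in any sub_agents list")

    # Rule 27: the entry node should be autonomous, not client-facing.
    entry = by_id.get(entry_node)
    if entry is not None and entry.client_facing:
        problems.append(f"entry node '{entry_node}' is client-facing")
    return problems
```

An empty list means none of the checked rules were violated; it says nothing about the rest of the graph.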
@@ -57,51 +57,28 @@ metadata = AgentMetadata()
from framework.graph import NodeSpec
# Node 1: Intake (client-facing)
intake_node = NodeSpec(
id="intake",
name="Intake",
description="Gather requirements from the user",
# Node 1: Process (autonomous entry node)
# The queen handles intake and passes structured input via
# run_agent_with_input(task). NO client-facing intake node.
# The queen defines input_keys at build time and fills them at run time.
process_node = NodeSpec(
id="process",
name="Process",
description="Execute the task using available tools",
node_type="event_loop",
client_facing=True,
max_node_visits=0, # Unlimited for forever-alive
input_keys=["topic"],
output_keys=["brief"],
success_criteria="The brief is specific and actionable.",
system_prompt="""\
You are an intake specialist.
**STEP 1 — Read and respond (text only, NO tool calls):**
1. Read the topic provided
2. If vague, ask 1-2 clarifying questions
3. If clear, confirm your understanding
**STEP 2 — After the user confirms, call set_output:**
- set_output("brief", "Clear description of what to do")
""",
tools=[],
)
# Node 2: Worker (autonomous)
worker_node = NodeSpec(
id="worker",
name="Worker",
description="Do the main work",
node_type="event_loop",
max_node_visits=0,
input_keys=["brief", "feedback"],
input_keys=["user_request", "feedback"],
output_keys=["results"],
nullable_output_keys=["feedback"], # Only on feedback edge
success_criteria="Results are complete and accurate.",
system_prompt="""\
You are a worker agent. Given a brief, do the work.
If feedback is provided, this is a follow-up — address the feedback.
You are a processing agent. Your task is in memory under "user_request". \
If "feedback" is present, this is a revision — address the feedback.
Work in phases:
1. Use tools to gather/process data
2. Analyze results
3. Call set_output for each key in a SEPARATE turn:
3. Call set_output in a SEPARATE turn:
- set_output("results", "structured results")
""",
tools=["web_search", "web_scrape", "save_data", "load_data", "list_data_files"],
@@ -115,7 +92,7 @@ review_node = NodeSpec(
node_type="event_loop",
client_facing=True,
max_node_visits=0,
input_keys=["results", "brief"],
input_keys=["results", "user_request"],
output_keys=["next_action", "feedback"],
nullable_output_keys=["feedback"],
success_criteria="User has reviewed and decided next steps.",
@@ -128,14 +105,14 @@ Present the results to the user.
3. Ask: satisfied, or want changes?
**STEP 2 — After user responds, call set_output:**
- set_output("next_action", "new_topic") — if starting fresh
- set_output("next_action", "done") — if satisfied
- set_output("next_action", "revise") — if changes needed
- set_output("feedback", "what to change") — only if revising
""",
tools=[],
)
__all__ = ["intake_node", "worker_node", "review_node"]
__all__ = ["process_node", "review_node"]
```
## agent.py
@@ -155,7 +132,7 @@ from framework.runtime.agent_runtime import AgentRuntime, create_agent_runtime
from framework.runtime.execution_stream import EntryPointSpec
from .config import default_config, metadata
from .nodes import intake_node, worker_node, review_node
from .nodes import process_node, review_node
# Goal definition
goal = Goal(
@@ -172,27 +149,26 @@ goal = Goal(
)
# Node list
nodes = [intake_node, worker_node, review_node]
nodes = [process_node, review_node]
# Edge definitions
edges = [
EdgeSpec(id="intake-to-worker", source="intake", target="worker",
EdgeSpec(id="process-to-review", source="process", target="review",
condition=EdgeCondition.ON_SUCCESS, priority=1),
EdgeSpec(id="worker-to-review", source="worker", target="review",
condition=EdgeCondition.ON_SUCCESS, priority=1),
# Feedback loop
EdgeSpec(id="review-to-worker", source="review", target="worker",
# Feedback loop — revise results
EdgeSpec(id="review-to-process", source="review", target="process",
condition=EdgeCondition.CONDITIONAL,
condition_expr="str(next_action).lower() == 'revise'", priority=2),
# Loop back for new topic
EdgeSpec(id="review-to-intake", source="review", target="intake",
# Loop back for next task (queen sends new input)
EdgeSpec(id="review-done", source="review", target="process",
condition=EdgeCondition.CONDITIONAL,
condition_expr="str(next_action).lower() == 'new_topic'", priority=1),
condition_expr="str(next_action).lower() == 'done'", priority=1),
]
# Graph configuration
entry_node = "intake"
entry_points = {"start": "intake"}
# Graph configuration — entry is the autonomous process node
# The queen handles intake and passes the task via run_agent_with_input(task)
entry_node = "process"
entry_points = {"start": "process"}
pause_nodes = []
terminal_nodes = [] # Forever-alive
@@ -208,7 +184,7 @@ class MyAgent:
self.goal = goal
self.nodes = nodes
self.edges = edges
self.entry_node = entry_node
self.entry_node = entry_node # "process" — autonomous entry
self.entry_points = entry_points
self.pause_nodes = pause_nodes
self.terminal_nodes = terminal_nodes
@@ -498,7 +474,7 @@ def tui():
llm = LiteLLMProvider(model=agent.config.model, api_key=agent.config.api_key, api_base=agent.config.api_base)
runtime = create_agent_runtime(
graph=agent._build_graph(), goal=agent.goal, storage_path=storage,
entry_points=[EntryPointSpec(id="start", name="Start", entry_node="intake", trigger_type="manual", isolation_level="isolated")],
entry_points=[EntryPointSpec(id="start", name="Start", entry_node="process", trigger_type="manual", isolation_level="isolated")],
llm=llm, tools=list(agent._tool_registry.get_tools().values()), tool_executor=agent._tool_registry.get_executor())
await runtime.start()
try:
@@ -72,7 +72,7 @@ goal = Goal(
| id | str | required | kebab-case identifier |
| name | str | required | Display name |
| description | str | required | What the node does |
| node_type | str | required | Always `"event_loop"` |
| node_type | str | required | `"event_loop"` or `"gcu"` (browser automation — see GCU Guide appendix) |
| input_keys | list[str] | required | Memory keys this node reads |
| output_keys | list[str] | required | Memory keys this node writes via set_output |
| system_prompt | str | "" | LLM instructions |
@@ -131,13 +131,19 @@ downstream node only sees the serialized summary string.
- A "report" node that presents analysis → merge into the client-facing node
- A "confirm" or "schedule" node that doesn't call any external service → remove
**Typical agent structure (3 nodes):**
**Typical agent structure (2 nodes):**
```
intake (client-facing) ←→ process (autonomous) ←→ review (client-facing)
process (autonomous) ←→ review (client-facing)
```
Or for simpler agents, just 2 nodes:
The queen owns intake — she gathers requirements from the user, then
passes structured input via `run_agent_with_input(task)`. When building
the agent, design the entry node's `input_keys` to match what the queen
will provide at run time. Worker agents should NOT have a client-facing
intake node. Client-facing nodes are for mid-execution review/approval only.
For simpler agents, just 1 autonomous node:
```
interact (client-facing) → process (autonomous) → interact (loop)
process (autonomous) — loops back to itself
```
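The merge guidance in this section can be turned into a quick check over a draft node list. This is a sketch under stated assumptions: nodes are plain dicts rather than `NodeSpec` objects, `merge_candidates` is a hypothetical helper, and client-facing and GCU nodes are skipped because the templates in this guide give them `tools=[]` on purpose.

```python
def merge_candidates(nodes):
    """Return ids of nodes that look like thin, tool-less reasoning steps.

    Each node is a dict with "id" and "tools", and optionally
    "client_facing" and "node_type".
    """
    return [
        node["id"]
        for node in nodes
        if not node.get("tools")  # no tools at all
        and not node.get("client_facing", False)  # review nodes may have none
        and node.get("node_type", "event_loop") != "gcu"  # tools auto-populated
    ]
```

Any id it returns is a prompt to reconsider the design, not proof that the node must be merged.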
### nullable_output_keys
@@ -397,7 +403,7 @@ from .agent import (
### Reference Agent
See `exports/gmail_inbox_guardian/agent.py` for a complete example with:
- Primary client-facing intake node (user configures rules)
- Primary client-facing node (user configures rules)
- Timer-based scheduled inbox checks (every 20 min)
- Webhook-triggered email event handling
- Shared isolation for memory access across streams
@@ -413,13 +419,13 @@ See `exports/gmail_inbox_guardian/agent.py` for a complete example with:
## Tool Discovery
Do NOT rely on a static tool list — it will be outdated. Always use
`list_agent_tools()` to get available tool names grouped by category.
For full schemas with parameter details, use `discover_mcp_tools()`.
`list_agent_tools()` to discover available tools, grouped by category.
```
list_agent_tools() # all available tools
list_agent_tools("exports/my_agent/mcp_servers.json") # specific agent
discover_mcp_tools() # full schemas with params
list_agent_tools() # names + descriptions, all groups
list_agent_tools(output_schema="full") # include input_schema
list_agent_tools(group="gmail") # only gmail_* tools
list_agent_tools("exports/my_agent/mcp_servers.json") # specific agent's tools
```
After building, validate tools exist: `validate_agent_tools("exports/{name}")`
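The idea behind validating tool names can be sketched as a set difference. This is not the implementation of `validate_agent_tools`; `unknown_tools` is a hypothetical helper, and the `available` names are assumed to have been copied from `list_agent_tools()` output.

```python
def unknown_tools(node_tools, available):
    """Map node id -> sorted tool names that are not in `available`.

    `node_tools` maps each node id to the tool names listed in its definition.
    Nodes whose tools are all known are left out of the result.
    """
    available = set(available)
    missing = {}
    for node_id, tools in node_tools.items():
        extra = sorted(set(tools) - available)
        if extra:
            missing[node_id] = extra
    return missing
```

An empty result means every tool named in the node definitions appears in the discovered list.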
@@ -0,0 +1,119 @@
# GCU Browser Automation Guide
## When to Use GCU Nodes
Use `node_type="gcu"` when:
- The user's workflow requires **navigating real websites** (scraping, form-filling, social media interaction, testing web UIs)
- The task involves **dynamic/JS-rendered pages** that `web_scrape` cannot handle (SPAs, infinite scroll, login-gated content)
- The agent needs to **interact with a website** — clicking, typing, scrolling, selecting, uploading files
Do NOT use GCU for:
- Static content that `web_scrape` handles fine
- API-accessible data (use the API directly)
- PDF/file processing
- Anything that doesn't require a browser UI
## What GCU Nodes Are
- `node_type="gcu"` — a declarative enhancement over `event_loop`
- Framework auto-prepends browser best-practices system prompt
- Framework auto-includes all 31 browser tools from `gcu-tools` MCP server
- Same underlying `EventLoopNode` class — no new imports needed
- `tools=[]` is correct — tools are auto-populated at runtime
## GCU Architecture Pattern
GCU nodes are **subagents** — invoked via `delegate_to_sub_agent()`, not connected via edges.
- Primary nodes (`event_loop`, client-facing) orchestrate; GCU nodes do browser work
- Parent node declares `sub_agents=["gcu-node-id"]` and calls `delegate_to_sub_agent(agent_id="gcu-node-id", task="...")`
- GCU nodes set `max_node_visits=1` (single execution per delegation), `client_facing=False`
- GCU nodes use `output_keys=["result"]` and return structured JSON via `set_output("result", ...)`
## GCU Node Definition Template
```python
gcu_browser_node = NodeSpec(
id="gcu-browser-worker",
name="Browser Worker",
description="Browser subagent that does X.",
node_type="gcu",
client_facing=False,
max_node_visits=1,
input_keys=[],
output_keys=["result"],
tools=[], # Auto-populated with all browser tools
system_prompt="""\
You are a browser agent. Your job: [specific task].
## Workflow
1. browser_start (only if no browser is running yet)
2. browser_open(url=TARGET_URL) — note the returned targetId
3. browser_snapshot to read the page
4. [task-specific steps]
5. set_output("result", JSON)
## Output format
set_output("result", JSON) with:
- [field]: [type and description]
""",
)
```
## Parent Node Template (orchestrating GCU subagents)
```python
orchestrator_node = NodeSpec(
id="orchestrator",
...
node_type="event_loop",
sub_agents=["gcu-browser-worker"],
system_prompt="""\
...
delegate_to_sub_agent(
agent_id="gcu-browser-worker",
task="Navigate to [URL]. Do [specific task]. Return JSON with [fields]."
)
...
""",
tools=[], # Orchestrator doesn't need browser tools
)
```
## mcp_servers.json with GCU
```json
{
"hive-tools": { ... },
"gcu-tools": {
"transport": "stdio",
"command": "uv",
"args": ["run", "python", "-m", "gcu.server", "--stdio"],
"cwd": "../../tools",
"description": "GCU tools for browser automation"
}
}
```
Note: `gcu-tools` is auto-added if any node uses `node_type="gcu"`, but including it explicitly is fine.
## GCU System Prompt Best Practices
Key rules to bake into GCU node prompts:
- Prefer `browser_snapshot` over `browser_get_text("body")` — compact accessibility tree vs 100KB+ raw HTML
- Always `browser_wait` after navigation
- Use large scroll amounts (~2000-5000) for lazy-loaded content
- For spillover files, use `run_command` with grep, not `read_file`
- If auth wall detected, report immediately — don't attempt login
- Keep tool calls per turn ≤10
- Tab isolation: when browser is already running, use `browser_open(background=true)` and pass `target_id` to every call
## GCU Anti-Patterns
- Using `browser_screenshot` to read text (use `browser_snapshot`)
- Re-navigating after scrolling (resets scroll position)
- Attempting login on auth walls
- Forgetting `target_id` in multi-tab scenarios
- Putting browser tools directly on `event_loop` nodes instead of using GCU subagent pattern
- Making GCU nodes `client_facing=True` (they should be autonomous subagents)
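Two of the anti-patterns above are mechanical enough to check before a graph is run. The following is a minimal sketch, assuming nodes are plain dicts with `id`, `node_type`, `client_facing` and `tools` keys, and assuming that browser tools can be recognised by a `browser_` name prefix (as in `browser_open` and `browser_snapshot`). `lint_gcu_nodes` is a hypothetical helper, not part of the framework.

```python
def lint_gcu_nodes(nodes: list[dict]) -> list[str]:
    """Flag client-facing GCU nodes and event_loop nodes holding browser tools."""
    warnings: list[str] = []
    for node in nodes:
        node_id = node.get("id", "?")
        node_type = node.get("node_type")
        tools = node.get("tools") or []
        if node_type == "gcu" and node.get("client_facing"):
            warnings.append(f"{node_id}: GCU nodes should not be client_facing")
        # Assumption: browser tools share the "browser_" prefix.
        if node_type == "event_loop" and any(t.startswith("browser_") for t in tools):
            warnings.append(f"{node_id}: move browser tools to a GCU subagent")
    return warnings
```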
+3 -3
@@ -660,7 +660,7 @@ class GraphBuilder:
# Generate Python code
code = self._generate_code(graph)
Path(path).write_text(code)
Path(path).write_text(code, encoding="utf-8")
self.session.phase = BuildPhase.EXPORTED
self._save_session()
@@ -754,14 +754,14 @@ class GraphBuilder:
"""Save session to disk."""
self.session.updated_at = datetime.now()
path = self.storage_path / f"{self.session.id}.json"
path.write_text(self.session.model_dump_json(indent=2))
path.write_text(self.session.model_dump_json(indent=2), encoding="utf-8")
def _load_session(self, session_id: str) -> BuildSession:
"""Load session from disk."""
path = self.storage_path / f"{session_id}.json"
if not path.exists():
raise FileNotFoundError(f"Session not found: {session_id}")
return BuildSession.model_validate_json(path.read_text())
return BuildSession.model_validate_json(path.read_text(encoding="utf-8"))
@classmethod
def list_sessions(cls, storage_path: Path | str | None = None) -> list[str]:
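The `encoding="utf-8"` changes in this file matter because `Path.write_text` and `Path.read_text` otherwise fall back to the platform's locale encoding, which is frequently not UTF-8 on Windows. A minimal sketch of the round trip follows; `roundtrip` and the file name are illustrative only.

```python
import tempfile
from pathlib import Path


def roundtrip(text: str) -> str:
    """Write and read back `text` with an explicit UTF-8 encoding."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "session.json"
        # Explicit encoding on both sides keeps the bytes on disk portable.
        path.write_text(text, encoding="utf-8")
        return path.read_text(encoding="utf-8")
```

With both calls pinned to UTF-8, a session containing non-ASCII characters reads back unchanged regardless of the machine's locale.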
+13 -1
@@ -6,6 +6,7 @@ helper functions.
"""
import json
import logging
import os
from dataclasses import dataclass, field
from pathlib import Path
@@ -18,6 +19,7 @@ from framework.graph.edge import DEFAULT_MAX_TOKENS
# ---------------------------------------------------------------------------
HIVE_CONFIG_FILE = Path.home() / ".hive" / "configuration.json"
logger = logging.getLogger(__name__)
def get_hive_config() -> dict[str, Any]:
@@ -27,7 +29,12 @@ def get_hive_config() -> dict[str, Any]:
try:
with open(HIVE_CONFIG_FILE, encoding="utf-8-sig") as f:
return json.load(f)
except (json.JSONDecodeError, OSError):
except (json.JSONDecodeError, OSError) as e:
logger.warning(
"Failed to load Hive config %s: %s",
HIVE_CONFIG_FILE,
e,
)
return {}
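The pattern in this hunk (fall back to an empty config, but log why) can be sketched in isolation. `load_json_config` is a hypothetical stand-in for `get_hive_config` that takes the path as a parameter; the early `exists()` check is an assumption, since the surrounding lines are not shown in the diff.

```python
import json
import logging
from pathlib import Path
from typing import Any

logger = logging.getLogger(__name__)


def load_json_config(path: Path) -> dict[str, Any]:
    """Load a JSON config file, returning {} (with a warning) on failure."""
    if not path.exists():  # assumption: a missing file is not worth a warning
        return {}
    try:
        # "utf-8-sig" transparently skips a leading byte-order mark.
        with open(path, encoding="utf-8-sig") as f:
            return json.load(f)
    except (json.JSONDecodeError, OSError) as e:
        logger.warning("Failed to load config %s: %s", path, e)
        return {}
```

Passing the path and exception as logging arguments, rather than formatting them eagerly, defers string building until the record is actually emitted.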
@@ -90,6 +97,11 @@ def get_api_key() -> str | None:
return None
def get_gcu_enabled() -> bool:
"""Return whether GCU (browser automation) is enabled in user config."""
return get_hive_config().get("gcu_enabled", True)
def get_api_base() -> str | None:
"""Return the api_base URL for OpenAI-compatible endpoints, if configured."""
llm = get_hive_config().get("llm", {})
+2 -2
@@ -69,7 +69,7 @@ def save_credential_key(key: str) -> Path:
# Restrict the secrets directory itself
path.parent.chmod(stat.S_IRWXU) # 0o700
path.write_text(key)
path.write_text(key, encoding="utf-8")
path.chmod(stat.S_IRUSR | stat.S_IWUSR) # 0o600
os.environ[CREDENTIAL_KEY_ENV_VAR] = key
@@ -164,7 +164,7 @@ def _read_credential_key_file() -> str | None:
"""Read the credential key from ``~/.hive/secrets/credential_key``."""
try:
if CREDENTIAL_KEY_PATH.is_file():
value = CREDENTIAL_KEY_PATH.read_text().strip()
value = CREDENTIAL_KEY_PATH.read_text(encoding="utf-8").strip()
if value:
return value
except Exception:
@@ -73,6 +73,7 @@ from .provider import (
TokenExpiredError,
TokenPlacement,
)
from .zoho_provider import ZohoOAuth2Provider
__all__ = [
# Types
@@ -82,6 +83,7 @@ __all__ = [
# Providers
"BaseOAuth2Provider",
"HubSpotOAuth2Provider",
"ZohoOAuth2Provider",
# Lifecycle
"TokenLifecycleManager",
"TokenRefreshResult",
@@ -0,0 +1,198 @@
"""
Zoho CRM-specific OAuth2 provider.
Pre-configured for Zoho's OAuth2 endpoints and CRM scopes.
Extends BaseOAuth2Provider for Zoho-specific behavior.
Usage:
provider = ZohoOAuth2Provider(
client_id="your-client-id",
client_secret="your-client-secret",
accounts_domain="https://accounts.zoho.com", # or .in, .eu, etc.
)
# Use with credential store
store = CredentialStore(
storage=EncryptedFileStorage(),
providers=[provider],
)
See: https://www.zoho.com/crm/developer/docs/api/v2/access-refresh.html
"""
from __future__ import annotations
import logging
import os
from typing import Any
from ..models import CredentialObject, CredentialRefreshError, CredentialType
from .base_provider import BaseOAuth2Provider
from .provider import OAuth2Config, OAuth2Token, TokenPlacement
logger = logging.getLogger(__name__)
# Default CRM scopes for Phase 1 (Leads, Contacts, Accounts, Deals, Notes)
ZOHO_DEFAULT_SCOPES = [
"ZohoCRM.modules.leads.ALL",
"ZohoCRM.modules.contacts.ALL",
"ZohoCRM.modules.accounts.ALL",
"ZohoCRM.modules.deals.ALL",
"ZohoCRM.modules.notes.CREATE",
]
class ZohoOAuth2Provider(BaseOAuth2Provider):
"""
Zoho CRM OAuth2 provider with pre-configured endpoints.
Handles Zoho-specific OAuth2 behavior:
- Pre-configured token and authorization URLs (region-aware)
- Default CRM scopes for Leads, Contacts, Accounts, Deals, Notes
- Token validation via Zoho CRM API
- Authorization header format: "Authorization: Zoho-oauthtoken {token}"
Example:
provider = ZohoOAuth2Provider(
client_id="your-zoho-client-id",
client_secret="your-zoho-client-secret",
accounts_domain="https://accounts.zoho.com", # US
# or "https://accounts.zoho.in" for India
# or "https://accounts.zoho.eu" for EU
)
"""
def __init__(
self,
client_id: str,
client_secret: str,
accounts_domain: str = "https://accounts.zoho.com",
api_domain: str | None = None,
scopes: list[str] | None = None,
):
"""
Initialize Zoho OAuth2 provider.
Args:
client_id: Zoho OAuth2 client ID
client_secret: Zoho OAuth2 client secret
accounts_domain: Zoho accounts domain (region-specific)
- US: https://accounts.zoho.com
- India: https://accounts.zoho.in
- EU: https://accounts.zoho.eu
- etc.
api_domain: Zoho API domain for CRM calls (used in validate).
Defaults to ZOHO_API_DOMAIN env or https://www.zohoapis.com
scopes: Override default scopes if needed
"""
base = accounts_domain.rstrip("/")
token_url = f"{base}/oauth/v2/token"
auth_url = f"{base}/oauth/v2/auth"
config = OAuth2Config(
token_url=token_url,
authorization_url=auth_url,
client_id=client_id,
client_secret=client_secret,
default_scopes=scopes or ZOHO_DEFAULT_SCOPES,
token_placement=TokenPlacement.HEADER_CUSTOM,
custom_header_name="Authorization",
)
super().__init__(config, provider_id="zoho_crm_oauth2")
self._accounts_domain = base
self._api_domain = (
api_domain or os.getenv("ZOHO_API_DOMAIN", "https://www.zohoapis.com")
).rstrip("/")
@property
def supported_types(self) -> list[CredentialType]:
return [CredentialType.OAUTH2]
def format_for_request(self, token: OAuth2Token) -> dict[str, Any]:
"""
Format token for Zoho CRM API requests.
Zoho uses Authorization header: "Zoho-oauthtoken {access_token}"
(not Bearer).
"""
return {
"headers": {
"Authorization": f"Zoho-oauthtoken {token.access_token}",
"Content-Type": "application/json",
"Accept": "application/json",
}
}
def validate(self, credential: CredentialObject) -> bool:
"""
Validate Zoho credential by making a lightweight API call.
Uses GET /crm/v2/users?type=CurrentUser (doesn't require module access).
Treats 429 as valid-but-rate-limited.
"""
access_token = credential.get_key("access_token")
if not access_token:
return False
try:
client = self._get_client()
response = client.get(
f"{self._api_domain}/crm/v2/users?type=CurrentUser",
headers={
"Authorization": f"Zoho-oauthtoken {access_token}",
"Accept": "application/json",
},
timeout=self.config.request_timeout,
)
return response.status_code in (200, 429)
except Exception as e:
logger.debug("Zoho credential validation failed: %s", e)
return False
def _parse_token_response(self, response_data: dict[str, Any]) -> OAuth2Token:
"""
Parse Zoho token response.
Zoho returns:
{
"access_token": "...",
"refresh_token": "...",
"expires_in": 3600,
"api_domain": "https://www.zohoapis.com",
"token_type": "Bearer"
}
"""
token = OAuth2Token.from_token_response(response_data)
if "api_domain" in response_data:
token.raw_response["api_domain"] = response_data["api_domain"]
return token
def refresh(self, credential: CredentialObject) -> CredentialObject:
"""Refresh Zoho OAuth2 credential and persist DC metadata."""
refresh_tok = credential.get_key("refresh_token")
if not refresh_tok:
raise CredentialRefreshError(f"Credential '{credential.id}' has no refresh_token")
try:
new_token = self.refresh_access_token(refresh_tok)
except Exception as e:
raise CredentialRefreshError(f"Failed to refresh '{credential.id}': {e}") from e
credential.set_key("access_token", new_token.access_token, expires_at=new_token.expires_at)
if new_token.refresh_token and new_token.refresh_token != refresh_tok:
credential.set_key("refresh_token", new_token.refresh_token)
api_domain = new_token.raw_response.get("api_domain")
if isinstance(api_domain, str) and api_domain:
credential.set_key("api_domain", api_domain.rstrip("/"))
accounts_server = new_token.raw_response.get("accounts-server")
if isinstance(accounts_server, str) and accounts_server:
credential.set_key("accounts_domain", accounts_server.rstrip("/"))
location = new_token.raw_response.get("location")
if isinstance(location, str) and location:
credential.set_key("location", location.strip().lower())
return credential
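Two Zoho-specific details from the provider above are easy to get wrong: the endpoints depend on the regional accounts domain, and the header scheme is `Zoho-oauthtoken`, not `Bearer`. The sketch below isolates just those two pieces as plain functions. `zoho_oauth_urls` and `zoho_auth_headers` are hypothetical names that mirror what `__init__` and `format_for_request` compute; they make no network calls.

```python
def zoho_oauth_urls(accounts_domain: str = "https://accounts.zoho.com") -> dict[str, str]:
    """Derive the token and authorization URLs for a regional accounts domain."""
    base = accounts_domain.rstrip("/")  # tolerate a trailing slash
    return {
        "token_url": f"{base}/oauth/v2/token",
        "authorization_url": f"{base}/oauth/v2/auth",
    }


def zoho_auth_headers(access_token: str) -> dict[str, str]:
    """Build request headers using Zoho's custom Authorization scheme."""
    return {
        "Authorization": f"Zoho-oauthtoken {access_token}",
        "Content-Type": "application/json",
        "Accept": "application/json",
    }
```

For an EU account, `zoho_oauth_urls("https://accounts.zoho.eu/")` would yield URLs under `https://accounts.zoho.eu/oauth/v2/`.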
+1 -1
@@ -568,7 +568,7 @@ def _load_nodes_from_python_agent(agent_path: Path) -> list:
def _load_nodes_from_json_agent(agent_json: Path) -> list:
"""Load nodes from a JSON-based agent."""
try:
with open(agent_json) as f:
with open(agent_json, encoding="utf-8") as f:
data = json.load(f)
from framework.graph import NodeSpec
+3 -3
@@ -227,7 +227,7 @@ class EncryptedFileStorage(CredentialStorage):
index_path = self.base_path / "metadata" / "index.json"
if not index_path.exists():
return []
with open(index_path) as f:
with open(index_path, encoding="utf-8") as f:
index = json.load(f)
return list(index.get("credentials", {}).keys())
@@ -268,7 +268,7 @@ class EncryptedFileStorage(CredentialStorage):
index_path = self.base_path / "metadata" / "index.json"
if index_path.exists():
with open(index_path) as f:
with open(index_path, encoding="utf-8") as f:
index = json.load(f)
else:
index = {"credentials": {}, "version": "1.0"}
@@ -283,7 +283,7 @@ class EncryptedFileStorage(CredentialStorage):
index["last_modified"] = datetime.now(UTC).isoformat()
with open(index_path, "w") as f:
with open(index_path, "w", encoding="utf-8") as f:
json.dump(index, f, indent=2)
+1 -5
@@ -159,11 +159,7 @@ class CredentialValidationResult:
f" {c.env_var} for {_label(c)}"
f"\n Connect this integration at hive.adenhq.com first."
)
lines.append(
"\nTo fix: run /hive-credentials in Claude Code."
"\nIf you've already set up credentials, "
"restart your terminal to load them."
)
lines.append("\nIf you've already set up credentials, restart your terminal to load them.")
return "\n".join(lines)
+230 -44
@@ -107,17 +107,38 @@ _TC_ARG_LIMIT = 200 # max chars per tool_call argument after compaction
def _compact_tool_calls(tool_calls: list[dict[str, Any]]) -> list[dict[str, Any]]:
"""Truncate tool_call arguments to save context tokens during compaction.
Preserves ``id``, ``type``, and ``function.name`` exactly. Truncates
``function.arguments`` (a JSON string) to at most ``_TC_ARG_LIMIT`` chars
so that large payloads (e.g. set_output with full findings) don't survive
compaction and defeat the purpose of context reduction.
Preserves ``id``, ``type``, and ``function.name`` exactly. When arguments
exceed ``_TC_ARG_LIMIT``, replaces the full JSON string with a compact
**valid** JSON summary. The Anthropic API parses tool_call arguments and
rejects requests with malformed JSON (e.g. unterminated strings), so we
must never produce broken JSON here.
"""
compact = []
for tc in tool_calls:
func = tc.get("function", {})
args = func.get("arguments", "")
if len(args) > _TC_ARG_LIMIT:
args = args[:_TC_ARG_LIMIT] + "…[truncated]"
# Build a valid JSON summary instead of slicing mid-string.
# Try to extract top-level keys for a meaningful preview.
try:
parsed = json.loads(args)
if isinstance(parsed, dict):
# Preserve key names, truncate values
summary_parts = []
for k, v in parsed.items():
v_str = str(v)
if len(v_str) > 60:
v_str = v_str[:60] + "..."
summary_parts.append(f"{k}={v_str}")
summary = ", ".join(summary_parts)
if len(summary) > _TC_ARG_LIMIT:
summary = summary[:_TC_ARG_LIMIT] + "..."
args = json.dumps({"_compacted": summary})
else:
args = json.dumps({"_compacted": str(parsed)[:_TC_ARG_LIMIT]})
except (json.JSONDecodeError, TypeError):
# Args were already invalid JSON — wrap the preview safely
args = json.dumps({"_compacted": args[:_TC_ARG_LIMIT]})
compact.append(
{
"id": tc.get("id", ""),
@@ -131,6 +152,72 @@ def _compact_tool_calls(tool_calls: list[dict[str, Any]]) -> list[dict[str, Any]
return compact
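The core of this change is that an over-long arguments string is replaced by a small, valid JSON document instead of being sliced mid-string. A standalone sketch of that step follows, operating on a single arguments string. `compact_arguments` is a hypothetical name; the 200 and 60 character limits are taken from the diff.

```python
import json

ARG_LIMIT = 200  # mirrors _TC_ARG_LIMIT in the diff


def compact_arguments(args: str, limit: int = ARG_LIMIT) -> str:
    """Return `args` unchanged if short, else a valid JSON summary of it."""
    if len(args) <= limit:
        return args
    try:
        parsed = json.loads(args)
    except (json.JSONDecodeError, TypeError):
        # Already invalid JSON: wrap a raw preview so the result still parses.
        return json.dumps({"_compacted": args[:limit]})
    if isinstance(parsed, dict):
        parts = []
        for key, value in parsed.items():
            text = str(value)
            if len(text) > 60:
                text = text[:60] + "..."
            parts.append(f"{key}={text}")
        summary = ", ".join(parts)
        if len(summary) > limit:
            summary = summary[:limit] + "..."
        return json.dumps({"_compacted": summary})
    return json.dumps({"_compacted": str(parsed)[:limit]})
```

Because every long input leaves through `json.dumps`, the output always parses, whatever the input looked like.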
def extract_tool_call_history(messages: list[Message], max_entries: int = 30) -> str:
"""Build a compact tool call history from a list of messages.
Used in compaction summaries to prevent the LLM from re-calling
tools it already called. Extracts tool call details, files saved,
outputs set, and errors encountered.
"""
tool_calls_detail: dict[str, list[str]] = {}
files_saved: list[str] = []
outputs_set: list[str] = []
errors: list[str] = []
def _summarize_input(name: str, args: dict) -> str:
if name == "web_search":
return args.get("query", "")
if name == "web_scrape":
return args.get("url", "")
if name in ("load_data", "save_data"):
return args.get("filename", "")
return ""
for msg in messages:
if msg.role == "assistant" and msg.tool_calls:
for tc in msg.tool_calls:
func = tc.get("function", {})
name = func.get("name", "unknown")
try:
args = json.loads(func.get("arguments", "{}"))
except (json.JSONDecodeError, TypeError):
args = {}
summary = _summarize_input(name, args)
tool_calls_detail.setdefault(name, []).append(summary)
if name == "save_data" and args.get("filename"):
files_saved.append(args["filename"])
if name == "set_output" and args.get("key"):
outputs_set.append(args["key"])
if msg.role == "tool" and msg.is_error:
preview = msg.content[:120].replace("\n", " ")
errors.append(preview)
parts: list[str] = []
if tool_calls_detail:
lines: list[str] = []
for name, inputs in list(tool_calls_detail.items())[:max_entries]:
count = len(inputs)
non_empty = [s for s in inputs if s]
if non_empty:
detail_lines = [f" - {s[:120]}" for s in non_empty[:8]]
lines.append(f" {name} ({count}x):\n" + "\n".join(detail_lines))
else:
lines.append(f" {name} ({count}x)")
parts.append("TOOLS ALREADY CALLED:\n" + "\n".join(lines))
if files_saved:
unique = list(dict.fromkeys(files_saved))
parts.append("FILES SAVED: " + ", ".join(unique))
if outputs_set:
unique = list(dict.fromkeys(outputs_set))
parts.append("OUTPUTS SET: " + ", ".join(unique))
if errors:
parts.append("ERRORS (do NOT retry these):\n" + "\n".join(f" - {e}" for e in errors[:10]))
return "\n\n".join(parts)
# ---------------------------------------------------------------------------
# ConversationStore protocol (Phase 2)
# ---------------------------------------------------------------------------
@@ -352,9 +439,36 @@ class NodeConversation:
def _repair_orphaned_tool_calls(
msgs: list[dict[str, Any]],
) -> list[dict[str, Any]]:
"""Ensure every tool_call has a matching tool-result message."""
"""Ensure tool_call / tool_result pairs are consistent.
1. **Orphaned tool results** (tool_result with no preceding tool_use)
are dropped. This happens when compaction removes an assistant
message but leaves its tool-result messages behind.
2. **Orphaned tool calls** (tool_use with no following tool_result)
get a synthetic error result appended. This happens when a loop
is cancelled mid-tool-execution.
"""
# Pass 1: collect all tool_call IDs from assistant messages so we
# can identify orphaned tool-result messages.
all_tool_call_ids: set[str] = set()
for m in msgs:
if m.get("role") == "assistant":
for tc in m.get("tool_calls") or []:
tc_id = tc.get("id")
if tc_id:
all_tool_call_ids.add(tc_id)
# Pass 2: build repaired list — drop orphaned tool results, patch
# missing tool results.
repaired: list[dict[str, Any]] = []
for i, m in enumerate(msgs):
# Drop tool-result messages whose tool_call_id has no matching
# tool_use in any assistant message (orphaned by compaction).
if m.get("role") == "tool":
tid = m.get("tool_call_id")
if tid and tid not in all_tool_call_ids:
continue # skip orphaned result
repaired.append(m)
tool_calls = m.get("tool_calls")
if m.get("role") != "assistant" or not tool_calls:
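The two repair rules in the docstring can be sketched on plain message dicts. This is a simplified illustration, not the method in the diff: it treats a tool call as answered if a result with the same ID appears anywhere in the list, and the text of the synthetic error result is invented for the example. `repair_tool_pairs` is a hypothetical name.

```python
from typing import Any


def repair_tool_pairs(msgs: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Drop orphaned tool results and patch tool calls that have no result."""
    call_ids = {
        tc.get("id")
        for m in msgs
        if m.get("role") == "assistant"
        for tc in m.get("tool_calls") or []
        if tc.get("id")
    }
    result_ids = {m.get("tool_call_id") for m in msgs if m.get("role") == "tool"}

    repaired: list[dict[str, Any]] = []
    for m in msgs:
        if m.get("role") == "tool":
            tid = m.get("tool_call_id")
            if tid and tid not in call_ids:
                continue  # orphaned result: its assistant message is gone
        repaired.append(m)
        if m.get("role") == "assistant":
            for tc in m.get("tool_calls") or []:
                tc_id = tc.get("id")
                if tc_id and tc_id not in result_ids:
                    # Orphaned call: append a synthetic error result.
                    repaired.append(
                        {
                            "role": "tool",
                            "tool_call_id": tc_id,
                            "content": "ERROR: tool call was interrupted.",
                        }
                    )
    return repaired
```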
@@ -632,6 +746,7 @@ class NodeConversation:
spillover_dir: str,
keep_recent: int = 4,
phase_graduated: bool = False,
aggressive: bool = False,
) -> None:
"""Structure-preserving compaction: save freeform text to file, keep tool messages.
@@ -641,6 +756,11 @@ class NodeConversation:
after pruning. Only freeform text exchanges (user messages,
text-only assistant messages) are saved to a file and removed.
When *aggressive* is True, non-essential tool call pairs are also
collapsed into a compact summary instead of being kept individually.
Only ``set_output`` calls and error results are preserved; all other
old tool pairs are replaced by a tool-call history summary.
The result: the agent retains exact knowledge of what tools it called,
where each result is stored, and can load the conversation text if
needed. No LLM summary call. No heuristics. Nothing lost.
@@ -672,35 +792,91 @@ class NodeConversation:
# Classify old messages: structural (keep) vs freeform (save to file)
kept_structural: list[Message] = []
freeform_lines: list[str] = []
collapsed_msgs: list[Message] = []
for msg in old_messages:
if msg.role == "tool":
# Tool results — already pruned to ~30 tokens (file reference).
# Keep in conversation.
kept_structural.append(msg)
elif msg.role == "assistant" and msg.tool_calls:
# Assistant message with tool_calls — keep the tool_calls
# with truncated arguments, clear the freeform text content.
compact_tcs = _compact_tool_calls(msg.tool_calls)
kept_structural.append(
Message(
seq=msg.seq,
role=msg.role,
content="",
tool_calls=compact_tcs,
is_error=msg.is_error,
phase_id=msg.phase_id,
is_transition_marker=msg.is_transition_marker,
)
if aggressive:
# Aggressive: only keep set_output tool pairs and error results.
# Everything else is collapsed into a tool-call history summary.
# We need to track tool_call IDs to pair assistant messages with
# their tool results.
protected_tc_ids: set[str] = set()
collapsible_tc_ids: set[str] = set()
# First pass: classify assistant messages
for msg in old_messages:
if msg.role != "assistant" or not msg.tool_calls:
continue
has_protected = any(
tc.get("function", {}).get("name") == "set_output" for tc in msg.tool_calls
)
else:
# Freeform text (user messages, text-only assistant messages)
# — save to file and remove from conversation.
role_label = msg.role
text = msg.content
if len(text) > 2000:
text = text[:2000] + ""
freeform_lines.append(f"[{role_label}] (seq={msg.seq}): {text}")
tc_ids = {tc.get("id", "") for tc in msg.tool_calls}
if has_protected:
protected_tc_ids |= tc_ids
else:
collapsible_tc_ids |= tc_ids
# Second pass: classify all messages
for msg in old_messages:
if msg.role == "tool":
tc_id = msg.tool_use_id or ""
if tc_id in protected_tc_ids:
kept_structural.append(msg)
elif msg.is_error:
# Error results are always protected
kept_structural.append(msg)
# Protect the parent assistant message too
protected_tc_ids.add(tc_id)
else:
collapsed_msgs.append(msg)
elif msg.role == "assistant" and msg.tool_calls:
tc_ids = {tc.get("id", "") for tc in msg.tool_calls}
if tc_ids & protected_tc_ids:
# Has at least one protected tool call — keep entire msg
compact_tcs = _compact_tool_calls(msg.tool_calls)
kept_structural.append(
Message(
seq=msg.seq,
role=msg.role,
content="",
tool_calls=compact_tcs,
is_error=msg.is_error,
phase_id=msg.phase_id,
is_transition_marker=msg.is_transition_marker,
)
)
else:
collapsed_msgs.append(msg)
else:
# Freeform text — save to file
role_label = msg.role
text = msg.content
if len(text) > 2000:
text = text[:2000] + ""
freeform_lines.append(f"[{role_label}] (seq={msg.seq}): {text}")
else:
# Standard mode: keep all tool call pairs as structural
for msg in old_messages:
if msg.role == "tool":
kept_structural.append(msg)
elif msg.role == "assistant" and msg.tool_calls:
compact_tcs = _compact_tool_calls(msg.tool_calls)
kept_structural.append(
Message(
seq=msg.seq,
role=msg.role,
content="",
tool_calls=compact_tcs,
is_error=msg.is_error,
phase_id=msg.phase_id,
is_transition_marker=msg.is_transition_marker,
)
)
else:
role_label = msg.role
text = msg.content
if len(text) > 2000:
text = text[:2000] + ""
freeform_lines.append(f"[{role_label}] (seq={msg.seq}): {text}")
# Write freeform text to a numbered conversation file
spill_path = Path(spillover_dir)
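The aggressive branch above boils down to a two-pass classification: first decide which tool-call IDs are protected, then keep or collapse each tool-related message. Below is a minimal sketch on simplified dicts, where each tool call is `{"id", "name"}` rather than the nested `function` structure used in the diff. One deliberate difference: the sketch collects error-result IDs in the first pass, so a parent assistant message that appears before its error result is classified with that knowledge. `classify_tool_messages` is a hypothetical name.

```python
from typing import Any


def classify_tool_messages(
    messages: list[dict[str, Any]],
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Split old tool-related messages into (kept, collapsed)."""
    protected: set[str] = set()

    # Pass 1: protect all call IDs on messages that call set_output,
    # plus the call ID behind every error result.
    for msg in messages:
        calls = msg.get("tool_calls") or []
        if msg["role"] == "assistant" and any(c["name"] == "set_output" for c in calls):
            protected |= {c["id"] for c in calls}
        elif msg["role"] == "tool" and msg.get("is_error"):
            protected.add(msg["tool_use_id"])

    # Pass 2: keep anything tied to a protected ID, collapse the rest.
    kept: list[dict[str, Any]] = []
    collapsed: list[dict[str, Any]] = []
    for msg in messages:
        if msg["role"] == "tool":
            ids = {msg["tool_use_id"]}
        elif msg["role"] == "assistant" and msg.get("tool_calls"):
            ids = {c["id"] for c in msg["tool_calls"]}
        else:
            continue  # freeform text is handled separately (saved to a file)
        (kept if ids & protected else collapsed).append(msg)
    return kept, collapsed
```

The collapsed list is what would then be summarised by a helper such as `extract_tool_call_history`.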
@@ -720,13 +896,25 @@ class NodeConversation:
conv_filename = ""
# Build reference message
ref_parts: list[str] = []
if conv_filename:
ref_content = (
ref_parts.append(
f"[Previous conversation saved to '{conv_filename}'. "
f"Use load_data('{conv_filename}') to review if needed.]"
)
else:
ref_content = "[Previous freeform messages compacted.]"
elif not collapsed_msgs:
ref_parts.append("[Previous freeform messages compacted.]")
# Aggressive: add collapsed tool-call history to the reference
if collapsed_msgs:
tool_history = extract_tool_call_history(collapsed_msgs)
if tool_history:
ref_parts.append(tool_history)
elif not ref_parts:
ref_parts.append("[Previous tool calls compacted.]")
ref_content = "\n\n".join(ref_parts)
# Use a seq just before the first kept message
recent_messages = list(self._messages[split:])
if kept_structural:
@@ -739,15 +927,13 @@ class NodeConversation:
ref_msg = Message(seq=ref_seq, role="user", content=ref_content)
# Persist: delete old messages from store, write reference + kept structural
# Persist: delete old messages from store, write reference + kept structural.
# In aggressive mode, collapsed messages may be interspersed with kept
# messages, so we delete everything before the recent boundary and
# rewrite only what we want to keep.
if self._store:
first_kept_seq = (
kept_structural[0].seq
if kept_structural
else (recent_messages[0].seq if recent_messages else self._next_seq)
)
# Delete everything before the first structural message we're keeping
await self._store.delete_parts_before(first_kept_seq)
recent_boundary = recent_messages[0].seq if recent_messages else self._next_seq
await self._store.delete_parts_before(recent_boundary)
# Write the reference message
await self._store.write_part(ref_msg.seq, ref_msg.to_storage_dict())
# Write kept structural messages (they may have been modified)
+32 -7
@@ -103,7 +103,12 @@ FEEDBACK: (reason if RETRY, empty if ACCEPT)"""
def _extract_recent_context(conversation: NodeConversation, max_messages: int = 10) -> str:
"""Extract recent conversation messages for evaluation."""
"""Extract recent conversation messages for evaluation.
Includes tool-call summaries from assistant messages so the judge
can see what tools were invoked (especially set_output values) even
when the assistant message body is empty.
"""
messages = conversation.messages
recent = messages[-max_messages:] if len(messages) > max_messages else messages
@@ -112,8 +117,24 @@ def _extract_recent_context(conversation: NodeConversation, max_messages: int =
role = msg.role.upper()
content = msg.content or ""
# Truncate long tool results
if msg.role == "tool" and len(content) > 200:
content = content[:200] + "..."
if msg.role == "tool" and len(content) > 500:
content = content[:500] + "..."
# For assistant messages with empty content but tool_calls,
# summarise the tool calls so the judge knows what happened.
if msg.role == "assistant" and not content.strip():
tool_calls = getattr(msg, "tool_calls", None)
if tool_calls:
tc_parts = []
for tc in tool_calls:
fn = tc.get("function", {}) if isinstance(tc, dict) else {}
name = fn.get("name", "")
args = fn.get("arguments", "")
if name == "set_output":
# Show the value so the judge can evaluate content quality
tc_parts.append(f" called {name}({args[:1000]})")
else:
tc_parts.append(f" called {name}(...)")
content = "Tool calls:\n" + "\n".join(tc_parts)
if content.strip():
parts.append(f"[{role}]: {content.strip()}")
@@ -125,6 +146,10 @@ def _format_outputs(accumulator_state: dict[str, Any]) -> str:
Lists and dicts get structural formatting so the judge can assess
quantity and structure, not just a truncated stringification.
String values are given a generous limit (2000 chars) so the judge
can verify substantive content (e.g. a research brief with key
questions, scope boundaries, and deliverables).
"""
if not accumulator_state:
return "(none)"
@@ -144,12 +169,12 @@ def _format_outputs(accumulator_state: dict[str, Any]) -> str:
val_str += f"\n ... and {len(value) - 8} more"
elif isinstance(value, dict):
val_str = str(value)
if len(val_str) > 400:
val_str = val_str[:400] + "..."
if len(val_str) > 2000:
val_str = val_str[:2000] + "..."
else:
val_str = str(value)
if len(val_str) > 300:
val_str = val_str[:300] + "..."
if len(val_str) > 2000:
val_str = val_str[:2000] + "..."
parts.append(f" {key}: {val_str}")
return "\n".join(parts)
+56 -41
@@ -338,6 +338,10 @@ class AsyncEntryPointSpec(BaseModel):
max_concurrent: int = Field(
default=10, description="Maximum concurrent executions for this entry point"
)
max_resurrections: int = Field(
default=3,
description="Auto-restart on non-fatal failure (0 to disable)",
)
model_config = {"extra": "allow"}
@@ -427,8 +431,7 @@ class GraphSpec(BaseModel):
max_tokens: int = Field(default=None) # resolved by _resolve_max_tokens validator
# Cleanup LLM for JSON extraction fallback (fast/cheap model preferred)
# If not set, uses CEREBRAS_API_KEY -> cerebras/llama-3.3-70b or
# ANTHROPIC_API_KEY -> claude-haiku-4-5 as fallback
# If not set, uses CEREBRAS_API_KEY -> cerebras/llama-3.3-70b
cleanup_llm_model: str | None = None
# Execution limits
@@ -503,45 +506,6 @@ class GraphSpec(BaseModel):
"""Get all edges entering a node."""
return [e for e in self.edges if e.target == node_id]
def build_capability_summary(self, from_node_id: str) -> str:
"""Build a summary of the agent's downstream workflow phases and tools.
Walks the graph from *from_node_id* and collects all reachable nodes
(excluding the starting node itself) so that client-facing entry nodes
can inform the user about what the overall agent is capable of.
Returns:
A formatted string listing each downstream node's name,
description, and tools or an empty string when there are
no downstream nodes.
"""
reachable: list[Any] = []
visited: set[str] = set()
queue = [from_node_id]
while queue:
nid = queue.pop()
if nid in visited:
continue
visited.add(nid)
node = self.get_node(nid)
if node and nid != from_node_id:
reachable.append(node)
for edge in self.get_outgoing_edges(nid):
queue.append(edge.target)
if not reachable:
return ""
lines = [
"## Agent Capabilities",
"This agent has the following workflow phases and tools:",
]
for node in reachable:
tool_str = f" (tools: {', '.join(node.tools)})" if node.tools else ""
lines.append(f"- {node.name}: {node.description}{tool_str}")
return "\n".join(lines)
def detect_fan_out_nodes(self) -> dict[str, list[str]]:
"""
Detect nodes that fan-out to multiple targets.
@@ -683,6 +647,13 @@ class GraphSpec(BaseModel):
for edge in self.get_outgoing_edges(current):
to_visit.append(edge.target)
# Also mark sub-agents as reachable (they're invoked via delegate_to_sub_agent, not edges)
for node in self.nodes:
if node.id in reachable:
sub_agents = getattr(node, "sub_agents", []) or []
for sub_agent_id in sub_agents:
reachable.add(sub_agent_id)
# Build set of async entry point nodes for quick lookup
async_entry_nodes = {ep.entry_node for ep in self.async_entry_points}
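The reachability change can be illustrated on a toy graph: edges are followed as before, and then sub-agents declared by any reachable node are marked reachable too, since they are invoked through `delegate_to_sub_agent` rather than through edges. This is a sketch under assumed data shapes (adjacency dicts keyed by node ID); `reachable_nodes` is a hypothetical name, and only one level of sub-agents is considered.

```python
def reachable_nodes(
    entry: str,
    edges: dict[str, list[str]],
    sub_agents: dict[str, list[str]],
) -> set[str]:
    """Return nodes reachable via edges, plus sub-agents of those nodes."""
    reachable: set[str] = set()
    to_visit = [entry]
    while to_visit:
        current = to_visit.pop()
        if current in reachable:
            continue
        reachable.add(current)
        to_visit.extend(edges.get(current, []))
    # Sub-agents have no incoming edges, so add them explicitly.
    for node_id in list(reachable):
        reachable.update(sub_agents.get(node_id, []))
    return reachable
```

Without the final loop, a GCU worker would be reported as unreachable even though its parent delegates to it.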
@@ -734,4 +705,48 @@ class GraphSpec(BaseModel):
else:
seen_keys[key] = node_id
# GCU nodes must only be used as subagents
gcu_node_ids = {n.id for n in self.nodes if n.node_type == "gcu"}
if gcu_node_ids:
# GCU nodes must not be entry nodes
if self.entry_node in gcu_node_ids:
errors.append(
f"GCU node '{self.entry_node}' is used as entry node. "
"GCU nodes must only be used as subagents via delegate_to_sub_agent()."
)
# GCU nodes must not be terminal nodes
for term in self.terminal_nodes:
if term in gcu_node_ids:
errors.append(
f"GCU node '{term}' is used as terminal node. "
"GCU nodes must only be used as subagents."
)
# GCU nodes must not be connected via edges
for edge in self.edges:
if edge.source in gcu_node_ids:
errors.append(
f"GCU node '{edge.source}' is used as edge source (edge '{edge.id}'). "
"GCU nodes must only be used as subagents, not connected via edges."
)
if edge.target in gcu_node_ids:
errors.append(
f"GCU node '{edge.target}' is used as edge target (edge '{edge.id}'). "
"GCU nodes must only be used as subagents, not connected via edges."
)
# GCU nodes must be referenced in at least one parent's sub_agents
referenced_subagents = set()
for node in self.nodes:
for sa_id in node.sub_agents or []:
referenced_subagents.add(sa_id)
orphaned = gcu_node_ids - referenced_subagents
for nid in orphaned:
errors.append(
f"GCU node '{nid}' is not referenced in any node's sub_agents list. "
"GCU nodes must be declared as subagents of a parent node."
)
return errors
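The GCU rules added to validation can be exercised on a small hand-built graph. The sketch below restates the four checks on plain data, assuming nodes are dicts with `id`, `node_type` and `sub_agents`, and edges are `(source, target)` tuples. `validate_gcu_usage` is a hypothetical free function, and the error strings are shortened.

```python
def validate_gcu_usage(
    nodes: list[dict],
    edges: list[tuple[str, str]],
    entry_node: str,
    terminal_nodes: list[str],
) -> list[str]:
    """Return errors for GCU nodes that are not used purely as subagents."""
    gcu_ids = {n["id"] for n in nodes if n.get("node_type") == "gcu"}
    errors: list[str] = []
    if entry_node in gcu_ids:
        errors.append(f"GCU node '{entry_node}' is used as entry node.")
    for term in terminal_nodes:
        if term in gcu_ids:
            errors.append(f"GCU node '{term}' is used as terminal node.")
    for source, target in edges:
        for end in (source, target):
            if end in gcu_ids:
                errors.append(f"GCU node '{end}' is connected via edge {source} -> {target}.")
    # Every GCU node must appear in some parent's sub_agents list.
    referenced = {sa for n in nodes for sa in n.get("sub_agents") or []}
    for nid in sorted(gcu_ids - referenced):
        errors.append(f"GCU node '{nid}' is not referenced in any sub_agents list.")
    return errors
```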
File diff suppressed because it is too large.
+200 -39
@@ -138,6 +138,7 @@ class GraphExecutor:
accounts_prompt: str = "",
accounts_data: list[dict] | None = None,
tool_provider_map: dict[str, str] | None = None,
dynamic_tools_provider: Callable | None = None,
):
"""
Initialize the executor.
@@ -160,6 +161,8 @@ class GraphExecutor:
accounts_prompt: Connected accounts block for system prompt injection
accounts_data: Raw account data for per-node prompt generation
tool_provider_map: Tool name to provider name mapping for account routing
dynamic_tools_provider: Optional callback returning current
tool list (for mode switching)
"""
self.runtime = runtime
self.llm = llm
@@ -178,12 +181,14 @@ class GraphExecutor:
self.accounts_prompt = accounts_prompt
self.accounts_data = accounts_data
self.tool_provider_map = tool_provider_map
self.dynamic_tools_provider = dynamic_tools_provider
# Initialize output cleaner
# Initialize output cleaner — uses its own dedicated fast model (CEREBRAS_API_KEY),
# never the main agent LLM. Passing the main LLM here would cause expensive
# Anthropic calls for output cleaning whenever ANTHROPIC_API_KEY is set.
self.cleansing_config = cleansing_config or CleansingConfig()
self.output_cleaner = OutputCleaner(
config=self.cleansing_config,
llm_provider=llm,
)
# Parallel execution settings
@@ -193,6 +198,9 @@ class GraphExecutor:
# Pause/resume control
self._pause_requested = asyncio.Event()
# Track the currently executing node for external injection routing
self.current_node_id: str | None = None
def _write_progress(
self,
current_node: str,
@@ -283,6 +291,125 @@ class GraphExecutor:
return errors
# Max chars of formatted messages before proactively splitting for LLM.
_PHASE_LLM_CHAR_LIMIT = 240_000
_PHASE_LLM_MAX_DEPTH = 10
async def _phase_llm_compact(
self,
conversation: Any,
next_spec: NodeSpec,
messages: list,
_depth: int = 0,
) -> str:
"""Summarise messages for phase-boundary compaction.
Uses the same recursive binary-search splitting as EventLoopNode.
"""
from framework.graph.conversation import extract_tool_call_history
from framework.graph.event_loop_node import _is_context_too_large_error
if _depth > self._PHASE_LLM_MAX_DEPTH:
raise RuntimeError("Phase LLM compaction recursion limit")
# Format messages
lines: list[str] = []
for m in messages:
if m.role == "tool":
c = m.content[:500] + ("..." if len(m.content) > 500 else "")
lines.append(f"[tool result]: {c}")
elif m.role == "assistant" and m.tool_calls:
names = [tc.get("function", {}).get("name", "?") for tc in m.tool_calls]
lines.append(
f"[assistant (calls: {', '.join(names)})]: "
f"{m.content[:200] if m.content else ''}"
)
else:
lines.append(f"[{m.role}]: {m.content}")
formatted = "\n\n".join(lines)
# Proactive split
if len(formatted) > self._PHASE_LLM_CHAR_LIMIT and len(messages) > 1:
summary = await self._phase_llm_compact_split(
conversation,
next_spec,
messages,
_depth,
)
else:
max_tokens = getattr(conversation, "_max_history_tokens", 32000)
target_tokens = max_tokens // 2
target_chars = target_tokens * 4
prompt = (
"You are compacting an AI agent's conversation history "
"at a phase boundary.\n\n"
f"NEXT PHASE: {next_spec.name}\n"
)
if next_spec.description:
prompt += f"NEXT PHASE PURPOSE: {next_spec.description}\n"
prompt += (
f"\nCONVERSATION MESSAGES:\n{formatted}\n\n"
"INSTRUCTIONS:\n"
f"Write a summary of approximately {target_chars} characters "
f"(~{target_tokens} tokens).\n"
"Preserve user-stated rules, constraints, and preferences "
"verbatim. Preserve key decisions and results from earlier "
"phases. Preserve context needed for the next phase.\n"
)
summary_budget = max(1024, max_tokens // 2)
try:
response = await self._llm.acomplete(
messages=[{"role": "user", "content": prompt}],
system=(
"You are a conversation compactor. Write a detailed "
"summary preserving context for the next phase."
),
max_tokens=summary_budget,
)
summary = response.content
except Exception as e:
if _is_context_too_large_error(e) and len(messages) > 1:
summary = await self._phase_llm_compact_split(
conversation,
next_spec,
messages,
_depth,
)
else:
raise
# Append tool history at top level only
if _depth == 0:
tool_history = extract_tool_call_history(messages)
if tool_history and "TOOLS ALREADY CALLED" not in summary:
summary += "\n\n" + tool_history
return summary
async def _phase_llm_compact_split(
self,
conversation: Any,
next_spec: NodeSpec,
messages: list,
_depth: int,
) -> str:
"""Split messages in half and summarise each half."""
mid = max(1, len(messages) // 2)
s1 = await self._phase_llm_compact(
conversation,
next_spec,
messages[:mid],
_depth + 1,
)
s2 = await self._phase_llm_compact(
conversation,
next_spec,
messages[mid:],
_depth + 1,
)
return s1 + "\n\n" + s2
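The recursive splitting in `_phase_llm_compact` and `_phase_llm_compact_split` can be illustrated in isolation. The following is a minimal sketch, not the framework's code: `ContextTooLarge`, `fake_summarise`, `compact`, `split`, and both limit parameters are hypothetical stand-ins, and the "LLM" simply rejects input above a character limit.

```python
import asyncio


class ContextTooLarge(Exception):
    """Hypothetical stand-in for a provider's context-window error."""


async def fake_summarise(text: str, model_limit: int) -> str:
    # Hypothetical LLM call: rejects inputs longer than model_limit characters.
    if len(text) > model_limit:
        raise ContextTooLarge(f"{len(text)} chars exceeds {model_limit}")
    return f"<summary of {len(text)} chars>"


async def compact(
    messages: list[str],
    proactive_limit: int,
    model_limit: int,
    depth: int = 0,
    max_depth: int = 10,
) -> str:
    if depth > max_depth:
        raise RuntimeError("compaction recursion limit")
    formatted = "\n\n".join(messages)
    # Proactive split: skip the call when the input is clearly too large.
    if len(formatted) > proactive_limit and len(messages) > 1:
        return await split(messages, proactive_limit, model_limit, depth, max_depth)
    try:
        return await fake_summarise(formatted, model_limit)
    except ContextTooLarge:
        # Reactive split: the call was attempted and rejected.
        if len(messages) > 1:
            return await split(messages, proactive_limit, model_limit, depth, max_depth)
        raise


async def split(messages, proactive_limit, model_limit, depth, max_depth) -> str:
    # Halve the message list and summarise each half independently.
    mid = max(1, len(messages) // 2)
    first = await compact(messages[:mid], proactive_limit, model_limit, depth + 1, max_depth)
    second = await compact(messages[mid:], proactive_limit, model_limit, depth + 1, max_depth)
    return first + "\n\n" + second


history = ["a" * 40, "b" * 40, "c" * 40, "d" * 40]
print(asyncio.run(compact(history, proactive_limit=1000, model_limit=100)))
```

In this sketch a single message that is still too large is not split further, so the error propagates to the caller, which mirrors the `len(messages) > 1` guard in the hunk above.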
async def execute(
self,
graph: GraphSpec,
@@ -338,6 +465,9 @@ class GraphExecutor:
cumulative_tool_names: set[str] = set()
cumulative_output_keys: list[str] = [] # Output keys from all visited nodes
# Build node registry for subagent lookup
node_registry: dict[str, NodeSpec] = {node.id: node for node in graph.nodes}
# Initialize checkpoint store if checkpointing is enabled
checkpoint_store: CheckpointStore | None = None
if checkpoint_config and checkpoint_config.enabled and self._storage_path:
@@ -491,11 +621,14 @@ class GraphExecutor:
# node doesn't restore a filled OutputAccumulator from the previous
# webhook run (which would cause the judge to accept immediately).
# The conversation history is preserved (continuous memory).
# Exclude cold restores — those need to continue the conversation
# naturally without a "start fresh" marker.
_is_fresh_shared = bool(
session_state
and session_state.get("resume_session_id")
and not session_state.get("paused_at")
and not session_state.get("resume_from_checkpoint")
and not session_state.get("cold_restore")
)
if _is_fresh_shared and is_continuous and self._storage_path:
try:
@@ -694,6 +827,9 @@ class GraphExecutor:
# Execute this node, then pause
# (We'll check again after execution and save state)
# Expose current node for external injection routing
self.current_node_id = current_node_id
self.logger.info(f"\n▶ Step {steps}: {node_spec.name} ({node_spec.node_type})")
self.logger.info(f" Inputs: {node_spec.input_keys}")
self.logger.info(f" Outputs: {node_spec.output_keys}")
@@ -729,6 +865,7 @@ class GraphExecutor:
override_tools=cumulative_tools if is_continuous else None,
cumulative_output_keys=cumulative_output_keys if is_continuous else None,
event_triggered=_event_triggered,
node_registry=node_registry,
identity_prompt=getattr(graph, "identity_prompt", ""),
narrative=_resume_narrative,
graph=graph,
@@ -1131,6 +1268,7 @@ class GraphExecutor:
source_result=result,
source_node_spec=node_spec,
path=path,
node_registry=node_registry,
)
total_tokens += branch_tokens
@@ -1280,9 +1418,7 @@ class GraphExecutor:
# Set current phase for phase-aware compaction
continuous_conversation.set_current_phase(next_spec.id)
# Opportunistic compaction at transition:
# 1. Prune old tool results (free, no LLM call)
# 2. If still over 80%, do a phase-graduated compact
# Phase-boundary compaction (same flow as EventLoopNode._compact)
if continuous_conversation.usage_ratio() > 0.5:
await continuous_conversation.prune_old_tool_results(
protect_tokens=2000,
@@ -1294,42 +1430,64 @@ class GraphExecutor:
_phase_ratio * 100,
)
_data_dir = (
str(self._storage_path / "data")
if self._storage_path
else None
str(self._storage_path / "data") if self._storage_path else None
)
# Step 1: Structural compaction (>=80%)
if _data_dir:
_pre = continuous_conversation.usage_ratio()
await continuous_conversation.compact_preserving_structure(
spillover_dir=_data_dir,
keep_recent=4,
phase_graduated=True,
)
# Circuit breaker: if still over budget, fall back
_post_ratio = continuous_conversation.usage_ratio()
if _post_ratio >= 0.9 * _phase_ratio:
self.logger.warning(
" Structure-preserving compaction ineffective "
"(%.0f%% -> %.0f%%), falling back to summary",
_phase_ratio * 100,
_post_ratio * 100,
)
summary = (
f"Summary of earlier phases (before {next_spec.name}). "
"See transition markers for phase details."
)
await continuous_conversation.compact(
summary,
if continuous_conversation.usage_ratio() >= 0.9 * _pre:
await continuous_conversation.compact_preserving_structure(
spillover_dir=_data_dir,
keep_recent=4,
phase_graduated=True,
aggressive=True,
)
else:
# Step 2: LLM compaction (>95%)
if (
continuous_conversation.usage_ratio() > 0.95
and self._llm is not None
):
self.logger.info(
" LLM phase-boundary compaction (%.0f%% usage)",
continuous_conversation.usage_ratio() * 100,
)
try:
_llm_summary = await self._phase_llm_compact(
continuous_conversation,
next_spec,
list(continuous_conversation.messages),
)
await continuous_conversation.compact(
_llm_summary,
keep_recent=2,
phase_graduated=True,
)
except Exception as e:
self.logger.warning(
" Phase LLM compaction failed: %s",
e,
)
# Step 3: Emergency (only if still over budget)
if continuous_conversation.needs_compaction():
self.logger.warning(
" Emergency phase compaction (%.0f%%)",
continuous_conversation.usage_ratio() * 100,
)
summary = (
f"Summary of earlier phases (before {next_spec.name}). "
f"Summary of earlier phases "
f"(before {next_spec.name}). "
"See transition markers for phase details."
)
await continuous_conversation.compact(
summary,
keep_recent=4,
keep_recent=1,
phase_graduated=True,
)
@@ -1585,6 +1743,7 @@ class GraphExecutor:
event_triggered: bool = False,
identity_prompt: str = "",
narrative: str = "",
node_registry: dict[str, NodeSpec] | None = None,
graph: "GraphSpec | None" = None,
) -> NodeContext:
"""Build execution context for a node."""
@@ -1614,17 +1773,7 @@ class GraphExecutor:
node_tool_names=node_spec.tools,
)
# Build goal context, enriched with capability summary for
# client-facing nodes so the LLM knows what the full agent can do.
goal_context = goal.to_prompt_context()
if graph and node_spec.client_facing:
capability_summary = graph.build_capability_summary(graph.entry_node)
if capability_summary:
goal_context = (
f"{goal_context}\n\n{capability_summary}"
if goal_context
else capability_summary
)
return NodeContext(
runtime=self.runtime,
@@ -1648,10 +1797,15 @@ class GraphExecutor:
narrative=narrative,
execution_id=self._execution_id,
stream_id=self._stream_id,
node_registry=node_registry or {},
all_tools=list(self.tools), # Full catalog for subagent tool resolution
shared_node_registry=self.node_registry, # For subagent escalation routing
dynamic_tools_provider=self.dynamic_tools_provider,
)
VALID_NODE_TYPES = {
"event_loop",
"gcu",
}
# Node types removed in v0.5 — provide migration guidance
REMOVED_NODE_TYPES = {
@@ -1686,8 +1840,8 @@ class GraphExecutor:
f"Must be one of: {sorted(self.VALID_NODE_TYPES)}."
)
# Create based on type (only event_loop is valid)
if node_spec.node_type == "event_loop":
# Create based on type
if node_spec.node_type in ("event_loop", "gcu"):
# Auto-create EventLoopNode with sensible defaults.
# Custom configs can still be pre-registered via node_registry.
from framework.graph.event_loop_node import EventLoopNode, LoopConfig
@@ -1904,6 +2058,7 @@ class GraphExecutor:
source_result: NodeResult,
source_node_spec: Any,
path: list[str],
node_registry: dict[str, NodeSpec] | None = None,
) -> tuple[dict[str, NodeResult], int, int]:
"""
Execute multiple branches in parallel using asyncio.gather.
@@ -2002,7 +2157,13 @@ class GraphExecutor:
# Build context for this branch
ctx = self._build_context(
node_spec, memory, goal, mapped, graph.max_tokens, graph=graph
node_spec,
memory,
goal,
mapped,
graph.max_tokens,
node_registry=node_registry,
graph=graph,
)
node_impl = self._get_node_implementation(node_spec, graph.cleanup_llm_model)
+23
@@ -0,0 +1,23 @@
"""File tools MCP server constants.
Analogous to ``gcu.py``: defines the server name and default stdio config
so the runner can auto-register the files MCP server for any agent that has
``event_loop`` or ``gcu`` nodes.
"""
# ---------------------------------------------------------------------------
# MCP server identity
# ---------------------------------------------------------------------------
FILES_MCP_SERVER_NAME = "files-tools"
"""Name used to identify the file tools MCP server in ``mcp_servers.json``."""
FILES_MCP_SERVER_CONFIG: dict = {
"name": FILES_MCP_SERVER_NAME,
"transport": "stdio",
"command": "uv",
"args": ["run", "python", "files_server.py", "--stdio"],
"cwd": "../../tools",
"description": "File tools for reading, writing, editing, and searching files",
}
"""Default stdio config for the file tools MCP server (relative to exports/<agent>/)."""
+86
@@ -0,0 +1,86 @@
"""GCU (browser automation) node type constants.
A ``gcu`` node is an ``event_loop`` node with two automatic enhancements:
1. A canonical browser best-practices system prompt is prepended.
2. All tools from the GCU MCP server are auto-included.
There is no new ``NodeProtocol`` subclass; the ``gcu`` type is purely a declarative
signal processed by the runner and executor at setup time.
"""
# ---------------------------------------------------------------------------
# MCP server identity
# ---------------------------------------------------------------------------
GCU_SERVER_NAME = "gcu-tools"
"""Name used to identify the GCU MCP server in ``mcp_servers.json``."""
GCU_MCP_SERVER_CONFIG: dict = {
"name": GCU_SERVER_NAME,
"transport": "stdio",
"command": "uv",
"args": ["run", "python", "-m", "gcu.server", "--stdio"],
"cwd": "../../tools",
"description": "GCU tools for browser automation",
}
"""Default stdio config for the GCU MCP server (relative to exports/<agent>/)."""
# ---------------------------------------------------------------------------
# Browser best-practices system prompt
# ---------------------------------------------------------------------------
GCU_BROWSER_SYSTEM_PROMPT = """\
# Browser Automation Best Practices
Follow these rules for reliable, efficient browser interaction.
## Reading Pages
- ALWAYS prefer `browser_snapshot` over `browser_get_text("body")`:
it returns a compact ~1-5 KB accessibility tree vs 100+ KB of raw HTML.
- Use `browser_snapshot_aria` when you need full ARIA properties
for detailed element inspection.
- Do NOT use `browser_screenshot` for reading text content;
it produces huge base64 images with no searchable text.
- Only fall back to `browser_get_text` for extracting specific
small elements by CSS selector.
## Navigation & Waiting
- Always call `browser_wait` after navigation actions
(`browser_open`, `browser_navigate`, `browser_click` on links)
to let the page load.
- NEVER re-navigate to the same URL after scrolling;
this resets your scroll position and loses loaded content.
## Scrolling
- Use large scroll amounts (~2000) when loading more content;
sites like Twitter and LinkedIn use lazy loading for paging.
- After scrolling, take a new `browser_snapshot` to see updated content.
## Error Recovery
- If a tool fails, retry once with the same approach.
- If it fails a second time, STOP retrying and switch approach.
- If `browser_snapshot` fails, try `browser_get_text` with a
specific small selector as fallback.
- If `browser_open` fails or the page seems stale, call `browser_stop`,
then `browser_start`, then retry.
## Tab Management
- Use `browser_tabs` to list open tabs when managing multiple pages.
- Pass `target_id` to tools when operating on a specific tab.
- Open background tabs with `browser_open(url=..., background=true)`
to avoid losing your current context.
- Close tabs you no longer need with `browser_close` to free resources.
## Login & Auth Walls
- If you see a "Log in" or "Sign up" prompt instead of expected
content, report the auth wall immediately; do NOT attempt to log in.
- Check for cookie consent banners and dismiss them if they block content.
## Efficiency
- Minimize tool calls: combine actions where possible.
- When a snapshot result is saved to a spillover file, use
`run_command` with grep to extract specific data rather than
re-reading the full file.
- Call `set_output` in the same turn as your last browser action
when possible; don't waste a turn.
"""
+4 -56
@@ -154,69 +154,17 @@ class HITLProtocol:
"""
Parse human's raw input into structured response.
Uses Haiku to intelligently extract answers for each question.
Maps the raw input to the first question. For multi-question HITL,
the caller should present one question at a time.
"""
import os
response = HITLResponse(request_id=request.request_id, raw_input=raw_input)
# If no questions, just return raw input
if not request.questions:
return response
# Try to use Haiku for intelligent parsing
api_key = os.environ.get("ANTHROPIC_API_KEY")
if not use_haiku or not api_key:
# Simple fallback: treat as answer to first question
if request.questions:
response.answers[request.questions[0].id] = raw_input
return response
# Use Haiku to extract answers
try:
import json
import anthropic
questions_str = "\n".join(
[f"{i + 1}. {q.question} (id: {q.id})" for i, q in enumerate(request.questions)]
)
prompt = f"""Parse the user's response and extract answers for each question.
Questions asked:
{questions_str}
User's response:
{raw_input}
Extract the answer for each question. Output JSON with question IDs as keys.
Example format:
{{"question-1": "answer here", "question-2": "answer here"}}"""
client = anthropic.Anthropic(api_key=api_key)
message = client.messages.create(
model="claude-haiku-4-5-20251001",
max_tokens=500,
messages=[{"role": "user", "content": prompt}],
)
# Parse Haiku's response
import re
response_text = message.content[0].text.strip()
json_match = re.search(r"\{[^{}]*\}", response_text, re.DOTALL)
if json_match:
parsed = json.loads(json_match.group())
response.answers = parsed
except Exception:
# Fallback: use raw input for first question
if request.questions:
response.answers[request.questions[0].id] = raw_input
# Map raw input to first question
response.answers[request.questions[0].id] = raw_input
return response
@staticmethod
+37 -55
@@ -166,7 +166,7 @@ class NodeSpec(BaseModel):
# Node behavior type
node_type: str = Field(
default="event_loop",
description="Type: 'event_loop' (recommended), 'router', 'human_input'.",
description="Type: 'event_loop' (recommended), 'gcu' (browser automation).",
)
# Data flow
@@ -204,6 +204,16 @@ class NodeSpec(BaseModel):
default=None, description="Specific model to use (defaults to graph default)"
)
# For subagent delegation
sub_agents: list[str] = Field(
default_factory=list,
description="Node IDs that can be invoked as subagents from this node",
)
# For function nodes
function: str | None = Field(
default=None, description="Function name or path for function nodes"
)
# For router nodes
routes: dict[str, str] = Field(
default_factory=dict, description="Condition -> target_node_id mapping for routers"
@@ -520,6 +530,25 @@ class NodeContext:
# Falls back to node_id when not set (legacy / standalone executor).
stream_id: str = ""
# Subagent mode
is_subagent_mode: bool = False # True when running as a subagent (prevents nested delegation)
report_callback: Any = None # async (message: str, data: dict | None) -> None
node_registry: dict[str, "NodeSpec"] = field(default_factory=dict) # For subagent lookup
# Full tool catalog (unfiltered) — used by _execute_subagent to resolve
# subagent tools that aren't in the parent node's filtered available_tools.
all_tools: list[Tool] = field(default_factory=list)
# Shared reference to the executor's node_registry — used by subagent
# escalation (_EscalationReceiver) to register temporary receivers that
# the inject_input() routing chain can find.
shared_node_registry: dict[str, Any] = field(default_factory=dict)
# Dynamic tool provider — when set, EventLoopNode rebuilds the tool
# list from this callback at the start of each iteration. Used by
# the queen to switch between building-mode and running-mode tools.
dynamic_tools_provider: Any = None # Callable[[], list[Tool]] | None
@dataclass
class NodeResult:
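The `dynamic_tools_provider` field above is described as a callback that is re-invoked at the start of each iteration. A minimal sketch of that pattern follows; `Tool`, `ModeSwitch`, `tool_names_per_iteration`, and the tool names are all hypothetical and only illustrate how a zero-argument callable lets the tool list change between iterations.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Tool:
    """Hypothetical minimal stand-in for the framework's Tool type."""

    name: str


@dataclass
class ModeSwitch:
    """Hypothetical holder of a mode flag and per-mode tool lists."""

    mode: str = "building"
    tools_by_mode: dict[str, list[Tool]] = field(default_factory=dict)

    def provide_tools(self) -> list[Tool]:
        # Return a fresh list so callers cannot mutate the stored catalog.
        return list(self.tools_by_mode.get(self.mode, []))


def tool_names_per_iteration(
    provider: Callable[[], list[Tool]],
    on_iteration: Callable[[int], None],
    iterations: int,
) -> list[list[str]]:
    seen: list[list[str]] = []
    for i in range(iterations):
        on_iteration(i)  # something external may flip the mode here
        seen.append([t.name for t in provider()])  # tool list rebuilt on every pass
    return seen


switch = ModeSwitch(
    tools_by_mode={"building": [Tool("build_tool")], "running": [Tool("run_tool")]}
)


def flip(i: int) -> None:
    # Switch modes just before the second iteration.
    if i == 1:
        switch.mode = "running"


print(tool_names_per_iteration(switch.provide_tools, flip, 3))
```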
@@ -556,7 +585,6 @@ class NodeResult:
Generate a human-readable summary of this node's execution and output.
This is like toString() - it describes what the node produced in its current state.
Uses Haiku to intelligently summarize complex outputs.
"""
if not self.success:
return f"❌ Failed: {self.error}"
@@ -564,59 +592,13 @@ class NodeResult:
if not self.output:
return "✓ Completed (no output)"
# Use Haiku to generate intelligent summary
import os
api_key = os.environ.get("ANTHROPIC_API_KEY")
if not api_key:
# Fallback: simple key-value listing
parts = [f"✓ Completed with {len(self.output)} outputs:"]
for key, value in list(self.output.items())[:5]: # Limit to 5 keys
value_str = str(value)[:100]
if len(str(value)) > 100:
value_str += "..."
parts.append(f"{key}: {value_str}")
return "\n".join(parts)
# Use Haiku to generate intelligent summary
try:
import json
import anthropic
node_context = ""
if node_spec:
node_context = f"\nNode: {node_spec.name}\nPurpose: {node_spec.description}"
output_json = json.dumps(self.output, indent=2, default=str)[:2000]
prompt = (
f"Generate a 1-2 sentence human-readable summary of "
f"what this node produced.{node_context}\n\n"
f"Node output:\n{output_json}\n\n"
"Provide a concise, clear summary that a human can quickly "
"understand. Focus on the key information produced."
)
client = anthropic.Anthropic(api_key=api_key)
message = client.messages.create(
model="claude-haiku-4-5-20251001",
max_tokens=200,
messages=[{"role": "user", "content": prompt}],
)
summary = message.content[0].text.strip()
return f"{summary}"
except Exception:
# Fallback on error
parts = [f"✓ Completed with {len(self.output)} outputs:"]
for key, value in list(self.output.items())[:3]:
value_str = str(value)[:80]
if len(str(value)) > 80:
value_str += "..."
parts.append(f"{key}: {value_str}")
return "\n".join(parts)
parts = [f"✓ Completed with {len(self.output)} outputs:"]
for key, value in list(self.output.items())[:5]: # Limit to 5 keys
value_str = str(value)[:100]
if len(str(value)) > 100:
value_str += "..."
parts.append(f"{key}: {value_str}")
return "\n".join(parts)
class NodeProtocol(ABC):
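With the Haiku call removed, the summary reduces to a truncated key-value listing. The sketch below restates that logic as a standalone function so it can be run without the framework; the function name `summarise_output` and its keyword parameters are assumptions, while the limits (5 keys, 100 characters) are taken from the hunk above.

```python
def summarise_output(output: dict, max_keys: int = 5, max_chars: int = 100) -> str:
    """List up to max_keys outputs, truncating each value to max_chars."""
    if not output:
        return "✓ Completed (no output)"
    parts = [f"✓ Completed with {len(output)} outputs:"]
    for key, value in list(output.items())[:max_keys]:
        text = str(value)
        if len(text) > max_chars:
            text = text[:max_chars] + "..."
        parts.append(f"{key}: {text}")
    return "\n".join(parts)


print(summarise_output({"report": "x" * 150, "count": 3}))
```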
+1 -1
@@ -280,7 +280,7 @@ def build_transition_marker(
]
if file_lines:
sections.append(
"\nData files (use load_data to access):\n" + "\n".join(file_lines)
"\nData files (use read_file to access):\n" + "\n".join(file_lines)
)
# Agent working memory
+30 -34
@@ -170,7 +170,7 @@ def _dump_failed_request(
"temperature": kwargs.get("temperature"),
}
with open(filepath, "w") as f:
with open(filepath, "w", encoding="utf-8") as f:
json.dump(dump_data, f, indent=2, default=str)
return str(filepath)
@@ -237,6 +237,11 @@ def _is_stream_transient_error(exc: BaseException) -> bool:
Transient errors (recoverable=True): network issues, server errors, timeouts.
Permanent errors (recoverable=False): auth, bad request, context window, etc.
NOTE: "Failed to parse tool call arguments" (malformed LLM output) is NOT
transient at the stream level: retrying with the same messages produces the
same malformed output. This error is handled at the EventLoopNode level
where the conversation can be modified before retrying.
"""
try:
from litellm.exceptions import (
@@ -783,9 +788,7 @@ class LiteLLMProvider(LLMProvider):
m
for m in full_messages
if not (
m.get("role") == "assistant"
and not m.get("content")
and not m.get("tool_calls")
m.get("role") == "assistant" and not m.get("content") and not m.get("tool_calls")
)
]
@@ -919,30 +922,6 @@ class LiteLLMProvider(LLMProvider):
# and we skip the retry path — nothing was yielded in vain.)
has_content = accumulated_text or tool_calls_acc
if not has_content:
# If the conversation ends with an assistant or tool
# message, an empty stream is expected — the LLM has
# nothing new to say. Don't burn retries on this;
# let the caller (EventLoopNode) decide what to do.
# Typical case: client_facing node where the LLM set
# all outputs via set_output tool calls, and the tool
# results are the last messages.
last_role = next(
(m["role"] for m in reversed(full_messages) if m.get("role") != "system"),
None,
)
if last_role in ("assistant", "tool"):
logger.warning(
"[stream] %s returned empty stream after %s message "
"(no text, no tool calls). Treating as a no-op turn. "
"If this repeats, the agent may be stuck — check for "
"ghost empty assistant messages in conversation history.",
self.model,
last_role,
)
for event in tail_events:
yield event
return
# finish_reason=length means the model exhausted
# max_tokens before producing content. Retrying with
# the same max_tokens will never help.
@@ -960,10 +939,16 @@ class LiteLLMProvider(LLMProvider):
yield event
return
# Empty stream after a user message — use short fixed
# retries, not the rate-limit backoff. This is likely
# a deterministic conversation-structure issue, so long
# exponential waits don't help.
# Empty stream — always retry regardless of last message
# role. Ghost empty streams after tool results are NOT
# expected no-ops; they create infinite loops when the
# conversation doesn't change between iterations.
# After retries, return the empty result and let the
# caller (EventLoopNode) decide how to handle it.
last_role = next(
(m["role"] for m in reversed(full_messages) if m.get("role") != "system"),
None,
)
if attempt < EMPTY_STREAM_MAX_RETRIES:
token_count, token_method = _estimate_tokens(
self.model,
@@ -976,7 +961,8 @@ class LiteLLMProvider(LLMProvider):
attempt=attempt,
)
logger.warning(
f"[stream-retry] {self.model} returned empty stream "
f"[stream-retry] {self.model} returned empty stream "
f"after {last_role} message — "
f"~{token_count} tokens ({token_method}). "
f"Request dumped to: {dump_path}. "
f"Retrying in {EMPTY_STREAM_RETRY_DELAY}s "
@@ -985,7 +971,17 @@ class LiteLLMProvider(LLMProvider):
await asyncio.sleep(EMPTY_STREAM_RETRY_DELAY)
continue
# Success (or final attempt) — flush remaining events.
# All retries exhausted — log and return the empty
# result. EventLoopNode's empty response guard will
# accept if all outputs are set, or handle the ghost
# stream case if outputs are still missing.
logger.error(
f"[stream] {self.model} returned empty stream after "
f"{EMPTY_STREAM_MAX_RETRIES} retries "
f"(last_role={last_role}). Returning empty result."
)
# Success (or empty after exhausted retries) — flush events.
for event in tail_events:
yield event
return
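The revised behaviour (retry an empty stream a fixed number of times, then hand the empty result back to the caller) can be sketched independently of LiteLLM. Everything below is illustrative: `stream_with_retry`, `flaky_stream`, and the two constant values are assumptions, and the real provider also logs and dumps the failed request.

```python
import asyncio

EMPTY_STREAM_MAX_RETRIES = 3  # assumed value, for illustration only
EMPTY_STREAM_RETRY_DELAY = 0.01  # assumed value, in seconds


async def stream_with_retry(
    make_stream,
    max_retries: int = EMPTY_STREAM_MAX_RETRIES,
    delay: float = EMPTY_STREAM_RETRY_DELAY,
) -> list[str]:
    """Collect chunks from make_stream(), retrying while the stream is empty."""
    for attempt in range(max_retries + 1):
        chunks = [chunk async for chunk in make_stream()]
        if chunks:
            return chunks
        if attempt < max_retries:
            # Short fixed delay, not an exponential rate-limit backoff.
            await asyncio.sleep(delay)
    # Retries exhausted: return the empty result and let the caller decide.
    return []


calls = {"n": 0}


async def flaky_stream():
    # Hypothetical stream that is empty on the first two calls.
    calls["n"] += 1
    if calls["n"] >= 3:
        yield "hello"


print(asyncio.run(stream_with_retry(flaky_stream)))
```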
+87 -91
@@ -10,6 +10,7 @@ Usage:
import json
import logging
import os
import shutil
import sys
from datetime import datetime
from pathlib import Path
@@ -161,7 +162,7 @@ def _load_session(session_id: str) -> BuildSession:
if not session_file.exists():
raise ValueError(f"Session '{session_id}' not found")
with open(session_file) as f:
with open(session_file, encoding="utf-8") as f:
data = json.load(f)
return BuildSession.from_dict(data)
@@ -173,7 +174,7 @@ def _load_active_session() -> BuildSession | None:
return None
try:
with open(ACTIVE_SESSION_FILE) as f:
with open(ACTIVE_SESSION_FILE, encoding="utf-8") as f:
session_id = f.read().strip()
if session_id:
@@ -227,7 +228,7 @@ def list_sessions() -> str:
if SESSIONS_DIR.exists():
for session_file in SESSIONS_DIR.glob("*.json"):
try:
with open(session_file) as f:
with open(session_file, encoding="utf-8") as f:
data = json.load(f)
sessions.append(
{
@@ -247,7 +248,7 @@ def list_sessions() -> str:
active_id = None
if ACTIVE_SESSION_FILE.exists():
try:
with open(ACTIVE_SESSION_FILE) as f:
with open(ACTIVE_SESSION_FILE, encoding="utf-8") as f:
active_id = f.read().strip()
except Exception:
pass
@@ -309,7 +310,7 @@ def delete_session(session_id: Annotated[str, "ID of the session to delete"]) ->
_session = None
if ACTIVE_SESSION_FILE.exists():
with open(ACTIVE_SESSION_FILE) as f:
with open(ACTIVE_SESSION_FILE, encoding="utf-8") as f:
active_id = f.read().strip()
if active_id == session_id:
ACTIVE_SESSION_FILE.unlink()
@@ -562,16 +563,29 @@ def _validate_agent_path(agent_path: str) -> tuple[Path | None, str | None]:
path = Path(agent_path)
# Resolve relative paths against project root (not MCP server's cwd)
if not path.is_absolute() and not path.exists():
resolved = _PROJECT_ROOT / path
if resolved.exists():
path = resolved
if not path.is_absolute():
path = _PROJECT_ROOT / path
# Restrict to allowed directories BEFORE checking existence to prevent
# leaking whether arbitrary filesystem paths exist on disk.
from framework.server.app import validate_agent_path
try:
path = validate_agent_path(path)
except ValueError:
return None, json.dumps(
{
"success": False,
"error": "agent_path must be inside an allowed directory "
"(exports/, examples/, or ~/.hive/agents/)",
}
)
if not path.exists():
return None, json.dumps(
{
"success": False,
"error": f"Agent path not found: {path}",
"error": f"Agent path not found: {agent_path}",
"hint": "Run export_graph to create an agent in exports/ first",
}
)
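The ordering in this hunk (restrict the path to allowed directories first, check existence second) can be sketched with `pathlib` alone. This is not the implementation of `framework.server.app.validate_agent_path`, which is not shown in the diff; `resolve_inside` and its parameters are hypothetical.

```python
import tempfile
from pathlib import Path


def resolve_inside(candidate: str, project_root: Path, allowed: list[Path]) -> Path:
    """Resolve candidate and require it to sit under one of the allowed roots."""
    path = Path(candidate)
    if not path.is_absolute():
        path = project_root / path
    # resolve() collapses ".." segments, so traversal attempts are caught below.
    resolved = path.resolve()
    for root in allowed:
        if resolved.is_relative_to(root.resolve()):
            return resolved
    # Raised before any existence check, so the error does not reveal
    # whether an arbitrary path exists on disk.
    raise ValueError("path must be inside an allowed directory")


root = Path(tempfile.mkdtemp())
allowed_roots = [root / "exports", root / "examples"]
print(resolve_inside("exports/my_agent", root, allowed_roots))
```

Only after a call like this succeeds would the caller test `path.exists()` and report a "not found" error.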
@@ -586,7 +600,7 @@ def add_node(
description: Annotated[str, "What this node does"],
node_type: Annotated[
str,
"Type: event_loop (recommended), router.",
"Type: event_loop (recommended), gcu (browser automation), router.",
],
input_keys: Annotated[str, "JSON array of keys this node reads from shared memory"],
output_keys: Annotated[str, "JSON array of keys this node writes to shared memory"],
@@ -675,8 +689,23 @@ def add_node(
if node_type == "event_loop" and not system_prompt:
warnings.append(f"Event loop node '{node_id}' should have a system_prompt")
# GCU node validation
if node_type == "gcu":
if tools_list:
warnings.append(
f"GCU node '{node_id}' auto-includes all browser tools from the "
f"gcu-tools MCP server. Manually listed tools {tools_list} will be "
f"merged with the auto-included set."
)
if not system_prompt:
warnings.append(
f"GCU node '{node_id}' has a default browser best-practices prompt. "
f"Consider adding a task-specific system_prompt — it will be appended "
f"after the browser instructions."
)
# Warn about client_facing on nodes with tools (likely autonomous work)
if node_type == "event_loop" and client_facing and tools_list:
if node_type in ("event_loop", "gcu") and client_facing and tools_list:
warnings.append(
f"Node '{node_id}' is client_facing=True but has tools {tools_list}. "
"Nodes with tools typically do autonomous work and should be "
@@ -1774,6 +1803,14 @@ def export_graph() -> str:
enriched_criteria.append(crit_dict)
export_data["goal"]["success_criteria"] = enriched_criteria
# Auto-add GCU MCP server if any node uses the gcu type
has_gcu_nodes = any(n.node_type == "gcu" for n in session.nodes)
if has_gcu_nodes:
from framework.graph.gcu import GCU_MCP_SERVER_CONFIG, GCU_SERVER_NAME
if not any(s.get("name") == GCU_SERVER_NAME for s in session.mcp_servers):
session.mcp_servers.append(dict(GCU_MCP_SERVER_CONFIG))
# === WRITE FILES TO DISK ===
# Create exports directory
exports_dir = Path("exports") / session.name
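The auto-registration step above appends the GCU server config only when no server with that name is already present. A minimal standalone sketch of that idempotent append follows; `ensure_server` is a hypothetical helper, and the config dict is copied from the `gcu.py` constants in this diff.

```python
def ensure_server(servers: list[dict], config: dict) -> bool:
    """Append a copy of config unless a server with the same name exists."""
    if any(s.get("name") == config["name"] for s in servers):
        return False
    # dict() makes a shallow copy; nested values such as "args" are still shared.
    servers.append(dict(config))
    return True


gcu_config = {
    "name": "gcu-tools",
    "transport": "stdio",
    "command": "uv",
    "args": ["run", "python", "-m", "gcu.server", "--stdio"],
    "cwd": "../../tools",
    "description": "GCU tools for browser automation",
}

registered: list[dict] = []
print(ensure_server(registered, gcu_config), ensure_server(registered, gcu_config))
```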
@@ -1864,7 +1901,7 @@ def import_from_export(
return json.dumps({"success": False, "error": f"File not found: {agent_json_path}"})
try:
data = json.loads(path.read_text())
data = json.loads(path.read_text(encoding="utf-8"))
except json.JSONDecodeError as e:
return json.dumps({"success": False, "error": f"Invalid JSON: {e}"})
@@ -2772,6 +2809,21 @@ def run_tests(
import re
import subprocess
# Guard: pytest must be available as a subprocess command.
# Install with: pip install 'framework[testing]'
if shutil.which("pytest") is None:
return json.dumps(
{
"goal_id": goal_id,
"error": (
"pytest is not installed or not on PATH. "
"Hive's test runner requires pytest at runtime. "
"Install it with: pip install 'framework[testing]' "
"or: uv pip install 'framework[testing]'"
),
}
)
path, err = _validate_agent_path(agent_path)
if err:
return err
@@ -2842,10 +2894,12 @@ def run_tests(
try:
result = subprocess.run(
cmd,
encoding="utf-8",
capture_output=True,
text=True,
timeout=600, # 10 minute timeout
env=env,
stdin=subprocess.DEVNULL,
)
except subprocess.TimeoutExpired:
return json.dumps(
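The two keyword arguments added to `subprocess.run` in this hunk pin the decoding of captured output to UTF-8 and prevent the child from waiting on inherited stdin. The sketch below shows the effect in isolation; `run_captured` and its return shape are hypothetical, and the child process is just the current Python interpreter.

```python
import subprocess
import sys


def run_captured(cmd: list[str], timeout: float = 60.0) -> dict:
    """Run cmd with captured UTF-8 output and no interactive stdin."""
    try:
        result = subprocess.run(
            cmd,
            capture_output=True,
            encoding="utf-8",  # decode stdout/stderr as UTF-8 on every platform
            timeout=timeout,
            stdin=subprocess.DEVNULL,  # reads from stdin see EOF immediately
        )
    except subprocess.TimeoutExpired:
        return {"success": False, "error": f"timed out after {timeout}s"}
    return {
        "success": result.returncode == 0,
        "stdout": result.stdout,
        "stderr": result.stderr,
    }


# The child reads all of stdin; with DEVNULL it gets an empty string at once.
outcome = run_captured([sys.executable, "-c", "import sys; print(sys.stdin.read() == '')"])
print(outcome["stdout"].strip())
```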
@@ -2965,6 +3019,22 @@ def debug_test(
import re
import subprocess
# Guard: pytest must be available as a subprocess command.
# Install with: pip install 'framework[testing]'
if shutil.which("pytest") is None:
return json.dumps(
{
"goal_id": goal_id,
"test_name": test_name,
"error": (
"pytest is not installed or not on PATH. "
"Hive's test runner requires pytest at runtime. "
"Install it with: pip install 'framework[testing]' "
"or: uv pip install 'framework[testing]'"
),
}
)
# Derive agent_path from session if not provided
if not agent_path and _session:
agent_path = f"exports/{_session.name}"
@@ -2986,7 +3056,7 @@ def debug_test(
# Find which file contains the test
test_file = None
for py_file in tests_dir.glob("test_*.py"):
content = py_file.read_text()
content = py_file.read_text(encoding="utf-8")
if f"def {test_name}" in content or f"async def {test_name}" in content:
test_file = py_file
break
@@ -3017,10 +3087,12 @@ def debug_test(
try:
result = subprocess.run(
cmd,
encoding="utf-8",
capture_output=True,
text=True,
timeout=120, # 2 minute timeout for single test
env=env,
stdin=subprocess.DEVNULL,
)
except subprocess.TimeoutExpired:
return json.dumps(
@@ -3138,7 +3210,7 @@ def list_tests(
tests = []
for test_file in sorted(tests_dir.glob("test_*.py")):
try:
content = test_file.read_text()
content = test_file.read_text(encoding="utf-8")
tree = ast.parse(content)
# Find all async function definitions that start with "test_"
@@ -3644,82 +3716,6 @@ def list_agent_sessions(
)
@mcp.tool()
def get_agent_session_state(
agent_work_dir: Annotated[str, "Path to the agent's working directory"],
session_id: Annotated[str, "The session ID (e.g., 'session_20260208_143022_abc12345')"],
) -> str:
"""
Load full session state for a specific session.
Returns complete session data including status, progress, result,
metrics, and checkpoint info. Memory values are excluded to prevent
context bloat -- use get_agent_session_memory to retrieve memory contents.
"""
state_path = Path(agent_work_dir) / "sessions" / session_id / "state.json"
data = _read_session_json(state_path)
if data is None:
return json.dumps({"error": f"Session not found: {session_id}"})
memory = data.get("memory", {})
data["memory_keys"] = list(memory.keys()) if isinstance(memory, dict) else []
data["memory_size"] = len(memory) if isinstance(memory, dict) else 0
data.pop("memory", None)
return json.dumps(data, indent=2, default=str)
@mcp.tool()
def get_agent_session_memory(
agent_work_dir: Annotated[str, "Path to the agent's working directory"],
session_id: Annotated[str, "The session ID"],
key: Annotated[str, "Specific memory key to retrieve. Empty for all."] = "",
) -> str:
"""
Get memory contents from a session.
Memory stores intermediate results passed between nodes. Use this
to inspect what data was produced during execution.
If key is provided, returns only that memory key's value.
If key is empty, returns all memory keys and their values.
"""
state_path = Path(agent_work_dir) / "sessions" / session_id / "state.json"
data = _read_session_json(state_path)
if data is None:
return json.dumps({"error": f"Session not found: {session_id}"})
memory = data.get("memory", {})
if not isinstance(memory, dict):
memory = {}
if key:
if key not in memory:
return json.dumps(
{
"error": f"Memory key not found: '{key}'",
"available_keys": list(memory.keys()),
}
)
value = memory[key]
return json.dumps(
{
"session_id": session_id,
"key": key,
"value": value,
"value_type": type(value).__name__,
},
indent=2,
default=str,
)
return json.dumps(
{"session_id": session_id, "memory": memory, "total_keys": len(memory)},
indent=2,
default=str,
)
@mcp.tool()
def list_agent_checkpoints(
agent_work_dir: Annotated[str, "Path to the agent's working directory"],
+154 -60
@@ -401,6 +401,43 @@ def register_commands(subparsers: argparse._SubParsersAction) -> None:
)
serve_parser.set_defaults(func=cmd_serve)
# open command (serve + auto-open browser)
open_parser = subparsers.add_parser(
"open",
help="Start HTTP server and open dashboard in browser",
description="Shortcut for 'hive serve --open'. "
"Starts the HTTP server and opens the dashboard.",
)
open_parser.add_argument(
"--host",
type=str,
default="127.0.0.1",
help="Host to bind (default: 127.0.0.1)",
)
open_parser.add_argument(
"--port",
"-p",
type=int,
default=8787,
help="Port to listen on (default: 8787)",
)
open_parser.add_argument(
"--agent",
"-a",
type=str,
action="append",
default=[],
help="Agent path to preload (repeatable)",
)
open_parser.add_argument(
"--model",
"-m",
type=str,
default=None,
help="LLM model for preloaded agents",
)
open_parser.set_defaults(func=cmd_open)
def _load_resume_state(
agent_path: str, session_id: str, checkpoint_id: str | None = None
@@ -428,7 +465,7 @@ def _load_resume_state(
if not cp_path.exists():
return None
try:
cp_data = json.loads(cp_path.read_text())
cp_data = json.loads(cp_path.read_text(encoding="utf-8"))
except (json.JSONDecodeError, OSError):
return None
return {
@@ -444,7 +481,7 @@ def _load_resume_state(
if not state_path.exists():
return None
try:
state_data = json.loads(state_path.read_text())
state_data = json.loads(state_path.read_text(encoding="utf-8"))
except (json.JSONDecodeError, OSError):
return None
progress = state_data.get("progress", {})
@@ -517,11 +554,28 @@ def cmd_run(args: argparse.Namespace) -> int:
return 1
elif args.input_file:
try:
with open(args.input_file) as f:
with open(args.input_file, encoding="utf-8") as f:
context = json.load(f)
except (FileNotFoundError, json.JSONDecodeError) as e:
print(f"Error reading input file: {e}", file=sys.stderr)
return 1
# Validate --output path before execution begins (fail fast, before agent loads)
if args.output:
import os
output_parent = Path(args.output).parent
if not output_parent.exists():
print(
f"Error: output directory does not exist: {output_parent}/",
file=sys.stderr,
)
return 1
if not os.access(output_parent, os.W_OK):
print(
f"Error: output directory is not writable: {output_parent}/",
file=sys.stderr,
)
return 1
# Run the agent (with TUI or standard)
if getattr(args, "tui", False):
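Review note: the fail-fast `--output` check added in this hunk can be read in isolation as the sketch below. It is an illustration only, not the project's code; `check_output_path` is a hypothetical helper name, and the message strings simply mirror the ones in the hunk.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def check_output_path(output: str) -> Optional[str]:
    """Return an error message if the parent of `output` is unusable, else None.

    Hypothetical helper: mirrors the two checks in the hunk above.
    """
    parent = Path(output).parent
    if not parent.exists():
        return f"output directory does not exist: {parent}/"
    # os.access is advisory: it reflects the permission bits at call time only.
    if not os.access(parent, os.W_OK):
        return f"output directory is not writable: {parent}/"
    return None


with tempfile.TemporaryDirectory() as tmp:
    # Parent exists and is writable, so no error is reported.
    ok = check_output_path(os.path.join(tmp, "result.json"))
    # Parent directory is missing, so the first check fires.
    missing = check_output_path(os.path.join(tmp, "no_such_dir", "result.json"))
```

Because the check runs before the agent loads, a bad path costs nothing. The final `open(args.output, "w", ...)` can still fail if the directory changes during the run, so the check is a convenience rather than a guarantee.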
@@ -659,7 +713,7 @@ def cmd_run(args: argparse.Namespace) -> int:
# Output results
if args.output:
with open(args.output, "w") as f:
with open(args.output, "w", encoding="utf-8") as f:
json.dump(output, f, indent=2, default=str)
if not args.quiet:
print(f"Results written to {args.output}")
@@ -1053,62 +1107,19 @@ def _interactive_approval(request):
def _format_natural_language_to_json(
user_input: str, input_keys: list[str], agent_description: str, session_context: dict = None
) -> dict:
"""Use Haiku to convert natural language input to JSON based on agent's input schema."""
import os
"""Convert natural language input to JSON based on agent's input schema.
import anthropic
Maps user input to the primary input field. For follow-up inputs,
appends to the existing value.
"""
main_field = input_keys[0] if input_keys else "objective"
client = anthropic.Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY"))
# Build prompt for Haiku
session_info = ""
if session_context:
# Extract the main field (usually 'objective') that we'll append to
main_field = input_keys[0] if input_keys else "objective"
existing_value = session_context.get(main_field, "")
if existing_value:
return {main_field: f"{existing_value}\n\n{user_input}"}
session_info = (
f'\n\nExisting {main_field}: "{existing_value}"\n\n'
f"The user is providing ADDITIONAL information. Append this new "
f"information to the existing {main_field} to create an enriched, "
"more detailed version."
)
prompt = f"""You are formatting user input for an agent that requires specific input fields.
Agent: {agent_description}
Required input fields: {", ".join(input_keys)}{session_info}
User input: {user_input}
{"If this is a follow-up, APPEND new info to the existing field value." if session_context else ""}
Output ONLY valid JSON, no explanation:"""
try:
message = client.messages.create(
model="claude-haiku-4-5-20251001", # Fast and cheap
max_tokens=500,
messages=[{"role": "user", "content": prompt}],
)
json_str = message.content[0].text.strip()
# Remove markdown code blocks if present
if json_str.startswith("```"):
json_str = json_str.split("```")[1]
if json_str.startswith("json"):
json_str = json_str[4:]
json_str = json_str.strip()
return json.loads(json_str)
except Exception:
# Fallback: try to infer the main field
if len(input_keys) == 1:
return {input_keys[0]: user_input}
else:
# Put it in the first field as fallback
return {input_keys[0]: user_input}
return {main_field: user_input}
def cmd_shell(args: argparse.Namespace) -> int:
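Review note: with the Haiku call removed, the added lines of this hunk appear to reduce to the logic sketched below. This is a reconstruction from a flattened diff, so treat the control flow as an assumption. `format_input_sketch` is a hypothetical name, and the `agent_description` parameter is omitted because the new body does not seem to use it.

```python
from typing import Optional


def format_input_sketch(
    user_input: str,
    input_keys: list[str],
    session_context: Optional[dict] = None,
) -> dict:
    """Map free-text input onto the first input key (assumed reconstruction)."""
    # Fall back to "objective" when the agent declares no input keys.
    main_field = input_keys[0] if input_keys else "objective"
    if session_context:
        existing_value = session_context.get(main_field, "")
        if existing_value:
            # Follow-up input is appended to what the session already holds.
            return {main_field: f"{existing_value}\n\n{user_input}"}
    return {main_field: user_input}


first = format_input_sketch("Summarize the report", ["objective", "audience"])
follow_up = format_input_sketch(
    "Focus on Q3", ["objective"], {"objective": "Summarize the report"}
)
```

One consequence worth noting for reviewers: only the first input key ever receives a value, so agents that declare several required fields get the remaining keys unset.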
@@ -1517,7 +1528,7 @@ def _extract_python_agent_metadata(agent_path: Path) -> tuple[str, str]:
return fallback_name, fallback_desc
try:
with open(config_path) as f:
with open(config_path, encoding="utf-8") as f:
tree = ast.parse(f.read())
# Find AgentMetadata class definition
@@ -1928,25 +1939,102 @@ def cmd_setup_credentials(args: argparse.Namespace) -> int:
def _open_browser(url: str) -> None:
"""Open URL in the default browser (best-effort, non-blocking)."""
import subprocess
import sys
try:
if sys.platform == "darwin":
subprocess.Popen(["open", url], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
subprocess.Popen(
["open", url],
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
encoding="utf-8",
)
elif sys.platform == "win32":
subprocess.Popen(
["cmd", "/c", "start", "", url],
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
)
elif sys.platform == "linux":
subprocess.Popen(
["xdg-open", url], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
["xdg-open", url],
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
encoding="utf-8",
)
except Exception:
pass # Best-effort — don't crash if browser can't open
def _build_frontend() -> bool:
"""Build the frontend if source is newer than dist. Returns True if dist exists."""
import subprocess
# Find the frontend directory relative to this file or cwd
candidates = [
Path("core/frontend"),
Path(__file__).resolve().parent.parent.parent / "frontend",
]
frontend_dir: Path | None = None
for c in candidates:
if (c / "package.json").is_file():
frontend_dir = c.resolve()
break
if frontend_dir is None:
return False
dist_dir = frontend_dir / "dist"
src_dir = frontend_dir / "src"
# Skip build if dist is up-to-date (newest src file older than dist index.html)
index_html = dist_dir / "index.html"
if index_html.exists() and src_dir.is_dir():
dist_mtime = index_html.stat().st_mtime
needs_build = False
for f in src_dir.rglob("*"):
if f.is_file() and f.stat().st_mtime > dist_mtime:
needs_build = True
break
if not needs_build:
return True
# Need to build
print("Building frontend...")
try:
# Ensure deps are installed
subprocess.run(
["npm", "install", "--no-fund", "--no-audit"],
encoding="utf-8",
cwd=frontend_dir,
check=True,
capture_output=True,
)
subprocess.run(
["npm", "run", "build"],
encoding="utf-8",
cwd=frontend_dir,
check=True,
capture_output=True,
)
print("Frontend built.")
return True
except FileNotFoundError:
print("Node.js not found — skipping frontend build.")
return dist_dir.is_dir()
except subprocess.CalledProcessError as exc:
# exc.stderr is already a str here: both subprocess.run calls pass encoding="utf-8".
stderr = exc.stderr or ""
print(f"Frontend build failed: {stderr[:500]}")
return dist_dir.is_dir()
def cmd_serve(args: argparse.Namespace) -> int:
"""Start the HTTP API server."""
import logging
from aiohttp import web
_build_frontend()
from framework.server.app import create_app
logging.basicConfig(
@@ -1971,7 +2059,7 @@ def cmd_serve(args: argparse.Namespace) -> int:
print(f"Error loading {agent_path}: {e}")
# Start server using AppRunner/TCPSite (same pattern as webhook_server.py)
runner = web.AppRunner(app)
runner = web.AppRunner(app, access_log=None)
await runner.setup()
site = web.TCPSite(runner, args.host, args.port)
await site.start()
@@ -2012,3 +2100,9 @@ def cmd_serve(args: argparse.Namespace) -> int:
print("\nServer stopped.")
return 0
def cmd_open(args: argparse.Namespace) -> int:
"""Start the HTTP API server and open the dashboard in the browser."""
args.open = True
return cmd_serve(args)
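Review note: the "skip build if dist is up-to-date" rule in `_build_frontend` is an mtime comparison against `dist/index.html`. The sketch below isolates that rule under stated assumptions. `needs_rebuild` is a hypothetical helper, and the file names and timestamps in the demo are made up.

```python
import os
import tempfile
from pathlib import Path


def needs_rebuild(src_dir: Path, marker: Path) -> bool:
    """True if the build marker is missing or any source file is newer than it."""
    if not marker.exists() or not src_dir.is_dir():
        return True
    marker_mtime = marker.stat().st_mtime
    return any(
        f.is_file() and f.stat().st_mtime > marker_mtime
        for f in src_dir.rglob("*")
    )


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "src"
    dist = Path(tmp) / "dist"
    src.mkdir()
    dist.mkdir()
    source_file = src / "main.tsx"
    marker = dist / "index.html"
    source_file.write_text("export {}", encoding="utf-8")
    marker.write_text("<html></html>", encoding="utf-8")

    # Source older than the marker: the build output counts as fresh.
    os.utime(source_file, (1_700_000_000, 1_700_000_000))
    os.utime(marker, (1_700_000_100, 1_700_000_100))
    fresh = needs_rebuild(src, marker)

    # Touch the source so it is newer than the marker: a rebuild is needed.
    os.utime(source_file, (1_700_000_200, 1_700_000_200))
    stale = needs_rebuild(src, marker)
```

An mtime check of this kind only sees files under `src`. Deleted source files and changes to `package.json` or other files outside `src` would not trigger a rebuild.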
+27 -8
@@ -7,6 +7,8 @@ Supports both STDIO and HTTP transports using the official MCP Python SDK.
import asyncio
import logging
import os
import sys
import threading
from dataclasses import dataclass, field
from typing import Any, Literal
@@ -73,6 +75,8 @@ class MCPClient:
# Background event loop for persistent STDIO connection
self._loop = None
self._loop_thread = None
# Serialize STDIO tool calls (avoids races, helps on Windows)
self._stdio_call_lock = threading.Lock()
def _run_async(self, coro):
"""
@@ -156,11 +160,19 @@ class MCPClient:
# Create server parameters
# Always inherit parent environment and merge with any custom env vars
merged_env = {**os.environ, **(self.config.env or {})}
# On Windows, passing cwd can cause WinError 267 ("invalid directory name").
# tool_registry passes cwd=None and uses absolute script paths when applicable.
cwd = self.config.cwd
if os.name == "nt" and cwd is not None:
# Avoid passing cwd on Windows; tool_registry should have set cwd=None
# and absolute script paths for tools-dir servers. If cwd is still set,
# pass None to prevent WinError 267 (caller should use absolute paths).
cwd = None
server_params = StdioServerParameters(
command=self.config.command,
args=self.config.args,
env=merged_env,
cwd=self.config.cwd,
cwd=cwd,
)
# Store for later use
@@ -184,10 +196,12 @@ class MCPClient:
from mcp.client.stdio import stdio_client
# Create persistent stdio client context.
# Redirect server stderr to devnull to prevent raw
# output from leaking behind the TUI.
devnull = open(os.devnull, "w") # noqa: SIM115
self._stdio_context = stdio_client(server_params, errlog=devnull)
# On Windows, use stderr so subprocess startup errors are visible.
if os.name == "nt":
errlog = sys.stderr
else:
errlog = open(os.devnull, "w") # noqa: SIM115
self._stdio_context = stdio_client(server_params, errlog=errlog)
(
self._read_stream,
self._write_stream,
@@ -353,7 +367,8 @@ class MCPClient:
raise ValueError(f"Unknown tool: {tool_name}")
if self.config.transport == "stdio":
return self._run_async(self._call_tool_stdio_async(tool_name, arguments))
with self._stdio_call_lock:
return self._run_async(self._call_tool_stdio_async(tool_name, arguments))
else:
return self._call_tool_http(tool_name, arguments)
@@ -448,11 +463,15 @@ class MCPClient:
if self._stdio_context:
await self._stdio_context.__aexit__(None, None, None)
except asyncio.CancelledError:
logger.warning(
logger.debug(
"STDIO context cleanup was cancelled; proceeding with best-effort shutdown"
)
except Exception as e:
logger.warning(f"Error closing STDIO context: {e}")
msg = str(e).lower()
if "cancel scope" in msg or "different task" in msg:
logger.debug("STDIO context teardown (known anyio quirk): %s", e)
else:
logger.warning(f"Error closing STDIO context: {e}")
finally:
self._stdio_context = None
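Review note: the new `_stdio_call_lock` serializes blocking calls that are dispatched onto the client's background event loop. The sketch below shows that pattern in isolation. It is not the `MCPClient` code; `SerializedLoopCaller` and `fake_tool_call` are hypothetical names used for illustration.

```python
import asyncio
import threading


class SerializedLoopCaller:
    """Run coroutines on a background loop, one blocking call at a time."""

    def __init__(self) -> None:
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()
        self._call_lock = threading.Lock()

    def call(self, coro, timeout: float = 10.0):
        # Only one caller thread at a time may submit and wait for a result.
        with self._call_lock:
            future = asyncio.run_coroutine_threadsafe(coro, self._loop)
            return future.result(timeout=timeout)

    def close(self) -> None:
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join(timeout=5)
        self._loop.close()


stats = {"in_flight": 0, "max_in_flight": 0}


async def fake_tool_call(n: int) -> int:
    # Record how many calls overlap on the loop.
    stats["in_flight"] += 1
    stats["max_in_flight"] = max(stats["max_in_flight"], stats["in_flight"])
    await asyncio.sleep(0.01)
    stats["in_flight"] -= 1
    return n * 2


caller = SerializedLoopCaller()
results: list[int] = []
workers = [
    threading.Thread(target=lambda n=n: results.append(caller.call(fake_tool_call(n))))
    for n in range(4)
]
for w in workers:
    w.start()
for w in workers:
    w.join()
caller.close()
```

The trade-off is throughput: concurrent callers queue behind the lock, which is acceptable when the underlying transport is a single stdin/stdout pipe pair.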
+185
@@ -0,0 +1,185 @@
"""Pre-load validation for agent graphs.
Runs structural and credential checks before MCP servers are spawned.
Fails fast with actionable error messages.
"""
from __future__ import annotations
import logging
from dataclasses import dataclass, field
from typing import TYPE_CHECKING
if TYPE_CHECKING:
from framework.graph.edge import GraphSpec
from framework.graph.node import NodeSpec
logger = logging.getLogger(__name__)
class PreloadValidationError(Exception):
"""Raised when pre-load validation fails."""
def __init__(self, errors: list[str]):
self.errors = errors
msg = "Pre-load validation failed:\n" + "\n".join(f" - {e}" for e in errors)
super().__init__(msg)
@dataclass
class PreloadResult:
"""Result of pre-load validation."""
valid: bool
errors: list[str] = field(default_factory=list)
warnings: list[str] = field(default_factory=list)
def validate_graph_structure(graph: GraphSpec) -> list[str]:
"""Run graph structural validation (includes GCU subagent-only checks).
Delegates to GraphSpec.validate() which checks entry/terminal nodes,
edge references, reachability, fan-out rules, and GCU constraints.
"""
return graph.validate()
def validate_credentials(
nodes: list[NodeSpec],
*,
interactive: bool = True,
skip: bool = False,
) -> None:
"""Validate agent credentials.
Calls ``validate_agent_credentials`` which performs two-phase validation:
1. Presence check (env var, encrypted store, Aden sync)
2. Health check (lightweight HTTP call to verify the key works)
On failure raises ``CredentialError`` with ``validation_result`` and
``failed_cred_names`` attributes preserved from the upstream check.
In interactive mode (CLI with TTY), attempts recovery via the
credential setup flow before re-raising.
"""
if skip:
return
from framework.credentials.validation import validate_agent_credentials
if not interactive:
# Non-interactive: let CredentialError propagate with full context.
# validate_agent_credentials attaches .validation_result and
# .failed_cred_names to the exception automatically.
validate_agent_credentials(nodes)
return
import sys
from framework.credentials.models import CredentialError
try:
validate_agent_credentials(nodes)
except CredentialError as e:
if not sys.stdin.isatty():
raise
print(f"\n{e}", file=sys.stderr)
from framework.credentials.validation import build_setup_session_from_error
session = build_setup_session_from_error(e, nodes=nodes)
if not session.missing:
raise
result = session.run_interactive()
if not result.success:
# Preserve the original validation_result so callers can
# inspect which credentials are still missing.
exc = CredentialError(
"Credential setup incomplete. Run again after configuring the required credentials."
)
if hasattr(e, "validation_result"):
exc.validation_result = e.validation_result # type: ignore[attr-defined]
if hasattr(e, "failed_cred_names"):
exc.failed_cred_names = e.failed_cred_names # type: ignore[attr-defined]
raise exc from None
# Re-validate after successful setup — this will raise if still broken,
# with fresh validation_result attached to the new exception.
validate_agent_credentials(nodes)
def credential_errors_to_json(exc: Exception) -> dict:
"""Extract structured credential failure details from a CredentialError.
Returns a dict suitable for JSON serialization with enough detail for
the queen to report actionable guidance to the user. Falls back to
``str(exc)`` when rich metadata is not available.
"""
result = getattr(exc, "validation_result", None)
if result is None:
return {
"error": "credentials_required",
"message": str(exc),
}
failed = result.failed
missing = []
for c in failed:
if c.available:
status = "invalid"
elif c.aden_not_connected:
status = "aden_not_connected"
else:
status = "missing"
entry: dict = {
"credential": c.credential_name,
"env_var": c.env_var,
"status": status,
}
if c.tools:
entry["tools"] = c.tools
if c.node_types:
entry["node_types"] = c.node_types
if c.help_url:
entry["help_url"] = c.help_url
if c.validation_message:
entry["validation_message"] = c.validation_message
missing.append(entry)
return {
"error": "credentials_required",
"message": str(exc),
"missing_credentials": missing,
}
def run_preload_validation(
graph: GraphSpec,
*,
interactive: bool = True,
skip_credential_validation: bool = False,
) -> PreloadResult:
"""Run all pre-load validations.
Order:
1. Graph structure (includes GCU subagent-only checks): non-recoverable
2. Credentials: potentially recoverable via interactive setup
Raises PreloadValidationError for structural issues.
Raises CredentialError for credential issues.
"""
# 1. Structural validation (calls graph.validate() which includes GCU checks)
graph_errors = validate_graph_structure(graph)
if graph_errors:
raise PreloadValidationError(graph_errors)
# 2. Credential validation
validate_credentials(
graph.nodes,
interactive=interactive,
skip=skip_credential_validation,
)
return PreloadResult(valid=True)
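Review note: a caller that wants a structured payload for structural failures, analogous to `credential_errors_to_json`, could handle `PreloadValidationError` roughly as sketched below. The exception class is copied from the new file. `load_or_report`, the `"preload_validation_failed"` key, and the sample error strings are hypothetical.

```python
class PreloadValidationError(Exception):
    """Raised when pre-load validation fails (copied from the diff above)."""

    def __init__(self, errors: list[str]):
        self.errors = errors
        msg = "Pre-load validation failed:\n" + "\n".join(f" - {e}" for e in errors)
        super().__init__(msg)


def load_or_report(graph_errors: list[str]) -> dict:
    """Hypothetical caller: turn structural errors into a JSON-friendly dict."""
    try:
        if graph_errors:
            raise PreloadValidationError(graph_errors)
        return {"ok": True}
    except PreloadValidationError as exc:
        return {
            "ok": False,
            "error": "preload_validation_failed",  # assumed key, not from the source
            "details": exc.errors,
            "message": str(exc),
        }


report = load_or_report(["entry node 'start' not found", "edge references unknown node 'x'"])
clean = load_or_report([])
```

Keeping the raw list on `exc.errors` lets a UI render each problem separately, while `str(exc)` stays usable for plain logs.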
+198 -80
@@ -12,7 +12,6 @@ from typing import TYPE_CHECKING, Any
from framework.config import get_hive_config, get_preferred_model
from framework.credentials.validation import (
ensure_credential_key_env as _ensure_credential_key_env,
validate_agent_credentials,
)
from framework.graph import Goal
from framework.graph.edge import (
@@ -25,6 +24,7 @@ from framework.graph.edge import (
from framework.graph.executor import ExecutionResult
from framework.graph.node import NodeSpec
from framework.llm.provider import LLMProvider, Tool
from framework.runner.preload_validation import run_preload_validation
from framework.runner.tool_registry import ToolRegistry
from framework.runtime.agent_runtime import AgentRuntime, AgentRuntimeConfig, create_agent_runtime
from framework.runtime.execution_stream import EntryPointSpec
@@ -39,6 +39,7 @@ logger = logging.getLogger(__name__)
CLAUDE_CREDENTIALS_FILE = Path.home() / ".claude" / ".credentials.json"
CLAUDE_OAUTH_TOKEN_URL = "https://console.anthropic.com/v1/oauth/token"
CLAUDE_OAUTH_CLIENT_ID = "9d1c250a-e61b-44d9-88ed-5944d1962f5e"
CLAUDE_KEYCHAIN_SERVICE = "Claude Code-credentials"
# Buffer in seconds before token expiry to trigger a proactive refresh
_TOKEN_REFRESH_BUFFER_SECS = 300 # 5 minutes
@@ -51,6 +52,96 @@ CODEX_KEYCHAIN_SERVICE = "Codex Auth"
_CODEX_TOKEN_LIFETIME_SECS = 3600 # 1 hour (no explicit expiry field)
def _read_claude_keychain() -> dict | None:
"""Read Claude Code credentials from macOS Keychain.
Returns the parsed JSON dict, or None if not on macOS or entry missing.
"""
import getpass
import platform
import subprocess
if platform.system() != "Darwin":
return None
try:
account = getpass.getuser()
result = subprocess.run(
[
"security",
"find-generic-password",
"-s",
CLAUDE_KEYCHAIN_SERVICE,
"-a",
account,
"-w",
],
capture_output=True,
encoding="utf-8",
timeout=5,
)
if result.returncode != 0:
return None
raw = result.stdout.strip()
if not raw:
return None
return json.loads(raw)
except (subprocess.TimeoutExpired, json.JSONDecodeError, OSError) as exc:
logger.debug("Claude keychain read failed: %s", exc)
return None
def _save_claude_keychain(creds: dict) -> bool:
"""Write Claude Code credentials to macOS Keychain. Returns True on success."""
import getpass
import platform
import subprocess
if platform.system() != "Darwin":
return False
try:
account = getpass.getuser()
data = json.dumps(creds)
result = subprocess.run(
[
"security",
"add-generic-password",
"-U",
"-s",
CLAUDE_KEYCHAIN_SERVICE,
"-a",
account,
"-w",
data,
],
capture_output=True,
timeout=5,
)
return result.returncode == 0
except (subprocess.TimeoutExpired, OSError) as exc:
logger.debug("Claude keychain write failed: %s", exc)
return False
def _read_claude_credentials() -> dict | None:
"""Read Claude Code credentials from Keychain (macOS) or file (Linux/Windows)."""
# Try macOS Keychain first
creds = _read_claude_keychain()
if creds:
return creds
# Fall back to file
if not CLAUDE_CREDENTIALS_FILE.exists():
return None
try:
with open(CLAUDE_CREDENTIALS_FILE, encoding="utf-8") as f:
return json.load(f)
except (json.JSONDecodeError, OSError):
return None
def _refresh_claude_code_token(refresh_token: str) -> dict | None:
"""Refresh the Claude Code OAuth token using the refresh token.
@@ -89,16 +180,14 @@ def _refresh_claude_code_token(refresh_token: str) -> dict | None:
def _save_refreshed_credentials(token_data: dict) -> None:
"""Write refreshed token data back to ~/.claude/.credentials.json."""
"""Write refreshed token data back to Keychain (macOS) or credentials file."""
import time
if not CLAUDE_CREDENTIALS_FILE.exists():
creds = _read_claude_credentials()
if not creds:
return
try:
with open(CLAUDE_CREDENTIALS_FILE) as f:
creds = json.load(f)
oauth = creds.get("claudeAiOauth", {})
oauth["accessToken"] = token_data["access_token"]
if "refresh_token" in token_data:
@@ -107,9 +196,15 @@ def _save_refreshed_credentials(token_data: dict) -> None:
oauth["expiresAt"] = int((time.time() + token_data["expires_in"]) * 1000)
creds["claudeAiOauth"] = oauth
with open(CLAUDE_CREDENTIALS_FILE, "w") as f:
json.dump(creds, f, indent=2)
logger.debug("Claude Code credentials refreshed successfully")
# Try Keychain first (macOS), fall back to file
if _save_claude_keychain(creds):
logger.debug("Claude Code credentials refreshed in Keychain")
return
if CLAUDE_CREDENTIALS_FILE.exists():
with open(CLAUDE_CREDENTIALS_FILE, "w", encoding="utf-8") as f:
json.dump(creds, f, indent=2)
logger.debug("Claude Code credentials refreshed in file")
except (json.JSONDecodeError, OSError, KeyError) as exc:
logger.debug("Failed to save refreshed credentials: %s", exc)
@@ -117,8 +212,8 @@ def _save_refreshed_credentials(token_data: dict) -> None:
def get_claude_code_token() -> str | None:
"""Get the OAuth token from Claude Code subscription with auto-refresh.
Reads from ~/.claude/.credentials.json which is created by the
Claude Code CLI when users authenticate with their subscription.
Reads from macOS Keychain (on Darwin) or ~/.claude/.credentials.json
(on Linux/Windows), as created by the Claude Code CLI.
If the token is expired or close to expiry, attempts an automatic
refresh using the stored refresh token.
@@ -128,13 +223,8 @@ def get_claude_code_token() -> str | None:
"""
import time
if not CLAUDE_CREDENTIALS_FILE.exists():
return None
try:
with open(CLAUDE_CREDENTIALS_FILE) as f:
creds = json.load(f)
except (json.JSONDecodeError, OSError):
creds = _read_claude_credentials()
if not creds:
return None
oauth = creds.get("claudeAiOauth", {})
@@ -212,7 +302,7 @@ def _read_codex_keychain() -> dict | None:
"-w",
],
capture_output=True,
text=True,
encoding="utf-8",
timeout=5,
)
if result.returncode != 0:
@@ -231,7 +321,7 @@ def _read_codex_auth_file() -> dict | None:
if not CODEX_AUTH_FILE.exists():
return None
try:
with open(CODEX_AUTH_FILE) as f:
with open(CODEX_AUTH_FILE, encoding="utf-8") as f:
return json.load(f)
except (json.JSONDecodeError, OSError):
return None
@@ -322,8 +412,9 @@ def _save_refreshed_codex_credentials(auth_data: dict, token_data: dict) -> None
auth_data["tokens"] = tokens
auth_data["last_refresh"] = datetime.now(UTC).isoformat()
CODEX_AUTH_FILE.parent.mkdir(parents=True, exist_ok=True)
with open(CODEX_AUTH_FILE, "w") as f:
CODEX_AUTH_FILE.parent.mkdir(parents=True, exist_ok=True, mode=0o700)
fd = os.open(CODEX_AUTH_FILE, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
with os.fdopen(fd, "w", encoding="utf-8") as f:
json.dump(auth_data, f, indent=2)
logger.debug("Codex credentials refreshed successfully")
except (OSError, KeyError) as exc:
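Review note: the Codex credential write now goes through `os.open` with an explicit mode instead of plain `open`. The sketch below shows the same technique on a throwaway file. `write_private_json` is a hypothetical helper and the token value is a placeholder.

```python
import json
import os
import stat
import tempfile
from pathlib import Path


def write_private_json(path: Path, data: dict) -> None:
    """Write JSON so that a newly created file is owner read/write only."""
    # mode applies to the leaf directory only; missing parents get default permissions.
    path.parent.mkdir(parents=True, exist_ok=True, mode=0o700)
    # The 0o600 mode is applied only when os.open creates the file.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "auth" / "auth.json"
    write_private_json(target, {"tokens": {"access_token": "placeholder"}})
    written = json.loads(target.read_text(encoding="utf-8"))
    mode = stat.S_IMODE(target.stat().st_mode)
```

Two caveats apply. If the file already exists, its current permissions are kept, so a previously world-readable file stays that way unless it is also passed through `os.chmod`. On Windows the mode bits are largely ignored.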
@@ -620,6 +711,7 @@ class AgentRunner:
requires_account_selection: bool = False,
configure_for_account: Callable | None = None,
list_accounts: Callable | None = None,
credential_store: Any | None = None,
):
"""
Initialize the runner (use AgentRunner.load() instead).
@@ -639,6 +731,7 @@ class AgentRunner:
requires_account_selection: If True, TUI shows account picker before starting.
configure_for_account: Callback(runner, account_dict) to scope tools after selection.
list_accounts: Callback() -> list[dict] to fetch available accounts.
credential_store: Optional shared CredentialStore (avoids creating redundant stores).
"""
self.agent_path = agent_path
self.graph = graph
@@ -652,6 +745,7 @@ class AgentRunner:
self.requires_account_selection = requires_account_selection
self._configure_for_account = configure_for_account
self._list_accounts = list_accounts
self._credential_store = credential_store
# Set up storage
if storage_path:
@@ -678,68 +772,29 @@ class AgentRunner:
self._agent_runtime: AgentRuntime | None = None
self._uses_async_entry_points = self.graph.has_async_entry_points()
# Validate credentials before spawning MCP servers.
# Pre-load validation: structural checks + credentials.
# Fails fast with actionable guidance — no MCP noise on screen.
self._validate_credentials()
run_preload_validation(
self.graph,
interactive=self._interactive,
skip_credential_validation=self.skip_credential_validation,
)
# Auto-discover tools from tools.py
tools_path = agent_path / "tools.py"
if tools_path.exists():
self._tool_registry.discover_from_module(tools_path)
# Set environment variables for MCP subprocesses
# These are inherited by MCP servers (e.g., GCU browser tools)
os.environ["HIVE_AGENT_NAME"] = agent_path.name
os.environ["HIVE_STORAGE_PATH"] = str(self._storage_path)
# Auto-discover MCP servers from mcp_servers.json
mcp_config_path = agent_path / "mcp_servers.json"
if mcp_config_path.exists():
self._load_mcp_servers_from_config(mcp_config_path)
def _validate_credentials(self) -> None:
"""Check that required credentials are available before spawning MCP servers.
If ``interactive`` is True and stdin is a TTY, automatically launches
the interactive credential setup flow so the user can fix the issue
in-place. Re-validates after setup succeeds.
When ``interactive`` is False (e.g. TUI callers), the CredentialError
propagates immediately so the caller can handle it with its own UI.
"""
if self.skip_credential_validation:
return
if not self._interactive:
# Let the CredentialError propagate — caller handles UI.
validate_agent_credentials(self.graph.nodes)
return
import sys
from framework.credentials.models import CredentialError
try:
validate_agent_credentials(self.graph.nodes)
return # All good
except CredentialError as e:
if not sys.stdin.isatty():
raise
# Interactive: show the error then enter credential setup
print(f"\n{e}", file=sys.stderr)
from framework.credentials.validation import build_setup_session_from_error
session = build_setup_session_from_error(e, nodes=self.graph.nodes)
if not session.missing:
raise
result = session.run_interactive()
if not result.success:
raise CredentialError(
"Credential setup incomplete. "
"Run again after configuring the required credentials."
) from None
# Re-validate after setup
validate_agent_credentials(self.graph.nodes)
@staticmethod
def _import_agent_module(agent_path: Path):
"""Import an agent package from its directory path.
@@ -788,6 +843,7 @@ class AgentRunner:
model: str | None = None,
interactive: bool = True,
skip_credential_validation: bool | None = None,
credential_store: Any | None = None,
) -> "AgentRunner":
"""
Load an agent from an export folder.
@@ -805,6 +861,7 @@ class AgentRunner:
Set to False from TUI callers that handle setup via their own UI.
skip_credential_validation: If True, skip credential checks at load time.
When None (default), uses the agent module's setting.
credential_store: Optional shared CredentialStore (avoids creating redundant stores).
Returns:
AgentRunner instance ready to run
@@ -894,15 +951,19 @@ class AgentRunner:
requires_account_selection=needs_acct,
configure_for_account=configure_fn,
list_accounts=list_accts_fn,
credential_store=credential_store,
)
# Fallback: load from agent.json (legacy JSON-based agents)
agent_json_path = agent_path / "agent.json"
if not agent_json_path.exists():
if not agent_json_path.is_file():
raise FileNotFoundError(f"No agent.py or agent.json found in {agent_path}")
with open(agent_json_path) as f:
graph, goal = load_agent_export(f.read())
content = agent_json_path.read_text(encoding="utf-8").strip()
if not content:
raise FileNotFoundError(f"agent.json is empty: {agent_json_path}")
graph, goal = load_agent_export(content)
return cls(
agent_path=agent_path,
@@ -913,6 +974,7 @@ class AgentRunner:
model=model,
interactive=interactive,
skip_credential_validation=skip_credential_validation or False,
credential_store=credential_store,
)
def register_tool(
@@ -1118,7 +1180,9 @@ class AgentRunner:
# Fail fast if the agent needs an LLM but none was configured
if self._llm is None:
has_llm_nodes = any(node.node_type == "event_loop" for node in self.graph.nodes)
has_llm_nodes = any(
node.node_type in ("event_loop", "gcu") for node in self.graph.nodes
)
if has_llm_nodes:
from framework.credentials.models import CredentialError
@@ -1136,6 +1200,52 @@ class AgentRunner:
)
raise CredentialError(f"LLM API key not found for model '{self.model}'. {hint}")
# For GCU nodes: auto-register GCU MCP server if needed, then expand tool lists
has_gcu_nodes = any(node.node_type == "gcu" for node in self.graph.nodes)
if has_gcu_nodes:
from framework.graph.gcu import GCU_MCP_SERVER_CONFIG, GCU_SERVER_NAME
# Auto-register GCU MCP server if tools aren't loaded yet
gcu_tool_names = self._tool_registry.get_server_tool_names(GCU_SERVER_NAME)
if not gcu_tool_names:
# Resolve cwd to repo-level tools/ (not relative to agent_path)
gcu_config = dict(GCU_MCP_SERVER_CONFIG)
_repo_root = Path(__file__).resolve().parent.parent.parent.parent
gcu_config["cwd"] = str(_repo_root / "tools")
self._tool_registry.register_mcp_server(gcu_config)
gcu_tool_names = self._tool_registry.get_server_tool_names(GCU_SERVER_NAME)
# Expand each GCU node's tools list to include all GCU server tools
if gcu_tool_names:
for node in self.graph.nodes:
if node.node_type == "gcu":
existing = set(node.tools)
for tool_name in sorted(gcu_tool_names):
if tool_name not in existing:
node.tools.append(tool_name)
# For event_loop/gcu nodes: auto-register file tools MCP server, then expand tool lists
has_loop_nodes = any(node.node_type in ("event_loop", "gcu") for node in self.graph.nodes)
if has_loop_nodes:
from framework.graph.files import FILES_MCP_SERVER_CONFIG, FILES_MCP_SERVER_NAME
files_tool_names = self._tool_registry.get_server_tool_names(FILES_MCP_SERVER_NAME)
if not files_tool_names:
# Resolve cwd to repo-level tools/ (not relative to agent_path)
files_config = dict(FILES_MCP_SERVER_CONFIG)
_repo_root = Path(__file__).resolve().parent.parent.parent.parent
files_config["cwd"] = str(_repo_root / "tools")
self._tool_registry.register_mcp_server(files_config)
files_tool_names = self._tool_registry.get_server_tool_names(FILES_MCP_SERVER_NAME)
if files_tool_names:
for node in self.graph.nodes:
if node.node_type in ("event_loop", "gcu"):
existing = set(node.tools)
for tool_name in sorted(files_tool_names):
if tool_name not in existing:
node.tools.append(tool_name)
# Get tools for runtime
tools = list(self._tool_registry.get_tools().values())
tool_executor = self._tool_registry.get_executor()
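Review note: both auto-registration blocks in this hunk end with the same "append server tools the node does not already list" loop. The sketch below isolates that loop. `NodeStub`, `expand_node_tools`, and the tool names are hypothetical stand-ins, not the framework's types.

```python
from dataclasses import dataclass, field


@dataclass
class NodeStub:
    """Hypothetical stand-in for NodeSpec with only the fields the loop touches."""

    node_type: str
    tools: list[str] = field(default_factory=list)


def expand_node_tools(nodes, node_types, server_tool_names) -> None:
    """Append missing server tools, in sorted order, to nodes of the given types."""
    for node in nodes:
        if node.node_type in node_types:
            existing = set(node.tools)
            for name in sorted(server_tool_names):
                if name not in existing:
                    node.tools.append(name)


nodes = [
    NodeStub("gcu", ["browser_click"]),
    NodeStub("event_loop", ["web_search"]),
]
# Only the gcu node is expanded; the event_loop node is left untouched.
expand_node_tools(nodes, ("gcu",), {"browser_open", "browser_click"})
```

Iterating in sorted order keeps the resulting tool list deterministic across runs. The loop is also idempotent, so a second setup pass does not duplicate entries. Factoring it into one helper would remove the duplication between the GCU and file-tools blocks.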
@@ -1147,7 +1257,10 @@ class AgentRunner:
try:
from aden_tools.credentials.store_adapter import CredentialStoreAdapter
adapter = CredentialStoreAdapter.default()
if self._credential_store is not None:
adapter = CredentialStoreAdapter(store=self._credential_store)
else:
adapter = CredentialStoreAdapter.default()
accounts_data = adapter.get_all_account_info()
tool_provider_map = adapter.get_tool_provider_map()
if accounts_data:
@@ -1218,9 +1331,11 @@ class AgentRunner:
return None
try:
from framework.credentials import CredentialStore
store = self._credential_store
if store is None:
from framework.credentials import CredentialStore
store = CredentialStore.with_encrypted_storage()
store = CredentialStore.with_encrypted_storage()
return store.get(cred_id)
except Exception:
return None
@@ -1263,6 +1378,7 @@ class AgentRunner:
isolation_level=async_ep.isolation_level,
priority=async_ep.priority,
max_concurrent=async_ep.max_concurrent,
max_resurrections=async_ep.max_resurrections,
)
entry_points.append(ep)
@@ -1672,7 +1788,9 @@ class AgentRunner:
warnings.append(warning_msg)
except ImportError:
# aden_tools not installed - fall back to direct check
has_llm_nodes = any(node.node_type == "event_loop" for node in self.graph.nodes)
has_llm_nodes = any(
node.node_type in ("event_loop", "gcu") for node in self.graph.nodes
)
if has_llm_nodes:
api_key_env = self._get_api_key_env_var(self.model)
if api_key_env and not os.environ.get(api_key_env):
+113 -4
@@ -61,6 +61,7 @@ class ToolRegistry:
self._mcp_tool_names: set[str] = set() # Tool names registered from MCP
self._mcp_cred_snapshot: set[str] = set() # Credential filenames at MCP load time
self._mcp_aden_key_snapshot: str | None = None # ADEN_API_KEY value at MCP load time
self._mcp_server_tools: dict[str, set[str]] = {} # server name -> tool names
def register(
self,
@@ -294,6 +295,10 @@ class ToolRegistry:
"""Check if a tool is registered."""
return name in self._tools
def get_server_tool_names(self, server_name: str) -> set[str]:
"""Return tool names registered from a specific MCP server."""
return set(self._mcp_server_tools.get(server_name, set()))
def set_session_context(self, **context) -> None:
"""
Set session context to auto-inject into tool calls.
@@ -321,6 +326,103 @@ class ToolRegistry:
"""Restore execution context to its previous state."""
_execution_context.reset(token)
@staticmethod
def resolve_mcp_stdio_config(server_config: dict[str, Any], base_dir: Path) -> dict[str, Any]:
"""Resolve cwd and script paths for MCP stdio config (Windows compatibility).
Use this when building MCPServerConfig from a config file (e.g. in
list_agent_tools, discover_mcp_tools) so hive-tools and other servers
work on Windows. Call with base_dir = directory containing the config.
"""
registry = ToolRegistry()
return registry._resolve_mcp_server_config(server_config, base_dir)
def _resolve_mcp_server_config(
self, server_config: dict[str, Any], base_dir: Path
) -> dict[str, Any]:
"""Resolve cwd and script paths for MCP stdio servers (Windows compatibility).
On Windows, passing cwd to subprocess can cause WinError 267. We use cwd=None
and absolute script paths when the server runs a .py script from the tools dir.
If the resolved cwd doesn't exist (e.g. config from ~/.hive/agents/), fall back
to Path.cwd() / "tools".
"""
config = dict(server_config)
if config.get("transport") != "stdio":
return config
cwd = config.get("cwd")
args = list(config.get("args", []))
if not cwd and not args:
return config
# Resolve cwd relative to base_dir
resolved_cwd: Path | None = None
if cwd:
if Path(cwd).is_absolute():
resolved_cwd = Path(cwd)
else:
resolved_cwd = (base_dir / cwd).resolve()
# Find .py script in args (e.g. coder_tools_server.py, files_server.py)
script_name = None
for i, arg in enumerate(args):
if isinstance(arg, str) and arg.endswith(".py"):
script_name = arg
script_idx = i
break
if resolved_cwd is None:
return config
# If resolved cwd doesn't exist or (when we have a script) doesn't contain it,
# try fallback
tools_fallback = Path.cwd() / "tools"
need_fallback = not resolved_cwd.is_dir()
if script_name and not need_fallback:
need_fallback = not (resolved_cwd / script_name).exists()
if need_fallback:
fallback_ok = tools_fallback.is_dir()
if script_name:
fallback_ok = fallback_ok and (tools_fallback / script_name).exists()
else:
# No script (e.g. GCU); just need tools dir to exist
pass
if fallback_ok:
resolved_cwd = tools_fallback
logger.debug(
"MCP server '%s': using fallback tools dir %s",
config.get("name", "?"),
resolved_cwd,
)
else:
config["cwd"] = str(resolved_cwd)
return config
if not script_name:
# No .py script (e.g. GCU uses -m gcu.server); just set cwd
config["cwd"] = str(resolved_cwd)
return config
# For coder_tools_server, inject --project-root so writes go to the expected workspace
if script_name and "coder_tools" in script_name:
project_root = str(resolved_cwd.parent.resolve())
args = list(args)
if "--project-root" not in args:
args.extend(["--project-root", project_root])
config["args"] = args
if os.name == "nt":
# Windows: cwd=None avoids WinError 267; use absolute script path
config["cwd"] = None
abs_script = str((resolved_cwd / script_name).resolve())
args = list(config["args"])
args[script_idx] = abs_script
config["args"] = args
else:
config["cwd"] = str(resolved_cwd)
return config
def load_mcp_config(self, config_path: Path) -> None:
"""
Load and register MCP servers from a config file.
@@ -335,7 +437,7 @@ class ToolRegistry:
self._mcp_config_path = Path(config_path)
try:
with open(config_path) as f:
with open(config_path, encoding="utf-8") as f:
config = json.load(f)
except Exception as e:
logger.warning(f"Failed to load MCP config from {config_path}: {e}")
@@ -352,9 +454,7 @@ class ToolRegistry:
server_list = [{"name": name, **cfg} for name, cfg in config.items()]
for server_config in server_list:
cwd = server_config.get("cwd")
if cwd and not Path(cwd).is_absolute():
server_config["cwd"] = str((base_dir / cwd).resolve())
server_config = self._resolve_mcp_server_config(server_config, base_dir)
try:
self.register_mcp_server(server_config)
except Exception as e:
@@ -411,6 +511,9 @@ class ToolRegistry:
self._mcp_clients.append(client)
# Register each tool
server_name = server_config["name"]
if server_name not in self._mcp_server_tools:
self._mcp_server_tools[server_name] = set()
count = 0
for mcp_tool in client.list_tools():
# Convert MCP tool to framework Tool (strips context params from LLM schema)
@@ -464,6 +567,7 @@ class ToolRegistry:
make_mcp_executor(client, mcp_tool.name, self, tool_params),
)
self._mcp_tool_names.add(mcp_tool.name)
self._mcp_server_tools[server_name].add(mcp_tool.name)
count += 1
logger.info(f"Registered {count} tools from MCP server '{config.name}'")
@@ -471,6 +575,11 @@ class ToolRegistry:
except Exception as e:
logger.error(f"Failed to register MCP server: {e}")
if "Connection closed" in str(e) and os.name == "nt":
logger.debug(
"On Windows, check that the MCP subprocess starts (e.g. uv in PATH, "
"script path correct). Worker config uses base_dir = mcp_servers.json parent."
)
return 0
def _convert_mcp_tool_to_framework_tool(self, mcp_tool: Any) -> Tool:
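The first two steps of `_resolve_mcp_server_config` above (resolving `cwd` against `base_dir`, then locating the `.py` script in `args`) can be sketched in isolation. This is a minimal illustration, not the framework's code: `resolve_cwd` and `find_script` are hypothetical helper names, and the fallback to `Path.cwd() / "tools"` and the Windows branch are left out.

```python
from pathlib import Path
from typing import List, Optional, Tuple


def resolve_cwd(cwd: Optional[str], base_dir: Path) -> Optional[Path]:
    """Resolve a configured cwd against base_dir (hypothetical helper)."""
    if not cwd:
        return None
    path = Path(cwd)
    # Absolute paths are kept; relative ones are anchored to the config's directory.
    return path if path.is_absolute() else (base_dir / path).resolve()


def find_script(args: List[str]) -> Optional[Tuple[int, str]]:
    """Return (index, name) of the first .py argument, or None (hypothetical helper)."""
    for i, arg in enumerate(args):
        if isinstance(arg, str) and arg.endswith(".py"):
            return i, arg
    return None
```

Returning the index together with the name avoids relying on a loop variable that is only bound when a script was found, which is how `script_idx` behaves in the diff.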
+165 -20
@@ -411,7 +411,12 @@ class AgentRuntime:
)
continue
def _make_cron_timer(entry_point_id: str, expr: str, immediate: bool):
def _make_cron_timer(
entry_point_id: str,
expr: str,
immediate: bool,
idle_timeout: float = 300,
):
async def _cron_loop():
from croniter import croniter
@@ -442,11 +447,28 @@ class AgentRuntime:
await asyncio.sleep(max(0, sleep_secs))
continue
# Gate: skip tick if previous execution still running
_stream = self._streams.get(entry_point_id)
if _stream and _stream.active_execution_ids:
logger.debug(
"Cron '%s': execution already in progress, skipping tick",
# Gate: skip tick if ANY stream is actively working.
# If the execution is idle (no LLM/tool activity
# beyond idle_timeout) let the timer proceed —
# execute() will cancel the stale execution.
_any_active = False
_min_idle = float("inf")
for _s in self._streams.values():
if _s.active_execution_ids:
_any_active = True
_idle = _s.agent_idle_seconds
if _idle < _min_idle:
_min_idle = _idle
logger.info(
"Cron '%s': gate — active=%s, idle=%.1fs, timeout=%ds",
entry_point_id,
_any_active,
_min_idle,
idle_timeout,
)
if _any_active and _min_idle < idle_timeout:
logger.info(
"Cron '%s': agent actively working, skipping tick",
entry_point_id,
)
self._timer_next_fire[entry_point_id] = (
@@ -517,7 +539,12 @@ class AgentRuntime:
return _cron_loop
task = asyncio.create_task(
_make_cron_timer(ep_id, cron_expr, run_immediately)()
_make_cron_timer(
ep_id,
cron_expr,
run_immediately,
idle_timeout=tc.get("idle_timeout_seconds", 300),
)()
)
self._timer_tasks.append(task)
logger.info(
@@ -529,7 +556,12 @@ class AgentRuntime:
elif interval and interval > 0:
# Fixed interval mode (original behavior)
def _make_timer(entry_point_id: str, mins: float, immediate: bool):
def _make_timer(
entry_point_id: str,
mins: float,
immediate: bool,
idle_timeout: float = 300,
):
async def _timer_loop():
interval_secs = mins * 60
_persistent_session_id: str | None = None
@@ -551,11 +583,26 @@ class AgentRuntime:
await asyncio.sleep(interval_secs)
continue
# Gate: skip tick if previous execution still running
_stream = self._streams.get(entry_point_id)
if _stream and _stream.active_execution_ids:
logger.debug(
"Timer '%s': execution already in progress, skipping tick",
# Gate: skip tick if agent is actively working.
# Gate: skip tick if ANY stream is actively working.
_any_active = False
_min_idle = float("inf")
for _s in self._streams.values():
if _s.active_execution_ids:
_any_active = True
_idle = _s.agent_idle_seconds
if _idle < _min_idle:
_min_idle = _idle
logger.info(
"Timer '%s': gate — active=%s, idle=%.1fs, timeout=%ds",
entry_point_id,
_any_active,
_min_idle,
idle_timeout,
)
if _any_active and _min_idle < idle_timeout:
logger.info(
"Timer '%s': agent actively working, skipping tick",
entry_point_id,
)
self._timer_next_fire[entry_point_id] = (
@@ -621,7 +668,14 @@ class AgentRuntime:
return _timer_loop
task = asyncio.create_task(_make_timer(ep_id, interval, run_immediately)())
task = asyncio.create_task(
_make_timer(
ep_id,
interval,
run_immediately,
idle_timeout=tc.get("idle_timeout_seconds", 300),
)()
)
self._timer_tasks.append(task)
logger.info(
"Started timer for entry point '%s' every %s min%s",
@@ -961,6 +1015,7 @@ class AgentRuntime:
local_ep: str,
mins: float,
immediate: bool,
idle_timeout: float = 300,
):
async def _timer_loop():
interval_secs = mins * 60
@@ -990,12 +1045,28 @@ class AgentRuntime:
await asyncio.sleep(interval_secs)
continue
# Gate: skip tick if previous execution still running
# Gate: skip tick if ANY stream in this graph is actively working.
_reg = self._graphs.get(gid)
_stream = _reg.streams.get(local_ep) if _reg else None
if _stream and _stream.active_execution_ids:
logger.debug(
"Timer '%s::%s': execution already in progress, skipping tick",
_any_active = False
_min_idle = float("inf")
if _reg:
for _sid, _s in _reg.streams.items():
if _s.active_execution_ids:
_any_active = True
_idle = _s.agent_idle_seconds
if _idle < _min_idle:
_min_idle = _idle
logger.info(
"Timer '%s::%s': gate — active=%s, idle=%.1fs, timeout=%ds",
gid,
local_ep,
_any_active,
_min_idle,
idle_timeout,
)
if _any_active and _min_idle < idle_timeout:
logger.info(
"Timer '%s::%s': agent actively working, skipping tick",
gid,
local_ep,
)
@@ -1066,7 +1137,13 @@ class AgentRuntime:
return _timer_loop
task = asyncio.create_task(
_make_timer(graph_id, ep_id, interval, run_immediately)()
_make_timer(
graph_id,
ep_id,
interval,
run_immediately,
idle_timeout=tc.get("idle_timeout_seconds", 300),
)()
)
timer_tasks.append(task)
logger.info("Timer task created for '%s::%s': %s", graph_id, ep_id, task)
@@ -1174,10 +1251,61 @@ class AgentRuntime:
return float("inf")
return time.monotonic() - self._last_user_input_time
@property
def agent_idle_seconds(self) -> float:
"""Seconds since any stream last had activity (LLM call, tool call, etc.).
Returns the *minimum* idle time across all streams with active
executions. Returns ``float('inf')`` if nothing is running.
"""
min_idle = float("inf")
for reg in self._graphs.values():
for stream in reg.streams.values():
idle = stream.agent_idle_seconds
if idle < min_idle:
min_idle = idle
return min_idle
def get_graph_registration(self, graph_id: str) -> _GraphRegistration | None:
"""Get the registration for a specific graph (or None)."""
return self._graphs.get(graph_id)
def cancel_all_tasks(self, loop: asyncio.AbstractEventLoop) -> bool:
"""Cancel all running execution tasks across all graphs.
Schedules the cancellation on *loop* (the agent event loop) so
that ``_execution_tasks`` is only read from the thread that owns
it, avoiding cross-thread dict access. Safe to call from any
thread (e.g. the Textual UI thread).
Blocks the caller for up to 5 seconds waiting for the result.
For async callers, use :meth:`cancel_all_tasks_async` instead.
"""
future = asyncio.run_coroutine_threadsafe(self.cancel_all_tasks_async(), loop)
try:
return future.result(timeout=5)
except Exception:
logger.warning("cancel_all_tasks: timed out or failed")
return False
async def cancel_all_tasks_async(self) -> bool:
"""Cancel all running execution tasks (runs on the agent loop).
Iterates ``_execution_tasks`` and calls ``task.cancel()`` directly.
Must be awaited on the agent event loop so dict access is
thread-safe. Returns True if at least one task was cancelled.
"""
cancelled = False
for gid in self.list_graphs():
reg = self.get_graph_registration(gid)
if reg:
for stream in reg.streams.values():
for task in list(stream._execution_tasks.values()):
if task and not task.done():
task.cancel()
cancelled = True
return cancelled
def _get_primary_session_state(
self,
exclude_entry_point: str,
@@ -1368,6 +1496,23 @@ class AgentRuntime:
# Fallback: primary graph
return list(self._entry_points.values())
def get_timer_next_fire_in(self, entry_point_id: str) -> float | None:
"""Return seconds until the next timer fire for *entry_point_id*.
Checks the primary graph's ``_timer_next_fire`` dict as well as
all registered secondary graphs. Returns ``None`` when no fire
time is recorded (e.g. the timer is currently executing or the
entry point is not a timer).
"""
mono = self._timer_next_fire.get(entry_point_id)
if mono is not None:
return max(0.0, mono - time.monotonic())
for reg in self._graphs.values():
mono = reg.timer_next_fire.get(entry_point_id)
if mono is not None:
return max(0.0, mono - time.monotonic())
return None
def get_stream(self, entry_point_id: str) -> ExecutionStream | None:
"""Get a specific execution stream."""
return self._streams.get(entry_point_id)
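The timer gate introduced in this file reduces to one predicate: skip the tick only when some stream has an active execution and the least-idle stream has been active within `idle_timeout`. A rough sketch under that reading follows; `FakeStream` and `should_skip_tick` are illustrative names that do not exist in the framework.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class FakeStream:
    """Stand-in exposing only the two attributes the gate reads (hypothetical)."""
    active_execution_ids: List[str] = field(default_factory=list)
    agent_idle_seconds: float = float("inf")


def should_skip_tick(streams: Iterable[FakeStream], idle_timeout: float = 300) -> bool:
    """Return True when a timer tick should be skipped (hypothetical helper)."""
    streams = list(streams)
    any_active = any(s.active_execution_ids for s in streams)
    # Smallest idle time across streams; inf when there are no streams at all.
    min_idle = min((s.agent_idle_seconds for s in streams), default=float("inf"))
    return any_active and min_idle < idle_timeout
```

An execution that has been idle longer than `idle_timeout` no longer blocks the tick; per the comment in the cron loop, `execute()` then cancels the stale execution.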
+44 -2
@@ -130,10 +130,19 @@ class EventType(StrEnum):
WORKER_ESCALATION_TICKET = "worker_escalation_ticket"
QUEEN_INTERVENTION_REQUESTED = "queen_intervention_requested"
# Execution resurrection (auto-restart on non-fatal failure)
EXECUTION_RESURRECTED = "execution_resurrected"
# Worker lifecycle (session manager → frontend)
WORKER_LOADED = "worker_loaded"
CREDENTIALS_REQUIRED = "credentials_required"
# Queen mode changes (building ↔ running)
QUEEN_MODE_CHANGED = "queen_mode_changed"
# Subagent reports (one-way progress updates from sub-agents)
SUBAGENT_REPORT = "subagent_report"
@dataclass
class AgentEvent:
@@ -709,15 +718,24 @@ class EventBus:
node_id: str,
prompt: str = "",
execution_id: str | None = None,
options: list[str] | None = None,
) -> None:
"""Emit client input requested event (client_facing=True nodes)."""
"""Emit client input requested event (client_facing=True nodes).
Args:
options: Optional predefined choices for the user (1-3 items).
The frontend appends an "Other" free-text option automatically.
"""
data: dict[str, Any] = {"prompt": prompt}
if options:
data["options"] = options
await self.publish(
AgentEvent(
type=EventType.CLIENT_INPUT_REQUESTED,
stream_id=stream_id,
node_id=node_id,
execution_id=execution_id,
data={"prompt": prompt},
data=data,
)
)
@@ -1012,6 +1030,30 @@ class EventBus:
)
)
async def emit_subagent_report(
self,
stream_id: str,
node_id: str,
subagent_id: str,
message: str,
data: dict[str, Any] | None = None,
execution_id: str | None = None,
) -> None:
"""Emit a one-way progress report from a sub-agent."""
await self.publish(
AgentEvent(
type=EventType.SUBAGENT_REPORT,
stream_id=stream_id,
node_id=node_id,
execution_id=execution_id,
data={
"subagent_id": subagent_id,
"message": message,
"data": data,
},
)
)
# === QUERY OPERATIONS ===
def get_history(
+203 -43
@@ -32,6 +32,19 @@ if TYPE_CHECKING:
from framework.storage.concurrent import ConcurrentStorage
from framework.storage.session_store import SessionStore
class ExecutionAlreadyRunningError(RuntimeError):
"""Raised when attempting to start an execution on a stream that already has one running."""
def __init__(self, stream_id: str, active_ids: list[str]):
self.stream_id = stream_id
self.active_ids = active_ids
super().__init__(
f"Stream '{stream_id}' already has an active execution: {active_ids}. "
"Concurrent executions on the same stream are not allowed."
)
logger = logging.getLogger(__name__)
@@ -56,9 +69,11 @@ class GraphScopedEventBus(EventBus):
# (subscriptions, history, semaphore, etc.) to the real bus.
self._real_bus = bus
self._scope_graph_id = graph_id
self.last_activity_time: float = time.monotonic()
async def publish(self, event: "AgentEvent") -> None: # type: ignore[override]
event.graph_id = self._scope_graph_id
self.last_activity_time = time.monotonic()
await self._real_bus.publish(event)
# --- Delegate state-reading methods to the real bus ---
@@ -93,6 +108,7 @@ class EntryPointSpec:
isolation_level: str = "shared" # "isolated" | "shared" | "synchronized"
priority: int = 0
max_concurrent: int = 10 # Max concurrent executions for this entry point
max_resurrections: int = 3 # Auto-restart on non-fatal failure (0 to disable)
def get_isolation_level(self) -> IsolationLevel:
"""Convert string isolation level to enum."""
@@ -233,9 +249,11 @@ class ExecutionStream:
self._lock = asyncio.Lock()
# Graph-scoped event bus (stamps graph_id on published events)
self._scoped_event_bus = self._event_bus
if self._event_bus and self.graph_id:
self._scoped_event_bus = GraphScopedEventBus(self._event_bus, self.graph_id)
# Always wrap in GraphScopedEventBus so we can track last_activity_time.
if self._event_bus:
self._scoped_event_bus = GraphScopedEventBus(self._event_bus, self.graph_id or "")
else:
self._scoped_event_bus = None
# State
self._running = False
@@ -265,6 +283,21 @@ class ExecutionStream:
"""Return IDs of all currently active executions."""
return list(self._active_executions.keys())
@property
def agent_idle_seconds(self) -> float:
"""Seconds since the last agent activity (LLM call, tool call, node transition).
Returns ``float('inf')`` if no event bus is attached or there are no
active executions (nothing to be idle *about*). Until the first event
is published, idle time is measured from the scoped bus's creation.
"""
if not self._active_executions:
return float("inf")
bus = self._scoped_event_bus
if isinstance(bus, GraphScopedEventBus):
return time.monotonic() - bus.last_activity_time
return float("inf")
@property
def is_awaiting_input(self) -> bool:
"""True when an active execution is blocked waiting for client input."""
@@ -292,13 +325,21 @@ class ExecutionStream:
"""Return nodes that support message injection (have ``inject_event``).
Each entry is ``{"node_id": ..., "execution_id": ...}``.
The currently executing node is placed first so that
``inject_worker_message`` targets the active node, not a stale one.
"""
injectable: list[dict[str, str]] = []
current_first: list[dict[str, str]] = []
for exec_id, executor in self._active_executors.items():
current = getattr(executor, "current_node_id", None)
for node_id, node in executor.node_registry.items():
if hasattr(node, "inject_event"):
injectable.append({"node_id": node_id, "execution_id": exec_id})
return injectable
entry = {"node_id": node_id, "execution_id": exec_id}
if node_id == current:
current_first.append(entry)
else:
injectable.append(entry)
return current_first + injectable
def _record_execution_result(self, execution_id: str, result: ExecutionResult) -> None:
"""Record a completed execution result with retention pruning."""
@@ -404,6 +445,27 @@ class ExecutionStream:
if not self._running:
raise RuntimeError(f"ExecutionStream '{self.stream_id}' is not running")
# Only one execution may run on a stream at a time — concurrent
# executions corrupt shared session state. Cancel any running
# execution before starting the new one. The cancelled execution
# writes its state to disk before cleanup, and the new execution
# runs in the same session directory (via resume_session_id).
active = self.active_execution_ids
for eid in active:
logger.info(
"Cancelling running execution %s on stream '%s' before starting new one",
eid,
self.stream_id,
)
executor = self._active_executors.get(eid)
if executor:
for node in executor.node_registry.values():
if hasattr(node, "signal_shutdown"):
node.signal_shutdown()
if hasattr(node, "cancel_current_turn"):
node.cancel_current_turn()
await self.cancel_execution(eid)
# When resuming, reuse the original session ID so the execution
# continues in the same session directory instead of creating a new one.
resume_session_id = session_state.get("resume_session_id") if session_state else None
@@ -449,8 +511,44 @@ class ExecutionStream:
logger.debug(f"Queued execution {execution_id} for stream {self.stream_id}")
return execution_id
# Errors that indicate resurrection won't help — the same error will recur.
# Includes both configuration/environment errors and deterministic node
# failures where the conversation/state hasn't changed.
_FATAL_ERROR_PATTERNS: tuple[str, ...] = (
# Configuration / environment
"credential",
"authentication",
"unauthorized",
"forbidden",
"api key",
"import error",
"module not found",
"no module named",
"permission denied",
"invalid api",
"configuration error",
# Deterministic node failures — resurrecting at the same node with
# the same conversation produces the same result.
"node stalled",
"ghost empty stream",
"max iterations",
)
@classmethod
def _is_fatal_error(cls, error: str | None) -> bool:
"""Return True if the error is life-threatening (no point resurrecting)."""
if not error:
return False
error_lower = error.lower()
return any(pat in error_lower for pat in cls._FATAL_ERROR_PATTERNS)
async def _run_execution(self, ctx: ExecutionContext) -> None:
"""Run a single execution within the stream."""
"""Run a single execution within the stream.
Supports automatic resurrection: when the execution fails with a
non-fatal error, it restarts from the failed node up to
``entry_spec.max_resurrections`` times (default 3).
"""
execution_id = ctx.id
# When sharing a session with another entry point (resume_session_id),
@@ -458,6 +556,11 @@ class ExecutionStream:
# owns the state.json and _write_progress() keeps memory up-to-date.
_is_shared_session = bool(ctx.session_state and ctx.session_state.get("resume_session_id"))
max_resurrections = self.entry_spec.max_resurrections
_resurrection_count = 0
_current_session_state = ctx.session_state
_current_input_data = ctx.input_data
# Acquire semaphore to limit concurrency
async with self._semaphore:
ctx.status = "running"
@@ -498,12 +601,6 @@ class ExecutionStream:
store=self._runtime_log_store, agent_id=self.graph.id
)
# Create executor for this execution.
# Each execution gets its own storage under sessions/{exec_id}/
# so conversations, spillover, and data files are all scoped
# to this execution. The executor sets data_dir via execution
# context (contextvars) so data tools and spillover share the
# same session-scoped directory.
# Derive storage from session_store (graph-specific for secondary
# graphs) so that all files — conversations, state, checkpoints,
# data — land under the graph's own sessions/ directory, not the
@@ -512,43 +609,106 @@ class ExecutionStream:
exec_storage = self._session_store.sessions_dir / execution_id
else:
exec_storage = self._storage.base_path / "sessions" / execution_id
executor = GraphExecutor(
runtime=runtime_adapter,
llm=self._llm,
tools=self._tools,
tool_executor=self._tool_executor,
event_bus=self._scoped_event_bus,
stream_id=self.stream_id,
execution_id=execution_id,
storage_path=exec_storage,
runtime_logger=runtime_logger,
loop_config=self.graph.loop_config,
accounts_prompt=self._accounts_prompt,
accounts_data=self._accounts_data,
tool_provider_map=self._tool_provider_map,
)
# Track executor so inject_input() can reach EventLoopNode instances
self._active_executors[execution_id] = executor
# Write initial session state
if not _is_shared_session:
await self._write_session_state(execution_id, ctx)
# Create modified graph with entry point
# We need to override the entry_node to use our entry point
modified_graph = self._create_modified_graph()
# Execute
result = await executor.execute(
graph=modified_graph,
goal=self.goal,
input_data=ctx.input_data,
session_state=ctx.session_state,
checkpoint_config=self._checkpoint_config,
)
# Write initial session state
if not _is_shared_session:
await self._write_session_state(execution_id, ctx)
# Clean up executor reference
self._active_executors.pop(execution_id, None)
# --- Resurrection loop ---
# Each iteration creates a fresh executor. On non-fatal failure,
# the executor's session_state (memory + resume_from) carries
# forward so the next attempt resumes at the failed node.
while True:
# Create executor for this execution.
# Each execution gets its own storage under sessions/{exec_id}/
# so conversations, spillover, and data files are all scoped
# to this execution. The executor sets data_dir via execution
# context (contextvars) so data tools and spillover share the
# same session-scoped directory.
executor = GraphExecutor(
runtime=runtime_adapter,
llm=self._llm,
tools=self._tools,
tool_executor=self._tool_executor,
event_bus=self._scoped_event_bus,
stream_id=self.stream_id,
execution_id=execution_id,
storage_path=exec_storage,
runtime_logger=runtime_logger,
loop_config=self.graph.loop_config,
accounts_prompt=self._accounts_prompt,
accounts_data=self._accounts_data,
tool_provider_map=self._tool_provider_map,
)
# Track executor so inject_input() can reach EventLoopNode instances
self._active_executors[execution_id] = executor
# Execute
result = await executor.execute(
graph=modified_graph,
goal=self.goal,
input_data=_current_input_data,
session_state=_current_session_state,
checkpoint_config=self._checkpoint_config,
)
# Clean up executor reference
self._active_executors.pop(execution_id, None)
# Check if resurrection is appropriate
if (
not result.success
and not result.paused_at
and _resurrection_count < max_resurrections
and result.session_state
and not self._is_fatal_error(result.error)
):
_resurrection_count += 1
logger.warning(
"Execution %s failed (%s) — resurrecting (%d/%d) from node '%s'",
execution_id,
(result.error or "unknown")[:200],
_resurrection_count,
max_resurrections,
result.session_state.get("resume_from", "?"),
)
# Emit resurrection event
if self._scoped_event_bus:
from framework.runtime.event_bus import AgentEvent, EventType
await self._scoped_event_bus.publish(
AgentEvent(
type=EventType.EXECUTION_RESURRECTED,
stream_id=self.stream_id,
execution_id=execution_id,
data={
"attempt": _resurrection_count,
"max_resurrections": max_resurrections,
"error": (result.error or "")[:500],
"resume_from": result.session_state.get("resume_from"),
},
)
)
# Resume from the failed node with preserved memory
_current_session_state = {
**result.session_state,
"resume_session_id": execution_id,
}
# On resurrection, input_data is already in memory —
# pass empty so we don't overwrite intermediate results.
_current_input_data = {}
# Brief cooldown before resurrection
await asyncio.sleep(2.0)
continue
break # success, fatal failure, or resurrections exhausted
# Store result with retention
self._record_execution_result(execution_id, result)
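The control flow of the resurrection loop above can be shown without the executor. The sketch below assumes a hypothetical `attempt(state)` coroutine that returns `(success, error, session_state)`; `FATAL_PATTERNS` is a small subset of `_FATAL_ERROR_PATTERNS`, and the `paused_at` check and event emission are omitted.

```python
import asyncio

# Subset of the patterns listed in _FATAL_ERROR_PATTERNS, for illustration only.
FATAL_PATTERNS = ("credential", "api key", "max iterations")


def is_fatal(error):
    """Case-insensitive substring match, mirroring _is_fatal_error."""
    return bool(error) and any(p in error.lower() for p in FATAL_PATTERNS)


async def run_with_resurrection(attempt, max_resurrections=3, cooldown=0.0):
    """Retry `attempt` until success, a fatal error, or the retry budget is spent.

    Returns (success, error, resurrection_count). `attempt` is a hypothetical
    coroutine function; the real loop calls GraphExecutor.execute().
    """
    state, count = None, 0
    while True:
        success, error, new_state = await attempt(state)
        if success or is_fatal(error) or count >= max_resurrections or not new_state:
            return success, error, count
        count += 1
        state = new_state  # carry memory and resume point into the next attempt
        await asyncio.sleep(cooldown)  # the diff uses a fixed 2.0 second cooldown
```

With `max_resurrections=3` a persistently failing execution runs four times in total: the original attempt plus three resurrections.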
@@ -0,0 +1,85 @@
"""HIVE_LLM_DEBUG — write every LLM turn to a JSONL file for replay/debugging.
Set the env var to enable:
HIVE_LLM_DEBUG=1 writes to ~/.hive/llm_logs/<ts>.jsonl
HIVE_LLM_DEBUG=/some/path writes to that directory
Each line is a JSON object with the full LLM turn: assistant text, tool calls,
tool results, and token counts. The file is opened lazily on first call and
flushed after every write. Errors are silently swallowed; this must never
break the agent.
"""
import json
import logging
import os
from datetime import datetime
from pathlib import Path
from typing import IO, Any
logger = logging.getLogger(__name__)
_LLM_DEBUG_RAW = os.environ.get("HIVE_LLM_DEBUG", "").strip()
_LLM_DEBUG_ENABLED = _LLM_DEBUG_RAW.lower() in ("1", "true") or (
bool(_LLM_DEBUG_RAW) and _LLM_DEBUG_RAW.lower() not in ("0", "false", "")
)
_log_file: IO[str] | None = None
_log_ready = False # lazy init guard
def _open_log() -> IO[str] | None:
"""Open a JSONL log file. Returns None if disabled."""
if not _LLM_DEBUG_ENABLED:
return None
raw = _LLM_DEBUG_RAW
if raw.lower() in ("1", "true"):
log_dir = Path.home() / ".hive" / "llm_logs"
else:
log_dir = Path(raw)
log_dir.mkdir(parents=True, exist_ok=True)
ts = datetime.now().strftime("%Y%m%d_%H%M%S")
path = log_dir / f"{ts}.jsonl"
logger.info("LLM debug log → %s", path)
return open(path, "a", encoding="utf-8") # noqa: SIM115
def log_llm_turn(
*,
node_id: str,
stream_id: str,
execution_id: str,
iteration: int,
assistant_text: str,
tool_calls: list[dict[str, Any]],
tool_results: list[dict[str, Any]],
token_counts: dict[str, Any],
) -> None:
"""Write one JSONL line capturing a complete LLM turn.
No-op when HIVE_LLM_DEBUG is not set. Never raises.
"""
if not _LLM_DEBUG_ENABLED:
return
try:
global _log_file, _log_ready # noqa: PLW0603
if not _log_ready:
_log_file = _open_log()
_log_ready = True
if _log_file is None:
return
record = {
"timestamp": datetime.now().isoformat(),
"node_id": node_id,
"stream_id": stream_id,
"execution_id": execution_id,
"iteration": iteration,
"assistant_text": assistant_text,
"tool_calls": tool_calls,
"tool_results": tool_results,
"token_counts": token_counts,
}
_log_file.write(json.dumps(record, default=str) + "\n")
_log_file.flush()
except Exception:
pass # never break the agent
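How the module interprets `HIVE_LLM_DEBUG` can be condensed into a single function, which may be easier to reason about than the two module-level flags. This is a sketch only: `debug_log_dir` is a hypothetical name, and it returns the target directory without creating it or opening a file.

```python
from pathlib import Path
from typing import Optional


def debug_log_dir(raw: str) -> Optional[Path]:
    """Map a HIVE_LLM_DEBUG value to a log directory, or None when disabled."""
    value = raw.strip()
    if not value or value.lower() in ("0", "false"):
        return None  # unset or explicitly disabled
    if value.lower() in ("1", "true"):
        return Path.home() / ".hive" / "llm_logs"  # default location
    return Path(value)  # any other value is treated as a directory path
```

Any value other than the four recognised switches is taken as a path, so a typo such as `HIVE_LLM_DEBUG=yes` would log into a directory named `yes` relative to the working directory.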
@@ -24,6 +24,8 @@ class ToolCallLog(BaseModel):
tool_input: dict[str, Any] = Field(default_factory=dict)
result: str = ""
is_error: bool = False
start_timestamp: str = "" # ISO 8601 timestamp when tool execution started
duration_s: float = 0.0 # Wall-clock execution time in seconds
class NodeStepLog(BaseModel):
+2
@@ -114,6 +114,8 @@ class RuntimeLogger:
tool_input=tc.get("tool_input", {}),
result=tc.get("content", ""),
is_error=tc.get("is_error", False),
start_timestamp=tc.get("start_timestamp", ""),
duration_s=tc.get("duration_s", 0.0),
)
)
@@ -821,5 +821,148 @@ class TestTimerEntryPoints:
await runtime.stop()
# === Cancel All Tasks Tests ===
class TestCancelAllTasks:
"""Tests for cancel_all_tasks and cancel_all_tasks_async."""
@pytest.mark.asyncio
async def test_cancel_all_tasks_async_returns_false_when_no_tasks(
self, sample_graph, sample_goal, temp_storage
):
"""Test that cancel_all_tasks_async returns False with no running tasks."""
runtime = AgentRuntime(
graph=sample_graph,
goal=sample_goal,
storage_path=temp_storage,
)
entry_spec = EntryPointSpec(
id="webhook",
name="Webhook",
entry_node="process-webhook",
trigger_type="webhook",
)
runtime.register_entry_point(entry_spec)
await runtime.start()
try:
result = await runtime.cancel_all_tasks_async()
assert result is False
finally:
await runtime.stop()
@pytest.mark.asyncio
async def test_cancel_all_tasks_async_cancels_running_task(
self, sample_graph, sample_goal, temp_storage
):
"""Test that cancel_all_tasks_async cancels a running task and returns True."""
runtime = AgentRuntime(
graph=sample_graph,
goal=sample_goal,
storage_path=temp_storage,
)
entry_spec = EntryPointSpec(
id="webhook",
name="Webhook",
entry_node="process-webhook",
trigger_type="webhook",
)
runtime.register_entry_point(entry_spec)
await runtime.start()
try:
# Inject a fake running task into the stream
stream = runtime._streams["webhook"]
async def hang_forever():
await asyncio.get_event_loop().create_future()
fake_task = asyncio.ensure_future(hang_forever())
stream._execution_tasks["fake-exec"] = fake_task
result = await runtime.cancel_all_tasks_async()
assert result is True
# Let the CancelledError propagate
try:
await fake_task
except asyncio.CancelledError:
pass
assert fake_task.cancelled()
# Clean up
del stream._execution_tasks["fake-exec"]
finally:
await runtime.stop()
@pytest.mark.asyncio
async def test_cancel_all_tasks_async_cancels_multiple_tasks_across_streams(
self, sample_graph, sample_goal, temp_storage
):
"""Test that cancel_all_tasks_async cancels tasks across multiple streams."""
runtime = AgentRuntime(
graph=sample_graph,
goal=sample_goal,
storage_path=temp_storage,
)
# Register two entry points so we get two streams
runtime.register_entry_point(
EntryPointSpec(
id="stream-a",
name="Stream A",
entry_node="process-webhook",
trigger_type="webhook",
)
)
runtime.register_entry_point(
EntryPointSpec(
id="stream-b",
name="Stream B",
entry_node="process-webhook",
trigger_type="webhook",
)
)
await runtime.start()
try:
async def hang_forever():
await asyncio.get_running_loop().create_future()
stream_a = runtime._streams["stream-a"]
stream_b = runtime._streams["stream-b"]
# Two tasks in stream A, one task in stream B
task_a1 = asyncio.ensure_future(hang_forever())
task_a2 = asyncio.ensure_future(hang_forever())
task_b1 = asyncio.ensure_future(hang_forever())
stream_a._execution_tasks["exec-a1"] = task_a1
stream_a._execution_tasks["exec-a2"] = task_a2
stream_b._execution_tasks["exec-b1"] = task_b1
result = await runtime.cancel_all_tasks_async()
assert result is True
# Let CancelledErrors propagate
for task in [task_a1, task_a2, task_b1]:
try:
await task
except asyncio.CancelledError:
pass
assert task.cancelled()
# Clean up
del stream_a._execution_tasks["exec-a1"]
del stream_a._execution_tasks["exec-a2"]
del stream_b._execution_tasks["exec-b1"]
finally:
await runtime.stop()
if __name__ == "__main__":
pytest.main([__file__, "-v"])
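The tests above rely on one pattern: park a task on a future that never resolves, cancel it, then await it so the `CancelledError` is delivered before asserting. A minimal standalone sketch of that pattern follows; it assumes nothing about `AgentRuntime`, and `demo` is a hypothetical name used only for illustration.

```python
import asyncio


async def hang_forever() -> None:
    # A future that is never resolved keeps the task pending until cancelled.
    await asyncio.get_running_loop().create_future()


async def demo() -> bool:
    task = asyncio.ensure_future(hang_forever())
    await asyncio.sleep(0)  # yield once so the task starts running
    task.cancel()
    try:
        await task  # let the CancelledError propagate into the task
    except asyncio.CancelledError:
        pass
    return task.cancelled()


was_cancelled = asyncio.run(demo())
```

Awaiting the task after `cancel()` matters: `cancel()` only requests cancellation, and `cancelled()` reports `True` only once the task has actually finished with `CancelledError`.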
+53 -7
@@ -11,6 +11,52 @@ from framework.server.session_manager import Session, SessionManager
logger = logging.getLogger(__name__)
# Anchor to the repository root so allowed roots are independent of CWD.
# app.py lives at core/framework/server/app.py, so four .parent calls
# reach the repo root where exports/ and examples/ live.
_REPO_ROOT = Path(__file__).resolve().parent.parent.parent.parent
_ALLOWED_AGENT_ROOTS: tuple[Path, ...] | None = None
def _get_allowed_agent_roots() -> tuple[Path, ...]:
"""Return resolved allowed root directories for agent loading.
Roots are anchored to the repository root (derived from ``__file__``)
so the allowlist is correct regardless of the process's working
directory.
"""
global _ALLOWED_AGENT_ROOTS
if _ALLOWED_AGENT_ROOTS is None:
_ALLOWED_AGENT_ROOTS = (
(_REPO_ROOT / "exports").resolve(),
(_REPO_ROOT / "examples").resolve(),
(Path.home() / ".hive" / "agents").resolve(),
)
return _ALLOWED_AGENT_ROOTS
def validate_agent_path(agent_path: str | Path) -> Path:
"""Validate that an agent path resolves inside an allowed directory.
Prevents arbitrary code execution via ``importlib.import_module`` by
restricting agent loading to known safe directories: ``exports/``,
``examples/``, and ``~/.hive/agents/``.
Returns the resolved ``Path`` on success.
Raises:
ValueError: If the path is outside all allowed roots.
"""
resolved = Path(agent_path).expanduser().resolve()
for root in _get_allowed_agent_roots():
if resolved.is_relative_to(root) and resolved != root:
return resolved
raise ValueError(
"agent_path must be inside an allowed directory (exports/, examples/, or ~/.hive/agents/)"
)
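The allowlist check in `validate_agent_path` can be tried in isolation. The sketch below is an illustration under stated assumptions, not the project's code: `validate_under_roots` and `is_allowed` are hypothetical names, and a temporary directory stands in for `exports/`.

```python
import tempfile
from pathlib import Path


def validate_under_roots(candidate: str, roots: tuple) -> Path:
    """Return the resolved path if it lies strictly inside one of ``roots``."""
    resolved = Path(candidate).expanduser().resolve()
    for root in roots:
        # is_relative_to compares path components only, so both sides
        # must already be resolved for the check to be meaningful.
        if resolved.is_relative_to(root) and resolved != root:
            return resolved
    raise ValueError("path is outside all allowed roots")


def is_allowed(candidate: str, roots: tuple) -> bool:
    try:
        validate_under_roots(candidate, roots)
        return True
    except ValueError:
        return False


# Stand-in for the real allowed roots (hypothetical layout).
root = (Path(tempfile.mkdtemp()) / "exports").resolve()
(root / "my_agent").mkdir(parents=True)
roots = (root,)

inside_ok = is_allowed(str(root / "my_agent"), roots)
root_itself_ok = is_allowed(str(root), roots)
traversal_ok = is_allowed(str(root / "my_agent" / ".." / ".." / "secret"), roots)
```

Resolving before comparing is what defeats `..` traversal: the third candidate collapses to a sibling of `exports/` and is rejected, and the root itself is rejected by the `resolved != root` guard.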
def safe_path_segment(value: str) -> str:
"""Validate a URL path parameter is a safe filesystem name.
@@ -18,7 +64,7 @@ def safe_path_segment(value: str) -> str:
traversal sequences. aiohttp decodes ``%2F`` inside route params,
so a raw ``{session_id}`` can contain ``/`` or ``..`` after decoding.
"""
if "/" in value or "\\" in value or ".." in value:
if not value or value == "." or "/" in value or "\\" in value or ".." in value:
raise web.HTTPBadRequest(reason="Invalid path parameter")
return value
@@ -130,10 +176,7 @@ def create_app(model: str | None = None) -> web.Application:
"""
app = web.Application(middlewares=[cors_middleware, error_middleware])
# Store manager on app for handlers
app["manager"] = SessionManager(model=model)
# Initialize credential store
# Initialize credential store (before SessionManager so it can be shared)
from framework.credentials.store import CredentialStore
try:
@@ -154,10 +197,13 @@ def create_app(model: str | None = None) -> web.Application:
except Exception as exc:
logger.warning("Could not auto-persist HIVE_CREDENTIAL_KEY: %s", exc)
app["credential_store"] = CredentialStore.with_aden_sync()
credential_store = CredentialStore.with_aden_sync()
except Exception:
logger.debug("Encrypted credential store unavailable, using in-memory fallback")
app["credential_store"] = CredentialStore.for_testing({})
credential_store = CredentialStore.for_testing({})
app["credential_store"] = credential_store
app["manager"] = SessionManager(model=model, credential_store=credential_store)
# Register shutdown hook
app.on_shutdown.append(_on_shutdown)
@@ -8,6 +8,7 @@ from pydantic import SecretStr
from framework.credentials.models import CredentialKey, CredentialObject
from framework.credentials.store import CredentialStore
from framework.server.app import validate_agent_path
logger = logging.getLogger(__name__)
@@ -128,6 +129,11 @@ async def handle_check_agent(request: web.Request) -> web.Response:
if not agent_path:
return web.json_response({"error": "agent_path is required"}, status=400)
try:
agent_path = str(validate_agent_path(agent_path))
except ValueError as e:
return web.json_response({"error": str(e)}, status=400)
try:
from framework.credentials.setup import load_agent_nodes
from framework.credentials.validation import (
+53 -4
@@ -4,6 +4,7 @@ import asyncio
import logging
from aiohttp import web
from aiohttp.client_exceptions import ClientConnectionResetError as _AiohttpConnReset
from framework.runtime.event_bus import EventType
from framework.server.app import resolve_session
@@ -37,6 +38,8 @@ DEFAULT_EVENT_TYPES = [
EventType.CONTEXT_COMPACTED,
EventType.WORKER_LOADED,
EventType.CREDENTIALS_REQUIRED,
EventType.SUBAGENT_REPORT,
EventType.QUEEN_MODE_CHANGED,
]
# Keepalive interval in seconds
@@ -90,13 +93,26 @@ async def handle_events(request: web.Request) -> web.StreamResponse:
"node_loop_started",
"credentials_required",
"worker_loaded",
"queen_mode_changed",
}
client_disconnected = asyncio.Event()
async def on_event(event) -> None:
"""Push event dict into queue; drop non-critical events if full."""
if client_disconnected.is_set():
return
evt_dict = event.to_dict()
if evt_dict.get("type") in _CRITICAL_EVENTS:
await queue.put(evt_dict) # block rather than drop
try:
queue.put_nowait(evt_dict)
except asyncio.QueueFull:
logger.warning(
"SSE client queue full on critical event; disconnecting session='%s'",
session.id,
)
client_disconnected.set()
else:
try:
queue.put_nowait(evt_dict)
@@ -117,10 +133,33 @@ async def handle_events(request: web.Request) -> web.StreamResponse:
"SSE connected: session='%s', sub_id='%s', types=%d", session.id, sub_id, len(event_types)
)
# Replay buffered events that were published before this SSE connected.
# The EventBus keeps a history ring-buffer; we replay the subset that
# produces visible chat messages so the frontend never misses early
# queen output. Lifecycle events are NOT replayed to avoid duplicate
# state transitions (turn counter increments, etc.).
_REPLAY_TYPES = {
EventType.CLIENT_OUTPUT_DELTA.value,
EventType.EXECUTION_STARTED.value,
EventType.CLIENT_INPUT_REQUESTED.value,
}
event_type_values = {et.value for et in event_types}
replay_types = _REPLAY_TYPES & event_type_values
replayed = 0
for past_event in event_bus._event_history:
if past_event.type.value in replay_types:
try:
queue.put_nowait(past_event.to_dict())
replayed += 1
except asyncio.QueueFull:
break
if replayed:
logger.info("SSE replayed %d buffered events for session='%s'", replayed, session.id)
event_count = 0
close_reason = "unknown"
try:
while True:
while not client_disconnected.is_set():
try:
data = await asyncio.wait_for(queue.get(), timeout=KEEPALIVE_INTERVAL)
await sse.send_event(data)
@@ -130,13 +169,23 @@ async def handle_events(request: web.Request) -> web.StreamResponse:
"SSE first event: session='%s', type='%s'", session.id, data.get("type")
)
except TimeoutError:
await sse.send_keepalive()
except (ConnectionResetError, ConnectionError):
try:
await sse.send_keepalive()
except (ConnectionResetError, ConnectionError, _AiohttpConnReset):
close_reason = "client_disconnected"
break
except Exception as exc:
close_reason = f"keepalive_error: {exc}"
break
except (ConnectionResetError, ConnectionError, _AiohttpConnReset):
close_reason = "client_disconnected"
break
except Exception as exc:
close_reason = f"error: {exc}"
break
if client_disconnected.is_set() and close_reason == "unknown":
close_reason = "slow_client"
except asyncio.CancelledError:
close_reason = "cancelled"
finally:
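The queue policy introduced in `on_event` (never block the publisher; drop non-critical events when the queue is full; flag the client as disconnected when a critical event cannot be queued) can be sketched on its own. This is a simplified illustration: `make_handler`, `CRITICAL`, and `demo` are hypothetical names, and the event dicts are invented for the example.

```python
import asyncio

# Assumed subset of critical event types, for illustration only.
CRITICAL = {"worker_loaded"}


def make_handler(queue: asyncio.Queue, disconnected: asyncio.Event):
    async def on_event(evt: dict) -> None:
        if disconnected.is_set():
            return  # client already flagged as gone; ignore further events
        try:
            queue.put_nowait(evt)
        except asyncio.QueueFull:
            if evt.get("type") in CRITICAL:
                # Losing a critical event would desync the client, so
                # signal a disconnect instead of blocking the publisher.
                disconnected.set()
            # Non-critical events are silently dropped.

    return on_event


async def demo() -> tuple:
    queue = asyncio.Queue(maxsize=2)
    disconnected = asyncio.Event()
    on_event = make_handler(queue, disconnected)
    for _ in range(3):  # the third delta does not fit and is dropped
        await on_event({"type": "delta"})
    still_connected = not disconnected.is_set()
    await on_event({"type": "worker_loaded"})  # queue full: disconnect
    return queue.qsize(), still_connected, disconnected.is_set()


result = asyncio.run(demo())
```

The trade-off is that a slow client is disconnected and must reconnect, rather than stalling the event bus for every other subscriber.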
+159 -29
@@ -64,6 +64,16 @@ async def handle_trigger(request: web.Request) -> web.Response:
session_state=session_state,
)
# Cancel queen's in-progress LLM turn so it picks up the mode change cleanly
if session.queen_executor:
node = session.queen_executor.node_registry.get("queen")
if node and hasattr(node, "cancel_current_turn"):
node.cancel_current_turn()
# Switch queen to running mode (mirrors run_agent_with_input tool behavior)
if session.mode_state is not None:
await session.mode_state.switch_to_running(source="frontend")
return web.json_response({"execution_id": execution_id})
@@ -92,12 +102,10 @@ async def handle_inject(request: web.Request) -> web.Response:
async def handle_chat(request: web.Request) -> web.Response:
"""POST /api/sessions/{session_id}/chat — convenience endpoint.
"""POST /api/sessions/{session_id}/chat — send a message to the queen.
Routing priority:
1. Worker awaiting input → inject into worker node
2. Queen active → inject into queen conversation
3. Error → no handler available
The input box is permanently connected to the queen agent.
Worker input is handled separately via /worker-input.
Body: {"message": "hello"}
"""
@@ -111,26 +119,6 @@ async def handle_chat(request: web.Request) -> web.Response:
if not message:
return web.json_response({"error": "message is required"}, status=400)
# 1. Check if worker is awaiting input → inject to worker
if session.worker_runtime:
node_id, graph_id = session.worker_runtime.find_awaiting_node()
if node_id:
delivered = await session.worker_runtime.inject_input(
node_id,
message,
graph_id=graph_id,
is_client_input=True,
)
return web.json_response(
{
"status": "injected",
"node_id": node_id,
"delivered": delivered,
}
)
# 2. Queen active → inject into queen conversation
queen_executor = session.queen_executor
if queen_executor is not None:
node = queen_executor.node_registry.get("queen")
@@ -143,8 +131,76 @@ async def handle_chat(request: web.Request) -> web.Response:
}
)
# 3. No queen or worker available
return web.json_response({"error": "No worker or queen available"}, status=503)
return web.json_response({"error": "Queen not available"}, status=503)
async def handle_queen_context(request: web.Request) -> web.Response:
"""POST /api/sessions/{session_id}/queen-context — queue context for the queen.
Unlike /chat, this does NOT trigger an LLM response. The message is
queued in the queen's injection queue and will be drained on her next
natural iteration (prefixed with [External event]:).
Body: {"message": "..."}
"""
session, err = resolve_session(request)
if err:
return err
body = await request.json()
message = body.get("message", "")
if not message:
return web.json_response({"error": "message is required"}, status=400)
queen_executor = session.queen_executor
if queen_executor is not None:
node = queen_executor.node_registry.get("queen")
if node is not None and hasattr(node, "inject_event"):
await node.inject_event(message, is_client_input=False)
return web.json_response({"status": "queued", "delivered": True})
return web.json_response({"error": "Queen not available"}, status=503)
async def handle_worker_input(request: web.Request) -> web.Response:
"""POST /api/sessions/{session_id}/worker-input — send input to waiting worker node.
Auto-discovers the worker node currently awaiting input and injects the message.
Returns 404 if no worker node is awaiting input.
Body: {"message": "..."}
"""
session, err = resolve_session(request)
if err:
return err
body = await request.json()
message = body.get("message", "")
if not message:
return web.json_response({"error": "message is required"}, status=400)
if not session.worker_runtime:
return web.json_response({"error": "No worker loaded"}, status=503)
node_id, graph_id = session.worker_runtime.find_awaiting_node()
if not node_id:
return web.json_response({"error": "No worker node awaiting input"}, status=404)
delivered = await session.worker_runtime.inject_input(
node_id,
message,
graph_id=graph_id,
is_client_input=True,
)
return web.json_response(
{
"status": "injected",
"node_id": node_id,
"delivered": delivered,
}
)
async def handle_goal_progress(request: web.Request) -> web.Response:
@@ -190,7 +246,7 @@ async def handle_resume(request: web.Request) -> web.Response:
return web.json_response({"error": "Session not found"}, status=404)
try:
state = json.loads(state_path.read_text())
state = json.loads(state_path.read_text(encoding="utf-8"))
except (json.JSONDecodeError, OSError) as e:
return web.json_response({"error": f"Failed to read session: {e}"}, status=500)
@@ -232,6 +288,60 @@ async def handle_resume(request: web.Request) -> web.Response:
)
async def handle_pause(request: web.Request) -> web.Response:
"""POST /api/sessions/{session_id}/pause — pause the worker (queen stays alive).
Mirrors the queen's stop_worker() tool: cancels all active worker
executions, pauses timers so nothing auto-restarts, but does NOT
touch the queen so she can observe and react to the pause.
"""
session, err = resolve_session(request)
if err:
return err
if not session.worker_runtime:
return web.json_response({"error": "No worker loaded in this session"}, status=503)
runtime = session.worker_runtime
cancelled = []
for graph_id in runtime.list_graphs():
reg = runtime.get_graph_registration(graph_id)
if reg is None:
continue
for _ep_id, stream in reg.streams.items():
# Signal shutdown on active nodes to abort in-flight LLM streams
for executor in stream._active_executors.values():
for node in executor.node_registry.values():
if hasattr(node, "signal_shutdown"):
node.signal_shutdown()
if hasattr(node, "cancel_current_turn"):
node.cancel_current_turn()
for exec_id in list(stream.active_execution_ids):
try:
ok = await stream.cancel_execution(exec_id)
if ok:
cancelled.append(exec_id)
except Exception:
pass
# Pause timers so the next tick doesn't restart execution
runtime.pause_timers()
# Switch to staging (agent still loaded, ready to re-run)
if session.mode_state is not None:
await session.mode_state.switch_to_staging(source="frontend")
return web.json_response(
{
"stopped": bool(cancelled),
"cancelled": cancelled,
"timers_paused": True,
}
)
async def handle_stop(request: web.Request) -> web.Response:
"""POST /api/sessions/{session_id}/stop — cancel a running execution.
@@ -255,8 +365,26 @@ async def handle_stop(request: web.Request) -> web.Response:
if reg is None:
continue
for _ep_id, stream in reg.streams.items():
# Signal shutdown on active nodes to abort in-flight LLM streams
for executor in stream._active_executors.values():
for node in executor.node_registry.values():
if hasattr(node, "signal_shutdown"):
node.signal_shutdown()
if hasattr(node, "cancel_current_turn"):
node.cancel_current_turn()
cancelled = await stream.cancel_execution(execution_id)
if cancelled:
# Cancel queen's in-progress LLM turn
if session.queen_executor:
node = session.queen_executor.node_registry.get("queen")
if node and hasattr(node, "cancel_current_turn"):
node.cancel_current_turn()
# Switch to staging (agent still loaded, ready to re-run)
if session.mode_state is not None:
await session.mode_state.switch_to_staging(source="frontend")
return web.json_response(
{
"stopped": True,
@@ -340,7 +468,9 @@ def register_routes(app: web.Application) -> None:
app.router.add_post("/api/sessions/{session_id}/trigger", handle_trigger)
app.router.add_post("/api/sessions/{session_id}/inject", handle_inject)
app.router.add_post("/api/sessions/{session_id}/chat", handle_chat)
app.router.add_post("/api/sessions/{session_id}/pause", handle_stop)
app.router.add_post("/api/sessions/{session_id}/queen-context", handle_queen_context)
app.router.add_post("/api/sessions/{session_id}/worker-input", handle_worker_input)
app.router.add_post("/api/sessions/{session_id}/pause", handle_pause)
app.router.add_post("/api/sessions/{session_id}/resume", handle_resume)
app.router.add_post("/api/sessions/{session_id}/stop", handle_stop)
app.router.add_post("/api/sessions/{session_id}/cancel-queen", handle_cancel_queen)
+8 -1
@@ -45,6 +45,7 @@ def _node_to_dict(node) -> dict:
"client_facing": node.client_facing,
"success_criteria": node.success_criteria,
"system_prompt": node.system_prompt or "",
"sub_agents": node.sub_agents,
}
@@ -79,7 +80,7 @@ async def handle_list_nodes(request: web.Request) -> web.Response:
)
if state_path.exists():
try:
state = json.loads(state_path.read_text())
state = json.loads(state_path.read_text(encoding="utf-8"))
progress = state.get("progress", {})
visit_counts = progress.get("node_visit_counts", {})
failures = progress.get("nodes_with_failures", [])
@@ -99,6 +100,7 @@ async def handle_list_nodes(request: web.Request) -> web.Response:
{"source": e.source, "target": e.target, "condition": e.condition, "priority": e.priority}
for e in graph.edges
]
rt = session.worker_runtime
entry_points = [
{
"id": ep.id,
@@ -106,6 +108,11 @@ async def handle_list_nodes(request: web.Request) -> web.Response:
"entry_node": ep.entry_node,
"trigger_type": ep.trigger_type,
"trigger_config": ep.trigger_config,
**(
{"next_fire_in": nf}
if rt and (nf := rt.get_timer_next_fire_in(ep.id)) is not None
else {}
),
}
for ep in reg.entry_points.values()
]
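The `next_fire_in` field above is added only when a timer value exists, using dict unpacking with an assignment expression. A minimal sketch of that idiom follows; `entry_point_dict` and the `next_fire` mapping are hypothetical stand-ins for the runtime lookup.

```python
def entry_point_dict(ep_id: str, next_fire: dict) -> dict:
    return {
        "id": ep_id,
        # The condition is evaluated first, so ``nf`` is bound before the
        # ``{"next_fire_in": nf}`` branch is built. An empty dict unpacks
        # to nothing, which omits the key entirely.
        **({"next_fire_in": nf} if (nf := next_fire.get(ep_id)) is not None else {}),
    }


timers = {"timer": 12.5, "due-now": 0.0}
with_timer = entry_point_dict("timer", timers)
without_timer = entry_point_dict("webhook", timers)
due_now = entry_point_dict("due-now", timers)
```

Comparing against `None` rather than relying on truthiness keeps a timer that is due immediately (`0.0`) in the output.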
+131 -24
@@ -30,7 +30,12 @@ from pathlib import Path
from aiohttp import web
from framework.server.app import resolve_session, safe_path_segment, sessions_dir
from framework.server.app import (
resolve_session,
safe_path_segment,
sessions_dir,
validate_agent_path,
)
from framework.server.session_manager import SessionManager
logger = logging.getLogger(__name__)
@@ -43,6 +48,7 @@ def _get_manager(request: web.Request) -> SessionManager:
def _session_to_live_dict(session) -> dict:
"""Serialize a live Session to the session-primary JSON shape."""
info = session.worker_info
mode_state = getattr(session, "mode_state", None)
return {
"session_id": session.id,
"worker_id": session.worker_id,
@@ -55,6 +61,7 @@ def _session_to_live_dict(session) -> dict:
"loaded_at": session.loaded_at,
"uptime_seconds": round(time.time() - session.loaded_at, 1),
"intro_message": getattr(session.runner, "intro_message", "") or "",
"queen_mode": mode_state.mode if mode_state else "building",
}
@@ -117,6 +124,15 @@ async def handle_create_session(request: web.Request) -> web.Response:
session_id = body.get("session_id")
model = body.get("model")
initial_prompt = body.get("initial_prompt")
# When set, the queen writes conversations to this existing session's directory
# so the full history accumulates in one place across server restarts.
queen_resume_from = body.get("queen_resume_from")
if agent_path:
try:
agent_path = str(validate_agent_path(agent_path))
except ValueError as e:
return web.json_response({"error": str(e)}, status=400)
try:
if agent_path:
@@ -126,6 +142,7 @@ async def handle_create_session(request: web.Request) -> web.Response:
agent_id=agent_id,
model=model,
initial_prompt=initial_prompt,
queen_resume_from=queen_resume_from,
)
else:
# Queen-only session
@@ -133,6 +150,7 @@ async def handle_create_session(request: web.Request) -> web.Response:
session_id=session_id,
model=model,
initial_prompt=initial_prompt,
queen_resume_from=queen_resume_from,
)
except ValueError as e:
msg = str(e)
@@ -143,14 +161,17 @@ async def handle_create_session(request: web.Request) -> web.Response:
status=409,
)
return web.json_response({"error": msg}, status=409)
except FileNotFoundError as e:
return web.json_response({"error": str(e)}, status=404)
except FileNotFoundError:
return web.json_response(
{"error": f"Agent not found: {agent_path or 'no path'}"},
status=404,
)
except Exception as e:
resp = _credential_error_response(e, agent_path)
if resp is not None:
return resp
logger.exception("Error creating session: %s", e)
return web.json_response({"error": str(e)}, status=500)
return web.json_response({"error": "Internal server error"}, status=500)
return web.json_response(_session_to_live_dict(session), status=201)
@@ -163,7 +184,12 @@ async def handle_list_live_sessions(request: web.Request) -> web.Response:
async def handle_get_live_session(request: web.Request) -> web.Response:
"""GET /api/sessions/{session_id} — get session detail."""
"""GET /api/sessions/{session_id} — get session detail.
Falls back to cold session metadata (HTTP 200 with ``cold: true``) when the
session is not alive in memory but queen conversation files exist on disk.
This lets the frontend detect a server restart and restore message history.
"""
manager = _get_manager(request)
session_id = request.match_info["session_id"]
session = manager.get_session(session_id)
@@ -174,6 +200,10 @@ async def handle_get_live_session(request: web.Request) -> web.Response:
{"session_id": session_id, "loading": True},
status=202,
)
# Check if conversation files survived on disk (post-restart scenario)
cold_info = SessionManager.get_cold_session_info(session_id)
if cold_info is not None:
return web.json_response(cold_info)
return web.json_response(
{"error": f"Session '{session_id}' not found"},
status=404,
@@ -182,6 +212,7 @@ async def handle_get_live_session(request: web.Request) -> web.Response:
data = _session_to_live_dict(session)
if session.worker_runtime:
rt = session.worker_runtime
data["entry_points"] = [
{
"id": ep.id,
@@ -189,8 +220,13 @@ async def handle_get_live_session(request: web.Request) -> web.Response:
"entry_node": ep.entry_node,
"trigger_type": ep.trigger_type,
"trigger_config": ep.trigger_config,
**(
{"next_fire_in": nf}
if (nf := rt.get_timer_next_fire_in(ep.id)) is not None
else {}
),
}
for ep in session.worker_runtime.get_entry_points()
for ep in rt.get_entry_points()
]
data["graphs"] = session.worker_runtime.list_graphs()
@@ -230,6 +266,11 @@ async def handle_load_worker(request: web.Request) -> web.Response:
if not agent_path:
return web.json_response({"error": "agent_path is required"}, status=400)
try:
agent_path = str(validate_agent_path(agent_path))
except ValueError as e:
return web.json_response({"error": str(e)}, status=400)
worker_id = body.get("worker_id")
model = body.get("model")
@@ -242,14 +283,14 @@ async def handle_load_worker(request: web.Request) -> web.Response:
)
except ValueError as e:
return web.json_response({"error": str(e)}, status=409)
except FileNotFoundError as e:
return web.json_response({"error": str(e)}, status=404)
except FileNotFoundError:
return web.json_response({"error": f"Agent not found: {agent_path}"}, status=404)
except Exception as e:
resp = _credential_error_response(e, agent_path)
if resp is not None:
return resp
logger.exception("Error loading worker: %s", e)
return web.json_response({"error": str(e)}, status=500)
return web.json_response({"error": "Internal server error"}, status=500)
return web.json_response(_session_to_live_dict(session))
@@ -308,7 +349,8 @@ async def handle_session_entry_points(request: web.Request) -> web.Response:
status=404,
)
eps = session.worker_runtime.get_entry_points() if session.worker_runtime else []
rt = session.worker_runtime
eps = rt.get_entry_points() if rt else []
return web.json_response(
{
"entry_points": [
@@ -318,6 +360,11 @@ async def handle_session_entry_points(request: web.Request) -> web.Response:
"entry_node": ep.entry_node,
"trigger_type": ep.trigger_type,
"trigger_config": ep.trigger_config,
**(
{"next_fire_in": nf}
if rt and (nf := rt.get_timer_next_fire_in(ep.id)) is not None
else {}
),
}
for ep in eps
]
@@ -369,7 +416,7 @@ async def handle_list_worker_sessions(request: web.Request) -> web.Response:
state_path = d / "state.json"
if state_path.exists():
try:
state = json.loads(state_path.read_text())
state = json.loads(state_path.read_text(encoding="utf-8"))
entry["status"] = state.get("status", "unknown")
entry["started_at"] = state.get("started_at")
entry["completed_at"] = state.get("completed_at")
@@ -408,7 +455,7 @@ async def handle_get_worker_session(request: web.Request) -> web.Response:
return web.json_response({"error": "Session not found"}, status=404)
try:
state = json.loads(state_path.read_text())
state = json.loads(state_path.read_text(encoding="utf-8"))
except (json.JSONDecodeError, OSError) as e:
return web.json_response({"error": f"Failed to read session: {e}"}, status=500)
@@ -436,7 +483,7 @@ async def handle_list_checkpoints(request: web.Request) -> web.Response:
if f.suffix != ".json":
continue
try:
data = json.loads(f.read_text())
data = json.loads(f.read_text(encoding="utf-8"))
checkpoints.append(
{
"checkpoint_id": f.stem,
@@ -546,13 +593,14 @@ async def handle_messages(request: web.Request) -> web.Response:
if part_file.suffix != ".json":
continue
try:
part = json.loads(part_file.read_text())
part = json.loads(part_file.read_text(encoding="utf-8"))
part["_node_id"] = node_dir.name
part.setdefault("created_at", part_file.stat().st_mtime)
all_messages.append(part)
except (json.JSONDecodeError, OSError):
continue
all_messages.sort(key=lambda m: m.get("seq", 0))
all_messages.sort(key=lambda m: m.get("created_at", m.get("seq", 0)))
client_only = request.query.get("client_only", "").lower() in ("true", "1")
if client_only:
@@ -579,15 +627,17 @@ async def handle_messages(request: web.Request) -> web.Response:
async def handle_queen_messages(request: web.Request) -> web.Response:
"""GET /api/sessions/{session_id}/queen-messages — get queen conversation."""
session, err = resolve_session(request)
if err:
return err
"""GET /api/sessions/{session_id}/queen-messages — get queen conversation.
queen_dir = Path.home() / ".hive" / "queen" / "session" / session.id
Reads directly from disk so it works for both live sessions and cold
(post-server-restart) sessions; no live session required.
"""
# Reject traversal sequences before building a filesystem path
session_id = safe_path_segment(request.match_info["session_id"])
queen_dir = Path.home() / ".hive" / "queen" / "session" / session_id
convs_dir = queen_dir / "conversations"
if not convs_dir.exists():
return web.json_response({"messages": []})
return web.json_response({"messages": [], "session_id": session_id})
all_messages: list[dict] = []
for node_dir in convs_dir.iterdir():
@@ -600,13 +650,16 @@ async def handle_queen_messages(request: web.Request) -> web.Response:
if part_file.suffix != ".json":
continue
try:
part = json.loads(part_file.read_text())
part = json.loads(part_file.read_text(encoding="utf-8"))
part["_node_id"] = node_dir.name
# Use file mtime as created_at so frontend can order
# queen and worker messages chronologically.
part.setdefault("created_at", part_file.stat().st_mtime)
all_messages.append(part)
except (json.JSONDecodeError, OSError):
continue
all_messages.sort(key=lambda m: m.get("seq", 0))
all_messages.sort(key=lambda m: m.get("created_at", m.get("seq", 0)))
# Filter to client-facing messages only
all_messages = [
@@ -617,7 +670,58 @@ async def handle_queen_messages(request: web.Request) -> web.Response:
and not (m["role"] == "assistant" and m.get("tool_calls"))
]
return web.json_response({"messages": all_messages})
return web.json_response({"messages": all_messages, "session_id": session_id})
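The change of sort key from `seq` to `created_at` matters when messages from different conversations are merged, because each conversation numbers its own `seq` independently. The sketch below uses invented message dicts to show the effect; the field values are assumptions for illustration only.

```python
messages = [
    {"seq": 2, "created_at": 1700000002.0, "content": "worker reply"},
    {"seq": 1, "created_at": 1700000003.0, "content": "queen follow-up"},
    {"seq": 5, "created_at": 1700000001.0, "content": "queen greeting"},
]

# Same key as in the handler: prefer the timestamp, fall back to seq.
by_time = sorted(messages, key=lambda m: m.get("created_at", m.get("seq", 0)))
by_seq = sorted(messages, key=lambda m: m.get("seq", 0))

time_order = [m["content"] for m in by_time]
seq_order = [m["content"] for m in by_seq]
```

Note that the fallback to `seq` only makes sense if either all or none of the messages carry `created_at`; mixing epoch timestamps with small sequence numbers in one sort would push every `seq`-only message to the front.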
async def handle_session_history(request: web.Request) -> web.Response:
"""GET /api/sessions/history — all queen sessions on disk (live + cold).
Returns every session directory under ~/.hive/queen/session/, newest first.
Live sessions have ``live: true, cold: false``; sessions that survived a
server restart have ``live: false, cold: true``.
"""
manager = _get_manager(request)
live_sessions = {s.id: s for s in manager.list_sessions()}
disk_sessions = SessionManager.list_cold_sessions()
for s in disk_sessions:
if s["session_id"] in live_sessions:
live = live_sessions[s["session_id"]]
s["cold"] = False
s["live"] = True
# Fill in agent_name from live memory if meta.json wasn't written yet
if not s.get("agent_name") and live.worker_info:
s["agent_name"] = live.worker_info.name
if not s.get("agent_path") and live.worker_path:
s["agent_path"] = str(live.worker_path)
return web.json_response({"sessions": disk_sessions})
async def handle_delete_history_session(request: web.Request) -> web.Response:
"""DELETE /api/sessions/history/{session_id} — permanently remove a session.
Stops the live session (if still running) and deletes the queen session
directory from disk at ~/.hive/queen/session/{session_id}/.
This is the frontend 'delete from history' action.
"""
manager = _get_manager(request)
# Reject traversal sequences before the id is used in shutil.rmtree
session_id = safe_path_segment(request.match_info["session_id"])
# Stop the live session if it exists (best-effort)
if manager.get_session(session_id):
await manager.stop_session(session_id)
# Delete the queen session directory from disk
queen_session_dir = Path.home() / ".hive" / "queen" / "session" / session_id
if queen_session_dir.exists() and queen_session_dir.is_dir():
try:
shutil.rmtree(queen_session_dir)
except OSError as e:
logger.warning("Failed to delete session directory %s: %s", queen_session_dir, e)
return web.json_response({"error": f"Failed to delete session: {e}"}, status=500)
return web.json_response({"deleted": session_id})
# ------------------------------------------------------------------
@@ -666,6 +770,9 @@ def register_routes(app: web.Application) -> None:
# Session lifecycle
app.router.add_post("/api/sessions", handle_create_session)
app.router.add_get("/api/sessions", handle_list_live_sessions)
# history must be registered before {session_id} so it takes priority
app.router.add_get("/api/sessions/history", handle_session_history)
app.router.add_delete("/api/sessions/history/{session_id}", handle_delete_history_session)
app.router.add_get("/api/sessions/{session_id}", handle_get_live_session)
app.router.add_delete("/api/sessions/{session_id}", handle_stop_session)
+300 -26
@@ -40,9 +40,16 @@ class Session:
runner: Any | None = None # AgentRunner
worker_runtime: Any | None = None # AgentRuntime
worker_info: Any | None = None # AgentInfo
# Queen mode state (building/staging/running)
mode_state: Any = None # QueenModeState
# Judge (active when worker is loaded)
judge_task: asyncio.Task | None = None
escalation_sub: str | None = None
# Session directory resumption:
# When set, _start_queen writes queen conversations to this existing session's
# directory instead of creating a new one. This lets cold-restores accumulate
# all messages in the original session folder so history is never fragmented.
queen_resume_from: str | None = None
class SessionManager:
@@ -52,10 +59,11 @@ class SessionManager:
(blocking I/O) then started on the event loop.
"""
def __init__(self, model: str | None = None) -> None:
def __init__(self, model: str | None = None, credential_store=None) -> None:
self._sessions: dict[str, Session] = {}
self._loading: set[str] = set()
self._model = model
self._credential_store = credential_store
self._lock = asyncio.Lock()
# ------------------------------------------------------------------
@@ -111,18 +119,25 @@ class SessionManager:
session_id: str | None = None,
model: str | None = None,
initial_prompt: str | None = None,
queen_resume_from: str | None = None,
) -> Session:
"""Create a new session with a queen but no worker.
The queen starts immediately with MCP coding tools.
A worker can be loaded later via load_worker().
When ``queen_resume_from`` is set the queen writes conversation messages
to that existing session's directory instead of creating a new one.
This preserves full conversation history across server restarts.
"""
session = await self._create_session_core(session_id=session_id, model=model)
session.queen_resume_from = queen_resume_from
# Start queen immediately (queen-only, no worker tools yet)
await self._start_queen(session, worker_identity=None, initial_prompt=initial_prompt)
logger.info("Session '%s' created (queen-only)", session.id)
logger.info(
"Session '%s' created (queen-only, resume_from=%s)",
session.id,
queen_resume_from,
)
return session
async def create_session_with_worker(
@@ -131,15 +146,12 @@ class SessionManager:
agent_id: str | None = None,
model: str | None = None,
initial_prompt: str | None = None,
queen_resume_from: str | None = None,
) -> Session:
"""Create a session and load a worker in one step.
Backward-compatible with the old POST /api/agents flow.
Loads the worker FIRST so the queen starts with full lifecycle
and monitoring tools available.
The session gets an auto-generated unique ID. The agent name
becomes the worker_id (used by the frontend as backendAgentId).
When ``queen_resume_from`` is set the queen writes conversation messages
to that existing session's directory instead of creating a new one.
"""
from framework.tools.queen_lifecycle_tools import build_worker_profile
@@ -148,6 +160,7 @@ class SessionManager:
# Auto-generate session ID (not the agent name)
session = await self._create_session_core(model=model)
session.queen_resume_from = queen_resume_from
try:
# Load worker FIRST (before queen) so queen gets full tools
await self._load_worker_core(
@@ -167,10 +180,6 @@ class SessionManager:
session, worker_identity=worker_identity, initial_prompt=initial_prompt
)
# Health judge disabled for simplicity.
# if agent_path.name != "hive_coder" and session.worker_runtime:
# await self._start_judge(session, session.runner._storage_path)
except Exception:
# If anything fails, tear down the session
await self.stop_session(session.id)
@@ -217,6 +226,7 @@ class SessionManager:
model=resolved_model,
interactive=False,
skip_credential_validation=True,
credential_store=self._credential_store,
),
)
@@ -277,13 +287,13 @@ class SessionManager:
if not state_path.exists():
continue
try:
state = json.loads(state_path.read_text())
state = json.loads(state_path.read_text(encoding="utf-8"))
if state.get("status") != "active":
continue
state["status"] = "cancelled"
state.setdefault("result", {})["error"] = "Stale session: runtime restarted"
state.setdefault("timestamps", {})["updated_at"] = datetime.now().isoformat()
state_path.write_text(json.dumps(state, indent=2))
state_path.write_text(json.dumps(state, indent=2), encoding="utf-8")
logger.info(
"Marked stale session '%s' as cancelled for agent '%s'", d.name, agent_path.name
)
@@ -397,7 +407,12 @@ class SessionManager:
worker_identity: str | None,
initial_prompt: str | None = None,
) -> None:
"""Start the queen executor for a session."""
"""Start the queen executor for a session.
When ``session.queen_resume_from`` is set, queen conversation messages
are written to the ORIGINAL session's directory so the full conversation
history accumulates in one place across server restarts.
"""
from framework.agents.hive_coder.agent import (
queen_goal,
queen_graph as _queen_graph,
@@ -407,9 +422,41 @@ class SessionManager:
from framework.runtime.core import Runtime
hive_home = Path.home() / ".hive"
queen_dir = hive_home / "queen" / "session" / session.id
# Determine which session directory to use for queen storage.
# When queen_resume_from is set we write to the ORIGINAL session's
# directory so that all messages accumulate in one place.
storage_session_id = session.queen_resume_from or session.id
queen_dir = hive_home / "queen" / "session" / storage_session_id
queen_dir.mkdir(parents=True, exist_ok=True)
# Always write/update session metadata so history sidebar has correct
# agent name, path, and last-active timestamp (important so the original
# session directory sorts as "most recent" after a cold-restore resume).
_meta_path = queen_dir / "meta.json"
try:
_agent_name = (
session.worker_info.name
if session.worker_info
else (
str(session.worker_path.name).replace("_", " ").title()
if session.worker_path
else None
)
)
_meta_path.write_text(
json.dumps(
{
"agent_name": _agent_name,
"agent_path": str(session.worker_path) if session.worker_path else None,
"created_at": time.time(),
}
),
encoding="utf-8",
)
except OSError:
pass
# Register MCP coding tools
queen_registry = ToolRegistry()
import framework.agents.hive_coder as _hive_coder_pkg
@@ -423,16 +470,26 @@ class SessionManager:
except Exception:
logger.warning("Queen: MCP config failed to load", exc_info=True)
# Mode state for building/running mode switching
from framework.tools.queen_lifecycle_tools import (
QueenModeState,
register_queen_lifecycle_tools,
)
# Start in staging when the caller provided an agent, building otherwise.
initial_mode = "staging" if worker_identity else "building"
mode_state = QueenModeState(mode=initial_mode, event_bus=session.event_bus)
session.mode_state = mode_state
# Always register lifecycle tools — they check session.worker_runtime
# at call time, so they work even if no worker is loaded yet.
from framework.tools.queen_lifecycle_tools import register_queen_lifecycle_tools
register_queen_lifecycle_tools(
queen_registry,
session=session,
session_id=session.id,
session_manager=self,
manager_session_id=session.id,
mode_state=mode_state,
)
# Monitoring tools need concrete worker paths — only register when present
@@ -450,6 +507,32 @@ class SessionManager:
queen_tools = list(queen_registry.get_tools().values())
queen_tool_executor = queen_registry.get_executor()
# Partition tools into mode-specific sets
from framework.agents.hive_coder.nodes import (
_QUEEN_BUILDING_TOOLS,
_QUEEN_RUNNING_TOOLS,
_QUEEN_STAGING_TOOLS,
)
building_names = set(_QUEEN_BUILDING_TOOLS)
staging_names = set(_QUEEN_STAGING_TOOLS)
running_names = set(_QUEEN_RUNNING_TOOLS)
registered_names = {t.name for t in queen_tools}
missing_building = building_names - registered_names
if missing_building:
logger.warning(
"Queen: %d/%d building tools NOT registered: %s",
len(missing_building),
len(building_names),
sorted(missing_building),
)
logger.info("Queen: registered tools: %s", sorted(registered_names))
mode_state.building_tools = [t for t in queen_tools if t.name in building_names]
mode_state.staging_tools = [t for t in queen_tools if t.name in staging_names]
mode_state.running_tools = [t for t in queen_tools if t.name in running_names]
# Build queen graph with adjusted prompt + tools
_orig_node = _queen_graph.nodes[0]
base_prompt = _orig_node.system_prompt or ""
@@ -491,20 +574,51 @@ class SessionManager:
storage_path=queen_dir,
loop_config=queen_graph.loop_config,
execution_id=session.id,
dynamic_tools_provider=mode_state.get_current_tools,
)
session.queen_executor = executor
logger.info(
"Queen starting with %d tools: %s",
len(queen_tools),
[t.name for t in queen_tools],
# Wire inject_notification so mode switches notify the queen LLM
async def _inject_mode_notification(content: str) -> None:
node = executor.node_registry.get("queen")
if node is not None and hasattr(node, "inject_event"):
await node.inject_event(content)
mode_state.inject_notification = _inject_mode_notification
# Auto-switch to staging when worker execution finishes naturally
from framework.runtime.event_bus import EventType as _ET
async def _on_worker_done(event):
if event.stream_id == "queen":
return
if mode_state.mode == "running":
await mode_state.switch_to_staging(source="auto")
session.event_bus.subscribe(
event_types=[_ET.EXECUTION_COMPLETED, _ET.EXECUTION_FAILED],
handler=_on_worker_done,
)
await executor.execute(
logger.info(
"Queen starting in %s mode with %d tools: %s",
mode_state.mode,
len(mode_state.get_current_tools()),
[t.name for t in mode_state.get_current_tools()],
)
result = await executor.execute(
graph=queen_graph,
goal=queen_goal,
input_data={"greeting": initial_prompt or "Session started."},
session_state={"resume_session_id": session.id},
)
logger.warning("Queen executor returned (should be forever-alive)")
if result.success:
logger.warning("Queen executor returned (should be forever-alive)")
else:
logger.error(
"Queen executor failed: %s",
result.error or "(no error message)",
)
except Exception:
logger.error("Queen conversation crashed", exc_info=True)
finally:
@@ -703,6 +817,166 @@ class SessionManager:
def list_sessions(self) -> list[Session]:
return list(self._sessions.values())
# ------------------------------------------------------------------
# Cold session helpers (disk-only, no live runtime required)
# ------------------------------------------------------------------
@staticmethod
def get_cold_session_info(session_id: str) -> dict | None:
"""Return disk metadata for a session that is no longer live in memory.
Checks whether queen conversation files exist at
~/.hive/queen/session/{session_id}/conversations/. Returns None when
no data is found so callers can fall through to a 404.
"""
queen_dir = Path.home() / ".hive" / "queen" / "session" / session_id
convs_dir = queen_dir / "conversations"
if not convs_dir.exists():
return None
# Check whether any message part files are actually present
has_messages = False
try:
for node_dir in convs_dir.iterdir():
if not node_dir.is_dir():
continue
parts_dir = node_dir / "parts"
if parts_dir.exists() and any(f.suffix == ".json" for f in parts_dir.iterdir()):
has_messages = True
break
except OSError:
pass
try:
created_at = queen_dir.stat().st_ctime
except OSError:
created_at = 0.0
# Read extra metadata written at session start
agent_name: str | None = None
agent_path: str | None = None
meta_path = queen_dir / "meta.json"
if meta_path.exists():
try:
meta = json.loads(meta_path.read_text(encoding="utf-8"))
agent_name = meta.get("agent_name")
agent_path = meta.get("agent_path")
created_at = meta.get("created_at") or created_at
except (json.JSONDecodeError, OSError):
pass
return {
"session_id": session_id,
"cold": True,
"live": False,
"has_messages": has_messages,
"created_at": created_at,
"agent_name": agent_name,
"agent_path": agent_path,
}
@staticmethod
def list_cold_sessions() -> list[dict]:
"""Return metadata for every queen session directory on disk, newest first."""
queen_sessions_dir = Path.home() / ".hive" / "queen" / "session"
if not queen_sessions_dir.exists():
return []
results: list[dict] = []
try:
entries = sorted(
queen_sessions_dir.iterdir(),
key=lambda p: p.stat().st_mtime,
reverse=True,
)
except OSError:
return []
for d in entries:
if not d.is_dir():
continue
try:
created_at = d.stat().st_ctime
except OSError:
created_at = 0.0
agent_name: str | None = None
agent_path: str | None = None
meta_path = d / "meta.json"
if meta_path.exists():
try:
meta = json.loads(meta_path.read_text(encoding="utf-8"))
agent_name = meta.get("agent_name")
agent_path = meta.get("agent_path")
created_at = meta.get("created_at") or created_at
except (json.JSONDecodeError, OSError):
pass
# Build a quick preview of the last human/assistant exchange.
# We read all conversation parts, filter to client-facing messages,
# and return the last assistant message content as a snippet.
last_message: str | None = None
message_count: int = 0
convs_dir = d / "conversations"
if convs_dir.exists():
try:
all_parts: list[dict] = []
for node_dir in convs_dir.iterdir():
if not node_dir.is_dir():
continue
parts_dir = node_dir / "parts"
if not parts_dir.exists():
continue
for part_file in sorted(parts_dir.iterdir()):
if part_file.suffix != ".json":
continue
try:
part = json.loads(part_file.read_text(encoding="utf-8"))
part.setdefault("created_at", part_file.stat().st_mtime)
all_parts.append(part)
except (json.JSONDecodeError, OSError):
continue
# Filter to client-facing messages only
client_msgs = [
p
for p in all_parts
if not p.get("is_transition_marker")
and p.get("role") != "tool"
and not (p.get("role") == "assistant" and p.get("tool_calls"))
]
client_msgs.sort(key=lambda m: m.get("created_at", m.get("seq", 0)))
message_count = len(client_msgs)
# Last assistant message as preview snippet
for msg in reversed(client_msgs):
content = msg.get("content") or ""
if isinstance(content, list):
# Anthropic-style content blocks
content = " ".join(
b.get("text", "")
for b in content
if isinstance(b, dict) and b.get("type") == "text"
)
if content and msg.get("role") == "assistant":
last_message = content[:120].strip()
break
except OSError:
pass
results.append(
{
"session_id": d.name,
"cold": True, # caller overrides for live sessions
"live": False,
"has_messages": convs_dir.exists() and message_count > 0,
"created_at": created_at,
"agent_name": agent_name,
"agent_path": agent_path,
"last_message": last_message,
"message_count": message_count,
}
)
return results
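Review note: the preview logic in list_cold_sessions can be read in isolation as a small pure function. The following is a minimal sketch, assuming message parts are plain dicts with the keys seen in the diff (`role`, `content`, `tool_calls`, `is_transition_marker`, `created_at`); the helper name `preview_from_parts` and the `limit` parameter are hypothetical and not part of the change.

```python
from __future__ import annotations


def preview_from_parts(parts: list[dict], limit: int = 120) -> tuple[int, str | None]:
    """Return (client-facing message count, last assistant text snippet)."""
    # Drop transition markers, tool results, and assistant turns that only
    # carry tool calls, mirroring the filter used in the diff.
    client_msgs = [
        p
        for p in parts
        if not p.get("is_transition_marker")
        and p.get("role") != "tool"
        and not (p.get("role") == "assistant" and p.get("tool_calls"))
    ]
    client_msgs.sort(key=lambda m: m.get("created_at", m.get("seq", 0)))

    for msg in reversed(client_msgs):
        content = msg.get("content") or ""
        if isinstance(content, list):
            # Content-block style: keep only the text blocks.
            content = " ".join(
                b.get("text", "")
                for b in content
                if isinstance(b, dict) and b.get("type") == "text"
            )
        if content and msg.get("role") == "assistant":
            return len(client_msgs), content[:limit].strip()
    return len(client_msgs), None


# Illustrative input only; real parts are read from disk in the diff.
parts = [
    {"role": "user", "content": "hi", "created_at": 1},
    {"role": "assistant", "content": "", "tool_calls": [{"id": "x"}], "created_at": 2},
    {"role": "tool", "content": "result", "created_at": 3},
    {"role": "assistant", "content": [{"type": "text", "text": "Done."}], "created_at": 4},
]
count, snippet = preview_from_parts(parts)
```

Keeping the filter free of file I/O would make it straightforward to unit test separately from the directory walk.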
async def shutdown_all(self) -> None:
"""Gracefully stop all sessions. Called on server shutdown."""
session_ids = list(self._sessions.keys())
+26 -13
@@ -74,6 +74,7 @@ class MockStream:
is_awaiting_input: bool = False
_execution_tasks: dict = field(default_factory=dict)
_active_executors: dict = field(default_factory=dict)
active_execution_ids: set = field(default_factory=set)
async def cancel_execution(self, execution_id: str) -> bool:
return execution_id in self._execution_tasks
@@ -117,6 +118,9 @@ class MockRuntime:
async def inject_input(self, node_id, content, graph_id=None, *, is_client_input=False):
return True
def pause_timers(self):
pass
async def get_goal_progress(self):
return {"progress": 0.5, "criteria": []}
@@ -537,18 +541,8 @@ class TestExecution:
assert resp.status == 400
@pytest.mark.asyncio
async def test_pause_not_found(self):
session = _make_session()
app = _make_app_with_session(session)
async with TestClient(TestServer(app)) as client:
resp = await client.post(
"/api/sessions/test_agent/pause",
json={"execution_id": "nonexistent"},
)
assert resp.status == 404
@pytest.mark.asyncio
async def test_pause_missing_execution_id(self):
async def test_pause_no_active_executions(self):
"""Pause with no active executions returns stopped=False."""
session = _make_session()
app = _make_app_with_session(session)
async with TestClient(TestServer(app)) as client:
@@ -556,7 +550,26 @@ class TestExecution:
"/api/sessions/test_agent/pause",
json={},
)
assert resp.status == 400
assert resp.status == 200
data = await resp.json()
assert data["stopped"] is False
assert data["cancelled"] == []
assert data["timers_paused"] is True
@pytest.mark.asyncio
async def test_pause_does_not_cancel_queen(self):
"""Pause should stop the worker but leave the queen running."""
session = _make_session()
app = _make_app_with_session(session)
async with TestClient(TestServer(app)) as client:
resp = await client.post(
"/api/sessions/test_agent/pause",
json={},
)
assert resp.status == 200
# Queen's cancel_current_turn should NOT have been called
queen_node = session.queen_executor.node_registry["queen"]
queen_node.cancel_current_turn.assert_not_called()
@pytest.mark.asyncio
async def test_goal_progress(self):
+4 -2
@@ -95,7 +95,7 @@ class CheckpointStore:
return None
try:
return Checkpoint.model_validate_json(checkpoint_path.read_text())
return Checkpoint.model_validate_json(checkpoint_path.read_text(encoding="utf-8"))
except Exception as e:
logger.error(f"Failed to load checkpoint {checkpoint_id}: {e}")
return None
@@ -123,7 +123,9 @@ class CheckpointStore:
return None
try:
return CheckpointIndex.model_validate_json(self.index_path.read_text())
return CheckpointIndex.model_validate_json(
self.index_path.read_text(encoding="utf-8")
)
except Exception as e:
logger.error(f"Failed to load checkpoint index: {e}")
return None
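Review note: the repeated `encoding="utf-8"` additions matter because `Path.read_text()` and `Path.write_text()` fall back to the platform's locale encoding when none is given, which is often not UTF-8 on Windows. The following is a minimal sketch of an explicit round trip; the helper names `save_state` and `load_state` and the sample payload are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def save_state(path: Path, state: dict) -> None:
    # ensure_ascii=False keeps non-ASCII characters literal, so the bytes on
    # disk really depend on the encoding chosen here.
    path.write_text(json.dumps(state, ensure_ascii=False, indent=2), encoding="utf-8")


def load_state(path: Path) -> dict:
    # Read with the same explicit encoding used for writing.
    return json.loads(path.read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    state_path = Path(tmp) / "state.json"
    save_state(state_path, {"status": "active", "note": "café ☕"})
    restored = load_state(state_path)
```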
+2 -2
@@ -114,7 +114,7 @@ class SessionStore:
if not state_path.exists():
return None
return SessionState.model_validate_json(state_path.read_text())
return SessionState.model_validate_json(state_path.read_text(encoding="utf-8"))
return await asyncio.to_thread(_read)
@@ -151,7 +151,7 @@ class SessionStore:
continue
try:
state = SessionState.model_validate_json(state_path.read_text())
state = SessionState.model_validate_json(state_path.read_text(encoding="utf-8"))
# Apply filters
if status and state.status != status:
+2 -2
@@ -270,10 +270,10 @@ def _edit_test_code(code: str) -> str:
try:
# Open editor
subprocess.run([editor, temp_path], check=True)
subprocess.run([editor, temp_path], check=True, encoding="utf-8")
# Read edited code
with open(temp_path) as f:
with open(temp_path, encoding="utf-8") as f:
return f.read()
except subprocess.CalledProcessError:
print("Editor failed, keeping original code")
+34 -3
@@ -11,10 +11,35 @@ Provides commands:
import argparse
import ast
import os
import shutil
import subprocess
import sys
from pathlib import Path
def _check_pytest_available() -> bool:
"""Check if pytest is available as a runnable command.
Returns True if pytest is found, otherwise prints an error message
with install instructions and returns False.
"""
if shutil.which("pytest") is None:
print(
"Error: pytest is not installed or not on PATH.\n"
"Hive's testing commands require pytest at runtime.\n"
"Install it with:\n"
"\n"
" pip install 'framework[testing]'\n"
"\n"
"or if using uv:\n"
"\n"
" uv pip install 'framework[testing]'",
file=sys.stderr,
)
return False
return True
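Review note: `shutil.which` only consults PATH, so this check answers "is there a `pytest` executable on PATH", not "is the `pytest` package importable". The following is a minimal sketch of the same guard generalized to any command; the helper name `require_command` and its `install_hint` parameter are hypothetical.

```python
import shutil
import sys


def require_command(name: str, install_hint: str) -> bool:
    """Return True if `name` resolves on PATH; otherwise print a hint to stderr."""
    if shutil.which(name) is None:
        print(
            f"Error: {name} is not installed or not on PATH.\n{install_hint}",
            file=sys.stderr,
        )
        return False
    return True


# A command name that is assumed not to exist on any machine.
found = require_command(
    "definitely-not-a-real-command-xyz",
    "Install it with your package manager.",
)
```

If the commands were ever changed to invoke pytest as a module of the current interpreter, an import-based check would be the closer match; that is a design option, not something this diff does.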
def register_testing_commands(subparsers: argparse._SubParsersAction) -> None:
"""Register testing CLI commands."""
@@ -105,6 +130,9 @@ def register_testing_commands(subparsers: argparse._SubParsersAction) -> None:
def cmd_test_run(args: argparse.Namespace) -> int:
"""Run tests for an agent using pytest subprocess."""
if not _check_pytest_available():
return 1
agent_path = Path(args.agent_path)
tests_dir = agent_path / "tests"
@@ -162,6 +190,7 @@ def cmd_test_run(args: argparse.Namespace) -> int:
try:
result = subprocess.run(
cmd,
encoding="utf-8",
env=env,
timeout=600, # 10 minute timeout
)
@@ -177,7 +206,8 @@ def cmd_test_run(args: argparse.Namespace) -> int:
def cmd_test_debug(args: argparse.Namespace) -> int:
"""Debug a failed test by re-running with verbose output."""
import subprocess
if not _check_pytest_available():
return 1
agent_path = Path(args.agent_path)
test_name = args.test_name
@@ -190,7 +220,7 @@ def cmd_test_debug(args: argparse.Namespace) -> int:
# Find which file contains the test
test_file = None
for py_file in tests_dir.glob("test_*.py"):
content = py_file.read_text()
content = py_file.read_text(encoding="utf-8")
if f"def {test_name}" in content or f"async def {test_name}" in content:
test_file = py_file
break
@@ -219,6 +249,7 @@ def cmd_test_debug(args: argparse.Namespace) -> int:
try:
result = subprocess.run(
cmd,
encoding="utf-8",
env=env,
timeout=120, # 2 minute timeout for single test
)
@@ -238,7 +269,7 @@ def _scan_test_files(tests_dir: Path) -> list[dict]:
for test_file in sorted(tests_dir.glob("test_*.py")):
try:
content = test_file.read_text()
content = test_file.read_text(encoding="utf-8")
tree = ast.parse(content)
for node in ast.walk(tree):
+759 -74
@@ -36,13 +36,14 @@ from __future__ import annotations
import asyncio
import json
import logging
from dataclasses import dataclass
from dataclasses import dataclass, field
from pathlib import Path
from typing import TYPE_CHECKING, Any
from framework.credentials.models import CredentialError
from framework.credentials.validation import validate_agent_credentials
from framework.runner.preload_validation import credential_errors_to_json, validate_credentials
from framework.runtime.event_bus import AgentEvent, EventType
from framework.server.app import validate_agent_path
if TYPE_CHECKING:
from framework.runner.tool_registry import ToolRegistry
@@ -65,6 +66,125 @@ class WorkerSessionAdapter:
worker_path: Path | None = None
@dataclass
class QueenModeState:
"""Mutable state container for queen operating mode.
Three modes: building -> staging -> running.
Shared between the dynamic_tools_provider callback and tool handlers
that trigger mode transitions.
"""
mode: str = "building" # "building", "staging", or "running"
building_tools: list = field(default_factory=list) # list[Tool]
staging_tools: list = field(default_factory=list) # list[Tool]
running_tools: list = field(default_factory=list) # list[Tool]
inject_notification: Any = None # async (str) -> None
event_bus: Any = None # EventBus — for emitting QUEEN_MODE_CHANGED events
def get_current_tools(self) -> list:
"""Return tools for the current mode."""
if self.mode == "running":
return list(self.running_tools)
if self.mode == "staging":
return list(self.staging_tools)
return list(self.building_tools)
async def _emit_mode_event(self) -> None:
"""Publish a QUEEN_MODE_CHANGED event so the frontend updates the tag."""
if self.event_bus is not None:
await self.event_bus.publish(
AgentEvent(
type=EventType.QUEEN_MODE_CHANGED,
stream_id="queen",
data={"mode": self.mode},
)
)
async def switch_to_running(self, source: str = "tool") -> None:
"""Switch to running mode and notify the queen.
Args:
source: Who triggered the switch: "tool" (queen LLM),
"frontend" (user clicked Run), or "auto" (system).
"""
if self.mode == "running":
return
self.mode = "running"
tool_names = [t.name for t in self.running_tools]
logger.info("Queen mode → running (source=%s, tools: %s)", source, tool_names)
await self._emit_mode_event()
if self.inject_notification:
if source == "frontend":
msg = (
"[MODE CHANGE] The user clicked Run in the UI. Switched to RUNNING mode. "
"Worker is now executing. You have monitoring/lifecycle tools: "
+ ", ".join(tool_names)
+ "."
)
else:
msg = (
"[MODE CHANGE] Switched to RUNNING mode. "
"Worker is executing. You now have monitoring/lifecycle tools: "
+ ", ".join(tool_names)
+ "."
)
await self.inject_notification(msg)
async def switch_to_staging(self, source: str = "tool") -> None:
"""Switch to staging mode and notify the queen.
Args:
source: Who triggered the switch: "tool", "frontend", or "auto".
"""
if self.mode == "staging":
return
self.mode = "staging"
tool_names = [t.name for t in self.staging_tools]
logger.info("Queen mode → staging (source=%s, tools: %s)", source, tool_names)
await self._emit_mode_event()
if self.inject_notification:
if source == "frontend":
msg = (
"[MODE CHANGE] The user stopped the worker from the UI. "
"Switched to STAGING mode. Agent is still loaded. "
"Available tools: " + ", ".join(tool_names) + "."
)
elif source == "auto":
msg = (
"[MODE CHANGE] Worker execution completed. Switched to STAGING mode. "
"Agent is still loaded. Call run_agent_with_input(task) to run again. "
"Available tools: " + ", ".join(tool_names) + "."
)
else:
msg = (
"[MODE CHANGE] Switched to STAGING mode. "
"Agent loaded and ready. Call run_agent_with_input(task) to start, "
"or stop_worker_and_edit() to go back to building. "
"Available tools: " + ", ".join(tool_names) + "."
)
await self.inject_notification(msg)
async def switch_to_building(self, source: str = "tool") -> None:
"""Switch to building mode and notify the queen.
Args:
source: Who triggered the switch: "tool", "frontend", or "auto".
"""
if self.mode == "building":
return
self.mode = "building"
tool_names = [t.name for t in self.building_tools]
logger.info("Queen mode → building (source=%s, tools: %s)", source, tool_names)
await self._emit_mode_event()
if self.inject_notification:
await self.inject_notification(
"[MODE CHANGE] Switched to BUILDING mode. "
"Lifecycle tools removed. Full coding tools restored. "
"Call load_built_agent(path) when ready to stage."
)
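Review note: the core of QueenModeState is a mode-keyed tool lookup plus idempotent transitions that notify once per real change. The following is a simplified sketch of that pattern, not the class from this diff: `ModeState`, `tools_by_mode`, `switch_to`, and the `write_file` tool name are hypothetical, tools are plain strings, and notifications are collected in a list instead of being injected into the queen.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class ModeState:
    mode: str = "building"
    tools_by_mode: dict[str, list[str]] = field(default_factory=dict)
    notifications: list[str] = field(default_factory=list)

    def get_current_tools(self) -> list[str]:
        # Return a copy so callers cannot mutate the per-mode lists.
        return list(self.tools_by_mode.get(self.mode, []))

    async def switch_to(self, new_mode: str) -> None:
        if self.mode == new_mode:
            return  # already in this mode: no event, no notification
        self.mode = new_mode
        self.notifications.append(f"[MODE CHANGE] Switched to {new_mode.upper()} mode.")


async def demo() -> tuple[list[str], list[str], int]:
    state = ModeState(
        tools_by_mode={
            "building": ["write_file"],  # hypothetical coding tool
            "staging": ["run_agent_with_input"],
            "running": ["stop_worker"],
        }
    )
    before = state.get_current_tools()
    await state.switch_to("staging")
    await state.switch_to("staging")  # second call is a no-op
    return before, state.get_current_tools(), len(state.notifications)


before, after, sent = asyncio.run(demo())
```

Because the executor asks for tools through a callback on every turn, changing `mode` is enough to change what the model can call; no re-registration is needed.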
def build_worker_profile(runtime: AgentRuntime, agent_path: Path | str | None = None) -> str:
"""Build a worker capability profile from its graph/goal definition.
@@ -119,6 +239,8 @@ def register_queen_lifecycle_tools(
# Server context — enables load_built_agent tool
session_manager: Any = None,
manager_session_id: str | None = None,
# Mode switching
mode_state: QueenModeState | None = None,
) -> int:
"""Register queen lifecycle tools.
@@ -135,6 +257,9 @@ def register_queen_lifecycle_tools(
for ``load_built_agent`` to hot-load a worker.
manager_session_id: (Server only) The session's ID in the manager,
used with ``session_manager.load_worker()``.
mode_state: (Optional) Mutable mode state for building/running
mode switching. When provided, load_built_agent switches to
running mode and stop_worker_and_edit switches to building mode.
Returns the number of tools registered.
"""
@@ -158,6 +283,11 @@ def register_queen_lifecycle_tools(
# --- start_worker ---------------------------------------------------------
# How long to wait for credential validation + MCP resync before
# proceeding with trigger anyway. These are pre-flight checks that
# should not block the queen indefinitely.
_START_PREFLIGHT_TIMEOUT = 15 # seconds
async def start_worker(task: str) -> str:
"""Start the worker agent with a task description.
@@ -169,25 +299,50 @@ def register_queen_lifecycle_tools(
return json.dumps({"error": "No worker loaded in this session."})
try:
# Validate credentials before running — same deferred check as
# handle_trigger. Runs in executor because validate_agent_credentials
# makes blocking HTTP health-check calls.
# Pre-flight: validate credentials and resync MCP servers.
# Both are blocking I/O (HTTP health-checks, subprocess spawns)
# so they run in a thread-pool executor. We cap the total
# preflight time so the queen never hangs waiting.
loop = asyncio.get_running_loop()
await loop.run_in_executor(
None, lambda: validate_agent_credentials(runtime.graph.nodes)
)
# Resync MCP servers if credentials were added since the worker loaded
# (e.g. user connected an OAuth account mid-session via Aden UI).
runner = getattr(session, "runner", None)
if runner:
async def _preflight():
cred_error: CredentialError | None = None
try:
await loop.run_in_executor(
None,
lambda: runner._tool_registry.resync_mcp_servers_if_needed(),
lambda: validate_credentials(
runtime.graph.nodes,
interactive=False,
skip=False,
),
)
except Exception as e:
logger.warning("MCP resync failed: %s", e)
except CredentialError as e:
cred_error = e
runner = getattr(session, "runner", None)
if runner:
try:
await loop.run_in_executor(
None,
lambda: runner._tool_registry.resync_mcp_servers_if_needed(),
)
except Exception as e:
logger.warning("MCP resync failed: %s", e)
# Re-raise CredentialError after MCP resync so both steps
# get a chance to run before we bail.
if cred_error is not None:
raise cred_error
try:
await asyncio.wait_for(_preflight(), timeout=_START_PREFLIGHT_TIMEOUT)
except TimeoutError:
logger.warning(
"start_worker preflight timed out after %ds — proceeding with trigger",
_START_PREFLIGHT_TIMEOUT,
)
except CredentialError:
raise # handled below
# Resume timers in case they were paused by a previous stop_worker
runtime.resume_timers()
@@ -213,6 +368,11 @@ def register_queen_lifecycle_tools(
}
)
except CredentialError as e:
# Build structured error with per-credential details so the
# queen can report exactly what's missing and how to fix it.
error_payload = credential_errors_to_json(e)
error_payload["agent_path"] = str(getattr(session, "worker_path", "") or "")
# Emit SSE event so the frontend opens the credentials modal
bus = getattr(session, "event_bus", None)
if bus is not None:
@@ -220,14 +380,10 @@ def register_queen_lifecycle_tools(
AgentEvent(
type=EventType.CREDENTIALS_REQUIRED,
stream_id="queen",
data={
"error": "credentials_required",
"message": str(e),
"agent_path": str(getattr(session, "worker_path", "") or ""),
},
data=error_payload,
)
)
return json.dumps({"error": "credentials_required", "message": str(e)})
return json.dumps(error_payload)
except Exception as e:
return json.dumps({"error": f"Failed to start worker: {e}"})
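Review note: the preflight cap combines `run_in_executor` with `asyncio.wait_for`. Two details are worth keeping in mind. A timeout stops the awaiting coroutine, but the worker thread keeps running until the blocking call returns. And on Python versions before 3.11, `asyncio.TimeoutError` is a distinct class from the builtin `TimeoutError`, so catching the asyncio name is the portable spelling. The following is a minimal sketch; `slow_check` and `preflight_with_cap` are hypothetical stand-ins for the blocking validation calls.

```python
import asyncio
import time


def slow_check() -> str:
    # Stand-in for a blocking call such as an HTTP health check.
    time.sleep(0.3)
    return "checked"


async def preflight_with_cap(timeout: float) -> str:
    loop = asyncio.get_running_loop()
    try:
        # Run the blocking call in the default thread pool, but stop
        # waiting for it once the timeout elapses.
        return await asyncio.wait_for(
            loop.run_in_executor(None, slow_check), timeout=timeout
        )
    except asyncio.TimeoutError:
        # The thread is not interrupted; only the wait is abandoned.
        return "timed out, proceeding"


fast = asyncio.run(preflight_with_cap(timeout=2.0))
capped = asyncio.run(preflight_with_cap(timeout=0.05))
```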
@@ -254,30 +410,40 @@ def register_queen_lifecycle_tools(
# --- stop_worker ----------------------------------------------------------
async def stop_worker() -> str:
"""Cancel all active worker executions.
"""Cancel all active worker executions across all graphs.
Stops the worker gracefully. Returns the IDs of cancelled executions.
Stops the worker immediately. Returns the IDs of cancelled executions.
"""
runtime = _get_runtime()
if runtime is None:
return json.dumps({"error": "No worker loaded in this session."})
cancelled = []
graph_id = runtime.graph_id
# Get the primary graph's streams
reg = runtime.get_graph_registration(graph_id)
if reg is None:
return json.dumps({"error": "Worker graph not found"})
# Iterate ALL registered graphs — multiple entrypoint requests
# can spawn executions in different graphs within the same session.
for graph_id in runtime.list_graphs():
reg = runtime.get_graph_registration(graph_id)
if reg is None:
continue
for _ep_id, stream in reg.streams.items():
for exec_id in list(stream.active_execution_ids):
try:
ok = await stream.cancel_execution(exec_id)
if ok:
cancelled.append(exec_id)
except Exception as e:
logger.warning("Failed to cancel %s: %s", exec_id, e)
for _ep_id, stream in reg.streams.items():
# Signal shutdown on all active EventLoopNodes first so they
# exit cleanly and cancel their in-flight LLM streams.
for executor in stream._active_executors.values():
for node in executor.node_registry.values():
if hasattr(node, "signal_shutdown"):
node.signal_shutdown()
if hasattr(node, "cancel_current_turn"):
node.cancel_current_turn()
for exec_id in list(stream.active_execution_ids):
try:
ok = await stream.cancel_execution(exec_id)
if ok:
cancelled.append(exec_id)
except Exception as e:
logger.warning("Failed to cancel %s: %s", exec_id, e)
# Pause timers so the next tick doesn't restart execution
runtime.pause_timers()
@@ -301,13 +467,117 @@ def register_queen_lifecycle_tools(
registry.register("stop_worker", _stop_tool, lambda inputs: stop_worker())
tools_registered += 1
# --- stop_worker_and_edit -------------------------------------------------
async def stop_worker_and_edit() -> str:
"""Stop the worker and switch to building mode for editing the agent."""
stop_result = await stop_worker()
# Switch to building mode
if mode_state is not None:
await mode_state.switch_to_building()
result = json.loads(stop_result)
result["mode"] = "building"
result["message"] = (
"Worker stopped. You are now in building mode. "
"Use your coding tools to modify the agent, then call "
"load_built_agent(path) to stage it again."
)
return json.dumps(result)
_stop_edit_tool = Tool(
name="stop_worker_and_edit",
description=(
"Stop the running worker and switch to building mode. "
"Use this when you need to modify the agent's code, nodes, or configuration. "
"After editing, call load_built_agent(path) to reload and run."
),
parameters={"type": "object", "properties": {}},
)
registry.register(
"stop_worker_and_edit", _stop_edit_tool, lambda inputs: stop_worker_and_edit()
)
tools_registered += 1
# --- stop_worker (Running → Staging) -------------------------------------
async def stop_worker_to_staging() -> str:
"""Stop the running worker and switch to staging mode.
After stopping, ask the user whether they want to:
1. Re-run the agent with new input: call run_agent_with_input(task)
2. Edit the agent code: call stop_worker_and_edit() to go to building mode
"""
stop_result = await stop_worker()
# Switch to staging mode
if mode_state is not None:
await mode_state.switch_to_staging()
result = json.loads(stop_result)
result["mode"] = "staging"
result["message"] = (
"Worker stopped. You are now in staging mode. "
"Ask the user: would they like to re-run with new input, "
"or edit the agent code?"
)
return json.dumps(result)
_stop_worker_tool = Tool(
name="stop_worker",
description=(
"Stop the running worker and switch to staging mode. "
"After stopping, ask the user whether they want to re-run "
"with new input or edit the agent code."
),
parameters={"type": "object", "properties": {}},
)
registry.register("stop_worker", _stop_worker_tool, lambda inputs: stop_worker_to_staging())
tools_registered += 1
# --- get_worker_status ----------------------------------------------------
async def get_worker_status() -> str:
"""Check if the worker is idle, running, or waiting for user input.
def _get_event_bus():
"""Get the session's event bus for querying history."""
return getattr(session, "event_bus", None)
Returns worker identity, execution state, active node, and iteration count.
_status_last_called: dict[str, float] = {} # {"ts": monotonic time}
_STATUS_COOLDOWN = 30.0 # seconds between full status checks
async def get_worker_status(last_n: int = 20) -> str:
"""Comprehensive worker status: state, execution details, and recent activity.
Returns everything the queen needs in a single call:
- Identity and high-level state (idle / running / waiting_for_input)
- Active execution details (elapsed time, current node, iteration)
- Running tool calls (started but not yet completed)
- Recent completed tool calls (name, success/error)
- Node transitions (execution path)
- Retries, stalls, and constraint violations
- Goal progress and token consumption
Args:
last_n: Number of recent events to include per category (default 20).
"""
import time as _time
now = _time.monotonic()
last = _status_last_called.get("ts", 0.0)
if now - last < _STATUS_COOLDOWN:
remaining = int(_STATUS_COOLDOWN - (now - last))
return json.dumps(
{
"status": "cooldown",
"message": (
f"Status was checked {int(now - last)}s ago. "
f"Wait {remaining}s before checking again. "
"Do NOT call this tool in a loop — wait for user input instead."
),
}
)
_status_last_called["ts"] = now
runtime = _get_runtime()
if runtime is None:
return json.dumps({"status": "not_loaded", "message": "No worker loaded."})
@@ -318,55 +588,235 @@ def register_queen_lifecycle_tools(
if reg is None:
return json.dumps({"status": "not_loaded"})
base = {
result: dict[str, Any] = {
"worker_graph_id": graph_id,
"worker_goal": getattr(goal, "name", graph_id),
}
# --- Execution state ---
active_execs = []
for ep_id, stream in reg.streams.items():
for exec_id in stream.active_execution_ids:
active_execs.append(
{
"execution_id": exec_id,
"entry_point": ep_id,
}
)
exec_info: dict[str, Any] = {
"execution_id": exec_id,
"entry_point": ep_id,
}
ctx = stream.get_context(exec_id)
if ctx:
from datetime import datetime
elapsed = (datetime.now() - ctx.started_at).total_seconds()
exec_info["elapsed_seconds"] = round(elapsed, 1)
exec_info["exec_status"] = ctx.status
active_execs.append(exec_info)
if not active_execs:
return json.dumps(
{
**base,
"status": "idle",
"message": "Worker has no active executions.",
}
result["status"] = "idle"
result["message"] = "Worker has no active executions."
else:
waiting_nodes = []
for _ep_id, stream in reg.streams.items():
waiting_nodes.extend(stream.get_waiting_nodes())
result["status"] = "waiting_for_input" if waiting_nodes else "running"
result["active_executions"] = active_execs
if waiting_nodes:
result["waiting_node_id"] = waiting_nodes[0]["node_id"]
result["agent_idle_seconds"] = round(runtime.agent_idle_seconds, 1)
# --- EventBus enrichment ---
bus = _get_event_bus()
if not bus:
return json.dumps(result)
try:
# Pending user question (from ask_user tool)
if result.get("status") == "waiting_for_input":
input_events = bus.get_history(event_type=EventType.CLIENT_INPUT_REQUESTED, limit=1)
if input_events:
prompt = input_events[0].data.get("prompt", "")
if prompt:
result["pending_question"] = prompt
# Current node
edge_events = bus.get_history(event_type=EventType.EDGE_TRAVERSED, limit=1)
if edge_events:
target = edge_events[0].data.get("target_node")
if target:
result["current_node"] = target
# Current iteration
iter_events = bus.get_history(event_type=EventType.NODE_LOOP_ITERATION, limit=1)
if iter_events:
result["current_iteration"] = iter_events[0].data.get("iteration")
# Running tool calls (started but not yet completed)
tool_started = bus.get_history(event_type=EventType.TOOL_CALL_STARTED, limit=last_n * 2)
tool_completed = bus.get_history(
event_type=EventType.TOOL_CALL_COMPLETED, limit=last_n * 2
)
completed_ids = {
evt.data.get("tool_use_id") for evt in tool_completed if evt.data.get("tool_use_id")
}
running = [
evt
for evt in tool_started
if evt.data.get("tool_use_id") and evt.data.get("tool_use_id") not in completed_ids
]
if running:
result["running_tools"] = [
{
"tool": evt.data.get("tool_name"),
"node": evt.node_id,
"started_at": evt.timestamp.isoformat(),
"input_preview": str(evt.data.get("tool_input", ""))[:200],
}
for evt in running
]
# Check if the worker is waiting for user input
waiting_nodes = []
for _ep_id, stream in reg.streams.items():
waiting_nodes.extend(stream.get_waiting_nodes())
# Recent completed tool calls
if tool_completed:
result["recent_tool_calls"] = [
{
"tool": evt.data.get("tool_name"),
"error": bool(evt.data.get("is_error")),
"node": evt.node_id,
"time": evt.timestamp.isoformat(),
}
for evt in tool_completed[:last_n]
]
status = "waiting_for_input" if waiting_nodes else "running"
result = {
**base,
"status": status,
"active_executions": active_execs,
}
if waiting_nodes:
result["waiting_node_id"] = waiting_nodes[0]["node_id"]
return json.dumps(result)
# Node transitions
edges = bus.get_history(event_type=EventType.EDGE_TRAVERSED, limit=last_n)
if edges:
result["node_transitions"] = [
{
"from": evt.data.get("source_node"),
"to": evt.data.get("target_node"),
"condition": evt.data.get("edge_condition"),
"time": evt.timestamp.isoformat(),
}
for evt in edges
]
# Retries
retries = bus.get_history(event_type=EventType.NODE_RETRY, limit=last_n)
if retries:
result["retries"] = [
{
"node": evt.node_id,
"retry_count": evt.data.get("retry_count"),
"error": evt.data.get("error", "")[:200],
"time": evt.timestamp.isoformat(),
}
for evt in retries
]
# Stalls and doom loops
stalls = bus.get_history(event_type=EventType.NODE_STALLED, limit=5)
doom_loops = bus.get_history(event_type=EventType.NODE_TOOL_DOOM_LOOP, limit=5)
issues = []
for evt in stalls:
issues.append(
{
"type": "stall",
"node": evt.node_id,
"reason": evt.data.get("reason", "")[:200],
"time": evt.timestamp.isoformat(),
}
)
for evt in doom_loops:
issues.append(
{
"type": "tool_doom_loop",
"node": evt.node_id,
"description": evt.data.get("description", "")[:200],
"time": evt.timestamp.isoformat(),
}
)
if issues:
result["issues"] = issues
# Constraint violations
violations = bus.get_history(event_type=EventType.CONSTRAINT_VIOLATION, limit=5)
if violations:
result["constraint_violations"] = [
{
"constraint": evt.data.get("constraint_id"),
"description": evt.data.get("description", "")[:200],
"time": evt.timestamp.isoformat(),
}
for evt in violations
]
# Goal progress
try:
progress = await runtime.get_goal_progress()
if progress:
result["goal_progress"] = progress
except Exception:
pass
# Token summary
llm_events = bus.get_history(event_type=EventType.LLM_TURN_COMPLETE, limit=200)
if llm_events:
total_in = sum(evt.data.get("input_tokens", 0) or 0 for evt in llm_events)
total_out = sum(evt.data.get("output_tokens", 0) or 0 for evt in llm_events)
result["token_summary"] = {
"llm_turns": len(llm_events),
"input_tokens": total_in,
"output_tokens": total_out,
"total_tokens": total_in + total_out,
}
# Execution completions/failures
exec_completed = bus.get_history(event_type=EventType.EXECUTION_COMPLETED, limit=5)
exec_failed = bus.get_history(event_type=EventType.EXECUTION_FAILED, limit=5)
if exec_completed or exec_failed:
result["execution_outcomes"] = []
for evt in exec_completed:
result["execution_outcomes"].append(
{
"outcome": "completed",
"execution_id": evt.execution_id,
"time": evt.timestamp.isoformat(),
}
)
for evt in exec_failed:
result["execution_outcomes"].append(
{
"outcome": "failed",
"execution_id": evt.execution_id,
"error": evt.data.get("error", "")[:200],
"time": evt.timestamp.isoformat(),
}
)
except Exception:
pass # Non-critical enrichment
return json.dumps(result, default=str, ensure_ascii=False)
_status_tool = Tool(
name="get_worker_status",
description=(
"Check the worker agent's current state: idle (no execution), "
"running (actively processing), or waiting_for_input (blocked on "
"user response). Returns execution details."
"Get comprehensive worker status: state (idle/running/waiting_for_input), "
"execution details (elapsed time, current node, iteration), "
"recent tool calls, running tools, node transitions, retries, "
"stalls, constraint violations, goal progress, and token consumption. "
"One call gives the queen a complete picture."
),
parameters={"type": "object", "properties": {}},
parameters={
"type": "object",
"properties": {
"last_n": {
"type": "integer",
"description": "Number of recent events per category (default 20)",
},
},
"required": [],
},
)
registry.register("get_worker_status", _status_tool, lambda inputs: get_worker_status())
registry.register("get_worker_status", _status_tool, lambda inputs: get_worker_status(**inputs))
tools_registered += 1
# --- inject_worker_message ------------------------------------------------
@@ -391,7 +841,7 @@ def register_queen_lifecycle_tools(
injectable = stream.get_injectable_nodes()
if injectable:
target_node_id = injectable[0]["node_id"]
ok = await stream.inject_input(target_node_id, content)
ok = await stream.inject_input(target_node_id, content, is_client_input=True)
if ok:
return json.dumps(
{
@@ -430,6 +880,105 @@ def register_queen_lifecycle_tools(
)
tools_registered += 1
# --- list_credentials -----------------------------------------------------
async def list_credentials(credential_id: str = "") -> str:
"""List all authorized credentials (Aden OAuth + local encrypted store).
Returns credential IDs, aliases, status, and identity metadata.
Never returns secret values. Optionally filter by credential_id.
"""
try:
# Primary: CredentialStoreAdapter sees both Aden OAuth and local accounts
from aden_tools.credentials import CredentialStoreAdapter
store = CredentialStoreAdapter.default()
all_accounts = store.get_all_account_info()
# Filter by credential_id / provider if requested
if credential_id:
all_accounts = [
a
for a in all_accounts
if a.get("credential_id", "").startswith(credential_id)
or a.get("provider", "") == credential_id
]
return json.dumps(
{
"count": len(all_accounts),
"credentials": all_accounts,
},
default=str,
)
except ImportError:
pass
except Exception as e:
return json.dumps({"error": f"Failed to list credentials: {e}"})
# Fallback: local encrypted store only
try:
from framework.credentials.local.registry import LocalCredentialRegistry
registry = LocalCredentialRegistry.default()
accounts = registry.list_accounts(
credential_id=credential_id or None,
)
credentials = []
for info in accounts:
entry: dict[str, Any] = {
"credential_id": info.credential_id,
"alias": info.alias,
"storage_id": info.storage_id,
"status": info.status,
"created_at": info.created_at.isoformat() if info.created_at else None,
"last_validated": (
info.last_validated.isoformat() if info.last_validated else None
),
}
identity = info.identity.to_dict()
if identity:
entry["identity"] = identity
credentials.append(entry)
return json.dumps(
{
"count": len(credentials),
"credentials": credentials,
"location": "~/.hive/credentials",
},
default=str,
)
except Exception as e:
return json.dumps({"error": f"Failed to list credentials: {e}"})
_list_creds_tool = Tool(
name="list_credentials",
description=(
"List all authorized credentials (Aden OAuth and the local encrypted store). Returns credential IDs, "
"aliases, status (active/failed/unknown), and identity metadata — never secret "
"values. Optionally filter by credential_id (e.g. 'brave_search')."
),
parameters={
"type": "object",
"properties": {
"credential_id": {
"type": "string",
"description": (
"Filter to a specific credential type (e.g. 'brave_search'). "
"Omit to list all credentials."
),
},
},
"required": [],
},
)
registry.register(
"list_credentials", _list_creds_tool, lambda inputs: list_credentials(**inputs)
)
tools_registered += 1
# --- load_built_agent (server context only) --------------------------------
if session_manager is not None and manager_session_id is not None:
@@ -449,9 +998,12 @@ def register_queen_lifecycle_tools(
logger.error("Failed to unload existing worker: %s", e, exc_info=True)
return json.dumps({"error": f"Failed to unload existing worker: {e}"})
resolved_path = Path(agent_path).resolve()
try:
resolved_path = validate_agent_path(agent_path)
except ValueError as e:
return json.dumps({"error": str(e)})
if not resolved_path.exists():
return json.dumps({"error": f"Agent path does not exist: {resolved_path}"})
return json.dumps({"error": f"Agent path does not exist: {agent_path}"})
try:
updated_session = await session_manager.load_worker(
@@ -459,11 +1011,24 @@ def register_queen_lifecycle_tools(
str(resolved_path),
)
info = updated_session.worker_info
# Switch to staging mode after successful load
if mode_state is not None:
await mode_state.switch_to_staging()
worker_name = info.name if info else updated_session.worker_id
return json.dumps(
{
"status": "loaded",
"mode": "staging",
"message": (
f"Successfully loaded '{worker_name}'. "
"You are now in STAGING mode. "
"Call run_agent_with_input(task) to start the worker, "
"or stop_worker_and_edit() to go back to building."
),
"worker_id": updated_session.worker_id,
"worker_name": info.name if info else updated_session.worker_id,
"worker_name": worker_name,
"goal": info.goal_name if info else "",
"node_count": info.node_count if info else 0,
}
@@ -498,5 +1063,125 @@ def register_queen_lifecycle_tools(
)
tools_registered += 1
# --- run_agent_with_input ------------------------------------------------
async def run_agent_with_input(task: str) -> str:
"""Run the loaded worker agent with the given task input.
Performs preflight checks (credentials, MCP resync), triggers the
worker's default entry point, and switches to running mode.
"""
runtime = _get_runtime()
if runtime is None:
return json.dumps({"error": "No worker loaded in this session."})
try:
# Pre-flight: validate credentials and resync MCP servers.
loop = asyncio.get_running_loop()
async def _preflight():
cred_error: CredentialError | None = None
try:
await loop.run_in_executor(
None,
lambda: validate_credentials(
runtime.graph.nodes,
interactive=False,
skip=False,
),
)
except CredentialError as e:
cred_error = e
runner = getattr(session, "runner", None)
if runner:
try:
await loop.run_in_executor(
None,
lambda: runner._tool_registry.resync_mcp_servers_if_needed(),
)
except Exception as e:
logger.warning("MCP resync failed: %s", e)
if cred_error is not None:
raise cred_error
try:
await asyncio.wait_for(_preflight(), timeout=_START_PREFLIGHT_TIMEOUT)
except TimeoutError:
logger.warning(
"run_agent_with_input preflight timed out after %ds — proceeding",
_START_PREFLIGHT_TIMEOUT,
)
except CredentialError:
raise # handled below
# Resume timers in case they were paused by a previous stop
runtime.resume_timers()
# Get session state from any prior execution for memory continuity
session_state = runtime._get_primary_session_state("default") or {}
if session_id:
session_state["resume_session_id"] = session_id
exec_id = await runtime.trigger(
entry_point_id="default",
input_data={"user_request": task},
session_state=session_state,
)
# Switch to running mode
if mode_state is not None:
await mode_state.switch_to_running()
return json.dumps(
{
"status": "started",
"mode": "running",
"execution_id": exec_id,
"task": task,
}
)
except CredentialError as e:
error_payload = credential_errors_to_json(e)
error_payload["agent_path"] = str(getattr(session, "worker_path", "") or "")
bus = getattr(session, "event_bus", None)
if bus is not None:
await bus.publish(
AgentEvent(
type=EventType.CREDENTIALS_REQUIRED,
stream_id="queen",
data=error_payload,
)
)
return json.dumps(error_payload)
except Exception as e:
return json.dumps({"error": f"Failed to start worker: {e}"})
_run_input_tool = Tool(
name="run_agent_with_input",
description=(
"Run the loaded worker agent with the given task. Validates credentials, "
"triggers the worker's default entry point, and switches to running mode. "
"Use this after loading an agent (staging mode) to start execution."
),
parameters={
"type": "object",
"properties": {
"task": {
"type": "string",
"description": "The task or input for the worker agent to execute",
},
},
"required": ["task"],
},
)
registry.register(
"run_agent_with_input", _run_input_tool, lambda inputs: run_agent_with_input(**inputs)
)
tools_registered += 1
logger.info("Registered %d queen lifecycle tools", tools_registered)
return tools_registered
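The cooldown guard at the top of `get_worker_status` in the diff above can be illustrated in isolation. This is a minimal sketch, not the project's code: `CooldownGate` and its injectable `clock` parameter are hypothetical names, introduced so the behavior can be exercised without waiting. Unlike the diff, which seeds the last-call timestamp with `0.0`, the sketch uses `None` to mean "never called".

```python
import json
import time
from typing import Callable, Optional


class CooldownGate:
    """Reject calls arriving sooner than `cooldown` seconds after the last accepted one.

    Hypothetical helper for illustration; the diff above keeps the timestamp
    in a dict captured by the tool function instead.
    """

    def __init__(self, cooldown: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._cooldown = cooldown
        self._clock = clock
        self._last: Optional[float] = None  # None means no call accepted yet

    def check(self) -> Optional[str]:
        """Return a JSON cooldown payload if the call is too early, else None."""
        now = self._clock()
        if self._last is not None and now - self._last < self._cooldown:
            remaining = int(self._cooldown - (now - self._last))
            return json.dumps({"status": "cooldown", "wait_seconds": remaining})
        # Only accepted calls move the timestamp, so rejected calls
        # do not extend the waiting period.
        self._last = now
        return None
```

A caller would return the payload straight away when `check()` is not `None`, and otherwise carry on with the full status query.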
+6 -3
@@ -18,7 +18,6 @@ from __future__ import annotations
import json
import logging
from pathlib import Path
from typing import TYPE_CHECKING
if TYPE_CHECKING:
@@ -48,10 +47,14 @@ def register_graph_tools(registry: ToolRegistry, runtime: AgentRuntime) -> int:
"""
from framework.runner.runner import AgentRunner
from framework.runtime.execution_stream import EntryPointSpec
from framework.server.app import validate_agent_path
path = Path(agent_path).resolve()
try:
path = validate_agent_path(agent_path)
except ValueError as e:
return json.dumps({"error": str(e)})
if not path.exists():
return json.dumps({"error": f"Agent path does not exist: {path}"})
return json.dumps({"error": f"Agent path does not exist: {agent_path}"})
try:
runner = AgentRunner.load(path)
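The change above swaps a bare `Path(agent_path).resolve()` for `validate_agent_path`, which signals a rejected path by raising `ValueError`. Its implementation is not part of this diff. One plausible shape, assuming it confines paths to a set of allowed root directories, is sketched below; `validate_path_within` and `allowed_roots` are hypothetical names and the real checks may differ.

```python
from pathlib import Path
from typing import Iterable


def validate_path_within(candidate: str, allowed_roots: Iterable[Path]) -> Path:
    """Resolve `candidate` and require it to sit inside one of `allowed_roots`.

    Hypothetical stand-in for validate_agent_path, for illustration only.
    """
    # resolve() collapses ".." segments and symlinks before the containment check.
    resolved = Path(candidate).resolve()
    for root in allowed_roots:
        # is_relative_to (Python 3.9+) compares whole path components,
        # so "agents-evil" is not treated as being inside "agents".
        if resolved.is_relative_to(Path(root).resolve()):
            return resolved
    raise ValueError(f"Path is outside the allowed directories: {candidate}")
```

Resolving before checking is the important step: a prefix comparison on the raw string would accept `agents/../secret`.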
+43 -37
@@ -256,7 +256,7 @@ class AdenTUI(App):
"""Override to use native `open` for file:// URLs on macOS."""
if url.startswith("file://") and platform.system() == "Darwin":
path = url.removeprefix("file://")
subprocess.Popen(["open", path])
subprocess.Popen(["open", path], encoding="utf-8")
else:
super().open_url(url, new_tab=new_tab)
@@ -475,7 +475,10 @@ class AdenTUI(App):
from framework.graph.executor import GraphExecutor
from framework.runner.tool_registry import ToolRegistry
from framework.runtime.core import Runtime
from framework.tools.queen_lifecycle_tools import register_queen_lifecycle_tools
from framework.tools.queen_lifecycle_tools import (
QueenModeState,
register_queen_lifecycle_tools,
)
from framework.tools.worker_monitoring_tools import register_worker_monitoring_tools
log = logging.getLogger("tui.queen")
@@ -536,12 +539,16 @@ class AdenTUI(App):
except Exception:
log.warning("Queen: MCP config failed to load", exc_info=True)
# Worker is already loaded in TUI path → start in staging mode.
mode_state = QueenModeState(mode="staging", event_bus=event_bus)
register_queen_lifecycle_tools(
queen_registry,
worker_runtime=self.runtime,
event_bus=event_bus,
storage_path=storage_path,
session_id=session_id,
mode_state=mode_state,
)
register_worker_monitoring_tools(
queen_registry,
@@ -553,6 +560,20 @@ class AdenTUI(App):
queen_tools = list(queen_registry.get_tools().values())
queen_tool_executor = queen_registry.get_executor()
# Partition tools into mode-specific sets
from framework.agents.hive_coder.nodes import (
_QUEEN_BUILDING_TOOLS,
_QUEEN_RUNNING_TOOLS,
_QUEEN_STAGING_TOOLS,
)
building_names = set(_QUEEN_BUILDING_TOOLS)
staging_names = set(_QUEEN_STAGING_TOOLS)
running_names = set(_QUEEN_RUNNING_TOOLS)
mode_state.building_tools = [t for t in queen_tools if t.name in building_names]
mode_state.staging_tools = [t for t in queen_tools if t.name in staging_names]
mode_state.running_tools = [t for t in queen_tools if t.name in running_names]
# Build worker profile for queen's system prompt.
from framework.tools.queen_lifecycle_tools import build_worker_profile
@@ -593,12 +614,23 @@ class AdenTUI(App):
stream_id="queen",
storage_path=queen_dir,
loop_config=queen_graph.loop_config,
dynamic_tools_provider=mode_state.get_current_tools,
)
self._queen_executor = executor
# Wire inject_notification so mode switches notify the queen LLM
async def _inject_mode_notification(content: str) -> None:
node = executor.node_registry.get("queen")
if node is not None and hasattr(node, "inject_event"):
await node.inject_event(content)
mode_state.inject_notification = _inject_mode_notification
log.info(
"Queen starting with %d tools: %s",
len(queen_tools),
[t.name for t in queen_tools],
"Queen starting in %s mode with %d tools: %s",
mode_state.mode,
len(mode_state.get_current_tools()),
[t.name for t in mode_state.get_current_tools()],
)
# The queen's event_loop node runs forever (continuous mode).
# It blocks on _await_user_input() after each LLM turn,
@@ -1611,46 +1643,20 @@ class AdenTUI(App):
self.notify(f"Logs {mode}", severity="information", timeout=2)
def action_pause_execution(self) -> None:
"""Immediately pause execution by cancelling task (bound to Ctrl+Z)."""
"""Immediately pause execution by cancelling all running tasks (bound to Ctrl+Z)."""
if self.chat_repl is None or self.runtime is None:
return
try:
if not self.chat_repl._current_exec_id:
if self.runtime.cancel_all_tasks(self.chat_repl._agent_loop):
self.chat_repl._current_exec_id = None
self.notify(
"No active execution to pause",
"All executions stopped",
severity="information",
timeout=3,
)
return
task_cancelled = False
all_streams = []
active_reg = self.runtime.get_graph_registration(self.runtime.active_graph_id)
if active_reg:
all_streams.extend(active_reg.streams.values())
for gid in self.runtime.list_graphs():
if gid == self.runtime.active_graph_id:
continue
reg = self.runtime.get_graph_registration(gid)
if reg:
all_streams.extend(reg.streams.values())
for stream in all_streams:
exec_id = self.chat_repl._current_exec_id
task = stream._execution_tasks.get(exec_id)
if task and not task.done():
task.cancel()
task_cancelled = True
self.notify(
"Execution paused - state saved",
severity="information",
timeout=3,
)
break
if not task_cancelled:
else:
self.notify(
"Execution already completed",
"No active executions",
severity="information",
timeout=2,
)
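The mode wiring earlier in this file's diff partitions the queen's tools by name and hands `mode_state.get_current_tools` to the executor as its `dynamic_tools_provider`. `QueenModeState` itself is not shown here, so the following is only a sketch of the idea: `Tool`, `ModeState`, and `partition_tools` are simplified, hypothetical stand-ins, and the tool-to-mode assignments in the usage note are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Tool:
    """Simplified stand-in for the framework's Tool type."""
    name: str


@dataclass
class ModeState:
    """Hypothetical, reduced version of a mode-aware tool holder."""
    mode: str = "staging"
    tools_by_mode: dict[str, list[Tool]] = field(default_factory=dict)

    def get_current_tools(self) -> list[Tool]:
        # Called on every LLM turn, so a mode switch takes effect immediately.
        return list(self.tools_by_mode.get(self.mode, []))


def partition_tools(
    tools: list[Tool], names_by_mode: dict[str, list[str]]
) -> dict[str, list[Tool]]:
    """Group tools per mode, keeping registry order; a tool may serve several modes."""
    partitioned: dict[str, list[Tool]] = {}
    for mode, names in names_by_mode.items():
        allowed = set(names)
        partitioned[mode] = [t for t in tools if t.name in allowed]
    return partitioned
```

Passing the bound method rather than a fixed list is what lets the executor see a different tool set after each mode switch, for example `stop_worker` only while running.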
+4 -4
@@ -53,7 +53,7 @@ def _get_last_active(agent_name: str) -> str | None:
if not state_file.exists():
continue
try:
data = json.loads(state_file.read_text())
data = json.loads(state_file.read_text(encoding="utf-8"))
ts = data.get("timestamps", {}).get("updated_at")
if ts and (latest is None or ts > latest):
latest = ts
@@ -84,7 +84,7 @@ def _extract_agent_stats(agent_path: Path) -> tuple[int, int, list[str]]:
agent_py = agent_path / "agent.py"
if agent_py.exists():
try:
tree = ast.parse(agent_py.read_text())
tree = ast.parse(agent_py.read_text(encoding="utf-8"))
for node in ast.walk(tree):
# Find `nodes = [...]` assignment
if isinstance(node, ast.Assign):
@@ -99,7 +99,7 @@ def _extract_agent_stats(agent_path: Path) -> tuple[int, int, list[str]]:
agent_json = agent_path / "agent.json"
if agent_json.exists():
try:
data = json.loads(agent_json.read_text())
data = json.loads(agent_json.read_text(encoding="utf-8"))
json_nodes = data.get("nodes", [])
if node_count == 0:
node_count = len(json_nodes)
@@ -150,7 +150,7 @@ def discover_agents() -> dict[str, list[AgentEntry]]:
agent_json = path / "agent.json"
if agent_json.exists():
try:
data = json.loads(agent_json.read_text())
data = json.loads(agent_json.read_text(encoding="utf-8"))
meta = data.get("agent", {})
name = meta.get("name", name)
desc = meta.get("description", desc)
+18 -32
@@ -488,7 +488,7 @@ class ChatRepl(Vertical):
if not state_file.exists():
continue
with open(state_file) as f:
with open(state_file, encoding="utf-8") as f:
state = json.load(f)
status = state.get("status", "").lower()
@@ -547,7 +547,7 @@ class ChatRepl(Vertical):
# Read session state
try:
with open(state_file) as f:
with open(state_file, encoding="utf-8") as f:
state = json.load(f)
# Track this session for /resume <number> lookup
@@ -599,7 +599,7 @@ class ChatRepl(Vertical):
try:
import json
with open(state_file) as f:
with open(state_file, encoding="utf-8") as f:
state = json.load(f)
# Basic info
@@ -640,7 +640,7 @@ class ChatRepl(Vertical):
# Load and show checkpoints
for i, cp_file in enumerate(checkpoint_files[-5:], 1): # Last 5
try:
with open(cp_file) as f:
with open(cp_file, encoding="utf-8") as f:
cp_data = json.load(f)
cp_id = cp_data.get("checkpoint_id", cp_file.stem)
@@ -687,7 +687,7 @@ class ChatRepl(Vertical):
import json
with open(state_file) as f:
with open(state_file, encoding="utf-8") as f:
state = json.load(f)
# Resume from session state (not checkpoint)
@@ -868,27 +868,17 @@ class ChatRepl(Vertical):
self._write_history(f"[dim]{traceback.format_exc()}[/dim]")
async def _cmd_pause(self) -> None:
"""Immediately pause execution by cancelling task (same as Ctrl+Z)."""
# Check if there's a current execution
if not self._current_exec_id:
self._write_history("[bold yellow]No active execution to pause[/bold yellow]")
self._write_history(" Start an execution first, then use /pause during execution")
return
# Find and cancel the execution task - executor will catch and save state
task_cancelled = False
for stream in self.runtime._streams.values():
exec_id = self._current_exec_id
task = stream._execution_tasks.get(exec_id)
if task and not task.done():
task.cancel()
task_cancelled = True
self._write_history("[bold green]⏸ Execution paused - state saved[/bold green]")
self._write_history(" Resume later with: [bold]/resume[/bold]")
break
if not task_cancelled:
self._write_history("[bold yellow]Execution already completed[/bold yellow]")
"""Immediately pause execution by cancelling all running tasks (same as Ctrl+Z)."""
future = asyncio.run_coroutine_threadsafe(
self.runtime.cancel_all_tasks_async(), self._agent_loop
)
result = await asyncio.wrap_future(future)
if result:
self._current_exec_id = None
self._write_history("[bold green]⏸ All executions stopped[/bold green]")
self._write_history(" Resume later with: [bold]/resume[/bold]")
else:
self._write_history("[bold yellow]No active executions[/bold yellow]")
async def _cmd_coder(self, reason: str = "") -> None:
"""User-initiated escalation to Hive Coder."""
@@ -1112,7 +1102,7 @@ class ChatRepl(Vertical):
continue
try:
with open(state_file) as f:
with open(state_file, encoding="utf-8") as f:
state = json.load(f)
status = state.get("status", "").lower()
@@ -1460,10 +1450,6 @@ class ChatRepl(Vertical):
indicator.update("Preparing question...")
return
if tool_name == "escalate_to_coder":
indicator.update("Escalating to coder...")
return
# Update indicator to show tool activity
indicator.update(f"Using tool: {tool_name}...")
@@ -1475,7 +1461,7 @@ class ChatRepl(Vertical):
def handle_tool_completed(self, tool_name: str, result: str, is_error: bool) -> None:
"""Handle a tool call completing."""
if tool_name in ("ask_user", "escalate_to_coder"):
if tool_name == "ask_user":
return
result_str = str(result)
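The rewritten `_cmd_pause` in this file schedules a coroutine on the agent's event loop from the UI's loop and awaits the outcome through `asyncio.wrap_future`. A self-contained sketch of that cross-loop pattern follows. `cancel_all` is a hypothetical placeholder for `runtime.cancel_all_tasks_async()`, whose implementation is not part of this diff, and `start_background_loop` exists only to give the example a second loop.

```python
import asyncio
import threading


async def cancel_all() -> bool:
    """Placeholder for work that must run on the agent loop."""
    await asyncio.sleep(0)
    return True


def start_background_loop() -> asyncio.AbstractEventLoop:
    """Run a second event loop in a daemon thread and return it."""
    loop = asyncio.new_event_loop()
    threading.Thread(target=loop.run_forever, daemon=True).start()
    return loop


async def pause_from_ui(agent_loop: asyncio.AbstractEventLoop) -> bool:
    # run_coroutine_threadsafe returns a concurrent.futures.Future;
    # wrap_future makes it awaitable on the calling (UI) loop without blocking it.
    future = asyncio.run_coroutine_threadsafe(cancel_all(), agent_loop)
    return await asyncio.wrap_future(future)
```

Awaiting the wrapped future keeps the UI loop responsive, whereas calling `future.result()` directly would block it until the agent loop answers.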
@@ -38,6 +38,7 @@ def _linux_file_dialog() -> subprocess.CompletedProcess | None:
"--title=Select a PDF file",
"--file-filter=PDF files (*.pdf)|*.pdf",
],
encoding="utf-8",
capture_output=True,
text=True,
timeout=300,
@@ -54,6 +55,7 @@ def _linux_file_dialog() -> subprocess.CompletedProcess | None:
".",
"PDF files (*.pdf)",
],
encoding="utf-8",
capture_output=True,
text=True,
timeout=300,
@@ -79,6 +81,7 @@ def _pick_pdf_subprocess() -> Path | None:
'POSIX path of (choose file of type {"com.adobe.pdf"} '
'with prompt "Select a PDF file")',
],
encoding="utf-8",
capture_output=True,
text=True,
timeout=300,
@@ -93,6 +96,7 @@ def _pick_pdf_subprocess() -> Path | None:
)
result = subprocess.run(
["powershell", "-NoProfile", "-Command", ps_script],
encoding="utf-8",
capture_output=True,
text=True,
timeout=300,
@@ -199,10 +199,11 @@ def _copy_to_clipboard(text: str) -> None:
"""Copy text to system clipboard using platform-native tools."""
try:
if sys.platform == "darwin":
subprocess.run(["pbcopy"], input=text.encode(), check=True, timeout=5)
subprocess.run(["pbcopy"], encoding="utf-8", input=text, check=True, timeout=5)
elif sys.platform == "win32":
subprocess.run(
["clip.exe"],
encoding="utf-16le",
input=text,
check=True,
timeout=5,
@@ -211,6 +212,7 @@ def _copy_to_clipboard(text: str) -> None:
try:
subprocess.run(
["xclip", "-selection", "clipboard"],
encoding="utf-8",
input=text,
check=True,
timeout=5,
@@ -218,6 +220,7 @@ def _copy_to_clipboard(text: str) -> None:
except (subprocess.SubprocessError, FileNotFoundError):
subprocess.run(
["xsel", "--clipboard", "--input"],
encoding="utf-8",
input=text,
check=True,
timeout=5,
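When `encoding` is passed to `subprocess.run`, the child's pipes are opened in text mode, so `input` is expected to be a `str` rather than `bytes`. A minimal sketch of that behavior is below. It uses the current Python interpreter as a stand-in for a clipboard tool such as `pbcopy`, and `pipe_text` is a hypothetical helper name.

```python
import subprocess
import sys


def pipe_text(text: str) -> str:
    """Send `text` to a child process on stdin and return what it echoes back."""
    # The child copies raw bytes from stdin to stdout, like a trivial `cat`.
    child_code = "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())"
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        input=text,  # str, because encoding= puts the pipes in text mode
        encoding="utf-8",
        capture_output=True,
        check=True,
        timeout=30,
    )
    return result.stdout
```

The parent encodes the string on the way in and decodes the captured output on the way out, so no manual `.encode()` call is needed.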
+7
@@ -37,6 +37,13 @@ export const executionApi = {
chat: (sessionId: string, message: string) =>
api.post<ChatResult>(`/sessions/${sessionId}/chat`, { message }),
/** Queue context for the queen without triggering an LLM response. */
queenContext: (sessionId: string, message: string) =>
api.post<ChatResult>(`/sessions/${sessionId}/queen-context`, { message }),
workerInput: (sessionId: string, message: string) =>
api.post<ChatResult>(`/sessions/${sessionId}/worker-input`, { message }),
stop: (sessionId: string, executionId: string) =>
api.post<StopResult>(`/sessions/${sessionId}/stop`, {
execution_id: executionId,
+12 -3
@@ -13,12 +13,13 @@ export const sessionsApi = {
// --- Session lifecycle ---
/** Create a session. If agentPath is provided, loads worker in one step. */
create: (agentPath?: string, agentId?: string, model?: string, initialPrompt?: string) =>
create: (agentPath?: string, agentId?: string, model?: string, initialPrompt?: string, queenResumeFrom?: string) =>
api.post<LiveSession>("/sessions", {
agent_path: agentPath,
agent_id: agentId,
model,
initial_prompt: initialPrompt,
queen_resume_from: queenResumeFrom || undefined,
}),
/** List all active sessions. */
@@ -66,9 +67,17 @@ export const sessionsApi = {
graphs: (sessionId: string) =>
api.get<{ graphs: string[] }>(`/sessions/${sessionId}/graphs`),
/** Get queen conversation history for a session. */
/** Get queen conversation history for a session (works for cold/post-restart sessions too). */
queenMessages: (sessionId: string) =>
api.get<{ messages: Message[] }>(`/sessions/${sessionId}/queen-messages`),
api.get<{ messages: Message[]; session_id: string }>(`/sessions/${sessionId}/queen-messages`),
/** List all queen sessions on disk — live + cold (post-restart). */
history: () =>
api.get<{ sessions: Array<{ session_id: string; cold: boolean; live: boolean; has_messages: boolean; created_at: number; agent_name?: string | null; agent_path?: string | null }> }>("/sessions/history"),
/** Permanently delete a history session (stops live session + removes disk files). */
deleteHistory: (sessionId: string) =>
api.delete<{ deleted: string }>(`/sessions/history/${sessionId}`),
// --- Worker session browsing (persisted execution runs) ---
+12 -1
@@ -12,6 +12,8 @@ export interface LiveSession {
loaded_at: number;
uptime_seconds: number;
intro_message?: string;
/** Queen operating mode — "building", "staging", or "running" */
queen_mode?: "building" | "staging" | "running";
/** Present in 409 conflict responses when worker is still loading */
loading?: boolean;
}
@@ -19,6 +21,8 @@ export interface LiveSession {
export interface LiveSessionDetail extends LiveSession {
entry_points?: EntryPoint[];
graphs?: string[];
/** True when the session exists on disk but is not live (server restarted). */
cold?: boolean;
}
export interface EntryPoint {
@@ -27,6 +31,8 @@ export interface EntryPoint {
entry_node: string;
trigger_type: string;
trigger_config?: Record<string, unknown>;
/** Seconds until the next timer fire (only present for timer entry points). */
next_fire_in?: number;
}
export interface DiscoverEntry {
@@ -131,6 +137,8 @@ export interface Message {
is_transition_marker?: boolean;
is_client_input?: boolean;
tool_calls?: unknown[];
/** Epoch seconds from file mtime — used for cross-conversation ordering */
created_at?: number;
[key: string]: unknown;
}
@@ -151,6 +159,7 @@ export interface NodeSpec {
client_facing: boolean;
success_criteria: string | null;
system_prompt: string;
sub_agents?: string[];
// Runtime enrichment (when session_id provided)
visit_count?: number;
has_failures?: boolean;
@@ -265,7 +274,9 @@ export type EventTypeName =
| "custom"
| "escalation_requested"
| "worker_loaded"
| "credentials_required";
| "credentials_required"
| "queen_mode_changed"
| "subagent_report";
export interface AgentEvent {
type: EventTypeName;
+45 -5
@@ -30,6 +30,8 @@ interface AgentGraphProps {
onPause?: () => void;
version?: string;
runState?: RunState;
building?: boolean;
queenMode?: "building" | "staging" | "running";
}
// --- Extracted RunButton so hover state survives parent re-renders ---
@@ -144,7 +146,7 @@ function truncateLabel(label: string, availablePx: number, fontSize: number): st
return label.slice(0, Math.max(maxChars - 1, 1)) + "\u2026";
}
export default function AgentGraph({ nodes, title: _title, onNodeClick, onRun, onPause, version, runState: externalRunState }: AgentGraphProps) {
export default function AgentGraph({ nodes, title: _title, onNodeClick, onRun, onPause, version, runState: externalRunState, building, queenMode }: AgentGraphProps) {
const [localRunState, setLocalRunState] = useState<RunState>("idle");
const runState = externalRunState ?? localRunState;
const runBtnRef = useRef<HTMLButtonElement>(null);
@@ -276,10 +278,17 @@ export default function AgentGraph({ nodes, title: _title, onNodeClick, onRun, o
</span>
)}
</div>
<RunButton runState={runState} disabled={nodes.length === 0} onRun={handleRun} onPause={onPause ?? (() => {})} btnRef={runBtnRef} />
<RunButton runState={runState} disabled={nodes.length === 0 || queenMode === "building"} onRun={handleRun} onPause={onPause ?? (() => {})} btnRef={runBtnRef} />
</div>
<div className="flex-1 flex items-center justify-center px-5">
<p className="text-xs text-muted-foreground/60 text-center italic">No pipeline configured yet.<br/>Chat with the Queen to get started.</p>
{building ? (
<div className="flex flex-col items-center gap-3">
<Loader2 className="w-6 h-6 animate-spin text-primary/60" />
<p className="text-xs text-muted-foreground/80 text-center">Building agent...</p>
</div>
) : (
<p className="text-xs text-muted-foreground/60 text-center italic">No pipeline configured yet.<br/>Chat with the Queen to get started.</p>
)}
</div>
</div>
);
@@ -407,6 +416,18 @@ export default function AgentGraph({ nodes, title: _title, onNodeClick, onRun, o
const triggerFontSize = nodeW < 140 ? 10.5 : 11.5;
const triggerAvailW = nodeW - 38;
const triggerDisplayLabel = truncateLabel(node.label, triggerAvailW, triggerFontSize);
const nextFireIn = node.triggerConfig?.next_fire_in as number | undefined;
// Format countdown for display below node
let countdownLabel: string | null = null;
if (nextFireIn != null && nextFireIn > 0) {
const h = Math.floor(nextFireIn / 3600);
const m = Math.floor((nextFireIn % 3600) / 60);
const s = Math.floor(nextFireIn % 60);
countdownLabel = h > 0
? `next in ${h}h ${String(m).padStart(2, "0")}m`
: `next in ${m}m ${String(s).padStart(2, "0")}s`;
}
return (
<g key={node.id} onClick={() => onNodeClick?.(node)} style={{ cursor: onNodeClick ? "pointer" : "default" }}>
@@ -442,6 +463,17 @@ export default function AgentGraph({ nodes, title: _title, onNodeClick, onRun, o
>
{triggerDisplayLabel}
</text>
{/* Countdown label below node */}
{countdownLabel && (
<text
x={pos.x + nodeW / 2} y={pos.y + NODE_H + 13}
fill="hsl(210,30%,50%)" fontSize={9.5}
textAnchor="middle" fontStyle="italic" opacity={0.7}
>
{countdownLabel}
</text>
)}
</g>
);
};
@@ -568,18 +600,26 @@ export default function AgentGraph({ nodes, title: _title, onNodeClick, onRun, o
</div>
{/* Graph */}
<div className="flex-1 overflow-y-auto overflow-x-hidden px-3 pb-5">
<div className="flex-1 overflow-y-auto overflow-x-hidden px-3 pb-5 relative">
<svg
width={svgWidth}
height={svgHeight}
viewBox={`0 0 ${svgWidth} ${svgHeight}`}
className="select-none"
className={`select-none${building ? " opacity-30" : ""}`}
style={{ fontFamily: "'Inter', system-ui, sans-serif" }}
>
{forwardEdges.map((e, i) => renderForwardEdge(e, i))}
{backEdges.map((e, i) => renderBackEdge(e, i))}
{nodes.map((n, i) => renderNode(n, i))}
</svg>
{building && (
<div className="absolute inset-0 flex items-center justify-center">
<div className="flex flex-col items-center gap-3">
<Loader2 className="w-6 h-6 animate-spin text-primary/60" />
<p className="text-xs text-muted-foreground/80">Rebuilding agent...</p>
</div>
</div>
)}
</div>
</div>
);
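The countdown label that this change adds below timer trigger nodes can be read in isolation as a small pure function. The sketch below lifts the inline logic into a helper; the name `formatCountdown` is hypothetical and does not appear in the diff, and the input is assumed to be the `next_fire_in` value in seconds.

```typescript
// Hypothetical helper: the diff computes this inline inside the trigger-node renderer.
// Input is assumed to be `next_fire_in`, i.e. seconds until the next timer fire.
function formatCountdown(nextFireIn: number | null | undefined): string | null {
  // No label when the value is missing or the timer is not pending.
  if (nextFireIn == null || nextFireIn <= 0) return null;

  const h = Math.floor(nextFireIn / 3600);
  const m = Math.floor((nextFireIn % 3600) / 60);
  const s = Math.floor(nextFireIn % 60);

  // Hours present: show hours and zero-padded minutes; otherwise minutes and zero-padded seconds.
  return h > 0
    ? `next in ${h}h ${String(m).padStart(2, "0")}m`
    : `next in ${m}m ${String(s).padStart(2, "0")}s`;
}

console.log(formatCountdown(3725)); // "next in 1h 02m"
console.log(formatCountdown(95));   // "next in 1m 35s"
console.log(formatCountdown(0));    // null
```

Keeping the formatting in a pure function like this would make the hour/minute switch easy to unit test, though the diff itself keeps it inline.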
+198 -106
@@ -1,7 +1,7 @@
import { memo, useState, useRef, useEffect } from "react";
import { Send, Square, Crown, Cpu, Check, ChevronRight, Loader2 } from "lucide-react";
import { formatAgentDisplayName } from "@/lib/chat-helpers";
import { Send, Square, Crown, Cpu, Check, Loader2 } from "lucide-react";
import MarkdownContent from "@/components/MarkdownContent";
import QuestionWidget from "@/components/QuestionWidget";
export interface ChatMessage {
id: string;
@@ -9,41 +9,71 @@ export interface ChatMessage {
agentColor: string;
content: string;
timestamp: string;
type?: "system" | "agent" | "user" | "tool_status";
type?: "system" | "agent" | "user" | "tool_status" | "worker_input_request";
role?: "queen" | "worker";
/** Which worker thread this message belongs to (worker agent name) */
thread?: string;
/** Epoch ms when this message was first created — used for ordering queen/worker interleaving */
createdAt?: number;
}
interface ChatPanelProps {
messages: ChatMessage[];
onSend: (message: string, thread: string) => void;
isWaiting?: boolean;
/** When true a worker is thinking (not yet streaming) */
isWorkerWaiting?: boolean;
/** When true the queen is busy (typing or streaming) — shows the stop button */
isBusy?: boolean;
activeThread: string;
/** When true, the agent is waiting for user input — changes placeholder text */
awaitingInput?: boolean;
/** When true, the input is disabled (e.g. during loading) */
disabled?: boolean;
/** Called when user clicks the stop button to cancel the queen's current turn */
onCancel?: () => void;
/** Pending question from ask_user — replaces textarea when present */
pendingQuestion?: string | null;
/** Options for the pending question */
pendingOptions?: string[] | null;
/** Called when user submits an answer to the pending question */
onQuestionSubmit?: (answer: string, isOther: boolean) => void;
/** Called when user dismisses the pending question without answering */
onQuestionDismiss?: () => void;
/** Queen operating mode — shown as a tag on queen messages */
queenMode?: "building" | "staging" | "running";
}
const queenColor = "hsl(45,95%,58%)";
const workerColor = "hsl(220,60%,55%)";
function getColor(_agent: string, role?: "queen" | "worker"): string {
if (role === "queen") return queenColor;
return "hsl(220,60%,55%)";
return workerColor;
}
// Honey-drizzle palette — based on color-hex.com/color-palette/80116
// #8e4200 · #db6f02 · #ff9624 · #ffb825 · #ffd69c + adjacent warm tones
const TOOL_HEX = [
"#db6f02", // rich orange
"#ffb825", // golden yellow
"#ff9624", // bright orange
"#c48820", // warm bronze
"#e89530", // honey
"#d4a040", // goldenrod
"#cc7a10", // caramel
"#e5a820", // sunflower
];
function toolHex(name: string): string {
let hash = 0;
for (let i = 0; i < name.length; i++) hash = (hash * 31 + name.charCodeAt(i)) | 0;
return TOOL_HEX[Math.abs(hash) % TOOL_HEX.length];
}
function ToolActivityRow({ content }: { content: string }) {
const [expanded, setExpanded] = useState(false);
let tools: { name: string; done: boolean }[] = [];
let allDone = false;
try {
const parsed = JSON.parse(content);
tools = parsed.tools || [];
allDone = parsed.allDone ?? false;
} catch {
// Legacy plain-text fallback
return (
@@ -57,54 +87,64 @@ function ToolActivityRow({ content }: { content: string }) {
if (tools.length === 0) return null;
const total = tools.length;
// Group by tool name → count done vs running
const grouped = new Map<string, { done: number; running: number }>();
for (const t of tools) {
const entry = grouped.get(t.name) || { done: 0, running: 0 };
if (t.done) entry.done++;
else entry.running++;
grouped.set(t.name, entry);
}
if (allDone && !expanded) {
return (
<div className="flex gap-3 pl-10">
<button
onClick={() => setExpanded(true)}
className="flex items-center gap-1.5 text-[11px] text-muted-foreground hover:text-foreground transition-colors"
>
<ChevronRight className="w-3 h-3" />
<Check className="w-3 h-3 text-emerald-500" />
<span>{total} tool{total === 1 ? "" : "s"} used</span>
</button>
</div>
);
// Build pill list: running first, then done
const runningPills: { name: string; count: number }[] = [];
const donePills: { name: string; count: number }[] = [];
for (const [name, counts] of grouped) {
if (counts.running > 0) runningPills.push({ name, count: counts.running });
if (counts.done > 0) donePills.push({ name, count: counts.done });
}
return (
<div className="flex gap-3 pl-10">
<div className="flex flex-wrap items-center gap-1.5">
{allDone && (
<button onClick={() => setExpanded(false)} className="text-muted-foreground hover:text-foreground transition-colors">
<ChevronRight className="w-3 h-3 rotate-90" />
</button>
)}
{tools.map((t, i) => (
<span
key={i}
className={`inline-flex items-center gap-1 text-[11px] px-2 py-0.5 rounded-full border ${
t.done
? "text-emerald-600 bg-emerald-500/10 border-emerald-500/20"
: "text-muted-foreground bg-muted/40 border-border/40"
}`}
>
{t.done ? (
<Check className="w-2.5 h-2.5" />
) : (
{runningPills.map((p) => {
const hex = toolHex(p.name);
return (
<span
key={`run-${p.name}`}
className="inline-flex items-center gap-1 text-[11px] px-2.5 py-0.5 rounded-full"
style={{ color: hex, backgroundColor: `${hex}18`, border: `1px solid ${hex}35` }}
>
<Loader2 className="w-2.5 h-2.5 animate-spin" />
)}
{t.name}
</span>
))}
{p.name}
{p.count > 1 && (
<span className="text-[10px] font-medium opacity-70">×{p.count}</span>
)}
</span>
);
})}
{donePills.map((p) => {
const hex = toolHex(p.name);
return (
<span
key={`done-${p.name}`}
className="inline-flex items-center gap-1 text-[11px] px-2.5 py-0.5 rounded-full"
style={{ color: hex, backgroundColor: `${hex}18`, border: `1px solid ${hex}35` }}
>
<Check className="w-2.5 h-2.5" />
{p.name}
{p.count > 1 && (
<span className="text-[10px] opacity-80">×{p.count}</span>
)}
</span>
);
})}
</div>
</div>
);
}
const MessageBubble = memo(function MessageBubble({ msg }: { msg: ChatMessage }) {
const MessageBubble = memo(function MessageBubble({ msg, queenMode }: { msg: ChatMessage; queenMode?: "building" | "staging" | "running" }) {
const isUser = msg.type === "user";
const isQueen = msg.role === "queen";
const color = getColor(msg.agent, msg.role);
@@ -159,7 +199,13 @@ const MessageBubble = memo(function MessageBubble({ msg }: { msg: ChatMessage })
isQueen ? "bg-primary/15 text-primary" : "bg-muted text-muted-foreground"
}`}
>
{isQueen ? "Queen" : "Worker"}
{isQueen
? queenMode === "running"
? "running mode"
: queenMode === "staging"
? "staging mode"
: "building mode"
: "Worker"}
</span>
</div>
<div
@@ -172,12 +218,14 @@ const MessageBubble = memo(function MessageBubble({ msg }: { msg: ChatMessage })
</div>
</div>
);
}, (prev, next) => prev.msg.id === next.msg.id && prev.msg.content === next.msg.content);
}, (prev, next) => prev.msg.id === next.msg.id && prev.msg.content === next.msg.content && prev.queenMode === next.queenMode);
export default function ChatPanel({ messages, onSend, isWaiting, activeThread, awaitingInput, disabled, onCancel }: ChatPanelProps) {
export default function ChatPanel({ messages, onSend, isWaiting, isWorkerWaiting, isBusy, activeThread, disabled, onCancel, pendingQuestion, pendingOptions, onQuestionSubmit, onQuestionDismiss, queenMode }: ChatPanelProps) {
const [input, setInput] = useState("");
const [readMap, setReadMap] = useState<Record<string, number>>({});
const bottomRef = useRef<HTMLDivElement>(null);
const scrollRef = useRef<HTMLDivElement>(null);
const stickToBottom = useRef(true);
const textareaRef = useRef<HTMLTextAreaElement>(null);
const threadMessages = messages.filter((m) => {
@@ -194,10 +242,24 @@ export default function ChatPanel({ messages, onSend, isWaiting, activeThread, a
// Suppress unused var
void readMap;
const lastMsg = threadMessages[threadMessages.length - 1];
// Autoscroll: only when user is already near the bottom
const handleScroll = () => {
const el = scrollRef.current;
if (!el) return;
const distFromBottom = el.scrollHeight - el.scrollTop - el.clientHeight;
stickToBottom.current = distFromBottom < 80;
};
useEffect(() => {
bottomRef.current?.scrollIntoView({ behavior: "smooth" });
}, [threadMessages.length, lastMsg?.content]);
if (stickToBottom.current) {
bottomRef.current?.scrollIntoView({ behavior: "smooth" });
}
}, [threadMessages, pendingQuestion, isWaiting, isWorkerWaiting]);
// Always start pinned to bottom when switching threads
useEffect(() => {
stickToBottom.current = true;
}, [activeThread]);
const handleSubmit = (e: React.FormEvent) => {
e.preventDefault();
@@ -207,8 +269,6 @@ export default function ChatPanel({ messages, onSend, isWaiting, activeThread, a
if (textareaRef.current) textareaRef.current.style.height = "auto";
};
const activeWorkerLabel = formatAgentDisplayName(activeThread);
return (
<div className="flex flex-col h-full min-w-0">
{/* Compact sub-header */}
@@ -217,15 +277,44 @@ export default function ChatPanel({ messages, onSend, isWaiting, activeThread, a
</div>
{/* Messages */}
<div className="flex-1 overflow-auto px-5 py-4 space-y-3">
<div ref={scrollRef} onScroll={handleScroll} className="flex-1 overflow-auto px-5 py-4 space-y-3">
{threadMessages.map((msg) => (
<MessageBubble key={msg.id} msg={msg} />
<div key={msg.id}>
<MessageBubble msg={msg} queenMode={queenMode} />
</div>
))}
{isWaiting && (
<div className="flex gap-3">
<div className="w-7 h-7 rounded-xl bg-muted flex items-center justify-center">
<Cpu className="w-3.5 h-3.5 text-muted-foreground" />
<div
className="flex-shrink-0 w-9 h-9 rounded-xl flex items-center justify-center"
style={{
backgroundColor: `${queenColor}18`,
border: `1.5px solid ${queenColor}35`,
boxShadow: `0 0 12px ${queenColor}20`,
}}
>
<Crown className="w-4 h-4" style={{ color: queenColor }} />
</div>
<div className="border border-primary/20 bg-primary/5 rounded-2xl rounded-tl-md px-4 py-3">
<div className="flex gap-1.5">
<span className="w-1.5 h-1.5 rounded-full bg-muted-foreground animate-bounce" style={{ animationDelay: "0ms" }} />
<span className="w-1.5 h-1.5 rounded-full bg-muted-foreground animate-bounce" style={{ animationDelay: "150ms" }} />
<span className="w-1.5 h-1.5 rounded-full bg-muted-foreground animate-bounce" style={{ animationDelay: "300ms" }} />
</div>
</div>
</div>
)}
{isWorkerWaiting && !isWaiting && (
<div className="flex gap-3">
<div
className="flex-shrink-0 w-7 h-7 rounded-xl flex items-center justify-center"
style={{
backgroundColor: `${workerColor}18`,
border: `1.5px solid ${workerColor}35`,
}}
>
<Cpu className="w-3.5 h-3.5" style={{ color: workerColor }} />
</div>
<div className="bg-muted/60 rounded-2xl rounded-tl-md px-4 py-3">
<div className="flex gap-1.5">
@@ -239,54 +328,57 @@ export default function ChatPanel({ messages, onSend, isWaiting, activeThread, a
<div ref={bottomRef} />
</div>
{/* Input */}
<form onSubmit={handleSubmit} className="p-4 border-t border-border">
<div className="flex items-center gap-3 bg-muted/40 rounded-xl px-4 py-2.5 border border-border focus-within:border-primary/40 transition-colors">
<textarea
ref={textareaRef}
rows={1}
value={input}
onChange={(e) => {
setInput(e.target.value);
const ta = e.target;
ta.style.height = "auto";
ta.style.height = `${Math.min(ta.scrollHeight, 160)}px`;
}}
onKeyDown={(e) => {
if (e.key === "Enter" && !e.shiftKey) {
e.preventDefault();
handleSubmit(e);
}
}}
placeholder={
disabled
? "Connecting to agent..."
: awaitingInput
? "Agent is waiting for your response..."
: `Message ${activeWorkerLabel}...`
}
disabled={disabled}
className="flex-1 bg-transparent text-sm text-foreground outline-none placeholder:text-muted-foreground disabled:opacity-50 disabled:cursor-not-allowed resize-none overflow-y-auto"
/>
{isWaiting && onCancel ? (
<button
type="button"
onClick={onCancel}
className="p-2 rounded-lg bg-destructive text-destructive-foreground hover:opacity-90 transition-opacity"
>
<Square className="w-4 h-4" />
</button>
) : (
<button
type="submit"
disabled={!input.trim() || disabled}
className="p-2 rounded-lg bg-primary text-primary-foreground disabled:opacity-30 hover:opacity-90 transition-opacity"
>
<Send className="w-4 h-4" />
</button>
)}
</div>
</form>
{/* Input area — question widget replaces textarea when a question is pending */}
{pendingQuestion && pendingOptions && onQuestionSubmit ? (
<QuestionWidget
question={pendingQuestion}
options={pendingOptions}
onSubmit={onQuestionSubmit}
onDismiss={onQuestionDismiss}
/>
) : (
<form onSubmit={handleSubmit} className="p-4">
<div className="flex items-center gap-3 bg-muted/40 rounded-xl px-4 py-2.5 border border-border focus-within:border-primary/40 transition-colors">
<textarea
ref={textareaRef}
rows={1}
value={input}
onChange={(e) => {
setInput(e.target.value);
const ta = e.target;
ta.style.height = "auto";
ta.style.height = `${Math.min(ta.scrollHeight, 160)}px`;
}}
onKeyDown={(e) => {
if (e.key === "Enter" && !e.shiftKey) {
e.preventDefault();
handleSubmit(e);
}
}}
placeholder={disabled ? "Connecting to agent..." : "Message Queen Bee..."}
disabled={disabled}
className="flex-1 bg-transparent text-sm text-foreground outline-none placeholder:text-muted-foreground disabled:opacity-50 disabled:cursor-not-allowed resize-none overflow-y-auto"
/>
{isBusy && onCancel ? (
<button
type="button"
onClick={onCancel}
className="p-2 rounded-lg bg-amber-500/15 text-amber-400 border border-amber-500/40 hover:bg-amber-500/25 transition-colors"
>
<Square className="w-4 h-4" />
</button>
) : (
<button
type="submit"
disabled={!input.trim() || disabled}
className="p-2 rounded-lg bg-primary text-primary-foreground disabled:opacity-30 hover:opacity-90 transition-opacity"
>
<Send className="w-4 h-4" />
</button>
)}
</div>
</form>
)}
</div>
);
}
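The tool pills in this file take their color from `toolHex`, which hashes the tool name into the fixed palette so that a given tool keeps the same color across renders. The sketch below reproduces that function from the diff as a standalone snippet; the tool names used in the calls are made-up examples.

```typescript
// Palette and hash copied from the diff; only the example tool names are assumptions.
const TOOL_HEX: string[] = [
  "#db6f02", // rich orange
  "#ffb825", // golden yellow
  "#ff9624", // bright orange
  "#c48820", // warm bronze
  "#e89530", // honey
  "#d4a040", // goldenrod
  "#cc7a10", // caramel
  "#e5a820", // sunflower
];

function toolHex(name: string): string {
  let hash = 0;
  // Polynomial rolling hash; `| 0` keeps the value in signed 32-bit range.
  for (let i = 0; i < name.length; i++) hash = (hash * 31 + name.charCodeAt(i)) | 0;
  // Math.abs guards against a negative index after 32-bit wraparound.
  return TOOL_HEX[Math.abs(hash) % TOOL_HEX.length];
}

// Same name, same color on every call.
console.log(toolHex("web_search") === toolHex("web_search")); // true
// Single-character name: hash is the char code (97), and 97 % 8 === 1.
console.log(toolHex("a")); // "#ffb825"
```

The pill styles then append two hex digits to the color (`${hex}18` for the background, `${hex}35` for the border), which browsers read as an 8-digit hex color with an alpha channel of roughly 9% and 21% respectively.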
@@ -400,17 +400,19 @@ export default function CredentialsModal({
<Pencil className="w-3 h-3" />
</button>
)}
<button
onClick={() => {
setDeletingId(deletingId === row.id ? null : row.id);
if (editingId) { setEditingId(null); setInputValue(""); }
}}
disabled={saving}
className="p-1.5 rounded-md text-muted-foreground hover:text-destructive hover:bg-destructive/10 transition-colors"
title="Delete credential"
>
<Trash2 className="w-3 h-3" />
</button>
{!(row.adenSupported && row.id !== "aden_api_key") && (
<button
onClick={() => {
setDeletingId(deletingId === row.id ? null : row.id);
if (editingId) { setEditingId(null); setInputValue(""); }
}}
disabled={saving}
className="p-1.5 rounded-md text-muted-foreground hover:text-destructive hover:bg-destructive/10 transition-colors"
title="Delete credential"
>
<Trash2 className="w-3 h-3" />
</button>
)}
</div>
) : row.adenSupported && !adenPlatformConnected && row.id !== "aden_api_key" ? (
<span className="text-[11px] text-muted-foreground italic flex-shrink-0">
@@ -0,0 +1,431 @@
/**
* HistorySidebar — persistent ChatGPT-style session history sidebar.
*
* Shown on both the Home page and the Workspace. Clicking a session fires
* `onOpen(sessionId, agentPath)` so the caller decides what to do (navigate
* to workspace on Home, open/switch tab on Workspace).
*
* Labels (user-visible names) are stored purely in localStorage — backend
* session IDs are never touched.
*
* Session deduplication: the backend may have multiple session directories
* for the same agent (cold restarts create new directories). We deduplicate
* by agent_path and show only the most-recent session per agent so the
* history list stays clean.
*/
import { useState, useEffect, useRef, useCallback } from "react";
import { ChevronLeft, ChevronRight, Clock, Bot, Loader2, MoreHorizontal, Pencil, Trash2, Check, X } from "lucide-react";
import { sessionsApi } from "@/api/sessions";
// ── Types ─────────────────────────────────────────────────────────────────────
export type HistorySession = {
session_id: string;
cold: boolean;
live: boolean;
has_messages: boolean;
created_at: number;
agent_name?: string | null;
agent_path?: string | null;
/** Snippet of the last assistant message — for sidebar preview. */
last_message?: string | null;
/** Total number of client-facing messages in this session. */
message_count?: number;
};
const LABEL_STORE_KEY = "hive:history-labels";
function loadLabelStore(): Record<string, string> {
try {
const raw = localStorage.getItem(LABEL_STORE_KEY);
return raw ? (JSON.parse(raw) as Record<string, string>) : {};
} catch {
return {};
}
}
function saveLabelStore(store: Record<string, string>) {
try {
localStorage.setItem(LABEL_STORE_KEY, JSON.stringify(store));
} catch { /* localStorage unavailable or full; labels are best-effort */ }
}
// ── Helpers ───────────────────────────────────────────────────────────────────
function defaultLabel(s: HistorySession, index: number): string {
if (s.agent_name) return s.agent_name;
if (s.agent_path) {
const base = s.agent_path.replace(/\/$/, "").split("/").pop() || s.agent_path;
return base
.split("_")
.map((w) => w.charAt(0).toUpperCase() + w.slice(1))
.join(" ");
}
return `New Agent${index > 0 ? ` #${index + 1}` : ""}`;
}
function formatDateTime(createdAt: number, sessionId: string): string {
// Prefer timestamp embedded in session_id: session_YYYYMMDD_HHMMSS_xxx
const match = sessionId.match(/^session_(\d{4})(\d{2})(\d{2})_(\d{2})(\d{2})(\d{2})/);
const d = match
? new Date(+match[1], +match[2] - 1, +match[3], +match[4], +match[5], +match[6])
: new Date(createdAt * 1000);
return d.toLocaleString(undefined, {
month: "short",
day: "numeric",
hour: "2-digit",
minute: "2-digit",
});
}
/**
* Deduplicate sessions by agent_path — keep only the most recent session
* per agent. Sessions are already sorted newest-first by the backend.
* Sessions without an agent_path (new-agent / queen-only) are kept individually.
*/
function deduplicateByAgent(sessions: HistorySession[]): HistorySession[] {
const seen = new Set<string>();
const result: HistorySession[] = [];
for (const s of sessions) {
// Group key: use agent_path when present, otherwise use session_id (unique)
const key = s.agent_path ? s.agent_path.replace(/\/$/, "") : `__no_agent__${s.session_id}`;
if (!seen.has(key)) {
seen.add(key);
result.push(s);
}
// Additional sessions for the same agent are silently skipped
}
return result;
}
function groupByDate(sessions: HistorySession[]): { label: string; items: HistorySession[] }[] {
const now = new Date();
const today = new Date(now.getFullYear(), now.getMonth(), now.getDate()).getTime();
const yesterday = today - 86_400_000;
const weekAgo = today - 7 * 86_400_000;
const groups: { label: string; items: HistorySession[] }[] = [
{ label: "Today", items: [] },
{ label: "Yesterday", items: [] },
{ label: "Last 7 days", items: [] },
{ label: "Older", items: [] },
];
for (const s of sessions) {
const d = new Date(s.created_at * 1000);
const dayTs = new Date(d.getFullYear(), d.getMonth(), d.getDate()).getTime();
if (dayTs >= today) groups[0].items.push(s);
else if (dayTs >= yesterday) groups[1].items.push(s);
else if (dayTs >= weekAgo) groups[2].items.push(s);
else groups[3].items.push(s);
}
return groups.filter((g) => g.items.length > 0);
}
// ── Row component ─────────────────────────────────────────────────────────────
interface RowProps {
session: HistorySession;
label: string;
index: number;
isActive: boolean;
isLive: boolean;
onOpen: () => void;
onRename: (newLabel: string) => void;
onDelete: () => void;
}
function HistoryRow({ session: s, label, isActive, isLive, onOpen, onRename, onDelete }: RowProps) {
const [menuOpen, setMenuOpen] = useState(false);
const [renaming, setRenaming] = useState(false);
const [draftLabel, setDraftLabel] = useState(label);
const menuRef = useRef<HTMLDivElement>(null);
const inputRef = useRef<HTMLInputElement>(null);
useEffect(() => {
if (!menuOpen) return;
const handler = (e: MouseEvent) => {
if (menuRef.current && !menuRef.current.contains(e.target as Node)) setMenuOpen(false);
};
document.addEventListener("mousedown", handler);
return () => document.removeEventListener("mousedown", handler);
}, [menuOpen]);
useEffect(() => {
if (renaming) {
setDraftLabel(label);
requestAnimationFrame(() => inputRef.current?.select());
}
}, [renaming, label]);
const commitRename = () => {
const trimmed = draftLabel.trim();
if (trimmed) onRename(trimmed);
setRenaming(false);
};
const dateStr = formatDateTime(s.created_at, s.session_id);
return (
<div
className={`group relative flex items-start gap-2 px-3 py-2 cursor-pointer transition-colors ${isActive
? "bg-primary/10 border-l-2 border-primary"
: "border-l-2 border-transparent hover:bg-muted/40"
}`}
onClick={() => { if (!renaming) onOpen(); }}
>
<Bot className="w-3.5 h-3.5 flex-shrink-0 mt-[3px] text-muted-foreground/40 group-hover:text-muted-foreground/70 transition-colors" />
<div className="min-w-0 flex-1">
{renaming ? (
<div className="flex items-center gap-1" onClick={(e) => e.stopPropagation()}>
<input
ref={inputRef}
value={draftLabel}
onChange={(e) => setDraftLabel(e.target.value)}
onKeyDown={(e) => {
if (e.key === "Enter") commitRename();
if (e.key === "Escape") setRenaming(false);
}}
className="flex-1 min-w-0 text-[11px] bg-muted/60 border border-border/50 rounded px-1.5 py-0.5 text-foreground focus:outline-none focus:ring-1 focus:ring-primary/40"
/>
<button onClick={commitRename} className="p-0.5 text-primary hover:text-primary/80">
<Check className="w-3 h-3" />
</button>
<button onClick={() => setRenaming(false)} className="p-0.5 text-muted-foreground hover:text-foreground">
<X className="w-3 h-3" />
</button>
</div>
) : (
<>
<div className={`text-[11px] font-medium truncate leading-tight ${isActive ? "text-foreground" : "text-foreground/80"}`}>
{label}
</div>
{/* Message preview — most recent assistant message */}
{s.last_message && (
<div className="text-[10px] text-muted-foreground/50 mt-0.5 leading-tight line-clamp-2 break-words">
{s.last_message}
</div>
)}
<div className="flex items-center gap-1.5 mt-0.5">
<div className="text-[10px] text-muted-foreground/40">{dateStr}</div>
{(s.message_count ?? 0) > 0 && (
<span className="text-[9px] text-muted-foreground/30">· {s.message_count} msgs</span>
)}
</div>
{isLive && (
<span className="text-[9px] text-emerald-500/80 font-semibold uppercase tracking-wide">live</span>
)}
</>
)}
</div>
{/* 3-dot button — visible on row hover */}
{!renaming && (
<div className="relative flex-shrink-0" ref={menuRef} onClick={(e) => e.stopPropagation()}>
<button
onClick={() => setMenuOpen((o) => !o)}
className={`p-0.5 rounded transition-colors text-muted-foreground/40 hover:text-foreground hover:bg-muted/60 ${menuOpen ? "opacity-100" : "opacity-0 group-hover:opacity-100"
}`}
title="More options"
>
<MoreHorizontal className="w-3.5 h-3.5" />
</button>
{menuOpen && (
<div className="absolute right-0 top-5 z-50 w-36 rounded-lg border border-border/60 bg-card shadow-xl shadow-black/30 overflow-hidden py-1">
<button
onClick={() => { setMenuOpen(false); setRenaming(true); }}
className="flex items-center gap-2 w-full px-3 py-1.5 text-xs text-foreground hover:bg-muted/60 transition-colors"
>
<Pencil className="w-3 h-3 text-muted-foreground" />
Rename
</button>
<button
onClick={() => { setMenuOpen(false); onDelete(); }}
className="flex items-center gap-2 w-full px-3 py-1.5 text-xs text-destructive hover:bg-destructive/10 transition-colors"
>
<Trash2 className="w-3 h-3" />
Delete
</button>
</div>
)}
</div>
)}
</div>
);
}
// ── Main sidebar component ────────────────────────────────────────────────────
interface HistorySidebarProps {
/** Called when a history session is clicked. */
onOpen: (sessionId: string, agentPath?: string | null, agentName?: string | null) => void;
/** session_ids of tabs already open (for highlighting). */
openSessionIds?: string[];
/** session_id of the currently active/viewed session (live backend ID). */
activeSessionId?: string | null;
/** historySourceId of the active session — the original cold session ID before revive,
* stays stable even after the backend creates a new live session on cold-restore. */
activeHistorySourceId?: string | null;
/** Increment this to force a refresh of the session list. */
refreshKey?: number;
}
export default function HistorySidebar({ onOpen, openSessionIds = [], activeSessionId, activeHistorySourceId, refreshKey }: HistorySidebarProps) {
const [collapsed, setCollapsed] = useState(false);
// Raw sessions from the backend (may contain duplicates per agent)
const [rawSessions, setRawSessions] = useState<HistorySession[]>([]);
const [loading, setLoading] = useState(false);
const [labels, setLabels] = useState<Record<string, string>>(loadLabelStore);
const refresh = useCallback(() => {
setLoading(true);
sessionsApi
.history()
.then((r) => setRawSessions(r.sessions))
.catch(() => { })
.finally(() => setLoading(false));
}, []);
// Refresh on mount and whenever the parent forces a refresh
useEffect(() => {
refresh();
}, [refresh, refreshKey]);
// Refresh when the browser tab regains visibility
useEffect(() => {
const handleVisibility = () => {
if (document.visibilityState === "visible") refresh();
};
document.addEventListener("visibilitychange", handleVisibility);
return () => document.removeEventListener("visibilitychange", handleVisibility);
}, [refresh]);
const handleRename = (sessionId: string, newLabel: string) => {
const next = { ...labels, [sessionId]: newLabel };
setLabels(next);
saveLabelStore(next);
};
const handleDelete = (sessionId: string) => {
// Optimistically remove from in-memory list immediately
setRawSessions((prev) => prev.filter((s) => s.session_id !== sessionId));
const next = { ...labels };
delete next[sessionId];
setLabels(next);
saveLabelStore(next);
// Permanently delete session files from disk (fire-and-forget)
sessionsApi.deleteHistory(sessionId).catch(() => {
// Soft failure: the entry is already removed from the UI.
// If the delete did not succeed, the files remain on disk and the session
// may reappear on the next refresh, since refresh() replaces rawSessions
// with the backend's list.
});
};
// ── Deduplicate & render ────────────────────────────────────────────────────
// Deduplicate: show only the most-recent session per agent_path.
// rawSessions is already sorted newest-first by the backend.
const sessions = deduplicateByAgent(rawSessions);
const groups = groupByDate(sessions);
return (
<div
className={`flex-shrink-0 flex flex-col bg-card/20 border-r border-border/30 transition-[width] duration-200 overflow-hidden ${collapsed ? "w-[44px]" : "w-[220px]"
}`}
>
{/* Header */}
<div
className={`flex items-center border-b border-border/20 flex-shrink-0 h-10 ${collapsed ? "justify-center" : "px-3 gap-2"
}`}
>
{!collapsed && (
<span className="text-[11px] font-semibold text-muted-foreground/60 uppercase tracking-wider flex-1">
History
</span>
)}
<button
onClick={() => setCollapsed((o) => !o)}
className="p-1 rounded-md text-muted-foreground hover:text-foreground hover:bg-muted/50 transition-colors flex-shrink-0"
title={collapsed ? "Expand history" : "Collapse history"}
>
{collapsed ? (
<ChevronRight className="w-3.5 h-3.5" />
) : (
<ChevronLeft className="w-3.5 h-3.5" />
)}
</button>
</div>
{/* Expanded list */}
{!collapsed && (
<div className="flex-1 overflow-y-auto min-h-0">
{loading ? (
<div className="flex items-center justify-center py-8">
<Loader2 className="w-4 h-4 animate-spin text-muted-foreground/40" />
</div>
) : sessions.length === 0 ? (
<div className="px-4 py-12 text-center text-[11px] text-muted-foreground/40 leading-relaxed">
No previous
<br />
sessions yet
</div>
) : (
groups.map(({ label: groupLabel, items }) => (
<div key={groupLabel}>
<p className="px-3 pt-4 pb-1 text-[10px] font-semibold text-muted-foreground/35 uppercase tracking-wider">
{groupLabel}
</p>
{items.map((s, idx) => {
const customLabel = labels[s.session_id];
const computedLabel = customLabel || defaultLabel(s, idx);
const isActive =
s.session_id === activeSessionId ||
s.session_id === activeHistorySourceId;
// Mark as live if the backend flagged it OR if it's currently open in a tab
const isLive = s.live || openSessionIds.includes(s.session_id);
return (
<HistoryRow
key={s.session_id}
session={s}
label={computedLabel}
index={idx}
isActive={isActive}
isLive={isLive}
onOpen={() => onOpen(s.session_id, s.agent_path, s.agent_name)}
onRename={(nl) => handleRename(s.session_id, nl)}
onDelete={() => handleDelete(s.session_id)}
/>
);
})}
</div>
))
)}
</div>
)}
{/* Collapsed icon strip */}
{collapsed && (
<div className="flex-1 overflow-y-auto min-h-0 flex flex-col items-center py-2 gap-0.5">
{sessions.slice(0, 30).map((s) => {
const isLive = s.live || openSessionIds.includes(s.session_id);
return (
<button
key={s.session_id}
onClick={() => { setCollapsed(false); onOpen(s.session_id, s.agent_path, s.agent_name); }}
className="w-7 h-7 rounded-md flex items-center justify-center text-muted-foreground/40 hover:text-foreground hover:bg-muted/50 transition-colors relative"
title={labels[s.session_id] || defaultLabel(s, 0)}
>
<Clock className="w-3 h-3" />
{isLive && (
<span className="absolute top-0.5 right-0.5 w-1.5 h-1.5 rounded-full bg-emerald-500" />
)}
</button>
);
})}
</div>
)}
</div>
);
}
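The deduplication described in the file header (one row per `agent_path`, newest session wins, sessions without an agent kept individually) can be exercised on its own. The sketch below restates `deduplicateByAgent` against a trimmed-down session type; `SessionLike` and the sample data are assumptions made for illustration, and the input is assumed to be sorted newest-first as the source comments say the backend does.

```typescript
// Minimal stand-in for HistorySession: only the fields the dedup logic reads.
type SessionLike = { session_id: string; agent_path?: string | null };

function deduplicateByAgent<T extends SessionLike>(sessions: T[]): T[] {
  const seen = new Set<string>();
  const result: T[] = [];
  for (const s of sessions) {
    // Trailing slashes are stripped so "a/b/" and "a/b" group together.
    // Sessions without an agent_path get a unique key and are never merged.
    const key = s.agent_path ? s.agent_path.replace(/\/$/, "") : `__no_agent__${s.session_id}`;
    if (!seen.has(key)) {
      seen.add(key);
      result.push(s);
    }
  }
  return result;
}

// Hypothetical sample data, newest first.
const sample: SessionLike[] = [
  { session_id: "session_20260305_101500_aaa", agent_path: "exports/email_agent/" },
  { session_id: "session_20260304_090000_bbb", agent_path: "exports/email_agent" },
  { session_id: "session_20260303_080000_ccc", agent_path: null },
  { session_id: "session_20260302_070000_ddd", agent_path: null },
];

const kept = deduplicateByAgent(sample).map((s) => s.session_id);
console.log(kept);
// ["session_20260305_101500_aaa", "session_20260303_080000_ccc", "session_20260302_070000_ddd"]
```

Because the function keeps the first occurrence per key, its result is only "most recent per agent" when the input order is newest-first; a caller that cannot rely on the backend ordering would need to sort by `created_at` before calling it.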
@@ -20,9 +20,19 @@ interface ToolCredential {
value?: string;
}
export interface SubagentReport {
subagent_id: string;
message: string;
data?: Record<string, unknown>;
timestamp: string;
status?: "running" | "complete" | "error";
}
interface NodeDetailPanelProps {
node: GraphNode | null;
nodeSpec?: NodeSpec | null;
allNodeSpecs?: NodeSpec[];
subagentReports?: SubagentReport[];
sessionId?: string;
graphId?: string;
workerSessionId?: string | null;
@@ -195,10 +205,96 @@ function SystemPromptTab({ systemPrompt }: { systemPrompt?: string }) {
);
}
function SubagentsTab() {
function SubagentStatusBadge({ status }: { status?: "running" | "complete" | "error" }) {
if (!status) return null;
if (status === "running") {
return (
<span className="ml-auto flex items-center gap-1 text-[10px] font-medium flex-shrink-0" style={{ color: "hsl(45,95%,58%)" }}>
<span className="relative flex h-1.5 w-1.5">
<span className="animate-ping absolute inline-flex h-full w-full rounded-full opacity-75" style={{ backgroundColor: "hsl(45,95%,58%)" }} />
<span className="relative inline-flex rounded-full h-1.5 w-1.5" style={{ backgroundColor: "hsl(45,95%,58%)" }} />
</span>
Running
</span>
);
}
if (status === "complete") {
return (
<span className="ml-auto flex items-center gap-1 text-[10px] font-medium flex-shrink-0" style={{ color: "hsl(43,70%,45%)" }}>
<CheckCircle2 className="w-3 h-3" />
Complete
</span>
);
}
return (
<div className="flex-1 flex items-center justify-center">
<p className="text-xs text-muted-foreground/60 italic text-center">No subagents assigned to this node.</p>
<span className="ml-auto flex items-center gap-1 text-[10px] font-medium flex-shrink-0" style={{ color: "hsl(0,65%,55%)" }}>
<AlertCircle className="w-3 h-3" />
Failed
</span>
);
}
function SubagentsTab({ subAgentIds, allNodeSpecs, subagentReports }: { subAgentIds: string[]; allNodeSpecs: NodeSpec[]; subagentReports: SubagentReport[] }) {
if (subAgentIds.length === 0) {
return (
<div className="flex-1 flex items-center justify-center">
<p className="text-xs text-muted-foreground/60 italic text-center">No subagents assigned to this node.</p>
</div>
);
}
return (
<div className="space-y-3">
<p className="text-[10px] font-medium text-muted-foreground uppercase tracking-wider mb-1">Sub-agents ({subAgentIds.length})</p>
{subAgentIds.map(saId => {
const spec = allNodeSpecs.find(n => n.id === saId);
const reports = subagentReports.filter(r => r.subagent_id === saId);
// Derive status from latest report that has a status field
const latestStatus = [...reports].reverse().find(r => r.status)?.status;
// Progress messages are reports without a status field (from report_to_parent)
const progressReports = reports.filter(r => !r.status);
return (
<div key={saId} className="rounded-xl border border-border/20 overflow-hidden">
<div className="p-3 bg-muted/30">
<div className="flex items-center gap-2 mb-1">
<Bot className="w-3.5 h-3.5 text-primary/70 flex-shrink-0" />
<span className="text-xs font-medium text-foreground truncate">{spec?.name || saId}</span>
<SubagentStatusBadge status={latestStatus} />
</div>
{spec?.description && (
<p className="text-[11px] text-muted-foreground leading-relaxed mt-1">{spec.description}</p>
)}
</div>
{/* Static info: tools + output keys */}
<div className="px-3 py-2 border-t border-border/15 bg-muted/15">
{spec?.tools && spec.tools.length > 0 && (
<div className="mb-1.5">
<span className="text-[10px] text-muted-foreground font-medium">Tools: </span>
<span className="text-[10px] text-foreground/70">{spec.tools.join(", ")}</span>
</div>
)}
{spec?.output_keys && spec.output_keys.length > 0 && (
<div>
<span className="text-[10px] text-muted-foreground font-medium">Outputs: </span>
<span className="text-[10px] text-foreground/70 font-mono">{spec.output_keys.join(", ")}</span>
</div>
)}
</div>
{/* Live progress reports (from report_to_parent) */}
{progressReports.length > 0 && (
<div className="px-3 py-2 border-t border-border/15 bg-background/60">
<p className="text-[10px] text-muted-foreground font-medium mb-1">Reports ({progressReports.length})</p>
{progressReports.map((r, i) => (
<div key={i} className="text-[10.5px] text-foreground/70 leading-relaxed py-0.5">{r.message}</div>
))}
</div>
)}
</div>
);
})}
</div>
);
}
@@ -213,7 +309,7 @@ const tabs: { id: Tab; label: string; Icon: React.FC<{ className?: string }> }[]
{ id: "subagents", label: "Subagents", Icon: ({ className }) => <Bot className={className} /> },
];
export default function NodeDetailPanel({ node, nodeSpec, sessionId, graphId, workerSessionId, nodeLogs, actionPlan, onClose }: NodeDetailPanelProps) {
export default function NodeDetailPanel({ node, nodeSpec, allNodeSpecs, subagentReports, sessionId, graphId, workerSessionId, nodeLogs, actionPlan, onClose }: NodeDetailPanelProps) {
const [activeTab, setActiveTab] = useState<Tab>("overview");
const [realTools, setRealTools] = useState<ToolInfo[] | null>(null);
const [realCriteria, setRealCriteria] = useState<NodeCriteria | null>(null);
@@ -295,7 +391,7 @@ export default function NodeDetailPanel({ node, nodeSpec, sessionId, graphId, wo
{/* Tab bar */}
<div className="flex border-b border-border/30 flex-shrink-0 px-2 pt-1 overflow-x-auto scrollbar-hide">
{tabs.map(tab => (
{tabs.filter(t => t.id !== "subagents" || (nodeSpec?.sub_agents && nodeSpec.sub_agents.length > 0)).map(tab => (
<button
key={tab.id}
onClick={() => setActiveTab(tab.id)}
@@ -397,8 +493,12 @@ export default function NodeDetailPanel({ node, nodeSpec, sessionId, graphId, wo
<SystemPromptTab systemPrompt={nodeSpec?.system_prompt} />
)}
{activeTab === "subagents" && (
<SubagentsTab />
{activeTab === "subagents" && nodeSpec?.sub_agents && (
<SubagentsTab
subAgentIds={nodeSpec.sub_agents}
allNodeSpecs={allNodeSpecs || []}
subagentReports={subagentReports || []}
/>
)}
</div>
</div>
@@ -0,0 +1,142 @@
import { useState, useRef, useEffect, useCallback } from "react";
import { Send, MessageCircleQuestion, X } from "lucide-react";
export interface QuestionWidgetProps {
/** The question text shown to the user */
question: string;
/** 1-3 predefined options. The UI appends an "Other" free-text option. */
options: string[];
/** Called with the selected option label or custom text, and whether "Other" was chosen */
onSubmit: (answer: string, isOther: boolean) => void;
/** Called when user dismisses the question without answering */
onDismiss?: () => void;
}
export default function QuestionWidget({ question, options, onSubmit, onDismiss }: QuestionWidgetProps) {
const [selected, setSelected] = useState<number | null>(null);
const [customText, setCustomText] = useState("");
const [submitted, setSubmitted] = useState(false);
const inputRef = useRef<HTMLInputElement>(null);
const containerRef = useRef<HTMLDivElement>(null);
// "Other" is always the last option index
const otherIndex = options.length;
const isOtherSelected = selected === otherIndex;
// Focus the text input when "Other" is selected
useEffect(() => {
if (isOtherSelected) {
inputRef.current?.focus();
}
}, [isOtherSelected]);
const canSubmit = selected !== null && (!isOtherSelected || customText.trim().length > 0);
const handleSubmit = useCallback(() => {
if (!canSubmit || submitted) return;
setSubmitted(true);
if (isOtherSelected) {
onSubmit(customText.trim(), true);
} else {
onSubmit(options[selected!], false);
}
}, [canSubmit, submitted, isOtherSelected, customText, options, selected, onSubmit]);
// Keyboard: Enter to submit, number keys to select (only when text input is not focused)
useEffect(() => {
const handleKeyDown = (e: KeyboardEvent) => {
if (submitted) return;
const inTextInput = e.target === inputRef.current;
if (e.key === "Enter" && !e.shiftKey) {
e.preventDefault();
handleSubmit();
return;
}
// Number keys 1-4 select options — skip when typing in the "Other" field
if (!inTextInput) {
const num = parseInt(e.key, 10);
if (num >= 1 && num <= options.length + 1) {
e.preventDefault();
setSelected(num - 1);
}
}
};
window.addEventListener("keydown", handleKeyDown);
return () => window.removeEventListener("keydown", handleKeyDown);
}, [handleSubmit, submitted, options.length]);
if (submitted) return null;
return (
<div ref={containerRef} className="p-4">
<div className="bg-card border border-border rounded-xl shadow-sm overflow-hidden">
{/* Header / Question */}
<div className="px-5 pt-4 pb-3 flex items-start gap-3">
<div className="w-7 h-7 rounded-lg bg-primary/10 border border-primary/20 flex items-center justify-center flex-shrink-0 mt-0.5">
<MessageCircleQuestion className="w-3.5 h-3.5 text-primary" />
</div>
<p className="text-sm font-medium text-foreground leading-relaxed flex-1">{question}</p>
{onDismiss && (
<button
onClick={onDismiss}
className="p-1 rounded-md text-muted-foreground hover:text-foreground hover:bg-muted/60 transition-colors flex-shrink-0"
>
<X className="w-4 h-4" />
</button>
)}
</div>
{/* Options */}
<div className="px-5 pb-3 space-y-1.5">
{options.map((option, idx) => (
<button
key={idx}
onClick={() => setSelected(idx)}
className={`w-full text-left px-4 py-2.5 rounded-lg border text-sm transition-colors ${
selected === idx
? "border-primary bg-primary/10 text-foreground"
: "border-border/60 bg-muted/20 text-foreground hover:border-primary/40 hover:bg-muted/40"
}`}
>
<span className="text-xs text-muted-foreground mr-2">{idx + 1}.</span>
{option}
</button>
))}
{/* "Other" — inline text input that auto-selects on focus */}
<input
ref={inputRef}
type="text"
value={customText}
onFocus={() => setSelected(otherIndex)}
onChange={(e) => {
setSelected(otherIndex);
setCustomText(e.target.value);
}}
placeholder="Type a custom response..."
className={`w-full px-4 py-2.5 rounded-lg border border-dashed text-sm transition-colors bg-transparent placeholder:text-muted-foreground focus:outline-none ${
isOtherSelected
? "border-primary bg-primary/10 text-foreground"
: "border-border text-muted-foreground hover:border-primary/40"
}`}
/>
</div>
{/* Submit */}
<div className="px-5 pb-4">
<button
onClick={handleSubmit}
disabled={!canSubmit}
className="w-full flex items-center justify-center gap-2 py-2.5 rounded-lg text-sm font-medium bg-primary text-primary-foreground hover:bg-primary/90 disabled:opacity-30 disabled:cursor-not-allowed transition-colors"
>
<Send className="w-3.5 h-3.5" />
Submit
</button>
</div>
</div>
</div>
);
}
+9
@@ -167,3 +167,12 @@
.animate-in.slide-in-from-right {
animation: slide-in-from-right 0.2s ease-out;
}
/* Slide-up animation for question widget */
@keyframes slide-in-from-bottom {
from { transform: translateY(16px); opacity: 0; }
to { transform: translateY(0); opacity: 1; }
}
.animate-in.slide-in-from-bottom {
animation: slide-in-from-bottom 0.25s ease-out;
}
+15 -14
@@ -37,8 +37,11 @@ export function backendMessageToChatMessage(
thread: string,
agentDisplayName?: string,
): ChatMessage {
// Use file-mtime created_at (epoch seconds → ms) for cross-conversation
// ordering; fall back to seq for backwards compatibility.
const createdAt = msg.created_at ? msg.created_at * 1000 : msg.seq;
return {
id: `backend-${msg.seq}`,
id: `backend-${msg._node_id}-${msg.seq}`,
agent: msg.role === "user" ? "You" : agentDisplayName || msg._node_id || "Agent",
agentColor: "",
content: msg.content,
@@ -46,6 +49,7 @@ export function backendMessageToChatMessage(
type: msg.role === "user" ? "user" : undefined,
role: msg.role === "user" ? undefined : "worker",
thread,
createdAt,
};
}
@@ -67,6 +71,8 @@ export function sseEventToChatMessage(
const eid = event.execution_id ?? "";
const tid = turnId != null ? String(turnId) : "";
const idKey = eid && tid ? `${eid}-${tid}` : eid || tid || `t-${Date.now()}`;
// Use the backend event timestamp for message ordering
const createdAt = event.timestamp ? new Date(event.timestamp).getTime() : Date.now();
switch (event.type) {
case "client_output_delta": {
@@ -86,22 +92,14 @@ export function sseEventToChatMessage(
timestamp: "",
role: "worker",
thread,
createdAt,
};
}
case "client_input_requested": {
const prompt = (event.data?.prompt as string) || "";
if (!prompt) return null;
return {
id: `input-req-${idKey}-${event.node_id}`,
agent: agentDisplayName || event.node_id || "Agent",
agentColor: "",
content: prompt,
timestamp: "",
role: "worker",
thread,
};
}
case "client_input_requested":
// Handled explicitly in handleSSEEvent (workspace.tsx) so it can
// create a worker_input_request message and set awaitingInput state.
return null;
case "llm_text_delta": {
const snapshot = (event.data?.snapshot as string) || (event.data?.content as string) || "";
@@ -114,6 +112,7 @@ export function sseEventToChatMessage(
timestamp: "",
role: "worker",
thread,
createdAt,
};
}
@@ -126,6 +125,7 @@ export function sseEventToChatMessage(
timestamp: "",
type: "system",
thread,
createdAt,
};
}
@@ -139,6 +139,7 @@ export function sseEventToChatMessage(
timestamp: "",
type: "system",
thread,
createdAt,
};
}
+25 -3
@@ -12,8 +12,27 @@ import type { GraphNode, NodeStatus } from "@/components/AgentGraph";
* 4. Map session enrichment fields to NodeStatus
*/
export function topologyToGraphNodes(topology: GraphTopology): GraphNode[] {
const { nodes, edges, entry_node, entry_points } = topology;
if (nodes.length === 0) return [];
const { nodes: allNodes, edges, entry_node, entry_points } = topology;
if (allNodes.length === 0) return [];
// Filter out subagent-only nodes (referenced in sub_agents but not in any edge)
const subagentIds = new Set<string>();
for (const n of allNodes) {
for (const sa of n.sub_agents ?? []) {
subagentIds.add(sa);
}
}
const edgeParticipants = new Set<string>();
for (const e of edges) {
edgeParticipants.add(e.source);
edgeParticipants.add(e.target);
}
const nodes = allNodes.filter(
(n) =>
!subagentIds.has(n.id) ||
edgeParticipants.has(n.id) ||
n.id === entry_node,
);
// --- Synthesize trigger nodes for non-manual entry points ---
const schedulerEntryPoints = (entry_points || []).filter(
@@ -29,7 +48,10 @@ export function topologyToGraphNodes(topology: GraphTopology): GraphNode[] {
status: "pending",
nodeType: "trigger",
triggerType: ep.trigger_type,
triggerConfig: ep.trigger_config,
triggerConfig: {
...ep.trigger_config,
...(ep.next_fire_in != null ? { next_fire_in: ep.next_fire_in } : {}),
},
next: [ep.entry_node],
});
}
+1 -1
@@ -9,7 +9,7 @@ import type { GraphNode } from "@/components/AgentGraph";
export const TAB_STORAGE_KEY = "hive:workspace-tabs";
export interface PersistedTabState {
tabs: Array<{ id: string; agentType: string; label: string; backendSessionId?: string }>;
tabs: Array<{ id: string; agentType: string; tabKey?: string; label: string; backendSessionId?: string; historySourceId?: string }>;
activeSessionByAgent: Record<string, string>;
activeWorker: string;
sessions?: Record<string, { messages: ChatMessage[]; graphNodes: GraphNode[] }>;
+31 -6
@@ -1,6 +1,6 @@
import { useState, useEffect, useRef } from "react";
import { useNavigate } from "react-router-dom";
import { Crown, Mail, Briefcase, Shield, Search, Newspaper, ArrowRight, Hexagon, Send, Bot } from "lucide-react";
import { Crown, Mail, Briefcase, Shield, Search, Newspaper, ArrowRight, Hexagon, Send, Bot, Radar, Reply, DollarSign, MapPin, Calendar, UserPlus, Twitter } from "lucide-react";
import TopBar from "@/components/TopBar";
import type { LucideIcon } from "lucide-react";
import { agentsApi } from "@/api/agents";
@@ -14,6 +14,13 @@ const AGENT_ICONS: Record<string, LucideIcon> = {
vulnerability_assessment: Shield,
deep_research_agent: Search,
tech_news_reporter: Newspaper,
competitive_intel_agent: Radar,
email_reply_agent: Reply,
hubspot_revenue_leak_detector: DollarSign,
local_business_extractor: MapPin,
meeting_scheduler: Calendar,
sdr_agent: UserPlus,
twitter_news_agent: Twitter,
};
const AGENT_COLORS: Record<string, string> = {
@@ -22,6 +29,13 @@ const AGENT_COLORS: Record<string, string> = {
vulnerability_assessment: "hsl(15,70%,52%)",
deep_research_agent: "hsl(210,70%,55%)",
tech_news_reporter: "hsl(270,60%,55%)",
competitive_intel_agent: "hsl(190,70%,45%)",
email_reply_agent: "hsl(45,80%,55%)",
hubspot_revenue_leak_detector: "hsl(145,60%,42%)",
local_business_extractor: "hsl(350,65%,55%)",
meeting_scheduler: "hsl(220,65%,55%)",
sdr_agent: "hsl(165,55%,45%)",
twitter_news_agent: "hsl(200,85%,55%)",
};
function agentSlug(path: string): string {
@@ -40,7 +54,7 @@ const promptHints = [
export default function Home() {
const navigate = useNavigate();
const [inputValue, setInputValue] = useState("");
const textareaRef = useRef<HTMLInputElement>(null);
const textareaRef = useRef<HTMLTextAreaElement>(null);
const [showAgents, setShowAgents] = useState(false);
const [agents, setAgents] = useState<DiscoverEntry[]>([]);
const [loading, setLoading] = useState(false);
@@ -106,13 +120,24 @@ export default function Home() {
{/* Chat input */}
<form onSubmit={handleSubmit} className="mb-6">
<div className="relative border border-border/60 rounded-xl bg-card/50 hover:border-primary/30 focus-within:border-primary/40 transition-colors shadow-sm">
<input
<textarea
ref={textareaRef}
type="text"
rows={1}
value={inputValue}
onChange={(e) => setInputValue(e.target.value)}
onChange={(e) => {
setInputValue(e.target.value);
const ta = e.target;
ta.style.height = "auto";
ta.style.height = `${Math.min(ta.scrollHeight, 160)}px`;
}}
onKeyDown={(e) => {
if (e.key === "Enter" && !e.shiftKey) {
e.preventDefault();
handleSubmit(e);
}
}}
placeholder="Describe a task for the hive..."
className="w-full bg-transparent px-5 py-4 pr-12 text-sm text-foreground placeholder:text-muted-foreground/60 focus:outline-none rounded-xl"
className="w-full bg-transparent px-5 py-4 pr-12 text-sm text-foreground placeholder:text-muted-foreground/60 focus:outline-none rounded-xl resize-none overflow-y-auto"
/>
<div className="absolute right-3 bottom-2.5">
<button
+2 -2
@@ -41,11 +41,11 @@ export default function MyAgents() {
const idleCount = agents.length - activeCount;
return (
<div className="min-h-screen bg-background flex flex-col">
<div className="h-screen bg-background flex flex-col overflow-hidden">
<TopBar />
{/* Content */}
<div className="flex-1 p-6 md:p-10 max-w-5xl mx-auto w-full">
<div className="flex-1 p-6 md:p-10 max-w-5xl mx-auto w-full overflow-y-auto">
<div className="flex items-center justify-between mb-8">
<div>
<h1 className="text-xl font-semibold text-foreground">My Agents</h1>
File diff suppressed because it is too large
+12 -4
@@ -12,9 +12,6 @@ dependencies = [
"mcp>=1.0.0",
"fastmcp>=2.0.0",
"textual>=1.0.0",
"pytest>=8.0",
"pytest-asyncio>=0.23",
"pytest-xdist>=3.0",
"tools",
]
@@ -22,6 +19,11 @@ dependencies = [
tui = ["textual>=0.75.0"]
webhook = ["aiohttp>=3.9.0"]
server = ["aiohttp>=3.9.0"]
testing = [
"pytest>=8.0",
"pytest-asyncio>=0.23",
"pytest-xdist>=3.0",
]
[project.scripts]
hive = "framework.cli:main"
@@ -63,4 +65,10 @@ lint.isort.section-order = [
]
[dependency-groups]
dev = ["ty>=0.0.13", "ruff>=0.14.14"]
dev = [
"ty>=0.0.13",
"ruff>=0.14.14",
"pytest>=8.0",
"pytest-asyncio>=0.23",
"pytest-xdist>=3.0",
]
+10 -3
@@ -53,7 +53,13 @@ def log_error(message: str):
def run_command(cmd: list, error_msg: str) -> bool:
"""Run a command and return success status."""
try:
subprocess.run(cmd, check=True, capture_output=True, text=True)
subprocess.run(
cmd,
check=True,
capture_output=True,
text=True,
encoding="utf-8",
)
return True
except subprocess.CalledProcessError as e:
log_error(error_msg)
@@ -97,7 +103,7 @@ def main():
if mcp_config_path.exists():
log_success("MCP configuration found at .mcp.json")
logger.info("Configuration:")
with open(mcp_config_path) as f:
with open(mcp_config_path, encoding="utf-8") as f:
config = json.load(f)
logger.info(json.dumps(config, indent=2))
else:
@@ -114,7 +120,7 @@ def main():
}
}
with open(mcp_config_path, "w") as f:
with open(mcp_config_path, "w", encoding="utf-8") as f:
json.dump(config, f, indent=2)
log_success("Created .mcp.json")
@@ -129,6 +135,7 @@ def main():
check=True,
capture_output=True,
text=True,
encoding="utf-8",
)
log_success("MCP server module verified")
except subprocess.CalledProcessError as e:
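The hunks above add `encoding="utf-8"` to `subprocess.run` and `open` calls. A minimal sketch of why that matters, assuming the child process emits UTF-8 bytes: without an explicit encoding, `text=True` decodes with the locale's preferred encoding, which is often not UTF-8 on Windows. The helper name `run_utf8` and the inline child script are hypothetical and exist only for illustration.

```python
import subprocess
import sys


def run_utf8(cmd: list[str]) -> str:
    """Run a command and decode its stdout as UTF-8, independent of the locale."""
    result = subprocess.run(
        cmd,
        check=True,
        capture_output=True,
        text=True,
        encoding="utf-8",  # explicit, so the locale default is never consulted
    )
    return result.stdout


# Hypothetical child script: writes raw UTF-8 bytes to stdout.
child_code = "import sys; sys.stdout.buffer.write('caf\\u00e9 \\u2713'.encode('utf-8'))"
output = run_utf8([sys.executable, "-c", child_code])
```

With the explicit encoding, `output` holds the same non-ASCII text on every platform, whatever the console code page is.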
+5
@@ -68,6 +68,7 @@ class TestFrameworkModule:
[sys.executable, "-m", "framework", "--help"],
capture_output=True,
text=True,
encoding="utf-8",
cwd=str(project_root / "core"),
)
assert result.returncode == 0
@@ -79,6 +80,7 @@ class TestFrameworkModule:
[sys.executable, "-m", "framework", "list", "--help"],
capture_output=True,
text=True,
encoding="utf-8",
cwd=str(project_root / "core"),
)
assert result.returncode == 0
@@ -104,6 +106,7 @@ class TestHiveEntryPoint:
["hive", "--help"],
capture_output=True,
text=True,
encoding="utf-8",
)
assert result.returncode == 0
assert "run" in result.stdout.lower()
@@ -115,6 +118,7 @@ class TestHiveEntryPoint:
["hive", "list", "--help"],
capture_output=True,
text=True,
encoding="utf-8",
)
assert result.returncode == 0
@@ -124,5 +128,6 @@ class TestHiveEntryPoint:
["hive", "run", "nonexistent_agent_xyz"],
capture_output=True,
text=True,
encoding="utf-8",
)
assert result.returncode != 0
+23
@@ -0,0 +1,23 @@
"""Tests for framework/config.py - Hive configuration loading."""
import logging
from framework.config import get_hive_config
class TestGetHiveConfig:
"""Test get_hive_config() logs warnings on parse errors."""
def test_logs_warning_on_malformed_json(self, tmp_path, monkeypatch, caplog):
"""Test that malformed JSON logs warning and returns empty dict."""
config_file = tmp_path / "configuration.json"
config_file.write_text('{"broken": }')
monkeypatch.setattr("framework.config.HIVE_CONFIG_FILE", config_file)
with caplog.at_level(logging.WARNING):
result = get_hive_config()
assert result == {}
assert "Failed to load Hive config" in caplog.text
assert str(config_file) in caplog.text
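The test above expects a malformed config file to produce a warning and an empty dict. The real `get_hive_config()` is not part of this diff, so the following is only a sketch of a loader with that behaviour; `load_config` and the logger setup are assumptions, not the framework's code.

```python
import json
import logging
import tempfile
from pathlib import Path

logger = logging.getLogger("config_sketch")


def load_config(path: Path) -> dict:
    """Return the parsed config, or {} if the file is missing or unreadable."""
    if not path.exists():
        return {}
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except (json.JSONDecodeError, OSError) as exc:
        # Log the problem instead of hiding it, then fall back to defaults.
        logger.warning("Failed to load Hive config %s: %s", path, exc)
        return {}


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good.json"
    good.write_text('{"a": 1}', encoding="utf-8")
    bad = Path(tmp) / "configuration.json"
    bad.write_text('{"broken": }', encoding="utf-8")

    good_result = load_config(good)
    bad_result = load_config(bad)
    missing_result = load_config(Path(tmp) / "absent.json")
```

The caller still gets a usable default, but the log now records which file failed to parse.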
+135 -4
@@ -111,7 +111,7 @@ def tool_call_scenario(
@pytest.fixture
def runtime():
rt = MagicMock(spec=Runtime)
rt.start_run = MagicMock(return_value="run_1")
rt.start_run = MagicMock(return_value="session_20250101_000000_eventlp01")
rt.decide = MagicMock(return_value="dec_1")
rt.record_outcome = MagicMock()
rt.end_run = MagicMock()
@@ -578,7 +578,11 @@ class TestClientFacingBlocking:
"""signal_shutdown should unblock a waiting client_facing node."""
llm = MockStreamingLLM(
scenarios=[
tool_call_scenario("ask_user", {"question": "Waiting..."}, tool_use_id="ask_1"),
tool_call_scenario(
"ask_user",
{"question": "Waiting...", "options": ["Continue", "Stop"]},
tool_use_id="ask_1",
),
]
)
bus = EventBus()
@@ -600,7 +604,11 @@ class TestClientFacingBlocking:
"""CLIENT_INPUT_REQUESTED should be published when ask_user blocks."""
llm = MockStreamingLLM(
scenarios=[
tool_call_scenario("ask_user", {"question": "Hello!"}, tool_use_id="ask_1"),
tool_call_scenario(
"ask_user",
{"question": "Hello!", "options": ["Yes", "No"]},
tool_use_id="ask_1",
),
]
)
bus = EventBus()
@@ -796,7 +804,7 @@ class TestClientFacingExpectingWork:
async def user_then_shutdown():
await asyncio.sleep(0.05)
await node.inject_event("furwise.app")
await node.inject_event("furwise.app", is_client_input=True)
# Node should auto-block on "Monitoring..." text.
# Give it time to reach the block, then shutdown.
await asyncio.sleep(0.1)
@@ -1893,6 +1901,71 @@ class TestToolDoomLoopIntegration:
result = await node.execute(ctx)
assert result.success is True
@pytest.mark.asyncio
async def test_doom_loop_detects_repeated_failing_tool(
self,
runtime,
node_spec,
memory,
):
"""A tool that keeps failing with is_error=True should trigger doom loop.
Regression test: previously, errored tool calls were excluded from
doom loop fingerprinting (``not tc.get("is_error")``), so a tool
failing with the same error every turn would never be detected.
"""
node_spec.output_keys = []
judge = AsyncMock(spec=JudgeProtocol)
eval_count = 0
async def judge_eval(*args, **kwargs):
nonlocal eval_count
eval_count += 1
if eval_count >= 5:
return JudgeVerdict(action="ACCEPT")
return JudgeVerdict(action="RETRY")
judge.evaluate = judge_eval
# 4 turns of the same failing tool call, then text
llm = ToolRepeatLLM("failing_tool", {}, tool_turns=4)
bus = EventBus()
doom_events: list = []
bus.subscribe(
event_types=[EventType.NODE_TOOL_DOOM_LOOP],
handler=lambda e: doom_events.append(e),
)
def tool_exec(tool_use: ToolUse) -> ToolResult:
return ToolResult(
tool_use_id=tool_use.id,
content="Error: accessibility tree unavailable",
is_error=True,
)
ctx = build_ctx(
runtime,
node_spec,
memory,
llm,
tools=[Tool(name="failing_tool", description="s", parameters={})],
)
node = EventLoopNode(
judge=judge,
tool_executor=tool_exec,
event_bus=bus,
config=LoopConfig(
max_iterations=10,
tool_doom_loop_threshold=3,
),
)
result = await node.execute(ctx)
assert result.success is True
# Doom loop MUST fire for repeatedly-failing tool calls
assert len(doom_events) >= 1
assert "failing_tool" in doom_events[0].data["description"]
# ===========================================================================
# execution_id plumbing
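The doom-loop regression test above relies on errored tool calls counting toward the fingerprint. The detector itself is not shown in this diff, so this is only a hypothetical sketch of the idea: fingerprint each turn by tool name and canonical arguments, ignore `is_error`, and flag a loop when the last `threshold` turns are identical. The names `fingerprint` and `is_doom_loop` and the dict shape of a tool call are assumptions.

```python
import json


def fingerprint(tool_calls: list[dict]) -> tuple:
    """Stable fingerprint of one turn: (name, canonical JSON args) per call."""
    return tuple(
        (tc["name"], json.dumps(tc.get("input", {}), sort_keys=True))
        for tc in tool_calls  # errored calls are deliberately NOT filtered out
    )


def is_doom_loop(turns: list[list[dict]], threshold: int = 3) -> bool:
    """True when the last `threshold` turns made the exact same tool calls."""
    if threshold < 1 or len(turns) < threshold:
        return False
    recent = [fingerprint(turn) for turn in turns[-threshold:]]
    return bool(recent[0]) and all(fp == recent[0] for fp in recent)


failing = [{"name": "failing_tool", "input": {}, "is_error": True}]
other = [{"name": "search", "input": {"q": "x"}, "is_error": False}]

loop_detected = is_doom_loop([failing, failing, failing], threshold=3)
too_few = is_doom_loop([failing, failing], threshold=3)
varied = is_doom_loop([failing, other, failing], threshold=3)
```

If `is_error` calls were filtered out before fingerprinting, every turn in `failing` would fingerprint as empty and the loop would go unnoticed, which is the situation the test's docstring describes.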
@@ -1962,3 +2035,61 @@ class TestExecutionId:
node_spec=node_spec, memory=SharedMemory(), goal=goal, input_data={}
)
assert ctx.execution_id == ""
# ---------------------------------------------------------------------------
# Subagent memory snapshot includes accumulator outputs
# ---------------------------------------------------------------------------
class TestSubagentAccumulatorMemory:
"""Verify that subagent memory construction merges accumulator outputs
and includes the subagent's input_keys in read permissions."""
def test_accumulator_values_merged_into_parent_data(self):
"""Keys from OutputAccumulator should appear in subagent memory."""
# Simulate what _execute_subagent does internally:
# parent shared memory has user_request but NOT tweet_content
parent_memory = SharedMemory()
parent_memory.write("user_request", "post a joke")
parent_data = parent_memory.read_all() # {"user_request": "post a joke"}
# Accumulator has tweet_content (set via set_output before delegation)
acc = OutputAccumulator(values={"tweet_content": "Hello world!"})
# Merge accumulator outputs (the fix)
for key, value in acc.to_dict().items():
if key not in parent_data:
parent_data[key] = value
# Build subagent memory
subagent_memory = SharedMemory()
for key, value in parent_data.items():
subagent_memory.write(key, value, validate=False)
subagent_input_keys = ["tweet_content"]
read_keys = set(parent_data.keys()) | set(subagent_input_keys)
scoped = subagent_memory.with_permissions(read_keys=list(read_keys), write_keys=[])
# This would have raised PermissionError before the fix
assert scoped.read("tweet_content") == "Hello world!"
assert scoped.read("user_request") == "post a joke"
def test_input_keys_allowed_even_if_not_in_data(self):
"""Subagent input_keys should be in read permissions even if the
key doesn't exist in memory (returns None instead of PermissionError)."""
parent_memory = SharedMemory()
parent_memory.write("user_request", "hi")
parent_data = parent_memory.read_all()
subagent_memory = SharedMemory()
for key, value in parent_data.items():
subagent_memory.write(key, value, validate=False)
# input_keys includes "tweet_content" which isn't in parent_data
read_keys = set(parent_data.keys()) | {"tweet_content"}
scoped = subagent_memory.with_permissions(read_keys=list(read_keys), write_keys=[])
# Should return None (not raise PermissionError)
assert scoped.read("tweet_content") is None
assert scoped.read("user_request") == "hi"
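The two tests above exercise the merge rule through `SharedMemory`. Reduced to plain dicts, the same rule can be sketched as follows, assuming parent values win over accumulator values and the subagent's `input_keys` are always readable; `build_subagent_view` is a hypothetical helper, not a framework function.

```python
def build_subagent_view(
    parent_data: dict,
    accumulator_outputs: dict,
    input_keys: list[str],
) -> tuple[dict, set[str]]:
    """Merge accumulator outputs into a copy of parent data and compute read keys."""
    merged = dict(parent_data)
    for key, value in accumulator_outputs.items():
        # Only fill gaps: a key already present in parent memory is kept as-is.
        merged.setdefault(key, value)
    # input_keys are readable even when no value exists for them yet.
    read_keys = set(merged) | set(input_keys)
    return merged, read_keys


merged, read_keys = build_subagent_view(
    parent_data={"user_request": "post a joke"},
    accumulator_outputs={"tweet_content": "Hello world!", "user_request": "ignored"},
    input_keys=["tweet_content", "image_url"],
)
```

Here `image_url` is permitted but absent, which matches the second test's expectation that reading such a key yields no value rather than a permission error.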
-19
@@ -248,22 +248,3 @@ async def test_event_loop_max_retries_positive_logs_warning(runtime, caplog):
# Custom nodes (not EventLoopNode instances) don't get override warning
assert "Overriding to 0" not in caplog.text
# --- Existing node types unaffected ---
def test_existing_node_types_unchanged():
"""Only event_loop is a valid node type."""
expected = {"event_loop"}
assert expected == GraphExecutor.VALID_NODE_TYPES
# Default node_type is event_loop
spec = NodeSpec(id="x", name="X", description="x")
assert spec.node_type == "event_loop"
# Default max_retries is still 3
assert spec.max_retries == 3
# Default client_facing is False
assert spec.client_facing is False
+8 -5
@@ -47,8 +47,11 @@ class DummyLLMProvider(LLMProvider):
) -> AsyncIterator[StreamEvent]:
self._call_count += 1
if self._call_count == 1:
# First call: set the output via tool call
# Each execution takes 2 LLM calls:
# - Odd calls (1, 3, 5, ...): set output via tool call
# - Even calls (2, 4, 6, ...): finish with text
if self._call_count % 2 == 1:
# First call of each execution: set the output via tool call
yield ToolCallEvent(
tool_use_id=f"tc_{self._call_count}",
tool_name="set_output",
@@ -56,7 +59,7 @@ class DummyLLMProvider(LLMProvider):
)
yield FinishEvent(stop_reason="tool_use", input_tokens=10, output_tokens=10)
else:
# Subsequent calls: just finish with text
# Second call of each execution: finish with text
yield TextDeltaEvent(content="Done.", snapshot="Done.")
yield FinishEvent(stop_reason="end_turn", input_tokens=5, output_tokens=5)
@@ -229,7 +232,7 @@ async def test_shared_session_reuses_directory_and_memory(tmp_path):
# Verify primary session's state.json exists and has the primary entry_point
primary_state_path = tmp_path / "sessions" / primary_exec_id / "state.json"
assert primary_state_path.exists()
primary_state = json.loads(primary_state_path.read_text())
primary_state = json.loads(primary_state_path.read_text(encoding="utf-8"))
assert primary_state["entry_point"] == "primary"
# Async stream — simulates a webhook entry point sharing the session
@@ -272,7 +275,7 @@ async def test_shared_session_reuses_directory_and_memory(tmp_path):
# State.json should NOT have been overwritten by the async execution
# (it should still show the primary entry point)
final_state = json.loads(primary_state_path.read_text())
final_state = json.loads(primary_state_path.read_text(encoding="utf-8"))
assert final_state["entry_point"] == "primary"
# Verify only ONE session directory exists (not two)
+599 -1
@@ -2,11 +2,12 @@
from __future__ import annotations
import json
from typing import Any
import pytest
from framework.graph.conversation import Message, NodeConversation
from framework.graph.conversation import Message, NodeConversation, extract_tool_call_history
from framework.storage.conversation_store import FileConversationStore
# ---------------------------------------------------------------------------
@@ -930,3 +931,600 @@ class TestConversationIntegration:
assert restored.next_seq == 4
assert restored.messages[0].content == "new msg"
assert restored.messages[0].seq == 2
# ---------------------------------------------------------------------------
# Helpers for aggressive compaction tests
# ---------------------------------------------------------------------------
def _make_tool_call(call_id: str, name: str, args: dict) -> dict:
return {
"id": call_id,
"type": "function",
"function": {"name": name, "arguments": json.dumps(args)},
}
async def _build_tool_heavy_conversation(
store: MockConversationStore | None = None,
) -> NodeConversation:
"""Build a conversation with many tool call pairs.
Layout: user msg, then 5x (assistant with append_data tool_call + tool result),
then 1x (assistant with set_output tool_call + tool result), then user msg + assistant msg.
"""
conv = NodeConversation(store=store)
await conv.add_user_message("Process the data") # seq 0
for i in range(5):
args = {"filename": "output.html", "content": "x" * 500}
tc = [_make_tool_call(f"call_{i}", "append_data", args)]
conv._messages.append(
Message(
seq=conv._next_seq,
role="assistant",
content=f"Appending part {i}",
tool_calls=tc,
)
)
if store:
await store.write_part(conv._next_seq, conv._messages[-1].to_storage_dict())
conv._next_seq += 1
conv._messages.append(
Message(
seq=conv._next_seq,
role="tool",
content='{"success": true}',
tool_use_id=f"call_{i}",
)
)
if store:
await store.write_part(conv._next_seq, conv._messages[-1].to_storage_dict())
conv._next_seq += 1
# set_output call — must be protected
so_tc = [_make_tool_call("call_so", "set_output", {"key": "result", "value": "done"})]
conv._messages.append(
Message(seq=conv._next_seq, role="assistant", content="Setting output", tool_calls=so_tc)
)
if store:
await store.write_part(conv._next_seq, conv._messages[-1].to_storage_dict())
conv._next_seq += 1
conv._messages.append(
Message(
seq=conv._next_seq,
role="tool",
content="Output 'result' set successfully.",
tool_use_id="call_so",
)
)
if store:
await store.write_part(conv._next_seq, conv._messages[-1].to_storage_dict())
conv._next_seq += 1
# Recent messages
await conv.add_user_message("Continue")
await conv.add_assistant_message("Working on it")
return conv
# ---------------------------------------------------------------------------
# Tests: aggressive structural compaction
# ---------------------------------------------------------------------------
class TestAggressiveStructuralCompaction:
@pytest.mark.asyncio
async def test_aggressive_collapses_tool_pairs(self, tmp_path):
"""Aggressive mode should collapse non-essential tool pairs into a summary."""
conv = await _build_tool_heavy_conversation()
spill = str(tmp_path)
await conv.compact_preserving_structure(
spillover_dir=spill,
keep_recent=2,
aggressive=True,
)
# The 5 append_data pairs (10 msgs) + 1 user msg should be collapsed.
# Remaining: ref_msg + set_output pair (2 msgs) + 2 recent = 5
assert conv.message_count == 5
assert conv.messages[0].role == "user" # ref message
assert "TOOLS ALREADY CALLED" in conv.messages[0].content
assert "append_data (5x)" in conv.messages[0].content
# set_output pair should be preserved
assert conv.messages[1].role == "assistant"
assert conv.messages[1].tool_calls is not None
assert conv.messages[1].tool_calls[0]["function"]["name"] == "set_output"
assert conv.messages[2].role == "tool"
# Recent messages intact
assert conv.messages[3].content == "Continue"
assert conv.messages[4].content == "Working on it"
@pytest.mark.asyncio
async def test_aggressive_preserves_set_output(self, tmp_path):
"""set_output tool calls are always protected in aggressive mode."""
conv = await _build_tool_heavy_conversation()
spill = str(tmp_path)
await conv.compact_preserving_structure(
spillover_dir=spill,
keep_recent=2,
aggressive=True,
)
# Find all tool calls in remaining messages
tool_names = []
for msg in conv.messages:
if msg.tool_calls:
for tc in msg.tool_calls:
tool_names.append(tc["function"]["name"])
assert "set_output" in tool_names
# append_data should NOT be in remaining messages (collapsed)
assert "append_data" not in tool_names
@pytest.mark.asyncio
async def test_aggressive_preserves_errors(self, tmp_path):
"""Error tool results are always protected in aggressive mode."""
conv = NodeConversation()
await conv.add_user_message("Start")
# Regular tool call
tc1 = [_make_tool_call("call_ok", "web_search", {"query": "test"})]
conv._messages.append(
Message(seq=conv._next_seq, role="assistant", content="", tool_calls=tc1)
)
conv._next_seq += 1
conv._messages.append(
Message(seq=conv._next_seq, role="tool", content="results", tool_use_id="call_ok")
)
conv._next_seq += 1
# Error tool call
tc2 = [_make_tool_call("call_err", "web_scrape", {"url": "http://broken.com"})]
conv._messages.append(
Message(seq=conv._next_seq, role="assistant", content="", tool_calls=tc2)
)
conv._next_seq += 1
conv._messages.append(
Message(
seq=conv._next_seq,
role="tool",
content="Connection timeout",
tool_use_id="call_err",
is_error=True,
)
)
conv._next_seq += 1
await conv.add_user_message("Next")
await conv.add_assistant_message("OK")
spill = str(tmp_path)
await conv.compact_preserving_structure(
spillover_dir=spill,
keep_recent=2,
aggressive=True,
)
# Error pair should be preserved
error_msgs = [m for m in conv.messages if m.role == "tool" and m.is_error]
assert len(error_msgs) == 1
assert error_msgs[0].content == "Connection timeout"
@pytest.mark.asyncio
async def test_standard_mode_keeps_all_tool_pairs(self, tmp_path):
"""Non-aggressive mode should keep all tool pairs (existing behavior)."""
conv = await _build_tool_heavy_conversation()
spill = str(tmp_path)
await conv.compact_preserving_structure(
spillover_dir=spill,
keep_recent=2,
aggressive=False,
)
# All 6 tool pairs (12 msgs) should be kept as structural.
# Removed: 1 user msg (freeform). Remaining: ref + 12 structural + 2 recent = 15
assert conv.message_count == 15
@pytest.mark.asyncio
async def test_two_pass_sequence(self, tmp_path):
"""Standard pass then aggressive pass produces valid result."""
conv = await _build_tool_heavy_conversation()
spill = str(tmp_path)
# Pass 1: standard
await conv.compact_preserving_structure(
spillover_dir=spill,
keep_recent=2,
)
after_standard = conv.message_count
assert after_standard == 15 # all structural kept
# Pass 2: aggressive
await conv.compact_preserving_structure(
spillover_dir=spill,
keep_recent=2,
aggressive=True,
)
after_aggressive = conv.message_count
assert after_aggressive < after_standard
# ref + set_output pair + 2 recent = 5
assert after_aggressive == 5
@pytest.mark.asyncio
async def test_aggressive_persists_correctly(self, tmp_path):
"""Aggressive compaction correctly updates the store."""
store = MockConversationStore()
conv = await _build_tool_heavy_conversation(store=store)
spill = str(tmp_path)
await conv.compact_preserving_structure(
spillover_dir=spill,
keep_recent=2,
aggressive=True,
)
# Verify store state matches in-memory state
parts = await store.read_parts()
assert len(parts) == conv.message_count
class TestExtractToolCallHistory:
def test_basic_extraction(self):
msgs = [
Message(
seq=0,
role="assistant",
content="",
tool_calls=[
_make_tool_call("c1", "web_search", {"query": "python async"}),
],
),
Message(seq=1, role="tool", content="results", tool_use_id="c1"),
Message(
seq=2,
role="assistant",
content="",
tool_calls=[
_make_tool_call(
"c2", "save_data", {"filename": "output.txt", "content": "data"}
),
],
),
Message(seq=3, role="tool", content="saved", tool_use_id="c2"),
]
result = extract_tool_call_history(msgs)
assert "web_search (1x)" in result
assert "save_data (1x)" in result
assert "FILES SAVED: output.txt" in result
def test_errors_included(self):
msgs = [
Message(
seq=0,
role="tool",
content="Connection refused",
is_error=True,
tool_use_id="c1",
),
]
result = extract_tool_call_history(msgs)
assert "ERRORS" in result
assert "Connection refused" in result
def test_empty_messages(self):
assert extract_tool_call_history([]) == ""
# ---------------------------------------------------------------------------
# Tests for _is_context_too_large_error
# ---------------------------------------------------------------------------
class TestIsContextTooLargeError:
def test_context_window_class_name(self):
from framework.graph.event_loop_node import _is_context_too_large_error
class ContextWindowExceededError(Exception):
pass
assert _is_context_too_large_error(ContextWindowExceededError("x"))
def test_openai_context_length(self):
from framework.graph.event_loop_node import _is_context_too_large_error
err = RuntimeError("This model's maximum context length is 128000 tokens")
assert _is_context_too_large_error(err)
def test_anthropic_too_long(self):
from framework.graph.event_loop_node import _is_context_too_large_error
err = RuntimeError("prompt is too long: 150000 tokens > 100000")
assert _is_context_too_large_error(err)
def test_generic_exceeds_limit(self):
from framework.graph.event_loop_node import _is_context_too_large_error
err = ValueError("Request exceeds token limit")
assert _is_context_too_large_error(err)
def test_unrelated_error(self):
from framework.graph.event_loop_node import _is_context_too_large_error
assert not _is_context_too_large_error(ValueError("connection refused"))
assert not _is_context_too_large_error(RuntimeError("timeout"))
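Review note: the tests above only pin down the observable behaviour of `_is_context_too_large_error`; its body is not part of this diff. A minimal sketch of a classifier that would satisfy these five cases is given below. The function name `looks_like_context_overflow` and the marker list are assumptions made for illustration, not the framework's actual implementation.

```python
def looks_like_context_overflow(exc: BaseException) -> bool:
    """Heuristically decide whether an exception means the prompt was too large.

    Hypothetical helper: the marker strings are taken from the error
    messages used in the tests, not from any provider's documentation.
    """
    # Some client libraries raise a dedicated exception class.
    if "contextwindow" in type(exc).__name__.lower():
        return True

    # Otherwise fall back to matching well-known phrases in the message.
    message = str(exc).lower()
    markers = (
        "maximum context length",
        "prompt is too long",
        "exceeds token limit",
    )
    return any(marker in message for marker in markers)
```

String matching of this kind is brittle by nature, so the negative cases (`connection refused`, `timeout`) matter as much as the positive ones.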
# ---------------------------------------------------------------------------
# Tests for _format_messages_for_summary
# ---------------------------------------------------------------------------
class TestFormatMessagesForSummary:
def test_user_assistant_messages(self):
from framework.graph.event_loop_node import EventLoopNode
msgs = [
Message(seq=0, role="user", content="Hello world"),
Message(seq=1, role="assistant", content="Hi there"),
]
result = EventLoopNode._format_messages_for_summary(msgs)
assert "[user]: Hello world" in result
assert "[assistant]: Hi there" in result
def test_tool_result_truncated(self):
from framework.graph.event_loop_node import EventLoopNode
msgs = [
Message(seq=0, role="tool", content="x" * 1000, tool_use_id="c1"),
]
result = EventLoopNode._format_messages_for_summary(msgs)
assert "[tool result]:" in result
assert "..." in result
# Should be truncated to 500 + "..."
assert len(result) < 600
def test_assistant_with_tool_calls(self):
from framework.graph.event_loop_node import EventLoopNode
tc = [_make_tool_call("c1", "web_search", {"query": "test"})]
msgs = [
Message(seq=0, role="assistant", content="Searching", tool_calls=tc),
]
result = EventLoopNode._format_messages_for_summary(msgs)
assert "web_search" in result
assert "[assistant (calls:" in result
# ---------------------------------------------------------------------------
# Tests for _llm_compact (recursive binary-search)
# ---------------------------------------------------------------------------
class TestLlmCompact:
"""Test the recursive LLM compaction with mock LLM."""
def _make_node(self):
"""Create a minimal EventLoopNode for testing."""
from framework.graph.event_loop_node import EventLoopNode, LoopConfig
config = LoopConfig(max_history_tokens=32000)
node = EventLoopNode.__new__(EventLoopNode)
node._config = config
node._event_bus = None
node._judge = None
node._approval_callback = None
node._tool_executor = None
node._adaptive_learner = None
# Set class-level constants (already on class, but explicit)
return node
def _make_ctx(self, llm_responses=None, llm_error=None):
"""Create a mock NodeContext with controllable LLM."""
from unittest.mock import AsyncMock, MagicMock
from framework.graph.node import NodeSpec
spec = NodeSpec(
id="test",
name="Test Node",
description="A test node",
node_type="event_loop",
input_keys=[],
output_keys=["result"],
)
ctx = MagicMock()
ctx.node_spec = spec
ctx.node_id = "test"
ctx.stream_id = "test"
ctx.continuous_mode = False
ctx.runtime_logger = None
mock_llm = AsyncMock()
if llm_error:
mock_llm.acomplete.side_effect = llm_error
elif llm_responses:
responses = []
for text in llm_responses:
resp = MagicMock()
resp.content = text
responses.append(resp)
mock_llm.acomplete.side_effect = responses
else:
resp = MagicMock()
resp.content = "Summary of conversation."
mock_llm.acomplete.return_value = resp
ctx.llm = mock_llm
return ctx
@pytest.mark.asyncio
async def test_single_call_success(self):
node = self._make_node()
ctx = self._make_ctx()
msgs = [
Message(seq=0, role="user", content="Do something"),
Message(seq=1, role="assistant", content="Done"),
]
result = await node._llm_compact(ctx, msgs, None)
assert "Summary of conversation." in result
ctx.llm.acomplete.assert_called_once()
@pytest.mark.asyncio
async def test_context_too_large_triggers_split(self):
"""When LLM raises context error, should split and retry."""
from unittest.mock import MagicMock
node = self._make_node()
call_count = 0
async def mock_acomplete(**kwargs):
nonlocal call_count
call_count += 1
# First call with full messages → fail
# Subsequent calls with smaller chunks → succeed
if call_count == 1:
raise RuntimeError("This model's maximum context length is 128000 tokens")
resp = MagicMock()
resp.content = f"Summary part {call_count}"
return resp
ctx = self._make_ctx()
ctx.llm.acomplete = mock_acomplete
msgs = [Message(seq=i, role="user", content=f"Message {i}") for i in range(10)]
result = await node._llm_compact(ctx, msgs, None)
# Should have split and produced two summaries
assert "Summary part" in result
assert call_count >= 3 # 1 failure + 2 successful halves
@pytest.mark.asyncio
async def test_non_context_error_propagates(self):
"""Non-context errors should propagate, not trigger splitting."""
node = self._make_node()
ctx = self._make_ctx(llm_error=ValueError("API key invalid"))
msgs = [
Message(seq=0, role="user", content="Hello"),
Message(seq=1, role="assistant", content="Hi"),
]
with pytest.raises(ValueError, match="API key invalid"):
await node._llm_compact(ctx, msgs, None)
@pytest.mark.asyncio
async def test_proactive_split_for_large_input(self):
"""Messages exceeding char limit should be split proactively."""
node = self._make_node()
# Lower the limit for testing
node._LLM_COMPACT_CHAR_LIMIT = 100
ctx = self._make_ctx(
llm_responses=["Part 1 summary", "Part 2 summary"],
)
msgs = [
Message(seq=0, role="user", content="x" * 80),
Message(seq=1, role="user", content="y" * 80),
]
result = await node._llm_compact(ctx, msgs, None)
assert "Part 1 summary" in result
assert "Part 2 summary" in result
# LLM should have been called twice (no failure, proactive split)
assert ctx.llm.acomplete.call_count == 2
@pytest.mark.asyncio
async def test_tool_history_appended_at_top_level(self):
"""Tool history should only be appended at depth 0."""
node = self._make_node()
ctx = self._make_ctx()
tc = [_make_tool_call("c1", "web_search", {"query": "test"})]
msgs = [
Message(seq=0, role="assistant", content="", tool_calls=tc),
Message(seq=1, role="tool", content="results", tool_use_id="c1"),
]
result = await node._llm_compact(ctx, msgs, None)
assert "TOOLS ALREADY CALLED" in result
assert "web_search" in result
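Review note: `TestLlmCompact` exercises two split paths, a proactive one (input above a character limit) and a reactive one (the LLM rejects the request as too large). The sketch below shows one way such a recursive halving strategy could look. All names here (`summarize_recursively`, `ContextTooLarge`, `char_limit`) are hypothetical, and the real `_llm_compact` may differ, for example in how it joins partial summaries or appends tool history.

```python
import asyncio
from collections.abc import Awaitable, Callable


class ContextTooLarge(Exception):
    """Hypothetical stand-in for a provider's context-overflow error."""


async def summarize_recursively(
    chunks: list[str],
    summarize: Callable[[str], Awaitable[str]],
    char_limit: int = 1000,
) -> str:
    """Summarize chunks, halving the input whenever it is too large."""
    text = "\n".join(chunks)
    can_split = len(chunks) > 1

    if can_split and len(text) > char_limit:
        # Proactive split: skip a call that is likely to overflow.
        return await _split(chunks, summarize, char_limit)

    try:
        return await summarize(text)
    except ContextTooLarge:
        if not can_split:
            raise  # A single chunk cannot be halved any further.
        # Reactive split: the model rejected the input, so retry on halves.
        return await _split(chunks, summarize, char_limit)


async def _split(
    chunks: list[str],
    summarize: Callable[[str], Awaitable[str]],
    char_limit: int,
) -> str:
    """Summarize each half independently and join the partial summaries."""
    mid = len(chunks) // 2
    left = await summarize_recursively(chunks[:mid], summarize, char_limit)
    right = await summarize_recursively(chunks[mid:], summarize, char_limit)
    return left + "\n" + right
```

Errors other than the overflow error are deliberately not caught, which mirrors the expectation in `test_non_context_error_propagates`.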
# ---------------------------------------------------------------------------
# Orphaned tool result repair
# ---------------------------------------------------------------------------
class TestRepairOrphanedToolCalls:
"""Test _repair_orphaned_tool_calls handles both directions."""
def test_orphaned_tool_result_dropped(self):
"""Tool result with no matching tool_use should be dropped."""
msgs = [
# tool result with no preceding assistant tool_use
{"role": "tool", "tool_call_id": "orphan_1", "content": "stale result"},
{"role": "user", "content": "hello"},
{"role": "assistant", "content": "hi"},
]
repaired = NodeConversation._repair_orphaned_tool_calls(msgs)
assert len(repaired) == 2
assert repaired[0]["role"] == "user"
assert repaired[1]["role"] == "assistant"
def test_valid_tool_pair_preserved(self):
"""Tool result with matching tool_use should be kept."""
msgs = [
{"role": "user", "content": "search"},
{
"role": "assistant",
"content": "",
"tool_calls": [{"id": "tc_1", "function": {"name": "search", "arguments": "{}"}}],
},
{"role": "tool", "tool_call_id": "tc_1", "content": "results"},
]
repaired = NodeConversation._repair_orphaned_tool_calls(msgs)
assert len(repaired) == 3
assert repaired[2]["tool_call_id"] == "tc_1"
def test_orphaned_tool_use_gets_stub(self):
"""Tool use with no following tool result gets a synthetic error stub."""
msgs = [
{"role": "user", "content": "search"},
{
"role": "assistant",
"content": "",
"tool_calls": [{"id": "tc_1", "function": {"name": "search", "arguments": "{}"}}],
},
# No tool result follows
{"role": "user", "content": "what happened?"},
]
repaired = NodeConversation._repair_orphaned_tool_calls(msgs)
# Should insert a synthetic tool result between assistant and user
assert len(repaired) == 4
assert repaired[2]["role"] == "tool"
assert repaired[2]["tool_call_id"] == "tc_1"
assert "interrupted" in repaired[2]["content"].lower()
def test_mixed_orphans(self):
"""Both orphaned results and orphaned calls handled together."""
msgs = [
# Orphaned result (no matching tool_use)
{"role": "tool", "tool_call_id": "gone_1", "content": "old result"},
{"role": "user", "content": "try again"},
{
"role": "assistant",
"content": "",
"tool_calls": [{"id": "tc_2", "function": {"name": "fetch", "arguments": "{}"}}],
},
# Missing result for tc_2
{"role": "user", "content": "done?"},
]
repaired = NodeConversation._repair_orphaned_tool_calls(msgs)
# orphaned result dropped, stub added for tc_2
roles = [m["role"] for m in repaired]
assert roles == ["user", "assistant", "tool", "user"]
assert repaired[2]["tool_call_id"] == "tc_2"
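Review note: the four cases in `TestRepairOrphanedToolCalls` describe a two-directional repair: tool results without a matching call are dropped, and calls without a result get a synthetic stub. The sketch below is one possible single-pass implementation under those assumptions. The function name `repair_tool_messages` and the stub wording are illustrative only and are not taken from `NodeConversation._repair_orphaned_tool_calls`.

```python
from typing import Any


def repair_tool_messages(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Drop tool results with no matching call; stub calls with no result."""
    repaired: list[dict[str, Any]] = []
    pending: list[str] = []  # call ids still waiting for a result

    def flush_pending() -> None:
        # Insert a synthetic result for every call that was never answered.
        for call_id in pending:
            repaired.append(
                {
                    "role": "tool",
                    "tool_call_id": call_id,
                    "content": "Tool call was interrupted before it returned a result.",
                }
            )
        pending.clear()

    for msg in messages:
        if msg.get("role") == "tool":
            call_id = msg.get("tool_call_id")
            if call_id in pending:
                pending.remove(call_id)
                repaired.append(msg)
            # Otherwise the result is orphaned and is dropped.
            continue

        flush_pending()  # unanswered calls get a stub before the next turn
        repaired.append(msg)
        pending.extend(tc["id"] for tc in msg.get("tool_calls") or [])

    flush_pending()
    return repaired
```

Applied to the `test_mixed_orphans` input, this yields the role sequence `user, assistant, tool, user`, with the stub carrying `tool_call_id == "tc_2"`.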
+2 -2
@@ -184,7 +184,7 @@ class TestPathTraversalWithActualFiles:
# Create a secret file outside storage
secret_file = tmpdir_path / "secret.txt"
secret_file.write_text("SENSITIVE_DATA")
secret_file.write_text("SENSITIVE_DATA", encoding="utf-8")
storage = FileStorage(storage_dir)
@@ -193,7 +193,7 @@ class TestPathTraversalWithActualFiles:
storage.get_runs_by_goal("../secret")
# Verify the secret file was not accessed (still contains original data)
assert secret_file.read_text() == "SENSITIVE_DATA"
assert secret_file.read_text(encoding="utf-8") == "SENSITIVE_DATA"
def test_cannot_write_outside_storage(self):
"""Verify that we can't write files outside storage directory."""
+82 -50
@@ -21,22 +21,51 @@ from framework.runtime.runtime_log_schemas import (
from framework.runtime.runtime_log_store import RuntimeLogStore
from framework.runtime.runtime_logger import RuntimeLogger
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
_SESSION_PREFIX = "session_20250101_000000"
def _sid(suffix: str) -> str:
"""Build a deterministic session ID for tests."""
return f"{_SESSION_PREFIX}_{suffix}"
# ---------------------------------------------------------------------------
# RuntimeLogStore tests
# ---------------------------------------------------------------------------
@pytest.fixture(autouse=True)
def _force_session_run_ids(monkeypatch):
"""Use unified session_* IDs in tests to avoid deprecated run path warnings."""
original_start_run = RuntimeLogger.start_run
counter = 0
def _patched_start_run(self, goal_id: str = "", session_id: str = "") -> str:
nonlocal counter
if not session_id:
counter += 1
session_id = _sid(f"{counter:08x}")
return original_start_run(self, goal_id=goal_id, session_id=session_id)
monkeypatch.setattr(RuntimeLogger, "start_run", _patched_start_run)
class TestRuntimeLogStore:
@pytest.mark.asyncio
async def test_ensure_run_dir_creates_directory(self, tmp_path: Path):
store = RuntimeLogStore(tmp_path / "logs")
store.ensure_run_dir("test_run_1")
assert (tmp_path / "logs" / "runs" / "test_run_1").is_dir()
store.ensure_run_dir(_sid("test0001"))
assert (tmp_path / "logs" / "sessions" / _sid("test0001") / "logs").is_dir()
@pytest.mark.asyncio
async def test_append_and_load_details(self, tmp_path: Path):
store = RuntimeLogStore(tmp_path / "logs")
store.ensure_run_dir("test_run_2")
store.ensure_run_dir(_sid("test0002"))
detail1 = NodeDetail(
node_id="node-1",
@@ -56,10 +85,10 @@ class TestRuntimeLogStore:
total_steps=1,
)
store.append_node_detail("test_run_2", detail1)
store.append_node_detail("test_run_2", detail2)
store.append_node_detail(_sid("test0002"), detail1)
store.append_node_detail(_sid("test0002"), detail2)
loaded = await store.load_details("test_run_2")
loaded = await store.load_details(_sid("test0002"))
assert loaded is not None
assert len(loaded.nodes) == 2
assert loaded.nodes[0].node_id == "node-1"
@@ -69,7 +98,7 @@ class TestRuntimeLogStore:
@pytest.mark.asyncio
async def test_append_and_load_tool_logs(self, tmp_path: Path):
store = RuntimeLogStore(tmp_path / "logs")
store.ensure_run_dir("test_run_3")
store.ensure_run_dir(_sid("test0003"))
step = NodeStepLog(
node_id="node-1",
@@ -91,9 +120,9 @@ class TestRuntimeLogStore:
verdict="CONTINUE",
)
store.append_step("test_run_3", step)
store.append_step(_sid("test0003"), step)
loaded = await store.load_tool_logs("test_run_3")
loaded = await store.load_tool_logs(_sid("test0003"))
assert loaded is not None
assert len(loaded.steps) == 1
assert loaded.steps[0].tool_calls[0].tool_name == "web_search"
@@ -104,7 +133,7 @@ class TestRuntimeLogStore:
async def test_save_and_load_summary(self, tmp_path: Path):
store = RuntimeLogStore(tmp_path / "logs")
summary = RunSummaryLog(
run_id="test_run_1",
run_id=_sid("test0001"),
agent_id="agent-a",
goal_id="goal-1",
status="success",
@@ -115,11 +144,11 @@ class TestRuntimeLogStore:
execution_quality="clean",
)
await store.save_summary("test_run_1", summary)
await store.save_summary(_sid("test0001"), summary)
loaded = await store.load_summary("test_run_1")
loaded = await store.load_summary(_sid("test0001"))
assert loaded is not None
assert loaded.run_id == "test_run_1"
assert loaded.run_id == _sid("test0001")
assert loaded.status == "success"
assert loaded.total_nodes_executed == 3
assert loaded.goal_id == "goal-1"
@@ -128,9 +157,9 @@ class TestRuntimeLogStore:
@pytest.mark.asyncio
async def test_load_missing_run_returns_none(self, tmp_path: Path):
store = RuntimeLogStore(tmp_path / "logs")
assert await store.load_summary("nonexistent") is None
assert await store.load_details("nonexistent") is None
assert await store.load_tool_logs("nonexistent") is None
assert await store.load_summary(_sid("missing00")) is None
assert await store.load_details(_sid("missing00")) is None
assert await store.load_tool_logs(_sid("missing00")) is None
@pytest.mark.asyncio
async def test_list_runs_empty(self, tmp_path: Path):
@@ -143,21 +172,21 @@ class TestRuntimeLogStore:
store = RuntimeLogStore(tmp_path / "logs")
# Save a success run
store.ensure_run_dir("run_ok")
store.ensure_run_dir(_sid("runok000"))
await store.save_summary(
"run_ok",
_sid("runok000"),
RunSummaryLog(
run_id="run_ok",
run_id=_sid("runok000"),
status="success",
started_at="2025-01-01T00:00:01",
),
)
# Save a failure run
store.ensure_run_dir("run_fail")
store.ensure_run_dir(_sid("runfail0"))
await store.save_summary(
"run_fail",
_sid("runfail0"),
RunSummaryLog(
run_id="run_fail",
run_id=_sid("runfail0"),
status="failure",
needs_attention=True,
started_at="2025-01-01T00:00:02",
@@ -171,19 +200,19 @@ class TestRuntimeLogStore:
# Filter by status
success_runs = await store.list_runs(status="success")
assert len(success_runs) == 1
assert success_runs[0].run_id == "run_ok"
assert success_runs[0].run_id == _sid("runok000")
# Filter by needs_attention
attention_runs = await store.list_runs(status="needs_attention")
assert len(attention_runs) == 1
assert attention_runs[0].run_id == "run_fail"
assert attention_runs[0].run_id == _sid("runfail0")
@pytest.mark.asyncio
async def test_list_runs_sorted_by_timestamp_desc(self, tmp_path: Path):
store = RuntimeLogStore(tmp_path / "logs")
for i in range(5):
run_id = f"run_{i}"
run_id = f"session_20250101_0000{i:02d}_run{i:04d}"
store.ensure_run_dir(run_id)
await store.save_summary(
run_id,
@@ -196,15 +225,15 @@ class TestRuntimeLogStore:
runs = await store.list_runs()
# Most recent first
assert runs[0].run_id == "run_4"
assert runs[-1].run_id == "run_0"
assert runs[0].run_id == "session_20250101_000004_run0004"
assert runs[-1].run_id == "session_20250101_000000_run0000"
@pytest.mark.asyncio
async def test_list_runs_limit(self, tmp_path: Path):
store = RuntimeLogStore(tmp_path / "logs")
for i in range(10):
run_id = f"run_{i}"
run_id = f"session_20250101_0000{i:02d}_run{i:04d}"
store.ensure_run_dir(run_id)
await store.save_summary(
run_id,
@@ -224,45 +253,45 @@ class TestRuntimeLogStore:
store = RuntimeLogStore(tmp_path / "logs")
# Completed run with summary
store.ensure_run_dir("run_done")
store.ensure_run_dir(_sid("rundone0"))
await store.save_summary(
"run_done",
_sid("rundone0"),
RunSummaryLog(
run_id="run_done",
run_id=_sid("rundone0"),
status="success",
started_at="2025-01-01T00:00:01",
),
)
# In-progress run: directory exists but no summary.json
store.ensure_run_dir("run_active")
store.ensure_run_dir(_sid("runactiv0"))
all_runs = await store.list_runs()
assert len(all_runs) == 2
run_ids = {r.run_id for r in all_runs}
assert "run_done" in run_ids
assert "run_active" in run_ids
assert _sid("rundone0") in run_ids
assert _sid("runactiv0") in run_ids
active = next(r for r in all_runs if r.run_id == "run_active")
active = next(r for r in all_runs if r.run_id == _sid("runactiv0"))
assert active.status == "in_progress"
@pytest.mark.asyncio
async def test_read_node_details_sync(self, tmp_path: Path):
store = RuntimeLogStore(tmp_path / "logs")
store.ensure_run_dir("test_run")
store.ensure_run_dir(_sid("testsync0"))
store.append_node_detail(
"test_run",
_sid("testsync0"),
NodeDetail(
node_id="n1", node_name="A", success=True, input_tokens=100, output_tokens=50
),
)
store.append_node_detail(
"test_run",
_sid("testsync0"),
NodeDetail(node_id="n2", node_name="B", success=False, error="oops"),
)
details = store.read_node_details_sync("test_run")
details = store.read_node_details_sync(_sid("testsync0"))
assert len(details) == 2
assert details[0].node_id == "n1"
assert details[1].error == "oops"
@@ -271,15 +300,15 @@ class TestRuntimeLogStore:
async def test_corrupt_jsonl_line_skipped(self, tmp_path: Path):
"""A corrupt JSONL line should be skipped without breaking reads."""
store = RuntimeLogStore(tmp_path / "logs")
store.ensure_run_dir("test_run")
store.ensure_run_dir(_sid("corrupt00"))
# Write a valid line, a corrupt line, then another valid line
jsonl_path = tmp_path / "logs" / "runs" / "test_run" / "details.jsonl"
jsonl_path = tmp_path / "logs" / "sessions" / _sid("corrupt00") / "logs" / "details.jsonl"
valid1 = json.dumps(NodeDetail(node_id="n1", node_name="A", success=True).model_dump())
valid2 = json.dumps(NodeDetail(node_id="n2", node_name="B", success=True).model_dump())
jsonl_path.write_text(f"{valid1}\n{{corrupt line\n{valid2}\n")
details = store.read_node_details_sync("test_run")
details = store.read_node_details_sync(_sid("corrupt00"))
assert len(details) == 2
assert details[0].node_id == "n1"
assert details[1].node_id == "n2"
@@ -297,14 +326,14 @@ class TestRuntimeLogger:
rl = RuntimeLogger(store=store, agent_id="test-agent")
run_id = rl.start_run("goal-1")
assert run_id
assert len(run_id) > 10 # timestamp + uuid
assert run_id.startswith("session_")
@pytest.mark.asyncio
async def test_start_run_creates_directory(self, tmp_path: Path):
store = RuntimeLogStore(tmp_path / "logs")
rl = RuntimeLogger(store=store, agent_id="test-agent")
run_id = rl.start_run("goal-1")
assert (tmp_path / "logs" / "runs" / run_id).is_dir()
assert (tmp_path / "logs" / "sessions" / run_id / "logs").is_dir()
@pytest.mark.asyncio
async def test_log_step_writes_to_disk_immediately(self, tmp_path: Path):
@@ -322,9 +351,11 @@ class TestRuntimeLogger:
)
# Verify the file exists and has one line
jsonl_path = tmp_path / "logs" / "runs" / run_id / "tool_logs.jsonl"
jsonl_path = tmp_path / "logs" / "sessions" / run_id / "logs" / "tool_logs.jsonl"
assert jsonl_path.exists()
lines = [line for line in jsonl_path.read_text().strip().split("\n") if line]
lines = [
line for line in jsonl_path.read_text(encoding="utf-8").strip().split("\n") if line
]
assert len(lines) == 1
data = json.loads(lines[0])
@@ -345,9 +376,10 @@ class TestRuntimeLogger:
exit_status="success",
)
jsonl_path = tmp_path / "logs" / "runs" / run_id / "details.jsonl"
jsonl_path = tmp_path / "logs" / "sessions" / run_id / "logs" / "details.jsonl"
assert jsonl_path.exists()
lines = [line for line in jsonl_path.read_text().strip().split("\n") if line]
content = jsonl_path.read_text(encoding="utf-8").strip()
lines = [line for line in content.split("\n") if line]
assert len(lines) == 1
data = json.loads(lines[0])
@@ -789,10 +821,10 @@ class TestRuntimeLogger:
# Make the store path unwritable to force an error
import os
bad_path = tmp_path / "logs" / "runs"
bad_path = tmp_path / "logs" / "sessions"
bad_path.mkdir(parents=True, exist_ok=True)
# Create a file where directory should be
run_dir = bad_path / rt_logger._run_id
run_dir = bad_path / rt_logger._run_id / "logs"
run_dir.mkdir(parents=True, exist_ok=True)
blocker = run_dir / "summary.json"
blocker.write_text("not json")
+1 -1
@@ -98,7 +98,7 @@ class TestFileStorageRunOperations:
assert run_file.exists()
# Verify it's valid JSON
with open(run_file) as f:
with open(run_file, encoding="utf-8") as f:
data = json.load(f)
assert data["id"] == "my_run"
File diff suppressed because it is too large
+693
@@ -0,0 +1,693 @@
"""End-to-end test for subagent escalation via report_to_parent(wait_for_response=True).
Tests the FULL routing chain:
ExecutionStream → GraphExecutor → EventLoopNode → _execute_subagent
_report_callback registers _EscalationReceiver in executor.node_registry
emit CLIENT_INPUT_REQUESTED with escalation_id
subscriber calls stream.inject_input(escalation_id, "done")
ExecutionStream finds _EscalationReceiver in executor.node_registry
receiver.inject_event("done") unblocks the subagent
subagent continues and completes
"""
from __future__ import annotations
import asyncio
from collections.abc import AsyncIterator
from typing import Any
import pytest
from framework.graph import Goal, NodeSpec, SuccessCriterion
from framework.graph.edge import GraphSpec
from framework.llm.provider import LLMProvider, LLMResponse, Tool
from framework.llm.stream_events import (
FinishEvent,
StreamEvent,
TextDeltaEvent,
ToolCallEvent,
)
from framework.runtime.event_bus import AgentEvent, EventBus, EventType
from framework.runtime.execution_stream import EntryPointSpec, ExecutionStream
from framework.runtime.outcome_aggregator import OutcomeAggregator
from framework.runtime.shared_state import SharedStateManager
from framework.storage.concurrent import ConcurrentStorage
# ---------------------------------------------------------------------------
# Sequenced mock LLM — returns different responses per call index
# ---------------------------------------------------------------------------
class SequencedLLM(LLMProvider):
"""Mock LLM that returns pre-programmed stream events per call.
Each call to stream() pops the next scenario from the queue.
Shared between parent and subagent (they use the same LLM instance).
"""
def __init__(self, scenarios: list[list[StreamEvent]]):
self._scenarios = list(scenarios)
self._call_index = 0
self.stream_calls: list[dict] = []
async def stream(
self,
messages: list[dict[str, Any]],
system: str = "",
tools: list[Tool] | None = None,
max_tokens: int = 4096,
) -> AsyncIterator[StreamEvent]:
self.stream_calls.append(
{
"index": self._call_index,
"system": system[:200],
"tool_names": [t.name for t in (tools or [])],
}
)
if self._call_index < len(self._scenarios):
events = self._scenarios[self._call_index]
else:
# Fallback: just finish
events = [
TextDeltaEvent(content="Done.", snapshot="Done."),
FinishEvent(stop_reason="end_turn", input_tokens=5, output_tokens=5),
]
self._call_index += 1
for event in events:
yield event
def complete(self, messages, system="", **kwargs) -> LLMResponse:
return LLMResponse(content="Summary.", model="mock", stop_reason="stop")
def complete_with_tools(self, messages, system, tools, tool_executor, **kwargs) -> LLMResponse:
return LLMResponse(content="", model="mock", stop_reason="stop")
# ---------------------------------------------------------------------------
# Test
# ---------------------------------------------------------------------------


@pytest.mark.asyncio
async def test_escalation_e2e_through_execution_stream(tmp_path):
    """Full e2e: subagent escalation routed through ExecutionStream.inject_input().

    Scenario:
    1. Parent node delegates to "researcher" subagent
    2. Researcher calls report_to_parent(wait_for_response=True, message="Login required")
    3. A subscriber on CLIENT_INPUT_REQUESTED gets the escalation_id
    4. Subscriber calls stream.inject_input(escalation_id, "done logging in")
    5. Subagent unblocks, sets output, completes
    6. Parent receives subagent result, sets its own output, completes
    """
    # -- Graph setup --
    goal = Goal(
        id="escalation-test",
        name="Escalation Test",
        description="Test subagent escalation flow",
        success_criteria=[
            SuccessCriterion(
                id="result",
                description="Result present",
                metric="output_contains",
                target="result",
            )
        ],
        constraints=[],
    )
    parent_node = NodeSpec(
        id="parent",
        name="Parent",
        description="Parent that delegates to researcher",
        node_type="event_loop",
        input_keys=["query"],
        output_keys=["result"],
        sub_agents=["researcher"],
        system_prompt="You delegate research tasks to the researcher sub-agent.",
    )
    researcher_node = NodeSpec(
        id="researcher",
        name="Researcher",
        description="Researches by browsing, may need user help for login",
        node_type="event_loop",
        input_keys=["task"],
        output_keys=["findings"],
        system_prompt="You research topics. If you hit a login wall, ask for help.",
    )
    graph = GraphSpec(
        id="escalation-graph",
        goal_id=goal.id,
        version="1.0.0",
        entry_node="parent",
        entry_points={"start": "parent"},
        terminal_nodes=["parent"],
        pause_nodes=[],
        nodes=[parent_node, researcher_node],
        edges=[],
        default_model="mock",
        max_tokens=10,
    )

    # -- LLM scenarios --
    # The LLM is shared between parent and subagent. Calls happen in order:
    #
    # Call 0 (parent turn 1): delegate to researcher
    # Call 1 (subagent turn 1): report_to_parent(wait_for_response=True)
    #     → blocks here until inject_input()
    # Call 2 (subagent turn 2): set_output("findings", "...")
    # Call 3 (subagent turn 3): text finish (implicit judge accepts after output filled)
    # Call 4 (parent turn 2): set_output("result", "...")
    # Call 5 (parent turn 3): text finish
    scenarios: list[list[StreamEvent]] = [
        # Call 0: Parent delegates
        [
            ToolCallEvent(
                tool_name="delegate_to_sub_agent",
                tool_input={"agent_id": "researcher", "task": "Check LinkedIn profiles"},
                tool_use_id="delegate_1",
            ),
            FinishEvent(stop_reason="tool_use", input_tokens=10, output_tokens=5, model="mock"),
        ],
        # Call 1: Subagent hits login wall, escalates
        [
            ToolCallEvent(
                tool_name="report_to_parent",
                tool_input={
                    "message": "Login required for LinkedIn. Please log in manually.",
                    "wait_for_response": True,
                },
                tool_use_id="report_1",
            ),
            FinishEvent(stop_reason="tool_use", input_tokens=10, output_tokens=5, model="mock"),
        ],
        # Call 2: Subagent continues after user login, sets output
        [
            ToolCallEvent(
                tool_name="set_output",
                tool_input={"key": "findings", "value": "Profile data extracted after login"},
                tool_use_id="set_1",
            ),
            FinishEvent(stop_reason="tool_use", input_tokens=10, output_tokens=5, model="mock"),
        ],
        # Call 3: Subagent finishes
        [
            TextDeltaEvent(content="Research complete.", snapshot="Research complete."),
            FinishEvent(stop_reason="end_turn", input_tokens=5, output_tokens=5, model="mock"),
        ],
        # Call 4: Parent uses subagent result
        [
            ToolCallEvent(
                tool_name="set_output",
                tool_input={"key": "result", "value": "LinkedIn profile data retrieved"},
                tool_use_id="set_2",
            ),
            FinishEvent(stop_reason="tool_use", input_tokens=10, output_tokens=5, model="mock"),
        ],
        # Call 5: Parent finishes
        [
            TextDeltaEvent(content="Task complete.", snapshot="Task complete."),
            FinishEvent(stop_reason="end_turn", input_tokens=5, output_tokens=5, model="mock"),
        ],
    ]
    llm = SequencedLLM(scenarios)

    # -- Event bus + subscriber that auto-responds to escalation --
    bus = EventBus()
    escalation_events: list[AgentEvent] = []
    all_events: list[AgentEvent] = []
    inject_called = asyncio.Event()
    # We need the stream reference for inject_input, so use a holder
    stream_holder: list[ExecutionStream] = []

    async def escalation_handler(event: AgentEvent):
        """Simulate a TUI/runner: when CLIENT_INPUT_REQUESTED arrives with
        an escalation node_id, inject the user's response via the stream."""
        all_events.append(event)
        if event.type == EventType.CLIENT_INPUT_REQUESTED:
            node_id = event.node_id
            if ":escalation:" in node_id:
                escalation_events.append(event)
                # Small delay to simulate user typing
                await asyncio.sleep(0.05)
                # Route through the REAL inject_input chain
                stream = stream_holder[0]
                success = await stream.inject_input(node_id, "done logging in")
                assert success, (
                    f"inject_input({node_id!r}) returned False: "
                    "escalation receiver not found in executor.node_registry"
                )
                inject_called.set()

    bus.subscribe(
        event_types=[EventType.CLIENT_INPUT_REQUESTED, EventType.CLIENT_OUTPUT_DELTA],
        handler=escalation_handler,
    )

    # -- Build and run ExecutionStream --
    storage = ConcurrentStorage(tmp_path)
    await storage.start()
    stream = ExecutionStream(
        stream_id="start",
        entry_spec=EntryPointSpec(
            id="start",
            name="Start",
            entry_node="parent",
            trigger_type="manual",
            isolation_level="shared",
        ),
        graph=graph,
        goal=goal,
        state_manager=SharedStateManager(),
        storage=storage,
        outcome_aggregator=OutcomeAggregator(goal, bus),
        event_bus=bus,
        llm=llm,
        tools=[],
        tool_executor=None,
    )
    stream_holder.append(stream)
    await stream.start()

    # Execute
    execution_id = await stream.execute({"query": "Find LinkedIn profiles"})
    result = await stream.wait_for_completion(execution_id, timeout=15)
    await stream.stop()
    await storage.stop()

    # -- Assertions --
    # 1. Execution completed successfully
    assert result is not None, "Execution should have completed"
    assert result.success, f"Execution should have succeeded, got: {result}"

    # 2. Escalation event was received and routed
    assert inject_called.is_set(), "inject_input should have been called for escalation"
    assert len(escalation_events) >= 1, "Should have received at least one escalation event"

    # 3. Escalation event has correct structure
    esc_event = escalation_events[0]
    assert ":escalation:" in esc_event.node_id
    assert esc_event.data["prompt"] == "Login required for LinkedIn. Please log in manually."

    # 4. CLIENT_OUTPUT_DELTA was emitted for the escalation message
    output_deltas = [
        e
        for e in all_events
        if e.type == EventType.CLIENT_OUTPUT_DELTA and "Login required" in e.data.get("content", "")
    ]
    assert len(output_deltas) >= 1, (
        "Should have emitted CLIENT_OUTPUT_DELTA with escalation message"
    )

    # 5. The parent node got the subagent's result
    assert "result" in result.output
    assert result.output["result"] == "LinkedIn profile data retrieved"

    # 6. The LLM was called at least the minimum number of times
    #    (the full scenario above scripts 6 calls; this is a lower bound)
    assert llm._call_index >= 4, (
        f"Expected at least 4 LLM calls (delegate + escalation + set_output + finish), "
        f"got {llm._call_index}"
    )

    # 7. The subagent reached its second turn and was offered report_to_parent.
    # Call index 2 is the subagent's second turn (after receiving "done logging in"),
    # so at least 3 stream calls must have been recorded.
    assert len(llm.stream_calls) >= 3
    # Call index 1 is the subagent's first turn; it should have report_to_parent
    # in its tools (verifying the subagent got the right tool set)
    subagent_tools = llm.stream_calls[1]["tool_names"]
    assert "report_to_parent" in subagent_tools, (
        f"Subagent should have report_to_parent tool, got: {subagent_tools}"
    )
@pytest.mark.asyncio
async def test_escalation_cleanup_after_completion(tmp_path):
    """Verify that _EscalationReceiver is cleaned up from the registry after use.

    After the escalation flow completes, no escalation receivers should remain
    in the executor's node_registry.
    """
    from framework.graph.event_loop_node import _EscalationReceiver

    goal = Goal(
        id="cleanup-test",
        name="Cleanup Test",
        description="Test escalation cleanup",
        success_criteria=[
            SuccessCriterion(
                id="result",
                description="Result present",
                metric="output_contains",
                target="result",
            )
        ],
        constraints=[],
    )
    parent_node = NodeSpec(
        id="parent",
        name="Parent",
        description="Delegates to researcher",
        node_type="event_loop",
        input_keys=["query"],
        output_keys=["result"],
        sub_agents=["researcher"],
    )
    researcher_node = NodeSpec(
        id="researcher",
        name="Researcher",
        description="Researches topics",
        node_type="event_loop",
        input_keys=["task"],
        output_keys=["findings"],
    )
    graph = GraphSpec(
        id="cleanup-graph",
        goal_id=goal.id,
        version="1.0.0",
        entry_node="parent",
        entry_points={"start": "parent"},
        terminal_nodes=["parent"],
        pause_nodes=[],
        nodes=[parent_node, researcher_node],
        edges=[],
        default_model="mock",
        max_tokens=10,
    )
    scenarios = [
        # Parent delegates
        [
            ToolCallEvent(
                tool_name="delegate_to_sub_agent",
                tool_input={"agent_id": "researcher", "task": "Check page"},
                tool_use_id="d1",
            ),
            FinishEvent(stop_reason="tool_use", input_tokens=10, output_tokens=5, model="mock"),
        ],
        # Subagent escalates
        [
            ToolCallEvent(
                tool_name="report_to_parent",
                tool_input={"message": "Need help", "wait_for_response": True},
                tool_use_id="r1",
            ),
            FinishEvent(stop_reason="tool_use", input_tokens=10, output_tokens=5, model="mock"),
        ],
        # Subagent sets output
        [
            ToolCallEvent(
                tool_name="set_output",
                tool_input={"key": "findings", "value": "Done"},
                tool_use_id="s1",
            ),
            FinishEvent(stop_reason="tool_use", input_tokens=10, output_tokens=5, model="mock"),
        ],
        # Subagent finish
        [
            TextDeltaEvent(content="Done.", snapshot="Done."),
            FinishEvent(stop_reason="end_turn", input_tokens=5, output_tokens=5, model="mock"),
        ],
        # Parent sets output
        [
            ToolCallEvent(
                tool_name="set_output",
                tool_input={"key": "result", "value": "Got it"},
                tool_use_id="s2",
            ),
            FinishEvent(stop_reason="tool_use", input_tokens=10, output_tokens=5, model="mock"),
        ],
        # Parent finish
        [
            TextDeltaEvent(content="Complete.", snapshot="Complete."),
            FinishEvent(stop_reason="end_turn", input_tokens=5, output_tokens=5, model="mock"),
        ],
    ]
    llm = SequencedLLM(scenarios)
    bus = EventBus()
    # Track node_registry contents via the executor
    registries_snapshot: list[dict] = []
    stream_holder: list[ExecutionStream] = []

    async def auto_respond(event: AgentEvent):
        if event.type == EventType.CLIENT_INPUT_REQUESTED and ":escalation:" in event.node_id:
            stream = stream_holder[0]
            # Snapshot the active executor's node_registry BEFORE responding
            for executor in stream._active_executors.values():
                escalation_keys = [k for k in executor.node_registry if ":escalation:" in k]
                registries_snapshot.append(
                    {
                        "phase": "before_inject",
                        "escalation_keys": escalation_keys,
                        "has_receiver": any(
                            isinstance(v, _EscalationReceiver)
                            for v in executor.node_registry.values()
                        ),
                    }
                )
            await asyncio.sleep(0.02)
            await stream.inject_input(event.node_id, "ok")

    bus.subscribe(
        event_types=[EventType.CLIENT_INPUT_REQUESTED],
        handler=auto_respond,
    )
    storage = ConcurrentStorage(tmp_path)
    await storage.start()
    stream = ExecutionStream(
        stream_id="start",
        entry_spec=EntryPointSpec(
            id="start",
            name="Start",
            entry_node="parent",
            trigger_type="manual",
            isolation_level="shared",
        ),
        graph=graph,
        goal=goal,
        state_manager=SharedStateManager(),
        storage=storage,
        outcome_aggregator=OutcomeAggregator(goal, bus),
        event_bus=bus,
        llm=llm,
        tools=[],
        tool_executor=None,
    )
    stream_holder.append(stream)
    await stream.start()
    execution_id = await stream.execute({"query": "test"})
    result = await stream.wait_for_completion(execution_id, timeout=15)
    await stream.stop()
    await storage.stop()

    assert result is not None and result.success
    # The receiver WAS in the registry during escalation
    assert len(registries_snapshot) >= 1
    assert registries_snapshot[0]["has_receiver"] is True
    assert len(registries_snapshot[0]["escalation_keys"]) == 1
    # After completion, no active executors remain (they're cleaned up),
    # so no stale receivers can linger. The `finally` block in the callback
    # guarantees cleanup even within a single execution.
    # Note: this post-completion state is not asserted here; the checks above
    # only confirm that the receiver was registered while the escalation was pending.
# ---------------------------------------------------------------------------
# Test: mark_complete e2e through ExecutionStream
# ---------------------------------------------------------------------------


@pytest.mark.asyncio
async def test_mark_complete_e2e_through_execution_stream(tmp_path):
    """Full e2e: subagent uses report_to_parent(mark_complete=True) to terminate.

    Scenario:
    1. Parent delegates to "researcher" subagent
    2. Researcher calls report_to_parent(mark_complete=True, message="Found profiles", data={...})
    3. Subagent terminates immediately (no set_output needed)
    4. Parent receives subagent result with reports, sets its own output, completes
    """
    goal = Goal(
        id="mark-complete-test",
        name="Mark Complete Test",
        description="Test mark_complete subagent flow",
        success_criteria=[
            SuccessCriterion(
                id="result",
                description="Result present",
                metric="output_contains",
                target="result",
            )
        ],
        constraints=[],
    )
    parent_node = NodeSpec(
        id="parent",
        name="Parent",
        description="Parent that delegates to researcher",
        node_type="event_loop",
        input_keys=["query"],
        output_keys=["result"],
        sub_agents=["researcher"],
        system_prompt="You delegate research tasks to the researcher sub-agent.",
    )
    researcher_node = NodeSpec(
        id="researcher",
        name="Researcher",
        description="Researches topics and reports findings",
        node_type="event_loop",
        input_keys=["task"],
        output_keys=["findings"],
        system_prompt="You research topics. Use report_to_parent with mark_complete when done.",
    )
    graph = GraphSpec(
        id="mark-complete-graph",
        goal_id=goal.id,
        version="1.0.0",
        entry_node="parent",
        entry_points={"start": "parent"},
        terminal_nodes=["parent"],
        pause_nodes=[],
        nodes=[parent_node, researcher_node],
        edges=[],
        default_model="mock",
        max_tokens=10,
    )

    # LLM call sequence:
    # Call 0 (parent turn 1): delegate to researcher
    # Call 1 (subagent turn 1): report_to_parent(mark_complete=True) → sets flag
    # Call 2 (subagent turn 2): text finish (inner loop exit) → _evaluate sees flag → ACCEPT
    # Call 3 (parent turn 2): set_output("result", "...")
    # Call 4 (parent turn 3): text finish
    scenarios: list[list[StreamEvent]] = [
        # Call 0: Parent delegates
        [
            ToolCallEvent(
                tool_name="delegate_to_sub_agent",
                tool_input={"agent_id": "researcher", "task": "Find LinkedIn profiles"},
                tool_use_id="delegate_1",
            ),
            FinishEvent(stop_reason="tool_use", input_tokens=10, output_tokens=5, model="mock"),
        ],
        # Call 1: Subagent reports with mark_complete=True
        [
            ToolCallEvent(
                tool_name="report_to_parent",
                tool_input={
                    "message": "Found 3 matching profiles",
                    "data": {"profiles": ["alice", "bob", "carol"]},
                    "mark_complete": True,
                },
                tool_use_id="report_1",
            ),
            FinishEvent(stop_reason="tool_use", input_tokens=10, output_tokens=5, model="mock"),
        ],
        # Call 2: Subagent text finish (inner loop needs this to exit)
        [
            TextDeltaEvent(content="Done.", snapshot="Done."),
            FinishEvent(stop_reason="end_turn", input_tokens=5, output_tokens=5, model="mock"),
        ],
        # Call 3: Parent uses subagent result to set output
        [
            ToolCallEvent(
                tool_name="set_output",
                tool_input={"key": "result", "value": "Found 3 profiles: alice, bob, carol"},
                tool_use_id="set_1",
            ),
            FinishEvent(stop_reason="tool_use", input_tokens=10, output_tokens=5, model="mock"),
        ],
        # Call 4: Parent finishes
        [
            TextDeltaEvent(content="Task complete.", snapshot="Task complete."),
            FinishEvent(stop_reason="end_turn", input_tokens=5, output_tokens=5, model="mock"),
        ],
    ]
    llm = SequencedLLM(scenarios)
    bus = EventBus()
    # Track subagent report events
    report_events: list[AgentEvent] = []

    async def report_handler(event: AgentEvent):
        if event.type == EventType.SUBAGENT_REPORT:
            report_events.append(event)

    bus.subscribe(event_types=[EventType.SUBAGENT_REPORT], handler=report_handler)
    storage = ConcurrentStorage(tmp_path)
    await storage.start()
    stream = ExecutionStream(
        stream_id="start",
        entry_spec=EntryPointSpec(
            id="start",
            name="Start",
            entry_node="parent",
            trigger_type="manual",
            isolation_level="shared",
        ),
        graph=graph,
        goal=goal,
        state_manager=SharedStateManager(),
        storage=storage,
        outcome_aggregator=OutcomeAggregator(goal, bus),
        event_bus=bus,
        llm=llm,
        tools=[],
        tool_executor=None,
    )
    await stream.start()
    execution_id = await stream.execute({"query": "Find LinkedIn profiles"})
    result = await stream.wait_for_completion(execution_id, timeout=15)
    await stream.stop()
    await storage.stop()

    # -- Assertions --
    # 1. Execution completed successfully
    assert result is not None, "Execution should have completed"
    assert result.success, f"Execution should have succeeded, got: {result}"

    # 2. Parent got the final output
    assert "result" in result.output
    assert "3 profiles" in result.output["result"]

    # 3. Subagent report was emitted via event bus
    # (The subagent's EventLoopNode has event_bus=None, but _execute_subagent
    # wires its own callback that emits via the parent's bus)
    assert len(report_events) >= 1, "Should have received subagent report event"
    assert report_events[0].data["message"] == "Found 3 matching profiles"

    # 4. The subagent did NOT need to call set_output; it used mark_complete
    # Verify by checking LLM call count: subagent only needed 2 calls
    # (report_to_parent + text finish), not 3+ (report + set_output + text finish)
    assert llm._call_index == 5, (
        f"Expected 5 LLM calls total (delegate + report + finish + set_output + finish), "
        f"got {llm._call_index}"
    )

Some files were not shown because too many files have changed in this diff