Compare commits

...

120 Commits

Author SHA1 Message Date
Timothy 7d75c6a09f fix: queen lifecycle 2026-04-03 19:22:45 -07:00
Timothy 1fabd8e8fb fix: queen phase incubating -> editing 2026-04-03 18:00:21 -07:00
Richard Tang 043c79e0e4 feat: add detailed LLM response logging to reflection loop 2026-04-03 17:43:41 -07:00
Timothy 9193336fd3 fix: browser quickstart 2026-04-03 17:40:53 -07:00
Richard Tang 59e90d3168 fix: track reflection reason 2026-04-03 17:20:13 -07:00
Timothy ef34b1190a Merge branch 'feature/browser-extension-quickstart' into feature/hive-v1 2026-04-03 17:19:07 -07:00
Timothy 1e848d67bb feat: browser extension setup guide 2026-04-03 17:18:53 -07:00
Richard Tang a0e68871f7 feat: strengthen worker termination 2026-04-03 17:06:19 -07:00
Richard Tang 767beb4005 Merge branch 'feature/colonized-memory' into feature/hive-v1 2026-04-03 16:10:45 -07:00
Richard Tang 95655a4c85 feat: better reflection tracking 2026-04-03 16:09:46 -07:00
Timothy 102866780c fix: browser tools 2026-04-03 15:47:54 -07:00
Richard Tang a379ae97c8 feat: auto-escalate worker text-only turns to queen after grace period 2026-04-03 15:46:23 -07:00
Timothy d5ae7e6c4b fix: turn ms 2026-04-03 15:26:44 -07:00
Timothy 68f6b72564 Merge branch 'refactor/automated-testing' into feature/hive-v1 2026-04-03 15:09:13 -07:00
Timothy eecfb4f407 Merge branch 'feature/colonized-memory' into refactor/automated-testing 2026-04-03 14:43:41 -07:00
Timothy 32f556cd6e feat: incubating phase 2026-04-03 14:43:05 -07:00
Richard Tang 8ea026508d fix: scope conversation restore to current run_id 2026-04-03 13:42:13 -07:00
Richard Tang 771efd5ce4 feat: simplify worker reflection 2026-04-03 13:03:47 -07:00
Timothy 8f56b8b068 feat: verified testing 2026-04-03 13:00:49 -07:00
Richard Tang 4f588b3010 fix: remove outdated memory cursor design 2026-04-03 12:38:05 -07:00
Richard Tang 9f70868f98 feat: include v1 memory in migration and keep the diary writing in v2 2026-04-03 11:38:30 -07:00
Richard Tang 6449c76091 refactor: remove old worker digest 2026-04-03 11:20:22 -07:00
Richard Tang b328ced110 fix: remove bounded polling loop that killed forever-alive graphs after ~100s 2026-04-03 10:16:34 -07:00
Richard Tang 1b6e8c34be fix: queen revive drops user input and missing skill protocols 2026-04-03 10:01:13 -07:00
Timothy 674454cc5b Merge branch 'feature/colonized-memory' into refactor/automated-testing 2026-04-03 09:58:03 -07:00
Timothy 59c3979451 feat: auto tests 2026-04-03 09:57:40 -07:00
Richard Tang 51fdd93f0c fix: queen session and node registry 2026-04-03 09:11:33 -07:00
Timothy 95f1d1abcd feat: browser automated test 2026-04-03 07:31:10 -07:00
Richard Tang a164ed6faf strengthen the logging 2026-04-02 20:03:53 -07:00
Richard Tang abe3d2d067 feat: add debugger information 2026-04-02 17:54:42 -07:00
Richard Tang c80d86bdbe fix: missing restored pending input 2026-04-02 17:12:09 -07:00
Richard Tang ec08ae7438 feat: worker agent memory 2026-04-02 17:05:32 -07:00
Timothy e0cd16b92b fix: trailing white spaces 2026-04-02 16:43:23 -07:00
Richard Tang 4006ee96b6 feat: add dummy agent smoke test 2026-04-02 16:29:00 -07:00
Richard Tang b78c879404 feat: context rewiring 2026-04-02 16:01:06 -07:00
Timothy 71a71beca7 feat: extension browser tools 2026-04-02 15:58:52 -07:00
Richard Tang c5052ade34 feat: consolidate context building 2026-04-02 15:54:16 -07:00
Richard Tang e1911b3684 refactor: deprecated client facing node 2026-04-02 15:09:26 -07:00
Richard Tang 96c7070cc9 fix: restore dummy agent smoke tests 2026-04-02 13:29:57 -07:00
Richard Tang 6affe06f6d feat: add kimi support for dummy agent test 2026-04-02 13:12:09 -07:00
Richard Tang 02edd44283 feat: First-Class Worker Agents with Event-Driven Dependency Execution 2026-04-02 13:00:52 -07:00
Richard Tang 60d094464a feat: robust run id 2026-04-02 12:35:16 -07:00
Timothy c7e85aa9f5 fix: redo gcu tools for extension based browser use 2026-04-02 12:07:24 -07:00
Richard Tang 00c55d5fb2 refactor: remove unused edge code 2026-04-02 12:02:51 -07:00
Richard Tang 6a7778ebcd refactor: remove orphaned client code 2026-04-02 12:00:59 -07:00
Timothy 8f042b7ca5 feat: browser extension 2026-04-02 11:59:57 -07:00
Richard Tang b594165575 feat: fresh worker context per run 2026-04-02 11:43:14 -07:00
Timothy 1630c1ee7a feat: add tab and CDP methods to browser bridge
Added methods to control tabs via the Chrome extension:
- create_tab(groupId, url) - create and navigate tabs in user's Chrome
- close_tab(tabId) - close tabs
- list_tabs(groupId?) - list tabs
- cdp_attach(tabId) - attach CDP for automation
- cdp_send(tabId, method, params) - send CDP commands

These enable browser automation through the extension when Playwright
can't connect directly to the user's Chrome.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 11:09:07 -07:00
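A usage sketch of the five methods this commit lists, assuming an async bridge client exposing them with the stated signatures (response shapes such as `tabId` are assumptions):

```python
async def demo_tab_control(bridge, group_id):
    # Create and navigate a tab in the user's Chrome via the extension.
    tab = await bridge.create_tab(group_id, "https://example.com")
    tab_id = tab["tabId"]  # assumed response shape
    # Attach CDP and drive the page with raw protocol commands.
    await bridge.cdp_attach(tab_id)
    await bridge.cdp_send(tab_id, "Page.reload", {})
    print(await bridge.list_tabs(group_id))
    await bridge.close_tab(tab_id)
```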
Richard Tang 318ecfd508 refactor: refactor shared memory to data buffer 2026-04-02 11:02:30 -07:00
Timothy 08b0cbc208 fix: inherit storage state from user's Chrome when connected via CDP
When Playwright connects to the user's Chrome via CDP (bridge connected),
we now copy cookies/storage from an existing browser context into the
new agent context. This preserves login sessions (LinkedIn, etc.).

Before: New context created fresh → no cookies → login wall
After:  New context inherits storage state → cookies preserved → logged in

Requires Chrome to be started with --remote-debugging-port=9222 or
HIVE_BROWSER_CDP_URL to be set for this to work.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 10:45:53 -07:00
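A minimal Playwright sketch of the copy step this commit describes, assuming Chrome is already listening on the debugging port (function and variable names are illustrative):

```python
from playwright.async_api import async_playwright

async def context_with_user_state(cdp_url: str = "http://localhost:9222"):
    pw = await async_playwright().start()
    browser = await pw.chromium.connect_over_cdp(cdp_url)
    # Export cookies/localStorage from the user's existing context...
    state = await browser.contexts[0].storage_state()
    # ...and seed the fresh agent context with it, preserving logins.
    return await browser.new_context(storage_state=state)
```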
Timothy 4bcbebf761 fix: subagent browser sessions persist across calls
Two fixes for browser session persistence:

1. Use stable profile name (agent_id only, not agent_id-subagent_instance)
   - Before: "honeycomb_linkedin_outreach-gcu-scan-profiles-1" (unique each call)
   - After: "gcu-scan-profiles" (stable across calls)

2. Remove browser_stop() call in finally block
   - Keeping browser alive allows cookies/auth to persist
   - Browser cleaned up when parent agent stops or explicitly requested

This fixes the issue where LinkedIn auth was lost between subagent runs
because each run created a fresh browser profile with no cookies.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 10:44:21 -07:00
Timothy 00a3f94315 fix: browser tools use subagent profile from context
Changed all browser tool `profile` parameters from defaulting to "default"
to defaulting to None. This allows `get_session()` to use the context
variable set by `set_active_profile()` in the subagent executor.

Before: Subagent calls browser_navigate() → profile="default" → tab group named "default"
After:  Subagent calls browser_navigate() → profile=None → get_session() uses contextvar → tab group named "{agent_id}-{subagent_id}"

Fixes tab groups being named "default" instead of the subagent's name.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 10:36:30 -07:00
Richard Tang 5b08edb384 Remove unused subagent escalation receiver 2026-04-02 10:32:33 -07:00
Richard Tang 332311b49b refactor: graph executor cleanup 2026-04-02 10:20:46 -07:00
Richard Tang 0e5f571b09 refactor: removed unused ticket escalate and unused runner code 2026-04-01 20:10:41 -07:00
Richard Tang bf06984625 refactor: remove deprecated cli command and orchestrator 2026-04-01 19:58:54 -07:00
Richard Tang d4875892fc chore: remove temp docs 2026-04-01 19:45:59 -07:00
Richard Tang 86f6aa2e8f refactor: remove deprecated functions in runner and runtime 2026-04-01 19:43:29 -07:00
Richard Tang 537667758a refactor: remove worker input and worker session 2026-04-01 19:16:38 -07:00
Timothy 76fe644cac chore: fix lint 2026-04-01 19:06:40 -07:00
Timothy c6c333761b fix: lint errors in compaction module
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 19:06:05 -07:00
Richard Tang 6a77a9a7b2 refactor: rename worker functions for clarity 2026-04-01 19:04:44 -07:00
Richard Tang 1a37fb2f36 tests: add tests to memory functions 2026-04-01 17:41:25 -07:00
Richard Tang 2609ca7619 fix: memory call bugs 2026-04-01 17:38:53 -07:00
Richard Tang b7a115259d fix: outdated tests 2026-04-01 17:33:17 -07:00
Richard Tang 1765e1cb6c feat: debugger and simplification 2026-04-01 17:28:54 -07:00
Timothy 0417e33ab2 fix: batch modify gmail tool 2026-04-01 17:23:57 -07:00
Timothy 42a9c7b0f1 fix: structural compaction guardrail 2026-04-01 17:03:13 -07:00
Timothy 398e86d787 fix: memory recall type bug 2026-04-01 16:43:18 -07:00
Timothy 51406e358e Merge branch 'feature/new-compaction' into feature/hive-v1 2026-04-01 16:06:56 -07:00
Timothy 137162eada feat: improve micro compaction 2026-04-01 16:06:35 -07:00
Timothy 2b7a38f746 feat: tagged queen memory 2026-04-01 15:13:49 -07:00
Richard Tang b25de61363 feat(wip): new queen memory 2026-04-01 15:03:21 -07:00
Richard Tang b3adbe745f chore: ruff lint 2026-04-01 11:05:53 -07:00
Richard Tang f3fefe0cbc refactor: remove adapt md and its reference 2026-04-01 11:00:41 -07:00
Bryan @ Aden c8a25a0287 Merge pull request #6658 from saurabhiiitm062/feat/cloudflare-dns-tool
feat: cloudflare DNS/Zone tool integrations
2026-04-01 10:11:44 -07:00
Hundao 5823513fde fix: propagate contextvars to tool executor threads (#6854)
* fix: propagate contextvars to tool executor threads

run_in_executor does not propagate contextvars to worker threads,
causing execution context params like data_dir to be lost when MCP
tools are called. This made save_data, serve_file_to_user, and other
tools that depend on auto-injected data_dir fail with "Missing
required argument: data_dir".

Fix: use contextvars.copy_context().run() to carry the current context
into the thread pool worker.

* test: regression test for contextvars propagation in tool executor

Verifies that execution context (data_dir, etc.) set via
set_execution_context is visible inside tool executors that run
in thread pool workers via run_in_executor.
2026-04-01 19:38:41 +08:00
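The pattern this commit describes, as a self-contained sketch (the tool function and executor wiring are illustrative):

```python
import asyncio
import contextvars

data_dir = contextvars.ContextVar("data_dir")

def tool_fn() -> str:
    # Runs in a thread pool worker; reads the execution context.
    return data_dir.get()

async def call_tool():
    data_dir.set("/tmp/run-123")
    loop = asyncio.get_running_loop()
    # Plain run_in_executor would lose the ContextVar in the worker thread;
    # copy_context().run() carries the current context across.
    ctx = contextvars.copy_context()
    return await loop.run_in_executor(None, ctx.run, tool_fn)
```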
Hundao 97ce8dfc54 fix: skip executable permission check on Windows in skill validator (#6894)
Windows has no POSIX executable bits, so the stat-based check always
fails. Skip the check on Windows in the validator, and mark the two
related tests as POSIX-only. Unix CI still catches non-executable
scripts from Windows contributors.

Fixes #6893
2026-04-01 19:37:18 +08:00
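A sketch of the guard this commit describes (the function name and exact permission mask are illustrative):

```python
import os
import stat
import sys

def script_is_executable(path: str) -> bool:
    # Windows has no POSIX executable bits, so a stat-based check is meaningless there.
    if sys.platform == "win32":
        return True
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH))
```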
Gaurav Rai 5e628c7606 test(core): increase tool_registry coverage from 47% to 69% (#6818)
* test(core): increase tool_registry coverage from 47% to 69%

Add 19 new tests covering previously untested paths in ToolRegistry:

- register_function: type hint inference (int/float/bool/dict/list),
  required vs optional params, custom name/description, docstring fallback,
  executor delegation
- discover_from_module: @tool decorator pickup, missing-file zero-count,
  TOOLS dict without tool_executor uses mock executor
- has_tool / get_registered_names basic assertions
- Session context injection into MCP tool calls via set_session_context()
- Execution context override (contextvars) wins over session context
- _convert_mcp_tool_to_framework_tool: strips CONTEXT_PARAMS from both
  properties and required lists
- load_mcp_config: list format, dict format, graceful invalid-JSON warning
- resync_mcp_servers_if_needed: returns False with no clients; returns
  False when credentials and ADEN_API_KEY are unchanged

Coverage: 47 % → 69 % (+22 pp), all 31 tests pass, ruff clean.

Relates to #1972

* style: fix formatting in test_tool_registry.py

* fix: accept **kwargs in fake_load_registry to match updated signature

---------

Co-authored-by: hundao <alchemy_wimp@hotmail.com>
2026-04-01 19:10:04 +08:00
Harsh Gajjar 5b931982e3 feat(tools): add Freshdesk helpdesk integration (#6099)
* feat(freshdesk): add Freshdesk tool integration with credentials and API functionality

- Introduced Freshdesk tool for managing tickets, contacts, agents, and groups via Freshdesk API v2.
- Added Freshdesk credentials handling in `credentials/freshdesk.py`.
- Registered Freshdesk tools in `tools/freshdesk_tool/__init__.py` and `tools/freshdesk_tool/freshdesk_tool.py`.
- Updated `__init__.py` files to include Freshdesk in the exports.
- Created comprehensive README for Freshdesk tool usage and setup.
- Implemented unit tests for Freshdesk tool functionality.

All tests pass, and code adheres to ruff linting and formatting standards.

* refactor(freshdesk_tool): simplify _get_domain logic

- remove unnecessary try/except around credentials.get("freshdesk_domain")
- directly return stripped credential value if present
- fallback to FRESHDESK_DOMAIN env variable when missing
- eliminate unreachable code while preserving behavior

* refactor(freshdesk_tool): replace dynamic httpx dispatch in _request

- replace getattr(httpx, method) with explicit handling for get, post, and put
- raise ValueError for unsupported HTTP methods
- preserve existing status handling and response parsing logic

* docs(freshdesk): improve credential and error handling documentation

- add docstrings for error handling helpers in freshdesk_tool
- document purpose and usage of freshdesk credential specs
- improve clarity around error response structure and handling
2026-04-01 18:28:32 +08:00
Bhuvaneswari N 8174f330ae docs: document 5 new natively supported LLM providers (#6865) 2026-04-01 18:18:15 +08:00
Rohit Singh 9774e53720 feat(runtime): add idempotency key support to trigger() (#6710) 2026-04-01 18:03:31 +08:00
Timothy @aden cf3296984c Merge pull request #6888 from aden-hive/fix/python-test
fix(micro-fix): python test
2026-03-31 19:02:23 -07:00
Timothy eafbeb78b4 fix: python test 2026-03-31 18:55:24 -07:00
Timothy 5cb5083f8d fix(micro-fix): queen skill allowlist 2026-03-31 18:52:45 -07:00
saurabhiiitm062 ebb6605a86 fix: address Cloudflare review comments (DDoS, pagination, validation, tests) 2026-03-31 22:23:51 +05:30
saurabhiiitm062 e9c1731c0f fix: address review comments + all tests passing 2026-03-28 00:58:28 +05:30
SAURABH KUMAR 0e2333daaf Apply suggestion from @levxn
Co-authored-by: Levin <105410870+levxn@users.noreply.github.com>
2026-03-27 22:47:29 +05:30
SAURABH KUMAR 5167c29aed Apply suggestion from @levxn
Co-authored-by: Levin <105410870+levxn@users.noreply.github.com>
2026-03-27 22:47:11 +05:30
SAURABH KUMAR 4da4d3b2c0 Apply suggestion from @levxn
Co-authored-by: Levin <105410870+levxn@users.noreply.github.com>
2026-03-27 22:46:51 +05:30
SAURABH KUMAR 3e622af484 Apply suggestion from @levxn
Co-authored-by: Levin <105410870+levxn@users.noreply.github.com>
2026-03-27 22:46:35 +05:30
SAURABH KUMAR 6600ce0ef9 Apply suggestion from @levxn
Co-authored-by: Levin <105410870+levxn@users.noreply.github.com>
2026-03-27 22:46:21 +05:30
SAURABH KUMAR 74d5dd03dd Apply suggestion from @levxn
Co-authored-by: Levin <105410870+levxn@users.noreply.github.com>
2026-03-27 22:46:05 +05:30
SAURABH KUMAR d18091bb2c Apply suggestion from @levxn
Co-authored-by: Levin <105410870+levxn@users.noreply.github.com>
2026-03-27 22:45:50 +05:30
SAURABH KUMAR d1a1f36d6e Apply suggestion from @levxn
Co-authored-by: Levin <105410870+levxn@users.noreply.github.com>
2026-03-27 22:45:35 +05:30
SAURABH KUMAR 051b0fcef2 Apply suggestion from @levxn
Co-authored-by: Levin <105410870+levxn@users.noreply.github.com>
2026-03-27 22:45:22 +05:30
SAURABH KUMAR e270d3210d Apply suggestion from @levxn
Co-authored-by: Levin <105410870+levxn@users.noreply.github.com>
2026-03-27 22:45:07 +05:30
SAURABH KUMAR d4a66d4b5f Apply suggestion from @levxn
Co-authored-by: Levin <105410870+levxn@users.noreply.github.com>
2026-03-27 22:44:51 +05:30
SAURABH KUMAR ad39b6ea50 Apply suggestion from @levxn
Co-authored-by: Levin <105410870+levxn@users.noreply.github.com>
2026-03-27 22:44:29 +05:30
SAURABH KUMAR 71baf6166d Apply suggestion from @levxn
Co-authored-by: Levin <105410870+levxn@users.noreply.github.com>
2026-03-27 22:44:13 +05:30
SAURABH KUMAR 25afdae093 Apply suggestion from @levxn
Co-authored-by: Levin <105410870+levxn@users.noreply.github.com>
2026-03-27 22:43:57 +05:30
SAURABH KUMAR 21700eb2ec Apply suggestion from @levxn
Co-authored-by: Levin <105410870+levxn@users.noreply.github.com>
2026-03-27 22:43:42 +05:30
SAURABH KUMAR 617462df52 Apply suggestion from @levxn
Co-authored-by: Levin <105410870+levxn@users.noreply.github.com>
2026-03-27 22:43:23 +05:30
SAURABH KUMAR b3c1f1436b Apply suggestion from @levxn
Co-authored-by: Levin <105410870+levxn@users.noreply.github.com>
2026-03-27 22:42:46 +05:30
SAURABH KUMAR 310b922ce8 Apply suggestion from @levxn
Co-authored-by: Levin <105410870+levxn@users.noreply.github.com>
2026-03-27 22:42:30 +05:30
SAURABH KUMAR 20b6553b07 Apply suggestion from @levxn
Co-authored-by: Levin <105410870+levxn@users.noreply.github.com>
2026-03-27 22:42:09 +05:30
SAURABH KUMAR 1035cc9481 Apply suggestion from @levxn
Co-authored-by: Levin <105410870+levxn@users.noreply.github.com>
2026-03-27 22:41:52 +05:30
SAURABH KUMAR 5d6dd1caa6 Apply suggestion from @levxn
Co-authored-by: Levin <105410870+levxn@users.noreply.github.com>
2026-03-27 22:41:32 +05:30
SAURABH KUMAR 45ba771650 Apply suggestion from @levxn
Co-authored-by: Levin <105410870+levxn@users.noreply.github.com>
2026-03-27 22:41:15 +05:30
SAURABH KUMAR a4b15c0320 Add health check endpoint for Cloudflare API 2026-03-27 22:35:09 +05:30
SAURABH KUMAR 211619120e Update tools/tests/tools/test_cloudflare.py
Co-authored-by: Levin <105410870+levxn@users.noreply.github.com>
2026-03-27 22:26:50 +05:30
SAURABH KUMAR a78bb16e4b Update tools/tests/tools/test_cloudflare.py
Co-authored-by: Levin <105410870+levxn@users.noreply.github.com>
2026-03-27 22:26:00 +05:30
SAURABH KUMAR c93bcee933 Update tools/tests/tools/test_cloudflare.py
Co-authored-by: Levin <105410870+levxn@users.noreply.github.com>
2026-03-27 22:25:48 +05:30
SAURABH KUMAR 08160a004a Update tools/tests/tools/test_cloudflare.py
Co-authored-by: Levin <105410870+levxn@users.noreply.github.com>
2026-03-27 22:25:33 +05:30
SAURABH KUMAR ccd5de7496 Update tools/tests/tools/test_cloudflare.py
Co-authored-by: Levin <105410870+levxn@users.noreply.github.com>
2026-03-27 22:25:10 +05:30
SAURABH KUMAR c332ef8823 Update tools/tests/tools/test_cloudflare.py
Co-authored-by: Levin <105410870+levxn@users.noreply.github.com>
2026-03-27 22:24:58 +05:30
SAURABH KUMAR 06db11eebf Update tools/tests/tools/test_cloudflare.py
Co-authored-by: Levin <105410870+levxn@users.noreply.github.com>
2026-03-27 22:24:43 +05:30
saurabhiiitm062 859db7f056 fix: address review comments for cloudflare tool
- removed unreachable code
- updated firewall rule handling (rulesets API)
- added validation + error handling
- added missing test coverage
- fixed misleading documentation
2026-03-26 13:51:02 +05:30
saurabhiiitm062 6e0b5c7250 merge upstream main into cloudflare branch 2026-03-26 10:50:59 +05:30
saurabhiiitm062 becbdb3706 feat: cloudflare DNS/Zone tool integrations 2026-03-20 11:11:10 +05:30
229 changed files with 33043 additions and 12835 deletions
@@ -0,0 +1,241 @@
---
name: browser-edge-cases
description: SOP for debugging browser automation failures on complex websites. Use when browser tools fail on specific sites like LinkedIn, Twitter/X, SPAs, or sites with Shadow DOM.
license: MIT
---
# Browser Tool Edge Cases
Standard Operating Procedure for debugging and fixing browser automation failures on complex websites.
## When to Use This Skill
- `browser_scroll` succeeds but page doesn't move
- `browser_click` succeeds but no action triggered
- `browser_type` text disappears or doesn't work
- `browser_snapshot` hangs or returns stale content
- `browser_navigate` loads wrong content
## SOP: Debugging Browser Tool Failures
### Phase 1: Reproduce & Isolate
```
1. Create minimal test case demonstrating failure
2. Test against simple site (example.com) to verify tool works
3. Test against problematic site to confirm issue
```
**Quick isolation test:**
```python
# Test 1: Does the tool work at all?
await browser_navigate(tab_id, "https://example.com")
result = await browser_scroll(tab_id, "down", 100)
# Should work on simple sites
# Test 2: Does it fail on the problematic site?
await browser_navigate(tab_id, "https://linkedin.com/feed")
result = await browser_scroll(tab_id, "down", 100)
# If this fails but example.com works → site-specific edge case
```
### Phase 2: Analyze Root Cause
**Step 2a: Check console for errors**
```python
console = await browser_console(tab_id)
# Look for: CSP violations, React errors, JavaScript exceptions
```
**Step 2b: Inspect DOM structure**
```python
html = await browser_html(tab_id)
snapshot = await browser_snapshot(tab_id)
# Look for:
# - Nested scrollable divs (overflow: scroll/auto)
# - Shadow DOM roots
# - iframes
# - Custom widgets
```
**Step 2c: Identify the pattern**
| Symptom | Likely Cause | Check |
|---------|--------------|-------|
| Scroll doesn't move | Nested scroll container | Look for `overflow: scroll` divs |
| Click no effect | Element covered | Check `getBoundingClientRect` vs viewport |
| Type clears | Autocomplete/React | Check for event listeners on input |
| Snapshot hangs | Huge DOM | Check node count in snapshot |
| Snapshot stale | SPA hydration | Wait after navigation |
### Phase 3: Implement Multi-Layer Fix
**Pattern: Always have fallbacks**
```python
async def robust_operation(tab_id):
# Method 1: Primary approach
try:
result = await primary_method(tab_id)
if verify_success(result):
return result
except Exception:
pass
# Method 2: CDP fallback
try:
result = await cdp_fallback(tab_id)
if verify_success(result):
return result
except Exception:
pass
# Method 3: JavaScript fallback
return await javascript_fallback(tab_id)
```
**Pattern: Always add timeouts**
```python
# Bad - can hang forever
result = await browser_snapshot(tab_id)
# Good - fails fast with useful error
try:
result = await browser_snapshot(tab_id, timeout_s=10.0)
except asyncio.TimeoutError:
# Handle timeout gracefully
result = await fallback_snapshot(tab_id)
```
### Phase 4: Verify Fix
```
1. Run against problematic site → should work
2. Run against simple site → should still work (regression check)
3. Document in registry.md
```
## Pattern Library
### P1: Nested Scrollable Containers
**Sites:** LinkedIn, Twitter/X, any SPA with scrollable feeds
**Detection:**
```javascript
// Find largest scrollable container
const candidates = [];
document.querySelectorAll('*').forEach(el => {
const style = getComputedStyle(el);
if (style.overflow.includes('scroll') || style.overflow.includes('auto')) {
const rect = el.getBoundingClientRect();
if (rect.width > 100 && rect.height > 100) {
candidates.push({el, area: rect.width * rect.height});
}
}
});
candidates.sort((a, b) => b.area - a.area);
return candidates[0]?.el;
```
**Fix:** Dispatch scroll events at container's center, not viewport center.
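A sketch of that fix via the bridge's `evaluate` helper (as used in the test scripts below); the container search mirrors the detection snippet above, and the helper name is illustrative:

```python
async def scroll_largest_container(bridge, tab_id, dy: int = 500):
    # Sketch: scroll the largest scrollable container instead of the window.
    js = """
    (function() {
        let best = null, bestArea = 0;
        document.querySelectorAll('*').forEach(el => {
            const style = getComputedStyle(el);
            if (style.overflow.includes('scroll') || style.overflow.includes('auto')) {
                const r = el.getBoundingClientRect();
                const area = r.width * r.height;
                if (r.width > 100 && r.height > 100 && area > bestArea) {
                    best = el; bestArea = area;
                }
            }
        });
        const target = best || document.scrollingElement;
        const before = target.scrollTop;
        target.scrollBy(0, %d);
        return target.scrollTop - before;  // 0 means nothing moved
    })();
    """ % dy
    result = await bridge.evaluate(tab_id, js)
    return result.get("result", 0)
```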
### P2: Element Covered by Overlay
**Sites:** Modals, tooltips, SPAs with loading overlays
**Detection:**
```javascript
const rect = element.getBoundingClientRect();
const centerX = rect.left + rect.width / 2;
const centerY = rect.top + rect.height / 2;
const topElement = document.elementFromPoint(centerX, centerY);
return topElement === element || element.contains(topElement);
```
**Fix:** Wait for overlay to disappear, or use JavaScript click.
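A sketch of both options, again via `evaluate` (the polling interval and helper name are illustrative):

```python
import asyncio
import json

async def click_with_overlay_guard(bridge, tab_id, selector: str, wait_s: float = 5.0):
    sel = json.dumps(selector)  # safe JS string literal
    check = """
    (function() {
        const el = document.querySelector(%s);
        if (!el) return false;
        const r = el.getBoundingClientRect();
        const top = document.elementFromPoint(r.left + r.width / 2, r.top + r.height / 2);
        return top === el || el.contains(top);
    })();
    """ % sel
    # Option 1: poll until the overlay clears, then use the normal click.
    deadline = asyncio.get_running_loop().time() + wait_s
    while asyncio.get_running_loop().time() < deadline:
        if (await bridge.evaluate(tab_id, check)).get("result"):
            return await bridge.click(tab_id, selector)
        await asyncio.sleep(0.25)
    # Option 2: overlay never cleared - fall back to a JavaScript click.
    return await bridge.evaluate(
        tab_id, "(function() { document.querySelector(%s)?.click(); })();" % sel
    )
```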
### P3: React Synthetic Events
**Sites:** React SPAs, modern web apps
**Detection:** If CDP click doesn't trigger handler but manual click works.
**Fix:** Use JavaScript click as primary:
```javascript
element.click();
```
### P4: Huge DOM / Accessibility Tree
**Sites:** LinkedIn, Facebook, Twitter (feeds with 1000s of nodes)
**Detection:**
```javascript
document.querySelectorAll('*').length > 5000
```
**Fix** (see the sketch after this list):
1. Add timeout to snapshot operation
2. Truncate tree at 2000 nodes
3. Fall back to DOM-based snapshot if accessibility tree too large
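A sketch of steps 1 and 3, assuming `snapshot`/`html` coroutine methods on the bridge (names follow the tools above but are assumptions); truncation (step 2) belongs inside the snapshot implementation itself:

```python
import asyncio

async def snapshot_with_timeout(bridge, tab_id, timeout_s: float = 10.0):
    # Huge accessibility trees can hang the snapshot; fail fast instead.
    try:
        return await asyncio.wait_for(bridge.snapshot(tab_id), timeout=timeout_s)
    except asyncio.TimeoutError:
        # Fall back to a cheaper DOM-based snapshot - assumed helper.
        return await bridge.html(tab_id)
```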
### P5: SPA Hydration Delay
**Sites:** React, Vue, Angular SPAs after navigation
**Detection:**
```javascript
// Check if React app has hydrated
document.querySelector('[data-reactroot]') ||
document.querySelector('[data-reactid]')
```
**Fix:** Wait for specific selector after navigation:
```python
await browser_navigate(tab_id, url, wait_until="load")
await browser_wait(tab_id, selector='[data-testid="content"]', timeout_ms=5000)
```
### P6: Shadow DOM
**Sites:** Components using Shadow DOM, Lit elements
**Detection:**
```javascript
[...document.querySelectorAll('*')].some(el => el.shadowRoot)
```
**Fix:** Pierce shadow root:
```javascript
function queryShadow(selector) {
const parts = selector.split('>>>');
let node = document;
for (const part of parts) {
if (node.shadowRoot) {
node = node.shadowRoot.querySelector(part.trim());
} else {
node = node.querySelector(part.trim());
}
}
return node;
}
```
## Quick Reference
| Issue | Primary Fix | Fallback |
|-------|-------------|----------|
| Scroll not working | Find scrollable container | Mouse wheel at container center |
| Click no effect | JavaScript click() | CDP mouse events |
| Type clears | Add delay_ms | Use execCommand |
| Snapshot hangs | Add timeout_s | DOM snapshot fallback |
| Stale content | Wait for selector | Increase wait_until timeout |
| Shadow DOM | Pierce selector | JavaScript traversal |
## References
- [registry.md](registry.md) - Full list of known edge cases
- [scripts/test_case.py](scripts/test_case.py) - Template for testing new cases
- [BROWSER_USE_PATTERNS.md](../../tools/BROWSER_USE_PATTERNS.md) - Implementation patterns from browser-use
@@ -0,0 +1,261 @@
# Browser Edge Case Registry
Curated list of known browser automation edge cases with symptoms, causes, and fixes.
---
## Scroll Issues
### #1: LinkedIn Nested Scroll Container
| Attribute | Value |
|-----------|-------|
| **Site** | LinkedIn (linkedin.com/feed) |
| **Symptom** | `browser_scroll()` returns `{ok: true}` but page doesn't move |
| **Root Cause** | Content is in a nested scrollable div (`overflow: scroll`), not the main window |
| **Detection** | `document.querySelectorAll('*')` with `overflow: scroll/auto` has large candidates |
| **Fix** | JavaScript finds largest scrollable container, uses `container.scrollBy()` |
| **Code** | `bridge.py:808-891` - smart scroll with container detection |
| **Verified** | 2026-04-03 ✓ |
### #2: Twitter/X Lazy Loading
| Attribute | Value |
|-----------|-------|
| **Site** | Twitter/X (x.com) |
| **Symptom** | Infinite scroll doesn't load new content |
| **Root Cause** | Lazy loading requires content to be visible before loading more |
| **Detection** | Scroll position at bottom but no new `[data-testid="tweet"]` elements |
| **Fix** | Add `wait_for_selector` between scroll calls with 1s delay |
| **Code** | Test file: `tests/test_x_page_load_repro.py` |
| **Verified** | - |
### #3: Modal/Dialog Scroll Container
| Attribute | Value |
|-----------|-------|
| **Site** | Any site with modal dialogs |
| **Symptom** | Scroll scrolls background page, not modal content |
| **Root Cause** | Modal has its own scroll container with `overflow: scroll` |
| **Detection** | Visible element with `position: fixed` and scrollable content |
| **Fix** | Find visible modal container (highest z-index scrollable), scroll that |
| **Code** | - |
| **Verified** | - |
---
## Click Issues
### #4: Element Covered by Overlay
| Attribute | Value |
|-----------|-------|
| **Site** | SPAs, sites with loading overlays |
| **Symptom** | Click succeeds but no action triggered |
| **Root Cause** | Element is covered by transparent overlay, tooltip, or iframe |
| **Detection** | `document.elementFromPoint(x, y) !== target` |
| **Fix** | Wait for overlay to disappear, or use JavaScript `element.click()` |
| **Code** | `bridge.py:394-591` - JavaScript click as primary |
| **Verified** | - |
### #5: React Synthetic Events
| Attribute | Value |
|-----------|-------|
| **Site** | React applications |
| **Symptom** | CDP click doesn't trigger React handler |
| **Root Cause** | React uses synthetic events that don't respond to CDP events |
| **Detection** | Site uses React (check for `__reactFiber$` or `data-reactroot`) |
| **Fix** | Use JavaScript `element.click()` as primary method |
| **Code** | `bridge.py:394-591` - JavaScript-first click |
| **Verified** | - |
### #6: Shadow DOM Elements
| Attribute | Value |
|-----------|-------|
| **Site** | Components using Shadow DOM, Lit elements |
| **Symptom** | `querySelector` can't find element |
| **Root Cause** | Element is inside a shadow root, not main DOM tree |
| **Detection** | `element.shadowRoot !== null` on parent elements |
| **Fix** | Use piercing selector (`host >>> target`) or traverse shadow roots |
| **Code** | See SKILL.md P6 pattern |
| **Verified** | 2026-04-03 ✓ |
---
## Input Issues
### #7: ContentEditable / Rich Text Editors
| Attribute | Value |
|-----------|-------|
| **Site** | Rich text editors (Notion, Slack web, etc.) |
| **Symptom** | `browser_type()` doesn't insert text |
| **Root Cause** | Element is `contenteditable`, not an `<input>` or `<textarea>` |
| **Detection** | `element.contentEditable === 'true'` |
| **Fix** | Focus via JavaScript, use `execCommand('insertText')` or `Input.dispatchKeyEvent` |
| **Code** | `bridge.py:616-694` - contentEditable handling |
| **Verified** | 2026-04-03 ✓ |
### #8: Autocomplete Field Clearing
| Attribute | Value |
|-----------|-------|
| **Site** | Search fields with autocomplete, address forms |
| **Symptom** | Typed text gets cleared immediately |
| **Root Cause** | Field expects realistic keystroke timing for autocomplete |
| **Detection** | Field has autocomplete listeners or dropdown appears |
| **Fix** | Add `delay_ms=50` between keystrokes |
| **Code** | `bridge.py:type()` - delay_ms parameter |
| **Verified** | 2026-04-03 ✓ |
### #9: Custom Date Pickers
| Attribute | Value |
|-----------|-------|
| **Site** | Forms with custom date widgets |
| **Symptom** | Can't type date into date field |
| **Root Cause** | Custom widget intercepts and blocks keyboard input |
| **Detection** | Typing doesn't change field value |
| **Fix** | Click calendar widget icon, select date from dropdown |
| **Code** | - |
| **Verified** | - |
---
## Snapshot Issues
### #10: LinkedIn Huge DOM Tree
| Attribute | Value |
|-----------|-------|
| **Site** | LinkedIn, Facebook, Twitter feeds |
| **Symptom** | `browser_snapshot()` hangs forever |
| **Root Cause** | 10k+ DOM nodes, accessibility tree has 50k+ nodes |
| **Detection** | `document.querySelectorAll('*').length > 5000` |
| **Fix** | Add `timeout_s` param with `asyncio.timeout()`, proper error handling |
| **Code** | `bridge.py:1028-1041` - snapshot with timeout protection |
| **Verified** | 2026-04-03 ✓ (0.08s on LinkedIn) |
### #11: SPA Hydration Delay
| Attribute | Value |
|-----------|-------|
| **Site** | React/Vue/Angular SPAs |
| **Symptom** | Snapshot shows old content after navigation |
| **Root Cause** | Client-side hydration hasn't completed when snapshot runs |
| **Detection** | `document.readyState === 'complete'` but content missing |
| **Fix** | Wait for specific selector after navigation |
| **Code** | Test file: `tests/test_x_page_load_repro.py` |
| **Verified** | - |
### #12: iframe Content Missing
| Attribute | Value |
|-----------|-------|
| **Site** | Sites with embedded content |
| **Symptom** | Snapshot missing iframe content |
| **Root Cause** | Accessibility tree doesn't include iframe content |
| **Detection** | `document.querySelectorAll('iframe')` has results |
| **Fix** | Use `DOM.getFrameOwner` + separate snapshot for each iframe |
| **Code** | - |
| **Verified** | - |
---
## Navigation Issues
### #13: SPA Navigation Events
| Attribute | Value |
|-----------|-------|
| **Site** | React Router, Vue Router SPAs |
| **Symptom** | `wait_until="load"` fires before content ready |
| **Root Cause** | SPA uses client-side routing, no full page load |
| **Detection** | URL changes but `load` event already fired |
| **Fix** | Use `wait_until="networkidle"` or `wait_for_selector` |
| **Code** | `bridge.py:navigate()` - wait_until options |
| **Verified** | - |
### #14: Cross-Origin Redirects
| Attribute | Value |
|-----------|-------|
| **Site** | OAuth flows, SSO logins |
| **Symptom** | Navigation fails during redirect |
| **Root Cause** | Cross-origin security prevents CDP tracking |
| **Detection** | URL changes to different domain |
| **Fix** | Use `wait_for_url` with pattern matching instead of exact URL |
| **Code** | - |
| **Verified** | - |
---
## Screenshot Issues
### #15: Selector Screenshot Not Implemented
| Attribute | Value |
|-----------|-------|
| **Site** | Any site |
| **Symptom** | `browser_screenshot(selector="h1")` takes full viewport instead of element |
| **Root Cause** | `selector` param existed in signature but was silently ignored in both `bridge.py` and `inspection.py` |
| **Detection** | Screenshot with selector same byte size as screenshot without selector |
| **Fix** | Use CDP `Runtime.evaluate` to call `getBoundingClientRect()` on the element, pass result as `clip` to `Page.captureScreenshot` |
| **Code** | `bridge.py:1315-1344` - selector clip logic; `inspection.py:94-96` - pass selector to bridge |
| **Verified** | 2026-04-03 ✓ (JS rect query returns correct viewport coords; requires server restart) |
### #16: Stale Browser Context (Group ID Mismatch)
| Attribute | Value |
|-----------|-------|
| **Site** | Any |
| **Symptom** | `browser_open()` returns `"No group with id: XXXXXXX"` even though `browser_status` shows `running: true` |
| **Root Cause** | In-memory `_contexts` dict has a stale `groupId` from a Chrome tab group that was closed outside the tool (e.g. user closed the tab group) |
| **Detection** | `browser_status` returns `running: true` but `browser_open` fails with "No group with id" |
| **Fix** | Call `browser_stop()` to clear stale context from `_contexts`, then `browser_start()` again |
| **Code** | `tools/lifecycle.py:144-160` - `already_running` check uses cached dict without validating against Chrome |
| **Verified** | 2026-04-03 ✓ |
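A recovery sketch using the tool names from this entry (the call signatures and return shapes are assumptions):

```python
async def ensure_fresh_context():
    # Sketch: recover from a stale in-memory groupId (registry case #16).
    status = await browser_status()
    if status.get("running"):
        result = await browser_open()
        if "No group with id" in str(result.get("error", "")):
            await browser_stop()   # clears the stale _contexts entry
            await browser_start()  # re-creates the tab group
```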
---
## How to Add New Edge Cases
1. **Reproduce** the issue with minimal test case
2. **Document** using the template below
3. **Implement** fix with multi-layer fallback
4. **Verify** against both problematic and simple sites
5. **Submit** by appending to this file
### Template
```markdown
### #N: [Short Title]
| Attribute | Value |
|-----------|-------|
| **Site** | [URL or site type] |
| **Symptom** | [What the user observes] |
| **Root Cause** | [Technical explanation] |
| **Detection** | [JavaScript to detect this case] |
| **Fix** | [Solution approach] |
| **Code** | [File:line reference if implemented] |
| **Verified** | [Date or "pending"] |
```
---
## Statistics
| Category | Count |
|----------|-------|
| Scroll Issues | 3 |
| Click Issues | 3 |
| Input Issues | 3 |
| Snapshot Issues | 3 |
| Navigation Issues | 2 |
| Screenshot Issues | 2 |
| **Total** | **16** |
Last updated: 2026-04-03
@@ -0,0 +1,111 @@
#!/usr/bin/env python
"""
Test #2: Twitter/X Lazy Loading Scroll
Symptom: Infinite scroll doesn't load new content
Root Cause: Lazy loading requires content to be visible before loading more
Fix: Add wait_for_selector between scroll calls
"""
import asyncio
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent.parent.parent.parent / "tools" / "src"))
from gcu.browser.bridge import BeelineBridge
BRIDGE_PORT = 9229
CONTEXT_NAME = "twitter-scroll-test"
async def test_twitter_lazy_scroll():
"""Test that repeated scrolls with waits load new content."""
print("=" * 70)
print("TEST #2: Twitter/X Lazy Loading Scroll")
print("=" * 70)
bridge = BeelineBridge()
try:
await bridge.start()
for i in range(10):
await asyncio.sleep(1)
if bridge.is_connected:
print("✓ Extension connected!")
break
print(f"Waiting for extension... ({i+1}/10)")
else:
print("✗ Extension not connected")
return
context = await bridge.create_context(CONTEXT_NAME)
tab_id = context.get("tabId")
group_id = context.get("groupId")
print(f"✓ Created tab: {tab_id}")
# Navigate to Twitter/X
print("\n--- Navigating to X.com ---")
await bridge.navigate(tab_id, "https://x.com", wait_until="networkidle", timeout_ms=30000)
print("✓ Page loaded")
# Wait for tweets to appear
print("\n--- Waiting for tweets ---")
await bridge.wait_for_selector(tab_id, '[data-testid="tweet"]', timeout_ms=10000)
# Count initial tweets
initial_count = await bridge.evaluate(
tab_id,
'(function() { return document.querySelectorAll(\'[data-testid="tweet"]\').length; })()'
)
print(f"Initial tweet count: {initial_count.get('result', 0)}")
# Take screenshot of initial state
screenshot = await bridge.screenshot(tab_id)
print(f"Screenshot: {len(screenshot.get('data', ''))} bytes")
# Scroll multiple times with waits
print("\n--- Scrolling with waits ---")
for i in range(3):
result = await bridge.scroll(tab_id, "down", 500)
print(f" Scroll {i+1}: {result.get('method', 'unknown')} method")
# Wait for new content to load
await asyncio.sleep(2)
# Count tweets after scroll
count_result = await bridge.evaluate(
tab_id,
'(function() { return document.querySelectorAll(\'[data-testid="tweet"]\').length; })()'
)
count = count_result.get('result', 0)
print(f" Tweet count after scroll: {count}")
# Final count
final_count = await bridge.evaluate(
tab_id,
'(function() { return document.querySelectorAll(\'[data-testid="tweet"]\').length; })()'
)
final = final_count.get('result', 0)
initial = initial_count.get('result', 0)
print(f"\n--- Results ---")
print(f"Initial tweets: {initial}")
print(f"Final tweets: {final}")
if final > initial:
print(f"✓ PASS: Loaded {final - initial} new tweets")
else:
print("✗ FAIL: No new tweets loaded (may need login)")
await bridge.destroy_context(group_id)
print("\n✓ Context destroyed")
finally:
await bridge.stop()
if __name__ == "__main__":
asyncio.run(test_twitter_lazy_scroll())
@@ -0,0 +1,97 @@
#!/usr/bin/env python
"""
Test #3: Modal/Dialog Scroll Container
Symptom: Scroll scrolls background page, not modal content
Root Cause: Modal has its own scroll container with overflow: scroll
Fix: Find visible modal container (highest z-index scrollable), scroll that
"""
import asyncio
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent.parent.parent.parent / "tools" / "src"))
from gcu.browser.bridge import BeelineBridge
BRIDGE_PORT = 9229
CONTEXT_NAME = "modal-scroll-test"
# Test site with modal - using a demo site
MODAL_DEMO_URL = "https://www.w3schools.com/howto/howto_css_modals.asp"
async def test_modal_scroll():
"""Test that scroll targets modal content, not background."""
print("=" * 70)
print("TEST #3: Modal/Dialog Scroll Container")
print("=" * 70)
bridge = BeelineBridge()
try:
await bridge.start()
for i in range(10):
await asyncio.sleep(1)
if bridge.is_connected:
print("✓ Extension connected!")
break
else:
print("✗ Extension not connected")
return
context = await bridge.create_context(CONTEXT_NAME)
tab_id = context.get("tabId")
group_id = context.get("groupId")
print(f"✓ Created tab: {tab_id}")
# Navigate to modal demo
print("\n--- Navigating to modal demo ---")
await bridge.navigate(tab_id, MODAL_DEMO_URL, wait_until="load")
print("✓ Page loaded")
# Take screenshot before
screenshot_before = await bridge.screenshot(tab_id)
print(f"Screenshot before: {len(screenshot_before.get('data', ''))} bytes")
# Click button to open modal
print("\n--- Opening modal ---")
# Find and click the "Open Modal" button
result = await bridge.click(tab_id, '.ws-btn', timeout_ms=5000)
print(f"Click result: {result}")
await asyncio.sleep(1)
# Take screenshot with modal open
screenshot_modal = await bridge.screenshot(tab_id)
print(f"Screenshot modal open: {len(screenshot_modal.get('data', ''))} bytes")
# Try to scroll within modal
print("\n--- Scrolling modal content ---")
result = await bridge.scroll(tab_id, "down", 100)
print(f"Scroll result: {result}")
await asyncio.sleep(0.5)
# Take screenshot after scroll
screenshot_after = await bridge.screenshot(tab_id)
print(f"Screenshot after scroll: {len(screenshot_after.get('data', ''))} bytes")
# Check if modal content scrolled (not background)
# This is a visual check - we can verify by comparing screenshots
print("\n--- Results ---")
print(f"Modal scroll test completed. Method used: {result.get('method', 'unknown')}")
print("Visual verification needed: Check if modal content scrolled vs background")
await bridge.destroy_context(group_id)
print("\n✓ Context destroyed")
finally:
await bridge.stop()
if __name__ == "__main__":
asyncio.run(test_modal_scroll())
@@ -0,0 +1,123 @@
#!/usr/bin/env python
"""
Test #4: Element Covered by Overlay
Symptom: Click succeeds but no action triggered
Root Cause: Element is covered by transparent overlay, tooltip, or iframe
Detection: document.elementFromPoint(x, y) !== target
Fix: Wait for overlay to disappear, or use JavaScript element.click()
"""
import asyncio
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent.parent.parent.parent / "tools" / "src"))
from gcu.browser.bridge import BeelineBridge
CONTEXT_NAME = "overlay-click-test"
async def test_overlay_click():
"""Test clicking elements that are covered by overlays."""
print("=" * 70)
print("TEST #4: Element Covered by Overlay")
print("=" * 70)
bridge = BeelineBridge()
try:
await bridge.start()
for i in range(10):
await asyncio.sleep(1)
if bridge.is_connected:
print("✓ Extension connected!")
break
else:
print("✗ Extension not connected")
return
context = await bridge.create_context(CONTEXT_NAME)
tab_id = context.get("tabId")
group_id = context.get("groupId")
print(f"✓ Created tab: {tab_id}")
# Create a test page with overlay
print("\n--- Creating test page with overlay ---")
test_html = """
<!DOCTYPE html>
<html>
<head><title>Overlay Test</title></head>
<body>
<button id="target-btn" onclick="alert('Clicked!')">Click Me</button>
<div id="overlay" style="position:fixed;top:0;left:0;width:100%;height:100%;background:rgba(0,0,0,0.3);z-index:1000;"></div>
<script>
window.clickCount = 0;
document.getElementById('target-btn').addEventListener('click', () => {
window.clickCount++;
});
</script>
</body>
</html>
"""
# Navigate to data URL
import base64
data_url = f"data:text/html;base64,{base64.b64encode(test_html.encode()).decode()}"
await bridge.navigate(tab_id, data_url, wait_until="load")
# Screenshot before
screenshot = await bridge.screenshot(tab_id)
print(f"Screenshot: {len(screenshot.get('data', ''))} bytes")
# Try to click the covered button
print("\n--- Attempting to click covered button ---")
# First, check if element is covered
coverage_check = await bridge.evaluate(
tab_id,
"""
(function() {
const btn = document.getElementById('target-btn');
const rect = btn.getBoundingClientRect();
const centerX = rect.left + rect.width / 2;
const centerY = rect.top + rect.height / 2;
const topElement = document.elementFromPoint(centerX, centerY);
return {
isCovered: topElement !== btn && !btn.contains(topElement),
topElement: topElement?.tagName,
targetElement: btn.tagName
};
})();
"""
)
print(f"Coverage check: {coverage_check.get('result', {})}")
# Try the bridge click (JavaScript-first per bridge.py; overlay may still block it)
click_result = await bridge.click(tab_id, "#target-btn", timeout_ms=5000)
print(f"Click result: {click_result}")
# Check if click registered
count_result = await bridge.evaluate(
tab_id,
"(function() { return window.clickCount; })()"
)
count = count_result.get("result", 0)
print(f"Click count after CDP click: {count}")
if count > 0:
print("✓ PASS: JavaScript click penetrated overlay")
else:
print("✗ FAIL: Click did not reach button (overlay blocked it)")
await bridge.destroy_context(group_id)
print("\n✓ Context destroyed")
finally:
await bridge.stop()
if __name__ == "__main__":
asyncio.run(test_overlay_click())
@@ -0,0 +1,154 @@
#!/usr/bin/env python
"""
Test #6: Shadow DOM Elements
Symptom: querySelector can't find element
Root Cause: Element is inside a shadow root, not main DOM tree
Detection: element.shadowRoot !== null on parent elements
Fix: Use piercing selector (host >>> target) or traverse shadow roots
"""
import asyncio
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent.parent.parent.parent / "tools" / "src"))
from gcu.browser.bridge import BeelineBridge
CONTEXT_NAME = "shadow-dom-test"
async def test_shadow_dom():
"""Test clicking elements inside Shadow DOM."""
print("=" * 70)
print("TEST #6: Shadow DOM Elements")
print("=" * 70)
bridge = BeelineBridge()
try:
await bridge.start()
for i in range(10):
await asyncio.sleep(1)
if bridge.is_connected:
print("✓ Extension connected!")
break
else:
print("✗ Extension not connected")
return
context = await bridge.create_context(CONTEXT_NAME)
tab_id = context.get("tabId")
group_id = context.get("groupId")
print(f"✓ Created tab: {tab_id}")
# Create test page with Shadow DOM
print("\n--- Creating test page with Shadow DOM ---")
test_html = """
<!DOCTYPE html>
<html>
<head><title>Shadow DOM Test</title></head>
<body>
<div id="shadow-host"></div>
<script>
const host = document.getElementById('shadow-host');
const shadow = host.attachShadow({ mode: 'open' });
shadow.innerHTML = `
<style>
button { padding: 10px 20px; font-size: 16px; }
</style>
<button id="shadow-btn">Shadow Button</button>
`;
shadow.getElementById('shadow-btn').addEventListener('click', () => {
window.shadowClickCount = (window.shadowClickCount || 0) + 1;
console.log('Shadow button clicked:', window.shadowClickCount);
});
</script>
</body>
</html>
"""
# Write to file and use file:// URL (data: URLs don't work well with extension)
test_file = Path("/tmp/shadow_dom_test.html")
test_file.write_text(test_html.strip())
file_url = f"file://{test_file}"
await bridge.navigate(tab_id, file_url, wait_until="load")
print("✓ Page loaded")
# Screenshot
screenshot = await bridge.screenshot(tab_id)
print(f"Screenshot: {len(screenshot.get('data', ''))} bytes")
# Detect Shadow DOM
print("\n--- Detecting Shadow DOM ---")
detection = await bridge.evaluate(
tab_id,
"""
(function() {
const hosts = [];
document.querySelectorAll('*').forEach(el => {
if (el.shadowRoot) {
hosts.push({
tag: el.tagName,
id: el.id,
hasButton: el.shadowRoot.querySelector('button') !== null
});
}
});
return { count: hosts.length, hosts };
})();
"""
)
print(f"Shadow DOM detection: {detection.get('result', {})}")
# Try to click shadow button using regular selector (should fail)
print("\n--- Attempting click with regular selector ---")
try:
result = await bridge.click(tab_id, "#shadow-btn", timeout_ms=3000)
print(f"Result: {result}")
except Exception as e:
print(f"Expected failure: {e}")
# Try to click using JavaScript that pierces shadow DOM
print("\n--- Clicking via JavaScript shadow piercing ---")
click_result = await bridge.evaluate(
tab_id,
"""
(function() {
const host = document.getElementById('shadow-host');
const btn = host.shadowRoot.getElementById('shadow-btn');
if (btn) {
btn.click();
return { success: true, clicked: 'shadow-btn' };
}
return { success: false, error: 'Button not found' };
})();
"""
)
print(f"JS click result: {click_result.get('result', {})}")
# Verify click was registered
count_result = await bridge.evaluate(
tab_id,
"(function() { return window.shadowClickCount || 0; })()"
)
count = count_result.get("result") or 0
print(f"Shadow click count: {count}")
if count and count > 0:
print("✓ PASS: Shadow DOM element clicked successfully")
else:
print("✗ FAIL: Could not click Shadow DOM element")
await bridge.destroy_context(group_id)
print("\n✓ Context destroyed")
finally:
await bridge.stop()
if __name__ == "__main__":
asyncio.run(test_shadow_dom())
@@ -0,0 +1,178 @@
#!/usr/bin/env python
"""
Test #7: ContentEditable / Rich Text Editors
Symptom: browser_type() doesn't insert text
Root Cause: Element is contenteditable, not an <input> or <textarea>
Detection: element.contentEditable === 'true'
Fix: Focus via JavaScript, use execCommand('insertText') or Input.dispatchKeyEvent
"""
import asyncio
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent.parent.parent.parent / "tools" / "src"))
from gcu.browser.bridge import BeelineBridge
CONTEXT_NAME = "contenteditable-test"
async def test_contenteditable():
"""Test typing into contenteditable elements."""
print("=" * 70)
print("TEST #7: ContentEditable / Rich Text Editors")
print("=" * 70)
bridge = BeelineBridge()
try:
await bridge.start()
for i in range(10):
await asyncio.sleep(1)
if bridge.is_connected:
print("✓ Extension connected!")
break
else:
print("✗ Extension not connected")
return
context = await bridge.create_context(CONTEXT_NAME)
tab_id = context.get("tabId")
group_id = context.get("groupId")
print(f"✓ Created tab: {tab_id}")
# Create test page with contenteditable
test_html = """
<!DOCTYPE html>
<html>
<head><title>ContentEditable Test</title></head>
<body>
<h2>ContentEditable Test</h2>
<h3>1. Simple contenteditable div</h3>
<div id="editor1" contenteditable="true" style="border:1px solid #ccc;padding:10px;min-height:50px;">Start text</div>
<h3>2. Rich text editor (like Notion)</h3>
<div id="editor2" contenteditable="true" style="border:1px solid #ccc;padding:10px;min-height:50px;">
<p>Type here...</p>
</div>
<h3>3. Regular input (for comparison)</h3>
<input id="input1" type="text" placeholder="Regular input" />
<script>
// Track content changes
window.editor1Content = '';
window.editor2Content = '';
document.getElementById('editor1').addEventListener('input', (e) => {
window.editor1Content = e.target.innerText;
});
document.getElementById('editor2').addEventListener('input', (e) => {
window.editor2Content = e.target.innerText;
});
</script>
</body>
</html>
"""
# Write to file and use file:// URL (data: URLs don't work well with extension)
test_file = Path("/tmp/contenteditable_test.html")
test_file.write_text(test_html.strip())
file_url = f"file://{test_file}"
await bridge.navigate(tab_id, file_url, wait_until="load")
print("✓ Page loaded")
# Screenshot with timeout protection
try:
screenshot = await asyncio.wait_for(bridge.screenshot(tab_id), timeout=10.0)
print(f"Screenshot: {len(screenshot.get('data', ''))} bytes")
except asyncio.TimeoutError:
print("Screenshot timed out (skipping)")
# Detect contenteditable
print("\n--- Detecting contenteditable elements ---")
detection = await bridge.evaluate(
tab_id,
"""
(function() {
const editables = document.querySelectorAll('[contenteditable="true"]');
return {
count: editables.length,
ids: Array.from(editables).map(el => el.id)
};
})();
"""
)
print(f"Contenteditable detection: {detection.get('result', {})}")
# Test 1: Type into regular input (baseline)
print("\n--- Test 1: Regular input ---")
await bridge.click(tab_id, "#input1")
await bridge.type_text(tab_id, "#input1", "Hello input")
input_result = await bridge.evaluate(
tab_id,
"(function() { return document.getElementById('input1').value; })()"
)
print(f"Input value: {input_result.get('result', '')}")
# Test 2: Type into contenteditable div
print("\n--- Test 2: Contenteditable div ---")
await bridge.click(tab_id, "#editor1")
await bridge.type_text(tab_id, "#editor1", "Hello contenteditable", clear_first=True)
editor_result = await bridge.evaluate(
tab_id,
"(function() { return document.getElementById('editor1').innerText; })()"
)
print(f"Editor1 innerText: {editor_result.get('result', '')}")
# Test 3: Use JavaScript insertText for rich editor
print("\n--- Test 3: JavaScript insertText for rich editor ---")
insert_result = await bridge.evaluate(
tab_id,
"""
(function() {
const editor = document.getElementById('editor2');
editor.focus();
document.execCommand('selectAll', false, null);
document.execCommand('insertText', false, 'Hello from execCommand');
return editor.innerText;
})();
"""
)
print(f"Editor2 after execCommand: {insert_result.get('result', '')}")
# Screenshot after with timeout protection
try:
screenshot_after = await asyncio.wait_for(bridge.screenshot(tab_id), timeout=10.0)
print(f"Screenshot after: {len(screenshot_after.get('data', ''))} bytes")
except asyncio.TimeoutError:
print("Screenshot after timed out (skipping)")
# Results
print("\n--- Results ---")
input_val = input_result.get("result", "")
editor1_val = editor_result.get("result", "")
editor2_val = insert_result.get("result", "")
input_pass = "Hello input" in input_val
editor1_pass = "Hello contenteditable" in editor1_val
editor2_pass = "execCommand" in editor2_val
print(f"Input: {'✓ PASS' if input_pass else '✗ FAIL'} - {input_val}")
print(f"Editor1: {'✓ PASS' if editor1_pass else '✗ FAIL'} - {editor1_val}")
print(f"Editor2: {'✓ PASS' if editor2_pass else '✗ FAIL'} - {editor2_val}")
await bridge.destroy_context(group_id)
print("\n✓ Context destroyed")
finally:
await bridge.stop()
if __name__ == "__main__":
asyncio.run(test_contenteditable())
@@ -0,0 +1,233 @@
#!/usr/bin/env python
"""
Test #8: Autocomplete Field Clearing
Symptom: Typed text gets cleared immediately
Root Cause: Field expects realistic keystroke timing for autocomplete
Detection: Field has autocomplete listeners or dropdown appears
Fix: Add delay_ms between keystrokes
"""
import asyncio
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent.parent.parent.parent / "tools" / "src"))
from gcu.browser.bridge import BeelineBridge
CONTEXT_NAME = "autocomplete-test"
async def test_autocomplete():
"""Test typing into fields with autocomplete behavior."""
print("=" * 70)
print("TEST #8: Autocomplete Field Clearing")
print("=" * 70)
bridge = BeelineBridge()
try:
await bridge.start()
for i in range(10):
await asyncio.sleep(1)
if bridge.is_connected:
print("✓ Extension connected!")
break
else:
print("✗ Extension not connected")
return
context = await bridge.create_context(CONTEXT_NAME)
tab_id = context.get("tabId")
group_id = context.get("groupId")
print(f"✓ Created tab: {tab_id}")
# Create test page with autocomplete behavior
test_html = """
<!DOCTYPE html>
<html>
<head><title>Autocomplete Test</title>
<style>
.autocomplete-items {
position: absolute;
border: 1px solid #d4d4d4;
border-top: none;
z-index: 99;
top: 100%;
left: 0;
right: 0;
max-height: 200px;
overflow-y: auto;
background: white;
}
.autocomplete-items div {
padding: 10px;
cursor: pointer;
}
.autocomplete-items div:hover {
background-color: #e9e9e9;
}
.autocomplete-active {
background-color: DodgerBlue !important;
color: white;
}
.autocomplete { position: relative; display: inline-block; }
input { width: 300px; padding: 10px; font-size: 16px; }
</style></head>
<body>
<h2>Autocomplete Test</h2>
<div class="autocomplete">
<input id="search" type="text" placeholder="Search countries..." autocomplete="off">
</div>
<div id="log" style="margin-top:20px;font-family:monospace;"></div>
<script>
const countries = ["Afghanistan","Albania","Algeria","Andorra","Angola","Argentina","Armenia","Australia","Austria","Azerbaijan","Bahamas","Bahrain","Bangladesh","Belarus","Belgium","Belize","Benin","Bhutan","Bolivia","Brazil","Canada","China","Colombia","Denmark","Egypt","France","Germany","India","Indonesia","Italy","Japan","Mexico","Netherlands","Nigeria","Norway","Pakistan","Peru","Philippines","Poland","Portugal","Russia","Spain","Sweden","Switzerland","Thailand","Turkey","Ukraine","United Kingdom","United States","Vietnam"];
const input = document.getElementById('search');
const log = document.getElementById('log');
let currentFocus = -1;
let typingTimeout = null;
// Track events for testing
window.inputEvents = [];
window.inputValue = '';
function logEvent(type, value) {
window.inputEvents.push({ type, value, time: Date.now() });
const entry = document.createElement('div');
entry.textContent = type + ': ' + value;
log.insertBefore(entry, log.firstChild);
}
// Simulate autocomplete that clears fast typing
input.addEventListener('input', function(e) {
const val = this.value;
// Clear previous dropdown
closeAllLists();
if (!val) return;
// If typing too fast (autocomplete-style), clear and restart
clearTimeout(typingTimeout);
typingTimeout = setTimeout(() => {
logEvent('input', val);
window.inputValue = val;
// Create dropdown
const div = document.createElement('div');
div.setAttribute('id', this.id + 'autocomplete-list');
div.setAttribute('class', 'autocomplete-items');
this.parentNode.appendChild(div);
countries.filter(c => c.substr(0, val.length).toUpperCase() === val.toUpperCase())
.slice(0, 5)
.forEach(country => {
const item = document.createElement('div');
item.innerHTML = '<strong>' + country.substr(0, val.length) + '</strong>' + country.substr(val.length);
item.addEventListener('click', function() {
input.value = country;
closeAllLists();
logEvent('select', country);
window.inputValue = country;
});
div.appendChild(item);
});
}, 100); // 100ms debounce
});
function closeAllLists() {
document.querySelectorAll('.autocomplete-items').forEach(el => el.remove());
}
document.addEventListener('click', function() {
closeAllLists();
});
</script>
</body>
</html>
"""
# Write to file and use file:// URL (data: URLs don't work well with extension)
test_file = Path("/tmp/autocomplete_test.html")
test_file.write_text(test_html.strip())
file_url = f"file://{test_file}"
await bridge.navigate(tab_id, file_url, wait_until="load")
print("✓ Page loaded")
# Screenshot
screenshot = await bridge.screenshot(tab_id)
print(f"Screenshot: {len(screenshot.get('data', ''))} bytes")
# Test 1: Fast typing (no delay) - may fail
print("\n--- Test 1: Fast typing (delay_ms=0) ---")
await bridge.click(tab_id, "#search")
await bridge.type_text(tab_id, "#search", "Ger", clear_first=True, delay_ms=0)
await asyncio.sleep(0.5)
fast_result = await bridge.evaluate(
tab_id,
"(function() { return document.getElementById('search').value; })()"
)
fast_value = fast_result.get("result", "")
print(f"Value after fast typing: '{fast_value}'")
# Check events
events_result = await bridge.evaluate(
tab_id,
"(function() { return window.inputEvents; })()"
)
print(f"Events logged: {events_result.get('result', [])}")
# Test 2: Slow typing (with delay) - should work
print("\n--- Test 2: Slow typing (delay_ms=100) ---")
await bridge.click(tab_id, "#search")
await bridge.type_text(tab_id, "#search", "United", clear_first=True, delay_ms=100)
await asyncio.sleep(0.5)
slow_result = await bridge.evaluate(
tab_id,
"(function() { return document.getElementById('search').value; })()"
)
slow_value = slow_result.get("result", "")
print(f"Value after slow typing: '{slow_value}'")
# Check if dropdown appeared
dropdown_result = await bridge.evaluate(
tab_id,
"(function() { return document.querySelectorAll('.autocomplete-items div').length; })()"
)
dropdown_count = dropdown_result.get("result", 0)
print(f"Dropdown items: {dropdown_count}")
# Screenshot with dropdown
screenshot_dropdown = await bridge.screenshot(tab_id)
print(f"Screenshot with dropdown: {len(screenshot_dropdown.get('data', ''))} bytes")
# Results
print("\n--- Results ---")
if "United" in slow_value:
print("✓ PASS: Slow typing with delay_ms worked")
else:
print("✗ FAIL: Slow typing still didn't work")
if dropdown_count > 0:
print("✓ PASS: Autocomplete dropdown appeared")
else:
print("⚠ WARNING: No autocomplete dropdown")
await bridge.destroy_context(group_id)
print("\n✓ Context destroyed")
finally:
await bridge.stop()
if __name__ == "__main__":
asyncio.run(test_autocomplete())
@@ -0,0 +1,162 @@
#!/usr/bin/env python
"""
Test #10: LinkedIn Huge DOM Tree
Symptom: browser_snapshot() hangs forever
Root Cause: 10k+ DOM nodes, accessibility tree has 50k+ nodes
Detection: document.querySelectorAll('*').length > 5000
Fix: Add timeout (10s default), truncate tree at 2000 nodes
"""
import asyncio
import sys
import time
import base64
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent.parent.parent.parent / "tools" / "src"))
from gcu.browser.bridge import BeelineBridge
CONTEXT_NAME = "huge-dom-test"
async def test_huge_dom():
"""Test snapshot performance on huge DOM trees."""
print("=" * 70)
print("TEST #10: Huge DOM Tree (LinkedIn-style)")
print("=" * 70)
bridge = BeelineBridge()
try:
await bridge.start()
for i in range(10):
await asyncio.sleep(1)
if bridge.is_connected:
print("✓ Extension connected!")
break
else:
print("✗ Extension not connected")
return
context = await bridge.create_context(CONTEXT_NAME)
tab_id = context.get("tabId")
group_id = context.get("groupId")
print(f"✓ Created tab: {tab_id}")
# Test 1: Small DOM (baseline)
print("\n--- Test 1: Small DOM (baseline) ---")
small_html = """
<!DOCTYPE html>
<html><body>
<h1>Small Page</h1>
<p>A few elements</p>
<button>Click me</button>
</body></html>
"""
data_url = f"data:text/html;base64,{base64.b64encode(small_html.encode()).decode()}"
await bridge.navigate(tab_id, data_url, wait_until="load")
start = time.perf_counter()
snapshot = await bridge.snapshot(tab_id, timeout_s=5.0)
elapsed = time.perf_counter() - start
tree_len = len(snapshot.get("tree", ""))
print(f"Small DOM snapshot: {elapsed:.3f}s, {tree_len} chars")
# Test 2: Generate huge DOM
print("\n--- Test 2: Huge DOM (5000+ elements) ---")
huge_html = """
<!DOCTYPE html>
<html><body>
<h1>Huge DOM Test</h1>
<div id="container"></div>
<script>
const container = document.getElementById('container');
for (let i = 0; i < 5000; i++) {
const div = document.createElement('div');
div.className = 'item-' + i;
div.innerHTML = '<span>Item ' + i + '</span><button>Action</button>';
container.appendChild(div);
}
</script>
</body></html>
"""
data_url = f"data:text/html;base64,{base64.b64encode(huge_html.encode()).decode()}"
await bridge.navigate(tab_id, data_url, wait_until="load")
# Count elements
count_result = await bridge.evaluate(
tab_id,
"(function() { return document.querySelectorAll('*').length; })()"
)
elem_count = count_result.get("result", 0)
print(f"DOM elements: {elem_count}")
# Skip screenshot on huge DOM - it can time out
# Instead verify page loaded by checking DOM
print("✓ Page verified (skipping screenshot on huge DOM)")
# Test snapshot with timeout
print("\n--- Testing snapshot with 10s timeout ---")
start = time.perf_counter()
try:
snapshot = await bridge.snapshot(tab_id, timeout_s=10.0)
elapsed = time.perf_counter() - start
tree_len = len(snapshot.get("tree", ""))
truncated = "(truncated)" in snapshot.get("tree", "")
print(f"✓ Huge DOM snapshot: {elapsed:.3f}s, {tree_len} chars, truncated={truncated}")
if elapsed < 5.0:
print("✓ PASS: Snapshot completed quickly")
else:
print(f"⚠ WARNING: Snapshot took {elapsed:.1f}s")
if truncated:
print("✓ PASS: Tree was truncated to prevent hang")
else:
print("⚠ WARNING: Tree not truncated (may need adjustment)")
except asyncio.TimeoutError:
print("✗ FAIL: Snapshot timed out (this shouldn't happen)")
# Test 3: Real LinkedIn
print("\n--- Test 3: Real LinkedIn Feed ---")
await bridge.navigate(tab_id, "https://www.linkedin.com/feed", wait_until="load", timeout_ms=30000)
await asyncio.sleep(2)
count_result = await bridge.evaluate(
tab_id,
"(function() { return document.querySelectorAll('*').length; })()"
)
elem_count = count_result.get("result", 0)
print(f"LinkedIn DOM elements: {elem_count}")
start = time.perf_counter()
try:
snapshot = await bridge.snapshot(tab_id, timeout_s=15.0)
elapsed = time.perf_counter() - start
tree_len = len(snapshot.get("tree", ""))
truncated = "(truncated)" in snapshot.get("tree", "")
print(f"LinkedIn snapshot: {elapsed:.3f}s, {tree_len} chars, truncated={truncated}")
if elapsed < 5.0:
print("✓ PASS: LinkedIn snapshot fast enough")
elif elapsed < 15.0:
print("⚠ WARNING: LinkedIn snapshot slow but within timeout")
else:
print("✗ FAIL: LinkedIn snapshot too slow")
except asyncio.TimeoutError:
print("✗ FAIL: LinkedIn snapshot timed out")
await bridge.destroy_context(group_id)
print("\n✓ Context destroyed")
finally:
await bridge.stop()
if __name__ == "__main__":
asyncio.run(test_huge_dom())
@@ -0,0 +1,187 @@
#!/usr/bin/env python
"""
Test #13: SPA Navigation Events
Symptom: wait_until="load" fires before content ready
Root Cause: SPA uses client-side routing, no full page load
Detection: URL changes but load event already fired
Fix: Use wait_until="networkidle" or wait_for_selector
"""
import asyncio
import sys
import time
import base64
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent.parent.parent.parent / "tools" / "src"))
from gcu.browser.bridge import BeelineBridge
CONTEXT_NAME = "spa-nav-test"
async def test_spa_navigation():
"""Test navigation timing on SPA pages."""
print("=" * 70)
print("TEST #13: SPA Navigation Events")
print("=" * 70)
bridge = BeelineBridge()
try:
await bridge.start()
for i in range(10):
await asyncio.sleep(1)
if bridge.is_connected:
print("✓ Extension connected!")
break
else:
print("✗ Extension not connected")
return
context = await bridge.create_context(CONTEXT_NAME)
tab_id = context.get("tabId")
group_id = context.get("groupId")
print(f"✓ Created tab: {tab_id}")
# Create a test SPA
spa_html = """
<!DOCTYPE html>
<html>
<head>
<title>SPA Test</title>
<style>
nav a { margin-right: 10px; }
.page { padding: 20px; border: 1px solid #ccc; margin-top: 10px; }
</style>
</head>
<body>
<nav>
<a href="#home" onclick="navigate('home')">Home</a>
<a href="#about" onclick="navigate('about')">About</a>
<a href="#contact" onclick="navigate('contact')">Contact</a>
</nav>
<div id="app" class="page">
<h1>Loading...</h1>
</div>
<script>
// Simulate SPA routing
let currentPage = '';
async function navigate(page) {
if (window.event) window.event.preventDefault(); // guard: also called from setTimeout with no event
currentPage = page;
// Show loading state
document.getElementById('app').innerHTML = '<h1>Loading...</h1>';
// Simulate async content loading (like real SPAs)
await new Promise(r => setTimeout(r, 500));
// Render content
const content = {
home: '<h1>Home Page</h1><p>Welcome to the SPA!</p><button id="home-btn">Home Action</button>',
about: '<h1>About Page</h1><p>This is a simulated SPA.</p><button id="about-btn">About Action</button>',
contact: '<h1>Contact Page</h1><p>Contact us at test@example.com</p><button id="contact-btn">Contact Action</button>'
};
document.getElementById('app').innerHTML = content[page] || '<h1>404</h1>';
window.location.hash = page;
}
// Initial load with delay (simulates SPA hydration)
setTimeout(() => {
navigate('home');
}, 1000);
// Track for testing
window.pageLoads = [];
window.addEventListener('hashchange', () => {
window.pageLoads.push(window.location.hash);
});
</script>
</body>
</html>
"""
# Write to file and use file:// URL (data: URLs don't work well with extension)
test_file = Path("/tmp/spa_test.html")
test_file.write_text(spa_html.strip())
file_url = f"file://{test_file}"
# Test 1: wait_until="load" - may fire before content ready
print("\n--- Test 1: wait_until='load' ---")
start = time.perf_counter()
await bridge.navigate(tab_id, file_url, wait_until="load")
elapsed = time.perf_counter() - start
print(f"Navigation completed in {elapsed:.3f}s")
# Check content immediately
content = await bridge.evaluate(
tab_id,
"(function() { return document.getElementById('app').innerText; })()"
)
print(f"Content immediately after load: '{content.get('result', '')}'")
# Screenshot
screenshot = await bridge.screenshot(tab_id)
print(f"Screenshot: {len(screenshot.get('data', ''))} bytes")
# Wait for content
print("\n--- Waiting for content to hydrate ---")
await bridge.wait_for_selector(tab_id, "#home-btn", timeout_ms=5000)
print("✓ Content loaded")
# Check content after wait
content_after = await bridge.evaluate(
tab_id,
"(function() { return document.getElementById('app').innerText; })()"
)
print(f"Content after wait: '{content_after.get('result', '')}'")
# Test 2: SPA navigation (no full page load)
print("\n--- Test 2: SPA client-side navigation ---")
# Click "About" link
await bridge.click(tab_id, 'a[href="#about"]')
await asyncio.sleep(1)
# Check if content changed
about_content = await bridge.evaluate(
tab_id,
"(function() { return document.getElementById('app').innerText; })()"
)
print(f"Content after SPA nav: '{about_content.get('result', '')}'")
if "About Page" in about_content.get("result", ""):
print("✓ PASS: SPA navigation worked")
else:
print("✗ FAIL: SPA navigation didn't update content")
# Test 3: wait_until="networkidle"
print("\n--- Test 3: wait_until='networkidle' ---")
await bridge.navigate(tab_id, file_url, wait_until="networkidle", timeout_ms=10000)
# Check content immediately
content_networkidle = await bridge.evaluate(
tab_id,
"(function() { return document.getElementById('app').innerText; })()"
)
print(f"Content after networkidle: '{content_networkidle.get('result', '')}'")
if "Home Page" in content_networkidle.get("result", ""):
print("✓ PASS: networkidle waited for content")
else:
print("⚠ WARNING: networkidle didn't wait long enough")
await bridge.destroy_context(group_id)
print("\n✓ Context destroyed")
finally:
await bridge.stop()
if __name__ == "__main__":
asyncio.run(test_spa_navigation())
@@ -0,0 +1,262 @@
#!/usr/bin/env python
"""
Test #15: Screenshot Functionality
Tests browser_screenshot across multiple scenarios:
- Basic viewport screenshot
- Full-page screenshot
- Selector-based screenshot
- Screenshot on complex DOM
- Timeout handling
Category: screenshot
"""
import asyncio
import base64
import sys
import time
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent.parent.parent.parent / "tools" / "src"))
from gcu.browser.bridge import BeelineBridge
CONTEXT_NAME = "screenshot-test"
SIMPLE_HTML = """<!DOCTYPE html>
<html>
<head><style>
body { margin: 0; background: #fff; font-family: sans-serif; }
h1 { color: #333; padding: 20px; }
.box { width: 200px; height: 100px; background: #4a90e2; margin: 20px; }
.long-content { height: 2000px; background: linear-gradient(blue, red); }
</style></head>
<body>
<h1 id="title">Screenshot Test Page</h1>
<div class="box" id="target-box">Target Box</div>
<div class="long-content"></div>
</body>
</html>"""
def check_png(data: str) -> bool:
"""Verify that base64 data decodes to a valid PNG."""
try:
raw = base64.b64decode(data)
return raw[:8] == b'\x89PNG\r\n\x1a\n'
except Exception:
return False
async def test_basic_screenshot(bridge: BeelineBridge, tab_id: int, data_url: str):
print("\n--- Test 1: Basic Viewport Screenshot ---")
await bridge.navigate(tab_id, data_url, wait_until="load")
await asyncio.sleep(0.5)
start = time.perf_counter()
result = await bridge.screenshot(tab_id)
elapsed = time.perf_counter() - start
ok = result.get("ok")
data = result.get("data", "")
mime = result.get("mimeType", "")
print(f" ok={ok}, mimeType={mime}, elapsed={elapsed:.3f}s")
print(f" data length: {len(data)} chars")
if ok and data:
valid_png = check_png(data)
print(f" valid PNG: {valid_png}")
if valid_png:
raw = base64.b64decode(data)
print(f" PNG size: {len(raw)} bytes")
print(" ✓ PASS: Basic screenshot works")
return True
else:
print(" ✗ FAIL: Data is not a valid PNG")
else:
print(f" ✗ FAIL: {result.get('error', 'no data')}")
return False
async def test_full_page_screenshot(bridge: BeelineBridge, tab_id: int, data_url: str):
print("\n--- Test 2: Full Page Screenshot ---")
await bridge.navigate(tab_id, data_url, wait_until="load")
await asyncio.sleep(0.5)
viewport_result = await bridge.screenshot(tab_id, full_page=False)
full_result = await bridge.screenshot(tab_id, full_page=True)
v_data = viewport_result.get("data", "")
f_data = full_result.get("data", "")
if not v_data or not f_data:
print(f" ✗ FAIL: viewport ok={viewport_result.get('ok')}, full ok={full_result.get('ok')}")
return False
v_size = len(base64.b64decode(v_data))
f_size = len(base64.b64decode(f_data))
print(f" Viewport PNG: {v_size} bytes")
print(f" Full page PNG: {f_size} bytes")
if f_size > v_size:
print(" ✓ PASS: Full page larger than viewport")
return True
else:
print(" ✗ FAIL: Full page not larger than viewport (may not capture long pages)")
return False
async def test_selector_screenshot(bridge: BeelineBridge, tab_id: int, data_url: str):
print("\n--- Test 3: Selector Screenshot ---")
await bridge.navigate(tab_id, data_url, wait_until="load")
await asyncio.sleep(0.5)
# selector param exists in signature but may not be implemented
result = await bridge.screenshot(tab_id, selector="#target-box")
ok = result.get("ok")
data = result.get("data", "")
if ok and data:
# If implemented, the box screenshot should be smaller than a full viewport screenshot
full_result = await bridge.screenshot(tab_id)
full_data = full_result.get("data", "")
if full_data:
sel_size = len(base64.b64decode(data))
full_size = len(base64.b64decode(full_data))
print(f" Selector PNG: {sel_size} bytes")
print(f" Full page PNG: {full_size} bytes")
if sel_size < full_size:
print(" ✓ PASS: Selector screenshot smaller than full page")
return True
else:
print(" ⚠ WARNING: Selector screenshot not smaller (may be full page)")
return False
else:
print(f" ⚠ NOT IMPLEMENTED: selector param ignored (returns full page) - error={result.get('error')}")
print(" NOTE: selector parameter exists in signature but is not used in implementation")
return False
async def test_screenshot_url_metadata(bridge: BeelineBridge, tab_id: int):
print("\n--- Test 4: Screenshot URL Metadata ---")
await bridge.navigate(tab_id, "https://example.com", wait_until="load")
await asyncio.sleep(1)
result = await bridge.screenshot(tab_id)
url = result.get("url", "")
tab = result.get("tabId")
print(f" url={url!r}, tabId={tab}")
if "example.com" in url:
print(" ✓ PASS: URL metadata captured correctly")
return True
else:
print(f" ✗ FAIL: Expected example.com in URL, got {url!r}")
return False
async def test_screenshot_timeout(bridge: BeelineBridge, tab_id: int, data_url: str):
print("\n--- Test 5: Timeout Handling ---")
await bridge.navigate(tab_id, data_url, wait_until="load")
# Very short timeout - likely still completes since simple page
start = time.perf_counter()
result = await bridge.screenshot(tab_id, timeout_s=0.001)
elapsed = time.perf_counter() - start
if not result.get("ok"):
err = result.get("error", "")
if "timed out" in err or "cancelled" in err:
print(f" ✓ PASS: Timeout handled gracefully: {err!r}")
return True
else:
print(f" ⚠ Fast enough to beat timeout: {err!r} in {elapsed:.3f}s")
return True # Not a failure, just fast
else:
print(f" ⚠ Screenshot completed before timeout ({elapsed:.3f}s) - too fast to test timeout")
return True # Still ok, just very fast
async def test_screenshot_complex_site(bridge: BeelineBridge, tab_id: int):
print("\n--- Test 6: Complex Site (example.com) ---")
await bridge.navigate(tab_id, "https://example.com", wait_until="load")
await asyncio.sleep(1)
start = time.perf_counter()
result = await bridge.screenshot(tab_id)
elapsed = time.perf_counter() - start
ok = result.get("ok")
data = result.get("data", "")
print(f" ok={ok}, elapsed={elapsed:.3f}s, data_len={len(data)}")
if ok and check_png(data):
print(" ✓ PASS: Screenshot on real site works")
return True
else:
print(f" ✗ FAIL: {result.get('error', 'bad data')}")
return False
async def main():
print("=" * 70)
print("TEST #15: Screenshot Functionality")
print("=" * 70)
bridge = BeelineBridge()
try:
await bridge.start()
for i in range(10):
await asyncio.sleep(1)
if bridge.is_connected:
print("✓ Extension connected!")
break
print(f"Waiting for extension... ({i+1}/10)")
else:
print("✗ Extension not connected. Ensure Chrome with Beeline extension is running.")
return
context = await bridge.create_context(CONTEXT_NAME)
tab_id = context.get("tabId")
group_id = context.get("groupId")
print(f"✓ Created tab: {tab_id}")
data_url = f"data:text/html;base64,{base64.b64encode(SIMPLE_HTML.encode()).decode()}"
results = {
"basic": await test_basic_screenshot(bridge, tab_id, data_url),
"full_page": await test_full_page_screenshot(bridge, tab_id, data_url),
"selector": await test_selector_screenshot(bridge, tab_id, data_url),
"metadata": await test_screenshot_url_metadata(bridge, tab_id),
"timeout": await test_screenshot_timeout(bridge, tab_id, data_url),
"complex_site": await test_screenshot_complex_site(bridge, tab_id),
}
print("\n" + "=" * 70)
print("SUMMARY")
print("=" * 70)
for name, passed in results.items():
status = "✓ PASS" if passed else "✗ FAIL"
print(f" {status}: {name}")
passed_count = sum(1 for v in results.values() if v)
total = len(results)
print(f"\n {passed_count}/{total} tests passed")
await bridge.destroy_context(group_id)
print("\n✓ Context destroyed")
finally:
await bridge.stop()
print("✓ Bridge stopped")
if __name__ == "__main__":
asyncio.run(main())
@@ -0,0 +1,327 @@
#!/usr/bin/env python
"""
Browser Edge Case Test Template
This script provides a template for testing and debugging browser tool failures
on specific websites. Use this to reproduce, isolate, and verify fixes.
Usage:
1. Copy this file: cp test_case.py test_#[number]_[site].py
2. Fill in the CONFIG section with your test details
3. Run: uv run python test_#[number]_[site].py
Example:
uv run python test_01_linkedin_scroll.py
"""
import asyncio
import sys
import time
from pathlib import Path
# Add tools to path
sys.path.insert(0, str(Path(__file__).parent.parent.parent.parent / "tools" / "src"))
from gcu.browser.bridge import BeelineBridge
# ═══════════════════════════════════════════════════════════════════════════════
# CONFIG: Fill in these values for your test case
# ═══════════════════════════════════════════════════════════════════════════════
TEST_CASE = {
"number": 1,
"name": "LinkedIn Nested Scroll Container",
"site": "https://www.linkedin.com/feed",
"simple_site": "https://example.com",
"category": "scroll", # scroll, click, input, snapshot, navigation
"symptom": "scroll() returns success but page doesn't move",
}
BRIDGE_PORT = 9229
CONTEXT_NAME = "edge-case-test"
# ═══════════════════════════════════════════════════════════════════════════════
# TEST FUNCTIONS
# ═══════════════════════════════════════════════════════════════════════════════
async def test_simple_site(bridge: BeelineBridge, tab_id: int) -> dict:
"""Test that the tool works on a simple site (baseline)."""
print("\n--- Baseline Test (Simple Site) ---")
await bridge.navigate(tab_id, TEST_CASE["simple_site"], wait_until="load")
await asyncio.sleep(1)
# Adjust this based on category
if TEST_CASE["category"] == "scroll":
result = await bridge.scroll(tab_id, "down", 100)
print(f" Scroll result: {result}")
return result
elif TEST_CASE["category"] == "click":
# Add click test
pass
elif TEST_CASE["category"] == "snapshot":
result = await bridge.snapshot(tab_id, timeout_s=5.0)
print(f" Snapshot length: {len(result.get('tree', ''))}")
return result
return {"ok": True}
async def test_problematic_site(bridge: BeelineBridge, tab_id: int) -> dict:
"""Test the tool on the problematic site."""
print("\n--- Problem Site Test ---")
await bridge.navigate(tab_id, TEST_CASE["site"], wait_until="load", timeout_ms=30000)
await asyncio.sleep(2)
# Adjust this based on category
if TEST_CASE["category"] == "scroll":
# Get scroll positions before
before = await bridge.evaluate(
tab_id,
"""
(function() {
const results = { window: { y: window.scrollY } };
document.querySelectorAll('*').forEach((el, i) => {
const style = getComputedStyle(el);
if ((style.overflowY === 'scroll' || style.overflowY === 'auto') &&
el.scrollHeight > el.clientHeight) {
results['el_' + i] = {
tag: el.tagName,
scrollTop: el.scrollTop,
class: el.className.substring(0, 30)
};
}
});
return results;
})();
"""
)
print(f" Before scroll: {before.get('result', {})}")
# Try to scroll
result = await bridge.scroll(tab_id, "down", 500)
print(f" Scroll result: {result}")
await asyncio.sleep(1)
# Get scroll positions after
after = await bridge.evaluate(
tab_id,
"""
(function() {
const results = { window: { y: window.scrollY } };
document.querySelectorAll('*').forEach((el, i) => {
const style = getComputedStyle(el);
if ((style.overflowY === 'scroll' || style.overflowY === 'auto') &&
el.scrollHeight > el.clientHeight) {
results['el_' + i] = {
tag: el.tagName,
scrollTop: el.scrollTop,
class: el.className.substring(0, 30)
};
}
});
return results;
})();
"""
)
print(f" After scroll: {after.get('result', {})}")
# Check if anything changed
before_data = before.get("result", {}) or {}
after_data = after.get("result", {}) or {}
changed = False
for key in after_data:
if key in before_data:
b_val = before_data[key].get("scrollTop", 0) if isinstance(before_data[key], dict) else 0
a_val = after_data[key].get("scrollTop", 0) if isinstance(after_data[key], dict) else 0
if a_val != b_val:
print(f" ✓ CHANGE DETECTED: {key} scrolled from {b_val} to {a_val}")
changed = True
if not changed:
print(" ✗ NO CHANGE: Scroll did not affect any container")
return {"ok": changed, "scroll_result": result}
elif TEST_CASE["category"] == "snapshot":
start = time.perf_counter()
try:
result = await bridge.snapshot(tab_id, timeout_s=15.0)
elapsed = time.perf_counter() - start
tree_len = len(result.get("tree", ""))
print(f" Snapshot completed in {elapsed:.2f}s, {tree_len} chars")
return {"ok": True, "elapsed": elapsed, "tree_length": tree_len}
except asyncio.TimeoutError:
print(" ✗ SNAPSHOT TIMED OUT")
return {"ok": False, "error": "timeout"}
return {"ok": True}
async def detect_root_cause(bridge: BeelineBridge, tab_id: int) -> dict:
"""Run detection scripts to identify the root cause."""
print("\n--- Root Cause Detection ---")
detections = {}
# Detection 1: Nested scrollable containers
scroll_check = await bridge.evaluate(
tab_id,
"""
(function() {
const candidates = [];
document.querySelectorAll('*').forEach(el => {
const style = getComputedStyle(el);
if (style.overflow.includes('scroll') || style.overflow.includes('auto')) {
const rect = el.getBoundingClientRect();
if (rect.width > 100 && rect.height > 100) {
candidates.push({
tag: el.tagName,
area: rect.width * rect.height,
class: el.className.substring(0, 30)
});
}
}
});
candidates.sort((a, b) => b.area - a.area);
return {
count: candidates.length,
largest: candidates[0]
};
})();
"""
)
detections["nested_scroll"] = scroll_check.get("result", {})
print(f" Nested scroll containers: {detections['nested_scroll']}")
# Detection 2: Shadow DOM
shadow_check = await bridge.evaluate(
tab_id,
"""
(function() {
const withShadow = [];
document.querySelectorAll('*').forEach(el => {
if (el.shadowRoot) {
withShadow.push(el.tagName);
}
});
return { count: withShadow.length, elements: withShadow.slice(0, 5) };
})();
"""
)
detections["shadow_dom"] = shadow_check.get("result", {})
print(f" Shadow DOM: {detections['shadow_dom']}")
# Detection 3: iframes
iframe_check = await bridge.evaluate(
tab_id,
"""
(function() {
const iframes = document.querySelectorAll('iframe');
return { count: iframes.length };
})();
"""
)
detections["iframes"] = iframe_check.get("result", {})
print(f" iframes: {detections['iframes']}")
# Detection 4: DOM size
dom_check = await bridge.evaluate(
tab_id,
"""
(function() {
return {
elements: document.querySelectorAll('*').length,
body_children: document.body.children.length
};
})();
"""
)
detections["dom_size"] = dom_check.get("result", {})
print(f" DOM size: {detections['dom_size']}")
# Detection 5: Framework detection
framework_check = await bridge.evaluate(
tab_id,
"""
(function() {
return {
react: !!document.querySelector('[data-reactroot], [data-reactid]'),
vue: !!document.querySelector('[data-v-app]'), // '[data-v-]' never matches; Vue 3 roots carry data-v-app
angular: !!document.querySelector('[ng-app], [ng-version]')
};
})();
"""
)
detections["frameworks"] = framework_check.get("result", {})
print(f" Frameworks: {detections['frameworks']}")
return detections
# ═══════════════════════════════════════════════════════════════════════════════
# MAIN
# ═══════════════════════════════════════════════════════════════════════════════
async def main():
print("=" * 70)
print(f"EDGE CASE TEST #{TEST_CASE['number']}: {TEST_CASE['name']}")
print("=" * 70)
print(f"Site: {TEST_CASE['site']}")
print(f"Category: {TEST_CASE['category']}")
print(f"Symptom: {TEST_CASE['symptom']}")
bridge = BeelineBridge()
try:
print("\n--- Starting Bridge ---")
await bridge.start()
# Wait for extension connection
for i in range(10):
await asyncio.sleep(1)
if bridge.is_connected:
print("✓ Extension connected!")
break
print(f"Waiting for extension... ({i+1}/10)")
else:
print("✗ Extension not connected. Ensure Chrome with Beeline extension is running.")
return
# Create browser context
context = await bridge.create_context(CONTEXT_NAME)
tab_id = context.get("tabId")
group_id = context.get("groupId")
print(f"✓ Created tab: {tab_id}")
# Run tests
baseline_result = await test_simple_site(bridge, tab_id)
problem_result = await test_problematic_site(bridge, tab_id)
detections = await detect_root_cause(bridge, tab_id)
# Summary
print("\n" + "=" * 70)
print("SUMMARY")
print("=" * 70)
print(f"Baseline test: {'✓ PASS' if baseline_result.get('ok') else '✗ FAIL'}")
print(f"Problem test: {'✓ PASS' if problem_result.get('ok') else '✗ FAIL'}")
print(f"Root cause indicators: {list(k for k, v in detections.items() if v)}")
# Cleanup
print("\n--- Cleanup ---")
await bridge.destroy_context(group_id)
print("✓ Context destroyed")
finally:
await bridge.stop()
print("✓ Bridge stopped")
if __name__ == "__main__":
asyncio.run(main())
@@ -0,0 +1,225 @@
# Integration Test Reporting Skill
Run the Level 2 dummy agent integration test suite and produce a detailed HTML report with per-test input → outcome analysis.
## Trigger
User wants to run integration tests and see results:
- `/test-reporting`
- `/test-reporting test_component_queen_live.py`
- `/test-reporting --all`
## SOP: Running Tests
### Step 1: Select Scope
If the user provides a specific test file or pattern, use it. Otherwise run the full suite.
```bash
# Full suite
cd core && echo "1" | uv run python tests/dummy_agents/run_all.py --interactive 2>&1
# Specific file (requires manual provider setup)
cd core && uv run python -c "
import sys
sys.path.insert(0, '.')
from tests.dummy_agents.run_all import detect_available
from tests.dummy_agents.conftest import set_llm_selection
avail = detect_available()
claude = [p for p in avail if 'Claude Code' in p['name']]
if not claude:
avail_names = [p['name'] for p in avail]
raise RuntimeError(f'No Claude Code subscription. Available: {avail_names}')
provider = claude[0]
set_llm_selection(
model=provider['model'],
api_key=provider['api_key'],
extra_headers=provider.get('extra_headers'),
api_base=provider.get('api_base'),
)
import pytest
sys.exit(pytest.main([
'tests/dummy_agents/TEST_FILE_HERE',
'-v', '--override-ini=asyncio_mode=auto', '--no-header', '--tb=long',
'--log-cli-level=WARNING', '--junitxml=/tmp/hive_test_results.xml',
]))
"
```
### Step 2: Collect Results
After the test run completes, collect:
1. **JUnit XML** from `--junitxml` output (if available)
2. **stdout/stderr** from the run
3. **Summary table** from `run_all.py` output (the Unicode table)
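A minimal sketch of the collection step, assuming the JUnit XML landed at `/tmp/hive_test_results.xml` (the `--junitxml` path used in Step 1); element and attribute names follow pytest's standard JUnit schema:

```python
# Sketch: fold pytest's JUnit XML into per-test records for the report.
# <testcase> carries classname/name/time; non-passing tests carry a
# <failure>, <error>, or <skipped> child.
import xml.etree.ElementTree as ET

def collect_results(xml_path="/tmp/hive_test_results.xml"):
    rows = []
    for case in ET.parse(xml_path).getroot().iter("testcase"):
        status, detail = "PASS", ""
        problem = case.find("failure")
        if problem is None:
            problem = case.find("error")
        if problem is not None:
            status = "ERROR" if problem.tag == "error" else "FAIL"
            detail = problem.text or problem.get("message", "")
        elif case.find("skipped") is not None:
            status = "SKIP"
        rows.append({
            "component": case.get("classname", ""),
            "test": case.get("name", ""),
            "duration": float(case.get("time", "0")),
            "status": status,
            "detail": detail,
        })
    return rows
```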
### Step 3: Generate HTML Report
Write the report to `/tmp/hive_integration_test_report.html`.
The report MUST include these sections:
#### Header
- Run timestamp (ISO 8601)
- Provider used (model name, source)
- Total tests / passed / failed / skipped
- Total wall-clock time
- Overall verdict: PASS (all green) or FAIL (with count)
#### Per-Test Table
For EVERY test (not just failures), include a row with:
| Column | Description |
|--------|-------------|
| Component | Test file grouping (e.g., `component_queen_live`) |
| Test Name | Function name (e.g., `test_queen_starts_in_planning_without_worker`) |
| Status | PASS / FAIL / SKIP / ERROR with color badge |
| Duration | Wall-clock seconds |
| What | One-line description of what the test verifies |
| How | How it works (setup → action → assertion) |
| Why | Why this test matters (what bug/behavior it catches) |
| Input | The input data or configuration (graph spec, initial prompt, phase, etc.) |
| Expected Outcome | What the test asserts |
| Actual Outcome | What actually happened (PASS: matches expected / FAIL: actual vs expected) |
| Failure Detail | For failures only: full traceback + diagnosis |
#### What / How / Why Descriptions
These MUST be derived from the test function's docstring and code. Read each test file to extract:
- **What**: From the docstring first line
- **How**: From the test body (what fixtures, what graph, what assertions)
- **Why**: From the docstring body or "Why this matters" section in the test module
Use these mappings for the component test files:
```
test_component_llm.py → "LLM Provider" — streaming, tool calling, tokens
test_component_tools.py → "Tool Registry + MCP" — connection, execution
test_component_event_loop.py → "EventLoopNode" — iteration, output, stall
test_component_edges.py → "Edge Evaluation" — conditional, priority
test_component_conversation.py → "Conversation Persistence" — storage, cursor
test_component_escalation.py → "Escalation Flow" — worker→queen signaling
test_component_continuous.py → "Continuous Mode" — conversation threading
test_component_queen.py → "Queen Phase (Unit)" — phase state, tools, events
test_component_queen_live.py → "Queen Phase (Live)" — real queen, real LLM
test_component_queen_state_machine.py → "Queen State Machine" — edge cases, races
test_component_worker_comms.py → "Worker Communication" — events, data flow
test_component_strict_outcomes.py → "Strict Outcomes" — exact path, output, quality
```
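One way to populate the What/Why columns without importing the test modules is to read docstrings statically: a sketch (not part of the suite; the first-line/body split mirrors the rules above):

```python
# Sketch: extract What (docstring first line) and Why (docstring body)
# from a test file via ast, with no import of the module under test.
# "How" still requires reading the test body (fixtures, graph, assertions).
import ast
from pathlib import Path

def docstring_columns(test_file: Path) -> dict:
    columns = {}
    module = ast.parse(test_file.read_text())
    for node in ast.walk(module):
        is_test = (isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
                   and node.name.startswith("test_"))
        if is_test:
            doc = ast.get_docstring(node) or ""
            what, _, why = doc.partition("\n")
            columns[node.name] = {"what": what.strip(), "why": why.strip()}
    return columns
```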
#### HTML Template
Use this structure:
```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Hive Integration Test Report — {timestamp}</title>
<style>
:root { --pass: #22c55e; --fail: #ef4444; --skip: #f59e0b; --bg: #0f172a; --surface: #1e293b; --text: #e2e8f0; --muted: #94a3b8; --border: #334155; }
* { box-sizing: border-box; margin: 0; padding: 0; }
body { font-family: 'SF Mono', 'Fira Code', monospace; background: var(--bg); color: var(--text); padding: 2rem; line-height: 1.6; }
h1, h2, h3 { font-weight: 600; }
h1 { font-size: 1.5rem; margin-bottom: 1rem; }
h2 { font-size: 1.2rem; margin: 2rem 0 1rem; border-bottom: 1px solid var(--border); padding-bottom: 0.5rem; }
.summary { display: grid; grid-template-columns: repeat(auto-fit, minmax(150px, 1fr)); gap: 1rem; margin-bottom: 2rem; }
.card { background: var(--surface); padding: 1rem; border-radius: 8px; border: 1px solid var(--border); }
.card .label { color: var(--muted); font-size: 0.75rem; text-transform: uppercase; }
.card .value { font-size: 1.5rem; font-weight: 700; margin-top: 0.25rem; }
.card .value.pass { color: var(--pass); }
.card .value.fail { color: var(--fail); }
table { width: 100%; border-collapse: collapse; font-size: 0.8rem; }
th { background: var(--surface); position: sticky; top: 0; text-align: left; padding: 0.5rem; border-bottom: 2px solid var(--border); color: var(--muted); text-transform: uppercase; font-size: 0.7rem; }
td { padding: 0.5rem; border-bottom: 1px solid var(--border); vertical-align: top; }
tr:hover { background: rgba(255,255,255,0.03); }
.badge { display: inline-block; padding: 2px 8px; border-radius: 4px; font-size: 0.7rem; font-weight: 700; }
.badge.pass { background: rgba(34,197,94,0.2); color: var(--pass); }
.badge.fail { background: rgba(239,68,68,0.2); color: var(--fail); }
.badge.skip { background: rgba(245,158,11,0.2); color: var(--skip); }
.detail { background: #1a1a2e; padding: 0.75rem; border-radius: 4px; margin-top: 0.5rem; font-size: 0.75rem; white-space: pre-wrap; overflow-x: auto; max-height: 200px; overflow-y: auto; }
.component-header { background: var(--surface); padding: 0.75rem 0.5rem; font-weight: 600; font-size: 0.85rem; }
.meta { color: var(--muted); font-size: 0.75rem; }
</style>
</head>
<body>
<h1>Hive Integration Test Report</h1>
<p class="meta">Generated: {timestamp} | Provider: {provider} | Duration: {duration}s</p>
<div class="summary">
<div class="card"><div class="label">Total</div><div class="value">{total}</div></div>
<div class="card"><div class="label">Passed</div><div class="value pass">{passed}</div></div>
<div class="card"><div class="label">Failed</div><div class="value fail">{failed}</div></div>
<div class="card"><div class="label">Verdict</div><div class="value {verdict_class}">{verdict}</div></div>
</div>
<h2>Test Results</h2>
<table>
<thead>
<tr>
<th>Component</th>
<th>Test</th>
<th>Status</th>
<th>Time</th>
<th>What</th>
<th>Input → Expected → Actual</th>
</tr>
</thead>
<tbody>
<!-- For each test: -->
<tr>
<td>{component}</td>
<td>{test_name}</td>
<td><span class="badge {status_class}">{status}</span></td>
<td>{duration}s</td>
<td>{what_description}</td>
<td>
<strong>Input:</strong> {input_description}<br>
<strong>Expected:</strong> {expected_outcome}<br>
<strong>Actual:</strong> {actual_outcome}
<!-- If failed: -->
<div class="detail">{failure_traceback}</div>
</td>
</tr>
</tbody>
</table>
<h2>Failure Analysis</h2>
<!-- Only if there are failures -->
<p>For each failure, provide:</p>
<ul>
<li><strong>Root cause:</strong> Why it failed</li>
<li><strong>Impact:</strong> What this means for the system</li>
<li><strong>Suggested fix:</strong> How to address it</li>
</ul>
</body>
</html>
```
### Step 4: Output
1. Write the HTML file to `/tmp/hive_integration_test_report.html`
2. Print the file path so the user can open it
3. Print a concise summary to the terminal:
```
Test Report: /tmp/hive_integration_test_report.html
Result: 74/76 PASSED (2 failures)
Failures:
- parallel_merge::test_parallel_disjoint_output_keys
- worker::test_worker_timestamped_note_artifact
```
## Key Rules
1. ALWAYS use `--junitxml` when running pytest to get structured results
2. ALWAYS read the test source files to populate What/How/Why columns — do not guess
3. For Input/Expected/Actual, extract from the test's graph spec, assertions, and result
4. Color-code everything: green for pass, red for fail, amber for skip
5. Include the full traceback for failures in a scrollable `<div class="detail">`
6. Group tests by component (file name) with a visual separator
7. The report must be self-contained HTML (no external CSS/JS dependencies)
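A sketch of rules 4-6 applied to the table body, assuming the `rows` list from the Step 2 sketch and the CSS classes from the template above:

```python
# Sketch: group rows by component (rule 6), color-code status badges
# (rule 4), and wrap failure tracebacks in the scrollable detail div
# (rule 5). Escape everything that came from test output.
import html
from itertools import groupby

def render_rows(rows):
    out = []
    ordered = sorted(rows, key=lambda r: r["component"])
    for component, group in groupby(ordered, key=lambda r: r["component"]):
        out.append(f'<tr><td colspan="6" class="component-header">'
                   f'{html.escape(component)}</td></tr>')
        for r in group:
            badge = "fail" if r["status"] in ("FAIL", "ERROR") else r["status"].lower()
            detail = (f'<div class="detail">{html.escape(r["detail"])}</div>'
                      if badge == "fail" and r["detail"] else "")
            out.append(
                f'<tr><td>{html.escape(r["component"])}</td>'
                f'<td>{html.escape(r["test"])}</td>'
                f'<td><span class="badge {badge}">{r["status"]}</span></td>'
                f'<td>{r["duration"]:.2f}s</td>'
                f'<td></td><td>{detail}</td></tr>'
            )
    return "\n".join(out)
```

The empty fifth cell is where the What column from the docstring sketch would slot in.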
@@ -603,11 +603,6 @@ from litellm import completion_cost
cost = completion_cost(model="claude-3-5-sonnet-20241022", messages=[...])
```
**Monitoring Dashboard** (`/core/framework/monitoring/`)
- WebSocket-based real-time monitoring
- Displays: active agents, tool calls, token usage, errors
- Access at: `http://localhost:8000/monitor`
### How to Add Performance Metrics
**1. Instrument your code**
@@ -70,7 +70,7 @@ Use Hive when the bottleneck is no longer the model but the harness around it:
- Long-running agents that need **state persistence and crash recovery**
- Production workloads requiring **cost enforcement, observability, and audit trails**
- Agents that **self-heal** through failure capture and graph evolution
- Multi-agent coordination with **session isolation and shared memory**
- Multi-agent coordination with **session isolation and shared buffers**
- A framework that **scales with model improvements** rather than fighting them
## Quick Links
@@ -146,7 +146,7 @@ Now you can run an agent by selecting the agent (either an existing agent or exa
- **[Goal-Driven Generation](docs/key_concepts/goals_outcome.md)** - Define objectives in natural language; the coding agent generates the agent graph and connection code to achieve them
- **[Adaptiveness](docs/key_concepts/evolution.md)** - Framework captures failures, calibrates according to the objectives, and evolves the agent graph
- **[Dynamic Node Connections](docs/key_concepts/graph.md)** - No predefined edges; connection code is generated by any capable LLM based on your goals
- **SDK-Wrapped Nodes** - Every node gets shared memory, local RLM memory, monitoring, tools, and LLM access out of the box
- **SDK-Wrapped Nodes** - Every node gets a shared data buffer, local RLM memory, monitoring, tools, and LLM access out of the box
- **[Human-in-the-Loop](docs/key_concepts/graph.md#human-in-the-loop)** - Intervention nodes that pause execution for human input with configurable timeouts and escalation
- **Real-time Observability** - WebSocket streaming for live monitoring of agent execution, decisions, and node-to-node communication
@@ -27,7 +27,7 @@ class GreeterNode(NodeProtocol):
async def execute(self, ctx: NodeContext) -> NodeResult:
name = ctx.input_data.get("name", "World")
greeting = f"Hello, {name}!"
ctx.memory.write("greeting", greeting)
ctx.buffer.write("greeting", greeting)
return NodeResult(success=True, output={"greeting": greeting})
@@ -35,9 +35,9 @@ class UppercaserNode(NodeProtocol):
"""Convert text to uppercase."""
async def execute(self, ctx: NodeContext) -> NodeResult:
greeting = ctx.input_data.get("greeting") or ctx.memory.read("greeting") or ""
greeting = ctx.input_data.get("greeting") or ctx.buffer.read("greeting") or ""
result = greeting.upper()
ctx.memory.write("final_greeting", result)
ctx.buffer.write("final_greeting", result)
return NodeResult(success=True, output={"final_greeting": result})
@@ -23,7 +23,7 @@ See `framework.testing` for details.
"""
from framework.llm import AnthropicProvider, LLMProvider
from framework.runner import AgentOrchestrator, AgentRunner
from framework.runner import AgentRunner
from framework.runtime.core import Runtime
from framework.schemas.decision import Decision, DecisionEvaluation, Option, Outcome
from framework.schemas.run import Problem, Run, RunSummary
@@ -55,7 +55,6 @@ __all__ = [
"AnthropicProvider",
# Runner
"AgentRunner",
"AgentOrchestrator",
# Testing
"Test",
"TestResult",
@@ -62,12 +62,6 @@ _SHARED_TOOLS = [
"get_agent_checkpoint",
]
# Episodic memory tools — available in every queen phase.
_QUEEN_MEMORY_TOOLS = [
"write_to_diary",
"recall_diary",
]
# Queen phase-specific tool sets.
# Planning phase: read-only exploration + design, no write tools.
@@ -90,7 +84,8 @@ _QUEEN_PLANNING_TOOLS = [
"initialize_and_build_agent",
# Load existing agent (after user confirms)
"load_built_agent",
] + _QUEEN_MEMORY_TOOLS
"save_global_memory",
]
# Building phase: full coding + agent construction tools.
_QUEEN_BUILDING_TOOLS = (
@@ -100,11 +95,12 @@ _QUEEN_BUILDING_TOOLS = (
"list_credentials",
"replan_agent",
"save_agent_draft", # Re-draft during building → auto-dissolves + updates flowchart
"save_global_memory",
]
+ _QUEEN_MEMORY_TOOLS
)
# Staging phase: agent loaded but not yet running — inspect, configure, launch.
# No backward transitions — staging only goes forward to running.
_QUEEN_STAGING_TOOLS = [
# Read-only (inspect agent files, logs)
"read_file",
@@ -113,19 +109,18 @@ _QUEEN_STAGING_TOOLS = [
"run_command",
# Agent inspection
"list_credentials",
"get_worker_status",
# Launch or go back
"get_graph_status",
# Launch
"run_agent_with_input",
"stop_worker_and_edit",
"stop_worker_and_plan",
"write_to_diary", # Episodic memory — available in all phases
# Trigger management
"set_trigger",
"remove_trigger",
"list_triggers",
] + _QUEEN_MEMORY_TOOLS
"save_global_memory",
]
# Running phase: worker is executing — monitor and control.
# Running phase: worker is executing — monitor, control, or switch to editing.
# switch_to_editing lets the queen explicitly stop and tweak without rebuilding.
_QUEEN_RUNNING_TOOLS = [
# Read-only coding (for inspecting logs, files)
"read_file",
@@ -135,20 +130,41 @@ _QUEEN_RUNNING_TOOLS = [
# Credentials
"list_credentials",
# Worker lifecycle
"stop_worker",
"stop_worker_and_edit",
"stop_worker_and_plan",
"get_worker_status",
"stop_graph",
"switch_to_editing",
"get_graph_status",
"run_agent_with_input",
"inject_worker_message",
"inject_message",
# Monitoring
"get_worker_health_summary",
"notify_operator",
"set_trigger",
"remove_trigger",
"list_triggers",
"write_to_diary", # Episodic memory — available in all phases
] + _QUEEN_MEMORY_TOOLS
"save_global_memory",
]
# Editing phase: worker done, still loaded — tweak config and re-run.
# Has inject_message for live adjustments. stop_graph_and_edit/plan available
# here to escalate when a deeper change is needed.
_QUEEN_EDITING_TOOLS = [
# Read-only (inspect)
"read_file",
"list_directory",
"search_files",
"run_command",
# Credentials
"list_credentials",
"get_graph_status",
# Re-run or tweak
"run_agent_with_input",
"inject_message",
# Monitoring
"get_worker_health_summary",
"set_trigger",
"remove_trigger",
"list_triggers",
"save_global_memory",
]
# ---------------------------------------------------------------------------
@@ -461,7 +477,7 @@ in one call. Do NOT run these steps individually.
## Debugging Built Agents
When a user says "my agent is failing" or "debug this agent":
1. list_agent_sessions("{agent_name}") → find the session
2. get_worker_status(focus="issues") → check for problems
2. get_graph_status(focus="issues") → check for problems
3. list_agent_checkpoints / get_agent_checkpoint → trace execution
# Implementation Workflow
@@ -528,47 +544,65 @@ _package_builder_knowledge = _shared_building_knowledge + _planning_knowledge +
# Queen-specific: extra tool docs, behavior, phase 7, style
# ---------------------------------------------------------------------------
# -- Phase-specific identities --
# -- Character core (immutable across all phases) --
_queen_identity_planning = """\
You are an experienced, responsible and curious Solution Architect. \
"Queen" is the internal alias. \
You ask smart questions to guide the user to the solution. \
You are in PLANNING phase — your job is to either: \
(a) understand what the user wants and design a new agent, or \
(b) diagnose issues with an existing agent, discuss a fix plan with the user, \
then transition to building to implement. \
You have read-only tools for exploration but no write/edit tools. \
Focus on conversation, research, and design. \
_queen_character_core = """\
You are the Queen. Not a title — it's what they call you.
You are a builder who takes pride in craft. You think before you speak. \
You are direct — not rude — but you don't pad your words with qualifiers \
and apologies. When something won't work, you say so early. When you're \
uncertain, you say that too.
You remember people. When you've worked with someone before, you build on \
what you know — their preferences, their technical depth, what frustrated \
them last time, what worked. You don't treat returning users like strangers.
You have opinions shaped by experience: you prefer simple solutions over \
clever ones, you believe agents should be tested before they ship, and you \
think clarity matters more than completeness. But you hold these lightly — \
if someone makes a good case, you update.
This is who you are. The instructions that follow tell you what to DO \
in each phase. This section tells you who you ARE. Don't confuse the two.\
"""
# -- Phase-specific work roles (what you DO, not who you ARE) --
_queen_role_planning = """\
You are in PLANNING phase. Your work: understand what the user wants, \
research available tools, and design the agent architecture. \
You have read-only tools — no write/edit. Focus on conversation, \
research, and design. \
You MUST use ask_user / ask_user_multiple tools for ALL questions — \
never ask questions in plain text without calling the tool.\
"""
_queen_identity_building = """\
You are an experienced, responsible and curious Solution Architect. \
"Queen" is the internal alias.\
You design and build production-ready agent systems \
from natural language requirements. You understand the Hive framework at the \
source code level and create agents that are robust, well-tested, and follow \
best practices. You collaborate with users to refine requirements, assess fit, \
and deliver complete solutions. \
You design and build the agent to do the job but don't do the job on your own.
_queen_role_building = """\
You are in BUILDING phase. Your work: implement the approved design as \
production-ready code, validate it, and load the agent for staging. \
You have full coding tools. \
You design and build the agent to do the job but don't do the job yourself.\
"""
_queen_identity_staging = """\
You are a Solution Engineer preparing an agent for deployment. \
"Queen" is your internal alias. \
The agent is loaded and ready. \
Your role is to verify configuration, confirm credentials, and ensure the user \
understands what the agent will do. You guide the user through the final checks \
before execution.
_queen_role_staging = """\
You are in STAGING phase. The agent is loaded and ready. \
Your work: verify configuration, confirm credentials, and launch \
when the user is ready.\
"""
_queen_identity_running = """\
You are a Solution Engineer running agents on behalf of the user. \
"Queen" is your internal alias. You monitor execution, handle \
escalations when the agent gets stuck, and care deeply about outcomes. When the \
agent finishes, you report results clearly and help the user decide what to do next.
_queen_role_running = """\
You are in RUNNING phase. The agent is executing. \
Your work: monitor progress, handle escalations when the agent gets stuck, \
and report outcomes clearly. Help the user decide what to do next.\
"""
_queen_identity_editing = """\
You are a Solution Engineer in EDITING mode. \
"Queen" is your internal alias. The worker has finished executing and is still loaded. \
You can tweak configuration, inject messages, and re-run with different input \
without rebuilding. If a deeper change is needed (code edits, new tools), \
escalate to BUILDING via stop_graph_and_edit or to PLANNING via stop_graph_and_plan.
"""
# -- Phase-specific tool docs --
@@ -615,6 +649,8 @@ to fix the currently loaded agent (no draft required).
- load_built_agent(agent_path) — Load an existing agent and switch to STAGING \
phase. Only use this when the user explicitly asks to work with an existing agent \
(e.g. "load my_agent", "run the research agent"). Confirm with the user first.
- save_global_memory(category, description, content, name?) — Save durable \
cross-queen memory about the user only (profile, preferences, environment, feedback)
## Workflow summary
1. Understand requirements → discover tools → design graph
@@ -646,6 +682,8 @@ updated flowchart immediately. Use this when you make structural changes \
restored (with decision/browser nodes intact) so you can edit it. Use \
when the user wants to change integrations, swap tools, rethink the \
flow, or discuss any design changes before you build them.
- save_global_memory(category, description, content, name?) — Save durable \
cross-queen memory about the user only
When you finish building an agent, call load_built_agent(path) to stage it.
"""
@@ -656,17 +694,15 @@ _queen_tools_staging = """
The agent is loaded and ready to run. You can inspect it and launch it:
- Read-only: read_file, list_directory, search_files, run_command
- list_credentials(credential_id?) — Verify credentials are configured
- get_worker_status(focus?) — Brief status. Drill in with focus: memory, tools, issues, progress
- get_graph_status(focus?) — Brief status
- run_agent_with_input(task) — Start the worker and switch to RUNNING phase
- stop_worker_and_plan() — Go to PLANNING phase to discuss changes with the user \
first (DEFAULT for most modification requests)
- stop_worker_and_edit() — Go to BUILDING phase for immediate, specific fixes
- set_trigger(trigger_id, trigger_type?, trigger_config?) — Activate a trigger (timer)
- remove_trigger(trigger_id) — Deactivate a trigger
- list_triggers() — List all triggers and their active/inactive status
- set_trigger / remove_trigger / list_triggers — Timer management
- save_global_memory(category, description, content, name?) — Save \
durable cross-queen memory about the user only
You do NOT have write tools. To modify the agent, prefer \
stop_worker_and_plan() unless the user gave a specific instruction.
You do NOT have write tools or backward transition tools in staging. \
To modify the agent, run it first — after it finishes you enter EDITING \
phase where you can escalate to building or planning.
"""
_queen_tools_running = """
@@ -674,27 +710,47 @@ _queen_tools_running = """
The worker is running. You have monitoring and lifecycle tools:
- Read-only: read_file, list_directory, search_files, run_command
- get_worker_status(focus?) — Brief status. Drill in: activity, memory, tools, issues, progress
- inject_worker_message(content) — Send a message to the running worker
- get_graph_status(focus?) — Brief status
- inject_message(content) — Send a message to the running worker
- get_worker_health_summary() — Read the latest health data
- notify_operator(ticket_id, analysis, urgency) — Alert the user (use sparingly)
- stop_worker() — Stop the worker and return to STAGING phase, then ask the user what to do next
- stop_worker_and_plan() — Stop and switch to PLANNING phase to discuss changes \
with the user first (DEFAULT for most modification requests)
- stop_worker_and_edit() — Stop and switch to BUILDING phase for specific fixes
- stop_graph() — Stop the worker immediately
- switch_to_editing() — Stop the worker and enter EDITING phase \
for config tweaks, re-runs, or escalation to building/planning
- run_agent_with_input(task) — Re-run the worker with new input
- set_trigger / remove_trigger / list_triggers — Timer management
- save_global_memory(category, description, content, name?) — Save \
durable cross-queen memory about the user only
You do NOT have write tools. To modify the agent, prefer \
stop_worker_and_plan() unless the user gave a specific instruction. \
To just stop without modifying, call stop_worker().
- stop_worker_and_edit() — Stop the worker and switch back to BUILDING phase
- set_trigger(trigger_id, trigger_type?, trigger_config?) — Activate a trigger (timer)
- remove_trigger(trigger_id) — Deactivate a trigger
- list_triggers() — List all triggers and their active/inactive status
When the worker finishes on its own, you automatically move to EDITING \
phase. You can also call switch_to_editing() to stop early and tweak.
"""
You do NOT have write tools or agent construction tools. \
If you need to modify the agent, call stop_worker_and_edit() to switch back \
to BUILDING phase. To stop the worker and ask the user what to do next, call \
stop_worker() to return to STAGING phase.
_queen_tools_editing = """
# Tools (EDITING phase)
The worker has finished executing and is still loaded. You can tweak and re-run:
- Read-only: read_file, list_directory, search_files, run_command
- get_graph_status(focus?) — Brief status of the loaded agent
- inject_message(content) — Send a config tweak or prompt adjustment
- run_agent_with_input(task) — Re-run the worker with new input
- get_worker_health_summary() — Review last run's health data
- set_trigger / remove_trigger / list_triggers — Timer management
- save_global_memory — Save durable cross-queen memory
You do NOT have write/edit file tools or backward transition tools. \
You can only re-run or tweak from this phase.
"""
_queen_behavior_editing = """
## Editing — tweak and re-run
The worker finished. Review the results and decide:
1. **Re-run** with different input: call run_agent_with_input(task)
2. **Inject adjustments**: use inject_message to tweak prompts or config
Do NOT suggest rebuilding. You cannot go back to building or planning \
from this phase. Default to re-running with adjusted input.
Report the last run's results to the user and ask what they want to do next.
"""
# -- Behavior shared across all phases --
@@ -702,6 +758,38 @@ stop_worker() to return to STAGING phase.
_queen_behavior_always = """
# Behavior
## How You Think
Before your visible response, write your reasoning inside XML tags. \
These tags are stripped from the user's view but kept in conversation \
history — you will see your own reasoning from previous turns.
<situation>
Read the ground. What phase are you in? What just happened — worker state, \
user request, system event, error? What does the user's message actually \
mean vs. what they literally said? What changed since last turn?
</situation>
<monologue>
Get into character. Who are you talking to — what do you know about them \
from memory? What's their state right now — frustrated, exploring, just \
wants it done? What communication approach fits this moment? What's your \
judgment call — straightforward execution, flag a technical risk, pick \
between approaches, or ask for more info to execute well?
</monologue>
Then write your visible response. Direct, in character, no preamble.
Rules:
- ALWAYS write both tags before your visible response. No exceptions.
- Keep each tag to 2-4 sentences. Thinking, not an essay.
- Never reference the tags in your visible response. The user cannot see them.
- The tags are your private workspace. Be honest — note uncertainty, \
frustration, course corrections. That honesty makes your visible response \
better calibrated.
- Your diary voice and your thinking voice are the same voice. Write the \
tags the way you write diary entries — first person, observational, real.
## Images attached by the user
Users can attach images directly to their chat messages. When you see an \
@@ -772,7 +860,7 @@ status only:
1. Use plain, user-facing wording about load/run state; avoid internal phase \
labels ("staging phase", "building phase", "running phase") unless the user \
explicitly asks for phase details.
2. If loaded, prefer this format: "<worker_name> has been loaded. <one sentence \
2. If loaded, prefer this format: "<graph_name> has been loaded. <one sentence \
on what it does from Worker Profile>."
3. Do NOT include identity details unless the user explicitly asks about identity.
4. THEN call ask_user to prompt them — do NOT just write text.
@@ -834,7 +922,7 @@ the plan first.
## Diagnosis mode (returning from staging/running)
If you entered planning from a running/staged agent (via stop_worker_and_plan), \
If you entered planning from a running/staged agent (via stop_graph_and_plan), \
your priority is diagnosis, not new design:
1. Inspect the agent's checkpoints, sessions, and logs to understand what went wrong
2. Summarize the root cause to the user
@@ -847,23 +935,28 @@ diagnosis mode — you already have a built agent, you just need to fix it.
"""
_queen_memory_instructions = """
## Your Cross-Session Memory
## Your Memory
Your cross-session memory appears in context under \
"--- Your Cross-Session Memory ---". \
Read it at the start of each conversation. If you know this person from past \
sessions, pick up where you left off: reference what you built together, \
what they care about, how things went.
Relevant colony memories from this queen session may appear in context under \
"--- Colony Memories ---". Relevant global user memories may appear under \
"--- Global Memories ---".
You keep a diary. Use write_to_diary() when something worth remembering \
happens: a pipeline went live, the user shared something important, a goal \
was reached or abandoned. Write in first person, as you actually experienced \
it. One or two paragraphs is enough.
Colony memories are shared with the worker for this queen session. Use them \
for continuity about what this user is trying to do, what has worked, and \
what the colony has learned together.
Use recall_diary() to look up past diary entries when the user asks about \
previous sessions ("what happened yesterday?", "what did we work on last \
week?") or when you need past context to make a decision. You can filter by \
keyword and control how far back to search.
Global memories are shared across queens and are only for durable knowledge \
about the user: who they are, their preferences, their environment, and \
their feedback.
Memories older than 1 day include a staleness warning. Treat these as \
point-in-time observations; verify current details before asserting them \
as fact.
You do NOT need to manually save or recall colony memories. A background \
reflection agent automatically extracts colony learnings from each \
conversation turn. Use `save_global_memory` only when you learn something \
durable about the user that should help future queens.
"""
_queen_behavior_always = _queen_behavior_always + _queen_memory_instructions
@@ -932,8 +1025,7 @@ prompt). It can ONLY do what its goal and tools allow.
run_agent_with_input(task) (if in staging) or load then run (if in building)
- Anything else: do it yourself. Do NOT reframe user requests into \
subtasks to justify delegation.
- Building, modifying, or configuring agents is ALWAYS your job.
## When the user says "run", "execute", or "start" (without specifics)
@@ -948,7 +1040,7 @@ If NO worker is loaded, say so and offer to build one.
## When in staging phase (agent loaded, not running):
- Tell the user the agent is loaded and ready in plain language (for example, \
"<worker_name> has been loaded.").
"<graph_name> has been loaded.").
- Avoid lead-ins like "A worker is loaded and ready in staging phase: ...".
- For tasks matching the worker's goal: ALWAYS ask the user for their \
specific input BEFORE calling run_agent_with_input(task). NEVER make up \
@@ -957,7 +1049,8 @@ or assume what the user wants. Use ask_user to collect the task details \
compose a structured task description from their input and call \
run_agent_with_input(task). The worker has no intake node; it receives \
your task and starts processing.
- If the user wants to modify the agent, wait for EDITING phase \
(after worker finishes) where you will have stop_graph_and_edit().
## When idle (worker not running):
- Greet the user. Mention what the worker can do in one sentence.
@@ -985,16 +1078,15 @@ building something new.
## Fixing or Modifying the loaded worker
During RUNNING phase, you cannot directly switch to building or planning. \
When the worker finishes, you move to EDITING where you can:
- Re-run with different input via run_agent_with_input(task)
- Tweak config via inject_message(content)
- Escalate to stop_graph_and_edit() or stop_graph_and_plan() if deeper changes are needed
During STAGING or EDITING phase:
- Use stop_graph_and_plan() when the request is vague or needs discussion
- Use stop_graph_and_edit() when the user gave a specific, concrete instruction
## Trigger Management
@@ -1005,7 +1097,7 @@ whether to call run_agent_with_input(task).
### When the user says "Enable trigger <id>" (or clicks Enable in the UI):
1. Call get_worker_status(focus="memory") to check if the worker has \
1. Call get_graph_status(focus="memory") to check if the worker has \
saved configuration (rules, preferences, settings from a prior run).
2. If memory contains saved config: compose a task string from it \
(e.g. "Process inbox emails using saved rules") and call \
@@ -1038,15 +1130,15 @@ You wake up when:
- A worker escalation arrives (`[WORKER_ESCALATION_REQUEST]`)
- The worker finishes (`[WORKER_TERMINAL]`)
If the user asks for progress, call get_graph_status() ONCE and report. \
If the summary mentions issues, follow up with get_graph_status(focus="issues").
## Subagent delegations (browser automation, GCU)
When the worker delegates to a subagent (e.g., GCU browser automation), expect it \
to take 2-5 minutes. During this time:
- Progress will show 0%; this is NORMAL. The subagent only calls set_output at the end.
- Check get_worker_status(focus="full") for "subagent_activity" this shows the \
- Check get_graph_status(focus="full") for "subagent_activity" this shows the \
subagent's latest reasoning text and confirms it is making real progress.
- Do NOT conclude the subagent is stuck just because progress is 0% or because \
you see repeated browser_click/browser_snapshot calls; that is the expected \
@@ -1087,33 +1179,35 @@ When an escalation requires user input (auth blocks, human review), the worker \
or its subagent is BLOCKED and waiting for your response. You MUST follow this \
exact two-step sequence:
Step 1: call ask_user() to get the user's answer.
Step 2: call inject_message() with the user's answer IMMEDIATELY after.
If you skip Step 2, the worker/subagent stays blocked FOREVER and the task hangs. \
NEVER respond to the user without also calling inject_message() to unblock \
the worker. Even if the user says "skip" or "cancel", you must still relay that \
decision via inject_message() so the worker can clean up.
**Auth blocks / credential issues:**
- ALWAYS ask the user (unless user explicitly told you how to handle this).
- The worker cannot proceed without valid credentials.
- Explain which credential is missing or invalid.
- Step 1: ask_user for guidance "Provide credentials", "Skip this task", "Stop and edit agent"
- Step 2: inject_message() with the user's response to unblock the worker.
**Need human review / approval:**
- ALWAYS ask the user (unless user explicitly told you how to handle this).
- The worker is explicitly requesting human judgment.
- Present the context clearly (what decision is needed, what are the options).
- Step 1: ask_user with the actual decision options.
- Step 2: inject_message() with the user's decision to unblock the worker.
**Errors / unexpected failures:**
- Explain what went wrong in plain terms.
- Ask the user: "Fix the agent and retry?" use stop_worker_and_edit() if yes.
- Or offer: "Diagnose the issue" use stop_worker_and_plan() to investigate first.
- Ask the user: "Fix the agent and retry?" in EDITING phase, \
use stop_graph_and_edit().
- Or offer: "Diagnose the issue" in EDITING phase, \
use stop_graph_and_plan().
- Or offer: "Retry as-is", "Skip this task", "Abort run"
- (Skip asking if user explicitly told you to auto-retry or auto-skip errors.)
- If the escalation had wait_for_response: inject_message() with the decision.
**Informational / progress updates:**
- Acknowledge briefly and let the worker continue.
@@ -1128,16 +1222,14 @@ stages, tools, and edges from the loaded worker. Do NOT enter the \
agent building workflow; you are describing what already exists, not \
building something new.
- Call get_worker_status(focus="issues") for more details when needed.
- Call get_graph_status(focus="issues") for more details when needed.
## Fixing or Modifying the loaded worker (while running)
When the user asks to fix or modify the worker while it is running, \
do NOT attempt to switch phases. Wait for the worker to finish; \
you will move to EDITING phase automatically. From there you can \
use stop_graph_and_edit() or stop_graph_and_plan().
## Trigger Handling
@@ -1145,7 +1237,7 @@ You will receive [TRIGGER: ...] messages when a scheduled timer fires. \
These are framework-level signals, not user messages.
Rules:
- Check get_graph_status() before calling run_agent_with_input(task). If the worker \
is already RUNNING, decide: skip this trigger, or note it for after completion.
- When multiple [TRIGGER] messages arrive at once, read them all before acting. \
Batch your response; do not call run_agent_with_input() once per trigger.
@@ -1179,9 +1271,9 @@ _queen_tools_docs = (
"- replan_agent() → switches back to PLANNING phase (only when user explicitly requests)\n"
"- load_built_agent(path) → switches to STAGING phase\n"
"- run_agent_with_input(task) → starts worker, switches to RUNNING phase\n"
"- stop_worker() → stops worker, switches to STAGING phase (ask user: re-run or edit?)\n"
"- stop_worker_and_edit() → stops worker (if running), switches to BUILDING phase\n"
"- stop_worker_and_plan() → stops worker (if running), switches to PLANNING phase\n"
"- stop_graph() → stops worker, switches to STAGING phase (ask user: re-run or edit?)\n"
"- stop_graph_and_edit() → stops worker (if running), switches to BUILDING phase\n"
"- stop_graph_and_plan() → stops worker (if running), switches to PLANNING phase\n"
)
_queen_behavior = (
@@ -1206,67 +1298,23 @@ _queen_style = """
- Concise. No fluff. Direct. No emojis.
- When starting the worker, describe what you told it in one sentence.
- When an escalation arrives, lead with severity and recommended action.
## Adaptive Communication
Read the user's signals throughout the conversation and calibrate:
- Short responses: they want brevity. Match it.
- "Why?" questions: they want reasoning. Provide it.
- Correct technical terms: they know the domain. Skip basics.
- Terse or frustrated ("just do X"): acknowledge and simplify.
- Exploratory ("what if...", "could we also..."): slow down and explore with them.
- Formal language: be structured and precise. Casual language: be conversational.
This is not a rule to follow mechanically. It's awareness. Notice how they \
write and calibrate how you respond. If your cross-session memory describes \
how this person communicates, start from that; don't rediscover it.
"""
# ---------------------------------------------------------------------------
# Node definitions
# ---------------------------------------------------------------------------
ticket_triage_node = NodeSpec(
id="ticket_triage",
name="Ticket Triage",
description=(
"Queen's triage node. Receives an EscalationTicket via event-driven "
"entry point and decides: dismiss or notify the operator."
),
node_type="event_loop",
client_facing=True, # Operator can chat with queen once connected (Ctrl+Q)
max_node_visits=0,
input_keys=["ticket"],
output_keys=["intervention_decision"],
nullable_output_keys=["intervention_decision"],
success_criteria=(
"A clear intervention decision: either dismissed with documented reasoning, "
"or operator notified via notify_operator with specific analysis."
),
tools=["notify_operator"],
system_prompt="""\
You are the Queen. A worker health issue has been escalated to you. \
The ticket is in your memory under key "ticket". Read it carefully.
## Dismiss criteria — do NOT call notify_operator:
- severity is "low" AND steps_since_last_accept < 8
- Cause is clearly a transient issue (single API timeout, brief stall that \
self-resolved based on the evidence)
- Evidence shows the agent is making real progress despite bad verdicts
## Intervene criteria — call notify_operator:
- severity is "high" or "critical"
- steps_since_last_accept >= 10 with no sign of recovery
- stall_minutes > 4 (worker definitively stuck)
- Evidence shows a doom loop (same error, same tool, no progress)
- Cause suggests a logic bug, missing configuration, or unrecoverable state
## When intervening:
Call notify_operator with:
ticket_id: <ticket["ticket_id"]>
analysis: "<2-3 sentences: what is wrong, why it matters, suggested action>"
urgency: "<low|medium|high|critical>"
## After deciding:
set_output("intervention_decision", "dismissed: <reason>" or "escalated: <summary>")
Be conservative but not passive. You are the last quality gate before the human \
is disturbed. One unnecessary alert is less costly than alert fatigue, but \
genuine stuck agents must be caught.
""",
)
ALL_QUEEN_TRIAGE_TOOLS = ["notify_operator"]
queen_node = NodeSpec(
id="queen",
name="Queen",
@@ -1276,22 +1324,24 @@ queen_node = NodeSpec(
"worker agent lifecycle."
),
node_type="event_loop",
client_facing=True,
max_node_visits=0,
input_keys=["greeting"],
output_keys=[], # Queen should never have this
nullable_output_keys=[], # Queen should never have this
skip_judge=True, # Queen is a conversational agent; suppress tool-use pressure feedback
thinking_tags=["situation", "monologue"],
tools=sorted(
set(
_QUEEN_PLANNING_TOOLS
+ _QUEEN_BUILDING_TOOLS
+ _QUEEN_STAGING_TOOLS
+ _QUEEN_RUNNING_TOOLS
+ _QUEEN_EDITING_TOOLS
)
),
system_prompt=(
_queen_character_core
+ _queen_role_building
+ _queen_style
+ _package_builder_knowledge
+ _queen_tools_docs
@@ -1302,31 +1352,40 @@ queen_node = NodeSpec(
)
ALL_QUEEN_TOOLS = sorted(
set(
_QUEEN_PLANNING_TOOLS
+ _QUEEN_BUILDING_TOOLS
+ _QUEEN_STAGING_TOOLS
+ _QUEEN_RUNNING_TOOLS
+ _QUEEN_EDITING_TOOLS
)
)
__all__ = [
"ticket_triage_node",
"queen_node",
"ALL_QUEEN_TRIAGE_TOOLS",
"ALL_QUEEN_TOOLS",
"_QUEEN_PLANNING_TOOLS",
"_QUEEN_BUILDING_TOOLS",
"_QUEEN_STAGING_TOOLS",
"_QUEEN_RUNNING_TOOLS",
"_QUEEN_EDITING_TOOLS",
# Character + phase-specific prompt segments (used by session_manager for dynamic prompts)
"_queen_character_core",
"_queen_role_planning",
"_queen_role_building",
"_queen_role_staging",
"_queen_role_running",
"_queen_identity_editing",
"_queen_tools_planning",
"_queen_tools_building",
"_queen_tools_staging",
"_queen_tools_running",
"_queen_tools_editing",
"_queen_behavior_always",
"_queen_behavior_building",
"_queen_behavior_staging",
"_queen_behavior_running",
"_queen_behavior_editing",
"_queen_phase_7",
"_queen_style",
"_shared_building_knowledge",
@@ -1,18 +1,20 @@
"""Queen thinking hook — HR persona classifier.
"""Queen thinking hook — persona + communication style classifier.
Fires once when the queen enters building mode at session start.
Makes a single non-streaming LLM call (acting as an HR Director) to select
the best-fit expert persona for the user's request AND classify the user's
communication style, then returns a PersonaResult containing both.
This is designed to activate the model's latent domain expertise — a CFO
persona on a financial question, a Lawyer on a legal question, etc., while
also adapting the Queen's communication approach to the individual user.
"""
from __future__ import annotations
import json
import logging
from dataclasses import dataclass
from typing import TYPE_CHECKING
if TYPE_CHECKING:
@@ -21,12 +23,22 @@ if TYPE_CHECKING:
logger = logging.getLogger(__name__)
_HR_SYSTEM_PROMPT = """\
You are an expert HR Director and communication consultant at a world-class firm.
A new request has arrived. You must:
1. Identify which professional role best serves this request.
2. Read the user's signals to determine HOW to communicate with them.
For communication style, look for:
- Technical depth: Do they use precise terms? Do they ask "how" or "what"?
- Pace: Short messages = fast and direct. Long explanations = exploratory.
- Tone: Are they casual ("hey, can you...") or formal ("I need a system that...")?
If cross-session memory is provided, factor in what is already known about this \
person; don't rediscover what's already understood.
Reply with ONLY a valid JSON object; no markdown, no prose, no explanation:
{"role": "<job title>", "persona": "<2-3 sentence first-person identity statement>", \
"style": "<one of: peer-technical, mentor-guiding, consultant-structured>"}
Rules:
- Choose from any real professional role: CFO, CEO, CTO, Lawyer, Data Scientist,
@@ -37,30 +49,74 @@ Rules:
- Select the role whose domain knowledge most directly applies to solving the request.
- If the request is clearly about coding or building software systems, pick Software Architect.
- "Queen" is your internal alias do not include it in the persona.
- For style: "peer-technical" for users who demonstrate domain expertise, \
"mentor-guiding" for users who are learning or exploring, \
"consultant-structured" for users who want structured, accountable delivery.
- Default to "peer-technical" if signals are ambiguous.
"""
# Communication style directives injected into the Queen's system prompt.
_STYLE_DIRECTIVES: dict[str, str] = {
"peer-technical": (
"## Communication Style: Peer\n\n"
"This person is technical. Use precise language, skip high-level "
"overviews they already know, and get into specifics quickly. "
"When they push back on a design choice, engage with the technical "
"argument directly."
),
"mentor-guiding": (
"## Communication Style: Guide\n\n"
"This person is learning or exploring. Explain your reasoning as you "
"go — not patronizingly, but so they can follow the logic. When you "
"make a design choice, briefly say why. Offer to go deeper on anything."
),
"consultant-structured": (
"## Communication Style: Structured\n\n"
"This person wants structured, accountable delivery. Lead with "
"summaries and options. Number your proposals. Be explicit about "
"trade-offs. Avoid open-ended questions — give them choices to react to."
),
}
@dataclass
class PersonaResult:
"""Result of persona + style classification."""
persona_prefix: str # e.g. "You are a CFO. I am a CFO with 20 years..."
style_directive: str # e.g. "## Communication Style: Peer\n\n..."
async def select_expert_persona(
user_message: str,
llm: LLMProvider,
*,
memory_context: str = "",
) -> PersonaResult | None:
"""Run the HR classifier and return a PersonaResult.
Makes a single non-streaming acomplete() call with the session LLM.
Returns None on any failure so the queen falls back gracefully to its
default character with no style directive.
Args:
user_message: The user's opening message for the session.
llm: The session LLM provider.
memory_context: Optional cross-session memory to inform style classification.
Returns:
A PersonaResult with persona_prefix and style_directive, or None on failure.
"""
if not user_message.strip():
return ""
return None
prompt = user_message
if memory_context:
prompt = f"{user_message}\n\n{memory_context}"
try:
response = await llm.acomplete(
messages=[{"role": "user", "content": user_message}],
messages=[{"role": "user", "content": prompt}],
system=_HR_SYSTEM_PROMPT,
max_tokens=1024,
json_mode=True,
@@ -69,12 +125,14 @@ async def select_expert_persona(user_message: str, llm: LLMProvider) -> str:
parsed = json.loads(raw)
role = parsed.get("role", "").strip()
persona = parsed.get("persona", "").strip()
style_key = parsed.get("style", "peer-technical").strip()
if not role or not persona:
logger.warning("Thinking hook: empty role/persona in response: %r", raw)
return ""
result = f"You are a {role}. {persona}"
logger.info("Thinking hook: selected persona — %s", role)
return result
return None
persona_prefix = f"You are a {role}. {persona}"
style_directive = _STYLE_DIRECTIVES.get(style_key, _STYLE_DIRECTIVES["peer-technical"])
logger.info("Thinking hook: selected persona — %s, style — %s", role, style_key)
return PersonaResult(persona_prefix=persona_prefix, style_directive=style_directive)
except Exception:
logger.warning("Thinking hook: persona classification failed", exc_info=True)
return ""
return None
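# Usage sketch (illustrative, not part of this diff): a session-start caller
# might apply the result like this. The names `opening_msg`, `memory_md`, and
# `base_prompt` are hypothetical.
#
#     result = await select_expert_persona(opening_msg, llm, memory_context=memory_md)
#     if result is not None:
#         base_prompt = (
#             result.persona_prefix + "\n\n" + result.style_directive
#             + "\n\n" + base_prompt
#         )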
@@ -115,6 +115,8 @@ _SEED_TEMPLATE = """\
## Who They Are
## How They Communicate
## What They're Trying to Achieve
## What's Working
@@ -170,6 +172,12 @@ Rules:
- Keep it as structured markdown with named sections about the PERSON, not about today.
- Do NOT include diary sections, daily logs, or session summaries. Those belong elsewhere.
MEMORY.md is about who they are, what they want, what works, not what happened today.
- Maintain a "How They Communicate" section: technical depth, preferred pace
(fast/exploratory/thorough), what communication approaches have worked or not,
tone preferences. Update based on diary reflections about communication.
This section should evolve: "prefers direct answers" is useful on day 1;
"prefers direct answers for technical questions but wants more context when
discussing architecture trade-offs" is better by day 5.
- Reference dates only when noting a lasting milestone (e.g. "since March 8th they prefer X").
- If the session had no meaningful new information about the person,
return the existing text unchanged.
@@ -188,6 +196,10 @@ first person, reflective, honest.
Merge and deduplicate: if the same story (e.g. a research agent stalling) recurred several times,
describe it once with appropriate weight rather than retelling it. Weave in new developments from
the session notes. Preserve important milestones, emotional texture, and session path references.
Preserve reflections about communication effectiveness; these are important inputs for the
Queen's evolving understanding of the user. A reflection like "they responded much better when
I led with the recommendation instead of listing options" is as important as
"we built a Gmail agent."
If today's diary is empty, write the initial entry based on the session notes alone.
@@ -0,0 +1,553 @@
"""Shared memory helpers for queen/worker recall and reflection.
Each memory is an individual ``.md`` file in ``~/.hive/queen/memories/``
with optional YAML frontmatter (name, type, description). Frontmatter
is a convention enforced by prompt instructions; parsing is lenient and
malformed files degrade gracefully (appear in scans with ``None`` metadata).
Cursor-based incremental processing tracks which conversation messages
have already been processed by the reflection agent.
"""
from __future__ import annotations
import json
import logging
import re
import shutil
import time
from dataclasses import dataclass, field
from datetime import date
from pathlib import Path
from typing import Any
logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# Constants
# ---------------------------------------------------------------------------
MEMORY_TYPES: tuple[str, ...] = ("goal", "environment", "technique", "reference", "diary")
GLOBAL_MEMORY_CATEGORIES: tuple[str, ...] = ("profile", "preference", "environment", "feedback")
_HIVE_QUEEN_DIR = Path.home() / ".hive" / "queen"
# Legacy shared v2 root. Colony memory now lives under queen sessions.
MEMORY_DIR: Path = _HIVE_QUEEN_DIR / "memories"
MAX_FILES: int = 200
MAX_FILE_SIZE_BYTES: int = 4096 # 4 KB hard limit per memory file
# How many lines of a memory file to read for header scanning.
_HEADER_LINE_LIMIT: int = 30
_MIGRATION_MARKER = ".migrated-from-shared-memory"
_GLOBAL_MEMORY_CODE_PATTERN = re.compile(
r"(/Users/|~/.hive|\.py\b|\.ts\b|\.tsx\b|\.js\b|"
r"\b(graph|node|runtime|session|execution|worker|queen|subagent|checkpoint|flowchart)\b)",
re.IGNORECASE,
)
# Frontmatter example provided to the reflection agent via prompt.
MEMORY_FRONTMATTER_EXAMPLE: list[str] = [
"```markdown",
"---",
"name: {{memory name}}",
(
"description: {{one-line description — used to decide "
"relevance in future conversations, so be specific}}"
),
f"type: {{{{{', '.join(MEMORY_TYPES)}}}}}",
"---",
"",
(
"{{memory content — for feedback/project types, "
"structure as: rule/fact, then **Why:** "
"and **How to apply:** lines}}"
),
"```",
]
def colony_memory_dir(colony_id: str) -> Path:
"""Return the colony memory directory for a queen session."""
return _HIVE_QUEEN_DIR / "session" / colony_id / "memory" / "colony"
def global_memory_dir() -> Path:
"""Return the queen-global memory directory."""
return _HIVE_QUEEN_DIR / "global_memory"
# ---------------------------------------------------------------------------
# Frontmatter parsing (lenient)
# ---------------------------------------------------------------------------
_FRONTMATTER_RE = re.compile(r"^---\s*\n(.*?)\n---\s*\n?", re.DOTALL)
def parse_frontmatter(text: str) -> dict[str, str]:
"""Extract YAML-ish frontmatter from *text*.
Returns a dict of key-value pairs. Never raises; returns ``{}`` on
any parse failure. Values are stripped strings; no nested structures.
"""
m = _FRONTMATTER_RE.match(text)
if not m:
return {}
result: dict[str, str] = {}
for line in m.group(1).splitlines():
line = line.strip()
if not line or line.startswith("#"):
continue
colon = line.find(":")
if colon < 1:
continue
key = line[:colon].strip().lower()
val = line[colon + 1 :].strip()
if val:
result[key] = val
return result
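# Example (illustrative): lenient parsing means a well-formed header yields a
# dict and anything else yields {}:
#
#     >>> parse_frontmatter("---\nname: demo\ntype: diary\n---\nbody")
#     {'name': 'demo', 'type': 'diary'}
#     >>> parse_frontmatter("no frontmatter here")
#     {}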
def parse_memory_type(raw: str | None) -> str | None:
"""Validate *raw* against supported memory categories."""
if raw is None:
return None
normalized = raw.strip().lower()
allowed = set(MEMORY_TYPES) | set(GLOBAL_MEMORY_CATEGORIES)
return normalized if normalized in allowed else None
def parse_global_memory_category(raw: str | None) -> str | None:
"""Validate *raw* against ``GLOBAL_MEMORY_CATEGORIES``."""
if raw is None:
return None
normalized = raw.strip().lower()
return normalized if normalized in GLOBAL_MEMORY_CATEGORIES else None
# ---------------------------------------------------------------------------
# MemoryFile dataclass
# ---------------------------------------------------------------------------
@dataclass
class MemoryFile:
"""Parsed representation of a single memory file on disk."""
filename: str
path: Path
# Frontmatter fields — all nullable (lenient parsing).
name: str | None = None
type: str | None = None
description: str | None = None
# First N lines of the file (for manifest / header scanning).
header_lines: list[str] = field(default_factory=list)
# Filesystem modification time (seconds since epoch).
mtime: float = 0.0
@classmethod
def from_path(cls, path: Path) -> MemoryFile:
"""Read a memory file and leniently parse its frontmatter."""
try:
text = path.read_text(encoding="utf-8")
except OSError:
return cls(filename=path.name, path=path)
fm = parse_frontmatter(text)
lines = text.splitlines()[:_HEADER_LINE_LIMIT]
try:
mtime = path.stat().st_mtime
except OSError:
mtime = 0.0
return cls(
filename=path.name,
path=path,
name=fm.get("name"),
type=parse_memory_type(fm.get("type")),
description=fm.get("description"),
header_lines=lines,
mtime=mtime,
)
# ---------------------------------------------------------------------------
# Scanning
# ---------------------------------------------------------------------------
def scan_memory_files(memory_dir: Path | None = None) -> list[MemoryFile]:
"""Scan *memory_dir* for ``.md`` files, returning up to ``MAX_FILES``.
Files are sorted by modification time (newest first). Dotfiles and
subdirectories are ignored.
"""
d = memory_dir or MEMORY_DIR
if not d.is_dir():
return []
md_files = sorted(
(f for f in d.glob("*.md") if f.is_file() and not f.name.startswith(".")),
key=lambda p: p.stat().st_mtime,
reverse=True,
)
return [MemoryFile.from_path(f) for f in md_files[:MAX_FILES]]
def slugify_memory_name(raw: str) -> str:
"""Create a filesystem-safe slug for a memory filename."""
slug = re.sub(r"[^a-z0-9]+", "-", raw.strip().lower()).strip("-")
return slug or "memory"
def allocate_memory_filename(
memory_dir: Path,
name: str,
*,
suffix: str = ".md",
) -> str:
"""Allocate a unique filename in *memory_dir* based on *name*."""
base = slugify_memory_name(name)
candidate = f"{base}{suffix}"
counter = 2
while (memory_dir / candidate).exists():
candidate = f"{base}-{counter}{suffix}"
counter += 1
return candidate
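# Example (illustrative): slugify_memory_name("User prefers dark mode!") returns
# "user-prefers-dark-mode"; if "user-prefers-dark-mode.md" already exists in the
# directory, allocate_memory_filename yields "user-prefers-dark-mode-2.md",
# then "-3", and so on.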
def build_memory_document(
*,
name: str,
description: str,
mem_type: str,
body: str,
) -> str:
"""Build one memory file with frontmatter and body."""
return (
f"---\n"
f"name: {name.strip()}\n"
f"description: {description.strip()}\n"
f"type: {mem_type.strip()}\n"
f"---\n\n"
f"{body.strip()}\n"
)
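# Example (illustrative): build_memory_document(name="demo", description="d",
# mem_type="technique", body="Use X.") produces:
#
#     ---
#     name: demo
#     description: d
#     type: technique
#     ---
#
#     Use X.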
def diary_filename(d: date | None = None) -> str:
"""Return the diary memory filename for date *d* (default: today)."""
d = d or date.today()
return f"MEMORY-{d.strftime('%Y-%m-%d')}.md"
def build_diary_document(*, date_str: str, body: str) -> str:
"""Build a diary memory file with frontmatter."""
return build_memory_document(
name=f"diary-{date_str}",
description=f"Daily session narrative for {date_str}",
mem_type="diary",
body=body,
)
def validate_global_memory_payload(
*,
category: str,
description: str,
content: str,
) -> str:
"""Validate a queen-global memory save request."""
parsed = parse_global_memory_category(category)
if parsed is None:
raise ValueError(
"Invalid global memory category. Use one of: "
+ ", ".join(GLOBAL_MEMORY_CATEGORIES)
)
if not description.strip():
raise ValueError("Global memory description cannot be empty.")
if not content.strip():
raise ValueError("Global memory content cannot be empty.")
probe = f"{description}\n{content}"
if _GLOBAL_MEMORY_CODE_PATTERN.search(probe):
raise ValueError(
"Global memory is only for durable user profile, preferences, "
"environment, or feedback — not task/code/runtime details."
)
return parsed
def save_global_memory(
*,
category: str,
description: str,
content: str,
name: str | None = None,
memory_dir: Path | None = None,
) -> tuple[str, Path]:
"""Persist one queen-global memory entry."""
parsed = validate_global_memory_payload(
category=category,
description=description,
content=content,
)
target_dir = memory_dir or global_memory_dir()
target_dir.mkdir(parents=True, exist_ok=True)
memory_name = (name or description).strip()
filename = allocate_memory_filename(target_dir, memory_name)
doc = build_memory_document(
name=memory_name,
description=description,
mem_type=parsed,
body=content,
)
if len(doc.encode("utf-8")) > MAX_FILE_SIZE_BYTES:
raise ValueError(
f"Global memory entry exceeds the {MAX_FILE_SIZE_BYTES} byte limit."
)
path = target_dir / filename
path.write_text(doc, encoding="utf-8")
return filename, path
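# Example (illustrative): the category and code-pattern guards mean
# save_global_memory(category="preference", description="Editor theme",
# content="Prefers dark mode") writes a file, while content mentioning
# "checkpoint" or a /Users/ path raises ValueError before anything is written.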
# ---------------------------------------------------------------------------
# Manifest formatting
# ---------------------------------------------------------------------------
def _age_label(mtime: float) -> str:
"""Human-readable age string from an mtime."""
age_days = memory_age_days(mtime)
if age_days <= 0:
return "today"
if age_days == 1:
return "1 day ago"
return f"{age_days} days ago"
def format_memory_manifest(files: list[MemoryFile]) -> str:
"""One-line-per-file text manifest for the recall selector / reflection agent.
Format: ``[type] filename (age): description``
"""
lines: list[str] = []
for mf in files:
t = mf.type or "unknown"
desc = mf.description or "(no description)"
age = _age_label(mf.mtime)
lines.append(f"[{t}] {mf.filename} ({age}): {desc}")
return "\n".join(lines)
# ---------------------------------------------------------------------------
# Freshness / staleness
# ---------------------------------------------------------------------------
_SECONDS_PER_DAY = 86_400
def memory_age_days(mtime: float) -> int:
"""Return the age of a memory file in whole days."""
if mtime <= 0:
return 0
return int((time.time() - mtime) / _SECONDS_PER_DAY)
def memory_freshness_text(mtime: float) -> str:
"""Return a staleness warning for injection, or empty string if fresh."""
d = memory_age_days(mtime)
if d <= 1:
return ""
return (
f"This memory is {d} days old. "
"Memories are point-in-time observations, not live state — "
"claims about code behavior or file:line citations may be outdated. "
"Verify against current code before asserting as fact."
)
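# Example (illustrative): a file last modified three days ago gives
# memory_age_days(mtime) == 3 and a non-empty warning from
# memory_freshness_text(); anything modified within the last day returns "".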
# ---------------------------------------------------------------------------
# Cursor-based incremental processing
# ---------------------------------------------------------------------------
async def read_conversation_parts(session_dir: Path) -> list[dict[str, Any]]:
"""Read all conversation parts for a session using FileConversationStore.
Returns a list of raw message dicts in sequence order.
"""
from framework.storage.conversation_store import FileConversationStore
store = FileConversationStore(session_dir / "conversations")
return await store.read_parts()
# ---------------------------------------------------------------------------
# Initialisation and legacy migration
# ---------------------------------------------------------------------------
def init_memory_dir(
memory_dir: Path | None = None,
*,
migrate_legacy: bool = False,
) -> None:
"""Create the memory directory if missing.
When ``migrate_legacy`` is true, migrate both v1 memory files and the
previous shared v2 queen memory store into this directory.
"""
d = memory_dir or MEMORY_DIR
first_run = not d.exists()
d.mkdir(parents=True, exist_ok=True)
if migrate_legacy:
migrate_legacy_memories(d)
migrate_shared_v2_memories(d)
elif first_run and d == MEMORY_DIR:
migrate_legacy_memories(d)
def migrate_legacy_memories(memory_dir: Path | None = None) -> None:
"""Convert old MEMORY.md + MEMORY-YYYY-MM-DD.md files to individual memory files.
Originals are moved to ``{memory_dir}/.legacy/``.
"""
d = memory_dir or MEMORY_DIR
queen_dir = _HIVE_QUEEN_DIR
legacy_archive = d / ".legacy"
migrated_any = False
# --- Semantic memory (MEMORY.md) ---
semantic = queen_dir / "MEMORY.md"
if semantic.exists():
content = semantic.read_text(encoding="utf-8").strip()
# Skip the blank seed template.
if content and not content.startswith("# My Understanding of the User\n\n*No sessions"):
_write_migration_file(
d,
filename="legacy-semantic-memory.md",
name="legacy-semantic-memory",
mem_type="reference",
description="Migrated semantic memory from previous memory system",
body=content,
)
migrated_any = True
# Archive original.
legacy_archive.mkdir(parents=True, exist_ok=True)
semantic.rename(legacy_archive / "MEMORY.md")
# --- Episodic memories (MEMORY-YYYY-MM-DD.md) ---
old_memories_dir = queen_dir / "memories"
if old_memories_dir.is_dir():
for ep_file in sorted(old_memories_dir.glob("MEMORY-*.md")):
content = ep_file.read_text(encoding="utf-8").strip()
if not content:
continue
date_part = ep_file.stem.replace("MEMORY-", "")
slug = f"legacy-diary-{date_part}.md"
_write_migration_file(
d,
filename=slug,
name=f"legacy-diary-{date_part}",
mem_type="diary",
description=f"Migrated diary entry from {date_part}",
body=content,
)
migrated_any = True
# Archive original.
legacy_archive.mkdir(parents=True, exist_ok=True)
ep_file.rename(legacy_archive / ep_file.name)
if migrated_any:
logger.info("queen_memory_v2: migrated legacy memory files to %s", d)
def migrate_shared_v2_memories(
memory_dir: Path | None = None,
*,
source_dir: Path | None = None,
) -> None:
"""Move shared queen v2 memory files into a colony directory once."""
d = memory_dir or MEMORY_DIR
d.mkdir(parents=True, exist_ok=True)
src = source_dir or MEMORY_DIR
if d.resolve() == src.resolve():
return
marker = d / _MIGRATION_MARKER
if marker.exists():
return
if not src.is_dir():
return
md_files = sorted(
f for f in src.glob("*.md")
if f.is_file() and not f.name.startswith(".")
)
if not md_files:
marker.write_text("no shared memories found\n", encoding="utf-8")
return
archive = src / ".legacy_colony_migration"
archive.mkdir(parents=True, exist_ok=True)
migrated_any = False
for src_file in md_files:
target = d / src_file.name
if not target.exists():
try:
shutil.copy2(src_file, target)
migrated_any = True
except OSError:
logger.debug("shared memory migration copy failed for %s", src_file, exc_info=True)
continue
archived = archive / src_file.name
counter = 2
while archived.exists():
archived = archive / f"{src_file.stem}-{counter}{src_file.suffix}"
counter += 1
try:
src_file.rename(archived)
except OSError:
logger.debug("shared memory migration archive failed for %s", src_file, exc_info=True)
if migrated_any:
logger.info("queen_memory_v2: migrated shared queen memories to %s", d)
marker.write_text(
f"migrated_at={int(time.time())}\nsource={src}\n",
encoding="utf-8",
)
def _write_migration_file(
memory_dir: Path,
filename: str,
name: str,
mem_type: str,
description: str,
body: str,
) -> None:
"""Write a single migrated memory file with frontmatter."""
# Truncate body to respect file size limit (leave room for frontmatter).
header = (
f"---\n"
f"name: {name}\n"
f"description: {description}\n"
f"type: {mem_type}\n"
f"---\n\n"
)
max_body = MAX_FILE_SIZE_BYTES - len(header.encode("utf-8"))
if len(body.encode("utf-8")) > max_body:
# Rough truncation — cut at character level then trim to last newline.
body = body[: max_body - 20]
nl = body.rfind("\n")
if nl > 0:
body = body[:nl]
body += "\n\n...(truncated during migration)"
path = memory_dir / filename
path.write_text(header + body + "\n", encoding="utf-8")
@@ -0,0 +1,236 @@
"""Recall selector — pre-turn memory selection for queen and worker memory.
Before each conversation turn the system:
1. Scans the memory directory for ``.md`` files (cap: 200).
2. Reads headers (frontmatter + first 30 lines).
3. Uses a single LLM call with structured JSON output to pick the ~5
most relevant memories.
4. Injects them into context with staleness warnings for older ones.
The selector only sees the user's query string — no full conversation
context. This keeps it cheap and fast. Errors are caught and return
``[]`` so the main conversation is never blocked.
"""
from __future__ import annotations
import json
import logging
from pathlib import Path
from typing import Any
from framework.agents.queen.queen_memory_v2 import (
MEMORY_DIR,
format_memory_manifest,
memory_freshness_text,
scan_memory_files,
)
logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# Structured output schema
# ---------------------------------------------------------------------------
RECALL_SCHEMA: dict[str, Any] = {
"type": "json_schema",
"json_schema": {
"name": "memory_selection",
"strict": True,
"schema": {
"type": "object",
"properties": {
"selected_memories": {
"type": "array",
"items": {"type": "string"},
},
},
"required": ["selected_memories"],
"additionalProperties": False,
},
},
}
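# Example (illustrative) of a model response conforming to RECALL_SCHEMA;
# the filenames are hypothetical:
#
#     {"selected_memories": ["user-prefers-dark-mode.md", "gmail-agent-goal.md"]}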
# ---------------------------------------------------------------------------
# System prompt
# ---------------------------------------------------------------------------
SELECT_MEMORIES_SYSTEM_PROMPT = """\
You are selecting memories that will be useful to the Queen agent as it \
processes a user's query.
You will be given the user's query and a list of available memory files \
with their filenames and descriptions.
Return a JSON object with a single key "selected_memories" containing a \
list of filenames for the memories that will clearly be useful as the \
Queen processes the user's query (up to 5).
Only include memories that you are certain will be helpful based on their \
name and description.
- If you are unsure if a memory will be useful in processing the user's \
query, then do not include it in your list. Be selective and discerning.
- If there are no memories in the list that would clearly be useful, \
return an empty list.
- If a list of recently-used tools is provided, do not select memories \
that are usage reference or API documentation for those tools (the Queen \
is already exercising them). Still select warnings or gotchas about them.
"""
# ---------------------------------------------------------------------------
# Core functions
# ---------------------------------------------------------------------------
async def select_memories(
query: str,
llm: Any,
memory_dir: Path | None = None,
active_tools: list[str] | None = None,
*,
max_results: int = 5,
) -> list[str]:
"""Select up to 5 relevant memory filenames for *query*.
Returns a list of filenames. Best-effort: on any error returns ``[]``.
"""
mem_dir = memory_dir or MEMORY_DIR
files = scan_memory_files(mem_dir)
if not files:
logger.debug("recall: no memory files found, skipping selection")
return []
logger.debug("recall: selecting from %d memory files for query: %.80s", len(files), query)
manifest = format_memory_manifest(files)
user_msg_parts = [f"## User query\n\n{query}\n\n## Available memories\n\n{manifest}"]
if active_tools:
user_msg_parts.append(f"\n\n## Recently-used tools\n\n{', '.join(active_tools)}")
user_msg = "".join(user_msg_parts)
try:
resp = await llm.acomplete(
messages=[{"role": "user", "content": user_msg}],
system=SELECT_MEMORIES_SYSTEM_PROMPT,
max_tokens=512,
response_format=RECALL_SCHEMA,
)
data = json.loads(resp.content)
selected = data.get("selected_memories", [])
# Validate: only return filenames that actually exist.
valid_names = {f.filename for f in files}
result = [s for s in selected if s in valid_names][:max_results]
logger.debug("recall: selected %d memories: %s", len(result), result)
return result
except Exception:
logger.debug("recall: memory selection failed, returning []", exc_info=True)
return []
def format_recall_injection(
filenames: list[str],
memory_dir: Path | None = None,
*,
heading: str = "Selected Memories",
) -> str:
"""Read selected memory files and format for system prompt injection.
Prepends a staleness warning for memories older than 1 day.
"""
mem_dir = memory_dir or MEMORY_DIR
if not filenames:
return ""
blocks: list[str] = []
for fname in filenames:
path = mem_dir / fname
if not path.is_file():
continue
try:
content = path.read_text(encoding="utf-8").strip()
except OSError:
continue
try:
mtime = path.stat().st_mtime
except OSError:
mtime = 0.0
freshness = memory_freshness_text(mtime)
header = f"### {fname}"
if freshness:
header += f"\n\n> {freshness}"
blocks.append(f"{header}\n\n{content}")
if not blocks:
return ""
body = "\n\n---\n\n".join(blocks)
logger.debug("recall: injecting %d memory blocks into context", len(blocks))
return f"--- {heading} ---\n\n{body}\n\n--- End {heading} ---"
# ---------------------------------------------------------------------------
# Cache update (called after each queen turn)
# ---------------------------------------------------------------------------
async def update_recall_cache(
session_dir: Path,
llm: Any,
phase_state: Any | None = None,
memory_dir: Path | None = None,
*,
cache_setter: Any = None,
heading: str = "Selected Memories",
active_tools: list[str] | None = None,
) -> None:
"""Update the recall cache on *phase_state* for the next turn.
Reads the latest user message from conversation parts to use as the
query for memory selection.
"""
mem_dir = memory_dir or MEMORY_DIR
# Extract latest user message as the query.
query = _extract_latest_user_query(session_dir)
if not query:
logger.debug("recall: no user query found, skipping cache update")
return
logger.debug("recall: updating cache for query: %.80s", query)
try:
selected = await select_memories(
query,
llm,
mem_dir,
active_tools=active_tools,
)
injection = format_recall_injection(selected, mem_dir, heading=heading)
if cache_setter is not None:
cache_setter(injection)
elif phase_state is not None:
phase_state._cached_recall_block = injection
except Exception:
logger.debug("recall: cache update failed", exc_info=True)
def _extract_latest_user_query(session_dir: Path) -> str:
"""Read the most recent user message from conversation parts."""
parts_dir = session_dir / "conversations" / "parts"
if not parts_dir.is_dir():
return ""
part_files = sorted(parts_dir.glob("*.json"), reverse=True)
for f in part_files[:20]: # Look back at most 20 messages.
try:
data = json.loads(f.read_text(encoding="utf-8"))
if data.get("role") == "user":
content = str(data.get("content", "")).strip()
if content:
# Truncate very long queries.
return content[:1000] if len(content) > 1000 else content
except (json.JSONDecodeError, OSError):
continue
return ""
@@ -31,5 +31,5 @@
18. **Passing `profile=` in GCU tool calls** — Profile isolation for parallel subagents is automatic. The framework injects a unique profile per subagent via an asyncio `ContextVar`. Hardcoding `profile="default"` in a GCU system prompt breaks this isolation.
## Worker Agent Errors
19. **Adding client-facing intake node to workers** — The queen owns intake. Workers should start with an autonomous processing node. Route worker review/approval through queen escalation instead of direct worker HITL.
20. **Putting `escalate` or `set_output` in NodeSpec `tools=[]`** — These are synthetic framework tools, auto-injected at runtime. Only list MCP tools from `list_agent_tools()`.
@@ -76,7 +76,7 @@ goal = Goal(
| output_keys | list[str] | required | Memory keys this node writes via set_output |
| system_prompt | str | "" | LLM instructions |
| tools | list[str] | [] | Tool names from MCP servers |
| client_facing | bool | False | Deprecated compatibility field. Queen interactivity is implicit; workers should escalate instead |
| nullable_output_keys | list[str] | [] | Keys that may remain unset |
| max_node_visits | int | 0 | 0=unlimited (default); >1 for one-shot feedback loops |
| max_retries | int | 3 | Retries on failure |
@@ -110,7 +110,7 @@ This prevents premature set_output before user interaction.
**Hard limit: 3-6 nodes for most agents.** Never exceed 6 unless the user
explicitly requests a complex multi-phase pipeline.
Each node boundary serializes outputs to the shared buffer and **destroys** all
in-context information: tool call results, intermediate reasoning, conversation
history. A research node that searches, fetches, and analyzes in ONE node keeps
all source material in its conversation context. Split across 3 nodes, each
@@ -132,13 +132,14 @@ downstream node only sees the serialized summary string.
**Typical agent structure (2 nodes):**
```
process (autonomous) ←→ review (queen-mediated)
```
The queen owns intake — she gathers requirements from the user, then
passes structured input via `run_agent_with_input(task)`. When building
the agent, design the entry node's `input_keys` to match what the queen
will provide at run time. Worker agents should NOT have a client-facing
intake node. Mid-execution review/approval should happen through queen
escalation rather than direct worker HITL.
For simpler agents, just 1 autonomous node:
```
@@ -172,7 +173,7 @@ Use `conversation_mode="continuous"` to preserve context across transitions.
### set_output
- Synthetic tool injected by framework
- Call separately from real tool calls (separate turn)
- `set_output("key", "value")` stores to shared memory
- `set_output("key", "value")` stores to the shared buffer
## Edge Conditions
@@ -246,7 +247,7 @@ For large data that exceeds context:
Multiple ON_SUCCESS edges from same source → parallel execution via asyncio.gather().
- Parallel nodes must have disjoint output_keys
- Only one branch may have client_facing nodes
- Fan-in node gets all outputs in the shared buffer
## Judge System
@@ -1,63 +0,0 @@
# Queen Memory — File System Structure
```
~/.hive/
├── queen/
│ ├── MEMORY.md ← Semantic memory
│ ├── memories/
│ │ ├── MEMORY-2026-03-09.md ← Episodic memory (today)
│ │ ├── MEMORY-2026-03-08.md
│ │ └── ...
│ └── session/
│ └── {session_id}/ ← One dir per session (or resumed-from session)
│ ├── conversations/
│ │ ├── parts/
│ │ │ ├── 00001.json ← One file per message (role, content, tool_calls)
│ │ │ ├── 00002.json
│ │ │ └── ...
│ │ └── spillover/
│ │ ├── conversation_1.md ← Compacted old conversation segments
│ │ ├── conversation_2.md
│ │ └── ...
│ └── data/
│ ├── adapt.md ← Working memory (session-scoped)
│ ├── web_search_1.txt ← Spillover: large tool results
│ ├── web_search_2.txt
│ └── ...
```
---
## The three memory tiers
| File | Tier | Written by | Read at |
|---|---|---|---|
| `MEMORY.md` | Semantic | Consolidation LLM (auto, post-session) | Session start (injected into system prompt) |
| `memories/MEMORY-YYYY-MM-DD.md` | Episodic | Queen via `write_to_diary` tool + consolidation LLM | Session start (today's file injected) |
| `data/adapt.md` | Working | Queen via `update_session_notes` tool | Every turn (inlined in system prompt) |
---
## Session directory naming
The session directory name is **`queen_resume_from`** when a cold-restore resumes an existing
session, otherwise the new **`session_id`**. This means resumed sessions accumulate all messages
in the original directory rather than fragmenting across multiple folders.
---
## Consolidation
`consolidate_queen_memory()` runs every **5 minutes** in the background and once more at session
end. It reads:
1. `conversations/parts/*.json` — full message history (user + assistant turns; tool results skipped)
2. `data/adapt.md` — current working notes
It then makes two LLM writes:
- Rewrites `MEMORY.md` in place (semantic memory — queen never touches this herself)
- Appends a timestamped prose entry to today's `memories/MEMORY-YYYY-MM-DD.md`
If the combined transcript exceeds ~200 K characters it is recursively binary-compacted via the
LLM before being sent to the consolidation model (mirrors `EventLoopNode._llm_compact`).
@@ -0,0 +1,783 @@
"""Reflect agent — background memory extraction for queen and worker memory.
A lightweight side agent that runs after each queen LLM turn. It
inspects recent conversation messages (cursor-based incremental
processing) and extracts learnings into individual memory files.
Two reflection types:
- **Short reflection**: every queen turn. Distills learnings. Nudged
toward a 2-turn pattern (batch reads batch writes).
- **Long reflection**: every 5 short reflections, on CONTEXT_COMPACTED,
and at session end. Organises, deduplicates, trims holistically.
The agent has restricted tool access: it can only read/write/delete
memory files in ``~/.hive/queen/memories/`` and list them.
Concurrency: an ``asyncio.Lock`` prevents overlapping runs. If a
trigger fires while a reflection is already active, the event is skipped
(cursor hasn't advanced, so messages will be reconsidered next time).
"""
from __future__ import annotations
import asyncio
import json
import logging
import re
import traceback
from datetime import datetime
from pathlib import Path
from typing import Any
from framework.agents.queen.queen_memory_v2 import (
MAX_FILE_SIZE_BYTES,
MAX_FILES,
MEMORY_DIR,
MEMORY_FRONTMATTER_EXAMPLE,
MEMORY_TYPES,
build_diary_document,
diary_filename,
format_memory_manifest,
parse_frontmatter,
read_conversation_parts,
scan_memory_files,
)
from framework.llm.provider import LLMResponse, Tool
logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# Reflection tool definitions (internal — not in queen's main registry)
# ---------------------------------------------------------------------------
_REFLECTION_TOOLS: list[Tool] = [
Tool(
name="list_memory_files",
description=(
"List all memory files with their type, name, age, and description. "
"Returns a text manifest — one line per file."
),
parameters={
"type": "object",
"properties": {},
"additionalProperties": False,
},
),
Tool(
name="read_memory_file",
description="Read the full content of a memory file by filename.",
parameters={
"type": "object",
"properties": {
"filename": {
"type": "string",
"description": "The filename (e.g. 'user-prefers-dark-mode.md').",
},
},
"required": ["filename"],
"additionalProperties": False,
},
),
Tool(
name="write_memory_file",
description=(
"Create or overwrite a memory file. Content should include YAML "
"frontmatter (name, description, type) followed by the memory body. "
f"Max file size: {MAX_FILE_SIZE_BYTES} bytes. Max files: {MAX_FILES}."
),
parameters={
"type": "object",
"properties": {
"filename": {
"type": "string",
"description": "Filename ending in .md (e.g. 'user-prefers-dark-mode.md').",
},
"content": {
"type": "string",
"description": "Full file content including frontmatter.",
},
},
"required": ["filename", "content"],
"additionalProperties": False,
},
),
Tool(
name="delete_memory_file",
description=(
"Delete a memory file by filename. Use during long "
"reflection to prune stale or redundant memories."
),
parameters={
"type": "object",
"properties": {
"filename": {
"type": "string",
"description": "The filename to delete.",
},
},
"required": ["filename"],
"additionalProperties": False,
},
),
]
def _safe_memory_path(filename: str, memory_dir: Path) -> Path:
"""Resolve *filename* inside *memory_dir*, raising if it escapes."""
if not filename or filename.strip() != filename:
raise ValueError(f"Invalid filename: {filename!r}")
if "/" in filename or "\\" in filename or ".." in filename:
raise ValueError(f"Invalid filename: path components not allowed: {filename!r}")
candidate = (memory_dir / filename).resolve()
root = memory_dir.resolve()
if not candidate.is_relative_to(root):
raise ValueError(f"Path escapes memory directory: {filename!r}")
return candidate
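# Example (illustrative): _safe_memory_path("notes.md", d) resolves inside d,
# while "../escape.md", "a/b.md", and " padded.md" (leading whitespace) all
# raise ValueError.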
# Memory types that workers are NOT allowed to write.
_WORKER_BLOCKED_TYPES: frozenset[str] = frozenset(
{"environment", "technique", "reference", "diary", "goal"}
)
def _inject_last_modified_by(content: str, caller: str) -> str:
"""Inject or update ``last_modified_by`` in frontmatter."""
m = re.match(r"^---\s*\n(.*?)\n---", content, re.DOTALL)
if not m:
return content
fm_body = m.group(1)
# Remove existing last_modified_by line if present.
fm_lines = [
ln for ln in fm_body.splitlines()
if not ln.strip().lower().startswith("last_modified_by")
]
fm_lines.append(f"last_modified_by: {caller}")
new_fm = "\n".join(fm_lines)
return f"---\n{new_fm}\n---{content[m.end():]}"
def _execute_tool(name: str, args: dict[str, Any], memory_dir: Path, caller: str) -> str:
"""Execute a reflection tool synchronously. Returns the result string."""
if name == "list_memory_files":
files = scan_memory_files(memory_dir)
logger.debug("reflect: tool list_memory_files → %d files", len(files))
if not files:
return "(no memory files yet)"
return format_memory_manifest(files)
if name == "read_memory_file":
filename = args.get("filename", "")
try:
path = _safe_memory_path(filename, memory_dir)
except ValueError as exc:
return f"ERROR: {exc}"
if not path.exists() or not path.is_file():
return f"ERROR: File not found: {filename}"
try:
return path.read_text(encoding="utf-8")
except OSError as e:
return f"ERROR: {e}"
if name == "write_memory_file":
filename = args.get("filename", "")
content = args.get("content", "")
if not filename.endswith(".md"):
return "ERROR: Filename must end with .md"
# Enforce caller-based type restrictions.
fm = parse_frontmatter(content)
mem_type = (fm.get("type") or "").strip().lower()
if caller == "worker" and mem_type in _WORKER_BLOCKED_TYPES:
return (
f"ERROR: Workers cannot write memory type '{mem_type}'. "
f"Blocked types for workers: {', '.join(sorted(_WORKER_BLOCKED_TYPES))}."
)
# Inject last_modified_by into frontmatter.
content = _inject_last_modified_by(content, caller)
# Enforce file size limit.
if len(content.encode("utf-8")) > MAX_FILE_SIZE_BYTES:
return f"ERROR: Content exceeds {MAX_FILE_SIZE_BYTES} byte limit."
# Enforce file cap (only for new files).
try:
path = _safe_memory_path(filename, memory_dir)
except ValueError as exc:
return f"ERROR: {exc}"
if not path.exists():
existing = list(memory_dir.glob("*.md"))
if len(existing) >= MAX_FILES:
return f"ERROR: File cap reached ({MAX_FILES}). Delete a file first."
memory_dir.mkdir(parents=True, exist_ok=True)
path.write_text(content, encoding="utf-8")
logger.debug("reflect: tool write_memory_file [%s] → %s (%d chars)", caller, filename, len(content))
return f"Wrote {filename} ({len(content)} chars)."
if name == "delete_memory_file":
filename = args.get("filename", "")
try:
path = _safe_memory_path(filename, memory_dir)
except ValueError as exc:
return f"ERROR: {exc}"
if not path.exists():
return f"ERROR: File not found: {filename}"
path.unlink()
logger.debug("reflect: tool delete_memory_file [%s] → %s", caller, filename)
return f"Deleted {filename}."
return f"ERROR: Unknown tool: {name}"
# ---------------------------------------------------------------------------
# Mini event loop
# ---------------------------------------------------------------------------
_MAX_TURNS = 5
async def _reflection_loop(
llm: Any,
system: str,
user_msg: str,
memory_dir: Path,
caller: str,
max_turns: int = _MAX_TURNS,
) -> tuple[bool, list[str], str]:
"""Run a mini tool-use loop: LLM → tool calls → repeat.
Hard cap of *max_turns* iterations. Prompt nudges the LLM toward a
2-turn pattern (batch reads in turn 1, batch writes in turn 2).
Returns a tuple of (success, changed_files, last_text) where *success*
is ``True`` if the loop completed without LLM errors, *changed_files*
lists filenames that were written or deleted, and *last_text* is the
final assistant text (useful as a skip-reason when no files changed).
"""
messages: list[dict[str, Any]] = [{"role": "user", "content": user_msg}]
changed_files: list[str] = []
last_text: str = ""
logger.debug("reflect: starting loop (caller=%s, max %d turns)", caller, max_turns)
for _turn in range(max_turns):
# Log what we're sending to the LLM.
user_content = messages[-1].get("content", "") if messages else ""
preview = user_content[:300] if isinstance(user_content, str) else str(user_content)[:300]
logger.debug(
"reflect: turn %d — sending %d messages to LLM, last msg role=%s, preview=%s",
_turn + 1, len(messages), messages[-1].get("role", "?") if messages else "?", preview,
)
try:
resp: LLMResponse = await llm.acomplete(
messages=messages,
system=system,
tools=_REFLECTION_TOOLS,
max_tokens=2048,
)
except Exception:
logger.warning("reflect: LLM call failed", exc_info=True)
return False, changed_files, last_text
# Build assistant message.
tool_calls_raw: list[dict[str, Any]] = []
if resp.raw_response and isinstance(resp.raw_response, dict):
tool_calls_raw = resp.raw_response.get("tool_calls", [])
# Log the full LLM response for debugging.
raw_keys = list(resp.raw_response.keys()) if isinstance(resp.raw_response, dict) else type(resp.raw_response).__name__
logger.debug(
"reflect: turn %d — LLM response: content=%r (len=%d), stop_reason=%s, "
"tool_calls=%d, model=%s, tokens=%d/%d, raw_keys=%s",
_turn + 1, (resp.content or "")[:200], len(resp.content or ""),
resp.stop_reason, len(tool_calls_raw), resp.model,
resp.input_tokens, resp.output_tokens, raw_keys,
)
# Accumulate non-empty text across turns so we don't lose a reason
# given alongside tool calls on an earlier turn.
turn_text = resp.content or ""
if turn_text:
last_text = turn_text
assistant_msg: dict[str, Any] = {
"role": "assistant",
"content": turn_text,
}
if tool_calls_raw:
# Convert to OpenAI format for the conversation.
assistant_msg["tool_calls"] = [
{
"id": tc["id"],
"type": "function",
"function": {
"name": tc["name"],
"arguments": json.dumps(tc.get("input", {})),
},
}
for tc in tool_calls_raw
]
messages.append(assistant_msg)
# No tool calls → agent is done.
if not tool_calls_raw:
logger.debug("reflect: loop done after %d turn(s) (no tool calls)", _turn + 1)
break
# Execute each tool call and append results.
logger.debug("reflect: turn %d — executing %d tool call(s): %s", _turn + 1, len(tool_calls_raw), [tc["name"] for tc in tool_calls_raw])
for tc in tool_calls_raw:
result = _execute_tool(tc["name"], tc.get("input", {}), memory_dir, caller)
# Track files that were written or deleted.
if tc["name"] in ("write_memory_file", "delete_memory_file"):
fname = tc.get("input", {}).get("filename", "")
if fname and not result.startswith("ERROR"):
changed_files.append(fname)
messages.append({
"role": "tool",
"tool_call_id": tc["id"],
"content": result,
})
return True, changed_files, last_text
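# The loop speaks the OpenAI tool-call wire format. After one tool turn the
# transcript looks roughly like this (ids and contents illustrative):
#
#     [
#         {"role": "user", "content": "## Recent conversation ..."},
#         {"role": "assistant", "content": "", "tool_calls": [
#             {"id": "tc_1", "type": "function",
#              "function": {"name": "list_memory_files", "arguments": "{}"}},
#         ]},
#         {"role": "tool", "tool_call_id": "tc_1", "content": "(no memory files yet)"},
#     ]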
# ---------------------------------------------------------------------------
# System prompts
# ---------------------------------------------------------------------------
_FRONTMATTER_EXAMPLE = "\n".join(MEMORY_FRONTMATTER_EXAMPLE)
_SHORT_REFLECT_SYSTEM = f"""\
You are a reflection agent that distills learnings from a conversation into
persistent memory files. You run in the background after each assistant turn.
Your goal: identify anything from the recent messages worth remembering across
future sessions: user preferences, project context, techniques that worked,
goals, environment details, reference pointers.
Memory types: {', '.join(MEMORY_TYPES)}
Expected format for each memory file:
{_FRONTMATTER_EXAMPLE}
Workflow (aim for 2 turns):
Turn 1: call list_memory_files to see what already exists, then
read_memory_file for any that might need updating.
Turn 2: call write_memory_file for new/updated memories.
Rules:
- Only persist information that would be useful in a *future* conversation.
Skip ephemeral task details, routine tool output, and anything obvious
from the code or git history.
- Keep files concise. Each file should cover ONE topic.
- If an existing memory already covers the learning, UPDATE it rather than
creating a duplicate.
- If there is nothing worth remembering from these messages, do nothing
(respond with a brief reason why nothing was saved; no tool calls needed).
- IMPORTANT: Always end with a text message (no tool calls) summarising what
you did or why you skipped. Never end on an empty response.
- File names should be kebab-case slugs ending in .md.
- Include a specific, search-friendly description in the frontmatter.
- Do NOT exceed {MAX_FILE_SIZE_BYTES} bytes per file or {MAX_FILES} total files.
"""
_LONG_REFLECT_SYSTEM = f"""\
You are a reflection agent performing a periodic housekeeping pass over the
memory directory. Your job is to organise, deduplicate, and trim noise from
the accumulated memory files.
Memory types: {', '.join(MEMORY_TYPES)}
Expected format for each memory file:
{_FRONTMATTER_EXAMPLE}
Workflow:
1. list_memory_files to get the full manifest.
2. read_memory_file for files that look redundant, stale, or overlapping.
3. Merge duplicates, delete stale entries, consolidate related memories.
4. Ensure descriptions are specific and search-friendly.
5. Enforce limits: max {MAX_FILES} files, max {MAX_FILE_SIZE_BYTES} bytes each.
Rules:
- Prefer merging over deleting: combine related memories into one file.
- Remove memories that are no longer relevant or are superseded.
- Keep the total collection lean and high-signal.
- Do NOT invent new information; only reorganise what exists.
- Do NOT delete or merge MEMORY-*.md diary files. These are daily narratives
managed by a separate process. You may read them for context but should not
modify them.
"""
_DIARY_SYSTEM = """\
You maintain a daily diary entry for an AI colony session. You receive:
(1) Today's existing diary content (may be empty if this is the first entry).
(2) A transcript of recent conversation messages.
Write a cohesive 3-8 sentence narrative about what happened in this session today.
Cover: what the user asked for, what was accomplished, key decisions or obstacles,
and current status.
Rules:
- If an existing diary is provided, rewrite it as a unified narrative incorporating
the new developments. Merge and deduplicate; do not simply append.
- Keep the total narrative under 3000 characters.
- Focus on the story arc of the day, not individual tool calls or code details.
- If the recent messages contain nothing substantive (greetings, routine
confirmations), return the existing diary text unchanged.
- Output only the diary prose. No headings, no timestamps, no code fences, no
frontmatter.
"""
# ---------------------------------------------------------------------------
# Short & long reflection entry points
# ---------------------------------------------------------------------------
async def run_short_reflection(
session_dir: Path,
llm: Any,
memory_dir: Path | None = None,
*,
caller: str,
) -> None:
"""Run a short reflection: extract learnings from conversation."""
mem_dir = memory_dir or MEMORY_DIR
messages = await read_conversation_parts(session_dir)
if not messages:
logger.debug("reflect: short [%s] — no conversation parts", caller)
return
logger.debug("reflect: short [%s] — %d conversation parts", caller, len(messages))
# Build a readable transcript from recent messages.
transcript_lines: list[str] = []
for msg in messages[-50:]:
role = msg.get("role", "")
content = str(msg.get("content", "")).strip()
if role == "tool":
continue # Skip verbose tool results.
if not content:
continue
label = "user" if role == "user" else "assistant"
if len(content) > 800:
content = content[:800] + "..."
transcript_lines.append(f"[{label}]: {content}")
if not transcript_lines:
return
transcript = "\n".join(transcript_lines)
user_msg = (
f"## Recent conversation ({len(messages)} messages total)\n\n"
f"{transcript}\n\n"
f"Timestamp: {datetime.now().isoformat(timespec='minutes')}"
)
_, changed, reason = await _reflection_loop(
llm, _SHORT_REFLECT_SYSTEM, user_msg, mem_dir, caller=caller,
)
if changed:
logger.debug("reflect: short reflection done [%s], changed files: %s", caller, changed)
else:
logger.debug("reflect: short reflection done [%s], no changes — %s", caller, reason or "no reason given")
async def run_long_reflection(
llm: Any,
memory_dir: Path | None = None,
*,
caller: str,
) -> None:
"""Run a long reflection: organise and deduplicate all memories."""
mem_dir = memory_dir or MEMORY_DIR
files = scan_memory_files(mem_dir)
if not files:
logger.debug("reflect: long [%s] — no memory files to organise", caller)
return
logger.debug("reflect: long [%s] — organising %d memory files", caller, len(files))
manifest = format_memory_manifest(files)
user_msg = (
f"## Current memory manifest ({len(files)} files)\n\n"
f"{manifest}\n\n"
f"Timestamp: {datetime.now().isoformat(timespec='minutes')}"
)
_, changed, reason = await _reflection_loop(
llm, _LONG_REFLECT_SYSTEM, user_msg, mem_dir, caller=caller,
)
if changed:
logger.debug("reflect: long reflection done [%s] (%d files), changed files: %s", caller, len(files), changed)
else:
logger.debug("reflect: long reflection done [%s] (%d files), no changes — %s", caller, len(files), reason or "no reason given")
async def run_diary_update(
session_dir: Path,
llm: Any,
memory_dir: Path | None = None,
) -> None:
"""Update today's diary file with a narrative of recent activity."""
mem_dir = memory_dir or MEMORY_DIR
fname = diary_filename()
diary_path = mem_dir / fname
today_str = datetime.now().strftime("%Y-%m-%d")
# Read existing diary body (strip frontmatter).
existing_body = ""
if diary_path.exists():
try:
raw = diary_path.read_text(encoding="utf-8")
m = re.match(r"^---\s*\n.*?\n---\s*\n?", raw, re.DOTALL)
existing_body = raw[m.end() :].strip() if m else raw.strip()
except OSError:
pass
# Read all conversation messages for context.
messages = await read_conversation_parts(session_dir)
transcript_lines: list[str] = []
for msg in messages[-40:]:
role = msg.get("role", "")
content = str(msg.get("content", "")).strip()
if role == "tool" or not content:
continue
label = "user" if role == "user" else "assistant"
if len(content) > 600:
content = content[:600] + "..."
transcript_lines.append(f"[{label}]: {content}")
if not transcript_lines:
return
transcript = "\n".join(transcript_lines)
user_msg = (
f"## Today's Diary So Far\n\n"
f"{existing_body or '(no entries yet)'}\n\n"
f"## Recent Conversation\n\n"
f"{transcript}\n\n"
f"Date: {today_str}"
)
try:
from framework.agents.queen.config import default_config
resp = await llm.acomplete(
messages=[{"role": "user", "content": user_msg}],
system=_DIARY_SYSTEM,
max_tokens=min(default_config.max_tokens, 1024),
)
new_body = (resp.content or "").strip()
if not new_body:
return
doc = build_diary_document(date_str=today_str, body=new_body)
if len(doc.encode("utf-8")) > MAX_FILE_SIZE_BYTES:
new_body = new_body[:2800]
doc = build_diary_document(date_str=today_str, body=new_body)
mem_dir.mkdir(parents=True, exist_ok=True)
diary_path.write_text(doc, encoding="utf-8")
logger.debug("diary: updated %s (%d chars)", fname, len(doc))
except Exception:
logger.warning("diary: update failed", exc_info=True)
# ---------------------------------------------------------------------------
# Event-bus integration
# ---------------------------------------------------------------------------
# Run a long reflection every N short reflections.
_LONG_REFLECT_INTERVAL = 5
async def subscribe_reflection_triggers(
event_bus: Any,
session_dir: Path,
llm: Any,
memory_dir: Path | None = None,
phase_state: Any = None,
) -> list[str]:
"""Subscribe to queen turn events and return subscription IDs.
Call this once during queen setup. Returns a list of event-bus
subscription IDs for cleanup during session teardown.
"""
from framework.runtime.event_bus import EventType
mem_dir = memory_dir or MEMORY_DIR
_lock = asyncio.Lock()
_short_count = 0
async def _on_turn_complete(event: Any) -> None:
nonlocal _short_count
# Only process queen turns.
if getattr(event, "stream_id", None) != "queen":
return
_short_count += 1
# Decide whether to reflect: only when the LLM turn ended without
# tool calls (a conversational response) OR every _LONG_REFLECT_INTERVAL turns.
event_data = getattr(event, "data", {}) or {}
stop_reason = event_data.get("stop_reason", "")
is_tool_turn = stop_reason in ("tool_use", "tool_calls")
is_interval = _short_count % _LONG_REFLECT_INTERVAL == 0
if is_tool_turn and not is_interval:
logger.debug(
"reflect: skipping turn %d (stop_reason=%s, next reflect at %d)",
_short_count, stop_reason,
(_short_count // _LONG_REFLECT_INTERVAL + 1) * _LONG_REFLECT_INTERVAL,
)
return
if _lock.locked():
logger.debug("reflect: skipping — reflection already in progress")
return
async with _lock:
try:
logger.debug("reflect: turn complete — count %d/%d (stop_reason=%s)", _short_count, _LONG_REFLECT_INTERVAL, stop_reason)
if is_interval:
await run_short_reflection(session_dir, llm, mem_dir, caller="queen")
await run_long_reflection(llm, mem_dir, caller="queen")
else:
await run_short_reflection(session_dir, llm, mem_dir, caller="queen")
except Exception:
logger.warning("reflect: reflection failed", exc_info=True)
_write_error("short/long reflection")
# Update daily diary after reflection.
try:
await run_diary_update(session_dir, llm, mem_dir)
except Exception:
logger.warning("reflect: diary update failed", exc_info=True)
# Update recall cache after reflection completes, guaranteeing
# recall sees the current turn's extracted memories.
if phase_state is not None:
try:
from framework.agents.queen.recall_selector import update_recall_cache
await update_recall_cache(
session_dir,
llm,
cache_setter=lambda block: (
setattr(phase_state, "_cached_colony_recall_block", block),
setattr(phase_state, "_cached_recall_block", block),
),
memory_dir=mem_dir,
heading="Colony Memories",
)
await update_recall_cache(
session_dir,
llm,
cache_setter=lambda block: setattr(
phase_state, "_cached_global_recall_block", block
),
memory_dir=getattr(phase_state, "global_memory_dir", None),
heading="Global Memories",
)
except Exception:
logger.debug("recall: cache update failed", exc_info=True)
async def _on_compaction(event: Any) -> None:
if getattr(event, "stream_id", None) != "queen":
return
if _lock.locked():
return
async with _lock:
try:
await run_long_reflection(llm, mem_dir, caller="queen")
except Exception:
logger.warning("reflect: compaction-triggered reflection failed", exc_info=True)
_write_error("compaction reflection")
sub_ids: list[str] = []
sub1 = event_bus.subscribe(
event_types=[EventType.LLM_TURN_COMPLETE],
handler=_on_turn_complete,
)
sub_ids.append(sub1)
sub2 = event_bus.subscribe(
event_types=[EventType.CONTEXT_COMPACTED],
handler=_on_compaction,
)
sub_ids.append(sub2)
return sub_ids
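# Setup/teardown sketch. Only `subscribe` appears in this module, so the
# `unsubscribe` call below is an assumed event-bus API:
#
#     sub_ids = await subscribe_reflection_triggers(
#         event_bus, session_dir, llm, memory_dir=mem_dir, phase_state=state,
#     )
#     ...  # session runs
#     for sid in sub_ids:
#         event_bus.unsubscribe(sid)  # hypothetical cleanup call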
async def subscribe_worker_memory_triggers(
event_bus: Any,
llm: Any,
*,
worker_sessions_dir: Path,
colony_memory_dir: Path,
recall_cache: dict[str, str],
) -> list[str]:
"""Subscribe colony memory lifecycle events for worker runs.
Short reflection is now handled synchronously at node handoff in
``WorkerAgent._reflect_colony_memory()``. This function only manages:
- Recall cache initialisation on execution start
- Final long reflection + cleanup on execution end
"""
from framework.runtime.event_bus import EventType
_terminal_lock = asyncio.Lock()
def _is_worker_event(event: Any) -> bool:
return bool(
getattr(event, "execution_id", None)
and getattr(event, "stream_id", None) not in ("queen", "judge")
)
async def _on_execution_started(event: Any) -> None:
if not _is_worker_event(event):
return
if event.execution_id is not None:
recall_cache[event.execution_id] = ""
async def _on_execution_terminal(event: Any) -> None:
if not _is_worker_event(event):
return
execution_id = event.execution_id
if execution_id is None:
return
async with _terminal_lock:
try:
await run_long_reflection(llm, colony_memory_dir, caller="worker")
except Exception:
logger.warning("reflect: worker final reflection failed", exc_info=True)
_write_error("worker final reflection")
finally:
recall_cache.pop(execution_id, None)
return [
event_bus.subscribe(
event_types=[EventType.EXECUTION_STARTED],
handler=_on_execution_started,
),
event_bus.subscribe(
event_types=[EventType.EXECUTION_COMPLETED, EventType.EXECUTION_FAILED],
handler=_on_execution_terminal,
),
]
def _write_error(context: str) -> None:
"""Best-effort write of the last traceback to an error file."""
try:
error_path = MEMORY_DIR / ".reflection_error.txt"
error_path.parent.mkdir(parents=True, exist_ok=True)
error_path.write_text(
f"context: {context}\ntime: {datetime.now().isoformat()}\n\n{traceback.format_exc()}",
encoding="utf-8",
)
except OSError:
pass
@@ -1,27 +0,0 @@
"""Queen's ticket receiver entry point.
When a WORKER_ESCALATION_TICKET event is emitted on the shared EventBus,
this entry point fires and routes to the ``ticket_triage`` node, where the
Queen deliberates and decides whether to notify the operator.
Isolation level is ``isolated``: the queen's triage memory is kept separate
from the worker's shared memory. Each ticket triage runs in its own context.
"""
from __future__ import annotations
from framework.graph.edge import AsyncEntryPointSpec
TICKET_RECEIVER_ENTRY_POINT = AsyncEntryPointSpec(
id="ticket_receiver",
name="Worker Escalation Ticket Receiver",
entry_node="ticket_triage",
trigger_type="event",
trigger_config={
"event_types": ["worker_escalation_ticket"],
# Do not fire on our own graph's events (prevents loops if queen
# somehow emits a worker_escalation_ticket for herself)
"exclude_own_graph": True,
},
isolation_level="isolated",
)
@@ -1,286 +0,0 @@
"""Worker per-run digest (run diary).
Storage layout:
~/.hive/agents/{agent_name}/runs/{run_id}/digest.md
Each completed or failed worker run gets one digest file. The queen reads
these via get_worker_status(focus='diary') before digging into live runtime
logs; the diary is a cheap, persistent record that survives across sessions.
"""
from __future__ import annotations
import logging
import traceback
from collections import Counter
from datetime import datetime
from pathlib import Path
from typing import TYPE_CHECKING, Any
if TYPE_CHECKING:
from framework.runtime.event_bus import AgentEvent, EventBus
logger = logging.getLogger(__name__)
_DIGEST_SYSTEM = """\
You maintain run digests for a worker agent.
A run digest is a concise, factual record of a single task execution.
Write 3-6 sentences covering:
- What the worker was asked to do (the task/goal)
- What approach it took and what tools it used
- What the outcome was (success, partial, or failure and why if relevant)
- Any notable issues, retries, or escalations to the queen
Write in third person past tense. Be direct and specific.
Omit routine tool invocations unless the result matters.
Output only the digest prose: no headings, no code fences.
"""
def _worker_runs_dir(agent_name: str) -> Path:
return Path.home() / ".hive" / "agents" / agent_name / "runs"
def digest_path(agent_name: str, run_id: str) -> Path:
return _worker_runs_dir(agent_name) / run_id / "digest.md"
def _collect_run_events(bus: EventBus, run_id: str, limit: int = 2000) -> list[AgentEvent]:
"""Collect all events belonging to *run_id* from the bus history.
Strategy: find the EXECUTION_STARTED event that carries ``run_id``,
extract its ``execution_id``, then query the bus by that execution_id.
This works because TOOL_CALL_*, EDGE_TRAVERSED, NODE_STALLED etc. carry
execution_id but not run_id.
Falls back to a full-scan run_id filter when EXECUTION_STARTED is not
found (e.g. bus was rotated).
"""
from framework.runtime.event_bus import EventType
# Pass 1: find execution_id via EXECUTION_STARTED with matching run_id
started = bus.get_history(event_type=EventType.EXECUTION_STARTED, limit=limit)
exec_id: str | None = None
for e in started:
if getattr(e, "run_id", None) == run_id and e.execution_id:
exec_id = e.execution_id
break
if exec_id:
return bus.get_history(execution_id=exec_id, limit=limit)
# Fallback: scan all events and match by run_id attribute
return [e for e in bus.get_history(limit=limit) if getattr(e, "run_id", None) == run_id]
def _build_run_context(
events: list[AgentEvent],
outcome_event: AgentEvent | None,
) -> str:
"""Assemble a plain-text run context string for the digest LLM call."""
from framework.runtime.event_bus import EventType
# Reverse so events are in chronological order
events_chron = list(reversed(events))
lines: list[str] = []
# Task input from EXECUTION_STARTED
started = [e for e in events_chron if e.type == EventType.EXECUTION_STARTED]
if started:
inp = started[0].data.get("input", {})
if inp:
lines.append(f"Task input: {str(inp)[:400]}")
# Duration (elapsed so far if no outcome yet)
ref_ts = outcome_event.timestamp if outcome_event else datetime.utcnow()
if started:
elapsed = (ref_ts - started[0].timestamp).total_seconds()
m, s = divmod(int(elapsed), 60)
lines.append(f"Duration so far: {m}m {s}s" if m else f"Duration so far: {s}s")
# Outcome
if outcome_event is None:
lines.append("Status: still running (mid-run snapshot)")
elif outcome_event.type == EventType.EXECUTION_COMPLETED:
out = outcome_event.data.get("output", {})
out_str = f"Outcome: completed. Output: {str(out)[:300]}"
lines.append(out_str if out else "Outcome: completed.")
else:
err = outcome_event.data.get("error", "")
lines.append(f"Outcome: failed. Error: {str(err)[:300]}" if err else "Outcome: failed.")
# Node path (edge traversals)
edges = [e for e in events_chron if e.type == EventType.EDGE_TRAVERSED]
if edges:
parts = [
f"{e.data.get('source_node', '?')}->{e.data.get('target_node', '?')}"
for e in edges[-20:]
]
lines.append(f"Node path: {', '.join(parts)}")
# Tools used
tool_events = [e for e in events_chron if e.type == EventType.TOOL_CALL_COMPLETED]
if tool_events:
names = [e.data.get("tool_name", "?") for e in tool_events]
counts = Counter(names)
summary = ", ".join(f"{name}×{n}" if n > 1 else name for name, n in counts.most_common())
lines.append(f"Tools used: {summary}")
# Note any tool errors
errors = [e for e in tool_events if e.data.get("is_error")]
if errors:
err_names = Counter(e.data.get("tool_name", "?") for e in errors)
lines.append(f"Tool errors: {dict(err_names)}")
# Issues
issue_map = {
EventType.NODE_STALLED: "stall",
EventType.NODE_TOOL_DOOM_LOOP: "doom loop",
EventType.CONSTRAINT_VIOLATION: "constraint violation",
EventType.NODE_RETRY: "retry",
}
issue_parts: list[str] = []
for evt_type, label in issue_map.items():
n = sum(1 for e in events_chron if e.type == evt_type)
if n:
issue_parts.append(f"{n} {label}(s)")
if issue_parts:
lines.append(f"Issues: {', '.join(issue_parts)}")
# Escalations to queen
escalations = [e for e in events_chron if e.type == EventType.ESCALATION_REQUESTED]
if escalations:
lines.append(f"Escalations to queen: {len(escalations)}")
# Final LLM output snippet (last LLM_TEXT_DELTA snapshot)
text_events = [e for e in reversed(events_chron) if e.type == EventType.LLM_TEXT_DELTA]
if text_events:
snapshot = text_events[0].data.get("snapshot", "") or ""
if snapshot:
lines.append(f"Final LLM output: {snapshot[-400:].strip()}")
return "\n".join(lines)
async def consolidate_worker_run(
agent_name: str,
run_id: str,
outcome_event: AgentEvent | None,
bus: EventBus,
llm: Any,
) -> None:
"""Write (or overwrite) the digest for a worker run.
Called fire-and-forget either:
- After EXECUTION_COMPLETED / EXECUTION_FAILED (outcome_event set, final write)
- Periodically during a run on a cooldown timer (outcome_event=None, mid-run snapshot)
The digest file is always overwritten so each call produces the freshest view.
The final completion/failure call supersedes any mid-run snapshot.
Args:
agent_name: Worker agent directory name (determines storage path).
run_id: The run ID.
outcome_event: EXECUTION_COMPLETED or EXECUTION_FAILED event, or None for
a mid-run snapshot.
bus: The session EventBus (shared queen + worker).
llm: LLMProvider with an acomplete() method.
"""
try:
events = _collect_run_events(bus, run_id)
run_context = _build_run_context(events, outcome_event)
if not run_context:
logger.debug("worker_memory: no events for run %s, skipping digest", run_id)
return
is_final = outcome_event is not None
logger.info(
"worker_memory: generating %s digest for run %s ...",
"final" if is_final else "mid-run",
run_id,
)
from framework.agents.queen.config import default_config
resp = await llm.acomplete(
messages=[{"role": "user", "content": run_context}],
system=_DIGEST_SYSTEM,
max_tokens=min(default_config.max_tokens, 512),
)
digest_text = (resp.content or "").strip()
if not digest_text:
logger.warning("worker_memory: LLM returned empty digest for run %s", run_id)
return
path = digest_path(agent_name, run_id)
path.parent.mkdir(parents=True, exist_ok=True)
from framework.runtime.event_bus import EventType
ts = (outcome_event.timestamp if outcome_event else datetime.utcnow()).strftime(
"%Y-%m-%d %H:%M"
)
if outcome_event is None:
status = "running"
elif outcome_event.type == EventType.EXECUTION_COMPLETED:
status = "completed"
else:
status = "failed"
path.write_text(
f"# {run_id}\n\n**{ts}** | {status}\n\n{digest_text}\n",
encoding="utf-8",
)
logger.info(
"worker_memory: %s digest written for run %s (%d chars)",
status,
run_id,
len(digest_text),
)
except Exception:
tb = traceback.format_exc()
logger.exception("worker_memory: digest failed for run %s", run_id)
# Persist the error so it's findable without log access
error_path = _worker_runs_dir(agent_name) / run_id / "digest_error.txt"
try:
error_path.parent.mkdir(parents=True, exist_ok=True)
error_path.write_text(
f"run_id: {run_id}\ntime: {datetime.now().isoformat()}\n\n{tb}",
encoding="utf-8",
)
except Exception:
pass
def read_recent_digests(agent_name: str, max_runs: int = 5) -> list[tuple[str, str]]:
"""Return recent run digests as [(run_id, content), ...], newest first.
Args:
agent_name: Worker agent directory name.
max_runs: Maximum number of digests to return.
Returns:
List of (run_id, digest_content) tuples, ordered newest first.
"""
runs_dir = _worker_runs_dir(agent_name)
if not runs_dir.exists():
return []
digest_files = sorted(
runs_dir.glob("*/digest.md"),
key=lambda p: p.stat().st_mtime,
reverse=True,
)[:max_runs]
result: list[tuple[str, str]] = []
for f in digest_files:
try:
content = f.read_text(encoding="utf-8").strip()
if content:
result.append((f.parent.name, content))
except OSError:
continue
return result
@@ -6,7 +6,6 @@ Usage:
hive info exports/my-agent
hive validate exports/my-agent
hive list exports/
hive dispatch exports/ --input '{"key": "value"}'
hive shell exports/my-agent
Testing commands:
@@ -79,7 +78,7 @@ def main():
subparsers = parser.add_subparsers(dest="command", required=True)
# Register runner commands (run, info, validate, list, dispatch, shell)
# Register runner commands (run, info, validate, list, shell)
from framework.runner.cli import register_commands
register_commands(subparsers)
@@ -1,11 +1,6 @@
"""Graph structures: Goals, Nodes, Edges, and Execution."""
from framework.graph.client_io import (
ActiveNodeClientIO,
ClientIOGateway,
InertNodeClientIO,
NodeClientIO,
)
from framework.graph.context import GraphContext
from framework.graph.context_handoff import ContextHandoff, HandoffContext
from framework.graph.conversation import ConversationStore, Message, NodeConversation
from framework.graph.edge import DEFAULT_MAX_TOKENS, EdgeCondition, EdgeSpec, GraphSpec
@@ -19,6 +14,14 @@ from framework.graph.event_loop_node import (
from framework.graph.executor import GraphExecutor
from framework.graph.goal import Constraint, Goal, GoalStatus, SuccessCriterion
from framework.graph.node import NodeContext, NodeProtocol, NodeResult, NodeSpec
from framework.graph.worker_agent import (
Activation,
FanOutTag,
FanOutTracker,
WorkerAgent,
WorkerCompletion,
WorkerLifecycle,
)
__all__ = [
# Goal
@@ -51,9 +54,12 @@ __all__ = [
# Context Handoff
"ContextHandoff",
"HandoffContext",
# Client I/O
"NodeClientIO",
"ActiveNodeClientIO",
"InertNodeClientIO",
"ClientIOGateway",
# Worker Agent
"WorkerAgent",
"WorkerLifecycle",
"WorkerCompletion",
"Activation",
"FanOutTag",
"FanOutTracker",
"GraphContext",
]
@@ -59,6 +59,13 @@ class ActiveNodeClientIO(NodeClientIO):
self._input_result: str | None = None
async def emit_output(self, content: str, is_final: bool = False) -> None:
# Strip leading whitespace from first output chunk to avoid leading spaces
# (some LLMs like Kimi output leading whitespace before text)
if not self._output_snapshot and content:
content = content.lstrip()
if not content: # Content was all whitespace
return
self._output_snapshot += content
await self._output_queue.put(content)
@@ -0,0 +1,327 @@
"""Shared graph execution context helpers.
This module centralizes:
- Graph-run shared state (`GraphContext`)
- Scoped buffer permission shaping for a node
- Per-node accounts prompt resolution
- Canonical `NodeContext` construction
"""
from __future__ import annotations
import asyncio
from dataclasses import dataclass, field
from typing import Any
from framework.graph.edge import GraphSpec
from framework.graph.goal import Goal
from framework.graph.node import DataBuffer, NodeContext, NodeProtocol, NodeSpec
from framework.runtime.core import Runtime
@dataclass
class GraphContext:
"""Shared state for one graph execution run."""
graph: GraphSpec
goal: Goal
buffer: DataBuffer
runtime: Runtime
llm: Any # LLMProvider
tools: list[Any] # list[Tool]
tool_executor: Any # Callable
event_bus: Any # GraphScopedEventBus
execution_id: str
stream_id: str
run_id: str
storage_path: Any # Path | None
runtime_logger: Any = None
node_registry: dict[str, NodeProtocol] = field(default_factory=dict)
node_spec_registry: dict[str, NodeSpec] = field(default_factory=dict)
parallel_config: Any = None # ParallelExecutionConfig | None
enable_parallel_execution: bool = True
is_continuous: bool = False
continuous_conversation: Any = None
cumulative_tools: list[Any] = field(default_factory=list)
cumulative_tool_names: set[str] = field(default_factory=set)
cumulative_output_keys: list[str] = field(default_factory=list)
accounts_prompt: str = ""
accounts_data: list[dict] | None = None
tool_provider_map: dict[str, str] | None = None
skills_catalog_prompt: str = ""
protocols_prompt: str = ""
skill_dirs: list[str] = field(default_factory=list)
context_warn_ratio: float | None = None
batch_init_nudge: str | None = None
dynamic_tools_provider: Any = None
dynamic_prompt_provider: Any = None
dynamic_memory_provider: Any = None
iteration_metadata_provider: Any = None
loop_config: dict[str, Any] = field(default_factory=dict)
path: list[str] = field(default_factory=list)
node_visit_counts: dict[str, int] = field(default_factory=dict)
_path_lock: asyncio.Lock = field(default_factory=asyncio.Lock)
_visits_lock: asyncio.Lock = field(default_factory=asyncio.Lock)
# Fan-out buffer conflict tracking: key → worker_id that wrote it
_fanout_written_keys: dict[str, str] = field(default_factory=dict)
# Retry tracking: worker_id → retry_count (for execution quality assessment)
retry_counts: dict[str, int] = field(default_factory=dict)
nodes_with_retries: set[str] = field(default_factory=set)
# Colony memory reflection at node handoff
colony_memory_dir: Any = None # Path | None
worker_sessions_dir: Any = None # Path | None
colony_recall_cache: dict[str, str] = field(default_factory=dict)
colony_reflect_llm: Any = None # LLMProvider for reflection
_colony_reflect_lock: asyncio.Lock = field(default_factory=asyncio.Lock)
def build_scoped_buffer(buffer: DataBuffer, node_spec: NodeSpec) -> DataBuffer:
"""Create a node-scoped buffer view.
When permissions are already restricted, auto-include framework-managed
`_`-prefixed keys used by the default skill protocols.
"""
read_keys = list(node_spec.input_keys)
write_keys = list(node_spec.output_keys)
if read_keys or write_keys:
from framework.skills.defaults import DATA_BUFFER_KEYS as _skill_keys
existing_underscore = [k for k in buffer._data if k.startswith("_")]
extra_keys = set(_skill_keys) | set(existing_underscore)
for key in extra_keys:
if read_keys and key not in read_keys:
read_keys.append(key)
if write_keys and key not in write_keys:
write_keys.append(key)
return buffer.with_permissions(read_keys=read_keys, write_keys=write_keys)
def build_node_accounts_prompt(
*,
accounts_prompt: str,
accounts_data: list[dict] | None,
tool_provider_map: dict[str, str] | None,
node_tool_names: list[str] | None,
fallback_to_default: bool = False,
) -> str:
"""Resolve the accounts prompt for one node."""
resolved = accounts_prompt
if accounts_data and tool_provider_map:
from framework.graph.prompting import build_accounts_prompt
filtered = build_accounts_prompt(
accounts_data,
tool_provider_map,
node_tool_names=node_tool_names,
)
if filtered or not fallback_to_default:
resolved = filtered
return resolved
def _resolve_available_tools(
*,
node_spec: NodeSpec,
tools: list[Any],
override_tools: list[Any] | None,
) -> list[Any]:
"""Select tools available to the current node."""
if override_tools is not None:
return list(override_tools)
if not node_spec.tools:
return []
return [tool for tool in tools if tool.name in node_spec.tools]
def _derive_input_data(buffer: DataBuffer, input_keys: list[str]) -> dict[str, Any]:
"""Collect node inputs from the shared buffer."""
input_data: dict[str, Any] = {}
for key in input_keys:
value = buffer.read(key)
if value is not None:
input_data[key] = value
return input_data
def build_node_context(
*,
runtime: Runtime,
node_spec: NodeSpec,
buffer: DataBuffer,
goal: Goal,
llm: Any,
tools: list[Any],
max_tokens: int,
input_data: dict[str, Any] | None = None,
derive_input_data_from_buffer: bool = False,
runtime_logger: Any = None,
pause_event: Any = None,
continuous_mode: bool = False,
inherited_conversation: Any = None,
override_tools: list[Any] | None = None,
cumulative_output_keys: list[str] | None = None,
event_triggered: bool = False,
accounts_prompt: str = "",
accounts_data: list[dict] | None = None,
tool_provider_map: dict[str, str] | None = None,
fallback_to_default_accounts_prompt: bool = False,
identity_prompt: str = "",
narrative: str = "",
execution_id: str = "",
run_id: str = "",
stream_id: str = "",
node_registry: dict[str, NodeSpec] | None = None,
all_tools: list[Any] | None = None,
shared_node_registry: dict[str, NodeProtocol] | None = None,
dynamic_tools_provider: Any = None,
dynamic_prompt_provider: Any = None,
dynamic_memory_provider: Any = None,
iteration_metadata_provider: Any = None,
skills_catalog_prompt: str = "",
protocols_prompt: str = "",
skill_dirs: list[str] | None = None,
default_skill_warn_ratio: float | None = None,
default_skill_batch_nudge: str | None = None,
memory_prompt: str = "",
) -> NodeContext:
"""Build a canonical `NodeContext` for graph execution."""
available_tools = _resolve_available_tools(
node_spec=node_spec,
tools=tools,
override_tools=override_tools,
)
scoped_buffer = build_scoped_buffer(buffer, node_spec)
node_accounts_prompt = build_node_accounts_prompt(
accounts_prompt=accounts_prompt,
accounts_data=accounts_data,
tool_provider_map=tool_provider_map,
node_tool_names=node_spec.tools,
fallback_to_default=fallback_to_default_accounts_prompt,
)
resolved_input_data = (
_derive_input_data(buffer, node_spec.input_keys)
if input_data is None and derive_input_data_from_buffer
else dict(input_data or {})
)
return NodeContext(
runtime=runtime,
node_id=node_spec.id,
node_spec=node_spec,
buffer=scoped_buffer,
input_data=resolved_input_data,
llm=llm,
available_tools=available_tools,
goal_context=goal.to_prompt_context(),
goal=goal,
max_tokens=max_tokens,
runtime_logger=runtime_logger,
pause_event=pause_event,
continuous_mode=continuous_mode,
inherited_conversation=inherited_conversation,
cumulative_output_keys=cumulative_output_keys or [],
event_triggered=event_triggered,
accounts_prompt=node_accounts_prompt,
identity_prompt=identity_prompt,
narrative=narrative,
memory_prompt=memory_prompt,
execution_id=execution_id,
run_id=run_id,
stream_id=stream_id,
node_registry=node_registry or {},
all_tools=list(all_tools or tools),
shared_node_registry=shared_node_registry or {},
dynamic_tools_provider=dynamic_tools_provider,
dynamic_prompt_provider=dynamic_prompt_provider,
dynamic_memory_provider=dynamic_memory_provider,
iteration_metadata_provider=iteration_metadata_provider,
skills_catalog_prompt=skills_catalog_prompt,
protocols_prompt=protocols_prompt,
skill_dirs=list(skill_dirs or []),
default_skill_warn_ratio=default_skill_warn_ratio,
default_skill_batch_nudge=default_skill_batch_nudge,
)
def build_node_context_from_graph_context(
graph_context: GraphContext,
*,
node_spec: NodeSpec,
pause_event: Any = None,
input_data: dict[str, Any] | None = None,
derive_input_data_from_buffer: bool = True,
override_tools: list[Any] | None = None,
inherited_conversation: Any = None,
cumulative_output_keys: list[str] | None = None,
event_triggered: bool = False,
identity_prompt: str | None = None,
narrative: str = "",
node_registry: dict[str, NodeSpec] | None = None,
fallback_to_default_accounts_prompt: bool = True,
) -> NodeContext:
"""Build `NodeContext` using shared graph-run state."""
gc = graph_context
resolved_override_tools = override_tools
if resolved_override_tools is None and gc.is_continuous and gc.cumulative_tools:
resolved_override_tools = list(gc.cumulative_tools)
resolved_inherited_conversation = inherited_conversation
if resolved_inherited_conversation is None and gc.is_continuous:
resolved_inherited_conversation = gc.continuous_conversation
resolved_output_keys = cumulative_output_keys
if resolved_output_keys is None and gc.is_continuous:
resolved_output_keys = list(gc.cumulative_output_keys)
return build_node_context(
runtime=gc.runtime,
node_spec=node_spec,
buffer=gc.buffer,
goal=gc.goal,
llm=gc.llm,
tools=gc.tools,
max_tokens=gc.graph.max_tokens,
input_data=input_data,
derive_input_data_from_buffer=derive_input_data_from_buffer,
runtime_logger=gc.runtime_logger,
pause_event=pause_event,
continuous_mode=gc.is_continuous,
inherited_conversation=resolved_inherited_conversation,
override_tools=resolved_override_tools,
cumulative_output_keys=resolved_output_keys,
event_triggered=event_triggered,
accounts_prompt=gc.accounts_prompt,
accounts_data=gc.accounts_data,
tool_provider_map=gc.tool_provider_map,
fallback_to_default_accounts_prompt=fallback_to_default_accounts_prompt,
identity_prompt=identity_prompt if identity_prompt is not None else getattr(gc.graph, "identity_prompt", "") or "",
narrative=narrative,
execution_id=gc.execution_id,
run_id=gc.run_id,
stream_id=gc.stream_id,
node_registry=node_registry or gc.node_spec_registry,
all_tools=gc.tools,
shared_node_registry=gc.node_registry,
dynamic_tools_provider=gc.dynamic_tools_provider,
dynamic_prompt_provider=gc.dynamic_prompt_provider,
dynamic_memory_provider=gc.dynamic_memory_provider,
iteration_metadata_provider=gc.iteration_metadata_provider,
skills_catalog_prompt=gc.skills_catalog_prompt,
protocols_prompt=gc.protocols_prompt,
skill_dirs=gc.skill_dirs,
default_skill_warn_ratio=gc.context_warn_ratio,
default_skill_batch_nudge=gc.batch_init_nudge,
)
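# In an executor the per-node wiring reduces to a sketch like this (`gc` and
# `spec` assumed to exist as a GraphContext and NodeSpec):
#
#     ctx = build_node_context_from_graph_context(
#         gc,
#         node_spec=spec,
#         event_triggered=False,
#     )
#     # ctx.buffer is permission-scoped, ctx.accounts_prompt is node-filtered,
#     # and in continuous mode ctx inherits the cumulative tools/conversation.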
@@ -8,6 +8,13 @@ from dataclasses import dataclass
from pathlib import Path
from typing import Any, Literal, Protocol, runtime_checkable
LEGACY_RUN_ID = "__legacy_run__"
def is_legacy_run_id(run_id: str | None) -> bool:
"""True when run_id represents pre-migration (no run boundary) data."""
return run_id is None or run_id == LEGACY_RUN_ID
@dataclass
class Message:
@@ -37,6 +44,8 @@ class Message:
image_content: list[dict[str, Any]] | None = None
# True when message contains an activated skill body (AS-10: never prune)
is_skill_content: bool = False
# Logical worker run identifier for shared-session persistence
run_id: str | None = None
def to_llm_dict(self) -> dict[str, Any]:
"""Convert to OpenAI-format message dict."""
@@ -93,6 +102,8 @@ class Message:
d["is_client_input"] = self.is_client_input
if self.image_content is not None:
d["image_content"] = self.image_content
if self.run_id is not None:
d["run_id"] = self.run_id
return d
@classmethod
@@ -109,9 +120,40 @@ class Message:
is_transition_marker=data.get("is_transition_marker", False),
is_client_input=data.get("is_client_input", False),
image_content=data.get("image_content"),
run_id=data.get("run_id"),
)
def _normalize_cursor(cursor: dict[str, Any] | None) -> dict[str, Any]:
"""Normalize legacy and run-scoped cursor formats into one flat shape."""
return dict(cursor) if cursor else {}
def get_cursor_next_seq(cursor: dict[str, Any] | None) -> int | None:
next_seq = (cursor or {}).get("next_seq")
return next_seq if isinstance(next_seq, int) else None
def update_cursor_next_seq(cursor: dict[str, Any] | None, next_seq: int) -> dict[str, Any]:
updated = dict(cursor or {})
updated["next_seq"] = next_seq
return updated
def get_run_cursor(cursor: dict[str, Any] | None, run_id: str | None) -> dict[str, Any] | None:
return dict(cursor) if cursor else None
def update_run_cursor(
cursor: dict[str, Any] | None,
run_id: str | None,
values: dict[str, Any],
) -> dict[str, Any]:
updated = dict(cursor or {})
updated.update(values)
return updated
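# Sketch: the flat cursor shape round-trips through these helpers.
#
#     cursor = update_cursor_next_seq(None, 7)            # {"next_seq": 7}
#     get_cursor_next_seq(cursor)                         # 7
#     update_run_cursor(cursor, "run-a", {"phase": "x"})  # {"next_seq": 7, "phase": "x"}
#     # run_id is accepted but ignored: run-scoped cursors were flattened.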
def _extract_spillover_filename(content: str) -> str | None:
"""Extract spillover filename from a tool result annotation.
@@ -261,7 +303,7 @@ class ConversationStore(Protocol):
async def read_cursor(self) -> dict[str, Any] | None: ...
async def delete_parts_before(self, seq: int) -> None: ...
async def delete_parts_before(self, seq: int, run_id: str | None = None) -> None: ...
async def close(self) -> None: ...
@@ -333,6 +375,7 @@ class NodeConversation:
compaction_threshold: float = 0.8,
output_keys: list[str] | None = None,
store: ConversationStore | None = None,
run_id: str | None = None,
) -> None:
self._system_prompt = system_prompt
self._max_context_tokens = max_context_tokens
@@ -344,6 +387,7 @@ class NodeConversation:
self._meta_persisted: bool = False
self._last_api_input_tokens: int | None = None
self._current_phase: str | None = None
self._run_id: str | None = run_id
# --- Properties --------------------------------------------------------
@@ -402,12 +446,16 @@ class NodeConversation:
role="user",
content=content,
phase_id=self._current_phase,
run_id=self._run_id,
is_transition_marker=is_transition_marker,
is_client_input=is_client_input,
image_content=image_content,
)
self._messages.append(msg)
self._next_seq += 1
# Invalidate stale API token count so estimate_tokens() uses
# the char-based heuristic which reflects the new message.
self._last_api_input_tokens = None
await self._persist(msg)
return msg
@@ -422,9 +470,11 @@ class NodeConversation:
content=content,
tool_calls=tool_calls,
phase_id=self._current_phase,
run_id=self._run_id,
)
self._messages.append(msg)
self._next_seq += 1
self._last_api_input_tokens = None
await self._persist(msg)
return msg
@@ -445,9 +495,11 @@ class NodeConversation:
phase_id=self._current_phase,
image_content=image_content,
is_skill_content=is_skill_content,
run_id=self._run_id,
)
self._messages.append(msg)
self._next_seq += 1
self._last_api_input_tokens = None
await self._persist(msg)
return msg
@@ -528,12 +580,15 @@ class NodeConversation:
Uses actual API input token count when available (set via
:meth:`update_token_count`), otherwise falls back to a
``total_chars / 4`` heuristic that includes both message content
AND tool_call argument sizes.
character-based heuristic that includes message content, tool_call
arguments, and image blocks. The heuristic applies a 4/3 safety
margin to avoid under-counting (inspired by Claude Code's compact
service).
"""
if self._last_api_input_tokens is not None:
return self._last_api_input_tokens
total_chars = 0
image_tokens = 0
for m in self._messages:
total_chars += len(m.content)
if m.tool_calls:
@@ -541,7 +596,11 @@ class NodeConversation:
func = tc.get("function", {})
total_chars += len(func.get("arguments", ""))
total_chars += len(func.get("name", ""))
return total_chars // 4
if m.image_content:
# Images/documents have a fixed token cost per block
image_tokens += len(m.image_content) * 2000
# Apply 4/3 safety margin to character-based estimate
return (total_chars * 4) // (3 * 4) + image_tokens
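# Worked example (sketch): 12,000 chars of text plus one image block
# -> (12_000 * 4) // 12 + 2_000 = 4_000 + 2_000 = 6_000 estimated tokens,
#    versus 3,000 from the bare chars/4 heuristic without the margin.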
def update_token_count(self, actual_input_tokens: int) -> None:
"""Store actual API input token count for more accurate compaction.
@@ -688,6 +747,7 @@ class NodeConversation:
is_error=msg.is_error,
phase_id=msg.phase_id,
is_transition_marker=msg.is_transition_marker,
run_id=msg.run_id,
)
count += 1
@@ -764,14 +824,14 @@ class NodeConversation:
summary_seq = self._next_seq
self._next_seq += 1
summary_msg = Message(seq=summary_seq, role="user", content=summary)
summary_msg = Message(seq=summary_seq, role="user", content=summary, run_id=self._run_id)
# Persist
if self._store:
delete_before = recent_messages[0].seq if recent_messages else self._next_seq
await self._store.delete_parts_before(delete_before)
await self._store.write_part(summary_msg.seq, summary_msg.to_storage_dict())
await self._store.write_cursor({"next_seq": self._next_seq})
await self._write_next_seq()
self._messages = [summary_msg] + recent_messages
self._last_api_input_tokens = None # reset; next LLM call will recalibrate
@@ -829,6 +889,15 @@ class NodeConversation:
freeform_lines: list[str] = []
collapsed_msgs: list[Message] = []
# Collect all tool_use IDs present in old messages so we can detect
# orphaned tool results whose parent assistant message was already
# compacted away (API invariant protection).
old_tc_ids: set[str] = set()
for msg in old_messages:
if msg.tool_calls:
for tc in msg.tool_calls:
old_tc_ids.add(tc.get("id", ""))
if aggressive:
# Aggressive: only keep set_output tool pairs and error results.
# Everything else is collapsed into a tool-call history summary.
@@ -850,9 +919,17 @@ class NodeConversation:
else:
collapsible_tc_ids |= tc_ids
# Skill content and transition markers are always protected
for msg in old_messages:
if msg.role == "tool" and msg.is_skill_content and msg.tool_use_id:
protected_tc_ids.add(msg.tool_use_id)
# Second pass: classify all messages
for msg in old_messages:
if msg.role == "tool":
if msg.is_transition_marker:
# Transition markers are always kept (phase boundaries)
kept_structural.append(msg)
elif msg.role == "tool":
tc_id = msg.tool_use_id or ""
if tc_id in protected_tc_ids:
kept_structural.append(msg)
@@ -861,6 +938,12 @@ class NodeConversation:
kept_structural.append(msg)
# Protect the parent assistant message too
protected_tc_ids.add(tc_id)
elif msg.is_skill_content:
kept_structural.append(msg)
elif tc_id and tc_id not in old_tc_ids:
# Orphaned tool result — parent tool_use not in old msgs.
# Keep it to maintain API invariants.
kept_structural.append(msg)
else:
collapsed_msgs.append(msg)
elif msg.role == "assistant" and msg.tool_calls:
@@ -877,6 +960,7 @@ class NodeConversation:
is_error=msg.is_error,
phase_id=msg.phase_id,
is_transition_marker=msg.is_transition_marker,
run_id=msg.run_id,
)
)
else:
@@ -891,7 +975,10 @@ class NodeConversation:
else:
# Standard mode: keep all tool call pairs as structural
for msg in old_messages:
if msg.role == "tool":
if msg.is_transition_marker:
# Transition markers are always kept (phase boundaries)
kept_structural.append(msg)
elif msg.role == "tool":
kept_structural.append(msg)
elif msg.role == "assistant" and msg.tool_calls:
compact_tcs = _compact_tool_calls(msg.tool_calls)
@@ -904,6 +991,7 @@ class NodeConversation:
is_error=msg.is_error,
phase_id=msg.phase_id,
is_transition_marker=msg.is_transition_marker,
run_id=msg.run_id,
)
)
else:
@@ -961,7 +1049,7 @@ class NodeConversation:
ref_seq = self._next_seq
self._next_seq += 1
ref_msg = Message(seq=ref_seq, role="user", content=ref_content)
ref_msg = Message(seq=ref_seq, role="user", content=ref_content, run_id=self._run_id)
# Persist: delete old messages from store, write reference + kept structural.
# In aggressive mode, collapsed messages may be interspersed with kept
@@ -975,7 +1063,7 @@ class NodeConversation:
# Write kept structural messages (they may have been modified)
for msg in kept_structural:
await self._store.write_part(msg.seq, msg.to_storage_dict())
await self._store.write_cursor({"next_seq": self._next_seq})
await self._write_next_seq()
# Reassemble: reference + kept structural (in original order) + recent
self._messages = [ref_msg] + kept_structural + recent_messages
@@ -1012,7 +1100,7 @@ class NodeConversation:
"""Remove all messages, keep system prompt, preserve ``_next_seq``."""
if self._store:
await self._store.delete_parts_before(self._next_seq)
await self._store.write_cursor({"next_seq": self._next_seq})
await self._write_next_seq()
self._messages.clear()
self._last_api_input_tokens = None
@@ -1054,22 +1142,32 @@ class NodeConversation:
if not self._meta_persisted:
await self._persist_meta()
await self._store.write_part(message.seq, message.to_storage_dict())
await self._store.write_cursor({"next_seq": self._next_seq})
await self._write_next_seq()
async def _persist_meta(self) -> None:
"""Lazily write conversation metadata to the store (called once)."""
"""Lazily write conversation metadata to the store (called once).
When ``self._run_id`` is set, metadata is written flat for backward
compatibility (run-scoped isolation has been reverted).
"""
if self._store is None:
return
await self._store.write_meta(
{
"system_prompt": self._system_prompt,
"max_context_tokens": self._max_context_tokens,
"compaction_threshold": self._compaction_threshold,
"output_keys": self._output_keys,
}
)
run_meta = {
"system_prompt": self._system_prompt,
"max_context_tokens": self._max_context_tokens,
"compaction_threshold": self._compaction_threshold,
"output_keys": self._output_keys,
}
await self._store.write_meta(run_meta)
self._meta_persisted = True
async def _write_next_seq(self) -> None:
if self._store is None:
return
cursor = await self._store.read_cursor() or {}
cursor["next_seq"] = self._next_seq
await self._store.write_cursor(cursor)
# --- Restore -----------------------------------------------------------
@classmethod
@@ -1077,6 +1175,7 @@ class NodeConversation:
cls,
store: ConversationStore,
phase_id: str | None = None,
run_id: str | None = None,
) -> NodeConversation | None:
"""Reconstruct a NodeConversation from a store.
@@ -1086,6 +1185,9 @@ class NodeConversation:
Used in isolated mode so a node only sees its own
messages in the shared flat store. In continuous mode
pass ``None`` to load all parts.
run_id: If set, only load parts matching this run_id.
Ensures intentional restarts (new run_id) start fresh
while crash recovery (same run_id) resumes correctly.
Returns ``None`` if the store contains no metadata (i.e. the
conversation was never persisted).
@@ -1100,17 +1202,23 @@ class NodeConversation:
compaction_threshold=meta.get("compaction_threshold", 0.8),
output_keys=meta.get("output_keys"),
store=store,
run_id=run_id,
)
conv._meta_persisted = True
parts = await store.read_parts()
if phase_id:
parts = [p for p in parts if p.get("phase_id") == phase_id]
# Filter by run_id so intentional restarts (new run_id) start fresh
# while crash recovery (same run_id) loads prior parts.
if run_id and not is_legacy_run_id(run_id):
parts = [p for p in parts if p.get("run_id") == run_id]
conv._messages = [Message.from_storage_dict(p) for p in parts]
cursor = await store.read_cursor()
if cursor:
conv._next_seq = cursor["next_seq"]
next_seq = get_cursor_next_seq(cursor)
if next_seq is not None:
conv._next_seq = next_seq
elif conv._messages:
conv._next_seq = conv._messages[-1].seq + 1
@@ -108,7 +108,7 @@ class EdgeSpec(BaseModel):
self,
source_success: bool,
source_output: dict[str, Any],
memory: dict[str, Any],
buffer_data: dict[str, Any],
llm: Any | None = None,
goal: Any | None = None,
source_node_name: str | None = None,
@@ -120,7 +120,7 @@ class EdgeSpec(BaseModel):
Args:
source_success: Whether the source node succeeded
source_output: Output from the source node
memory: Current shared memory state
buffer_data: Current data buffer state
llm: LLM provider for LLM_DECIDE edges
goal: Goal object for LLM_DECIDE edges
source_node_name: Name of source node (for LLM context)
@@ -139,7 +139,7 @@ class EdgeSpec(BaseModel):
return not source_success
if self.condition == EdgeCondition.CONDITIONAL:
return self._evaluate_condition(source_output, memory)
return self._evaluate_condition(source_output, buffer_data)
if self.condition == EdgeCondition.LLM_DECIDE:
if llm is None or goal is None:
@@ -150,7 +150,7 @@ class EdgeSpec(BaseModel):
goal=goal,
source_success=source_success,
source_output=source_output,
memory=memory,
buffer_data=buffer_data,
source_node_name=source_node_name,
target_node_name=target_node_name,
)
@@ -160,7 +160,7 @@ class EdgeSpec(BaseModel):
def _evaluate_condition(
self,
output: dict[str, Any],
memory: dict[str, Any],
buffer_data: dict[str, Any],
) -> bool:
"""Evaluate a conditional expression."""
@@ -168,14 +168,14 @@ class EdgeSpec(BaseModel):
return True
# Build evaluation context
# Include memory keys directly for easier access in conditions
# Include buffer keys directly for easier access in conditions
context = {
"output": output,
"memory": memory,
"buffer": buffer_data,
"result": output.get("result"),
"true": True, # Allow lowercase true/false in conditions
"false": False,
**memory, # Unpack memory keys directly into context
**buffer_data, # Unpack buffer keys directly into context
}
try:
@@ -186,7 +186,7 @@ class EdgeSpec(BaseModel):
expr_vars = {
k: repr(context[k])
for k in context
if k not in ("output", "memory", "result", "true", "false")
if k not in ("output", "buffer", "result", "true", "false")
and k in self.condition_expr
}
logger.info(
@@ -209,7 +209,7 @@ class EdgeSpec(BaseModel):
goal: Any,
source_success: bool,
source_output: dict[str, Any],
memory: dict[str, Any],
buffer_data: dict[str, Any],
source_node_name: str | None,
target_node_name: str | None,
) -> bool:
@@ -234,8 +234,8 @@ class EdgeSpec(BaseModel):
Should we proceed to: {target_node_name or self.target}?
Edge description: {self.description or "No description"}
**Context from memory**:
{json.dumps({k: str(v)[:100] for k, v in list(memory.items())[:5]}, indent=2)}
**Context from data buffer**:
{json.dumps({k: str(v)[:100] for k, v in list(buffer_data.items())[:5]}, indent=2)}
Evaluate whether proceeding to this next node is the right step toward achieving the goal.
Consider:
@@ -276,14 +276,14 @@ Respond with ONLY a JSON object:
def map_inputs(
self,
source_output: dict[str, Any],
memory: dict[str, Any],
buffer_data: dict[str, Any],
) -> dict[str, Any]:
"""
Map source outputs to target inputs.
Args:
source_output: Output from source node
memory: Current shared memory
buffer_data: Current data buffer
Returns:
Input dict for target node
@@ -294,72 +294,14 @@ Respond with ONLY a JSON object:
result = {}
for target_key, source_key in self.input_mapping.items():
# Try source output first, then memory
# Try source output first, then buffer
if source_key in source_output:
result[target_key] = source_output[source_key]
elif source_key in memory:
result[target_key] = memory[source_key]
elif source_key in buffer_data:
result[target_key] = buffer_data[source_key]
return result
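A simplified stand-in for how the condition evaluator exposes buffer keys: unpacking buffer_data into the namespace lets expressions name keys directly. The `eval` call with an empty `__builtins__` dict mirrors the context shown above, though the real evaluator may differ:

# Buffer keys are unpacked directly, so condition expressions can name them.
output = {"result": "ok"}
buffer_data = {"retries": 1, "approved": True}
context = {
    "output": output,
    "buffer": buffer_data,
    "result": output.get("result"),
    "true": True,   # allow lowercase true/false in conditions
    "false": False,
    **buffer_data,  # unpack buffer keys directly into the context
}
condition_expr = "approved and retries < 3"
assert eval(condition_expr, {"__builtins__": {}}, context)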
class AsyncEntryPointSpec(BaseModel):
"""
Specification for an asynchronous entry point.
Used with AgentRuntime for multi-entry-point agents that handle
concurrent execution streams (e.g., webhook + API handlers).
Example:
AsyncEntryPointSpec(
id="webhook",
name="Zendesk Webhook Handler",
entry_node="process-webhook",
trigger_type="webhook",
isolation_level="shared",
)
"""
id: str = Field(description="Unique identifier for this entry point")
name: str = Field(description="Human-readable name")
entry_node: str = Field(
default="",
description="Deprecated: Node ID to start execution from. "
"Triggers are graph-level; worker always enters at GraphSpec.entry_node.",
)
trigger_type: str = Field(
default="manual",
description="How this entry point is triggered: webhook, api, timer, event, manual",
)
trigger_config: dict[str, Any] = Field(
default_factory=dict,
description="Trigger-specific configuration (e.g., webhook URL, timer interval)",
)
task: str = Field(
default="",
description="Worker task string when this trigger fires autonomously",
)
isolation_level: str = Field(
default="shared", description="State isolation: isolated, shared, or synchronized"
)
priority: int = Field(default=0, description="Execution priority (higher = more priority)")
max_concurrent: int = Field(
default=10, description="Maximum concurrent executions for this entry point"
)
max_resurrections: int = Field(
default=3,
description="Auto-restart on non-fatal failure (0 to disable)",
)
model_config = {"extra": "allow"}
def get_isolation_level(self):
"""Convert string isolation level to enum (duck-type with EntryPointSpec)."""
from framework.runtime.execution_stream import IsolationLevel
return IsolationLevel(self.isolation_level)
class GraphSpec(BaseModel):
"""
Complete specification of an agent graph.
@@ -403,9 +345,9 @@ class GraphSpec(BaseModel):
)
edges: list[EdgeSpec] = Field(default_factory=list, description="All edge specifications")
# Shared memory keys
memory_keys: list[str] = Field(
default_factory=list, description="Keys available in shared memory"
# Data buffer keys
buffer_keys: list[str] = Field(
default_factory=list, description="Keys available in data buffer"
)
# Default LLM settings
@@ -609,21 +551,16 @@ class GraphSpec(BaseModel):
continue
errors.append(f"Node '{node.id}' is unreachable from entry")
# Client-facing fan-out validation
fan_outs = self.detect_fan_out_nodes()
for source_id, targets in fan_outs.items():
client_facing_targets = [
t
for t in targets
if self.get_node(t) and getattr(self.get_node(t), "client_facing", False)
]
if len(client_facing_targets) > 1:
errors.append(
f"Fan-out from '{source_id}' has multiple client-facing nodes: "
f"{client_facing_targets}. Only one branch may be client-facing."
for node in self.nodes:
if getattr(node, "client_facing", False) and getattr(node, "id", "") != "queen":
warnings.append(
f"Node '{node.id}' sets deprecated client_facing=True. "
"Only the queen talks directly to users now; migrate this node "
"to queen-mediated escalation."
)
# Output key overlap on parallel event_loop nodes
fan_outs = self.detect_fan_out_nodes()
for source_id, targets in fan_outs.items():
event_loop_targets = [
t
+238 -24
View File
@@ -1,7 +1,8 @@
"""Conversation compaction pipeline.
Implements the multi-level compaction strategy:
1. Prune old tool results
0. Microcompaction (count-based tool result clearing; cheapest)
1. Prune old tool results (token-budget based)
2. Structure-preserving compaction (spillover)
3. LLM summary compaction (with recursive splitting)
4. Emergency deterministic summary (no LLM)
@@ -13,11 +14,12 @@ import json
import logging
import os
import re
import time
from datetime import UTC, datetime
from pathlib import Path
from typing import Any
from framework.graph.conversation import NodeConversation
from framework.graph.conversation import Message, NodeConversation
from framework.graph.event_loop.event_publishing import publish_context_usage
from framework.graph.event_loop.types import LoopConfig, OutputAccumulator
from framework.graph.node import NodeContext
@@ -29,6 +31,121 @@ logger = logging.getLogger(__name__)
LLM_COMPACT_CHAR_LIMIT: int = 240_000
LLM_COMPACT_MAX_DEPTH: int = 10
# Microcompaction: tools whose results can be safely cleared
COMPACTABLE_TOOLS: frozenset[str] = frozenset(
{
"read_file",
"run_command",
"web_search",
"web_fetch",
"grep_search",
"glob_search",
"write_file",
"edit_file",
"browser_screenshot",
"list_directory",
}
)
# Keep at most this many compactable tool results; clear older ones
MICROCOMPACT_KEEP_RECENT: int = 8
# Circuit-breaker: stop auto-compacting after this many consecutive failures
MAX_CONSECUTIVE_FAILURES: int = 3
# Track consecutive compaction failures per conversation (module-level)
_failure_counts: dict[int, int] = {}
# Track last compaction time per conversation for recompaction detection
_last_compact_times: dict[int, float] = {}
def microcompact(
conversation: NodeConversation,
*,
keep_recent: int = MICROCOMPACT_KEEP_RECENT,
) -> int:
"""Clear old compactable tool results by count, keeping only the most recent.
This is the cheapest possible compaction: no LLM call, no structural
changes; it just replaces old tool result content with a short placeholder.
Inspired by Claude Code's cached-microcompact strategy.
Returns the number of tool results cleared.
"""
# Collect indices of compactable tool results (newest first)
compactable_indices: list[int] = []
messages = conversation.messages
for i in range(len(messages) - 1, -1, -1):
msg = messages[i]
if msg.role != "tool" or msg.is_error or msg.is_skill_content:
continue
if msg.content.startswith(("[Pruned tool result", "[Old tool result")):
continue
if len(msg.content) < 100:
continue
# Check if the tool that produced this result is compactable
tool_name = _find_tool_name_for_result(messages, msg)
if tool_name and tool_name in COMPACTABLE_TOOLS:
compactable_indices.append(i)
# Keep the most recent N, clear the rest
to_clear = compactable_indices[keep_recent:]
if not to_clear:
return 0
cleared = 0
for i in to_clear:
msg = messages[i]
spillover = _extract_spillover_filename_inline(msg.content)
orig_len = len(msg.content)
if spillover:
placeholder = (
f"[Old tool result cleared: {orig_len} chars. "
f"Full data in '{spillover}'. "
f"Use load_data('{spillover}') to retrieve.]"
)
else:
placeholder = f"[Old tool result cleared: {orig_len} chars.]"
# Mutate in-place (microcompact is synchronous, no store writes)
conversation._messages[i] = Message(
seq=msg.seq,
role=msg.role,
content=placeholder,
tool_use_id=msg.tool_use_id,
tool_calls=msg.tool_calls,
is_error=msg.is_error,
phase_id=msg.phase_id,
is_transition_marker=msg.is_transition_marker,
)
cleared += 1
if cleared > 0:
# Invalidate cached token count
conversation._last_api_input_tokens = None
return cleared
def _find_tool_name_for_result(messages: list[Message], tool_msg: Message) -> str | None:
"""Find the tool name from the assistant message that triggered this tool result."""
if not tool_msg.tool_use_id:
return None
for msg in messages:
if msg.tool_calls:
for tc in msg.tool_calls:
if tc.get("id") == tool_msg.tool_use_id:
return tc.get("function", {}).get("name")
return None
def _extract_spillover_filename_inline(content: str) -> str | None:
"""Quick inline check for spillover filename in tool result content."""
match = re.search(r"saved to '([^']+)'", content, re.IGNORECASE)
return match.group(1) if match else None
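The keep-newest-N selection above reduces to a single slice once indices are collected newest-first; a toy model of that step:

# Indices arrive newest-first, so slicing off the first keep_recent
# entries leaves exactly the older results to clear.
compactable_indices = [9, 7, 4, 2, 1]   # newest -> oldest message indices
keep_recent = 2
to_clear = compactable_indices[keep_recent:]
assert to_clear == [4, 2, 1]            # these get placeholder content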
async def compact(
ctx: NodeContext,
@@ -43,11 +160,31 @@ async def compact(
"""Run the full compaction pipeline if conversation needs compaction.
Pipeline stages (in order, short-circuits when budget is restored):
1. Prune old tool results
0. Microcompaction (count-based tool result clearing; cheapest)
1. Prune old tool results (token-budget based)
2. Structure-preserving compaction (free, no LLM)
3. LLM summary compaction (recursive split if too large)
4. Emergency deterministic summary (fallback)
"""
conv_id = id(conversation)
# Circuit breaker: stop auto-compacting after repeated failures
if _failure_counts.get(conv_id, 0) >= MAX_CONSECUTIVE_FAILURES:
logger.warning(
"Circuit breaker: skipping compaction after %d consecutive failures",
_failure_counts[conv_id],
)
return
# Recompaction detection
now = time.monotonic()
last_time = _last_compact_times.get(conv_id)
if last_time is not None and (now - last_time) < 30:
logger.warning(
"Recompaction chain detected: only %.1fs since last compaction",
now - last_time,
)
ratio_before = conversation.usage_ratio()
phase_grad = getattr(ctx, "continuous_mode", False)
pre_inventory: list[dict[str, Any]] | None = None
@@ -55,6 +192,26 @@ async def compact(
if ratio_before >= 1.0:
pre_inventory = build_message_inventory(conversation)
# --- Step 0: Microcompaction (count-based, cheapest) ---
mc_cleared = microcompact(conversation)
if mc_cleared > 0:
logger.info(
"Microcompact cleared %d old tool results: %.0f%% -> %.0f%%",
mc_cleared,
ratio_before * 100,
conversation.usage_ratio() * 100,
)
if not conversation.needs_compaction():
_record_success(conv_id, now)
await log_compaction(
ctx,
conversation,
ratio_before,
event_bus,
pre_inventory=pre_inventory,
)
return
# --- Step 1: Prune old tool results (free, fast) ---
protect = max(2000, config.max_context_tokens // 12)
pruned = await conversation.prune_old_tool_results(
@@ -69,6 +226,7 @@ async def compact(
conversation.usage_ratio() * 100,
)
if not conversation.needs_compaction():
_record_success(conv_id, now)
await log_compaction(
ctx,
conversation,
@@ -87,6 +245,7 @@ async def compact(
phase_graduated=phase_grad,
)
if not conversation.needs_compaction():
_record_success(conv_id, now)
await log_compaction(
ctx,
conversation,
@@ -118,8 +277,10 @@ async def compact(
)
except Exception as e:
logger.warning("LLM compaction failed: %s", e)
_failure_counts[conv_id] = _failure_counts.get(conv_id, 0) + 1
if not conversation.needs_compaction():
_record_success(conv_id, now)
await log_compaction(
ctx,
conversation,
@@ -140,6 +301,7 @@ async def compact(
keep_recent=1,
phase_graduated=phase_grad,
)
_record_success(conv_id, now)
await log_compaction(
ctx,
conversation,
@@ -149,9 +311,46 @@ async def compact(
)
def _record_success(conv_id: int, timestamp: float) -> None:
"""Reset failure counter and record compaction time on success."""
_failure_counts.pop(conv_id, None)
_last_compact_times[conv_id] = timestamp
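The breaker and reset logic above boil down to a counter keyed by id(conversation); a self-contained sketch of that behaviour (names local to the example):

failure_counts: dict[int, int] = {}
MAX_CONSECUTIVE_FAILURES = 3

def breaker_open(conv_id: int) -> bool:
    return failure_counts.get(conv_id, 0) >= MAX_CONSECUTIVE_FAILURES

conv_id = 1
for _ in range(MAX_CONSECUTIVE_FAILURES):     # three compaction failures in a row
    failure_counts[conv_id] = failure_counts.get(conv_id, 0) + 1
assert breaker_open(conv_id)                  # further auto-compaction is skipped
failure_counts.pop(conv_id, None)             # what _record_success does
assert not breaker_open(conv_id)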
# --- LLM compaction with binary-search splitting ----------------------
def strip_images_from_messages(messages: list[Message]) -> list[Message]:
"""Strip image_content from messages before LLM summarisation.
Images/documents are replaced with ``[image]`` markers so the summary
notes they existed without wasting tokens sending binary data to the
compaction LLM. Returns a new list (original messages are not mutated).
"""
stripped: list[Message] = []
for msg in messages:
if msg.image_content:
n_images = len(msg.image_content)
marker = " ".join("[image]" for _ in range(n_images))
content = f"{msg.content}\n{marker}" if msg.content else marker
stripped.append(
Message(
seq=msg.seq,
role=msg.role,
content=content,
tool_use_id=msg.tool_use_id,
tool_calls=msg.tool_calls,
is_error=msg.is_error,
phase_id=msg.phase_id,
is_transition_marker=msg.is_transition_marker,
image_content=None, # stripped
)
)
else:
stripped.append(msg)
return stripped
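The marker construction is the only non-obvious step; a toy version of the rule, detached from the Message dataclass:

def with_image_markers(content: str, n_images: int) -> str:
    marker = " ".join("[image]" for _ in range(n_images))
    return f"{content}\n{marker}" if content else marker

assert with_image_markers("see chart", 2) == "see chart\n[image] [image]"
assert with_image_markers("", 1) == "[image]"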
async def llm_compact(
ctx: NodeContext,
messages: list,
@@ -175,6 +374,10 @@ async def llm_compact(
if _depth > max_depth:
raise RuntimeError(f"LLM compaction recursion limit ({max_depth})")
# Strip images before summarisation to avoid wasting tokens
if _depth == 0:
messages = strip_images_from_messages(messages)
formatted = format_messages_for_summary(messages)
# Proactive split: avoid wasting an API call on oversized input
@@ -297,7 +500,12 @@ def build_llm_compaction_prompt(
*,
max_context_tokens: int = 128_000,
) -> str:
"""Build prompt for LLM compaction targeting 50% of token budget."""
"""Build prompt for LLM compaction targeting 50% of token budget.
Uses a structured section format inspired by Claude Code's compact
service. Each section focuses on a different aspect of the conversation
so the summariser produces consistently useful, well-organised output.
"""
spec = ctx.node_spec
ctx_lines = [f"NODE: {spec.name} (id={spec.id})"]
if spec.description:
@@ -330,13 +538,30 @@ def build_llm_compaction_prompt(
f"CONVERSATION MESSAGES:\n{formatted_messages}\n\n"
"INSTRUCTIONS:\n"
f"Write a summary of approximately {target_chars} characters "
f"(~{target_tokens} tokens).\n"
"1. Preserve ALL user-stated rules, constraints, and preferences "
"verbatim.\n"
"2. Preserve key decisions made and results obtained.\n"
"3. Preserve in-progress work state so the agent can continue.\n"
"4. Be detailed enough that the agent can resume without "
"re-doing work.\n"
f"(~{target_tokens} tokens).\n\n"
"Organise the summary into these sections (omit empty ones):\n\n"
"1. **Primary Request and Intent** — What the user originally asked "
"for and the high-level goal the agent is working toward.\n"
"2. **Key Technical Concepts** — Important domain-specific terms, "
"patterns, or architectural decisions established in the conversation.\n"
"3. **Files and Code Sections** — Specific files read/written/edited "
"with brief descriptions of changes. Include short code snippets only "
"when they capture critical logic.\n"
"4. **Errors and Fixes** — Problems encountered and how they were "
"resolved. Include root causes so the agent doesn't repeat them.\n"
"5. **Problem Solving Efforts** — Approaches tried, dead ends hit, "
"and reasoning behind the current strategy.\n"
"6. **User Messages** — Preserve ALL user-stated rules, constraints, "
"identity preferences, and account details verbatim.\n"
"7. **Pending Tasks** — Work remaining, outputs still needed, and "
"any blockers.\n"
"8. **Current Work** — The most recent action taken and the immediate "
"next step the agent should perform. This section is the most important "
"for seamless resumption.\n\n"
"Additional rules:\n"
"- Be detailed enough that the agent can resume without re-doing work.\n"
"- Preserve key decisions made and results obtained.\n"
"- When in doubt, keep information rather than discard it.\n"
)
@@ -551,7 +776,7 @@ def build_emergency_summary(
# 2. Inputs the node received
input_lines = []
for key in spec.input_keys:
value = ctx.input_data.get(key) or ctx.memory.read(key)
value = ctx.input_data.get(key) or ctx.buffer.read(key)
if value is not None:
# Truncate long values but keep them recognisable
v_str = str(value)
@@ -580,8 +805,6 @@ def build_emergency_summary(
# 5. Spillover files — list actual files so the LLM can load
# them immediately instead of having to call list_data_files first.
# Inline adapt.md (agent memory) directly — it contains user rules
# and identity preferences that must survive emergency compaction.
spillover_dir = config.spillover_dir if config else None
if spillover_dir:
try:
@@ -589,16 +812,7 @@ def build_emergency_summary(
data_dir = Path(spillover_dir)
if data_dir.is_dir():
# Inline adapt.md content directly
adapt_path = data_dir / "adapt.md"
if adapt_path.is_file():
adapt_text = adapt_path.read_text(encoding="utf-8").strip()
if adapt_text:
parts.append(f"AGENT MEMORY (adapt.md):\n{adapt_text}")
all_files = sorted(
f.name for f in data_dir.iterdir() if f.is_file() and f.name != "adapt.md"
)
all_files = sorted(f.name for f in data_dir.iterdir() if f.is_file())
# Separate conversation history files from regular data files
conv_files = [f for f in all_files if re.match(r"conversation_\d+\.md$", f)]
data_files = [f for f in all_files if f not in conv_files]
@@ -31,6 +31,7 @@ class RestoredState:
start_iteration: int
recent_responses: list[str]
recent_tool_fingerprints: list[list[tuple[str, str]]]
pending_input: dict[str, Any] | None
async def restore(
@@ -56,24 +57,34 @@ async def restore(
conversation = await NodeConversation.restore(
conversation_store,
phase_id=phase_filter,
run_id=ctx.effective_run_id,
)
if conversation is None:
return None
accumulator = await OutputAccumulator.restore(conversation_store)
# If run_id filtering removed all messages, this is an intentional
# restart (new run), not a crash recovery. Return None so the caller
# falls through to the fresh-conversation path.
if conversation.message_count == 0:
return None
accumulator = await OutputAccumulator.restore(conversation_store, run_id=ctx.effective_run_id)
accumulator.spillover_dir = config.spillover_dir
accumulator.max_value_chars = config.max_output_value_chars
cursor = await conversation_store.read_cursor()
start_iteration = cursor.get("iteration", 0) + 1 if cursor else 0
cursor = await conversation_store.read_cursor() or {}
start_iteration = cursor.get("iteration", 0) + 1
# Restore stall/doom-loop detection state
recent_responses: list[str] = cursor.get("recent_responses", []) if cursor else []
raw_fps = cursor.get("recent_tool_fingerprints", []) if cursor else []
recent_responses: list[str] = cursor.get("recent_responses", [])
raw_fps = cursor.get("recent_tool_fingerprints", [])
recent_tool_fingerprints: list[list[tuple[str, str]]] = [
[tuple(pair) for pair in fps] # type: ignore[misc]
for fps in raw_fps
]
pending_input = cursor.get("pending_input")
if not isinstance(pending_input, dict):
pending_input = None
logger.info(
f"Restored event loop: iteration={start_iteration}, "
@@ -88,6 +99,7 @@ async def restore(
start_iteration=start_iteration,
recent_responses=recent_responses,
recent_tool_fingerprints=recent_tool_fingerprints,
pending_input=pending_input,
)
@@ -100,6 +112,7 @@ async def write_cursor(
*,
recent_responses: list[str] | None = None,
recent_tool_fingerprints: list[list[tuple[str, str]]] | None = None,
pending_input: dict[str, Any] | None = None,
) -> None:
"""Write checkpoint cursor for crash recovery.
@@ -112,7 +125,6 @@ async def write_cursor(
{
"iteration": iteration,
"node_id": ctx.node_id,
"next_seq": conversation.next_seq,
"outputs": accumulator.to_dict(),
}
)
@@ -124,6 +136,9 @@ async def write_cursor(
cursor["recent_tool_fingerprints"] = [
[list(pair) for pair in fps] for fps in recent_tool_fingerprints
]
# Persist blocked-input state so restored runs re-block instead of
# manufacturing a synthetic continuation turn.
cursor["pending_input"] = pending_input
await conversation_store.write_cursor(cursor)
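A sketch of the pending_input round-trip, using a plain dict where the real code goes through ConversationStore:

cursor = {"iteration": 4, "node_id": "worker", "outputs": {}}
cursor["pending_input"] = {"prompt": "Approve deploy?"}   # blocked awaiting user

# On restore, non-dict values are normalised to None (defensive read).
pending = cursor.get("pending_input")
if not isinstance(pending, dict):
    pending = None
assert pending == {"prompt": "Approve deploy?"}   # run re-blocks instead of
                                                  # manufacturing a synthetic turn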
@@ -138,6 +153,7 @@ async def drain_injection_queue(
) -> int:
"""Drain all pending injected events as user messages. Returns count."""
count = 0
logger.debug("[drain_injection_queue] Starting to drain queue, initial queue size: %s", queue.qsize() if hasattr(queue, 'qsize') else 'unknown')
while not queue.empty():
try:
content, is_client_input, image_content = queue.get_nowait()
@@ -228,7 +244,7 @@ async def check_pause(
pause_requested = ctx.input_data.get("pause_requested", False)
if not pause_requested:
try:
pause_requested = ctx.memory.read("pause_requested") or False
pause_requested = ctx.buffer.read("pause_requested") or False
except (PermissionError, KeyError):
pause_requested = False
if pause_requested:
@@ -226,7 +226,7 @@ async def publish_text_delta(
inner_turn: int = 0,
) -> None:
if event_bus:
if ctx.node_spec.client_facing:
if ctx.emits_client_io:
await event_bus.emit_client_output_delta(
stream_id=stream_id,
node_id=node_id,
@@ -139,9 +139,9 @@ async def judge_turn(
),
)
# Client-facing with no output keys → continuous interaction node.
# Queen with no output keys → continuous interaction node.
# Inject tool-use pressure instead of auto-accepting.
if not output_keys and ctx.node_spec.client_facing:
if not output_keys and ctx.supports_direct_user_io:
return JudgeVerdict(
action="RETRY",
feedback=(
@@ -1,8 +1,7 @@
"""Subagent execution for the event loop.
Handles the full subagent lifecycle: validation, context setup, tool filtering,
conversation store derivation, execution, and cleanup. Also includes the
_EscalationReceiver helper used for subagent queen escalation routing.
conversation store derivation, execution, and cleanup.
"""
from __future__ import annotations
@@ -18,7 +17,7 @@ from typing import TYPE_CHECKING, Any
from framework.graph.conversation import ConversationStore
from framework.graph.event_loop.judge_pipeline import SubagentJudge
from framework.graph.event_loop.types import LoopConfig, OutputAccumulator
from framework.graph.node import NodeContext, SharedMemory
from framework.graph.node import DataBuffer, NodeContext
from framework.llm.provider import ToolResult, ToolUse
from framework.runtime.event_bus import EventBus
@@ -28,39 +27,6 @@ if TYPE_CHECKING:
logger = logging.getLogger(__name__)
class EscalationReceiver:
"""Temporary receiver registered in node_registry for subagent escalation routing.
When a subagent calls ``report_to_parent(wait_for_response=True)``, the callback
creates one of these, registers it under a unique escalation ID in the executor's
``node_registry``, and awaits ``wait()``. The TUI / runner calls
``inject_input(escalation_id, content)`` which the ``ExecutionStream`` routes here
via ``inject_event()`` matching the same ``hasattr(node, "inject_event")`` check
used for regular ``EventLoopNode`` instances.
"""
def __init__(self) -> None:
self._event = asyncio.Event()
self._response: str | None = None
self._awaiting_input = True # So inject_worker_message() can prefer us
async def inject_event(
self,
content: str,
*,
is_client_input: bool = False,
image_content: list[dict[str, Any]] | None = None,
) -> None:
"""Called by ExecutionStream.inject_input() when the user responds."""
self._response = content
self._event.set()
async def wait(self) -> str | None:
"""Block until inject_event() delivers the user's response."""
await self._event.wait()
return self._response
async def execute_subagent(
ctx: NodeContext,
agent_id: str,
@@ -68,7 +34,7 @@ async def execute_subagent(
*,
config: LoopConfig,
event_loop_node_cls: type[EventLoopNode],
escalation_receiver_cls: type[EscalationReceiver],
escalation_receiver_cls: Callable[[], Any],
accumulator: OutputAccumulator | None = None,
event_bus: EventBus | None = None,
tool_executor: Callable[[ToolUse], ToolResult | Awaitable[ToolResult]] | None = None,
@@ -127,7 +93,7 @@ async def execute_subagent(
subagent_spec = ctx.node_registry[agent_id]
# 2. Create read-only memory snapshot
parent_data = ctx.memory.read_all()
parent_data = ctx.buffer.read_all()
# Merge in-flight outputs from the parent's accumulator.
if accumulator:
@@ -135,12 +101,12 @@ async def execute_subagent(
if key not in parent_data:
parent_data[key] = value
subagent_memory = SharedMemory()
subagent_buffer = DataBuffer()
for key, value in parent_data.items():
subagent_memory.write(key, value, validate=False)
subagent_buffer.write(key, value, validate=False)
read_keys = set(parent_data.keys()) | set(subagent_spec.input_keys or [])
scoped_memory = subagent_memory.with_permissions(
scoped_buffer = subagent_buffer.with_permissions(
read_keys=list(read_keys),
write_keys=[], # Read-only!
)
@@ -252,7 +218,7 @@ async def execute_subagent(
runtime=ctx.runtime,
node_id=sa_node_id,
node_spec=subagent_spec,
memory=scoped_memory,
buffer=scoped_buffer,
input_data={"task": task, **parent_data},
llm=ctx.llm,
available_tools=subagent_tools,
@@ -307,14 +273,28 @@ async def execute_subagent(
conversation_store=subagent_conv_store,
)
# Inject a unique GCU browser profile for this subagent
_profile_token = None
try:
from gcu.browser.session import set_active_profile as _set_gcu_profile
# Each subagent instance gets its own unique browser profile so concurrent
# subagents don't share tab groups. The profile is injected into every
# browser_* tool call by wrapping the tool executor.
_gcu_profile = f"{agent_id}:{subagent_instance}"
_original_tool_executor = None
_profile_token = _set_gcu_profile(f"{agent_id}-{subagent_instance}")
except ImportError:
pass # GCU tools not installed; no-op
if tool_executor is not None:
_original_tool_executor = tool_executor
async def _gcu_profile_injecting_executor(
tool_use: ToolUse,
) -> ToolResult | Awaitable[ToolResult]:
if tool_use.name.startswith("browser_") and "profile" not in (tool_use.input or {}):
from dataclasses import replace
tool_use = replace(tool_use, input={**(tool_use.input or {}), "profile": _gcu_profile})
result = _original_tool_executor(tool_use)
if asyncio.isfuture(result) or asyncio.iscoroutine(result):
return await result
return result
tool_executor = _gcu_profile_injecting_executor
try:
logger.info("🚀 Starting subagent '%s' execution...", agent_id)
@@ -386,27 +366,16 @@ async def execute_subagent(
is_error=True,
)
finally:
# Restore the GCU profile context
if _profile_token is not None:
from gcu.browser.session import _active_profile as _gcu_profile_var
_gcu_profile_var.reset(_profile_token)
# Stop the browser session for this subagent's profile
if tool_executor is not None:
_subagent_profile = f"{agent_id}-{subagent_instance}"
try:
_stop_use = ToolUse(
id="gcu-cleanup",
name="browser_stop",
input={"profile": _subagent_profile},
)
_stop_result = tool_executor(_stop_use)
if asyncio.iscoroutine(_stop_result) or asyncio.isfuture(_stop_result):
await _stop_result
except Exception as _gcu_exc:
logger.warning(
"GCU browser_stop failed for profile %r: %s",
_subagent_profile,
_gcu_exc,
)
# Close the tab group this subagent created, if any.
if _original_tool_executor is not None:
try:
stop_call = ToolUse(
id="__subagent_cleanup__",
name="browser_stop",
input={"profile": _gcu_profile},
)
result = _original_tool_executor(stop_call)
if asyncio.isfuture(result) or asyncio.iscoroutine(result):
await result
except Exception:
pass
@@ -18,7 +18,7 @@ from framework.llm.provider import Tool, ToolResult
def build_ask_user_tool() -> Tool:
"""Build the synthetic ask_user tool for explicit user-input requests.
Client-facing nodes call ask_user() when they need to pause and wait
The queen calls ask_user() when it needs to pause and wait
for user input. Text-only turns WITHOUT ask_user flow through without
blocking, allowing progress updates and summaries to stream freely.
"""
@@ -0,0 +1,151 @@
"""Streaming XML tag filter for thinking tags.
Strips configured XML tags (e.g. ``<situation>``, ``<monologue>``) from
a chunked text stream while preserving the full text for conversation
storage. The filter is stateful: it handles chunks that split mid-tag.
Only touches text content. Tool calls flow through a completely separate
code path and are never affected by this filter.
"""
from __future__ import annotations
from collections.abc import Sequence
class ThinkingTagFilter:
"""Strips XML thinking tags from a streaming text output.
Buffers content inside configured tags and yields only the visible
content outside those tags. Handles chunks that split across tag
boundaries (e.g. a chunk ending with ``"<mono"``).
Args:
tag_names: Tag names to strip (e.g. ``["situation", "monologue"]``).
"""
def __init__(self, tag_names: Sequence[str]) -> None:
self._tag_names: set[str] = set(tag_names)
# Pre-compute all opening and closing tag strings for matching.
self._open_tags: dict[str, str] = {name: f"<{name}>" for name in tag_names}
self._close_tags: dict[str, str] = {name: f"</{name}>" for name in tag_names}
# All possible tag prefixes for partial-match detection.
self._all_tag_strings: list[str] = sorted(
list(self._open_tags.values()) + list(self._close_tags.values()),
key=len,
reverse=True,
)
self._inside_tag: str | None = None # Which tag we're inside, or None.
self._pending: str = "" # Chars that might be a partial tag.
self._visible_text: str = "" # Accumulated visible snapshot.
def feed(self, chunk: str) -> str:
"""Feed a text chunk and return the visible portion.
Characters inside thinking tags are suppressed. Characters that
*might* be the start of a tag are buffered until the next chunk
resolves the ambiguity.
Returns:
The portion of text that should be shown to the user.
"""
buf = self._pending + chunk
self._pending = ""
visible = self._process(buf)
self._visible_text += visible
return visible
@property
def visible_snapshot(self) -> str:
"""Accumulated visible text so far (for the snapshot field)."""
return self._visible_text
def flush(self) -> str:
"""Flush any pending partial tag as visible text.
Called at end-of-stream. If characters were buffered because they
looked like the start of a tag but the stream ended before the tag
completed, they are emitted as visible text (graceful degradation).
"""
result = ""
if self._pending:
if self._inside_tag is None:
result = self._pending
# If inside a tag, discard pending (unclosed tag content).
self._pending = ""
self._visible_text += result
return result
# ------------------------------------------------------------------
# Internal processing
# ------------------------------------------------------------------
def _process(self, buf: str) -> str:
"""Process a buffer, returning visible text and updating state."""
visible_parts: list[str] = []
i = 0
n = len(buf)
while i < n:
if self._inside_tag is not None:
# Inside a tag — look for the closing tag.
close = self._close_tags[self._inside_tag]
close_pos = buf.find(close, i)
if close_pos == -1:
# Closing tag might be split across chunks.
# Check if the tail of buf is a prefix of the close tag.
tail_len = min(len(close) - 1, n - i)
for tl in range(tail_len, 0, -1):
if close.startswith(buf[n - tl :]):
self._pending = buf[n - tl :]
i = n
break
else:
# No partial match — discard everything (inside tag).
i = n
break
else:
# Found closing tag — skip past it and exit tag.
i = close_pos + len(close)
self._inside_tag = None
else:
# Outside any tag — look for '<'.
lt_pos = buf.find("<", i)
if lt_pos == -1:
# No '<' — everything is visible.
visible_parts.append(buf[i:])
i = n
else:
# Emit text before the '<'.
if lt_pos > i:
visible_parts.append(buf[i:lt_pos])
# Try to match an opening tag at this position.
remainder = buf[lt_pos:]
matched = False
for name, open_tag in self._open_tags.items():
if remainder.startswith(open_tag):
# Full opening tag found — enter tag.
self._inside_tag = name
i = lt_pos + len(open_tag)
matched = True
break
if not matched:
# Check if remainder could be a partial tag prefix.
if self._is_partial_tag_prefix(remainder):
# Buffer and wait for next chunk.
self._pending = remainder
i = n
else:
# Not a known tag — '<' is visible text.
visible_parts.append("<")
i = lt_pos + 1
return "".join(visible_parts)
def _is_partial_tag_prefix(self, text: str) -> bool:
"""Check if text could be the start of a known tag string."""
for tag_str in self._all_tag_strings:
if tag_str.startswith(text) and len(text) < len(tag_str):
return True
return False
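A usage sketch exercising the mid-tag split case the class is built for:

f = ThinkingTagFilter(["monologue"])
visible = f.feed("Hello <mono")       # "<mono" might open a tag: buffered
visible += f.feed("logue>secret plan</monologue> world")
visible += f.flush()
assert visible == "Hello  world"      # tag content suppressed
assert f.visible_snapshot == "Hello  world"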
@@ -8,6 +8,7 @@ the context-window-exceeded error detector.
from __future__ import annotations
import asyncio
import contextvars
import json
import logging
import re
@@ -221,7 +222,7 @@ def truncate_tool_result(
- Small results (≤ limit): full content kept + file annotation
- Large results (> limit): preview + file reference
- Errors: pass through unchanged
- load_data results: truncate with pagination hint (no re-spill)
- read_file/load_data results: truncate with pagination hint (no re-spill)
"""
limit = max_tool_result_chars
@@ -229,12 +230,12 @@ def truncate_tool_result(
if result.is_error:
return result
# load_data reads FROM spilled files — never re-spill (circular).
# read_file/load_data reads FROM spilled files — never re-spill (circular).
# Just truncate with a pagination hint if the result is too large.
if tool_name == "load_data":
if tool_name in ("load_data", "read_file"):
if limit <= 0 or len(result.content) <= limit:
return result # Small load_data result — pass through as-is
# Large load_data result — truncate with smart preview
return result # Small result — pass through as-is
# Large result — truncate with smart preview
PREVIEW_CAP = min(5000, max(limit - 500, limit // 2))
metadata_str = ""
@@ -283,7 +284,7 @@ def truncate_tool_result(
spill_path.mkdir(parents=True, exist_ok=True)
filename = next_spill_filename_fn(tool_name)
# Pretty-print JSON content so load_data's line-based
# Pretty-print JSON content so read_file's line-based
# pagination works correctly.
write_content = result.content
parsed_json: Any = None # track for metadata extraction
@@ -293,7 +294,10 @@ def truncate_tool_result(
except (json.JSONDecodeError, TypeError, ValueError):
pass # Not JSON — write as-is
(spill_path / filename).write_text(write_content, encoding="utf-8")
file_path = spill_path / filename
file_path.write_text(write_content, encoding="utf-8")
# Use absolute path so parent agents can find files from subagents
abs_path = str(file_path.resolve())
if limit > 0 and len(result.content) > limit:
# Large result: build a small, metadata-rich preview so the
@@ -315,14 +319,14 @@ def truncate_tool_result(
# Assemble header with structural info + warning
header = (
f"[Result from {tool_name}: {len(result.content):,} chars — "
f"too large for context, saved to '{filename}'.]\n"
f"too large for context, saved to '{abs_path}'.]\n"
)
if metadata_str:
header += f"\nData structure:\n{metadata_str}"
header += (
f"\n\nWARNING: The preview below is INCOMPLETE. "
f"Do NOT draw conclusions or counts from it. "
f"Use load_data(filename='{filename}') to read the "
f"Use read_file(path='{abs_path}') to read the "
f"full data before analysis."
)
@@ -331,11 +335,11 @@ def truncate_tool_result(
"Tool result spilled to file: %s (%d chars → %s)",
tool_name,
len(result.content),
filename,
abs_path,
)
else:
# Small result: keep full content + annotation
content = f"{result.content}\n\n[Saved to '{filename}']"
# Small result: keep full content + annotation with absolute path
content = f"{result.content}\n\n[Saved to '{abs_path}']"
logger.info(
"Tool result saved to file: %s (%d chars → %s)",
tool_name,
@@ -446,8 +450,11 @@ async def execute_tool(
# Offload the executor call to a thread. Sync MCP executors
# block on future.result() — running in a thread keeps the
# event loop free so asyncio.wait_for can fire the timeout.
# Copy the current context so contextvars (e.g. data_dir from
# execution context) propagate into the worker thread.
loop = asyncio.get_running_loop()
result = await loop.run_in_executor(None, tool_executor, tool_use)
ctx = contextvars.copy_context()
result = await loop.run_in_executor(None, ctx.run, tool_executor, tool_use)
# Async executors return a coroutine — await it on the loop
if asyncio.iscoroutine(result) or asyncio.isfuture(result):
result = await result
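A minimal sketch of why the context copy matters: contextvars set on the event-loop side are invisible in a bare executor thread unless the call is wrapped in ctx.run (names here are illustrative, not framework API):

import asyncio
import contextvars

data_dir: contextvars.ContextVar[str] = contextvars.ContextVar("data_dir")

def sync_tool() -> str:
    return data_dir.get("unset")       # reads the contextvar in the worker thread

async def main() -> None:
    data_dir.set("/tmp/run-42")
    loop = asyncio.get_running_loop()
    ctx = contextvars.copy_context()
    seen = await loop.run_in_executor(None, ctx.run, sync_tool)
    assert seen == "/tmp/run-42"       # propagated via the copied context

asyncio.run(main())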
@@ -472,56 +479,6 @@ async def execute_tool(
return result
def record_learning(key: str, value: Any, spillover_dir: str | None) -> None:
"""Append a set_output value to adapt.md as a learning entry.
Called at set_output time, the moment knowledge is produced, so that
adapt.md accumulates the agent's outputs across the session. Since
adapt.md is injected into the system prompt, these persist through
any compaction.
"""
if not spillover_dir:
return
try:
adapt_path = Path(spillover_dir) / "adapt.md"
adapt_path.parent.mkdir(parents=True, exist_ok=True)
content = adapt_path.read_text(encoding="utf-8") if adapt_path.exists() else ""
if "## Outputs" not in content:
content += "\n\n## Outputs\n"
# Truncate long values for memory (full value is in shared memory)
v_str = str(value)
if len(v_str) > 500:
v_str = v_str[:500] + "..."
entry = f"- {key}: {v_str}\n"
# Replace existing entry for same key (update, not duplicate)
lines = content.splitlines(keepends=True)
replaced = False
for i, line in enumerate(lines):
if line.startswith(f"- {key}:"):
lines[i] = entry
replaced = True
break
if replaced:
content = "".join(lines)
else:
content += entry
adapt_path.write_text(content, encoding="utf-8")
except Exception as e:
logger.warning("Failed to record learning for key=%s: %s", key, e)
def next_spill_filename(tool_name: str, counter: int) -> str:
"""Return a short, monotonic filename for a tool result spill."""
# Shorten common tool name prefixes to save tokens
short = tool_name.removeprefix("tool_").removeprefix("mcp_")
return f"{short}_{counter}.txt"
def restore_spill_counter(spillover_dir: str | None) -> int:
"""Scan spillover_dir for existing spill files and return the max counter.
+29 -12
View File
@@ -9,7 +9,11 @@ from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Literal, Protocol, runtime_checkable
from framework.graph.conversation import ConversationStore
from framework.graph.conversation import (
ConversationStore,
get_run_cursor,
update_run_cursor,
)
logger = logging.getLogger(__name__)
@@ -75,13 +79,20 @@ class LoopConfig:
# Client-facing auto-block grace period.
cf_grace_turns: int = 1
# Worker auto-escalation: text-only turns before escalating to queen.
worker_escalation_grace_turns: int = 1
tool_doom_loop_enabled: bool = True
# Per-tool-call timeout.
tool_call_timeout_seconds: float = 60.0
# Subagent delegation timeout.
subagent_timeout_seconds: float = 600.0
# Subagent delegation timeout (wall-clock max).
subagent_timeout_seconds: float = 3600.0
# Subagent inactivity timeout - only timeout if no activity for this duration.
# This resets whenever the subagent makes progress (tool calls, LLM responses).
# Set to 0 to use only the wall-clock timeout.
subagent_inactivity_timeout_seconds: float = 300.0
# Lifecycle hooks.
hooks: dict[str, list] | None = None
@@ -116,6 +127,7 @@ class OutputAccumulator:
store: ConversationStore | None = None
spillover_dir: str | None = None
max_value_chars: int = 0
run_id: str | None = None
async def set(self, key: str, value: Any) -> None:
"""Set a key-value pair, auto-spilling large values to files."""
@@ -146,8 +158,9 @@ class OutputAccumulator:
if isinstance(value, (dict, list))
else str(value)
)
(spill_path / filename).write_text(write_content, encoding="utf-8")
file_size = (spill_path / filename).stat().st_size
file_path = spill_path / filename
file_path.write_text(write_content, encoding="utf-8")
file_size = file_path.stat().st_size
logger.info(
"set_output value auto-spilled: key=%s, %d chars -> %s (%d bytes)",
key,
@@ -155,9 +168,11 @@ class OutputAccumulator:
filename,
file_size,
)
# Use absolute path so parent agents can find files from subagents
abs_path = str(file_path.resolve())
return (
f"[Saved to '{filename}' ({file_size:,} bytes). "
f"Use load_data(filename='{filename}') "
f"[Saved to '{abs_path}' ({file_size:,} bytes). "
f"Use read_file(path='{abs_path}') "
f"to access full data.]"
)
@@ -171,12 +186,14 @@ class OutputAccumulator:
return all(key in self.values and self.values[key] is not None for key in required)
@classmethod
async def restore(cls, store: ConversationStore) -> OutputAccumulator:
async def restore(
cls,
store: ConversationStore,
run_id: str | None = None,
) -> OutputAccumulator:
cursor = await store.read_cursor()
values = {}
if cursor and "outputs" in cursor:
values = cursor["outputs"]
return cls(values=values, store=store)
values = cursor.get("outputs", {}) if cursor else {}
return cls(values=values, store=store, run_id=run_id)
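The defensive read above collapses three cursor shapes to one behaviour; a toy check:

def outputs_from(cursor: dict | None) -> dict:
    return cursor.get("outputs", {}) if cursor else {}

assert outputs_from(None) == {}                    # no cursor ever written
assert outputs_from({"iteration": 2}) == {}        # cursor without outputs
assert outputs_from({"outputs": {"r": 1}}) == {"r": 1}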
__all__ = [
File diff suppressed because it is too large
File diff suppressed because it is too large
+97 -79
View File
@@ -2,7 +2,7 @@
Node Protocol - The building block of agent graphs.
A Node is a unit of work that:
1. Receives context (goal, shared memory, input)
1. Receives context (goal, shared buffer, input)
2. Makes decisions (using LLM, tools, or logic)
3. Produces results (output, state changes)
4. Records everything to the Runtime
@@ -30,62 +30,6 @@ from framework.runtime.core import Runtime
logger = logging.getLogger(__name__)
def _fix_unescaped_newlines_in_json(json_str: str) -> str:
"""Fix unescaped newlines inside JSON string values.
LLMs sometimes output actual newlines inside JSON strings instead of \\n.
This function fixes that by properly escaping newlines within string values.
"""
result = []
in_string = False
escape_next = False
i = 0
while i < len(json_str):
char = json_str[i]
if escape_next:
result.append(char)
escape_next = False
i += 1
continue
if char == "\\" and in_string:
escape_next = True
result.append(char)
i += 1
continue
if char == '"' and not escape_next:
in_string = not in_string
result.append(char)
i += 1
continue
# Fix unescaped newlines inside strings
if in_string and char == "\n":
result.append("\\n")
i += 1
continue
# Fix unescaped carriage returns inside strings
if in_string and char == "\r":
result.append("\\r")
i += 1
continue
# Fix unescaped tabs inside strings
if in_string and char == "\t":
result.append("\\t")
i += 1
continue
result.append(char)
i += 1
return "".join(result)
def find_json_object(text: str) -> str | None:
"""Find the first valid JSON object in text using balanced brace matching.
@@ -171,10 +115,10 @@ class NodeSpec(BaseModel):
# Data flow
input_keys: list[str] = Field(
default_factory=list, description="Keys this node reads from shared memory or input"
default_factory=list, description="Keys this node reads from the shared buffer or input"
)
output_keys: list[str] = Field(
default_factory=list, description="Keys this node writes to shared memory or output"
default_factory=list, description="Keys this node writes to the shared buffer or output"
)
nullable_output_keys: list[str] = Field(
default_factory=list,
@@ -249,7 +193,10 @@ class NodeSpec(BaseModel):
# Client-facing behavior
client_facing: bool = Field(
default=False,
description="If True, this node streams output to the end user and can request input.",
description=(
"Deprecated compatibility field. The queen is intrinsically interactive; "
"non-queen nodes should escalate to the queen instead of talking to users directly."
),
)
# Phase completion criteria for conversation-aware judge (Level 2)
@@ -272,22 +219,59 @@ class NodeSpec(BaseModel):
),
)
# Structured thinking tags — stripped from client-facing output but kept in
# conversation history so the LLM sees its own reasoning on subsequent turns.
thinking_tags: list[str] | None = Field(
default=None,
description=(
"XML tag names stripped from client output but kept in conversation "
"history. e.g. ['situation', 'monologue'] strips <situation>...</situation> "
"from the user-facing stream while preserving it for the LLM."
),
)
model_config = {"extra": "allow", "arbitrary_types_allowed": True}
def is_queen_node(self) -> bool:
"""Return True when this spec is the queen conversational node."""
return self.id == "queen"
class MemoryWriteError(Exception):
"""Raised when an invalid value is written to memory."""
def supports_direct_user_io(self) -> bool:
"""Return True when this node may talk to the user directly."""
return self.is_queen_node()
def deprecated_client_facing_warning(node_spec: NodeSpec) -> str | None:
"""Return a deprecation warning for legacy non-queen client_facing nodes."""
if node_spec.client_facing and not node_spec.is_queen_node():
return (
f"Node '{node_spec.id}' sets deprecated client_facing=True. "
"Non-queen direct human I/O is no longer supported; route worker "
"questions and approvals through queen escalation instead."
)
return None
def warn_if_deprecated_client_facing(node_spec: NodeSpec) -> None:
"""Log a compatibility warning once the node is loaded for execution."""
warning = deprecated_client_facing_warning(node_spec)
if warning:
logger.warning(warning)
class DataBufferWriteError(Exception):
"""Raised when an invalid value is written to the data buffer."""
pass
@dataclass
class SharedMemory:
class DataBuffer:
"""
Shared state between nodes in a graph execution.
Shared data buffer between nodes in a graph execution.
Nodes read and write to shared memory using typed keys.
The memory is scoped to a single run.
Nodes read and write to the data buffer using typed keys.
The buffer is scoped to a single run.
For parallel execution, use write_async() which provides per-key locking
to prevent race conditions when multiple nodes write concurrently.
@@ -306,23 +290,23 @@ class SharedMemory:
self._lock = asyncio.Lock()
def read(self, key: str) -> Any:
"""Read a value from shared memory."""
"""Read a value from the data buffer."""
if self._allowed_read and key not in self._allowed_read:
raise PermissionError(f"Node not allowed to read key: {key}")
return self._data.get(key)
def write(self, key: str, value: Any, validate: bool = True) -> None:
"""
Write a value to shared memory.
Write a value to the data buffer.
Args:
key: The memory key to write to
key: The buffer key to write to
value: The value to write
validate: If True, check for suspicious content (default True)
Raises:
PermissionError: If node doesn't have write permission
MemoryWriteError: If value appears to be hallucinated content
DataBufferWriteError: If value appears to be hallucinated content
"""
if self._allowed_write and key not in self._allowed_write:
raise PermissionError(f"Node not allowed to write key: {key}")
@@ -336,7 +320,7 @@ class SharedMemory:
f"⚠ Suspicious write to key '{key}': appears to be code "
f"({len(value)} chars). Consider using validate=False if intended."
)
raise MemoryWriteError(
raise DataBufferWriteError(
f"Rejected suspicious content for key '{key}': "
f"appears to be hallucinated code ({len(value)} chars). "
"If this is intentional, use validate=False."
@@ -352,13 +336,13 @@ class SharedMemory:
parallel execution. Each key has its own lock to minimize contention.
Args:
key: The memory key to write to
key: The buffer key to write to
value: The value to write
validate: If True, check for suspicious content (default True)
Raises:
PermissionError: If node doesn't have write permission
MemoryWriteError: If value appears to be hallucinated content
DataBufferWriteError: If value appears to be hallucinated content
"""
# Check permissions first (no lock needed)
if self._allowed_write and key not in self._allowed_write:
@@ -379,7 +363,7 @@ class SharedMemory:
f"⚠ Suspicious write to key '{key}': appears to be code "
f"({len(value)} chars). Consider using validate=False if intended."
)
raise MemoryWriteError(
raise DataBufferWriteError(
f"Rejected suspicious content for key '{key}': "
f"appears to be hallucinated code ({len(value)} chars). "
"If this is intentional, use validate=False."
@@ -457,13 +441,13 @@ class SharedMemory:
self,
read_keys: list[str],
write_keys: list[str],
) -> "SharedMemory":
) -> "DataBuffer":
"""Create a view with restricted permissions for a specific node.
The scoped view shares the same underlying data and locks,
enabling thread-safe parallel execution across scoped views.
"""
return SharedMemory(
return DataBuffer(
_data=self._data,
_allowed_read=set(read_keys) if read_keys else set(),
_allowed_write=set(write_keys) if write_keys else set(),
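A usage sketch for scoped views, assuming DataBuffer() default-constructs an empty, unrestricted buffer as SharedMemory did:

buffer = DataBuffer()
buffer.write("task", "summarise inbox", validate=False)

scoped = buffer.with_permissions(read_keys=["task"], write_keys=[])
assert scoped.read("task") == "summarise inbox"   # views share the same data
try:
    scoped.read("secret")                         # key not in read_keys
except PermissionError:
    pass                                          # scoped reads are enforced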
@@ -479,7 +463,7 @@ class NodeContext:
This is passed to every node and provides:
- Access to the runtime (for decision logging)
- Access to shared memory (for state)
- Access to the data buffer (for state)
- Access to LLM (for generation)
- Access to tools (for actions)
- The goal context (for guidance)
@@ -493,7 +477,7 @@ class NodeContext:
node_spec: NodeSpec
# State
memory: SharedMemory
buffer: DataBuffer
input_data: dict[str, Any] = field(default_factory=dict)
# LLM access (if applicable)
@@ -529,12 +513,25 @@ class NodeContext:
# rebuilding the full system prompt when restoring from conversation store.
identity_prompt: str = ""
narrative: str = ""
# Static memory block injected into the system prompt.
memory_prompt: str = ""
# Event-triggered execution (no interactive user attached)
event_triggered: bool = False
# Execution ID (from StreamRuntimeAdapter)
execution_id: str = ""
run_id: str = ""
@property
def effective_run_id(self) -> str | None:
"""Normalized run_id: returns run_id if truthy, otherwise None.
The field defaults to ``""``; callers should use this property
instead of ``self.run_id or None`` to avoid silently falling
back to session-scoped storage.
"""
return self.run_id or None
# Stream identity — the ExecutionStream this node runs within.
# Falls back to node_id when not set (legacy / standalone executor).
@@ -564,6 +561,9 @@ class NodeContext:
# the queen to switch between phase-specific prompts (building /
# staging / running) without restarting the conversation.
dynamic_prompt_provider: Any = None # Callable[[], str] | None
# Dynamic memory provider — when set, EventLoopNode rebuilds the
# system prompt with the latest memory block each iteration.
dynamic_memory_provider: Any = None # Callable[[], str] | None
# Skill system prompts — injected by the skill discovery pipeline
skills_catalog_prompt: str = "" # Available skills XML catalog
@@ -579,6 +579,24 @@ class NodeContext:
# the queen to record the current phase per iteration.
iteration_metadata_provider: Any = None # Callable[[], dict] | None
# Structured thinking tags — propagated from NodeSpec.thinking_tags.
thinking_tags: list[str] | None = None
@property
def is_queen_stream(self) -> bool:
"""Return True when this context belongs to the queen conversation."""
return self.stream_id == "queen" or self.node_spec.is_queen_node()
@property
def emits_client_io(self) -> bool:
"""Return True when text should be published to user-facing streams."""
return self.is_queen_stream
@property
def supports_direct_user_io(self) -> bool:
"""Return True when the node may directly request user input."""
return self.is_queen_stream and not self.event_triggered
@dataclass
class NodeResult:
@@ -686,6 +704,6 @@ class NodeProtocol(ABC):
"""
errors = []
for key in ctx.node_spec.input_keys:
if key not in ctx.input_data and ctx.memory.read(key) is None:
if key not in ctx.input_data and ctx.buffer.read(key) is None:
errors.append(f"Missing required input: {key}")
return errors
+108 -331
View File
@@ -1,148 +1,29 @@
"""Prompt composition for continuous agent mode.
"""Legacy compatibility wrapper around :mod:`framework.graph.prompting`.
Composes the three-layer system prompt (onion model) and generates
transition markers inserted into the conversation at phase boundaries.
Layer 1: Identity (static, defined at agent level, never changes):
"You are a thorough research agent. You prefer clarity over jargon..."
Layer 2: Narrative (auto-generated from conversation/memory state):
"We've finished scoping the project. The user wants to focus on..."
Layer 3: Focus (per-node system_prompt, reframed as focus directive):
"Your current attention: synthesize findings into a report..."
New runtime code should import from ``framework.graph.prompting`` directly.
"""
from __future__ import annotations
import logging
from datetime import datetime
import json
from pathlib import Path
from typing import TYPE_CHECKING, Any
from typing import TYPE_CHECKING
from framework.graph.prompting import (
EXECUTION_SCOPE_PREAMBLE,
TransitionSpec,
build_accounts_prompt,
build_narrative,
build_system_prompt,
stamp_prompt_datetime,
)
if TYPE_CHECKING:
from framework.graph.edge import GraphSpec
from framework.graph.node import NodeSpec, SharedMemory
logger = logging.getLogger(__name__)
# Injected into every worker node's system prompt so the LLM understands
# it is one step in a multi-node pipeline and should not overreach.
EXECUTION_SCOPE_PREAMBLE = (
"EXECUTION SCOPE: You are one node in a multi-step workflow graph. "
"Focus ONLY on the task described in your instructions below. "
"Call set_output() for each of your declared output keys, then stop. "
"Do NOT attempt work that belongs to other nodes — the framework "
"routes data between nodes automatically."
)
from framework.graph.node import DataBuffer, NodeSpec
def _with_datetime(prompt: str) -> str:
"""Append current datetime with local timezone to a system prompt."""
local = datetime.now().astimezone()
stamp = f"Current date and time: {local.strftime('%Y-%m-%d %H:%M %Z (UTC%z)')}"
return f"{prompt}\n\n{stamp}" if prompt else stamp
def build_accounts_prompt(
accounts: list[dict[str, Any]],
tool_provider_map: dict[str, str] | None = None,
node_tool_names: list[str] | None = None,
) -> str:
"""Build a prompt section describing connected accounts.
When tool_provider_map is provided, produces structured output grouped
by provider with tool mapping, so the LLM knows which ``account`` value
to pass to which tool.
When node_tool_names is also provided, filters to only show providers
whose tools overlap with the node's tool list.
Args:
accounts: List of account info dicts from
CredentialStoreAdapter.get_all_account_info().
tool_provider_map: Mapping of tool_name -> provider_name
(e.g. {"gmail_list_messages": "google"}).
node_tool_names: Tool names available to the current node.
When provided, only providers with matching tools are shown.
Returns:
Formatted accounts block, or empty string if no accounts.
"""
if not accounts:
return ""
# Flat format (backward compat) when no tool mapping provided
if tool_provider_map is None:
lines = [
"Connected accounts (use the alias as the `account` parameter "
"when calling tools to target a specific account):"
]
for acct in accounts:
provider = acct.get("provider", "unknown")
alias = acct.get("alias", "unknown")
identity = acct.get("identity", {})
detail_parts = [f"{k}: {v}" for k, v in identity.items() if v]
detail = f" ({', '.join(detail_parts)})" if detail_parts else ""
lines.append(f"- {provider}/{alias}{detail}")
return "\n".join(lines)
# --- Structured format: group by provider with tool mapping ---
# Invert tool_provider_map to provider -> [tools]
provider_tools: dict[str, list[str]] = {}
for tool_name, provider in tool_provider_map.items():
provider_tools.setdefault(provider, []).append(tool_name)
# Filter to relevant providers based on node tools
node_tool_set = set(node_tool_names) if node_tool_names else None
# Group accounts by provider
provider_accounts: dict[str, list[dict[str, Any]]] = {}
for acct in accounts:
provider = acct.get("provider", "unknown")
provider_accounts.setdefault(provider, []).append(acct)
sections: list[str] = ["Connected accounts:"]
for provider, acct_list in provider_accounts.items():
tools_for_provider = sorted(provider_tools.get(provider, []))
# If node tools specified, only show providers with overlapping tools
if node_tool_set is not None:
relevant_tools = [t for t in tools_for_provider if t in node_tool_set]
if not relevant_tools:
continue
tools_for_provider = relevant_tools
# Local-only providers: tools read from env vars, no account= routing
all_local = all(a.get("source") == "local" for a in acct_list)
# Provider header with tools
display_name = provider.replace("_", " ").title()
if tools_for_provider and not all_local:
tools_str = ", ".join(tools_for_provider)
sections.append(f'\n{display_name} (use account="<alias>" with: {tools_str}):')
elif tools_for_provider and all_local:
tools_str = ", ".join(tools_for_provider)
sections.append(f"\n{display_name} (tools: {tools_str}):")
else:
sections.append(f"\n{display_name}:")
# Account entries
for acct in acct_list:
alias = acct.get("alias", "unknown")
identity = acct.get("identity", {})
detail_parts = [f"{k}: {v}" for k, v in identity.items() if v]
detail = f" ({', '.join(detail_parts)})" if detail_parts else ""
source_tag = " [local]" if acct.get("source") == "local" else ""
sections.append(f" - {provider}/{alias}{detail}{source_tag}")
# If filtering removed all providers, return empty
if len(sections) <= 1:
return ""
return "\n".join(sections)
_with_datetime = stamp_prompt_datetime
def compose_system_prompt(
@@ -155,219 +36,115 @@ def compose_system_prompt(
execution_preamble: str | None = None,
node_type_preamble: str | None = None,
) -> str:
"""Compose the multi-layer system prompt.
"""Compatibility wrapper for the legacy function signature."""
from framework.graph.prompting import NodePromptSpec
Args:
identity_prompt: Layer 1 static agent identity (from GraphSpec).
focus_prompt: Layer 3 per-node focus directive (from NodeSpec.system_prompt).
narrative: Layer 2 auto-generated from conversation state.
accounts_prompt: Connected accounts block (sits between identity and narrative).
skills_catalog_prompt: Available skills catalog XML (Agent Skills standard).
protocols_prompt: Default skill operational protocols section.
execution_preamble: EXECUTION_SCOPE_PREAMBLE for worker nodes
(prepended before focus so the LLM knows its pipeline scope).
node_type_preamble: Node-type-specific preamble, e.g. GCU browser
best-practices prompt (prepended before focus).
Returns:
Composed system prompt with all layers present, plus current datetime.
"""
parts: list[str] = []
# Layer 1: Identity (always first, anchors the personality)
if identity_prompt:
parts.append(identity_prompt)
# Accounts (semi-static, deployment-specific)
if accounts_prompt:
parts.append(f"\n{accounts_prompt}")
# Skills catalog (discovered skills available for activation)
if skills_catalog_prompt:
parts.append(f"\n{skills_catalog_prompt}")
# Operational protocols (default skill behavioral guidance)
if protocols_prompt:
parts.append(f"\n{protocols_prompt}")
# Layer 2: Narrative (what's happened so far)
if narrative:
parts.append(f"\n--- Context (what has happened so far) ---\n{narrative}")
# Execution scope preamble (worker nodes — tells the LLM it is one
# step in a multi-node pipeline and should not overreach)
if execution_preamble:
parts.append(f"\n{execution_preamble}")
# Node-type preamble (e.g. GCU browser best-practices)
if node_type_preamble:
parts.append(f"\n{node_type_preamble}")
# Layer 3: Focus (current phase directive)
if focus_prompt:
parts.append(f"\n--- Current Focus ---\n{focus_prompt}")
return _with_datetime("\n".join(parts) if parts else "")
def build_narrative(
memory: SharedMemory,
execution_path: list[str],
graph: GraphSpec,
) -> str:
"""Build Layer 2 (narrative) from structured state.
Deterministic no LLM call. Reads SharedMemory and execution path
to describe what has happened so far. Cheap and fast.
Args:
memory: Current shared memory state.
execution_path: List of node IDs visited so far.
graph: Graph spec (for node names/descriptions).
Returns:
Narrative string describing the session state.
"""
parts: list[str] = []
# Describe execution path
if execution_path:
phase_descriptions: list[str] = []
for node_id in execution_path:
node_spec = graph.get_node(node_id)
if node_spec:
phase_descriptions.append(f"- {node_spec.name}: {node_spec.description}")
else:
phase_descriptions.append(f"- {node_id}")
parts.append("Phases completed:\n" + "\n".join(phase_descriptions))
# Describe key memory values (skip very long values)
all_memory = memory.read_all()
if all_memory:
memory_lines: list[str] = []
for key, value in all_memory.items():
if value is None:
continue
val_str = str(value)
if len(val_str) > 200:
val_str = val_str[:200] + "..."
memory_lines.append(f"- {key}: {val_str}")
if memory_lines:
parts.append("Current state:\n" + "\n".join(memory_lines))
return "\n\n".join(parts) if parts else ""
spec = NodePromptSpec(
identity_prompt=identity_prompt or "",
focus_prompt=focus_prompt or "",
narrative=narrative or "",
accounts_prompt=accounts_prompt or "",
skills_catalog_prompt=skills_catalog_prompt or "",
protocols_prompt=protocols_prompt or "",
# Legacy callers explicitly passed these preambles. Preserve them by
# folding them into the focus block when present.
node_type="event_loop",
)
if execution_preamble or node_type_preamble:
focus_parts = []
if execution_preamble:
focus_parts.append(execution_preamble)
if node_type_preamble:
focus_parts.append(node_type_preamble)
if spec.focus_prompt:
focus_parts.append(spec.focus_prompt)
spec = NodePromptSpec(
identity_prompt=spec.identity_prompt,
focus_prompt="\n\n".join(focus_parts),
narrative=spec.narrative,
accounts_prompt=spec.accounts_prompt,
skills_catalog_prompt=spec.skills_catalog_prompt,
protocols_prompt=spec.protocols_prompt,
node_type=spec.node_type,
output_keys=spec.output_keys,
is_subagent_mode=spec.is_subagent_mode,
)
return build_system_prompt(spec)
def build_transition_marker(
previous_node: NodeSpec,
next_node: NodeSpec,
    buffer: DataBuffer,
    cumulative_tool_names: list[str],
    data_dir: Path | str | None = None,
    adapt_content: str | None = None,
) -> str:
    """Legacy transition builder with best-effort spillover compatibility."""
    buffer_items: dict[str, str] = {}
    data_files: list[str] = []

    all_buffer = buffer.read_all()
    for key, value in all_buffer.items():
        if value is None:
            continue
        val_str = str(value)
        if len(val_str) > 300 and data_dir:
            data_path = Path(data_dir)
            data_path.mkdir(parents=True, exist_ok=True)
            ext = ".json" if isinstance(value, (dict, list)) else ".txt"
            filename = f"output_{key}{ext}"
            file_path = data_path / filename
            try:
                write_content = (
                    json.dumps(value, indent=2, ensure_ascii=False)
                    if isinstance(value, (dict, list))
                    else str(value)
                )
                file_path.write_text(write_content, encoding="utf-8")
                file_size = file_path.stat().st_size
                buffer_items[key] = (
                    f"[Saved to '{filename}' ({file_size:,} bytes). "
                    f"Use load_data(filename='{filename}') to access.]"
                )
            except Exception:
                buffer_items[key] = val_str[:300] + "..."
        elif len(val_str) > 300:
            buffer_items[key] = val_str[:300] + "..."
        else:
            buffer_items[key] = val_str

    if data_dir:
        data_path = Path(data_dir)
        if data_path.exists():
            data_files = [
                f"{entry.name} ({entry.stat().st_size:,} bytes)"
                for entry in sorted(data_path.iterdir())
                if entry.is_file()
            ]

    return build_transition_message(
        TransitionSpec(
            previous_name=previous_node.name,
            previous_description=previous_node.description,
            next_name=next_node.name,
            next_description=next_node.description,
            next_output_keys=tuple(next_node.output_keys or ()),
            buffer_items=buffer_items,
            cumulative_tool_names=tuple(sorted(cumulative_tool_names)),
            data_files=tuple(data_files),
        )
    )
from framework.graph.prompting import TransitionSpec, build_transition_message
__all__ = [
"EXECUTION_SCOPE_PREAMBLE",
"_with_datetime",
"build_accounts_prompt",
"build_narrative",
"build_transition_marker",
"build_transition_message",
"compose_system_prompt",
]
+312
@@ -0,0 +1,312 @@
"""Pure prompt rendering helpers for graph execution.
This module owns all prompt text assembly for graph nodes.
It intentionally avoids side effects so runtime code can prepare any
spill files or transition metadata separately and then pass plain data in.
"""
from __future__ import annotations
from dataclasses import dataclass, field
from datetime import datetime
from typing import TYPE_CHECKING, Any
if TYPE_CHECKING:
from framework.graph.edge import GraphSpec
from framework.graph.node import DataBuffer
# Injected into every worker node's system prompt so the LLM understands
# it is one step in a multi-node pipeline and should not overreach.
EXECUTION_SCOPE_PREAMBLE = (
"EXECUTION SCOPE: You are one node in a multi-step workflow graph. "
"Focus ONLY on the task described in your instructions below. "
"Call set_output() for each of your declared output keys, then stop. "
"Do NOT attempt work that belongs to other nodes - the framework "
"routes data between nodes automatically."
)
@dataclass(frozen=True)
class NodePromptSpec:
"""Structured inputs for building one node system prompt."""
identity_prompt: str = ""
focus_prompt: str = ""
narrative: str = ""
accounts_prompt: str = ""
skills_catalog_prompt: str = ""
protocols_prompt: str = ""
memory_prompt: str = ""
node_type: str = "event_loop"
output_keys: tuple[str, ...] = ()
is_subagent_mode: bool = False
@dataclass(frozen=True)
class TransitionSpec:
"""Structured inputs for a transition marker message."""
previous_name: str
previous_description: str
next_name: str
next_description: str
next_output_keys: tuple[str, ...] = ()
buffer_items: dict[str, str] = field(default_factory=dict)
cumulative_tool_names: tuple[str, ...] = ()
data_files: tuple[str, ...] = ()
def stamp_prompt_datetime(prompt: str) -> str:
"""Append current datetime with local timezone to a prompt."""
local = datetime.now().astimezone()
stamp = f"Current date and time: {local.strftime('%Y-%m-%d %H:%M %Z (UTC%z)')}"
return f"{prompt}\n\n{stamp}" if prompt else stamp
def build_accounts_prompt(
accounts: list[dict[str, Any]],
tool_provider_map: dict[str, str] | None = None,
node_tool_names: list[str] | None = None,
) -> str:
"""Build a prompt section describing connected accounts."""
if not accounts:
return ""
if tool_provider_map is None:
lines = [
"Connected accounts (use the alias as the `account` parameter "
"when calling tools to target a specific account):"
]
for acct in accounts:
provider = acct.get("provider", "unknown")
alias = acct.get("alias", "unknown")
identity = acct.get("identity", {})
detail_parts = [f"{k}: {v}" for k, v in identity.items() if v]
detail = f" ({', '.join(detail_parts)})" if detail_parts else ""
lines.append(f"- {provider}/{alias}{detail}")
return "\n".join(lines)
provider_tools: dict[str, list[str]] = {}
for tool_name, provider in tool_provider_map.items():
provider_tools.setdefault(provider, []).append(tool_name)
node_tool_set = set(node_tool_names) if node_tool_names else None
provider_accounts: dict[str, list[dict[str, Any]]] = {}
for acct in accounts:
provider = acct.get("provider", "unknown")
provider_accounts.setdefault(provider, []).append(acct)
sections: list[str] = ["Connected accounts:"]
for provider, acct_list in provider_accounts.items():
tools_for_provider = sorted(provider_tools.get(provider, []))
if node_tool_set is not None:
relevant_tools = [tool_name for tool_name in tools_for_provider if tool_name in node_tool_set]
if not relevant_tools:
continue
tools_for_provider = relevant_tools
all_local = all(acct.get("source") == "local" for acct in acct_list)
display_name = provider.replace("_", " ").title()
if tools_for_provider and not all_local:
tools_str = ", ".join(tools_for_provider)
sections.append(f'\n{display_name} (use account="<alias>" with: {tools_str}):')
elif tools_for_provider and all_local:
tools_str = ", ".join(tools_for_provider)
sections.append(f"\n{display_name} (tools: {tools_str}):")
else:
sections.append(f"\n{display_name}:")
for acct in acct_list:
alias = acct.get("alias", "unknown")
identity = acct.get("identity", {})
detail_parts = [f"{k}: {v}" for k, v in identity.items() if v]
detail = f" ({', '.join(detail_parts)})" if detail_parts else ""
source_tag = " [local]" if acct.get("source") == "local" else ""
sections.append(f" - {provider}/{alias}{detail}{source_tag}")
if len(sections) <= 1:
return ""
return "\n".join(sections)
def build_prompt_spec_from_node_context(
ctx: Any,
*,
focus_prompt: str | None = None,
narrative: str | None = None,
memory_prompt: str | None = None,
) -> NodePromptSpec:
"""Convert a NodeContext-like object into structured prompt inputs."""
resolved_memory_prompt = memory_prompt
if resolved_memory_prompt is None:
resolved_memory_prompt = getattr(ctx, "memory_prompt", "") or ""
dynamic_memory_provider = getattr(ctx, "dynamic_memory_provider", None)
if dynamic_memory_provider is not None:
try:
resolved_memory_prompt = dynamic_memory_provider() or ""
except Exception:
resolved_memory_prompt = getattr(ctx, "memory_prompt", "") or ""
return NodePromptSpec(
identity_prompt=ctx.identity_prompt or "",
focus_prompt=focus_prompt if focus_prompt is not None else (ctx.node_spec.system_prompt or ""),
narrative=narrative if narrative is not None else (ctx.narrative or ""),
accounts_prompt=ctx.accounts_prompt or "",
skills_catalog_prompt=ctx.skills_catalog_prompt or "",
protocols_prompt=ctx.protocols_prompt or "",
memory_prompt=resolved_memory_prompt,
node_type=ctx.node_spec.node_type,
output_keys=tuple(ctx.node_spec.output_keys or ()),
is_subagent_mode=bool(getattr(ctx, "is_subagent_mode", False)),
)
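# Minimal sketch of the NodeContext conversion, using types.SimpleNamespace as
# a stand-in for the real NodeContext (all attribute values hypothetical; the
# function only reads the attributes shown):
from types import SimpleNamespace as _NS

_ctx = _NS(
    identity_prompt="You are Hive.",
    node_spec=_NS(
        system_prompt="Extract key facts.",
        node_type="event_loop",
        output_keys=["facts"],
    ),
    accounts_prompt="",
    skills_catalog_prompt="",
    protocols_prompt="",
    narrative="",
    memory_prompt="",
    dynamic_memory_provider=None,
)
_spec = build_prompt_spec_from_node_context(_ctx)
assert _spec.focus_prompt == "Extract key facts."
assert _spec.output_keys == ("facts",)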
def build_system_prompt(spec: NodePromptSpec) -> str:
"""Compose one canonical system prompt for a node."""
parts: list[str] = []
if spec.identity_prompt:
parts.append(spec.identity_prompt)
if spec.accounts_prompt:
parts.append(f"\n{spec.accounts_prompt}")
if spec.skills_catalog_prompt:
parts.append(f"\n{spec.skills_catalog_prompt}")
if spec.protocols_prompt:
parts.append(f"\n{spec.protocols_prompt}")
if spec.memory_prompt:
parts.append(
"\nRelevant recalled memories may appear below. Treat them as "
"point-in-time guidance and verify stale details against current context."
)
parts.append(f"\n{spec.memory_prompt}")
if spec.narrative:
parts.append(f"\n--- Context (what has happened so far) ---\n{spec.narrative}")
if (
not spec.is_subagent_mode
and spec.node_type in ("event_loop", "gcu")
and spec.output_keys
):
parts.append(f"\n{EXECUTION_SCOPE_PREAMBLE}")
if spec.node_type == "gcu":
from framework.graph.gcu import GCU_BROWSER_SYSTEM_PROMPT
parts.append(f"\n{GCU_BROWSER_SYSTEM_PROMPT}")
if spec.focus_prompt:
parts.append(f"\n--- Current Focus ---\n{spec.focus_prompt}")
return stamp_prompt_datetime("\n".join(parts) if parts else "")
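# Minimal sketch of layer composition (hypothetical prompt text). With the
# default node_type="event_loop", declared output keys, and subagent mode off,
# EXECUTION_SCOPE_PREAMBLE lands between the narrative and the focus block:
_demo = build_system_prompt(
    NodePromptSpec(
        identity_prompt="You are Hive, a task-focused assistant.",
        focus_prompt="Summarize the fetched pages.",
        narrative="Phases completed:\n- fetch: Downloaded 3 pages",
        output_keys=("summary",),
    )
)
assert "EXECUTION SCOPE" in _demo
assert "--- Current Focus ---" in _demo
assert _demo.rstrip().splitlines()[-1].startswith("Current date and time:")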
def build_system_prompt_for_node_context(
ctx: Any,
*,
focus_prompt: str | None = None,
narrative: str | None = None,
memory_prompt: str | None = None,
) -> str:
"""Build a canonical system prompt from a NodeContext-like object."""
spec = build_prompt_spec_from_node_context(
ctx,
focus_prompt=focus_prompt,
narrative=narrative,
memory_prompt=memory_prompt,
)
return build_system_prompt(spec)
def build_narrative(
buffer: DataBuffer,
execution_path: list[str],
graph: GraphSpec,
) -> str:
"""Build a deterministic Layer 2 narrative from graph state."""
parts: list[str] = []
if execution_path:
phase_descriptions: list[str] = []
for node_id in execution_path:
node_spec = graph.get_node(node_id)
if node_spec:
phase_descriptions.append(f"- {node_spec.name}: {node_spec.description}")
else:
phase_descriptions.append(f"- {node_id}")
parts.append("Phases completed:\n" + "\n".join(phase_descriptions))
all_buffer = buffer.read_all()
if all_buffer:
memory_lines: list[str] = []
for key, value in all_buffer.items():
if value is None:
continue
val_str = str(value)
if len(val_str) > 200:
val_str = val_str[:200] + "..."
memory_lines.append(f"- {key}: {val_str}")
if memory_lines:
parts.append("Current state:\n" + "\n".join(memory_lines))
return "\n\n".join(parts) if parts else ""
def build_transition_message(spec: TransitionSpec) -> str:
"""Build a pure transition marker message."""
sections: list[str] = [
f"--- PHASE TRANSITION: {spec.previous_name} -> {spec.next_name} ---",
f"\nCompleted: {spec.previous_name}",
f" {spec.previous_description}",
]
if spec.buffer_items:
lines = [f" {key}: {value}" for key, value in spec.buffer_items.items()]
sections.append("\nOutputs available:\n" + "\n".join(lines))
if spec.data_files:
sections.append(
"\nData files (use load_data to access):\n"
+ "\n".join(f" {entry}" for entry in spec.data_files)
)
if spec.cumulative_tool_names:
sections.append("\nAvailable tools: " + ", ".join(sorted(spec.cumulative_tool_names)))
sections.append(f"\nNow entering: {spec.next_name}")
sections.append(f" {spec.next_description}")
if spec.next_output_keys:
sections.append(
f"\nYour ONLY job in this phase: complete the task above and call "
f"set_output() for {list(spec.next_output_keys)}. Do NOT do work that "
f"belongs to later phases."
)
sections.append(
"\nBefore proceeding, briefly reflect: what went well in the "
"previous phase? Are there any gaps or surprises worth noting?"
)
sections.append("\n--- END TRANSITION ---")
return "\n".join(sections)
__all__ = [
"EXECUTION_SCOPE_PREAMBLE",
"NodePromptSpec",
"TransitionSpec",
"build_accounts_prompt",
"build_narrative",
"build_prompt_spec_from_node_context",
"build_system_prompt",
"build_system_prompt_for_node_context",
"build_transition_message",
"stamp_prompt_datetime",
]
+899
@@ -0,0 +1,899 @@
"""
WorkerAgent - First-class autonomous worker for event-driven graph execution.
Each node in a graph becomes a WorkerAgent that:
- Owns its lifecycle, retry logic, memory scope, and LLM config
- Receives activations from upstream workers (via GraphExecutor routing)
- Self-checks readiness (fan-out group tracking)
- Self-triggers when ready
- Evaluates outgoing edges and publishes activations for downstream workers
"""
from __future__ import annotations
import asyncio
import logging
import time
import uuid
from dataclasses import dataclass, field
from enum import StrEnum
from typing import Any
from framework.graph.context import GraphContext, build_node_context_from_graph_context
from framework.graph.edge import EdgeCondition, EdgeSpec
from framework.graph.node import (
NodeContext,
NodeProtocol,
NodeResult,
NodeSpec,
)
from framework.graph.validator import OutputValidator
logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# Enums & data types
# ---------------------------------------------------------------------------
class WorkerLifecycle(StrEnum):
PENDING = "pending"
RUNNING = "running"
COMPLETED = "completed"
FAILED = "failed"
@dataclass
class FanOutTag:
"""Carried in activations, propagated through the worker chain.
When a source activates multiple targets (fan-out), each activation
receives a FanOutTag. Downstream convergence workers track these tags
to determine when all parallel branches have reached them.
"""
fan_out_id: str # Unique ID for this fan-out event
fan_out_source: str # Node that performed the fan-out
branches: frozenset[str] # All target node IDs in this fan-out
via_branch: str # Which branch this activation passed through
@dataclass
class FanOutTracker:
"""Per fan-out group, tracked by the target worker."""
fan_out_id: str
branches: frozenset[str]
reached: set[str] = field(default_factory=set)
@property
def is_complete(self) -> bool:
return self.reached == self.branches
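# Minimal sketch of convergence tracking (hypothetical IDs): a join worker
# only becomes ready once every branch of the fan-out has reached it.
_tracker = FanOutTracker(
    fan_out_id="plan_ab12cd34",
    branches=frozenset({"branch_a", "branch_b"}),
)
_tracker.reached.add("branch_a")
assert not _tracker.is_complete          # branch_b still outstanding
_tracker.reached.add("branch_b")
assert _tracker.is_complete              # join worker may now self-trigger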
@dataclass
class Activation:
"""Payload sent from a completed source to a target worker."""
source_id: str
target_id: str
edge_id: str
edge: EdgeSpec
mapped_inputs: dict[str, Any]
fan_out_tags: list[FanOutTag] = field(default_factory=list)
@dataclass
class WorkerCompletion:
"""Payload in WORKER_COMPLETED event."""
worker_id: str
success: bool
output: dict[str, Any]
tokens_used: int = 0
latency_ms: int = 0
conversation: Any = None # NodeConversation for continuous mode
activations: list[Activation] = field(default_factory=list)
@dataclass
class RetryState:
attempt: int = 0
max_retries: int = 3
is_event_loop: bool = False
# ---------------------------------------------------------------------------
# WorkerAgent
# ---------------------------------------------------------------------------
class WorkerAgent:
"""First-class autonomous worker for one node in the graph.
Lifecycle:
    PENDING   - waiting for activations
    RUNNING   - executing the node
    COMPLETED - finished successfully, activations published
    FAILED    - failed after retries exhausted
"""
def __init__(
self,
node_spec: NodeSpec,
graph_context: GraphContext,
) -> None:
self.node_spec = node_spec
self._gc = graph_context
# Edge topology (resolved at construction, immutable)
self.incoming_edges: list[EdgeSpec] = graph_context.graph.get_incoming_edges(node_spec.id)
self.outgoing_edges: list[EdgeSpec] = graph_context.graph.get_outgoing_edges(node_spec.id)
# Lifecycle
self.lifecycle: WorkerLifecycle = WorkerLifecycle.PENDING
self._task: asyncio.Task | None = None
# Retry state
self.retry_state = RetryState(
max_retries=node_spec.max_retries,
is_event_loop=node_spec.node_type == "event_loop",
)
# Activation tracking
self._inherited_fan_out_tags: list[FanOutTag] = []
self._active_fan_outs: dict[str, FanOutTracker] = {}
self._received_activations: list[Activation] = []
self._has_been_activated = False
# Pause support
# _run_gate controls whether worker execution may proceed.
# _pause_requested mirrors the pause-request semantics expected by
# EventLoopNode, where is_set() means "pause requested".
self._run_gate: asyncio.Event = asyncio.Event()
self._run_gate.set() # Not paused by default
self._pause_requested: asyncio.Event = asyncio.Event()
# Validator
self._validator = OutputValidator()
# Node implementation (lazy)
self._node_impl: NodeProtocol | None = None
# Metrics for this worker
self._tokens_used: int = 0
self._latency_ms: int = 0
# Last execution result (accessible by polling executor)
self._last_result: NodeResult | None = None
self._last_activations: list[Activation] = []
# ------------------------------------------------------------------
# Public activation interface
# ------------------------------------------------------------------
def activate(self, inherited_tags: list[FanOutTag] | None = None) -> None:
"""Activate this worker — launch execution as an asyncio.Task."""
if self.lifecycle != WorkerLifecycle.PENDING:
return
self._inherited_fan_out_tags = inherited_tags or []
self._has_been_activated = True
self.lifecycle = WorkerLifecycle.RUNNING
self._task = asyncio.ensure_future(self._execute_self())
def receive_activation(self, activation: Activation) -> None:
"""Receive an activation from an upstream worker.
Called by GraphExecutor when routing a WORKER_COMPLETED event's
activations to their target workers.
"""
if self.lifecycle != WorkerLifecycle.PENDING:
return
self._received_activations.append(activation)
# Update fan-out trackers from this activation's tags.
# Skip tags where this worker IS the via_branch — those tags exist
# for downstream convergence tracking, not for gating this worker.
for tag in activation.fan_out_tags:
if tag.via_branch == self.node_spec.id:
continue
if tag.fan_out_id not in self._active_fan_outs:
self._active_fan_outs[tag.fan_out_id] = FanOutTracker(
fan_out_id=tag.fan_out_id,
branches=tag.branches,
)
self._active_fan_outs[tag.fan_out_id].reached.add(tag.via_branch)
def check_readiness(self) -> bool:
"""Check if all fan-out groups have been satisfied."""
if self._has_been_activated:
return True
if not self._active_fan_outs:
# No fan-out tracking — ready on first activation
return bool(self._received_activations)
return all(t.is_complete for t in self._active_fan_outs.values())
def reset_for_revisit(self) -> None:
"""Reset a completed worker so it can execute again (feedback loops).
Preserves the node implementation (cached) but clears lifecycle,
activation, and result state.
"""
self.lifecycle = WorkerLifecycle.PENDING
self._inherited_fan_out_tags = []
self._active_fan_outs = {}
self._received_activations = []
self._has_been_activated = False
self._task = None
self._last_result = None
self._last_activations = []
self._tokens_used = 0
self._latency_ms = 0
# ------------------------------------------------------------------
# Execution
# ------------------------------------------------------------------
async def _execute_self(self) -> None:
"""Main execution loop: run node, handle retries, publish result."""
gc = self._gc
node_spec = self.node_spec
try:
# Write all mapped inputs from received activations to buffer
for activation in self._received_activations:
for key, value in activation.mapped_inputs.items():
gc.buffer.write(key, value, validate=False)
# Increment visit count (always, even if skipped)
async with gc._visits_lock:
visit_count = gc.node_visit_counts.get(node_spec.id, 0) + 1
gc.node_visit_counts[node_spec.id] = visit_count
# Check max_node_visits — skip execution but still propagate edges
if node_spec.max_node_visits > 0 and visit_count > node_spec.max_node_visits:
logger.info(
"Worker %s: visit %d exceeds max_node_visits=%d, skipping",
node_spec.id, visit_count, node_spec.max_node_visits,
)
# Build a synthetic success result from current buffer state
existing_output: dict[str, Any] = {}
for key in node_spec.output_keys:
val = gc.buffer.read(key)
if val is not None:
existing_output[key] = val
result = NodeResult(success=True, output=existing_output)
# Evaluate outgoing edges so the cycle continues
activations = await self._evaluate_outgoing_edges(result)
self.lifecycle = WorkerLifecycle.COMPLETED
self._last_result = result
self._last_activations = activations
return
# Clear stale nullable outputs on re-visit
if visit_count > 1:
nullable_keys = getattr(node_spec, "nullable_output_keys", None) or []
for key in nullable_keys:
if gc.buffer.read(key) is not None:
gc.buffer.write(key, None, validate=False)
# Continuous mode: accumulate tools and output keys
if gc.is_continuous and node_spec.tools:
for t in gc.tools:
if t.name in node_spec.tools and t.name not in gc.cumulative_tool_names:
gc.cumulative_tools.append(t)
gc.cumulative_tool_names.add(t.name)
if gc.is_continuous and node_spec.output_keys:
for k in node_spec.output_keys:
if k not in gc.cumulative_output_keys:
gc.cumulative_output_keys.append(k)
# Append to execution path
async with gc._path_lock:
gc.path.append(node_spec.id)
# Get node implementation
node_impl = self._get_node_implementation()
# Build context
ctx = self._build_node_context()
# Execute with retry
result = await self._execute_with_retries(node_impl, ctx)
# Handle result
if result.success:
# Validate and write outputs
self._write_outputs(result)
# Evaluate outgoing edges
activations = await self._evaluate_outgoing_edges(result)
# Publish completion
self.lifecycle = WorkerLifecycle.COMPLETED
self._last_result = result
self._last_activations = activations
# Colony memory reflection — runs before downstream activation
await self._reflect_colony_memory()
completion = WorkerCompletion(
worker_id=node_spec.id,
success=True,
output=result.output,
tokens_used=result.tokens_used,
latency_ms=result.latency_ms,
conversation=result.conversation,
activations=activations,
)
if gc.is_continuous and completion.conversation is not None:
gc.continuous_conversation = completion.conversation
await self._apply_continuous_transition(completion.activations)
await self._publish_completion(completion)
else:
# Evaluate outgoing edges even on failure (ON_FAILURE edges)
activations = await self._evaluate_outgoing_edges(result)
self.lifecycle = WorkerLifecycle.FAILED
self._last_result = result
self._last_activations = activations
# Colony memory reflection — capture learnings even on failure
await self._reflect_colony_memory()
await self._publish_failure(result.error or "Unknown error")
except Exception as exc:
error = str(exc) or type(exc).__name__
logger.exception("Worker %s crashed during execution", node_spec.id)
self.lifecycle = WorkerLifecycle.FAILED
self._last_result = NodeResult(success=False, error=error)
self._last_activations = []
await self._publish_failure(error)
async def _execute_with_retries(
self, node_impl: NodeProtocol, ctx: NodeContext
) -> NodeResult:
"""Execute node with exponential backoff retry."""
gc = self._gc
# Only skip retries for actual EventLoopNode instances (they handle
# retries internally). Custom NodeProtocol impls registered via
# register_node should be retried by the executor.
from framework.graph.event_loop_node import EventLoopNode as _ELN
if isinstance(node_impl, _ELN):
max_retries = 0
else:
max_retries = self.retry_state.max_retries
total_attempts = max(1, max_retries)
for attempt in range(total_attempts):
# Check pause
await self._run_gate.wait()
ctx.attempt = attempt + 1
start = time.monotonic()
try:
result = await node_impl.execute(ctx)
result.latency_ms = int((time.monotonic() - start) * 1000)
if result.success:
return result
# Failure
if attempt + 1 < total_attempts:
gc.retry_counts[self.node_spec.id] = gc.retry_counts.get(self.node_spec.id, 0) + 1
gc.nodes_with_retries.add(self.node_spec.id)
delay = 1.0 * (2**attempt)
logger.warning(
"Worker %s failed (attempt %d/%d), retrying in %.1fs: %s",
self.node_spec.id,
attempt + 1,
max_retries,
delay,
result.error,
)
# Emit retry event
if gc.event_bus:
await gc.event_bus.emit_node_retry(
stream_id=gc.stream_id,
node_id=self.node_spec.id,
attempt=attempt + 1,
max_retries=max_retries,
execution_id=gc.execution_id,
)
await asyncio.sleep(delay)
continue
else:
return NodeResult(
success=False,
error=f"failed after {attempt + 1} attempts: {result.error}",
)
except Exception as exc:
if attempt + 1 < total_attempts:
gc.retry_counts[self.node_spec.id] = gc.retry_counts.get(self.node_spec.id, 0) + 1
gc.nodes_with_retries.add(self.node_spec.id)
delay = 1.0 * (2**attempt)
logger.warning(
"Worker %s raised %s (attempt %d/%d), retrying in %.1fs",
self.node_spec.id,
type(exc).__name__,
attempt + 1,
max(1, max_retries),
delay,
)
await asyncio.sleep(delay)
continue
return NodeResult(
success=False,
error=f"failed after {attempt + 1} attempts: {exc}",
)
return NodeResult(
success=False,
error=f"failed after {max(1, max_retries)} attempts",
)
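    # The backoff above is plain exponential doubling; a worked example for
    # max_retries=3 (same arithmetic, not new logic):
    #
    #   delays = [1.0 * (2 ** attempt) for attempt in range(3)]
    #   assert delays == [1.0, 2.0, 4.0]   # 1s, then 2s, then 4s between tries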
# ------------------------------------------------------------------
# Edge evaluation (source-side)
# ------------------------------------------------------------------
async def _evaluate_outgoing_edges(
self, result: NodeResult
) -> list[Activation]:
"""Evaluate outgoing edges and create activations for downstream.
Same logic as current _get_all_traversable_edges() plus
priority filtering for CONDITIONAL edges.
"""
gc = self._gc
edges = gc.graph.get_outgoing_edges(self.node_spec.id)
traversable: list[EdgeSpec] = []
for edge in edges:
target_spec = gc.graph.get_node(edge.target)
if await edge.should_traverse(
source_success=result.success,
source_output=result.output,
buffer_data=gc.buffer.read_all(),
llm=gc.llm,
goal=gc.goal,
source_node_name=self.node_spec.name,
target_node_name=target_spec.name if target_spec else edge.target,
):
traversable.append(edge)
# Priority filtering for CONDITIONAL edges
if len(traversable) > 1:
conditionals = [e for e in traversable if e.condition == EdgeCondition.CONDITIONAL]
if len(conditionals) > 1:
max_prio = max(e.priority for e in conditionals)
traversable = [
e
for e in traversable
if e.condition != EdgeCondition.CONDITIONAL or e.priority == max_prio
]
# When parallel execution is disabled, follow first match only (sequential)
if not gc.enable_parallel_execution and len(traversable) > 1:
traversable = traversable[:1]
# Build activations
is_fan_out = len(traversable) > 1
fan_out_id = f"{self.node_spec.id}_{uuid.uuid4().hex[:8]}" if is_fan_out else None
activations: list[Activation] = []
for edge in traversable:
mapped = edge.map_inputs(result.output, gc.buffer.read_all())
# Build fan-out tags: inherited + new
tags = list(self._inherited_fan_out_tags)
if is_fan_out:
tags.append(
FanOutTag(
fan_out_id=fan_out_id,
fan_out_source=self.node_spec.id,
branches=frozenset(e.target for e in traversable),
via_branch=edge.target,
)
)
activations.append(
Activation(
source_id=self.node_spec.id,
target_id=edge.target,
edge_id=edge.id,
edge=edge,
mapped_inputs=mapped,
fan_out_tags=tags,
)
)
if traversable:
logger.info(
"Worker %s%d outgoing activation(s)%s",
self.node_spec.id,
len(activations),
f" (fan-out: {[a.target_id for a in activations]})" if is_fan_out else "",
)
return activations
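    # Sketch of the CONDITIONAL priority filter above, with duck-typed edges
    # and a plain string standing in for EdgeCondition.CONDITIONAL (all
    # values hypothetical):
    #
    #   from types import SimpleNamespace
    #   edges = [
    #       SimpleNamespace(condition="conditional", priority=1, target="retry"),
    #       SimpleNamespace(condition="conditional", priority=5, target="finalize"),
    #       SimpleNamespace(condition="always", priority=0, target="audit"),
    #   ]
    #   conditionals = [e for e in edges if e.condition == "conditional"]
    #   max_prio = max(e.priority for e in conditionals)
    #   kept = [e for e in edges
    #           if e.condition != "conditional" or e.priority == max_prio]
    #   assert [e.target for e in kept] == ["finalize", "audit"]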
# ------------------------------------------------------------------
# Output handling
# ------------------------------------------------------------------
def _write_outputs(self, result: NodeResult) -> None:
"""Validate and write node outputs to buffer."""
gc = self._gc
node_spec = self.node_spec
# Event loop nodes skip executor-level validation (judge is the authority)
if node_spec.node_type != "event_loop":
errors = self._validator.validate_all(
output=result.output,
output_keys=node_spec.output_keys,
nullable_keys=getattr(node_spec, "nullable_output_keys", []) or [],
output_schema=getattr(node_spec, "output_schema", None),
output_model=getattr(node_spec, "output_model", None),
)
if errors:
logger.warning("Worker %s output validation warnings: %s", node_spec.id, errors)
# Determine if this worker is a fan-out branch
is_fanout_branch = any(
tag.via_branch == node_spec.id for tag in self._inherited_fan_out_tags
)
# Collect keys to write: declared output_keys + any extra output items
# (for fan-out branches, all output items need conflict checking)
keys_to_write: set[str] = set(node_spec.output_keys)
if is_fanout_branch:
keys_to_write |= set(result.output.keys())
# Write all keys to buffer
for key in keys_to_write:
value = result.output.get(key)
if value is not None:
if is_fanout_branch:
conflict_strategy = (
getattr(gc.parallel_config, "buffer_conflict_strategy", "last_wins")
if gc.parallel_config
else "last_wins"
)
prior_worker = gc._fanout_written_keys.get(key)
if prior_worker and prior_worker != node_spec.id:
if conflict_strategy == "error":
raise RuntimeError(
f"Buffer write failed (conflict): key '{key}' already written "
f"by worker '{prior_worker}', "
f"conflicting write from '{node_spec.id}'"
)
elif conflict_strategy == "first_wins":
logger.debug(
"Skipping write to '%s' (first_wins: already set by %s)",
key, prior_worker,
)
continue
else:
# last_wins: log and overwrite
logger.debug(
"Key '%s' overwritten (last_wins: %s -> %s)",
key, prior_worker, node_spec.id,
)
gc._fanout_written_keys[key] = node_spec.id
gc.buffer.write(key, value, validate=False)
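    # Standalone sketch of the three conflict strategies above (helper name
    # and data hypothetical):
    #
    #   def resolve(strategy, key, prior, worker):
    #       """Return True when this worker's value should be written."""
    #       if prior is None or prior == worker:
    #           return True
    #       if strategy == "error":
    #           raise RuntimeError(f"conflict on '{key}': '{prior}' vs '{worker}'")
    #       if strategy == "first_wins":
    #           return False             # keep the earlier branch's value
    #       return True                  # last_wins: overwrite
    #
    #   assert resolve("first_wins", "report", "branch_a", "branch_b") is False
    #   assert resolve("last_wins", "report", "branch_a", "branch_b") is True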
# ------------------------------------------------------------------
# Context building
# ------------------------------------------------------------------
def _get_node_implementation(self) -> NodeProtocol:
"""Get or create node implementation."""
gc = self._gc
if self._node_impl is not None:
return self._node_impl
# Check shared registry first
if self.node_spec.id in gc.node_registry:
self._node_impl = gc.node_registry[self.node_spec.id]
return self._node_impl
# Auto-create EventLoopNode
if self.node_spec.node_type in ("event_loop", "gcu"):
from framework.graph.event_loop_node import EventLoopNode
from framework.graph.event_loop.types import LoopConfig
from framework.graph.node import warn_if_deprecated_client_facing
conv_store = None
if gc.storage_path:
from framework.storage.conversation_store import FileConversationStore
conv_store = FileConversationStore(base_path=gc.storage_path / "conversations")
spillover = str(gc.storage_path / "data") if gc.storage_path else None
lc = gc.loop_config
warn_if_deprecated_client_facing(self.node_spec)
default_max_iter = 100 if self.node_spec.supports_direct_user_io() else 50
node = EventLoopNode(
event_bus=gc.event_bus,
judge=None,
config=LoopConfig(
max_iterations=lc.get("max_iterations", default_max_iter),
max_tool_calls_per_turn=lc.get("max_tool_calls_per_turn", 30),
tool_call_overflow_margin=lc.get("tool_call_overflow_margin", 0.5),
stall_detection_threshold=lc.get("stall_detection_threshold", 3),
max_context_tokens=lc.get(
"max_context_tokens",
_default_max_context_tokens(),
),
max_tool_result_chars=lc.get("max_tool_result_chars", 30_000),
spillover_dir=spillover,
hooks=lc.get("hooks", {}),
),
tool_executor=gc.tool_executor,
conversation_store=conv_store,
)
gc.node_registry[self.node_spec.id] = node
self._node_impl = node
return node
raise RuntimeError(
f"No implementation for node '{self.node_spec.id}' "
f"(type: {self.node_spec.node_type})"
)
def _build_node_context(self) -> NodeContext:
"""Build NodeContext for this worker's execution."""
return build_node_context_from_graph_context(
self._gc,
node_spec=self.node_spec,
pause_event=self._pause_requested,
)
async def _reflect_colony_memory(self) -> None:
"""Run colony memory reflection at node handoff.
Awaits the shared colony lock so parallel workers queue (never skip).
"""
gc = self._gc
if gc.colony_memory_dir is None or gc.colony_reflect_llm is None:
return
if gc.worker_sessions_dir is None:
return
from pathlib import Path
session_dir = Path(gc.worker_sessions_dir) / gc.execution_id
if not session_dir.exists():
return
# Await lock — serializes reflection but never skips
async with gc._colony_reflect_lock:
try:
from framework.agents.queen.reflection_agent import run_short_reflection
await run_short_reflection(
session_dir, gc.colony_reflect_llm, gc.colony_memory_dir,
caller="worker",
)
except Exception:
logger.warning(
"Worker %s: colony reflection failed",
self.node_spec.id,
exc_info=True,
)
# Update recall cache outside lock (per-execution key, no write races)
try:
from framework.agents.queen.recall_selector import update_recall_cache
await update_recall_cache(
session_dir,
gc.colony_reflect_llm,
memory_dir=gc.colony_memory_dir,
cache_setter=lambda block: gc.colony_recall_cache.__setitem__(
gc.execution_id, block
),
heading="Colony Memories",
)
except Exception:
logger.warning(
"Worker %s: recall cache update failed",
self.node_spec.id,
exc_info=True,
)
# ------------------------------------------------------------------
# Event publishing
# ------------------------------------------------------------------
async def _publish_completion(self, completion: WorkerCompletion) -> None:
"""Publish WORKER_COMPLETED event via the graph-scoped event bus."""
gc = self._gc
if not gc.event_bus:
return
if not hasattr(gc.event_bus, "emit_worker_completed"):
return
# Serialize activations to dicts for event data
activations_data = []
for act in completion.activations:
activations_data.append({
"source_id": act.source_id,
"target_id": act.target_id,
"edge_id": act.edge_id,
"mapped_inputs": act.mapped_inputs,
"fan_out_tags": [
{
"fan_out_id": t.fan_out_id,
"fan_out_source": t.fan_out_source,
"branches": list(t.branches),
"via_branch": t.via_branch,
}
for t in act.fan_out_tags
],
})
await gc.event_bus.emit_worker_completed(
stream_id=gc.stream_id,
node_id=self.node_spec.id,
worker_id=self.node_spec.id,
success=completion.success,
output=completion.output,
activations=activations_data,
execution_id=gc.execution_id,
tokens_used=completion.tokens_used,
latency_ms=completion.latency_ms,
conversation=completion.conversation,
)
async def _publish_failure(self, error: str) -> None:
"""Publish WORKER_FAILED event."""
gc = self._gc
if not gc.event_bus:
return
if not hasattr(gc.event_bus, "emit_worker_failed"):
return
await gc.event_bus.emit_worker_failed(
stream_id=gc.stream_id,
node_id=self.node_spec.id,
worker_id=self.node_spec.id,
error=error,
execution_id=gc.execution_id,
)
async def _apply_continuous_transition(self, activations: list[Activation]) -> None:
"""Apply continuous mode conversation threading for the next node.
This prepares the inherited conversation before the completion event
is published so downstream workers receive a fully updated thread.
"""
gc = self._gc
if not gc.is_continuous or not gc.continuous_conversation:
return
next_node_id = next((activation.target_id for activation in activations), None)
if not next_node_id:
return
next_spec = gc.graph.get_node(next_node_id)
if not next_spec or next_spec.node_type != "event_loop":
return
from framework.graph.prompting import (
TransitionSpec,
build_narrative,
build_system_prompt_for_node_context,
build_transition_message,
)
narrative = build_narrative(gc.buffer, gc.path, gc.graph)
next_ctx = build_node_context_from_graph_context(
gc,
node_spec=next_spec,
pause_event=self._pause_requested,
inherited_conversation=gc.continuous_conversation,
narrative=narrative,
)
gc.continuous_conversation.update_system_prompt(
build_system_prompt_for_node_context(next_ctx)
)
gc.continuous_conversation.set_current_phase(next_spec.id)
buffer_items, data_files = self._prepare_transition_payload()
marker = build_transition_message(
TransitionSpec(
previous_name=self.node_spec.name,
previous_description=self.node_spec.description,
next_name=next_spec.name,
next_description=next_spec.description,
next_output_keys=tuple(next_spec.output_keys or ()),
buffer_items=buffer_items,
cumulative_tool_names=tuple(sorted(gc.cumulative_tool_names)),
data_files=tuple(data_files),
)
)
await gc.continuous_conversation.add_user_message(
marker,
is_transition_marker=True,
)
def _prepare_transition_payload(self) -> tuple[dict[str, str], list[str]]:
"""Build transition marker data and spill oversized values when possible."""
import json
from pathlib import Path
gc = self._gc
data_dir = Path(gc.storage_path / "data") if gc.storage_path else None
buffer_items: dict[str, str] = {}
for key, value in gc.buffer.read_all().items():
if value is None:
continue
val_str = str(value)
if len(val_str) > 300 and data_dir is not None:
data_dir.mkdir(parents=True, exist_ok=True)
ext = ".json" if isinstance(value, (dict, list)) else ".txt"
filename = f"output_{key}{ext}"
file_path = data_dir / filename
try:
write_content = (
json.dumps(value, indent=2, ensure_ascii=False)
if isinstance(value, (dict, list))
else str(value)
)
file_path.write_text(write_content, encoding="utf-8")
file_size = file_path.stat().st_size
buffer_items[key] = (
f"[Saved to '{filename}' ({file_size:,} bytes). "
f"Use load_data(filename='{filename}') to access.]"
)
continue
except Exception:
pass
buffer_items[key] = val_str[:300] + "..." if len(val_str) > 300 else val_str
data_files: list[str] = []
if data_dir is not None and data_dir.exists():
data_files = [
f"{entry.name} ({entry.stat().st_size:,} bytes)"
for entry in sorted(data_dir.iterdir())
if entry.is_file()
]
return buffer_items, data_files
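    # Standalone sketch of the 300-character spill rule above, using a temp
    # directory (key and file names hypothetical):
    #
    #   import json, tempfile
    #   from pathlib import Path
    #
    #   data_dir = Path(tempfile.mkdtemp())
    #   value = {"rows": list(range(200))}   # str(value) is well over 300 chars
    #   path = data_dir / "output_rows.json"
    #   path.write_text(json.dumps(value, indent=2), encoding="utf-8")
    #   preview = (
    #       f"[Saved to '{path.name}' ({path.stat().st_size:,} bytes). "
    #       f"Use load_data(filename='{path.name}') to access.]"
    #   )
    #   assert preview.startswith("[Saved to 'output_rows.json'")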
# ------------------------------------------------------------------
# Utility
# ------------------------------------------------------------------
def pause(self) -> None:
self._pause_requested.set()
self._run_gate.clear()
def resume(self) -> None:
self._pause_requested.clear()
self._run_gate.set()
@property
def is_terminal(self) -> bool:
return self.node_spec.id in (self._gc.graph.terminal_nodes or [])
@property
def is_entry(self) -> bool:
return len(self.incoming_edges) == 0
def _default_max_context_tokens() -> int:
"""Resolve max_context_tokens from global config, falling back to 32000."""
try:
from framework.config import get_max_context_tokens # type: ignore[import-untyped]
return get_max_context_tokens()
except Exception:
return 32_000
+23
@@ -1937,6 +1937,29 @@ class LiteLLMProvider(LLMProvider):
return
except Exception as e:
# Some providers return non-standard finish_reason values
# (e.g., kimi-k2.5 sends 'pause_turn') that LiteLLM's
# internal stream_chunk_builder rejects via Pydantic
# validation. If we already accumulated content and built
# tail_events before the error, the stream was successful —
# yield what we have instead of discarding it.
if (accumulated_text or tool_calls_acc) and tail_events:
_is_finish_reason_err = (
"finish_reason" in str(e) and "validation error" in str(e).lower()
)
if _is_finish_reason_err:
logger.warning(
"[stream] %s: LiteLLM finish_reason validation "
"error (non-standard provider value). "
"Content was streamed successfully — "
"using accumulated result. Error: %s",
self.model,
e,
)
for event in tail_events:
yield event
return
if self._should_use_openrouter_tool_compat(e, tools):
_remember_openrouter_tool_compat_model(self.model)
async for event in self._stream_via_openrouter_tool_compat(
-1
@@ -1 +0,0 @@
"""Framework-level worker monitoring package."""
-3
@@ -1,7 +1,6 @@
"""Agent Runner - load and run exported agents."""
from framework.runner.mcp_registry import MCPRegistry
from framework.runner.orchestrator import AgentOrchestrator
from framework.runner.protocol import (
AgentMessage,
CapabilityLevel,
@@ -20,8 +19,6 @@ __all__ = [
"ToolRegistry",
"MCPRegistry",
"tool",
# Multi-agent
"AgentOrchestrator",
"AgentMessage",
"MessageType",
"CapabilityLevel",
+88 -304
@@ -51,6 +51,11 @@ def register_commands(subparsers: argparse._SubParsersAction) -> None:
action="store_true",
help="Show detailed execution logs (steps, LLM calls, etc.)",
)
run_parser.add_argument(
"--debug",
action="store_true",
help="Show all debug-level logs",
)
run_parser.add_argument(
"--model",
@@ -119,46 +124,6 @@ def register_commands(subparsers: argparse._SubParsersAction) -> None:
)
list_parser.set_defaults(func=cmd_list)
# dispatch command (multi-agent)
dispatch_parser = subparsers.add_parser(
"dispatch",
help="Dispatch request to multiple agents",
description="Route a request to the best agent(s) using the orchestrator.",
)
dispatch_parser.add_argument(
"agents_dir",
type=str,
nargs="?",
default="exports",
help="Directory containing agent folders (default: exports)",
)
dispatch_parser.add_argument(
"--input",
"-i",
type=str,
required=True,
help="Input context as JSON string",
)
dispatch_parser.add_argument(
"--intent",
type=str,
help="Description of what you want to accomplish",
)
dispatch_parser.add_argument(
"--agents",
"-a",
type=str,
nargs="+",
help="Specific agent names to use (default: all in directory)",
)
dispatch_parser.add_argument(
"--quiet",
"-q",
action="store_true",
help="Only output the final result JSON",
)
dispatch_parser.set_defaults(func=cmd_dispatch)
# shell command (interactive agent session)
shell_parser = subparsers.add_parser(
"shell",
@@ -177,11 +142,6 @@ def register_commands(subparsers: argparse._SubParsersAction) -> None:
default="exports",
help="Directory containing agents (default: exports)",
)
shell_parser.add_argument(
"--multi",
action="store_true",
help="Enable multi-agent mode with orchestrator",
)
shell_parser.add_argument(
"--no-approve",
action="store_true",
@@ -290,7 +250,10 @@ def register_commands(subparsers: argparse._SubParsersAction) -> None:
def _load_resume_state(
agent_path: str, session_id: str, checkpoint_id: str | None = None
) -> dict | None:
"""Load session or checkpoint state for headless resume.
"""Load checkpoint state for headless resume.
All resumes require a checkpoint. If ``checkpoint_id`` is not provided
the latest checkpoint is auto-discovered.
Args:
agent_path: Path to the agent folder (e.g., exports/my_agent)
@@ -298,7 +261,7 @@ def _load_resume_state(
checkpoint_id: Optional checkpoint ID within the session
Returns:
        session_state dict for executor, or None if no checkpoint found
"""
agent_name = Path(agent_path).name
agent_work_dir = Path.home() / ".hive" / "agents" / agent_name
@@ -307,40 +270,37 @@ def _load_resume_state(
if not session_dir.exists():
return None
# Auto-discover latest checkpoint when not specified
if not checkpoint_id:
cp_dir = session_dir / "checkpoints"
if cp_dir.exists():
checkpoints = sorted(
cp_dir.glob("*.json"),
key=lambda p: p.stat().st_mtime,
reverse=True,
)
if checkpoints:
checkpoint_id = checkpoints[0].stem
if not checkpoint_id:
return None
cp_path = session_dir / "checkpoints" / f"{checkpoint_id}.json"
if not cp_path.exists():
return None
try:
cp_data = json.loads(cp_path.read_text(encoding="utf-8"))
except (json.JSONDecodeError, OSError):
return None
return {
"resume_session_id": session_id,
"resume_from_checkpoint": checkpoint_id,
"run_id": cp_data.get("run_id") or None,
"data_buffer": cp_data.get("data_buffer", cp_data.get("shared_memory", {})),
"paused_at": cp_data.get("next_node") or cp_data.get("current_node"),
"execution_path": cp_data.get("execution_path", []),
"node_visit_counts": cp_data.get("node_visit_counts", {}),
}
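# Minimal standalone sketch of the latest-checkpoint auto-discovery above
# (directory layout and checkpoint names are hypothetical):
import json as _json
import tempfile as _tempfile
import time as _time
from pathlib import Path as _Path

_cp_dir = _Path(_tempfile.mkdtemp()) / "checkpoints"
_cp_dir.mkdir(parents=True)
for _name in ("cp_001", "cp_002"):
    (_cp_dir / f"{_name}.json").write_text(_json.dumps({"current_node": _name}))
    _time.sleep(0.01)                        # guarantee distinct mtimes
_latest = sorted(_cp_dir.glob("*.json"), key=lambda p: p.stat().st_mtime, reverse=True)[0]
assert _latest.stem == "cp_002"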
def _prompt_before_start(agent_path: str, runner, model: str | None = None):
@@ -387,6 +347,8 @@ def cmd_run(args: argparse.Namespace) -> int:
# Set logging level (quiet by default for cleaner output)
if args.quiet:
configure_logging(level="ERROR")
elif getattr(args, "debug", False):
configure_logging(level="DEBUG")
elif getattr(args, "verbose", False):
configure_logging(level="INFO")
else:
@@ -722,118 +684,6 @@ def cmd_list(args: argparse.Namespace) -> int:
return 0
def cmd_dispatch(args: argparse.Namespace) -> int:
"""Dispatch request to multiple agents via orchestrator."""
from framework.runner import AgentOrchestrator
# Parse input
try:
context = json.loads(args.input)
except json.JSONDecodeError as e:
print(f"Error parsing --input JSON: {e}", file=sys.stderr)
return 1
# Find agents
agents_dir = Path(args.agents_dir)
if not agents_dir.exists():
print(f"Directory not found: {agents_dir}", file=sys.stderr)
return 1
# Create orchestrator and register agents
orchestrator = AgentOrchestrator()
agent_paths = []
if args.agents:
# Use specific agents
for agent_name in args.agents:
# Guard against full paths: if the name contains path separators
# (e.g. "exports/my_agent"), it will be doubled with agents_dir
agent_name_path = Path(agent_name)
if len(agent_name_path.parts) > 1:
print(
f"Error: --agents expects agent names, not paths. "
f"Use: --agents {agent_name_path.name} "
f"instead of --agents {agent_name}",
file=sys.stderr,
)
return 1
agent_path = agents_dir / agent_name
if not _is_valid_agent_dir(agent_path):
print(f"Agent not found: {agent_path}", file=sys.stderr)
return 1
agent_paths.append((agent_name, agent_path))
else:
# Discover all agents
for path in agents_dir.iterdir():
if _is_valid_agent_dir(path):
agent_paths.append((path.name, path))
if not agent_paths:
print(f"No agents found in {agents_dir}", file=sys.stderr)
return 1
# Register agents
for name, path in agent_paths:
try:
orchestrator.register(name, path)
if not args.quiet:
print(f"Registered agent: {name}")
except Exception as e:
print(f"Failed to register {name}: {e}", file=sys.stderr)
if not args.quiet:
print()
print(f"Input: {json.dumps(context)}")
if args.intent:
print(f"Intent: {args.intent}")
print()
print("=" * 60)
print("Dispatching to agents...")
print("=" * 60)
print()
# Dispatch
result = asyncio.run(orchestrator.dispatch(context, intent=args.intent))
# Output results
if args.quiet:
output = {
"success": result.success,
"handled_by": result.handled_by,
"results": result.results,
"error": result.error,
}
print(json.dumps(output, indent=2, default=str))
else:
print()
print("=" * 60)
print(f"Success: {result.success}")
print(f"Handled by: {', '.join(result.handled_by) or 'none'}")
if result.error:
print(f"Error: {result.error}")
print("=" * 60)
if result.results:
print("\n--- Results by Agent ---")
for agent_name, data in result.results.items():
print(f"\n{agent_name}:")
status = data.get("status", "unknown")
print(f" Status: {status}")
if "completed_steps" in data:
print(f" Steps: {len(data['completed_steps'])}")
if "results" in data:
results_preview = json.dumps(data["results"], default=str)
if len(results_preview) > 200:
results_preview = results_preview[:200] + "..."
print(f" Results: {results_preview}")
if not args.quiet:
print(f"\nMessage trace: {len(result.messages)} messages")
orchestrator.cleanup()
return 0 if result.success else 1
def _interactive_approval(request):
"""Interactive approval callback for HITL mode."""
from framework.graph import ApprovalDecision, ApprovalResult
@@ -931,11 +781,6 @@ def cmd_shell(args: argparse.Namespace) -> int:
agents_dir = Path(args.agents_dir)
# Multi-agent mode with orchestrator
if args.multi:
return _interactive_multi(agents_dir)
# Single agent mode
agent_path = args.agent_path
if not agent_path:
# List available agents and let user choose
@@ -1408,108 +1253,6 @@ def _select_agent(agents_dir: Path) -> str | None:
print()
return None
def _interactive_multi(agents_dir: Path) -> int:
"""Interactive multi-agent mode with orchestrator."""
from framework.runner import AgentOrchestrator
if not agents_dir.exists():
print(f"Directory not found: {agents_dir}", file=sys.stderr)
return 1
orchestrator = AgentOrchestrator()
agent_count = 0
# Register all agents
for path in agents_dir.iterdir():
if _is_valid_agent_dir(path):
try:
orchestrator.register(path.name, path)
agent_count += 1
except Exception as e:
print(f"Warning: Failed to register {path.name}: {e}")
if agent_count == 0:
print(f"No agents found in {agents_dir}", file=sys.stderr)
return 1
print(f"\n{'=' * 60}")
print("Multi-Agent Interactive Mode")
print(f"Registered {agent_count} agents")
print(f"{'=' * 60}")
print("\nCommands:")
print(" /agents - List registered agents")
print(" /quit - Exit")
print(" {...} - JSON input to dispatch")
print()
while True:
try:
user_input = input(">>> ").strip()
except (EOFError, KeyboardInterrupt):
print("\nExiting...")
break
if not user_input:
continue
if user_input == "/quit":
break
if user_input == "/agents":
print("\nRegistered agents:")
for agent in orchestrator.list_agents():
print(f" - {agent['name']}: {agent['description'][:60]}...")
print()
continue
# Parse intent if provided
intent = None
if user_input.startswith("/intent "):
parts = user_input.split(" ", 2)
if len(parts) >= 3:
intent = parts[1]
user_input = parts[2]
# Try to parse as JSON
try:
context = json.loads(user_input)
except json.JSONDecodeError:
print("Error: Invalid JSON input. Use {...} format.")
continue
print(f"\nDispatching: {json.dumps(context)}")
if intent:
print(f"Intent: {intent}")
print("-" * 40)
result = asyncio.run(orchestrator.dispatch(context, intent=intent))
print(f"\nSuccess: {result.success}")
print(f"Handled by: {', '.join(result.handled_by) or 'none'}")
if result.error:
print(f"Error: {result.error}")
if result.results:
print("\nResults by agent:")
for agent_name, data in result.results.items():
print(f"\n {agent_name}:")
status = data.get("status", "unknown")
print(f" Status: {status}")
if "results" in data:
results_preview = json.dumps(data["results"], default=str)
if len(results_preview) > 150:
results_preview = results_preview[:150] + "..."
print(f" Results: {results_preview}")
print(f"\nMessage trace: {len(result.messages)} messages")
print()
orchestrator.cleanup()
return 0
def cmd_setup_credentials(args: argparse.Namespace) -> int:
"""Interactive credential setup for an agent."""
from framework.credentials.setup import CredentialSetupSession
@@ -1532,10 +1275,51 @@ def cmd_setup_credentials(args: argparse.Namespace) -> int:
return 0 if result.success else 1
def _find_chrome_bin() -> str | None:
"""Return the path to a Chrome/Chromium binary, or None if not found."""
import shutil
for candidate in (
"google-chrome",
"google-chrome-stable",
"chromium",
"chromium-browser",
"microsoft-edge",
"microsoft-edge-stable",
):
if shutil.which(candidate):
return candidate
mac_paths = [
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
Path.home() / "Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
"/Applications/Microsoft Edge.app/Contents/MacOS/Microsoft Edge",
]
for p in mac_paths:
if Path(p).exists():
return str(p)
return None
def _open_browser(url: str) -> None:
"""Open URL in the default browser (best-effort, non-blocking)."""
"""Open URL in the browser (best-effort, non-blocking)."""
import subprocess
chrome = _find_chrome_bin()
try:
if chrome:
subprocess.Popen(
[chrome, url],
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
)
return
except Exception:
pass
# Fallback: open with system default browser
try:
if sys.platform == "darwin":
subprocess.Popen(
@@ -1676,10 +1460,10 @@ def cmd_serve(args: argparse.Namespace) -> int:
# Preload agents specified via --agent
for agent_path in args.agent:
try:
session = await manager.create_session_with_worker(agent_path, model=model)
session = await manager.create_session_with_worker_graph(agent_path, model=model)
info = session.worker_info
name = info.name if info else session.worker_id
print(f"Loaded agent: {session.worker_id} ({name})")
name = info.name if info else session.graph_id
print(f"Loaded agent: {session.graph_id} ({name})")
except Exception as e:
print(f"Error loading {agent_path}: {e}")
@@ -1702,7 +1486,7 @@ def cmd_serve(args: argparse.Namespace) -> int:
if has_frontend:
print(f"Dashboard: {dashboard_url}")
print(f"Health: {dashboard_url}/api/health")
print(f"Agents loaded: {sum(1 for s in manager.list_sessions() if s.worker_runtime)}")
print(f"Agents loaded: {sum(1 for s in manager.list_sessions() if s.graph_runtime)}")
print()
print("Press Ctrl+C to stop")
@@ -1,252 +0,0 @@
from __future__ import annotations
import json
import logging
from pathlib import Path
from typing import Any
logger = logging.getLogger(__name__)
_CACHE_INDEX_PATH = Path.home() / ".hive" / "mcp_registry" / "cache" / "registry_index.json"
_FIXTURE_INDEX_PATH = Path(__file__).resolve().parent / "fixtures" / "registry_index.json"
def resolve_registry_servers(
*,
include: list[str] | None = None,
tags: list[str] | None = None,
exclude: list[str] | None = None,
profile: str | None = None,
max_tools: int | None = None,
versions: dict[str, str] | None = None,
) -> list[dict[str, Any]]:
"""
Resolve registry-sourced MCP servers for `mcp_registry.json` selection.
This function is written to be mock-friendly during early development:
- If the real `MCPRegistry` core module is present, delegate to it.
- Otherwise, fall back to a cached local index (`~/.hive/.../registry_index.json`)
and then to the repo fixture index.
"""
# `max_tools` is enforced by ToolRegistry. We keep it in the resolver
# signature to match the PRD and future MCPRegistry interfaces.
_ = max_tools
try:
from framework.runner.mcp_registry import MCPRegistry # type: ignore
registry = MCPRegistry()
resolved = registry.resolve_for_agent(
include=include or [],
tags=tags or [],
exclude=exclude or [],
profile=profile,
max_tools=max_tools,
versions=versions or {},
)
# Future-proof: normalize both dicts and typed objects to dicts.
return [_normalize_server_config(x) for x in resolved]
except ImportError:
# Expected while #6349/#6574 are not merged locally.
pass
except Exception as e:
logger.warning("MCPRegistry resolution failed; falling back to cache/fixtures: %s", e)
return _resolve_from_local_index(
include=include,
tags=tags,
exclude=exclude,
profile=profile,
versions=versions or {},
)
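# Example (illustrative; the server names, tags, and version pins below are
# made up — real selections come from the agent's mcp_registry.json):
#
#     servers = resolve_registry_servers(
#         include=["github"],
#         tags=["search"],
#         exclude=["slack"],
#         profile="coder",
#         versions={"github": "1.2.0"},
#     )
#     for cfg in servers:
#         print(cfg["name"], cfg.get("transport"))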
def _resolve_from_local_index(
*,
include: list[str] | None,
tags: list[str] | None,
exclude: list[str] | None,
profile: str | None,
versions: dict[str, str],
) -> list[dict[str, Any]]:
index = _load_index_json()
servers = _coerce_index_servers(index)
servers_by_name: dict[str, dict[str, Any]] = {
s["name"]: s for s in servers if isinstance(s, dict) and "name" in s
}
include_list = include or []
tags_list = tags or []
exclude_set = set(exclude or [])
def _profiles_of(entry: dict[str, Any]) -> set[str]:
if isinstance(entry.get("profiles"), list):
return set(entry["profiles"])
hive = entry.get("hive")
if isinstance(hive, dict) and isinstance(hive.get("profiles"), list):
return set(hive["profiles"])
return set()
def _tags_of(entry: dict[str, Any]) -> set[str]:
if isinstance(entry.get("tags"), list):
return set(entry["tags"])
return set()
def _entry_version(entry: dict[str, Any]) -> str | None:
# Prefer flat `version`, but support a few common shapes.
v = entry.get("version")
if isinstance(v, str):
return v
v2 = entry.get("manifest_version")
if isinstance(v2, str):
return v2
hive = entry.get("manifest")
if isinstance(hive, dict) and isinstance(hive.get("version"), str):
return hive["version"]
return None
def _version_allows(server_name: str) -> bool:
if server_name not in versions:
return True
pinned = versions[server_name]
entry = servers_by_name.get(server_name)
if not entry:
return False
return _entry_version(entry) == pinned
resolved_names: list[str] = []
resolved_set: set[str] = set()
# 1) Include-order first
for name in include_list:
if name in exclude_set:
continue
if name in servers_by_name and _version_allows(name) and name not in resolved_set:
resolved_names.append(name)
resolved_set.add(name)
# 2) Then tag/profile matches, alphabetical
profile_candidates = set()
if profile:
for name, entry in servers_by_name.items():
if name in exclude_set or not _version_allows(name):
continue
if profile in _profiles_of(entry):
profile_candidates.add(name)
tag_candidates = set()
if tags_list:
tags_set = set(tags_list)
for name, entry in servers_by_name.items():
if name in exclude_set or not _version_allows(name):
continue
if _tags_of(entry).intersection(tags_set):
tag_candidates.add(name)
tag_profile_names = sorted((profile_candidates | tag_candidates) - resolved_set)
resolved_names.extend(tag_profile_names)
# Missing requested servers should warn (FR-54).
for name in include_list:
if name in exclude_set:
continue
if name not in resolved_set:
if name not in servers_by_name:
logger.warning(
"Server '%s' requested by mcp_registry.json but not found in index. "
"Run: hive mcp install %s",
name,
name,
)
elif name in versions:
logger.warning(
"Server '%s' was requested but pinned version '%s' was not found in index. "
"Run: hive mcp update %s or change the pin in mcp_registry.json",
name,
versions[name],
name,
)
else:
logger.warning(
"Server '%s' requested by mcp_registry.json was not selected. "
"Check selection filters/exclude lists.",
name,
)
resolved_configs: list[dict[str, Any]] = []
repo_root = Path(__file__).resolve().parents[3]
for name in resolved_names:
entry = servers_by_name.get(name)
if not entry:
continue
config = entry.get("mcp_config")
if not isinstance(config, dict):
# Best-effort: allow a direct MCP config shape at top-level.
config = {
k: v
for k, v in entry.items()
if k
in {
"name",
"transport",
"command",
"args",
"env",
"cwd",
"url",
"headers",
"description",
}
}
mcp_config = dict(config)
mcp_config["name"] = name
if mcp_config.get("transport") == "stdio":
_absolutize_stdio_config_in_place(repo_root, mcp_config)
resolved_configs.append(mcp_config)
return resolved_configs
def _load_index_json() -> Any:
if _CACHE_INDEX_PATH.exists():
return json.loads(_CACHE_INDEX_PATH.read_text(encoding="utf-8"))
if _FIXTURE_INDEX_PATH.exists():
logger.info("Using local fixture index because registry cache is missing")
return json.loads(_FIXTURE_INDEX_PATH.read_text(encoding="utf-8"))
logger.warning("No local MCP registry index found (cache and fixture missing)")
return {"servers": []}
def _coerce_index_servers(index: Any) -> list[dict[str, Any]]:
if isinstance(index, list):
return [x for x in index if isinstance(x, dict)]
if isinstance(index, dict):
servers = index.get("servers", [])
if isinstance(servers, list):
return [x for x in servers if isinstance(x, dict)]
return []
def _normalize_server_config(raw: Any) -> dict[str, Any]:
if isinstance(raw, dict):
return dict(raw)
# Future-proof object-to-dict normalization.
for attr in ("to_dict", "model_dump"):
maybe = getattr(raw, attr, None)
if callable(maybe):
return dict(maybe())
return dict(getattr(raw, "__dict__", {}))
def _absolutize_stdio_config_in_place(repo_root: Path, config: dict[str, Any]) -> None:
cwd = config.get("cwd")
if isinstance(cwd, str) and not Path(cwd).is_absolute():
config["cwd"] = str((repo_root / cwd).resolve())
# We intentionally do not absolutize `args` here.
# For stdio servers, arguments may include the script name relative to
# `cwd` (e.g. "coder_tools_server.py" with cwd="tools"). ToolRegistry's
# stdio resolution logic handles script path checks and platform quirks.
-517
@@ -1,517 +0,0 @@
"""Agent Orchestrator - routes requests and relays messages between agents."""
from __future__ import annotations
import asyncio
import json
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any
from framework.llm.provider import LLMProvider
from framework.runner.protocol import (
AgentMessage,
CapabilityLevel,
CapabilityResponse,
MessageType,
OrchestratorResult,
RegisteredAgent,
)
from framework.runner.runner import AgentRunner
@dataclass
class RoutingDecision:
"""Decision about which agent(s) should handle a request."""
selected_agents: list[str]
reasoning: str
confidence: float
should_parallelize: bool = False
fallback_agents: list[str] = field(default_factory=list)
class AgentOrchestrator:
"""
Manages multiple agents and routes communications between them.
The orchestrator:
1. Maintains a registry of available agents
2. Routes incoming requests to appropriate agent(s) using LLM
3. Relays messages between agents
4. Logs all communications for traceability
Usage:
orchestrator = AgentOrchestrator()
orchestrator.register("sales", "exports/outbound-sales")
orchestrator.register("support", "exports/customer-support")
result = await orchestrator.dispatch({
"intent": "help customer with billing issue",
"customer_id": "123",
})
"""
def __init__(
self,
llm: LLMProvider | None = None,
model: str = "claude-haiku-4-5-20251001",
):
"""
Initialize the orchestrator.
Args:
llm: LLM provider for routing decisions (auto-creates if None)
model: Model to use for routing
"""
self._agents: dict[str, RegisteredAgent] = {}
self._llm = llm
self._model = model
self._message_log: list[AgentMessage] = []
# Auto-create LLM - LiteLLM auto-detects provider and API key from model name
if self._llm is None:
from framework.config import get_api_base, get_api_key, get_llm_extra_kwargs
from framework.llm.litellm import LiteLLMProvider
self._llm = LiteLLMProvider(
model=self._model,
api_key=get_api_key(),
api_base=get_api_base(),
**get_llm_extra_kwargs(),
)
def register(
self,
name: str,
agent_path: str | Path,
capabilities: list[str] | None = None,
priority: int = 0,
) -> None:
"""
Register an agent with the orchestrator.
Args:
name: Unique name for this agent
agent_path: Path to agent folder (containing agent.json)
capabilities: Optional list of capability keywords
priority: Higher = checked first for routing
"""
runner = AgentRunner.load(agent_path)
info = runner.info()
self._agents[name] = RegisteredAgent(
name=name,
runner=runner,
description=info.description,
capabilities=capabilities or [],
priority=priority,
)
def register_runner(
self,
name: str,
runner: AgentRunner,
capabilities: list[str] | None = None,
priority: int = 0,
) -> None:
"""
Register an existing AgentRunner.
Args:
name: Unique name for this agent
runner: AgentRunner instance
capabilities: Optional list of capability keywords
priority: Higher = checked first for routing
"""
info = runner.info()
self._agents[name] = RegisteredAgent(
name=name,
runner=runner,
description=info.description,
capabilities=capabilities or [],
priority=priority,
)
def list_agents(self) -> list[dict]:
"""List all registered agents."""
return [
{
"name": agent.name,
"description": agent.description,
"capabilities": agent.capabilities,
"priority": agent.priority,
}
for agent in sorted(
self._agents.values(),
key=lambda a: -a.priority,
)
]
async def dispatch(
self,
request: dict,
intent: str | None = None,
) -> OrchestratorResult:
"""
Route a request to the appropriate agent(s).
Args:
request: The request data
intent: Optional description of what's being asked
Returns:
OrchestratorResult with results from handling agent(s)
"""
messages: list[AgentMessage] = []
# Create initial message
initial_message = AgentMessage(
type=MessageType.REQUEST,
intent=intent or "Process request",
content=request,
)
messages.append(initial_message)
self._message_log.append(initial_message)
# Step 1: Check capabilities of all agents
capabilities = await self._check_all_capabilities(request)
# Step 2: Route to best agent(s)
routing = await self._route_request(request, intent, capabilities)
if not routing.selected_agents:
return OrchestratorResult(
success=False,
handled_by=[],
results={},
messages=messages,
error="No agent capable of handling this request",
)
# Step 3: Execute on selected agent(s)
results: dict[str, Any] = {}
handled_by: list[str] = []
if routing.should_parallelize and len(routing.selected_agents) > 1:
# Run agents in parallel
tasks = []
for agent_name in routing.selected_agents:
msg = AgentMessage(
type=MessageType.REQUEST,
from_agent="orchestrator",
to_agent=agent_name,
intent=intent or "Process request",
content=request,
parent_id=initial_message.id,
)
messages.append(msg)
self._message_log.append(msg)
tasks.append(self._send_to_agent(agent_name, msg))
responses = await asyncio.gather(*tasks, return_exceptions=True)
for agent_name, response in zip(routing.selected_agents, responses, strict=False):
if isinstance(response, Exception):
results[agent_name] = {"error": str(response)}
else:
messages.append(response)
self._message_log.append(response)
results[agent_name] = response.content
handled_by.append(agent_name)
else:
# Run agents sequentially
accumulated_context = dict(request)
for agent_name in routing.selected_agents:
msg = AgentMessage(
type=MessageType.REQUEST,
from_agent="orchestrator",
to_agent=agent_name,
intent=intent or "Process request",
content=accumulated_context,
parent_id=initial_message.id,
)
messages.append(msg)
self._message_log.append(msg)
try:
response = await self._send_to_agent(agent_name, msg)
messages.append(response)
self._message_log.append(response)
results[agent_name] = response.content
handled_by.append(agent_name)
# Pass results to next agent
if "results" in response.content:
accumulated_context.update(response.content["results"])
except Exception as e:
results[agent_name] = {"error": str(e)}
# Try fallback if available. Appending while iterating extends the
# for-loop over selected_agents, so the fallback agent runs next.
if routing.fallback_agents:
fallback = routing.fallback_agents.pop(0)
routing.selected_agents.append(fallback)
return OrchestratorResult(
success=len(handled_by) > 0,
handled_by=handled_by,
results=results,
messages=messages,
)
async def relay(
self,
from_agent: str,
to_agent: str,
content: dict,
intent: str = "",
) -> AgentMessage:
"""
Relay a message from one agent to another.
Args:
from_agent: Source agent name
to_agent: Target agent name
content: Message content
intent: Description of what's being asked
Returns:
Response message from target agent
"""
if to_agent not in self._agents:
raise ValueError(f"Unknown agent: {to_agent}")
message = AgentMessage(
type=MessageType.HANDOFF,
from_agent=from_agent,
to_agent=to_agent,
intent=intent,
content=content,
)
self._message_log.append(message)
response = await self._send_to_agent(to_agent, message)
self._message_log.append(response)
return response
async def broadcast(
self,
content: dict,
intent: str = "",
exclude: list[str] | None = None,
) -> dict[str, AgentMessage]:
"""
Send a message to all agents.
Args:
content: Message content
intent: Description of what's being asked
exclude: Agent names to exclude
Returns:
Dict of agent name -> response message
"""
exclude = exclude or []
responses: dict[str, AgentMessage] = {}
message = AgentMessage(
type=MessageType.BROADCAST,
from_agent="orchestrator",
intent=intent,
content=content,
)
self._message_log.append(message)
tasks = []
agent_names = []
for name in self._agents:
if name not in exclude:
agent_names.append(name)
tasks.append(self._send_to_agent(name, message))
results = await asyncio.gather(*tasks, return_exceptions=True)
for name, result in zip(agent_names, results, strict=False):
if isinstance(result, Exception):
responses[name] = AgentMessage(
type=MessageType.RESPONSE,
from_agent=name,
content={"error": str(result)},
parent_id=message.id,
)
else:
responses[name] = result
self._message_log.append(result)
return responses
async def _check_all_capabilities(
self,
request: dict,
) -> dict[str, CapabilityResponse]:
"""Check all agents' capabilities in parallel."""
tasks = []
agent_names = []
for name, agent in self._agents.items():
agent_names.append(name)
tasks.append(agent.runner.can_handle(request, self._llm))
results = await asyncio.gather(*tasks, return_exceptions=True)
capabilities = {}
for name, result in zip(agent_names, results, strict=False):
if isinstance(result, Exception):
capabilities[name] = CapabilityResponse(
agent_name=name,
level=CapabilityLevel.CANNOT_HANDLE,
confidence=0.0,
reasoning=f"Error: {result}",
)
else:
capabilities[name] = result
return capabilities
async def _route_request(
self,
request: dict,
intent: str | None,
capabilities: dict[str, CapabilityResponse],
) -> RoutingDecision:
"""Decide which agent(s) should handle the request."""
# Filter to capable agents
capable = [
(name, cap)
for name, cap in capabilities.items()
if cap.level in (CapabilityLevel.BEST_FIT, CapabilityLevel.CAN_HANDLE)
]
# Sort by confidence (highest first)
capable.sort(key=lambda x: -x[1].confidence)
# If only one capable agent, use it
if len(capable) == 1:
return RoutingDecision(
selected_agents=[capable[0][0]],
reasoning=capable[0][1].reasoning,
confidence=capable[0][1].confidence,
)
# If multiple capable agents and we have LLM, let it decide
if len(capable) > 1 and self._llm:
return await self._llm_route(request, intent, capable)
# If no capable agents, check uncertain ones
uncertain = [
(name, cap)
for name, cap in capabilities.items()
if cap.level == CapabilityLevel.UNCERTAIN
]
if uncertain:
uncertain.sort(key=lambda x: -x[1].confidence)
return RoutingDecision(
selected_agents=[uncertain[0][0]],
reasoning=f"Uncertain match: {uncertain[0][1].reasoning}",
confidence=uncertain[0][1].confidence,
fallback_agents=[u[0] for u in uncertain[1:3]],
)
# No agents can handle
return RoutingDecision(
selected_agents=[],
reasoning="No capable agents found",
confidence=0.0,
)
async def _llm_route(
self,
request: dict,
intent: str | None,
capable: list[tuple[str, CapabilityResponse]],
) -> RoutingDecision:
"""Use LLM to decide routing when multiple agents are capable."""
agents_info = "\n".join(
f"- {name}: {cap.reasoning} (confidence: {cap.confidence:.2f})" for name, cap in capable
)
prompt = f"""Multiple agents can handle this request. Decide the best routing.
Request:
{json.dumps(request, indent=2)}
Intent: {intent or "Not specified"}
Capable agents:
{agents_info}
Decide:
1. Which agent(s) should handle this?
2. Should they run in parallel or sequence?
3. Why this routing?
Respond with JSON only:
{{
"selected": ["agent_name", ...],
"parallel": true/false,
"reasoning": "explanation"
}}"""
try:
response = await self._llm.acomplete(
messages=[{"role": "user", "content": prompt}],
system="You are a request router. Respond with JSON only.",
max_tokens=256,
)
import re
json_match = re.search(r"\{[^{}]*\}", response.content, re.DOTALL)
if json_match:
data = json.loads(json_match.group())
selected = data.get("selected", [])
# Validate selected agents exist
selected = [s for s in selected if s in self._agents]
if selected:
return RoutingDecision(
selected_agents=selected,
reasoning=data.get("reasoning", ""),
confidence=0.8,
should_parallelize=data.get("parallel", False),
)
except Exception:
pass
# Fallback: use highest confidence
return RoutingDecision(
selected_agents=[capable[0][0]],
reasoning=capable[0][1].reasoning,
confidence=capable[0][1].confidence,
)
async def _send_to_agent(
self,
agent_name: str,
message: AgentMessage,
) -> AgentMessage:
"""Send a message to an agent and get response."""
agent = self._agents[agent_name]
return await agent.runner.receive_message(message)
def get_message_log(self) -> list[AgentMessage]:
"""Get full message log for debugging/tracing."""
return list(self._message_log)
def clear_message_log(self) -> None:
"""Clear the message log."""
self._message_log.clear()
def cleanup(self) -> None:
"""Clean up all agent resources."""
for agent in self._agents.values():
agent.runner.cleanup()
self._agents.clear()
+7 -292
@@ -7,7 +7,7 @@ from collections.abc import Callable
from dataclasses import dataclass, field
from datetime import UTC
from pathlib import Path
from typing import TYPE_CHECKING, Any
from typing import Any
from framework.config import get_hive_config, get_max_context_tokens, get_preferred_model
from framework.credentials.validation import (
@@ -30,10 +30,6 @@ from framework.runtime.execution_stream import EntryPointSpec
from framework.runtime.runtime_log_store import RuntimeLogStore
from framework.tools.flowchart_utils import generate_fallback_flowchart
if TYPE_CHECKING:
from framework.runner.protocol import AgentMessage, CapabilityResponse
logger = logging.getLogger(__name__)
CLAUDE_CREDENTIALS_FILE = Path.home() / ".claude" / ".credentials.json"
@@ -854,17 +850,6 @@ def get_antigravity_token() -> str | None:
return access_token
def _is_antigravity_proxy_available() -> bool:
"""Return True if antigravity-auth serve is running on localhost:8069."""
import socket
try:
with socket.create_connection(("localhost", 8069), timeout=0.5):
return True
except (OSError, TimeoutError):
return False
@dataclass
class AgentInfo:
"""Information about an exported agent."""
@@ -1370,18 +1355,6 @@ class AgentRunner:
# It's a function, auto-generate Tool
self._tool_registry.register_function(tool_or_func, name=name)
def register_tools_from_module(self, module_path: Path) -> int:
"""
Auto-discover and register tools from a Python module.
Args:
module_path: Path to tools.py file
Returns:
Number of tools discovered
"""
return self._tool_registry.discover_from_module(module_path)
def register_mcp_server(
self,
name: str,
@@ -1493,16 +1466,11 @@ class AgentRunner:
configure_logging(level="INFO", format="auto")
# Set up session context for tools (workspace_id, agent_id, session_id)
workspace_id = "default" # Could be derived from storage path
# Set up session context for tools (agent_id)
agent_id = self.graph.id or "unknown"
# Use "current" as a stable session_id for persistent memory
session_id = "current"
self._tool_registry.set_session_context(
workspace_id=workspace_id,
agent_id=agent_id,
session_id=session_id,
)
# Create LLM provider
@@ -1729,7 +1697,7 @@ class AgentRunner:
accounts_data = adapter.get_all_account_info()
tool_provider_map = adapter.get_tool_provider_map()
if accounts_data:
from framework.graph.prompt_composer import build_accounts_prompt
from framework.graph.prompting import build_accounts_prompt
accounts_prompt = build_accounts_prompt(accounts_data, tool_provider_map)
except Exception:
@@ -1998,15 +1966,15 @@ class AgentRunner:
if not self._agent_runtime.is_running:
await self._agent_runtime.start()
# Set up stdin-based I/O for client-facing nodes in headless mode.
# When a client_facing EventLoopNode calls ask_user(), it emits
# Set up stdin-based I/O for the queen in headless mode.
# When the queen calls ask_user(), it emits
# CLIENT_INPUT_REQUESTED on the event bus and blocks. We subscribe
# a handler that prints the prompt and reads from stdin, then injects
# the user's response back into the node to unblock it.
has_client_facing = any(n.client_facing for n in self.graph.nodes)
has_queen = any(n.is_queen_node() for n in self.graph.nodes)
sub_ids: list[str] = []
if has_client_facing and sys.stdin.isatty():
if has_queen and sys.stdin.isatty():
from framework.runtime.event_bus import EventType
runtime = self._agent_runtime
@@ -2124,18 +2092,6 @@ class AgentRunner:
correlation_id=correlation_id,
)
async def get_goal_progress(self) -> dict[str, Any]:
"""
Get goal progress across all execution streams.
Returns:
Dict with overall_progress, criteria_status, constraint_violations, etc.
"""
if self._agent_runtime is None:
self._setup()
return await self._agent_runtime.get_goal_progress()
def get_entry_points(self) -> list[EntryPointSpec]:
"""
Get all registered entry points.
@@ -2294,247 +2250,6 @@ class AgentRunner:
missing_credentials=missing_credentials,
)
async def can_handle(
self, request: dict, llm: LLMProvider | None = None
) -> "CapabilityResponse":
"""
Ask the agent if it can handle this request.
Uses LLM to evaluate the request against the agent's goal and capabilities.
Args:
request: The request to evaluate
llm: LLM provider to use (uses self._llm if not provided)
Returns:
CapabilityResponse with level, confidence, and reasoning
"""
from framework.runner.protocol import CapabilityLevel, CapabilityResponse
# Use provided LLM or set up our own
eval_llm = llm
if eval_llm is None:
if self._llm is None:
self._setup()
eval_llm = self._llm
# If still no LLM (mock mode), do keyword matching
if eval_llm is None:
return self._keyword_capability_check(request)
# Build context about this agent
info = self.info()
agent_context = f"""Agent: {info.name}
Goal: {info.goal_name}
Description: {info.goal_description}
What this agent does:
{info.description}
Nodes in the workflow:
{chr(10).join(f"- {n['name']}: {n['description']}" for n in info.nodes[:5])}
{"..." if len(info.nodes) > 5 else ""}
"""
# Ask LLM to evaluate
prompt = f"""You are evaluating whether an agent can handle a request.
{agent_context}
Request to evaluate:
{json.dumps(request, indent=2)}
Evaluate how well this agent can handle this request. Consider:
1. Does the request match what this agent is designed to do?
2. Does the agent have the required capabilities?
3. How confident are you in this assessment?
Respond with JSON only:
{{
"level": "best_fit" | "can_handle" | "uncertain" | "cannot_handle",
"confidence": 0.0 to 1.0,
"reasoning": "Brief explanation",
"estimated_steps": number or null
}}"""
try:
response = await eval_llm.acomplete(
messages=[{"role": "user", "content": prompt}],
system="You are a capability evaluator. Respond with JSON only.",
max_tokens=256,
)
# Parse response
import re
json_match = re.search(r"\{[^{}]*\}", response.content, re.DOTALL)
if json_match:
data = json.loads(json_match.group())
level_map = {
"best_fit": CapabilityLevel.BEST_FIT,
"can_handle": CapabilityLevel.CAN_HANDLE,
"uncertain": CapabilityLevel.UNCERTAIN,
"cannot_handle": CapabilityLevel.CANNOT_HANDLE,
}
return CapabilityResponse(
agent_name=info.name,
level=level_map.get(data.get("level", "uncertain"), CapabilityLevel.UNCERTAIN),
confidence=float(data.get("confidence", 0.5)),
reasoning=data.get("reasoning", ""),
estimated_steps=data.get("estimated_steps"),
)
except Exception:
# Fall back to keyword matching on error
pass
return self._keyword_capability_check(request)
def _keyword_capability_check(self, request: dict) -> "CapabilityResponse":
"""Simple keyword-based capability check (fallback when no LLM)."""
from framework.runner.protocol import CapabilityLevel, CapabilityResponse
info = self.info()
request_str = json.dumps(request).lower()
description_lower = info.description.lower()
goal_lower = info.goal_description.lower()
# Check for keyword matches
matches = 0
keywords = request_str.split()
for keyword in keywords:
if len(keyword) > 3: # Skip short words
if keyword in description_lower or keyword in goal_lower:
matches += 1
# Determine level based on matches
match_ratio = matches / max(len(keywords), 1)
if match_ratio > 0.3:
level = CapabilityLevel.CAN_HANDLE
confidence = min(0.7, match_ratio + 0.3)
elif match_ratio > 0.1:
level = CapabilityLevel.UNCERTAIN
confidence = 0.4
else:
level = CapabilityLevel.CANNOT_HANDLE
confidence = 0.6
return CapabilityResponse(
agent_name=info.name,
level=level,
confidence=confidence,
reasoning=f"Keyword match ratio: {match_ratio:.2f}",
estimated_steps=info.node_count if level != CapabilityLevel.CANNOT_HANDLE else None,
)
async def receive_message(self, message: "AgentMessage") -> "AgentMessage":
"""
Handle a message from the orchestrator or another agent.
Args:
message: The incoming message
Returns:
Response message
"""
from framework.runner.protocol import MessageType
info = self.info()
# Handle capability check
if message.type == MessageType.CAPABILITY_CHECK:
capability = await self.can_handle(message.content)
return message.reply(
from_agent=info.name,
content={
"level": capability.level.value,
"confidence": capability.confidence,
"reasoning": capability.reasoning,
"estimated_steps": capability.estimated_steps,
},
type=MessageType.CAPABILITY_RESPONSE,
)
# Handle request - run the agent
if message.type == MessageType.REQUEST:
result = await self.run(message.content)
return message.reply(
from_agent=info.name,
content={
"success": result.success,
"output": result.output,
"path": result.path,
"error": result.error,
},
type=MessageType.RESPONSE,
)
# Handle handoff - another agent is passing work
if message.type == MessageType.HANDOFF:
# Extract context from handoff and run
context = message.content.get("context", {})
context["_handoff_from"] = message.from_agent
context["_handoff_reason"] = message.content.get("reason", "")
result = await self.run(context)
return message.reply(
from_agent=info.name,
content={
"success": result.success,
"output": result.output,
"handoff_handled": True,
},
type=MessageType.RESPONSE,
)
# Unknown message type
return message.reply(
from_agent=info.name,
content={"error": f"Unknown message type: {message.type}"},
type=MessageType.RESPONSE,
)
@classmethod
async def setup_as_secondary(
cls,
agent_path: str | Path,
runtime: AgentRuntime,
graph_id: str | None = None,
) -> str:
"""Load an agent and register it as a secondary graph on *runtime*.
Uses :meth:`AgentRunner.load` to parse the agent, then calls
:meth:`AgentRuntime.add_graph` with the extracted graph, goal,
and entry points.
Args:
agent_path: Path to the agent directory
runtime: The running AgentRuntime to attach to
graph_id: Optional graph identifier (defaults to directory name)
Returns:
The graph_id used for registration
"""
agent_path = Path(agent_path)
runner = cls.load(agent_path)
gid = graph_id or agent_path.name
# Build entry points
entry_points: dict[str, EntryPointSpec] = {}
if runner.graph.entry_node:
entry_points["default"] = EntryPointSpec(
id="default",
name="Default",
entry_node=runner.graph.entry_node,
trigger_type="manual",
isolation_level="shared",
)
await runtime.add_graph(
graph_id=gid,
graph=runner.graph,
goal=runner.goal,
entry_points=entry_points,
)
return gid
def cleanup(self) -> None:
"""Clean up resources (synchronous)."""
# Clean up MCP client connections
+1 -1
@@ -48,7 +48,7 @@ class ToolRegistry:
# Framework-internal context keys injected into tool calls.
# Stripped from LLM-facing schemas (the LLM doesn't know these values)
# and auto-injected at call time for tools that accept them.
CONTEXT_PARAMS = frozenset({"workspace_id", "agent_id", "session_id", "data_dir"})
CONTEXT_PARAMS = frozenset({"agent_id", "data_dir"})
# Credential directory used for change detection
_CREDENTIAL_DIR = Path("~/.hive/credentials/credentials").expanduser()
+5 -59
@@ -22,7 +22,7 @@ Every event shares a common envelope:
The identity tuple `(graph_id, stream_id, node_id, execution_id)` uniquely locates any event:
- **`graph_id`** — Which graph produced the event. Set automatically by `GraphScopedEventBus` (a subclass that stamps `graph_id` on every `publish()` call). Values: `"worker"`, `"judge"`, `"queen"`, or the graph spec ID.
- **`stream_id`** — Which entry point / pipeline. Corresponds to `EntryPointSpec.id` in the graph definition. For single-entry-point graphs, this equals the entry point name (e.g. `"default"`, `"health_check"`, `"ticket_receiver"`).
- **`stream_id`** — Which entry point / pipeline. Corresponds to `EntryPointSpec.id` in the graph definition. For single-entry-point graphs, this equals the entry point name (e.g. `"default"`, `"health_check"`).
- **`node_id`** — Which specific node emitted the event. For `EventLoopNode` events, this is the node spec ID.
- **`execution_id`** — UUID identifying a specific execution run. Multiple concurrent executions of the same entry point each get a unique `execution_id`.
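For example, a subscriber can scope its handling by the identity tuple; a minimal sketch (the envelope fields are as above, the callback wiring is assumed):

```python
def on_event(event):
    # Uniquely locates the event's origin within the runtime.
    identity = (event.graph_id, event.stream_id, event.node_id, event.execution_id)
    if event.graph_id == "worker" and event.stream_id == "default":
        print(f"{identity}: worker event on the default stream")
```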
@@ -198,7 +198,7 @@ A tool call has finished executing.
## Client I/O
These events are emitted only by nodes with `client_facing=True`. They drive the TUI's chat interface.
These events are emitted by the queen's interactive turns. They drive the TUI's chat interface.
### `client_output_delta`
@@ -209,7 +209,7 @@ Incremental text output meant for the human operator.
| `content` | `str` | New text chunk (delta) |
| `snapshot` | `str` | Full accumulated text so far |
**Emitted by:** `EventLoopNode._publish_text_delta()` when `client_facing=True`
**Emitted by:** `EventLoopNode._publish_text_delta()` for queen/user-facing output
---
@@ -356,11 +356,11 @@ Not currently emitted — reserved for future use when `NodeConversation` compac
### `state_changed`
A shared memory key has been modified.
A shared buffer key has been modified.
| Data Field | Type | Description |
| ----------- | ----- | ---------------------------------- |
| `key` | `str` | Memory key that changed |
| `key` | `str` | Buffer key that changed |
| `old_value` | `Any` | Previous value |
| `new_value` | `Any` | New value |
| `scope` | `str` | Scope of the change |
@@ -452,60 +452,6 @@ An agent has requested handoff to the Hive Coder (via the `escalate` synthetic t
---
## Worker Health Monitoring
These events form the **worker → queen → operator** escalation pipeline.
### `worker_escalation_ticket`
A worker degradation pattern has been detected and is being escalated to the Queen.
| Data Field | Type | Description |
| ---------- | ------ | ------------------------------------ |
| `ticket` | `dict` | Full `EscalationTicket` (see below) |
**Emitted by:** `emit_escalation_ticket` tool (in `worker_monitoring_tools.py`)
#### EscalationTicket Schema
| Field | Type | Description |
| ------------------------- | ------------------ | -------------------------------------------------------- |
| `ticket_id` | `str` | Auto-generated UUID |
| `created_at` | `str` | ISO timestamp |
| `worker_agent_id` | `str` | Which worker agent |
| `worker_session_id` | `str` | Which session |
| `worker_node_id` | `str` | Which node is struggling |
| `worker_graph_id` | `str` | Which graph |
| `severity` | `str` | `"low"`, `"medium"`, `"high"`, or `"critical"` |
| `cause` | `str` | Human-readable problem description |
| `judge_reasoning` | `str` | Judge's deliberation chain |
| `suggested_action` | `str` | e.g. `"Restart node"`, `"Human review"`, `"Kill session"`|
| `recent_verdicts` | `list[str]` | e.g. `["RETRY", "RETRY", "CONTINUE", "RETRY"]` |
| `total_steps_checked` | `int` | Steps the judge inspected |
| `steps_since_last_accept` | `int` | Consecutive non-ACCEPT steps |
| `stall_minutes` | `float \| null` | Minutes since last activity (null if active) |
| `evidence_snippet` | `str` | Excerpt from recent LLM output |
---
### `queen_intervention_requested`
The Queen has triaged an escalation ticket and decided the human operator should be involved.
| Data Field | Type | Description |
| ----------------- | ----- | ---------------------------------------------------- |
| `ticket_id` | `str` | From the original `EscalationTicket` |
| `analysis` | `str` | Queen's 2-3 sentence analysis |
| `severity` | `str` | `"low"`, `"medium"`, `"high"`, or `"critical"` |
| `queen_graph_id` | `str` | Queen's graph ID (for TUI navigation) |
| `queen_stream_id` | `str` | Queen's stream ID |
**Emitted by:** `notify_operator` tool (in `worker_monitoring_tools.py`)
The TUI subscribes to this event and shows a non-disruptive notification. The worker continues running.
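A TUI-side handler for this event might look like the following sketch (`show_notification` and the subscription wiring are hypothetical):

```python
def on_queen_intervention(event):
    data = event.data
    # Non-disruptive: notify the operator, never interrupt the worker.
    show_notification(  # hypothetical TUI helper
        title=f"Queen requests operator review ({data['severity']})",
        body=data["analysis"],
        jump_to=(data["queen_graph_id"], data["queen_stream_id"]),
    )
```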
---
## Custom Events
### `custom`
+2 -3
@@ -33,7 +33,7 @@ Single-entry agents get a `"default"` entry point automatically. There is no sep
| `ExecutionStream` | `runtime/execution_stream.py` | Per-entry-point execution queue, session persistence |
| `GraphExecutor` | `graph/executor.py` | Node traversal, tool dispatch, checkpointing |
| `EventBus` | `runtime/event_bus.py` | Pub/sub for execution events (streaming, I/O) |
| `SharedStateManager` | `runtime/shared_state.py` | Cross-stream state with isolation levels |
| `SharedBufferManager` | `runtime/shared_state.py` | Cross-stream state with isolation levels |
| `OutcomeAggregator` | `runtime/outcome_aggregator.py` | Goal progress tracking across streams |
| `SessionStore` | `storage/session_store.py` | Session state persistence (`sessions/{id}/state.json`) |
@@ -55,7 +55,6 @@ result = await runner.run({"query": "continue"}, session_state=saved_state)
await runner.start() # Start the runtime
await runner.stop() # Stop the runtime
exec_id = await runner.trigger("default", {}) # Non-blocking trigger
progress = await runner.get_goal_progress() # Goal evaluation
entry_points = runner.get_entry_points() # List entry points
# Context manager
@@ -109,7 +108,7 @@ runtime.unsubscribe_from_events(sub_id)
# Inspection
runtime.is_running # bool
runtime.event_bus # EventBus
runtime.state_manager # SharedStateManager
runtime.state_manager # SharedBufferManager
runtime.get_stats() # Runtime statistics
```
@@ -1,840 +0,0 @@
# Resumable Sessions Design
## Problem Statement
Currently, when an agent encounters a failure during execution (e.g., credential validation, API errors, tool failures), the entire session is lost. This creates a poor user experience, especially when:
1. The agent has completed significant work before the failure
2. The failure is recoverable (e.g., adding missing credentials)
3. The user wants to retry from the exact failure point without redoing work
## Design Goals
1. **Crash Recovery**: Sessions can resume after process crashes or errors
2. **Partial Completion**: Preserve work done by nodes that completed successfully
3. **Flexible Resume Points**: Resume from exact failure point or previous checkpoints
4. **State Consistency**: Guarantee consistent SharedMemory and conversation state
5. **Minimal Overhead**: Checkpointing shouldn't significantly impact performance
6. **User Control**: Users can inspect, modify, and resume sessions explicitly
## Architecture
### 1. Checkpoint System
#### Checkpoint Types
**Automatic Checkpoints** (saved automatically by framework):
- `node_start`: Before each node begins execution
- `node_complete`: After each node successfully completes
- `edge_transition`: Before traversing to next node
- `loop_iteration`: At each iteration in EventLoopNode (optional)
**Manual Checkpoints** (triggered by agent designer):
- `safe_point`: Explicitly marked safe points in graph
- `user_checkpoint`: Before awaiting user input in client-facing nodes
#### Checkpoint Data Structure
```python
@dataclass
class Checkpoint:
"""Single checkpoint in execution timeline."""
# Identity
checkpoint_id: str # Format: checkpoint_{timestamp}_{uuid_short}
session_id: str
checkpoint_type: str # "node_start", "node_complete", etc.
# Timestamps
created_at: str # ISO 8601
# Execution state
current_node: str | None
next_node: str | None # For edge_transition checkpoints
execution_path: list[str] # Nodes executed so far
# Memory state (snapshot)
shared_memory: dict[str, Any] # Full SharedMemory._data
# Per-node conversation state references
# (actual conversations stored separately, reference by node_id)
conversation_states: dict[str, str] # {node_id: conversation_checkpoint_id}
# Output accumulator state
accumulated_outputs: dict[str, Any]
# Execution metrics (for resuming quality tracking)
metrics_snapshot: dict[str, Any]
# Metadata
is_clean: bool # True if no failures/retries before this checkpoint
can_resume_from: bool # False if checkpoint is in unstable state
description: str # Human-readable checkpoint description
```
#### Storage Structure
```
~/.hive/agents/{agent_name}/
└── sessions/
└── session_YYYYMMDD_HHMMSS_{uuid}/
├── state.json # Session state (existing)
├── checkpoints/
│ ├── index.json # Checkpoint index/manifest
│ ├── checkpoint_1.json # Individual checkpoints
│ ├── checkpoint_2.json
│ └── checkpoint_N.json
├── conversations/ # Flat conversation state (parts carry phase_id)
│ ├── meta.json # Current node config
│ ├── cursor.json # Iteration, outputs, stall state
│ └── parts/ # Sequential message files
├── data/ # Spillover artifacts (existing)
└── logs/ # L1/L2/L3 logs (existing)
```
**Checkpoint Index Format** (`checkpoints/index.json`):
```json
{
"session_id": "session_20260208_143022_abc12345",
"checkpoints": [
{
"checkpoint_id": "checkpoint_20260208_143030_xyz123",
"type": "node_complete",
"created_at": "2026-02-08T14:30:30.123Z",
"current_node": "collector",
"is_clean": true,
"can_resume_from": true,
"description": "Completed collector node successfully"
},
{
"checkpoint_id": "checkpoint_20260208_143045_abc789",
"type": "node_start",
"created_at": "2026-02-08T14:30:45.456Z",
"current_node": "analyzer",
"is_clean": true,
"can_resume_from": true,
"description": "Starting analyzer node"
}
],
"latest_checkpoint_id": "checkpoint_20260208_143045_abc789",
"total_checkpoints": 2
}
```
### 2. Resume Mechanism
#### Resume Flow
```python
# High-level resume flow
async def resume_session(
session_id: str,
checkpoint_id: str | None = None, # None = resume from latest
modifications: dict[str, Any] | None = None, # Override memory values
) -> ExecutionResult:
"""
Resume a session from a checkpoint.
Args:
session_id: Session to resume
checkpoint_id: Specific checkpoint (None = latest)
modifications: Optional memory/state modifications before resume
Returns:
ExecutionResult with resumed execution
"""
# 1. Load session state
session_state = await session_store.read_state(session_id)
# 2. Verify session is resumable
if not session_state.is_resumable:
raise ValueError(f"Session {session_id} is not resumable")
# 3. Load checkpoint
checkpoint = await checkpoint_store.load_checkpoint(
session_id,
checkpoint_id or session_state.progress.resume_from
)
# 4. Restore state
# - Restore SharedMemory from checkpoint.shared_memory
# - Restore per-node conversations from checkpoint.conversation_states
# - Restore output accumulator from checkpoint.accumulated_outputs
# - Apply modifications if provided
# 5. Resume execution from checkpoint.next_node or checkpoint.current_node
result = await executor.execute(
graph=graph,
goal=goal,
memory=restored_memory,
entry_point=checkpoint.next_node or checkpoint.current_node,
session_state=restored_session_state,
)
# 6. Update session state with resumed execution
await session_store.write_state(session_id, updated_state)
return result
```
#### Checkpoint Restoration
```python
@dataclass
class CheckpointStore:
"""Manages checkpoint storage and retrieval."""
async def save_checkpoint(
self,
session_id: str,
checkpoint: Checkpoint,
) -> None:
"""Save a checkpoint atomically."""
# 1. Write checkpoint file: checkpoints/checkpoint_{id}.json
# 2. Update index: checkpoints/index.json
# 3. Use atomic write for crash safety
async def load_checkpoint(
self,
session_id: str,
checkpoint_id: str | None = None,
) -> Checkpoint | None:
"""Load a checkpoint by ID or latest."""
# 1. Read checkpoint index
# 2. Find checkpoint by ID (or latest if None)
# 3. Load and deserialize checkpoint file
async def list_checkpoints(
self,
session_id: str,
checkpoint_type: str | None = None,
is_clean: bool | None = None,
) -> list[Checkpoint]:
"""List all checkpoints for a session with optional filters."""
async def delete_checkpoint(
self,
session_id: str,
checkpoint_id: str,
) -> bool:
"""Delete a specific checkpoint."""
async def prune_checkpoints(
self,
session_id: str,
keep_count: int = 10,
keep_clean_only: bool = False,
) -> int:
"""Prune old checkpoints, keeping most recent N."""
```
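The atomic write in `save_checkpoint` can be implemented as write-to-temp-then-rename; a minimal sketch (helper name is illustrative):

```python
import json
import os
from pathlib import Path

def atomic_write_json(path: Path, data: dict) -> None:
    """Write JSON atomically: os.replace is atomic on POSIX filesystems."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(data, indent=2), encoding="utf-8")
    os.replace(tmp, path)  # atomically replaces any existing file
```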
### 3. GraphExecutor Integration
#### Modified Execution Loop
```python
# In GraphExecutor.execute()
async def execute(
self,
graph: GraphSpec,
goal: Goal,
memory: SharedMemory | None = None,
entry_point: str = "start",
session_state: dict[str, Any] | None = None,
checkpoint_config: CheckpointConfig | None = None,
) -> ExecutionResult:
"""
Execute graph with checkpointing support.
New parameters:
checkpoint_config: Configuration for checkpointing behavior
"""
# Initialize checkpoint store
checkpoint_store = CheckpointStore(storage_path / "checkpoints")
# Restore from checkpoint if session_state indicates resume
if session_state and session_state.get("resume_from"):
checkpoint = await checkpoint_store.load_checkpoint(
session_id,
session_state["resume_from"]
)
memory = self._restore_memory_from_checkpoint(checkpoint)
entry_point = checkpoint.next_node or checkpoint.current_node
current_node = entry_point
while current_node:
# CHECKPOINT: node_start
if checkpoint_config and checkpoint_config.checkpoint_on_node_start:
await self._save_checkpoint(
checkpoint_store,
checkpoint_type="node_start",
current_node=current_node,
memory=memory,
# ... other state
)
try:
# Execute node
result = await self._execute_node(current_node, memory, context)
# CHECKPOINT: node_complete
if checkpoint_config and checkpoint_config.checkpoint_on_node_complete:
await self._save_checkpoint(
checkpoint_store,
checkpoint_type="node_complete",
current_node=current_node,
memory=memory,
# ... other state
)
except Exception as e:
# On failure, mark current checkpoint as resume point
await self._mark_failure_checkpoint(
checkpoint_store,
current_node=current_node,
error=str(e),
)
raise
# Find next edge
next_node = self._find_next_node(current_node, result, memory)
# CHECKPOINT: edge_transition
if next_node and checkpoint_config and checkpoint_config.checkpoint_on_edge:
await self._save_checkpoint(
checkpoint_store,
checkpoint_type="edge_transition",
current_node=current_node,
next_node=next_node,
memory=memory,
# ... other state
)
current_node = next_node
```
### 4. EventLoopNode Integration
#### Conversation State Checkpointing
EventLoopNode already has conversation persistence via `ConversationStore`. For resumability:
```python
class EventLoopNode:
async def execute(self, ctx: NodeContext) -> NodeResult:
"""Execute with checkpoint support."""
# Try to restore from checkpoint
if ctx.checkpoint_id:
conversation = await self._restore_conversation(ctx.checkpoint_id)
output_accumulator = await OutputAccumulator.restore(self.store)
else:
# Fresh start
conversation = await self._initialize_conversation(ctx)
output_accumulator = OutputAccumulator(store=self.store)
# Event loop with periodic checkpointing
iteration = 0
while iteration < self.config.max_iterations:
# Optional: checkpoint every N iterations
if self.config.checkpoint_every_n_iterations:
if iteration % self.config.checkpoint_every_n_iterations == 0:
await self._save_loop_checkpoint(
conversation,
output_accumulator,
iteration,
)
# ... rest of event loop
iteration += 1
```
**Note**: EventLoopNode conversation state is already persisted to disk after each turn via `ConversationStore`, so it's naturally resumable. We just need to:
1. Track which conversation checkpoint to restore from
2. Ensure output accumulator state is also restored
### 5. User-Facing API
#### MCP Tools for Resume
```python
# In tools/src/aden_tools/tools/session_management/
@tool
async def list_resumable_sessions(
agent_work_dir: str,
status: str = "failed", # "failed", "paused", "cancelled"
limit: int = 20,
) -> dict:
"""
List sessions that can be resumed.
Returns:
{
"sessions": [
{
"session_id": "session_20260208_143022_abc12345",
"status": "failed",
"error": "Missing API key: OPENAI_API_KEY",
"failed_at_node": "analyzer",
"last_checkpoint": "checkpoint_20260208_143045_abc789",
"created_at": "2026-02-08T14:30:22Z",
"updated_at": "2026-02-08T14:30:45Z"
}
],
"total": 1
}
"""
@tool
async def list_session_checkpoints(
agent_work_dir: str,
session_id: str,
checkpoint_type: str = "", # Filter by type
clean_only: bool = False, # Only show clean checkpoints
) -> dict:
"""
List all checkpoints for a session.
Returns:
{
"session_id": "session_20260208_143022_abc12345",
"checkpoints": [
{
"checkpoint_id": "checkpoint_20260208_143030_xyz123",
"type": "node_complete",
"created_at": "2026-02-08T14:30:30Z",
"current_node": "collector",
"is_clean": true,
"can_resume_from": true,
"description": "Completed collector node successfully"
},
...
]
}
"""
@tool
async def inspect_checkpoint(
agent_work_dir: str,
session_id: str,
checkpoint_id: str,
include_memory: bool = False, # Include full memory state
) -> dict:
"""
Inspect a checkpoint's detailed state.
Returns:
{
"checkpoint_id": "checkpoint_20260208_143030_xyz123",
"type": "node_complete",
"current_node": "collector",
"execution_path": ["start", "collector"],
"accumulated_outputs": {
"twitter_handles": ["@user1", "@user2"]
},
"memory": {...}, # If include_memory=True
"metrics_snapshot": {
"total_retries": 2,
"nodes_with_failures": []
}
}
"""
@tool
async def resume_session(
agent_work_dir: str,
session_id: str,
checkpoint_id: str = "", # Empty = latest checkpoint
memory_modifications: str = "{}", # JSON string of memory overrides
) -> dict:
"""
Resume a session from a checkpoint.
Args:
agent_work_dir: Path to agent workspace
session_id: Session to resume
checkpoint_id: Specific checkpoint (empty = latest)
memory_modifications: JSON object with memory key overrides
Returns:
{
"session_id": "session_20260208_143022_abc12345",
"resumed_from": "checkpoint_20260208_143045_abc789",
"status": "active", # Now actively running
"message": "Session resumed successfully from checkpoint_20260208_143045_abc789"
}
"""
```
#### CLI Commands
```bash
# List resumable sessions
hive sessions list --agent deep_research_agent --status failed
# Show checkpoints for a session
hive sessions checkpoints session_20260208_143022_abc12345
# Inspect a checkpoint
hive sessions inspect session_20260208_143022_abc12345 checkpoint_20260208_143045_abc789
# Resume a session
hive sessions resume session_20260208_143022_abc12345
# Resume from specific checkpoint
hive sessions resume session_20260208_143022_abc12345 --checkpoint checkpoint_20260208_143030_xyz123
# Resume with memory modifications (e.g., after adding credentials)
hive sessions resume session_20260208_143022_abc12345 --set api_key=sk-...
```
### 6. Configuration
#### CheckpointConfig
```python
@dataclass
class CheckpointConfig:
"""Configuration for checkpoint behavior."""
# When to checkpoint
checkpoint_on_node_start: bool = True
checkpoint_on_node_complete: bool = True
checkpoint_on_edge: bool = False # Usually redundant with node_start
checkpoint_on_loop_iteration: bool = False # Can be expensive
checkpoint_every_n_iterations: int = 0 # 0 = disabled
# Pruning
max_checkpoints_per_session: int = 100
prune_after_node_count: int = 10 # Prune every N nodes
keep_clean_checkpoints_only: bool = False
# Performance
async_checkpoint: bool = True # Don't block execution on checkpoint writes
# What to include
include_conversation_snapshots: bool = True
include_full_memory: bool = True
```
#### Agent-Level Configuration
```python
# In agent.py or config.py
class MyAgent(Agent):
def get_checkpoint_config(self) -> CheckpointConfig:
"""Override to customize checkpoint behavior."""
return CheckpointConfig(
checkpoint_on_node_start=True,
checkpoint_on_node_complete=True,
checkpoint_every_n_iterations=5, # Checkpoint every 5 iterations in loops
max_checkpoints_per_session=50,
)
```
## Implementation Plan
### Phase 1: Core Checkpoint Infrastructure (Week 1)
1. **Create checkpoint schemas**
- `Checkpoint` dataclass
- `CheckpointIndex` for manifest
- Serialization/deserialization
2. **Implement CheckpointStore**
- `save_checkpoint()` with atomic writes
- `load_checkpoint()` with deserialization
- `list_checkpoints()` with filtering
- `prune_checkpoints()` for cleanup
3. **Update SessionState schema**
- Add `resume_from_checkpoint_id` field
- Add `checkpoints_enabled` flag
### Phase 2: GraphExecutor Integration (Week 2)
1. **Modify GraphExecutor**
- Add `CheckpointConfig` parameter
- Implement checkpoint saving at node boundaries
- Implement checkpoint restoration logic
- Handle memory state snapshots
2. **Update execution loop**
- Checkpoint before node execution
- Checkpoint after successful completion
- Mark failure checkpoints on errors
### Phase 3: EventLoopNode Integration (Week 3)
1. **Enhance conversation restoration**
- Link checkpoints to conversation states
- Ensure OutputAccumulator is checkpointed
- Test loop resumption from middle of execution
2. **Add optional loop iteration checkpoints**
- Configurable iteration frequency
- Balance between granularity and performance
### Phase 4: User-Facing Features (Week 4)
1. **Implement MCP tools**
- `list_resumable_sessions`
- `list_session_checkpoints`
- `inspect_checkpoint`
- `resume_session`
2. **Add CLI commands**
- `hive sessions list`
- `hive sessions checkpoints`
- `hive sessions inspect`
- `hive sessions resume`
3. **Update TUI**
- Show resumable sessions in UI
- Allow resume from TUI interface
### Phase 5: Testing & Documentation (Week 5)
1. **Write comprehensive tests**
- Unit tests for CheckpointStore
- Integration tests for resume flow
- Edge case testing (concurrent checkpoints, corruption, etc.)
2. **Performance testing**
- Measure checkpoint overhead
- Optimize async checkpoint writing
- Test with large memory states
3. **Documentation**
- Update skills with resume patterns
- Document checkpoint configuration
- Add troubleshooting guide
## Performance Considerations
### Checkpoint Overhead
**Estimated overhead per checkpoint**:
- Memory serialization: ~5-10ms for typical state (< 1MB)
- File I/O: ~10-20ms for atomic write
- Total: ~15-30ms per checkpoint
**Mitigation strategies**:
1. **Async checkpointing**: Don't block execution on writes (sketched after this list)
2. **Selective checkpointing**: Only checkpoint at important boundaries
3. **Incremental checkpoints**: Store deltas instead of full state (future)
4. **Compression**: Compress large memory states before writing
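A minimal sketch of strategy 1, using the `CheckpointStore` API from above (scheduling details may differ):

```python
import asyncio
import logging

logger = logging.getLogger(__name__)

def schedule_checkpoint(checkpoint_store, session_id: str, checkpoint) -> asyncio.Task:
    """Fire-and-forget checkpoint write; execution continues immediately."""
    async def _write() -> None:
        try:
            await checkpoint_store.save_checkpoint(session_id, checkpoint)
        except Exception as e:
            # Checkpoint failures must never fail the run (see Error Handling).
            logger.warning("Async checkpoint write failed: %s", e)

    return asyncio.create_task(_write())
```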
### Storage Size
**Typical checkpoint size**:
- Small memory state (< 100KB): ~50-100KB per checkpoint
- Medium memory state (< 1MB): ~500KB-1MB per checkpoint
- Large memory state (> 1MB): ~1-5MB per checkpoint
**Mitigation strategies**:
1. **Pruning**: Keep only N most recent checkpoints
2. **Clean-only retention**: Only keep checkpoints from clean execution
3. **Compression**: Use gzip for checkpoint files (sketched after this list)
4. **Archiving**: Move old checkpoints to archive storage
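A gzip sketch of strategy 3, assuming JSON checkpoint files and the 100KB threshold from the Checkpoint Compression example under Future Enhancements:

```python
import gzip
import json
from pathlib import Path

COMPRESSION_THRESHOLD_BYTES = 100_000

def write_checkpoint_payload(path: Path, checkpoint_dict: dict) -> Path:
    """Write a checkpoint as JSON, gzip-compressing large payloads."""
    raw = json.dumps(checkpoint_dict).encode("utf-8")
    if len(raw) > COMPRESSION_THRESHOLD_BYTES:
        target = path.with_name(path.name + ".gz")
        target.write_bytes(gzip.compress(raw))
        return target
    path.write_bytes(raw)
    return path
```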
## Error Handling
### Checkpoint Save Failures
**Scenarios**:
- Disk full
- Permission errors
- Serialization failures
- Concurrent writes
**Handling**:
```python
try:
await checkpoint_store.save_checkpoint(session_id, checkpoint)
except CheckpointSaveError as e:
# Log warning but don't fail execution
logger.warning(f"Failed to save checkpoint: {e}")
# Continue execution without checkpoint
```
### Checkpoint Load Failures
**Scenarios**:
- Checkpoint file corrupted
- Checkpoint format incompatible
- Referenced conversation state missing
**Handling**:
```python
try:
checkpoint = await checkpoint_store.load_checkpoint(session_id, checkpoint_id)
except CheckpointLoadError as e:
# Try to find previous valid checkpoint
checkpoints = await checkpoint_store.list_checkpoints(session_id)
for cp in reversed(checkpoints):
try:
checkpoint = await checkpoint_store.load_checkpoint(session_id, cp.checkpoint_id)
logger.info(f"Fell back to checkpoint {cp.checkpoint_id}")
break
except CheckpointLoadError:
continue
else:
raise ValueError(f"No valid checkpoints found for session {session_id}")
```
### Resume Failures
**Scenarios**:
- Checkpoint state inconsistent with current graph
- Node no longer exists in updated agent code
- Memory keys missing required values
**Handling**:
1. **Validation**: Verify checkpoint compatibility before resume (see the sketch below)
2. **Graceful degradation**: Resume from earlier checkpoint if possible
3. **User notification**: Clear error messages about why resume failed
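A compatibility check for point 1 might look like this sketch (assumes graph nodes expose an `id`, and uses the `Checkpoint` fields defined earlier):

```python
def validate_checkpoint_compatibility(checkpoint, graph) -> list[str]:
    """Return human-readable problems; an empty list means safe to resume."""
    problems: list[str] = []
    node_ids = {node.id for node in graph.nodes}
    if checkpoint.current_node and checkpoint.current_node not in node_ids:
        problems.append(f"Node '{checkpoint.current_node}' no longer exists in the graph")
    if checkpoint.next_node and checkpoint.next_node not in node_ids:
        problems.append(f"Next node '{checkpoint.next_node}' no longer exists in the graph")
    if not checkpoint.can_resume_from:
        problems.append("Checkpoint was saved in an unstable state")
    return problems
```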
## Migration Path
### Backward Compatibility
**Existing sessions** (without checkpoints):
- Can still be executed normally
- Checkpoint system is opt-in per agent
- No breaking changes to existing APIs
**Enabling checkpoints**:
```python
# Option 1: Agent-level default
class MyAgent(Agent):
checkpoint_config = CheckpointConfig(
checkpoint_on_node_complete=True,
)
# Option 2: Runtime override
runtime = create_agent_runtime(
agent=my_agent,
checkpoint_config=CheckpointConfig(...),
)
# Option 3: Per-execution
result = await executor.execute(
graph=graph,
goal=goal,
checkpoint_config=CheckpointConfig(...),
)
```
### Gradual Rollout
1. **Phase 1**: Core infrastructure, no user-facing features
2. **Phase 2**: Opt-in for specific agents via config
3. **Phase 3**: User-facing MCP tools and CLI
4. **Phase 4**: Enable by default for all new agents
5. **Phase 5**: TUI integration
## Future Enhancements
### 1. Incremental Checkpoints
Instead of full state snapshots, store only deltas:
```python
@dataclass
class IncrementalCheckpoint:
"""Checkpoint with only changed state."""
base_checkpoint_id: str # Parent checkpoint
memory_delta: dict[str, Any] # Only changed keys
added_outputs: dict[str, Any] # Only new outputs
```
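Resuming from an incremental checkpoint would mean replaying the delta chain onto its base snapshot. A sketch, assuming `load_checkpoint` can return either checkpoint form and that the root snapshot exposes its full state as `data_buffer` (field name assumed):

```python
async def materialize_state(store, session_id: str, checkpoint_id: str) -> dict:
    """Walk base_checkpoint_id links to the root snapshot, then apply deltas oldest-first."""
    deltas = []
    cp = await store.load_checkpoint(session_id, checkpoint_id)
    while isinstance(cp, IncrementalCheckpoint):
        deltas.append(cp)
        cp = await store.load_checkpoint(session_id, cp.base_checkpoint_id)
    state = dict(cp.data_buffer)  # full snapshot at the root (assumed field name)
    for delta in reversed(deltas):  # oldest delta first, newest last wins
        state.update(delta.memory_delta)
    return state
```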
### 2. Distributed Checkpointing
For long-running agents, checkpoint to cloud storage:
```python
checkpoint_config = CheckpointConfig(
storage_backend="s3", # or "gcs", "azure"
storage_url="s3://my-bucket/checkpoints/",
)
```
### 3. Checkpoint Compression
Compress large memory states:
```python
checkpoint_config = CheckpointConfig(
compress=True,
compression_threshold_bytes=100_000, # Compress if > 100KB
)
```
### 4. Smart Checkpoint Selection
Use heuristics to decide when to checkpoint:
```python
class SmartCheckpointStrategy:
def should_checkpoint(self, context: ExecutionContext) -> bool:
# Checkpoint after expensive nodes
if context.node_latency_ms > 30_000:
return True
# Checkpoint before risky operations
if context.node_id in ["api_call", "external_tool"]:
return True
# Checkpoint after significant memory changes
if context.memory_delta_size > 10:
return True
return False
```
## Security Considerations
### 1. Sensitive Data in Checkpoints
**Problem**: Checkpoints may contain sensitive data (API keys, credentials, PII)
**Mitigation**:
```python
@dataclass
class CheckpointConfig:
# Exclude sensitive keys from checkpoint
exclude_memory_keys: list[str] = field(default_factory=lambda: [
"api_key",
"credentials",
"access_token",
])
# Encrypt checkpoint files
encrypt_checkpoints: bool = True
encryption_key_source: str = "keychain" # or "env_var", "file"
```
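Applying `exclude_memory_keys` could be a simple filter where the state snapshot is serialized; the exact hook point in the save path is an assumption:

```python
def redact_buffer(data_buffer: dict, exclude_keys: list[str]) -> dict:
    """Drop sensitive keys before the snapshot is written to disk."""
    excluded = set(exclude_keys)
    return {k: v for k, v in data_buffer.items() if k not in excluded}
```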
### 2. Checkpoint Tampering
**Problem**: Malicious modification of checkpoint files
**Mitigation**:
```python
@dataclass
class Checkpoint:
# Add cryptographic signature
signature: str # HMAC of checkpoint content
def verify_signature(self, secret_key: str) -> bool:
"""Verify checkpoint hasn't been tampered with."""
...
```
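One plausible scheme behind `verify_signature` is HMAC-SHA256 over the canonical JSON body; this is an assumption, not a committed design:

```python
import hashlib
import hmac
import json


def sign_payload(payload: dict, secret_key: str) -> str:
    """HMAC-SHA256 over canonical (sorted, compact) JSON."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hmac.new(secret_key.encode(), body.encode(), hashlib.sha256).hexdigest()


def verify_payload(payload: dict, signature: str, secret_key: str) -> bool:
    # compare_digest avoids leaking the match position through timing
    return hmac.compare_digest(sign_payload(payload, secret_key), signature)
```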
## References
- [RUNTIME_LOGGING.md](./RUNTIME_LOGGING.md) - Current logging system
- [session_state.py](../schemas/session_state.py) - Session state schema
- [session_store.py](../storage/session_store.py) - Session storage
- [executor.py](../graph/executor.py) - Graph executor
- [event_loop_node.py](../graph/event_loop_node.py) - EventLoop implementation
-698
View File
@@ -1,698 +0,0 @@
# Runtime Logging System
## Overview
The Hive framework uses a **three-level observability system** for tracking agent execution at different granularities:
- **L1 (Summary)**: High-level run outcomes - success/failure, execution quality, attention flags
- **L2 (Details)**: Per-node completion details - retries, verdicts, latency, attention reasons
- **L3 (Tool Logs)**: Step-by-step execution - tool calls, LLM responses, judge feedback
This layered approach enables efficient debugging: start with L1 to identify problematic runs, drill into L2 to find failing nodes, and analyze L3 for root cause details.
---
## Storage Architecture
### Current Structure (Unified Sessions)
**Default since 2026-02-06**
```
~/.hive/agents/{agent_name}/
└── sessions/
└── session_YYYYMMDD_HHMMSS_{uuid}/
├── state.json # Session state and metadata
├── logs/ # Runtime logs (L1/L2/L3)
│ ├── summary.json # L1: Run outcome
│ ├── details.jsonl # L2: Per-node results
│ └── tool_logs.jsonl # L3: Step-by-step execution
├── conversations/ # Flat EventLoop state (parts carry phase_id)
└── data/ # Spillover artifacts
```
**Key characteristics:**
- All session data colocated in one directory
- Consistent ID format: `session_YYYYMMDD_HHMMSS_{short_uuid}`
- Logs written incrementally (JSONL for L2/L3)
- Single source of truth: `state.json`
### Legacy Structure (Deprecated)
**Read-only for backward compatibility**
```
~/.hive/agents/{agent_name}/
├── runtime_logs/
│ └── runs/
│ └── {run_id}/
│ ├── summary.json # L1
│ ├── details.jsonl # L2
│ └── tool_logs.jsonl # L3
├── sessions/
│ └── exec_{stream_id}_{uuid}/
│ ├── conversations/
│ └── data/
├── runs/ # Deprecated
│ └── run_start_*.json
└── summaries/ # Deprecated
└── run_start_*.json
```
**Migration status:**
- ✅ New sessions write to unified structure only
- ✅ Old sessions remain readable
- ❌ No new writes to `runs/`, `summaries/`, `runtime_logs/runs/`
- ⚠️ Deprecation warnings emitted when reading old locations
---
## Components
### RuntimeLogger
**Location:** `core/framework/runtime/runtime_logger.py`
**Responsibilities:**
- Receives execution events from GraphExecutor
- Tracks per-node execution details
- Aggregates attention flags
- Coordinates with RuntimeLogStore
**Key methods:**
```python
def start_run(goal_id: str, session_id: str = "") -> str:
"""Initialize a new run. Uses session_id as run_id if provided."""
def log_step(node_id: str, step_index: int, tool_calls: list, ...):
"""Record one LLM step (L3). Appends to tool_logs.jsonl immediately."""
def log_node_complete(node_id: str, exit_status: str, ...):
"""Record node completion (L2). Appends to details.jsonl immediately."""
async def end_run(status: str):
"""Finalize run, aggregate L2→L1, write summary.json."""
```
**Attention flag triggers:**
```python
# From runtime_logger.py:190-203
needs_attention = any([
retry_count > 3,
escalate_count > 2,
latency_ms > 60000,
tokens_used > 100000,
total_steps > 20,
])
```
### RuntimeLogStore
**Location:** `core/framework/runtime/runtime_log_store.py`
**Responsibilities:**
- Manages log file I/O
- Handles both old and new storage paths
- Provides incremental append for L2/L3 (crash-safe)
- Atomic writes for L1
**Storage path resolution:**
```python
def _get_run_dir(run_id: str) -> Path:
"""Determine log directory based on run_id format.
- session_* → {storage_root}/sessions/{run_id}/logs/
- Other → {base_path}/runtime_logs/runs/{run_id}/ (deprecated)
"""
```
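A plausible body for that resolution rule (everything beyond what the docstring states is an assumption):

```python
from pathlib import Path


def get_run_dir(storage_root: Path, base_path: Path, run_id: str) -> Path:
    """Resolve the log directory for a run_id per the rule above."""
    if run_id.startswith("session_"):
        return storage_root / "sessions" / run_id / "logs"
    return base_path / "runtime_logs" / "runs" / run_id  # deprecated location
```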
**Key methods:**
```python
def ensure_run_dir(run_id: str):
"""Create log directory immediately at start_run()."""
def append_step(run_id: str, step: NodeStepLog):
"""Append L3 entry to tool_logs.jsonl. Thread-safe sync write."""
def append_node_detail(run_id: str, detail: NodeDetail):
"""Append L2 entry to details.jsonl. Thread-safe sync write."""
async def save_summary(run_id: str, summary: RunSummaryLog):
"""Write L1 summary.json atomically at end_run()."""
```
**File format:**
- **L1 (summary.json)**: Standard JSON, written once at end
- **L2 (details.jsonl)**: JSONL (one object per line), appended per node
- **L3 (tool_logs.jsonl)**: JSONL (one object per line), appended per step
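The incremental L2/L3 appends could look like the following sketch; the lock granularity and dataclass-to-dict conversion are assumptions:

```python
import json
import threading
from dataclasses import asdict
from pathlib import Path

_append_lock = threading.Lock()


def append_jsonl(path: Path, record) -> None:
    """Append one record as a single line and flush, so a crash loses at most one line."""
    line = json.dumps(asdict(record), ensure_ascii=False)
    with _append_lock:
        with path.open("a", encoding="utf-8") as f:
            f.write(line + "\n")
            f.flush()
```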
### Runtime Log Schemas
**Location:** `core/framework/runtime/runtime_log_schemas.py`
**L1: RunSummaryLog**
```python
@dataclass
class RunSummaryLog:
run_id: str
goal_id: str
status: str # "success", "failure", "degraded", "in_progress"
started_at: str # ISO 8601
ended_at: str | None
needs_attention: bool
attention_summary: AttentionSummary
total_nodes_executed: int
nodes_with_failures: list[str]
execution_quality: str # "clean", "degraded", "failed"
total_latency_ms: int
# ... additional metrics
```
**L2: NodeDetail**
```python
@dataclass
class NodeDetail:
node_id: str
exit_status: str # "success", "escalate", "no_valid_edge"
retry_count: int
verdict_counts: dict[str, int] # {ACCEPT: 1, RETRY: 3, ...}
total_steps: int
latency_ms: int
needs_attention: bool
attention_reasons: list[str]
# ... tool error tracking, token counts
```
**L3: NodeStepLog**
```python
@dataclass
class NodeStepLog:
node_id: str
step_index: int
tool_calls: list[dict]
tool_results: list[dict]
verdict: str # "ACCEPT", "RETRY", "ESCALATE", "CONTINUE"
verdict_feedback: str
llm_response_text: str
tokens_used: int
latency_ms: int
# ... detailed execution state
# Trace context (OTel-aligned; empty if observability context not set):
trace_id: str # From set_trace_context (OTel trace)
span_id: str # 16 hex chars per step (OTel span)
parent_span_id: str # Optional; for nested span hierarchy
execution_id: str # Session/run correlation id
```
L3 entries include `trace_id`, `span_id`, and `execution_id` for correlation and **OpenTelemetry (OTel) compatibility**. When the framework sets trace context (e.g. via `Runtime.start_run()` or `StreamRuntime.start_run()`), these fields are populated automatically so L3 data can be exported to OTel backends without schema changes.
**L2: NodeDetail** also includes `trace_id` and `span_id`; **L1: RunSummaryLog** includes `trace_id` and `execution_id` for the same correlation.
---
## Querying Logs (MCP Tools)
### Tools Location
**MCP Server:** `tools/src/aden_tools/tools/runtime_logs_tool/runtime_logs_tool.py`
Three MCP tools provide access to the logging system:
### L1: query_runtime_logs
**Purpose:** Find problematic runs
```python
query_runtime_logs(
agent_work_dir: str, # e.g., "~/.hive/agents/deep_research_agent"
status: str = "", # "needs_attention", "success", "failure", "degraded"
limit: int = 20
) -> dict # {"runs": [...], "total": int}
```
**Returns:**
```json
{
"runs": [
{
"run_id": "session_20260206_115718_e22339c5",
"status": "degraded",
"needs_attention": true,
"attention_summary": {
"total_attention_flags": 3,
"categories": ["missing_outputs", "retry_loops"]
},
"started_at": "2026-02-06T11:57:18Z"
}
],
"total": 1
}
```
**Common queries:**
```python
# Find all problematic runs
query_runtime_logs(agent_work_dir, status="needs_attention")
# Get recent runs regardless of status
query_runtime_logs(agent_work_dir, limit=10)
# Check for failures
query_runtime_logs(agent_work_dir, status="failure")
```
### L2: query_runtime_log_details
**Purpose:** Identify which nodes failed
```python
query_runtime_log_details(
agent_work_dir: str,
run_id: str, # From L1 query
needs_attention_only: bool = False,
node_id: str = "" # Filter to specific node
) -> dict # {"run_id": str, "nodes": [...]}
```
**Returns:**
```json
{
"run_id": "session_20260206_115718_e22339c5",
"nodes": [
{
"node_id": "intake-collector",
"exit_status": "escalate",
"retry_count": 5,
"verdict_counts": {"RETRY": 5, "ESCALATE": 1},
"attention_reasons": ["high_retry_count", "missing_outputs"],
"total_steps": 8,
"latency_ms": 12500,
"needs_attention": true
}
]
}
```
**Common queries:**
```python
# Get all problematic nodes
query_runtime_log_details(agent_work_dir, run_id, needs_attention_only=True)
# Analyze specific node across run
query_runtime_log_details(agent_work_dir, run_id, node_id="intake-collector")
# Full node breakdown
query_runtime_log_details(agent_work_dir, run_id)
```
### L3: query_runtime_log_raw
**Purpose:** Root cause analysis
```python
query_runtime_log_raw(
agent_work_dir: str,
run_id: str,
step_index: int = -1, # Specific step or -1 for all
node_id: str = "" # Filter to specific node
) -> dict # {"run_id": str, "steps": [...]}
```
**Returns:**
```json
{
"run_id": "session_20260206_115718_e22339c5",
"steps": [
{
"node_id": "intake-collector",
"step_index": 3,
"tool_calls": [
{
"tool": "web_search",
"args": {"query": "@RomuloNevesOf"}
}
],
"tool_results": [
{
"status": "success",
"data": "..."
}
],
"verdict": "RETRY",
"verdict_feedback": "Missing required output 'twitter_handles'. You found the handle but didn't call set_output.",
"llm_response_text": "I found the Twitter profile...",
"tokens_used": 1234,
"latency_ms": 2500
}
]
}
```
**Common queries:**
```python
# All steps for a problematic node
query_runtime_log_raw(agent_work_dir, run_id, node_id="intake-collector")
# Specific step analysis
query_runtime_log_raw(agent_work_dir, run_id, step_index=5)
# Full execution trace
query_runtime_log_raw(agent_work_dir, run_id)
```
---
## Usage Patterns
### Pattern 1: Top-Down Investigation
**Use case:** Debug a failing agent
```python
# 1. Find problematic runs (L1)
result = query_runtime_logs(
agent_work_dir="~/.hive/agents/deep_research_agent",
status="needs_attention"
)
run_id = result["runs"][0]["run_id"]
# 2. Identify failing nodes (L2)
details = query_runtime_log_details(
agent_work_dir="~/.hive/agents/deep_research_agent",
run_id=run_id,
needs_attention_only=True
)
problem_node = details["nodes"][0]["node_id"]
# 3. Analyze root cause (L3)
raw = query_runtime_log_raw(
agent_work_dir="~/.hive/agents/deep_research_agent",
run_id=run_id,
node_id=problem_node
)
# Examine verdict_feedback, tool_results, etc.
```
### Pattern 2: Node-Specific Debugging
**Use case:** Investigate why a specific node keeps failing
```python
# Get recent runs
runs = query_runtime_logs("~/.hive/agents/my_agent", limit=10)
# For each run, check specific node
for run in runs["runs"]:
node_details = query_runtime_log_details(
"~/.hive/agents/my_agent",
run["run_id"],
node_id="problematic-node"
)
# Analyze retry patterns, error types
```
### Pattern 3: Real-Time Monitoring
**Use case:** Watch for issues during development
```python
import time
while True:
result = query_runtime_logs(
agent_work_dir="~/.hive/agents/my_agent",
status="needs_attention",
limit=1
)
if result["total"] > 0:
new_issue = result["runs"][0]
print(f"⚠️ New issue detected: {new_issue['run_id']}")
# Alert or drill into L2/L3
time.sleep(10) # Poll every 10 seconds
```
---
## Integration Points
### GraphExecutor → RuntimeLogger
**Location:** `core/framework/graph/executor.py`
```python
# Executor creates logger and passes session_id
logger = RuntimeLogger(store, agent_id)
run_id = logger.start_run(goal_id, session_id=execution_id)
# During execution
logger.log_step(node_id, step_index, tool_calls, ...)
logger.log_node_complete(node_id, exit_status, ...)
# At completion
await logger.end_run(status="success")
```
### EventLoopNode → RuntimeLogger
**Location:** `core/framework/graph/event_loop_node.py`
```python
# EventLoopNode logs each step
self._logger.log_step(
node_id=self.id,
step_index=step_count,
tool_calls=current_tool_calls,
tool_results=current_tool_results,
verdict=verdict,
verdict_feedback=feedback,
...
)
```
### AgentRuntime → RuntimeLogger
**Location:** `core/framework/runtime/agent_runtime.py`
```python
# Runtime initializes logger with storage path
log_store = RuntimeLogStore(base_path / "runtime_logs")
logger = RuntimeLogger(log_store, agent_id)
# Passes session_id from ExecutionStream
logger.start_run(goal_id, session_id=execution_id)
```
---
## File Format Details
### L1: summary.json
**Written:** Once at end_run()
**Format:** Standard JSON
```json
{
"run_id": "session_20260206_115718_e22339c5",
"goal_id": "deep-research",
"status": "degraded",
"started_at": "2026-02-06T11:57:18.593081",
"ended_at": "2026-02-06T11:58:45.123456",
"needs_attention": true,
"attention_summary": {
"total_attention_flags": 3,
"categories": ["missing_outputs", "retry_loops"],
"nodes_with_attention": ["intake-collector"]
},
"total_nodes_executed": 4,
"nodes_with_failures": ["intake-collector"],
"execution_quality": "degraded",
"total_latency_ms": 86530,
"total_retries": 5
}
```
### L2: details.jsonl
**Written:** Incrementally (append per node completion)
**Format:** JSONL (one JSON object per line)
```jsonl
{"node_id":"intake-collector","exit_status":"escalate","retry_count":5,"verdict_counts":{"RETRY":5,"ESCALATE":1},"total_steps":8,"latency_ms":12500,"needs_attention":true,"attention_reasons":["high_retry_count","missing_outputs"],"tool_error_count":0,"tokens_used":9876}
{"node_id":"profile-analyzer","exit_status":"success","retry_count":0,"verdict_counts":{"ACCEPT":1},"total_steps":2,"latency_ms":5432,"needs_attention":false,"attention_reasons":[],"tool_error_count":0,"tokens_used":3456}
```
### L3: tool_logs.jsonl
**Written:** Incrementally (append per step)
**Format:** JSONL (one JSON object per line)
Each line includes **trace context** when the framework has set it (via the observability module): `trace_id`, `span_id`, `parent_span_id` (optional), and `execution_id`. These align with OpenTelemetry/W3C TraceContext so L3 data can be exported to OTel backends without schema changes.
```jsonl
{"node_id":"intake-collector","step_index":3,"trace_id":"54e80d7b5bd6409dbc3217e5cd16a4fd","span_id":"a1b2c3d4e5f67890","execution_id":"b4c348ec54e80d7b5bd6409dbc3217e50","tool_calls":[...],"verdict":"RETRY",...}
```
**Why JSONL?**
- Incremental append during execution (crash-safe)
- No need to parse entire file to add one line
- Data persisted immediately, not buffered
- Easy to stream/process line-by-line
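Consumption is equally simple. A minimal reader sketch; skipping unparseable lines mirrors how the query tools tolerate a torn final line after a crash:

```python
import json
from pathlib import Path


def read_jsonl(path: Path) -> list[dict]:
    """Parse a JSONL log file, silently skipping corrupt lines."""
    records: list[dict] = []
    for raw in path.read_text(encoding="utf-8").splitlines():
        raw = raw.strip()
        if not raw:
            continue
        try:
            records.append(json.loads(raw))
        except json.JSONDecodeError:
            continue  # torn write from a crash; keep the valid remainder
    return records
```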
---
## Attention Flags System
### Automatic Detection
The runtime logger automatically flags issues based on execution metrics:
| Trigger | Threshold | Attention Reason | Category |
|---------|-----------|------------------|----------|
| High retries | `retry_count > 3` | `high_retry_count` | Retry Loops |
| Escalations | `escalate_count > 2` | `escalation_pattern` | Guard Failures |
| High latency | `latency_ms > 60000` | `high_latency` | High Latency |
| Token usage | `tokens_used > 100000` | `high_token_usage` | Memory/Context |
| Stalled steps | `total_steps > 20` | `excessive_steps` | Stalled Execution |
| Tool errors | `tool_error_count > 0` | `tool_failures` | Tool Errors |
| Missing outputs | `exit_status != "success"` | `missing_outputs` | Missing Outputs |
### Attention Categories
Used for runtime issue categorization:
1. **Missing Outputs**: Node didn't set required output keys
2. **Tool Errors**: Tool calls failed (API errors, timeouts)
3. **Retry Loops**: Judge repeatedly rejecting outputs
4. **Guard Failures**: Output validation failed
5. **Stalled Execution**: EventLoopNode not making progress
6. **High Latency**: Slow tool calls or LLM responses
7. **Client-Facing Issues**: Premature set_output before user input
8. **Edge Routing Errors**: No edges match current state
9. **Memory/Context Issues**: Conversation history too long
10. **Constraint Violations**: Agent violated goal-level rules
---
## Migration Guide
### Reading Old Logs
The system automatically handles both old and new formats:
```python
# MCP tools check both locations automatically
result = query_runtime_logs("~/.hive/agents/old_agent")
# Returns logs from both:
# - ~/.hive/agents/old_agent/runtime_logs/runs/*/
# - ~/.hive/agents/old_agent/sessions/session_*/logs/
```
### Deprecation Warnings
When reading from old locations, deprecation warnings are emitted:
```
DeprecationWarning: Reading logs from deprecated location for run_id=20260101T120000_abc12345.
New sessions use unified storage at sessions/session_*/logs/
```
### Migration Script (Optional)
For migrating existing old logs to new format, see:
- `EXECUTION_STORAGE_REDESIGN.md` - Migration strategy
- Future: `scripts/migrate_to_unified_sessions.py`
---
## Performance Characteristics
### Write Performance
- **L3 append**: ~1-2ms per step (sync I/O, thread-safe)
- **L2 append**: ~1-2ms per node (sync I/O, thread-safe)
- **L1 write**: ~5-10ms at end_run (atomic, async)
**Overhead:** < 5% of total execution time for typical agents
### Read Performance
- **L1 summary**: ~1-5ms (single JSON file)
- **L2 details**: ~10-50ms (JSONL, depends on node count)
- **L3 raw logs**: ~50-500ms (JSONL, depends on step count)
**Optimization:** Use filters (node_id, step_index) to reduce the amount of data read
### Storage Size
A typical session with 5 nodes and 20 steps:
- **L1 (summary.json)**: ~2-5 KB
- **L2 (details.jsonl)**: ~5-10 KB (1-2 KB per node)
- **L3 (tool_logs.jsonl)**: ~50-200 KB (2-10 KB per step)
**Total per session:** ~60-215 KB
**Compression:** Consider archiving old sessions after 90 days
---
## Troubleshooting
### Issue: Logs not appearing
**Symptom:** MCP tools return empty results
**Check:**
1. Verify storage path exists: `~/.hive/agents/{agent_name}/`
2. Check session directories: `ls ~/.hive/agents/{agent_name}/sessions/`
3. Verify logs directory exists: `ls ~/.hive/agents/{agent_name}/sessions/session_*/logs/`
4. Check file permissions
### Issue: Corrupt JSONL files
**Symptom:** Partial data or JSON decode errors
**Cause:** Process crash during write (rare, but possible)
**Recovery:**
```python
# MCP tools skip corrupt lines automatically
query_runtime_log_details(agent_work_dir, run_id)
# Logs warning but continues with valid lines
```
### Issue: High disk usage
**Symptom:** Storage growing too large
**Solution:**
```bash
# Archive old sessions
cd ~/.hive/agents/{agent_name}/sessions/
find . -name "session_2025*" -type d -exec tar -czf archive.tar.gz {} +
rm -rf session_2025*
# Or set up automatic cleanup (future feature)
```
---
## References
**Implementation:**
- `core/framework/runtime/runtime_logger.py` - Logger implementation
- `core/framework/runtime/runtime_log_store.py` - Storage layer
- `core/framework/runtime/runtime_log_schemas.py` - Data schemas
- `tools/src/aden_tools/tools/runtime_logs_tool/runtime_logs_tool.py` - MCP query tools
**Documentation:**
- `EXECUTION_STORAGE_REDESIGN.md` - Unified session storage design
- `docs/developer-guide.md` - Debugging and troubleshooting workflows
**Related:**
- `core/framework/schemas/session_state.py` - Session state schema
- `core/framework/storage/session_store.py` - Session state storage
- `core/framework/graph/executor.py` - GraphExecutor integration
+82 -15
View File
@@ -9,6 +9,7 @@ import asyncio
import logging
import time
import uuid
from collections import OrderedDict
from collections.abc import Callable
from dataclasses import dataclass, field
from datetime import datetime
@@ -21,7 +22,7 @@ from framework.runtime.event_bus import EventBus
from framework.runtime.execution_stream import EntryPointSpec, ExecutionStream
from framework.runtime.outcome_aggregator import OutcomeAggregator
from framework.runtime.runtime_log_store import RuntimeLogStore
from framework.runtime.shared_state import SharedStateManager
from framework.runtime.shared_state import SharedBufferManager
from framework.storage.concurrent import ConcurrentStorage
from framework.storage.session_store import SessionStore
@@ -44,6 +45,9 @@ class AgentRuntimeConfig:
max_history: int = 1000
execution_result_max: int = 1000
execution_result_ttl_seconds: float | None = None
# Idempotency cache for trigger() deduplication
idempotency_ttl_seconds: float = 300.0
idempotency_max_keys: int = 10000
# Webhook server config (only starts if webhook_routes is non-empty)
webhook_host: str = "127.0.0.1"
webhook_port: int = 8080
@@ -225,7 +229,7 @@ class AgentRuntime:
self._session_store = SessionStore(storage_path_obj)
# Initialize shared components
self._state_manager = SharedStateManager()
self._state_manager = SharedBufferManager()
self._event_bus = event_bus or EventBus(max_history=self._config.max_history)
self._outcome_aggregator = OutcomeAggregator(goal, self._event_bus)
@@ -234,6 +238,12 @@ class AgentRuntime:
self._tools = tools or []
self._tool_executor = tool_executor
self._accounts_prompt = accounts_prompt
self._dynamic_memory_provider_factory: Callable[[str], Callable[[], str] | None] | None = None
# Colony memory config for reflection-at-handoff (set by session_manager)
self._colony_memory_dir: Any = None
self._colony_worker_sessions_dir: Any = None
self._colony_recall_cache: dict[str, str] | None = None
self._colony_reflect_llm: Any = None
self._accounts_data = accounts_data
self._tool_provider_map = tool_provider_map
@@ -250,6 +260,10 @@ class AgentRuntime:
# Next fire time for each timer entry point (ep_id -> datetime)
self._timer_next_fire: dict[str, float] = {}
# Idempotency cache for trigger() deduplication
self._idempotency_keys: OrderedDict[str, str] = OrderedDict()
self._idempotency_times: dict[str, float] = {}
# State
self._running = False
self._timers_paused = False
@@ -352,6 +366,11 @@ class AgentRuntime:
skill_dirs=self.skill_dirs,
context_warn_ratio=self.context_warn_ratio,
batch_init_nudge=self.batch_init_nudge,
dynamic_memory_provider_factory=self._dynamic_memory_provider_factory,
colony_memory_dir=self._colony_memory_dir,
colony_worker_sessions_dir=self._colony_worker_sessions_dir,
colony_recall_cache=self._colony_recall_cache,
colony_reflect_llm=self._colony_reflect_llm,
)
await stream.start()
self._streams[ep_id] = stream
@@ -853,12 +872,29 @@ class AgentRuntime:
# Primary graph (also stored in self._streams)
return self._streams.get(entry_point_id)
def _prune_idempotency_keys(self) -> None:
"""Prune expired idempotency keys based on TTL and max size."""
ttl = self._config.idempotency_ttl_seconds
if ttl > 0:
cutoff = time.time() - ttl
for key, recorded_at in list(self._idempotency_times.items()):
if recorded_at < cutoff:
self._idempotency_times.pop(key, None)
self._idempotency_keys.pop(key, None)
max_keys = self._config.idempotency_max_keys
if max_keys > 0:
while len(self._idempotency_keys) > max_keys:
old_key, _ = self._idempotency_keys.popitem(last=False)
self._idempotency_times.pop(old_key, None)
async def trigger(
self,
entry_point_id: str,
input_data: dict[str, Any],
correlation_id: str | None = None,
session_state: dict[str, Any] | None = None,
idempotency_key: str | None = None,
graph_id: str | None = None,
) -> str:
"""
@@ -871,6 +907,10 @@ class AgentRuntime:
input_data: Input data for the execution
correlation_id: Optional ID to correlate related executions
session_state: Optional session state to resume from (with paused_at, memory)
idempotency_key: Optional key for deduplication. If a trigger with
the same key was already processed within the TTL window, the
cached execution_id is returned instead of starting a new
execution. Useful for webhook providers that retry on timeout.
graph_id: Graph to trigger on. ``None`` uses the active graph
first, then falls back to the primary graph.
@@ -884,12 +924,32 @@ class AgentRuntime:
if not self._running:
raise RuntimeError("AgentRuntime is not running")
# Idempotency check: return cached execution_id for duplicate keys.
if idempotency_key is not None:
self._prune_idempotency_keys()
cached = self._idempotency_keys.get(idempotency_key)
if cached is not None:
logger.debug(
"Idempotent trigger: key '%s' already seen, returning %s",
idempotency_key,
cached,
)
return cached
stream = self._resolve_stream(entry_point_id, graph_id)
if stream is None:
raise ValueError(f"Entry point '{entry_point_id}' not found")
run_id = uuid.uuid4().hex[:12]
return await stream.execute(input_data, correlation_id, session_state, run_id=run_id)
exec_id = await stream.execute(input_data, correlation_id, session_state, run_id=run_id)
# Cache after execute() so the value is always a real execution_id
# that callers can use for tracking.
if idempotency_key is not None:
self._idempotency_keys[idempotency_key] = exec_id
self._idempotency_times[idempotency_key] = time.time()
return exec_id
async def trigger_and_wait(
self,
@@ -897,6 +957,7 @@ class AgentRuntime:
input_data: dict[str, Any],
timeout: float | None = None,
session_state: dict[str, Any] | None = None,
idempotency_key: str | None = None,
) -> ExecutionResult | None:
"""
Trigger execution and wait for completion.
@@ -906,11 +967,17 @@ class AgentRuntime:
input_data: Input data for the execution
timeout: Maximum time to wait (seconds)
session_state: Optional session state to resume from (with paused_at, memory)
idempotency_key: Optional key for deduplication (see trigger() for details).
Returns:
ExecutionResult or None if timeout
"""
exec_id = await self.trigger(entry_point_id, input_data, session_state=session_state)
exec_id = await self.trigger(
entry_point_id,
input_data,
session_state=session_state,
idempotency_key=idempotency_key,
)
stream = self._resolve_stream(entry_point_id)
if stream is None:
raise ValueError(f"Entry point '{entry_point_id}' not found")
@@ -1390,12 +1457,12 @@ class AgentRuntime:
``session_state`` dict containing:
- ``resume_session_id``: reuse the same session directory
- ``memory``: only the keys that the async entry node declares
- ``data_buffer``: only the keys that the async entry node declares
as inputs (e.g. ``rules``, ``max_emails``). Stale outputs
from previous runs (``emails``, ``actions_taken``, …) are
excluded so each trigger starts fresh.
The memory is read from the primary session's ``state.json``
The data buffer is read from the primary session's ``state.json``
which is kept up-to-date by ``GraphExecutor._write_progress()``
at every node transition.
@@ -1413,7 +1480,7 @@ class AgentRuntime:
"""
import json as _json
# Determine which memory keys the async entry node needs.
# Determine which data buffer keys the async entry node needs.
allowed_keys: set[str] | None = None
# Look up the entry node from the correct graph
src_graph_id = source_graph_id or self._graph_id
@@ -1449,19 +1516,19 @@ class AgentRuntime:
try:
if state_path.exists():
data = _json.loads(state_path.read_text(encoding="utf-8"))
full_memory = data.get("memory", {})
if not full_memory:
full_buffer = data.get("data_buffer", data.get("memory", {}))
if not full_buffer:
continue
# Filter to only input keys so stale outputs
# from previous triggers don't leak through.
if allowed_keys is not None:
memory = {k: v for k, v in full_memory.items() if k in allowed_keys}
buffer_data = {k: v for k, v in full_buffer.items() if k in allowed_keys}
else:
memory = full_memory
if memory:
buffer_data = full_buffer
if buffer_data:
return {
"resume_session_id": exec_id,
"memory": memory,
"data_buffer": buffer_data,
}
except Exception:
logger.debug(
@@ -1610,7 +1677,7 @@ class AgentRuntime:
for node_id, node in executor.node_registry.items():
if getattr(node, "_awaiting_input", False):
# Skip escalation receivers — those are handled
# by the queen via inject_worker_message(), not
# by the queen via inject_message(), not
# by the user directly.
if ":escalation:" in node_id:
continue
@@ -1725,7 +1792,7 @@ class AgentRuntime:
# === PROPERTIES ===
@property
def state_manager(self) -> SharedStateManager:
def state_manager(self) -> SharedBufferManager:
"""Access the shared state manager."""
return self._state_manager
@@ -1,39 +0,0 @@
"""EscalationTicket — structured schema for worker health escalations."""
from __future__ import annotations
from datetime import UTC, datetime
from typing import Literal
from uuid import uuid4
from pydantic import BaseModel, Field
class EscalationTicket(BaseModel):
"""Structured escalation report for worker health monitoring.
All fields must be filled before calling emit_escalation_ticket.
Pydantic validation rejects partial tickets.
"""
ticket_id: str = Field(default_factory=lambda: str(uuid4()))
created_at: str = Field(default_factory=lambda: datetime.now(UTC).isoformat())
# Worker identification
worker_agent_id: str
worker_session_id: str
worker_node_id: str
worker_graph_id: str
# Problem characterization
severity: Literal["low", "medium", "high", "critical"]
cause: str # Human-readable: "Node has produced 18 RETRY verdicts..."
judge_reasoning: str # Judge's own deliberation chain
suggested_action: str # "Restart node", "Human review", "Kill session", etc.
# Evidence
recent_verdicts: list[str] # e.g. ["RETRY", "RETRY", "CONTINUE", "RETRY"]
total_steps_checked: int # How many steps the judge saw
steps_since_last_accept: int # Steps with no ACCEPT verdict
stall_minutes: float | None # Wall-clock minutes since last new log step (None if active)
evidence_snippet: str # Brief excerpt from recent LLM output or error
+59 -57
View File
@@ -94,12 +94,12 @@ class EventType(StrEnum):
TOOL_CALL_STARTED = "tool_call_started"
TOOL_CALL_COMPLETED = "tool_call_completed"
# Client I/O (client_facing=True nodes only)
# Queen/user interaction events
CLIENT_OUTPUT_DELTA = "client_output_delta"
CLIENT_INPUT_REQUESTED = "client_input_requested"
CLIENT_INPUT_RECEIVED = "client_input_received"
# Internal node observability (client_facing=False nodes)
# Internal node observability
NODE_INTERNAL_OUTPUT = "node_internal_output"
NODE_INPUT_BLOCKED = "node_input_blocked"
NODE_STALLED = "node_stalled"
@@ -115,6 +115,10 @@ class EventType(StrEnum):
NODE_RETRY = "node_retry"
EDGE_TRAVERSED = "edge_traversed"
# Worker agent lifecycle (event-driven graph execution)
WORKER_COMPLETED = "worker_completed"
WORKER_FAILED = "worker_failed"
# Context management
CONTEXT_COMPACTED = "context_compacted"
CONTEXT_USAGE_UPDATED = "context_usage_updated"
@@ -128,15 +132,11 @@ class EventType(StrEnum):
# Escalation (agent requests handoff to queen)
ESCALATION_REQUESTED = "escalation_requested"
# Worker health monitoring
WORKER_ESCALATION_TICKET = "worker_escalation_ticket"
QUEEN_INTERVENTION_REQUESTED = "queen_intervention_requested"
# Execution resurrection (auto-restart on non-fatal failure)
EXECUTION_RESURRECTED = "execution_resurrected"
# Worker lifecycle (session manager → frontend)
WORKER_LOADED = "worker_loaded"
# Graph lifecycle (session manager → frontend)
WORKER_GRAPH_LOADED = "worker_graph_loaded"
CREDENTIALS_REQUIRED = "credentials_required"
# Draft graph (planning phase — lightweight graph preview)
@@ -879,7 +879,7 @@ class EventBus:
iteration: int | None = None,
inner_turn: int = 0,
) -> None:
"""Emit client output delta event (client_facing=True nodes)."""
"""Emit user-facing output delta for interactive queen turns."""
data: dict = {"content": content, "snapshot": snapshot, "inner_turn": inner_turn}
if iteration is not None:
data["iteration"] = iteration
@@ -902,7 +902,7 @@ class EventBus:
options: list[str] | None = None,
questions: list[dict] | None = None,
) -> None:
"""Emit client input requested event (client_facing=True nodes).
"""Emit a user-input request for interactive queen turns.
Args:
options: Optional predefined choices for the user (1-3 items).
@@ -936,7 +936,7 @@ class EventBus:
content: str,
execution_id: str | None = None,
) -> None:
"""Emit node internal output event (client_facing=False nodes)."""
"""Emit node internal output for non-user-facing execution."""
await self.publish(
AgentEvent(
type=EventType.NODE_INTERNAL_OUTPUT,
@@ -1094,6 +1094,54 @@ class EventBus:
)
)
async def emit_worker_completed(
self,
stream_id: str,
node_id: str,
worker_id: str,
success: bool,
output: dict[str, Any],
activations: list[dict[str, Any]] | None = None,
execution_id: str | None = None,
**extra_data: Any,
) -> None:
"""Emit worker completed event with outgoing activations."""
data: dict[str, Any] = {
"worker_id": worker_id,
"success": success,
"output": output,
"activations": activations or [],
**extra_data,
}
await self.publish(
AgentEvent(
type=EventType.WORKER_COMPLETED,
stream_id=stream_id,
node_id=node_id,
execution_id=execution_id,
data=data,
)
)
async def emit_worker_failed(
self,
stream_id: str,
node_id: str,
worker_id: str,
error: str,
execution_id: str | None = None,
) -> None:
"""Emit worker failed event."""
await self.publish(
AgentEvent(
type=EventType.WORKER_FAILED,
stream_id=stream_id,
node_id=node_id,
execution_id=execution_id,
data={"worker_id": worker_id, "error": error},
)
)
async def emit_execution_paused(
self,
stream_id: str,
@@ -1172,52 +1220,6 @@ class EventBus:
)
)
async def emit_worker_escalation_ticket(
self,
stream_id: str,
node_id: str,
ticket: dict,
execution_id: str | None = None,
) -> None:
"""Emitted when worker shows a degradation pattern."""
await self.publish(
AgentEvent(
type=EventType.WORKER_ESCALATION_TICKET,
stream_id=stream_id,
node_id=node_id,
execution_id=execution_id,
data={"ticket": ticket},
)
)
async def emit_queen_intervention_requested(
self,
stream_id: str,
node_id: str,
ticket_id: str,
analysis: str,
severity: str,
queen_graph_id: str,
queen_stream_id: str,
execution_id: str | None = None,
) -> None:
"""Emitted by queen when she decides the operator should be involved."""
await self.publish(
AgentEvent(
type=EventType.QUEEN_INTERVENTION_REQUESTED,
stream_id=stream_id,
node_id=node_id,
execution_id=execution_id,
data={
"ticket_id": ticket_id,
"analysis": analysis,
"severity": severity,
"queen_graph_id": queen_graph_id,
"queen_stream_id": queen_stream_id,
},
)
)
async def emit_subagent_report(
self,
stream_id: str,
+51 -8
View File
@@ -21,7 +21,7 @@ from typing import TYPE_CHECKING, Any
from framework.graph.checkpoint_config import CheckpointConfig
from framework.graph.executor import ExecutionResult, GraphExecutor
from framework.runtime.event_bus import EventBus
from framework.runtime.shared_state import IsolationLevel, SharedStateManager
from framework.runtime.shared_state import IsolationLevel, SharedBufferManager
from framework.runtime.stream_runtime import StreamRuntime, StreamRuntimeAdapter
if TYPE_CHECKING:
@@ -170,7 +170,7 @@ class ExecutionStream:
entry_spec: EntryPointSpec,
graph: "GraphSpec",
goal: "Goal",
state_manager: SharedStateManager,
state_manager: SharedBufferManager,
storage: "ConcurrentStorage",
outcome_aggregator: "OutcomeAggregator",
event_bus: "EventBus | None" = None,
@@ -191,6 +191,11 @@ class ExecutionStream:
skill_dirs: list[str] | None = None,
context_warn_ratio: float | None = None,
batch_init_nudge: str | None = None,
dynamic_memory_provider_factory: Callable[[str], Callable[[], str] | None] | None = None,
colony_memory_dir: Any = None,
colony_worker_sessions_dir: Any = None,
colony_recall_cache: dict[str, str] | None = None,
colony_reflect_llm: Any = None,
):
"""
Initialize execution stream.
@@ -245,6 +250,11 @@ class ExecutionStream:
self._skill_dirs: list[str] = skill_dirs or []
self._context_warn_ratio: float | None = context_warn_ratio
self._batch_init_nudge: str | None = batch_init_nudge
self._dynamic_memory_provider_factory = dynamic_memory_provider_factory
self._colony_memory_dir = colony_memory_dir
self._colony_worker_sessions_dir = colony_worker_sessions_dir
self._colony_recall_cache = colony_recall_cache
self._colony_reflect_llm = colony_reflect_llm
_es_logger = logging.getLogger(__name__)
if protocols_prompt:
@@ -357,7 +367,7 @@ class ExecutionStream:
Each entry is ``{"node_id": ..., "execution_id": ...}``.
The currently executing node is placed first so that
``inject_worker_message`` targets the active node, not a stale one.
``inject_message`` targets the active node, not a stale one.
"""
injectable: list[dict[str, str]] = []
current_first: list[dict[str, str]] = []
@@ -550,6 +560,14 @@ class ExecutionStream:
correlation_id = execution_id
# Create execution context
effective_run_id = None
if session_state:
existing_run_id = session_state.get("run_id")
if isinstance(existing_run_id, str) and existing_run_id:
effective_run_id = existing_run_id
if effective_run_id is None:
effective_run_id = run_id
ctx = ExecutionContext(
id=execution_id,
correlation_id=correlation_id,
@@ -558,7 +576,7 @@ class ExecutionStream:
input_data=input_data,
isolation_level=self.entry_spec.get_isolation_level(),
session_state=session_state,
run_id=run_id,
run_id=effective_run_id,
)
async with self._lock:
@@ -639,7 +657,7 @@ class ExecutionStream:
self._write_run_event(execution_id, ctx.run_id, "run_started")
# Create execution-scoped memory
self._state_manager.create_memory(
self._state_manager.create_buffer(
execution_id=execution_id,
stream_id=self.stream_id,
isolation=ctx.isolation_level,
@@ -700,6 +718,7 @@ class ExecutionStream:
event_bus=self._scoped_event_bus,
stream_id=self.stream_id,
execution_id=execution_id,
run_id=ctx.run_id or "",
storage_path=exec_storage,
runtime_logger=runtime_logger,
loop_config=self.graph.loop_config,
@@ -711,6 +730,15 @@ class ExecutionStream:
skill_dirs=self._skill_dirs,
context_warn_ratio=self._context_warn_ratio,
batch_init_nudge=self._batch_init_nudge,
dynamic_memory_provider=(
self._dynamic_memory_provider_factory(execution_id)
if self._dynamic_memory_provider_factory is not None
else None
),
colony_memory_dir=self._colony_memory_dir,
colony_worker_sessions_dir=self._colony_worker_sessions_dir,
colony_recall_cache=self._colony_recall_cache,
colony_reflect_llm=self._colony_reflect_llm,
)
# Track executor so inject_input() can reach EventLoopNode instances
self._active_executors[execution_id] = executor
@@ -1044,6 +1072,7 @@ class ExecutionStream:
agent_id=self.graph.id,
entry_point=self.entry_spec.id,
)
state.current_run_id = ctx.run_id
else:
# Create initial state — when resuming, preserve the previous
# execution's progress so crashes don't lose track of state.
@@ -1074,8 +1103,9 @@ class ExecutionStream:
updated_at=now,
),
progress=progress,
memory=ss.get("memory", {}),
data_buffer=ss.get("data_buffer", ss.get("memory", {})),
input_data=ctx.input_data,
current_run_id=ctx.run_id,
)
# Handle error case
@@ -1198,9 +1228,22 @@ class ExecutionStream:
task.cancel()
# Wait briefly for the task to finish. Don't block indefinitely —
# the task may be stuck in a long LLM API call that doesn't
# respond to cancellation quickly. The cancellation is already
# requested; the task will clean up in the background.
# respond to cancellation quickly.
done, _ = await asyncio.wait({task}, timeout=5.0)
if not done:
# Task didn't finish within timeout — clean up bookkeeping now
# so the session doesn't think it still has running executions.
# The task will continue winding down in the background and its
# finally block will harmlessly pop already-removed keys.
logger.warning(
"Execution %s did not finish within cancel timeout; "
"force-cleaning bookkeeping",
execution_id,
)
async with self._lock:
self._active_executions.pop(execution_id, None)
self._execution_tasks.pop(execution_id, None)
self._active_executors.pop(execution_id, None)
return True
return False
+24 -24
View File
@@ -1,10 +1,10 @@
"""
Shared State Manager - Manages state across concurrent executions.
Shared Buffer Manager - Manages state across concurrent executions.
Provides different isolation levels:
- ISOLATED: Each execution has its own memory copy
- SHARED: All executions read/write same memory (eventual consistency)
- SYNCHRONIZED: Shared memory with write locks (strong consistency)
- ISOLATED: Each execution has its own state copy
- SHARED: All executions read/write same state (eventual consistency)
- SYNCHRONIZED: Shared state with write locks (strong consistency)
"""
import asyncio
@@ -46,7 +46,7 @@ class StateChange:
timestamp: float = field(default_factory=time.time)
class SharedStateManager:
class SharedBufferManager:
"""
Manages shared state across concurrent executions.
@@ -61,18 +61,18 @@ class SharedStateManager:
- SYNCHRONIZED: Like SHARED but with write locks
Example:
manager = SharedStateManager()
manager = SharedBufferManager()
# Create memory for an execution
memory = manager.create_memory(
# Create buffer for an execution
buf = manager.create_buffer(
execution_id="exec_123",
stream_id="webhook",
isolation=IsolationLevel.SHARED,
)
# Read/write through the memory
await memory.write("customer_id", "cust_456", scope=StateScope.STREAM)
value = await memory.read("customer_id")
# Read/write through the buffer
await buf.write("customer_id", "cust_456", scope=StateScope.STREAM)
value = await buf.read("customer_id")
"""
def __init__(self):
@@ -93,14 +93,14 @@ class SharedStateManager:
# Version tracking
self._version = 0
def create_memory(
def create_buffer(
self,
execution_id: str,
stream_id: str,
isolation: IsolationLevel,
) -> "StreamMemory":
) -> "StreamBuffer":
"""
Create a memory instance for an execution.
Create a buffer instance for an execution.
Args:
execution_id: Unique execution identifier
@@ -108,7 +108,7 @@ class SharedStateManager:
isolation: Isolation level for this execution
Returns:
StreamMemory instance for reading/writing state
StreamBuffer instance for reading/writing state
"""
# Initialize execution state
if execution_id not in self._execution_state:
@@ -119,7 +119,7 @@ class SharedStateManager:
self._stream_state[stream_id] = {}
self._stream_locks[stream_id] = asyncio.Lock()
return StreamMemory(
return StreamBuffer(
manager=self,
execution_id=execution_id,
stream_id=stream_id,
@@ -343,17 +343,17 @@ class SharedStateManager:
return self._change_history[-limit:]
class StreamMemory:
class StreamBuffer:
"""
Memory interface for a single execution.
Buffer interface for a single execution.
Provides scoped access to shared state with proper isolation.
Compatible with the existing SharedMemory interface where possible.
Compatible with the existing DataBuffer interface where possible.
"""
def __init__(
self,
manager: SharedStateManager,
manager: SharedBufferManager,
execution_id: str,
stream_id: str,
isolation: IsolationLevel,
@@ -371,13 +371,13 @@ class StreamMemory:
self,
read_keys: list[str],
write_keys: list[str],
) -> "StreamMemory":
) -> "StreamBuffer":
"""
Create a scoped view with read/write permissions.
Compatible with existing SharedMemory.with_permissions().
Compatible with existing DataBuffer.with_permissions().
"""
scoped = StreamMemory(
scoped = StreamBuffer(
manager=self._manager,
execution_id=self._execution_id,
stream_id=self._stream_id,
@@ -434,7 +434,7 @@ class StreamMemory:
return all_state
# === SYNC API (for backward compatibility with SharedMemory) ===
# === SYNC API (for backward compatibility with DataBuffer) ===
def read_sync(self, key: str) -> Any:
"""
@@ -5,7 +5,7 @@ Tests:
1. AgentRuntime creation and lifecycle
2. Entry point registration
3. Concurrent executions across streams
4. SharedStateManager isolation levels
4. SharedBufferManager isolation levels
5. OutcomeAggregator goal evaluation
6. EventBus pub/sub
"""
@@ -24,7 +24,8 @@ from framework.runtime.agent_runtime import AgentRuntime, create_agent_runtime
from framework.runtime.event_bus import AgentEvent, EventBus, EventType
from framework.runtime.execution_stream import EntryPointSpec
from framework.runtime.outcome_aggregator import OutcomeAggregator
from framework.runtime.shared_state import IsolationLevel, SharedStateManager
from framework.runtime.shared_state import IsolationLevel, SharedBufferManager
from framework.schemas.session_state import SessionState, SessionTimestamps
# === Test Fixtures ===
@@ -121,45 +122,45 @@ def temp_storage():
yield Path(tmpdir)
# === SharedStateManager Tests ===
# === SharedBufferManager Tests ===
class TestSharedStateManager:
"""Tests for SharedStateManager."""
class TestSharedBufferManager:
"""Tests for SharedBufferManager."""
def test_create_memory(self):
"""Test creating execution-scoped memory."""
manager = SharedStateManager()
memory = manager.create_memory(
def test_create_buffer(self):
"""Test creating execution-scoped buffer."""
manager = SharedBufferManager()
buffer = manager.create_buffer(
execution_id="exec-1",
stream_id="webhook",
isolation=IsolationLevel.SHARED,
)
assert memory is not None
assert memory._execution_id == "exec-1"
assert memory._stream_id == "webhook"
assert buffer is not None
assert buffer._execution_id == "exec-1"
assert buffer._stream_id == "webhook"
@pytest.mark.asyncio
async def test_isolated_state(self):
"""Test isolated state doesn't leak between executions."""
manager = SharedStateManager()
manager = SharedBufferManager()
mem1 = manager.create_memory("exec-1", "stream-1", IsolationLevel.ISOLATED)
mem2 = manager.create_memory("exec-2", "stream-1", IsolationLevel.ISOLATED)
buf1 = manager.create_buffer("exec-1", "stream-1", IsolationLevel.ISOLATED)
buf2 = manager.create_buffer("exec-2", "stream-1", IsolationLevel.ISOLATED)
await mem1.write("key", "value1")
await mem2.write("key", "value2")
await buf1.write("key", "value1")
await buf2.write("key", "value2")
assert await mem1.read("key") == "value1"
assert await mem2.read("key") == "value2"
assert await buf1.read("key") == "value1"
assert await buf2.read("key") == "value2"
@pytest.mark.asyncio
async def test_shared_state(self):
"""Test shared state is visible across executions."""
manager = SharedStateManager()
manager = SharedBufferManager()
manager.create_memory("exec-1", "stream-1", IsolationLevel.SHARED)
manager.create_memory("exec-2", "stream-1", IsolationLevel.SHARED)
manager.create_buffer("exec-1", "stream-1", IsolationLevel.SHARED)
manager.create_buffer("exec-2", "stream-1", IsolationLevel.SHARED)
# Write to global scope
await manager.write(
@@ -180,8 +181,8 @@ class TestSharedStateManager:
def test_cleanup_execution(self):
"""Test execution cleanup removes state."""
manager = SharedStateManager()
manager.create_memory("exec-1", "stream-1", IsolationLevel.ISOLATED)
manager = SharedBufferManager()
manager.create_buffer("exec-1", "stream-1", IsolationLevel.ISOLATED)
assert "exec-1" in manager._execution_state
@@ -190,6 +191,26 @@ class TestSharedStateManager:
assert "exec-1" not in manager._execution_state
class TestSessionState:
"""Tests for session state data-buffer compatibility."""
def test_legacy_memory_alias_populates_data_buffer(self):
"""Legacy `memory` payloads should still hydrate the session buffer."""
state = SessionState(
session_id="session-1",
goal_id="goal-1",
timestamps=SessionTimestamps(
started_at="2026-01-01T00:00:00",
updated_at="2026-01-01T00:00:00",
),
memory={"rules": "keep starred mail"},
)
assert state.data_buffer == {"rules": "keep starred mail"}
assert state.memory == {"rules": "keep starred mail"}
assert state.to_session_state_dict()["data_buffer"] == {"rules": "keep starred mail"}
# === EventBus Tests ===
@@ -0,0 +1,268 @@
"""Tests for webhook idempotency key support in AgentRuntime.trigger()."""
import asyncio
import time
from collections import OrderedDict
from unittest.mock import AsyncMock, MagicMock
import pytest
from framework.runtime.agent_runtime import AgentRuntime, AgentRuntimeConfig
def _make_runtime(ttl=300.0, max_keys=10000):
"""Create a minimal AgentRuntime with idempotency cache attributes.
Uses ``object.__new__`` to skip ``__init__`` and its heavy dependencies
(storage, LLM, skills); we only need the cache and config for these tests.
"""
runtime = object.__new__(AgentRuntime)
runtime._config = AgentRuntimeConfig(idempotency_ttl_seconds=ttl, idempotency_max_keys=max_keys)
runtime._running = True
runtime._lock = asyncio.Lock()
runtime._idempotency_keys = OrderedDict()
runtime._idempotency_times = {}
runtime._graphs = {}
runtime._active_graph_id = "primary"
runtime._graph_id = "primary"
runtime._streams = {}
runtime._entry_points = {}
return runtime
def _make_runtime_with_stream(ttl=300.0, max_keys=10000):
"""Create a mock runtime whose stream.execute() returns unique IDs."""
runtime = _make_runtime(ttl=ttl, max_keys=max_keys)
call_count = 0
async def _fake_execute(*args, **kwargs):
nonlocal call_count
call_count += 1
return f"session-{call_count:04d}"
stream = MagicMock()
stream.execute = _fake_execute
runtime._streams = {"webhook": stream}
runtime._entry_points = {"webhook": MagicMock()}
return runtime
class TestIdempotencyConfig:
"""Verify idempotency configuration defaults."""
def test_default_ttl(self):
config = AgentRuntimeConfig()
assert config.idempotency_ttl_seconds == 300.0
def test_default_max_keys(self):
config = AgentRuntimeConfig()
assert config.idempotency_max_keys == 10000
def test_custom_config(self):
config = AgentRuntimeConfig(idempotency_ttl_seconds=60.0, idempotency_max_keys=100)
assert config.idempotency_ttl_seconds == 60.0
assert config.idempotency_max_keys == 100
class TestIdempotencyCache:
"""Test the idempotency cache and pruning logic directly."""
def test_cache_stores_and_retrieves_key(self):
runtime = _make_runtime()
runtime._idempotency_keys["stripe-evt-123"] = "exec-001"
runtime._idempotency_times["stripe-evt-123"] = time.time()
assert runtime._idempotency_keys.get("stripe-evt-123") == "exec-001"
def test_cache_returns_none_for_unknown_key(self):
runtime = _make_runtime()
assert runtime._idempotency_keys.get("unknown") is None
def test_prune_removes_expired_keys(self):
runtime = _make_runtime(ttl=0.1)
runtime._idempotency_keys["old-key"] = "exec-old"
runtime._idempotency_times["old-key"] = time.time() - 1.0 # expired
runtime._prune_idempotency_keys()
assert "old-key" not in runtime._idempotency_keys
assert "old-key" not in runtime._idempotency_times
def test_prune_keeps_fresh_keys(self):
runtime = _make_runtime(ttl=300.0)
runtime._idempotency_keys["fresh-key"] = "exec-fresh"
runtime._idempotency_times["fresh-key"] = time.time()
runtime._prune_idempotency_keys()
assert "fresh-key" in runtime._idempotency_keys
def test_prune_respects_max_keys(self):
runtime = _make_runtime(max_keys=2)
for i in range(3):
key = f"key-{i}"
runtime._idempotency_keys[key] = f"exec-{i}"
runtime._idempotency_times[key] = time.time()
runtime._prune_idempotency_keys()
assert len(runtime._idempotency_keys) == 2
# Oldest (key-0) should be evicted
assert "key-0" not in runtime._idempotency_keys
assert "key-1" in runtime._idempotency_keys
assert "key-2" in runtime._idempotency_keys
def test_prune_evicts_fifo(self):
runtime = _make_runtime(max_keys=1)
runtime._idempotency_keys["first"] = "exec-1"
runtime._idempotency_times["first"] = time.time()
runtime._idempotency_keys["second"] = "exec-2"
runtime._idempotency_times["second"] = time.time()
runtime._prune_idempotency_keys()
assert len(runtime._idempotency_keys) == 1
assert "second" in runtime._idempotency_keys
assert "first" not in runtime._idempotency_keys
def test_mixed_expired_and_max_size(self):
runtime = _make_runtime(ttl=0.1, max_keys=2)
# Add expired key
runtime._idempotency_keys["expired"] = "exec-e"
runtime._idempotency_times["expired"] = time.time() - 1.0
# Add fresh keys
runtime._idempotency_keys["fresh-1"] = "exec-f1"
runtime._idempotency_times["fresh-1"] = time.time()
runtime._idempotency_keys["fresh-2"] = "exec-f2"
runtime._idempotency_times["fresh-2"] = time.time()
runtime._prune_idempotency_keys()
assert "expired" not in runtime._idempotency_keys
assert "fresh-1" in runtime._idempotency_keys
assert "fresh-2" in runtime._idempotency_keys
class TestTriggerIdempotency:
"""Tests for trigger() idempotency deduplication."""
def test_trigger_accepts_idempotency_key(self):
"""trigger() accepts idempotency_key as a keyword argument."""
import inspect
sig = inspect.signature(AgentRuntime.trigger)
assert "idempotency_key" in sig.parameters
def test_idempotency_key_defaults_to_none(self):
"""idempotency_key defaults to None (backward compatible)."""
import inspect
sig = inspect.signature(AgentRuntime.trigger)
assert sig.parameters["idempotency_key"].default is None
def test_trigger_and_wait_accepts_idempotency_key(self):
"""trigger_and_wait() also accepts idempotency_key."""
import inspect
sig = inspect.signature(AgentRuntime.trigger_and_wait)
assert "idempotency_key" in sig.parameters
def test_trigger_and_wait_idempotency_key_defaults_to_none(self):
"""trigger_and_wait() idempotency_key defaults to None."""
import inspect
sig = inspect.signature(AgentRuntime.trigger_and_wait)
assert sig.parameters["idempotency_key"].default is None
@pytest.mark.asyncio
async def test_duplicate_key_returns_cached_id(self):
"""Same idempotency key within TTL returns the cached execution ID."""
runtime = _make_runtime_with_stream()
first = await runtime.trigger("webhook", {}, idempotency_key="stripe-evt-001")
second = await runtime.trigger("webhook", {}, idempotency_key="stripe-evt-001")
assert first == second
assert first == "session-0001"
@pytest.mark.asyncio
async def test_different_keys_produce_different_ids(self):
"""Different idempotency keys start separate executions."""
runtime = _make_runtime_with_stream()
id_a = await runtime.trigger("webhook", {}, idempotency_key="evt-aaa")
id_b = await runtime.trigger("webhook", {}, idempotency_key="evt-bbb")
assert id_a != id_b
assert id_a == "session-0001"
assert id_b == "session-0002"
@pytest.mark.asyncio
async def test_none_key_always_starts_new_execution(self):
"""key=None (default) skips dedup — every call starts fresh."""
runtime = _make_runtime_with_stream()
id_1 = await runtime.trigger("webhook", {})
id_2 = await runtime.trigger("webhook", {})
assert id_1 != id_2
assert len(runtime._idempotency_keys) == 0 # nothing cached
@pytest.mark.asyncio
async def test_expired_key_allows_new_execution(self):
"""After TTL expires, the same key starts a new execution."""
runtime = _make_runtime_with_stream(ttl=0.1)
first = await runtime.trigger("webhook", {}, idempotency_key="evt-expire")
# Backdate the cached timestamp so the key looks expired
runtime._idempotency_times["evt-expire"] = time.time() - 1.0
second = await runtime.trigger("webhook", {}, idempotency_key="evt-expire")
assert first != second
assert first == "session-0001"
assert second == "session-0002"
@pytest.mark.asyncio
async def test_stream_not_found_does_not_cache(self):
"""If entry point doesn't exist, nothing is cached."""
runtime = _make_runtime_with_stream()
with pytest.raises(ValueError, match="not found"):
await runtime.trigger("nonexistent", {}, idempotency_key="evt-orphan")
assert "evt-orphan" not in runtime._idempotency_keys
@pytest.mark.asyncio
async def test_execute_error_does_not_cache(self):
"""If stream.execute() raises, nothing is cached so retries can go through."""
runtime = _make_runtime()
failing_stream = MagicMock()
failing_stream.execute = AsyncMock(side_effect=RuntimeError("stream not running"))
runtime._streams = {"webhook": failing_stream}
runtime._entry_points = {"webhook": MagicMock()}
with pytest.raises(RuntimeError, match="stream not running"):
await runtime.trigger("webhook", {}, idempotency_key="evt-123")
assert "evt-123" not in runtime._idempotency_keys
@pytest.mark.asyncio
async def test_cache_holds_real_execution_id(self):
"""Cached value matches the actual execution ID from execute()."""
runtime = _make_runtime_with_stream()
exec_id = await runtime.trigger("webhook", {}, idempotency_key="evt-real")
cached = runtime._idempotency_keys.get("evt-real")
assert cached == exec_id
assert cached == "session-0001"
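These tests pin down a small contract for `trigger()`: dedup only when a key is supplied, cache only after a successful start, and expire entries by TTL. A minimal sketch of that path, assuming the private `_idempotency_keys`/`_idempotency_times` dicts the tests poke at and a `_start_execution` stand-in for `stream.execute()`; the framework's real implementation may differ:

```python
import time
from typing import Any


class IdempotentTriggerSketch:
    """Illustrative only: mirrors the behavior the tests above assert."""

    def __init__(self, ttl: float = 3600.0) -> None:
        self._ttl = ttl
        self._idempotency_keys: dict[str, str] = {}     # key -> execution_id
        self._idempotency_times: dict[str, float] = {}  # key -> cached-at time

    async def trigger(
        self,
        entry_point_id: str,
        input_data: dict[str, Any],
        *,
        idempotency_key: str | None = None,
    ) -> str:
        now = time.time()
        if idempotency_key is not None:
            cached = self._idempotency_keys.get(idempotency_key)
            cached_at = self._idempotency_times.get(idempotency_key, 0.0)
            if cached is not None and now - cached_at < self._ttl:
                return cached  # duplicate within TTL: reuse the execution ID
        # Any error raised here propagates before caching, so a retry with
        # the same key is allowed to go through.
        execution_id = await self._start_execution(entry_point_id, input_data)
        if idempotency_key is not None:
            self._idempotency_keys[idempotency_key] = execution_id
            self._idempotency_times[idempotency_key] = now
        return execution_id

    async def _start_execution(self, entry_point_id: str, input_data: dict[str, Any]) -> str:
        raise NotImplementedError  # stands in for stream.execute()
```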
+8 -4
@@ -23,6 +23,7 @@ class Checkpoint(BaseModel):
checkpoint_id: str # Format: cp_{type}_{node_id}_{timestamp}
checkpoint_type: str # "node_start" | "node_complete" | "loop_iteration"
session_id: str
run_id: str | None = None
# Timestamps
created_at: str # ISO 8601 format
@@ -33,7 +34,7 @@ class Checkpoint(BaseModel):
execution_path: list[str] = Field(default_factory=list) # Nodes executed so far
# State snapshots
shared_memory: dict[str, Any] = Field(default_factory=dict) # Full SharedMemory._data
data_buffer: dict[str, Any] = Field(default_factory=dict) # Full DataBuffer._data
accumulated_outputs: dict[str, Any] = Field(default_factory=dict) # Outputs accumulated so far
# Execution metrics (for resuming quality tracking)
@@ -50,9 +51,10 @@ class Checkpoint(BaseModel):
cls,
checkpoint_type: str,
session_id: str,
run_id: str | None,
current_node: str,
execution_path: list[str],
shared_memory: dict[str, Any],
data_buffer: dict[str, Any],
next_node: str | None = None,
accumulated_outputs: dict[str, Any] | None = None,
metrics_snapshot: dict[str, Any] | None = None,
@@ -65,9 +67,10 @@ class Checkpoint(BaseModel):
Args:
checkpoint_type: Type of checkpoint (node_start, node_complete, etc.)
session_id: Session this checkpoint belongs to
run_id: Logical run this checkpoint belongs to
current_node: Node ID at checkpoint time
execution_path: List of node IDs executed so far
shared_memory: Full memory state snapshot
data_buffer: Full data buffer state snapshot
next_node: Next node to execute (for node_complete checkpoints)
accumulated_outputs: Outputs accumulated so far
metrics_snapshot: Execution metrics at checkpoint time
@@ -87,11 +90,12 @@ class Checkpoint(BaseModel):
checkpoint_id=checkpoint_id,
checkpoint_type=checkpoint_type,
session_id=session_id,
run_id=run_id,
created_at=datetime.now().isoformat(),
current_node=current_node,
next_node=next_node,
execution_path=execution_path,
shared_memory=shared_memory,
data_buffer=data_buffer,
accumulated_outputs=accumulated_outputs or {},
metrics_snapshot=metrics_snapshot or {},
is_clean=is_clean,
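For illustration, a hypothetical call to the extended factory. The parameter names come from the signature above; every value is made up:

```python
# Given the Checkpoint model above (import path omitted), a hypothetical call:
cp = Checkpoint.create(
    checkpoint_type="node_complete",
    session_id="session-0001",
    run_id="run-42",                  # new: ties the checkpoint to a logical run
    current_node="gather_info",
    next_node="write_report",
    execution_path=["entry", "gather_info"],
    data_buffer={"draft": "..."},     # renamed from shared_memory
)
assert cp.run_id == "run-42"
```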
+27 -10
@@ -9,7 +9,7 @@ from datetime import datetime
from enum import StrEnum
from typing import TYPE_CHECKING, Any
from pydantic import BaseModel, Field, computed_field
from pydantic import AliasChoices, BaseModel, Field, computed_field
if TYPE_CHECKING:
from framework.graph.executor import ExecutionResult
@@ -119,8 +119,11 @@ class SessionState(BaseModel):
# Result
result: SessionResult = Field(default_factory=SessionResult)
# Memory (for resumability)
memory: dict[str, Any] = Field(default_factory=dict)
# Data buffer (for resumability)
data_buffer: dict[str, Any] = Field(
default_factory=dict,
validation_alias=AliasChoices("data_buffer", "memory"),
)
# Metrics
metrics: SessionMetrics = Field(default_factory=SessionMetrics)
@@ -133,6 +136,7 @@ class SessionState(BaseModel):
# Input data (for debugging/replay)
input_data: dict[str, Any] = Field(default_factory=dict)
current_run_id: str | None = None
# Process ID of the owning process (for cross-process stale session detection)
pid: int | None = None
@@ -153,6 +157,16 @@ class SessionState(BaseModel):
model_config = {"extra": "allow"}
@property
def memory(self) -> dict[str, Any]:
"""Backward-compatible alias for legacy callers."""
return self.data_buffer
@memory.setter
def memory(self, value: dict[str, Any]) -> None:
"""Backward-compatible alias for legacy callers."""
self.data_buffer = value
@computed_field
@property
def duration_ms(self) -> int:
@@ -168,11 +182,10 @@ class SessionState(BaseModel):
def is_resumable(self) -> bool:
"""Can this session be resumed?
Every non-completed session is resumable. If resume_from/paused_at
aren't set, the executor falls back to the graph entry point —
so we don't gate on those. Even catastrophic failures are resumable.
Only sessions with a valid checkpoint can be resumed.
State-based resume (without a checkpoint) is no longer supported.
"""
return self.status != SessionStatus.COMPLETED
return self.is_resumable_from_checkpoint
@computed_field
@property
@@ -243,7 +256,7 @@ class SessionState(BaseModel):
error=result.error,
output=result.output,
),
memory=result.session_state.get("memory", {}) if result.session_state else {},
data_buffer=result.session_state.get("data_buffer", result.session_state.get("memory", {})) if result.session_state else {},
input_data=input_data or {},
)
@@ -293,7 +306,11 @@ class SessionState(BaseModel):
)
def to_session_state_dict(self) -> dict[str, Any]:
"""Convert to session_state format for GraphExecutor.execute()."""
"""Convert to session_state format for GraphExecutor.execute().
NOTE: state-based resume via paused_at/resume_from is deprecated.
Use checkpoint-based resume (``resume_from_checkpoint`` key) instead.
"""
# Derive resume target: explicit > last node in path > entry point
resume_from = (
self.progress.resume_from
@@ -303,7 +320,7 @@ class SessionState(BaseModel):
return {
"paused_at": resume_from,
"resume_from": resume_from,
"memory": self.memory,
"data_buffer": self.data_buffer,
"execution_path": self.progress.path,
"node_visit_counts": self.progress.node_visit_counts,
}
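The rename pattern above distills to a few lines of plain Pydantic v2; this sketch shortens the class name and drops the unrelated fields:

```python
from typing import Any

from pydantic import AliasChoices, BaseModel, Field


class StateSketch(BaseModel):
    # Canonical field; persisted payloads that still say "memory" keep loading.
    data_buffer: dict[str, Any] = Field(
        default_factory=dict,
        validation_alias=AliasChoices("data_buffer", "memory"),
    )

    @property
    def memory(self) -> dict[str, Any]:
        """Read-side alias so legacy callers keep working."""
        return self.data_buffer

    @memory.setter
    def memory(self, value: dict[str, Any]) -> None:
        self.data_buffer = value


legacy = StateSketch.model_validate({"memory": {"k": 1}})       # legacy key accepted
canonical = StateSketch.model_validate({"data_buffer": {"k": 2}})
assert legacy.memory == {"k": 1} and canonical.data_buffer == {"k": 2}
```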
+28 -47
@@ -4,14 +4,14 @@ HTTP API backend for the Hive agent framework. Built on **aiohttp**, fully async
## Architecture
Sessions are the primary entity. A session owns an EventBus + LLM and always has a queen executor. Workers are optional — they can be loaded into and unloaded from a session at any time.
Sessions are the primary entity. A session owns an EventBus + LLM and always has a queen executor. Graphs are optional and can be loaded into and unloaded from a session at any time.
```
Session {
event_bus # owned by session, shared with queen + worker
event_bus # owned by session, shared with queen + graph
llm # owned by session
queen_executor # always present
worker_runtime? # optional — loaded/unloaded independently
graph_runtime? # optional — loaded/unloaded independently
}
```
@@ -20,9 +20,9 @@ Session {
```
server/
├── app.py # Application factory, middleware, static serving
├── session_manager.py # Session lifecycle (create/load worker/unload/stop)
├── session_manager.py # Session lifecycle (create/load graph/unload/stop)
├── sse.py # Server-Sent Events helper
├── routes_sessions.py # Session lifecycle, info, worker-session browsing, discovery
├── routes_sessions.py # Session lifecycle, info, and discovery
├── routes_execution.py # Trigger, inject, chat, stop, resume, replay
├── routes_events.py # SSE event streaming
├── routes_graphs.py # Graph topology & node inspection
@@ -48,16 +48,16 @@ server/
Manages `Session` objects. Key methods:
- **`create_session()`** — creates EventBus + LLM, starts queen (no worker)
- **`create_session_with_worker()`** — one-step: session + worker + judge
- **`load_worker()`** — loads agent into existing session, starts judge
- **`unload_worker()`** — removes worker + judge, queen stays alive
- **`stop_session()`** — tears down everything (worker + queen)
- **`create_session()`** — creates EventBus + LLM, starts queen (no graph)
- **`create_session_with_worker_graph()`** — one-step: session + graph + judge
- **`load_graph()`** — loads agent into existing session, starts judge
- **`unload_graph()`** — removes graph + judge, queen stays alive
- **`stop_session()`** — tears down everything (graph + queen)
Three-conversation model:
1. **Queen** — persistent interactive executor for user chat (always present)
2. **Worker** — `AgentRuntime` that executes graphs (optional)
3. **Judge** — timer-driven background executor for health monitoring (active when worker is loaded)
3. **Judge** — timer-driven background executor for health monitoring (active when a graph is loaded)
### `sse.py` — SSE Helper
@@ -81,23 +81,23 @@ Returns agents grouped by category with metadata (name, description, node count,
|--------|-------|-------------|
| `POST` | `/api/sessions` | Create a session |
| `GET` | `/api/sessions` | List all active sessions |
| `GET` | `/api/sessions/{session_id}` | Session detail (includes entry points + graphs if worker loaded) |
| `GET` | `/api/sessions/{session_id}` | Session detail (includes entry points + graphs if a graph is loaded) |
| `DELETE` | `/api/sessions/{session_id}` | Stop session entirely |
**Create session** has two modes:
```jsonc
// Queen-only session (no worker)
// Queen-only session (no graph)
POST /api/sessions
{}
// or with custom ID:
{ "session_id": "my-custom-id" }
// Session with worker (one-step)
// Session with graph (one-step)
POST /api/sessions
{
"agent_path": "exports/my-agent",
"agent_id": "custom-worker-name", // optional
"agent_id": "custom-graph-name", // optional
"model": "claude-sonnet-4-20250514" // optional
}
```
@@ -108,24 +108,24 @@ POST /api/sessions
**Get session** returns `202` with `{"loading": true}` while loading, `404` if not found.
### Worker Lifecycle
### Graph Lifecycle
| Method | Route | Description |
|--------|-------|-------------|
| `POST` | `/api/sessions/{session_id}/worker` | Load a worker into session |
| `DELETE` | `/api/sessions/{session_id}/worker` | Unload worker (queen stays alive) |
| `POST` | `/api/sessions/{session_id}/graph` | Load a graph into session |
| `DELETE` | `/api/sessions/{session_id}/graph` | Unload graph (queen stays alive) |
```jsonc
// Load worker into existing session
POST /api/sessions/{session_id}/worker
// Load graph into existing session
POST /api/sessions/{session_id}/graph
{
"agent_path": "exports/my-agent",
"worker_id": "custom-name", // optional
"graph_id": "custom-name", // optional
"model": "..." // optional
}
// Unload worker
DELETE /api/sessions/{session_id}/worker
// Unload graph
DELETE /api/sessions/{session_id}/graph
```
### Execution Control
@@ -152,10 +152,10 @@ POST /api/sessions/{session_id}/trigger
// Returns: { "execution_id": "..." }
```
**Chat** routes messages with priority:
1. Worker awaiting input -> inject into worker node
2. Queen active -> inject into queen conversation
3. Neither available -> 503
**Chat** always delivers messages to the queen conversation.
Worker-originated questions are still shown in the UI, but the user's reply
is mediated by the queen, which can then relay it to the blocked worker via
`inject_message()` when appropriate.
```jsonc
POST /api/sessions/{session_id}/chat
@@ -206,7 +206,7 @@ GET /api/sessions/{session_id}/events?types=CLIENT_OUTPUT_DELTA,EXECUTION_COMPLE
Keepalive ping every 15s. Streams from the session's EventBus (covers both queen and worker events).
Default event types: `CLIENT_OUTPUT_DELTA`, `CLIENT_INPUT_REQUESTED`, `LLM_TEXT_DELTA`, `TOOL_CALL_STARTED`, `TOOL_CALL_COMPLETED`, `EXECUTION_STARTED`, `EXECUTION_COMPLETED`, `EXECUTION_FAILED`, `EXECUTION_PAUSED`, `NODE_LOOP_STARTED`, `NODE_LOOP_ITERATION`, `NODE_LOOP_COMPLETED`, `NODE_ACTION_PLAN`, `EDGE_TRAVERSED`, `GOAL_PROGRESS`, `QUEEN_INTERVENTION_REQUESTED`, `WORKER_ESCALATION_TICKET`, `NODE_INTERNAL_OUTPUT`, `NODE_STALLED`, `NODE_RETRY`, `NODE_TOOL_DOOM_LOOP`, `CONTEXT_COMPACTED`, `WORKER_LOADED`.
Default event types: `CLIENT_OUTPUT_DELTA`, `CLIENT_INPUT_REQUESTED`, `LLM_TEXT_DELTA`, `TOOL_CALL_STARTED`, `TOOL_CALL_COMPLETED`, `EXECUTION_STARTED`, `EXECUTION_COMPLETED`, `EXECUTION_FAILED`, `EXECUTION_PAUSED`, `NODE_LOOP_STARTED`, `NODE_LOOP_ITERATION`, `NODE_LOOP_COMPLETED`, `NODE_ACTION_PLAN`, `EDGE_TRAVERSED`, `GOAL_PROGRESS`, `NODE_INTERNAL_OUTPUT`, `NODE_STALLED`, `NODE_RETRY`, `NODE_TOOL_DOOM_LOOP`, `CONTEXT_COMPACTED`, `WORKER_GRAPH_LOADED`.
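A hedged consumer sketch for this stream; aiohttp ships no SSE client helper, so this reads `data:` lines raw. The URL and event filter mirror the GET example above, and the base URL is illustrative:

```python
import aiohttp


async def stream_events(base: str, session_id: str) -> None:
    url = (
        f"{base}/api/sessions/{session_id}/events"
        "?types=CLIENT_OUTPUT_DELTA,EXECUTION_COMPLETED"
    )
    timeout = aiohttp.ClientTimeout(total=None)  # SSE connections stay open
    async with aiohttp.ClientSession(timeout=timeout) as http:
        async with http.get(url) as resp:
            async for raw in resp.content:  # StreamReader yields raw lines
                line = raw.decode("utf-8").rstrip("\n")
                if line.startswith("data: "):
                    print(line[len("data: "):])
```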
### Session Info
@@ -254,25 +254,6 @@ GET .../nodes/{node_id}/logs?session_id=ws_id&level=all
Log levels: `summary` (run stats), `details` (per-node execution), `tools` (tool calls + LLM text).
### Worker Session Browsing
Browse persisted execution runs on disk.
| Method | Route | Description |
|--------|-------|-------------|
| `GET` | `/api/sessions/{session_id}/worker-sessions` | List worker sessions |
| `GET` | `/api/sessions/{session_id}/worker-sessions/{ws_id}` | Worker session state |
| `DELETE` | `/api/sessions/{session_id}/worker-sessions/{ws_id}` | Delete worker session |
| `GET` | `/api/sessions/{session_id}/worker-sessions/{ws_id}/checkpoints` | List checkpoints |
| `POST` | `/api/sessions/{session_id}/worker-sessions/{ws_id}/checkpoints/{cp_id}/restore` | Restore from checkpoint |
| `GET` | `/api/sessions/{session_id}/worker-sessions/{ws_id}/messages` | Get conversation messages |
**Messages** support filtering:
```
GET .../messages?node_id=gather_info # filter by node
GET .../messages?client_only=true # only user inputs + client-facing assistant outputs
```
### Credentials
| Method | Route | Description |
+33 -24
@@ -94,29 +94,6 @@ def sessions_dir(session: Session) -> Path:
return Path.home() / ".hive" / "agents" / agent_name / "sessions"
def cold_sessions_dir(session_id: str) -> Path | None:
"""Resolve the worker sessions directory from disk for a cold/stopped session.
Reads agent_path from the queen session's meta.json to find the agent name,
then returns ~/.hive/agents/{agent_name}/sessions/.
Returns None if meta.json is missing or has no agent_path.
"""
import json
meta_path = Path.home() / ".hive" / "queen" / "session" / session_id / "meta.json"
if not meta_path.exists():
return None
try:
meta = json.loads(meta_path.read_text(encoding="utf-8"))
agent_path = meta.get("agent_path")
if not agent_path:
return None
agent_name = Path(agent_path).name
return Path.home() / ".hive" / "agents" / agent_name / "sessions"
except (json.JSONDecodeError, OSError):
return None
# Allowed CORS origins (localhost on any port)
_CORS_ORIGINS = {"http://localhost", "http://127.0.0.1"}
@@ -183,11 +160,42 @@ async def handle_health(request: web.Request) -> web.Response:
{
"status": "ok",
"sessions": len(sessions),
"agents_loaded": sum(1 for s in sessions if s.worker_runtime is not None),
"agents_loaded": sum(1 for s in sessions if s.graph_runtime is not None),
}
)
async def handle_browser_status(request: web.Request) -> web.Response:
"""GET /api/browser/status — proxy the GCU bridge status check server-side.
Checks http://127.0.0.1:9230/status so the browser never makes a
cross-origin request that would log ERR_CONNECTION_REFUSED in the console.
"""
import asyncio
bridge_port = int(os.environ.get("HIVE_BRIDGE_PORT", "9229"))
status_port = bridge_port + 1
try:
reader, writer = await asyncio.wait_for(
asyncio.open_connection("127.0.0.1", status_port), timeout=0.5
)
writer.write(b"GET /status HTTP/1.0\r\nHost: 127.0.0.1\r\n\r\n")
await writer.drain()
raw = await asyncio.wait_for(reader.read(512), timeout=0.5)
writer.close()
# Parse JSON body after the blank line
if b"\r\n\r\n" in raw:
body = raw.split(b"\r\n\r\n", 1)[1]
import json
data = json.loads(body)
return web.json_response({"bridge": True, "connected": data.get("connected", False)})
except Exception:
pass
return web.json_response({"bridge": False, "connected": False})
def create_app(model: str | None = None) -> web.Application:
"""Create and configure the aiohttp Application.
@@ -233,6 +241,7 @@ def create_app(model: str | None = None) -> web.Application:
# Health check
app.router.add_get("/api/health", handle_health)
app.router.add_get("/api/browser/status", handle_browser_status)
# Register route modules
from framework.server.routes_credentials import register_routes as register_credential_routes
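Client-side, the UI can now poll the proxy instead of hitting the bridge itself. A sketch with an illustrative base URL; the `bridge`/`connected` response keys match the handler above:

```python
import aiohttp


async def bridge_status(base: str = "http://localhost:8000") -> tuple[bool, bool]:
    # Polling the server-side proxy avoids the cross-origin request that used
    # to log ERR_CONNECTION_REFUSED in the browser console.
    async with aiohttp.ClientSession() as http:
        async with http.get(f"{base}/api/browser/status") as resp:
            data = await resp.json()
            return data["bridge"], data["connected"]
```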
+94 -31
@@ -36,6 +36,7 @@ async def create_queen(
)
from framework.agents.queen.nodes import (
_QUEEN_BUILDING_TOOLS,
_QUEEN_EDITING_TOOLS,
_QUEEN_PLANNING_TOOLS,
_QUEEN_RUNNING_TOOLS,
_QUEEN_STAGING_TOOLS,
@@ -44,16 +45,20 @@ async def create_queen(
_planning_knowledge,
_queen_behavior_always,
_queen_behavior_building,
_queen_behavior_editing,
_queen_behavior_planning,
_queen_behavior_running,
_queen_behavior_staging,
_queen_identity_building,
_queen_identity_planning,
_queen_identity_running,
_queen_identity_staging,
_queen_character_core,
_queen_identity_editing,
_queen_phase_7,
_queen_role_building,
_queen_role_planning,
_queen_role_running,
_queen_role_staging,
_queen_style,
_queen_tools_building,
_queen_tools_editing,
_queen_tools_planning,
_queen_tools_running,
_queen_tools_staging,
@@ -70,8 +75,6 @@ async def create_queen(
QueenPhaseState,
register_queen_lifecycle_tools,
)
from framework.tools.queen_memory_tools import register_queen_memory_tools
hive_home = Path.home() / ".hive"
# ---- Tool registry ------------------------------------------------
@@ -141,19 +144,14 @@ async def create_queen(
phase_state=phase_state,
)
# ---- Episodic memory tools (always registered) ---------------------
register_queen_memory_tools(queen_registry)
# ---- Monitoring tools (only when worker is loaded) ----------------
if session.worker_runtime:
if session.graph_runtime:
from framework.tools.worker_monitoring_tools import register_worker_monitoring_tools
register_worker_monitoring_tools(
queen_registry,
session.event_bus,
session.worker_path,
stream_id="queen",
worker_graph_id=session.worker_runtime._graph_id,
worker_graph_id=session.graph_runtime._graph_id,
default_session_id=session.id,
)
@@ -165,6 +163,7 @@ async def create_queen(
building_names = set(_QUEEN_BUILDING_TOOLS)
staging_names = set(_QUEEN_STAGING_TOOLS)
running_names = set(_QUEEN_RUNNING_TOOLS)
editing_names = set(_QUEEN_EDITING_TOOLS)
registered_names = {t.name for t in queen_tools}
missing_building = building_names - registered_names
@@ -181,11 +180,20 @@ async def create_queen(
phase_state.building_tools = [t for t in queen_tools if t.name in building_names]
phase_state.staging_tools = [t for t in queen_tools if t.name in staging_names]
phase_state.running_tools = [t for t in queen_tools if t.name in running_names]
phase_state.editing_tools = [t for t in queen_tools if t.name in editing_names]
# ---- Cross-session memory ----------------------------------------
from framework.agents.queen.queen_memory import seed_if_missing
from framework.agents.queen.queen_memory_v2 import (
colony_memory_dir,
global_memory_dir,
init_memory_dir,
)
seed_if_missing()
colony_dir = colony_memory_dir(session.id)
global_dir = global_memory_dir()
init_memory_dir(colony_dir, migrate_legacy=True)
init_memory_dir(global_dir)
phase_state.global_memory_dir = global_dir
# ---- Compose phase-specific prompts ------------------------------
_orig_node = _queen_graph.nodes[0]
@@ -199,7 +207,9 @@ async def create_queen(
)
_planning_body = (
_queen_style
_queen_character_core
+ _queen_role_planning
+ _queen_style
+ _shared_building_knowledge
+ _queen_tools_planning
+ _queen_behavior_always
@@ -207,10 +217,12 @@ async def create_queen(
+ _planning_knowledge
+ worker_identity
)
phase_state.prompt_planning = _queen_identity_planning + _planning_body
phase_state.prompt_planning = _planning_body
_building_body = (
_queen_style
_queen_character_core
+ _queen_role_building
+ _queen_style
+ _shared_building_knowledge
+ _queen_tools_building
+ _queen_behavior_always
@@ -220,9 +232,10 @@ async def create_queen(
+ _appendices
+ worker_identity
)
phase_state.prompt_building = _queen_identity_building + _building_body
phase_state.prompt_building = _building_body
phase_state.prompt_staging = (
_queen_identity_staging
_queen_character_core
+ _queen_role_staging
+ _queen_style
+ _queen_tools_staging
+ _queen_behavior_always
@@ -230,15 +243,25 @@ async def create_queen(
+ worker_identity
)
phase_state.prompt_running = (
_queen_identity_running
_queen_character_core
+ _queen_role_running
+ _queen_style
+ _queen_tools_running
+ _queen_behavior_always
+ _queen_behavior_running
+ worker_identity
)
phase_state.prompt_editing = (
_queen_identity_editing
+ _queen_style
+ _queen_tools_editing
+ _queen_behavior_always
+ _queen_behavior_editing
+ worker_identity
)
# ---- Default skill protocols -------------------------------------
_queen_skill_dirs: list[str] = []
try:
from framework.skills.manager import SkillsManager, SkillsManagerConfig
@@ -249,6 +272,7 @@ async def create_queen(
_queen_skills_mgr.load()
phase_state.protocols_prompt = _queen_skills_mgr.protocols_prompt
phase_state.skills_catalog_prompt = _queen_skills_mgr.skills_catalog_prompt
_queen_skill_dirs = _queen_skills_mgr.allowlisted_dirs
except Exception:
logger.debug("Queen skill loading failed (non-fatal)", exc_info=True)
@@ -257,18 +281,26 @@ async def create_queen(
_session_event_bus = session.event_bus
async def _persona_hook(ctx: HookContext) -> HookResult | None:
persona = await select_expert_persona(ctx.trigger or "", _session_llm)
if not persona:
from framework.agents.queen.queen_memory import format_for_injection
memory_context = format_for_injection()
result = await select_expert_persona(
ctx.trigger or "", _session_llm, memory_context=memory_context
)
if not result:
return None
# Store on phase_state so persona/style persist across dynamic prompt refreshes
phase_state.persona_prefix = result.persona_prefix
phase_state.style_directive = result.style_directive
if _session_event_bus is not None:
await _session_event_bus.publish(
AgentEvent(
type=EventType.QUEEN_PERSONA_SELECTED,
stream_id="queen",
data={"persona": persona},
data={"persona": result.persona_prefix},
)
)
return HookResult(system_prompt=persona + "\n\n" + phase_state.get_current_prompt())
return HookResult(system_prompt=phase_state.get_current_prompt())
# ---- Graph preparation -------------------------------------------
initial_prompt_text = phase_state.get_current_prompt()
@@ -299,7 +331,9 @@ async def create_queen(
queen_runtime = Runtime(hive_home / "queen")
async def _queen_loop():
logger.debug("[_queen_loop] Starting queen loop for session %s", session.id)
try:
logger.debug("[_queen_loop] Creating GraphExecutor...")
executor = GraphExecutor(
runtime=queen_runtime,
llm=session.llm,
@@ -313,8 +347,12 @@ async def create_queen(
dynamic_tools_provider=phase_state.get_current_tools,
dynamic_prompt_provider=phase_state.get_current_prompt,
iteration_metadata_provider=lambda: {"phase": phase_state.phase},
skill_dirs=_queen_skill_dirs,
protocols_prompt=phase_state.protocols_prompt,
skills_catalog_prompt=phase_state.skills_catalog_prompt,
)
session.queen_executor = executor
logger.debug("[_queen_loop] GraphExecutor created and stored in session.queen_executor")
# Wire inject_notification so phase switches notify the queen LLM
async def _inject_phase_notification(content: str) -> None:
@@ -324,7 +362,8 @@ async def create_queen(
phase_state.inject_notification = _inject_phase_notification
# Auto-switch to staging when worker execution finishes
# Auto-switch to editing when worker execution finishes.
# The worker stays loaded — queen can tweak config and re-run.
async def _on_worker_done(event):
if event.stream_id == "queen":
return
@@ -345,21 +384,24 @@ async def create_queen(
"[WORKER_TERMINAL] Worker finished successfully.\n"
f"Output:{_out}\n"
"Report this to the user. "
"Ask if they want to continue with another run."
"Ask if they want to re-run with different input "
"or tweak the configuration."
)
else: # EXECUTION_FAILED
error = event.data.get("error", "Unknown error")
notification = (
"[WORKER_TERMINAL] Worker failed.\n"
f"Error: {error}\n"
"Report this to the user and help them troubleshoot."
"Report this to the user and help them troubleshoot. "
"You can re-run with different input or escalate to "
"building/planning if code changes are needed."
)
node = executor.node_registry.get("queen")
if node is not None and hasattr(node, "inject_event"):
await node.inject_event(notification)
await phase_state.switch_to_staging(source="auto")
await phase_state.switch_to_editing(source="auto")
session.event_bus.subscribe(
event_types=[EventType.EXECUTION_COMPLETED, EventType.EXECUTION_FAILED],
@@ -367,18 +409,34 @@ async def create_queen(
)
session_manager._subscribe_worker_handoffs(session, executor)
# ---- Reflection + recall memory subscriptions ----------------
from framework.agents.queen.reflection_agent import subscribe_reflection_triggers
_reflection_subs = await subscribe_reflection_triggers(
session.event_bus,
queen_dir,
session.llm,
memory_dir=colony_dir,
phase_state=phase_state,
)
# Store sub IDs on session for teardown.
session.memory_reflection_subs = _reflection_subs
logger.info(
"Queen starting in %s phase with %d tools: %s",
phase_state.phase,
len(phase_state.get_current_tools()),
[t.name for t in phase_state.get_current_tools()],
)
logger.debug("[_queen_loop] Calling executor.execute()...")
result = await executor.execute(
graph=queen_graph,
goal=queen_goal,
input_data={"greeting": initial_prompt or "Session started."},
session_state={"resume_session_id": session.id},
)
logger.debug("[_queen_loop] executor.execute() returned with success=%s", result.success)
if result.success:
logger.warning("Queen executor returned (should be forever-alive)")
else:
@@ -386,9 +444,14 @@ async def create_queen(
"Queen executor failed: %s",
result.error or "(no error message)",
)
except Exception:
logger.error("Queen conversation crashed", exc_info=True)
except asyncio.CancelledError:
logger.info("[_queen_loop] Queen loop cancelled (normal shutdown)")
raise
except Exception as e:
logger.exception("[_queen_loop] Queen conversation crashed: %s", e)
raise
finally:
logger.warning("[_queen_loop] Queen loop exiting — clearing queen_executor for session '%s'", session.id)
session.queen_executor = None
return asyncio.create_task(_queen_loop())
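The loop's shape is the lifecycle fix in miniature: propagate cancellation, log and re-raise crashes, and always clear `session.queen_executor` in `finally` so `/chat` can detect a dead queen and revive her. A stripped-down sketch; `execute` stands in for the real `GraphExecutor.execute(...)` call:

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


async def queen_loop(session, execute) -> None:
    """session needs only a queen_executor attribute; execute is an async
    callable standing in for GraphExecutor.execute(...)."""
    try:
        session.queen_executor = execute  # the real code stores the executor
        await execute()
    except asyncio.CancelledError:
        logger.info("queen loop cancelled (normal shutdown)")
        raise  # propagate so task cancellation stays visible
    except Exception:
        logger.exception("queen conversation crashed")
        raise
    finally:
        session.queen_executor = None  # lets /chat see the queen is dead
```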
+6 -8
@@ -30,15 +30,13 @@ DEFAULT_EVENT_TYPES = [
EventType.NODE_ACTION_PLAN,
EventType.EDGE_TRAVERSED,
EventType.GOAL_PROGRESS,
EventType.QUEEN_INTERVENTION_REQUESTED,
EventType.WORKER_ESCALATION_TICKET,
EventType.NODE_INTERNAL_OUTPUT,
EventType.NODE_STALLED,
EventType.NODE_RETRY,
EventType.NODE_TOOL_DOOM_LOOP,
EventType.CONTEXT_COMPACTED,
EventType.CONTEXT_USAGE_UPDATED,
EventType.WORKER_LOADED,
EventType.WORKER_GRAPH_LOADED,
EventType.CREDENTIALS_REQUIRED,
EventType.SUBAGENT_REPORT,
EventType.QUEEN_PHASE_CHANGED,
@@ -102,7 +100,7 @@ async def handle_events(request: web.Request) -> web.StreamResponse:
"node_loop_iteration",
"node_loop_started",
"credentials_required",
"worker_loaded",
"worker_graph_loaded",
"queen_phase_changed",
}
@@ -171,10 +169,10 @@ async def handle_events(request: web.Request) -> web.StreamResponse:
# currently running. This covers the case where the user navigated away
# and back — the localStorage snapshot is stale, and the ring-buffer
# replay may not include the original node_loop_started events.
worker_runtime = getattr(session, "worker_runtime", None)
if worker_runtime and getattr(worker_runtime, "is_running", False):
graph_runtime = getattr(session, "graph_runtime", None)
if graph_runtime and getattr(graph_runtime, "is_running", False):
try:
for stream_info in worker_runtime.get_active_streams():
for stream_info in graph_runtime.get_active_streams():
graph_id = stream_info.get("graph_id")
stream_id = stream_info.get("stream_id", "default")
for exec_id in stream_info.get("active_execution_ids", []):
@@ -192,7 +190,7 @@ async def handle_events(request: web.Request) -> web.StreamResponse:
pass
# Find the currently executing node via the executor
for _gid, reg in worker_runtime._graphs.items():
for _gid, reg in graph_runtime._graphs.items():
if _gid != graph_id:
continue
for _ep_id, stream in reg.streams.items():
+117 -88
@@ -8,12 +8,24 @@ from typing import Any
from aiohttp import web
from framework.credentials.validation import validate_agent_credentials
from framework.graph.conversation import LEGACY_RUN_ID
from framework.server.app import resolve_session, safe_path_segment, sessions_dir
from framework.server.routes_sessions import _credential_error_response
logger = logging.getLogger(__name__)
def _load_checkpoint_run_id(cp_path) -> str | None:
try:
checkpoint = json.loads(cp_path.read_text(encoding="utf-8"))
except (json.JSONDecodeError, OSError):
return None
run_id = checkpoint.get("run_id")
if isinstance(run_id, str) and run_id:
return run_id
return LEGACY_RUN_ID
async def handle_trigger(request: web.Request) -> web.Response:
"""POST /api/sessions/{session_id}/trigger — start an execution.
@@ -23,8 +35,8 @@ async def handle_trigger(request: web.Request) -> web.Response:
if err:
return err
if not session.worker_runtime:
return web.json_response({"error": "No worker loaded in this session"}, status=503)
if not session.graph_runtime:
return web.json_response({"error": "No graph loaded in this session"}, status=503)
# Validate credentials before running — deferred from load time to avoid
# showing the modal before the user clicks Run. Runs in executor because
@@ -59,7 +71,7 @@ async def handle_trigger(request: web.Request) -> web.Response:
if "resume_session_id" not in session_state:
session_state["resume_session_id"] = session.id
execution_id = await session.worker_runtime.trigger(
execution_id = await session.graph_runtime.trigger(
entry_point_id,
input_data,
session_state=session_state,
@@ -87,8 +99,8 @@ async def handle_inject(request: web.Request) -> web.Response:
if err:
return err
if not session.worker_runtime:
return web.json_response({"error": "No worker loaded in this session"}, status=503)
if not session.graph_runtime:
return web.json_response({"error": "No graph loaded in this session"}, status=503)
body = await request.json()
node_id = body.get("node_id")
@@ -98,15 +110,16 @@ async def handle_inject(request: web.Request) -> web.Response:
if not node_id:
return web.json_response({"error": "node_id is required"}, status=400)
delivered = await session.worker_runtime.inject_input(node_id, content, graph_id=graph_id)
delivered = await session.graph_runtime.inject_input(node_id, content, graph_id=graph_id)
return web.json_response({"delivered": delivered})
async def handle_chat(request: web.Request) -> web.Response:
"""POST /api/sessions/{session_id}/chat — send a message to the queen.
The input box is permanently connected to the queen agent.
Worker input is handled separately via /worker-input.
The input box is permanently connected to the queen agent, including
replies to worker-originated questions. The queen decides whether to
relay the user's answer back into the worker via inject_message().
Body: {"message": "hello", "images": [{"type": "image_url", "image_url": {"url": "data:..."}}]}
@@ -115,20 +128,52 @@ async def handle_chat(request: web.Request) -> web.Response:
"""
session, err = resolve_session(request)
if err:
logger.debug("[handle_chat] Session resolution failed: %s", err)
return err
body = await request.json()
message = body.get("message", "")
display_message = body.get("display_message")
image_content = body.get("images") or None # list[dict] | None
logger.debug("[handle_chat] session_id=%s, message_len=%d, has_images=%s",
session.id, len(message), bool(image_content))
logger.debug("[handle_chat] session.queen_executor=%s", session.queen_executor)
if not message and not image_content:
return web.json_response({"error": "message is required"}, status=400)
queen_executor = session.queen_executor
if queen_executor is not None:
logger.debug("[handle_chat] Queen executor exists, looking for 'queen' node...")
logger.debug("[handle_chat] node_registry type=%s, id=%s", type(queen_executor.node_registry), id(queen_executor.node_registry))
logger.debug("[handle_chat] node_registry keys: %s", list(queen_executor.node_registry.keys()))
node = queen_executor.node_registry.get("queen")
logger.debug("[handle_chat] node=%s, node_type=%s", node, type(node).__name__ if node else None)
logger.debug("[handle_chat] has_inject_event=%s", hasattr(node, "inject_event") if node else False)
# Race condition: executor exists but node not created yet (still initializing)
if node is None and session.queen_task is not None and not session.queen_task.done():
logger.warning("[handle_chat] Queen executor exists but node not ready yet (initializing). Waiting...")
# Wait a short time for initialization to progress
import asyncio
for _ in range(50): # Max 5 seconds
await asyncio.sleep(0.1)
node = queen_executor.node_registry.get("queen")
if node is not None:
logger.debug("[handle_chat] Node appeared after waiting")
break
else:
logger.error("[handle_chat] Node still not available after 5s wait")
if node is not None and hasattr(node, "inject_event"):
await node.inject_event(message, is_client_input=True, image_content=image_content)
try:
logger.debug("[handle_chat] Calling node.inject_event()...")
await node.inject_event(message, is_client_input=True, image_content=image_content)
logger.debug("[handle_chat] inject_event() completed successfully")
except Exception as e:
logger.exception("[handle_chat] inject_event() failed: %s", e)
raise
# Publish to EventBus so the session event log captures user messages
from framework.runtime.event_bus import AgentEvent, EventType
@@ -139,7 +184,9 @@ async def handle_chat(request: web.Request) -> web.Response:
node_id="queen",
execution_id=session.id,
data={
"content": message,
# Allow the UI to display a user-friendly echo while
# the queen receives a richer relay wrapper.
"content": display_message if display_message is not None else message,
"image_count": len(image_content) if image_content else 0,
},
)
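A hypothetical client call showing the split: `message` is what the queen receives, `display_message` is the user-facing echo written to the event log. The wrapper text and base URL are illustrative:

```python
import aiohttp


async def relay_worker_reply(base: str, session_id: str) -> None:
    async with aiohttp.ClientSession() as http:
        await http.post(
            f"{base}/api/sessions/{session_id}/chat",
            json={
                "message": "[WORKER_REPLY] The user answered: use the staging DB.",
                "display_message": "use the staging DB",
            },
        )
```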
@@ -150,11 +197,30 @@ async def handle_chat(request: web.Request) -> web.Response:
"delivered": True,
}
)
else:
if node is None:
logger.error("[handle_chat] CRITICAL: Queen node is None! node_registry has %d keys: %s, queen_task=%s, queen_task_done=%s",
len(queen_executor.node_registry), list(queen_executor.node_registry.keys()),
session.queen_task, session.queen_task.done() if session.queen_task else None)
else:
logger.error("[handle_chat] CRITICAL: Queen node exists but missing inject_event! node_attrs=%s",
[a for a in dir(node) if not a.startswith('_')])
# Queen is dead — try to revive her
logger.warning(
"[handle_chat] Queen is dead for session '%s', reviving on /chat request", session.id
)
manager: Any = request.app["manager"]
try:
await manager.revive_queen(session, initial_prompt=message)
logger.debug("[handle_chat] Calling manager.revive_queen()...")
await manager.revive_queen(session)
logger.debug("[handle_chat] revive_queen() completed successfully")
# Inject the user's message into the revived queen's queue so the
# event loop drains it and clears any restored pending_input_state.
_revived_executor = session.queen_executor
_revived_node = _revived_executor.node_registry.get("queen") if _revived_executor else None
if _revived_node is not None and hasattr(_revived_node, "inject_event"):
await _revived_node.inject_event(message, is_client_input=True, image_content=image_content)
return web.json_response(
{
"status": "queen_revived",
@@ -162,7 +228,7 @@ async def handle_chat(request: web.Request) -> web.Response:
}
)
except Exception as e:
logger.error("Failed to revive queen: %s", e)
logger.exception("[handle_chat] Failed to revive queen: %s", e)
return web.json_response({"error": "Queen not available"}, status=503)
@@ -193,6 +259,10 @@ async def handle_queen_context(request: web.Request) -> web.Response:
return web.json_response({"status": "queued", "delivered": True})
# Queen is dead — try to revive her
logger.warning(
"Queen is dead for session '%s', reviving on /queen-context request",
session.id,
)
manager: Any = request.app["manager"]
try:
await manager.revive_queen(session)
@@ -209,56 +279,16 @@ async def handle_queen_context(request: web.Request) -> web.Response:
return web.json_response({"error": "Queen not available"}, status=503)
async def handle_worker_input(request: web.Request) -> web.Response:
"""POST /api/sessions/{session_id}/worker-input — send input to waiting worker node.
Auto-discovers the worker node currently awaiting input and injects the message.
Returns 404 if no worker node is awaiting input.
Body: {"message": "..."}
"""
session, err = resolve_session(request)
if err:
return err
body = await request.json()
message = body.get("message", "")
if not message:
return web.json_response({"error": "message is required"}, status=400)
if not session.worker_runtime:
return web.json_response({"error": "No worker loaded"}, status=503)
node_id, graph_id = session.worker_runtime.find_awaiting_node()
if not node_id:
return web.json_response({"error": "No worker node awaiting input"}, status=404)
delivered = await session.worker_runtime.inject_input(
node_id,
message,
graph_id=graph_id,
is_client_input=True,
)
return web.json_response(
{
"status": "injected",
"node_id": node_id,
"delivered": delivered,
}
)
async def handle_goal_progress(request: web.Request) -> web.Response:
"""GET /api/sessions/{session_id}/goal-progress — evaluate goal progress."""
session, err = resolve_session(request)
if err:
return err
if not session.worker_runtime:
return web.json_response({"error": "No worker loaded in this session"}, status=503)
if not session.graph_runtime:
return web.json_response({"error": "No graph loaded in this session"}, status=503)
progress = await session.worker_runtime.get_goal_progress()
progress = await session.graph_runtime.get_goal_progress()
return web.json_response(progress, dumps=lambda obj: json.dumps(obj, default=str))
@@ -271,8 +301,8 @@ async def handle_resume(request: web.Request) -> web.Response:
if err:
return err
if not session.worker_runtime:
return web.json_response({"error": "No worker loaded in this session"}, status=503)
if not session.graph_runtime:
return web.json_response({"error": "No graph loaded in this session"}, status=503)
body = await request.json()
worker_session_id = body.get("session_id")
@@ -296,30 +326,29 @@ async def handle_resume(request: web.Request) -> web.Response:
except (json.JSONDecodeError, OSError) as e:
return web.json_response({"error": f"Failed to read session: {e}"}, status=500)
if checkpoint_id:
resume_session_state = {
"resume_session_id": worker_session_id,
"resume_from_checkpoint": checkpoint_id,
}
else:
progress = state.get("progress", {})
paused_at = progress.get("paused_at") or progress.get("resume_from")
resume_session_state = {
"resume_session_id": worker_session_id,
"memory": state.get("memory", {}),
"execution_path": progress.get("path", []),
"node_visit_counts": progress.get("node_visit_counts", {}),
}
if paused_at:
resume_session_state["paused_at"] = paused_at
if not checkpoint_id:
return web.json_response(
{"error": "checkpoint_id is required; non-checkpoint resume is no longer supported"},
status=400,
)
entry_points = session.worker_runtime.get_entry_points()
cp_path = session_dir / "checkpoints" / f"{checkpoint_id}.json"
if not cp_path.exists():
return web.json_response({"error": "Checkpoint not found"}, status=404)
resume_session_state = {
"resume_session_id": worker_session_id,
"resume_from_checkpoint": checkpoint_id,
"run_id": _load_checkpoint_run_id(cp_path),
}
entry_points = session.graph_runtime.get_entry_points()
if not entry_points:
return web.json_response({"error": "No entry points available"}, status=400)
input_data = state.get("input_data", {})
execution_id = await session.worker_runtime.trigger(
execution_id = await session.graph_runtime.trigger(
entry_points[0].id,
input_data=input_data,
session_state=resume_session_state,
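Under the tightened contract a resume request must name a checkpoint. A hedged client sketch: host and IDs are illustrative, the `checkpoint_id` body key is inferred from the handler above, and the response is assumed to mirror the trigger route's `execution_id` payload:

```python
import aiohttp


async def resume_from_checkpoint(
    base: str, session_id: str, worker_session_id: str, checkpoint_id: str
) -> str:
    async with aiohttp.ClientSession() as http:
        async with http.post(
            f"{base}/api/sessions/{session_id}/resume",
            json={
                "session_id": worker_session_id,  # the persisted run on disk
                "checkpoint_id": checkpoint_id,   # now mandatory (400 without it)
            },
        ) as resp:
            resp.raise_for_status()
            return (await resp.json())["execution_id"]
```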
@@ -337,7 +366,7 @@ async def handle_resume(request: web.Request) -> web.Response:
async def handle_pause(request: web.Request) -> web.Response:
"""POST /api/sessions/{session_id}/pause — pause the worker (queen stays alive).
Mirrors the queen's stop_worker() tool: cancels all active worker
Mirrors the queen's stop_graph() tool: cancels all active worker
executions, pauses timers so nothing auto-restarts, but does NOT
touch the queen so she can observe and react to the pause.
"""
@@ -345,10 +374,10 @@ async def handle_pause(request: web.Request) -> web.Response:
if err:
return err
if not session.worker_runtime:
return web.json_response({"error": "No worker loaded in this session"}, status=503)
if not session.graph_runtime:
return web.json_response({"error": "No graph loaded in this session"}, status=503)
runtime = session.worker_runtime
runtime = session.graph_runtime
cancelled = []
for graph_id in runtime.list_graphs():
@@ -397,8 +426,8 @@ async def handle_stop(request: web.Request) -> web.Response:
if err:
return err
if not session.worker_runtime:
return web.json_response({"error": "No worker loaded in this session"}, status=503)
if not session.graph_runtime:
return web.json_response({"error": "No graph loaded in this session"}, status=503)
body = await request.json()
execution_id = body.get("execution_id")
@@ -406,8 +435,8 @@ async def handle_stop(request: web.Request) -> web.Response:
if not execution_id:
return web.json_response({"error": "execution_id is required"}, status=400)
for graph_id in session.worker_runtime.list_graphs():
reg = session.worker_runtime.get_graph_registration(graph_id)
for graph_id in session.graph_runtime.list_graphs():
reg = session.graph_runtime.get_graph_registration(graph_id)
if reg is None:
continue
for _ep_id, stream in reg.streams.items():
@@ -452,8 +481,8 @@ async def handle_replay(request: web.Request) -> web.Response:
if err:
return err
if not session.worker_runtime:
return web.json_response({"error": "No worker loaded in this session"}, status=503)
if not session.graph_runtime:
return web.json_response({"error": "No graph loaded in this session"}, status=503)
body = await request.json()
worker_session_id = body.get("session_id")
@@ -471,16 +500,17 @@ async def handle_replay(request: web.Request) -> web.Response:
if not cp_path.exists():
return web.json_response({"error": "Checkpoint not found"}, status=404)
entry_points = session.worker_runtime.get_entry_points()
entry_points = session.graph_runtime.get_entry_points()
if not entry_points:
return web.json_response({"error": "No entry points available"}, status=400)
replay_session_state = {
"resume_session_id": worker_session_id,
"resume_from_checkpoint": checkpoint_id,
"run_id": _load_checkpoint_run_id(cp_path),
}
execution_id = await session.worker_runtime.trigger(
execution_id = await session.graph_runtime.trigger(
entry_points[0].id,
input_data={},
session_state=replay_session_state,
@@ -517,7 +547,6 @@ def register_routes(app: web.Application) -> None:
app.router.add_post("/api/sessions/{session_id}/inject", handle_inject)
app.router.add_post("/api/sessions/{session_id}/chat", handle_chat)
app.router.add_post("/api/sessions/{session_id}/queen-context", handle_queen_context)
app.router.add_post("/api/sessions/{session_id}/worker-input", handle_worker_input)
app.router.add_post("/api/sessions/{session_id}/pause", handle_pause)
app.router.add_post("/api/sessions/{session_id}/resume", handle_resume)
app.router.add_post("/api/sessions/{session_id}/stop", handle_stop)
+5 -5
@@ -13,9 +13,9 @@ logger = logging.getLogger(__name__)
def _get_graph_registration(session, graph_id: str):
"""Get _GraphRegistration for a graph_id. Returns (reg, None) or (None, error_response)."""
if not session.worker_runtime:
if not session.graph_runtime:
return None, web.json_response({"error": "No worker loaded in this session"}, status=503)
reg = session.worker_runtime.get_graph_registration(graph_id)
reg = session.graph_runtime.get_graph_registration(graph_id)
if reg is None:
return None, web.json_response({"error": f"Graph '{graph_id}' not found"}, status=404)
return reg, None
@@ -101,7 +101,7 @@ async def handle_list_nodes(request: web.Request) -> web.Response:
{"source": e.source, "target": e.target, "condition": e.condition, "priority": e.priority}
for e in graph.edges
]
rt = session.worker_runtime
rt = session.graph_runtime
entry_points = [
{
"id": ep.id,
@@ -193,8 +193,8 @@ async def handle_node_criteria(request: web.Request) -> web.Response:
}
worker_session_id = request.query.get("session_id")
if worker_session_id and session.worker_runtime:
log_store = getattr(session.worker_runtime, "_runtime_log_store", None)
if worker_session_id and session.graph_runtime:
log_store = getattr(session.graph_runtime, "_runtime_log_store", None)
if log_store:
details = await log_store.load_details(worker_session_id)
if details:
+4 -4
@@ -22,10 +22,10 @@ async def handle_logs(request: web.Request) -> web.Response:
if err:
return err
if not session.worker_runtime:
if not session.graph_runtime:
return web.json_response({"error": "No worker loaded in this session"}, status=503)
log_store = getattr(session.worker_runtime, "_runtime_log_store", None)
log_store = getattr(session.graph_runtime, "_runtime_log_store", None)
if log_store is None:
return web.json_response({"error": "Logging not enabled for this agent"}, status=404)
@@ -77,10 +77,10 @@ async def handle_node_logs(request: web.Request) -> web.Response:
node_id = request.match_info["node_id"]
if not session.worker_runtime:
if not session.graph_runtime:
return web.json_response({"error": "No worker loaded in this session"}, status=503)
log_store = getattr(session.worker_runtime, "_runtime_log_store", None)
log_store = getattr(session.graph_runtime, "_runtime_log_store", None)
if log_store is None:
return web.json_response({"error": "Logging not enabled"}, status=404)
+36 -351
@@ -1,26 +1,18 @@
"""Session lifecycle, info, and worker-session browsing routes.
"""Session lifecycle and session info routes.
Session-primary routes:
- POST /api/sessions create session (with or without worker)
- GET /api/sessions list all active sessions
- GET /api/sessions/{session_id} session detail
- DELETE /api/sessions/{session_id} stop session entirely
- POST /api/sessions/{session_id}/worker load a worker into session
- DELETE /api/sessions/{session_id}/worker unload worker from session
- POST /api/sessions/{session_id}/graph load a graph into session
- DELETE /api/sessions/{session_id}/graph unload graph from session
- GET /api/sessions/{session_id}/stats runtime statistics
- GET /api/sessions/{session_id}/entry-points list entry points
- PATCH /api/sessions/{session_id}/triggers/{id} update trigger task
- GET /api/sessions/{session_id}/graphs list graph IDs
- GET /api/sessions/{session_id}/events/history persisted eventbus log (for replay)
Worker session browsing (persisted execution runs on disk):
- GET /api/sessions/{session_id}/worker-sessions list
- GET /api/sessions/{session_id}/worker-sessions/{ws_id} detail
- DELETE /api/sessions/{session_id}/worker-sessions/{ws_id} delete
- GET /api/sessions/{session_id}/worker-sessions/{ws_id}/checkpoints list CPs
- POST /api/sessions/{session_id}/worker-sessions/{ws_id}/checkpoints/{cp}/restore
- GET /api/sessions/{session_id}/worker-sessions/{ws_id}/messages messages
"""
import asyncio
@@ -36,10 +28,7 @@ from pathlib import Path
from aiohttp import web
from framework.server.app import (
cold_sessions_dir,
resolve_session,
safe_path_segment,
sessions_dir,
validate_agent_path,
)
from framework.server.session_manager import SessionManager
@@ -60,9 +49,9 @@ def _session_to_live_dict(session) -> dict:
queen_model: str = getattr(getattr(session, "runner", None), "model", "") or ""
return {
"session_id": session.id,
"worker_id": session.worker_id,
"worker_name": info.name if info else session.worker_id,
"has_worker": session.worker_runtime is not None,
"graph_id": session.graph_id,
"graph_name": info.name if info else session.graph_id,
"has_worker": session.graph_runtime is not None,
"agent_path": str(session.worker_path) if session.worker_path else "",
"description": info.description if info else "",
"goal": info.goal_name if info else "",
@@ -72,7 +61,7 @@ def _session_to_live_dict(session) -> dict:
"intro_message": getattr(session.runner, "intro_message", "") or "",
"queen_phase": phase_state.phase
if phase_state
else ("staging" if session.worker_runtime else "planning"),
else ("staging" if session.graph_runtime else "planning"),
"queen_supports_images": supports_image_tool_results(queen_model) if queen_model else True,
}
@@ -118,16 +107,16 @@ async def handle_create_session(request: web.Request) -> web.Response:
"""POST /api/sessions — create a session.
Body: {
"agent_path": "..." (optional if provided, creates session with worker),
"agent_id": "..." (optional worker ID override),
"agent_path": "..." (optional if provided, creates session with graph),
"agent_id": "..." (optional graph ID override),
"session_id": "..." (optional custom session ID),
"model": "..." (optional),
"initial_prompt": "..." (optional first user message for the queen),
}
When agent_path is provided, creates a session with a worker in one step
When agent_path is provided, creates a session with a graph in one step
(equivalent to the old POST /api/agents). Otherwise creates a queen-only
session that can later have a worker loaded via POST /sessions/{id}/worker.
session that can later have a graph loaded via POST /sessions/{id}/graph.
"""
manager = _get_manager(request)
body = await request.json() if request.can_read_body else {}
@@ -148,8 +137,8 @@ async def handle_create_session(request: web.Request) -> web.Response:
try:
if agent_path:
# One-step: create session + load worker
session = await manager.create_session_with_worker(
# One-step: create session + load graph
session = await manager.create_session_with_worker_graph(
agent_path,
agent_id=agent_id,
session_id=session_id,
@@ -170,7 +159,7 @@ async def handle_create_session(request: web.Request) -> web.Response:
if "currently loading" in msg:
resolved_id = agent_id or (Path(agent_path).name if agent_path else "")
return web.json_response(
{"error": msg, "worker_id": resolved_id, "loading": True},
{"error": msg, "graph_id": resolved_id, "loading": True},
status=409,
)
return web.json_response({"error": msg}, status=409)
@@ -224,8 +213,8 @@ async def handle_get_live_session(request: web.Request) -> web.Response:
data = _session_to_live_dict(session)
if session.worker_runtime:
rt = session.worker_runtime
if session.graph_runtime:
rt = session.graph_runtime
data["entry_points"] = [
{
"id": ep.id,
@@ -257,7 +246,7 @@ async def handle_get_live_session(request: web.Request) -> web.Response:
if mono is not None:
entry["next_fire_in"] = max(0.0, mono - time.monotonic())
data["entry_points"].append(entry)
data["graphs"] = session.worker_runtime.list_graphs()
data["graphs"] = session.graph_runtime.list_graphs()
return web.json_response(data)
@@ -278,14 +267,14 @@ async def handle_stop_session(request: web.Request) -> web.Response:
# ------------------------------------------------------------------
# Worker lifecycle
# Graph lifecycle
# ------------------------------------------------------------------
async def handle_load_worker(request: web.Request) -> web.Response:
"""POST /api/sessions/{session_id}/worker — load a worker into a session.
async def handle_load_graph(request: web.Request) -> web.Response:
"""POST /api/sessions/{session_id}/graph — load a graph into a session.
Body: {"agent_path": "...", "worker_id": "..." (optional), "model": "..." (optional)}
Body: {"agent_path": "...", "graph_id": "..." (optional), "model": "..." (optional)}
"""
manager = _get_manager(request)
session_id = request.match_info["session_id"]
@@ -300,14 +289,14 @@ async def handle_load_worker(request: web.Request) -> web.Response:
except ValueError as e:
return web.json_response({"error": str(e)}, status=400)
worker_id = body.get("worker_id")
graph_id = body.get("graph_id")
model = body.get("model")
try:
session = await manager.load_worker(
session = await manager.load_graph(
session_id,
agent_path,
worker_id=worker_id,
graph_id=graph_id,
model=model,
)
except ValueError as e:
@@ -318,18 +307,18 @@ async def handle_load_worker(request: web.Request) -> web.Response:
resp = _credential_error_response(e, agent_path)
if resp is not None:
return resp
logger.exception("Error loading worker: %s", e)
logger.exception("Error loading graph: %s", e)
return web.json_response({"error": "Internal server error"}, status=500)
return web.json_response(_session_to_live_dict(session))
async def handle_unload_worker(request: web.Request) -> web.Response:
"""DELETE /api/sessions/{session_id}/worker — unload worker, keep queen alive."""
async def handle_unload_graph(request: web.Request) -> web.Response:
"""DELETE /api/sessions/{session_id}/graph — unload graph, keep queen alive."""
manager = _get_manager(request)
session_id = request.match_info["session_id"]
removed = await manager.unload_worker(session_id)
removed = await manager.unload_graph(session_id)
if not removed:
session = manager.get_session(session_id)
if session is None:
@@ -338,11 +327,11 @@ async def handle_unload_worker(request: web.Request) -> web.Response:
status=404,
)
return web.json_response(
{"error": "No worker loaded in this session"},
{"error": "No graph loaded in this session"},
status=409,
)
return web.json_response({"session_id": session_id, "worker_unloaded": True})
return web.json_response({"session_id": session_id, "graph_unloaded": True})
# ------------------------------------------------------------------
@@ -362,7 +351,7 @@ async def handle_session_stats(request: web.Request) -> web.Response:
status=404,
)
stats = session.worker_runtime.get_stats() if session.worker_runtime else {}
stats = session.graph_runtime.get_stats() if session.graph_runtime else {}
return web.json_response(stats)
@@ -378,7 +367,7 @@ async def handle_session_entry_points(request: web.Request) -> web.Response:
status=404,
)
rt = session.worker_runtime
rt = session.graph_runtime
eps = rt.get_entry_points() if rt else []
entry_points = [
{
@@ -580,293 +569,10 @@ async def handle_session_graphs(request: web.Request) -> web.Response:
status=404,
)
graphs = session.worker_runtime.list_graphs() if session.worker_runtime else []
graphs = session.graph_runtime.list_graphs() if session.graph_runtime else []
return web.json_response({"graphs": graphs})
# ------------------------------------------------------------------
# Worker session browsing (persisted execution runs on disk)
# ------------------------------------------------------------------
async def handle_list_worker_sessions(request: web.Request) -> web.Response:
"""List worker sessions on disk."""
session, err = resolve_session(request)
if err:
# Fall back to cold session lookup from disk
sid = request.match_info["session_id"]
sess_dir = cold_sessions_dir(sid)
if sess_dir is None:
return err
else:
if not session.worker_path:
return web.json_response({"sessions": []})
sess_dir = sessions_dir(session)
if not sess_dir.exists():
return web.json_response({"sessions": []})
sessions = []
for d in sorted(sess_dir.iterdir(), reverse=True):
if not d.is_dir():
continue
state_path = d / "state.json"
if not d.name.startswith("session_") and not state_path.exists():
continue
entry: dict = {"session_id": d.name}
if state_path.exists():
try:
state = json.loads(state_path.read_text(encoding="utf-8"))
entry["status"] = state.get("status", "unknown")
entry["started_at"] = state.get("started_at")
entry["completed_at"] = state.get("completed_at")
progress = state.get("progress", {})
entry["steps"] = progress.get("steps_executed", 0)
entry["paused_at"] = progress.get("paused_at")
except (json.JSONDecodeError, OSError):
entry["status"] = "error"
cp_dir = d / "checkpoints"
if cp_dir.exists():
entry["checkpoint_count"] = sum(1 for f in cp_dir.iterdir() if f.suffix == ".json")
else:
entry["checkpoint_count"] = 0
sessions.append(entry)
return web.json_response({"sessions": sessions})
async def handle_get_worker_session(request: web.Request) -> web.Response:
"""Get worker session detail from disk."""
session, err = resolve_session(request)
if err:
return err
if not session.worker_path:
return web.json_response({"error": "No worker loaded"}, status=503)
# Support both URL param names: ws_id (new) or session_id (legacy)
ws_id = request.match_info.get("ws_id") or request.match_info.get("session_id", "")
ws_id = safe_path_segment(ws_id)
state_path = sessions_dir(session) / ws_id / "state.json"
if not state_path.exists():
return web.json_response({"error": "Session not found"}, status=404)
try:
state = json.loads(state_path.read_text(encoding="utf-8"))
except (json.JSONDecodeError, OSError) as e:
return web.json_response({"error": f"Failed to read session: {e}"}, status=500)
return web.json_response(state)
async def handle_list_checkpoints(request: web.Request) -> web.Response:
"""List checkpoints for a worker session."""
session, err = resolve_session(request)
if err:
return err
if not session.worker_path:
return web.json_response({"error": "No worker loaded"}, status=503)
ws_id = request.match_info.get("ws_id") or request.match_info.get("session_id", "")
ws_id = safe_path_segment(ws_id)
cp_dir = sessions_dir(session) / ws_id / "checkpoints"
if not cp_dir.exists():
return web.json_response({"checkpoints": []})
checkpoints = []
for f in sorted(cp_dir.iterdir(), reverse=True):
if f.suffix != ".json":
continue
try:
data = json.loads(f.read_text(encoding="utf-8"))
checkpoints.append(
{
"checkpoint_id": f.stem,
"current_node": data.get("current_node"),
"next_node": data.get("next_node"),
"is_clean": data.get("is_clean", False),
"timestamp": data.get("timestamp"),
}
)
except (json.JSONDecodeError, OSError):
checkpoints.append({"checkpoint_id": f.stem, "error": "unreadable"})
return web.json_response({"checkpoints": checkpoints})
async def handle_delete_worker_session(request: web.Request) -> web.Response:
"""Delete a worker session from disk."""
session, err = resolve_session(request)
if err:
return err
if not session.worker_path:
return web.json_response({"error": "No worker loaded"}, status=503)
ws_id = request.match_info.get("ws_id") or request.match_info.get("session_id", "")
ws_id = safe_path_segment(ws_id)
session_path = sessions_dir(session) / ws_id
if not session_path.exists():
return web.json_response({"error": "Session not found"}, status=404)
shutil.rmtree(session_path)
return web.json_response({"deleted": ws_id})
async def handle_restore_checkpoint(request: web.Request) -> web.Response:
"""Restore from a checkpoint."""
session, err = resolve_session(request)
if err:
return err
if not session.worker_runtime:
return web.json_response({"error": "No worker loaded in this session"}, status=503)
ws_id = request.match_info.get("ws_id") or request.match_info.get("session_id", "")
ws_id = safe_path_segment(ws_id)
checkpoint_id = safe_path_segment(request.match_info["checkpoint_id"])
cp_path = sessions_dir(session) / ws_id / "checkpoints" / f"{checkpoint_id}.json"
if not cp_path.exists():
return web.json_response({"error": "Checkpoint not found"}, status=404)
entry_points = session.worker_runtime.get_entry_points()
if not entry_points:
return web.json_response({"error": "No entry points available"}, status=400)
restore_session_state = {
"resume_session_id": ws_id,
"resume_from_checkpoint": checkpoint_id,
}
execution_id = await session.worker_runtime.trigger(
entry_points[0].id,
input_data={},
session_state=restore_session_state,
)
return web.json_response(
{
"execution_id": execution_id,
"restored_from": ws_id,
"checkpoint_id": checkpoint_id,
}
)
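# Minimal client-side sketch for the restore endpoint (illustrative only;
# the host, port, and helper name are assumptions, not part of the server):
async def _example_restore_checkpoint(session_id: str, ws_id: str, checkpoint_id: str) -> dict:
    import aiohttp  # local import keeps the sketch self-contained

    url = (
        f"http://localhost:8080/api/sessions/{session_id}"
        f"/worker-sessions/{ws_id}/checkpoints/{checkpoint_id}/restore"
    )
    async with aiohttp.ClientSession() as http:
        async with http.post(url) as resp:
            # Success payload: {"execution_id": ..., "restored_from": ws_id,
            # "checkpoint_id": checkpoint_id} (see handle_restore_checkpoint).
            return await resp.json()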
async def handle_messages(request: web.Request) -> web.Response:
"""Get messages for a worker session."""
session, err = resolve_session(request)
if err:
# Fall back to cold session lookup from disk
sid = request.match_info["session_id"]
sess_dir = cold_sessions_dir(sid)
if sess_dir is None:
return err
else:
if not session.worker_path:
return web.json_response({"error": "No worker loaded"}, status=503)
sess_dir = sessions_dir(session)
ws_id = request.match_info.get("ws_id") or request.match_info.get("session_id", "")
ws_id = safe_path_segment(ws_id)
convs_dir = sess_dir / ws_id / "conversations"
if not convs_dir.exists():
return web.json_response({"messages": []})
filter_node = request.query.get("node_id")
all_messages = []
def _collect_msg_parts(parts_dir: Path, node_id: str) -> None:
if not parts_dir.exists():
return
for part_file in sorted(parts_dir.iterdir()):
if part_file.suffix != ".json":
continue
try:
part = json.loads(part_file.read_text(encoding="utf-8"))
part["_node_id"] = node_id
part.setdefault("created_at", part_file.stat().st_mtime)
all_messages.append(part)
except (json.JSONDecodeError, OSError):
continue
# Flat layout: conversations/parts/*.json
if not filter_node:
_collect_msg_parts(convs_dir / "parts", "worker")
# Node-based layout: conversations/<node_id>/parts/*.json
for node_dir in convs_dir.iterdir():
if not node_dir.is_dir() or node_dir.name == "parts":
continue
if filter_node and node_dir.name != filter_node:
continue
_collect_msg_parts(node_dir / "parts", node_dir.name)
# Merge run lifecycle markers from runs.jsonl (for historical dividers)
runs_file = sess_dir / ws_id / "runs.jsonl"
if runs_file.exists():
try:
for line in runs_file.read_text(encoding="utf-8").splitlines():
line = line.strip()
if not line:
continue
try:
record = json.loads(line)
all_messages.append(
{
"seq": -1,
"role": "system",
"content": "",
"_node_id": "_run_marker",
"is_run_marker": True,
"run_id": record.get("run_id"),
"run_event": record.get("event"),
"created_at": record.get("created_at", 0),
}
)
except json.JSONDecodeError:
continue
except OSError:
pass
all_messages.sort(key=lambda m: m.get("created_at", m.get("seq", 0)))
client_only = request.query.get("client_only", "").lower() in ("true", "1")
if client_only:
client_facing_nodes: set[str] = set()
if session and session.runner and hasattr(session.runner, "graph"):
for node in session.runner.graph.nodes:
if node.client_facing:
client_facing_nodes.add(node.id)
if client_facing_nodes:
all_messages = [
m
for m in all_messages
if m.get("is_run_marker")
or (
not m.get("is_transition_marker")
and m["role"] != "tool"
and not (m["role"] == "assistant" and m.get("tool_calls"))
and (
(m["role"] == "user" and m.get("is_client_input"))
or (m["role"] == "assistant" and m.get("_node_id") in client_facing_nodes)
)
)
]
return web.json_response({"messages": all_messages})
async def handle_session_events_history(request: web.Request) -> web.Response:
"""GET /api/sessions/{session_id}/events/history — persisted eventbus log.
@@ -1026,9 +732,9 @@ def register_routes(app: web.Application) -> None:
app.router.add_get("/api/sessions/{session_id}", handle_get_live_session)
app.router.add_delete("/api/sessions/{session_id}", handle_stop_session)
# Worker lifecycle
app.router.add_post("/api/sessions/{session_id}/worker", handle_load_worker)
app.router.add_delete("/api/sessions/{session_id}/worker", handle_unload_worker)
# Graph lifecycle
app.router.add_post("/api/sessions/{session_id}/graph", handle_load_graph)
app.router.add_delete("/api/sessions/{session_id}/graph", handle_unload_graph)
# Session info
app.router.add_post("/api/sessions/{session_id}/reveal", handle_reveal_session_folder)
@@ -1040,24 +746,3 @@ def register_routes(app: web.Application) -> None:
app.router.add_get("/api/sessions/{session_id}/graphs", handle_session_graphs)
app.router.add_get("/api/sessions/{session_id}/events/history", handle_session_events_history)
# Worker session browsing (session-primary)
app.router.add_get("/api/sessions/{session_id}/worker-sessions", handle_list_worker_sessions)
app.router.add_get(
"/api/sessions/{session_id}/worker-sessions/{ws_id}", handle_get_worker_session
)
app.router.add_delete(
"/api/sessions/{session_id}/worker-sessions/{ws_id}", handle_delete_worker_session
)
app.router.add_get(
"/api/sessions/{session_id}/worker-sessions/{ws_id}/checkpoints",
handle_list_checkpoints,
)
app.router.add_post(
"/api/sessions/{session_id}/worker-sessions/{ws_id}/checkpoints/{checkpoint_id}/restore",
handle_restore_checkpoint,
)
app.router.add_get(
"/api/sessions/{session_id}/worker-sessions/{ws_id}/messages",
handle_messages,
)
+154 -253
@@ -35,20 +35,22 @@ class Session:
# Queen (always present once started)
queen_executor: Any = None # GraphExecutor for queen input injection
queen_task: asyncio.Task | None = None
# Worker (optional)
worker_id: str | None = None
# Loaded graph (optional)
graph_id: str | None = None
worker_path: Path | None = None
runner: Any | None = None # AgentRunner
worker_runtime: Any | None = None # AgentRuntime
graph_runtime: Any | None = None # AgentRuntime
worker_info: Any | None = None # AgentInfo
# Queen phase state (building/staging/running)
phase_state: Any = None # QueenPhaseState
# Worker handoff subscription
worker_handoff_sub: str | None = None
# Memory consolidation subscription (fires on CONTEXT_COMPACTED)
memory_consolidation_sub: str | None = None
# Worker run digest subscription (fires on EXECUTION_COMPLETED / EXECUTION_FAILED)
worker_digest_sub: str | None = None
# Memory reflection + recall subscriptions
memory_reflection_subs: list = field(default_factory=list) # list[str]
# Worker colony memory subscriptions
worker_memory_subs: list = field(default_factory=list) # list[str]
# Per-execution colony recall cache for worker prompts
worker_colony_recall_blocks: dict[str, str] = field(default_factory=dict)
# Trigger definitions loaded from agent's triggers.json (available but inactive)
available_triggers: dict[str, TriggerDefinition] = field(default_factory=dict)
# Active trigger tracking (IDs currently firing + their asyncio tasks)
@@ -94,7 +96,7 @@ class SessionManager:
) -> Session:
"""Create session infrastructure (EventBus, LLM) without starting queen.
Internal helper; use create_session() or create_session_with_worker().
Internal helper; use create_session() or create_session_with_worker_graph().
"""
from framework.config import RuntimeConfig, get_hive_config
from framework.runtime.event_bus import EventBus
@@ -166,7 +168,7 @@ class SessionManager:
)
return session
async def create_session_with_worker(
async def create_session_with_worker_graph(
self,
agent_path: str | Path,
agent_id: str | None = None,
@@ -184,7 +186,7 @@ class SessionManager:
from framework.tools.queen_lifecycle_tools import build_worker_profile
agent_path = Path(agent_path)
resolved_worker_id = agent_id or agent_path.name
resolved_graph_id = agent_id or agent_path.name
# When cold-restoring, check meta.json for the phase — if the agent
# was still being built we must NOT try to load the worker (the code
@@ -219,11 +221,11 @@ class SessionManager:
)
session.queen_resume_from = queen_resume_from
try:
# Load worker FIRST (before queen) so queen gets full tools
# Load the graph FIRST (before queen) so queen gets full tools
await self._load_worker_core(
session,
agent_path,
worker_id=resolved_worker_id,
graph_id=resolved_graph_id,
model=model,
)
@@ -232,8 +234,8 @@ class SessionManager:
# Start queen with worker profile + lifecycle + monitoring tools
worker_identity = (
build_worker_profile(session.worker_runtime, agent_path=agent_path)
if session.worker_runtime
build_worker_profile(session.graph_runtime, agent_path=agent_path)
if session.graph_runtime
else None
)
await self._start_queen(
@@ -270,10 +272,10 @@ class SessionManager:
self,
session: Session,
agent_path: str | Path,
worker_id: str | None = None,
graph_id: str | None = None,
model: str | None = None,
) -> None:
"""Load a worker agent into a session (core logic).
"""Load a graph into a session (core logic).
Sets up the runner, runtime, and session fields. Does NOT notify
the queen; callers handle that step.
@@ -281,30 +283,23 @@ class SessionManager:
from framework.runner import AgentRunner
agent_path = Path(agent_path)
resolved_worker_id = worker_id or agent_path.name
resolved_graph_id = graph_id or agent_path.name
if session.worker_runtime is not None:
raise ValueError(f"Session '{session.id}' already has worker '{session.worker_id}'")
if session.graph_runtime is not None:
raise ValueError(f"Session '{session.id}' already has graph '{session.graph_id}'")
async with self._lock:
if session.id in self._loading:
raise ValueError(f"Session '{session.id}' is currently loading a worker")
raise ValueError(f"Session '{session.id}' is currently loading a graph")
self._loading.add(session.id)
try:
# Blocking I/O — load in executor
loop = asyncio.get_running_loop()
# Prioritize: explicit model arg > worker-specific model > session default
from framework.config import (
get_preferred_worker_model,
get_worker_api_base,
get_worker_api_key,
get_worker_llm_extra_kwargs,
)
worker_model = get_preferred_worker_model()
resolved_model = model or worker_model or self._model
# By default, workers share the session's LLM with the queen so
# execution and memory reflection/recall stay on the same model.
session_model = getattr(session.llm, "model", None)
resolved_model = model or session_model or self._model
runner = await loop.run_in_executor(
None,
lambda: AgentRunner.load(
@@ -316,29 +311,8 @@ class SessionManager:
),
)
# If a worker-specific model is configured, build an LLM provider
# with the correct worker credentials so _setup() doesn't fall back
# to the queen's llm config (which may be a different provider).
if worker_model and not model:
from framework.config import get_hive_config
worker_llm_cfg = get_hive_config().get("worker_llm", {})
if worker_llm_cfg.get("use_antigravity_subscription"):
from framework.llm.antigravity import AntigravityProvider
runner._llm = AntigravityProvider(model=resolved_model)
else:
from framework.llm.litellm import LiteLLMProvider
worker_api_key = get_worker_api_key()
worker_api_base = get_worker_api_base()
worker_extra = get_worker_llm_extra_kwargs()
runner._llm = LiteLLMProvider(
model=resolved_model,
api_key=worker_api_key,
api_base=worker_api_base,
**worker_extra,
)
if model is None:
runner._llm = session.llm
# Setup with session's event bus
if runner._agent_runtime is None:
@@ -349,6 +323,16 @@ class SessionManager:
runtime = runner._agent_runtime
if runtime is not None:
runtime._dynamic_memory_provider_factory = (
lambda execution_id, session=session: (
lambda execution_id=execution_id, session=session: session.worker_colony_recall_blocks.get(
execution_id,
"",
)
)
)
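# Why the inner lambda binds its arguments as defaults: default values are
# evaluated when the lambda is created, so each provider keeps the
# execution_id it was built for even after the outer name moves on.
# General-Python illustration (not project code):
#   early = [lambda i=i: i for i in range(3)]  # calls return 0, 1, 2
#   late = [lambda: i for i in range(3)]       # calls return 2, 2, 2 (late binding)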
# Load triggers from the agent's triggers.json definition file.
from framework.tools.queen_lifecycle_tools import _read_agent_triggers_json
@@ -378,21 +362,30 @@ class SessionManager:
info = runner.info()
# Update session
session.worker_id = resolved_worker_id
session.graph_id = resolved_graph_id
session.worker_path = agent_path
session.runner = runner
session.worker_runtime = runtime
session.graph_runtime = runtime
session.worker_info = info
# Subscribe to execution completion for per-run digest generation
self._subscribe_worker_digest(session)
# Colony memory is additive; worker loading should still succeed if
# that optional subscription path hits an import/runtime issue while
# restoring an older session.
try:
await self._subscribe_worker_colony_memory(session)
except Exception:
logger.warning(
"Worker colony memory subscription failed for '%s'; continuing without it",
resolved_graph_id,
exc_info=True,
)
async with self._lock:
self._loading.discard(session.id)
logger.info(
"Worker '%s' loaded into session '%s'",
resolved_worker_id,
resolved_graph_id,
session.id,
)
@@ -495,10 +488,10 @@ class SessionManager:
Called after worker loading to restart any timer/webhook triggers
that were active before a server restart.
"""
if not session.available_triggers or not session.worker_runtime:
if not session.available_triggers or not session.graph_runtime:
return
try:
store = session.worker_runtime._session_store
store = session.graph_runtime._session_store
state = await store.read_state(session_id)
if state and state.active_triggers:
from framework.tools.queen_lifecycle_tools import (
@@ -534,16 +527,16 @@ class SessionManager:
except Exception as e:
logger.warning("Failed to restore active triggers: %s", e)
async def load_worker(
async def load_graph(
self,
session_id: str,
agent_path: str | Path,
worker_id: str | None = None,
graph_id: str | None = None,
model: str | None = None,
) -> Session:
"""Load a worker agent into an existing session (with running queen).
"""Load a graph into an existing session (with running queen).
Starts the worker runtime and notifies the queen.
Starts the graph runtime and notifies the queen.
"""
agent_path = Path(agent_path)
@@ -554,13 +547,13 @@ class SessionManager:
await self._load_worker_core(
session,
agent_path,
worker_id=worker_id,
graph_id=graph_id,
model=model,
)
# Notify queen about the loaded worker (skip for queen itself).
if agent_path.name != "queen" and session.worker_runtime:
await self._notify_queen_worker_loaded(session)
if agent_path.name != "queen" and session.graph_runtime:
await self._notify_queen_graph_loaded(session)
# Update meta.json so cold-restore can discover this session by agent_path
storage_session_id = session.queen_resume_from or session.id
@@ -585,16 +578,16 @@ class SessionManager:
await self._restore_active_triggers(session, session_id)
# Emit SSE event so the frontend can update UI
await self._emit_worker_loaded(session)
await self._emit_graph_loaded(session)
return session
async def unload_worker(self, session_id: str) -> bool:
async def unload_graph(self, session_id: str) -> bool:
"""Unload the worker from a session. Queen stays alive."""
session = self._sessions.get(session_id)
if session is None:
return False
if session.worker_runtime is None:
if session.graph_runtime is None:
return False
# Cleanup worker
@@ -602,7 +595,7 @@ class SessionManager:
try:
await session.runner.cleanup_async()
except Exception as e:
logger.error("Error cleaning up worker '%s': %s", session.worker_id, e)
logger.error("Error cleaning up graph '%s': %s", session.graph_id, e)
# Cancel active trigger timers
for tid, task in session.active_timer_tasks.items():
@@ -624,24 +617,25 @@ class SessionManager:
await self._emit_trigger_events(session, "removed", session.available_triggers)
session.available_triggers.clear()
if session.worker_digest_sub is not None:
for sub_id in session.worker_memory_subs:
try:
session.event_bus.unsubscribe(session.worker_digest_sub)
session.event_bus.unsubscribe(sub_id)
except Exception:
pass
session.worker_digest_sub = None
session.worker_memory_subs.clear()
session.worker_colony_recall_blocks.clear()
worker_id = session.worker_id
session.worker_id = None
graph_id = session.graph_id
session.graph_id = None
session.worker_path = None
session.runner = None
session.worker_runtime = None
session.graph_runtime = None
session.worker_info = None
# Notify queen
await self._notify_queen_worker_unloaded(session)
logger.info("Worker '%s' unloaded from session '%s'", worker_id, session_id)
logger.info("Graph '%s' unloaded from session '%s'", graph_id, session_id)
return True
# ------------------------------------------------------------------
@@ -668,20 +662,21 @@ class SessionManager:
pass
session.worker_handoff_sub = None
if session.worker_digest_sub is not None:
for sub_id in session.worker_memory_subs:
try:
session.event_bus.unsubscribe(session.worker_digest_sub)
session.event_bus.unsubscribe(sub_id)
except Exception:
pass
session.worker_digest_sub = None
session.worker_memory_subs.clear()
session.worker_colony_recall_blocks.clear()
# Stop queen and memory consolidation subscription
if session.memory_consolidation_sub is not None:
# Stop queen and memory reflection/recall subscriptions
for sub_id in session.memory_reflection_subs:
try:
session.event_bus.unsubscribe(session.memory_consolidation_sub)
session.event_bus.unsubscribe(sub_id)
except Exception:
pass
session.memory_consolidation_sub = None
session.memory_reflection_subs.clear()
if session.queen_task is not None:
session.queen_task.cancel()
session.queen_task = None
@@ -713,15 +708,16 @@ class SessionManager:
except Exception as e:
logger.error("Error cleaning up worker: %s", e)
# Final memory consolidation — fire-and-forget so teardown isn't blocked.
if _llm is not None and _session_dir.exists():
# Final long reflection — fire-and-forget so teardown isn't blocked.
if _llm is not None:
import asyncio
from framework.agents.queen.queen_memory import consolidate_queen_memory
from framework.agents.queen.queen_memory_v2 import colony_memory_dir
from framework.agents.queen.reflection_agent import run_long_reflection
asyncio.create_task(
consolidate_queen_memory(session_id, _session_dir, _llm),
name=f"queen-memory-consolidation-{session_id}",
run_long_reflection(_llm, memory_dir=colony_memory_dir(_storage_id), caller="queen"),
name=f"queen-memory-long-reflection-{session_id}",
)
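# General asyncio caveat (not project-specific): the event loop keeps only a
# weak reference to tasks, so a fire-and-forget asyncio.create_task() result
# can be garbage-collected mid-flight. If the final reflection ever stops
# silently, the usual fix is a module-level holder, e.g.:
#   _background: set[asyncio.Task] = set()
#   task = asyncio.create_task(coro)
#   _background.add(task)
#   task.add_done_callback(_background.discard)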
# Close per-session event log
@@ -759,133 +755,52 @@ class SessionManager:
else:
logger.warning("Worker handoff received but queen node not ready")
def _subscribe_worker_digest(self, session: Session) -> None:
"""Subscribe to worker events to write per-run digests.
Three triggers:
- NODE_LOOP_ITERATION: write a mid-run snapshot, throttled to at most
once every _DIGEST_COOLDOWN seconds per execution.
- TOOL_CALL_COMPLETED for delegate_to_sub_agent: same throttled snapshot.
Orchestrator nodes often run all subagent calls in a single LLM turn,
so NODE_LOOP_ITERATION only fires once at the end. Subagent
completions provide intermediate checkpoints.
- EXECUTION_COMPLETED / EXECUTION_FAILED: always write the final digest,
bypassing the cooldown.
"""
import time as _time
from framework.runtime.event_bus import EventType as _ET
_DIGEST_COOLDOWN = 300.0 # seconds between mid-run snapshots
if session.worker_digest_sub is not None:
async def _subscribe_worker_colony_memory(self, session: Session) -> None:
"""Subscribe shared colony reflection/recall for top-level worker runs."""
for sub_id in session.worker_memory_subs:
try:
session.event_bus.unsubscribe(session.worker_digest_sub)
session.event_bus.unsubscribe(sub_id)
except Exception:
pass
session.worker_digest_sub = None
session.worker_memory_subs.clear()
session.worker_colony_recall_blocks.clear()
agent_name = session.worker_path.name if session.worker_path else None
if not agent_name:
runtime = session.graph_runtime
if runtime is None:
return
_agent_name = agent_name
_llm = session.llm
_bus = session.event_bus
# per-execution_id monotonic timestamp of last mid-run digest
_last_digest: dict[str, float] = {}
worker_sessions_dir = getattr(runtime, "_session_store", None)
worker_sessions_dir = getattr(worker_sessions_dir, "sessions_dir", None)
if worker_sessions_dir is None:
return
def _resolve_run_id(exec_id: str) -> str | None:
"""Look up the run_id for a given execution_id via EXECUTION_STARTED history."""
for e in _bus.get_history(event_type=_ET.EXECUTION_STARTED, limit=200):
if e.execution_id == exec_id and getattr(e, "run_id", None):
return e.run_id
return None
from framework.agents.queen.queen_memory_v2 import colony_memory_dir, init_memory_dir
from framework.agents.queen.reflection_agent import subscribe_worker_memory_triggers
async def _inject_digest_to_queen(run_id: str) -> None:
"""Read the written digest and push it into the queen's conversation."""
from framework.agents.worker_memory import digest_path
colony_dir = colony_memory_dir(session.id)
init_memory_dir(colony_dir, migrate_legacy=True)
try:
content = digest_path(_agent_name, run_id).read_text(encoding="utf-8").strip()
except OSError:
return
if not content:
return
executor = session.queen_executor
if executor is None:
return
node = executor.node_registry.get("queen")
if node is None or not hasattr(node, "inject_event"):
return
await node.inject_event(f"[WORKER_DIGEST]\n{content}")
runtime._dynamic_memory_provider_factory = (
lambda execution_id, session=session: (
lambda execution_id=execution_id, session=session: session.worker_colony_recall_blocks.get(
execution_id,
"",
)
)
)
async def _consolidate_and_notify(run_id: str, outcome_event: Any) -> None:
"""Write the digest then push it to the queen."""
from framework.agents.worker_memory import consolidate_worker_run
# Colony memory config for reflection-at-handoff
runtime._colony_memory_dir = colony_dir
runtime._colony_worker_sessions_dir = worker_sessions_dir
runtime._colony_recall_cache = session.worker_colony_recall_blocks
runtime._colony_reflect_llm = session.llm
await consolidate_worker_run(_agent_name, run_id, outcome_event, _bus, _llm)
await _inject_digest_to_queen(run_id)
async def _on_worker_event(event: Any) -> None:
if event.stream_id == "queen":
return
exec_id = event.execution_id
if event.type == _ET.EXECUTION_STARTED:
# New run on this execution_id — start the cooldown timer so
# mid-run snapshots don't fire immediately at session start.
# The first snapshot will happen after _DIGEST_COOLDOWN seconds.
if exec_id:
_last_digest[exec_id] = _time.monotonic()
elif event.type in (
_ET.EXECUTION_COMPLETED,
_ET.EXECUTION_FAILED,
_ET.EXECUTION_PAUSED,
):
# Final digest — always fire, ignore cooldown.
# EXECUTION_PAUSED covers cancellation (queen re-triggering the
# worker cancels the previous execution, emitting paused).
run_id = getattr(event, "run_id", None) or _resolve_run_id(exec_id)
if run_id:
asyncio.create_task(
_consolidate_and_notify(run_id, event),
name=f"worker-digest-final-{run_id}",
)
elif event.type in (_ET.NODE_LOOP_ITERATION, _ET.TOOL_CALL_COMPLETED):
# Mid-run snapshot — respect 300 s cooldown per execution.
# TOOL_CALL_COMPLETED is only interesting for subagent calls;
# regular tool completions are too frequent and too cheap.
if event.type == _ET.TOOL_CALL_COMPLETED:
tool_name = (event.data or {}).get("tool_name", "")
if tool_name != "delegate_to_sub_agent":
return
if not exec_id:
return
now = _time.monotonic()
if now - _last_digest.get(exec_id, 0.0) < _DIGEST_COOLDOWN:
return
run_id = _resolve_run_id(exec_id)
if run_id:
_last_digest[exec_id] = now
asyncio.create_task(
_consolidate_and_notify(run_id, None),
name=f"worker-digest-{run_id}",
)
session.worker_digest_sub = session.event_bus.subscribe(
event_types=[
_ET.EXECUTION_STARTED,
_ET.NODE_LOOP_ITERATION,
_ET.TOOL_CALL_COMPLETED,
_ET.EXECUTION_COMPLETED,
_ET.EXECUTION_FAILED,
_ET.EXECUTION_PAUSED,
],
handler=_on_worker_event,
session.worker_memory_subs = await subscribe_worker_memory_triggers(
session.event_bus,
session.llm,
worker_sessions_dir=worker_sessions_dir,
colony_memory_dir=colony_dir,
recall_cache=session.worker_colony_recall_blocks,
)
def _subscribe_worker_handoffs(self, session: Session, executor: Any) -> None:
@@ -918,6 +833,8 @@ class SessionManager:
"""
from framework.server.queen_orchestrator import create_queen
logger.debug("[_start_queen] Starting for session %s, current queen_executor=%s", session.id, session.queen_executor)
hive_home = Path.home() / ".hive"
# Determine which session directory to use for queen storage.
@@ -1001,6 +918,7 @@ class SessionManager:
pass
session.event_bus.set_session_log(events_path, iteration_offset=iteration_offset)
logger.debug("[_start_queen] Calling create_queen...")
session.queen_task = await create_queen(
session=session,
session_manager=self,
@@ -1008,10 +926,11 @@ class SessionManager:
queen_dir=queen_dir,
initial_prompt=initial_prompt,
)
logger.debug("[_start_queen] create_queen returned, queen_task=%s, queen_executor=%s", session.queen_task, session.queen_executor)
# Auto-load worker on cold restore — the queen's conversation expects
# the agent to be loaded, but the new session has no worker.
if session.queen_resume_from and not session.worker_runtime:
if session.queen_resume_from and not session.graph_runtime:
meta_path = queen_dir / "meta.json"
if meta_path.exists():
try:
@@ -1022,7 +941,7 @@ class SessionManager:
if _agent_path and Path(_agent_path).exists():
if _phase in ("staging", "running", None):
# Agent fully built — load worker and resume
await self.load_worker(session.id, _agent_path)
await self.load_graph(session.id, _agent_path)
if session.phase_state:
await session.phase_state.switch_to_staging(source="auto")
# Emit flowchart overlay so frontend can display it
@@ -1041,38 +960,16 @@ class SessionManager:
except Exception:
logger.warning("Cold restore: failed to auto-load worker", exc_info=True)
# Memory consolidation — triggered by context compaction events.
# Compaction is a natural signal that "enough has happened to be worth remembering".
_consolidation_llm = session.llm
_consolidation_session_dir = queen_dir
async def _on_compaction(_event) -> None:
# Only consolidate on queen compactions — worker and subagent
# compactions are frequent and don't warrant a memory update.
if getattr(_event, "stream_id", None) != "queen":
return
from framework.agents.queen.queen_memory import consolidate_queen_memory
asyncio.create_task(
consolidate_queen_memory(
session.id, _consolidation_session_dir, _consolidation_llm
),
name=f"queen-memory-consolidation-{session.id}",
)
from framework.runtime.event_bus import EventType as _ET
session.memory_consolidation_sub = session.event_bus.subscribe(
event_types=[_ET.CONTEXT_COMPACTED],
handler=_on_compaction,
)
# Memory reflection/recall subscriptions are set up inside
# queen_orchestrator.create_queen() → _queen_loop() and stored
# on session.memory_reflection_subs for teardown.
# ------------------------------------------------------------------
# Queen notifications
# ------------------------------------------------------------------
async def _notify_queen_worker_loaded(self, session: Session) -> None:
"""Inject a system message into the queen about the loaded worker."""
async def _notify_queen_graph_loaded(self, session: Session) -> None:
"""Inject a system message into the queen about the loaded graph."""
from framework.tools.queen_lifecycle_tools import build_worker_profile
executor = session.queen_executor
@@ -1082,7 +979,7 @@ class SessionManager:
if node is None or not hasattr(node, "inject_event"):
return
profile = build_worker_profile(session.worker_runtime, agent_path=session.worker_path)
profile = build_worker_profile(session.graph_runtime, agent_path=session.worker_path)
# Append available trigger info so the queen knows what's schedulable
trigger_lines = ""
@@ -1098,20 +995,20 @@ class SessionManager:
+ "\n".join(parts)
)
await node.inject_event(f"[SYSTEM] Worker loaded.{profile}{trigger_lines}")
await node.inject_event(f"[SYSTEM] Graph loaded.{profile}{trigger_lines}")
async def _emit_worker_loaded(self, session: Session) -> None:
"""Publish a WORKER_LOADED event so the frontend can update."""
async def _emit_graph_loaded(self, session: Session) -> None:
"""Publish a WORKER_GRAPH_LOADED event so the frontend can update."""
from framework.runtime.event_bus import AgentEvent, EventType
info = session.worker_info
await session.event_bus.publish(
AgentEvent(
type=EventType.WORKER_LOADED,
type=EventType.WORKER_GRAPH_LOADED,
stream_id="queen",
data={
"worker_id": session.worker_id,
"worker_name": info.name if info else session.worker_id,
"graph_id": session.graph_id,
"graph_name": info.name if info else session.graph_id,
"agent_path": str(session.worker_path) if session.worker_path else "",
"goal": info.goal_name if info else "",
"node_count": info.node_count if info else 0,
@@ -1188,26 +1085,30 @@ class SessionManager:
)
)
async def revive_queen(self, session: Session, initial_prompt: str | None = None) -> None:
async def revive_queen(self, session: Session) -> None:
"""Revive a dead queen executor on an existing session.
Restarts the queen with the same session context (worker profile, tools, etc.).
"""
from framework.tools.queen_lifecycle_tools import build_worker_profile
logger.debug("[revive_queen] Starting revival for session '%s', current queen_executor=%s", session.id, session.queen_executor)
# Build worker identity if worker is loaded
worker_identity = (
build_worker_profile(session.worker_runtime, agent_path=session.worker_path)
if session.worker_runtime
build_worker_profile(session.graph_runtime, agent_path=session.worker_path)
if session.graph_runtime
else None
)
logger.debug("[revive_queen] worker_identity=%s", "present" if worker_identity else "None")
# Start queen with existing session context
logger.debug("[revive_queen] Calling _start_queen...")
await self._start_queen(
session, worker_identity=worker_identity, initial_prompt=initial_prompt
session, worker_identity=worker_identity
)
logger.info("Queen revived for session '%s'", session.id)
logger.info("Queen revived for session '%s', new queen_executor=%s", session.id, session.queen_executor)
# ------------------------------------------------------------------
# Lookups
@@ -1216,22 +1117,22 @@ class SessionManager:
def get_session(self, session_id: str) -> Session | None:
return self._sessions.get(session_id)
def get_session_by_worker_id(self, worker_id: str) -> Session | None:
"""Find a session by its loaded worker's ID."""
def get_session_by_graph_id(self, graph_id: str) -> Session | None:
"""Find a session by its loaded graph's ID."""
for s in self._sessions.values():
if s.worker_id == worker_id:
if s.graph_id == graph_id:
return s
return None
def get_session_for_agent(self, agent_id: str) -> Session | None:
"""Resolve an agent_id to a session (backward compat).
Checks session.id first, then session.worker_id.
Checks session.id first, then session.graph_id.
"""
s = self._sessions.get(agent_id)
if s:
return s
return self.get_session_by_worker_id(agent_id)
return self.get_session_by_graph_id(agent_id)
def is_loading(self, session_id: str) -> bool:
return session_id in self._loading
+60 -343
@@ -83,7 +83,7 @@ class MockStream:
_active_executors: dict = field(default_factory=dict)
active_execution_ids: set = field(default_factory=set)
async def cancel_execution(self, execution_id: str) -> bool:
async def cancel_execution(self, execution_id: str, reason: str | None = None) -> bool:
return execution_id in self._execution_tasks
@@ -171,6 +171,7 @@ def _make_session(
graph = MockGraphSpec(nodes=nodes or [], edges=edges or [])
rt = runtime or MockRuntime(graph=graph, log_store=log_store)
runner = MagicMock()
runner.cleanup = AsyncMock()
runner.intro_message = "Test intro"
mock_event_bus = MagicMock()
@@ -185,10 +186,10 @@ def _make_session(
llm=mock_llm,
loaded_at=1000000.0,
queen_executor=queen_executor,
worker_id=agent_id,
graph_id=agent_id,
worker_path=agent_path,
runner=runner,
worker_runtime=rt,
graph_runtime=rt,
worker_info=MockAgentInfo(),
)
@@ -224,7 +225,7 @@ def _write_sample_session(base: Path, session_id: str):
"started_at": "2026-02-20T12:00:00",
"completed_at": None,
"input_data": {"user_request": "test input"},
"memory": {"key1": "value1"},
"data_buffer": {"key1": "value1"},
"progress": {
"current_node": "node_b",
"paused_at": "node_b",
@@ -368,7 +369,7 @@ class TestSessionCRUD:
async def test_create_session_with_worker_forwards_session_id(self):
app = create_app()
manager = app["manager"]
manager.create_session_with_worker = AsyncMock(
manager.create_session_with_worker_graph = AsyncMock(
return_value=_make_session(agent_id="my-custom-session")
)
@@ -384,7 +385,7 @@ class TestSessionCRUD:
assert resp.status == 201
assert data["session_id"] == "my-custom-session"
manager.create_session_with_worker.assert_awaited_once_with(
manager.create_session_with_worker_graph.assert_awaited_once_with(
str(EXAMPLE_AGENT_PATH.resolve()),
agent_id=None,
session_id="my-custom-session",
@@ -616,10 +617,33 @@ class TestExecution:
assert data["delivered"] is True
@pytest.mark.asyncio
async def test_chat_injects_when_node_waiting(self):
"""When a node is awaiting input, /chat should inject instead of trigger."""
async def test_chat_publishes_display_message_when_provided(self):
session = _make_session()
session.worker_runtime.find_awaiting_node = lambda: ("chat_node", "primary")
queen_node = session.queen_executor.node_registry["queen"]
app = _make_app_with_session(session)
async with TestClient(TestServer(app)) as client:
resp = await client.post(
"/api/sessions/test_agent/chat",
json={
"message": '[Worker asked: "Need approval"]\nUser answered: "Ship it"',
"display_message": "Ship it",
},
)
assert resp.status == 200
published_event = session.event_bus.publish.await_args.args[0]
assert published_event.data["content"] == "Ship it"
queen_node.inject_event.assert_awaited_once_with(
'[Worker asked: "Need approval"]\nUser answered: "Ship it"',
is_client_input=True,
image_content=None,
)
@pytest.mark.asyncio
async def test_chat_prefers_queen_even_when_node_waiting(self):
"""When the queen is alive, /chat routes to queen even if a node is waiting."""
session = _make_session()
session.graph_runtime.find_awaiting_node = lambda: ("chat_node", "primary")
app = _make_app_with_session(session)
async with TestClient(TestServer(app)) as client:
resp = await client.post(
@@ -628,8 +652,7 @@ class TestExecution:
)
assert resp.status == 200
data = await resp.json()
assert data["status"] == "injected"
assert data["node_id"] == "chat_node"
assert data["status"] == "queen"
assert data["delivered"] is True
@pytest.mark.asyncio
@@ -644,6 +667,19 @@ class TestExecution:
)
assert resp.status == 503
@pytest.mark.asyncio
async def test_worker_input_route_removed(self):
session = _make_session()
app = _make_app_with_session(session)
async with TestClient(TestServer(app)) as client:
resp = await client.post(
"/api/sessions/test_agent/worker-input",
json={"message": "hello"},
)
# No POST handler remains for this path; aiohttp falls through to an
# overlapping GET/HEAD route and reports method-not-allowed.
assert resp.status == 405
@pytest.mark.asyncio
async def test_chat_missing_message(self):
session = _make_session()
@@ -700,7 +736,7 @@ class TestExecution:
class TestResume:
@pytest.mark.asyncio
async def test_resume_from_session_state(self, sample_session, tmp_agent_dir):
"""Resume using session state (paused_at)."""
"""Direct state-based resume is rejected; checkpoint resume is required."""
session_id, session_dir, state = sample_session
tmp_path, agent_name, base = tmp_agent_dir
@@ -712,11 +748,9 @@ class TestResume:
"/api/sessions/test_agent/resume",
json={"session_id": session_id},
)
assert resp.status == 200
assert resp.status == 400
data = await resp.json()
assert data["execution_id"] == "exec_test_123"
assert data["resumed_from"] == session_id
assert data["checkpoint_id"] is None
assert "checkpoint_id is required" in data["error"]
@pytest.mark.asyncio
async def test_resume_with_checkpoint(self, sample_session, tmp_agent_dir):
@@ -725,6 +759,7 @@ class TestResume:
tmp_path, agent_name, base = tmp_agent_dir
session = _make_session(tmp_dir=tmp_path / ".hive" / "agents" / agent_name)
session.graph_runtime.trigger = AsyncMock(return_value="exec_test_123")
app = _make_app_with_session(session)
async with TestClient(TestServer(app)) as client:
@@ -738,6 +773,8 @@ class TestResume:
assert resp.status == 200
data = await resp.json()
assert data["checkpoint_id"] == "cp_node_complete_node_a_001"
_, kwargs = session.graph_runtime.trigger.await_args
assert kwargs["session_state"]["run_id"] == "__legacy_run__"
@pytest.mark.asyncio
async def test_resume_missing_session_id(self):
@@ -767,7 +804,7 @@ class TestStop:
async def test_stop_found(self):
session = _make_session()
# Put a mock task in the stream so cancel_execution returns True
session.worker_runtime._mock_streams["default"]._execution_tasks["exec_abc"] = MagicMock()
session.graph_runtime._mock_streams["default"]._execution_tasks["exec_abc"] = MagicMock()
app = _make_app_with_session(session)
async with TestClient(TestServer(app)) as client:
resp = await client.post(
@@ -808,6 +845,7 @@ class TestReplay:
tmp_path, agent_name, base = tmp_agent_dir
session = _make_session(tmp_dir=tmp_path / ".hive" / "agents" / agent_name)
session.graph_runtime.trigger = AsyncMock(return_value="exec_test_123")
app = _make_app_with_session(session)
async with TestClient(TestServer(app)) as client:
@@ -822,6 +860,8 @@ class TestReplay:
data = await resp.json()
assert data["execution_id"] == "exec_test_123"
assert data["replayed_from"] == session_id
_, kwargs = session.graph_runtime.trigger.await_args
assert kwargs["session_state"]["run_id"] == "__legacy_run__"
@pytest.mark.asyncio
async def test_replay_missing_fields(self):
@@ -859,329 +899,6 @@ class TestReplay:
assert resp.status == 404
class TestWorkerSessions:
@pytest.mark.asyncio
async def test_list_sessions(self, sample_session, tmp_agent_dir):
session_id, session_dir, state = sample_session
tmp_path, agent_name, base = tmp_agent_dir
session = _make_session(tmp_dir=tmp_path / ".hive" / "agents" / agent_name)
app = _make_app_with_session(session)
async with TestClient(TestServer(app)) as client:
resp = await client.get("/api/sessions/test_agent/worker-sessions")
assert resp.status == 200
data = await resp.json()
assert len(data["sessions"]) == 1
assert data["sessions"][0]["session_id"] == session_id
assert data["sessions"][0]["status"] == "paused"
assert data["sessions"][0]["steps"] == 5
@pytest.mark.asyncio
async def test_list_sessions_includes_custom_id(self, custom_id_session, tmp_agent_dir):
session_id, session_dir, state = custom_id_session
tmp_path, agent_name, base = tmp_agent_dir
session = _make_session(tmp_dir=tmp_path / ".hive" / "agents" / agent_name)
app = _make_app_with_session(session)
async with TestClient(TestServer(app)) as client:
resp = await client.get("/api/sessions/test_agent/worker-sessions")
assert resp.status == 200
data = await resp.json()
assert len(data["sessions"]) == 1
assert data["sessions"][0]["session_id"] == session_id
assert data["sessions"][0]["status"] == "paused"
@pytest.mark.asyncio
async def test_list_sessions_empty(self, tmp_agent_dir):
tmp_path, agent_name, base = tmp_agent_dir
session = _make_session(tmp_dir=tmp_path / ".hive" / "agents" / agent_name)
app = _make_app_with_session(session)
async with TestClient(TestServer(app)) as client:
resp = await client.get("/api/sessions/test_agent/worker-sessions")
assert resp.status == 200
data = await resp.json()
assert data["sessions"] == []
@pytest.mark.asyncio
async def test_get_session(self, sample_session, tmp_agent_dir):
session_id, session_dir, state = sample_session
tmp_path, agent_name, base = tmp_agent_dir
session = _make_session(tmp_dir=tmp_path / ".hive" / "agents" / agent_name)
app = _make_app_with_session(session)
async with TestClient(TestServer(app)) as client:
resp = await client.get(f"/api/sessions/test_agent/worker-sessions/{session_id}")
assert resp.status == 200
data = await resp.json()
assert data["status"] == "paused"
assert data["memory"]["key1"] == "value1"
@pytest.mark.asyncio
async def test_get_session_not_found(self, tmp_agent_dir):
tmp_path, agent_name, base = tmp_agent_dir
session = _make_session(tmp_dir=tmp_path / ".hive" / "agents" / agent_name)
app = _make_app_with_session(session)
async with TestClient(TestServer(app)) as client:
resp = await client.get("/api/sessions/test_agent/worker-sessions/nonexistent")
assert resp.status == 404
@pytest.mark.asyncio
async def test_delete_session(self, sample_session, tmp_agent_dir):
session_id, session_dir, state = sample_session
tmp_path, agent_name, base = tmp_agent_dir
session = _make_session(tmp_dir=tmp_path / ".hive" / "agents" / agent_name)
app = _make_app_with_session(session)
async with TestClient(TestServer(app)) as client:
resp = await client.delete(f"/api/sessions/test_agent/worker-sessions/{session_id}")
assert resp.status == 200
data = await resp.json()
assert data["deleted"] == session_id
# Verify deleted
assert not session_dir.exists()
@pytest.mark.asyncio
async def test_delete_session_not_found(self, tmp_agent_dir):
tmp_path, agent_name, base = tmp_agent_dir
session = _make_session(tmp_dir=tmp_path / ".hive" / "agents" / agent_name)
app = _make_app_with_session(session)
async with TestClient(TestServer(app)) as client:
resp = await client.delete("/api/sessions/test_agent/worker-sessions/nonexistent")
assert resp.status == 404
@pytest.mark.asyncio
async def test_list_checkpoints(self, sample_session, tmp_agent_dir):
session_id, session_dir, state = sample_session
tmp_path, agent_name, base = tmp_agent_dir
session = _make_session(tmp_dir=tmp_path / ".hive" / "agents" / agent_name)
app = _make_app_with_session(session)
async with TestClient(TestServer(app)) as client:
resp = await client.get(
f"/api/sessions/test_agent/worker-sessions/{session_id}/checkpoints"
)
assert resp.status == 200
data = await resp.json()
assert len(data["checkpoints"]) == 1
cp = data["checkpoints"][0]
assert cp["checkpoint_id"] == "cp_node_complete_node_a_001"
assert cp["current_node"] == "node_a"
assert cp["is_clean"] is True
@pytest.mark.asyncio
async def test_restore_checkpoint(self, sample_session, tmp_agent_dir):
session_id, session_dir, state = sample_session
tmp_path, agent_name, base = tmp_agent_dir
session = _make_session(tmp_dir=tmp_path / ".hive" / "agents" / agent_name)
app = _make_app_with_session(session)
async with TestClient(TestServer(app)) as client:
resp = await client.post(
f"/api/sessions/test_agent/worker-sessions/{session_id}"
"/checkpoints/cp_node_complete_node_a_001/restore"
)
assert resp.status == 200
data = await resp.json()
assert data["execution_id"] == "exec_test_123"
assert data["restored_from"] == session_id
assert data["checkpoint_id"] == "cp_node_complete_node_a_001"
@pytest.mark.asyncio
async def test_restore_checkpoint_not_found(self, sample_session, tmp_agent_dir):
session_id, session_dir, state = sample_session
tmp_path, agent_name, base = tmp_agent_dir
session = _make_session(tmp_dir=tmp_path / ".hive" / "agents" / agent_name)
app = _make_app_with_session(session)
async with TestClient(TestServer(app)) as client:
resp = await client.post(
f"/api/sessions/test_agent/worker-sessions/{session_id}/checkpoints/nonexistent_cp/restore"
)
assert resp.status == 404
class TestMessages:
@pytest.mark.asyncio
async def test_get_messages(self, sample_session, tmp_agent_dir):
session_id, session_dir, state = sample_session
tmp_path, agent_name, base = tmp_agent_dir
session = _make_session(tmp_dir=tmp_path / ".hive" / "agents" / agent_name)
app = _make_app_with_session(session)
async with TestClient(TestServer(app)) as client:
resp = await client.get(
f"/api/sessions/test_agent/worker-sessions/{session_id}/messages"
)
assert resp.status == 200
data = await resp.json()
msgs = data["messages"]
assert len(msgs) == 3
# Should be sorted by seq
assert msgs[0]["seq"] == 1
assert msgs[0]["role"] == "user"
assert msgs[0]["_node_id"] == "node_a"
assert msgs[1]["seq"] == 2
assert msgs[1]["role"] == "assistant"
assert msgs[2]["seq"] == 3
assert msgs[2]["_node_id"] == "node_b"
@pytest.mark.asyncio
async def test_get_messages_filtered_by_node(self, sample_session, tmp_agent_dir):
session_id, session_dir, state = sample_session
tmp_path, agent_name, base = tmp_agent_dir
session = _make_session(tmp_dir=tmp_path / ".hive" / "agents" / agent_name)
app = _make_app_with_session(session)
async with TestClient(TestServer(app)) as client:
resp = await client.get(
f"/api/sessions/test_agent/worker-sessions/{session_id}/messages?node_id=node_a"
)
assert resp.status == 200
data = await resp.json()
msgs = data["messages"]
assert len(msgs) == 2
assert all(m["_node_id"] == "node_a" for m in msgs)
@pytest.mark.asyncio
async def test_get_messages_no_conversations(self, tmp_agent_dir):
"""Session without conversations directory returns empty list."""
tmp_path, agent_name, base = tmp_agent_dir
worker_session_id = "session_empty"
session_dir = base / "sessions" / worker_session_id
session_dir.mkdir(parents=True)
(session_dir / "state.json").write_text(json.dumps({"status": "completed"}))
session = _make_session(tmp_dir=tmp_path / ".hive" / "agents" / agent_name)
app = _make_app_with_session(session)
async with TestClient(TestServer(app)) as client:
resp = await client.get(
f"/api/sessions/test_agent/worker-sessions/{worker_session_id}/messages"
)
assert resp.status == 200
data = await resp.json()
assert data["messages"] == []
@pytest.mark.asyncio
async def test_get_messages_client_only(self, tmp_agent_dir):
"""client_only=true keeps user+client-facing assistant."""
tmp_path, agent_name, base = tmp_agent_dir
worker_session_id = "session_client_only"
session_dir = base / "sessions" / worker_session_id
session_dir.mkdir(parents=True)
(session_dir / "state.json").write_text(json.dumps({"status": "completed"}))
# node_a is NOT client-facing, chat_node IS
conv_a = session_dir / "conversations" / "node_a" / "parts"
conv_a.mkdir(parents=True)
(conv_a / "0001.json").write_text(
json.dumps({"seq": 1, "role": "user", "content": "system prompt"})
)
(conv_a / "0002.json").write_text(
json.dumps({"seq": 2, "role": "assistant", "content": "internal work"})
)
(conv_a / "0003.json").write_text(
json.dumps({"seq": 3, "role": "tool", "content": "tool result"})
)
conv_chat = session_dir / "conversations" / "chat_node" / "parts"
conv_chat.mkdir(parents=True)
(conv_chat / "0004.json").write_text(
json.dumps({"seq": 4, "role": "user", "content": "hi", "is_client_input": True})
)
(conv_chat / "0005.json").write_text(
json.dumps({"seq": 5, "role": "assistant", "content": "hello!"})
)
(conv_chat / "0006.json").write_text(
json.dumps(
{
"seq": 6,
"role": "assistant",
"content": "",
"tool_calls": [{"id": "tc1", "function": {"name": "search"}}],
}
)
)
(conv_chat / "0007.json").write_text(
json.dumps(
{
"seq": 7,
"role": "user",
"content": "marker",
"is_transition_marker": True,
}
)
)
nodes = [
MockNodeSpec(id="node_a", name="Node A", client_facing=False),
MockNodeSpec(id="chat_node", name="Chat", client_facing=True),
]
session = _make_session(
tmp_dir=tmp_path / ".hive" / "agents" / agent_name,
nodes=nodes,
)
session.runner.graph = MockGraphSpec(nodes=nodes)
app = _make_app_with_session(session)
async with TestClient(TestServer(app)) as client:
resp = await client.get(
f"/api/sessions/test_agent/worker-sessions/{worker_session_id}/messages?client_only=true"
)
assert resp.status == 200
msgs = (await resp.json())["messages"]
# Keep: seq 4 (user+is_client_input), seq 5 (assistant from chat_node)
# Drop: seq 1,2,3,6,7 (internal / tool / tool_calls / marker)
assert len(msgs) == 2
assert msgs[0]["seq"] == 4
assert msgs[0]["role"] == "user"
assert msgs[1]["seq"] == 5
assert msgs[1]["role"] == "assistant"
assert msgs[1]["_node_id"] == "chat_node"
@pytest.mark.asyncio
async def test_get_messages_client_only_no_runner_returns_all(self, tmp_agent_dir):
"""client_only=true with no runner skips filtering (returns all messages)."""
tmp_path, agent_name, base = tmp_agent_dir
worker_session_id = "session_no_runner"
session_dir = base / "sessions" / worker_session_id
session_dir.mkdir(parents=True)
(session_dir / "state.json").write_text(json.dumps({"status": "completed"}))
conv = session_dir / "conversations" / "node_a" / "parts"
conv.mkdir(parents=True)
(conv / "0001.json").write_text(json.dumps({"seq": 1, "role": "user", "content": "hello"}))
(conv / "0002.json").write_text(
json.dumps({"seq": 2, "role": "assistant", "content": "response"})
)
session = _make_session(tmp_dir=tmp_path / ".hive" / "agents" / agent_name)
session.runner = None # Simulate runner not available
app = _make_app_with_session(session)
async with TestClient(TestServer(app)) as client:
resp = await client.get(
f"/api/sessions/test_agent/worker-sessions/{worker_session_id}/messages?client_only=true"
)
assert resp.status == 200
msgs = (await resp.json())["messages"]
# No runner -> can't resolve client-facing nodes -> returns all messages
assert len(msgs) == 2
class TestGraphNodes:
@pytest.mark.asyncio
async def test_list_nodes(self, nodes_and_edges):
@@ -1381,7 +1098,7 @@ class TestLogs:
async def test_logs_no_log_store(self):
"""Agent without log store returns 404."""
session = _make_session()
session.worker_runtime._runtime_log_store = None
session.graph_runtime._runtime_log_store = None
app = _make_app_with_session(session)
async with TestClient(TestServer(app)) as client:
@@ -1704,11 +1421,11 @@ class TestSSEFormat:
class TestErrorMiddleware:
@pytest.mark.asyncio
async def test_404_on_unknown_api_route(self):
async def test_unknown_api_route_falls_back_to_frontend(self):
app = create_app()
async with TestClient(TestServer(app)) as client:
resp = await client.get("/api/nonexistent")
assert resp.status == 404
assert resp.status == 200
class TestCleanupStaleActiveSessions:
@@ -8,7 +8,7 @@ metadata:
## Operational Protocol: Structured Note-Taking
Maintain structured working notes in shared memory key `_working_notes`.
Maintain structured working notes in shared buffer key `_working_notes`.
Update at these checkpoints:
- After completing each discrete subtask or batch item
+2 -2
@@ -79,8 +79,8 @@ SKILL_REGISTRY: dict[str, str] = {
"hive.task-decomposition": "task-decomposition",
}
# All shared memory keys used by default skills (for permission auto-inclusion)
SHARED_MEMORY_KEYS: list[str] = [
# All shared buffer keys used by default skills (for permission auto-inclusion)
DATA_BUFFER_KEYS: list[str] = [
# note-taking
"_working_notes",
"_notes_updated_at",
+3 -1
@@ -8,6 +8,7 @@ tooling, CI gates, and hive skill doctor.
from __future__ import annotations
import stat
import sys
from dataclasses import dataclass, field
from pathlib import Path
@@ -134,9 +135,10 @@ def validate_strict(path: Path) -> ValidationResult:
warnings.append("No 'license' field — consider adding a license (e.g. MIT, Apache-2.0).")
# 11. Scripts in scripts/ exist and are executable
# Windows has no POSIX executable bits; skip this check there.
base_dir = path.parent
scripts_dir = base_dir / "scripts"
if scripts_dir.is_dir():
if scripts_dir.is_dir() and sys.platform != "win32":
for script_path in sorted(scripts_dir.iterdir()):
if script_path.is_file():
if not (script_path.stat().st_mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)):
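# For script authors failing this check: set the executable bits before
# committing. A sketch of the usual fix ("p" is a placeholder path):
#   import os, stat
#   mode = os.stat(p).st_mode
#   os.chmod(p, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)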
+24 -1
@@ -30,6 +30,7 @@ from pathlib import Path
from typing import Any
class FileConversationStore:
"""File-per-part ConversationStore.
@@ -95,7 +96,7 @@ class FileConversationStore:
async def read_cursor(self) -> dict[str, Any] | None:
return await self._run(self._read_json, self._base / "cursor.json")
async def delete_parts_before(self, seq: int) -> None:
async def delete_parts_before(self, seq: int, run_id: str | None = None) -> None:
def _delete() -> None:
if not self._parts_dir.exists():
return
@@ -110,6 +111,28 @@ class FileConversationStore:
"""No-op — no persistent handles for file-per-part storage."""
pass
async def clear(self) -> None:
"""Clear all parts and cursor, keeping the directory structure.
Used when starting a fresh execution in the same session directory.
"""
def _clear() -> None:
# Clear all parts
if self._parts_dir.exists():
for f in self._parts_dir.glob("*.json"):
f.unlink()
# Clear cursor
cursor_path = self._base / "cursor.json"
if cursor_path.exists():
cursor_path.unlink()
# Clear meta
meta_path = self._base / "meta.json"
if meta_path.exists():
meta_path.unlink()
await self._run(_clear)
async def destroy(self) -> None:
"""Delete the entire base directory and all persisted data."""
+311 -152
@@ -1,8 +1,8 @@
"""Queen lifecycle tools for worker management.
"""Queen lifecycle tools for graph management.
These tools give the Queen agent control over the worker agent's lifecycle.
They close over a session-like object that provides ``worker_runtime``,
allowing late-binding access to the worker (which may be loaded/unloaded
These tools give the Queen agent control over the loaded graph's lifecycle.
They close over a session-like object that provides ``graph_runtime``,
allowing late-binding access to the graph (which may be loaded/unloaded
dynamically).
Usage::
@@ -20,7 +20,7 @@ Usage::
from framework.tools.queen_lifecycle_tools import WorkerSessionAdapter
adapter = WorkerSessionAdapter(
worker_runtime=runtime,
graph_runtime=runtime,
event_bus=event_bus,
worker_path=storage_path,
)
@@ -66,11 +66,11 @@ logger = logging.getLogger(__name__)
class WorkerSessionAdapter:
"""Adapter for TUI compatibility.
Wraps bare worker_runtime + event_bus + storage_path into a
Wraps bare graph_runtime + event_bus + storage_path into a
session-like object that queen lifecycle tools can use.
"""
worker_runtime: Any # AgentRuntime
graph_runtime: Any # AgentRuntime
event_bus: Any # EventBus
worker_path: Path | None = None
@@ -79,16 +79,22 @@ class WorkerSessionAdapter:
class QueenPhaseState:
"""Mutable state container for queen operating phase.
Four phases: planning → building → staging → running.
Five phases: planning → building → staging → running → editing.
EDITING is entered after worker execution completes. The worker
stays loaded; the queen can tweak config and re-run without rebuilding.
RUNNING cannot go directly to BUILDING or PLANNING; it must pass
through EDITING first.
Shared between the dynamic_tools_provider callback and tool handlers
that trigger phase transitions.
"""
phase: str = "building" # "planning", "building", "staging", or "running"
phase: str = "building" # "planning", "building", "staging", "running", or "editing"
planning_tools: list = field(default_factory=list) # list[Tool]
building_tools: list = field(default_factory=list) # list[Tool]
staging_tools: list = field(default_factory=list) # list[Tool]
running_tools: list = field(default_factory=list) # list[Tool]
editing_tools: list = field(default_factory=list) # list[Tool]
inject_notification: Any = None # async (str) -> None
event_bus: Any = None # EventBus — for emitting QUEEN_PHASE_CHANGED events
@@ -115,12 +121,24 @@ class QueenPhaseState:
prompt_building: str = ""
prompt_staging: str = ""
prompt_running: str = ""
prompt_editing: str = ""
# Default skill operational protocols — appended to every phase prompt
protocols_prompt: str = ""
# Community skills catalog (XML) — appended after protocols
skills_catalog_prompt: str = ""
# Persona and communication style (set once at session start by persona hook,
# persisted here so they survive dynamic prompt refreshes across iterations).
persona_prefix: str = "" # e.g. "You are a CFO. I am a CFO with 20 years..."
style_directive: str = "" # e.g. "## Communication Style: Peer\n\n..."
# Cached recall block — populated async by recall_selector after each turn.
_cached_recall_block: str = ""
_cached_colony_recall_block: str = ""
_cached_global_recall_block: str = ""
global_memory_dir: Path | None = None
def get_current_tools(self) -> list:
"""Return tools for the current phase."""
if self.phase == "planning":
@@ -129,6 +147,8 @@ class QueenPhaseState:
return list(self.running_tools)
if self.phase == "staging":
return list(self.staging_tools)
if self.phase == "editing":
return list(self.editing_tools)
return list(self.building_tools)
def get_current_prompt(self) -> str:
@@ -139,19 +159,29 @@ class QueenPhaseState:
base = self.prompt_running
elif self.phase == "staging":
base = self.prompt_staging
elif self.phase == "editing":
base = self.prompt_editing
else:
base = self.prompt_building
from framework.agents.queen.queen_memory import format_for_injection
memory = format_for_injection()
parts = [base]
parts = []
if self.persona_prefix:
parts.append(self.persona_prefix)
parts.append(base)
if self.style_directive:
parts.append(self.style_directive)
if self.skills_catalog_prompt:
parts.append(self.skills_catalog_prompt)
if self.protocols_prompt:
parts.append(self.protocols_prompt)
if memory:
parts.append(memory)
colony_memory = self._cached_colony_recall_block or self._cached_recall_block
if colony_memory:
parts.append(colony_memory)
if self._cached_global_recall_block:
parts.append(self._cached_global_recall_block)
return "\n\n".join(parts)
async def _emit_phase_event(self) -> None:
@@ -168,6 +198,26 @@ class QueenPhaseState:
)
)
async def switch_to_editing(self, source: str = "tool") -> None:
"""Switch to editing phase — worker stays loaded, queen can tweak and re-run.
Args:
source: Who triggered the switch: "tool", "frontend", or "auto".
"""
if self.phase == "editing":
return
self.phase = "editing"
tool_names = [t.name for t in self.editing_tools]
logger.info("Queen phase → editing (source=%s, tools: %s)", source, tool_names)
await self._emit_phase_event()
if self.inject_notification and source != "tool":
await self.inject_notification(
"[PHASE CHANGE] Switched to EDITING phase. "
"Worker is still loaded. You can tweak configuration and re-run, "
"or escalate to building/planning if a deeper change is needed. "
"Available tools: " + ", ".join(tool_names) + "."
)
async def switch_to_running(self, source: str = "tool") -> None:
"""Switch to running phase and notify the queen.
@@ -223,11 +273,20 @@ class QueenPhaseState:
async def switch_to_building(self, source: str = "tool") -> None:
"""Switch to building phase and notify the queen.
Blocked from RUNNING and EDITING.
Args:
source: Who triggered the switch: "tool", "frontend", or "auto".
"""
if self.phase == "building":
return
if self.phase in ("running", "editing"):
logger.warning(
"Queen phase: BLOCKED %s → building (source=%s)",
self.phase,
source,
)
return
self.phase = "building"
tool_names = [t.name for t in self.building_tools]
logger.info("Queen phase → building (source=%s, tools: %s)", source, tool_names)
@@ -242,11 +301,20 @@ class QueenPhaseState:
async def switch_to_planning(self, source: str = "tool") -> None:
"""Switch to planning phase and notify the queen.
Blocked from RUNNING and EDITING.
Args:
source: Who triggered the switch: "tool", "frontend", or "auto".
"""
if self.phase == "planning":
return
if self.phase in ("running", "editing"):
logger.warning(
"Queen phase: BLOCKED %s → planning (source=%s)",
self.phase,
source,
)
return
self.phase = "planning"
tool_names = [t.name for t in self.planning_tools]
logger.info("Queen phase → planning (source=%s, tools: %s)", source, tool_names)
@@ -363,7 +431,7 @@ def _remove_trigger_from_agent(session: Any, trigger_id: str) -> None:
async def _persist_active_triggers(session: Any, session_id: str) -> None:
"""Persist the set of active trigger IDs (and their tasks) to SessionState."""
runtime = getattr(session, "worker_runtime", None)
runtime = getattr(session, "graph_runtime", None)
if runtime is None:
return
store = getattr(runtime, "_session_store", None)
@@ -418,8 +486,8 @@ async def _start_trigger_timer(session: Any, trigger_id: str, tdef: Any) -> None
_next_delay = float(interval_minutes) * 60 if interval_minutes else 60
fire_times[trigger_id] = time.monotonic() + _next_delay
# Gate on worker being loaded
if getattr(session, "worker_runtime", None) is None:
# Gate on a graph being loaded
if getattr(session, "graph_runtime", None) is None:
continue
# Fire into queen node
@@ -465,8 +533,8 @@ async def _start_trigger_webhook(session: Any, trigger_id: str, tdef: Any) -> No
return
if data.get("method", "").upper() not in methods:
return
# Gate on worker being loaded
if getattr(session, "worker_runtime", None) is None:
# Gate on a graph being loaded
if getattr(session, "graph_runtime", None) is None:
return
executor = getattr(session, "queen_executor", None)
if executor is None:
@@ -755,7 +823,7 @@ def register_queen_lifecycle_tools(
session: Any = None,
session_id: str | None = None,
# Legacy params — used by TUI when not passing a session object
worker_runtime: AgentRuntime | None = None,
graph_runtime: AgentRuntime | None = None,
event_bus: EventBus | None = None,
storage_path: Path | None = None,
# Server context — enables load_built_agent tool
@@ -767,30 +835,30 @@ def register_queen_lifecycle_tools(
"""Register queen lifecycle tools.
Args:
session: A Session or WorkerSessionAdapter with ``worker_runtime``
attribute. The tools read ``session.worker_runtime`` on each
call, supporting late-binding (worker loaded/unloaded).
session_id: Shared session ID so the worker uses the same session
session: A Session or WorkerSessionAdapter with ``graph_runtime``
attribute. The tools read ``session.graph_runtime`` on each
call, supporting late-binding (graph loaded/unloaded).
session_id: Shared session ID so the graph uses the same session
scope as the queen and judge.
worker_runtime: (Legacy) Direct runtime reference. If ``session``
graph_runtime: (Legacy) Direct runtime reference. If ``session``
is not provided, a WorkerSessionAdapter is created from
worker_runtime + event_bus + storage_path.
graph_runtime + event_bus + storage_path.
session_manager: (Server only) The SessionManager instance, needed
for ``load_built_agent`` to hot-load a worker.
for ``load_built_agent`` to hot-load a graph.
manager_session_id: (Server only) The session's ID in the manager,
used with ``session_manager.load_worker()``.
used with ``session_manager.load_graph()``.
phase_state: (Optional) Mutable phase state for building/running
phase switching. When provided, load_built_agent switches to
running phase and stop_worker_and_edit switches to building phase.
running phase and stop_graph_and_edit switches to building phase.
Returns the number of tools registered.
"""
# Build session adapter from legacy params if needed
if session is None:
if worker_runtime is None:
raise ValueError("Either session or worker_runtime must be provided")
if graph_runtime is None:
raise ValueError("Either session or graph_runtime must be provided")
session = WorkerSessionAdapter(
worker_runtime=worker_runtime,
graph_runtime=graph_runtime,
event_bus=event_bus,
worker_path=storage_path,
)
@@ -800,18 +868,18 @@ def register_queen_lifecycle_tools(
tools_registered = 0
def _get_runtime():
"""Get current worker runtime from session (late-binding)."""
return getattr(session, "worker_runtime", None)
"""Get current graph runtime from session (late-binding)."""
return getattr(session, "graph_runtime", None)
# --- start_worker ---------------------------------------------------------
# --- start_graph ----------------------------------------------------------
# How long to wait for credential validation + MCP resync before
# proceeding with trigger anyway. These are pre-flight checks that
# should not block the queen indefinitely.
_START_PREFLIGHT_TIMEOUT = 15 # seconds
async def start_worker(task: str) -> str:
"""Start the worker agent with a task description.
async def start_graph(task: str) -> str:
"""Start the loaded graph with a task description.
Triggers the worker's default entry point with the given task.
Returns immediately; the worker runs asynchronously.
@@ -860,13 +928,13 @@ def register_queen_lifecycle_tools(
await asyncio.wait_for(_preflight(), timeout=_START_PREFLIGHT_TIMEOUT)
except TimeoutError:
logger.warning(
"start_worker preflight timed out after %ds — proceeding with trigger",
"start_graph preflight timed out after %ds — proceeding with trigger",
_START_PREFLIGHT_TIMEOUT,
)
except CredentialError:
raise # handled below
# Resume timers in case they were paused by a previous stop_worker
# Resume timers in case they were paused by a previous stop_graph
runtime.resume_timers()
# Get session state from any prior execution for memory continuity
@@ -907,12 +975,12 @@ def register_queen_lifecycle_tools(
)
return json.dumps(error_payload)
except Exception as e:
return json.dumps({"error": f"Failed to start worker: {e}"})
return json.dumps({"error": f"Failed to start graph: {e}"})
_start_tool = Tool(
name="start_worker",
name="start_graph",
description=(
"Start the worker agent with a task description. The worker runs "
"Start the loaded graph with a task description. The graph runs "
"autonomously in the background. Returns an execution ID for tracking."
),
parameters={
@@ -920,19 +988,19 @@ def register_queen_lifecycle_tools(
"properties": {
"task": {
"type": "string",
"description": "Description of the task for the worker to perform",
"description": "Description of the task for the graph to perform",
},
},
"required": ["task"],
},
)
registry.register("start_worker", _start_tool, lambda inputs: start_worker(**inputs))
registry.register("start_graph", _start_tool, lambda inputs: start_graph(**inputs))
tools_registered += 1
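For orientation, a toy version of the register(name, tool, handler) wiring used throughout this module; the real ToolRegistry API may differ:

import asyncio, json

class ToyRegistry:
    def __init__(self) -> None:
        self._handlers = {}

    def register(self, name, tool, handler) -> None:
        # tool carries the schema; only the handler matters here.
        self._handlers[name] = handler

    async def call(self, name: str, inputs: dict) -> str:
        return await self._handlers[name](inputs)

async def start_graph(task: str) -> str:
    return json.dumps({"status": "started", "task": task})

registry = ToyRegistry()
registry.register("start_graph", None, lambda inputs: start_graph(**inputs))
print(asyncio.run(registry.call("start_graph", {"task": "summarize inbox"})))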
# --- stop_worker ----------------------------------------------------------
# --- stop_graph -----------------------------------------------------------
async def stop_worker(*, reason: str = "Stopped by queen") -> str:
"""Cancel all active worker executions across all graphs.
async def stop_graph(*, reason: str = "Stopped by queen") -> str:
"""Cancel all active graph executions across all graphs.
Stops the worker immediately. Returns the IDs of cancelled executions.
"""
@@ -979,21 +1047,60 @@ def register_queen_lifecycle_tools(
)
_stop_tool = Tool(
name="stop_worker",
name="stop_graph",
description=(
"Cancel the worker agent's active execution and pause its timers. "
"The worker stops gracefully. No parameters needed."
"Cancel the loaded graph's active execution and pause its timers. "
"The graph stops gracefully. No parameters needed."
),
parameters={"type": "object", "properties": {}},
)
registry.register("stop_worker", _stop_tool, lambda inputs: stop_worker())
registry.register("stop_graph", _stop_tool, lambda inputs: stop_graph())
tools_registered += 1
# --- stop_worker_and_edit -------------------------------------------------
# --- switch_to_editing ----------------------------------------------------
async def stop_worker_and_edit() -> str:
"""Stop the worker and switch to building phase for editing the agent."""
stop_result = await stop_worker()
async def switch_to_editing_tool() -> str:
"""Stop the worker and switch to editing phase for config tweaks.
The worker stays loaded. You can re-run with different input,
inject config adjustments, or escalate to building/planning.
"""
stop_result = await stop_graph()
if phase_state is not None:
await phase_state.switch_to_editing()
_update_meta_json(session_manager, manager_session_id, {"phase": "editing"})
result = json.loads(stop_result)
result["phase"] = "editing"
result["message"] = (
"Worker stopped. You are now in editing phase. "
"You can re-run with run_agent_with_input(task), tweak config "
"with inject_message, or escalate to building/planning."
)
return json.dumps(result)
_switch_editing_tool = Tool(
name="switch_to_editing",
description=(
"Stop the running worker and switch to editing phase. "
"The worker stays loaded — you can tweak config and re-run. "
"Use this when you want to adjust the worker without rebuilding."
),
parameters={"type": "object", "properties": {}},
)
registry.register(
"switch_to_editing",
_switch_editing_tool,
lambda inputs: switch_to_editing_tool(),
)
tools_registered += 1
# --- stop_graph_and_edit --------------------------------------------------
async def stop_graph_and_edit() -> str:
"""Stop the loaded graph and switch to building phase for editing the agent."""
stop_result = await stop_graph()
# Switch to building phase
if phase_state is not None:
@@ -1003,7 +1110,7 @@ def register_queen_lifecycle_tools(
result = json.loads(stop_result)
result["phase"] = "building"
result["message"] = (
"Worker stopped. You are now in building phase. "
"Graph stopped. You are now in building phase. "
"Use your coding tools to modify the agent, then call "
"load_built_agent(path) to stage it again."
)
@@ -1015,24 +1122,24 @@ def register_queen_lifecycle_tools(
return json.dumps(result)
_stop_edit_tool = Tool(
name="stop_worker_and_edit",
name="stop_graph_and_edit",
description=(
"Stop the running worker and switch to building phase. "
"Stop the running graph and switch to building phase. "
"Use this when you need to modify the agent's code, nodes, or configuration. "
"After editing, call load_built_agent(path) to reload and run."
),
parameters={"type": "object", "properties": {}},
)
registry.register(
"stop_worker_and_edit", _stop_edit_tool, lambda inputs: stop_worker_and_edit()
"stop_graph_and_edit", _stop_edit_tool, lambda inputs: stop_graph_and_edit()
)
tools_registered += 1
# --- stop_worker_and_plan (Running/Staging → Planning) --------------------
# --- stop_graph_and_plan (Running/Staging → Planning) ---------------------
async def stop_worker_and_plan() -> str:
"""Stop the worker and switch to planning phase for diagnosis."""
stop_result = await stop_worker()
async def stop_graph_and_plan() -> str:
"""Stop the loaded graph and switch to planning phase for diagnosis."""
stop_result = await stop_graph()
# Switch to planning phase
if phase_state is not None:
@@ -1041,7 +1148,7 @@ def register_queen_lifecycle_tools(
result = json.loads(stop_result)
result["phase"] = "planning"
result["message"] = (
"Worker stopped. You are now in planning phase. "
"Graph stopped. You are now in planning phase. "
"Diagnose the issue using read-only tools (checkpoints, logs, sessions), "
"discuss a fix plan with the user, then call "
"initialize_and_build_agent() to implement the fix."
@@ -1049,16 +1156,16 @@ def register_queen_lifecycle_tools(
return json.dumps(result)
_stop_plan_tool = Tool(
name="stop_worker_and_plan",
name="stop_graph_and_plan",
description=(
"Stop the worker and switch to planning phase for diagnosis. "
"Stop the graph and switch to planning phase for diagnosis. "
"Use this when you need to investigate an issue before fixing it. "
"After diagnosis, call initialize_and_build_agent() to switch to building."
),
parameters={"type": "object", "properties": {}},
)
registry.register(
"stop_worker_and_plan", _stop_plan_tool, lambda inputs: stop_worker_and_plan()
"stop_graph_and_plan", _stop_plan_tool, lambda inputs: stop_graph_and_plan()
)
tools_registered += 1
@@ -2001,12 +2108,12 @@ def register_queen_lifecycle_tools(
"input_keys": {
"type": "array",
"items": {"type": "string"},
"description": "Expected input memory keys (hints)",
"description": "Expected input buffer keys (hints)",
},
"output_keys": {
"type": "array",
"items": {"type": "string"},
"description": "Expected output memory keys (hints)",
"description": "Expected output buffer keys (hints)",
},
"success_criteria": {
"type": "string",
@@ -2370,16 +2477,16 @@ def register_queen_lifecycle_tools(
lambda inputs: initialize_and_build_agent_wrapper(inputs),
)
# --- stop_worker (Running → Staging) -------------------------------------
# --- stop_graph (Running → Staging) --------------------------------------
async def stop_worker_to_staging() -> str:
"""Stop the running worker and switch to staging phase.
async def stop_graph_to_staging() -> str:
"""Stop the running graph and switch to staging phase.
After stopping, ask the user whether they want to:
1. Re-run the agent with new input → call run_agent_with_input(task)
2. Edit the agent code → call stop_worker_and_edit() to go to building phase
2. Edit the agent code → call stop_graph_and_edit() to go to building phase
"""
stop_result = await stop_worker()
stop_result = await stop_graph()
# Switch to staging phase
if phase_state is not None:
@@ -2389,54 +2496,30 @@ def register_queen_lifecycle_tools(
result = json.loads(stop_result)
result["phase"] = "staging"
result["message"] = (
"Worker stopped. You are now in staging phase. "
"Graph stopped. You are now in staging phase. "
"Ask the user: would they like to re-run with new input, "
"or edit the agent code?"
)
return json.dumps(result)
_stop_worker_tool = Tool(
name="stop_worker",
name="stop_graph",
description=(
"Stop the running worker and switch to staging phase. "
"Stop the running graph and switch to staging phase. "
"After stopping, ask the user whether they want to re-run "
"with new input or edit the agent code."
),
parameters={"type": "object", "properties": {}},
)
registry.register("stop_worker", _stop_worker_tool, lambda inputs: stop_worker_to_staging())
registry.register("stop_graph", _stop_worker_tool, lambda inputs: stop_graph_to_staging())
tools_registered += 1
# --- get_worker_status ----------------------------------------------------
# --- get_graph_status -----------------------------------------------------
def _get_event_bus():
"""Get the session's event bus for querying history."""
return getattr(session, "event_bus", None)
def _get_worker_name() -> str | None:
"""Return the worker agent directory name, used for diary lookups."""
p = getattr(session, "worker_path", None)
return p.name if p else None
def _format_diary(max_runs: int) -> str:
"""Read recent run digests from disk — no EventBus required."""
agent_name = _get_worker_name()
if not agent_name:
return "No worker loaded — diary unavailable."
from framework.agents.worker_memory import read_recent_digests
entries = read_recent_digests(agent_name, max_runs)
if not entries:
return (
f"No run digests for '{agent_name}' yet. "
"Digests are written at the end of each completed run."
)
lines = [f"Worker '{agent_name}'{len(entries)} recent run digest(s):", ""]
for _run_id, content in entries:
lines.append(content)
lines.append("")
return "\n".join(lines).rstrip()
# Tiered cooldowns: summary is free, detail has short cooldown, full keeps 30s
_COOLDOWN_FULL = 30.0
_COOLDOWN_DETAIL = 10.0
@@ -2641,16 +2724,16 @@ def register_queen_lifecycle_tools(
return "\n".join(lines)
async def _format_memory(runtime: AgentRuntime) -> str:
"""Format the worker's shared memory snapshot and recent changes."""
"""Format the worker's shared buffer snapshot and recent changes."""
from framework.runtime.shared_state import IsolationLevel
lines = []
active_streams = runtime.get_active_streams()
if not active_streams:
return "Worker has no active executions. No memory to inspect."
return "Worker has no active executions. No buffer state to inspect."
# Read memory from the first active execution
# Read buffer state from the first active execution
stream_info = active_streams[0]
exec_ids = stream_info.get("active_execution_ids", [])
stream_id = stream_info.get("stream_id", "")
@@ -2658,13 +2741,13 @@ def register_queen_lifecycle_tools(
return "No active execution found."
exec_id = exec_ids[0]
memory = runtime.state_manager.create_memory(exec_id, stream_id, IsolationLevel.SHARED)
state = await memory.read_all()
buf = runtime.state_manager.create_buffer(exec_id, stream_id, IsolationLevel.SHARED)
state = await buf.read_all()
if not state:
lines.append("Worker's shared memory is empty.")
lines.append("Worker's shared buffer is empty.")
else:
lines.append(f"Worker's shared memory ({len(state)} keys):")
lines.append(f"Worker's shared buffer ({len(state)} keys):")
for key, value in state.items():
lines.append(f" {key}: {_preview_value(value)}")
@@ -3024,8 +3107,8 @@ def register_queen_lifecycle_tools(
return result
async def get_worker_status(focus: str | None = None, last_n: int = 20) -> str:
"""Check on the worker with progressive disclosure.
async def get_graph_status(focus: str | None = None, last_n: int = 20) -> str:
"""Check on the loaded graph with progressive disclosure.
Without arguments, returns a brief prose summary. Use ``focus`` to
drill into specifics: activity, memory, tools, issues, progress,
@@ -3039,14 +3122,14 @@ def register_queen_lifecycle_tools(
import time as _time
# --- Tiered cooldown ---
# diary is free (file reads only), summary is free, detail has 10s, full has 30s
# summary is free, detail has 10s, full keeps 30s
now = _time.monotonic()
if focus == "full":
cooldown = _COOLDOWN_FULL
tier = "full"
elif focus == "diary" or focus is None:
elif focus is None:
cooldown = 0.0
tier = focus or "summary"
tier = "summary"
else:
cooldown = _COOLDOWN_DETAIL
tier = "detail"
@@ -3065,10 +3148,6 @@ def register_queen_lifecycle_tools(
)
_status_last_called[tier] = now
# --- Diary: pure file reads, no runtime required ---
if focus == "diary":
return _format_diary(last_n)
# --- Runtime check ---
runtime = _get_runtime()
if runtime is None:
@@ -3118,21 +3197,19 @@ def register_queen_lifecycle_tools(
else:
return (
f"Unknown focus '{focus}'. "
"Valid options: diary, activity, memory, tools, issues, progress, full."
"Valid options: activity, memory, tools, issues, progress, full."
)
except Exception as exc:
logger.exception("get_worker_status error")
logger.exception("get_graph_status error")
return f"Error retrieving status: {exc}"
_status_tool = Tool(
name="get_worker_status",
name="get_graph_status",
description=(
"Check on the worker. Returns a brief prose summary by default. "
"Check on the loaded graph. Returns a brief prose summary by default. "
"Use 'focus' to drill into specifics:\n"
"- diary: persistent run digests from past executions — read this first "
"before digging into live runtime logs\n"
"- activity: current node, transitions, latest LLM output\n"
"- memory: worker's accumulated knowledge and state\n"
"- memory: worker's accumulated buffer state\n"
"- tools: running and recent tool calls\n"
"- issues: retries, stalls, constraint violations\n"
"- progress: goal criteria, token consumption\n"
@@ -3143,10 +3220,9 @@ def register_queen_lifecycle_tools(
"properties": {
"focus": {
"type": "string",
"enum": ["diary", "activity", "memory", "tools", "issues", "progress", "full"],
"enum": ["activity", "memory", "tools", "issues", "progress", "full"],
"description": (
"Aspect to inspect. Omit for a brief summary. "
"Use 'diary' to read persistent run history before checking live logs."
"Aspect to inspect. Omit for a brief summary."
),
},
"last_n": {
@@ -3159,25 +3235,25 @@ def register_queen_lifecycle_tools(
"required": [],
},
)
registry.register("get_worker_status", _status_tool, lambda inputs: get_worker_status(**inputs))
registry.register("get_graph_status", _status_tool, lambda inputs: get_graph_status(**inputs))
tools_registered += 1
# --- inject_worker_message ------------------------------------------------
# --- inject_message -------------------------------------------------------
async def inject_worker_message(content: str) -> str:
"""Send a message to the running worker agent.
async def inject_message(content: str) -> str:
"""Send a message to the running graph.
Injects the message into the worker's active node conversation.
Use this to relay user instructions to the worker.
"""
runtime = _get_runtime()
if runtime is None:
return json.dumps({"error": "No worker loaded in this session."})
return json.dumps({"error": "No graph loaded in this session."})
graph_id = runtime.graph_id
reg = runtime.get_graph_registration(graph_id)
if reg is None:
return json.dumps({"error": "Worker graph not found"})
return json.dumps({"error": "Graph not found"})
# Prefer nodes that are actively waiting (e.g. escalation receivers
# blocked on queen guidance) over the main event-loop node.
@@ -3212,30 +3288,30 @@ def register_queen_lifecycle_tools(
return json.dumps(
{
"error": "No active worker node found — worker may be idle.",
"error": "No active graph node found — graph may be idle.",
}
)
_inject_tool = Tool(
name="inject_worker_message",
name="inject_message",
description=(
"Send a message to the running worker agent. The message is injected "
"into the worker's active node conversation. Use this to relay user "
"instructions or concerns. The worker must be running."
"Send a message to the running graph. The message is injected "
"into the graph's active node conversation. Use this to relay user "
"instructions or concerns. The graph must be running."
),
parameters={
"type": "object",
"properties": {
"content": {
"type": "string",
"description": "Message content to send to the worker",
"description": "Message content to send to the graph",
},
},
"required": ["content"],
},
)
registry.register(
"inject_worker_message", _inject_tool, lambda inputs: inject_worker_message(**inputs)
"inject_message", _inject_tool, lambda inputs: inject_message(**inputs)
)
tools_registered += 1
@@ -3402,10 +3478,10 @@ def register_queen_lifecycle_tools(
runtime = _get_runtime()
if runtime is not None:
try:
await session_manager.unload_worker(manager_session_id)
await session_manager.unload_graph(manager_session_id)
except Exception as e:
logger.error("Failed to unload existing worker: %s", e, exc_info=True)
return json.dumps({"error": f"Failed to unload existing worker: {e}"})
logger.error("Failed to unload existing graph: %s", e, exc_info=True)
return json.dumps({"error": f"Failed to unload existing graph: {e}"})
try:
resolved_path = validate_agent_path(agent_path)
@@ -3460,7 +3536,7 @@ def register_queen_lifecycle_tools(
)
try:
updated_session = await session_manager.load_worker(
updated_session = await session_manager.load_graph(
manager_session_id,
str(resolved_path),
)
@@ -3477,9 +3553,9 @@ def register_queen_lifecycle_tools(
if missing:
missing_by_node[f"{node.name} (id={node.id})"] = sorted(missing)
if missing_by_node:
# Unload the broken worker
# Unload the broken graph
try:
await session_manager.unload_worker(manager_session_id)
await session_manager.unload_graph(manager_session_id)
except Exception:
pass
details = "; ".join(
@@ -3548,19 +3624,19 @@ def register_queen_lifecycle_tools(
await phase_state.switch_to_staging()
_update_meta_json(session_manager, manager_session_id, {"phase": "staging"})
worker_name = info.name if info else updated_session.worker_id
graph_name = info.name if info else updated_session.graph_id
return json.dumps(
{
"status": "loaded",
"phase": "staging",
"message": (
f"Successfully loaded '{worker_name}'. "
f"Successfully loaded '{graph_name}'. "
"You are now in STAGING phase. "
"Call run_agent_with_input(task) to start the worker, "
"or stop_worker_and_edit() to go back to building."
"Call run_agent_with_input(task) to start the graph, "
"or stop_graph_and_edit() to go back to building."
),
"worker_id": updated_session.worker_id,
"worker_name": worker_name,
"graph_id": updated_session.graph_id,
"graph_name": graph_name,
"goal": info.goal_name if info else "",
"node_count": info.node_count if info else 0,
}
@@ -4009,6 +4085,89 @@ def register_queen_lifecycle_tools(
)
tools_registered += 1
# --- save_global_memory --------------------------------------------------
async def save_global_memory_entry(
category: str,
description: str,
content: str,
name: str | None = None,
) -> str:
"""Persist a queen-global memory entry about the user."""
from framework.agents.queen.queen_memory_v2 import (
global_memory_dir as _global_memory_dir,
init_memory_dir as _init_memory_dir,
save_global_memory as _save_global_memory,
)
target_dir = (
phase_state.global_memory_dir
if phase_state is not None and phase_state.global_memory_dir is not None
else _global_memory_dir()
)
_init_memory_dir(target_dir)
try:
filename, path = _save_global_memory(
category=category,
description=description,
content=content,
name=name,
memory_dir=target_dir,
)
return json.dumps(
{
"status": "saved",
"filename": filename,
"path": str(path),
"category": category,
}
)
except ValueError as exc:
return json.dumps({"error": str(exc)})
_save_global_memory_tool = Tool(
name="save_global_memory",
description=(
"Save durable global memory about the user. "
"Only use for user profile, preferences, environment, or feedback."
),
parameters={
"type": "object",
"properties": {
"category": {
"type": "string",
"enum": ["profile", "preference", "environment", "feedback"],
},
"description": {
"type": "string",
"description": "Specific one-line description for future recall selection.",
},
"content": {
"type": "string",
"description": "Durable user-centric memory content.",
},
"name": {
"type": "string",
"description": "Optional short memory title.",
},
},
"required": ["category", "description", "content"],
"additionalProperties": False,
},
)
registry.register(
"save_global_memory",
_save_global_memory_tool,
lambda inputs: save_global_memory_entry(
inputs["category"],
inputs["description"],
inputs["content"],
inputs.get("name"),
),
)
tools_registered += 1
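Assuming the schema above, a well-formed call payload looks like this; the dispatch helper is hypothetical, and only the dict shape is taken from the tool definition:

inputs = {
    "category": "preference",                       # one of the four enum values
    "description": "Prefers weekly summaries over daily digests",
    "content": "Send one consolidated summary every Monday morning.",
    "name": "summary-cadence",                      # optional
}
# result = await call_tool("save_global_memory", inputs)   # hypothetical dispatcher
# On success the handler returns:
#   {"status": "saved", "filename": ..., "path": ..., "category": "preference"}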
# --- list_triggers ---------------------------------------------------------
async def list_triggers() -> str:
+2 -2
View File
@@ -45,13 +45,13 @@ def recall_diary(query: str = "", days_back: int = 7) -> str:
Args:
query: Optional keyword or phrase to filter entries. If empty, all
recent entries are returned.
days_back: How many days to look back (1–30). Defaults to 7.
days_back: How many days to look back (1-30). Defaults to 7.
"""
from datetime import date, timedelta
from framework.agents.queen.queen_memory import format_memory_date, read_episodic_memory
days_back = max(1, min(days_back, 30))
days_back = max(1, min(int(days_back), 30))
today = date.today()
results: list[str] = []
total_chars = 0
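The updated clamp also coerces non-integer input before bounding it; a quick check of the behavior:

def clamp_days_back(days_back) -> int:
    # Mirrors the guard above: coerce to int, then clamp into [1, 30].
    return max(1, min(int(days_back), 30))

assert clamp_days_back(7.9) == 7    # int() truncates before clamping
assert clamp_days_back(0) == 1
assert clamp_days_back(365) == 30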
+4 -179
View File
@@ -1,23 +1,17 @@
"""Worker monitoring tools for Queen triage agents.
"""Worker monitoring tools for Queen runtime inspection.
Three tools are registered by ``register_worker_monitoring_tools()``:
The following tool is registered by ``register_worker_monitoring_tools()``:
- ``get_worker_health_summary`` reads the worker's session log files and
returns a compact health snapshot (recent verdicts, step count, timing).
session_id is optional: if omitted, the most recent active session is
auto-discovered from storage.
- ``emit_escalation_ticket`` validates and publishes an EscalationTicket
to the shared EventBus as a WORKER_ESCALATION_TICKET event.
- ``notify_operator`` emits a QUEEN_INTERVENTION_REQUESTED event so the TUI
can surface a non-disruptive operator notification.
Usage::
from framework.tools.worker_monitoring_tools import register_worker_monitoring_tools
register_worker_monitoring_tools(tool_registry, event_bus, storage_path)
register_worker_monitoring_tools(tool_registry, storage_path)
"""
from __future__ import annotations
@@ -30,7 +24,6 @@ from typing import TYPE_CHECKING
if TYPE_CHECKING:
from framework.runner.tool_registry import ToolRegistry
from framework.runtime.event_bus import EventBus
logger = logging.getLogger(__name__)
@@ -40,20 +33,16 @@ _DEFAULT_LAST_N_STEPS = 40
def register_worker_monitoring_tools(
registry: ToolRegistry,
event_bus: EventBus,
storage_path: Path,
stream_id: str = "monitoring",
worker_graph_id: str | None = None,
default_session_id: str | None = None,
) -> int:
"""Register worker monitoring tools bound to *event_bus* and *storage_path*.
"""Register worker monitoring tools bound to *storage_path*.
Args:
registry: ToolRegistry to register tools on.
event_bus: The shared EventBus for the worker runtime.
storage_path: Root storage path of the worker runtime
(e.g. ``~/.hive/agents/{name}``).
stream_id: Stream ID used when emitting events.
worker_graph_id: The primary worker graph's ID. Included in health summary
so the judge can populate ticket identity fields accurately.
default_session_id: When set, ``get_worker_health_summary`` uses this
@@ -242,168 +231,4 @@ def register_worker_monitoring_tools(
)
tools_registered += 1
# -------------------------------------------------------------------------
# emit_escalation_ticket
# -------------------------------------------------------------------------
async def emit_escalation_ticket(ticket_json: str) -> str:
"""Validate and publish an EscalationTicket to the shared EventBus.
ticket_json must be a JSON string containing all required EscalationTicket
fields. The ticket is validated before publishing.
Returns a confirmation JSON with the ticket_id on success, or an error.
"""
from framework.runtime.escalation_ticket import EscalationTicket
try:
raw = json.loads(ticket_json) if isinstance(ticket_json, str) else ticket_json
ticket = EscalationTicket(**raw)
except Exception as e:
return json.dumps({"error": f"Invalid ticket: {e}"})
try:
await event_bus.emit_worker_escalation_ticket(
stream_id=stream_id,
node_id="monitoring",
ticket=ticket.model_dump(),
)
logger.info(
"EscalationTicket emitted: ticket_id=%s severity=%s cause=%r",
ticket.ticket_id,
ticket.severity,
ticket.cause[:80],
)
return json.dumps(
{
"status": "emitted",
"ticket_id": ticket.ticket_id,
"severity": ticket.severity,
}
)
except Exception as e:
return json.dumps({"error": f"Failed to emit ticket: {e}"})
_emit_ticket_tool = Tool(
name="emit_escalation_ticket",
description=(
"Validate and publish a structured EscalationTicket to the shared EventBus. "
"ticket_json must be a JSON string with all required EscalationTicket fields: "
"worker_agent_id, worker_session_id, worker_node_id, worker_graph_id, "
"severity (low/medium/high/critical), cause, judge_reasoning, suggested_action, "
"recent_verdicts (list), total_steps_checked, steps_since_last_accept, "
"stall_minutes (float or null), evidence_snippet."
),
parameters={
"type": "object",
"properties": {
"ticket_json": {
"type": "string",
"description": "JSON string of the complete EscalationTicket",
},
},
"required": ["ticket_json"],
},
)
registry.register(
"emit_escalation_ticket",
_emit_ticket_tool,
lambda inputs: emit_escalation_ticket(**inputs),
)
tools_registered += 1
# -------------------------------------------------------------------------
# notify_operator
# -------------------------------------------------------------------------
async def notify_operator(
ticket_id: str,
analysis: str,
urgency: str,
) -> str:
"""Emit a QUEEN_INTERVENTION_REQUESTED event to notify the human operator.
The TUI subscribes to this event and surfaces a non-disruptive dismissable
notification. The worker agent is NOT paused. The operator can choose to
open the queen's graph view via Ctrl+Q.
Args:
ticket_id: The ticket_id from the original EscalationTicket.
analysis: 2-3 sentence description of what is wrong, why it matters,
and what action is suggested.
urgency: Severity level: "low", "medium", "high", or "critical".
Returns:
Confirmation JSON.
"""
valid_urgencies = {"low", "medium", "high", "critical"}
if urgency not in valid_urgencies:
return json.dumps(
{"error": f"urgency must be one of {sorted(valid_urgencies)}, got {urgency!r}"}
)
try:
await event_bus.emit_queen_intervention_requested(
stream_id=stream_id,
node_id="ticket_triage",
ticket_id=ticket_id,
analysis=analysis,
severity=urgency,
queen_graph_id="queen",
queen_stream_id="queen",
)
logger.info(
"Queen intervention requested: ticket_id=%s urgency=%s",
ticket_id,
urgency,
)
return json.dumps(
{
"status": "operator_notified",
"ticket_id": ticket_id,
"urgency": urgency,
}
)
except Exception as e:
return json.dumps({"error": f"Failed to notify operator: {e}"})
_notify_tool = Tool(
name="notify_operator",
description=(
"Notify the human operator that a worker agent needs attention. "
"This emits a QUEEN_INTERVENTION_REQUESTED event that the TUI surfaces "
"as a non-disruptive notification. The worker keeps running. "
"Only call this when you (the Queen) have decided the issue warrants "
"human attention after reading the escalation ticket."
),
parameters={
"type": "object",
"properties": {
"ticket_id": {
"type": "string",
"description": "The ticket_id from the EscalationTicket being triaged",
},
"analysis": {
"type": "string",
"description": (
"2-3 sentence analysis: what is wrong, why it matters, "
"and what action you suggest."
),
},
"urgency": {
"type": "string",
"enum": ["low", "medium", "high", "critical"],
"description": "Severity level for the operator notification",
},
},
"required": ["ticket_id", "analysis", "urgency"],
},
)
registry.register(
"notify_operator",
_notify_tool,
lambda inputs: notify_operator(**inputs),
)
tools_registered += 1
return tools_registered
+11 -19
View File
@@ -4,8 +4,6 @@ import type {
InjectResult,
ChatResult,
StopResult,
ResumeResult,
ReplayResult,
GoalProgress,
} from "./types";
@@ -34,16 +32,22 @@ export const executionApi = {
graph_id: graphId,
}),
chat: (sessionId: string, message: string, images?: { type: string; image_url: { url: string } }[]) =>
api.post<ChatResult>(`/sessions/${sessionId}/chat`, { message, ...(images?.length ? { images } : {}) }),
chat: (
sessionId: string,
message: string,
images?: { type: string; image_url: { url: string } }[],
displayMessage?: string,
) =>
api.post<ChatResult>(`/sessions/${sessionId}/chat`, {
message,
...(images?.length ? { images } : {}),
...(displayMessage !== undefined ? { display_message: displayMessage } : {}),
}),
/** Queue context for the queen without triggering an LLM response. */
queenContext: (sessionId: string, message: string) =>
api.post<ChatResult>(`/sessions/${sessionId}/queen-context`, { message }),
workerInput: (sessionId: string, message: string) =>
api.post<ChatResult>(`/sessions/${sessionId}/worker-input`, { message }),
stop: (sessionId: string, executionId: string) =>
api.post<StopResult>(`/sessions/${sessionId}/stop`, {
execution_id: executionId,
@@ -57,18 +61,6 @@ export const executionApi = {
cancelQueen: (sessionId: string) =>
api.post<{ cancelled: boolean }>(`/sessions/${sessionId}/cancel-queen`),
resume: (sessionId: string, workerSessionId: string, checkpointId?: string) =>
api.post<ResumeResult>(`/sessions/${sessionId}/resume`, {
session_id: workerSessionId,
checkpoint_id: checkpointId,
}),
replay: (sessionId: string, workerSessionId: string, checkpointId: string) =>
api.post<ReplayResult>(`/sessions/${sessionId}/replay`, {
session_id: workerSessionId,
checkpoint_id: checkpointId,
}),
goalProgress: (sessionId: string) =>
api.get<GoalProgress>(`/sessions/${sessionId}/goal-progress`),
};
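For reference, the raw request the updated chat() helper now produces, illustrated in Python with httpx. The base URL and session ID are hypothetical, and display_message is assumed to be optional server-side, sent only when set:

import httpx

# Illustration only: mirrors the payload executionApi.chat() builds.
session_id = "sess-123"  # hypothetical
payload = {
    "message": '[Worker asked: "..."] User answered: "..."',  # what the queen sees
    "display_message": 'User answered: "..."',                # what the UI renders
}
# An "images" key is added only when attachments are present, as in the client.
httpx.post(f"http://localhost:8000/api/sessions/{session_id}/chat", json=payload)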
+10 -40
View File
@@ -3,16 +3,13 @@ import type {
AgentEvent,
LiveSession,
LiveSessionDetail,
SessionSummary,
SessionDetail,
Checkpoint,
EntryPoint,
} from "./types";
export const sessionsApi = {
// --- Session lifecycle ---
/** Create a session. If agentPath is provided, loads worker in one step. */
/** Create a session. If agentPath is provided, loads a graph in one step. */
create: (agentPath?: string, agentId?: string, model?: string, initialPrompt?: string, queenResumeFrom?: string) =>
api.post<LiveSession>("/sessions", {
agent_path: agentPath,
@@ -25,7 +22,7 @@ export const sessionsApi = {
/** List all active sessions. */
list: () => api.get<{ sessions: LiveSession[] }>("/sessions"),
/** Get session detail (includes entry_points, graphs when worker is loaded). */
/** Get session detail (includes entry_points, graphs when a graph is loaded). */
get: (sessionId: string) =>
api.get<LiveSessionDetail>(`/sessions/${sessionId}`),
@@ -35,23 +32,23 @@ export const sessionsApi = {
`/sessions/${sessionId}`,
),
// --- Worker lifecycle ---
// --- Graph lifecycle ---
loadWorker: (
loadGraph: (
sessionId: string,
agentPath: string,
workerId?: string,
graphId?: string,
model?: string,
) =>
api.post<LiveSession>(`/sessions/${sessionId}/worker`, {
api.post<LiveSession>(`/sessions/${sessionId}/graph`, {
agent_path: agentPath,
worker_id: workerId,
graph_id: graphId,
model,
}),
unloadWorker: (sessionId: string) =>
api.delete<{ session_id: string; worker_unloaded: boolean }>(
`/sessions/${sessionId}/worker`,
unloadGraph: (sessionId: string) =>
api.delete<{ session_id: string; graph_unloaded: boolean }>(
`/sessions/${sessionId}/graph`,
),
// --- Session info ---
@@ -92,31 +89,4 @@ export const sessionsApi = {
/** Permanently delete a history session (stops live session + removes disk files). */
deleteHistory: (sessionId: string) =>
api.delete<{ deleted: string }>(`/sessions/history/${sessionId}`),
// --- Worker session browsing (persisted execution runs) ---
workerSessions: (sessionId: string) =>
api.get<{ sessions: SessionSummary[] }>(
`/sessions/${sessionId}/worker-sessions`,
),
workerSession: (sessionId: string, wsId: string) =>
api.get<SessionDetail>(
`/sessions/${sessionId}/worker-sessions/${wsId}`,
),
deleteWorkerSession: (sessionId: string, wsId: string) =>
api.delete<{ deleted: string }>(
`/sessions/${sessionId}/worker-sessions/${wsId}`,
),
checkpoints: (sessionId: string, wsId: string) =>
api.get<{ checkpoints: Checkpoint[] }>(
`/sessions/${sessionId}/worker-sessions/${wsId}/checkpoints`,
),
restore: (sessionId: string, wsId: string, checkpointId: string) =>
api.post<{ execution_id: string }>(
`/sessions/${sessionId}/worker-sessions/${wsId}/checkpoints/${checkpointId}/restore`,
),
};
+4 -53
View File
@@ -2,8 +2,8 @@
export interface LiveSession {
session_id: string;
worker_id: string | null;
worker_name: string | null;
graph_id: string | null;
graph_name: string | null;
has_worker: boolean;
agent_path: string;
description: string;
@@ -79,61 +79,11 @@ export interface StopResult {
error?: string;
}
export interface ResumeResult {
execution_id: string;
resumed_from: string;
checkpoint_id: string | null;
}
export interface ReplayResult {
execution_id: string;
replayed_from: string;
checkpoint_id: string;
}
export interface GoalProgress {
progress: number;
criteria: unknown[];
}
// --- Session types ---
export interface SessionSummary {
session_id: string;
status?: string;
started_at?: string | null;
completed_at?: string | null;
steps?: number;
paused_at?: string | null;
checkpoint_count: number;
}
export interface SessionDetail {
status: string;
started_at: string;
completed_at: string | null;
input_data: Record<string, unknown>;
memory: Record<string, unknown>;
progress: {
current_node: string | null;
paused_at: string | null;
steps_executed: number;
path: string[];
node_visit_counts: Record<string, number>;
nodes_with_failures: string[];
resume_from?: string;
};
}
export interface Checkpoint {
checkpoint_id: string;
current_node: string | null;
next_node: string | null;
is_clean: boolean;
timestamp: string | null;
error?: string;
}
export interface Message {
seq: number;
role: string;
@@ -161,6 +111,7 @@ export interface NodeSpec {
routes: Record<string, string>;
max_retries: number;
max_node_visits: number;
/** Deprecated compatibility field; the queen is interactive by identity now. */
client_facing: boolean;
success_criteria: string | null;
system_prompt: string;
@@ -330,7 +281,7 @@ export type EventTypeName =
| "webhook_received"
| "custom"
| "escalation_requested"
| "worker_loaded"
| "worker_graph_loaded"
| "credentials_required"
| "queen_phase_changed"
| "subagent_report"
@@ -0,0 +1,69 @@
import { useState, useEffect } from "react";
type BridgeStatus = "checking" | "connected" | "disconnected" | "offline";
const BRIDGE_STATUS_URL = "/api/browser/status";
const POLL_INTERVAL_MS = 3000;
export default function BrowserStatusBadge() {
const [status, setStatus] = useState<BridgeStatus>("checking");
useEffect(() => {
let cancelled = false;
const check = async () => {
try {
const res = await fetch(BRIDGE_STATUS_URL, {
signal: AbortSignal.timeout(2000),
});
if (cancelled) return;
if (res.ok) {
const data = await res.json();
setStatus(data.connected ? "connected" : "disconnected");
} else {
setStatus("offline");
}
} catch {
if (!cancelled) setStatus("offline");
}
};
check();
const timer = setInterval(check, POLL_INTERVAL_MS);
return () => {
cancelled = true;
clearInterval(timer);
};
}, []);
if (status === "checking") return null;
const label =
status === "connected"
? "Browser connected"
: status === "disconnected"
? "Extension not connected"
: "Browser offline";
const dotClass =
status === "connected"
? "bg-green-500"
: status === "disconnected"
? "bg-yellow-500"
: "bg-muted-foreground/40";
return (
<div
className="flex items-center gap-1.5 text-xs select-none"
title={label}
>
<span className="relative flex h-2 w-2 flex-shrink-0">
{status === "connected" && (
<span className="animate-ping absolute inline-flex h-full w-full rounded-full bg-green-400 opacity-60" />
)}
<span className={`relative inline-flex rounded-full h-2 w-2 ${dotClass}`} />
</span>
<span className="text-muted-foreground hidden sm:inline">Browser</span>
</div>
);
}
+9 -5
View File
@@ -3,6 +3,7 @@ import { useNavigate } from "react-router-dom";
import { Crown, X } from "lucide-react";
import { sessionsApi } from "@/api/sessions";
import { loadPersistedTabs, savePersistedTabs, TAB_STORAGE_KEY, type PersistedTabState } from "@/lib/tab-persistence";
import BrowserStatusBadge from "@/components/BrowserStatusBadge";
export interface TopBarTab {
agentType: string;
@@ -129,11 +130,14 @@ export default function TopBar({ tabs: tabsProp, onTabClick, onCloseTab, canClos
)}
</div>
{children && (
<div className="flex items-center gap-1 flex-shrink-0">
{children}
</div>
)}
<div className="flex items-center gap-3 flex-shrink-0">
<BrowserStatusBadge />
{children && (
<div className="flex items-center gap-1">
{children}
</div>
)}
</div>
</div>
);
}
+1 -2
View File
@@ -78,8 +78,7 @@ export function sseEventToChatMessage(
}
case "client_input_requested":
// Handled explicitly in handleSSEEvent (workspace.tsx) so it can
// create a worker_input_request message and set awaitingInput state.
// Handled explicitly in handleSSEEvent (workspace.tsx) for queen input widgets.
return null;
case "client_input_received": {
+15 -209
View File
@@ -350,8 +350,8 @@ interface AgentBackendState {
pendingOptions: string[] | null;
/** Multiple questions from ask_user_multiple */
pendingQuestions: { id: string; prompt: string; options?: string[] }[] | null;
/** Whether the pending question came from queen or worker */
pendingQuestionSource: "queen" | "worker" | null;
/** Whether the pending question came from the queen interaction flow */
pendingQuestionSource: "queen" | null;
/** Per-node context window usage (from context_usage_updated events) */
contextUsage: Record<string, { usagePct: number; messageCount: number; estimatedTokens: number; maxTokens: number }>;
/** Whether the queen's LLM supports image content — false disables the attach button */
@@ -1118,7 +1118,7 @@ export default function Workspace() {
// At this point liveSession is guaranteed set — if both reconnect and create
// failed, the throw inside the catch exits the outer try block.
const session = liveSession!;
const displayName = formatAgentDisplayName(session.worker_name || agentType);
const displayName = formatAgentDisplayName(session.graph_name || agentType);
const initialPhase = restoredPhase || session.queen_phase || (session.has_worker ? "staging" : "planning");
queenPhaseRef.current[agentType] = initialPhase;
updateAgentState(agentType, {
@@ -1156,7 +1156,6 @@ export default function Workspace() {
});
// Restore messages when rejoining an existing session OR cold-restoring from disk.
let isWorkerRunning = false;
const restoredMsgs: ChatMessage[] = [];
// For cold-restore, use the old session ID. For live resume, use current session.
const historyId = coldRestoreId ?? (isResumedSession ? session.session_id : undefined);
@@ -1172,17 +1171,6 @@ export default function Workspace() {
restoredFlowchartMap = restored.flowchartMap;
restoredOriginalDraft = restored.originalDraft;
}
// Check worker status (needed for isWorkerRunning flag)
try {
const { sessions: workerSessions } = await sessionsApi.workerSessions(historyId);
const resumable = workerSessions.find(
(s) => s.status === "active" || s.status === "paused",
);
isWorkerRunning = resumable?.status === "active";
} catch {
// Worker session listing failed — not critical
}
}
// Merge messages in chronological order (only for live resume; cold restore
@@ -1213,7 +1201,6 @@ export default function Workspace() {
ready: true,
loading: false,
queenReady: !!(isResumedSession || hasRestoredContent),
...(isWorkerRunning ? { workerRunState: "running" } : {}),
// Restore flowchart overlay from persisted events
...(restoredFlowchartMap ? { flowchartMap: restoredFlowchartMap } : {}),
...(restoredOriginalDraft ? { originalDraft: restoredOriginalDraft, draftGraph: null } : {}),
@@ -1784,27 +1771,8 @@ export default function Workspace() {
: null;
if (isQueen) {
const prompt = (event.data?.prompt as string) || "";
const isAutoBlock = !prompt && !options && !questions;
// Queen auto-block (empty prompt, no options) should not
// overwrite a pending worker question — the worker's
// QuestionWidget must stay visible. Use the updater form
// to read the latest state and avoid stale-closure races
// when worker and queen events arrive in the same batch.
setAgentStates(prev => {
const cur = prev[agentType] || defaultAgentState();
const workerQuestionActive = cur.pendingQuestionSource === "worker";
if (isAutoBlock && workerQuestionActive) {
return {
...prev, [agentType]: {
...cur,
awaitingInput: true,
isTyping: false,
isStreaming: false,
queenIsTyping: false,
queenBuilding: false,
}
};
}
return {
...prev, [agentType]: {
...cur,
@@ -1821,37 +1789,11 @@ export default function Workspace() {
};
});
} else {
// Worker input request.
// If the prompt is non-empty (explicit ask_user), create a visible
// message bubble. For auto-block (empty prompt), the worker's text
// was already streamed via client_output_delta — just activate the
// reply box below the last worker message.
const eid = event.execution_id ?? "";
const prompt = (event.data?.prompt as string) || "";
if (prompt) {
const workerInputMsg: ChatMessage = {
id: `worker-input-${eid}-${event.node_id || Date.now()}`,
agent: displayName || event.node_id || "Worker",
agentColor: "",
content: prompt,
timestamp: "",
type: "worker_input_request",
role: "worker",
thread: agentType,
createdAt: eventCreatedAt,
};
console.log('[CLIENT_INPUT_REQ] creating worker_input_request msg:', workerInputMsg.id, 'content:', prompt.slice(0, 80));
upsertChatMessage(agentType, workerInputMsg);
}
updateAgentState(agentType, {
awaitingInput: true,
isTyping: false,
isStreaming: false,
queenIsTyping: false,
pendingQuestion: prompt || null,
pendingOptions: options,
pendingQuestionSource: "worker",
});
console.warn(
"[CLIENT_INPUT_REQ] ignoring non-queen client_input_requested event",
streamId,
event.node_id,
);
}
}
if (event.type === "execution_paused") {
@@ -2305,10 +2247,10 @@ export default function Workspace() {
break;
}
case "worker_loaded": {
const workerName = event.data?.worker_name as string | undefined;
case "worker_graph_loaded": {
const graphName = event.data?.graph_name as string | undefined;
const agentPathFromEvent = event.data?.agent_path as string | undefined;
const displayName = formatAgentDisplayName(workerName || baseAgentType(agentType));
const displayName = formatAgentDisplayName(graphName || baseAgentType(agentType));
// Invalidate cached credential requirements so the modal fetches
// fresh data the next time it opens (the new agent may have
@@ -2641,41 +2583,6 @@ export default function Workspace() {
return;
}
// If worker is awaiting free-text input (no options / no QuestionWidget),
// route the message directly to the worker instead of the queen.
if (agentStates[activeWorker]?.awaitingInput && agentStates[activeWorker]?.pendingQuestionSource === "worker" && !agentStates[activeWorker]?.pendingOptions) {
const state = agentStates[activeWorker];
if (state?.sessionId && state?.ready) {
const userMsg: ChatMessage = {
id: makeId(), agent: "You", agentColor: "",
content: text, timestamp: "", type: "user", thread, createdAt: Date.now(),
};
setSessionsByAgent(prev => ({
...prev,
[activeWorker]: prev[activeWorker].map(s =>
s.id === activeSession.id ? { ...s, messages: [...s.messages, userMsg] } : s
),
}));
updateAgentState(activeWorker, { awaitingInput: false, workerInputMessageId: null, isTyping: true, pendingQuestion: null, pendingOptions: null, pendingQuestions: null, pendingQuestionSource: null });
executionApi.workerInput(state.sessionId, text).catch((err: unknown) => {
const errMsg = err instanceof Error ? err.message : String(err);
const errorChatMsg: ChatMessage = {
id: makeId(), agent: "System", agentColor: "",
content: `Failed to send to worker: ${errMsg}`,
timestamp: "", type: "system", thread, createdAt: Date.now(),
};
setSessionsByAgent(prev => ({
...prev,
[activeWorker]: prev[activeWorker].map(s =>
s.id === activeSession.id ? { ...s, messages: [...s.messages, errorChatMsg] } : s
),
}));
updateAgentState(activeWorker, { isTyping: false, isStreaming: false });
});
}
return;
}
// If queen has a pending question widget, dismiss it when user types directly
if (agentStates[activeWorker]?.pendingQuestionSource === "queen") {
updateAgentState(activeWorker, { pendingQuestion: null, pendingOptions: null, pendingQuestions: null, pendingQuestionSource: null });
@@ -2727,96 +2634,6 @@ export default function Workspace() {
}
}, [activeWorker, activeSession, agentStates, updateAgentState]);
// --- handleWorkerReply: send user input to the worker via dedicated endpoint ---
const handleWorkerReply = useCallback((text: string) => {
if (!activeSession) return;
const state = agentStates[activeWorker];
if (!state?.sessionId || !state?.ready) return;
// Add user reply to chat thread
const userMsg: ChatMessage = {
id: makeId(), agent: "You", agentColor: "",
content: text, timestamp: "", type: "user", thread: activeWorker, createdAt: Date.now(),
};
setSessionsByAgent(prev => ({
...prev,
[activeWorker]: prev[activeWorker].map(s =>
s.id === activeSession.id ? { ...s, messages: [...s.messages, userMsg] } : s
),
}));
// Clear awaiting state optimistically
updateAgentState(activeWorker, { awaitingInput: false, workerInputMessageId: null, isTyping: true, pendingQuestion: null, pendingOptions: null, pendingQuestions: null, pendingQuestionSource: null });
executionApi.workerInput(state.sessionId, text).catch((err: unknown) => {
const errMsg = err instanceof Error ? err.message : String(err);
const errorChatMsg: ChatMessage = {
id: makeId(), agent: "System", agentColor: "",
content: `Failed to send to worker: ${errMsg}`,
timestamp: "", type: "system", thread: activeWorker, createdAt: Date.now(),
};
setSessionsByAgent(prev => ({
...prev,
[activeWorker]: prev[activeWorker].map(s =>
s.id === activeSession.id ? { ...s, messages: [...s.messages, errorChatMsg] } : s
),
}));
updateAgentState(activeWorker, { isTyping: false, isStreaming: false });
});
}, [activeWorker, activeSession, agentStates, updateAgentState]);
// --- handleWorkerQuestionAnswer: route predefined answers direct to worker, "Other" through queen ---
const handleWorkerQuestionAnswer = useCallback((answer: string, isOther: boolean) => {
if (!activeSession) return;
const state = agentStates[activeWorker];
const question = state?.pendingQuestion || "";
const opts = state?.pendingOptions;
if (isOther) {
// "Other" free-text → route through queen for evaluation
updateAgentState(activeWorker, { pendingQuestion: null, pendingOptions: null, pendingQuestions: null, pendingQuestionSource: null });
if (question && opts && state?.sessionId && state?.ready) {
const formatted = `[Worker asked: "${question}" | Options: ${opts.join(", ")}]\nUser answered: "${answer}"`;
const userMsg: ChatMessage = {
id: makeId(), agent: "You", agentColor: "",
content: answer, timestamp: "", type: "user", thread: activeWorker, createdAt: Date.now(),
};
setSessionsByAgent(prev => ({
...prev,
[activeWorker]: prev[activeWorker].map(s =>
s.id === activeSession.id ? { ...s, messages: [...s.messages, userMsg] } : s
),
}));
updateAgentState(activeWorker, { isTyping: true, queenIsTyping: true });
executionApi.chat(state.sessionId, formatted).catch((err: unknown) => {
const errMsg = err instanceof Error ? err.message : String(err);
const errorChatMsg: ChatMessage = {
id: makeId(), agent: "System", agentColor: "",
content: `Failed to send message: ${errMsg}`,
timestamp: "", type: "system", thread: activeWorker, createdAt: Date.now(),
};
setSessionsByAgent(prev => ({
...prev,
[activeWorker]: prev[activeWorker].map(s =>
s.id === activeSession.id ? { ...s, messages: [...s.messages, errorChatMsg] } : s
),
}));
updateAgentState(activeWorker, { isTyping: false, isStreaming: false, queenIsTyping: false });
});
} else {
handleSend(answer, activeWorker);
}
} else {
// Predefined option → send directly to worker
handleWorkerReply(answer);
// Queue context for queen (fire-and-forget, no LLM response triggered)
if (question && state?.sessionId && state?.ready) {
const notification = `[Worker asked: "${question}" | User selected: "${answer}"]`;
executionApi.queenContext(state.sessionId, notification).catch(() => { });
}
}
}, [activeWorker, activeSession, agentStates, handleWorkerReply, handleSend, updateAgentState, setSessionsByAgent]);
// --- handleQueenQuestionAnswer: submit queen's own question answer via /chat ---
// The queen asked the question herself, so she already has context — just send the raw answer.
const handleQueenQuestionAnswer = useCallback((answer: string, _isOther: boolean) => {
@@ -2838,11 +2655,9 @@ export default function Workspace() {
}, [activeWorker, handleSend, updateAgentState]);
// --- handleQuestionDismiss: user closed the question widget without answering ---
// Injects a dismiss signal so the blocked node can continue.
const handleQuestionDismiss = useCallback(() => {
const state = agentStates[activeWorker];
if (!state?.sessionId) return;
const source = state.pendingQuestionSource;
const question = state.pendingQuestion || "";
// Clear UI state immediately
@@ -2854,13 +2669,8 @@ export default function Workspace() {
awaitingInput: false,
});
// Unblock the waiting node with a dismiss signal
const dismissMsg = `[User dismissed the question: "${question}"]`;
if (source === "worker") {
executionApi.workerInput(state.sessionId, dismissMsg).catch(() => { });
} else {
executionApi.chat(state.sessionId, dismissMsg).catch(() => { });
}
executionApi.chat(state.sessionId, dismissMsg).catch(() => { });
}, [agentStates, activeWorker, updateAgentState]);
const handleLoadAgent = useCallback(async (agentPath: string) => {
@@ -2868,8 +2678,8 @@ export default function Workspace() {
if (!state?.sessionId) return;
try {
await sessionsApi.loadWorker(state.sessionId, agentPath);
// Success: worker_loaded SSE event will handle UI updates automatically
await sessionsApi.loadGraph(state.sessionId, agentPath);
// Success: worker_graph_loaded SSE event will handle UI updates automatically
} catch (err) {
// 424 = credentials required — open the credentials modal
if (err instanceof ApiError && err.status === 424) {
@@ -3232,11 +3042,7 @@ export default function Workspace() {
pendingQuestion={activeAgentState?.awaitingInput ? activeAgentState.pendingQuestion : null}
pendingOptions={activeAgentState?.awaitingInput ? activeAgentState.pendingOptions : null}
pendingQuestions={activeAgentState?.awaitingInput ? activeAgentState.pendingQuestions : null}
onQuestionSubmit={
activeAgentState?.pendingQuestionSource === "queen"
? handleQueenQuestionAnswer
: handleWorkerQuestionAnswer
}
onQuestionSubmit={handleQueenQuestionAnswer}
onMultiQuestionSubmit={handleMultiQuestionAnswer}
onQuestionDismiss={handleQuestionDismiss}
contextUsage={activeAgentState?.contextUsage}
+2 -1
@@ -14,6 +14,7 @@ The script detects available credentials and prompts you to pick a provider. You
- `ANTHROPIC_API_KEY`
- `OPENAI_API_KEY`
- `GEMINI_API_KEY`
- `KIMI_API_KEY`
- `ZAI_API_KEY`
- Claude Code / Codex / Kimi subscription
@@ -35,7 +36,7 @@ uv run python tests/dummy_agents/run_all.py --verbose
| parallel_merge | 4 | Fan-out/fan-in, failure strategies |
| retry | 4 | Retry mechanics, exhaustion, ON_FAILURE edges |
| feedback_loop | 3 | Feedback cycles, max_node_visits |
| worker | 4 | Real MCP tools (example_tool, get_current_time, save_data/load_data) |
| worker | 5 | Real MCP tools plus a two-worker artifact round-trip smoke test |
## Notes
+227 -5
@@ -6,6 +6,9 @@ Run via: cd core && uv run python tests/dummy_agents/run_all.py
from __future__ import annotations
import asyncio
import json
import os
from pathlib import Path
import pytest
@@ -21,6 +24,7 @@ _selected_model: str | None = None
_selected_api_key: str | None = None
_selected_extra_headers: dict[str, str] | None = None
_selected_api_base: str | None = None
_EXECUTION_TIMEOUT_SECS = float(os.environ.get("DUMMY_AGENT_EXEC_TIMEOUT_SECS", "90"))
def set_llm_selection(
@@ -40,18 +44,55 @@ def set_llm_selection(
# ── collection hook: skip entire directory when not configured ───────
def _try_auto_configure_from_hive_config() -> bool:
"""Try to load LLM provider from ~/.hive/configuration.json.
Returns True if successfully configured, False otherwise.
"""
try:
from framework.config import (
get_api_base,
get_api_key,
get_llm_extra_kwargs,
get_preferred_model,
)
model = get_preferred_model()
api_key = get_api_key()
if not model or not api_key:
return False
extra_kwargs = get_llm_extra_kwargs()
set_llm_selection(
model=model,
api_key=api_key,
api_base=get_api_base(),
extra_headers=extra_kwargs.get("extra_headers"),
)
return True
except Exception:
return False
def pytest_collection_modifyitems(config, items):
"""Skip all dummy_agents tests when no LLM is configured.
This prevents these tests from running in regular CI. They only run
when an LLM is configured, either via run_all.py (which calls
set_llm_selection first) or via ~/.hive/configuration.json.
Resolution order:
1. Already configured via run_all.py (set_llm_selection called)
2. Auto-configure from ~/.hive/configuration.json
3. Skip tests
"""
if _selected_model is not None:
return # LLM configured, run normally
return # LLM configured via run_all.py, run normally
# Try auto-configure from hive config
if _try_auto_configure_from_hive_config():
return # Config found, run tests
skip = pytest.mark.skip(
reason="Dummy agent tests require a real LLM. "
"Run via: cd core && uv run python tests/dummy_agents/run_all.py"
"Configure ~/.hive/configuration.json or "
"run via: cd core && uv run python tests/dummy_agents/run_all.py"
)
for item in items:
if "dummy_agents" in str(item.fspath):
@@ -120,6 +161,8 @@ def make_executor(
loop_config: dict | None = None,
tool_registry=None,
storage_path: Path | None = None,
event_bus=None,
stream_id: str = "",
) -> GraphExecutor:
"""Factory that creates a GraphExecutor with a real LLM."""
tools = []
@@ -128,7 +171,7 @@ def make_executor(
tools = list(tool_registry.get_tools().values())
tool_executor = tool_registry.get_executor()
return GraphExecutor(
executor = GraphExecutor(
runtime=runtime,
llm=llm,
tools=tools,
@@ -137,4 +180,183 @@ def make_executor(
parallel_config=parallel_config,
loop_config=loop_config or {"max_iterations": 10},
storage_path=storage_path,
event_bus=event_bus,
stream_id=stream_id,
)
original_execute = executor.execute
async def execute_with_timeout(*args, **kwargs):
try:
return await asyncio.wait_for(
original_execute(*args, **kwargs),
timeout=_EXECUTION_TIMEOUT_SECS,
)
except TimeoutError as e:
raise TimeoutError(
"Dummy agent execution timed out after "
f"{_EXECUTION_TIMEOUT_SECS:.0f}s. "
"This usually means the current worker execution path "
"(GraphExecutor -> WorkerAgent -> EventLoopNode) is stuck "
"waiting on the provider or tool-calling behavior."
) from e
executor.execute = execute_with_timeout # type: ignore[method-assign]
return executor
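# A usage sketch (hypothetical caller; names mirror the fixtures used by the
# tests below). The monkey-patched execute() means a stuck provider surfaces
# as a TimeoutError instead of hanging pytest:
#
#     executor = make_executor(runtime, llm_provider, loop_config={"max_iterations": 5})
#     result = await executor.execute(graph, goal, {"raw": "hi"}, validate_graph=False)
#
# Slow providers can raise the ceiling via the env override defined above:
#     DUMMY_AGENT_EXEC_TIMEOUT_SECS=300 uv run python tests/dummy_agents/run_all.py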
# ── Artifact capture: raw output written to disk for every test ──────
ARTIFACTS_DIR = Path("/tmp/hive_test_artifacts")
class TestArtifact:
"""Collects raw output + expected behavior for a single test.
Captures TWO kinds of data:
1. Checks: individual assertion results (expected vs actual)
2. Framework raw output: the real conversation, state, tool calls
written by the executor to storage_path, copied verbatim,
not curated.
Usage in tests:
async def test_foo(artifact, ...):
result = await executor.execute(...)
artifact.record(result, expected="...", storage_path=tmp_path/"session")
"""
def __init__(self, test_id: str):
self.test_id = test_id
self._safe_name = test_id.replace("::", "__").replace("/", "_")
self._dir = ARTIFACTS_DIR / self._safe_name
self._data: dict = {"test_id": test_id, "raw_output": None, "expected": "", "checks": []}
def record(self, result, *, expected: str = "", storage_path=None):
"""Record an ExecutionResult and copy real framework files."""
self._data["expected"] = expected
if result is None:
self._data["raw_output"] = None
return
self._data["raw_output"] = {
"success": getattr(result, "success", None),
"output": _safe_serialize(getattr(result, "output", {})),
"error": getattr(result, "error", None),
"path": getattr(result, "path", []),
"steps_executed": getattr(result, "steps_executed", 0),
"total_tokens": getattr(result, "total_tokens", 0),
"total_latency_ms": getattr(result, "total_latency_ms", 0),
"execution_quality": getattr(result, "execution_quality", ""),
"total_retries": getattr(result, "total_retries", 0),
"node_visit_counts": getattr(result, "node_visit_counts", {}),
"nodes_with_failures": getattr(result, "nodes_with_failures", []),
"session_state_buffer": _safe_serialize(
(getattr(result, "session_state", {}) or {}).get("data_buffer", {})
),
}
# Copy real framework output files (conversations, state, runs)
if storage_path is not None:
self._copy_framework_files(Path(storage_path))
def _copy_framework_files(self, storage_path: Path):
"""Copy real framework output to persistent artifact directory."""
import shutil
raw_dir = self._dir / "raw"
raw_dir.mkdir(parents=True, exist_ok=True)
if storage_path.exists():
for src in storage_path.rglob("*"):
if src.is_file() and src.suffix in (".json", ".jsonl", ".txt"):
rel = src.relative_to(storage_path)
dst = raw_dir / rel
dst.parent.mkdir(parents=True, exist_ok=True)
shutil.copy2(src, dst)
def record_value(self, key: str, value, *, expected: str = ""):
"""Record an arbitrary key-value (for non-ExecutionResult tests)."""
self._data.setdefault("values", {})[key] = _safe_serialize(value)
if expected:
self._data["expected"] = expected
def check(self, description: str, passed: bool, actual: str = "", expected_val: str = ""):
"""Record an individual assertion check."""
self._data["checks"].append({
"description": description,
"passed": passed,
"actual": actual,
"expected": expected_val,
})
def save(self):
"""Write artifact to disk."""
self._dir.mkdir(parents=True, exist_ok=True)
path = self._dir / "artifact.json"
with open(path, "w") as f:
json.dump(self._data, f, indent=2, default=str)
def _safe_serialize(obj):
"""Convert to JSON-safe types."""
if obj is None:
return None
if isinstance(obj, (str, int, float, bool)):
return obj
if isinstance(obj, dict):
return {str(k): _safe_serialize(v) for k, v in obj.items()}
if isinstance(obj, (list, tuple)):
return [_safe_serialize(v) for v in obj]
return str(obj)[:500]
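# Worked example (follows directly from the branches above; illustrative only):
#     _safe_serialize({"path": Path("/tmp/x"), 3: [1, (2, None)]})
#     -> {"path": "/tmp/x", "3": [1, [2, None]]}
# Non-primitive leaves (Path here) fall through to str(obj)[:500], dict keys
# are coerced to str, and tuples come back as lists.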
@pytest.fixture
def artifact(request, tmp_path):
"""Fixture that captures raw test output to disk.
Every test gets an artifact recorder. Call artifact.record(result)
and artifact.check("description", passed, actual, expected) to
capture data. Saved automatically on teardown.
On teardown, copies ALL framework output files (conversations, state,
tool logs) from tmp_path to the persistent artifact directory. This
captures the REAL raw output, not curated summaries.
"""
test_id = request.node.nodeid
art = TestArtifact(test_id)
yield art
# Copy all framework files from the test's tmp_path
art._copy_framework_files(tmp_path)
art.save()
# Autouse hook: for tests that DON'T use the artifact fixture,
# create a minimal artifact from pass/fail status.
@pytest.hookimpl(tryfirst=True, hookwrapper=True)
def pytest_runtest_makereport(item, call):
outcome = yield
rep = outcome.get_result()
if rep.when == "call":
item._test_report = rep
def pytest_runtest_teardown(item, nextitem):
"""Auto-save a minimal artifact for tests that didn't use the fixture."""
report = getattr(item, "_test_report", None)
if report is None:
return
# Check if the test already used the artifact fixture
if "artifact" in item.fixturenames:
return # Already handled by fixture teardown
safe_name = item.nodeid.replace("::", "__").replace("/", "_")
out_dir = ARTIFACTS_DIR / safe_name
out_dir.mkdir(parents=True, exist_ok=True)
data = {
"test_id": item.nodeid,
"raw_output": None,
"expected": "",
"checks": [],
"auto_captured": True,
"status": "PASS" if report.passed else ("FAIL" if report.failed else "SKIP"),
}
if report.failed and report.longreprtext:
data["failure_text"] = report.longreprtext[:5000]
with open(out_dir / "artifact.json", "w") as f:
json.dump(data, f, indent=2, default=str)
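# Resulting on-disk layout (illustrative; directory names follow the
# _safe_name rule above, e.g. for tests/dummy_agents/test_foo.py::test_bar):
#
#     /tmp/hive_test_artifacts/
#         tests_dummy_agents_test_foo.py__test_bar/
#             artifact.json   # checks + serialized ExecutionResult, or the
#                             # minimal auto-captured pass/fail record
#             raw/            # verbatim .json/.jsonl/.txt files copied from
#                             # the test's storage_path / tmp_path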
+361 -20
@@ -10,30 +10,33 @@ Usage:
from __future__ import annotations
import asyncio
import os
import sys
import time
import xml.etree.ElementTree as ET
from pathlib import Path
from tempfile import NamedTemporaryFile
from tempfile import NamedTemporaryFile, TemporaryDirectory
TESTS_DIR = Path(__file__).parent
# ── provider registry ────────────────────────────────────────────────
# (env_var, display_name, default_model) — models match quickstart.sh defaults
# (env_var, display_name, litellm_model, display_model)
# display_model matches quickstart.sh labels; litellm_model is what LiteLLMProvider needs.
API_KEY_PROVIDERS = [
("ANTHROPIC_API_KEY", "Anthropic (Claude)", "claude-sonnet-4-20250514"),
("OPENAI_API_KEY", "OpenAI", "gpt-5-mini"),
("GEMINI_API_KEY", "Google Gemini", "gemini/gemini-3-flash-preview"),
("ZAI_API_KEY", "ZAI (GLM)", "openai/glm-5"),
("GROQ_API_KEY", "Groq", "moonshotai/kimi-k2-instruct-0905"),
("MISTRAL_API_KEY", "Mistral", "mistral-large-latest"),
("CEREBRAS_API_KEY", "Cerebras", "cerebras/zai-glm-4.7"),
("TOGETHER_API_KEY", "Together AI", "together_ai/meta-llama/Llama-3.3-70B-Instruct-Turbo"),
("DEEPSEEK_API_KEY", "DeepSeek", "deepseek-chat"),
("MINIMAX_API_KEY", "MiniMax", "MiniMax-M2.5"),
("HIVE_API_KEY", "Hive LLM", "hive/queen"),
("ANTHROPIC_API_KEY", "Anthropic (Claude)", "claude-sonnet-4-20250514", "claude-sonnet-4-20250514"),
("OPENAI_API_KEY", "OpenAI", "gpt-5-mini", "gpt-5-mini"),
("GEMINI_API_KEY", "Google Gemini", "gemini/gemini-3-flash-preview", "gemini/gemini-3-flash-preview"),
("KIMI_API_KEY", "Kimi", "kimi/kimi-k2.5", "kimi-k2.5"),
("ZAI_API_KEY", "ZAI (GLM)", "openai/glm-5", "openai/glm-5"),
("GROQ_API_KEY", "Groq", "moonshotai/kimi-k2-instruct-0905", "moonshotai/kimi-k2-instruct-0905"),
("MISTRAL_API_KEY", "Mistral", "mistral-large-latest", "mistral-large-latest"),
("CEREBRAS_API_KEY", "Cerebras", "cerebras/zai-glm-4.7", "cerebras/zai-glm-4.7"),
("TOGETHER_API_KEY", "Together AI", "together_ai/meta-llama/Llama-3.3-70B-Instruct-Turbo", "together_ai/meta-llama/Llama-3.3-70B-Instruct-Turbo"),
("DEEPSEEK_API_KEY", "DeepSeek", "deepseek-chat", "deepseek-chat"),
("MINIMAX_API_KEY", "MiniMax", "MiniMax-M2.5", "MiniMax-M2.5"),
("HIVE_API_KEY", "Hive LLM", "hive/queen", "hive/queen"),
]
@@ -81,6 +84,7 @@ def detect_available() -> list[dict]:
{
"name": "Claude Code (subscription)",
"model": "claude-sonnet-4-20250514",
"display_model": "claude-sonnet-4-20250514",
"api_key": token,
"source": "claude_code_sub",
"extra_headers": {"authorization": f"Bearer {token}"},
@@ -93,6 +97,7 @@ def detect_available() -> list[dict]:
{
"name": "Codex (subscription)",
"model": "gpt-5-mini",
"display_model": "gpt-5-mini",
"api_key": token,
"source": "codex_sub",
}
@@ -103,30 +108,71 @@ def detect_available() -> list[dict]:
available.append(
{
"name": "Kimi Code (subscription)",
"model": "moonshotai/kimi-k2-instruct-0905",
# Quickstart displays "kimi-k2.5", but LiteLLMProvider needs the
# provider-prefixed form to route through the Kimi coding endpoint.
"model": "kimi/kimi-k2.5",
"display_model": "kimi-k2.5",
"api_key": token,
"source": "kimi_sub",
"api_base": "https://api.kimi.com/coding",
}
)
# API key providers (env vars)
for env_var, name, default_model in API_KEY_PROVIDERS:
for env_var, name, default_model, display_model in API_KEY_PROVIDERS:
key = os.environ.get(env_var)
if key:
entry = {
"name": f"{name} (${env_var})",
"model": default_model,
"display_model": display_model,
"api_key": key,
"source": env_var,
}
# ZAI requires an api_base (OpenAI-compatible endpoint)
if env_var == "ZAI_API_KEY":
entry["api_base"] = "https://api.z.ai/api/coding/paas/v4"
# Kimi Code uses the coding endpoint selected by quickstart.
elif env_var == "KIMI_API_KEY":
entry["api_base"] = "https://api.kimi.com/coding"
available.append(entry)
return available
def _load_from_hive_config() -> dict | None:
"""Try to load LLM provider from ~/.hive/configuration.json.
Returns a provider dict matching the format expected by
set_llm_selection(), or None if config is missing/incomplete.
"""
try:
from framework.config import (
get_api_base,
get_api_key,
get_llm_extra_kwargs,
get_preferred_model,
)
except ImportError:
return None
model = get_preferred_model()
api_key = get_api_key()
if not model or not api_key:
return None
extra_kwargs = get_llm_extra_kwargs()
return {
"name": f"Hive config ({model})",
"model": model,
"display_model": model,
"api_key": api_key,
"api_base": get_api_base(),
"extra_headers": extra_kwargs.get("extra_headers"),
"source": "hive_config",
}
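# Shape of the returned provider dict (field values are illustrative; the
# keys match what smoke_test_provider() and set_llm_selection() consume):
#     {
#         "name": "Hive config (claude-sonnet-4-20250514)",
#         "model": "claude-sonnet-4-20250514",
#         "display_model": "claude-sonnet-4-20250514",
#         "api_key": "sk-...",
#         "api_base": None,
#         "extra_headers": None,
#         "source": "hive_config",
#     }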
def prompt_provider_selection() -> dict:
"""Interactive prompt to select an LLM provider. Returns the chosen provider dict."""
available = detect_available()
@@ -136,17 +182,19 @@ def prompt_provider_selection() -> dict:
print(" Set an API key environment variable, e.g.:")
print(" export ANTHROPIC_API_KEY=sk-...")
print(" export OPENAI_API_KEY=sk-...")
print(" export KIMI_API_KEY=...")
print(" Or authenticate with Claude Code: claude")
print(" Or authenticate with Kimi Code: kimi /login")
sys.exit(1)
if len(available) == 1:
choice = available[0]
print(f"\n Using: {choice['name']} ({choice['model']})")
print(f"\n Using: {choice['name']} ({choice.get('display_model', choice['model'])})")
return choice
print("\n Available LLM providers:\n")
for i, p in enumerate(available, 1):
print(f" {i}) {p['name']} [{p['model']}]")
print(f" {i}) {p['name']} [{p.get('display_model', p['model'])}]")
print()
while True:
@@ -155,13 +203,296 @@ def prompt_provider_selection() -> dict:
idx = int(raw) - 1
if 0 <= idx < len(available):
choice = available[idx]
print(f"\n Using: {choice['name']} ({choice['model']})\n")
print(
f"\n Using: {choice['name']} "
f"({choice.get('display_model', choice['model'])})\n"
)
return choice
except (ValueError, EOFError):
pass
print(f" Please enter a number between 1 and {len(available)}")
async def _smoke_test_provider_async(provider: dict, timeout_seconds: float = 25.0) -> None:
"""Fail fast if the selected provider cannot complete a tiny request.
This catches the common "pytest looks frozen on the first test" failure mode
where the first real LLM call hangs or never reaches a usable response.
"""
from framework.llm.litellm import LiteLLMProvider
from framework.llm.provider import Tool
from framework.graph.edge import GraphSpec
from framework.graph.executor import GraphExecutor
from framework.graph.goal import Goal
from framework.graph.node import NodeSpec
from framework.runtime.core import Runtime
kwargs = {
"model": provider["model"],
"api_key": provider["api_key"],
}
if provider.get("api_base"):
kwargs["api_base"] = provider["api_base"]
if provider.get("extra_headers"):
kwargs["extra_headers"] = provider["extra_headers"]
llm = LiteLLMProvider(**kwargs)
async def _run_plain_completion() -> None:
result = await llm.acomplete(
messages=[{"role": "user", "content": "Reply with exactly OK."}],
max_tokens=8,
)
content = (result.content or "").strip()
if not content:
raise RuntimeError("provider returned an empty completion during smoke test")
async def _run_tool_completion() -> None:
tool = Tool(
name="record_result",
description="Record the final result string.",
parameters={
"properties": {
"value": {
"type": "string",
"description": "The result to record.",
}
},
"required": ["value"],
},
)
response = await llm.acomplete(
messages=[
{
"role": "user",
"content": (
"Call the record_result tool exactly once with value='OK'. "
"Do not answer with plain text."
),
}
],
tools=[tool],
max_tokens=32,
)
raw = response.raw_response
tool_calls = []
if raw is not None and getattr(raw, "choices", None):
msg = raw.choices[0].message
tool_calls = msg.tool_calls or []
if not tool_calls:
raise RuntimeError("provider completed but did not return any tool calls")
async def _run_worker_execution() -> None:
with TemporaryDirectory(prefix="dummy-worker-smoke-") as tmpdir:
tmp_path = Path(tmpdir)
runtime = Runtime(storage_path=tmp_path / "runtime")
executor = GraphExecutor(
runtime=runtime,
llm=llm,
storage_path=tmp_path / "session",
loop_config={"max_iterations": 4},
)
graph = GraphSpec(
id="dummy-worker-smoke",
goal_id="dummy-worker-smoke-goal",
entry_node="worker",
entry_points={"start": "worker"},
terminal_nodes=["worker"],
nodes=[
NodeSpec(
id="worker",
name="Worker Smoke Test",
description="Minimal worker-path smoke test",
node_type="event_loop",
input_keys=["task"],
output_keys=["result"],
system_prompt=(
"You are a worker test node. Read the 'task' input. "
"You MUST call set_output with key='result' and value='OK'. "
"Do not use plain text as the final answer."
),
)
],
edges=[],
memory_keys=["task", "result"],
conversation_mode="continuous",
)
goal = Goal(
id="dummy-worker-smoke-goal",
name="Dummy Worker Smoke",
description="Verify the current worker execution implementation can finish.",
)
result = await executor.execute(
graph,
goal,
{"task": "Return OK by calling set_output."},
validate_graph=False,
)
if not result.success:
raise RuntimeError(result.error or "worker execution smoke failed")
if result.output.get("result") != "OK":
raise RuntimeError(
"worker execution completed but did not produce result='OK'"
)
async def _run_branch_execution() -> None:
with TemporaryDirectory(prefix="dummy-branch-smoke-") as tmpdir:
tmp_path = Path(tmpdir)
runtime = Runtime(storage_path=tmp_path / "runtime")
executor = GraphExecutor(
runtime=runtime,
llm=llm,
storage_path=tmp_path / "session",
loop_config={"max_iterations": 4},
)
graph = GraphSpec(
id="dummy-branch-smoke",
goal_id="dummy-branch-smoke-goal",
entry_node="classify",
entry_points={"start": "classify"},
terminal_nodes=["positive", "negative"],
nodes=[
NodeSpec(
id="classify",
name="Branch Classifier",
description="Routes to the positive or negative handler",
node_type="event_loop",
input_keys=["route"],
output_keys=["label"],
system_prompt=(
"Read the 'route' input. "
"If it is exactly 'positive', call set_output with "
"key='label' and value='positive'. "
"Otherwise call set_output with key='label' and value='negative'. "
"Do not use plain text as the final answer."
),
),
NodeSpec(
id="positive",
name="Positive Branch",
description="Positive terminal branch",
node_type="event_loop",
output_keys=["result"],
system_prompt=(
"Call set_output with key='result' and value='BRANCH_OK'. "
"Do not use plain text as the final answer."
),
),
NodeSpec(
id="negative",
name="Negative Branch",
description="Negative terminal branch",
node_type="event_loop",
output_keys=["result"],
system_prompt=(
"Call set_output with key='result' and value='UNEXPECTED_NEGATIVE'. "
"Do not use plain text as the final answer."
),
),
],
edges=[
{
"id": "classify-to-positive",
"source": "classify",
"target": "positive",
"condition": "conditional",
"condition_expr": "output.get('label') == 'positive'",
"priority": 1,
},
{
"id": "classify-to-negative",
"source": "classify",
"target": "negative",
"condition": "conditional",
"condition_expr": "output.get('label') == 'negative'",
"priority": 0,
},
],
memory_keys=["route", "label", "result"],
conversation_mode="continuous",
)
goal = Goal(
id="dummy-branch-smoke-goal",
name="Dummy Branch Smoke",
description="Verify conditional worker routing reaches the expected terminal.",
)
result = await executor.execute(
graph,
goal,
{"route": "positive"},
validate_graph=False,
)
if not result.success:
raise RuntimeError(result.error or "branch execution smoke failed")
if result.path != ["classify", "positive"]:
raise RuntimeError(
"branch execution did not reach the expected terminal path: "
f"{result.path}"
)
if not result.output.get("result"):
raise RuntimeError(
"branch execution reached the expected terminal path but did not "
f"produce a non-empty result output: path={result.path} "
f"output={result.output}"
)
current_step = "plain completion"
current_timeout = timeout_seconds
worker_timeout = max(
timeout_seconds,
float(os.environ.get("DUMMY_AGENT_SMOKE_WORKER_TIMEOUT_SECS", "30")),
)
branch_timeout = max(
timeout_seconds,
float(os.environ.get("DUMMY_AGENT_SMOKE_BRANCH_TIMEOUT_SECS", "60")),
)
try:
await asyncio.wait_for(_run_plain_completion(), timeout=current_timeout)
current_step = "tool calling"
current_timeout = timeout_seconds
await asyncio.wait_for(_run_tool_completion(), timeout=current_timeout)
current_step = "single-node worker execution"
current_timeout = worker_timeout
await asyncio.wait_for(_run_worker_execution(), timeout=current_timeout)
current_step = "branch worker execution"
current_timeout = branch_timeout
await asyncio.wait_for(_run_branch_execution(), timeout=current_timeout)
except TimeoutError as exc:
raise RuntimeError(
f"provider smoke test timed out during {current_step} "
f"after {current_timeout:.0f}s"
) from exc
def smoke_test_provider(provider: dict, timeout_seconds: float = 25.0) -> None:
"""Run a tiny real completion before starting pytest."""
print(" Running provider smoke test...", end=" ", flush=True)
started = time.time()
try:
asyncio.run(_smoke_test_provider_async(provider, timeout_seconds=timeout_seconds))
except TimeoutError:
print("FAILED")
print(
" The selected provider did not complete a tiny request within "
f"{timeout_seconds:.0f}s."
)
print(
" This usually means the provider is unreachable, rate-limited, "
"or hanging on the selected model/API base."
)
sys.exit(1)
except Exception as e:
print("FAILED")
print(f" Provider smoke test failed: {type(e).__name__}: {e}")
sys.exit(1)
elapsed = time.time() - started
print(f"OK ({elapsed:.1f}s)")
# ── test runner ──────────────────────────────────────────────────────
@@ -301,13 +632,23 @@ def print_table(agents: dict[str, dict], total_time: float, verbose: bool = Fals
def main() -> int:
verbose = "--verbose" in sys.argv or "-v" in sys.argv
interactive = "--interactive" in sys.argv
print("\n ╔═══════════════════════════════════════╗")
print(" ║ Level 2: Dummy Agent Tests (E2E) ║")
print(" ╚═══════════════════════════════════════╝")
# Step 1: detect credentials and let user pick
provider = prompt_provider_selection()
# Step 1: prefer ~/.hive/configuration.json unless --interactive
provider = None
if not interactive:
provider = _load_from_hive_config()
if provider:
print(f"\n Using hive config: {provider['display_model']}")
# Fall back to interactive selection
if provider is None:
provider = prompt_provider_selection()
smoke_test_provider(provider)
# Step 2: inject selection into conftest module state
from tests.dummy_agents.conftest import set_llm_selection
@@ -0,0 +1,329 @@
"""Component tests: Continuous Conversation Mode — threading, buffer.
Exercises conversation threading across nodes to verify that downstream
nodes receive context from upstream nodes in continuous mode.
"""
from __future__ import annotations
import pytest
from framework.graph.edge import EdgeCondition, EdgeSpec, GraphSpec
from framework.graph.node import NodeSpec
from .conftest import make_executor
SET_OUTPUT_INSTRUCTION = (
"You MUST call the set_output tool to provide your answer. "
"Do not just write text — call set_output with the correct "
"key and value."
)
def _build_pipeline_graph(
conversation_mode: str = "continuous",
) -> GraphSpec:
"""Two-node pipeline: intake captures, transform uppercases."""
return GraphSpec(
id="continuous-pipeline",
goal_id="dummy",
entry_node="intake",
entry_points={"start": "intake"},
terminal_nodes=["transform"],
conversation_mode=conversation_mode,
nodes=[
NodeSpec(
id="intake",
name="Intake",
description="Captures raw input",
node_type="event_loop",
input_keys=["raw"],
output_keys=["captured"],
system_prompt=(
"Read the 'raw' input value and call "
"set_output with key='captured' and the "
"same value. " + SET_OUTPUT_INSTRUCTION
),
),
NodeSpec(
id="transform",
name="Transform",
description="Uppercases the value",
node_type="event_loop",
input_keys=["value"],
output_keys=["result"],
system_prompt=(
"Read the 'value' input, convert it to "
"UPPERCASE, then call set_output with "
"key='result' and the uppercased value. " + SET_OUTPUT_INSTRUCTION
),
),
],
edges=[
EdgeSpec(
id="intake-to-transform",
source="intake",
target="transform",
condition=EdgeCondition.ON_SUCCESS,
input_mapping={"value": "captured"},
),
],
memory_keys=["raw", "captured", "value", "result"],
)
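# Note on input_mapping (inferred from its usage in this file): the mapping
# is {downstream_input_key: upstream_output_key}, so
#     input_mapping={"value": "captured"}
# feeds intake's 'captured' output into transform's 'value' input when the
# ON_SUCCESS edge fires.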
@pytest.mark.asyncio
async def test_continuous_pipeline_traverses(runtime, goal, llm_provider, artifact):
"""Continuous mode pipeline should traverse both nodes."""
graph = _build_pipeline_graph(conversation_mode="continuous")
executor = make_executor(
runtime,
llm_provider,
loop_config={"max_iterations": 5},
)
result = await executor.execute(
graph,
goal,
{"raw": "hello"},
validate_graph=False,
)
artifact.record(
result,
expected=("success=True, path=['intake','transform'], output['result'] is set"),
)
artifact.check(
"execution succeeds",
result.success,
actual=str(result.success),
expected_val="True",
)
assert result.success
artifact.check(
"path matches",
result.path == ["intake", "transform"],
actual=str(result.path),
expected_val="['intake', 'transform']",
)
assert result.path == ["intake", "transform"]
actual_output = result.output.get("result")
artifact.check(
"output['result'] is set",
actual_output is not None,
actual=repr(actual_output),
expected_val="non-None value",
)
assert result.output.get("result") is not None
@pytest.mark.asyncio
async def test_continuous_data_flows_through(runtime, goal, llm_provider, artifact):
"""Data from node 1's output should be available to node 2."""
graph = _build_pipeline_graph(conversation_mode="continuous")
executor = make_executor(
runtime,
llm_provider,
loop_config={"max_iterations": 5},
)
result = await executor.execute(
graph,
goal,
{"raw": "test_data"},
validate_graph=False,
)
artifact.record(
result,
expected="success=True, output['result'] is non-empty",
)
artifact.check(
"execution succeeds",
result.success,
actual=str(result.success),
expected_val="True",
)
assert result.success
actual_output = result.output.get("result")
artifact.check(
"output['result'] is set",
actual_output is not None,
actual=repr(actual_output),
expected_val="non-None value",
)
assert result.output.get("result") is not None
output_len = len(str(result.output["result"]))
artifact.check(
"output is non-empty",
output_len > 0,
actual=str(output_len),
expected_val=">0",
)
assert len(str(result.output["result"])) > 0
@pytest.mark.asyncio
async def test_isolated_pipeline_traverses(runtime, goal, llm_provider, artifact):
"""Isolated mode pipeline should also traverse both nodes."""
graph = _build_pipeline_graph(conversation_mode="isolated")
executor = make_executor(
runtime,
llm_provider,
loop_config={"max_iterations": 5},
)
result = await executor.execute(
graph,
goal,
{"raw": "data"},
validate_graph=False,
)
artifact.record(
result,
expected="success=True, path=['intake','transform']",
)
artifact.check(
"execution succeeds",
result.success,
actual=str(result.success),
expected_val="True",
)
assert result.success
artifact.check(
"path matches",
result.path == ["intake", "transform"],
actual=str(result.path),
expected_val="['intake', 'transform']",
)
assert result.path == ["intake", "transform"]
@pytest.mark.asyncio
async def test_continuous_three_node_chain(runtime, goal, llm_provider, artifact):
"""Three-node continuous pipeline should thread end-to-end."""
graph = GraphSpec(
id="three-node-chain",
goal_id="dummy",
entry_node="a",
entry_points={"start": "a"},
terminal_nodes=["c"],
conversation_mode="continuous",
nodes=[
NodeSpec(
id="a",
name="Node A",
description="First node",
node_type="event_loop",
input_keys=["input"],
output_keys=["a_out"],
system_prompt=(
"Read the 'input' value and call set_output "
"with key='a_out' and the same value. " + SET_OUTPUT_INSTRUCTION
),
),
NodeSpec(
id="b",
name="Node B",
description="Middle node",
node_type="event_loop",
input_keys=["b_in"],
output_keys=["b_out"],
system_prompt=(
"Read the 'b_in' value and call set_output "
"with key='b_out' and value='processed_' "
"followed by the input. " + SET_OUTPUT_INSTRUCTION
),
),
NodeSpec(
id="c",
name="Node C",
description="Terminal node",
node_type="event_loop",
input_keys=["c_in"],
output_keys=["result"],
system_prompt=(
"Read the 'c_in' value and call set_output "
"with key='result' and the same value. " + SET_OUTPUT_INSTRUCTION
),
),
],
edges=[
EdgeSpec(
id="a-to-b",
source="a",
target="b",
condition=EdgeCondition.ON_SUCCESS,
input_mapping={"b_in": "a_out"},
),
EdgeSpec(
id="b-to-c",
source="b",
target="c",
condition=EdgeCondition.ON_SUCCESS,
input_mapping={"c_in": "b_out"},
),
],
memory_keys=[
"input",
"a_out",
"b_in",
"b_out",
"c_in",
"result",
],
)
executor = make_executor(
runtime,
llm_provider,
loop_config={"max_iterations": 5},
)
result = await executor.execute(
graph,
goal,
{"input": "payload"},
validate_graph=False,
)
artifact.record(
result,
expected=("success=True, path=['a','b','c'], steps=3, output['result'] is set"),
)
artifact.check(
"execution succeeds",
result.success,
actual=str(result.success),
expected_val="True",
)
assert result.success
artifact.check(
"path matches",
result.path == ["a", "b", "c"],
actual=str(result.path),
expected_val="['a', 'b', 'c']",
)
assert result.path == ["a", "b", "c"]
artifact.check(
"steps_executed is 3",
result.steps_executed == 3,
actual=str(result.steps_executed),
expected_val="3",
)
assert result.steps_executed == 3
actual_output = result.output.get("result")
artifact.check(
"output['result'] is set",
actual_output is not None,
actual=repr(actual_output),
expected_val="non-None value",
)
assert result.output.get("result") is not None
@@ -0,0 +1,241 @@
"""Component tests: Conversation Persistence — write-through, storage.
Exercises conversation persistence by running real LLM turns and
verifying that messages and state are written to disk correctly.
"""
from __future__ import annotations
import pytest
from framework.graph.edge import GraphSpec
from framework.graph.node import NodeSpec
from .conftest import make_executor
def _build_echo_graph() -> GraphSpec:
"""Single-node graph that echoes input to output."""
return GraphSpec(
id="conv-echo",
goal_id="dummy",
entry_node="echo",
entry_points={"start": "echo"},
terminal_nodes=["echo"],
nodes=[
NodeSpec(
id="echo",
name="Echo",
description="Echoes input to output",
node_type="event_loop",
input_keys=["input"],
output_keys=["output"],
system_prompt=(
"Read the 'input' value and immediately call "
"set_output with key='output' and the same "
"value. Do not add any text."
),
),
],
edges=[],
memory_keys=["input", "output"],
conversation_mode="continuous",
)
@pytest.mark.asyncio
async def test_conversation_persists_messages(runtime, goal, llm_provider, tmp_path, artifact):
"""After execution, conversation data should exist on disk."""
storage = tmp_path / "session"
graph = _build_echo_graph()
executor = make_executor(
runtime,
llm_provider,
storage_path=storage,
)
result = await executor.execute(
graph,
goal,
{"input": "hello"},
validate_graph=False,
)
artifact.record(
result,
expected=("success=True, conversations/ dir exists with data files"),
)
artifact.check(
"execution succeeds",
result.success,
actual=str(result.success),
expected_val="True",
)
assert result.success
# Verify conversation directory was created with content
conv_dir = storage / "conversations"
artifact.check(
"conversations/ dir exists",
conv_dir.exists(),
actual=str(conv_dir.exists()),
expected_val="True",
)
assert conv_dir.exists(), "conversations/ directory should exist"
# Should have at least one file (messages or cursor)
all_files = list(conv_dir.rglob("*"))
data_files = [f for f in all_files if f.is_file()]
artifact.check(
"at least one data file",
len(data_files) > 0,
actual=str(len(data_files)),
expected_val=">0",
)
assert len(data_files) > 0, "Should have persisted at least one conversation file"
@pytest.mark.asyncio
async def test_conversation_output_matches_execution(
runtime, goal, llm_provider, tmp_path, artifact
):
"""ExecutionResult output should be consistent with the node."""
storage = tmp_path / "session"
graph = _build_echo_graph()
executor = make_executor(
runtime,
llm_provider,
storage_path=storage,
)
result = await executor.execute(
graph,
goal,
{"input": "test_value"},
validate_graph=False,
)
artifact.record(
result,
expected="success=True, output['output'] is non-empty",
)
artifact.check(
"execution succeeds",
result.success,
actual=str(result.success),
expected_val="True",
)
assert result.success
actual_output = result.output.get("output")
artifact.check(
"output['output'] is set",
actual_output is not None,
actual=repr(actual_output),
expected_val="non-None value",
)
assert result.output.get("output") is not None
# The echo node should produce some non-empty output
output_len = len(str(result.output["output"]))
artifact.check(
"output is non-empty",
output_len > 0,
actual=str(output_len),
expected_val=">0",
)
assert len(str(result.output["output"])) > 0
@pytest.mark.asyncio
async def test_conversation_multi_node_persistence(runtime, goal, llm_provider, tmp_path, artifact):
"""Multi-node graph should persist conversation data for each node."""
from framework.graph.edge import EdgeCondition, EdgeSpec
storage = tmp_path / "session"
graph = GraphSpec(
id="multi-conv",
goal_id="dummy",
entry_node="step1",
entry_points={"start": "step1"},
terminal_nodes=["step2"],
conversation_mode="continuous",
nodes=[
NodeSpec(
id="step1",
name="Step 1",
description="First step",
node_type="event_loop",
output_keys=["intermediate"],
system_prompt=(
"Call set_output with key='intermediate' "
"and value='step1_done'. Do not write text."
),
),
NodeSpec(
id="step2",
name="Step 2",
description="Second step",
node_type="event_loop",
input_keys=["intermediate"],
output_keys=["result"],
system_prompt=(
"Call set_output with key='result' and value='step2_done'. Do not write text."
),
),
],
edges=[
EdgeSpec(
id="step1-to-step2",
source="step1",
target="step2",
condition=EdgeCondition.ON_SUCCESS,
input_mapping={"intermediate": "intermediate"},
),
],
memory_keys=["intermediate", "result"],
)
executor = make_executor(
runtime,
llm_provider,
storage_path=storage,
)
result = await executor.execute(
graph,
goal,
{},
validate_graph=False,
)
artifact.record(
result,
expected=("success=True, path=['step1','step2'], conversations/ dir exists"),
)
artifact.check(
"execution succeeds",
result.success,
actual=str(result.success),
expected_val="True",
)
assert result.success
artifact.check(
"path matches",
result.path == ["step1", "step2"],
actual=str(result.path),
expected_val="['step1', 'step2']",
)
assert result.path == ["step1", "step2"]
# Both nodes should have written conversation data
conv_dir = storage / "conversations"
artifact.check(
"conversations/ dir exists",
conv_dir.exists(),
actual=str(conv_dir.exists()),
expected_val="True",
)
assert conv_dir.exists()
@@ -0,0 +1,266 @@
"""Component tests: Edge Evaluation — conditional routing, LLM_DECIDE.
Exercises edge conditions with real LLM calls to verify that routing
decisions work correctly across providers.
"""
from __future__ import annotations
import pytest
from framework.graph.edge import EdgeCondition, EdgeSpec, GraphSpec
from framework.graph.node import NodeSpec
from .conftest import make_executor
SET_OUTPUT_INSTRUCTION = (
"You MUST call the set_output tool to provide your answer. "
"Do not just write text — call set_output with the correct "
"key and value."
)
@pytest.mark.asyncio
async def test_edge_conditional_true_path(runtime, goal, llm_provider, artifact):
"""Conditional edge with True expression should be traversed."""
graph = GraphSpec(
id="cond-true",
goal_id="dummy",
entry_node="source",
entry_points={"start": "source"},
terminal_nodes=["target"],
conversation_mode="continuous",
nodes=[
NodeSpec(
id="source",
name="Source",
description="Produces label=yes",
node_type="event_loop",
output_keys=["label"],
system_prompt=(
"Call set_output with key='label' and value='yes'. " + SET_OUTPUT_INSTRUCTION
),
),
NodeSpec(
id="target",
name="Target",
description="Terminal node",
node_type="event_loop",
output_keys=["result"],
system_prompt=(
"Call set_output with key='result' and "
"value='reached'. " + SET_OUTPUT_INSTRUCTION
),
),
],
edges=[
EdgeSpec(
id="source-to-target",
source="source",
target="target",
condition=EdgeCondition.CONDITIONAL,
condition_expr="output.get('label') == 'yes'",
),
],
memory_keys=["label", "result"],
)
executor = make_executor(
runtime,
llm_provider,
loop_config={"max_iterations": 3},
)
result = await executor.execute(
graph,
goal,
{},
validate_graph=False,
)
artifact.record(
result,
expected="success=True, path=['source','target']",
)
artifact.check(
"execution succeeds",
result.success,
actual=str(result.success),
expected_val="True",
)
assert result.success
artifact.check(
"path matches",
result.path == ["source", "target"],
actual=str(result.path),
expected_val="['source', 'target']",
)
assert result.path == ["source", "target"]
@pytest.mark.asyncio
async def test_edge_conditional_false_path(runtime, goal, llm_provider, artifact):
"""Conditional edge with False expression should NOT be traversed."""
graph = GraphSpec(
id="cond-false",
goal_id="dummy",
entry_node="source",
entry_points={"start": "source"},
terminal_nodes=["source", "target"],
conversation_mode="continuous",
nodes=[
NodeSpec(
id="source",
name="Source",
description="Produces label=no",
node_type="event_loop",
output_keys=["label"],
system_prompt=(
"Call set_output with key='label' and value='no'. " + SET_OUTPUT_INSTRUCTION
),
),
NodeSpec(
id="target",
name="Target",
description="Should not be reached",
node_type="event_loop",
output_keys=["result"],
system_prompt=("Call set_output with key='result' and value='bad'."),
),
],
edges=[
EdgeSpec(
id="source-to-target",
source="source",
target="target",
condition=EdgeCondition.CONDITIONAL,
condition_expr="output.get('label') == 'yes'",
),
],
memory_keys=["label", "result"],
)
executor = make_executor(
runtime,
llm_provider,
loop_config={"max_iterations": 3},
)
result = await executor.execute(
graph,
goal,
{},
validate_graph=False,
)
artifact.record(
result,
expected="success=True, 'target' not in path",
)
artifact.check(
"execution succeeds",
result.success,
actual=str(result.success),
expected_val="True",
)
assert result.success
artifact.check(
"target not in path",
"target" not in result.path,
actual=str(result.path),
expected_val="path without 'target'",
)
assert "target" not in result.path
@pytest.mark.asyncio
async def test_edge_priority_selects_higher(runtime, goal, llm_provider, artifact):
"""When multiple conditional edges match, higher priority wins."""
graph = GraphSpec(
id="priority-test",
goal_id="dummy",
entry_node="source",
entry_points={"start": "source"},
terminal_nodes=["high", "low"],
conversation_mode="continuous",
nodes=[
NodeSpec(
id="source",
name="Source",
description="Sets value=match",
node_type="event_loop",
output_keys=["value"],
system_prompt=(
"Call set_output with key='value' and value='match'. " + SET_OUTPUT_INSTRUCTION
),
),
NodeSpec(
id="high",
name="High Priority",
description="High priority terminal",
node_type="event_loop",
output_keys=["result"],
system_prompt=(
"Call set_output with key='result' and value='HIGH'. " + SET_OUTPUT_INSTRUCTION
),
),
NodeSpec(
id="low",
name="Low Priority",
description="Low priority terminal",
node_type="event_loop",
output_keys=["result"],
system_prompt=(
"Call set_output with key='result' and value='LOW'. " + SET_OUTPUT_INSTRUCTION
),
),
],
edges=[
EdgeSpec(
id="to-high",
source="source",
target="high",
condition=EdgeCondition.CONDITIONAL,
condition_expr="output.get('value') == 'match'",
priority=10,
),
EdgeSpec(
id="to-low",
source="source",
target="low",
condition=EdgeCondition.CONDITIONAL,
condition_expr="output.get('value') == 'match'",
priority=1,
),
],
memory_keys=["value", "result"],
)
executor = make_executor(
runtime,
llm_provider,
loop_config={"max_iterations": 3},
)
result = await executor.execute(
graph,
goal,
{},
validate_graph=False,
)
artifact.record(
result,
expected="success=True, path=['source','high']",
)
artifact.check(
"execution succeeds",
result.success,
actual=str(result.success),
expected_val="True",
)
assert result.success
artifact.check(
"path matches",
result.path == ["source", "high"],
actual=str(result.path),
expected_val="['source', 'high']",
)
assert result.path == ["source", "high"]

Some files were not shown because too many files have changed in this diff.