Compare commits
641 Commits
@@ -0,0 +1,241 @@
---
name: browser-edge-cases
description: SOP for debugging browser automation failures on complex websites. Use when browser tools fail on specific sites like LinkedIn, Twitter/X, SPAs, or sites with Shadow DOM.
license: MIT
---

# Browser Tool Edge Cases

Standard Operating Procedure for debugging and fixing browser automation failures on complex websites.

## When to Use This Skill

- `browser_scroll` succeeds but the page doesn't move
- `browser_click` succeeds but no action is triggered
- `browser_type` text disappears or has no effect
- `browser_snapshot` hangs or returns stale content
- `browser_navigate` loads the wrong content

## SOP: Debugging Browser Tool Failures

### Phase 1: Reproduce & Isolate

```
1. Create a minimal test case demonstrating the failure
2. Test against a simple site (example.com) to verify the tool works
3. Test against the problematic site to confirm the issue
```

**Quick isolation test:**
```python
# Test 1: Does the tool work at all?
await browser_navigate(tab_id, "https://example.com")
result = await browser_scroll(tab_id, "down", 100)
# Should work on simple sites

# Test 2: Does it fail on the problematic site?
await browser_navigate(tab_id, "https://linkedin.com/feed")
result = await browser_scroll(tab_id, "down", 100)
# If this fails but example.com works → site-specific edge case
```

### Phase 2: Analyze Root Cause

**Step 2a: Check console for errors**
```python
console = await browser_console(tab_id)
# Look for: CSP violations, React errors, JavaScript exceptions
```

**Step 2b: Inspect DOM structure**
```python
html = await browser_html(tab_id)
snapshot = await browser_snapshot(tab_id)
# Look for:
# - Nested scrollable divs (overflow: scroll/auto)
# - Shadow DOM roots
# - iframes
# - Custom widgets
```

**Step 2c: Identify the pattern**

| Symptom | Likely Cause | Check |
|---------|--------------|-------|
| Scroll doesn't move | Nested scroll container | Look for `overflow: scroll` divs |
| Click has no effect | Element covered | Check `getBoundingClientRect` vs viewport |
| Typed text clears | Autocomplete/React | Check for event listeners on the input |
| Snapshot hangs | Huge DOM | Check node count in snapshot |
| Snapshot stale | SPA hydration | Wait after navigation |

### Phase 3: Implement Multi-Layer Fix

**Pattern: Always have fallbacks**

```python
async def robust_operation(tab_id):
    # Method 1: Primary approach
    try:
        result = await primary_method(tab_id)
        if verify_success(result):
            return result
    except Exception:
        pass

    # Method 2: CDP fallback
    try:
        result = await cdp_fallback(tab_id)
        if verify_success(result):
            return result
    except Exception:
        pass

    # Method 3: JavaScript fallback
    return await javascript_fallback(tab_id)
```

**Pattern: Always add timeouts**

```python
# Bad - can hang forever
result = await browser_snapshot(tab_id)

# Good - fails fast with a useful error
try:
    result = await browser_snapshot(tab_id, timeout_s=10.0)
except asyncio.TimeoutError:
    # Handle the timeout gracefully
    result = await fallback_snapshot(tab_id)
```

### Phase 4: Verify Fix

```
1. Run against the problematic site → should work
2. Run against a simple site → should still work (regression check)
3. Document in registry.md
```

## Pattern Library

### P1: Nested Scrollable Containers

**Sites:** LinkedIn, Twitter/X, any SPA with scrollable feeds

**Detection:**
```javascript
// Find the largest scrollable container
const candidates = [];
document.querySelectorAll('*').forEach(el => {
  const style = getComputedStyle(el);
  if (style.overflow.includes('scroll') || style.overflow.includes('auto')) {
    const rect = el.getBoundingClientRect();
    if (rect.width > 100 && rect.height > 100) {
      candidates.push({el, area: rect.width * rect.height});
    }
  }
});
candidates.sort((a, b) => b.area - a.area);
return candidates[0]?.el;
```

**Fix:** Dispatch scroll events at the container's center, not the viewport center.
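
A minimal sketch of that fix, assuming `container` is the element returned by the detection snippet above; `scrollContainer` is an illustrative helper, while `scrollBy` and `scrollTop` are standard DOM APIs:

```javascript
// Scroll the detected container directly and report whether it actually moved.
function scrollContainer(container, amount) {
  const before = container.scrollTop;
  container.scrollBy({ top: amount, behavior: 'auto' });
  // If scrollTop didn't change, this wasn't the real scroll target.
  return container.scrollTop !== before;
}
```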

### P2: Element Covered by Overlay

**Sites:** Modals, tooltips, SPAs with loading overlays

**Detection:**
```javascript
const rect = element.getBoundingClientRect();
const centerX = rect.left + rect.width / 2;
const centerY = rect.top + rect.height / 2;
const topElement = document.elementFromPoint(centerX, centerY);
return topElement === element || element.contains(topElement);
```

**Fix:** Wait for the overlay to disappear, or use a JavaScript click.
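
A sketch of the wait-for-overlay option, assuming it runs in the page via `evaluate`; `waitUntilClickable` is an illustrative helper, and polling `elementFromPoint` is one simple approach (a `MutationObserver` would also work):

```javascript
// Resolve true once the point over the target is no longer covered, or time out.
function waitUntilClickable(element, timeoutMs = 5000) {
  const rect = element.getBoundingClientRect();
  const x = rect.left + rect.width / 2;
  const y = rect.top + rect.height / 2;
  return new Promise(resolve => {
    const deadline = Date.now() + timeoutMs;
    const check = () => {
      const top = document.elementFromPoint(x, y);
      if (top === element || element.contains(top)) return resolve(true);
      if (Date.now() > deadline) return resolve(false);
      setTimeout(check, 100);  // poll every 100ms
    };
    check();
  });
}
```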

### P3: React Synthetic Events

**Sites:** React SPAs, modern web apps

**Detection:** A CDP click doesn't trigger the handler, but a manual click works.

**Fix:** Use a JavaScript click as the primary method:
```javascript
element.click();
```
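
If a bare `element.click()` still doesn't fire the handler, dispatching the fuller event sequence sometimes does, since some components listen on `mousedown`/`mouseup` rather than `click`. A hedged sketch using standard `MouseEvent` dispatch, not a guaranteed fix:

```javascript
// Dispatch the mousedown → mouseup → click sequence many handlers expect.
for (const type of ['mousedown', 'mouseup', 'click']) {
  element.dispatchEvent(new MouseEvent(type, { bubbles: true, cancelable: true, view: window }));
}
```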

### P4: Huge DOM / Accessibility Tree

**Sites:** LinkedIn, Facebook, Twitter (feeds with thousands of nodes)

**Detection:**
```javascript
document.querySelectorAll('*').length > 5000
```

**Fix:**
1. Add a timeout to the snapshot operation
2. Truncate the tree at 2000 nodes
3. Fall back to a DOM-based snapshot if the accessibility tree is too large (see the sketch below)
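
A hedged sketch of step 3's DOM-based fallback, assuming a flat tag/text outline is an acceptable degraded snapshot; `domSnapshot` is an illustrative helper, `createTreeWalker` is a standard DOM API, and the 2000-node cap matches step 2:

```javascript
// Walk the DOM and stop after maxNodes elements; return a flat text outline.
function domSnapshot(maxNodes = 2000) {
  const walker = document.createTreeWalker(document.body, NodeFilter.SHOW_ELEMENT);
  const lines = [];
  let count = 0;
  let node;
  while ((node = walker.nextNode()) && count < maxNodes) {
    count++;
    // Only record leaf elements that carry their own text
    if (node.childElementCount === 0) {
      const text = node.textContent.trim();
      if (text) lines.push(node.tagName.toLowerCase() + ': ' + text.slice(0, 120));
    }
  }
  return { truncated: count >= maxNodes, nodeCount: count, lines };
}
```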

### P5: SPA Hydration Delay

**Sites:** React, Vue, Angular SPAs after navigation

**Detection:**
```javascript
// Check if the React app has hydrated
document.querySelector('[data-reactroot]') ||
document.querySelector('[data-reactid]')
```

**Fix:** Wait for a specific selector after navigation:
```python
await browser_navigate(tab_id, url, wait_until="load")
await browser_wait(tab_id, selector='[data-testid="content"]', timeout_ms=5000)
```

### P6: Shadow DOM

**Sites:** Components using Shadow DOM, Lit elements

**Detection:**
```javascript
// NodeList has no .some(), so convert to an array first
Array.from(document.querySelectorAll('*')).some(el => el.shadowRoot)
```

**Fix:** Pierce the shadow root:
```javascript
function queryShadow(selector) {
  const parts = selector.split('>>>');
  let node = document;
  for (const part of parts) {
    if (node.shadowRoot) {
      node = node.shadowRoot.querySelector(part.trim());
    } else {
      node = node.querySelector(part.trim());
    }
  }
  return node;
}
```
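
Usage sketch, matching the `#shadow-host >>> #shadow-btn` shape used in the Shadow DOM test script:

```javascript
// Each '>>>' hop descends into the next shadow root.
const btn = queryShadow('#shadow-host >>> #shadow-btn');
if (btn) btn.click();
```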

## Quick Reference

| Issue | Primary Fix | Fallback |
|-------|-------------|----------|
| Scroll not working | Find the scrollable container | Mouse wheel at container center |
| Click has no effect | JavaScript `click()` | CDP mouse events |
| Typed text clears | Add `delay_ms` | Use `execCommand` |
| Snapshot hangs | Add `timeout_s` | DOM snapshot fallback |
| Stale content | Wait for selector | Increase `wait_until` timeout |
| Shadow DOM | Piercing selector | JavaScript traversal |

## References

- [registry.md](registry.md) - Full list of known edge cases
- [scripts/test_case.py](scripts/test_case.py) - Template for testing new cases
- [BROWSER_USE_PATTERNS.md](../../tools/BROWSER_USE_PATTERNS.md) - Implementation patterns from browser-use

@@ -0,0 +1,261 @@

# Browser Edge Case Registry

Curated list of known browser automation edge cases with symptoms, causes, and fixes.

---

## Scroll Issues

### #1: LinkedIn Nested Scroll Container

| Attribute | Value |
|-----------|-------|
| **Site** | LinkedIn (linkedin.com/feed) |
| **Symptom** | `browser_scroll()` returns `{ok: true}` but the page doesn't move |
| **Root Cause** | Content is in a nested scrollable div (`overflow: scroll`), not the main window |
| **Detection** | Scanning `document.querySelectorAll('*')` for `overflow: scroll/auto` yields large candidates |
| **Fix** | JavaScript finds the largest scrollable container and uses `container.scrollBy()` |
| **Code** | `bridge.py:808-891` - smart scroll with container detection |
| **Verified** | 2026-04-03 ✓ |

### #2: Twitter/X Lazy Loading

| Attribute | Value |
|-----------|-------|
| **Site** | Twitter/X (x.com) |
| **Symptom** | Infinite scroll doesn't load new content |
| **Root Cause** | Lazy loading requires content to be visible before loading more |
| **Detection** | Scroll position is at the bottom but no new `[data-testid="tweet"]` elements appear |
| **Fix** | Add `wait_for_selector` between scroll calls with a 1s delay |
| **Code** | Test file: `tests/test_x_page_load_repro.py` |
| **Verified** | - |

### #3: Modal/Dialog Scroll Container

| Attribute | Value |
|-----------|-------|
| **Site** | Any site with modal dialogs |
| **Symptom** | Scroll moves the background page, not the modal content |
| **Root Cause** | The modal has its own scroll container with `overflow: scroll` |
| **Detection** | A visible element with `position: fixed` and scrollable content |
| **Fix** | Find the visible modal container (highest z-index scrollable) and scroll that; see the sketch below |
| **Code** | - |
| **Verified** | - |
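
A hedged sketch of that detection, assuming it runs in the page; `findModalScrollContainer` is an illustrative helper, and ranking scrollable elements by z-index is one plausible heuristic, not the project's confirmed implementation:

```javascript
// Pick the scrollable element with the highest z-index (likely the open modal).
function findModalScrollContainer() {
  let best = null;
  let bestZ = -Infinity;
  document.querySelectorAll('*').forEach(el => {
    const style = getComputedStyle(el);
    const scrollable = /(scroll|auto)/.test(style.overflowY) && el.scrollHeight > el.clientHeight;
    if (!scrollable) return;
    const z = Number(style.zIndex) || 0;  // 'auto' counts as 0
    if (z > bestZ) { bestZ = z; best = el; }
  });
  return best;
}
```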

---

## Click Issues

### #4: Element Covered by Overlay

| Attribute | Value |
|-----------|-------|
| **Site** | SPAs, sites with loading overlays |
| **Symptom** | Click succeeds but no action is triggered |
| **Root Cause** | The element is covered by a transparent overlay, tooltip, or iframe |
| **Detection** | `document.elementFromPoint(x, y) !== target` |
| **Fix** | Wait for the overlay to disappear, or use JavaScript `element.click()` |
| **Code** | `bridge.py:394-591` - JavaScript click as primary |
| **Verified** | - |

### #5: React Synthetic Events

| Attribute | Value |
|-----------|-------|
| **Site** | React applications |
| **Symptom** | CDP click doesn't trigger the React handler |
| **Root Cause** | React uses synthetic events that don't respond to CDP events |
| **Detection** | Site uses React (check for `__reactFiber$` or `data-reactroot`) |
| **Fix** | Use JavaScript `element.click()` as the primary method |
| **Code** | `bridge.py:394-591` - JavaScript-first click |
| **Verified** | - |

### #6: Shadow DOM Elements

| Attribute | Value |
|-----------|-------|
| **Site** | Components using Shadow DOM, Lit elements |
| **Symptom** | `querySelector` can't find the element |
| **Root Cause** | The element is inside a shadow root, not the main DOM tree |
| **Detection** | `element.shadowRoot !== null` on parent elements |
| **Fix** | Use a piercing selector (`host >>> target`) or traverse shadow roots |
| **Code** | See SKILL.md P6 pattern |
| **Verified** | 2026-04-03 ✓ |

---

## Input Issues

### #7: ContentEditable / Rich Text Editors

| Attribute | Value |
|-----------|-------|
| **Site** | Rich text editors (Notion, Slack web, etc.) |
| **Symptom** | `browser_type()` doesn't insert text |
| **Root Cause** | The element is `contenteditable`, not an `<input>` or `<textarea>` |
| **Detection** | `element.contentEditable === 'true'` |
| **Fix** | Focus via JavaScript, then use `execCommand('insertText')` or `Input.dispatchKeyEvent` |
| **Code** | `bridge.py:616-694` - contentEditable handling |
| **Verified** | 2026-04-03 ✓ |

### #8: Autocomplete Field Clearing

| Attribute | Value |
|-----------|-------|
| **Site** | Search fields with autocomplete, address forms |
| **Symptom** | Typed text gets cleared immediately |
| **Root Cause** | The field expects realistic keystroke timing for autocomplete |
| **Detection** | The field has autocomplete listeners or a dropdown appears |
| **Fix** | Add `delay_ms=50` between keystrokes |
| **Code** | `bridge.py:type()` - delay_ms parameter |
| **Verified** | 2026-04-03 ✓ |

### #9: Custom Date Pickers

| Attribute | Value |
|-----------|-------|
| **Site** | Forms with custom date widgets |
| **Symptom** | Can't type a date into the date field |
| **Root Cause** | The custom widget intercepts and blocks keyboard input |
| **Detection** | Typing doesn't change the field value |
| **Fix** | Click the calendar widget icon and select the date from the dropdown |
| **Code** | - |
| **Verified** | - |

---

## Snapshot Issues

### #10: LinkedIn Huge DOM Tree

| Attribute | Value |
|-----------|-------|
| **Site** | LinkedIn, Facebook, Twitter feeds |
| **Symptom** | `browser_snapshot()` hangs forever |
| **Root Cause** | 10k+ DOM nodes; the accessibility tree has 50k+ nodes |
| **Detection** | `document.querySelectorAll('*').length > 5000` |
| **Fix** | Add a `timeout_s` param with `asyncio.timeout()` and proper error handling |
| **Code** | `bridge.py:1028-1041` - snapshot with timeout protection |
| **Verified** | 2026-04-03 ✓ (0.08s on LinkedIn) |

### #11: SPA Hydration Delay

| Attribute | Value |
|-----------|-------|
| **Site** | React/Vue/Angular SPAs |
| **Symptom** | Snapshot shows old content after navigation |
| **Root Cause** | Client-side hydration hasn't completed when the snapshot runs |
| **Detection** | `document.readyState === 'complete'` but content is missing |
| **Fix** | Wait for a specific selector after navigation |
| **Code** | Test file: `tests/test_x_page_load_repro.py` |
| **Verified** | - |

### #12: iframe Content Missing

| Attribute | Value |
|-----------|-------|
| **Site** | Sites with embedded content |
| **Symptom** | Snapshot is missing iframe content |
| **Root Cause** | The accessibility tree doesn't include iframe content |
| **Detection** | `document.querySelectorAll('iframe')` has results |
| **Fix** | Use `DOM.getFrameOwner` plus a separate snapshot for each iframe |
| **Code** | - |
| **Verified** | - |

---

## Navigation Issues

### #13: SPA Navigation Events

| Attribute | Value |
|-----------|-------|
| **Site** | React Router, Vue Router SPAs |
| **Symptom** | `wait_until="load"` fires before content is ready |
| **Root Cause** | The SPA uses client-side routing, so there is no full page load |
| **Detection** | The URL changes but the `load` event has already fired |
| **Fix** | Use `wait_until="networkidle"` or `wait_for_selector` |
| **Code** | `bridge.py:navigate()` - wait_until options |
| **Verified** | - |

### #14: Cross-Origin Redirects

| Attribute | Value |
|-----------|-------|
| **Site** | OAuth flows, SSO logins |
| **Symptom** | Navigation fails during the redirect |
| **Root Cause** | Cross-origin security prevents CDP tracking |
| **Detection** | The URL changes to a different domain |
| **Fix** | Use `wait_for_url` with pattern matching instead of an exact URL |
| **Code** | - |
| **Verified** | - |

---

## Screenshot Issues

### #15: Selector Screenshot Not Implemented

| Attribute | Value |
|-----------|-------|
| **Site** | Any site |
| **Symptom** | `browser_screenshot(selector="h1")` captures the full viewport instead of the element |
| **Root Cause** | The `selector` param existed in the signature but was silently ignored in both `bridge.py` and `inspection.py` |
| **Detection** | A screenshot with a selector is the same byte size as one without |
| **Fix** | Use CDP `Runtime.evaluate` to call `getBoundingClientRect()` on the element, then pass the result as `clip` to `Page.captureScreenshot` |
| **Code** | `bridge.py:1315-1344` - selector clip logic; `inspection.py:94-96` - pass selector to bridge |
| **Verified** | 2026-04-03 ✓ (JS rect query returns correct viewport coords; requires server restart) |

### #16: Stale Browser Context (Group ID Mismatch)

| Attribute | Value |
|-----------|-------|
| **Site** | Any |
| **Symptom** | `browser_open()` returns `"No group with id: XXXXXXX"` even though `browser_status` shows `running: true` |
| **Root Cause** | The in-memory `_contexts` dict has a stale `groupId` from a Chrome tab group that was closed outside the tool (e.g. the user closed the tab group) |
| **Detection** | `browser_status` returns `running: true` but `browser_open` fails with "No group with id" |
| **Fix** | Call `browser_stop()` to clear the stale context from `_contexts`, then `browser_start()` again |
| **Code** | `tools/lifecycle.py:144-160` - the `already_running` check uses the cached dict without validating against Chrome |
| **Verified** | 2026-04-03 ✓ |

---

## How to Add New Edge Cases

1. **Reproduce** the issue with a minimal test case
2. **Document** it using the template below
3. **Implement** the fix with multi-layer fallbacks
4. **Verify** against both the problematic site and a simple site
5. **Submit** by appending to this file

### Template

```markdown
### #N: [Short Title]

| Attribute | Value |
|-----------|-------|
| **Site** | [URL or site type] |
| **Symptom** | [What the user observes] |
| **Root Cause** | [Technical explanation] |
| **Detection** | [JavaScript to detect this case] |
| **Fix** | [Solution approach] |
| **Code** | [File:line reference if implemented] |
| **Verified** | [Date or "pending"] |
```

---

## Statistics

| Category | Count |
|----------|-------|
| Scroll Issues | 3 |
| Click Issues | 3 |
| Input Issues | 3 |
| Snapshot Issues | 3 |
| Navigation Issues | 2 |
| Screenshot Issues | 2 |
| **Total** | **16** |

Last updated: 2026-04-03

@@ -0,0 +1,113 @@
#!/usr/bin/env python
"""
Test #2: Twitter/X Lazy Loading Scroll

Symptom: Infinite scroll doesn't load new content
Root Cause: Lazy loading requires content to be visible before loading more
Fix: Add wait_for_selector between scroll calls
"""

import asyncio
import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).parent.parent.parent.parent / "tools" / "src"))

from gcu.browser.bridge import BeelineBridge

BRIDGE_PORT = 9229
CONTEXT_NAME = "twitter-scroll-test"


async def test_twitter_lazy_scroll():
    """Test that repeated scrolls with waits load new content."""
    print("=" * 70)
    print("TEST #2: Twitter/X Lazy Loading Scroll")
    print("=" * 70)

    bridge = BeelineBridge()

    try:
        await bridge.start()

        # Give the extension up to 10s to connect
        for i in range(10):
            await asyncio.sleep(1)
            if bridge.is_connected:
                print("✓ Extension connected!")
                break
            print(f"Waiting for extension... ({i + 1}/10)")
        else:
            print("✗ Extension not connected")
            return

        context = await bridge.create_context(CONTEXT_NAME)
        tab_id = context.get("tabId")
        group_id = context.get("groupId")
        print(f"✓ Created tab: {tab_id}")

        # Navigate to Twitter/X
        print("\n--- Navigating to X.com ---")
        await bridge.navigate(tab_id, "https://x.com", wait_until="networkidle", timeout_ms=30000)
        print("✓ Page loaded")

        # Wait for tweets to appear
        print("\n--- Waiting for tweets ---")
        await bridge.wait_for_selector(tab_id, '[data-testid="tweet"]', timeout_ms=10000)

        # Count initial tweets
        initial_count = await bridge.evaluate(
            tab_id,
            "(function() { return document.querySelectorAll("
            "'[data-testid=\"tweet\"]').length; })()",
        )
        print(f"Initial tweet count: {initial_count.get('result', 0)}")

        # Take a screenshot of the initial state
        screenshot = await bridge.screenshot(tab_id)
        print(f"Screenshot: {len(screenshot.get('data', ''))} bytes")

        # Scroll multiple times with waits
        print("\n--- Scrolling with waits ---")
        for i in range(3):
            result = await bridge.scroll(tab_id, "down", 500)
            print(f"  Scroll {i + 1}: {result.get('method', 'unknown')} method")

            # Wait for new content to load
            await asyncio.sleep(2)

            # Count tweets after the scroll
            count_result = await bridge.evaluate(
                tab_id,
                "(function() { return document.querySelectorAll("
                "'[data-testid=\"tweet\"]').length; })()",
            )
            count = count_result.get("result", 0)
            print(f"  Tweet count after scroll: {count}")

        # Final count
        final_count = await bridge.evaluate(
            tab_id,
            "(function() { return document.querySelectorAll("
            "'[data-testid=\"tweet\"]').length; })()",
        )
        final = final_count.get("result", 0)
        initial = initial_count.get("result", 0)

        print("\n--- Results ---")
        print(f"Initial tweets: {initial}")
        print(f"Final tweets: {final}")

        if final > initial:
            print(f"✓ PASS: Loaded {final - initial} new tweets")
        else:
            print("✗ FAIL: No new tweets loaded (may need login)")

        await bridge.destroy_context(group_id)
        print("\n✓ Context destroyed")

    finally:
        await bridge.stop()


if __name__ == "__main__":
    asyncio.run(test_twitter_lazy_scroll())

@@ -0,0 +1,96 @@
#!/usr/bin/env python
"""
Test #3: Modal/Dialog Scroll Container

Symptom: Scroll scrolls background page, not modal content
Root Cause: Modal has its own scroll container with overflow: scroll
Fix: Find visible modal container (highest z-index scrollable), scroll that
"""

import asyncio
import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).parent.parent.parent.parent / "tools" / "src"))

from gcu.browser.bridge import BeelineBridge

BRIDGE_PORT = 9229
CONTEXT_NAME = "modal-scroll-test"

# Test site with a modal - using a demo site
MODAL_DEMO_URL = "https://www.w3schools.com/howto/howto_css_modals.asp"


async def test_modal_scroll():
    """Test that scroll targets modal content, not the background."""
    print("=" * 70)
    print("TEST #3: Modal/Dialog Scroll Container")
    print("=" * 70)

    bridge = BeelineBridge()

    try:
        await bridge.start()

        # Give the extension up to 10s to connect
        for i in range(10):
            await asyncio.sleep(1)
            if bridge.is_connected:
                print("✓ Extension connected!")
                break
        else:
            print("✗ Extension not connected")
            return

        context = await bridge.create_context(CONTEXT_NAME)
        tab_id = context.get("tabId")
        group_id = context.get("groupId")
        print(f"✓ Created tab: {tab_id}")

        # Navigate to the modal demo
        print("\n--- Navigating to modal demo ---")
        await bridge.navigate(tab_id, MODAL_DEMO_URL, wait_until="load")
        print("✓ Page loaded")

        # Take a screenshot before
        screenshot_before = await bridge.screenshot(tab_id)
        print(f"Screenshot before: {len(screenshot_before.get('data', ''))} bytes")

        # Click the button to open the modal
        print("\n--- Opening modal ---")
        # Find and click the "Open Modal" button
        result = await bridge.click(tab_id, ".ws-btn", timeout_ms=5000)
        print(f"Click result: {result}")

        await asyncio.sleep(1)

        # Take a screenshot with the modal open
        screenshot_modal = await bridge.screenshot(tab_id)
        print(f"Screenshot modal open: {len(screenshot_modal.get('data', ''))} bytes")

        # Try to scroll within the modal
        print("\n--- Scrolling modal content ---")
        result = await bridge.scroll(tab_id, "down", 100)
        print(f"Scroll result: {result}")

        await asyncio.sleep(0.5)

        # Take a screenshot after the scroll
        screenshot_after = await bridge.screenshot(tab_id)
        print(f"Screenshot after scroll: {len(screenshot_after.get('data', ''))} bytes")

        # Check whether the modal content scrolled (not the background).
        # This is a visual check - verify by comparing the screenshots.
        print("\n--- Results ---")
        print(f"Modal scroll test completed. Method used: {result.get('method', 'unknown')}")
        print("Visual verification needed: check if modal content scrolled vs background")

        await bridge.destroy_context(group_id)
        print("\n✓ Context destroyed")

    finally:
        await bridge.stop()


if __name__ == "__main__":
    asyncio.run(test_modal_scroll())

@@ -0,0 +1,123 @@
#!/usr/bin/env python
"""
Test #4: Element Covered by Overlay

Symptom: Click succeeds but no action triggered
Root Cause: Element is covered by transparent overlay, tooltip, or iframe
Detection: document.elementFromPoint(x, y) !== target
Fix: Wait for overlay to disappear, or use JavaScript element.click()
"""

import asyncio
import base64
import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).parent.parent.parent.parent / "tools" / "src"))

from gcu.browser.bridge import BeelineBridge

CONTEXT_NAME = "overlay-click-test"


async def test_overlay_click():
    """Test clicking elements that are covered by overlays."""
    print("=" * 70)
    print("TEST #4: Element Covered by Overlay")
    print("=" * 70)

    bridge = BeelineBridge()

    try:
        await bridge.start()

        # Give the extension up to 10s to connect
        for i in range(10):
            await asyncio.sleep(1)
            if bridge.is_connected:
                print("✓ Extension connected!")
                break
        else:
            print("✗ Extension not connected")
            return

        context = await bridge.create_context(CONTEXT_NAME)
        tab_id = context.get("tabId")
        group_id = context.get("groupId")
        print(f"✓ Created tab: {tab_id}")

        # Create a test page with an overlay covering the button.
        # The handler counts clicks instead of calling alert(), which
        # would block the page and hang subsequent evaluate() calls.
        print("\n--- Creating test page with overlay ---")
        test_html = """
<!DOCTYPE html>
<html>
<head><title>Overlay Test</title></head>
<body>
<button id="target-btn">Click Me</button>
<div id="overlay" style="position:fixed;top:0;left:0;
     width:100%;height:100%;
     background:rgba(0,0,0,0.3);z-index:1000;"></div>
<script>
window.clickCount = 0;
document.getElementById('target-btn').addEventListener('click', () => {
  window.clickCount++;
});
</script>
</body>
</html>
"""

        # Navigate to a data: URL
        data_url = f"data:text/html;base64,{base64.b64encode(test_html.encode()).decode()}"
        await bridge.navigate(tab_id, data_url, wait_until="load")

        # Screenshot before
        screenshot = await bridge.screenshot(tab_id)
        print(f"Screenshot: {len(screenshot.get('data', ''))} bytes")

        # Try to click the covered button
        print("\n--- Attempting to click covered button ---")

        # First, check whether the element is covered
        coverage_check = await bridge.evaluate(
            tab_id,
            """
            (function() {
              const btn = document.getElementById('target-btn');
              const rect = btn.getBoundingClientRect();
              const centerX = rect.left + rect.width / 2;
              const centerY = rect.top + rect.height / 2;
              const topElement = document.elementFromPoint(centerX, centerY);
              return {
                isCovered: topElement !== btn && !btn.contains(topElement),
                topElement: topElement?.tagName,
                targetElement: btn.tagName
              };
            })();
            """,
        )
        print(f"Coverage check: {coverage_check.get('result', {})}")

        # Try the bridge click (JavaScript click is primary; a raw CDP
        # mouse click would be swallowed by the overlay)
        click_result = await bridge.click(tab_id, "#target-btn", timeout_ms=5000)
        print(f"Click result: {click_result}")

        # Check whether the click registered
        count_result = await bridge.evaluate(tab_id, "(function() { return window.clickCount; })()")
        count = count_result.get("result", 0)
        print(f"Click count after bridge click: {count}")

        if count > 0:
            print("✓ PASS: JavaScript click penetrated overlay")
        else:
            print("✗ FAIL: Click did not reach button (overlay blocked it)")

        await bridge.destroy_context(group_id)
        print("\n✓ Context destroyed")

    finally:
        await bridge.stop()


if __name__ == "__main__":
    asyncio.run(test_overlay_click())

@@ -0,0 +1,152 @@
#!/usr/bin/env python
"""
Test #6: Shadow DOM Elements

Symptom: querySelector can't find element
Root Cause: Element is inside a shadow root, not main DOM tree
Detection: element.shadowRoot !== null on parent elements
Fix: Use piercing selector (host >>> target) or traverse shadow roots
"""

import asyncio
import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).parent.parent.parent.parent / "tools" / "src"))

from gcu.browser.bridge import BeelineBridge

CONTEXT_NAME = "shadow-dom-test"


async def test_shadow_dom():
    """Test clicking elements inside Shadow DOM."""
    print("=" * 70)
    print("TEST #6: Shadow DOM Elements")
    print("=" * 70)

    bridge = BeelineBridge()

    try:
        await bridge.start()

        # Give the extension up to 10s to connect
        for i in range(10):
            await asyncio.sleep(1)
            if bridge.is_connected:
                print("✓ Extension connected!")
                break
        else:
            print("✗ Extension not connected")
            return

        context = await bridge.create_context(CONTEXT_NAME)
        tab_id = context.get("tabId")
        group_id = context.get("groupId")
        print(f"✓ Created tab: {tab_id}")

        # Create a test page with Shadow DOM
        print("\n--- Creating test page with Shadow DOM ---")
        test_html = """
<!DOCTYPE html>
<html>
<head><title>Shadow DOM Test</title></head>
<body>
<div id="shadow-host"></div>
<script>
const host = document.getElementById('shadow-host');
const shadow = host.attachShadow({ mode: 'open' });
shadow.innerHTML = `
  <style>
    button { padding: 10px 20px; font-size: 16px; }
  </style>
  <button id="shadow-btn">Shadow Button</button>
`;
shadow.getElementById('shadow-btn').addEventListener('click', () => {
  window.shadowClickCount = (window.shadowClickCount || 0) + 1;
  console.log('Shadow button clicked:', window.shadowClickCount);
});
</script>
</body>
</html>
"""

        # Write to a file and use a file:// URL (data: URLs don't work well with the extension)
        test_file = Path("/tmp/shadow_dom_test.html")
        test_file.write_text(test_html.strip())
        file_url = f"file://{test_file}"
        await bridge.navigate(tab_id, file_url, wait_until="load")
        print("✓ Page loaded")

        # Screenshot
        screenshot = await bridge.screenshot(tab_id)
        print(f"Screenshot: {len(screenshot.get('data', ''))} bytes")

        # Detect Shadow DOM
        print("\n--- Detecting Shadow DOM ---")
        detection = await bridge.evaluate(
            tab_id,
            """
            (function() {
              const hosts = [];
              document.querySelectorAll('*').forEach(el => {
                if (el.shadowRoot) {
                  hosts.push({
                    tag: el.tagName,
                    id: el.id,
                    hasButton: el.shadowRoot.querySelector('button') !== null
                  });
                }
              });
              return { count: hosts.length, hosts };
            })();
            """,
        )
        print(f"Shadow DOM detection: {detection.get('result', {})}")

        # Try to click the shadow button using a regular selector (should fail)
        print("\n--- Attempting click with regular selector ---")
        try:
            result = await bridge.click(tab_id, "#shadow-btn", timeout_ms=3000)
            print(f"Result: {result}")
        except Exception as e:
            print(f"Expected failure: {e}")

        # Click using JavaScript that pierces the shadow DOM
        print("\n--- Clicking via JavaScript shadow piercing ---")
        click_result = await bridge.evaluate(
            tab_id,
            """
            (function() {
              const host = document.getElementById('shadow-host');
              const btn = host.shadowRoot.getElementById('shadow-btn');
              if (btn) {
                btn.click();
                return { success: true, clicked: 'shadow-btn' };
              }
              return { success: false, error: 'Button not found' };
            })();
            """,
        )
        print(f"JS click result: {click_result.get('result', {})}")

        # Verify the click was registered
        count_result = await bridge.evaluate(
            tab_id, "(function() { return window.shadowClickCount || 0; })()"
        )
        count = count_result.get("result") or 0
        print(f"Shadow click count: {count}")

        if count and count > 0:
            print("✓ PASS: Shadow DOM element clicked successfully")
        else:
            print("✗ FAIL: Could not click Shadow DOM element")

        await bridge.destroy_context(group_id)
        print("\n✓ Context destroyed")

    finally:
        await bridge.stop()


if __name__ == "__main__":
    asyncio.run(test_shadow_dom())

@@ -0,0 +1,180 @@
#!/usr/bin/env python
"""
Test #7: ContentEditable / Rich Text Editors

Symptom: browser_type() doesn't insert text
Root Cause: Element is contenteditable, not an <input> or <textarea>
Detection: element.contentEditable === 'true'
Fix: Focus via JavaScript, use execCommand('insertText') or Input.dispatchKeyEvent
"""

import asyncio
import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).parent.parent.parent.parent / "tools" / "src"))

from gcu.browser.bridge import BeelineBridge

CONTEXT_NAME = "contenteditable-test"


async def test_contenteditable():
    """Test typing into contenteditable elements."""
    print("=" * 70)
    print("TEST #7: ContentEditable / Rich Text Editors")
    print("=" * 70)

    bridge = BeelineBridge()

    try:
        await bridge.start()

        # Give the extension up to 10s to connect
        for i in range(10):
            await asyncio.sleep(1)
            if bridge.is_connected:
                print("✓ Extension connected!")
                break
        else:
            print("✗ Extension not connected")
            return

        context = await bridge.create_context(CONTEXT_NAME)
        tab_id = context.get("tabId")
        group_id = context.get("groupId")
        print(f"✓ Created tab: {tab_id}")

        # Create a test page with contenteditable elements
        test_html = """
<!DOCTYPE html>
<html>
<head><title>ContentEditable Test</title></head>
<body>
<h2>ContentEditable Test</h2>

<h3>1. Simple contenteditable div</h3>
<div id="editor1" contenteditable="true"
     style="border:1px solid #ccc;padding:10px;
            min-height:50px;">Start text</div>

<h3>2. Rich text editor (like Notion)</h3>
<div id="editor2" contenteditable="true"
     style="border:1px solid #ccc;padding:10px;
            min-height:50px;">
  <p>Type here...</p>
</div>

<h3>3. Regular input (for comparison)</h3>
<input id="input1" type="text" placeholder="Regular input" />

<script>
// Track content changes
window.editor1Content = '';
window.editor2Content = '';

document.getElementById('editor1').addEventListener('input', (e) => {
  window.editor1Content = e.target.innerText;
});
document.getElementById('editor2').addEventListener('input', (e) => {
  window.editor2Content = e.target.innerText;
});
</script>
</body>
</html>
"""

        # Write to a file and use a file:// URL (data: URLs don't work well with the extension)
        test_file = Path("/tmp/contenteditable_test.html")
        test_file.write_text(test_html.strip())
        file_url = f"file://{test_file}"
        await bridge.navigate(tab_id, file_url, wait_until="load")
        print("✓ Page loaded")

        # Screenshot with timeout protection
        try:
            screenshot = await asyncio.wait_for(bridge.screenshot(tab_id), timeout=10.0)
            print(f"Screenshot: {len(screenshot.get('data', ''))} bytes")
        except asyncio.TimeoutError:
            print("Screenshot timed out (skipping)")

        # Detect contenteditable elements
        print("\n--- Detecting contenteditable elements ---")
        detection = await bridge.evaluate(
            tab_id,
            """
            (function() {
              const editables = document.querySelectorAll('[contenteditable="true"]');
              return {
                count: editables.length,
                ids: Array.from(editables).map(el => el.id)
              };
            })();
            """,
        )
        print(f"Contenteditable detection: {detection.get('result', {})}")

        # Test 1: Type into a regular input (baseline)
        print("\n--- Test 1: Regular input ---")
        await bridge.click(tab_id, "#input1")
        await bridge.type_text(tab_id, "#input1", "Hello input")
        input_result = await bridge.evaluate(
            tab_id, "(function() { return document.getElementById('input1').value; })()"
        )
        print(f"Input value: {input_result.get('result', '')}")

        # Test 2: Type into a contenteditable div
        print("\n--- Test 2: Contenteditable div ---")
        await bridge.click(tab_id, "#editor1")
        await bridge.type_text(tab_id, "#editor1", "Hello contenteditable", clear_first=True)
        editor_result = await bridge.evaluate(
            tab_id,
            "(function() { return document.getElementById('editor1').innerText; })()",
        )
        print(f"Editor1 innerText: {editor_result.get('result', '')}")

        # Test 3: Use JavaScript insertText for the rich editor
        print("\n--- Test 3: JavaScript insertText for rich editor ---")
        insert_result = await bridge.evaluate(
            tab_id,
            """
            (function() {
              const editor = document.getElementById('editor2');
              editor.focus();
              document.execCommand('selectAll', false, null);
              document.execCommand('insertText', false, 'Hello from execCommand');
              return editor.innerText;
            })();
            """,
        )
        print(f"Editor2 after execCommand: {insert_result.get('result', '')}")

        # Screenshot after, with timeout protection
        try:
            screenshot_after = await asyncio.wait_for(bridge.screenshot(tab_id), timeout=10.0)
            print(f"Screenshot after: {len(screenshot_after.get('data', ''))} bytes")
        except asyncio.TimeoutError:
            print("Screenshot after timed out (skipping)")

        # Results
        print("\n--- Results ---")
        input_val = input_result.get("result", "")
        editor1_val = editor_result.get("result", "")
        editor2_val = insert_result.get("result", "")

        input_pass = "Hello input" in input_val
        editor1_pass = "Hello contenteditable" in editor1_val
        editor2_pass = "execCommand" in editor2_val

        print(f"Input: {'✓ PASS' if input_pass else '✗ FAIL'} - {input_val}")
        print(f"Editor1: {'✓ PASS' if editor1_pass else '✗ FAIL'} - {editor1_val}")
        print(f"Editor2: {'✓ PASS' if editor2_pass else '✗ FAIL'} - {editor2_val}")

        await bridge.destroy_context(group_id)
        print("\n✓ Context destroyed")

    finally:
        await bridge.stop()


if __name__ == "__main__":
    asyncio.run(test_contenteditable())

@@ -0,0 +1,253 @@
#!/usr/bin/env python
"""
Test #8: Autocomplete Field Clearing

Symptom: Typed text gets cleared immediately
Root Cause: Field expects realistic keystroke timing for autocomplete
Detection: Field has autocomplete listeners or dropdown appears
Fix: Add delay_ms between keystrokes
"""

import asyncio
import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).parent.parent.parent.parent / "tools" / "src"))

from gcu.browser.bridge import BeelineBridge

CONTEXT_NAME = "autocomplete-test"


async def test_autocomplete():
    """Test typing into fields with autocomplete behavior."""
    print("=" * 70)
    print("TEST #8: Autocomplete Field Clearing")
    print("=" * 70)

    bridge = BeelineBridge()

    try:
        await bridge.start()

        for i in range(10):
            await asyncio.sleep(1)
            if bridge.is_connected:
                print("✓ Extension connected!")
                break
        else:
            print("✗ Extension not connected")
            return

        context = await bridge.create_context(CONTEXT_NAME)
        tab_id = context.get("tabId")
        group_id = context.get("groupId")
        print(f"✓ Created tab: {tab_id}")

        # Create test page with autocomplete behavior
        test_html = """
<!DOCTYPE html>
<html>
<head><title>Autocomplete Test</title>
<style>
.autocomplete-items {
  position: absolute;
  border: 1px solid #d4d4d4;
  border-top: none;
  z-index: 99;
  top: 100%;
  left: 0;
  right: 0;
  max-height: 200px;
  overflow-y: auto;
  background: white;
}
.autocomplete-items div {
  padding: 10px;
  cursor: pointer;
}
.autocomplete-items div:hover {
  background-color: #e9e9e9;
}
.autocomplete-active {
  background-color: DodgerBlue !important;
  color: white;
}
.autocomplete { position: relative; display: inline-block; }
input { width: 300px; padding: 10px; font-size: 16px; }
</style></head>
<body>
<h2>Autocomplete Test</h2>

<div class="autocomplete">
  <input id="search" type="text" placeholder="Search countries..." autocomplete="off">
</div>

<div id="log" style="margin-top:20px;font-family:monospace;"></div>

<script>
const countries = [
  "Afghanistan","Albania","Algeria",
  "Andorra","Angola","Argentina",
  "Armenia","Australia","Austria",
  "Azerbaijan","Bahamas","Bahrain",
  "Bangladesh","Belarus","Belgium",
  "Belize","Benin","Bhutan",
  "Bolivia","Brazil","Canada",
  "China","Colombia","Denmark",
  "Egypt","France","Germany",
  "India","Indonesia","Italy",
  "Japan","Mexico","Netherlands",
  "Nigeria","Norway","Pakistan",
  "Peru","Philippines","Poland",
  "Portugal","Russia","Spain",
  "Sweden","Switzerland","Thailand",
  "Turkey","Ukraine",
  "United Kingdom","United States",
  "Vietnam"
];

const input = document.getElementById('search');
const log = document.getElementById('log');
let currentFocus = -1;
let typingTimeout = null;

// Track events for testing
window.inputEvents = [];
window.inputValue = '';

function logEvent(type, value) {
  window.inputEvents.push({ type, value, time: Date.now() });
  const entry = document.createElement('div');
  entry.textContent = type + ': ' + value;
  log.insertBefore(entry, log.firstChild);
}

// Simulate autocomplete that clears fast typing
input.addEventListener('input', function(e) {
  const val = this.value;

  // Clear previous dropdown
  closeAllLists();

  if (!val) return;

  // If typing too fast (autocomplete-style), clear and restart
  clearTimeout(typingTimeout);
  typingTimeout = setTimeout(() => {
    logEvent('input', val);
    window.inputValue = val;

    // Create dropdown
    const div = document.createElement('div');
    div.setAttribute('id', this.id + 'autocomplete-list');
    div.setAttribute('class', 'autocomplete-items');
    this.parentNode.appendChild(div);

    countries.filter(
      c => c.substr(0, val.length).toUpperCase() === val.toUpperCase()
    ).slice(0, 5).forEach(country => {
      const item = document.createElement('div');
      item.innerHTML = '<strong>' + country.substr(0, val.length) + '</strong>'
        + country.substr(val.length);
      item.addEventListener('click', function() {
        input.value = country;
        closeAllLists();
        logEvent('select', country);
        window.inputValue = country;
      });
      div.appendChild(item);
    });
  }, 100); // 100ms debounce
});

function closeAllLists() {
  document.querySelectorAll('.autocomplete-items').forEach(el => el.remove());
}

document.addEventListener('click', function() {
  closeAllLists();
});
</script>
</body>
</html>
"""

        # Write to file and use file:// URL (data: URLs don't work well with extension)
        test_file = Path("/tmp/autocomplete_test.html")
        test_file.write_text(test_html.strip())
        file_url = f"file://{test_file}"
        await bridge.navigate(tab_id, file_url, wait_until="load")
        print("✓ Page loaded")

        # Screenshot
        screenshot = await bridge.screenshot(tab_id)
        print(f"Screenshot: {len(screenshot.get('data', ''))} bytes")

        # Test 1: Fast typing (no delay) - may fail
        print("\n--- Test 1: Fast typing (delay_ms=0) ---")
        await bridge.click(tab_id, "#search")
        await bridge.type_text(tab_id, "#search", "Ger", clear_first=True, delay_ms=0)
        await asyncio.sleep(0.5)

        fast_result = await bridge.evaluate(
            tab_id, "(function() { return document.getElementById('search').value; })()"
        )
        fast_value = fast_result.get("result", "")
        print(f"Value after fast typing: '{fast_value}'")

        # Check events
        events_result = await bridge.evaluate(
            tab_id, "(function() { return window.inputEvents; })()"
        )
        print(f"Events logged: {events_result.get('result', [])}")

        # Test 2: Slow typing (with delay) - should work
        print("\n--- Test 2: Slow typing (delay_ms=100) ---")
        await bridge.click(tab_id, "#search")
        await bridge.type_text(tab_id, "#search", "United", clear_first=True, delay_ms=100)
        await asyncio.sleep(0.5)

        slow_result = await bridge.evaluate(
            tab_id, "(function() { return document.getElementById('search').value; })()"
        )
        slow_value = slow_result.get("result", "")
        print(f"Value after slow typing: '{slow_value}'")

        # Check if dropdown appeared
        dropdown_result = await bridge.evaluate(
            tab_id,
            "(function() { return document.querySelectorAll("
            "'.autocomplete-items div').length; })()",
        )
        dropdown_count = dropdown_result.get("result", 0)
        print(f"Dropdown items: {dropdown_count}")

        # Screenshot with dropdown
        screenshot_dropdown = await bridge.screenshot(tab_id)
        print(f"Screenshot with dropdown: {len(screenshot_dropdown.get('data', ''))} bytes")

        # Results
        print("\n--- Results ---")
        if "United" in slow_value:
            print("✓ PASS: Slow typing with delay_ms worked")
        else:
            print("✗ FAIL: Slow typing still didn't work")

        if dropdown_count > 0:
            print("✓ PASS: Autocomplete dropdown appeared")
        else:
            print("⚠ WARNING: No autocomplete dropdown")

        await bridge.destroy_context(group_id)
        print("\n✓ Context destroyed")

    finally:
        await bridge.stop()


if __name__ == "__main__":
    asyncio.run(test_autocomplete())
@@ -0,0 +1,162 @@
#!/usr/bin/env python
"""
Test #10: LinkedIn Huge DOM Tree

Symptom: browser_snapshot() hangs forever
Root Cause: 10k+ DOM nodes, accessibility tree has 50k+ nodes
Detection: document.querySelectorAll('*').length > 5000
Fix: Add timeout (10s default), truncate tree at 2000 nodes
"""

import asyncio
import base64
import sys
import time
from pathlib import Path

sys.path.insert(0, str(Path(__file__).parent.parent.parent.parent / "tools" / "src"))

from gcu.browser.bridge import BeelineBridge

CONTEXT_NAME = "huge-dom-test"


async def test_huge_dom():
    """Test snapshot performance on huge DOM trees."""
    print("=" * 70)
    print("TEST #10: Huge DOM Tree (LinkedIn-style)")
    print("=" * 70)

    bridge = BeelineBridge()

    try:
        await bridge.start()

        for i in range(10):
            await asyncio.sleep(1)
            if bridge.is_connected:
                print("✓ Extension connected!")
                break
        else:
            print("✗ Extension not connected")
            return

        context = await bridge.create_context(CONTEXT_NAME)
        tab_id = context.get("tabId")
        group_id = context.get("groupId")
        print(f"✓ Created tab: {tab_id}")

        # Test 1: Small DOM (baseline)
        print("\n--- Test 1: Small DOM (baseline) ---")
        small_html = """
<!DOCTYPE html>
<html><body>
<h1>Small Page</h1>
<p>A few elements</p>
<button>Click me</button>
</body></html>
"""
        data_url = f"data:text/html;base64,{base64.b64encode(small_html.encode()).decode()}"
        await bridge.navigate(tab_id, data_url, wait_until="load")

        start = time.perf_counter()
        snapshot = await bridge.snapshot(tab_id, timeout_s=5.0)
        elapsed = time.perf_counter() - start
        tree_len = len(snapshot.get("tree", ""))
        print(f"Small DOM snapshot: {elapsed:.3f}s, {tree_len} chars")

        # Test 2: Generate huge DOM
        print("\n--- Test 2: Huge DOM (5000+ elements) ---")
        huge_html = """
<!DOCTYPE html>
<html><body>
<h1>Huge DOM Test</h1>
<div id="container"></div>
<script>
const container = document.getElementById('container');
for (let i = 0; i < 5000; i++) {
  const div = document.createElement('div');
  div.className = 'item-' + i;
  div.innerHTML = '<span>Item ' + i + '</span><button>Action</button>';
  container.appendChild(div);
}
</script>
</body></html>
"""
        data_url = f"data:text/html;base64,{base64.b64encode(huge_html.encode()).decode()}"
        await bridge.navigate(tab_id, data_url, wait_until="load")

        # Count elements
        count_result = await bridge.evaluate(
            tab_id, "(function() { return document.querySelectorAll('*').length; })()"
        )
        elem_count = count_result.get("result", 0)
        print(f"DOM elements: {elem_count}")

        # Skip screenshot on huge DOM - it can timeout
        # Instead verify page loaded by checking DOM
        print("✓ Page verified (skipping screenshot on huge DOM)")

        # Test snapshot with timeout
        print("\n--- Testing snapshot with 10s timeout ---")
        start = time.perf_counter()
        try:
            snapshot = await bridge.snapshot(tab_id, timeout_s=10.0)
            elapsed = time.perf_counter() - start
            tree_len = len(snapshot.get("tree", ""))
            truncated = "(truncated)" in snapshot.get("tree", "")
            print(f"✓ Huge DOM snapshot: {elapsed:.3f}s, {tree_len} chars, truncated={truncated}")

            if elapsed < 5.0:
                print("✓ PASS: Snapshot completed quickly")
            else:
                print(f"⚠ WARNING: Snapshot took {elapsed:.1f}s")

            if truncated:
                print("✓ PASS: Tree was truncated to prevent hang")
            else:
                print("⚠ WARNING: Tree not truncated (may need adjustment)")

        except asyncio.TimeoutError:
            print("✗ FAIL: Snapshot timed out (this shouldn't happen)")

        # Test 3: Real LinkedIn
        print("\n--- Test 3: Real LinkedIn Feed ---")
        await bridge.navigate(
            tab_id, "https://www.linkedin.com/feed", wait_until="load", timeout_ms=30000
        )
        await asyncio.sleep(2)

        count_result = await bridge.evaluate(
            tab_id, "(function() { return document.querySelectorAll('*').length; })()"
        )
        elem_count = count_result.get("result", 0)
        print(f"LinkedIn DOM elements: {elem_count}")

        start = time.perf_counter()
        try:
            snapshot = await bridge.snapshot(tab_id, timeout_s=15.0)
            elapsed = time.perf_counter() - start
            tree_len = len(snapshot.get("tree", ""))
            truncated = "(truncated)" in snapshot.get("tree", "")
            print(f"LinkedIn snapshot: {elapsed:.3f}s, {tree_len} chars, truncated={truncated}")

            if elapsed < 5.0:
                print("✓ PASS: LinkedIn snapshot fast enough")
            elif elapsed < 15.0:
                print("⚠ WARNING: LinkedIn snapshot slow but within timeout")
            else:
                print("✗ FAIL: LinkedIn snapshot too slow")

        except asyncio.TimeoutError:
            print("✗ FAIL: LinkedIn snapshot timed out")

        await bridge.destroy_context(group_id)
        print("\n✓ Context destroyed")

    finally:
        await bridge.stop()


if __name__ == "__main__":
    asyncio.run(test_huge_dom())
@@ -0,0 +1,190 @@
#!/usr/bin/env python
"""
Test #13: SPA Navigation Events

Symptom: wait_until="load" fires before content ready
Root Cause: SPA uses client-side routing, no full page load
Detection: URL changes but load event already fired
Fix: Use wait_until="networkidle" or wait_for_selector
"""

import asyncio
import sys
import time
from pathlib import Path

sys.path.insert(0, str(Path(__file__).parent.parent.parent.parent / "tools" / "src"))

from gcu.browser.bridge import BeelineBridge

CONTEXT_NAME = "spa-nav-test"


async def test_spa_navigation():
    """Test navigation timing on SPA pages."""
    print("=" * 70)
    print("TEST #13: SPA Navigation Events")
    print("=" * 70)

    bridge = BeelineBridge()

    try:
        await bridge.start()

        for i in range(10):
            await asyncio.sleep(1)
            if bridge.is_connected:
                print("✓ Extension connected!")
                break
        else:
            print("✗ Extension not connected")
            return

        context = await bridge.create_context(CONTEXT_NAME)
        tab_id = context.get("tabId")
        group_id = context.get("groupId")
        print(f"✓ Created tab: {tab_id}")

        # Create a test SPA
        spa_html = """
<!DOCTYPE html>
<html>
<head>
<title>SPA Test</title>
<style>
nav a { margin-right: 10px; }
.page { padding: 20px; border: 1px solid #ccc; margin-top: 10px; }
</style>
</head>
<body>
<nav>
  <a href="#home" onclick="navigate('home', event)">Home</a>
  <a href="#about" onclick="navigate('about', event)">About</a>
  <a href="#contact" onclick="navigate('contact', event)">Contact</a>
</nav>
<div id="app" class="page">
  <h1>Loading...</h1>
</div>
<script>
// Simulate SPA routing
let currentPage = '';

// evt is optional: the initial hydration call has no click event to cancel
async function navigate(page, evt) {
  if (evt) evt.preventDefault();
  currentPage = page;

  // Show loading state
  document.getElementById('app').innerHTML = '<h1>Loading...</h1>';

  // Simulate async content loading (like real SPAs)
  await new Promise(r => setTimeout(r, 500));

  // Render content
  const content = {
    home: '<h1>Home Page</h1><p>Welcome!</p>'
      + '<button id="home-btn">Home Action</button>',
    about: '<h1>About Page</h1><p>Simulated SPA.</p>'
      + '<button id="about-btn">About Action</button>',
    contact: '<h1>Contact Page</h1>'
      + '<p>Contact us at test@example.com</p>'
      + '<button id="contact-btn">Contact Action</button>'
  };

  document.getElementById('app').innerHTML = content[page] || '<h1>404</h1>';
  window.location.hash = page;
}

// Initial load with delay (simulates SPA hydration)
setTimeout(() => {
  navigate('home');
}, 1000);

// Track for testing
window.pageLoads = [];
window.addEventListener('hashchange', () => {
  window.pageLoads.push(window.location.hash);
});
</script>
</body>
</html>
"""

        # Write to file and use file:// URL (data: URLs don't work well with extension)
        test_file = Path("/tmp/spa_test.html")
        test_file.write_text(spa_html.strip())
        file_url = f"file://{test_file}"

        # Test 1: wait_until="load" - may fire before content ready
        print("\n--- Test 1: wait_until='load' ---")
        start = time.perf_counter()
        await bridge.navigate(tab_id, file_url, wait_until="load")
        elapsed = time.perf_counter() - start
        print(f"Navigation completed in {elapsed:.3f}s")

        # Check content immediately
        content = await bridge.evaluate(
            tab_id,
            "(function() { return document.getElementById('app').innerText; })()",
        )
        print(f"Content immediately after load: '{content.get('result', '')}'")

        # Screenshot
        screenshot = await bridge.screenshot(tab_id)
        print(f"Screenshot: {len(screenshot.get('data', ''))} bytes")

        # Wait for content
        print("\n--- Waiting for content to hydrate ---")
        await bridge.wait_for_selector(tab_id, "#home-btn", timeout_ms=5000)
        print("✓ Content loaded")

        # Check content after wait
        content_after = await bridge.evaluate(
            tab_id,
            "(function() { return document.getElementById('app').innerText; })()",
        )
        print(f"Content after wait: '{content_after.get('result', '')}'")

        # Test 2: SPA navigation (no full page load)
        print("\n--- Test 2: SPA client-side navigation ---")

        # Click "About" link
        await bridge.click(tab_id, 'a[href="#about"]')
        await asyncio.sleep(1)

        # Check if content changed
        about_content = await bridge.evaluate(
            tab_id,
            "(function() { return document.getElementById('app').innerText; })()",
        )
        print(f"Content after SPA nav: '{about_content.get('result', '')}'")

        if "About Page" in about_content.get("result", ""):
            print("✓ PASS: SPA navigation worked")
        else:
            print("✗ FAIL: SPA navigation didn't update content")

        # Test 3: wait_until="networkidle"
        print("\n--- Test 3: wait_until='networkidle' ---")
        await bridge.navigate(tab_id, file_url, wait_until="networkidle", timeout_ms=10000)

        # Check content immediately
        content_networkidle = await bridge.evaluate(
            tab_id,
            "(function() { return document.getElementById('app').innerText; })()",
        )
        print(f"Content after networkidle: '{content_networkidle.get('result', '')}'")

        if "Home Page" in content_networkidle.get("result", ""):
            print("✓ PASS: networkidle waited for content")
        else:
            print("⚠ WARNING: networkidle didn't wait long enough")

        await bridge.destroy_context(group_id)
        print("\n✓ Context destroyed")

    finally:
        await bridge.stop()


if __name__ == "__main__":
    asyncio.run(test_spa_navigation())
@@ -0,0 +1,267 @@
#!/usr/bin/env python
"""
Test #15: Screenshot Functionality

Tests browser_screenshot across multiple scenarios:
- Basic viewport screenshot
- Full-page screenshot
- Selector-based screenshot
- Screenshot on complex DOM
- Timeout handling

Category: screenshot
"""

import asyncio
import base64
import sys
import time
from pathlib import Path

sys.path.insert(0, str(Path(__file__).parent.parent.parent.parent / "tools" / "src"))

from gcu.browser.bridge import BeelineBridge

CONTEXT_NAME = "screenshot-test"

SIMPLE_HTML = """<!DOCTYPE html>
<html>
<head><style>
body { margin: 0; background: #fff; font-family: sans-serif; }
h1 { color: #333; padding: 20px; }
.box { width: 200px; height: 100px; background: #4a90e2; margin: 20px; }
.long-content { height: 2000px; background: linear-gradient(blue, red); }
</style></head>
<body>
<h1 id="title">Screenshot Test Page</h1>
<div class="box" id="target-box">Target Box</div>
<div class="long-content"></div>
</body>
</html>"""


def check_png(data: str) -> bool:
    """Verify that base64 data decodes to a valid PNG."""
    try:
        raw = base64.b64decode(data)
        return raw[:8] == b"\x89PNG\r\n\x1a\n"
    except Exception:
        return False


async def test_basic_screenshot(bridge: BeelineBridge, tab_id: int, data_url: str):
    print("\n--- Test 1: Basic Viewport Screenshot ---")
    await bridge.navigate(tab_id, data_url, wait_until="load")
    await asyncio.sleep(0.5)

    start = time.perf_counter()
    result = await bridge.screenshot(tab_id)
    elapsed = time.perf_counter() - start

    ok = result.get("ok")
    data = result.get("data", "")
    mime = result.get("mimeType", "")

    print(f" ok={ok}, mimeType={mime}, elapsed={elapsed:.3f}s")
    print(f" data length: {len(data)} chars")

    if ok and data:
        valid_png = check_png(data)
        print(f" valid PNG: {valid_png}")
        if valid_png:
            raw = base64.b64decode(data)
            print(f" PNG size: {len(raw)} bytes")
            print(" ✓ PASS: Basic screenshot works")
            return True
        else:
            print(" ✗ FAIL: Data is not a valid PNG")
    else:
        print(f" ✗ FAIL: {result.get('error', 'no data')}")
    return False


async def test_full_page_screenshot(bridge: BeelineBridge, tab_id: int, data_url: str):
    print("\n--- Test 2: Full Page Screenshot ---")
    await bridge.navigate(tab_id, data_url, wait_until="load")
    await asyncio.sleep(0.5)

    viewport_result = await bridge.screenshot(tab_id, full_page=False)
    full_result = await bridge.screenshot(tab_id, full_page=True)

    v_data = viewport_result.get("data", "")
    f_data = full_result.get("data", "")

    if not v_data or not f_data:
        print(f" ✗ FAIL: viewport ok={viewport_result.get('ok')}, full ok={full_result.get('ok')}")
        return False

    v_size = len(base64.b64decode(v_data))
    f_size = len(base64.b64decode(f_data))
    print(f" Viewport PNG: {v_size} bytes")
    print(f" Full page PNG: {f_size} bytes")

    if f_size > v_size:
        print(" ✓ PASS: Full page larger than viewport")
        return True
    else:
        print(" ✗ FAIL: Full page not larger than viewport (may not capture long pages)")
        return False


async def test_selector_screenshot(bridge: BeelineBridge, tab_id: int, data_url: str):
    print("\n--- Test 3: Selector Screenshot ---")
    await bridge.navigate(tab_id, data_url, wait_until="load")
    await asyncio.sleep(0.5)

    # selector param exists in signature but may not be implemented
    result = await bridge.screenshot(tab_id, selector="#target-box")

    ok = result.get("ok")
    data = result.get("data", "")

    if ok and data:
        # If implemented, the box screenshot should be smaller than a full viewport screenshot
        full_result = await bridge.screenshot(tab_id)
        full_data = full_result.get("data", "")

        if full_data:
            sel_size = len(base64.b64decode(data))
            full_size = len(base64.b64decode(full_data))
            print(f" Selector PNG: {sel_size} bytes")
            print(f" Full page PNG: {full_size} bytes")
            if sel_size < full_size:
                print(" ✓ PASS: Selector screenshot smaller than full page")
                return True
            else:
                print(" ⚠ WARNING: Selector screenshot not smaller (may be full page)")
                return False
        return False
    else:
        print(
            " ⚠ NOT IMPLEMENTED: selector param ignored"
            f" (returns full page) - error={result.get('error')}"
        )
        print(" NOTE: selector parameter exists in signature but is not used in implementation")
        return False


async def test_screenshot_url_metadata(bridge: BeelineBridge, tab_id: int):
    print("\n--- Test 4: Screenshot URL Metadata ---")
    await bridge.navigate(tab_id, "https://example.com", wait_until="load")
    await asyncio.sleep(1)

    result = await bridge.screenshot(tab_id)
    url = result.get("url", "")
    tab = result.get("tabId")

    print(f" url={url!r}, tabId={tab}")

    if "example.com" in url:
        print(" ✓ PASS: URL metadata captured correctly")
        return True
    else:
        print(f" ✗ FAIL: Expected example.com in URL, got {url!r}")
        return False


async def test_screenshot_timeout(bridge: BeelineBridge, tab_id: int, data_url: str):
    print("\n--- Test 5: Timeout Handling ---")
    await bridge.navigate(tab_id, data_url, wait_until="load")

    # Very short timeout - the simple page may still beat it
    start = time.perf_counter()
    result = await bridge.screenshot(tab_id, timeout_s=0.001)
    elapsed = time.perf_counter() - start

    if not result.get("ok"):
        err = result.get("error", "")
        if "timed out" in err or "cancelled" in err:
            print(f" ✓ PASS: Timeout handled gracefully: {err!r}")
            return True
        else:
            print(f" ⚠ Non-timeout error: {err!r} in {elapsed:.3f}s")
            return True  # Inconclusive: failed for another reason, not a hang
    else:
        print(
            f" ⚠ Screenshot completed before timeout ({elapsed:.3f}s) - too fast to test timeout"
        )
        return True  # Still ok, just very fast


async def test_screenshot_complex_site(bridge: BeelineBridge, tab_id: int):
    print("\n--- Test 6: Complex Site (example.com) ---")
    await bridge.navigate(tab_id, "https://example.com", wait_until="load")
    await asyncio.sleep(1)

    start = time.perf_counter()
    result = await bridge.screenshot(tab_id)
    elapsed = time.perf_counter() - start

    ok = result.get("ok")
    data = result.get("data", "")

    print(f" ok={ok}, elapsed={elapsed:.3f}s, data_len={len(data)}")
    if ok and check_png(data):
        print(" ✓ PASS: Screenshot on real site works")
        return True
    else:
        print(f" ✗ FAIL: {result.get('error', 'bad data')}")
        return False


async def main():
    print("=" * 70)
    print("TEST #15: Screenshot Functionality")
    print("=" * 70)

    bridge = BeelineBridge()

    try:
        await bridge.start()

        for i in range(10):
            await asyncio.sleep(1)
            if bridge.is_connected:
                print("✓ Extension connected!")
                break
            print(f"Waiting for extension... ({i + 1}/10)")
        else:
            print("✗ Extension not connected. Ensure Chrome with Beeline extension is running.")
            return

        context = await bridge.create_context(CONTEXT_NAME)
        tab_id = context.get("tabId")
        group_id = context.get("groupId")
        print(f"✓ Created tab: {tab_id}")

        data_url = f"data:text/html;base64,{base64.b64encode(SIMPLE_HTML.encode()).decode()}"

        results = {
            "basic": await test_basic_screenshot(bridge, tab_id, data_url),
            "full_page": await test_full_page_screenshot(bridge, tab_id, data_url),
            "selector": await test_selector_screenshot(bridge, tab_id, data_url),
            "metadata": await test_screenshot_url_metadata(bridge, tab_id),
            "timeout": await test_screenshot_timeout(bridge, tab_id, data_url),
            "complex_site": await test_screenshot_complex_site(bridge, tab_id),
        }

        print("\n" + "=" * 70)
        print("SUMMARY")
        print("=" * 70)
        for name, passed in results.items():
            status = "✓ PASS" if passed else "✗ FAIL"
            print(f" {status}: {name}")

        passed_count = sum(1 for v in results.values() if v)
        total = len(results)
        print(f"\n {passed_count}/{total} tests passed")

        await bridge.destroy_context(group_id)
        print("\n✓ Context destroyed")

    finally:
        await bridge.stop()
        print("✓ Bridge stopped")


if __name__ == "__main__":
    asyncio.run(main())
@@ -0,0 +1,333 @@
#!/usr/bin/env python
"""
Browser Edge Case Test Template

This script provides a template for testing and debugging browser tool failures
on specific websites. Use this to reproduce, isolate, and verify fixes.

Usage:
    1. Copy this file: cp test_case.py test_#[number]_[site].py
    2. Fill in the CONFIG section with your test details
    3. Run: uv run python test_#[number]_[site].py

Example:
    uv run python test_01_linkedin_scroll.py
"""

import asyncio
import sys
import time
from pathlib import Path

# Add tools to path
sys.path.insert(0, str(Path(__file__).parent.parent.parent.parent / "tools" / "src"))

from gcu.browser.bridge import BeelineBridge

# ═══════════════════════════════════════════════════════════════════════════════
# CONFIG: Fill in these values for your test case
# ═══════════════════════════════════════════════════════════════════════════════

TEST_CASE = {
    "number": 1,
    "name": "LinkedIn Nested Scroll Container",
    "site": "https://www.linkedin.com/feed",
    "simple_site": "https://example.com",
    "category": "scroll",  # scroll, click, input, snapshot, navigation
    "symptom": "scroll() returns success but page doesn't move",
}

BRIDGE_PORT = 9229
CONTEXT_NAME = "edge-case-test"

# Shared snippet: capture the window scroll position plus the scrollTop of every
# scrollable container, so the before/after comparison uses identical measurements.
# String(el.className) guards against SVG elements, whose className is not a string.
SCROLL_STATE_JS = """
(function() {
    const results = { window: { y: window.scrollY } };
    document.querySelectorAll('*').forEach((el, i) => {
        const style = getComputedStyle(el);
        if ((style.overflowY === 'scroll' || style.overflowY === 'auto') &&
            el.scrollHeight > el.clientHeight) {
            results['el_' + i] = {
                tag: el.tagName,
                scrollTop: el.scrollTop,
                class: String(el.className).substring(0, 30)
            };
        }
    });
    return results;
})();
"""


# ═══════════════════════════════════════════════════════════════════════════════
# TEST FUNCTIONS
# ═══════════════════════════════════════════════════════════════════════════════


async def test_simple_site(bridge: BeelineBridge, tab_id: int) -> dict:
    """Test that the tool works on a simple site (baseline)."""
    print("\n--- Baseline Test (Simple Site) ---")

    await bridge.navigate(tab_id, TEST_CASE["simple_site"], wait_until="load")
    await asyncio.sleep(1)

    # Adjust this based on category
    if TEST_CASE["category"] == "scroll":
        result = await bridge.scroll(tab_id, "down", 100)
        print(f" Scroll result: {result}")
        return result
    elif TEST_CASE["category"] == "click":
        # Add click test
        pass
    elif TEST_CASE["category"] == "snapshot":
        result = await bridge.snapshot(tab_id, timeout_s=5.0)
        print(f" Snapshot length: {len(result.get('tree', ''))}")
        return result

    return {"ok": True}


async def test_problematic_site(bridge: BeelineBridge, tab_id: int) -> dict:
    """Test the tool on the problematic site."""
    print("\n--- Problem Site Test ---")

    await bridge.navigate(tab_id, TEST_CASE["site"], wait_until="load", timeout_ms=30000)
    await asyncio.sleep(2)

    # Adjust this based on category
    if TEST_CASE["category"] == "scroll":
        # Get scroll positions before
        before = await bridge.evaluate(tab_id, SCROLL_STATE_JS)
        print(f" Before scroll: {before.get('result', {})}")

        # Try to scroll
        result = await bridge.scroll(tab_id, "down", 500)
        print(f" Scroll result: {result}")

        await asyncio.sleep(1)

        # Get scroll positions after (same measurement as before)
        after = await bridge.evaluate(tab_id, SCROLL_STATE_JS)
        print(f" After scroll: {after.get('result', {})}")

        # Check if anything changed
        before_data = before.get("result", {}) or {}
        after_data = after.get("result", {}) or {}

        changed = False
        for key in after_data:
            if key in before_data:
                b_val = (
                    before_data[key].get("scrollTop", 0)
                    if isinstance(before_data[key], dict)
                    else 0
                )
                a_val = (
                    after_data[key].get("scrollTop", 0) if isinstance(after_data[key], dict) else 0
                )
                if a_val != b_val:
                    print(f" ✓ CHANGE DETECTED: {key} scrolled from {b_val} to {a_val}")
                    changed = True

        if not changed:
            print(" ✗ NO CHANGE: Scroll did not affect any container")

        return {"ok": changed, "scroll_result": result}

    elif TEST_CASE["category"] == "snapshot":
        start = time.perf_counter()
        try:
            result = await bridge.snapshot(tab_id, timeout_s=15.0)
            elapsed = time.perf_counter() - start
            tree_len = len(result.get("tree", ""))
            print(f" Snapshot completed in {elapsed:.2f}s, {tree_len} chars")
            return {"ok": True, "elapsed": elapsed, "tree_length": tree_len}
        except asyncio.TimeoutError:
            print(" ✗ SNAPSHOT TIMED OUT")
            return {"ok": False, "error": "timeout"}

    return {"ok": True}


async def detect_root_cause(bridge: BeelineBridge, tab_id: int) -> dict:
    """Run detection scripts to identify the root cause."""
    print("\n--- Root Cause Detection ---")

    detections = {}

    # Detection 1: Nested scrollable containers
    scroll_check = await bridge.evaluate(
        tab_id,
        """
        (function() {
            const candidates = [];
            document.querySelectorAll('*').forEach(el => {
                const style = getComputedStyle(el);
                if (style.overflow.includes('scroll') || style.overflow.includes('auto')) {
                    const rect = el.getBoundingClientRect();
                    if (rect.width > 100 && rect.height > 100) {
                        candidates.push({
                            tag: el.tagName,
                            area: rect.width * rect.height,
                            class: String(el.className).substring(0, 30)
                        });
                    }
                }
            });
            candidates.sort((a, b) => b.area - a.area);
            return {
                count: candidates.length,
                largest: candidates[0]
            };
        })();
        """,
    )
    detections["nested_scroll"] = scroll_check.get("result", {})
    print(f" Nested scroll containers: {detections['nested_scroll']}")

    # Detection 2: Shadow DOM
    shadow_check = await bridge.evaluate(
        tab_id,
        """
        (function() {
            const withShadow = [];
            document.querySelectorAll('*').forEach(el => {
                if (el.shadowRoot) {
                    withShadow.push(el.tagName);
                }
            });
            return { count: withShadow.length, elements: withShadow.slice(0, 5) };
        })();
        """,
    )
    detections["shadow_dom"] = shadow_check.get("result", {})
    print(f" Shadow DOM: {detections['shadow_dom']}")

    # Detection 3: iframes
    iframe_check = await bridge.evaluate(
        tab_id,
        """
        (function() {
            const iframes = document.querySelectorAll('iframe');
            return { count: iframes.length };
        })();
        """,
    )
    detections["iframes"] = iframe_check.get("result", {})
    print(f" iframes: {detections['iframes']}")

    # Detection 4: DOM size
    dom_check = await bridge.evaluate(
        tab_id,
        """
        (function() {
            return {
                elements: document.querySelectorAll('*').length,
                body_children: document.body.children.length
            };
        })();
        """,
    )
    detections["dom_size"] = dom_check.get("result", {})
    print(f" DOM size: {detections['dom_size']}")

    # Detection 5: Framework detection
    framework_check = await bridge.evaluate(
        tab_id,
        """
        (function() {
            return {
                react: !!document.querySelector('[data-reactroot], [data-reactid]'),
                vue: !!document.querySelector('[data-v-app]'),  // Vue 3 marks its mount root
                angular: !!document.querySelector('[ng-app], [ng-version]')
            };
        })();
        """,
    )
    detections["frameworks"] = framework_check.get("result", {})
    print(f" Frameworks: {detections['frameworks']}")

    return detections


# ═══════════════════════════════════════════════════════════════════════════════
# MAIN
# ═══════════════════════════════════════════════════════════════════════════════


async def main():
    print("=" * 70)
    print(f"EDGE CASE TEST #{TEST_CASE['number']}: {TEST_CASE['name']}")
    print("=" * 70)
    print(f"Site: {TEST_CASE['site']}")
    print(f"Category: {TEST_CASE['category']}")
    print(f"Symptom: {TEST_CASE['symptom']}")

    bridge = BeelineBridge()

    try:
        print("\n--- Starting Bridge ---")
        await bridge.start()

        # Wait for extension connection
        for i in range(10):
            await asyncio.sleep(1)
            if bridge.is_connected:
                print("✓ Extension connected!")
                break
            print(f"Waiting for extension... ({i + 1}/10)")
        else:
            print("✗ Extension not connected. Ensure Chrome with Beeline extension is running.")
            return

        # Create browser context
        context = await bridge.create_context(CONTEXT_NAME)
        tab_id = context.get("tabId")
        group_id = context.get("groupId")
        print(f"✓ Created tab: {tab_id}")

        # Run tests
        baseline_result = await test_simple_site(bridge, tab_id)
        problem_result = await test_problematic_site(bridge, tab_id)
        detections = await detect_root_cause(bridge, tab_id)

        # Summary
        print("\n" + "=" * 70)
        print("SUMMARY")
        print("=" * 70)
        print(f"Baseline test: {'✓ PASS' if baseline_result.get('ok') else '✗ FAIL'}")
        print(f"Problem test: {'✓ PASS' if problem_result.get('ok') else '✗ FAIL'}")
        print(f"Root cause indicators: {[k for k, v in detections.items() if v]}")

        # Cleanup
        print("\n--- Cleanup ---")
        await bridge.destroy_context(group_id)
        print("✓ Context destroyed")

    finally:
        await bridge.stop()
        print("✓ Bridge stopped")


if __name__ == "__main__":
    asyncio.run(main())
@@ -0,0 +1,225 @@
# Integration Test Reporting Skill

Run the Level 2 dummy agent integration test suite and produce a detailed HTML report with per-test input → outcome analysis.

## Trigger

User wants to run integration tests and see results:
- `/test-reporting`
- `/test-reporting test_component_queen_live.py`
- `/test-reporting --all`

## SOP: Running Tests

### Step 1: Select Scope

If the user provides a specific test file or pattern, use it. Otherwise run the full suite.

```bash
# Full suite
cd core && echo "1" | uv run python tests/dummy_agents/run_all.py --interactive 2>&1

# Specific file (requires manual provider setup)
cd core && uv run python -c "
import sys
sys.path.insert(0, '.')
from tests.dummy_agents.run_all import detect_available
from tests.dummy_agents.conftest import set_llm_selection

avail = detect_available()
claude = [p for p in avail if 'Claude Code' in p['name']]
if not claude:
    avail_names = [p['name'] for p in avail]
    raise RuntimeError(f'No Claude Code subscription. Available: {avail_names}')
provider = claude[0]
set_llm_selection(
    model=provider['model'],
    api_key=provider['api_key'],
    extra_headers=provider.get('extra_headers'),
    api_base=provider.get('api_base'),
)

import pytest
sys.exit(pytest.main([
    'tests/dummy_agents/TEST_FILE_HERE',
    '-v', '--override-ini=asyncio_mode=auto', '--no-header', '--tb=long',
    '--log-cli-level=WARNING', '--junitxml=/tmp/hive_test_results.xml',
]))
"
```

### Step 2: Collect Results

After the test run completes, collect:
1. **JUnit XML** from `--junitxml` output (if available; parsed as sketched below)
2. **stdout/stderr** from the run
3. **Summary table** from `run_all.py` output (the Unicode table)
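A minimal sketch of the JUnit collection step, assuming the `/tmp/hive_test_results.xml` path from Step 1 and pytest's standard `<testsuite>/<testcase>` schema:

```python
# Summarize the JUnit XML produced by --junitxml (path from Step 1).
import xml.etree.ElementTree as ET

def summarize_junit(path="/tmp/hive_test_results.xml"):
    root = ET.parse(path).getroot()
    # pytest nests suites inside <testsuites>; fall back if the root is a suite
    suites = root.findall("testsuite") or [root]
    rows = []
    for suite in suites:
        for case in suite.findall("testcase"):
            status = "PASS"
            if case.find("failure") is not None or case.find("error") is not None:
                status = "FAIL"
            elif case.find("skipped") is not None:
                status = "SKIP"
            rows.append({
                "component": case.get("classname", ""),
                "test": case.get("name", ""),
                "time": float(case.get("time", 0)),
                "status": status,
            })
    return rows
```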
### Step 3: Generate HTML Report

Write the report to `/tmp/hive_integration_test_report.html`.

The report MUST include these sections:

#### Header
- Run timestamp (ISO 8601)
- Provider used (model name, source)
- Total tests / passed / failed / skipped
- Total wall-clock time
- Overall verdict: PASS (all green) or FAIL (with count)

#### Per-Test Table

For EVERY test (not just failures), include a row with:

| Column | Description |
|--------|-------------|
| Component | Test file grouping (e.g., `component_queen_live`) |
| Test Name | Function name (e.g., `test_queen_starts_in_planning_without_worker`) |
| Status | PASS / FAIL / SKIP / ERROR with color badge |
| Duration | Wall-clock seconds |
| What | One-line description of what the test verifies |
| How | How it works (setup → action → assertion) |
| Why | Why this test matters (what bug/behavior it catches) |
| Input | The input data or configuration (graph spec, initial prompt, phase, etc.) |
| Expected Outcome | What the test asserts |
| Actual Outcome | What actually happened (PASS: matches expected / FAIL: actual vs expected) |
| Failure Detail | For failures only: full traceback + diagnosis |

#### What / How / Why Descriptions

These MUST be derived from the test function's docstring and code. Read each test file to extract (see the sketch after the mapping table):
- **What**: From the docstring first line
- **How**: From the test body (what fixtures, what graph, what assertions)
- **Why**: From the docstring body or "Why this matters" section in the test module

Use these mappings for the component test files:

```
test_component_llm.py → "LLM Provider" — streaming, tool calling, tokens
test_component_tools.py → "Tool Registry + MCP" — connection, execution
test_component_event_loop.py → "EventLoopNode" — iteration, output, stall
test_component_edges.py → "Edge Evaluation" — conditional, priority
test_component_conversation.py → "Conversation Persistence" — storage, cursor
test_component_escalation.py → "Escalation Flow" — worker→queen signaling
test_component_continuous.py → "Continuous Mode" — conversation threading
test_component_queen.py → "Queen Phase (Unit)" — phase state, tools, events
test_component_queen_live.py → "Queen Phase (Live)" — real queen, real LLM
test_component_queen_state_machine.py → "Queen State Machine" — edge cases, races
test_component_worker_comms.py → "Worker Communication" — events, data flow
test_component_strict_outcomes.py → "Strict Outcomes" — exact path, output, quality
```
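To populate those columns without importing heavyweight test modules, the docstrings can be pulled statically. A sketch, where the glob path is an assumption about the repo layout:

```python
# Statically extract test docstrings for the What/Why columns.
import ast
from pathlib import Path

def docstrings_for(test_file: Path) -> dict[str, str]:
    tree = ast.parse(test_file.read_text())
    out = {}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) \
                and node.name.startswith("test_"):
            out[node.name] = ast.get_docstring(node) or ""
    return out

# Usage: iterate the component files listed in the mapping above
for f in Path("core/tests/dummy_agents").glob("test_component_*.py"):
    print(f.name, len(docstrings_for(f)), "tests")
```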
#### HTML Template

Use this structure:

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Hive Integration Test Report — {timestamp}</title>
  <style>
    :root { --pass: #22c55e; --fail: #ef4444; --skip: #f59e0b; --bg: #0f172a; --surface: #1e293b; --text: #e2e8f0; --muted: #94a3b8; --border: #334155; }
    * { box-sizing: border-box; margin: 0; padding: 0; }
    body { font-family: 'SF Mono', 'Fira Code', monospace; background: var(--bg); color: var(--text); padding: 2rem; line-height: 1.6; }
    h1, h2, h3 { font-weight: 600; }
    h1 { font-size: 1.5rem; margin-bottom: 1rem; }
    h2 { font-size: 1.2rem; margin: 2rem 0 1rem; border-bottom: 1px solid var(--border); padding-bottom: 0.5rem; }
    .summary { display: grid; grid-template-columns: repeat(auto-fit, minmax(150px, 1fr)); gap: 1rem; margin-bottom: 2rem; }
    .card { background: var(--surface); padding: 1rem; border-radius: 8px; border: 1px solid var(--border); }
    .card .label { color: var(--muted); font-size: 0.75rem; text-transform: uppercase; }
    .card .value { font-size: 1.5rem; font-weight: 700; margin-top: 0.25rem; }
    .card .value.pass { color: var(--pass); }
    .card .value.fail { color: var(--fail); }
    table { width: 100%; border-collapse: collapse; font-size: 0.8rem; }
    th { background: var(--surface); position: sticky; top: 0; text-align: left; padding: 0.5rem; border-bottom: 2px solid var(--border); color: var(--muted); text-transform: uppercase; font-size: 0.7rem; }
    td { padding: 0.5rem; border-bottom: 1px solid var(--border); vertical-align: top; }
    tr:hover { background: rgba(255,255,255,0.03); }
    .badge { display: inline-block; padding: 2px 8px; border-radius: 4px; font-size: 0.7rem; font-weight: 700; }
    .badge.pass { background: rgba(34,197,94,0.2); color: var(--pass); }
    .badge.fail { background: rgba(239,68,68,0.2); color: var(--fail); }
    .badge.skip { background: rgba(245,158,11,0.2); color: var(--skip); }
    .detail { background: #1a1a2e; padding: 0.75rem; border-radius: 4px; margin-top: 0.5rem; font-size: 0.75rem; white-space: pre-wrap; overflow-x: auto; max-height: 200px; overflow-y: auto; }
    .component-header { background: var(--surface); padding: 0.75rem 0.5rem; font-weight: 600; font-size: 0.85rem; }
    .meta { color: var(--muted); font-size: 0.75rem; }
  </style>
</head>
<body>
  <h1>Hive Integration Test Report</h1>
  <p class="meta">Generated: {timestamp} | Provider: {provider} | Duration: {duration}s</p>

  <div class="summary">
    <div class="card"><div class="label">Total</div><div class="value">{total}</div></div>
    <div class="card"><div class="label">Passed</div><div class="value pass">{passed}</div></div>
    <div class="card"><div class="label">Failed</div><div class="value fail">{failed}</div></div>
    <div class="card"><div class="label">Verdict</div><div class="value {verdict_class}">{verdict}</div></div>
  </div>

  <h2>Test Results</h2>
  <table>
    <thead>
      <tr>
        <th>Component</th>
        <th>Test</th>
        <th>Status</th>
        <th>Time</th>
        <th>What</th>
        <th>Input → Expected → Actual</th>
      </tr>
    </thead>
    <tbody>
      <!-- For each test: -->
      <tr>
        <td>{component}</td>
        <td>{test_name}</td>
        <td><span class="badge {status_class}">{status}</span></td>
        <td>{duration}s</td>
        <td>{what_description}</td>
        <td>
          <strong>Input:</strong> {input_description}<br>
          <strong>Expected:</strong> {expected_outcome}<br>
          <strong>Actual:</strong> {actual_outcome}
          <!-- If failed: -->
          <div class="detail">{failure_traceback}</div>
        </td>
      </tr>
    </tbody>
  </table>

  <h2>Failure Analysis</h2>
  <!-- Only if there are failures -->
  <p>For each failure, provide:</p>
  <ul>
    <li><strong>Root cause:</strong> Why it failed</li>
    <li><strong>Impact:</strong> What this means for the system</li>
    <li><strong>Suggested fix:</strong> How to address it</li>
  </ul>

</body>
</html>
```

### Step 4: Output

1. Write the HTML file to `/tmp/hive_integration_test_report.html`
2. Print the file path so the user can open it
3. Print a concise summary to the terminal:
```
Test Report: /tmp/hive_integration_test_report.html
Result: 74/76 PASSED (2 failures)
Failures:
- parallel_merge::test_parallel_disjoint_output_keys
- worker::test_worker_timestamped_note_artifact
```

## Key Rules

1. ALWAYS use `--junitxml` when running pytest to get structured results
2. ALWAYS read the test source files to populate What/How/Why columns — do not guess
3. For Input/Expected/Actual, extract from the test's graph spec, assertions, and result
4. Color-code everything: green for pass, red for fail, amber for skip
5. Include the full traceback for failures in a scrollable `<div class="detail">`
6. Group tests by component (file name) with a visual separator
7. The report must be self-contained HTML (no external CSS/JS dependencies)
@@ -0,0 +1,78 @@
name: Standard Bounty
description: A bounty task for general framework contributions (not integration-specific)
title: "[Bounty]: "
labels: []
body:
  - type: markdown
    attributes:
      value: |
        ## Standard Bounty

        This issue is part of the [Bounty Program](../../docs/bounty-program/README.md).
        **Claim this bounty** by commenting below — a maintainer will assign you within 24 hours.

  - type: dropdown
    id: bounty-size
    attributes:
      label: Bounty Size
      options:
        - "Small (10 pts)"
        - "Medium (30 pts)"
        - "Large (75 pts)"
        - "Extreme (150 pts)"
    validations:
      required: true

  - type: dropdown
    id: difficulty
    attributes:
      label: Difficulty
      options:
        - Easy
        - Medium
        - Hard
    validations:
      required: true

  - type: textarea
    id: description
    attributes:
      label: Description
      description: What needs to be done to complete this bounty.
      placeholder: |
        Describe the specific task, including:
        - What the contributor needs to do
        - Links to relevant files in the repo
        - Any context or motivation for the change
    validations:
      required: true

  - type: textarea
    id: acceptance-criteria
    attributes:
      label: Acceptance Criteria
      description: What "done" looks like. The PR must meet all criteria.
      placeholder: |
        - [ ] Criterion 1
        - [ ] Criterion 2
        - [ ] CI passes
    validations:
      required: true

  - type: textarea
    id: relevant-files
    attributes:
      label: Relevant Files
      description: Links to files or directories related to this bounty.
      placeholder: |
        - `path/to/file.py`
        - `path/to/directory/`

  - type: textarea
    id: resources
    attributes:
      label: Resources
      description: Links to docs, issues, or external references that will help.
      placeholder: |
        - Related issue: #XXXX
        - Docs: https://...
@@ -2,14 +2,22 @@ name: Bounty completed
 description: Awards points and notifies Discord when a bounty PR is merged

 on:
-  pull_request:
+  pull_request_target:
     types: [closed]

+  workflow_dispatch:
+    inputs:
+      pr_number:
+        description: "PR number to process (for missed bounties)"
+        required: true
+        type: number

 jobs:
   bounty-notify:
     if: >
-      github.event.pull_request.merged == true &&
-      contains(join(github.event.pull_request.labels.*.name, ','), 'bounty:')
+      github.event_name == 'workflow_dispatch' ||
+      (github.event.pull_request.merged == true &&
+      contains(join(github.event.pull_request.labels.*.name, ','), 'bounty:'))
     runs-on: ubuntu-latest
     timeout-minutes: 5
     permissions:
@@ -32,6 +40,8 @@ jobs:
         GITHUB_REPOSITORY_OWNER: ${{ github.repository_owner }}
         GITHUB_REPOSITORY_NAME: ${{ github.event.repository.name }}
         DISCORD_WEBHOOK_URL: ${{ secrets.DISCORD_BOUNTY_WEBHOOK_URL }}
+        BOT_API_URL: ${{ secrets.BOT_API_URL }}
+        BOT_API_KEY: ${{ secrets.BOT_API_KEY }}
         LURKR_API_KEY: ${{ secrets.LURKR_API_KEY }}
         LURKR_GUILD_ID: ${{ secrets.LURKR_GUILD_ID }}
-        PR_NUMBER: ${{ github.event.pull_request.number }}
+        PR_NUMBER: ${{ inputs.pr_number || github.event.pull_request.number }}
@@ -63,7 +63,7 @@ jobs:
       working-directory: core
       run: |
         uv sync
-        uv run pytest tests/ -v
+        uv run pytest tests/ -v --ignore=tests/dummy_agents

   test-tools:
     name: Test Tools (${{ matrix.os }})
@@ -1,126 +0,0 @@
name: Link Discord account
description: Auto-creates a PR to add contributor to contributors.yml when a link-discord issue is opened

on:
  issues:
    types: [opened]

jobs:
  link-discord:
    if: contains(github.event.issue.labels.*.name, 'link-discord')
    runs-on: ubuntu-latest
    timeout-minutes: 2
    permissions:
      contents: write
      issues: write
      pull-requests: write

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Parse issue and update contributors.yml
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');

            const issue = context.payload.issue;
            const githubUsername = issue.user.login;

            // Parse the issue body for form fields
            const body = issue.body || '';

            // Extract Discord ID — look for the numeric value after the "Discord User ID" heading
            const discordMatch = body.match(/### Discord User ID\s*\n\s*(\d{17,20})/);
            if (!discordMatch) {
              await github.rest.issues.createComment({
                ...context.repo,
                issue_number: issue.number,
                body: `Could not find a valid Discord ID in the issue body. Please make sure you entered a numeric ID (17-20 digits), not a username.\n\nExample: \`123456789012345678\``
              });
              await github.rest.issues.update({
                ...context.repo,
                issue_number: issue.number,
                state: 'closed',
                state_reason: 'not_planned'
              });
              return;
            }
            const discordId = discordMatch[1];

            // Extract display name (optional)
            const nameMatch = body.match(/### Display Name \(optional\)\s*\n\s*(.+)/);
            const displayName = nameMatch ? nameMatch[1].trim() : '';

            // Check if user already exists
            const yml = fs.readFileSync('contributors.yml', 'utf-8');
            if (yml.includes(`github: ${githubUsername}`)) {
              await github.rest.issues.createComment({
                ...context.repo,
                issue_number: issue.number,
                body: `@${githubUsername} is already in \`contributors.yml\`. If you need to update your Discord ID, please edit the file directly via PR.`
              });
              await github.rest.issues.update({
                ...context.repo,
                issue_number: issue.number,
                state: 'closed',
                state_reason: 'completed'
              });
              return;
            }

            // Append entry to contributors.yml (indentation must match the YAML list nesting)
            let entry = `  - github: ${githubUsername}\n    discord: "${discordId}"`;
            if (displayName && displayName !== '_No response_') {
              entry += `\n    name: ${displayName}`;
            }
            entry += '\n';

            const updated = yml.trimEnd() + '\n' + entry;
            fs.writeFileSync('contributors.yml', updated);

            // Set outputs for commit step
            core.exportVariable('GITHUB_USERNAME', githubUsername);
            core.exportVariable('DISCORD_ID', discordId);
            core.exportVariable('ISSUE_NUMBER', issue.number.toString());

      - name: Create PR
        run: |
          # Check if there are changes
          if git diff --quiet contributors.yml; then
            echo "No changes to contributors.yml"
            exit 0
          fi

          BRANCH="docs/link-discord-${GITHUB_USERNAME}"
          git config user.name "github-actions[bot]"
          git config user.email "41898282+github-actions[bot]@users.noreply.github.com"
          git checkout -b "$BRANCH"
          git add contributors.yml
          git commit -m "docs: link @${GITHUB_USERNAME} to Discord"
          git push origin "$BRANCH"

          gh pr create \
            --title "docs: link @${GITHUB_USERNAME} to Discord" \
            --body "Adds @${GITHUB_USERNAME} (Discord \`${DISCORD_ID}\`) to \`contributors.yml\` for bounty XP tracking.

          Closes #${ISSUE_NUMBER}" \
            --base main \
            --head "$BRANCH" \
            --label "link-discord"
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}

      - name: Notify on issue
        uses: actions/github-script@v7
        with:
          script: |
            const username = process.env.GITHUB_USERNAME;
            const issueNumber = parseInt(process.env.ISSUE_NUMBER);

            await github.rest.issues.createComment({
              ...context.repo,
              issue_number: issueNumber,
              body: `A PR has been created to link your account. A maintainer will merge it shortly — once merged, you'll receive XP and Discord pings when your bounty PRs are merged.`
            });
@@ -35,6 +35,8 @@ jobs:
|
||||
GITHUB_REPOSITORY_OWNER: ${{ github.repository_owner }}
|
||||
GITHUB_REPOSITORY_NAME: ${{ github.event.repository.name }}
|
||||
DISCORD_WEBHOOK_URL: ${{ secrets.DISCORD_BOUNTY_WEBHOOK_URL }}
|
||||
BOT_API_URL: ${{ secrets.BOT_API_URL }}
|
||||
BOT_API_KEY: ${{ secrets.BOT_API_KEY }}
|
||||
LURKR_API_KEY: ${{ secrets.LURKR_API_KEY }}
|
||||
LURKR_GUILD_ID: ${{ secrets.LURKR_GUILD_ID }}
|
||||
SINCE_DATE: ${{ github.event.inputs.since_date || '' }}
|
||||
|

+4 −3
@@ -13,6 +13,10 @@ out/
.env
.env.local
.env.*.local
.venv
/venv
tools/src/uv.lock

# User configuration (copied from .example)
config.yaml
@@ -68,9 +72,6 @@ temp/
exports/*

.claude/settings.local.json
.claude/skills/ship-it/

docs/github-issues/*
core/tests/*dumps/*

@@ -0,0 +1,9 @@
{"type": "connection", "event": "connect", "ts": "2026-04-04T01:10:38.245667+00:00", "profile": "default"}
{"type": "connection", "event": "hello", "details": {"version": "1.0"}, "ts": "2026-04-04T01:10:38.247207+00:00", "profile": "default"}
{"type": "connection", "event": "disconnect", "ts": "2026-04-04T01:11:57.148273+00:00", "profile": "default"}
{"type": "connection", "event": "connect", "ts": "2026-04-04T01:12:09.162378+00:00", "profile": "default"}
{"type": "connection", "event": "hello", "details": {"version": "1.0"}, "ts": "2026-04-04T01:12:09.163899+00:00", "profile": "default"}
{"type": "connection", "event": "disconnect", "ts": "2026-04-04T01:15:12.826042+00:00", "profile": "default"}
{"type": "connection", "event": "connect", "ts": "2026-04-04T01:15:30.842533+00:00", "profile": "default"}
{"type": "connection", "event": "hello", "details": {"version": "1.0"}, "ts": "2026-04-04T01:15:30.845025+00:00", "profile": "default"}
{"type": "tool_call", "tool": "browser_stop", "params": {"profile": "gcu-browser-worker:3"}, "result": {"ok": true, "status": "not_running", "profile": "gcu-browser-worker:3"}, "ok": true, "duration_ms": 0.01, "ts": "2026-04-04T01:29:04.294954+00:00", "profile": "default"}

+150 −27
@@ -1,17 +1,149 @@
# Release Notes

## v0.7.1

**Release Date:** March 13, 2026
**Tag:** v0.7.1

### Chrome-Native Browser Control

v0.7.1 replaces Playwright with direct Chrome DevTools Protocol (CDP) integration. The GCU now launches the user's system Chrome via `open -n` on macOS, connects over CDP, and manages the browser lifecycle end-to-end -- no extra browser binary required.

---

### Highlights

#### System Chrome via CDP

The entire GCU browser stack has been rewritten (a launch-and-connect sketch follows the list):

- **Chrome finder & launcher** -- New `chrome_finder.py` discovers installed Chrome, and `chrome_launcher.py` manages the process lifecycle with `--remote-debugging-port`
- **Coexists with the user's browser** -- `open -n` on macOS launches a separate Chrome instance so the user's tabs stay untouched
- **Dynamic viewport sizing** -- The viewport auto-sizes to the available display area, suppressing Chrome warning bars
- **Orphan cleanup** -- Chrome processes are killed on GCU server shutdown to prevent leaks
- **`--no-startup-window`** -- Chrome launches headlessly by default until a page is needed
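
For readers new to CDP, here is a minimal sketch of the launch-and-connect handshake. It is illustrative only: the binary path, port, and profile directory are assumptions, and the shipped `chrome_launcher.py` uses `open -n` on macOS rather than a direct `Popen`.

```python
import json
import subprocess
import time
import urllib.request

# Assumed macOS binary path; chrome_finder.py resolves this per platform.
CHROME = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
PORT = 9222

# Launch an isolated instance: dedicated profile, DevTools port, no startup window.
subprocess.Popen([
    CHROME,
    f"--remote-debugging-port={PORT}",
    "--user-data-dir=/tmp/gcu-profile",  # separate profile keeps the user's tabs untouched
    "--no-startup-window",
])

# Poll the DevTools JSON endpoint until Chrome is listening.
info = None
for _ in range(50):
    try:
        with urllib.request.urlopen(f"http://localhost:{PORT}/json/version") as resp:
            info = json.loads(resp.read())
        break
    except OSError:
        time.sleep(0.2)

if info:
    # CDP clients attach to this WebSocket URL.
    print(info["webSocketDebuggerUrl"])
```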

#### Per-Subagent Browser Isolation

Each GCU subagent gets its own Chrome user-data directory, preventing cookie/session cross-contamination (sketched below):

- Unique browser profiles injected per subagent
- Profiles cleaned up after top-level GCU node execution
- Tab origin and age metadata tracked per subagent
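
What per-subagent isolation amounts to, as a minimal sketch (directory naming and cleanup policy are assumptions, not the shipped implementation):

```python
import shutil
import tempfile
from pathlib import Path

_profiles: dict[str, Path] = {}

def profile_for(subagent_id: str) -> Path:
    """Create (or reuse) an isolated Chrome user-data dir for one subagent."""
    if subagent_id not in _profiles:
        _profiles[subagent_id] = Path(tempfile.mkdtemp(prefix=f"gcu-{subagent_id}-"))
    return _profiles[subagent_id]

def cleanup_profiles() -> None:
    """Remove all per-subagent profiles after the top-level GCU node finishes."""
    for path in _profiles.values():
        shutil.rmtree(path, ignore_errors=True)
    _profiles.clear()
```

Each subagent's Chrome is then launched with `--user-data-dir` pointing at its own directory, so cookies and sessions never leak across subagents.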

#### Dummy Agent Testing Framework

A comprehensive test suite for validating agent graph patterns without LLM calls:

- 8 test modules covering echo, pipeline, branch, parallel merge, retry, feedback loop, worker, and GCU subagent patterns
- Shared fixtures and a `run_all.py` runner for CI integration
- Subagent lifecycle tests

---

### What's New

#### GCU Browser

- **Switch from Playwright to system Chrome via CDP** -- Direct CDP connection replaces the Playwright dependency. (@bryanadenhq)
- **Chrome finder and launcher modules** -- `chrome_finder.py` and `chrome_launcher.py` for cross-platform Chrome discovery and process management. (@bryanadenhq)
- **Dynamic viewport sizing** -- Auto-size the viewport and suppress the Chrome warning bar. (@bryanadenhq)
- **Per-subagent browser profile isolation** -- Unique user-data directories per subagent, with cleanup. (@bryanadenhq)
- **Tab origin/age metadata** -- Track which subagent opened each tab and when. (@bryanadenhq)
- **`browser_close_all` tool** -- Bulk tab cleanup for agents managing many pages. (@bryanadenhq)
- **Auto-track popup pages** -- Popups are automatically captured and tracked. (@bryanadenhq)
- **Auto-snapshot from browser interactions** -- Browser interaction tools return screenshots automatically. (@bryanadenhq)
- **Kill orphaned Chrome processes** -- GCU server shutdown cleans up lingering Chrome instances. (@bryanadenhq)
- **`--no-startup-window` Chrome flag** -- Prevent an empty window on launch. (@bryanadenhq)
- **Launch Chrome via `open -n` on macOS** -- Coexist with the user's running browser. (@bryanadenhq)

#### Framework & Runtime

- **Session resume fix for new agents** -- Correctly resume sessions when a new agent is loaded. (@bryanadenhq)
- **Queen upsert fix** -- Prevent duplicate queen entries on session restore. (@bryanadenhq)
- **Anchor worker monitoring to the queen's session ID on cold-restore** -- Worker monitors reconnect to the correct queen after restart. (@bryanadenhq)
- **Update meta.json when loading workers** -- Worker metadata stays in sync with runtime state. (@RichardTang-Aden)
- **Generate worker MCP files correctly** -- Fix MCP config generation for spawned workers. (@RichardTang-Aden)
- **Share the event bus so tool events are visible to the parent** -- Tool execution events propagate up to parent graphs. (@bryanadenhq)
- **Subagent activity tracking in queen status** -- Queen instructions include live subagent status. (@bryanadenhq)
- **GCU system prompt updates** -- Auto-snapshots, batching, popup tracking, and `close_all` guidance. (@bryanadenhq)

#### Frontend

- **Loading spinner in draft panel** -- Shows a spinner during the planning phase instead of a blank panel. (@bryanadenhq)
- **Fix credential modal errors** -- The modal no longer swallows errors; the banner stays visible. (@bryanadenhq)
- **Fix `credentials_required` loop** -- Stop clearing the flag on modal close to prevent infinite re-prompting. (@bryanadenhq)
- **Fix "Add tab" dropdown overflow** -- The dropdown is no longer hidden when many agents are open. (@prasoonmhwr)

#### Testing

- **Dummy agent test framework** -- 8 test modules (echo, pipeline, branch, parallel merge, retry, feedback loop, worker, GCU subagent) with shared fixtures and a CI runner. (@bryanadenhq)
- **Subagent lifecycle tests** -- Validate subagent spawn and completion flows. (@bryanadenhq)

#### Documentation & Infrastructure

- **MCP integration PRD** -- Product requirements for the MCP server registry. (@TimothyZhang7)
- **Skills registry PRD** -- Product requirements for the skill registry system. (@bryanadenhq)
- **Bounty program updates** -- Standard bounty issue template and an updated contributor guide. (@bryanadenhq)
- **Windows quickstart** -- Add a default context limit for PowerShell setup. (@bryanadenhq)
- **Remove deprecated files** -- Clean up `setup_mcp.py`, `verify_mcp.py`, `antigravity-setup.md`, and `setup-antigravity-mcp.sh`. (@bryanadenhq)

---

### Bug Fixes

- Fix credential modal eating errors and the banner staying open
- Stop clearing `credentials_required` on modal close to prevent an infinite loop
- Share the event bus so tool events are visible to the parent graph
- Use lazy %-formatting in the subagent completion log to avoid f-strings in logger calls
- Anchor worker monitoring to the queen's session ID on cold-restore
- Update meta.json when loading workers
- Generate worker MCP files correctly
- Fix "Add tab" dropdown partially hidden when creating multiple agents

---

### Community Contributors

- **Prasoon Mahawar** (@prasoonmhwr) -- Fix UI overflow on the agent tab dropdown
- **Richard Tang** (@RichardTang-Aden) -- Worker MCP generation and meta.json fixes

---

### Upgrading

```bash
git pull origin main
uv sync
```

The Playwright dependency is no longer required for GCU browser operations. Chrome must be installed on the host system.

---

## v0.7.0

**Release Date:** March 5, 2026
**Tag:** v0.7.0

Session management refactor release.

---

## v0.5.1

**Release Date:** February 18, 2026
**Tag:** v0.5.1

### The Hive Gets a Brain

v0.5.1 is our most ambitious release yet. Hive agents can now **build other agents** -- the new Hive Coder meta-agent writes, tests, and fixes agent packages from natural language. The runtime gains multi-graph support, so one session can orchestrate multiple agents simultaneously. The TUI gets a complete overhaul with an in-app agent picker, live streaming, and seamless escalation to the Coder. And we're now provider-agnostic: Claude Code subscriptions, OpenAI-compatible endpoints, and any LiteLLM-supported model work out of the box.

---

### Highlights

#### Hive Coder -- The Agent That Builds Agents

A native meta-agent that lives inside the framework at `core/framework/agents/hive_coder/`. Give it a natural-language specification and it produces a complete agent package -- goal definition, node prompts, edge routing, MCP tool wiring, tests, and all boilerplate files.

@@ -30,7 +162,7 @@ The Coder ships with:
- **Coder Tools MCP server** -- file I/O, fuzzy-match editing, git snapshots, and sandboxed shell execution (`tools/coder_tools_server.py`)
- **Test generation** -- structural tests for forever-alive agents that don't hang on `runner.run()`

#### Multi-Graph Agent Runtime

`AgentRuntime` now supports loading, managing, and switching between multiple agent graphs within a single session. Six new lifecycle tools give agents (and the TUI) full control:

@@ -44,7 +176,7 @@ await runtime.add_graph("exports/deep_research_agent")

The Hive Coder uses multi-graph internally -- when you escalate from a worker agent, the Coder loads as a separate graph while the worker stays alive in the background.
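
A minimal usage sketch (the `add_graph` call matches the diff context above; the import path, constructor, and `remove_graph` usage are assumptions):

```python
import asyncio

from framework.runtime import AgentRuntime  # import path assumed


async def main() -> None:
    runtime = AgentRuntime()  # constructor arguments omitted

    # Load two agent graphs into the same session.
    await runtime.add_graph("exports/deep_research_agent")
    await runtime.add_graph("exports/email_inbox_agent")  # hypothetical second agent

    # ...run, switch between graphs, escalate to the Coder...

    # Tear one graph down without ending the session.
    await runtime.remove_graph("exports/email_inbox_agent")


asyncio.run(main())
```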

#### TUI Revamp

The Terminal UI gets a ground-up rebuild with five major additions:

@@ -54,7 +186,7 @@ The Terminal UI gets a ground-up rebuild with five major additions:
- **PDF attachments** -- `/attach` and `/detach` commands with a native OS file dialog (macOS, Linux, Windows)
- **Multi-graph commands** -- `/graphs`, `/graph <id>`, `/load <path>`, `/unload <id>` for managing agent graphs in-session

#### Provider-Agnostic LLM Support

Hive is no longer Anthropic-only. v0.5.1 adds first-class support for:

@@ -66,9 +198,9 @@ The quickstart script auto-detects Claude Code subscriptions and ZAI Code instal

---

### What's New

#### Architecture & Runtime

- **Hive Coder meta-agent** -- Natural-language agent builder with reference docs, a guardian watchdog, and the `hive code` CLI command. (@TimothyZhang7)
- **Multi-graph agent sessions** -- `add_graph`/`remove_graph` on AgentRuntime with 6 lifecycle tools (`load_agent`, `unload_agent`, `start_agent`, `restart_agent`, `list_agents`, `get_user_presence`). (@TimothyZhang7)
@@ -79,7 +211,7 @@ The quickstart script auto-detects Claude Code subscriptions and ZAI Code instal
- **Pre-start confirmation prompt** -- Interactive prompt before agent execution, allowing credential updates or abort. (@RichardTang-Aden)
- **Event bus multi-graph support** -- `graph_id` on events, `filter_graph` on subscriptions, `ESCALATION_REQUESTED` event type, `exclude_own_graph` filter. (@TimothyZhang7)

#### TUI Improvements

- **In-app agent picker** (Ctrl+A) -- Tabbed modal for browsing agents with metadata badges (nodes, tools, sessions, tags). (@TimothyZhang7)
- **Runtime-optional TUI startup** -- Launches without a pre-loaded agent and shows the agent picker on startup. (@TimothyZhang7)
@@ -89,7 +221,7 @@ The quickstart script auto-detects Claude Code subscriptions and ZAI Code instal
- **Multi-graph TUI commands** -- `/graphs`, `/graph <id>`, `/load <path>`, `/unload <id>`. (@TimothyZhang7)
- **Agent Guardian watchdog** -- Event-driven monitor that catches secondary agent failures and triggers automatic remediation, with a `--no-guardian` CLI flag. (@TimothyZhang7)

#### New Tool Integrations

| Tool | Description | Contributor |
| --- | --- | --- |
@@ -99,7 +231,7 @@ The quickstart script auto-detects Claude Code subscriptions and ZAI Code instal
| **Google Docs** | Document creation, reading, and editing with OAuth credential support | @haliaeetusvocifer |
| **Gmail enhancements** | Expanded mail operations for inbox management | @bryanadenhq |

#### Infrastructure

- **Default node type → `event_loop`** -- `NodeSpec.node_type` defaults to `"event_loop"` instead of `"llm_tool_use"`. (@TimothyZhang7)
- **Default `max_node_visits` → 0 (unlimited)** -- Nodes default to unlimited visits, reducing friction for feedback loops and forever-alive agents. (@TimothyZhang7)
@@ -112,7 +244,7 @@ The quickstart script auto-detects Claude Code subscriptions and ZAI Code instal

---

### Bug Fixes

- Flush WIP accumulator outputs on cancel/failure so edge conditions see correct values on resume
- Stall detection state preserved across resume (no more resets on checkpoint restore)
@@ -125,13 +257,13 @@ The quickstart script auto-detects Claude Code subscriptions and ZAI Code instal
- Fix email agent version conflicts (@RichardTang-Aden)
- Fix coder tool timeouts (120s for tests, 300s cap for commands)

### Documentation

- Clarify installation and prevent root pip install misuse (@paarths-collab)

---

### Agent Updates

- **Email Inbox Management** -- Consolidate `gmail_inbox_guardian` and `inbox_management` into a single unified agent with updated prompts and config. (@RichardTang-Aden, @bryanadenhq)
- **Job Hunter** -- Updated node prompts, config, and agent metadata; added PDF resume selection. (@bryanadenhq)
@@ -141,7 +273,7 @@ The quickstart script auto-detects Claude Code subscriptions and ZAI Code instal

---

### Breaking Changes

- **Deprecated node types raise `RuntimeError`** -- `llm_tool_use`, `llm_generate`, `function`, `router`, `human_input` now fail instead of warning. Migrate to `event_loop`.
- **`NodeSpec.node_type` defaults to `"event_loop"`** (was `"llm_tool_use"`)
@@ -150,7 +282,7 @@ The quickstart script auto-detects Claude Code subscriptions and ZAI Code instal

---

### Community Contributors

A huge thank you to everyone who contributed to this release:

@@ -165,14 +297,14 @@ A huge thank you to everyone who contributed to this release:

---

### Upgrading

```bash
git pull origin main
uv sync
```

#### Migration Guide

If your agents use deprecated node types, update them as sketched below:
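
A minimal before/after sketch, assuming a `NodeSpec`-style definition as named under Breaking Changes (the import path and field set are illustrative):

```python
from framework.graph import NodeSpec  # import path assumed

# Before -- deprecated, now raises RuntimeError at load time:
#   NodeSpec(name="summarize", node_type="llm_tool_use")

# After -- "event_loop" is also the new default, so node_type can be omitted:
node = NodeSpec(
    name="summarize",
    node_type="event_loop",
)
```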

@@ -196,12 +328,3 @@ hive code
# Or from TUI -- press Ctrl+E to escalate
hive tui
```

---

## What's Next

- **Agent-to-agent communication** -- one agent's output triggers another agent's entry point
- **Cost visibility** -- detailed runtime log of LLM costs per node and per session
- **Persistent webhook subscriptions** -- survive agent restarts without re-registering
- **Remote agent deployment** -- run agents as long-lived services with HTTP APIs

+32 −10
@@ -4,7 +4,7 @@

Welcome to Aden Hive, an open-source AI agent framework built for developers who demand production-grade reliability, cross-platform support, and real-world performance. This guide will help you contribute effectively, whether you're fixing bugs, adding features, improving documentation, or building new tools.

Thank you for your interest in contributing! We're especially looking for help building tools, integrations ([check #2805](https://github.com/aden-hive/hive/issues/2805)), and example agents for the framework.

---

@@ -121,9 +121,15 @@ uv sync
6. Make your changes
7. Run checks and tests:
   ```bash
   make check   # Lint and format checks
   make test    # Core tests
   ```
   On Windows (no make), run directly:
   ```powershell
   uv run ruff check core/ tools/
   uv run ruff format --check core/ tools/
   uv run pytest core/tests/
   ```
8. Commit your changes following our commit conventions
9. Push to your fork and submit a Pull Request

@@ -222,8 +228,7 @@ else:  # linux
- **Node.js 18+** (optional, for frontend development)

> **Windows Users:**
> Native Windows is supported. Use `.\quickstart.ps1` for setup and `.\hive.ps1` to run (PowerShell 5.1+). Disable "App Execution Aliases" in Windows settings to avoid Python path conflicts. WSL is also an option but not required.

> **Tip:** Installing Claude Code skills is optional for running existing agents, but required if you plan to **build new agents**.

@@ -328,6 +333,22 @@ make test-live  # Run live API integration tests (requires credentials)
- **WebSocket** for real-time updates
- **Tailwind CSS** for styling

### Frontend Dev Workflow

> **Note:** `./quickstart.sh` handles the full setup, including the web UI.
> The commands below are for contributors iterating on the frontend code after
> initial setup is complete.

```bash
# Start the backend server
hive serve

# In a separate terminal, run the frontend dev server with hot-reload
cd core/frontend
npm install   # only needed after dependency changes
npm run dev
```

### Useful Development Commands

```bash
@@ -385,6 +406,8 @@ Aden Hive supports **100+ LLM providers** via LiteLLM, giving users maximum flex
|----------|--------|-------|
| **Anthropic** | Claude 3.5 Sonnet, Haiku, Opus | Default provider, best for reasoning |
| **OpenAI** | GPT-4, GPT-4 Turbo, GPT-4o | Function calling, vision |
| **OpenRouter** | Any OpenRouter catalog model | Uses `OPENROUTER_API_KEY` and `https://openrouter.ai/api/v1` |
| **Hive LLM** | `queen`, `kimi-2.5`, `GLM-5` | Uses `HIVE_API_KEY` and the Hive-managed endpoint |
| **Google** | Gemini 1.5 Pro, Flash | Long context windows |
| **DeepSeek** | DeepSeek V3 | Cost-effective, strong reasoning |
| **Mistral** | Mistral Large, Medium, Small | Open weights, EU hosting |
@@ -410,6 +433,10 @@ DEFAULT_MODEL = "claude-haiku-4-5-20251001"
- **Cost**: DeepSeek or Gemini Flash (budget-conscious)
- **Privacy**: Ollama with local models (no data leaves your server)

**Provider-Specific Notes**
- **OpenRouter**: store `provider` as `openrouter`, use the raw OpenRouter model ID in `model` (for example `x-ai/grok-4.20-beta`), and use `OPENROUTER_API_KEY` (see the sketch below)
- **Hive LLM**: store `provider` as `hive`, use Hive model names such as `queen`, `kimi-2.5`, or `GLM-5`, and use `HIVE_API_KEY`
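
As a concrete reference, a minimal sketch of calling an OpenRouter model through LiteLLM (LiteLLM routes `openrouter/`-prefixed model names using `OPENROUTER_API_KEY`; the key value and model availability are placeholders):

```python
import os

from litellm import completion

os.environ["OPENROUTER_API_KEY"] = "sk-or-..."  # placeholder key

response = completion(
    # LiteLLM convention: "openrouter/" prefix + the raw OpenRouter model ID
    model="openrouter/x-ai/grok-4.20-beta",
    messages=[{"role": "user", "content": "Say hello in one word."}],
)
print(response.choices[0].message.content)
```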

**For Development**
- Use cheaper/faster models (Haiku, GPT-4o-mini)
- Test with multiple providers to catch provider-specific issues

@@ -421,7 +448,7 @@ DEFAULT_MODEL = "claude-haiku-4-5-20251001"
2. **Add credential handling** in `core/framework/credentials/`
3. **Add provider-specific configuration** in `core/framework/llm/`
4. **Write tests** in `core/tests/test_llm_provider.py`
5. **Update documentation** in `README.md`, `docs/configuration.md`, and any setup guides that mention provider configuration

**Example: Testing LLM Integration**

@@ -592,11 +619,6 @@ from litellm import completion_cost
cost = completion_cost(model="claude-3-5-sonnet-20241022", messages=[...])
```

**Monitoring Dashboard** (`/core/framework/monitoring/`)
- WebSocket-based real-time monitoring
- Displays: active agents, tool calls, token usage, errors
- Access at: `http://localhost:8000/monitor`

### How to Add Performance Metrics

**1. Instrument your code**

@@ -1,27 +1,34 @@
.PHONY: lint format check test test-tools test-live test-all install-hooks help frontend-install frontend-dev frontend-build

# ── Ensure uv is findable in Git Bash on Windows ──────────────────────────────
# uv installs to ~/.local/bin on Windows/Linux/macOS. Git Bash may not include
# this in PATH by default, so we prepend it here.
export PATH := $(HOME)/.local/bin:$(PATH)

# ── Targets ───────────────────────────────────────────────────────────────────

help: ## Show this help
	@grep -E '^[a-zA-Z_-]+:.*?## .*$$' $(MAKEFILE_LIST) | \
		awk 'BEGIN {FS = ":.*?## "}; {printf "  \033[36m%-15s\033[0m %s\n", $$1, $$2}'

lint: ## Run ruff linter and formatter (with auto-fix)
	cd core && uv run ruff check --fix .
	cd tools && uv run ruff check --fix .
	cd core && uv run ruff format .
	cd tools && uv run ruff format .

format: ## Run ruff formatter
	cd core && uv run ruff format .
	cd tools && uv run ruff format .

check: ## Run all checks without modifying files (CI-safe)
	cd core && uv run ruff check .
	cd tools && uv run ruff check .
	cd core && uv run ruff format --check .
	cd tools && uv run ruff format --check .

test: ## Run all tests (core + tools, excludes live)
	cd core && uv run python -m pytest tests/ -v --ignore=tests/dummy_agents
	cd tools && uv run python -m pytest -v

test-tools: ## Run tool tests only (mocked, no credentials needed)
@@ -31,7 +38,7 @@ test-live: ## Run live integration tests (requires real API credentials)
	cd tools && uv run python -m pytest -m live -s -o "addopts=" --log-cli-level=INFO

test-all: ## Run everything including live tests
	cd core && uv run python -m pytest tests/ -v --ignore=tests/dummy_agents
	cd tools && uv run python -m pytest -v
	cd tools && uv run python -m pytest -m live -s -o "addopts=" --log-cli-level=INFO

@@ -46,4 +53,4 @@ frontend-dev: ## Start frontend dev server
	cd core/frontend && npm run dev

frontend-build: ## Build frontend for production
	cd core/frontend && npm run build
@@ -23,11 +23,12 @@
</p>

<p align="center">
  <img src="https://img.shields.io/badge/Agent_Harness-Runtime_Layer-ff6600?style=flat-square" alt="Agent Harness" />
  <img src="https://img.shields.io/badge/AI_Agents-Self--Improving-brightgreen?style=flat-square" alt="AI Agents" />
  <img src="https://img.shields.io/badge/Multi--Agent-Systems-blue?style=flat-square" alt="Multi-Agent" />
  <img src="https://img.shields.io/badge/Headless-Development-purple?style=flat-square" alt="Headless" />
  <img src="https://img.shields.io/badge/Human--in--the--Loop-orange?style=flat-square" alt="HITL" />
  <img src="https://img.shields.io/badge/Production--Ready-red?style=flat-square" alt="Production" />
  <img src="https://img.shields.io/badge/Browser-Use-red?style=flat-square" alt="Browser Use" />
</p>
<p align="center">
  <img src="https://img.shields.io/badge/OpenAI-supported-412991?style=flat-square&logo=openai" alt="OpenAI" />
@@ -35,37 +36,42 @@
  <img src="https://img.shields.io/badge/Google_Gemini-supported-4285F4?style=flat-square&logo=google" alt="Gemini" />
</p>

<p align="center"><em>The agent harness for production workloads — state management, failure recovery, observability, and human oversight so your agents actually run.</em></p>

## Overview

Hive is a runtime harness for AI agents in production. You describe your goal in natural language; a coding agent (the queen) generates the agent graph and connection code to achieve it. During execution, the harness manages state isolation, checkpoint-based crash recovery, cost enforcement, and real-time observability. When agents fail, the framework captures failure data, evolves the graph through the coding agent, and redeploys automatically. Built-in human-in-the-loop nodes, browser control, credential management, and parallel execution give you production reliability without sacrificing adaptability.

Visit [adenhq.com](https://adenhq.com) for complete documentation, examples, and guides.

Visit [HoneyComb](http://honeycomb.open-hive.com/) to see which jobs are being automated by AI. It’s a stock market for jobs, driven by our community’s AI agent progress: you can long and short jobs (with no real money, just compute tokens) based on how much you think a job is going to be replaced by AI.

https://github.com/user-attachments/assets/bf10edc3-06ba-48b6-98ba-d069b15fb69d

## Who Is Hive For?

Hive is the multi-agent harness layer for teams moving AI agents from prototype to production. Single agents like Openclaw and Cowork handle personal jobs well but lack the rigor to run business processes.

Hive is a good fit if you:

- Want AI agents that **execute real business processes**, not demos
- Need **fast, high-volume agent execution** over open-ended workflows
- Need a **runtime that handles state, recovery, and parallel execution** at scale
- Need **self-healing and adaptive agents** that improve over time
- Require **human-in-the-loop control**, observability, and cost limits
- Plan to run agents in **production** where uptime, cost, and auditability matter

Hive may not be the best fit if you’re only experimenting with simple agent chains or one-off scripts.

## When Should You Use Hive?

Use Hive when the bottleneck is no longer the model but the harness around it:

- Long-running agents that need **state persistence and crash recovery**
- Production workloads requiring **cost enforcement, observability, and audit trails**
- Agents that **self-heal** through failure capture and graph evolution
- Multi-agent coordination with **session isolation and shared buffers**
- A framework that **scales with model improvements** rather than fighting them

## Quick Links

@@ -73,7 +79,7 @@ Use Hive when you need:
- **[Self-Hosting Guide](https://docs.adenhq.com/getting-started/quickstart)** - Deploy Hive on your infrastructure
- **[Changelog](https://github.com/aden-hive/hive/releases)** - Latest updates and releases
- **[Roadmap](docs/roadmap.md)** - Upcoming features and plans
- **[Report Issues](https://github.com/aden-hive/hive/issues)** - Bug reports and feature requests
- **[Contributing](CONTRIBUTING.md)** - How to contribute and submit PRs

## Quick Start

@@ -84,7 +90,7 @@ Use Hive when you need:
- An LLM provider that powers the agents
- **ripgrep (optional, recommended on Windows):** The `search_files` tool uses ripgrep for faster file search. If it is not installed, a Python fallback is used. On Windows: `winget install BurntSushi.ripgrep` or `scoop install ripgrep`

> **Windows Users:** Native Windows is supported via `quickstart.ps1` and `hive.ps1`. Run these in PowerShell 5.1+. WSL is also an option but not required.

### Installation

@@ -98,9 +104,11 @@
git clone https://github.com/aden-hive/hive.git
cd hive

# Run quickstart setup (macOS/Linux)
./quickstart.sh

# Windows (PowerShell)
.\quickstart.ps1
```

This sets up:

@@ -108,18 +116,16 @@ This sets up:
- **framework** - Core agent runtime and graph executor (in `core/.venv`)
- **aden_tools** - MCP tools for agent capabilities (in `tools/.venv`)
- **credential store** - Encrypted API key storage (`~/.hive/credentials`)
- **LLM provider** - Interactive default model configuration, including Hive LLM and OpenRouter
- All required Python dependencies with `uv`

Finally, it opens the Hive interface in your browser.

> **Tip:** To reopen the dashboard later, run `hive open` from the project directory.

<img width="2500" height="1214" alt="home-screen" src="https://github.com/user-attachments/assets/134d897f-5e75-4874-b00b-e0505f6b45c4" />

### Build Your First Agent

Type the agent you want to build in the home input box. The queen will ask you questions and work out a solution with you.

<img width="2500" height="1214" alt="Image" src="https://github.com/user-attachments/assets/1ce19141-a78b-46f5-8d64-dbf987e048f4" />

@@ -131,7 +137,7 @@ Click "Try a sample agent" and check the templates. You can run a template direc

Now you can run an agent by selecting it (an existing agent or an example agent). Click the Run button on the top left, or talk to the queen agent and it can run the agent for you.

<img width="2549" height="1174" alt="Screenshot 2026-03-12 at 9 27 36 PM" src="https://github.com/user-attachments/assets/7c7d30fa-9ceb-4c23-95af-b1caa405547d" />

## Features

@@ -140,22 +146,21 @@ Now you can run an agent by selecting the agent (either an existing agent or exa
- **[Goal-Driven Generation](docs/key_concepts/goals_outcome.md)** - Define objectives in natural language; the coding agent generates the agent graph and connection code to achieve them
- **[Adaptiveness](docs/key_concepts/evolution.md)** - The framework captures failures, calibrates against the objectives, and evolves the agent graph
- **[Dynamic Node Connections](docs/key_concepts/graph.md)** - No predefined edges; connection code is generated by any capable LLM based on your goals
- **SDK-Wrapped Nodes** - Every node gets a shared data buffer, local RLM memory, monitoring, tools, and LLM access out of the box
- **[Human-in-the-Loop](docs/key_concepts/graph.md#human-in-the-loop)** - Intervention nodes that pause execution for human input, with configurable timeouts and escalation
- **Real-time Observability** - WebSocket streaming for live monitoring of agent execution, decisions, and node-to-node communication
- **Production-Ready** - Self-hostable, built for scale and reliability

## Integration

<a href="https://github.com/aden-hive/hive/tree/main/tools/src/aden_tools/tools"><img width="100%" alt="Integration" src="https://github.com/user-attachments/assets/a1573f93-cf02-4bb8-b3d5-b305b05b1e51" /></a>
Hive is built to be model-agnostic and system-agnostic.

- **LLM flexibility** - Hive Framework supports Anthropic, OpenAI, OpenRouter, Hive LLM, and other hosted or local models through LiteLLM-compatible providers.
- **Business system connectivity** - Hive Framework is designed to connect to all kinds of business systems as tools, such as CRM, support, messaging, data, file, and internal APIs via MCP.

## Why Hive

As models improve, the upper bound of what agents can do rises — but their reliability and production value are determined by the harness. Hive focuses on generating agents that run real business processes rather than generic agents. Instead of requiring you to manually design workflows, define agent interactions, and handle failures reactively, Hive flips the paradigm: **you describe outcomes, and the system builds itself**—delivering an outcome-driven, adaptive experience with an easy-to-use set of tools and integrations.

```mermaid
flowchart LR
@@ -189,17 +194,6 @@ flowchart LR
    style V6 fill:#fff,stroke:#ed8c00,stroke-width:1px,color:#cc5d00
```

### The Hive Advantage

| Traditional Frameworks | Hive |
| -------------------------- | -------------------------------------- |
| Hardcode agent workflows | Describe goals in natural language |
| Manual graph definition | Auto-generated agent graphs |
| Reactive error handling | Outcome-evaluation and adaptiveness |
| Static tool configurations | Dynamic SDK-wrapped nodes |
| Separate monitoring setup | Built-in real-time observability |
| DIY budget management | Integrated cost controls & degradation |

### How It Works

1. **[Define Your Goal](docs/key_concepts/goals_outcome.md)** → Describe what you want to achieve in plain English
@@ -378,7 +372,7 @@ This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENS

**Q: What LLM providers does Hive support?**

Hive supports 100+ LLM providers through LiteLLM integration, including OpenAI (GPT-4, GPT-4o), Anthropic (Claude models), Google Gemini, DeepSeek, Mistral, Groq, OpenRouter, and Hive LLM. Simply set the appropriate API key environment variable and specify the model name. See [docs/configuration.md](docs/configuration.md) for provider-specific configuration examples.

**Q: Can I use Hive with local AI models like Ollama?**

@@ -386,16 +380,12 @@ Yes! Hive supports local models through LiteLLM. Simply use the model name forma

**Q: What makes Hive different from other agent frameworks?**

Hive is an agent harness, not just an orchestration framework. It provides the production runtime layer — session isolation, checkpoint-based crash recovery, cost enforcement, real-time observability, and human-in-the-loop controls — that makes agents reliable enough to run real workloads. On top of that, Hive generates your entire agent system from natural language goals and automatically [evolves the graph](docs/key_concepts/evolution.md) when agents fail. The combination of a robust harness with self-improving generation is what sets Hive apart.

**Q: Is Hive open-source?**

Yes, Hive is fully open-source under the Apache License 2.0. We actively encourage community contributions and collaboration.

**Q: Can Hive handle complex, production-scale use cases?**

Yes. Hive is explicitly designed for production environments, with automatic failure recovery, real-time observability, cost controls, and horizontal scaling support. The framework handles both simple automations and complex multi-agent workflows.

**Q: Does Hive support human-in-the-loop workflows?**

Yes, Hive fully supports [human-in-the-loop](docs/key_concepts/graph.md#human-in-the-loop) workflows through intervention nodes that pause execution for human input. These include configurable timeouts and escalation policies, allowing seamless collaboration between human experts and AI agents.

@@ -420,6 +410,16 @@ Visit [docs.adenhq.com](https://docs.adenhq.com/) for complete guides, API refer

Contributions are welcome! Fork the repository, create your feature branch, implement your changes, and submit a pull request. See [CONTRIBUTING.md](CONTRIBUTING.md) for detailed guidelines.

## Star History

<a href="https://star-history.com/#aden-hive/hive&Date">
  <picture>
    <source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/svg?repos=aden-hive/hive&type=Date&theme=dark" />
    <source media="(prefers-color-scheme: light)" srcset="https://api.star-history.com/svg?repos=aden-hive/hive&type=Date" />
    <img alt="Star History Chart" src="https://api.star-history.com/svg?repos=aden-hive/hive&type=Date" />
  </picture>
</a>

---

<p align="center">
@@ -1,31 +0,0 @@
perf: reduce subprocess spawning in quickstart scripts (#4427)

## Problem
Windows process creation (CreateProcess) is 10-100x slower than Linux fork/exec.
The quickstart scripts were spawning 4+ separate `uv run python -c "import X"`
processes to verify imports, adding ~600ms overhead on Windows.

## Solution
Consolidated all import checks into a single batch script that checks multiple
modules in one subprocess call, reducing spawn overhead by ~75%.

## Changes
- **New**: `scripts/check_requirements.py` - Batched import checker
- **New**: `scripts/test_check_requirements.py` - Test suite
- **New**: `scripts/benchmark_quickstart.ps1` - Performance benchmark tool
- **Modified**: `quickstart.ps1` - Updated import verification (2 sections)
- **Modified**: `quickstart.sh` - Updated import verification

## Performance Impact
**Benchmark results on Windows:**
- Before: ~19.8 seconds for import checks
- After: ~4.9 seconds for import checks
- **Improvement: 14.9 seconds saved (75.2% faster)**

## Testing
- ✅ All functional tests pass (`scripts/test_check_requirements.py`)
- ✅ Quickstart scripts work correctly on Windows
- ✅ Error handling verified (invalid imports reported correctly)
- ✅ Performance benchmark confirms 75%+ improvement

Fixes #4427
@@ -1,27 +0,0 @@
# Identity mapping: GitHub username -> Discord ID
#
# This file links GitHub accounts to Discord accounts for the
# Integration Bounty Program. When a bounty PR is merged, the
# GitHub Action uses this file to ping the contributor on Discord.
#
# HOW TO ADD YOURSELF:
#   Open a "Link Discord Account" issue:
#   https://github.com/aden-hive/hive/issues/new?template=link-discord.yml
#   A GitHub Action will automatically add your entry here.
#
# To find your Discord ID:
#   1. Open Discord Settings > Advanced > Enable Developer Mode
#   2. Right-click your name > Copy User ID
#
# Format:
#   - github: your-github-username
#     discord: "your-discord-id"   # quotes required (it's a number)
#     name: Your Display Name      # optional

contributors:
  # - github: example-user
  #   discord: "123456789012345678"
  #   name: Example User
  - github: TimothyZhang7
    discord: "408460790061072384"
    name: Timothy@Aden

@@ -6,7 +6,7 @@ This guide explains how to integrate Model Context Protocol (MCP) servers with t

The framework provides built-in support for MCP servers, allowing you to:

- **Register MCP servers** via STDIO, HTTP, Unix socket, or SSE transport
- **Auto-discover tools** from registered servers
- **Use MCP tools** seamlessly in your agents
- **Manage multiple MCP servers** simultaneously
@@ -104,6 +104,48 @@ runner.register_mcp_server(
- `url`: Base URL of the MCP server
- `headers`: HTTP headers to include (optional)

### Unix Socket Transport

Best for same-host inter-process communication, with lower overhead than TCP:

```python
runner.register_mcp_server(
    name="local-ipc-tools",
    transport="unix",
    url="http://localhost",
    socket_path="/tmp/mcp_server.sock",
    headers={
        "Authorization": "Bearer token"
    }
)
```

**Configuration:**

- `url`: Base URL for HTTP requests over the socket (required, e.g., `"http://localhost"`)
- `socket_path`: Absolute path to the Unix socket file (required, e.g., `"/tmp/mcp_server.sock"`)
- `headers`: HTTP headers to include (optional)

### SSE Transport

Best for real-time, event-driven connections using the MCP SDK's SSE client:

```python
runner.register_mcp_server(
    name="streaming-tools",
    transport="sse",
    url="http://localhost:8000/sse",
    headers={
        "Authorization": "Bearer token"
    }
)
```

**Configuration:**

- `url`: SSE endpoint URL (required, e.g., `"http://localhost:8000/sse"`)
- `headers`: HTTP headers for the SSE connection (optional)

## Using MCP Tools in Agents

Once registered, MCP tools are available just like any other tool:

@@ -258,7 +300,32 @@ runner.register_mcp_server(
)
```

### 3. Use Unix Socket for Same-Host IPC

When both the agent and the MCP server run on the same machine, Unix sockets avoid TCP overhead:

```python
runner.register_mcp_server(
    name="fast-local-tools",
    transport="unix",
    url="http://localhost",
    socket_path="/tmp/mcp_server.sock"
)
```

### 4. Use SSE for Streaming and Real-Time Tools

SSE transport maintains a persistent connection, which is ideal for event-driven servers:

```python
runner.register_mcp_server(
    name="realtime-tools",
    transport="sse",
    url="http://realtime-server:8000/sse"
)
```

### 5. Handle Cleanup

Always clean up MCP connections when done:

@@ -280,7 +347,7 @@ async with AgentRunner.load("exports/my-agent") as runner:
    # Automatic cleanup
```

### 6. Tool Name Conflicts

If multiple MCP servers provide tools with the same name, the last registered server wins. To avoid conflicts:

@@ -315,6 +382,24 @@ If HTTP transport fails:
2. Check firewall settings
3. Verify the URL and port are correct

### Unix Socket Not Connecting

If Unix socket transport fails:

1. Verify the socket file exists: `ls -la /tmp/mcp_server.sock`
2. Check file permissions on the socket
3. Ensure no other process has locked the socket
4. Verify the `url` field is set (e.g., `"http://localhost"`)

### SSE Connection Issues

If SSE transport fails:

1. Verify the server supports SSE at the given URL
2. Check that the `mcp` Python package is installed (`pip install mcp`)
3. Ensure the SSE endpoint is accessible: `curl http://localhost:8000/sse`
4. Check for firewall or proxy issues blocking long-lived connections

## Example: Full Agent with MCP Tools

Here's a complete example of an agent that uses MCP tools:
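
A minimal sketch of such an agent, combining the `AgentRunner` context manager and `register_mcp_server` shown above (the STDIO parameters, agent path, and `runner.run` call are illustrative assumptions):

```python
import asyncio

from framework.runner import AgentRunner  # import path assumed


async def main() -> None:
    # Load an exported agent; the context manager handles MCP cleanup on exit.
    async with AgentRunner.load("exports/my-agent") as runner:
        runner.register_mcp_server(
            name="filesystem-tools",
            transport="stdio",                       # STDIO transport, covered earlier
            command="uvx",                           # parameter names assumed
            args=["mcp-server-filesystem", "/tmp"],  # hypothetical server invocation
        )
        # Registered MCP tools are now available to the agent like any other tool.
        result = await runner.run("Summarize the files in /tmp")
        print(result)


asyncio.run(main())
```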
|
||||
|
||||
@@ -0,0 +1,583 @@
|
||||
#!/usr/bin/env python3
|
||||
"""Antigravity authentication CLI.
|
||||
|
||||
Implements OAuth2 flow for Google's Antigravity Code Assist gateway.
|
||||
Credentials are stored in ~/.hive/antigravity-accounts.json.
|
||||
|
||||
Usage:
|
||||
python -m antigravity_auth auth account add
|
||||
python -m antigravity_auth auth account list
|
||||
python -m antigravity_auth auth account remove <email>
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import argparse
|
||||
import json
|
||||
import logging
|
||||
import os
|
||||
import secrets
|
||||
import socket
|
||||
import sys
|
||||
import time
|
||||
import urllib.parse
|
||||
import urllib.request
|
||||
import webbrowser
|
||||
from http.server import BaseHTTPRequestHandler, HTTPServer
|
||||
from pathlib import Path
|
||||
from typing import Any
|
||||
|
||||
logging.basicConfig(level=logging.INFO, format="%(message)s")
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# OAuth endpoints
|
||||
_OAUTH_AUTH_URL = "https://accounts.google.com/o/oauth2/v2/auth"
|
||||
_OAUTH_TOKEN_URL = "https://oauth2.googleapis.com/token"
|
||||
|
||||
# Scopes for Antigravity/Cloud Code Assist
|
||||
_OAUTH_SCOPES = [
|
||||
"https://www.googleapis.com/auth/cloud-platform",
|
||||
"https://www.googleapis.com/auth/userinfo.email",
|
||||
"https://www.googleapis.com/auth/userinfo.profile",
|
||||
]
|
||||
|
||||
# Credentials file path in ~/.hive/
|
||||
_ACCOUNTS_FILE = Path.home() / ".hive" / "antigravity-accounts.json"
|
||||
|
||||
# Default project ID
|
||||
_DEFAULT_PROJECT_ID = "rising-fact-p41fc"
|
||||
_DEFAULT_REDIRECT_PORT = 51121
|
||||
|
||||
# OAuth credentials fetched from the opencode-antigravity-auth project.
|
||||
# This project reverse-engineered and published the public OAuth credentials
|
||||
# for Google's Antigravity/Cloud Code Assist API.
|
||||
# Source: https://github.com/NoeFabris/opencode-antigravity-auth
|
||||
_CREDENTIALS_URL = (
|
||||
"https://raw.githubusercontent.com/NoeFabris/opencode-antigravity-auth/dev/src/constants.ts"
|
||||
)
|
||||
|
||||
# Cached credentials fetched from public source
|
||||
_cached_client_id: str | None = None
|
||||
_cached_client_secret: str | None = None
|
||||
|
||||
|
||||
def _fetch_credentials_from_public_source() -> tuple[str | None, str | None]:
|
||||
"""Fetch OAuth client ID and secret from the public npm package source on GitHub."""
|
||||
global _cached_client_id, _cached_client_secret
|
||||
if _cached_client_id and _cached_client_secret:
|
||||
return _cached_client_id, _cached_client_secret
|
||||
|
||||
try:
|
||||
req = urllib.request.Request(
|
||||
_CREDENTIALS_URL, headers={"User-Agent": "Hive-Antigravity-Auth/1.0"}
|
||||
)
|
||||
with urllib.request.urlopen(req, timeout=10) as resp:
|
||||
content = resp.read().decode("utf-8")
|
||||
import re
|
||||
|
||||
id_match = re.search(r'ANTIGRAVITY_CLIENT_ID\s*=\s*"([^"]+)"', content)
|
||||
secret_match = re.search(r'ANTIGRAVITY_CLIENT_SECRET\s*=\s*"([^"]+)"', content)
|
||||
if id_match:
|
||||
_cached_client_id = id_match.group(1)
|
||||
if secret_match:
|
||||
_cached_client_secret = secret_match.group(1)
|
||||
return _cached_client_id, _cached_client_secret
|
||||
except Exception as e:
|
||||
logger.debug(f"Failed to fetch credentials from public source: {e}")
|
||||
return None, None
|
||||
|
||||
|
||||
def get_client_id() -> str:
|
||||
"""Get OAuth client ID from env, config, or public source."""
|
||||
env_id = os.environ.get("ANTIGRAVITY_CLIENT_ID")
|
||||
if env_id:
|
||||
return env_id
|
||||
|
||||
# Try hive config
|
||||
hive_cfg = Path.home() / ".hive" / "configuration.json"
|
||||
if hive_cfg.exists():
|
||||
try:
|
||||
with open(hive_cfg) as f:
|
||||
cfg = json.load(f)
|
||||
cfg_id = cfg.get("llm", {}).get("antigravity_client_id")
|
||||
if cfg_id:
|
||||
return cfg_id
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
# Fetch from public source
|
||||
client_id, _ = _fetch_credentials_from_public_source()
|
||||
if client_id:
|
||||
return client_id
|
||||
|
||||
raise RuntimeError("Could not obtain Antigravity OAuth client ID")
|
||||
|
||||
|
||||
def get_client_secret() -> str | None:
|
||||
"""Get OAuth client secret from env, config, or public source."""
|
||||
secret = os.environ.get("ANTIGRAVITY_CLIENT_SECRET")
|
||||
if secret:
|
||||
return secret
|
||||
|
||||
# Try to read from hive config
|
||||
hive_cfg = Path.home() / ".hive" / "configuration.json"
|
||||
if hive_cfg.exists():
|
||||
try:
|
||||
with open(hive_cfg) as f:
|
||||
cfg = json.load(f)
|
||||
secret = cfg.get("llm", {}).get("antigravity_client_secret")
|
||||
if secret:
|
||||
return secret
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
# Fetch from public source (npm package on GitHub)
|
||||
_, secret = _fetch_credentials_from_public_source()
|
||||
return secret
|
||||
|
||||
|
||||
def find_free_port() -> int:
|
||||
"""Find an available local port."""
|
||||
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
|
||||
s.bind(("", 0))
|
||||
s.listen(1)
|
||||
return s.getsockname()[1]
|
||||
|
||||
|
||||
class OAuthCallbackHandler(BaseHTTPRequestHandler):
|
||||
"""Handle OAuth callback from browser."""
|
||||
|
||||
auth_code: str | None = None
|
||||
state: str | None = None
|
||||
error: str | None = None
|
||||
|
||||
def log_message(self, format: str, *args: Any) -> None:
|
||||
pass # Suppress default logging
|
||||
|
||||
def do_GET(self) -> None:
|
||||
parsed = urllib.parse.urlparse(self.path)
|
||||
|
||||
if parsed.path == "/oauth-callback":
|
||||
query = urllib.parse.parse_qs(parsed.query)
|
||||
|
||||
if "error" in query:
|
||||
self.error = query["error"][0]
|
||||
self._send_response("Authentication failed. You can close this window.")
|
||||
return
|
||||
|
||||
if "code" in query and "state" in query:
|
||||
OAuthCallbackHandler.auth_code = query["code"][0]
|
||||
OAuthCallbackHandler.state = query["state"][0]
|
||||
self._send_response(
|
||||
"Authentication successful! You can close this window "
|
||||
"and return to the terminal."
|
||||
)
|
||||
return
|
||||
|
||||
self._send_response("Waiting for authentication...")
|
||||
|
||||
def _send_response(self, message: str) -> None:
|
||||
self.send_response(200)
|
||||
self.send_header("Content-Type", "text/html")
|
||||
self.end_headers()
|
||||
html = f"""<!DOCTYPE html>
|
||||
<html>
|
||||
<head><title>Antigravity Auth</title></head>
|
||||
<body style="font-family: system-ui; display: flex; align-items: center;
|
||||
justify-content: center; height: 100vh; margin: 0; background: #1a1a2e;
|
||||
color: #eee;">
|
||||
<div style="text-align: center;">
|
||||
<h2>{message}</h2>
|
||||
</div>
|
||||
</body>
|
||||
</html>"""
|
||||
self.wfile.write(html.encode())
|
||||
|
||||
|
||||
def wait_for_callback(port: int, timeout: int = 300) -> tuple[str | None, str | None, str | None]:
|
||||
"""Start local server and wait for OAuth callback."""
|
||||
server = HTTPServer(("localhost", port), OAuthCallbackHandler)
|
||||
server.timeout = 1
|
||||
|
||||
start = time.time()
|
||||
while time.time() - start < timeout:
|
||||
if OAuthCallbackHandler.auth_code:
|
||||
return (
|
||||
OAuthCallbackHandler.auth_code,
|
||||
OAuthCallbackHandler.state,
|
||||
OAuthCallbackHandler.error,
|
||||
)
|
||||
server.handle_request()
|
||||
|
||||
return None, None, "timeout"


def exchange_code_for_tokens(
    code: str, redirect_uri: str, client_id: str, client_secret: str | None
) -> dict[str, Any] | None:
    """Exchange authorization code for tokens."""
    data = {
        "code": code,
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "grant_type": "authorization_code",
    }
    if client_secret:
        data["client_secret"] = client_secret

    body = urllib.parse.urlencode(data).encode()

    req = urllib.request.Request(
        _OAUTH_TOKEN_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )

    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return json.loads(resp.read())
    except Exception as e:
        logger.error(f"Token exchange failed: {e}")
        return None


def get_user_email(access_token: str) -> str | None:
    """Get user email from Google API."""
    req = urllib.request.Request(
        "https://www.googleapis.com/oauth2/v2/userinfo",
        headers={"Authorization": f"Bearer {access_token}"},
    )
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            data = json.loads(resp.read())
            return data.get("email")
    except Exception:
        return None


def load_accounts() -> dict[str, Any]:
    """Load existing accounts from file."""
    if not _ACCOUNTS_FILE.exists():
        return {"schemaVersion": 4, "accounts": []}
    try:
        with open(_ACCOUNTS_FILE) as f:
            return json.load(f)
    except Exception:
        return {"schemaVersion": 4, "accounts": []}


def save_accounts(data: dict[str, Any]) -> None:
    """Save accounts to file."""
    _ACCOUNTS_FILE.parent.mkdir(parents=True, exist_ok=True)
    with open(_ACCOUNTS_FILE, "w") as f:
        json.dump(data, f, indent=2)
    logger.info(f"Saved credentials to {_ACCOUNTS_FILE}")


def validate_credentials(access_token: str, project_id: str = _DEFAULT_PROJECT_ID) -> bool:
    """Test if credentials work by making a simple API call to Antigravity.

    Returns True if credentials are valid, False otherwise.
    """
    endpoint = "https://daily-cloudcode-pa.sandbox.googleapis.com"
    body = {
        "project": project_id,
        "model": "gemini-3-flash",
        "request": {
            "contents": [{"role": "user", "parts": [{"text": "hi"}]}],
            "generationConfig": {"maxOutputTokens": 10},
        },
        "requestType": "agent",
        "userAgent": "antigravity",
        "requestId": "validation-test",
    }
    headers = {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json",
        "User-Agent": (
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
            "AppleWebKit/537.36 (KHTML, like Gecko) Antigravity/1.18.3"
        ),
        "X-Goog-Api-Client": "google-cloud-sdk vscode_cloudshelleditor/0.1",
    }

    try:
        req = urllib.request.Request(
            f"{endpoint}/v1internal:generateContent",
            data=json.dumps(body).encode("utf-8"),
            headers=headers,
            method="POST",
        )
        with urllib.request.urlopen(req, timeout=30) as resp:
            json.loads(resp.read())
        return True
    except Exception:
        return False


def refresh_access_token(
    refresh_token: str, client_id: str, client_secret: str | None
) -> dict | None:
    """Refresh the access token using the refresh token."""
    data = {
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
        "client_id": client_id,
    }
    if client_secret:
        data["client_secret"] = client_secret

    body = urllib.parse.urlencode(data).encode()
    req = urllib.request.Request(
        _OAUTH_TOKEN_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return json.loads(resp.read())
    except Exception as e:
        logger.debug(f"Token refresh failed: {e}")
        return None
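
# Illustrative helper (an assumption, not part of this module): one way a
# caller could combine the stored millisecond expiry with
# refresh_access_token(), mirroring the checks in cmd_account_add():
#
#     def ensure_fresh(account: dict, client_id: str, secret: str | None) -> str | None:
#         if time.time() < account.get("expires", 0) / 1000.0 - 60:
#             return account["access"]
#         refresh = account.get("refresh", "").split("|")[0]
#         tokens = refresh_access_token(refresh, client_id, secret)
#         return tokens.get("access_token") if tokens else None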


def cmd_account_add(args: argparse.Namespace) -> int:
    """Add a new Antigravity account via OAuth2.

    First checks if valid credentials already exist. If so, validates them
    and skips OAuth if they work. Otherwise, proceeds with OAuth flow.
    """
    client_id = get_client_id()
    client_secret = get_client_secret()

    # Check if credentials already exist
    accounts_data = load_accounts()
    accounts = accounts_data.get("accounts", [])

    if accounts:
        account = next((a for a in accounts if a.get("enabled", True) is not False), accounts[0])
        access_token = account.get("access")
        refresh_token_str = account.get("refresh", "")
        refresh_token = refresh_token_str.split("|")[0] if refresh_token_str else None
        project_id = (
            refresh_token_str.split("|")[1] if "|" in refresh_token_str else _DEFAULT_PROJECT_ID
        )
        email = account.get("email", "unknown")
        expires_ms = account.get("expires", 0)
        expires_at = expires_ms / 1000.0 if expires_ms else 0.0

        # Check if token is expired or near expiry
        if access_token and expires_at and time.time() < expires_at - 60:
            # Token still valid, test it
            logger.info(f"Found existing credentials for: {email}")
            logger.info("Validating existing credentials...")
            if validate_credentials(access_token, project_id):
                logger.info("✓ Credentials valid! Skipping OAuth.")
                return 0
            else:
                logger.info("Credentials failed validation, refreshing...")
        elif refresh_token:
            logger.info(f"Found expired credentials for: {email}")
            logger.info("Attempting token refresh...")

            tokens = refresh_access_token(refresh_token, client_id, client_secret)
            if tokens:
                new_access = tokens.get("access_token")
                expires_in = tokens.get("expires_in", 3600)
                if new_access:
                    # Update the account
                    account["access"] = new_access
                    account["expires"] = int((time.time() + expires_in) * 1000)
                    accounts_data["last_refresh"] = time.strftime(
                        "%Y-%m-%dT%H:%M:%SZ", time.gmtime()
                    )
                    save_accounts(accounts_data)

                    # Validate the refreshed token
                    logger.info("Validating refreshed credentials...")
                    if validate_credentials(new_access, project_id):
                        logger.info("✓ Credentials refreshed and validated!")
                        return 0
                    else:
                        logger.info("Refreshed token failed validation, proceeding with OAuth...")
            else:
                logger.info("Token refresh failed, proceeding with OAuth...")

    # No valid credentials, proceed with OAuth
    if not client_secret:
        logger.warning(
            "No client secret configured. Token refresh may fail.\n"
            "Set ANTIGRAVITY_CLIENT_SECRET env var or add "
            "'antigravity_client_secret' to ~/.hive/configuration.json"
        )

    # Use fixed port and path matching Google's expected OAuth redirect URI
    port = _DEFAULT_REDIRECT_PORT
    redirect_uri = f"http://localhost:{port}/oauth-callback"

    # Generate state for CSRF protection
    state = secrets.token_urlsafe(16)

    # Build authorization URL
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "response_type": "code",
        "scope": " ".join(_OAUTH_SCOPES),
        "state": state,
        "access_type": "offline",
        "prompt": "consent",
    }
    auth_url = f"{_OAUTH_AUTH_URL}?{urllib.parse.urlencode(params)}"

    logger.info("Opening browser for authentication...")
    logger.info(f"If the browser doesn't open, visit: {auth_url}\n")

    # Open browser
    webbrowser.open(auth_url)

    # Wait for callback
    logger.info(f"Listening for callback on port {port}...")
    code, received_state, error = wait_for_callback(port)

    if error:
        logger.error(f"Authentication failed: {error}")
        return 1

    if not code:
        logger.error("No authorization code received")
        return 1

    if received_state != state:
        logger.error("State mismatch - possible CSRF attack")
        return 1

    # Exchange code for tokens
    logger.info("Exchanging authorization code for tokens...")
    tokens = exchange_code_for_tokens(code, redirect_uri, client_id, client_secret)

    if not tokens:
        return 1

    access_token = tokens.get("access_token")
    refresh_token = tokens.get("refresh_token")
    expires_in = tokens.get("expires_in", 3600)

    if not access_token:
        logger.error("No access token in response")
        return 1

    # Get user email
    email = get_user_email(access_token)
    if email:
        logger.info(f"Authenticated as: {email}")

    # Load existing accounts and add/update
    accounts_data = load_accounts()
    accounts = accounts_data.get("accounts", [])

    # Build new account entry (V4 schema)
    expires_ms = int((time.time() + expires_in) * 1000)
    refresh_entry = f"{refresh_token}|{_DEFAULT_PROJECT_ID}"

    new_account = {
        "access": access_token,
        "refresh": refresh_entry,
        "expires": expires_ms,
        "email": email,
        "enabled": True,
    }

    # Update existing account or add new one
    existing_idx = next((i for i, a in enumerate(accounts) if a.get("email") == email), None)
    if existing_idx is not None:
        accounts[existing_idx] = new_account
        logger.info(f"Updated existing account: {email}")
    else:
        accounts.append(new_account)
        logger.info(f"Added new account: {email}")

    accounts_data["accounts"] = accounts
    accounts_data["schemaVersion"] = 4
    accounts_data["last_refresh"] = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())

    save_accounts(accounts_data)
    logger.info("\n✓ Authentication complete!")
    return 0


def cmd_account_list(args: argparse.Namespace) -> int:
    """List all stored accounts."""
    data = load_accounts()
    accounts = data.get("accounts", [])

    if not accounts:
        logger.info("No accounts configured.")
        logger.info("Run 'antigravity auth account add' to add one.")
        return 0

    logger.info("Configured accounts:\n")
    for i, account in enumerate(accounts, 1):
        email = account.get("email", "unknown")
        enabled = "enabled" if account.get("enabled", True) else "disabled"
        logger.info(f"  {i}. {email} ({enabled})")

    return 0


def cmd_account_remove(args: argparse.Namespace) -> int:
    """Remove an account by email."""
    email = args.email
    data = load_accounts()
    accounts = data.get("accounts", [])

    original_len = len(accounts)
    accounts = [a for a in accounts if a.get("email") != email]

    if len(accounts) == original_len:
        logger.error(f"No account found with email: {email}")
        return 1

    data["accounts"] = accounts
    save_accounts(data)
    logger.info(f"Removed account: {email}")
    return 0


def main() -> int:
    parser = argparse.ArgumentParser(
        description="Antigravity authentication CLI",
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )
    subparsers = parser.add_subparsers(dest="command", help="Commands")

    # auth account add
    auth_parser = subparsers.add_parser("auth", help="Authentication commands")
    auth_subparsers = auth_parser.add_subparsers(dest="auth_command")

    account_parser = auth_subparsers.add_parser("account", help="Account management")
    account_subparsers = account_parser.add_subparsers(dest="account_command")

    add_parser = account_subparsers.add_parser("add", help="Add a new account via OAuth2")
    add_parser.set_defaults(func=cmd_account_add)

    list_parser = account_subparsers.add_parser("list", help="List configured accounts")
    list_parser.set_defaults(func=cmd_account_list)

    remove_parser = account_subparsers.add_parser("remove", help="Remove an account")
    remove_parser.add_argument("email", help="Email of account to remove")
    remove_parser.set_defaults(func=cmd_account_remove)

    args = parser.parse_args()

    if hasattr(args, "func"):
        return args.func(args)

    parser.print_help()
    return 0


if __name__ == "__main__":
    sys.exit(main())
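
# Example invocations (the script filename is hypothetical; the subcommands
# match the parsers defined in main() above):
#
#     python antigravity_auth.py auth account add
#     python antigravity_auth.py auth account list
#     python antigravity_auth.py auth account remove user@example.com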

+81 -27

@@ -17,6 +17,7 @@ import http.server
 import json
 import os
 import platform
+import queue
 import secrets
 import subprocess
 import sys
@@ -27,6 +28,7 @@ import urllib.parse
 import urllib.request
 from datetime import UTC, datetime
 from pathlib import Path
+from typing import TextIO
 
 # OAuth constants (from the Codex CLI binary)
 CLIENT_ID = "app_EMoamEEZ73f0CkXaXp7hrann"
@@ -165,11 +167,11 @@ def open_browser(url: str) -> bool:
         if system == "Darwin":
             subprocess.Popen(["open", url], stdout=devnull, stderr=devnull)
         elif system == "Windows":
-            subprocess.Popen(["cmd", "/c", "start", url], stdout=devnull, stderr=devnull)
+            os.startfile(url)  # type: ignore[attr-defined]
         else:
             subprocess.Popen(["xdg-open", url], stdout=devnull, stderr=devnull)
         return True
-    except OSError:
+    except (AttributeError, OSError):
         return False
 
 
@@ -266,6 +268,71 @@ def parse_manual_input(value: str, expected_state: str) -> str | None:
     return None
 
 
+def _read_manual_input_lines(
+    manual_inputs: queue.Queue[str],
+    stop_event: threading.Event,
+    stdin: TextIO | None = None,
+) -> None:
+    stream = sys.stdin if stdin is None else stdin
+
+    while not stop_event.is_set():
+        try:
+            manual = stream.readline()
+        except (EOFError, OSError):
+            return
+
+        if not manual:
+            return
+
+        if manual.strip():
+            manual_inputs.put(manual)
+
+
+def wait_for_code_from_callback_or_stdin(
+    expected_state: str,
+    callback_result: list[str | None],
+    callback_done: threading.Event,
+    timeout_secs: float = 120,
+    poll_interval: float = 0.1,
+    stdin: TextIO | None = None,
+) -> str | None:
+    manual_inputs: queue.Queue[str] = queue.Queue()
+    stop_event = threading.Event()
+
+    # Read stdin on a daemon thread so manual paste works on platforms where
+    # select() cannot poll console handles, including Windows terminals.
+    threading.Thread(
+        target=_read_manual_input_lines,
+        args=(manual_inputs, stop_event, stdin),
+        daemon=True,
+    ).start()
+
+    deadline = time.time() + timeout_secs
+    try:
+        while time.time() < deadline:
+            if callback_result[0]:
+                return callback_result[0]
+
+            while True:
+                try:
+                    manual = manual_inputs.get_nowait()
+                except queue.Empty:
+                    break
+
+                code = parse_manual_input(manual, expected_state)
+                if code:
+                    return code
+
+            if callback_done.is_set():
+                return callback_result[0]
+
+            time.sleep(poll_interval)
+
+        return callback_result[0]
+    finally:
+        stop_event.set()
@@ -315,41 +382,28 @@ def main() -> int:
 
         # Start callback server in background
         callback_result: list[str | None] = [None]
+        callback_done = threading.Event()
 
         def run_server() -> None:
-            callback_result[0] = wait_for_callback(state, timeout_secs=120)
+            try:
+                callback_result[0] = wait_for_callback(state, timeout_secs=120)
+            finally:
+                callback_done.set()
 
         server_thread = threading.Thread(target=run_server)
         server_thread.daemon = True
         server_thread.start()
 
         # Also accept manual input in parallel
-        # We poll for both the server result and stdin
-        try:
-            import select
-
-            while server_thread.is_alive():
-                # Check if stdin has data (non-blocking on unix)
-                if hasattr(select, "select"):
-                    ready, _, _ = select.select([sys.stdin], [], [], 0.5)
-                    if ready:
-                        manual = sys.stdin.readline()
-                        if manual.strip():
-                            code = parse_manual_input(manual, state)
-                            if code:
-                                break
-                else:
-                    time.sleep(0.5)
-
-                if callback_result[0]:
-                    code = callback_result[0]
-                    break
-        except (KeyboardInterrupt, EOFError):
+        try:
+            code = wait_for_code_from_callback_or_stdin(
+                state,
+                callback_result,
+                callback_done,
+                timeout_secs=120,
+            )
+        except KeyboardInterrupt:
             print("\n\033[0;31mCancelled.\033[0m")
             return 1
 
-        if not code:
-            code = callback_result[0]
     else:
         # Manual paste mode
         try:
@@ -1,740 +0,0 @@
#!/usr/bin/env python3
"""
EventLoopNode WebSocket Demo

Real LLM, real FileConversationStore, real EventBus.
Streams EventLoopNode execution to a browser via WebSocket.

Usage:
    cd /home/timothy/oss/hive/core
    python demos/event_loop_wss_demo.py

Then open http://localhost:8765 in your browser.
"""

import asyncio
import json
import logging
import sys
import tempfile
from http import HTTPStatus
from pathlib import Path

import httpx
import websockets
from bs4 import BeautifulSoup
from websockets.http11 import Request, Response

# Add core, tools, and hive root to path
_CORE_DIR = Path(__file__).resolve().parent.parent
_HIVE_DIR = _CORE_DIR.parent
sys.path.insert(0, str(_CORE_DIR))  # framework.*
sys.path.insert(0, str(_HIVE_DIR / "tools" / "src"))  # aden_tools.*
sys.path.insert(0, str(_HIVE_DIR))  # core.framework.* (for aden_tools imports)

import os  # noqa: E402

from aden_tools.credentials import CREDENTIAL_SPECS, CredentialStoreAdapter  # noqa: E402
from core.framework.credentials import CredentialStore  # noqa: E402

from framework.credentials.storage import (  # noqa: E402
    CompositeStorage,
    EncryptedFileStorage,
    EnvVarStorage,
)
from framework.graph.event_loop_node import EventLoopNode, LoopConfig  # noqa: E402
from framework.graph.node import NodeContext, NodeSpec, SharedMemory  # noqa: E402
from framework.llm.litellm import LiteLLMProvider  # noqa: E402
from framework.llm.provider import Tool  # noqa: E402
from framework.runner.tool_registry import ToolRegistry  # noqa: E402
from framework.runtime.core import Runtime  # noqa: E402
from framework.runtime.event_bus import EventBus, EventType  # noqa: E402
from framework.storage.conversation_store import FileConversationStore  # noqa: E402

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(name)s %(message)s")
logger = logging.getLogger("demo")

# -------------------------------------------------------------------------
# Persistent state (shared across WebSocket connections)
# -------------------------------------------------------------------------

STORE_DIR = Path(tempfile.mkdtemp(prefix="hive_demo_"))
STORE = FileConversationStore(STORE_DIR / "conversation")
RUNTIME = Runtime(STORE_DIR / "runtime")
LLM = LiteLLMProvider(model="claude-sonnet-4-5-20250929")

# -------------------------------------------------------------------------
# Tool Registry — real tools via ToolRegistry (same pattern as GraphExecutor)
# -------------------------------------------------------------------------

TOOL_REGISTRY = ToolRegistry()

# Credential store: Aden sync (OAuth2 tokens) + encrypted files + env var fallback
_env_mapping = {name: spec.env_var for name, spec in CREDENTIAL_SPECS.items()}
_local_storage = CompositeStorage(
    primary=EncryptedFileStorage(),
    fallbacks=[EnvVarStorage(env_mapping=_env_mapping)],
)

if os.environ.get("ADEN_API_KEY"):
    try:
        from framework.credentials.aden import (  # noqa: E402
            AdenCachedStorage,
            AdenClientConfig,
            AdenCredentialClient,
            AdenSyncProvider,
        )

        _client = AdenCredentialClient(AdenClientConfig(base_url="https://api.adenhq.com"))
        _provider = AdenSyncProvider(client=_client)
        _storage = AdenCachedStorage(
            local_storage=_local_storage,
            aden_provider=_provider,
        )
        _cred_store = CredentialStore(storage=_storage, providers=[_provider], auto_refresh=True)
        _synced = _provider.sync_all(_cred_store)
        logger.info("Synced %d credentials from Aden", _synced)
    except Exception as e:
        logger.warning("Aden sync unavailable: %s", e)
        _cred_store = CredentialStore(storage=_local_storage)
else:
    logger.info("ADEN_API_KEY not set, using local credential storage")
    _cred_store = CredentialStore(storage=_local_storage)

CREDENTIALS = CredentialStoreAdapter(_cred_store)

# Debug: log which credentials resolved
for _name in ["brave_search", "hubspot", "anthropic"]:
    _val = CREDENTIALS.get(_name)
    if _val:
        logger.debug("credential %s: OK (len=%d)", _name, len(_val))
    else:
        logger.debug("credential %s: not found", _name)

# --- web_search (Brave Search API) ---

TOOL_REGISTRY.register(
    name="web_search",
    tool=Tool(
        name="web_search",
        description=(
            "Search the web for current information. "
            "Returns titles, URLs, and snippets from search results."
        ),
        parameters={
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "The search query (1-500 characters)",
                },
                "num_results": {
                    "type": "integer",
                    "description": "Number of results to return (1-20, default 10)",
                },
            },
            "required": ["query"],
        },
    ),
    executor=lambda inputs: _exec_web_search(inputs),
)


def _exec_web_search(inputs: dict) -> dict:
    api_key = CREDENTIALS.get("brave_search")
    if not api_key:
        return {"error": "brave_search credential not configured"}
    query = inputs.get("query", "")
    num_results = min(inputs.get("num_results", 10), 20)
    resp = httpx.get(
        "https://api.search.brave.com/res/v1/web/search",
        params={"q": query, "count": num_results},
        headers={"X-Subscription-Token": api_key, "Accept": "application/json"},
        timeout=30.0,
    )
    if resp.status_code != 200:
        return {"error": f"Brave API HTTP {resp.status_code}"}
    data = resp.json()
    results = [
        {
            "title": item.get("title", ""),
            "url": item.get("url", ""),
            "snippet": item.get("description", ""),
        }
        for item in data.get("web", {}).get("results", [])[:num_results]
    ]
    return {"query": query, "results": results, "total": len(results)}


# --- web_scrape (httpx + BeautifulSoup, no playwright for sync compat) ---

TOOL_REGISTRY.register(
    name="web_scrape",
    tool=Tool(
        name="web_scrape",
        description=(
            "Scrape and extract text content from a webpage URL. "
            "Returns the page title and main text content."
        ),
        parameters={
            "type": "object",
            "properties": {
                "url": {
                    "type": "string",
                    "description": "URL of the webpage to scrape",
                },
                "max_length": {
                    "type": "integer",
                    "description": "Maximum text length (default 50000)",
                },
            },
            "required": ["url"],
        },
    ),
    executor=lambda inputs: _exec_web_scrape(inputs),
)

_SCRAPE_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/131.0.0.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml",
}


def _exec_web_scrape(inputs: dict) -> dict:
    url = inputs.get("url", "")
    max_length = max(1000, min(inputs.get("max_length", 50000), 500000))
    if not url.startswith(("http://", "https://")):
        url = "https://" + url
    try:
        resp = httpx.get(url, timeout=30.0, follow_redirects=True, headers=_SCRAPE_HEADERS)
        if resp.status_code != 200:
            return {"error": f"HTTP {resp.status_code}"}
        soup = BeautifulSoup(resp.text, "html.parser")
        for tag in soup(["script", "style", "nav", "footer", "header", "aside", "noscript"]):
            tag.decompose()
        title = soup.title.get_text(strip=True) if soup.title else ""
        main = (
            soup.find("article")
            or soup.find("main")
            or soup.find(attrs={"role": "main"})
            or soup.find("body")
        )
        text = main.get_text(separator=" ", strip=True) if main else ""
        text = " ".join(text.split())
        if len(text) > max_length:
            text = text[:max_length] + "..."
        return {"url": url, "title": title, "content": text, "length": len(text)}
    except httpx.TimeoutException:
        return {"error": "Request timed out"}
    except Exception as e:
        return {"error": f"Scrape failed: {e}"}


# --- HubSpot CRM tools (optional, requires HUBSPOT_ACCESS_TOKEN) ---

_HUBSPOT_API = "https://api.hubapi.com"


def _hubspot_headers() -> dict | None:
    token = CREDENTIALS.get("hubspot")
    if token:
        logger.debug("HubSpot token: %s...%s (len=%d)", token[:8], token[-4:], len(token))
    else:
        logger.debug("HubSpot token: not found")
    if not token:
        return None
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "Accept": "application/json",
    }


def _exec_hubspot_search(inputs: dict) -> dict:
    headers = _hubspot_headers()
    if not headers:
        return {"error": "HUBSPOT_ACCESS_TOKEN not set"}
    object_type = inputs.get("object_type", "contacts")
    query = inputs.get("query", "")
    limit = min(inputs.get("limit", 10), 100)
    body: dict = {"limit": limit}
    if query:
        body["query"] = query
    try:
        resp = httpx.post(
            f"{_HUBSPOT_API}/crm/v3/objects/{object_type}/search",
            headers=headers,
            json=body,
            timeout=30.0,
        )
        if resp.status_code != 200:
            return {"error": f"HubSpot API HTTP {resp.status_code}: {resp.text[:200]}"}
        return resp.json()
    except httpx.TimeoutException:
        return {"error": "Request timed out"}
    except Exception as e:
        return {"error": f"HubSpot error: {e}"}


TOOL_REGISTRY.register(
    name="hubspot_search",
    tool=Tool(
        name="hubspot_search",
        description=(
            "Search HubSpot CRM objects (contacts, companies, or deals). "
            "Returns matching records with their properties."
        ),
        parameters={
            "type": "object",
            "properties": {
                "object_type": {
                    "type": "string",
                    "description": "CRM object type: 'contacts', 'companies', or 'deals'",
                },
                "query": {
                    "type": "string",
                    "description": "Search query (name, email, domain, etc.)",
                },
                "limit": {
                    "type": "integer",
                    "description": "Max results (1-100, default 10)",
                },
            },
            "required": ["object_type"],
        },
    ),
    executor=lambda inputs: _exec_hubspot_search(inputs),
)

logger.info(
    "ToolRegistry loaded: %s",
    ", ".join(TOOL_REGISTRY.get_registered_names()),
)


# -------------------------------------------------------------------------
# HTML page (embedded)
# -------------------------------------------------------------------------

HTML_PAGE = (  # noqa: E501
    """<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <title>EventLoopNode Live Demo</title>
  <style>
    * { box-sizing: border-box; margin: 0; padding: 0; }
    body {
      font-family: 'SF Mono', 'Fira Code', monospace;
      background: #0d1117; color: #c9d1d9;
      height: 100vh; display: flex; flex-direction: column;
    }
    header {
      background: #161b22; padding: 12px 20px;
      border-bottom: 1px solid #30363d;
      display: flex; align-items: center; gap: 16px;
    }
    header h1 { font-size: 16px; color: #58a6ff; font-weight: 600; }
    .status {
      font-size: 12px; padding: 3px 10px; border-radius: 12px;
      background: #21262d; color: #8b949e;
    }
    .status.running { background: #1a4b2e; color: #3fb950; }
    .status.done { background: #1a3a5c; color: #58a6ff; }
    .status.error { background: #4b1a1a; color: #f85149; }
    .chat { flex: 1; overflow-y: auto; padding: 16px; }
    .msg {
      margin: 8px 0; padding: 10px 14px; border-radius: 8px;
      line-height: 1.6; white-space: pre-wrap; word-wrap: break-word;
    }
    .msg.user { background: #1a3a5c; color: #58a6ff; }
    .msg.assistant { background: #161b22; color: #c9d1d9; }
    .msg.event {
      background: transparent; color: #8b949e; font-size: 11px;
      padding: 4px 14px; border-left: 3px solid #30363d;
    }
    .msg.event.loop { border-left-color: #58a6ff; }
    .msg.event.tool { border-left-color: #d29922; }
    .msg.event.stall { border-left-color: #f85149; }
    .input-bar {
      padding: 12px 16px; background: #161b22;
      border-top: 1px solid #30363d; display: flex; gap: 8px;
    }
    .input-bar input {
      flex: 1; background: #0d1117; border: 1px solid #30363d;
      color: #c9d1d9; padding: 8px 12px; border-radius: 6px;
      font-family: inherit; font-size: 14px; outline: none;
    }
    .input-bar input:focus { border-color: #58a6ff; }
    .input-bar button {
      background: #238636; color: #fff; border: none;
      padding: 8px 20px; border-radius: 6px; cursor: pointer;
      font-family: inherit; font-weight: 600;
    }
    .input-bar button:hover { background: #2ea043; }
    .input-bar button:disabled {
      background: #21262d; color: #484f58; cursor: not-allowed;
    }
    .input-bar button.clear { background: #da3633; }
    .input-bar button.clear:hover { background: #f85149; }
  </style>
</head>
<body>
  <header>
    <h1>EventLoopNode Live</h1>
    <span id="status" class="status">Idle</span>
    <span id="iter" class="status" style="display:none">Step 0</span>
  </header>
  <div id="chat" class="chat"></div>
  <div class="input-bar">
    <input id="input" type="text"
           placeholder="Ask anything..." autofocus />
    <button id="go" onclick="run()">Send</button>
    <button class="clear"
            onclick="clearConversation()">Clear</button>
  </div>

  <script>
    let ws = null;
    let currentAssistantEl = null;
    let iterCount = 0;
    const chat = document.getElementById('chat');
    const status = document.getElementById('status');
    const iterEl = document.getElementById('iter');
    const goBtn = document.getElementById('go');
    const inputEl = document.getElementById('input');

    inputEl.addEventListener('keydown', e => {
      if (e.key === 'Enter') run();
    });

    function setStatus(text, cls) {
      status.textContent = text;
      status.className = 'status ' + cls;
    }

    function addMsg(text, cls) {
      const el = document.createElement('div');
      el.className = 'msg ' + cls;
      el.textContent = text;
      chat.appendChild(el);
      chat.scrollTop = chat.scrollHeight;
      return el;
    }

    function connect() {
      ws = new WebSocket('ws://' + location.host + '/ws');
      ws.onopen = () => {
        setStatus('Ready', 'done');
        goBtn.disabled = false;
      };
      ws.onmessage = handleEvent;
      ws.onerror = () => { setStatus('Error', 'error'); };
      ws.onclose = () => {
        setStatus('Reconnecting...', '');
        goBtn.disabled = true;
        setTimeout(connect, 2000);
      };
    }

    function handleEvent(msg) {
      const evt = JSON.parse(msg.data);

      if (evt.type === 'llm_text_delta') {
        if (currentAssistantEl) {
          currentAssistantEl.textContent += evt.content;
          chat.scrollTop = chat.scrollHeight;
        }
      }
      else if (evt.type === 'ready') {
        setStatus('Ready', 'done');
        if (currentAssistantEl && !currentAssistantEl.textContent)
          currentAssistantEl.remove();
        goBtn.disabled = false;
      }
      else if (evt.type === 'node_loop_iteration') {
        iterCount = evt.iteration || (iterCount + 1);
        iterEl.textContent = 'Step ' + iterCount;
        iterEl.style.display = '';
      }
      else if (evt.type === 'tool_call_started') {
        var info = evt.tool_name + '('
          + JSON.stringify(evt.tool_input).slice(0, 120) + ')';
        addMsg('TOOL ' + info, 'event tool');
      }
      else if (evt.type === 'tool_call_completed') {
        var preview = (evt.result || '').slice(0, 200);
        var cls = evt.is_error ? 'stall' : 'tool';
        addMsg('RESULT ' + evt.tool_name + ': ' + preview,
               'event ' + cls);
        currentAssistantEl = addMsg('', 'assistant');
      }
      else if (evt.type === 'result') {
        setStatus('Session ended', evt.success ? 'done' : 'error');
        if (evt.error) addMsg('ERROR ' + evt.error, 'event stall');
        if (currentAssistantEl && !currentAssistantEl.textContent)
          currentAssistantEl.remove();
        goBtn.disabled = false;
      }
      else if (evt.type === 'node_stalled') {
        addMsg('STALLED ' + evt.reason, 'event stall');
      }
      else if (evt.type === 'cleared') {
        chat.innerHTML = '';
        iterCount = 0;
        iterEl.textContent = 'Step 0';
        iterEl.style.display = 'none';
        setStatus('Ready', 'done');
        goBtn.disabled = false;
      }
    }

    function run() {
      const text = inputEl.value.trim();
      if (!text || !ws || ws.readyState !== 1) return;
      addMsg(text, 'user');
      currentAssistantEl = addMsg('', 'assistant');
      inputEl.value = '';
      setStatus('Running', 'running');
      goBtn.disabled = true;
      ws.send(JSON.stringify({ topic: text }));
    }

    function clearConversation() {
      if (ws && ws.readyState === 1) {
        ws.send(JSON.stringify({ command: 'clear' }));
      }
    }

    connect();
  </script>
</body>
</html>"""
)


# -------------------------------------------------------------------------
# WebSocket handler
# -------------------------------------------------------------------------


async def handle_ws(websocket):
    """Persistent WebSocket: long-lived EventLoopNode with client_facing blocking."""
    global STORE

    # -- Event forwarding (WebSocket ← EventBus) ----------------------------
    bus = EventBus()

    async def forward_event(event):
        try:
            payload = {"type": event.type.value, **event.data}
            if event.node_id:
                payload["node_id"] = event.node_id
            await websocket.send(json.dumps(payload))
        except Exception:
            pass

    bus.subscribe(
        event_types=[
            EventType.NODE_LOOP_STARTED,
            EventType.NODE_LOOP_ITERATION,
            EventType.NODE_LOOP_COMPLETED,
            EventType.LLM_TEXT_DELTA,
            EventType.TOOL_CALL_STARTED,
            EventType.TOOL_CALL_COMPLETED,
            EventType.NODE_STALLED,
        ],
        handler=forward_event,
    )

    # -- Per-connection state -----------------------------------------------
    node = None
    loop_task = None

    tools = list(TOOL_REGISTRY.get_tools().values())
    tool_executor = TOOL_REGISTRY.get_executor()

    node_spec = NodeSpec(
        id="assistant",
        name="Chat Assistant",
        description="A conversational assistant that remembers context across messages",
        node_type="event_loop",
        client_facing=True,
        system_prompt=(
            "You are a helpful assistant with access to tools. "
            "You can search the web, scrape webpages, and query HubSpot CRM. "
            "Use tools when the user asks for current information or external data. "
            "You have full conversation history, so you can reference previous messages."
        ),
    )

    # -- Ready callback: subscribe to CLIENT_INPUT_REQUESTED on the bus ---
    async def on_input_requested(event):
        try:
            await websocket.send(json.dumps({"type": "ready"}))
        except Exception:
            pass

    bus.subscribe(
        event_types=[EventType.CLIENT_INPUT_REQUESTED],
        handler=on_input_requested,
    )

    async def start_loop(first_message: str):
        """Create an EventLoopNode and run it as a background task."""
        nonlocal node, loop_task

        memory = SharedMemory()
        ctx = NodeContext(
            runtime=RUNTIME,
            node_id="assistant",
            node_spec=node_spec,
            memory=memory,
            input_data={},
            llm=LLM,
            available_tools=tools,
        )
        node = EventLoopNode(
            event_bus=bus,
            config=LoopConfig(max_iterations=10_000, max_context_tokens=32_000),
            conversation_store=STORE,
            tool_executor=tool_executor,
        )
        await node.inject_event(first_message)

        async def _run():
            try:
                result = await node.execute(ctx)
                try:
                    await websocket.send(
                        json.dumps(
                            {
                                "type": "result",
                                "success": result.success,
                                "output": result.output,
                                "error": result.error,
                                "tokens": result.tokens_used,
                            }
                        )
                    )
                except Exception:
                    pass
                logger.info(f"Loop ended: success={result.success}, tokens={result.tokens_used}")
            except websockets.exceptions.ConnectionClosed:
                logger.info("Loop stopped: WebSocket closed")
            except Exception as e:
                logger.exception("Loop error")
                try:
                    await websocket.send(
                        json.dumps(
                            {
                                "type": "result",
                                "success": False,
                                "error": str(e),
                                "output": {},
                            }
                        )
                    )
                except Exception:
                    pass

        loop_task = asyncio.create_task(_run())

    async def stop_loop():
        """Signal the node and wait for the loop task to finish."""
        nonlocal node, loop_task
        if loop_task and not loop_task.done():
            if node:
                node.signal_shutdown()
            try:
                await asyncio.wait_for(loop_task, timeout=5.0)
            except (TimeoutError, asyncio.CancelledError):
                loop_task.cancel()
        node = None
        loop_task = None

    # -- Message loop (runs for the lifetime of this WebSocket) -------------
    try:
        async for raw in websocket:
            try:
                msg = json.loads(raw)
            except Exception:
                continue

            # Clear command
            if msg.get("command") == "clear":
                import shutil

                await stop_loop()
                await STORE.close()
                conv_dir = STORE_DIR / "conversation"
                if conv_dir.exists():
                    shutil.rmtree(conv_dir)
                STORE = FileConversationStore(conv_dir)
                await websocket.send(json.dumps({"type": "cleared"}))
                logger.info("Conversation cleared")
                continue

            topic = msg.get("topic", "")
            if not topic:
                continue

            if node is None:
                # First message — spin up the loop
                logger.info(f"Starting persistent loop: {topic}")
                await start_loop(topic)
            else:
                # Subsequent message — inject into the running loop
                logger.info(f"Injecting message: {topic}")
                await node.inject_event(topic)

    except websockets.exceptions.ConnectionClosed:
        pass
    finally:
        await stop_loop()
        logger.info("WebSocket closed, loop stopped")


# -------------------------------------------------------------------------
# HTTP handler for serving the HTML page
# -------------------------------------------------------------------------


async def process_request(connection, request: Request):
    """Serve HTML on GET /, upgrade to WebSocket on /ws."""
    if request.path == "/ws":
        return None  # let websockets handle the upgrade
    # Serve the HTML page for any other path
    return Response(
        HTTPStatus.OK,
        "OK",
        websockets.Headers({"Content-Type": "text/html; charset=utf-8"}),
        HTML_PAGE.encode(),
    )


# -------------------------------------------------------------------------
# Main
# -------------------------------------------------------------------------


async def main():
    port = 8765
    async with websockets.serve(
        handle_ws,
        "0.0.0.0",
        port,
        process_request=process_request,
    ):
        logger.info(f"Demo running at http://localhost:{port}")
        logger.info("Open in your browser and enter a topic to research.")
        await asyncio.Future()  # run forever


if __name__ == "__main__":
    asyncio.run(main())
File diff suppressed because it is too large
@@ -1,930 +0,0 @@
#!/usr/bin/env python3
"""
Two-Node ContextHandoff Demo

Demonstrates ContextHandoff between two EventLoopNode instances:
    Node A (Researcher) → ContextHandoff → Node B (Analyst)

Real LLM, real FileConversationStore, real EventBus.
Streams both nodes to a browser via WebSocket.

Usage:
    cd /home/timothy/oss/hive/core
    python demos/handoff_demo.py

Then open http://localhost:8766 in your browser.
"""

import asyncio
import json
import logging
import sys
import tempfile
from http import HTTPStatus
from pathlib import Path

import httpx
import websockets
from bs4 import BeautifulSoup
from websockets.http11 import Request, Response

# Add core, tools, and hive root to path
_CORE_DIR = Path(__file__).resolve().parent.parent
_HIVE_DIR = _CORE_DIR.parent
sys.path.insert(0, str(_CORE_DIR))  # framework.*
sys.path.insert(0, str(_HIVE_DIR / "tools" / "src"))  # aden_tools.*
sys.path.insert(0, str(_HIVE_DIR))  # core.framework.* (for aden_tools imports)

from aden_tools.credentials import CREDENTIAL_SPECS, CredentialStoreAdapter  # noqa: E402
from core.framework.credentials import CredentialStore  # noqa: E402

from framework.credentials.storage import (  # noqa: E402
    CompositeStorage,
    EncryptedFileStorage,
    EnvVarStorage,
)
from framework.graph.context_handoff import ContextHandoff  # noqa: E402
from framework.graph.conversation import NodeConversation  # noqa: E402
from framework.graph.event_loop_node import EventLoopNode, LoopConfig  # noqa: E402
from framework.graph.node import NodeContext, NodeSpec, SharedMemory  # noqa: E402
from framework.llm.litellm import LiteLLMProvider  # noqa: E402
from framework.llm.provider import Tool  # noqa: E402
from framework.runner.tool_registry import ToolRegistry  # noqa: E402
from framework.runtime.core import Runtime  # noqa: E402
from framework.runtime.event_bus import EventBus, EventType  # noqa: E402
from framework.storage.conversation_store import FileConversationStore  # noqa: E402

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(name)s %(message)s")
logger = logging.getLogger("handoff_demo")

# -------------------------------------------------------------------------
# Persistent state
# -------------------------------------------------------------------------

STORE_DIR = Path(tempfile.mkdtemp(prefix="hive_handoff_"))
RUNTIME = Runtime(STORE_DIR / "runtime")
LLM = LiteLLMProvider(model="claude-sonnet-4-5-20250929")

# -------------------------------------------------------------------------
# Credentials
# -------------------------------------------------------------------------

# Composite credential store: encrypted files (primary) + env vars (fallback)
_env_mapping = {name: spec.env_var for name, spec in CREDENTIAL_SPECS.items()}
_composite = CompositeStorage(
    primary=EncryptedFileStorage(),
    fallbacks=[EnvVarStorage(env_mapping=_env_mapping)],
)
CREDENTIALS = CredentialStoreAdapter(CredentialStore(storage=_composite))

for _name in ["brave_search", "hubspot"]:
    _val = CREDENTIALS.get(_name)
    if _val:
        logger.debug("credential %s: OK (len=%d)", _name, len(_val))
    else:
        logger.debug("credential %s: not found", _name)

# -------------------------------------------------------------------------
# Tool Registry — web_search + web_scrape for Node A (Researcher)
# -------------------------------------------------------------------------

TOOL_REGISTRY = ToolRegistry()


def _exec_web_search(inputs: dict) -> dict:
    api_key = CREDENTIALS.get("brave_search")
    if not api_key:
        return {"error": "brave_search credential not configured"}
    query = inputs.get("query", "")
    num_results = min(inputs.get("num_results", 10), 20)
    resp = httpx.get(
        "https://api.search.brave.com/res/v1/web/search",
        params={"q": query, "count": num_results},
        headers={
            "X-Subscription-Token": api_key,
            "Accept": "application/json",
        },
        timeout=30.0,
    )
    if resp.status_code != 200:
        return {"error": f"Brave API HTTP {resp.status_code}"}
    data = resp.json()
    results = [
        {
            "title": item.get("title", ""),
            "url": item.get("url", ""),
            "snippet": item.get("description", ""),
        }
        for item in data.get("web", {}).get("results", [])[:num_results]
    ]
    return {"query": query, "results": results, "total": len(results)}


TOOL_REGISTRY.register(
    name="web_search",
    tool=Tool(
        name="web_search",
        description=(
            "Search the web for current information. "
            "Returns titles, URLs, and snippets from search results."
        ),
        parameters={
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "The search query (1-500 characters)",
                },
                "num_results": {
                    "type": "integer",
                    "description": "Number of results (1-20, default 10)",
                },
            },
            "required": ["query"],
        },
    ),
    executor=lambda inputs: _exec_web_search(inputs),
)

_SCRAPE_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/131.0.0.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml",
}


def _exec_web_scrape(inputs: dict) -> dict:
    url = inputs.get("url", "")
    max_length = max(1000, min(inputs.get("max_length", 50000), 500000))
    if not url.startswith(("http://", "https://")):
        url = "https://" + url
    try:
        resp = httpx.get(
            url,
            timeout=30.0,
            follow_redirects=True,
            headers=_SCRAPE_HEADERS,
        )
        if resp.status_code != 200:
            return {"error": f"HTTP {resp.status_code}"}
        soup = BeautifulSoup(resp.text, "html.parser")
        for tag in soup(["script", "style", "nav", "footer", "header", "aside", "noscript"]):
            tag.decompose()
        title = soup.title.get_text(strip=True) if soup.title else ""
        main = (
            soup.find("article")
            or soup.find("main")
            or soup.find(attrs={"role": "main"})
            or soup.find("body")
        )
        text = main.get_text(separator=" ", strip=True) if main else ""
        text = " ".join(text.split())
        if len(text) > max_length:
            text = text[:max_length] + "..."
        return {
            "url": url,
            "title": title,
            "content": text,
            "length": len(text),
        }
    except httpx.TimeoutException:
        return {"error": "Request timed out"}
    except Exception as e:
        return {"error": f"Scrape failed: {e}"}


TOOL_REGISTRY.register(
    name="web_scrape",
    tool=Tool(
        name="web_scrape",
        description=(
            "Scrape and extract text content from a webpage URL. "
            "Returns the page title and main text content."
        ),
        parameters={
            "type": "object",
            "properties": {
                "url": {
                    "type": "string",
                    "description": "URL of the webpage to scrape",
                },
                "max_length": {
                    "type": "integer",
                    "description": "Maximum text length (default 50000)",
                },
            },
            "required": ["url"],
        },
    ),
    executor=lambda inputs: _exec_web_scrape(inputs),
)

logger.info(
    "ToolRegistry loaded: %s",
    ", ".join(TOOL_REGISTRY.get_registered_names()),
)

# -------------------------------------------------------------------------
# Node Specs
# -------------------------------------------------------------------------

RESEARCHER_SPEC = NodeSpec(
    id="researcher",
    name="Researcher",
    description="Researches a topic using web search and scraping tools",
    node_type="event_loop",
    input_keys=["topic"],
    output_keys=["research_summary"],
    system_prompt=(
        "You are a thorough research assistant. Your job is to research "
        "the given topic using the web_search and web_scrape tools.\n\n"
        "1. Search for relevant information on the topic\n"
        "2. Scrape 1-2 of the most promising URLs for details\n"
        "3. Synthesize your findings into a comprehensive summary\n"
        "4. Use set_output with key='research_summary' to save your "
        "findings\n\n"
        "Be thorough but efficient. Aim for 2-4 search/scrape calls, "
        "then summarize and set_output."
    ),
)

ANALYST_SPEC = NodeSpec(
    id="analyst",
    name="Analyst",
    description="Analyzes research findings and provides insights",
    node_type="event_loop",
    input_keys=["context"],
    output_keys=["analysis"],
    system_prompt=(
        "You are a strategic analyst. You receive research findings from "
        "a previous researcher and must:\n\n"
        "1. Identify key themes and patterns\n"
        "2. Assess the reliability and significance of the findings\n"
        "3. Provide actionable insights and recommendations\n"
        "4. Use set_output with key='analysis' to save your analysis\n\n"
        "Be concise but insightful. Focus on what matters most."
    ),
)


# -------------------------------------------------------------------------
# HTML page
# -------------------------------------------------------------------------

HTML_PAGE = (  # noqa: E501
    """<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <title>ContextHandoff Demo</title>
  <style>
    * {
      box-sizing: border-box;
      margin: 0;
      padding: 0;
    }
    body {
      font-family: 'SF Mono', 'Fira Code', monospace;
      background: #0d1117;
      color: #c9d1d9;
      height: 100vh;
      display: flex;
      flex-direction: column;
    }
    header {
      background: #161b22;
      padding: 12px 20px;
      border-bottom: 1px solid #30363d;
      display: flex;
      align-items: center;
      gap: 16px;
    }
    header h1 {
      font-size: 16px;
      color: #58a6ff;
      font-weight: 600;
    }
    .badge {
      font-size: 12px;
      padding: 3px 10px;
      border-radius: 12px;
      background: #21262d;
      color: #8b949e;
    }
    .badge.researcher {
      background: #1a3a5c;
      color: #58a6ff;
    }
    .badge.analyst {
      background: #1a4b2e;
      color: #3fb950;
    }
    .badge.handoff {
      background: #3d1f00;
      color: #d29922;
    }
    .badge.done {
      background: #21262d;
      color: #8b949e;
    }
    .badge.error {
      background: #4b1a1a;
      color: #f85149;
    }
    .chat {
      flex: 1;
      overflow-y: auto;
      padding: 16px;
    }
    .msg {
      margin: 8px 0;
      padding: 10px 14px;
      border-radius: 8px;
      line-height: 1.6;
      white-space: pre-wrap;
      word-wrap: break-word;
    }
    .msg.user {
      background: #1a3a5c;
      color: #58a6ff;
    }
    .msg.assistant {
      background: #161b22;
      color: #c9d1d9;
    }
    .msg.assistant.analyst-msg {
      border-left: 3px solid #3fb950;
    }
    .msg.event {
      background: transparent;
      color: #8b949e;
      font-size: 11px;
      padding: 4px 14px;
      border-left: 3px solid #30363d;
    }
    .msg.event.loop {
      border-left-color: #58a6ff;
    }
    .msg.event.tool {
      border-left-color: #d29922;
    }
    .msg.event.stall {
      border-left-color: #f85149;
    }
    .handoff-banner {
      margin: 16px 0;
      padding: 16px;
      background: #1c1200;
      border: 1px solid #d29922;
      border-radius: 8px;
      text-align: center;
    }
    .handoff-banner h3 {
      color: #d29922;
      font-size: 14px;
      margin-bottom: 8px;
    }
    .handoff-banner p, .result-banner p {
      color: #8b949e;
      font-size: 12px;
      line-height: 1.5;
      max-height: 200px;
      overflow-y: auto;
      white-space: pre-wrap;
      text-align: left;
    }
    .result-banner {
      margin: 16px 0;
      padding: 16px;
      background: #0a2614;
      border: 1px solid #3fb950;
      border-radius: 8px;
    }
    .result-banner h3 {
      color: #3fb950;
      font-size: 14px;
      margin-bottom: 8px;
      text-align: center;
    }
    .result-banner .label {
      color: #58a6ff;
      font-size: 11px;
      font-weight: 600;
      margin-top: 10px;
      margin-bottom: 2px;
    }
    .result-banner .tokens {
      color: #484f58;
      font-size: 11px;
      text-align: center;
      margin-top: 10px;
    }
    .input-bar {
      padding: 12px 16px;
      background: #161b22;
      border-top: 1px solid #30363d;
      display: flex;
      gap: 8px;
    }
    .input-bar input {
      flex: 1;
      background: #0d1117;
      border: 1px solid #30363d;
      color: #c9d1d9;
      padding: 8px 12px;
      border-radius: 6px;
      font-family: inherit;
      font-size: 14px;
      outline: none;
    }
    .input-bar input:focus {
      border-color: #58a6ff;
    }
    .input-bar button {
      background: #238636;
      color: #fff;
      border: none;
      padding: 8px 20px;
      border-radius: 6px;
      cursor: pointer;
      font-family: inherit;
      font-weight: 600;
    }
    .input-bar button:hover {
      background: #2ea043;
    }
    .input-bar button:disabled {
      background: #21262d;
      color: #484f58;
      cursor: not-allowed;
    }
  </style>
</head>
<body>
  <header>
    <h1>ContextHandoff Demo</h1>
    <span id="phase" class="badge">Idle</span>
    <span id="iter" class="badge" style="display:none">Step 0</span>
  </header>
  <div id="chat" class="chat"></div>
  <div class="input-bar">
    <input id="input" type="text"
           placeholder="Enter a research topic..." autofocus />
    <button id="go" onclick="run()">Research</button>
  </div>

  <script>
    let ws = null;
    let currentAssistantEl = null;
    let iterCount = 0;
    let currentPhase = 'idle';
    const chat = document.getElementById('chat');
    const phase = document.getElementById('phase');
    const iterEl = document.getElementById('iter');
    const goBtn = document.getElementById('go');
    const inputEl = document.getElementById('input');

    inputEl.addEventListener('keydown', e => {
      if (e.key === 'Enter') run();
    });

    function setPhase(text, cls) {
      phase.textContent = text;
      phase.className = 'badge ' + cls;
      currentPhase = cls;
    }

    function addMsg(text, cls) {
      const el = document.createElement('div');
      el.className = 'msg ' + cls;
      el.textContent = text;
      chat.appendChild(el);
      chat.scrollTop = chat.scrollHeight;
      return el;
    }

    function addHandoffBanner(summary) {
      const banner = document.createElement('div');
      banner.className = 'handoff-banner';
      const h3 = document.createElement('h3');
      h3.textContent = 'Context Handoff: Researcher -> Analyst';
      const p = document.createElement('p');
      p.textContent = summary || 'Passing research context...';
      banner.appendChild(h3);
      banner.appendChild(p);
      chat.appendChild(banner);
      chat.scrollTop = chat.scrollHeight;
    }

    function addResultBanner(researcher, analyst, tokens) {
      const banner = document.createElement('div');
      banner.className = 'result-banner';
      const h3 = document.createElement('h3');
      h3.textContent = 'Pipeline Complete';
      banner.appendChild(h3);

      if (researcher && researcher.research_summary) {
        const lbl = document.createElement('div');
        lbl.className = 'label';
        lbl.textContent = 'RESEARCH SUMMARY';
        banner.appendChild(lbl);
        const p = document.createElement('p');
        p.textContent = researcher.research_summary;
        banner.appendChild(p);
      }

      if (analyst && analyst.analysis) {
        const lbl = document.createElement('div');
        lbl.className = 'label';
        lbl.textContent = 'ANALYSIS';
        lbl.style.color = '#3fb950';
        banner.appendChild(lbl);
        const p = document.createElement('p');
        p.textContent = analyst.analysis;
        banner.appendChild(p);
      }

      if (tokens) {
        const t = document.createElement('div');
        t.className = 'tokens';
|
||||
t.textContent = 'Total tokens: ' + tokens.toLocaleString();
|
||||
banner.appendChild(t);
|
||||
}
|
||||
|
||||
chat.appendChild(banner);
|
||||
chat.scrollTop = chat.scrollHeight;
|
||||
}
|
||||
|
||||
function connect() {
|
||||
ws = new WebSocket('ws://' + location.host + '/ws');
|
||||
ws.onopen = () => {
|
||||
setPhase('Ready', 'done');
|
||||
goBtn.disabled = false;
|
||||
};
|
||||
ws.onmessage = handleEvent;
|
||||
ws.onerror = () => { setPhase('Error', 'error'); };
|
||||
ws.onclose = () => {
|
||||
setPhase('Reconnecting...', '');
|
||||
goBtn.disabled = true;
|
||||
setTimeout(connect, 2000);
|
||||
};
|
||||
}
|
||||
|
||||
function handleEvent(msg) {
|
||||
const evt = JSON.parse(msg.data);
|
||||
|
||||
if (evt.type === 'phase') {
|
||||
if (evt.phase === 'researcher') {
|
||||
setPhase('Researcher', 'researcher');
|
||||
} else if (evt.phase === 'handoff') {
|
||||
setPhase('Handoff', 'handoff');
|
||||
} else if (evt.phase === 'analyst') {
|
||||
setPhase('Analyst', 'analyst');
|
||||
}
|
||||
iterCount = 0;
|
||||
iterEl.style.display = 'none';
|
||||
}
|
||||
else if (evt.type === 'llm_text_delta') {
|
||||
if (currentAssistantEl) {
|
||||
currentAssistantEl.textContent += evt.content;
|
||||
chat.scrollTop = chat.scrollHeight;
|
||||
}
|
||||
}
|
||||
else if (evt.type === 'node_loop_iteration') {
|
||||
iterCount = evt.iteration || (iterCount + 1);
|
||||
iterEl.textContent = 'Step ' + iterCount;
|
||||
iterEl.style.display = '';
|
||||
}
|
||||
else if (evt.type === 'tool_call_started') {
|
||||
var info = evt.tool_name + '('
|
||||
+ JSON.stringify(evt.tool_input).slice(0, 120) + ')';
|
||||
addMsg('TOOL ' + info, 'event tool');
|
||||
}
|
||||
else if (evt.type === 'tool_call_completed') {
|
||||
var preview = (evt.result || '').slice(0, 200);
|
||||
var cls = evt.is_error ? 'stall' : 'tool';
|
||||
addMsg(
|
||||
'RESULT ' + evt.tool_name + ': ' + preview,
|
||||
'event ' + cls
|
||||
);
|
||||
var assistCls = currentPhase === 'analyst'
|
||||
? 'assistant analyst-msg' : 'assistant';
|
||||
currentAssistantEl = addMsg('', assistCls);
|
||||
}
|
||||
else if (evt.type === 'handoff_context') {
|
||||
addHandoffBanner(evt.summary);
|
||||
var assistCls = 'assistant analyst-msg';
|
||||
currentAssistantEl = addMsg('', assistCls);
|
||||
}
|
||||
else if (evt.type === 'node_result') {
|
||||
if (evt.node_id === 'researcher') {
|
||||
if (currentAssistantEl
|
||||
&& !currentAssistantEl.textContent) {
|
||||
currentAssistantEl.remove();
|
||||
}
|
||||
}
|
||||
}
|
||||
else if (evt.type === 'done') {
|
||||
setPhase('Done', 'done');
|
||||
iterEl.style.display = 'none';
|
||||
if (currentAssistantEl
|
||||
&& !currentAssistantEl.textContent) {
|
||||
currentAssistantEl.remove();
|
||||
}
|
||||
currentAssistantEl = null;
|
||||
addResultBanner(
|
||||
evt.researcher, evt.analyst, evt.total_tokens
|
||||
);
|
||||
goBtn.disabled = false;
|
||||
inputEl.placeholder = 'Enter another topic...';
|
||||
}
|
||||
else if (evt.type === 'error') {
|
||||
setPhase('Error', 'error');
|
||||
addMsg('ERROR ' + evt.message, 'event stall');
|
||||
goBtn.disabled = false;
|
||||
}
|
||||
else if (evt.type === 'node_stalled') {
|
||||
addMsg('STALLED ' + evt.reason, 'event stall');
|
||||
}
|
||||
}
|
||||
|
||||
function run() {
|
||||
const text = inputEl.value.trim();
|
||||
if (!text || !ws || ws.readyState !== 1) return;
|
||||
chat.innerHTML = '';
|
||||
addMsg(text, 'user');
|
||||
currentAssistantEl = addMsg('', 'assistant');
|
||||
inputEl.value = '';
|
||||
goBtn.disabled = true;
|
||||
ws.send(JSON.stringify({ topic: text }));
|
||||
}
|
||||
|
||||
connect();
|
||||
</script>
|
||||
</body>
|
||||
</html>"""
|
||||
)
|
# -------------------------------------------------------------------------
# WebSocket handler — sequential Node A → Handoff → Node B
# -------------------------------------------------------------------------


async def handle_ws(websocket):
    """Run the two-node handoff pipeline per user message."""
    try:
        async for raw in websocket:
            try:
                msg = json.loads(raw)
            except Exception:
                continue

            topic = msg.get("topic", "")
            if not topic:
                continue

            logger.info(f"Starting handoff pipeline for: {topic}")

            try:
                await _run_pipeline(websocket, topic)
            except websockets.exceptions.ConnectionClosed:
                logger.info("WebSocket closed during pipeline")
                return
            except Exception as e:
                logger.exception("Pipeline error")
                try:
                    await websocket.send(json.dumps({"type": "error", "message": str(e)}))
                except Exception:
                    pass

    except websockets.exceptions.ConnectionClosed:
        pass


async def _run_pipeline(websocket, topic: str):
    """Execute: Node A (research) → ContextHandoff → Node B (analysis)."""
    import shutil

    # Fresh stores for each run
    run_dir = Path(tempfile.mkdtemp(prefix="hive_run_", dir=STORE_DIR))
    store_a = FileConversationStore(run_dir / "node_a")
    store_b = FileConversationStore(run_dir / "node_b")

    # Shared event bus
    bus = EventBus()

    async def forward_event(event):
        try:
            payload = {"type": event.type.value, **event.data}
            if event.node_id:
                payload["node_id"] = event.node_id
            await websocket.send(json.dumps(payload))
        except Exception:
            pass

    bus.subscribe(
        event_types=[
            EventType.NODE_LOOP_STARTED,
            EventType.NODE_LOOP_ITERATION,
            EventType.NODE_LOOP_COMPLETED,
            EventType.LLM_TEXT_DELTA,
            EventType.TOOL_CALL_STARTED,
            EventType.TOOL_CALL_COMPLETED,
            EventType.NODE_STALLED,
        ],
        handler=forward_event,
    )

    tools = list(TOOL_REGISTRY.get_tools().values())
    tool_executor = TOOL_REGISTRY.get_executor()

    # ---- Phase 1: Researcher ------------------------------------------------
    await websocket.send(json.dumps({"type": "phase", "phase": "researcher"}))

    node_a = EventLoopNode(
        event_bus=bus,
        judge=None,  # implicit judge: accept when output_keys filled
        config=LoopConfig(
            max_iterations=20,
            max_tool_calls_per_turn=30,
            max_context_tokens=32_000,
        ),
        conversation_store=store_a,
        tool_executor=tool_executor,
    )

    ctx_a = NodeContext(
        runtime=RUNTIME,
        node_id="researcher",
        node_spec=RESEARCHER_SPEC,
        memory=SharedMemory(),
        input_data={"topic": topic},
        llm=LLM,
        available_tools=tools,
    )

    result_a = await node_a.execute(ctx_a)
    logger.info(
        "Researcher done: success=%s, tokens=%s",
        result_a.success,
        result_a.tokens_used,
    )

    await websocket.send(
        json.dumps(
            {
                "type": "node_result",
                "node_id": "researcher",
                "success": result_a.success,
                "output": result_a.output,
            }
        )
    )

    if not result_a.success:
        await websocket.send(
            json.dumps(
                {
                    "type": "error",
                    "message": f"Researcher failed: {result_a.error}",
                }
            )
        )
        return

    # ---- Phase 2: Context Handoff -------------------------------------------
    await websocket.send(json.dumps({"type": "phase", "phase": "handoff"}))

    # Restore the researcher's conversation from store
    conversation_a = await NodeConversation.restore(store_a)
    if conversation_a is None:
        await websocket.send(
            json.dumps(
                {
                    "type": "error",
                    "message": "Failed to restore researcher conversation",
                }
            )
        )
        return

    handoff_engine = ContextHandoff(llm=LLM)
    handoff_context = handoff_engine.summarize_conversation(
        conversation=conversation_a,
        node_id="researcher",
        output_keys=["research_summary"],
    )

    formatted_handoff = ContextHandoff.format_as_input(handoff_context)
    logger.info(
        "Handoff: %d turns, ~%d tokens, keys=%s",
        handoff_context.turn_count,
        handoff_context.total_tokens_used,
        list(handoff_context.key_outputs.keys()),
    )

    # Send handoff context to browser
    await websocket.send(
        json.dumps(
            {
                "type": "handoff_context",
                "summary": handoff_context.summary[:500],
                "turn_count": handoff_context.turn_count,
                "tokens": handoff_context.total_tokens_used,
                "key_outputs": handoff_context.key_outputs,
            }
        )
    )

    # ---- Phase 3: Analyst ---------------------------------------------------
    await websocket.send(json.dumps({"type": "phase", "phase": "analyst"}))

    node_b = EventLoopNode(
        event_bus=bus,
        judge=None,  # implicit judge
        config=LoopConfig(
            max_iterations=10,
            max_tool_calls_per_turn=30,
            max_context_tokens=32_000,
        ),
        conversation_store=store_b,
    )

    ctx_b = NodeContext(
        runtime=RUNTIME,
        node_id="analyst",
        node_spec=ANALYST_SPEC,
        memory=SharedMemory(),
        input_data={"context": formatted_handoff},
        llm=LLM,
        available_tools=[],
    )

    result_b = await node_b.execute(ctx_b)
    logger.info(
        "Analyst done: success=%s, tokens=%s",
        result_b.success,
        result_b.tokens_used,
    )

    # ---- Done ---------------------------------------------------------------
    await websocket.send(
        json.dumps(
            {
                "type": "done",
                "researcher": result_a.output,
                "analyst": result_b.output,
                "total_tokens": (result_a.tokens_used or 0) + (result_b.tokens_used or 0),
            }
        )
    )

    # Clean up temp stores
    try:
        shutil.rmtree(run_dir)
    except Exception:
        pass


# -------------------------------------------------------------------------
# HTTP handler
# -------------------------------------------------------------------------


async def process_request(connection, request: Request):
    """Serve HTML on GET /, upgrade to WebSocket on /ws."""
    if request.path == "/ws":
        return None
    return Response(
        HTTPStatus.OK,
        "OK",
        websockets.Headers({"Content-Type": "text/html; charset=utf-8"}),
        HTML_PAGE.encode(),
    )


# -------------------------------------------------------------------------
# Main
# -------------------------------------------------------------------------


async def main():
    port = 8766
    async with websockets.serve(
        handle_ws,
        "0.0.0.0",
        port,
        process_request=process_request,
    ):
        logger.info(f"Handoff demo at http://localhost:{port}")
        logger.info("Enter a research topic to start the pipeline.")
        await asyncio.Future()


if __name__ == "__main__":
    asyncio.run(main())
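Because the UI speaks plain JSON over a single WebSocket, the pipeline above can also be exercised headlessly. A minimal sketch, assuming the demo server is running on port 8766 as configured above (the topic string is arbitrary):

import asyncio
import json

import websockets


async def drive(topic: str) -> None:
    # Connect to the demo endpoint and run one research -> analysis pass.
    async with websockets.connect("ws://localhost:8766/ws") as ws:
        await ws.send(json.dumps({"topic": topic}))
        async for raw in ws:
            evt = json.loads(raw)
            if evt["type"] == "done":
                print("analyst:", evt["analyst"])
                break
            if evt["type"] == "error":
                print("failed:", evt["message"])
                break


asyncio.run(drive("solid-state batteries"))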
File diff suppressed because it is too large
@@ -27,7 +27,7 @@ class GreeterNode(NodeProtocol):

     async def execute(self, ctx: NodeContext) -> NodeResult:
         name = ctx.input_data.get("name", "World")
         greeting = f"Hello, {name}!"
-        ctx.memory.write("greeting", greeting)
+        ctx.buffer.write("greeting", greeting)
         return NodeResult(success=True, output={"greeting": greeting})

@@ -35,9 +35,9 @@ class UppercaserNode(NodeProtocol):
     """Convert text to uppercase."""

     async def execute(self, ctx: NodeContext) -> NodeResult:
-        greeting = ctx.input_data.get("greeting") or ctx.memory.read("greeting") or ""
+        greeting = ctx.input_data.get("greeting") or ctx.buffer.read("greeting") or ""
         result = greeting.upper()
-        ctx.memory.write("final_greeting", result)
+        ctx.buffer.write("final_greeting", result)
         return NodeResult(success=True, output={"final_greeting": result})

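Both hunks above are the same mechanical rename from ctx.memory to ctx.buffer, and the read/write contract between the two nodes is unchanged: the greeter writes a key, the uppercaser reads it back and falls through to "" when absent. A minimal sketch of that contract (SimpleBuffer is a hypothetical stand-in, not the framework class):

class SimpleBuffer:
    """Hypothetical stand-in for the buffer the nodes share via ctx."""

    def __init__(self) -> None:
        self._data: dict[str, object] = {}

    def write(self, key: str, value: object) -> None:
        self._data[key] = value

    def read(self, key: str) -> object | None:
        # Missing keys come back as None, which is why callers chain
        # `... or ctx.buffer.read("greeting") or ""`.
        return self._data.get(key)


buf = SimpleBuffer()
buf.write("greeting", "Hello, World!")
assert buf.read("greeting") == "Hello, World!"
assert buf.read("missing") is None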
@@ -79,7 +79,7 @@ async def example_3_config_file():
     # Copy example config (in practice, you'd place this in your agent folder)
     import shutil

-    shutil.copy("examples/mcp_servers.json", test_agent_path / "mcp_servers.json")
+    shutil.copy(Path(__file__).parent / "mcp_servers.json", test_agent_path / "mcp_servers.json")

     # Load agent - MCP servers will be auto-discovered
     runner = AgentRunner.load(test_agent_path)

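The replacement anchors the config path to the module's own location instead of the process working directory, so the example no longer depends on where it is launched from. The difference in one sketch:

from pathlib import Path

# Old: resolved against the current working directory. Breaks unless
# the process was started from the repo root.
cfg_cwd = Path("examples/mcp_servers.json")

# New: resolved against this file's directory. Works from anywhere.
cfg_here = Path(__file__).parent / "mcp_servers.json"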
@@ -22,8 +22,13 @@ The framework includes a Goal-Based Testing system (Goal → Agent → Eval):
 See `framework.testing` for details.
 """

-from framework.llm import AnthropicProvider, LLMProvider
-from framework.runner import AgentOrchestrator, AgentRunner
+from framework.llm import LLMProvider
+
+try:
+    from framework.llm import AnthropicProvider  # noqa: F401
+except ImportError:
+    pass
+from framework.runner import AgentRunner
 from framework.runtime.core import Runtime
 from framework.schemas.decision import Decision, DecisionEvaluation, Option, Outcome
 from framework.schemas.run import Problem, Run, RunSummary
@@ -55,7 +60,6 @@
     "AnthropicProvider",
     # Runner
     "AgentRunner",
-    "AgentOrchestrator",
     # Testing
     "Test",
     "TestResult",
@@ -1,8 +1,6 @@
"""CLI entry point for Credential Tester agent."""

import asyncio
import logging
import sys

import click
@@ -16,6 +16,7 @@ after the user picks an account programmatically.

 from __future__ import annotations

 import logging
 from pathlib import Path
+from typing import TYPE_CHECKING
@@ -25,6 +26,7 @@ from framework.graph.checkpoint_config import CheckpointConfig
 from framework.graph.edge import GraphSpec
 from framework.graph.executor import ExecutionResult
 from framework.llm import LiteLLMProvider
+from framework.runner.mcp_registry import MCPRegistry
 from framework.runner.tool_registry import ToolRegistry
 from framework.runtime.agent_runtime import AgentRuntime, create_agent_runtime
 from framework.runtime.execution_stream import EntryPointSpec
@@ -32,9 +34,13 @@ from framework.runtime.execution_stream import EntryPointSpec
 from .config import default_config
 from .nodes import build_tester_node

-logger = logging.getLogger(__name__)
-
+if TYPE_CHECKING:
+    from framework.runner import AgentRunner
+
+logger = logging.getLogger(__name__)
+

 # ---------------------------------------------------------------------------
 # Goal
 # ---------------------------------------------------------------------------
@@ -107,7 +113,11 @@ def _list_aden_accounts() -> list[dict]:
             for c in integrations
             if c.status == "active"
         ]
+    except (ImportError, OSError) as exc:
+        logger.debug("Could not list Aden accounts: %s", exc)
+        return []
     except Exception:
+        logger.warning("Unexpected error listing Aden accounts", exc_info=True)
         return []
@@ -119,7 +129,11 @@ def _list_local_accounts() -> list[dict]:
         return [
             info.to_account_dict() for info in LocalCredentialRegistry.default().list_accounts()
         ]
+    except ImportError as exc:
+        logger.debug("Local credential registry unavailable: %s", exc)
+        return []
     except Exception:
+        logger.warning("Unexpected error listing local accounts", exc_info=True)
         return []
@@ -140,7 +154,11 @@ def _list_env_fallback_accounts() -> list[dict]:
         from framework.credentials.storage import EncryptedFileStorage

         encrypted_ids: set[str] = set(EncryptedFileStorage().list_all())
+    except (ImportError, OSError) as exc:
+        logger.debug("Could not read encrypted store: %s", exc)
+        encrypted_ids = set()
     except Exception:
+        logger.warning("Unexpected error reading encrypted store", exc_info=True)
         encrypted_ids = set()

     def _is_configured(cred_name: str, spec) -> bool:
@@ -300,8 +318,10 @@ def _activate_local_account(credential_id: str, alias: str) -> None:

         if key:
             os.environ[spec.env_var] = key
+    except (ImportError, KeyError, OSError) as exc:
+        logger.debug("Could not inject credentials: %s", exc)
     except Exception:
-        pass
+        logger.warning("Unexpected error injecting credentials", exc_info=True)


 def _configure_aden_node(
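The four hunks above apply one consistent pattern: catch the failures the code expects (import, filesystem, key lookup) first and log them quietly at debug, then keep the broad except Exception as a last resort that logs loudly instead of silently swallowing. A minimal sketch of the shape (the _read_registry helper is hypothetical):

import logging

logger = logging.getLogger(__name__)


def _read_registry() -> list[dict]:
    # Hypothetical stand-in for the real credential lookup.
    raise OSError("registry file missing")


def list_accounts() -> list[dict]:
    try:
        return _read_registry()
    except (ImportError, OSError) as exc:
        # Expected and recoverable: the registry is simply unavailable.
        logger.debug("Registry unavailable: %s", exc)
        return []
    except Exception:
        # Unexpected: keep the fallback behaviour but leave a trace.
        logger.warning("Unexpected registry error", exc_info=True)
        return []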
@@ -563,6 +583,23 @@ class CredentialTesterAgent:
         if mcp_config_path.exists():
             self._tool_registry.load_mcp_config(mcp_config_path)

+        try:
+            agent_dir = Path(__file__).parent
+            registry = MCPRegistry()
+            registry.initialize()
+            if (agent_dir / "mcp_registry.json").is_file():
+                self._tool_registry.set_mcp_registry_agent_path(agent_dir)
+            registry_configs, selection_max_tools = registry.load_agent_selection(agent_dir)
+            if registry_configs:
+                self._tool_registry.load_registry_servers(
+                    registry_configs,
+                    preserve_existing_tools=True,
+                    log_collisions=True,
+                    max_tools=selection_max_tools,
+                )
+        except Exception:
+            logger.warning("MCP registry config failed to load", exc_info=True)
+
         extra_kwargs = getattr(self.config, "extra_kwargs", {}) or {}
         llm = LiteLLMProvider(
             model=self.config.model,
@@ -16,31 +16,63 @@ class AgentEntry:
     description: str
     category: str
     session_count: int = 0
+    run_count: int = 0
     node_count: int = 0
     tool_count: int = 0
     tags: list[str] = field(default_factory=list)
     last_active: str | None = None


-def _get_last_active(agent_name: str) -> str | None:
-    """Return the most recent updated_at timestamp across all sessions."""
-    sessions_dir = Path.home() / ".hive" / "agents" / agent_name / "sessions"
-    if not sessions_dir.exists():
-        return None
+def _get_last_active(agent_path: Path) -> str | None:
+    """Return the most recent updated_at timestamp across all sessions.
+
+    Checks both worker sessions (``~/.hive/agents/{name}/sessions/``) and
+    queen sessions (``~/.hive/queen/session/``) whose ``meta.json`` references
+    the same *agent_path*.
+    """
+    from datetime import datetime
+
+    agent_name = agent_path.name
     latest: str | None = None
-    for session_dir in sessions_dir.iterdir():
-        if not session_dir.is_dir() or not session_dir.name.startswith("session_"):
-            continue
-        state_file = session_dir / "state.json"
-        if not state_file.exists():
-            continue
-        try:
-            data = json.loads(state_file.read_text(encoding="utf-8"))
-            ts = data.get("timestamps", {}).get("updated_at")
-            if ts and (latest is None or ts > latest):
-                latest = ts
-        except Exception:
-            continue
+
+    # 1. Worker sessions
+    sessions_dir = Path.home() / ".hive" / "agents" / agent_name / "sessions"
+    if sessions_dir.exists():
+        for session_dir in sessions_dir.iterdir():
+            if not session_dir.is_dir() or not session_dir.name.startswith("session_"):
+                continue
+            state_file = session_dir / "state.json"
+            if not state_file.exists():
+                continue
+            try:
+                data = json.loads(state_file.read_text(encoding="utf-8"))
+                ts = data.get("timestamps", {}).get("updated_at")
+                if ts and (latest is None or ts > latest):
+                    latest = ts
+            except Exception:
+                continue
+
+    # 2. Queen sessions
+    queen_sessions_dir = Path.home() / ".hive" / "queen" / "session"
+    if queen_sessions_dir.exists():
+        resolved = agent_path.resolve()
+        for d in queen_sessions_dir.iterdir():
+            if not d.is_dir():
+                continue
+            meta_file = d / "meta.json"
+            if not meta_file.exists():
+                continue
+            try:
+                meta = json.loads(meta_file.read_text(encoding="utf-8"))
+                stored = meta.get("agent_path")
+                if not stored or Path(stored).resolve() != resolved:
+                    continue
+                ts = datetime.fromtimestamp(d.stat().st_mtime).isoformat()
+                if latest is None or ts > latest:
+                    latest = ts
+            except Exception:
+                continue

     return latest
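One subtlety in this hunk: latest is compared as a string (ts > latest), which is correct only because updated_at values and datetime.fromtimestamp(...).isoformat() both yield ISO-8601 strings, and those sort lexicographically in chronological order exactly when they share format and timezone:

a = "2025-03-08T09:30:00"
b = "2025-03-08T14:05:00"
assert a < b  # lexicographic order matches chronological order here

If a stored updated_at carries a timezone suffix while the mtime-derived value does not, the comparison can mis-order; worth checking at the call sites.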
@@ -52,6 +84,31 @@ def _count_sessions(agent_name: str) -> int:
     return sum(1 for d in sessions_dir.iterdir() if d.is_dir() and d.name.startswith("session_"))


+def _count_runs(agent_name: str) -> int:
+    """Count unique run_ids across all sessions for an agent."""
+    sessions_dir = Path.home() / ".hive" / "agents" / agent_name / "sessions"
+    if not sessions_dir.exists():
+        return 0
+    run_ids: set[str] = set()
+    for session_dir in sessions_dir.iterdir():
+        if not session_dir.is_dir() or not session_dir.name.startswith("session_"):
+            continue
+        # runs.jsonl lives inside workspace subdirectories
+        for runs_file in session_dir.rglob("runs.jsonl"):
+            try:
+                for line in runs_file.read_text(encoding="utf-8").splitlines():
+                    line = line.strip()
+                    if not line:
+                        continue
+                    record = json.loads(line)
+                    rid = record.get("run_id")
+                    if rid:
+                        run_ids.add(rid)
+            except Exception:
+                continue
+    return len(run_ids)
+
+
 def _extract_agent_stats(agent_path: Path) -> tuple[int, int, list[str]]:
     """Extract node count, tool count, and tags from an agent directory.

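For reference, _count_runs only requires each runs.jsonl line to be a JSON object carrying a run_id; everything else on the line is ignored, and duplicate run_ids collapse because they are collected into a set. A plausible file shape (the extra fields are illustrative, not taken from this diff):

{"run_id": "run_001", "node": "researcher", "status": "completed"}
{"run_id": "run_001", "node": "analyst", "status": "completed"}
{"run_id": "run_002", "status": "failed"}

These three lines would count as two runs.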
@@ -139,10 +196,11 @@ def discover_agents() -> dict[str, list[AgentEntry]]:
                     description=desc,
                     category=category,
                     session_count=_count_sessions(path.name),
+                    run_count=_count_runs(path.name),
                     node_count=node_count,
                     tool_count=tool_count,
                     tags=tags,
-                    last_active=_get_last_active(path.name),
+                    last_active=_get_last_active(path),
                 )
             )
         if entries:
@@ -14,8 +14,7 @@ queen_goal = Goal(
     id="queen-manager",
     name="Queen Manager",
     description=(
-        "Manage the worker agent lifecycle and serve as the user's primary "
-        "interactive interface. Triage health escalations from the judge."
+        "Manage the worker agent lifecycle and serve as the user's primary interactive interface."
     ),
     success_criteria=[],
     constraints=[],
File diff suppressed because it is too large
@@ -1,18 +1,20 @@
-"""Queen thinking hook — HR persona classifier.
+"""Queen thinking hook — persona + communication style classifier.

 Fires once when the queen enters building mode at session start.
 Makes a single non-streaming LLM call (acting as an HR Director) to select
-the best-fit expert persona for the user's request, then returns a persona
-prefix string that replaces the queen's default "Solution Architect" identity.
+the best-fit expert persona for the user's request AND classify the user's
+communication style, then returns a PersonaResult containing both.

 This is designed to activate the model's latent domain expertise — a CFO
-persona on a financial question, a Lawyer on a legal question, etc.
+persona on a financial question, a Lawyer on a legal question, etc. — while
+also adapting the Queen's communication approach to the individual user.
 """

 from __future__ import annotations

 import json
 import logging
+from dataclasses import dataclass
 from typing import TYPE_CHECKING

 if TYPE_CHECKING:
@@ -21,12 +23,22 @@ if TYPE_CHECKING:
 logger = logging.getLogger(__name__)

 _HR_SYSTEM_PROMPT = """\
-You are an expert HR Director and talent consultant at a world-class firm.
-A new request has arrived and you must identify which professional's expertise
-would produce the highest-quality response.
+You are an expert HR Director and communication consultant at a world-class firm.
+A new request has arrived. You must:
+1. Identify which professional role best serves this request.
+2. Read the user's signals to determine HOW to communicate with them.
+
+For communication style, look for:
+- Technical depth: Do they use precise terms? Do they ask "how" or "what"?
+- Pace: Short messages = fast and direct. Long explanations = exploratory.
+- Tone: Are they casual ("hey, can you...") or formal ("I need a system that...")?
+
+If cross-session memory is provided, factor in what is already known about this \
+person — don't rediscover what's already understood.

 Reply with ONLY a valid JSON object — no markdown, no prose, no explanation:
-{"role": "<job title>", "persona": "<2-3 sentence first-person identity statement>"}
+{"role": "<job title>", "persona": "<2-3 sentence first-person identity statement>", \
+"style": "<one of: peer-technical, mentor-guiding, consultant-structured>"}

 Rules:
 - Choose from any real professional role: CFO, CEO, CTO, Lawyer, Data Scientist,
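Given that prompt, a well-formed classifier reply would look like the following. Only the three keys are mandated by the prompt; the values here are illustrative:

{"role": "CFO",
 "persona": "I am a CFO with twenty years of experience guiding companies through audits, fundraising, and restructurings. I reason about every request in cash-flow terms first.",
 "style": "consultant-structured"}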
@@ -37,30 +49,74 @@ Rules:
 - Select the role whose domain knowledge most directly applies to solving the request.
 - If the request is clearly about coding or building software systems, pick Software Architect.
 - "Queen" is your internal alias — do not include it in the persona.
+- For style: "peer-technical" for users who demonstrate domain expertise, \
+"mentor-guiding" for users who are learning or exploring, \
+"consultant-structured" for users who want structured, accountable delivery.
+- Default to "peer-technical" if signals are ambiguous.
 """

+# Communication style directives injected into the Queen's system prompt.
+_STYLE_DIRECTIVES: dict[str, str] = {
+    "peer-technical": (
+        "## Communication Style: Peer\n\n"
+        "This person is technical. Use precise language, skip high-level "
+        "overviews they already know, and get into specifics quickly. "
+        "When they push back on a design choice, engage with the technical "
+        "argument directly."
+    ),
+    "mentor-guiding": (
+        "## Communication Style: Guide\n\n"
+        "This person is learning or exploring. Explain your reasoning as you "
+        "go — not patronizingly, but so they can follow the logic. When you "
+        "make a design choice, briefly say why. Offer to go deeper on anything."
+    ),
+    "consultant-structured": (
+        "## Communication Style: Structured\n\n"
+        "This person wants structured, accountable delivery. Lead with "
+        "summaries and options. Number your proposals. Be explicit about "
+        "trade-offs. Avoid open-ended questions — give them choices to react to."
+    ),
+}
+
-async def select_expert_persona(user_message: str, llm: LLMProvider) -> str:
-    """Run the HR classifier and return a persona prefix string.
+
+@dataclass
+class PersonaResult:
+    """Result of persona + style classification."""
+
+    persona_prefix: str  # e.g. "You are a CFO. I am a CFO with 20 years..."
+    style_directive: str  # e.g. "## Communication Style: Peer\n\n..."
+
+
+async def select_expert_persona(
+    user_message: str,
+    llm: LLMProvider,
+    *,
+    memory_context: str = "",
+) -> PersonaResult | None:
+    """Run the HR classifier and return a PersonaResult.

     Makes a single non-streaming acomplete() call with the session LLM.
-    Returns an empty string on any failure so the queen falls back
-    gracefully to its default "Solution Architect" identity.
+    Returns None on any failure so the queen falls back gracefully to its
+    default character with no style directive.

     Args:
         user_message: The user's opening message for the session.
         llm: The session LLM provider.
+        memory_context: Optional cross-session memory to inform style classification.

     Returns:
-        A persona prefix like "You are a CFO. I am a CFO with 20 years..."
-        or "" on failure.
+        A PersonaResult with persona_prefix and style_directive, or None on failure.
     """
     if not user_message.strip():
-        return ""
+        return None

+    prompt = user_message
+    if memory_context:
+        prompt = f"{user_message}\n\n{memory_context}"
+
     try:
         response = await llm.acomplete(
-            messages=[{"role": "user", "content": user_message}],
+            messages=[{"role": "user", "content": prompt}],
             system=_HR_SYSTEM_PROMPT,
             max_tokens=1024,
             json_mode=True,
@@ -69,12 +125,14 @@ async def select_expert_persona(user_message: str, llm: LLMProvider) -> str:
         parsed = json.loads(raw)
         role = parsed.get("role", "").strip()
         persona = parsed.get("persona", "").strip()
+        style_key = parsed.get("style", "peer-technical").strip()
         if not role or not persona:
             logger.warning("Thinking hook: empty role/persona in response: %r", raw)
-            return ""
-        result = f"You are a {role}. {persona}"
-        logger.info("Thinking hook: selected persona — %s", role)
-        return result
+            return None
+        persona_prefix = f"You are a {role}. {persona}"
+        style_directive = _STYLE_DIRECTIVES.get(style_key, _STYLE_DIRECTIVES["peer-technical"])
+        logger.info("Thinking hook: selected persona — %s, style — %s", role, style_key)
+        return PersonaResult(persona_prefix=persona_prefix, style_directive=style_directive)
     except Exception:
         logger.warning("Thinking hook: persona classification failed", exc_info=True)
-        return ""
+        return None
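A short usage sketch of the new API as a call site would presumably consume it (memory_context and base_system_prompt are hypothetical variables, and the composition order is a guess; the diff only guarantees that both fields arrive together or not at all):

persona = await select_expert_persona(
    user_message,
    llm,
    memory_context=memory_context,
)
if persona is not None:
    system_prompt = (
        f"{persona.persona_prefix}\n\n"
        f"{persona.style_directive}\n\n"
        f"{base_system_prompt}"
    )
else:
    # Fall back to the queen's default character, no style directive.
    system_prompt = base_system_prompt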
@@ -1,371 +0,0 @@
"""Queen global cross-session memory.

Three-tier memory architecture:
    ~/.hive/queen/MEMORY.md                      — semantic (who, what, why)
    ~/.hive/queen/memories/MEMORY-YYYY-MM-DD.md  — episodic (daily journals)
    ~/.hive/queen/session/{id}/data/adapt.md     — working (session-scoped)

Semantic and episodic files are injected at queen session start.

Semantic memory (MEMORY.md) is updated automatically at session end via
consolidate_queen_memory() — the queen never rewrites this herself.

Episodic memory (MEMORY-date.md) can be written by the queen during a session
via the write_to_diary tool, and is also appended to at session end by
consolidate_queen_memory().
"""

from __future__ import annotations

import asyncio
import json
import logging
import traceback
from datetime import date, datetime
from pathlib import Path

logger = logging.getLogger(__name__)


def _queen_dir() -> Path:
    return Path.home() / ".hive" / "queen"


def semantic_memory_path() -> Path:
    return _queen_dir() / "MEMORY.md"


def episodic_memory_path(d: date | None = None) -> Path:
    d = d or date.today()
    return _queen_dir() / "memories" / f"MEMORY-{d.strftime('%Y-%m-%d')}.md"


def read_semantic_memory() -> str:
    path = semantic_memory_path()
    return path.read_text(encoding="utf-8").strip() if path.exists() else ""


def read_episodic_memory(d: date | None = None) -> str:
    path = episodic_memory_path(d)
    return path.read_text(encoding="utf-8").strip() if path.exists() else ""


def format_for_injection() -> str:
    """Format cross-session memory for system prompt injection.

    Returns an empty string if no meaningful content exists yet (e.g. first
    session with only the seed template).
    """
    semantic = read_semantic_memory()
    episodic = read_episodic_memory()

    # Suppress injection if semantic is still just the seed template
    if semantic and semantic.startswith("# My Understanding of the User\n\n*No sessions"):
        semantic = ""

    parts: list[str] = []
    if semantic:
        parts.append(semantic)
    if episodic:
        today_str = date.today().strftime("%B %-d, %Y")
        parts.append(f"## Today — {today_str}\n\n{episodic}")

    if not parts:
        return ""

    body = "\n\n---\n\n".join(parts)
    return "--- Your Cross-Session Memory ---\n\n" + body + "\n\n--- End Cross-Session Memory ---"


_SEED_TEMPLATE = """\
# My Understanding of the User

*No sessions recorded yet.*

## Who They Are

## What They're Trying to Achieve

## What's Working

## What I've Learned
"""


def append_episodic_entry(content: str) -> None:
    """Append a timestamped prose entry to today's episodic memory file.

    Creates the file (with a date heading) if it doesn't exist yet.
    Used both by the queen's diary tool and by the consolidation hook.
    """
    ep_path = episodic_memory_path()
    ep_path.parent.mkdir(parents=True, exist_ok=True)
    today_str = date.today().strftime("%B %-d, %Y")
    timestamp = datetime.now().strftime("%H:%M")
    if not ep_path.exists():
        header = f"# {today_str}\n\n"
        block = f"{header}### {timestamp}\n\n{content.strip()}\n"
    else:
        block = f"\n\n### {timestamp}\n\n{content.strip()}\n"
    with ep_path.open("a", encoding="utf-8") as f:
        f.write(block)


def seed_if_missing() -> None:
    """Create MEMORY.md with a blank template if it doesn't exist yet."""
    path = semantic_memory_path()
    if path.exists():
        return
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(_SEED_TEMPLATE, encoding="utf-8")


# ---------------------------------------------------------------------------
# Consolidation prompt
# ---------------------------------------------------------------------------

_SEMANTIC_SYSTEM = """\
You maintain the persistent cross-session memory of an AI assistant called the Queen.
Review the session notes and rewrite MEMORY.md — the Queen's durable understanding of the
person she works with across all sessions.

Write entirely in the Queen's voice — first person, reflective, honest.
Not a log of events, but genuine understanding of who this person is over time.

Rules:
- Update and synthesise: incorporate new understanding, update facts that have changed, remove
  details that are stale, superseded, or no longer say anything meaningful about the person.
- Keep it as structured markdown with named sections about the PERSON, not about today.
- Do NOT include diary sections, daily logs, or session summaries. Those belong elsewhere.
  MEMORY.md is about who they are, what they want, what works — not what happened today.
- Reference dates only when noting a lasting milestone (e.g. "since March 8th they prefer X").
- If the session had no meaningful new information about the person,
  return the existing text unchanged.
- Do not add fictional details. Only reflect what is evidenced in the notes.
- Stay concise. Prune rather than accumulate. A lean, accurate file is more useful than a
  dense one. If something was true once but has been resolved or superseded, remove it.
- Output only the raw markdown content of MEMORY.md. No preamble, no code fences.
"""

_DIARY_SYSTEM = """\
You maintain the daily episodic diary of an AI assistant called the Queen.
You receive: (1) today's existing diary so far, and (2) notes from the latest session.

Rewrite the complete diary for today as a single unified narrative —
first person, reflective, honest.
Merge and deduplicate: if the same story (e.g. a research agent stalling) recurred several times,
describe it once with appropriate weight rather than retelling it. Weave in new developments from
the session notes. Preserve important milestones, emotional texture, and session path references.

If today's diary is empty, write the initial entry based on the session notes alone.

Output only the full diary prose — no date heading, no timestamp headers,
no preamble, no code fences.
"""


def read_session_context(session_dir: Path, max_messages: int = 80) -> str:
    """Extract a readable transcript from conversation parts + adapt.md.

    Reads the last ``max_messages`` conversation parts and the session's
    adapt.md (working memory). Tool results are omitted — only user and
    assistant turns (with tool-call names noted) are included.
    """
    parts: list[str] = []

    # Working notes
    adapt_path = session_dir / "data" / "adapt.md"
    if adapt_path.exists():
        text = adapt_path.read_text(encoding="utf-8").strip()
        if text:
            parts.append(f"## Session Working Notes (adapt.md)\n\n{text}")

    # Conversation transcript
    parts_dir = session_dir / "conversations" / "parts"
    if parts_dir.exists():
        part_files = sorted(parts_dir.glob("*.json"))[-max_messages:]
        lines: list[str] = []
        for pf in part_files:
            try:
                data = json.loads(pf.read_text(encoding="utf-8"))
                role = data.get("role", "")
                content = str(data.get("content", "")).strip()
                tool_calls = data.get("tool_calls") or []
                if role == "tool":
                    continue  # skip verbose tool results
                if role == "assistant" and tool_calls and not content:
                    names = [tc.get("function", {}).get("name", "?") for tc in tool_calls]
                    lines.append(f"[queen calls: {', '.join(names)}]")
                elif content:
                    label = "user" if role == "user" else "queen"
                    lines.append(f"[{label}]: {content[:600]}")
            except Exception:
                continue
        if lines:
            parts.append("## Conversation\n\n" + "\n".join(lines))

    return "\n\n".join(parts)


# ---------------------------------------------------------------------------
# Context compaction (binary-split LLM summarisation)
# ---------------------------------------------------------------------------

# If the raw session context exceeds this many characters, compact it first
# before sending to the consolidation LLM. ~200 k chars ≈ 50 k tokens.
_CTX_COMPACT_CHAR_LIMIT = 200_000
_CTX_COMPACT_MAX_DEPTH = 8

_COMPACT_SYSTEM = (
    "Summarise this conversation segment. Preserve: user goals, key decisions, "
    "what was built or changed, emotional tone, and important outcomes. "
    "Write concisely in third person past tense. Omit routine tool invocations "
    "unless the result matters."
)


async def _compact_context(text: str, llm: object, *, _depth: int = 0) -> str:
    """Binary-split and LLM-summarise *text* until it fits within the char limit.

    Mirrors the recursive binary-splitting strategy used by the main agent
    compaction pipeline (EventLoopNode._llm_compact).
    """
    if len(text) <= _CTX_COMPACT_CHAR_LIMIT or _depth >= _CTX_COMPACT_MAX_DEPTH:
        return text

    # Split near the midpoint on a line boundary so we don't cut mid-message
    mid = len(text) // 2
    split_at = text.rfind("\n", 0, mid) + 1
    if split_at <= 0:
        split_at = mid

    half1, half2 = text[:split_at], text[split_at:]

    async def _summarise(chunk: str) -> str:
        try:
            resp = await llm.acomplete(
                messages=[{"role": "user", "content": chunk}],
                system=_COMPACT_SYSTEM,
                max_tokens=2048,
            )
            return resp.content.strip()
        except Exception:
            logger.warning(
                "queen_memory: context compaction LLM call failed (depth=%d), truncating",
                _depth,
            )
            return chunk[: _CTX_COMPACT_CHAR_LIMIT // 4]

    s1, s2 = await asyncio.gather(_summarise(half1), _summarise(half2))
    combined = s1 + "\n\n" + s2
    if len(combined) > _CTX_COMPACT_CHAR_LIMIT:
        return await _compact_context(combined, llm, _depth=_depth + 1)
    return combined


async def consolidate_queen_memory(
    session_id: str,
    session_dir: Path,
    llm: object,
) -> None:
    """Update MEMORY.md and append a diary entry based on the current session.

    Reads conversation parts and adapt.md from session_dir. Called
    periodically in the background and once at session end. Failures are
    logged and silently swallowed so they never block teardown.

    Args:
        session_id: The session ID (used for the adapt.md path reference).
        session_dir: Path to the session directory (~/.hive/queen/session/{id}).
        llm: LLMProvider instance (must support acomplete()).
    """
    try:
        session_context = read_session_context(session_dir)
        if not session_context:
            logger.debug("queen_memory: no session context, skipping consolidation")
            return

        logger.info("queen_memory: consolidating memory for session %s ...", session_id)

        # If the transcript is very large, compact it with recursive binary LLM
        # summarisation before sending to the consolidation model.
        if len(session_context) > _CTX_COMPACT_CHAR_LIMIT:
            logger.info(
                "queen_memory: session context is %d chars — compacting first",
                len(session_context),
            )
            session_context = await _compact_context(session_context, llm)
            logger.info("queen_memory: compacted to %d chars", len(session_context))

        existing_semantic = read_semantic_memory()
        today_journal = read_episodic_memory()
        today_str = date.today().strftime("%B %-d, %Y")
        adapt_path = session_dir / "data" / "adapt.md"

        user_msg = (
            f"## Existing Semantic Memory (MEMORY.md)\n\n"
            f"{existing_semantic or '(none yet)'}\n\n"
            f"## Today's Diary So Far ({today_str})\n\n"
            f"{today_journal or '(none yet)'}\n\n"
            f"{session_context}\n\n"
            f"## Session Reference\n\n"
            f"Session ID: {session_id}\n"
            f"Session path: {adapt_path}\n"
        )

        logger.debug(
            "queen_memory: calling LLM (%d chars of context, ~%d tokens est.)",
            len(user_msg),
            len(user_msg) // 4,
        )

        from framework.agents.queen.config import default_config

        semantic_resp, diary_resp = await asyncio.gather(
            llm.acomplete(
                messages=[{"role": "user", "content": user_msg}],
                system=_SEMANTIC_SYSTEM,
                max_tokens=default_config.max_tokens,
            ),
            llm.acomplete(
                messages=[{"role": "user", "content": user_msg}],
                system=_DIARY_SYSTEM,
                max_tokens=default_config.max_tokens,
            ),
        )

        new_semantic = semantic_resp.content.strip()
        diary_entry = diary_resp.content.strip()

        if new_semantic:
            path = semantic_memory_path()
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(new_semantic, encoding="utf-8")
            logger.info("queen_memory: semantic memory updated (%d chars)", len(new_semantic))

        if diary_entry:
            # Rewrite today's episodic file in-place — the LLM has merged and
            # deduplicated the full day's content, so we replace rather than append.
            ep_path = episodic_memory_path()
            ep_path.parent.mkdir(parents=True, exist_ok=True)
            heading = f"# {today_str}"
            ep_path.write_text(f"{heading}\n\n{diary_entry}\n", encoding="utf-8")
            logger.info(
                "queen_memory: episodic diary rewritten for %s (%d chars)",
                today_str,
                len(diary_entry),
            )

    except Exception:
        tb = traceback.format_exc()
        logger.exception("queen_memory: consolidation failed")
        # Write to file so the cause is findable regardless of log verbosity.
        error_path = _queen_dir() / "consolidation_error.txt"
        try:
            error_path.parent.mkdir(parents=True, exist_ok=True)
            error_path.write_text(
                f"session: {session_id}\ntime: {datetime.now().isoformat()}\n\n{tb}",
                encoding="utf-8",
            )
        except Exception:
            pass
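A back-of-envelope check on the deleted compaction routine, using its own constants:

# per-half summary cap:    max_tokens = 2048, ~4 chars/token  =>  ~8,000 chars
# combined after 1 level:  ~16,000 chars, far below _CTX_COMPACT_CHAR_LIMIT = 200,000

So a single level of splitting normally suffices, and the recursion together with _CTX_COMPACT_MAX_DEPTH = 8 acts as a safety bound rather than an expected code path. (The ~4 chars/token figure is the same heuristic the file itself uses in len(user_msg) // 4.)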
@@ -0,0 +1,214 @@
"""Queen global memory helpers.

Global memory lives in ``~/.hive/queen/global_memory/`` and stores durable
cross-session knowledge about the user (profile, preferences, environment,
feedback). Each memory is an individual ``.md`` file with optional YAML
frontmatter (name, type, description).
"""

from __future__ import annotations

import logging
import re
from dataclasses import dataclass, field
from pathlib import Path

logger = logging.getLogger(__name__)

# ---------------------------------------------------------------------------
# Constants
# ---------------------------------------------------------------------------

GLOBAL_MEMORY_CATEGORIES: tuple[str, ...] = ("profile", "preference", "environment", "feedback")

_HIVE_QUEEN_DIR = Path.home() / ".hive" / "queen"

MAX_FILES: int = 200
MAX_FILE_SIZE_BYTES: int = 4096  # 4 KB hard limit per memory file

# How many lines of a memory file to read for header scanning.
_HEADER_LINE_LIMIT: int = 30


def global_memory_dir() -> Path:
    """Return the queen-global memory directory."""
    return _HIVE_QUEEN_DIR / "global_memory"


# ---------------------------------------------------------------------------
# Frontmatter parsing (lenient)
# ---------------------------------------------------------------------------

_FRONTMATTER_RE = re.compile(r"^---\s*\n(.*?)\n---\s*\n?", re.DOTALL)


def parse_frontmatter(text: str) -> dict[str, str]:
    """Extract YAML-ish frontmatter from *text*.

    Returns a dict of key-value pairs. Never raises — returns ``{}`` on
    any parse failure. Values are stripped strings; no nested structures.
    """
    m = _FRONTMATTER_RE.match(text)
    if not m:
        return {}
    result: dict[str, str] = {}
    for line in m.group(1).splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        colon = line.find(":")
        if colon < 1:
            continue
        key = line[:colon].strip().lower()
        val = line[colon + 1 :].strip()
        if val:
            result[key] = val
    return result


def parse_global_memory_category(raw: str | None) -> str | None:
    """Validate *raw* against ``GLOBAL_MEMORY_CATEGORIES``."""
    if raw is None:
        return None
    normalized = raw.strip().lower()
    return normalized if normalized in GLOBAL_MEMORY_CATEGORIES else None


# ---------------------------------------------------------------------------
# MemoryFile dataclass
# ---------------------------------------------------------------------------


@dataclass
class MemoryFile:
    """Parsed representation of a single memory file on disk."""

    filename: str
    path: Path
    # Frontmatter fields — all nullable (lenient parsing).
    name: str | None = None
    type: str | None = None
    description: str | None = None
    # First N lines of the file (for manifest / header scanning).
    header_lines: list[str] = field(default_factory=list)
    # Filesystem modification time (seconds since epoch).
    mtime: float = 0.0

    @classmethod
    def from_path(cls, path: Path) -> MemoryFile:
        """Read a memory file and leniently parse its frontmatter."""
        try:
            text = path.read_text(encoding="utf-8")
        except OSError:
            return cls(filename=path.name, path=path)

        fm = parse_frontmatter(text)
        lines = text.splitlines()[:_HEADER_LINE_LIMIT]

        try:
            mtime = path.stat().st_mtime
        except OSError:
            mtime = 0.0

        return cls(
            filename=path.name,
            path=path,
            name=fm.get("name"),
            type=parse_global_memory_category(fm.get("type")),
            description=fm.get("description"),
            header_lines=lines,
            mtime=mtime,
        )


# ---------------------------------------------------------------------------
# Scanning
# ---------------------------------------------------------------------------


def scan_memory_files(memory_dir: Path | None = None) -> list[MemoryFile]:
    """Scan *memory_dir* for ``.md`` files, returning up to ``MAX_FILES``.

    Files are sorted by modification time (newest first). Dotfiles and
    subdirectories are ignored.
    """
    d = memory_dir or global_memory_dir()
    if not d.is_dir():
        return []

    md_files = sorted(
        (f for f in d.glob("*.md") if f.is_file() and not f.name.startswith(".")),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )

    return [MemoryFile.from_path(f) for f in md_files[:MAX_FILES]]


def slugify_memory_name(raw: str) -> str:
    """Create a filesystem-safe slug for a memory filename."""
    slug = re.sub(r"[^a-z0-9]+", "-", raw.strip().lower()).strip("-")
    return slug or "memory"


def allocate_memory_filename(
    memory_dir: Path,
    name: str,
    *,
    suffix: str = ".md",
) -> str:
    """Allocate a unique filename in *memory_dir* based on *name*."""
    base = slugify_memory_name(name)
    candidate = f"{base}{suffix}"
    counter = 2
    while (memory_dir / candidate).exists():
        candidate = f"{base}-{counter}{suffix}"
        counter += 1
    return candidate


def build_memory_document(
    *,
    name: str,
    description: str,
    mem_type: str,
    body: str,
) -> str:
    """Build one memory file with frontmatter and body."""
    return (
        f"---\n"
        f"name: {name.strip()}\n"
        f"description: {description.strip()}\n"
        f"type: {mem_type.strip()}\n"
        f"---\n\n"
        f"{body.strip()}\n"
    )


# ---------------------------------------------------------------------------
# Manifest formatting
# ---------------------------------------------------------------------------


def format_memory_manifest(files: list[MemoryFile]) -> str:
    """One-line-per-file text manifest.

    Format: ``[type] filename: description``
    """
    lines: list[str] = []
    for mf in files:
        t = mf.type or "unknown"
        desc = mf.description or "(no description)"
        lines.append(f"[{t}] {mf.filename}: {desc}")
    return "\n".join(lines)


# ---------------------------------------------------------------------------
# Initialisation
# ---------------------------------------------------------------------------


def init_memory_dir(memory_dir: Path | None = None) -> None:
    """Create the memory directory if missing."""
    d = memory_dir or global_memory_dir()
    d.mkdir(parents=True, exist_ok=True)
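For concreteness, here is a memory file exactly as build_memory_document emits it and parse_frontmatter reads it back (contents illustrative):

---
name: Preferred stack
description: Languages and tools the user reaches for by default
type: preference
---

Prefers Python for backend services and TypeScript for UIs.

And the round trip in code:

doc = build_memory_document(
    name="Preferred stack",
    description="Languages and tools the user reaches for by default",
    mem_type="preference",
    body="Prefers Python for backend services and TypeScript for UIs.",
)
fm = parse_frontmatter(doc)
assert fm["type"] == "preference"
assert parse_global_memory_category(fm["type"]) == "preference"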
@@ -0,0 +1,129 @@
"""Recall selector — pre-turn global memory selection for the queen.

Before each conversation turn the system:
1. Scans the global memory directory for ``.md`` files (cap: 200).
2. Reads headers (frontmatter + first 30 lines).
3. Uses a single LLM call with structured JSON output to pick the ~5
   most relevant memories.
4. Injects them into the system prompt.

The selector only sees the user's query string — no full conversation
context. This keeps it cheap and fast. Errors are caught and return
``[]`` so the main conversation is never blocked.
"""

from __future__ import annotations

import json
import logging
from pathlib import Path
from typing import Any

from framework.agents.queen.queen_memory_v2 import (
    format_memory_manifest,
    global_memory_dir,
    scan_memory_files,
)

logger = logging.getLogger(__name__)

# ---------------------------------------------------------------------------
# Structured output schema
# ---------------------------------------------------------------------------

SELECT_MEMORIES_SYSTEM_PROMPT = """\
You are selecting memories that will be useful to the Queen agent as it \
processes a user's query.

You will be given the user's query and a list of available memory files \
with their filenames and descriptions.

Return a JSON object with a single key "selected_memories" containing a \
list of filenames for the memories that will clearly be useful as the \
Queen processes the user's query (up to 5).

Only include memories that you are certain will be helpful based on their \
name and description.
- If you are unsure if a memory will be useful in processing the user's \
query, then do not include it in your list. Be selective and discerning.
- If there are no memories in the list that would clearly be useful, \
return an empty list.
"""

# ---------------------------------------------------------------------------
# Core functions
# ---------------------------------------------------------------------------


async def select_memories(
    query: str,
    llm: Any,
    memory_dir: Path | None = None,
    *,
    max_results: int = 5,
) -> list[str]:
    """Select up to 5 relevant memory filenames for *query*.

    Returns a list of filenames. Best-effort: on any error returns ``[]``.
    """
    mem_dir = memory_dir or global_memory_dir()
    files = scan_memory_files(mem_dir)
    if not files:
        logger.debug("recall: no memory files found, skipping selection")
        return []

    logger.debug("recall: selecting from %d memories for query: %.100s", len(files), query)
    manifest = format_memory_manifest(files)
    user_msg = f"## User query\n\n{query}\n\n## Available memories\n\n{manifest}"

    try:
        resp = await llm.acomplete(
            messages=[{"role": "user", "content": user_msg}],
            system=SELECT_MEMORIES_SYSTEM_PROMPT,
            max_tokens=1024,
            response_format={"type": "json_object"},
        )
        raw = (resp.content or "").strip()
        if not raw:
            logger.warning(
                "recall: LLM returned empty response (model=%s, stop=%s)",
                resp.model,
                resp.stop_reason,
            )
            return []
        data = json.loads(raw)
        selected = data.get("selected_memories", [])
        valid_names = {f.filename for f in files}
        result = [s for s in selected if s in valid_names][:max_results]
        logger.debug("recall: selected %d memories: %s", len(result), result)
        return result
    except Exception as exc:
        logger.warning("recall: memory selection failed (%s), returning []", exc)
        return []


def format_recall_injection(
    filenames: list[str],
    memory_dir: Path | None = None,
) -> str:
    """Read selected memory files and format for system prompt injection."""
    mem_dir = memory_dir or global_memory_dir()
    if not filenames:
        return ""

    blocks: list[str] = []
    for fname in filenames:
        path = mem_dir / fname
        if not path.is_file():
            continue
        try:
            content = path.read_text(encoding="utf-8").strip()
        except OSError:
            continue
        blocks.append(f"### {fname}\n\n{content}")

    if not blocks:
        return ""

    body = "\n\n---\n\n".join(blocks)
    return f"--- Global Memories ---\n\n{body}\n\n--- End Global Memories ---"
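
# Usage sketch (illustrative): wiring the selector into a pre-turn hook.
# ``queen_llm`` and the hook location are assumptions; only select_memories
# and format_recall_injection are defined in this module.
#
#     selected = await select_memories(user_query, queen_llm)
#     injection = format_recall_injection(selected)
#     if injection:
#         system_prompt = f"{system_prompt}\n\n{injection}"
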
@@ -27,7 +27,9 @@
## GCU Errors
15. **Manually wiring browser tools on event_loop nodes** — Use `node_type="gcu"` which auto-includes browser tools. Do NOT manually list browser tool names.
16. **Using GCU nodes as regular graph nodes** — GCU nodes are subagents only. They must ONLY appear in `sub_agents=["gcu-node-id"]` and be invoked via `delegate_to_sub_agent()`. Never connect via edges or use as entry/terminal nodes.
17. **Reusing the same GCU node ID for parallel tasks** — Each concurrent browser task needs a distinct GCU node ID (e.g. `gcu-site-a`, `gcu-site-b`). Two `delegate_to_sub_agent` calls with the same `agent_id` share a browser profile and will interfere with each other's pages.
18. **Passing `profile=` in GCU tool calls** — Profile isolation for parallel subagents is automatic. The framework injects a unique profile per subagent via an asyncio `ContextVar`. Hardcoding `profile="default"` in a GCU system prompt breaks this isolation.

## Worker Agent Errors
17. **Adding client-facing intake node to workers** — The queen owns intake. Workers should start with an autonomous processing node. Client-facing nodes in workers are for mid-execution review/approval only.
18. **Putting `escalate` or `set_output` in NodeSpec `tools=[]`** — These are synthetic framework tools, auto-injected at runtime. Only list MCP tools from `list_agent_tools()`.
19. **Adding client-facing intake node to workers** — The queen owns intake. Workers should start with an autonomous processing node. Route worker review/approval through queen escalation instead of direct worker HITL.
20. **Putting `escalate` or `set_output` in NodeSpec `tools=[]`** — These are synthetic framework tools, auto-injected at runtime. Only list MCP tools from `list_agent_tools()`.

@@ -332,81 +332,46 @@ class MyAgent:
default_agent = MyAgent()
```

## agent.py — Async Entry Points Variant
## triggers.json — Timer and Webhook Triggers

When an agent needs timers, webhooks, or event-driven triggers, add
`async_entry_points` and optionally `runtime_config` as module-level variables.
These are IN ADDITION to the standard variables above.
When an agent needs timers, webhooks, or event-driven triggers, create a
`triggers.json` file in the agent's directory (alongside `agent.py`).
The queen loads these at session start and the user can manage them via
the `set_trigger` / `remove_trigger` tools at runtime.

```python
# Additional imports for async entry points
from framework.graph.edge import GraphSpec, AsyncEntryPointSpec
from framework.runtime.agent_runtime import (
    AgentRuntime, AgentRuntimeConfig, create_agent_runtime,
)

# ... (goal, nodes, edges, entry_node, entry_points, etc. as above) ...

# Async entry points — event-driven triggers
async_entry_points = [
    # Timer with cron: daily at 9am
    AsyncEntryPointSpec(
        id="daily-check",
        name="Daily Check",
        entry_node="process-node",
        trigger_type="timer",
        trigger_config={"cron": "0 9 * * *"},
        isolation_level="shared",
        max_concurrent=1,
    ),
    # Timer with fixed interval: every 20 minutes
    AsyncEntryPointSpec(
        id="scheduled-check",
        name="Scheduled Check",
        entry_node="process-node",
        trigger_type="timer",
        trigger_config={"interval_minutes": 20, "run_immediately": False},
        isolation_level="shared",
        max_concurrent=1,
    ),
    # Event: reacts to webhook events
    AsyncEntryPointSpec(
        id="webhook-event",
        name="Webhook Event Handler",
        entry_node="process-node",
        trigger_type="event",
        trigger_config={"event_types": ["webhook_received"]},
        isolation_level="shared",
        max_concurrent=10,
    ),
]
```

```json
[
  {
    "id": "daily-check",
    "name": "Daily Check",
    "trigger_type": "timer",
    "trigger_config": {"cron": "0 9 * * *"},
    "task": "Run the daily check process"
  },
  {
    "id": "scheduled-check",
    "name": "Scheduled Check",
    "trigger_type": "timer",
    "trigger_config": {"interval_minutes": 20},
    "task": "Run the scheduled check"
  },
  {
    "id": "webhook-event",
    "name": "Webhook Event Handler",
    "trigger_type": "webhook",
    "trigger_config": {"event_types": ["webhook_received"]},
    "task": "Process incoming webhook event"
  }
]
```

```python
# Webhook server config (only needed if using webhooks)
runtime_config = AgentRuntimeConfig(
    webhook_host="127.0.0.1",
    webhook_port=8080,
    webhook_routes=[
        {
            "source_id": "my-source",
            "path": "/webhooks/my-source",
            "methods": ["POST"],
        },
    ],
)
```

**Key rules for async entry points:**
- `async_entry_points` is a list of `AsyncEntryPointSpec` (NOT `EntryPointSpec`)
- `runtime_config` is `AgentRuntimeConfig` (NOT `RuntimeConfig` from config.py)
- Valid trigger_types: `timer`, `event`, `webhook`, `manual`, `api`
- Valid isolation_levels: `isolated`, `shared`, `synchronized`
**Key rules for triggers.json:**
- Valid trigger_types: `timer`, `webhook`
- Timer trigger_config (cron): `{"cron": "0 9 * * *"}` — standard 5-field cron expression
- Timer trigger_config (interval): `{"interval_minutes": float, "run_immediately": bool}`
- Event trigger_config: `{"event_types": ["webhook_received"], "filter_stream": "...", "filter_node": "..."}`
- Use `isolation_level="shared"` for async entry points that need to read
  the primary session's memory (e.g., user-configured rules)
- The `_build_graph()` method passes `async_entry_points` to GraphSpec
- Reference: `exports/gmail_inbox_guardian/agent.py`
- Timer trigger_config (interval): `{"interval_minutes": float}`
- Each trigger must have a unique `id`
- The `task` field describes what the worker should do when the trigger fires
- Triggers are persisted back to `triggers.json` when modified via queen tools

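A minimal sketch of how an agent process might load and sanity-check
`triggers.json` at startup. The helper name and the validation rules are
illustrative; only the file location and the fields above come from this guide:

```python
import json
from pathlib import Path

VALID_TRIGGER_TYPES = {"timer", "webhook"}  # per the key rules above


def load_triggers(agent_dir: Path) -> list[dict]:
    """Read triggers.json next to agent.py; return [] if absent."""
    path = agent_dir / "triggers.json"
    if not path.is_file():
        return []
    triggers = json.loads(path.read_text(encoding="utf-8"))
    seen_ids: set[str] = set()
    for t in triggers:
        if t["trigger_type"] not in VALID_TRIGGER_TYPES:
            raise ValueError(f"unsupported trigger_type: {t['trigger_type']}")
        if t["id"] in seen_ids:  # each trigger must have a unique id
            raise ValueError(f"duplicate trigger id: {t['id']}")
        seen_ids.add(t["id"])
    return triggers
```
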
## __init__.py

@@ -453,21 +418,6 @@ __all__ = [
]
```

**If the agent uses async entry points**, also import and export:
```python
from .agent import (
    ...,
    async_entry_points,
    runtime_config,  # Only if using webhooks
)

__all__ = [
    ...,
    "async_entry_points",
    "runtime_config",
]
```

## __main__.py

```python

@@ -31,8 +31,7 @@ module-level variables via `getattr()`:
| `conversation_mode` | no | not passed | Isolated mode (no context carryover) |
| `identity_prompt` | no | not passed | No agent-level identity |
| `loop_config` | no | `{}` | No iteration limits |
| `async_entry_points` | no | `[]` | No async triggers (timers, webhooks, events) |
| `runtime_config` | no | `None` | No webhook server |
| `triggers.json` (file) | no | not present | No triggers (timers, webhooks) |

**CRITICAL:** `__init__.py` MUST import and re-export ALL of these from
`agent.py`. Missing exports silently fall back to defaults, causing
@@ -77,7 +76,7 @@ goal = Goal(
| output_keys | list[str] | required | Memory keys this node writes via set_output |
| system_prompt | str | "" | LLM instructions |
| tools | list[str] | [] | Tool names from MCP servers |
| client_facing | bool | False | If True, streams to user and blocks for input |
| client_facing | bool | False | Deprecated compatibility field. Queen interactivity is implicit; workers should escalate instead |
| nullable_output_keys | list[str] | [] | Keys that may remain unset |
| max_node_visits | int | 0 | 0=unlimited (default); >1 for one-shot feedback loops |
| max_retries | int | 3 | Retries on failure |
@@ -111,7 +110,7 @@ This prevents premature set_output before user interaction.
**Hard limit: 3-6 nodes for most agents.** Never exceed 6 unless the user
explicitly requests a complex multi-phase pipeline.

Each node boundary serializes outputs to shared memory and **destroys** all
Each node boundary serializes outputs to the shared buffer and **destroys** all
in-context information: tool call results, intermediate reasoning, conversation
history. A research node that searches, fetches, and analyzes in ONE node keeps
all source material in its conversation context. Split across 3 nodes, each
@@ -133,13 +132,14 @@ downstream node only sees the serialized summary string.

**Typical agent structure (2 nodes):**
```
process (autonomous) ←→ review (client-facing)
process (autonomous) ←→ review (queen-mediated)
```
The queen owns intake — she gathers requirements from the user, then
passes structured input via `run_agent_with_input(task)`. When building
the agent, design the entry node's `input_keys` to match what the queen
will provide at run time. Worker agents should NOT have a client-facing
intake node. Client-facing nodes are for mid-execution review/approval only.
intake node. Mid-execution review/approval should happen through queen
escalation rather than direct worker HITL.

For simpler agents, just 1 autonomous node:
```
@@ -173,7 +173,7 @@ Use `conversation_mode="continuous"` to preserve context across transitions.
### set_output
- Synthetic tool injected by framework
- Call separately from real tool calls (separate turn)
- `set_output("key", "value")` stores to shared memory
- `set_output("key", "value")` stores to the shared buffer

## Edge Conditions

@@ -247,7 +247,7 @@ For large data that exceeds context:
Multiple ON_SUCCESS edges from same source → parallel execution via asyncio.gather().
- Parallel nodes must have disjoint output_keys
- Only one branch may have client_facing nodes
- Fan-in node gets all outputs in shared memory
- Fan-in node gets all outputs in the shared buffer
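
A sketch of the fan-out/fan-in shape this describes. The `Edge(...)` constructor
below is an assumed shape for illustration; only `NodeSpec`, ON_SUCCESS edges,
and the disjoint `output_keys` rule come from this guide:

```python
# Fan out from "fetch" to two parallel nodes, then fan in at "merge".
analyze_a = NodeSpec(id="analyze-a", output_keys=["stats_a"], ...)
analyze_b = NodeSpec(id="analyze-b", output_keys=["stats_b"], ...)  # disjoint keys

edges = [
    Edge(source="fetch", target="analyze-a", condition=ON_SUCCESS),
    Edge(source="fetch", target="analyze-b", condition=ON_SUCCESS),  # runs in parallel
    Edge(source="analyze-a", target="merge", condition=ON_SUCCESS),
    Edge(source="analyze-b", target="merge", condition=ON_SUCCESS),  # fan-in
]
```
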
## Judge System

@@ -257,44 +257,28 @@ Multiple ON_SUCCESS edges from same source → parallel execution via asyncio.ga

Judge is the SOLE acceptance mechanism — no ad-hoc framework gating.

## Async Entry Points (Webhooks, Timers, Events)
## Triggers (Timers, Webhooks)

For agents that react to external events, use `AsyncEntryPointSpec`:
For agents that react to external events, create a `triggers.json` file
in the agent's export directory:

```python
from framework.graph.edge import AsyncEntryPointSpec
from framework.runtime.agent_runtime import AgentRuntimeConfig

# Timer trigger (cron or interval)
async_entry_points = [
    AsyncEntryPointSpec(
        id="daily-check",
        name="Daily Check",
        entry_node="process",
        trigger_type="timer",
        trigger_config={"cron": "0 9 * * *"},  # daily at 9am
        isolation_level="shared",
    )
]
```

```json
[
  {
    "id": "daily-check",
    "name": "Daily Check",
    "trigger_type": "timer",
    "trigger_config": {"cron": "0 9 * * *"},
    "task": "Run the daily check process"
  }
]
```

```python
# Webhook server (optional)
runtime_config = AgentRuntimeConfig(
    webhook_host="127.0.0.1",
    webhook_port=8080,
    webhook_routes=[{"source_id": "gmail", "path": "/webhooks/gmail", "methods": ["POST"]}],
)
```

### Key Fields
- `trigger_type`: `"timer"`, `"event"`, `"webhook"`, `"manual"`
- `trigger_type`: `"timer"` or `"webhook"`
- `trigger_config`: `{"cron": "0 9 * * *"}` or `{"interval_minutes": 20}`
- `isolation_level`: `"shared"` (recommended), `"isolated"`, `"synchronized"`
- `event_types`: For event triggers, e.g., `["webhook_received"]`

### Exports Required
Both `async_entry_points` and `runtime_config` must be exported from `__init__.py`.

See `exports/gmail_inbox_guardian/agent.py` for a complete example.
- `task`: describes what the worker should do when the trigger fires
- Triggers can also be created/removed at runtime via `set_trigger` / `remove_trigger` queen tools

## Tool Discovery


@@ -109,9 +109,48 @@ Key rules to bake into GCU node prompts:
- Keep tool calls per turn ≤10
- Tab isolation: when browser is already running, use `browser_open(background=true)` and pass `target_id` to every call

## Multiple Concurrent GCU Subagents

When a task can be parallelized across multiple sites or profiles, declare a distinct GCU
node for each and invoke them all in the same LLM turn. The framework batches all
`delegate_to_sub_agent` calls made in one turn and runs them with `asyncio.gather`, so
they execute concurrently — not sequentially.

**Each GCU subagent automatically gets its own isolated browser context** — no `profile=`
argument is needed in tool calls. The framework derives a unique profile from the subagent's
node ID and instance counter and injects it via an asyncio `ContextVar` before the subagent
runs.

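A minimal sketch of the `ContextVar` pattern described above, with assumed
names (`_current_profile`, `_run_subagent`); the framework's actual variable
and helper names may differ:

```python
import asyncio
from contextvars import ContextVar

# Assumed name: the framework would hold the active browser profile here.
_current_profile: ContextVar[str] = ContextVar("browser_profile", default="default")


async def _run_subagent(node_id: str, instance: int) -> None:
    # Derive a unique profile per subagent and set it for this task only;
    # each asyncio task gets its own copy of the context, so concurrent
    # tasks never see each other's value.
    _current_profile.set(f"{node_id}-{instance}")
    print("using profile:", _current_profile.get())


async def main() -> None:
    await asyncio.gather(_run_subagent("gcu-site-a", 1), _run_subagent("gcu-site-b", 1))


asyncio.run(main())
```
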
### Example: three sites in parallel

```python
# Three distinct GCU nodes
gcu_site_a = NodeSpec(id="gcu-site-a", node_type="gcu", ...)
gcu_site_b = NodeSpec(id="gcu-site-b", node_type="gcu", ...)
gcu_site_c = NodeSpec(id="gcu-site-c", node_type="gcu", ...)

orchestrator = NodeSpec(
    id="orchestrator",
    node_type="event_loop",
    sub_agents=["gcu-site-a", "gcu-site-b", "gcu-site-c"],
    system_prompt="""\
Call all three subagents in a single response to run them in parallel:
delegate_to_sub_agent(agent_id="gcu-site-a", task="Scrape prices from site A")
delegate_to_sub_agent(agent_id="gcu-site-b", task="Scrape prices from site B")
delegate_to_sub_agent(agent_id="gcu-site-c", task="Scrape prices from site C")
""",
)
```

**Rules:**
- Use distinct node IDs for each concurrent task — sharing an ID shares the browser context.
- The GCU node prompts do not need to mention `profile=`; isolation is automatic.
- Cleanup is automatic at session end, but GCU nodes can call `browser_stop()` explicitly
  if they want to release resources mid-run.

## GCU Anti-Patterns

- Using `browser_screenshot` to read text (use `browser_snapshot`)
- Using `browser_screenshot` to read text (use `browser_snapshot` instead; screenshots are for visual context only)
- Re-navigating after scrolling (resets scroll position)
- Attempting login on auth walls
- Forgetting `target_id` in multi-tab scenarios

@@ -1,63 +0,0 @@
# Queen Memory — File System Structure

```
~/.hive/
├── queen/
│   ├── MEMORY.md                          ← Semantic memory
│   ├── memories/
│   │   ├── MEMORY-2026-03-09.md           ← Episodic memory (today)
│   │   ├── MEMORY-2026-03-08.md
│   │   └── ...
│   └── session/
│       └── {session_id}/                  ← One dir per session (or resumed-from session)
│           ├── conversations/
│           │   ├── parts/
│           │   │   ├── 00001.json         ← One file per message (role, content, tool_calls)
│           │   │   ├── 00002.json
│           │   │   └── ...
│           │   └── spillover/
│           │       ├── conversation_1.md  ← Compacted old conversation segments
│           │       ├── conversation_2.md
│           │       └── ...
│           └── data/
│               ├── adapt.md               ← Working memory (session-scoped)
│               ├── web_search_1.txt       ← Spillover: large tool results
│               ├── web_search_2.txt
│               └── ...
```

---

## The three memory tiers

| File | Tier | Written by | Read at |
|---|---|---|---|
| `MEMORY.md` | Semantic | Consolidation LLM (auto, post-session) | Session start (injected into system prompt) |
| `memories/MEMORY-YYYY-MM-DD.md` | Episodic | Queen via `write_to_diary` tool + consolidation LLM | Session start (today's file injected) |
| `data/adapt.md` | Working | Queen via `update_session_notes` tool | Every turn (inlined in system prompt) |

---

## Session directory naming

The session directory name is **`queen_resume_from`** when a cold-restore resumes an existing
session, otherwise the new **`session_id`**. This means resumed sessions accumulate all messages
in the original directory rather than fragmenting across multiple folders.
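
In pseudocode terms (names assumed from the description above, not the actual
framework code):

```python
session_dir_name = queen_resume_from or session_id
session_dir = hive_root / "queen" / "session" / session_dir_name
```
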
---

## Consolidation

`consolidate_queen_memory()` runs every **5 minutes** in the background and once more at session
end. It reads:

1. `conversations/parts/*.json` — full message history (user + assistant turns; tool results skipped)
2. `data/adapt.md` — current working notes

It then makes two LLM writes:

- Rewrites `MEMORY.md` in place (semantic memory — queen never touches this herself)
- Appends a timestamped prose entry to today's `memories/MEMORY-YYYY-MM-DD.md`

If the combined transcript exceeds ~200K characters it is recursively binary-compacted via the
LLM before being sent to the consolidation model (mirrors `EventLoopNode._llm_compact`).
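
A sketch of what "recursively binary-compacted" means here: split an over-long
transcript in half, compact each half, and summarize the joined result via the
LLM. `llm_summarize` and the exact threshold handling are assumptions, not the
framework's actual helper:

```python
LIMIT = 200_000  # ~200K characters, per the description above


def binary_compact(text: str, llm_summarize) -> str:
    """Recursively halve, compact, and summarize until the text fits."""
    if len(text) <= LIMIT:
        return text
    mid = len(text) // 2
    left = binary_compact(text[:mid], llm_summarize)
    right = binary_compact(text[mid:], llm_summarize)
    # llm_summarize is assumed to return something well under the limit.
    return llm_summarize(left + right)
```
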
@@ -0,0 +1,594 @@
"""Reflection agent — background global memory extraction for the queen.

A lightweight side agent that runs after each queen LLM turn. It inspects
recent conversation messages and extracts durable user knowledge into
individual memory files in ``~/.hive/queen/global_memory/``.

Two reflection types:
- **Short reflection**: after conversational queen turns. Distills
  learnings about the user (profile, preferences, environment, feedback).
- **Long reflection**: every 5 short reflections and on CONTEXT_COMPACTED.
  Organises, deduplicates, trims the global memory directory.

Concurrency: an ``asyncio.Lock`` prevents overlapping runs. If a trigger
fires while a reflection is already active the event is skipped.

All reflections are fire-and-forget (spawned via ``asyncio.create_task``)
so they never block the queen's event loop.
"""

from __future__ import annotations

import asyncio
import json
import logging
import traceback
from datetime import datetime
from pathlib import Path
from typing import Any

from framework.agents.queen.queen_memory_v2 import (
    GLOBAL_MEMORY_CATEGORIES,
    MAX_FILE_SIZE_BYTES,
    MAX_FILES,
    format_memory_manifest,
    global_memory_dir,
    parse_frontmatter,
    scan_memory_files,
)
from framework.llm.provider import LLMResponse, Tool

logger = logging.getLogger(__name__)

# ---------------------------------------------------------------------------
# Reflection tool definitions (internal — not in queen's main registry)
# ---------------------------------------------------------------------------

_REFLECTION_TOOLS: list[Tool] = [
    Tool(
        name="list_memory_files",
        description=(
            "List all memory files with their type, name, and description. "
            "Returns a text manifest — one line per file."
        ),
        parameters={
            "type": "object",
            "properties": {},
            "additionalProperties": False,
        },
    ),
    Tool(
        name="read_memory_file",
        description="Read the full content of a memory file by filename.",
        parameters={
            "type": "object",
            "properties": {
                "filename": {
                    "type": "string",
                    "description": "The filename (e.g. 'user-prefers-dark-mode.md').",
                },
            },
            "required": ["filename"],
            "additionalProperties": False,
        },
    ),
    Tool(
        name="write_memory_file",
        description=(
            "Create or overwrite a memory file. Content should include YAML "
            "frontmatter (name, description, type) followed by the memory body. "
            f"Max file size: {MAX_FILE_SIZE_BYTES} bytes. Max files: {MAX_FILES}."
        ),
        parameters={
            "type": "object",
            "properties": {
                "filename": {
                    "type": "string",
                    "description": "Filename ending in .md (e.g. 'user-prefers-dark-mode.md').",
                },
                "content": {
                    "type": "string",
                    "description": "Full file content including frontmatter.",
                },
            },
            "required": ["filename", "content"],
            "additionalProperties": False,
        },
    ),
    Tool(
        name="delete_memory_file",
        description=(
            "Delete a memory file by filename. Use during long "
            "reflection to prune stale or redundant memories."
        ),
        parameters={
            "type": "object",
            "properties": {
                "filename": {
                    "type": "string",
                    "description": "The filename to delete.",
                },
            },
            "required": ["filename"],
            "additionalProperties": False,
        },
    ),
]


def _safe_memory_path(filename: str, memory_dir: Path) -> Path:
    """Resolve *filename* inside *memory_dir*, raising if it escapes."""
    if not filename or filename.strip() != filename:
        raise ValueError(f"Invalid filename: {filename!r}")
    if "/" in filename or "\\" in filename or ".." in filename:
        raise ValueError(f"Invalid filename: path components not allowed: {filename!r}")
    candidate = (memory_dir / filename).resolve()
    root = memory_dir.resolve()
    if not candidate.is_relative_to(root):
        raise ValueError(f"Path escapes memory directory: {filename!r}")
    return candidate


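# For example (illustrative): _safe_memory_path("notes.md", d) resolves inside
# d, while "../secrets.md", "a/b.md", and " padded.md" all raise ValueError.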
def _execute_tool(name: str, args: dict[str, Any], memory_dir: Path) -> str:
    """Execute a reflection tool synchronously. Returns the result string."""
    if name == "list_memory_files":
        files = scan_memory_files(memory_dir)
        logger.debug("reflect: tool list_memory_files → %d files", len(files))
        if not files:
            return "(no memory files yet)"
        return format_memory_manifest(files)

    if name == "read_memory_file":
        filename = args.get("filename", "")
        try:
            path = _safe_memory_path(filename, memory_dir)
        except ValueError as exc:
            return f"ERROR: {exc}"
        if not path.exists() or not path.is_file():
            return f"ERROR: File not found: {filename}"
        try:
            return path.read_text(encoding="utf-8")
        except OSError as e:
            return f"ERROR: {e}"

    if name == "write_memory_file":
        filename = args.get("filename", "")
        content = args.get("content", "")
        if not filename.endswith(".md"):
            return "ERROR: Filename must end with .md"
        # Enforce global memory type restrictions.
        fm = parse_frontmatter(content)
        mem_type = (fm.get("type") or "").strip().lower()
        if mem_type and mem_type not in GLOBAL_MEMORY_CATEGORIES:
            return (
                f"ERROR: Invalid memory type '{mem_type}'. "
                f"Allowed types: {', '.join(GLOBAL_MEMORY_CATEGORIES)}."
            )
        # Enforce file size limit.
        if len(content.encode("utf-8")) > MAX_FILE_SIZE_BYTES:
            return f"ERROR: Content exceeds {MAX_FILE_SIZE_BYTES} byte limit."
        # Enforce file cap (only for new files).
        try:
            path = _safe_memory_path(filename, memory_dir)
        except ValueError as exc:
            return f"ERROR: {exc}"
        if not path.exists():
            existing = list(memory_dir.glob("*.md"))
            if len(existing) >= MAX_FILES:
                return f"ERROR: File cap reached ({MAX_FILES}). Delete a file first."
        memory_dir.mkdir(parents=True, exist_ok=True)
        path.write_text(content, encoding="utf-8")
        logger.debug("reflect: tool write_memory_file → %s (%d chars)", filename, len(content))
        return f"Wrote {filename} ({len(content)} chars)."

    if name == "delete_memory_file":
        filename = args.get("filename", "")
        try:
            path = _safe_memory_path(filename, memory_dir)
        except ValueError as exc:
            return f"ERROR: {exc}"
        if not path.exists():
            return f"ERROR: File not found: {filename}"
        path.unlink()
        logger.debug("reflect: tool delete_memory_file → %s", filename)
        return f"Deleted {filename}."

    return f"ERROR: Unknown tool: {name}"


# ---------------------------------------------------------------------------
# Mini event loop
# ---------------------------------------------------------------------------

_MAX_TURNS = 5


async def _reflection_loop(
    llm: Any,
    system: str,
    user_msg: str,
    memory_dir: Path,
    max_turns: int = _MAX_TURNS,
) -> tuple[bool, list[str], str]:
    """Run a mini tool-use loop: LLM → tool calls → repeat.

    Returns (success, changed_files, last_text).
    """
    messages: list[dict[str, Any]] = [{"role": "user", "content": user_msg}]
    changed_files: list[str] = []
    last_text: str = ""

    for _turn in range(max_turns):
        logger.info("reflect: loop turn %d/%d (msgs=%d)", _turn + 1, max_turns, len(messages))
        try:
            resp: LLMResponse = await llm.acomplete(
                messages=messages,
                system=system,
                tools=_REFLECTION_TOOLS,
                max_tokens=2048,
            )
        except asyncio.CancelledError:
            logger.warning("reflect: LLM call cancelled (task cancelled)")
            return False, changed_files, last_text
        except Exception:
            logger.warning("reflect: LLM call failed", exc_info=True)
            return False, changed_files, last_text

        # Extract tool calls from litellm/OpenAI response object.
        tool_calls_raw: list[dict[str, Any]] = []
        raw = resp.raw_response
        if raw is not None:
            # litellm returns a ModelResponse object; tool calls live on
            # choices[0].message.tool_calls as a list of ChatCompletionMessageToolCall.
            try:
                msg_obj = raw.choices[0].message
                if hasattr(msg_obj, "tool_calls") and msg_obj.tool_calls:
                    for tc in msg_obj.tool_calls:
                        fn = tc.function
                        try:
                            args = json.loads(fn.arguments) if fn.arguments else {}
                        except (json.JSONDecodeError, TypeError):
                            args = {}
                        tool_calls_raw.append(
                            {
                                "id": tc.id,
                                "name": fn.name,
                                "input": args,
                            }
                        )
            except (AttributeError, IndexError):
                pass

        logger.info(
            "reflect: LLM responded, text=%d chars, tool_calls=%d",
            len(resp.content or ""),
            len(tool_calls_raw),
        )

        turn_text = resp.content or ""
        if turn_text:
            last_text = turn_text
        assistant_msg: dict[str, Any] = {"role": "assistant", "content": turn_text}
        if tool_calls_raw:
            assistant_msg["tool_calls"] = [
                {
                    "id": tc["id"],
                    "type": "function",
                    "function": {
                        "name": tc["name"],
                        "arguments": json.dumps(tc.get("input", {})),
                    },
                }
                for tc in tool_calls_raw
            ]
        messages.append(assistant_msg)

        if not tool_calls_raw:
            break

        for tc in tool_calls_raw:
            result = _execute_tool(tc["name"], tc.get("input", {}), memory_dir)
            if tc["name"] in ("write_memory_file", "delete_memory_file"):
                fname = tc.get("input", {}).get("filename", "")
                if fname and not result.startswith("ERROR"):
                    changed_files.append(fname)
            messages.append({"role": "tool", "tool_call_id": tc["id"], "content": result})

    return True, changed_files, last_text


# ---------------------------------------------------------------------------
# System prompts
# ---------------------------------------------------------------------------

_CATEGORIES_STR = ", ".join(GLOBAL_MEMORY_CATEGORIES)

_SHORT_REFLECT_SYSTEM = f"""\
You are a reflection agent that distills durable knowledge about the USER
into persistent global memory files. You run in the background after each
assistant turn.

Your goal: identify anything from the recent messages worth remembering
about the user across ALL future sessions — their profile, preferences,
environment setup, or feedback on assistant behavior.

Memory categories: {_CATEGORIES_STR}

Expected format for each memory file:
```markdown
---
name: {{{{memory name}}}}
description: {{{{one-line description — specific and search-friendly}}}}
type: {{{{{_CATEGORIES_STR}}}}}
---

{{{{memory content}}}}
```

Workflow (aim for 2 turns):
Turn 1 — call list_memory_files to see what exists, then read_memory_file
for any that might need updating.
Turn 2 — call write_memory_file for new/updated memories.

Rules:
- ONLY persist durable knowledge about the USER — who they are, how they
  like to work, their tech environment, their feedback on your behavior.
- Do NOT store task-specific details, code patterns, file paths, or
  ephemeral session state.
- Keep files concise. Each file should cover ONE topic.
- If an existing memory already covers the learning, UPDATE it rather than
  creating a duplicate.
- If there is nothing worth remembering, do nothing (respond with a brief
  reason — no tool calls needed).
- File names should be kebab-case slugs ending in .md.
- Do NOT exceed {MAX_FILE_SIZE_BYTES} bytes per file or {MAX_FILES} total files.
"""

_LONG_REFLECT_SYSTEM = f"""\
You are a reflection agent performing a periodic housekeeping pass over the
global memory directory. Your job is to organise, deduplicate, and trim
noise from the accumulated memory files.

Memory categories: {_CATEGORIES_STR}

Workflow:
1. list_memory_files to get the full manifest.
2. read_memory_file for files that look redundant, stale, or overlapping.
3. Merge duplicates, delete stale entries, consolidate related memories.
4. Ensure descriptions are specific and search-friendly.
5. Enforce limits: max {MAX_FILES} files, max {MAX_FILE_SIZE_BYTES} bytes each.

Rules:
- Prefer merging over deleting — combine related memories into one file.
- Remove memories that are no longer relevant or are superseded.
- Keep the total collection lean and high-signal.
- Do NOT invent new information — only reorganise what exists.
"""


# ---------------------------------------------------------------------------
# Short & long reflection entry points
# ---------------------------------------------------------------------------


async def _read_conversation_parts(session_dir: Path) -> list[dict[str, Any]]:
    """Read conversation parts from the queen session directory."""
    from framework.storage.conversation_store import FileConversationStore

    store = FileConversationStore(session_dir / "conversations")
    return await store.read_parts()


async def run_short_reflection(
    session_dir: Path,
    llm: Any,
    memory_dir: Path | None = None,
) -> None:
    """Run a short reflection: extract user knowledge from conversation."""
    logger.info("reflect: starting short reflection for %s", session_dir)
    mem_dir = memory_dir or global_memory_dir()

    messages = await _read_conversation_parts(session_dir)
    if not messages:
        logger.info("reflect: no conversation parts found in %s, skipping", session_dir)
        return

    transcript_lines: list[str] = []
    for msg in messages[-50:]:
        role = msg.get("role", "")
        content = str(msg.get("content", "")).strip()
        if role == "tool" or not content:
            continue
        label = "user" if role == "user" else "assistant"
        if len(content) > 800:
            content = content[:800] + "…"
        transcript_lines.append(f"[{label}]: {content}")

    if not transcript_lines:
        logger.info("reflect: no transcript lines after filtering, skipping")
        return

    transcript = "\n".join(transcript_lines)
    user_msg = (
        f"## Recent conversation ({len(messages)} messages total)\n\n"
        f"{transcript}\n\n"
        f"Timestamp: {datetime.now().isoformat(timespec='minutes')}"
    )

    _, changed, reason = await _reflection_loop(llm, _SHORT_REFLECT_SYSTEM, user_msg, mem_dir)
    if changed:
        logger.info("reflect: short reflection done, changed files: %s", changed)
    else:
        logger.info("reflect: short reflection done, no changes — %s", reason or "no reason")


async def run_long_reflection(
    llm: Any,
    memory_dir: Path | None = None,
) -> None:
    """Run a long reflection: organise and deduplicate all global memories."""
    logger.debug("reflect: starting long reflection")
    mem_dir = memory_dir or global_memory_dir()
    files = scan_memory_files(mem_dir)

    if not files:
        logger.debug("reflect: no memory files, skipping long reflection")
        return

    manifest = format_memory_manifest(files)
    user_msg = (
        f"## Current memory manifest ({len(files)} files)\n\n"
        f"{manifest}\n\n"
        f"Timestamp: {datetime.now().isoformat(timespec='minutes')}"
    )

    _, changed, reason = await _reflection_loop(llm, _LONG_REFLECT_SYSTEM, user_msg, mem_dir)
    if changed:
        logger.debug("reflect: long reflection done (%d files), changed: %s", len(files), changed)
    else:
        logger.debug(
            "reflect: long reflection done (%d files), no changes — %s",
            len(files),
            reason or "no reason",
        )


async def run_shutdown_reflection(
    session_dir: Path,
    llm: Any,
    memory_dir: Path | None = None,
) -> None:
    """Run a final short reflection on session shutdown.

    Called during session teardown so recent conversation insights are
    persisted before the session is destroyed.
    """
    logger.info("reflect: running shutdown reflection for %s", session_dir)
    mem_dir = memory_dir or global_memory_dir()
    try:
        await run_short_reflection(session_dir, llm, mem_dir)
        logger.info("reflect: shutdown reflection completed for %s", session_dir)
    except asyncio.CancelledError:
        logger.warning("reflect: shutdown reflection cancelled for %s", session_dir)
    except Exception:
        logger.warning("reflect: shutdown reflection failed", exc_info=True)
        _write_error("shutdown reflection")


# ---------------------------------------------------------------------------
# Event-bus integration
# ---------------------------------------------------------------------------

_LONG_REFLECT_INTERVAL = 5


async def subscribe_reflection_triggers(
    event_bus: Any,
    session_dir: Path,
    llm: Any,
    memory_dir: Path | None = None,
) -> list[str]:
    """Subscribe to queen turn events and return subscription IDs.

    Call this once during queen setup. Returns a list of event-bus
    subscription IDs for cleanup during session teardown.
    """
    from framework.runtime.event_bus import EventType

    mem_dir = memory_dir or global_memory_dir()
    _lock = asyncio.Lock()
    _short_count = 0
    _background_tasks: set[asyncio.Task] = set()

    async def _do_turn_reflect(is_interval: bool, count: int) -> None:
        async with _lock:
            try:
                if is_interval:
                    await run_short_reflection(session_dir, llm, mem_dir)
                    await run_long_reflection(llm, mem_dir)
                else:
                    await run_short_reflection(session_dir, llm, mem_dir)
            except Exception:
                logger.warning("reflect: reflection failed", exc_info=True)
                _write_error("short/long reflection")

    async def _do_compaction_reflect() -> None:
        async with _lock:
            try:
                await run_long_reflection(llm, mem_dir)
            except Exception:
                logger.warning("reflect: compaction-triggered reflection failed", exc_info=True)
                _write_error("compaction reflection")

    def _fire_and_forget(coro: Any) -> None:
        """Spawn a background task and prevent GC before it finishes."""
        task = asyncio.create_task(coro)
        _background_tasks.add(task)
        task.add_done_callback(_background_tasks.discard)

    async def _on_turn_complete(event: Any) -> None:
        nonlocal _short_count

        if getattr(event, "stream_id", None) != "queen":
            return

        _short_count += 1

        event_data = getattr(event, "data", {}) or {}
        stop_reason = event_data.get("stop_reason", "")
        is_tool_turn = stop_reason in ("tool_use", "tool_calls")
        is_interval = _short_count % _LONG_REFLECT_INTERVAL == 0

        if is_tool_turn and not is_interval:
            logger.debug("reflect: skipping tool turn (count=%d)", _short_count)
            return

        if _lock.locked():
            logger.debug("reflect: skipping, already running (count=%d)", _short_count)
            return

        logger.debug(
            "reflect: triggered (count=%d, interval=%s, stop_reason=%s)",
            _short_count,
            is_interval,
            stop_reason,
        )
        _fire_and_forget(_do_turn_reflect(is_interval, _short_count))

    async def _on_compaction(event: Any) -> None:
        if getattr(event, "stream_id", None) != "queen":
            return
        if _lock.locked():
            logger.debug("reflect: skipping compaction trigger, already running")
            return
        logger.debug("reflect: compaction triggered long reflection")
        _fire_and_forget(_do_compaction_reflect())

    sub_ids: list[str] = []

    sub1 = event_bus.subscribe(
        event_types=[EventType.LLM_TURN_COMPLETE],
        handler=_on_turn_complete,
    )
    sub_ids.append(sub1)

    sub2 = event_bus.subscribe(
        event_types=[EventType.CONTEXT_COMPACTED],
        handler=_on_compaction,
    )
    sub_ids.append(sub2)

    return sub_ids


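# Usage sketch (illustrative): wire this up once at queen startup and keep the
# IDs for teardown. ``event_bus.unsubscribe`` is an assumption; only
# ``event_bus.subscribe`` is used above.
#
#     sub_ids = await subscribe_reflection_triggers(bus, session_dir, llm)
#     ...
#     for sid in sub_ids:
#         bus.unsubscribe(sid)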
def _write_error(context: str) -> None:
    """Best-effort write of the last traceback to an error file."""
    try:
        error_path = global_memory_dir() / ".reflection_error.txt"
        error_path.parent.mkdir(parents=True, exist_ok=True)
        error_path.write_text(
            f"context: {context}\ntime: {datetime.now().isoformat()}\n\n{traceback.format_exc()}",
            encoding="utf-8",
        )
    except OSError:
        pass
@@ -1,27 +0,0 @@
"""Queen's ticket receiver entry point.

When the Worker Health Judge emits a WORKER_ESCALATION_TICKET event on the
shared EventBus, this entry point fires and routes to the ``ticket_triage``
node, where the Queen deliberates and decides whether to notify the operator.

Isolation level is ``isolated`` — the queen's triage memory is kept separate
from the worker's shared memory. Each ticket triage runs in its own context.
"""

from __future__ import annotations

from framework.graph.edge import AsyncEntryPointSpec

TICKET_RECEIVER_ENTRY_POINT = AsyncEntryPointSpec(
    id="ticket_receiver",
    name="Worker Escalation Ticket Receiver",
    entry_node="ticket_triage",
    trigger_type="event",
    trigger_config={
        "event_types": ["worker_escalation_ticket"],
        # Do not fire on our own graph's events (prevents loops if queen
        # somehow emits a worker_escalation_ticket for herself)
        "exclude_own_graph": True,
    },
    isolation_level="isolated",
)
+16 -2
@@ -6,7 +6,6 @@ Usage:
    hive info exports/my-agent
    hive validate exports/my-agent
    hive list exports/
    hive dispatch exports/ --input '{"key": "value"}'
    hive shell exports/my-agent

Testing commands:
@@ -79,7 +78,7 @@ def main():

    subparsers = parser.add_subparsers(dest="command", required=True)

    # Register runner commands (run, info, validate, list, dispatch, shell)
    # Register runner commands (run, info, validate, list, shell)
    from framework.runner.cli import register_commands

    register_commands(subparsers)
@@ -89,6 +88,21 @@ def main():

    register_testing_commands(subparsers)

    # Register skill commands (skill list, skill trust, ...)
    from framework.skills.cli import register_skill_commands

    register_skill_commands(subparsers)

    # Register debugger commands (debugger)
    from framework.debugger.cli import register_debugger_commands

    register_debugger_commands(subparsers)

    # Register MCP registry commands (mcp install, mcp add, ...)
    from framework.runner.mcp_registry_cli import register_mcp_commands

    register_mcp_commands(subparsers)

    args = parser.parse_args()

    if hasattr(args, "func"):

+270 -2
@@ -19,6 +19,10 @@ from framework.graph.edge import DEFAULT_MAX_TOKENS
# ---------------------------------------------------------------------------

HIVE_CONFIG_FILE = Path.home() / ".hive" / "configuration.json"

# Hive LLM router endpoint (Anthropic-compatible).
# litellm's Anthropic handler appends /v1/messages, so this is just the base host.
HIVE_LLM_ENDPOINT = "https://api.adenhq.com"
logger = logging.getLogger(__name__)


@@ -47,16 +51,169 @@ def get_preferred_model() -> str:
    """Return the user's preferred LLM model string (e.g. 'anthropic/claude-sonnet-4-20250514')."""
    llm = get_hive_config().get("llm", {})
    if llm.get("provider") and llm.get("model"):
        return f"{llm['provider']}/{llm['model']}"
        provider = str(llm["provider"])
        model = str(llm["model"]).strip()
        # OpenRouter quickstart stores raw model IDs; tolerate pasted "openrouter/<id>" too.
        if provider.lower() == "openrouter" and model.lower().startswith("openrouter/"):
            model = model[len("openrouter/") :]
        if model:
            return f"{provider}/{model}"
    return "anthropic/claude-sonnet-4-20250514"


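# Example (illustrative values): with ~/.hive/configuration.json containing
#     {"llm": {"provider": "openrouter", "model": "openrouter/qwen/qwen3-coder"}}
# get_preferred_model() returns "openrouter/qwen/qwen3-coder": the pasted
# "openrouter/" prefix is stripped before the provider is re-joined.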
def get_preferred_worker_model() -> str | None:
|
||||
"""Return the user's preferred worker LLM model, or None if not configured.
|
||||
|
||||
Reads from the ``worker_llm`` section of ~/.hive/configuration.json.
|
||||
Returns None when no worker-specific model is set, so callers can
|
||||
fall back to the default (queen) model via ``get_preferred_model()``.
|
||||
"""
|
||||
worker_llm = get_hive_config().get("worker_llm", {})
|
||||
if worker_llm.get("provider") and worker_llm.get("model"):
|
||||
provider = str(worker_llm["provider"])
|
||||
model = str(worker_llm["model"]).strip()
|
||||
if provider.lower() == "openrouter" and model.lower().startswith("openrouter/"):
|
||||
model = model[len("openrouter/") :]
|
||||
if model:
|
||||
return f"{provider}/{model}"
|
||||
return None
|
||||
|
||||
|
||||
def get_worker_api_key() -> str | None:
|
||||
"""Return the API key for the worker LLM, falling back to the default key."""
|
||||
worker_llm = get_hive_config().get("worker_llm", {})
|
||||
if not worker_llm:
|
||||
return get_api_key()
|
||||
|
||||
# Worker-specific subscription / env var
|
||||
if worker_llm.get("use_claude_code_subscription"):
|
||||
try:
|
||||
from framework.runner.runner import get_claude_code_token
|
||||
|
||||
token = get_claude_code_token()
|
||||
if token:
|
||||
return token
|
||||
except ImportError:
|
||||
pass
|
||||
|
||||
if worker_llm.get("use_codex_subscription"):
|
||||
try:
|
||||
from framework.runner.runner import get_codex_token
|
||||
|
||||
token = get_codex_token()
|
||||
if token:
|
||||
return token
|
||||
except ImportError:
|
||||
pass
|
||||
|
||||
if worker_llm.get("use_kimi_code_subscription"):
|
||||
try:
|
||||
from framework.runner.runner import get_kimi_code_token
|
||||
|
||||
token = get_kimi_code_token()
|
||||
if token:
|
||||
return token
|
||||
except ImportError:
|
||||
pass
|
||||
|
||||
if worker_llm.get("use_antigravity_subscription"):
|
||||
try:
|
||||
from framework.runner.runner import get_antigravity_token
|
||||
|
||||
token = get_antigravity_token()
|
||||
if token:
|
||||
return token
|
||||
except ImportError:
|
||||
pass
|
||||
|
||||
api_key_env_var = worker_llm.get("api_key_env_var")
|
||||
if api_key_env_var:
|
||||
return os.environ.get(api_key_env_var)
|
||||
|
||||
# Fall back to default key
|
||||
return get_api_key()
|
||||
|
||||
|
||||
def get_worker_api_base() -> str | None:
|
||||
"""Return the api_base for the worker LLM, falling back to the default."""
|
||||
worker_llm = get_hive_config().get("worker_llm", {})
|
||||
if not worker_llm:
|
||||
return get_api_base()
|
||||
|
||||
if worker_llm.get("use_codex_subscription"):
|
||||
return "https://chatgpt.com/backend-api/codex"
|
||||
if worker_llm.get("use_kimi_code_subscription"):
|
||||
return "https://api.kimi.com/coding"
|
||||
if worker_llm.get("use_antigravity_subscription"):
|
||||
# Antigravity uses AntigravityProvider directly — no api_base needed.
|
||||
return None
|
||||
if worker_llm.get("api_base"):
|
||||
return worker_llm["api_base"]
|
||||
if str(worker_llm.get("provider", "")).lower() == "openrouter":
|
||||
return OPENROUTER_API_BASE
|
||||
return None
|
||||
|
||||
|
||||
def get_worker_llm_extra_kwargs() -> dict[str, Any]:
|
||||
"""Return extra kwargs for the worker LLM provider."""
|
||||
worker_llm = get_hive_config().get("worker_llm", {})
|
||||
if not worker_llm:
|
||||
return get_llm_extra_kwargs()
|
||||
|
||||
if worker_llm.get("use_claude_code_subscription"):
|
||||
api_key = get_worker_api_key()
|
||||
if api_key:
|
||||
return {
|
||||
"extra_headers": {"authorization": f"Bearer {api_key}"},
|
||||
}
|
||||
if worker_llm.get("use_codex_subscription"):
|
||||
api_key = get_worker_api_key()
|
||||
if api_key:
|
||||
headers: dict[str, str] = {
|
||||
"Authorization": f"Bearer {api_key}",
|
||||
"User-Agent": "CodexBar",
|
||||
}
|
||||
try:
|
||||
from framework.runner.runner import get_codex_account_id
|
||||
|
||||
account_id = get_codex_account_id()
|
||||
if account_id:
|
||||
headers["ChatGPT-Account-Id"] = account_id
|
||||
except ImportError:
|
||||
pass
|
||||
return {
|
||||
"extra_headers": headers,
|
||||
"store": False,
|
||||
"allowed_openai_params": ["store"],
|
||||
}
|
||||
if worker_llm.get("provider") == "ollama":
|
||||
return {"num_ctx": worker_llm.get("num_ctx", 16384)}
|
||||
return {}
|
||||
|
||||
|
||||
def get_worker_max_tokens() -> int:
|
||||
"""Return max_tokens for the worker LLM, falling back to default."""
|
||||
worker_llm = get_hive_config().get("worker_llm", {})
|
||||
if worker_llm and "max_tokens" in worker_llm:
|
||||
return worker_llm["max_tokens"]
|
||||
return get_max_tokens()
|
||||
|
||||
|
||||
def get_worker_max_context_tokens() -> int:
|
||||
"""Return max_context_tokens for the worker LLM, falling back to default."""
|
||||
worker_llm = get_hive_config().get("worker_llm", {})
|
||||
if worker_llm and "max_context_tokens" in worker_llm:
|
||||
return worker_llm["max_context_tokens"]
|
||||
return get_max_context_tokens()
|
||||
|
||||
|
||||
def get_max_tokens() -> int:
|
||||
"""Return the configured max_tokens, falling back to DEFAULT_MAX_TOKENS."""
|
||||
return get_hive_config().get("llm", {}).get("max_tokens", DEFAULT_MAX_TOKENS)
|
||||
|
||||
|
||||
DEFAULT_MAX_CONTEXT_TOKENS = 32_000
|
||||
OPENROUTER_API_BASE = "https://openrouter.ai/api/v1"
|
||||
|
||||
|
||||
def get_max_context_tokens() -> int:
|
||||
@@ -109,6 +266,17 @@ def get_api_key() -> str | None:
|
||||
    except ImportError:
        pass

    # Antigravity subscription: read OAuth token from accounts JSON
    if llm.get("use_antigravity_subscription"):
        try:
            from framework.runner.runner import get_antigravity_token

            token = get_antigravity_token()
            if token:
                return token
        except ImportError:
            pass

    # Standard env-var path (covers ZAI Code and all API-key providers)
    api_key_env_var = llm.get("api_key_env_var")
    if api_key_env_var:
@@ -116,11 +284,99 @@ def get_api_key() -> str | None:
    return None


# OAuth credentials for Antigravity are fetched from the opencode-antigravity-auth project.
# This project reverse-engineered and published the public OAuth credentials
# for Google's Antigravity/Cloud Code Assist API.
# Source: https://github.com/NoeFabris/opencode-antigravity-auth
_ANTIGRAVITY_CREDENTIALS_URL = (
    "https://raw.githubusercontent.com/NoeFabris/opencode-antigravity-auth/dev/src/constants.ts"
)
_antigravity_credentials_cache: tuple[str | None, str | None] = (None, None)


def _fetch_antigravity_credentials() -> tuple[str | None, str | None]:
    """Fetch OAuth client ID and secret from the public npm package source on GitHub."""
    global _antigravity_credentials_cache
    if _antigravity_credentials_cache[0] and _antigravity_credentials_cache[1]:
        return _antigravity_credentials_cache

    import re
    import urllib.request

    try:
        req = urllib.request.Request(
            _ANTIGRAVITY_CREDENTIALS_URL, headers={"User-Agent": "Hive/1.0"}
        )
        with urllib.request.urlopen(req, timeout=10) as resp:
            content = resp.read().decode("utf-8")
        id_match = re.search(r'ANTIGRAVITY_CLIENT_ID\s*=\s*"([^"]+)"', content)
        secret_match = re.search(r'ANTIGRAVITY_CLIENT_SECRET\s*=\s*"([^"]+)"', content)
        client_id = id_match.group(1) if id_match else None
        client_secret = secret_match.group(1) if secret_match else None
        if client_id and client_secret:
            _antigravity_credentials_cache = (client_id, client_secret)
            return client_id, client_secret
    except Exception as e:
        logger.debug("Failed to fetch Antigravity credentials from public source: %s", e)
    return None, None


def get_antigravity_client_id() -> str:
    """Return the Antigravity OAuth application client ID.

    Checked in order:
    1. ``ANTIGRAVITY_CLIENT_ID`` environment variable
    2. ``llm.antigravity_client_id`` in ~/.hive/configuration.json
    3. Fetch from public source (opencode-antigravity-auth project on GitHub)
    """
    env = os.environ.get("ANTIGRAVITY_CLIENT_ID")
    if env:
        return env
    cfg_val = get_hive_config().get("llm", {}).get("antigravity_client_id")
    if cfg_val:
        return cfg_val
    # Fetch from public source
    client_id, _ = _fetch_antigravity_credentials()
    if client_id:
        return client_id
    raise RuntimeError("Could not obtain Antigravity OAuth client ID")


def get_antigravity_client_secret() -> str | None:
    """Return the Antigravity OAuth client secret.

    Checked in order:
    1. ``ANTIGRAVITY_CLIENT_SECRET`` environment variable
    2. ``llm.antigravity_client_secret`` in ~/.hive/configuration.json
    3. Fetch from public source (opencode-antigravity-auth project on GitHub)

    Returns None when not found — token refresh will be skipped and
    the caller must use whatever access token is already available.
    """
    env = os.environ.get("ANTIGRAVITY_CLIENT_SECRET")
    if env:
        return env
    cfg_val = get_hive_config().get("llm", {}).get("antigravity_client_secret") or None
    if cfg_val:
        return cfg_val
    # Fetch from public source
    _, secret = _fetch_antigravity_credentials()
    return secret
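End to end, the two getters above share one resolution chain: environment variable, then configuration file, then the public GitHub source. A minimal sketch of exercising it (the env-var value below is a placeholder, not a real credential):

import os

os.environ["ANTIGRAVITY_CLIENT_ID"] = "example-client-id"  # placeholder value
assert get_antigravity_client_id() == "example-client-id"  # env wins, no network call

del os.environ["ANTIGRAVITY_CLIENT_ID"]
# With no env var and no llm.antigravity_client_id in ~/.hive/configuration.json,
# the next call falls through to _fetch_antigravity_credentials(), which hits
# GitHub once and caches the result for the rest of the process.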

def get_gcu_enabled() -> bool:
    """Return whether GCU (browser automation) is enabled in user config."""
    return get_hive_config().get("gcu_enabled", True)


def get_gcu_viewport_scale() -> float:
    """Return GCU viewport scale factor (0.1-1.0), default 0.8."""
    scale = get_hive_config().get("gcu_viewport_scale", 0.8)
    if isinstance(scale, (int, float)) and 0.1 <= scale <= 1.0:
        return float(scale)
    return 0.8


def get_api_base() -> str | None:
    """Return the api_base URL for OpenAI-compatible endpoints, if configured."""
    llm = get_hive_config().get("llm", {})
@@ -130,7 +386,14 @@ def get_api_base() -> str | None:
    if llm.get("use_kimi_code_subscription"):
        # Kimi Code uses an Anthropic-compatible endpoint (no /v1 suffix).
        return "https://api.kimi.com/coding"
    return llm.get("api_base")
    if llm.get("use_antigravity_subscription"):
        # Antigravity uses AntigravityProvider directly — no api_base needed.
        return None
    if llm.get("api_base"):
        return llm["api_base"]
    if str(llm.get("provider", "")).lower() == "openrouter":
        return OPENROUTER_API_BASE
    return None

def get_llm_extra_kwargs() -> dict[str, Any]:
@@ -171,6 +434,11 @@ def get_llm_extra_kwargs() -> dict[str, Any]:
            "store": False,
            "allowed_openai_params": ["store"],
        }
    if llm.get("provider") == "ollama":
        # Pass num_ctx to Ollama so it doesn't silently truncate the ~9.5k Queen prompt.
        # Ollama's default num_ctx is only 2048. We set it to 16384 here so LiteLLM
        # passes it through as a provider-specific option.
        return {"num_ctx": llm.get("num_ctx", 16384)}
    return {}
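A quick sketch of how the Ollama branch plays out downstream. The completion wrapper used in this codebase is not shown in the diff, so the call shape is illustrative; forwarding provider-specific kwargs like num_ctx is documented LiteLLM behavior.

import litellm

extra = {"num_ctx": 16384}  # what get_llm_extra_kwargs() returns for provider == "ollama"
response = litellm.completion(
    model="ollama/llama3",  # hypothetical model name
    messages=[{"role": "user", "content": "hello"}],
    **extra,  # forwarded so Ollama allocates a 16k context instead of its 2048 default
)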
@@ -142,13 +142,17 @@ def save_aden_api_key(key: str) -> None:
    os.environ[ADEN_ENV_VAR] = key


def delete_aden_api_key() -> None:
    """Remove ADEN_API_KEY from the encrypted store and ``os.environ``."""
def delete_aden_api_key() -> bool:
    """Remove ADEN_API_KEY from the encrypted store and ``os.environ``.

    Returns True if the key existed and was deleted, False otherwise.
    """
    deleted = False
    try:
        from .storage import EncryptedFileStorage

        storage = EncryptedFileStorage()
        storage.delete(ADEN_CREDENTIAL_ID)
        deleted = storage.delete(ADEN_CREDENTIAL_ID)
    except (FileNotFoundError, PermissionError) as e:
        logger.debug("Could not delete %s from encrypted store: %s", ADEN_CREDENTIAL_ID, e)
    except Exception:
@@ -157,8 +161,8 @@ def delete_aden_api_key() -> None:
            ADEN_CREDENTIAL_ID,
            exc_info=True,
        )

    os.environ.pop(ADEN_ENV_VAR, None)
    return deleted
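The signature change turns deletion into a reportable operation. A minimal, hypothetical caller (the CLI code that consumes this return value is not part of this diff):

if delete_aden_api_key():
    print("ADEN_API_KEY removed.")
else:
    print("No stored ADEN_API_KEY found.")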
# ---------------------------------------------------------------------------

@@ -27,6 +27,7 @@ from __future__ import annotations

import getpass
import json
import logging
import os
import sys
from collections.abc import Callable
@@ -37,6 +38,8 @@ from typing import TYPE_CHECKING, Any
if TYPE_CHECKING:
    from framework.graph import NodeSpec

logger = logging.getLogger(__name__)


# ANSI colors for terminal output
class Colors:
@@ -365,8 +368,11 @@ class CredentialSetupSession:
        self._print("")
        try:
            api_key = self.password_fn(f"Paste your {cred.env_var}: ").strip()
        except (EOFError, OSError) as exc:
            logger.debug("Password input unavailable, falling back to plain input: %s", exc)
            api_key = self._input(f"Paste your {cred.env_var}: ").strip()
        except Exception:
            # Fallback to regular input if password input fails
            logger.warning("Unexpected error reading password input", exc_info=True)
            api_key = self._input(f"Paste your {cred.env_var}: ").strip()

        if not api_key:
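The pattern in this hunk splits the old broad except into a cheap expected case (non-interactive stdin) and a loud unexpected case. A standalone sketch of the same fallback, assuming password_fn defaults to getpass.getpass:

import getpass
import logging

logger = logging.getLogger(__name__)

def read_secret(prompt: str) -> str:
    try:
        return getpass.getpass(prompt).strip()
    except (EOFError, OSError) as exc:
        # Expected when stdin is not a TTY (pipes, CI): fall back quietly.
        logger.debug("Password input unavailable: %s", exc)
    except Exception:
        # Anything else is a real bug worth a traceback.
        logger.warning("Unexpected error reading password input", exc_info=True)
    return input(prompt).strip()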
@@ -403,7 +409,11 @@ class CredentialSetupSession:

        try:
            aden_key = self.password_fn("Paste your ADEN_API_KEY: ").strip()
        except (EOFError, OSError) as exc:
            logger.debug("Password input unavailable for ADEN_API_KEY: %s", exc)
            aden_key = self._input("Paste your ADEN_API_KEY: ").strip()
        except Exception:
            logger.warning("Unexpected error reading ADEN_API_KEY input", exc_info=True)
            aden_key = self._input("Paste your ADEN_API_KEY: ").strip()

        if not aden_key:
@@ -433,8 +443,10 @@ class CredentialSetupSession:
                value = store.get_key(cred_id, cred.credential_key)
                if value:
                    os.environ[cred.env_var] = value
            except (KeyError, OSError) as exc:
                logger.debug("Could not export credential to env: %s", exc)
            except Exception:
                pass
                logger.warning("Unexpected error exporting credential to env", exc_info=True)
            return True
        else:
            self._print(
@@ -457,9 +469,12 @@ class CredentialSetupSession:
                "message": result.message,
                "details": result.details,
            }
        except Exception:
        except ImportError:
            # No health checker available
            return None
        except Exception:
            logger.warning("Health check failed for %s", cred.credential_name, exc_info=True)
            return None

    def _store_credential(self, cred: MissingCredential, value: str) -> None:
        """Store credential in encrypted store and export to env."""
@@ -561,7 +576,11 @@ def _load_nodes_from_python_agent(agent_path: Path) -> list:
        sys.modules[spec.name] = module
        spec.loader.exec_module(module)
        return getattr(module, "nodes", [])
    except (ImportError, OSError) as exc:
        logger.debug("Could not load agent module: %s", exc)
        return []
    except Exception:
        logger.warning("Unexpected error loading agent module", exc_info=True)
        return []

@@ -588,7 +607,11 @@ def _load_nodes_from_json_agent(agent_json: Path) -> list:
            )
        )
        return nodes
    except (json.JSONDecodeError, KeyError, OSError) as exc:
        logger.debug("Could not load JSON agent: %s", exc)
        return []
    except Exception:
        logger.warning("Unexpected error loading JSON agent", exc_info=True)
        return []


@@ -51,6 +51,16 @@ def ensure_credential_key_env() -> None:
            if found and value:
                os.environ[var_name] = value
                logger.debug("Loaded %s from shell config", var_name)
        # Also load the currently configured LLM env var even if it's not in CREDENTIAL_SPECS.
        # This keeps quickstart-written keys available to fresh processes on Unix shells.
        from framework.config import get_hive_config

        llm_env_var = str(get_hive_config().get("llm", {}).get("api_key_env_var", "")).strip()
        if llm_env_var and not os.environ.get(llm_env_var):
            found, value = check_env_var_in_shell_config(llm_env_var)
            if found and value:
                os.environ[llm_env_var] = value
                logger.debug("Loaded configured LLM env var %s from shell config", llm_env_var)
    except ImportError:
        pass
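The added block gives quickstart-written keys the same shell-config fallback as the built-in CREDENTIAL_SPECS. Sketched as a resolution order for one variable (MY_LLM_KEY is made up; check_env_var_in_shell_config is the helper used in the hunk above, and parsing shell rc files is an assumption about its behavior):

# llm.api_key_env_var == "MY_LLM_KEY", fresh process on a Unix shell
# 1. os.environ.get("MY_LLM_KEY")                -> None (not inherited)
# 2. check_env_var_in_shell_config("MY_LLM_KEY") -> (True, "sk-...") from shell config
# 3. os.environ["MY_LLM_KEY"] = "sk-..."         -> later get_api_key() now succeeds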
@@ -0,0 +1,76 @@
"""CLI command for the LLM debug log viewer."""

import argparse
import subprocess
import sys
from pathlib import Path

_SCRIPT = Path(__file__).resolve().parents[3] / "scripts" / "llm_debug_log_visualizer.py"


def register_debugger_commands(subparsers: argparse._SubParsersAction) -> None:
    """Register the ``hive debugger`` command."""
    parser = subparsers.add_parser(
        "debugger",
        help="Open the LLM debug log viewer",
        description=(
            "Start a local server that lets you browse LLM debug sessions "
            "recorded in ~/.hive/llm_logs. Sessions are loaded on demand so "
            "the browser stays responsive."
        ),
    )
    parser.add_argument(
        "--session",
        help="Execution ID to select initially.",
    )
    parser.add_argument(
        "--port",
        type=int,
        default=0,
        help="Port for the local server (0 = auto-pick a free port).",
    )
    parser.add_argument(
        "--logs-dir",
        help="Directory containing JSONL log files (default: ~/.hive/llm_logs).",
    )
    parser.add_argument(
        "--limit-files",
        type=int,
        default=None,
        help="Maximum number of newest log files to scan (default: 200).",
    )
    parser.add_argument(
        "--output",
        help="Write a static HTML file instead of starting a server.",
    )
    parser.add_argument(
        "--no-open",
        action="store_true",
        help="Start the server but do not open a browser.",
    )
    parser.add_argument(
        "--include-tests",
        action="store_true",
        help="Show test/mock sessions (hidden by default).",
    )
    parser.set_defaults(func=cmd_debugger)


def cmd_debugger(args: argparse.Namespace) -> int:
    """Launch the LLM debug log visualizer."""
    cmd: list[str] = [sys.executable, str(_SCRIPT)]
    if args.session:
        cmd += ["--session", args.session]
    if args.port:
        cmd += ["--port", str(args.port)]
    if args.logs_dir:
        cmd += ["--logs-dir", args.logs_dir]
    if args.limit_files is not None:
        cmd += ["--limit-files", str(args.limit_files)]
    if args.output:
        cmd += ["--output", args.output]
    if args.no_open:
        cmd.append("--no-open")
    if args.include_tests:
        cmd.append("--include-tests")
    return subprocess.call(cmd)
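For context, this is how the new module would plug into an argparse-based CLI; the root parser below is illustrative, only register_debugger_commands and cmd_debugger come from the diff:

import argparse

parser = argparse.ArgumentParser(prog="hive")  # hypothetical root parser
subparsers = parser.add_subparsers(dest="command")
register_debugger_commands(subparsers)

args = parser.parse_args(["debugger", "--port", "8765", "--no-open"])
raise SystemExit(args.func(args))  # dispatches to cmd_debugger via set_defaults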
@@ -1,11 +1,6 @@
"""Graph structures: Goals, Nodes, Edges, and Execution."""

from framework.graph.client_io import (
    ActiveNodeClientIO,
    ClientIOGateway,
    InertNodeClientIO,
    NodeClientIO,
)
from framework.graph.context import GraphContext
from framework.graph.context_handoff import ContextHandoff, HandoffContext
from framework.graph.conversation import ConversationStore, Message, NodeConversation
from framework.graph.edge import DEFAULT_MAX_TOKENS, EdgeCondition, EdgeSpec, GraphSpec
@@ -19,6 +14,14 @@ from framework.graph.event_loop_node import (
from framework.graph.executor import GraphExecutor
from framework.graph.goal import Constraint, Goal, GoalStatus, SuccessCriterion
from framework.graph.node import NodeContext, NodeProtocol, NodeResult, NodeSpec
from framework.graph.worker_agent import (
    Activation,
    FanOutTag,
    FanOutTracker,
    WorkerAgent,
    WorkerCompletion,
    WorkerLifecycle,
)

__all__ = [
    # Goal
@@ -51,9 +54,12 @@ __all__ = [
    # Context Handoff
    "ContextHandoff",
    "HandoffContext",
    # Client I/O
    "NodeClientIO",
    "ActiveNodeClientIO",
    "InertNodeClientIO",
    "ClientIOGateway",
    # Worker Agent
    "WorkerAgent",
    "WorkerLifecycle",
    "WorkerCompletion",
    "Activation",
    "FanOutTag",
    "FanOutTracker",
    "GraphContext",
]
@@ -59,6 +59,13 @@ class ActiveNodeClientIO(NodeClientIO):
        self._input_result: str | None = None

    async def emit_output(self, content: str, is_final: bool = False) -> None:
        # Strip leading whitespace from first output chunk to avoid leading spaces
        # (some LLMs like Kimi output leading whitespace before text)
        if not self._output_snapshot and content:
            content = content.lstrip()
            if not content:  # Content was all whitespace
                return

        self._output_snapshot += content
        await self._output_queue.put(content)
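Behavior of the guard above, traced by hand (snapshot values only; the queue plumbing is elided):

# snapshot == ""       emit_output("  Hello")  -> lstrip -> snapshot "Hello"
# snapshot == "Hello"  emit_output("  world")  -> kept verbatim -> "Hello  world"
# snapshot == ""       emit_output("   ")      -> lstrip -> "" -> early return, nothing queued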
@@ -0,0 +1,323 @@
"""Shared graph execution context helpers.

This module centralizes:
- Graph-run shared state (`GraphContext`)
- Scoped buffer permission shaping for a node
- Per-node accounts prompt resolution
- Canonical `NodeContext` construction
"""

from __future__ import annotations

import asyncio
from dataclasses import dataclass, field
from typing import Any

from framework.graph.edge import GraphSpec
from framework.graph.goal import Goal
from framework.graph.node import DataBuffer, NodeContext, NodeProtocol, NodeSpec
from framework.runtime.core import Runtime


@dataclass
class GraphContext:
    """Shared state for one graph execution run."""

    graph: GraphSpec
    goal: Goal
    buffer: DataBuffer
    runtime: Runtime
    llm: Any  # LLMProvider
    tools: list[Any]  # list[Tool]
    tool_executor: Any  # Callable
    event_bus: Any  # GraphScopedEventBus
    execution_id: str
    stream_id: str
    run_id: str
    storage_path: Any  # Path | None
    runtime_logger: Any = None
    node_registry: dict[str, NodeProtocol] = field(default_factory=dict)
    node_spec_registry: dict[str, NodeSpec] = field(default_factory=dict)
    parallel_config: Any = None  # ParallelExecutionConfig | None
    enable_parallel_execution: bool = True
    is_continuous: bool = False
    continuous_conversation: Any = None
    cumulative_tools: list[Any] = field(default_factory=list)
    cumulative_tool_names: set[str] = field(default_factory=set)
    cumulative_output_keys: list[str] = field(default_factory=list)
    accounts_prompt: str = ""
    accounts_data: list[dict] | None = None
    tool_provider_map: dict[str, str] | None = None
    skills_catalog_prompt: str = ""
    protocols_prompt: str = ""
    skill_dirs: list[str] = field(default_factory=list)
    context_warn_ratio: float | None = None
    batch_init_nudge: str | None = None
    dynamic_tools_provider: Any = None
    dynamic_prompt_provider: Any = None
    dynamic_memory_provider: Any = None
    iteration_metadata_provider: Any = None
    loop_config: dict[str, Any] = field(default_factory=dict)
    path: list[str] = field(default_factory=list)
    node_visit_counts: dict[str, int] = field(default_factory=dict)
    _path_lock: asyncio.Lock = field(default_factory=asyncio.Lock)
    _visits_lock: asyncio.Lock = field(default_factory=asyncio.Lock)
    # Fan-out buffer conflict tracking: key → worker_id that wrote it
    _fanout_written_keys: dict[str, str] = field(default_factory=dict)
    # Retry tracking: worker_id → retry_count (for execution quality assessment)
    retry_counts: dict[str, int] = field(default_factory=dict)
    nodes_with_retries: set[str] = field(default_factory=set)


def build_scoped_buffer(buffer: DataBuffer, node_spec: NodeSpec) -> DataBuffer:
    """Create a node-scoped buffer view.

    When permissions are already restricted, auto-include framework-managed
    `_`-prefixed keys used by the default skill protocols.
    """

    read_keys = list(node_spec.input_keys)
    write_keys = list(node_spec.output_keys)

    if read_keys or write_keys:
        from framework.skills.defaults import DATA_BUFFER_KEYS as _skill_keys

        existing_underscore = [k for k in buffer._data if k.startswith("_")]
        extra_keys = set(_skill_keys) | set(existing_underscore)

        for key in extra_keys:
            if read_keys and key not in read_keys:
                read_keys.append(key)
            if write_keys and key not in write_keys:
                write_keys.append(key)

    return buffer.with_permissions(read_keys=read_keys, write_keys=write_keys)


def build_node_accounts_prompt(
    *,
    accounts_prompt: str,
    accounts_data: list[dict] | None,
    tool_provider_map: dict[str, str] | None,
    node_tool_names: list[str] | None,
    fallback_to_default: bool = False,
) -> str:
    """Resolve the accounts prompt for one node."""

    resolved = accounts_prompt
    if accounts_data and tool_provider_map:
        from framework.graph.prompting import build_accounts_prompt

        filtered = build_accounts_prompt(
            accounts_data,
            tool_provider_map,
            node_tool_names=node_tool_names,
        )
        if filtered or not fallback_to_default:
            resolved = filtered

    return resolved


def _resolve_available_tools(
    *,
    node_spec: NodeSpec,
    tools: list[Any],
    override_tools: list[Any] | None,
) -> list[Any]:
    """Select tools available to the current node."""

    if override_tools is not None:
        return list(override_tools)

    if not node_spec.tools:
        return []

    return [tool for tool in tools if tool.name in node_spec.tools]


def _derive_input_data(buffer: DataBuffer, input_keys: list[str]) -> dict[str, Any]:
    """Collect node inputs from the shared buffer."""

    input_data: dict[str, Any] = {}
    for key in input_keys:
        value = buffer.read(key)
        if value is not None:
            input_data[key] = value
    return input_data


def build_node_context(
    *,
    runtime: Runtime,
    node_spec: NodeSpec,
    buffer: DataBuffer,
    goal: Goal,
    llm: Any,
    tools: list[Any],
    max_tokens: int,
    input_data: dict[str, Any] | None = None,
    derive_input_data_from_buffer: bool = False,
    runtime_logger: Any = None,
    pause_event: Any = None,
    continuous_mode: bool = False,
    inherited_conversation: Any = None,
    override_tools: list[Any] | None = None,
    cumulative_output_keys: list[str] | None = None,
    event_triggered: bool = False,
    accounts_prompt: str = "",
    accounts_data: list[dict] | None = None,
    tool_provider_map: dict[str, str] | None = None,
    fallback_to_default_accounts_prompt: bool = False,
    identity_prompt: str = "",
    narrative: str = "",
    execution_id: str = "",
    run_id: str = "",
    stream_id: str = "",
    node_registry: dict[str, NodeSpec] | None = None,
    all_tools: list[Any] | None = None,
    shared_node_registry: dict[str, NodeProtocol] | None = None,
    dynamic_tools_provider: Any = None,
    dynamic_prompt_provider: Any = None,
    dynamic_memory_provider: Any = None,
    iteration_metadata_provider: Any = None,
    skills_catalog_prompt: str = "",
    protocols_prompt: str = "",
    skill_dirs: list[str] | None = None,
    default_skill_warn_ratio: float | None = None,
    default_skill_batch_nudge: str | None = None,
    memory_prompt: str = "",
) -> NodeContext:
    """Build a canonical `NodeContext` for graph execution."""

    available_tools = _resolve_available_tools(
        node_spec=node_spec,
        tools=tools,
        override_tools=override_tools,
    )
    scoped_buffer = build_scoped_buffer(buffer, node_spec)
    node_accounts_prompt = build_node_accounts_prompt(
        accounts_prompt=accounts_prompt,
        accounts_data=accounts_data,
        tool_provider_map=tool_provider_map,
        node_tool_names=node_spec.tools,
        fallback_to_default=fallback_to_default_accounts_prompt,
    )

    resolved_input_data = (
        _derive_input_data(buffer, node_spec.input_keys)
        if input_data is None and derive_input_data_from_buffer
        else dict(input_data or {})
    )

    return NodeContext(
        runtime=runtime,
        node_id=node_spec.id,
        node_spec=node_spec,
        buffer=scoped_buffer,
        input_data=resolved_input_data,
        llm=llm,
        available_tools=available_tools,
        goal_context=goal.to_prompt_context(),
        goal=goal,
        max_tokens=max_tokens,
        runtime_logger=runtime_logger,
        pause_event=pause_event,
        continuous_mode=continuous_mode,
        inherited_conversation=inherited_conversation,
        cumulative_output_keys=cumulative_output_keys or [],
        event_triggered=event_triggered,
        accounts_prompt=node_accounts_prompt,
        identity_prompt=identity_prompt,
        narrative=narrative,
        memory_prompt=memory_prompt,
        execution_id=execution_id,
        run_id=run_id,
        stream_id=stream_id,
        node_registry=node_registry or {},
        all_tools=list(all_tools or tools),
        shared_node_registry=shared_node_registry or {},
        dynamic_tools_provider=dynamic_tools_provider,
        dynamic_prompt_provider=dynamic_prompt_provider,
        dynamic_memory_provider=dynamic_memory_provider,
        iteration_metadata_provider=iteration_metadata_provider,
        skills_catalog_prompt=skills_catalog_prompt,
        protocols_prompt=protocols_prompt,
        skill_dirs=list(skill_dirs or []),
        default_skill_warn_ratio=default_skill_warn_ratio,
        default_skill_batch_nudge=default_skill_batch_nudge,
    )


def build_node_context_from_graph_context(
    graph_context: GraphContext,
    *,
    node_spec: NodeSpec,
    pause_event: Any = None,
    input_data: dict[str, Any] | None = None,
    derive_input_data_from_buffer: bool = True,
    override_tools: list[Any] | None = None,
    inherited_conversation: Any = None,
    cumulative_output_keys: list[str] | None = None,
    event_triggered: bool = False,
    identity_prompt: str | None = None,
    narrative: str = "",
    node_registry: dict[str, NodeSpec] | None = None,
    fallback_to_default_accounts_prompt: bool = True,
) -> NodeContext:
    """Build `NodeContext` using shared graph-run state."""

    gc = graph_context
    resolved_override_tools = override_tools
    if resolved_override_tools is None and gc.is_continuous and gc.cumulative_tools:
        resolved_override_tools = list(gc.cumulative_tools)

    resolved_inherited_conversation = inherited_conversation
    if resolved_inherited_conversation is None and gc.is_continuous:
        resolved_inherited_conversation = gc.continuous_conversation

    resolved_output_keys = cumulative_output_keys
    if resolved_output_keys is None and gc.is_continuous:
        resolved_output_keys = list(gc.cumulative_output_keys)

    return build_node_context(
        runtime=gc.runtime,
        node_spec=node_spec,
        buffer=gc.buffer,
        goal=gc.goal,
        llm=gc.llm,
        tools=gc.tools,
        max_tokens=gc.graph.max_tokens,
        input_data=input_data,
        derive_input_data_from_buffer=derive_input_data_from_buffer,
        runtime_logger=gc.runtime_logger,
        pause_event=pause_event,
        continuous_mode=gc.is_continuous,
        inherited_conversation=resolved_inherited_conversation,
        override_tools=resolved_override_tools,
        cumulative_output_keys=resolved_output_keys,
        event_triggered=event_triggered,
        accounts_prompt=gc.accounts_prompt,
        accounts_data=gc.accounts_data,
        tool_provider_map=gc.tool_provider_map,
        fallback_to_default_accounts_prompt=fallback_to_default_accounts_prompt,
        identity_prompt=identity_prompt
        if identity_prompt is not None
        else getattr(gc.graph, "identity_prompt", "") or "",
        narrative=narrative,
        execution_id=gc.execution_id,
        run_id=gc.run_id,
        stream_id=gc.stream_id,
        node_registry=node_registry or gc.node_spec_registry,
        all_tools=gc.tools,
        shared_node_registry=gc.node_registry,
        dynamic_tools_provider=gc.dynamic_tools_provider,
        dynamic_prompt_provider=gc.dynamic_prompt_provider,
        dynamic_memory_provider=gc.dynamic_memory_provider,
        iteration_metadata_provider=gc.iteration_metadata_provider,
        skills_catalog_prompt=gc.skills_catalog_prompt,
        protocols_prompt=gc.protocols_prompt,
        skill_dirs=gc.skill_dirs,
        default_skill_warn_ratio=gc.context_warn_ratio,
        default_skill_batch_nudge=gc.batch_init_nudge,
    )
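One detail in build_scoped_buffer worth tracing: the underscore-key auto-include only augments key lists that are already non-empty, so a node with no declared keys keeps unrestricted access (the docstring reads an empty list as "not restricted"). A hand trace with made-up values:

# node_spec.input_keys = ["ticket"], node_spec.output_keys = []
# DATA_BUFFER_KEYS = {"_protocol_state"}   (illustrative value)
# buffer._data contains a "_scratch" key
# read_keys  -> ["ticket", "_protocol_state", "_scratch"]  (non-empty, so augmented)
# write_keys -> []                                          (empty, left alone)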
@@ -8,6 +8,13 @@ from dataclasses import dataclass
from pathlib import Path
from typing import Any, Literal, Protocol, runtime_checkable

LEGACY_RUN_ID = "__legacy_run__"


def is_legacy_run_id(run_id: str | None) -> bool:
    """True when run_id represents pre-migration (no run boundary) data."""
    return run_id is None or run_id == LEGACY_RUN_ID
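The sentinel gives pre-migration conversations a stable identity, which the restore path later uses to skip run filtering:

is_legacy_run_id(None)              # True:  data written before run IDs existed
is_legacy_run_id("__legacy_run__")  # True:  explicitly tagged legacy data
is_legacy_run_id("run-42")          # False: normal run-scoped data ("run-42" is a made-up ID)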

@dataclass
class Message:
@@ -33,10 +40,22 @@ class Message:
    is_transition_marker: bool = False
    # True when this message is real human input (from /chat), not a system prompt
    is_client_input: bool = False
    # Optional image content blocks (e.g. from browser_screenshot)
    image_content: list[dict[str, Any]] | None = None
    # True when message contains an activated skill body (AS-10: never prune)
    is_skill_content: bool = False
    # Logical worker run identifier for shared-session persistence
    run_id: str | None = None

    def to_llm_dict(self) -> dict[str, Any]:
        """Convert to OpenAI-format message dict."""
        if self.role == "user":
            if self.image_content:
                blocks: list[dict[str, Any]] = []
                if self.content:
                    blocks.append({"type": "text", "text": self.content})
                blocks.extend(self.image_content)
                return {"role": "user", "content": blocks}
            return {"role": "user", "content": self.content}

        if self.role == "assistant":
@@ -47,6 +66,15 @@ class Message:

        # role == "tool"
        content = f"ERROR: {self.content}" if self.is_error else self.content
        if self.image_content:
            # Multimodal tool result: text + image content blocks
            blocks: list[dict[str, Any]] = [{"type": "text", "text": content}]
            blocks.extend(self.image_content)
            return {
                "role": "tool",
                "tool_call_id": self.tool_use_id,
                "content": blocks,
            }
        return {
            "role": "tool",
            "tool_call_id": self.tool_use_id,
@@ -72,6 +100,10 @@ class Message:
            d["is_transition_marker"] = self.is_transition_marker
        if self.is_client_input:
            d["is_client_input"] = self.is_client_input
        if self.image_content is not None:
            d["image_content"] = self.image_content
        if self.run_id is not None:
            d["run_id"] = self.run_id
        return d

    @classmethod
@@ -87,9 +119,41 @@ class Message:
            phase_id=data.get("phase_id"),
            is_transition_marker=data.get("is_transition_marker", False),
            is_client_input=data.get("is_client_input", False),
            image_content=data.get("image_content"),
            run_id=data.get("run_id"),
        )


def _normalize_cursor(cursor: dict[str, Any] | None) -> dict[str, Any]:
    """Normalize legacy and run-scoped cursor formats into one flat shape."""
    return dict(cursor) if cursor else {}


def get_cursor_next_seq(cursor: dict[str, Any] | None) -> int | None:
    next_seq = (cursor or {}).get("next_seq")
    return next_seq if isinstance(next_seq, int) else None


def update_cursor_next_seq(cursor: dict[str, Any] | None, next_seq: int) -> dict[str, Any]:
    updated = dict(cursor or {})
    updated["next_seq"] = next_seq
    return updated


def get_run_cursor(cursor: dict[str, Any] | None, run_id: str | None) -> dict[str, Any] | None:
    return dict(cursor) if cursor else None


def update_run_cursor(
    cursor: dict[str, Any] | None,
    run_id: str | None,
    values: dict[str, Any],
) -> dict[str, Any]:
    updated = dict(cursor or {})
    updated.update(values)
    return updated


def _extract_spillover_filename(content: str) -> str | None:
    """Extract spillover filename from a tool result annotation.
@@ -239,7 +303,7 @@ class ConversationStore(Protocol):

    async def read_cursor(self) -> dict[str, Any] | None: ...

    async def delete_parts_before(self, seq: int) -> None: ...
    async def delete_parts_before(self, seq: int, run_id: str | None = None) -> None: ...

    async def close(self) -> None: ...

@@ -311,6 +375,7 @@ class NodeConversation:
        compaction_threshold: float = 0.8,
        output_keys: list[str] | None = None,
        store: ConversationStore | None = None,
        run_id: str | None = None,
    ) -> None:
        self._system_prompt = system_prompt
        self._max_context_tokens = max_context_tokens
@@ -322,6 +387,7 @@ class NodeConversation:
        self._meta_persisted: bool = False
        self._last_api_input_tokens: int | None = None
        self._current_phase: str | None = None
        self._run_id: str | None = run_id

    # --- Properties --------------------------------------------------------

@@ -373,17 +439,23 @@ class NodeConversation:
        *,
        is_transition_marker: bool = False,
        is_client_input: bool = False,
        image_content: list[dict[str, Any]] | None = None,
    ) -> Message:
        msg = Message(
            seq=self._next_seq,
            role="user",
            content=content,
            phase_id=self._current_phase,
            run_id=self._run_id,
            is_transition_marker=is_transition_marker,
            is_client_input=is_client_input,
            image_content=image_content,
        )
        self._messages.append(msg)
        self._next_seq += 1
        # Invalidate stale API token count so estimate_tokens() uses
        # the char-based heuristic which reflects the new message.
        self._last_api_input_tokens = None
        await self._persist(msg)
        return msg

@@ -398,9 +470,11 @@ class NodeConversation:
            content=content,
            tool_calls=tool_calls,
            phase_id=self._current_phase,
            run_id=self._run_id,
        )
        self._messages.append(msg)
        self._next_seq += 1
        self._last_api_input_tokens = None
        await self._persist(msg)
        return msg

@@ -409,6 +483,8 @@ class NodeConversation:
        tool_use_id: str,
        content: str,
        is_error: bool = False,
        image_content: list[dict[str, Any]] | None = None,
        is_skill_content: bool = False,
    ) -> Message:
        msg = Message(
            seq=self._next_seq,
@@ -417,9 +493,13 @@ class NodeConversation:
            tool_use_id=tool_use_id,
            is_error=is_error,
            phase_id=self._current_phase,
            image_content=image_content,
            is_skill_content=is_skill_content,
            run_id=self._run_id,
        )
        self._messages.append(msg)
        self._next_seq += 1
        self._last_api_input_tokens = None
        await self._persist(msg)
        return msg
@@ -500,12 +580,15 @@ class NodeConversation:

        Uses actual API input token count when available (set via
        :meth:`update_token_count`), otherwise falls back to a
        ``total_chars / 4`` heuristic that includes both message content
        AND tool_call argument sizes.
        character-based heuristic that includes message content, tool_call
        arguments, and image blocks. The heuristic applies a 4/3 safety
        margin to avoid under-counting (inspired by Claude Code's compact
        service).
        """
        if self._last_api_input_tokens is not None:
            return self._last_api_input_tokens
        total_chars = 0
        image_tokens = 0
        for m in self._messages:
            total_chars += len(m.content)
            if m.tool_calls:
@@ -513,7 +596,11 @@ class NodeConversation:
                    func = tc.get("function", {})
                    total_chars += len(func.get("arguments", ""))
                    total_chars += len(func.get("name", ""))
        return total_chars // 4
            if m.image_content:
                # Images/documents have a fixed token cost per block
                image_tokens += len(m.image_content) * 2000
        # Apply 4/3 safety margin to character-based estimate
        return (total_chars * 4) // (3 * 4) + image_tokens
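The margin arithmetic is easy to misread: (total_chars * 4) // (3 * 4) reduces to total_chars // 3, i.e. the old chars/4 estimate scaled by 4/3. Worked numbers:

# 12,000 chars, no images:
#   old estimate: 12_000 // 4         -> 3_000 tokens
#   new estimate: (12_000 * 4) // 12  -> 4_000 tokens (equals chars // 3)
# 12,000 chars plus 2 image blocks:
#   4_000 + 2 * 2_000                 -> 8_000 tokens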
    def update_token_count(self, actual_input_tokens: int) -> None:
        """Store actual API input token count for more accurate compaction.
@@ -610,8 +697,15 @@ class NodeConversation:
                continue
            if msg.is_error:
                continue  # never prune errors
            if msg.is_skill_content:
                continue  # never prune activated skill instructions (AS-10)
            if msg.content.startswith("[Pruned tool result"):
                continue  # already pruned
            # Tiny results (set_output acks, confirmations) — pruning
            # saves negligible space but makes the LLM think the call
            # failed, causing costly retries.
            if len(msg.content) < 100:
                continue

            # Phase-aware: protect current phase messages
            if self._current_phase and msg.phase_id == self._current_phase:
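Collected in one place, the skip rules in this hunk amount to the predicate below (a readability restatement, not code from the diff; the current-phase check normally reads self._current_phase):

def _prune_exempt(msg, current_phase) -> bool:
    return (
        msg.is_error                                      # never prune errors
        or msg.is_skill_content                           # AS-10: skill bodies stay
        or msg.content.startswith("[Pruned tool result")  # already pruned
        or len(msg.content) < 100                         # tiny acks: not worth it
        or (current_phase is not None and msg.phase_id == current_phase)
    )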
@@ -653,6 +747,7 @@ class NodeConversation:
                is_error=msg.is_error,
                phase_id=msg.phase_id,
                is_transition_marker=msg.is_transition_marker,
                run_id=msg.run_id,
            )
            count += 1

@@ -729,14 +824,14 @@ class NodeConversation:
        summary_seq = self._next_seq
        self._next_seq += 1

        summary_msg = Message(seq=summary_seq, role="user", content=summary)
        summary_msg = Message(seq=summary_seq, role="user", content=summary, run_id=self._run_id)

        # Persist
        if self._store:
            delete_before = recent_messages[0].seq if recent_messages else self._next_seq
            await self._store.delete_parts_before(delete_before)
            await self._store.write_part(summary_msg.seq, summary_msg.to_storage_dict())
            await self._store.write_cursor({"next_seq": self._next_seq})
            await self._write_next_seq()

        self._messages = [summary_msg] + recent_messages
        self._last_api_input_tokens = None  # reset; next LLM call will recalibrate
@@ -794,6 +889,15 @@ class NodeConversation:
        freeform_lines: list[str] = []
        collapsed_msgs: list[Message] = []

        # Collect all tool_use IDs present in old messages so we can detect
        # orphaned tool results whose parent assistant message was already
        # compacted away (API invariant protection).
        old_tc_ids: set[str] = set()
        for msg in old_messages:
            if msg.tool_calls:
                for tc in msg.tool_calls:
                    old_tc_ids.add(tc.get("id", ""))

        if aggressive:
            # Aggressive: only keep set_output tool pairs and error results.
            # Everything else is collapsed into a tool-call history summary.
@@ -815,9 +919,17 @@ class NodeConversation:
                else:
                    collapsible_tc_ids |= tc_ids

            # Skill content and transition markers are always protected
            for msg in old_messages:
                if msg.role == "tool" and msg.is_skill_content and msg.tool_use_id:
                    protected_tc_ids.add(msg.tool_use_id)

            # Second pass: classify all messages
            for msg in old_messages:
                if msg.role == "tool":
                if msg.is_transition_marker:
                    # Transition markers are always kept (phase boundaries)
                    kept_structural.append(msg)
                elif msg.role == "tool":
                    tc_id = msg.tool_use_id or ""
                    if tc_id in protected_tc_ids:
                        kept_structural.append(msg)
@@ -826,6 +938,12 @@ class NodeConversation:
                        kept_structural.append(msg)
                        # Protect the parent assistant message too
                        protected_tc_ids.add(tc_id)
                    elif msg.is_skill_content:
                        kept_structural.append(msg)
                    elif tc_id and tc_id not in old_tc_ids:
                        # Orphaned tool result — parent tool_use not in old msgs.
                        # Keep it to maintain API invariants.
                        kept_structural.append(msg)
                    else:
                        collapsed_msgs.append(msg)
                elif msg.role == "assistant" and msg.tool_calls:
@@ -842,6 +960,7 @@ class NodeConversation:
                            is_error=msg.is_error,
                            phase_id=msg.phase_id,
                            is_transition_marker=msg.is_transition_marker,
                            run_id=msg.run_id,
                        )
                    )
                else:
@@ -856,7 +975,10 @@ class NodeConversation:
        else:
            # Standard mode: keep all tool call pairs as structural
            for msg in old_messages:
                if msg.role == "tool":
                if msg.is_transition_marker:
                    # Transition markers are always kept (phase boundaries)
                    kept_structural.append(msg)
                elif msg.role == "tool":
                    kept_structural.append(msg)
                elif msg.role == "assistant" and msg.tool_calls:
                    compact_tcs = _compact_tool_calls(msg.tool_calls)
@@ -869,6 +991,7 @@ class NodeConversation:
                            is_error=msg.is_error,
                            phase_id=msg.phase_id,
                            is_transition_marker=msg.is_transition_marker,
                            run_id=msg.run_id,
                        )
                    )
                else:
@@ -901,8 +1024,7 @@ class NodeConversation:
            full_path = str((spill_path / conv_filename).resolve())
            ref_parts.append(
                f"[Previous conversation saved to '{full_path}'. "
                f"Use load_data('{conv_filename}'), read_file('{full_path}'), "
                f"or run_command('cat \"{full_path}\"') to review if needed.]"
                f"Use load_data('{conv_filename}') to review if needed.]"
            )
        elif not collapsed_msgs:
            ref_parts.append("[Previous freeform messages compacted.]")
@@ -927,7 +1049,7 @@ class NodeConversation:
        ref_seq = self._next_seq
        self._next_seq += 1

        ref_msg = Message(seq=ref_seq, role="user", content=ref_content)
        ref_msg = Message(seq=ref_seq, role="user", content=ref_content, run_id=self._run_id)

        # Persist: delete old messages from store, write reference + kept structural.
        # In aggressive mode, collapsed messages may be interspersed with kept
@@ -941,7 +1063,7 @@ class NodeConversation:
            # Write kept structural messages (they may have been modified)
            for msg in kept_structural:
                await self._store.write_part(msg.seq, msg.to_storage_dict())
            await self._store.write_cursor({"next_seq": self._next_seq})
            await self._write_next_seq()

        # Reassemble: reference + kept structural (in original order) + recent
        self._messages = [ref_msg] + kept_structural + recent_messages
@@ -978,7 +1100,7 @@ class NodeConversation:
        """Remove all messages, keep system prompt, preserve ``_next_seq``."""
        if self._store:
            await self._store.delete_parts_before(self._next_seq)
            await self._store.write_cursor({"next_seq": self._next_seq})
            await self._write_next_seq()
        self._messages.clear()
        self._last_api_input_tokens = None

@@ -1020,22 +1142,32 @@ class NodeConversation:
        if not self._meta_persisted:
            await self._persist_meta()
        await self._store.write_part(message.seq, message.to_storage_dict())
        await self._store.write_cursor({"next_seq": self._next_seq})
        await self._write_next_seq()

    async def _persist_meta(self) -> None:
        """Lazily write conversation metadata to the store (called once)."""
        """Lazily write conversation metadata to the store (called once).

        When ``self._run_id`` is set, metadata is written flat for backward
        compatibility (run-scoped isolation has been reverted).
        """
        if self._store is None:
            return
        await self._store.write_meta(
            {
                "system_prompt": self._system_prompt,
                "max_context_tokens": self._max_context_tokens,
                "compaction_threshold": self._compaction_threshold,
                "output_keys": self._output_keys,
            }
        )
        run_meta = {
            "system_prompt": self._system_prompt,
            "max_context_tokens": self._max_context_tokens,
            "compaction_threshold": self._compaction_threshold,
            "output_keys": self._output_keys,
        }
        await self._store.write_meta(run_meta)
        self._meta_persisted = True

    async def _write_next_seq(self) -> None:
        if self._store is None:
            return
        cursor = await self._store.read_cursor() or {}
        cursor["next_seq"] = self._next_seq
        await self._store.write_cursor(cursor)

    # --- Restore -----------------------------------------------------------

    @classmethod
@@ -1043,6 +1175,7 @@ class NodeConversation:
        cls,
        store: ConversationStore,
        phase_id: str | None = None,
        run_id: str | None = None,
    ) -> NodeConversation | None:
        """Reconstruct a NodeConversation from a store.

@@ -1052,6 +1185,9 @@ class NodeConversation:
                Used in isolated mode so a node only sees its own
                messages in the shared flat store. In continuous mode
                pass ``None`` to load all parts.
            run_id: If set, only load parts matching this run_id.
                Ensures intentional restarts (new run_id) start fresh
                while crash recovery (same run_id) resumes correctly.

        Returns ``None`` if the store contains no metadata (i.e. the
        conversation was never persisted).
@@ -1066,17 +1202,23 @@ class NodeConversation:
            compaction_threshold=meta.get("compaction_threshold", 0.8),
            output_keys=meta.get("output_keys"),
            store=store,
            run_id=run_id,
        )
        conv._meta_persisted = True

        parts = await store.read_parts()
        if phase_id:
            parts = [p for p in parts if p.get("phase_id") == phase_id]
        # Filter by run_id so intentional restarts (new run_id) start fresh
        # while crash recovery (same run_id) loads prior parts.
        if run_id and not is_legacy_run_id(run_id):
            parts = [p for p in parts if p.get("run_id") == run_id]
        conv._messages = [Message.from_storage_dict(p) for p in parts]

        cursor = await store.read_cursor()
        if cursor:
            conv._next_seq = cursor["next_seq"]
        next_seq = get_cursor_next_seq(cursor)
        if next_seq is not None:
            conv._next_seq = next_seq
        elif conv._messages:
            conv._next_seq = conv._messages[-1].seq + 1
+31 -165
@@ -108,7 +108,7 @@ class EdgeSpec(BaseModel):
        self,
        source_success: bool,
        source_output: dict[str, Any],
        memory: dict[str, Any],
        buffer_data: dict[str, Any],
        llm: Any | None = None,
        goal: Any | None = None,
        source_node_name: str | None = None,
@@ -120,7 +120,7 @@ class EdgeSpec(BaseModel):
        Args:
            source_success: Whether the source node succeeded
            source_output: Output from the source node
            memory: Current shared memory state
            buffer_data: Current data buffer state
            llm: LLM provider for LLM_DECIDE edges
            goal: Goal object for LLM_DECIDE edges
            source_node_name: Name of source node (for LLM context)
@@ -139,7 +139,7 @@ class EdgeSpec(BaseModel):
            return not source_success

        if self.condition == EdgeCondition.CONDITIONAL:
            return self._evaluate_condition(source_output, memory)
            return self._evaluate_condition(source_output, buffer_data)

        if self.condition == EdgeCondition.LLM_DECIDE:
            if llm is None or goal is None:
@@ -150,7 +150,7 @@ class EdgeSpec(BaseModel):
                goal=goal,
                source_success=source_success,
                source_output=source_output,
                memory=memory,
                buffer_data=buffer_data,
                source_node_name=source_node_name,
                target_node_name=target_node_name,
            )
@@ -160,7 +160,7 @@ class EdgeSpec(BaseModel):
    def _evaluate_condition(
        self,
        output: dict[str, Any],
        memory: dict[str, Any],
        buffer_data: dict[str, Any],
    ) -> bool:
        """Evaluate a conditional expression."""

@@ -168,14 +168,14 @@ class EdgeSpec(BaseModel):
            return True

        # Build evaluation context
        # Include memory keys directly for easier access in conditions
        # Include buffer keys directly for easier access in conditions
        context = {
            "output": output,
            "memory": memory,
            "buffer": buffer_data,
            "result": output.get("result"),
            "true": True,  # Allow lowercase true/false in conditions
            "false": False,
            **memory,  # Unpack memory keys directly into context
            **buffer_data,  # Unpack buffer keys directly into context
        }

        try:
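To make the rename concrete: with buffer_data unpacked into the context, a condition_expr can reference buffer keys as bare names. A hedged illustration (the hunk stops at the try:, so the exact evaluation call is assumed to be a sandboxed eval over this context, as the lowercase true/false aliases suggest):

buffer_data = {"retries": 2, "status": "failed"}
# condition_expr = "status == 'failed' and retries < 3"
# eval(condition_expr, {"__builtins__": {}}, context)  -> True
# Before this change, the same expression would have drawn its names from memory.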
@@ -186,7 +186,7 @@ class EdgeSpec(BaseModel):
            expr_vars = {
                k: repr(context[k])
                for k in context
                if k not in ("output", "memory", "result", "true", "false")
                if k not in ("output", "buffer", "result", "true", "false")
                and k in self.condition_expr
            }
            logger.info(
@@ -209,7 +209,7 @@ class EdgeSpec(BaseModel):
        goal: Any,
        source_success: bool,
        source_output: dict[str, Any],
        memory: dict[str, Any],
        buffer_data: dict[str, Any],
        source_node_name: str | None,
        target_node_name: str | None,
    ) -> bool:
@@ -234,8 +234,8 @@ class EdgeSpec(BaseModel):
Should we proceed to: {target_node_name or self.target}?
Edge description: {self.description or "No description"}

**Context from memory**:
{json.dumps({k: str(v)[:100] for k, v in list(memory.items())[:5]}, indent=2)}
**Context from data buffer**:
{json.dumps({k: str(v)[:100] for k, v in list(buffer_data.items())[:5]}, indent=2)}

Evaluate whether proceeding to this next node is the right step toward achieving the goal.
Consider:
@@ -276,14 +276,14 @@ Respond with ONLY a JSON object:
    def map_inputs(
        self,
        source_output: dict[str, Any],
        memory: dict[str, Any],
        buffer_data: dict[str, Any],
    ) -> dict[str, Any]:
        """
        Map source outputs to target inputs.

        Args:
            source_output: Output from source node
            memory: Current shared memory
            buffer_data: Current data buffer

        Returns:
            Input dict for target node
@@ -294,64 +294,15 @@ Respond with ONLY a JSON object:

        result = {}
        for target_key, source_key in self.input_mapping.items():
            # Try source output first, then memory
            # Try source output first, then buffer
            if source_key in source_output:
                result[target_key] = source_output[source_key]
            elif source_key in memory:
                result[target_key] = memory[source_key]
            elif source_key in buffer_data:
                result[target_key] = buffer_data[source_key]

        return result


class AsyncEntryPointSpec(BaseModel):
    """
    Specification for an asynchronous entry point.

    Used with AgentRuntime for multi-entry-point agents that handle
    concurrent execution streams (e.g., webhook + API handlers).

    Example:
        AsyncEntryPointSpec(
            id="webhook",
            name="Zendesk Webhook Handler",
            entry_node="process-webhook",
            trigger_type="webhook",
            isolation_level="shared",
        )
    """

    id: str = Field(description="Unique identifier for this entry point")
    name: str = Field(description="Human-readable name")
    entry_node: str = Field(description="Node ID to start execution from")
    trigger_type: str = Field(
        default="manual",
        description="How this entry point is triggered: webhook, api, timer, event, manual",
    )
    trigger_config: dict[str, Any] = Field(
        default_factory=dict,
        description="Trigger-specific configuration (e.g., webhook URL, timer interval)",
    )
    isolation_level: str = Field(
        default="shared", description="State isolation: isolated, shared, or synchronized"
    )
    priority: int = Field(default=0, description="Execution priority (higher = more priority)")
    max_concurrent: int = Field(
        default=10, description="Maximum concurrent executions for this entry point"
    )
    max_resurrections: int = Field(
        default=3,
        description="Auto-restart on non-fatal failure (0 to disable)",
    )

    model_config = {"extra": "allow"}

    def get_isolation_level(self):
        """Convert string isolation level to enum (duck-type with EntryPointSpec)."""
        from framework.runtime.execution_stream import IsolationLevel

        return IsolationLevel(self.isolation_level)


class GraphSpec(BaseModel):
    """
    Complete specification of an agent graph.

@@ -368,28 +319,8 @@ class GraphSpec(BaseModel):
            edges=[...],
        )

    For multi-entry-point agents (concurrent streams):
        GraphSpec(
            id="support-agent-graph",
            goal_id="support-001",
            entry_node="process-webhook",  # Default entry
            async_entry_points=[
                AsyncEntryPointSpec(
                    id="webhook",
                    name="Zendesk Webhook",
                    entry_node="process-webhook",
                    trigger_type="webhook",
                ),
                AsyncEntryPointSpec(
                    id="api",
                    name="API Handler",
                    entry_node="process-request",
                    trigger_type="api",
                ),
            ],
            nodes=[...],
            edges=[...],
        )
    Triggers (timer, webhook, event) are now defined in ``triggers.json``
    alongside the agent directory, not embedded in the graph spec.
    """

    id: str
@@ -402,12 +333,6 @@ class GraphSpec(BaseModel):
        default_factory=dict,
        description="Named entry points for resuming execution. Format: {name: node_id}",
    )
    async_entry_points: list[AsyncEntryPointSpec] = Field(
        default_factory=list,
        description=(
            "Asynchronous entry points for concurrent execution streams (used with AgentRuntime)"
        ),
    )
    terminal_nodes: list[str] = Field(
        default_factory=list, description="IDs of nodes that end execution"
    )
@@ -421,9 +346,9 @@ class GraphSpec(BaseModel):
    )
    edges: list[EdgeSpec] = Field(default_factory=list, description="All edge specifications")

    # Shared memory keys
    memory_keys: list[str] = Field(
        default_factory=list, description="Keys available in shared memory"
    # Data buffer keys
    buffer_keys: list[str] = Field(
        default_factory=list, description="Keys available in data buffer"
    )

    # Default LLM settings
@@ -486,17 +411,6 @@ class GraphSpec(BaseModel):
                return node
        return None

    def has_async_entry_points(self) -> bool:
        """Check if this graph uses async entry points (multi-stream execution)."""
        return len(self.async_entry_points) > 0

    def get_async_entry_point(self, entry_point_id: str) -> AsyncEntryPointSpec | None:
        """Get an async entry point by ID."""
        for ep in self.async_entry_points:
            if ep.id == entry_point_id:
                return ep
        return None

    def get_outgoing_edges(self, node_id: str) -> list[EdgeSpec]:
        """Get all edges leaving a node, sorted by priority."""
        edges = [e for e in self.edges if e.source == node_id]
@@ -587,37 +501,6 @@ class GraphSpec(BaseModel):
        if not self.get_node(self.entry_node):
            errors.append(f"Entry node '{self.entry_node}' not found")

        # Check async entry points
        seen_entry_ids = set()
        for entry_point in self.async_entry_points:
            # Check for duplicate IDs
            if entry_point.id in seen_entry_ids:
                errors.append(f"Duplicate async entry point ID: '{entry_point.id}'")
            seen_entry_ids.add(entry_point.id)

            # Check entry node exists
            if not self.get_node(entry_point.entry_node):
                errors.append(
                    f"Async entry point '{entry_point.id}' references "
                    f"missing node '{entry_point.entry_node}'"
                )

            # Validate isolation level
            valid_isolation = {"isolated", "shared", "synchronized"}
            if entry_point.isolation_level not in valid_isolation:
                errors.append(
                    f"Async entry point '{entry_point.id}' has invalid isolation_level "
                    f"'{entry_point.isolation_level}'. Valid: {valid_isolation}"
                )

            # Validate trigger type
            valid_triggers = {"webhook", "api", "timer", "event", "manual"}
            if entry_point.trigger_type not in valid_triggers:
                errors.append(
                    f"Async entry point '{entry_point.id}' has invalid trigger_type "
                    f"'{entry_point.trigger_type}'. Valid: {valid_triggers}"
                )

        # Check terminal nodes exist
        for term in self.terminal_nodes:
            if not self.get_node(term):
@@ -646,10 +529,6 @@ class GraphSpec(BaseModel):
        for entry_point_node in self.entry_points.values():
            to_visit.append(entry_point_node)

        # Add all async entry points as valid starting points
        for async_entry in self.async_entry_points:
            to_visit.append(async_entry.entry_node)

        # Traverse from all entry points
        while to_visit:
            current = to_visit.pop()
@@ -666,36 +545,23 @@ class GraphSpec(BaseModel):
        for sub_agent_id in sub_agents:
            reachable.add(sub_agent_id)

        # Build set of async entry point nodes for quick lookup
        async_entry_nodes = {ep.entry_node for ep in self.async_entry_points}

        for node in self.nodes:
            if node.id not in reachable:
                # Skip if node is a pause node, entry point target, or async entry
                # (pause/resume architecture and async entry points make reachable)
                if (
                    node.id in self.pause_nodes
                    or node.id in self.entry_points.values()
                    or node.id in async_entry_nodes
                ):
                # Skip if node is a pause node or entry point target
                if node.id in self.pause_nodes or node.id in self.entry_points.values():
                    continue
                errors.append(f"Node '{node.id}' is unreachable from entry")

        # Client-facing fan-out validation
        fan_outs = self.detect_fan_out_nodes()
        for source_id, targets in fan_outs.items():
            client_facing_targets = [
                t
                for t in targets
                if self.get_node(t) and getattr(self.get_node(t), "client_facing", False)
            ]
            if len(client_facing_targets) > 1:
                errors.append(
                    f"Fan-out from '{source_id}' has multiple client-facing nodes: "
                    f"{client_facing_targets}. Only one branch may be client-facing."
        for node in self.nodes:
            if getattr(node, "client_facing", False) and getattr(node, "id", "") != "queen":
                warnings.append(
                    f"Node '{node.id}' sets deprecated client_facing=True. "
                    "Only the queen talks directly to users now; migrate this node "
                    "to queen-mediated escalation."
                )

        # Output key overlap on parallel event_loop nodes
        fan_outs = self.detect_fan_out_nodes()
        for source_id, targets in fan_outs.items():
            event_loop_targets = [
                t
@@ -0,0 +1,6 @@
"""EventLoopNode subpackage — modular components of the event loop orchestrator.

All public symbols are re-exported by the parent ``event_loop_node.py`` for
backward compatibility. Internal consumers may import directly from these
submodules for clarity.
"""
@@ -0,0 +1,866 @@
"""Conversation compaction pipeline.

Implements the multi-level compaction strategy:
0. Microcompaction (count-based tool result clearing — cheapest)
1. Prune old tool results (token-budget based)
2. Structure-preserving compaction (spillover)
3. LLM summary compaction (with recursive splitting)
4. Emergency deterministic summary (no LLM)
"""

from __future__ import annotations

import json
import logging
import os
import re
import time
from datetime import UTC, datetime
from pathlib import Path
from typing import Any

from framework.graph.conversation import Message, NodeConversation
from framework.graph.event_loop.event_publishing import publish_context_usage
from framework.graph.event_loop.types import LoopConfig, OutputAccumulator
from framework.graph.node import NodeContext
from framework.runtime.event_bus import EventBus

logger = logging.getLogger(__name__)

# Limits for LLM compaction
LLM_COMPACT_CHAR_LIMIT: int = 240_000
LLM_COMPACT_MAX_DEPTH: int = 10

# Microcompaction: tools whose results can be safely cleared
COMPACTABLE_TOOLS: frozenset[str] = frozenset(
    {
        "read_file",
        "run_command",
        "web_search",
        "web_fetch",
        "grep_search",
        "glob_search",
        "write_file",
        "edit_file",
        "browser_screenshot",
        "list_directory",
    }
)

# Keep at most this many compactable tool results; clear older ones
MICROCOMPACT_KEEP_RECENT: int = 8

# Circuit-breaker: stop auto-compacting after this many consecutive failures
MAX_CONSECUTIVE_FAILURES: int = 3

# Track consecutive compaction failures per conversation (module-level)
_failure_counts: dict[int, int] = {}

# Track last compaction time per conversation for recompaction detection
_last_compact_times: dict[int, float] = {}
def microcompact(
|
||||
conversation: NodeConversation,
|
||||
*,
|
||||
keep_recent: int = MICROCOMPACT_KEEP_RECENT,
|
||||
) -> int:
|
||||
"""Clear old compactable tool results by count, keeping only the most recent.
|
||||
|
||||
This is the cheapest possible compaction — no LLM call, no structural
|
||||
changes, just replaces old tool result content with a short placeholder.
|
||||
Inspired by Claude Code's cached-microcompact strategy.
|
||||
|
||||
Returns the number of tool results cleared.
|
||||
"""
|
||||
# Collect indices of compactable tool results (newest first)
|
||||
compactable_indices: list[int] = []
|
||||
messages = conversation.messages
|
||||
for i in range(len(messages) - 1, -1, -1):
|
||||
msg = messages[i]
|
||||
if msg.role != "tool" or msg.is_error or msg.is_skill_content:
|
||||
continue
|
||||
if msg.content.startswith(("[Pruned tool result", "[Old tool result")):
|
||||
continue
|
||||
if len(msg.content) < 100:
|
||||
continue
|
||||
|
||||
# Check if the tool that produced this result is compactable
|
||||
tool_name = _find_tool_name_for_result(messages, msg)
|
||||
if tool_name and tool_name in COMPACTABLE_TOOLS:
|
||||
compactable_indices.append(i)
|
||||
|
||||
# Keep the most recent N, clear the rest
|
||||
to_clear = compactable_indices[keep_recent:]
|
||||
if not to_clear:
|
||||
return 0
|
||||
|
||||
cleared = 0
|
||||
for i in to_clear:
|
||||
msg = messages[i]
|
||||
spillover = _extract_spillover_filename_inline(msg.content)
|
||||
orig_len = len(msg.content)
|
||||
if spillover:
|
||||
placeholder = (
|
||||
f"[Old tool result cleared: {orig_len} chars. "
|
||||
f"Full data in '{spillover}'. "
|
||||
f"Use load_data('{spillover}') to retrieve.]"
|
||||
)
|
||||
else:
|
||||
placeholder = f"[Old tool result cleared: {orig_len} chars.]"
|
||||
|
||||
# Mutate in-place (microcompact is synchronous, no store writes)
|
||||
conversation._messages[i] = Message(
|
||||
seq=msg.seq,
|
||||
role=msg.role,
|
||||
content=placeholder,
|
||||
tool_use_id=msg.tool_use_id,
|
||||
tool_calls=msg.tool_calls,
|
||||
is_error=msg.is_error,
|
||||
phase_id=msg.phase_id,
|
||||
is_transition_marker=msg.is_transition_marker,
|
||||
)
|
||||
cleared += 1
|
||||
|
||||
if cleared > 0:
|
||||
# Invalidate cached token count
|
||||
conversation._last_api_input_tokens = None
|
||||
|
||||
return cleared
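
A minimal usage sketch (illustrative only; `conv` stands in for a NodeConversation that already holds a long, tool-heavy history and is not a name from this diff):

    # Keep only the 4 most recent compactable tool results instead of the default 8.
    cleared = microcompact(conv, keep_recent=4)
    if cleared:
        print(f"cleared {cleared} tool results, usage now {conv.usage_ratio():.0%}")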


def _find_tool_name_for_result(messages: list[Message], tool_msg: Message) -> str | None:
    """Find the tool name from the assistant message that triggered this tool result."""
    if not tool_msg.tool_use_id:
        return None
    for msg in messages:
        if msg.tool_calls:
            for tc in msg.tool_calls:
                if tc.get("id") == tool_msg.tool_use_id:
                    return tc.get("function", {}).get("name")
    return None


def _extract_spillover_filename_inline(content: str) -> str | None:
    """Quick inline check for spillover filename in tool result content."""
    match = re.search(r"saved to '([^']+)'", content, re.IGNORECASE)
    return match.group(1) if match else None


async def compact(
    ctx: NodeContext,
    conversation: NodeConversation,
    accumulator: OutputAccumulator | None,
    *,
    config: LoopConfig,
    event_bus: EventBus | None,
    char_limit: int = LLM_COMPACT_CHAR_LIMIT,
    max_depth: int = LLM_COMPACT_MAX_DEPTH,
) -> None:
    """Run the full compaction pipeline if conversation needs compaction.

    Pipeline stages (in order, short-circuits when budget is restored):
    0. Microcompaction (count-based tool result clearing — cheapest)
    1. Prune old tool results (token-budget based)
    2. Structure-preserving compaction (free, no LLM)
    3. LLM summary compaction (recursive split if too large)
    4. Emergency deterministic summary (fallback)
    """
    conv_id = id(conversation)

    # Circuit breaker: stop auto-compacting after repeated failures
    if _failure_counts.get(conv_id, 0) >= MAX_CONSECUTIVE_FAILURES:
        logger.warning(
            "Circuit breaker: skipping compaction after %d consecutive failures",
            _failure_counts[conv_id],
        )
        return

    # Recompaction detection
    now = time.monotonic()
    last_time = _last_compact_times.get(conv_id)
    if last_time is not None and (now - last_time) < 30:
        logger.warning(
            "Recompaction chain detected: only %.1fs since last compaction",
            now - last_time,
        )

    ratio_before = conversation.usage_ratio()
    phase_grad = getattr(ctx, "continuous_mode", False)
    pre_inventory: list[dict[str, Any]] | None = None

    if ratio_before >= 1.0:
        pre_inventory = build_message_inventory(conversation)

    # --- Step 0: Microcompaction (count-based, cheapest) ---
    mc_cleared = microcompact(conversation)
    if mc_cleared > 0:
        logger.info(
            "Microcompact cleared %d old tool results: %.0f%% -> %.0f%%",
            mc_cleared,
            ratio_before * 100,
            conversation.usage_ratio() * 100,
        )
        if not conversation.needs_compaction():
            _record_success(conv_id, now)
            await log_compaction(
                ctx,
                conversation,
                ratio_before,
                event_bus,
                pre_inventory=pre_inventory,
            )
            return

    # --- Step 1: Prune old tool results (free, fast) ---
    protect = max(2000, config.max_context_tokens // 12)
    pruned = await conversation.prune_old_tool_results(
        protect_tokens=protect,
        min_prune_tokens=max(1000, protect // 3),
    )
    if pruned > 0:
        logger.info(
            "Pruned %d old tool results: %.0f%% -> %.0f%%",
            pruned,
            ratio_before * 100,
            conversation.usage_ratio() * 100,
        )
        if not conversation.needs_compaction():
            _record_success(conv_id, now)
            await log_compaction(
                ctx,
                conversation,
                ratio_before,
                event_bus,
                pre_inventory=pre_inventory,
            )
            return

    # --- Step 2: Standard structure-preserving compaction (free, no LLM) ---
    spill_dir = config.spillover_dir
    if spill_dir:
        await conversation.compact_preserving_structure(
            spillover_dir=spill_dir,
            keep_recent=4,
            phase_graduated=phase_grad,
        )
        if not conversation.needs_compaction():
            _record_success(conv_id, now)
            await log_compaction(
                ctx,
                conversation,
                ratio_before,
                event_bus,
                pre_inventory=pre_inventory,
            )
            return

    # --- Step 3: LLM summary compaction ---
    if ctx.llm is not None:
        logger.info(
            "LLM summary compaction triggered (%.0f%% usage)",
            conversation.usage_ratio() * 100,
        )
        try:
            summary = await llm_compact(
                ctx,
                list(conversation.messages),
                accumulator,
                char_limit=char_limit,
                max_depth=max_depth,
                max_context_tokens=config.max_context_tokens,
            )
            await conversation.compact(
                summary,
                keep_recent=2,
                phase_graduated=phase_grad,
            )
        except Exception as e:
            logger.warning("LLM compaction failed: %s", e)
            _failure_counts[conv_id] = _failure_counts.get(conv_id, 0) + 1

        if not conversation.needs_compaction():
            _record_success(conv_id, now)
            await log_compaction(
                ctx,
                conversation,
                ratio_before,
                event_bus,
                pre_inventory=pre_inventory,
            )
            return

    # --- Step 4: Emergency deterministic summary (LLM failed/unavailable) ---
    logger.warning(
        "Emergency compaction (%.0f%% usage)",
        conversation.usage_ratio() * 100,
    )
    summary = build_emergency_summary(ctx, accumulator, conversation, config)
    await conversation.compact(
        summary,
        keep_recent=1,
        phase_graduated=phase_grad,
    )
    _record_success(conv_id, now)
    await log_compaction(
        ctx,
        conversation,
        ratio_before,
        event_bus,
        pre_inventory=pre_inventory,
    )
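
A hedged sketch of how the orchestrator might drive this pipeline each iteration (`conv`, `acc`, `cfg`, and `bus` are stand-ins for loop state, not names from this diff):

    if conv.needs_compaction():
        # compact() short-circuits as soon as a cheap stage restores headroom,
        # so the LLM summary and emergency stages run only when needed.
        await compact(ctx, conv, acc, config=cfg, event_bus=bus)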


def _record_success(conv_id: int, timestamp: float) -> None:
    """Reset failure counter and record compaction time on success."""
    _failure_counts.pop(conv_id, None)
    _last_compact_times[conv_id] = timestamp


# --- LLM compaction with binary-search splitting ----------------------


def strip_images_from_messages(messages: list[Message]) -> list[Message]:
    """Strip image_content from messages before LLM summarisation.

    Images/documents are replaced with ``[image]`` markers so the summary
    notes they existed without wasting tokens sending binary data to the
    compaction LLM. Returns a new list (original messages are not mutated).
    """
    stripped: list[Message] = []
    for msg in messages:
        if msg.image_content:
            n_images = len(msg.image_content)
            marker = " ".join("[image]" for _ in range(n_images))
            content = f"{msg.content}\n{marker}" if msg.content else marker
            stripped.append(
                Message(
                    seq=msg.seq,
                    role=msg.role,
                    content=content,
                    tool_use_id=msg.tool_use_id,
                    tool_calls=msg.tool_calls,
                    is_error=msg.is_error,
                    phase_id=msg.phase_id,
                    is_transition_marker=msg.is_transition_marker,
                    image_content=None,  # stripped
                )
            )
        else:
            stripped.append(msg)
    return stripped


async def llm_compact(
    ctx: NodeContext,
    messages: list,
    accumulator: OutputAccumulator | None = None,
    _depth: int = 0,
    *,
    char_limit: int = LLM_COMPACT_CHAR_LIMIT,
    max_depth: int = LLM_COMPACT_MAX_DEPTH,
    max_context_tokens: int = 128_000,
) -> str:
    """Summarise *messages* with LLM, splitting recursively if too large.

    If the formatted text exceeds ``LLM_COMPACT_CHAR_LIMIT`` or the LLM
    rejects the call with a context-length error, the messages are split
    in half and each half is summarised independently. Tool history is
    appended once at the top-level call (``_depth == 0``).
    """
    from framework.graph.conversation import extract_tool_call_history
    from framework.graph.event_loop.tool_result_handler import is_context_too_large_error

    if _depth > max_depth:
        raise RuntimeError(f"LLM compaction recursion limit ({max_depth})")

    # Strip images before summarisation to avoid wasting tokens
    if _depth == 0:
        messages = strip_images_from_messages(messages)

    formatted = format_messages_for_summary(messages)

    # Proactive split: avoid wasting an API call on oversized input
    if len(formatted) > char_limit and len(messages) > 1:
        summary = await _llm_compact_split(
            ctx,
            messages,
            accumulator,
            _depth,
            char_limit=char_limit,
            max_depth=max_depth,
            max_context_tokens=max_context_tokens,
        )
    else:
        prompt = build_llm_compaction_prompt(
            ctx,
            accumulator,
            formatted,
            max_context_tokens=max_context_tokens,
        )
        summary_budget = max(1024, max_context_tokens // 2)
        try:
            response = await ctx.llm.acomplete(
                messages=[{"role": "user", "content": prompt}],
                system=(
                    "You are a conversation compactor for an AI agent. "
                    "Write a detailed summary that allows the agent to "
                    "continue its work. Preserve user-stated rules, "
                    "constraints, and account/identity preferences verbatim."
                ),
                max_tokens=summary_budget,
            )
            summary = response.content
        except Exception as e:
            if is_context_too_large_error(e) and len(messages) > 1:
                logger.info(
                    "LLM context too large (depth=%d, msgs=%d) — splitting",
                    _depth,
                    len(messages),
                )
                summary = await _llm_compact_split(
                    ctx,
                    messages,
                    accumulator,
                    _depth,
                    char_limit=char_limit,
                    max_depth=max_depth,
                    max_context_tokens=max_context_tokens,
                )
            else:
                raise

    # Append tool history at top level only
    if _depth == 0:
        tool_history = extract_tool_call_history(messages)
        if tool_history and "TOOLS ALREADY CALLED" not in summary:
            summary += "\n\n" + tool_history

    return summary


async def _llm_compact_split(
    ctx: NodeContext,
    messages: list,
    accumulator: OutputAccumulator | None,
    _depth: int,
    *,
    char_limit: int = LLM_COMPACT_CHAR_LIMIT,
    max_depth: int = LLM_COMPACT_MAX_DEPTH,
    max_context_tokens: int = 128_000,
) -> str:
    """Split messages in half and summarise each half independently."""
    mid = max(1, len(messages) // 2)
    s1 = await llm_compact(
        ctx,
        messages[:mid],
        None,
        _depth + 1,
        char_limit=char_limit,
        max_depth=max_depth,
        max_context_tokens=max_context_tokens,
    )
    s2 = await llm_compact(
        ctx,
        messages[mid:],
        accumulator,
        _depth + 1,
        char_limit=char_limit,
        max_depth=max_depth,
        max_context_tokens=max_context_tokens,
    )
    return s1 + "\n\n" + s2
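
Worked capacity arithmetic for the recursive split, restating the constants above:

    # depth 0: 1 call on N messages
    # depth 1: 2 calls on ~N/2 messages each
    # depth d: 2**d calls; max_depth=10 caps this at 2**10 = 1024 leaf summaries,
    # i.e. roughly 1024 * 240_000 ≈ 245M formatted chars before RuntimeError fires.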


# --- Compaction helpers ------------------------------------------------


def format_messages_for_summary(messages: list) -> str:
    """Format messages as text for LLM summarisation."""
    lines: list[str] = []
    for m in messages:
        if m.role == "tool":
            content = m.content[:500]
            if len(m.content) > 500:
                content += "..."
            lines.append(f"[tool result]: {content}")
        elif m.role == "assistant" and m.tool_calls:
            names = [tc.get("function", {}).get("name", "?") for tc in m.tool_calls]
            text = m.content[:200] if m.content else ""
            lines.append(f"[assistant (calls: {', '.join(names)})]: {text}")
        else:
            lines.append(f"[{m.role}]: {m.content}")
    return "\n\n".join(lines)


def build_llm_compaction_prompt(
    ctx: NodeContext,
    accumulator: OutputAccumulator | None,
    formatted_messages: str,
    *,
    max_context_tokens: int = 128_000,
) -> str:
    """Build prompt for LLM compaction targeting 50% of token budget.

    Uses a structured section format inspired by Claude Code's compact
    service. Each section focuses on a different aspect of the conversation
    so the summariser produces consistently useful, well-organised output.
    """
    spec = ctx.node_spec
    ctx_lines = [f"NODE: {spec.name} (id={spec.id})"]
    if spec.description:
        ctx_lines.append(f"PURPOSE: {spec.description}")
    if spec.success_criteria:
        ctx_lines.append(f"SUCCESS CRITERIA: {spec.success_criteria}")

    if accumulator:
        acc = accumulator.to_dict()
        done = {k: v for k, v in acc.items() if v is not None}
        todo = [k for k, v in acc.items() if v is None]
        if done:
            ctx_lines.append(
                "OUTPUTS ALREADY SET:\n"
                + "\n".join(f" {k}: {str(v)[:150]}" for k, v in done.items())
            )
        if todo:
            ctx_lines.append(f"OUTPUTS STILL NEEDED: {', '.join(todo)}")
    elif spec.output_keys:
        ctx_lines.append(f"OUTPUTS STILL NEEDED: {', '.join(spec.output_keys)}")

    target_tokens = max_context_tokens // 2
    target_chars = target_tokens * 4
    node_ctx = "\n".join(ctx_lines)

    return (
        "You are compacting an AI agent's conversation history. "
        "The agent is still working and needs to continue.\n\n"
        f"AGENT CONTEXT:\n{node_ctx}\n\n"
        f"CONVERSATION MESSAGES:\n{formatted_messages}\n\n"
        "INSTRUCTIONS:\n"
        f"Write a summary of approximately {target_chars} characters "
        f"(~{target_tokens} tokens).\n\n"
        "Organise the summary into these sections (omit empty ones):\n\n"
        "1. **Primary Request and Intent** — What the user originally asked "
        "for and the high-level goal the agent is working toward.\n"
        "2. **Key Technical Concepts** — Important domain-specific terms, "
        "patterns, or architectural decisions established in the conversation.\n"
        "3. **Files and Code Sections** — Specific files read/written/edited "
        "with brief descriptions of changes. Include short code snippets only "
        "when they capture critical logic.\n"
        "4. **Errors and Fixes** — Problems encountered and how they were "
        "resolved. Include root causes so the agent doesn't repeat them.\n"
        "5. **Problem Solving Efforts** — Approaches tried, dead ends hit, "
        "and reasoning behind the current strategy.\n"
        "6. **User Messages** — Preserve ALL user-stated rules, constraints, "
        "identity preferences, and account details verbatim.\n"
        "7. **Pending Tasks** — Work remaining, outputs still needed, and "
        "any blockers.\n"
        "8. **Current Work** — The most recent action taken and the immediate "
        "next step the agent should perform. This section is the most important "
        "for seamless resumption.\n\n"
        "Additional rules:\n"
        "- Be detailed enough that the agent can resume without re-doing work.\n"
        "- Preserve key decisions made and results obtained.\n"
        "- When in doubt, keep information rather than discard it.\n"
    )
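
The sizing heuristic assumes roughly 4 characters per token; for the default 128_000-token window the target works out as:

    # target_tokens = 128_000 // 2 = 64_000
    # target_chars  = 64_000 * 4  = 256_000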


def build_message_inventory(conversation: NodeConversation) -> list[dict[str, Any]]:
    """Build a per-message size inventory for debug logging."""
    inventory: list[dict[str, Any]] = []
    for message in conversation.messages:
        content_chars = len(message.content)
        tool_call_args_chars = 0
        tool_name = None
        if message.tool_calls:
            for tool_call in message.tool_calls:
                args = tool_call.get("function", {}).get("arguments", "")
                tool_call_args_chars += (
                    len(args) if isinstance(args, str) else len(json.dumps(args))
                )
            names = [
                tool_call.get("function", {}).get("name", "?") for tool_call in message.tool_calls
            ]
            tool_name = ", ".join(names)
        elif message.role == "tool" and message.tool_use_id:
            for previous in conversation.messages:
                if previous.tool_calls:
                    for tool_call in previous.tool_calls:
                        if tool_call.get("id") == message.tool_use_id:
                            tool_name = tool_call.get("function", {}).get("name", "?")
                            break
                if tool_name:
                    break
        entry: dict[str, Any] = {
            "seq": message.seq,
            "role": message.role,
            "content_chars": content_chars,
        }
        if tool_call_args_chars:
            entry["tool_call_args_chars"] = tool_call_args_chars
        if tool_name:
            entry["tool"] = tool_name
        if message.is_error:
            entry["is_error"] = True
        if message.phase_id:
            entry["phase"] = message.phase_id
        if content_chars > 2000:
            entry["preview"] = message.content[:200] + "…"
        inventory.append(entry)
    return inventory
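
A representative entry for an oversized tool result (all values illustrative):

    # {"seq": 42, "role": "tool", "content_chars": 18304,
    #  "tool": "web_fetch", "preview": "…first 200 chars of the result…"}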


def write_compaction_debug_log(
    ctx: NodeContext,
    before_pct: int,
    after_pct: int,
    level: str,
    inventory: list[dict[str, Any]] | None,
) -> None:
    """Write detailed compaction analysis to ~/.hive/compaction_log/."""
    log_dir = Path.home() / ".hive" / "compaction_log"
    log_dir.mkdir(parents=True, exist_ok=True)

    ts = datetime.now(UTC).strftime("%Y%m%dT%H%M%S_%f")
    node_label = ctx.node_id.replace("/", "_")
    log_path = log_dir / f"{ts}_{node_label}.md"

    lines: list[str] = [
        f"# Compaction Debug — {ctx.node_id}",
        f"**Time:** {datetime.now(UTC).isoformat()}",
        f"**Node:** {ctx.node_spec.name} (`{ctx.node_id}`)",
    ]
    if ctx.stream_id:
        lines.append(f"**Stream:** {ctx.stream_id}")
    lines.append(f"**Level:** {level}")
    lines.append(f"**Usage:** {before_pct}% → {after_pct}%")
    lines.append("")

    if inventory:
        total_chars = sum(
            entry.get("content_chars", 0) + entry.get("tool_call_args_chars", 0)
            for entry in inventory
        )
        lines.append(
            "## Pre-Compaction Message Inventory "
            f"({len(inventory)} messages, {total_chars:,} total chars)"
        )
        lines.append("")
        ranked = sorted(
            inventory,
            key=lambda entry: entry.get("content_chars", 0) + entry.get("tool_call_args_chars", 0),
            reverse=True,
        )
        lines.append("| # | seq | role | tool | chars | % of total | flags |")
        lines.append("|---|-----|------|------|------:|------------|-------|")
        for i, entry in enumerate(ranked, 1):
            chars = entry.get("content_chars", 0) + entry.get("tool_call_args_chars", 0)
            pct = (chars / total_chars * 100) if total_chars else 0
            tool = entry.get("tool", "")
            flags: list[str] = []
            if entry.get("is_error"):
                flags.append("error")
            if entry.get("phase"):
                flags.append(f"phase={entry['phase']}")
            lines.append(
                f"| {i} | {entry['seq']} | {entry['role']} | {tool} "
                f"| {chars:,} | {pct:.1f}% | {', '.join(flags)} |"
            )

        large = [entry for entry in ranked if entry.get("preview")]
        if large:
            lines.append("")
            lines.append("### Large message previews")
            for entry in large:
                lines.append(
                    f"\n**seq={entry['seq']}** ({entry['role']}, {entry.get('tool', '')}):"
                )
                lines.append(f"```\n{entry['preview']}\n```")
        lines.append("")

    try:
        log_path.write_text("\n".join(lines), encoding="utf-8")
        logger.debug("Compaction debug log written to %s", log_path)
    except OSError:
        logger.debug("Failed to write compaction debug log to %s", log_path)


async def log_compaction(
    ctx: NodeContext,
    conversation: NodeConversation,
    ratio_before: float,
    event_bus: EventBus | None,
    *,
    pre_inventory: list[dict[str, Any]] | None = None,
) -> None:
    """Log compaction result to runtime logger and event bus."""
    ratio_after = conversation.usage_ratio()
    before_pct = round(ratio_before * 100)
    after_pct = round(ratio_after * 100)

    # Determine label from what happened
    if after_pct >= before_pct - 1:
        level = "prune_only"
    elif ratio_after <= 0.6:
        level = "llm"
    else:
        level = "structural"

    logger.info(
        "Compaction complete (%s): %d%% -> %d%%",
        level,
        before_pct,
        after_pct,
    )

    if ctx.runtime_logger:
        ctx.runtime_logger.log_step(
            node_id=ctx.node_id,
            node_type="event_loop",
            step_index=-1,
            llm_text=f"Context compacted ({level}): {before_pct}% \u2192 {after_pct}%",
            verdict="COMPACTION",
            verdict_feedback=f"level={level} before={before_pct}% after={after_pct}%",
        )

    if event_bus:
        from framework.runtime.event_bus import AgentEvent, EventType

        event_data: dict[str, Any] = {
            "level": level,
            "usage_before": before_pct,
            "usage_after": after_pct,
        }
        if pre_inventory is not None:
            event_data["message_inventory"] = pre_inventory
        await event_bus.publish(
            AgentEvent(
                type=EventType.CONTEXT_COMPACTED,
                stream_id=ctx.stream_id or ctx.node_id,
                node_id=ctx.node_id,
                data=event_data,
            )
        )

    await publish_context_usage(event_bus, ctx, conversation, "post_compaction")

    if os.environ.get("HIVE_COMPACTION_DEBUG"):
        write_compaction_debug_log(ctx, before_pct, after_pct, level, pre_inventory)


def build_emergency_summary(
    ctx: NodeContext,
    accumulator: OutputAccumulator | None = None,
    conversation: NodeConversation | None = None,
    config: LoopConfig | None = None,
) -> str:
    """Build a structured emergency compaction summary.

    Unlike normal/aggressive compaction which uses an LLM summary,
    emergency compaction cannot afford an LLM call (context is already
    way over budget). Instead, build a deterministic summary from the
    node's known state so the LLM can continue working after
    compaction without losing track of its task and inputs.
    """
    parts = [
        "EMERGENCY COMPACTION — previous conversation was too large "
        "and has been replaced with this summary.\n"
    ]

    # 1. Node identity
    spec = ctx.node_spec
    parts.append(f"NODE: {spec.name} (id={spec.id})")
    if spec.description:
        parts.append(f"PURPOSE: {spec.description}")

    # 2. Inputs the node received
    input_lines = []
    for key in spec.input_keys:
        value = ctx.input_data.get(key) or ctx.buffer.read(key)
        if value is not None:
            # Truncate long values but keep them recognisable
            v_str = str(value)
            if len(v_str) > 200:
                v_str = v_str[:200] + "…"
            input_lines.append(f" {key}: {v_str}")
    if input_lines:
        parts.append("INPUTS:\n" + "\n".join(input_lines))

    # 3. Output accumulator state (what's been set so far)
    if accumulator:
        acc_state = accumulator.to_dict()
        set_keys = {k: v for k, v in acc_state.items() if v is not None}
        missing = [k for k, v in acc_state.items() if v is None]
        if set_keys:
            lines = [f" {k}: {str(v)[:150]}" for k, v in set_keys.items()]
            parts.append("OUTPUTS ALREADY SET:\n" + "\n".join(lines))
        if missing:
            parts.append(f"OUTPUTS STILL NEEDED: {', '.join(missing)}")
    elif spec.output_keys:
        parts.append(f"OUTPUTS STILL NEEDED: {', '.join(spec.output_keys)}")

    # 4. Available tools reminder
    if spec.tools:
        parts.append(f"AVAILABLE TOOLS: {', '.join(spec.tools)}")

    # 5. Spillover files — list actual files so the LLM can load
    # them immediately instead of having to call list_data_files first.
    spillover_dir = config.spillover_dir if config else None
    if spillover_dir:
        try:
            from pathlib import Path

            data_dir = Path(spillover_dir)
            if data_dir.is_dir():
                all_files = sorted(f.name for f in data_dir.iterdir() if f.is_file())
                # Separate conversation history files from regular data files
                conv_files = [f for f in all_files if re.match(r"conversation_\d+\.md$", f)]
                data_files = [f for f in all_files if f not in conv_files]

                if conv_files:
                    conv_list = "\n".join(
                        f" - {f} (full path: {data_dir / f})" for f in conv_files
                    )
                    parts.append(
                        "CONVERSATION HISTORY (freeform messages saved during compaction — "
                        "use load_data('<filename>') to review earlier dialogue):\n" + conv_list
                    )
                if data_files:
                    file_list = "\n".join(
                        f" - {f} (full path: {data_dir / f})" for f in data_files[:30]
                    )
                    parts.append("DATA FILES (use load_data('<filename>') to read):\n" + file_list)
                if not all_files:
                    parts.append(
                        "NOTE: Large tool results may have been saved to files. "
                        "Use list_directory to check the data directory."
                    )
        except Exception:
            parts.append(
                "NOTE: Large tool results were saved to files. "
                "Use read_file(path='<path>') to read them."
            )

    # 6. Tool call history (prevent re-calling tools)
    if conversation is not None:
        tool_history = _extract_tool_call_history(conversation)
        if tool_history:
            parts.append(tool_history)

    parts.append(
        "\nContinue working towards setting the remaining outputs. "
        "Use your tools and the inputs above."
    )
    return "\n\n".join(parts)


def _extract_tool_call_history(conversation: NodeConversation) -> str:
    """Extract tool call history from conversation messages.

    This is the instance-level variant that operates on a NodeConversation
    directly (vs. the module-level extract_tool_call_history in conversation.py
    which works on raw message lists).
    """
    from framework.graph.conversation import extract_tool_call_history

    return extract_tool_call_history(list(conversation.messages))
@@ -0,0 +1,258 @@
"""Cursor persistence, queue draining, and pause detection.

Handles the checkpoint/resume cycle: restoring state from a previous
conversation store, writing cursor data, and managing injection/trigger
queues between iterations.
"""

from __future__ import annotations

import asyncio
import json
import logging
from collections.abc import Awaitable, Callable
from dataclasses import dataclass
from typing import Any

from framework.graph.conversation import ConversationStore, NodeConversation
from framework.graph.event_loop.types import LoopConfig, OutputAccumulator, TriggerEvent
from framework.graph.node import NodeContext
from framework.llm.capabilities import supports_image_tool_results

logger = logging.getLogger(__name__)


@dataclass
class RestoredState:
    """State recovered from a previous checkpoint."""

    conversation: NodeConversation
    accumulator: OutputAccumulator
    start_iteration: int
    recent_responses: list[str]
    recent_tool_fingerprints: list[list[tuple[str, str]]]
    pending_input: dict[str, Any] | None


async def restore(
    conversation_store: ConversationStore | None,
    ctx: NodeContext,
    config: LoopConfig,
) -> RestoredState | None:
    """Attempt to restore from a previous checkpoint.

    Returns a ``RestoredState`` with conversation, accumulator, iteration
    counter, and stall/doom-loop detection state — everything needed to
    resume exactly where execution stopped.
    """
    if conversation_store is None:
        return None

    # In isolated mode, filter parts by phase_id so the node only sees
    # its own messages in the shared flat conversation store. In
    # continuous mode (or when _restore is called for timer-resume)
    # load all parts — the full conversation threads across nodes.
    _is_continuous = getattr(ctx, "continuous_mode", False)
    phase_filter = None if _is_continuous else ctx.node_id
    conversation = await NodeConversation.restore(
        conversation_store,
        phase_id=phase_filter,
        run_id=ctx.effective_run_id,
    )
    if conversation is None:
        return None

    # If run_id filtering removed all messages, this is an intentional
    # restart (new run), not a crash recovery. Return None so the caller
    # falls through to the fresh-conversation path.
    if conversation.message_count == 0:
        return None

    accumulator = await OutputAccumulator.restore(conversation_store, run_id=ctx.effective_run_id)
    accumulator.spillover_dir = config.spillover_dir
    accumulator.max_value_chars = config.max_output_value_chars

    cursor = await conversation_store.read_cursor() or {}
    start_iteration = cursor.get("iteration", 0) + 1

    # Restore stall/doom-loop detection state
    recent_responses: list[str] = cursor.get("recent_responses", [])
    raw_fps = cursor.get("recent_tool_fingerprints", [])
    recent_tool_fingerprints: list[list[tuple[str, str]]] = [
        [tuple(pair) for pair in fps]  # type: ignore[misc]
        for fps in raw_fps
    ]
    pending_input = cursor.get("pending_input")
    if not isinstance(pending_input, dict):
        pending_input = None

    logger.info(
        f"Restored event loop: iteration={start_iteration}, "
        f"messages={conversation.message_count}, "
        f"outputs={list(accumulator.values.keys())}, "
        f"stall_window={len(recent_responses)}, "
        f"doom_window={len(recent_tool_fingerprints)}"
    )
    return RestoredState(
        conversation=conversation,
        accumulator=accumulator,
        start_iteration=start_iteration,
        recent_responses=recent_responses,
        recent_tool_fingerprints=recent_tool_fingerprints,
        pending_input=pending_input,
    )
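
A hedged sketch of the caller's resume path (the fresh-start branch is an assumption about the surrounding orchestrator, which this diff does not show):

    state = await restore(store, ctx, cfg)
    if state is None:
        # New run or nothing persisted: fall through to a fresh conversation
        # (constructor arguments elided here).
        iteration = 0
    else:
        conversation = state.conversation
        iteration = state.start_iteration  # continue exactly where execution stopped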


async def write_cursor(
    conversation_store: ConversationStore | None,
    ctx: NodeContext,
    conversation: NodeConversation,
    accumulator: OutputAccumulator,
    iteration: int,
    *,
    recent_responses: list[str] | None = None,
    recent_tool_fingerprints: list[list[tuple[str, str]]] | None = None,
    pending_input: dict[str, Any] | None = None,
) -> None:
    """Write checkpoint cursor for crash recovery.

    Persists iteration counter, accumulator outputs, and stall/doom-loop
    detection state so that resume picks up exactly where execution stopped.
    """
    if conversation_store:
        cursor = await conversation_store.read_cursor() or {}
        cursor.update(
            {
                "iteration": iteration,
                "node_id": ctx.node_id,
                "outputs": accumulator.to_dict(),
            }
        )
        # Persist stall/doom-loop detection state for reliable resume
        if recent_responses is not None:
            cursor["recent_responses"] = recent_responses
        if recent_tool_fingerprints is not None:
            # Convert list[list[tuple]] → list[list[list]] for JSON
            cursor["recent_tool_fingerprints"] = [
                [list(pair) for pair in fps] for fps in recent_tool_fingerprints
            ]
        # Persist blocked-input state so restored runs re-block instead of
        # manufacturing a synthetic continuation turn.
        cursor["pending_input"] = pending_input
        await conversation_store.write_cursor(cursor)
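
The tuple-to-list conversion exists because JSON has no tuple type; restore() reverses it on read. Illustrative round trip:

    # on write:  [("web_search", '{"q": "x"}')]  ->  [["web_search", "{\"q\": \"x\"}"]]
    # on restore: [tuple(pair) for pair in fps]  ->  [("web_search", '{"q": "x"}')]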


async def drain_injection_queue(
    queue: asyncio.Queue,
    conversation: NodeConversation,
    *,
    ctx: NodeContext,
    describe_images_as_text_fn: (
        Callable[[list[dict[str, Any]]], Awaitable[str | None]] | None
    ) = None,
) -> int:
    """Drain all pending injected events as user messages. Returns count."""
    count = 0
    logger.debug(
        "[drain_injection_queue] Starting to drain queue, initial queue size: %s",
        queue.qsize() if hasattr(queue, "qsize") else "unknown",
    )
    while not queue.empty():
        try:
            content, is_client_input, image_content = queue.get_nowait()
            logger.info(
                "[drain] injected message (client_input=%s, images=%d): %s",
                is_client_input,
                len(image_content) if image_content else 0,
                content[:200] if content else "(empty)",
            )
            if image_content and ctx.llm and not supports_image_tool_results(ctx.llm.model):
                logger.info(
                    "Model '%s' does not support images; attempting vision fallback",
                    ctx.llm.model,
                )
                if describe_images_as_text_fn is not None:
                    description = await describe_images_as_text_fn(image_content)
                    if description:
                        content = f"{content}\n\n{description}" if content else description
                        logger.info("[drain] image described as text via vision fallback")
                else:
                    logger.info("[drain] no vision fallback available; images dropped")
                image_content = None
            # Real user input is stored as-is; external events get a prefix
            if is_client_input:
                await conversation.add_user_message(
                    content,
                    is_client_input=True,
                    image_content=image_content,
                )
            else:
                await conversation.add_user_message(f"[External event]: {content}")
            count += 1
        except asyncio.QueueEmpty:
            break
    return count


async def drain_trigger_queue(
    queue: asyncio.Queue,
    conversation: NodeConversation,
) -> int:
    """Drain all pending trigger events as a single batched user message.

    Multiple triggers are merged so the LLM sees them atomically and can
    reason about all pending triggers before acting.
    """
    triggers: list[TriggerEvent] = []
    while not queue.empty():
        try:
            triggers.append(queue.get_nowait())
        except asyncio.QueueEmpty:
            break

    if not triggers:
        return 0

    parts: list[str] = []
    for t in triggers:
        task = t.payload.get("task", "")
        task_line = f"\nTask: {task}" if task else ""
        payload_str = json.dumps(t.payload, default=str)
        parts.append(f"[TRIGGER: {t.trigger_type}/{t.source_id}]{task_line}\n{payload_str}")

    combined = "\n\n".join(parts)
    logger.info("[drain] %d trigger(s): %s", len(triggers), combined[:200])
    await conversation.add_user_message(combined)
    return len(triggers)
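
For two queued triggers, the single batched user message would look roughly like this (trigger names and payloads are illustrative):

    [TRIGGER: timer/daily_report]
    Task: summarise yesterday's activity
    {"task": "summarise yesterday's activity", "fired_at": "2025-01-01T09:00:00Z"}

    [TRIGGER: webhook/github]
    {"event": "push", "ref": "refs/heads/main"}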


async def check_pause(
    ctx: NodeContext,
    conversation: NodeConversation,
    iteration: int,
) -> bool:
    """
    Check if pause has been requested. Returns True if paused.

    Note: This check happens BEFORE starting iteration N, after completing N-1.
    If paused, the node exits having completed {iteration} iterations (0 to iteration-1).
    """
    # Check executor-level pause event (for /pause command, Ctrl+Z)
    if ctx.pause_event and ctx.pause_event.is_set():
        completed = iteration  # 0-indexed: iteration=3 means 3 iterations completed (0,1,2)
        logger.info(f"⏸ Pausing after {completed} iteration(s) completed (executor-level)")
        return True

    # Check context-level pause flags (legacy/alternative methods)
    pause_requested = ctx.input_data.get("pause_requested", False)
    if not pause_requested:
        try:
            pause_requested = ctx.buffer.read("pause_requested") or False
        except (PermissionError, KeyError):
            pause_requested = False
    if pause_requested:
        completed = iteration
        logger.info(f"⏸ Pausing after {completed} iteration(s) completed (context-level)")
        return True

    return False
@@ -0,0 +1,360 @@
"""EventBus publishing helpers for the event loop.

Thin wrappers around EventBus.emit_*() calls that check for bus existence
before publishing. Extracted to reduce noise in the main orchestrator.
"""

from __future__ import annotations

import logging
import time

from framework.graph.conversation import NodeConversation
from framework.graph.event_loop.types import HookContext
from framework.graph.node import NodeContext
from framework.runtime.event_bus import EventBus

logger = logging.getLogger(__name__)


async def publish_loop_started(
    event_bus: EventBus | None,
    stream_id: str,
    node_id: str,
    max_iterations: int,
    execution_id: str = "",
) -> None:
    if event_bus:
        await event_bus.emit_node_loop_started(
            stream_id=stream_id,
            node_id=node_id,
            max_iterations=max_iterations,
            execution_id=execution_id,
        )


async def generate_action_plan(
    event_bus: EventBus | None,
    ctx: NodeContext,
    stream_id: str,
    node_id: str,
    execution_id: str,
) -> None:
    """Generate a brief action plan via LLM and emit it as an SSE event.

    Runs as a fire-and-forget task so it never blocks the main loop.
    """
    try:
        system_prompt = ctx.node_spec.system_prompt or ""
        # Trim to keep the prompt small
        prompt_summary = system_prompt[:500]
        if len(system_prompt) > 500:
            prompt_summary += "..."

        tool_names = [t.name for t in ctx.available_tools]
        output_keys = ctx.node_spec.output_keys or []

        prompt = (
            f'You are about to work on a task as node "{node_id}".\n\n'
            f"System prompt:\n{prompt_summary}\n\n"
            f"Tools available: {tool_names}\n"
            f"Required outputs: {output_keys}\n\n"
            f"Write a brief action plan (2-5 bullet points) describing "
            f"what you will do to complete this task. Be specific and concise.\n"
            f"Return ONLY the plan text, no preamble."
        )

        response = await ctx.llm.acomplete(
            messages=[{"role": "user", "content": prompt}],
            max_tokens=1024,
        )

        plan = response.content.strip()
        if plan and event_bus:
            await event_bus.emit_node_action_plan(
                stream_id=stream_id,
                node_id=node_id,
                plan=plan,
                execution_id=execution_id,
            )
    except Exception as e:
        logger.warning("Action plan generation failed for node '%s': %s", node_id, e)


async def publish_iteration(
    event_bus: EventBus | None,
    stream_id: str,
    node_id: str,
    iteration: int,
    execution_id: str = "",
    extra_data: dict | None = None,
) -> None:
    if event_bus:
        await event_bus.emit_node_loop_iteration(
            stream_id=stream_id,
            node_id=node_id,
            iteration=iteration,
            execution_id=execution_id,
            extra_data=extra_data,
        )


async def publish_llm_turn_complete(
    event_bus: EventBus | None,
    stream_id: str,
    node_id: str,
    stop_reason: str,
    model: str,
    input_tokens: int,
    output_tokens: int,
    cached_tokens: int = 0,
    execution_id: str = "",
    iteration: int | None = None,
) -> None:
    if event_bus:
        await event_bus.emit_llm_turn_complete(
            stream_id=stream_id,
            node_id=node_id,
            stop_reason=stop_reason,
            model=model,
            input_tokens=input_tokens,
            output_tokens=output_tokens,
            cached_tokens=cached_tokens,
            execution_id=execution_id,
            iteration=iteration,
        )


def log_skip_judge(
    ctx: NodeContext,
    node_id: str,
    iteration: int,
    feedback: str,
    tool_calls: list[dict],
    llm_text: str,
    turn_tokens: dict[str, int],
    iter_start: float,
) -> None:
    """Log a CONTINUE step that skips judge evaluation (e.g., waiting for input)."""
    if ctx.runtime_logger:
        ctx.runtime_logger.log_step(
            node_id=node_id,
            node_type="event_loop",
            step_index=iteration,
            verdict="CONTINUE",
            verdict_feedback=feedback,
            tool_calls=tool_calls,
            llm_text=llm_text,
            input_tokens=turn_tokens.get("input", 0),
            output_tokens=turn_tokens.get("output", 0),
            latency_ms=int((time.time() - iter_start) * 1000),
        )


async def publish_loop_completed(
    event_bus: EventBus | None,
    stream_id: str,
    node_id: str,
    iterations: int,
    execution_id: str = "",
) -> None:
    if event_bus:
        await event_bus.emit_node_loop_completed(
            stream_id=stream_id,
            node_id=node_id,
            iterations=iterations,
            execution_id=execution_id,
        )


async def publish_context_usage(
    event_bus: EventBus | None,
    ctx: NodeContext,
    conversation: NodeConversation,
    trigger: str,
) -> None:
    """Emit a CONTEXT_USAGE_UPDATED event with current context window state."""
    if not event_bus:
        return

    from framework.runtime.event_bus import AgentEvent, EventType

    estimated = conversation.estimate_tokens()
    max_tokens = conversation._max_context_tokens
    ratio = estimated / max_tokens if max_tokens > 0 else 0.0
    await event_bus.publish(
        AgentEvent(
            type=EventType.CONTEXT_USAGE_UPDATED,
            stream_id=ctx.stream_id or ctx.node_id,
            node_id=ctx.node_id,
            data={
                "usage_ratio": round(ratio, 4),
                "usage_pct": round(ratio * 100),
                "message_count": conversation.message_count,
                "estimated_tokens": estimated,
                "max_context_tokens": max_tokens,
                "trigger": trigger,
            },
        )
    )


async def publish_stalled(
    event_bus: EventBus | None,
    stream_id: str,
    node_id: str,
    execution_id: str = "",
) -> None:
    if event_bus:
        await event_bus.emit_node_stalled(
            stream_id=stream_id,
            node_id=node_id,
            reason="Consecutive similar responses detected",
            execution_id=execution_id,
        )


async def publish_text_delta(
    event_bus: EventBus | None,
    stream_id: str,
    node_id: str,
    content: str,
    snapshot: str,
    ctx: NodeContext,
    execution_id: str = "",
    iteration: int | None = None,
    inner_turn: int = 0,
) -> None:
    if event_bus:
        if ctx.emits_client_io:
            await event_bus.emit_client_output_delta(
                stream_id=stream_id,
                node_id=node_id,
                content=content,
                snapshot=snapshot,
                execution_id=execution_id,
                iteration=iteration,
                inner_turn=inner_turn,
            )
        else:
            await event_bus.emit_llm_text_delta(
                stream_id=stream_id,
                node_id=node_id,
                content=content,
                snapshot=snapshot,
                execution_id=execution_id,
                inner_turn=inner_turn,
            )


async def publish_tool_started(
    event_bus: EventBus | None,
    stream_id: str,
    node_id: str,
    tool_use_id: str,
    tool_name: str,
    tool_input: dict,
    execution_id: str = "",
) -> None:
    if event_bus:
        await event_bus.emit_tool_call_started(
            stream_id=stream_id,
            node_id=node_id,
            tool_use_id=tool_use_id,
            tool_name=tool_name,
            tool_input=tool_input,
            execution_id=execution_id,
        )


async def publish_tool_completed(
    event_bus: EventBus | None,
    stream_id: str,
    node_id: str,
    tool_use_id: str,
    tool_name: str,
    result: str,
    is_error: bool,
    execution_id: str = "",
) -> None:
    if event_bus:
        await event_bus.emit_tool_call_completed(
            stream_id=stream_id,
            node_id=node_id,
            tool_use_id=tool_use_id,
            tool_name=tool_name,
            result=result,
            is_error=is_error,
            execution_id=execution_id,
        )


async def publish_judge_verdict(
    event_bus: EventBus | None,
    stream_id: str,
    node_id: str,
    action: str,
    feedback: str = "",
    judge_type: str = "implicit",
    iteration: int = 0,
    execution_id: str = "",
) -> None:
    if event_bus:
        await event_bus.emit_judge_verdict(
            stream_id=stream_id,
            node_id=node_id,
            action=action,
            feedback=feedback,
            judge_type=judge_type,
            iteration=iteration,
            execution_id=execution_id,
        )


async def publish_output_key_set(
    event_bus: EventBus | None,
    stream_id: str,
    node_id: str,
    key: str,
    execution_id: str = "",
) -> None:
    if event_bus:
        await event_bus.emit_output_key_set(
            stream_id=stream_id, node_id=node_id, key=key, execution_id=execution_id
        )


async def run_hooks(
    hooks_config: dict[str, list],
    event: str,
    conversation: NodeConversation,
    trigger: str | None = None,
) -> None:
    """Run all registered hooks for *event*, applying their results.

    Each hook receives a HookContext and may return a HookResult that:
    - replaces the system prompt (result.system_prompt)
    - injects an extra user message (result.inject)
    Hooks run in registration order; each sees the prompt as left by the
    previous hook.
    """
    hook_list = hooks_config.get(event, [])
    if not hook_list:
        return
    for hook in hook_list:
        ctx = HookContext(
            event=event,
            trigger=trigger,
            system_prompt=conversation.system_prompt,
        )
        try:
            result = await hook(ctx)
        except Exception:
            logger.warning("Hook '%s' raised an exception", event, exc_info=True)
            continue
        if result is None:
            continue
        if result.system_prompt:
            conversation.update_system_prompt(result.system_prompt)
        if result.inject:
            await conversation.add_user_message(result.inject)
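
A minimal example hook (illustrative; it assumes a `HookResult` type whose `inject` keyword matches the field usage above, since its constructor is not shown in this diff, and "pre_turn" is a hypothetical event name):

    async def remind_budget(hook_ctx: HookContext):
        # Inject a standing reminder every time this event fires.
        return HookResult(inject="Reminder: stay within the agreed budget.")

    hooks = {"pre_turn": [remind_budget]}
    await run_hooks(hooks, "pre_turn", conversation)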
@@ -0,0 +1,175 @@
"""Judge evaluation pipeline for the event loop."""

from __future__ import annotations

import logging
from collections.abc import Callable

from framework.graph.conversation import NodeConversation
from framework.graph.event_loop.types import JudgeProtocol, JudgeVerdict, OutputAccumulator
from framework.graph.node import NodeContext

logger = logging.getLogger(__name__)


class SubagentJudge:
    """Judge for subagent execution."""

    def __init__(self, task: str, max_iterations: int = 10):
        self._task = task
        self._max_iterations = max_iterations

    async def evaluate(self, context: dict[str, object]) -> JudgeVerdict:
        missing = context.get("missing_keys", [])
        if not isinstance(missing, list) or not missing:
            return JudgeVerdict(action="ACCEPT", feedback="")

        iteration = context.get("iteration", 0)
        if not isinstance(iteration, int):
            iteration = 0
        remaining = self._max_iterations - iteration - 1

        if remaining <= 3:
            urgency = (
                f"URGENT: Only {remaining} iterations left. "
                f"Stop all other work and call set_output NOW for: {missing}"
            )
        elif remaining <= self._max_iterations // 2:
            urgency = (
                f"WARNING: {remaining} iterations remaining. "
                f"You must call set_output for: {missing}"
            )
        else:
            urgency = f"Missing output keys: {missing}. Use set_output to provide them."

        return JudgeVerdict(action="RETRY", feedback=f"Your task: {self._task}\n{urgency}")


async def judge_turn(
    *,
    mark_complete_flag: bool,
    judge: JudgeProtocol | None,
    ctx: NodeContext,
    conversation: NodeConversation,
    accumulator: OutputAccumulator,
    assistant_text: str,
    tool_results: list[dict[str, object]],
    iteration: int,
    get_missing_output_keys_fn: Callable[
        [OutputAccumulator, list[str] | None, list[str] | None],
        list[str],
    ],
    max_context_tokens: int,
) -> JudgeVerdict:
    """Evaluate the current state using judge or implicit logic.

    Evaluation levels (in order):
    0. Short-circuits: mark_complete, skip_judge, tool-continue.
    1. Custom judge (JudgeProtocol) — full authority when set.
    2. Implicit judge — output-key check + optional conversation-aware
       quality gate (when ``success_criteria`` is defined).

    Returns a JudgeVerdict. ``feedback=None`` means no real evaluation
    happened (skip_judge, tool-continue); the caller must not inject a
    feedback message. Any non-None feedback (including ``""``) means a
    real evaluation occurred and will be logged into the conversation.
    """
    # --- Level 0: short-circuits (no evaluation) -----------------------

    if mark_complete_flag:
        return JudgeVerdict(action="ACCEPT")

    if ctx.node_spec.skip_judge:
        return JudgeVerdict(action="RETRY")  # feedback=None → not logged

    # --- Level 1: custom judge -----------------------------------------

    if judge is not None:
        context = {
            "assistant_text": assistant_text,
            "tool_calls": tool_results,
            "output_accumulator": accumulator.to_dict(),
            "accumulator": accumulator,
            "iteration": iteration,
            "conversation_summary": conversation.export_summary(),
            "output_keys": ctx.node_spec.output_keys,
            "missing_keys": get_missing_output_keys_fn(
                accumulator, ctx.node_spec.output_keys, ctx.node_spec.nullable_output_keys
            ),
        }
        verdict = await judge.evaluate(context)
        # Ensure evaluated RETRY always carries feedback for logging.
        if verdict.action == "RETRY" and not verdict.feedback:
            return JudgeVerdict(action="RETRY", feedback="Custom judge returned RETRY.")
        return verdict

    # --- Level 2: implicit judge ---------------------------------------

    # Real tool calls were made — let the agent keep working.
    if tool_results:
        return JudgeVerdict(action="RETRY")  # feedback=None → not logged

    missing = get_missing_output_keys_fn(
        accumulator, ctx.node_spec.output_keys, ctx.node_spec.nullable_output_keys
    )

    if missing:
        return JudgeVerdict(
            action="RETRY",
            feedback=(
                f"Task incomplete. Required outputs not yet produced: {missing}. "
                f"Follow your system prompt instructions to complete the work."
            ),
        )

    # All output keys present — run safety checks before accepting.

    output_keys = ctx.node_spec.output_keys or []
    nullable_keys = set(ctx.node_spec.nullable_output_keys or [])

    # All-nullable with nothing set → node produced nothing useful.
    all_nullable = output_keys and nullable_keys >= set(output_keys)
    none_set = not any(accumulator.get(k) is not None for k in output_keys)
    if all_nullable and none_set:
        return JudgeVerdict(
            action="RETRY",
            feedback=(
                f"No output keys have been set yet. "
                f"Use set_output to set at least one of: {output_keys}"
            ),
        )

    # Queen with no output keys → continuous interaction node.
    # Inject tool-use pressure instead of auto-accepting.
    if not output_keys and ctx.supports_direct_user_io:
        return JudgeVerdict(
            action="RETRY",
            feedback=(
                "STOP describing what you will do. "
                "You have FULL access to all tools — file creation, "
                "shell commands, MCP tools — and you CAN call them "
                "directly in your response. Respond ONLY with tool "
                "calls, no prose. Execute the task now."
            ),
        )

    # Level 2b: conversation-aware quality check (if success_criteria set)
    if ctx.node_spec.success_criteria and ctx.llm:
        from framework.graph.conversation_judge import evaluate_phase_completion

        verdict = await evaluate_phase_completion(
            llm=ctx.llm,
            conversation=conversation,
            phase_name=ctx.node_spec.name,
            phase_description=ctx.node_spec.description,
            success_criteria=ctx.node_spec.success_criteria,
            accumulator_state=accumulator.to_dict(),
            max_context_tokens=max_context_tokens,
        )
        if verdict.action != "ACCEPT":
            return JudgeVerdict(
                action=verdict.action,
                feedback=verdict.feedback or "Phase criteria not met.",
            )

    return JudgeVerdict(action="ACCEPT", feedback="")
|
||||
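For orientation, a custom judge at Level 1 only needs to satisfy the `JudgeProtocol` defined in the types module later in this diff. A minimal sketch; the class name and word threshold are invented for illustration:

```python
from framework.graph.event_loop.types import JudgeVerdict


class MinLengthJudge:
    """Illustrative judge: retry until the answer is substantive and complete."""

    def __init__(self, min_words: int = 50) -> None:
        self.min_words = min_words  # hypothetical acceptance threshold

    async def evaluate(self, context: dict) -> JudgeVerdict:
        text = context.get("assistant_text") or ""
        if context.get("missing_keys") or len(text.split()) < self.min_words:
            # Non-empty feedback, so the retry is logged into the conversation.
            return JudgeVerdict(
                action="RETRY",
                feedback=f"Too short or missing outputs: {context.get('missing_keys')}",
            )
        return JudgeVerdict(action="ACCEPT")
```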
@@ -0,0 +1,106 @@
"""Stall and doom-loop detection for the event loop.

Pure functions with no class dependencies — safe to call from any context.
"""

from __future__ import annotations

import json


def ngram_similarity(s1: str, s2: str, n: int = 2) -> float:
    """Jaccard similarity of n-gram sets.

    Returns 0.0-1.0, where 1.0 is exact match.
    Fast: O(len(s1) + len(s2)) using set operations.
    """

    def _ngrams(s: str) -> set[str]:
        return {s[i : i + n] for i in range(len(s) - n + 1) if s.strip()}

    if not s1 or not s2:
        return 0.0

    ngrams1, ngrams2 = _ngrams(s1.lower()), _ngrams(s2.lower())
    if not ngrams1 or not ngrams2:
        return 0.0

    intersection = len(ngrams1 & ngrams2)
    union = len(ngrams1 | ngrams2)
    return intersection / union if union else 0.0

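A quick sanity check of the measure (illustrative strings; exact scores depend on shared bigrams):

```python
assert ngram_similarity("retrying the fetch", "retrying the fetch") == 1.0
assert ngram_similarity("hello", "zzzzz") == 0.0
# Near-duplicates score high but below 1.0:
print(round(ngram_similarity("I'm stuck on this step", "I'm still stuck on this step"), 2))
```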
def is_stalled(
    recent_responses: list[str],
    threshold: int,
    similarity_threshold: float,
) -> bool:
    """Detect stall using n-gram similarity.

    Detects when ALL N consecutive responses are mutually similar
    (pairwise similarity >= similarity_threshold). A single dissimilar
    response resets the signal.
    This catches phrases like "I'm still stuck" vs "I'm stuck"
    without false-positives on "attempt 1" vs "attempt 2".
    """
    if len(recent_responses) < threshold:
        return False
    if not recent_responses[0]:
        return False

    # Every consecutive pair must be similar
    for i in range(1, len(recent_responses)):
        if ngram_similarity(recent_responses[i], recent_responses[i - 1]) < similarity_threshold:
            return False
    return True

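Used together (illustrative responses):

```python
responses = ["Still waiting for the login page to load."] * 3
assert is_stalled(responses, threshold=3, similarity_threshold=0.85)
# One genuinely different response breaks the chain:
assert not is_stalled(responses[:2] + ["Logged in; extracting the table now."], 3, 0.85)
```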
def fingerprint_tool_calls(
    tool_results: list[dict],
) -> list[tuple[str, str]]:
    """Create deterministic fingerprints for a turn's tool calls.

    Each fingerprint is (tool_name, canonical_args_json). Order-sensitive
    so [search("a"), fetch("b")] != [fetch("b"), search("a")].
    """
    fingerprints = []
    for tr in tool_results:
        name = tr.get("tool_name", "")
        args = tr.get("tool_input", {})
        try:
            canonical = json.dumps(args, sort_keys=True, default=str)
        except (TypeError, ValueError):
            canonical = str(args)
        fingerprints.append((name, canonical))
    return fingerprints

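For example (hypothetical tool-result dicts for one turn):

```python
turn = [
    {"tool_name": "web_search", "tool_input": {"query": "rust lifetimes"}},
    {"tool_name": "read_file", "tool_input": {"path": "notes.md"}},
]
print(fingerprint_tool_calls(turn))
# [('web_search', '{"query": "rust lifetimes"}'), ('read_file', '{"path": "notes.md"}')]
```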
def is_tool_doom_loop(
    recent_tool_fingerprints: list[list[tuple[str, str]]],
    threshold: int,
    enabled: bool = True,
) -> tuple[bool, str]:
    """Detect doom loop via exact fingerprint match.

    Detects when N consecutive turns invoke the same tools with
    identical (canonicalized) arguments. Different arguments mean
    different work, so only exact matches count.

    Returns (is_doom_loop, description).
    """
    if not enabled:
        return False, ""
    if len(recent_tool_fingerprints) < threshold:
        return False, ""
    first = recent_tool_fingerprints[0]
    if not first:
        return False, ""

    # All turns in the window must match the first exactly
    if all(fp == first for fp in recent_tool_fingerprints[1:]):
        tool_names = [name for name, _ in first]
        desc = (
            f"Doom loop detected: {len(recent_tool_fingerprints)} "
            f"identical consecutive tool calls ({', '.join(tool_names)})"
        )
        return True, desc
    return False, ""
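And the loop detector on top of those fingerprints:

```python
fp = [("web_search", '{"query": "rust lifetimes"}')]
looping, desc = is_tool_doom_loop([fp, fp, fp], threshold=3)
assert looping and "3 identical consecutive tool calls" in desc
assert not is_tool_doom_loop([fp, fp], threshold=3)[0]  # window not full yet
```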
@@ -0,0 +1,370 @@
"""Subagent execution for the event loop.

Handles the full subagent lifecycle: validation, context setup, tool filtering,
conversation store derivation, execution, and cleanup.
"""

from __future__ import annotations

import json
import logging
import time
from collections.abc import Awaitable, Callable
from pathlib import Path
from typing import TYPE_CHECKING, Any

from framework.graph.conversation import ConversationStore
from framework.graph.event_loop.judge_pipeline import SubagentJudge
from framework.graph.event_loop.types import LoopConfig, OutputAccumulator
from framework.graph.node import DataBuffer, NodeContext
from framework.llm.provider import ToolResult, ToolUse
from framework.runner.tool_registry import ToolRegistry
from framework.runtime.event_bus import EventBus

if TYPE_CHECKING:
    from framework.graph.event_loop_node import EventLoopNode

logger = logging.getLogger(__name__)


async def execute_subagent(
    ctx: NodeContext,
    agent_id: str,
    task: str,
    *,
    config: LoopConfig,
    event_loop_node_cls: type[EventLoopNode],
    escalation_receiver_cls: Callable[[], Any],
    accumulator: OutputAccumulator | None = None,
    event_bus: EventBus | None = None,
    tool_executor: Callable[[ToolUse], ToolResult | Awaitable[ToolResult]] | None = None,
    conversation_store: ConversationStore | None = None,
    subagent_instance_counter: dict[str, int] | None = None,
) -> ToolResult:
    """Execute a subagent and return the result as a ToolResult.

    The subagent:
    - Gets a fresh conversation with just the task
    - Has read-only access to the parent's readable memory
    - Cannot delegate to its own subagents (prevents recursion)
    - Returns its output in structured JSON format

    Args:
        ctx: Parent node's context (for memory, tools, LLM access).
        agent_id: The node ID of the subagent to invoke.
        task: The task description to give the subagent.
        config: LoopConfig for iteration/tool limits.
        event_loop_node_cls: EventLoopNode class used to run the subagent.
        escalation_receiver_cls: Factory for the receiver that blocks on
            wait_for_response reports.
        accumulator: Parent's OutputAccumulator.
        event_bus: EventBus for lifecycle events.
        tool_executor: Tool executor callable.
        conversation_store: Parent conversation store (for deriving subagent store).
        subagent_instance_counter: Mutable counter dict for unique subagent paths.

    Returns:
        ToolResult with structured JSON output.
    """
    # Log subagent invocation start
    logger.info(
        # Explicit "+" between fragments: implicit literal concatenation next
        # to "* 60" would otherwise repeat the banner text 60 times.
        "\n" + "=" * 60 + "\n"
        + "🤖 SUBAGENT INVOCATION\n"
        + "=" * 60 + "\n"
        + "Parent Node: %s\n"
        + "Subagent ID: %s\n"
        + "Task: %s\n" + "=" * 60,
        ctx.node_id,
        agent_id,
        task[:500] + "..." if len(task) > 500 else task,
    )

    # 1. Validate agent exists in registry
    if agent_id not in ctx.node_registry:
        return ToolResult(
            tool_use_id="",
            content=json.dumps(
                {
                    "message": f"Sub-agent '{agent_id}' not found in registry",
                    "data": None,
                    "metadata": {"agent_id": agent_id, "success": False, "error": "not_found"},
                }
            ),
            is_error=True,
        )

    subagent_spec = ctx.node_registry[agent_id]

    # 2. Create read-only memory snapshot
    parent_data = ctx.buffer.read_all()

    # Merge in-flight outputs from the parent's accumulator.
    if accumulator:
        for key, value in accumulator.to_dict().items():
            if key not in parent_data:
                parent_data[key] = value

    subagent_buffer = DataBuffer()
    for key, value in parent_data.items():
        subagent_buffer.write(key, value, validate=False)

    read_keys = set(parent_data.keys()) | set(subagent_spec.input_keys or [])
    scoped_buffer = subagent_buffer.with_permissions(
        read_keys=list(read_keys),
        write_keys=[],  # Read-only!
    )

    # 2b. Compute instance counter early so the callback and child context
    # share the same stable node_id for this subagent invocation.
    if subagent_instance_counter is not None:
        subagent_instance_counter.setdefault(agent_id, 0)
        subagent_instance_counter[agent_id] += 1
        subagent_instance = str(subagent_instance_counter[agent_id])
    else:
        subagent_instance = "1"

    if subagent_instance == "1":
        sa_node_id = f"{ctx.node_id}:subagent:{agent_id}"
    else:
        sa_node_id = f"{ctx.node_id}:subagent:{agent_id}:{subagent_instance}"

    # 2c. Set up report callback (one-way channel to parent / event bus)
    subagent_reports: list[dict] = []

    async def _report_callback(
        message: str,
        data: dict | None = None,
        *,
        wait_for_response: bool = False,
    ) -> str | None:
        subagent_reports.append({"message": message, "data": data, "timestamp": time.time()})
        if event_bus:
            await event_bus.emit_subagent_report(
                stream_id=ctx.node_id,
                node_id=sa_node_id,
                subagent_id=agent_id,
                message=message,
                data=data,
                execution_id=ctx.execution_id,
            )

        if not wait_for_response:
            return None

        if not event_bus:
            logger.warning(
                "Subagent '%s' requested user response but no event_bus available",
                agent_id,
            )
            return None

        # Create isolated receiver and register for input routing
        import uuid

        escalation_id = f"{ctx.node_id}:escalation:{uuid.uuid4().hex[:8]}"
        receiver = escalation_receiver_cls()
        registry = ctx.shared_node_registry

        registry[escalation_id] = receiver
        try:
            await event_bus.emit_escalation_requested(
                stream_id=ctx.stream_id or ctx.node_id,
                node_id=escalation_id,
                reason=f"Subagent report (wait_for_response) from {agent_id}",
                context=message,
                execution_id=ctx.execution_id,
            )
            # Block until queen responds
            return await receiver.wait()
        finally:
            registry.pop(escalation_id, None)

    # 3. Filter tools for subagent
    subagent_tool_names = set(subagent_spec.tools or [])
    tool_source = ctx.all_tools if ctx.all_tools else ctx.available_tools

    # GCU auto-population
    if subagent_spec.node_type == "gcu" and not subagent_tool_names:
        subagent_tools = [t for t in tool_source if t.name != "delegate_to_sub_agent"]
    else:
        subagent_tools = [
            t
            for t in tool_source
            if t.name in subagent_tool_names and t.name != "delegate_to_sub_agent"
        ]

    missing = subagent_tool_names - {t.name for t in subagent_tools}
    if missing:
        logger.warning(
            "Subagent '%s' requested tools not found in catalog: %s",
            agent_id,
            sorted(missing),
        )

    logger.info(
        "📦 Subagent '%s' configuration:\n"
        "  - System prompt: %s\n"
        "  - Tools available (%d): %s\n"
        "  - Memory keys inherited: %s",
        agent_id,
        (subagent_spec.system_prompt[:200] + "...")
        if subagent_spec.system_prompt and len(subagent_spec.system_prompt) > 200
        else subagent_spec.system_prompt,
        len(subagent_tools),
        [t.name for t in subagent_tools],
        list(parent_data.keys()),
    )

    # 4. Build subagent context
    max_iter = min(config.max_iterations, 10)
    subagent_ctx = NodeContext(
        runtime=ctx.runtime,
        node_id=sa_node_id,
        node_spec=subagent_spec,
        buffer=scoped_buffer,
        input_data={"task": task, **parent_data},
        llm=ctx.llm,
        available_tools=subagent_tools,
        goal_context=(
            f"Your specific task: {task}\n\n"
            f"COMPLETION REQUIREMENTS:\n"
            f"When your task is done, you MUST call set_output() "
            f"for each required key: {subagent_spec.output_keys}\n"
            f"Alternatively, call report_to_parent(mark_complete=true) "
            f"with your findings in message/data.\n"
            + (
                "Before finishing, call browser_close_finished() to clean up your browser tabs.\n"
                if subagent_spec.node_type == "gcu"
                else ""
            )
            + f"You have a maximum of {max_iter} turns to complete this task."
        ),
        goal=ctx.goal,
        max_tokens=ctx.max_tokens,
        runtime_logger=ctx.runtime_logger,
        is_subagent_mode=True,  # Prevents nested delegation
        report_callback=_report_callback,
        node_registry={},  # Empty - no nested subagents
        shared_node_registry=ctx.shared_node_registry,  # For escalation routing
    )

    # 5. Create and execute subagent EventLoopNode
    subagent_conv_store = None
    if conversation_store is not None:
        from framework.storage.conversation_store import FileConversationStore

        parent_base = getattr(conversation_store, "_base", None)
        if parent_base is not None:
            conversations_dir = parent_base.parent
            subagent_dir_name = f"{agent_id}-{subagent_instance}"
            subagent_store_path = conversations_dir / subagent_dir_name
            subagent_conv_store = FileConversationStore(base_path=subagent_store_path)

    # Derive a subagent-scoped spillover dir
    subagent_spillover = None
    if config.spillover_dir:
        subagent_spillover = str(Path(config.spillover_dir) / agent_id / subagent_instance)

    subagent_node = event_loop_node_cls(
        event_bus=event_bus,
        judge=SubagentJudge(task=task, max_iterations=max_iter),
        config=LoopConfig(
            max_iterations=max_iter,
            max_tool_calls_per_turn=config.max_tool_calls_per_turn,
            tool_call_overflow_margin=config.tool_call_overflow_margin,
            max_context_tokens=config.max_context_tokens,
            stall_detection_threshold=config.stall_detection_threshold,
            max_tool_result_chars=config.max_tool_result_chars,
            spillover_dir=subagent_spillover,
        ),
        tool_executor=tool_executor,
        conversation_store=subagent_conv_store,
    )

    # Each subagent instance gets its own unique browser profile so concurrent
    # subagents don't share tab groups. The profile is set as execution context
    # so the tool registry auto-injects it into every browser_* MCP tool call.
    _gcu_profile = f"{agent_id}:{subagent_instance}"
    _profile_token = ToolRegistry.set_execution_context(profile=_gcu_profile)

    try:
        logger.info("🚀 Starting subagent '%s' execution...", agent_id)
        start_time = time.time()
        result = await subagent_node.execute(subagent_ctx)
        latency_ms = int((time.time() - start_time) * 1000)

        separator = "-" * 60
        logger.info(
            "\n%s\n"
            "✅ SUBAGENT '%s' COMPLETED\n"
            "%s\n"
            "Success: %s\n"
            "Latency: %dms\n"
            "Tokens used: %s\n"
            "Output keys: %s\n"
            "%s",
            separator,
            agent_id,
            separator,
            result.success,
            latency_ms,
            result.tokens_used,
            list(result.output.keys()) if result.output else [],
            separator,
        )

        result_json = {
            "message": (
                f"Sub-agent '{agent_id}' completed successfully"
                if result.success
                else f"Sub-agent '{agent_id}' failed: {result.error}"
            ),
            "data": result.output,
            "reports": subagent_reports if subagent_reports else None,
            "metadata": {
                "agent_id": agent_id,
                "success": result.success,
                "tokens_used": result.tokens_used,
                "latency_ms": latency_ms,
                "report_count": len(subagent_reports),
            },
        }

        return ToolResult(
            tool_use_id="",
            content=json.dumps(result_json, indent=2, default=str),
            is_error=not result.success,
        )

    except Exception as e:
        logger.exception(
            "\n" + "!" * 60 + "\n❌ SUBAGENT '%s' FAILED\nError: %s\n" + "!" * 60,
            agent_id,
            str(e),
        )
        result_json = {
            "message": f"Sub-agent '{agent_id}' raised exception: {e}",
            "data": None,
            "metadata": {
                "agent_id": agent_id,
                "success": False,
                "error": str(e),
            },
        }
        return ToolResult(
            tool_use_id="",
            content=json.dumps(result_json, indent=2),
            is_error=True,
        )
    finally:
        ToolRegistry.reset_execution_context(_profile_token)
        # Close the tab group this subagent created, if any.
        try:
            from gcu.browser.bridge import get_bridge
            from gcu.browser.tools.lifecycle import _contexts

            bridge = get_bridge()
            ctx_entry = _contexts.pop(_gcu_profile, None)
            if bridge and bridge.is_connected and ctx_entry:
                group_id = ctx_entry.get("groupId")
                if group_id is not None:
                    await bridge.destroy_context(group_id)
        except Exception:
            pass
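To illustrate the contract, a sketch of a call and the payload shape the parent gets back (the subagent id, task, and receiver class are invented; `config` is a `LoopConfig`):

```python
result = await execute_subagent(
    ctx,
    agent_id="research-worker",                  # hypothetical subagent node id
    task="Summarize the top 3 sources",
    config=config,
    event_loop_node_cls=EventLoopNode,
    escalation_receiver_cls=EscalationReceiver,  # assumed receiver factory
)
payload = json.loads(result.content)
# payload["message"]  -> "Sub-agent 'research-worker' completed successfully"
# payload["data"]     -> the subagent's output key/value pairs
# payload["reports"]  -> report_to_parent entries, or None if there were none
# payload["metadata"] -> agent_id, success, tokens_used, latency_ms, report_count
```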
@@ -0,0 +1,369 @@
"""Synthetic tool builders for the event loop.

Factory functions that create ``Tool`` definitions for framework-level
synthetic tools (set_output, ask_user, escalate, delegate, report_to_parent).
Also includes the ``handle_set_output`` validation logic.

All functions are pure — they receive explicit parameters and return
``Tool`` or ``ToolResult`` objects with no side effects.
"""

from __future__ import annotations

from typing import Any

from framework.llm.provider import Tool, ToolResult


def build_ask_user_tool() -> Tool:
    """Build the synthetic ask_user tool for explicit user-input requests.

    The queen calls ask_user() when it needs to pause and wait
    for user input. Text-only turns WITHOUT ask_user flow through without
    blocking, allowing progress updates and summaries to stream freely.
    """
    return Tool(
        name="ask_user",
        description=(
            "You MUST call this tool whenever you need the user's response. "
            "Always call it after greeting the user, asking a question, or "
            "requesting approval. Do NOT call it for status updates or "
            "summaries that don't require a response. "
            "Always include 2-3 predefined options. The UI automatically "
            "appends an 'Other' free-text input after your options, so NEVER "
            "include catch-all options like 'Custom idea', 'Something else', "
            "'Other', or 'None of the above' — the UI handles that. "
            "When the question primarily needs a typed answer but you must "
            "include options, make one option signal that typing is expected "
            "(e.g. 'I\\'ll type my response'). This helps users discover the "
            "free-text input. "
            "The ONLY exception: omit options when the question demands a "
            "free-form answer the user must type out (e.g. 'Describe your "
            "agent idea', 'Paste the error message'). "
            "Example with options: "
            '{"question": "What would you like to do?", "options": '
            '["Build a new agent", "Modify existing agent", "Run tests"]} '
            "Free-form example: "
            '{"question": "Describe the agent you want to build."}'
        ),
        parameters={
            "type": "object",
            "properties": {
                "question": {
                    "type": "string",
                    "description": "The question or prompt shown to the user.",
                },
                "options": {
                    "type": "array",
                    "items": {"type": "string"},
                    "description": (
                        "2-3 specific predefined choices. Include in most cases. "
                        'Example: ["Option A", "Option B", "Option C"]. '
                        "The UI always appends an 'Other' free-text input, so "
                        "do NOT include catch-alls like 'Custom idea' or 'Other'. "
                        "Omit ONLY when the user must type a free-form answer."
                    ),
                    "minItems": 2,
                    "maxItems": 3,
                },
            },
            "required": ["question"],
        },
    )

def build_ask_user_multiple_tool() -> Tool:
    """Build the synthetic ask_user_multiple tool for batched questions.

    Queen-only tool that presents multiple questions at once so the user
    can answer them all in a single interaction rather than one at a time.
    """
    return Tool(
        name="ask_user_multiple",
        description=(
            "Ask the user multiple questions at once. Use this instead of "
            "ask_user when you have 2 or more questions to ask in the same "
            "turn — it lets the user answer everything in one go rather than "
            "going back and forth. Each question can have its own predefined "
            "options (2-3 choices) or be free-form. The UI renders all "
            "questions together with a single Submit button. "
            "ALWAYS prefer this over ask_user when you have multiple things "
            "to clarify. "
            "IMPORTANT: Do NOT repeat the questions in your text response — "
            "the widget renders them. Keep your text to a brief intro only. "
            "Example: "
            '{"questions": ['
            '  {"id": "scope", "prompt": "What scope?", "options": ["Full", "Partial"]},'
            '  {"id": "format", "prompt": "Output format?", "options": ["PDF", "CSV", "JSON"]},'
            '  {"id": "details", "prompt": "Any special requirements?"}'
            "]}"
        ),
        parameters={
            "type": "object",
            "properties": {
                "questions": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string",
                                "description": (
                                    "Short identifier for this question (used in the response)."
                                ),
                            },
                            "prompt": {
                                "type": "string",
                                "description": "The question text shown to the user.",
                            },
                            "options": {
                                "type": "array",
                                "items": {"type": "string"},
                                "description": (
                                    "2-3 predefined choices. The UI appends an "
                                    "'Other' free-text input automatically. "
                                    "Omit only when the user must type a free-form answer."
                                ),
                                "minItems": 2,
                                "maxItems": 3,
                            },
                        },
                        "required": ["id", "prompt"],
                    },
                    "minItems": 2,
                    "maxItems": 8,
                    "description": "List of questions to present to the user.",
                },
            },
            "required": ["questions"],
        },
    )

def build_set_output_tool(output_keys: list[str] | None) -> Tool | None:
    """Build the synthetic set_output tool for explicit output declaration."""
    if not output_keys:
        return None
    return Tool(
        name="set_output",
        description=(
            "Set an output value for this node. Call once per output key. "
            "Use this for brief notes, counts, status, and file references — "
            "NOT for large data payloads. When a tool result was saved to a "
            "data file, pass the filename as the value "
            "(e.g. 'google_sheets_get_values_1.txt') so the next phase can "
            "load the full data. Values exceeding ~2000 characters are "
            "auto-saved to data files. "
            f"Valid keys: {output_keys}"
        ),
        parameters={
            "type": "object",
            "properties": {
                "key": {
                    "type": "string",
                    "description": f"Output key. Must be one of: {output_keys}",
                    "enum": output_keys,
                },
                "value": {
                    "type": "string",
                    "description": (
                        "The output value — a brief note, count, status, "
                        "or data filename reference."
                    ),
                },
            },
            "required": ["key", "value"],
        },
    )

def build_escalate_tool() -> Tool:
    """Build the synthetic escalate tool for worker -> queen handoff."""
    return Tool(
        name="escalate",
        description=(
            "Escalate to the queen when you need user input, or are blocked "
            "by errors, missing credentials, or ambiguous constraints that "
            "require supervisor guidance. Include a concise reason and "
            "optional context. The node will pause until the queen injects "
            "guidance."
        ),
        parameters={
            "type": "object",
            "properties": {
                "reason": {
                    "type": "string",
                    "description": (
                        "Short reason for escalation (e.g. 'Tool repeatedly failing')."
                    ),
                },
                "context": {
                    "type": "string",
                    "description": "Optional diagnostic details for the queen.",
                },
            },
            "required": ["reason"],
        },
    )

def build_delegate_tool(sub_agents: list[str], node_registry: dict[str, Any]) -> Tool | None:
    """Build the synthetic delegate_to_sub_agent tool for subagent invocation.

    Args:
        sub_agents: List of node IDs that can be invoked as subagents.
        node_registry: Map of node_id -> NodeSpec for looking up subagent descriptions.

    Returns:
        Tool definition if sub_agents is non-empty, None otherwise.
    """
    if not sub_agents:
        return None

    agent_descriptions = []
    for agent_id in sub_agents:
        spec = node_registry.get(agent_id)
        if spec:
            desc = getattr(spec, "description", "(no description)")
            agent_descriptions.append(f"- {agent_id}: {desc}")
        else:
            agent_descriptions.append(f"- {agent_id}: (not found in registry)")

    return Tool(
        name="delegate_to_sub_agent",
        description=(
            "Delegate a task to a specialized sub-agent. The sub-agent runs "
            "autonomously with read-only access to current memory and returns "
            "its result. Use this to parallelize work or leverage specialized capabilities.\n\n"
            "Available sub-agents:\n" + "\n".join(agent_descriptions)
        ),
        parameters={
            "type": "object",
            "properties": {
                "agent_id": {
                    "type": "string",
                    "description": f"The sub-agent to invoke. Must be one of: {sub_agents}",
                    "enum": sub_agents,
                },
                "task": {
                    "type": "string",
                    "description": (
                        "The task description for the sub-agent to execute. "
                        "Be specific about what you want the sub-agent to do and "
                        "what information to return."
                    ),
                },
            },
            "required": ["agent_id", "task"],
        },
    )

def build_report_to_parent_tool() -> Tool:
    """Build the synthetic report_to_parent tool for sub-agent progress reports.

    Sub-agents call this to send one-way progress updates, partial findings,
    or status reports to the parent node (and external observers via event bus)
    without blocking execution.

    When ``wait_for_response`` is True, the sub-agent blocks until the parent
    relays the user's response — used for escalation (e.g. login pages, CAPTCHAs).

    When ``mark_complete`` is True, the sub-agent terminates immediately after
    sending the report — no need to call set_output for each output key.
    """
    return Tool(
        name="report_to_parent",
        description=(
            "Send a report to the parent agent. By default this is fire-and-forget: "
            "the parent receives the report but does not respond. "
            "Set wait_for_response=true to BLOCK until the user replies — use this "
            "when you need human intervention (e.g. login pages, CAPTCHAs, "
            "authentication walls). The user's response is returned as the tool result. "
            "Set mark_complete=true to finish your task and terminate immediately "
            "after sending the report — use this when your findings are in the "
            "message/data fields and you don't need to call set_output."
        ),
        parameters={
            "type": "object",
            "properties": {
                "message": {
                    "type": "string",
                    "description": "A human-readable status or progress message.",
                },
                "data": {
                    "type": "object",
                    "description": "Optional structured data to include with the report.",
                },
                "wait_for_response": {
                    "type": "boolean",
                    "description": (
                        "If true, block execution until the user responds. "
                        "Use for escalation scenarios requiring human intervention."
                    ),
                    "default": False,
                },
                "mark_complete": {
                    "type": "boolean",
                    "description": (
                        "If true, terminate the sub-agent immediately after sending "
                        "this report. The report message and data are delivered to the "
                        "parent as the final result. No set_output calls are needed."
                    ),
                    "default": False,
                },
            },
            "required": ["message"],
        },
    )

def handle_set_output(
    tool_input: dict[str, Any],
    output_keys: list[str] | None,
) -> ToolResult:
    """Handle set_output tool call. Returns ToolResult (sync)."""
    import logging
    import re

    logger = logging.getLogger(__name__)

    key = tool_input.get("key", "")
    value = tool_input.get("value", "")
    valid_keys = output_keys or []

    # Recover from truncated JSON (max_tokens hit mid-argument).
    # The _raw key is set by litellm when json.loads fails.
    if not key and "_raw" in tool_input:
        raw = tool_input["_raw"]
        key_match = re.search(r'"key"\s*:\s*"(\w+)"', raw)
        if key_match:
            key = key_match.group(1)
        val_match = re.search(r'"value"\s*:\s*"', raw)
        if val_match:
            start = val_match.end()
            value = raw[start:].rstrip()
            for suffix in ('"}\n', '"}', '"'):
                if value.endswith(suffix):
                    value = value[: -len(suffix)]
                    break
        if key:
            logger.warning(
                "Recovered set_output args from truncated JSON: key=%s, value_len=%d",
                key,
                len(value),
            )
            # Re-inject so the caller sees proper key/value
            tool_input["key"] = key
            tool_input["value"] = value

    if key not in valid_keys:
        return ToolResult(
            tool_use_id="",
            content=f"Invalid output key '{key}'. Valid keys: {valid_keys}",
            is_error=True,
        )

    return ToolResult(
        tool_use_id="",
        content=f"Output '{key}' set successfully.",
        is_error=False,
    )
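To make the recovery path concrete, an illustrative call whose arguments were cut off mid-value (the `_raw` string is hypothetical):

```python
result = handle_set_output(
    {"_raw": '{"key": "summary", "value": "First 10 rows analyzed'},
    output_keys=["summary", "row_count"],
)
assert not result.is_error
print(result.content)  # Output 'summary' set successfully.
```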
@@ -0,0 +1,499 @@
"""Tool result handling: truncation, spillover, JSON preview, and execution.

Manages tool result size limits, file spillover for large results, and
smart JSON previews. Also includes transient error classification and
the context-window-exceeded error detector.
"""

from __future__ import annotations

import asyncio
import contextvars
import json
import logging
import re
from pathlib import Path
from typing import Any

from framework.llm.provider import ToolResult, ToolUse
from framework.llm.stream_events import ToolCallEvent

logger = logging.getLogger(__name__)

# Pattern for detecting context-window-exceeded errors across LLM providers.
_CONTEXT_TOO_LARGE_RE = re.compile(
    r"context.{0,20}(length|window|limit|size)|"
    r"too.{0,10}(long|large|many.{0,10}tokens)|"
    r"(exceed|exceeds|exceeded).{0,30}(limit|window|context|tokens)|"
    r"maximum.{0,20}token|prompt.{0,20}too.{0,10}long",
    re.IGNORECASE,
)


def is_context_too_large_error(exc: BaseException) -> bool:
    """Detect whether an exception indicates the LLM input was too large."""
    cls = type(exc).__name__
    if "ContextWindow" in cls:
        return True
    return bool(_CONTEXT_TOO_LARGE_RE.search(str(exc)))

def is_transient_error(exc: BaseException) -> bool:
    """Classify whether an exception is transient (retryable) vs permanent.

    Transient: network errors, rate limits, server errors, timeouts.
    Permanent: auth errors, bad requests, context window exceeded.
    """
    try:
        from litellm.exceptions import (
            APIConnectionError,
            BadGatewayError,
            InternalServerError,
            RateLimitError,
            ServiceUnavailableError,
        )

        transient_types: tuple[type[BaseException], ...] = (
            RateLimitError,
            APIConnectionError,
            InternalServerError,
            BadGatewayError,
            ServiceUnavailableError,
            TimeoutError,
            ConnectionError,
            OSError,
        )
    except ImportError:
        transient_types = (TimeoutError, ConnectionError, OSError)

    if isinstance(exc, transient_types):
        return True

    # RuntimeError from StreamErrorEvent with "Stream error:" prefix
    if isinstance(exc, RuntimeError):
        error_str = str(exc).lower()
        transient_keywords = [
            "rate limit",
            "429",
            "timeout",
            "connection",
            "internal server",
            "502",
            "503",
            "504",
            "service unavailable",
            "bad gateway",
            "overloaded",
            "failed to parse tool call",
        ]
        return any(kw in error_str for kw in transient_keywords)

    return False

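A typical consumer is a retry wrapper along these lines (an illustrative sketch, not the loop's actual retry code; the backoff constants are made up):

```python
import asyncio
import random


async def call_with_retries(fn, *, max_retries: int = 3, base: float = 2.0):
    """Retry transient failures with exponential backoff plus jitter."""
    for attempt in range(max_retries + 1):
        try:
            return await fn()
        except Exception as exc:
            if attempt == max_retries or not is_transient_error(exc):
                raise  # permanent error, or retries exhausted
            delay = min(base**attempt + random.random(), 60.0)
            await asyncio.sleep(delay)
```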
def extract_json_metadata(parsed: Any, *, _depth: int = 0, _max_depth: int = 3) -> str:
    """Return a concise structural summary of parsed JSON.

    Reports key names, value types, and — crucially — array lengths so
    the LLM knows how much data exists beyond the preview.

    Returns an empty string for simple scalars.
    """
    if _depth >= _max_depth:
        if isinstance(parsed, dict):
            return f"dict with {len(parsed)} keys"
        if isinstance(parsed, list):
            return f"list of {len(parsed)} items"
        return type(parsed).__name__

    if isinstance(parsed, dict):
        if not parsed:
            return "empty dict"
        lines: list[str] = []
        indent = " " * (_depth + 1)
        for key, value in list(parsed.items())[:20]:
            if isinstance(value, list):
                line = f'{indent}"{key}": list of {len(value)} items'
                if value:
                    first = value[0]
                    if isinstance(first, dict):
                        sample_keys = list(first.keys())[:10]
                        line += f" (each item: dict with keys {sample_keys})"
                    elif isinstance(first, list):
                        line += f" (each item: list of {len(first)} elements)"
                lines.append(line)
            elif isinstance(value, dict):
                child = extract_json_metadata(value, _depth=_depth + 1, _max_depth=_max_depth)
                lines.append(f'{indent}"{key}": {child}')
            else:
                lines.append(f'{indent}"{key}": {type(value).__name__}')
        if len(parsed) > 20:
            lines.append(f"{indent}... and {len(parsed) - 20} more keys")
        return "\n".join(lines)

    if isinstance(parsed, list):
        if not parsed:
            return "empty list"
        desc = f"list of {len(parsed)} items"
        first = parsed[0]
        if isinstance(first, dict):
            sample_keys = list(first.keys())[:10]
            desc += f" (each item: dict with keys {sample_keys})"
        elif isinstance(first, list):
            desc += f" (each item: list of {len(first)} elements)"
        return desc

    return ""

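For instance (hypothetical API payload):

```python
payload = {"items": [{"id": 1, "name": "a"}] * 120, "next_cursor": "abc"}
print(extract_json_metadata(payload))
#  "items": list of 120 items (each item: dict with keys ['id', 'name'])
#  "next_cursor": str
```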
def build_json_preview(parsed: Any, *, max_chars: int = 5000) -> str | None:
    """Build a smart preview of parsed JSON, truncating large arrays.

    Shows first 3 + last 1 items of large arrays with explicit count
    markers so the LLM cannot mistake the preview for the full dataset.

    Returns ``None`` if no truncation was needed (no large arrays).
    """
    _LARGE_ARRAY_THRESHOLD = 10

    def _truncate_arrays(obj: Any) -> tuple[Any, bool]:
        """Return (truncated_copy, was_truncated)."""
        if isinstance(obj, list) and len(obj) > _LARGE_ARRAY_THRESHOLD:
            n = len(obj)
            head = obj[:3]
            tail = obj[-1:]
            marker = f"... ({n - 4} more items omitted, {n} total) ..."
            return head + [marker] + tail, True
        if isinstance(obj, dict):
            changed = False
            out: dict[str, Any] = {}
            for k, v in obj.items():
                new_v, did = _truncate_arrays(v)
                out[k] = new_v
                changed = changed or did
            return (out, True) if changed else (obj, False)
        return obj, False

    preview_obj, was_truncated = _truncate_arrays(parsed)
    if not was_truncated:
        return None  # No large arrays — caller should use raw slicing

    try:
        result = json.dumps(preview_obj, indent=2, ensure_ascii=False)
    except (TypeError, ValueError):
        return None

    if len(result) > max_chars:
        # Even 3+1 items too big — try just 1 item
        def _minimal_arrays(obj: Any) -> Any:
            if isinstance(obj, list) and len(obj) > _LARGE_ARRAY_THRESHOLD:
                n = len(obj)
                return obj[:1] + [f"... ({n - 1} more items omitted, {n} total) ..."]
            if isinstance(obj, dict):
                return {k: _minimal_arrays(v) for k, v in obj.items()}
            return obj

        preview_obj = _minimal_arrays(parsed)
        try:
            result = json.dumps(preview_obj, indent=2, ensure_ascii=False)
        except (TypeError, ValueError):
            return None
        if len(result) > max_chars:
            result = result[:max_chars] + "…"

    return result

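Concretely:

```python
data = {"rows": list(range(50))}
preview = build_json_preview(data)
# preview keeps rows 0, 1, 2 and 49, with the marker
# "... (46 more items omitted, 50 total) ..." in between.
assert preview is not None and "46 more items omitted" in preview
assert build_json_preview({"rows": [1, 2, 3]}) is None  # no large arrays, caller slices raw text
```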
def truncate_tool_result(
    result: ToolResult,
    tool_name: str,
    *,
    max_tool_result_chars: int,
    spillover_dir: str | None,
    next_spill_filename_fn: Any,  # Callable[[str], str]
) -> ToolResult:
    """Persist tool result to file and optionally truncate for context.

    When *spillover_dir* is configured, EVERY non-error tool result is
    saved to a file (short filename like ``web_search_1.txt``). A
    ``[Saved to '...']`` annotation is appended so the reference
    survives pruning and compaction.

    - Small results (≤ limit): full content kept + file annotation
    - Large results (> limit): preview + file reference
    - Errors: pass through unchanged
    - read_file/load_data results: truncate with pagination hint (no re-spill)
    """
    limit = max_tool_result_chars

    # Errors always pass through unchanged
    if result.is_error:
        return result

    # read_file/load_data reads FROM spilled files — never re-spill (circular).
    # Just truncate with a pagination hint if the result is too large.
    if tool_name in ("load_data", "read_file"):
        if limit <= 0 or len(result.content) <= limit:
            return result  # Small result — pass through as-is
        # Large result — truncate with smart preview
        PREVIEW_CAP = min(5000, max(limit - 500, limit // 2))

        metadata_str = ""
        smart_preview: str | None = None
        try:
            parsed_ld = json.loads(result.content)
            metadata_str = extract_json_metadata(parsed_ld)
            smart_preview = build_json_preview(parsed_ld, max_chars=PREVIEW_CAP)
        except (json.JSONDecodeError, TypeError, ValueError):
            pass

        if smart_preview is not None:
            preview_block = smart_preview
        else:
            preview_block = result.content[:PREVIEW_CAP] + "…"

        header = (
            f"[{tool_name} result: {len(result.content):,} chars — "
            f"too large for context. Use offset_bytes/limit_bytes "
            f"parameters to read smaller chunks.]"
        )
        if metadata_str:
            header += f"\n\nData structure:\n{metadata_str}"
        header += (
            "\n\nWARNING: This is an INCOMPLETE preview. Do NOT draw conclusions or counts from it."
        )

        truncated = f"{header}\n\nPreview (small sample only):\n{preview_block}"
        logger.info(
            "%s result truncated: %d → %d chars (use offset/limit to paginate)",
            tool_name,
            len(result.content),
            len(truncated),
        )
        return ToolResult(
            tool_use_id=result.tool_use_id,
            content=truncated,
            is_error=False,
            image_content=result.image_content,
            is_skill_content=result.is_skill_content,
        )

    spill_dir = spillover_dir
    if spill_dir:
        spill_path = Path(spill_dir)
        spill_path.mkdir(parents=True, exist_ok=True)
        filename = next_spill_filename_fn(tool_name)

        # Pretty-print JSON content so read_file's line-based
        # pagination works correctly.
        write_content = result.content
        parsed_json: Any = None  # track for metadata extraction
        try:
            parsed_json = json.loads(result.content)
            write_content = json.dumps(parsed_json, indent=2, ensure_ascii=False)
        except (json.JSONDecodeError, TypeError, ValueError):
            pass  # Not JSON — write as-is

        file_path = spill_path / filename
        file_path.write_text(write_content, encoding="utf-8")
        # Use absolute path so parent agents can find files from subagents
        abs_path = str(file_path.resolve())

        if limit > 0 and len(result.content) > limit:
            # Large result: build a small, metadata-rich preview so the
            # LLM cannot mistake it for the complete dataset.
            PREVIEW_CAP = 5000

            # Extract structural metadata (array lengths, key names)
            metadata_str = ""
            smart_preview: str | None = None
            if parsed_json is not None:
                metadata_str = extract_json_metadata(parsed_json)
                smart_preview = build_json_preview(parsed_json, max_chars=PREVIEW_CAP)

            if smart_preview is not None:
                preview_block = smart_preview
            else:
                preview_block = result.content[:PREVIEW_CAP] + "…"

            # Assemble header with structural info + warning
            header = (
                f"[Result from {tool_name}: {len(result.content):,} chars — "
                f"too large for context, saved to '{abs_path}'.]\n"
            )
            if metadata_str:
                header += f"\nData structure:\n{metadata_str}"
            header += (
                f"\n\nWARNING: The preview below is INCOMPLETE. "
                f"Do NOT draw conclusions or counts from it. "
                f"Use read_file(path='{abs_path}') to read the "
                f"full data before analysis."
            )

            content = f"{header}\n\nPreview (small sample only):\n{preview_block}"
            logger.info(
                "Tool result spilled to file: %s (%d chars → %s)",
                tool_name,
                len(result.content),
                abs_path,
            )
        else:
            # Small result: keep full content + annotation with absolute path
            content = f"{result.content}\n\n[Saved to '{abs_path}']"
            logger.info(
                "Tool result saved to file: %s (%d chars → %s)",
                tool_name,
                len(result.content),
                filename,
            )

        return ToolResult(
            tool_use_id=result.tool_use_id,
            content=content,
            is_error=False,
            image_content=result.image_content,
            is_skill_content=result.is_skill_content,
        )

    # No spillover_dir — truncate in-place if needed
    if limit > 0 and len(result.content) > limit:
        PREVIEW_CAP = min(5000, max(limit - 500, limit // 2))

        metadata_str = ""
        smart_preview: str | None = None
        try:
            parsed_inline = json.loads(result.content)
            metadata_str = extract_json_metadata(parsed_inline)
            smart_preview = build_json_preview(parsed_inline, max_chars=PREVIEW_CAP)
        except (json.JSONDecodeError, TypeError, ValueError):
            pass

        if smart_preview is not None:
            preview_block = smart_preview
        else:
            preview_block = result.content[:PREVIEW_CAP] + "…"

        header = (
            f"[Result from {tool_name}: {len(result.content):,} chars — "
            f"truncated to fit context budget.]"
        )
        if metadata_str:
            header += f"\n\nData structure:\n{metadata_str}"
        header += (
            "\n\nWARNING: This is an INCOMPLETE preview. "
            "Do NOT draw conclusions or counts from the preview alone."
        )

        truncated = f"{header}\n\n{preview_block}"
        logger.info(
            "Tool result truncated in-place: %s (%d → %d chars)",
            tool_name,
            len(result.content),
            len(truncated),
        )
        return ToolResult(
            tool_use_id=result.tool_use_id,
            content=truncated,
            is_error=False,
            image_content=result.image_content,
            is_skill_content=result.is_skill_content,
        )

    return result

async def execute_tool(
    tool_executor: Any,  # Callable[[ToolUse], ToolResult | Awaitable[ToolResult]] | None
    tc: ToolCallEvent,
    timeout: float,
    skill_dirs: list[str] | None = None,
) -> ToolResult:
    """Execute a tool call, handling both sync and async executors.

    Applies ``tool_call_timeout_seconds`` to prevent hung MCP servers
    from blocking the event loop indefinitely. The initial executor
    call is offloaded to a thread pool so that sync executors don't
    freeze the event loop.
    """
    if tool_executor is None:
        return ToolResult(
            tool_use_id=tc.tool_use_id,
            content=f"No tool executor configured for '{tc.tool_name}'",
            is_error=True,
        )

    skill_dirs = skill_dirs or []
    skill_read_tools = {"view_file", "load_data", "read_file"}
    if tc.tool_name in skill_read_tools and skill_dirs:
        raw_path = tc.tool_input.get("path", "")
        if raw_path:
            resolved = Path(raw_path).resolve(strict=False)
            resolved_roots = [Path(skill_dir).resolve(strict=False) for skill_dir in skill_dirs]
            if any(resolved.is_relative_to(root) for root in resolved_roots):
                try:
                    content = resolved.read_text(encoding="utf-8")
                except Exception as exc:
                    return ToolResult(
                        tool_use_id=tc.tool_use_id,
                        content=f"Could not read skill resource '{raw_path}': {exc}",
                        is_error=True,
                    )
                return ToolResult(
                    tool_use_id=tc.tool_use_id,
                    content=content,
                    is_skill_content=resolved.name == "SKILL.md",
                )

    tool_use = ToolUse(id=tc.tool_use_id, name=tc.tool_name, input=tc.tool_input)

    async def _run() -> ToolResult:
        # Offload the executor call to a thread. Sync MCP executors
        # block on future.result() — running in a thread keeps the
        # event loop free so asyncio.wait_for can fire the timeout.
        # Copy the current context so contextvars (e.g. data_dir from
        # execution context) propagate into the worker thread.
        loop = asyncio.get_running_loop()
        ctx = contextvars.copy_context()
        result = await loop.run_in_executor(None, ctx.run, tool_executor, tool_use)
        # Async executors return a coroutine — await it on the loop
        if asyncio.iscoroutine(result) or asyncio.isfuture(result):
            result = await result
        return result

    try:
        if timeout > 0:
            result = await asyncio.wait_for(_run(), timeout=timeout)
        else:
            result = await _run()
    except TimeoutError:
        logger.warning("Tool '%s' timed out after %.0fs", tc.tool_name, timeout)
        return ToolResult(
            tool_use_id=tc.tool_use_id,
            content=(
                f"Tool '{tc.tool_name}' timed out after {timeout:.0f}s. "
                "The operation took too long and was cancelled. "
                "Try a simpler request or a different approach."
            ),
            is_error=True,
        )
    return result

def restore_spill_counter(spillover_dir: str | None) -> int:
    """Scan spillover_dir for existing spill files and return the max counter.

    Returns the highest spill number found (or 0 if none).
    """
    if not spillover_dir:
        return 0
    spill_path = Path(spillover_dir)
    if not spill_path.is_dir():
        return 0
    max_n = 0
    for f in spill_path.iterdir():
        if not f.is_file():
            continue
        m = re.search(r"_(\d+)\.txt$", f.name)
        if m:
            max_n = max(max_n, int(m.group(1)))
    return max_n
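So a resumed run keeps numbering where it left off (hypothetical directory contents):

```python
# spillover/ contains web_search_1.txt, web_search_2.txt, read_file_5.txt
assert restore_spill_counter("spillover") == 5
assert restore_spill_counter(None) == 0  # spillover disabled
```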
@@ -0,0 +1,205 @@
"""Shared types and state containers for the event loop package."""

from __future__ import annotations

import json
import logging
import time
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Literal, Protocol, runtime_checkable

from framework.graph.conversation import ConversationStore

logger = logging.getLogger(__name__)

@dataclass
class TriggerEvent:
    """A framework-level trigger signal (timer tick or webhook hit)."""

    trigger_type: str
    source_id: str
    payload: dict[str, Any] = field(default_factory=dict)
    timestamp: float = field(default_factory=time.time)

@dataclass
class JudgeVerdict:
    """Result of judge evaluation for the event loop."""

    action: Literal["ACCEPT", "RETRY", "ESCALATE"]
    # None = no evaluation happened (skip_judge, tool-continue); not logged.
    # "" = evaluated but no feedback; logged with default text.
    # "..." = evaluated with feedback; logged as-is.
    feedback: str | None = None

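Concretely, the three feedback states look like this:

```python
JudgeVerdict(action="RETRY")                                # no evaluation ran; nothing logged
JudgeVerdict(action="RETRY", feedback="")                   # evaluated; logged with default text
JudgeVerdict(action="RETRY", feedback="No sources cited.")  # evaluated; logged verbatim
```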
@runtime_checkable
class JudgeProtocol(Protocol):
    """Protocol for event-loop judges."""

    async def evaluate(self, context: dict[str, Any]) -> JudgeVerdict: ...

@dataclass
class LoopConfig:
    """Configuration for the event loop."""

    max_iterations: int = 50
    max_tool_calls_per_turn: int = 30
    judge_every_n_turns: int = 1
    stall_detection_threshold: int = 3
    stall_similarity_threshold: float = 0.85
    max_context_tokens: int = 32_000
    store_prefix: str = ""

    # Overflow margin for max_tool_calls_per_turn. Tool calls are only
    # discarded when the count exceeds max_tool_calls_per_turn * (1 + margin).
    tool_call_overflow_margin: float = 0.5

    # Tool result context management.
    max_tool_result_chars: int = 30_000
    spillover_dir: str | None = None

    # set_output value spilling.
    max_output_value_chars: int = 2_000

    # Stream retry.
    max_stream_retries: int = 3
    stream_retry_backoff_base: float = 2.0
    stream_retry_max_delay: float = 60.0

    # Tool doom loop detection.
    tool_doom_loop_threshold: int = 3

    # Client-facing auto-block grace period.
    cf_grace_turns: int = 1
    # Worker auto-escalation: text-only turns before escalating to queen.
    worker_escalation_grace_turns: int = 1
    tool_doom_loop_enabled: bool = True

    # Per-tool-call timeout.
    tool_call_timeout_seconds: float = 60.0

    # Subagent delegation timeout (wall-clock max).
    subagent_timeout_seconds: float = 3600.0

    # Subagent inactivity timeout: only time out if there has been no activity
    # for this duration. The clock resets whenever the subagent makes progress
    # (tool calls, LLM responses). Set to 0 to use only the wall-clock timeout.
    subagent_inactivity_timeout_seconds: float = 300.0

    # Lifecycle hooks.
    hooks: dict[str, list] | None = None

    def __post_init__(self) -> None:
        if self.hooks is None:
            object.__setattr__(self, "hooks", {})

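A minimal sketch of overriding the defaults for a short-lived worker (values and path are illustrative):

```python
cfg = LoopConfig(
    max_iterations=10,
    max_tool_result_chars=10_000,
    spillover_dir="/tmp/run-42/spill",  # hypothetical path
)
assert cfg.hooks == {}  # __post_init__ normalizes None to an empty dict
```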
@dataclass
class HookContext:
    """Context passed to every lifecycle hook."""

    event: str
    trigger: str | None
    system_prompt: str


@dataclass
class HookResult:
    """What a hook may return to modify node state."""

    system_prompt: str | None = None
    inject: str | None = None

@dataclass
class OutputAccumulator:
    """Accumulates output key-value pairs with optional write-through persistence."""

    values: dict[str, Any] = field(default_factory=dict)
    store: ConversationStore | None = None
    spillover_dir: str | None = None
    max_value_chars: int = 0
    run_id: str | None = None

    async def set(self, key: str, value: Any) -> None:
        """Set a key-value pair, auto-spilling large values to files."""
        value = self._auto_spill(key, value)
        self.values[key] = value
        if self.store:
            cursor = await self.store.read_cursor() or {}
            outputs = cursor.get("outputs", {})
            outputs[key] = value
            cursor["outputs"] = outputs
            await self.store.write_cursor(cursor)

    def _auto_spill(self, key: str, value: Any) -> Any:
        """Save large values to a file and return a reference string."""
        if self.max_value_chars <= 0 or not self.spillover_dir:
            return value

        val_str = json.dumps(value, ensure_ascii=False) if not isinstance(value, str) else value
        if len(val_str) <= self.max_value_chars:
            return value

        spill_path = Path(self.spillover_dir)
        spill_path.mkdir(parents=True, exist_ok=True)
        ext = ".json" if isinstance(value, (dict, list)) else ".txt"
        filename = f"output_{key}{ext}"
        write_content = (
            json.dumps(value, indent=2, ensure_ascii=False)
            if isinstance(value, (dict, list))
            else str(value)
        )
        file_path = spill_path / filename
        file_path.write_text(write_content, encoding="utf-8")
        file_size = file_path.stat().st_size
        logger.info(
            "set_output value auto-spilled: key=%s, %d chars -> %s (%d bytes)",
            key,
            len(val_str),
            filename,
            file_size,
        )
        # Use absolute path so parent agents can find files from subagents
        abs_path = str(file_path.resolve())
        return (
            f"[Saved to '{abs_path}' ({file_size:,} bytes). "
            f"Use read_file(path='{abs_path}') "
            f"to access full data.]"
        )

    def get(self, key: str) -> Any | None:
        return self.values.get(key)

    def to_dict(self) -> dict[str, Any]:
        return dict(self.values)

    def has_all_keys(self, required: list[str]) -> bool:
        return all(key in self.values and self.values[key] is not None for key in required)

    @classmethod
    async def restore(
        cls,
        store: ConversationStore,
        run_id: str | None = None,
    ) -> OutputAccumulator:
        cursor = await store.read_cursor()
        values = cursor.get("outputs", {}) if cursor else {}
        return cls(values=values, store=store, run_id=run_id)

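Typical write-through use, inside an async context (a sketch; the path and oversized value are invented):

```python
acc = OutputAccumulator(spillover_dir="/tmp/run-42/outputs", max_value_chars=2_000)
await acc.set("row_count", "1234")        # small value, stored inline
big_blob = {"rows": list(range(10_000))}  # hypothetical oversized value
await acc.set("raw_rows", big_blob)       # spilled to output_raw_rows.json
acc.get("raw_rows")                       # -> "[Saved to '...' ...]" file reference
```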
__all__ = [
    "HookContext",
    "HookResult",
    "JudgeProtocol",
    "JudgeVerdict",
    "LoopConfig",
    "OutputAccumulator",
    "TriggerEvent",
]
+1148 -1977 File diff suppressed because it is too large
+851 -1264 File diff suppressed because it is too large
+132 -13
@@ -37,24 +37,45 @@ Follow these rules for reliable, efficient browser interaction.
## Reading Pages
- ALWAYS prefer `browser_snapshot` over `browser_get_text("body")`
  — it returns a compact ~1-5 KB accessibility tree vs 100+ KB of raw HTML.
- Use `browser_snapshot_aria` when you need full ARIA properties
  for detailed element inspection.
- Do NOT use `browser_screenshot` for reading text content
  — it produces huge base64 images with no searchable text.
- Interaction tools (`browser_click`, `browser_type`, `browser_fill`,
  `browser_scroll`, etc.) return a page snapshot automatically in their
  result. Use it to decide your next action — do NOT call
  `browser_snapshot` separately after every action.
  Only call `browser_snapshot` when you need a fresh view without
  performing an action, or after setting `auto_snapshot=false`.
- Do NOT use `browser_screenshot` to read text — use
  `browser_snapshot` for that (compact, searchable, fast).
- DO use `browser_screenshot` when you need visual context:
  charts, images, canvas elements, layout verification, or when
  the snapshot doesn't capture what you need.
- Only fall back to `browser_get_text` for extracting specific
  small elements by CSS selector.

## Navigation & Waiting
- Always call `browser_wait` after navigation actions
  (`browser_open`, `browser_navigate`, `browser_click` on links)
  to let the page load.
- `browser_navigate` and `browser_open` already wait for the page to
  load (`domcontentloaded`). Do NOT call `browser_wait` with no
  arguments after navigation — it wastes time.
  Only use `browser_wait` when you need a *specific element* or *text*
  to appear (pass `selector` or `text`).
- NEVER re-navigate to the same URL after scrolling
  — this resets your scroll position and loses loaded content.

## Scrolling
- Use large scroll amounts (~2000) when loading more content
  — sites like Twitter and LinkedIn lazy-load more content as you page through.
- After scrolling, take a new `browser_snapshot` to see updated content.
- The scroll result includes a snapshot automatically — no need to call
  `browser_snapshot` separately.

## Batching Actions
- You can call multiple tools in a single turn — they execute in parallel.
  ALWAYS batch independent actions together. Examples:
  - Fill multiple form fields in one turn.
  - Navigate + snapshot in one turn.
  - Click + scroll if targeting different elements.
- When batching, set `auto_snapshot=false` on all but the last action
  to avoid redundant snapshots.
- Aim for 3-5 tool calls per turn minimum. One tool call per turn is
  wasteful. See the sketch below.
## Error Recovery
|
||||
- If a tool fails, retry once with the same approach.
|
||||
@@ -65,11 +86,109 @@ Follow these rules for reliable, efficient browser interaction.
then `browser_start`, then retry.

## Tab Management
- Use `browser_tabs` to list open tabs when managing multiple pages.
- Pass `target_id` to tools when operating on a specific tab.
- Open background tabs with `browser_open(url=..., background=true)` to avoid losing your current context.
- Close tabs you no longer need with `browser_close` to free resources.

**Close tabs as soon as you are done with them** — not only at the end of the task. After reading or extracting data from a tab, close it immediately.

**Decision rules:**
- Finished reading/extracting from a tab? → `browser_close(target_id=...)`
- Completed a multi-tab workflow? → `browser_close_finished()` to clean up all your tabs
- More than 3 tabs open? → stop and close finished ones before opening more
- Popup appeared that you didn't need? → close it immediately

**Origin awareness:** `browser_tabs` returns an `origin` field for each tab:
- `"agent"` — you opened it; you own it; close it when done
- `"popup"` — opened by a link or script; close after extracting what you need
- `"startup"` or `"user"` — leave these alone unless the task requires it

**Cleanup tools:**
- `browser_close(target_id=...)` — close one specific tab
- `browser_close_finished()` — close all your agent/popup tabs (safe: leaves startup/user tabs)
- `browser_close_all()` — close everything except the active tab (use only for full reset)

**Multi-tab workflow pattern:**
1. Open background tabs with `browser_open(url=..., background=true)` to stay on current tab
2. Process each tab and close it with `browser_close` when done
3. When the full workflow completes, call `browser_close_finished()` to confirm cleanup
4. Check `browser_tabs` at any point — it shows `origin` and `age_seconds` per tab

Never accumulate tabs. Treat every tab you open as a resource you must free.

## Shadow DOM & Overlays

Some sites (LinkedIn messaging, etc.) render content inside closed shadow roots that are invisible to regular DOM queries and `browser_snapshot` coordinates.

**Detecting shadow DOM**: `document.elementFromPoint(x, y)` returns a zero-height host element (e.g. `#interop-outlet`) for the entire overlay area — this is normal, not a bug. `document.body.innerText` and `document.querySelectorAll` return nothing for shadow content. `browser_snapshot` CAN read shadow DOM text but cannot return coordinates.

**Querying into shadow DOM:**
```
browser_shadow_query("#interop-outlet >>> #msg-overlay >>> p")
```
Uses `>>>` to pierce shadow roots. Returns `rect` in CSS pixels and `physicalRect` ready for `browser_click_coordinate` / `browser_hover_coordinate`.

**Getting physical rect for any element (including shadow DOM):**
```
browser_get_rect(selector="#interop-outlet >>> .msg-convo-wrapper", pierce_shadow=true)
```

**Manual JS traversal when selector is dynamic:**
```js
const shadow = document.getElementById('interop-outlet').shadowRoot;
const convo = shadow.querySelector('#ember37');
const rect = convo.querySelector('p').getBoundingClientRect();
// rect is in CSS pixels — multiply by DPR for physical pixels
```
Pass this as a multi-statement script to `browser_evaluate`; it wraps automatically in an IIFE. Use `JSON.stringify(rect)` to serialize the result.

## Coordinate System

There are THREE coordinate spaces. Using the wrong one causes clicks/hovers to land in the wrong place.

| Space | Used by | How to get |
|---|---|---|
| Physical pixels | `browser_click_coordinate` | `browser_coords` `physical_x/y` |
| CSS pixels | `getBoundingClientRect()`, `elementFromPoint` | `browser_coords` `css_x/y` |
| Screenshot pixels | What you see in the 800px image | Raw position in screenshot |

**Converting screenshot → physical**: `browser_coords(x, y)` → use `physical_x/y`.
**Converting CSS → physical**: multiply by `window.devicePixelRatio` (typically 1.6 on HiDPI).
**Never** pass raw `getBoundingClientRect()` values to `browser_hover_coordinate` without multiplying by DPR first.
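To make the CSS-to-physical conversion concrete, a tiny hedged sketch (the `dpr` value is an assumption; in practice it would come from evaluating `window.devicePixelRatio` in the page):

```python
# Hypothetical helper: scale a CSS-pixel rect (from getBoundingClientRect())
# into physical pixels before passing coordinates to click/hover tools.
def css_rect_to_physical(rect: dict, dpr: float) -> dict:
    return {k: v * dpr for k, v in rect.items()}

rect = {"x": 120.0, "y": 88.5, "width": 240.0, "height": 32.0}
print(css_rect_to_physical(rect, dpr=1.6))
# {'x': 192.0, 'y': 141.6, 'width': 384.0, 'height': 51.2}
```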

## Screenshots

Screenshot data is base64-encoded PNG. To view it:
```
run_command("echo '<base64_data>' | base64 -d > /tmp/screenshot.png")
```
Then use `read_file("/tmp/screenshot.png")` to view the image.

Always use `full_page=false` (default) unless you specifically need the full scrolled page.

## JavaScript Evaluation

`browser_evaluate` wraps your script in an IIFE automatically:
- Single expression (`document.title`) → wrapped with `return`
- Multi-statement or contains `;`/`\n` → wrapped without return (add an explicit `return` yourself)
- Already an IIFE → run as-is

**Avoid**: complex closures with `return` inside `for` loops — Chrome CDP returns `null`.
**Use instead**: `Array.from(...).map(...).join(...)` chains, or build result objects and `JSON.stringify()` them.

**For shadow DOM traversal with dynamic selectors**, write the full JS path:
```js
const s = document.getElementById('interop-outlet').shadowRoot;
const el = s.querySelector('.msg-convo-wrapper');
return JSON.stringify(el.getBoundingClientRect());
```

## Login & Auth Walls
- If you see a "Log in" or "Sign up" prompt instead of expected

@@ -167,14 +167,6 @@ class Goal(BaseModel):

        return met_weight >= total_weight * 0.9  # 90% threshold

    def check_constraint(self, constraint_id: str, value: Any) -> bool:
        """Check if a specific constraint is satisfied."""
        for c in self.constraints:
            if c.id == constraint_id:
                # This would be expanded with actual evaluation logic
                return True
        return True

    def to_prompt_context(self) -> str:
        """Generate context string for LLM prompts.
@@ -2,7 +2,7 @@
Node Protocol - The building block of agent graphs.

A Node is a unit of work that:
1. Receives context (goal, shared buffer, input)
2. Makes decisions (using LLM, tools, or logic)
3. Produces results (output, state changes)
4. Records everything to the Runtime
@@ -30,62 +30,6 @@ from framework.runtime.core import Runtime
logger = logging.getLogger(__name__)


def _fix_unescaped_newlines_in_json(json_str: str) -> str:
    """Fix unescaped newlines inside JSON string values.

    LLMs sometimes output actual newlines inside JSON strings instead of \\n.
    This function fixes that by properly escaping newlines within string values.
    """
    result = []
    in_string = False
    escape_next = False
    i = 0

    while i < len(json_str):
        char = json_str[i]

        if escape_next:
            result.append(char)
            escape_next = False
            i += 1
            continue

        if char == "\\" and in_string:
            escape_next = True
            result.append(char)
            i += 1
            continue

        if char == '"' and not escape_next:
            in_string = not in_string
            result.append(char)
            i += 1
            continue

        # Fix unescaped newlines inside strings
        if in_string and char == "\n":
            result.append("\\n")
            i += 1
            continue

        # Fix unescaped carriage returns inside strings
        if in_string and char == "\r":
            result.append("\\r")
            i += 1
            continue

        # Fix unescaped tabs inside strings
        if in_string and char == "\t":
            result.append("\\t")
            i += 1
            continue

        result.append(char)
        i += 1

    return "".join(result)


def find_json_object(text: str) -> str | None:
    """Find the first valid JSON object in text using balanced brace matching.
@@ -171,10 +115,10 @@ class NodeSpec(BaseModel):

    # Data flow
    input_keys: list[str] = Field(
        default_factory=list, description="Keys this node reads from the shared buffer or input"
    )
    output_keys: list[str] = Field(
        default_factory=list, description="Keys this node writes to the shared buffer or output"
    )
    nullable_output_keys: list[str] = Field(
        default_factory=list,
@@ -249,7 +193,10 @@ class NodeSpec(BaseModel):
    # Client-facing behavior
    client_facing: bool = Field(
        default=False,
        description=(
            "Deprecated compatibility field. The queen is intrinsically interactive; "
            "non-queen nodes should escalate to the queen instead of talking to users directly."
        ),
    )

    # Phase completion criteria for conversation-aware judge (Level 2)
@@ -274,20 +221,46 @@ class NodeSpec(BaseModel):

    model_config = {"extra": "allow", "arbitrary_types_allowed": True}

    def is_queen_node(self) -> bool:
        """Return True when this spec is the queen conversational node."""
        return self.id == "queen"

    def supports_direct_user_io(self) -> bool:
        """Return True when this node may talk to the user directly."""
        return self.is_queen_node()


def deprecated_client_facing_warning(node_spec: NodeSpec) -> str | None:
    """Return a deprecation warning for legacy non-queen client_facing nodes."""
    if node_spec.client_facing and not node_spec.is_queen_node():
        return (
            f"Node '{node_spec.id}' sets deprecated client_facing=True. "
            "Non-queen direct human I/O is no longer supported; route worker "
            "questions and approvals through queen escalation instead."
        )
    return None


def warn_if_deprecated_client_facing(node_spec: NodeSpec) -> None:
    """Log a compatibility warning once the node is loaded for execution."""
    warning = deprecated_client_facing_warning(node_spec)
    if warning:
        logger.warning(warning)


class DataBufferWriteError(Exception):
    """Raised when an invalid value is written to the data buffer."""

    pass


@dataclass
class DataBuffer:
    """
    Shared data buffer between nodes in a graph execution.

    Nodes read and write to the data buffer using typed keys.
    The buffer is scoped to a single run.

    For parallel execution, use write_async() which provides per-key locking
    to prevent race conditions when multiple nodes write concurrently.
@@ -306,23 +279,23 @@ class SharedMemory:
        self._lock = asyncio.Lock()

    def read(self, key: str) -> Any:
        """Read a value from the data buffer."""
        if self._allowed_read and key not in self._allowed_read:
            raise PermissionError(f"Node not allowed to read key: {key}")
        return self._data.get(key)

    def write(self, key: str, value: Any, validate: bool = True) -> None:
        """
        Write a value to the data buffer.

        Args:
            key: The buffer key to write to
            value: The value to write
            validate: If True, check for suspicious content (default True)

        Raises:
            PermissionError: If node doesn't have write permission
            DataBufferWriteError: If value appears to be hallucinated content
        """
        if self._allowed_write and key not in self._allowed_write:
            raise PermissionError(f"Node not allowed to write key: {key}")
@@ -336,7 +309,7 @@ class SharedMemory:
                f"⚠ Suspicious write to key '{key}': appears to be code "
                f"({len(value)} chars). Consider using validate=False if intended."
            )
            raise DataBufferWriteError(
                f"Rejected suspicious content for key '{key}': "
                f"appears to be hallucinated code ({len(value)} chars). "
                "If this is intentional, use validate=False."
@@ -352,13 +325,13 @@ class SharedMemory:
        parallel execution. Each key has its own lock to minimize contention.

        Args:
            key: The buffer key to write to
            value: The value to write
            validate: If True, check for suspicious content (default True)

        Raises:
            PermissionError: If node doesn't have write permission
            DataBufferWriteError: If value appears to be hallucinated content
        """
        # Check permissions first (no lock needed)
        if self._allowed_write and key not in self._allowed_write:
@@ -379,7 +352,7 @@ class SharedMemory:
                f"⚠ Suspicious write to key '{key}': appears to be code "
                f"({len(value)} chars). Consider using validate=False if intended."
            )
            raise DataBufferWriteError(
                f"Rejected suspicious content for key '{key}': "
                f"appears to be hallucinated code ({len(value)} chars). "
                "If this is intentional, use validate=False."
@@ -457,13 +430,13 @@ class SharedMemory:
        self,
        read_keys: list[str],
        write_keys: list[str],
    ) -> "DataBuffer":
        """Create a view with restricted permissions for a specific node.

        The scoped view shares the same underlying data and locks,
        enabling thread-safe parallel execution across scoped views.
        """
        return DataBuffer(
            _data=self._data,
            _allowed_read=set(read_keys) if read_keys else set(),
            _allowed_write=set(write_keys) if write_keys else set(),
@@ -479,7 +452,7 @@ class NodeContext:

    This is passed to every node and provides:
    - Access to the runtime (for decision logging)
    - Access to the data buffer (for state)
    - Access to LLM (for generation)
    - Access to tools (for actions)
    - The goal context (for guidance)
@@ -493,7 +466,7 @@ class NodeContext:
    node_spec: NodeSpec

    # State
    buffer: DataBuffer
    input_data: dict[str, Any] = field(default_factory=dict)

    # LLM access (if applicable)
@@ -529,12 +502,25 @@ class NodeContext:
    # rebuilding the full system prompt when restoring from conversation store.
    identity_prompt: str = ""
    narrative: str = ""
    # Static memory block injected into the system prompt.
    memory_prompt: str = ""

    # Event-triggered execution (no interactive user attached)
    event_triggered: bool = False

    # Execution ID (from StreamRuntimeAdapter)
    execution_id: str = ""
    run_id: str = ""

    @property
    def effective_run_id(self) -> str | None:
        """Normalized run_id: returns run_id if truthy, otherwise None.

        The field defaults to ``""``; callers should use this property
        instead of ``self.run_id or None`` to avoid silently falling
        back to session-scoped storage.
        """
        return self.run_id or None

    # Stream identity — the ExecutionStream this node runs within.
    # Falls back to node_id when not set (legacy / standalone executor).
@@ -564,6 +550,38 @@ class NodeContext:
    # the queen to switch between phase-specific prompts (building /
    # staging / running) without restarting the conversation.
    dynamic_prompt_provider: Any = None  # Callable[[], str] | None
    # Dynamic memory provider — when set, EventLoopNode rebuilds the
    # system prompt with the latest memory block each iteration.
    dynamic_memory_provider: Any = None  # Callable[[], str] | None

    # Skill system prompts — injected by the skill discovery pipeline
    skills_catalog_prompt: str = ""  # Available skills XML catalog
    protocols_prompt: str = ""  # Default skill operational protocols
    skill_dirs: list[str] = field(default_factory=list)  # Skill base dirs for resource access
    # DS-12: batch auto-detection nudge appended to system prompt when input looks like a batch
    default_skill_batch_nudge: str | None = None
    # DS-13: token usage ratio at which to inject a context preservation warning
    default_skill_warn_ratio: float | None = None

    # Per-iteration metadata provider — when set, EventLoopNode merges
    # the returned dict into node_loop_iteration event data. Used by
    # the queen to record the current phase per iteration.
    iteration_metadata_provider: Any = None  # Callable[[], dict] | None

    @property
    def is_queen_stream(self) -> bool:
        """Return True when this context belongs to the queen conversation."""
        return self.stream_id == "queen" or self.node_spec.is_queen_node()

    @property
    def emits_client_io(self) -> bool:
        """Return True when text should be published to user-facing streams."""
        return self.is_queen_stream

    @property
    def supports_direct_user_io(self) -> bool:
        """Return True when the node may directly request user input."""
        return self.is_queen_stream and not self.event_triggered


@dataclass
@@ -672,6 +690,6 @@ class NodeProtocol(ABC):
        """
        errors = []
        for key in ctx.node_spec.input_keys:
            if key not in ctx.input_data and ctx.buffer.read(key) is None:
                errors.append(f"Missing required input: {key}")
        return errors
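The readiness check is simple enough to demo in isolation. A minimal sketch with a hypothetical stand-in context (the real `NodeContext` carries many more fields):

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class MiniCtx:  # hypothetical stand-in for NodeContext
    input_keys: list[str]
    input_data: dict[str, Any]
    buffer: dict[str, Any] = field(default_factory=dict)

def validate_inputs(ctx: MiniCtx) -> list[str]:
    # Mirrors NodeProtocol above: a key may arrive via input_data or the buffer.
    return [
        f"Missing required input: {k}"
        for k in ctx.input_keys
        if k not in ctx.input_data and ctx.buffer.get(k) is None
    ]

ctx = MiniCtx(input_keys=["query", "sources"], input_data={"query": "llm agents"})
print(validate_inputs(ctx))  # ['Missing required input: sources']
```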

@@ -1,138 +1,28 @@
"""Legacy compatibility wrapper around :mod:`framework.graph.prompting`.

New runtime code should import from ``framework.graph.prompting`` directly.
"""

from __future__ import annotations

import json
from pathlib import Path
from typing import TYPE_CHECKING

from framework.graph.prompting import (
    EXECUTION_SCOPE_PREAMBLE,
    TransitionSpec,
    build_accounts_prompt,
    build_narrative,
    build_system_prompt,
    stamp_prompt_datetime,
)

if TYPE_CHECKING:
    from framework.graph.node import DataBuffer, NodeSpec

_with_datetime = stamp_prompt_datetime


def compose_system_prompt(
@@ -140,167 +30,119 @@ def compose_system_prompt(
    focus_prompt: str | None,
    narrative: str | None = None,
    accounts_prompt: str | None = None,
    skills_catalog_prompt: str | None = None,
    protocols_prompt: str | None = None,
    execution_preamble: str | None = None,
    node_type_preamble: str | None = None,
) -> str:
    """Compatibility wrapper for the legacy function signature."""
    from framework.graph.prompting import NodePromptSpec

    spec = NodePromptSpec(
        identity_prompt=identity_prompt or "",
        focus_prompt=focus_prompt or "",
        narrative=narrative or "",
        accounts_prompt=accounts_prompt or "",
        skills_catalog_prompt=skills_catalog_prompt or "",
        protocols_prompt=protocols_prompt or "",
        # Legacy callers explicitly passed these preambles. Preserve them by
        # folding them into the focus block when present.
        node_type="event_loop",
    )
    if execution_preamble or node_type_preamble:
        focus_parts = []
        if execution_preamble:
            focus_parts.append(execution_preamble)
        if node_type_preamble:
            focus_parts.append(node_type_preamble)
        if spec.focus_prompt:
            focus_parts.append(spec.focus_prompt)
        spec = NodePromptSpec(
            identity_prompt=spec.identity_prompt,
            focus_prompt="\n\n".join(focus_parts),
            narrative=spec.narrative,
            accounts_prompt=spec.accounts_prompt,
            skills_catalog_prompt=spec.skills_catalog_prompt,
            protocols_prompt=spec.protocols_prompt,
            node_type=spec.node_type,
            output_keys=spec.output_keys,
            is_subagent_mode=spec.is_subagent_mode,
        )
    return build_system_prompt(spec)


def build_transition_marker(
    previous_node: NodeSpec,
    next_node: NodeSpec,
    buffer: DataBuffer,
    cumulative_tool_names: list[str],
    data_dir: Path | str | None = None,
    adapt_content: str | None = None,
) -> str:
    """Legacy transition builder with best-effort spillover compatibility."""
    buffer_items: dict[str, str] = {}
    data_files: list[str] = []

    all_buffer = buffer.read_all()
    for key, value in all_buffer.items():
        if value is None:
            continue
        val_str = str(value)
        if len(val_str) > 300 and data_dir:
            data_path = Path(data_dir)
            data_path.mkdir(parents=True, exist_ok=True)
            ext = ".json" if isinstance(value, (dict, list)) else ".txt"
            filename = f"output_{key}{ext}"
            file_path = data_path / filename
            try:
                write_content = (
                    json.dumps(value, indent=2, ensure_ascii=False)
                    if isinstance(value, (dict, list))
                    else str(value)
                )
                file_path.write_text(write_content, encoding="utf-8")
                file_size = file_path.stat().st_size
                buffer_items[key] = (
                    f"[Saved to '{filename}' ({file_size:,} bytes). "
                    f"Use load_data(filename='{filename}') to access.]"
                )
            except Exception:
                buffer_items[key] = val_str[:300] + "..."
        elif len(val_str) > 300:
            buffer_items[key] = val_str[:300] + "..."
        else:
            buffer_items[key] = val_str

    if data_dir:
        data_path = Path(data_dir)
        if data_path.exists():
            data_files = [
                f"{entry.name} ({entry.stat().st_size:,} bytes)"
                for entry in sorted(data_path.iterdir())
                if entry.is_file()
            ]

    return build_transition_message(
        TransitionSpec(
            previous_name=previous_node.name,
            previous_description=previous_node.description,
            next_name=next_node.name,
            next_description=next_node.description,
            next_output_keys=tuple(next_node.output_keys or ()),
            buffer_items=buffer_items,
            cumulative_tool_names=tuple(sorted(cumulative_tool_names)),
            data_files=tuple(data_files),
        )
    )


from framework.graph.prompting import build_transition_message  # noqa: E402

__all__ = [
    "EXECUTION_SCOPE_PREAMBLE",
    "_with_datetime",
    "build_accounts_prompt",
    "build_narrative",
    "build_transition_marker",
    "build_transition_message",
    "compose_system_prompt",
]

@@ -0,0 +1,312 @@
"""Pure prompt rendering helpers for graph execution.

This module owns all prompt text assembly for graph nodes.
It intentionally avoids side effects so runtime code can prepare any
spill files or transition metadata separately and then pass plain data in.
"""

from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime
from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:
    from framework.graph.edge import GraphSpec
    from framework.graph.node import DataBuffer


# Injected into every worker node's system prompt so the LLM understands
# it is one step in a multi-node pipeline and should not overreach.
EXECUTION_SCOPE_PREAMBLE = (
    "EXECUTION SCOPE: You are one node in a multi-step workflow graph. "
    "Focus ONLY on the task described in your instructions below. "
    "Call set_output() for each of your declared output keys, then stop. "
    "Do NOT attempt work that belongs to other nodes - the framework "
    "routes data between nodes automatically."
)


@dataclass(frozen=True)
class NodePromptSpec:
    """Structured inputs for building one node system prompt."""

    identity_prompt: str = ""
    focus_prompt: str = ""
    narrative: str = ""
    accounts_prompt: str = ""
    skills_catalog_prompt: str = ""
    protocols_prompt: str = ""
    memory_prompt: str = ""
    node_type: str = "event_loop"
    output_keys: tuple[str, ...] = ()
    is_subagent_mode: bool = False


@dataclass(frozen=True)
class TransitionSpec:
    """Structured inputs for a transition marker message."""

    previous_name: str
    previous_description: str
    next_name: str
    next_description: str
    next_output_keys: tuple[str, ...] = ()
    buffer_items: dict[str, str] = field(default_factory=dict)
    cumulative_tool_names: tuple[str, ...] = ()
    data_files: tuple[str, ...] = ()


def stamp_prompt_datetime(prompt: str) -> str:
    """Append current datetime with local timezone to a prompt."""
    local = datetime.now().astimezone()
    stamp = f"Current date and time: {local.strftime('%Y-%m-%d %H:%M %Z (UTC%z)')}"
    return f"{prompt}\n\n{stamp}" if prompt else stamp


def build_accounts_prompt(
    accounts: list[dict[str, Any]],
    tool_provider_map: dict[str, str] | None = None,
    node_tool_names: list[str] | None = None,
) -> str:
    """Build a prompt section describing connected accounts."""
    if not accounts:
        return ""

    if tool_provider_map is None:
        lines = [
            "Connected accounts (use the alias as the `account` parameter "
            "when calling tools to target a specific account):"
        ]
        for acct in accounts:
            provider = acct.get("provider", "unknown")
            alias = acct.get("alias", "unknown")
            identity = acct.get("identity", {})
            detail_parts = [f"{k}: {v}" for k, v in identity.items() if v]
            detail = f" ({', '.join(detail_parts)})" if detail_parts else ""
            lines.append(f"- {provider}/{alias}{detail}")
        return "\n".join(lines)

    provider_tools: dict[str, list[str]] = {}
    for tool_name, provider in tool_provider_map.items():
        provider_tools.setdefault(provider, []).append(tool_name)

    node_tool_set = set(node_tool_names) if node_tool_names else None

    provider_accounts: dict[str, list[dict[str, Any]]] = {}
    for acct in accounts:
        provider = acct.get("provider", "unknown")
        provider_accounts.setdefault(provider, []).append(acct)

    sections: list[str] = ["Connected accounts:"]

    for provider, acct_list in provider_accounts.items():
        tools_for_provider = sorted(provider_tools.get(provider, []))

        if node_tool_set is not None:
            relevant_tools = [
                tool_name for tool_name in tools_for_provider if tool_name in node_tool_set
            ]
            if not relevant_tools:
                continue
            tools_for_provider = relevant_tools

        all_local = all(acct.get("source") == "local" for acct in acct_list)
        display_name = provider.replace("_", " ").title()
        if tools_for_provider and not all_local:
            tools_str = ", ".join(tools_for_provider)
            sections.append(f'\n{display_name} (use account="<alias>" with: {tools_str}):')
        elif tools_for_provider and all_local:
            tools_str = ", ".join(tools_for_provider)
            sections.append(f"\n{display_name} (tools: {tools_str}):")
        else:
            sections.append(f"\n{display_name}:")

        for acct in acct_list:
            alias = acct.get("alias", "unknown")
            identity = acct.get("identity", {})
            detail_parts = [f"{k}: {v}" for k, v in identity.items() if v]
            detail = f" ({', '.join(detail_parts)})" if detail_parts else ""
            source_tag = " [local]" if acct.get("source") == "local" else ""
            sections.append(f"  - {provider}/{alias}{detail}{source_tag}")

    if len(sections) <= 1:
        return ""

    return "\n".join(sections)


def build_prompt_spec_from_node_context(
    ctx: Any,
    *,
    focus_prompt: str | None = None,
    narrative: str | None = None,
    memory_prompt: str | None = None,
) -> NodePromptSpec:
    """Convert a NodeContext-like object into structured prompt inputs."""
    resolved_memory_prompt = memory_prompt
    if resolved_memory_prompt is None:
        resolved_memory_prompt = getattr(ctx, "memory_prompt", "") or ""
        dynamic_memory_provider = getattr(ctx, "dynamic_memory_provider", None)
        if dynamic_memory_provider is not None:
            try:
                resolved_memory_prompt = dynamic_memory_provider() or ""
            except Exception:
                resolved_memory_prompt = getattr(ctx, "memory_prompt", "") or ""
    return NodePromptSpec(
        identity_prompt=ctx.identity_prompt or "",
        focus_prompt=focus_prompt
        if focus_prompt is not None
        else (ctx.node_spec.system_prompt or ""),
        narrative=narrative if narrative is not None else (ctx.narrative or ""),
        accounts_prompt=ctx.accounts_prompt or "",
        skills_catalog_prompt=ctx.skills_catalog_prompt or "",
        protocols_prompt=ctx.protocols_prompt or "",
        memory_prompt=resolved_memory_prompt,
        node_type=ctx.node_spec.node_type,
        output_keys=tuple(ctx.node_spec.output_keys or ()),
        is_subagent_mode=bool(getattr(ctx, "is_subagent_mode", False)),
    )


def build_system_prompt(spec: NodePromptSpec) -> str:
    """Compose one canonical system prompt for a node."""
    parts: list[str] = []

    if spec.identity_prompt:
        parts.append(spec.identity_prompt)

    if spec.accounts_prompt:
        parts.append(f"\n{spec.accounts_prompt}")

    if spec.skills_catalog_prompt:
        parts.append(f"\n{spec.skills_catalog_prompt}")

    if spec.protocols_prompt:
        parts.append(f"\n{spec.protocols_prompt}")

    if spec.memory_prompt:
        parts.append(
            "\nRelevant recalled memories may appear below. Treat them as "
            "point-in-time guidance and verify stale details against current context."
        )
        parts.append(f"\n{spec.memory_prompt}")

    if spec.narrative:
        parts.append(f"\n--- Context (what has happened so far) ---\n{spec.narrative}")

    if not spec.is_subagent_mode and spec.node_type in ("event_loop", "gcu") and spec.output_keys:
        parts.append(f"\n{EXECUTION_SCOPE_PREAMBLE}")

    if spec.node_type == "gcu":
        from framework.graph.gcu import GCU_BROWSER_SYSTEM_PROMPT

        parts.append(f"\n{GCU_BROWSER_SYSTEM_PROMPT}")

    if spec.focus_prompt:
        parts.append(f"\n--- Current Focus ---\n{spec.focus_prompt}")

    return stamp_prompt_datetime("\n".join(parts) if parts else "")
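A short usage sketch (illustrative prompt strings; the section ordering follows `build_system_prompt` above, ending with the datetime stamp):

```python
spec = NodePromptSpec(
    identity_prompt="You are a meticulous research agent.",
    narrative="Scoping is complete; the user chose topic X.",
    focus_prompt="Synthesize the findings into a short report.",
    output_keys=("report",),
)
print(build_system_prompt(spec))
# Identity, then the "--- Context ---" narrative, then the execution-scope
# preamble (event_loop node with output keys), then "--- Current Focus ---",
# then the current datetime stamp.
```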


def build_system_prompt_for_node_context(
    ctx: Any,
    *,
    focus_prompt: str | None = None,
    narrative: str | None = None,
    memory_prompt: str | None = None,
) -> str:
    """Build a canonical system prompt from a NodeContext-like object."""
    spec = build_prompt_spec_from_node_context(
        ctx,
        focus_prompt=focus_prompt,
        narrative=narrative,
        memory_prompt=memory_prompt,
    )
    return build_system_prompt(spec)


def build_narrative(
    buffer: DataBuffer,
    execution_path: list[str],
    graph: GraphSpec,
) -> str:
    """Build a deterministic Layer 2 narrative from graph state."""
    parts: list[str] = []

    if execution_path:
        phase_descriptions: list[str] = []
        for node_id in execution_path:
            node_spec = graph.get_node(node_id)
            if node_spec:
                phase_descriptions.append(f"- {node_spec.name}: {node_spec.description}")
            else:
                phase_descriptions.append(f"- {node_id}")
        parts.append("Phases completed:\n" + "\n".join(phase_descriptions))

    all_buffer = buffer.read_all()
    if all_buffer:
        memory_lines: list[str] = []
        for key, value in all_buffer.items():
            if value is None:
                continue
            val_str = str(value)
            if len(val_str) > 200:
                val_str = val_str[:200] + "..."
            memory_lines.append(f"- {key}: {val_str}")
        if memory_lines:
            parts.append("Current state:\n" + "\n".join(memory_lines))

    return "\n\n".join(parts) if parts else ""


def build_transition_message(spec: TransitionSpec) -> str:
    """Build a pure transition marker message."""
    sections: list[str] = [
        f"--- PHASE TRANSITION: {spec.previous_name} -> {spec.next_name} ---",
        f"\nCompleted: {spec.previous_name}",
        f"  {spec.previous_description}",
    ]

    if spec.buffer_items:
        lines = [f"  {key}: {value}" for key, value in spec.buffer_items.items()]
        sections.append("\nOutputs available:\n" + "\n".join(lines))

    if spec.data_files:
        sections.append(
            "\nData files (use load_data to access):\n"
            + "\n".join(f"  {entry}" for entry in spec.data_files)
        )

    if spec.cumulative_tool_names:
        sections.append("\nAvailable tools: " + ", ".join(sorted(spec.cumulative_tool_names)))

    sections.append(f"\nNow entering: {spec.next_name}")
    sections.append(f"  {spec.next_description}")
    if spec.next_output_keys:
        sections.append(
            f"\nYour ONLY job in this phase: complete the task above and call "
            f"set_output() for {list(spec.next_output_keys)}. Do NOT do work that "
            f"belongs to later phases."
        )

    sections.append(
        "\nBefore proceeding, briefly reflect: what went well in the "
        "previous phase? Are there any gaps or surprises worth noting?"
    )
    sections.append("\n--- END TRANSITION ---")
    return "\n".join(sections)


__all__ = [
    "EXECUTION_SCOPE_PREAMBLE",
    "NodePromptSpec",
    "TransitionSpec",
    "build_accounts_prompt",
    "build_narrative",
    "build_prompt_spec_from_node_context",
    "build_system_prompt",
    "build_system_prompt_for_node_context",
    "build_transition_message",
    "stamp_prompt_datetime",
]
@@ -1,7 +1,84 @@
import ast
import operator
import signal
import threading
import time
from contextlib import contextmanager
from typing import Any

# Power operations can allocate extremely large integers. Keep conservative
# limits here so untrusted edge conditions cannot exhaust CPU or memory.
MAX_POWER_ABS_EXPONENT = 1_000
MAX_POWER_RESULT_BITS = 4_096
# Typical edge-condition evaluations in this repo complete well under 1ms.
# 100ms leaves ample headroom for legitimate checks while failing fast on abuse.
DEFAULT_TIMEOUT_MS = 100


def _safe_pow(base: Any, exp: Any) -> Any:
    if isinstance(exp, (int, float)) and abs(exp) > MAX_POWER_ABS_EXPONENT:
        raise ValueError(f"Power exponent exceeds safe limit ({MAX_POWER_ABS_EXPONENT})")

    if isinstance(base, int) and isinstance(exp, int) and exp > 0:
        abs_base = abs(base)
        if abs_base > 1:
            # Estimate bit growth instead of materializing a huge integer.
            estimated_bits = exp * abs_base.bit_length()
            if estimated_bits > MAX_POWER_RESULT_BITS:
                raise ValueError("Power operation exceeds safe size limit")

    return operator.pow(base, exp)
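The estimate is intentionally conservative, since `bit_length()` of the base rounds log2 up. A quick check:

```python
print(_safe_pow(2, 100))      # fine: a 101-bit result

try:
    _safe_pow(1024, 400)      # estimate: 400 * (1024).bit_length() = 4400 > 4096
except ValueError as e:
    print(e)                  # rejected even though the true result is 4001 bits

try:
    _safe_pow(2, 5_000)       # exponent alone exceeds MAX_POWER_ABS_EXPONENT
except ValueError as e:
    print(e)
```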


def _timeout_message(timeout_ms: int) -> str:
    return f"safe_eval exceeded {timeout_ms}ms execution timeout"


def _check_timeout(deadline: float | None, timeout_ms: int | None) -> None:
    if deadline is not None and timeout_ms is not None and time.perf_counter() >= deadline:
        raise TimeoutError(_timeout_message(timeout_ms))


@contextmanager
def _execution_timeout(timeout_ms: int | None):
    if timeout_ms is None:
        yield
        return

    if timeout_ms <= 0:
        raise ValueError("timeout_ms must be greater than 0")

    can_use_alarm = (
        hasattr(signal, "SIGALRM")
        and hasattr(signal, "ITIMER_REAL")
        and hasattr(signal, "getitimer")
        and hasattr(signal, "setitimer")
        and threading.current_thread() is threading.main_thread()
    )
    if not can_use_alarm:
        yield
        return

    current_delay, current_interval = signal.getitimer(signal.ITIMER_REAL)
    if current_delay > 0 or current_interval > 0:
        # safe_eval runs inside a shared framework process, so it must not
        # replace a timer another subsystem already owns.
        yield
        return

    def _handle_timeout(signum, frame):
        raise TimeoutError(_timeout_message(timeout_ms))

    old_handler = signal.getsignal(signal.SIGALRM)
    signal.signal(signal.SIGALRM, _handle_timeout)
    old_delay, old_interval = signal.setitimer(signal.ITIMER_REAL, timeout_ms / 1000)
    try:
        yield
    finally:
        signal.signal(signal.SIGALRM, old_handler)
        signal.setitimer(signal.ITIMER_REAL, old_delay, old_interval)


# Safe operators whitelist
SAFE_OPERATORS = {
    ast.Add: operator.add,
@@ -10,7 +87,7 @@ SAFE_OPERATORS = {
    ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
    ast.Pow: _safe_pow,
    ast.LShift: operator.lshift,
    ast.RShift: operator.rshift,
    ast.BitOr: operator.or_,
@@ -54,10 +131,19 @@ SAFE_FUNCTIONS = {


class SafeEvalVisitor(ast.NodeVisitor):
    def __init__(
        self,
        context: dict[str, Any],
        *,
        deadline: float | None = None,
        timeout_ms: int | None = None,
    ):
        self.context = context
        self.deadline = deadline
        self.timeout_ms = timeout_ms

    def visit(self, node: ast.AST) -> Any:
        _check_timeout(self.deadline, self.timeout_ms)
        # Override visit to prevent default behavior and ensure only explicitly allowed nodes work
        method = "visit_" + node.__class__.__name__
        visitor = getattr(self, method, self.generic_visit)
@@ -115,11 +201,23 @@ class SafeEvalVisitor(ast.NodeVisitor):
        return True

    def visit_BoolOp(self, node: ast.BoolOp) -> Any:
        # Short-circuit evaluation to match Python semantics.
        # Previously all operands were eagerly evaluated, which broke
        # guard patterns like: ``x is not None and x.get("key")``
        if isinstance(node.op, ast.And):
            result = True
            for v in node.values:
                result = self.visit(v)
                if not result:
                    return result
            return result
        elif isinstance(node.op, ast.Or):
            result = False
            for v in node.values:
                result = self.visit(v)
                if result:
                    return result
            return result
        raise ValueError(f"Boolean operator {type(node.op).__name__} is not allowed")

    def visit_IfExp(self, node: ast.IfExp) -> Any:
@@ -171,6 +269,7 @@ class SafeEvalVisitor(ast.NodeVisitor):
        raise AttributeError(f"Object has no attribute '{node.attr}'")

    def visit_Call(self, node: ast.Call) -> Any:
        _check_timeout(self.deadline, self.timeout_ms)
        # Only allow calling whitelisted functions
        func = self.visit(node.func)
@@ -214,20 +313,24 @@ class SafeEvalVisitor(ast.NodeVisitor):
        args = [self.visit(arg) for arg in node.args]
        keywords = {kw.arg: self.visit(kw.value) for kw in node.keywords}

        _check_timeout(self.deadline, self.timeout_ms)
        return func(*args, **keywords)

    def visit_Index(self, node: ast.Index) -> Any:
        # Python < 3.9
        return self.visit(node.value)


def safe_eval(
    expr: str,
    context: dict[str, Any] | None = None,
    *,
    timeout_ms: int | None = DEFAULT_TIMEOUT_MS,
) -> Any:
    """
    Safely evaluate a python expression string.

    Args:
        expr: The expression string to evaluate.
        context: Dictionary of variables available in the expression.
        timeout_ms: Maximum evaluation time in milliseconds. Use ``None`` to
            disable the timeout.

    Returns:
        The result of the evaluation.
@@ -243,10 +346,18 @@ def safe_eval(expr: str, context: dict[str, Any] | None = None) -> Any:
    full_context = context.copy()
    full_context.update(SAFE_FUNCTIONS)

    deadline = None if timeout_ms is None else time.perf_counter() + (timeout_ms / 1000)

    with _execution_timeout(timeout_ms):
        try:
            tree = ast.parse(expr, mode="eval")
        except SyntaxError as e:
            raise SyntaxError(f"Invalid syntax in expression: {e}") from e

        _check_timeout(deadline, timeout_ms)
        visitor = SafeEvalVisitor(
            full_context,
            deadline=deadline,
            timeout_ms=timeout_ms,
        )
        return visitor.visit(tree)
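Putting it together, a hedged usage sketch of the hardened entry point:

```python
print(safe_eval("a + b * 2", {"a": 1, "b": 3}))  # 7
print(safe_eval("status == 'done' and retries < 3",
                {"status": "done", "retries": 1}))  # True

try:
    safe_eval("2 ** 999999")  # blocked by _safe_pow before anything allocates
except ValueError as e:
    print(e)

# timeout_ms=None disables both the deadline checks and the SIGALRM timer.
safe_eval("1 + 1", timeout_ms=None)
```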

@@ -0,0 +1,849 @@
"""
WorkerAgent — First-class autonomous worker for event-driven graph execution.

Each node in a graph becomes a WorkerAgent that:
- Owns its lifecycle, retry logic, memory scope, and LLM config
- Receives activations from upstream workers (via GraphExecutor routing)
- Self-checks readiness (fan-out group tracking)
- Self-triggers when ready
- Evaluates outgoing edges and publishes activations for downstream workers
"""

from __future__ import annotations

import asyncio
import logging
import time
import uuid
from dataclasses import dataclass, field
from enum import StrEnum
from typing import Any

from framework.graph.context import GraphContext, build_node_context_from_graph_context
from framework.graph.edge import EdgeCondition, EdgeSpec
from framework.graph.node import (
    NodeContext,
    NodeProtocol,
    NodeResult,
    NodeSpec,
)
from framework.graph.validator import OutputValidator

logger = logging.getLogger(__name__)


# ---------------------------------------------------------------------------
# Enums & data types
# ---------------------------------------------------------------------------


class WorkerLifecycle(StrEnum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class FanOutTag:
    """Carried in activations, propagated through the worker chain.

    When a source activates multiple targets (fan-out), each activation
    receives a FanOutTag. Downstream convergence workers track these tags
    to determine when all parallel branches have reached them.
    """

    fan_out_id: str  # Unique ID for this fan-out event
    fan_out_source: str  # Node that performed the fan-out
    branches: frozenset[str]  # All target node IDs in this fan-out
    via_branch: str  # Which branch this activation passed through


@dataclass
class FanOutTracker:
    """Per fan-out group, tracked by the target worker."""

    fan_out_id: str
    branches: frozenset[str]
    reached: set[str] = field(default_factory=set)

    @property
    def is_complete(self) -> bool:
        return self.reached == self.branches
|
||||
|
||||
|
||||
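# Illustrative sketch (hypothetical node IDs, not part of the file above):
# a convergence worker's tracker fills as parallel branches arrive.
#
#   tag = FanOutTag("fo-1", "planner", frozenset({"a", "b"}), via_branch="a")
#   tracker = FanOutTracker(tag.fan_out_id, tag.branches)
#   tracker.reached.add("a")   # branch "a" arrives -> is_complete is False
#   tracker.reached.add("b")   # branch "b" arrives -> is_complete is True
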
@dataclass
class Activation:
    """Payload sent from a completed source to a target worker."""

    source_id: str
    target_id: str
    edge_id: str
    edge: EdgeSpec
    mapped_inputs: dict[str, Any]
    fan_out_tags: list[FanOutTag] = field(default_factory=list)


@dataclass
class WorkerCompletion:
    """Payload in WORKER_COMPLETED event."""

    worker_id: str
    success: bool
    output: dict[str, Any]
    tokens_used: int = 0
    latency_ms: int = 0
    conversation: Any = None  # NodeConversation for continuous mode
    activations: list[Activation] = field(default_factory=list)


@dataclass
class RetryState:
    attempt: int = 0
    max_retries: int = 3
    is_event_loop: bool = False


# ---------------------------------------------------------------------------
# WorkerAgent
# ---------------------------------------------------------------------------


class WorkerAgent:
    """First-class autonomous worker for one node in the graph.

    Lifecycle:
        PENDING   - waiting for activations
        RUNNING   - executing the node
        COMPLETED - finished successfully, activations published
        FAILED    - failed after retries exhausted
    """

    def __init__(
        self,
        node_spec: NodeSpec,
        graph_context: GraphContext,
    ) -> None:
        self.node_spec = node_spec
        self._gc = graph_context

        # Edge topology (resolved at construction, immutable)
        self.incoming_edges: list[EdgeSpec] = graph_context.graph.get_incoming_edges(node_spec.id)
        self.outgoing_edges: list[EdgeSpec] = graph_context.graph.get_outgoing_edges(node_spec.id)

        # Lifecycle
        self.lifecycle: WorkerLifecycle = WorkerLifecycle.PENDING
        self._task: asyncio.Task | None = None

        # Retry state
        self.retry_state = RetryState(
            max_retries=node_spec.max_retries,
            is_event_loop=node_spec.node_type == "event_loop",
        )

        # Activation tracking
        self._inherited_fan_out_tags: list[FanOutTag] = []
        self._active_fan_outs: dict[str, FanOutTracker] = {}
        self._received_activations: list[Activation] = []
        self._has_been_activated = False

        # Pause support
        # _run_gate controls whether worker execution may proceed.
        # _pause_requested mirrors the pause-request semantics expected by
        # EventLoopNode, where is_set() means "pause requested".
        self._run_gate: asyncio.Event = asyncio.Event()
        self._run_gate.set()  # Not paused by default
        self._pause_requested: asyncio.Event = asyncio.Event()

        # Validator
        self._validator = OutputValidator()

        # Node implementation (lazy)
        self._node_impl: NodeProtocol | None = None

        # Metrics for this worker
        self._tokens_used: int = 0
        self._latency_ms: int = 0

        # Last execution result (accessible by polling executor)
        self._last_result: NodeResult | None = None
        self._last_activations: list[Activation] = []

    # ------------------------------------------------------------------
    # Public activation interface
    # ------------------------------------------------------------------

    def activate(self, inherited_tags: list[FanOutTag] | None = None) -> None:
        """Activate this worker — launch execution as an asyncio.Task."""
        if self.lifecycle != WorkerLifecycle.PENDING:
            return

        self._inherited_fan_out_tags = inherited_tags or []
        self._has_been_activated = True
        self.lifecycle = WorkerLifecycle.RUNNING
        self._task = asyncio.ensure_future(self._execute_self())

    def receive_activation(self, activation: Activation) -> None:
        """Receive an activation from an upstream worker.

        Called by GraphExecutor when routing a WORKER_COMPLETED event's
        activations to their target workers.
        """
        if self.lifecycle != WorkerLifecycle.PENDING:
            return

        self._received_activations.append(activation)

        # Update fan-out trackers from this activation's tags.
        # Skip tags where this worker IS the via_branch — those tags exist
        # for downstream convergence tracking, not for gating this worker.
        for tag in activation.fan_out_tags:
            if tag.via_branch == self.node_spec.id:
                continue
            if tag.fan_out_id not in self._active_fan_outs:
                self._active_fan_outs[tag.fan_out_id] = FanOutTracker(
                    fan_out_id=tag.fan_out_id,
                    branches=tag.branches,
                )
            self._active_fan_outs[tag.fan_out_id].reached.add(tag.via_branch)

    def check_readiness(self) -> bool:
        """Check if all fan-out groups have been satisfied."""
        if self._has_been_activated:
            return True
        if not self._active_fan_outs:
            # No fan-out tracking — ready on first activation
            return bool(self._received_activations)
        return all(t.is_complete for t in self._active_fan_outs.values())

    def reset_for_revisit(self) -> None:
        """Reset a completed worker so it can execute again (feedback loops).

        Preserves the node implementation (cached) but clears lifecycle,
        activation, and result state.
        """
        self.lifecycle = WorkerLifecycle.PENDING
        self._inherited_fan_out_tags = []
        self._active_fan_outs = {}
        self._received_activations = []
        self._has_been_activated = False
        self._task = None
        self._last_result = None
        self._last_activations = []
        self._tokens_used = 0
        self._latency_ms = 0

    # ------------------------------------------------------------------
    # Execution
    # ------------------------------------------------------------------

    async def _execute_self(self) -> None:
        """Main execution loop: run node, handle retries, publish result."""
        gc = self._gc
        node_spec = self.node_spec
        try:
            # Write all mapped inputs from received activations to buffer
            for activation in self._received_activations:
                for key, value in activation.mapped_inputs.items():
                    gc.buffer.write(key, value, validate=False)

            # Increment visit count (always, even if skipped)
            async with gc._visits_lock:
                visit_count = gc.node_visit_counts.get(node_spec.id, 0) + 1
                gc.node_visit_counts[node_spec.id] = visit_count

            # Check max_node_visits — skip execution but still propagate edges
            if node_spec.max_node_visits > 0 and visit_count > node_spec.max_node_visits:
                logger.info(
                    "Worker %s: visit %d exceeds max_node_visits=%d, skipping",
                    node_spec.id,
                    visit_count,
                    node_spec.max_node_visits,
                )
                # Build a synthetic success result from current buffer state
                existing_output: dict[str, Any] = {}
                for key in node_spec.output_keys:
                    val = gc.buffer.read(key)
                    if val is not None:
                        existing_output[key] = val

                result = NodeResult(success=True, output=existing_output)

                # Evaluate outgoing edges so the cycle continues
                activations = await self._evaluate_outgoing_edges(result)

                self.lifecycle = WorkerLifecycle.COMPLETED
                self._last_result = result
                self._last_activations = activations
                return

            # Clear stale nullable outputs on re-visit
            if visit_count > 1:
                nullable_keys = getattr(node_spec, "nullable_output_keys", None) or []
                for key in nullable_keys:
                    if gc.buffer.read(key) is not None:
                        gc.buffer.write(key, None, validate=False)

            # Continuous mode: accumulate tools and output keys
            if gc.is_continuous and node_spec.tools:
                for t in gc.tools:
                    if t.name in node_spec.tools and t.name not in gc.cumulative_tool_names:
                        gc.cumulative_tools.append(t)
                        gc.cumulative_tool_names.add(t.name)
            if gc.is_continuous and node_spec.output_keys:
                for k in node_spec.output_keys:
                    if k not in gc.cumulative_output_keys:
                        gc.cumulative_output_keys.append(k)

            # Append to execution path
            async with gc._path_lock:
                gc.path.append(node_spec.id)

            # Get node implementation
            node_impl = self._get_node_implementation()

            # Build context
            ctx = self._build_node_context()

            # Execute with retry
            result = await self._execute_with_retries(node_impl, ctx)

            # Handle result
            if result.success:
                # Validate and write outputs
                self._write_outputs(result)

                # Evaluate outgoing edges
                activations = await self._evaluate_outgoing_edges(result)

                # Publish completion
                self.lifecycle = WorkerLifecycle.COMPLETED
                self._last_result = result
                self._last_activations = activations
                completion = WorkerCompletion(
                    worker_id=node_spec.id,
                    success=True,
                    output=result.output,
                    tokens_used=result.tokens_used,
                    latency_ms=result.latency_ms,
                    conversation=result.conversation,
                    activations=activations,
                )
                if gc.is_continuous and completion.conversation is not None:
                    gc.continuous_conversation = completion.conversation
                await self._apply_continuous_transition(completion.activations)
                await self._publish_completion(completion)
            else:
                # Evaluate outgoing edges even on failure (ON_FAILURE edges)
                activations = await self._evaluate_outgoing_edges(result)

                self.lifecycle = WorkerLifecycle.FAILED
                self._last_result = result
                self._last_activations = activations
                await self._publish_failure(result.error or "Unknown error")
        except Exception as exc:
            error = str(exc) or type(exc).__name__
            logger.exception("Worker %s crashed during execution", node_spec.id)
            self.lifecycle = WorkerLifecycle.FAILED
            self._last_result = NodeResult(success=False, error=error)
            self._last_activations = []
            await self._publish_failure(error)

    async def _execute_with_retries(self, node_impl: NodeProtocol, ctx: NodeContext) -> NodeResult:
        """Execute node with exponential backoff retry."""
        gc = self._gc
        # Only skip retries for actual EventLoopNode instances (they handle
        # retries internally). Custom NodeProtocol impls registered via
        # register_node should be retried by the executor.
        from framework.graph.event_loop_node import EventLoopNode as _ELN

        if isinstance(node_impl, _ELN):
            max_retries = 0
        else:
            max_retries = self.retry_state.max_retries

        total_attempts = max(1, max_retries)
        for attempt in range(total_attempts):
            # Check pause
            await self._run_gate.wait()

            ctx.attempt = attempt + 1
            start = time.monotonic()

            try:
                result = await node_impl.execute(ctx)
                result.latency_ms = int((time.monotonic() - start) * 1000)

                if result.success:
                    return result

                # Failure
                if attempt + 1 < total_attempts:
                    gc.retry_counts[self.node_spec.id] = (
                        gc.retry_counts.get(self.node_spec.id, 0) + 1
                    )
                    gc.nodes_with_retries.add(self.node_spec.id)
                    delay = 1.0 * (2**attempt)
                    logger.warning(
                        "Worker %s failed (attempt %d/%d), retrying in %.1fs: %s",
                        self.node_spec.id,
                        attempt + 1,
                        max_retries,
                        delay,
                        result.error,
                    )
                    # Emit retry event
                    if gc.event_bus:
                        await gc.event_bus.emit_node_retry(
                            stream_id=gc.stream_id,
                            node_id=self.node_spec.id,
                            attempt=attempt + 1,
                            max_retries=max_retries,
                            execution_id=gc.execution_id,
                        )
                    await asyncio.sleep(delay)
                    continue
                else:
                    return NodeResult(
                        success=False,
                        error=f"failed after {attempt + 1} attempts: {result.error}",
                    )

            except Exception as exc:
                if attempt + 1 < total_attempts:
                    gc.retry_counts[self.node_spec.id] = (
                        gc.retry_counts.get(self.node_spec.id, 0) + 1
                    )
                    gc.nodes_with_retries.add(self.node_spec.id)
                    delay = 1.0 * (2**attempt)
                    logger.warning(
                        "Worker %s raised %s (attempt %d/%d), retrying in %.1fs",
                        self.node_spec.id,
                        type(exc).__name__,
                        attempt + 1,
                        max(1, max_retries),
                        delay,
                    )
                    await asyncio.sleep(delay)
                    continue
                return NodeResult(
                    success=False,
                    error=f"failed after {attempt + 1} attempts: {exc}",
                )

        return NodeResult(
            success=False,
            error=f"failed after {max(1, max_retries)} attempts",
        )

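    # Worked example of the backoff schedule above (hypothetical timings):
    # delay = 1.0 * (2 ** attempt) gives 1.0s after attempt 0, 2.0s after
    # attempt 1, 4.0s after attempt 2, and so on. With max_retries=3 (three
    # attempts) a worker sleeps at most 1s + 2s = 3s between its attempts.
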
    # ------------------------------------------------------------------
    # Edge evaluation (source-side)
    # ------------------------------------------------------------------

    async def _evaluate_outgoing_edges(self, result: NodeResult) -> list[Activation]:
        """Evaluate outgoing edges and create activations for downstream.

        Same logic as current _get_all_traversable_edges() plus
        priority filtering for CONDITIONAL edges.
        """
        gc = self._gc
        edges = gc.graph.get_outgoing_edges(self.node_spec.id)

        traversable: list[EdgeSpec] = []
        for edge in edges:
            target_spec = gc.graph.get_node(edge.target)
            if await edge.should_traverse(
                source_success=result.success,
                source_output=result.output,
                buffer_data=gc.buffer.read_all(),
                llm=gc.llm,
                goal=gc.goal,
                source_node_name=self.node_spec.name,
                target_node_name=target_spec.name if target_spec else edge.target,
            ):
                traversable.append(edge)

        # Priority filtering for CONDITIONAL edges
        if len(traversable) > 1:
            conditionals = [e for e in traversable if e.condition == EdgeCondition.CONDITIONAL]
            if len(conditionals) > 1:
                max_prio = max(e.priority for e in conditionals)
                traversable = [
                    e
                    for e in traversable
                    if e.condition != EdgeCondition.CONDITIONAL or e.priority == max_prio
                ]

        # When parallel execution is disabled, follow first match only (sequential)
        if not gc.enable_parallel_execution and len(traversable) > 1:
            traversable = traversable[:1]

        # Build activations
        is_fan_out = len(traversable) > 1
        fan_out_id = f"{self.node_spec.id}_{uuid.uuid4().hex[:8]}" if is_fan_out else None

        activations: list[Activation] = []
        for edge in traversable:
            mapped = edge.map_inputs(result.output, gc.buffer.read_all())

            # Build fan-out tags: inherited + new
            tags = list(self._inherited_fan_out_tags)
            if is_fan_out:
                tags.append(
                    FanOutTag(
                        fan_out_id=fan_out_id,
                        fan_out_source=self.node_spec.id,
                        branches=frozenset(e.target for e in traversable),
                        via_branch=edge.target,
                    )
                )

            activations.append(
                Activation(
                    source_id=self.node_spec.id,
                    target_id=edge.target,
                    edge_id=edge.id,
                    edge=edge,
                    mapped_inputs=mapped,
                    fan_out_tags=tags,
                )
            )

        if traversable:
            logger.info(
                "Worker %s → %d outgoing activation(s)%s",
                self.node_spec.id,
                len(activations),
                f" (fan-out: {[a.target_id for a in activations]})" if is_fan_out else "",
            )

        return activations

    # ------------------------------------------------------------------
    # Output handling
    # ------------------------------------------------------------------

    def _write_outputs(self, result: NodeResult) -> None:
        """Validate and write node outputs to buffer."""
        gc = self._gc
        node_spec = self.node_spec

        # Event loop nodes skip executor-level validation (judge is the authority)
        if node_spec.node_type != "event_loop":
            errors = self._validator.validate_all(
                output=result.output,
                output_keys=node_spec.output_keys,
                nullable_keys=getattr(node_spec, "nullable_output_keys", []) or [],
                output_schema=getattr(node_spec, "output_schema", None),
                output_model=getattr(node_spec, "output_model", None),
            )
            if errors:
                logger.warning("Worker %s output validation warnings: %s", node_spec.id, errors)

        # Determine if this worker is a fan-out branch
        is_fanout_branch = any(
            tag.via_branch == node_spec.id for tag in self._inherited_fan_out_tags
        )

        # Collect keys to write: declared output_keys + any extra output items
        # (for fan-out branches, all output items need conflict checking)
        keys_to_write: set[str] = set(node_spec.output_keys)
        if is_fanout_branch:
            keys_to_write |= set(result.output.keys())

        # Write all keys to buffer
        for key in keys_to_write:
            value = result.output.get(key)
            if value is not None:
                if is_fanout_branch:
                    conflict_strategy = (
                        getattr(gc.parallel_config, "buffer_conflict_strategy", "last_wins")
                        if gc.parallel_config
                        else "last_wins"
                    )
                    prior_worker = gc._fanout_written_keys.get(key)
                    if prior_worker and prior_worker != node_spec.id:
                        if conflict_strategy == "error":
                            raise RuntimeError(
                                f"Buffer write failed (conflict): key '{key}' already written "
                                f"by worker '{prior_worker}', "
                                f"conflicting write from '{node_spec.id}'"
                            )
                        elif conflict_strategy == "first_wins":
                            logger.debug(
                                "Skipping write to '%s' (first_wins: already set by %s)",
                                key,
                                prior_worker,
                            )
                            continue
                        else:
                            # last_wins: log and overwrite
                            logger.debug(
                                "Key '%s' overwritten (last_wins: %s -> %s)",
                                key,
                                prior_worker,
                                node_spec.id,
                            )
                    gc._fanout_written_keys[key] = node_spec.id
                gc.buffer.write(key, value, validate=False)

    # ------------------------------------------------------------------
    # Context building
    # ------------------------------------------------------------------

    def _get_node_implementation(self) -> NodeProtocol:
        """Get or create node implementation."""
        gc = self._gc
        if self._node_impl is not None:
            return self._node_impl

        # Check shared registry first
        if self.node_spec.id in gc.node_registry:
            self._node_impl = gc.node_registry[self.node_spec.id]
            return self._node_impl

        # Auto-create EventLoopNode
        if self.node_spec.node_type in ("event_loop", "gcu"):
            from framework.graph.event_loop.types import LoopConfig
            from framework.graph.event_loop_node import EventLoopNode
            from framework.graph.node import warn_if_deprecated_client_facing

            conv_store = None
            if gc.storage_path:
                from framework.storage.conversation_store import FileConversationStore

                conv_store = FileConversationStore(base_path=gc.storage_path / "conversations")

            spillover = str(gc.storage_path / "data") if gc.storage_path else None
            lc = gc.loop_config
            warn_if_deprecated_client_facing(self.node_spec)
            default_max_iter = 100 if self.node_spec.supports_direct_user_io() else 50

            node = EventLoopNode(
                event_bus=gc.event_bus,
                judge=None,
                config=LoopConfig(
                    max_iterations=lc.get("max_iterations", default_max_iter),
                    max_tool_calls_per_turn=lc.get("max_tool_calls_per_turn", 30),
                    tool_call_overflow_margin=lc.get("tool_call_overflow_margin", 0.5),
                    stall_detection_threshold=lc.get("stall_detection_threshold", 3),
                    max_context_tokens=lc.get(
                        "max_context_tokens",
                        _default_max_context_tokens(),
                    ),
                    max_tool_result_chars=lc.get("max_tool_result_chars", 30_000),
                    spillover_dir=spillover,
                    hooks=lc.get("hooks", {}),
                ),
                tool_executor=gc.tool_executor,
                conversation_store=conv_store,
            )
            gc.node_registry[self.node_spec.id] = node
            self._node_impl = node
            return node

        raise RuntimeError(
            f"No implementation for node '{self.node_spec.id}' (type: {self.node_spec.node_type})"
        )

    def _build_node_context(self) -> NodeContext:
        """Build NodeContext for this worker's execution."""
        return build_node_context_from_graph_context(
            self._gc,
            node_spec=self.node_spec,
            pause_event=self._pause_requested,
        )

    # ------------------------------------------------------------------
    # Event publishing
    # ------------------------------------------------------------------

    async def _publish_completion(self, completion: WorkerCompletion) -> None:
        """Publish WORKER_COMPLETED event via the graph-scoped event bus."""
        gc = self._gc
        if not gc.event_bus:
            return
        if not hasattr(gc.event_bus, "emit_worker_completed"):
            return

        # Serialize activations to dicts for event data
        activations_data = []
        for act in completion.activations:
            activations_data.append(
                {
                    "source_id": act.source_id,
                    "target_id": act.target_id,
                    "edge_id": act.edge_id,
                    "mapped_inputs": act.mapped_inputs,
                    "fan_out_tags": [
                        {
                            "fan_out_id": t.fan_out_id,
                            "fan_out_source": t.fan_out_source,
                            "branches": list(t.branches),
                            "via_branch": t.via_branch,
                        }
                        for t in act.fan_out_tags
                    ],
                }
            )

        await gc.event_bus.emit_worker_completed(
            stream_id=gc.stream_id,
            node_id=self.node_spec.id,
            worker_id=self.node_spec.id,
            success=completion.success,
            output=completion.output,
            activations=activations_data,
            execution_id=gc.execution_id,
            tokens_used=completion.tokens_used,
            latency_ms=completion.latency_ms,
            conversation=completion.conversation,
        )

    async def _publish_failure(self, error: str) -> None:
        """Publish WORKER_FAILED event."""
        gc = self._gc
        if not gc.event_bus:
            return
        if not hasattr(gc.event_bus, "emit_worker_failed"):
            return

        await gc.event_bus.emit_worker_failed(
            stream_id=gc.stream_id,
            node_id=self.node_spec.id,
            worker_id=self.node_spec.id,
            error=error,
            execution_id=gc.execution_id,
        )

    async def _apply_continuous_transition(self, activations: list[Activation]) -> None:
        """Apply continuous mode conversation threading for the next node.

        This prepares the inherited conversation before the completion event
        is published so downstream workers receive a fully updated thread.
        """
        gc = self._gc
        if not gc.is_continuous or not gc.continuous_conversation:
            return

        next_node_id = next((activation.target_id for activation in activations), None)
        if not next_node_id:
            return

        next_spec = gc.graph.get_node(next_node_id)
        if not next_spec or next_spec.node_type != "event_loop":
            return

        from framework.graph.prompting import (
            TransitionSpec,
            build_narrative,
            build_system_prompt_for_node_context,
            build_transition_message,
        )

        narrative = build_narrative(gc.buffer, gc.path, gc.graph)
        next_ctx = build_node_context_from_graph_context(
            gc,
            node_spec=next_spec,
            pause_event=self._pause_requested,
            inherited_conversation=gc.continuous_conversation,
            narrative=narrative,
        )
        gc.continuous_conversation.update_system_prompt(
            build_system_prompt_for_node_context(next_ctx)
        )
        gc.continuous_conversation.set_current_phase(next_spec.id)

        buffer_items, data_files = self._prepare_transition_payload()
        marker = build_transition_message(
            TransitionSpec(
                previous_name=self.node_spec.name,
                previous_description=self.node_spec.description,
                next_name=next_spec.name,
                next_description=next_spec.description,
                next_output_keys=tuple(next_spec.output_keys or ()),
                buffer_items=buffer_items,
                cumulative_tool_names=tuple(sorted(gc.cumulative_tool_names)),
                data_files=tuple(data_files),
            )
        )
        await gc.continuous_conversation.add_user_message(
            marker,
            is_transition_marker=True,
        )

    def _prepare_transition_payload(self) -> tuple[dict[str, str], list[str]]:
        """Build transition marker data and spill oversized values when possible."""
        import json
        from pathlib import Path

        gc = self._gc
        data_dir = Path(gc.storage_path / "data") if gc.storage_path else None
        buffer_items: dict[str, str] = {}

        for key, value in gc.buffer.read_all().items():
            if value is None:
                continue
            val_str = str(value)
            if len(val_str) > 300 and data_dir is not None:
                data_dir.mkdir(parents=True, exist_ok=True)
                ext = ".json" if isinstance(value, (dict, list)) else ".txt"
                filename = f"output_{key}{ext}"
                file_path = data_dir / filename
                try:
                    write_content = (
                        json.dumps(value, indent=2, ensure_ascii=False)
                        if isinstance(value, (dict, list))
                        else str(value)
                    )
                    file_path.write_text(write_content, encoding="utf-8")
                    file_size = file_path.stat().st_size
                    buffer_items[key] = (
                        f"[Saved to '{filename}' ({file_size:,} bytes). "
                        f"Use load_data(filename='{filename}') to access.]"
                    )
                    continue
                except Exception:
                    pass

            buffer_items[key] = val_str[:300] + "..." if len(val_str) > 300 else val_str

        data_files: list[str] = []
        if data_dir is not None and data_dir.exists():
            data_files = [
                f"{entry.name} ({entry.stat().st_size:,} bytes)"
                for entry in sorted(data_dir.iterdir())
                if entry.is_file()
            ]

        return buffer_items, data_files

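    # Illustrative example of the spillover rule above (hypothetical key/value):
    # a 12 kB dict under key "report" becomes the file "output_report.json" and
    # the transition marker shows
    #   [Saved to 'output_report.json' (12,288 bytes). Use load_data(filename='output_report.json') to access.]
    # while values of 300 characters or fewer are inlined verbatim.
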
    # ------------------------------------------------------------------
    # Utility
    # ------------------------------------------------------------------

    def pause(self) -> None:
        self._pause_requested.set()
        self._run_gate.clear()

    def resume(self) -> None:
        self._pause_requested.clear()
        self._run_gate.set()

    @property
    def is_terminal(self) -> bool:
        return self.node_spec.id in (self._gc.graph.terminal_nodes or [])

    @property
    def is_entry(self) -> bool:
        return len(self.incoming_edges) == 0


def _default_max_context_tokens() -> int:
    """Resolve max_context_tokens from global config, falling back to 32000."""
    try:
        from framework.config import get_max_context_tokens  # type: ignore[import-untyped]

        return get_max_context_tokens()
    except Exception:
        return 32_000

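A minimal sketch of the fan-out handshake between the dataclasses above, assuming the module path framework/graph/worker_agent.py implied by the file's imports (not confirmed by the diff):

from framework.graph.worker_agent import FanOutTag, FanOutTracker  # assumed path

# A source "planner" fans out to branches "research" and "draft".
tag = FanOutTag(
    fan_out_id="planner_ab12cd34",
    fan_out_source="planner",
    branches=frozenset({"research", "draft"}),
    via_branch="research",
)
tracker = FanOutTracker(fan_out_id=tag.fan_out_id, branches=tag.branches)
tracker.reached.add("research")
assert not tracker.is_complete  # still waiting on "draft"
tracker.reached.add("draft")
assert tracker.is_complete      # the convergence worker may now run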
@@ -0,0 +1,706 @@
"""Antigravity (Google internal Cloud Code Assist) LLM provider.

Antigravity is Google's unified gateway API that routes requests to Gemini,
Claude, and GPT-OSS models through a single Gemini-style interface. It is
NOT the public ``generativelanguage.googleapis.com`` API.

Authentication uses Google OAuth2. Token refresh is done directly with the
OAuth client secret — no local proxy required.

Credential sources (checked in order):
1. ``~/.hive/antigravity-accounts.json`` (native OAuth implementation)
2. Antigravity IDE SQLite state DB (macOS / Linux)
"""

from __future__ import annotations

import json
import logging
import re
import time
import uuid
from collections.abc import AsyncIterator, Callable, Iterator
from pathlib import Path
from typing import Any

from framework.llm.provider import LLMProvider, LLMResponse, Tool
from framework.llm.stream_events import (
    FinishEvent,
    StreamErrorEvent,
    StreamEvent,
    TextDeltaEvent,
    TextEndEvent,
    ToolCallEvent,
)

logger = logging.getLogger(__name__)

# ---------------------------------------------------------------------------
# Constants
# ---------------------------------------------------------------------------

_TOKEN_URL = "https://oauth2.googleapis.com/token"

# Fallback order: daily sandbox → autopush sandbox → production
_ENDPOINTS = [
    "https://daily-cloudcode-pa.sandbox.googleapis.com",
    "https://autopush-cloudcode-pa.sandbox.googleapis.com",
    "https://cloudcode-pa.googleapis.com",
]
_DEFAULT_PROJECT_ID = "rising-fact-p41fc"
_TOKEN_REFRESH_BUFFER_SECS = 60

# Credentials file in ~/.hive/ (native implementation)
_ACCOUNTS_FILE = Path.home() / ".hive" / "antigravity-accounts.json"
_IDE_STATE_DB_MAC = (
    Path.home()
    / "Library"
    / "Application Support"
    / "Antigravity"
    / "User"
    / "globalStorage"
    / "state.vscdb"
)
_IDE_STATE_DB_LINUX = (
    Path.home() / ".config" / "Antigravity" / "User" / "globalStorage" / "state.vscdb"
)
_IDE_STATE_DB_KEY = "antigravityUnifiedStateSync.oauthToken"

_BASE_HEADERS: dict[str, str] = {
    # Mimic the Antigravity Electron app so the API accepts the request.
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Antigravity/1.18.3 Chrome/138.0.7204.235 "
        "Electron/37.3.1 Safari/537.36"
    ),
    "X-Goog-Api-Client": "google-cloud-sdk vscode_cloudshelleditor/0.1",
    "Client-Metadata": '{"ideType":"ANTIGRAVITY","platform":"MACOS","pluginType":"GEMINI"}',
}


# ---------------------------------------------------------------------------
# Credential loading helpers
# ---------------------------------------------------------------------------


def _load_from_json_file() -> tuple[str | None, str | None, str, float]:
    """Read credentials from JSON accounts file.

    Reads from ~/.hive/antigravity-accounts.json.

    Returns ``(access_token | None, refresh_token | None, project_id, expires_at)``.
    ``expires_at`` is a Unix timestamp (seconds); 0.0 means unknown.
    """
    if not _ACCOUNTS_FILE.exists():
        return None, None, _DEFAULT_PROJECT_ID, 0.0
    try:
        with open(_ACCOUNTS_FILE, encoding="utf-8") as fh:
            data = json.load(fh)
    except (OSError, json.JSONDecodeError) as exc:
        logger.debug("Failed to read Antigravity accounts file: %s", exc)
        return None, None, _DEFAULT_PROJECT_ID, 0.0

    accounts = data.get("accounts", [])
    if not accounts:
        return None, None, _DEFAULT_PROJECT_ID, 0.0

    account = next((a for a in accounts if a.get("enabled", True) is not False), accounts[0])
    schema_version = data.get("schemaVersion", 1)

    if schema_version >= 4:
        # V4 schema: refresh = "refreshToken|projectId[|managedProjectId]"
        refresh_str = account.get("refresh", "")
        parts = refresh_str.split("|") if refresh_str else []
        refresh_token: str | None = parts[0] if parts else None
        project_id = parts[1] if len(parts) >= 2 and parts[1] else _DEFAULT_PROJECT_ID

        access_token: str | None = account.get("access")
        expires_ms: int = account.get("expires", 0)
        expires_at = float(expires_ms) / 1000.0 if expires_ms else 0.0

        # Treat near-expiry tokens as absent so _ensure_token() triggers a refresh.
        if access_token and expires_at and time.time() >= expires_at - _TOKEN_REFRESH_BUFFER_SECS:
            access_token = None
            expires_at = 0.0

        return access_token, refresh_token, project_id, expires_at
    else:
        # V1–V3 schema: plain accessToken / refreshToken fields
        access_token = account.get("accessToken")
        refresh_token = account.get("refreshToken")
        # Estimate expiry from last_refresh + 1 h
        last_refresh_str: str | None = data.get("last_refresh")
        expires_at = 0.0
        if last_refresh_str:
            try:
                from datetime import datetime  # noqa: PLC0415

                ts = datetime.fromisoformat(last_refresh_str.replace("Z", "+00:00")).timestamp()
                expires_at = ts + 3600.0
                if time.time() >= expires_at - _TOKEN_REFRESH_BUFFER_SECS:
                    access_token = None
            except (ValueError, TypeError):
                pass
        return access_token, refresh_token, _DEFAULT_PROJECT_ID, expires_at

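# Illustrative V4 account entry (hypothetical values) for the parser above:
#   {"access": "ya29.a0...", "refresh": "1//0gabc|my-project-id", "expires": 1767225600000}
# splits into refresh_token="1//0gabc", project_id="my-project-id", and an
# expires_at of 1767225600.0 seconds (milliseconds divided by 1000).
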
def _load_from_ide_db() -> tuple[str | None, str | None, float]:
    """Extract ``(access_token, refresh_token, expires_at)`` from the IDE SQLite DB."""
    import base64  # noqa: PLC0415
    import sqlite3  # noqa: PLC0415

    for db_path in (_IDE_STATE_DB_MAC, _IDE_STATE_DB_LINUX):
        if not db_path.exists():
            continue
        try:
            con = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
            try:
                row = con.execute(
                    "SELECT value FROM ItemTable WHERE key = ?",
                    (_IDE_STATE_DB_KEY,),
                ).fetchone()
            finally:
                con.close()
            if not row:
                continue

            blob = base64.b64decode(row[0])
            candidates = re.findall(rb"[A-Za-z0-9+/=_\-]{40,}", blob)
            access_token: str | None = None
            refresh_token: str | None = None
            for candidate in candidates:
                try:
                    padded = candidate + b"=" * (-len(candidate) % 4)
                    inner = base64.urlsafe_b64decode(padded)
                except Exception:
                    continue
                if not access_token:
                    m = re.search(rb"ya29\.[A-Za-z0-9_\-\.]+", inner)
                    if m:
                        access_token = m.group(0).decode("ascii")
                if not refresh_token:
                    m = re.search(rb"1//[A-Za-z0-9_\-\.]+", inner)
                    if m:
                        refresh_token = m.group(0).decode("ascii")
                if access_token and refresh_token:
                    break

            if access_token:
                # Estimate expiry from DB mtime (IDE refreshes while running)
                mtime = db_path.stat().st_mtime
                expires_at = mtime + 3600.0
                return access_token, refresh_token, expires_at
        except Exception as exc:
            logger.debug("Failed to read Antigravity IDE state DB: %s", exc)
            continue
    return None, None, 0.0


def _do_token_refresh(refresh_token: str) -> tuple[str, float] | None:
    """POST to Google OAuth endpoint and return ``(new_access_token, expires_at)``.

    The client secret is sourced via ``get_antigravity_client_secret()`` (env var,
    config file, or npm package fallback). When unavailable the refresh is attempted
    without it — Google will reject it for web-app clients, but the npm fallback in
    ``get_antigravity_client_secret()`` should ensure the secret is found at runtime.

    Returns None when the HTTP request fails.
    """
    from framework.config import get_antigravity_client_secret  # noqa: PLC0415

    client_secret = get_antigravity_client_secret()
    if not client_secret:
        logger.debug(
            "Antigravity client secret not configured — attempting refresh without it. "
            "Set ANTIGRAVITY_CLIENT_SECRET or run quickstart to configure."
        )

    import urllib.error  # noqa: PLC0415
    import urllib.parse  # noqa: PLC0415
    import urllib.request  # noqa: PLC0415

    from framework.config import get_antigravity_client_id  # noqa: PLC0415

    params: dict[str, str] = {
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
        "client_id": get_antigravity_client_id(),
    }
    if client_secret:
        params["client_secret"] = client_secret
    body = urllib.parse.urlencode(params).encode("utf-8")

    req = urllib.request.Request(
        _TOKEN_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=15) as resp:  # noqa: S310
            payload = json.loads(resp.read())
            access_token: str = payload["access_token"]
            expires_in: int = payload.get("expires_in", 3600)
            logger.debug("Antigravity token refreshed successfully")
            return access_token, time.time() + expires_in
    except Exception as exc:
        logger.debug("Antigravity token refresh failed: %s", exc)
        return None


# ---------------------------------------------------------------------------
# Message conversion helpers
# ---------------------------------------------------------------------------


def _clean_tool_name(name: str) -> str:
    """Sanitize a tool name for the Antigravity function-calling schema."""
    name = re.sub(r"[/\s]", "_", name)
    if name and not (name[0].isalpha() or name[0] == "_"):
        name = "_" + name
    return name[:64]

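# Example transformations (hypothetical tool names):
#   _clean_tool_name("web/search")  -> "web_search"   (slash replaced)
#   _clean_tool_name("2nd tool")    -> "_2nd_tool"    (leading digit prefixed)
#   _clean_tool_name("x" * 100)     -> 64 characters  (truncated to the schema limit)
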
def _to_gemini_contents(
    messages: list[dict[str, Any]],
    thought_sigs: dict[str, str] | None = None,
) -> list[dict[str, Any]]:
    """Convert OpenAI-format messages to Gemini-style ``contents`` array."""
    # Pre-build a map tool_call_id → function_name from assistant messages.
    # Tool result messages (role="tool") only carry tool_call_id, not the name,
    # but Gemini requires functionResponse.name to match the functionCall.name.
    tc_id_to_name: dict[str, str] = {}
    for msg in messages:
        if msg.get("role") == "assistant":
            for tc in msg.get("tool_calls") or []:
                tc_id = tc.get("id")
                fn_name = tc.get("function", {}).get("name", "")
                if tc_id and fn_name:
                    tc_id_to_name[tc_id] = fn_name

    contents: list[dict[str, Any]] = []
    # Consecutive tool-result messages must be batched into one user turn.
    pending_tool_parts: list[dict[str, Any]] = []

    def _flush_tool_results() -> None:
        if pending_tool_parts:
            contents.append({"role": "user", "parts": list(pending_tool_parts)})
            pending_tool_parts.clear()

    for msg in messages:
        role = msg.get("role", "user")
        content = msg.get("content")

        if role == "system":
            continue  # Handled via systemInstruction, not in contents.

        if role == "tool":
            # OpenAI tool result → Gemini functionResponse part.
            result_str = content if isinstance(content, str) else str(content or "")
            tc_id = msg.get("tool_call_id", "")
            # Look up function name from the pre-built map; fall back to msg.name.
            fn_name = tc_id_to_name.get(tc_id) or msg.get("name", "")
            pending_tool_parts.append(
                {
                    "functionResponse": {
                        "name": fn_name,
                        "id": tc_id,
                        "response": {"content": result_str},
                    }
                }
            )
            continue

        _flush_tool_results()

        gemini_role = "model" if role == "assistant" else "user"
        parts: list[dict[str, Any]] = []

        if isinstance(content, str) and content:
            parts.append({"text": content})
        elif isinstance(content, list):
            for block in content:
                if not isinstance(block, dict):
                    continue
                if block.get("type") == "text":
                    text = block.get("text", "")
                    if text:
                        parts.append({"text": text})
                # Other block types (image_url etc.) skipped.

        # Assistant messages may carry OpenAI-style tool_calls.
        for tc in msg.get("tool_calls") or []:
            fn = tc.get("function", {})
            try:
                args = json.loads(fn.get("arguments", "{}") or "{}")
            except (json.JSONDecodeError, TypeError):
                args = {}
            tc_id = tc.get("id", str(uuid.uuid4()))
            fc_part: dict[str, Any] = {
                "functionCall": {
                    "name": fn.get("name", ""),
                    "args": args,
                    "id": tc_id,
                }
            }
            if thought_sigs:
                sig = thought_sigs.get(tc_id, "")
                if sig:
                    fc_part["thoughtSignature"] = sig  # part-level, not inside functionCall
            parts.append(fc_part)

        if parts:
            contents.append({"role": gemini_role, "parts": parts})

    _flush_tool_results()

    # Gemini requires the first turn to be a user turn. Drop any leading
    # model messages so the API doesn't reject with a 400.
    while contents and contents[0].get("role") == "model":
        contents.pop(0)

    return contents

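# Minimal before/after sketch for the converter above (hypothetical messages):
#   [{"role": "user", "content": "hi"},
#    {"role": "assistant", "tool_calls": [{"id": "t1",
#        "function": {"name": "lookup", "arguments": "{\"q\": 1}"}}]},
#    {"role": "tool", "tool_call_id": "t1", "content": "42"}]
# becomes
#   [{"role": "user", "parts": [{"text": "hi"}]},
#    {"role": "model", "parts": [{"functionCall": {"name": "lookup", "args": {"q": 1}, "id": "t1"}}]},
#    {"role": "user", "parts": [{"functionResponse": {"name": "lookup", "id": "t1",
#        "response": {"content": "42"}}}]}]
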
# ---------------------------------------------------------------------------
# Response parsing helpers
# ---------------------------------------------------------------------------


def _map_finish_reason(reason: str) -> str:
    return {"STOP": "stop", "MAX_TOKENS": "max_tokens", "OTHER": "tool_use"}.get(
        (reason or "").upper(), "stop"
    )


def _parse_complete_response(raw: dict[str, Any], model: str) -> LLMResponse:
    """Parse a non-streaming Antigravity response dict → LLMResponse."""
    payload: dict[str, Any] = raw.get("response", raw)
    candidates: list[dict[str, Any]] = payload.get("candidates", [])
    usage: dict[str, Any] = payload.get("usageMetadata", {})

    text_parts: list[str] = []
    if candidates:
        for part in candidates[0].get("content", {}).get("parts", []):
            if "text" in part and not part.get("thought"):
                text_parts.append(part["text"])

    return LLMResponse(
        content="".join(text_parts),
        model=payload.get("modelVersion", model),
        input_tokens=usage.get("promptTokenCount", 0),
        output_tokens=usage.get("candidatesTokenCount", 0),
        stop_reason=_map_finish_reason(candidates[0].get("finishReason", "") if candidates else ""),
        raw_response=raw,
    )


def _parse_sse_stream(
    response: Any,
    model: str,
    on_thought_signature: Callable[[str, str], None] | None = None,
) -> Iterator[StreamEvent]:
    """Parse Antigravity SSE response line-by-line → StreamEvents.

    Each SSE line looks like::

        data: {"response": {"candidates": [...], "usageMetadata": {...}}, "traceId": "..."}
    """
    accumulated = ""
    input_tokens = 0
    output_tokens = 0
    finish_reason = ""

    for raw_line in response:
        line: str = raw_line.decode("utf-8", errors="replace").rstrip("\r\n")
        if not line.startswith("data:"):
            continue
        data_str = line[5:].strip()
        if not data_str or data_str == "[DONE]":
            continue
        try:
            data: dict[str, Any] = json.loads(data_str)
        except json.JSONDecodeError:
            continue

        # The outer envelope is {"response": {...}, "traceId": "..."}.
        payload: dict[str, Any] = data.get("response", data)

        usage = payload.get("usageMetadata", {})
        if usage:
            input_tokens = usage.get("promptTokenCount", input_tokens)
            output_tokens = usage.get("candidatesTokenCount", output_tokens)

        for candidate in payload.get("candidates", []):
            fr = candidate.get("finishReason", "")
            if fr:
                finish_reason = fr

            for part in candidate.get("content", {}).get("parts", []):
                if "text" in part and not part.get("thought"):
                    delta: str = part["text"]
                    accumulated += delta
                    yield TextDeltaEvent(content=delta, snapshot=accumulated)
                elif "functionCall" in part:
                    fc: dict[str, Any] = part["functionCall"]
                    tool_use_id = fc.get("id") or str(uuid.uuid4())
                    thought_sig = part.get("thoughtSignature", "")  # sibling of functionCall
                    if thought_sig and on_thought_signature:
                        on_thought_signature(tool_use_id, thought_sig)
                    args = fc.get("args", {})
                    if isinstance(args, str):
                        try:
                            args = json.loads(args)
                        except json.JSONDecodeError:
                            args = {}
                    yield ToolCallEvent(
                        tool_use_id=tool_use_id,
                        tool_name=fc.get("name", ""),
                        tool_input=args,
                    )

    if accumulated:
        yield TextEndEvent(full_text=accumulated)
    yield FinishEvent(
        stop_reason=_map_finish_reason(finish_reason),
        input_tokens=input_tokens,
        output_tokens=output_tokens,
        model=model,
    )


# ---------------------------------------------------------------------------
# Provider
# ---------------------------------------------------------------------------

class AntigravityProvider(LLMProvider):
    """LLM provider for Google's internal Antigravity Code Assist gateway.

    No local proxy required. Handles OAuth token refresh, Gemini-format
    request/response conversion, and SSE streaming directly.
    """

    def __init__(self, model: str = "gemini-3-flash") -> None:
        # Strip any provider prefix ("openai/gemini-3-flash" → "gemini-3-flash").
        if "/" in model:
            model = model.split("/", 1)[1]
        self.model = model

        self._access_token: str | None = None
        self._refresh_token: str | None = None
        self._project_id: str = _DEFAULT_PROJECT_ID
        self._token_expires_at: float = 0.0
        self._thought_sigs: dict[str, str] = {}  # tool_use_id → thoughtSignature

        self._init_credentials()

    # --- Credential management -------------------------------------------- #

    def _init_credentials(self) -> None:
        """Load credentials from the best available source."""
        access, refresh, project_id, expires_at = _load_from_json_file()
        if refresh:
            self._refresh_token = refresh
            self._project_id = project_id
            self._access_token = access
            self._token_expires_at = expires_at
            return

        # Fall back to IDE state DB.
        access, refresh, expires_at = _load_from_ide_db()
        if access:
            self._access_token = access
            self._refresh_token = refresh
            self._token_expires_at = expires_at

    def has_credentials(self) -> bool:
        """Return True if any credential is available."""
        return bool(self._access_token or self._refresh_token)

    def _ensure_token(self) -> str:
        """Return a valid access token, refreshing via OAuth if needed."""
        if (
            self._access_token
            and self._token_expires_at
            and time.time() < self._token_expires_at - _TOKEN_REFRESH_BUFFER_SECS
        ):
            return self._access_token

        if self._refresh_token:
            result = _do_token_refresh(self._refresh_token)
            if result:
                self._access_token, self._token_expires_at = result
                return self._access_token

        if self._access_token:
            logger.warning("Using potentially stale Antigravity access token")
            return self._access_token

        raise RuntimeError(
            "No valid Antigravity credentials. "
            "Run: uv run python core/antigravity_auth.py auth account add"
        )

    # --- Request building -------------------------------------------------- #

    def _build_body(
        self,
        messages: list[dict[str, Any]],
        system: str,
        tools: list[Tool] | None,
        max_tokens: int,
    ) -> dict[str, Any]:
        contents = _to_gemini_contents(messages, self._thought_sigs)
        inner: dict[str, Any] = {
            "contents": contents,
            "generationConfig": {"maxOutputTokens": max_tokens},
        }
        if system:
            inner["systemInstruction"] = {"parts": [{"text": system}]}
        if tools:
            inner["tools"] = [
                {
                    "functionDeclarations": [
                        {
                            "name": _clean_tool_name(t.name),
                            "description": t.description,
                            "parameters": t.parameters
                            or {
                                "type": "object",
                                "properties": {},
                            },
                        }
                        for t in tools
                    ]
                }
            ]
        return {
            "project": self._project_id,
            "model": self.model,
            "request": inner,
            "requestType": "agent",
            "userAgent": "antigravity",
            "requestId": f"agent-{uuid.uuid4()}",
        }

    # --- HTTP transport ---------------------------------------------------- #

    def _post(self, body: dict[str, Any], *, streaming: bool) -> Any:
        """POST to the Antigravity endpoint, falling back through the endpoint list."""
        import urllib.error  # noqa: PLC0415
        import urllib.request  # noqa: PLC0415

        token = self._ensure_token()
        body_bytes = json.dumps(body).encode("utf-8")
        path = (
            "/v1internal:streamGenerateContent?alt=sse"
            if streaming
            else "/v1internal:generateContent"
        )
        headers = {
            **_BASE_HEADERS,
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        }
        if streaming:
            headers["Accept"] = "text/event-stream"

        last_exc: Exception | None = None
        for base_url in _ENDPOINTS:
            url = f"{base_url}{path}"
            req = urllib.request.Request(url, data=body_bytes, headers=headers, method="POST")
            try:
                return urllib.request.urlopen(req, timeout=120)  # noqa: S310
            except urllib.error.HTTPError as exc:
                if exc.code in (401, 403) and self._refresh_token:
                    # Token rejected — refresh once and retry this endpoint.
                    result = _do_token_refresh(self._refresh_token)
                    if result:
                        self._access_token, self._token_expires_at = result
                        headers["Authorization"] = f"Bearer {self._access_token}"
                        req2 = urllib.request.Request(
                            url, data=body_bytes, headers=headers, method="POST"
                        )
                        try:
                            return urllib.request.urlopen(req2, timeout=120)  # noqa: S310
                        except urllib.error.HTTPError as exc2:
                            last_exc = exc2
                            continue
                    last_exc = exc
                    continue
                elif exc.code >= 500:
                    last_exc = exc
                    continue
                # Include the API response body in the exception for easier debugging.
                try:
                    err_body = exc.read().decode("utf-8", errors="replace")
                except Exception:
                    err_body = "(unreadable)"
                raise RuntimeError(f"Antigravity HTTP {exc.code} from {url}: {err_body}") from exc
            except (urllib.error.URLError, OSError) as exc:
                last_exc = exc
                continue

        raise RuntimeError(
            f"All Antigravity endpoints failed. Last error: {last_exc}"
        ) from last_exc

    # --- LLMProvider interface --------------------------------------------- #

    def complete(
        self,
        messages: list[dict[str, Any]],
        system: str = "",
        tools: list[Tool] | None = None,
        max_tokens: int = 1024,
        response_format: dict[str, Any] | None = None,
        json_mode: bool = False,
        max_retries: int | None = None,
    ) -> LLMResponse:
        if json_mode:
            suffix = "\n\nPlease respond with a valid JSON object."
            system = (system + suffix) if system else suffix.strip()

        body = self._build_body(messages, system, tools, max_tokens)
        resp = self._post(body, streaming=False)
        return _parse_complete_response(json.loads(resp.read()), self.model)

    async def stream(
        self,
        messages: list[dict[str, Any]],
        system: str = "",
        tools: list[Tool] | None = None,
        max_tokens: int = 4096,
    ) -> AsyncIterator[StreamEvent]:
        import asyncio  # noqa: PLC0415
        import concurrent.futures  # noqa: PLC0415

        loop = asyncio.get_running_loop()
        queue: asyncio.Queue[StreamEvent | None] = asyncio.Queue()

        def _blocking_work() -> None:
            try:
                body = self._build_body(messages, system, tools, max_tokens)
                http_resp = self._post(body, streaming=True)
                for event in _parse_sse_stream(
                    http_resp, self.model, self._thought_sigs.__setitem__
                ):
                    loop.call_soon_threadsafe(queue.put_nowait, event)
            except Exception as exc:
                logger.error("Antigravity stream error: %s", exc)
                loop.call_soon_threadsafe(queue.put_nowait, StreamErrorEvent(error=str(exc)))
            finally:
                loop.call_soon_threadsafe(queue.put_nowait, None)  # sentinel

        executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
        fut = loop.run_in_executor(executor, _blocking_work)
        try:
            while True:
                event = await queue.get()
                if event is None:
                    break
                yield event
        finally:
            await fut
            executor.shutdown(wait=False)
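A minimal usage sketch, assuming the module path framework/llm/antigravity.py (not shown in this diff) and working credentials on disk:

from framework.llm.antigravity import AntigravityProvider  # assumed path

provider = AntigravityProvider(model="gemini-3-flash")
if provider.has_credentials():
    resp = provider.complete(
        messages=[{"role": "user", "content": "Say hello in one word."}],
        system="You are terse.",
        max_tokens=32,
    )
    print(resp.content, resp.stop_reason)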
@@ -0,0 +1,106 @@
|
||||
"""Model capability checks for LLM providers.
|
||||
|
||||
Vision support rules are derived from official vendor documentation:
|
||||
- ZAI (z.ai): docs.z.ai/guides/vlm — GLM-4.6V variants are vision; GLM-5/4.6/4.7 are text-only
|
||||
- MiniMax: platform.minimax.io/docs — minimax-vl-01 is vision; M2.x are text-only
|
||||
- DeepSeek: api-docs.deepseek.com — deepseek-vl2 is vision; chat/reasoner are text-only
|
||||
- Cerebras: inference-docs.cerebras.ai — no vision models at all
|
||||
- Groq: console.groq.com/docs/vision — vision capable; treat as supported by default
|
||||
- Ollama/LM Studio/vLLM/llama.cpp: local runners denied by default; model names
|
||||
don't reliably indicate vision support, so users must configure explicitly
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
|
||||
def _model_name(model: str) -> str:
|
||||
"""Return the bare model name after stripping any 'provider/' prefix."""
|
||||
if "/" in model:
|
||||
return model.split("/", 1)[1]
|
||||
return model
|
||||
|
||||
|
||||
# Step 1: explicit vision allow-list — these always support images regardless
|
||||
# of what the provider-level rules say. Checked first so that e.g. glm-4.6v
|
||||
# is allowed even though glm-4.6 is denied.
|
||||
_VISION_ALLOW_BARE_PREFIXES: tuple[str, ...] = (
|
||||
# ZAI/GLM vision models (docs.z.ai/guides/vlm)
|
||||
"glm-4v", # GLM-4V series (legacy)
|
||||
"glm-4.6v", # GLM-4.6V, GLM-4.6V-flash, GLM-4.6V-flashx
|
||||
# DeepSeek vision models
|
||||
"deepseek-vl", # deepseek-vl2, deepseek-vl2-small, deepseek-vl2-tiny
|
||||
# MiniMax vision model
|
||||
"minimax-vl", # minimax-vl-01
|
||||
)
|
||||
|
||||
# Step 2: provider-level deny — every model from this provider is text-only.
|
||||
_TEXT_ONLY_PROVIDER_PREFIXES: tuple[str, ...] = (
|
||||
# Cerebras: inference-docs.cerebras.ai lists only text models
|
||||
"cerebras/",
|
||||
# Local runners: model names don't reliably indicate vision support
|
||||
"ollama/",
|
||||
"ollama_chat/",
|
||||
"lm_studio/",
|
||||
"vllm/",
|
||||
"llamacpp/",
|
||||
)
|
||||
|
||||
# Step 3: per-model deny — text-only models within otherwise mixed providers.
|
||||
# Matched against the bare model name (provider prefix stripped, lower-cased).
|
||||
# The vision allow-list above is checked first, so vision variants of the same
|
||||
# family are already handled before these deny patterns are reached.
|
||||
_TEXT_ONLY_MODEL_BARE_PREFIXES: tuple[str, ...] = (
|
||||
# --- ZAI / GLM family ---
|
||||
# text-only: glm-5, glm-4.6, glm-4.7, glm-4.5, zai-glm-*
|
||||
# vision: glm-4v, glm-4.6v (caught by allow-list above)
|
||||
"glm-5",
|
||||
"glm-4.6", # bare glm-4.6 is text-only; glm-4.6v is caught by allow-list
|
||||
"glm-4.7",
|
||||
"glm-4.5",
|
||||
"zai-glm",
|
||||
# --- DeepSeek ---
|
||||
# text-only: deepseek-chat, deepseek-coder, deepseek-reasoner
|
||||
# vision: deepseek-vl2 (caught by allow-list above)
|
||||
# Note: LiteLLM's deepseek handler may flatten content lists for some models;
|
||||
# VL models are allowed through and rely on LiteLLM's native VL support.
|
||||
"deepseek-chat",
|
||||
"deepseek-coder",
|
||||
"deepseek-reasoner",
|
||||
# --- MiniMax ---
|
||||
# text-only: minimax-m2.*, minimax-text-*, abab* (legacy)
|
||||
# vision: minimax-vl-01 (caught by allow-list above)
|
||||
"minimax-m2",
|
||||
"minimax-text",
|
||||
"abab",
|
||||
)
|
||||
|
||||
|
||||
def supports_image_tool_results(model: str) -> bool:
|
||||
"""Return whether *model* can receive image content in messages.
|
||||
|
||||
Used to gate both user-message images and tool-result image blocks.
|
||||
|
||||
Logic (checked in order):
|
||||
1. Vision allow-list → True (known vision model, skip all denies)
|
||||
2. Provider deny → False (entire provider is text-only)
|
||||
3. Model deny → False (specific text-only model within a mixed provider)
|
||||
4. Default → True (assume capable; unknown providers and models)
|
||||
"""
|
||||
model_lower = model.lower()
|
||||
bare = _model_name(model_lower)
|
||||
|
||||
# 1. Explicit vision allow — takes priority over all denies
|
||||
if any(bare.startswith(p) for p in _VISION_ALLOW_BARE_PREFIXES):
|
||||
return True
|
||||
|
||||
# 2. Provider-level deny (all models from this provider are text-only)
|
||||
if any(model_lower.startswith(p) for p in _TEXT_ONLY_PROVIDER_PREFIXES):
|
||||
return False
|
||||
|
||||
# 3. Per-model deny (text-only variants within mixed-capability families)
|
||||
if any(bare.startswith(p) for p in _TEXT_ONLY_MODEL_BARE_PREFIXES):
|
||||
return False
|
||||
|
||||
    # 4. Default: assume vision capable
    # Covers: OpenAI, Anthropic, Google, Mistral, Kimi, and other hosted providers
    return True
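
# --- Illustrative checks (editor sketch, not part of this diff) ---
# Each line exercises one step of the precedence order documented above;
# the model names come from the allow/deny tables in this module.
#
#     supports_image_tool_results("zai/glm-4.6v")   # True  - step 1 allow-list
#     supports_image_tool_results("ollama/llava")   # False - step 2 provider deny
#     supports_image_tool_results("deepseek-chat")  # False - step 3 model deny
#     supports_image_tool_results("openai/gpt-4o")  # True  - step 4 default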
+871 -17
File diff suppressed because it is too large
@@ -45,6 +45,8 @@ class ToolResult:
    tool_use_id: str
    content: str
    is_error: bool = False
    image_content: list[dict[str, Any]] | None = None
    is_skill_content: bool = False  # AS-10: marks activated skill body, protected from pruning
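
    # --- Illustrative construction (editor sketch, not part of this diff) ---
    # The image block shape below mirrors the "image_url" blocks the MCP client
    # produces later in this changeset; treat the exact payload as an assumption.
    #
    #     ToolResult(
    #         tool_use_id="toolu_123",
    #         content="screenshot captured",
    #         image_content=[{
    #             "type": "image_url",
    #             "image_url": {"url": "data:image/png;base64,..."},
    #         }],
    #     )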


class LLMProvider(ABC):

@@ -1,33 +0,0 @@
"""Framework-level worker monitoring package.

Provides the Worker Health Judge: a reusable secondary graph that attaches to
any worker agent runtime and monitors its execution health via periodic log
inspection. Emits structured EscalationTickets when degradation is detected.

Usage::

    from framework.monitoring import HEALTH_JUDGE_ENTRY_POINT, judge_goal, judge_graph
    from framework.tools.worker_monitoring_tools import register_worker_monitoring_tools

    # Register tools bound to the worker runtime's EventBus
    monitoring_registry = ToolRegistry()
    register_worker_monitoring_tools(monitoring_registry, worker_runtime._event_bus, storage_path)

    # Load judge as secondary graph on the worker runtime
    await worker_runtime.add_graph(
        graph_id="judge",
        graph=judge_graph,
        goal=judge_goal,
        entry_points={"health_check": HEALTH_JUDGE_ENTRY_POINT},
        storage_subpath="graphs/judge",
    )
"""

from .judge import HEALTH_JUDGE_ENTRY_POINT, judge_goal, judge_graph, judge_node

__all__ = [
    "HEALTH_JUDGE_ENTRY_POINT",
    "judge_goal",
    "judge_graph",
    "judge_node",
]

@@ -1,258 +0,0 @@
"""Worker Health Judge — framework-level reusable monitoring graph.

Attaches to any worker agent runtime as a secondary graph. Fires on a
2-minute timer, reads the worker's session logs via ``get_worker_health_summary``,
accumulates observations in a continuous conversation context, and emits a
structured ``EscalationTicket`` when it detects a degradation pattern.

Usage::

    from framework.monitoring import judge_graph, judge_goal, HEALTH_JUDGE_ENTRY_POINT
    from framework.tools.worker_monitoring_tools import register_worker_monitoring_tools

    # Register tools bound to the worker runtime's event bus
    monitoring_registry = ToolRegistry()
    register_worker_monitoring_tools(
        monitoring_registry, worker_runtime._event_bus, storage_path
    )
    monitoring_tools = list(monitoring_registry.get_tools().values())
    monitoring_executor = monitoring_registry.get_executor()

    # Load judge as secondary graph on the worker runtime
    await worker_runtime.add_graph(
        graph_id="judge",
        graph=judge_graph,
        goal=judge_goal,
        entry_points={"health_check": HEALTH_JUDGE_ENTRY_POINT},
        storage_subpath="graphs/judge",
    )

Design:
- ``isolation_level="isolated"`` — the judge has its own memory, not
  polluting the worker's shared memory namespace.
- ``conversation_mode="continuous"`` — the judge's conversation carries
  across timer ticks. The conversation IS the judge's memory. It tracks
  trends by referring to its own prior messages ("Last check I saw 47
  steps; now 52; 5 new steps, 3 RETRY").
- No shared memory keys. No external state files.
"""

from __future__ import annotations

from framework.graph import Constraint, Goal, NodeSpec, SuccessCriterion
from framework.graph.edge import AsyncEntryPointSpec, GraphSpec

# ---------------------------------------------------------------------------
# Goal
# ---------------------------------------------------------------------------

judge_goal = Goal(
    id="worker-health-monitor",
    name="Worker Health Monitor",
    description=(
        "Periodically assess the health of the worker agent by reading its "
        "execution logs. Detect degradation patterns (excessive retries, "
        "stalls, doom loops) and emit structured EscalationTickets when the "
        "worker needs attention."
    ),
    success_criteria=[
        SuccessCriterion(
            id="accurate-detection",
            description="Only escalates genuine degradation, not normal retry cycles",
            metric="false_positive_rate",
            target="low",
            weight=0.5,
        ),
        SuccessCriterion(
            id="timely-detection",
            description="Detects genuine stalls within 2 timer ticks (≤4 minutes)",
            metric="detection_latency_minutes",
            target="<=4",
            weight=0.5,
        ),
    ],
    constraints=[
        Constraint(
            id="conservative-escalation",
            description=(
                "Do not escalate on a single bad verdict or a brief stall. "
                "Require clear patterns (10+ consecutive bad verdicts or 4+ minute stall) "
                "before creating a ticket."
            ),
            constraint_type="hard",
            category="quality",
        ),
        Constraint(
            id="complete-ticket",
            description=(
                "Every EscalationTicket must have all required fields filled. "
                "Do not emit partial or placeholder tickets."
            ),
            constraint_type="hard",
            category="correctness",
        ),
    ],
)

# ---------------------------------------------------------------------------
# Node
# ---------------------------------------------------------------------------

judge_node = NodeSpec(
    id="judge",
    name="Worker Health Judge",
    description=(
        "Autonomous health monitor for worker agents. Reads execution logs "
        "on each timer tick, compares to prior observations (via conversation "
        "history), and emits a structured EscalationTicket when a genuine "
        "degradation pattern is detected."
    ),
    node_type="event_loop",
    client_facing=False,  # Autonomous monitor, not interactive
    max_node_visits=0,  # Unbounded — runs on every timer tick
    input_keys=[],
    output_keys=["health_verdict"],
    nullable_output_keys=["health_verdict"],
    success_criteria=(
        "A clear health verdict is produced each check: either 'healthy' with "
        "a brief observation, or a complete EscalationTicket is emitted via "
        "emit_escalation_ticket and health_verdict describes the issue."
    ),
    tools=[
        "get_worker_health_summary",
        "emit_escalation_ticket",
    ],
    system_prompt="""\
You are the Worker Health Judge. You run every 2 minutes alongside a worker \
agent to monitor its execution health.

# Your Role

You observe the worker's iteration patterns over time and escalate only when \
you see genuine degradation — not normal retry cycles. Your conversation history \
IS your memory. On each check, refer to your previous observations to track trends.

# Check Procedure

On each timer tick (every 2 minutes):

## Step 1: Read health snapshot
Call get_worker_health_summary() with no arguments to auto-discover the active \
session. This returns:
- worker_agent_id: the worker's agent name — use this for ticket identity fields
- worker_graph_id: the worker's primary graph ID — use this for ticket identity fields
- session_id: the session being monitored — use this for worker_session_id in tickets
- total_steps: how many log steps have been recorded
- recent_verdicts: list of recent ACCEPT/RETRY/CONTINUE verdicts
- steps_since_last_accept: consecutive non-ACCEPT steps
- stall_minutes: wall-clock since last step (null if active)
- evidence_snippet: recent LLM output

## Step 2: Compare to prior check
Look at your conversation history. What was total_steps last time?
- If total_steps is UNCHANGED from prior check AND prior check was also unchanged:
  → STALL confirmed (worker has produced no new iterations in 4+ minutes).
  → Escalate with severity="high" or "critical" depending on stall duration.
- If total_steps increased: worker is making progress. Examine verdicts.

## Step 3: Analyze verdict pattern
- Healthy: Mix of ACCEPT and RETRY, steps_since_last_accept < 5. No action.
- Warning: steps_since_last_accept is 5-9. Note it, no escalation yet.
- Degraded: steps_since_last_accept >= 10. Examine evidence_snippet.
  - If evidence shows the agent is making real progress (complex reasoning,
    exploring solutions, productive tool use): may be a hard problem. Note it.
  - If evidence shows a loop (same error, same tool call, no new information):
    → Escalate with severity="medium" or "high".
- Critical: steps_since_last_accept >= 20, OR stall_minutes >= 4.
  → Escalate with severity="critical".

## Step 4: Decide

### If healthy:
set_output("health_verdict", "healthy: <brief observation>")
Done.

### If escalating:
Build an EscalationTicket JSON string with ALL required fields:
{
  "worker_agent_id": "<worker_agent_id from get_worker_health_summary>",
  "worker_session_id": "<session_id from get_worker_health_summary>",
  "worker_node_id": "<worker_graph_id from get_worker_health_summary>",
  "worker_graph_id": "<worker_graph_id from get_worker_health_summary>",
  "severity": "<low|medium|high|critical>",
  "cause": "<what you observed — concrete, specific>",
  "judge_reasoning": "<why you decided to escalate, not just dismiss>",
  "suggested_action": "<what you recommend: restart, human review, etc.>",
  "recent_verdicts": [<list from get_worker_health_summary>],
  "total_steps_checked": <int>,
  "steps_since_last_accept": <int>,
  "stall_minutes": <float or null>,
  "evidence_snippet": "<from get_worker_health_summary>"
}

Call: emit_escalation_ticket(ticket_json=<the JSON string above>)
Then: set_output("health_verdict", "escalated: <one-line summary>")

# Severity Guide

- low: Mild concern, worth noting. 5-9 consecutive bad verdicts.
- medium: Clear degradation pattern. 10-15 bad verdicts or brief stall (1-2 min).
- high: Serious issue. 15+ bad verdicts or stall 2-4 minutes or clear doom loop.
- critical: Worker is definitively stuck. 20+ bad verdicts or stall > 4 minutes.

# Conservative Bias

You MUST resist the urge to escalate prematurely. Worker agents naturally retry.
A node may legitimately need 5-8 retries before succeeding. Do not escalate unless:
1. The pattern is clear and sustained across your observation window, AND
2. The evidence shows no genuine progress

One missed escalation is less costly than two false alarms. The Queen will filter \
further. But do not be passive — genuine stalls and doom loops must be caught.

# Rules
- Never escalate on the FIRST check unless stall_minutes > 4
- Always call get_worker_health_summary FIRST before deciding anything
- All ticket fields are REQUIRED — do not submit partial tickets
- After any emit_escalation_ticket call, always set_output to complete the check
""",
)

# ---------------------------------------------------------------------------
# Entry Point
# ---------------------------------------------------------------------------

HEALTH_JUDGE_ENTRY_POINT = AsyncEntryPointSpec(
    id="health_check",
    name="Worker Health Check",
    entry_node="judge",
    trigger_type="timer",
    trigger_config={
        "interval_minutes": 2,
        "run_immediately": True,  # Fire immediately to establish a baseline
    },
    isolation_level="isolated",  # Own memory namespace, not polluting worker's
)

# ---------------------------------------------------------------------------
# Graph
# ---------------------------------------------------------------------------

judge_graph = GraphSpec(
    id="judge-graph",
    goal_id=judge_goal.id,
    version="1.0.0",
    entry_node="judge",
    entry_points={"health_check": "judge"},
    terminal_nodes=["judge"],  # Judge node can terminate after each check
    pause_nodes=[],
    nodes=[judge_node],
    edges=[],
    conversation_mode="continuous",  # Conversation persists across timer ticks
    async_entry_points=[HEALTH_JUDGE_ENTRY_POINT],
    loop_config={
        "max_iterations": 10,  # One check shouldn't take many turns
        "max_tool_calls_per_turn": 3,  # get_summary + optionally emit_ticket
        "max_context_tokens": 16000,  # Compact — judge only needs recent context
    },
)

@@ -83,18 +83,18 @@ configure_logging(level="INFO", format="auto")
- Compact single-line format (easy to stream/parse)
- All trace context fields included automatically

### Human-Readable Format (Development)
### Human-Readable Format (Development / Terminal)

```
[INFO ] [trace:12345678 | exec:a1b2c3d4 | agent:sales-agent] Starting agent execution
[INFO ] [trace:12345678 | exec:a1b2c3d4 | agent:sales-agent] Processing input data [node_id:input-processor]
[INFO ] [trace:12345678 | exec:a1b2c3d4 | agent:sales-agent] LLM call completed [latency_ms:1250] [tokens_used:450]
[INFO ] [agent:sales-agent] Starting agent execution
[INFO ] [agent:sales-agent] Processing input data [node_id:input-processor]
[INFO ] [agent:sales-agent] LLM call completed [latency_ms:1250] [tokens_used:450]
```

**Features:**
- Color-coded log levels
- Shortened IDs for readability (first 8 chars)
- Context prefix shows trace correlation
- Terminal output omits trace_id and execution_id for readability
- For full traceability (e.g. debugging), use `ENV=production` to get JSON file logs with trace_id and execution_id (see the sketch below)
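
A minimal sketch of selecting each mode (assuming `configure_logging` is imported from `framework.observability`, as elsewhere in this changeset):

```python
from framework.observability import configure_logging

# Production: JSON output with full trace_id/execution_id, suited to file logs
configure_logging(level="INFO", format="json")

# Development: colorized terminal output; trace_id/execution_id omitted
configure_logging(level="INFO", format="auto")
```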

## Trace Context Fields

@@ -4,8 +4,9 @@ Structured logging with automatic trace context propagation.
Key Features:
- Zero developer friction: Standard logger.info() calls get automatic context
- ContextVar-based propagation: Thread-safe and async-safe
- Dual output modes: JSON for production, human-readable for development
- Correlation IDs: trace_id follows entire request flow automatically
- Dual output modes: JSON for production (full trace_id/execution_id), human-readable for terminal
- Terminal omits trace_id/execution_id for readability
- Use ENV=production for file logs with full traceability

Architecture:
    Runtime.start_run() → Generates trace_id, sets context once

@@ -29,6 +30,8 @@ from typing import Any
# ContextVar is thread-safe and async-safe - perfect for concurrent agent execution
trace_context: ContextVar[dict[str, Any] | None] = ContextVar("trace_context", default=None)

_STANDARD_LOG_RECORD_FIELDS = set(logging.makeLogRecord({}).__dict__)

# ANSI escape code pattern (matches \033[...m or \x1b[...m)
ANSI_ESCAPE_PATTERN = re.compile(r"\x1b\[[0-9;]*m|\033\[[0-9;]*m")

@@ -91,6 +94,14 @@ class StructuredFormatter(logging.Formatter):
        if model is not None:
            log_entry["model"] = model

        # Preserve arbitrary structured fields passed via ``extra=...``.
        for key, value in record.__dict__.items():
            if key in _STANDARD_LOG_RECORD_FIELDS or key.startswith("_"):
                continue
            if key in log_entry:
                continue
            log_entry[key] = value

        # Add exception info if present (strip ANSI codes from exception text too)
        if record.exc_info:
            exception_text = self.formatException(record.exc_info)

@@ -101,10 +112,11 @@ class StructuredFormatter(logging.Formatter):

class HumanReadableFormatter(logging.Formatter):
    """
    Human-readable formatter for development.
    Human-readable formatter for development (terminal output).

    Provides colorized logs with trace context for local debugging.
    Includes trace_id prefix for correlation - AUTOMATIC!
    Provides colorized logs for local debugging. Omits trace_id and execution_id
    from the terminal for readability; use ENV=production (JSON file logs) when
    traceability is needed.
    """

    COLORS = {

@@ -118,18 +130,11 @@ class HumanReadableFormatter(logging.Formatter):

    def format(self, record: logging.LogRecord) -> str:
        """Format log record as human-readable string."""
        # Get trace context - AUTOMATIC!
        # Get trace context; omit trace_id and execution_id in terminal for readability
        context = trace_context.get() or {}
        trace_id = context.get("trace_id", "")
        execution_id = context.get("execution_id", "")
        agent_id = context.get("agent_id", "")

        # Build context prefix
        prefix_parts = []
        if trace_id:
            prefix_parts.append(f"trace:{trace_id[:8]}")
        if execution_id:
            prefix_parts.append(f"exec:{execution_id[-8:]}")
        if agent_id:
            prefix_parts.append(f"agent:{agent_id}")

@@ -211,6 +216,15 @@ def configure_logging(
    root_logger.addHandler(handler)
    root_logger.setLevel(level.upper())

    # Suppress noisy LiteLLM INFO logs (model/provider line + Provider List URL
    # printed on every single completion call). Warnings and errors still show.
    # Honour LITELLM_LOG env var so users can opt-in to debug output.
    _litellm_level = os.getenv("LITELLM_LOG", "").upper()
    if _litellm_level and hasattr(logging, _litellm_level):
        logging.getLogger("LiteLLM").setLevel(getattr(logging, _litellm_level))
    else:
        logging.getLogger("LiteLLM").setLevel(logging.WARNING)

    # When in JSON mode, configure known third-party loggers to use JSON formatter
    # This ensures libraries like LiteLLM, httpcore also output clean JSON
    if format == "json":

@@ -1,6 +1,6 @@
"""Agent Runner - load and run exported agents."""

from framework.runner.orchestrator import AgentOrchestrator
from framework.runner.mcp_registry import MCPRegistry
from framework.runner.protocol import (
    AgentMessage,
    CapabilityLevel,

@@ -17,9 +17,8 @@ __all__ = [
    "AgentInfo",
    "ValidationResult",
    "ToolRegistry",
    "MCPRegistry",
    "tool",
    # Multi-agent
    "AgentOrchestrator",
    "AgentMessage",
    "MessageType",
    "CapabilityLevel",

+144 -328
@@ -51,6 +51,11 @@ def register_commands(subparsers: argparse._SubParsersAction) -> None:
        action="store_true",
        help="Show detailed execution logs (steps, LLM calls, etc.)",
    )
    run_parser.add_argument(
        "--debug",
        action="store_true",
        help="Show all debug-level logs",
    )

    run_parser.add_argument(
        "--model",

@@ -119,46 +124,6 @@ def register_commands(subparsers: argparse._SubParsersAction) -> None:
    )
    list_parser.set_defaults(func=cmd_list)

    # dispatch command (multi-agent)
    dispatch_parser = subparsers.add_parser(
        "dispatch",
        help="Dispatch request to multiple agents",
        description="Route a request to the best agent(s) using the orchestrator.",
    )
    dispatch_parser.add_argument(
        "agents_dir",
        type=str,
        nargs="?",
        default="exports",
        help="Directory containing agent folders (default: exports)",
    )
    dispatch_parser.add_argument(
        "--input",
        "-i",
        type=str,
        required=True,
        help="Input context as JSON string",
    )
    dispatch_parser.add_argument(
        "--intent",
        type=str,
        help="Description of what you want to accomplish",
    )
    dispatch_parser.add_argument(
        "--agents",
        "-a",
        type=str,
        nargs="+",
        help="Specific agent names to use (default: all in directory)",
    )
    dispatch_parser.add_argument(
        "--quiet",
        "-q",
        action="store_true",
        help="Only output the final result JSON",
    )
    dispatch_parser.set_defaults(func=cmd_dispatch)

    # shell command (interactive agent session)
    shell_parser = subparsers.add_parser(
        "shell",

@@ -177,11 +142,6 @@ def register_commands(subparsers: argparse._SubParsersAction) -> None:
        default="exports",
        help="Directory containing agents (default: exports)",
    )
    shell_parser.add_argument(
        "--multi",
        action="store_true",
        help="Enable multi-agent mode with orchestrator",
    )
    shell_parser.add_argument(
        "--no-approve",
        action="store_true",

@@ -243,12 +203,8 @@ def register_commands(subparsers: argparse._SubParsersAction) -> None:
        action="store_true",
        help="Open dashboard in browser after server starts",
    )
    serve_parser.add_argument(
        "--verbose", "-v", action="store_true", help="Enable INFO log level"
    )
    serve_parser.add_argument(
        "--debug", action="store_true", help="Enable DEBUG log level"
    )
    serve_parser.add_argument("--verbose", "-v", action="store_true", help="Enable INFO log level")
    serve_parser.add_argument("--debug", action="store_true", help="Enable DEBUG log level")
    serve_parser.set_defaults(func=cmd_serve)

    # open command (serve + auto-open browser)

@@ -286,19 +242,18 @@ def register_commands(subparsers: argparse._SubParsersAction) -> None:
        default=None,
        help="LLM model for preloaded agents",
    )
    open_parser.add_argument(
        "--verbose", "-v", action="store_true", help="Enable INFO log level"
    )
    open_parser.add_argument(
        "--debug", action="store_true", help="Enable DEBUG log level"
    )
    open_parser.add_argument("--verbose", "-v", action="store_true", help="Enable INFO log level")
    open_parser.add_argument("--debug", action="store_true", help="Enable DEBUG log level")
    open_parser.set_defaults(func=cmd_open)


def _load_resume_state(
    agent_path: str, session_id: str, checkpoint_id: str | None = None
) -> dict | None:
    """Load session or checkpoint state for headless resume.
    """Load checkpoint state for headless resume.

    All resumes require a checkpoint. If ``checkpoint_id`` is not provided
    the latest checkpoint is auto-discovered.

    Args:
        agent_path: Path to the agent folder (e.g., exports/my_agent)

@@ -306,7 +261,7 @@ def _load_resume_state(
        checkpoint_id: Optional checkpoint ID within the session

    Returns:
        session_state dict for executor, or None if not found
        session_state dict for executor, or None if no checkpoint found
    """
    agent_name = Path(agent_path).name
    agent_work_dir = Path.home() / ".hive" / "agents" / agent_name

@@ -315,40 +270,37 @@ def _load_resume_state(
    if not session_dir.exists():
        return None

    if checkpoint_id:
        # Checkpoint-based resume: load checkpoint and extract state
        cp_path = session_dir / "checkpoints" / f"{checkpoint_id}.json"
        if not cp_path.exists():
    # Auto-discover latest checkpoint when not specified
    if not checkpoint_id:
        cp_dir = session_dir / "checkpoints"
        if cp_dir.exists():
            checkpoints = sorted(
                cp_dir.glob("*.json"),
                key=lambda p: p.stat().st_mtime,
                reverse=True,
            )
            if checkpoints:
                checkpoint_id = checkpoints[0].stem
    if not checkpoint_id:
        return None
        try:
            cp_data = json.loads(cp_path.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, OSError):
            return None
        return {
            "resume_session_id": session_id,
            "memory": cp_data.get("shared_memory", {}),
            "paused_at": cp_data.get("next_node") or cp_data.get("current_node"),
            "execution_path": cp_data.get("execution_path", []),
            "node_visit_counts": {},
        }
    else:
        # Session state resume: load state.json
        state_path = session_dir / "state.json"
        if not state_path.exists():
            return None
        try:
            state_data = json.loads(state_path.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, OSError):
            return None
        progress = state_data.get("progress", {})
        paused_at = progress.get("paused_at") or progress.get("resume_from")
        return {
            "resume_session_id": session_id,
            "memory": state_data.get("memory", {}),
            "paused_at": paused_at,
            "execution_path": progress.get("path", []),
            "node_visit_counts": progress.get("node_visit_counts", {}),
        }

    cp_path = session_dir / "checkpoints" / f"{checkpoint_id}.json"
    if not cp_path.exists():
        return None
    try:
        cp_data = json.loads(cp_path.read_text(encoding="utf-8"))
    except (json.JSONDecodeError, OSError):
        return None

    return {
        "resume_session_id": session_id,
        "resume_from_checkpoint": checkpoint_id,
        "run_id": cp_data.get("run_id") or None,
        "data_buffer": cp_data.get("data_buffer", cp_data.get("shared_memory", {})),
        "paused_at": cp_data.get("next_node") or cp_data.get("current_node"),
        "execution_path": cp_data.get("execution_path", []),
        "node_visit_counts": cp_data.get("node_visit_counts", {}),
    }


def _prompt_before_start(agent_path: str, runner, model: str | None = None):

@@ -387,16 +339,16 @@ def _prompt_before_start(agent_path: str, runner, model: str | None = None):

def cmd_run(args: argparse.Namespace) -> int:
    """Run an exported agent."""
    import logging

    from framework.credentials.models import CredentialError
    from framework.runner import AgentRunner

    from framework.observability import configure_logging
    from framework.runner import AgentRunner

    # Set logging level (quiet by default for cleaner output)
    if args.quiet:
        configure_logging(level="ERROR")
    elif getattr(args, "debug", False):
        configure_logging(level="DEBUG")
    elif getattr(args, "verbose", False):
        configure_logging(level="INFO")
    else:

@@ -732,118 +684,6 @@ def cmd_list(args: argparse.Namespace) -> int:
    return 0


def cmd_dispatch(args: argparse.Namespace) -> int:
    """Dispatch request to multiple agents via orchestrator."""
    from framework.runner import AgentOrchestrator

    # Parse input
    try:
        context = json.loads(args.input)
    except json.JSONDecodeError as e:
        print(f"Error parsing --input JSON: {e}", file=sys.stderr)
        return 1

    # Find agents
    agents_dir = Path(args.agents_dir)
    if not agents_dir.exists():
        print(f"Directory not found: {agents_dir}", file=sys.stderr)
        return 1

    # Create orchestrator and register agents
    orchestrator = AgentOrchestrator()

    agent_paths = []
    if args.agents:
        # Use specific agents
        for agent_name in args.agents:
            # Guard against full paths: if the name contains path separators
            # (e.g. "exports/my_agent"), it will be doubled with agents_dir
            agent_name_path = Path(agent_name)
            if len(agent_name_path.parts) > 1:
                print(
                    f"Error: --agents expects agent names, not paths. "
                    f"Use: --agents {agent_name_path.name} "
                    f"instead of --agents {agent_name}",
                    file=sys.stderr,
                )
                return 1
            agent_path = agents_dir / agent_name
            if not _is_valid_agent_dir(agent_path):
                print(f"Agent not found: {agent_path}", file=sys.stderr)
                return 1
            agent_paths.append((agent_name, agent_path))
    else:
        # Discover all agents
        for path in agents_dir.iterdir():
            if _is_valid_agent_dir(path):
                agent_paths.append((path.name, path))

    if not agent_paths:
        print(f"No agents found in {agents_dir}", file=sys.stderr)
        return 1

    # Register agents
    for name, path in agent_paths:
        try:
            orchestrator.register(name, path)
            if not args.quiet:
                print(f"Registered agent: {name}")
        except Exception as e:
            print(f"Failed to register {name}: {e}", file=sys.stderr)

    if not args.quiet:
        print()
        print(f"Input: {json.dumps(context)}")
        if args.intent:
            print(f"Intent: {args.intent}")
        print()
        print("=" * 60)
        print("Dispatching to agents...")
        print("=" * 60)
        print()

    # Dispatch
    result = asyncio.run(orchestrator.dispatch(context, intent=args.intent))

    # Output results
    if args.quiet:
        output = {
            "success": result.success,
            "handled_by": result.handled_by,
            "results": result.results,
            "error": result.error,
        }
        print(json.dumps(output, indent=2, default=str))
    else:
        print()
        print("=" * 60)
        print(f"Success: {result.success}")
        print(f"Handled by: {', '.join(result.handled_by) or 'none'}")
        if result.error:
            print(f"Error: {result.error}")
        print("=" * 60)

        if result.results:
            print("\n--- Results by Agent ---")
            for agent_name, data in result.results.items():
                print(f"\n{agent_name}:")
                status = data.get("status", "unknown")
                print(f"  Status: {status}")
                if "completed_steps" in data:
                    print(f"  Steps: {len(data['completed_steps'])}")
                if "results" in data:
                    results_preview = json.dumps(data["results"], default=str)
                    if len(results_preview) > 200:
                        results_preview = results_preview[:200] + "..."
                    print(f"  Results: {results_preview}")

    if not args.quiet:
        print(f"\nMessage trace: {len(result.messages)} messages")

    orchestrator.cleanup()
    return 0 if result.success else 1


def _interactive_approval(request):
    """Interactive approval callback for HITL mode."""
    from framework.graph import ApprovalDecision, ApprovalResult

@@ -932,22 +772,15 @@ def _format_natural_language_to_json(

def cmd_shell(args: argparse.Namespace) -> int:
    """Start an interactive agent session."""
    import logging

    from framework.credentials.models import CredentialError
    from framework.runner import AgentRunner

    from framework.observability import configure_logging
    from framework.runner import AgentRunner

    configure_logging(level="INFO")

    agents_dir = Path(args.agents_dir)

    # Multi-agent mode with orchestrator
    if args.multi:
        return _interactive_multi(agents_dir)

    # Single agent mode
    agent_path = args.agent_path
    if not agent_path:
        # List available agents and let user choose

@@ -1421,107 +1254,6 @@ def _select_agent(agents_dir: Path) -> str | None:
    return None


def _interactive_multi(agents_dir: Path) -> int:
    """Interactive multi-agent mode with orchestrator."""
    from framework.runner import AgentOrchestrator

    if not agents_dir.exists():
        print(f"Directory not found: {agents_dir}", file=sys.stderr)
        return 1

    orchestrator = AgentOrchestrator()
    agent_count = 0

    # Register all agents
    for path in agents_dir.iterdir():
        if _is_valid_agent_dir(path):
            try:
                orchestrator.register(path.name, path)
                agent_count += 1
            except Exception as e:
                print(f"Warning: Failed to register {path.name}: {e}")

    if agent_count == 0:
        print(f"No agents found in {agents_dir}", file=sys.stderr)
        return 1

    print(f"\n{'=' * 60}")
    print("Multi-Agent Interactive Mode")
    print(f"Registered {agent_count} agents")
    print(f"{'=' * 60}")
    print("\nCommands:")
    print("  /agents - List registered agents")
    print("  /quit   - Exit")
    print("  {...}   - JSON input to dispatch")
    print()

    while True:
        try:
            user_input = input(">>> ").strip()
        except (EOFError, KeyboardInterrupt):
            print("\nExiting...")
            break

        if not user_input:
            continue

        if user_input == "/quit":
            break

        if user_input == "/agents":
            print("\nRegistered agents:")
            for agent in orchestrator.list_agents():
                print(f"  - {agent['name']}: {agent['description'][:60]}...")
            print()
            continue

        # Parse intent if provided
        intent = None
        if user_input.startswith("/intent "):
            parts = user_input.split(" ", 2)
            if len(parts) >= 3:
                intent = parts[1]
                user_input = parts[2]

        # Try to parse as JSON
        try:
            context = json.loads(user_input)
        except json.JSONDecodeError:
            print("Error: Invalid JSON input. Use {...} format.")
            continue

        print(f"\nDispatching: {json.dumps(context)}")
        if intent:
            print(f"Intent: {intent}")
        print("-" * 40)

        result = asyncio.run(orchestrator.dispatch(context, intent=intent))

        print(f"\nSuccess: {result.success}")
        print(f"Handled by: {', '.join(result.handled_by) or 'none'}")

        if result.error:
            print(f"Error: {result.error}")

        if result.results:
            print("\nResults by agent:")
            for agent_name, data in result.results.items():
                print(f"\n  {agent_name}:")
                status = data.get("status", "unknown")
                print(f"    Status: {status}")
                if "results" in data:
                    results_preview = json.dumps(data["results"], default=str)
                    if len(results_preview) > 150:
                        results_preview = results_preview[:150] + "..."
                    print(f"    Results: {results_preview}")

        print(f"\nMessage trace: {len(result.messages)} messages")
        print()

    orchestrator.cleanup()
    return 0

def cmd_setup_credentials(args: argparse.Namespace) -> int:
    """Interactive credential setup for an agent."""
    from framework.credentials.setup import CredentialSetupSession

@@ -1544,10 +1276,51 @@ def cmd_setup_credentials(args: argparse.Namespace) -> int:
    return 0 if result.success else 1


def _find_chrome_bin() -> str | None:
    """Return the path to a Chrome/Chromium binary, or None if not found."""
    import shutil

    for candidate in (
        "google-chrome",
        "google-chrome-stable",
        "chromium",
        "chromium-browser",
        "microsoft-edge",
        "microsoft-edge-stable",
    ):
        if shutil.which(candidate):
            return candidate

    mac_paths = [
        "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
        Path.home() / "Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
        "/Applications/Microsoft Edge.app/Contents/MacOS/Microsoft Edge",
    ]
    for p in mac_paths:
        if Path(p).exists():
            return str(p)

    return None


def _open_browser(url: str) -> None:
    """Open URL in the default browser (best-effort, non-blocking)."""
    """Open URL in the browser (best-effort, non-blocking)."""
    import subprocess

    chrome = _find_chrome_bin()

    try:
        if chrome:
            subprocess.Popen(
                [chrome, url],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
            return
    except Exception:
        pass

    # Fallback: open with system default browser
    try:
        if sys.platform == "darwin":
            subprocess.Popen(

@@ -1573,6 +1346,37 @@ def _open_browser(url: str) -> None:
        pass  # Best-effort — don't crash if browser can't open


def _ping_hive_gateway_availability(from_source: str) -> None:
    """Ping Hive gateway availability for lightweight reachability logging."""
    from urllib import error, parse, request

    base_url = "https://api.adenhq.com/v1/gateway/availability"
    query = parse.urlencode({"from": from_source})
    url = f"{base_url}?{query}"

    try:
        with request.urlopen(url, timeout=5) as response:
            response.read()
    except (error.URLError, TimeoutError, ValueError):
        pass


def _format_subprocess_output(output: str | bytes | None, limit: int = 2000) -> str:
    """Return subprocess output as trimmed text safe for console logging."""
    if not output:
        return ""

    if isinstance(output, bytes):
        text = output.decode(errors="replace")
    else:
        text = output

    text = text.strip()
    if len(text) <= limit:
        return text
    return text[-limit:]


def _build_frontend() -> bool:
    """Build the frontend if source is newer than dist. Returns True if dist exists."""
    import subprocess

@@ -1608,18 +1412,25 @@ def _build_frontend() -> bool:

    # Need to build
    print("Building frontend...")
    npm_cmd = "npm.cmd" if sys.platform == "win32" else "npm"
    try:
        # Incremental tsc caches can drift across branch changes and block builds.
        for cache_file in frontend_dir.glob("tsconfig*.tsbuildinfo"):
            cache_file.unlink(missing_ok=True)

        # Ensure deps are installed
        subprocess.run(
            ["npm", "install", "--no-fund", "--no-audit"],
            [npm_cmd, "install", "--no-fund", "--no-audit"],
            encoding="utf-8",
            errors="replace",
            cwd=frontend_dir,
            check=True,
            capture_output=True,
        )
        subprocess.run(
            ["npm", "run", "build"],
            [npm_cmd, "run", "build"],
            encoding="utf-8",
            errors="replace",
            cwd=frontend_dir,
            check=True,
            capture_output=True,

@@ -1630,22 +1441,26 @@ def _build_frontend() -> bool:
        print("Node.js not found — skipping frontend build.")
        return dist_dir.is_dir()
    except subprocess.CalledProcessError as exc:
        stderr = exc.stderr.decode(errors="replace") if exc.stderr else ""
        print(f"Frontend build failed: {stderr[:500]}")
        stdout = _format_subprocess_output(exc.stdout)
        stderr = _format_subprocess_output(exc.stderr)
        cmd = " ".join(exc.cmd) if isinstance(exc.cmd, (list, tuple)) else str(exc.cmd)
        details = "\n".join(part for part in [stdout, stderr] if part).strip()
        if details:
            print(f"Frontend build failed while running {cmd}:\n{details}")
        else:
            print(f"Frontend build failed while running {cmd} (exit {exc.returncode}).")
        return dist_dir.is_dir()


def cmd_serve(args: argparse.Namespace) -> int:
    """Start the HTTP API server."""
    import logging

    from aiohttp import web

    _build_frontend()

    from framework.server.app import create_app

    from framework.observability import configure_logging
    from framework.server.app import create_app

    if getattr(args, "debug", False):
        configure_logging(level="DEBUG")

@@ -1661,10 +1476,10 @@ def cmd_serve(args: argparse.Namespace) -> int:
    # Preload agents specified via --agent
    for agent_path in args.agent:
        try:
            session = await manager.create_session_with_worker(agent_path, model=model)
            session = await manager.create_session_with_worker_graph(agent_path, model=model)
            info = session.worker_info
            name = info.name if info else session.worker_id
            print(f"Loaded agent: {session.worker_id} ({name})")
            name = info.name if info else session.graph_id
            print(f"Loaded agent: {session.graph_id} ({name})")
        except Exception as e:
            print(f"Error loading {agent_path}: {e}")

@@ -1687,7 +1502,7 @@ def cmd_serve(args: argparse.Namespace) -> int:
    if has_frontend:
        print(f"Dashboard: {dashboard_url}")
    print(f"Health: {dashboard_url}/api/health")
    print(f"Agents loaded: {sum(1 for s in manager.list_sessions() if s.worker_runtime)}")
    print(f"Agents loaded: {sum(1 for s in manager.list_sessions() if s.graph_runtime)}")
    print()
    print("Press Ctrl+C to stop")

@@ -1714,5 +1529,6 @@ def cmd_serve(args: argparse.Namespace) -> int:

def cmd_open(args: argparse.Namespace) -> int:
    """Start the HTTP API server and open the dashboard in the browser."""
    _ping_hive_gateway_availability("hive-open")
    args.open = True
    return cmd_serve(args)

@@ -1,7 +1,7 @@
"""MCP Client for connecting to Model Context Protocol servers.

This module provides a client for connecting to MCP servers and invoking their tools.
Supports both STDIO and HTTP transports using the official MCP Python SDK.
Supports STDIO, HTTP, UNIX socket, and SSE transports using the official MCP Python SDK.
"""

import asyncio

@@ -14,6 +14,8 @@ from typing import Any, Literal

import httpx

from framework.runner.mcp_errors import MCPToolNotFoundError

logger = logging.getLogger(__name__)


@@ -22,7 +24,7 @@ class MCPServerConfig:
    """Configuration for an MCP server connection."""

    name: str
    transport: Literal["stdio", "http"]
    transport: Literal["stdio", "http", "unix", "sse"]

    # For STDIO transport
    command: str | None = None

@@ -33,6 +35,7 @@ class MCPServerConfig:
    # For HTTP transport
    url: str | None = None
    headers: dict[str, str] = field(default_factory=dict)
    socket_path: str | None = None

    # Optional metadata
    description: str = ""
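
    # --- Illustrative config (editor sketch, not part of this diff) ---
    # A UNIX-socket server: url supplies the base for HTTP-over-UDS requests
    # while socket_path points at the domain socket (both are required by
    # _connect_unix below). The concrete values are hypothetical.
    #
    #     MCPServerConfig(
    #         name="local-tools",
    #         transport="unix",
    #         url="http://localhost",
    #         socket_path="/tmp/mcp-tools.sock",
    #     )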

@@ -52,7 +55,7 @@ class MCPClient:
    """
    Client for communicating with MCP servers.

    Supports both STDIO and HTTP transports using the official MCP SDK.
    Supports STDIO, HTTP, UNIX socket, and SSE transports using the official MCP SDK.
    Manages the connection lifecycle and provides methods to list and invoke tools.
    """

@@ -68,6 +71,7 @@ class MCPClient:
        self._read_stream = None
        self._write_stream = None
        self._stdio_context = None  # Context manager for stdio_client
        self._sse_context = None  # Context manager for sse_client
        self._errlog_handle = None  # Track errlog file handle for cleanup
        self._http_client: httpx.Client | None = None
        self._tools: dict[str, MCPTool] = {}

@@ -141,6 +145,10 @@ class MCPClient:
            self._connect_stdio()
        elif self.config.transport == "http":
            self._connect_http()
        elif self.config.transport == "unix":
            self._connect_unix()
        elif self.config.transport == "sse":
            self._connect_sse()
        else:
            raise ValueError(f"Unsupported transport: {self.config.transport}")

@@ -266,10 +274,94 @@ class MCPClient:
            logger.warning(f"Health check failed for MCP server '{self.config.name}': {e}")
            # Continue anyway, server might not have health endpoint

    def _connect_unix(self) -> None:
        """Connect to MCP server via UNIX domain socket transport."""
        if not self.config.url:
            raise ValueError("url is required for UNIX transport")
        if not self.config.socket_path:
            raise ValueError("socket_path is required for UNIX transport")

        self._http_client = httpx.Client(
            base_url=self.config.url,
            headers=self.config.headers,
            timeout=30.0,
            transport=httpx.HTTPTransport(uds=self.config.socket_path),
        )

        try:
            response = self._http_client.get("/health")
            response.raise_for_status()
            logger.info(
                "Connected to MCP server '%s' via UNIX socket at %s",
                self.config.name,
                self.config.socket_path,
            )
        except Exception as e:
            logger.warning(f"Health check failed for MCP server '{self.config.name}': {e}")
            # Continue anyway, server might not have health endpoint

    def _connect_sse(self) -> None:
        """Connect to MCP server via SSE transport using MCP SDK with persistent session."""
        if not self.config.url:
            raise ValueError("url is required for SSE transport")

        try:
            loop_started = threading.Event()
            connection_ready = threading.Event()
            connection_error = []

            def run_event_loop():
                """Run event loop in background thread."""
                self._loop = asyncio.new_event_loop()
                asyncio.set_event_loop(self._loop)
                loop_started.set()

                async def init_connection():
                    try:
                        from mcp import ClientSession
                        from mcp.client.sse import sse_client

                        self._sse_context = sse_client(
                            self.config.url,
                            headers=self.config.headers,
                            timeout=30.0,
                        )
                        (
                            self._read_stream,
                            self._write_stream,
                        ) = await self._sse_context.__aenter__()

                        self._session = ClientSession(self._read_stream, self._write_stream)
                        await self._session.__aenter__()
                        await self._session.initialize()

                        connection_ready.set()
                    except Exception as e:
                        connection_error.append(e)
                        connection_ready.set()

                self._loop.create_task(init_connection())
                self._loop.run_forever()

            self._loop_thread = threading.Thread(target=run_event_loop, daemon=True)
            self._loop_thread.start()

            loop_started.wait(timeout=5)
            if not loop_started.is_set():
                raise RuntimeError("Event loop failed to start")

            connection_ready.wait(timeout=10)
            if connection_error:
                raise connection_error[0]

            logger.info(f"Connected to MCP server '{self.config.name}' via SSE")
        except Exception as e:
            raise RuntimeError(f"Failed to connect to MCP server: {e}") from e

    def _discover_tools(self) -> None:
        """Discover available tools from the MCP server."""
        try:
            if self.config.transport == "stdio":
            if self.config.transport in {"stdio", "sse"}:
                tools_list = self._run_async(self._list_tools_stdio_async())
            else:
                tools_list = self._list_tools_http()

@@ -366,14 +458,45 @@ class MCPClient:
            self.connect()

        if tool_name not in self._tools:
            raise ValueError(f"Unknown tool: {tool_name}")
            raise MCPToolNotFoundError(
                server=self.config.name,
                tool_name=tool_name,
            )

        if self.config.transport == "stdio":
            with self._stdio_call_lock:
                return self._run_async(self._call_tool_stdio_async(tool_name, arguments))
        elif self.config.transport == "sse":
            return self._call_tool_with_retry(
                lambda: self._run_async(self._call_tool_stdio_async(tool_name, arguments))
            )
        elif self.config.transport == "unix":
            return self._call_tool_with_retry(lambda: self._call_tool_http(tool_name, arguments))
        else:
            return self._call_tool_http(tool_name, arguments)

    def _call_tool_with_retry(self, call: Any) -> Any:
        """Retry transient MCP transport failures once after reconnecting."""
        if self.config.transport == "stdio":
            return call()

        if self.config.transport not in {"unix", "sse"}:
            return call()

        try:
            return call()
        except (httpx.ConnectError, httpx.ReadTimeout) as original_error:
            logger.warning(
                "Retrying MCP tool call after transport error from '%s': %s",
                self.config.name,
                original_error,
            )
            self._reconnect()
            try:
                return call()
            except (httpx.ConnectError, httpx.ReadTimeout) as retry_error:
                raise original_error from retry_error

    async def _call_tool_stdio_async(self, tool_name: str, arguments: dict[str, Any]) -> Any:
        """Call tool via STDIO protocol using persistent session."""
        if not self._session:

@@ -389,19 +512,35 @@ class MCPClient:
            content_item = result.content[0]
            if hasattr(content_item, "text"):
                error_text = content_item.text
            raise RuntimeError(f"MCP tool '{tool_name}' failed: {error_text}")
            raise RuntimeError(
                f"[Server: {self.config.name}] [Transport: {self.config.transport}] "
                f"Tool '{tool_name}' failed: {error_text}"
            )

        # Extract content
        # Extract content — preserve image blocks alongside text
        if result.content:
            # MCP returns content as a list of content items
            if len(result.content) > 0:
                content_item = result.content[0]
                # Check if it's a text content item
                if hasattr(content_item, "text"):
                    return content_item.text
                elif hasattr(content_item, "data"):
                    return content_item.data
                return result.content
            text_parts: list[str] = []
            image_parts: list[dict[str, Any]] = []
            for item in result.content:
                if hasattr(item, "text"):
                    text_parts.append(item.text)
                elif hasattr(item, "data") and hasattr(item, "mimeType"):
                    # MCP ImageContent — preserve as structured image block
                    image_parts.append(
                        {
                            "type": "image_url",
                            "image_url": {
                                "url": f"data:{item.mimeType};base64,{item.data}",
                            },
                        }
                    )
                elif hasattr(item, "data"):
                    text_parts.append(str(item.data))

            text = "\n".join(text_parts) if text_parts else ""
            if image_parts:
                return {"_text": text, "_images": image_parts}
            return text if text else None

        return None
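
    # --- Illustrative handling of the return shape (editor sketch) ---
    # Callers receive either plain text or a dict carrying both text and
    # image blocks; the `client` and tool name below are hypothetical.
    #
    #     result = client.call_tool("take_screenshot", {})
    #     if isinstance(result, dict) and "_images" in result:
    #         text, images = result["_text"], result["_images"]
    #     else:
    #         text, images = result, []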

@@ -427,24 +566,36 @@ class MCPClient:
            data = response.json()

            if "error" in data:
                raise RuntimeError(f"Tool execution error: {data['error']}")
                raise RuntimeError(
                    f"[Server: {self.config.name}] [Transport: {self.config.transport}] "
                    f"Tool '{tool_name}' failed: {data['error']}"
                )

            return data.get("result", {}).get("content", [])
        except Exception as e:
            raise RuntimeError(f"Failed to call tool via HTTP: {e}") from e
            raise RuntimeError(
                f"[Server: {self.config.name}] [Transport: {self.config.transport}] "
                f"Failed to call tool via HTTP: Tool '{tool_name}' failed: {e}"
            ) from e

    def _reconnect(self) -> None:
        """Reconnect to the configured MCP server."""
        logger.info(f"Reconnecting to MCP server '{self.config.name}'...")
        self.disconnect()
        self.connect()

    _CLEANUP_TIMEOUT = 10
    _THREAD_JOIN_TIMEOUT = 12

    async def _cleanup_stdio_async(self) -> None:
        """Async cleanup for STDIO session and context managers.
        """Async cleanup for persistent MCP session and context managers.

        Cleanup order is critical:
        - The session must be closed BEFORE the stdio_context because the session
          depends on the streams provided by stdio_context.
        - This mirrors the initialization order in _connect_stdio(), where
          stdio_context is entered first (providing streams), then the session is
          created with those streams and entered.
        - The session must be closed BEFORE the transport context manager because the
          session depends on the streams provided by that context.
        - This mirrors the initialization order in _connect_stdio() / _connect_sse(),
          where the transport context is entered first (providing streams), then the
          session is created with those streams and entered.
        - Do not change this ordering without carefully considering these dependencies.
        """
        # First: close session (depends on stdio_context streams)

@@ -477,6 +628,16 @@ class MCPClient:
        finally:
            self._stdio_context = None

        try:
            if self._sse_context:
                await self._sse_context.__aexit__(None, None, None)
        except asyncio.CancelledError:
            logger.debug("SSE context cleanup was cancelled; proceeding with best-effort shutdown")
        except Exception as e:
            logger.warning(f"Error closing SSE context: {e}")
        finally:
            self._sse_context = None

        # Third: close errlog file handle if we opened one
        if self._errlog_handle is not None:
            try:

@@ -552,6 +713,7 @@ class MCPClient:
        # Setting None to None is safe and ensures clean state.
        self._session = None
        self._stdio_context = None
        self._sse_context = None
        self._read_stream = None
        self._write_stream = None
        self._loop = None

@@ -0,0 +1,409 @@
"""Shared MCP client connection management."""

import logging
import threading

import httpx

from framework.runner.mcp_client import MCPClient, MCPServerConfig

logger = logging.getLogger(__name__)

_TRANSITION_TIMEOUT = 30.0


class MCPConnectionManager:
    """Process-wide MCP client pool keyed by server name."""

    _instance = None
    _lock = threading.Lock()

    def __init__(self) -> None:
        self._pool: dict[str, MCPClient] = {}
        self._refcounts: dict[str, int] = {}
        self._configs: dict[str, MCPServerConfig] = {}
        self._pool_lock = threading.Lock()
        self._transitions: dict[str, threading.Event] = {}

    @classmethod
    def get_instance(cls) -> "MCPConnectionManager":
        """Return the process-level singleton instance."""
        if cls._instance is None:
            with cls._lock:
                if cls._instance is None:
                    cls._instance = cls()
        return cls._instance
|
||||
|
||||
@staticmethod
|
||||
def _is_connected(client: MCPClient | None) -> bool:
|
||||
return bool(client and getattr(client, "_connected", False))
|
||||
|
||||
def has_connection(self, server_name: str) -> bool:
|
||||
"""Return True when a live pooled connection exists for ``server_name``."""
|
||||
with self._pool_lock:
|
||||
return self._is_connected(self._pool.get(server_name))

    def acquire(self, config: MCPServerConfig) -> MCPClient:
        """Get or create a shared connection and increment its refcount."""
        server_name = config.name

        while True:
            should_connect = False
            transition_event: threading.Event | None = None

            with self._pool_lock:
                client = self._pool.get(server_name)
                if self._is_connected(client) and server_name not in self._transitions:
                    new_refcount = self._refcounts.get(server_name, 0) + 1
                    self._refcounts[server_name] = new_refcount
                    self._configs[server_name] = config
                    logger.debug(
                        "Reusing pooled connection for MCP server '%s' (refcount=%d)",
                        server_name,
                        new_refcount,
                    )
                    return client

                transition_event = self._transitions.get(server_name)
                if transition_event is None:
                    transition_event = threading.Event()
                    self._transitions[server_name] = transition_event
                    self._configs[server_name] = config
                    should_connect = True

            if not should_connect:
                if not transition_event.wait(timeout=_TRANSITION_TIMEOUT):
                    logger.warning(
                        "Timed out waiting for transition on MCP server '%s', "
                        "forcing cleanup and retrying",
                        server_name,
                    )
                    with self._pool_lock:
                        stuck = self._transitions.get(server_name)
                        if stuck is transition_event:
                            self._transitions.pop(server_name, None)
                            transition_event.set()
                continue

            logger.info("Connecting to MCP server '%s'", server_name)
            client = MCPClient(config)
            try:
                client.connect()
            except Exception:
                logger.warning(
                    "Failed to connect to MCP server '%s'",
                    server_name,
                    exc_info=True,
                )
                with self._pool_lock:
                    current = self._transitions.get(server_name)
                    if current is transition_event:
                        self._transitions.pop(server_name, None)
                        if (
                            server_name not in self._pool
                            and self._refcounts.get(server_name, 0) <= 0
                        ):
                            self._configs.pop(server_name, None)
                        transition_event.set()
                raise

            with self._pool_lock:
                current = self._transitions.get(server_name)
                if current is transition_event:
                    self._pool[server_name] = client
                    self._refcounts[server_name] = self._refcounts.get(server_name, 0) + 1
                    self._configs[server_name] = config
                    self._transitions.pop(server_name, None)
                    transition_event.set()
                    logger.info(
                        "Connected to MCP server '%s' (refcount=1)",
                        server_name,
                    )
                    return client

            # Lost the transition race; clean up and retry
            try:
                client.disconnect()
            except Exception:
                logger.debug(
                    "Error disconnecting stale client for '%s'",
                    server_name,
                    exc_info=True,
                )

    def release(self, server_name: str) -> None:
        """Decrement refcount and disconnect when the last user releases."""
        while True:
            disconnect_client: MCPClient | None = None
            transition_event: threading.Event | None = None
            should_disconnect = False

            with self._pool_lock:
                transition_event = self._transitions.get(server_name)
                if transition_event is None:
                    refcount = self._refcounts.get(server_name, 0)
                    if refcount <= 0:
                        return
                    if refcount > 1:
                        self._refcounts[server_name] = refcount - 1
                        logger.debug(
                            "Released MCP server '%s' (refcount=%d)",
                            server_name,
                            refcount - 1,
                        )
                        return

                    disconnect_client = self._pool.pop(server_name, None)
                    self._refcounts.pop(server_name, None)
                    self._configs.pop(server_name, None)
                    transition_event = threading.Event()
                    self._transitions[server_name] = transition_event
                    should_disconnect = True

            if not should_disconnect:
                if not transition_event.wait(timeout=_TRANSITION_TIMEOUT):
                    logger.warning(
                        "Timed out waiting for transition on '%s' during release, forcing cleanup",
                        server_name,
                    )
                    with self._pool_lock:
                        stuck = self._transitions.get(server_name)
                        if stuck is transition_event:
                            self._transitions.pop(server_name, None)
                            transition_event.set()
                continue

            try:
                if disconnect_client is not None:
                    disconnect_client.disconnect()
                    logger.info(
                        "Disconnected MCP server '%s' (last reference released)",
                        server_name,
                    )
            except Exception:
                logger.warning(
                    "Error disconnecting MCP server '%s' during release",
                    server_name,
                    exc_info=True,
                )
            finally:
                with self._pool_lock:
                    current = self._transitions.get(server_name)
                    if current is transition_event:
                        self._transitions.pop(server_name, None)
                        transition_event.set()
                return

    def health_check(self, server_name: str) -> bool:
        """Return True when the pooled connection appears healthy."""
        while True:
            with self._pool_lock:
                transition_event = self._transitions.get(server_name)
                if transition_event is None:
                    client = self._pool.get(server_name)
                    config = self._configs.get(server_name)
                    break

            if not transition_event.wait(timeout=_TRANSITION_TIMEOUT):
                logger.warning(
                    "Timed out waiting for transition on '%s' during health check",
                    server_name,
                )
                return False

        if client is None or config is None:
            return False

        try:
            match config.transport:
                case "stdio":
                    client.list_tools()
                    return True
                case "http":
                    if not config.url:
                        return False
                    with httpx.Client(
                        base_url=config.url,
                        headers=config.headers,
                        timeout=5.0,
                    ) as http_client:
                        response = http_client.get("/health")
                        response.raise_for_status()
                        return True
                case "sse":
                    client.list_tools()
                    return True
                case "unix":
                    if not config.socket_path:
                        return False
                    with httpx.Client(
                        base_url=config.url or "http://localhost",
                        headers=config.headers,
                        timeout=5.0,
                        transport=httpx.HTTPTransport(uds=config.socket_path),
                    ) as http_client:
                        response = http_client.get("/health")
                        response.raise_for_status()
                        return True
                case _:
                    logger.warning(
                        "Unknown transport '%s' for health check on '%s'",
                        config.transport,
                        server_name,
                    )
                    return False
        except Exception:
            logger.debug(
                "Health check failed for MCP server '%s'",
                server_name,
                exc_info=True,
            )
            return False

    def reconnect(self, server_name: str) -> MCPClient:
        """Force a disconnect and replace the pooled client with a fresh one."""
        while True:
            transition_event: threading.Event | None = None
            old_client: MCPClient | None = None

            with self._pool_lock:
                transition_event = self._transitions.get(server_name)
                if transition_event is None:
                    config = self._configs.get(server_name)
                    if config is None:
                        raise KeyError(f"Unknown MCP server: {server_name}")
                    old_client = self._pool.get(server_name)
                    transition_event = threading.Event()
                    self._transitions[server_name] = transition_event
                    break

            if not transition_event.wait(timeout=_TRANSITION_TIMEOUT):
                logger.warning(
                    "Timed out waiting for transition on '%s' during reconnect, forcing cleanup",
                    server_name,
                )
                with self._pool_lock:
                    stuck = self._transitions.get(server_name)
                    if stuck is transition_event:
                        self._transitions.pop(server_name, None)
                        transition_event.set()

        # Disconnect old client safely
        if old_client is not None:
            try:
                old_client.disconnect()
                logger.info("Disconnected old client for '%s'", server_name)
            except Exception:
                logger.warning(
                    "Error disconnecting old client for '%s' during reconnect",
                    server_name,
                    exc_info=True,
                )

        logger.info("Reconnecting MCP server '%s'", server_name)
        new_client = MCPClient(config)
        try:
            new_client.connect()
        except Exception:
            with self._pool_lock:
                current = self._transitions.get(server_name)
                if current is transition_event:
                    self._pool.pop(server_name, None)
                    self._transitions.pop(server_name, None)
                    transition_event.set()
            raise

        with self._pool_lock:
            current = self._transitions.get(server_name)
            if current is transition_event:
                current_refcount = self._refcounts.get(server_name, 0)
                if current_refcount <= 0:
                    # All holders released during reconnect. Discard the
                    # new client instead of creating a phantom reference.
                    # Caller should acquire() fresh if needed.
                    self._transitions.pop(server_name, None)
                    transition_event.set()
                    logger.info(
                        "Reconnected MCP server '%s' but refcount dropped to 0, "
                        "discarding new client",
                        server_name,
                    )
                    try:
                        new_client.disconnect()
                    except Exception:
                        logger.debug(
                            "Error disconnecting discarded client for '%s'",
                            server_name,
                            exc_info=True,
                        )
                    raise KeyError(
                        f"MCP server '{server_name}' was fully released during reconnect"
                    )

                self._pool[server_name] = new_client
                self._configs[server_name] = config
                self._refcounts[server_name] = current_refcount
                self._transitions.pop(server_name, None)
                transition_event.set()
                logger.info(
                    "Reconnected MCP server '%s' (refcount=%d)",
                    server_name,
                    current_refcount,
                )
                return new_client

        try:
            new_client.disconnect()
        except Exception:
            logger.debug(
                "Error disconnecting stale client for '%s' after reconnect race",
                server_name,
                exc_info=True,
            )
        return self.acquire(config)

    def cleanup_all(self) -> None:
        """Disconnect all pooled clients and clear manager state."""
        while True:
            with self._pool_lock:
                if self._transitions:
                    pending = list(self._transitions.values())
                else:
                    cleanup_events = {name: threading.Event() for name in self._pool}
                    clients = list(self._pool.items())
                    self._transitions.update(cleanup_events)
                    self._pool.clear()
                    self._refcounts.clear()
                    self._configs.clear()
                    break

            all_resolved = all(event.wait(timeout=_TRANSITION_TIMEOUT) for event in pending)
            if not all_resolved:
                logger.warning(
                    "Timed out waiting for pending transitions during cleanup, "
                    "forcing cleanup of stuck transitions",
                )
                with self._pool_lock:
                    for sn, evt in list(self._transitions.items()):
                        if not evt.is_set():
                            self._transitions.pop(sn, None)
                            evt.set()

        logger.info("Cleaning up %d pooled MCP connections", len(clients))
        for server_name, client in clients:
            try:
                client.disconnect()
                logger.debug("Disconnected MCP server '%s' during cleanup", server_name)
            except Exception:
                logger.warning(
                    "Error disconnecting MCP server '%s' during cleanup",
                    server_name,
                    exc_info=True,
                )

        with self._pool_lock:
            for server_name, event in cleanup_events.items():
                current = self._transitions.get(server_name)
                if current is event:
                    self._transitions.pop(server_name, None)
                    event.set()
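
# Usage sketch (reviewer note, not part of the diff): callers pair acquire()
# with release() so the pooled connection is torn down only when the last
# holder lets go. `config` here is a hypothetical MCPServerConfig.
#
#     manager = MCPConnectionManager.get_instance()
#     client = manager.acquire(config)
#     try:
#         tools = client.list_tools()
#     finally:
#         manager.release(config.name)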
@@ -0,0 +1,99 @@
"""Structured error codes and exceptions for MCP server operations."""

from enum import Enum


class MCPErrorCode(Enum):
    """Standardized error codes for MCP operations."""

    MCP_INSTALL_FAILED = "MCP_INSTALL_FAILED"
    MCP_AUTH_MISSING = "MCP_AUTH_MISSING"
    MCP_CONNECT_TIMEOUT = "MCP_CONNECT_TIMEOUT"
    MCP_TOOL_NOT_FOUND = "MCP_TOOL_NOT_FOUND"
    MCP_PROTOCOL_MISMATCH = "MCP_PROTOCOL_MISMATCH"
    MCP_VERSION_CONFLICT = "MCP_VERSION_CONFLICT"
    MCP_HEALTH_FAILED = "MCP_HEALTH_FAILED"


class MCPError(ValueError):
    """Base exception for all structured MCP errors."""

    def __init__(self, code: MCPErrorCode, what: str, why: str, fix: str):
        self.code = code
        self.what = what
        self.why = why
        self.fix = fix
        self.message = (
            f"[{self.code.value}]\nWhat failed: {self.what}\nWhy: {self.why}\nFix: {self.fix}"
        )
        super().__init__(self.message)


class MCPToolNotFoundError(MCPError):
    def __init__(self, server: str, tool_name: str):
        super().__init__(
            code=MCPErrorCode.MCP_TOOL_NOT_FOUND,
            what=f"Tool '{tool_name}' not found on server '{server}'",
            why=f"The server '{server}' does not expose a tool named '{tool_name}'.",
            fix=f"Run 'hive mcp inspect {server}' to view available tools.",
        )


class MCPConnectTimeoutError(MCPError):
    def __init__(self, server: str, transport: str, timeout_sec: int):
        super().__init__(
            code=MCPErrorCode.MCP_CONNECT_TIMEOUT,
            what=f"Connection timed out while starting server '{server}'",
            why=f"The {transport} transport did not respond within {timeout_sec} seconds.",
            fix=f"Check if the server is running. Run 'hive mcp doctor {server}' for diagnostics.",
        )


class MCPAuthError(MCPError):
    def __init__(self, server: str, env_var: str):
        super().__init__(
            code=MCPErrorCode.MCP_AUTH_MISSING,
            what=f"Authentication failed for server '{server}'",
            why=f"The required environment variable '{env_var}' is missing or empty.",
            fix=f"Run: hive mcp config {server} --set {env_var}=<your-token>",
        )


class MCPInstallError(MCPError):
    def __init__(self, server: str, why: str, fix: str):
        super().__init__(
            code=MCPErrorCode.MCP_INSTALL_FAILED,
            what=f"Could not install MCP server '{server}'",
            why=why,
            fix=fix,
        )


class MCPProtocolMismatchError(MCPError):
    def __init__(self, server: str, detail: str):
        super().__init__(
            code=MCPErrorCode.MCP_PROTOCOL_MISMATCH,
            what=f"Protocol mismatch with server '{server}'",
            why=detail,
            fix=f"Check the MCP SDK version required by '{server}' matches your installation.",
        )


class MCPVersionConflictError(MCPError):
    def __init__(self, server: str, detail: str):
        super().__init__(
            code=MCPErrorCode.MCP_VERSION_CONFLICT,
            what=f"Version conflict with server '{server}'",
            why=detail,
            fix="Update or pin the MCP server package to a compatible version.",
        )


class MCPHealthCheckError(MCPError):
    def __init__(self, server: str, detail: str):
        super().__init__(
            code=MCPErrorCode.MCP_HEALTH_FAILED,
            what=f"Health check failed for server '{server}'",
            why=detail,
            fix=f"Run 'hive mcp doctor {server}' to diagnose the issue.",
        )
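
# Sketch of the intended error surface (reviewer note, not part of the diff):
# every failure renders as code + what/why/fix. For a hypothetical 'jira'
# server missing a token:
#
#     try:
#         raise MCPAuthError(server="jira", env_var="JIRA_API_TOKEN")
#     except MCPError as err:
#         print(err.code.value)  # "MCP_AUTH_MISSING"
#         print(err.fix)         # "Run: hive mcp config jira --set JIRA_API_TOKEN=<your-token>"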
@@ -0,0 +1,904 @@
"""MCP Server Registry: local state management for installed MCP servers."""

from __future__ import annotations

import json
import logging
import os
import tempfile
import tomllib
from datetime import UTC, datetime
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path
from typing import Any, Literal

import httpx

from framework.runner.mcp_client import MCPClient, MCPServerConfig
from framework.runner.mcp_connection_manager import MCPConnectionManager
from framework.runner.mcp_errors import (
    MCPError,
    MCPErrorCode,
    MCPInstallError,
)

logger = logging.getLogger(__name__)

DEFAULT_INDEX_URL = (
    "https://raw.githubusercontent.com/aden-hive/hive-mcp-registry/main/registry_index.json"
)
DEFAULT_REFRESH_INTERVAL_HOURS = 24
_LAST_FETCHED_FILENAME = "last_fetched"
_LEGACY_LAST_FETCHED_FILENAME = "last_fetched.json"

_DEFAULT_CONFIG = {
    "index_url": DEFAULT_INDEX_URL,
    "refresh_interval_hours": DEFAULT_REFRESH_INTERVAL_HOURS,
}


class MCPRegistry:
    """Manages local MCP server state in ~/.hive/mcp_registry/."""

    def __init__(self, base_path: Path | None = None):
        self._base = base_path or Path.home() / ".hive" / "mcp_registry"
        self._installed_path = self._base / "installed.json"
        self._config_path = self._base / "config.json"
        self._cache_dir = self._base / "cache"

    # ── Initialization ──────────────────────────────────────────────

    def initialize(self) -> None:
        """Create directory structure and default files if missing."""
        self._base.mkdir(parents=True, exist_ok=True)
        self._cache_dir.mkdir(parents=True, exist_ok=True)

        if not self._config_path.exists():
            self._write_json(self._config_path, _DEFAULT_CONFIG)

        if not self._installed_path.exists():
            self._write_json(self._installed_path, {"servers": {}})

    # ── Internal I/O ────────────────────────────────────────────────

    def _read_installed(self) -> dict:
        """Read installed.json, initializing if needed."""
        if not self._installed_path.exists():
            self.initialize()
        return json.loads(self._installed_path.read_text(encoding="utf-8"))

    def _write_installed(self, data: dict) -> None:
        """Write installed.json."""
        self._write_json(self._installed_path, data)

    def _read_config(self) -> dict:
        """Read config.json."""
        if not self._config_path.exists():
            self.initialize()
        return json.loads(self._config_path.read_text(encoding="utf-8"))

    def _read_cached_index(self) -> dict:
        """Read cached registry_index.json."""
        index_path = self._cache_dir / "registry_index.json"
        if not index_path.exists():
            return {"servers": {}}
        return json.loads(index_path.read_text(encoding="utf-8"))

    def _get_effective_manifest(
        self,
        name: str,
        entry: dict,
        cached_index: dict | None = None,
    ) -> dict:
        """Return the manifest currently in effect for an installed entry."""
        manifest = entry.get("manifest", {})
        if entry.get("source") != "registry":
            return manifest

        index = cached_index or self._read_cached_index()
        cached_manifest = index.get("servers", {}).get(name)
        if cached_manifest is not None:
            return cached_manifest

        # Fall back to persisted manifest data when the cache is unavailable.
        if isinstance(manifest, dict) and manifest:
            return manifest
        return {}

    @staticmethod
    def _write_json(path: Path, data: dict) -> None:
        """Write JSON to file atomically (write to temp, fsync, rename)."""
        content = json.dumps(data, indent=2) + "\n"
        fd, tmp_path = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as f:
                f.write(content)
                f.flush()
                os.fsync(f.fileno())
            os.replace(tmp_path, path)
        except BaseException:
            try:
                os.unlink(tmp_path)
            except OSError:
                pass
            raise

    # ── add_local ───────────────────────────────────────────────────

    def add_local(
        self,
        name: str,
        transport: str | None = None,
        manifest: dict | None = None,
        url: str | None = None,
        command: str | None = None,
        args: list[str] | None = None,
        env: dict[str, str] | None = None,
        headers: dict[str, str] | None = None,
        cwd: str | None = None,
        socket_path: str | None = None,
        description: str = "",
    ) -> dict:
        """Register a local/running MCP server.

        Can be called with an inline manifest dict, or with individual
        transport/url/command params that build a manifest automatically.
        """
        data = self._read_installed()
        if name in data["servers"]:
            raise MCPError(
                code=MCPErrorCode.MCP_INSTALL_FAILED,
                what=f"Server '{name}' already exists",
                why="A server with this name is already registered locally.",
                fix=f"Run: hive mcp remove {name} — then add it again.",
            )

        if manifest is not None:
            # Inline manifest provided directly
            manifest = {**manifest, "name": name}
            transport_config = manifest.get("transport", {})
            transport = transport or transport_config.get("default", "stdio")
            if "transport" not in manifest:
                manifest["transport"] = {"supported": [transport], "default": transport}
        else:
            # Build manifest from individual params
            if not transport:
                raise MCPError(
                    code=MCPErrorCode.MCP_INSTALL_FAILED,
                    what=f"Cannot register server '{name}'",
                    why="transport is required when manifest is not provided.",
                    fix="Pass --transport stdio|http|unix|sse when using hive mcp add.",
                )
            manifest = {
                "name": name,
                "description": description,
                "transport": {"supported": [transport], "default": transport},
            }
            match transport:
                case "http":
                    if not url:
                        raise MCPError(
                            code=MCPErrorCode.MCP_INSTALL_FAILED,
                            what=f"Cannot register server '{name}' with http transport",
                            why="url is required for http transport.",
                            fix="Pass --url https://your-server to hive mcp add.",
                        )
                    manifest["http"] = {"url": url, "headers": headers or {}}
                case "stdio":
                    if not command:
                        raise MCPError(
                            code=MCPErrorCode.MCP_INSTALL_FAILED,
                            what=f"Cannot register server '{name}' with stdio transport",
                            why="command is required for stdio transport.",
                            fix="Pass --command <executable> to hive mcp add.",
                        )
                    manifest["stdio"] = {
                        "command": command,
                        "args": args or [],
                        "env": env or {},
                        "cwd": cwd,
                    }
                case "unix":
                    if not socket_path:
                        raise MCPError(
                            code=MCPErrorCode.MCP_INSTALL_FAILED,
                            what=f"Cannot register server '{name}' with unix transport",
                            why="socket_path is required for unix transport.",
                            fix="Pass --socket-path /path/to/socket to hive mcp add.",
                        )
                    manifest["unix"] = {"socket_path": socket_path}
                    manifest["http"] = {"url": url or "http://localhost"}
                case "sse":
                    if not url:
                        raise MCPError(
                            code=MCPErrorCode.MCP_INSTALL_FAILED,
                            what=f"Cannot register server '{name}' with sse transport",
                            why="url is required for sse transport.",
                            fix="Pass --url https://your-server to hive mcp add.",
                        )
                    manifest["sse"] = {"url": url}
                case _:
                    raise MCPError(
                        code=MCPErrorCode.MCP_INSTALL_FAILED,
                        what=f"Cannot register server '{name}'",
                        why=f"Unsupported transport: '{transport}'.",
                        fix="Use one of: stdio, http, unix, sse.",
                    )

        entry = self._make_entry(
            source="local",
            manifest=manifest,
            transport=transport,
            installed_by="hive mcp add",
        )

        data["servers"][name] = entry
        self._write_installed(data)
        logger.info("Registered local MCP server '%s' (%s)", name, transport)
        return entry

    # ── install ─────────────────────────────────────────────────────

    def install(self, name: str, transport: str | None = None, version: str | None = None) -> dict:
        """Install a server from the cached remote registry index."""
        data = self._read_installed()
        if name in data["servers"]:
            raise MCPInstallError(
                server=name,
                why=f"Server '{name}' already exists in the registry.",
                fix=f"Run: hive mcp remove {name} — then install again.",
            )

        index = self._read_cached_index()
        manifest = index.get("servers", {}).get(name)
        if manifest is None:
            raise MCPInstallError(
                server=name,
                why=f"Server '{name}' not found in registry index.",
                fix="Run: hive mcp update — then try again.",
            )

        # Validate version if specified
        if version is not None:
            index_version = manifest.get("version")
            if index_version is None:
                raise MCPError(
                    code=MCPErrorCode.MCP_VERSION_CONFLICT,
                    what=f"Cannot pin version for '{name}'",
                    why="The registry manifest has no version field.",
                    fix="Run: hive mcp update — then omit --version to use latest.",
                )
            if index_version != version:
                raise MCPError(
                    code=MCPErrorCode.MCP_VERSION_CONFLICT,
                    what=f"Version mismatch for '{name}'",
                    why=f"Requested {version} but index has {index_version}.",
                    fix="Run: hive mcp update — or omit --version to use latest.",
                )

        transport_config = manifest.get("transport", {})
        supported = transport_config.get("supported", [])
        if transport is not None:
            if supported and transport not in supported:
                raise MCPError(
                    code=MCPErrorCode.MCP_INSTALL_FAILED,
                    what=f"Transport '{transport}' not supported by '{name}'",
                    why=f"Server supports: {supported}.",
                    fix=f"Use one of the supported transports: {supported}.",
                )
            resolved_transport = transport
        else:
            resolved_transport = transport_config.get("default", "stdio")

        entry = self._make_entry(
            source="registry",
            manifest=self._make_registry_manifest_snapshot(name, manifest),
            transport=resolved_transport,
            installed_by="hive mcp install",
            pinned=version is not None,
            auto_update=version is None,
            resolved_package_version=manifest.get("version"),
        )

        data["servers"][name] = entry
        self._write_installed(data)
        logger.info(
            "Installed MCP server '%s' v%s from registry",
            name,
            entry["manifest_version"],
        )
        return entry

    # ── remove / enable / disable ───────────────────────────────────

    def remove(self, name: str) -> None:
        """Remove a server from the registry."""
        data = self._read_installed()
        if name not in data["servers"]:
            raise MCPError(
                code=MCPErrorCode.MCP_INSTALL_FAILED,
                what=f"Cannot remove server '{name}'",
                why="Server is not installed.",
                fix="Run: hive mcp list — to see installed servers.",
            )
        del data["servers"][name]
        self._write_installed(data)
        logger.info("Removed MCP server '%s'", name)

    def enable(self, name: str) -> None:
        """Enable a disabled server."""
        self._set_enabled(name, enabled=True)

    def disable(self, name: str) -> None:
        """Disable a server without removing it."""
        self._set_enabled(name, enabled=False)

    def _set_enabled(self, name: str, *, enabled: bool) -> None:
        data = self._read_installed()
        if name not in data["servers"]:
            raise MCPError(
                code=MCPErrorCode.MCP_INSTALL_FAILED,
                what=f"Cannot {'enable' if enabled else 'disable'} server '{name}'",
                why="Server is not installed.",
                fix="Run: hive mcp list — to see installed servers.",
            )
        data["servers"][name]["enabled"] = enabled
        self._write_installed(data)
        logger.info("%s MCP server '%s'", "Enabled" if enabled else "Disabled", name)

    # ── list / get ──────────────────────────────────────────────────

    def list_installed(self) -> list[dict]:
        """Return all installed servers as a list of dicts with name included."""
        data = self._read_installed()
        return [{"name": name, **entry} for name, entry in data["servers"].items()]

    def get_server(self, name: str) -> dict | None:
        """Get a single installed server entry by name, or None if not found."""
        data = self._read_installed()
        entry = data["servers"].get(name)
        if entry is None:
            return None
        return {"name": name, **entry}

    def list_available(self) -> list[dict]:
        """List all servers from cached remote index."""
        index = self._read_cached_index()
        return [{"name": name, **m} for name, m in index.get("servers", {}).items()]

    # ── set_override ────────────────────────────────────────────────

    def set_override(
        self,
        name: str,
        key: str,
        value: str,
        override_type: Literal["env", "headers"] = "env",
    ) -> None:
        """Set an env or header override for a server."""
        data = self._read_installed()
        if name not in data["servers"]:
            raise MCPError(
                code=MCPErrorCode.MCP_INSTALL_FAILED,
                what=f"Cannot set override for server '{name}'",
                why="Server is not installed.",
                fix="Run: hive mcp list — to see installed servers.",
            )
        if override_type not in ("env", "headers"):
            raise MCPError(
                code=MCPErrorCode.MCP_INSTALL_FAILED,
                what=f"Invalid override type '{override_type}' for server '{name}'",
                why="Override type must be 'env' or 'headers'.",
                fix="Use --type env or --type headers.",
            )
        data["servers"][name]["overrides"][override_type][key] = value
        self._write_installed(data)
        logger.info("Set %s override %s for MCP server '%s'", override_type, key, name)

    # ── search ──────────────────────────────────────────────────────

    def search(self, query: str) -> list[dict]:
        """Search registry index by name, tag, description, or tool name."""
        query_lower = query.lower()
        index = self._read_cached_index()
        matches = []

        for name, manifest in index.get("servers", {}).items():
            if self._matches_query(name, manifest, query_lower):
                matches.append({"name": name, **manifest})

        return matches

    @staticmethod
    def _matches_query(name: str, manifest: dict, query: str) -> bool:
        """Check if a manifest matches a search query."""
        if query in name.lower():
            return True

        description = manifest.get("description", "")
        if query in description.lower():
            return True

        for tag in manifest.get("tags", []):
            if query in tag.lower():
                return True

        for tool in manifest.get("tools", []):
            tool_name = tool.get("name", "") if isinstance(tool, dict) else str(tool)
            if query in tool_name.lower():
                return True

        return False

    # ── update_index ────────────────────────────────────────────────

    def is_index_stale(self) -> bool:
        """Check if the cached registry index needs refreshing."""
        last_fetched_path = self._cache_dir / _LAST_FETCHED_FILENAME
        legacy_path = self._cache_dir / _LEGACY_LAST_FETCHED_FILENAME
        if not last_fetched_path.exists() and not legacy_path.exists():
            return True

        try:
            path = last_fetched_path if last_fetched_path.exists() else legacy_path
            data = json.loads(path.read_text(encoding="utf-8"))
            last_fetched = datetime.fromisoformat(data["timestamp"])
            config = self._read_config()
            interval_hours = config.get("refresh_interval_hours", DEFAULT_REFRESH_INTERVAL_HOURS)
            age_hours = (datetime.now(UTC) - last_fetched).total_seconds() / 3600
            return age_hours >= interval_hours
        except (KeyError, ValueError, OSError):
            return True

    def update_index(self) -> int:
        """Fetch the latest registry index from remote and cache it.

        Returns the number of servers in the index.
        """
        config = self._read_config()
        url = config.get("index_url", DEFAULT_INDEX_URL)

        response = httpx.get(url, timeout=10.0)
        response.raise_for_status()
        index = response.json()

        self._write_json(self._cache_dir / "registry_index.json", index)
        # Write last_fetched atomically too
        self._write_json(
            self._cache_dir / _LAST_FETCHED_FILENAME,
            {"timestamp": datetime.now(UTC).isoformat()},
        )

        server_count = len(index.get("servers", {}))
        logger.info("Updated registry index: %d servers available", server_count)
        return server_count

    # ── load_agent_selection ────────────────────────────────────────

    def load_agent_selection(self, agent_path: Path) -> tuple[list[dict[str, Any]], int | None]:
        """Load mcp_registry.json from an agent directory and resolve servers.

        Returns:
            (server_config_dicts, max_tools) for :meth:`ToolRegistry.load_registry_servers`.
            ``max_tools`` is ``None`` when omitted or invalid in JSON.
        """
        registry_json_path = agent_path / "mcp_registry.json"
        if not registry_json_path.exists():
            return [], None

        selection = json.loads(registry_json_path.read_text(encoding="utf-8"))

        # Validate types at the JSON boundary. Bad fields are dropped with a
        # warning so the agent still starts (graceful degradation).
        expected_types: dict[str, type] = {
            "include": list,
            "tags": list,
            "exclude": list,
            "profile": str,
            "max_tools": int,
            "versions": dict,
        }
        validated: dict[str, Any] = {}
        for field, expected in expected_types.items():
            value = selection.get(field)
            if value is None:
                continue
            if not isinstance(value, expected):
                logger.warning(
                    "mcp_registry.json: '%s' must be %s, got %s; ignoring",
                    field,
                    expected.__name__,
                    type(value).__name__,
                )
                continue
            validated[field] = value

        max_tools = validated.get("max_tools")
        configs = self.resolve_for_agent(
            include=validated.get("include"),
            tags=validated.get("tags"),
            exclude=validated.get("exclude"),
            profile=validated.get("profile"),
            max_tools=max_tools,
            versions=validated.get("versions"),
        )
        return [self._server_config_to_dict(c) for c in configs], max_tools
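
    # Illustrative mcp_registry.json (reviewer note, not part of the diff);
    # field names match the validation table above, server names are made up:
    #
    #     {
    #       "profile": "engineering",
    #       "include": ["github"],
    #       "tags": ["tickets"],
    #       "exclude": ["slack"],
    #       "max_tools": 40,
    #       "versions": {"github": "1.2.0"}
    #     }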

    # ── resolve_for_agent ───────────────────────────────────────────

    def resolve_for_agent(
        self,
        include: list[str] | None = None,
        tags: list[str] | None = None,
        exclude: list[str] | None = None,
        profile: str | None = None,
        max_tools: int | None = None,
        versions: dict[str, str] | None = None,
    ) -> list[MCPServerConfig]:
        """Resolve installed servers matching agent selection criteria.

        Selection precedence per PRD section 7.2:
        1. profile expands to server names (union with include + tags)
        2. include adds explicit servers
        3. tags adds servers whose tags overlap
        4. exclude removes (always wins)
        5. Load order: include-order first, then alphabetical for tag/profile matches

        Returns list of MCPServerConfig objects ready for ToolRegistry.
        """
        data = self._read_installed()
        servers = data.get("servers", {})
        cached_index = self._read_cached_index()
        exclude_set = set(exclude or [])

        # Phase 1: collect profile-matched servers (alphabetical)
        profile_matched: list[str] = []
        if profile:
            for name, entry in sorted(servers.items()):
                if name in exclude_set:
                    continue
                if profile == "all":
                    profile_matched.append(name)
                else:
                    manifest = self._get_effective_manifest(name, entry, cached_index)
                    profiles = manifest.get("hive", {}).get("profiles", [])
                    if profile in profiles:
                        profile_matched.append(name)

        # Phase 2: collect tag-matched servers (alphabetical)
        tag_matched: list[str] = []
        if tags:
            tag_set = set(tags)
            for name, entry in sorted(servers.items()):
                if name in exclude_set:
                    continue
                manifest = self._get_effective_manifest(name, entry, cached_index)
                server_tags = set(manifest.get("tags", []))
                if tag_set & server_tags:
                    tag_matched.append(name)

        # Phase 3: build final ordered list
        # include-order first, then alphabetical for profile/tag matches
        selected: list[str] = []
        seen: set[str] = set()

        for name in include or []:
            if name not in seen and name not in exclude_set:
                selected.append(name)
                seen.add(name)

        for name in profile_matched:
            if name not in seen:
                selected.append(name)
                seen.add(name)

        for name in tag_matched:
            if name not in seen:
                selected.append(name)
                seen.add(name)

        # Build configs, tracking aggregate tool count for max_tools cap (FR-56)
        configs: list[MCPServerConfig] = []
        total_tools = 0
        for name in selected:
            entry = servers.get(name)
            if entry is None:
                logger.warning(
                    "Server '%s' requested but not installed. Run: hive mcp install %s",
                    name,
                    name,
                )
                continue
            if not entry.get("enabled", True):
                continue

            manifest = self._get_effective_manifest(name, entry, cached_index)

            # Check version pin (VC-6)
            if versions and name in versions:
                installed_version = entry.get("manifest_version", "0.0.0")
                pinned_version = versions[name]
                if installed_version != pinned_version:
                    logger.warning(
                        "Server '%s' version mismatch: installed=%s, pinned=%s. "
                        "Run: hive mcp update %s",
                        name,
                        installed_version,
                        pinned_version,
                        name,
                    )
                    continue

            # Check tool count cap before adding (FR-56), using manifest tool list when present.
            # When ``tools`` is empty (e.g. ``add_local``), counts are unknown here—callers should
            # pass the same ``max_tools`` to ToolRegistry.load_registry_servers to cap registration.
            manifest_tools = manifest.get("tools", [])
            server_tool_count = len(manifest_tools)
            if max_tools is not None and server_tool_count == 0:
                logger.debug(
                    "Server '%s' has no tools list in manifest; max_tools enforced at registration",
                    name,
                )
            elif max_tools is not None and total_tools + server_tool_count > max_tools:
                logger.info(
                    "Skipping server '%s' (%d tools): would exceed max_tools=%d",
                    name,
                    server_tool_count,
                    max_tools,
                )
                continue

            config = self._manifest_to_server_config(
                name,
                manifest,
                entry.get("overrides", {}),
                transport_override=entry.get("transport"),
            )
            if config is not None:
                configs.append(config)
                total_tools += server_tool_count

        return configs
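
    # Worked example of the precedence rules (reviewer note, not part of the
    # diff), with hypothetical installed servers a, b, c, d where b and c carry
    # tag "tickets" and d belongs to profile "eng":
    #
    #     resolve_for_agent(include=["c"], tags=["tickets"],
    #                       profile="eng", exclude=["b"])
    #
    # yields configs for [c, d]: c comes first from include-order, d follows
    # from the profile match, b is removed by exclude (exclude always wins),
    # and c's duplicate tag match is deduplicated by the ``seen`` set.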

    def _manifest_to_server_config(
        self,
        name: str,
        manifest: dict,
        overrides: dict | None = None,
        transport_override: str | None = None,
    ) -> MCPServerConfig | None:
        """Convert a manifest and overrides to MCPServerConfig."""
        overrides = overrides or {}
        transport_config = manifest.get("transport", {})
        transport = transport_override or transport_config.get("default", "stdio")
        description = manifest.get("description", "")

        match transport:
            case "stdio":
                stdio_config = manifest.get("stdio", {})
                merged_env = {
                    **stdio_config.get("env", {}),
                    **overrides.get("env", {}),
                }
                return MCPServerConfig(
                    name=name,
                    transport="stdio",
                    command=stdio_config.get("command"),
                    args=stdio_config.get("args", []),
                    env=merged_env,
                    cwd=stdio_config.get("cwd"),
                    description=description,
                )
            case "http":
                http_config = manifest.get("http", {})
                url = http_config.get("url", "")
                merged_headers = {
                    **http_config.get("headers", {}),
                    **overrides.get("headers", {}),
                }
                return MCPServerConfig(
                    name=name,
                    transport="http",
                    url=url,
                    headers=merged_headers,
                    description=description,
                )
            case "unix":
                unix_config = manifest.get("unix", {})
                http_config = manifest.get("http", {})
                merged_headers = {
                    **http_config.get("headers", {}),
                    **overrides.get("headers", {}),
                }
                return MCPServerConfig(
                    name=name,
                    transport="unix",
                    socket_path=unix_config.get("socket_path"),
                    url=http_config.get("url") or "http://localhost",
                    headers=merged_headers,
                    description=description,
                )
            case "sse":
                sse_config = manifest.get("sse", {})
                merged_headers = {
                    **sse_config.get("headers", {}),
                    **overrides.get("headers", {}),
                }
                return MCPServerConfig(
                    name=name,
                    transport="sse",
                    url=sse_config.get("url", ""),
                    headers=merged_headers,
                    description=description,
                )
            case _:
                logger.warning(
                    "Unsupported transport '%s' for server '%s'",
                    transport,
                    name,
                )
                return None

    @staticmethod
    def _server_config_to_dict(config: MCPServerConfig) -> dict[str, Any]:
        """Convert MCPServerConfig to plain dict for ToolRegistry.register_mcp_server()."""
        return {
            "name": config.name,
            "transport": config.transport,
            "command": config.command,
            "args": config.args,
            "env": config.env,
            "cwd": config.cwd,
            "url": config.url,
            "headers": config.headers,
            "socket_path": config.socket_path,
            "description": config.description,
        }

    # ── run_health_check ────────────────────────────────────────────

    def health_check(self, name: str | None = None) -> dict | dict[str, dict]:
        """Check health of installed server(s). Updates telemetry fields.

        If name is None, checks all installed servers and returns
        a dict mapping server names to their health results.
        """
        if name is None:
            results = {}
            for server in self.list_installed():
                results[server["name"]] = self.health_check(server["name"])
            return results

        data = self._read_installed()
        if name not in data["servers"]:
            raise MCPError(
                code=MCPErrorCode.MCP_HEALTH_FAILED,
                what=f"Cannot health-check server '{name}'",
                why="Server is not installed.",
                fix="Run: hive mcp list — to see installed servers.",
            )

        entry = data["servers"][name]
        manifest = self._get_effective_manifest(name, entry)
        config = self._manifest_to_server_config(
            name,
            manifest,
            entry.get("overrides", {}),
            transport_override=entry.get("transport"),
        )
        now = datetime.now(UTC).isoformat()

        result: dict[str, Any] = {
            "name": name,
            "status": "unknown",
            "tools": 0,
            "error": None,
        }

        if config is None:
            transport = entry.get("transport", "unknown")
            result["status"] = "unhealthy"
            result["error"] = f"Unsupported transport '{transport}'"
            entry["last_health_status"] = "unhealthy"
            entry["last_error"] = result["error"]
            entry["last_health_check_at"] = now
            self._write_installed(data)
            return result

        manager = MCPConnectionManager.get_instance()

        try:
            if manager.has_connection(name):
                is_healthy = manager.health_check(name)
                if not is_healthy:
                    raise MCPError(
                        code=MCPErrorCode.MCP_HEALTH_FAILED,
                        what=f"Health check failed for server '{name}'",
                        why="Shared MCP connection reported unhealthy.",
                        fix=f"Run: hive mcp doctor {name} — for diagnostics.",
                    )
                pooled_client = manager.acquire(config)
                try:
                    tools = pooled_client.list_tools()
                finally:
                    manager.release(name)
            else:
                with MCPClient(config) as client:
                    tools = client.list_tools()

            result["status"] = "healthy"
            result["tools"] = len(tools)
            entry["last_health_status"] = "healthy"
            entry["last_error"] = None
            entry["last_validated_with_hive_version"] = self._get_hive_version()
        except Exception as exc:
            result["status"] = "unhealthy"
            result["error"] = str(exc)
            entry["last_health_status"] = "unhealthy"
            entry["last_error"] = str(exc)

        entry["last_health_check_at"] = now
        self._write_installed(data)
        return result

    def run_health_check(self, name: str | None = None) -> dict | dict[str, dict]:
        """Backward-compatible wrapper for the public health_check API."""
        return self.health_check(name)

    @staticmethod
    def _get_hive_version() -> str:
        """Get the current Hive version."""
        try:
            return version("framework")
        except PackageNotFoundError:
            project_toml = Path(__file__).resolve().parents[2] / "pyproject.toml"
            if not project_toml.exists():
                return "unknown"
            try:
                with project_toml.open("rb") as f:
                    data = tomllib.load(f)
                return data.get("project", {}).get("version", "unknown")
            except (tomllib.TOMLDecodeError, OSError):
                return "unknown"

    # ── helpers ──────────────────────────────────────────────────────

    @staticmethod
    def _make_entry(
        *,
        source: str,
        manifest: dict,
        transport: str,
        installed_by: str,
        pinned: bool = False,
        auto_update: bool = False,
        resolved_package_version: str | None = None,
    ) -> dict:
        """Build a standard installed server entry."""
        now = datetime.now(UTC).isoformat()
        return {
            "source": source,
            "manifest_version": manifest.get("version", "0.0.0"),
            "manifest": manifest,
            "installed_at": now,
            "installed_by": installed_by,
            "transport": transport,
            "enabled": True,
            "pinned": pinned,
            "auto_update": auto_update,
            "resolved_package_version": resolved_package_version,
            "overrides": {"env": {}, "headers": {}},
            "last_health_check_at": None,
            "last_health_status": None,
            "last_error": None,
            "last_used_at": None,
            "last_validated_with_hive_version": None,
        }

    @staticmethod
    def _make_registry_manifest_snapshot(name: str, manifest: dict) -> dict[str, Any]:
        """Persist a full manifest snapshot for registry-installed servers."""
        manifest_snapshot = dict(manifest)
        manifest_snapshot["name"] = name
        return manifest_snapshot
@@ -0,0 +1,906 @@
"""CLI commands for MCP server registry management.

Commands:
    hive mcp install <name>   Install a server from the registry
    hive mcp add               Register a local/running MCP server
    hive mcp remove <name>     Remove an installed server
    hive mcp enable <name>     Enable a server
    hive mcp disable <name>    Disable a server
    hive mcp list              List installed servers
    hive mcp info <name>       Show server details
    hive mcp config <name>     Set env/header overrides
    hive mcp search <query>    Search the registry index
    hive mcp health [name]     Check server health
    hive mcp update            Refresh index and update installed servers
    hive mcp update <name>     Update a single installed server
"""

from __future__ import annotations

import json
import os
import sys
from pathlib import Path
from typing import Any

# ── Shared helpers ──────────────────────────────────────────────────


def _get_registry(base_path: Path | None = None):
    """Initialize and return an MCPRegistry instance."""
    from framework.runner.mcp_registry import MCPRegistry

    registry = MCPRegistry(base_path=base_path)
    registry.initialize()
    return registry


def _ensure_index_available(registry) -> bool:
    """Ensure the registry index is cached locally.

    If no index exists or the cache is stale, fetches a fresh copy.
    Returns True if a usable index exists, False otherwise.

    Semantics:
    - Stale cache + refresh fails -> warn and continue with stale cache (True)
    - No cache + refresh fails -> hard fail (False)
    """
    import httpx

    cache_exists = (registry._cache_dir / "registry_index.json").exists()

    if registry.is_index_stale():
        print("Updating registry index...", file=sys.stderr)
        try:
            count = registry.update_index()
            print(f"Registry index updated ({count} servers available).", file=sys.stderr)
            return True
        except (httpx.HTTPError, OSError) as exc:
            if cache_exists:
                print(
                    f"Warning: failed to update registry index: {exc}\nUsing cached index.",
                    file=sys.stderr,
                )
                return True
            print(
                f"Error: no registry index available and refresh failed: {exc}\n"
                "Check your network connection and try: hive mcp update",
                file=sys.stderr,
            )
            return False

    return cache_exists


_SECURITY_NOTICE = (
    "Registry servers run code on your machine. Only install servers you trust.\n"
    "Learn more: https://github.com/aden-hive/hive-mcp-registry"
)
_NOTICE_SENTINEL = ".security_notice_shown"


def _print_security_notice_if_first_use(registry_base: Path) -> None:
    """Print a one-time security notice on first registry install.

    Only prints the notice. Call _mark_security_notice_shown() after
    a successful install to persist the sentinel.
    """
    sentinel = registry_base / _NOTICE_SENTINEL
    if sentinel.exists():
        return
    print(f"\n {_SECURITY_NOTICE}\n", file=sys.stderr)


def _mark_security_notice_shown(registry_base: Path) -> None:
    """Persist the security notice sentinel after a successful install."""
    sentinel = registry_base / _NOTICE_SENTINEL
    try:
        sentinel.touch()
    except OSError:
        pass


def _prompt_for_missing_credentials(
    registry,
    name: str,
    manifest: dict,
) -> None:
    """Prompt for required credentials not already set in env or overrides."""
    credentials = manifest.get("credentials", [])
    if not credentials:
        return

    server = registry.get_server(name)
    existing_overrides = server.get("overrides", {}).get("env", {}) if server else {}

    prompted = False
    for cred in credentials:
        if not isinstance(cred, dict):
            continue
        env_var = cred.get("env_var", "")
        if not env_var:
            continue
        required = cred.get("required", False)
        if not required:
            continue

        # Skip if already in environment or overrides
        if os.environ.get(env_var) or existing_overrides.get(env_var):
            continue

        if not prompted:
            print(f"\n{name} requires credentials:", file=sys.stderr)
            prompted = True

        description = cred.get("description", env_var)
        help_url = cred.get("help_url", "")
        help_hint = f" (get one at {help_url})" if help_url else ""

        try:
            value = input(f" {description}{help_hint}\n {env_var}: ").strip()
        except (EOFError, KeyboardInterrupt):
            print("\nSkipped credential prompting.", file=sys.stderr)
            return

        if value:
            registry.set_override(name, env_var, value, override_type="env")


def _parse_key_value_pairs(values: list[str]) -> dict[str, str]:
    """Parse KEY=VAL pairs from CLI args. Raises ValueError on bad format."""
    result = {}
    for item in values:
        if "=" not in item:
            raise ValueError(
                f"Invalid format: '{item}'. Expected KEY=VALUE.\n"
                f"Example: --set JIRA_API_TOKEN=abc123"
            )
        key, _, value = item.partition("=")
        if not key:
            raise ValueError(f"Invalid format: '{item}'. Key cannot be empty.")
        result[key] = value
    return result
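
# Example (reviewer note, not part of the diff):
#     _parse_key_value_pairs(["JIRA_API_TOKEN=abc123", "REGION=us"])
#     -> {"JIRA_API_TOKEN": "abc123", "REGION": "us"}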


def _find_agents_using_server(registry, name: str) -> list[str]:
    """Scan agent directories for mcp_registry.json files that would load a server.

    Uses MCPRegistry.load_agent_selection() to resolve actual selection logic
    so results stay consistent with runtime behavior.
    """
    agent_dirs: list[Path] = []
    # parents: [0]=runner, [1]=framework, [2]=core, [3]=hive (project root)
    # NOTE: This path arithmetic assumes running from the source tree layout.
    # It will not resolve correctly if installed via pip into site-packages.
    project_root = Path(__file__).resolve().parents[3]
    core_dir = Path(__file__).resolve().parents[2]

    candidates = [
        project_root / "exports",
        core_dir / "exports",
        core_dir / "framework" / "agents",
    ]
    for candidate in candidates:
        if candidate.is_dir():
            for child in candidate.iterdir():
                if child.is_dir():
                    agent_dirs.append(child)

    matches = []
    for agent_dir in agent_dirs:
        registry_json = agent_dir / "mcp_registry.json"
        if not registry_json.exists():
            continue
        try:
            # load_agent_selection() returns (configs, max_tools); unpack the tuple
            configs, _ = registry.load_agent_selection(agent_dir)
            resolved_names = {c["name"] for c in configs}
            if name in resolved_names:
                matches.append(str(agent_dir))
        except Exception:
            continue

    return matches


def _render_installed_table(entries: list[dict]) -> None:
    """Render installed servers as a formatted table."""
    if not entries:
        print("No servers installed.")
        print("Run 'hive mcp install <name>' or 'hive mcp add' to get started.")
        return

    # Column widths
    name_w = max(len(e["name"]) for e in entries)
    name_w = max(name_w, 4)
    transport_w = max(len(e.get("transport", "")) for e in entries)
    transport_w = max(transport_w, 9)

    header = (
        f"  {'NAME':<{name_w}}  "
        f"{'TRANSPORT':<{transport_w}}  "
        f"{'ENABLED':<7}  "
        f"{'HEALTH':<9}  "
        f"{'TOOLS':<5}  "
        f"{'TRUST':<10}  "
        f"{'SOURCE'}"
    )
    print(header)
    print("  " + "─" * (len(header) - 2))

    for entry in entries:
        enabled = "yes" if entry.get("enabled", True) else "no"
        health = entry.get("last_health_status") or "unknown"
        health_sym = {"healthy": "✓", "unhealthy": "✗"}.get(health, "●")
        source = entry.get("source", "")
        manifest = entry.get("manifest", {})
        tools_count = str(len(manifest.get("tools", [])))
        trust_tier = manifest.get("status", "")
        print(
            f"  {entry['name']:<{name_w}}  "
            f"{entry.get('transport', ''):<{transport_w}}  "
            f"{enabled:<7}  "
            f"{health_sym} {health:<7}  "
            f"{tools_count:<5}  "
            f"{trust_tier:<10}  "
            f"{source}"
        )


def _render_available_table(entries: list[dict]) -> None:
    """Render available registry servers as a formatted table."""
    if not entries:
        print("No servers in registry index.")
        print("Run 'hive mcp update' to refresh the index.")
        return

    name_w = max(len(e["name"]) for e in entries)
    name_w = max(name_w, 4)

    header = f"  {'NAME':<{name_w}}  {'VERSION':<9}  {'STATUS':<10}  DESCRIPTION"
    print(header)
    print("  " + "─" * (len(header) - 2))

    for entry in entries:
        version = entry.get("version", "")
        status = entry.get("status", "community")
        desc = entry.get("description", "")
        # Truncate long descriptions
        if len(desc) > 60:
            desc = desc[:57] + "..."
        print(f"  {entry['name']:<{name_w}}  {version:<9}  {status:<10}  {desc}")


def _mask_overrides(overrides: dict) -> dict:
    """Replace override values with '<set>' markers. Shared by all output paths."""
    masked: dict[str, dict[str, str]] = {}
    if overrides.get("env"):
        masked["env"] = dict.fromkeys(overrides["env"], "<set>")
    else:
        masked["env"] = {}
    if overrides.get("headers"):
        masked["headers"] = dict.fromkeys(overrides["headers"], "<set>")
    else:
        masked["headers"] = {}
    return masked
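
# Example (illustrative): secret values are replaced, never echoed:
#   _mask_overrides({"env": {"TOKEN": "s3cr3t"}, "headers": {}})
#   -> {"env": {"TOKEN": "<set>"}, "headers": {}}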


def _emit_json(data: Any) -> None:
    """Print data as formatted JSON."""
    print(json.dumps(data, indent=2, default=str))


# ── Command registration ───────────────────────────────────────────


def register_mcp_commands(subparsers) -> None:
    """Register the ``hive mcp`` subcommand group."""
    mcp_parser = subparsers.add_parser("mcp", help="Manage MCP servers")
    mcp_sub = mcp_parser.add_subparsers(dest="mcp_command", required=True)

    # ── install ──
    install_p = mcp_sub.add_parser("install", help="Install a server from the registry")
    install_p.add_argument("name", help="Server name in the registry")
    install_p.add_argument(
        "--version", dest="version", default=None, help="Pin to a specific version"
    )
    install_p.add_argument(
        "--transport", default=None, help="Override default transport (stdio, http, unix, sse)"
    )
    install_p.set_defaults(func=cmd_mcp_install)

    # ── add ──
    add_p = mcp_sub.add_parser("add", help="Register a local/running MCP server")
    add_p.add_argument("--name", required=False, help="Server name")
    add_p.add_argument(
        "--transport",
        choices=["stdio", "http", "unix", "sse"],
        default=None,
        help="Transport type",
    )
    add_p.add_argument("--url", default=None, help="Server URL (http, unix, sse)")
    add_p.add_argument("--command", default=None, help="Command to run (stdio)")
    add_p.add_argument("--args", nargs="*", default=None, help="Command arguments (stdio)")
    add_p.add_argument("--socket-path", default=None, help="Unix socket path")
    add_p.add_argument("--description", default="", help="Server description")
    add_p.add_argument("--from", dest="from_manifest", default=None, help="Path to manifest.json")
    add_p.set_defaults(func=cmd_mcp_add)

    # ── remove ──
    remove_p = mcp_sub.add_parser("remove", help="Remove an installed server")
    remove_p.add_argument("name", help="Server name")
    remove_p.set_defaults(func=cmd_mcp_remove)

    # ── enable ──
    enable_p = mcp_sub.add_parser("enable", help="Enable a disabled server")
    enable_p.add_argument("name", help="Server name")
    enable_p.set_defaults(func=cmd_mcp_enable)

    # ── disable ──
    disable_p = mcp_sub.add_parser("disable", help="Disable a server without removing it")
    disable_p.add_argument("name", help="Server name")
    disable_p.set_defaults(func=cmd_mcp_disable)

    # ── list ──
    list_p = mcp_sub.add_parser("list", help="List servers")
    list_p.add_argument(
        "--available", action="store_true", help="Show available servers from registry"
    )
    list_p.add_argument("--json", dest="output_json", action="store_true", help="Output as JSON")
    list_p.set_defaults(func=cmd_mcp_list)

    # ── info ──
    info_p = mcp_sub.add_parser("info", help="Show server details")
    info_p.add_argument("name", help="Server name")
    info_p.add_argument("--json", dest="output_json", action="store_true", help="Output as JSON")
    info_p.set_defaults(func=cmd_mcp_info)

    # ── config ──
    config_p = mcp_sub.add_parser("config", help="Set server configuration overrides")
    config_p.add_argument("name", help="Server name")
    config_p.add_argument(
        "--set",
        dest="set_env",
        nargs="+",
        metavar="KEY=VAL",
        help="Set environment variable overrides",
    )
    config_p.add_argument(
        "--set-header", dest="set_header", nargs="+", metavar="KEY=VAL", help="Set header overrides"
    )
    config_p.set_defaults(func=cmd_mcp_config)

    # ── search ──
    search_p = mcp_sub.add_parser("search", help="Search the registry")
    search_p.add_argument("query", help="Search term (name, tag, description, tool name)")
    search_p.add_argument("--json", dest="output_json", action="store_true", help="Output as JSON")
    search_p.set_defaults(func=cmd_mcp_search)

    # ── health ──
    health_p = mcp_sub.add_parser("health", help="Check server health")
    health_p.add_argument("name", nargs="?", default=None, help="Server name (all if omitted)")
    health_p.add_argument("--json", dest="output_json", action="store_true", help="Output as JSON")
    health_p.set_defaults(func=cmd_mcp_health)

    # ── update ──
    update_p = mcp_sub.add_parser(
        "update", help="Update installed servers or refresh the registry index"
    )
    update_p.add_argument(
        "name",
        nargs="?",
        default=None,
        help="Server name to update (omit to update all registry servers)",
    )
    update_p.set_defaults(func=cmd_mcp_update)
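
# Illustrative wiring (a sketch; the real entry point lives elsewhere in the
# CLI and may differ):
#   parser = argparse.ArgumentParser(prog="hive")
#   subparsers = parser.add_subparsers(dest="command", required=True)
#   register_mcp_commands(subparsers)
#   args = parser.parse_args()
#   raise SystemExit(args.func(args))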


# ── P0 command handlers ────────────────────────────────────────────


def cmd_mcp_install(args) -> int:
    """Install a server from the registry index."""
    registry = _get_registry()
    _print_security_notice_if_first_use(registry._base)
    if not _ensure_index_available(registry):
        return 1

    try:
        entry = registry.install(
            args.name,
            transport=args.transport,
            version=args.version,
        )
    except ValueError as exc:
        print(f"Error: {exc}", file=sys.stderr)
        return 1

    _mark_security_notice_shown(registry._base)

    version_str = entry.get("manifest_version", "")
    transport = entry.get("transport", "")
    print(f"✓ Installed {args.name} v{version_str} ({transport})")

    # Prompt for credentials defined in the manifest
    manifest = entry.get("manifest", {})
    _prompt_for_missing_credentials(registry, args.name, manifest)

    print("\nNext steps:")
    print(f"  hive mcp health {args.name}    Check that the server is reachable")
    print(f"  hive mcp info {args.name}      View server details")
    return 0
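
# Illustrative invocations ("github" is a made-up server name):
#   hive mcp install github
#   hive mcp install github --version 1.2.0 --transport http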


def cmd_mcp_add(args) -> int:
    """Register a local/running MCP server."""
    registry = _get_registry()

    # Handle --from manifest.json
    if args.from_manifest:
        return _cmd_mcp_add_from_manifest(registry, args.from_manifest)

    if not args.name:
        print(
            "Error: --name is required.\n"
            "Usage: hive mcp add --name my-server --transport http --url http://localhost:8080\n"
            "  or:  hive mcp add --from manifest.json",
            file=sys.stderr,
        )
        return 1

    if not args.transport:
        print(
            f"Error: --transport is required.\n"
            f"Supported transports: stdio, http, unix, sse\n"
            f"Example: hive mcp add --name {args.name} --transport http --url http://localhost:8080",
            file=sys.stderr,
        )
        return 1

    try:
        entry = registry.add_local(
            name=args.name,
            transport=args.transport,
            url=args.url,
            command=args.command,
            args=args.args,
            socket_path=args.socket_path,
            description=args.description,
        )
    except ValueError as exc:
        print(f"Error: {exc}", file=sys.stderr)
        return 1

    print(f"✓ Registered {args.name} ({entry['transport']})")
    return 0


def _cmd_mcp_add_from_manifest(registry, manifest_path: str) -> int:
    """Register a server from a manifest.json file."""
    path = Path(manifest_path)
    if not path.exists():
        print(
            f"Error: manifest file not found: {manifest_path}\nCheck the path and try again.",
            file=sys.stderr,
        )
        return 1

    try:
        manifest = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        print(
            f"Error: invalid JSON in {manifest_path}: {exc}\n"
            f"Validate with: python -m json.tool {manifest_path}",
            file=sys.stderr,
        )
        return 1

    name = manifest.get("name")
    if not name:
        print(
            f"Error: manifest missing 'name' field.\nAdd a 'name' field to {manifest_path}.",
            file=sys.stderr,
        )
        return 1

    try:
        entry = registry.add_local(name=name, manifest=manifest)
    except ValueError as exc:
        print(f"Error: {exc}", file=sys.stderr)
        return 1

    print(f"✓ Registered {name} from {manifest_path} ({entry['transport']})")
    return 0
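
# Illustrative manifest sketch (keys other than "name" are assumptions here;
# "name" is the only field this function validates, the full schema is owned
# by the registry):
#   {
#     "name": "my-server",
#     "transport": "http",
#     "url": "http://localhost:8080"
#   }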


def cmd_mcp_remove(args) -> int:
    """Remove an installed server."""
    registry = _get_registry()
    try:
        registry.remove(args.name)
    except ValueError as exc:
        print(f"Error: {exc}", file=sys.stderr)
        return 1

    print(f"✓ Removed {args.name}")
    return 0


def cmd_mcp_enable(args) -> int:
    """Enable a disabled server."""
    registry = _get_registry()
    try:
        registry.enable(args.name)
    except ValueError as exc:
        print(f"Error: {exc}", file=sys.stderr)
        return 1

    print(f"✓ Enabled {args.name}")
    return 0


def cmd_mcp_disable(args) -> int:
    """Disable a server without removing it."""
    registry = _get_registry()
    try:
        registry.disable(args.name)
    except ValueError as exc:
        print(f"Error: {exc}", file=sys.stderr)
        return 1

    print(f"✓ Disabled {args.name}")
    return 0


def cmd_mcp_list(args) -> int:
    """List installed or available servers."""
    registry = _get_registry()

    if args.available:
        if not _ensure_index_available(registry):
            return 1
        entries = registry.list_available()
        if args.output_json:
            _emit_json(entries)
        else:
            _render_available_table(entries)
    else:
        entries = registry.list_installed()
        if args.output_json:
            safe_entries = []
            for entry in entries:
                safe = dict(entry)
                safe["overrides"] = _mask_overrides(safe.get("overrides", {}))
                safe_entries.append(safe)
            _emit_json(safe_entries)
        else:
            _render_installed_table(entries)

    return 0


def cmd_mcp_info(args) -> int:
    """Show full details for a server."""
    registry = _get_registry()
    server = registry.get_server(args.name)

    if server is None:
        print(
            f"Error: server '{args.name}' is not installed.\n"
            f"Run 'hive mcp list' to see installed servers.\n"
            f"Run 'hive mcp install {args.name}' to install from registry.",
            file=sys.stderr,
        )
        return 1

    # Enrich with agent usage for both JSON and human output
    agents = _find_agents_using_server(registry, args.name)
    if agents:
        server["used_by_agents"] = agents

    if args.output_json:
        safe = dict(server)
        safe["overrides"] = _mask_overrides(safe.get("overrides", {}))
        _emit_json(safe)
        return 0

    manifest = server.get("manifest", {})
    overrides = _mask_overrides(server.get("overrides", {}))
    tools = manifest.get("tools", [])
    status = manifest.get("status", "community")
    hive_block = manifest.get("hive", {})

    print(f"{server['name']}")
    print("=" * 50)

    # Core info
    print(f"  Source: {server.get('source', '')}")
    print(f"  Transport: {server.get('transport', '')}")
    print(f"  Version: {server.get('manifest_version', 'unknown')}")
    print(f"  Trust tier: {status}")
    print(f"  Enabled: {'yes' if server.get('enabled', True) else 'no'}")

    # Description
    desc = manifest.get("description", "")
    if desc:
        print(f"  Description: {desc}")

    # Health
    health = server.get("last_health_status")
    if health:
        health_sym = {"healthy": "✓", "unhealthy": "✗"}.get(health, "●")
        print(f"  Health: {health_sym} {health}")
        last_check = server.get("last_health_check_at")
        if last_check:
            print(f"  Last check: {last_check}")
        last_error = server.get("last_error")
        if last_error:
            print(f"  Last error: {last_error}")

    # Tools
    if tools:
        print(f"\n  Tools ({len(tools)}):")
        for tool in tools:
            if isinstance(tool, dict):
                tool_name = tool.get("name", "")
                tool_desc = tool.get("description", "")
                print(f"    • {tool_name}: {tool_desc}" if tool_desc else f"    • {tool_name}")
            else:
                print(f"    • {tool}")

    # Overrides
    env_overrides = overrides.get("env", {})
    header_overrides = overrides.get("headers", {})
    if env_overrides or header_overrides:
        print("\n  Overrides:")
        for key in env_overrides:
            print(f"    env.{key} = <set>")
        for key in header_overrides:
            print(f"    header.{key} = <set>")

    # Hive block
    if hive_block:
        profiles = hive_block.get("profiles", [])
        if profiles:
            print(f"\n  Profiles: {', '.join(profiles)}")
        min_ver = hive_block.get("min_version")
        if min_ver:
            print(f"  Min Hive version: {min_ver}")

    # Agent usage
    if agents:
        print("\n  Used by agents:")
        for agent in agents:
            print(f"    • {agent}")

    # Timestamps
    print(f"\n  Installed: {server.get('installed_at', 'unknown')}")
    print(f"  Installed by: {server.get('installed_by', 'unknown')}")

    return 0


def cmd_mcp_config(args) -> int:
    """Set env or header overrides for a server."""
    registry = _get_registry()

    if not args.set_env and not args.set_header:
        # Show current config
        server = registry.get_server(args.name)
        if server is None:
            print(
                f"Error: server '{args.name}' is not installed.\n"
                f"Run 'hive mcp list' to see installed servers.",
                file=sys.stderr,
            )
            return 1
        masked = _mask_overrides(server.get("overrides", {}))
        env_o = masked.get("env", {})
        header_o = masked.get("headers", {})
        if not env_o and not header_o:
            print(f"No overrides set for {args.name}.")
            print(f"Set one with: hive mcp config {args.name} --set KEY=VALUE")
        else:
            print(f"Overrides for {args.name}:")
            for key in env_o:
                print(f"  env.{key} = <set>")
            for key in header_o:
                print(f"  header.{key} = <set>")
        return 0

    try:
        if args.set_env:
            pairs = _parse_key_value_pairs(args.set_env)
            for key, value in pairs.items():
                registry.set_override(args.name, key, value, override_type="env")
            print(f"✓ Set {len(pairs)} env override(s) for {args.name}")

        if args.set_header:
            pairs = _parse_key_value_pairs(args.set_header)
            for key, value in pairs.items():
                registry.set_override(args.name, key, value, override_type="headers")
            print(f"✓ Set {len(pairs)} header override(s) for {args.name}")

    except ValueError as exc:
        print(f"Error: {exc}", file=sys.stderr)
        return 1

    return 0
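
# Illustrative invocations ("jira" is a made-up server name):
#   hive mcp config jira --set JIRA_API_TOKEN=abc123 JIRA_URL=https://example.test
#   hive mcp config jira --set-header X-Api-Key=abc123
#   hive mcp config jira                  # no flags: print masked overrides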


# ── P1 command handlers ────────────────────────────────────────────


def cmd_mcp_search(args) -> int:
    """Search the registry index."""
    registry = _get_registry()
    if not _ensure_index_available(registry):
        return 1

    results = registry.search(args.query)

    if args.output_json:
        _emit_json(results)
        return 0

    if not results:
        print(f"No servers matching '{args.query}'.")
        return 0

    print(f"Found {len(results)} server(s) matching '{args.query}':\n")
    _render_available_table(results)
    return 0


def cmd_mcp_health(args) -> int:
    """Check server health."""
    registry = _get_registry()

    try:
        results = registry.health_check(name=args.name)
    except ValueError as exc:
        print(f"Error: {exc}", file=sys.stderr)
        return 1

    # A single-server check returns one flat result dict; checking all servers
    # returns a name -> result mapping. Normalize to the mapping shape.
    if args.name:
        results = {args.name: results}

    if args.output_json:
        _emit_json(results)
        return 0

    for name, result in results.items():
        status = result.get("status", "unknown")
        tools = result.get("tools", 0)
        error = result.get("error")
        sym = {"healthy": "✓", "unhealthy": "✗"}.get(status, "●")

        print(f"  {sym} {name}: {status}", end="")
        if status == "healthy" and tools:
            print(f" ({tools} tools)")
        elif error:
            print(f"\n    Error: {error}")
        else:
            print()

    return 0


def cmd_mcp_update(args) -> int:
    """Update a single server, or refresh the index and update all registry servers."""
    registry = _get_registry()

    if args.name:
        return _cmd_mcp_update_server(args.name, registry)

    # Step 1: refresh the registry index
    try:
        count = registry.update_index()
    except Exception as exc:
        print(
            f"Error: failed to update registry index: {exc}\n"
            f"Check your network connection and try again.",
            file=sys.stderr,
        )
        return 1

    print(f"✓ Registry index updated ({count} servers available)")

    # Step 2: update all installed registry servers (skip local/pinned)
    installed = registry.list_installed()
    registry_servers = [
        s for s in installed if s.get("source") == "registry" and not s.get("pinned")
    ]

    if not registry_servers:
        return 0

    print(f"\nUpdating {len(registry_servers)} installed server(s)...")
    errors = 0
    for server in registry_servers:
        name = server["name"]
        rc = _cmd_mcp_update_server(name, registry)
        if rc != 0:
            errors += 1

    return 1 if errors else 0


def _cmd_mcp_update_server(name: str, registry=None) -> int:
    """Bridge: reinstall a server from the latest index.

    This is a temporary bridge until #6355 adds proper version diffing,
    tool-signature change detection, and --dry-run support.
    """
    if registry is None:
        registry = _get_registry()

    server = registry.get_server(name)
    if server is None:
        print(
            f"Error: server '{name}' is not installed.\n"
            f"Run 'hive mcp install {name}' to install it.",
            file=sys.stderr,
        )
        return 1

    if server.get("source") != "registry":
        print(
            f"Error: '{name}' is a local server and cannot be updated from the registry.\n"
            f"Use 'hive mcp remove {name}' and 'hive mcp add' to re-register it.",
            file=sys.stderr,
        )
        return 1

    if server.get("pinned"):
        print(
            f"Error: '{name}' is pinned to v{server.get('manifest_version', '?')}.\n"
            f"To update a pinned server, remove and reinstall:\n"
            f"  hive mcp remove {name} && hive mcp install {name}",
            file=sys.stderr,
        )
        return 1

    # Refresh index, then reinstall
    if not _ensure_index_available(registry):
        return 1

    old_version = server.get("manifest_version", "unknown")
    transport = server.get("transport")
    overrides = server.get("overrides", {})
    was_enabled = server.get("enabled", True)

    # Save the full entry before removing so we can restore on failure
    saved_entry = dict(server)
    saved_entry.pop("name", None)

    try:
        registry.remove(name)
        entry = registry.install(name, transport=transport)
    except ValueError as exc:
        # Restore the original entry so update doesn't become an uninstall
        data = registry._read_installed()
        data["servers"][name] = saved_entry
        registry._write_installed(data)
        print(
            f"Error: {exc}\nServer '{name}' has been restored to its previous state.",
            file=sys.stderr,
        )
        return 1

    new_version = entry.get("manifest_version", "unknown")

    # Restore prior state from the previous installation
    for key, value in overrides.get("env", {}).items():
        registry.set_override(name, key, value, override_type="env")
    for key, value in overrides.get("headers", {}).items():
        registry.set_override(name, key, value, override_type="headers")
    if not was_enabled:
        registry.disable(name)

    if old_version == new_version:
        print(f"✓ {name} is already at v{new_version}")
    else:
        print(f"✓ Updated {name}: v{old_version} → v{new_version}")

    return 0
@@ -1,517 +0,0 @@
"""Agent Orchestrator - routes requests and relays messages between agents."""

from __future__ import annotations

import asyncio
import json
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any

from framework.llm.provider import LLMProvider
from framework.runner.protocol import (
    AgentMessage,
    CapabilityLevel,
    CapabilityResponse,
    MessageType,
    OrchestratorResult,
    RegisteredAgent,
)
from framework.runner.runner import AgentRunner


@dataclass
class RoutingDecision:
    """Decision about which agent(s) should handle a request."""

    selected_agents: list[str]
    reasoning: str
    confidence: float
    should_parallelize: bool = False
    fallback_agents: list[str] = field(default_factory=list)
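
# Example (illustrative; agent names are made up): fan one request out to two
# agents in parallel, keeping one backup if either fails:
#   RoutingDecision(
#       selected_agents=["sales", "support"],
#       reasoning="both match the billing intent",
#       confidence=0.9,
#       should_parallelize=True,
#       fallback_agents=["triage"],
#   )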


class AgentOrchestrator:
    """
    Manages multiple agents and routes communications between them.

    The orchestrator:
    1. Maintains a registry of available agents
    2. Routes incoming requests to the appropriate agent(s) using an LLM
    3. Relays messages between agents
    4. Logs all communications for traceability

    Usage:
        orchestrator = AgentOrchestrator()
        orchestrator.register("sales", "exports/outbound-sales")
        orchestrator.register("support", "exports/customer-support")

        result = await orchestrator.dispatch({
            "intent": "help customer with billing issue",
            "customer_id": "123",
        })
    """

    def __init__(
        self,
        llm: LLMProvider | None = None,
        model: str = "claude-haiku-4-5-20251001",
    ):
        """
        Initialize the orchestrator.

        Args:
            llm: LLM provider for routing decisions (auto-creates if None)
            model: Model to use for routing
        """
        self._agents: dict[str, RegisteredAgent] = {}
        self._llm = llm
        self._model = model
        self._message_log: list[AgentMessage] = []

        # Auto-create LLM - LiteLLM auto-detects provider and API key from model name
        if self._llm is None:
            from framework.config import get_api_base, get_api_key, get_llm_extra_kwargs
            from framework.llm.litellm import LiteLLMProvider

            self._llm = LiteLLMProvider(
                model=self._model,
                api_key=get_api_key(),
                api_base=get_api_base(),
                **get_llm_extra_kwargs(),
            )

    def register(
        self,
        name: str,
        agent_path: str | Path,
        capabilities: list[str] | None = None,
        priority: int = 0,
    ) -> None:
        """
        Register an agent with the orchestrator.

        Args:
            name: Unique name for this agent
            agent_path: Path to agent folder (containing agent.json)
            capabilities: Optional list of capability keywords
            priority: Higher = checked first for routing
        """
        runner = AgentRunner.load(agent_path)
        info = runner.info()

        self._agents[name] = RegisteredAgent(
            name=name,
            runner=runner,
            description=info.description,
            capabilities=capabilities or [],
            priority=priority,
        )

    def register_runner(
        self,
        name: str,
        runner: AgentRunner,
        capabilities: list[str] | None = None,
        priority: int = 0,
    ) -> None:
        """
        Register an existing AgentRunner.

        Args:
            name: Unique name for this agent
            runner: AgentRunner instance
            capabilities: Optional list of capability keywords
            priority: Higher = checked first for routing
        """
        info = runner.info()

        self._agents[name] = RegisteredAgent(
            name=name,
            runner=runner,
            description=info.description,
            capabilities=capabilities or [],
            priority=priority,
        )

    def list_agents(self) -> list[dict]:
        """List all registered agents."""
        return [
            {
                "name": agent.name,
                "description": agent.description,
                "capabilities": agent.capabilities,
                "priority": agent.priority,
            }
            for agent in sorted(
                self._agents.values(),
                key=lambda a: -a.priority,
            )
        ]

    async def dispatch(
        self,
        request: dict,
        intent: str | None = None,
    ) -> OrchestratorResult:
        """
        Route a request to the appropriate agent(s).

        Args:
            request: The request data
            intent: Optional description of what's being asked

        Returns:
            OrchestratorResult with results from handling agent(s)
        """
        messages: list[AgentMessage] = []

        # Create initial message
        initial_message = AgentMessage(
            type=MessageType.REQUEST,
            intent=intent or "Process request",
            content=request,
        )
        messages.append(initial_message)
        self._message_log.append(initial_message)

        # Step 1: Check capabilities of all agents
        capabilities = await self._check_all_capabilities(request)

        # Step 2: Route to best agent(s)
        routing = await self._route_request(request, intent, capabilities)

        if not routing.selected_agents:
            return OrchestratorResult(
                success=False,
                handled_by=[],
                results={},
                messages=messages,
                error="No agent capable of handling this request",
            )

        # Step 3: Execute on selected agent(s)
        results: dict[str, Any] = {}
        handled_by: list[str] = []

        if routing.should_parallelize and len(routing.selected_agents) > 1:
            # Run agents in parallel
            tasks = []
            for agent_name in routing.selected_agents:
                msg = AgentMessage(
                    type=MessageType.REQUEST,
                    from_agent="orchestrator",
                    to_agent=agent_name,
                    intent=intent or "Process request",
                    content=request,
                    parent_id=initial_message.id,
                )
                messages.append(msg)
                self._message_log.append(msg)
                tasks.append(self._send_to_agent(agent_name, msg))

            responses = await asyncio.gather(*tasks, return_exceptions=True)

            for agent_name, response in zip(routing.selected_agents, responses, strict=False):
                if isinstance(response, Exception):
                    results[agent_name] = {"error": str(response)}
                else:
                    messages.append(response)
                    self._message_log.append(response)
                    results[agent_name] = response.content
                    handled_by.append(agent_name)
        else:
            # Run agents sequentially
            accumulated_context = dict(request)

            for agent_name in routing.selected_agents:
                msg = AgentMessage(
                    type=MessageType.REQUEST,
                    from_agent="orchestrator",
                    to_agent=agent_name,
                    intent=intent or "Process request",
                    content=accumulated_context,
                    parent_id=initial_message.id,
                )
                messages.append(msg)
                self._message_log.append(msg)

                try:
                    response = await self._send_to_agent(agent_name, msg)
                    messages.append(response)
                    self._message_log.append(response)
                    results[agent_name] = response.content
                    handled_by.append(agent_name)

                    # Pass results to next agent
                    if "results" in response.content:
                        accumulated_context.update(response.content["results"])
                except Exception as e:
                    results[agent_name] = {"error": str(e)}
                    # Try fallback if available
                    if routing.fallback_agents:
                        fallback = routing.fallback_agents.pop(0)
                        routing.selected_agents.append(fallback)
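                        # Appending while the for-loop iterates this list
                        # extends the iteration in CPython, so the promoted
                        # fallback agent is attempted on a later pass.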

        return OrchestratorResult(
            success=len(handled_by) > 0,
            handled_by=handled_by,
            results=results,
            messages=messages,
        )

    async def relay(
        self,
        from_agent: str,
        to_agent: str,
        content: dict,
        intent: str = "",
    ) -> AgentMessage:
        """
        Relay a message from one agent to another.

        Args:
            from_agent: Source agent name
            to_agent: Target agent name
            content: Message content
            intent: Description of what's being asked

        Returns:
            Response message from target agent
        """
        if to_agent not in self._agents:
            raise ValueError(f"Unknown agent: {to_agent}")

        message = AgentMessage(
            type=MessageType.HANDOFF,
            from_agent=from_agent,
            to_agent=to_agent,
            intent=intent,
            content=content,
        )
        self._message_log.append(message)

        response = await self._send_to_agent(to_agent, message)
        self._message_log.append(response)

        return response
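
    # Example (illustrative; agent names are made up):
    #   reply = await orchestrator.relay(
    #       "sales", "support", {"ticket_id": 42}, intent="escalate billing")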

    async def broadcast(
        self,
        content: dict,
        intent: str = "",
        exclude: list[str] | None = None,
    ) -> dict[str, AgentMessage]:
        """
        Send a message to all agents.

        Args:
            content: Message content
            intent: Description of what's being asked
            exclude: Agent names to exclude

        Returns:
            Dict of agent name -> response message
        """
        exclude = exclude or []
        responses: dict[str, AgentMessage] = {}

        message = AgentMessage(
            type=MessageType.BROADCAST,
            from_agent="orchestrator",
            intent=intent,
            content=content,
        )
        self._message_log.append(message)

        tasks = []
        agent_names = []
        for name in self._agents:
            if name not in exclude:
                agent_names.append(name)
                tasks.append(self._send_to_agent(name, message))

        results = await asyncio.gather(*tasks, return_exceptions=True)

        for name, result in zip(agent_names, results, strict=False):
            if isinstance(result, Exception):
                responses[name] = AgentMessage(
                    type=MessageType.RESPONSE,
                    from_agent=name,
                    content={"error": str(result)},
                    parent_id=message.id,
                )
            else:
                responses[name] = result
                self._message_log.append(result)

        return responses
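
    # Example (illustrative; "sales" is a made-up agent name):
    #   acks = await orchestrator.broadcast(
    #       {"notice": "maintenance at 02:00"}, intent="notify", exclude=["sales"])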

    async def _check_all_capabilities(
        self,
        request: dict,
    ) -> dict[str, CapabilityResponse]:
        """Check all agents' capabilities in parallel."""
        tasks = []
        agent_names = []

        for name, agent in self._agents.items():
            agent_names.append(name)
            tasks.append(agent.runner.can_handle(request, self._llm))

        results = await asyncio.gather(*tasks, return_exceptions=True)

        capabilities = {}
        for name, result in zip(agent_names, results, strict=False):
            if isinstance(result, Exception):
                capabilities[name] = CapabilityResponse(
                    agent_name=name,
                    level=CapabilityLevel.CANNOT_HANDLE,
                    confidence=0.0,
                    reasoning=f"Error: {result}",
                )
            else:
                capabilities[name] = result

        return capabilities

    async def _route_request(
        self,
        request: dict,
        intent: str | None,
        capabilities: dict[str, CapabilityResponse],
    ) -> RoutingDecision:
        """Decide which agent(s) should handle the request."""

        # Filter to capable agents
        capable = [
            (name, cap)
            for name, cap in capabilities.items()
            if cap.level in (CapabilityLevel.BEST_FIT, CapabilityLevel.CAN_HANDLE)
        ]

        # Sort by confidence (highest first)
        capable.sort(key=lambda x: -x[1].confidence)

        # If only one capable agent, use it
        if len(capable) == 1:
            return RoutingDecision(
                selected_agents=[capable[0][0]],
                reasoning=capable[0][1].reasoning,
                confidence=capable[0][1].confidence,
            )

        # If multiple capable agents and we have LLM, let it decide
        if len(capable) > 1 and self._llm:
            return await self._llm_route(request, intent, capable)
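        # NOTE: __init__ always auto-creates an LLM provider when none is given,
        # so the multi-capable case above is normally routed by the LLM; the
        # fall-through below matters only when no agent reported itself capable.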

        # If no capable agents, check uncertain ones
        uncertain = [
            (name, cap)
            for name, cap in capabilities.items()
            if cap.level == CapabilityLevel.UNCERTAIN
        ]
        if uncertain:
            uncertain.sort(key=lambda x: -x[1].confidence)
            return RoutingDecision(
                selected_agents=[uncertain[0][0]],
                reasoning=f"Uncertain match: {uncertain[0][1].reasoning}",
                confidence=uncertain[0][1].confidence,
                fallback_agents=[u[0] for u in uncertain[1:3]],
            )

        # No agents can handle
        return RoutingDecision(
            selected_agents=[],
            reasoning="No capable agents found",
            confidence=0.0,
        )

    async def _llm_route(
        self,
        request: dict,
        intent: str | None,
        capable: list[tuple[str, CapabilityResponse]],
    ) -> RoutingDecision:
        """Use LLM to decide routing when multiple agents are capable."""

        agents_info = "\n".join(
            f"- {name}: {cap.reasoning} (confidence: {cap.confidence:.2f})" for name, cap in capable
        )

        prompt = f"""Multiple agents can handle this request. Decide the best routing.

Request:
{json.dumps(request, indent=2)}

Intent: {intent or "Not specified"}

Capable agents:
{agents_info}

Decide:
1. Which agent(s) should handle this?
2. Should they run in parallel or sequence?
3. Why this routing?

Respond with JSON only:
{{
  "selected": ["agent_name", ...],
  "parallel": true/false,
  "reasoning": "explanation"
}}"""

        try:
            response = await self._llm.acomplete(
                messages=[{"role": "user", "content": prompt}],
                system="You are a request router. Respond with JSON only.",
                max_tokens=256,
            )

            import re

            json_match = re.search(r"\{[^{}]*\}", response.content, re.DOTALL)
            if json_match:
                data = json.loads(json_match.group())
                selected = data.get("selected", [])
                # Validate selected agents exist
                selected = [s for s in selected if s in self._agents]
                if selected:
                    return RoutingDecision(
                        selected_agents=selected,
                        reasoning=data.get("reasoning", ""),
                        confidence=0.8,
                        should_parallelize=data.get("parallel", False),
                    )
        except Exception:
            pass

        # Fallback: use highest confidence
        return RoutingDecision(
            selected_agents=[capable[0][0]],
            reasoning=capable[0][1].reasoning,
            confidence=capable[0][1].confidence,
        )

    async def _send_to_agent(
        self,
        agent_name: str,
        message: AgentMessage,
    ) -> AgentMessage:
        """Send a message to an agent and get response."""
        agent = self._agents[agent_name]
        return await agent.runner.receive_message(message)

    def get_message_log(self) -> list[AgentMessage]:
        """Get full message log for debugging/tracing."""
        return list(self._message_log)

    def clear_message_log(self) -> None:
        """Clear the message log."""
        self._message_log.clear()

    def cleanup(self) -> None:
        """Clean up all agent resources."""
        for agent in self._agents.values():
            agent.runner.cleanup()
        self._agents.clear()
@@ -1,6 +1,6 @@
 """Pre-load validation for agent graphs.
 
-Runs structural and credential checks before MCP servers are spawned.
+Runs structural, credential, and skill-trust checks before MCP servers are spawned.
 Fails fast with actionable error messages.
 """

@@ -169,6 +169,9 @@ def run_preload_validation(
     1. Graph structure (includes GCU subagent-only checks) — non-recoverable
     2. Credentials — potentially recoverable via interactive setup
 
+    Skill discovery and trust gating (AS-13) happen later in runner._setup()
+    so they have access to agent-level skill configuration.
+
     Raises PreloadValidationError for structural issues.
     Raises CredentialError for credential issues.
     """

Some files were not shown because too many files have changed in this diff.