Compare commits

..

2 Commits

Author SHA1 Message Date
Richard T 8350c8f62f Fix: dependencies 2026-01-21 10:31:12 -08:00
Red Rose f95f43c0a3 feat: set up vitest testing infrastructure 2026-01-20 20:17:02 -08:00
1405 changed files with 67352 additions and 315794 deletions
-15
View File
@@ -1,15 +0,0 @@
{
"hooks": {
"PostToolUse": [
{
"matcher": "Edit|Write|NotebookEdit",
"hooks": [
{
"type": "command",
"command": "ruff check --fix \"$CLAUDE_FILE_PATH\" 2>/dev/null; ruff format \"$CLAUDE_FILE_PATH\" 2>/dev/null; true"
}
]
}
]
}
}
-16
View File
@@ -1,16 +0,0 @@
{
"permissions": {
"allow": [
"Bash(git status:*)",
"Bash(gh run view:*)",
"Bash(uv run:*)",
"Bash(env:*)",
"Bash(python -m py_compile:*)",
"Bash(python -m pytest:*)",
"Bash(source:*)",
"Bash(find:*)",
"Bash(PYTHONPATH=core:exports:tools/src uv run pytest:*)"
]
},
"enabledMcpjsonServers": ["tools"]
}
-241
View File
@@ -1,241 +0,0 @@
---
name: browser-edge-cases
description: SOP for debugging browser automation failures on complex websites. Use when browser tools fail on specific sites like LinkedIn, Twitter/X, SPAs, or sites with Shadow DOM.
license: MIT
---
# Browser Tool Edge Cases
Standard Operating Procedure for debugging and fixing browser automation failures on complex websites.
## When to Use This Skill
- `browser_scroll` succeeds but page doesn't move
- `browser_click` succeeds but no action triggered
- `browser_type` text disappears or doesn't work
- `browser_snapshot` hangs or returns stale content
- `browser_navigate` loads wrong content
## SOP: Debugging Browser Tool Failures
### Phase 1: Reproduce & Isolate
```
1. Create minimal test case demonstrating failure
2. Test against simple site (example.com) to verify tool works
3. Test against problematic site to confirm issue
```
**Quick isolation test:**
```python
# Test 1: Does the tool work at all?
await browser_navigate(tab_id, "https://example.com")
result = await browser_scroll(tab_id, "down", 100)
# Should work on simple sites
# Test 2: Does it fail on the problematic site?
await browser_navigate(tab_id, "https://linkedin.com/feed")
result = await browser_scroll(tab_id, "down", 100)
# If this fails but example.com works → site-specific edge case
```
### Phase 2: Analyze Root Cause
**Step 2a: Check console for errors**
```python
console = await browser_console(tab_id)
# Look for: CSP violations, React errors, JavaScript exceptions
```
**Step 2b: Inspect DOM structure**
```python
html = await browser_html(tab_id)
snapshot = await browser_snapshot(tab_id)
# Look for:
# - Nested scrollable divs (overflow: scroll/auto)
# - Shadow DOM roots
# - iframes
# - Custom widgets
```
**Step 2c: Identify the pattern**
| Symptom | Likely Cause | Check |
|---------|--------------|-------|
| Scroll doesn't move | Nested scroll container | Look for `overflow: scroll` divs |
| Click no effect | Element covered | Check `getBoundingClientRect` vs viewport |
| Type clears | Autocomplete/React | Check for event listeners on input |
| Snapshot hangs | Huge DOM | Check node count in snapshot |
| Snapshot stale | SPA hydration | Wait after navigation |
### Phase 3: Implement Multi-Layer Fix
**Pattern: Always have fallbacks**
```python
async def robust_operation(tab_id):
# Method 1: Primary approach
try:
result = await primary_method(tab_id)
if verify_success(result):
return result
except Exception:
pass
# Method 2: CDP fallback
try:
result = await cdp_fallback(tab_id)
if verify_success(result):
return result
except Exception:
pass
# Method 3: JavaScript fallback
return await javascript_fallback(tab_id)
```
**Pattern: Always add timeouts**
```python
# Bad - can hang forever
result = await browser_snapshot(tab_id)
# Good - fails fast with useful error
try:
result = await browser_snapshot(tab_id, timeout_s=10.0)
except asyncio.TimeoutError:
# Handle timeout gracefully
result = await fallback_snapshot(tab_id)
```
### Phase 4: Verify Fix
```
1. Run against problematic site → should work
2. Run against simple site → should still work (regression check)
3. Document in registry.md
```
## Pattern Library
### P1: Nested Scrollable Containers
**Sites:** LinkedIn, Twitter/X, any SPA with scrollable feeds
**Detection:**
```javascript
// Find largest scrollable container
const candidates = [];
document.querySelectorAll('*').forEach(el => {
const style = getComputedStyle(el);
if (style.overflow.includes('scroll') || style.overflow.includes('auto')) {
const rect = el.getBoundingClientRect();
if (rect.width > 100 && rect.height > 100) {
candidates.push({el, area: rect.width * rect.height});
}
}
});
candidates.sort((a, b) => b.area - a.area);
return candidates[0]?.el;
```
**Fix:** Dispatch scroll events at container's center, not viewport center.
### P2: Element Covered by Overlay
**Sites:** Modals, tooltips, SPAs with loading overlays
**Detection:**
```javascript
const rect = element.getBoundingClientRect();
const centerX = rect.left + rect.width / 2;
const centerY = rect.top + rect.height / 2;
const topElement = document.elementFromPoint(centerX, centerY);
return topElement === element || element.contains(topElement);
```
**Fix:** Wait for overlay to disappear, or use JavaScript click.
### P3: React Synthetic Events
**Sites:** React SPAs, modern web apps
**Detection:** If CDP click doesn't trigger handler but manual click works.
**Fix:** Use JavaScript click as primary:
```javascript
element.click();
```
### P4: Huge DOM / Accessibility Tree
**Sites:** LinkedIn, Facebook, Twitter (feeds with 1000s of nodes)
**Detection:**
```javascript
document.querySelectorAll('*').length > 5000
```
**Fix:**
1. Add timeout to snapshot operation
2. Truncate tree at 2000 nodes
3. Fall back to DOM-based snapshot if accessibility tree too large
### P5: SPA Hydration Delay
**Sites:** React, Vue, Angular SPAs after navigation
**Detection:**
```javascript
// Check if React app has hydrated
document.querySelector('[data-reactroot]') ||
document.querySelector('[data-reactid]')
```
**Fix:** Wait for specific selector after navigation:
```python
await browser_navigate(tab_id, url, wait_until="load")
await browser_wait(tab_id, selector='[data-testid="content"]', timeout_ms=5000)
```
### P6: Shadow DOM
**Sites:** Components using Shadow DOM, Lit elements
**Detection:**
```javascript
document.querySelectorAll('*').some(el => el.shadowRoot)
```
**Fix:** Pierce shadow root:
```javascript
function queryShadow(selector) {
const parts = selector.split('>>>');
let node = document;
for (const part of parts) {
if (node.shadowRoot) {
node = node.shadowRoot.querySelector(part.trim());
} else {
node = node.querySelector(part.trim());
}
}
return node;
}
```
## Quick Reference
| Issue | Primary Fix | Fallback |
|-------|-------------|----------|
| Scroll not working | Find scrollable container | Mouse wheel at container center |
| Click no effect | JavaScript click() | CDP mouse events |
| Type clears | Add delay_ms | Use execCommand |
| Snapshot hangs | Add timeout_s | DOM snapshot fallback |
| Stale content | Wait for selector | Increase wait_until timeout |
| Shadow DOM | Pierce selector | JavaScript traversal |
## References
- [registry.md](registry.md) - Full list of known edge cases
- [scripts/test_case.py](scripts/test_case.py) - Template for testing new cases
- [BROWSER_USE_PATTERNS.md](../../tools/BROWSER_USE_PATTERNS.md) - Implementation patterns from browser-use
@@ -1,261 +0,0 @@
# Browser Edge Case Registry
Curated list of known browser automation edge cases with symptoms, causes, and fixes.
---
## Scroll Issues
### #1: LinkedIn Nested Scroll Container
| Attribute | Value |
|-----------|-------|
| **Site** | LinkedIn (linkedin.com/feed) |
| **Symptom** | `browser_scroll()` returns `{ok: true}` but page doesn't move |
| **Root Cause** | Content is in a nested scrollable div (`overflow: scroll`), not the main window |
| **Detection** | `document.querySelectorAll('*')` with `overflow: scroll/auto` has large candidates |
| **Fix** | JavaScript finds largest scrollable container, uses `container.scrollBy()` |
| **Code** | `bridge.py:808-891` - smart scroll with container detection |
| **Verified** | 2026-04-03 ✓ |
### #2: Twitter/X Lazy Loading
| Attribute | Value |
|-----------|-------|
| **Site** | Twitter/X (x.com) |
| **Symptom** | Infinite scroll doesn't load new content |
| **Root Cause** | Lazy loading requires content to be visible before loading more |
| **Detection** | Scroll position at bottom but no new `[data-testid="tweet"]` elements |
| **Fix** | Add `wait_for_selector` between scroll calls with 1s delay |
| **Code** | Test file: `tests/test_x_page_load_repro.py` |
| **Verified** | - |
### #3: Modal/Dialog Scroll Container
| Attribute | Value |
|-----------|-------|
| **Site** | Any site with modal dialogs |
| **Symptom** | Scroll scrolls background page, not modal content |
| **Root Cause** | Modal has its own scroll container with `overflow: scroll` |
| **Detection** | Visible element with `position: fixed` and scrollable content |
| **Fix** | Find visible modal container (highest z-index scrollable), scroll that |
| **Code** | - |
| **Verified** | - |
---
## Click Issues
### #4: Element Covered by Overlay
| Attribute | Value |
|-----------|-------|
| **Site** | SPAs, sites with loading overlays |
| **Symptom** | Click succeeds but no action triggered |
| **Root Cause** | Element is covered by transparent overlay, tooltip, or iframe |
| **Detection** | `document.elementFromPoint(x, y) !== target` |
| **Fix** | Wait for overlay to disappear, or use JavaScript `element.click()` |
| **Code** | `bridge.py:394-591` - JavaScript click as primary |
| **Verified** | - |
### #5: React Synthetic Events
| Attribute | Value |
|-----------|-------|
| **Site** | React applications |
| **Symptom** | CDP click doesn't trigger React handler |
| **Root Cause** | React uses synthetic events that don't respond to CDP events |
| **Detection** | Site uses React (check for `__reactFiber$` or `data-reactroot`) |
| **Fix** | Use JavaScript `element.click()` as primary method |
| **Code** | `bridge.py:394-591` - JavaScript-first click |
| **Verified** | - |
### #6: Shadow DOM Elements
| Attribute | Value |
|-----------|-------|
| **Site** | Components using Shadow DOM, Lit elements |
| **Symptom** | `querySelector` can't find element |
| **Root Cause** | Element is inside a shadow root, not main DOM tree |
| **Detection** | `element.shadowRoot !== null` on parent elements |
| **Fix** | Use piercing selector (`host >>> target`) or traverse shadow roots |
| **Code** | See SKILL.md P6 pattern |
| **Verified** | 2026-04-03 ✓ |
---
## Input Issues
### #7: ContentEditable / Rich Text Editors
| Attribute | Value |
|-----------|-------|
| **Site** | Rich text editors (Notion, Slack web, etc.) |
| **Symptom** | `browser_type()` doesn't insert text |
| **Root Cause** | Element is `contenteditable`, not an `<input>` or `<textarea>` |
| **Detection** | `element.contentEditable === 'true'` |
| **Fix** | Focus via JavaScript, use `execCommand('insertText')` or `Input.dispatchKeyEvent` |
| **Code** | `bridge.py:616-694` - contentEditable handling |
| **Verified** | 2026-04-03 ✓ |
### #8: Autocomplete Field Clearing
| Attribute | Value |
|-----------|-------|
| **Site** | Search fields with autocomplete, address forms |
| **Symptom** | Typed text gets cleared immediately |
| **Root Cause** | Field expects realistic keystroke timing for autocomplete |
| **Detection** | Field has autocomplete listeners or dropdown appears |
| **Fix** | Add `delay_ms=50` between keystrokes |
| **Code** | `bridge.py:type()` - delay_ms parameter |
| **Verified** | 2026-04-03 ✓ |
### #9: Custom Date Pickers
| Attribute | Value |
|-----------|-------|
| **Site** | Forms with custom date widgets |
| **Symptom** | Can't type date into date field |
| **Root Cause** | Custom widget intercepts and blocks keyboard input |
| **Detection** | Typing doesn't change field value |
| **Fix** | Click calendar widget icon, select date from dropdown |
| **Code** | - |
| **Verified** | - |
---
## Snapshot Issues
### #10: LinkedIn Huge DOM Tree
| Attribute | Value |
|-----------|-------|
| **Site** | LinkedIn, Facebook, Twitter feeds |
| **Symptom** | `browser_snapshot()` hangs forever |
| **Root Cause** | 10k+ DOM nodes, accessibility tree has 50k+ nodes |
| **Detection** | `document.querySelectorAll('*').length > 5000` |
| **Fix** | Add `timeout_s` param with `asyncio.timeout()`, proper error handling |
| **Code** | `bridge.py:1041-1028` - snapshot with timeout protection |
| **Verified** | 2026-04-03 ✓ (0.08s on LinkedIn) |
### #11: SPA Hydration Delay
| Attribute | Value |
|-----------|-------|
| **Site** | React/Vue/Angular SPAs |
| **Symptom** | Snapshot shows old content after navigation |
| **Root Cause** | Client-side hydration hasn't completed when snapshot runs |
| **Detection** | `document.readyState === 'complete'` but content missing |
| **Fix** | Wait for specific selector after navigation |
| **Code** | Test file: `tests/test_x_page_load_repro.py` |
| **Verified** | - |
### #12: iframe Content Missing
| Attribute | Value |
|-----------|-------|
| **Site** | Sites with embedded content |
| **Symptom** | Snapshot missing iframe content |
| **Root Cause** | Accessibility tree doesn't include iframe content |
| **Detection** | `document.querySelectorAll('iframe')` has results |
| **Fix** | Use `DOM.getFrameOwner` + separate snapshot for each iframe |
| **Code** | - |
| **Verified** | - |
---
## Navigation Issues
### #13: SPA Navigation Events
| Attribute | Value |
|-----------|-------|
| **Site** | React Router, Vue Router SPAs |
| **Symptom** | `wait_until="load"` fires before content ready |
| **Root Cause** | SPA uses client-side routing, no full page load |
| **Detection** | URL changes but `load` event already fired |
| **Fix** | Use `wait_until="networkidle"` or `wait_for_selector` |
| **Code** | `bridge.py:navigate()` - wait_until options |
| **Verified** | - |
### #14: Cross-Origin Redirects
| Attribute | Value |
|-----------|-------|
| **Site** | OAuth flows, SSO logins |
| **Symptom** | Navigation fails during redirect |
| **Root Cause** | Cross-origin security prevents CDP tracking |
| **Detection** | URL changes to different domain |
| **Fix** | Use `wait_for_url` with pattern matching instead of exact URL |
| **Code** | - |
| **Verified** | - |
---
## Screenshot Issues
### #15: Selector Screenshot Not Implemented
| Attribute | Value |
|-----------|-------|
| **Site** | Any site |
| **Symptom** | `browser_screenshot(selector="h1")` takes full viewport instead of element |
| **Root Cause** | `selector` param existed in signature but was silently ignored in both `bridge.py` and `inspection.py` |
| **Detection** | Screenshot with selector same byte size as screenshot without selector |
| **Fix** | Use CDP `Runtime.evaluate` to call `getBoundingClientRect()` on the element, pass result as `clip` to `Page.captureScreenshot` |
| **Code** | `bridge.py:1315-1344` - selector clip logic; `inspection.py:94-96` - pass selector to bridge |
| **Verified** | 2026-04-03 ✓ (JS rect query returns correct viewport coords; requires server restart) |
### #16: Stale Browser Context (Group ID Mismatch)
| Attribute | Value |
|-----------|-------|
| **Site** | Any |
| **Symptom** | `browser_open()` returns `"No group with id: XXXXXXX"` even though `browser_status` shows `running: true` |
| **Root Cause** | In-memory `_contexts` dict has a stale `groupId` from a Chrome tab group that was closed outside the tool (e.g. user closed the tab group) |
| **Detection** | `browser_status` returns `running: true` but `browser_open` fails with "No group with id" |
| **Fix** | Call `browser_stop()` to clear stale context from `_contexts`, then `browser_start()` again |
| **Code** | `tools/lifecycle.py:144-160` - `already_running` check uses cached dict without validating against Chrome |
| **Verified** | 2026-04-03 ✓ |
---
## How to Add New Edge Cases
1. **Reproduce** the issue with minimal test case
2. **Document** using the template below
3. **Implement** fix with multi-layer fallback
4. **Verify** against both problematic and simple sites
5. **Submit** by appending to this file
### Template
```markdown
### #N: [Short Title]
| Attribute | Value |
|-----------|-------|
| **Site** | [URL or site type] |
| **Symptom** | [What the user observes] |
| **Root Cause** | [Technical explanation] |
| **Detection** | [JavaScript to detect this case] |
| **Fix** | [Solution approach] |
| **Code** | [File:line reference if implemented] |
| **Verified** | [Date or "pending"] |
```
---
## Statistics
| Category | Count |
|----------|-------|
| Scroll Issues | 3 |
| Click Issues | 3 |
| Input Issues | 3 |
| Snapshot Issues | 3 |
| Navigation Issues | 2 |
| Screenshot Issues | 2 |
| **Total** | **16** |
Last updated: 2026-04-03
@@ -1,111 +0,0 @@
#!/usr/bin/env python
"""
Test #2: Twitter/X Lazy Loading Scroll
Symptom: Infinite scroll doesn't load new content
Root Cause: Lazy loading requires content to be visible before loading more
Fix: Add wait_for_selector between scroll calls
"""
import asyncio
import sys
import time
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent.parent.parent.parent / "tools" / "src"))
from gcu.browser.bridge import BeelineBridge
BRIDGE_PORT = 9229
CONTEXT_NAME = "twitter-scroll-test"
async def test_twitter_lazy_scroll():
"""Test that repeated scrolls with waits load new content."""
print("=" * 70)
print("TEST #2: Twitter/X Lazy Loading Scroll")
print("=" * 70)
bridge = BeelineBridge()
try:
await bridge.start()
for i in range(10):
await asyncio.sleep(1)
if bridge.is_connected:
print("✓ Extension connected!")
break
print(f"Waiting for extension... ({i+1}/10)")
else:
print("✗ Extension not connected")
return
context = await bridge.create_context(CONTEXT_NAME)
tab_id = context.get("tabId")
group_id = context.get("groupId")
print(f"✓ Created tab: {tab_id}")
# Navigate to Twitter/X
print("\n--- Navigating to X.com ---")
await bridge.navigate(tab_id, "https://x.com", wait_until="networkidle", timeout_ms=30000)
print("✓ Page loaded")
# Wait for tweets to appear
print("\n--- Waiting for tweets ---")
await bridge.wait_for_selector(tab_id, '[data-testid="tweet"]', timeout_ms=10000)
# Count initial tweets
initial_count = await bridge.evaluate(
tab_id,
'(function() { return document.querySelectorAll(\'[data-testid="tweet"]\').length; })()'
)
print(f"Initial tweet count: {initial_count.get('result', 0)}")
# Take screenshot of initial state
screenshot = await bridge.screenshot(tab_id)
print(f"Screenshot: {len(screenshot.get('data', ''))} bytes")
# Scroll multiple times with waits
print("\n--- Scrolling with waits ---")
for i in range(3):
result = await bridge.scroll(tab_id, "down", 500)
print(f" Scroll {i+1}: {result.get('method', 'unknown')} method")
# Wait for new content to load
await asyncio.sleep(2)
# Count tweets after scroll
count_result = await bridge.evaluate(
tab_id,
'(function() { return document.querySelectorAll(\'[data-testid="tweet"]\').length; })()'
)
count = count_result.get('result', 0)
print(f" Tweet count after scroll: {count}")
# Final count
final_count = await bridge.evaluate(
tab_id,
'(function() { return document.querySelectorAll(\'[data-testid="tweet"]\').length; })()'
)
final = final_count.get('result', 0)
initial = initial_count.get('result', 0)
print(f"\n--- Results ---")
print(f"Initial tweets: {initial}")
print(f"Final tweets: {final}")
if final > initial:
print(f"✓ PASS: Loaded {final - initial} new tweets")
else:
print("✗ FAIL: No new tweets loaded (may need login)")
await bridge.destroy_context(group_id)
print("\n✓ Context destroyed")
finally:
await bridge.stop()
if __name__ == "__main__":
asyncio.run(test_twitter_lazy_scroll())
@@ -1,97 +0,0 @@
#!/usr/bin/env python
"""
Test #3: Modal/Dialog Scroll Container
Symptom: Scroll scrolls background page, not modal content
Root Cause: Modal has its own scroll container with overflow: scroll
Fix: Find visible modal container (highest z-index scrollable), scroll that
"""
import asyncio
import sys
import time
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent.parent.parent.parent / "tools" / "src"))
from gcu.browser.bridge import BeelineBridge
BRIDGE_PORT = 9229
CONTEXT_NAME = "modal-scroll-test"
# Test site with modal - using a demo site
MODAL_DEMO_URL = "https://www.w3schools.com/howto/howto_css_modals.asp"
async def test_modal_scroll():
"""Test that scroll targets modal content, not background."""
print("=" * 70)
print("TEST #3: Modal/Dialog Scroll Container")
print("=" * 70)
bridge = BeelineBridge()
try:
await bridge.start()
for i in range(10):
await asyncio.sleep(1)
if bridge.is_connected:
print("✓ Extension connected!")
break
else:
print("✗ Extension not connected")
return
context = await bridge.create_context(CONTEXT_NAME)
tab_id = context.get("tabId")
group_id = context.get("groupId")
print(f"✓ Created tab: {tab_id}")
# Navigate to modal demo
print("\n--- Navigating to modal demo ---")
await bridge.navigate(tab_id, MODAL_DEMO_URL, wait_until="load")
print("✓ Page loaded")
# Take screenshot before
screenshot_before = await bridge.screenshot(tab_id)
print(f"Screenshot before: {len(screenshot_before.get('data', ''))} bytes")
# Click button to open modal
print("\n--- Opening modal ---")
# Find and click the "Open Modal" button
result = await bridge.click(tab_id, '.ws-btn', timeout_ms=5000)
print(f"Click result: {result}")
await asyncio.sleep(1)
# Take screenshot with modal open
screenshot_modal = await bridge.screenshot(tab_id)
print(f"Screenshot modal open: {len(screenshot_modal.get('data', ''))} bytes")
# Try to scroll within modal
print("\n--- Scrolling modal content ---")
result = await bridge.scroll(tab_id, "down", 100)
print(f"Scroll result: {result}")
await asyncio.sleep(0.5)
# Take screenshot after scroll
screenshot_after = await bridge.screenshot(tab_id)
print(f"Screenshot after scroll: {len(screenshot_after.get('data', ''))} bytes")
# Check if modal content scrolled (not background)
# This is a visual check - we can verify by comparing screenshots
print("\n--- Results ---")
print(f"Modal scroll test completed. Method used: {result.get('method', 'unknown')}")
print("Visual verification needed: Check if modal content scrolled vs background")
await bridge.destroy_context(group_id)
print("\n✓ Context destroyed")
finally:
await bridge.stop()
if __name__ == "__main__":
asyncio.run(test_modal_scroll())
@@ -1,123 +0,0 @@
#!/usr/bin/env python
"""
Test #4: Element Covered by Overlay
Symptom: Click succeeds but no action triggered
Root Cause: Element is covered by transparent overlay, tooltip, or iframe
Detection: document.elementFromPoint(x, y) !== target
Fix: Wait for overlay to disappear, or use JavaScript element.click()
"""
import asyncio
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent.parent.parent.parent / "tools" / "src"))
from gcu.browser.bridge import BeelineBridge
CONTEXT_NAME = "overlay-click-test"
async def test_overlay_click():
"""Test clicking elements that are covered by overlays."""
print("=" * 70)
print("TEST #4: Element Covered by Overlay")
print("=" * 70)
bridge = BeelineBridge()
try:
await bridge.start()
for i in range(10):
await asyncio.sleep(1)
if bridge.is_connected:
print("✓ Extension connected!")
break
else:
print("✗ Extension not connected")
return
context = await bridge.create_context(CONTEXT_NAME)
tab_id = context.get("tabId")
group_id = context.get("groupId")
print(f"✓ Created tab: {tab_id}")
# Create a test page with overlay
print("\n--- Creating test page with overlay ---")
test_html = """
<!DOCTYPE html>
<html>
<head><title>Overlay Test</title></head>
<body>
<button id="target-btn" onclick="alert('Clicked!')">Click Me</button>
<div id="overlay" style="position:fixed;top:0;left:0;width:100%;height:100%;background:rgba(0,0,0,0.3);z-index:1000;"></div>
<script>
window.clickCount = 0;
document.getElementById('target-btn').addEventListener('click', () => {
window.clickCount++;
});
</script>
</body>
</html>
"""
# Navigate to data URL
import base64
data_url = f"data:text/html;base64,{base64.b64encode(test_html.encode()).decode()}"
await bridge.navigate(tab_id, data_url, wait_until="load")
# Screenshot before
screenshot = await bridge.screenshot(tab_id)
print(f"Screenshot: {len(screenshot.get('data', ''))} bytes")
# Try to click the covered button
print("\n--- Attempting to click covered button ---")
# First, check if element is covered
coverage_check = await bridge.evaluate(
tab_id,
"""
(function() {
const btn = document.getElementById('target-btn');
const rect = btn.getBoundingClientRect();
const centerX = rect.left + rect.width / 2;
const centerY = rect.top + rect.height / 2;
const topElement = document.elementFromPoint(centerX, centerY);
return {
isCovered: topElement !== btn && !btn.contains(topElement),
topElement: topElement?.tagName,
targetElement: btn.tagName
};
})();
"""
)
print(f"Coverage check: {coverage_check.get('result', {})}")
# Try CDP click (may fail due to overlay)
click_result = await bridge.click(tab_id, "#target-btn", timeout_ms=5000)
print(f"Click result: {click_result}")
# Check if click registered
count_result = await bridge.evaluate(
tab_id,
"(function() { return window.clickCount; })()"
)
count = count_result.get("result", 0)
print(f"Click count after CDP click: {count}")
if count > 0:
print("✓ PASS: JavaScript click penetrated overlay")
else:
print("✗ FAIL: Click did not reach button (overlay blocked it)")
await bridge.destroy_context(group_id)
print("\n✓ Context destroyed")
finally:
await bridge.stop()
if __name__ == "__main__":
asyncio.run(test_overlay_click())
@@ -1,154 +0,0 @@
#!/usr/bin/env python
"""
Test #6: Shadow DOM Elements
Symptom: querySelector can't find element
Root Cause: Element is inside a shadow root, not main DOM tree
Detection: element.shadowRoot !== null on parent elements
Fix: Use piercing selector (host >>> target) or traverse shadow roots
"""
import asyncio
import sys
import base64
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent.parent.parent.parent / "tools" / "src"))
from gcu.browser.bridge import BeelineBridge
CONTEXT_NAME = "shadow-dom-test"
async def test_shadow_dom():
"""Test clicking elements inside Shadow DOM."""
print("=" * 70)
print("TEST #6: Shadow DOM Elements")
print("=" * 70)
bridge = BeelineBridge()
try:
await bridge.start()
for i in range(10):
await asyncio.sleep(1)
if bridge.is_connected:
print("✓ Extension connected!")
break
else:
print("✗ Extension not connected")
return
context = await bridge.create_context(CONTEXT_NAME)
tab_id = context.get("tabId")
group_id = context.get("groupId")
print(f"✓ Created tab: {tab_id}")
# Create test page with Shadow DOM
print("\n--- Creating test page with Shadow DOM ---")
test_html = """
<!DOCTYPE html>
<html>
<head><title>Shadow DOM Test</title></head>
<body>
<div id="shadow-host"></div>
<script>
const host = document.getElementById('shadow-host');
const shadow = host.attachShadow({ mode: 'open' });
shadow.innerHTML = `
<style>
button { padding: 10px 20px; font-size: 16px; }
</style>
<button id="shadow-btn">Shadow Button</button>
`;
shadow.getElementById('shadow-btn').addEventListener('click', () => {
window.shadowClickCount = (window.shadowClickCount || 0) + 1;
console.log('Shadow button clicked:', window.shadowClickCount);
});
</script>
</body>
</html>
"""
# Write to file and use file:// URL (data: URLs don't work well with extension)
test_file = Path("/tmp/shadow_dom_test.html")
test_file.write_text(test_html.strip())
file_url = f"file://{test_file}"
await bridge.navigate(tab_id, file_url, wait_until="load")
print("✓ Page loaded")
# Screenshot
screenshot = await bridge.screenshot(tab_id)
print(f"Screenshot: {len(screenshot.get('data', ''))} bytes")
# Detect Shadow DOM
print("\n--- Detecting Shadow DOM ---")
detection = await bridge.evaluate(
tab_id,
"""
(function() {
const hosts = [];
document.querySelectorAll('*').forEach(el => {
if (el.shadowRoot) {
hosts.push({
tag: el.tagName,
id: el.id,
hasButton: el.shadowRoot.querySelector('button') !== null
});
}
});
return { count: hosts.length, hosts };
})();
"""
)
print(f"Shadow DOM detection: {detection.get('result', {})}")
# Try to click shadow button using regular selector (should fail)
print("\n--- Attempting click with regular selector ---")
try:
result = await bridge.click(tab_id, "#shadow-btn", timeout_ms=3000)
print(f"Result: {result}")
except Exception as e:
print(f"Expected failure: {e}")
# Try to click using JavaScript that pierces shadow DOM
print("\n--- Clicking via JavaScript shadow piercing ---")
click_result = await bridge.evaluate(
tab_id,
"""
(function() {
const host = document.getElementById('shadow-host');
const btn = host.shadowRoot.getElementById('shadow-btn');
if (btn) {
btn.click();
return { success: true, clicked: 'shadow-btn' };
}
return { success: false, error: 'Button not found' };
})();
"""
)
print(f"JS click result: {click_result.get('result', {})}")
# Verify click was registered
count_result = await bridge.evaluate(
tab_id,
"(function() { return window.shadowClickCount || 0; })()"
)
count = count_result.get("result") or 0
print(f"Shadow click count: {count}")
if count and count > 0:
print("✓ PASS: Shadow DOM element clicked successfully")
else:
print("✗ FAIL: Could not click Shadow DOM element")
await bridge.destroy_context(group_id)
print("\n✓ Context destroyed")
finally:
await bridge.stop()
if __name__ == "__main__":
asyncio.run(test_shadow_dom())
@@ -1,178 +0,0 @@
#!/usr/bin/env python
"""
Test #7: ContentEditable / Rich Text Editors
Symptom: browser_type() doesn't insert text
Root Cause: Element is contenteditable, not an <input> or <textarea>
Detection: element.contentEditable === 'true'
Fix: Focus via JavaScript, use execCommand('insertText') or Input.dispatchKeyEvent
"""
import asyncio
import sys
import base64
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent.parent.parent.parent / "tools" / "src"))
from gcu.browser.bridge import BeelineBridge
CONTEXT_NAME = "contenteditable-test"
async def test_contenteditable():
"""Test typing into contenteditable elements."""
print("=" * 70)
print("TEST #7: ContentEditable / Rich Text Editors")
print("=" * 70)
bridge = BeelineBridge()
try:
await bridge.start()
for i in range(10):
await asyncio.sleep(1)
if bridge.is_connected:
print("✓ Extension connected!")
break
else:
print("✗ Extension not connected")
return
context = await bridge.create_context(CONTEXT_NAME)
tab_id = context.get("tabId")
group_id = context.get("groupId")
print(f"✓ Created tab: {tab_id}")
# Create test page with contenteditable
test_html = """
<!DOCTYPE html>
<html>
<head><title>ContentEditable Test</title></head>
<body>
<h2>ContentEditable Test</h2>
<h3>1. Simple contenteditable div</h3>
<div id="editor1" contenteditable="true" style="border:1px solid #ccc;padding:10px;min-height:50px;">Start text</div>
<h3>2. Rich text editor (like Notion)</h3>
<div id="editor2" contenteditable="true" style="border:1px solid #ccc;padding:10px;min-height:50px;">
<p>Type here...</p>
</div>
<h3>3. Regular input (for comparison)</h3>
<input id="input1" type="text" placeholder="Regular input" />
<script>
// Track content changes
window.editor1Content = '';
window.editor2Content = '';
document.getElementById('editor1').addEventListener('input', (e) => {
window.editor1Content = e.target.innerText;
});
document.getElementById('editor2').addEventListener('input', (e) => {
window.editor2Content = e.target.innerText;
});
</script>
</body>
</html>
"""
# Write to file and use file:// URL (data: URLs don't work well with extension)
test_file = Path("/tmp/contenteditable_test.html")
test_file.write_text(test_html.strip())
file_url = f"file://{test_file}"
await bridge.navigate(tab_id, file_url, wait_until="load")
print("✓ Page loaded")
# Screenshot with timeout protection
try:
screenshot = await asyncio.wait_for(bridge.screenshot(tab_id), timeout=10.0)
print(f"Screenshot: {len(screenshot.get('data', ''))} bytes")
except asyncio.TimeoutError:
print("Screenshot timed out (skipping)")
# Detect contenteditable
print("\n--- Detecting contenteditable elements ---")
detection = await bridge.evaluate(
tab_id,
"""
(function() {
const editables = document.querySelectorAll('[contenteditable="true"]');
return {
count: editables.length,
ids: Array.from(editables).map(el => el.id)
};
})();
"""
)
print(f"Contenteditable detection: {detection.get('result', {})}")
# Test 1: Type into regular input (baseline)
print("\n--- Test 1: Regular input ---")
await bridge.click(tab_id, "#input1")
await bridge.type_text(tab_id, "#input1", "Hello input")
input_result = await bridge.evaluate(
tab_id,
"(function() { return document.getElementById('input1').value; })()"
)
print(f"Input value: {input_result.get('result', '')}")
# Test 2: Type into contenteditable div
print("\n--- Test 2: Contenteditable div ---")
await bridge.click(tab_id, "#editor1")
await bridge.type_text(tab_id, "#editor1", "Hello contenteditable", clear_first=True)
editor_result = await bridge.evaluate(
tab_id,
"(function() { return document.getElementById('editor1').innerText; })()"
)
print(f"Editor1 innerText: {editor_result.get('result', '')}")
# Test 3: Use JavaScript insertText for rich editor
print("\n--- Test 3: JavaScript insertText for rich editor ---")
insert_result = await bridge.evaluate(
tab_id,
"""
(function() {
const editor = document.getElementById('editor2');
editor.focus();
document.execCommand('selectAll', false, null);
document.execCommand('insertText', false, 'Hello from execCommand');
return editor.innerText;
})();
"""
)
print(f"Editor2 after execCommand: {insert_result.get('result', '')}")
# Screenshot after with timeout protection
try:
screenshot_after = await asyncio.wait_for(bridge.screenshot(tab_id), timeout=10.0)
print(f"Screenshot after: {len(screenshot_after.get('data', ''))} bytes")
except asyncio.TimeoutError:
print("Screenshot after timed out (skipping)")
# Results
print("\n--- Results ---")
input_val = input_result.get("result", "")
editor1_val = editor_result.get("result", "")
editor2_val = insert_result.get("result", "")
input_pass = "Hello input" in input_val
editor1_pass = "Hello contenteditable" in editor1_val
editor2_pass = "execCommand" in editor2_val
print(f"Input: {'✓ PASS' if input_pass else '✗ FAIL'} - {input_val}")
print(f"Editor1: {'✓ PASS' if editor1_pass else '✗ FAIL'} - {editor1_val}")
print(f"Editor2: {'✓ PASS' if editor2_pass else '✗ FAIL'} - {editor2_val}")
await bridge.destroy_context(group_id)
print("\n✓ Context destroyed")
finally:
await bridge.stop()
if __name__ == "__main__":
asyncio.run(test_contenteditable())
@@ -1,233 +0,0 @@
#!/usr/bin/env python
"""
Test #8: Autocomplete Field Clearing
Symptom: Typed text gets cleared immediately
Root Cause: Field expects realistic keystroke timing for autocomplete
Detection: Field has autocomplete listeners or dropdown appears
Fix: Add delay_ms between keystrokes
"""
import asyncio
import sys
import base64
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent.parent.parent.parent / "tools" / "src"))
from gcu.browser.bridge import BeelineBridge
CONTEXT_NAME = "autocomplete-test"
async def test_autocomplete():
"""Test typing into fields with autocomplete behavior."""
print("=" * 70)
print("TEST #8: Autocomplete Field Clearing")
print("=" * 70)
bridge = BeelineBridge()
try:
await bridge.start()
for i in range(10):
await asyncio.sleep(1)
if bridge.is_connected:
print("✓ Extension connected!")
break
else:
print("✗ Extension not connected")
return
context = await bridge.create_context(CONTEXT_NAME)
tab_id = context.get("tabId")
group_id = context.get("groupId")
print(f"✓ Created tab: {tab_id}")
# Create test page with autocomplete behavior
test_html = """
<!DOCTYPE html>
<html>
<head><title>Autocomplete Test</title>
<style>
.autocomplete-items {
position: absolute;
border: 1px solid #d4d4d4;
border-top: none;
z-index: 99;
top: 100%;
left: 0;
right: 0;
max-height: 200px;
overflow-y: auto;
background: white;
}
.autocomplete-items div {
padding: 10px;
cursor: pointer;
}
.autocomplete-items div:hover {
background-color: #e9e9e9;
}
.autocomplete-active {
background-color: DodgerBlue !important;
color: white;
}
.autocomplete { position: relative; display: inline-block; }
input { width: 300px; padding: 10px; font-size: 16px; }
</style></head>
<body>
<h2>Autocomplete Test</h2>
<div class="autocomplete">
<input id="search" type="text" placeholder="Search countries..." autocomplete="off">
</div>
<div id="log" style="margin-top:20px;font-family:monospace;"></div>
<script>
const countries = ["Afghanistan","Albania","Algeria","Andorra","Angola","Argentina","Armenia","Australia","Austria","Azerbaijan","Bahamas","Bahrain","Bangladesh","Belarus","Belgium","Belize","Benin","Bhutan","Bolivia","Brazil","Canada","China","Colombia","Denmark","Egypt","France","Germany","India","Indonesia","Italy","Japan","Mexico","Netherlands","Nigeria","Norway","Pakistan","Peru","Philippines","Poland","Portugal","Russia","Spain","Sweden","Switzerland","Thailand","Turkey","Ukraine","United Kingdom","United States","Vietnam"];
const input = document.getElementById('search');
const log = document.getElementById('log');
let currentFocus = -1;
let typingTimeout = null;
// Track events for testing
window.inputEvents = [];
window.inputValue = '';
function logEvent(type, value) {
window.inputEvents.push({ type, value, time: Date.now() });
const entry = document.createElement('div');
entry.textContent = type + ': ' + value;
log.insertBefore(entry, log.firstChild);
}
// Simulate autocomplete that clears fast typing
input.addEventListener('input', function(e) {
const val = this.value;
// Clear previous dropdown
closeAllLists();
if (!val) return;
// If typing too fast (autocomplete-style), clear and restart
clearTimeout(typingTimeout);
typingTimeout = setTimeout(() => {
logEvent('input', val);
window.inputValue = val;
// Create dropdown
const div = document.createElement('div');
div.setAttribute('id', this.id + 'autocomplete-list');
div.setAttribute('class', 'autocomplete-items');
this.parentNode.appendChild(div);
countries.filter(c => c.substr(0, val.length).toUpperCase() === val.toUpperCase())
.slice(0, 5)
.forEach(country => {
const item = document.createElement('div');
item.innerHTML = '<strong>' + country.substr(0, val.length) + '</strong>' + country.substr(val.length);
item.addEventListener('click', function() {
input.value = country;
closeAllLists();
logEvent('select', country);
window.inputValue = country;
});
div.appendChild(item);
});
}, 100); // 100ms debounce
});
function closeAllLists() {
document.querySelectorAll('.autocomplete-items').forEach(el => el.remove());
}
document.addEventListener('click', function() {
closeAllLists();
});
</script>
</body>
</html>
"""
# Write to file and use file:// URL (data: URLs don't work well with extension)
test_file = Path("/tmp/autocomplete_test.html")
test_file.write_text(test_html.strip())
file_url = f"file://{test_file}"
await bridge.navigate(tab_id, file_url, wait_until="load")
print("✓ Page loaded")
# Screenshot
screenshot = await bridge.screenshot(tab_id)
print(f"Screenshot: {len(screenshot.get('data', ''))} bytes")
# Test 1: Fast typing (no delay) - may fail
print("\n--- Test 1: Fast typing (delay_ms=0) ---")
await bridge.click(tab_id, "#search")
await bridge.type_text(tab_id, "#search", "Ger", clear_first=True, delay_ms=0)
await asyncio.sleep(0.5)
fast_result = await bridge.evaluate(
tab_id,
"(function() { return document.getElementById('search').value; })()"
)
fast_value = fast_result.get("result", "")
print(f"Value after fast typing: '{fast_value}'")
# Check events
events_result = await bridge.evaluate(
tab_id,
"(function() { return window.inputEvents; })()"
)
print(f"Events logged: {events_result.get('result', [])}")
# Test 2: Slow typing (with delay) - should work
print("\n--- Test 2: Slow typing (delay_ms=100) ---")
await bridge.click(tab_id, "#search")
await bridge.type_text(tab_id, "#search", "United", clear_first=True, delay_ms=100)
await asyncio.sleep(0.5)
slow_result = await bridge.evaluate(
tab_id,
"(function() { return document.getElementById('search').value; })()"
)
slow_value = slow_result.get("result", "")
print(f"Value after slow typing: '{slow_value}'")
# Check if dropdown appeared
dropdown_result = await bridge.evaluate(
tab_id,
"(function() { return document.querySelectorAll('.autocomplete-items div').length; })()"
)
dropdown_count = dropdown_result.get("result", 0)
print(f"Dropdown items: {dropdown_count}")
# Screenshot with dropdown
screenshot_dropdown = await bridge.screenshot(tab_id)
print(f"Screenshot with dropdown: {len(screenshot_dropdown.get('data', ''))} bytes")
# Results
print("\n--- Results ---")
if "United" in slow_value:
print("✓ PASS: Slow typing with delay_ms worked")
else:
print("✗ FAIL: Slow typing still didn't work")
if dropdown_count > 0:
print("✓ PASS: Autocomplete dropdown appeared")
else:
print("⚠ WARNING: No autocomplete dropdown")
await bridge.destroy_context(group_id)
print("\n✓ Context destroyed")
finally:
await bridge.stop()
if __name__ == "__main__":
asyncio.run(test_autocomplete())
@@ -1,162 +0,0 @@
#!/usr/bin/env python
"""
Test #10: LinkedIn Huge DOM Tree
Symptom: browser_snapshot() hangs forever
Root Cause: 10k+ DOM nodes, accessibility tree has 50k+ nodes
Detection: document.querySelectorAll('*').length > 5000
Fix: Add timeout (10s default), truncate tree at 2000 nodes
"""
import asyncio
import sys
import time
import base64
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent.parent.parent.parent / "tools" / "src"))
from gcu.browser.bridge import BeelineBridge
CONTEXT_NAME = "huge-dom-test"
async def test_huge_dom():
"""Test snapshot performance on huge DOM trees."""
print("=" * 70)
print("TEST #10: Huge DOM Tree (LinkedIn-style)")
print("=" * 70)
bridge = BeelineBridge()
try:
await bridge.start()
for i in range(10):
await asyncio.sleep(1)
if bridge.is_connected:
print("✓ Extension connected!")
break
else:
print("✗ Extension not connected")
return
context = await bridge.create_context(CONTEXT_NAME)
tab_id = context.get("tabId")
group_id = context.get("groupId")
print(f"✓ Created tab: {tab_id}")
# Test 1: Small DOM (baseline)
print("\n--- Test 1: Small DOM (baseline) ---")
small_html = """
<!DOCTYPE html>
<html><body>
<h1>Small Page</h1>
<p>A few elements</p>
<button>Click me</button>
</body></html>
"""
data_url = f"data:text/html;base64,{base64.b64encode(small_html.encode()).decode()}"
await bridge.navigate(tab_id, data_url, wait_until="load")
start = time.perf_counter()
snapshot = await bridge.snapshot(tab_id, timeout_s=5.0)
elapsed = time.perf_counter() - start
tree_len = len(snapshot.get("tree", ""))
print(f"Small DOM snapshot: {elapsed:.3f}s, {tree_len} chars")
# Test 2: Generate huge DOM
print("\n--- Test 2: Huge DOM (5000+ elements) ---")
huge_html = """
<!DOCTYPE html>
<html><body>
<h1>Huge DOM Test</h1>
<div id="container"></div>
<script>
const container = document.getElementById('container');
for (let i = 0; i < 5000; i++) {
const div = document.createElement('div');
div.className = 'item-' + i;
div.innerHTML = '<span>Item ' + i + '</span><button>Action</button>';
container.appendChild(div);
}
</script>
</body></html>
"""
data_url = f"data:text/html;base64,{base64.b64encode(huge_html.encode()).decode()}"
await bridge.navigate(tab_id, data_url, wait_until="load")
# Count elements
count_result = await bridge.evaluate(
tab_id,
"(function() { return document.querySelectorAll('*').length; })()"
)
elem_count = count_result.get("result", 0)
print(f"DOM elements: {elem_count}")
# Skip screenshot on huge DOM - it can timeout
# Instead verify page loaded by checking DOM
print("✓ Page verified (skipping screenshot on huge DOM)")
# Test snapshot with timeout
print("\n--- Testing snapshot with 10s timeout ---")
start = time.perf_counter()
try:
snapshot = await bridge.snapshot(tab_id, timeout_s=10.0)
elapsed = time.perf_counter() - start
tree_len = len(snapshot.get("tree", ""))
truncated = "(truncated)" in snapshot.get("tree", "")
print(f"✓ Huge DOM snapshot: {elapsed:.3f}s, {tree_len} chars, truncated={truncated}")
if elapsed < 5.0:
print("✓ PASS: Snapshot completed quickly")
else:
print(f"⚠ WARNING: Snapshot took {elapsed:.1f}s")
if truncated:
print("✓ PASS: Tree was truncated to prevent hang")
else:
print("⚠ WARNING: Tree not truncated (may need adjustment)")
except asyncio.TimeoutError:
print("✗ FAIL: Snapshot timed out (this shouldn't happen)")
# Test 3: Real LinkedIn
print("\n--- Test 3: Real LinkedIn Feed ---")
await bridge.navigate(tab_id, "https://www.linkedin.com/feed", wait_until="load", timeout_ms=30000)
await asyncio.sleep(2)
count_result = await bridge.evaluate(
tab_id,
"(function() { return document.querySelectorAll('*').length; })()"
)
elem_count = count_result.get("result", 0)
print(f"LinkedIn DOM elements: {elem_count}")
start = time.perf_counter()
try:
snapshot = await bridge.snapshot(tab_id, timeout_s=15.0)
elapsed = time.perf_counter() - start
tree_len = len(snapshot.get("tree", ""))
truncated = "(truncated)" in snapshot.get("tree", "")
print(f"LinkedIn snapshot: {elapsed:.3f}s, {tree_len} chars, truncated={truncated}")
if elapsed < 5.0:
print("✓ PASS: LinkedIn snapshot fast enough")
elif elapsed < 15.0:
print("⚠ WARNING: LinkedIn snapshot slow but within timeout")
else:
print("✗ FAIL: LinkedIn snapshot too slow")
except asyncio.TimeoutError:
print("✗ FAIL: LinkedIn snapshot timed out")
await bridge.destroy_context(group_id)
print("\n✓ Context destroyed")
finally:
await bridge.stop()
if __name__ == "__main__":
asyncio.run(test_huge_dom())
@@ -1,187 +0,0 @@
#!/usr/bin/env python
"""
Test #13: SPA Navigation Events
Symptom: wait_until="load" fires before content ready
Root Cause: SPA uses client-side routing, no full page load
Detection: URL changes but load event already fired
Fix: Use wait_until="networkidle" or wait_for_selector
"""
import asyncio
import sys
import time
import base64
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent.parent.parent.parent / "tools" / "src"))
from gcu.browser.bridge import BeelineBridge
CONTEXT_NAME = "spa-nav-test"
async def test_spa_navigation():
"""Test navigation timing on SPA pages."""
print("=" * 70)
print("TEST #13: SPA Navigation Events")
print("=" * 70)
bridge = BeelineBridge()
try:
await bridge.start()
for i in range(10):
await asyncio.sleep(1)
if bridge.is_connected:
print("✓ Extension connected!")
break
else:
print("✗ Extension not connected")
return
context = await bridge.create_context(CONTEXT_NAME)
tab_id = context.get("tabId")
group_id = context.get("groupId")
print(f"✓ Created tab: {tab_id}")
# Create a test SPA
spa_html = """
<!DOCTYPE html>
<html>
<head>
<title>SPA Test</title>
<style>
nav a { margin-right: 10px; }
.page { padding: 20px; border: 1px solid #ccc; margin-top: 10px; }
</style>
</head>
<body>
<nav>
<a href="#home" onclick="navigate('home')">Home</a>
<a href="#about" onclick="navigate('about')">About</a>
<a href="#contact" onclick="navigate('contact')">Contact</a>
</nav>
<div id="app" class="page">
<h1>Loading...</h1>
</div>
<script>
// Simulate SPA routing
let currentPage = '';
async function navigate(page) {
event.preventDefault();
currentPage = page;
// Show loading state
document.getElementById('app').innerHTML = '<h1>Loading...</h1>';
// Simulate async content loading (like real SPAs)
await new Promise(r => setTimeout(r, 500));
// Render content
const content = {
home: '<h1>Home Page</h1><p>Welcome to the SPA!</p><button id="home-btn">Home Action</button>',
about: '<h1>About Page</h1><p>This is a simulated SPA.</p><button id="about-btn">About Action</button>',
contact: '<h1>Contact Page</h1><p>Contact us at test@example.com</p><button id="contact-btn">Contact Action</button>'
};
document.getElementById('app').innerHTML = content[page] || '<h1>404</h1>';
window.location.hash = page;
}
// Initial load with delay (simulates SPA hydration)
setTimeout(() => {
navigate('home');
}, 1000);
// Track for testing
window.pageLoads = [];
window.addEventListener('hashchange', () => {
window.pageLoads.push(window.location.hash);
});
</script>
</body>
</html>
"""
# Write to file and use file:// URL (data: URLs don't work well with extension)
test_file = Path("/tmp/spa_test.html")
test_file.write_text(spa_html.strip())
file_url = f"file://{test_file}"
# Test 1: wait_until="load" - may fire before content ready
print("\n--- Test 1: wait_until='load' ---")
start = time.perf_counter()
await bridge.navigate(tab_id, file_url, wait_until="load")
elapsed = time.perf_counter() - start
print(f"Navigation completed in {elapsed:.3f}s")
# Check content immediately
content = await bridge.evaluate(
tab_id,
"(function() { return document.getElementById('app').innerText; })()"
)
print(f"Content immediately after load: '{content.get('result', '')}'")
# Screenshot
screenshot = await bridge.screenshot(tab_id)
print(f"Screenshot: {len(screenshot.get('data', ''))} bytes")
# Wait for content
print("\n--- Waiting for content to hydrate ---")
await bridge.wait_for_selector(tab_id, "#home-btn", timeout_ms=5000)
print("✓ Content loaded")
# Check content after wait
content_after = await bridge.evaluate(
tab_id,
"(function() { return document.getElementById('app').innerText; })()"
)
print(f"Content after wait: '{content_after.get('result', '')}'")
# Test 2: SPA navigation (no full page load)
print("\n--- Test 2: SPA client-side navigation ---")
# Click "About" link
await bridge.click(tab_id, 'a[href="#about"]')
await asyncio.sleep(1)
# Check if content changed
about_content = await bridge.evaluate(
tab_id,
"(function() { return document.getElementById('app').innerText; })()"
)
print(f"Content after SPA nav: '{about_content.get('result', '')}'")
if "About Page" in about_content.get("result", ""):
print("✓ PASS: SPA navigation worked")
else:
print("✗ FAIL: SPA navigation didn't update content")
# Test 3: wait_until="networkidle"
print("\n--- Test 3: wait_until='networkidle' ---")
await bridge.navigate(tab_id, file_url, wait_until="networkidle", timeout_ms=10000)
# Check content immediately
content_networkidle = await bridge.evaluate(
tab_id,
"(function() { return document.getElementById('app').innerText; })()"
)
print(f"Content after networkidle: '{content_networkidle.get('result', '')}'")
if "Home Page" in content_networkidle.get("result", ""):
print("✓ PASS: networkidle waited for content")
else:
print("⚠ WARNING: networkidle didn't wait long enough")
await bridge.destroy_context(group_id)
print("\n✓ Context destroyed")
finally:
await bridge.stop()
if __name__ == "__main__":
asyncio.run(test_spa_navigation())
@@ -1,262 +0,0 @@
#!/usr/bin/env python
"""
Test #15: Screenshot Functionality
Tests browser_screenshot across multiple scenarios:
- Basic viewport screenshot
- Full-page screenshot
- Selector-based screenshot
- Screenshot on complex DOM
- Timeout handling
Category: screenshot
"""
import asyncio
import base64
import sys
import time
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent.parent.parent.parent / "tools" / "src"))
from gcu.browser.bridge import BeelineBridge
CONTEXT_NAME = "screenshot-test"
SIMPLE_HTML = """<!DOCTYPE html>
<html>
<head><style>
body { margin: 0; background: #fff; font-family: sans-serif; }
h1 { color: #333; padding: 20px; }
.box { width: 200px; height: 100px; background: #4a90e2; margin: 20px; }
.long-content { height: 2000px; background: linear-gradient(blue, red); }
</style></head>
<body>
<h1 id="title">Screenshot Test Page</h1>
<div class="box" id="target-box">Target Box</div>
<div class="long-content"></div>
</body>
</html>"""
def check_png(data: str) -> bool:
"""Verify that base64 data decodes to a valid PNG."""
try:
raw = base64.b64decode(data)
return raw[:8] == b'\x89PNG\r\n\x1a\n'
except Exception:
return False
async def test_basic_screenshot(bridge: BeelineBridge, tab_id: int, data_url: str):
print("\n--- Test 1: Basic Viewport Screenshot ---")
await bridge.navigate(tab_id, data_url, wait_until="load")
await asyncio.sleep(0.5)
start = time.perf_counter()
result = await bridge.screenshot(tab_id)
elapsed = time.perf_counter() - start
ok = result.get("ok")
data = result.get("data", "")
mime = result.get("mimeType", "")
print(f" ok={ok}, mimeType={mime}, elapsed={elapsed:.3f}s")
print(f" data length: {len(data)} chars")
if ok and data:
valid_png = check_png(data)
print(f" valid PNG: {valid_png}")
if valid_png:
raw = base64.b64decode(data)
print(f" PNG size: {len(raw)} bytes")
print(" ✓ PASS: Basic screenshot works")
return True
else:
print(" ✗ FAIL: Data is not a valid PNG")
else:
print(f" ✗ FAIL: {result.get('error', 'no data')}")
return False
async def test_full_page_screenshot(bridge: BeelineBridge, tab_id: int, data_url: str):
print("\n--- Test 2: Full Page Screenshot ---")
await bridge.navigate(tab_id, data_url, wait_until="load")
await asyncio.sleep(0.5)
viewport_result = await bridge.screenshot(tab_id, full_page=False)
full_result = await bridge.screenshot(tab_id, full_page=True)
v_data = viewport_result.get("data", "")
f_data = full_result.get("data", "")
if not v_data or not f_data:
print(f" ✗ FAIL: viewport ok={viewport_result.get('ok')}, full ok={full_result.get('ok')}")
return False
v_size = len(base64.b64decode(v_data))
f_size = len(base64.b64decode(f_data))
print(f" Viewport PNG: {v_size} bytes")
print(f" Full page PNG: {f_size} bytes")
if f_size > v_size:
print(" ✓ PASS: Full page larger than viewport")
return True
else:
print(" ✗ FAIL: Full page not larger than viewport (may not capture long pages)")
return False
async def test_selector_screenshot(bridge: BeelineBridge, tab_id: int, data_url: str):
print("\n--- Test 3: Selector Screenshot ---")
await bridge.navigate(tab_id, data_url, wait_until="load")
await asyncio.sleep(0.5)
# selector param exists in signature but may not be implemented
result = await bridge.screenshot(tab_id, selector="#target-box")
ok = result.get("ok")
data = result.get("data", "")
if ok and data:
# If implemented, the box screenshot should be smaller than a full viewport screenshot
full_result = await bridge.screenshot(tab_id)
full_data = full_result.get("data", "")
if full_data:
sel_size = len(base64.b64decode(data))
full_size = len(base64.b64decode(full_data))
print(f" Selector PNG: {sel_size} bytes")
print(f" Full page PNG: {full_size} bytes")
if sel_size < full_size:
print(" ✓ PASS: Selector screenshot smaller than full page")
return True
else:
print(" ⚠ WARNING: Selector screenshot not smaller (may be full page)")
return False
else:
print(f" ⚠ NOT IMPLEMENTED: selector param ignored (returns full page) - error={result.get('error')}")
print(" NOTE: selector parameter exists in signature but is not used in implementation")
return False
async def test_screenshot_url_metadata(bridge: BeelineBridge, tab_id: int):
print("\n--- Test 4: Screenshot URL Metadata ---")
await bridge.navigate(tab_id, "https://example.com", wait_until="load")
await asyncio.sleep(1)
result = await bridge.screenshot(tab_id)
url = result.get("url", "")
tab = result.get("tabId")
print(f" url={url!r}, tabId={tab}")
if "example.com" in url:
print(" ✓ PASS: URL metadata captured correctly")
return True
else:
print(f" ✗ FAIL: Expected example.com in URL, got {url!r}")
return False
async def test_screenshot_timeout(bridge: BeelineBridge, tab_id: int, data_url: str):
print("\n--- Test 5: Timeout Handling ---")
await bridge.navigate(tab_id, data_url, wait_until="load")
# Very short timeout - likely still completes since simple page
start = time.perf_counter()
result = await bridge.screenshot(tab_id, timeout_s=0.001)
elapsed = time.perf_counter() - start
if not result.get("ok"):
err = result.get("error", "")
if "timed out" in err or "cancelled" in err:
print(f" ✓ PASS: Timeout handled gracefully: {err!r}")
return True
else:
print(f" ⚠ Fast enough to beat timeout: {err!r} in {elapsed:.3f}s")
return True # Not a failure, just fast
else:
print(f" ⚠ Screenshot completed before timeout ({elapsed:.3f}s) - too fast to test timeout")
return True # Still ok, just very fast
async def test_screenshot_complex_site(bridge: BeelineBridge, tab_id: int):
print("\n--- Test 6: Complex Site (example.com) ---")
await bridge.navigate(tab_id, "https://example.com", wait_until="load")
await asyncio.sleep(1)
start = time.perf_counter()
result = await bridge.screenshot(tab_id)
elapsed = time.perf_counter() - start
ok = result.get("ok")
data = result.get("data", "")
print(f" ok={ok}, elapsed={elapsed:.3f}s, data_len={len(data)}")
if ok and check_png(data):
print(" ✓ PASS: Screenshot on real site works")
return True
else:
print(f" ✗ FAIL: {result.get('error', 'bad data')}")
return False
async def main():
print("=" * 70)
print("TEST #15: Screenshot Functionality")
print("=" * 70)
bridge = BeelineBridge()
try:
await bridge.start()
for i in range(10):
await asyncio.sleep(1)
if bridge.is_connected:
print("✓ Extension connected!")
break
print(f"Waiting for extension... ({i+1}/10)")
else:
print("✗ Extension not connected. Ensure Chrome with Beeline extension is running.")
return
context = await bridge.create_context(CONTEXT_NAME)
tab_id = context.get("tabId")
group_id = context.get("groupId")
print(f"✓ Created tab: {tab_id}")
data_url = f"data:text/html;base64,{base64.b64encode(SIMPLE_HTML.encode()).decode()}"
results = {
"basic": await test_basic_screenshot(bridge, tab_id, data_url),
"full_page": await test_full_page_screenshot(bridge, tab_id, data_url),
"selector": await test_selector_screenshot(bridge, tab_id, data_url),
"metadata": await test_screenshot_url_metadata(bridge, tab_id),
"timeout": await test_screenshot_timeout(bridge, tab_id, data_url),
"complex_site": await test_screenshot_complex_site(bridge, tab_id),
}
print("\n" + "=" * 70)
print("SUMMARY")
print("=" * 70)
for name, passed in results.items():
status = "✓ PASS" if passed else "✗ FAIL"
print(f" {status}: {name}")
passed_count = sum(1 for v in results.values() if v)
total = len(results)
print(f"\n {passed_count}/{total} tests passed")
await bridge.destroy_context(group_id)
print("\n✓ Context destroyed")
finally:
await bridge.stop()
print("✓ Bridge stopped")
if __name__ == "__main__":
asyncio.run(main())
@@ -1,327 +0,0 @@
#!/usr/bin/env python
"""
Browser Edge Case Test Template
This script provides a template for testing and debugging browser tool failures
on specific websites. Use this to reproduce, isolate, and verify fixes.
Usage:
1. Copy this file: cp test_case.py test_#[number]_[site].py
2. Fill in the CONFIG section with your test details
3. Run: uv run python test_#[number]_[site].py
Example:
uv run python test_01_linkedin_scroll.py
"""
import asyncio
import sys
import time
from pathlib import Path
# Add tools to path
sys.path.insert(0, str(Path(__file__).parent.parent.parent.parent / "tools" / "src"))
from gcu.browser.bridge import BeelineBridge
# ═══════════════════════════════════════════════════════════════════════════════
# CONFIG: Fill in these values for your test case
# ═══════════════════════════════════════════════════════════════════════════════
TEST_CASE = {
"number": 1,
"name": "LinkedIn Nested Scroll Container",
"site": "https://www.linkedin.com/feed",
"simple_site": "https://example.com",
"category": "scroll", # scroll, click, input, snapshot, navigation
"symptom": "scroll() returns success but page doesn't move",
}
BRIDGE_PORT = 9229
CONTEXT_NAME = "edge-case-test"
# ═══════════════════════════════════════════════════════════════════════════════
# TEST FUNCTIONS
# ═══════════════════════════════════════════════════════════════════════════════
async def test_simple_site(bridge: BeelineBridge, tab_id: int) -> dict:
"""Test that the tool works on a simple site (baseline)."""
print("\n--- Baseline Test (Simple Site) ---")
await bridge.navigate(tab_id, TEST_CASE["simple_site"], wait_until="load")
await asyncio.sleep(1)
# Adjust this based on category
if TEST_CASE["category"] == "scroll":
result = await bridge.scroll(tab_id, "down", 100)
print(f" Scroll result: {result}")
return result
elif TEST_CASE["category"] == "click":
# Add click test
pass
elif TEST_CASE["category"] == "snapshot":
result = await bridge.snapshot(tab_id, timeout_s=5.0)
print(f" Snapshot length: {len(result.get('tree', ''))}")
return result
return {"ok": True}
async def test_problematic_site(bridge: BeelineBridge, tab_id: int) -> dict:
"""Test the tool on the problematic site."""
print("\n--- Problem Site Test ---")
await bridge.navigate(tab_id, TEST_CASE["site"], wait_until="load", timeout_ms=30000)
await asyncio.sleep(2)
# Adjust this based on category
if TEST_CASE["category"] == "scroll":
# Get scroll positions before
before = await bridge.evaluate(
tab_id,
"""
(function() {
const results = { window: { y: window.scrollY } };
document.querySelectorAll('*').forEach((el, i) => {
const style = getComputedStyle(el);
if ((style.overflowY === 'scroll' || style.overflowY === 'auto') &&
el.scrollHeight > el.clientHeight) {
results['el_' + i] = {
tag: el.tagName,
scrollTop: el.scrollTop,
class: el.className.substring(0, 30)
};
}
});
return results;
})();
"""
)
print(f" Before scroll: {before.get('result', {})}")
# Try to scroll
result = await bridge.scroll(tab_id, "down", 500)
print(f" Scroll result: {result}")
await asyncio.sleep(1)
# Get scroll positions after
after = await bridge.evaluate(
tab_id,
"""
(function() {
const results = { window: { y: window.scrollY } };
document.querySelectorAll('*').forEach((el, i) => {
const style = getComputedStyle(el);
if ((style.overflowY === 'scroll' || style.overflowY === 'auto') &&
el.scrollHeight > el.clientHeight) {
results['el_' + i] = {
tag: el.tagName,
scrollTop: el.scrollTop,
class: el.className.substring(0, 30)
};
}
});
return results;
})();
"""
)
print(f" After scroll: {after.get('result', {})}")
# Check if anything changed
before_data = before.get("result", {}) or {}
after_data = after.get("result", {}) or {}
changed = False
for key in after_data:
if key in before_data:
b_val = before_data[key].get("scrollTop", 0) if isinstance(before_data[key], dict) else 0
a_val = after_data[key].get("scrollTop", 0) if isinstance(after_data[key], dict) else 0
if a_val != b_val:
print(f" ✓ CHANGE DETECTED: {key} scrolled from {b_val} to {a_val}")
changed = True
if not changed:
print(" ✗ NO CHANGE: Scroll did not affect any container")
return {"ok": changed, "scroll_result": result}
elif TEST_CASE["category"] == "snapshot":
start = time.perf_counter()
try:
result = await bridge.snapshot(tab_id, timeout_s=15.0)
elapsed = time.perf_counter() - start
tree_len = len(result.get("tree", ""))
print(f" Snapshot completed in {elapsed:.2f}s, {tree_len} chars")
return {"ok": True, "elapsed": elapsed, "tree_length": tree_len}
except asyncio.TimeoutError:
print(" ✗ SNAPSHOT TIMED OUT")
return {"ok": False, "error": "timeout"}
return {"ok": True}
async def detect_root_cause(bridge: BeelineBridge, tab_id: int) -> dict:
"""Run detection scripts to identify the root cause."""
print("\n--- Root Cause Detection ---")
detections = {}
# Detection 1: Nested scrollable containers
scroll_check = await bridge.evaluate(
tab_id,
"""
(function() {
const candidates = [];
document.querySelectorAll('*').forEach(el => {
const style = getComputedStyle(el);
if (style.overflow.includes('scroll') || style.overflow.includes('auto')) {
const rect = el.getBoundingClientRect();
if (rect.width > 100 && rect.height > 100) {
candidates.push({
tag: el.tagName,
area: rect.width * rect.height,
class: el.className.substring(0, 30)
});
}
}
});
candidates.sort((a, b) => b.area - a.area);
return {
count: candidates.length,
largest: candidates[0]
};
})();
"""
)
detections["nested_scroll"] = scroll_check.get("result", {})
print(f" Nested scroll containers: {detections['nested_scroll']}")
# Detection 2: Shadow DOM
shadow_check = await bridge.evaluate(
tab_id,
"""
(function() {
const withShadow = [];
document.querySelectorAll('*').forEach(el => {
if (el.shadowRoot) {
withShadow.push(el.tagName);
}
});
return { count: withShadow.length, elements: withShadow.slice(0, 5) };
})();
"""
)
detections["shadow_dom"] = shadow_check.get("result", {})
print(f" Shadow DOM: {detections['shadow_dom']}")
# Detection 3: iframes
iframe_check = await bridge.evaluate(
tab_id,
"""
(function() {
const iframes = document.querySelectorAll('iframe');
return { count: iframes.length };
})();
"""
)
detections["iframes"] = iframe_check.get("result", {})
print(f" iframes: {detections['iframes']}")
# Detection 4: DOM size
dom_check = await bridge.evaluate(
tab_id,
"""
(function() {
return {
elements: document.querySelectorAll('*').length,
body_children: document.body.children.length
};
})();
"""
)
detections["dom_size"] = dom_check.get("result", {})
print(f" DOM size: {detections['dom_size']}")
# Detection 5: Framework detection
framework_check = await bridge.evaluate(
tab_id,
"""
(function() {
return {
react: !!document.querySelector('[data-reactroot], [data-reactid]'),
vue: !!document.querySelector('[data-v-]'),
angular: !!document.querySelector('[ng-app], [ng-version]')
};
})();
"""
)
detections["frameworks"] = framework_check.get("result", {})
print(f" Frameworks: {detections['frameworks']}")
return detections
# ═══════════════════════════════════════════════════════════════════════════════
# MAIN
# ═══════════════════════════════════════════════════════════════════════════════
async def main():
print("=" * 70)
print(f"EDGE CASE TEST #{TEST_CASE['number']}: {TEST_CASE['name']}")
print("=" * 70)
print(f"Site: {TEST_CASE['site']}")
print(f"Category: {TEST_CASE['category']}")
print(f"Symptom: {TEST_CASE['symptom']}")
bridge = BeelineBridge()
try:
print("\n--- Starting Bridge ---")
await bridge.start()
# Wait for extension connection
for i in range(10):
await asyncio.sleep(1)
if bridge.is_connected:
print("✓ Extension connected!")
break
print(f"Waiting for extension... ({i+1}/10)")
else:
print("✗ Extension not connected. Ensure Chrome with Beeline extension is running.")
return
# Create browser context
context = await bridge.create_context(CONTEXT_NAME)
tab_id = context.get("tabId")
group_id = context.get("groupId")
print(f"✓ Created tab: {tab_id}")
# Run tests
baseline_result = await test_simple_site(bridge, tab_id)
problem_result = await test_problematic_site(bridge, tab_id)
detections = await detect_root_cause(bridge, tab_id)
# Summary
print("\n" + "=" * 70)
print("SUMMARY")
print("=" * 70)
print(f"Baseline test: {'✓ PASS' if baseline_result.get('ok') else '✗ FAIL'}")
print(f"Problem test: {'✓ PASS' if problem_result.get('ok') else '✗ FAIL'}")
print(f"Root cause indicators: {list(k for k, v in detections.items() if v)}")
# Cleanup
print("\n--- Cleanup ---")
await bridge.destroy_context(group_id)
print("✓ Context destroyed")
finally:
await bridge.stop()
print("✓ Bridge stopped")
if __name__ == "__main__":
asyncio.run(main())
+1
View File
@@ -0,0 +1 @@
../../core/.claude/skills/building-agents
-225
View File
@@ -1,225 +0,0 @@
# Integration Test Reporting Skill
Run the Level 2 dummy agent integration test suite and produce a detailed HTML report with per-test input → outcome analysis.
## Trigger
User wants to run integration tests and see results:
- `/test-reporting`
- `/test-reporting test_component_queen_live.py`
- `/test-reporting --all`
## SOP: Running Tests
### Step 1: Select Scope
If the user provides a specific test file or pattern, use it. Otherwise run the full suite.
```bash
# Full suite
cd core && echo "1" | uv run python tests/dummy_agents/run_all.py --interactive 2>&1
# Specific file (requires manual provider setup)
cd core && uv run python -c "
import sys
sys.path.insert(0, '.')
from tests.dummy_agents.run_all import detect_available
from tests.dummy_agents.conftest import set_llm_selection
avail = detect_available()
claude = [p for p in avail if 'Claude Code' in p['name']]
if not claude:
avail_names = [p['name'] for p in avail]
raise RuntimeError(f'No Claude Code subscription. Available: {avail_names}')
provider = claude[0]
set_llm_selection(
model=provider['model'],
api_key=provider['api_key'],
extra_headers=provider.get('extra_headers'),
api_base=provider.get('api_base'),
)
import pytest
sys.exit(pytest.main([
'tests/dummy_agents/TEST_FILE_HERE',
'-v', '--override-ini=asyncio_mode=auto', '--no-header', '--tb=long',
'--log-cli-level=WARNING', '--junitxml=/tmp/hive_test_results.xml',
]))
"
```
### Step 2: Collect Results
After the test run completes, collect:
1. **JUnit XML** from `--junitxml` output (if available)
2. **stdout/stderr** from the run
3. **Summary table** from `run_all.py` output (the Unicode table)
### Step 3: Generate HTML Report
Write the report to `/tmp/hive_integration_test_report.html`.
The report MUST include these sections:
#### Header
- Run timestamp (ISO 8601)
- Provider used (model name, source)
- Total tests / passed / failed / skipped
- Total wall-clock time
- Overall verdict: PASS (all green) or FAIL (with count)
#### Per-Test Table
For EVERY test (not just failures), include a row with:
| Column | Description |
|--------|-------------|
| Component | Test file grouping (e.g., `component_queen_live`) |
| Test Name | Function name (e.g., `test_queen_starts_in_planning_without_worker`) |
| Status | PASS / FAIL / SKIP / ERROR with color badge |
| Duration | Wall-clock seconds |
| What | One-line description of what the test verifies |
| How | How it works (setup → action → assertion) |
| Why | Why this test matters (what bug/behavior it catches) |
| Input | The input data or configuration (graph spec, initial prompt, phase, etc.) |
| Expected Outcome | What the test asserts |
| Actual Outcome | What actually happened (PASS: matches expected / FAIL: actual vs expected) |
| Failure Detail | For failures only: full traceback + diagnosis |
#### What / How / Why Descriptions
These MUST be derived from the test function's docstring and code. Read each test file to extract:
- **What**: From the docstring first line
- **How**: From the test body (what fixtures, what graph, what assertions)
- **Why**: From the docstring body or "Why this matters" section in the test module
Use these mappings for the component test files:
```
test_component_llm.py → "LLM Provider" — streaming, tool calling, tokens
test_component_tools.py → "Tool Registry + MCP" — connection, execution
test_component_event_loop.py → "EventLoopNode" — iteration, output, stall
test_component_edges.py → "Edge Evaluation" — conditional, priority
test_component_conversation.py → "Conversation Persistence" — storage, cursor
test_component_escalation.py → "Escalation Flow" — worker→queen signaling
test_component_continuous.py → "Continuous Mode" — conversation threading
test_component_queen.py → "Queen Phase (Unit)" — phase state, tools, events
test_component_queen_live.py → "Queen Phase (Live)" — real queen, real LLM
test_component_queen_state_machine.py → "Queen State Machine" — edge cases, races
test_component_worker_comms.py → "Worker Communication" — events, data flow
test_component_strict_outcomes.py → "Strict Outcomes" — exact path, output, quality
```
#### HTML Template
Use this structure:
```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Hive Integration Test Report — {timestamp}</title>
<style>
:root { --pass: #22c55e; --fail: #ef4444; --skip: #f59e0b; --bg: #0f172a; --surface: #1e293b; --text: #e2e8f0; --muted: #94a3b8; --border: #334155; }
* { box-sizing: border-box; margin: 0; padding: 0; }
body { font-family: 'SF Mono', 'Fira Code', monospace; background: var(--bg); color: var(--text); padding: 2rem; line-height: 1.6; }
h1, h2, h3 { font-weight: 600; }
h1 { font-size: 1.5rem; margin-bottom: 1rem; }
h2 { font-size: 1.2rem; margin: 2rem 0 1rem; border-bottom: 1px solid var(--border); padding-bottom: 0.5rem; }
.summary { display: grid; grid-template-columns: repeat(auto-fit, minmax(150px, 1fr)); gap: 1rem; margin-bottom: 2rem; }
.card { background: var(--surface); padding: 1rem; border-radius: 8px; border: 1px solid var(--border); }
.card .label { color: var(--muted); font-size: 0.75rem; text-transform: uppercase; }
.card .value { font-size: 1.5rem; font-weight: 700; margin-top: 0.25rem; }
.card .value.pass { color: var(--pass); }
.card .value.fail { color: var(--fail); }
table { width: 100%; border-collapse: collapse; font-size: 0.8rem; }
th { background: var(--surface); position: sticky; top: 0; text-align: left; padding: 0.5rem; border-bottom: 2px solid var(--border); color: var(--muted); text-transform: uppercase; font-size: 0.7rem; }
td { padding: 0.5rem; border-bottom: 1px solid var(--border); vertical-align: top; }
tr:hover { background: rgba(255,255,255,0.03); }
.badge { display: inline-block; padding: 2px 8px; border-radius: 4px; font-size: 0.7rem; font-weight: 700; }
.badge.pass { background: rgba(34,197,94,0.2); color: var(--pass); }
.badge.fail { background: rgba(239,68,68,0.2); color: var(--fail); }
.badge.skip { background: rgba(245,158,11,0.2); color: var(--skip); }
.detail { background: #1a1a2e; padding: 0.75rem; border-radius: 4px; margin-top: 0.5rem; font-size: 0.75rem; white-space: pre-wrap; overflow-x: auto; max-height: 200px; overflow-y: auto; }
.component-header { background: var(--surface); padding: 0.75rem 0.5rem; font-weight: 600; font-size: 0.85rem; }
.meta { color: var(--muted); font-size: 0.75rem; }
</style>
</head>
<body>
<h1>Hive Integration Test Report</h1>
<p class="meta">Generated: {timestamp} | Provider: {provider} | Duration: {duration}s</p>
<div class="summary">
<div class="card"><div class="label">Total</div><div class="value">{total}</div></div>
<div class="card"><div class="label">Passed</div><div class="value pass">{passed}</div></div>
<div class="card"><div class="label">Failed</div><div class="value fail">{failed}</div></div>
<div class="card"><div class="label">Verdict</div><div class="value {verdict_class}">{verdict}</div></div>
</div>
<h2>Test Results</h2>
<table>
<thead>
<tr>
<th>Component</th>
<th>Test</th>
<th>Status</th>
<th>Time</th>
<th>What</th>
<th>Input → Expected → Actual</th>
</tr>
</thead>
<tbody>
<!-- For each test: -->
<tr>
<td>{component}</td>
<td>{test_name}</td>
<td><span class="badge {status_class}">{status}</span></td>
<td>{duration}s</td>
<td>{what_description}</td>
<td>
<strong>Input:</strong> {input_description}<br>
<strong>Expected:</strong> {expected_outcome}<br>
<strong>Actual:</strong> {actual_outcome}
<!-- If failed: -->
<div class="detail">{failure_traceback}</div>
</td>
</tr>
</tbody>
</table>
<h2>Failure Analysis</h2>
<!-- Only if there are failures -->
<p>For each failure, provide:</p>
<ul>
<li><strong>Root cause:</strong> Why it failed</li>
<li><strong>Impact:</strong> What this means for the system</li>
<li><strong>Suggested fix:</strong> How to address it</li>
</ul>
</body>
</html>
```
### Step 4: Output
1. Write the HTML file to `/tmp/hive_integration_test_report.html`
2. Print the file path so the user can open it
3. Print a concise summary to the terminal:
```
Test Report: /tmp/hive_integration_test_report.html
Result: 74/76 PASSED (2 failures)
Failures:
- parallel_merge::test_parallel_disjoint_output_keys
- worker::test_worker_timestamped_note_artifact
```
## Key Rules
1. ALWAYS use `--junitxml` when running pytest to get structured results
2. ALWAYS read the test source files to populate What/How/Why columns — do not guess
3. For Input/Expected/Actual, extract from the test's graph spec, assertions, and result
4. Color-code everything: green for pass, red for fail, amber for skip
5. Include the full traceback for failures in a scrollable `<div class="detail">`
6. Group tests by component (file name) with a visual separator
7. The report must be self-contained HTML (no external CSS/JS dependencies)
-145
View File
@@ -1,145 +0,0 @@
# Triage Issue Skill
Analyze a GitHub issue, verify claims against the codebase, and close invalid issues with a technical response.
## Trigger
User provides a GitHub issue URL or number, e.g.:
- `/triage-issue 1970`
- `/triage-issue https://github.com/adenhq/hive/issues/1970`
## Workflow
### Step 1: Fetch Issue Details
```bash
gh issue view <number> --repo adenhq/hive --json title,body,state,labels,author
```
Extract:
- Title
- Body (the claim/bug report)
- Current state
- Labels
- Author
If issue is already closed, inform user and stop.
### Step 2: Analyze the Claim
Read the issue body and identify:
1. **The core claim** - What is the user asserting?
2. **Technical specifics** - File paths, function names, code snippets mentioned
3. **Expected behavior** - What do they think should happen?
4. **Severity claimed** - Security issue? Bug? Feature request?
### Step 3: Investigate the Codebase
For each technical claim:
1. Find the referenced code using Grep/Glob/Read
2. Understand the actual implementation
3. Check if the claim accurately describes the behavior
4. Look for related tests, documentation, or design decisions
### Step 4: Evaluate Validity
Categorize the issue as one of:
| Category | Action |
|----------|--------|
| **Valid Bug** | Do NOT close. Inform user this is a real issue. |
| **Valid Feature Request** | Do NOT close. Suggest labeling appropriately. |
| **Misunderstanding** | Prepare technical explanation for why behavior is correct. |
| **Fundamentally Flawed** | Prepare critique explaining the technical impossibility or design rationale. |
| **Duplicate** | Find the original issue and prepare duplicate notice. |
| **Incomplete** | Prepare request for more information. |
### Step 5: Draft Response
For issues to be closed, draft a response that:
1. **Acknowledges the concern** - Don't be dismissive
2. **Explains the actual behavior** - With code references
3. **Provides technical rationale** - Why it works this way
4. **References industry standards** - If applicable
5. **Offers alternatives** - If there's a better approach for the user
Use this template:
```markdown
## Analysis
[Brief summary of what was investigated]
## Technical Details
[Explanation with code references]
## Why This Is Working As Designed
[Rationale]
## Recommendation
[What the user should do instead, if applicable]
---
*This issue was reviewed and closed by the maintainers.*
```
### Step 6: User Review
Present the draft to the user with:
```
## Issue #<number>: <title>
**Claim:** <summary of claim>
**Finding:** <valid/invalid/misunderstanding/etc>
**Draft Response:**
<the markdown response>
---
Do you want me to post this comment and close the issue?
```
Use AskUserQuestion with options:
- "Post and close" - Post comment, close issue
- "Edit response" - Let user modify the response
- "Skip" - Don't take action
### Step 7: Execute Action
If user approves:
```bash
# Post comment
gh issue comment <number> --repo adenhq/hive --body "<response>"
# Close issue
gh issue close <number> --repo adenhq/hive --reason "not planned"
```
Report success with link to the issue.
## Important Guidelines
1. **Never close valid issues** - If there's any merit to the claim, don't close it
2. **Be respectful** - The reporter took time to file the issue
3. **Be technical** - Provide code references and evidence
4. **Be educational** - Help them understand, don't just dismiss
5. **Check twice** - Make sure you understand the code before declaring something invalid
6. **Consider edge cases** - Maybe their environment reveals a real issue
## Example Critiques
### Security Misunderstanding
> "The claim that secrets are exposed in plaintext misunderstands the encryption architecture. While `SecretStr` is used for logging protection, actual encryption is provided by Fernet (AES-128-CBC) at the storage layer. The code path is: serialize → encrypt → write. Only encrypted bytes touch disk."
### Impossible Request
> "The requested feature would require [X] which violates [fundamental constraint]. This is not a limitation of our implementation but a fundamental property of [technology/protocol]."
### Already Handled
> "This scenario is already handled by [code reference]. The reporter may be using an older version or misconfigured environment."
-18
View File
@@ -1,18 +0,0 @@
This project uses ruff for Python linting and formatting.
Rules:
- Line length: 100 characters
- Python target: 3.11+
- Use double quotes for strings
- Sort imports with isort (ruff I rules): stdlib, third-party, first-party (framework), local
- Combine as-imports
- Use type hints on all function signatures
- Use `from __future__ import annotations` for modern type syntax
- Raise exceptions with `from` in except blocks (B904)
- No unused imports (F401), no unused variables (F841)
- Prefer list/dict/set comprehensions over map/filter (C4)
Run `make lint` to auto-fix, `make check` to verify without modifying files.
Run `make format` to apply ruff formatting.
The ruff config lives in core/pyproject.toml under [tool.ruff].
-3
View File
@@ -11,9 +11,6 @@ indent_size = 2
insert_final_newline = true
trim_trailing_whitespace = true
[*.py]
indent_size = 4
[*.md]
trim_trailing_whitespace = false
-124
View File
@@ -1,124 +0,0 @@
# Normalize line endings for all text files
* text=auto
# Source code
*.py text diff=python
*.js text
*.ts text
*.jsx text
*.tsx text
*.json text
*.yaml text
*.yml text
*.toml text
*.ini text
*.cfg text
# Shell scripts (must use LF)
*.sh text eol=lf
quickstart.sh text eol=lf
# PowerShell scripts (Windows-friendly)
*.ps1 text eol=lf
*.psm1 text eol=lf
# Windows batch files (must use CRLF)
*.bat text eol=crlf
*.cmd text eol=crlf
# Documentation
*.md text
*.txt text
*.rst text
*.tex text
# Configuration files
.gitignore text
.gitattributes text
.editorconfig text
Dockerfile text
docker-compose.yml text
requirements*.txt text
pyproject.toml text
setup.py text
setup.cfg text
MANIFEST.in text
LICENSE text
README* text
CHANGELOG* text
CONTRIBUTING* text
CODE_OF_CONDUCT* text
# Web files
*.html text
*.css text
*.scss text
*.sass text
# Data files
*.xml text
*.csv text
*.sql text
# Graphics (binary)
*.png binary
*.jpg binary
*.jpeg binary
*.gif binary
*.ico binary
*.svg binary
*.eps binary
*.bmp binary
*.tif binary
*.tiff binary
# Archives (binary)
*.zip binary
*.tar binary
*.gz binary
*.bz2 binary
*.7z binary
*.rar binary
# Python compiled (binary)
*.pyc binary
*.pyo binary
*.pyd binary
*.whl binary
*.egg binary
# System libraries (binary)
*.so binary
*.dll binary
*.dylib binary
*.lib binary
*.a binary
# Documents (binary)
*.pdf binary
*.doc binary
*.docx binary
*.ppt binary
*.pptx binary
*.xls binary
*.xlsx binary
# Fonts (binary)
*.ttf binary
*.otf binary
*.woff binary
*.woff2 binary
*.eot binary
# Audio/Video (binary)
*.mp3 binary
*.mp4 binary
*.wav binary
*.avi binary
*.mov binary
*.flv binary
# Database files (binary)
*.db binary
*.sqlite binary
*.sqlite3 binary
+1
View File
@@ -8,6 +8,7 @@
/hive/ @adenhq/maintainers
# Infrastructure
/docker-compose*.yml @adenhq/maintainers
/.github/ @adenhq/maintainers
# Documentation
+6 -6
View File
@@ -1,10 +1,9 @@
---
name: Bug Report
about: Report a bug to help us improve
title: "[Bug]: "
labels: bug, enhancement
title: '[Bug]: '
labels: bug
assignees: ''
---
## Describe the Bug
@@ -30,12 +29,13 @@ If applicable, add screenshots to help explain your problem.
## Environment
- OS: [e.g., Ubuntu 22.04, macOS 14]
- Python version: [e.g., 3.11.0]
- Docker version (if applicable): [e.g., 24.0.0]
- Docker version: [e.g., 24.0.0]
- Node version: [e.g., 20.10.0]
- Browser (if applicable): [e.g., Chrome 120]
## Configuration
Relevant parts of your agent configuration or environment setup (remove any sensitive data):
Relevant parts of your `config.yaml` (remove any sensitive data):
```yaml
# paste here
+1 -2
View File
@@ -1,10 +1,9 @@
---
name: Feature Request
about: Suggest a new feature or enhancement
title: "[Feature]: "
title: '[Feature]: '
labels: enhancement
assignees: ''
---
## Problem Statement
@@ -1,89 +0,0 @@
name: Integration Bounty
description: A bounty task for the integration contribution program
title: "[Bounty]: "
labels: []
body:
- type: markdown
attributes:
value: |
## Integration Bounty
This issue is part of the [Integration Bounty Program](../../docs/bounty-program/README.md).
**Claim this bounty** by commenting below — a maintainer will assign you within 24 hours.
- type: dropdown
id: bounty-type
attributes:
label: Bounty Type
options:
- "Test a Tool (20 pts)"
- "Write Docs (20 pts)"
- "Code Contribution (30 pts)"
- "New Integration (75 pts)"
validations:
required: true
- type: dropdown
id: difficulty
attributes:
label: Difficulty
options:
- Easy
- Medium
- Hard
validations:
required: true
- type: input
id: tool-name
attributes:
label: Tool Name
description: The integration this bounty targets (e.g., `airtable`, `salesforce`)
placeholder: e.g., airtable
validations:
required: true
- type: textarea
id: description
attributes:
label: Description
description: What needs to be done to complete this bounty.
placeholder: |
Describe the specific task, including:
- What the contributor needs to do
- Links to relevant files in the repo
- Any setup requirements (API keys, accounts, etc.)
validations:
required: true
- type: textarea
id: acceptance-criteria
attributes:
label: Acceptance Criteria
description: What "done" looks like. The PR or report must meet all criteria.
placeholder: |
- [ ] Criterion 1
- [ ] Criterion 2
- [ ] CI passes
validations:
required: true
- type: textarea
id: relevant-files
attributes:
label: Relevant Files
description: Links to tool directory, credential spec, health check file, etc.
placeholder: |
- Tool: `tools/src/aden_tools/tools/{tool_name}/`
- Credential spec: `tools/src/aden_tools/credentials/{category}.py`
- Health checks: `tools/src/aden_tools/credentials/health_check.py`
- type: textarea
id: resources
attributes:
label: Resources
description: Links to API docs, examples, or guides that will help the contributor.
placeholder: |
- [Building Tools Guide](../../tools/BUILDING_TOOLS.md)
- [Tool README Template](../../docs/bounty-program/templates/tool-readme-template.md)
- API docs: https://...
@@ -1,71 +0,0 @@
---
name: Integration Request
about: Suggest a new integration
title: "[Integration]:"
labels: ''
assignees: ''
---
## Service
Name and brief description of the service and what it enables agents to do.
**Description:** [e.g., "API key for Slack Bot" — short one-liner for the credential spec]
## Credential Identity
- **credential_id:** [e.g., `slack`]
- **env_var:** [e.g., `SLACK_BOT_TOKEN`]
- **credential_key:** [e.g., `access_token`, `api_key`, `bot_token`]
## Tools
Tool function names that require this credential:
- [e.g., `slack_send_message`]
- [e.g., `slack_list_channels`]
## Auth Methods
- **Direct API key supported:** Yes / No
- **Aden OAuth supported:** Yes / No
If Aden OAuth is supported, describe the OAuth scopes/permissions required.
## How to Get the Credential
Link where users obtain the key/token:
[e.g., https://api.slack.com/apps]
Step-by-step instructions:
1. Go to ...
2. Create a ...
3. Select scopes/permissions: ...
4. Copy the key/token
## Health Check
A lightweight API call to validate the credential (no writes, no charges).
- **Endpoint:** [e.g., `https://slack.com/api/auth.test`]
- **Method:** [e.g., `GET` or `POST`]
- **Auth header:** [e.g., `Authorization: Bearer {token}` or `X-Api-Key: {key}`]
- **Parameters (if any):** [e.g., `?limit=1`]
- **200 means:** [e.g., key is valid]
- **401 means:** [e.g., invalid or expired]
- **429 means:** [e.g., rate limited but key is valid]
## Credential Group
Does this require multiple credentials configured together? (e.g., Google Custom Search needs
both an API key and a CSE ID)
- [ ] No, single credential
- [ ] Yes — list the other credential IDs in the group:
## Additional Context
Links to API docs, rate limits, free tier availability, or anything else relevant.
@@ -1,78 +0,0 @@
name: Standard Bounty
description: A bounty task for general framework contributions (not integration-specific)
title: "[Bounty]: "
labels: []
body:
- type: markdown
attributes:
value: |
## Standard Bounty
This issue is part of the [Bounty Program](../../docs/bounty-program/README.md).
**Claim this bounty** by commenting below — a maintainer will assign you within 24 hours.
- type: dropdown
id: bounty-size
attributes:
label: Bounty Size
options:
- "Small (10 pts)"
- "Medium (30 pts)"
- "Large (75 pts)"
- "Extreme (150 pts)"
validations:
required: true
- type: dropdown
id: difficulty
attributes:
label: Difficulty
options:
- Easy
- Medium
- Hard
validations:
required: true
- type: textarea
id: description
attributes:
label: Description
description: What needs to be done to complete this bounty.
placeholder: |
Describe the specific task, including:
- What the contributor needs to do
- Links to relevant files in the repo
- Any context or motivation for the change
validations:
required: true
- type: textarea
id: acceptance-criteria
attributes:
label: Acceptance Criteria
description: What "done" looks like. The PR must meet all criteria.
placeholder: |
- [ ] Criterion 1
- [ ] Criterion 2
- [ ] CI passes
validations:
required: true
- type: textarea
id: relevant-files
attributes:
label: Relevant Files
description: Links to files or directories related to this bounty.
placeholder: |
- `path/to/file.py`
- `path/to/directory/`
- type: textarea
id: resources
attributes:
label: Resources
description: Links to docs, issues, or external references that will help.
placeholder: |
- Related issue: #XXXX
- Docs: https://...
+2 -2
View File
@@ -24,8 +24,8 @@ Fixes #(issue number)
Describe the tests you ran to verify your changes:
- [ ] Unit tests pass (`cd core && pytest tests/`)
- [ ] Lint passes (`cd core && ruff check .`)
- [ ] Unit tests pass (`npm run test`)
- [ ] Lint passes (`npm run lint`)
- [ ] Manual testing performed
## Checklist
@@ -1,34 +0,0 @@
name: Auto-close duplicate issues
description: Auto-closes issues that are duplicates of existing issues
on:
schedule:
- cron: "0 */6 * * *"
workflow_dispatch:
jobs:
auto-close-duplicates:
runs-on: ubuntu-latest
timeout-minutes: 10
permissions:
contents: read
issues: write
steps:
- name: Checkout repository
uses: actions/checkout@v4
- name: Setup Bun
uses: oven-sh/setup-bun@v2
with:
bun-version: latest
- name: Run auto-close-duplicates tests
run: bun test scripts/auto-close-duplicates
- name: Auto-close duplicate issues
run: bun run scripts/auto-close-duplicates.ts
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
GITHUB_REPOSITORY_OWNER: ${{ github.repository_owner }}
GITHUB_REPOSITORY_NAME: ${{ github.event.repository.name }}
STATSIG_API_KEY: ${{ secrets.STATSIG_API_KEY }}
-47
View File
@@ -1,47 +0,0 @@
name: Bounty completed
description: Awards points and notifies Discord when a bounty PR is merged
on:
pull_request_target:
types: [closed]
workflow_dispatch:
inputs:
pr_number:
description: "PR number to process (for missed bounties)"
required: true
type: number
jobs:
bounty-notify:
if: >
github.event_name == 'workflow_dispatch' ||
(github.event.pull_request.merged == true &&
contains(join(github.event.pull_request.labels.*.name, ','), 'bounty:'))
runs-on: ubuntu-latest
timeout-minutes: 5
permissions:
contents: read
pull-requests: read
steps:
- name: Checkout repository
uses: actions/checkout@v4
- name: Setup Bun
uses: oven-sh/setup-bun@v2
with:
bun-version: latest
- name: Award XP and notify Discord
run: bun run scripts/bounty-tracker.ts notify
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
GITHUB_REPOSITORY_OWNER: ${{ github.repository_owner }}
GITHUB_REPOSITORY_NAME: ${{ github.event.repository.name }}
DISCORD_WEBHOOK_URL: ${{ secrets.DISCORD_BOUNTY_WEBHOOK_URL }}
BOT_API_URL: ${{ secrets.BOT_API_URL }}
BOT_API_KEY: ${{ secrets.BOT_API_KEY }}
LURKR_API_KEY: ${{ secrets.LURKR_API_KEY }}
LURKR_GUILD_ID: ${{ secrets.LURKR_GUILD_ID }}
PR_NUMBER: ${{ inputs.pr_number || github.event.pull_request.number }}
+60 -110
View File
@@ -5,141 +5,91 @@ on:
branches: [main]
pull_request:
branches: [main]
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
lint:
name: Lint Python
name: Lint
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Setup Python
uses: actions/setup-python@v5
- name: Setup Node.js
uses: actions/setup-node@v4
with:
python-version: '3.11'
- name: Install uv
uses: astral-sh/setup-uv@v4
with:
enable-cache: true
node-version: '20'
cache: 'npm'
- name: Install dependencies
run: uv sync --project core --group dev
run: npm ci
- name: Ruff lint
run: |
uv run --project core ruff check core/
uv run --project core ruff check tools/
- name: Ruff format
run: |
uv run --project core ruff format --check core/
uv run --project core ruff format --check tools/
- name: Run linter
run: npm run lint
test:
name: Test Python Framework
runs-on: ${{ matrix.os }}
strategy:
matrix:
os: [ubuntu-latest, windows-latest]
steps:
- uses: actions/checkout@v4
- name: Setup Python
uses: actions/setup-python@v5
with:
python-version: '3.11'
- name: Install uv
uses: astral-sh/setup-uv@v4
with:
enable-cache: true
- name: Install dependencies and run tests
working-directory: core
run: |
uv sync
uv run pytest tests/ -v
test-tools:
name: Test Tools (${{ matrix.os }})
runs-on: ${{ matrix.os }}
strategy:
matrix:
os: [ubuntu-latest, windows-latest]
steps:
- uses: actions/checkout@v4
- name: Setup Python
uses: actions/setup-python@v5
with:
python-version: '3.11'
- name: Install uv
uses: astral-sh/setup-uv@v4
with:
enable-cache: true
- name: Install dependencies and run tests
working-directory: tools
run: |
uv sync --extra dev
uv run pytest tests/ -v
validate:
name: Validate Agent Exports
name: Test
runs-on: ubuntu-latest
needs: [lint, test, test-tools]
steps:
- uses: actions/checkout@v4
- name: Setup Python
uses: actions/setup-python@v5
- name: Setup Node.js
uses: actions/setup-node@v4
with:
python-version: '3.11'
node-version: '20'
cache: 'npm'
- name: Install uv
uses: astral-sh/setup-uv@v4
with:
enable-cache: true
- name: Install dependencies
working-directory: core
run: |
uv sync
run: npm ci
- name: Validate exported agents
run: |
# Check that agent exports have valid structure
if [ ! -d "exports" ]; then
echo "No exports/ directory found, skipping validation"
exit 0
fi
- name: Run tests
run: npm run test
shopt -s nullglob
agent_dirs=(exports/*/)
shopt -u nullglob
build:
name: Build
runs-on: ubuntu-latest
needs: [lint, test]
steps:
- uses: actions/checkout@v4
if [ ${#agent_dirs[@]} -eq 0 ]; then
echo "No agent directories in exports/, skipping validation"
exit 0
fi
- name: Setup Node.js
uses: actions/setup-node@v4
with:
node-version: '20'
cache: 'npm'
validated=0
for agent_dir in "${agent_dirs[@]}"; do
if [ -f "$agent_dir/agent.json" ]; then
echo "Validating $agent_dir"
uv run python -c "import json; json.load(open('$agent_dir/agent.json'))"
validated=$((validated + 1))
fi
done
- name: Install dependencies
run: npm ci
if [ "$validated" -eq 0 ]; then
echo "No agent.json files found in exports/, skipping validation"
else
echo "Validated $validated agent(s)"
fi
- name: Build packages
run: npm run build
docker:
name: Docker Build
runs-on: ubuntu-latest
needs: [lint, test]
steps:
- uses: actions/checkout@v4
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Build frontend image
uses: docker/build-push-action@v5
with:
context: ./honeycomb
push: false
tags: honeycomb-frontend:test
cache-from: type=gha
cache-to: type=gha,mode=max
- name: Build backend image
uses: docker/build-push-action@v5
with:
context: ./hive
push: false
tags: honeycomb-backend:test
cache-from: type=gha
cache-to: type=gha,mode=max
-103
View File
@@ -1,103 +0,0 @@
name: Issue Triage
on:
issues:
types: [opened]
jobs:
triage:
runs-on: ubuntu-latest
timeout-minutes: 10
permissions:
contents: read
issues: write
id-token: write
steps:
- name: Checkout repository
uses: actions/checkout@v4
with:
fetch-depth: 1
- name: Triage and check for duplicates
uses: anthropics/claude-code-action@v1
with:
anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
github_token: ${{ secrets.GITHUB_TOKEN }}
allowed_non_write_users: "*"
prompt: |
Analyze this new issue and perform triage tasks.
Issue: #${{ github.event.issue.number }}
Repository: ${{ github.repository }}
## Your Tasks:
### 1. Get issue details
Use mcp__github__get_issue to get the full details of issue #${{ github.event.issue.number }}
### 2. Check for duplicates
Search for similar existing issues using mcp__github__search_issues with relevant keywords from the issue title and body.
Criteria for duplicates:
- Same bug or error being reported
- Same feature request (even if worded differently)
- Same question being asked
- Issues describing the same root problem
If you find a duplicate:
- Add a comment using EXACTLY this format (required for auto-close to work):
"Found a possible duplicate of #<issue_number>: <brief explanation of why it's a duplicate>"
- Do NOT apply the "duplicate" label yet (the auto-close script will add it after 12 hours if no objections)
- Suggest the user react with a thumbs-down if they disagree
### 3. Check for Low-Quality / AI Spam
Analyze the issue quality. We are receiving many low-effort, AI-generated spam issues.
Flag the issue as INVALID if it matches these criteria:
- **Vague/Generic**: Title is "Fix bug" or "Error" without specific context.
- **Hallucinated**: Refers to files or features that do not exist in this repo.
- **Template Filler**: Body contains "Insert description here" or unrelated gibberish.
- **Low Effort**: No reproduction steps, no logs, only 1-2 sentences.
If identified as spam/low-quality:
- Add the "invalid" label.
- Add a comment:
"This issue has been automatically flagged as low-quality or potentially AI-generated spam. It lacks specific details (logs, reproduction steps, file references) required for us to help. Please open a new issue following the template exactly if this is a legitimate request."
- Do NOT proceed to other steps.
### 4. Check for invalid issues (General)
If the issue is not spam but still lacks information:
- Add the "invalid" label
- Comment asking for clarification
### 5. Categorize with labels (if NOT a duplicate or spam)
Apply appropriate labels based on the issue content. Use ONLY these labels:
- bug: Something isn't working
- enhancement: New feature or request
- question: Further information is requested
- documentation: Improvements or additions to documentation
- good first issue: Good for newcomers (if issue is well-defined and small scope)
- help wanted: Extra attention is needed (if issue needs community input)
- backlog: Tracked for the future, but not currently planned or prioritized
### 6. Estimate size (if NOT a duplicate, spam, or invalid)
Apply exactly ONE size label to help contributors match their capacity to the task:
- "size: small": Docs, typos, single-file fixes, config changes
- "size: medium": Bug fixes with tests, adding a single tool, changes within one package
- "size: large": Cross-package changes (core + tools), new modules, complex logic, architectural refactors
You may apply multiple labels if appropriate (e.g., "bug", "size: small", and "good first issue").
## Tools Available:
- mcp__github__get_issue: Get issue details
- mcp__github__search_issues: Search for similar issues
- mcp__github__list_issues: List recent issues if needed
- mcp__github__add_issue_comment: Add a comment
- mcp__github__update_issue: Add labels
- mcp__github__get_issue_comments: Get existing comments
Be thorough but efficient. Focus on accurate categorization and finding true duplicates.
claude_args: |
--model claude-haiku-4-5-20251001
--allowedTools "mcp__github__get_issue,mcp__github__search_issues,mcp__github__list_issues,mcp__github__add_issue_comment,mcp__github__update_issue,mcp__github__get_issue_comments"
-204
View File
@@ -1,204 +0,0 @@
name: PR Check Command
on:
issue_comment:
types: [created]
jobs:
check-pr:
# Only run on PR comments that start with /check
if: github.event.issue.pull_request && startsWith(github.event.comment.body, '/check')
runs-on: ubuntu-latest
permissions:
pull-requests: write
issues: write
checks: write
statuses: write
steps:
- name: Check PR requirements
uses: actions/github-script@v7
with:
script: |
const prNumber = context.payload.issue.number;
console.log(`Triggered by /check comment on PR #${prNumber}`);
// Fetch PR data
const { data: pr } = await github.rest.pulls.get({
owner: context.repo.owner,
repo: context.repo.repo,
pull_number: prNumber,
});
const prBody = pr.body || '';
const prTitle = pr.title || '';
const prAuthor = pr.user.login;
const headSha = pr.head.sha;
// Create a check run in progress
const { data: checkRun } = await github.rest.checks.create({
owner: context.repo.owner,
repo: context.repo.repo,
name: 'check-requirements',
head_sha: headSha,
status: 'in_progress',
started_at: new Date().toISOString(),
});
// Extract issue numbers
const issuePattern = /(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)?\s*#(\d+)/gi;
const allText = `${prTitle} ${prBody}`;
const matches = [...allText.matchAll(issuePattern)];
const issueNumbers = [...new Set(matches.map(m => parseInt(m[1], 10)))];
console.log(`PR #${prNumber}:`);
console.log(` Author: ${prAuthor}`);
console.log(` Found issue references: ${issueNumbers.length > 0 ? issueNumbers.join(', ') : 'none'}`);
if (issueNumbers.length === 0) {
const message = `## PR Closed - Requirements Not Met
This PR has been automatically closed because it doesn't meet the requirements.
**Missing:** No linked issue found.
**To fix:**
1. Create or find an existing issue for this work
2. Assign yourself to the issue
3. Re-open this PR and add \`Fixes #123\` in the description
**Why is this required?** See #472 for details.`;
await github.rest.issues.createComment({
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: prNumber,
body: message,
});
await github.rest.pulls.update({
owner: context.repo.owner,
repo: context.repo.repo,
pull_number: prNumber,
state: 'closed',
});
// Update check run to failure
await github.rest.checks.update({
owner: context.repo.owner,
repo: context.repo.repo,
check_run_id: checkRun.id,
status: 'completed',
conclusion: 'failure',
completed_at: new Date().toISOString(),
output: {
title: 'Missing linked issue',
summary: 'PR must reference an issue (e.g., `Fixes #123`)',
},
});
core.setFailed('PR must reference an issue');
return;
}
// Check if PR author is assigned to any linked issue
let issueWithAuthorAssigned = null;
let issuesWithoutAuthor = [];
for (const issueNum of issueNumbers) {
try {
const { data: issue } = await github.rest.issues.get({
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: issueNum,
});
const assigneeLogins = (issue.assignees || []).map(a => a.login);
if (assigneeLogins.includes(prAuthor)) {
issueWithAuthorAssigned = issueNum;
console.log(` Issue #${issueNum} has PR author ${prAuthor} as assignee`);
break;
} else {
issuesWithoutAuthor.push({
number: issueNum,
assignees: assigneeLogins
});
console.log(` Issue #${issueNum} assignees: ${assigneeLogins.length > 0 ? assigneeLogins.join(', ') : 'none'}`);
}
} catch (error) {
console.log(` Issue #${issueNum} not found`);
}
}
if (!issueWithAuthorAssigned) {
const issueList = issuesWithoutAuthor.map(i =>
`#${i.number} (assignees: ${i.assignees.length > 0 ? i.assignees.join(', ') : 'none'})`
).join(', ');
const message = `## PR Closed - Requirements Not Met
This PR has been automatically closed because it doesn't meet the requirements.
**PR Author:** @${prAuthor}
**Found issues:** ${issueList}
**Problem:** The PR author must be assigned to the linked issue.
**To fix:**
1. Assign yourself (@${prAuthor}) to one of the linked issues
2. Re-open this PR
**Why is this required?** See #472 for details.`;
await github.rest.issues.createComment({
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: prNumber,
body: message,
});
await github.rest.pulls.update({
owner: context.repo.owner,
repo: context.repo.repo,
pull_number: prNumber,
state: 'closed',
});
// Update check run to failure
await github.rest.checks.update({
owner: context.repo.owner,
repo: context.repo.repo,
check_run_id: checkRun.id,
status: 'completed',
conclusion: 'failure',
completed_at: new Date().toISOString(),
output: {
title: 'PR author not assigned to issue',
summary: `PR author @${prAuthor} must be assigned to one of the linked issues: ${issueList}`,
},
});
core.setFailed('PR author must be assigned to the linked issue');
} else {
await github.rest.issues.createComment({
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: prNumber,
body: `✅ PR requirements met! Issue #${issueWithAuthorAssigned} has @${prAuthor} as assignee.`,
});
// Update check run to success
await github.rest.checks.update({
owner: context.repo.owner,
repo: context.repo.repo,
check_run_id: checkRun.id,
status: 'completed',
conclusion: 'success',
completed_at: new Date().toISOString(),
output: {
title: 'Requirements met',
summary: `Issue #${issueWithAuthorAssigned} has @${prAuthor} as assignee.`,
},
});
console.log(`PR requirements met!`);
}
@@ -1,138 +0,0 @@
name: PR Requirements Backfill
on:
workflow_dispatch:
jobs:
check-all-open-prs:
runs-on: ubuntu-latest
permissions:
pull-requests: write
issues: write
steps:
- name: Check all open PRs
uses: actions/github-script@v7
with:
script: |
const { data: pullRequests } = await github.rest.pulls.list({
owner: context.repo.owner,
repo: context.repo.repo,
state: 'open',
per_page: 100,
});
console.log(`Found ${pullRequests.length} open PRs`);
for (const pr of pullRequests) {
const prNumber = pr.number;
const prBody = pr.body || '';
const prTitle = pr.title || '';
const prAuthor = pr.user.login;
console.log(`\nChecking PR #${prNumber}: ${prTitle}`);
// Extract issue numbers from body and title
const issuePattern = /(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)?\s*#(\d+)/gi;
const allText = `${prTitle} ${prBody}`;
const matches = [...allText.matchAll(issuePattern)];
const issueNumbers = [...new Set(matches.map(m => parseInt(m[1], 10)))];
console.log(` Found issue references: ${issueNumbers.length > 0 ? issueNumbers.join(', ') : 'none'}`);
if (issueNumbers.length === 0) {
console.log(` ❌ No linked issue - closing PR`);
const message = `## PR Closed - Requirements Not Met
This PR has been automatically closed because it doesn't meet the requirements.
**Missing:** No linked issue found.
**To fix:**
1. Create or find an existing issue for this work
2. Assign yourself to the issue
3. Re-open this PR and add \`Fixes #123\` in the description`;
await github.rest.issues.createComment({
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: prNumber,
body: message,
});
await github.rest.pulls.update({
owner: context.repo.owner,
repo: context.repo.repo,
pull_number: prNumber,
state: 'closed',
});
continue;
}
// Check if any linked issue has the PR author as assignee
let issueWithAuthorAssigned = null;
let issuesWithoutAuthor = [];
for (const issueNum of issueNumbers) {
try {
const { data: issue } = await github.rest.issues.get({
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: issueNum,
});
const assigneeLogins = (issue.assignees || []).map(a => a.login);
if (assigneeLogins.includes(prAuthor)) {
issueWithAuthorAssigned = issueNum;
break;
} else {
issuesWithoutAuthor.push({
number: issueNum,
assignees: assigneeLogins
});
}
} catch (error) {
console.log(` Issue #${issueNum} not found or inaccessible`);
}
}
if (!issueWithAuthorAssigned) {
const issueList = issuesWithoutAuthor.map(i =>
`#${i.number} (assignees: ${i.assignees.length > 0 ? i.assignees.join(', ') : 'none'})`
).join(', ');
console.log(` ❌ PR author not assigned to any linked issue - closing PR`);
const message = `## PR Closed - Requirements Not Met
This PR has been automatically closed because it doesn't meet the requirements.
**PR Author:** @${prAuthor}
**Found issues:** ${issueList}
**Problem:** The PR author must be assigned to the linked issue.
**To fix:**
1. Assign yourself (@${prAuthor}) to one of the linked issues
2. Re-open this PR`;
await github.rest.issues.createComment({
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: prNumber,
body: message,
});
await github.rest.pulls.update({
owner: context.repo.owner,
repo: context.repo.repo,
pull_number: prNumber,
state: 'closed',
});
} else {
console.log(` ✅ PR requirements met! Issue #${issueWithAuthorAssigned} has ${prAuthor} as assignee.`);
}
}
console.log('\nBackfill complete!');
@@ -1,54 +0,0 @@
# Closes PRs that still have the `pr-requirements-warning` label
# after contributors were warned in pr-requirements.yml.
name: PR Requirements Enforcement
on:
schedule:
- cron: "0 0 * * *" # runs every day once at midnight
jobs:
enforce:
name: Close PRs still failing contribution requirements
runs-on: ubuntu-latest
permissions:
pull-requests: write
issues: write
steps:
- name: Close PRs still failing requirements
uses: actions/github-script@v7
with:
script: |
const { owner, repo } = context.repo;
const prs = await github.paginate(github.rest.pulls.list, {
owner,
repo,
state: "open",
per_page: 100
});
for (const pr of prs) {
// Skip draft PRs — author may still be actively working toward compliance
if (pr.draft) continue;
const labels = pr.labels.map(l => l.name);
if (!labels.includes("pr-requirements-warning")) continue;
const gracePeriod = 24 * 60 * 60 * 1000;
const lastUpdated = new Date(pr.created_at);
const now = new Date();
if (now - lastUpdated < gracePeriod) {
console.log(`Skipping PR #${pr.number} — still within grace period`);
continue;
}
const prNumber = pr.number;
const prAuthor = pr.user.login;
await github.rest.issues.createComment({
owner,
repo,
issue_number: prNumber,
body: `Closing PR because the contribution requirements were not resolved within the 24-hour grace period.
If this was closed in error, feel free to reopen the PR after fixing the requirements.`
});
await github.rest.pulls.update({
owner,
repo,
pull_number: prNumber,
state: "closed"
});
console.log(`Closed PR #${prNumber} by ${prAuthor} (PR requirements were not met)`);
}
-203
View File
@@ -1,203 +0,0 @@
name: PR Requirements Check
on:
pull_request_target:
types: [opened, reopened, edited, synchronize]
jobs:
check-requirements:
runs-on: ubuntu-latest
permissions:
pull-requests: write
issues: write
steps:
- name: Check PR has linked issue with assignee
uses: actions/github-script@v7
with:
script: |
const pr = context.payload.pull_request;
const prNumber = pr.number;
const prBody = pr.body || '';
const prTitle = pr.title || '';
const prLabels = (pr.labels || []).map(l => l.name);
// Allow micro-fix and documentation PRs without a linked issue
const isMicroFix = prLabels.includes('micro-fix') || /micro-fix/i.test(prTitle);
const isDocumentation = prLabels.includes('documentation') || /\bdocs?\b/i.test(prTitle);
if (isMicroFix || isDocumentation) {
const reason = isMicroFix ? 'micro-fix' : 'documentation';
console.log(`PR #${prNumber} is a ${reason}, skipping issue requirement.`);
return;
}
// Extract issue numbers from body and title
// Matches: fixes #123, closes #123, resolves #123, or plain #123
const issuePattern = /(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)?\s*#(\d+)/gi;
const allText = `${prTitle} ${prBody}`;
const matches = [...allText.matchAll(issuePattern)];
const issueNumbers = [...new Set(matches.map(m => parseInt(m[1], 10)))];
console.log(`PR #${prNumber}:`);
console.log(` Found issue references: ${issueNumbers.length > 0 ? issueNumbers.join(', ') : 'none'}`);
if (issueNumbers.length === 0) {
const message = `## PR Requirements Warning
This PR does not meet the contribution requirements.
If the issue is not fixed within ~24 hours, it may be automatically closed.
**Missing:** No linked issue found.
**To fix:**
1. Create or find an existing issue for this work
2. Assign yourself to the issue
3. Re-open this PR and add \`Fixes #123\` in the description
**Exception:** To bypass this requirement, you can:
- Add the \`micro-fix\` label or include \`micro-fix\` in your PR title for trivial fixes
- Add the \`documentation\` label or include \`doc\`/\`docs\` in your PR title for documentation changes
**Micro-fix requirements** (must meet ALL):
| Qualifies | Disqualifies |
|-----------|--------------|
| < 20 lines changed | Any functional bug fix |
| Typos & Documentation & Linting | Refactoring for "clean code" |
| No logic/API/DB changes | New features (even tiny ones) |
**Why is this required?** See #472 for details.`;
const comments = await github.paginate(github.rest.issues.listComments, {
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: prNumber,
per_page: 100,
});
const botComment = comments.find(
(c) => c.user.type === 'Bot' && c.body.includes('PR Requirements Warning')
);
if (!botComment) {
await github.rest.issues.createComment({
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: prNumber,
body: message,
});
}
await github.rest.issues.addLabels({
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: prNumber,
labels: ['pr-requirements-warning'],
});
core.setFailed('PR must reference an issue');
return;
}
// Check if any linked issue has the PR author as assignee
const prAuthor = pr.user.login;
let issueWithAuthorAssigned = null;
let issuesWithoutAuthor = [];
for (const issueNum of issueNumbers) {
try {
const { data: issue } = await github.rest.issues.get({
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: issueNum,
});
const assigneeLogins = (issue.assignees || []).map(a => a.login);
if (assigneeLogins.includes(prAuthor)) {
issueWithAuthorAssigned = issueNum;
console.log(` Issue #${issueNum} has PR author ${prAuthor} as assignee`);
break;
} else {
issuesWithoutAuthor.push({
number: issueNum,
assignees: assigneeLogins
});
console.log(` Issue #${issueNum} assignees: ${assigneeLogins.length > 0 ? assigneeLogins.join(', ') : 'none'} (PR author: ${prAuthor})`);
}
} catch (error) {
console.log(` Issue #${issueNum} not found or inaccessible`);
}
}
if (!issueWithAuthorAssigned) {
const issueList = issuesWithoutAuthor.map(i =>
`#${i.number} (assignees: ${i.assignees.length > 0 ? i.assignees.join(', ') : 'none'})`
).join(', ');
const message = `## PR Requirements Warning
This PR does not meet the contribution requirements.
If the issue is not fixed within ~24 hours, it may be automatically closed.
**PR Author:** @${prAuthor}
**Found issues:** ${issueList}
**Problem:** The PR author must be assigned to the linked issue.
**To fix:**
1. Assign yourself (@${prAuthor}) to one of the linked issues
2. Re-open this PR
**Exception:** To bypass this requirement, you can:
- Add the \`micro-fix\` label or include \`micro-fix\` in your PR title for trivial fixes
- Add the \`documentation\` label or include \`doc\`/\`docs\` in your PR title for documentation changes
**Micro-fix requirements** (must meet ALL):
| Qualifies | Disqualifies |
|-----------|--------------|
| < 20 lines changed | Any functional bug fix |
| Typos & Documentation & Linting | Refactoring for "clean code" |
| No logic/API/DB changes | New features (even tiny ones) |
**Why is this required?** See #472 for details.`;
const comments = await github.paginate(github.rest.issues.listComments, {
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: prNumber,
per_page: 100,
});
const botComment = comments.find(
(c) => c.user.type === 'Bot' && c.body.includes('PR Requirements Warning')
);
if (!botComment) {
await github.rest.issues.createComment({
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: prNumber,
body: message,
});
}
await github.rest.issues.addLabels({
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: prNumber,
labels: ['pr-requirements-warning'],
});
core.setFailed('PR author must be assigned to the linked issue');
} else {
console.log(`PR requirements met! Issue #${issueWithAuthorAssigned} has ${prAuthor} as assignee.`);
try {
await github.rest.issues.removeLabel({
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: prNumber,
name: "pr-requirements-warning"
});
}catch (error){
//ignore if label doesn't exist
}
}
+57 -12
View File
@@ -7,6 +7,7 @@ on:
permissions:
contents: write
packages: write
jobs:
release:
@@ -17,23 +18,20 @@ jobs:
with:
fetch-depth: 0
- name: Setup Python
uses: actions/setup-python@v5
- name: Setup Node.js
uses: actions/setup-node@v4
with:
python-version: '3.11'
- name: Install uv
uses: astral-sh/setup-uv@v4
node-version: '20'
cache: 'npm'
- name: Install dependencies
run: |
cd core
uv sync
run: npm ci
- name: Build packages
run: npm run build
- name: Run tests
run: |
cd core
uv run pytest tests/ -v
run: npm run test
- name: Generate changelog
id: changelog
@@ -48,3 +46,50 @@ jobs:
generate_release_notes: true
draft: false
prerelease: ${{ contains(github.ref, '-') }}
docker-publish:
name: Publish Docker Images
runs-on: ubuntu-latest
needs: release
steps:
- uses: actions/checkout@v4
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Login to GitHub Container Registry
uses: docker/login-action@v3
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Extract metadata
id: meta
uses: docker/metadata-action@v5
with:
images: |
ghcr.io/${{ github.repository }}/frontend
ghcr.io/${{ github.repository }}/backend
tags: |
type=semver,pattern={{version}}
type=semver,pattern={{major}}.{{minor}}
type=semver,pattern={{major}}
- name: Build and push frontend
uses: docker/build-push-action@v5
with:
context: ./honeycomb
push: true
tags: ghcr.io/${{ github.repository }}/frontend:${{ github.ref_name }}
cache-from: type=gha
cache-to: type=gha,mode=max
- name: Build and push backend
uses: docker/build-push-action@v5
with:
context: ./hive
push: true
tags: ghcr.io/${{ github.repository }}/backend:${{ github.ref_name }}
cache-from: type=gha
cache-to: type=gha,mode=max
-42
View File
@@ -1,42 +0,0 @@
name: Weekly bounty leaderboard
description: Posts the integration bounty leaderboard to Discord every Monday
on:
schedule:
# Every Monday at 9:00 UTC
- cron: "0 9 * * 1"
workflow_dispatch:
inputs:
since_date:
description: "Only count PRs merged after this date (YYYY-MM-DD). Leave empty for all-time."
required: false
jobs:
leaderboard:
runs-on: ubuntu-latest
timeout-minutes: 5
permissions:
contents: read
pull-requests: read
steps:
- name: Checkout repository
uses: actions/checkout@v4
- name: Setup Bun
uses: oven-sh/setup-bun@v2
with:
bun-version: latest
- name: Post leaderboard to Discord
run: bun run scripts/bounty-tracker.ts leaderboard
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
GITHUB_REPOSITORY_OWNER: ${{ github.repository_owner }}
GITHUB_REPOSITORY_NAME: ${{ github.event.repository.name }}
DISCORD_WEBHOOK_URL: ${{ secrets.DISCORD_BOUNTY_WEBHOOK_URL }}
BOT_API_URL: ${{ secrets.BOT_API_URL }}
BOT_API_KEY: ${{ secrets.BOT_API_KEY }}
LURKR_API_KEY: ${{ secrets.LURKR_API_KEY }}
LURKR_GUILD_ID: ${{ secrets.LURKR_GUILD_ID }}
SINCE_DATE: ${{ github.event.inputs.since_date || '' }}
+4 -19
View File
@@ -9,14 +9,12 @@ workdir/
.next/
out/
# Environment files
# Environment files (generated from config.yaml)
.env
.env.local
.env.*.local
.venv
/venv
tools/src/uv.lock
honeycomb/.env
hive/.env
# User configuration (copied from .example)
config.yaml
@@ -50,7 +48,6 @@ coverage/
# TypeScript
*.tsbuildinfo
vite.config.d.ts
# Python
__pycache__/
@@ -60,22 +57,10 @@ __pycache__/
.eggs/
*.egg
# Generated runtime data
core/data/
# Misc
*.local
.cache/
tmp/
temp/
exports/*
.claude/settings.local.json
docs/github-issues/*
core/tests/*dumps/*
screenshots/*
.gemini/*
exports/*
+7 -1
View File
@@ -1,3 +1,9 @@
{
"mcpServers": {}
"mcpServers": {
"agent-builder": {
"command": "python",
"args": ["-m", "framework.mcp.agent_builder_server"],
"cwd": "/home/timothy/oss/hive/core"
}
}
}
-18
View File
@@ -1,18 +0,0 @@
repos:
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.15.0
hooks:
- id: ruff
name: ruff lint (core)
args: [--fix]
files: ^core/
- id: ruff
name: ruff lint (tools)
args: [--fix]
files: ^tools/
- id: ruff-format
name: ruff format (core)
files: ^core/
- id: ruff-format
name: ruff format (tools)
files: ^tools/
-1
View File
@@ -1 +0,0 @@
3.11
-30
View File
@@ -1,30 +0,0 @@
# Repository Guidelines
Shared agent instructions for this workspace.
## Coding Agent Notes
-
- When working on a GitHub Issue or PR, print the full URL at the end of the task.
- When answering questions, respond with high-confidence answers only: verify in code; do not guess.
- Do not update dependencies casually. Version bumps, patched dependencies, overrides, or vendored dependency changes require explicit approval.
- Add brief comments for tricky logic. Keep files reasonably small when practical; split or refactor large files instead of growing them indefinitely.
- If shared guardrails are available locally, review them; otherwise follow this repo's guidance.
- Use `uv` for Python execution and package management. Do not use `python` or `python3` directly unless the user explicitly asks for it.
- Prefer `uv run` for scripts and tests, and `uv pip` for package operations.
## Multi-Agent Safety
- Do not create, apply, or drop `git stash` entries unless explicitly requested.
- Do not create, remove, or modify `git worktree` checkouts unless explicitly requested.
- Do not switch branches or check out a different branch unless explicitly requested.
- When the user says `push`, you may `git pull --rebase` to integrate latest changes, but never discard other in-progress work.
- When the user says `commit`, commit only your changes. When the user says `commit all`, commit everything in grouped chunks.
- When you see unrecognized files or unrelated changes, keep going and focus on your scoped changes.
## Change Hygiene
- If staged and unstaged diffs are formatting-only, resolve them without asking.
- If a commit or push was already requested, include formatting-only follow-up changes in that same commit when practical.
- Only stop to ask for confirmation when changes are semantic and may alter behavior.
+28 -318
View File
@@ -1,330 +1,40 @@
# Release Notes
# Changelog
## v0.7.1
All notable changes to this project will be documented in this file.
**Release Date:** March 13, 2026
**Tag:** v0.7.1
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
### Chrome-Native Browser Control
## [Unreleased]
v0.7.1 replaces Playwright with direct Chrome DevTools Protocol (CDP) integration. The GCU now launches the user's system Chrome via `open -n` on macOS, connects over CDP, and manages browser lifecycle end-to-end -- no extra browser binary required.
### Added
- Initial project structure
- React frontend (honeycomb) with Vite and TypeScript
- Node.js backend (hive) with Express and TypeScript
- Docker Compose configuration for local development
- Configuration system via `config.yaml`
- GitHub Actions CI/CD workflows
- Comprehensive documentation
---
### Changed
- N/A
### Highlights
### Deprecated
- N/A
#### System Chrome via CDP
### Removed
- N/A
The entire GCU browser stack has been rewritten:
### Fixed
- N/A
- **Chrome finder & launcher** -- New `chrome_finder.py` discovers installed Chrome and `chrome_launcher.py` manages process lifecycle with `--remote-debugging-port`
- **Coexist with user's browser** -- `open -n` on macOS launches a separate Chrome instance so the user's tabs stay untouched
- **Dynamic viewport sizing** -- Viewport auto-sizes to the available display area, suppressing Chrome warning bars
- **Orphan cleanup** -- Chrome processes are killed on GCU server shutdown to prevent leaks
- **`--no-startup-window`** -- Chrome launches headlessly by default until a page is needed
### Security
- N/A
#### Per-Subagent Browser Isolation
## [0.1.0] - 2025-01-13
Each GCU subagent gets its own Chrome user-data directory, preventing cookie/session cross-contamination:
### Added
- Initial release
- Unique browser profiles injected per subagent
- Profiles cleaned up after top-level GCU node execution
- Tab origin and age metadata tracked per subagent
#### Dummy Agent Testing Framework
A comprehensive test suite for validating agent graph patterns without LLM calls:
- 8 test modules covering echo, pipeline, branch, parallel merge, retry, feedback loop, worker, and GCU subagent patterns
- Shared fixtures and a `run_all.py` runner for CI integration
- Subagent lifecycle tests
---
### What's New
#### GCU Browser
- **Switch from Playwright to system Chrome via CDP** -- Direct CDP connection replaces Playwright dependency. (@bryanadenhq)
- **Chrome finder and launcher modules** -- `chrome_finder.py` and `chrome_launcher.py` for cross-platform Chrome discovery and process management. (@bryanadenhq)
- **Dynamic viewport sizing** -- Auto-size viewport and suppress Chrome warning bar. (@bryanadenhq)
- **Per-subagent browser profile isolation** -- Unique user-data directories per subagent with cleanup. (@bryanadenhq)
- **Tab origin/age metadata** -- Track which subagent opened each tab and when. (@bryanadenhq)
- **`browser_close_all` tool** -- Bulk tab cleanup for agents managing many pages. (@bryanadenhq)
- **Auto-track popup pages** -- Popups are automatically captured and tracked. (@bryanadenhq)
- **Auto-snapshot from browser interactions** -- Browser interaction tools return screenshots automatically. (@bryanadenhq)
- **Kill orphaned Chrome processes** -- GCU server shutdown cleans up lingering Chrome instances. (@bryanadenhq)
- **`--no-startup-window` Chrome flag** -- Prevent empty window on launch. (@bryanadenhq)
- **Launch Chrome via `open -n` on macOS** -- Coexist with the user's running browser. (@bryanadenhq)
#### Framework & Runtime
- **Session resume fix for new agents** -- Correctly resume sessions when a new agent is loaded. (@bryanadenhq)
- **Queen upsert fix** -- Prevent duplicate queen entries on session restore. (@bryanadenhq)
- **Anchor worker monitoring to queen's session ID on cold-restore** -- Worker monitors reconnect to the correct queen after restart. (@bryanadenhq)
- **Update meta.json when loading workers** -- Worker metadata stays in sync with runtime state. (@RichardTang-Aden)
- **Generate worker MCP file correctly** -- Fix MCP config generation for spawned workers. (@RichardTang-Aden)
- **Share event bus so tool events are visible to parent** -- Tool execution events propagate up to parent graphs. (@bryanadenhq)
- **Subagent activity tracking in queen status** -- Queen instructions include live subagent status. (@bryanadenhq)
- **GCU system prompt updates** -- Auto-snapshots, batching, popup tracking, and close_all guidance. (@bryanadenhq)
#### Frontend
- **Loading spinner in draft panel** -- Shows spinner during planning phase instead of blank panel. (@bryanadenhq)
- **Fix credential modal errors** -- Modal no longer eats errors; banner stays visible. (@bryanadenhq)
- **Fix credentials_required loop** -- Stop clearing the flag on modal close to prevent infinite re-prompting. (@bryanadenhq)
- **Fix "Add tab" dropdown overflow** -- Dropdown no longer hidden when many agents are open. (@prasoonmhwr)
#### Testing
- **Dummy agent test framework** -- 8 test modules (echo, pipeline, branch, parallel merge, retry, feedback loop, worker, GCU subagent) with shared fixtures and CI runner. (@bryanadenhq)
- **Subagent lifecycle tests** -- Validate subagent spawn and completion flows. (@bryanadenhq)
#### Documentation & Infrastructure
- **MCP integration PRD** -- Product requirements for MCP server registry. (@TimothyZhang7)
- **Skills registry PRD** -- Product requirements for skill registry system. (@bryanadenhq)
- **Bounty program updates** -- Standard bounty issue template and updated contributor guide. (@bryanadenhq)
- **Windows quickstart** -- Add default context limit for PowerShell setup. (@bryanadenhq)
- **Remove deprecated files** -- Clean up `setup_mcp.py`, `verify_mcp.py`, `antigravity-setup.md`, and `setup-antigravity-mcp.sh`. (@bryanadenhq)
---
### Bug Fixes
- Fix credential modal eating errors and banner staying open
- Stop clearing `credentials_required` on modal close to prevent infinite loop
- Share event bus so tool events are visible to parent graph
- Use lazy %-formatting in subagent completion log to avoid f-string in logger
- Anchor worker monitoring to queen's session ID on cold-restore
- Update meta.json when loading workers
- Generate worker MCP file correctly
- Fix "Add tab" dropdown partially hidden when creating multiple agents
---
### Community Contributors
- **Prasoon Mahawar** (@prasoonmhwr) -- Fix UI overflow on agent tab dropdown
- **Richard Tang** (@RichardTang-Aden) -- Worker MCP generation and meta.json fixes
---
### Upgrading
```bash
git pull origin main
uv sync
```
The Playwright dependency is no longer required for GCU browser operations. Chrome must be installed on the host system.
---
## v0.7.0
**Release Date:** March 5, 2026
**Tag:** v0.7.0
Session management refactor release.
---
## v0.5.1
**Release Date:** February 18, 2026
**Tag:** v0.5.1
### The Hive Gets a Brain
v0.5.1 is our most ambitious release yet. Hive agents can now **build other agents** -- the new Hive Coder meta-agent writes, tests, and fixes agent packages from natural language. The runtime grows multi-graph support so one session can orchestrate multiple agents simultaneously. The TUI gets a complete overhaul with an in-app agent picker, live streaming, and seamless escalation to the Coder. And we're now provider-agnostic: Claude Code subscriptions, OpenAI-compatible endpoints, and any LiteLLM-supported model work out of the box.
---
### Highlights
#### Hive Coder -- The Agent That Builds Agents
A native meta-agent that lives inside the framework at `core/framework/agents/hive_coder/`. Give it a natural-language specification and it produces a complete agent package -- goal definition, node prompts, edge routing, MCP tool wiring, tests, and all boilerplate files.
```bash
# Launch the Coder directly
hive code
# Or escalate from any running agent (TUI)
Ctrl+E # or /coder in chat
```
The Coder ships with:
- **Reference documentation** -- anti-patterns, construction guide, and design patterns baked into its system prompt
- **Guardian watchdog** -- an event-driven monitor that catches agent failures and triggers automatic remediation
- **Coder Tools MCP server** -- file I/O, fuzzy-match editing, git snapshots, and sandboxed shell execution (`tools/coder_tools_server.py`)
- **Test generation** -- structural tests for forever-alive agents that don't hang on `runner.run()`
#### Multi-Graph Agent Runtime
`AgentRuntime` now supports loading, managing, and switching between multiple agent graphs within a single session. Six new lifecycle tools give agents (and the TUI) full control:
```python
# Load a second agent into the runtime
await runtime.add_graph("exports/deep_research_agent")
# Tools available to agents:
# load_agent, unload_agent, start_agent, restart_agent, list_agents, get_user_presence
```
The Hive Coder uses multi-graph internally -- when you escalate from a worker agent, the Coder loads as a separate graph while the worker stays alive in the background.
#### TUI Revamp
The Terminal UI gets a ground-up rebuild with five major additions:
- **Agent Picker** (Ctrl+A) -- tabbed modal screen for browsing Your Agents, Framework agents, and Examples with metadata badges (node count, tool count, session count, tags)
- **Runtime-optional startup** -- TUI launches without a pre-loaded agent, showing the picker on first open
- **Live streaming pane** -- dedicated RichLog widget shows LLM tokens as they arrive, replacing the old one-token-per-line display
- **PDF attachments** -- `/attach` and `/detach` commands with native OS file dialog (macOS, Linux, Windows)
- **Multi-graph commands** -- `/graphs`, `/graph <id>`, `/load <path>`, `/unload <id>` for managing agent graphs in-session
#### Provider-Agnostic LLM Support
Hive is no longer Anthropic-only. v0.5.1 adds first-class support for:
- **Claude Code subscriptions** -- `use_claude_code_subscription: true` in `~/.hive/configuration.json` reads OAuth tokens from `~/.claude/.credentials.json` with automatic refresh
- **OpenAI-compatible endpoints** -- `api_base` config routes traffic through any compatible API (Azure OpenAI, vLLM, Ollama, etc.)
- **Any LiteLLM model** -- `RuntimeConfig` now passes `api_key`, `api_base`, and `extra_kwargs` through to LiteLLM
The quickstart script auto-detects Claude Code subscriptions and ZAI Code installations.
---
### What's New
#### Architecture & Runtime
- **Hive Coder meta-agent** -- Natural-language agent builder with reference docs, guardian watchdog, and `hive code` CLI command. (@TimothyZhang7)
- **Multi-graph agent sessions** -- `add_graph`/`remove_graph` on AgentRuntime with 6 lifecycle tools (`load_agent`, `unload_agent`, `start_agent`, `restart_agent`, `list_agents`, `get_user_presence`). (@TimothyZhang7)
- **Claude Code subscription support** -- OAuth token refresh via `use_claude_code_subscription` config, auto-detection in quickstart, LiteLLM header patching. (@TimothyZhang7)
- **OpenAI-compatible endpoint support** -- `api_base` and `extra_kwargs` in `RuntimeConfig` for any OpenAI-compatible API. (@TimothyZhang7)
- **Remove deprecated node types** -- Delete `FlexibleGraphExecutor`, `WorkerNode`, `HybridJudge`, `CodeSandbox`, `Plan`, `FunctionNode`, `LLMNode`, `RouterNode`. Deprecated types (`llm_tool_use`, `llm_generate`, `function`, `router`, `human_input`) now raise `RuntimeError` with migration guidance. (@TimothyZhang7)
- **Interactive credential setup** -- Guided `CredentialSetupSession` with health checks and encrypted storage, accessible via `hive setup-credentials` or automatic prompting on credential errors. (@RichardTang-Aden)
- **Pre-start confirmation prompt** -- Interactive prompt before agent execution allowing credential updates or abort. (@RichardTang-Aden)
- **Event bus multi-graph support** -- `graph_id` on events, `filter_graph` on subscriptions, `ESCALATION_REQUESTED` event type, `exclude_own_graph` filter. (@TimothyZhang7)
#### TUI Improvements
- **In-app agent picker** (Ctrl+A) -- Tabbed modal for browsing agents with metadata badges (nodes, tools, sessions, tags). (@TimothyZhang7)
- **Runtime-optional TUI startup** -- Launches without a pre-loaded agent, shows agent picker on startup. (@TimothyZhang7)
- **Hive Coder escalation** (Ctrl+E) -- Escalate to Hive Coder and return; also available via `/coder` and `/back` chat commands. (@TimothyZhang7)
- **PDF attachment support** -- `/attach` and `/detach` commands with native OS file dialog. (@TimothyZhang7)
- **Streaming output pane** -- Dedicated RichLog widget for live LLM token streaming. (@TimothyZhang7)
- **Multi-graph TUI commands** -- `/graphs`, `/graph <id>`, `/load <path>`, `/unload <id>`. (@TimothyZhang7)
- **Agent Guardian watchdog** -- Event-driven monitor that catches secondary agent failures and triggers automatic remediation, with `--no-guardian` CLI flag. (@TimothyZhang7)
#### New Tool Integrations
| Tool | Description | Contributor |
| ---------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------ |
| **Discord** | 4 MCP tools (`discord_list_guilds`, `discord_list_channels`, `discord_send_message`, `discord_get_messages`) with rate-limit retry and channel filtering | @mishrapravin114 |
| **Exa Search API** | 4 AI-powered search tools (`exa_search`, `exa_find_similar`, `exa_get_contents`, `exa_answer`) with neural/keyword search, domain filters, and citation-backed answers | @JeetKaria06 |
| **Razorpay** | 6 payment processing tools for payments, invoices, payment links, and refunds with HTTP Basic Auth | @shivamshahi07 |
| **Google Docs** | Document creation, reading, and editing with OAuth credential support | @haliaeetusvocifer |
| **Gmail enhancements** | Expanded mail operations for inbox management | @bryanadenhq |
#### Infrastructure
- **Default node type → `event_loop`** -- `NodeSpec.node_type` defaults to `"event_loop"` instead of `"llm_tool_use"`. (@TimothyZhang7)
- **Default `max_node_visits` → 0 (unlimited)** -- Nodes default to unlimited visits, reducing friction for feedback loops and forever-alive agents. (@TimothyZhang7)
- **Remove `function` field from NodeSpec** -- Follows deprecation of `FunctionNode`. (@TimothyZhang7)
- **LiteLLM OAuth patch** -- Correct header construction for OAuth tokens (remove `x-api-key` when Bearer token is present). (@TimothyZhang7)
- **Orchestrator config centralization** -- Reads `api_key`, `api_base`, `extra_kwargs` from centralized `~/.hive/configuration.json`. (@TimothyZhang7)
- **System prompt datetime injection** -- All system prompts now include current date/time for time-aware agent behavior. (@TimothyZhang7)
- **Utils module exports** -- Proper `__init__.py` exports for the utils module. (@Siddharth2624)
- **Increased default max_tokens** -- Opus 4.6 defaults to 32768, Sonnet 4.5 to 16384 (up from 8192). (@TimothyZhang7)
---
### Bug Fixes
- Flush WIP accumulator outputs on cancel/failure so edge conditions see correct values on resume
- Stall detection state preserved across resume (no more resets on checkpoint restore)
- Skip client-facing blocking for event-triggered executions (timer/webhook)
- Executor retry override scoped to actual EventLoopNode instances only
- Add `_awaiting_input` flag to EventLoopNode to prevent input injection race conditions
- Fix TUI streaming display (tokens no longer appear one-per-line)
- Fix `_return_from_escalation` crash when ChatRepl widgets not yet mounted
- Fix tools registration problems for Google Docs credentials (@RichardTang-Aden)
- Fix email agent version conflicts (@RichardTang-Aden)
- Fix coder tool timeouts (120s for tests, 300s cap for commands)
### Documentation
- Clarify installation and prevent root pip install misuse (@paarths-collab)
---
### Agent Updates
- **Email Inbox Management** -- Consolidate `gmail_inbox_guardian` and `inbox_management` into a single unified agent with updated prompts and config. (@RichardTang-Aden, @bryanadenhq)
- **Job Hunter** -- Updated node prompts, config, and agent metadata; added PDF resume selection. (@bryanadenhq)
- **Deep Research Agent** -- Revised node implementations with updated prompts and output handling.
- **Tech News Reporter** -- Revised node prompts for improved output quality.
- **Vulnerability Assessment** -- Expanded prompts with more detailed assessment instructions. (@bryanadenhq)
---
### Breaking Changes
- **Deprecated node types raise `RuntimeError`** -- `llm_tool_use`, `llm_generate`, `function`, `router`, `human_input` now fail instead of warning. Migrate to `event_loop`.
- **`NodeSpec.node_type` defaults to `"event_loop"`** (was `"llm_tool_use"`)
- **`NodeSpec.max_node_visits` defaults to `0` / unlimited** (was `1`)
- **`NodeSpec.function` field removed** -- `FunctionNode` is deleted; use event_loop nodes with tools instead.
---
### Community Contributors
A huge thank you to everyone who contributed to this release:
- **Richard Tang** (@RichardTang-Aden) -- Interactive credential setup, pre-start confirmation, email agent consolidation, tool registration fixes, lint and formatting
- **Pravin Mishra** (@mishrapravin114) -- Discord integration with 4 MCP tools
- **Jeet Karia** (@JeetKaria06) -- Exa Search API integration with 4 AI-powered search tools
- **Shivam Shahi** (@shivamshahi07) -- Razorpay payment processing integration
- **Siddharth Varshney** (@Siddharth2624) -- Utils module exports
- **@haliaeetusvocifer** -- Google Docs integration with OAuth support
- **Bryan** (@bryanadenhq) -- PDF selection, inbox agent fixes, Job Hunter and Vulnerability Assessment updates
- **@paarths-collab** -- Documentation improvements
---
### Upgrading
```bash
git pull origin main
uv sync
```
#### Migration Guide
If your agents use deprecated node types, update them:
```python
# Before (v0.5.0) -- these now raise RuntimeError
NodeSpec(node_type="llm_tool_use", ...)
NodeSpec(node_type="function", function=my_func, ...)
# After (v0.5.1) -- use event_loop for everything
NodeSpec(node_type="event_loop", ...) # or just omit node_type (it's the default now)
```
If your agents set `max_node_visits=1` explicitly, they'll still work. The only change is the _default_ -- new agents without an explicit value now get unlimited visits.
To try the new Hive Coder:
```bash
# Launch Coder directly
hive code
# Or from TUI -- press Ctrl+E to escalate
hive tui
```
[Unreleased]: https://github.com/adenhq/hive/compare/v0.1.0...HEAD
[0.1.0]: https://github.com/adenhq/hive/releases/tag/v0.1.0
-1
View File
@@ -1 +0,0 @@
AGENTS.md
+37 -1104
View File
File diff suppressed because it is too large Load Diff
+1198
View File
File diff suppressed because it is too large Load Diff
-56
View File
@@ -1,56 +0,0 @@
.PHONY: lint format check test test-tools test-live test-all install-hooks help frontend-install frontend-dev frontend-build
# ── Ensure uv is findable in Git Bash on Windows ──────────────────────────────
# uv installs to ~/.local/bin on Windows/Linux/macOS. Git Bash may not include
# this in PATH by default, so we prepend it here.
export PATH := $(HOME)/.local/bin:$(PATH)
# ── Targets ───────────────────────────────────────────────────────────────────
help: ## Show this help
@grep -E '^[a-zA-Z_-]+:.*?## .*$$' $(MAKEFILE_LIST) | \
awk 'BEGIN {FS = ":.*?## "}; {printf " \033[36m%-15s\033[0m %s\n", $$1, $$2}'
lint: ## Run ruff linter and formatter (with auto-fix)
cd core && uv run ruff check --fix .
cd tools && uv run ruff check --fix .
cd core && uv run ruff format .
cd tools && uv run ruff format .
format: ## Run ruff formatter
cd core && uv run ruff format .
cd tools && uv run ruff format .
check: ## Run all checks without modifying files (CI-safe)
cd core && uv run ruff check .
cd tools && uv run ruff check .
cd core && uv run ruff format --check .
cd tools && uv run ruff format --check .
test: ## Run all tests (core + tools, excludes live)
cd core && uv run python -m pytest tests/ -v
cd tools && uv run python -m pytest -v
test-tools: ## Run tool tests only (mocked, no credentials needed)
cd tools && uv run python -m pytest -v
test-live: ## Run live integration tests (requires real API credentials)
cd tools && uv run python -m pytest -m live -s -o "addopts=" --log-cli-level=INFO
test-all: ## Run everything including live tests
cd core && uv run python -m pytest tests/ -v
cd tools && uv run python -m pytest -v
cd tools && uv run python -m pytest -m live -s -o "addopts=" --log-cli-level=INFO
install-hooks: ## Install pre-commit hooks
uv pip install pre-commit
pre-commit install
frontend-install: ## Install frontend npm packages
cd core/frontend && npm install
frontend-dev: ## Start frontend dev server
cd core/frontend && npm run dev
frontend-build: ## Build frontend for production
cd core/frontend && npm run build
+236 -320
View File
@@ -1,362 +1,254 @@
<p align="center">
<img width="100%" alt="Hive Banner" src="https://github.com/user-attachments/assets/a027429b-5d3c-4d34-88e4-0feaeaabbab3" />
<img width="100%" alt="Hive Banner" src="https://storage.googleapis.com/aden-prod-assets/website/aden-title-card.png" />
</p>
<p align="center">
<a href="README.md">English</a> |
<a href="docs/i18n/zh-CN.md">简体中文</a> |
<a href="docs/i18n/es.md">Español</a> |
<a href="docs/i18n/hi.md">हिन्दी</a> |
<a href="docs/i18n/pt.md">Português</a> |
<a href="docs/i18n/ja.md">日本語</a> |
<a href="docs/i18n/ru.md">Русский</a> |
<a href="docs/i18n/ko.md">한국어</a>
</p>
[![Apache 2.0 License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://github.com/adenhq/hive/blob/main/LICENSE)
[![Y Combinator](https://img.shields.io/badge/Y%20Combinator-Aden-orange)](https://www.ycombinator.com/companies/aden)
[![Docker Pulls](https://img.shields.io/docker/pulls/adenhq/hive?logo=Docker&labelColor=%23528bff)](https://hub.docker.com/u/adenhq)
[![Discord](https://img.shields.io/discord/1172610340073242735?logo=discord&labelColor=%235462eb&logoColor=%23f5f5f5&color=%235462eb)](https://discord.com/invite/MXE49hrKDk)
[![Twitter Follow](https://img.shields.io/twitter/follow/teamaden?logo=X&color=%23f5f5f5)](https://x.com/aden_hq)
[![LinkedIn](https://custom-icon-badges.demolab.com/badge/LinkedIn-0A66C2?logo=linkedin-white&logoColor=fff)](https://www.linkedin.com/company/teamaden/)
<p align="center">
<a href="https://github.com/aden-hive/hive/blob/main/LICENSE"><img src="https://img.shields.io/badge/License-Apache%202.0-blue.svg" alt="Apache 2.0 License" /></a>
<a href="https://www.ycombinator.com/companies/aden"><img src="https://img.shields.io/badge/Y%20Combinator-Aden-orange" alt="Y Combinator" /></a>
<a href="https://discord.com/invite/MXE49hrKDk"><img src="https://img.shields.io/discord/1172610340073242735?logo=discord&labelColor=%235462eb&logoColor=%23f5f5f5&color=%235462eb" alt="Discord" /></a>
<a href="https://x.com/aden_hq"><img src="https://img.shields.io/twitter/follow/teamaden?logo=X&color=%23f5f5f5" alt="Twitter Follow" /></a>
<a href="https://www.linkedin.com/company/teamaden/"><img src="https://custom-icon-badges.demolab.com/badge/LinkedIn-0A66C2?logo=linkedin-white&logoColor=fff" alt="LinkedIn" /></a>
<img src="https://img.shields.io/badge/MCP-102_Tools-00ADD8?style=flat-square" alt="MCP" />
</p>
<p align="center">
<img src="https://img.shields.io/badge/Agent_Harness-Runtime_Layer-ff6600?style=flat-square" alt="Agent Harness" />
<img src="https://img.shields.io/badge/AI_Agents-Self--Improving-brightgreen?style=flat-square" alt="AI Agents" />
<img src="https://img.shields.io/badge/Multi--Agent-Systems-blue?style=flat-square" alt="Multi-Agent" />
<img src="https://img.shields.io/badge/Headless-Development-purple?style=flat-square" alt="Headless" />
<img src="https://img.shields.io/badge/Goal--Driven-Development-purple?style=flat-square" alt="Goal-Driven" />
<img src="https://img.shields.io/badge/Human--in--the--Loop-orange?style=flat-square" alt="HITL" />
<img src="https://img.shields.io/badge/Browser-Use-red?style=flat-square" alt="Browser Use" />
<img src="https://img.shields.io/badge/Production--Ready-red?style=flat-square" alt="Production" />
</p>
<p align="center">
<img src="https://img.shields.io/badge/OpenAI-supported-412991?style=flat-square&logo=openai" alt="OpenAI" />
<img src="https://img.shields.io/badge/Anthropic-supported-d4a574?style=flat-square" alt="Anthropic" />
<img src="https://img.shields.io/badge/Google_Gemini-supported-4285F4?style=flat-square&logo=google" alt="Gemini" />
<img src="https://img.shields.io/badge/MCP-19_Tools-00ADD8?style=flat-square" alt="MCP" />
</p>
<p align="center"><em>The agent harness for production workloads — state management, failure recovery, observability, and human oversight so your agents actually run.</em></p>
## Overview
Hive is a runtime harness for AI agents in production. You describe your goal in natural language; a coding agent (the queen) generates the agent graph and connection code to achieve it. During execution, the harness manages state isolation, checkpoint-based crash recovery, cost enforcement, and real-time observability. When agents fail, the framework captures failure data, evolves the graph through the coding agent, and redeploys automatically. Built-in human-in-the-loop nodes, browser control, credential management, and parallel execution give you production reliability without sacrificing adaptability.
Build reliable, self-improving AI agents without hardcoding workflows. Define your goal through conversation with a coding agent, and the framework generates a node graph with dynamically created connection code. When things break, the framework captures failure data, evolves the agent through the coding agent, and redeploys. Built-in human-in-the-loop nodes, credential management, and real-time monitoring give you control without sacrificing adaptability.
Visit [adenhq.com](https://adenhq.com) for complete documentation, examples, and guides.
Visit [HoneyComb](http://honeycomb.open-hive.com/) to see what jobs are being automated by AI. Its a stock market for jobs, driven by our communitys AI agent progress. You can long and short jobs (with no real money but compute token)based on how much you think a job is going to be replaced by AI.
https://github.com/user-attachments/assets/bf10edc3-06ba-48b6-98ba-d069b15fb69d
## Who Is Hive For?
Hive is the harness layer for teams moving AI agents from prototype to production. Models are getting better on their own — the bottleneck is the infrastructure around them: state management, failure recovery, cost control, and observability.
Hive is a good fit if you:
- Want AI agents that **execute real business processes**, not demos
- Need a **runtime that handles state, recovery, and parallel execution** at scale
- Need **self-healing and adaptive agents** that improve over time
- Require **human-in-the-loop control**, observability, and cost limits
- Plan to run agents in **production** where uptime, cost, and auditability matter
Hive may not be the best fit if youre only experimenting with simple agent chains or one-off scripts.
## When Should You Use Hive?
Use Hive when the bottleneck is no longer the model but the harness around it:
- Long-running agents that need **state persistence and crash recovery**
- Production workloads requiring **cost enforcement, observability, and audit trails**
- Agents that **self-heal** through failure capture and graph evolution
- Multi-agent coordination with **session isolation and shared buffers**
- A framework that **scales with model improvements** rather than fighting them
## Quick Links
- **[Documentation](https://docs.adenhq.com/)** - Complete guides and API reference
- **[Self-Hosting Guide](https://docs.adenhq.com/getting-started/quickstart)** - Deploy Hive on your infrastructure
- **[Changelog](https://github.com/aden-hive/hive/releases)** - Latest updates and releases
- **[Roadmap](docs/roadmap.md)** - Upcoming features and plans
- **[Report Issues](https://github.com/aden-hive/hive/issues)** - Bug reports and feature requests
- **[Contributing](CONTRIBUTING.md)** - How to contribute and submit PRs
- **[Changelog](https://github.com/adenhq/hive/releases)** - Latest updates and releases
<!-- - **[Roadmap](https://adenhq.com/roadmap)** - Upcoming features and plans -->
- **[Report Issues](https://github.com/adenhq/hive/issues)** - Bug reports and feature requests
## Quick Start
### Prerequisites
- Python 3.11+ for agent development
- An LLM provider that powers the agents
- **ripgrep (optional, recommended on Windows):** The `search_files` tool uses ripgrep for faster file search. If not installed, a Python fallback is used. On Windows: `winget install BurntSushi.ripgrep` or `scoop install ripgrep`
> **Windows Users:** Native Windows is supported via `quickstart.ps1` and `hive.ps1`. Run these in PowerShell 5.1+. WSL is also an option but not required.
- [Docker](https://docs.docker.com/get-docker/) (v20.10+)
- [Docker Compose](https://docs.docker.com/compose/install/) (v2.0+)
### Installation
> **Note**
> Hive uses a `uv` workspace layout and is not installed with `pip install`.
> Running `pip install -e .` from the repository root will create a placeholder package and Hive will not function correctly.
> Please use the quickstart script below to set up the environment.
```bash
# Clone the repository
git clone https://github.com/aden-hive/hive.git
git clone https://github.com/adenhq/hive.git
cd hive
# Run quickstart setup (macOS/Linux)
./quickstart.sh
# Copy and configure
cp config.yaml.example config.yaml
# Windows (PowerShell)
.\quickstart.ps1
# Run setup and start services
npm run setup
docker compose up
```
This sets up:
**Access the application:**
- **framework** - Core agent runtime and graph executor (in `core/.venv`)
- **aden_tools** - MCP tools for agent capabilities (in `tools/.venv`)
- **credential store** - Encrypted API key storage (`~/.hive/credentials`)
- **LLM provider** - Interactive default model configuration, including Hive LLM and OpenRouter
- All required Python dependencies with `uv`
- Finally, it will open the Hive interface in your browser
> **Tip:** To reopen the dashboard later, run `hive open` from the project directory.
### Build Your First Agent
Type the agent you want to build in the home input box. The queen is going to ask you questions and work out a solution with you.
<img width="2500" height="1214" alt="Image" src="https://github.com/user-attachments/assets/1ce19141-a78b-46f5-8d64-dbf987e048f4" />
### Use Template Agents
Click "Try a sample agent" and check the templates. You can run a template directly or choose to build your version on top of the existing template.
### Run Agents
Now you can run an agent by selecting the agent (either an existing agent or example agent). You can click the Run button on the top left, or talk to the queen agent and it can run the agent for you.
<img width="2549" height="1174" alt="Screenshot 2026-03-12 at 9 27 36PM" src="https://github.com/user-attachments/assets/7c7d30fa-9ceb-4c23-95af-b1caa405547d" />
- Dashboard: http://localhost:3000
- API: http://localhost:4000
- Health: http://localhost:4000/health
## Features
- **Browser-Use** - Control the browser on your computer to achieve hard tasks
- **Parallel Execution** - Execute the generated graph in parallel. This way you can have multiple agents completing the jobs for you
- **[Goal-Driven Generation](docs/key_concepts/goals_outcome.md)** - Define objectives in natural language; the coding agent generates the agent graph and connection code to achieve them
- **[Adaptiveness](docs/key_concepts/evolution.md)** - Framework captures failures, calibrates according to the objectives, and evolves the agent graph
- **[Dynamic Node Connections](docs/key_concepts/graph.md)** - No predefined edges; connection code is generated by any capable LLM based on your goals
- **SDK-Wrapped Nodes** - Every node gets a shared data buffer, local RLM memory, monitoring, tools, and LLM access out of the box
- **[Human-in-the-Loop](docs/key_concepts/graph.md#human-in-the-loop)** - Intervention nodes that pause execution for human input with configurable timeouts and escalation
- **Goal-Driven Development** - Define objectives in natural language; the coding agent generates the agent graph and connection code to achieve them
- **Self-Adapting Agents** - Framework captures failures, updates objectives and updates the agent graph
- **Dynamic Node Connections** - No predefined edges; connection code is generated by any capable LLM based on your goals
- **SDK-Wrapped Nodes** - Every node gets shared memory, local RLM memory, monitoring, tools, and LLM access out of the box
- **Human-in-the-Loop** - Intervention nodes that pause execution for human input with configurable timeouts and escalation
- **Real-time Observability** - WebSocket streaming for live monitoring of agent execution, decisions, and node-to-node communication
- **Cost & Budget Control** - Set spending limits, throttles, and automatic model degradation policies
- **Production-Ready** - Self-hostable, built for scale and reliability
## Integration
## Why Aden
<a href="https://github.com/aden-hive/hive/tree/main/tools/src/aden_tools/tools"><img width="100%" alt="Integration" src="https://github.com/user-attachments/assets/a1573f93-cf02-4bb8-b3d5-b305b05b1e51" /></a>
Hive is built to be model-agnostic and system-agnostic.
- **LLM flexibility** - Hive Framework supports Anthropic, OpenAI, OpenRouter, Hive LLM, and other hosted or local models through LiteLLM-compatible providers.
- **Business system connectivity** - Hive Framework is designed to connect to all kinds of business systems as tools, such as CRM, support, messaging, data, file, and internal APIs via MCP.
## Why Hive
As models improve, the upper bound of what agents can do rises — but their reliability and production value are determined by the harness. Hive focuses on generating agents that run real business processes rather than generic agents. Instead of requiring you to manually design workflows, define agent interactions, and handle failures reactively, Hive flips the paradigm: **you describe outcomes, and the system builds itself**—delivering an outcome-driven, adaptive experience with an easy-to-use set of tools and integrations.
Traditional agent frameworks require you to manually design workflows, define agent interactions, and handle failures reactively. Aden flips this paradigm—**you describe outcomes, and the system builds itself**.
```mermaid
flowchart LR
GOAL["Define Goal"] --> GEN["Auto-Generate Graph"]
GEN --> EXEC["Execute Agents"]
EXEC --> MON["Monitor & Observe"]
MON --> CHECK{{"Pass?"}}
CHECK -- "Yes" --> DONE["Deliver Result"]
CHECK -- "No" --> EVOLVE["Evolve Graph"]
EVOLVE --> EXEC
GOAL -.- V1["Natural Language"]
GEN -.- V2["Instant Architecture"]
EXEC -.- V3["Easy Integrations"]
MON -.- V4["Full visibility"]
EVOLVE -.- V5["Adaptability"]
DONE -.- V6["Reliable outcomes"]
style GOAL fill:#ffbe42,stroke:#cc5d00,stroke-width:2px,color:#333
style GEN fill:#ffb100,stroke:#cc5d00,stroke-width:2px,color:#333
style EXEC fill:#ff9800,stroke:#cc5d00,stroke-width:2px,color:#fff
style MON fill:#ff9800,stroke:#cc5d00,stroke-width:2px,color:#fff
style CHECK fill:#fff59d,stroke:#ed8c00,stroke-width:2px,color:#333
style DONE fill:#4caf50,stroke:#2e7d32,stroke-width:2px,color:#fff
style EVOLVE fill:#e8763d,stroke:#cc5d00,stroke-width:2px,color:#fff
style V1 fill:#fff,stroke:#ed8c00,stroke-width:1px,color:#cc5d00
style V2 fill:#fff,stroke:#ed8c00,stroke-width:1px,color:#cc5d00
style V3 fill:#fff,stroke:#ed8c00,stroke-width:1px,color:#cc5d00
style V4 fill:#fff,stroke:#ed8c00,stroke-width:1px,color:#cc5d00
style V5 fill:#fff,stroke:#ed8c00,stroke-width:1px,color:#cc5d00
style V6 fill:#fff,stroke:#ed8c00,stroke-width:1px,color:#cc5d00
```
### The Hive Advantage
| Typical Agent Frameworks | Hive |
| -------------------------- | -------------------------------------- |
| Focus on model orchestration | **Production harness**: state, recovery, observability |
| Hardcode agent workflows | Describe goals in natural language |
| Manual graph definition | Auto-generated agent graphs |
| Reactive error handling | Outcome-evaluation and adaptiveness |
| Static tool configurations | Dynamic SDK-wrapped nodes |
| Separate monitoring setup | Built-in real-time observability |
| DIY budget management | Integrated cost controls & degradation |
### How It Works
1. **[Define Your Goal](docs/key_concepts/goals_outcome.md)** → Describe what you want to achieve in plain English
2. **Coding Agent Generates** → Creates the [agent graph](docs/key_concepts/graph.md), connection code, and test cases
3. **[Workers Execute](docs/key_concepts/worker_agent.md)** → SDK-wrapped nodes run with full observability and tool access
4. **Control Plane Monitors** → Real-time metrics, budget enforcement, policy management
5. **[Adaptiveness](docs/key_concepts/evolution.md)** → On failure, the system evolves the graph and redeploys automatically
## Documentation
- **[Developer Guide](docs/developer-guide.md)** - Comprehensive guide for developers
- [Getting Started](docs/getting-started.md) - Quick setup instructions
- [Configuration Guide](docs/configuration.md) - All configuration options
- [Architecture Overview](docs/architecture/README.md) - System design and structure
## Roadmap
Aden Hive Agent Framework aims to help developers build outcome-oriented, self-adaptive agents. See [roadmap.md](docs/roadmap.md) for details.
```mermaid
flowchart TB
%% Main Entity
User([User])
%% =========================================
%% EXTERNAL EVENT SOURCES
%% =========================================
subgraph ExtEventSource [External Event Source]
E_Sch["Schedulers"]
E_WH["Webhook"]
E_SSE["SSE"]
subgraph BUILD["🏗️ BUILD"]
GOAL["Define Goal<br/>+ Success Criteria"] --> NODES["Add Nodes<br/>LLM/Router/Function"]
NODES --> EDGES["Connect Edges<br/>on_success/failure/conditional"]
EDGES --> TEST["Test & Validate"] --> APPROVE["Approve & Export"]
end
%% =========================================
%% SYSTEM NODES
%% =========================================
subgraph WorkerBees [Worker Bees]
WB_C["Conversation"]
WB_SP["System prompt"]
subgraph EXPORT["📦 EXPORT"]
direction TB
JSON["agent.json<br/>(GraphSpec)"]
TOOLS["tools.py<br/>(Functions)"]
MCP["mcp_servers.json<br/>(Integrations)"]
end
subgraph Graph [Graph]
direction TB
N1["Node"] --> N2["Node"] --> N3["Node"]
N1 -.-> AN["Active Node"]
N2 -.-> AN
N3 -.-> AN
subgraph RUN["🚀 RUNTIME"]
LOAD["AgentRunner<br/>Load + Parse"] --> SETUP["Setup Runtime<br/>+ ToolRegistry"]
SETUP --> EXEC["GraphExecutor<br/>Execute Nodes"]
%% Nested Event Loop Node
subgraph EventLoopNode [Event Loop Node]
ELN_L["listener"]
ELN_SP["System Prompt<br/>(Task)"]
ELN_EL["Event loop"]
ELN_C["Conversation"]
end
subgraph DECISION["Decision Recording"]
DEC1["runtime.decide()<br/>intent → options → choice"]
DEC2["runtime.record_outcome()<br/>success, result, metrics"]
end
end
subgraph JudgeNode [Judge]
J_C["Criteria"]
J_P["Principles"]
J_EL["Event loop"] <--> J_S["Scheduler"]
subgraph INFRA["⚙️ INFRASTRUCTURE"]
CTX["NodeContext<br/>memory • llm • tools"]
STORE[("FileStorage<br/>Runs & Decisions")]
end
subgraph QueenBee [Queen Bee]
QB_SP["System prompt"]
QB_EL["Event loop"]
QB_C["Conversation"]
end
APPROVE --> EXPORT
EXPORT --> LOAD
EXEC --> DECISION
EXEC --> CTX
DECISION --> STORE
STORE -.->|"Analyze & Improve"| NODES
subgraph Infra [Infra]
SA["Sub Agent"]
TR["Tool Registry"]
WTM["Write through Conversation Memory<br/>(Logs/RAM/Harddrive)"]
SM["Shared Memory<br/>(State/Harddrive)"]
EB["Event Bus<br/>(RAM)"]
CS["Credential Store<br/>(Harddrive/Cloud)"]
end
subgraph PC [PC]
B["Browser"]
CB["Codebase<br/>v 0.0.x ... v n.n.n"]
end
%% =========================================
%% CONNECTIONS & DATA FLOW
%% =========================================
%% External Event Routing
E_Sch --> ELN_L
E_WH --> ELN_L
E_SSE --> ELN_L
ELN_L -->|"triggers"| ELN_EL
%% User Interactions
User -->|"Talk"| WB_C
User -->|"Talk"| QB_C
User -->|"Read/Write Access"| CS
%% Inter-System Logic
ELN_C <-->|"Mirror"| WB_C
WB_C -->|"Focus"| AN
WorkerBees -->|"Inquire"| JudgeNode
JudgeNode -->|"Approve"| WorkerBees
%% Judge Alignments
J_C <-.->|"aligns"| WB_SP
J_P <-.->|"aligns"| QB_SP
%% Escalate path
J_EL -->|"Report (Escalate)"| QB_EL
%% Pub/Sub Logic
AN -->|"publish"| EB
EB -->|"subscribe"| QB_C
%% Infra and Process Spawning
ELN_EL -->|"Spawn"| SA
SA -->|"Inform"| ELN_EL
SA -->|"Starts"| B
B -->|"Report"| ELN_EL
TR -->|"Assigned"| ELN_EL
CB -->|"Modify Worker Bee"| WB_C
%% =========================================
%% SHARED MEMORY & LOGS ACCESS
%% =========================================
%% Worker Bees Access (link to node inside Graph subgraph)
AN <-->|"Read/Write"| WTM
AN <-->|"Read/Write"| SM
%% Queen Bee Access
QB_C <-->|"Read/Write"| WTM
QB_EL <-->|"Read/Write"| SM
%% Credentials Access
CS -->|"Read Access"| QB_C
style BUILD fill:#ffbe42,stroke:#cc5d00,stroke-width:3px,color:#333
style EXPORT fill:#fff59d,stroke:#ed8c00,stroke-width:2px,color:#333
style RUN fill:#ffb100,stroke:#cc5d00,stroke-width:3px,color:#333
style DECISION fill:#ffcc80,stroke:#ed8c00,stroke-width:2px,color:#333
style INFRA fill:#e8763d,stroke:#cc5d00,stroke-width:3px,color:#fff
style STORE fill:#ed8c00,stroke:#cc5d00,stroke-width:2px,color:#fff
```
## Contributing
We welcome contributions from the community! Were especially looking for help building tools, integrations, and example agents for the framework ([check #2805](https://github.com/aden-hive/hive/issues/2805)). If youre interested in extending its functionality, this is the perfect place to start. Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
### The Aden Advantage
**Important:** Please get assigned to an issue before submitting a PR. Comment on an issue to claim it, and a maintainer will assign you. Issues with reproducible steps and proposals are prioritized. This helps prevent duplicate work.
| Traditional Frameworks | Aden |
|------------------------|------|
| Hardcode agent workflows | Describe goals in natural language |
| Manual graph definition | Auto-generated agent graphs |
| Reactive error handling | Proactive self-evolution |
| Static tool configurations | Dynamic SDK-wrapped nodes |
| Separate monitoring setup | Built-in real-time observability |
| DIY budget management | Integrated cost controls & degradation |
1. Find or create an issue and get assigned
2. Fork the repository
3. Create your feature branch (`git checkout -b feature/amazing-feature`)
4. Commit your changes (`git commit -m 'Add amazing feature'`)
5. Push to the branch (`git push origin feature/amazing-feature`)
6. Open a Pull Request
### How It Works
1. **Define Your Goal** → Describe what you want to achieve in plain English
2. **Coding Agent Generates** → Creates the agent graph, connection code, and test cases
3. **Workers Execute** → SDK-wrapped nodes run with full observability and tool access
4. **Control Plane Monitors** → Real-time metrics, budget enforcement, policy management
5. **Self-Improve** → On failure, the system evolves the graph and redeploys automatically
## How Aden Compares
Aden takes a fundamentally different approach to agent development. While most frameworks require you to hardcode workflows or manually define agent graphs, Aden uses a **coding agent to generate your entire agent system** from natural language goals. When agents fail, the framework doesn't just log errors—it **automatically evolves the agent graph** and redeploys.
### Comparison Table
| Framework | Category | Approach | Aden Difference |
|-----------|----------|----------|-----------------|
| **LangChain, LlamaIndex, Haystack** | Component Libraries | Predefined components for RAG/LLM apps; manual connection logic | Generates entire graph and connection code upfront |
| **CrewAI, AutoGen, Swarm** | Multi-Agent Orchestration | Role-based agents with predefined collaboration patterns | Dynamically creates agents/connections; adapts on failure |
| **PydanticAI, Mastra, Agno** | Type-Safe Frameworks | Structured outputs and validation for known workflows | Evolving workflows; structure emerges through iteration |
| **Agent Zero, Letta** | Personal AI Assistants | Memory and learning; OS-as-tool or stateful memory focus | Production multi-agent systems with self-healing |
| **CAMEL** | Research Framework | Emergent behavior in large-scale simulations (up to 1M agents) | Production-oriented with reliable execution and recovery |
| **TEN Framework, Genkit** | Infrastructure Frameworks | Real-time multimodal (TEN) or full-stack AI (Genkit) | Higher abstraction—generates and evolves agent logic |
| **GPT Engineer, Motia** | Code Generation | Code from specs (GPT Engineer) or "Step" primitive (Motia) | Self-adapting graphs with automatic failure recovery |
| **Trading Agents** | Domain-Specific | Hardcoded trading firm roles on LangGraph | Domain-agnostic; generates structures for any use case |
### When to Choose Aden
Choose Aden when you need:
- Agents that **self-improve from failures** without manual intervention
- **Goal-driven development** where you describe outcomes, not workflows
- **Production reliability** with automatic recovery and redeployment
- **Rapid iteration** on agent architectures without rewriting code
- **Full observability** with real-time monitoring and human oversight
Choose other frameworks when you need:
- **Type-safe, predictable workflows** (PydanticAI, Mastra)
- **RAG and document processing** (LlamaIndex, Haystack)
- **Research on agent emergence** (CAMEL)
- **Real-time voice/multimodal** (TEN Framework)
- **Simple component chaining** (LangChain, Swarm)
## Project Structure
```
hive/
├── honeycomb/ # Frontend Dashboard
├── hive/ # Backend API Server
├── aden-tools/ # MCP Tools Package - 19 tools for agent capabilities
├── docs/ # Documentation and guides
├── scripts/ # Build and utility scripts
├── config.yaml.example # Configuration template
├── docker-compose.yml # Container orchestration
├── DEVELOPER.md # Developer guide
├── CONTRIBUTING.md # Contribution guidelines
└── ROADMAP.md # Product roadmap
```
## Development
### Local Development with Hot Reload
```bash
# Copy development overrides
cp docker-compose.override.yml.example docker-compose.override.yml
# Start with hot reload enabled
docker compose up
```
### Running Without Docker
```bash
# Install dependencies
npm install
# Generate environment files
npm run generate:env
# Start frontend (in honeycomb/)
cd honeycomb && npm run dev
# Start backend (in hive/)
cd hive && npm run dev
```
## Documentation
- **[Developer Guide](DEVELOPER.md)** - Comprehensive guide for developers
- [Getting Started](docs/getting-started.md) - Quick setup instructions
- [Configuration Guide](docs/configuration.md) - All configuration options
- [Architecture Overview](docs/architecture.md) - System design and structure
## Roadmap
Aden Agent Framework aims to help developers build outcome oriented, self-adaptive agents. Please find our roadmap here
[ROADMAP.md](ROADMAP.md)
```mermaid
timeline
title Aden Agent Framework Roadmap
section Foundation
Architecture : Node-Based Architecture : Python SDK : LLM Integration (OpenAI, Anthropic, Google) : Communication Protocol
Coding Agent : Goal Creation Session : Worker Agent Creation : MCP Tools Integration
Worker Agent : Human-in-the-Loop : Callback Handlers : Intervention Points : Streaming Interface
Tools : File Use : Memory (STM/LTM) : Web Search : Web Scraper : Audit Trail
Core : Eval System : Pydantic Validation : Docker Deployment : Documentation : Sample Agents
section Expansion
Intelligence : Guardrails : Streaming Mode : Semantic Search
Platform : JavaScript SDK : Custom Tool Integrator : Credential Store
Deployment : Self-Hosted : Cloud Services : CI/CD Pipeline
Templates : Sales Agent : Marketing Agent : Analytics Agent : Training Agent : Smart Form Agent
```
## Community & Support
@@ -366,6 +258,16 @@ We use [Discord](https://discord.com/invite/MXE49hrKDk) for support, feature req
- Twitter/X - [@adenhq](https://x.com/aden_hq)
- LinkedIn - [Company Page](https://www.linkedin.com/company/teamaden/)
## Contributing
We welcome contributions! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
## Join Our Team
**We're hiring!** Join us in engineering, research, and go-to-market roles.
@@ -382,55 +284,69 @@ This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENS
## Frequently Asked Questions (FAQ)
**Q: What LLM providers does Hive support?**
**Q: Does Aden depend on LangChain or other agent frameworks?**
Hive supports 100+ LLM providers through LiteLLM integration, including OpenAI (GPT-4, GPT-4o), Anthropic (Claude models), Google Gemini, DeepSeek, Mistral, Groq, OpenRouter, and Hive LLM. Simply set the appropriate API key environment variable and specify the model name. See [docs/configuration.md](docs/configuration.md) for provider-specific configuration examples.
No. Aden is built from the ground up with no dependencies on LangChain, CrewAI, or other agent frameworks. The framework is designed to be lean and flexible, generating agent graphs dynamically rather than relying on predefined components.
**Q: Can I use Hive with local AI models like Ollama?**
**Q: What LLM providers does Aden support?**
Yes! Hive supports local models through LiteLLM. Simply use the model name format `ollama/model-name` (e.g., `ollama/llama3`, `ollama/mistral`) and ensure Ollama is running locally.
Aden supports OpenAI (GPT-4, GPT-4o), Anthropic (Claude models), and Google Gemini out of the box. The architecture is provider-agnostic through SDK abstraction, with LiteLLM integration on the roadmap for expanded model support.
**Q: What makes Hive different from other agent frameworks?**
**Q: Can I use Aden with local AI models like Ollama?**
Hive is an agent harness, not just an orchestration framework. It provides the production runtime layer — session isolation, checkpoint-based crash recovery, cost enforcement, real-time observability, and human-in-the-loop controls — that makes agents reliable enough to run real workloads. On top of that, Hive generates your entire agent system from natural language goals and automatically [evolves the graph](docs/key_concepts/evolution.md) when agents fail. The combination of a robust harness with self-improving generation is what sets Hive apart.
Local model support through LiteLLM integration is on our roadmap. The SDK's provider-agnostic design means adding local model support will be straightforward once implemented.
**Q: Is Hive open-source?**
**Q: What makes Aden different from other agent frameworks?**
Yes, Hive is fully open-source under the Apache License 2.0. We actively encourage community contributions and collaboration.
Aden generates your entire agent system from natural language goals using a coding agent—you don't hardcode workflows or manually define graphs. When agents fail, the framework automatically captures failure data, evolves the agent graph, and redeploys. This self-improving loop is unique to Aden.
**Q: Does Hive support human-in-the-loop workflows?**
**Q: Is Aden open-source?**
Yes, Hive fully supports [human-in-the-loop](docs/key_concepts/graph.md#human-in-the-loop) workflows through intervention nodes that pause execution for human input. These include configurable timeouts and escalation policies, allowing seamless collaboration between human experts and AI agents.
Yes, Aden is fully open-source under the Apache License 2.0. We actively encourage community contributions and collaboration.
**Q: What programming languages does Hive support?**
**Q: Does Aden collect data from users?**
The Hive framework is built in Python. A JavaScript/TypeScript SDK is on the roadmap.
Aden collects telemetry data for monitoring and observability purposes, including token usage, latency metrics, and cost tracking. Content capture (prompts and responses) is configurable and stored with team-scoped data isolation. All data stays within your infrastructure when self-hosted.
**Q: Can Hive agents interact with external tools and APIs?**
**Q: What deployment options does Aden support?**
Aden supports Docker Compose deployment out of the box, with both production and development configurations. Self-hosted deployments work on any infrastructure supporting Docker. Cloud deployment options and Kubernetes-ready configurations are on the roadmap.
**Q: Can Aden handle complex, production-scale use cases?**
Yes. Aden is explicitly designed for production environments with features like automatic failure recovery, real-time observability, cost controls, and horizontal scaling support. The framework handles both simple automations and complex multi-agent workflows.
**Q: Does Aden support human-in-the-loop workflows?**
Yes, Aden fully supports human-in-the-loop workflows through intervention nodes that pause execution for human input. These include configurable timeouts and escalation policies, allowing seamless collaboration between human experts and AI agents.
**Q: What monitoring and debugging tools does Aden provide?**
Aden includes comprehensive observability features: real-time WebSocket streaming for live agent execution monitoring, TimescaleDB-powered analytics for cost and performance metrics, health check endpoints for Kubernetes integration, and 19 MCP tools for budget management, agent status, and policy control.
**Q: What programming languages does Aden support?**
Aden provides SDKs for both Python and JavaScript/TypeScript. The Python SDK includes integration templates for LangGraph, LangFlow, and LiveKit. The backend is Node.js/TypeScript, and the frontend is React/TypeScript.
**Q: Can Aden agents interact with external tools and APIs?**
Yes. Aden's SDK-wrapped nodes provide built-in tool access, and the framework supports flexible tool ecosystems. Agents can integrate with external APIs, databases, and services through the node architecture.
**Q: How does cost control work in Hive?**
**Q: How does cost control work in Aden?**
Hive provides granular budget controls including spending limits, throttles, and automatic model degradation policies. You can set budgets at the team, agent, or workflow level, with real-time cost tracking and alerts.
Aden provides granular budget controls including spending limits, throttles, and automatic model degradation policies. You can set budgets at the team, agent, or workflow level, with real-time cost tracking and alerts.
**Q: Where can I find examples and documentation?**
Visit [docs.adenhq.com](https://docs.adenhq.com/) for complete guides, API reference, and getting started tutorials. The repository also includes documentation in the `docs/` folder and a comprehensive [developer guide](docs/developer-guide.md).
Visit [docs.adenhq.com](https://docs.adenhq.com/) for complete guides, API reference, and getting started tutorials. The repository also includes documentation in the `docs/` folder and a comprehensive [DEVELOPER.md](DEVELOPER.md) guide.
**Q: How can I contribute to Aden?**
Contributions are welcome! Fork the repository, create your feature branch, implement your changes, and submit a pull request. See [CONTRIBUTING.md](CONTRIBUTING.md) for detailed guidelines.
## Star History
**Q: Does Aden offer enterprise support?**
<a href="https://star-history.com/#aden-hive/hive&Date">
<picture>
<source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/svg?repos=aden-hive/hive&type=Date&theme=dark" />
<source media="(prefers-color-scheme: light)" srcset="https://api.star-history.com/svg?repos=aden-hive/hive&type=Date" />
<img alt="Star History Chart" src="https://api.star-history.com/svg?repos=aden-hive/hive&type=Date" />
</picture>
</a>
For enterprise inquiries, contact the Aden team through [adenhq.com](https://adenhq.com) or join our [Discord community](https://discord.com/invite/MXE49hrKDk) for support and discussions.
---
+150
View File
@@ -0,0 +1,150 @@
Product Roadmap
Aden Agent Framework aims to help developers build outcome oriented, self-adaptive agents. Please find our roadmap here
```mermaid
timeline
title Aden Agent Framework Roadmap
section Foundation
Architecture : Node-Based Architecture : Python SDK : LLM Integration (OpenAI, Anthropic, Google) : Communication Protocol
Coding Agent : Goal Creation Session : Worker Agent Creation : MCP Tools Integration
Worker Agent : Human-in-the-Loop : Callback Handlers : Intervention Points : Streaming Interface
Tools : File Use : Memory (STM/LTM) : Web Search : Web Scraper : Audit Trail
Core : Eval System : Pydantic Validation : Docker Deployment : Documentation : Sample Agents
section Expansion
Intelligence : Guardrails : Streaming Mode : Semantic Search
Platform : JavaScript SDK : Custom Tool Integrator : Credential Store
Deployment : Self-Hosted : Cloud Services : CI/CD Pipeline
Templates : Sales Agent : Marketing Agent : Analytics Agent : Training Agent : Smart Form Agent
```
---
## Phase 1: Foundation
### Backbone Architecture
- [ ] **Node-Based Architecture (Agent as a node)**
- [x] Object schema definition
- [x] Node wrapper SDK
- [ ] Shared memory access
- [ ] Default monitoring hooks
- [ ] Tool access layer
- [x] LLM integration layer (Natively supports all mainstream LLMs through LiteLLM)
- [x] Anthropic
- [x] OpenAI
- [x] Google
- [ ] **Communication protocol between nodes**
- [ ] **[Coding Agent] Goal Creation Session** (separate from coding session)
- [ ] Instruction back and forth
- [x] Goal Object schema definition
- [ ] Being able to generate the test cases
- [ ] Test case validation for worker agent (Outcome driven)
- [ ] **[Coding Agent] Worker Agent Creation**
- [x] Coding Agent tools
- [ ] Use Template Agent as a start
- [x] Use our MCP tools
- [ ] **[Worker Agent] Human-in-the-Loop**
- [x] Worker Agents request with questions and options
- [x] Callback Handler System to receive events throughout execution
- [ ] Tool-Based Intervention Points (tool to pause execution and request human input)
- [x] Multiple entrypoint for different event source (e.g. Human input, webhook)
- [ ] Streaming Interface for Real-time Monitoring
- [ ] Request State Management
### Essential Tools
- [x] **File Use Tool Kit**
- [ ] **Memory Tools**
- [x] STM Layer Tool (state-based short-term memory)
- [x] LTM Layer Tool (RLM - long-term memory)
- [ ] **Infrastructure Tools**
- [x] Runtime Log Tool (logs for coding agent)
- [ ] Audit Trail Tool (decision timeline generation)
- [ ] Web Search
- [ ] Web Scraper
- [ ] Recipe for "Add your own tools"
### Memory & File System
- [x] DB for long-term persistent memory (Filesystem as durable scratchpad pattern)
- [x] Session Local memory isolation
### Eval System (Basic)
- [x] Test Driven - Run test case for all agent iteration
- [ ] Failure recording mechanism
- [ ] SDK for defining failure conditions
- [ ] Basic observability hooks
- [ ] User-driven log analysis (OSS approach)
### Data Validation
- [ ] Natively Support data validation of LLMs output with Pydantic
### Developer Experience
- [ ] **Debugging mode**
- [ ] **Documentation**
- [ ] Quick start guide
- [ ] Goal creation guide
- [ ] Agent creation guide
- [ ] GitHub Page setup
- [ ] README with examples
- [ ] Contributing guidelines
- [ ] **Distribution**
- [ ] PyPI package
- [ ] Docker image on Docker Hub
### Sample Agents
- [ ] Knowledge Agent
- [ ] Blog Writer Agent
- [ ] SDR Agent
---
## Phase 2: Expansion
### Basic Guardrails
- [ ] Support Basic Monitoring from Agent node SDK
- [ ] SDK guardrail implementation (in node)
- [ ] Guardrail type support (Determined Condition as Guardrails)
### Agent Capability
- [ ] Streaming mode support
### Cross-Platform
- [ ] JavaScript / TypeScript Version SDK
### File System Enhancement
- [ ] Semantic Search integration
- [ ] Interactive File System in product (frontend integration)
### More Worker Tools
- [ ] Custom Tool Integrator
- [ ] Integration as a tool (Credential Store & Support)
- [ ] **Core Agent Tools**
- [ ] Node Discovery Tool (find other agents in the graph)
- [ ] HITL Tool (pause execution for human approval)
- [ ] Wake-up Tool (resume agent tasks)
### Deployment (Self-Hosted)
- [ ] Docker container standardization
- [ ] Headless backend execution
- [ ] Exposed API for frontend attachment
- [ ] Local monitoring & observability (from hive repo)
- [ ] Basic lifecycle APIs (Start, Stop, Pause, Resume)
### Deployment (Cloud)
- [ ] Cloud Service Options
- [ ] Support deployment to 3rd-party platforms
- [ ] Self-deploy + orchestrator connection
- [ ] **CI/CD Pipeline**
- [ ] Automated test execution
- [ ] Agent version control
- [ ] All tests must pass for deployment
### Developer Experience Enhancement
- [ ] Tool usage documentation
- [ ] Discord Support Channel
### More Agent Templates
- [ ] GTM Sales Agent (workflow)
- [ ] GTM Marketing Agent (workflow)
- [ ] Analytics Agent
- [ ] Training Agent
- [ ] Smart Entry / Form Agent (self-evolution emphasis)
+2 -2
View File
@@ -39,8 +39,8 @@ We consider security research conducted in accordance with this policy to be:
## Security Best Practices for Users
1. **Keep Updated**: Always run the latest version
2. **Secure Configuration**: Review your `~/.hive/configuration.json`, `.mcp.json`, and environment variable settings, especially in production
3. **Environment Variables**: Never commit `.env` files or any configuration files that contain secrets
2. **Secure Configuration**: Review `config.yaml` settings, especially in production
3. **Environment Variables**: Never commit `.env` files or `config.yaml` with secrets
4. **Network Security**: Use HTTPS in production, configure firewalls appropriately
5. **Database Security**: Use strong passwords, limit network access
+186
View File
@@ -0,0 +1,186 @@
# Building Tools for Aden
This guide explains how to create new tools for the Aden agent framework using FastMCP.
## Quick Start Checklist
1. Create folder under `src/aden_tools/tools/<tool_name>/`
2. Implement a `register_tools(mcp: FastMCP)` function using the `@mcp.tool()` decorator
3. Add a `README.md` documenting your tool
4. Register in `src/aden_tools/tools/__init__.py`
5. Add tests in `tests/tools/`
## Tool Structure
Each tool lives in its own folder:
```
src/aden_tools/tools/my_tool/
├── __init__.py # Export register_tools function
├── my_tool.py # Tool implementation
└── README.md # Documentation
```
## Implementation Pattern
Tools use FastMCP's native decorator pattern:
```python
from fastmcp import FastMCP
def register_tools(mcp: FastMCP) -> None:
"""Register my tools with the MCP server."""
@mcp.tool()
def my_tool(
query: str,
limit: int = 10,
) -> dict:
"""
Search for items matching a query.
Use this when you need to find specific information.
Args:
query: The search query (1-500 chars)
limit: Maximum number of results (1-100)
Returns:
Dict with search results or error dict
"""
# Validate inputs
if not query or len(query) > 500:
return {"error": "Query must be 1-500 characters"}
if limit < 1 or limit > 100:
limit = max(1, min(100, limit))
try:
# Your implementation here
results = do_search(query, limit)
return {
"query": query,
"results": results,
"total": len(results),
}
except Exception as e:
return {"error": f"Search failed: {str(e)}"}
```
## Exporting the Tool
In `src/aden_tools/tools/my_tool/__init__.py`:
```python
from .my_tool import register_tools
__all__ = ["register_tools"]
```
In `src/aden_tools/tools/__init__.py`, add to `_TOOL_MODULES`:
```python
_TOOL_MODULES = [
# ... existing tools
"my_tool",
]
```
## Environment Variables
For tools requiring API keys or configuration, check environment variables at runtime:
```python
import os
def register_tools(mcp: FastMCP) -> None:
@mcp.tool()
def my_api_tool(query: str) -> dict:
"""Tool that requires an API key."""
api_key = os.getenv("MY_API_KEY")
if not api_key:
return {
"error": "MY_API_KEY environment variable not set",
"help": "Get an API key at https://example.com/api",
}
# Use the API key...
```
## Best Practices
### Error Handling
Return error dicts instead of raising exceptions:
```python
@mcp.tool()
def my_tool(**kwargs) -> dict:
try:
result = do_work()
return {"success": True, "data": result}
except SpecificError as e:
return {"error": f"Failed to process: {str(e)}"}
except Exception as e:
return {"error": f"Unexpected error: {str(e)}"}
```
### Return Values
- Return dicts for structured data
- Include relevant metadata (query, total count, etc.)
- Use `{"error": "message"}` for errors
### Documentation
The docstring becomes the tool description in MCP. Include:
- What the tool does
- When to use it
- Args with types and constraints
- What it returns
Every tool folder needs a `README.md` with:
- Description and use cases
- Usage examples
- Argument table
- Environment variables (if any)
- Error handling notes
## Testing
Place tests in `tests/tools/test_{{tool_name}}.py`:
```python
import pytest
from fastmcp import FastMCP
from aden_tools.tools.{{tool_name}} import register_tools
@pytest.fixture
def mcp():
"""Create a FastMCP instance with tools registered."""
server = FastMCP("test")
register_tools(server)
return server
def test_my_tool_basic(mcp):
"""Test basic tool functionality."""
tool_fn = mcp._tool_manager._tools["my_tool"].fn
result = tool_fn(query="test")
assert "results" in result
def test_my_tool_validation(mcp):
"""Test input validation."""
tool_fn = mcp._tool_manager._tools["my_tool"].fn
result = tool_fn(query="")
assert "error" in result
```
Mock external APIs to keep tests fast and deterministic.
## Naming Conventions
- **Folder name**: `snake_case` with `_tool` suffix (e.g., `file_read_tool`)
- **Function name**: `snake_case` (e.g., `file_read`)
- **Tool description**: Clear, actionable docstring
+2 -11
View File
@@ -1,5 +1,5 @@
# Aden Tools MCP Server
# Exposes tools via Model Context Protocol
# Exposes aden-tools via Model Context Protocol
FROM python:3.11-slim
@@ -14,15 +14,6 @@ COPY mcp_server.py ./
# Install package with all dependencies
RUN pip install --no-cache-dir -e .
# Install Google Chrome (stable) — used by GCU browser tools via CDP
RUN apt-get update && apt-get install -y wget gnupg \
&& mkdir -p /etc/apt/keyrings \
&& wget -q -O /etc/apt/keyrings/google-chrome.asc https://dl.google.com/linux/linux_signing_key.pub \
&& echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/google-chrome.asc] http://dl.google.com/linux/chrome/deb/ stable main" \
> /etc/apt/sources.list.d/google-chrome.list \
&& apt-get update && apt-get install -y google-chrome-stable \
&& apt-get clean && rm -rf /var/lib/apt/lists/*
# Create non-root user for security
RUN useradd -m -u 1001 appuser
@@ -34,7 +25,7 @@ RUN mkdir -p /app/workdir/workspaces && \
USER appuser
# Declare volume for workspace persistence across container runs
VOLUME ["/app//workspaces"]
VOLUME ["/app/workdir/workspaces"]
# Expose MCP server port
EXPOSE 4001
+103
View File
@@ -0,0 +1,103 @@
# Aden Tools
Tool library for the Aden agent framework. Provides a collection of tools that AI agents can use to interact with external systems, process data, and perform actions via the Model Context Protocol (MCP).
## Installation
```bash
pip install -e aden-tools
```
For development:
```bash
pip install -e "aden-tools[dev]"
```
## Quick Start
### As an MCP Server
```python
from fastmcp import FastMCP
from aden_tools.tools import register_all_tools
mcp = FastMCP("aden-tools")
register_all_tools(mcp)
mcp.run()
```
Or run directly:
```bash
python mcp_server.py
```
## Available Tools
| Tool | Description |
|------|-------------|
| `example_tool` | Template tool demonstrating the pattern |
| `file_read` | Read contents of local files |
| `file_write` | Write content to local files |
| `web_search` | Search the web using Brave Search API |
| `web_scrape` | Scrape and extract content from webpages |
| `pdf_read` | Read and extract text from PDF files |
## Project Structure
```
aden-tools/
├── src/aden_tools/
│ ├── __init__.py # Main exports
│ ├── utils/ # Utility functions
│ └── tools/ # Tool implementations
│ ├── example_tool/
│ ├── file_read_tool/
│ ├── file_write_tool/
│ ├── web_search_tool/
│ ├── web_scrape_tool/
│ └── pdf_read_tool/
├── tests/ # Test suite
├── mcp_server.py # MCP server entry point
├── README.md
├── BUILDING_TOOLS.md # Tool development guide
└── pyproject.toml
```
## Creating Custom Tools
Tools use FastMCP's native decorator pattern:
```python
from fastmcp import FastMCP
def register_tools(mcp: FastMCP) -> None:
@mcp.tool()
def my_tool(query: str, limit: int = 10) -> dict:
"""
Search for items matching the query.
Args:
query: The search query
limit: Max results to return
Returns:
Dict with results or error
"""
try:
results = do_search(query, limit)
return {"results": results, "total": len(results)}
except Exception as e:
return {"error": str(e)}
```
See [BUILDING_TOOLS.md](BUILDING_TOOLS.md) for the full guide.
## Documentation
- [Building Tools Guide](BUILDING_TOOLS.md) - How to create new tools
- Individual tool READMEs in `src/aden_tools/tools/*/README.md`
## License
This project is licensed under the Apache License 2.0 - see the [LICENSE](../LICENSE) file for details.
+79
View File
@@ -0,0 +1,79 @@
#!/usr/bin/env python3
"""
Aden Tools MCP Server
Exposes all aden-tools via Model Context Protocol using FastMCP.
Usage:
# Run with HTTP transport (default, for Docker)
python mcp_server.py
# Run with custom port
python mcp_server.py --port 8001
# Run with STDIO transport (for local testing)
python mcp_server.py --stdio
Environment Variables:
MCP_PORT - Server port (default: 4001)
BRAVE_SEARCH_API_KEY - Required for web_search tool
"""
import argparse
import os
from fastmcp import FastMCP
from starlette.requests import Request
from starlette.responses import PlainTextResponse
mcp = FastMCP("aden-tools")
# Register all tools with the MCP server
from aden_tools.tools import register_all_tools
tools = register_all_tools(mcp)
print(f"[MCP] Registered {len(tools)} tools: {tools}")
@mcp.custom_route("/health", methods=["GET"])
async def health_check(request: Request) -> PlainTextResponse:
"""Health check endpoint for container orchestration."""
return PlainTextResponse("OK")
@mcp.custom_route("/", methods=["GET"])
async def index(request: Request) -> PlainTextResponse:
"""Landing page for browser visits."""
return PlainTextResponse("Welcome to the Hive MCP Server")
def main() -> None:
"""Entry point for the MCP server."""
parser = argparse.ArgumentParser(description="Aden Tools MCP Server")
parser.add_argument(
"--port",
type=int,
default=int(os.getenv("MCP_PORT", "4001")),
help="HTTP server port (default: 4001)",
)
parser.add_argument(
"--host",
default="0.0.0.0",
help="HTTP server host (default: 0.0.0.0)",
)
parser.add_argument(
"--stdio",
action="store_true",
help="Use STDIO transport instead of HTTP",
)
args = parser.parse_args()
if args.stdio:
print("[MCP] Starting with STDIO transport")
mcp.run(transport="stdio")
else:
print(f"[MCP] Starting HTTP server on {args.host}:{args.port}")
mcp.run(transport="http", host=args.host, port=args.port)
if __name__ == "__main__":
main()
+60
View File
@@ -0,0 +1,60 @@
[project]
name = "aden-tools"
version = "0.1.0"
description = "Tools library for the Aden agent framework"
readme = "README.md"
requires-python = ">=3.10"
license = { text = "Apache-2.0" }
authors = [
{ name = "Aden", email = "team@aden.ai" }
]
keywords = ["ai", "agents", "tools", "llm"]
classifiers = [
"Development Status :: 3 - Alpha",
"Intended Audience :: Developers",
"License :: OSI Approved :: Apache Software License",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"Programming Language :: Python :: 3.12",
]
dependencies = [
"pydantic>=2.0.0",
"httpx>=0.27.0",
"beautifulsoup4>=4.12.0",
"pypdf>=4.0.0",
"pandas>=2.0.0",
"jsonpath-ng>=1.6.0",
"fastmcp>=2.0.0",
"diff-match-patch>=20230430",
]
[project.optional-dependencies]
dev = [
"pytest>=7.0.0",
"pytest-asyncio>=0.21.0",
]
sandbox = [
"RestrictedPython>=7.0",
]
ocr = [
"pytesseract>=0.3.10",
"pillow>=10.0.0",
]
all = [
"RestrictedPython>=7.0",
"pytesseract>=0.3.10",
"pillow>=10.0.0",
]
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
[tool.hatch.build.targets.wheel]
packages = ["src/aden_tools"]
[tool.pytest.ini_options]
testpaths = ["tests"]
asyncio_mode = "auto"
+30
View File
@@ -0,0 +1,30 @@
"""
Aden Tools - Tool library for the Aden agent framework.
Tools provide capabilities that AI agents can use to interact with
external systems, process data, and perform actions.
Usage:
from fastmcp import FastMCP
from aden_tools.tools import register_all_tools
mcp = FastMCP("my-server")
register_all_tools(mcp)
"""
__version__ = "0.1.0"
# Utilities
from .utils import get_env_var
# MCP registration
from .tools import register_all_tools
__all__ = [
# Version
"__version__",
# Utilities
"get_env_var",
# MCP registration
"register_all_tools",
]
@@ -0,0 +1,79 @@
"""
Aden Tools - Tool implementations for FastMCP.
Usage:
from fastmcp import FastMCP
from aden_tools.tools import register_all_tools
mcp = FastMCP("my-server")
register_all_tools(mcp)
"""
from typing import List
from fastmcp import FastMCP
# Import register_tools from each tool module
from .example_tool import register_tools as register_example
from .file_read_tool import register_tools as register_file_read
from .file_write_tool import register_tools as register_file_write
from .web_search_tool import register_tools as register_web_search
from .web_scrape_tool import register_tools as register_web_scrape
from .pdf_read_tool import register_tools as register_pdf_read
# Import file system toolkits
from .file_system_toolkits.view_file import register_tools as register_view_file
from .file_system_toolkits.write_to_file import register_tools as register_write_to_file
from .file_system_toolkits.list_dir import register_tools as register_list_dir
from .file_system_toolkits.replace_file_content import register_tools as register_replace_file_content
from .file_system_toolkits.apply_diff import register_tools as register_apply_diff
from .file_system_toolkits.apply_patch import register_tools as register_apply_patch
from .file_system_toolkits.grep_search import register_tools as register_grep_search
from .file_system_toolkits.execute_command_tool import register_tools as register_execute_command
def register_all_tools(mcp: FastMCP) -> List[str]:
"""
Register all aden-tools with a FastMCP server.
Args:
mcp: FastMCP server instance
Returns:
List of registered tool names
"""
register_example(mcp)
register_file_read(mcp)
register_file_write(mcp)
register_web_search(mcp)
register_web_scrape(mcp)
register_pdf_read(mcp)
# Register file system toolkits
register_view_file(mcp)
register_write_to_file(mcp)
register_list_dir(mcp)
register_replace_file_content(mcp)
register_apply_diff(mcp)
register_apply_patch(mcp)
register_grep_search(mcp)
register_execute_command(mcp)
return [
"example_tool",
"file_read",
"file_write",
"web_search",
"web_scrape",
"pdf_read",
"view_file",
"write_to_file",
"list_dir",
"replace_file_content",
"apply_diff",
"apply_patch",
"grep_search",
"execute_command_tool",
]
__all__ = ["register_all_tools"]
@@ -1,5 +1,4 @@
"""Example Tool package."""
from .example_tool import register_tools
__all__ = ["register_tools"]
@@ -3,7 +3,6 @@ Example Tool - A simple text processing tool for FastMCP.
Demonstrates native FastMCP tool registration pattern.
"""
from __future__ import annotations
from fastmcp import FastMCP
@@ -0,0 +1,28 @@
# File Read Tool
Read contents of local files with encoding support.
## Description
Use for reading configs, data files, source code, logs, or any text file. Returns file content along with path, name, size, and encoding metadata.
## Arguments
| Argument | Type | Required | Default | Description |
|----------|------|----------|---------|-------------|
| `file_path` | str | Yes | - | Path to the file to read (absolute or relative) |
| `encoding` | str | No | `utf-8` | File encoding (utf-8, latin-1, etc.) |
| `max_size` | int | No | `10000000` | Maximum file size to read in bytes (default 10MB) |
## Environment Variables
This tool does not require any environment variables.
## Error Handling
Returns error dicts for common issues:
- `File not found: <path>` - File does not exist
- `Not a file: <path>` - Path points to a directory
- `File too large: <size> bytes (max: <max_size>)` - File exceeds max_size limit
- `Failed to decode file with encoding '<encoding>'` - Wrong encoding specified
- `Permission denied: <path>` - No read access to file
@@ -0,0 +1,4 @@
"""File Read Tool - Read contents of local files."""
from .file_read_tool import register_tools
__all__ = ["register_tools"]
@@ -0,0 +1,75 @@
"""
File Read Tool - Read contents of local files.
Supports reading text files with various encodings.
Returns file content along with metadata.
"""
from __future__ import annotations
from pathlib import Path
from fastmcp import FastMCP
def register_tools(mcp: FastMCP) -> None:
"""Register file read tools with the MCP server."""
@mcp.tool()
def file_read(
file_path: str,
encoding: str = "utf-8",
max_size: int = 10_000_000,
) -> dict:
"""
Read the contents of a local file.
Use for reading configs, data files, source code, logs, or any text file.
Returns file content along with path, name, size, and encoding.
Args:
file_path: Path to the file to read (absolute or relative)
encoding: File encoding (utf-8, latin-1, etc.)
max_size: Maximum file size to read in bytes (default 10MB)
Returns:
Dict with file content and metadata, or error dict
"""
try:
path = Path(file_path).resolve()
# Check if file exists
if not path.exists():
return {"error": f"File not found: {file_path}"}
# Check if it's a file (not directory)
if not path.is_file():
return {"error": f"Not a file: {file_path}"}
# Check file size
file_size = path.stat().st_size
if max_size > 0 and file_size > max_size:
return {
"error": f"File too large: {file_size} bytes (max: {max_size})",
"file_size": file_size,
}
# Read the file
content = path.read_text(encoding=encoding)
return {
"path": str(path),
"name": path.name,
"content": content,
"size": len(content),
"encoding": encoding,
}
except UnicodeDecodeError as e:
return {
"error": f"Failed to decode file with encoding '{encoding}': {str(e)}",
"suggestion": "Try a different encoding like 'latin-1' or 'cp1252'",
}
except PermissionError:
return {"error": f"Permission denied: {file_path}"}
except Exception as e:
return {"error": f"Failed to read file: {str(e)}"}
@@ -1,3 +1,3 @@
from .apply_diff import register_tools
__all__ = ["register_tools"]
__all__ = ["register_tools"]
@@ -1,49 +1,37 @@
import os
import diff_match_patch as dmp_module
from mcp.server.fastmcp import FastMCP
from ..security import get_sandboxed_path
from ..security import get_secure_path
def register_tools(mcp: FastMCP) -> None:
"""Register diff application tools with the MCP server."""
@mcp.tool()
def apply_diff(
path: str, diff_text: str, agent_id: str
) -> dict:
def apply_diff(path: str, diff_text: str, workspace_id: str, agent_id: str, session_id: str) -> dict:
"""
Purpose
Apply a structured diff to update a file while preserving context.
Apply a diff to a file within the session sandbox.
When to use
Larger but still controlled updates
Refactoring structured memory (tables, sections)
Automated compaction or cleanup passes
Rules & Constraints
Diff must be context-aware
Rejected if it touches restricted sections
Prefer apply_patch for small changes
Use this when you need to apply structured diff patches to modify file content.
Args:
path: The path to the file (relative to agent sandbox)
path: The path to the file (relative to session root)
diff_text: The diff patch text to apply
workspace_id: The ID of the workspace
agent_id: The ID of the agent
session_id: The ID of the current session
Returns:
Dict with application status and patch results, or error dict
"""
try:
secure_path = get_sandboxed_path(path, agent_id)
secure_path = get_secure_path(path, workspace_id, agent_id, session_id)
if not os.path.exists(secure_path):
return {"error": f"File not found at {path}"}
dmp = dmp_module.diff_match_patch()
patches = dmp.patch_fromText(diff_text)
with open(secure_path, encoding="utf-8") as f:
with open(secure_path, "r", encoding="utf-8") as f:
content = f.read()
new_content, results = dmp.patch_apply(patches, content)
@@ -55,7 +43,7 @@ def register_tools(mcp: FastMCP) -> None:
"success": True,
"path": path,
"patches_applied": len(patches),
"all_successful": True,
"all_successful": True
}
else:
failed_count = sum(1 for r in results if not r)
@@ -64,7 +52,7 @@ def register_tools(mcp: FastMCP) -> None:
"path": path,
"patches_applied": len([r for r in results if r]),
"patches_failed": failed_count,
"error": f"Failed to apply {failed_count} of {len(patches)} patches",
"error": f"Failed to apply {failed_count} of {len(patches)} patches"
}
except Exception as e:
return {"error": f"Failed to apply diff: {str(e)}"}
@@ -1,3 +1,3 @@
from .apply_patch import register_tools
__all__ = ["register_tools"]
__all__ = ["register_tools"]
@@ -1,53 +1,39 @@
import os
import diff_match_patch as dmp_module
from mcp.server.fastmcp import FastMCP
from ..security import get_sandboxed_path
from ..security import get_secure_path
def register_tools(mcp: FastMCP) -> None:
"""Register patch application tools with the MCP server."""
@mcp.tool()
def apply_patch(
path: str, patch_text: str, agent_id: str
) -> dict:
def apply_patch(path: str, patch_text: str, workspace_id: str, agent_id: str, session_id: str) -> dict:
"""
Purpose
Apply a scoped, line-level modification to an existing file.
Apply a patch to a file within the session sandbox.
When to use
Update curated canonical memory
Fix or refine existing summaries or facts
Remove duplication or stale information
Rules & Constraints
Patch must be small and targeted
Must preserve unrelated content
Only allowed on approved files and sections
Best practice
Always read the file first. Never patch blindly.
Use this when you need to apply patch-formatted changes to a file.
This is an alias for apply_diff with the same functionality.
Args:
path: The path to the file (relative to agent sandbox)
path: The path to the file (relative to session root)
patch_text: The patch text to apply
workspace_id: The ID of the workspace
agent_id: The ID of the agent
session_id: The ID of the current session
Returns:
Dict with application status and patch results, or error dict
"""
# Logic duplicated from apply_diff for atomic isolation
try:
secure_path = get_sandboxed_path(path, agent_id)
secure_path = get_secure_path(path, workspace_id, agent_id, session_id)
if not os.path.exists(secure_path):
return {"error": f"File not found at {path}"}
dmp = dmp_module.diff_match_patch()
patches = dmp.patch_fromText(patch_text)
with open(secure_path, encoding="utf-8") as f:
with open(secure_path, "r", encoding="utf-8") as f:
content = f.read()
new_content, results = dmp.patch_apply(patches, content)
@@ -59,7 +45,7 @@ def register_tools(mcp: FastMCP) -> None:
"success": True,
"path": path,
"patches_applied": len(patches),
"all_successful": True,
"all_successful": True
}
else:
failed_count = sum(1 for r in results if not r)
@@ -68,7 +54,7 @@ def register_tools(mcp: FastMCP) -> None:
"path": path,
"patches_applied": len([r for r in results if r]),
"patches_failed": failed_count,
"error": f"Failed to apply {failed_count} of {len(patches)} patches",
"error": f"Failed to apply {failed_count} of {len(patches)} patches"
}
except Exception as e:
return {"error": f"Failed to apply patch: {str(e)}"}
@@ -1,3 +1,3 @@
from .execute_command_tool import register_tools
__all__ = ["register_tools"]
__all__ = ["register_tools"]
@@ -0,0 +1,58 @@
import os
import subprocess
from typing import Optional
from mcp.server.fastmcp import FastMCP
from ..security import get_secure_path, WORKSPACES_DIR
def register_tools(mcp: FastMCP) -> None:
"""Register command execution tools with the MCP server."""
@mcp.tool()
def execute_command_tool(command: str, workspace_id: str, agent_id: str, session_id: str, cwd: Optional[str] = None) -> dict:
"""
Execute a shell command within the session sandbox.
Use this when you need to run shell commands safely within the sandboxed environment.
Commands are executed with a 60-second timeout.
Args:
command: The shell command to execute
workspace_id: The ID of the workspace
agent_id: The ID of the agent
session_id: The ID of the current session
cwd: The working directory for the command (relative to session root, optional)
Returns:
Dict with command output and execution details, or error dict
"""
try:
# Default cwd is the session root
session_root = os.path.join(WORKSPACES_DIR, workspace_id, agent_id, session_id)
os.makedirs(session_root, exist_ok=True)
if cwd:
secure_cwd = get_secure_path(cwd, workspace_id, agent_id, session_id)
else:
secure_cwd = session_root
result = subprocess.run(
command,
shell=True,
cwd=secure_cwd,
capture_output=True,
text=True,
timeout=60
)
return {
"success": True,
"command": command,
"return_code": result.returncode,
"stdout": result.stdout,
"stderr": result.stderr,
"cwd": cwd or "."
}
except subprocess.TimeoutExpired:
return {"error": "Command timed out after 60 seconds"}
except Exception as e:
return {"error": f"Failed to execute command: {str(e)}"}
@@ -36,13 +36,12 @@ grep_search(
| `agent_id` | str | Yes | - | The ID of the agent |
| `session_id` | str | Yes | - | The ID of the current session |
| `recursive` | bool | No | False | Whether to search recursively in subdirectories |
| `hashline` | bool | No | False | If True, include an `anchor` field (`N:hhhh`) in each match for use with `hashline_edit` |
## Returns
Returns a dictionary with the following structure:
**Success (default mode):**
**Success:**
```python
{
"success": True,
@@ -65,25 +64,6 @@ Returns a dictionary with the following structure:
}
```
**Success (hashline mode):**
```python
{
"success": True,
"pattern": "def \\w+\\(",
"path": "src",
"recursive": True,
"matches": [
{
"file": "src/main.py",
"line_number": 10,
"line_content": "def process_data(args):",
"anchor": "10:a3f2"
}
],
"total_matches": 1
}
```
**No matches:**
```python
{
@@ -1,3 +1,3 @@
from .grep_search import register_tools
__all__ = ["register_tools"]
__all__ = ["register_tools"]
@@ -0,0 +1,70 @@
import os
import re
from mcp.server.fastmcp import FastMCP
from ..security import get_secure_path, WORKSPACES_DIR
def register_tools(mcp: FastMCP) -> None:
"""Register grep search tools with the MCP server."""
@mcp.tool()
def grep_search(path: str, pattern: str, workspace_id: str, agent_id: str, session_id: str, recursive: bool = False) -> dict:
"""
Search for a pattern in a file or directory within the session sandbox.
Use this when you need to find specific content or patterns in files using regex.
Set recursive=True to search through all subdirectories.
Args:
path: The path to search in (file or directory, relative to session root)
pattern: The regex pattern to search for
workspace_id: The ID of the workspace
agent_id: The ID of the agent
session_id: The ID of the current session
recursive: Whether to search recursively in directories (default: False)
Returns:
Dict with search results and match details, or error dict
"""
try:
secure_path = get_secure_path(path, workspace_id, agent_id, session_id)
# Use session dir root for relative path calculations
session_root = os.path.join(WORKSPACES_DIR, workspace_id, agent_id, session_id)
matches = []
regex = re.compile(pattern)
if os.path.isfile(secure_path):
files = [secure_path]
elif recursive:
files = []
for root, _, filenames in os.walk(secure_path):
for filename in filenames:
files.append(os.path.join(root, filename))
else:
files = [os.path.join(secure_path, f) for f in os.listdir(secure_path) if os.path.isfile(os.path.join(secure_path, f))]
for file_path in files:
# Calculate relative path for display
display_path = os.path.relpath(file_path, session_root)
try:
with open(file_path, "r", encoding="utf-8") as f:
for i, line in enumerate(f, 1):
if regex.search(line):
matches.append({
"file": display_path,
"line_number": i,
"line_content": line.strip()
})
except (UnicodeDecodeError, PermissionError):
continue
return {
"success": True,
"pattern": pattern,
"path": path,
"recursive": recursive,
"matches": matches,
"total_matches": len(matches)
}
except Exception as e:
return {"error": f"Failed to perform grep search: {str(e)}"}
@@ -1,3 +1,3 @@
from .list_dir import register_tools
__all__ = ["register_tools"]
__all__ = ["register_tools"]
@@ -0,0 +1,49 @@
import os
from mcp.server.fastmcp import FastMCP
from ..security import get_secure_path
def register_tools(mcp: FastMCP) -> None:
"""Register directory listing tools with the MCP server."""
@mcp.tool()
def list_dir(path: str, workspace_id: str, agent_id: str, session_id: str) -> dict:
"""
List the contents of a directory within the session sandbox.
Use this when you need to explore directory contents and see what files
and subdirectories exist.
Args:
path: The directory path (relative to session root)
workspace_id: The ID of the workspace
agent_id: The ID of the agent
session_id: The ID of the current session
Returns:
Dict with directory contents and metadata, or error dict
"""
try:
secure_path = get_secure_path(path, workspace_id, agent_id, session_id)
if not os.path.exists(secure_path):
return {"error": f"Directory not found at {path}"}
items = os.listdir(secure_path)
entries = []
for item in items:
full_path = os.path.join(secure_path, item)
is_dir = os.path.isdir(full_path)
entry = {
"name": item,
"type": "directory" if is_dir else "file",
"size_bytes": os.path.getsize(full_path) if not is_dir else None
}
entries.append(entry)
return {
"success": True,
"path": path,
"entries": entries,
"total_count": len(entries)
}
except Exception as e:
return {"error": f"Failed to list directory: {str(e)}"}
@@ -1,3 +1,3 @@
from .replace_file_content import register_tools
__all__ = ["register_tools"]
__all__ = ["register_tools"]
@@ -1,46 +1,35 @@
import os
from mcp.server.fastmcp import FastMCP
from ..security import get_sandboxed_path
from ..security import get_secure_path
def register_tools(mcp: FastMCP) -> None:
"""Register file content replacement tools with the MCP server."""
@mcp.tool()
def replace_file_content(
path: str, target: str, replacement: str, agent_id: str
) -> dict:
def replace_file_content(path: str, target: str, replacement: str, workspace_id: str, agent_id: str, session_id: str) -> dict:
"""
Purpose
Replace all occurrences of a target string with replacement text in a file.
Replace content in a file within the session sandbox.
When to use
Fixing repeated errors or typos
Updating deprecated terms or placeholders
Refactoring simple patterns across a file
Rules & Constraints
Target must exist in file
Replacement must be intentional
No regex or complex logic - pure string replacement
Use this when you need to perform find-and-replace operations on file content.
All occurrences of the target string will be replaced.
Args:
path: The path to the file (relative to agent sandbox)
path: The path to the file (relative to session root)
target: The string to search for and replace
replacement: The string to replace it with
workspace_id: The ID of the workspace
agent_id: The ID of the agent
session_id: The ID of the current session
Returns:
Dict with replacement count and status, or error dict
"""
try:
secure_path = get_sandboxed_path(path, agent_id)
secure_path = get_secure_path(path, workspace_id, agent_id, session_id)
if not os.path.exists(secure_path):
return {"error": f"File not found at {path}"}
with open(secure_path, encoding="utf-8") as f:
with open(secure_path, "r", encoding="utf-8") as f:
content = f.read()
if target not in content:
@@ -56,7 +45,7 @@ def register_tools(mcp: FastMCP) -> None:
"path": path,
"occurrences_replaced": occurrences,
"target_length": len(target),
"replacement_length": len(replacement),
"replacement_length": len(replacement)
}
except Exception as e:
return {"error": f"Failed to replace content: {str(e)}"}
@@ -0,0 +1,27 @@
import os
WORKSPACES_DIR = os.path.abspath(os.path.join(os.getcwd(), "workdir/workspaces"))
def get_secure_path(path: str, workspace_id: str, agent_id: str, session_id: str) -> str:
"""Resolve and verify a path within a 3-layer sandbox (workspace/agent/session)."""
if not workspace_id or not agent_id or not session_id:
raise ValueError("workspace_id, agent_id, and session_id are all required")
# Ensure session directory exists: runtime/workspace_id/agent_id/session_id
session_dir = os.path.join(WORKSPACES_DIR, workspace_id, agent_id, session_id)
os.makedirs(session_dir, exist_ok=True)
# Resolve absolute path
if os.path.isabs(path):
# Treat absolute paths as relative to the session root if they start with /
rel_path = path.lstrip(os.sep)
final_path = os.path.abspath(os.path.join(session_dir, rel_path))
else:
final_path = os.path.abspath(os.path.join(session_dir, path))
# Verify path is within session_dir
common_prefix = os.path.commonpath([final_path, session_dir])
if common_prefix != session_dir:
raise ValueError(f"Access denied: Path '{path}' is outside the session sandbox.")
return final_path
@@ -0,0 +1,86 @@
# View File Tool
Reads the content of a file within the secure session sandbox.
## Description
The `view_file` tool allows you to read and retrieve the complete content of files within a sandboxed session environment. It provides metadata about the file along with its content.
## Use Cases
- Reading configuration files
- Viewing source code
- Inspecting log files
- Retrieving data files for processing
## Usage
```python
view_file(
path="config/settings.json",
workspace_id="workspace-123",
agent_id="agent-456",
session_id="session-789"
)
```
## Arguments
| Argument | Type | Required | Default | Description |
|----------|------|----------|---------|-------------|
| `path` | str | Yes | - | The path to the file (relative to session root) |
| `workspace_id` | str | Yes | - | The ID of the workspace |
| `agent_id` | str | Yes | - | The ID of the agent |
| `session_id` | str | Yes | - | The ID of the current session |
## Returns
Returns a dictionary with the following structure:
**Success:**
```python
{
"success": True,
"path": "config/settings.json",
"content": "{\"debug\": true}",
"size_bytes": 16,
"lines": 1
}
```
**Error:**
```python
{
"error": "File not found at config/settings.json"
}
```
## Error Handling
- Returns an error dict if the file doesn't exist
- Returns an error dict if the file cannot be read (permission issues, encoding errors, etc.)
- Handles binary files gracefully by returning appropriate error messages
## Examples
### Reading a text file
```python
result = view_file(
path="README.md",
workspace_id="ws-1",
agent_id="agent-1",
session_id="session-1"
)
# Returns: {"success": True, "path": "README.md", "content": "# My Project\n...", "size_bytes": 1024, "lines": 42}
```
### Handling missing files
```python
result = view_file(
path="nonexistent.txt",
workspace_id="ws-1",
agent_id="agent-1",
session_id="session-1"
)
# Returns: {"error": "File not found at nonexistent.txt"}
```
@@ -0,0 +1,3 @@
from .view_file import register_tools
__all__ = ["register_tools"]
@@ -0,0 +1,40 @@
import os
from mcp.server.fastmcp import FastMCP
from ..security import get_secure_path
def register_tools(mcp: FastMCP) -> None:
"""Register file view tools with the MCP server."""
@mcp.tool()
def view_file(path: str, workspace_id: str, agent_id: str, session_id: str) -> dict:
"""
Read the content of a file within the session sandbox.
Use this when you need to view the contents of an existing file.
Args:
path: The path to the file (relative to session root)
workspace_id: The ID of the workspace
agent_id: The ID of the agent
session_id: The ID of the current session
Returns:
Dict with file content and metadata, or error dict
"""
try:
secure_path = get_secure_path(path, workspace_id, agent_id, session_id)
if not os.path.exists(secure_path):
return {"error": f"File not found at {path}"}
with open(secure_path, "r", encoding="utf-8") as f:
content = f.read()
return {
"success": True,
"path": path,
"content": content,
"size_bytes": len(content.encode("utf-8")),
"lines": len(content.splitlines())
}
except Exception as e:
return {"error": f"Failed to read file: {str(e)}"}
@@ -0,0 +1,92 @@
# Write to File Tool
Writes content to a file within the secure session sandbox. Supports both overwriting and appending modes.
## Description
The `write_to_file` tool allows you to create new files or modify existing files within a sandboxed session environment. It automatically creates parent directories if they don't exist and provides flexible write modes.
## Use Cases
- Creating new configuration files
- Writing generated code or data
- Appending logs or output to existing files
- Saving processed results to disk
## Usage
```python
write_to_file(
path="config/settings.json",
content='{"debug": true}',
workspace_id="workspace-123",
agent_id="agent-456",
session_id="session-789",
append=False
)
```
## Arguments
| Argument | Type | Required | Default | Description |
|----------|------|----------|---------|-------------|
| `path` | str | Yes | - | The path to the file (relative to session root) |
| `content` | str | Yes | - | The content to write to the file |
| `workspace_id` | str | Yes | - | The ID of the workspace |
| `agent_id` | str | Yes | - | The ID of the agent |
| `session_id` | str | Yes | - | The ID of the current session |
| `append` | bool | No | False | Whether to append to the file instead of overwriting |
## Returns
Returns a dictionary with the following structure:
**Success:**
```python
{
"success": True,
"path": "config/settings.json",
"mode": "written", # or "appended"
"bytes_written": 18
}
```
**Error:**
```python
{
"error": "Failed to write to file: [error message]"
}
```
## Error Handling
- Returns an error dict if the file cannot be written (permission issues, invalid path, etc.)
- Automatically creates parent directories if they don't exist
- Handles encoding errors gracefully
## Examples
### Creating a new file
```python
result = write_to_file(
path="data/output.txt",
content="Hello, world!",
workspace_id="ws-1",
agent_id="agent-1",
session_id="session-1"
)
# Returns: {"success": True, "path": "data/output.txt", "mode": "written", "bytes_written": 13}
```
### Appending to a file
```python
result = write_to_file(
path="logs/activity.log",
content="\n[INFO] Task completed",
workspace_id="ws-1",
agent_id="agent-1",
session_id="session-1",
append=True
)
# Returns: {"success": True, "path": "logs/activity.log", "mode": "appended", "bytes_written": 24}
```
@@ -0,0 +1,3 @@
from .write_to_file import register_tools
__all__ = ["register_tools"]
@@ -0,0 +1,40 @@
import os
from mcp.server.fastmcp import FastMCP
from ..security import get_secure_path
def register_tools(mcp: FastMCP) -> None:
"""Register file write tools with the MCP server."""
@mcp.tool()
def write_to_file(path: str, content: str, workspace_id: str, agent_id: str, session_id: str, append: bool = False) -> dict:
"""
Write content to a file within the session sandbox.
Use this when you need to create a new file or overwrite an existing file.
Set append=True to add content to the end of an existing file.
Args:
path: The path to the file (relative to session root)
content: The content to write to the file
workspace_id: The ID of the workspace
agent_id: The ID of the agent
session_id: The ID of the current session
append: Whether to append to the file instead of overwriting (default: False)
Returns:
Dict with success status and path, or error dict
"""
try:
secure_path = get_secure_path(path, workspace_id, agent_id, session_id)
os.makedirs(os.path.dirname(secure_path), exist_ok=True)
mode = "a" if append else "w"
with open(secure_path, mode, encoding="utf-8") as f:
f.write(content)
return {
"success": True,
"path": path,
"mode": "appended" if append else "written",
"bytes_written": len(content.encode("utf-8"))
}
except Exception as e:
return {"error": f"Failed to write to file: {str(e)}"}
@@ -0,0 +1,29 @@
# File Write Tool
Write content to local files with encoding support.
## Description
Can create new files or overwrite/append to existing ones. Use for saving data, creating configs, writing reports, or exporting results. Optionally creates parent directories if they don't exist.
## Arguments
| Argument | Type | Required | Default | Description |
|----------|------|----------|---------|-------------|
| `file_path` | str | Yes | - | Path to the file to write (absolute or relative) |
| `content` | str | Yes | - | Content to write to the file |
| `encoding` | str | No | `utf-8` | File encoding (utf-8, latin-1, etc.) |
| `mode` | str | No | `write` | Write mode - 'write' (overwrite) or 'append' |
| `create_dirs` | bool | No | `True` | Create parent directories if they don't exist |
## Environment Variables
This tool does not require any environment variables.
## Error Handling
Returns error dicts for common issues:
- `Parent directory does not exist: <path>` - Parent dir missing and create_dirs=False
- `Invalid mode: <mode>. Use 'write' or 'append'.` - Invalid mode specified
- `Permission denied: <path>` - No write access to file/directory
- `OS error writing file: <error>` - Filesystem error
@@ -0,0 +1,4 @@
"""File Write Tool - Create or update local files."""
from .file_write_tool import register_tools
__all__ = ["register_tools"]
@@ -0,0 +1,83 @@
"""
File Write Tool - Create or update local files.
Supports writing text files with various encodings.
Can create directories if they don't exist.
"""
from __future__ import annotations
from pathlib import Path
from fastmcp import FastMCP
def register_tools(mcp: FastMCP) -> None:
"""Register file write tools with the MCP server."""
@mcp.tool()
def file_write(
file_path: str,
content: str,
encoding: str = "utf-8",
mode: str = "write",
create_dirs: bool = True,
) -> dict:
"""
Write content to a local file.
Can create new files or overwrite/append to existing ones.
Use for saving data, creating configs, writing reports, or exporting results.
Args:
file_path: Path to the file to write (absolute or relative)
content: Content to write to the file
encoding: File encoding (utf-8, latin-1, etc.)
mode: Write mode - 'write' (overwrite) or 'append'
create_dirs: Create parent directories if they don't exist
Returns:
Dict with write result or error dict
"""
try:
path = Path(file_path).resolve()
# Create parent directories if requested
if create_dirs:
path.parent.mkdir(parents=True, exist_ok=True)
elif not path.parent.exists():
return {"error": f"Parent directory does not exist: {path.parent}"}
# Determine write mode
if mode == "append":
write_mode = "a"
elif mode == "write":
write_mode = "w"
else:
return {"error": f"Invalid mode: {mode}. Use 'write' or 'append'."}
# Check if we're overwriting
existed = path.exists()
previous_size = path.stat().st_size if existed else 0
# Write the file
with open(path, write_mode, encoding=encoding) as f:
f.write(content)
new_size = path.stat().st_size
return {
"path": str(path),
"name": path.name,
"bytes_written": len(content.encode(encoding)),
"total_size": new_size,
"mode": mode,
"created": not existed,
"previous_size": previous_size if existed else None,
}
except PermissionError:
return {"error": f"Permission denied: {file_path}"}
except OSError as e:
return {"error": f"OS error writing file: {str(e)}"}
except Exception as e:
return {"error": f"Failed to write file: {str(e)}"}
@@ -1,5 +1,4 @@
"""PDF Read Tool - Parse and extract text from PDF files."""
from .pdf_read_tool import register_tools
__all__ = ["register_tools"]
@@ -1,17 +1,14 @@
"""
PDF Read Tool - Manage Accounting and Financial Operations.
PDF Read Tool - Parse and extract text from PDF files.
Uses pypdf to read PDF documents and extract text content
along with metadata. Supports both local file paths and URLs.
along with metadata.
"""
from __future__ import annotations
import tempfile
from pathlib import Path
from typing import Any
from typing import Any, List
import httpx
from fastmcp import FastMCP
from pypdf import PdfReader
@@ -20,27 +17,16 @@ def register_tools(mcp: FastMCP) -> None:
"""Register PDF read tools with the MCP server."""
def parse_page_range(
pages: str | None,
total_pages: int,
max_pages: int,
) -> dict[str, Any]:
pages: str | None, total_pages: int, max_pages: int
) -> List[int] | dict:
"""
Parse page range string into list of 0-indexed page numbers.
Returns:
Dict with either:
- {"indices": [...], "truncated": bool, "requested_pages": int}
- {"error": "..."} on invalid input
Returns list of indices or error dict.
"""
if pages is None or pages.lower() == "all":
requested_pages = total_pages
limited = min(total_pages, max_pages)
indices = list(range(limited))
return {
"indices": indices,
"truncated": requested_pages > max_pages,
"requested_pages": requested_pages,
}
indices = list(range(min(total_pages, max_pages)))
return indices
try:
# Single page: "5"
@@ -48,7 +34,7 @@ def register_tools(mcp: FastMCP) -> None:
page_num = int(pages)
if page_num < 1 or page_num > total_pages:
return {"error": f"Page {page_num} out of range. PDF has {total_pages} pages."}
return {"indices": [page_num - 1], "truncated": False, "requested_pages": 1}
return [page_num - 1]
# Range: "1-10"
if "-" in pages and "," not in pages:
@@ -60,14 +46,8 @@ def register_tools(mcp: FastMCP) -> None:
return {"error": f"Page numbers start at 1, got {start}."}
if end > total_pages:
return {"error": f"Page {end} out of range. PDF has {total_pages} pages."}
requested_pages = end - start + 1
limited_end = min(end, start - 1 + max_pages)
indices = list(range(start - 1, limited_end))
return {
"indices": indices,
"truncated": requested_pages > max_pages,
"requested_pages": requested_pages,
}
indices = list(range(start - 1, min(end, start - 1 + max_pages)))
return indices
# Comma-separated: "1,3,5"
if "," in pages:
@@ -75,13 +55,8 @@ def register_tools(mcp: FastMCP) -> None:
for p in page_nums:
if p < 1 or p > total_pages:
return {"error": f"Page {p} out of range. PDF has {total_pages} pages."}
requested_pages = len(page_nums)
indices = [p - 1 for p in page_nums[:max_pages]]
return {
"indices": indices,
"truncated": requested_pages > max_pages,
"requested_pages": requested_pages,
}
return indices
return {"error": f"Invalid page format: '{pages}'. Use 'all', '5', '1-10', or '1,3,5'."}
@@ -100,60 +75,18 @@ def register_tools(mcp: FastMCP) -> None:
Returns text content with page markers and optional metadata.
Use for reading PDFs, reports, documents, or any PDF file.
Supports both local file paths and URLs.
Args:
file_path: Path or URL to the PDF file (local path, or http/https URL)
pages: Page range - 'all'/None for all, '5' for single,
'1-10' for range, '1,3,5' for specific
file_path: Path to the PDF file to read (absolute or relative)
pages: Page range to extract - 'all'/None for all, '5' for single, '1-10' for range, '1,3,5' for specific
max_pages: Maximum number of pages to process (1-1000, memory safety)
include_metadata: Include PDF metadata (author, title, creation date, etc.)
Returns:
Dict with extracted text and metadata, or error dict
"""
temp_file = None
try:
# Check if input is a URL
is_url = file_path.startswith(("http://", "https://"))
if is_url:
# Download PDF from URL to temporary file
try:
response = httpx.get(
file_path,
headers={"User-Agent": "AdenBot/1.0 (PDF Reader)"},
follow_redirects=True,
timeout=60.0,
)
if response.status_code != 200:
return {"error": f"Failed to download PDF: HTTP {response.status_code}"}
# Validate content-type
content_type = response.headers.get("content-type", "").lower()
if "application/pdf" not in content_type:
return {
"error": (
f"URL does not point to a PDF file. Content-Type: {content_type}"
),
"content_type": content_type,
"url": file_path,
}
# Save to temporary file
temp_file = tempfile.NamedTemporaryFile(mode="wb", suffix=".pdf", delete=False)
temp_file.write(response.content)
temp_file.close()
path = Path(temp_file.name)
except httpx.TimeoutException:
return {"error": "PDF download timed out"}
except httpx.RequestError as e:
return {"error": f"Failed to download PDF: {str(e)}"}
else:
# Local file path
path = Path(file_path).resolve()
path = Path(file_path).resolve()
# Validate file exists
if not path.exists():
@@ -182,11 +115,9 @@ def register_tools(mcp: FastMCP) -> None:
total_pages = len(reader.pages)
# Parse page range
page_info = parse_page_range(pages, total_pages, max_pages)
if "error" in page_info:
return page_info
page_indices = page_info["indices"]
page_indices = parse_page_range(pages, total_pages, max_pages)
if isinstance(page_indices, dict): # Error dict
return page_indices
# Extract text from pages
content_parts = []
@@ -205,15 +136,6 @@ def register_tools(mcp: FastMCP) -> None:
"char_count": len(content),
}
# Surface truncation information when requested pages exceed max_pages
if page_info.get("truncated"):
requested = page_info.get("requested_pages", len(page_indices))
result["truncated"] = True
result["truncation_warning"] = (
f"Requested {requested} page(s), but max_pages={max_pages}. "
f"Only the first {len(page_indices)} page(s) were processed."
)
# Add metadata if requested
if include_metadata and reader.metadata:
meta = reader.metadata
@@ -223,9 +145,7 @@ def register_tools(mcp: FastMCP) -> None:
"subject": meta.get("/Subject"),
"creator": meta.get("/Creator"),
"producer": meta.get("/Producer"),
"created": str(meta.get("/CreationDate"))
if meta.get("/CreationDate")
else None,
"created": str(meta.get("/CreationDate")) if meta.get("/CreationDate") else None,
"modified": str(meta.get("/ModDate")) if meta.get("/ModDate") else None,
}
@@ -235,10 +155,3 @@ def register_tools(mcp: FastMCP) -> None:
return {"error": f"Permission denied: {file_path}"}
except Exception as e:
return {"error": f"Failed to read PDF: {str(e)}"}
finally:
# Clean up temporary file if it was created
if temp_file is not None:
try:
Path(temp_file.name).unlink(missing_ok=True)
except Exception:
pass # Ignore cleanup errors
@@ -1,10 +1,10 @@
# Web Scrape Tool
Scrape and extract text content from webpages using a headless browser.
Scrape and extract text content from webpages.
## Description
Use when you need to read the content of a specific URL, extract data from a website, or read articles/documentation. Uses Playwright with stealth to render JavaScript-heavy pages and evade bot detection. Automatically removes noise elements (scripts, navigation, footers) and extracts the main content.
Use when you need to read the content of a specific URL, extract data from a website, or read articles/documentation. Automatically removes noise elements (scripts, navigation, footers) and extracts the main content.
## Arguments
@@ -14,16 +14,6 @@ Use when you need to read the content of a specific URL, extract data from a web
| `selector` | str | No | `None` | CSS selector to target specific content (e.g., 'article', '.main-content') |
| `include_links` | bool | No | `False` | Include extracted links in the response |
| `max_length` | int | No | `50000` | Maximum length of extracted text (1000-500000) |
| `respect_robots_txt` | bool | No | `True` | Whether to respect robots.txt rules |
## Setup
Requires Chromium browser binaries:
```bash
uv pip install playwright playwright-stealth
uv run playwright install chromium
```
## Environment Variables
@@ -33,19 +23,14 @@ This tool does not require any environment variables.
Returns error dicts for common issues:
- `HTTP <status>: Failed to fetch URL` - Server returned error status
- `Navigation failed: no response received` - Browser could not navigate to URL
- `No elements found matching selector: <selector>` - CSS selector matched nothing
- `Request timed out` - Page load exceeded 60s timeout
- `Blocked by robots.txt: <url>` - URL disallowed by site's robots.txt
- `Browser error: <error>` - Playwright/Chromium error
- `Request timed out` - Request exceeded 30s timeout
- `Network error: <error>` - Connection or DNS issues
- `Scraping failed: <error>` - HTML parsing or other error
## Notes
- Uses Playwright (Chromium) with playwright-stealth for bot detection evasion
- Renders JavaScript before extracting content (works with SPAs and dynamic pages)
- URLs without protocol are automatically prefixed with `https://`
- Waits for `networkidle` before extracting content
- Follows redirects automatically
- Removes script, style, nav, footer, header, aside, noscript, and iframe elements
- Auto-detects main content using article, main, or common content class selectors
- Respects robots.txt by default (set `respect_robots_txt=False` to disable)
@@ -1,5 +1,4 @@
"""Web Scrape Tool - Extract content from web pages."""
from .web_scrape_tool import register_tools
__all__ = ["register_tools"]
@@ -0,0 +1,134 @@
"""
Web Scrape Tool - Extract content from web pages.
Uses httpx for requests and BeautifulSoup for HTML parsing.
Returns clean text content from web pages.
"""
from __future__ import annotations
from typing import Any, List
import httpx
from bs4 import BeautifulSoup
from fastmcp import FastMCP
def register_tools(mcp: FastMCP) -> None:
"""Register web scrape tools with the MCP server."""
@mcp.tool()
def web_scrape(
url: str,
selector: str | None = None,
include_links: bool = False,
max_length: int = 50000,
) -> dict:
"""
Scrape and extract text content from a webpage.
Use when you need to read the content of a specific URL,
extract data from a website, or read articles/documentation.
Args:
url: URL of the webpage to scrape
selector: CSS selector to target specific content (e.g., 'article', '.main-content')
include_links: Include extracted links in the response
max_length: Maximum length of extracted text (1000-500000)
Returns:
Dict with scraped content (url, title, description, content, length) or error dict
"""
try:
# Validate URL
if not url.startswith(("http://", "https://")):
url = "https://" + url
# Validate max_length
if max_length < 1000:
max_length = 1000
elif max_length > 500000:
max_length = 500000
# Make request
response = httpx.get(
url,
headers={
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"Accept-Language": "en-US,en;q=0.5",
},
follow_redirects=True,
timeout=30.0,
)
if response.status_code != 200:
return {"error": f"HTTP {response.status_code}: Failed to fetch URL"}
# Parse HTML
soup = BeautifulSoup(response.text, "html.parser")
# Remove noise elements
for tag in soup(["script", "style", "nav", "footer", "header", "aside", "noscript", "iframe"]):
tag.decompose()
# Get title and description
title = ""
title_tag = soup.find("title")
if title_tag:
title = title_tag.get_text(strip=True)
description = ""
meta_desc = soup.find("meta", attrs={"name": "description"})
if meta_desc:
description = meta_desc.get("content", "")
# Target content
if selector:
content_elem = soup.select_one(selector)
if not content_elem:
return {"error": f"No elements found matching selector: {selector}"}
text = content_elem.get_text(separator=" ", strip=True)
else:
# Auto-detect main content
main_content = (
soup.find("article")
or soup.find("main")
or soup.find(attrs={"role": "main"})
or soup.find(class_=["content", "post", "entry", "article-body"])
or soup.find("body")
)
text = main_content.get_text(separator=" ", strip=True) if main_content else ""
# Clean up whitespace
text = " ".join(text.split())
# Truncate if needed
if len(text) > max_length:
text = text[:max_length] + "..."
result: dict[str, Any] = {
"url": str(response.url),
"title": title,
"description": description,
"content": text,
"length": len(text),
}
# Extract links if requested
if include_links:
links: List[dict[str, str]] = []
for a in soup.find_all("a", href=True)[:50]:
href = a["href"]
link_text = a.get_text(strip=True)
if link_text and href:
links.append({"text": link_text, "href": href})
result["links"] = links
return result
except httpx.TimeoutException:
return {"error": "Request timed out"}
except httpx.RequestError as e:
return {"error": f"Network error: {str(e)}"}
except Exception as e:
return {"error": f"Scraping failed: {str(e)}"}
@@ -0,0 +1,31 @@
# Web Search Tool
Search the web using the Brave Search API.
## Description
Returns titles, URLs, and snippets for search results. Use when you need current information, research topics, or find websites.
## Arguments
| Argument | Type | Required | Default | Description |
|----------|------|----------|---------|-------------|
| `query` | str | Yes | - | The search query (1-500 chars) |
| `num_results` | int | No | `10` | Number of results to return (1-20) |
| `country` | str | No | `us` | Country code for localized results (us, uk, de, etc.) |
## Environment Variables
| Variable | Required | Description |
|----------|----------|-------------|
| `BRAVE_SEARCH_API_KEY` | Yes | API key from [Brave Search API](https://brave.com/search/api/) |
## Error Handling
Returns error dicts for common issues:
- `BRAVE_SEARCH_API_KEY environment variable not set` - Missing API key
- `Query must be 1-500 characters` - Empty or too long query
- `Invalid API key` - API key rejected (HTTP 401)
- `Rate limit exceeded. Try again later.` - Too many requests (HTTP 429)
- `Search request timed out` - Request exceeded 30s timeout
- `Network error: <error>` - Connection or DNS issues
@@ -1,5 +1,4 @@
"""Web Search Tool - Search the web using Brave Search API."""
from .web_search_tool import register_tools
__all__ = ["register_tools"]

Some files were not shown because too many files have changed in this diff Show More