feat: browser automated test

Timothy
2026-04-03 07:31:10 -07:00
parent e0cd16b92b
commit 95f1d1abcd
25 changed files with 5095 additions and 458 deletions
@@ -0,0 +1,241 @@
---
name: browser-edge-cases
description: SOP for debugging browser automation failures on complex websites. Use when browser tools fail on specific sites like LinkedIn, Twitter/X, SPAs, or sites with Shadow DOM.
license: MIT
---
# Browser Tool Edge Cases
Standard Operating Procedure for debugging and fixing browser automation failures on complex websites.
## When to Use This Skill
- `browser_scroll` succeeds but page doesn't move
- `browser_click` succeeds but no action triggered
- `browser_type` text disappears or doesn't work
- `browser_snapshot` hangs or returns stale content
- `browser_navigate` loads wrong content
## SOP: Debugging Browser Tool Failures
### Phase 1: Reproduce & Isolate
```
1. Create minimal test case demonstrating failure
2. Test against simple site (example.com) to verify tool works
3. Test against problematic site to confirm issue
```
**Quick isolation test:**
```python
# Test 1: Does the tool work at all?
await browser_navigate(tab_id, "https://example.com")
result = await browser_scroll(tab_id, "down", 100)
# Should work on simple sites
# Test 2: Does it fail on the problematic site?
await browser_navigate(tab_id, "https://linkedin.com/feed")
result = await browser_scroll(tab_id, "down", 100)
# If this fails but example.com works → site-specific edge case
```
### Phase 2: Analyze Root Cause
**Step 2a: Check console for errors**
```python
console = await browser_console(tab_id)
# Look for: CSP violations, React errors, JavaScript exceptions
```
**Step 2b: Inspect DOM structure**
```python
html = await browser_html(tab_id)
snapshot = await browser_snapshot(tab_id)
# Look for:
# - Nested scrollable divs (overflow: scroll/auto)
# - Shadow DOM roots
# - iframes
# - Custom widgets
```
**Step 2c: Identify the pattern**
| Symptom | Likely Cause | Check |
|---------|--------------|-------|
| Scroll doesn't move | Nested scroll container | Look for `overflow: scroll` divs |
| Click no effect | Element covered | Compare `document.elementFromPoint()` at the element's center (from `getBoundingClientRect`) with the target |
| Type clears | Autocomplete/React | Check for event listeners on input |
| Snapshot hangs | Huge DOM | Check node count in snapshot |
| Snapshot stale | SPA hydration | Wait after navigation |
### Phase 3: Implement Multi-Layer Fix
**Pattern: Always have fallbacks**
```python
async def robust_operation(tab_id):
    # Method 1: Primary approach
    try:
        result = await primary_method(tab_id)
        if verify_success(result):
            return result
    except Exception:
        pass
    # Method 2: CDP fallback
    try:
        result = await cdp_fallback(tab_id)
        if verify_success(result):
            return result
    except Exception:
        pass
    # Method 3: JavaScript fallback
    return await javascript_fallback(tab_id)
```
**Pattern: Always add timeouts**
```python
# Bad - can hang forever
result = await browser_snapshot(tab_id)
# Good - fails fast with useful error
try:
    result = await browser_snapshot(tab_id, timeout_s=10.0)
except asyncio.TimeoutError:
    # Handle timeout gracefully
    result = await fallback_snapshot(tab_id)
```
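
The two patterns above can be combined into one helper. The following is a minimal sketch under stated assumptions, not the project's implementation: `run_with_fallbacks` is a hypothetical name, each attempt is passed in as a zero-argument coroutine function, and the per-attempt timeout is enforced with the standard library's `asyncio.wait_for`.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional, Sequence


async def run_with_fallbacks(
    attempts: Sequence[Callable[[], Awaitable[Any]]],
    verify: Callable[[Any], bool],
    timeout_s: float = 10.0,
) -> Any:
    """Try each attempt in order and return the first result that verifies.

    Hypothetical helper: every attempt gets its own timeout, so a hanging
    primary method cannot block the fallbacks behind it.
    """
    last_error: Optional[Exception] = None
    for attempt in attempts:
        try:
            result = await asyncio.wait_for(attempt(), timeout=timeout_s)
        except Exception as exc:  # also covers asyncio.TimeoutError
            last_error = exc
            continue
        if verify(result):
            return result
    raise RuntimeError("all attempts failed or returned unverified results") from last_error
```

A caller would pass the primary, CDP, and JavaScript variants in order, for example `run_with_fallbacks([primary, cdp, js], verify_success)`, where those three names stand in for whatever coroutines the caller already has.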
### Phase 4: Verify Fix
```
1. Run against problematic site → should work
2. Run against simple site → should still work (regression check)
3. Document in registry.md
```
## Pattern Library
### P1: Nested Scrollable Containers
**Sites:** LinkedIn, Twitter/X, any SPA with scrollable feeds
**Detection:**
```javascript
// Find largest scrollable container.
// Wrapped in a function so the `return` is valid when evaluated as a script.
(function() {
  const candidates = [];
  document.querySelectorAll('*').forEach(el => {
    const style = getComputedStyle(el);
    if (style.overflow.includes('scroll') || style.overflow.includes('auto')) {
      const rect = el.getBoundingClientRect();
      if (rect.width > 100 && rect.height > 100) {
        candidates.push({el, area: rect.width * rect.height});
      }
    }
  });
  candidates.sort((a, b) => b.area - a.area);
  return candidates[0]?.el;
})();
```
**Fix:** Dispatch scroll events at container's center, not viewport center.
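
As a rough illustration of this fix, the wheel event's target point can be derived from the container's bounding rectangle rather than from the viewport. The sketch below assumes the rectangle has already been read via `getBoundingClientRect()`. `Rect` and `wheel_target` are hypothetical names, and clamping to the viewport is an added assumption so the point stays on screen when the container is taller than the window.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Rect:
    """Bounding box in viewport coordinates, as getBoundingClientRect reports it."""
    left: float
    top: float
    width: float
    height: float


def wheel_target(container: Rect, viewport_w: float, viewport_h: float) -> Tuple[float, float]:
    """Return the center of the part of `container` that lies inside the viewport."""
    x0 = max(container.left, 0.0)
    y0 = max(container.top, 0.0)
    x1 = min(container.left + container.width, viewport_w)
    y1 = min(container.top + container.height, viewport_h)
    if x1 <= x0 or y1 <= y0:
        raise ValueError("container is not visible in the viewport")
    return ((x0 + x1) / 2, (y0 + y1) / 2)


# A feed column taller than the window: aim at the middle of its visible part.
print(wheel_target(Rect(200, 100, 600, 2000), 1280, 800))  # (500.0, 450.0)
```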
### P2: Element Covered by Overlay
**Sites:** Modals, tooltips, SPAs with loading overlays
**Detection:**
```javascript
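// `element` is the intended click target; run this inside a function body.
const rect = element.getBoundingClientRect();
const centerX = rect.left + rect.width / 2;
const centerY = rect.top + rect.height / 2;
const topElement = document.elementFromPoint(centerX, centerY);
// true  -> the target (or one of its descendants) is topmost, so it is NOT covered
// false -> another element sits on top of the target's center
return topElement === element || element.contains(topElement);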
```
**Fix:** Wait for overlay to disappear, or use JavaScript click.
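
The "wait for overlay to disappear" half of this fix amounts to polling the coverage check with a deadline. A minimal sketch, assuming the caller supplies an async `is_covered` probe (for instance one that evaluates the detection snippet above in the page); `wait_until_uncovered` is a hypothetical helper name, not an existing tool.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def wait_until_uncovered(
    is_covered: Callable[[], Awaitable[bool]],
    timeout_s: float = 5.0,
    poll_s: float = 0.1,
) -> bool:
    """Poll `is_covered` until it reports False, giving up after `timeout_s`."""
    deadline = time.monotonic() + timeout_s
    while True:
        if not await is_covered():
            return True   # overlay is gone; a normal click should land
        if time.monotonic() >= deadline:
            return False  # still covered; caller can fall back to a JavaScript click
        await asyncio.sleep(poll_s)
```

Returning a boolean instead of raising keeps the decision with the caller, which matches the "or use JavaScript click" branch of the fix.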
### P3: React Synthetic Events
**Sites:** React SPAs, modern web apps
**Detection:** A CDP click does not trigger the handler, but a manual click does.
**Fix:** Use JavaScript click as primary:
```javascript
element.click();
```
### P4: Huge DOM / Accessibility Tree
**Sites:** LinkedIn, Facebook, Twitter (feeds with 1000s of nodes)
**Detection:**
```javascript
document.querySelectorAll('*').length > 5000
```
**Fix:**
1. Add timeout to snapshot operation
2. Truncate tree at 2000 nodes
3. Fall back to DOM-based snapshot if the accessibility tree is too large
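
Step 2 can be sketched as a breadth-first copy with a node budget. This is an illustration only: the node shape (`dict` with a `children` list) and the `truncated` marker are assumptions, and `truncate_tree` is a hypothetical name. Breadth-first order is chosen here so that the shallow page structure survives and deep feed items are dropped first.

```python
from collections import deque
from typing import Any, Dict


def _shallow(node: Dict[str, Any]) -> Dict[str, Any]:
    """Copy a node's own fields, starting with an empty children list."""
    copy = {k: v for k, v in node.items() if k != "children"}
    copy["children"] = []
    return copy


def truncate_tree(root: Dict[str, Any], max_nodes: int = 2000) -> Dict[str, Any]:
    """Return a copy of `root` that keeps at most `max_nodes` nodes (breadth-first)."""
    if max_nodes < 1:
        raise ValueError("max_nodes must be at least 1")
    new_root = _shallow(root)
    kept = 1
    queue = deque([(root, new_root)])
    while queue:
        src, dst = queue.popleft()
        for child in src.get("children", []):
            if kept >= max_nodes:
                dst["truncated"] = True  # mark where children were dropped
                break
            child_copy = _shallow(child)
            dst["children"].append(child_copy)
            queue.append((child, child_copy))
            kept += 1
    return new_root
```

The input tree is left untouched, so the caller can still fall back to a different strategy if the truncated result is not useful.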
### P5: SPA Hydration Delay
**Sites:** React, Vue, Angular SPAs after navigation
**Detection:**
```javascript
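// Heuristic only: these attributes are legacy React root markers.
// Newer React versions may not emit them, so a miss is inconclusive.
document.querySelector('[data-reactroot]') ||
document.querySelector('[data-reactid]')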
```
**Fix:** Wait for specific selector after navigation:
```python
await browser_navigate(tab_id, url, wait_until="load")
await browser_wait(tab_id, selector='[data-testid="content"]', timeout_ms=5000)
```
### P6: Shadow DOM
**Sites:** Components using Shadow DOM, Lit elements
**Detection:**
```javascript
// querySelectorAll returns a NodeList, which has no .some(); convert it first
Array.from(document.querySelectorAll('*')).some(el => el.shadowRoot)
```
**Fix:** Pierce shadow root:
```javascript
function queryShadow(selector) {
  const parts = selector.split('>>>');
  let node = document;
  for (const part of parts) {
    // Descend into the shadow root when the current node hosts one
    const scope = node.shadowRoot || node;
    node = scope.querySelector(part.trim());
    if (!node) return null; // a selector segment did not match
  }
  return node;
}
```
## Quick Reference
| Issue | Primary Fix | Fallback |
|-------|-------------|----------|
| Scroll not working | Find scrollable container | Mouse wheel at container center |
| Click no effect | JavaScript click() | CDP mouse events |
| Type clears | Add delay_ms | Use execCommand |
| Snapshot hangs | Add timeout_s | DOM snapshot fallback |
| Stale content | Wait for selector | Increase wait_until timeout |
| Shadow DOM | Pierce selector | JavaScript traversal |
## References
- [registry.md](registry.md) - Full list of known edge cases
- [scripts/test_case.py](scripts/test_case.py) - Template for testing new cases
- [BROWSER_USE_PATTERNS.md](../../tools/BROWSER_USE_PATTERNS.md) - Implementation patterns from browser-use
@@ -0,0 +1,232 @@
# Browser Edge Case Registry
Curated list of known browser automation edge cases with symptoms, causes, and fixes.
---
## Scroll Issues
### #1: LinkedIn Nested Scroll Container
| Attribute | Value |
|-----------|-------|
| **Site** | LinkedIn (linkedin.com/feed) |
| **Symptom** | `browser_scroll()` returns `{ok: true}` but page doesn't move |
| **Root Cause** | Content is in a nested scrollable div (`overflow: scroll`), not the main window |
| **Detection** | Scanning `document.querySelectorAll('*')` for elements with `overflow: scroll/auto` finds large scrollable candidates |
| **Fix** | Find largest scrollable container, dispatch mouse wheel at its center coordinates |
| **Code** | `bridge.py:808-981` - smart scroll with container detection |
| **Verified** | 2026-04-02 |
### #2: Twitter/X Lazy Loading
| Attribute | Value |
|-----------|-------|
| **Site** | Twitter/X (x.com) |
| **Symptom** | Infinite scroll doesn't load new content |
| **Root Cause** | Lazy loading requires content to be visible before loading more |
| **Detection** | Scroll position at bottom but no new `[data-testid="tweet"]` elements |
| **Fix** | Add `wait_for_selector` between scroll calls with 1s delay |
| **Code** | Test file: `tests/test_x_page_load_repro.py` |
| **Verified** | - |
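
The fix for #2 can be sketched as a scroll loop that waits for the item count to grow before scrolling again. This is a minimal illustration, not the code in `bridge.py`: `scroll_and_wait` is a hypothetical helper, and `scroll` and `count_items` are caller-supplied coroutines (for example, thin wrappers around a scroll call and a `[data-testid="tweet"]` count).

```python
import asyncio
import time
from typing import Awaitable, Callable


async def scroll_and_wait(
    scroll: Callable[[], Awaitable[None]],
    count_items: Callable[[], Awaitable[int]],
    max_scrolls: int = 3,
    wait_s: float = 1.0,
    poll_s: float = 0.1,
) -> int:
    """Scroll up to `max_scrolls` times, waiting after each scroll for new items.

    Stops early when a scroll produces no new items within `wait_s`.
    Returns the last observed item count.
    """
    count = await count_items()
    for _ in range(max_scrolls):
        await scroll()
        deadline = time.monotonic() + wait_s
        while time.monotonic() < deadline:
            latest = await count_items()
            if latest > count:
                count = latest
                break  # new content arrived; scroll again
            await asyncio.sleep(poll_s)
        else:
            break  # nothing new appeared in time; stop scrolling
    return count
```

Polling for growth instead of sleeping a fixed interval lets the loop move on as soon as content arrives, while `wait_s` still bounds the delay per scroll.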
### #3: Modal/Dialog Scroll Container
| Attribute | Value |
|-----------|-------|
| **Site** | Any site with modal dialogs |
| **Symptom** | Scrolling moves the background page, not the modal content |
| **Root Cause** | Modal has its own scroll container with `overflow: scroll` |
| **Detection** | Visible element with `position: fixed` and scrollable content |
| **Fix** | Find visible modal container (highest z-index scrollable), scroll that |
| **Code** | - |
| **Verified** | - |
---
## Click Issues
### #4: Element Covered by Overlay
| Attribute | Value |
|-----------|-------|
| **Site** | SPAs, sites with loading overlays |
| **Symptom** | Click succeeds but no action triggered |
| **Root Cause** | Element is covered by transparent overlay, tooltip, or iframe |
| **Detection** | `document.elementFromPoint(x, y) !== target` |
| **Fix** | Wait for overlay to disappear, or use JavaScript `element.click()` |
| **Code** | `bridge.py:394-591` - JavaScript click as primary |
| **Verified** | - |
### #5: React Synthetic Events
| Attribute | Value |
|-----------|-------|
| **Site** | React applications |
| **Symptom** | CDP click doesn't trigger React handler |
| **Root Cause** | React uses synthetic events that don't respond to CDP events |
| **Detection** | Site uses React (check for `__reactFiber$` or `data-reactroot`) |
| **Fix** | Use JavaScript `element.click()` as primary method |
| **Code** | `bridge.py:394-591` - JavaScript-first click |
| **Verified** | - |
### #6: Shadow DOM Elements
| Attribute | Value |
|-----------|-------|
| **Site** | Components using Shadow DOM, Lit elements |
| **Symptom** | `querySelector` can't find element |
| **Root Cause** | Element is inside a shadow root, not main DOM tree |
| **Detection** | `element.shadowRoot !== null` on parent elements |
| **Fix** | Use piercing selector (`host >>> target`) or traverse shadow roots |
| **Code** | See SKILL.md P6 pattern |
| **Verified** | - |
---
## Input Issues
### #7: ContentEditable / Rich Text Editors
| Attribute | Value |
|-----------|-------|
| **Site** | Rich text editors (Notion, Slack web, etc.) |
| **Symptom** | `browser_type()` doesn't insert text |
| **Root Cause** | Element is `contenteditable`, not an `<input>` or `<textarea>` |
| **Detection** | `element.contentEditable === 'true'` |
| **Fix** | Focus via JavaScript, use `execCommand('insertText')` or `Input.dispatchKeyEvent` |
| **Code** | `bridge.py:616-694` - contentEditable handling |
| **Verified** | - |
### #8: Autocomplete Field Clearing
| Attribute | Value |
|-----------|-------|
| **Site** | Search fields with autocomplete, address forms |
| **Symptom** | Typed text gets cleared immediately |
| **Root Cause** | Field expects realistic keystroke timing for autocomplete |
| **Detection** | Field has autocomplete listeners or dropdown appears |
| **Fix** | Add `delay_ms=50` between keystrokes |
| **Code** | `bridge.py:type()` - delay_ms parameter |
| **Verified** | - |
### #9: Custom Date Pickers
| Attribute | Value |
|-----------|-------|
| **Site** | Forms with custom date widgets |
| **Symptom** | Can't type date into date field |
| **Root Cause** | Custom widget intercepts and blocks keyboard input |
| **Detection** | Typing doesn't change field value |
| **Fix** | Click calendar widget icon, select date from dropdown |
| **Code** | - |
| **Verified** | - |
---
## Snapshot Issues
### #10: LinkedIn Huge DOM Tree
| Attribute | Value |
|-----------|-------|
| **Site** | LinkedIn, Facebook, Twitter feeds |
| **Symptom** | `browser_snapshot()` hangs forever |
| **Root Cause** | 10k+ DOM nodes, accessibility tree has 50k+ nodes |
| **Detection** | `document.querySelectorAll('*').length > 5000` |
| **Fix** | Add timeout (10s default), truncate tree at 2000 nodes |
| **Code** | `bridge.py:1005-1050` - timeout_s param, max_nodes limit |
| **Verified** | 2026-04-02 |
### #11: SPA Hydration Delay
| Attribute | Value |
|-----------|-------|
| **Site** | React/Vue/Angular SPAs |
| **Symptom** | Snapshot shows old content after navigation |
| **Root Cause** | Client-side hydration hasn't completed when snapshot runs |
| **Detection** | `document.readyState === 'complete'` but content missing |
| **Fix** | Wait for specific selector after navigation |
| **Code** | Test file: `tests/test_x_page_load_repro.py` |
| **Verified** | - |
### #12: iframe Content Missing
| Attribute | Value |
|-----------|-------|
| **Site** | Sites with embedded content |
| **Symptom** | Snapshot missing iframe content |
| **Root Cause** | Accessibility tree doesn't include iframe content |
| **Detection** | `document.querySelectorAll('iframe')` has results |
| **Fix** | Use `DOM.getFrameOwner` + separate snapshot for each iframe |
| **Code** | - |
| **Verified** | - |
---
## Navigation Issues
### #13: SPA Navigation Events
| Attribute | Value |
|-----------|-------|
| **Site** | React Router, Vue Router SPAs |
| **Symptom** | `wait_until="load"` fires before content ready |
| **Root Cause** | SPA uses client-side routing, no full page load |
| **Detection** | URL changes but `load` event already fired |
| **Fix** | Use `wait_until="networkidle"` or `wait_for_selector` |
| **Code** | `bridge.py:navigate()` - wait_until options |
| **Verified** | - |
### #14: Cross-Origin Redirects
| Attribute | Value |
|-----------|-------|
| **Site** | OAuth flows, SSO logins |
| **Symptom** | Navigation fails during redirect |
| **Root Cause** | Cross-origin security prevents CDP tracking |
| **Detection** | URL changes to different domain |
| **Fix** | Use `wait_for_url` with pattern matching instead of exact URL |
| **Code** | - |
| **Verified** | - |
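
Pattern-based URL waiting for #14 could look like the sketch below. It is an assumption-laden illustration: `wait_for_url_match` is a hypothetical helper (not necessarily the bridge's `wait_for_url`), `get_url` is a caller-supplied coroutine that returns the tab's current URL, and `app.example.com` is a placeholder host.

```python
import asyncio
import re
import time
from typing import Awaitable, Callable


async def wait_for_url_match(
    get_url: Callable[[], Awaitable[str]],
    pattern: "re.Pattern[str]",
    timeout_s: float = 30.0,
    poll_s: float = 0.25,
) -> str:
    """Poll the current URL until it matches `pattern`; raise TimeoutError otherwise."""
    deadline = time.monotonic() + timeout_s
    while True:
        url = await get_url()
        if pattern.search(url):
            return url
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no URL matching {pattern.pattern!r}; last seen {url!r}")
        await asyncio.sleep(poll_s)


# Matches the final callback regardless of the query string appended during the redirect.
CALLBACK = re.compile(r"^https://app\.example\.com/callback(\?|$)")
```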
---
## How to Add New Edge Cases
1. **Reproduce** the issue with minimal test case
2. **Document** using the template below
3. **Implement** fix with multi-layer fallback
4. **Verify** against both problematic and simple sites
5. **Submit** by appending to this file
### Template
```markdown
### #N: [Short Title]
| Attribute | Value |
|-----------|-------|
| **Site** | [URL or site type] |
| **Symptom** | [What the user observes] |
| **Root Cause** | [Technical explanation] |
| **Detection** | [JavaScript to detect this case] |
| **Fix** | [Solution approach] |
| **Code** | [File:line reference if implemented] |
| **Verified** | [Date or "pending"] |
```
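
Entries can also be generated from structured data so the table layout stays consistent. The following is a small sketch, assuming a hypothetical `EdgeCase` dataclass and `render_entry` function that are not part of the repository. Pipe characters are escaped because detection snippets often contain `||`, which would otherwise split the table cell.

```python
from dataclasses import dataclass


@dataclass
class EdgeCase:
    """One registry entry; field names mirror the template rows."""
    number: int
    title: str
    site: str
    symptom: str
    root_cause: str
    detection: str
    fix: str
    code: str = "-"
    verified: str = "pending"


def _cell(text: str) -> str:
    """Make a value safe for a one-line markdown table cell."""
    return text.replace("|", "\\|").replace("\n", " ")


def render_entry(case: EdgeCase) -> str:
    """Render an entry in the registry's heading-plus-table format."""
    rows = [
        ("Site", case.site),
        ("Symptom", case.symptom),
        ("Root Cause", case.root_cause),
        ("Detection", case.detection),
        ("Fix", case.fix),
        ("Code", case.code),
        ("Verified", case.verified),
    ]
    lines = [
        f"### #{case.number}: {case.title}",
        "| Attribute | Value |",
        "|-----------|-------|",
    ]
    lines += [f"| **{name}** | {_cell(value)} |" for name, value in rows]
    return "\n".join(lines)
```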
---
## Statistics
| Category | Count |
|----------|-------|
| Scroll Issues | 3 |
| Click Issues | 3 |
| Input Issues | 3 |
| Snapshot Issues | 3 |
| Navigation Issues | 2 |
| **Total** | **14** |
Last updated: 2026-04-02
@@ -0,0 +1,111 @@
#!/usr/bin/env python
"""
Test #2: Twitter/X Lazy Loading Scroll
Symptom: Infinite scroll doesn't load new content
Root Cause: Lazy loading requires content to be visible before loading more
Fix: Add wait_for_selector between scroll calls
"""
import asyncio
import sys
import time
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent.parent.parent.parent / "tools" / "src"))
from gcu.browser.bridge import BeelineBridge
BRIDGE_PORT = 9229
CONTEXT_NAME = "twitter-scroll-test"
async def test_twitter_lazy_scroll():
    """Test that repeated scrolls with waits load new content."""
    print("=" * 70)
    print("TEST #2: Twitter/X Lazy Loading Scroll")
    print("=" * 70)
    bridge = BeelineBridge()
    try:
        await bridge.start()
        for i in range(10):
            await asyncio.sleep(1)
            if bridge.is_connected:
                print("✓ Extension connected!")
                break
            print(f"Waiting for extension... ({i+1}/10)")
        else:
            print("✗ Extension not connected")
            return
        context = await bridge.create_context(CONTEXT_NAME)
        tab_id = context.get("tabId")
        group_id = context.get("groupId")
        print(f"✓ Created tab: {tab_id}")
        # Navigate to Twitter/X
        print("\n--- Navigating to X.com ---")
        await bridge.navigate(tab_id, "https://x.com", wait_until="networkidle", timeout_ms=30000)
        print("✓ Page loaded")
        # Wait for tweets to appear
        print("\n--- Waiting for tweets ---")
        await bridge.wait_for_selector(tab_id, '[data-testid="tweet"]', timeout_ms=10000)
        # Count initial tweets
        initial_count = await bridge.evaluate(
            tab_id,
            '(function() { return document.querySelectorAll(\'[data-testid="tweet"]\').length; })()'
        )
        print(f"Initial tweet count: {initial_count.get('result', 0)}")
        # Take screenshot of initial state
        screenshot = await bridge.screenshot(tab_id)
        print(f"Screenshot: {len(screenshot.get('data', ''))} bytes")
        # Scroll multiple times with waits
        print("\n--- Scrolling with waits ---")
        for i in range(3):
            result = await bridge.scroll(tab_id, "down", 500)
            print(f" Scroll {i+1}: {result.get('method', 'unknown')} method")
            # Wait for new content to load
            await asyncio.sleep(2)
            # Count tweets after scroll
            count_result = await bridge.evaluate(
                tab_id,
                '(function() { return document.querySelectorAll(\'[data-testid="tweet"]\').length; })()'
            )
            count = count_result.get('result', 0)
            print(f" Tweet count after scroll: {count}")
        # Final count
        final_count = await bridge.evaluate(
            tab_id,
            '(function() { return document.querySelectorAll(\'[data-testid="tweet"]\').length; })()'
        )
        final = final_count.get('result', 0)
        initial = initial_count.get('result', 0)
        print("\n--- Results ---")
        print(f"Initial tweets: {initial}")
        print(f"Final tweets: {final}")
        if final > initial:
            print(f"✓ PASS: Loaded {final - initial} new tweets")
        else:
            print("✗ FAIL: No new tweets loaded (may need login)")
        await bridge.destroy_context(group_id)
        print("\n✓ Context destroyed")
    finally:
        await bridge.stop()


if __name__ == "__main__":
    asyncio.run(test_twitter_lazy_scroll())
@@ -0,0 +1,97 @@
#!/usr/bin/env python
"""
Test #3: Modal/Dialog Scroll Container
Symptom: Scroll scrolls background page, not modal content
Root Cause: Modal has its own scroll container with overflow: scroll
Fix: Find visible modal container (highest z-index scrollable), scroll that
"""
import asyncio
import sys
import time
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent.parent.parent.parent / "tools" / "src"))
from gcu.browser.bridge import BeelineBridge
BRIDGE_PORT = 9229
CONTEXT_NAME = "modal-scroll-test"
# Test site with modal - using a demo site
MODAL_DEMO_URL = "https://www.w3schools.com/howto/howto_css_modals.asp"
async def test_modal_scroll():
    """Test that scroll targets modal content, not background."""
    print("=" * 70)
    print("TEST #3: Modal/Dialog Scroll Container")
    print("=" * 70)
    bridge = BeelineBridge()
    try:
        await bridge.start()
        for i in range(10):
            await asyncio.sleep(1)
            if bridge.is_connected:
                print("✓ Extension connected!")
                break
        else:
            print("✗ Extension not connected")
            return
        context = await bridge.create_context(CONTEXT_NAME)
        tab_id = context.get("tabId")
        group_id = context.get("groupId")
        print(f"✓ Created tab: {tab_id}")
        # Navigate to modal demo
        print("\n--- Navigating to modal demo ---")
        await bridge.navigate(tab_id, MODAL_DEMO_URL, wait_until="load")
        print("✓ Page loaded")
        # Take screenshot before
        screenshot_before = await bridge.screenshot(tab_id)
        print(f"Screenshot before: {len(screenshot_before.get('data', ''))} bytes")
        # Click button to open modal
        print("\n--- Opening modal ---")
        # Find and click the "Open Modal" button
        result = await bridge.click(tab_id, '.ws-btn', timeout_ms=5000)
        print(f"Click result: {result}")
        await asyncio.sleep(1)
        # Take screenshot with modal open
        screenshot_modal = await bridge.screenshot(tab_id)
        print(f"Screenshot modal open: {len(screenshot_modal.get('data', ''))} bytes")
        # Try to scroll within modal
        print("\n--- Scrolling modal content ---")
        result = await bridge.scroll(tab_id, "down", 100)
        print(f"Scroll result: {result}")
        await asyncio.sleep(0.5)
        # Take screenshot after scroll
        screenshot_after = await bridge.screenshot(tab_id)
        print(f"Screenshot after scroll: {len(screenshot_after.get('data', ''))} bytes")
        # Check if modal content scrolled (not background)
        # This is a visual check - we can verify by comparing screenshots
        print("\n--- Results ---")
        print(f"Modal scroll test completed. Method used: {result.get('method', 'unknown')}")
        print("Visual verification needed: Check if modal content scrolled vs background")
        await bridge.destroy_context(group_id)
        print("\n✓ Context destroyed")
    finally:
        await bridge.stop()


if __name__ == "__main__":
    asyncio.run(test_modal_scroll())
@@ -0,0 +1,123 @@
#!/usr/bin/env python
"""
Test #4: Element Covered by Overlay
Symptom: Click succeeds but no action triggered
Root Cause: Element is covered by transparent overlay, tooltip, or iframe
Detection: document.elementFromPoint(x, y) !== target
Fix: Wait for overlay to disappear, or use JavaScript element.click()
"""
import asyncio
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent.parent.parent.parent / "tools" / "src"))
from gcu.browser.bridge import BeelineBridge
CONTEXT_NAME = "overlay-click-test"
async def test_overlay_click():
    """Test clicking elements that are covered by overlays."""
    print("=" * 70)
    print("TEST #4: Element Covered by Overlay")
    print("=" * 70)
    bridge = BeelineBridge()
    try:
        await bridge.start()
        for i in range(10):
            await asyncio.sleep(1)
            if bridge.is_connected:
                print("✓ Extension connected!")
                break
        else:
            print("✗ Extension not connected")
            return
        context = await bridge.create_context(CONTEXT_NAME)
        tab_id = context.get("tabId")
        group_id = context.get("groupId")
        print(f"✓ Created tab: {tab_id}")
        # Create a test page with overlay.
        # The button has no inline alert(): a blocking dialog would stall the
        # evaluate() calls that follow, so clicks are counted in window.clickCount.
        print("\n--- Creating test page with overlay ---")
        test_html = """
        <!DOCTYPE html>
        <html>
        <head><title>Overlay Test</title></head>
        <body>
        <button id="target-btn">Click Me</button>
        <div id="overlay" style="position:fixed;top:0;left:0;width:100%;height:100%;background:rgba(0,0,0,0.3);z-index:1000;"></div>
        <script>
        window.clickCount = 0;
        document.getElementById('target-btn').addEventListener('click', () => {
            window.clickCount++;
        });
        </script>
        </body>
        </html>
        """
        # Navigate to data URL
        import base64
        data_url = f"data:text/html;base64,{base64.b64encode(test_html.encode()).decode()}"
        await bridge.navigate(tab_id, data_url, wait_until="load")
        # Screenshot before
        screenshot = await bridge.screenshot(tab_id)
        print(f"Screenshot: {len(screenshot.get('data', ''))} bytes")
        # Try to click the covered button
        print("\n--- Attempting to click covered button ---")
        # First, check if element is covered
        coverage_check = await bridge.evaluate(
            tab_id,
            """
            (function() {
                const btn = document.getElementById('target-btn');
                const rect = btn.getBoundingClientRect();
                const centerX = rect.left + rect.width / 2;
                const centerY = rect.top + rect.height / 2;
                const topElement = document.elementFromPoint(centerX, centerY);
                return {
                    isCovered: topElement !== btn && !btn.contains(topElement),
                    topElement: topElement?.tagName,
                    targetElement: btn.tagName
                };
            })();
            """
        )
        print(f"Coverage check: {coverage_check.get('result', {})}")
        # Click through the bridge (a coordinate-based click would land on the overlay)
        click_result = await bridge.click(tab_id, "#target-btn", timeout_ms=5000)
        print(f"Click result: {click_result}")
        # Check if click registered
        count_result = await bridge.evaluate(
            tab_id,
            "(function() { return window.clickCount; })()"
        )
        count = count_result.get("result", 0)
        print(f"Click count after bridge.click: {count}")
        if count > 0:
            print("✓ PASS: JavaScript click penetrated overlay")
        else:
            print("✗ FAIL: Click did not reach button (overlay blocked it)")
        await bridge.destroy_context(group_id)
        print("\n✓ Context destroyed")
    finally:
        await bridge.stop()


if __name__ == "__main__":
    asyncio.run(test_overlay_click())
@@ -0,0 +1,151 @@
#!/usr/bin/env python
"""
Test #6: Shadow DOM Elements
Symptom: querySelector can't find element
Root Cause: Element is inside a shadow root, not main DOM tree
Detection: element.shadowRoot !== null on parent elements
Fix: Use piercing selector (host >>> target) or traverse shadow roots
"""
import asyncio
import sys
import base64
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent.parent.parent.parent / "tools" / "src"))
from gcu.browser.bridge import BeelineBridge
CONTEXT_NAME = "shadow-dom-test"
async def test_shadow_dom():
    """Test clicking elements inside Shadow DOM."""
    print("=" * 70)
    print("TEST #6: Shadow DOM Elements")
    print("=" * 70)
    bridge = BeelineBridge()
    try:
        await bridge.start()
        for i in range(10):
            await asyncio.sleep(1)
            if bridge.is_connected:
                print("✓ Extension connected!")
                break
        else:
            print("✗ Extension not connected")
            return
        context = await bridge.create_context(CONTEXT_NAME)
        tab_id = context.get("tabId")
        group_id = context.get("groupId")
        print(f"✓ Created tab: {tab_id}")
        # Create test page with Shadow DOM
        print("\n--- Creating test page with Shadow DOM ---")
        test_html = """
        <!DOCTYPE html>
        <html>
        <head><title>Shadow DOM Test</title></head>
        <body>
        <div id="shadow-host"></div>
        <script>
        const host = document.getElementById('shadow-host');
        const shadow = host.attachShadow({ mode: 'open' });
        shadow.innerHTML = `
            <style>
            button { padding: 10px 20px; font-size: 16px; }
            </style>
            <button id="shadow-btn">Shadow Button</button>
        `;
        shadow.getElementById('shadow-btn').addEventListener('click', () => {
            window.shadowClickCount = (window.shadowClickCount || 0) + 1;
            console.log('Shadow button clicked:', window.shadowClickCount);
        });
        </script>
        </body>
        </html>
        """
        data_url = f"data:text/html;base64,{base64.b64encode(test_html.encode()).decode()}"
        await bridge.navigate(tab_id, data_url, wait_until="load")
        print("✓ Page loaded")
        # Screenshot
        screenshot = await bridge.screenshot(tab_id)
        print(f"Screenshot: {len(screenshot.get('data', ''))} bytes")
        # Detect Shadow DOM
        print("\n--- Detecting Shadow DOM ---")
        detection = await bridge.evaluate(
            tab_id,
            """
            (function() {
                const hosts = [];
                document.querySelectorAll('*').forEach(el => {
                    if (el.shadowRoot) {
                        hosts.push({
                            tag: el.tagName,
                            id: el.id,
                            hasButton: el.shadowRoot.querySelector('button') !== null
                        });
                    }
                });
                return { count: hosts.length, hosts };
            })();
            """
        )
        print(f"Shadow DOM detection: {detection.get('result', {})}")
        # Try to click shadow button using regular selector (should fail)
        print("\n--- Attempting click with regular selector ---")
        try:
            result = await bridge.click(tab_id, "#shadow-btn", timeout_ms=3000)
            print(f"Result: {result}")
        except Exception as e:
            print(f"Expected failure: {e}")
        # Try to click using JavaScript that pierces shadow DOM
        print("\n--- Clicking via JavaScript shadow piercing ---")
        click_result = await bridge.evaluate(
            tab_id,
            """
            (function() {
                const host = document.getElementById('shadow-host');
                const btn = host.shadowRoot.getElementById('shadow-btn');
                if (btn) {
                    btn.click();
                    return { success: true, clicked: 'shadow-btn' };
                }
                return { success: false, error: 'Button not found' };
            })();
            """
        )
        print(f"JS click result: {click_result.get('result', {})}")
        # Verify click was registered
        count_result = await bridge.evaluate(
            tab_id,
            "(function() { return window.shadowClickCount || 0; })()"
        )
        count = count_result.get("result", 0)
        print(f"Shadow click count: {count}")
        if count > 0:
            print("✓ PASS: Shadow DOM element clicked successfully")
        else:
            print("✗ FAIL: Could not click Shadow DOM element")
        await bridge.destroy_context(group_id)
        print("\n✓ Context destroyed")
    finally:
        await bridge.stop()


if __name__ == "__main__":
    asyncio.run(test_shadow_dom())
@@ -0,0 +1,169 @@
#!/usr/bin/env python
"""
Test #7: ContentEditable / Rich Text Editors
Symptom: browser_type() doesn't insert text
Root Cause: Element is contenteditable, not an <input> or <textarea>
Detection: element.contentEditable === 'true'
Fix: Focus via JavaScript, use execCommand('insertText') or Input.dispatchKeyEvent
"""
import asyncio
import sys
import base64
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent.parent.parent.parent / "tools" / "src"))
from gcu.browser.bridge import BeelineBridge
CONTEXT_NAME = "contenteditable-test"
async def test_contenteditable():
    """Test typing into contenteditable elements."""
    print("=" * 70)
    print("TEST #7: ContentEditable / Rich Text Editors")
    print("=" * 70)
    bridge = BeelineBridge()
    try:
        await bridge.start()
        for i in range(10):
            await asyncio.sleep(1)
            if bridge.is_connected:
                print("✓ Extension connected!")
                break
        else:
            print("✗ Extension not connected")
            return
        context = await bridge.create_context(CONTEXT_NAME)
        tab_id = context.get("tabId")
        group_id = context.get("groupId")
        print(f"✓ Created tab: {tab_id}")
        # Create test page with contenteditable
        test_html = """
        <!DOCTYPE html>
        <html>
        <head><title>ContentEditable Test</title></head>
        <body>
        <h2>ContentEditable Test</h2>
        <h3>1. Simple contenteditable div</h3>
        <div id="editor1" contenteditable="true" style="border:1px solid #ccc;padding:10px;min-height:50px;">Start text</div>
        <h3>2. Rich text editor (like Notion)</h3>
        <div id="editor2" contenteditable="true" style="border:1px solid #ccc;padding:10px;min-height:50px;">
        <p>Type here...</p>
        </div>
        <h3>3. Regular input (for comparison)</h3>
        <input id="input1" type="text" placeholder="Regular input" />
        <script>
        // Track content changes
        window.editor1Content = '';
        window.editor2Content = '';
        document.getElementById('editor1').addEventListener('input', (e) => {
            window.editor1Content = e.target.innerText;
        });
        document.getElementById('editor2').addEventListener('input', (e) => {
            window.editor2Content = e.target.innerText;
        });
        </script>
        </body>
        </html>
        """
        data_url = f"data:text/html;base64,{base64.b64encode(test_html.encode()).decode()}"
        await bridge.navigate(tab_id, data_url, wait_until="load")
        print("✓ Page loaded")
        # Screenshot
        screenshot = await bridge.screenshot(tab_id)
        print(f"Screenshot: {len(screenshot.get('data', ''))} bytes")
        # Detect contenteditable
        print("\n--- Detecting contenteditable elements ---")
        detection = await bridge.evaluate(
            tab_id,
            """
            (function() {
                const editables = document.querySelectorAll('[contenteditable="true"]');
                return {
                    count: editables.length,
                    ids: Array.from(editables).map(el => el.id)
                };
            })();
            """
        )
        print(f"Contenteditable detection: {detection.get('result', {})}")
        # Test 1: Type into regular input (baseline)
        print("\n--- Test 1: Regular input ---")
        await bridge.click(tab_id, "#input1")
        await bridge.type(tab_id, "#input1", "Hello input")
        input_result = await bridge.evaluate(
            tab_id,
            "(function() { return document.getElementById('input1').value; })()"
        )
        print(f"Input value: {input_result.get('result', '')}")
        # Test 2: Type into contenteditable div
        print("\n--- Test 2: Contenteditable div ---")
        await bridge.click(tab_id, "#editor1")
        await bridge.type(tab_id, "#editor1", "Hello contenteditable", clear_first=True)
        editor_result = await bridge.evaluate(
            tab_id,
            "(function() { return document.getElementById('editor1').innerText; })()"
        )
        print(f"Editor1 innerText: {editor_result.get('result', '')}")
        # Test 3: Use JavaScript insertText for rich editor
        print("\n--- Test 3: JavaScript insertText for rich editor ---")
        insert_result = await bridge.evaluate(
            tab_id,
            """
            (function() {
                const editor = document.getElementById('editor2');
                editor.focus();
                document.execCommand('selectAll', false, null);
                document.execCommand('insertText', false, 'Hello from execCommand');
                return editor.innerText;
            })();
            """
        )
        print(f"Editor2 after execCommand: {insert_result.get('result', '')}")
        # Screenshot after
        screenshot_after = await bridge.screenshot(tab_id)
        print(f"Screenshot after: {len(screenshot_after.get('data', ''))} bytes")
        # Results
        print("\n--- Results ---")
        input_val = input_result.get("result", "")
        editor1_val = editor_result.get("result", "")
        editor2_val = insert_result.get("result", "")
        input_pass = "Hello input" in input_val
        editor1_pass = "Hello contenteditable" in editor1_val
        editor2_pass = "execCommand" in editor2_val
        print(f"Input: {'✓ PASS' if input_pass else '✗ FAIL'} - {input_val}")
        print(f"Editor1: {'✓ PASS' if editor1_pass else '✗ FAIL'} - {editor1_val}")
        print(f"Editor2: {'✓ PASS' if editor2_pass else '✗ FAIL'} - {editor2_val}")
        await bridge.destroy_context(group_id)
        print("\n✓ Context destroyed")
    finally:
        await bridge.stop()


if __name__ == "__main__":
    asyncio.run(test_contenteditable())
@@ -0,0 +1,230 @@
#!/usr/bin/env python
"""
Test #8: Autocomplete Field Clearing
Symptom: Typed text gets cleared immediately
Root Cause: Field expects realistic keystroke timing for autocomplete
Detection: Field has autocomplete listeners or dropdown appears
Fix: Add delay_ms between keystrokes
"""
import asyncio
import sys
import base64
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent.parent.parent.parent / "tools" / "src"))
from gcu.browser.bridge import BeelineBridge
CONTEXT_NAME = "autocomplete-test"
async def test_autocomplete():
"""Test typing into fields with autocomplete behavior."""
print("=" * 70)
print("TEST #8: Autocomplete Field Clearing")
print("=" * 70)
bridge = BeelineBridge()
try:
await bridge.start()
for i in range(10):
await asyncio.sleep(1)
if bridge.is_connected:
print("✓ Extension connected!")
break
else:
print("✗ Extension not connected")
return
context = await bridge.create_context(CONTEXT_NAME)
tab_id = context.get("tabId")
group_id = context.get("groupId")
print(f"✓ Created tab: {tab_id}")
# Create test page with autocomplete behavior
test_html = """
<!DOCTYPE html>
<html>
<head><title>Autocomplete Test</title>
<style>
.autocomplete-items {
position: absolute;
border: 1px solid #d4d4d4;
border-top: none;
z-index: 99;
top: 100%;
left: 0;
right: 0;
max-height: 200px;
overflow-y: auto;
background: white;
}
.autocomplete-items div {
padding: 10px;
cursor: pointer;
}
.autocomplete-items div:hover {
background-color: #e9e9e9;
}
.autocomplete-active {
background-color: DodgerBlue !important;
color: white;
}
.autocomplete { position: relative; display: inline-block; }
input { width: 300px; padding: 10px; font-size: 16px; }
</style></head>
<body>
<h2>Autocomplete Test</h2>
<div class="autocomplete">
<input id="search" type="text" placeholder="Search countries..." autocomplete="off">
</div>
<div id="log" style="margin-top:20px;font-family:monospace;"></div>
<script>
const countries = ["Afghanistan","Albania","Algeria","Andorra","Angola","Argentina","Armenia","Australia","Austria","Azerbaijan","Bahamas","Bahrain","Bangladesh","Belarus","Belgium","Belize","Benin","Bhutan","Bolivia","Brazil","Canada","China","Colombia","Denmark","Egypt","France","Germany","India","Indonesia","Italy","Japan","Mexico","Netherlands","Nigeria","Norway","Pakistan","Peru","Philippines","Poland","Portugal","Russia","Spain","Sweden","Switzerland","Thailand","Turkey","Ukraine","United Kingdom","United States","Vietnam"];
const input = document.getElementById('search');
const log = document.getElementById('log');
let currentFocus = -1;
let typingTimeout = null;
// Track events for testing
window.inputEvents = [];
window.inputValue = '';
function logEvent(type, value) {
window.inputEvents.push({ type, value, time: Date.now() });
const entry = document.createElement('div');
entry.textContent = type + ': ' + value;
log.insertBefore(entry, log.firstChild);
}
// Simulate a debounced autocomplete: the dropdown appears only after a 100ms pause in typing
input.addEventListener('input', function(e) {
const val = this.value;
// Clear previous dropdown
closeAllLists();
if (!val) return;
// Debounce: each keystroke cancels the pending timer and starts a new one
clearTimeout(typingTimeout);
typingTimeout = setTimeout(() => {
logEvent('input', val);
window.inputValue = val;
// Create dropdown
const div = document.createElement('div');
div.setAttribute('id', this.id + 'autocomplete-list');
div.setAttribute('class', 'autocomplete-items');
this.parentNode.appendChild(div);
countries.filter(c => c.substr(0, val.length).toUpperCase() === val.toUpperCase())
.slice(0, 5)
.forEach(country => {
const item = document.createElement('div');
item.innerHTML = '<strong>' + country.substr(0, val.length) + '</strong>' + country.substr(val.length);
item.addEventListener('click', function() {
input.value = country;
closeAllLists();
logEvent('select', country);
window.inputValue = country;
});
div.appendChild(item);
});
}, 100); // 100ms debounce
});
function closeAllLists() {
document.querySelectorAll('.autocomplete-items').forEach(el => el.remove());
}
document.addEventListener('click', function() {
closeAllLists();
});
</script>
</body>
</html>
"""
data_url = f"data:text/html;base64,{base64.b64encode(test_html.encode()).decode()}"
await bridge.navigate(tab_id, data_url, wait_until="load")
print("✓ Page loaded")
# Screenshot
screenshot = await bridge.screenshot(tab_id)
print(f"Screenshot: {len(screenshot.get('data', ''))} bytes")
# Test 1: Fast typing (no delay) - may fail
print("\n--- Test 1: Fast typing (delay_ms=0) ---")
await bridge.click(tab_id, "#search")
await bridge.type(tab_id, "#search", "Ger", clear_first=True, delay_ms=0)
await asyncio.sleep(0.5)
fast_result = await bridge.evaluate(
tab_id,
"(function() { return document.getElementById('search').value; })()"
)
fast_value = fast_result.get("result", "")
print(f"Value after fast typing: '{fast_value}'")
# Check events
events_result = await bridge.evaluate(
tab_id,
"(function() { return window.inputEvents; })()"
)
print(f"Events logged: {events_result.get('result', [])}")
# Test 2: Slow typing (with delay) - should work
print("\n--- Test 2: Slow typing (delay_ms=100) ---")
await bridge.click(tab_id, "#search")
await bridge.type(tab_id, "#search", "United", clear_first=True, delay_ms=100)
await asyncio.sleep(0.5)
slow_result = await bridge.evaluate(
tab_id,
"(function() { return document.getElementById('search').value; })()"
)
slow_value = slow_result.get("result", "")
print(f"Value after slow typing: '{slow_value}'")
# Check if dropdown appeared
dropdown_result = await bridge.evaluate(
tab_id,
"(function() { return document.querySelectorAll('.autocomplete-items div').length; })()"
)
dropdown_count = dropdown_result.get("result", 0)
print(f"Dropdown items: {dropdown_count}")
# Screenshot with dropdown
screenshot_dropdown = await bridge.screenshot(tab_id)
print(f"Screenshot with dropdown: {len(screenshot_dropdown.get('data', ''))} bytes")
# Results
print("\n--- Results ---")
if "United" in slow_value:
print("✓ PASS: Slow typing with delay_ms worked")
else:
print("✗ FAIL: Slow typing still didn't work")
if dropdown_count > 0:
print("✓ PASS: Autocomplete dropdown appeared")
else:
print("⚠ WARNING: No autocomplete dropdown")
await bridge.destroy_context(group_id)
print("\n✓ Context destroyed")
finally:
await bridge.stop()
if __name__ == "__main__":
asyncio.run(test_autocomplete())
@@ -0,0 +1,162 @@
#!/usr/bin/env python
"""
Test #10: LinkedIn Huge DOM Tree
Symptom: browser_snapshot() hangs forever
Root Cause: 10k+ DOM nodes, accessibility tree has 50k+ nodes
Detection: document.querySelectorAll('*').length > 5000
Fix: Add timeout (10s default), truncate tree at 2000 nodes
"""
import asyncio
import sys
import time
import base64
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent.parent.parent.parent / "tools" / "src"))
from gcu.browser.bridge import BeelineBridge
CONTEXT_NAME = "huge-dom-test"
async def test_huge_dom():
"""Test snapshot performance on huge DOM trees."""
print("=" * 70)
print("TEST #10: Huge DOM Tree (LinkedIn-style)")
print("=" * 70)
bridge = BeelineBridge()
try:
await bridge.start()
for i in range(10):
await asyncio.sleep(1)
if bridge.is_connected:
print("✓ Extension connected!")
break
else:
print("✗ Extension not connected")
return
context = await bridge.create_context(CONTEXT_NAME)
tab_id = context.get("tabId")
group_id = context.get("groupId")
print(f"✓ Created tab: {tab_id}")
# Test 1: Small DOM (baseline)
print("\n--- Test 1: Small DOM (baseline) ---")
small_html = """
<!DOCTYPE html>
<html><body>
<h1>Small Page</h1>
<p>A few elements</p>
<button>Click me</button>
</body></html>
"""
data_url = f"data:text/html;base64,{base64.b64encode(small_html.encode()).decode()}"
await bridge.navigate(tab_id, data_url, wait_until="load")
start = time.perf_counter()
snapshot = await bridge.snapshot(tab_id, timeout_s=5.0)
elapsed = time.perf_counter() - start
tree_len = len(snapshot.get("tree", ""))
print(f"Small DOM snapshot: {elapsed:.3f}s, {tree_len} chars")
# Test 2: Generate huge DOM
print("\n--- Test 2: Huge DOM (5000+ elements) ---")
huge_html = """
<!DOCTYPE html>
<html><body>
<h1>Huge DOM Test</h1>
<div id="container"></div>
<script>
const container = document.getElementById('container');
for (let i = 0; i < 5000; i++) {
const div = document.createElement('div');
div.className = 'item-' + i;
div.innerHTML = '<span>Item ' + i + '</span><button>Action</button>';
container.appendChild(div);
}
</script>
</body></html>
"""
data_url = f"data:text/html;base64,{base64.b64encode(huge_html.encode()).decode()}"
await bridge.navigate(tab_id, data_url, wait_until="load")
# Count elements
count_result = await bridge.evaluate(
tab_id,
"(function() { return document.querySelectorAll('*').length; })()"
)
elem_count = count_result.get("result", 0)
print(f"DOM elements: {elem_count}")
# Screenshot to verify page loaded
screenshot = await bridge.screenshot(tab_id)
print(f"Screenshot: {len(screenshot.get('data', ''))} bytes")
# Test snapshot with timeout
print("\n--- Testing snapshot with 10s timeout ---")
start = time.perf_counter()
try:
snapshot = await bridge.snapshot(tab_id, timeout_s=10.0)
elapsed = time.perf_counter() - start
tree_len = len(snapshot.get("tree", ""))
truncated = "(truncated)" in snapshot.get("tree", "")
print(f"✓ Huge DOM snapshot: {elapsed:.3f}s, {tree_len} chars, truncated={truncated}")
if elapsed < 5.0:
print("✓ PASS: Snapshot completed quickly")
else:
print(f"⚠ WARNING: Snapshot took {elapsed:.1f}s")
if truncated:
print("✓ PASS: Tree was truncated to prevent hang")
else:
print("⚠ WARNING: Tree not truncated (may need adjustment)")
except asyncio.TimeoutError:
print("✗ FAIL: Snapshot timed out (this shouldn't happen)")
# Test 3: Real LinkedIn
print("\n--- Test 3: Real LinkedIn Feed ---")
await bridge.navigate(tab_id, "https://www.linkedin.com/feed", wait_until="load", timeout_ms=30000)
await asyncio.sleep(2)
count_result = await bridge.evaluate(
tab_id,
"(function() { return document.querySelectorAll('*').length; })()"
)
elem_count = count_result.get("result", 0)
print(f"LinkedIn DOM elements: {elem_count}")
start = time.perf_counter()
try:
snapshot = await bridge.snapshot(tab_id, timeout_s=15.0)
elapsed = time.perf_counter() - start
tree_len = len(snapshot.get("tree", ""))
truncated = "(truncated)" in snapshot.get("tree", "")
print(f"LinkedIn snapshot: {elapsed:.3f}s, {tree_len} chars, truncated={truncated}")
if elapsed < 5.0:
print("✓ PASS: LinkedIn snapshot fast enough")
elif elapsed < 15.0:
print("⚠ WARNING: LinkedIn snapshot slow but within timeout")
else:
print("✗ FAIL: LinkedIn snapshot too slow")
except asyncio.TimeoutError:
print("✗ FAIL: LinkedIn snapshot timed out")
await bridge.destroy_context(group_id)
print("\n✓ Context destroyed")
finally:
await bridge.stop()
if __name__ == "__main__":
asyncio.run(test_huge_dom())
@@ -0,0 +1,184 @@
#!/usr/bin/env python
"""
Test #13: SPA Navigation Events
Symptom: wait_until="load" fires before content ready
Root Cause: SPA uses client-side routing, no full page load
Detection: URL changes but load event already fired
Fix: Use wait_until="networkidle" or wait_for_selector
"""
import asyncio
import sys
import time
import base64
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent.parent.parent.parent / "tools" / "src"))
from gcu.browser.bridge import BeelineBridge
CONTEXT_NAME = "spa-nav-test"
async def test_spa_navigation():
"""Test navigation timing on SPA pages."""
print("=" * 70)
print("TEST #13: SPA Navigation Events")
print("=" * 70)
bridge = BeelineBridge()
try:
await bridge.start()
for i in range(10):
await asyncio.sleep(1)
if bridge.is_connected:
print("✓ Extension connected!")
break
else:
print("✗ Extension not connected")
return
context = await bridge.create_context(CONTEXT_NAME)
tab_id = context.get("tabId")
group_id = context.get("groupId")
print(f"✓ Created tab: {tab_id}")
# Create a test SPA
spa_html = """
<!DOCTYPE html>
<html>
<head>
<title>SPA Test</title>
<style>
nav a { margin-right: 10px; }
.page { padding: 20px; border: 1px solid #ccc; margin-top: 10px; }
</style>
</head>
<body>
<nav>
<a href="#home" onclick="navigate('home')">Home</a>
<a href="#about" onclick="navigate('about')">About</a>
<a href="#contact" onclick="navigate('contact')">Contact</a>
</nav>
<div id="app" class="page">
<h1>Loading...</h1>
</div>
<script>
// Simulate SPA routing
let currentPage = '';
async function navigate(page) {
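// window.event is undefined when navigate() is called from setTimeout (initial load)
if (window.event) window.event.preventDefault();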
currentPage = page;
// Show loading state
document.getElementById('app').innerHTML = '<h1>Loading...</h1>';
// Simulate async content loading (like real SPAs)
await new Promise(r => setTimeout(r, 500));
// Render content
const content = {
home: '<h1>Home Page</h1><p>Welcome to the SPA!</p><button id="home-btn">Home Action</button>',
about: '<h1>About Page</h1><p>This is a simulated SPA.</p><button id="about-btn">About Action</button>',
contact: '<h1>Contact Page</h1><p>Contact us at test@example.com</p><button id="contact-btn">Contact Action</button>'
};
document.getElementById('app').innerHTML = content[page] || '<h1>404</h1>';
window.location.hash = page;
}
// Initial load with delay (simulates SPA hydration)
setTimeout(() => {
navigate('home');
}, 1000);
// Track for testing
window.pageLoads = [];
window.addEventListener('hashchange', () => {
window.pageLoads.push(window.location.hash);
});
</script>
</body>
</html>
"""
data_url = f"data:text/html;base64,{base64.b64encode(spa_html.encode()).decode()}"
# Test 1: wait_until="load" - may fire before content ready
print("\n--- Test 1: wait_until='load' ---")
start = time.perf_counter()
await bridge.navigate(tab_id, data_url, wait_until="load")
elapsed = time.perf_counter() - start
print(f"Navigation completed in {elapsed:.3f}s")
# Check content immediately
content = await bridge.evaluate(
tab_id,
"(function() { return document.getElementById('app').innerText; })()"
)
print(f"Content immediately after load: '{content.get('result', '')}'")
# Screenshot
screenshot = await bridge.screenshot(tab_id)
print(f"Screenshot: {len(screenshot.get('data', ''))} bytes")
# Wait for content
print("\n--- Waiting for content to hydrate ---")
await bridge.wait_for_selector(tab_id, "#home-btn", timeout_ms=5000)
print("✓ Content loaded")
# Check content after wait
content_after = await bridge.evaluate(
tab_id,
"(function() { return document.getElementById('app').innerText; })()"
)
print(f"Content after wait: '{content_after.get('result', '')}'")
# Test 2: SPA navigation (no full page load)
print("\n--- Test 2: SPA client-side navigation ---")
# Click "About" link
await bridge.click(tab_id, 'a[href="#about"]')
await asyncio.sleep(1)
# Check if content changed
about_content = await bridge.evaluate(
tab_id,
"(function() { return document.getElementById('app').innerText; })()"
)
print(f"Content after SPA nav: '{about_content.get('result', '')}'")
if "About Page" in about_content.get("result", ""):
print("✓ PASS: SPA navigation worked")
else:
print("✗ FAIL: SPA navigation didn't update content")
# Test 3: wait_until="networkidle"
print("\n--- Test 3: wait_until='networkidle' ---")
await bridge.navigate(tab_id, data_url, wait_until="networkidle", timeout_ms=10000)
# Check content immediately
content_networkidle = await bridge.evaluate(
tab_id,
"(function() { return document.getElementById('app').innerText; })()"
)
print(f"Content after networkidle: '{content_networkidle.get('result', '')}'")
if "Home Page" in content_networkidle.get("result", ""):
print("✓ PASS: networkidle waited for content")
else:
print("⚠ WARNING: networkidle didn't wait long enough")
await bridge.destroy_context(group_id)
print("\n✓ Context destroyed")
finally:
await bridge.stop()
if __name__ == "__main__":
asyncio.run(test_spa_navigation())
@@ -0,0 +1,327 @@
#!/usr/bin/env python
"""
Browser Edge Case Test Template
This script provides a template for testing and debugging browser tool failures
on specific websites. Use this to reproduce, isolate, and verify fixes.
Usage:
1. Copy this file: cp test_case.py test_[number]_[site].py
2. Fill in the CONFIG section with your test details
3. Run: uv run python test_[number]_[site].py
Example:
uv run python test_01_linkedin_scroll.py
"""
import asyncio
import sys
import time
from pathlib import Path
# Add tools to path
sys.path.insert(0, str(Path(__file__).parent.parent.parent.parent / "tools" / "src"))
from gcu.browser.bridge import BeelineBridge
# ═══════════════════════════════════════════════════════════════════════════════
# CONFIG: Fill in these values for your test case
# ═══════════════════════════════════════════════════════════════════════════════
TEST_CASE = {
"number": 1,
"name": "LinkedIn Nested Scroll Container",
"site": "https://www.linkedin.com/feed",
"simple_site": "https://example.com",
"category": "scroll", # scroll, click, input, snapshot, navigation
"symptom": "scroll() returns success but page doesn't move",
}
BRIDGE_PORT = 9229
CONTEXT_NAME = "edge-case-test"
# ═══════════════════════════════════════════════════════════════════════════════
# TEST FUNCTIONS
# ═══════════════════════════════════════════════════════════════════════════════
async def test_simple_site(bridge: BeelineBridge, tab_id: int) -> dict:
"""Test that the tool works on a simple site (baseline)."""
print("\n--- Baseline Test (Simple Site) ---")
await bridge.navigate(tab_id, TEST_CASE["simple_site"], wait_until="load")
await asyncio.sleep(1)
# Adjust this based on category
if TEST_CASE["category"] == "scroll":
result = await bridge.scroll(tab_id, "down", 100)
print(f" Scroll result: {result}")
return result
elif TEST_CASE["category"] == "click":
# Add click test
pass
elif TEST_CASE["category"] == "snapshot":
result = await bridge.snapshot(tab_id, timeout_s=5.0)
print(f" Snapshot length: {len(result.get('tree', ''))}")
return result
return {"ok": True}
async def test_problematic_site(bridge: BeelineBridge, tab_id: int) -> dict:
"""Test the tool on the problematic site."""
print("\n--- Problem Site Test ---")
await bridge.navigate(tab_id, TEST_CASE["site"], wait_until="load", timeout_ms=30000)
await asyncio.sleep(2)
# Adjust this based on category
if TEST_CASE["category"] == "scroll":
# Get scroll positions before
before = await bridge.evaluate(
tab_id,
"""
(function() {
const results = { window: { y: window.scrollY } };
document.querySelectorAll('*').forEach((el, i) => {
const style = getComputedStyle(el);
if ((style.overflowY === 'scroll' || style.overflowY === 'auto') &&
el.scrollHeight > el.clientHeight) {
results['el_' + i] = {
tag: el.tagName,
scrollTop: el.scrollTop,
class: el.className.substring(0, 30)
};
}
});
return results;
})();
"""
)
print(f" Before scroll: {before.get('result', {})}")
# Try to scroll
result = await bridge.scroll(tab_id, "down", 500)
print(f" Scroll result: {result}")
await asyncio.sleep(1)
# Get scroll positions after
after = await bridge.evaluate(
tab_id,
"""
(function() {
const results = { window: { y: window.scrollY } };
document.querySelectorAll('*').forEach((el, i) => {
const style = getComputedStyle(el);
if ((style.overflowY === 'scroll' || style.overflowY === 'auto') &&
el.scrollHeight > el.clientHeight) {
results['el_' + i] = {
tag: el.tagName,
scrollTop: el.scrollTop,
class: el.className.substring(0, 30)
};
}
});
return results;
})();
"""
)
print(f" After scroll: {after.get('result', {})}")
# Check if anything changed
before_data = before.get("result", {}) or {}
after_data = after.get("result", {}) or {}
changed = False
for key in after_data:
if key in before_data:
b_val = before_data[key].get("scrollTop", 0) if isinstance(before_data[key], dict) else 0
a_val = after_data[key].get("scrollTop", 0) if isinstance(after_data[key], dict) else 0
if a_val != b_val:
print(f" ✓ CHANGE DETECTED: {key} scrolled from {b_val} to {a_val}")
changed = True
if not changed:
print(" ✗ NO CHANGE: Scroll did not affect any container")
return {"ok": changed, "scroll_result": result}
elif TEST_CASE["category"] == "snapshot":
start = time.perf_counter()
try:
result = await bridge.snapshot(tab_id, timeout_s=15.0)
elapsed = time.perf_counter() - start
tree_len = len(result.get("tree", ""))
print(f" Snapshot completed in {elapsed:.2f}s, {tree_len} chars")
return {"ok": True, "elapsed": elapsed, "tree_length": tree_len}
except asyncio.TimeoutError:
print(" ✗ SNAPSHOT TIMED OUT")
return {"ok": False, "error": "timeout"}
return {"ok": True}
async def detect_root_cause(bridge: BeelineBridge, tab_id: int) -> dict:
"""Run detection scripts to identify the root cause."""
print("\n--- Root Cause Detection ---")
detections = {}
# Detection 1: Nested scrollable containers
scroll_check = await bridge.evaluate(
tab_id,
"""
(function() {
const candidates = [];
document.querySelectorAll('*').forEach(el => {
const style = getComputedStyle(el);
if (style.overflow.includes('scroll') || style.overflow.includes('auto')) {
const rect = el.getBoundingClientRect();
if (rect.width > 100 && rect.height > 100) {
candidates.push({
tag: el.tagName,
area: rect.width * rect.height,
class: el.className.substring(0, 30)
});
}
}
});
candidates.sort((a, b) => b.area - a.area);
return {
count: candidates.length,
largest: candidates[0]
};
})();
"""
)
detections["nested_scroll"] = scroll_check.get("result", {})
print(f" Nested scroll containers: {detections['nested_scroll']}")
# Detection 2: Shadow DOM
shadow_check = await bridge.evaluate(
tab_id,
"""
(function() {
const withShadow = [];
document.querySelectorAll('*').forEach(el => {
if (el.shadowRoot) {
withShadow.push(el.tagName);
}
});
return { count: withShadow.length, elements: withShadow.slice(0, 5) };
})();
"""
)
detections["shadow_dom"] = shadow_check.get("result", {})
print(f" Shadow DOM: {detections['shadow_dom']}")
# Detection 3: iframes
iframe_check = await bridge.evaluate(
tab_id,
"""
(function() {
const iframes = document.querySelectorAll('iframe');
return { count: iframes.length };
})();
"""
)
detections["iframes"] = iframe_check.get("result", {})
print(f" iframes: {detections['iframes']}")
# Detection 4: DOM size
dom_check = await bridge.evaluate(
tab_id,
"""
(function() {
return {
elements: document.querySelectorAll('*').length,
body_children: document.body.children.length
};
})();
"""
)
detections["dom_size"] = dom_check.get("result", {})
print(f" DOM size: {detections['dom_size']}")
# Detection 5: Framework detection
framework_check = await bridge.evaluate(
tab_id,
"""
(function() {
return {
react: !!document.querySelector('[data-reactroot], [data-reactid]'),
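// '[data-v-]' only matches an attribute literally named "data-v-"; scan for the data-v-* prefix instead
vue: Array.from(document.querySelectorAll('*')).some(el => el.getAttributeNames().some(n => n.startsWith('data-v-'))),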
angular: !!document.querySelector('[ng-app], [ng-version]')
};
})();
"""
)
detections["frameworks"] = framework_check.get("result", {})
print(f" Frameworks: {detections['frameworks']}")
return detections
# ═══════════════════════════════════════════════════════════════════════════════
# MAIN
# ═══════════════════════════════════════════════════════════════════════════════
async def main():
print("=" * 70)
print(f"EDGE CASE TEST #{TEST_CASE['number']}: {TEST_CASE['name']}")
print("=" * 70)
print(f"Site: {TEST_CASE['site']}")
print(f"Category: {TEST_CASE['category']}")
print(f"Symptom: {TEST_CASE['symptom']}")
bridge = BeelineBridge()
try:
print("\n--- Starting Bridge ---")
await bridge.start()
# Wait for extension connection
for i in range(10):
await asyncio.sleep(1)
if bridge.is_connected:
print("✓ Extension connected!")
break
print(f"Waiting for extension... ({i+1}/10)")
else:
print("✗ Extension not connected. Ensure Chrome with Beeline extension is running.")
return
# Create browser context
context = await bridge.create_context(CONTEXT_NAME)
tab_id = context.get("tabId")
group_id = context.get("groupId")
print(f"✓ Created tab: {tab_id}")
# Run tests
baseline_result = await test_simple_site(bridge, tab_id)
problem_result = await test_problematic_site(bridge, tab_id)
detections = await detect_root_cause(bridge, tab_id)
# Summary
print("\n" + "=" * 70)
print("SUMMARY")
print("=" * 70)
print(f"Baseline test: {'✓ PASS' if baseline_result.get('ok') else '✗ FAIL'}")
print(f"Problem test: {'✓ PASS' if problem_result.get('ok') else '✗ FAIL'}")
print(f"Root cause indicators: {list(k for k, v in detections.items() if v)}")
# Cleanup
print("\n--- Cleanup ---")
await bridge.destroy_context(group_id)
print("✓ Context destroyed")
finally:
await bridge.stop()
print("✓ Bridge stopped")
if __name__ == "__main__":
asyncio.run(main())
@@ -222,7 +222,7 @@ def truncate_tool_result(
- Small results (≤ limit): full content kept + file annotation
- Large results (> limit): preview + file reference
- Errors: pass through unchanged
- load_data results: truncate with pagination hint (no re-spill)
- read_file/load_data results: truncate with pagination hint (no re-spill)
"""
limit = max_tool_result_chars
@@ -230,12 +230,12 @@ def truncate_tool_result(
if result.is_error:
return result
# load_data reads FROM spilled files — never re-spill (circular).
# read_file/load_data reads FROM spilled files — never re-spill (circular).
# Just truncate with a pagination hint if the result is too large.
if tool_name == "load_data":
if tool_name in ("load_data", "read_file"):
if limit <= 0 or len(result.content) <= limit:
return result # Small load_data result — pass through as-is
# Large load_data result — truncate with smart preview
return result # Small result — pass through as-is
# Large result — truncate with smart preview
PREVIEW_CAP = min(5000, max(limit - 500, limit // 2))
metadata_str = ""
@@ -284,7 +284,7 @@ def truncate_tool_result(
spill_path.mkdir(parents=True, exist_ok=True)
filename = next_spill_filename_fn(tool_name)
# Pretty-print JSON content so load_data's line-based
# Pretty-print JSON content so read_file's line-based
# pagination works correctly.
write_content = result.content
parsed_json: Any = None # track for metadata extraction
@@ -294,7 +294,10 @@ def truncate_tool_result(
except (json.JSONDecodeError, TypeError, ValueError):
pass # Not JSON — write as-is
(spill_path / filename).write_text(write_content, encoding="utf-8")
file_path = spill_path / filename
file_path.write_text(write_content, encoding="utf-8")
# Use absolute path so parent agents can find files from subagents
abs_path = str(file_path.resolve())
if limit > 0 and len(result.content) > limit:
# Large result: build a small, metadata-rich preview so the
@@ -316,14 +319,14 @@ def truncate_tool_result(
# Assemble header with structural info + warning
header = (
f"[Result from {tool_name}: {len(result.content):,} chars — "
f"too large for context, saved to '{filename}'.]\n"
f"too large for context, saved to '{abs_path}'.]\n"
)
if metadata_str:
header += f"\nData structure:\n{metadata_str}"
header += (
f"\n\nWARNING: The preview below is INCOMPLETE. "
f"Do NOT draw conclusions or counts from it. "
f"Use load_data(filename='{filename}') to read the "
f"Use read_file(path='{abs_path}') to read the "
f"full data before analysis."
)
@@ -332,11 +335,11 @@ def truncate_tool_result(
"Tool result spilled to file: %s (%d chars → %s)",
tool_name,
len(result.content),
filename,
abs_path,
)
else:
# Small result: keep full content + annotation
content = f"{result.content}\n\n[Saved to '{filename}']"
# Small result: keep full content + annotation with absolute path
content = f"{result.content}\n\n[Saved to '{abs_path}']"
logger.info(
"Tool result saved to file: %s (%d chars → %s)",
tool_name,
+7 -4
@@ -151,8 +151,9 @@ class OutputAccumulator:
if isinstance(value, (dict, list))
else str(value)
)
(spill_path / filename).write_text(write_content, encoding="utf-8")
file_size = (spill_path / filename).stat().st_size
file_path = spill_path / filename
file_path.write_text(write_content, encoding="utf-8")
file_size = file_path.stat().st_size
logger.info(
"set_output value auto-spilled: key=%s, %d chars -> %s (%d bytes)",
key,
@@ -160,9 +161,11 @@ class OutputAccumulator:
filename,
file_size,
)
# Use absolute path so parent agents can find files from subagents
abs_path = str(file_path.resolve())
return (
f"[Saved to '{filename}' ({file_size:,} bytes). "
f"Use load_data(filename='{filename}') "
f"[Saved to '{abs_path}' ({file_size:,} bytes). "
f"Use read_file(path='{abs_path}') "
f"to access full data.]"
)
+180
@@ -0,0 +1,180 @@
# Browser-Use Patterns Analysis
## Key Learnings from `/home/timothy/aden/browser-use`
### 1. Element Click Implementation
**browser-use approach** (`browser_use/actor/element.py`):
```python
# Three methods for element geometry, plus a JavaScript click as the last resort:
# Method 1: DOM.getContentQuads (best for inline elements and complex layouts)
content_quads_result = await self._client.send.DOM.getContentQuads(
params={'backendNodeId': self._backend_node_id}, session_id=self._session_id
)
# Method 2: DOM.getBoxModel (fallback)
box_model = await self._client.send.DOM.getBoxModel(
params={'backendNodeId': self._backend_node_id}, session_id=self._session_id
)
# Method 3: JavaScript getBoundingClientRect (final fallback)
bounds_result = await self._client.send.Runtime.callFunctionOn(
params={
'functionDeclaration': """
function() {
const rect = this.getBoundingClientRect();
return {
x: rect.left,
y: rect.top,
width: rect.width,
height: rect.height
};
}
""",
'objectId': object_id,
'returnByValue': True,
},
session_id=self._session_id,
)
# Method 4: JavaScript click (if all else fails)
await self._client.send.Runtime.callFunctionOn(
params={
'functionDeclaration': 'function() { this.click(); }',
'objectId': object_id,
},
session_id=self._session_id,
)
```
**Key differences from our implementation:**
- Uses `backendNodeId` instead of `nodeId` (more stable across DOM updates)
- Tries `DOM.getContentQuads` first (better for complex layouts)
- Multiple fallback methods with JavaScript click as final resort
- Finds largest visible quad within viewport
- Has timeouts for each mouse operation
- Proper modifier key handling
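
The "largest visible quad" step can be sketched in plain Python. This is a minimal illustration, not browser-use's code: the helper names `quad_area` and `click_point` are hypothetical, the input is assumed to be the list of 8-number quads that `DOM.getContentQuads` returns, and viewport clipping is left out for brevity.

```python
from typing import List, Optional, Tuple

# One quad: [x1, y1, x2, y2, x3, y3, x4, y4] (assumed DOM.getContentQuads layout)
Quad = List[float]


def quad_area(quad: Quad) -> float:
    """Area of a four-point polygon via the shoelace formula."""
    xs, ys = quad[0::2], quad[1::2]
    total = 0.0
    for i in range(4):
        j = (i + 1) % 4
        total += xs[i] * ys[j] - xs[j] * ys[i]
    return abs(total) / 2.0


def click_point(quads: List[Quad]) -> Optional[Tuple[float, float]]:
    """Return the center of the largest quad, or None if nothing is clickable."""
    best = max(quads, key=quad_area, default=None)
    if best is None or quad_area(best) == 0:
        return None  # caller should fall back to the next geometry method
    return (sum(best[0::2]) / 4.0, sum(best[1::2]) / 4.0)
```

Returning `None` rather than raising lets the caller move on to `DOM.getBoxModel` or the JavaScript fallbacks without exception handling at every step.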
### 2. Input/Type Text Implementation
**browser-use approach**:
```python
# 1. Scroll element into view
await cdp_client.send.DOM.scrollIntoViewIfNeeded(
params={'backendNodeId': backend_node_id},
session_id=session_id
)
# 2. Get object ID
result = await cdp_client.send.DOM.resolveNode(
params={'backendNodeId': backend_node_id},
session_id=session_id,
)
object_id = result['object']['objectId']
# 3. Focus via JavaScript (more reliable than CDP focus)
await cdp_client.send.Runtime.callFunctionOn(
params={
'functionDeclaration': 'function() { this.focus(); }',
'objectId': object_id,
},
session_id=session_id,
)
# 4. Type using Input.dispatchKeyEvent
for char in text:
await self._client.send.Input.dispatchKeyEvent(
params={
'type': 'keyDown',
'key': char,
'text': char,
},
session_id=self._session_id,
)
```
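
The loop above shows only the `keyDown` event. A minimal sketch of building the per-character parameter dictionaries follows, assuming each `keyDown` (carrying `text`) is paired with a `keyUp`; that pairing is a common CDP pattern and an assumption here, not something confirmed from the browser-use source. The helper name `key_events_for` is hypothetical.

```python
from typing import Dict, List


def key_events_for(text: str) -> List[Dict[str, str]]:
    """Build Input.dispatchKeyEvent params for each character of `text`.

    Each character yields a keyDown (which inserts the text) followed by
    a keyUp, so page listeners see a complete key press.
    """
    events: List[Dict[str, str]] = []
    for char in text:
        events.append({"type": "keyDown", "key": char, "text": char})
        events.append({"type": "keyUp", "key": char})
    return events
```

Keeping the event construction separate from the sending loop makes it easy to add a per-keystroke delay between pairs, which is what the autocomplete case needs.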
### 3. Accessibility Tree (Snapshot)
**browser-use approach** (`browser_use/dom/service.py`):
- Uses `Accessibility.getFullAXTree` for accessibility data
- Combines with DOM tree for enhanced snapshot
- Filters by paint order (elements actually visible)
- Handles iframes with depth limits
- Detects hidden interactive elements and reports them
- Uses `DOM.getFrameOwner` for iframe handling
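
For very large pages, the serialized tree also needs a size cap. The following is a minimal sketch under stated assumptions: the tree is already rendered as one line per node, the cap of 2000 nodes mirrors the value mentioned in the huge-DOM test, and the function name `truncate_tree` is hypothetical. The `(truncated)` marker matches the string the test checks for.

```python
from typing import List


def truncate_tree(lines: List[str], max_nodes: int = 2000) -> str:
    """Join tree lines, cutting off after `max_nodes` with a visible marker."""
    if len(lines) <= max_nodes:
        return "\n".join(lines)
    omitted = len(lines) - max_nodes
    kept = lines[:max_nodes]
    # The marker tells the caller the snapshot is incomplete.
    kept.append(f"... (truncated) {omitted} more nodes omitted")
    return "\n".join(kept)
```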
### 4. CDP Domain Handling
**browser-use approach** (`browser_use/browser/session.py`):
```python
# Session setup enables ONLY these domains:
await self._client.send.Page.enable(session_id=self._session_id)
await self._client.send.DOM.enable(session_id=self._session_id)
await self._client.send.Runtime.enable(session_id=self._session_id)
await self._client.send.Network.enable(session_id=self._session_id)
# Input.enable is NEVER called - it's not required!
```
### 5. Element Selection
**browser-use approach**:
- Uses index-based element selection from accessibility tree
- Maintains a map of index -> EnhancedAXNode
- Elements have `backendNodeId` which is stable
- Uses `DOM.pushNodesByBackendIdsToFrontend` to get fresh nodeId
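
The index-to-node map can be sketched as a small lookup table. This is an illustration only: `InteractiveNode`, `build_selector_map`, and `resolve` are hypothetical names, and the real `EnhancedAXNode` carries more fields than shown.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class InteractiveNode:
    backend_node_id: int  # stable CDP identifier for the element
    role: str
    name: str


def build_selector_map(nodes: List[InteractiveNode]) -> Dict[int, InteractiveNode]:
    """Assign 1-based indices to nodes in the order they were collected."""
    return {index: node for index, node in enumerate(nodes, start=1)}


def resolve(selector_map: Dict[int, InteractiveNode], index: int) -> int:
    """Look up the backendNodeId for an index shown in the snapshot."""
    node = selector_map.get(index)
    if node is None:
        raise KeyError(f"No element with index {index}; take a fresh snapshot")
    return node.backend_node_id
```

The caller would then pass the returned `backendNodeId` to a CDP call to obtain a current `nodeId` before acting on the element.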
### 6. Scroll Handling
**browser-use approach**:
```python
# Uses multiple methods:
# 1. DOM.scrollIntoViewIfNeeded (CDP)
# 2. JavaScript scrollIntoView as fallback
# 3. Mouse wheel events for smooth scrolling
```
### 7. Wait Strategies
**browser-use approach**:
- `wait_for_element` uses CDP DOM queries with polling
- Has configurable timeouts
- Uses `DOM.getContentQuads` to verify element is visible
- Detects page load state via `Page.loadEventFired`
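
Polling with a configurable timeout can be sketched with `asyncio` alone. This is a minimal sketch, not browser-use's implementation: `wait_until` is a hypothetical helper, and the `check` coroutine stands in for whatever CDP query verifies that the element exists and is visible.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def wait_until(
    check: Callable[[], Awaitable[bool]],
    timeout_s: float = 5.0,
    interval_s: float = 0.1,
) -> bool:
    """Poll `check` until it returns True or `timeout_s` elapses.

    Returns True on success and False on timeout, so callers can
    decide whether a timeout is an error or just a signal to fall back.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if await check():
            return True
        if time.monotonic() >= deadline:
            return False
        await asyncio.sleep(interval_s)
```

Checking once before sleeping means an element that is already present returns immediately, and using a monotonic clock keeps the deadline safe from system clock changes.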
---
## Improvements to Make to hive/tools
### 1. Bridge Updates
- [x] Use `backendNodeId` instead of `nodeId` where possible
- [x] Add `DOM.getContentQuads` as primary method for element geometry
- [x] Add JavaScript click as final fallback
- [x] Add proper timeouts to mouse operations
- [x] Handle modifier keys for click
### 2. Type Text Updates
- [x] Focus element via JavaScript before typing
- [x] Use `Input.dispatchKeyEvent` for typing (more reliable than insertText)
### 3. Snapshot Updates
- [x] Use accessibility tree (CDP Accessibility domain)
- [x] Add computed styles to detect visibility
- [x] Report hidden interactive elements
### 4. Error Handling
- [x] Better error messages with element context
- [x] Graceful fallbacks instead of hard failures
- [x] Timeout handling for all CDP operations
+14 -8
View File
@@ -10,10 +10,11 @@ const HIVE_WS_URL = "ws://127.0.0.1:9229/bridge";
let ws = null;
let reconnectAttempts = 0;
const MAX_RECONNECT_DELAY = 10000; // Max 10 seconds between attempts
function connect() {
// Don't try to reconnect too fast
const delay = Math.min(reconnectAttempts * 1000, 5000);
// Linear backoff with cap (1s per attempt, up to MAX_RECONNECT_DELAY)
const delay = Math.min(reconnectAttempts * 1000, MAX_RECONNECT_DELAY);
if (reconnectAttempts > 0) {
console.log(`[Beeline] Reconnecting in ${delay}ms (attempt ${reconnectAttempts + 1})...`);
@@ -34,18 +35,21 @@ function connect() {
};
ws.onclose = (event) => {
console.log(`[Beeline] WebSocket closed: code=${event.code}`);
console.log(`[Beeline] WebSocket closed: code=${event.code}, reason=${event.reason}`);
chrome.runtime.sendMessage({ _beeline: true, type: "ws_close" });
reconnectAttempts++;
// Reconnect after delay
setTimeout(connect, 2000);
};
ws.onerror = (error) => {
console.error("[Beeline] WebSocket error:", error);
ws.close();
ws.onerror = () => {
// Don't log the full error object - it's usually just an Event
// The actual error will be reflected in onclose
console.warn(`[Beeline] WebSocket connection failed (server may not be running)`);
// Don't close here - let onclose handle cleanup
};
} catch (error) {
console.error("[Beeline] Failed to create WebSocket:", error);
console.error("[Beeline] Failed to create WebSocket:", error.message);
reconnectAttempts++;
setTimeout(connect, 2000);
}
@@ -58,9 +62,11 @@ chrome.runtime.onMessage.addListener((msg) => {
if (ws && ws.readyState === WebSocket.OPEN) {
ws.send(msg.data);
} else {
console.warn("[Beeline] Cannot send - WebSocket not connected");
console.warn("[Beeline] Cannot send - WebSocket not connected (state: %s)",
ws ? ws.readyState : "null");
}
}
});
// Start connection
connect();
+18 -1
View File
@@ -83,11 +83,28 @@ def _find_project_root() -> str:
def _resolve_path(path: str) -> str:
"""Resolve path relative to PROJECT_ROOT. Raises ValueError if outside."""
"""Resolve path relative to PROJECT_ROOT. Raises ValueError if outside.
Also allows access to ~/.hive/ directory for agent session data files.
"""
# Normalize slashes for cross-platform (e.g. exports/hi_agent from LLM)
path = path.replace("/", os.sep)
# Expand ~ to home directory
if path.startswith("~"):
path = os.path.expanduser(path)
if os.path.isabs(path):
resolved = os.path.abspath(path)
# Allow access to ~/.hive/ for agent session data
hive_dir = os.path.expanduser("~/.hive")
try:
if os.path.commonpath([resolved, hive_dir]) == hive_dir:
return resolved # Path is under ~/.hive, allow it
except ValueError:
pass
try:
common = os.path.commonpath([resolved, PROJECT_ROOT])
except ValueError:
+91
View File
@@ -0,0 +1,91 @@
#!/usr/bin/env python
"""
Direct browser control test - uses the bridge directly.
Run: uv run python direct_browser_test.py
"""
import asyncio
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent / "src"))
from gcu.browser.bridge import BeelineBridge
async def main():
print("=" * 60)
print("DIRECT BROWSER TEST")
print("=" * 60)
bridge = BeelineBridge()
await bridge.start()
# Wait for connection
print("\nWaiting for extension connection...")
for i in range(10):
await asyncio.sleep(1)
if bridge.is_connected:
print("✓ Extension connected!")
break
print(f" Waiting... ({i+1}/10)")
else:
print("✗ Extension not connected")
await bridge.stop()
return
# Create a context (tab group)
print("\n--- Creating browser context ---")
ctx = await bridge.create_context("test-session")
tab_id = ctx.get("tabId")
group_id = ctx.get("groupId")
print(f"✓ Context created: tabId={tab_id}, groupId={group_id}")
# Navigate
print("\n--- Navigating to example.com ---")
result = await bridge.navigate(tab_id, "https://example.com", wait_until="load")
print(f"✓ Navigated: {result}")
await asyncio.sleep(1)
# Get snapshot
print("\n--- Getting page snapshot ---")
snapshot = await bridge.snapshot(tab_id)
tree = snapshot.get("tree", "")
print(f"✓ Snapshot ({len(tree)} chars):")
print(tree[:500] + "..." if len(tree) > 500 else tree)
# Click the link
print("\n--- Clicking link ---")
result = await bridge.click(tab_id, "a", timeout_ms=5000)
print(f"Click result: {result}")
if result.get("ok"):
print("✓ Click succeeded!")
await asyncio.sleep(2)
# Go back
await bridge.go_back(tab_id)
await asyncio.sleep(1)
# Test type
print("\n--- Testing type on Google ---")
await bridge.navigate(tab_id, "https://www.google.com", wait_until="load")
await asyncio.sleep(2)
result = await bridge.type_text(tab_id, "textarea[name='q']", "hello world")
print(f"Type result: {result}")
if result.get("ok"):
print("✓ Type succeeded!")
# Cleanup
print("\n--- Cleaning up ---")
await bridge.destroy_context(group_id)
print("✓ Context destroyed")
await bridge.stop()
print("✓ Bridge stopped")
if __name__ == "__main__":
asyncio.run(main())
+185
View File
@@ -0,0 +1,185 @@
#!/usr/bin/env python
"""
Test browser tools on LinkedIn - specifically tests scroll and snapshot fixes.
Run: uv run python linkedin_browser_test.py
"""
import asyncio
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent / "src"))
from gcu.browser.bridge import BeelineBridge
async def main():
print("=" * 60)
print("LINKEDIN BROWSER TEST")
print("=" * 60)
print("\nThis tests the fixes for:")
print("1. Scroll on nested scrollable containers (LinkedIn feed)")
print("2. Snapshot timeout on large DOM trees")
print()
bridge = BeelineBridge()
try:
print("Starting bridge...")
await bridge.start()
for i in range(10):
await asyncio.sleep(1)
if bridge.is_connected:
print("✓ Extension connected!")
break
print(f"Waiting for extension... ({i+1}/10)")
else:
print("✗ Extension not connected")
return
# Create context
context = await bridge.create_context("linkedin-test")
tab_id = context.get("tabId")
group_id = context.get("groupId")
print(f"✓ Created tab: {tab_id}")
# Navigate to LinkedIn
print("\n--- Navigating to LinkedIn ---")
try:
await bridge.navigate(tab_id, "https://www.linkedin.com", wait_until="load", timeout_ms=30000)
print("✓ Page loaded")
except Exception as e:
print(f"Navigation result: {e}")
await asyncio.sleep(2)
# Test 1: Snapshot with timeout
print("\n--- Test 1: Snapshot (with timeout protection) ---")
try:
import time
start = time.perf_counter()
snapshot = await bridge.snapshot(tab_id, timeout_s=15.0)
elapsed = time.perf_counter() - start
tree = snapshot.get("tree", "")
print(f"✓ Snapshot completed in {elapsed:.2f}s")
print(f" Tree length: {len(tree)} chars")
if "truncated" in tree:
print(" (Tree was truncated due to size)")
print(f" First 300 chars:\n{tree[:300]}...")
except asyncio.TimeoutError:
print("✗ Snapshot timed out (this shouldn't happen with 15s timeout)")
except Exception as e:
print(f"✗ Snapshot error: {e}")
# Test 2: Scroll - should now find nested scrollable container
print("\n--- Test 2: Scroll (finds nested scrollable container) ---")
# Get scroll position of ALL scrollable elements before
get_scroll_positions = """
(function() {
const results = { window: { x: window.scrollX, y: window.scrollY } };
const scrollables = document.querySelectorAll('*');
let idx = 0;
for (const el of scrollables) {
const style = getComputedStyle(el);
if ((style.overflowY === 'scroll' || style.overflowY === 'auto') &&
el.scrollHeight > el.clientHeight) {
const rect = el.getBoundingClientRect();
if (rect.width > 200 && rect.height > 200) {
results['container_' + idx] = {
tag: el.tagName,
class: el.className.substring(0, 30),
scrollTop: el.scrollTop,
scrollHeight: el.scrollHeight
};
idx++;
}
}
}
return results;
})();
"""
try:
pos_before = await bridge.evaluate(tab_id, get_scroll_positions)
before_data = pos_before.get("result", {}) if pos_before else {}
print(f" Positions before scroll:")
if isinstance(before_data, dict):
for key, val in before_data.items():
print(f" {key}: {val}")
else:
print(f" {before_data}")
result = await bridge.scroll(tab_id, "down", 500)
print(f" Scroll result: {result}")
if result.get("ok"):
method = result.get("method", "unknown")
container = result.get("container", "unknown")
print(f" ✓ Scroll command succeeded using {method} on {container}")
else:
print(f" ✗ Scroll command failed: {result.get('error')}")
await asyncio.sleep(1)
# Get scroll positions after
pos_after = await bridge.evaluate(tab_id, get_scroll_positions)
after_data = pos_after.get("result", {}) if pos_after else {}
print(f" Positions after scroll:")
if isinstance(after_data, dict):
for key, val in after_data.items():
print(f" {key}: {val}")
else:
print(f" {after_data}")
# Check if any position changed
changed = False
if isinstance(before_data, dict) and isinstance(after_data, dict):
for key in after_data:
if key in before_data:
b_val = before_data[key].get("scrollTop", 0) if isinstance(before_data[key], dict) else 0
a_val = after_data[key].get("scrollTop", 0) if isinstance(after_data[key], dict) else 0
if a_val != b_val:
print(f" ✓ SCROLL CONFIRMED: {key} changed from {b_val} to {a_val}")
changed = True
if not changed:
print(" ✗ NO SCROLL DETECTED: No scroll positions changed")
except Exception as e:
import traceback
print(f"✗ Scroll error: {e}")
traceback.print_exc()
# Test 3: Multiple scrolls
print("\n--- Test 3: Multiple Scrolls ---")
for i in range(3):
try:
result = await bridge.scroll(tab_id, "down", 200)
print(f" Scroll {i+1}: {result.get('method', 'failed')} on {result.get('container', 'unknown')}")
await asyncio.sleep(0.5)
except Exception as e:
print(f" Scroll {i+1} failed: {e}")
# Test 4: Snapshot after scroll
print("\n--- Test 4: Snapshot After Scroll ---")
try:
snapshot = await bridge.snapshot(tab_id, timeout_s=10.0)
tree = snapshot.get("tree", "")
print(f"✓ Snapshot: {len(tree)} chars")
except Exception as e:
print(f"✗ Snapshot error: {e}")
# Cleanup
print("\n=== Cleanup ===")
await bridge.destroy_context(group_id)
print("✓ Context destroyed")
finally:
await bridge.stop()
print("✓ Bridge stopped")
if __name__ == "__main__":
asyncio.run(main())
+321
View File
@@ -0,0 +1,321 @@
#!/usr/bin/env python
"""
Complex browser scenarios test - real browser interaction.
Tests complex selectors and interactions similar to:
- LinkedIn profile scrolling and data extraction
- Twitter/X infinite timeline
- YouTube video controls
Prerequisites:
1. Chrome with Beeline extension installed
2. Logged into LinkedIn/Twitter/YouTube (for some tests)
3. Run: uv run python manual_browser_complex_test.py
"""
import asyncio
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent / "src"))
from gcu.browser.bridge import BeelineBridge
async def wait_for_bridge(bridge: BeelineBridge, timeout: int = 5) -> bool:
"""Wait for extension connection."""
await bridge.start()
for i in range(timeout):
await asyncio.sleep(1)
if bridge.is_connected:
return True
return False
async def test_linkedin_profile_scroll(bridge: BeelineBridge, tab_id: int) -> dict:
"""Test LinkedIn-style infinite scroll and data extraction."""
print("\n=== LinkedIn: Profile Scroll Test ===")
try:
# Navigate to a LinkedIn page (public profile, no login required)
await bridge.navigate(tab_id, "https://www.linkedin.com/in/williamhgates/", wait_until="networkidle")
await asyncio.sleep(3)
results = {"steps": []}
# Scroll down to load more content
for i in range(3):
result = await bridge.scroll(tab_id, "down", 400)
results["steps"].append(f"scroll_{i}: {result.get('ok')}")
await asyncio.sleep(1)
# Extract profile data using complex selectors
profile_script = """
const name = document.querySelector('h1.text-heading-xlarge')?.innerText ||
document.querySelector('h1')?.innerText || 'Not found';
const headline = document.querySelector('.text-body-medium')?.innerText || 'Not found';
return { name, headline };
"""
result = await bridge.evaluate(tab_id, profile_script)
profile_data = result.get("result", {}).get("value", {})
results["profile"] = profile_data
print(f" Profile: {profile_data}")
# Check if we found real content
if profile_data.get("name") and profile_data.get("name") != "Not found":
results["ok"] = True
print(" ✓ Successfully extracted LinkedIn profile data")
else:
results["ok"] = False
print(" ✗ Could not extract profile data (may need login)")
return results
except Exception as e:
print(f" ✗ Error: {e}")
return {"ok": False, "error": str(e)}
async def test_twitter_timeline(bridge: BeelineBridge, tab_id: int) -> dict:
"""Test Twitter/X timeline interaction."""
print("\n=== Twitter/X: Timeline Test ===")
try:
# Navigate to Twitter explore (doesn't require login)
await bridge.navigate(tab_id, "https://twitter.com/explore", wait_until="networkidle")
await asyncio.sleep(3)
results = {"steps": []}
# Try to find and interact with content
extraction_script = """
// Twitter has complex selectors
const tweets = document.querySelectorAll('article[data-testid="tweet"]');
const titles = document.querySelectorAll('h2');
return {
tweetCount: tweets.length,
titleCount: titles.length,
pageTitle: document.title
};
"""
result = await bridge.evaluate(tab_id, extraction_script)
data = result.get("result", {}).get("value", {})
results["data"] = data
print(f" Page data: {data}")
# Scroll to load more
await bridge.scroll(tab_id, "down", 500)
await asyncio.sleep(2)
results["steps"].append("scrolled")
# Require either an "x" in the page title or at least one tweet (a count of 0 must not pass)
results["ok"] = "x" in data.get("pageTitle", "").lower() or data.get("tweetCount", 0) > 0
print(f" {'✓' if results['ok'] else '✗'} Twitter page loaded")
return results
except Exception as e:
print(f" ✗ Error: {e}")
return {"ok": False, "error": str(e)}
async def test_youtube_controls(bridge: BeelineBridge, tab_id: int) -> dict:
"""Test YouTube video player interaction."""
print("\n=== YouTube: Video Controls Test ===")
try:
# Navigate to a YouTube video
await bridge.navigate(tab_id, "https://www.youtube.com/watch?v=dQw4w9WgXcQ", wait_until="networkidle")
await asyncio.sleep(3)
results = {"steps": []}
# Get player state
player_script = """
const video = document.querySelector('video');
if (video) {
return {
hasVideo: true,
paused: video.paused,
currentTime: Math.round(video.currentTime),
duration: Math.round(video.duration),
muted: video.muted
};
}
return { hasVideo: false };
"""
result = await bridge.evaluate(tab_id, player_script)
state = result.get("result", {}).get("value", {})
results["initialState"] = state
print(f" Initial state: {state}")
if state.get("hasVideo"):
# Try clicking play/pause button
click_result = await bridge.click(tab_id, "button.ytp-play-button", timeout_ms=5000)
results["steps"].append(f"click_play: {click_result.get('ok')}")
await asyncio.sleep(1)
# Check state after click
result = await bridge.evaluate(tab_id, player_script)
new_state = result.get("result", {}).get("value", {})
results["afterClickState"] = new_state
print(f" After click: {new_state}")
results["ok"] = True
print(" ✓ YouTube video controls working")
else:
results["ok"] = False
print(" ✗ Video element not found")
return results
except Exception as e:
print(f" ✗ Error: {e}")
return {"ok": False, "error": str(e)}
async def test_form_interaction(bridge: BeelineBridge, tab_id: int) -> dict:
"""Test complex form filling with various input types."""
print("\n=== Form: Complex Input Test ===")
try:
# Navigate to a form testing page
await bridge.navigate(tab_id, "https://httpbin.org/forms/post", wait_until="load")
await asyncio.sleep(2)
results = {"steps": []}
# Fill text input
result = await bridge.type_text(tab_id, "input[name='custname']", "Test Customer")
results["steps"].append(f"type_name: {result.get('ok')}")
# Fill textarea
result = await bridge.type_text(tab_id, "textarea[name='comments']", "This is a test comment with multiple lines.\nLine 2.\nLine 3.")
results["steps"].append(f"type_comments: {result.get('ok')}")
# Click radio button
result = await bridge.click(tab_id, "input[value='medium']")
results["steps"].append(f"click_radio: {result.get('ok')}")
# Click checkbox
result = await bridge.click(tab_id, "input[name='topping'][value='cheese']")
results["steps"].append(f"click_checkbox: {result.get('ok')}")
# Verify form state
verify_script = """
return {
name: document.querySelector("input[name='custname']")?.value,
comments: document.querySelector("textarea[name='comments']")?.value,
medium: document.querySelector("input[value='medium']")?.checked,
cheese: document.querySelector("input[name='topping'][value='cheese']")?.checked
};
"""
result = await bridge.evaluate(tab_id, verify_script)
form_state = result.get("result", {}).get("value", {})
results["formState"] = form_state
print(f" Form state: {form_state}")
# Check all fields are filled correctly
results["ok"] = (
form_state.get("name") == "Test Customer" and
form_state.get("medium") is True and
form_state.get("cheese") is True
)
print(f" {'✓' if results['ok'] else '✗'} Form interaction")
return results
except Exception as e:
print(f" ✗ Error: {e}")
return {"ok": False, "error": str(e)}
async def test_drag_drop(bridge: BeelineBridge, tab_id: int) -> dict:
"""Test drag and drop functionality."""
print("\n=== Drag & Drop Test ===")
try:
# Navigate to a drag-drop demo page
await bridge.navigate(tab_id, "https://www.w3schools.com/html/html5_draganddrop.asp", wait_until="load")
await asyncio.sleep(2)
results = {"steps": []}
# Scroll to demo
await bridge.scroll(tab_id, "down", 600)
await asyncio.sleep(1)
# Try drag operation - this page has draggable elements
# Note: HTML5 drag-drop via CDP is limited, this tests mouse events
result = await bridge.evaluate(tab_id, """
// Check if drag elements exist
const drag1 = document.getElementById('drag1');
const div2 = document.getElementById('div2');
return {
hasDragElement: !!drag1,
hasDropZone: !!div2
};
""")
elements = result.get("result", {}).get("value", {})
results["elements"] = elements
print(f" Elements found: {elements}")
results["ok"] = bool(elements.get("hasDragElement") and elements.get("hasDropZone"))
print(f" {'✓' if results['ok'] else '✗'} Drag elements found")
return results
except Exception as e:
print(f" ✗ Error: {e}")
return {"ok": False, "error": str(e)}
async def main():
print("=" * 60)
print("COMPLEX BROWSER SCENARIOS TEST")
print("=" * 60)
print("\nThis tests complex interactions on real websites.")
print("Some tests may fail if not logged in to the respective sites.\n")
bridge = BeelineBridge()
try:
if not await wait_for_bridge(bridge):
print("❌ Extension not connected. Ensure Chrome extension is running.")
return
# Create context
context = await bridge.create_context("complex-test")
tab_id = context.get("tabId")
group_id = context.get("groupId")
print(f"✓ Created context: tabId={tab_id}")
# Run tests
results = []
results.append(("LinkedIn Profile", await test_linkedin_profile_scroll(bridge, tab_id)))
results.append(("Twitter Timeline", await test_twitter_timeline(bridge, tab_id)))
results.append(("YouTube Controls", await test_youtube_controls(bridge, tab_id)))
results.append(("Form Interaction", await test_form_interaction(bridge, tab_id)))
results.append(("Drag & Drop", await test_drag_drop(bridge, tab_id)))
# Cleanup
print("\n=== Cleanup ===")
await bridge.destroy_context(group_id)
print("✓ Context destroyed")
# Summary
print("\n" + "=" * 60)
print("RESULTS")
print("=" * 60)
passed = sum(1 for _, r in results if r.get("ok"))
for name, result in results:
status = "✓" if result.get("ok") else "✗"
print(f" {status} {name}")
if not result.get("ok") and result.get("error"):
print(f" Error: {result['error']}")
print(f"\nTotal: {passed}/{len(results)} passed")
finally:
await bridge.stop()
print("\nBridge stopped.")
if __name__ == "__main__":
asyncio.run(main())
+280
View File
@@ -0,0 +1,280 @@
#!/usr/bin/env python
"""
Manual browser tools test - connects to real Chrome extension.
Prerequisites:
1. Chrome with Beeline extension installed and enabled
2. Run: uv run python manual_browser_test.py
This will test:
- Bridge connection
- Tab group creation
- Navigation
- Click, type, scroll interactions
- Snapshot/screenshot
- Complex JS execution (LinkedIn-style selectors)
"""
import asyncio
import json
import sys
from pathlib import Path
# Add src to path
sys.path.insert(0, str(Path(__file__).parent / "src"))
from gcu.browser.bridge import BeelineBridge
async def test_connection(bridge: BeelineBridge) -> bool:
"""Test 1: Extension connection."""
print("\n=== Test 1: Extension Connection ===")
print("Starting bridge on port 9229...")
await bridge.start()
for i in range(5):
await asyncio.sleep(1)
if bridge.is_connected:
print("✓ Extension connected!")
return True
print(f" Waiting... ({i+1}/5)")
print("✗ Extension not connected. Ensure Chrome extension is installed.")
return False
async def test_context_creation(bridge: BeelineBridge) -> dict | None:
"""Test 2: Create tab group/context."""
print("\n=== Test 2: Create Tab Group ===")
try:
result = await bridge.create_context("manual-test-agent")
print(f"✓ Created context: groupId={result.get('groupId')}, tabId={result.get('tabId')}")
return result
except Exception as e:
print(f"✗ Failed: {e}")
return None
async def test_navigation(bridge: BeelineBridge, tab_id: int) -> bool:
"""Test 3: Navigate to example.com."""
print("\n=== Test 3: Navigation ===")
try:
result = await bridge.navigate(tab_id, "https://example.com", wait_until="load")
print(f"✓ Navigated to: {result.get('url')}")
await asyncio.sleep(1)
return True
except Exception as e:
print(f"✗ Failed: {e}")
return False
async def test_snapshot(bridge: BeelineBridge, tab_id: int) -> bool:
"""Test 4: Get accessibility snapshot."""
print("\n=== Test 4: Accessibility Snapshot ===")
try:
result = await bridge.snapshot(tab_id)
tree = result.get("tree", "")
lines = tree.split("\n")[:10]
print(f"✓ Got snapshot ({len(tree)} chars)")
print(" First 10 lines:")
for line in lines:
print(f" {line}")
return True
except Exception as e:
print(f"✗ Failed: {e}")
return False
async def test_click(bridge: BeelineBridge, tab_id: int) -> bool:
"""Test 5: Click a link."""
print("\n=== Test 5: Click Element ===")
try:
# example.com has a link to "More information..."
result = await bridge.click(tab_id, "a", timeout_ms=5000)
if result.get("ok"):
print(f"✓ Clicked link at ({result.get('x')}, {result.get('y')})")
await asyncio.sleep(2)
# Go back
await bridge.go_back(tab_id)
await asyncio.sleep(1)
return True
else:
print(f"✗ Click failed: {result.get('error')}")
return False
except Exception as e:
print(f"✗ Failed: {e}")
return False
async def test_type_and_clear(bridge: BeelineBridge, tab_id: int) -> bool:
"""Test 6: Type into an input field."""
print("\n=== Test 6: Type Text ===")
try:
# Navigate to a page with an input
await bridge.navigate(tab_id, "https://www.google.com", wait_until="load")
await asyncio.sleep(2)
# Type in search box
result = await bridge.type_text(tab_id, "textarea[name='q']", "hello world")
if result.get("ok"):
print("✓ Typed 'hello world' into search box")
await asyncio.sleep(0.5)
# Clear and type something else
await bridge.press_key(tab_id, "Control+a")
await asyncio.sleep(0.2)
await bridge.type_text(tab_id, "textarea[name='q']", "new search")
print("✓ Replaced with 'new search'")
return True
else:
print(f"✗ Type failed: {result.get('error')}")
return False
except Exception as e:
print(f"✗ Failed: {e}")
return False
async def test_scroll(bridge: BeelineBridge, tab_id: int) -> bool:
"""Test 7: Scroll page."""
print("\n=== Test 7: Scroll ===")
try:
# Scroll down
result = await bridge.scroll(tab_id, "down", 500)
if result.get("ok"):
print("✓ Scrolled down 500px")
await asyncio.sleep(0.5)
# Scroll up
await bridge.scroll(tab_id, "up", 250)
print("✓ Scrolled up 250px")
return True
else:
print(f"✗ Scroll failed: {result.get('error')}")
return False
except Exception as e:
print(f"✗ Failed: {e}")
return False
async def test_evaluate_js(bridge: BeelineBridge, tab_id: int) -> bool:
"""Test 8: Execute JavaScript."""
print("\n=== Test 8: JavaScript Execution ===")
try:
# Simple return
result = await bridge.evaluate(tab_id, "return document.title;")
print(f"✓ Page title: {result.get('result', {}).get('value')}")
# Complex selector (LinkedIn-style)
complex_script = """
const links = document.querySelectorAll('a');
return {
total: links.length,
first: links[0]?.href || null
};
"""
result = await bridge.evaluate(tab_id, complex_script)
value = result.get("result", {}).get("value", {})
# `first` can be None when the page has no links, so fall back before slicing
print(f"✓ Found {value.get('total')} links, first: {(value.get('first') or 'N/A')[:50]}...")
return True
except Exception as e:
print(f"✗ Failed: {e}")
return False
async def test_screenshot(bridge: BeelineBridge, tab_id: int) -> bool:
"""Test 9: Take screenshot."""
print("\n=== Test 9: Screenshot ===")
try:
result = await bridge.screenshot(tab_id, full_page=False)
if result.get("ok"):
data = result.get("data", "")
print(f"✓ Screenshot captured ({len(data)} chars base64)")
return True
else:
print(f"✗ Screenshot failed: {result.get('error')}")
return False
except Exception as e:
print(f"✗ Failed: {e}")
return False
async def test_tab_management(bridge: BeelineBridge, group_id: int, tab_id: int) -> bool:
"""Test 10: Create and close tabs."""
print("\n=== Test 10: Tab Management ===")
try:
# Create new tab
new_tab = await bridge.create_tab(group_id, "https://httpbin.org")
new_tab_id = new_tab.get("tabId")
print(f"✓ Created new tab: {new_tab_id}")
await asyncio.sleep(2)
# List tabs
tabs = await bridge.list_tabs(group_id)
print(f"✓ Group has {len(tabs.get('tabs', []))} tabs")
# Close the new tab
await bridge.close_tab(new_tab_id)
print(f"✓ Closed tab {new_tab_id}")
return True
except Exception as e:
print(f"✗ Failed: {e}")
return False
async def main():
print("=" * 60)
print("MANUAL BROWSER TOOLS TEST")
print("=" * 60)
bridge = BeelineBridge()
try:
# Test 1: Connection
if not await test_connection(bridge):
print("\n❌ Cannot proceed without extension connection")
return
# Test 2: Context creation
context = await test_context_creation(bridge)
if not context:
print("\n❌ Cannot proceed without context")
return
tab_id = context.get("tabId")
group_id = context.get("groupId")
results = []
# Run all tests
results.append(("Navigation", await test_navigation(bridge, tab_id)))
results.append(("Snapshot", await test_snapshot(bridge, tab_id)))
results.append(("Click", await test_click(bridge, tab_id)))
results.append(("Type", await test_type_and_clear(bridge, tab_id)))
results.append(("Scroll", await test_scroll(bridge, tab_id)))
results.append(("Evaluate JS", await test_evaluate_js(bridge, tab_id)))
results.append(("Screenshot", await test_screenshot(bridge, tab_id)))
results.append(("Tab Management", await test_tab_management(bridge, group_id, tab_id)))
# Cleanup
print("\n=== Cleanup ===")
await bridge.destroy_context(group_id)
print("✓ Destroyed context")
# Summary
print("\n" + "=" * 60)
print("RESULTS SUMMARY")
print("=" * 60)
passed = sum(1 for _, r in results if r)
total = len(results)
for name, result in results:
status = "✓ PASS" if result else "✗ FAIL"
print(f" {status}: {name}")
print(f"\nTotal: {passed}/{total} passed")
finally:
await bridge.stop()
print("\nBridge stopped.")
if __name__ == "__main__":
asyncio.run(main())
+148
View File
@@ -0,0 +1,148 @@
#!/usr/bin/env python
"""
Debug browser click - specifically tests the Input.enable domain issue.
Run: uv run python manual_click_debug.py
"""
import asyncio
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent / "src"))
from gcu.browser.bridge import BeelineBridge
async def main():
print("=" * 60)
print("BROWSER CLICK DEBUG")
print("=" * 60)
print("\nThis tests the click functionality and CDP domain handling.\n")
bridge = BeelineBridge()
try:
print("Starting bridge...")
await bridge.start()
for i in range(5):
await asyncio.sleep(1)
if bridge.is_connected:
print("✓ Extension connected!")
break
print(f"Waiting for extension... ({i+1}/5)")
else:
print("✗ Extension not connected")
return
# Create context
context = await bridge.create_context("click-debug")
tab_id = context.get("tabId")
group_id = context.get("groupId")
print(f"✓ Created tab: {tab_id}")
# Navigate to a simple page
print("\nNavigating to example.com...")
await bridge.navigate(tab_id, "https://example.com", wait_until="load")
await asyncio.sleep(1)
print("✓ Page loaded")
# Test 1: Get snapshot first
print("\n--- Test 1: Snapshot ---")
try:
snapshot = await bridge.snapshot(tab_id)
print(f"✓ Snapshot: {snapshot.get('tree', '')[:200]}...")
except Exception as e:
print(f"✗ Snapshot failed: {e}")
# Test 2: Click the "More information" link (example.com has <p> with link inside)
print("\n--- Test 2: Click Link ---")
try:
# First, let's see what elements are on the page
check_script = """
const links = document.querySelectorAll('a');
return Array.from(links).map(a => ({
href: a.href,
text: a.innerText.substring(0, 50)
}));
"""
result = await bridge.evaluate(tab_id, check_script)
# The evaluate method returns {"ok": True, "result": value}
links = result.get("result", [])
print(f" Found {len(links) if isinstance(links, list) else 0} links: {links}")
if links and isinstance(links, list) and len(links) > 0:
# example.com structure: <p><a href="...">More information...</a></p>
result = await bridge.click(tab_id, "a", timeout_ms=5000)
print(f" Click result: {result}")
if result.get("ok"):
print(" ✓ Click succeeded!")
await asyncio.sleep(2)
# Go back
await bridge.go_back(tab_id)
await asyncio.sleep(1)
else:
print(f" ✗ Click failed: {result.get('error')}")
else:
print(" No links found to click")
except Exception as e:
print(f" ✗ Click exception: {e}")
# Test 3: Click at coordinates
print("\n--- Test 3: Click Coordinates ---")
try:
result = await bridge.click_coordinate(tab_id, 100, 100)
print(f"Click coordinate result: {result}")
except Exception as e:
print(f"✗ Click coordinate exception: {e}")
# Test 4: Type text (requires input)
print("\n--- Test 4: Type Text ---")
try:
# Navigate to Google
await bridge.navigate(tab_id, "https://www.google.com", wait_until="load")
await asyncio.sleep(2)
result = await bridge.type_text(tab_id, "textarea[name='q']", "test query")
print(f"Type result: {result}")
if result.get("ok"):
print("✓ Type succeeded!")
else:
print(f"✗ Type failed: {result.get('error')}")
except Exception as e:
print(f"✗ Type exception: {e}")
# Test 5: Hover (on a visible button)
print("\n--- Test 5: Hover ---")
try:
# Stay on Google, hover over the "Google Search" button
result = await bridge.hover(tab_id, "input[value='Google Search']", timeout_ms=5000)
print(f"Hover result: {result}")
if result.get("ok"):
print("✓ Hover succeeded!")
else:
# Try hovering over the search input instead
print(f" First hover failed: {result.get('error')}")
result = await bridge.hover(tab_id, "textarea[name='q']", timeout_ms=3000)
print(f" Hover input result: {result}")
if result.get("ok"):
print("✓ Hover on input succeeded!")
else:
print(f"✗ Hover failed: {result.get('error')}")
except Exception as e:
import traceback
print(f"✗ Hover exception: {e}")
traceback.print_exc()
# Cleanup
print("\n=== Cleanup ===")
await bridge.destroy_context(group_id)
print("✓ Done")
finally:
await bridge.stop()
if __name__ == "__main__":
asyncio.run(main())
+82
View File
@@ -26,6 +26,7 @@ import subprocess
import sys
import tempfile
from collections.abc import Callable
from contextvars import ContextVar
from pathlib import Path
from fastmcp import FastMCP
@@ -106,6 +107,87 @@ BINARY_EXTENSIONS = frozenset(
}
)
# ── Context-aware sandboxing ─────────────────────────────────────────────────
# Context variable for additional allowed paths (beyond base_root)
_allowed_paths_ctx: ContextVar[list[str]] = ContextVar("allowed_paths", default=[])
def set_allowed_paths(paths: list[str]) -> None:
"""Set additional allowed paths for file operations in this context.
Use this to grant access to paths beyond the base root (e.g., ~/.hive/
for cross-agent file access).
"""
_allowed_paths_ctx.set(list(paths))
def get_allowed_paths() -> list[str]:
"""Get current allowed paths including ~/.hive/."""
paths = list(_allowed_paths_ctx.get())
hive_dir = os.path.expanduser("~/.hive")
if hive_dir not in paths:
paths.append(hive_dir)
return paths
def create_sandboxed_resolver(
base_root: str,
allowed_paths: list[str] | None = None,
) -> Callable[[str], str]:
"""Create a path resolver that enforces sandbox boundaries.
Args:
base_root: The primary allowed directory (e.g., PROJECT_ROOT or data_dir).
allowed_paths: Additional allowed paths. If None, uses get_allowed_paths()
which includes ~/.hive/ by default.
Returns:
A path resolver function that raises ValueError for paths outside allowed scopes.
The resolver:
- Resolves relative paths against base_root
- Allows absolute paths under base_root or any allowed_path
- Blocks access outside allowed scopes with a helpful error message
"""
hive_dir = os.path.expanduser("~/.hive")
def resolve(path: str) -> str:
# Normalize slashes for cross-platform
path = path.replace("/", os.sep)
# Expand ~ to home directory
if path.startswith("~"):
path = os.path.expanduser(path)
# Resolve to absolute path
if os.path.isabs(path):
resolved = os.path.abspath(path)
else:
resolved = os.path.abspath(os.path.join(base_root, path))
# Build allowed paths list
extra_paths = allowed_paths if allowed_paths is not None else get_allowed_paths()
all_allowed = [os.path.abspath(p) for p in [base_root, *extra_paths]]  # normalize so the commonpath() comparison matches exactly
# Check against all allowed paths
for allowed_path in all_allowed:
try:
if os.path.commonpath([resolved, allowed_path]) == allowed_path:
return resolved
except ValueError:
continue
# Block and remind
allowed_str = ", ".join(f"'{p}'" for p in all_allowed)
raise ValueError(
f"Access denied: '{path}' is not accessible. "
f"Allowed paths: {allowed_str}"
)
return resolve
# ── Private helpers ───────────────────────────────────────────────────────
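The containment check used by `create_sandboxed_resolver` can be exercised in isolation. The following is a minimal sketch under stated assumptions, not the code from this commit: `is_within` is a hypothetical helper, and it normalizes with `os.path.realpath` (an assumption; the resolver above uses `os.path.abspath`) so that symlinks as well as `..` segments are resolved before the comparison.

```python
import os
import tempfile


def is_within(candidate: str, root: str) -> bool:
    """Return True if candidate is root itself or lives underneath it.

    Hypothetical helper for illustration only. Both paths are normalized
    with realpath so '..' segments and symlinks cannot bypass the check.
    """
    candidate = os.path.realpath(candidate)
    root = os.path.realpath(root)
    try:
        return os.path.commonpath([candidate, root]) == root
    except ValueError:
        # Raised on Windows when the two paths are on different drives.
        return False


with tempfile.TemporaryDirectory() as sandbox:
    inside = os.path.join(sandbox, "reports", "out.txt")
    escaped = os.path.join(sandbox, "..", "elsewhere.txt")
    print(is_within(inside, sandbox))   # path under the sandbox
    print(is_within(escaped, sandbox))  # path that climbs out via '..'
```

Comparing against a normalized root matters because `os.path.commonpath` returns a normalized path; a root passed with a trailing separator or as a relative path would never compare equal to it.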
@@ -1,377 +1,295 @@
"""
Data Tools - Load, save, and list data files for agent pipelines.
Data Tools - Read and write files with data_dir sandboxing.
These tools let agents store large intermediate results in files and
retrieve them with pagination, keeping the LLM conversation context small.
Used in conjunction with the spillover system: when a tool result is too
large, the framework writes it to a file and the agent can load it back
with load_data().
These tools let agents read and write files within their session's data directory
and access files in ~/.hive/ for cross-agent file sharing.
Uses context injection for data_dir - the parameter is auto-injected by the
framework and doesn't need to be provided by the LLM.
"""
from __future__ import annotations
import os
from pathlib import Path
from mcp.server.fastmcp import FastMCP
from aden_tools.credentials.browser import open_browser
# ~/.hive/ is always allowed for cross-agent file access
HIVE_DIR = os.path.expanduser("~/.hive")
def _resolve_path(path: str, data_dir: str | None) -> str:
"""Resolve and validate a path against the allowed directories.
Args:
path: The path to resolve (can be relative or absolute)
data_dir: The session's data directory from context
Returns:
The resolved absolute path
Raises:
ValueError: If path is outside allowed directories
"""
if not data_dir:
raise ValueError("data_dir is not configured")
# Normalize path
path = path.replace("/", os.sep)
# Expand ~ to home directory
if path.startswith("~"):
path = os.path.expanduser(path)
# Resolve to absolute path
if os.path.isabs(path):
resolved = os.path.abspath(path)
else:
resolved = os.path.abspath(os.path.join(data_dir, path))
# Check against allowed paths
allowed_paths = [os.path.abspath(data_dir), HIVE_DIR]
for allowed in allowed_paths:
try:
if os.path.commonpath([resolved, allowed]) == allowed:
return resolved
except ValueError:
continue
# Block and remind
allowed_str = ", ".join(f"'{p}'" for p in allowed_paths)
raise ValueError(
f"Access denied: '{path}' is not accessible. "
f"Allowed paths: {allowed_str}"
)
def register_tools(mcp: FastMCP) -> None:
"""Register data management tools with the MCP server."""
"""Register file management tools with the MCP server."""
@mcp.tool()
def save_data(filename: str, data: str, data_dir: str) -> dict:
"""
Purpose
Save data to a file for later retrieval by this or downstream nodes.
def read_file(
path: str,
offset: int = 1,
limit: int = 0,
data_dir: str = "",
) -> str:
"""Read file contents with line numbers.
When to use
Store large results (search results, profiles, analysis) instead
of passing them inline through set_output.
Returns a brief summary with the filename to reference later.
Rules & Constraints
filename must be a simple name like 'results.json' no paths or '..'
data_dir must be the absolute path to the data directory
Files are read from the session's data directory or ~/.hive/.
Large files are automatically truncated at 2000 lines or 50KB.
Use offset and limit to paginate through large files.
Args:
filename: Simple filename like 'github_users.json'. No paths or '..'.
data: The string data to write (typically JSON).
data_dir: Absolute path to the data directory.
Returns:
Dict with success status and file metadata, or error dict
path: File path to read. Can be relative to data_dir or absolute.
offset: Starting line number, 1-indexed (default: 1).
limit: Max lines to return, 0 = up to 2000 (default: 0).
data_dir: Auto-injected - the session's data directory.
"""
if not filename or ".." in filename or "/" in filename or "\\" in filename:
return {"error": "Invalid filename. Use simple names like 'users.json'"}
if not data_dir:
return {"error": "data_dir is required"}
try:
resolved = _resolve_path(path, data_dir)
except ValueError as e:
return f"Error: {e}"
if os.path.isdir(resolved):
entries = []
for entry in sorted(os.listdir(resolved)):
full = os.path.join(resolved, entry)
suffix = "/" if os.path.isdir(full) else ""
entries.append(f" {entry}{suffix}")
total = len(entries)
return f"Directory: {path} ({total} entries)\n" + "\n".join(entries[:200])
if not os.path.isfile(resolved):
return f"Error: File not found: {path}"
# Check for binary files
try:
with open(resolved, "rb") as f:
chunk = f.read(4096)
if b"\x00" in chunk:
size = os.path.getsize(resolved)
return f"Binary file: {path} ({size:,} bytes). Cannot display binary content."
except OSError:
pass
try:
dir_path = Path(data_dir)
dir_path.mkdir(parents=True, exist_ok=True)
path = dir_path / filename
path.write_text(data, encoding="utf-8")
lines = data.count("\n") + 1
return {
"success": True,
"filename": filename,
"size_bytes": len(data.encode("utf-8")),
"lines": lines,
"preview": data[:200] + ("..." if len(data) > 200 else ""),
}
except Exception as e:
return {"error": f"Failed to save data: {str(e)}"}
with open(resolved, encoding="utf-8", errors="replace") as f:
content = f.read()
@mcp.tool()
def load_data(
filename: str,
data_dir: str,
offset_bytes: int = 0,
limit_bytes: int = 10000,
) -> dict:
"""
Purpose
Load data from a previously saved file with byte-based pagination.
Efficient for files of any size (1 byte to 1 TB).
Automatically detects safe UTF-8 boundaries to prevent character splitting.
all_lines = content.splitlines()
total_lines = len(all_lines)
start_idx = max(0, offset - 1)
effective_limit = limit if limit > 0 else 2000
end_idx = min(start_idx + effective_limit, total_lines)
When to use
Retrieve large tool results that were spilled to disk.
Read data saved by save_data or by the spillover system.
Page through large files without loading everything into context.
max_bytes = 50 * 1024
output_lines = []
byte_count = 0
Rules & Constraints
filename must match a file in data_dir
Uses byte offsets for O(1) seeking (works with huge files)
Automatically trims to valid UTF-8 character boundaries
Returns exactly limit_bytes or less (rounded to safe boundary)
for i in range(start_idx, end_idx):
line = all_lines[i]
if len(line) > 2000:
line = line[:2000] + "..."
formatted = f"{i + 1:>6}\t{line}"
line_bytes = len(formatted.encode("utf-8")) + 1
if byte_count + line_bytes > max_bytes:
break
output_lines.append(formatted)
byte_count += line_bytes
Args:
filename: The filename to load (as shown in spillover messages or save_data results).
data_dir: Absolute path to the data directory.
offset_bytes: Byte offset to start reading from. Default 0.
limit_bytes: Max number of bytes to return. Default 10000 (10KB).
result = "\n".join(output_lines)
lines_shown = len(output_lines)
actual_end = start_idx + lines_shown
Returns:
Dict with content, pagination info, and metadata
Examples:
load_data('emails.jsonl', '/data') # first 10KB
load_data('emails.jsonl', '/data', offset_bytes=10000) # next 10KB
load_data('large.txt', '/data', limit_bytes=50000) # first 50KB
"""
if not filename or ".." in filename or "/" in filename or "\\" in filename:
return {"error": "Invalid filename"}
if not data_dir:
return {"error": "data_dir is required"}
try:
offset_bytes = int(offset_bytes)
limit_bytes = int(limit_bytes)
path = Path(data_dir) / filename
if not path.exists():
return {"error": f"File not found: {filename}"}
file_size = path.stat().st_size
# Handle edge case: offset beyond file size
if offset_bytes >= file_size:
return {
"success": True,
"filename": filename,
"content": "",
"offset_bytes": offset_bytes,
"bytes_read": 0,
"next_offset_bytes": file_size,
"file_size_bytes": file_size,
"has_more": False,
}
with open(path, "rb") as f:
# O(1) seek to byte offset
f.seek(offset_bytes)
# Read exactly limit_bytes
raw_bytes = f.read(limit_bytes)
# Trim to valid UTF-8 boundary
# Scan backwards max 4 bytes to find valid UTF-8 start
chunk = raw_bytes
text = None
for i in range(min(4, len(raw_bytes)) + 1):
try:
slice_end = len(raw_bytes) - i if i > 0 else len(raw_bytes)
text = raw_bytes[:slice_end].decode("utf-8")
chunk = raw_bytes[:slice_end]
break
except UnicodeDecodeError:
continue
# If we couldn't decode at all, return error
if text is None:
return {"error": "Could not decode file as UTF-8"}
# UTF-8 boundary is already handled above
next_offset = offset_bytes + len(chunk)
return {
"success": True,
"filename": filename,
"content": text,
"offset_bytes": offset_bytes,
"bytes_read": len(chunk),
"next_offset_bytes": next_offset,
"file_size_bytes": file_size,
"has_more": next_offset < file_size,
}
except Exception as e:
return {"error": f"Failed to load data: {str(e)}"}
@mcp.tool()
def serve_file_to_user(
filename: str, data_dir: str, label: str = "", open_in_browser: bool = False
) -> dict:
"""
Purpose
Resolve a sandboxed file path to a fully qualified file URI
that the user can click to open in their system viewer.
When to use
After saving a file (HTML report, CSV export, etc.) with save_data,
call this to give the user a clickable link to open it.
The TUI will render the file:// URI as a clickable link.
Set open_in_browser=True to also auto-open the file in the
user's default browser.
Rules & Constraints
filename must be a simple name no paths or '..'
The file must already exist in data_dir
Returns a file:// URI the agent should include in its response
Args:
filename: The filename to serve (must exist in data_dir).
data_dir: Absolute path to the data directory.
label: Optional display label (defaults to filename).
open_in_browser: If True, auto-open the file in the default browser.
Returns:
Dict with file_uri, file_path, label, and optionally browser_opened
"""
if not filename or ".." in filename or "/" in filename or "\\" in filename:
return {"error": "Invalid filename. Use simple names like 'report.html'"}
if not data_dir:
return {"error": "data_dir is required"}
try:
path = Path(data_dir) / filename
if not path.exists():
return {"error": f"File not found: {filename}"}
full_path = str(path.resolve())
file_uri = f"file://{full_path}"
result = {
"success": True,
"file_uri": file_uri,
"file_path": full_path,
"label": label or filename,
}
if open_in_browser:
opened, msg = open_browser(file_uri)
result["browser_opened"] = opened
result["browser_message"] = msg
if actual_end < total_lines:
result += f"\n\n(Showing lines {start_idx + 1}-{actual_end} of {total_lines}. Use offset={actual_end + 1} to continue reading.)"
return result
except Exception as e:
return {"error": f"Failed to serve file: {str(e)}"}
return f"Error reading file: {e}"
@mcp.tool()
def list_data_files(data_dir: str) -> dict:
"""
Purpose
List all data files in the data directory.
def write_file(
path: str,
content: str,
data_dir: str = "",
) -> str:
"""Create or overwrite a file with the given content.
When to use
Discover what intermediate results or spillover files are available.
Check what data was saved by previous nodes in the pipeline.
Automatically creates parent directories. Files are written to
the session's data directory or ~/.hive/.
Args:
data_dir: Absolute path to the data directory.
Returns:
Dict with list of files and their sizes
path: File path to write. Can be relative to data_dir or absolute.
content: Complete file content to write.
data_dir: Auto-injected - the session's data directory.
"""
if not data_dir:
return {"error": "data_dir is required"}
try:
resolved = _resolve_path(path, data_dir)
except ValueError as e:
return f"Error: {e}"
try:
dir_path = Path(data_dir)
if not dir_path.exists():
return {"files": []}
resolved_path = Path(resolved)
resolved_path.parent.mkdir(parents=True, exist_ok=True)
files = []
for f in sorted(dir_path.iterdir()):
if f.is_file():
files.append(
{
"filename": f.name,
"size_bytes": f.stat().st_size,
}
)
return {"files": files}
existed = resolved_path.is_file()
content_str = content if content is not None else ""
with open(resolved_path, "w", encoding="utf-8") as f:
f.write(content_str)
f.flush()
os.fsync(f.fileno())
line_count = content_str.count("\n") + (
1 if content_str and not content_str.endswith("\n") else 0
)
action = "Updated" if existed else "Created"
return f"{action} {path} ({len(content_str.encode('utf-8')):,} bytes, {line_count} lines)"
except Exception as e:
return {"error": f"Failed to list data files: {str(e)}"}
return f"Error writing file: {e}"
@mcp.tool()
def append_data(filename: str, data: str, data_dir: str) -> dict:
"""
Purpose
Append data to the end of an existing file, or create it if it
doesn't exist yet.
def list_files(
path: str = ".",
recursive: bool = False,
data_dir: str = "",
) -> str:
"""List directory contents with type indicators.
When to use
Build large files incrementally instead of writing everything in
one save_data call. For example, write an HTML skeleton first,
then append each section separately to stay within token limits.
Rules & Constraints
filename must be a simple name like 'report.html' no paths or '..'
Directories have a / suffix. Hidden files and common build directories
are skipped.
Args:
filename: Simple filename to append to. No paths or '..'.
data: The string data to append.
data_dir: Absolute path to the data directory.
Returns:
Dict with success status, new total size, and bytes appended
path: Directory path (default: data_dir).
recursive: List recursively (default: false).
data_dir: Auto-injected - the session's data directory.
"""
if not filename or ".." in filename or "/" in filename or "\\" in filename:
return {"error": "Invalid filename. Use simple names like 'report.html'"}
if not data_dir:
return {"error": "data_dir is required"}
try:
resolved = _resolve_path(path, data_dir)
except ValueError as e:
return f"Error: {e}"
if not os.path.isdir(resolved):
return f"Error: Directory not found: {path}"
try:
dir_path = Path(data_dir)
dir_path.mkdir(parents=True, exist_ok=True)
path = dir_path / filename
with open(path, "a", encoding="utf-8") as f:
f.write(data)
appended_bytes = len(data.encode("utf-8"))
total_bytes = path.stat().st_size
return {
"success": True,
"filename": filename,
"size_bytes": total_bytes,
"appended_bytes": appended_bytes,
}
skip = {".git", "__pycache__", "node_modules", ".venv", ".tox"}
entries: list[str] = []
if recursive:
for root, dirs, files in os.walk(resolved):
dirs[:] = sorted(d for d in dirs if d not in skip and not d.startswith("."))
rel_root = os.path.relpath(root, resolved)
if rel_root == ".":
rel_root = ""
for f in sorted(files):
if f.startswith("."):
continue
entries.append(os.path.join(rel_root, f) if rel_root else f)
if len(entries) >= 500:
entries.append("... (truncated at 500 entries)")
return "\n".join(entries)
else:
for entry in sorted(os.listdir(resolved)):
if entry.startswith(".") or entry in skip:
continue
full = os.path.join(resolved, entry)
suffix = "/" if os.path.isdir(full) else ""
entries.append(f"{entry}{suffix}")
return "\n".join(entries) if entries else "(empty directory)"
except Exception as e:
return {"error": f"Failed to append data: {str(e)}"}
return f"Error listing directory: {e}"
@mcp.tool()
def edit_data(filename: str, old_text: str, new_text: str, data_dir: str) -> dict:
"""
Purpose
Find and replace a specific text segment in an existing file.
Works like a surgical diff only the matched portion changes.
def search_files(
pattern: str,
path: str = ".",
data_dir: str = "",
) -> str:
"""Search file contents using regex.
When to use
Update a section of a previously saved file without rewriting
the entire content. For example, replace a placeholder in an
HTML report or fix a specific paragraph.
Rules & Constraints
old_text must appear exactly once in the file. If it appears
zero times or more than once, the edit is rejected with an
error message.
Results sorted by file with line numbers. Searches within
the session's data directory or ~/.hive/.
Args:
filename: The file to edit. Must exist in data_dir.
old_text: The exact text to find (must match exactly once).
new_text: The replacement text.
data_dir: Absolute path to the data directory.
Returns:
Dict with success status and updated file size
pattern: Regex pattern to search for.
path: Directory path to search (default: data_dir).
data_dir: Auto-injected - the session's data directory.
"""
if not filename or ".." in filename or "/" in filename or "\\" in filename:
return {"error": "Invalid filename. Use simple names like 'report.html'"}
if not data_dir:
return {"error": "data_dir is required"}
import re
try:
path = Path(data_dir) / filename
if not path.exists():
return {"error": f"File not found: {filename}"}
resolved = _resolve_path(path, data_dir)
except ValueError as e:
return f"Error: {e}"
content = path.read_text(encoding="utf-8")
count = content.count(old_text)
if not os.path.isdir(resolved):
return f"Error: Directory not found: {path}"
if count == 0:
return {
"error": (
"old_text not found in the file. "
"Make sure you're matching the exact text, "
"including whitespace and newlines."
)
}
if count > 1:
return {
"error": (
f"old_text found {count} times — it must be unique. "
"Include more surrounding context to match exactly once."
)
}
try:
compiled = re.compile(pattern)
matches: list[str] = []
skip_dirs = {".git", "__pycache__", "node_modules", ".venv"}
updated = content.replace(old_text, new_text, 1)
path.write_text(updated, encoding="utf-8")
for root, dirs, files in os.walk(resolved):
dirs[:] = [d for d in dirs if d not in skip_dirs]
for fname in files:
fpath = os.path.join(root, fname)
display_path = os.path.relpath(fpath, resolved)
try:
with open(fpath, encoding="utf-8", errors="ignore") as f:
for i, line in enumerate(f, 1):
stripped = line.rstrip()
if compiled.search(stripped):
matches.append(f"{display_path}:{i}:{stripped[:2000]}")
if len(matches) >= 100:
return "\n".join(matches) + "\n... (truncated)"
except (OSError, UnicodeDecodeError):
continue
return {
"success": True,
"filename": filename,
"size_bytes": len(updated.encode("utf-8")),
"replacements": 1,
}
except Exception as e:
return {"error": f"Failed to edit data: {str(e)}"}
return "\n".join(matches) if matches else "No matches found."
except re.error as e:
return f"Error: Invalid regex: {e}"
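The line-window logic in `read_file` (1-indexed offset, a default cap of 2000 lines, and a 50KB output budget) can be sketched on its own. This is an illustrative reduction, not the tool's implementation: `paginate_lines` and its parameter names are hypothetical, and it omits the per-line truncation and file handling that the real tool performs.

```python
def paginate_lines(text: str, offset: int = 1, limit: int = 0,
                   max_lines: int = 2000, max_bytes: int = 50 * 1024):
    """Return (numbered_lines, next_offset) for a 1-indexed line window.

    next_offset is None when the window reaches the end of the text.
    Hypothetical helper mirroring the offset/limit idea of read_file.
    """
    all_lines = text.splitlines()
    start = max(0, offset - 1)
    window = limit if limit > 0 else max_lines
    end = min(start + window, len(all_lines))

    out = []
    used = 0
    for i in range(start, end):
        formatted = f"{i + 1:>6}\t{all_lines[i]}"
        cost = len(formatted.encode("utf-8")) + 1  # +1 for the joining newline
        if used + cost > max_bytes:
            break  # stop before exceeding the byte budget
        out.append(formatted)
        used += cost

    shown_end = start + len(out)
    next_offset = shown_end + 1 if shown_end < len(all_lines) else None
    return out, next_offset


sample = "\n".join(f"row {n}" for n in range(1, 6))
page, nxt = paginate_lines(sample, offset=1, limit=2)
print(page, nxt)
page, nxt = paginate_lines(sample, offset=nxt, limit=10)
print(page, nxt)
```

Returning the next offset explicitly plays the same role as the "Use offset=N to continue reading" hint that `read_file` appends to truncated output.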
+614 -116
@@ -78,8 +78,20 @@ class BeelineBridge:
return
try:
self._server = await websockets.serve(self._handle_connection, "127.0.0.1", port)
logger.info("Beeline bridge listening on ws://127.0.0.1:%d/bridge", port)
# Suppress noisy websockets logging for invalid upgrade attempts
# by providing a null logger
import logging
null_logger = logging.getLogger("websockets.null")
null_logger.setLevel(logging.CRITICAL)
null_logger.addHandler(logging.NullHandler())
self._server = await websockets.serve(
self._handle_connection,
"127.0.0.1",
port,
logger=null_logger,
)
logger.info("Beeline bridge listening on ws://127.0.0.1:%d", port)
except OSError as e:
logger.warning("Beeline bridge could not start on port %d: %s", port, e)
@@ -170,6 +182,21 @@ class BeelineBridge:
log_cdp_command(tab_id, method, params, error=str(e), duration_ms=duration_ms)
raise
async def _try_enable_domain(self, tab_id: int, domain: str) -> None:
"""Try to enable a CDP domain, ignoring errors if not available.
Some domains (like Input) may not be available on certain page types
(e.g., chrome:// URLs, extension pages, or restricted sites).
"""
try:
await self._cdp(tab_id, f"{domain}.enable")
except RuntimeError as e:
# Log but don't fail - domain may not be available on all pages
if "wasn't found" in str(e) or "not found" in str(e).lower():
logger.debug("CDP domain %s.enable not available for tab %s", domain, tab_id)
else:
raise
# ── Context (Tab Group) Management ─────────────────────────────────────────
async def create_context(self, agent_id: str) -> dict:
@@ -374,12 +401,15 @@ class BeelineBridge:
) -> dict:
"""Click an element by selector.
Uses DOM.getDocument + DOM.querySelector to find the element,
then DOM.getBoxModel to get coordinates, then Input.dispatchMouseEvent.
Uses multiple fallback methods for robustness:
1. JavaScript click() on the element (tried first)
2. CDP mouse events at JavaScript-computed bounds as fallback
Inspired by browser-use's robust click implementation.
"""
await self.cdp_attach(tab_id)
await self._cdp(tab_id, "DOM.enable")
await self._cdp(tab_id, "Input.enable")
await self._try_enable_domain(tab_id, "DOM")
await self._try_enable_domain(tab_id, "Input")
# Get document and find element
doc = await self._cdp(tab_id, "DOM.getDocument")
@@ -400,56 +430,172 @@ class BeelineBridge:
if not node_id:
return {"ok": False, "error": f"Element not found: {selector}"}
# Get box model for coordinates
box = await self._cdp(tab_id, "DOM.getBoxModel", {"nodeId": node_id})
content = box.get("content", [])
if len(content) < 4:
# Scroll into view FIRST to ensure element is rendered
try:
await self._cdp(
tab_id,
"DOM.scrollIntoViewIfNeeded",
{"nodeId": node_id},
)
await asyncio.sleep(0.05) # Wait for scroll to complete
except Exception:
pass # Best effort - continue even if scroll fails
# Get viewport dimensions for bounds checking
viewport_script = """
(function() {
return {
width: window.innerWidth,
height: window.innerHeight
};
})();
"""
viewport_result = await self.evaluate(tab_id, viewport_script)
viewport = viewport_result.get("result", {}).get("value", {})
viewport_width = viewport.get("width", 1920)
viewport_height = viewport.get("height", 1080)
# Method 1: Use JavaScript to get element bounds and click
# This is more reliable than CDP for complex layouts
click_script = f"""
(function() {{
const el = document.querySelector({json.dumps(selector)});
if (!el) return {{ error: 'Element not found' }};
// Check if element is visible
const rect = el.getBoundingClientRect();
if (rect.width === 0 || rect.height === 0) {{
return {{ error: 'Element has zero dimensions' }};
}}
// Check if element is within viewport
if (rect.bottom < 0 || rect.top > {viewport_height} ||
rect.right < 0 || rect.left > {viewport_width}) {{
return {{ error: 'Element not in viewport' }};
}}
// Get center for metadata
const x = rect.x + rect.width / 2;
const y = rect.y + rect.height / 2;
// Perform the click
el.click();
return {{ x: x, y: y, width: rect.width, height: rect.height }};
}})();
"""
try:
result = await self.evaluate(tab_id, click_script)
value = result.get("result", {}).get("value")
if isinstance(value, dict) and "error" not in value:
# JavaScript click succeeded
return {
"ok": True,
"action": "click",
"selector": selector,
"x": value.get("x", 0),
"y": value.get("y", 0),
"method": "javascript"
}
# If JavaScript click failed, try CDP approach
if isinstance(value, dict) and value.get("error"):
logger.debug("JS click failed: %s, trying CDP", value["error"])
except Exception as e:
logger.debug("JS click exception: %s, trying CDP", e)
# Method 2: CDP mouse events (fallback)
# Get element bounds via JavaScript (more reliable than CDP getBoxModel)
bounds_script = f"""
(function() {{
const el = document.querySelector({json.dumps(selector)});
if (!el) return null;
const rect = el.getBoundingClientRect();
return {{
x: rect.x + rect.width / 2,
y: rect.y + rect.height / 2,
width: rect.width,
height: rect.height
}};
}})();
"""
bounds_result = await self.evaluate(tab_id, bounds_script)
bounds_value = bounds_result.get("result", {}).get("value")
if not bounds_value:
return {"ok": False, "error": f"Could not get element bounds: {selector}"}
# Calculate center of element (content quad is [x1,y1, x2,y2, x3,y3, x4,y4])
x = (content[0] + content[2] + content[4] + content[6]) / 4
y = (content[1] + content[3] + content[5] + content[7]) / 4
x = bounds_value.get("x", 0)
y = bounds_value.get("y", 0)
# Scroll into view first
await self._cdp(
tab_id,
"DOM.scrollIntoViewIfNeeded",
{"nodeId": node_id},
)
# Clamp coordinates to viewport bounds
x = max(0, min(viewport_width - 1, x))
y = max(0, min(viewport_height - 1, y))
# Dispatch mouse events
# Dispatch mouse events with proper timing
button_map = {"left": "left", "right": "right", "middle": "middle"}
cdp_button = button_map.get(button, "left")
await self._cdp(
tab_id,
"Input.dispatchMouseEvent",
{
"type": "mousePressed",
"x": x,
"y": y,
"button": cdp_button,
"clickCount": click_count,
},
)
await self._cdp(
tab_id,
"Input.dispatchMouseEvent",
{
"type": "mouseReleased",
"x": x,
"y": y,
"button": cdp_button,
"clickCount": click_count,
},
)
try:
# Move mouse to element first
await self._cdp(
tab_id,
"Input.dispatchMouseEvent",
{"type": "mouseMoved", "x": x, "y": y},
)
await asyncio.sleep(0.05)
return {"ok": True, "action": "click", "selector": selector, "x": x, "y": y}
# Mouse down
try:
await asyncio.wait_for(
self._cdp(
tab_id,
"Input.dispatchMouseEvent",
{
"type": "mousePressed",
"x": x,
"y": y,
"button": cdp_button,
"clickCount": click_count,
},
),
timeout=1.0,
)
except asyncio.TimeoutError:
pass # Continue even if timeout
await asyncio.sleep(0.08)
# Mouse up
try:
await asyncio.wait_for(
self._cdp(
tab_id,
"Input.dispatchMouseEvent",
{
"type": "mouseReleased",
"x": x,
"y": y,
"button": cdp_button,
"clickCount": click_count,
},
),
timeout=3.0,
)
except asyncio.TimeoutError:
pass # Continue even if timeout
return {"ok": True, "action": "click", "selector": selector, "x": x, "y": y, "method": "cdp"}
except Exception as e:
return {"ok": False, "error": f"Click failed: {e}"}
async def click_coordinate(self, tab_id: int, x: float, y: float, button: str = "left") -> dict:
"""Click at specific coordinates."""
await self.cdp_attach(tab_id)
await self._cdp(tab_id, "Input.enable")
await self._try_enable_domain(tab_id, "Input")
button_map = {"left": "left", "right": "right", "middle": "middle"}
cdp_button = button_map.get(button, "left")
@@ -476,44 +622,59 @@ class BeelineBridge:
delay_ms: int = 0,
timeout_ms: int = 30000,
) -> dict:
"""Type text into an element."""
"""Type text into an element.
Uses JavaScript focus for reliability, then CDP key events.
"""
await self.cdp_attach(tab_id)
await self._cdp(tab_id, "DOM.enable")
await self._cdp(tab_id, "Input.enable")
await self._try_enable_domain(tab_id, "DOM")
await self._try_enable_domain(tab_id, "Input")
await self._try_enable_domain(tab_id, "Runtime")
# Get document and find element
doc = await self._cdp(tab_id, "DOM.getDocument")
root_id = doc.get("root", {}).get("nodeId")
# First, scroll into view and focus via JavaScript (more reliable than CDP)
focus_script = f"""
(function() {{
const el = document.querySelector({json.dumps(selector)});
if (!el) return false;
deadline = asyncio.get_event_loop().time() + timeout_ms / 1000
node_id = None
while asyncio.get_event_loop().time() < deadline:
result = await self._cdp(
tab_id, "DOM.querySelector", {"nodeId": root_id, "selector": selector}
)
node_id = result.get("nodeId")
if node_id:
break
await asyncio.sleep(0.1)
// Scroll into view
el.scrollIntoView({{ block: 'center' }});
if not node_id:
return {"ok": False, "error": f"Element not found: {selector}"}
// Focus the element
el.focus();
# Focus the element
await self._cdp(tab_id, "DOM.focus", {"nodeId": node_id})
// Clear if requested
if ({str(clear_first).lower()}) {{
if (el.value !== undefined) {{
el.value = '';
}} else if (el.isContentEditable) {{
el.textContent = '';
}}
}}
# Clear if requested
if clear_first:
await self._cdp(
tab_id,
"Runtime.evaluate",
{
"expression": f"document.querySelector({json.dumps(selector)}).value = ''",
"returnByValue": True,
},
)
return true;
}})();
"""
# Type each character
focus_result = await self.evaluate(tab_id, focus_script)
success = focus_result.get("result", {}).get("value", False)
if not success:
# Element not found - wait and retry
deadline = asyncio.get_event_loop().time() + timeout_ms / 1000
while asyncio.get_event_loop().time() < deadline:
result = await self.evaluate(tab_id, focus_script)
if result.get("result", {}).get("value", False):
success = True
break
await asyncio.sleep(0.1)
if not success:
return {"ok": False, "error": f"Element not found: {selector}"}
await asyncio.sleep(0.05) # Wait for focus to take effect
# Type each character using CDP key events
for char in text:
# Dispatch key down
await self._cdp(
@@ -540,7 +701,7 @@ class BeelineBridge:
selector: Optional selector to focus first
"""
await self.cdp_attach(tab_id)
await self._cdp(tab_id, "Input.enable")
await self._try_enable_domain(tab_id, "Input")
if selector:
doc = await self._cdp(tab_id, "DOM.getDocument")
@@ -585,43 +746,73 @@ class BeelineBridge:
return {"ok": True, "action": "press", "key": key}
async def hover(self, tab_id: int, selector: str, timeout_ms: int = 30000) -> dict:
"""Hover over an element."""
"""Hover over an element.
Uses JavaScript for bounds (more reliable than CDP getBoxModel).
"""
await self.cdp_attach(tab_id)
await self._cdp(tab_id, "DOM.enable")
await self._cdp(tab_id, "Input.enable")
await self._try_enable_domain(tab_id, "DOM")
await self._try_enable_domain(tab_id, "Input")
await self._try_enable_domain(tab_id, "Runtime")
doc = await self._cdp(tab_id, "DOM.getDocument")
root_id = doc.get("root", {}).get("nodeId")
# Use JavaScript to scroll into view and get bounds
hover_script = f"""
(function() {{
const el = document.querySelector({json.dumps(selector)});
if (!el) return null;
// Scroll into view
el.scrollIntoView({{ block: 'center' }});
const rect = el.getBoundingClientRect();
return {{
x: rect.x + rect.width / 2,
y: rect.y + rect.height / 2,
width: rect.width,
height: rect.height
}};
}})();
"""
# Wait for element and get bounds
deadline = asyncio.get_event_loop().time() + timeout_ms / 1000
node_id = None
bounds_value = None
while asyncio.get_event_loop().time() < deadline:
result = await self._cdp(
tab_id, "DOM.querySelector", {"nodeId": root_id, "selector": selector}
)
node_id = result.get("nodeId")
if node_id:
result = await self.evaluate(tab_id, hover_script)
bounds_value = result.get("result", {}).get("value")
if bounds_value:
break
await asyncio.sleep(0.1)
if not node_id:
if not bounds_value:
return {"ok": False, "error": f"Element not found: {selector}"}
box = await self._cdp(tab_id, "DOM.getBoxModel", {"nodeId": node_id})
content = box.get("content", [])
x = (content[0] + content[2] + content[4] + content[6]) / 4
y = (content[1] + content[3] + content[5] + content[7]) / 4
x = bounds_value.get("x", 0)
y = bounds_value.get("y", 0)
if x == 0 and y == 0:
return {"ok": False, "error": f"Element has zero dimensions: {selector}"}
await asyncio.sleep(0.05) # Wait for scroll
# Dispatch mouse moved event
await self._cdp(
tab_id,
"Input.dispatchMouseEvent",
{"type": "mouseMoved", "x": x, "y": y},
)
return {"ok": True, "action": "hover", "selector": selector}
return {"ok": True, "action": "hover", "selector": selector, "x": x, "y": y}
async def scroll(self, tab_id: int, direction: str = "down", amount: int = 500) -> dict:
"""Scroll the page."""
"""Scroll the page.
Uses multiple methods for robustness:
1. Find and scroll the largest scrollable container (handles SPAs like LinkedIn)
2. Fallback to window scroll
3. Fallback to mouse wheel events via CDP
"""
await self.cdp_attach(tab_id)
delta_x = 0
@@ -635,16 +826,159 @@ class BeelineBridge:
elif direction == "left":
delta_x = -amount
await self._cdp(
tab_id,
"Runtime.evaluate",
{
"expression": f"window.scrollBy({delta_x}, {delta_y})",
"returnByValue": True,
},
)
# Method 1: Find and scroll the largest scrollable container
# This handles SPAs like LinkedIn where content is in a nested scrollable div
smart_scroll_script = f"""
(function() {{
// Find the largest scrollable container
function findScrollableContainer() {{
const candidates = [];
return {"ok": True, "action": "scroll", "direction": direction, "amount": amount}
// Check all elements with overflow scroll/auto
const allElements = document.querySelectorAll('*');
for (const el of allElements) {{
const style = getComputedStyle(el);
const overflow = style.overflow + style.overflowY;
if (overflow.includes('scroll') || overflow.includes('auto')) {{
const rect = el.getBoundingClientRect();
// Must be visible and reasonably large
if (rect.width > 100 && rect.height > 100 &&
el.scrollHeight > el.clientHeight + 100) {{
candidates.push({{
el: el,
area: rect.width * rect.height,
scrollable: el.scrollHeight - el.clientHeight
}});
}}
}}
}}
// Sort by area (largest first) and return best candidate
candidates.sort((a, b) => b.area - a.area);
return candidates.length > 0 ? candidates[0].el : null;
}}
const container = findScrollableContainer();
if (container) {{
container.scrollBy({{
top: {delta_y},
left: {delta_x},
behavior: 'smooth'
}});
return {{
method: 'container-smooth',
success: true,
containerTag: container.tagName,
containerClass: container.className.substring(0, 50)
}};
}}
// Fallback to window scroll
if ('scrollBehavior' in document.documentElement.style) {{
window.scrollBy({{
top: {delta_y},
left: {delta_x},
behavior: 'smooth'
}});
return {{ method: 'window-smooth', success: true }};
}}
window.scrollBy({delta_x}, {delta_y});
return {{ method: 'window-instant', success: true }};
}})();
"""
try:
result = await self.evaluate(tab_id, smart_scroll_script)
value = result.get("result", {})
if value and value.get("success"):
return {
"ok": True,
"action": "scroll",
"direction": direction,
"amount": amount,
"method": value.get("method", "js"),
"container": value.get("containerTag", "window")
}
except Exception as e:
logger.debug("Smart scroll script failed: %s", e)
# Method 2: Find scrollable container and use mouse wheel at its center
try:
# Find the largest scrollable container and its position
find_container_script = """
(function() {
const candidates = [];
const allElements = document.querySelectorAll('*');
for (const el of allElements) {
const style = getComputedStyle(el);
const overflow = style.overflow + style.overflowY;
if (overflow.includes('scroll') || overflow.includes('auto')) {
const rect = el.getBoundingClientRect();
if (rect.width > 100 && rect.height > 100 &&
el.scrollHeight > el.clientHeight + 100) {
candidates.push({
x: Math.round(rect.left + rect.width / 2),
y: Math.round(rect.top + rect.height / 2),
area: rect.width * rect.height,
tag: el.tagName
});
}
}
}
candidates.sort((a, b) => b.area - a.area);
return candidates.length > 0 ? candidates[0] : null;
})();
"""
container_result = await self._cdp(
tab_id,
"Runtime.evaluate",
{"expression": find_container_script, "returnByValue": True},
)
container_info = container_result.get("result", {}).get("value", {})
if container_info and isinstance(container_info, dict):
x = container_info.get("x", 400)
y = container_info.get("y", 300)
else:
# Fallback to viewport center
viewport_result = await self._cdp(
tab_id,
"Runtime.evaluate",
{
"expression": "({w: window.innerWidth, h: window.innerHeight})",
"returnByValue": True,
},
)
vp = viewport_result.get("result", {}).get("value", {})
x = vp.get("w", 800) // 2
y = vp.get("h", 600) // 2
# Dispatch mouse wheel event at container center
await self._cdp(
tab_id,
"Input.dispatchMouseEvent",
{
"type": "mouseWheel",
"x": x,
"y": y,
# Positive deltaY scrolls down, matching the window.scrollBy sign convention
"deltaX": delta_x,
"deltaY": delta_y,
},
)
return {
"ok": True,
"action": "scroll",
"direction": direction,
"amount": amount,
"method": "mouseWheel",
"target": f"({x}, {y})"
}
except Exception as e:
logger.warning("Scroll failed: %s", e)
return {"ok": False, "error": str(e)}
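The direction handling and the wheel-event fallback above can be sketched in isolation. This is a minimal illustration, not the bridge's actual code: `scroll_deltas` and `wheel_event_params` are hypothetical helper names, the `ValueError` for unknown directions is an assumption, and the sketch assumes that a positive `deltaY` scrolls down, as with the DOM `WheelEvent`.

```python
from typing import Any


def scroll_deltas(direction: str, amount: int = 500) -> tuple[int, int]:
    """Map a direction name to (delta_x, delta_y) in CSS pixels (hypothetical helper)."""
    mapping = {
        "down": (0, amount),
        "up": (0, -amount),
        "right": (amount, 0),
        "left": (-amount, 0),
    }
    if direction not in mapping:
        # Assumption: reject unknown directions instead of silently not scrolling.
        raise ValueError(f"Unknown scroll direction: {direction}")
    return mapping[direction]


def wheel_event_params(x: int, y: int, delta_x: int, delta_y: int) -> dict[str, Any]:
    """Build the params dict for an Input.dispatchMouseEvent mouseWheel call."""
    return {"type": "mouseWheel", "x": x, "y": y, "deltaX": delta_x, "deltaY": delta_y}
```

Keeping the same sign for the JavaScript `scrollBy` path and the wheel-event path means a `"down"` request moves the page the same way regardless of which method ends up handling it.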
async def select_option(self, tab_id: int, selector: str, values: list[str]) -> dict:
"""Select options in a select element."""
@@ -675,6 +1009,8 @@ class BeelineBridge:
async def evaluate(self, tab_id: int, script: str) -> dict:
"""Execute JavaScript in the page."""
await self.cdp_attach(tab_id)
await self._try_enable_domain(tab_id, "Runtime")
# Wrap in IIFE to allow return statements at top level
wrapped_script = f"(function() {{ {script} }})()"
result = await self._cdp(
@@ -683,31 +1019,176 @@ class BeelineBridge:
{"expression": wrapped_script, "returnByValue": True, "awaitPromise": True},
)
if result is None:
return {"ok": False, "error": "CDP returned no result"}
if "exceptionDetails" in result:
return {
"ok": False,
"error": result["exceptionDetails"].get("text", "Script error"),
}
# The CDP response structure is {result: {type: ..., value: ...}}
# But our bridge returns just the inner result object
inner_result = result.get("result", {})
value = inner_result.get("value") if isinstance(inner_result, dict) else None
return {
"ok": True,
"action": "evaluate",
"result": result.get("result", {}).get("value"),
"result": value,
}
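The response handling in `evaluate` can be read as a small pure function. The sketch below mirrors the checks shown above under the assumption that the raw response has the shape `{"result": {"type": ..., "value": ...}}`; `unwrap_evaluate_response` is a hypothetical name and is not part of the bridge.

```python
from typing import Any, Optional


def unwrap_evaluate_response(result: Optional[dict[str, Any]]) -> dict[str, Any]:
    """Turn a raw Runtime.evaluate response into the bridge-style result dict."""
    if result is None:
        return {"ok": False, "error": "CDP returned no result"}
    if "exceptionDetails" in result:
        return {
            "ok": False,
            "error": result["exceptionDetails"].get("text", "Script error"),
        }
    inner = result.get("result", {})
    # Guard against a non-dict inner result before reading "value".
    value = inner.get("value") if isinstance(inner, dict) else None
    return {"ok": True, "action": "evaluate", "result": value}
```

Callers then read the script's return value directly from the `"result"` key, with no further `"value"` lookup.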
async def snapshot(self, tab_id: int) -> dict:
async def snapshot(self, tab_id: int, timeout_s: float = 10.0) -> dict:
"""Get an accessibility snapshot of the page.
Uses CDP Accessibility.getFullAXTree and formats it as a readable tree.
Uses a hybrid approach:
1. CDP Accessibility.getFullAXTree for semantic structure
2. DOM queries for visibility and computed styles
3. Falls back to a DOM-based snapshot if the accessibility tree is too large or mostly ignored nodes
Args:
tab_id: The tab ID to snapshot
timeout_s: Maximum time to spend building snapshot (default 10s)
"""
await self.cdp_attach(tab_id)
await self._cdp(tab_id, "Accessibility.enable")
async with asyncio.timeout(timeout_s):
await self.cdp_attach(tab_id)
await self._try_enable_domain(tab_id, "Accessibility")
await self._try_enable_domain(tab_id, "DOM")
await self._try_enable_domain(tab_id, "Runtime")
result = await self._cdp(tab_id, "Accessibility.getFullAXTree")
nodes = result.get("nodes", [])
# Try accessibility tree first
result = await self._cdp(tab_id, "Accessibility.getFullAXTree")
nodes = result.get("nodes", [])
# Format the tree
snapshot = self._format_ax_tree(nodes)
# Count non-ignored nodes
visible_count = sum(1 for n in nodes if not n.get("ignored", False))
# If tree is too large or mostly ignored, use DOM-based snapshot
if len(nodes) > 5000:
logger.debug(
"Accessibility tree too large (%d nodes), using DOM snapshot",
len(nodes),
)
return await self._dom_snapshot(tab_id)
if visible_count < 10 and len(nodes) > 50:
logger.debug(
"Accessibility tree has only %d/%d visible nodes, falling back to DOM snapshot",
visible_count,
len(nodes),
)
return await self._dom_snapshot(tab_id)
# Format the accessibility tree (with node limit)
snapshot = self._format_ax_tree(nodes, max_nodes=2000)
# Get URL
url_result = await self._cdp(
tab_id,
"Runtime.evaluate",
{"expression": "window.location.href", "returnByValue": True},
)
url = url_result.get("result", {}).get("value", "")
return {
"ok": True,
"tabId": tab_id,
"url": url,
"tree": snapshot,
}
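The fallback decision in `snapshot` can be isolated as a pure function over the node list. This is a sketch that reuses the thresholds shown above (5000 total nodes, fewer than 10 non-ignored nodes out of more than 50); `choose_snapshot_source` and its parameter names are hypothetical.

```python
from typing import Any


def choose_snapshot_source(
    nodes: list[dict[str, Any]],
    max_ax_nodes: int = 5000,
    min_visible: int = 10,
    min_total: int = 50,
) -> str:
    """Return "dom" when the accessibility tree is unusable, otherwise "ax"."""
    # Count nodes that the accessibility tree does not mark as ignored.
    visible_count = sum(1 for n in nodes if not n.get("ignored", False))
    if len(nodes) > max_ax_nodes:
        return "dom"  # Tree too large to format in reasonable time
    if visible_count < min_visible and len(nodes) > min_total:
        return "dom"  # Tree is mostly ignored nodes
    return "ax"
```

Separating the decision from the CDP calls makes the thresholds easy to unit-test without a browser.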
async def _dom_snapshot(self, tab_id: int) -> dict:
"""Fallback: build snapshot from DOM tree with visibility info."""
# Get all interactive elements using DOM queries
script = """
(function() {
const interactiveSelectors = [
'a', 'button', 'input', 'textarea', 'select', 'option',
'[onclick]', '[role="button"]', '[role="link"]',
'[contenteditable="true"]', 'summary', 'details',
'a[href]', 'button[type]', 'input[type]',
'label', 'form', 'nav', 'nav a', 'nav button',
'[aria-label]', '[aria-labelledby]', '[tabindex]',
'h1', 'h2', 'h3', 'h4', 'h5', 'h6'
].join(', ');
const elements = document.querySelectorAll(interactiveSelectors);
const results = [];
for (const el of elements) {
const rect = el.getBoundingClientRect();
const styles = window.getComputedStyle(el);
// Skip invisible elements
if (rect.width === 0 || rect.height === 0 ||
styles.display === 'none' ||
styles.visibility === 'hidden' ||
styles.opacity === '0') {
continue;
}
// Skip elements outside viewport
if (rect.bottom < 0 || rect.top > window.innerHeight ||
rect.right < 0 || rect.left > window.innerWidth) {
continue;
}
const tag = el.tagName.toLowerCase();
const text = (el.innerText || el.value || el.placeholder || el.getAttribute('aria-label') || '').substring(0, 80);
const type = el.type || tag;
const role = el.getAttribute('role') || tag;
const name = el.name || el.id || '';
const href = el.href || '';
const className = el.className || '';
results.push({
tag,
type,
role,
text: text.trim(),
name,
href,
className: className.split(' ').slice(0, 3).join(' '),
rect: {
x: Math.round(rect.x),
y: Math.round(rect.y),
width: Math.round(rect.width),
height: Math.round(rect.height)
}
});
}
return results;
})();
"""
result = await self.evaluate(tab_id, script)
elements = result.get("result", [])
if not elements:
return {
"ok": True,
"tabId": tab_id,
"tree": "(no visible interactive elements found)",
}
# Format as tree
lines = []
for i, el in enumerate(elements[:100]):
ref = f"e{i}"
tag = el.get("tag", "unknown")
text = el.get("text", "")
role = el.get("role", tag)
desc = f"{role}"
if text:
desc += f' "{text[:40]}"'
if el.get("href"):
desc += " [href]"
desc += f" [ref={ref}]"
lines.append(f" - {desc}")
# Get URL
url_result = await self._cdp(
@@ -715,17 +1196,22 @@ class BeelineBridge:
"Runtime.evaluate",
{"expression": "window.location.href", "returnByValue": True},
)
url = url_result.get("result", {}).get("result", {}).get("value", "")
url = url_result.get("result", {}).get("value", "")
return {
"ok": True,
"tabId": tab_id,
"url": url,
"snapshot": snapshot,
"tree": "\n".join(lines),
}
def _format_ax_tree(self, nodes: list[dict]) -> str:
"""Format a CDP Accessibility.getFullAXTree result."""
def _format_ax_tree(self, nodes: list[dict], max_nodes: int = 2000) -> str:
"""Format a CDP Accessibility.getFullAXTree result.
Args:
nodes: List of accessibility tree nodes
max_nodes: Maximum number of nodes to process (prevents hangs on huge trees)
"""
if not nodes:
return "(empty tree)"
@@ -737,9 +1223,14 @@ class BeelineBridge:
lines: list[str] = []
ref_counter = [0] # Use list to allow mutation in nested function
node_counter = [0] # Track total nodes processed
ref_map: dict[str, str] = {}
def _walk(node_id: str, depth: int) -> None:
# Stop if we've processed enough nodes
if node_counter[0] >= max_nodes:
return
node = by_id.get(node_id)
if not node:
return
@@ -760,6 +1251,8 @@ class BeelineBridge:
_walk(cid, depth)
return
node_counter[0] += 1
name_info = node.get("name", {})
name = name_info.get("value", "") if isinstance(name_info, dict) else str(name_info)
@@ -807,6 +1300,11 @@ class BeelineBridge:
_walk(cid, depth + 1)
_walk(nodes[0]["nodeId"], 0)
# Add truncation notice if we hit the limit
if node_counter[0] >= max_nodes:
lines.append("... (tree truncated, too many nodes)")
return "\n".join(lines) if lines else "(empty tree)"
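The node limit in `_format_ax_tree` can be illustrated with a much smaller walker. This is a simplified sketch, not the actual formatter: it assumes each node is a dict with `nodeId`, a plain-string `role`, and `childIds`, whereas real accessibility nodes carry more structure. It also sets an explicit `truncated` flag only when a node is actually skipped, which is a slightly different choice from comparing the counter against the limit after the walk.

```python
from typing import Any


def format_tree(nodes: list[dict[str, Any]], max_nodes: int = 2000) -> str:
    """Format a node list as an indented tree, stopping after max_nodes nodes."""
    if not nodes:
        return "(empty tree)"
    by_id = {n["nodeId"]: n for n in nodes}
    lines: list[str] = []
    processed = 0
    truncated = False

    def walk(node_id: str, depth: int) -> None:
        nonlocal processed, truncated
        if processed >= max_nodes:
            truncated = True  # At least one node was skipped
            return
        node = by_id.get(node_id)
        if node is None:
            return
        processed += 1
        lines.append("  " * depth + f"- {node.get('role', 'unknown')}")
        for child_id in node.get("childIds", []):
            walk(child_id, depth + 1)

    walk(nodes[0]["nodeId"], 0)
    if truncated:
        lines.append("... (tree truncated, too many nodes)")
    return "\n".join(lines)
```

With the flag, a tree that has exactly `max_nodes` nodes is printed in full without a truncation notice.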
async def get_text(self, tab_id: int, selector: str, timeout_ms: int = 30000) -> dict:
@@ -0,0 +1,878 @@
"""Comprehensive tests for browser tools with FastMCP fixtures.
Tests cover:
- Multiple subagents with multiple tab groups
- Complex script execution for LinkedIn, Twitter, YouTube
- Tab lifecycle management
- Navigation and interactions
- Error handling and edge cases
"""
from __future__ import annotations
import asyncio
import json
from collections.abc import Callable
from typing import Any
from unittest.mock import AsyncMock, MagicMock, patch
import pytest
from fastmcp import FastMCP
from gcu.browser.bridge import BeelineBridge
from gcu.browser.tools.advanced import register_advanced_tools
from gcu.browser.tools.inspection import register_inspection_tools
from gcu.browser.tools.interactions import register_interaction_tools
from gcu.browser.tools.lifecycle import register_lifecycle_tools
from gcu.browser.tools.navigation import register_navigation_tools
from gcu.browser.tools.tabs import register_tab_tools
# ─────────────────────────────────────────────────────────────────────────────
# Fixtures
# ─────────────────────────────────────────────────────────────────────────────
@pytest.fixture
def mcp() -> FastMCP:
"""Create a fresh FastMCP instance for testing."""
return FastMCP("test-browser-comprehensive")
@pytest.fixture
def mock_bridge() -> MagicMock:
"""Create a mock BeelineBridge with common methods pre-configured."""
bridge = MagicMock(spec=BeelineBridge)
bridge.is_connected = True
bridge._cdp_attached = set()
# Context management
bridge.create_context = AsyncMock(return_value={"groupId": 1, "tabId": 100})
bridge.destroy_context = AsyncMock(return_value={"ok": True})
# Tab management
bridge.create_tab = AsyncMock(return_value={"tabId": 101})
bridge.close_tab = AsyncMock(return_value={"ok": True})
bridge.list_tabs = AsyncMock(return_value={"tabs": []})
bridge.activate_tab = AsyncMock(return_value={"ok": True})
# Navigation
bridge.navigate = AsyncMock(return_value={"ok": True, "url": "https://example.com"})
bridge.go_back = AsyncMock(return_value={"ok": True})
bridge.go_forward = AsyncMock(return_value={"ok": True})
bridge.reload = AsyncMock(return_value={"ok": True})
# Interactions
bridge.click = AsyncMock(return_value={"ok": True})
bridge.click_coordinate = AsyncMock(return_value={"ok": True})
bridge.type_text = AsyncMock(return_value={"ok": True})
bridge.press_key = AsyncMock(return_value={"ok": True})
bridge.hover = AsyncMock(return_value={"ok": True})
bridge.scroll = AsyncMock(return_value={"ok": True})
bridge.select_option = AsyncMock(return_value={"ok": True, "selected": ["option1"]})
bridge.drag = AsyncMock(return_value={"ok": True})
# Inspection
bridge.evaluate = AsyncMock(return_value={"result": {"value": True}})
bridge.snapshot = AsyncMock(return_value={"tree": "mock_accessibility_tree"})
bridge.screenshot = AsyncMock(return_value={"data": "base64imagedata"})
bridge.get_text = AsyncMock(return_value={"text": "sample text"})
bridge.get_attribute = AsyncMock(return_value={"value": "attribute_value"})
# Advanced
bridge.wait_for_selector = AsyncMock(return_value={"ok": True})
bridge.wait_for_text = AsyncMock(return_value={"ok": True})
bridge.resize = AsyncMock(return_value={"ok": True})
bridge.upload_file = AsyncMock(return_value={"ok": True})
bridge.handle_dialog = AsyncMock(return_value={"ok": True})
bridge.cdp_attach = AsyncMock(return_value={"ok": True})
bridge.cdp_detach = AsyncMock(return_value={"ok": True})
return bridge
# ─────────────────────────────────────────────────────────────────────────────
# Test Classes
# ─────────────────────────────────────────────────────────────────────────────
class TestMultipleSubagentsTabGroups:
"""Tests for multiple subagents creating and managing multiple tab groups."""
@pytest.mark.asyncio
async def test_multiple_agents_create_separate_tab_groups(
self, mcp: FastMCP, mock_bridge: MagicMock
):
"""Multiple subagents should each create their own tab group."""
call_count = 0
async def mock_create_context(agent_id: str) -> dict:
nonlocal call_count
call_count += 1
return {"groupId": call_count, "tabId": 100 + call_count}
mock_bridge.create_context = AsyncMock(side_effect=mock_create_context)
# Register tools first
register_lifecycle_tools(mcp)
browser_start = mcp._tool_manager._tools["browser_start"].fn
# Now patch for execution
with patch("gcu.browser.tools.lifecycle.get_bridge", return_value=mock_bridge):
# Simulate 3 different subagents starting browsers
results = await asyncio.gather(
browser_start(profile="agent_1"),
browser_start(profile="agent_2"),
browser_start(profile="agent_3"),
)
# Each should have created a separate context
assert mock_bridge.create_context.call_count == 3
assert all(r.get("ok") for r in results)
@pytest.mark.asyncio
async def test_concurrent_tab_operations_different_groups(
self, mcp: FastMCP, mock_bridge: MagicMock
):
"""Tab operations in different groups should not interfere."""
group1_tabs = [
{"id": 101, "url": "https://site1.com", "title": "Site 1"},
{"id": 102, "url": "https://site2.com", "title": "Site 2"},
]
group2_tabs = [
{"id": 201, "url": "https://site3.com", "title": "Site 3"},
{"id": 202, "url": "https://site4.com", "title": "Site 4"},
]
def mock_list_tabs(group_id: int) -> dict:
if group_id == 1:
return {"tabs": group1_tabs}
elif group_id == 2:
return {"tabs": group2_tabs}
return {"tabs": []}
mock_bridge.list_tabs = AsyncMock(side_effect=mock_list_tabs)
register_tab_tools(mcp)
browser_tabs = mcp._tool_manager._tools["browser_tabs"].fn
with patch("gcu.browser.tools.tabs.get_bridge", return_value=mock_bridge):
with patch(
"gcu.browser.tools.tabs._get_context",
side_effect=lambda p: {
"groupId": 1 if p == "agent_1" else 2,
"activeTabId": 101 if p == "agent_1" else 201,
},
):
# Concurrent tab listing from different agents
results = await asyncio.gather(
browser_tabs(profile="agent_1"),
browser_tabs(profile="agent_2"),
)
# Each should see only their own tabs
assert len(results[0].get("tabs", [])) == 2
assert len(results[1].get("tabs", [])) == 2
assert results[0]["tabs"][0]["id"] == 101
assert results[1]["tabs"][0]["id"] == 201
@pytest.mark.asyncio
async def test_tab_group_isolation(self, mcp: FastMCP, mock_bridge: MagicMock):
"""Closing a tab in one group should not affect other groups."""
closed_tabs = []
async def mock_close_tab(tab_id: int) -> dict:
closed_tabs.append(tab_id)
return {"ok": True}
mock_bridge.close_tab = AsyncMock(side_effect=mock_close_tab)
register_tab_tools(mcp)
browser_close = mcp._tool_manager._tools["browser_close"].fn
with patch("gcu.browser.tools.tabs.get_bridge", return_value=mock_bridge):
with patch(
"gcu.browser.tools.tabs._get_context",
return_value={"groupId": 1, "activeTabId": 101},
):
result = await browser_close(tab_id=101, profile="agent_1")
assert result.get("ok") is True
assert 101 in closed_tabs
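The tests above rely on one recurring pattern: an `AsyncMock` whose `side_effect` is an async function that records its arguments. A standalone sketch of that pattern, using only the standard library, is shown here; `record_close` and `run_demo` are hypothetical names for illustration.

```python
import asyncio
from typing import Any
from unittest.mock import AsyncMock


async def _demo() -> tuple[dict[str, Any], list[int]]:
    closed: list[int] = []

    async def record_close(tab_id: int) -> dict[str, Any]:
        # Record which tab was closed, then return a bridge-style result.
        closed.append(tab_id)
        return {"ok": True}

    bridge = AsyncMock()
    bridge.close_tab = AsyncMock(side_effect=record_close)

    result = await bridge.close_tab(101)
    bridge.close_tab.assert_awaited_once_with(101)
    return result, closed


def run_demo() -> tuple[dict[str, Any], list[int]]:
    """Run the async demo on a fresh event loop."""
    return asyncio.run(_demo())
```

Because the side effect is awaited by the mock, its return value becomes the result of the mocked call, and the recorded list can be asserted on afterwards.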
class TestComplexScriptExecution:
"""Tests for complex JavaScript execution patterns on real-world sites."""
@pytest.mark.asyncio
async def test_linkedin_scroll_infinite_feed(self, mcp: FastMCP, mock_bridge: MagicMock):
"""Test LinkedIn-style infinite feed scrolling with lazy loading."""
scroll_calls = []
async def mock_scroll(tab_id: int, direction: str, amount: int = 500) -> dict:
scroll_calls.append((tab_id, direction, amount))
return {"ok": True}
mock_bridge.scroll = AsyncMock(side_effect=mock_scroll)
register_interaction_tools(mcp)
browser_scroll = mcp._tool_manager._tools["browser_scroll"].fn
with patch("gcu.browser.tools.interactions.get_bridge", return_value=mock_bridge):
with patch(
"gcu.browser.tools.interactions._get_context",
return_value={"groupId": 1, "activeTabId": 100},
):
# Simulate infinite scroll - multiple scroll operations
for _ in range(3):
await browser_scroll(direction="down", amount=500)
assert len(scroll_calls) == 3
@pytest.mark.asyncio
async def test_linkedin_profile_data_extraction(self, mcp: FastMCP, mock_bridge: MagicMock):
"""Test extracting LinkedIn profile data using complex selectors."""
profile_data = {
"name": "John Doe",
"title": "Software Engineer at Tech Corp",
}
mock_bridge.evaluate = AsyncMock(return_value={"result": {"value": profile_data}})
register_advanced_tools(mcp)
browser_evaluate = mcp._tool_manager._tools["browser_evaluate"].fn
with patch("gcu.browser.tools.advanced.get_bridge", return_value=mock_bridge):
with patch(
"gcu.browser.tools.advanced._get_context",
return_value={"groupId": 1, "activeTabId": 100},
):
# Extract profile data via JavaScript
extraction_script = """
const name = document.querySelector('.text-heading-xlarge')?.innerText;
const title = document.querySelector('.text-body-medium')?.innerText;
return { name, title };
"""
result = await browser_evaluate(script=extraction_script)
# browser_evaluate returns the raw result from bridge.evaluate
assert "result" in result
assert result["result"]["value"] == profile_data
@pytest.mark.asyncio
async def test_twitter_x_infinite_timeline(self, mcp: FastMCP, mock_bridge: MagicMock):
"""Test Twitter/X infinite timeline scrolling with tweet loading."""
tweets_loaded = ["tweet_0", "tweet_1", "tweet_2", "tweet_3", "tweet_4"]
mock_bridge.evaluate = AsyncMock(return_value={"result": {"value": tweets_loaded}})
mock_bridge.scroll = AsyncMock(return_value={"ok": True})
register_interaction_tools(mcp)
register_advanced_tools(mcp)
browser_scroll = mcp._tool_manager._tools["browser_scroll"].fn
browser_evaluate = mcp._tool_manager._tools["browser_evaluate"].fn
with patch("gcu.browser.tools.interactions.get_bridge", return_value=mock_bridge):
with patch(
"gcu.browser.tools.interactions._get_context",
return_value={"groupId": 1, "activeTabId": 100},
):
# Simulate Twitter timeline scroll
await browser_scroll(direction="down", amount=800)
with patch("gcu.browser.tools.advanced.get_bridge", return_value=mock_bridge):
with patch(
"gcu.browser.tools.advanced._get_context",
return_value={"groupId": 1, "activeTabId": 100},
):
extract_script = """
return Array.from(document.querySelectorAll('article[data-testid="tweet"]'))
.slice(0, 5)
.map(t => t.innerText);
"""
result = await browser_evaluate(script=extract_script)
# browser_evaluate returns raw result from bridge
assert "result" in result
assert result["result"]["value"] == tweets_loaded
@pytest.mark.asyncio
async def test_youtube_video_player_interaction(self, mcp: FastMCP, mock_bridge: MagicMock):
"""Test YouTube video player controls and state management."""
player_state = {"playing": False, "currentTime": 0, "duration": 300}
mock_bridge.evaluate = AsyncMock(return_value={"result": {"value": player_state}})
mock_bridge.click = AsyncMock(return_value={"ok": True})
register_advanced_tools(mcp)
register_interaction_tools(mcp)
browser_evaluate = mcp._tool_manager._tools["browser_evaluate"].fn
with patch("gcu.browser.tools.advanced.get_bridge", return_value=mock_bridge):
with patch(
"gcu.browser.tools.advanced._get_context",
return_value={"groupId": 1, "activeTabId": 100},
):
# Interact with YouTube player
play_script = """
document.querySelector('.ytp-play-button')?.click();
return true;
"""
result = await browser_evaluate(script=play_script)
# browser_evaluate returns raw result from bridge
assert "result" in result
@pytest.mark.asyncio
async def test_youtube_comments_expansion(self, mcp: FastMCP, mock_bridge: MagicMock):
"""Test YouTube comments section expansion and loading."""
comments = ["comment_1", "comment_2", "comment_3"]
mock_bridge.evaluate = AsyncMock(return_value={"result": {"value": comments}})
mock_bridge.scroll = AsyncMock(return_value={"ok": True})
mock_bridge.click = AsyncMock(return_value={"ok": True})
register_advanced_tools(mcp)
browser_evaluate = mcp._tool_manager._tools["browser_evaluate"].fn
with patch("gcu.browser.tools.advanced.get_bridge", return_value=mock_bridge):
with patch(
"gcu.browser.tools.advanced._get_context",
return_value={"groupId": 1, "activeTabId": 100},
):
# Scroll to comments and expand
expand_script = """
const commentsSection = document.querySelector('ytd-comments#comments');
if (commentsSection) {
commentsSection.scrollIntoView();
return true;
}
return false;
"""
result = await browser_evaluate(script=expand_script)
# browser_evaluate returns raw result from bridge
assert "result" in result
@pytest.mark.asyncio
async def test_complex_form_filling_linkedin(self, mcp: FastMCP, mock_bridge: MagicMock):
"""Test complex form filling on LinkedIn with dynamic fields."""
filled_fields = {}
async def mock_type_text(tab_id: int, selector: str, text: str, **kwargs) -> dict:
filled_fields[selector] = text
return {"ok": True}
async def mock_select_option(tab_id: int, selector: str, values: list, **kwargs) -> dict:
filled_fields[selector] = values
return {"ok": True, "selected": values}
mock_bridge.type_text = AsyncMock(side_effect=mock_type_text)
mock_bridge.select_option = AsyncMock(side_effect=mock_select_option)
register_interaction_tools(mcp)
browser_type = mcp._tool_manager._tools["browser_type"].fn
browser_select = mcp._tool_manager._tools["browser_select"].fn
with patch("gcu.browser.tools.interactions.get_bridge", return_value=mock_bridge):
with patch(
"gcu.browser.tools.interactions._get_context",
return_value={"groupId": 1, "activeTabId": 100},
):
# Fill out a LinkedIn job application form
await browser_type(selector="#first-name", text="John")
await browser_type(selector="#last-name", text="Doe")
await browser_type(selector="#email", text="john.doe@example.com")
await browser_select(selector="#experience-level", values=["5-10 years"])
assert filled_fields.get("#first-name") == "John"
assert filled_fields.get("#last-name") == "Doe"
assert filled_fields.get("#email") == "john.doe@example.com"
class TestTabLifecycle:
"""Tests for tab lifecycle management."""
@pytest.mark.asyncio
async def test_create_and_close_tab(self, mcp: FastMCP, mock_bridge: MagicMock):
"""Test creating and closing a tab."""
mock_bridge.create_tab = AsyncMock(return_value={"tabId": 123})
mock_bridge.close_tab = AsyncMock(return_value={"ok": True})
register_tab_tools(mcp)
browser_open = mcp._tool_manager._tools["browser_open"].fn
browser_close = mcp._tool_manager._tools["browser_close"].fn
with patch("gcu.browser.tools.tabs.get_bridge", return_value=mock_bridge):
with patch(
"gcu.browser.tools.tabs._get_context",
return_value={"groupId": 1, "activeTabId": 100},
):
open_result = await browser_open(url="https://example.com")
assert open_result.get("ok") is True
close_result = await browser_close(tab_id=123)
assert close_result.get("ok") is True
@pytest.mark.asyncio
async def test_tab_focus_switching(self, mcp: FastMCP, mock_bridge: MagicMock):
"""Test switching focus between tabs."""
mock_bridge.activate_tab = AsyncMock(return_value={"ok": True})
register_tab_tools(mcp)
browser_focus = mcp._tool_manager._tools["browser_focus"].fn
with patch("gcu.browser.tools.tabs.get_bridge", return_value=mock_bridge):
with patch(
"gcu.browser.tools.tabs._get_context",
return_value={"groupId": 1, "activeTabId": 100},
):
result = await browser_focus(tab_id=200)
assert result.get("ok") is True
mock_bridge.activate_tab.assert_awaited_once_with(200)
class TestNavigation:
"""Tests for navigation tools."""
@pytest.mark.asyncio
async def test_navigate_with_wait_until(self, mcp: FastMCP, mock_bridge: MagicMock):
"""Test navigation with different wait_until options."""
mock_bridge.navigate = AsyncMock(return_value={"ok": True, "url": "https://example.com"})
register_navigation_tools(mcp)
browser_navigate = mcp._tool_manager._tools["browser_navigate"].fn
with patch("gcu.browser.tools.navigation.get_bridge", return_value=mock_bridge):
with patch(
"gcu.browser.tools.navigation._get_context",
return_value={"groupId": 1, "activeTabId": 100},
):
result = await browser_navigate(
url="https://example.com", wait_until="networkidle"
)
assert result.get("ok") is True
# The bridge.navigate is called with wait_until as keyword argument
mock_bridge.navigate.assert_awaited_once_with(100, "https://example.com", wait_until="networkidle")
@pytest.mark.asyncio
async def test_navigation_history(self, mcp: FastMCP, mock_bridge: MagicMock):
"""Test back/forward navigation."""
mock_bridge.go_back = AsyncMock(return_value={"ok": True})
mock_bridge.go_forward = AsyncMock(return_value={"ok": True})
register_navigation_tools(mcp)
browser_go_back = mcp._tool_manager._tools["browser_go_back"].fn
browser_go_forward = mcp._tool_manager._tools["browser_go_forward"].fn
with patch("gcu.browser.tools.navigation.get_bridge", return_value=mock_bridge):
with patch(
"gcu.browser.tools.navigation._get_context",
return_value={"groupId": 1, "activeTabId": 100},
):
back_result = await browser_go_back()
forward_result = await browser_go_forward()
assert back_result.get("ok") is True
assert forward_result.get("ok") is True
class TestInteractions:
"""Tests for interaction tools."""
@pytest.mark.asyncio
async def test_click_with_different_buttons(self, mcp: FastMCP, mock_bridge: MagicMock):
"""Test clicking with left, right, and middle buttons."""
click_calls = []
async def track_click(tab_id: int, selector: str, button: str = "left", **kwargs) -> dict:
click_calls.append((tab_id, selector, button))
return {"ok": True}
mock_bridge.click = AsyncMock(side_effect=track_click)
register_interaction_tools(mcp)
browser_click = mcp._tool_manager._tools["browser_click"].fn
with patch("gcu.browser.tools.interactions.get_bridge", return_value=mock_bridge):
with patch(
"gcu.browser.tools.interactions._get_context",
return_value={"groupId": 1, "activeTabId": 100},
):
await browser_click(selector="button", button="left")
await browser_click(selector="button", button="right")
await browser_click(selector="button", button="middle")
assert len(click_calls) == 3
assert [c[2] for c in click_calls] == ["left", "right", "middle"]
@pytest.mark.asyncio
async def test_type_with_special_characters(self, mcp: FastMCP, mock_bridge: MagicMock):
"""Test typing text with special characters and unicode."""
typed_texts = []
async def track_type(tab_id: int, selector: str, text: str, **kwargs) -> dict:
typed_texts.append(text)
return {"ok": True}
mock_bridge.type_text = AsyncMock(side_effect=track_type)
register_interaction_tools(mcp)
browser_type = mcp._tool_manager._tools["browser_type"].fn
with patch("gcu.browser.tools.interactions.get_bridge", return_value=mock_bridge):
with patch(
"gcu.browser.tools.interactions._get_context",
return_value={"groupId": 1, "activeTabId": 100},
):
# Test various special characters
special_texts = [
"Hello, World!", # Basic punctuation
"O'Reilly & Associates", # Quotes and ampersands
"Price: $100 (20% off)", # Currency and parentheses
"Email: user@example.com", # Email format
"日本語テスト", # Japanese characters
"Émojis: 🎉🚀💻", # Emojis
]
for text in special_texts:
result = await browser_type(selector="input", text=text)
assert result.get("ok") is True
assert typed_texts == special_texts
@pytest.mark.asyncio
async def test_drag_and_drop(self, mcp: FastMCP, mock_bridge: MagicMock):
"""Test drag and drop operation."""
# browser_drag uses _cdp directly for DOM queries and mouse events
mock_bridge._cdp = AsyncMock(side_effect=lambda tab_id, method, params=None: {
"DOM.getDocument": {"root": {"nodeId": 1}},
"DOM.querySelector": {"nodeId": 2},
"DOM.getBoxModel": {"content": [0, 0, 100, 0, 100, 50, 0, 50]},
"Input.dispatchMouseEvent": {},
}.get(method, {}))
register_interaction_tools(mcp)
browser_drag = mcp._tool_manager._tools["browser_drag"].fn
with patch("gcu.browser.tools.interactions.get_bridge", return_value=mock_bridge):
with patch(
"gcu.browser.tools.interactions._get_context",
return_value={"groupId": 1, "activeTabId": 100},
):
result = await browser_drag(
start_selector="#draggable",
end_selector="#dropzone",
)
assert result.get("ok") is True
class TestInspection:
"""Tests for inspection tools."""
@pytest.mark.asyncio
async def test_snapshot_accessibility_tree(self, mcp: FastMCP, mock_bridge: MagicMock):
"""Test getting accessibility tree snapshot."""
mock_snapshot = """
[1] document "Page Title"
[2] button "Submit"
[3] textbox "Search"
"""
mock_bridge.snapshot = AsyncMock(return_value={"tree": mock_snapshot})
register_inspection_tools(mcp)
browser_snapshot = mcp._tool_manager._tools["browser_snapshot"].fn
with patch("gcu.browser.tools.inspection.get_bridge", return_value=mock_bridge):
with patch(
"gcu.browser.tools.inspection._get_context",
return_value={"groupId": 1, "activeTabId": 100},
):
result = await browser_snapshot()
# browser_snapshot returns raw result from bridge
assert "tree" in result
@pytest.mark.asyncio
async def test_screenshot_full_page(self, mcp: FastMCP, mock_bridge: MagicMock):
"""Test taking full page screenshot."""
mock_bridge.screenshot = AsyncMock(
return_value={"ok": True, "data": "base64encodedimagedata", "width": 1920, "height": 5000}
)
register_inspection_tools(mcp)
browser_screenshot = mcp._tool_manager._tools["browser_screenshot"].fn
with patch("gcu.browser.tools.inspection.get_bridge", return_value=mock_bridge):
with patch(
"gcu.browser.tools.inspection._get_context",
return_value={"groupId": 1, "activeTabId": 100},
):
result = await browser_screenshot(full_page=True)
# browser_screenshot returns list of content blocks
assert isinstance(result, list)
mock_bridge.screenshot.assert_awaited_once_with(100, full_page=True)
class TestAdvancedTools:
"""Tests for advanced tools."""
@pytest.mark.asyncio
async def test_wait_for_selector_timeout(self, mcp: FastMCP, mock_bridge: MagicMock):
"""Test wait_for_selector timeout behavior."""
mock_bridge.wait_for_selector = AsyncMock(
side_effect=TimeoutError("Element not found within timeout")
)
register_advanced_tools(mcp)
browser_wait = mcp._tool_manager._tools["browser_wait"].fn
with patch("gcu.browser.tools.advanced.get_bridge", return_value=mock_bridge):
with patch(
"gcu.browser.tools.advanced._get_context",
return_value={"groupId": 1, "activeTabId": 100},
):
result = await browser_wait(selector=".nonexistent", timeout_ms=1000)
# Should return error result, not raise
assert result.get("ok") is False
assert "error" in result or "timed out" in str(result).lower()
@pytest.mark.asyncio
async def test_evaluate_with_return_value(self, mcp: FastMCP, mock_bridge: MagicMock):
"""Test JavaScript evaluation with return value."""
mock_bridge.evaluate = AsyncMock(
return_value={"result": {"value": {"status": "success", "count": 42}}}
)
register_advanced_tools(mcp)
browser_evaluate = mcp._tool_manager._tools["browser_evaluate"].fn
with patch("gcu.browser.tools.advanced.get_bridge", return_value=mock_bridge):
with patch(
"gcu.browser.tools.advanced._get_context",
return_value={"groupId": 1, "activeTabId": 100},
):
result = await browser_evaluate(script="return { status: 'success', count: 42 };")
# browser_evaluate returns raw result from bridge
assert "result" in result
assert result["result"]["value"]["status"] == "success"
@pytest.mark.asyncio
async def test_file_upload(self, mcp: FastMCP, mock_bridge: MagicMock):
"""Test file upload functionality."""
mock_bridge.upload_file = AsyncMock(return_value={"ok": True, "files": 2})
register_advanced_tools(mcp)
browser_upload = mcp._tool_manager._tools["browser_upload"].fn
with patch("gcu.browser.tools.advanced.get_bridge", return_value=mock_bridge):
with patch(
"gcu.browser.tools.advanced._get_context",
return_value={"groupId": 1, "activeTabId": 100},
):
result = await browser_upload(
selector="input[type='file']",
file_paths=["/tmp/file1.pdf", "/tmp/file2.pdf"],
)
assert result.get("ok") is True
class TestErrorHandling:
"""Tests for error handling scenarios."""
@pytest.mark.asyncio
async def test_bridge_not_connected(self, mcp: FastMCP):
"""Test behavior when bridge is not connected."""
mock_bridge = MagicMock(spec=BeelineBridge)
mock_bridge.is_connected = False
register_lifecycle_tools(mcp)
browser_start = mcp._tool_manager._tools["browser_start"].fn
with patch("gcu.browser.tools.lifecycle.get_bridge", return_value=mock_bridge):
result = await browser_start(profile="test")
assert result.get("ok") is False
assert "not connected" in result.get("error", "").lower()
@pytest.mark.asyncio
async def test_browser_not_started(self, mcp: FastMCP, mock_bridge: MagicMock):
"""Test behavior when browser is not started."""
register_tab_tools(mcp)
browser_tabs = mcp._tool_manager._tools["browser_tabs"].fn
with patch("gcu.browser.tools.tabs.get_bridge", return_value=mock_bridge):
with patch("gcu.browser.tools.tabs._get_context", return_value=None):
result = await browser_tabs(profile="nonexistent")
assert result.get("ok") is False
assert "not started" in result.get("error", "").lower()
@pytest.mark.asyncio
async def test_cdp_command_failure(self, mcp: FastMCP, mock_bridge: MagicMock):
"""Test handling of CDP command failures."""
mock_bridge.click = AsyncMock(
side_effect=RuntimeError("CDP error: Element not found")
)
register_interaction_tools(mcp)
browser_click = mcp._tool_manager._tools["browser_click"].fn
with patch("gcu.browser.tools.interactions.get_bridge", return_value=mock_bridge):
with patch(
"gcu.browser.tools.interactions._get_context",
return_value={"groupId": 1, "activeTabId": 100},
):
result = await browser_click(selector=".nonexistent")
assert result.get("ok") is False
assert "error" in result
class TestIIFEWrapping:
"""Tests for JavaScript IIFE wrapping to handle return statements."""
@pytest.mark.asyncio
async def test_evaluate_with_bare_return(self, mcp: FastMCP, mock_bridge: MagicMock):
"""Test that scripts with bare return statements are wrapped properly."""
call_args = []
async def mock_evaluate_capture(tab_id: int, script: str) -> dict:
call_args.append(script)
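# Include "ok" so the final assertion also holds if browser_evaluate returns the bridge result unchanged
return {"ok": True, "result": {"value": 42}}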
mock_bridge.evaluate = AsyncMock(side_effect=mock_evaluate_capture)
register_advanced_tools(mcp)
browser_evaluate = mcp._tool_manager._tools["browser_evaluate"].fn
with patch("gcu.browser.tools.advanced.get_bridge", return_value=mock_bridge):
with patch(
"gcu.browser.tools.advanced._get_context",
return_value={"groupId": 1, "activeTabId": 100},
):
# Script with bare return at top level
result = await browser_evaluate(script="return 42;")
# Verify the script was wrapped in IIFE
assert len(call_args) == 1
wrapped_script = call_args[0]
assert wrapped_script.startswith("(function()")
assert wrapped_script.endswith("})()")
assert result.get("ok") is True
@pytest.mark.asyncio
async def test_evaluate_complex_script(self, mcp: FastMCP, mock_bridge: MagicMock):
"""Test complex multi-line script execution."""
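# Include "ok" so the final assertion also holds if browser_evaluate
# returns the bridge result unchanged
mock_bridge.evaluate = AsyncMock(
return_value={"ok": True, "result": {"value": {"total": 100, "filtered": 50}}}
)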
register_advanced_tools(mcp)
browser_evaluate = mcp._tool_manager._tools["browser_evaluate"].fn
with patch("gcu.browser.tools.advanced.get_bridge", return_value=mock_bridge):
with patch(
"gcu.browser.tools.advanced._get_context",
return_value={"groupId": 1, "activeTabId": 100},
):
complex_script = """
const items = document.querySelectorAll('.item');
const filtered = Array.from(items).filter(i => i.classList.contains('active'));
return {
total: items.length,
filtered: filtered.length
};
"""
result = await browser_evaluate(script=complex_script)
assert result.get("ok") is True
class TestConcurrentOperations:
"""Tests for concurrent browser operations."""
@pytest.mark.asyncio
async def test_concurrent_clicks_different_tabs(self, mcp: FastMCP, mock_bridge: MagicMock):
"""Test clicking on multiple tabs concurrently."""
click_order = []
async def mock_click(tab_id: int, selector: str, **kwargs) -> dict:
click_order.append(tab_id)
await asyncio.sleep(0.01) # Simulate async operation
return {"ok": True}
mock_bridge.click = AsyncMock(side_effect=mock_click)
register_interaction_tools(mcp)
browser_click = mcp._tool_manager._tools["browser_click"].fn
with patch("gcu.browser.tools.interactions.get_bridge", return_value=mock_bridge):
with patch(
"gcu.browser.tools.interactions._get_context",
# Map each profile to its own group and active tab; unknown profiles fall back to the third set
side_effect=lambda p: {
"groupId": {"agent_1": 1, "agent_2": 2}.get(p, 3),
"activeTabId": {"agent_1": 101, "agent_2": 201}.get(p, 301),
},
):
# Concurrent clicks from different agents
await asyncio.gather(
browser_click(selector="button", profile="agent_1"),
browser_click(selector="button", profile="agent_2"),
browser_click(selector="button", profile="agent_3"),
)
# All clicks should have been executed
assert len(click_order) == 3
assert set(click_order) == {101, 201, 301}
@pytest.mark.asyncio
async def test_mixed_operations_same_tab(self, mcp: FastMCP, mock_bridge: MagicMock):
"""Test mixed operations (click, type, scroll) on same tab."""
operations = []
async def track_click(tab_id: int, selector: str, **kwargs) -> dict:
operations.append("click")
return {"ok": True}
async def track_type(tab_id: int, selector: str, text: str, **kwargs) -> dict:
operations.append("type")
return {"ok": True}
async def track_scroll(tab_id: int, direction: str, **kwargs) -> dict:
operations.append("scroll")
return {"ok": True}
mock_bridge.click = AsyncMock(side_effect=track_click)
mock_bridge.type_text = AsyncMock(side_effect=track_type)
mock_bridge.scroll = AsyncMock(side_effect=track_scroll)
register_interaction_tools(mcp)
browser_click = mcp._tool_manager._tools["browser_click"].fn
browser_type = mcp._tool_manager._tools["browser_type"].fn
browser_scroll = mcp._tool_manager._tools["browser_scroll"].fn
with patch("gcu.browser.tools.interactions.get_bridge", return_value=mock_bridge):
with patch(
"gcu.browser.tools.interactions._get_context",
return_value={"groupId": 1, "activeTabId": 100},
):
# Mix of operations
await browser_click(selector="button")
await browser_type(selector="input", text="hello")
await browser_scroll(direction="down")
assert "click" in operations
assert "type" in operations
assert "scroll" in operations