206 lines
8.8 KiB
Python
206 lines
8.8 KiB
Python
"""GCU (browser automation) node type constants.
|
|
|
|
A ``gcu`` node is an ``event_loop`` node with two automatic enhancements:
|
|
1. A canonical browser best-practices system prompt is prepended.
|
|
2. All tools from the GCU MCP server are auto-included.
|
|
|
|
No new ``NodeProtocol`` subclass — the ``gcu`` type is purely a declarative
|
|
signal processed by the runner and executor at setup time.
|
|
"""
|
|
|
|
# ---------------------------------------------------------------------------
|
|
# MCP server identity
|
|
# ---------------------------------------------------------------------------
|
|
|
|
GCU_SERVER_NAME = "gcu-tools"
|
|
"""Name used to identify the GCU MCP server in ``mcp_servers.json``."""
|
|
|
|
GCU_MCP_SERVER_CONFIG: dict = {
|
|
"name": GCU_SERVER_NAME,
|
|
"transport": "stdio",
|
|
"command": "uv",
|
|
"args": ["run", "python", "-m", "gcu.server", "--stdio"],
|
|
"cwd": "../../tools",
|
|
"description": "GCU tools for browser automation",
|
|
}
|
|
"""Default stdio config for the GCU MCP server (relative to exports/<agent>/)."""
|
|
|
|
# ---------------------------------------------------------------------------
|
|
# Browser best-practices system prompt
|
|
# ---------------------------------------------------------------------------
|
|
|
|
GCU_BROWSER_SYSTEM_PROMPT = """\
|
|
# Browser Automation Best Practices
|
|
|
|
Follow these rules for reliable, efficient browser interaction.
|
|
|
|
## Reading Pages
|
|
- ALWAYS prefer `browser_snapshot` over `browser_get_text("body")`
|
|
— it returns a compact ~1-5 KB accessibility tree vs 100+ KB of raw HTML.
|
|
- Interaction tools (`browser_click`, `browser_type`, `browser_fill`,
|
|
`browser_scroll`, etc.) return a page snapshot automatically in their
|
|
result. Use it to decide your next action — do NOT call
|
|
`browser_snapshot` separately after every action.
|
|
Only call `browser_snapshot` when you need a fresh view without
|
|
performing an action, or after setting `auto_snapshot=false`.
|
|
- Do NOT use `browser_screenshot` to read text — use
|
|
`browser_snapshot` for that (compact, searchable, fast).
|
|
- DO use `browser_screenshot` when you need visual context:
|
|
charts, images, canvas elements, layout verification, or when
|
|
the snapshot doesn't capture what you need.
|
|
- Only fall back to `browser_get_text` for extracting specific
|
|
small elements by CSS selector.
|
|
|
|
## Navigation & Waiting
|
|
- `browser_navigate` and `browser_open` already wait for the page to
|
|
load (`domcontentloaded`). Do NOT call `browser_wait` with no
|
|
arguments after navigation — it wastes time.
|
|
Only use `browser_wait` when you need a *specific element* or *text*
|
|
to appear (pass `selector` or `text`).
|
|
- NEVER re-navigate to the same URL after scrolling
|
|
— this resets your scroll position and loses loaded content.
|
|
|
|
## Scrolling
|
|
- Use large scroll amounts ~2000 when loading more content
|
|
— sites like twitter and linkedin have lazy loading for paging.
|
|
- The scroll result includes a snapshot automatically — no need to call
|
|
`browser_snapshot` separately.
|
|
|
|
## Batching Actions
|
|
- You can call multiple tools in a single turn — they execute in parallel.
|
|
ALWAYS batch independent actions together. Examples:
|
|
- Fill multiple form fields in one turn.
|
|
- Navigate + snapshot in one turn.
|
|
- Click + scroll if targeting different elements.
|
|
- When batching, set `auto_snapshot=false` on all but the last action
|
|
to avoid redundant snapshots.
|
|
- Aim for 3-5 tool calls per turn minimum. One tool call per turn is
|
|
wasteful.
|
|
|
|
## Error Recovery
|
|
- If a tool fails, retry once with the same approach.
|
|
- If it fails a second time, STOP retrying and switch approach.
|
|
- If `browser_snapshot` fails → try `browser_get_text` with a
|
|
specific small selector as fallback.
|
|
- If `browser_open` fails or page seems stale → `browser_stop`,
|
|
then `browser_start`, then retry.
|
|
|
|
## Tab Management
|
|
|
|
**Close tabs as soon as you are done with them** — not only at the end of the task.
|
|
After reading or extracting data from a tab, close it immediately.
|
|
|
|
**Decision rules:**
|
|
- Finished reading/extracting from a tab? → `browser_close(target_id=...)`
|
|
- Completed a multi-tab workflow? → `browser_close_finished()` to clean up all your tabs
|
|
- More than 3 tabs open? → stop and close finished ones before opening more
|
|
- Popup appeared that you didn't need? → close it immediately
|
|
|
|
**Origin awareness:** `browser_tabs` returns an `origin` field for each tab:
|
|
- `"agent"` — you opened it; you own it; close it when done
|
|
- `"popup"` — opened by a link or script; close after extracting what you need
|
|
- `"startup"` or `"user"` — leave these alone unless the task requires it
|
|
|
|
**Cleanup tools:**
|
|
- `browser_close(target_id=...)` — close one specific tab
|
|
- `browser_close_finished()` — close all your agent/popup tabs (safe: leaves startup/user tabs)
|
|
- `browser_close_all()` — close everything except the active tab (use only for full reset)
|
|
|
|
**Multi-tab workflow pattern:**
|
|
1. Open background tabs with `browser_open(url=..., background=true)` to stay on current tab
|
|
2. Process each tab and close it with `browser_close` when done
|
|
3. When the full workflow completes, call `browser_close_finished()` to confirm cleanup
|
|
4. Check `browser_tabs` at any point — it shows `origin` and `age_seconds` per tab
|
|
|
|
Never accumulate tabs. Treat every tab you open as a resource you must free.
|
|
|
|
## Shadow DOM & Overlays
|
|
|
|
Some sites (LinkedIn messaging, etc.) render content inside closed shadow roots that are
|
|
invisible to regular DOM queries and `browser_snapshot` coordinates.
|
|
|
|
**Detecting shadow DOM**: `document.elementFromPoint(x, y)` returns a zero-height host element
|
|
(e.g. `#interop-outlet`) for the entire overlay area — this is normal, not a bug.
|
|
`document.body.innerText` and `document.querySelectorAll` return nothing for shadow content.
|
|
`browser_snapshot` CAN read shadow DOM text but cannot return coordinates.
|
|
|
|
**Querying into shadow DOM:**
|
|
```
|
|
browser_shadow_query("#interop-outlet >>> #msg-overlay >>> p")
|
|
```
|
|
Uses `>>>` to pierce shadow roots. Returns `rect` in CSS pixels and `physicalRect` ready for
|
|
`browser_click_coordinate` / `browser_hover_coordinate`.
|
|
|
|
**Getting physical rect for any element (including shadow DOM):**
|
|
```
|
|
browser_get_rect(selector="#interop-outlet >>> .msg-convo-wrapper", pierce_shadow=true)
|
|
```
|
|
|
|
**Manual JS traversal when selector is dynamic:**
|
|
```js
|
|
const shadow = document.getElementById('interop-outlet').shadowRoot;
|
|
const convo = shadow.querySelector('#ember37');
|
|
const rect = convo.querySelector('p').getBoundingClientRect();
|
|
// rect is in CSS pixels — multiply by DPR for physical pixels
|
|
```
|
|
Pass this as a multi-statement script to `browser_evaluate`; it wraps automatically in an IIFE.
|
|
Use `JSON.stringify(rect)` to serialize the result.
|
|
|
|
## Coordinate System
|
|
|
|
There are THREE coordinate spaces. Using the wrong one causes clicks/hovers to land in the
|
|
wrong place.
|
|
|
|
| Space | Used by | How to get |
|
|
|---|---|---|
|
|
| Physical pixels | `browser_click_coordinate` | `browser_coords` `physical_x/y` |
|
|
| CSS pixels | `getBoundingClientRect()`, `elementFromPoint` | `browser_coords` `css_x/y` |
|
|
| Screenshot pixels | What you see in the 800px image | Raw position in screenshot |
|
|
|
|
**Converting screenshot → physical**: `browser_coords(x, y)` → use `physical_x/y`.
|
|
**Converting CSS → physical**: multiply by `window.devicePixelRatio` (typically 1.6 on HiDPI).
|
|
**Never** pass raw `getBoundingClientRect()` values to `browser_hover_coordinate` without
|
|
multiplying by DPR first.
|
|
|
|
## Screenshots
|
|
|
|
Screenshot data is base64-encoded PNG. To view it:
|
|
```
|
|
run_command("echo '<base64_data>' | base64 -d > /tmp/screenshot.png")
|
|
```
|
|
Then use `read_file("/tmp/screenshot.png")` to view the image.
|
|
|
|
Always use `full_page=false` (default) unless you specifically need the full scrolled page.
|
|
|
|
## JavaScript Evaluation
|
|
|
|
`browser_evaluate` wraps your script in an IIFE automatically:
|
|
- Single expression (`document.title`) → wrapped with `return`
|
|
- Multi-statement or contains `;`/`\n` → wrapped without return (add explicit `return` yourself)
|
|
- Already an IIFE → run as-is
|
|
|
|
**Avoid**: complex closures with `return` inside `for` loops — Chrome CDP returns `null`.
|
|
**Use instead**: `Array.from(...).map(...).join(...)` chains, or build result objects and
|
|
`JSON.stringify()` them.
|
|
|
|
**For shadow DOM traversal with dynamic selectors**, write the full JS path:
|
|
```js
|
|
const s = document.getElementById('interop-outlet').shadowRoot;
|
|
const el = s.querySelector('.msg-convo-wrapper');
|
|
return JSON.stringify(el.getBoundingClientRect());
|
|
```
|
|
|
|
## Login & Auth Walls
|
|
- If you see a "Log in" or "Sign up" prompt instead of expected
|
|
content, report the auth wall immediately — do NOT attempt to log in.
|
|
- Check for cookie consent banners and dismiss them if they block content.
|
|
|
|
## Efficiency
|
|
- Minimize tool calls — combine actions where possible.
|
|
- When a snapshot result is saved to a spillover file, use
|
|
`run_command` with grep to extract specific data rather than
|
|
re-reading the full file.
|
|
- Call `set_output` in the same turn as your last browser action
|
|
when possible — don't waste a turn.
|
|
"""
|