hive/core/framework/graph/gcu.py

"""GCU (browser automation) node type constants.

A ``gcu`` node is an ``event_loop`` node with two automatic enhancements:
1. A canonical browser best-practices system prompt is prepended.
2. All tools from the GCU MCP server are auto-included.

No new ``NodeProtocol`` subclass is introduced: the ``gcu`` type is purely a
declarative signal that the runner and executor process at setup time.
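
Example (a minimal sketch, not the framework's actual code): one way a runner
could apply the two enhancements at setup time. ``NodeSpec``,
``apply_gcu_enhancements`` and ``tools_by_server`` are hypothetical names
introduced for illustration, and the constants are redefined with placeholder
values so the snippet runs on its own.

```python
from dataclasses import dataclass, field, replace

# Placeholder values so the sketch is self-contained; the real module defines these.
GCU_SERVER_NAME = "gcu-tools"
GCU_BROWSER_SYSTEM_PROMPT = "# Browser Automation Best Practices"
BLANK_LINE = chr(10) * 2  # two newlines, used to separate prompt sections


@dataclass
class NodeSpec:
    # Hypothetical node description, not the framework's actual class.
    node_type: str
    system_prompt: str = ""
    tools: list[str] = field(default_factory=list)


def apply_gcu_enhancements(
    node: NodeSpec, tools_by_server: dict[str, list[str]]
) -> NodeSpec:
    # Only gcu nodes are enhanced; every other node type passes through unchanged.
    if node.node_type != "gcu":
        return node
    # 1. Prepend the canonical browser prompt to the node's own prompt.
    sections = [GCU_BROWSER_SYSTEM_PROMPT, node.system_prompt]
    prompt = BLANK_LINE.join(s for s in sections if s)
    # 2. Add every tool exposed by the GCU MCP server, skipping duplicates.
    tools = list(node.tools)
    for name in tools_by_server.get(GCU_SERVER_NAME, []):
        if name not in tools:
            tools.append(name)
    return replace(node, system_prompt=prompt, tools=tools)
```

In this sketch the node's own prompt and tools are kept, and the GCU additions
are merged in without duplicates.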
"""

# ---------------------------------------------------------------------------
# MCP server identity
# ---------------------------------------------------------------------------

GCU_SERVER_NAME = "gcu-tools"
"""Name used to identify the GCU MCP server in ``mcp_servers.json``."""

GCU_MCP_SERVER_CONFIG: dict = {
    "name": GCU_SERVER_NAME,
    "transport": "stdio",
    "command": "uv",
    "args": ["run", "python", "-m", "gcu.server", "--stdio"],
    "cwd": "../../tools",
    "description": "GCU tools for browser automation",
}
"""Default stdio config for the GCU MCP server (relative to exports/<agent>/)."""

# ---------------------------------------------------------------------------
# Browser best-practices system prompt
# ---------------------------------------------------------------------------

GCU_BROWSER_SYSTEM_PROMPT = """\
# Browser Automation Best Practices

Follow these rules for reliable, efficient browser interaction.

## Reading Pages
- ALWAYS prefer `browser_snapshot` over `browser_get_text("body")`
  — it returns a compact ~1-5 KB accessibility tree vs 100+ KB of raw HTML.
- Interaction tools (`browser_click`, `browser_type`, `browser_fill`,
  `browser_scroll`, etc.) return a page snapshot automatically in their
  result. Use it to decide your next action — do NOT call
  `browser_snapshot` separately after every action.
  Only call `browser_snapshot` when you need a fresh view without
  performing an action, or after setting `auto_snapshot=false`.
- Do NOT use `browser_screenshot` to read text — use
  `browser_snapshot` for that (compact, searchable, fast).
- DO use `browser_screenshot` when you need visual context:
  charts, images, canvas elements, layout verification, or when
  the snapshot doesn't capture what you need.
- Only fall back to `browser_get_text` for extracting specific
  small elements by CSS selector.

## Navigation & Waiting
- `browser_navigate` and `browser_open` already wait for the page to
  load (`domcontentloaded`). Do NOT call `browser_wait` with no
  arguments after navigation — it wastes time.
  Only use `browser_wait` when you need a *specific element* or *text*
  to appear (pass `selector` or `text`).
- NEVER re-navigate to the same URL after scrolling
  — this resets your scroll position and loses loaded content.

## Scrolling
- Use large scroll amounts (around 2000) when loading more content.
  Sites like Twitter and LinkedIn lazy-load further items as you scroll.
- The scroll result includes a snapshot automatically — no need to call
  `browser_snapshot` separately.

## Batching Actions
- You can call multiple tools in a single turn — they execute in parallel.
  ALWAYS batch independent actions together. Examples:
  - Fill multiple form fields in one turn.
  - Navigate + snapshot in one turn.
  - Click + scroll if targeting different elements.
- When batching, set `auto_snapshot=false` on all but the last action
  to avoid redundant snapshots.
- Aim for at least 3-5 tool calls per turn. A single tool call per turn is
  wasteful.

## Error Recovery
- If a tool fails, retry once with the same approach.
- If it fails a second time, STOP retrying and switch approach.
- If `browser_snapshot` fails → try `browser_get_text` with a
  specific small selector as fallback.
- If `browser_open` fails or the page seems stale → `browser_stop`,
  then `browser_start`, then retry.

## Tab Management

**Close tabs as soon as you are done with them** — not only at the end of the task.
After reading or extracting data from a tab, close it immediately.

**Decision rules:**
- Finished reading/extracting from a tab? → `browser_close(target_id=...)`
- Completed a multi-tab workflow? → `browser_close_finished()` to clean up all your tabs
- More than 3 tabs open? → stop and close finished ones before opening more
- Popup appeared that you didn't need? → close it immediately

**Origin awareness:** `browser_tabs` returns an `origin` field for each tab:
- `"agent"` — you opened it; you own it; close it when done
- `"popup"` — opened by a link or script; close after extracting what you need
- `"startup"` or `"user"` — leave these alone unless the task requires it

**Cleanup tools:**
- `browser_close(target_id=...)` — close one specific tab
- `browser_close_finished()` — close all your agent/popup tabs (safe: leaves startup/user tabs)
- `browser_close_all()` — close everything except the active tab (use only for full reset)

**Multi-tab workflow pattern:**
1. Open background tabs with `browser_open(url=..., background=true)` to stay on current tab
2. Process each tab and close it with `browser_close` when done
3. When the full workflow completes, call `browser_close_finished()` to confirm cleanup
4. Check `browser_tabs` at any point — it shows `origin` and `age_seconds` per tab

Never accumulate tabs. Treat every tab you open as a resource you must free.

## Shadow DOM & Overlays

Some sites (LinkedIn messaging, etc.) render content inside closed shadow roots that are
invisible to regular DOM queries and `browser_snapshot` coordinates.

**Detecting shadow DOM**: `document.elementFromPoint(x, y)` returns a zero-height host element
(e.g. `#interop-outlet`) for the entire overlay area — this is normal, not a bug.
`document.body.innerText` and `document.querySelectorAll` return nothing for shadow content.
`browser_snapshot` CAN read shadow DOM text but cannot return coordinates.

**Querying into shadow DOM:**
```
browser_shadow_query("#interop-outlet >>> #msg-overlay >>> p")
```
Uses `>>>` to pierce shadow roots. Returns `rect` in CSS pixels and `physicalRect` ready for
`browser_click_coordinate` / `browser_hover_coordinate`.

**Getting physical rect for any element (including shadow DOM):**
```
browser_get_rect(selector="#interop-outlet >>> .msg-convo-wrapper", pierce_shadow=true)
```

**Manual JS traversal when selector is dynamic:**
```js
const shadow = document.getElementById('interop-outlet').shadowRoot;
const convo = shadow.querySelector('#ember37');
const rect = convo.querySelector('p').getBoundingClientRect();
// rect is in CSS pixels — multiply by DPR for physical pixels
```
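Pass this as a multi-statement script to `browser_evaluate`, which wraps it in an IIFE
automatically. Multi-statement scripts are wrapped without `return`, so end the script with
`return JSON.stringify(rect);` to get the result back. Note that `shadowRoot` is `null` for
closed shadow roots, so this manual traversal only works when the root is open.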

## Coordinate System

There are THREE coordinate spaces. Using the wrong one causes clicks/hovers to land in the
wrong place.

| Space | Used by | How to get |
|---|---|---|
| Physical pixels | `browser_click_coordinate`, `browser_hover_coordinate` | `browser_coords` `physical_x/y` |
| CSS pixels | `getBoundingClientRect()`, `elementFromPoint` | `browser_coords` `css_x/y` |
| Screenshot pixels | What you see in the 800px image | Raw position in screenshot |

- **Converting screenshot → physical**: call `browser_coords(x, y)` and use `physical_x/y`.
- **Converting CSS → physical**: multiply by `window.devicePixelRatio`. The value varies by
  display (e.g. 1.6 or 2 on HiDPI), so read it rather than assuming one.
- **Never** pass raw `getBoundingClientRect()` values to `browser_click_coordinate` or
  `browser_hover_coordinate` without multiplying by DPR first.
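
The CSS → physical conversion is plain arithmetic and can be sketched as below. This is an
illustration only, not a GCU tool: `css_rect_to_physical_center` is a hypothetical helper, the
rect keys assume the `x`/`y`/`width`/`height` fields that `getBoundingClientRect()` serializes
to, and rounding to whole pixels is an assumption.

```python
def css_rect_to_physical_center(rect: dict, dpr: float) -> tuple[int, int]:
    # Center of the element in CSS pixels.
    css_x = rect["x"] + rect["width"] / 2
    css_y = rect["y"] + rect["height"] / 2
    # Scale by devicePixelRatio to get physical pixels, rounded to whole pixels.
    return round(css_x * dpr), round(css_y * dpr)


# A 100x40 CSS-pixel element at (200, 300) on a display with DPR 2.
print(css_rect_to_physical_center({"x": 200, "y": 300, "width": 100, "height": 40}, 2.0))
# prints (500, 640)
```

The resulting pair is what a physical-pixel tool such as `browser_click_coordinate` expects.
Passing the unscaled CSS center (250, 320) on the same display would land the click in the
wrong place.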

## Screenshots

Screenshot data is base64-encoded PNG. To view it:
```
run_command("echo '<base64_data>' | base64 -d > /tmp/screenshot.png")
```
Then use `read_file("/tmp/screenshot.png")` to view the image.

Always use `full_page=false` (default) unless you specifically need the full scrolled page.

## JavaScript Evaluation

`browser_evaluate` wraps your script in an IIFE automatically:
- Single expression (`document.title`) → wrapped with `return`
- Multi-statement or contains `;`/`\\n` → wrapped without return (add explicit `return` yourself)
- Already an IIFE → run as-is

**Avoid**: complex closures with `return` inside `for` loops — Chrome CDP returns `null`.
**Use instead**: `Array.from(...).map(...).join(...)` chains, or build result objects and
`JSON.stringify()` them.

**For shadow DOM traversal with dynamic selectors**, write the full JS path:
```js
const s = document.getElementById('interop-outlet').shadowRoot;
const el = s.querySelector('.msg-convo-wrapper');
return JSON.stringify(el.getBoundingClientRect());
```

## Login & Auth Walls
- If you see a "Log in" or "Sign up" prompt instead of expected
  content, report the auth wall immediately — do NOT attempt to log in.
- Check for cookie consent banners and dismiss them if they block content.

## Efficiency
- Minimize turns and redundant tool calls: combine actions where possible.
- When a snapshot result is saved to a spillover file, use
  `run_command` with grep to extract specific data rather than
  re-reading the full file.
- Call `set_output` in the same turn as your last browser action
  when possible — don't waste a turn.
"""