fix: upgrade browser skills

Timothy
2026-04-13 09:45:07 -07:00
parent eeb46a2b3e
commit 273d4ec66e
2 changed files with 421 additions and 49 deletions
+5 -1
@@ -25,7 +25,11 @@
"Bash(ps -eo pid,cmd)",
"Bash(ps -o pid,lstart,cmd -p 746640)",
"Bash(kill 746636)",
"Bash(ps -eo pid,lstart,cmd)"
"Bash(ps -eo pid,lstart,cmd)",
"Bash(grep -E \"^d|\\\\.py$\")",
"Bash(grep -E \"\\\\.\\(ts|tsx\\)$\")",
"Bash(xargs cat:*)",
"Bash(find /home/timothy/aden/hive -path \"*/.venv\" -prune -o -name \"*.py\" -type f -exec grep -l \"frontend\\\\|UI\\\\|terminal\\\\|interactive\\\\|TUI\" {} \\\\;)"
],
"additionalDirectories": [
"/home/timothy/.hive/skills/writing-hive-skills",
@@ -1,43 +1,290 @@
---
name: browser-automation
description: Drive a real Chrome browser via the GCU Beeline extension + Chrome DevTools Protocol. Navigation, clicks, typing, screenshots, shadow-DOM sites (LinkedIn / Reddit / X), keyboard shortcuts, CSP gotchas, rich-text editors. Verified against real production sites 2026-04-11.
metadata:
author: hive
type: default-skill
version: "2.0"
verified: 2026-04-11
---
# GCU Browser Automation
All GCU browser tools drive a real Chrome instance through the Beeline extension and Chrome DevTools Protocol (CDP). That means clicks, keystrokes, and screenshots are processed by the actual browser's native hit testing, focus, and layout engines — **not** a synthetic event layer. Understanding this unlocks strategies that make hard sites easy.
## Coordinates: always CSS pixels
**Chrome DevTools Protocol `Input.dispatchMouseEvent` operates in CSS pixels, not physical pixels.**
When you call `browser_coords(image_x, image_y)` after a screenshot, the returned dict has both `css_x/y` and `physical_x/y`. **Always use `css_x/y` for clicks, hovers, and key presses.**
```
browser_screenshot() → image (downscaled to 800/900 px wide)
browser_coords(img_x, img_y) → {css_x, css_y, physical_x, physical_y}
browser_click_coordinate(css_x, css_y) ← USE css_x/y
browser_hover_coordinate(css_x, css_y) ← USE css_x/y
browser_press_at(css_x, css_y, key) ← USE css_x/y
```
Feeding `physical_x/y` on a HiDPI display overshoots by DPR× — on a DPR=1.6 laptop, clicks land 60% too far right and down. The ratio between `physicalScale` and `cssScale` tells you the effective DPR.
`getBoundingClientRect()` already returns CSS pixels — feed those values straight through to click/hover tools without any DPR multiplication.
**Exception for zoomed elements:** pages that use `zoom` or `transform: scale()` on a container (LinkedIn's `#interop-outlet`, some embedded iframes) render in a scaled local coordinate space. `getBoundingClientRect` there may not match CDP's hit space. Use `browser_shadow_query` which handles the math, or fall back to visually picking coordinates from a screenshot.
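Here is a minimal sketch of the DPR arithmetic from the paragraphs above, in plain Python with no bridge calls. `rect_center` and `landing_point` are hypothetical helpers, and the rect values are made up. The center of a `getBoundingClientRect()` result is already a valid click target. Scaling it by a DPR of 1.6 first moves the click 60% further from the viewport's top-left corner.

```python
from typing import Dict, Tuple


def rect_center(rect: Dict[str, float]) -> Tuple[float, float]:
    """Center of a getBoundingClientRect()-style dict, in CSS pixels.

    CSS pixels are what CDP's Input.dispatchMouseEvent expects, so the
    result can go straight to a click/hover tool with no DPR scaling.
    """
    return rect["x"] + rect["width"] / 2, rect["y"] + rect["height"] / 2


def landing_point(x: float, y: float, dpr: float,
                  sent_physical: bool) -> Tuple[float, float]:
    """Where a click aimed at CSS point (x, y) actually lands.

    If the caller multiplied by devicePixelRatio first (i.e. sent physical
    pixels), CDP still reads the numbers as CSS pixels, so the click lands
    `dpr` times further from the viewport's top-left corner.
    """
    if sent_physical:
        return x * dpr, y * dpr
    return x, y


if __name__ == "__main__":
    cx, cy = rect_center({"x": 400.0, "y": 200.0, "width": 200.0, "height": 40.0})
    print((cx, cy))                              # (500.0, 220.0)
    print(landing_point(cx, cy, 1.6, False))     # (500.0, 220.0): on target
    print(landing_point(cx, cy, 1.6, True))      # about (800, 352): 60% too far right and down
```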
## Screenshot + coordinates is shadow-agnostic — prefer it on shadow-heavy sites
On sites that use Shadow DOM heavily (Reddit's faceplate Web Components, LinkedIn's `#interop-outlet` messaging overlay, some X custom elements), **coordinate-based operations reach elements that selector-based tools can't see.**
Why:
- **CDP hit testing walks shadow roots natively.** `browser_click_coordinate(css_x, css_y)` routes through Chrome's native hit tester, which traverses open shadow roots automatically. You don't need to know the shadow structure.
- **Keyboard dispatch follows focus** into shadow roots. After a click focuses an input (even one three shadow levels deep), `browser_press(...)` with no selector dispatches keys to `document.activeElement`'s computed focus target.
- **Screenshots render the real layout** regardless of DOM implementation.
By contrast, `wait_for_selector`, `browser_click(selector=...)`, and `browser_type(selector=...)` all use `document.querySelector` under the hood, which **stops at shadow boundaries**, so they cannot see elements inside shadow roots.
### Recommended workflow on shadow-heavy sites
1. `browser_screenshot()` → visual image
2. Identify the target visually → image pixel `(x, y)` (eyeball from the screenshot)
3. `browser_coords(x, y)` → convert to CSS px
4. `browser_click_coordinate(css_x, css_y)` → lands on the element via native hit testing; inputs get focused
5. For typing:
- If the element was reachable via a selector → `browser_type(selector, text)`
- Otherwise → `browser_press(key)` per character (dispatches to focused element, no selector needed)
6. Verify by reading element state via a targeted `browser_evaluate` that walks the shadow tree
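A coordinate click in step 4 reduces to a short series of CDP `Input.dispatchMouseEvent` commands. The sketch below only builds the message payloads and does not send them. The exact sequence the bridge sends is an assumption here, including whether it emits a `mouseMoved` before the press. `click_messages` is a hypothetical helper.

```python
import json
from typing import Any, Dict, List


def click_messages(css_x: float, css_y: float, start_id: int = 1) -> List[Dict[str, Any]]:
    """Build CDP messages for a left click at (css_x, css_y) in CSS pixels.

    Assumed sequence: move the pointer, press, release. Chrome's native hit
    testing then picks the target, descending into open shadow roots.
    """
    base = {"x": css_x, "y": css_y}
    events = [
        {"type": "mouseMoved", **base},
        {"type": "mousePressed", **base, "button": "left", "clickCount": 1},
        {"type": "mouseReleased", **base, "button": "left", "clickCount": 1},
    ]
    return [
        {"id": start_id + i, "method": "Input.dispatchMouseEvent", "params": params}
        for i, params in enumerate(events)
    ]


if __name__ == "__main__":
    # Same point as the Reddit search click in the verified run below.
    for msg in click_messages(617, 28):
        print(json.dumps(msg))
```

No selector appears anywhere in these payloads. That is why the approach does not depend on the shadow structure.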
### Empirically verified (2026-04-11)
Tested against `https://www.reddit.com/r/programming/` whose search input lives at:
```
document > reddit-search-large [shadow]
> faceplate-search-input#search-input [shadow]
> input[name="q"]
```
- `document.querySelector('input')` → **0 visible inputs** on the page (all in shadow)
- `browser_type('faceplate-search-input input', 'python')` → "Element not found"
- `browser_click_coordinate(617, 28)` → focus trail: `REDDIT-SEARCH-LARGE > FACEPLATE-SEARCH-INPUT > INPUT`
- Char-by-char key dispatch after the click → `input.value === 'python'`
Coordinate pipeline: works perfectly. Selector pipeline: unusable without shadow-piercing syntax.
### Shadow-piercing selectors
When you DO want a selector-based approach and know the shadow structure, `browser_shadow_query` and `browser_get_rect` support `>>>` shadow-piercing syntax:
```
browser_shadow_query("reddit-search-large >>> #search-input")
browser_get_rect("#interop-outlet >>> #ember37 >>> p")
```
Returns the element's rect in **CSS pixels** (feed directly to click tools). Remember: `browser_type` and `wait_for_selector` do **not** support `>>>` — only shadow_query and get_rect do.
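Sometimes you need the element itself rather than its rect, for example to read `.value` when verifying the Reddit search input. In that case a `>>>` path can be expanded into a `browser_evaluate` snippet that descends one `shadowRoot` per segment. This is a minimal sketch that assumes open shadow roots. `split_shadow_path` and `shadow_walk_js` are hypothetical helpers, not part of the toolset.

```python
import json
from typing import List


def split_shadow_path(path: str) -> List[str]:
    """Split 'host >>> inner >>> leaf' into its selector segments."""
    segments = [seg.strip() for seg in path.split(">>>")]
    if any(not seg for seg in segments):
        raise ValueError(f"malformed shadow path: {path!r}")
    return segments


def shadow_walk_js(path: str, leaf_expr: str = "el.value") -> str:
    """Build a JS IIFE that follows `path` through open shadow roots.

    Each segment is queried inside the previous element's shadowRoot (or in
    `document` for the first segment). If an element has no shadow root, the
    next query falls back to its light-DOM descendants. The snippet returns
    `leaf_expr` evaluated on the final element, or null if any hop is missing.
    Closed shadow roots are not reachable this way.
    """
    segments = json.dumps(split_shadow_path(path))
    return (
        "(function(){\n"
        f"  const segs = {segments};\n"
        "  let scope = document, el = null;\n"
        "  for (const sel of segs) {\n"
        "    el = scope.querySelector(sel);\n"
        "    if (!el) return null;\n"
        "    scope = el.shadowRoot || el;\n"
        "  }\n"
        f"  return {leaf_expr};\n"
        "})()"
    )


if __name__ == "__main__":
    # Reddit's search input: two shadow hops, then the <input> itself.
    print(shadow_walk_js("reddit-search-large >>> #search-input >>> input"))
```

The resulting string would be passed to `browser_evaluate`.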
## Navigation and waiting
### The basics
```
browser_navigate(url, wait_until="load") # "load" | "domcontentloaded" | "networkidle"
browser_wait_for_selector("h1", timeout_ms=5000)
browser_wait_for_text("Some text", timeout_ms=5000)
browser_go_back()
browser_go_forward()
browser_reload()
```
All return real URLs and titles. On a fast page `navigate(wait_until="load")` returns in sub-second. `wait_for_selector` and `wait_for_text` typically resolve in single-digit milliseconds on elements already in the DOM.
### Timing expectations (measured against real sites)
| Site | Navigate load time |
|---|---|
| example.com | 100–400 ms |
| wikipedia.org | 200–500 ms |
| reddit.com | 1.5–2 s |
| x.com/twitter | 1.2–1.6 s |
| linkedin.com (logged in) | 4–5 s |
Use `timeout_ms=20000` for LinkedIn and other heavy SPAs to give them margin.
### After navigate, always let SPA hydrate
Even after `wait_until="load"`, React/Vue SPAs often render their real chrome in a second pass. Wait 2–3 s (`sleep(2)` to `sleep(3)`) before querying for site-specific elements. Otherwise `wait_for_selector` will fail on elements that appear moments later.
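A fixed sleep is the simplest option. When it proves too slow or too flaky, a bounded poll is an alternative: re-check a condition until it holds or a deadline passes. `wait_until` below is a hypothetical helper. In practice the `check` callable would wrap a bridge call, such as a `browser_evaluate` probe for the element you need.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_until(check: Callable[[], Optional[T]], timeout_s: float = 5.0,
               interval_s: float = 0.25) -> Optional[T]:
    """Call `check` until it returns a truthy value or `timeout_s` elapses.

    Returns the first truthy result, or None on timeout. A monotonic clock
    is used so wall-clock adjustments cannot shorten or extend the wait.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        result = check()
        if result:
            return result
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval_s)
```

Choose `timeout_s` from the timing table above (for example, about 20 s for LinkedIn).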
### Reading pages efficiently
- **Prefer `browser_snapshot` over `browser_get_text("body")`**: it returns a compact ~1–5 KB accessibility tree vs 100+ KB of raw HTML.
- Interaction tools (`browser_click`, `browser_type`, `browser_fill`, `browser_scroll`, etc.) return a page snapshot automatically in their result. Use it to decide your next action — do NOT call `browser_snapshot` separately after every action. Only call `browser_snapshot` when you need a fresh view without performing an action, or after setting `auto_snapshot=false`.
- Complex pages (LinkedIn, Twitter/X, SPAs with virtual scrolling) have DOMs that don't match what's visually rendered — snapshot refs may be stale, missing, or misaligned with visible layout. On these pages, `browser_screenshot` is the only reliable way to orient yourself.
- Only fall back to `browser_get_text` for extracting specific small elements by CSS selector.
- `browser_navigate` and `browser_open` already wait for the page to load. Do NOT call `browser_wait` with no arguments after navigation; it wastes time. Only use `browser_wait` when you need a *specific element* or *text* to appear (pass `selector` or `text`).
## Typing and keyboard input
### ALWAYS click before typing into rich-text editors
**The single most common "looks like it worked but send button stays disabled" failure.** If you're typing into a modern editor (X/Twitter's Draft.js compose, LinkedIn's post composer, Reddit's comment box, Gmail compose, Slack, Discord, Notion, Monaco, any `contenteditable`), **click the input area first with `browser_click_coordinate` or `browser_click(selector)` before you type**.
Why this is necessary:
- **React / Vue controlled components** don't trust JS-sourced `.focus()`. React uses event delegation and watches for *native* pointer/focus events — a `click` dispatched via CDP fires the real `pointerdown`/`pointerup`/`click`/`focus` sequence that React listens to, and updates its internal state. A JS-only `.focus()` sets `document.activeElement` but the framework's controlled state doesn't see it.
- **Draft.js** (X/Twitter compose) and **Lexical** (Gmail, LinkedIn DMs) use contenteditable divs with immutable editor state. They only enter "edit mode" after a real click on the editor surface. Typing at them without clicking routes keys to `document.body` or gets silently discarded.
- **Send/submit buttons are bound to framework state**, not DOM state. They're typically `disabled={!hasRealContent}` where `hasRealContent` is computed from React/Vue/Svelte state. The input field can have characters in the DOM but the button stays disabled because the framework never saw a real input event.
The symptom is always the same: **you type, the characters appear visually, and the send button doesn't enable**. The agent then clicks send anyway, nothing happens, and it thinks the post failed.
### Safe "click-then-type-then-verify" pattern
```
from time import sleep

# `selector` is the editor to type into and `text` is what to type.
# The submit selector below is X / Twitter's; substitute the site's own.
submit_selector = '[data-testid="tweetButton"]'

# 1. Focus the real element via a real click (not JS .focus()).
rect = browser_get_rect(selector)  # or browser_shadow_query for shadow sites
browser_click_coordinate(rect.cx, rect.cy)
sleep(0.5)  # let the editor open / focus settle

# 2. Type. browser_type now uses CDP Input.insertText by default, which is
#    the most reliable way to insert text into rich editors (Lexical,
#    Draft.js, ProseMirror, any React-controlled contenteditable).
browser_type(selector, text)
sleep(1.0)  # let framework state commit

# 3. BEFORE clicking send, verify the submit button is actually enabled.
#    Don't trust that typing worked; check state.
CHECK_SUBMIT_JS = """
(function(){
  const btn = document.querySelector('[data-testid="tweetButton"]');
  if (!btn) return {exists: false, disabled: true};
  return {
    exists: true,
    disabled: btn.disabled || btn.getAttribute('aria-disabled') === 'true',
    text: btn.textContent.trim(),
  };
})()
"""
state = browser_evaluate(CHECK_SUBMIT_JS)

# 4. Only click send if the button exists and is enabled.
if state["exists"] and not state["disabled"]:
    browser_click(submit_selector)
else:
    # Recovery: sometimes a click-again + one extra keystroke nudges
    # React into recomputing hasRealContent.
    browser_click_coordinate(rect.cx, rect.cy)
    browser_press("End")
    browser_press(" ")
    browser_press("Backspace")
    state = browser_evaluate(CHECK_SUBMIT_JS)  # re-check before trying send again
```
### Why `browser_type` uses `Input.insertText` by default
CDP has a dedicated method, `Input.insertText`, for committing text into the focused element as if an IME had just committed it. It **bypasses the keyboard event pipeline entirely** and has worked cleanly on every rich-text editor tested to date: Lexical (LinkedIn DMs, Gmail), Draft.js (X compose), ProseMirror (Reddit), Monaco, and plain `contenteditable`. Playwright exposes the same primitive as `keyboard.insertText()`. Its `keyboard.type()` sends per-character key events instead.
Per-character `Input.dispatchKeyEvent` looks equivalent on paper, but some rich editors listen for `beforeinput` events with a specific shape and route insertion through their own state machine — the raw keys arrive but never get turned into text. That was the exact failure mode that left LinkedIn's message composer empty (and its Send button disabled) during the 2026-04-11 empirical run.
If you need per-keystroke dispatch (autocomplete testing, code editors, animated typing with `delay_ms`), pass `use_insert_text=False` to fall back to the old `keyDown/keyUp` path.
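The difference between the two paths is easiest to see in the raw CDP payloads. The sketch below only builds the message dicts. Which fields the bridge actually populates is an assumption modeled on the description in this document (`key`, `code`, `text` on `keyDown`), and only letters, digits, and space get a `code` here. The function names are hypothetical.

```python
from typing import Any, Dict, List


def insert_text_messages(text: str) -> List[Dict[str, Any]]:
    """Default path: a single IME-style commit of the whole string."""
    return [{"method": "Input.insertText", "params": {"text": text}}]


def _key_fields(ch: str) -> Dict[str, Any]:
    """Best-effort key/code/windowsVirtualKeyCode for a US-layout character.

    Only ASCII letters, digits, and space are mapped; other characters get
    just `key`/`text` (a deliberate simplification for this sketch).
    """
    fields: Dict[str, Any] = {"key": ch, "text": ch}
    if ch.isascii() and ch.isalpha():
        fields.update(code=f"Key{ch.upper()}", windowsVirtualKeyCode=ord(ch.upper()))
    elif ch.isascii() and ch.isdigit():
        fields.update(code=f"Digit{ch}", windowsVirtualKeyCode=ord(ch))
    elif ch == " ":
        fields.update(code="Space", windowsVirtualKeyCode=32)
    return fields


def key_event_messages(text: str) -> List[Dict[str, Any]]:
    """use_insert_text=False path: keyDown (carrying `text`) then keyUp per character."""
    messages: List[Dict[str, Any]] = []
    for ch in text:
        fields = _key_fields(ch)
        messages.append({"method": "Input.dispatchKeyEvent",
                         "params": {"type": "keyDown", **fields}})
        up = {k: v for k, v in fields.items() if k != "text"}  # keyUp inserts nothing
        messages.append({"method": "Input.dispatchKeyEvent",
                         "params": {"type": "keyUp", **up}})
    return messages
```

One message versus two per character is the whole trade-off. The second path gives editors a chance to intercept each key, and that is exactly where Lexical and Draft.js drop input.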
### Neutralizing `beforeunload` draft dialogs
When a composer has unsent text and you try to navigate away or close the tab, sites like LinkedIn pop a native "You have an unsent message, leave?" confirm dialog via `window.onbeforeunload`. Your automation hangs waiting on the dialog — `browser_close_tab` and `browser_navigate` both time out.
**Strip the handler via `browser_evaluate` before navigating:**
```
browser_evaluate("""
(function(){
window.onbeforeunload = null;
window.addEventListener('beforeunload', function(e){
e.stopImmediatePropagation();
}, true);
return true;
})()
""")
# Now browser_navigate / close_tab work without hitting a confirm
```
Always include an equivalent cleanup block in any script that types into a compose UI — without it, a script crash mid-type leaves the tab in an unusable state with the draft modal blocking every subsequent automation call.
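The "always include a cleanup block" rule maps naturally onto `try/finally`. This is a minimal sketch. `with_beforeunload_cleanup`, `evaluate`, and `compose_step` are hypothetical names. You would pass `browser_evaluate` as `evaluate` and your own typing routine as `compose_step`. Injecting them lets the pattern run without a live browser.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

# The handler-stripping script shown above, kept as a constant for reuse.
STRIP_BEFOREUNLOAD_JS = """
(function(){
  window.onbeforeunload = null;
  window.addEventListener('beforeunload', function(e){
    e.stopImmediatePropagation();
  }, true);
  return true;
})()
"""


def with_beforeunload_cleanup(evaluate: Callable[[str], object],
                              compose_step: Callable[[], T]) -> T:
    """Run `compose_step`, then strip beforeunload handlers even if it raised.

    The cleanup runs in `finally`, so a crash mid-type still leaves the tab
    navigable; the original exception (if any) propagates afterwards.
    """
    try:
        return compose_step()
    finally:
        evaluate(STRIP_BEFOREUNLOAD_JS)
```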
### Verified site-specific quirks
| Site | Editor | Workaround |
|---|---|---|
| **X / Twitter** compose | Draft.js | Click `[data-testid='tweetTextarea_0']` first, then type with `delay_ms=20`. First 1-2 chars may be eaten — accept truncation or prepend a throwaway char. Verify `[data-testid='tweetButton']` has `disabled: false` before clicking. |
| **LinkedIn** messaging | contenteditable (inside `#interop-outlet` shadow root) | Use `browser_shadow_query` to find the rect, click-coordinate to focus, then type via focus-based key dispatch (selector-based type can't reach shadow). Send button is `.msg-form__send-button`. |
| **LinkedIn** feed post composer | Quill/LinkedIn custom | Click the "Start a post" trigger first, wait 1s for modal, click the textarea, type. |
| **Reddit** comment/post box | ProseMirror | Click the textarea, wait 0.5s for the toolbar to mount, then type. Submit is `button[slot="submit-button"]` inside a shreddit-composer. |
| **Gmail** compose | Lexical | Click the body first. Gmail has a visible `div[contenteditable=true][aria-label*='Message Body']` after opening a compose window. |
| **Slack** message box | contenteditable | Click first, then type. Send is a paper-plane button with `data-qa='texty_send_button'`. |
| **Discord** | Slate | Click first. Discord's send is implicit on Enter (no button), so just press Enter after typing. |
| **Monaco** editors (GitHub code review, CodeSandbox) | Monaco | Click first, type with `delay_ms=10`. Monaco listens for `textarea` input events on a hidden textarea — requires focus to be on that textarea. |
### Plain text into a real input
For plain `<input>` and `<textarea>` elements with no framework wrapper (forms on static sites, simple search bars that pass a selector string straight through), `browser_type(selector, text)` is sufficient — the bridge's internal `focus()` call does the right thing. But when in doubt, click first. It's cheap insurance.
```
browser_type(selector, text)
```
- By default, commits the whole string with a single `Input.insertText`. With `use_insert_text=False`, it instead sends `keyDown` (with `key`, `code`, `text` fields populated) → `keyUp` per character
- On the per-character path, fires real `keydown` / `keypress` / `input` / `keyup` events, so frameworks that branch on `event.key` or `event.code` see the right values
- Matches what Playwright and Puppeteer send
Works on real `<input>`, `<textarea>`, and `contenteditable` elements. For shadow-DOM inputs, see the "shadow-heavy sites" section above: `browser_type(selector=...)` can't see past shadow boundaries.
### Keyboard shortcuts (Ctrl+A, Shift+Tab, Cmd+Enter)
```
browser_press("a", modifiers=["ctrl"]) # Ctrl+A — select all
browser_press("Backspace") # clear selected text
browser_press("Enter", modifiers=["meta"]) # Cmd+Enter (mac) — submit
browser_press("Tab", modifiers=["shift"]) # Shift+Tab — reverse focus
```
Accepted modifier names (case-insensitive): `"alt"`, `"ctrl"` / `"control"`, `"meta"` / `"cmd"`, `"shift"`.
Behind the scenes this dispatches the modifier's own `keyDown` first, then the main key with `code` and `windowsVirtualKeyCode` populated (so Chrome's shortcut dispatcher recognises it), then releases the modifiers in reverse order. Without the `code` + `windowsVirtualKeyCode` fields, Chrome delivers the event to the DOM but does not fire the shortcut. That is what happens to keys sent as plain strings.
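The sequence just described can be sketched as a list of `Input.dispatchKeyEvent` params. The `modifiers` bitmask values (Alt=1, Ctrl=2, Meta=4, Shift=8) come from CDP. The event types follow the description above. The key/code/virtual-key table and `shortcut_messages` itself are illustrative assumptions, not the bridge's code.

```python
from typing import Any, Dict, List, Sequence

# CDP Input.dispatchKeyEvent `modifiers` bitmask values.
MODIFIER_BITS = {"alt": 1, "ctrl": 2, "meta": 4, "shift": 8}
ALIASES = {"control": "ctrl", "cmd": "meta"}
# (key, code, windowsVirtualKeyCode) for each modifier, left-hand variants.
MODIFIER_KEYS = {
    "alt": ("Alt", "AltLeft", 18),
    "ctrl": ("Control", "ControlLeft", 17),
    "meta": ("Meta", "MetaLeft", 91),
    "shift": ("Shift", "ShiftLeft", 16),
}


def shortcut_messages(key: str, code: str, vk: int,
                      modifiers: Sequence[str]) -> List[Dict[str, Any]]:
    """Press/release params for a shortcut such as Ctrl+A.

    Modifiers go down first (each carrying the bitmask accumulated so far),
    then the main key with `code` and `windowsVirtualKeyCode` set, then the
    modifiers are released in reverse order.
    """
    names = [ALIASES.get(m.lower(), m.lower()) for m in modifiers]
    messages: List[Dict[str, Any]] = []
    mask = 0
    for name in names:
        mask |= MODIFIER_BITS[name]
        k, c, v = MODIFIER_KEYS[name]
        messages.append({"type": "keyDown", "key": k, "code": c,
                         "windowsVirtualKeyCode": v, "modifiers": mask})
    main = {"key": key, "code": code, "windowsVirtualKeyCode": vk, "modifiers": mask}
    messages.append({"type": "keyDown", **main})
    messages.append({"type": "keyUp", **main})
    for name in reversed(names):
        mask &= ~MODIFIER_BITS[name]
        k, c, v = MODIFIER_KEYS[name]
        messages.append({"type": "keyUp", "key": k, "code": c,
                         "windowsVirtualKeyCode": v, "modifiers": mask})
    return messages
```

For example, `shortcut_messages("a", "KeyA", 65, ["ctrl"])` yields four events: Control down, A down and up with `modifiers=2`, and Control up.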
### Special keys
Recognized without modifiers: `Enter`, `Tab`, `Escape`, `Backspace`, `Delete`, `ArrowUp/Down/Left/Right`, `Home`, `End`, `PageUp`, `PageDown`.
## Screenshots
```
browser_screenshot() # viewport, 900 px wide by default
browser_screenshot(full_page=True) # full scrollable page
browser_screenshot(selector="#header") # clip to element's rect
```
Returns a PNG with automatic downscaling to a target width (default 900 px) plus a JSON metadata block containing `cssWidth`, `devicePixelRatio`, `physicalScale`, `cssScale`, and a `scaleHint` string. The image is also annotated with a highlight rectangle/dot showing the last interaction (click, hover, type) if one happened on this tab.
The highlight overlay stays visible on the page for **10 seconds** after each interaction, then fades. If you want the highlight to appear in a screenshot, take the screenshot within 10 s of the click / hover / type.
### Anatomy of the scale fields
- `cssWidth` = `window.innerWidth` (CSS px)
- `devicePixelRatio` = `window.devicePixelRatio` (often 1.6, 2, or 3 on modern displays)
- `physicalScale = png_width / image_width` (how many physical-px per image-px)
- `cssScale = cssWidth / image_width` (how many CSS-px per image-px)
- Effective DPR = `physicalScale / cssScale` (should match `devicePixelRatio`)
When converting image coordinates for clicks, always use `cssScale`. The `physicalScale` field is there for debugging HiDPI displays, not for inputs.
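The same relationships can be written out as code. This is a minimal sketch: `ScreenshotMeta` and `image_to_css` are illustrative names. The example numbers (a 1440 CSS px viewport at DPR 2, downscaled to a 900 px image) are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ScreenshotMeta:
    """Subset of the screenshot metadata fields described above."""
    css_width: float       # cssWidth = window.innerWidth (CSS px)
    image_width: float     # width of the downscaled image you are reading
    physical_scale: float  # physicalScale: physical px per image px

    @property
    def css_scale(self) -> float:
        # cssScale = cssWidth / image_width (CSS px per image px)
        return self.css_width / self.image_width

    @property
    def effective_dpr(self) -> float:
        # Should match window.devicePixelRatio; useful only for debugging.
        return self.physical_scale / self.css_scale


def image_to_css(meta: ScreenshotMeta, img_x: float, img_y: float) -> Tuple[float, float]:
    """Convert a point read off the screenshot into CSS px for click tools."""
    return img_x * meta.css_scale, img_y * meta.css_scale


if __name__ == "__main__":
    meta = ScreenshotMeta(css_width=1440, image_width=900, physical_scale=3.2)
    print(meta.css_scale, meta.effective_dpr)  # about 1.6 and 2.0
    print(image_to_css(meta, 450, 100))        # about (720, 160)
```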
## Scrolling
- Use large scroll amounts (~2000) when loading more content — sites like Twitter and LinkedIn have lazy loading for paging.
- The scroll result includes a snapshot automatically — no need to call `browser_snapshot` separately.
- Never re-navigate to the same URL after scrolling — this resets your scroll position and loses loaded content.
## Batching actions
- You can call multiple tools in a single turn — they execute in parallel. ALWAYS batch independent actions together. Examples: fill multiple form fields in one turn, navigate + snapshot in one turn, click + scroll if targeting different elements.
- When batching, set `auto_snapshot=false` on all but the last action to avoid redundant snapshots.
- Aim for 3–5 tool calls per turn minimum. One tool call per turn is wasteful.
## Tab management
**Close tabs as soon as you are done with them** — not only at the end of the task. After reading or extracting data from a tab, close it immediately.
- Finished reading/extracting from a tab? `browser_close(target_id=...)`
- Completed a multi-tab workflow? `browser_close_finished()` to clean up all your tabs
@@ -45,36 +292,157 @@ Follow these rules for reliable, efficient browser interaction.
- Popup appeared that you didn't need? Close it immediately
`browser_tabs` returns an `origin` field for each tab:
- `"agent"` you opened it; you own it; close it when done
- `"popup"` opened by a link or script; close after extracting what you need
- `"startup"` or `"user"` leave these alone unless the task requires it
Never accumulate tabs. Treat every tab you open as a resource you must free.
The bridge automatically evicts per-tab state (`_cdp_attached`, `_interaction_highlights`) when a tab is closed, so you can't leak stale annotations or attached-debugger flags.
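The origin rules above are easy to apply mechanically. This is a minimal sketch that picks which tabs are safe to close. The tab dicts mimic `browser_tabs` entries. The `target_id` and `origin` key names are assumptions based on `browser_close(target_id=...)` and the `origin` field described above. `tabs_to_close` is a hypothetical helper.

```python
from typing import Dict, Iterable, List

# Origins the agent may close on its own; "startup" and "user" are left alone.
CLOSABLE_ORIGINS = {"agent", "popup"}


def tabs_to_close(tabs: Iterable[Dict[str, str]], keep: Iterable[str] = ()) -> List[str]:
    """Return target ids of tabs that are safe to close.

    `keep` lists ids the current step still needs; everything else that
    the agent opened (or that popped up) is returned for closing.
    """
    keep_ids = set(keep)
    return [
        t["target_id"]
        for t in tabs
        if t.get("origin") in CLOSABLE_ORIGINS and t["target_id"] not in keep_ids
    ]
```

Each returned id would then go to `browser_close(target_id=...)`.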
## Site-specific selectors (verified 2026-04-11)
### LinkedIn
| Target | Selector |
|---|---|
| Global search input | `input[data-testid='typeahead-input']` |
| Own profile link | `a[href*='linkedin.com/in/']` |
| Messaging overlay | `#interop-outlet >>> [aria-label]` (use shadow_query) |
LinkedIn enforces **strict Trusted Types CSP**. Any script you inject via `browser_evaluate` that uses `innerHTML = "<...>"` will be **silently dropped** — the wrapper element gets added but its content is empty, no console error. Always use `createElement` + `appendChild` + `setAttribute` for DOM injection on LinkedIn. `style.cssText`, `textContent`, and `.value` assignments are fine (they don't go through the Trusted Types sink).
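Taken together, a Trusted-Types-friendly injection looks roughly like this. `safe_inject_js` is a hypothetical helper that only builds the script string for `browser_evaluate`. It has not been run against LinkedIn. Every value is embedded with `json.dumps`, so quotes and angle brackets in the text stay inside JS string literals. The attribute names you pass should be inert ones such as `id`, `class`, or `aria-*`.

```python
import json
from typing import Dict


def safe_inject_js(tag: str, text: str, attrs: Dict[str, str], css: str = "") -> str:
    """Build a snippet that adds an element via createElement/appendChild.

    Content goes in through textContent and setAttribute, and styling
    through style.cssText, so no HTML string is ever parsed.
    """
    lines = [
        "(function(){",
        f"  const el = document.createElement({json.dumps(tag)});",
    ]
    for name, value in attrs.items():
        lines.append(f"  el.setAttribute({json.dumps(name)}, {json.dumps(value)});")
    if css:
        lines.append(f"  el.style.cssText = {json.dumps(css)};")
    lines += [
        f"  el.textContent = {json.dumps(text)};",
        "  document.body.appendChild(el);",
        "  return true;",
        "})()",
    ]
    return "\n".join(lines)
```

Markup-looking text such as `<b>hi</b>` ends up as literal characters in the element, which is the intended behavior for labels and debugging overlays.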
### Reddit (new reddit / shreddit)
| Target | Selector |
|---|---|
| Search input (shadow) | `reddit-search-large >>> #search-input` (rect only; type via click-to-focus) |
| Reddit logo (home) | `#reddit-logo` |
| Subreddit posts | `shreddit-post` custom elements |
| Create post button | `a[href*='/submit']` |
Reddit's search input lives **two shadow levels deep** inside `reddit-search-large > faceplate-search-input`. You cannot reach it with `browser_type(selector=)`. The working pattern:
1. `browser_shadow_query("reddit-search-large >>> #search-input")` → rect
2. `browser_click_coordinate(rect.cx, rect.cy)` → click lands on the real shadow input via native hit testing; input becomes focused
3. `browser_press(c)` for each character → dispatches to focused element
4. Verify by reading `.value` via `browser_evaluate` walking the shadow path
### X / Twitter
| Target | Selector |
|---|---|
| Main search input | `input[data-testid='SearchBox_Search_Input']` |
| Home nav link | `a[data-testid='AppTabBar_Home_Link']` |
| Post text area (compose) | `[data-testid='tweetTextarea_0']` |
| Reply buttons on feed | `[data-testid='reply']` |
| Post / Tweet submit button | `[data-testid='tweetButton']` |
| Caret (⋯) menu on a post | `[data-testid='caret']` |
| Confirmation sheet button | `[data-testid='confirmationSheetConfirm']` |
**X uses Draft.js for the compose text editor**, which does NOT accept synthetic input reliably. Working workaround: `browser_type(selector='[data-testid="tweetTextarea_0"]', text="...", delay_ms=20)`. The delay gives Draft.js time to process each keystroke. The first 1–2 characters may still get eaten, so accept minor truncation or prepend a throwaway character. After typing, check that `[data-testid="tweetButton"]` has `disabled: false` before clicking submit.
After submitting, press Escape to close the composer.
## Common pitfalls
- **Typing into a rich-text editor without clicking first → send button stays disabled.** Draft.js (X), Lexical (Gmail, LinkedIn DMs), ProseMirror (Reddit), and React-controlled `contenteditable` elements only register input as "real" when the element received a native focus event — JS-sourced `.focus()` is not enough. `browser_type` now does this automatically via a real CDP pointer click before inserting text, but always verify the submit button's `disabled` state before clicking send. See the "ALWAYS click before typing" section above.
- **Using per-character `keyDown` on Lexical / Draft.js editors → keys dispatch but text never appears.** Those editors intercept `beforeinput` and route insertion through their own state machine; raw keyDown events are silently dropped. `browser_type` now uses `Input.insertText` by default (the CDP IME-commit method) which these editors accept cleanly. Only set `use_insert_text=False` when you explicitly need per-keystroke dispatch.
- **Leaving a composer with text then trying to navigate → `beforeunload` dialog hangs the bridge.** LinkedIn and several other sites pop a native "unsent message" confirm. `browser_navigate` and `close_tab` both time out against this. Always strip `window.onbeforeunload = null` via `browser_evaluate` before any navigation after typing in a composer, or wrap your logic in a `try/finally` that runs the cleanup block.
- **Clicking at physical pixels.** CDP uses CSS px. `browser_coords` returns both for debugging, but always feed `css_x/y` to click tools.
- **Calling `wait_for_selector` on a shadow element.** It'll always time out. Use `browser_shadow_query` or the screenshot + coordinate strategy.
- **Relying on `innerHTML` in injected scripts on LinkedIn.** Silently discarded. Use `createElement` + `appendChild`.
- **Not waiting for SPA hydration.** `wait_until="load"` fires before React/Vue rendering on many sites. Add a 2–3 s sleep before querying for chrome elements.
- **Using `browser_type(selector)` on LinkedIn DMs or any shadow-DOM input.** Won't find the element. Fall back to click-to-focus + `browser_press` per character.
- **Keyboard shortcuts without the `code` field.** Chrome's shortcut dispatcher ignores keyboard events that lack a `code` or `windowsVirtualKeyCode`. `browser_press(..., modifiers=[...])` populates these automatically; raw `Input.dispatchKeyEvent` calls from `browser_evaluate` may not.
- **Taking a screenshot more than 10s after the last interaction** and expecting the highlight to still be visible. The overlay fades after 10s. Take the screenshot sooner, or re-trigger the interaction.
- **Expecting `browser_navigate` to return when you specified `wait_until="networkidle"` on a busy site.** networkidle is approximate — some sites keep a websocket or analytics beacon open forever. Use `"load"` or `"domcontentloaded"` for reliable timing.
## Dead CDP sessions and auto-recovery
If Chrome detaches the debugger for its own reasons (tab closed, user opened DevTools manually, cross-origin navigation, `chrome://` page loaded), the bridge detects the "target closed" / "not attached" error on the next call and **automatically reattaches + retries once**. You don't need to handle this yourself.
If reattach also fails, you'll get the underlying CDP error string — that's a real problem, usually the tab is gone.
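The bridge already handles this for you. The sketch below only illustrates the retry-once shape described above, for example if you wrap raw CDP calls yourself. `CDPError`, `send`, and `reattach` are hypothetical stand-ins. Detecting a detach by matching error-message substrings is an assumption based on the "target closed" / "not attached" wording.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

# Error substrings treated as "the debugger session was dropped".
DETACH_MARKERS = ("target closed", "not attached")


class CDPError(Exception):
    """Stand-in for whatever error type the CDP transport raises."""


def call_with_reattach(send: Callable[[], T], reattach: Callable[[], None]) -> T:
    """Run `send`; on a detach-style error, reattach and retry exactly once.

    Any other error, or a second failure after reattaching, propagates
    unchanged, so a genuinely gone tab surfaces as the real CDP error.
    """
    try:
        return send()
    except CDPError as err:
        if not any(marker in str(err).lower() for marker in DETACH_MARKERS):
            raise
        reattach()
        return send()
```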
## When to reach for `browser_evaluate`
Use it when:
- You need to read state from inside a shadow root that `browser_get_rect` doesn't handle
- You need a one-shot JS snippet to trigger a site-specific action (scroll a specific container, open a menu, set a form field value directly)
- You need to walk an AX tree or measure layout that the standard tools don't expose
Avoid it when:
- A standard tool (`browser_click_coordinate`, `browser_type`, `browser_press`) already does what you need. Those go through CDP's native event pipeline, which real sites trust more than synthetic JS dispatch.
- You're on a strict-CSP site and want to inject DOM — stick to `createElement` + `appendChild`, never `innerHTML`.
- You need to trigger React / Vue / framework state changes — those frameworks watch for real browser events (`input`, `change`, `click`), not scripted `dispatchEvent` calls. Native-event tools are more reliable.
## Login & auth walls
- If you see a "Log in" or "Sign up" prompt, report the auth wall immediately — do NOT attempt to log in.
- Check for cookie consent banners and dismiss them if they block content.
## Efficiency
- Minimize tool calls: combine actions where possible.
- When a snapshot result is saved to a spillover file, use `run_command` with grep to extract specific data rather than re-reading the full file.
- Call `set_output` in the same turn as your last browser action when possible, so you don't waste a turn.
## Error recovery
- If a tool fails, retry once with the same approach.
- If it fails a second time, STOP retrying and switch approach.
- If `browser_snapshot` fails, try `browser_get_text` with a specific small selector as fallback.
- If `browser_open` fails or page seems stale, `browser_stop`, then `browser_start`, then retry.
## Verified workflows
These sequences have been empirically verified against real production sites on 2026-04-11.
### Search on X and read the live dropdown
```
browser_navigate("https://x.com/explore", wait_until="load")
# Wait for SPA hydration
sleep(3)
browser_wait_for_selector("input[data-testid='SearchBox_Search_Input']", timeout_ms=5000)
rect = browser_get_rect("input[data-testid='SearchBox_Search_Input']")
browser_click_coordinate(rect.cx, rect.cy)
browser_type("input[data-testid='SearchBox_Search_Input']", "openai", clear_first=True)
# Screenshot now shows live search suggestions
browser_screenshot()
browser_press("Escape", selector="input[data-testid='SearchBox_Search_Input']")
```
### Search Reddit (shadow DOM)
```
browser_navigate("https://www.reddit.com/r/programming/", wait_until="load")
sleep(2)
# Shadow-pierce the nested search input
sq = browser_shadow_query("reddit-search-large >>> #search-input")
browser_click_coordinate(sq.rect.cx, sq.rect.cy)
# Typing can't use selector (shadow); focused input receives raw key presses
for c in "python":
    browser_press(c)
browser_screenshot()
browser_press("Escape")
```
### Search LinkedIn and dismiss without submitting
```
browser_navigate("https://www.linkedin.com/feed/", wait_until="load", timeout_ms=20000)
sleep(3)
browser_wait_for_selector("input[data-testid='typeahead-input']", timeout_ms=5000)
rect = browser_get_rect("input[data-testid='typeahead-input']")
browser_click_coordinate(rect.cx, rect.cy)
browser_type("input[data-testid='typeahead-input']", "anthropic", clear_first=True)
# Dropdown shows real live suggestions
browser_screenshot()
browser_press("Escape", selector="input[data-testid='typeahead-input']")
```
## Debugging checklist when a click / type "didn't work"
1. **Send button stays disabled after typing?** Two possible causes. (a) You didn't click the input first, so React never saw a native focus event. `browser_type` now clicks automatically — but if you're using raw `Input.dispatchKeyEvent`, click first yourself. (b) You're using per-character `keyDown` on a Lexical / Draft.js editor, and those editors dropped the keys because they listen for `beforeinput` with a specific shape. Switch to `browser_type(selector, text)` (which now uses `Input.insertText` by default) or, at a lower level, call CDP `Input.insertText` directly. Always `browser_evaluate` the submit button's `disabled` / `aria-disabled` state before clicking send; if still disabled after those fixes, the framework never saw real input.
2. **Did the selector match anything?** Run `browser_get_rect(selector)` — if it returns `visible=False` or zero rect, the element isn't laid out yet. Wait longer or use a different selector.
3. **Is the element inside a shadow root?** Try `browser_shadow_query(path)`. If your selector is light-DOM only, switch to the screenshot + coordinate strategy.
4. **Did the click hit something on top of the element?** Register a temporary event listener via `browser_evaluate` on the target element, click, then read `window.__hits` to see what actually received the click. If something else is intercepting (overlay, modal, floating button), dismiss it first.
5. **Did `browser_type` find the element but fail to insert text?** Some editors (Draft.js on X, ProseMirror on some sites, Monaco) require a small `delay_ms` between keystrokes. Try `delay_ms=20` together with `use_insert_text=False`, since per-keystroke delays belong to the key-event path.
6. **Is this a keyboard shortcut that doesn't fire?** Make sure you're using `browser_press(key, modifiers=[...])` — not raw `browser_evaluate` with `dispatchEvent`. Chrome ignores shortcut key events that lack `code` and `windowsVirtualKeyCode`.
7. **Did the navigation actually complete?** Check the return value of `browser_navigate` — it now returns a real `url` and `title`. An empty title usually means a blank page or a hung load.
8. **Is your screenshot stale?** The highlight overlay stays for 10 s; if the screenshot was taken later, the annotation is gone but the click was real. Check the logs of `browser_click_coordinate` to see the coordinates that were actually sent.
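For step 4, here is one way to write the temporary listener. This is a minimal sketch with hypothetical names (`hit_logger_js`, `READ_HITS_JS`). It listens in the capture phase on `document` rather than on the target element, so it also records clicks that an overlay intercepted. It logs the innermost node from `composedPath()`, which reaches into open shadow roots.

```python
import json


def hit_logger_js(event: str = "click") -> str:
    """Build a browser_evaluate snippet that records what receives `event`.

    Each event appends {tag, id, cls} for its innermost target to
    window.__hits. Inject it once per page load; calling it again adds a
    second listener and double-logs.
    """
    return (
        "(function(){\n"
        "  window.__hits = [];\n"
        f"  document.addEventListener({json.dumps(event)}, function(e){{\n"
        "    const t = e.composedPath()[0] || e.target;\n"
        "    window.__hits.push({tag: t.tagName || String(t), id: t.id || '',\n"
        "                        cls: String(t.className || '')});\n"
        "  }, true);\n"
        "  return true;\n"
        "})()"
    )


# Evaluate this after the click to read back what was hit.
READ_HITS_JS = "JSON.stringify(window.__hits || [])"
```

The sequence is: evaluate `hit_logger_js()`, perform the coordinate click, then evaluate `READ_HITS_JS`. If the logged tag is not your target, something on top intercepted the click.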