Compare commits
8 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| | c6b6a5a2f7 | | |
| | 18f5f078fc | | |
| | cc6ec97a75 | | |
| | 44d114f0d0 | | |
| | 9e71f16d15 | | |
| | 28cad2376c | | |
| | 8222cd306e | | |
| | 916803889f | | |
@@ -64,7 +64,7 @@ snapshot = await browser_snapshot(tab_id)
 |---------|--------------|-------|
 | Scroll doesn't move | Nested scroll container | Look for `overflow: scroll` divs |
 | Click no effect | Element covered | Check `getBoundingClientRect` vs viewport |
-| Type clears | Autocomplete/React | Check for event listeners on input |
+| Type clears | Autocomplete/React | Check for event listeners on input; try `browser_type_focused` |
 | Snapshot hangs | Huge DOM | Check node count in snapshot |
 | Snapshot stale | SPA hydration | Wait after navigation |
@@ -229,7 +229,7 @@ function queryShadow(selector) {
 |-------|-------------|----------|
 | Scroll not working | Find scrollable container | Mouse wheel at container center |
 | Click no effect | JavaScript click() | CDP mouse events |
-| Type clears | Add delay_ms | Use execCommand |
+| Type clears | Add delay_ms | Use `browser_type_focused` (Input.insertText) |
 | Snapshot hangs | Add timeout_s | DOM snapshot fallback |
 | Stale content | Wait for selector | Increase wait_until timeout |
 | Shadow DOM | Pierce selector | JavaScript traversal |
@@ -18,7 +18,7 @@ Use browser nodes (with `tools: {policy: "all"}`) when:

 All tools are prefixed with `browser_`:
 - `browser_start`, `browser_open`, `browser_navigate` — launch/navigate
-- `browser_click`, `browser_click_coordinate`, `browser_fill`, `browser_type` — interact
+- `browser_click`, `browser_click_coordinate`, `browser_fill`, `browser_type`, `browser_type_focused` — interact
 - `browser_press` (with optional `modifiers=["ctrl"]` etc.) — keyboard shortcuts
 - `browser_snapshot` — compact accessibility-tree read (structured)
 <!-- vision-only -->
@@ -50,7 +50,8 @@ Chrome DevTools Protocol `Input.dispatchMouseEvent` takes **CSS pixels**, not ph
 2. For static pages (docs, forms, search results), browser_snapshot is fine.
 3. Before typing into a rich-text editor (X compose, LinkedIn DM, Gmail, Reddit),
    click the input area first with browser_click_coordinate so React / Draft.js /
-   Lexical register a native focus event. Otherwise the send button stays disabled.
+   Lexical register a native focus event, then use browser_type_focused(text=...)
+   for shadow-DOM inputs or browser_type(selector, text) for light-DOM inputs.
 4. Use browser_wait(seconds=2-3) after navigation for SPA hydration.
 5. If you hit an auth wall, call set_output with an error and move on.
 6. Keep tool calls per turn <= 10 for reliability.
@@ -70,10 +70,12 @@ ProseMirror only register input as "real" after a native pointer-
 sourced focus event; JS `.focus()` is not enough. Without a real click
 first, the editor stays empty and the send button stays disabled.

-`browser_type` now does this automatically — it clicks the element,
-then inserts text via CDP `Input.insertText` (IME-commit style), which
-rich editors accept cleanly. Before clicking send, verify the submit
-button's `disabled` / `aria-disabled` state via `browser_evaluate`.
+`browser_type` does this automatically when you have a selector — it
+clicks the element, then inserts text via CDP `Input.insertText`.
+For shadow-DOM inputs where selectors can't reach, use
+`browser_click_coordinate` to focus, then `browser_type_focused(text=...)`
+to type into the active element. Before clicking send, verify the
+submit button's `disabled` / `aria-disabled` state via `browser_evaluate`.

 ## Shadow DOM
@@ -90,8 +92,8 @@ reach shadow elements transparently.
 3. `browser_coords(x, y)` → CSS px
 4. `browser_click_coordinate(css_x, css_y)` → lands via native hit
    test; inputs get focused regardless of shadow depth
-5. Type via `browser_type` or, if the selector path can't reach the
-   element, dispatch keys to the focused element
+5. Type via `browser_type_focused` (no selector needed — types into the
+   already-focused element), or `browser_type` if you have a selector
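
As a rough illustration of step 3, the conversion from screenshot pixels to CSS pixels can be sketched as follows. This is only a sketch, assuming the screenshot is captured at physical resolution so that dividing by the page's `devicePixelRatio` yields CSS pixels. `image_px_to_css_px` is a hypothetical helper and is not the implementation of `browser_coords`, which may apply additional scaling.

```python
from __future__ import annotations


def image_px_to_css_px(
    x: float, y: float, device_pixel_ratio: float
) -> tuple[float, float]:
    """Convert a screenshot pixel position to CSS pixels.

    Hypothetical helper: assumes the screenshot is at physical resolution,
    so CSS px = image px / devicePixelRatio.
    """
    if device_pixel_ratio <= 0:
        raise ValueError("device_pixel_ratio must be positive")
    return (x / device_pixel_ratio, y / device_pixel_ratio)


if __name__ == "__main__":
    # A target seen at (1234, 56) on a 2x display maps to half those values.
    print(image_px_to_css_px(1234, 56, 2.0))
```

The result is what a CSS-pixel click tool such as `browser_click_coordinate` expects; passing raw screenshot pixels on a high-DPI display would land the click at the wrong position.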

 For selector-style access when you know the shadow path:
 `browser_shadow_query("#interop-outlet >>> #msg-overlay >>> p")` —
@@ -42,18 +42,25 @@ Why:
 - **Keyboard dispatch follows focus** into shadow roots. After a click focuses an input (even one three shadow levels deep), `browser_press(...)` with no selector dispatches keys to `document.activeElement`'s computed focus target.
 - **Screenshots render the real layout** regardless of DOM implementation.

-Whereas `wait_for_selector`, `browser_click(selector=...)`, `browser_type(selector=...)` all use `document.querySelector` under the hood, which **stops at shadow boundaries**. They cannot see elements inside shadow roots.
+Whereas `wait_for_selector`, `browser_click(selector=...)`, `browser_type(selector=...)` all use `document.querySelector` under the hood, which **stops at shadow boundaries**. They cannot see elements inside shadow roots. For shadow-DOM inputs, use `browser_type_focused` after focusing via click-coordinate.

 ### Recommended workflow on shadow-heavy sites

 1. `browser_screenshot()` → visual image
 2. Identify the target visually → image pixel `(x, y)` (eyeball from the screenshot)
 3. `browser_coords(x, y)` → convert to CSS px
-4. `browser_click_coordinate(css_x, css_y)` → lands on the element via native hit testing; inputs get focused
-5. For typing:
-   - If the element was reachable via a selector → `browser_type(selector, text)`
-   - Otherwise → `browser_press(key)` per character (dispatches to focused element, no selector needed)
-6. Verify by reading element state via a targeted `browser_evaluate` that walks the shadow tree
+4. `browser_click_coordinate(css_x, css_y)` → lands on the element via native hit testing; inputs get focused. **The response now includes `focused_element: {tag, id, role, contenteditable, rect, ...}`** — use it to verify you actually focused what you intended.
+5. `browser_type_focused(text="...")` → dispatches CDP `Input.insertText` to `document.activeElement`. Shadow roots, iframes, Lexical, Draft.js, ProseMirror all just work. Use `browser_type(selector, text)` instead when you have a reliable CSS selector for a light-DOM element.
+6. Verify via `browser_screenshot` OR `browser_get_attribute` on a known-reachable marker (e.g. check that the Send button's `aria-disabled` flipped to `false`).

+### The click→type loop (canonical pattern)
+
+1. Call `browser_click_coordinate(x, y)` to click the target element.
+2. Check the `focused_element` field in the response — it tells you what actually received focus (tag, id, role, contenteditable, rect).
+3. If the focused element is editable, call `browser_type_focused(text="...")` to insert text, then use the verification tools to confirm the text took effect.
+4. If it is NOT editable, your click landed on the wrong thing — refine coordinates and retry. Do NOT reach for `browser_evaluate` + `execCommand('insertText')` or shadow-root traversals. The problem is the click target, not the typing method.
+
+`browser_click` (selector-based) also returns `focused_element`, so the same check works whether you clicked by selector or coordinate.
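
The click→type loop can be sketched roughly as below. This is a sketch under stated assumptions, not the project's implementation: the tools are passed in as plain callables (`click` and `type_focused` are hypothetical stand-ins for `browser_click_coordinate` and `browser_type_focused`), the exact shape of `focused_element` is assumed from the field names listed above, and "refine coordinates" is modeled as trying a list of candidate points in order.

```python
from __future__ import annotations

from typing import Any, Callable, Iterable


def is_editable(focused: dict[str, Any] | None) -> bool:
    """Guess whether the focused element accepts text (assumed field shapes)."""
    if not focused:
        return False
    tag = str(focused.get("tag", "")).lower()
    editable_flag = focused.get("contenteditable") in (True, "true")
    return tag in ("input", "textarea") or editable_flag


def click_then_type(
    click: Callable[[float, float], dict[str, Any]],
    type_focused: Callable[[str], Any],
    candidates: Iterable[tuple[float, float]],
    text: str,
) -> bool:
    """Click each candidate point until an editable element has focus, then type."""
    for x, y in candidates:
        response = click(x, y)
        if is_editable(response.get("focused_element")):
            type_focused(text)
            return True
        # Wrong target: move on to the next, refined coordinate.
    return False
```

Returning `False` instead of falling back to another typing method mirrors the rule in step 4: when nothing editable received focus, the click target is what needs fixing.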
+
 ### Empirically verified (2026-04-11)
@@ -64,13 +71,6 @@ document > reddit-search-large [shadow]
 > input[name="q"]
 ```

-- `document.querySelector('input')` → **0 visible inputs** on the page (all in shadow)
-- `browser_type('faceplate-search-input input', 'python')` → "Element not found"
-- `browser_click_coordinate(617, 28)` → focus trail: `REDDIT-SEARCH-LARGE > FACEPLATE-SEARCH-INPUT > INPUT` ✓
-- Char-by-char key dispatch after the click → `input.value === 'python'` ✓
-
-Coordinate pipeline: works perfectly. Selector pipeline: unusable without shadow-piercing syntax.
-
 ### Shadow-piercing selectors

 When you DO want a selector-based approach and know the shadow structure, `browser_shadow_query` and `browser_get_rect` support `>>>` shadow-piercing syntax:
@@ -88,8 +88,8 @@ Returns the element's rect in **CSS pixels** (feed directly to click tools). Rem

 ```
 browser_navigate(url, wait_until="load") # "load" | "domcontentloaded" | "networkidle"
-browser_wait_for_selector("h1", timeout_ms=5000)
-browser_wait_for_text("Some text", timeout_ms=5000)
+browser_wait_for_selector("h1", timeout_ms=2000)
+browser_wait_for_text("Some text", timeout_ms=2000)
 browser_go_back()
 browser_go_forward()
 browser_reload()
@@ -107,7 +107,7 @@ All return real URLs and titles. On a fast page `navigate(wait_until="load")` re
 | x.com/twitter | 1.2–1.6 s |
 | linkedin.com (logged in) | 4–5 s |

-Use `timeout_ms=20000` for LinkedIn and other heavy SPAs to give them margin.
+For LinkedIn and other heavy SPAs, rely on `sleep()` after navigation to let the page hydrate.
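
The text recommends a plain `sleep()` after navigation. As one possible variation, and only as a sketch that is not taken from the source, a bounded poll checks a readiness condition at short intervals and gives up at a deadline. `wait_for_hydration` and its `is_ready` callback are hypothetical; the callback would wrap whatever check fits the page, such as looking for a known element.

```python
import time
from typing import Callable


def wait_for_hydration(
    is_ready: Callable[[], bool],
    timeout_s: float = 5.0,
    interval_s: float = 0.25,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll is_ready() until it returns True or timeout_s has elapsed."""
    deadline = clock() + timeout_s
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

The `sleep` and `clock` parameters exist only so the helper can be exercised without real waiting. A fixed sleep remains simpler when no cheap readiness check is available.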

 ### After navigate, always let SPA hydrate
@@ -116,7 +116,7 @@ Even after `wait_until="load"`, React/Vue SPAs often render their real chrome in
 ### Reading pages efficiently

 - **Prefer `browser_snapshot` over `browser_get_text("body")`** — returns a compact ~1–5 KB accessibility tree vs 100+ KB of raw HTML.
-- Interaction tools (`browser_click`, `browser_type`, `browser_fill`, `browser_scroll`, etc.) return a page snapshot automatically in their result. Use it to decide your next action — do NOT call `browser_snapshot` separately after every action. Only call `browser_snapshot` when you need a fresh view without performing an action, or after setting `auto_snapshot=false`.
+- Interaction tools (`browser_click`, `browser_type`, `browser_type_focused`, `browser_fill`, `browser_scroll`, etc.) return a page snapshot automatically in their result. Use it to decide your next action — do NOT call `browser_snapshot` separately after every action. Only call `browser_snapshot` when you need a fresh view without performing an action, or after setting `auto_snapshot=false`.
 - Complex pages (LinkedIn, Twitter/X, SPAs with virtual scrolling) have DOMs that don't match what's visually rendered — snapshot refs may be stale, missing, or misaligned with visible layout. On these pages, `browser_screenshot` is the only reliable way to orient yourself.
 - Only fall back to `browser_get_text` for extracting specific small elements by CSS selector.
@@ -136,44 +136,13 @@ The symptom is always the same: **you type, the characters appear visually, and

 ### Safe "click-then-type-then-verify" pattern

-```
-# 1. Focus the real element via a real click (not JS .focus()).
-rect = browser_get_rect(selector) # or browser_shadow_query for shadow sites
-browser_click_coordinate(rect.cx, rect.cy)
-sleep(0.5) # let the editor open / focus settle
+1. **Focus** the real element via a real click (not JS `.focus()`). Use `browser_get_rect(selector)` (or `browser_shadow_query` for shadow sites) to get coordinates, then `browser_click_coordinate(cx, cy)`. Wait ~0.5 s for the editor to open and focus to settle.

-# 2. Type. browser_type now uses CDP Input.insertText by default, which is
-#    the most reliable way to insert text into rich editors (Lexical,
-#    Draft.js, ProseMirror, any React-controlled contenteditable).
-browser_type(selector, text)
-sleep(1.0) # let framework state commit
+2. **Type** the text. Use `browser_type(selector, text)` for light-DOM inputs, or `browser_type_focused(text=...)` for shadow-DOM / already-focused inputs. Both use CDP `Input.insertText` by default, which is the most reliable method for rich editors (Lexical, Draft.js, ProseMirror). Wait ~500 ms for framework state to commit.

-# 3. BEFORE clicking send, verify the submit button is actually enabled.
-#    Don't trust that typing worked — check state.
-state = browser_evaluate("""
-  (function(){
-    const btn = document.querySelector('[data-testid="tweetButton"]');
-    if (!btn) return {exists: false};
-    return {
-      exists: true,
-      disabled: btn.disabled || btn.getAttribute('aria-disabled') === 'true',
-      text: btn.textContent.trim(),
-    };
-  })()
-""")
+3. **Verify** the submit button is enabled before clicking it. Use `browser_evaluate` to check the button's `disabled` or `aria-disabled` attribute. Do NOT trust that typing worked — always check state.

-# 4. Only click send if the button is enabled.
-if not state['disabled']:
-    browser_click(submit_selector)
-else:
-    # Recovery: sometimes a click-again + one extra keystroke nudges
-    # React into recomputing hasRealContent.
-    browser_click_coordinate(rect.cx, rect.cy)
-    browser_press("End")
-    browser_press(" ")
-    browser_press("Backspace")
-    # re-check state
-```
+4. **Only click send if the button is enabled.** If the button is still disabled, try the recovery dance: click the textarea again, press `End`, press a space, press `Backspace` — this forces React to recompute `hasRealContent`. Then re-check the button state.
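
Steps 3 and 4 amount to a small verify-then-send loop, which might be sketched as below. This is an illustration under stated assumptions rather than the project's code: the three callables are hypothetical wrappers (`is_send_disabled` around the `browser_evaluate` check, `click_send` around the click on the submit button, `recover` around the click/`End`/space/`Backspace` recovery dance), and the retry count is an arbitrary choice.

```python
from typing import Callable


def send_when_enabled(
    is_send_disabled: Callable[[], bool],
    click_send: Callable[[], None],
    recover: Callable[[], None],
    max_recoveries: int = 2,
) -> bool:
    """Click send only once the button reports enabled.

    Runs the recovery step between checks, up to max_recoveries times.
    Returns False without clicking if the button never becomes enabled.
    """
    for attempt in range(max_recoveries + 1):
        if not is_send_disabled():
            click_send()
            return True
        if attempt < max_recoveries:
            recover()
    return False
```

A `False` result means the text never registered with the editor, so the caller should stop and report the failure instead of clicking send.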

 ### Why `browser_type` uses `Input.insertText` by default
@@ -209,7 +178,7 @@ Always include an equivalent cleanup block in any script that types into a compo
 | Site | Editor | Workaround |
 |---|---|---|
 | **X / Twitter** compose | Draft.js | Click `[data-testid='tweetTextarea_0']` first, then type with `delay_ms=20`. First 1-2 chars may be eaten — accept truncation or prepend a throwaway char. Verify `[data-testid='tweetButton']` has `disabled: false` before clicking. |
-| **LinkedIn** messaging | contenteditable (inside `#interop-outlet` shadow root) | Use `browser_shadow_query` to find the rect, click-coordinate to focus, then type via focus-based key dispatch (selector-based type can't reach shadow). Send button is `.msg-form__send-button`. |
+| **LinkedIn** messaging | contenteditable (inside `#interop-outlet` shadow root) | Use `browser_shadow_query` to find the rect, click-coordinate to focus, then `browser_type_focused(text=...)` (selector-based `browser_type` can't reach shadow). Send button is `.msg-form__send-button`. |
 | **LinkedIn** feed post composer | Quill/LinkedIn custom | Click the "Start a post" trigger first, wait 1s for modal, click the textarea, type. |
 | **Reddit** comment/post box | ProseMirror | Click the textarea, wait 0.5s for the toolbar to mount, then type. Submit is `button[slot="submit-button"]` inside a shreddit-composer. |
 | **Gmail** compose | Lexical | Click the body first. Gmail has a visible `div[contenteditable=true][aria-label*='Message Body']` after opening a compose window. |
@@ -229,7 +198,7 @@ browser_type(selector, text)
 - Fires real `keydown` / `keypress` / `input` / `keyup` events — frameworks that branch on `event.key` or `event.code` see the right values
 - Matches what Playwright and Puppeteer send

-Works on real `<input>`, `<textarea>`, and `contenteditable` elements. For shadow-DOM inputs, see the "shadow-heavy sites" section above — `type_text(selector=)` can't see past shadow boundaries.
+Works on real `<input>`, `<textarea>`, and `contenteditable` elements. For shadow-DOM inputs, see the "shadow-heavy sites" section above — `browser_type(selector=)` can't see past shadow boundaries; use `browser_type_focused` after click-coordinate focus.

 ### Keyboard shortcuts (Ctrl+A, Shift+Tab, Cmd+Enter)
@@ -325,7 +294,7 @@ Reddit's search input lives **two shadow levels deep** inside `reddit-search-lar

 1. `browser_shadow_query("reddit-search-large >>> #search-input")` → rect
 2. `browser_click_coordinate(rect.cx, rect.cy)` → click lands on the real shadow input via native hit testing; input becomes focused
-3. `browser_press(c)` for each character → dispatches to focused element
+3. `browser_type_focused(text="query")` → dispatches to focused element via `Input.insertText`
 4. Verify by reading `.value` via `browser_evaluate` walking the shadow path

 ### X / Twitter
@@ -398,7 +367,7 @@ Then pass the most specific selector that uniquely identifies the right input (e
 - **Calling `wait_for_selector` on a shadow element.** It'll always time out. Use `browser_shadow_query` or the screenshot + coordinate strategy.
 - **Relying on `innerHTML` in injected scripts on LinkedIn.** Silently discarded. Use `createElement` + `appendChild`.
 - **Not waiting for SPA hydration.** `wait_until="load"` fires before React/Vue rendering on many sites. Add a 2–3 s sleep before querying for chrome elements.
-- **Using `browser_type(selector)` on LinkedIn DMs or any shadow-DOM input.** Won't find the element. Fall back to click-to-focus + `browser_press` per character.
+- **Using `browser_type(selector)` on LinkedIn DMs or any shadow-DOM input.** Won't find the element. Use `browser_click_coordinate` to focus, then `browser_type_focused(text=...)` to type.
 - **Clicking a "Photo" / "Attach" / "Upload" button to pick a file.** This opens Chrome's NATIVE OS file picker, which is rendered outside the web page and cannot be interacted with via CDP. Your automation will hang staring at an unreachable dialog. ALWAYS use `browser_upload(selector, file_paths)` against the underlying `<input type='file'>` element — see the "File uploads" section above for the full pattern. This is the single most common way to wedge a browser session on compose-with-media flows (X/LinkedIn/Gmail).
 - **Keyboard shortcuts without the `code` field.** Chrome's shortcut dispatcher ignores keyboard events that lack a `code` or `windowsVirtualKeyCode`. `browser_press(..., modifiers=[...])` populates these automatically; raw `Input.dispatchKeyEvent` calls from `browser_evaluate` may not.
 - **Taking a screenshot more than 10s after the last interaction** and expecting the highlight to still be visible. The overlay fades after 10s. Take the screenshot sooner, or re-trigger the interaction.
@@ -461,9 +430,8 @@ sleep(2)
 # Shadow-pierce the nested search input
 sq = browser_shadow_query("reddit-search-large >>> #search-input")
 browser_click_coordinate(sq.rect.cx, sq.rect.cy)
-# Typing can't use selector (shadow); focused input receives raw key presses
-for c in "python":
-    browser_press(c)
+# Typing can't use selector (shadow); use browser_type_focused on the focused input
+browser_type_focused(text="python")
 browser_screenshot()
 browser_press("Escape")
 ```
@@ -471,7 +439,7 @@ browser_press("Escape")
 ### Search LinkedIn and dismiss without submitting

 ```
-browser_navigate("https://www.linkedin.com/feed/", wait_until="load", timeout_ms=20000)
+browser_navigate("https://www.linkedin.com/feed/", wait_until="load")
 sleep(3)
 browser_wait_for_selector("input[data-testid='typeahead-input']", timeout_ms=5000)
 rect = browser_get_rect("input[data-testid='typeahead-input']")
@@ -13,15 +13,15 @@ metadata:

 LinkedIn is the hardest mainstream site to automate because it combines **shadow DOM** (`#interop-outlet` for messaging), **strict Trusted Types CSP** (silently drops `innerHTML`), **heavy React reconciliation** (injected nodes get stripped on re-render), **native `beforeunload` draft dialogs** (hang the bridge), and **aggressive spam filters**. Every one of those has bitten us at least once. This skill documents what actually works.

-**Always activate `browser-automation` first.** This skill assumes you already know about CSS-px coordinates, `browser_type`'s click-first behavior, and `browser_shadow_query`. The guidance below is LinkedIn-specific; general browser rules are there.
+**Always activate `browser-automation` first.** This skill assumes you already know about CSS-px coordinates, `browser_type`/`browser_type_focused`, and `browser_shadow_query`. The guidance below is LinkedIn-specific; general browser rules are there.

 ## Timing expectations

-- `browser_navigate(wait_until="load", timeout_ms=20000)` — LinkedIn takes **4–5 seconds** to load the feed cold. Default 30s timeout is fine; use 20s as a floor.
+- `browser_navigate(wait_until="load")` — LinkedIn takes **4–5 seconds** to load the feed cold.
 - After navigation, **always `sleep(3)`** to let React hydrate the profile/feed chrome before querying selectors. Without the sleep `wait_for_selector` will flake on elements that exist moments later.
 - Composer modal slide-in takes **~2 seconds** after you click the Message button.

-## Verified selectors (2026-04-11)
+## Verified selectors

 | Target | Selector | Notes |
 |---|---|---|
@@ -40,8 +40,8 @@ LinkedIn changes class names aggressively. If a class-based selector breaks, fal

 ```
 # 1. Load the profile
-browser_navigate("https://www.linkedin.com/in/<username>/", wait_until="load", timeout_ms=20000)
-sleep(4)
+browser_navigate("https://www.linkedin.com/in/<username>/", wait_until="load")
+sleep(3)

 # 2. Strip onbeforeunload before any state-mutating work — prevents draft-dialog deadlock later
 browser_evaluate("""
@@ -98,19 +98,18 @@ textarea = browser_evaluate("""
 browser_click_coordinate(textarea['cx'], textarea['cy'])
 sleep(0.6)

-# 6. Insert text via document.execCommand('insertText') through browser_evaluate.
-#    This is the ONLY reliable approach for LinkedIn's Lexical composer.
-#    See the "Lexical composer quirks" section below for why browser_type
-#    with a selector does NOT work here (the contenteditable lives inside
-#    the #interop-outlet shadow root which document.querySelector can't
-#    reach). The click in step 5 already put Lexical into edit mode, so
-#    execCommand injects straight into the focused editor's state.
-browser_evaluate("""
-  (function(){
-    document.execCommand('insertText', false, %s);
-    return true;
-  })();
-""" % json.dumps(message_text)) # json.dumps gives you a safely-escaped JS string literal
+# 6. Insert text via browser_type_focused. This dispatches CDP
+#    Input.insertText to document.activeElement — the same underlying
+#    mechanism as execCommand('insertText') but with no JSON escaping,
+#    no browser_evaluate round trip, and built-in retry. The click in
+#    step 5 already focused Lexical, so insertText lands in the editor
+#    regardless of the shadow wrapping around #interop-outlet.
+#
+#    Use browser_type_focused (not browser_type) here — browser_type
+#    requires a selector, which cannot see past the #interop-outlet
+#    shadow root. browser_type_focused targets document.activeElement
+#    directly, sidestepping shadow boundaries entirely.
+browser_type_focused(text=message_text)
 sleep(1.0) # let Lexical commit state + enable Send button

 # 7. Find the modal Send button (filter by in-viewport, reject pinned bar)
@@ -143,20 +142,21 @@ send = browser_evaluate("""
 })();
 """)

-# 8. ONLY click Send if it's enabled — if disabled, the execCommand
+# 8. ONLY click Send if it's enabled — if disabled, the insertText
 #    didn't land. DO NOT retry with a different tool; the fix is
-#    always: re-click the composer rect, re-run execCommand, re-check.
-#    The Send button's `disabled` state IS the ground truth — if
-#    Lexical registered your text, it enables the button. If it's
+#    always: re-click the composer rect, re-run browser_type_focused(text=...),
+#    re-check. The Send button's `disabled` state IS the ground truth —
+#    if Lexical registered your text, it enables the button. If it's
 #    still disabled, your text did not reach the editor, regardless
 #    of what any tool call claims.
 if send['disabled']:
     # The editor didn't receive your text. Do NOT click Send. Do NOT
-    # fall back to browser_type with a dummy selector (see anti-pattern
-    # in Common Pitfalls). Instead: re-click the textarea rect from
-    # step 4, wait a beat, re-run the execCommand insertText from step
-    # 6. If that still fails after 2 retries, bail and surface — the
-    # modal may have been reclaimed by a stale state or auth wall.
+    # fall back to browser_type with a selector (see anti-pattern in
+    # Common Pitfalls — selector-based type can't reach the shadow-DOM
+    # composer). Instead: re-click the textarea rect from step 4, wait
+    # a beat, re-run browser_type_focused(text=message_text) from
+    # step 6. If that still fails after 2 retries, bail and surface —
+    # the modal may have been reclaimed by a stale state or auth wall.
     raise Exception("Send button disabled after insertText — editor did not receive input")

 browser_click_coordinate(send['cx'], send['cy'])
@@ -171,7 +171,7 @@ Daily outbound pattern — accept pending connection requests and send a templat

 ```
 browser_navigate("https://www.linkedin.com/mynetwork/invitation-manager/received/",
-                 wait_until="load", timeout_ms=20000)
+                 wait_until="load")
 sleep(4)
 browser_evaluate("(function(){window.onbeforeunload=null;})()")
@@ -215,7 +215,7 @@ for card in cards[:25]:
 ## Feed post composer flow

 ```
-browser_navigate("https://www.linkedin.com/feed/", wait_until="load", timeout_ms=20000)
+browser_navigate("https://www.linkedin.com/feed/", wait_until="load")
 sleep(4)
 browser_evaluate("(function(){window.onbeforeunload=null;})()")
@@ -302,7 +302,7 @@ If the image isn't already on disk, write it first with `write_file(absolute_pat

 ## Rate limits and safety

-LinkedIn's abuse detection is aggressive. Respect these limits:
+LinkedIn's abuse detection is aggressive. Be aware of these limits and tell the user about them; exceed them only if the user explicitly confirms:

 | Action | Limit |
 |---|---|
@@ -310,8 +310,7 @@ LinkedIn's abuse detection is aggressive. Respect these limits:
 | Outbound messages to new 1st-degree connections | **25/day max**, 5–10s randomized delays |
 | Connection request sends | **100/week max**, spread across days, warm intros preferred |
 | Profile views | Several hundred/day is usually fine but varies by account age |
-| Post publications | 1–3/day, no URL-only posts |
-| Feed reactions | Dozens/day is fine; vary your activity mix |
+| Post publications | 1–5/day, no URL-only posts |
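
The outbound-message row (25/day, 5–10 s randomized delays) could be turned into a pacing plan along these lines. This is a minimal sketch under stated assumptions: `plan_message_delays` is a hypothetical helper, and the defaults simply restate the table's figures. The cap is a parameter so it can be raised when the user has explicitly confirmed that.

```python
from __future__ import annotations

import random


def plan_message_delays(
    n_requested: int,
    daily_cap: int = 25,
    min_delay_s: float = 5.0,
    max_delay_s: float = 10.0,
    rng: random.Random | None = None,
) -> list[float]:
    """Return one randomized delay (seconds) per message allowed today.

    The number of delays is capped at daily_cap, so the length of the
    result is also the number of messages that may be sent.
    """
    rng = rng or random.Random()
    allowed = min(max(n_requested, 0), daily_cap)
    return [rng.uniform(min_delay_s, max_delay_s) for _ in range(allowed)]
```

A caller would sleep for each returned delay before the corresponding send, and report to the user when fewer delays come back than messages were requested.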

 Signals you're being throttled:
 - "Message failed to send" with no error detail
@@ -324,9 +323,8 @@ If any of those show up, **stop the run, screenshot the state, and surface the i
 ## Common pitfalls

 - **`innerHTML` injection is silently dropped** — LinkedIn's Trusted Types CSP discards any `innerHTML = "<...>"` from injected scripts, no console error. Always use `createElement` + `appendChild` + `setAttribute` for DOM injection. `textContent`, `style.cssText`, and `.value` assignments are fine.
-- **Do NOT use `browser_type` on the message composer — use `document.execCommand('insertText', false, text)` via `browser_evaluate` instead.** The Lexical contenteditable lives inside the `#interop-outlet` shadow root which `document.querySelector` (what `browser_type` uses under the hood) cannot see. Attempts to work around this with `browser_shadow_query` fail because `browser_type` doesn't support the `>>>` shadow-pierce syntax. The ONLY reliable insert path is: (1) `browser_click_coordinate` on the composer rect (put Lexical in edit mode via a real CDP pointer click) → (2) `browser_evaluate` with `document.execCommand('insertText', false, <message>)` against the focused editor. This pattern is verified end-to-end across 15+ successful sends in session `session_20260414_113244_a98cfd66` (2026-04-14).
-- **Per-char keyDown on the message composer produces empty text** — Lexical intercepts `beforeinput` and drops raw keys. Ignore `browser_type` entirely for LinkedIn DMs; use the `execCommand('insertText')` path above.
-- **ANTI-PATTERN: "inject a dummy `<div id='dummy-target'>` and pass it as the `selector` arg to `browser_type`".** This looks tempting but fails compoundingly: `browser_type` clicks the **dummy div's** rect (not the editor's), the click lands on the Lexical wrapper's non-editable chrome, the contenteditable never receives focus, and `Input.insertText` fires against nothing. The bridge will still return `{"ok": true, "action": "type", "length": N}` because it has no way to verify the text actually landed. Symptom: Send button stays `disabled: true` forever. Fix: use `execCommand('insertText')` exactly as shown in the profile-message flow above. (See `session_20260414_114820_08bd3c4d` for the failed attempt.)
+- **Use `browser_type_focused` (not `browser_type`) on the message composer.** The Lexical contenteditable lives inside the `#interop-outlet` shadow root which `document.querySelector` (what `browser_type`'s selector path uses under the hood) cannot see. `browser_type` requires a selector and will fail with "Element not found". The reliable insert path is: (1) `browser_click_coordinate` on the composer rect — the response's `focused_element` confirms Lexical received focus → (2) `browser_type_focused(text=message_text)` — CDP `Input.insertText` dispatches to `document.activeElement` regardless of shadow wrapping.
+- **Per-char keyDown on the message composer produces empty text** — Lexical intercepts `beforeinput` and drops raw keys. Use `browser_type_focused(text=..., use_insert_text=True)` after `browser_click_coordinate` has focused the composer. The CDP `Input.insertText` method commits as if IME fired, which Lexical accepts cleanly.
 - **Multiple Send buttons on the page** — the pinned bottom-right messaging bar has its own `msg-form__send-button` that's usually below `innerHeight`. Filter by in-viewport before clicking.
 - **`window.onbeforeunload` hangs navigation/close** — after typing in a composer, any `browser_navigate` or `close_tab` can pop a native "unsent message, leave?" confirm dialog that deadlocks the bridge. Always strip `onbeforeunload` before any navigation, and wrap composer flows in a `try/finally` that runs the cleanup block:
@@ -347,7 +345,7 @@ browser_evaluate("""
|
||||
|
||||
## Auth wall detection

If you see a "Log in" / "Join LinkedIn" prompt instead of the logged-in feed, **stop immediately** and surface the issue. Do NOT attempt to log in via automation: LinkedIn's bot detection will flag the account.

If you see a "Log in" / "Join LinkedIn" prompt instead of the logged-in feed, **stop immediately** and surface the issue to the user. Do NOT attempt to log in via automation: LinkedIn's bot detection will flag the account.

Check via:
```
@@ -899,6 +899,7 @@ def test_concurrency_safe_allowlist_is_conservative():
        "hashline_edit",
        "browser_click",
        "browser_type",
        "browser_type_focused",
        "browser_navigate",
    ):
        assert forbidden not in allowlist, f"{forbidden} must not be concurrency-safe"

@@ -0,0 +1,268 @@
"""
Browser Remote Control — act as an agent to call browser tools via a UI.

Spawns its own GCU MCP server subprocess (same way a real agent does),
connects as an MCP client, and exposes the tools over HTTP for the web UI.

Usage:
    uv run scripts/browser_remote.py            # starts server + opens UI
    uv run scripts/browser_remote.py --no-ui    # API only, no browser open

Then use the UI at http://localhost:9250/ui or curl directly:
    curl -X POST http://localhost:9250/browser_click \
      -H 'Content-Type: application/json' \
      -d '{"selector": "#login-btn"}'
"""

from __future__ import annotations

import argparse
import asyncio
import json
import logging
import os
import sys
import webbrowser
from pathlib import Path
from typing import Any

from aiohttp import web

# Add framework to path so we can use the existing MCPClient
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "core"))
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "tools", "src"))

from framework.loader.mcp_client import MCPClient, MCPServerConfig

logger = logging.getLogger("browser_remote")

DEFAULT_PORT = 9250
TOOLS_DIR = str((Path(__file__).parent.parent / "tools").resolve())


# ---------------------------------------------------------------------------
# MCP client — connects to GCU server exactly like an agent would
# ---------------------------------------------------------------------------

_mcp_client: MCPClient | None = None


def get_mcp_client() -> MCPClient:
    """Get or create the MCP client connected to the GCU server."""
    global _mcp_client
    if _mcp_client is None:
        bridge_port = os.environ.get("HIVE_BRIDGE_PORT", "9229")
        config = MCPServerConfig(
            name="gcu-tools",
            transport="stdio",
            command="uv",
            args=["run", "python", "-m", "gcu.server", "--stdio", "--capabilities", "browser"],
            cwd=TOOLS_DIR,
            env={"HIVE_BRIDGE_PORT": bridge_port},
        )
        _mcp_client = MCPClient(config)
        _mcp_client.connect()
        tools = _mcp_client.list_tools()
        logger.info(
            "Connected to GCU server, %d tools available: %s",
            len(tools),
            [t.name for t in tools],
        )
    return _mcp_client


# ---------------------------------------------------------------------------
# HTTP Handlers
# ---------------------------------------------------------------------------


async def handle_ui(request: web.Request) -> web.Response:
    """GET /ui — serve the web UI."""
    ui_path = Path(__file__).parent / "browser_remote_ui.html"
    return web.FileResponse(ui_path)


async def handle_index(request: web.Request) -> web.Response:
    """GET / — redirect to UI."""
    raise web.HTTPFound("/ui")


async def handle_status(request: web.Request) -> web.Response:
    """GET /status — connection status."""
    try:
        client = get_mcp_client()
        tools = client.list_tools()
        return web.json_response({
            "connected": True,
            "tools_count": len(tools),
        })
    except Exception as e:
        return web.json_response({"connected": False, "error": str(e)})


async def handle_tools(request: web.Request) -> web.Response:
    """GET /tools — list available tools with their schemas."""
    try:
        client = get_mcp_client()
        tools = client.list_tools()
        schemas = {}
        for tool in tools:
            props = tool.input_schema.get("properties", {})
            required = tool.input_schema.get("required", [])
            params = {}
            for pname, pspec in props.items():
                param_def: dict[str, Any] = {"type": pspec.get("type", "string")}
                if pname in required:
                    param_def["required"] = True
                if "default" in pspec:
                    param_def["default"] = pspec["default"]
                if "enum" in pspec:
                    param_def["enum"] = pspec["enum"]
                if pspec.get("type") == "array" and "items" in pspec:
                    param_def["items"] = pspec["items"].get("type", "string")
                params[pname] = param_def
            schemas[tool.name] = {
                "description": tool.description.split("\n")[0].strip() if tool.description else "",
                "params": params,
            }
        return web.json_response(schemas)
    except Exception as e:
        return web.json_response({"error": str(e)}, status=500)


async def handle_tool_call(request: web.Request) -> web.Response:
    """POST /<tool_name> — call a browser tool."""
    tool_name = request.match_info["tool"]

    try:
        body = await request.read()
        params = json.loads(body) if body.strip() else {}
    except json.JSONDecodeError:
        return web.json_response({"ok": False, "error": "Invalid JSON"}, status=400)

    logger.info("=> %s %s", tool_name, json.dumps(params, default=str)[:200])

    try:
        client = get_mcp_client()
        # call_tool is synchronous (blocks on the stdio subprocess)
        # Run it in a thread so we don't block the event loop
        # get_running_loop() is the supported way to get the loop inside a coroutine
        loop = asyncio.get_running_loop()
        result = await loop.run_in_executor(None, client.call_tool, tool_name, params)

        # MCP returns a list of content blocks — extract text/image
        response = _format_mcp_result(result)
        logger.info("<= %s ok=%s", tool_name, response.get("ok", True))
        return web.json_response(response)
    except Exception as e:
        logger.error("<= %s error: %s", tool_name, e)
        return web.json_response({"ok": False, "error": str(e)}, status=500)


def _format_mcp_result(result: Any) -> dict:
    """Convert MCP tool result into a JSON-friendly dict."""
    if result is None:
        return {"ok": True}

    # MCPClient.call_tool returns the raw result from the MCP SDK
    # which could be a list of content blocks, a dict, or a string
    if isinstance(result, dict):
        return result

    if isinstance(result, str):
        try:
            return json.loads(result)
        except (json.JSONDecodeError, TypeError):
            return {"ok": True, "text": result}

    if isinstance(result, list):
        # List of MCP content blocks (TextContent, ImageContent, etc.)
        texts = []
        images = []
        for item in result:
            if hasattr(item, "text"):
                try:
                    parsed = json.loads(item.text)
                    if isinstance(parsed, dict):
                        return parsed  # Tool returned structured JSON
                except (json.JSONDecodeError, TypeError):
                    pass
                texts.append(item.text)
            elif hasattr(item, "data"):
                images.append({"mime_type": getattr(item, "mime_type", "image/png"), "data": item.data})

        response: dict[str, Any] = {"ok": True}
        if texts:
            response["text"] = "\n".join(texts)
        if images:
            response["images"] = images
        return response

    return {"ok": True, "result": str(result)}


# ---------------------------------------------------------------------------
# Server setup
# ---------------------------------------------------------------------------


@web.middleware
async def cors_middleware(request: web.Request, handler):
    if request.method == "OPTIONS":
        resp = web.Response()
    else:
        resp = await handler(request)
    resp.headers["Access-Control-Allow-Origin"] = "*"
    resp.headers["Access-Control-Allow-Methods"] = "GET, POST, OPTIONS"
    resp.headers["Access-Control-Allow-Headers"] = "Content-Type"
    return resp


def create_app() -> web.Application:
    app = web.Application(middlewares=[cors_middleware])
    app.router.add_get("/", handle_index)
    app.router.add_get("/ui", handle_ui)
    app.router.add_get("/tools", handle_tools)
    app.router.add_get("/status", handle_status)
    app.router.add_post("/{tool}", handle_tool_call)
    return app


def main() -> None:
    parser = argparse.ArgumentParser(description="Browser Remote Control")
    parser.add_argument("--port", type=int, default=int(os.environ.get("BROWSER_REMOTE_PORT", DEFAULT_PORT)))
    parser.add_argument("--no-ui", action="store_true", help="Don't auto-open the browser")
    args = parser.parse_args()

    logging.basicConfig(level=logging.INFO, format="%(asctime)s %(name)s %(levelname)s %(message)s")

    # Connect to GCU server eagerly so we fail fast if something is wrong
    try:
        client = get_mcp_client()
    except Exception as e:
        logger.error("Failed to connect to GCU server: %s", e)
        sys.exit(1)

    # Auto-start browser context so tools work immediately
    try:
        result = client.call_tool("browser_start", {})
        logger.info("browser_start: %s", result)
    except Exception as e:
        logger.warning("browser_start failed (may already be started): %s", e)

    app = create_app()

    async def on_startup(app: web.Application) -> None:
        if not args.no_ui:
            webbrowser.open(f"http://localhost:{args.port}/ui")

    app.on_startup.append(on_startup)

    print(f"Browser Remote Control on http://localhost:{args.port}")
    print(f" UI: http://localhost:{args.port}/ui")
    print(f" API: POST http://localhost:{args.port}/<tool>")
    print()
    web.run_app(app, host="127.0.0.1", port=args.port, print=None)


if __name__ == "__main__":
    main()
@@ -0,0 +1,838 @@
|
||||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="UTF-8">
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
||||
<title>Browser Remote Control</title>
|
||||
<style>
|
||||
:root {
|
||||
--bg: #0d1117;
|
||||
--surface: #161b22;
|
||||
--surface2: #21262d;
|
||||
--border: #30363d;
|
||||
--text: #e6edf3;
|
||||
--text2: #8b949e;
|
||||
--accent: #58a6ff;
|
||||
--accent-dim: #1f6feb;
|
||||
--green: #3fb950;
|
||||
--red: #f85149;
|
||||
--orange: #d29922;
|
||||
--radius: 8px;
|
||||
}
|
||||
|
||||
* { box-sizing: border-box; margin: 0; padding: 0; }
|
||||
|
||||
body {
|
||||
font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Helvetica, Arial, sans-serif;
|
||||
background: var(--bg);
|
||||
color: var(--text);
|
||||
line-height: 1.5;
|
||||
padding: 0;
|
||||
}
|
||||
|
||||
header {
|
||||
background: var(--surface);
|
||||
border-bottom: 1px solid var(--border);
|
||||
padding: 16px 24px;
|
||||
display: flex;
|
||||
align-items: center;
|
||||
justify-content: space-between;
|
||||
position: sticky;
|
||||
top: 0;
|
||||
z-index: 100;
|
||||
}
|
||||
|
||||
header h1 {
|
||||
font-size: 18px;
|
||||
font-weight: 600;
|
||||
}
|
||||
|
||||
#status-badge {
|
||||
font-size: 13px;
|
||||
padding: 4px 12px;
|
||||
border-radius: 20px;
|
||||
font-weight: 500;
|
||||
}
|
||||
#status-badge.connected { background: rgba(63,185,80,0.15); color: var(--green); }
|
||||
#status-badge.disconnected { background: rgba(248,81,73,0.15); color: var(--red); }
|
||||
#status-badge.checking { background: rgba(210,153,34,0.15); color: var(--orange); }
|
||||
|
||||
.layout {
|
||||
display: flex;
|
||||
height: calc(100vh - 57px);
|
||||
}
|
||||
|
||||
/* Sidebar */
|
||||
.sidebar {
|
||||
width: 240px;
|
||||
min-width: 240px;
|
||||
background: var(--surface);
|
||||
border-right: 1px solid var(--border);
|
||||
overflow-y: auto;
|
||||
padding: 12px 0;
|
||||
}
|
||||
|
||||
.sidebar-group {
|
||||
margin-bottom: 8px;
|
||||
}
|
||||
|
||||
.sidebar-group-label {
|
||||
font-size: 11px;
|
||||
font-weight: 600;
|
||||
text-transform: uppercase;
|
||||
letter-spacing: 0.5px;
|
||||
color: var(--text2);
|
||||
padding: 8px 16px 4px;
|
||||
}
|
||||
|
||||
.sidebar-item {
|
||||
display: block;
|
||||
width: 100%;
|
||||
text-align: left;
|
||||
background: none;
|
||||
border: none;
|
||||
color: var(--text2);
|
||||
font-size: 13px;
|
||||
padding: 6px 16px 6px 24px;
|
||||
cursor: pointer;
|
||||
font-family: 'SF Mono', 'Fira Code', monospace;
|
||||
transition: background 0.1s, color 0.1s;
|
||||
}
|
||||
|
||||
.sidebar-item:hover {
|
||||
background: var(--surface2);
|
||||
color: var(--text);
|
||||
}
|
||||
|
||||
.sidebar-item.active {
|
||||
background: rgba(88,166,255,0.1);
|
||||
color: var(--accent);
|
||||
border-right: 2px solid var(--accent);
|
||||
}
|
||||
|
||||
/* Main content */
|
||||
.main {
|
||||
flex: 1;
|
||||
overflow-y: auto;
|
||||
padding: 24px 32px;
|
||||
}
|
||||
|
||||
.tools-grid {
|
||||
display: grid;
|
||||
grid-template-columns: repeat(auto-fill, minmax(420px, 1fr));
|
||||
gap: 16px;
|
||||
}
|
||||
|
||||
.tool-card {
|
||||
background: var(--surface);
|
||||
border: 1px solid var(--border);
|
||||
border-radius: var(--radius);
|
||||
overflow: hidden;
|
||||
transition: border-color 0.15s;
|
||||
}
|
||||
|
||||
.tool-card:hover { border-color: var(--accent-dim); }
|
||||
.tool-card.active { border-color: var(--accent); }
|
||||
|
||||
.tool-card-header {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
justify-content: space-between;
|
||||
padding: 12px 16px;
|
||||
border-bottom: 1px solid var(--border);
|
||||
cursor: pointer;
|
||||
user-select: none;
|
||||
}
|
||||
|
||||
.tool-card-header:hover { background: var(--surface2); }
|
||||
|
||||
.tool-name {
|
||||
font-family: 'SF Mono', 'Fira Code', monospace;
|
||||
font-size: 13px;
|
||||
font-weight: 600;
|
||||
color: var(--accent);
|
||||
}
|
||||
|
||||
.tool-desc {
|
||||
font-size: 12px;
|
||||
color: var(--text2);
|
||||
margin-left: 8px;
|
||||
}
|
||||
|
||||
.tool-card-body {
|
||||
padding: 16px;
|
||||
display: none;
|
||||
}
|
||||
|
||||
.tool-card.open .tool-card-body { display: block; }
|
||||
|
||||
.chevron {
|
||||
color: var(--text2);
|
||||
transition: transform 0.2s;
|
||||
font-size: 12px;
|
||||
}
|
||||
|
||||
.tool-card.open .chevron { transform: rotate(90deg); }
|
||||
|
||||
/* Form fields */
|
||||
.field {
|
||||
margin-bottom: 12px;
|
||||
}
|
||||
|
||||
.field:last-of-type { margin-bottom: 16px; }
|
||||
|
||||
.field label {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
gap: 6px;
|
||||
font-size: 12px;
|
||||
font-weight: 500;
|
||||
color: var(--text2);
|
||||
margin-bottom: 4px;
|
||||
}
|
||||
|
||||
.field label .required {
|
||||
color: var(--red);
|
||||
font-size: 10px;
|
||||
}
|
||||
|
||||
.field label .type-tag {
|
||||
font-size: 10px;
|
||||
padding: 1px 5px;
|
||||
border-radius: 3px;
|
||||
background: var(--surface2);
|
||||
color: var(--text2);
|
||||
font-family: 'SF Mono', 'Fira Code', monospace;
|
||||
}
|
||||
|
||||
.field input, .field select, .field textarea {
|
||||
width: 100%;
|
||||
background: var(--bg);
|
||||
border: 1px solid var(--border);
|
||||
border-radius: 6px;
|
||||
color: var(--text);
|
||||
font-size: 13px;
|
||||
padding: 8px 10px;
|
||||
font-family: 'SF Mono', 'Fira Code', monospace;
|
||||
outline: none;
|
||||
transition: border-color 0.15s;
|
||||
}
|
||||
|
||||
.field input:focus, .field select:focus, .field textarea:focus {
|
||||
border-color: var(--accent);
|
||||
}
|
||||
|
||||
.field textarea { min-height: 60px; resize: vertical; }
|
||||
|
||||
.field input[type="checkbox"] {
|
||||
width: auto;
|
||||
margin-right: 4px;
|
||||
}
|
||||
|
||||
.checkbox-row {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
gap: 6px;
|
||||
padding: 4px 0;
|
||||
}
|
||||
|
||||
.checkbox-row label {
|
||||
margin-bottom: 0;
|
||||
cursor: pointer;
|
||||
}
|
||||
|
||||
/* Buttons */
|
||||
.btn-run {
|
||||
display: inline-flex;
|
||||
align-items: center;
|
||||
gap: 6px;
|
||||
background: var(--accent-dim);
|
||||
color: #fff;
|
||||
border: none;
|
||||
border-radius: 6px;
|
||||
padding: 8px 20px;
|
||||
font-size: 13px;
|
||||
font-weight: 600;
|
||||
cursor: pointer;
|
||||
transition: background 0.15s;
|
||||
}
|
||||
|
||||
.btn-run:hover { background: var(--accent); }
|
||||
.btn-run:disabled { opacity: 0.5; cursor: not-allowed; }
|
||||
.btn-run.running { background: var(--orange); }
|
||||
|
||||
/* Result area */
|
||||
.result-area {
|
||||
margin-top: 12px;
|
||||
display: none;
|
||||
}
|
||||
|
||||
.result-area.visible { display: block; }
|
||||
|
||||
.result-header {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
gap: 8px;
|
||||
margin-bottom: 6px;
|
||||
}
|
||||
|
||||
.result-status {
|
||||
font-size: 12px;
|
||||
font-weight: 600;
|
||||
padding: 2px 8px;
|
||||
border-radius: 4px;
|
||||
}
|
||||
|
||||
.result-status.ok { background: rgba(63,185,80,0.15); color: var(--green); }
|
||||
.result-status.error { background: rgba(248,81,73,0.15); color: var(--red); }
|
||||
|
||||
.result-duration {
|
||||
font-size: 11px;
|
||||
color: var(--text2);
|
||||
}
|
||||
|
||||
.result-json {
|
||||
background: var(--bg);
|
||||
border: 1px solid var(--border);
|
||||
border-radius: 6px;
|
||||
padding: 12px;
|
||||
font-family: 'SF Mono', 'Fira Code', monospace;
|
||||
font-size: 12px;
|
||||
line-height: 1.6;
|
||||
max-height: 300px;
|
||||
overflow: auto;
|
||||
white-space: pre-wrap;
|
||||
word-break: break-word;
|
||||
}
|
||||
|
||||
.result-screenshot {
|
||||
max-width: 100%;
|
||||
border: 1px solid var(--border);
|
||||
border-radius: 6px;
|
||||
margin-top: 8px;
|
||||
}
|
||||
|
||||
/* History panel */
|
||||
.history-panel {
|
||||
width: 320px;
|
||||
min-width: 320px;
|
||||
background: var(--surface);
|
||||
border-left: 1px solid var(--border);
|
||||
overflow-y: auto;
|
||||
padding: 12px;
|
||||
}
|
||||
|
||||
.history-title {
|
||||
font-size: 12px;
|
||||
font-weight: 600;
|
||||
text-transform: uppercase;
|
||||
letter-spacing: 0.5px;
|
||||
color: var(--text2);
|
||||
padding: 4px 4px 8px;
|
||||
border-bottom: 1px solid var(--border);
|
||||
margin-bottom: 8px;
|
||||
}
|
||||
|
||||
.history-item {
|
||||
padding: 8px;
|
||||
border-radius: 6px;
|
||||
margin-bottom: 4px;
|
||||
cursor: pointer;
|
||||
transition: background 0.1s;
|
||||
border: 1px solid transparent;
|
||||
}
|
||||
|
||||
.history-item:hover {
|
||||
background: var(--surface2);
|
||||
}
|
||||
|
||||
.history-item-tool {
|
||||
font-family: 'SF Mono', 'Fira Code', monospace;
|
||||
font-size: 12px;
|
||||
font-weight: 600;
|
||||
}
|
||||
|
||||
.history-item-tool.ok { color: var(--green); }
|
||||
.history-item-tool.error { color: var(--red); }
|
||||
|
||||
.history-item-time {
|
||||
font-size: 11px;
|
||||
color: var(--text2);
|
||||
}
|
||||
|
||||
.history-item-params {
|
||||
font-size: 11px;
|
||||
color: var(--text2);
|
||||
font-family: 'SF Mono', 'Fira Code', monospace;
|
||||
white-space: nowrap;
|
||||
overflow: hidden;
|
||||
text-overflow: ellipsis;
|
||||
max-width: 280px;
|
||||
}
|
||||
|
||||
.history-empty {
|
||||
color: var(--text2);
|
||||
font-size: 13px;
|
||||
text-align: center;
|
||||
padding: 24px 0;
|
||||
}
|
||||
|
||||
.clear-history {
|
||||
background: none;
|
||||
border: none;
|
||||
color: var(--text2);
|
||||
font-size: 11px;
|
||||
cursor: pointer;
|
||||
float: right;
|
||||
padding: 0;
|
||||
}
|
||||
.clear-history:hover { color: var(--red); }
|
||||
|
||||
/* View mode toggle */
|
||||
.view-toggle {
|
||||
display: flex;
|
||||
gap: 4px;
|
||||
background: var(--surface2);
|
||||
border-radius: 6px;
|
||||
padding: 2px;
|
||||
}
|
||||
|
||||
.view-toggle button {
|
||||
background: none;
|
||||
border: none;
|
||||
color: var(--text2);
|
||||
font-size: 12px;
|
||||
padding: 4px 12px;
|
||||
border-radius: 4px;
|
||||
cursor: pointer;
|
||||
}
|
||||
|
||||
.view-toggle button.active {
|
||||
background: var(--accent-dim);
|
||||
color: #fff;
|
||||
}
|
||||
|
||||
/* Scrollbar */
|
||||
::-webkit-scrollbar { width: 8px; height: 8px; }
|
||||
::-webkit-scrollbar-track { background: transparent; }
|
||||
::-webkit-scrollbar-thumb { background: var(--border); border-radius: 4px; }
|
||||
::-webkit-scrollbar-thumb:hover { background: var(--text2); }
|
||||
</style>
|
||||
</head>
|
||||
<body>
|
||||
|
||||
<header>
|
||||
<div style="display:flex;align-items:center;gap:16px;">
|
||||
<h1>Browser Remote Control</h1>
|
||||
<div class="view-toggle">
|
||||
<button class="active" onclick="setView('grid')">Grid</button>
|
||||
<button onclick="setView('single')">Focus</button>
|
||||
</div>
|
||||
</div>
|
||||
<div style="display:flex;align-items:center;gap:12px;">
|
||||
<span id="context-info" style="font-size:12px;color:var(--text2)"></span>
|
||||
<span id="status-badge" class="checking">checking...</span>
|
||||
</div>
|
||||
</header>
|
||||
|
||||
<div class="layout">
|
||||
<nav class="sidebar" id="sidebar"></nav>
|
||||
<main class="main" id="main-content"></main>
|
||||
<aside class="history-panel" id="history-panel">
|
||||
<div class="history-title">
|
||||
History
|
||||
<button class="clear-history" onclick="clearHistory()">clear</button>
|
||||
</div>
|
||||
<div id="history-list">
|
||||
<div class="history-empty">No calls yet</div>
|
||||
</div>
|
||||
</aside>
|
||||
</div>
|
||||
|
||||
<script>
|
||||
const API_BASE = window.location.origin;
|
||||
let toolSchemas = {};
|
||||
let history = [];
|
||||
let currentView = 'grid';
|
||||
|
||||
// Tool categories for sidebar grouping
|
||||
const CATEGORIES = {
|
||||
'Lifecycle': ['browser_setup', 'browser_start', 'browser_stop', 'browser_status'],
|
||||
'Tabs': ['browser_tabs', 'browser_open', 'browser_close', 'browser_close_all', 'browser_close_finished', 'browser_focus'],
|
||||
'Navigation': ['browser_navigate', 'browser_go_back', 'browser_go_forward', 'browser_reload'],
|
||||
'Interactions': ['browser_click', 'browser_click_coordinate', 'browser_type', 'browser_type_focused', 'browser_fill', 'browser_press', 'browser_press_at', 'browser_hover', 'browser_hover_coordinate', 'browser_select', 'browser_scroll', 'browser_drag'],
|
||||
'Inspection': ['browser_screenshot', 'browser_snapshot', 'browser_console', 'browser_html', 'browser_get_text', 'browser_get_attribute', 'browser_get_rect', 'browser_shadow_query', 'browser_evaluate', 'browser_wait'],
|
||||
'Advanced': ['browser_resize', 'browser_upload', 'browser_dialog'],
|
||||
};
|
||||
|
||||
async function init() {
|
||||
await checkStatus();
|
||||
await loadTools();
|
||||
setInterval(checkStatus, 5000);
|
||||
}
|
||||
|
||||
async function checkStatus() {
|
||||
const badge = document.getElementById('status-badge');
|
||||
const ctx = document.getElementById('context-info');
|
||||
try {
|
||||
const res = await fetch(`${API_BASE}/status`);
|
||||
const data = await res.json();
|
||||
if (data.connected) {
|
||||
badge.textContent = 'connected';
|
||||
badge.className = 'connected';
|
||||
if (data.tools_count) {
|
||||
ctx.textContent = `${data.tools_count} tools`;
|
||||
} else if (data.contexts) {
|
||||
const contexts = Object.entries(data.contexts);
|
||||
ctx.textContent = contexts.length > 0
|
||||
? contexts.map(([k,v]) => `${k}: tab ${v.activeTabId}`).join(', ')
|
||||
: 'no active context';
|
||||
} else {
|
||||
ctx.textContent = '';
|
||||
}
|
||||
} else {
|
||||
badge.textContent = 'disconnected';
|
||||
badge.className = 'disconnected';
|
||||
ctx.textContent = '';
|
||||
}
|
||||
} catch {
|
||||
badge.textContent = 'unreachable';
|
||||
badge.className = 'disconnected';
|
||||
ctx.textContent = '';
|
||||
}
|
||||
}
|
||||
|
||||
async function loadTools() {
|
||||
try {
|
||||
const res = await fetch(`${API_BASE}/tools`);
|
||||
toolSchemas = await res.json();
|
||||
renderSidebar();
|
||||
renderToolCards();
|
||||
} catch (e) {
|
||||
document.getElementById('main-content').innerHTML =
|
||||
`<div style="color:var(--red);padding:40px;">Failed to load tools: ${e.message}</div>`;
|
||||
}
|
||||
}
|
||||
|
||||
function renderSidebar() {
|
||||
const sidebar = document.getElementById('sidebar');
|
||||
let html = '';
|
||||
const categorized = new Set();
|
||||
for (const [group, tools] of Object.entries(CATEGORIES)) {
|
||||
const available = tools.filter(t => toolSchemas[t]);
|
||||
if (available.length === 0) continue;
|
||||
html += `<div class="sidebar-group"><div class="sidebar-group-label">${group}</div>`;
|
||||
for (const tool of available) {
|
||||
categorized.add(tool);
|
||||
const shortName = tool.replace('browser_', '');
|
||||
html += `<button class="sidebar-item" data-tool="${tool}" onclick="scrollToTool('${tool}')">${shortName}</button>`;
|
||||
}
|
||||
html += '</div>';
|
||||
}
|
||||
// Show any uncategorized tools from the server
|
||||
const other = Object.keys(toolSchemas).filter(t => !categorized.has(t));
|
||||
if (other.length > 0) {
|
||||
html += `<div class="sidebar-group"><div class="sidebar-group-label">Other</div>`;
|
||||
for (const tool of other) {
|
||||
const shortName = tool.replace('browser_', '');
|
||||
html += `<button class="sidebar-item" data-tool="${tool}" onclick="scrollToTool('${tool}')">${shortName}</button>`;
|
||||
}
|
||||
html += '</div>';
|
||||
}
|
||||
sidebar.innerHTML = html;
|
||||
}
|
||||
|
||||
function renderToolCards() {
|
||||
const main = document.getElementById('main-content');
|
||||
let html = '<div class="tools-grid" id="tools-grid">';
|
||||
for (const [tool, schema] of Object.entries(toolSchemas)) {
|
||||
html += buildToolCard(tool, schema);
|
||||
}
|
||||
html += '</div>';
|
||||
main.innerHTML = html;
|
||||
}
|
||||
|
||||
function buildToolCard(tool, schema) {
|
||||
const shortName = tool.replace('browser_', '');
|
||||
let fieldsHtml = '';
|
||||
for (const [param, spec] of Object.entries(schema.params)) {
|
||||
fieldsHtml += buildField(tool, param, spec);
|
||||
}
|
||||
|
||||
return `
|
||||
<div class="tool-card" id="card-${tool}" data-tool="${tool}">
|
||||
<div class="tool-card-header" onclick="toggleCard('${tool}')">
|
||||
<div>
|
||||
<span class="tool-name">${shortName}</span>
|
||||
<span class="tool-desc">${schema.description}</span>
|
||||
</div>
|
||||
<span class="chevron">▶</span>
|
||||
</div>
|
||||
<div class="tool-card-body">
|
||||
<form id="form-${tool}" onsubmit="runTool(event, '${tool}')">
|
||||
${fieldsHtml}
|
||||
<button class="btn-run" type="submit" id="btn-${tool}">Run</button>
|
||||
</form>
|
||||
<div class="result-area" id="result-${tool}"></div>
|
||||
</div>
|
||||
</div>`;
|
||||
}
|
||||
|
||||
function buildField(tool, param, spec) {
|
||||
const id = `${tool}__${param}`;
|
||||
const required = spec.required ? '<span class="required">*</span>' : '';
|
||||
const typeTag = `<span class="type-tag">${spec.type}</span>`;
|
||||
const defaultVal = spec.default !== undefined ? spec.default : '';
|
||||
|
||||
if (spec.type === 'boolean') {
|
||||
return `
|
||||
<div class="field">
|
||||
<div class="checkbox-row">
|
||||
<input type="checkbox" id="${id}" ${defaultVal ? 'checked' : ''}>
|
||||
<label for="${id}">${param} ${typeTag} ${required}</label>
|
||||
</div>
|
||||
</div>`;
|
||||
}
|
||||
|
||||
if (spec.enum) {
|
||||
const opts = spec.enum.map(v => `<option value="${v}" ${v === defaultVal ? 'selected' : ''}>${v}</option>`).join('');
|
||||
return `
|
||||
<div class="field">
|
||||
<label for="${id}">${param} ${typeTag} ${required}</label>
|
||||
<select id="${id}">${opts}</select>
|
||||
</div>`;
|
||||
}
|
||||
|
||||
if (spec.type === 'array') {
|
||||
return `
|
||||
<div class="field">
|
||||
<label for="${id}">${param} ${typeTag} ${required}
|
||||
<span class="type-tag" style="margin-left:2px">JSON</span>
|
||||
</label>
|
||||
<input type="text" id="${id}" placeholder='["value1", "value2"]'>
|
||||
</div>`;
|
||||
}
|
||||
|
||||
// For expression / text that might be multiline
|
||||
if (param === 'expression' || param === 'text') {
|
||||
return `
|
||||
<div class="field">
|
||||
<label for="${id}">${param} ${typeTag} ${required}</label>
|
||||
<textarea id="${id}" placeholder="${param}">${defaultVal}</textarea>
|
||||
</div>`;
|
||||
}
|
||||
|
||||
const inputType = (spec.type === 'integer' || spec.type === 'number') ? 'number' : 'text';
|
||||
const step = spec.type === 'number' ? ' step="any"' : '';
|
||||
|
||||
return `
|
||||
<div class="field">
|
||||
<label for="${id}">${param} ${typeTag} ${required}</label>
|
||||
<input type="${inputType}" id="${id}"${step} placeholder="${defaultVal !== '' ? defaultVal : param}" value="${defaultVal !== '' && spec.type !== 'string' ? defaultVal : ''}">
|
||||
</div>`;
|
||||
}
|
||||
|
||||
function toggleCard(tool) {
|
||||
const card = document.getElementById(`card-${tool}`);
|
||||
const wasOpen = card.classList.contains('open');
|
||||
if (currentView === 'single') {
|
||||
document.querySelectorAll('.tool-card.open').forEach(c => c.classList.remove('open'));
|
||||
}
|
||||
card.classList.toggle('open', !wasOpen);
|
||||
|
||||
// Update sidebar active state
|
||||
document.querySelectorAll('.sidebar-item').forEach(s => s.classList.remove('active'));
|
||||
if (!wasOpen) {
|
||||
const sideItem = document.querySelector(`.sidebar-item[data-tool="${tool}"]`);
|
||||
if (sideItem) sideItem.classList.add('active');
|
||||
}
|
||||
}
|
||||
|
||||
function scrollToTool(tool) {
|
||||
const card = document.getElementById(`card-${tool}`);
|
||||
if (!card) return;
|
||||
|
||||
// Open it
|
||||
if (!card.classList.contains('open')) {
|
||||
if (currentView === 'single') {
|
||||
document.querySelectorAll('.tool-card.open').forEach(c => c.classList.remove('open'));
|
||||
}
|
||||
card.classList.add('open');
|
||||
}
|
||||
|
||||
card.scrollIntoView({ behavior: 'smooth', block: 'start' });
|
||||
|
||||
document.querySelectorAll('.sidebar-item').forEach(s => s.classList.remove('active'));
|
||||
const sideItem = document.querySelector(`.sidebar-item[data-tool="${tool}"]`);
|
||||
if (sideItem) sideItem.classList.add('active');
|
||||
}
|
||||
|
||||
function collectParams(tool) {
|
||||
const schema = toolSchemas[tool];
|
||||
const params = {};
|
||||
for (const [param, spec] of Object.entries(schema.params)) {
|
||||
const el = document.getElementById(`${tool}__${param}`);
|
||||
if (!el) continue;
|
||||
|
||||
if (spec.type === 'boolean') {
|
||||
params[param] = el.checked;
|
||||
} else if (spec.type === 'array') {
|
||||
const v = el.value.trim();
|
||||
if (v) {
|
||||
try { params[param] = JSON.parse(v); }
|
||||
catch { params[param] = v.split(',').map(s => s.trim()); }
|
||||
}
|
||||
} else if (spec.type === 'integer') {
|
||||
const v = el.value.trim();
|
||||
if (v) params[param] = parseInt(v, 10);
|
||||
} else if (spec.type === 'number') {
|
||||
const v = el.value.trim();
|
||||
if (v) params[param] = parseFloat(v);
|
||||
} else {
|
||||
const v = (el.value || '').trim();
|
||||
if (v) params[param] = v;
|
||||
}
|
||||
}
|
||||
return params;
|
||||
}
|
||||
|
||||
async function runTool(event, tool) {
|
||||
event.preventDefault();
|
||||
const btn = document.getElementById(`btn-${tool}`);
|
||||
const resultArea = document.getElementById(`result-${tool}`);
|
||||
|
||||
const params = collectParams(tool);
|
||||
btn.textContent = 'Running...';
|
||||
btn.classList.add('running');
|
||||
btn.disabled = true;
|
||||
|
||||
const startTime = Date.now();
|
||||
let result;
|
||||
try {
|
||||
const res = await fetch(`${API_BASE}/${tool}`, {
|
||||
method: 'POST',
|
||||
headers: { 'Content-Type': 'application/json' },
|
||||
body: JSON.stringify(params),
|
||||
});
|
||||
result = await res.json();
|
||||
} catch (e) {
|
||||
result = { ok: false, error: e.message };
|
||||
}
|
||||
const elapsed = Date.now() - startTime;
|
||||
|
||||
btn.textContent = 'Run';
|
||||
btn.classList.remove('running');
|
||||
btn.disabled = false;
|
||||
|
||||
// Render result
|
||||
const isOk = result.ok !== false;
|
||||
const statusClass = isOk ? 'ok' : 'error';
|
||||
const statusText = isOk ? 'OK' : 'ERROR';
|
||||
const duration = result._duration_ms != null ? `${result._duration_ms}ms` : `${elapsed}ms`;
|
||||
|
||||
let bodyHtml = '';
|
||||
|
||||
// Special handling for screenshot — show the image
|
||||
if (tool === 'browser_screenshot' && result.data) {
|
||||
bodyHtml = `<img class="result-screenshot" src="data:image/png;base64,${result.data}">`;
|
||||
// Don't show the raw base64 in JSON
|
||||
const display = { ...result };
|
||||
display.data = `[${result.data.length} chars base64]`;
|
||||
bodyHtml += `<pre class="result-json">${JSON.stringify(display, null, 2)}</pre>`;
|
||||
} else {
|
||||
bodyHtml = `<pre class="result-json">${JSON.stringify(result, null, 2)}</pre>`;
|
||||
}
|
||||
|
||||
resultArea.innerHTML = `
|
||||
<div class="result-header">
|
||||
<span class="result-status ${statusClass}">${statusText}</span>
|
||||
<span class="result-duration">${duration}</span>
|
||||
</div>
|
||||
${bodyHtml}`;
|
||||
resultArea.classList.add('visible');
|
||||
|
||||
// Add to history
|
||||
addHistory(tool, params, result, duration);
|
||||
}
|
||||
|
||||
function addHistory(tool, params, result, duration) {
|
||||
const entry = {
|
||||
tool,
|
||||
params,
|
||||
result,
|
||||
duration,
|
||||
time: new Date().toLocaleTimeString(),
|
||||
ok: result.ok !== false,
|
||||
};
|
||||
history.unshift(entry);
|
||||
if (history.length > 50) history.pop();
|
||||
renderHistory();
|
||||
}
|
||||
|
||||
function renderHistory() {
|
||||
const list = document.getElementById('history-list');
|
||||
if (history.length === 0) {
|
||||
list.innerHTML = '<div class="history-empty">No calls yet</div>';
|
||||
return;
|
||||
}
|
||||
list.innerHTML = history.map((h, i) => {
|
||||
const shortName = h.tool.replace(/^browser_/, '');
|
||||
const paramsStr = JSON.stringify(h.params);
|
||||
const statusCls = h.ok ? 'ok' : 'error';
|
||||
return `
|
||||
<div class="history-item" onclick="replayHistory(${i})" title="Click to load params">
|
||||
<div style="display:flex;justify-content:space-between;align-items:center;">
|
||||
<span class="history-item-tool ${statusCls}">${shortName}</span>
|
||||
<span class="history-item-time">${h.time} (${h.duration})</span>
|
||||
</div>
|
||||
<div class="history-item-params">${paramsStr}</div>
|
||||
</div>`;
|
||||
}).join('');
|
||||
}
|
||||
|
||||
function replayHistory(idx) {
|
||||
const h = history[idx];
|
||||
const tool = h.tool;
|
||||
|
||||
// Open the card and scroll to it
|
||||
scrollToTool(tool);
|
||||
|
||||
// Fill the form with saved params
|
||||
const schema = toolSchemas[tool];
|
||||
for (const [param, spec] of Object.entries(schema.params)) {
|
||||
const el = document.getElementById(`${tool}__${param}`);
|
||||
if (!el) continue;
|
||||
const val = h.params[param];
|
||||
if (val === undefined) continue;
|
||||
|
||||
if (spec.type === 'boolean') {
|
||||
el.checked = !!val;
|
||||
} else if (spec.type === 'array') {
|
||||
el.value = JSON.stringify(val);
|
||||
} else {
|
||||
el.value = val;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
function clearHistory() {
|
||||
history = [];
|
||||
renderHistory();
|
||||
}
|
||||
|
||||
function setView(mode) {
|
||||
currentView = mode;
|
||||
document.querySelectorAll('.view-toggle button').forEach(b => b.classList.remove('active'));
|
||||
event.target.classList.add('active');
|
||||
const grid = document.getElementById('tools-grid');
|
||||
if (mode === 'single') {
|
||||
grid.style.gridTemplateColumns = '1fr';
|
||||
} else {
|
||||
grid.style.gridTemplateColumns = 'repeat(auto-fill, minmax(420px, 1fr))';
|
||||
}
|
||||
}
|
||||
|
||||
init();
|
||||
</script>
|
||||
</body>
|
||||
</html>
|
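For scripting the same call outside the page, the request that `runTool` issues (a JSON `POST` to `${API_BASE}/${tool}`) could be reproduced roughly as follows. This is a minimal sketch: the base URL is an assumption, since the value of `API_BASE` is not shown in this diff (port 9230 is taken from the tool-schema module's docstring), and `build_tool_request` is a hypothetical helper name.

```python
import json
import urllib.request

# Assumption: API_BASE is not shown in the diff; this value is illustrative.
API_BASE = "http://127.0.0.1:9230"


def build_tool_request(tool: str, params: dict) -> urllib.request.Request:
    """Build (but do not send) the JSON POST that runTool issues."""
    body = json.dumps(params).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/{tool}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_tool_request("browser_click", {"selector": "#submit"})
# Sending would be urllib.request.urlopen(req), which needs a running bridge.
```

Building the request separately from sending it keeps the payload easy to inspect before anything touches the browser.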
||||
@@ -88,13 +88,13 @@ Find Textarea (it is hidden inside shadow DOM):
|
||||
```
|
||||
Click that coordinate, `sleep(1)`.
|
||||
|
||||
Inject text and Send:
|
||||
Type the message:
|
||||
Construct the message: `Hey {first_name}, thanks for the connection invite! I'm currently building a prediction market for jobs: https://honeycomb.open-hive.com/. If you could check it out and share some feedback, I'd really appreciate it.`
|
||||
|
||||
Escape the string properly for JS injection, then run:
|
||||
```javascript
|
||||
// Replace MSG_TEXT with your actual string
|
||||
browser_evaluate("(function(){ document.execCommand('insertText', false, `MSG_TEXT`); return true; })()")
|
||||
Use `browser_type_focused` — it dispatches CDP `Input.insertText` to the already-focused composer (document.activeElement), which works through shadow DOM without JSON-escaping issues:
|
||||
```
|
||||
browser_type_focused(text=message_text)
|
||||
sleep(1.0)
|
||||
```
|
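Where text still has to be embedded in a `browser_evaluate` expression, one way to avoid hand-escaping is to let `json.dumps` produce the string literal, which is the same approach the bridge code in this change uses for selectors. A minimal sketch; `js_insert_text_expr` is a hypothetical helper, not part of the toolset:

```python
import json


def js_insert_text_expr(message: str) -> str:
    """Return a JS expression that inserts `message` into the focused element."""
    # json.dumps escapes quotes, backslashes and newlines, and its output is
    # also a valid JavaScript string literal.
    literal = json.dumps(message)
    return (
        "(function(){ document.execCommand('insertText', false, "
        + literal
        + "); return true; })()"
    )


expr = js_insert_text_expr('Hey "Sam",\nthanks `for` connecting!')
```

Unlike a template literal, a double-quoted literal built this way is not affected by backticks or `${` sequences in the message.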
||||
|
||||
Find Send button (also inside shadow DOM):
|
||||
|
||||
@@ -45,8 +45,8 @@ def register_tools(mcp: FastMCP) -> None:
|
||||
- Tabs: browser_tabs, browser_open, browser_close, browser_focus
|
||||
- Navigation: browser_navigate, browser_go_back, browser_go_forward, browser_reload
|
||||
- Inspection: browser_screenshot, browser_snapshot, browser_console
|
||||
- Interactions: browser_click, browser_click_coordinate, browser_type, browser_fill,
|
||||
browser_press, browser_hover, browser_select, browser_scroll, browser_drag
|
||||
- Interactions: browser_click, browser_click_coordinate, browser_type, browser_type_focused,
|
||||
browser_fill, browser_press, browser_hover, browser_select, browser_scroll, browser_drag
|
||||
- Advanced: browser_wait, browser_evaluate, browser_get_text, browser_get_attribute,
|
||||
browser_resize, browser_upload, browser_dialog
|
||||
"""
|
||||
|
||||
+265
-97
@@ -80,6 +80,37 @@ async def _adaptive_poll_sleep(elapsed_s: float) -> None:
|
||||
_interaction_highlights: dict[int, dict] = {}
|
||||
|
||||
|
||||
# Compact descriptor of document.activeElement. Returned by both click()
|
||||
# and click_coordinate() so the agent can verify it focused what it
|
||||
# intended, then decide whether to follow up with browser_type_focused(text=...).
|
||||
# Keeping this as a single shared string avoids drift
|
||||
# between the two click paths.
|
||||
_FOCUSED_ELEMENT_JS = """
|
||||
(function() {
|
||||
var el = document.activeElement;
|
||||
if (!el || el === document.body) return null;
|
||||
var rect = el.getBoundingClientRect();
|
||||
var attrs = {};
|
||||
for (var i = 0; i < el.attributes.length && i < 10; i++) {
|
||||
attrs[el.attributes[i].name] = el.attributes[i].value.substring(0, 200);
|
||||
}
|
||||
return {
|
||||
tag: el.tagName.toLowerCase(),
|
||||
id: el.id || null,
|
||||
className: el.className || null,
|
||||
name: el.getAttribute('name') || null,
|
||||
type: el.getAttribute('type') || null,
|
||||
role: el.getAttribute('role') || null,
|
||||
contenteditable: el.getAttribute('contenteditable') || null,
|
||||
text: (el.innerText || '').substring(0, 200),
|
||||
value: (el.value !== undefined ? String(el.value).substring(0, 200) : null),
|
||||
attributes: attrs,
|
||||
rect: { x: rect.x, y: rect.y, width: rect.width, height: rect.height }
|
||||
};
|
||||
})()
|
||||
"""
|
||||
|
||||
|
||||
def _get_active_profile() -> str:
|
||||
"""Get the current active profile from context variable."""
|
||||
try:
|
||||
@@ -763,7 +794,8 @@ class BeelineBridge:
|
||||
rx = value.get("x", 0) - value.get("width", 0) / 2
|
||||
ry = value.get("y", 0) - value.get("height", 0) / 2
|
||||
await self.highlight_rect(tab_id, rx, ry, value.get("width", 0), value.get("height", 0), label=selector)
|
||||
return {
|
||||
focused_info = await self._read_focused_element(tab_id)
|
||||
resp = {
|
||||
"ok": True,
|
||||
"action": "click",
|
||||
"selector": selector,
|
||||
@@ -771,6 +803,9 @@ class BeelineBridge:
|
||||
"y": value.get("y", 0),
|
||||
"method": "javascript",
|
||||
}
|
||||
if focused_info:
|
||||
resp["focused_element"] = focused_info
|
||||
return resp
|
||||
|
||||
# If JavaScript click failed, try CDP approach
|
||||
if isinstance(value, dict) and value.get("error"):
|
||||
@@ -883,7 +918,8 @@ class BeelineBridge:
|
||||
w = bounds_value.get("width", 0)
|
||||
h = bounds_value.get("height", 0)
|
||||
await self.highlight_rect(tab_id, x - w / 2, y - h / 2, w, h, label=selector)
|
||||
return {
|
||||
focused_info = await self._read_focused_element(tab_id)
|
||||
resp = {
|
||||
"ok": True,
|
||||
"action": "click",
|
||||
"selector": selector,
|
||||
@@ -891,10 +927,29 @@ class BeelineBridge:
|
||||
"y": y,
|
||||
"method": "cdp",
|
||||
}
|
||||
if focused_info:
|
||||
resp["focused_element"] = focused_info
|
||||
return resp
|
||||
|
||||
except Exception as e:
|
||||
return {"ok": False, "error": f"Click failed: {e}"}
|
||||
|
||||
async def _read_focused_element(self, tab_id: int) -> dict | None:
|
||||
"""Read document.activeElement and return a compact descriptor.
|
||||
|
||||
Returns None on any failure — never raises. Used by both click
|
||||
paths (selector-based click() and click_coordinate()) so the
|
||||
agent gets the same response shape regardless of which one was
|
||||
called. The descriptor lets the agent answer "did my click land
|
||||
on an editable element?" without a second round-trip.
|
||||
"""
|
||||
try:
|
||||
await self._try_enable_domain(tab_id, "Runtime")
|
||||
result = await self.evaluate(tab_id, _FOCUSED_ELEMENT_JS)
|
||||
return (result or {}).get("result")
|
||||
except Exception:
|
||||
return None
|
||||
|
||||
async def click_coordinate(self, tab_id: int, x: float, y: float, button: str = "left") -> dict:
|
||||
"""Click at specific coordinates."""
|
||||
await self.cdp_attach(tab_id)
|
||||
@@ -930,15 +985,20 @@ class BeelineBridge:
|
||||
)
|
||||
|
||||
await self.highlight_point(tab_id, x, y, label=f"click ({x},{y})")
|
||||
return {"ok": True, "action": "click_coordinate", "x": x, "y": y}
|
||||
|
||||
focused_info = await self._read_focused_element(tab_id)
|
||||
resp = {"ok": True, "action": "click_coordinate", "x": x, "y": y}
|
||||
if focused_info:
|
||||
resp["focused_element"] = focused_info
|
||||
return resp
|
||||
|
||||
async def type_text(
|
||||
self,
|
||||
tab_id: int,
|
||||
selector: str,
|
||||
selector: str | None,
|
||||
text: str,
|
||||
clear_first: bool = True,
|
||||
delay_ms: int = 0,
|
||||
delay_ms: int = 1,
|
||||
timeout_ms: int = 30000,
|
||||
use_insert_text: bool = True,
|
||||
) -> dict:
|
||||
@@ -974,79 +1034,98 @@ class BeelineBridge:
|
||||
await self._try_enable_domain(tab_id, "Input")
|
||||
await self._try_enable_domain(tab_id, "Runtime")
|
||||
|
||||
# Find + scroll + (optionally) clear via JS. We still need the
|
||||
# rect, and clearing via `.value = ''` / `.textContent = ''`
|
||||
# is the most reliable way to reset pre-existing content.
|
||||
focus_script = f"""
|
||||
(function() {{
|
||||
const el = document.querySelector({json.dumps(selector)});
|
||||
if (!el) return null;
|
||||
if selector is not None:
|
||||
# Find + scroll + (optionally) clear via JS. We still need the
|
||||
# rect, and clearing via `.value = ''` / `.textContent = ''`
|
||||
# is the most reliable way to reset pre-existing content.
|
||||
focus_script = f"""
|
||||
(function() {{
|
||||
const el = document.querySelector({json.dumps(selector)});
|
||||
if (!el) return null;
|
||||
|
||||
// Scroll into view so the click lands in-viewport.
|
||||
el.scrollIntoView({{ block: 'center' }});
|
||||
// Scroll into view so the click lands in-viewport.
|
||||
el.scrollIntoView({{ block: 'center' }});
|
||||
|
||||
// Clear if requested.
|
||||
if ({str(clear_first).lower()}) {{
|
||||
if (el.value !== undefined) {{
|
||||
el.value = '';
|
||||
// Nudge React's onChange — the framework reads
|
||||
// .value via a setter hook, and without firing
|
||||
// an input event the component state remains
|
||||
// stale after our value assignment.
|
||||
el.dispatchEvent(new Event('input', {{bubbles: true}}));
|
||||
}} else if (el.isContentEditable) {{
|
||||
el.textContent = '';
|
||||
el.dispatchEvent(new Event('input', {{bubbles: true}}));
|
||||
// Clear if requested.
|
||||
if ({str(clear_first).lower()}) {{
|
||||
if (el.value !== undefined) {{
|
||||
el.value = '';
|
||||
// Nudge React's onChange — the framework reads
|
||||
// .value via a setter hook, and without firing
|
||||
// an input event the component state remains
|
||||
// stale after our value assignment.
|
||||
el.dispatchEvent(new Event('input', {{bubbles: true}}));
|
||||
}} else if (el.isContentEditable) {{
|
||||
el.textContent = '';
|
||||
el.dispatchEvent(new Event('input', {{bubbles: true}}));
|
||||
}}
|
||||
}}
|
||||
}}
|
||||
|
||||
const r = el.getBoundingClientRect();
|
||||
return {{
|
||||
x: r.left + r.width / 2,
|
||||
y: r.top + r.height / 2,
|
||||
w: r.width,
|
||||
h: r.height,
|
||||
}};
|
||||
}})();
|
||||
"""
|
||||
const r = el.getBoundingClientRect();
|
||||
return {{
|
||||
x: r.left + r.width / 2,
|
||||
y: r.top + r.height / 2,
|
||||
w: r.width,
|
||||
h: r.height,
|
||||
}};
|
||||
}})();
|
||||
"""
|
||||
|
||||
focus_result = await self.evaluate(tab_id, focus_script)
|
||||
rect = (focus_result or {}).get("result")
|
||||
|
||||
if not rect:
|
||||
# Element not found — wait + retry until timeout.
|
||||
deadline = asyncio.get_event_loop().time() + timeout_ms / 1000
|
||||
while asyncio.get_event_loop().time() < deadline:
|
||||
result = await self.evaluate(tab_id, focus_script)
|
||||
rect = (result or {}).get("result") if result else None
|
||||
if rect:
|
||||
break
|
||||
await asyncio.sleep(0.1)
|
||||
focus_result = await self.evaluate(tab_id, focus_script)
|
||||
rect = (focus_result or {}).get("result")
|
||||
|
||||
if not rect:
|
||||
return {"ok": False, "error": f"Element not found: {selector}"}
|
||||
# Element not found — wait + retry until timeout.
|
||||
deadline = asyncio.get_event_loop().time() + timeout_ms / 1000
|
||||
while asyncio.get_event_loop().time() < deadline:
|
||||
result = await self.evaluate(tab_id, focus_script)
|
||||
rect = (result or {}).get("result") if result else None
|
||||
if rect:
|
||||
break
|
||||
await asyncio.sleep(0.1)
|
||||
|
||||
if not rect.get("w") or not rect.get("h"):
|
||||
return {
|
||||
"ok": False,
|
||||
"error": f"Element has zero dimensions, can't click to focus: {selector}",
|
||||
}
|
||||
if not rect:
|
||||
return {"ok": False, "error": f"Element not found: {selector}"}
|
||||
|
||||
# Fire a real CDP pointer click at the element's center. This is
|
||||
# what unblocks rich-text editors — JS el.focus() is not enough.
|
||||
click_x = rect["x"]
|
||||
click_y = rect["y"]
|
||||
await self._cdp(
|
||||
tab_id,
|
||||
"Input.dispatchMouseEvent",
|
||||
{"type": "mousePressed", "x": click_x, "y": click_y, "button": "left", "clickCount": 1},
|
||||
)
|
||||
await self._cdp(
|
||||
tab_id,
|
||||
"Input.dispatchMouseEvent",
|
||||
{"type": "mouseReleased", "x": click_x, "y": click_y, "button": "left", "clickCount": 1},
|
||||
)
|
||||
await asyncio.sleep(0.15) # Let focus / editor-init animations settle.
|
||||
if not rect.get("w") or not rect.get("h"):
|
||||
return {
|
||||
"ok": False,
|
||||
"error": f"Element has zero dimensions, can't click to focus: {selector}",
|
||||
}
|
||||
|
||||
# Fire a real CDP pointer click at the element's center. This is
|
||||
# what unblocks rich-text editors — JS el.focus() is not enough.
|
||||
click_x = rect["x"]
|
||||
click_y = rect["y"]
|
||||
await self._cdp(
|
||||
tab_id,
|
||||
"Input.dispatchMouseEvent",
|
||||
{"type": "mousePressed", "x": click_x, "y": click_y, "button": "left", "clickCount": 1},
|
||||
)
|
||||
await self._cdp(
|
||||
tab_id,
|
||||
"Input.dispatchMouseEvent",
|
||||
{"type": "mouseReleased", "x": click_x, "y": click_y, "button": "left", "clickCount": 1},
|
||||
)
|
||||
await asyncio.sleep(0.15) # Let focus / editor-init animations settle.
|
||||
else:
|
||||
# No selector — assume the caller already focused the target
|
||||
# element (e.g. via browser_click_coordinate). Just clear the
|
||||
# active element if requested, then insert text directly.
|
||||
if clear_first:
|
||||
await self.evaluate(tab_id, """
|
||||
(function() {
|
||||
const el = document.activeElement;
|
||||
if (!el) return;
|
||||
if (el.value !== undefined) {
|
||||
el.value = '';
|
||||
el.dispatchEvent(new Event('input', {bubbles: true}));
|
||||
} else if (el.isContentEditable) {
|
||||
el.textContent = '';
|
||||
el.dispatchEvent(new Event('input', {bubbles: true}));
|
||||
}
|
||||
})();
|
||||
""")
|
||||
|
||||
if use_insert_text and delay_ms <= 0:
|
||||
# CDP Input.insertText is the most reliable way to insert
|
||||
@@ -1086,16 +1165,36 @@ class BeelineBridge:
|
||||
await asyncio.sleep(delay_ms / 1000)
|
||||
|
||||
# Highlight the element that was typed into
|
||||
rect_result = await self.evaluate(
|
||||
tab_id,
|
||||
f"(function(){{const el=document.querySelector("
|
||||
f"{json.dumps(selector)});if(!el)return null;"
|
||||
f"const r=el.getBoundingClientRect();"
|
||||
f"return{{x:r.left,y:r.top,w:r.width,h:r.height}};}})()",
|
||||
)
|
||||
rect = (rect_result or {}).get("result")
|
||||
if rect:
|
||||
await self.highlight_rect(tab_id, rect["x"], rect["y"], rect["w"], rect["h"], label=selector)
|
||||
if selector is not None:
|
||||
rect_result = await self.evaluate(
|
||||
tab_id,
|
||||
f"(function(){{const el=document.querySelector("
|
||||
f"{json.dumps(selector)});if(!el)return null;"
|
||||
f"const r=el.getBoundingClientRect();"
|
||||
f"return{{x:r.left,y:r.top,w:r.width,h:r.height}};}})()",
|
||||
)
|
||||
rect = (rect_result or {}).get("result")
|
||||
if rect:
|
||||
await self.highlight_rect(tab_id, rect["x"], rect["y"], rect["w"], rect["h"], label=selector)
|
||||
else:
|
||||
# Highlight the active element when no selector was provided.
|
||||
# Drill into same-origin iframes to find the real focused
|
||||
# element — the top-level activeElement may be a full-screen
|
||||
# iframe whose rect covers the entire viewport.
|
||||
rect_result = await self.evaluate(
|
||||
tab_id,
|
||||
"(function(){"
|
||||
"var el=document.activeElement;"
|
||||
"try{while(el&&el.tagName==='IFRAME'&&el.contentDocument){"
|
||||
"el=el.contentDocument.activeElement;"
|
||||
"}}catch(e){}"
|
||||
"if(!el||el===document.body||el===document.documentElement)return null;"
|
||||
"const r=el.getBoundingClientRect();"
|
||||
"return{x:r.left,y:r.top,w:r.width,h:r.height};})()",
|
||||
)
|
||||
rect = (rect_result or {}).get("result")
|
||||
if rect:
|
||||
await self.highlight_rect(tab_id, rect["x"], rect["y"], rect["w"], rect["h"], label="active element", border_style="dashed")
|
||||
return {"ok": True, "action": "type", "selector": selector, "length": len(text)}
|
||||
|
||||
# CDP Input.dispatchKeyEvent modifiers bitmask.
|
||||
@@ -1465,6 +1564,7 @@ class BeelineBridge:
|
||||
h: float,
|
||||
label: str = "",
|
||||
color: dict | None = None,
|
||||
border_style: str = "solid",
|
||||
) -> None:
|
||||
"""Inject a visible highlight overlay into the page DOM.
|
||||
|
||||
@@ -1493,7 +1593,7 @@ class BeelineBridge:
|
||||
box.id = '__hive_hl';
|
||||
box.style.cssText = 'position:fixed;z-index:2147483647;pointer-events:none;'
|
||||
+ 'left:{int(x)}px;top:{int(y)}px;width:{max(1, int(w))}px;height:{max(1, int(h))}px;'
|
||||
+ 'border:2px solid {border_rgb};background:{bg_rgba};'
|
||||
+ 'border:2px {border_style} {border_rgb};background:{bg_rgba};'
|
||||
+ 'border-radius:3px;transition:opacity 0.4s ease;opacity:1;'
|
||||
+ 'box-shadow:0 0 8px {bg_rgba};';
|
||||
|
||||
@@ -1836,7 +1936,7 @@ class BeelineBridge:
|
||||
"result": value,
|
||||
}
|
||||
|
||||
async def snapshot(self, tab_id: int, timeout_s: float = 30.0) -> dict:
|
||||
async def snapshot(self, tab_id: int, timeout_s: float = 30.0, mode: str = "default") -> dict:
|
||||
"""Get an accessibility snapshot of the page.
|
||||
|
||||
Uses a hybrid approach:
|
||||
@@ -1847,6 +1947,7 @@ class BeelineBridge:
|
||||
Args:
|
||||
tab_id: The tab ID to snapshot
|
||||
timeout_s: Maximum time to spend building snapshot (default 30s)
|
||||
mode: Filtering mode — "default", "simple", or "interactive"
|
||||
"""
|
||||
try:
|
||||
async with asyncio.timeout(timeout_s):
|
||||
@@ -1878,8 +1979,11 @@ class BeelineBridge:
|
||||
)
|
||||
return await self._dom_snapshot(tab_id)
|
||||
|
||||
# Clean redundant InlineTextBox children before formatting
|
||||
nodes = self._clean_inline_text_boxes(nodes)
|
||||
|
||||
# Format the accessibility tree (with node limit)
|
||||
snapshot = self._format_ax_tree(nodes, max_nodes=2000)
|
||||
snapshot = self._format_ax_tree(nodes, max_nodes=2000, mode=mode)
|
||||
|
||||
# Get URL
|
||||
url_result = await self._cdp(
|
||||
@@ -2013,13 +2117,78 @@ class BeelineBridge:
|
||||
"tree": "\n".join(lines),
|
||||
}
|
||||
|
||||
def _format_ax_tree(self, nodes: list[dict], max_nodes: int = 2000) -> str:
|
||||
@staticmethod
|
||||
def _clean_inline_text_boxes(nodes: list[dict]) -> list[dict]:
|
||||
"""Remove redundant InlineTextBox children from StaticText nodes.
|
||||
|
||||
If a StaticText node has 3+ InlineTextBox children and ALL their
|
||||
text is already contained in the StaticText's name, remove all
|
||||
the InlineTextBox children (they add no information).
|
||||
"""
|
||||
by_id = {n["nodeId"]: n for n in nodes}
|
||||
children_map: dict[str, list[str]] = {}
|
||||
for n in nodes:
|
||||
for child_id in n.get("childIds", []):
|
||||
children_map.setdefault(n["nodeId"], []).append(child_id)
|
||||
|
||||
ids_to_remove: set[str] = set()
|
||||
|
||||
for n in nodes:
|
||||
role_info = n.get("role", {})
|
||||
role = role_info.get("value", "") if isinstance(role_info, dict) else str(role_info)
|
||||
if role != "StaticText":
|
||||
continue
|
||||
|
||||
child_ids = children_map.get(n["nodeId"], [])
|
||||
if len(child_ids) < 3:
|
||||
continue
|
||||
|
||||
name_info = n.get("name", {})
|
||||
parent_name = name_info.get("value", "") if isinstance(name_info, dict) else str(name_info)
|
||||
if not parent_name:
|
||||
continue
|
||||
|
||||
all_inline = True
|
||||
for cid in child_ids:
|
||||
child = by_id.get(cid)
|
||||
if not child:
|
||||
all_inline = False
|
||||
break
|
||||
child_role_info = child.get("role", {})
|
||||
child_role = (
|
||||
child_role_info.get("value", "") if isinstance(child_role_info, dict) else str(child_role_info)
|
||||
)
|
||||
if child_role != "InlineTextBox":
|
||||
all_inline = False
|
||||
break
|
||||
child_name_info = child.get("name", {})
|
||||
child_name = (
|
||||
child_name_info.get("value", "") if isinstance(child_name_info, dict) else str(child_name_info)
|
||||
)
|
||||
if child_name and child_name not in parent_name:
|
||||
all_inline = False
|
||||
break
|
||||
|
||||
if all_inline:
|
||||
ids_to_remove.update(child_ids)
|
||||
n["childIds"] = []
|
||||
|
||||
if not ids_to_remove:
|
||||
return nodes
|
||||
|
||||
return [n for n in nodes if n["nodeId"] not in ids_to_remove]
|
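The rule in `_clean_inline_text_boxes` can be illustrated on a toy node list. This is a simplified sketch, not the bridge's implementation: it assumes flat `role` and `name` strings instead of CDP's `{"value": ...}` wrappers, returns copies rather than mutating the input, and `drop_redundant_inline_boxes` is a hypothetical name.

```python
def drop_redundant_inline_boxes(nodes: list[dict]) -> list[dict]:
    """Drop InlineTextBox children whose text the StaticText parent already has."""
    by_id = {n["nodeId"]: n for n in nodes}
    removed: set[str] = set()
    for n in nodes:
        kids = [by_id.get(c) for c in n.get("childIds", [])]
        # Only StaticText parents with a name and at least three children qualify.
        if n["role"] != "StaticText" or len(kids) < 3 or not n.get("name"):
            continue
        if all(
            k and k["role"] == "InlineTextBox" and k.get("name", "") in n["name"]
            for k in kids
        ):
            removed.update(n["childIds"])
    return [
        {**n, "childIds": [c for c in n.get("childIds", []) if c not in removed]}
        for n in nodes
        if n["nodeId"] not in removed
    ]


nodes = [
    {"nodeId": "1", "role": "StaticText", "name": "Hello big world", "childIds": ["2", "3", "4"]},
    {"nodeId": "2", "role": "InlineTextBox", "name": "Hello "},
    {"nodeId": "3", "role": "InlineTextBox", "name": "big "},
    {"nodeId": "4", "role": "InlineTextBox", "name": "world"},
]
cleaned = drop_redundant_inline_boxes(nodes)
```

In this example only the `StaticText` node survives, with an empty `childIds` list.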
||||
|
||||
def _format_ax_tree(self, nodes: list[dict], max_nodes: int = 2000, mode: str = "default") -> str:
|
||||
"""Format a CDP Accessibility.getFullAXTree result.
|
||||
|
||||
Args:
|
||||
nodes: List of accessibility tree nodes
|
||||
max_nodes: Maximum number of nodes to process (prevents hangs on huge trees)
|
||||
mode: Filtering mode — "default" (full tree), "simple" (interactive +
|
||||
content, skip unnamed structural), "interactive" (interactive only)
|
||||
"""
|
||||
from .refs import INTERACTIVE_ROLES, STRUCTURAL_ROLES
|
||||
|
||||
if not nodes:
|
||||
return "(empty tree)"
|
||||
|
||||
@@ -2059,11 +2228,21 @@ class BeelineBridge:
|
||||
_walk(cid, depth)
|
||||
return
|
||||
|
||||
node_counter[0] += 1
|
||||
|
||||
name_info = node.get("name", {})
|
||||
name = name_info.get("value", "") if isinstance(name_info, dict) else str(name_info)
|
||||
|
||||
# Mode-based filtering — skip node but walk children at same depth
|
||||
if mode == "interactive" and role not in INTERACTIVE_ROLES:
|
||||
for cid in children_map.get(node_id, []):
|
||||
_walk(cid, depth)
|
||||
return
|
||||
if mode == "simple" and role in STRUCTURAL_ROLES and not name:
|
||||
for cid in children_map.get(node_id, []):
|
||||
_walk(cid, depth)
|
||||
return
|
||||
|
||||
node_counter[0] += 1
|
||||
|
||||
# Build property annotations
|
||||
props: list[str] = []
|
||||
for prop in node.get("properties", []):
|
||||
@@ -2080,18 +2259,7 @@ class BeelineBridge:
|
||||
label = f"- {role}"
|
||||
|
||||
# Add ref for interactive elements
|
||||
interactive_roles = {
|
||||
"button",
|
||||
"link",
|
||||
"textbox",
|
||||
"checkbox",
|
||||
"radio",
|
||||
"combobox",
|
||||
"menuitem",
|
||||
"tab",
|
||||
"searchbox",
|
||||
}
|
||||
if role in interactive_roles or name:
|
||||
if role in INTERACTIVE_ROLES or name:
|
||||
ref_counter[0] += 1
|
||||
ref_id = f"e{ref_counter[0]}"
|
||||
ref_map[ref_id] = f"[{role}]{name}"
|
||||
|
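The mode-based filtering added to `_format_ax_tree` (skip a node but keep walking its children at the same depth) can be illustrated on a toy tree. This is a simplified sketch, not the bridge's implementation: the nested node shape, the two small role sets, and `format_tree` are stand-ins chosen for the example.

```python
INTERACTIVE = frozenset({"button", "link", "textbox"})  # stand-in for INTERACTIVE_ROLES
STRUCTURAL = frozenset({"generic", "group", "list"})    # stand-in for STRUCTURAL_ROLES


def format_tree(node: dict, mode: str = "default", depth: int = 0) -> list[str]:
    """Render a toy tree as indented lines, hoisting children of skipped nodes."""
    role, name = node["role"], node.get("name", "")
    skip = (mode == "interactive" and role not in INTERACTIVE) or (
        mode == "simple" and role in STRUCTURAL and not name
    )
    lines: list[str] = []
    if not skip:
        label = f'- {role} "{name}"' if name else f"- {role}"
        lines.append("  " * depth + label)
    # Children of a skipped node are emitted at the skipped node's own depth.
    child_depth = depth if skip else depth + 1
    for child in node.get("children", []):
        lines.extend(format_tree(child, mode, child_depth))
    return lines


tree = {
    "role": "generic",
    "children": [
        {"role": "heading", "name": "Sign in"},
        {"role": "group", "children": [{"role": "button", "name": "Submit"}]},
    ],
}
simple_lines = format_tree(tree, "simple")
interactive_lines = format_tree(tree, "interactive")
```

Hoisting children instead of dropping the whole subtree is what keeps a button reachable when its unnamed wrapper is filtered out.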
||||
@@ -0,0 +1,186 @@
|
||||
"""Tool schemas for the bridge remote HTTP API (port 9230)."""
|
||||
|
||||
TOOL_SCHEMAS: dict[str, dict] = {
|
||||
"browser_click": {
|
||||
"description": "Click an element on the page.",
|
||||
"params": {
|
||||
"selector": {"type": "string", "required": True},
|
||||
"tab_id": {"type": "integer"},
|
||||
"profile": {"type": "string"},
|
||||
"button": {"type": "string", "default": "left", "enum": ["left", "right", "middle"]},
|
||||
"double_click": {"type": "boolean", "default": False},
|
||||
"timeout_ms": {"type": "integer", "default": 5000},
|
||||
},
|
||||
},
|
||||
"browser_click_coordinate": {
|
||||
"description": "Click at specific viewport coordinates (CSS pixels).",
|
||||
"params": {
|
||||
"x": {"type": "number", "required": True},
|
||||
"y": {"type": "number", "required": True},
|
||||
"tab_id": {"type": "integer"},
|
||||
"profile": {"type": "string"},
|
||||
"button": {"type": "string", "default": "left"},
|
||||
},
|
||||
},
|
||||
"browser_type": {
|
||||
"description": "Type text into an input element.",
|
||||
"params": {
|
||||
"selector": {"type": "string", "required": True},
|
||||
"text": {"type": "string", "required": True},
|
||||
"tab_id": {"type": "integer"},
|
||||
"profile": {"type": "string"},
|
||||
"delay_ms": {"type": "integer", "default": 1},
|
||||
"clear_first": {"type": "boolean", "default": True},
|
||||
"timeout_ms": {"type": "integer", "default": 30000},
|
||||
"use_insert_text": {"type": "boolean", "default": True},
|
||||
},
|
||||
},
|
||||
"browser_fill": {
|
||||
"description": "Fill an input element (clears existing content first).",
|
||||
"params": {
|
||||
"selector": {"type": "string", "required": True},
|
||||
"value": {"type": "string", "required": True},
|
||||
"tab_id": {"type": "integer"},
|
||||
"profile": {"type": "string"},
|
||||
"timeout_ms": {"type": "integer", "default": 30000},
|
||||
},
|
||||
},
|
||||
"browser_type_focused": {
|
||||
"description": "Type text into the already-focused element. Use after browser_click_coordinate has focused the target. Faster than browser_press for multi-character input.",
|
||||
"params": {
|
||||
"text": {"type": "string", "required": True},
|
||||
"tab_id": {"type": "integer"},
|
||||
"profile": {"type": "string"},
|
||||
"delay_ms": {"type": "integer", "default": 1},
|
||||
"clear_first": {"type": "boolean", "default": True},
|
||||
"use_insert_text": {"type": "boolean", "default": True},
|
||||
},
|
||||
},
|
||||
"browser_press": {
|
||||
"description": "Press a keyboard key, optionally with modifiers.",
|
||||
"params": {
|
||||
"key": {"type": "string", "required": True},
|
||||
"selector": {"type": "string"},
|
||||
"tab_id": {"type": "integer"},
|
||||
"profile": {"type": "string"},
|
||||
"modifiers": {"type": "array", "items": "string"},
|
||||
},
|
||||
},
|
||||
"browser_press_at": {
|
||||
"description": "Move mouse to coordinates then press a key.",
|
||||
"params": {
|
||||
"x": {"type": "number", "required": True},
|
||||
"y": {"type": "number", "required": True},
|
||||
"key": {"type": "string", "required": True},
|
||||
"tab_id": {"type": "integer"},
|
||||
"profile": {"type": "string"},
|
||||
},
|
||||
},
|
||||
"browser_navigate": {
|
||||
"description": "Navigate a tab to a URL.",
|
||||
"params": {
|
||||
"url": {"type": "string", "required": True},
|
||||
"tab_id": {"type": "integer"},
|
||||
"profile": {"type": "string"},
|
||||
"wait_until": {"type": "string", "default": "load"},
|
||||
},
|
||||
},
|
||||
"browser_go_back": {
|
||||
"description": "Navigate back in browser history.",
|
||||
"params": {
|
||||
"tab_id": {"type": "integer"},
|
||||
"profile": {"type": "string"},
|
||||
},
|
||||
},
|
||||
"browser_go_forward": {
|
||||
"description": "Navigate forward in browser history.",
|
||||
"params": {
|
||||
"tab_id": {"type": "integer"},
|
||||
"profile": {"type": "string"},
|
||||
},
|
||||
},
|
||||
"browser_reload": {
|
||||
"description": "Reload the current page.",
|
||||
"params": {
|
||||
"tab_id": {"type": "integer"},
|
||||
"profile": {"type": "string"},
|
||||
},
|
||||
},
|
||||
"browser_scroll": {
|
||||
"description": "Scroll the page.",
|
||||
"params": {
|
||||
"direction": {"type": "string", "default": "down", "enum": ["up", "down", "left", "right"]},
|
||||
"amount": {"type": "integer", "default": 500},
|
||||
"tab_id": {"type": "integer"},
|
||||
"profile": {"type": "string"},
|
||||
},
|
||||
},
|
||||
"browser_hover": {
|
||||
"description": "Hover over an element.",
|
||||
"params": {
|
||||
"selector": {"type": "string", "required": True},
|
||||
            "tab_id": {"type": "integer"},
            "profile": {"type": "string"},
            "timeout_ms": {"type": "integer", "default": 30000},
        },
    },
    "browser_hover_coordinate": {
        "description": "Hover at CSS pixel coordinates.",
        "params": {
            "x": {"type": "number", "required": True},
            "y": {"type": "number", "required": True},
            "tab_id": {"type": "integer"},
            "profile": {"type": "string"},
        },
    },
    "browser_select": {
        "description": "Select option(s) in a dropdown.",
        "params": {
            "selector": {"type": "string", "required": True},
            "values": {"type": "array", "required": True},
            "tab_id": {"type": "integer"},
            "profile": {"type": "string"},
        },
    },
    "browser_screenshot": {
        "description": "Take a screenshot of the page (returns base64 PNG).",
        "params": {
            "tab_id": {"type": "integer"},
            "profile": {"type": "string"},
            "full_page": {"type": "boolean", "default": False},
        },
    },
    "browser_snapshot": {
        "description": "Get the accessibility tree snapshot of the page.",
        "params": {
            "tab_id": {"type": "integer"},
            "profile": {"type": "string"},
        },
    },
    "browser_evaluate": {
        "description": "Evaluate JavaScript in the page.",
        "params": {
            "expression": {"type": "string", "required": True},
            "tab_id": {"type": "integer"},
            "profile": {"type": "string"},
        },
    },
    "browser_get_text": {
        "description": "Get text content of an element.",
        "params": {
            "selector": {"type": "string", "required": True},
            "tab_id": {"type": "integer"},
            "profile": {"type": "string"},
        },
    },
    "browser_wait": {
        "description": "Wait for an element or text to appear on the page.",
        "params": {
            "selector": {"type": "string"},
            "text": {"type": "string"},
            "tab_id": {"type": "integer"},
            "profile": {"type": "string"},
            "timeout_ms": {"type": "integer", "default": 30000},
        },
    },
}
@@ -13,7 +13,13 @@ from typing import TYPE_CHECKING
if TYPE_CHECKING:
    from .session import BrowserSession

# Role sets for interactive elements
"""Shared ARIA role classification sets.

Keep these in sync across snapshot paths — divergence causes different
drivers to produce different snapshot output for the same page.
"""

# Roles that represent user-interactive elements and always get a ref.
INTERACTIVE_ROLES: frozenset[str] = frozenset(
    {
        "button",
@@ -26,7 +32,6 @@ INTERACTIVE_ROLES: frozenset[str] = frozenset(
        "menuitemradio",
        "option",
        "radio",
        "scrollbar",
        "searchbox",
        "slider",
        "spinbutton",
@@ -37,11 +42,44 @@ INTERACTIVE_ROLES: frozenset[str] = frozenset(
    }
)

NAMED_CONTENT_ROLES: frozenset[str] = frozenset(
# Roles that carry meaningful content and get a ref when named.
CONTENT_ROLES: frozenset[str] = frozenset(
    {
        "article",
        "cell",
        "columnheader",
        "gridcell",
        "heading",
        "img",
        "listitem",
        "main",
        "navigation",
        "region",
        "rowheader",
    }
)

# Structural/container roles — typically skipped in compact mode.
STRUCTURAL_ROLES: frozenset[str] = frozenset(
    {
        "application",
        "directory",
        "document",
        "generic",
        "grid",
        "group",
        "ignored",
        "list",
        "menu",
        "menubar",
        "none",
        "presentation",
        "row",
        "rowgroup",
        "table",
        "tablist",
        "toolbar",
        "tree",
        "treegrid",
    }
)

@@ -81,7 +119,7 @@ def annotate_snapshot(snapshot: str) -> tuple[str, RefMap]:
        role = m.group(2)
        name = m.group(3)

        if role in INTERACTIVE_ROLES or (role in NAMED_CONTENT_ROLES and name):
        if role in INTERACTIVE_ROLES or (role in CONTENT_ROLES and name):
            candidates.append((i, role, name))

    ref_map: RefMap = {}

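The ref rule in the `annotate_snapshot` hunk above can be tried in isolation. The following is a minimal sketch, not the project's implementation: the role sets are small subsets copied from the diff, and `gets_ref` is a hypothetical helper name introduced only for illustration.

```python
from typing import Optional

# Illustrative subsets of the role sets shown in the diff (not the full sets).
INTERACTIVE_ROLES = frozenset({"button", "searchbox", "slider"})
CONTENT_ROLES = frozenset({"heading", "img", "listitem"})


def gets_ref(role: str, name: Optional[str]) -> bool:
    """Hypothetical helper mirroring the condition used in annotate_snapshot."""
    # Interactive roles always get a ref; content roles only when they have a name.
    return role in INTERACTIVE_ROLES or (role in CONTENT_ROLES and bool(name))
```

Under this reading, an unnamed `button` still receives a ref, a `heading` receives one only when it carries a name, and a role in neither set (for example `generic`) never does.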
@@ -547,6 +547,7 @@ def register_inspection_tools(mcp: FastMCP) -> None:
    async def browser_snapshot(
        tab_id: int | None = None,
        profile: str | None = None,
        mode: Literal["default", "simple", "interactive"] = "default",
    ) -> dict:
        """
        Get an accessibility snapshot of the page.
@@ -565,12 +566,16 @@ def register_inspection_tools(mcp: FastMCP) -> None:
        Args:
            tab_id: Chrome tab ID (default: active tab)
            profile: Browser profile name (default: "default")
            mode: Snapshot filtering mode (default: "default")
                - "default": full accessibility tree
                - "simple": interactive + content nodes, skip unnamed structural nodes
                - "interactive": only interactive nodes (buttons, links, inputs, etc.)

        Returns:
            Dict with the snapshot text tree, URL, and tab ID
        """
        start = time.perf_counter()
        params = {"tab_id": tab_id, "profile": profile}
        params = {"tab_id": tab_id, "profile": profile, "mode": mode}

        bridge = get_bridge()
        if not bridge or not bridge.is_connected:
@@ -591,7 +596,7 @@ def register_inspection_tools(mcp: FastMCP) -> None:
            return result

        try:
            snapshot_result = await bridge.snapshot(target_tab)
            snapshot_result = await bridge.snapshot(target_tab, mode=mode)
            log_tool_call(
                "browser_snapshot",
                params,

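The three `mode` values documented in the `browser_snapshot` docstring can be read as a per-node filter. The sketch below is one possible interpretation, assuming that "simple" drops only unnamed structural nodes and keeps everything else; the actual filtering happens inside `bridge.snapshot`, which this diff does not show. `keep_node` and the reduced role sets are hypothetical.

```python
from typing import Literal, Optional

Mode = Literal["default", "simple", "interactive"]

# Illustrative subsets of the role sets from the diff (not the full sets).
INTERACTIVE_ROLES = frozenset({"button", "searchbox", "slider"})
STRUCTURAL_ROLES = frozenset({"generic", "group", "list"})


def keep_node(role: str, name: Optional[str], mode: Mode = "default") -> bool:
    """Hypothetical filter: decide whether a node stays in the snapshot."""
    if mode == "default":
        # Full accessibility tree: nothing is filtered out.
        return True
    if mode == "interactive":
        # Only interactive nodes survive.
        return role in INTERACTIVE_ROLES
    # "simple" (assumed reading): drop structural nodes that have no name.
    return not (role in STRUCTURAL_ROLES and not name)
```

With this reading, `keep_node("generic", None, "simple")` is false while a named `group` is kept, and `"interactive"` discards a `heading` even when it is named.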
@@ -179,43 +179,34 @@ def register_interaction_tools(mcp: FastMCP) -> None:
        text: str,
        tab_id: int | None = None,
        profile: str | None = None,
        delay_ms: int = 0,
        delay_ms: int = 1,
        clear_first: bool = True,
        timeout_ms: int = 30000,
        use_insert_text: bool = True,
    ) -> dict:
        """
        Type text into an input element.
        Click a selector to focus it, then type text into it.

        Automatically routes through a real CDP pointer click on the
        element before inserting text — so that rich-text editors like
        Lexical (Gmail, LinkedIn DMs), Draft.js (X compose), and
        ProseMirror (Reddit) see a native focus event and enable their
        submit buttons. See the gcu-browser skill for the full "click-
        then-type" pattern.

        By default uses CDP Input.insertText which is the most reliable
        way to insert text into rich editors. Set
        ``use_insert_text=False`` to fall back to per-character
        keyDown/keyUp events (needed only for code editors that fire
        on specific keystrokes, or when ``delay_ms`` typing animation
        is required).
        Uses CDP ``Input.insertText`` by default, which works for both
        standard inputs and many rich-text editors. Use
        ``browser_type_focused`` when the target is already focused or
        you cannot reliably address it with a selector.

        Args:
            selector: CSS selector for the input element
            text: Text to type
            tab_id: Chrome tab ID (default: active tab)
            profile: Browser profile name (default: "default")
            delay_ms: Delay between keystrokes in ms (default: 0).
                Forces the per-keystroke fallback when > 0.
            clear_first: Clear existing text before typing (default: True)
            timeout_ms: Timeout waiting for element (default: 30000)
            selector: CSS selector for the input element.
            text: Text to type.
            tab_id: Chrome tab ID (default: active tab).
            profile: Browser profile name (default: "default").
            delay_ms: Delay between keystrokes in ms (default: 1).
                Forces the per-keystroke fallback when > 0.
            clear_first: Clear existing text before typing (default: True).
            timeout_ms: Timeout waiting for element (default: 30000).
            use_insert_text: Use CDP Input.insertText (default: True) for
                reliable insertion into rich-text editors.
                Set False for per-keystroke dispatch.
                reliable insertion into rich-text editors. Set False for
                per-keystroke dispatch.

        Returns:
            Dict with type result
            Dict with type result.
        """
        start = time.perf_counter()
        params = {"selector": selector, "text": text, "tab_id": tab_id, "profile": profile}
@@ -293,6 +284,77 @@ def register_interaction_tools(mcp: FastMCP) -> None:
                timeout_ms=timeout_ms,
            )

    @mcp.tool()
    async def browser_type_focused(
        text: str,
        tab_id: int | None = None,
        profile: str | None = None,
        delay_ms: int = 1,
        clear_first: bool = True,
        use_insert_text: bool = True,
    ) -> dict:
        """
        Type text into the already-focused element.

        Targets ``document.activeElement`` and is ideal after a
        coordinate click, or when the editable cannot be reached
        reliably with a selector. Faster than repeated
        ``browser_press`` calls for multi-character input.

        Args:
            text: Text to insert at the current cursor position.
            tab_id: Chrome tab ID (default: active tab).
            profile: Browser profile name (default: "default").
            delay_ms: Delay between keystrokes in ms (default: 1).
                Forces per-keystroke dispatch when > 0.
            clear_first: Clear existing text before typing (default: True).
            use_insert_text: Use CDP Input.insertText (default: True).

        Returns:
            Dict with type result.
        """
        start = time.perf_counter()
        params = {"text": text, "tab_id": tab_id, "profile": profile}

        bridge = get_bridge()
        if not bridge or not bridge.is_connected:
            result = {"ok": False, "error": "Browser extension not connected"}
            log_tool_call("browser_type_focused", params, result=result)
            return result

        ctx = _get_context(profile)
        if not ctx:
            result = {"ok": False, "error": "Browser not started. Call browser_start first."}
            log_tool_call("browser_type_focused", params, result=result)
            return result

        target_tab = tab_id or ctx.get("activeTabId")
        if target_tab is None:
            result = {"ok": False, "error": "No active tab"}
            log_tool_call("browser_type_focused", params, result=result)
            return result

        try:
            type_result = await bridge.type_text(
                target_tab,
                None,
                text,
                clear_first=clear_first,
                delay_ms=delay_ms,
                use_insert_text=use_insert_text,
            )
            log_tool_call(
                "browser_type_focused",
                params,
                result=type_result,
                duration_ms=(time.perf_counter() - start) * 1000,
            )
            return type_result
        except Exception as e:
            result = {"ok": False, "error": str(e)}
            log_tool_call("browser_type_focused", params, error=e, duration_ms=(time.perf_counter() - start) * 1000)
            return result

    @mcp.tool()
    async def browser_press(
        key: str,
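The `browser_type` and `browser_type_focused` docstrings describe two dispatch paths: CDP `Input.insertText` and per-keystroke key events. A literal reading of the `Args` text can be sketched as below. `choose_dispatch` is a hypothetical helper, and the real decision lives in `bridge.type_text`, which this diff does not show, so treat the result as an assumption to verify rather than documented behavior.

```python
def choose_dispatch(delay_ms: int = 1, use_insert_text: bool = True) -> str:
    """Hypothetical helper: pick a typing strategy from the documented args."""
    # The docstrings say delay_ms > 0 forces the per-keystroke path,
    # and use_insert_text=False requests it explicitly.
    if delay_ms > 0 or not use_insert_text:
        return "keystrokes"
    # Otherwise the whole string goes through a single Input.insertText call.
    return "insert_text"
```

Read this way, the new default `delay_ms=1` would select per-keystroke dispatch even though `use_insert_text` defaults to `True`; a caller who wants the `Input.insertText` path may need to pass `delay_ms=0` explicitly.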