feat: reduce gemini context window to improve reliability
This commit is contained in:
@@ -0,0 +1,64 @@
|
||||
# 🐝 Hive Agent v0.10.2
|
||||
|
||||
> A browser-automation-focused follow-up to **v0.10.1**. Coordinates that flow between the vision model and Chrome are now **fractions of the viewport** instead of screenshot pixels — so the same `(x, y)` works across Claude, GPT-4o, Gemini, and any other VLM regardless of how each one resizes or tiles the image. Plus reliability fixes for queen switching, tab-group isolation, and CI.
|
||||
|
||||
---
|
||||
|
||||
## ✨ Highlights
|
||||
|
||||
- **Model-invariant visual clicks.** Every coordinate-taking browser tool (`browser_click_coordinate`, `browser_hover_coordinate`, `browser_press_at`) and every rect-returning tool (`browser_get_rect`, `browser_shadow_query`, the `rect` inside `focused_element`) now speaks in `0..1` fractions of the viewport. Vision-model pixel resizing no longer silently breaks clicks when you swap backends.
|
||||
- **Queens survive profile/queen switches.** Switching queens no longer tears down the active queen's runtime.
|
||||
- **Tab-group isolation.** Browser tab groups are now namespaced per profile, so stale highlight / attach state can't bleed across profiles when Chrome reuses a tab id.
|
||||
- **Remote browser debugger.** New `scripts/browser_remote.py` + HTML UI give a visual debugging surface for the Chrome extension bridge — live screenshots, coord inspector, and one-click test harness for the GCU tools.
|
||||
- **Greener CI.** All framework/tools test failures resolved and Windows CI is unbroken; full ruff lint + format pass across the codebase.
|
||||
|
||||
---
|
||||
|
||||
## 🆕 What's New
|
||||
|
||||
### Browser automation
|
||||
|
||||
- **Fraction-based coordinates** — all click / hover / press / rect tools now use `(0..1, 0..1)` fractions of the viewport. Internally each tool multiplies by the cached `cssWidth` / `cssHeight` before dispatching to CDP. Four-decimal precision (`0.0001` ≈ 0.17 CSS px on a 1717-wide viewport) is sufficient for the tightest targets. (@timothyadenhq)
|
||||
- **`browser_type_focused` — dedicated focused-element typing tool** split out from `browser_type`. Use after `browser_click_coordinate` focuses the target; faster than `browser_press` for multi-character input. (@RichardTang-Aden)
|
||||
- **Multi-mode screenshot tool** — `browser_screenshot` gained viewport / full-page / selector-clip modes and returns `cssWidth` / `cssHeight` in metadata so callers can reason about viewport size if they need to. (@RichardTang-Aden)
|
||||
- **Dashed highlighter for type-focus events** — visual differentiation between click (solid) and type-focus (dashed) highlights on post-interaction screenshots. (@RichardTang-Aden)
|
||||
- **Default 1 ms key delay + prompt tuning** — `browser_type` now uses a 1 ms delay by default (was higher), matching what real rich-text editors expect; related orchestrator prompt improvements. (@RichardTang-Aden)
|
||||
- **Remote browser debugger UI** — `scripts/browser_remote.py` + `scripts/browser_remote_ui.html` provide a live visual surface to exercise the GCU browser bridge (screenshots, click targeting, coord readout). (@RichardTang-Aden)
|
||||
- **Iframe-aware `focused_element`** — same-origin iframe descent (capped at 5 levels), so `focused_element` reports the real inner element instead of `{tag: "iframe"}`. Adds an `inFrame: [...]` breadcrumb when traversed. (@timothyadenhq)
|
||||
|
||||
### Skills & prompts
|
||||
|
||||
- **Browser / LinkedIn automation skills rewritten** around the new fraction convention — "read proportion off the image" workflow, updated rect examples, updated troubleshooting entries. (@timothyadenhq, @RichardTang-Aden)
|
||||
- **GCP skills and prompt improvements** — polish on the browser-edge-cases skill and the queen GCU reference guide. (@RichardTang-Aden)
|
||||
- **Canonical workflow simplified** — slimmer, less prescriptive guidance in the default browser/linkedin skills. (@timothyadenhq)
|
||||
|
||||
### Core / server
|
||||
|
||||
- **Namespaced browser tab groups** — per-profile `tab_group` tracking in `queen_orchestrator` / `session_manager`, with a `clear_tab_highlights(tab_ids)` cleanup hook called on context destruction so stale highlight / attach state can't leak onto reused tab ids. (@timothyadenhq)
|
||||
- **Don't kill the queen on switch** — queen switching no longer invokes the "stop runtime" path, keeping active sessions alive across UI navigation. (@timothyadenhq)
|
||||
|
||||
### Developer experience
|
||||
|
||||
- **Codebase-wide ruff clean** — 155 lint errors (70 auto-fixed + 85 manual) resolved across framework and tools; 343 files reformatted. Long-line, missing-import, duplicate-method, and W291 whitespace issues all cleared. (#7058)
|
||||
- **Framework + tools test suite green** — 52 → 0 failures across framework tests (mock LLM `model` attribute, updated skill/prompt assertions, compaction formatting, model catalog) and tools tests (csv_tool paths, browser_evaluate toast wrapper). (#7059)
|
||||
- **Windows CI unbroken** — background-job test uses `sys.executable` + double quotes, CLI entry-point guards against `None` stdout, safe-eval timeout bumped for slower Windows runners. (#7061)
|
||||
|
||||
---
|
||||
|
||||
## 🐛 Bug Fixes
|
||||
|
||||
- **Fraction-click tab-state leak (`_screenshot_css_scales` NameError)** — `clear_tab_state` raised `NameError` on every tab close and profile teardown because a removed cache was still referenced. Fixed in `tools/src/gcu/browser/tools/inspection.py`.
|
||||
- **Missing highlight cleanup on profile destroy** — introduced `clear_tab_highlights` so orphaned highlight state doesn't reappear when Chrome reuses a tab id on a later profile.
|
||||
- **Queen session shutdown on switch** — switching between queens no longer terminates the active queen's runtime.
|
||||
- **Pruned tool-result sentinel mismatch** — compaction / conversation now accept both `Pruned tool result ...` and `[Pruned tool result ...]` sentinel shapes.
|
||||
- **Mock LLM infinite loop on exhausted scenarios** — `MockStreamingLLM` and `_ByTaskMockLLM` now emit a clean text-stop when scenarios are consumed, unblocking `test_worker_report`.
|
||||
|
||||
---
|
||||
|
||||
## ⬆️ Upgrading from v0.10.1
|
||||
|
||||
No migration steps for stored state — existing `~/.hive/` profiles, queens, and sessions continue to work.
|
||||
|
||||
**Behavior change for direct callers of browser coord tools:** `browser_click_coordinate`, `browser_hover_coordinate`, `browser_press_at`, and rect-returning tools now expect and return **fractions** of the viewport (`0..1` on each axis) instead of screenshot pixels. Agents using the default browser-automation skill get this automatically — the skill was updated alongside the tool change. Only custom code that hardcoded pixel coordinates against the prior 800 px-wide screenshot space needs adjustment: divide by `cssWidth` / `cssHeight` (now exposed in `browser_screenshot` metadata) to convert.
|
||||
|
||||
Pull `main` at the `v0.10.2` tag and restart Hive.
|
||||
Reference in New Issue
Block a user