* fix(ci): apply ruff format to browser tool files
Refs #7083
* fix(ci): unbreak test_refs (img regression) and test_model_catalog
test_refs:
- Add `img` back to CONTENT_ROLES so named images get refs again. The
recent `cc6ec97a feat: multiple modes browser snapshot tool` refactor
renamed NAMED_CONTENT_ROLES → CONTENT_ROLES and accidentally dropped
`img`, breaking `test_named_content_roles_get_refs`.
- Drop the `navigation` assertion from `test_skips_structural_roles`.
That same refactor intentionally added landmark roles (navigation,
main, listitem) to CONTENT_ROLES so AI agents can ref them by name,
and the test was not updated to reflect that.
test_model_catalog:
- Add 5 openrouter models that were added to model_catalog.json by
#7081 (UI/UX improvements) but not reflected in the test.
Refs #7083
* fix(ci): wait for event propagation in subagent report test on Windows
`test_worker_report_emits_subagent_report_event` waited only for
`worker.is_active` to flip to False, then immediately asserted on the
collected events. On Windows the event loop scheduling differs enough
that the SUBAGENT_REPORT subscriber callback can run a few ticks after
the worker is marked inactive, so the assertion fires against an empty
list. Wait until the worker is inactive and the SUBAGENT_REPORT event
has been collected before asserting.
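A minimal sketch of the wait described above, assuming a polling helper.
`wait_until`, `FakeWorker` and `demo` are hypothetical names for
illustration, not the project's actual test code:

```python
import asyncio
import time


async def wait_until(predicate, timeout=5.0, interval=0.01):
    """Poll `predicate` until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met before timeout")
        await asyncio.sleep(interval)


class FakeWorker:
    """Hypothetical stand-in for the real worker object."""

    def __init__(self):
        self.is_active = True


async def demo():
    worker = FakeWorker()
    events = []

    async def finish():
        # The worker is marked inactive first ...
        worker.is_active = False
        # ... and the subscriber callback runs a few ticks later.
        await asyncio.sleep(0.05)
        events.append("SUBAGENT_REPORT")

    task = asyncio.create_task(finish())
    # Waiting on is_active alone would return while `events` is still empty.
    await wait_until(lambda: not worker.is_active and len(events) > 0)
    await task
    return events
```

Combining both checks in one predicate keeps the assertion independent of
how the platform schedules the callback.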
Refs #7083
- test_background_job: use sys.executable and double quotes instead of
a single-quoted 'python -c' command, which Windows cmd.exe does not
parse as a quoted argument
- test_cli_entry_point: guard against None stdout on Windows with
(result.stdout or "").lower()
- test_safe_eval: bump DEFAULT_TIMEOUT_MS from 100 to 500 to accommodate
slow Windows CI runners where SIGALRM is unavailable
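The first two bullets can be sketched as follows. This is an
illustrative example, not the actual test code; the command string and
the `print(42)` payload are assumptions:

```python
import subprocess
import sys

# Double quotes are understood by both cmd.exe and POSIX shells;
# single quotes are not treated as quoting by cmd.exe.
command = f'"{sys.executable}" -c "print(42)"'

result = subprocess.run(command, shell=True, capture_output=True, text=True)

# Guard against stdout being None before calling str methods on it.
output = (result.stdout or "").lower()
```

Using sys.executable also avoids depending on a `python` binary being on
PATH in the CI environment.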
* fix(tests): resolve test failures across framework and tools
Framework tests (52 -> 1 failure):
- Add missing `model` attribute to mock LLM classes (MockStreamingLLM,
CrashingLLM, ErrorThenSuccessLLM, etc.) to match new agent_loop.py
requirement at line 624
- Update skill count assertions from 6 to 7 (new writing-hive-skills)
- Fix phase compaction test to match new message format (no brackets)
- Update model catalog test for current gemini model names
- Fix queen memory test: set phase="building" to match prompt_building,
adjust reflection trigger count to match cooldown behavior
Tools tests (52 -> 0 failures):
- Update csv_tool tests: remove agent_id parameter, use absolute paths,
patch _ALLOWED_ROOTS instead of AGENT_SANDBOXES_DIR
- Fix browser_evaluate test to allow toast wrapper around script
Remaining: 1 pre-existing failure in test_worker_report where mock LLM
gets stuck when scenarios are exhausted (separate bug).
* fix(tests): resolve remaining test failures
- Add text stop scenario to test_worker_report so worker terminates
cleanly after tool_calls finish instead of replaying the last
scenario forever
- Remove duplicated hive home isolation fixture from test_colony_fork_live;
reuse conftest autouse fixture and only add config copy on top
* fix(tests): prevent mock LLM infinite loops on exhausted scenarios
fix(core): accept both pruned tool result sentinel formats
MockStreamingLLM and _ByTaskMockLLM replay the last scenario forever
when call_index exceeds the scenario list, causing worker timeouts in
CI. Fix by emitting a text stop when scenarios are exhausted (scenarios
mode) or already consumed (by_task mode).
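A minimal sketch of the exhaustion guard in scenarios mode.
`ScenarioMockLLM`, `next_response` and the response dict shape are
hypothetical and do not mirror the real MockStreamingLLM interface:

```python
class ScenarioMockLLM:
    """Hypothetical mock that stops cleanly once scenarios run out."""

    model = "mock-model"  # placeholder value for illustration

    def __init__(self, scenarios):
        self.scenarios = list(scenarios)
        self.call_index = 0

    def next_response(self):
        # Past the end of the list: emit a text stop instead of
        # replaying the last scenario forever.
        if self.call_index >= len(self.scenarios):
            return {"type": "text", "content": "", "stop": True}
        response = self.scenarios[self.call_index]
        self.call_index += 1
        return response
```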
Also fix pruned tool result sentinel mismatch: conversation.py produces
"Pruned tool result ..." but compaction.py and conversation.py only
checked for "[Pruned tool result". Now both formats are accepted.
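One way to accept both formats, shown as a sketch. `PRUNED_PREFIXES`
and `is_pruned_tool_result` are hypothetical names, not the identifiers
used in compaction.py or conversation.py:

```python
# Both sentinel spellings, with and without the leading bracket.
PRUNED_PREFIXES = ("[Pruned tool result", "Pruned tool result")


def is_pruned_tool_result(content: str) -> bool:
    """Return True if `content` starts with either pruned-result sentinel."""
    # str.startswith accepts a tuple and matches any of its entries.
    return content.startswith(PRUNED_PREFIXES)
```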
Also remove duplicated hive home isolation fixture from
test_colony_fork_live; reuse conftest autouse fixture instead.
The previous code did `text[:max_length] + "..."`, which made the
returned content always 3 chars longer than the requested max_length.
Reserve room for the ellipsis inside the limit so the contract holds.
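A sketch of the corrected behaviour, assuming a standalone helper. The
`truncate` name and signature are hypothetical, not the project's
function:

```python
def truncate(text: str, max_length: int, ellipsis: str = "...") -> str:
    """Shorten `text` so the result never exceeds `max_length` characters."""
    if max_length <= 0:
        return ""
    if len(text) <= max_length:
        return text
    if max_length <= len(ellipsis):
        # Not enough room for any text; return as much ellipsis as fits.
        return ellipsis[:max_length]
    # Reserve room for the ellipsis inside the limit.
    return text[: max_length - len(ellipsis)] + ellipsis
```

With the old `text[:max_length] + "..."` form, a 10-character input and
max_length of 8 produced 11 characters; here the result is 8.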
Fixes #2098