- test_background_job: use sys.executable and double quotes instead of
single-quoted 'python -c' which Windows cmd.exe doesn't understand
- test_cli_entry_point: guard against None stdout on Windows with
(result.stdout or "").lower()
- test_safe_eval: bump DEFAULT_TIMEOUT_MS from 100 to 500 to accommodate
slow Windows CI runners where SIGALRM is unavailable
* fix(tests): resolve test failures across framework and tools
Framework tests (52 -> 1 failure):
- Add missing `model` attribute to mock LLM classes (MockStreamingLLM,
CrashingLLM, ErrorThenSuccessLLM, etc.) to match new agent_loop.py
requirement at line 624
- Update skill count assertions from 6 to 7 (new writing-hive-skills)
- Fix phase compaction test to match new message format (no brackets)
- Update model catalog test for current gemini model names
- Fix queen memory test: set phase="building" to match prompt_building,
adjust reflection trigger count to match cooldown behavior
Tools tests (52 -> 0 failures):
- Update csv_tool tests: remove agent_id parameter, use absolute paths,
patch _ALLOWED_ROOTS instead of AGENT_SANDBOXES_DIR
- Fix browser_evaluate test to allow toast wrapper around script
Remaining: 1 pre-existing failure in test_worker_report where mock LLM
gets stuck when scenarios are exhausted (separate bug).
* fix(tests): resolve remaining test failures
- Add text stop scenario to test_worker_report so worker terminates
cleanly after tool_calls finish instead of replaying the last
scenario forever
- Remove duplicated hive home isolation fixture from test_colony_fork_live;
reuse conftest autouse fixture and only add config copy on top
* fix(tests): prevent mock LLM infinite loops on exhausted scenarios
fix(core): accept both pruned tool result sentinel formats
MockStreamingLLM and _ByTaskMockLLM replay the last scenario forever
when call_index exceeds the scenario list, causing worker timeouts in
CI. Fix by emitting a text stop when scenarios are exhausted (scenarios
mode) or already consumed (by_task mode).
Also fix pruned tool result sentinel mismatch: conversation.py produces
"Pruned tool result ..." but compaction.py and conversation.py only
checked for "[Pruned tool result". Now both formats are accepted.
Also remove duplicated hive home isolation fixture from
test_colony_fork_live; reuse conftest autouse fixture instead.
The previous code did `text[:max_length] + "..."`, which made the
returned content always 3 chars longer than the requested max_length.
Reserve room for the ellipsis inside the limit so the contract holds.
Fixes#2098
* feat(tools): add Weights & Biases experiment tracking and model monitoring integration
* style: fix ruff formatting in wandb_tool.py
* feat(tools): add Weights & Biases ML experiment tracking integration
* fix(tools): address CodeRabbit review comments on wandb_tool
* fix(tools): rewrite wandb_tool to use official Python SDK instead of undocumented REST endpoints
* fix(tools): address Hundao review — remove .coverage, switch to GraphQL/httpx, fix wandb_host, add README
* fix(tools): wire filters to GraphQL, validate empty metric_keys, fix line lengths
* fix(tools): check credentials before input validation in wandb_get_run_metrics
Move _get_creds() call before run_id/metric_keys checks so the
framework credential test receives the expected {error, help} response
instead of a bare input-validation error.
* docs(tools): add README for 10 tools (batch 3)
Adds README.md for: supabase_tool, zoom_tool, twitter_tool,
twilio_tool, shopify_tool, snowflake_tool, zendesk_tool,
yahoo_finance_tool, youtube_transcript_tool, docker_hub_tool
Partial fix for #6486
* docs(shopify): fix fulfillment_status value and body_html param name
- fulfillment_status example: "unfulfilled" is not a valid Shopify API
value, changed to "unshipped"
- tool table: "description" is not the actual param name, it's body_html
---------
Co-authored-by: hundao <alchemy_wimp@hotmail.com>
* docs(tools): add README for huggingface, jira, pinecone, langfuse, linear, mongodb, redis, vercel, confluence
* docs(tools): fix review comments in confluence, mongodb and vercel READMEs
* docs(mongodb): add MONGODB_DATA_SOURCE to setup section
The code uses os.getenv("MONGODB_DATA_SOURCE") in every API request
body as the dataSource field. Without it, requests send an empty
dataSource and fail. Add it back to the setup section.
---------
Co-authored-by: hundao <alchemy_wimp@hotmail.com>
* fix(tools): move playwright back to main dependencies
playwright was moved to the browser extra in c7e85aa9 as part of the GCU
refactor to use a browser extension. But web_scrape_tool still imports
playwright at module level and requires it unconditionally, so CI's
Test Tools job breaks with ModuleNotFoundError.
web_scrape_tool has no fallback without playwright — it's a hard
dependency, not optional. Put it back in main deps.
Fixes CI failure on Test Tools (ubuntu-latest).
* chore: remove dead test_highlights.py script
tools/test_highlights.py is orphaned from the GCU refactor in c7e85aa9:
- imports highlight_coordinate and highlight_element from gcu.browser.highlight,
but highlight.py was deleted in that refactor
- calls BrowserSession.start(), open_tab(), get_active_page(), stop() — none
of these methods exist on the current BrowserSession class
The script can't run at all, and it's tripping ruff's I001 import-order
check (fail on Lint CI after cache invalidation).
* test: fix browser/refs tests broken by GCU refactor
Tests were still testing the old Playwright-based API after c7e85aa9
moved GCU to an extension-bridge architecture.
test_refs.py (6 tests):
Refs system now produces CSS selectors like
[role="button"][aria-label="Submit"]:nth-of-type(1) for the bridge's
DOM matcher, instead of Playwright's role=button[name="Submit"] >> nth=0.
Updated expected values to match. Renamed test_escapes_quotes_in_name to
test_quoted_name_passes_through and added a comment noting that inner
quotes aren't currently escaped (follow-up concern).
test_browser_tools_comprehensive.py (4 tests):
- test_screenshot_full_page: browser_screenshot passes selector=None
when no selector is provided; update assertion.
- test_file_upload: browser_upload validates file paths exist on disk.
Create real tmp files and mock the CDP calls it makes.
- test_evaluate_with_bare_return: renamed to
test_evaluate_passes_script_through_to_bridge. IIFE wrapping lives
in bridge.evaluate, not in the browser_evaluate tool — mocking the
bridge bypasses the wrapping logic, so the tool just passes the
script through.
- test_evaluate_complex_script: browser_evaluate returns bridge's raw
result (no 'ok' wrapper); check for 'result' key instead.
test_browser_advanced_tools.py (deleted):
The whole file patched get_session and page.wait_for_function (the old
Playwright-based API). The bug it guarded against (user text interpolated
into a JS source string) is architecturally impossible in the new
bridge-based tools, which send text via structured RPC. Coverage for
browser_wait exists in test_browser_tools_comprehensive.py.
* test(core): fix event_loop tests broken by hive-v1 refactor
Several framework tests were left failing or hanging after the hive-v1
refactor landed. This un-breaks CI without touching production code.
- Worker auto-escalation: 8 tests were hanging because EventLoopNode
with event_bus treats non-queen/non-subagent nodes as workers and
auto-escalates to queen, then blocks on _await_user_input forever
(no queen in standalone tests). Opt out via is_subagent_mode=True.
- MockConversationStore: added clear() to match the production store
(storage/conversation_store.py), which event_loop_node.py:425 calls.
- Executor output semantics: result.output now only contains terminal-
node outputs; two handoff tests now read intermediate outputs from
result.session_state["data_buffer"].
- Restore filter: test_restore_from_checkpoint needs set_current_phase
so restore()'s phase_id filter matches.
- Removed two _build_context tests whose target method no longer exists
(replaced by standalone build_node_context()). Remaining execution_id
coverage is adequate in TestExecutionId + integration tests.
* style: ruff format + drop em dash in comment
* test(core): fix remaining framework tests broken by hive-v1 refactor
Rounds out the fix started in the previous commit. Full framework
suite now passes (1589 passed, 0 failed).
- conftest.py: force-bind framework.runner submodules (mcp_registry,
mcp_client, mcp_connection_manager) as attributes on the parent
package. Without this, pytest monkeypatch.setattr with dotted-string
paths fails because the attribute walker can't resolve the submodule
even though __init__.py imports from it. Affects ~25 MCP tests.
- test_queen_memory: _execute_tool() grew a required caller kwarg for
worker type-restrictions. Pass caller="queen" so path-traversal
checks run without caller restrictions interfering.
- test_session_manager_worker_handoff: _subscribe_worker_digest was
removed in the refactor, dropped the dead monkeypatches.
- test_skill_context_protection: NodeConversation now reads _run_id
in add_tool_result(), so the __new__-based test helper has to
initialise it.
- test_node_conversation: restore() now filters parts by run_id for
crash recovery. Renamed the stale test and flipped the assertion
to match the new filtering semantics.
- test_tool_registry: CONTEXT_PARAMS was updated (workspace_id out,
profile in). Switched the test's example stripped params.
* docs: drop circular PR reference in test_refs comment
Addresses CodeRabbit nitpick. The comment referenced the PR that was
adding the comment, which becomes a self-reference after merge.