Commit Graph

3210 Commits

Author SHA1 Message Date
Richard Tang f52c44821a feat: partial validation after typing 2026-04-17 12:16:13 -07:00
Richard Tang 97432ea08c feat: colony side bar 2026-04-17 11:52:49 -07:00
Timothy 0abd1125b7 fix: parallel execution 2026-04-17 11:20:06 -07:00
Timothy 803337ec74 feat: new queen phases 2026-04-17 06:19:15 -07:00
Timothy 2b055d4d42 fix: simplify system prompt 2026-04-17 04:47:51 -07:00
Timothy dde4dfaec9 Merge branch 'feature/colony-sqlite' into feature/clean-context 2026-04-17 04:12:35 -07:00
Timothy 6be026fcb1 fix: partial parts and system nudge 2026-04-17 04:06:59 -07:00
Richard Tang 3c2161aad5 chore: release v0.10.2
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v0.10.2
2026-04-16 23:43:20 -07:00
Richard Tang e74ebe6835 feat: reduce gemini context window to improve reliability 2026-04-16 23:41:24 -07:00
Richard Tang d788e5b2f7 chore: ruff lint 2026-04-16 23:33:48 -07:00
Richard Tang 583a5b41b4 fix: unused reference 2026-04-16 23:23:38 -07:00
Richard Tang 83cc44bdef Merge branch 'feature/full-image-size' 2026-04-16 23:15:59 -07:00
Timothy 558813e7fa feat: fraction-based visual clicks 2026-04-16 22:36:41 -07:00
Timothy aba0ff07ba fix: model invariant screenshot 2026-04-16 20:29:05 -07:00
Timothy 4303a36df0 fix: namespaced browser tab groups 2026-04-16 20:07:05 -07:00
Timothy e68d8ef10b fix: do not kill queen when switching 2026-04-16 19:29:00 -07:00
Richard Tang c6b6a5a2f7 feat: GCP skills and prompts improvements 2026-04-16 17:43:52 -07:00
Richard Tang 18f5f078fc feat: dashed highlighter for browser type focus 2026-04-16 17:26:09 -07:00
Richard Tang cc6ec97a75 feat: multiple modes browser snapshot tool 2026-04-16 17:22:44 -07:00
Richard Tang 44d114f0d0 feat: default 1ms delay and prompt improvements 2026-04-16 16:19:38 -07:00
Richard Tang 9e71f16d15 Merge remote-tracking branch 'origin/fix/browser-behaviour-improvements' into fix/browser-behaviour-improvements 2026-04-16 16:14:43 -07:00
Richard Tang 28cad2376c feat: separate type focus tool 2026-04-16 16:08:43 -07:00
Timothy 8222cd306e fix: simplify canonical workflow 2026-04-16 16:02:37 -07:00
Timothy b50f237506 fix: screenshot skill diction 2026-04-16 15:16:22 -07:00
Richard Tang 916803889f feat: browser control tools improvement and debugger 2026-04-16 15:14:08 -07:00
Timothy 59b1bc9338 fix: tool grouping logic 2026-04-16 12:55:10 -07:00
Timothy 37672c5581 fix: remove worker tool from dm 2026-04-16 12:23:19 -07:00
Timothy 7b0948cd62 Merge branch 'refactor/worker-message' into feature/colony-sqlite 2026-04-16 11:26:46 -07:00
Timothy 4aa5fd7a90 refactor: align worker display 2026-04-16 11:26:32 -07:00
Richard Tang d20b617008 feat: queen profile in message bubbles 2026-04-16 11:21:02 -07:00
Timothy c4ee12532f fix: worker message display 2026-04-16 11:20:17 -07:00
Richard Tang 36ebf27e3e feat: make side bar size adjustable 2026-04-16 11:15:47 -07:00
Richard Tang ae1599c66a feat: queen profile side bar 2026-04-16 11:15:30 -07:00
Richard Tang 810cf5a6d3 Merge remote-tracking branch 'origin/main' into feature/colony-sqlite 2026-04-16 11:10:34 -07:00
Timothy 1ee0d5a2e8 feat: worker bubble display 2026-04-16 10:48:44 -07:00
Hundao 9051c443fb fix(tests): resolve Windows CI failures (#7061)
- test_background_job: use sys.executable and double quotes instead of
  single-quoted 'python -c' which Windows cmd.exe doesn't understand
- test_cli_entry_point: guard against None stdout on Windows with
  (result.stdout or "").lower()
- test_safe_eval: bump DEFAULT_TIMEOUT_MS from 100 to 500 to accommodate
  slow Windows CI runners where SIGALRM is unavailable
2026-04-16 21:05:09 +08:00
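The first two Windows fixes in #7061 follow a common cross-platform pattern: start the child interpreter through `sys.executable` rather than a bare `python`, and normalize a possibly-`None` `stdout` before calling string methods on it. The commit kept a shell-style command string and switched it to double quotes; this sketch goes one step further and passes an argument list, so no shell (cmd.exe or POSIX) parses the snippet at all. The helper name `run_python_snippet` is hypothetical and is not the project's test code.

```python
import subprocess
import sys


def run_python_snippet(code: str, timeout: float = 30.0) -> str:
    """Run `code` in a fresh interpreter and return its lowercased stdout.

    Hypothetical helper for illustration only.
    """
    # sys.executable is the absolute path of the running interpreter, so this
    # works even where `python` is not on PATH (common on Windows runners).
    # An argument list (no shell=True) means no shell has to parse quotes.
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    # stdout is None whenever output is not captured; the `or ""` guard keeps
    # .lower() safe regardless, mirroring the test_cli_entry_point fix.
    return (result.stdout or "").lower()


if __name__ == "__main__":
    print(run_python_snippet('print("Hello From Child")'))
```

Because `text=True` enables universal newlines, the returned string ends in `"\n"` on both Windows and POSIX, so callers can compare after `.strip()` without platform checks.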
Hundao e5a93b059f fix(tests): resolve test failures across framework and tools (#7059)
* fix(tests): resolve test failures across framework and tools

Framework tests (52 -> 1 failure):
- Add missing `model` attribute to mock LLM classes (MockStreamingLLM,
  CrashingLLM, ErrorThenSuccessLLM, etc.) to match new agent_loop.py
  requirement at line 624
- Update skill count assertions from 6 to 7 (new writing-hive-skills)
- Fix phase compaction test to match new message format (no brackets)
- Update model catalog test for current gemini model names
- Fix queen memory test: set phase="building" to match prompt_building,
  adjust reflection trigger count to match cooldown behavior

Tools tests (52 -> 0 failures):
- Update csv_tool tests: remove agent_id parameter, use absolute paths,
  patch _ALLOWED_ROOTS instead of AGENT_SANDBOXES_DIR
- Fix browser_evaluate test to allow toast wrapper around script

Remaining: 1 pre-existing failure in test_worker_report where mock LLM
gets stuck when scenarios are exhausted (separate bug).

* fix(tests): resolve remaining test failures

- Add text stop scenario to test_worker_report so worker terminates
  cleanly after tool_calls finish instead of replaying the last
  scenario forever
- Remove duplicated hive home isolation fixture from test_colony_fork_live;
  reuse conftest autouse fixture and only add config copy on top

* fix(tests): prevent mock LLM infinite loops on exhausted scenarios

fix(core): accept both pruned tool result sentinel formats

MockStreamingLLM and _ByTaskMockLLM replay the last scenario forever
when call_index exceeds the scenario list, causing worker timeouts in
CI. Fix by emitting a text stop when scenarios are exhausted (scenarios
mode) or already consumed (by_task mode).

Also fix pruned tool result sentinel mismatch: conversation.py produces
"Pruned tool result ..." but compaction.py and conversation.py only
checked for "[Pruned tool result". Now both formats are accepted.

Also remove duplicated hive home isolation fixture from
test_colony_fork_live; reuse conftest autouse fixture instead.
2026-04-16 20:13:43 +08:00
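The mock-LLM fix in #7059 can be sketched as a scripted fake that returns one canned reply per call and, once the script is exhausted, returns a plain-text stop instead of replaying the last entry. It also exposes a `model` attribute, matching the commit's note that mocks now need one. `ScriptedMockLLM`, `Response`, `complete`, and `run_until_text_stop` are hypothetical stand-ins; the project's real `MockStreamingLLM` streams events and has its own interface.

```python
from dataclasses import dataclass, field


@dataclass
class Response:
    """Hypothetical reply shape: plain text and/or requested tool calls."""

    text: str = ""
    tool_calls: list[str] = field(default_factory=list)


class ScriptedMockLLM:
    """Replays scripted replies, then emits a text stop once they run out."""

    def __init__(self, scenarios: list[Response], model: str = "mock-model") -> None:
        # Code under test may read `llm.model`, so the mock exposes it.
        self.model = model
        self.scenarios = list(scenarios)
        self.call_index = 0

    def complete(self, messages: list[dict]) -> Response:
        if self.call_index < len(self.scenarios):
            reply = self.scenarios[self.call_index]
        else:
            # Script exhausted: finish with plain text instead of replaying the
            # last scenario, which could keep requesting tools forever.
            reply = Response(text="Done.")
        self.call_index += 1
        return reply


def run_until_text_stop(llm: ScriptedMockLLM, max_turns: int = 10) -> int:
    """Toy agent loop: keeps calling the LLM while it requests tools."""
    for turn in range(1, max_turns + 1):
        if not llm.complete(messages=[]).tool_calls:
            return turn  # the model answered with text, so the loop ends
    raise RuntimeError("agent loop did not terminate")


if __name__ == "__main__":
    llm = ScriptedMockLLM([Response(tool_calls=["search"]), Response(tool_calls=["read"])])
    print(run_until_text_stop(llm))
```

With two scripted tool-call replies, the toy loop ends on the third call. Without the exhausted-script branch it would keep receiving the `"read"` tool call until `max_turns`, which is the CI timeout the commit describes.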
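The sentinel mismatch in #7059 goes away when every call site shares one predicate that accepts both spellings. A minimal sketch; the names `is_pruned_tool_result` and `_PRUNED_PREFIXES` are hypothetical, and only the two prefix strings come from the commit message.

```python
# Both spellings named in the commit: the bare form conversation.py produces
# and the bracketed form compaction.py and conversation.py were checking for.
_PRUNED_PREFIXES = ("Pruned tool result", "[Pruned tool result")


def is_pruned_tool_result(content: str) -> bool:
    """Return True if a tool-result body is a pruning placeholder."""
    # str.startswith accepts a tuple and succeeds if any prefix matches.
    return content.startswith(_PRUNED_PREFIXES)
```

Keeping the prefixes in one tuple means a future format change touches a single constant instead of every check.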
Hundao 589c5b06fe fix: resolve all ruff lint and format errors across codebase (#7058)
- Auto-fixed 70 lint errors (import sorting, aliased errors, datetime.UTC)
- Fixed 85 remaining errors manually:
  - E501: wrapped long lines in queen_profiles, catalog, routes_credentials
  - F821: added missing TYPE_CHECKING imports for AgentHost, ToolRegistry,
    HookContext, HookResult; added runtime imports where needed
  - F811: removed duplicate method definitions in queen_lifecycle_tools
  - F841/B007: removed unused variables in discovery.py
  - W291: removed trailing whitespace in queen nodes
  - E402: moved import to top of queen_memory_v2.py
  - Fixed AgentRuntime -> AgentHost in example template type annotations
- Reformatted 343 files with ruff format
2026-04-16 19:30:01 +08:00
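The F821 fixes in #7058 rely on the standard `typing.TYPE_CHECKING` guard: the import is seen by ruff and type checkers but never executed at runtime, which keeps annotations resolvable without creating import cycles. A minimal sketch; the module path `framework.host` and the function `host_label` are hypothetical, and only the `AgentHost` name comes from the commit.

```python
from __future__ import annotations  # annotations are stored as strings

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated only by static tools (ruff, mypy, pyright). At runtime
    # TYPE_CHECKING is False, so this hypothetical module is never imported.
    from framework.host import AgentHost


def host_label(host: AgentHost) -> str:
    """Annotated with AgentHost without importing it at runtime."""
    return type(host).__name__
```

Where a name is needed at runtime (for `isinstance` checks, for example), the guard is not enough and a real import is required, which matches the commit's note about adding runtime imports where needed.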
Richard Tang be94c611bd fix: queen fail when no worker is running 2026-04-15 22:14:36 -07:00
Timothy 45df68c146 feat: ensure sqlite3 installation 2026-04-15 18:34:33 -07:00
Richard Tang 4fdbc438f9 chore: release v0.10.1
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
v0.10.1
2026-04-15 18:15:40 -07:00
Timothy 2231dc5742 fix: delete spilled skill 2026-04-15 18:14:10 -07:00
Timothy 446844b2ad fix: tighten worker with sqlite skills 2026-04-15 18:11:15 -07:00
Richard Tang 78301274cd feat: browser tool improvements 2026-04-15 18:09:28 -07:00
Timothy e719523434 fix: remove conflicting tools 2026-04-15 17:38:05 -07:00
Richard Tang 451a5d55d2 feat: queen independent prompt improvements 2026-04-15 17:36:48 -07:00
Richard Tang e2a21b3613 chore: title of finance 2026-04-15 16:55:00 -07:00
Richard Tang 5c251645d3 Merge branch 'main' into feat/gui-ux-updates 2026-04-15 16:45:39 -07:00
Richard Tang 8783f372fc feat: use the customtools model for gemini 2026-04-15 16:44:23 -07:00
bryan 2790d13bb6 Merge branch 'main' into feat/gui-ux-updates 2026-04-15 15:45:56 -07:00