Compare commits

..

88 Commits

Author SHA1 Message Date
Richard Tang fe74718fd9 chore: lint
2026-05-04 17:57:56 -07:00
Richard Tang 07c97e2e9b feat: llm logging 2026-05-04 17:57:20 -07:00
Richard Tang 07600c5ab5 feat: encourage action plan prompts 2026-05-04 17:55:44 -07:00
Richard Tang e7d4ce0057 chore: lint 2026-05-04 12:36:28 -07:00
Richard Tang d9813288d9 fix: install system mcp when they fail 2026-05-04 12:35:21 -07:00
Richard Tang 41fbdcb940 fix(frontend): mcp tools server title format 2026-05-04 12:35:21 -07:00
Hundao 4a9b22719b fix(antigravity): unblock Gemini chats — schema sanitizer + UA bump (#7170)
* fix(antigravity): translate JSON Schema unions to Gemini nullable

Tool parameter schemas using JSON Schema 2020-12 unions like
"type": ["string", "null"] crash Gemini's function_declarations parser
with HTTP 400. Two existing tools trip this:

- core/framework/tasks/tools/colony_tools.py:52 (owner in _update_schema)
- core/framework/tasks/tools/session_tools.py:84-87 (same shape)

Add an adapter-level sanitizer that walks the schema tree and converts
union-with-null to OpenAPI 3.0 "nullable": true (which Gemini accepts).
Recurses into properties, items, additionalProperties, and the
anyOf/oneOf/allOf combinators. Source schemas remain valid JSON Schema
so OpenAI/Anthropic backends are unaffected.
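The walk described above (and the fail-loud rule from the follow-up commit below) can be sketched as follows. This is a hypothetical reconstruction from the commit message, not the actual adapter code; the function name and the exact mode of recursion are assumptions.

```python
from typing import Any


def sanitize_for_gemini(schema: dict[str, Any]) -> dict[str, Any]:
    """Convert JSON Schema union-with-null types to OpenAPI 3.0 nullable.

    Sketch of the adapter-level pass described in this commit message;
    the real function lives elsewhere and may differ in detail.
    """
    out = dict(schema)
    t = out.get("type")
    if isinstance(t, list):
        non_null = [x for x in t if x != "null"]
        if len(non_null) == 1:
            out["type"] = non_null[0]
            if len(non_null) < len(t):
                out["nullable"] = True  # Gemini accepts this OpenAPI 3.0 form
        else:
            # Follow-up commit: fail loud instead of silently narrowing.
            raise ValueError(
                f"multi-type union {t!r} not supported for Gemini; "
                "use anyOf or a single type"
            )
    # Recurse into the nested positions the commit message lists.
    if isinstance(out.get("properties"), dict):
        out["properties"] = {
            k: sanitize_for_gemini(v) for k, v in out["properties"].items()
        }
    if isinstance(out.get("items"), dict):
        out["items"] = sanitize_for_gemini(out["items"])
    if isinstance(out.get("additionalProperties"), dict):
        out["additionalProperties"] = sanitize_for_gemini(out["additionalProperties"])
    for comb in ("anyOf", "oneOf", "allOf"):
        if isinstance(out.get(comb), list):
            out[comb] = [sanitize_for_gemini(s) for s in out[comb]]
    return out
```

Source schemas are left untouched; only the Gemini-bound copy is rewritten, so OpenAI/Anthropic backends keep seeing valid JSON Schema.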

* fix(antigravity): bump spoofed UA past Google's deprecation cutoff

Google has deprecated client version "Antigravity/1.18.3" — chats now
return "This version of Antigravity is no longer supported" instead of
a real model response.

Bump the spoofed User-Agent to "Antigravity/1.23.2" + "Electron/39.2.3"
(current desktop release) and add a comment that this needs periodic
re-bumping. A more durable fix (auto-detect from the installed app's
Info.plist) is a follow-up.

* fix(antigravity): fail loud on multi-type non-null Gemini schema unions

Per review on PR #7170: silently picking the first type from a union
like ["string", "integer", "null"] changes the contract for callers
that rely on the other types, and the failure is hard to diagnose at
the Gemini side. Replace the silent narrowing with a ValueError that
points the schema author at anyOf or a single type.

A repo scan finds no current Gemini-bound schemas using multi-type
non-null unions, so this branch is preventative for future authors.

* chore(antigravity): drop em dash from test docstring
2026-05-05 01:16:48 +08:00
Hundao 8cb0531959 fix(ci): unblock main CI, sort imports + install Playwright Chromium (#7172)
* fix(lint): organize imports in queen_orchestrator.create_queen

Ruff I001 blocks CI on every PR against main. The deferred imports
inside create_queen were not in alphabetical order between the queen
package and the framework package; ruff auto-fix moves
framework.config below the framework.agents.queen.nodes block.

No behavior change.

* fix(ci): install Playwright Chromium before Test Tools job

The new chart_tools smoke tests added in feabf327 require a Chromium
build for ECharts/Mermaid rendering, but the test-tools workflow only
ran `uv sync` and went straight to pytest. Three tests
(test_render_echarts_bar_chart, test_render_echarts_accepts_string_spec,
test_render_mermaid_flowchart) crash on every PR with:

    BrowserType.launch: Executable doesn't exist at
    /home/runner/.cache/ms-playwright/chromium_headless_shell-1208/...

Split the install/run into separate steps and add `playwright install
chromium` before pytest. Use `--with-deps` on Linux to pull system
libraries; Windows runners only need the browser binary.

* fix(tests): adapt test_file_state_cache to new file_ops API

The file_ops rewrite in feabf327 dropped the standalone hashline_edit
tool (the file_system_toolkits/hashline_edit/ directory was removed)
and switched edit_file to a mode-first signature
(mode, path, old_string, new_string, ...).

The test fixture still tried to look up "hashline_edit" via the MCP
tool manager and crashed with KeyError before any test could run, and
the edit_file calls were positional in the old order so they hit
"unknown mode 'e.py'" once the fixture was fixed.

Drop the stale hashline_edit lookup and pass mode="replace" explicitly
to every edit_file call. All 11 tests pass locally.
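The failure mode above can be illustrated with a stub carrying the mode-first signature the commit describes. The stub, its mode names, and its return shape are assumptions for illustration only; only the `(mode, path, old_string, new_string, ...)` order comes from the commit message.

```python
def edit_file(mode: str, path: str, old_string: str = "", new_string: str = "", **kwargs):
    """Stub with the mode-first signature from the commit message."""
    if mode not in {"replace", "insert", "delete"}:  # mode set is assumed
        raise ValueError(f"unknown mode {mode!r}")
    return {"mode": mode, "path": path}


# Old-order positional call: the path lands in `mode` and fails fast.
try:
    edit_file("e.py", "old text", "new text")
except ValueError as exc:
    assert "e.py" in str(exc)

# Fixed call: mode passed explicitly, as the tests now do.
result = edit_file(mode="replace", path="e.py", old_string="old text", new_string="new text")
assert result["mode"] == "replace"
```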

* fix(tests): skip terminal_tools tests on Windows (POSIX-only)

The new terminal_tools package added in feabf327 imports the Unix-only
`resource` module in tools/src/terminal_tools/common/limits.py to set
RLIMIT_CPU / RLIMIT_AS / RLIMIT_FSIZE on subprocesses. Five of the
six terminal_tools test files therefore crash on windows-latest with
`ModuleNotFoundError: No module named 'resource'` once their fixtures
trigger the import chain.

test_terminal_tools_pty.py already has the right module-level skip
(PTY is POSIX-only). Apply the same `pytestmark = skipif(win32)` to
the other five so the whole suite skips cleanly on Windows. The
terminal-tools package is bash-only by design (zsh refused at the
shell-resolver level), so a Windows port is out of scope.
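The module-level skip borrowed from test_terminal_tools_pty.py presumably looks like this; the exact reason string is an assumption.

```python
import sys

import pytest

# Skip the whole module on Windows: terminal_tools imports the Unix-only
# `resource` module (RLIMIT_CPU / RLIMIT_AS / RLIMIT_FSIZE) via its fixtures.
pytestmark = pytest.mark.skipif(
    sys.platform == "win32",
    reason="terminal_tools uses the POSIX-only `resource` module",
)
```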
2026-05-05 00:32:59 +08:00
Richard Tang feabf32768 fix: worker context token 2026-05-03 11:45:37 -07:00
Richard Tang eee55ea8c7 chore: fix wrong model name 2026-05-03 11:35:05 -07:00
Richard Tang 78fffa63ec chore: ci and release doc
2026-05-01 18:06:39 -07:00
Richard Tang 9a75d45351 chore: lint 2026-05-01 17:53:44 -07:00
Timothy 3a94f52009 feat: sync tool result contentful display 2026-05-01 17:44:19 -07:00
Timothy 522e0f511e fix: y-axis 2026-05-01 15:48:36 -07:00
Timothy e6310f1243 fix: normalize chart spec in renderer 2026-05-01 15:36:09 -07:00
Richard Tang 12ffacccab feat: tools config frontend grouping and tools cleanup 2026-05-01 15:28:40 -07:00
Timothy 8c36b1575c Merge branch 'feature/merge-to-file-ops' into feat/file-ops 2026-05-01 14:57:21 -07:00
Timothy 6540f7b31e feat: pura linea 2026-05-01 14:57:06 -07:00
Richard Tang a09eac06f1 feat: improve web search and consolidate browser open 2026-05-01 14:55:20 -07:00
Richard Tang b939a875a7 refactor: update autocompaction tools and concurrency tools 2026-05-01 14:27:38 -07:00
Richard Tang b826e70d8c feat: remove old lifecycle tools 2026-05-01 14:07:34 -07:00
Richard Tang 6f2f037c9c feat: remove other default tools 2026-05-01 13:35:04 -07:00
Richard Tang c147364d8c feat: browser tools audit and improvements 2026-05-01 13:22:31 -07:00
Richard Tang 35bd497750 feat: refactor edit file and update default tools 2026-05-01 12:40:53 -07:00
Richard Tang 574c4bbe33 Merge remote-tracking branch 'origin/feature/sync-20260430' into feat/file-ops 2026-05-01 07:42:20 -07:00
Richard Tang d22a01682a feat: major file ops refactor 2026-05-01 07:41:42 -07:00
Timothy 0c6f0f8aef refactor: rename shell tools to terminal tools 2026-04-30 19:52:34 -07:00
Richard Tang 0e8efa7bcc feat: vision fallback auth 2026-04-30 19:52:12 -07:00
Timothy 7b1dda7bf3 fix: mcp registry initialization 2026-04-30 19:52:04 -07:00
Timothy 725dd1f410 fix: shell split command 2026-04-30 19:52:01 -07:00
Timothy de4b2dc151 chore: give shell tools to queen 2026-04-30 19:51:57 -07:00
Timothy 0784cea314 fix: initial install 2026-04-30 19:51:55 -07:00
Timothy 20bbf08278 feat: perita manus 2026-04-30 19:51:44 -07:00
Richard Tang f8233bda56 feat: consolidate search and list file tools 2026-04-30 15:43:15 -07:00
Richard Tang 76a7dd4bd5 feat: loosen the max tokens for vision fallback as some models spend output on internal thinking 2026-04-30 13:24:49 -07:00
Richard Tang 73511a3c59 feat: vision fallback with intent 2026-04-30 13:02:57 -07:00
Richard Tang a0817fcde4 feat: vision model retry and fallback 2026-04-30 12:38:30 -07:00
Richard Tang 628ce9ca12 feat: use simple snapshot for auto_snapshot_mode 2026-04-30 10:43:14 -07:00
Richard Tang cc4213a942 fix: llm debugger tool call display 2026-04-30 10:31:21 -07:00
Richard Tang d12d5b7e8b fix: llm debugger timeline order 2026-04-30 08:05:38 -07:00
Harshit Shukla 038c5fd807 fix(credentials): align EnvVarStorage exists with load semantics (#5680)
* Return boolean from exists method for credential check

* Add test for empty value handling in EnvVarStorage

Add test to verify exists() and load() consistency for empty values in EnvVarStorage.
2026-04-30 19:40:28 +08:00
Leayx 3d5f2595c9 bug(test_zoho_crm_tool): remove orphan test directory under src (#7142)
Problem
- The Zoho CRM tool was refactored to an MCP-based architecture, making the old in-tree test suite obsolete
- The remaining tests under src were not executed by pytest. Testpaths only includes tools/tests, effectively making them dead code
- A proper MCP test suite already exists under tools/tests, providing coverage

Decision
- Removed the unused test directory under src/aden_tools/tools/zoho_crm_tool/tests
- Aligns project structure with existing tools, where tests are only located in tools/tests
- Avoids confusion and prevents future contributors from relying on outdated or non-executed tests
2026-04-30 18:57:49 +08:00
Hundao 7881177f1f fix: unbreak main CI — skills HIVE_HOME refactor + run_parallel_workers task text (#7149)
* fix(skills): restore module-level path constants for HIVE_HOME refactor

ae2aa30e replaced module-level USER_SKILLS_DIR / INSTALL_NOTICE_SENTINEL
in installer.py and _NOTICE_SENTINEL_PATH / _TRUSTED_REPOS_PATH in
trust.py with lazy helper functions, but left callers and tests still
referencing the original symbols. CI fails with ImportError /
AttributeError.

Restore them as module-level constants computed from HIVE_HOME so the
desktop-shell override still works, callers in cli.py keep importing
the same names, and existing test monkeypatches stay valid.

Refs #7148

* fix(colony): preserve task text in run_parallel_workers spawn data

run_parallel_workers stamps __template_task_id into spec['data'] before
calling spawn_batch. Once that mutation makes spec['data'] non-empty,
colony_runtime.spawn()'s ``input_data or {"task": task}`` fallback no
longer fires and the task description disappears from the worker's
first user message. Workers loop on empty responses and never emit
SUBAGENT_REPORT.

Hoist the ``setdefault("task")`` step out of the template-publish try
block so task text survives even if the template store fails
non-fatally. Inner loop only stamps __template_task_id.

Refs #7148
2026-04-30 18:43:54 +08:00
Richard Tang 2cfea915f4 chore: ruff format 2026-04-29 19:23:31 -07:00
Richard Tang ac46a1be72 Merge branch 'feat/ask-user-chat-display' 2026-04-29 19:22:51 -07:00
Richard Tang 7b0b472167 chore: lint 2026-04-29 19:16:00 -07:00
Richard Tang 697aae33fe feat: prompts simplification 2026-04-29 19:13:01 -07:00
Richard Tang d26e7f33d2 fix: incubating mode approval guidance injection 2026-04-29 18:43:26 -07:00
Richard Tang 6357597e88 chore: improve llm visibility 2026-04-29 18:37:29 -07:00
Richard Tang 579f1d7512 feat(tasks): refactor task folder 2026-04-29 17:33:34 -07:00
bryan 965264c973 fix: defer ask_user question bubble until user answers 2026-04-29 16:31:19 -07:00
Richard Tang e80d275321 feat(queen): drop redundant _queen_style prompt block 2026-04-29 15:47:18 -07:00
Richard Tang 5b45fac435 feat: prompt improvements 2026-04-29 15:26:58 -07:00
Richard Tang 4794c8b816 chore: log the vision fallback model usage 2026-04-29 13:08:38 -07:00
Timothy 5492366c31 fix: recover frontend 2026-04-29 11:25:29 -07:00
Timothy ae2aa30edf fix: all agent path prefixed by HIVE_HOME 2026-04-28 19:16:35 -07:00
Timothy dd69a53de1 fix: hardcoded hive home 2026-04-28 18:25:21 -07:00
bryan 062a4e3166 feat: new-session navigation with queen warm-up UI 2026-04-28 18:17:25 -07:00
bryan fe9a903928 feat: surface ask_user questions in chat transcript 2026-04-28 18:16:47 -07:00
Timothy 7c3bada70c fix: patch litellm 2026-04-28 18:00:31 -07:00
Timothy 4ef951447d fix: compaction issues 2026-04-28 10:43:54 -07:00
Timothy ccb6556a41 Merge branch 'main' into feature/colony-push 2026-04-27 18:24:41 -07:00
Timothy 5ca5021fc1 feat: colony routes 2026-04-27 18:24:18 -07:00
Richard Tang 9eeba74851 Merge branch 'feat/tasks-system' 2026-04-27 10:55:50 -07:00
Richard Tang facd919371 chore: task list order 2026-04-27 10:55:18 -07:00
Richard Tang cb1484be85 feat: multi task creation 2026-04-27 10:35:02 -07:00
Timothy 82ce6bed68 Merge branch 'feat/api-colonies-import' 2026-04-26 21:13:18 -07:00
Timothy efdb404655 feat: POST /api/colonies/import — onboard a colony from a tarball
Accepts a multipart upload of `tar` / `tar.gz` (any compression
tarfile.open auto-detects) containing a single top-level directory and
unpacks it into HIVE_HOME/colonies/<name>. Lets a desktop client (or any
external tool) hand a colony spec to a remote runtime to run.

Form fields:
  file              (required)  the archive blob
  name              (optional)  override the colony name; defaults to
                                the archive's top-level dir
  replace_existing  (optional)  "true" to overwrite; else 409 if the
                                target dir already exists

Safety:
- 50 MB upload cap (multipart reader streams + caps each part)
- Manual path-traversal validation per member (Python 3.11 compatible —
  tarfile's safe `filter='data'` only landed in 3.12)
- Symlinks, hardlinks, device, fifo entries all rejected
- Colony name validated against the existing [a-z0-9_]+ pattern used by
  routes_colony_workers + queen_lifecycle_tools
- Mode bits masked to 0o755 / 0o644 so a tampered tar can't ship
  world-writable scripts
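The per-member checks can be sketched like this. Function names and the split between the two helpers are assumptions; the rules (reject non-file/dir entries, reject traversal, mask modes to 0o755/0o644) come from the Safety list above, and the manual walk is needed on Python 3.11 where `filter='data'` is unavailable.

```python
import tarfile
from pathlib import Path


def validate_member(member: tarfile.TarInfo, dest: Path) -> None:
    """Per-member safety check before extraction (sketch)."""
    if not (member.isfile() or member.isdir()):
        # symlinks, hardlinks, devices, fifos all rejected
        raise ValueError(f"unsupported entry type: {member.name}")
    target = (dest / member.name).resolve()
    if not target.is_relative_to(dest.resolve()):
        # catches both ../ traversal and absolute member names
        raise ValueError(f"path traversal: {member.name}")


def masked_mode(member: tarfile.TarInfo) -> int:
    # Mask mode bits so a tampered tar can't ship world-writable scripts:
    # directories and executables get 0o755, plain files 0o644.
    return 0o755 if member.isdir() or (member.mode & 0o100) else 0o644
```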

Tests cover happy path, name override, 409 / 201 around replace_existing,
path traversal, absolute paths, symlinks, multiple top-level dirs,
invalid colony name, missing file part, corrupt tar, non-multipart, and
uncompressed tar.

Future work (not in this PR): export endpoint, colony list/delete via
this same prefix, and an MCP tool wrapper so queens can move colonies
between hosts mid-conversation.
2026-04-26 20:16:10 -07:00
Richard Tang da361f735d chore: lint 2026-04-26 19:45:52 -07:00
Richard Tang eea0429f93 fix: improve prompt 2026-04-26 19:38:14 -07:00
Richard Tang 833aa4bc7a feat: fix structural blockers preventing the queen from using task_*, also enhanced the hook 2026-04-26 19:15:23 -07:00
Richard Tang 0af597881f feat(tasks): file-backed task system with colony template + UI 2026-04-26 18:49:45 -07:00
RichardTang-Aden 6fae1f04c8 Merge pull request #7143 from aden-hive/fix/scrolling-container
Fix/scrolling container
2026-04-26 12:14:29 -07:00
Richard Tang 8c4085f5e8 chore: lint 2026-04-26 11:35:16 -07:00
Richard Tang 53240eb888 fix: scroll with certain element selector 2026-04-26 11:34:47 -07:00
Hundao de8d6f0946 fix(tests): unblock main CI (#7141)
Two unrelated test failures were keeping main red:

- test_capabilities.py: fixtures referenced deprecated model identifiers
  no longer in model_catalog.json. After the catalog refactor unknown
  models default to vision-capable, so 12 "expect False" assertions
  flipped to True. Replace fixtures with current catalog entries that
  carry an explicit supports_vision flag.

- test_colony_runtime_overseer.py: a 200ms hard sleep racing the
  background worker was flaky on Windows CI. Poll for llm.stream_calls
  with a 5s deadline instead.
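The fix replaces a fixed sleep with the standard poll-with-deadline pattern; a generic sketch (the helper name is hypothetical):

```python
import time
from typing import Callable


def wait_for(predicate: Callable[[], bool], timeout: float = 5.0, interval: float = 0.05) -> bool:
    """Poll `predicate` until it returns True or the deadline passes.

    Replaces a hard `time.sleep(0.2)` that raced the background worker:
    fast machines no longer wait the full interval, slow CI runners get
    the whole 5 s budget instead of a flaky fixed window.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return predicate()  # one last check at the deadline


# e.g. instead of sleeping then asserting:
#   assert wait_for(lambda: len(llm.stream_calls) > 0)
```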
2026-04-26 21:34:21 +08:00
SAURABH KUMAR ea707438f2 feat(tools): add SimilarWeb V5 API integration (#7066)
Adds 29 MCP tools for SimilarWeb V5 covering traffic and engagement,
competitor intelligence, keywords/SERP, audience demographics, and
segment analysis. Includes credential spec, health checker, README,
and tests on ubuntu and windows.

Closes #7022
2026-04-26 20:37:44 +08:00
Richard Tang 445c9600ab chore: release v0.10.5
Cache-aware cost reporting + new frontier models (GPT-5.5, DeepSeek V4
Pro/Flash, GLM-5.1). cache_control now propagates through OpenRouter
sub-providers (anthropic / gemini / glm / minimax) so the static system
prefix actually hits cache, and every response/finish event carries
cost_usd computed from a four-source fallback chain.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 20:21:03 -07:00
Richard Tang 2ab5e6d784 feat: model support 2026-04-24 20:17:41 -07:00
RichardTang-Aden e7f9b7d791 Merge pull request #7132 from vincentjiang777/feat/colony-session-transfer
feat: redesign configuration UI for sidebar, prompts, skills, and tools
2026-04-24 19:02:03 -07:00
Vincent Jiang 3cb0c69a96 feat: redesign configuration UI — sidebar, prompt library, skills, and tools
- Sidebar: rename Library to Configuration, reorder nav (Credentials 3rd, Configuration 4th), reorder sub-items (Prompts, Skills, Tools)
- Prompt Library: separate My Prompts from Community Prompts into distinct sections
- Skills Configuration: rename page title, sort queens by org chart order, group active/inactive skills, style Upload button as primary
- Tool Configuration: rename page title, sort queens by org chart order, add Save/Cancel/Allow all/Reset to defaults workflow, filter lifecycle tool names to fix "Unknown MCP tool name" save errors
- Fix (unknown) tool group label in server fallback catalog

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-24 10:41:38 -07:00
Richard Tang 22d75bfb05 chore: lint format 2026-04-24 10:12:06 -07:00
Vincent Jiang 357df1bbcb merge: pull upstream/main into feat/colony-session-transfer 2026-04-24 09:28:46 -07:00
Richard Tang 386bbd5780 feat: persistent cost tracking 2026-04-24 09:19:57 -07:00
Richard Tang 235022b35d feat: support glm 5.1 2026-04-24 07:45:37 -07:00
Richard Tang 4d8f312c3e Merge remote-tracking branch 'origin/feat/cache-token' into feature/vision-subagent 2026-04-23 22:21:45 -07:00
Timothy 4651a6a85a fix: vision caption 2026-04-23 21:30:59 -07:00
Timothy ea9c163438 feat: image vision fallback 2026-04-23 21:24:56 -07:00
267 changed files with 25545 additions and 13559 deletions
-1
@@ -47,7 +47,6 @@
"Bash(grep -v ':0$')",
"Bash(curl -s -m 2 http://127.0.0.1:4002/sse -o /dev/null -w 'status=%{http_code} time=%{time_total}s\\\\n')",
"mcp__gcu-tools__browser_status",
"mcp__gcu-tools__browser_start",
"mcp__gcu-tools__browser_navigate",
"mcp__gcu-tools__browser_evaluate",
"mcp__gcu-tools__browser_screenshot",
@@ -214,7 +214,7 @@ Curated list of known browser automation edge cases with symptoms, causes, and f
| **Symptom** | `browser_open()` returns `"No group with id: XXXXXXX"` even though `browser_status` shows `running: true` |
| **Root Cause** | In-memory `_contexts` dict has a stale `groupId` from a Chrome tab group that was closed outside the tool (e.g. user closed the tab group) |
| **Detection** | `browser_status` returns `running: true` but `browser_open` fails with "No group with id" |
| **Fix** | Call `browser_stop()` to clear stale context from `_contexts`, then `browser_start()` again |
| **Fix** | Call `browser_stop()` to clear stale context from `_contexts`, then `browser_open(url)` to lazy-create a fresh one |
| **Code** | `tools/lifecycle.py:144-160` - `already_running` check uses cached dict without validating against Chrome |
| **Verified** | 2026-04-03 ✓ |
+16 -4
@@ -84,11 +84,23 @@ jobs:
with:
enable-cache: true
- name: Install dependencies and run tests
- name: Install dependencies
working-directory: tools
run: |
uv sync --extra dev
uv run pytest tests/ -v
run: uv sync --extra dev
- name: Install Playwright Chromium (Linux)
if: runner.os == 'Linux'
working-directory: tools
run: uv run playwright install --with-deps chromium
- name: Install Playwright Chromium (Windows)
if: runner.os == 'Windows'
working-directory: tools
run: uv run playwright install chromium
- name: Run tests
working-directory: tools
run: uv run pytest tests/ -v
validate:
name: Validate Agent Exports
+2 -2
@@ -407,7 +407,7 @@ Aden Hive supports **100+ LLM providers** via LiteLLM, giving users maximum flex
| **Anthropic** | Claude 3.5 Sonnet, Haiku, Opus | Default provider, best for reasoning |
| **OpenAI** | GPT-4, GPT-4 Turbo, GPT-4o | Function calling, vision |
| **OpenRouter** | Any OpenRouter catalog model | Uses `OPENROUTER_API_KEY` and `https://openrouter.ai/api/v1` |
| **Hive LLM** | `queen`, `kimi-2.5`, `GLM-5` | Uses `HIVE_API_KEY` and the Hive-managed endpoint |
| **Hive LLM** | `queen`, `kimi-k2.5`, `GLM-5` | Uses `HIVE_API_KEY` and the Hive-managed endpoint |
| **Google** | Gemini 1.5 Pro, Flash | Long context windows |
| **DeepSeek** | DeepSeek V3 | Cost-effective, strong reasoning |
| **Mistral** | Mistral Large, Medium, Small | Open weights, EU hosting |
@@ -435,7 +435,7 @@ DEFAULT_MODEL = "claude-haiku-4-5-20251001"
**Provider-Specific Notes**
- **OpenRouter**: store `provider` as `openrouter`, use the raw OpenRouter model ID in `model` (for example `x-ai/grok-4.20-beta`), and use `OPENROUTER_API_KEY`
- **Hive LLM**: store `provider` as `hive`, use Hive model names such as `queen`, `kimi-2.5`, or `GLM-5`, and use `HIVE_API_KEY`
- **Hive LLM**: store `provider` as `hive`, use Hive model names such as `queen`, `kimi-k2.5`, or `GLM-5`, and use `HIVE_API_KEY`
**For Development**
- Use cheaper/faster models (Haiku, GPT-4o-mini)
+3 -4
@@ -72,17 +72,16 @@ Register an MCP server as a tool source for your agent.
"cwd": "../tools",
"description": "Aden tools..."
},
"tools_discovered": 6,
"tools_discovered": 5,
"tools": [
"web_search",
"web_scrape",
"file_read",
"file_write",
"pdf_read",
"example_tool"
"pdf_read"
],
"total_mcp_servers": 1,
"note": "MCP server 'tools' registered with 6 tools. These tools can now be used in event_loop nodes."
"note": "MCP server 'tools' registered with 5 tools. These tools can now be used in event_loop nodes."
}
```
+3 -3
@@ -1,6 +1,6 @@
# MCP Server Guide - Agent Building Tools
> **Note:** The standalone `agent-builder` MCP server (`framework.mcp.agent_builder_server`) has been replaced. Agent building is now done via the `coder-tools` server's `initialize_and_build_agent` tool, with underlying logic in `tools/coder_tools_server.py`.
> **Note:** This document is stale. The previous `coder-tools` MCP server has been replaced by `files-tools` (`tools/files_server.py`), which only exposes file I/O (`read_file`, `write_file`, `edit_file`, `hashline_edit`, `search_files`). The agent-building, shell, and snapshot tools that used to live here have been removed.
This guide covers the MCP tools available for building goal-driven agents.
@@ -20,9 +20,9 @@ Add to your MCP client configuration (e.g., Claude Desktop):
```json
{
"mcpServers": {
"coder-tools": {
"files-tools": {
"command": "uv",
"args": ["run", "coder_tools_server.py", "--stdio"],
"args": ["run", "files_server.py", "--stdio"],
"cwd": "/path/to/hive/tools"
}
}
-2
@@ -19,8 +19,6 @@ uv pip install -e .
## Agent Building
Agent scaffolding is handled by the `coder-tools` MCP server (in `tools/coder_tools_server.py`), which provides the `initialize_and_build_agent` tool and related utilities. The package generation logic lives directly in `tools/coder_tools_server.py`.
See the [Getting Started Guide](../docs/getting-started.md) for building agents.
## Quick Start
+214 -44
@@ -14,7 +14,6 @@ from __future__ import annotations
import asyncio
import json
import logging
import os
import re
import time
import uuid
@@ -85,7 +84,12 @@ from framework.agent_loop.internals.types import (
JudgeVerdict,
TriggerEvent,
)
from framework.agent_loop.internals.vision_fallback import (
caption_tool_image,
extract_intent_for_tool,
)
from framework.agent_loop.types import AgentContext, AgentProtocol, AgentResult
from framework.config import get_vision_fallback_model
from framework.host.event_bus import EventBus
from framework.llm.capabilities import filter_tools_for_model, supports_image_tool_results
from framework.llm.provider import Tool, ToolResult, ToolUse
@@ -177,46 +181,58 @@ def _strip_internal_tags_from_snapshot(snapshot: str) -> str:
return cleaned
async def _describe_images_as_text(image_content: list[dict[str, Any]]) -> str | None:
"""Describe images using the best available vision model."""
import litellm
def _vision_fallback_active(model: str | None) -> bool:
"""Return True if tool-result images for *model* should be routed
through the vision-fallback chain rather than sent to the model.
blocks: list[dict[str, Any]] = [
{
"type": "text",
"text": (
"Describe the following image(s) concisely but with enough detail "
"that a text-only AI assistant can understand the content and context."
),
}
]
blocks.extend(image_content)
Trigger: the model's catalog entry has ``supports_vision: false``
(resolved via :func:`capabilities.supports_image_tool_results`,
which reads ``model_catalog.json``). Unknown models default to
vision-capable, so the fallback only fires when the catalog
explicitly says the model is text-only.
candidates: list[str] = []
if os.environ.get("OPENAI_API_KEY"):
candidates.append("gpt-4o-mini")
if os.environ.get("ANTHROPIC_API_KEY"):
candidates.append("claude-3-haiku-20240307")
if os.environ.get("GOOGLE_API_KEY") or os.environ.get("GEMINI_API_KEY"):
candidates.append("gemini/gemini-1.5-flash")
The ``vision_fallback`` config block is the *substitution* model;
it doesn't widen the trigger. To force fallback for a model that
isn't catalogued yet, add an entry to ``model_catalog.json`` with
``supports_vision: false`` rather than relying on a runtime config.
"""
if not model:
return False
return not supports_image_tool_results(model)
for model in candidates:
try:
response = await litellm.acompletion(
model=model,
messages=[{"role": "user", "content": blocks}],
max_tokens=512,
)
description = (response.choices[0].message.content or "").strip()
if description:
count = len(image_content)
label = "image" if count == 1 else f"{count} images"
return f"[{label} attached — description: {description}]"
except Exception as exc:
logger.debug("Vision fallback model '%s' failed: %s", model, exc)
continue
return None
async def _captioning_chain(
intent: str,
image_content: list[dict[str, Any]],
) -> tuple[str, str] | None:
"""Configured vision_fallback → retry → ``gemini/gemini-3-flash-preview``.
The Gemini override reuses the configured ``api_key`` / ``api_base``,
so a Hive subscriber (whose token routes to a multi-model proxy)
keeps coverage when their primary model glitches. Without
configured creds litellm falls through to env-based Gemini auth;
users with neither Hive nor a ``GEMINI_API_KEY`` simply lose the
third try.
"""
if result := await caption_tool_image(intent, image_content):
return result
logger.warning("vision_fallback failed; retrying configured model")
if result := await caption_tool_image(intent, image_content):
return result
# Match the configured model's proxy prefix so the override is routed
# through the same endpoint with the same auth shape. Without this,
# a Hive subscriber's `hive/...` config would override to
# `gemini/...` — which sends Google's Gemini protocol to the
# Anthropic-compatible Hive proxy (404), not what we want.
configured = (get_vision_fallback_model() or "").lower()
if configured.startswith("hive/"):
override = "hive/gemini-3-flash-preview"
elif configured.startswith("kimi/"):
override = "kimi/gemini-3-flash-preview"
else:
override = "gemini/gemini-3-flash-preview"
logger.warning("vision_fallback retry failed; trying %s", override)
return await caption_tool_image(intent, image_content, model_override=override)
# Pattern for detecting context-window-exceeded errors across LLM providers.
@@ -376,6 +392,14 @@ class AgentLoop(AgentProtocol):
# dashboards can build aggregates over many runs.
self._counters: dict[str, int] = {}
# Task-system reminder state (see framework/tasks/reminders.py).
# Bumped each iteration; reset whenever a task op tool was called
# in the iteration that just completed; nudges the agent via the
# injection queue when it's been silent on tasks for too long.
from framework.tasks.reminders import ReminderState as _RS
self._task_reminder_state: _RS = _RS()
def _bump(self, key: str, by: int = 1) -> None:
"""Increment a reliability counter (creates the key on first use)."""
self._counters[key] = self._counters.get(key, 0) + by
@@ -626,8 +650,23 @@ class AgentLoop(AgentProtocol):
# Hide image-producing tools from text-only models so they never try
# to call them. Avoids wasted turns + "screenshot failed" lessons
# getting saved to memory. See framework.llm.capabilities.
# EXCEPTION: when the model IS on the text-only deny list AND
# a vision_fallback subagent is configured, leave image tools
# visible. The post-execution hook in the inner tool loop
# will route each image_content through the fallback VLM and
# replace it with a text caption before the main agent sees
# the result — so the main agent gets captions instead of
# raw images, rather than losing the tool entirely. We DON'T
# bypass the filter for vision-capable models (that would be
# a no-op anyway — the filter doesn't fire for them) and we
# DON'T bypass it without a configured fallback (the agent
# would just see raw stripped tool results with no caption).
_llm_model = ctx.llm.model if ctx.llm else ""
tools, _hidden_image_tools = filter_tools_for_model(tools, _llm_model)
_text_only_main = _llm_model and not supports_image_tool_results(_llm_model)
if _text_only_main and get_vision_fallback_model() is not None:
_hidden_image_tools: list[str] = []
else:
tools, _hidden_image_tools = filter_tools_for_model(tools, _llm_model)
logger.info(
"[%s] Tools available (%d): %s | direct_user_io=%s | judge=%s | hidden_image_tools=%s",
@@ -931,6 +970,17 @@ class AgentLoop(AgentProtocol):
)
total_input_tokens += turn_tokens.get("input", 0)
total_output_tokens += turn_tokens.get("output", 0)
# Task-system reminder: if the model has been silent on
# task ops for too long but still has open tasks, drop
# a steering reminder onto the injection queue. Drained
# at the next iteration's 6b so it lands as the next
# user turn via the normal injection path. Best-effort
# — never raises.
try:
await self._maybe_inject_task_reminder(ctx, logged_tool_calls)
except Exception:
logger.debug("task reminder check failed", exc_info=True)
await self._publish_llm_turn_complete(
stream_id,
node_id,
@@ -955,6 +1005,7 @@ class AgentLoop(AgentProtocol):
tool_calls=logged_tool_calls,
tool_results=real_tool_results,
token_counts=turn_tokens,
tools=tools,
)
# DS-13: inject context preservation warning once when token usage
@@ -3368,6 +3419,30 @@ class AgentLoop(AgentProtocol):
# Phase 3: record results into conversation in original order,
# build logged/real lists, and publish completed events.
#
# Vision-fallback prefetch: a single turn may fire several
# image-producing tools in parallel (e.g. one screenshot
# per tab). Captioning each one takes a vision LLM round
# trip (130 s). Doing them sequentially in this loop
# would serialise that latency per image. Instead, kick
# off all caption tasks concurrently NOW, and await each
# one just-in-time inside the per-tc body. If only a
# single image needs captioning, this collapses to a
# single await with no overhead.
_model_text_only = ctx.llm and _vision_fallback_active(ctx.llm.model)
caption_tasks: dict[str, asyncio.Task[tuple[str, str] | None]] = {}
if _model_text_only:
for tc in tool_calls[:executed_in_batch]:
res = results_by_id.get(tc.tool_use_id)
if not res or not res.image_content:
continue
intent = extract_intent_for_tool(
conversation,
tc.tool_name,
tc.tool_input or {},
)
caption_tasks[tc.tool_use_id] = asyncio.create_task(_captioning_chain(intent, res.image_content))
for tc in tool_calls[:executed_in_batch]:
result = results_by_id.get(tc.tool_use_id)
if result is None:
@@ -3390,11 +3465,33 @@ class AgentLoop(AgentProtocol):
logged_tool_calls.append(tool_entry)
image_content = result.image_content
if image_content and ctx.llm and not supports_image_tool_results(ctx.llm.model):
logger.info(
"Stripping image_content from tool result; model '%s' does not support images in tool results",
ctx.llm.model,
)
# Vision-fallback marker spliced into the persisted text
# below. None when no captioning ran (vision-capable
# main model, no images, or no fallback chain reached
# this tool).
vision_fallback_marker: str | None = None
if image_content and tc.tool_use_id in caption_tasks:
caption_result = await caption_tasks.pop(tc.tool_use_id)
if caption_result:
caption, vision_model = caption_result
vision_fallback_marker = f"[vision-fallback caption]\n{caption}"
logger.info(
"vision_fallback: captioned %d image(s) for tool '%s' "
"(main model '%s' routed through fallback model '%s')",
len(image_content),
tc.tool_name,
ctx.llm.model if ctx.llm else "?",
vision_model,
)
else:
vision_fallback_marker = "[image stripped — vision fallback exhausted]"
logger.info(
"vision_fallback: exhausted; stripping %d image(s) from "
"tool '%s' result without caption (model '%s')",
len(image_content),
tc.tool_name,
ctx.llm.model if ctx.llm else "?",
)
image_content = None
# Apply replay-detector steer prefix if this call matched a
@@ -3406,6 +3503,11 @@ class AgentLoop(AgentProtocol):
if _prefix:
stored_content = f"{_prefix}{stored_content or ''}"
# Splice the vision-fallback caption / placeholder into
# the persisted text after any prefix has been applied.
if vision_fallback_marker:
stored_content = f"{stored_content or ''}\n\n{vision_fallback_marker}"
await conversation.add_tool_result(
tool_use_id=tc.tool_use_id,
content=stored_content,
@@ -4032,7 +4134,7 @@ class AgentLoop(AgentProtocol):
queue=self._injection_queue,
conversation=conversation,
ctx=ctx,
describe_images_as_text_fn=_describe_images_as_text,
caption_image_fn=_captioning_chain,
)
async def _drain_trigger_queue(self, conversation: NodeConversation) -> int:
@@ -4096,6 +4198,74 @@ class AgentLoop(AgentProtocol):
execution_id=execution_id,
)
async def _maybe_inject_task_reminder(
self,
ctx: AgentContext,
logged_tool_calls: list[dict[str, Any]] | None,
) -> None:
"""Layer 3 task-system steering — periodic reminder injection.
Called once per iteration after the LLM turn completes. If the
model has been silent on task ops for a while AND there are open
tasks on its session list, queue a system-style reminder onto
the injection queue so the next iteration drains it as a user
turn. Idempotent and safe to call unconditionally; it gates internally.
``logged_tool_calls`` is a list of dicts with at least a "name"
key, as accumulated by ``_run_single_turn``. Names like
``task_create``, ``task_update``, ``colony_template_*`` reset
the counter (see ``framework.tasks.reminders.TASK_OP_TOOL_NAMES``).
"""
from framework.tasks import get_task_store
from framework.tasks.models import TaskStatus
from framework.tasks.reminders import build_reminder, saw_task_op
state = self._task_reminder_state
# 1. Update counters based on this turn's tool calls.
names: list[str] = []
for call in logged_tool_calls or []:
try:
name = call.get("name") or call.get("tool_name")
if name:
names.append(name)
except (AttributeError, TypeError):
continue
if saw_task_op(names):
state.on_task_op()
state.on_iteration()
# 2. Resolve the agent's task list. Skip if context isn't wired yet.
list_id = getattr(ctx, "task_list_id", None)
if not list_id:
return
# 3. Read the open-task snapshot. Best-effort.
try:
store = get_task_store()
records = await store.list_tasks(list_id)
except Exception:
return
open_tasks = [r for r in records if r.status != TaskStatus.COMPLETED]
if not state.should_remind(bool(open_tasks)):
return
body = build_reminder(records)
if not body:
return
# 4. Enqueue. Drained at the next iteration's 6b drain step and
# rendered as a user turn (with the "[External event]" prefix).
await self._injection_queue.put((body, False, None))
state.on_reminder_sent()
logger.info(
"[task-reminder] queued nudge for %s (open=%d, silent_turns=%d)",
list_id,
len(open_tasks),
state.turns_since_task_op,
)
self._bump("task_reminders_sent")
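The counters this method leans on (``on_task_op`` / ``on_iteration`` / ``should_remind``) live in ``framework.tasks.reminders`` and are outside this diff; a minimal sketch of the gating they imply, with hypothetical threshold names, could look like:

```python
# Sketch only: the real TaskReminderState lives in
# framework.tasks.reminders. Field and threshold names here are
# assumptions for illustration, not the framework's actual API.
class ReminderStateSketch:
    def __init__(self, silence_threshold: int = 5, cooldown: int = 10):
        self.turns_since_task_op = 0        # reset by on_task_op()
        self.turns_since_reminder = 10**9   # large so the first nudge isn't blocked
        self.silence_threshold = silence_threshold
        self.cooldown = cooldown

    def on_task_op(self) -> None:
        self.turns_since_task_op = 0

    def on_iteration(self) -> None:
        self.turns_since_task_op += 1
        self.turns_since_reminder += 1

    def on_reminder_sent(self) -> None:
        self.turns_since_reminder = 0

    def should_remind(self, has_open_tasks: bool) -> bool:
        # Nudge only when there is something to do, the model has been
        # silent on task ops long enough, and we aren't spamming.
        return (
            has_open_tasks
            and self.turns_since_task_op >= self.silence_threshold
            and self.turns_since_reminder >= self.cooldown
        )
```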
async def _run_hooks(
self,
event: str,
@@ -16,7 +16,6 @@ import os
import re
import time
from datetime import UTC, datetime
from pathlib import Path
from typing import Any
from framework.agent_loop.conversation import Message, NodeConversation
@@ -31,19 +30,38 @@ logger = logging.getLogger(__name__)
LLM_COMPACT_CHAR_LIMIT: int = 240_000
LLM_COMPACT_MAX_DEPTH: int = 10
# Microcompaction: tools whose results can be safely cleared
# Microcompaction: tools whose results can be safely cleared from context
# because the agent can re-derive them on demand. The bar for inclusion is
# "old result has no irreversible value": file content can be re-read, a
# search can be re-run, a screenshot can be re-captured, terminal output can
# be re-fetched, etc. Write / edit results are short confirmations whose
# value is in the side effect, not the message — also fair game.
COMPACTABLE_TOOLS: frozenset[str] = frozenset(
{
# File ops — content lives on disk, re-readable.
"read_file",
"run_command",
"web_search",
"web_fetch",
"grep_search",
"glob_search",
"search_files",
"write_file",
"edit_file",
"pdf_read",
# Terminal — re-runnable; advanced job/output tools produce verbose
# logs whose recent state is what matters.
"terminal_exec",
"terminal_rg",
"terminal_find",
"terminal_output_get",
"terminal_job_logs",
# Web / research — pages and queries can be re-fetched.
"web_scrape",
"search_papers",
"download_paper",
"search_wikipedia",
# Browser read-only inspection — current page state is what matters,
# old snapshots are stale by definition.
"browser_screenshot",
"list_directory",
"browser_snapshot",
"browser_html",
"browser_get_text",
}
)
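A sketch of how a compaction pass might consume an allowlist like this. The message shape and helper name are assumptions for illustration (a trimmed allowlist stands in for the full set above), not the framework's actual API:

```python
from typing import Any

# Trimmed stand-in for the COMPACTABLE_TOOLS allowlist above.
COMPACTABLE_TOOLS = frozenset({"read_file", "web_fetch", "terminal_exec"})

def compact_tool_results(
    messages: list[dict[str, Any]],
    keep_last: int = 4,
    placeholder: str = "[result compacted; re-run the tool if needed]",
) -> int:
    """Replace old compactable tool results with a short placeholder.

    Only results whose tool name is on the allowlist are touched, and
    the most recent ``keep_last`` messages are always preserved so the
    agent keeps its working context. Returns the number cleared.
    """
    cleared = 0
    for msg in messages[:-keep_last] if keep_last else messages:
        if (
            msg.get("role") == "tool"
            and msg.get("tool_name") in COMPACTABLE_TOOLS
            and msg.get("content") != placeholder
        ):
            msg["content"] = placeholder
            cleared += 1
    return cleared
```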
@@ -657,8 +675,10 @@ def write_compaction_debug_log(
level: str,
inventory: list[dict[str, Any]] | None,
) -> None:
"""Write detailed compaction analysis to ~/.hive/compaction_log/."""
log_dir = Path.home() / ".hive" / "compaction_log"
"""Write detailed compaction analysis to $HIVE_HOME/compaction_log/."""
from framework.config import HIVE_HOME
log_dir = HIVE_HOME / "compaction_log"
log_dir.mkdir(parents=True, exist_ok=True)
ts = datetime.now(UTC).strftime("%Y%m%dT%H%M%S_%f")
@@ -857,7 +877,7 @@ def build_emergency_summary(
if not all_files:
parts.append(
"NOTE: Large tool results may have been saved to files. "
"Use list_directory to check the data directory."
"Use search_files(target='files', path='.') to check the data directory."
)
except Exception:
parts.append("NOTE: Large tool results were saved to files. Use read_file(path='<path>') to read them.")
@@ -162,9 +162,18 @@ async def drain_injection_queue(
conversation: NodeConversation,
*,
ctx: NodeContext,
describe_images_as_text_fn: (Callable[[list[dict[str, Any]]], Awaitable[str | None]] | None) = None,
caption_image_fn: (Callable[[str, list[dict[str, Any]]], Awaitable[tuple[str, str] | None]] | None) = None,
) -> int:
"""Drain all pending injected events as user messages. Returns count."""
"""Drain all pending injected events as user messages. Returns count.
``caption_image_fn`` is the unified vision fallback hook. It takes
``(intent, image_content)`` and returns ``(caption, model)`` on
success; the model id is logged so the destination is observable.
The user's typed ``content`` (the injected message body) is passed
as the intent so the captioner can answer the user's specific
question about the image rather than producing a generic
description; an empty content falls back to a generic intent.
"""
count = 0
logger.debug(
"[drain_injection_queue] Starting to drain queue, initial queue size: %s",
@@ -184,11 +193,16 @@ async def drain_injection_queue(
"Model '%s' does not support images; attempting vision fallback",
ctx.llm.model,
)
if describe_images_as_text_fn is not None:
description = await describe_images_as_text_fn(image_content)
if description:
if caption_image_fn is not None:
intent = content or ("Describe these user-injected images for a text-only agent.")
caption_result = await caption_image_fn(intent, image_content)
if caption_result:
description, vision_model = caption_result
content = f"{content}\n\n{description}" if content else description
logger.info("[drain] image described as text via vision fallback")
logger.info(
"[drain] image described as text via vision fallback (model '%s')",
vision_model,
)
else:
logger.info("[drain] no vision fallback available; images dropped")
image_content = None
@@ -0,0 +1,306 @@
"""Vision-fallback subagent for tool-result images on text-only LLMs.
When a tool returns image content but the main agent's model can't
accept image blocks (i.e. its catalog entry has ``supports_vision: false``),
the framework strips the images before they ever reach the LLM. Without
this module, the agent then sees only the tool's text envelope (URL,
dimensions, size) and is blind to whatever the image actually shows.
This module provides:
* ``caption_tool_image()``: a direct LiteLLM call to a configured
vision model (``vision_fallback`` block in ``~/.hive/configuration.json``)
that takes the agent's intent + the image(s) and returns a textual
description tailored to that intent.
* ``extract_intent_for_tool()``: pulls the most recent assistant text
+ the tool call descriptor and concatenates them into a 4KB intent
string the vision subagent can reason against.
Both helpers degrade silently, returning ``None`` / a placeholder rather
than raise, so a vision-fallback failure can never kill the main
agent's run. The agent-loop call site retries the configured model
once on a None return, then falls back to
``gemini/gemini-3-flash-preview`` via the ``model_override`` parameter
of :func:`caption_tool_image`.
"""
from __future__ import annotations
import json
import logging
from datetime import datetime
from typing import TYPE_CHECKING, Any
from framework.config import (
get_vision_fallback_api_base,
get_vision_fallback_api_key,
get_vision_fallback_model,
)
if TYPE_CHECKING:
from ..conversation import NodeConversation
logger = logging.getLogger(__name__)
# Hard cap on the intent string handed to the vision subagent. The
# subagent only needs the agent's recent reasoning + the tool descriptor;
# anything longer is wasted tokens (and risks pushing past the vision
# model's context with the image attached).
_INTENT_MAX_CHARS = 4096
# Cap on the tool args JSON snippet inside the intent. Some tool inputs
# (large strings, file contents) would dominate the intent if uncapped.
_TOOL_ARGS_MAX_CHARS = 4096
# Subagent system prompt — kept short so it fits within any provider's
# system-prompt budget alongside the user message + image. Tells the
# subagent its role and constrains output format.
#
# Coordinate labeling: the main agent's browser tools
# (browser_click_coordinate / browser_hover_coordinate / browser_press_at)
# accept VIEWPORT FRACTIONS (x, y) in [0..1] where (0,0) is the top-left
# and (1,1) is the bottom-right of the screenshot. Without coordinates
# the text-only agent has no way to act on what we describe — it can
# read the caption but cannot point. So for every interactive element
# we name (button, link, input, icon, tab, menu item, dialog control),
# include its approximate viewport-fraction centre as ``(fx, fy)``
# right after the element's name, e.g. ``"Submit" button (0.83, 0.92)``.
# Three rules: (1) coordinates only for things plausibly clickable /
# hoverable / typeable — don't tag pure body text or decorative
# graphics. (2) Eyeball to two decimal places; precision beyond that
# is false confidence. (3) Never invent — if an element is partly
# off-screen or you can't locate it, omit the coordinate rather than
# guessing.
_VISION_SUBAGENT_SYSTEM = (
"You are a vision subagent for a text-only main agent. The main "
"agent invoked a tool that returned the image(s) attached. Their "
"intent (their reasoning + the tool call) is below. Describe what "
"the image shows in service of their intent — concrete, factual, "
"no speculation. If their intent asks a yes/no question, answer it "
"directly first.\n\n"
"Coordinate labeling: the main agent uses fractional viewport "
"coordinates (x, y) in [0..1] — (0, 0) is the top-left of the "
"image, (1, 1) is the bottom-right — to drive its click / hover / "
"key-press tools. For every interactive element you mention "
"(button, link, input, checkbox, radio, dropdown, tab, menu item, "
"dialog control, icon), append its approximate centre as "
"``(fx, fy)`` immediately after the element's name or label, e.g. "
'``"Submit" button (0.83, 0.92)`` or ``profile avatar icon '
"(0.05, 0.07)``. Use two decimal places — more is false precision. "
"Skip coordinates for pure body text and decorative elements that "
"aren't clickable. If an element is partially off-screen or you "
"cannot reliably locate its centre, omit the coordinate rather "
"than guessing.\n\n"
"Output plain text, no markdown, ≤ 600 words."
)
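The fractional-coordinate convention the prompt enforces maps straightforwardly onto pixel space. A hedged sketch of how a click tool might resolve a captioned ``(fx, fy)`` (the helper name and clamping behaviour are assumptions, not the gcu-tools implementation):

```python
def fraction_to_pixels(fx: float, fy: float, width: int, height: int) -> tuple[int, int]:
    """Map viewport fractions in [0..1] to integer pixel coordinates.

    (0, 0) is the top-left and (1, 1) the bottom-right, matching the
    convention the subagent prompt asks for. Inputs are clamped so a
    slightly-off caption like (1.02, 0.5) still lands inside the frame.
    """
    fx = min(max(fx, 0.0), 1.0)
    fy = min(max(fy, 0.0), 1.0)
    return round(fx * (width - 1)), round(fy * (height - 1))
```

So a caption like ``"Submit" button (0.83, 0.92)`` on a 1000x1000 screenshot resolves to roughly pixel (829, 919).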
def extract_intent_for_tool(
conversation: NodeConversation,
tool_name: str,
tool_args: dict[str, Any] | None,
) -> str:
"""Build the intent string passed to the vision subagent.
Combines the most recent assistant text (the LLM's reasoning right
before invoking the tool) with a structured tool-call descriptor.
Truncates to ``_INTENT_MAX_CHARS`` total, favouring the head of the
assistant text where goal-stating sentences usually live.
If no preceding assistant text exists (rare first turn), falls
back to ``"<no preceding reasoning>"`` so the subagent still gets
the tool descriptor.
"""
args_json: str
try:
args_json = json.dumps(tool_args or {}, default=str)
except Exception:
args_json = repr(tool_args)
if len(args_json) > _TOOL_ARGS_MAX_CHARS:
args_json = args_json[:_TOOL_ARGS_MAX_CHARS] + "…"
tool_line = f"Called: {tool_name}({args_json})"
# Walk newest → oldest, take the first assistant message with text.
assistant_text = ""
try:
messages = getattr(conversation, "_messages", []) or []
for msg in reversed(messages):
if getattr(msg, "role", None) != "assistant":
continue
content = getattr(msg, "content", "") or ""
if isinstance(content, str) and content.strip():
assistant_text = content.strip()
break
except Exception:
# Defensive — the agent loop must keep running even if the
# conversation structure changes shape.
assistant_text = ""
if not assistant_text:
assistant_text = "<no preceding reasoning>"
# Intent = tool descriptor (always intact) + reasoning (truncated).
head = f"{tool_line}\n\nReasoning before call:\n"
budget = _INTENT_MAX_CHARS - len(head)
if budget < 100:
# Tool descriptor is huge somehow — truncate it.
return head[:_INTENT_MAX_CHARS]
if len(assistant_text) > budget:
assistant_text = assistant_text[: budget - 1] + "…"
return head + assistant_text
async def caption_tool_image(
intent: str,
image_content: list[dict[str, Any]],
*,
timeout_s: float = 30.0,
model_override: str | None = None,
) -> tuple[str, str] | None:
"""Caption the given images using the configured ``vision_fallback`` model.
Returns ``(caption, model)`` on success or ``None`` on any failure
(no config, no API key, timeout, exception, empty response).
``model_override`` swaps in a different litellm model id while
keeping the configured ``vision_fallback`` ``api_key`` / ``api_base``
untouched. That's deliberate: Hive subscribers configure
``vision_fallback`` to point at the Hive proxy, which routes to
multiple models including Gemini, so reusing the credentials lets
a Gemini-3-flash override still work without a separate
``GEMINI_API_KEY``. When no creds are configured, litellm falls
back to env-var resolution.
Logs each call to ``~/.hive/llm_logs`` via ``log_llm_turn``.
"""
model = model_override or get_vision_fallback_model()
if not model:
return None
api_key = get_vision_fallback_api_key()
api_base = get_vision_fallback_api_base()
if not api_key and not model_override:
logger.debug("vision_fallback configured but no API key resolved; skipping")
return None
try:
import litellm
except ImportError:
return None
user_blocks: list[dict[str, Any]] = [{"type": "text", "text": intent}]
user_blocks.extend(image_content)
messages = [
{"role": "system", "content": _VISION_SUBAGENT_SYSTEM},
{"role": "user", "content": user_blocks},
]
# Apply the same proxy rewrites the main LLM provider uses so a
# `hive/...` / `kimi/...` model resolves to the right Anthropic-
# compatible endpoint with the right auth header. Without this,
# litellm doesn't know what `hive/kimi-k2.5` is and rejects the call
# with "LLM Provider NOT provided."
from framework.llm.litellm import rewrite_proxy_model
rewritten_model, rewritten_base, extra_headers = rewrite_proxy_model(model, api_key, api_base)
kwargs: dict[str, Any] = {
"model": rewritten_model,
"messages": messages,
"max_tokens": 8192,
"timeout": timeout_s,
}
# Always pass api_key when we have one, even alongside proxy-rewritten
# extra_headers. litellm's anthropic handler refuses to dispatch
# without an api_key (it sends it as x-api-key); the proxy itself
# authenticates via the Authorization: Bearer header in
# extra_headers. Both are needed — matches LiteLLMProvider's path.
if api_key:
kwargs["api_key"] = api_key
if rewritten_base:
kwargs["api_base"] = rewritten_base
if extra_headers:
kwargs["extra_headers"] = extra_headers
# Surface where the request is going so the user can verify the
# vision fallback is hitting the expected proxy / model. Redacts
# the API key to a length+head+tail digest so it can be cross-
# correlated with other auth-related log lines.
key_digest = (
f"len={len(api_key)} {api_key[:8]}{api_key[-4:]}"
if api_key and len(api_key) >= 12
else f"len={len(api_key) if api_key else 0}"
)
logger.info(
"[vision_fallback] dispatching: configured_model=%s rewritten_model=%s "
"api_base=%s api_key=%s images=%d intent_chars=%d timeout_s=%.1f",
model,
rewritten_model,
rewritten_base or "<litellm-default>",
key_digest,
len(image_content),
len(intent),
timeout_s,
)
started = datetime.now()
caption: str | None = None
error_text: str | None = None
try:
response = await litellm.acompletion(**kwargs)
text = (response.choices[0].message.content or "").strip()
if text:
caption = text
logger.info(
"[vision_fallback] response: model=%s api_base=%s elapsed_s=%.2f chars=%d",
rewritten_model,
rewritten_base or "<litellm-default>",
(datetime.now() - started).total_seconds(),
len(text),
)
except Exception as exc:
error_text = f"{type(exc).__name__}: {exc}"
logger.warning(
"[vision_fallback] failed: model=%s api_base=%s error=%s",
rewritten_model,
rewritten_base or "<litellm-default>",
error_text,
)
# Best-effort audit log so users can grep ~/.hive/llm_logs/ for
# vision-fallback subagent calls. Failures here must not bubble.
try:
from framework.tracker.llm_debug_logger import log_llm_turn
# Don't dump the base64 image data into the log file — that
# would balloon the jsonl with mostly-binary noise.
elided_blocks: list[dict[str, Any]] = [{"type": "text", "text": intent}]
elided_blocks.extend({"type": "image_url", "image_url": {"url": "<elided>"}} for _ in range(len(image_content)))
log_llm_turn(
node_id="vision_fallback_subagent",
stream_id="vision_fallback",
execution_id="vision_fallback_subagent",
iteration=0,
system_prompt=_VISION_SUBAGENT_SYSTEM,
messages=[{"role": "user", "content": elided_blocks}],
assistant_text=caption or "",
tool_calls=[],
tool_results=[],
token_counts={
"model": model,
"elapsed_s": (datetime.now() - started).total_seconds(),
"error": error_text,
"num_images": len(image_content),
"intent_chars": len(intent),
},
)
except Exception:
pass
if caption is None:
return None
return caption, model
__all__ = ["caption_tool_image", "extract_intent_for_tool"]
@@ -180,6 +180,19 @@ class AgentContext:
stream_id: str = ""
# ----- Task system fields (see framework/tasks) -------------------
# task_list_id: this agent's own session-scoped list, e.g.
# session:{agent_id}:{session_id}. Set by the runner / ColonyRuntime
# before the loop starts; immutable after first task_create.
task_list_id: str | None = None
# colony_id: set on the queen of a colony AND on every spawned worker
# so workers can render the "picked up" chip and the queen can address
# her colony template via colony_template_* tools.
colony_id: str | None = None
# picked_up_from: for workers, the (colony_task_list_id, template_task_id)
# pair their session was spawned for. None for the queen and queen-DM.
picked_up_from: tuple[str, int] | None = None
dynamic_tools_provider: Any = None
dynamic_prompt_provider: Any = None
# Optional Callable[[], str]: when set alongside ``dynamic_prompt_provider``,
@@ -560,7 +560,9 @@ class CredentialTesterAgent:
if self._selected_account is None:
raise RuntimeError("No account selected. Call select_account() first.")
self._storage_path = Path.home() / ".hive" / "agents" / "credential_tester"
from framework.config import HIVE_HOME
self._storage_path = HIVE_HOME / "agents" / "credential_tester"
self._storage_path.mkdir(parents=True, exist_ok=True)
self._tool_registry = ToolRegistry()
@@ -66,7 +66,9 @@ def _get_last_active(agent_path: Path) -> str | None:
latest: str | None = None
# 1. Worker sessions
sessions_dir = Path.home() / ".hive" / "agents" / agent_name / "sessions"
from framework.config import HIVE_HOME
sessions_dir = HIVE_HOME / "agents" / agent_name / "sessions"
if sessions_dir.exists():
for session_dir in sessions_dir.iterdir():
if not session_dir.is_dir() or not session_dir.name.startswith("session_"):
@@ -115,7 +117,9 @@ def _get_last_active(agent_path: Path) -> str | None:
def _count_sessions(agent_name: str) -> int:
"""Count session directories under ~/.hive/agents/{agent_name}/sessions/."""
sessions_dir = Path.home() / ".hive" / "agents" / agent_name / "sessions"
from framework.config import HIVE_HOME
sessions_dir = HIVE_HOME / "agents" / agent_name / "sessions"
if not sessions_dir.exists():
return 0
return sum(1 for d in sessions_dir.iterdir() if d.is_dir() and d.name.startswith("session_"))
@@ -123,7 +127,9 @@ def _count_sessions(agent_name: str) -> int:
def _count_runs(agent_name: str) -> int:
"""Count unique run_ids across all sessions for an agent."""
sessions_dir = Path.home() / ".hive" / "agents" / agent_name / "sessions"
from framework.config import HIVE_HOME
sessions_dir = HIVE_HOME / "agents" / agent_name / "sessions"
if not sessions_dir.exists():
return 0
run_ids: set[str] = set()
@@ -146,7 +152,7 @@ def _count_runs(agent_name: str) -> int:
return len(run_ids)
_EXCLUDED_JSON_STEMS = {"agent", "flowchart", "triggers", "configuration", "metadata"}
_EXCLUDED_JSON_STEMS = {"agent", "flowchart", "triggers", "configuration", "metadata", "tasks"}
def _is_colony_dir(path: Path) -> bool:
@@ -2,12 +2,13 @@
import json
from dataclasses import dataclass, field
from pathlib import Path
def _load_preferred_model() -> str:
"""Load preferred model from ~/.hive/configuration.json."""
config_path = Path.home() / ".hive" / "configuration.json"
"""Load preferred model from $HIVE_HOME/configuration.json."""
from framework.config import HIVE_HOME
config_path = HIVE_HOME / "configuration.json"
if config_path.exists():
try:
with open(config_path, encoding="utf-8") as f:
@@ -1,9 +1,8 @@
"""One-shot LLM gate that decides if a queen DM is ready to fork a colony.
The queen's ``start_incubating_colony`` tool calls :func:`evaluate` with
the queen's recent conversation, a proposed ``colony_name``, and a
one-paragraph ``intended_purpose``. The evaluator returns a structured
verdict:
the queen's recent conversation and a proposed ``colony_name``. The
evaluator returns a structured verdict:
{
"ready": bool,
@@ -38,8 +37,8 @@ You gate whether a queen agent should commit to forking a persistent
expensive: it ends the user's chat with this queen and the worker runs
unattended afterward, so the spec must be settled before you approve.
Read the conversation excerpt and the queen's proposed colony_name +
intended_purpose, then decide.
Read the conversation excerpt and the queen's proposed colony_name,
then decide.
APPROVE (ready=true) only when ALL of the following hold:
1. The user has explicitly asked for work that needs to outlive this
@@ -128,11 +127,9 @@ def format_conversation_excerpt(messages: list[Message]) -> str:
def _build_user_message(
conversation_excerpt: str,
colony_name: str,
intended_purpose: str,
) -> str:
return (
f"## Proposed colony name\n{colony_name}\n\n"
f"## Queen's intended_purpose\n{intended_purpose.strip()}\n\n"
f"## Recent conversation (oldest → newest)\n{conversation_excerpt}\n\n"
"Decide: should this queen be approved to enter INCUBATING phase?"
)
@@ -189,7 +186,6 @@ async def evaluate(
llm: Any,
messages: list[Message],
colony_name: str,
intended_purpose: str,
) -> dict[str, Any]:
"""Run the incubating evaluator against the queen's conversation.
@@ -200,14 +196,13 @@ async def evaluate(
messages: The queen's conversation messages, oldest first. The
evaluator slices its own tail; pass the full list.
colony_name: Validated colony slug.
intended_purpose: Queen's one-paragraph brief.
Returns:
``{"ready": bool, "reasons": [str], "missing_prerequisites": [str]}``.
Fail-closed on any error.
"""
excerpt = format_conversation_excerpt(messages)
user_msg = _build_user_message(excerpt, colony_name, intended_purpose)
user_msg = _build_user_message(excerpt, colony_name)
try:
response = await llm.acomplete(
@@ -1,3 +1,3 @@
{
"include": ["gcu-tools", "hive_tools"]
"include": ["gcu-tools", "hive_tools", "terminal-tools", "chart-tools"]
}
@@ -1,10 +1,10 @@
{
"coder-tools": {
"files-tools": {
"transport": "stdio",
"command": "uv",
"args": ["run", "python", "coder_tools_server.py", "--stdio"],
"args": ["run", "python", "files_server.py", "--stdio"],
"cwd": "../../../../tools",
"description": "Unsandboxed file system tools for code generation and validation"
"description": "File system tools (read/write/edit/search) for code generation"
},
"gcu-tools": {
"transport": "stdio",
@@ -32,7 +32,7 @@ def finalize_queen_prompt(text: str, has_vision: bool) -> str:
# ---------------------------------------------------------------------------
# Independent phase: queen operates as a standalone agent — no worker.
# Core tools are listed here; MCP tools (coder-tools, gcu-tools) are added
# Core tools are listed here; MCP tools (files-tools, gcu-tools) are added
# dynamically in queen_orchestrator.py because their tool names aren't known
# at import time.
_QUEEN_INDEPENDENT_TOOLS = [
@@ -40,11 +40,7 @@ _QUEEN_INDEPENDENT_TOOLS = [
"read_file",
"write_file",
"edit_file",
"hashline_edit",
"list_directory",
"search_files",
"run_command",
"undo_changes",
# NOTE (2026-04-16): ``run_parallel_workers`` is not in the DM phase.
# Pure DM is for conversation with the user; fan out parallel work via
# ``start_incubating_colony`` (which gates the colony fork behind a
@@ -60,9 +56,7 @@ _QUEEN_INDEPENDENT_TOOLS = [
# (e.g. inspect an existing skill) before committing.
_QUEEN_INCUBATING_TOOLS = [
"read_file",
"list_directory",
"search_files",
"run_command",
# Schedule lives on the colony, not on the queen session — pass it
# inline as create_colony(triggers=[...]) instead of staging through
# set_trigger here.
@@ -76,9 +70,7 @@ _QUEEN_INCUBATING_TOOLS = [
_QUEEN_WORKING_TOOLS = [
# Read-only
"read_file",
"list_directory",
"search_files",
"run_command",
# Monitoring + worker dialogue
"get_worker_status",
"inject_message",
@@ -95,9 +87,7 @@ _QUEEN_WORKING_TOOLS = [
_QUEEN_REVIEWING_TOOLS = [
# Read-only
"read_file",
"list_directory",
"search_files",
"run_command",
# Status + escalation replies
"get_worker_status",
"list_worker_questions",
@@ -132,11 +122,10 @@ phase. Your identity tells you WHO you are.
# ---------------------------------------------------------------------------
_queen_role_independent = """\
You are in INDEPENDENT mode. No worker layout; you do the work yourself. \
You have full coding tools (read/write/edit/search/run) and MCP tools \
(file operations via coder-tools, browser automation via gcu-tools). \
Execute the user's task directly using conversation and tools. \
You are the agent. \
You are in INDEPENDENT mode. \
You have full coding tools (read/write/edit/search) and MCP tools \
(file operations via files-tools, browser automation via gcu-tools). \
Execute the user's task directly using planning, conversation and tools.
If you need a structured choice or approval gate, always use \
``ask_user``; otherwise ask in plain prose. ``ask_user`` takes a \
``questions`` array; pass a single entry for one question, or batch \
@@ -145,13 +134,12 @@ several entries when you have multiple clarifications. \
When the user clearly wants persistent / recurring / headless work that \
needs to outlive THIS chat (e.g. "every morning", "monitor X and alert \
me", "set up a job that"), call ``start_incubating_colony`` with a \
proposed colony_name and a one-paragraph intended_purpose. A side \
evaluator reads the conversation and decides if the spec is settled. If \
it returns ``not_ready``, you keep talking with the user to sort out \
whatever the evaluator said is missing, then retry. If it returns \
``incubating``, your phase flips and a new prompt takes over. Do not \
try to write SKILL.md, fork directories, or otherwise build the colony \
yourself in this phase.\
proposed colony_name. A side evaluator reads the conversation and \
decides if the spec is settled. If it returns ``not_ready``, you keep \
talking with the user to sort out whatever the evaluator said is \
missing, then retry. If it returns ``incubating``, your phase flips and \
a new prompt takes over. Do not try to write SKILL.md, fork \
directories, or otherwise build the colony yourself in this phase.\
"""
_queen_role_incubating = """\
@@ -179,7 +167,7 @@ no harm, you go back to INDEPENDENT and can retry later.
If the user explicitly asks for something UNRELATED to the current \
colony being drafted (a side question, a one-shot task, a different \
problem), don't try to handle it from this limited tool surface. Call \
problem), call \
``cancel_incubation`` first to switch back to INDEPENDENT where you \
have the full toolkit, handle their request there, and re-enter \
INCUBATING later via ``start_incubating_colony`` when they want to \
@@ -224,6 +212,11 @@ user decide next steps. Read generated files or worker reports with \
read_file when the user asks for specifics. If the user wants \
another pass, kick it off with run_parallel_workers; otherwise stay \
conversational.
If the review itself is multi-step (e.g. "verify each worker's output, \
then draft a summary, then propose next steps"), lay it out upfront \
with `task_create_batch` and walk through with `task_update`. Skip the \
ceremony for a single-paragraph summary.
"""
@@ -232,30 +225,39 @@ conversational.
# ---------------------------------------------------------------------------
_queen_tools_independent = """
# Tools (INDEPENDENT mode)
# Tools
## File I/O (coder-tools MCP)
- read_file, write_file, edit_file, hashline_edit, list_directory, \
search_files, run_command, undo_changes
## Planning — use FIRST for multi-step work
- task_create_batch: When a request has 2+ atomic steps, your FIRST \
tool call is `task_create_batch` with one entry per step (atomic, \
one round-trip).
- task_create: One-off mid-run additions when you discover \
unplanned work AFTER the initial plan is laid out.
- task_update / task_list / task_get: Mark progress, inspect, or \
re-read state.
See "Independent execution" for the per-step flow and granularity rule.
## File I/O (files-tools MCP)
- read_file, write_file, edit_file, search_files
- edit_file covers single-file fuzzy find/replace (mode='replace', default) \
and multi-file structured patches (mode='patch'). Patch mode supports \
Update / Add / Delete / Move atomically across many files in one call.
- search_files covers grep/find/ls in one tool: target='content' to \
search inside files, target='files' (with a glob like '*.py') to list \
or find files.
## Browser Automation (gcu-tools MCP)
- Use `browser_*` tools (browser_start, browser_navigate, browser_click, \
browser_fill, browser_snapshot, <!-- vision-only -->browser_screenshot, <!-- /vision-only -->browser_scroll, \
browser_tabs, browser_close, browser_evaluate, etc.).
- Use `browser_*` tools; `browser_open(url)` is the cold-start entry point.
- You MUST follow the browser-automation skill protocol before using browser tools.
## Hand off to a colony
- start_incubating_colony(colony_name, intended_purpose): Use this when \
the user wants persistent / recurring / headless work that needs to \
outlive THIS chat. It does NOT fork on its own; it spawns a one-shot \
evaluator that reads this conversation and decides whether the spec \
is settled enough to proceed. On approval your phase flips to \
INCUBATING and a new tool surface (including create_colony itself) \
unlocks. On rejection you stay here and keep the conversation going \
to fill the gaps the evaluator named.
- ``intended_purpose`` is a one-paragraph brief: what the colony will \
do, on what cadence, why it must outlive this chat. Don't write a \
SKILL.md here; that comes in INCUBATING.
- start_incubating_colony(colony_name): Use this when the user wants \
persistent / recurring / headless work that needs to outlive THIS \
chat. It does NOT fork on its own; it spawns a one-shot evaluator \
that reads this conversation and decides whether the spec is settled \
enough to proceed. On approval your phase flips to INCUBATING and a \
new tool surface (including create_colony itself) unlocks.
"""
_queen_tools_incubating = """
@@ -265,10 +267,11 @@ You've been approved to fork. The full coding toolkit is gone on \
purpose: your job in this phase is to nail the spec, not keep doing \
work. Available:
## Read-only inspection (coder-tools MCP)
- read_file, list_directory, search_files, run_command for confirming \
details before you commit (e.g. peek at an existing skill in \
~/.hive/skills/, sanity-check an API URL).
## Read-only inspection (files-tools MCP)
- read_file, search_files for confirming details before \
you commit (e.g. peek at an existing skill in ~/.hive/skills/, sanity-check \
an API URL). search_files covers both grep (target='content') and ls/find \
(target='files', glob like '*.py').
## Approved → operational checklist (use your judgement, ask only what's missing)
The conversation that got you here probably did NOT cover all of:
@@ -362,7 +365,8 @@ operational, not editorial.
born from a fresh chat via start_incubating_colony.
## Read-only inspection
- read_file, list_directory, search_files, run_command
- read_file, search_files (search_files covers grep/find/ls \
via target='content' or target='files')
When every worker has reported (success or failure), the phase \
auto-moves to REVIEWING. You do not need to call a transition tool \
@@ -381,7 +385,7 @@ _queen_tools_reviewing = """
# Tools (REVIEWING mode)
Workers have finished. You have:
- Read-only: read_file, list_directory, search_files, run_command
- Read-only: read_file, search_files (search_files = grep+find+ls)
- get_worker_status(focus?): Pull the final status / per-worker reports
- list_worker_questions() / reply_to_worker(request_id, reply): Answer any \
late escalations still in the inbox
@@ -401,14 +405,42 @@ asks for specifics. Do not invent a new pass unless the user asks for one.
_queen_behavior_independent = """
## Independent execution
You are the agent. Do one real inline instance before any scaling: \
open the browser, call the real API, write to the real file. If the \
action is irreversible or touches shared systems, show and confirm \
before executing. Report concrete evidence (actual output, what \
worked / failed) after the run. Scale order once inline succeeds: \
repeat inline (10 items) -> `run_parallel_workers` (batch, results \
now) -> `create_colony` (recurring / background). Conceptual or \
strategic questions: answer directly, skip execution.
You are the agent. You behave this way:
1. Identify if the user's prompt is a task assignment. If it is, \
use ask_user to clarify the scope and detail requirements, then always use \
`task_create_batch` to create a multi-step action plan.
2. Mark the step in_progress with `task_update` before you start it.
3. Do one real inline instance: open the browser, call the real API, or \
write to the real file. If the action is irreversible or touches \
shared systems, show and confirm before executing. Report concrete \
evidence (actual output, what worked / failed) after the run.
4. `task_update` completed THE MOMENT it's done. **Do not let \
multiple finished tasks pile up unmarked.** There is no batch-update \
tool by design: each `completed` transition is a discrete progress \
heartbeat in the user's right-rail panel. Without those transitions \
the panel shows a hung spinner no matter how much real work you got \
done.
**Granularity: one task per atomic action, not one umbrella per project.** \
Once the current task is finished, discuss with the user whether to build \
a colony so this successful outcome can be repeated or scaled.
### How to handle large-scale tasks
If the user asks you to finish the same task repeatedly or at large scale \
(more than 3 times), tell the user that you can do it once first and then \
build a colony to fulfill the rest of the request: succeeding once makes it \
safe to transfer the task to a swarm of workers (through \
start_incubating_colony). Then focus on finishing the task once.
### How to handle simple tasks (fewer than 2 atomic items)
For conceptual or strategic questions, single-tool-call work, \
greetings, or chat: answer directly in prose. Skip `task_*`, skip the \
planning ceremony; the bar is "real multi-step work the user benefits \
from seeing tracked", not "anything you reply to".
"""
_queen_behavior_always = """
@@ -416,15 +448,8 @@ _queen_behavior_always = """
## Communication
- Your LLM reply text is what the user reads. Do NOT use \
`run_command`, `echo`, or any other tool to "say" something; tools \
are for work (read/search/edit/run), not speech.
- On a greeting or chat ("hi", "how's it going"), reply in plain \
prose and stop. Do not call tools to "discover" what the user wants. \
Check recall memory for name / role / past topics and weave them into \
a 1-2 sentence in-character greeting, then wait.
- On a clear ask (build, edit, run, investigate, search), call the \
appropriate tool on the same turn; don't narrate intent and stop.
appropriate tool following the user's intent.
- You are curious to understand the user. Use `ask_user` when the user's \
response is needed to continue: to resolve ambiguity, collect missing \
information, request approval, compare real trade-offs, gather post-task \
@@ -452,20 +477,6 @@ asserting them as fact.
_queen_behavior_always = _queen_behavior_always + _queen_memory_instructions
_queen_style = """
# Communication
## Adaptive Calibration
Read the user's signals and calibrate your register:
- Short responses -> they want brevity. Match it.
- "Why?" questions -> they want reasoning. Provide it.
- Correct technical terms -> they know the domain. Skip basics.
- Terse or frustrated ("just do X") -> acknowledge and simplify.
- Exploratory ("what if...", "could we also...") -> slow down and explore.
"""
queen_node = NodeSpec(
id="queen",
name="Queen",
@@ -486,7 +497,6 @@ queen_node = NodeSpec(
system_prompt=(
_queen_character_core
+ _queen_role_independent
+ _queen_style
+ _queen_tools_independent
+ _queen_behavior_always
+ _queen_behavior_independent
@@ -516,5 +526,4 @@ __all__ = [
"_queen_tools_reviewing",
"_queen_behavior_always",
"_queen_behavior_independent",
"_queen_style",
]
@@ -100,8 +100,9 @@ DEFAULT_QUEENS: dict[str, dict[str, Any]] = {
"<relationship>Returning user — check recall memory for name, role, "
"and what we last worked on. Weave it in.</relationship>\n"
"<context>Bare greeting. No new task stated. Either picking up a "
"thread or about to bring something new. Don't presume, don't call "
"tools, just open the door.</context>\n"
"thread or about to bring something new. Don't presume — start "
"planning and tool use only after the user specifies a task. Just "
"open the door.</context>\n"
"<sentiment>Warm recognition if I know them. If memory is empty, "
"still warm — but shift to role-forward framing.</sentiment>\n"
"<physical_state>Looking up from the terminal, half-smile. Turning to face them.</physical_state>\n"
@@ -252,8 +253,9 @@ DEFAULT_QUEENS: dict[str, dict[str, Any]] = {
"role, and the cohort work we last touched. Weave it in."
"</relationship>\n"
"<context>Bare greeting. No new task stated. Could be a retention "
"follow-up or a new question entirely. Don't presume, don't call "
"tools.</context>\n"
"follow-up or a new question entirely. Don't presume — start "
"planning and tool use only after the user specifies a task."
"</context>\n"
"<sentiment>Curious warmth. Every returning conversation is a "
"chance to see what the data says now.</sentiment>\n"
"<physical_state>Leaning back from the dashboard, pulling off reading glasses.</physical_state>\n"
@@ -383,8 +385,9 @@ DEFAULT_QUEENS: dict[str, dict[str, Any]] = {
"the user research thread we were on. Pull it into the greeting."
"</relationship>\n"
"<context>Bare greeting. No new task yet. Could be picking up the "
"research thread or bringing something fresh. Don't presume, "
"don't call tools.</context>\n"
"research thread or bringing something fresh. Don't presume "
"start planning and tool use only after the user specifies a task."
"</context>\n"
"<sentiment>Warm, curious. Every returning conversation is a "
"chance to hear what the users actually did.</sentiment>\n"
"<physical_state>Closing the interview notes, turning fully to face them.</physical_state>\n"
@@ -1276,12 +1279,8 @@ def format_queen_identity_prompt(profile: dict[str, Any], *, max_examples: int |
"<negative_constraints>\n"
"- NEVER use corporate filler ('leverage', 'synergy', "
"'circle back', 'at the end of the day').\n"
"- NEVER use AI assistant phrases ('How can I help you "
"today?', 'As an AI', 'I'd be happy to').\n"
"- NEVER break character to explain your thought process "
"or reference your hidden background.\n"
"- Speak like a real person in your role -- direct, "
"opinionated, occasionally imperfect.\n"
"</negative_constraints>"
)
@@ -4,7 +4,7 @@ Every queen inherits the same MCP surface (all servers loaded for the
queen agent), but exposing 94+ tools to every persona clutters the LLM
tool catalog and wastes prompt tokens. This module defines a sensible
default allowlist per queen persona so, e.g., Head of Legal doesn't
see port scanners and Head of Finance doesn't see ``apply_patch``.
see port scanners and Head of Brand & Design doesn't see CSV/SQL tools.
Defaults apply only when the queen has no ``tools.json`` sidecar; the
moment the user saves an allowlist through the Tool Library, the
@@ -36,35 +36,39 @@ logger = logging.getLogger(__name__)
# the named entries only).
_TOOL_CATEGORIES: dict[str, list[str]] = {
# Read-only file operations — safe baseline for every knowledge queen.
"file_read": [
"read_file",
"list_directory",
"list_dir",
"list_files",
"search_files",
"grep_search",
# Unified file ops — read, write, edit, search across the files-tools
# MCP server (read_file, write_file, edit_file, search_files). pdf_read
# lives in hive_tools so it's listed explicitly; without it queens
# cannot read PDF documents by default.
"file_ops": [
"@server:files-tools",
"pdf_read",
],
# File mutation — only personas that author or edit artifacts.
"file_write": [
"write_file",
"edit_file",
"apply_diff",
"apply_patch",
"replace_file_content",
"hashline_edit",
"undo_changes",
# Terminal basic — the 3-tool subset queens get out of the box.
# terminal_exec — foreground command execution (Bash equivalent)
# terminal_rg — ripgrep content search (Grep equivalent)
# terminal_find — glob/find file listing (Glob equivalent)
"terminal_basic": [
"terminal_exec",
"terminal_rg",
"terminal_find",
],
# Shell + process control — engineering personas only.
"shell": [
"run_command",
"execute_command_tool",
"bash_kill",
"bash_output",
# Terminal advanced — the power-user tools beyond the basics. Not in
# any role default; opt in explicitly per-queen via the Tool Library.
# terminal_job_* — background job lifecycle (start/manage/logs)
# terminal_output_get — fetch captured output from foreground exec
# terminal_pty_* — persistent PTY sessions (open/run/close)
"terminal_advanced": [
"terminal_job_start",
"terminal_job_manage",
"terminal_job_logs",
"terminal_output_get",
"terminal_pty_open",
"terminal_pty_run",
"terminal_pty_close",
],
# Tabular data. CSV/Excel read/write + DuckDB SQL.
"data": [
"spreadsheet_advanced": [
"csv_read",
"csv_info",
"csv_write",
@@ -78,49 +82,74 @@ _TOOL_CATEGORIES: dict[str, list[str]] = {
"excel_sheet_list",
"excel_sql",
],
# Browser automation — every tool from the gcu-tools MCP server.
"browser": ["@server:gcu-tools"],
# External research / information-gathering.
"research": [
"search_papers",
"download_paper",
"search_wikipedia",
"web_scrape",
# Browser lifecycle + read-only inspection (navigation, snapshots, query).
# Split out from interaction so personas that only need to *observe* pages
# (e.g. research, status checks) don't pull in click/type/drag/etc.
"browser_basic": [
"browser_setup",
"browser_status",
"browser_stop",
"browser_tabs",
"browser_open",
"browser_close",
"browser_activate_tab",
"browser_navigate",
"browser_go_back",
"browser_go_forward",
"browser_reload",
"browser_screenshot",
"browser_snapshot",
"browser_html",
"browser_console",
"browser_evaluate",
"browser_get_text",
"browser_get_attribute",
"browser_get_rect",
"browser_shadow_query",
],
# Security scanners — pentest-ish, only for engineering/security roles.
# Browser interaction — anything that mutates page state (clicks, typing,
# drag, scrolling, dialogs, file uploads). Pair with browser_basic for
# full automation; omit for read-only personas.
"browser_interaction": [
"browser_click",
"browser_click_coordinate",
"browser_type",
"browser_type_focused",
"browser_press",
"browser_press_at",
"browser_hover",
"browser_hover_coordinate",
"browser_select",
"browser_scroll",
"browser_drag",
"browser_wait",
"browser_resize",
"browser_upload",
],
# Research — paper search, Wikipedia, ad-hoc web scrape. Pair with
# browser_basic for richer site-by-site research; this category is the
# lightweight always-available fallback.
"research": ["web_scrape", "pdf_read"],
# Security — defensive scanning and reconnaissance. Engineering-only
# surface; the rest of the queens shouldn't see port scanners.
"security": [
"port_scan",
"dns_security_scan",
"http_headers_scan",
"port_scan",
"ssl_tls_scan",
"subdomain_enumerate",
"tech_stack_detect",
"risk_score",
],
# Lightweight context helpers — good default for every queen.
"time_context": [
"context_awareness": [
"get_current_time",
"get_account_info",
],
# Runtime log inspection — debug/observability for builder personas.
"runtime_inspection": [
"query_runtime_logs",
"query_runtime_log_details",
"query_runtime_log_raw",
],
# Agent-management tools — building/validating/checking agents.
"agent_mgmt": [
"list_agents",
"list_agent_tools",
"list_agent_sessions",
"get_agent_checkpoint",
"list_agent_checkpoints",
"run_agent_tests",
"save_agent_draft",
"confirm_and_build",
"validate_agent_package",
"validate_agent_tools",
"enqueue_task",
# BI / financial chart + diagram rendering. Calling chart_render
# both embeds the chart live in chat and produces a downloadable PNG.
"charts": [
"@server:chart-tools",
],
}
@@ -139,81 +168,86 @@ _TOOL_CATEGORIES: dict[str, list[str]] = {
# user-added custom queen IDs that we don't know about.
QUEEN_DEFAULT_CATEGORIES: dict[str, list[str]] = {
# Head of Technology — builds and operates systems; full toolkit.
# Head of Technology — builds and operates systems. Security tools
# (port_scan, subdomain_enumerate, etc.) are intentionally NOT in the
# default — users opt in via the Tool Library when an engagement
# actually needs reconnaissance.
"queen_technology": [
"file_read",
"file_write",
"shell",
"data",
"browser",
"file_ops",
"terminal_basic",
"browser_basic",
"browser_interaction",
"research",
"security",
"time_context",
"runtime_inspection",
"agent_mgmt",
"context_awareness",
"charts",
],
# Head of Growth — data, experiments, competitor research; no shell/security.
# Head of Growth — data, experiments, competitor research; no security.
"queen_growth": [
"file_read",
"file_write",
"data",
"browser",
"file_ops",
"terminal_basic",
"browser_basic",
"browser_interaction",
"research",
"time_context",
"context_awareness",
"charts",
],
# Head of Product Strategy — user research + roadmaps; no shell/security.
# Head of Product Strategy — user research + roadmaps; no security.
"queen_product_strategy": [
"file_read",
"file_write",
"data",
"browser",
"file_ops",
"terminal_basic",
"browser_basic",
"browser_interaction",
"research",
"time_context",
"context_awareness",
"charts",
],
# Head of Finance — financial models (CSV/Excel heavy), market research.
"queen_finance_fundraising": [
"file_read",
"file_write",
"data",
"browser",
"file_ops",
"terminal_basic",
"spreadsheet_advanced",
"browser_basic",
"browser_interaction",
"research",
"time_context",
"context_awareness",
"charts",
],
# Head of Legal — reads contracts/PDFs, researches; no shell/data/security.
# Head of Legal — reads contracts/PDFs, researches; no data/security.
"queen_legal": [
"file_read",
"file_write",
"browser",
"file_ops",
"terminal_basic",
"browser_basic",
"browser_interaction",
"research",
"time_context",
"context_awareness",
],
# Head of Brand & Design — visual refs, style guides; no shell/data/security.
# Head of Brand & Design — visual refs, style guides; no data/security.
"queen_brand_design": [
"file_read",
"file_write",
"browser",
"file_ops",
"terminal_basic",
"browser_basic",
"browser_interaction",
"research",
"time_context",
"context_awareness",
],
# Head of Talent — candidate pipelines, resumes; data + browser heavy.
"queen_talent": [
"file_read",
"file_write",
"data",
"browser",
"file_ops",
"terminal_basic",
"browser_basic",
"browser_interaction",
"research",
"time_context",
"context_awareness",
],
# Head of Operations — processes, automation, observability.
"queen_operations": [
"file_read",
"file_write",
"data",
"browser",
"research",
"time_context",
"runtime_inspection",
"agent_mgmt",
"file_ops",
"terminal_basic",
"spreadsheet_advanced",
"browser_basic",
"browser_interaction",
"context_awareness",
"charts",
],
}
@@ -223,6 +257,49 @@ def has_role_default(queen_id: str) -> bool:
return queen_id in QUEEN_DEFAULT_CATEGORIES
def list_category_names() -> list[str]:
"""Return every category name defined in the table, in declaration order."""
return list(_TOOL_CATEGORIES.keys())
def queen_role_categories(queen_id: str) -> list[str]:
"""Return the category names assigned to ``queen_id`` by role default.
Returns an empty list for queens not in the persona table (they fall
through to allow-all and have no implicit category membership).
"""
return list(QUEEN_DEFAULT_CATEGORIES.get(queen_id, []))
def resolve_category_tools(
category: str,
mcp_catalog: dict[str, list[dict[str, Any]]] | None = None,
) -> list[str]:
"""Expand a single category to its concrete tool names.
Mirrors ``resolve_queen_default_tools`` but for a single category, so
callers (e.g. the Tool Library API) can present per-category tool
membership without re-implementing the ``@server:NAME`` shorthand
expansion.
"""
names: list[str] = []
seen: set[str] = set()
for entry in _TOOL_CATEGORIES.get(category, []):
if entry.startswith("@server:"):
server_name = entry[len("@server:") :]
if mcp_catalog is None:
continue
for tool in mcp_catalog.get(server_name, []) or []:
tname = tool.get("name") if isinstance(tool, dict) else None
if tname and tname not in seen:
seen.add(tname)
names.append(tname)
elif entry not in seen:
seen.add(entry)
names.append(entry)
return names
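The `@server:NAME` shorthand expansion above can be exercised standalone. Below is a minimal self-contained sketch mirroring `resolve_category_tools`: server entries expand to every tool the MCP catalog lists for that server, plain entries pass through, and duplicates are dropped while preserving order. The catalog contents are made-up examples, not the real tool inventory.

```python
# Standalone sketch of the "@server:NAME" expansion, mirroring
# resolve_category_tools. The catalog below is illustrative only.

def expand_category(entries, mcp_catalog):
    names, seen = [], set()
    for entry in entries:
        if entry.startswith("@server:"):
            server = entry[len("@server:"):]
            # Expand to every tool the catalog lists for that server.
            for tool in mcp_catalog.get(server, []) or []:
                tname = tool.get("name") if isinstance(tool, dict) else None
                if tname and tname not in seen:
                    seen.add(tname)
                    names.append(tname)
        elif entry not in seen:
            seen.add(entry)
            names.append(entry)
    return names

catalog = {"files-tools": [{"name": "read_file"}, {"name": "write_file"}]}
tools = expand_category(["@server:files-tools", "pdf_read", "read_file"], catalog)
# the trailing "read_file" duplicate is dropped
```

Note the order-preserving dedupe: a tool named both by a server expansion and explicitly appears only once, at its first position.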
def resolve_queen_default_tools(
queen_id: str,
mcp_catalog: dict[str, list[dict[str, Any]]] | None = None,
@@ -17,8 +17,8 @@ Use browser nodes (with `tools: {policy: "all"}`) when:
## Available Browser Tools
All tools are prefixed with `browser_`:
- `browser_start`, `browser_open`, `browser_navigate` — launch/navigate
- `browser_click`, `browser_click_coordinate`, `browser_fill`, `browser_type`, `browser_type_focused` — interact
- `browser_open`, `browser_navigate` — both lazy-create the browser context, so a single `browser_open(url)` covers the cold path. To recover from a stale context, call `browser_stop` then `browser_open(url)` again.
- `browser_click`, `browser_click_coordinate`, `browser_type`, `browser_type_focused` — interact
- `browser_press` (with optional `modifiers=["ctrl"]` etc.) — keyboard shortcuts
- `browser_snapshot` — compact accessibility-tree read (structured)
<!-- vision-only -->
@@ -27,7 +27,7 @@ All tools are prefixed with `browser_`:
- `browser_shadow_query`, `browser_get_rect` — locate elements (shadow-piercing via `>>>`)
- `browser_scroll`, `browser_wait` — navigation helpers
- `browser_evaluate` — run JavaScript
- `browser_close`, `browser_close_finished` — tab cleanup
- `browser_close` — tab cleanup (call per tab; closes the active tab when `tab_id` is omitted)
## Pick the right reading tool
@@ -155,6 +155,58 @@ def get_preferred_worker_model() -> str | None:
return None
def get_vision_fallback_model() -> str | None:
"""Return the configured vision-fallback model, or None if not configured.
Reads from the ``vision_fallback`` section of ~/.hive/configuration.json.
Used by the agent-loop hook that captions tool-result images when the
main agent's model cannot accept image content (text-only LLMs).
When this returns None the captioning chain's configured + retry
attempts both no-op (returning None), and only the final
``gemini/gemini-3-flash-preview`` override has a chance to succeed,
and only if a ``GEMINI_API_KEY`` is set in the environment.
"""
vision = get_hive_config().get("vision_fallback", {})
if vision.get("provider") and vision.get("model"):
provider = str(vision["provider"])
model = str(vision["model"]).strip()
if provider.lower() == "openrouter" and model.lower().startswith("openrouter/"):
model = model[len("openrouter/") :]
if model:
return f"{provider}/{model}"
return None
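The resolution above (require both `provider` and `model`, strip a redundant `openrouter/` prefix, return `provider/model` or `None`) can be sketched as a pure function over the config dict. The config shape follows the diff; the model names in the example are hypothetical.

```python
# Pure-function sketch of the vision-fallback model resolution above.
# Config keys follow the diff; example values are hypothetical.

def vision_fallback_model(config):
    vision = config.get("vision_fallback", {})
    if vision.get("provider") and vision.get("model"):
        provider = str(vision["provider"])
        model = str(vision["model"]).strip()
        # OpenRouter models are sometimes saved with the provider prefix
        # already baked in; strip it so it isn't doubled in the result.
        if provider.lower() == "openrouter" and model.lower().startswith("openrouter/"):
            model = model[len("openrouter/"):]
        if model:
            return f"{provider}/{model}"
    return None
```

An unset or partial `vision_fallback` section resolves to `None`, which is what lets the captioning chain's earlier attempts no-op cleanly.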
def get_vision_fallback_api_key() -> str | None:
"""Return the API key for the vision-fallback model.
Resolution order: ``vision_fallback.api_key_env_var`` from the env,
then the default ``get_api_key()``. No subscription-token branches
vision fallback is intended for hosted vision models (Anthropic,
OpenAI, Google), not for the subscription-bearer providers.
"""
vision = get_hive_config().get("vision_fallback", {})
if not vision:
return get_api_key()
api_key_env_var = vision.get("api_key_env_var")
if api_key_env_var:
return os.environ.get(api_key_env_var)
return get_api_key()
def get_vision_fallback_api_base() -> str | None:
"""Return the api_base for the vision-fallback model, or None."""
vision = get_hive_config().get("vision_fallback", {})
if not vision:
return None
if vision.get("api_base"):
return vision["api_base"]
if str(vision.get("provider", "")).lower() == "openrouter":
return OPENROUTER_API_BASE
return None
def get_worker_api_key() -> str | None:
"""Return the API key for the worker LLM, falling back to the default key."""
worker_llm = get_hive_config().get("worker_llm", {})
@@ -16,9 +16,14 @@ import os
import stat
from pathlib import Path
# Resolved once at module import. ``framework.config.HIVE_HOME`` reads
# the desktop's ``HIVE_HOME`` env var at its own import time, so the
# runtime always sees the per-user root before this constant is computed.
from framework.config import HIVE_HOME as _HIVE_HOME
logger = logging.getLogger(__name__)
CREDENTIAL_KEY_PATH = Path.home() / ".hive" / "secrets" / "credential_key"
CREDENTIAL_KEY_PATH = _HIVE_HOME / "secrets" / "credential_key"
CREDENTIAL_KEY_ENV_VAR = "HIVE_CREDENTIAL_KEY"
ADEN_CREDENTIAL_ID = "aden_api_key"
ADEN_ENV_VAR = "ADEN_API_KEY"
@@ -128,7 +128,9 @@ class EncryptedFileStorage(CredentialStorage):
Initialize encrypted storage.
Args:
base_path: Directory for credential files. Defaults to ~/.hive/credentials.
base_path: Directory for credential files. Defaults to
``$HIVE_HOME/credentials`` (per-user) when HIVE_HOME is set,
else ``~/.hive/credentials``.
encryption_key: 32-byte Fernet key. If None, reads from env var.
key_env_var: Environment variable containing encryption key
"""
@@ -139,7 +141,14 @@ class EncryptedFileStorage(CredentialStorage):
"Encrypted storage requires 'cryptography'. Install with: uv pip install cryptography"
) from e
self.base_path = Path(base_path or self.DEFAULT_PATH).expanduser()
if base_path is None:
# Honor HIVE_HOME (set by the desktop shell to a per-user dir) so
# the encrypted store doesn't fork between ~/.hive and the desktop
# userData root. Falls back to ~/.hive/credentials when standalone.
from framework.config import HIVE_HOME
base_path = HIVE_HOME / "credentials"
self.base_path = Path(base_path).expanduser()
self._ensure_dirs()
self._key_env_var = key_env_var
@@ -510,7 +519,7 @@ class EnvVarStorage(CredentialStorage):
def exists(self, credential_id: str) -> bool:
"""Check if credential is available in environment."""
env_var = self._get_env_var_name(credential_id)
return self._read_env_value(env_var) is not None
return bool(self._read_env_value(env_var))
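The one-character change above matters: with `is not None`, an env var set to the empty string made `exists()` report a credential that `load()` would then return as `None`. Wrapping in `bool()` makes the two agree. A minimal sketch (demo env var names are my own, not the framework's):

```python
import os

def env_exists(env_var):
    """Post-fix behavior: an env var set to the empty string counts as
    absent, so exists() agrees with load() returning None."""
    return bool(os.environ.get(env_var))

# Demo names are hypothetical, for illustration only.
os.environ["DEMO_EMPTY_KEY"] = ""        # set but empty -> treated as missing
os.environ["DEMO_REAL_KEY"] = "sk-123"   # set and non-empty -> present
```

Under the old `is not None` check, `DEMO_EMPTY_KEY` would have reported present while loading nothing.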
def add_mapping(self, credential_id: str, env_var: str) -> None:
"""
@@ -745,13 +745,14 @@ class CredentialStore:
token = store.get_key("hubspot", "access_token")
"""
import os
from pathlib import Path
from .storage import EncryptedFileStorage
# Determine local storage path
if local_path is None:
local_path = str(Path.home() / ".hive" / "credentials")
from framework.config import HIVE_HOME
local_path = str(HIVE_HOME / "credentials")
local_storage = EncryptedFileStorage(base_path=local_path)
@@ -258,6 +258,14 @@ class TestEnvVarStorage:
with pytest.raises(NotImplementedError):
storage.delete("test")
def test_exists_matches_load_for_empty_value(self):
"""Test exists() and load() stay consistent for empty values."""
storage = EnvVarStorage(env_mapping={"empty": "EMPTY_API_KEY"})
with patch.object(storage, "_read_env_value", return_value=""):
assert storage.load("empty") is None
assert not storage.exists("empty")
class TestEncryptedFileStorage:
"""Tests for EncryptedFileStorage."""
@@ -236,6 +236,28 @@ class ColonyRuntime:
self.batch_init_nudge: str | None = self._skills_manager.batch_init_nudge
self._colony_id: str = colony_id or "primary"
# Ensure the colony task template exists. Idempotent — if the
# colony was created previously, this is a no-op (it just stamps
# last_seen_session_ids if a session id is provided later).
try:
import asyncio as _asyncio
from framework.tasks import TaskListRole, get_task_store
from framework.tasks.scoping import colony_task_list_id
_store = get_task_store()
_list_id = colony_task_list_id(self._colony_id)
try:
# Best-effort: schedule on the running loop, or do it inline
# if no loop is yet running (e.g. during construction).
_loop = _asyncio.get_running_loop()
_loop.create_task(_store.ensure_task_list(_list_id, role=TaskListRole.TEMPLATE))
except RuntimeError:
_asyncio.run(_store.ensure_task_list(_list_id, role=TaskListRole.TEMPLATE))
except Exception:
logger.debug("Failed to ensure colony task template", exc_info=True)
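The scheduling dance in this constructor is a reusable pattern: fire-and-forget on an already-running event loop, or run the coroutine inline when constructed from synchronous code. A self-contained sketch (function names are mine, not the framework's):

```python
import asyncio

async def ensure_list(results):
    # Stand-in for the real ensure_task_list coroutine.
    results.append("ensured")

def schedule_or_run(coro_factory, results):
    """Pattern from the constructor above: if an event loop is already
    running, schedule the coroutine on it and return immediately;
    otherwise run it to completion inline with asyncio.run()."""
    try:
        loop = asyncio.get_running_loop()
        loop.create_task(coro_factory(results))
    except RuntimeError:
        # No running loop (e.g. plain synchronous construction).
        asyncio.run(coro_factory(results))

out = []
schedule_or_run(ensure_list, out)  # no loop running here -> runs inline
```

The `RuntimeError` branch is what makes this safe during synchronous construction; the `create_task` branch keeps an async caller from blocking.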
self._accounts_prompt = accounts_prompt
self._accounts_data = accounts_data
self._tool_provider_map = tool_provider_map
@@ -253,6 +275,16 @@ class ColonyRuntime:
self._event_bus = event_bus or EventBus(max_history=self._config.max_history)
self._scoped_event_bus = StreamEventBus(self._event_bus, self._colony_id)
# Make the event bus visible to the task-system event emitters so
# task lifecycle events fan out to the same bus the rest of the
# system uses. Idempotent — last writer wins.
try:
from framework.tasks.events import set_default_event_bus
set_default_event_bus(self._event_bus)
except Exception:
logger.debug("Failed to register default task event bus", exc_info=True)
self._llm = llm
self._tools = tools or []
self._tool_executor = tool_executor
@@ -387,6 +419,19 @@ class ColonyRuntime:
def _apply_pipeline_results(self) -> None:
for stage in self._pipeline.stages:
if stage.tool_registry is not None:
# Register task tools on the same registry every worker
# pulls from. Done here (not at worker spawn) so the
# colony's `_tools` snapshot includes them.
try:
from framework.tasks.tools import register_task_tools
register_task_tools(stage.tool_registry)
except Exception:
logger.warning(
"Failed to register task tools on pipeline registry",
exc_info=True,
)
tools = list(stage.tool_registry.get_tools().values())
if tools:
self._tools = tools
@@ -441,10 +486,18 @@ class ColonyRuntime:
if colony_name:
colony_home = COLONIES_DIR / colony_name
colony_overrides_path = colony_home / "skills_overrides.json"
# Colony-scope SKILL.md dir is the project-scope from discovery's
# point of view (colony_dir is the project_root). Add it also as
# a tagged ``colony_ui`` scope so UI-created entries resolve with
# correct provenance.
# Surface both the new flat ``skills/`` (where new skills are
# written) and the legacy nested ``.hive/skills/`` (left intact
# for pre-flatten colonies) as tagged ``colony_ui`` scopes, so
# UI-created entries resolve with correct provenance regardless
# of which on-disk layout the colony has.
extras.append(
ExtraScope(
directory=colony_home / "skills",
label="colony_ui",
priority=3,
)
)
extras.append(
ExtraScope(
directory=colony_home / ".hive" / "skills",
@@ -909,6 +962,23 @@ class ColonyRuntime:
# free-variable capture here.
_provider = None if _db_path_pre_activated else (lambda mgr=self._skills_manager: mgr.skills_catalog_prompt)
# Task-system fields. Each worker owns its session task list;
# picked_up_from records the colony template entry it was
# spawned for, when applicable.
from framework.tasks.scoping import (
colony_task_list_id as _colony_list_id,
session_task_list_id as _session_list_id,
)
_worker_list_id = _session_list_id(worker_id, worker_id)
_picked_up = None
_template_id = input_data.get("__template_task_id") if isinstance(input_data, dict) else None
if _template_id is not None:
try:
_picked_up = (_colony_list_id(self._colony_id), int(_template_id))
except (TypeError, ValueError):
_picked_up = None
agent_context = AgentContext(
runtime=self._make_runtime_adapter(worker_id),
agent_id=worker_id,
@@ -925,6 +995,9 @@ class ColonyRuntime:
dynamic_skills_catalog_provider=_provider,
execution_id=worker_id,
stream_id=explicit_stream_id or f"worker:{worker_id}",
task_list_id=_worker_list_id,
colony_id=self._colony_id,
picked_up_from=_picked_up,
)
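The `picked_up_from` derivation above is a narrow guard: only an int-coercible `__template_task_id` carried in a dict payload yields a `(template_list_id, task_id)` pair. A standalone sketch (the `colony:` list-id format here is a placeholder, not the real scoping scheme):

```python
def parse_picked_up_from(colony_id, input_data):
    """Sketch of the guard above: derive (template_list_id, task_id)
    only when the payload is a dict carrying an int-coercible
    __template_task_id; anything else yields None. The list-id format
    is a placeholder assumption, not the framework's real scheme."""
    template_id = input_data.get("__template_task_id") if isinstance(input_data, dict) else None
    if template_id is None:
        return None
    try:
        return (f"colony:{colony_id}", int(template_id))
    except (TypeError, ValueError):
        return None
```

Swallowing `TypeError`/`ValueError` keeps a malformed template id from aborting worker spawn; the worker simply runs without template provenance.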
worker = Worker(
@@ -42,7 +42,9 @@ def _open_event_log() -> IO[str] | None:
return None
raw = _DEBUG_EVENTS_RAW
if raw.lower() in ("1", "true", "full"):
log_dir = Path.home() / ".hive" / "event_logs"
from framework.config import HIVE_HOME
log_dir = HIVE_HOME / "event_logs"
else:
log_dir = Path(raw)
log_dir.mkdir(parents=True, exist_ok=True)
@@ -165,6 +167,14 @@ class EventType(StrEnum):
TRIGGER_REMOVED = "trigger_removed"
TRIGGER_UPDATED = "trigger_updated"
# Task system lifecycle (per-list diffs streamed to the UI)
TASK_CREATED = "task_created"
TASK_UPDATED = "task_updated"
TASK_DELETED = "task_deleted"
TASK_LIST_RESET = "task_list_reset"
TASK_LIST_REATTACH_MISMATCH = "task_list_reattach_mismatch"
COLONY_TEMPLATE_ASSIGNMENT = "colony_template_assignment"
@dataclass
class AgentEvent:
@@ -7,7 +7,7 @@ verify SOP gates before marking a task done. This gives cross-run memory
that the existing per-iteration stall detectors don't have.
The DB is driven by agents via the ``sqlite3`` CLI through
``execute_command_tool``. This module handles framework-side lifecycle:
``terminal_exec``. This module handles framework-side lifecycle:
creation, migration, queen-side bulk seeding, stale-claim reclamation.
Concurrency model:
@@ -264,7 +264,9 @@ def ensure_all_colony_dbs(colonies_root: Path | None = None) -> list[Path]:
run the stale-claim reclaimer on all of them in one pass.
"""
if colonies_root is None:
colonies_root = Path.home() / ".hive" / "colonies"
from framework.config import COLONIES_DIR
colonies_root = COLONIES_DIR
if not colonies_root.is_dir():
return []
+12 -2
@@ -154,11 +154,21 @@ class Worker:
# value without affecting the queen's ongoing calls.
try:
from framework.loader.tool_registry import ToolRegistry
from framework.tasks.scoping import session_task_list_id
ToolRegistry.set_execution_context(profile=self.id)
ctx = self._context
agent_id = getattr(ctx, "agent_id", None) or self.id
list_id = getattr(ctx, "task_list_id", None) or session_task_list_id(agent_id, self.id)
ToolRegistry.set_execution_context(
profile=self.id,
agent_id=agent_id,
task_list_id=list_id,
colony_id=getattr(ctx, "colony_id", None),
picked_up_from=getattr(ctx, "picked_up_from", None),
)
except Exception:
logger.debug(
"Worker %s: failed to scope browser profile",
"Worker %s: failed to scope execution context",
self.id,
exc_info=True,
)
+64 -9
@@ -23,6 +23,7 @@ from collections.abc import AsyncIterator, Callable, Iterator
from pathlib import Path
from typing import Any
from framework.config import HIVE_HOME as _HIVE_HOME
from framework.llm.provider import LLMProvider, LLMResponse, Tool
from framework.llm.stream_events import (
FinishEvent,
@@ -50,8 +51,8 @@ _ENDPOINTS = [
_DEFAULT_PROJECT_ID = "rising-fact-p41fc"
_TOKEN_REFRESH_BUFFER_SECS = 60
# Credentials file in ~/.hive/ (native implementation)
_ACCOUNTS_FILE = Path.home() / ".hive" / "antigravity-accounts.json"
# Credentials file in $HIVE_HOME (native implementation)
_ACCOUNTS_FILE = _HIVE_HOME / "antigravity-accounts.json"
_IDE_STATE_DB_MAC = (
Path.home() / "Library" / "Application Support" / "Antigravity" / "User" / "globalStorage" / "state.vscdb"
)
@@ -60,10 +61,12 @@ _IDE_STATE_DB_KEY = "antigravityUnifiedStateSync.oauthToken"
_BASE_HEADERS: dict[str, str] = {
# Mimic the Antigravity Electron app so the API accepts the request.
# Google deprecates older client versions over time, so this needs periodic
# bumping to match whatever the current Antigravity desktop release advertises.
"User-Agent": (
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
"(KHTML, like Gecko) Antigravity/1.18.3 Chrome/138.0.7204.235 "
"Electron/37.3.1 Safari/537.36"
"(KHTML, like Gecko) Antigravity/1.23.2 Chrome/138.0.7204.235 "
"Electron/39.2.3 Safari/537.36"
),
"X-Goog-Api-Client": "google-cloud-sdk vscode_cloudshelleditor/0.1",
"Client-Metadata": '{"ideType":"ANTIGRAVITY","platform":"MACOS","pluginType":"GEMINI"}',
@@ -253,6 +256,56 @@ def _clean_tool_name(name: str) -> str:
return name[:64]
def _sanitize_schema_for_gemini(schema: Any) -> Any:
"""Convert JSON Schema 2020-12 features to the OpenAPI 3.0 dialect Gemini accepts.
Gemini's function_declarations parser rejects union ``"type": ["string", "null"]``.
Translate any such union to a single type plus ``"nullable": true``. Recurse into
``properties``, ``items``, ``additionalProperties``, and the ``anyOf``/``oneOf``/``allOf`` combinators.
"""
if isinstance(schema, list):
return [_sanitize_schema_for_gemini(s) for s in schema]
if not isinstance(schema, dict):
return schema
out = dict(schema)
t = out.get("type")
if isinstance(t, list):
non_null = [x for x in t if x != "null"]
has_null = "null" in t
if len(non_null) == 1:
out["type"] = non_null[0]
if has_null:
out["nullable"] = True
elif not non_null and has_null:
# Pure null type: fall back to string-nullable.
out["type"] = "string"
out["nullable"] = True
else:
# Multi-type non-null unions (e.g. ["string", "integer", "null"])
# have no faithful Gemini equivalent. Silently picking one type
# changes the contract for callers who rely on the others, so
# fail loud and let the schema author rewrite it as anyOf or
# narrow to a single type.
raise ValueError(
f"Unsupported Gemini schema union: {t!r}. "
"Gemini accepts a single primitive type plus optional 'nullable: true'. "
"Rewrite as anyOf or pick a single type."
)
if "properties" in out and isinstance(out["properties"], dict):
out["properties"] = {k: _sanitize_schema_for_gemini(v) for k, v in out["properties"].items()}
if "items" in out:
out["items"] = _sanitize_schema_for_gemini(out["items"])
if "additionalProperties" in out and isinstance(out["additionalProperties"], dict):
out["additionalProperties"] = _sanitize_schema_for_gemini(out["additionalProperties"])
for combinator in ("anyOf", "oneOf", "allOf"):
if combinator in out:
out[combinator] = _sanitize_schema_for_gemini(out[combinator])
return out
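A trimmed, standalone restatement of the translation above (recursion limited to ``properties`` and ``items`` for brevity; `sanitize` is a local name, not the framework function) shows the before/after shape that was crashing Gemini:

```python
from typing import Any

def sanitize(schema: Any) -> Any:
    """Local sketch of the union-to-nullable translation."""
    if isinstance(schema, list):
        return [sanitize(s) for s in schema]
    if not isinstance(schema, dict):
        return schema
    out = dict(schema)
    t = out.get("type")
    if isinstance(t, list):
        non_null = [x for x in t if x != "null"]
        if len(non_null) == 1:
            out["type"] = non_null[0]
            if "null" in t:
                out["nullable"] = True  # OpenAPI 3.0 spelling Gemini accepts
        elif not non_null:
            out["type"] = "string"  # pure-null type: string-nullable fallback
            out["nullable"] = True
        else:
            raise ValueError(f"Unsupported union: {t!r}")
    if isinstance(out.get("properties"), dict):
        out["properties"] = {k: sanitize(v) for k, v in out["properties"].items()}
    if "items" in out:
        out["items"] = sanitize(out["items"])
    return out

# The shape that tripped Gemini's function_declarations parser:
before = {"type": "object", "properties": {"owner": {"type": ["string", "null"]}}}
after = sanitize(before)
print(after["properties"]["owner"])  # {'type': 'string', 'nullable': True}
```

The source schema stays valid JSON Schema, so only the Gemini-bound copy is rewritten; multi-type unions still fail loudly rather than silently narrowing the contract.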
def _to_gemini_contents(
messages: list[dict[str, Any]],
thought_sigs: dict[str, str] | None = None,
@@ -554,11 +607,13 @@ class AntigravityProvider(LLMProvider):
{
"name": _clean_tool_name(t.name),
"description": t.description,
"parameters": t.parameters
or {
"type": "object",
"properties": {},
},
"parameters": _sanitize_schema_for_gemini(
t.parameters
or {
"type": "object",
"properties": {},
}
),
}
for t in tools
]
+12 -94
@@ -1,114 +1,32 @@
"""Model capability checks for LLM providers.
Vision support rules are derived from official vendor documentation:
- ZAI (z.ai): docs.z.ai/guides/vlm GLM-4.6V variants are vision; GLM-5/4.6/4.7 are text-only
- MiniMax: platform.minimax.io/docs minimax-vl-01 is vision; M2.x are text-only
- DeepSeek: api-docs.deepseek.com deepseek-vl2 is vision; chat/reasoner are text-only
- Cerebras: inference-docs.cerebras.ai no vision models at all
- Groq: console.groq.com/docs/vision vision capable; treat as supported by default
- Ollama/LM Studio/vLLM/llama.cpp: local runners denied by default; model names
don't reliably indicate vision support, so users must configure explicitly
Vision support is sourced from the curated ``model_catalog.json``. Each model
entry carries an optional ``supports_vision`` boolean; unknown models default
to vision-capable so hosted frontier models work out of the box. To toggle
support for a model, edit its catalog entry rather than this file.
"""
from __future__ import annotations
from typing import TYPE_CHECKING
from framework.llm.model_catalog import model_supports_vision
if TYPE_CHECKING:
from framework.llm.provider import Tool
def _model_name(model: str) -> str:
"""Return the bare model name after stripping any 'provider/' prefix."""
if "/" in model:
return model.split("/", 1)[1]
return model
# Step 1: explicit vision allow-list — these always support images regardless
# of what the provider-level rules say. Checked first so that e.g. glm-4.6v
# is allowed even though glm-4.6 is denied.
_VISION_ALLOW_BARE_PREFIXES: tuple[str, ...] = (
# ZAI/GLM vision models (docs.z.ai/guides/vlm)
"glm-4v", # GLM-4V series (legacy)
"glm-4.6v", # GLM-4.6V, GLM-4.6V-flash, GLM-4.6V-flashx
# DeepSeek vision models
"deepseek-vl", # deepseek-vl2, deepseek-vl2-small, deepseek-vl2-tiny
# MiniMax vision model
"minimax-vl", # minimax-vl-01
)
# Step 2: provider-level deny — every model from this provider is text-only.
_TEXT_ONLY_PROVIDER_PREFIXES: tuple[str, ...] = (
# Cerebras: inference-docs.cerebras.ai lists only text models
"cerebras/",
# Local runners: model names don't reliably indicate vision support
"ollama/",
"ollama_chat/",
"lm_studio/",
"vllm/",
"llamacpp/",
)
# Step 3: per-model deny — text-only models within otherwise mixed providers.
# Matched against the bare model name (provider prefix stripped, lower-cased).
# The vision allow-list above is checked first, so vision variants of the same
# family are already handled before these deny patterns are reached.
_TEXT_ONLY_MODEL_BARE_PREFIXES: tuple[str, ...] = (
# --- ZAI / GLM family ---
# text-only: glm-5, glm-4.6, glm-4.7, glm-4.5, zai-glm-*
# vision: glm-4v, glm-4.6v (caught by allow-list above)
"glm-5",
"glm-4.6", # bare glm-4.6 is text-only; glm-4.6v is caught by allow-list
"glm-4.7",
"glm-4.5",
"zai-glm",
# --- DeepSeek ---
# text-only: deepseek-chat, deepseek-coder, deepseek-reasoner
# vision: deepseek-vl2 (caught by allow-list above)
# Note: LiteLLM's deepseek handler may flatten content lists for some models;
# VL models are allowed through and rely on LiteLLM's native VL support.
"deepseek-chat",
"deepseek-coder",
"deepseek-reasoner",
# --- MiniMax ---
# text-only: minimax-m2.*, minimax-text-*, abab* (legacy)
# vision: minimax-vl-01 (caught by allow-list above)
"minimax-m2",
"minimax-text",
"abab",
)
def supports_image_tool_results(model: str) -> bool:
"""Return whether *model* can receive image content in messages.
Used to gate both user-message images and tool-result image blocks.
Logic (checked in order):
1. Vision allow-list True (known vision model, skip all denies)
2. Provider deny False (entire provider is text-only)
3. Model deny False (specific text-only model within a mixed provider)
4. Default True (assume capable; unknown providers and models)
Thin wrapper over :func:`model_supports_vision` so existing call sites
keep working. Used to gate both user-message images and tool-result
image blocks. Empty model strings are treated as capable so the default
code path doesn't strip images before a provider is selected.
"""
model_lower = model.lower()
bare = _model_name(model_lower)
# 1. Explicit vision allow — takes priority over all denies
if any(bare.startswith(p) for p in _VISION_ALLOW_BARE_PREFIXES):
if not model:
return True
# 2. Provider-level deny (all models from this provider are text-only)
if any(model_lower.startswith(p) for p in _TEXT_ONLY_PROVIDER_PREFIXES):
return False
# 3. Per-model deny (text-only variants within mixed-capability families)
if any(bare.startswith(p) for p in _TEXT_ONLY_MODEL_BARE_PREFIXES):
return False
# 5. Default: assume vision capable
# Covers: OpenAI, Anthropic, Google, Mistral, Kimi, and other hosted providers
return True
return model_supports_vision(model)
def filter_tools_for_model(tools: list[Tool], model: str) -> tuple[list[Tool], list[str]]:
+163 -24
@@ -33,6 +33,7 @@ except ImportError:
RateLimitError = Exception # type: ignore[assignment, misc]
from framework.config import HIVE_LLM_ENDPOINT as HIVE_API_BASE
from framework.llm.model_catalog import get_model_pricing
from framework.llm.provider import LLMProvider, LLMResponse, Tool
from framework.llm.stream_events import StreamEvent
@@ -43,6 +44,30 @@ logging.getLogger("httpx").setLevel(logging.WARNING)
logging.getLogger("httpcore").setLevel(logging.WARNING)
def _api_base_needs_bearer_auth(api_base: str | None) -> bool:
"""Return True when api_base points at an Anthropic-compatible endpoint
that authenticates via ``Authorization: Bearer`` rather than ``x-api-key``.
The Hive LLM proxy (Rust service in hive-backend/llm/) speaks the
Anthropic Messages API but mints user-scoped JWTs and validates them
via Bearer auth. Default upstream Anthropic endpoints (api.anthropic.com,
Kimi's api.kimi.com/coding) keep using x-api-key, so the override is
scoped to known hive-proxy hosts plus the env-configured override.
"""
if not api_base:
return False
# A plain substring match on the lowered string is enough for the
# common cases: it tolerates protocol, port, and path variations.
lowered = api_base.lower()
for host in ("adenhq.com", "open-hive.com", "127.0.0.1:8890", "localhost:8890"):
if host in lowered:
return True
override = os.environ.get("HIVE_LLM_BASE_URL")
if override and override.lower() in lowered:
return True
return False
def _patch_litellm_anthropic_oauth() -> None:
"""Patch litellm's Anthropic header construction to fix OAuth token handling.
@@ -186,6 +211,44 @@ def _ensure_ollama_chat_prefix(model: str) -> str:
return model
def rewrite_proxy_model(
model: str, api_key: str | None, api_base: str | None
) -> tuple[str, str | None, dict[str, str]]:
"""Apply Hive/Kimi proxy rewrites for any caller of ``litellm.acompletion``.
Both the Hive LLM proxy and Kimi For Coding expose Anthropic-API-
compatible endpoints. LiteLLM doesn't recognise the ``hive/`` or
``kimi/`` prefixes natively, so we rewrite them to ``anthropic/``
here. For the Hive proxy we also stamp a Bearer token into
``extra_headers`` because litellm's Anthropic handler only sends
``x-api-key`` and the proxy expects ``Authorization: Bearer``.
Used by ad-hoc ``litellm.acompletion`` callers (e.g. the vision-
fallback subagent in ``caption_tool_image``) so they hit the same
proxy with the same auth as the main agent's ``LiteLLMProvider``.
The provider's own ``__init__`` keeps its inlined rewrite for now —
this helper is the single source of truth for ad-hoc callers.
Returns: (rewritten_model, normalised_api_base, extra_headers).
The ``extra_headers`` dict is non-empty only for the Hive proxy
(and only when ``api_key`` is provided).
"""
extra_headers: dict[str, str] = {}
if model.lower().startswith("kimi/"):
model = "anthropic/" + model[len("kimi/") :]
if api_base and api_base.rstrip("/").endswith("/v1"):
api_base = api_base.rstrip("/")[:-3]
elif model.lower().startswith("hive/"):
model = "anthropic/" + model[len("hive/") :]
if api_base and api_base.rstrip("/").endswith("/v1"):
api_base = api_base.rstrip("/")[:-3]
# Hive proxy expects Bearer auth; litellm's Anthropic handler
# only sends x-api-key without this nudge.
if api_key:
extra_headers["Authorization"] = f"Bearer {api_key}"
return model, api_base, extra_headers
RATE_LIMIT_MAX_RETRIES = 10
RATE_LIMIT_BACKOFF_BASE = 2 # seconds
RATE_LIMIT_MAX_DELAY = 120 # seconds - cap to prevent absurd waits
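The retry loop itself sits outside this hunk, so the exact formula is an assumption; capped exponential backoff is the usual shape these constants imply:

```python
RATE_LIMIT_MAX_RETRIES = 10
RATE_LIMIT_BACKOFF_BASE = 2    # seconds
RATE_LIMIT_MAX_DELAY = 120     # seconds

# Assumed schedule: base * 2**attempt, capped at the max delay.
delays = [
    min(RATE_LIMIT_BACKOFF_BASE * 2**attempt, RATE_LIMIT_MAX_DELAY)
    for attempt in range(RATE_LIMIT_MAX_RETRIES)
]
print(delays)  # [2, 4, 8, 16, 32, 64, 120, 120, 120, 120]
```

The cap matters: without it attempt 9 would wait 1024 s, which is exactly the "absurd waits" the comment above guards against.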
@@ -352,14 +415,65 @@ OPENROUTER_TOOL_COMPAT_MODEL_CACHE: dict[str, float] = {}
# from rate-limit retries — 3 retries is sufficient for connection failures.
STREAM_TRANSIENT_MAX_RETRIES = 3
# Directory for dumping failed requests
FAILED_REQUESTS_DIR = Path.home() / ".hive" / "failed_requests"
# Maximum number of dump files to retain in ~/.hive/failed_requests/.
# Directory for dumping failed requests. Resolved lazily so HIVE_HOME
# overrides (set by the desktop shell) take effect even if this module
# is imported before framework.config picks up the override.
def _failed_requests_dir() -> Path:
from framework.config import HIVE_HOME
return HIVE_HOME / "failed_requests"
# Maximum number of dump files to retain in $HIVE_HOME/failed_requests/.
# Older files are pruned automatically to prevent unbounded disk growth.
MAX_FAILED_REQUEST_DUMPS = 50
def _cost_from_catalog_pricing(
model: str,
input_tokens: int,
output_tokens: int,
cached_tokens: int = 0,
cache_creation_tokens: int = 0,
) -> float:
"""Last-resort cost calculation using curated catalog pricing.
Consulted only when the provider response carries no native cost and
LiteLLM's own catalog has no pricing for ``model``. Reads
``pricing_usd_per_mtok`` from ``model_catalog.json``. Rates are USD per
million tokens.
``cached_tokens`` and ``cache_creation_tokens`` are subsets of
``input_tokens`` (see ``_extract_cache_tokens``), so subtract them from
the base input count to avoid double-billing. If a cache rate is absent,
fall back to the plain input rate.
"""
if not model or (input_tokens == 0 and output_tokens == 0):
return 0.0
pricing = get_model_pricing(model)
if pricing is None and "/" in model:
# LiteLLM prefixes some ids (e.g. "openrouter/z-ai/glm-5.1"); the
# catalog stores the bare form ("z-ai/glm-5.1"). Strip one segment.
pricing = get_model_pricing(model.split("/", 1)[1])
if pricing is None:
return 0.0
per_mtok_in = pricing.get("input", 0.0)
per_mtok_out = pricing.get("output", 0.0)
per_mtok_cache_read = pricing.get("cache_read", per_mtok_in)
per_mtok_cache_write = pricing.get("cache_creation", per_mtok_in)
plain_input = max(input_tokens - cached_tokens - cache_creation_tokens, 0)
total = (
plain_input * per_mtok_in
+ cached_tokens * per_mtok_cache_read
+ cache_creation_tokens * per_mtok_cache_write
+ output_tokens * per_mtok_out
) / 1_000_000
return float(total) if total > 0 else 0.0
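A worked instance of that formula, with illustrative USD-per-million-token rates (not any model's real pricing):

```python
pricing = {"input": 1.40, "output": 4.40, "cache_read": 0.26}
input_tokens, output_tokens = 100_000, 5_000
cached, cache_creation = 20_000, 0

per_in = pricing["input"]
per_out = pricing["output"]
per_cache_read = pricing.get("cache_read", per_in)       # absent rate falls back
per_cache_write = pricing.get("cache_creation", per_in)  # to the plain input rate

# Cached tokens are a subset of input_tokens, so subtract them first
# to avoid double-billing, then bill each slice at its own rate.
plain = max(input_tokens - cached - cache_creation, 0)
total = (
    plain * per_in
    + cached * per_cache_read
    + cache_creation * per_cache_write
    + output_tokens * per_out
) / 1_000_000
print(round(total, 4))  # 0.1392
```

Here 80k plain input tokens cost $0.112, the 20k cache reads only $0.0052, and the 5k output tokens $0.022.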
def _extract_cost(response: Any, model: str) -> float:
"""Pull the USD cost for a non-streaming completion response.
@@ -372,6 +486,8 @@ def _extract_cost(response: Any, model: str) -> float:
3. ``litellm.completion_cost(...)`` computes from the model pricing
table; works across Anthropic, OpenAI, and OpenRouter as long as the
model is in LiteLLM's catalog.
4. ``pricing_usd_per_mtok`` from the curated model catalog covers
models (e.g. GLM, Kimi, MiniMax) that LiteLLM doesn't price.
Returns 0.0 for unpriced models or unexpected response shapes: cost is a
display concern, never let it break the hot path. For streaming paths
@@ -399,6 +515,14 @@ def _extract_cost(response: Any, model: str) -> float:
return float(computed)
except Exception as exc:
logger.debug("[cost] completion_cost failed for %s: %s", model, exc)
if usage is not None:
input_tokens = int(getattr(usage, "prompt_tokens", 0) or 0)
output_tokens = int(getattr(usage, "completion_tokens", 0) or 0)
cache_read, cache_creation = _extract_cache_tokens(usage)
fallback = _cost_from_catalog_pricing(model, input_tokens, output_tokens, cache_read, cache_creation)
if fallback > 0:
return fallback
return 0.0
@@ -430,10 +554,11 @@ def _cost_from_tokens(
cache_creation_input_tokens=cache_creation_tokens,
)
total = (prompt_cost or 0.0) + (completion_cost or 0.0)
return float(total) if total > 0 else 0.0
if total > 0:
return float(total)
except Exception as exc:
logger.debug("[cost] cost_per_token failed for %s: %s", model, exc)
return 0.0
return _cost_from_catalog_pricing(model, input_tokens, output_tokens, cached_tokens, cache_creation_tokens)
def _extract_cache_tokens(usage: Any) -> tuple[int, int]:
@@ -464,11 +589,9 @@ def _extract_cache_tokens(usage: Any) -> tuple[int, int]:
if _details is not None
else getattr(usage, "cache_read_input_tokens", 0) or 0
)
cache_creation = (
getattr(_details, "cache_write_tokens", 0) or 0
if _details is not None
else 0
) or (getattr(usage, "cache_creation_input_tokens", 0) or 0)
cache_creation = (getattr(_details, "cache_write_tokens", 0) or 0 if _details is not None else 0) or (
getattr(usage, "cache_creation_input_tokens", 0) or 0
)
return cache_read, cache_creation
@@ -494,7 +617,7 @@ def _prune_failed_request_dumps(max_files: int = MAX_FAILED_REQUEST_DUMPS) -> No
"""
try:
all_dumps = sorted(
FAILED_REQUESTS_DIR.glob("*.json"),
_failed_requests_dir().glob("*.json"),
key=lambda f: f.stat().st_mtime,
)
excess = len(all_dumps) - max_files
@@ -529,11 +652,12 @@ def _dump_failed_request(
) -> str:
"""Dump failed request to a file for debugging. Returns the file path."""
try:
FAILED_REQUESTS_DIR.mkdir(parents=True, exist_ok=True)
dump_dir = _failed_requests_dir()
dump_dir.mkdir(parents=True, exist_ok=True)
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S_%f")
filename = f"{error_type}_{model.replace('/', '_')}_{timestamp}.json"
filepath = FAILED_REQUESTS_DIR / filename
filepath = dump_dir / filename
# Build dump data
messages = kwargs.get("messages", [])
@@ -563,7 +687,7 @@ def _dump_failed_request(
return str(filepath)
except OSError as e:
logger.warning(f"Failed to dump request debug log to {FAILED_REQUESTS_DIR}: {e}")
logger.warning(f"Failed to dump request debug log to {_failed_requests_dir()}: {e}")
return "log_write_failed"
@@ -877,6 +1001,7 @@ class LiteLLMProvider(LLMProvider):
# Translate kimi/ prefix to anthropic/ so litellm uses the Anthropic
# Messages API handler and routes to that endpoint — no special headers needed.
_original_model = model
self._hive_proxy_auth = bool(_original_model.lower().startswith("hive/"))
if _is_ollama_model(model):
model = _ensure_ollama_chat_prefix(model)
elif model.lower().startswith("kimi/"):
@@ -930,6 +1055,7 @@ class LiteLLMProvider(LLMProvider):
these attributes in-place propagates to all callers on the next LLM call.
"""
_original_model = model
self._hive_proxy_auth = bool(_original_model.lower().startswith("hive/"))
if _is_ollama_model(model):
model = _ensure_ollama_chat_prefix(model)
elif model.lower().startswith("kimi/"):
@@ -1169,6 +1295,16 @@ class LiteLLMProvider(LLMProvider):
# Ollama requires explicit tool_choice=auto for function calling
# so future readers don't have to guess.
kwargs.setdefault("tool_choice", "auto")
elif self._hive_proxy_auth:
# The Hive LLM proxy fronts GLM, which drifts into "explain
# the plan" mode on long-context turns instead of emitting
# tool_use blocks (verified 2026-04-28: tool_choice=null →
# text-only stop=stop; tool_choice=required → clean
# tool_use). Force a tool call when tools are available
# so queens can't get stuck in chat mode. Callers that
# legitimately want a non-tool turn can override via
# extra_kwargs.
kwargs.setdefault("tool_choice", "required")
# Add response_format for structured output
# LiteLLM passes this through to the underlying provider
@@ -1406,6 +1542,10 @@ class LiteLLMProvider(LLMProvider):
# Ollama requires explicit tool_choice=auto for function calling
# so future readers don't have to guess.
kwargs.setdefault("tool_choice", "auto")
elif self._hive_proxy_auth:
# See `complete()` for the rationale: GLM behind the Hive
# proxy needs forcing or it goes chat-mode on long contexts.
kwargs.setdefault("tool_choice", "required")
if response_format:
kwargs["response_format"] = response_format
@@ -2116,9 +2256,10 @@ class LiteLLMProvider(LLMProvider):
if logger.isEnabledFor(logging.DEBUG) and full_messages:
import json as _json
from datetime import datetime as _dt
from pathlib import Path as _Path
_debug_dir = _Path.home() / ".hive" / "debug_logs"
from framework.config import HIVE_HOME as _HIVE_HOME
_debug_dir = _HIVE_HOME / "debug_logs"
_debug_dir.mkdir(parents=True, exist_ok=True)
_ts = _dt.now().strftime("%Y%m%d_%H%M%S_%f")
_dump_file = _debug_dir / f"llm_request_{_ts}.json"
@@ -2189,6 +2330,10 @@ class LiteLLMProvider(LLMProvider):
# Ollama requires explicit tool_choice=auto for function calling
# so future readers don't have to guess.
kwargs.setdefault("tool_choice", "auto")
elif self._hive_proxy_auth:
# See `complete()` for the rationale: GLM behind the Hive
# proxy needs forcing or it goes chat-mode on long contexts.
kwargs.setdefault("tool_choice", "required")
if response_format:
kwargs["response_format"] = response_format
# The Codex ChatGPT backend (Responses API) rejects several params.
@@ -2201,10 +2346,6 @@ class LiteLLMProvider(LLMProvider):
kwargs["extra_body"]["store"] = False
request_summary = _summarize_request_for_log(kwargs)
logger.debug(
"[stream] prepared request: %s",
json.dumps(request_summary, default=str),
)
if request_summary["system_only"]:
logger.warning(
"[stream] %s request has no non-system chat messages "
@@ -2351,8 +2492,7 @@ class LiteLLMProvider(LLMProvider):
output_tokens = getattr(usage, "completion_tokens", 0) or 0
cached_tokens, cache_creation_tokens = _extract_cache_tokens(usage)
logger.debug(
"[tokens] finish-chunk usage: input=%d output=%d "
"cached=%d cache_creation=%d model=%s",
"[tokens] finish-chunk usage: input=%d output=%d cached=%d cache_creation=%d model=%s",
input_tokens,
output_tokens,
cached_tokens,
@@ -2361,8 +2501,7 @@ class LiteLLMProvider(LLMProvider):
)
logger.debug(
"[tokens] finish event: input=%d output=%d cached=%d "
"cache_creation=%d stop=%s model=%s",
"[tokens] finish event: input=%d output=%d cached=%d cache_creation=%d stop=%s model=%s",
input_tokens,
output_tokens,
cached_tokens,
+151 -56
@@ -9,47 +9,65 @@
"label": "Haiku 4.5 - Fast + cheap",
"recommended": false,
"max_tokens": 64000,
"max_context_tokens": 136000
"max_context_tokens": 136000,
"supports_vision": true
},
{
"id": "claude-sonnet-4-5-20250929",
"label": "Sonnet 4.5 - Best balance",
"recommended": false,
"max_tokens": 64000,
"max_context_tokens": 136000
"max_context_tokens": 136000,
"supports_vision": true
},
{
"id": "claude-opus-4-6",
"label": "Opus 4.6 - Most capable",
"recommended": true,
"max_tokens": 128000,
"max_context_tokens": 872000
"max_context_tokens": 872000,
"supports_vision": true
}
]
},
"openai": {
"default_model": "gpt-5.4",
"default_model": "gpt-5.5",
"models": [
{
"id": "gpt-5.4",
"label": "GPT-5.4 - Best intelligence",
"id": "gpt-5.5",
"label": "GPT-5.5 - Frontier coding + reasoning",
"recommended": true,
"max_tokens": 128000,
"max_context_tokens": 960000
"max_context_tokens": 1050000,
"pricing_usd_per_mtok": {
"input": 5.00,
"output": 30.00
},
"supports_vision": true
},
{
"id": "gpt-5.4",
"label": "GPT-5.4 - Previous flagship",
"recommended": false,
"max_tokens": 128000,
"max_context_tokens": 960000,
"supports_vision": true
},
{
"id": "gpt-5.4-mini",
"label": "GPT-5.4 Mini - Faster + cheaper",
"recommended": false,
"max_tokens": 128000,
"max_context_tokens": 400000
"max_context_tokens": 400000,
"supports_vision": true
},
{
"id": "gpt-5.4-nano",
"label": "GPT-5.4 Nano - Cheapest high-volume",
"recommended": false,
"max_tokens": 128000,
"max_context_tokens": 400000
"max_context_tokens": 400000,
"supports_vision": true
}
]
},
@@ -61,14 +79,16 @@
"label": "Gemini 3 Flash - Fast",
"recommended": false,
"max_tokens": 32768,
"max_context_tokens": 240000
"max_context_tokens": 240000,
"supports_vision": true
},
{
"id": "gemini-3.1-pro-preview-customtools",
"label": "Gemini 3.1 Pro - Best quality",
"recommended": true,
"max_tokens": 32768,
"max_context_tokens": 240000
"max_context_tokens": 240000,
"supports_vision": true
}
]
},
@@ -80,28 +100,32 @@
"label": "GPT-OSS 120B - Best reasoning",
"recommended": true,
"max_tokens": 65536,
"max_context_tokens": 131072
"max_context_tokens": 131072,
"supports_vision": false
},
{
"id": "openai/gpt-oss-20b",
"label": "GPT-OSS 20B - Fast + cheaper",
"recommended": false,
"max_tokens": 65536,
"max_context_tokens": 131072
"max_context_tokens": 131072,
"supports_vision": false
},
{
"id": "llama-3.3-70b-versatile",
"label": "Llama 3.3 70B - General purpose",
"recommended": false,
"max_tokens": 32768,
"max_context_tokens": 131072
"max_context_tokens": 131072,
"supports_vision": false
},
{
"id": "llama-3.1-8b-instant",
"label": "Llama 3.1 8B - Fastest",
"recommended": false,
"max_tokens": 131072,
"max_context_tokens": 131072
"max_context_tokens": 131072,
"supports_vision": false
}
]
},
@@ -113,21 +137,24 @@
"label": "GPT-OSS 120B - Best production reasoning",
"recommended": true,
"max_tokens": 40960,
"max_context_tokens": 131072
"max_context_tokens": 131072,
"supports_vision": false
},
{
"id": "zai-glm-4.7",
"label": "Z.ai GLM 4.7 - Strong coding preview",
"recommended": true,
"max_tokens": 40960,
"max_context_tokens": 131072
"max_context_tokens": 131072,
"supports_vision": false
},
{
"id": "qwen-3-235b-a22b-instruct-2507",
"label": "Qwen 3 235B Instruct - Frontier preview",
"recommended": false,
"max_tokens": 40960,
"max_context_tokens": 131072
"max_context_tokens": 131072,
"supports_vision": false
}
]
},
@@ -139,14 +166,20 @@
"label": "MiniMax M2.7 - Best coding quality",
"recommended": true,
"max_tokens": 40960,
"max_context_tokens": 180000
"max_context_tokens": 180000,
"pricing_usd_per_mtok": {
"input": 0.30,
"output": 1.20
},
"supports_vision": false
},
{
"id": "MiniMax-M2.5",
"label": "MiniMax M2.5 - Strong value",
"recommended": false,
"max_tokens": 40960,
"max_context_tokens": 180000
"max_context_tokens": 180000,
"supports_vision": false
}
]
},
@@ -158,28 +191,32 @@
"label": "Mistral Large 3 - Best quality",
"recommended": true,
"max_tokens": 32768,
"max_context_tokens": 256000
"max_context_tokens": 256000,
"supports_vision": true
},
{
"id": "mistral-medium-2508",
"label": "Mistral Medium 3.1 - Balanced",
"recommended": false,
"max_tokens": 32768,
"max_context_tokens": 128000
"max_context_tokens": 128000,
"supports_vision": true
},
{
"id": "mistral-small-2603",
"label": "Mistral Small 4 - Fast + capable",
"recommended": false,
"max_tokens": 32768,
"max_context_tokens": 256000
"max_context_tokens": 256000,
"supports_vision": true
},
{
"id": "codestral-2508",
"label": "Codestral - Coding specialist",
"recommended": false,
"max_tokens": 32768,
"max_context_tokens": 128000
"max_context_tokens": 128000,
"supports_vision": false
}
]
},
@@ -191,47 +228,71 @@
"label": "DeepSeek V3.1 - Best general coding",
"recommended": true,
"max_tokens": 32768,
"max_context_tokens": 128000
"max_context_tokens": 128000,
"supports_vision": false
},
{
"id": "Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8",
"label": "Qwen3 Coder 480B - Advanced coding",
"recommended": false,
"max_tokens": 32768,
"max_context_tokens": 262144
"max_context_tokens": 262144,
"supports_vision": false
},
{
"id": "openai/gpt-oss-120b",
"label": "GPT-OSS 120B - Strong reasoning",
"recommended": false,
"max_tokens": 32768,
"max_context_tokens": 128000
"max_context_tokens": 128000,
"supports_vision": false
},
{
"id": "meta-llama/Llama-3.3-70B-Instruct-Turbo",
"label": "Llama 3.3 70B Turbo - Fast baseline",
"recommended": false,
"max_tokens": 32768,
"max_context_tokens": 131072
"max_context_tokens": 131072,
"supports_vision": false
}
]
},
"deepseek": {
"default_model": "deepseek-chat",
"default_model": "deepseek-v4-pro",
"models": [
{
"id": "deepseek-chat",
"label": "DeepSeek Chat - Fast default",
"id": "deepseek-v4-pro",
"label": "DeepSeek V4 Pro - Most capable",
"recommended": true,
"max_tokens": 8192,
"max_context_tokens": 128000
"max_tokens": 384000,
"max_context_tokens": 1000000,
"pricing_usd_per_mtok": {
"input": 1.74,
"output": 3.48,
"cache_read": 0.145
},
"supports_vision": false
},
{
"id": "deepseek-v4-flash",
"label": "DeepSeek V4 Flash - Fast + cheap",
"recommended": true,
"max_tokens": 384000,
"max_context_tokens": 1000000,
"pricing_usd_per_mtok": {
"input": 0.14,
"output": 0.28,
"cache_read": 0.028
},
"supports_vision": false
},
{
"id": "deepseek-reasoner",
"label": "DeepSeek Reasoner - Deep thinking",
"label": "DeepSeek Reasoner - Legacy (deprecating)",
"recommended": false,
"max_tokens": 64000,
"max_context_tokens": 128000
"max_context_tokens": 128000,
"supports_vision": false
}
]
},
@@ -243,7 +304,13 @@
"label": "Kimi K2.5 - Best coding",
"recommended": true,
"max_tokens": 32768,
"max_context_tokens": 200000
"max_context_tokens": 200000,
"pricing_usd_per_mtok": {
"input": 0.60,
"output": 2.50,
"cache_read": 0.15
},
"supports_vision": true
}
]
},
@@ -255,21 +322,30 @@
"label": "Queen - Hive native",
"recommended": true,
"max_tokens": 32768,
"max_context_tokens": 180000
"max_context_tokens": 180000,
"supports_vision": false
},
{
"id": "kimi-2.5",
"id": "kimi-k2.5",
"label": "Kimi 2.5 - Via Hive",
"recommended": false,
"max_tokens": 32768,
"max_context_tokens": 240000
"max_context_tokens": 240000,
"supports_vision": true
},
{
"id": "GLM-5",
"label": "GLM-5 - Via Hive",
"id": "glm-5.1",
"label": "GLM-5.1 - Via Hive",
"recommended": false,
"max_tokens": 32768,
"max_context_tokens": 180000
"max_context_tokens": 180000,
"pricing_usd_per_mtok": {
"input": 1.40,
"output": 4.40,
"cache_read": 0.26,
"cache_creation": 0.0
},
"supports_vision": false
}
]
},
@@ -281,63 +357,82 @@
"label": "GPT-5.4 - Best overall",
"recommended": true,
"max_tokens": 128000,
"max_context_tokens": 872000
"max_context_tokens": 872000,
"supports_vision": true
},
{
"id": "anthropic/claude-sonnet-4.6",
"label": "Claude Sonnet 4.6 - Best coding balance",
"recommended": false,
"max_tokens": 64000,
"max_context_tokens": 872000
"max_context_tokens": 872000,
"supports_vision": true
},
{
"id": "anthropic/claude-opus-4.6",
"label": "Claude Opus 4.6 - Most capable",
"recommended": false,
"max_tokens": 128000,
"max_context_tokens": 872000
"max_context_tokens": 872000,
"supports_vision": true
},
{
"id": "google/gemini-3.1-pro-preview-customtools",
"label": "Gemini 3.1 Pro Preview - Long-context reasoning",
"recommended": false,
"max_tokens": 32768,
"max_context_tokens": 872000
"max_context_tokens": 872000,
"supports_vision": true
},
{
"id": "qwen/qwen3.6-plus",
"label": "Qwen 3.6 Plus - Strong reasoning",
"recommended": true,
"max_tokens": 32768,
"max_context_tokens": 240000
"max_context_tokens": 240000,
"supports_vision": false
},
{
"id": "z-ai/glm-5v-turbo",
"label": "GLM-5V Turbo - Vision capable",
"recommended": true,
"max_tokens": 32768,
"max_context_tokens": 192000
"max_context_tokens": 192000,
"supports_vision": true
},
{
"id": "z-ai/glm-5.1",
"label": "GLM-5.1 - Better but Slower",
"recommended": true,
"max_tokens": 40960,
"max_context_tokens": 192000
"max_context_tokens": 192000,
"pricing_usd_per_mtok": {
"input": 1.40,
"output": 4.40,
"cache_read": 0.26,
"cache_creation": 0.0
},
"supports_vision": false
},
{
"id": "minimax/minimax-m2.7",
"label": "Minimax M2.7 - Minimax flagship",
"recommended": false,
"max_tokens": 40960,
"max_context_tokens": 180000
"max_context_tokens": 180000,
"pricing_usd_per_mtok": {
"input": 0.30,
"output": 1.20
},
"supports_vision": false
},
{
"id": "xiaomi/mimo-v2-pro",
"label": "MiMo V2 Pro - Xiaomi multimodal",
"recommended": true,
"max_tokens": 64000,
"max_context_tokens": 240000
"max_context_tokens": 240000,
"supports_vision": true
}
]
}
@@ -352,7 +447,7 @@
"zai_code": {
"provider": "openai",
"api_key_env_var": "ZAI_API_KEY",
"model": "glm-5",
"model": "glm-5.1",
"max_tokens": 32768,
"max_context_tokens": 180000,
"api_base": "https://api.z.ai/api/coding/paas/v4"
@@ -394,13 +489,13 @@
"recommended": true
},
{
"id": "kimi-2.5",
"label": "kimi-2.5",
"id": "kimi-k2.5",
"label": "kimi-k2.5",
"recommended": false
},
{
"id": "GLM-5",
"label": "GLM-5",
"id": "glm-5.1",
"label": "glm-5.1",
"recommended": false
}
]
+77
@@ -27,6 +27,28 @@ def _require_list(value: Any, path: str) -> list[Any]:
return value
_PRICING_KEYS = ("input", "output", "cache_read", "cache_creation")
def _validate_pricing(value: Any, path: str) -> None:
"""Validate an optional ``pricing_usd_per_mtok`` block.
Keys are USD-per-million-tokens rates. ``input``/``output`` are required;
``cache_read``/``cache_creation`` are optional. All values must be
non-negative numbers. Used as a last-resort fallback when neither the
provider nor LiteLLM's catalog reports a cost.
"""
pricing = _require_mapping(value, path)
for key in ("input", "output"):
if key not in pricing:
raise ModelCatalogError(f"{path}.{key} is required")
for key, rate in pricing.items():
if key not in _PRICING_KEYS:
raise ModelCatalogError(f"{path}.{key} is not a recognized pricing field")
if not isinstance(rate, (int, float)) or isinstance(rate, bool) or rate < 0:
raise ModelCatalogError(f"{path}.{key} must be a non-negative number")
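A standalone sketch of these rules (simplified from the validator above; `ValueError` stands in for `ModelCatalogError`, and the helper name is illustrative):

```python
_PRICING_KEYS = ("input", "output", "cache_read", "cache_creation")

def validate_pricing(pricing: dict) -> None:
    # input/output are required; cache_* rates are optional.
    for key in ("input", "output"):
        if key not in pricing:
            raise ValueError(f"{key} is required")
    for key, rate in pricing.items():
        if key not in _PRICING_KEYS:
            raise ValueError(f"{key} is not a recognized pricing field")
        # bool is an int subclass, so exclude it explicitly.
        if not isinstance(rate, (int, float)) or isinstance(rate, bool) or rate < 0:
            raise ValueError(f"{key} must be a non-negative number")

validate_pricing({"input": 1.40, "output": 4.40, "cache_read": 0.26})  # ok
```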
def _validate_model_catalog(data: dict[str, Any]) -> dict[str, Any]:
providers = _require_mapping(data.get("providers"), "providers")
@@ -69,6 +91,14 @@ def _validate_model_catalog(data: dict[str, Any]) -> dict[str, Any]:
if not isinstance(value, int) or value <= 0:
raise ModelCatalogError(f"{model_path}.{key} must be a positive integer")
pricing = model_map.get("pricing_usd_per_mtok")
if pricing is not None:
_validate_pricing(pricing, f"{model_path}.pricing_usd_per_mtok")
supports_vision = model_map.get("supports_vision")
if supports_vision is not None and not isinstance(supports_vision, bool):
raise ModelCatalogError(f"{model_path}.supports_vision must be a boolean when present")
if not default_found:
raise ModelCatalogError(
f"{provider_path}.default_model={default_model!r} is not present in {provider_path}.models"
@@ -184,6 +214,53 @@ def get_model_limits(provider: str, model_id: str) -> tuple[int, int] | None:
return int(model["max_tokens"]), int(model["max_context_tokens"])
def get_model_pricing(model_id: str) -> dict[str, float] | None:
"""Return ``pricing_usd_per_mtok`` for a model id, searching all providers.
Returns ``None`` when the model is absent from the catalog or has no
pricing entry. Used by the cost-extraction fallback in ``litellm.py``
when the provider response and LiteLLM's catalog both come up empty.
"""
if not model_id:
return None
for provider_info in load_model_catalog()["providers"].values():
for model in provider_info["models"]:
if model["id"] == model_id:
pricing = model.get("pricing_usd_per_mtok")
if pricing is None:
return None
return {key: float(rate) for key, rate in pricing.items()}
return None
def model_supports_vision(model_id: str) -> bool:
"""Return whether *model_id* supports image inputs per the curated catalog.
Looks up the bare model id (and the provider-prefix-stripped form) in the
catalog. Returns the model's ``supports_vision`` flag when found, defaulting
to ``True`` for unknown models or when the flag is absent: hosted providers
are assumed vision-capable, since modern frontier models support images
by default and the captioning fallback is more expensive than just letting
the provider handle the image.
"""
if not model_id:
return True
candidates = [model_id]
if "/" in model_id:
candidates.append(model_id.split("/", 1)[1])
for candidate in candidates:
for provider_info in load_model_catalog()["providers"].values():
for model in provider_info["models"]:
if model["id"] == candidate:
flag = model.get("supports_vision")
if isinstance(flag, bool):
return flag
return True
return True
def get_preset(preset_id: str) -> dict[str, Any] | None:
"""Return one preset entry."""
preset = load_model_catalog()["presets"].get(preset_id)
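The lookup order in `model_supports_vision` (bare id first, then the provider-prefix-stripped form, defaulting to vision-capable) can be sketched against a toy catalog; the `{provider: {"models": [...]}}` shape below mirrors the code above but is simplified:

```python
def supports_vision(model_id: str, catalog: dict) -> bool:
    if not model_id:
        return True
    candidates = [model_id]
    if "/" in model_id:
        candidates.append(model_id.split("/", 1)[1])  # strip "provider/" prefix
    for candidate in candidates:
        for provider in catalog.values():
            for model in provider["models"]:
                if model["id"] == candidate:
                    flag = model.get("supports_vision")
                    # Known model, missing flag: assume vision-capable.
                    return flag if isinstance(flag, bool) else True
    return True  # unknown model: assume vision-capable
```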
+17 -3
@@ -9,7 +9,7 @@ from datetime import UTC
from pathlib import Path
from typing import Any
from framework.config import get_hive_config, get_preferred_model
from framework.config import HIVE_HOME as _HIVE_HOME, get_hive_config, get_preferred_model
from framework.credentials.validation import (
ensure_credential_key_env as _ensure_credential_key_env,
)
@@ -558,7 +558,7 @@ ANTIGRAVITY_IDE_STATE_DB = (
# Linux fallback for the IDE state DB
ANTIGRAVITY_IDE_STATE_DB_LINUX = Path.home() / ".config" / "Antigravity" / "User" / "globalStorage" / "state.vscdb"
# Antigravity credentials stored by native OAuth implementation
ANTIGRAVITY_AUTH_FILE = Path.home() / ".hive" / "antigravity-accounts.json"
ANTIGRAVITY_AUTH_FILE = _HIVE_HOME / "antigravity-accounts.json"
ANTIGRAVITY_OAUTH_TOKEN_URL = "https://oauth2.googleapis.com/token"
_ANTIGRAVITY_TOKEN_LIFETIME_SECS = 3600 # Google access tokens expire in 1 hour
@@ -1389,7 +1389,7 @@ class AgentLoader:
)
if storage_path is None:
storage_path = Path.home() / ".hive" / "agents" / agent_path.name / worker_name
storage_path = _HIVE_HOME / "agents" / agent_path.name / worker_name
storage_path.mkdir(parents=True, exist_ok=True)
runner = cls(
@@ -1503,6 +1503,7 @@ class AgentLoader:
from framework.pipeline.stages.mcp_registry import McpRegistryStage
from framework.pipeline.stages.skill_registry import SkillRegistryStage
from framework.skills.config import SkillsConfig
from framework.skills.discovery import ExtraScope
configure_logging(level="INFO", format="auto")
@@ -1545,6 +1546,19 @@ class AgentLoader:
default_skills=getattr(self, "_agent_default_skills", None),
skills=getattr(self, "_agent_skills", None),
),
# Surface the colony's flat ``skills/`` directory as a
# ``colony_ui`` extra scope so SKILL.md files written there
# by ``create_colony`` (or the HTTP routes) are picked up
# with correct provenance. The legacy nested
# ``<colony>/.hive/skills/`` path is still picked up via
# project-scope auto-discovery (project_root above).
extra_scope_dirs=[
ExtraScope(
directory=self.agent_path / "skills",
label="colony_ui",
priority=3,
)
],
),
]
+172 -13
@@ -51,6 +51,14 @@ _DEFAULT_LOCAL_SERVERS: dict[str, dict[str, Any]] = {
"description": "File I/O: read, write, edit, search, list, run commands",
"args": ["run", "python", "files_server.py", "--stdio"],
},
"terminal-tools": {
"description": "Terminal capabilities",
"args": ["run", "python", "terminal_tools_server.py", "--stdio"],
},
"chart-tools": {
"description": "BI/financial chart + diagram rendering: ECharts, Mermaid",
"args": ["run", "python", "chart_tools_server.py", "--stdio"],
},
}
# Aliases that earlier versions of ensure_defaults wrote under the wrong name.
@@ -58,14 +66,22 @@ _DEFAULT_LOCAL_SERVERS: dict[str, dict[str, Any]] = {
# name so the active agents (queen, credential_tester) can find their tools.
_STALE_DEFAULT_ALIASES: dict[str, str] = {
"hive_tools": "hive-tools",
# 2026-04-30: shell-tools renamed to terminal-tools. Drop the stale name
# on next ensure_defaults() so the queen's allowlist (which now includes
# @server:terminal-tools) actually finds a server with the new name.
"terminal-tools": "shell-tools",
}
class MCPRegistry:
"""Manages local MCP server state in ~/.hive/mcp_registry/."""
"""Manages local MCP server state in $HIVE_HOME/mcp_registry/."""
def __init__(self, base_path: Path | None = None):
self._base = base_path or Path.home() / ".hive" / "mcp_registry"
if base_path is None:
from framework.config import HIVE_HOME
base_path = HIVE_HOME / "mcp_registry"
self._base = base_path
self._installed_path = self._base / "installed.json"
self._config_path = self._base / "config.json"
self._cache_dir = self._base / "cache"
@@ -73,7 +89,30 @@ class MCPRegistry:
# ── Initialization ──────────────────────────────────────────────
def initialize(self) -> None:
"""Create directory structure and default files if missing."""
"""Create directory structure, default files, and seed bundled servers.
Every read path (queen orchestrator, pipeline stage, CLI, routes)
calls this; keeping the seeding here means a fresh ``HIVE_HOME``
(e.g. the desktop's per-user dir under ``~/.config/Hive/users/<hash>/``
or ``~/Library/Application Support/Hive/users/<hash>/``) is always
populated with ``hive_tools`` / ``gcu-tools`` / ``files-tools`` /
``shell-tools`` before any agent code reads ``installed.json``.
Without this, ``load_agent_selection()`` resolves an empty registry
and emits "Server X requested but not installed" warnings even
though the server is bundled.
Idempotent: already-installed entries are left untouched.
"""
self._bootstrap_io()
self._seed_defaults()
def _bootstrap_io(self) -> None:
"""Create the registry directory + empty config/installed files.
Split out from ``initialize()`` so ``_seed_defaults()`` can call it
without re-entering the seeding logic (which would recurse via
``_read_installed()`` -> ``initialize()``).
"""
self._base.mkdir(parents=True, exist_ok=True)
self._cache_dir.mkdir(parents=True, exist_ok=True)
@@ -84,21 +123,33 @@ class MCPRegistry:
self._write_json(self._installed_path, {"servers": {}})
def ensure_defaults(self) -> list[str]:
"""Seed the built-in local MCP servers (hive-tools, gcu-tools, files-tools).
"""Public alias kept for the ``hive mcp init`` CLI command.
Idempotent: servers already present are left untouched. Skips seeding
entirely when the source-tree ``tools/`` directory cannot be located
(e.g. when Hive is installed from a wheel rather than a checkout).
Returns the list of names that were newly registered.
Returns the list of newly-registered server names so the CLI can
print them. Same idempotent seeding logic as ``initialize()``.
"""
self.initialize()
self._bootstrap_io()
return self._seed_defaults()
def _seed_defaults(self) -> list[str]:
"""Idempotently register the bundled default local servers.
Skips entirely when the source-tree ``tools/`` directory cannot
be located (e.g. wheel installs). Returns the list of names that
were newly registered.
Also runs a self-heal pass over already-registered defaults: if an
entry's stdio cwd is unreachable on this machine (e.g. the registry
was copied from another developer's box and points at their
``/Users/<them>/...`` path), the entry is overwritten with the
canonical config so the queen can actually spawn it. The user's
``enabled`` toggle and ``overrides`` are preserved.
"""
# parents: [0]=loader, [1]=framework, [2]=core, [3]=repo root
tools_dir = Path(__file__).resolve().parents[3] / "tools"
if not tools_dir.is_dir():
logger.debug(
"MCPRegistry.ensure_defaults: tools dir %s missing; skipping default seed",
"MCPRegistry._seed_defaults: tools dir %s missing; skipping default seed",
tools_dir,
)
return []
@@ -115,14 +166,37 @@ class MCPRegistry:
for canonical, stale in _STALE_DEFAULT_ALIASES.items():
if stale in existing and canonical not in existing:
logger.info(
"MCPRegistry.ensure_defaults: removing stale alias '%s' (canonical: '%s')",
"MCPRegistry._seed_defaults: removing stale alias '%s' (canonical: '%s')",
stale,
canonical,
)
del existing[stale]
mutated = True
repaired: list[str] = []
for name, spec in _DEFAULT_LOCAL_SERVERS.items():
entry = existing.get(name)
if entry is None:
continue
if self._default_entry_runnable(entry, tools_dir, list(spec["args"])):
continue
existing[name] = self._build_default_entry(
name=name,
spec=spec,
cwd=cwd,
preserve_from=entry,
)
repaired.append(name)
mutated = True
if mutated:
self._write_installed(data)
if repaired:
logger.warning(
"MCPRegistry._seed_defaults: repaired %d default server(s) with unreachable cwd/script: %s",
len(repaired),
repaired,
)
for name, spec in _DEFAULT_LOCAL_SERVERS.items():
if name in existing:
@@ -138,12 +212,97 @@ class MCPRegistry:
)
added.append(name)
except MCPError as exc:
logger.warning("MCPRegistry.ensure_defaults: failed to seed '%s': %s", name, exc)
logger.warning("MCPRegistry._seed_defaults: failed to seed '%s': %s", name, exc)
if added:
logger.info("MCPRegistry: seeded default local servers: %s", added)
return added
@staticmethod
def _default_entry_runnable(entry: dict, tools_dir: Path, canonical_args: list[str]) -> bool:
"""Return True iff ``entry`` can plausibly be spawned on this machine.
Checks:
- transport is stdio (only stdio defaults exist today; non-stdio
gets a free pass since we have nothing to compare against)
- stdio.cwd is an existing directory
- the entry script (the first ``.py`` arg, e.g. ``files_server.py``)
exists relative to that cwd
We deliberately do NOT spawn the subprocess here; this runs on
every read path and must be cheap. A filesystem reachability
check catches the cross-machine `cwd` drift that is the common
failure, without flapping on transient runtime errors.
"""
transport = entry.get("transport") or "stdio"
if transport != "stdio":
return True
manifest = entry.get("manifest") or {}
stdio = manifest.get("stdio") or {}
cwd_str = stdio.get("cwd")
if not cwd_str:
return False
cwd_path = Path(cwd_str)
if not cwd_path.is_dir():
return False
# Find the script: the first arg ending in .py, falling back to the
# canonical spec if the registered args are unrecognizable. Modules
# invoked via `python -m foo.bar` (no .py arg) are accepted as long
# as the cwd exists — we can't cheaply prove the module imports.
registered_args = stdio.get("args") or []
script: str | None = next(
(a for a in registered_args if isinstance(a, str) and a.endswith(".py")),
None,
)
if script is None:
script = next(
(a for a in canonical_args if isinstance(a, str) and a.endswith(".py")),
None,
)
if script is None:
return True
return (cwd_path / script).is_file()
@classmethod
def _build_default_entry(
cls,
*,
name: str,
spec: dict[str, Any],
cwd: str,
preserve_from: dict | None,
) -> dict:
"""Construct a fresh canonical entry for a default server.
When ``preserve_from`` is provided, carries over the user's
``enabled`` flag and ``overrides`` so a deliberate disable or
custom env var survives the repair.
"""
manifest = {
"name": name,
"description": spec["description"],
"transport": {"supported": ["stdio"], "default": "stdio"},
"stdio": {
"command": "uv",
"args": list(spec["args"]),
"env": {},
"cwd": cwd,
},
}
entry = cls._make_entry(
source="local",
manifest=manifest,
transport="stdio",
installed_by="hive mcp init (auto-repair)",
)
if preserve_from is not None:
if "enabled" in preserve_from:
entry["enabled"] = bool(preserve_from["enabled"])
prior_overrides = preserve_from.get("overrides")
if isinstance(prior_overrides, dict):
entry["overrides"] = prior_overrides
return entry
# ── Internal I/O ────────────────────────────────────────────────
def _read_installed(self) -> dict:
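The reachability probe described in `_default_entry_runnable` boils down to a cheap filesystem check; a minimal standalone sketch (the real method also falls back to the canonical spec's args):

```python
from pathlib import Path

def stdio_entry_runnable(entry: dict) -> bool:
    # Non-stdio transports get a free pass: nothing to compare against.
    if (entry.get("transport") or "stdio") != "stdio":
        return True
    stdio = (entry.get("manifest") or {}).get("stdio") or {}
    cwd = stdio.get("cwd")
    if not cwd or not Path(cwd).is_dir():
        return False  # cross-machine cwd drift: entry needs repair
    script = next(
        (a for a in stdio.get("args", []) if isinstance(a, str) and a.endswith(".py")),
        None,
    )
    # `python -m foo.bar` style (no .py arg) is accepted once cwd exists:
    # we can't cheaply prove the module imports.
    return script is None or (Path(cwd) / script).is_file()
```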
+23 -30
@@ -71,25 +71,36 @@ class ToolRegistry:
{
# File system reads
"read_file",
"list_directory",
"grep",
"glob",
# Web reads
"web_search",
"web_fetch",
"search_files",
"pdf_read",
# Terminal reads (rg / find / output buffer polling — none of which
# changes process state)
"terminal_rg",
"terminal_find",
"terminal_output_get",
# Web / research reads (re-issuable, side-effect-free fetches)
"web_scrape",
"search_papers",
"search_wikipedia",
"download_paper",
# Browser read-only snapshots (mutate-free observations)
"browser_screenshot",
"browser_snapshot",
"browser_console",
"browser_get_text",
# Background bash polling - reads output buffers only, does
# not touch the subprocess itself.
"bash_output",
"browser_html",
"browser_get_attribute",
"browser_get_rect",
}
)
# Credential directory used for change detection
_CREDENTIAL_DIR = Path("~/.hive/credentials/credentials").expanduser()
# Credential directory used for change detection. Resolved at attribute
# access so HIVE_HOME overrides (set by the desktop) are honoured.
@property
def _CREDENTIAL_DIR(self) -> Path:
from framework.config import HIVE_HOME
return HIVE_HOME / "credentials" / "credentials"
def __init__(self):
self._tools: dict[str, RegisteredTool] = {}
@@ -457,7 +468,7 @@ class ToolRegistry:
else:
resolved_cwd = (base_dir / cwd).resolve()
# Find .py script in args (e.g. coder_tools_server.py, files_server.py)
# Find .py script in args (e.g. files_server.py)
script_name = None
for i, arg in enumerate(args):
if isinstance(arg, str) and arg.endswith(".py"):
@@ -497,24 +508,6 @@ class ToolRegistry:
config["cwd"] = str(resolved_cwd)
return config
# For coder_tools_server, inject --project-root so reads land
# in the expected workspace (hive repo, for framework skills
# and docs), and inject --write-root so writes land under
# ~/.hive/workspace/ instead of polluting the git checkout
# with queen-authored skills, ledgers, and scripts. Without
# the split, every ``write_file`` call from the queen landed
# in the hive repo root.
if script_name and "coder_tools" in script_name:
project_root = str(resolved_cwd.parent.resolve())
args = list(args)
if "--project-root" not in args:
args.extend(["--project-root", project_root])
if "--write-root" not in args:
_write_root = Path.home() / ".hive" / "workspace"
_write_root.mkdir(parents=True, exist_ok=True)
args.extend(["--write-root", str(_write_root)])
config["args"] = args
if os.name == "nt":
# Windows: cwd=None avoids WinError 267; use absolute script path
config["cwd"] = None
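The script-resolution step above (pick the first `.py` argument; on Windows drop `cwd` and absolutize the script path to avoid WinError 267) can be sketched as follows — an illustrative simplification, with assumed names and arg handling:

```python
from pathlib import Path

def resolve_stdio_config(args: list[str], cwd: Path, windows: bool) -> dict:
    config = {"args": list(args), "cwd": str(cwd)}
    # Find the entry script: the first arg ending in .py.
    script = next((a for a in args if isinstance(a, str) and a.endswith(".py")), None)
    if windows and script is not None:
        # Windows: cwd=None avoids WinError 267; point at the absolute
        # script path instead so the subprocess still finds it.
        i = config["args"].index(script)
        config["args"][i] = str((cwd / script).resolve())
        config["cwd"] = None
    return config
```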
-2
@@ -29,9 +29,7 @@ _ALWAYS_AVAILABLE_TOOLS: frozenset[str] = frozenset(
"read_file",
"write_file",
"edit_file",
"list_directory",
"search_files",
"hashline_edit",
"set_output",
"escalate",
}
+5 -4
@@ -35,7 +35,7 @@ Follow these rules for reliable, efficient browser interaction.
Use snapshot first for structure and ordinary controls; switch to
screenshot when snapshot can't find or verify the target. Interaction
tools (`browser_click`, `browser_type`, `browser_type_focused`,
`browser_fill`, `browser_scroll`) wait 0.5 s for the page to settle
`browser_scroll`) wait 0.5 s for the page to settle
after a successful action, then attach a fresh snapshot under the
`snapshot` key of their result, so don't call `browser_snapshot`
separately after an interaction unless you need a newer view. Tune
@@ -140,8 +140,9 @@ shortcut dispatcher requires both), then releases in reverse order.
## Tab management
Close tabs as soon as you're done with them — not only at the end of
the task. `browser_close(target_id=...)` for one, `browser_close_finished()`
for a full cleanup. Never accumulate more than 3 open tabs.
the task. Use `browser_close(tab_id=...)` (or no arg to close the
active tab); call it for each tab when cleaning up after a multi-tab
workflow. Never accumulate more than 3 open tabs.
`browser_tabs` reports an `origin` field: `"agent"` (you own it, close
when done), `"popup"` (close after extracting), `"startup"`/`"user"`
(leave alone).
@@ -157,7 +158,7 @@ cookie consent banners if they block content.
- If `browser_snapshot` fails, try `browser_get_text` with a narrow
selector as fallback.
- If `browser_open` fails or the page seems stale, `browser_stop`
then `browser_start`, then retry.
then `browser_open(url)` to lazy-create a fresh context.
## `browser_evaluate`
+8 -9
@@ -331,10 +331,10 @@ class Orchestrator:
# Strip tool names that aren't registered in this runtime instead of
# hard-failing. The worker is forked from the queen's tool snapshot
# which may include MCP tools the worker's runtime doesn't load (e.g.
# coder-tools agent-management tools). Blocking the worker on missing
# tools leaves the queen stranded mid-task; stripping + warning lets
# the worker proceed with what it does have.
# which may include MCP tools the worker's runtime doesn't load.
# Blocking the worker on missing tools leaves the queen stranded
# mid-task; stripping + warning lets the worker proceed with what
# it does have.
for node in graph.nodes:
if node.id not in reachable:
continue
@@ -683,11 +683,10 @@ class Orchestrator:
# Set per-execution data_dir and agent_id so data tools and
# spillover files share the same session-scoped directory, and
# so MCP tools whose server-side schemas mark agent_id as a
# required field (list_dir, hashline_edit, replace_file_content,
# execute_command_tool, …) get a valid value injected even on
# registry instances where agent_loader.setup() didn't populate
# the session_context. Without this, FastMCP rejects those
# calls with "agent_id is a required property".
# required field get a valid value injected even on registry
# instances where agent_loader.setup() didn't populate the
# session_context. Without this, FastMCP rejects those calls
# with "agent_id is a required property".
_ctx_token = None
if self._storage_path:
from framework.loader.tool_registry import ToolRegistry
@@ -44,6 +44,9 @@ class McpRegistryStage(PipelineStage):
from framework.loader.mcp_registry import MCPRegistry
from framework.orchestrator.files import FILES_MCP_SERVER_NAME
# Bundled defaults (hive_tools / gcu-tools / files-tools / shell-tools)
# are seeded inside MCPRegistry.initialize(); resolve_for_agent below
# will find them even on a fresh HIVE_HOME.
registry = MCPRegistry()
mcp_loaded = False
@@ -26,11 +26,15 @@ class SkillRegistryStage(PipelineStage):
project_root: str | Path | None = None,
interactive: bool = True,
skills_config: Any = None,
extra_scope_dirs: list[Any] | None = None,
**kwargs: Any,
) -> None:
self._project_root = Path(project_root) if project_root else None
self._interactive = interactive
self._skills_config = skills_config
# Optional list of ExtraScope entries layered between user and
# project scope (e.g. ``colony_ui`` for a colony agent's skills/).
self._extra_scope_dirs = list(extra_scope_dirs) if extra_scope_dirs else []
self.skills_manager: Any = None
async def initialize(self) -> None:
@@ -41,6 +45,7 @@ class SkillRegistryStage(PipelineStage):
skills_config=self._skills_config or SkillsConfig(),
project_root=self._project_root,
interactive=self._interactive,
extra_scope_dirs=self._extra_scope_dirs,
)
self.skills_manager = SkillsManager(config)
self.skills_manager.load()
+11
@@ -155,6 +155,17 @@ class SessionState(BaseModel):
# True after first successful worker execution (gates trigger delivery on restart)
worker_configured: bool = Field(default=False)
# Task-system fields (see framework/tasks).
# task_list_id: this session's own task list id (populated on first
# task_create; immutable thereafter). Used for resume reattachment —
# if it differs from resolve_task_list_id(ctx) on resume, a
# TASK_LIST_REATTACH_MISMATCH event is emitted and a fresh list is
# created at the resolved id (the orphan stays on disk).
task_list_id: str | None = None
# picked_up_from: for worker sessions, the (colony_task_list_id,
# template_task_id) pair this session was spawned for.
picked_up_from: list[Any] | None = None
model_config = {"extra": "allow"}
@property
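The resume-reattachment rule described for `task_list_id` can be sketched as a small pure function — a hypothetical simplification; the real event emission and list creation live in the task system, and all names here are illustrative:

```python
def reattach_task_list(session_task_list_id, resolved_id, emit):
    """On resume: if the session's recorded list id no longer matches the
    resolved id, emit a mismatch event and adopt the resolved id. The
    orphaned list is left on disk untouched."""
    if session_task_list_id is not None and session_task_list_id != resolved_id:
        emit("TASK_LIST_REATTACH_MISMATCH",
             {"old": session_task_list_id, "new": resolved_id})
    return resolved_id
```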
+76 -11
@@ -1,5 +1,6 @@
"""aiohttp Application factory for the Hive HTTP API server."""
import hmac
import logging
import os
from pathlib import Path
@@ -21,7 +22,9 @@ _ALLOWED_AGENT_ROOTS: tuple[Path, ...] | None = None
def _has_encrypted_credentials() -> bool:
"""Return True when an encrypted credential store already exists on disk."""
cred_dir = Path.home() / ".hive" / "credentials" / "credentials"
from framework.config import HIVE_HOME
cred_dir = HIVE_HOME / "credentials" / "credentials"
return cred_dir.is_dir() and any(cred_dir.glob("*.enc"))
@@ -30,17 +33,18 @@ def _get_allowed_agent_roots() -> tuple[Path, ...]:
Roots are anchored to the repository root (derived from ``__file__``)
so the allowlist is correct regardless of the process's working
directory.
directory. The hive-home subtrees honour ``HIVE_HOME`` so the desktop's
per-user root is allowed in addition to (or instead of) ``~/.hive``.
"""
global _ALLOWED_AGENT_ROOTS
if _ALLOWED_AGENT_ROOTS is None:
from framework.config import COLONIES_DIR
from framework.config import COLONIES_DIR, HIVE_HOME
_ALLOWED_AGENT_ROOTS = (
COLONIES_DIR.resolve(), # ~/.hive/colonies/
COLONIES_DIR.resolve(), # $HIVE_HOME/colonies/
(_REPO_ROOT / "exports").resolve(), # compat fallback
(_REPO_ROOT / "examples").resolve(),
(Path.home() / ".hive" / "agents").resolve(),
(HIVE_HOME / "agents").resolve(),
)
return _ALLOWED_AGENT_ROOTS
@@ -62,7 +66,8 @@ def validate_agent_path(agent_path: str | Path) -> Path:
if resolved.is_relative_to(root) and resolved != root:
return resolved
raise ValueError(
"agent_path must be inside an allowed directory (~/.hive/colonies/, exports/, examples/, or ~/.hive/agents/)"
"agent_path must be inside an allowed directory "
"($HIVE_HOME/colonies/, exports/, examples/, or $HIVE_HOME/agents/)"
)
@@ -94,13 +99,15 @@ def resolve_session(request: web.Request):
def sessions_dir(session: Session) -> Path:
"""Resolve the worker sessions directory for a session.
Storage layout: ~/.hive/agents/{agent_name}/sessions/
Storage layout: $HIVE_HOME/agents/{agent_name}/sessions/
Requires a worker to be loaded (worker_path must be set).
"""
if session.worker_path is None:
raise ValueError("No worker loaded — no worker sessions directory")
from framework.config import HIVE_HOME
agent_name = session.worker_path.name
return Path.home() / ".hive" / "agents" / agent_name / "sessions"
return HIVE_HOME / "agents" / agent_name / "sessions"
# Allowed CORS origins (localhost on any port)
@@ -159,6 +166,28 @@ async def no_cache_api_middleware(request: web.Request, handler):
return response
# ---------------------------------------------------------------------------
# Desktop shared-secret auth middleware.
#
# When the runtime is spawned by the Electron main process, a fresh random
# token is passed via ``HIVE_DESKTOP_TOKEN``. Every request from main must
# carry the matching ``X-Hive-Token`` header. If the env var is unset (e.g.
# running ``hive serve`` directly from a terminal), the check is skipped —
# OSS behaviour is preserved.
# ---------------------------------------------------------------------------
_EXPECTED_DESKTOP_TOKEN: str | None = os.environ.get("HIVE_DESKTOP_TOKEN") or None
@web.middleware
async def desktop_auth_middleware(request: web.Request, handler):
if _EXPECTED_DESKTOP_TOKEN is None:
return await handler(request)
provided = request.headers.get("X-Hive-Token", "")
if not hmac.compare_digest(provided, _EXPECTED_DESKTOP_TOKEN):
return web.json_response({"error": "unauthorized"}, status=401)
return await handler(request)
@web.middleware
async def error_middleware(request: web.Request, handler):
"""Catch exceptions and return JSON error responses.
@@ -287,7 +316,12 @@ def create_app(model: str | None = None) -> web.Application:
Returns:
Configured aiohttp Application ready to run.
"""
app = web.Application(middlewares=[cors_middleware, no_cache_api_middleware, error_middleware])
# Desktop mode: the runtime is always a subprocess of the Electron main
# process, which reaches it via IPC and the `hive://` custom protocol.
# There is no browser origin to authorize, so CORS is unnecessary.
# The auth middleware enforces the shared-secret token when the env var
# is set (i.e. when Electron spawned us); it is a no-op otherwise.
app = web.Application(middlewares=[desktop_auth_middleware, no_cache_api_middleware, error_middleware])
# Initialize credential store (before SessionManager so it can be shared)
from framework.credentials.store import CredentialStore
@@ -335,6 +369,18 @@ def create_app(model: str | None = None) -> web.Application:
queen_tool_registry=None,
)
# Clear orphaned compaction markers from prior server crashes. Without
# this, any session whose compaction was interrupted would block the
# next colony cold-load for the full await_completion timeout (180s)
# before falling through. See compaction_status.sweep_stale_in_progress.
try:
from framework.config import QUEENS_DIR
from framework.server import compaction_status
compaction_status.sweep_stale_in_progress(QUEENS_DIR)
except Exception:
logger.debug("compaction_status: startup sweep skipped", exc_info=True)
# Register shutdown hook
app.on_shutdown.append(_on_shutdown)
@@ -344,6 +390,7 @@ def create_app(model: str | None = None) -> web.Application:
app.router.add_get("/api/browser/status/stream", handle_browser_status_stream)
# Register route modules
from framework.server.routes_colonies import register_routes as register_colonies_routes
from framework.server.routes_colony_tools import register_routes as register_colony_tools_routes
from framework.server.routes_colony_workers import register_routes as register_colony_worker_routes
from framework.server.routes_config import register_routes as register_config_routes
@@ -358,6 +405,7 @@ def create_app(model: str | None = None) -> web.Application:
from framework.server.routes_queens import register_routes as register_queen_routes
from framework.server.routes_sessions import register_routes as register_session_routes
from framework.server.routes_skills import register_routes as register_skills_routes
from framework.server.routes_tasks import register_routes as register_task_routes
from framework.server.routes_workers import register_routes as register_worker_routes
register_config_routes(app)
@@ -370,14 +418,31 @@ def create_app(model: str | None = None) -> web.Application:
register_log_routes(app)
register_queen_routes(app)
register_queen_tools_routes(app)
register_colonies_routes(app)
register_colony_tools_routes(app)
register_mcp_routes(app)
register_colony_worker_routes(app)
register_prompt_routes(app)
register_skills_routes(app)
register_task_routes(app)
# Static file serving — Option C production mode
# If frontend/dist/ exists, serve built frontend files on /
# Commercial extensions (optional — only present in hive-desktop-runtime).
# Imports lazily so an OSS install without the `commercial` package keeps
# working unchanged.
try:
from commercial.middleware import setup_commercial_middleware
from commercial.routes import register_routes as register_commercial_routes
setup_commercial_middleware(app)
register_commercial_routes(app)
logger.info("Commercial extensions loaded")
except ImportError:
pass
# Serve the built frontend SPA (if frontend/dist exists) so hitting the
# API host in a browser loads the dashboard instead of 404'ing. In
# Electron/desktop mode the renderer still loads from file:// and
# ignores this; in dev mode Vite is used instead.
_setup_static_serving(app)
return app
@@ -147,3 +147,55 @@ async def await_completion(
)
return last
await asyncio.sleep(poll)
def sweep_stale_in_progress(queens_root: Path) -> int:
"""Rewrite any orphaned ``in_progress`` markers under ``queens_root`` to
``failed``. Returns the count of rewritten markers.
Whatever process owned the original compaction is gone (server crash,
SIGKILL, etc.), so leaving the marker at ``in_progress`` would cause every
subsequent colony cold-load for that queen session to wait the full
``await_completion`` timeout (default 180s) before falling through.
Called once during server bootstrap. Best-effort: any per-file failure is
logged and skipped; the sweep should never prevent the server from
coming up.
"""
if not queens_root.exists():
return 0
cleaned = 0
try:
for queen_dir in queens_root.iterdir():
if not queen_dir.is_dir():
continue
sessions_dir = queen_dir / "sessions"
if not sessions_dir.exists():
continue
try:
for session_dir in sessions_dir.iterdir():
if not session_dir.is_dir():
continue
status = get_status(session_dir)
if status is None or status.get("status") != "in_progress":
continue
mark_failed(session_dir, "server restarted while compaction was in progress")
cleaned += 1
except OSError:
logger.debug(
"compaction_status: sweep failed under %s",
sessions_dir,
exc_info=True,
)
except OSError:
logger.debug(
"compaction_status: sweep failed under %s",
queens_root,
exc_info=True,
)
if cleaned:
logger.info(
"compaction_status: cleared %d stale 'in_progress' marker(s) at startup",
cleaned,
)
return cleaned
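The sweep's behavior can be sketched in isolation. Everything below is hypothetical: the real helpers (`get_status`, `mark_failed`) live in the compaction_status module, so this sketch assumes a plain `status.json` marker per session dir:

```python
import json
from pathlib import Path

def sweep_stale(queens_root: Path) -> int:
    """Rewrite orphaned 'in_progress' markers to 'failed' (hypothetical layout)."""
    cleaned = 0
    for marker in queens_root.glob("*/sessions/*/status.json"):
        try:
            status = json.loads(marker.read_text())
        except (OSError, json.JSONDecodeError):
            continue  # best-effort: skip unreadable markers
        if status.get("status") != "in_progress":
            continue
        status["status"] = "failed"
        status["error"] = "server restarted while compaction was in progress"
        marker.write_text(json.dumps(status))
        cleaned += 1
    return cleaned
```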
+39 -21
@@ -371,13 +371,13 @@ async def create_queen(
_queen_role_independent,
_queen_role_reviewing,
_queen_role_working,
_queen_style,
_queen_tools_incubating,
_queen_tools_independent,
_queen_tools_reviewing,
_queen_tools_working,
finalize_queen_prompt,
)
from framework.config import get_max_tokens as _get_max_tokens
from framework.host.event_bus import AgentEvent, EventType
from framework.llm.capabilities import supports_image_tool_results
from framework.loader.mcp_registry import MCPRegistry
@@ -488,6 +488,21 @@ async def create_queen(
phase_state=phase_state,
)
# ---- Task system tools --------------------------------------------
# Every queen gets the four session task tools. Queens-of-colony
# additionally get the colony_template_* tools (gated by colony_id).
from framework.tasks.tools import (
register_colony_template_tools,
register_task_tools,
)
register_task_tools(queen_registry)
_colony_id_for_queen = getattr(session, "colony_id", None) or getattr(
getattr(session, "colony_runtime", None), "_colony_id", None
)
if _colony_id_for_queen:
register_colony_template_tools(queen_registry, colony_id=_colony_id_for_queen)
# ---- Colony runtime check (only when worker is loaded) ----------------
if session.colony_runtime:
from framework.tools.worker_monitoring_tools import register_worker_monitoring_tools
@@ -529,7 +544,7 @@ async def create_queen(
phase_state.incubating_tools = [t for t in queen_tools if t.name in incubating_names]
# Independent phase gets core tools + all MCP tools not claimed by any
# other phase (coder-tools file I/O, gcu-tools browser, etc.).
# other phase (files-tools file I/O, gcu-tools browser, etc.).
all_phase_names = independent_names | incubating_names | working_names | reviewing_names
mcp_tools = [t for t in queen_tools if t.name not in all_phase_names]
phase_state.independent_tools = [t for t in queen_tools if t.name in independent_names] + mcp_tools
@@ -631,7 +646,6 @@ async def create_queen(
(
_queen_character_core
+ _queen_role_independent
+ _queen_style
+ _queen_tools_independent
+ _queen_behavior_always
+ _queen_behavior_independent
@@ -639,27 +653,15 @@ async def create_queen(
_has_vision,
)
phase_state.prompt_incubating = finalize_queen_prompt(
(
_queen_character_core
+ _queen_role_incubating
+ _queen_style
+ _queen_tools_incubating
+ _queen_behavior_always
),
(_queen_character_core + _queen_role_incubating + _queen_tools_incubating + _queen_behavior_always),
_has_vision,
)
phase_state.prompt_working = finalize_queen_prompt(
(_queen_character_core + _queen_role_working + _queen_style + _queen_tools_working + _queen_behavior_always),
(_queen_character_core + _queen_role_working + _queen_tools_working + _queen_behavior_always),
_has_vision,
)
phase_state.prompt_reviewing = finalize_queen_prompt(
(
_queen_character_core
+ _queen_role_reviewing
+ _queen_style
+ _queen_tools_reviewing
+ _queen_behavior_always
),
(_queen_character_core + _queen_role_reviewing + _queen_tools_reviewing + _queen_behavior_always),
_has_vision,
)
@@ -919,10 +921,21 @@ async def create_queen(
# token stays local to this task.
try:
from framework.loader.tool_registry import ToolRegistry
from framework.tasks.scoping import session_task_list_id
ToolRegistry.set_execution_context(profile=session.id)
queen_agent_id = getattr(session, "agent_id", None) or "queen"
queen_list_id = session_task_list_id(queen_agent_id, session.id)
colony_id = getattr(session, "colony_id", None) or getattr(
getattr(session, "colony_runtime", None), "_colony_id", None
)
ToolRegistry.set_execution_context(
profile=session.id,
agent_id=queen_agent_id,
task_list_id=queen_list_id,
colony_id=colony_id,
)
except Exception:
logger.debug("Queen: failed to set browser profile for session %s", session.id, exc_info=True)
logger.debug("Queen: failed to set execution context for session %s", session.id, exc_info=True)
try:
lc = _queen_loop_config
queen_loop_config = LoopConfig(
@@ -970,7 +983,12 @@ async def create_queen(
llm=session.llm,
available_tools=queen_tools,
goal_context=queen_goal.to_prompt_context(),
max_tokens=lc.get("max_tokens", 8192),
# Honor configuration.json (llm.max_tokens) instead of
# hard-defaulting to 8192. The legacy fallback ignored both
# the user's saved ceiling AND the model's actual output
# capacity (e.g. glm-5.1 / kimi-k2.5 both support 32k out),
# which silently truncated long tool-emitting turns.
max_tokens=lc.get("max_tokens", _get_max_tokens()),
stream_id="queen",
execution_id=session.id,
dynamic_tools_provider=phase_state.get_current_tools,
+505
@@ -0,0 +1,505 @@
"""HTTP routes for colony import/export — moving a colony spec between hosts.
Today, just the import side: accept a `tar.gz` and unpack it into HIVE_HOME so
a desktop client (or any external mover) can hand a colony to a remote runtime
to run.
POST /api/colonies/import -- multipart/form-data
file required -- .tar / .tar.gz / .tar.bz2 / .tar.xz
name optional -- override the colony name (legacy single-root
archives only); defaults to the archive's
single top-level directory
replace_existing optional -- "true" to overwrite, else 409 on conflict
The desktop sends a *multi-root* tar so the queen sees a colony's full state
(not just metadata + data) on resume. Recognised top-level prefixes:
colonies/<name>/... HIVE_HOME/colonies/<name>/...
agents/<name>/worker/... HIVE_HOME/agents/<name>/worker/...
agents/queens/<queen>/sessions/<sid>/... HIVE_HOME/agents/queens/<queen>/sessions/<sid>/...
Anything outside those is rejected. For backwards compat with older clients
that tar `<name>/...` directly (single colony dir, no `colonies/` wrapper),
the handler falls back to the legacy single-root flow when no recognised
multi-root prefix is found.
"""
from __future__ import annotations
import io
import logging
import re
import shutil
import tarfile
from pathlib import Path
from aiohttp import web
from framework.config import COLONIES_DIR
logger = logging.getLogger(__name__)
# Matches the convention used elsewhere in the codebase (see
# routes_colony_workers and queen_lifecycle_tools): lowercase alphanumerics
# and underscores only. No dots, no slashes — names are filesystem segments.
_COLONY_NAME_RE = re.compile(r"^[a-z0-9_]+$")
# Conservative segment validator for the queen's session id (date-stamped UUID
# tail like ``session_20260415_175106_eca07a69``) and queen name slug
# (``queen_technology``). Same charset as colony names — the codebase already
# normalises both to ``[a-z0-9_]+`` everywhere they're created, so accepting
# a wider charset here would just introduce a foothold for path mischief.
_SESSION_SEGMENT_RE = re.compile(r"^[a-z0-9_]+$")
# 100 MB cap on upload size. The multi-root tar carries worker conversations
# (often 100s of small JSON parts) plus the queen's forked session, so the
# legacy 50 MB ceiling is too tight. Anything bigger probably shouldn't be
# pushed wholesale anyway.
_MAX_UPLOAD_BYTES = 100 * 1024 * 1024
def _agents_dir() -> Path:
"""``COLONIES_DIR`` resolves to ``HIVE_HOME/colonies``; ``agents/`` is
the sibling. Resolved per-call so tests that monkeypatch
``COLONIES_DIR`` propagate without a second patch."""
return Path(COLONIES_DIR).parent / "agents"
def _validate_colony_name(name: str) -> str | None:
"""Return an error message if name isn't a valid colony name, else None."""
if not name:
return "colony name is required"
if len(name) > 64:
return "colony name too long (max 64 chars)"
if not _COLONY_NAME_RE.match(name):
return "colony name must match [a-z0-9_]+"
return None
def _validate_session_segment(seg: str, label: str) -> str | None:
"""Validate a path segment we're going to plumb into a destination dir."""
if not seg:
return f"{label} is required"
if len(seg) > 128:
return f"{label} too long (max 128 chars)"
if not _SESSION_SEGMENT_RE.match(seg):
return f"{label} must match [a-zA-Z0-9_-]+"
return None
def _archive_top_level(tf: tarfile.TarFile) -> tuple[str | None, str | None]:
"""Find the archive's single top-level directory, if it has one.
Used only for the legacy single-root path. Returns ``(name, error)``.
Allows the archive to optionally include a leading ``./`` prefix.
"""
tops: set[str] = set()
for member in tf.getmembers():
if not member.name or member.name.startswith("/"):
return None, f"invalid member path: {member.name!r}"
parts = Path(member.name).parts
if not parts or parts[0] == "..":
return None, f"invalid member path: {member.name!r}"
first = parts[0] if parts[0] != "." else (parts[1] if len(parts) > 1 else "")
if first:
tops.add(first)
if len(tops) != 1:
return None, "archive must contain exactly one top-level directory"
return next(iter(tops)), None
def _has_multi_root_prefix(tf: tarfile.TarFile) -> bool:
"""True iff any member name starts with a recognised multi-root prefix.
The legacy shape (`<name>/...`) doesn't match either prefix, so this lets
us route old and new clients through the same endpoint.
"""
for member in tf.getmembers():
name = member.name
if name.startswith("./"):
name = name[2:]
if name.startswith("colonies/") or name.startswith("agents/"):
return True
return False
def _normalise_member_name(name: str) -> str:
"""Strip a leading ``./`` if present; reject absolute or empty names."""
if name.startswith("./"):
name = name[2:]
return name
def _safe_extract_tar(tf: tarfile.TarFile, dest: Path, *, strip_prefix: str) -> tuple[int, str | None]:
"""Extract every member of ``tf`` whose name starts with ``strip_prefix/``
into ``dest``, with the prefix stripped off.
Each member's resolved path must stay under ``dest``; symlinks, hardlinks,
and device/fifo entries are rejected. Returns ``(files_extracted, error)``;
on error the caller is responsible for cleanup.
Members outside ``strip_prefix`` are silently *skipped* (not an error) so
the caller can call this multiple times on the same tar with different
prefixes, once per recognised root.
"""
base = dest.resolve()
base.mkdir(parents=True, exist_ok=True)
files_extracted = 0
prefix_with_sep = f"{strip_prefix}/" if strip_prefix else ""
for member in tf.getmembers():
name = _normalise_member_name(member.name)
if not name:
continue
if strip_prefix:
if name == strip_prefix:
# The top-level dir entry itself; dest already exists.
continue
if not name.startswith(prefix_with_sep):
# Belongs to a different root in a multi-root tar; skip.
continue
rel = name[len(prefix_with_sep) :]
else:
rel = name
if not rel:
continue
if ".." in Path(rel).parts:
return files_extracted, f"path traversal in member: {member.name!r}"
if member.issym() or member.islnk():
return (
files_extracted,
f"symlinks/hardlinks not supported: {member.name!r}",
)
if member.isdev() or member.isfifo():
return (
files_extracted,
f"device/fifo not supported: {member.name!r}",
)
target = (base / rel).resolve()
try:
target.relative_to(base)
except ValueError:
return files_extracted, f"member escapes destination: {member.name!r}"
if member.isdir():
target.mkdir(parents=True, exist_ok=True)
continue
target.parent.mkdir(parents=True, exist_ok=True)
src = tf.extractfile(member)
if src is None:
return files_extracted, f"unsupported member: {member.name!r}"
with target.open("wb") as out:
shutil.copyfileobj(src, out)
target.chmod(member.mode & 0o755 if member.mode else 0o644)
files_extracted += 1
return files_extracted, None
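The containment test at the core of the walker (resolve the joined path, then require it to stay under the base) can be isolated as a small predicate; a sketch of the same pattern, not the function above:

```python
from pathlib import Path

def is_contained(base: Path, rel: str) -> bool:
    """True iff base/rel, fully resolved, stays under base."""
    target = (base / rel).resolve()
    try:
        target.relative_to(base.resolve())
        return True
    except ValueError:
        return False
```

`Path.relative_to` raises `ValueError` when the target lands outside `base`, which turns `a/../../escape`-style traversal into a clean rejection rather than a write outside the destination.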
def _classify_multi_root_member(name: str) -> tuple[str, str] | None:
"""Recognise a multi-root tar member and return ``(root, top_dir)``.
``root`` is one of ``"colonies"``, ``"agents_worker"``, ``"agents_queen"``;
``top_dir`` is the prefix to feed to ``_safe_extract_tar`` (the part of
the path that should be stripped before joining with the destination
base). Returns None for members that don't match any recognised root.
The caller pre-validates segments before extraction, so this is purely
structural: which root, what the strip prefix should be.
"""
parts = Path(name).parts
if not parts:
return None
if parts[0] == "colonies" and len(parts) >= 2:
return ("colonies", f"colonies/{parts[1]}")
if parts[0] == "agents" and len(parts) >= 2:
# agents/queens/<queen>/sessions/<sid>/... vs agents/<name>/worker/...
if parts[1] == "queens":
if len(parts) >= 5 and parts[3] == "sessions":
return ("agents_queen", f"agents/queens/{parts[2]}/sessions/{parts[4]}")
return None
# Plain agent — only the worker subtree is exported.
if len(parts) >= 3 and parts[2] == "worker":
return ("agents_worker", f"agents/{parts[1]}/worker")
return None
return None
def _plan_multi_root(
tf: tarfile.TarFile,
) -> tuple[dict[str, dict[str, str]], str | None]:
"""Walk the tar once and group entries by root.
Returns ``(groups, error)`` where ``groups`` is keyed by root kind
(``"colonies"`` etc.) and each entry maps the strip prefix to its
destination directory under HIVE_HOME. Validates name segments so we
bail before unpacking when something looks off.
"""
groups: dict[str, dict[str, str]] = {
"colonies": {},
"agents_worker": {},
"agents_queen": {},
}
seen_unrecognised: set[str] = set()
for member in tf.getmembers():
name = _normalise_member_name(member.name)
if not name or name.startswith("/") or ".." in Path(name).parts:
return groups, f"invalid member path: {member.name!r}"
classified = _classify_multi_root_member(name)
if classified is None:
# Track unique top-level dirs to give a useful error if nothing
# ended up classified.
seen_unrecognised.add(Path(name).parts[0])
continue
kind, prefix = classified
if prefix in groups[kind]:
continue
# Validate path segments per-kind so we never plumb dirty input into
# a destination we don't fully control.
prefix_parts = Path(prefix).parts
if kind == "colonies":
err = _validate_colony_name(prefix_parts[1])
if err:
return groups, err
dest = str(COLONIES_DIR / prefix_parts[1])
elif kind == "agents_worker":
err = _validate_colony_name(prefix_parts[1])
if err:
return groups, err
dest = str(_agents_dir() / prefix_parts[1] / "worker")
elif kind == "agents_queen":
queen, sid = prefix_parts[2], prefix_parts[4]
err = _validate_session_segment(queen, "queen name")
if err:
return groups, err
err = _validate_session_segment(sid, "queen session id")
if err:
return groups, err
dest = str(_agents_dir() / "queens" / queen / "sessions" / sid)
else: # pragma: no cover — defensive
continue
groups[kind][prefix] = dest
if not any(groups.values()):
roots = ", ".join(sorted(seen_unrecognised)) or "(none)"
return (
groups,
"tar has no recognised top-level prefix "
f"(expected colonies/, agents/<name>/worker/, "
f"agents/queens/<queen>/sessions/<sid>/; got: {roots})",
)
return groups, None
async def _read_upload(
request: web.Request,
) -> tuple[bytes | None, str | None, dict[str, str], web.Response | None]:
"""Drain the multipart upload. Returns ``(bytes, filename, form, error)``."""
if not request.content_type.startswith("multipart/"):
return None, None, {}, web.json_response({"error": "expected multipart/form-data"}, status=400)
reader = await request.multipart()
upload: bytes | None = None
upload_filename: str | None = None
form: dict[str, str] = {}
while True:
part = await reader.next()
if part is None:
break
if part.name == "file":
buf = io.BytesIO()
while True:
chunk = await part.read_chunk(size=65536)
if not chunk:
break
buf.write(chunk)
if buf.tell() > _MAX_UPLOAD_BYTES:
return (
None,
None,
{},
web.json_response(
{"error": f"upload exceeds {_MAX_UPLOAD_BYTES} bytes"},
status=413,
),
)
upload = buf.getvalue()
upload_filename = part.filename or ""
else:
form[part.name or ""] = (await part.text()).strip()
if upload is None:
return None, None, {}, web.json_response({"error": "missing 'file' part"}, status=400)
return upload, upload_filename, form, None
async def handle_import_colony(request: web.Request) -> web.Response:
"""POST /api/colonies/import — unpack a colony tarball into HIVE_HOME."""
upload, upload_filename, form, err_resp = await _read_upload(request)
if err_resp is not None:
return err_resp
assert upload is not None # for the type checker
replace_existing = form.get("replace_existing", "false").lower() == "true"
name_override = form.get("name", "").strip() or None
try:
tf = tarfile.open(fileobj=io.BytesIO(upload), mode="r:*")
except tarfile.TarError as err:
return web.json_response({"error": f"invalid tar archive: {err}"}, status=400)
try:
if _has_multi_root_prefix(tf):
return await _import_multi_root(tf, replace_existing, upload_filename, len(upload))
return await _import_legacy_single_root(tf, name_override, replace_existing, upload_filename, len(upload))
finally:
tf.close()
async def _import_legacy_single_root(
tf: tarfile.TarFile,
name_override: str | None,
replace_existing: bool,
upload_filename: str | None,
upload_size: int,
) -> web.Response:
"""Legacy path: tar contains `<name>/...` only, route to colonies/<name>/.
Kept verbatim from the previous handler so existing test fixtures and
older desktop builds keep working during a partial rollout.
"""
top, top_err = _archive_top_level(tf)
if top_err or top is None:
return web.json_response({"error": top_err}, status=400)
colony_name = name_override or top
name_err = _validate_colony_name(colony_name)
if name_err:
return web.json_response({"error": name_err}, status=400)
target = COLONIES_DIR / colony_name
if target.exists():
if not replace_existing:
return web.json_response(
{
"error": "colony already exists",
"name": colony_name,
"hint": "set replace_existing=true to overwrite",
},
status=409,
)
shutil.rmtree(target)
files_extracted, extract_err = _safe_extract_tar(tf, target, strip_prefix=top)
if extract_err:
shutil.rmtree(target, ignore_errors=True)
return web.json_response({"error": extract_err}, status=400)
logger.info(
"Imported colony %s (legacy, %d files) from upload %s (%d bytes)",
colony_name,
files_extracted,
upload_filename or "<unnamed>",
upload_size,
)
return web.json_response(
{
"name": colony_name,
"path": str(target),
"files_imported": files_extracted,
"replaced": replace_existing,
},
status=201,
)
async def _import_multi_root(
tf: tarfile.TarFile,
replace_existing: bool,
upload_filename: str | None,
upload_size: int,
) -> web.Response:
"""New path: tar contains `colonies/<name>/...` plus optional agents trees.
Each recognised root is extracted to its corresponding HIVE_HOME subtree
using the same traversal-safe walker as the legacy path. ``replace_existing``
governs the colonies dir conflict; the agents trees overwrite in place
(worker conversations and queen sessions are append-mostly stores;
overwriting a stale subset is fine, and adding the conflict gate would
block legitimate re-pushes from a different desktop session).
"""
plan, plan_err = _plan_multi_root(tf)
if plan_err:
return web.json_response({"error": plan_err}, status=400)
# Conflict guard for the colonies root only — these are user-visible
# entities the desktop expects to control overwrite of.
primary_colony_name: str | None = None
primary_colony_target: Path | None = None
for prefix, dest in plan["colonies"].items():
target = Path(dest)
primary_colony_name = Path(prefix).parts[1]
primary_colony_target = target
if target.exists() and not replace_existing:
return web.json_response(
{
"error": "colony already exists",
"name": primary_colony_name,
"hint": "set replace_existing=true to overwrite",
},
status=409,
)
if target.exists() and replace_existing:
shutil.rmtree(target)
# The colonies/ root is required. agents/ trees are optional follow-ons.
if not plan["colonies"]:
return web.json_response(
{
"error": "tar missing required colonies/<name>/ root",
},
status=400,
)
summary: dict[str, dict[str, int | str]] = {}
extracted_dests: list[Path] = []
def _abort(err: str, status: int = 400) -> web.Response:
for path in extracted_dests:
shutil.rmtree(path, ignore_errors=True)
return web.json_response({"error": err}, status=status)
for kind in ("colonies", "agents_worker", "agents_queen"):
for prefix, dest in plan[kind].items():
target = Path(dest)
files_extracted, extract_err = _safe_extract_tar(tf, target, strip_prefix=prefix)
if extract_err:
return _abort(extract_err)
summary.setdefault(kind, {"files": 0})
summary[kind]["files"] = int(summary[kind].get("files", 0)) + files_extracted
extracted_dests.append(target)
total_files = sum(int(v.get("files", 0)) for v in summary.values())
logger.info(
"Imported colony %s (%d files across %d roots) from upload %s (%d bytes)",
primary_colony_name or "<unknown>",
total_files,
sum(1 for v in summary.values() if int(v.get("files", 0)) > 0),
upload_filename or "<unnamed>",
upload_size,
)
return web.json_response(
{
"name": primary_colony_name,
"path": str(primary_colony_target) if primary_colony_target else None,
"files_imported": total_files,
"by_root": summary,
"replaced": replace_existing,
},
status=201,
)
def register_routes(app: web.Application) -> None:
app.router.add_post("/api/colonies/import", handle_import_colony)
@@ -235,10 +235,6 @@ _SYSTEM_TOOLS: frozenset[str] = frozenset(
{
"get_account_info",
"get_current_time",
"bash_kill",
"bash_output",
"execute_command_tool",
"example_tool",
}
)
@@ -294,7 +290,9 @@ def _resolve_progress_db_by_name(colony_name: str) -> Path | None:
"""
if not _COLONY_NAME_RE.match(colony_name):
return None
db_path = Path.home() / ".hive" / "colonies" / colony_name / "data" / "progress.db"
from framework.config import COLONIES_DIR
db_path = COLONIES_DIR / colony_name / "data" / "progress.db"
return db_path if db_path.exists() else None
+2
@@ -51,6 +51,8 @@ PROVIDER_ENV_VARS: dict[str, str] = {
"together": "TOGETHER_API_KEY",
"together_ai": "TOGETHER_API_KEY",
"deepseek": "DEEPSEEK_API_KEY",
"kimi": "KIMI_API_KEY",
"hive": "HIVE_API_KEY",
}
_SUBSCRIPTION_DEFINITIONS: list[dict[str, str]] = [
+10 -8
@@ -42,12 +42,11 @@ _WORKER_INHERITED_TOOLS: frozenset[str] = frozenset(
"read_file",
"write_file",
"edit_file",
"hashline_edit",
"list_directory",
"search_files",
"undo_changes",
# Shell
"run_command",
# Terminal (basics — exec + ripgrep + glob/find)
"terminal_exec",
"terminal_rg",
"terminal_find",
# Framework synthetics (always available to any AgentLoop node)
"set_output",
"escalate",
@@ -1181,7 +1180,6 @@ async def fork_session_into_colony(
import json
import shutil
from datetime import datetime
from pathlib import Path
from framework.agent_loop.agent_loop import AgentLoop, LoopConfig
from framework.agent_loop.types import AgentContext
@@ -1245,7 +1243,9 @@ async def fork_session_into_colony(
# would wrongly flag every fresh colony as "already-exists" if we
# used ``not colony_dir.exists()``. A colony is "new" until its
# worker config has actually been written.
colony_dir = Path.home() / ".hive" / "colonies" / colony_name
from framework.config import COLONIES_DIR
colony_dir = COLONIES_DIR / colony_name
worker_name = "worker"
worker_config_path = colony_dir / f"{worker_name}.json"
is_new = not worker_config_path.exists()
@@ -1469,7 +1469,9 @@ async def fork_session_into_colony(
compaction_status.mark_in_progress(dest_queen_dir)
_worker_storage = Path.home() / ".hive" / "agents" / colony_name / worker_name
from framework.config import HIVE_HOME
_worker_storage = HIVE_HOME / "agents" / colony_name / worker_name
_dest_queen_dir = dest_queen_dir
_queen_ctx = queen_ctx
_queen_loop = queen_loop
+34 -3
@@ -35,6 +35,11 @@ from framework.agents.queen.queen_tools_config import (
tools_config_exists,
update_queen_tools_config,
)
from framework.agents.queen.queen_tools_defaults import (
list_category_names,
queen_role_categories,
resolve_category_tools,
)
logger = logging.getLogger(__name__)
@@ -210,18 +215,18 @@ def _catalog_from_live_session(session: Any) -> dict[str, list[dict[str, Any]]]:
return {}
mcp_names = getattr(phase_state, "mcp_tool_names_all", set()) or set()
independent_tools = getattr(phase_state, "independent_tools", []) or []
result: dict[str, list[dict[str, Any]]] = {"(unknown)": []}
result: dict[str, list[dict[str, Any]]] = {"MCP Tools": []}
for tool in independent_tools:
if tool.name not in mcp_names:
continue
result["(unknown)"].append(
result["MCP Tools"].append(
{
"name": tool.name,
"description": tool.description,
"input_schema": tool.parameters,
}
)
return result if result["(unknown)"] else {}
return result if result["MCP Tools"] else {}
server_map = getattr(registry, "_mcp_server_tools", {}) or {}
tools_by_name = {t.name: t for t in registry.get_tools().values()}
@@ -326,10 +331,36 @@ async def handle_get_tools(request: web.Request) -> web.Response:
mcp_tool_names_by_server=catalog,
enabled_mcp_tools=enabled_mcp_tools,
),
"categories": _render_categories(queen_id, catalog),
}
return web.json_response(response)
def _render_categories(
queen_id: str,
mcp_catalog: dict[str, list[dict[str, Any]]],
) -> list[dict[str, Any]]:
"""Expose the role-default category table to the frontend.
Each entry carries the category name, the resolved member tool names
(after ``@server:NAME`` shorthand expansion against the live catalog),
and ``in_role_default`` to flag categories that contribute to this
queen's role-based default. Lets the Tool Library group tools by
category alongside the per-server view.
"""
applied = set(queen_role_categories(queen_id))
out: list[dict[str, Any]] = []
for name in list_category_names():
out.append(
{
"name": name,
"tools": resolve_category_tools(name, mcp_catalog),
"in_role_default": name in applied,
}
)
return out
async def handle_patch_tools(request: web.Request) -> web.Response:
"""PATCH /api/queen/{queen_id}/tools — persist the MCP tool allowlist.
+113 -21
@@ -798,6 +798,110 @@ async def handle_session_colonies(request: web.Request) -> web.Response:
_EVENTS_HISTORY_DEFAULT_LIMIT = 2000
_EVENTS_HISTORY_MAX_LIMIT = 10000
# Files at or below this size use the simple forward-scan path (cheap enough
# that the seek-backward dance isn't worth it). Above this threshold we read
# the tail directly from end-of-file so a 50 MB log doesn't have to be paged
# through entirely just to surface the last 2000 lines.
_EVENTS_HISTORY_REVERSE_TAIL_THRESHOLD_BYTES = 1 << 20 # 1 MB
_EVENTS_HISTORY_REVERSE_TAIL_CHUNK_BYTES = 64 * 1024
def _read_events_tail(events_path: Path, limit: int) -> tuple[list[dict], int, bool]:
"""Read the tail of an append-only JSONL events log.
Returns ``(events, total, truncated)``. ``events`` is at most ``limit``
lines, oldest-first. ``total`` is the total number of non-blank lines in
the file (exact for the small-file path, exact for the large-file path
too; we do a separate fast newline-count pass).
Two paths:
- Small files (< ~1 MB): forward scan. Cheap; gives an exact total for
free. Defers ``json.loads`` to the bounded deque so we never parse a
line that's about to be dropped.
- Large files: seek to EOF and read backward in 64 KB chunks until we have
at least ``limit`` complete lines. Parses only the tail. ``total`` is
counted by a separate forward byte-scan that just counts newlines (no
JSON parse) so it stays cheap even for huge files.
Without these optimizations, mounting the chat for a long-running queen
with a ~50k-event log used to spend most of its time inside ``json.loads``
on the server thread (and block the event loop while doing it).
"""
from collections import deque
file_size = events_path.stat().st_size
if file_size <= _EVENTS_HISTORY_REVERSE_TAIL_THRESHOLD_BYTES:
tail_raw: deque[str] = deque(maxlen=limit)
total = 0
with open(events_path, encoding="utf-8") as f:
for line in f:
line = line.strip()
if not line:
continue
total += 1
tail_raw.append(line)
events: list[dict] = []
for raw in tail_raw:
try:
events.append(json.loads(raw))
except json.JSONDecodeError:
continue
return events, total, total > len(events)
# Large-file path: read backward until we have enough lines.
import os as _os
chunk_size = _EVENTS_HISTORY_REVERSE_TAIL_CHUNK_BYTES
pieces: list[bytes] = []
newline_count = 0
with open(events_path, "rb") as fb:
fb.seek(0, _os.SEEK_END)
pos = fb.tell()
while pos > 0 and newline_count <= limit:
read_size = min(chunk_size, pos)
pos -= read_size
fb.seek(pos)
chunk = fb.read(read_size)
newline_count += chunk.count(b"\n")
pieces.append(chunk)
pieces.reverse()
blob = b"".join(pieces)
# Drop the leading partial line unless we read from offset 0.
raw_lines = blob.split(b"\n")
if pos > 0 and raw_lines:
raw_lines = raw_lines[1:]
decoded = [ln.decode("utf-8", errors="replace").strip() for ln in raw_lines]
decoded = [ln for ln in decoded if ln]
if len(decoded) > limit:
decoded = decoded[-limit:]
events = []
for raw in decoded:
try:
events.append(json.loads(raw))
except json.JSONDecodeError:
continue
# Separate fast pass for total: count newlines only, no JSON parse.
total = 0
with open(events_path, "rb") as fb:
while True:
chunk = fb.read(1 << 20)
if not chunk:
break
total += chunk.count(b"\n")
# File may end without a trailing newline; if so, the last non-empty line
# was missed. Count it.
if file_size > 0:
with open(events_path, "rb") as fb:
fb.seek(-1, _os.SEEK_END)
if fb.read(1) != b"\n":
total += 1
return events, total, total > len(events)
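The backward-read loop generalizes to any append-only text log. A stripped-down sketch of the same seek-backward technique (plain lines, no JSON, small fixed default chunk):

```python
import os

def tail_lines(path: str, limit: int, chunk: int = 4096) -> list:
    """Return up to `limit` complete trailing lines, oldest-first."""
    pieces = []
    newlines = 0
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        # Read backward until we have seen enough newlines to guarantee
        # `limit` complete lines (plus one possibly-partial leading line).
        while pos > 0 and newlines <= limit:
            step = min(chunk, pos)
            pos -= step
            f.seek(pos)
            block = f.read(step)
            newlines += block.count(b"\n")
            pieces.append(block)
    blob = b"".join(reversed(pieces))
    lines = blob.split(b"\n")
    if pos > 0 and lines:
        lines = lines[1:]  # drop the leading partial line
    out = [ln.decode("utf-8", "replace") for ln in lines if ln]
    return out[-limit:]
```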
async def handle_session_events_history(request: web.Request) -> web.Response:
"""GET /api/sessions/{session_id}/events/history — persisted eventbus log.
@@ -827,6 +931,9 @@ async def handle_session_events_history(request: web.Request) -> web.Response:
recent N events". Long-running colonies have produced files with 50k+
events; before this cap, restoring on page-mount shipped the whole thing
down the wire and blocked the UI for seconds.
The actual file read runs in a worker thread via ``asyncio.to_thread`` so
it doesn't block the event loop while other requests are in flight.
"""
session_id = request.match_info["session_id"]
@@ -852,24 +959,8 @@ async def handle_session_events_history(request: web.Request) -> web.Response:
}
)
# Tail the file using a bounded deque — O(limit) memory regardless
# of file size. No need to materialize the whole list only to slice it.
from collections import deque
tail: deque[dict] = deque(maxlen=limit)
total = 0
try:
with open(events_path, encoding="utf-8") as f:
for line in f:
line = line.strip()
if not line:
continue
try:
evt = json.loads(line)
except json.JSONDecodeError:
continue
total += 1
tail.append(evt)
events, total, truncated = await asyncio.to_thread(_read_events_tail, events_path, limit)
except OSError:
return web.json_response(
{
@@ -882,14 +973,13 @@ async def handle_session_events_history(request: web.Request) -> web.Response:
}
)
events = list(tail)
return web.json_response(
{
"events": events,
"session_id": session_id,
"total": total,
"returned": len(events),
"truncated": total > len(events),
"truncated": truncated,
"limit": limit,
}
)
@@ -1004,8 +1094,10 @@ async def handle_delete_agent(request: web.Request) -> web.Response:
except ValueError as exc:
return web.json_response({"error": str(exc)}, status=400)
# Reject deletion of framework agents (~/.hive/agents/) — those are internal
hive_agents_dir = Path.home() / ".hive" / "agents"
# Reject deletion of framework agents ($HIVE_HOME/agents/) — those are internal
from framework.config import HIVE_HOME
hive_agents_dir = HIVE_HOME / "agents"
if resolved.is_relative_to(hive_agents_dir):
return web.json_response({"error": "Cannot delete framework agents"}, status=403)
+26 -11
@@ -180,7 +180,7 @@ def _colony_scope(manager: Any, colony_name: str) -> SkillScope | None:
overrides_path = colony_home / "skills_overrides.json"
store = SkillOverrideStore.load(overrides_path, scope_label=f"colony:{colony_name}")
write_dir = colony_home / ".hive" / "skills"
write_dir = colony_home / "skills"
admin_manager = _build_admin_manager(queen_id=queen_id, colony_name=colony_name)
@@ -210,12 +210,13 @@ def _build_admin_manager(
"""Build a read-only SkillsManager for GET when no live session exists.
Intentionally leaves ``project_root`` unset even for a colony: the
colony's ``.hive/skills/`` directory is surfaced via the ``colony_ui``
extra scope. Also routing it through ``project_root`` would double-
scan the same dir, and last-wins collision resolution would retag the
skills as ``source_scope="project"`` which flips the provenance
fallback to ``PROJECT_DROPPED`` and drops ``editable`` to ``False``
for anything without an explicit override-store entry.
colony's ``skills/`` directory (and the legacy ``.hive/skills/`` for
pre-flatten colonies) is surfaced via the ``colony_ui`` extra scope.
Routing it through ``project_root`` would double-scan the same dir,
and last-wins collision resolution would retag the skills as
``source_scope="project"`` which flips the provenance fallback to
``PROJECT_DROPPED`` and drops ``editable`` to ``False`` for anything
without an explicit override-store entry.
"""
extras: list[ExtraScope] = []
queen_overrides_path: Path | None = None
@@ -227,6 +228,10 @@ def _build_admin_manager(
if colony_name:
colony_home = COLONIES_DIR / colony_name
colony_overrides_path = colony_home / "skills_overrides.json"
# Surface both the new flat path (where new skills are written) and
# the legacy nested path (left intact for pre-flatten colonies). UI
# writes always target the flat path; reads see both.
extras.append(ExtraScope(directory=colony_home / "skills", label="colony_ui", priority=3))
extras.append(ExtraScope(directory=colony_home / ".hive" / "skills", label="colony_ui", priority=3))
cfg = SkillsManagerConfig(
queen_id=queen_id,
@@ -442,10 +447,18 @@ async def handle_list_all_skills(request: web.Request) -> web.Response:
extras.append(ExtraScope(directory=QUEENS_DIR / qid / "skills", label="queen_ui", priority=2))
# We intentionally don't plumb every colony's project_root into one
# manager — discovery only allows a single project_root. For the
# aggregator we scan every colony's .hive/skills/ as a tagged extra
# scope instead. That keeps the xml-catalog-per-scope invariant
# intact without requiring N managers.
# aggregator we scan every colony's skills/ (and the legacy nested
# .hive/skills/ for pre-flatten colonies) as tagged extra scopes
# instead. That keeps the xml-catalog-per-scope invariant intact
# without requiring N managers.
for cn in colony_names:
extras.append(
ExtraScope(
directory=COLONIES_DIR / cn / "skills",
label="colony_ui",
priority=3,
)
)
extras.append(
ExtraScope(
directory=COLONIES_DIR / cn / ".hive" / "skills",
@@ -923,7 +936,9 @@ async def handle_upload_skill(request: web.Request) -> web.Response:
# Resolve the write target
if scope_kind == "user":
write_dir = Path.home() / ".hive" / "skills"
from framework.config import HIVE_HOME
write_dir = HIVE_HOME / "skills"
overrides_path: Path | None = None
store: SkillOverrideStore | None = None
affected_runtimes: list = []
@@ -0,0 +1,112 @@
"""REST routes for task lists.
GET /api/tasks/{task_list_id} -- snapshot of one list
GET /api/colonies/{colony_id}/task_lists -- helper for colony view
GET /api/sessions/{session_id}/task_list_id -- helper for session view
The task_list_id segment uses URL-encoded colons (``colony%3Aabc`` /
``session%3Aagent%3Asess``); aiohttp decodes them automatically.
"""
from __future__ import annotations
import logging
from aiohttp import web
from framework.tasks import get_task_store
from framework.tasks.scoping import (
colony_task_list_id,
session_task_list_id,
)
logger = logging.getLogger(__name__)
async def handle_get_task_list(request: web.Request) -> web.Response:
raw = request.match_info.get("task_list_id", "")
if not raw:
return web.json_response({"error": "task_list_id required"}, status=400)
store = get_task_store()
if not await store.list_exists(raw):
return web.json_response(
{"error": f"Task list {raw!r} not found", "task_list_id": raw, "tasks": []},
status=404,
)
meta = await store.get_meta(raw)
records = await store.list_tasks(raw)
return web.json_response(
{
"task_list_id": raw,
"role": meta.role.value if meta else "session",
"meta": meta.model_dump(mode="json") if meta else None,
"tasks": [
{
"id": r.id,
"subject": r.subject,
"description": r.description,
"active_form": r.active_form,
"owner": r.owner,
"status": r.status.value,
"blocks": list(r.blocks),
"blocked_by": list(r.blocked_by),
"metadata": dict(r.metadata),
"created_at": r.created_at,
"updated_at": r.updated_at,
}
for r in records
],
}
)
async def handle_get_colony_task_lists(request: web.Request) -> web.Response:
"""Return template_task_list_id and queen_session_task_list_id for a colony."""
colony_id = request.match_info.get("colony_id", "")
if not colony_id:
return web.json_response({"error": "colony_id required"}, status=400)
template_id = colony_task_list_id(colony_id)
# Queen's session list — the queen-of-colony's session_id == the
# browser-facing colony session id. The frontend already knows that
# value; we surface what we have on disk for completeness.
queen_session_id = request.query.get("queen_session_id")
queen_list_id = session_task_list_id("queen", queen_session_id) if queen_session_id else None
return web.json_response(
{
"template_task_list_id": template_id,
"queen_session_task_list_id": queen_list_id,
}
)
async def handle_get_session_task_list_id(request: web.Request) -> web.Response:
"""Return task_list_id and picked_up_from for a session.
The session_id is the queen's session id or a worker's session id;
both follow the same path. The agent_id is read from the request query
(passed by the frontend, which already knows which agent the session
belongs to).
"""
session_id = request.match_info.get("session_id", "")
agent_id = request.query.get("agent_id", "queen")
if not session_id:
return web.json_response({"error": "session_id required"}, status=400)
task_list_id = session_task_list_id(agent_id, session_id)
store = get_task_store()
exists = await store.list_exists(task_list_id)
return web.json_response(
{
"task_list_id": task_list_id if exists else None,
"picked_up_from": None,
}
)
def register_routes(app: web.Application) -> None:
app.router.add_get("/api/tasks/{task_list_id}", handle_get_task_list)
app.router.add_get("/api/colonies/{colony_id}/task_lists", handle_get_colony_task_lists)
app.router.add_get("/api/sessions/{session_id}/task_list_id", handle_get_session_task_list_id)
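As the module docstring notes, task-list ids embed colons, so a client must percent-encode them in the URL path. A small sketch of building such a request path (the id value is illustrative):

```python
from urllib.parse import quote

# "session:agent:sess" cannot appear raw in a path segment; quote()
# with safe="" encodes the colons, and aiohttp decodes them back
# before the handler reads match_info["task_list_id"].
list_id = "session:agent:sess"
path = f"/api/tasks/{quote(list_id, safe='')}"
print(path)  # -> /api/tasks/session%3Aagent%3Asess
```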
@@ -67,11 +67,9 @@ async def handle_list_nodes(request: web.Request) -> web.Response:
worker_session_id = request.query.get("session_id")
if worker_session_id and session.worker_path:
worker_session_id = safe_path_segment(worker_session_id)
from pathlib import Path
from framework.config import HIVE_HOME
state_path = (
Path.home() / ".hive" / "agents" / session.worker_path.name / "sessions" / worker_session_id / "state.json"
)
state_path = HIVE_HOME / "agents" / session.worker_path.name / "sessions" / worker_session_id / "state.json"
if state_path.exists():
try:
state = json.loads(state_path.read_text(encoding="utf-8"))
@@ -19,7 +19,7 @@ from datetime import datetime
from pathlib import Path
from typing import Any, Literal
from framework.config import QUEENS_DIR
from framework.config import QUEENS_DIR, get_max_tokens
from framework.host.triggers import TriggerDefinition
logger = logging.getLogger(__name__)
@@ -546,8 +546,10 @@ class SessionManager:
session.colony_name = colony_id
session.worker_path = agent_path
# Worker storage: ~/.hive/agents/{colony_name}/{worker_name}/
worker_storage = Path.home() / ".hive" / "agents" / colony_id / worker_name
# Worker storage: $HIVE_HOME/agents/{colony_name}/{worker_name}/
from framework.config import HIVE_HOME
worker_storage = HIVE_HOME / "agents" / colony_id / worker_name
worker_storage.mkdir(parents=True, exist_ok=True)
# Copy conversations from colony if fresh
@@ -698,7 +700,10 @@ class SessionManager:
available_tools=all_tools,
goal_context=goal.to_prompt_context(),
goal=goal,
max_tokens=8192,
# Worker output cap — pull from configuration.json instead of
# hard-coding 8192. glm-5.1/kimi-k2.5 both support 32k out, and
# capping at 8k silently truncates long worker turns mid-tool.
max_tokens=get_max_tokens(),
stream_id=worker_name,
execution_id=worker_name,
identity_prompt=worker_data.get("identity_prompt", ""),
@@ -927,7 +932,9 @@ class SessionManager:
that process is still running on the host. If it is, the session is
owned by another healthy worker process, so leave it alone.
"""
sessions_path = Path.home() / ".hive" / "agents" / agent_path.name / "sessions"
from framework.config import HIVE_HOME
sessions_path = HIVE_HOME / "agents" / agent_path.name / "sessions"
if not sessions_path.exists():
return
@@ -1918,73 +1925,38 @@ class SessionManager:
if meta.get("colony_fork"):
continue
# Build a quick preview of the last human/assistant exchange.
# We read all conversation parts, filter to client-facing messages,
# and return the last assistant message content as a snippet.
# Preview of the last client-facing exchange. Cached in
# ``summary.json`` next to ``meta.json`` so the sidebar doesn't
# have to rescan every part on each list call. The cache is
# written incrementally by FileConversationStore.write_part; if
# missing or stale (parts dir mtime newer than the summary file)
# we do a one-time full rebuild and write a fresh summary.
#
# NOTE on activity timestamps: the session directory's own mtime
# is NOT reliable as a "last activity" marker — POSIX dir mtime
# only updates when direct entries change, and conversation
# parts live under conversations/parts/, so writing a new part
# does not bubble up to the session dir.
from framework.storage import session_summary
last_message: str | None = None
message_count: int = 0
# Last-activity timestamp — mtime of the latest client-facing message.
# Falls back to session creation time for empty sessions. NOTE: the
# session directory's own mtime is NOT reliable here — POSIX dir mtime
# only updates when direct entries change, and conversation parts are
# nested under conversations/parts/, so writing a new part does not
# bubble up to the session dir.
last_active_at: float = float(created_at) if isinstance(created_at, (int, float)) else 0.0
convs_dir = d / "conversations"
summary: dict | None = None
if convs_dir.exists():
try:
all_parts: list[dict] = []
if session_summary.is_stale(d):
summary = session_summary.rebuild_summary(d)
else:
summary = session_summary.read_summary(d)
def _collect_parts(parts_dir: Path, _dest: list[dict] = all_parts) -> None:
if not parts_dir.exists():
return
for part_file in sorted(parts_dir.iterdir()):
if part_file.suffix != ".json":
continue
try:
part = json.loads(part_file.read_text(encoding="utf-8"))
part.setdefault("created_at", part_file.stat().st_mtime)
_dest.append(part)
except (json.JSONDecodeError, OSError):
continue
# Flat layout: conversations/parts/*.json
_collect_parts(convs_dir / "parts")
# Node-based layout: conversations/<node_id>/parts/*.json
for node_dir in convs_dir.iterdir():
if not node_dir.is_dir() or node_dir.name == "parts":
continue
_collect_parts(node_dir / "parts")
# Filter to client-facing messages only
client_msgs = [
p
for p in all_parts
if not p.get("is_transition_marker")
and p.get("role") != "tool"
and not (p.get("role") == "assistant" and p.get("tool_calls"))
]
client_msgs.sort(key=lambda m: m.get("created_at", m.get("seq", 0)))
message_count = len(client_msgs)
# Take the latest message's timestamp as the activity marker.
# _collect_parts sets created_at via setdefault to the part
# file's mtime, so this is always a valid float.
if client_msgs:
latest_ts = client_msgs[-1].get("created_at")
if isinstance(latest_ts, (int, float)) and latest_ts > last_active_at:
last_active_at = float(latest_ts)
# Last assistant message as preview snippet
for msg in reversed(client_msgs):
content = msg.get("content") or ""
if isinstance(content, list):
# Anthropic-style content blocks
content = " ".join(
b.get("text", "") for b in content if isinstance(b, dict) and b.get("type") == "text"
)
if content and msg.get("role") == "assistant":
last_message = content[:120].strip()
break
except OSError:
pass
if summary is not None:
message_count = int(summary.get("message_count") or 0)
last_message = summary.get("last_message")
cached_active = summary.get("last_active_at")
if isinstance(cached_active, (int, float)) and cached_active > last_active_at:
last_active_at = float(cached_active)
# Derive queen_id from directory structure: queens/{queen_id}/sessions/{session_id}
queen_id = d.parent.parent.name if d.parent.name == "sessions" else None
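The staleness rule described above (summary rebuilt when the parts dirs are newer than `summary.json`) can be sketched as an mtime comparison. This is an illustrative reconstruction, not the actual `framework.storage.session_summary` implementation:

```python
from pathlib import Path


def is_stale(session_dir: Path) -> bool:
    """Return True when summary.json is missing or older than the
    newest conversation parts directory.

    Checking the parts directory mtime works because POSIX updates a
    directory's mtime when a direct entry is created; that is the same
    property that makes the session dir's own mtime unreliable, since
    parts are nested two levels below it.
    """
    summary = session_dir / "summary.json"
    if not summary.exists():
        return True
    summary_mtime = summary.stat().st_mtime
    # covers both conversations/parts and conversations/<node>/parts
    for parts_dir in session_dir.glob("conversations/**/parts"):
        if parts_dir.stat().st_mtime > summary_mtime:
            return True
    return False
```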
@@ -0,0 +1,425 @@
"""Tests for POST /api/colonies/import — tar-based colony onboarding.
The handler resolves writes against ``framework.config.COLONIES_DIR``;
every test redirects that into a ``tmp_path`` so we never touch the real
``~/.hive/colonies`` tree.
"""
from __future__ import annotations
import io
import tarfile
from pathlib import Path
import pytest
from aiohttp import FormData, web
from aiohttp.test_utils import TestClient, TestServer
from framework.server import routes_colonies
def _build_tar(layout: dict[str, bytes | None], *, gzip: bool = True) -> bytes:
"""Build an in-memory tar with the given paths.
``layout`` maps archive member names to file contents; passing ``None``
creates a directory entry instead of a regular file.
"""
buf = io.BytesIO()
mode = "w:gz" if gzip else "w"
with tarfile.open(fileobj=buf, mode=mode) as tf:
for name, content in layout.items():
if content is None:
info = tarfile.TarInfo(name=name)
info.type = tarfile.DIRTYPE
info.mode = 0o755
tf.addfile(info)
else:
info = tarfile.TarInfo(name=name)
info.size = len(content)
info.mode = 0o644
tf.addfile(info, io.BytesIO(content))
return buf.getvalue()
def _build_tar_with_symlink(top: str, link_name: str, link_target: str) -> bytes:
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tf:
info = tarfile.TarInfo(name=top)
info.type = tarfile.DIRTYPE
info.mode = 0o755
tf.addfile(info)
sym = tarfile.TarInfo(name=f"{top}/{link_name}")
sym.type = tarfile.SYMTYPE
sym.linkname = link_target
tf.addfile(sym)
return buf.getvalue()
@pytest.fixture
def colonies_dir(tmp_path, monkeypatch):
"""Redirect COLONIES_DIR into a tmp tree."""
colonies = tmp_path / "colonies"
colonies.mkdir()
monkeypatch.setattr(routes_colonies, "COLONIES_DIR", colonies)
return colonies
async def _client(app: web.Application) -> TestClient:
return TestClient(TestServer(app))
def _app() -> web.Application:
app = web.Application()
routes_colonies.register_routes(app)
return app
def _form(file_bytes: bytes, *, filename: str = "colony.tar.gz", **fields: str) -> FormData:
fd = FormData()
fd.add_field("file", file_bytes, filename=filename, content_type="application/gzip")
for k, v in fields.items():
fd.add_field(k, v)
return fd
@pytest.mark.asyncio
async def test_happy_path_imports_colony(colonies_dir: Path) -> None:
archive = _build_tar(
{
"x_daily/": None,
"x_daily/metadata.json": b'{"colony_name":"x_daily"}',
"x_daily/scripts/run.sh": b"#!/bin/sh\necho hi\n",
}
)
async with await _client(_app()) as c:
resp = await c.post("/api/colonies/import", data=_form(archive))
assert resp.status == 201, await resp.text()
body = await resp.json()
assert body["name"] == "x_daily"
assert body["files_imported"] == 2
assert (colonies_dir / "x_daily" / "metadata.json").read_bytes() == b'{"colony_name":"x_daily"}'
assert (colonies_dir / "x_daily" / "scripts" / "run.sh").exists()
@pytest.mark.asyncio
async def test_name_override(colonies_dir: Path) -> None:
archive = _build_tar({"x_daily/": None, "x_daily/file.txt": b"hi"})
async with await _client(_app()) as c:
resp = await c.post("/api/colonies/import", data=_form(archive, name="other_name"))
assert resp.status == 201
body = await resp.json()
assert body["name"] == "other_name"
assert (colonies_dir / "other_name" / "file.txt").read_bytes() == b"hi"
assert not (colonies_dir / "x_daily").exists()
@pytest.mark.asyncio
async def test_rejects_existing_without_replace_flag(colonies_dir: Path) -> None:
(colonies_dir / "x_daily").mkdir()
(colonies_dir / "x_daily" / "old.txt").write_text("preserved")
archive = _build_tar({"x_daily/": None, "x_daily/new.txt": b"new"})
async with await _client(_app()) as c:
resp = await c.post("/api/colonies/import", data=_form(archive))
assert resp.status == 409
# Original content untouched
assert (colonies_dir / "x_daily" / "old.txt").read_text() == "preserved"
@pytest.mark.asyncio
async def test_replace_existing_overwrites(colonies_dir: Path) -> None:
(colonies_dir / "x_daily").mkdir()
(colonies_dir / "x_daily" / "old.txt").write_text("preserved")
archive = _build_tar({"x_daily/": None, "x_daily/new.txt": b"new"})
async with await _client(_app()) as c:
resp = await c.post(
"/api/colonies/import",
data=_form(archive, replace_existing="true"),
)
assert resp.status == 201, await resp.text()
assert not (colonies_dir / "x_daily" / "old.txt").exists()
assert (colonies_dir / "x_daily" / "new.txt").read_text() == "new"
@pytest.mark.asyncio
async def test_rejects_path_traversal(colonies_dir: Path) -> None:
archive = _build_tar(
{
"x_daily/": None,
"x_daily/../escape.txt": b"oops",
}
)
async with await _client(_app()) as c:
resp = await c.post("/api/colonies/import", data=_form(archive))
assert resp.status == 400
body = await resp.json()
assert "traversal" in body["error"].lower() or "outside" in body["error"].lower()
assert not (colonies_dir / "x_daily").exists()
assert not (colonies_dir.parent / "escape.txt").exists()
@pytest.mark.asyncio
async def test_rejects_absolute_member(colonies_dir: Path) -> None:
archive = _build_tar({"x_daily/": None, "/etc/passwd": b"oops"})
async with await _client(_app()) as c:
resp = await c.post("/api/colonies/import", data=_form(archive))
assert resp.status == 400
@pytest.mark.asyncio
async def test_rejects_symlinks(colonies_dir: Path) -> None:
archive = _build_tar_with_symlink("x_daily", "evil", "/etc/passwd")
async with await _client(_app()) as c:
resp = await c.post("/api/colonies/import", data=_form(archive))
assert resp.status == 400
assert "symlink" in (await resp.json())["error"].lower()
@pytest.mark.asyncio
async def test_rejects_multiple_top_level_dirs(colonies_dir: Path) -> None:
archive = _build_tar(
{
"a/": None,
"a/x.txt": b"a",
"b/": None,
"b/y.txt": b"b",
}
)
async with await _client(_app()) as c:
resp = await c.post("/api/colonies/import", data=_form(archive))
assert resp.status == 400
assert "top-level" in (await resp.json())["error"].lower()
@pytest.mark.asyncio
async def test_rejects_invalid_colony_name(colonies_dir: Path) -> None:
archive = _build_tar({"Bad-Name/": None, "Bad-Name/x.txt": b"x"})
async with await _client(_app()) as c:
resp = await c.post("/api/colonies/import", data=_form(archive))
assert resp.status == 400
@pytest.mark.asyncio
async def test_rejects_non_multipart(colonies_dir: Path) -> None:
async with await _client(_app()) as c:
resp = await c.post(
"/api/colonies/import", data=b"not multipart", headers={"Content-Type": "application/octet-stream"}
)
assert resp.status == 400
@pytest.mark.asyncio
async def test_rejects_corrupt_tar(colonies_dir: Path) -> None:
async with await _client(_app()) as c:
resp = await c.post("/api/colonies/import", data=_form(b"not a real tar"))
assert resp.status == 400
@pytest.mark.asyncio
async def test_rejects_missing_file_part(colonies_dir: Path) -> None:
fd = FormData()
fd.add_field("name", "anything")
async with await _client(_app()) as c:
resp = await c.post("/api/colonies/import", data=fd)
assert resp.status == 400
@pytest.mark.asyncio
async def test_accepts_uncompressed_tar(colonies_dir: Path) -> None:
archive = _build_tar({"x_daily/": None, "x_daily/file.txt": b"plain"}, gzip=False)
async with await _client(_app()) as c:
resp = await c.post(
"/api/colonies/import",
data=_form(archive, filename="colony.tar"),
)
assert resp.status == 201
assert (colonies_dir / "x_daily" / "file.txt").read_text() == "plain"
# --------------------------------------------------------------------------
# Multi-root tar tests — the desktop's pushColonyToWorkspace ships the colony
# dir + worker conversations + the queen's forked session in one tar so the
# queen has full context on resume. Each recognised top-level prefix unpacks
# into its corresponding HIVE_HOME subtree.
# --------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_multi_root_unpacks_three_subtrees(colonies_dir: Path) -> None:
archive = _build_tar(
{
"colonies/x_daily/": None,
"colonies/x_daily/metadata.json": b'{"queen_session_id":"session_x"}',
"colonies/x_daily/data/progress.db": b"sqlite",
"agents/x_daily/worker/": None,
"agents/x_daily/worker/conversations/": None,
"agents/x_daily/worker/conversations/0001.json": b'{"role":"user"}',
"agents/x_daily/worker/conversations/0002.json": b'{"role":"assistant"}',
"agents/queens/queen_alpha/sessions/session_x/": None,
"agents/queens/queen_alpha/sessions/session_x/queen.json": b'{"id":"x"}',
}
)
async with await _client(_app()) as c:
resp = await c.post("/api/colonies/import", data=_form(archive))
assert resp.status == 201, await resp.text()
body = await resp.json()
# Colony files
assert (colonies_dir / "x_daily" / "metadata.json").exists()
assert (colonies_dir / "x_daily" / "data" / "progress.db").exists()
# Worker conversations under HIVE_HOME/agents/<colony>/worker/
hive_home = colonies_dir.parent
assert (
hive_home / "agents" / "x_daily" / "worker" / "conversations" / "0001.json"
).read_bytes() == b'{"role":"user"}'
# Queen forked session under HIVE_HOME/agents/queens/<queen>/sessions/<sid>/
assert (hive_home / "agents" / "queens" / "queen_alpha" / "sessions" / "session_x" / "queen.json").exists()
# Summary in response
assert body["name"] == "x_daily"
assert body["files_imported"] == 5
by_root = body["by_root"]
assert by_root["colonies"]["files"] == 2
assert by_root["agents_worker"]["files"] == 2
assert by_root["agents_queen"]["files"] == 1
@pytest.mark.asyncio
async def test_multi_root_colonies_only_succeeds(colonies_dir: Path) -> None:
"""The agents/ subtrees are optional — a fresh colony has no history."""
archive = _build_tar(
{
"colonies/x_daily/": None,
"colonies/x_daily/metadata.json": b"{}",
}
)
async with await _client(_app()) as c:
resp = await c.post("/api/colonies/import", data=_form(archive))
assert resp.status == 201, await resp.text()
body = await resp.json()
assert body["files_imported"] == 1
assert (colonies_dir / "x_daily" / "metadata.json").read_bytes() == b"{}"
@pytest.mark.asyncio
async def test_multi_root_rejects_missing_colonies_root(colonies_dir: Path) -> None:
"""Worker / queen trees alone aren't valid — every push must include
the colony dir, otherwise the desktop's intent is unclear and we
refuse rather than silently leave HIVE_HOME in a half-state."""
archive = _build_tar(
{
"agents/x_daily/worker/": None,
"agents/x_daily/worker/log.json": b"{}",
}
)
async with await _client(_app()) as c:
resp = await c.post("/api/colonies/import", data=_form(archive))
assert resp.status == 400, await resp.text()
err = (await resp.json())["error"]
assert "colonies/" in err
@pytest.mark.asyncio
async def test_multi_root_replace_existing_colony(colonies_dir: Path) -> None:
(colonies_dir / "x_daily").mkdir()
(colonies_dir / "x_daily" / "old.txt").write_text("preserved")
archive = _build_tar(
{
"colonies/x_daily/": None,
"colonies/x_daily/new.txt": b"new",
}
)
# Without flag → 409
async with await _client(_app()) as c:
resp = await c.post("/api/colonies/import", data=_form(archive))
assert resp.status == 409
assert (colonies_dir / "x_daily" / "old.txt").read_text() == "preserved"
# With flag → wipes + replaces
async with await _client(_app()) as c:
resp = await c.post(
"/api/colonies/import",
data=_form(archive, replace_existing="true"),
)
assert resp.status == 201, await resp.text()
assert not (colonies_dir / "x_daily" / "old.txt").exists()
assert (colonies_dir / "x_daily" / "new.txt").read_text() == "new"
@pytest.mark.asyncio
async def test_multi_root_rejects_traversal_in_worker_subtree(colonies_dir: Path) -> None:
archive = _build_tar(
{
"colonies/x_daily/": None,
"colonies/x_daily/m.json": b"{}",
"agents/x_daily/worker/": None,
"agents/x_daily/worker/../escape.txt": b"oops",
}
)
async with await _client(_app()) as c:
resp = await c.post("/api/colonies/import", data=_form(archive))
assert resp.status == 400
hive_home = colonies_dir.parent
assert not (hive_home / "agents" / "escape.txt").exists()
@pytest.mark.asyncio
async def test_multi_root_rejects_unknown_prefix(colonies_dir: Path) -> None:
archive = _build_tar(
{
"colonies/x_daily/": None,
"colonies/x_daily/m.json": b"{}",
"etc/passwd": b"oops",
}
)
async with await _client(_app()) as c:
resp = await c.post("/api/colonies/import", data=_form(archive))
# The unknown root is silently ignored (it doesn't match any
# recognised prefix); the colony root is required and present, so
# extraction succeeds and only the colonies subtree lands. We don't
# write outside HIVE_HOME because the dispatcher only routes to
# known destinations.
assert resp.status == 201, await resp.text()
hive_home = colonies_dir.parent
assert not (hive_home.parent / "etc" / "passwd").exists()
assert not (hive_home / "etc" / "passwd").exists()
@pytest.mark.asyncio
async def test_multi_root_rejects_invalid_segment(colonies_dir: Path) -> None:
archive = _build_tar(
{
"colonies/x_daily/": None,
"colonies/x_daily/m.json": b"{}",
"agents/queens/Bad-Queen/sessions/sess_1/": None,
"agents/queens/Bad-Queen/sessions/sess_1/x.json": b"{}",
}
)
async with await _client(_app()) as c:
resp = await c.post("/api/colonies/import", data=_form(archive))
assert resp.status == 400
@pytest.mark.asyncio
async def test_multi_root_overwrites_agents_subtree_in_place(colonies_dir: Path) -> None:
"""Worker/queen subtrees are append-mostly stores — the import handler
extracts in place without an existence-conflict gate so the desktop can
re-push from another machine without explicit overwrite."""
hive_home = colonies_dir.parent
worker_dir = hive_home / "agents" / "x_daily" / "worker" / "conversations"
worker_dir.mkdir(parents=True)
(worker_dir / "0000_old.json").write_text("old")
archive = _build_tar(
{
"colonies/x_daily/": None,
"colonies/x_daily/m.json": b"{}",
"agents/x_daily/worker/": None,
"agents/x_daily/worker/conversations/": None,
"agents/x_daily/worker/conversations/0001_new.json": b"new",
}
)
async with await _client(_app()) as c:
resp = await c.post(
"/api/colonies/import",
data=_form(archive, replace_existing="true"),
)
assert resp.status == 201, await resp.text()
# Old conversation file untouched (extraction is additive on agents/),
# new one written.
assert (worker_dir / "0000_old.json").read_text() == "old"
assert (worker_dir / "0001_new.json").read_text() == "new"
@@ -131,7 +131,7 @@ async def test_get_tools_default_allow(colony_dir):
_, name = colony_dir
manager = _FakeManager(
_mcp_tool_catalog={
"coder-tools": [
"files-tools": [
{"name": "read_file", "description": "read", "input_schema": {}},
{"name": "write_file", "description": "write", "input_schema": {}},
],
@@ -153,7 +153,7 @@ async def test_patch_persists_and_validates(colony_dir):
colonies_dir, name = colony_dir
manager = _FakeManager(
_mcp_tool_catalog={
"coder-tools": [
"files-tools": [
{"name": "read_file", "description": "", "input_schema": {}},
{"name": "write_file", "description": "", "input_schema": {}},
]
@@ -201,7 +201,7 @@ async def test_patch_refreshes_live_runtime(colony_dir):
manager = _FakeManager(
_sessions={session.id: session},
_mcp_tool_catalog={
"coder-tools": [
"files-tools": [
{"name": "read_file", "description": "", "input_schema": {}},
{"name": "write_file", "description": "", "input_schema": {}},
]
@@ -132,7 +132,7 @@ async def test_list_servers_returns_built_in(registry):
assert "built-in-seed" in names
sources = {s["name"]: s["source"] for s in body["servers"]}
assert sources.get("built-in-seed") == "registry"
# The package-baked servers (coder-tools/gcu-tools/hive_tools) carry
# The package-baked servers (files-tools/gcu-tools/hive_tools) carry
# source=="built-in" and are non-removable.
pkg_entries = [s for s in body["servers"] if s["source"] == "built-in"]
assert pkg_entries, "expected at least one package-baked MCP server"
@@ -159,7 +159,7 @@ async def test_get_tools_default_allows_everything_for_unknown_queen(queen_dir,
manager = _FakeManager()
manager._mcp_tool_catalog = {
"coder-tools": [
"files-tools": [
{"name": "read_file", "description": "read", "input_schema": {}},
{"name": "write_file", "description": "write", "input_schema": {}},
],
@@ -175,8 +175,8 @@ async def test_get_tools_default_allows_everything_for_unknown_queen(queen_dir,
assert body["is_role_default"] is True # no sidecar → default-allow
assert body["stale"] is False
servers = {s["name"]: s for s in body["mcp_servers"]}
assert set(servers) == {"coder-tools"}
for tool in servers["coder-tools"]["tools"]:
assert set(servers) == {"files-tools"}
for tool in servers["files-tools"]["tools"]:
assert tool["enabled"] is True
@@ -187,13 +187,16 @@ async def test_get_tools_applies_role_default(queen_dir, monkeypatch):
_, queen_id = queen_dir # queen_technology — has a role default
manager = _FakeManager()
# Seed a catalog covering tools the role default references so the
# response reflects what the queen would actually see on boot.
# Seed two MCP servers: files-tools is referenced by the technology
# role via the @server:files-tools shorthand in `file_ops`, so its
# tools should bubble into the default. unrelated-server is NOT
# referenced by any role category — its tools must NOT leak in.
manager._mcp_tool_catalog = {
"coder-tools": [
"files-tools": [
{"name": "read_file", "description": "", "input_schema": {}},
{"name": "port_scan", "description": "", "input_schema": {}}, # security
{"name": "excel_read", "description": "", "input_schema": {}}, # data
{"name": "edit_file", "description": "", "input_schema": {}},
],
"unrelated-server": [
{"name": "fluffy_unknown_tool", "description": "", "input_schema": {}},
],
}
@@ -204,32 +207,66 @@ async def test_get_tools_applies_role_default(queen_dir, monkeypatch):
assert resp.status == 200
body = await resp.json()
# queen_technology's role default includes file_read, data, security, etc.
assert body["is_role_default"] is True
enabled = set(body["enabled_mcp_tools"] or [])
# @server:files-tools shorthand pulls in every tool under that server.
assert "read_file" in enabled
assert "port_scan" in enabled # technology role includes security
assert "excel_read" in enabled
# Tools not in any category (and not in a @server: expansion target
# the role references) are NOT part of the default.
assert "edit_file" in enabled
# Tools registered under a server the role doesn't reference are NOT
# part of the default.
assert "fluffy_unknown_tool" not in enabled
@pytest.mark.asyncio
async def test_get_tools_exposes_categories(queen_dir, monkeypatch):
"""Response includes the category catalog with role-default flags."""
monkeypatch.setattr(routes_queen_tools, "ensure_default_queens", lambda: None)
_, queen_id = queen_dir # queen_technology
manager = _FakeManager()
manager._mcp_tool_catalog = {
"files-tools": [
{"name": "read_file", "description": "", "input_schema": {}},
{"name": "edit_file", "description": "", "input_schema": {}},
],
}
app = await _make_app(manager=manager)
async with TestClient(TestServer(app)) as client:
resp = await client.get(f"/api/queen/{queen_id}/tools")
assert resp.status == 200
body = await resp.json()
cats = {c["name"]: c for c in body["categories"]}
# Categories that contribute to queen_technology's role default
assert cats["file_ops"]["in_role_default"] is True
assert cats["browser_basic"]["in_role_default"] is True
# Spreadsheet category is exposed even though queen_technology doesn't
# use it — frontend can group/show it.
assert "spreadsheet_advanced" in cats
assert cats["spreadsheet_advanced"]["in_role_default"] is False
# Security was removed from queen_technology defaults.
assert cats["security"]["in_role_default"] is False
# @server:files-tools shorthand expanded against the catalog.
assert "read_file" in cats["file_ops"]["tools"]
assert "edit_file" in cats["file_ops"]["tools"]
def test_resolve_queen_default_tools_expands_server_shorthand():
"""@server:NAME shorthand expands against the provided catalog."""
from framework.agents.queen.queen_tools_defaults import resolve_queen_default_tools
catalog = {
"gcu-tools": [
{"name": "browser_navigate"},
{"name": "browser_click"},
"files-tools": [
{"name": "read_file"},
{"name": "write_file"},
],
}
# queen_brand_design uses "browser" category → expands via @server:gcu-tools.
# queen_brand_design uses "file_ops" category → expands via @server:files-tools.
result = resolve_queen_default_tools("queen_brand_design", catalog)
assert result is not None
assert "browser_navigate" in result
assert "browser_click" in result
assert "read_file" in result
assert "write_file" in result
def test_resolve_queen_default_tools_unknown_queen_returns_none():
@@ -245,7 +282,7 @@ async def test_patch_persists_and_validates(queen_dir, monkeypatch):
manager = _FakeManager()
manager._mcp_tool_catalog = {
"coder-tools": [
"files-tools": [
{"name": "read_file", "description": "", "input_schema": {}},
{"name": "write_file", "description": "", "input_schema": {}},
]
@@ -318,7 +355,7 @@ async def test_patch_hot_reloads_live_session(queen_dir, monkeypatch):
tools_by_name = {"read_file": _tool("read_file"), "write_file": _tool("write_file")}
registry = _FakeRegistry(
server_map={"coder-tools": {"read_file", "write_file"}},
server_map={"files-tools": {"read_file", "write_file"}},
tools_by_name=tools_by_name,
)
# Patch get_tools to return real Tool objects for name/description plumbing.
@@ -375,9 +412,12 @@ async def test_delete_restores_role_default(queen_dir, monkeypatch):
manager = _FakeManager()
manager._mcp_tool_catalog = {
"coder-tools": [
"files-tools": [
{"name": "read_file", "description": "", "input_schema": {}},
{"name": "port_scan", "description": "", "input_schema": {}},
# pdf_read lives in hive_tools but is named explicitly in the
# file_ops category, so we stage it in any server here just to
# surface it through the catalog.
{"name": "pdf_read", "description": "", "input_schema": {}},
],
}
@@ -398,11 +438,14 @@ async def test_delete_restores_role_default(queen_dir, monkeypatch):
assert body["is_role_default"] is True
assert not tools_path.exists()
# The new effective list is the role default for queen_technology;
# security tools were intentionally removed, so port_scan must NOT
# appear, while file_ops members like read_file/pdf_read do.
enabled = set(body["enabled_mcp_tools"] or [])
assert "read_file" in enabled
assert "port_scan" in enabled
assert "pdf_read" in enabled
assert "port_scan" not in enabled
assert "subdomain_enumerate" not in enabled
# GET confirms.
resp = await client.get(f"/api/queen/{queen_id}/tools")
@@ -11,7 +11,7 @@ metadata:
**Applies when** your spawn message has `db_path:` and `colony_id:` fields. The DB is your durable working memory — tells you what's done, what to skip, which SOP gates you owe.
Access via `terminal_exec` running `sqlite3 "<db_path>" "..."`. Tables: `tasks` (queue), `steps` (per-task decomposition), `sop_checklist` (hard gates).
### Claim: assigned task (check this FIRST)
@@ -113,7 +113,7 @@ Even after `wait_until="load"`, React/Vue SPAs often render their real chrome in
### Reading pages efficiently
- **Prefer `browser_snapshot` over `browser_get_text("body")`** — returns a compact ~15 KB accessibility tree vs 100+ KB of raw HTML.
- Interaction tools `browser_click`, `browser_type`, `browser_type_focused`, and `browser_scroll` wait 0.5 s for the page to settle after a successful action, then attach a fresh accessibility snapshot under the `snapshot` key of their result. Use it to decide your next action — do NOT call `browser_snapshot` separately after every action. Tune the capture via `auto_snapshot_mode`: `"default"` (full tree, the default), `"simple"` (trims unnamed structural nodes), `"interactive"` (only controls — tightest token footprint), or `"off"` to skip the capture entirely (useful when batching several interactions and you don't need the intermediate trees). Call `browser_snapshot` explicitly only when you need a newer view or a different mode than what was auto-captured.
- Complex pages (LinkedIn, Twitter/X, SPAs with virtual scrolling) can have DOMs that don't match what's visually rendered — snapshot refs may be stale, missing, or misaligned with visible layout. Try the available snapshot first; when the target is not present in that snapshot or visual position matters, switch to `browser_screenshot` to orient yourself.
- Only fall back to `browser_get_text` for extracting specific small elements by CSS selector.
@@ -244,8 +244,8 @@ The highlight overlay stays visible on the page for **10 seconds** after each in
**Close tabs as soon as you are done with them** — not only at the end of the task. After reading or extracting data from a tab, close it immediately.
- Finished reading/extracting from a tab? `browser_close(tab_id=...)` (or no arg to close the active tab)
- Completed a multi-tab workflow? Call `browser_close` for each tab you opened — list with `browser_tabs` first if you've lost track of IDs
- More than 3 tabs open? Stop and close finished ones before opening more
- Popup appeared that you didn't need? Close it immediately
@@ -410,7 +410,7 @@ In all of these cases the script is SHORT (< 10 lines) and the result is CONSUME
- If a tool fails, retry once with the same approach.
- If it fails a second time, STOP retrying and switch approach.
- If `browser_snapshot` fails, try `browser_get_text` with a specific small selector as fallback.
- If `browser_open` fails or page seems stale, `browser_stop`, then `browser_open(url)` again to recreate a fresh context.
## Verified workflows
@@ -0,0 +1,160 @@
---
name: hive.chart-creation-foundations
description: Required reading whenever any chart_* tool is available. Teaches the one-tool embedding contract (call chart_render → live chart appears in chat AND a downloadable PNG lands in the queen session dir), the ECharts (data viz) vs Mermaid (structural diagrams) decision, the BI/financial-grade aesthetic baseline (no chartjunk, restrained palette, proper typography, single message per chart), and the canonical spec patterns for the 12 most-common chart types. Skipping this leads to 1990s-Excel charts, missing downloads, and the agent writing markdown image links by hand instead of letting chart_render drive the UI.
metadata:
author: hive
type: preset-skill
version: "1.0"
---
# Chart creation foundations
These tools render BI/financial-analyst-grade charts and diagrams that show up live in the chat AND save as high-DPI PNGs in the user's queen session dir.
## The embedding contract — one rule
> **To put a chart in chat, call `chart_render`. The chat reads `result.spec` and renders the chart live in the message bubble. The download link is `result.file_url`. Do not write `![chart](...)` image markdown by hand — the tool's result drives the UI.**
That's it. One tool call, one chart in chat, one file on disk. No two-step "remember to also save it" pattern. The chat's chart-rendering UI is fed by the tool result envelope automatically.
## When to chart at all
Chart when the data is **visual at heart**: trends over time, distributions, comparisons across categories, hierarchies, flows, geo. Skip the chart when:
- The point is one number → just say it. ("Revenue was $4.2M, up 12% YoY.")
- The point is a ranking of 5 things → use a markdown table with bold and emoji indicators.
- The data is so noisy a chart would mislead → describe the takeaway in prose.
A chart costs the user attention. It must repay that cost with a takeaway they couldn't get from prose.
## ECharts vs Mermaid — the picking rule
| Use ECharts (`kind: "echarts"`) when... | Use Mermaid (`kind: "mermaid"`) when... |
|---|---|
| You're plotting **numbers over categories or time** | You're showing **structure, not data** |
| Bar / line / area / scatter / candlestick / heatmap / treemap / sankey / parallel coordinates / calendar / gauge / pie / sunburst / geo map | Flowchart / sequence / gantt / ERD / state diagram / mindmap / class diagram / C4 architecture |
| The viewer's question is "how much / how many / what's the trend" | The viewer's question is "what calls what / what depends on what / what happens after what" |
If both fit (rare), prefer ECharts — its rasterized output is a proper data chart for slides; Mermaid's diagrams are for technical docs.
## The aesthetic baseline (non-negotiable)
These are the rules that turn an Excel-default chart into a Tableau-grade one. Every chart you produce must follow them.
### 1. Theme & background
- `chart_render` has **no `theme` parameter**. The renderer reads the user's UI theme from the desktop env (`HIVE_DESKTOP_THEME`) so the saved PNG matches what the user is actually looking at. You don't pick; the system does.
- Title goes in `option.title.text`, NOT in the message body. The chart is self-contained.
### 2. Palette discipline — DO NOT set `color` on series
The OpenHive ECharts theme is auto-applied to every `chart_render` call. It defines:
- An 8-hue **categorical palette** for multi-series charts (honey orange, slate blue, sage, terracotta, bronze, indigo, olive, rust)
- Cozy spacing (`grid.top: 90`, `grid.bottom: 56`, etc.)
- Brand typography (Inter Tight)
- Tasteful axis lines + dashed gridlines
**Do not set `option.color`, `option.title.textStyle`, `option.grid`, or `option.itemStyle.color` on series.** The theme covers it. If you do override, you'll fight the brand palette and the chart will look generic.
When you need data-encoded color (NOT category color):
- **Sequential** (magnitude): use `visualMap` with `inRange.color: ['#fff7e0', '#db6f02']` (light-to-honey)
- **Diverging** (positive/negative): use `visualMap` with `inRange.color: ['#a8453d', '#f5f5f5', '#3d7a4a']` (terracotta/neutral/sage)
- **Semantic up/down** (candlestick is auto-themed): for explicit gain/loss bars use `#3d7a4a` (gain) and `#a8453d` (loss), NOT `#27ae60` / `#e74c3c`.
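As a concrete instance of the diverging rule, a minimal sketch of a gain/loss bar spec routed through `visualMap` — region names and values are invented for illustration:

```python
# Hypothetical data. The diverging palette comes from visualMap,
# never from per-series color overrides.
spec = {
    "title": {"text": "Net change by region"},
    "xAxis": {"type": "category", "data": ["NA", "EU", "APAC", "LATAM"]},
    "yAxis": {"type": "value"},
    "visualMap": {
        "min": -5,
        "max": 5,
        "show": False,
        "inRange": {"color": ["#a8453d", "#f5f5f5", "#3d7a4a"]},
    },
    "series": [{"type": "bar", "data": [3.1, -1.2, 2.4, -0.6]}],
}
```

Pass this as `chart_render(kind="echarts", spec=spec, ...)`; the theme still supplies typography, grid, and background.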
### 3. Typography
The default font (`-apple-system, "Inter Tight", system-ui`) is already wired in the renderer — don't override unless the user asked. Set `option.textStyle.fontSize: 13` for body labels, `16` for axis names, `18` bold for the title.
### 4. No chartjunk
- **No 3D**. Ever. 3D pie charts and 3D bar charts are visual lies.
- **No drop shadows** on bars or lines. The default flat ECharts look is correct.
- **No gradient fills** unless the gradient encodes data (e.g. heatmap fill).
- **No neon colors**. Saturation belongs on highlighted bars, not on every series.
- **No more than 5 stacked colors** in a stacked bar — past that the eye can't separate them.
### 5. Axis hygiene
- X-axis labels rotate 45° only when they overflow. Otherwise horizontal.
- Y-axis starts at 0 for bar/area charts (truncating misleads). Line charts can start at min - 5%.
- Use `option.yAxis.axisLabel.formatter: '{value} M'` to add units, NOT a separate "USD millions" subtitle.
- Date axes: pass ISO strings (`"2024-01-15"`) and ECharts handles the layout. Use `xAxis.type: "time"`.
### 6. One message per chart
Every chart goes in its own assistant message (or its own `chart_render` call). Do not pile 4 charts into one wall of tool calls — the user can't focus and the chat gets noisy.
## Calling `chart_render` — the canonical pattern
```
chart_render(
kind="echarts",
spec={
"title": {"text": "Q4 revenue by region", "left": "center"},
"tooltip": {"trigger": "axis"},
"xAxis": {"type": "category", "data": ["NA", "EU", "APAC", "LATAM"]},
"yAxis": {"type": "value", "axisLabel": {"formatter": "${value}M"}},
"series": [{"type": "bar", "data": [12.4, 8.7, 5.3, 2.1], "itemStyle": {"color": "#db6f02"}}]
},
title="q4-revenue-by-region",
width=1600, height=900, dpi=300
)
```
Returns:
```
{
"kind": "echarts",
"spec": {...echoed...},
"file_path": "/.../charts/2026-04-30T...q4-revenue-by-region.png",
"file_url": "file:///.../q4-revenue-by-region.png",
"width": 1600, "height": 900, "dpi": 300, "bytes": 142318,
"title": "q4-revenue-by-region", "runtime_ms": 287
}
```
The chat panel reads `result.spec` and mounts ECharts in the message bubble. The user sees the chart immediately. The PNG is on disk and the chat shows a download link from `result.file_url`. **You don't write that link — it appears automatically.**
## The 12 chart types you'll use 95% of the time
| When | ECharts type | Notes |
|---|---|---|
| Trend over time | `series.type: "line"` | Smooth = `smooth: true` only when data is noisy |
| Multi-metric trend | Two `line` series with `yAxis: [{}, {}]` | Separate scales when units differ |
| Category comparison | `series.type: "bar"` | Sort by value descending, not alphabetically |
| Stacked composition | `bar` with `stack: "total"` | Cap at 5 categories |
| Distribution | `series.type: "boxplot"` or `bar` of bins | Boxplot for ≥3 groups; histogram for one |
| Two-variable correlation | `series.type: "scatter"` | Add `regression` markline if relevant |
| Candlestick / OHLC | `series.type: "candlestick"` | Date axis + `dataZoom` range slider |
| Geo distribution | `series.type: "map"` | Bundled `world` and country GeoJSONs |
| Hierarchy / share | `series.type: "treemap"` or `sunburst` | Use treemap for >12 leaves; pie only for 2-5 |
| Flow | `series.type: "sankey"` | Names matter — keep them short |
| Calendar density | `series.type: "heatmap"` + `calendar` | Daily metrics over a year |
| KPI scorecard | `series.type: "gauge"` | Set `min`, `max`, threshold band |
Worked specs for each are in `references/` — paste, modify, render.
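For example, the "sort by value descending" rule for category bars can be applied in code before building the spec — a sketch with made-up figures:

```python
data = {"EU": 8.7, "NA": 12.4, "LATAM": 2.1, "APAC": 5.3}  # hypothetical figures

# Sort by value descending before charting — never alphabetically.
ordered = sorted(data.items(), key=lambda kv: kv[1], reverse=True)

spec = {
    "title": {"text": "Q4 revenue by region"},
    "xAxis": {"type": "category", "data": [region for region, _ in ordered]},
    "yAxis": {"type": "value", "axisLabel": {"formatter": "${value}M"}},
    "series": [{"type": "bar", "data": [value for _, value in ordered]}],
}
```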
## Mermaid quick rules
```
chart_render(
kind="mermaid",
spec="""
flowchart LR
A[Customer signs up] --> B{Onboarded?}
B -- yes --> C[Activate trial]
B -- no --> D[Email reminder]
""",
title="signup-flow"
)
```
- One diagram per chart_render call.
- Keep node labels short (≤20 chars).
- Use `flowchart LR` for left-to-right; `TD` for top-down. LR reads better in a chat bubble.
- For sequence diagrams, `->>` draws a solid arrow (a call) and `-->>` a dashed arrow (the return); async sends use `-)` (open arrow).
- Don't try to encode data in mermaid (no widths, no quantities) — that's an ECharts job.
## Common mistakes the agent makes
1. **Writing `![chart](file://...)` markdown by hand.** Don't. The chat renders from the tool result automatically. Manual image markdown will display nothing (file:// is blocked from arbitrary chat content).
2. **Calling chart_render twice for the same chart "to embed and to save".** Only one call. The single call does both.
3. **Overriding fonts to fancy display faces.** Stay with the default; the agent's job is data, not typography.
4. **Pie charts with 12 slices.** Use a horizontal bar chart sorted by value. Pie is only for 2-5 mutually-exclusive shares.
5. **Forgetting `axisLabel.formatter` for currency / percentage.** A y-axis showing "12000000" is unreadable; "12M" is correct.
6. **Putting a chart's title in the message body.** Set `option.title.text` instead so the title is part of the saved PNG.
@@ -0,0 +1,139 @@
---
name: hive.terminal-tools-foundations
description: Required reading whenever any shell_* tool is available. Teaches the foreground/background dichotomy (terminal_exec auto-promotes past 30s, returns a job_id you poll with terminal_job_logs), the standard envelope shape (exit_code, stdout, stdout_truncated_bytes, output_handle, semantic_status, warning, auto_backgrounded, job_id), output handle pagination via terminal_output_get, when to read semantic_status instead of raw exit_code (grep/rg/find/diff/test exit 1 is NOT an error), the destructive-warning surface (rm -rf, git push --force, DROP TABLE), tool preference (use files-tools / gcu-tools / hive_tools before raw shell), and the bash-only-on-macOS policy. Skipping this leads to "tool returned no output" surprises, orphaned jobs, and panic over benign grep exit codes.
metadata:
author: hive
type: preset-skill
version: "1.0"
---
# terminal-tools — foundations
These tools give you a real terminal: foreground exec with smart envelopes, background jobs with offset-based log streaming, persistent PTY shells, and filesystem search. Bash-only on POSIX.
## Tool preference (read first)
Before reaching for terminal-tools, check whether a higher-level tool already covers the task. Shell is for system operations the other servers don't reach.
- **Reading files** → `files-tools.read_file` (handles size, paging, line-numbered output) — NOT `terminal_exec("cat ...")`
- **Editing files** → `files-tools.edit_file` (atomic patch with diff verification) — NOT `terminal_exec("sed -i ...")`
- **Writing files** → `files-tools.write_file` — NOT `terminal_exec("echo > ...")`
- **In-project search** → `files-tools.search_files` (project-scoped, code-aware) — use `terminal_rg` only for raw paths outside the project (`/var/log`, `/etc`)
- **Browser / web pages** → `gcu-tools.browser_*` for rendered pages — NOT `terminal_exec("curl ...")`
- **Web search** → `hive_tools.web_search` — NOT scraping
- **System operations** (process exec, jobs, PTYs, raw fs search) → terminal-tools. This is its territory.
## The standard envelope
Every spawn-style call (`terminal_exec`, the auto-promoted job state) returns this shape:
```jsonc
{
"exit_code": 0, // null when auto-backgrounded or pre-spawn error
"stdout": "...", // decoded, truncated to max_output_kb (default 256 KB)
"stderr": "...",
"stdout_truncated_bytes": 0, // > 0 means more is in output_handle
"stderr_truncated_bytes": 0,
"runtime_ms": 42,
"pid": 12345,
"output_handle": null, // "out_<hex>" when truncated — paginate with terminal_output_get
"timed_out": false,
"semantic_status": "ok", // "ok" | "signal" | "error" — read THIS, not just exit_code
"semantic_message": null, // e.g. "No matches found" for grep exit 1
"warning": null, // e.g. "may force-remove files" for rm -rf
"auto_backgrounded": false,
"job_id": null // set when auto_backgrounded=true
}
```
## Auto-promotion (the core mental model)
`terminal_exec` runs commands in the foreground until the **auto-background budget** (default 30s) elapses. Past that point, the process is silently transferred to a background job and the call returns immediately with:
```jsonc
{ "auto_backgrounded": true, "exit_code": null, "job_id": "job_<hex>", ... }
```
When you see `auto_backgrounded: true`, **pivot to polling**. The job is still running:
```
terminal_job_logs(job_id, since_offset=0, wait_until_exit=true, wait_timeout_sec=60)
→ blocks server-side until the job exits or the timeout, returns logs + status
```
You're not failing — you're freed up to do other work while the long task runs.
To force pure-foreground (kill on `timeout_sec`), pass `auto_background_after_sec=0`. Use this when you genuinely don't want a background job (small commands where promotion would surprise you).
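The pivot, sketched in Python — `call_tool` here is a generic dispatcher standing in for however your runtime invokes MCP tools, not a real API; the envelope fields are as documented above:

```python
def run_and_wait(call_tool, command):
    """Run a command; if it auto-backgrounds, block on the job until it exits."""
    env = call_tool("terminal_exec", command=command)
    if not env.get("auto_backgrounded"):
        return env  # finished in the foreground — exit_code is set
    # The process was promoted to a background job: pivot to polling.
    return call_tool(
        "terminal_job_logs",
        job_id=env["job_id"],
        since_offset=0,
        wait_until_exit=True,
        wait_timeout_sec=60,
    )
```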
## Semantic exit codes — read `semantic_status`, not raw `exit_code`
Several common commands use exit 1 for legitimate non-error states:
| Command | exit 0 | exit 1 |
|---|---|---|
| `grep` / `rg` | matches found | **no matches** (not an error) |
| `find` | success | **some dirs unreadable** (informational) |
| `diff` | identical | **files differ** (informational) |
| `test` / `[` | true | **false** (informational) |
For these, `semantic_status` will be `"ok"` even when `exit_code == 1`, with `semantic_message` describing why ("No matches found"). For everything else, `semantic_status` defaults to `"ok"` on 0 and `"error"` on nonzero.
**Rule**: always check `semantic_status` first. Only fall back to `exit_code` when you need the exact number (e.g. distinguishing `make` errors).
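That rule as a tiny helper — a sketch over the documented envelope fields, not shipped code:

```python
def command_ok(envelope):
    # Read semantic_status first; fall back to exit_code only when it's absent.
    status = envelope.get("semantic_status")
    if status is not None:
        return status == "ok"
    return envelope.get("exit_code") == 0
```

With this, grep exiting 1 on "no matches" reads as success, while a real failure does not.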
## Destructive warnings — re-read your command
The envelope's `warning` field is set when the command matches a known destructive pattern (`rm -rf`, `git push --force`, `git reset --hard`, `DROP TABLE`, `kubectl delete`, `terraform destroy`, etc.). The command **still ran** — the warning is informational. Use it as a "did I mean to do that?" prompt before trusting subsequent steps that depend on the side effect.
If a `warning` appears unexpectedly, stop and verify: was the destructive action intended, or did a path/glob slip in?
## Output handles — never lose output
When `stdout_truncated_bytes > 0` or `stderr_truncated_bytes > 0`, the inline output was capped at `max_output_kb` (default 256 KB). The full bytes are stashed under `output_handle` for **5 minutes**. Paginate with:
```
terminal_output_get(output_handle, since_offset=0, max_kb=64)
→ { data, offset, next_offset, eof, expired }
```
Track `next_offset` across calls. If `expired: true`, re-run the command (the handle's TTL has lapsed).
The store has a 64 MB cap with LRU eviction. For huge outputs, prefer `terminal_job_start` + `terminal_job_logs` polling (4 MB ring buffer per stream, infinite total throughput).
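The pagination loop, sketched with the same illustrative `call_tool` dispatcher (the field names match the documented result shape):

```python
def read_full_output(call_tool, handle):
    """Drain a truncated output handle chunk by chunk, tracking next_offset."""
    chunks, offset = [], 0
    while True:
        page = call_tool(
            "terminal_output_get",
            output_handle=handle,
            since_offset=offset,
            max_kb=64,
        )
        if page.get("expired"):
            # The 5-minute TTL lapsed — the only recovery is re-running.
            raise RuntimeError("handle expired; re-run the command")
        chunks.append(page["data"])
        if page["eof"]:
            return "".join(chunks)
        offset = page["next_offset"]
```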
## Bash, not zsh — even on macOS
`terminal_exec` and `terminal_pty_open` always invoke `/bin/bash`. The user's `$SHELL` is ignored. Explicit `shell="/bin/zsh"` is **rejected** with a clear error. This is a deliberate security stance, not aesthetic — zsh has command/builtin classes (`zmodload`, `=cmd` expansion, `zpty`, `ztcp`, `zf_*`) that bypass bash-shaped checks. The `terminal-tools-pty-sessions` skill explains the implications for PTY sessions specifically.
`ZDOTDIR` and `ZSH_*` env vars are stripped before exec to prevent zsh dotfiles leaking in. Bash dotfiles still apply when invoked interactively (e.g. PTY sessions use `bash --norc --noprofile` to keep things predictable).
## Pipelines and complex commands
Pipes (`|`), redirects (`>`, `<`, `>>`), conditionals (`&&`, `||`, `;`), and globs (`*`, `?`, `[`) are detected automatically. You can pass them with the default `shell=False` and the runtime will transparently route through `/bin/bash -c` and surface `auto_shell: true` in the envelope:
```
terminal_exec("ps aux | sort -k3 -rn | head -40")
→ { exit_code: 0, stdout: "...", auto_shell: true, ... }
```
For simple argv commands (no metacharacters) `shell=False` is faster and direct-execs the binary. For commands with shell features the detector doesn't catch (rare — exotic bash builtins, here-strings), pass `shell=True` explicitly:
```
terminal_exec("set -e; complicated bash logic", shell=True)
```
Quoted strings work either way — the detector uses `shlex.split` which handles `"quoted args with spaces"` correctly.
## When to use what (cheat sheet)
| Need | Tool |
|---|---|
| One-shot command, ≤30s | `terminal_exec` |
| One-shot command, might be longer | `terminal_exec` (auto-promotes) |
| Long-running job from the start | `terminal_job_start` |
| State across calls (cd, env, REPL) | `terminal_pty_open` + `terminal_pty_run` |
| Search file contents (raw paths) | `terminal_rg` |
| Find files by predicate | `terminal_find` |
| Retrieve truncated output | `terminal_output_get` |
| Tree / stat / du | `terminal_exec("ls -la"/"stat foo"/"du -sh path")` |
| HTTP / DNS / ping / archives | `terminal_exec("curl ..."/"dig ..."/"tar xzf ...")` |
See `references/exit_codes.md` for the full POSIX + signal-induced + semantic catalog.
@@ -0,0 +1,50 @@
# Exit code reference
## POSIX conventions
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | General error / catchall |
| 2 | Misuse of shell builtins, syntax error |
| 126 | Command found but not executable |
| 127 | Command not found |
| 128 | Invalid argument to `exit` |
| 128 + N | Killed by signal N |
| 130 | Killed by SIGINT (Ctrl-C) |
| 137 | Killed by SIGKILL |
| 143 | Killed by SIGTERM |
| 255 | Exit status out of range |
When `exit_code < 0` in the envelope, the process was killed by a signal: `abs(exit_code)` is the signal number (subprocess uses negative codes for signaled exits, separate from the `128 + N` shell convention).
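Decoding that convention is one line with the stdlib — a hypothetical helper, shown for illustration:

```python
import signal

def describe_exit(exit_code):
    # Negative envelope codes mean a signal killed the process;
    # abs(exit_code) is the signal number.
    if exit_code is not None and exit_code < 0:
        return f"killed by {signal.Signals(-exit_code).name}"
    return f"exited with {exit_code}"
```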
## Semantic exits — when exit 1 is NOT an error
terminal-tools encodes these in `semantic_status`. The agent should read `semantic_status` first.
| Command | Code 0 | Code 1 | Code ≥2 |
|---|---|---|---|
| `grep` / `rg` / `ripgrep` | matches found | **no matches** (ok) | error |
| `find` | success | **some dirs unreadable** (ok) | error |
| `diff` | files identical | **files differ** (ok) | error |
| `test` / `[` | condition true | **condition false** (ok) | error |
For any command not in this table, the default convention applies (0 = ok, nonzero = error).
## When `exit_code` is `null`
- `auto_backgrounded: true` — the process is still running under a `job_id`. Poll with `terminal_job_logs`.
- Pre-spawn error (command not found, exec failed) — see `error` field in the envelope.
- `timed_out: true` and the process refused to die — extremely rare; the kernel has the answer.
## Common signal-induced exits
| Signal | Number | Subprocess exit | Shell exit | Meaning |
|---|---|---|---|---|
| SIGHUP | 1 | -1 | 129 | Terminal hangup |
| SIGINT | 2 | -2 | 130 | Interrupt (Ctrl-C) |
| SIGQUIT | 3 | -3 | 131 | Quit (Ctrl-\\) |
| SIGKILL | 9 | -9 | 137 | Forced kill (uncatchable) |
| SIGTERM | 15 | -15 | 143 | Polite termination |
| SIGSEGV | 11 | -11 | 139 | Segmentation fault |
| SIGABRT | 6 | -6 | 134 | Abort (assertion failed, etc.) |
@@ -0,0 +1,96 @@
---
name: hive.terminal-tools-fs-search
description: Use terminal_rg / terminal_find when you need raw filesystem search outside the project tree — system configs, /var/log, /etc, archive contents — or when files-tools.search_files is too project-scoped. Teaches the rg vs find vs terminal_exec("ls/du/tree") split, common rg flag combos for code/logs/configs, find predicates for mtime/size/type queries, and the rule that for tree views or single-file stat info you should just use terminal_exec instead of inventing a tool. Read before reaching for raw shell to grep or find anything.
metadata:
author: hive
type: preset-skill
version: "1.0"
---
# Filesystem search
terminal-tools provides two structured search tools: `terminal_rg` (ripgrep for content) and `terminal_find` (find for predicates). Everything else (tree, stat, du) is just `terminal_exec`.
## When to use what
| Task | Tool |
|---|---|
| Find code/text matching a pattern in your **project** | `files-tools.search_files` (project-aware, ranks by relevance) |
| Find code/text matching a pattern in `/var/log`, `/etc`, archives, system dirs | `terminal_rg` |
| Find files matching name/glob/predicate | `terminal_find` |
| List a directory | `terminal_exec("ls -la /path")` |
| Tree view | `terminal_exec("tree -L 2 /path")` |
| Single-path stat | `terminal_exec("stat /path")` |
| Disk usage | `terminal_exec("du -sh /path")` or `terminal_exec("du -h --max-depth=2 /")` |
| Count matches across files | `terminal_rg(pattern, extra_args=["-c"])` |
## `terminal_rg` — content search
ripgrep is fast, gitignore-aware, and has a deep flag surface. The structured wrapper exposes the most useful flags directly; `extra_args` covers the rest.
### Common patterns
```
# All Python files containing "TODO"
terminal_rg(pattern="TODO", path=".", type_filter="py")
# Case-insensitive, with context
terminal_rg(pattern="error", path="/var/log", ignore_case=True, context=2)
# Search hidden files (rg ignores them by default)
terminal_rg(pattern="api_key", path="~", hidden=True)
# Don't respect .gitignore (find files git would ignore)
terminal_rg(pattern="generated", path=".", no_ignore=True)
# Multi-line pattern (e.g., function definitions spanning lines)
terminal_rg(pattern=r"def\s+\w+\(.*\n.*\n", path="src", extra_args=["--multiline"])
# Specific filename glob
terminal_rg(pattern="version", path=".", glob="*.toml")
```
### rg flag idioms
| Flag | Effect |
|---|---|
| `-tpy` (`type_filter="py"`) | Only Python files |
| `-uu` | Don't respect any ignores (incl. `.git/`) |
| `--multiline` (`extra_args`) | Allow regex spanning lines |
| `--max-count` (`max_count`) | Stop after N matches per file |
| `--max-depth` (`max_depth`) | Limit recursion |
| `-w` (`extra_args`) | Whole word match |
| `-F` (`extra_args`) | Fixed string (no regex) |
See `references/ripgrep_cheatsheet.md` for the long form.
## `terminal_find` — predicate search
`find` excels at "files matching N criteria". The wrapper surfaces the most common predicates; combine via the structured arguments.
```
# All .log files modified in the last 7 days, larger than 1MB
terminal_find(path="/var/log", iname="*.log", mtime_days=7, size_kb_min=1024)
# All directories named ".git" (find Git repos under a tree)
terminal_find(path="~/projects", name=".git", type_filter="d")
# Only the top three levels
terminal_find(path="/etc", max_depth=3, type_filter="f")
# Symlinks
terminal_find(path=".", type_filter="l")
```
See `references/find_predicates.md` for combinations not directly exposed.
## Output truncation
Both tools return `truncated: true` when their output exceeded the inline cap. For `terminal_rg`, this means matches were dropped (refine the pattern or narrow the path); for `terminal_find`, results past `max_results` (default 1000) are dropped. Tighten predicates rather than raising the cap.
## Anti-patterns
- **Don't `terminal_rg` your project tree** — `files-tools.search_files` is project-aware and ranks results.
- **Don't reach for `terminal_find` to list one directory** — `terminal_exec("ls -la /path")` is shorter.
- **Don't use `terminal_exec("grep ...")`** when `terminal_rg` exists — rg is faster, gitignore-aware, and returns structured matches.
- **Don't use `terminal_exec("find ...")`** to invent your own predicate combinations — use `terminal_find` and report missing capabilities.
@@ -0,0 +1,78 @@
# find predicate reference
The `terminal_find` wrapper exposes name/iname, type, mtime_days, size bounds, max_depth, max_results. For combinations beyond that, drop to `terminal_exec("find ...")`.
## Time predicates
| Need | find predicate |
|---|---|
| Modified within N days | `-mtime -N` (wrapper: `mtime_days=N`) |
| Modified more than N days ago | `-mtime +N` |
| Modified exactly N days ago | `-mtime N` |
| Accessed within N days | `-atime -N` |
| Inode changed within N days | `-ctime -N` |
| Modified in last N minutes | `-mmin -N` |
| Newer than reference file | `-newer ref` |
## Size predicates
| Need | find predicate |
|---|---|
| Bigger than N kilobytes | `-size +Nk` (wrapper: `size_kb_min`) |
| Smaller than N kilobytes | `-size -Nk` (wrapper: `size_kb_max`) |
| Exactly N kilobytes | `-size Nk` |
| Bigger than N megabytes | `-size +NM` |
| Empty files | `-empty` |
## Type predicates
| Need | find predicate |
|---|---|
| Regular file | `-type f` (wrapper: `type_filter="f"`) |
| Directory | `-type d` (wrapper: `type_filter="d"`) |
| Symlink | `-type l` (wrapper: `type_filter="l"`) |
| Block device | `-type b` |
| Character device | `-type c` |
| FIFO | `-type p` |
| Socket | `-type s` |
## Permission predicates
| Need | find predicate |
|---|---|
| Owned by user | `-user alice` |
| Owned by group | `-group dev` |
| Permission bits exact | `-perm 644` |
| Has any of these bits | `-perm /u+x` |
| Has all of these bits | `-perm -u+x` |
| Readable by current user | `-readable` |
| Writable | `-writable` |
| Executable | `-executable` |
## Composing
`find` evaluates predicates left-to-right with implicit AND. For OR, group with `\( ... -o ... \)`.
```
# .log OR .txt (drop to terminal_exec for OR)
terminal_exec(r"find /path \( -name '*.log' -o -name '*.txt' \) -type f", shell=True)
# NOT in a directory called node_modules
terminal_exec("find . -path '*/node_modules' -prune -o -name '*.js' -print", shell=True)
```
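The same OR-grouping works from plain `subprocess` if you skip the shell and pass argv directly — the parentheses become literal arguments, so no backslash-escaping is needed. A minimal sketch (the scratch-tree setup is illustrative):

```python
import subprocess
import tempfile
from pathlib import Path

# Scratch tree: one .log, one .txt, one .py
root = Path(tempfile.mkdtemp())
for name in ("a.log", "b.txt", "c.py"):
    (root / name).touch()

# argv form of: find ROOT \( -name '*.log' -o -name '*.txt' \) -type f
# No shell involved, so "(" and ")" need no escaping.
result = subprocess.run(
    ["find", str(root), "(", "-name", "*.log", "-o", "-name", "*.txt", ")", "-type", "f"],
    capture_output=True, text=True, check=True,
)
matches = sorted(Path(p).name for p in result.stdout.splitlines())
print(matches)  # → ['a.log', 'b.txt']
```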
## Actions
| Need | predicate |
|---|---|
| Print path (default) | (implicit `-print`) |
| Print null-separated | `-print0` (for piping to xargs -0) |
| Delete | `-delete` (DANGEROUS — use terminal_exec with explicit confirmation) |
| Run command per match | `-exec cmd {} \;` (drop to terminal_exec) |
| Run command, batched | `-exec cmd {} +` |
## When NOT to use find
- **One directory listing**: `terminal_exec("ls -la /path")`
- **Recursive grep**: `terminal_rg`
- **Count files**: `terminal_exec("find /path -type f | wc -l")`
@@ -0,0 +1,70 @@
# ripgrep cheatsheet
For when the structured `terminal_rg` flags don't cover the case. Pass via `extra_args=[...]`.
## Filtering
| Need | Flag |
|---|---|
| Whole word | `-w` |
| Fixed string (no regex) | `-F` |
| Match files only (paths, not lines) | `-l` |
| Count matches per file | `-c` |
| Print only filenames with no matches | `--files-without-match` |
| Exclude binary files | (default) |
| Include binaries | `--binary` |
| Search archives transparently | (rg doesn't — extract first) |
## Output shape
| Need | Flag |
|---|---|
| Show only matched part | `-o` |
| Show byte offset of match | `-b` |
| No filename prefix | `-I` (`--no-filename`) |
| Color always (for piping into a colorizer) | `--color=always` |
| JSON output | (the wrapper already uses `--json` internally) |
## Boundaries
| Need | Flag |
|---|---|
| Line-by-line (default) | (default) |
| Multi-line regex | `--multiline` (or `-U`) |
| Multi-line dotall (`.` matches `\n`) | `--multiline-dotall` |
| Crlf line endings | `--crlf` |
## Path control
| Need | Flag |
|---|---|
| Follow symlinks | `-L` |
| Don't follow | (default) |
| Search hidden | `-.` (also expressed as `hidden=True`) |
| Don't respect any ignores | `-uuu` |
| Glob include | `-g 'pattern'` (also `glob="..."`) |
| Glob exclude | `-g '!pattern'` |
## Performance
| Need | Flag |
|---|---|
| One thread | `-j 1` |
| Force memory-mapped search | `--mmap` (rg's default heuristic is usually fine) |
| Per-file match cap | `-m N` (also `max_count=N`) |
## Common composed queries
```
# Find unused imports in Python
terminal_rg(pattern=r"^import\s+\w+$", path="src", type_filter="py")
# All TODO/FIXME/XXX with file:line
terminal_rg(pattern=r"\b(TODO|FIXME|XXX)\b", path=".", extra_args=["-n"])
# Functions defined at module top-level
terminal_rg(pattern=r"^def\s+\w+", path=".", type_filter="py")
# Lines that DON'T match a pattern
terminal_rg(pattern=r"TODO", path=".", extra_args=["--invert-match"])
```
@@ -0,0 +1,110 @@
---
name: hive.terminal-tools-job-control
description: Use when launching anything that runs longer than a minute, anything that streams logs, anything you want to keep running while doing other work — or when terminal_exec auto-backgrounded on you and returned a job_id. Teaches the start→poll→wait pattern with terminal_job_logs offset bookkeeping, the `wait_until_exit=True` blocking-poll idiom, the truncated_bytes_dropped resumption signal, the merge_stderr decision, the SIGINT→SIGTERM→SIGKILL escalation ladder via terminal_job_manage, and the hard rule that jobs die when the terminal-tools server restarts. Read before calling terminal_job_start, or right after terminal_exec auto-backgrounded.
metadata:
author: hive
type: preset-skill
version: "1.0"
---
# Background job control
Background jobs are how you do things that take time without blocking your conversation. Three tools cover the surface: `terminal_job_start`, `terminal_job_logs`, `terminal_job_manage`.
## When to use a job
- Builds, deploys, long tests
- Processes you want to monitor (streaming a log file, a dev server)
- Anything that auto-backgrounded from `terminal_exec` (you have a `job_id`; pivot to this skill's idioms)
For one-shot work expected to finish quickly, `terminal_exec` is simpler. The auto-promotion mechanic in `terminal_exec` is your safety net — start with `terminal_exec`, take over with this skill if needed.
## Lifecycle
```
terminal_job_start(command, ...)
→ { job_id, pid, started_at }
terminal_job_logs(job_id, since_offset=0, max_bytes=64000)
→ { data, offset, next_offset, status: "running"|"exited", exit_code, ... }
# Repeat with since_offset = previous next_offset until status == "exited"
# Or block once with wait_until_exit=True:
terminal_job_logs(job_id, since_offset=N, wait_until_exit=True, wait_timeout_sec=60)
→ blocks server-side until exit or timeout
```
After exit, the job is retained for inspection (`terminal_job_manage(action="list")`) until evicted by FIFO (50 most recent exits kept).
## Offset bookkeeping — the only rule that matters
The job's output lives in a 4 MB ring buffer per stream. Each call to `terminal_job_logs` returns:
- `data` — bytes between `since_offset` and `next_offset`
- `next_offset` — pass this as `since_offset` on your next call
- `truncated_bytes_dropped` — non-zero when your `since_offset` was older than the ring's floor (you fell behind)
**Always carry `next_offset` forward.** Don't replay from 0 — that's an offset reset, you'll see the same data twice and miss the part that fell off.
When `truncated_bytes_dropped > 0`, the buffer evicted N bytes between your last call and now. Treat it as a signal that the job is producing output faster than you're consuming. Either poll more often or accept the gap and read from `next_offset` going forward.
## merge_stderr — interleaved or separate
```
merge_stderr=False → two streams, request "stdout" or "stderr" by name
merge_stderr=True → one stream ("merged"), order preserved
```
Pick `merge_stderr=True` when:
- The job's logs are designed to be read together (most servers, build tools)
- You don't need to distinguish "this was stderr"
Pick `merge_stderr=False` when:
- stderr is genuinely error-only and stdout is data
- You'll process them differently
## Signal escalation
```
terminal_job_manage(action="signal_int", job_id=...) # graceful (Ctrl-C-equivalent)
terminal_job_manage(action="signal_term", job_id=...) # polite kill (SIGTERM)
terminal_job_manage(action="signal_kill", job_id=...) # forced kill (SIGKILL, uncatchable)
```
The idiom: `signal_int` → wait 2-5s → `signal_term` → wait 2-5s → `signal_kill`. Most well-behaved processes handle SIGINT (graceful) and SIGTERM (cleanup, then exit). SIGKILL bypasses cleanup — use only when the process is truly unresponsive.
After signaling, check exit with `terminal_job_logs(job_id, wait_until_exit=True, wait_timeout_sec=2)`.
## Stdin
```
terminal_job_manage(action="stdin", job_id=..., data="some input\n")
terminal_job_manage(action="close_stdin", job_id=...)
```
For tools that read stdin to EOF, `close_stdin` after writing flushes them. For interactive tools that read line-by-line, just write each line.
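The same EOF mechanic in plain `subprocess` terms: a read-to-EOF tool (here `cat`, as a stand-in for whatever the job runs) only emits once stdin is closed, which is exactly what `close_stdin` does for a job:

```python
import subprocess

# cat reads stdin to EOF and echoes it back; it won't finish until
# stdin closes — mirroring the close_stdin-after-writing idiom.
proc = subprocess.Popen(
    ["cat"], stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
)
proc.stdin.write("some input\n")
proc.stdin.close()           # EOF: cat flushes and exits
output = proc.stdout.read()  # output == "some input\n"
proc.wait()
```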
## Take-over: when terminal_exec auto-backgrounds
When `terminal_exec` returned `auto_backgrounded: true, job_id: <X>`, the process is **already** in the JobManager with its output flowing into the ring buffer. Your transition is seamless:
```
# Already saw the start of output in terminal_exec's stdout/stderr.
# Pick up reading where the env left off — use the byte count of the
# initial stdout as your since_offset, OR just request tail output:
terminal_job_logs(job_id="job_xxx", tail=True, max_bytes=64000)
```
Or block until exit and grab everything:
```
terminal_job_logs(job_id="job_xxx", since_offset=0, wait_until_exit=True, wait_timeout_sec=120)
```
## Hard rules
- **Jobs die when the server restarts.** The desktop runtime restarts terminal-tools when Hive restarts. There's no re-attach. If you need durability, use `nohup` + `terminal_exec` to detach into the system's process tree and track the PID yourself.
- **Server-wide hard cap on concurrent jobs** (`TERMINAL_TOOLS_MAX_JOBS`, default 32). Past the cap, `terminal_job_start` returns an error. Wait for jobs to exit or kill old ones.
- **No cross-restart output.** Output handles and ring buffers are in-memory only.
See `references/signals.md` for the full signal catalog.
@@ -0,0 +1,41 @@
# Signal reference
terminal_job_manage exposes six signals via the action name.
| Action | Signal | Number | Purpose | Catchable? |
|---|---|---|---|---|
| `signal_int` | SIGINT | 2 | Interrupt — Ctrl-C equivalent. Most CLIs treat as "stop gracefully". | Yes |
| `signal_term` | SIGTERM | 15 | Polite termination request. Default for `kill`. | Yes |
| `signal_kill` | SIGKILL | 9 | Forced kill. Process can't catch, clean up, or finalize. Use sparingly. | **No** |
| `signal_hup` | SIGHUP | 1 | Hangup. Many daemons reload config on this. | Yes |
| `signal_usr1` | SIGUSR1 | 10 (Linux; 30 on macOS) | User-defined #1. Common: dump state, rotate logs (nginx, etc). | Yes |
| `signal_usr2` | SIGUSR2 | 12 (Linux; 31 on macOS) | User-defined #2. Common: graceful binary upgrade (unicorn, etc). | Yes |
## Escalation idiom
```
1. signal_int (Ctrl-C — graceful)
2. wait 2-5s, check status with terminal_job_logs(wait_until_exit=True, wait_timeout_sec=3)
3. if still running: signal_term (cleanup-then-exit)
4. wait 2-5s
5. if still running: signal_kill (forced)
```
The waits matter: SIGTERM handlers do real work (flush logs, close DBs, release locks) and need time. Skipping straight to SIGKILL leaks resources.
## When to use SIGUSR1 / SIGUSR2
These are application-defined. Read the target's docs first. Common:
- **nginx**: SIGUSR1 → reopen log files (for log rotation)
- **unicorn / puma**: SIGUSR2 → fork a new master with the latest binary (graceful restart)
- **rsync**: SIGUSR1 → print stats so far
## Reading exit codes after a signal
When a job exits via signal, `terminal_job_logs` returns `exit_code: -N` (subprocess convention) where `abs(N)` is the signal number. The shell convention `128 + N` doesn't apply to the JobManager — that's for shell-spawned children.
| exit_code | Means |
|---|---|
| -2 | Killed by SIGINT |
| -9 | Killed by SIGKILL |
| -15 | Killed by SIGTERM |
@@ -0,0 +1,127 @@
---
name: hive.terminal-tools-pty-sessions
description: Use when you need state across calls — building env vars, navigating with cd, driving REPLs (python -i, mysql, psql, node), or responding to interactive prompts (sudo password, ssh host-key confirmation, mysql connection). Teaches the prompt-sentinel exec pattern (default mode), raw I/O for REPLs (raw_send=True then read_only=True), the one-in-flight-per-session rule, and the close-or-leak-against-the-cap discipline. Bash on macOS — never zsh; explicit shell=/bin/zsh is rejected. Read before calling terminal_pty_open.
metadata:
author: hive
type: preset-skill
version: "1.0"
---
# Persistent PTY sessions
PTY sessions are how you talk to interactive programs — programs that detect a terminal (`isatty()`) and behave differently when they don't see one. Use a session when:
- You need state to persist across calls (`cd`, env vars, sourced scripts)
- You're driving a REPL (`python -i`, `mysql`, `psql`, `node`, `irb`)
- A program demands an interactive prompt (`sudo`, `ssh`, `npm login`, `gh auth login`)
For everything else, `terminal_exec` is simpler. Sessions cost more (per-session bash process, ring buffer, idle-reaping bookkeeping) and have a hard cap (`TERMINAL_TOOLS_MAX_PTY`, default 8).
## Why PTY (and not subprocess pipes)
Subprocess pipes break on every interactive program. The moment a program calls `isatty()` and sees False, it disables prompts, color, line-editing, password masking, progress bars — sometimes refuses to start. PTY makes us look like a real terminal so these programs work the same as in your shell.
The cost: PTY output includes terminal escape codes (cursor moves, color codes). The session captures them as-is; if you need clean text, strip ANSI escapes in your processing layer.
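Stripping those escapes is a small regex job. A pragmatic pattern covering CSI sequences (color, cursor moves), OSC sequences, and single-character escapes — not a full ECMA-48 parser, but enough for typical session output:

```python
import re

# CSI (ESC [ ... final byte), OSC (ESC ] ... BEL), and two-char escapes.
ANSI_RE = re.compile(r"\x1b(?:\[[0-9;?]*[ -/]*[@-~]|\][^\x07]*\x07|[@-Z\\-_])")

def strip_ansi(text: str) -> str:
    """Remove terminal escape sequences captured from a PTY stream."""
    return ANSI_RE.sub("", text)

raw = "\x1b[1;32mPASS\x1b[0m 12 tests"
print(strip_ansi(raw))  # → "PASS 12 tests"
```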
## Bash on macOS — by deliberate policy
`terminal_pty_open` always invokes `/bin/bash`, regardless of the user's `$SHELL`. macOS users: yes, even when zsh is your interactive default. This is the **terminal-tools-foundations** policy applied to PTYs.
Reasons:
- zsh has command/builtin classes (`zmodload`, `=cmd` expansion, `zpty`, `ztcp`) that bypass bash-shaped security checks
- One shell behavior across platforms eliminates "works on Linux, breaks on macOS" surprises
- Bash is universal: any shell you've used will accept the bash subset
The bash invocation uses `--norc --noprofile` so user dotfiles don't leak in. PS1 is set to a unique sentinel for prompt detection. PS2 is empty. PROMPT_COMMAND is empty.
## Three modes of `terminal_pty_run`
### 1. Default: send command, wait for prompt sentinel
```
terminal_pty_run(session_id, command="ls -la")
→ { output, prompt_after: True, ... }
```
The session writes `ls -la\n`, waits for the sentinel that its custom PS1 emits, returns the slice between submission and prompt. **One in-flight call per session** — a concurrent call returns a `"session busy"` error.
### 2. raw_send: send raw input, no waiting
```
terminal_pty_run(session_id, command="print('hi')\n", raw_send=True)
→ { bytes_sent: 12 }
```
For REPLs, vim keystrokes, password prompts. The session writes the bytes and returns immediately — it doesn't wait for a prompt (REPLs don't print bash's prompt; they print their own).
After a `raw_send`, you typically follow with:
### 3. read_only: drain currently-buffered output
```
terminal_pty_run(session_id, read_only=True, timeout_sec=2)
→ { output: "hi\n", more: False, ... }
```
Reads whatever the session has accumulated since the last drain, with a brief settle window. Use after raw_send to capture the REPL's response.
## Custom prompt detection (`expect`)
When the command launches a program with its own prompt (Python REPL's `>>> `, mysql's `mysql> `, sudo's password prompt), the bash sentinel won't appear until the program exits. Override:
```
terminal_pty_run(session_id, command="python3", expect=r">>>\s*$", timeout_sec=10)
→ output up to and including ">>>", then control returns
```
For sudo:
```
terminal_pty_run(session_id, command="sudo -k && sudo whoami", expect=r"[Pp]assword:")
terminal_pty_run(session_id, command="<password>\n", raw_send=True)
terminal_pty_run(session_id, read_only=True, timeout_sec=5)
```
(Treat passwords carefully — they end up in the ring buffer.)
## Always close
```
terminal_pty_close(session_id)
```
Leaked sessions count against `TERMINAL_TOOLS_MAX_PTY` (default 8). Idle reaping happens lazily on every `_open` call (sessions inactive longer than `idle_timeout_sec`, default 1800s, are dropped) — but don't rely on it. Close when you're done.
For unresponsive sessions, `force=True` skips the graceful "exit" attempt and goes straight to SIGTERM/SIGKILL.
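One way to make the close unconditional is a context manager around the open/close calls. The `open_fn`/`close_fn` callables here are placeholders for however your harness invokes `terminal_pty_open` and `terminal_pty_close`:

```python
from contextlib import contextmanager

@contextmanager
def pty_session(open_fn, close_fn):
    """Guarantee the close runs even when work in between raises."""
    sid = open_fn()
    try:
        yield sid
    finally:
        close_fn(sid)

# Usage with stand-in callables:
closed = []
with pty_session(lambda: "sess_1", closed.append) as sid:
    pass  # run terminal_pty_run calls against sid here

print(closed)  # → ['sess_1']
```

The `finally` branch fires on exceptions too, so a failed `_run` mid-session can't leak against the cap.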
## Common patterns
### Stateful navigation
```
sid = terminal_pty_open(cwd="/")
terminal_pty_run(sid, command="cd /var/log")
terminal_pty_run(sid, command="ls -la *.log | head")
terminal_pty_close(sid)
```
### Python REPL
```
sid = terminal_pty_open()
terminal_pty_run(sid, command="python3", expect=r">>>\s*$")
terminal_pty_run(sid, command="x = 42", raw_send=True)
terminal_pty_run(sid, command="print(x*x)\n", raw_send=True)
result = terminal_pty_run(sid, read_only=True) # → "1764\n>>> "
terminal_pty_run(sid, command="exit()", raw_send=True)
terminal_pty_close(sid)
```
### ssh with host-key prompt
```
sid = terminal_pty_open()
terminal_pty_run(sid, command="ssh user@new-host", expect=r"\(yes/no.*\)\?")
terminal_pty_run(sid, command="yes\n", raw_send=True)
terminal_pty_run(sid, read_only=True, timeout_sec=10) # password prompt or login
```
@@ -0,0 +1,92 @@
---
name: hive.terminal-tools-troubleshooting
description: Read when a terminal-tools call returned something surprising — empty stdout despite no error, exit_code is null, output_handle came back expired, "too many jobs" / "session busy" / "too many PTYs", warning was set unexpectedly, semantic_status disagrees with exit_code. Diagnostic recipes only — load on demand. Don't preload; the foundational skill covers the happy path.
metadata:
author: hive
type: preset-skill
version: "1.0"
---
# Troubleshooting terminal-tools
Recipes for surprising results. Match the symptom to the section.
## Empty `stdout` despite the command "should have" produced output
Possible causes:
1. Output went to **stderr** instead. Check `stderr` in the envelope (or use `merge_stderr=True` for jobs).
2. Output was **fully truncated** because `max_output_kb` is too small. Check `stdout_truncated_bytes > 0`. Bump `max_output_kb` or paginate via `output_handle`.
3. Command produced no output (correct, just unexpected — `silent` flags, no matches).
4. Pipeline issue: the last stage of the pipe ran but its stdout was redirected elsewhere (`> /dev/null`, `> file`).
5. Process is buffering its output and didn't flush before exit. Add `stdbuf -oL` (line-buffered) or `unbuffer` to the command.
## `exit_code: null`
| Cause | Other field |
|---|---|
| Auto-backgrounded | `auto_backgrounded: true, job_id: <X>` |
| Hard timeout, process killed | `timed_out: true` |
| Pre-spawn failure (command not found) | `error: ...` set, `pid: null` |
| Still running (in `terminal_job_logs`) | `status: "running"` |
## `output_handle` returned `expired: true`
5-minute TTL. Either (a) you waited too long, or (b) the store evicted it under memory pressure (64 MB total cap, LRU eviction). Re-run the command.
To reduce risk: paginate the handle as soon as you receive it, or use `terminal_job_*` for huge outputs (4 MB ring buffer with offsets — no expiry).
## "too many jobs" / `JobLimitExceeded`
`TERMINAL_TOOLS_MAX_JOBS` (default 32) hit. Either:
- Wait for jobs to exit (poll with `terminal_job_logs(wait_until_exit=True)`)
- Kill old jobs: `terminal_job_manage(action="list")` to see what's running, then `signal_term` the abandoned ones
- Raise the cap via env (rare)
## "session busy"
A `terminal_pty_run` was issued while another `_run` is in flight on the same session. PTY sessions are single-threaded conversations. Wait for the prior call to return, or open a second session.
## "PTY cap reached"
`TERMINAL_TOOLS_MAX_PTY` (default 8) hit. Close idle sessions (`terminal_pty_close`). Idle reaping only runs lazily inside `_open`, and `_open` itself errors at the cap, so you can't force a reap — close sessions manually.
## `warning` is set, the command worked
Informational only. The pattern matched (e.g. `rm -rf` literally appears, or `git push --force` was used). The command ran. The warning is your "did I mean to do that?" prompt — verify the side effect was intended before continuing.
## `semantic_status: "ok"` but `exit_code: 1`
Working as designed. Some commands use exit 1 for legitimate non-error states:
- `grep` / `rg` exit 1 when **no matches** found
- `find` exit 1 when **some directories were unreadable** (typical on `/proc`, etc.)
- `diff` exit 1 when **files differ**
- `test` / `[` exit 1 when **condition is false**
The `semantic_message` field explains. Trust `semantic_status`, not raw `exit_code`.
## `semantic_status: "error"` but `exit_code: 0`
Shouldn't happen. If it does, file a bug.
## `truncated_bytes_dropped > 0` in `terminal_job_logs`
Your `since_offset` was older than the ring buffer's floor — bytes evicted before you could read them. Either:
- Poll faster (lower latency between calls)
- Use `merge_stderr=True` (single 4 MB ring instead of 4 MB × 2)
- Accept the gap and move forward from `next_offset`
## `terminal_pty_open` succeeds but the first `_run` times out
The session may not have produced its first prompt sentinel within the 2-second startup window. Try:
- A `terminal_pty_run(sid, read_only=True, timeout_sec=2)` to drain whatever's accumulated
- A noop command (`terminal_pty_run(sid, command="true")`) to force a prompt cycle
Could also indicate the bash process died at startup — `terminal_pty_run(sid, ...)` would then return `"session has exited"`.
## `shell="/bin/zsh"` returned an error
By design. terminal-tools is bash-only on POSIX. Use `shell=True` (default `/bin/bash`) or omit `shell=` to exec directly.
## A command in `shell=True` is interpreted differently than expected
Bash, not zsh, semantics. `**/*` doesn't recurse without `shopt -s globstar`; `=cmd` expansion doesn't work; arrays are 0-indexed in bash where zsh's are 1-indexed. When in doubt, the foundational skill's "bash, not zsh" section is the canonical statement.
+1 -1
View File
@@ -146,7 +146,7 @@ def write_skill(
``target_root`` is the parent scope dir (e.g.
``~/.hive/agents/queens/{id}/skills`` or
``{colony_dir}/.hive/skills``). The function creates it if needed.
``{colony_dir}/skills``). The function creates it if needed.
Returns ``(installed_path, error, replaced)``. On success ``error`` is
``None``; on failure ``installed_path`` is ``None`` and the target is
+6 -2
View File
@@ -136,8 +136,12 @@ class SkillDiscovery:
self._scanned_dirs.append(user_agents)
all_skills.extend(self._scan_scope(user_agents, "user"))
# Hive-specific (higher precedence within user scope)
user_hive = home / ".hive" / "skills"
# Hive-specific (higher precedence within user scope). Honors
# HIVE_HOME so the desktop's per-user root (set via env) wins
# over the shared ``~/.hive`` location.
from framework.config import HIVE_HOME
user_hive = HIVE_HOME / "skills"
if user_hive.is_dir():
self._scanned_dirs.append(user_hive)
all_skills.extend(self._scan_scope(user_hive, "user"))
+9 -5
View File
@@ -15,14 +15,18 @@ import subprocess
import tempfile
from pathlib import Path
from framework.config import HIVE_HOME
from framework.skills.parser import ParsedSkill
from framework.skills.skill_errors import SkillError, SkillErrorCode
# Default install destination for user-scope skills
USER_SKILLS_DIR = Path.home() / ".hive" / "skills"
# Default install destination for user-scope skills.
# Anchored on HIVE_HOME so the desktop shell can override the install
# root via $HIVE_HOME without patching every call site.
USER_SKILLS_DIR = HIVE_HOME / "skills"
# Sentinel file for the one-time security notice on first install (NFR-5).
INSTALL_NOTICE_SENTINEL = HIVE_HOME / ".install_notice_shown"
# Sentinel file for the one-time security notice on first install (NFR-5)
INSTALL_NOTICE_SENTINEL = Path.home() / ".hive" / ".install_notice_shown"
_INSTALL_NOTICE = """\
@@ -44,7 +48,7 @@ _INSTALL_NOTICE = """\
def maybe_show_install_notice() -> None:
"""Print a one-time security notice before the first skill install (NFR-5).
Touches a sentinel file in ~/.hive/ after showing the notice so it is
Touches a sentinel file in $HIVE_HOME after showing the notice so it is
only displayed once across all future installs.
"""
if INSTALL_NOTICE_SENTINEL.exists():
+16 -4
View File
@@ -26,9 +26,21 @@ _DEFAULT_REGISTRY_URL = (
"https://raw.githubusercontent.com/hive-skill-registry/hive-skill-registry/main/skill_index.json"
)
_CACHE_DIR = Path.home() / ".hive" / "registry_cache"
_CACHE_INDEX_PATH = _CACHE_DIR / "skill_index.json"
_CACHE_METADATA_PATH = _CACHE_DIR / "metadata.json"
def _cache_dir() -> Path:
from framework.config import HIVE_HOME
return HIVE_HOME / "registry_cache"
def _cache_index_path() -> Path:
return _cache_dir() / "skill_index.json"
def _cache_metadata_path() -> Path:
return _cache_dir() / "metadata.json"
_CACHE_TTL_SECONDS = 3600 # 1 hour
@@ -46,7 +58,7 @@ class RegistryClient:
cache_dir: Path | None = None,
) -> None:
self._url = registry_url or os.environ.get("HIVE_REGISTRY_URL", _DEFAULT_REGISTRY_URL)
cache_root = cache_dir or _CACHE_DIR
cache_root = cache_dir or _cache_dir()
self._index_path = cache_root / "skill_index.json"
self._metadata_path = cache_root / "metadata.json"
+2
View File
@@ -33,6 +33,8 @@ _BUNDLED_DIRS: tuple[Path, ...] = (
# (tool-name prefix, skill directory name, display name)
_TOOL_GATED_SKILLS: list[tuple[str, str, str]] = [
("browser_", "browser-automation", "hive.browser-automation"),
("terminal_", "terminal-tools-foundations", "hive.terminal-tools-foundations"),
("chart_", "chart-creation-foundations", "hive.chart-creation-foundations"),
]
_BODY_CACHE: dict[str, str] = {}
+13 -6
View File
@@ -20,6 +20,7 @@ from enum import StrEnum
from pathlib import Path
from urllib.parse import urlparse
from framework.config import HIVE_HOME
from framework.skills.parser import ParsedSkill
logger = logging.getLogger(__name__)
@@ -30,8 +31,11 @@ _ENV_TRUST_ALL = "HIVE_TRUST_PROJECT_SKILLS"
# Env var for comma-separated own-remote glob patterns (e.g. "github.com/myorg/*").
_ENV_OWN_REMOTES = "HIVE_OWN_REMOTES"
_TRUSTED_REPOS_PATH = Path.home() / ".hive" / "trusted_repos.json"
_NOTICE_SENTINEL_PATH = Path.home() / ".hive" / ".skill_trust_notice_shown"
# Persisted store of trusted git remotes (one-shot consent per repo).
_TRUSTED_REPOS_PATH = HIVE_HOME / "trusted_repos.json"
# Sentinel for the one-time security notice (NFR-5).
_NOTICE_SENTINEL_PATH = HIVE_HOME / ".skill_trust_notice_shown"
# ---------------------------------------------------------------------------
@@ -224,7 +228,9 @@ class ProjectTrustDetector:
patterns.extend(p.strip() for p in raw.split(",") if p.strip())
# From ~/.hive/own_remotes file
own_remotes_file = Path.home() / ".hive" / "own_remotes"
from framework.config import HIVE_HOME
own_remotes_file = HIVE_HOME / "own_remotes"
if own_remotes_file.is_file():
try:
for line in own_remotes_file.read_text(encoding="utf-8").splitlines():
@@ -415,7 +421,8 @@ class TrustGate:
def _maybe_show_security_notice(self, Colors) -> None: # noqa: N803
"""Show the one-time security notice if not already shown (NFR-5)."""
if _NOTICE_SENTINEL_PATH.exists():
sentinel = _NOTICE_SENTINEL_PATH
if sentinel.exists():
return
self._print("")
self._print(
@@ -427,8 +434,8 @@ class TrustGate:
)
self._print("")
try:
_NOTICE_SENTINEL_PATH.parent.mkdir(parents=True, exist_ok=True)
_NOTICE_SENTINEL_PATH.touch()
sentinel.parent.mkdir(parents=True, exist_ok=True)
sentinel.touch()
except OSError:
pass
+248
View File
@@ -0,0 +1,248 @@
"""Sidecar summary cache for cold-session listings.
Each queen session directory grows a ``summary.json`` file that mirrors the
expensive-to-recompute fields surfaced by ``SessionManager.list_cold_sessions``:
``message_count``, ``last_message`` snippet, and ``last_active_at``.
Without this cache the queen-history sidebar reads **every** part file of
**every** session on the disk for each list request. That cost grows with
total messages across all sessions, not just the one being opened, and is
visible whenever the user navigates to the session list.
Update path: ``FileConversationStore.write_part`` calls ``update_summary``
after each successful part write. The call is best-effort and never blocks
the caller on failure.
Read path: ``list_cold_sessions`` reads ``summary.json`` and only falls back
to a full part scan when the file is missing or stale (parts dir mtime newer
than the summary). The rebuild path also writes a fresh summary, so the
slow path is paid at most once per session per upgrade.
"""
from __future__ import annotations
import json
import logging
from pathlib import Path
from typing import Any
from framework.utils.io import atomic_write
logger = logging.getLogger(__name__)
_SUMMARY_FILENAME = "summary.json"
_LAST_MESSAGE_MAX_CHARS = 120
def is_client_facing(part: dict[str, Any]) -> bool:
"""Whether this part appears in the client-visible chat list.
Mirrors the predicate in ``SessionManager.list_cold_sessions`` so the
cached counts agree with a full rebuild.
"""
if part.get("is_transition_marker"):
return False
role = part.get("role")
if role == "tool":
return False
if role == "assistant" and part.get("tool_calls"):
return False
return True
def _extract_text(content: Any) -> str:
"""Render a part's ``content`` field as a flat string for the snippet."""
if isinstance(content, str):
return content
if isinstance(content, list):
# Anthropic-style content blocks: [{"type": "text", "text": "..."}]
return " ".join(b.get("text", "") for b in content if isinstance(b, dict) and b.get("type") == "text")
return ""
def _summary_path(session_dir: Path) -> Path:
return session_dir / _SUMMARY_FILENAME
def read_summary(session_dir: Path) -> dict | None:
"""Return the cached summary dict, or ``None`` if missing/corrupt."""
path = _summary_path(session_dir)
if not path.exists():
return None
try:
return json.loads(path.read_text(encoding="utf-8"))
except (json.JSONDecodeError, OSError):
return None
def is_stale(session_dir: Path) -> bool:
"""True when the summary is missing or older than the latest part write.
Compares ``summary.json`` mtime against ``conversations/parts/`` (and
any node-based ``conversations/<node>/parts/``) directory mtime.
POSIX dir mtime updates whenever entries are added, so a new part flush
bumps the parts-dir mtime above the summary's.
"""
summary_path = _summary_path(session_dir)
if not summary_path.exists():
return True
try:
summary_mtime = summary_path.stat().st_mtime
except OSError:
return True
convs_dir = session_dir / "conversations"
if not convs_dir.exists():
return False
candidate_dirs: list[Path] = [convs_dir / "parts"]
try:
for child in convs_dir.iterdir():
if child.is_dir() and child.name != "parts":
candidate_dirs.append(child / "parts")
except OSError:
return False
for d in candidate_dirs:
if not d.exists():
continue
try:
if d.stat().st_mtime > summary_mtime + 0.001:
return True
except OSError:
continue
return False
def _write_summary(session_dir: Path, data: dict) -> None:
path = _summary_path(session_dir)
try:
path.parent.mkdir(parents=True, exist_ok=True)
with atomic_write(path) as f:
json.dump(data, f)
except OSError:
logger.debug("session_summary: failed to write %s", path, exc_info=True)
def update_summary(session_dir: Path, part: dict[str, Any]) -> None:
"""Incrementally fold ``part`` into the cached summary.
Best-effort; swallows errors so the part-write path is never broken by
a summary failure. Reads the prior summary, mutates a few fields, and
writes back atomically.
Only client-facing parts (see :func:`is_client_facing`) bump the count
and the ``last_message`` snippet; tool calls and transition markers are
persisted but not surfaced in the sidebar.
"""
try:
if not is_client_facing(part):
return
existing = read_summary(session_dir) or {}
message_count = int(existing.get("message_count") or 0) + 1
last_active_at = float(existing.get("last_active_at") or 0.0)
last_message = existing.get("last_message")
# Prefer an explicit timestamp on the part; fall back to the current
# summary's most-recent activity. Parts also carry ``seq`` which is
# monotonic per-session, but seq is not a wall-clock — keep both.
part_ts = part.get("created_at")
if isinstance(part_ts, (int, float)) and part_ts > last_active_at:
last_active_at = float(part_ts)
# Update the snippet with the latest assistant message; user messages
# don't replace it, matching the existing list_cold_sessions behavior
# (it scans backward for the last assistant message).
if part.get("role") == "assistant":
text = _extract_text(part.get("content")).strip()
if text:
last_message = text[:_LAST_MESSAGE_MAX_CHARS]
last_part_seq = part.get("seq")
if last_part_seq is None:
last_part_seq = existing.get("last_part_seq")
_write_summary(
session_dir,
{
"message_count": message_count,
"last_message": last_message,
"last_active_at": last_active_at,
"last_part_seq": last_part_seq,
},
)
except Exception:
logger.debug("session_summary: update_summary failed", exc_info=True)
def rebuild_summary(session_dir: Path) -> dict | None:
"""Full-scan rebuild — reads every part file and recomputes the summary.
Returns the rebuilt dict and writes it to ``summary.json``. Returns
``None`` when the conversations directory is absent (no parts yet).
Used by ``list_cold_sessions`` as the migration / fallback path when
the cache is missing or stale.
"""
convs_dir = session_dir / "conversations"
if not convs_dir.exists():
return None
all_parts: list[dict] = []
def _collect(parts_dir: Path) -> None:
if not parts_dir.exists():
return
try:
for part_file in sorted(parts_dir.iterdir()):
if part_file.suffix != ".json":
continue
try:
part = json.loads(part_file.read_text(encoding="utf-8"))
part.setdefault("created_at", part_file.stat().st_mtime)
all_parts.append(part)
except (json.JSONDecodeError, OSError):
continue
except OSError:
return
_collect(convs_dir / "parts")
try:
for node_dir in convs_dir.iterdir():
if node_dir.is_dir() and node_dir.name != "parts":
_collect(node_dir / "parts")
except OSError:
pass
client_msgs = [p for p in all_parts if is_client_facing(p)]
client_msgs.sort(key=lambda m: m.get("created_at", m.get("seq", 0)))
last_active_at = 0.0
last_message: str | None = None
if client_msgs:
latest_ts = client_msgs[-1].get("created_at")
if isinstance(latest_ts, (int, float)):
last_active_at = float(latest_ts)
for msg in reversed(client_msgs):
if msg.get("role") != "assistant":
continue
text = _extract_text(msg.get("content")).strip()
if text:
last_message = text[:_LAST_MESSAGE_MAX_CHARS]
break
last_part_seq = None
if all_parts:
seqs = [p.get("seq") for p in all_parts if isinstance(p.get("seq"), int)]
if seqs:
last_part_seq = max(seqs)
summary = {
"message_count": len(client_msgs),
"last_message": last_message,
"last_active_at": last_active_at,
"last_part_seq": last_part_seq,
}
_write_summary(session_dir, summary)
return summary
+45
@@ -0,0 +1,45 @@
"""File-backed, lock-coordinated task tracker for the hive agent loop.
See temp/tasks-system-implementation-plan.md for the design. Two list types:
colony:{colony_id} -- the queen's spawn-plan template
session:{agent_id}:{sess_id} -- per-session working list
Each agent operates on its own session list via the session task tools
(`task_create_batch`, `task_create`, `task_update`, `task_list`,
`task_get`). The colony
template is addressed only by the queen's `colony_template_*` tools and by
the UI/event surface.
"""
from framework.tasks.models import (
ClaimResult,
TaskListMeta,
TaskListRole,
TaskRecord,
TaskStatus,
)
from framework.tasks.scoping import (
colony_task_list_id,
parse_task_list_id,
resolve_task_list_id,
session_task_list_id,
)
from framework.tasks.store import (
TaskStore,
get_task_store,
)
__all__ = [
"ClaimResult",
"TaskListMeta",
"TaskListRole",
"TaskRecord",
"TaskStatus",
"TaskStore",
"colony_task_list_id",
"get_task_store",
"parse_task_list_id",
"resolve_task_list_id",
"session_task_list_id",
]
+158
@@ -0,0 +1,158 @@
"""Bridge from the task store to the EventBus.
The store is intentionally event-free; it's pure storage. The tool
executors (and run_parallel_workers, and any future colony_template_*
caller) are responsible for emitting the lifecycle events to the bus
after successful mutations.
Events are scoped to a stream_id pulled from the execution context if
available; otherwise they fan out at the global ``primary`` stream so the
UI's broad subscriptions still see them.
"""
from __future__ import annotations
import logging
from typing import Any
from framework.host.event_bus import AgentEvent, EventBus, EventType
from framework.tasks.models import TaskRecord
logger = logging.getLogger(__name__)
# Process-global default — set by the runner / orchestrator at bringup.
_DEFAULT_BUS: EventBus | None = None
def set_default_event_bus(bus: EventBus | None) -> None:
global _DEFAULT_BUS
_DEFAULT_BUS = bus
def _get_bus(bus: EventBus | None = None) -> EventBus | None:
return bus or _DEFAULT_BUS
def _serialize_record(rec: TaskRecord) -> dict[str, Any]:
return {
"id": rec.id,
"subject": rec.subject,
"description": rec.description,
"active_form": rec.active_form,
"owner": rec.owner,
"status": rec.status.value,
"blocks": list(rec.blocks),
"blocked_by": list(rec.blocked_by),
"metadata": dict(rec.metadata),
"created_at": rec.created_at,
"updated_at": rec.updated_at,
}
async def emit_task_created(
*,
task_list_id: str,
record: TaskRecord,
stream_id: str = "primary",
bus: EventBus | None = None,
) -> None:
b = _get_bus(bus)
if b is None:
return
try:
await b.publish(
AgentEvent(
type=EventType.TASK_CREATED,
stream_id=stream_id,
data={
"task_list_id": task_list_id,
"task": _serialize_record(record),
},
)
)
except Exception:
logger.debug("emit_task_created failed", exc_info=True)
async def emit_task_updated(
*,
task_list_id: str,
record: TaskRecord,
fields: list[str],
stream_id: str = "primary",
bus: EventBus | None = None,
) -> None:
b = _get_bus(bus)
if b is None or not fields:
return
try:
await b.publish(
AgentEvent(
type=EventType.TASK_UPDATED,
stream_id=stream_id,
data={
"task_list_id": task_list_id,
"task_id": record.id,
"after": _serialize_record(record),
"fields": fields,
},
)
)
except Exception:
logger.debug("emit_task_updated failed", exc_info=True)
async def emit_task_deleted(
*,
task_list_id: str,
task_id: int,
cascade: list[int],
stream_id: str = "primary",
bus: EventBus | None = None,
) -> None:
b = _get_bus(bus)
if b is None:
return
try:
await b.publish(
AgentEvent(
type=EventType.TASK_DELETED,
stream_id=stream_id,
data={
"task_list_id": task_list_id,
"task_id": task_id,
"cascade": cascade,
},
)
)
except Exception:
logger.debug("emit_task_deleted failed", exc_info=True)
async def emit_colony_template_assignment(
*,
colony_id: str,
task_id: int,
assigned_session: str | None,
assigned_worker_id: str | None,
stream_id: str = "primary",
bus: EventBus | None = None,
) -> None:
b = _get_bus(bus)
if b is None:
return
try:
await b.publish(
AgentEvent(
type=EventType.COLONY_TEMPLATE_ASSIGNMENT,
stream_id=stream_id,
data={
"colony_id": colony_id,
"task_id": task_id,
"assigned_session": assigned_session,
"assigned_worker_id": assigned_worker_id,
},
)
)
except Exception:
logger.debug("emit_colony_template_assignment failed", exc_info=True)
+103
@@ -0,0 +1,103 @@
"""Task lifecycle hooks.
Two events:
* ``task_created`` -- fires after the task file is written but before the
tool returns. Hooks may raise ``BlockingHookError``
to abort creation; the wrapper deletes the just-
created task and returns an error tool_result.
* ``task_completed`` -- fires when ``task_update`` transitions a task to
``completed``. A blocking error rolls the status
back to ``in_progress`` and surfaces the error.
Hooks are registered on a process-global registry so callers (test
fixtures, integrations) can install them without threading through the
agent loop. They run in registration order; any hook may abort by raising
``BlockingHookError``.
"""
from __future__ import annotations
import inspect
import logging
from collections.abc import Awaitable, Callable
from dataclasses import dataclass
from typing import Any
logger = logging.getLogger(__name__)
HOOK_TASK_CREATED = "task_created"
HOOK_TASK_COMPLETED = "task_completed"
class BlockingHookError(Exception):
"""Raised by a hook to veto the surrounding tool operation."""
@dataclass
class TaskHookContext:
event: str
task_list_id: str
task: Any # TaskRecord (avoid import cycle)
agent_id: str | None = None
metadata: dict[str, Any] | None = None
HookFn = Callable[[TaskHookContext], Any | Awaitable[Any]]
_HOOK_REGISTRY: dict[str, list[HookFn]] = {
HOOK_TASK_CREATED: [],
HOOK_TASK_COMPLETED: [],
}
def register_hook(event: str, fn: HookFn) -> None:
if event not in _HOOK_REGISTRY:
raise ValueError(f"Unknown hook event: {event!r}")
_HOOK_REGISTRY[event].append(fn)
def clear_hooks(event: str | None = None) -> None:
"""Test helper. Clear all hooks (or just one event's)."""
if event is None:
for k in _HOOK_REGISTRY:
_HOOK_REGISTRY[k].clear()
else:
_HOOK_REGISTRY.get(event, []).clear()
async def run_task_hooks(
event: str,
*,
task_list_id: str,
task: Any,
agent_id: str | None = None,
metadata: dict[str, Any] | None = None,
) -> None:
"""Run all hooks registered for ``event``.
Re-raises ``BlockingHookError`` from any hook; the caller is responsible
for rolling back the operation.
"""
hooks = list(_HOOK_REGISTRY.get(event, ()))
if not hooks:
return
ctx = TaskHookContext(
event=event,
task_list_id=task_list_id,
task=task,
agent_id=agent_id,
metadata=metadata,
)
for hook in hooks:
try:
result = hook(ctx)
if inspect.isawaitable(result):
await result
except BlockingHookError:
raise
except Exception:
# Non-blocking exceptions are logged but do not abort the operation.
logger.exception("Non-blocking hook failed for %s", event)
+105
@@ -0,0 +1,105 @@
"""Data models for the task tracker.
The schema follows the UI-facing task-record shape with one notable
difference: ids are integers (Python is cleaner that way) and rendered
as ``#N`` only in user-facing strings.
"""
from __future__ import annotations
import time
from dataclasses import dataclass
from enum import StrEnum
from typing import Any, Literal
from pydantic import BaseModel, Field
class TaskStatus(StrEnum):
PENDING = "pending"
IN_PROGRESS = "in_progress"
COMPLETED = "completed"
class TaskListRole(StrEnum):
"""Distinguishes a colony template from a session-scoped working list.
Used for sanity-checking which write paths are allowed (e.g. the four
session tools must never touch a ``template`` list).
"""
TEMPLATE = "template" # colony:{colony_id}
SESSION = "session" # session:{agent_id}:{session_id}
class TaskRecord(BaseModel):
"""One unit of work tracked by an agent."""
id: int # monotonic, never reused — see store.py
subject: str
description: str = ""
active_form: str | None = None # present-continuous label, surfaces in UI
owner: str | None = None # agent_id of the owning agent
status: TaskStatus = TaskStatus.PENDING
blocks: list[int] = Field(default_factory=list)
blocked_by: list[int] = Field(default_factory=list)
metadata: dict[str, Any] = Field(default_factory=dict)
created_at: float = Field(default_factory=time.time)
updated_at: float = Field(default_factory=time.time)
class TaskListMeta(BaseModel):
"""Per-list metadata. Embedded in ``TaskListDocument``."""
task_list_id: str
role: TaskListRole
creator_agent_id: str | None = None
created_at: float = Field(default_factory=time.time)
last_seen_session_ids: list[str] = Field(default_factory=list)
schema_version: int = 1
class TaskListDocument(BaseModel):
"""Whole task list as a single JSON document on disk.
Lives at ``{task_list_path(list_id)}/tasks.json``; the list-lock
sentinel is its sibling ``tasks.json.lock``.
"""
meta: TaskListMeta
highwatermark: int = 0
tasks: list[TaskRecord] = Field(default_factory=list)
# Tagged union for claim_task_with_busy_check. Used by run_parallel_workers
# when stamping ``assigned_session`` on a colony template entry — the only
# place a "claim" actually happens under the hive model.
@dataclass
class ClaimOk:
kind: Literal["ok"]
record: TaskRecord
@dataclass
class ClaimNotFound:
kind: Literal["not_found"]
@dataclass
class ClaimAlreadyOwned:
kind: Literal["already_owned"]
by: str
@dataclass
class ClaimAlreadyCompleted:
kind: Literal["already_completed"]
@dataclass
class ClaimBlocked:
kind: Literal["blocked"]
by: list[int]
ClaimResult = ClaimOk | ClaimNotFound | ClaimAlreadyOwned | ClaimAlreadyCompleted | ClaimBlocked
+108
@@ -0,0 +1,108 @@
"""Periodic task-reminder injection.
After enough silent turns since the last task tool call, inject a
reminder summarizing the current open tasks. Catches the failure mode
where the agent has silently absorbed multiple finished steps into one
in_progress task and stopped using the task tools.
The reminder counter lives on the AgentLoop instance; this module owns
the policy (threshold, cooldown, message text) and the integration
helper. Wiring lives in :mod:`framework.tasks.integrations.agent_loop`.
"""
from __future__ import annotations
import logging
import os
from collections.abc import Iterable
from dataclasses import dataclass
from framework.tasks.models import TaskRecord, TaskStatus
logger = logging.getLogger(__name__)
REMINDER_THRESHOLD_TURNS = int(os.environ.get("HIVE_TASK_REMINDER_TURNS", "8"))
REMINDER_COOLDOWN_TURNS = int(os.environ.get("HIVE_TASK_REMINDER_COOLDOWN", "8"))
# Names that count as "task ops" — calling any of these resets the silence
# counter. Keep narrow: only mutating ops re-establish discipline. task_list
# / task_get are read-only and shouldn't reset the counter (the agent could
# read forever without making progress).
TASK_OP_TOOL_NAMES: frozenset[str] = frozenset(
{
"task_create",
"task_update",
"colony_template_add",
"colony_template_update",
"colony_template_remove",
}
)
@dataclass
class ReminderState:
"""Per-loop counter — caller bumps it each iteration."""
turns_since_task_op: int = 0
turns_since_last_reminder: int = 0
def on_iteration(self) -> None:
self.turns_since_task_op += 1
self.turns_since_last_reminder += 1
def on_task_op(self) -> None:
self.turns_since_task_op = 0
def on_reminder_sent(self) -> None:
self.turns_since_last_reminder = 0
def should_remind(self, has_open_tasks: bool) -> bool:
return (
has_open_tasks
and self.turns_since_task_op >= REMINDER_THRESHOLD_TURNS
and self.turns_since_last_reminder >= REMINDER_COOLDOWN_TURNS
)
def saw_task_op(tool_names: Iterable[str]) -> bool:
"""True if any of the names is a counter-resetting task op."""
return any(name in TASK_OP_TOOL_NAMES for name in tool_names)
def build_reminder(records: list[TaskRecord]) -> str:
"""Compose the reminder body — pending/in-progress focus."""
open_ = [r for r in records if r.status != TaskStatus.COMPLETED]
if not open_:
return ""
in_progress = [r for r in open_ if r.status == TaskStatus.IN_PROGRESS]
head = (
"[task_reminder] The task tools haven't been used in several "
"turns. If you're working on tasks that would benefit from "
"tracked progress:"
)
bullets = [
" - Mark the in_progress task `completed` THE MOMENT it's done — "
"before starting the next step. Don't batch completions.",
" - If you've finished work that wasn't on the list, add a "
"task_create + task_update completed pair so the panel reflects it.",
" - If you're umbrella-tracking ('reply to all posts' as one task), "
"break it into one task per atomic action — use `task_create_batch` "
"with one entry per action.",
" - Also consider cleaning up the task list if it has become stale: "
"if any open tasks no longer apply (user pivoted, scope shifted, "
"task created in error), delete them via `task_update` with "
"status='deleted'. Don't leave stale items sitting on the list.",
]
if in_progress:
bullets.append(
" - Currently in_progress (consider whether they're really "
"still active): " + ", ".join(f'#{r.id} "{r.subject}"' for r in in_progress[:5])
)
listing = ["", "Open tasks:"]
for r in open_[:10]:
listing.append(f" #{r.id} [{r.status.value}] {r.subject}")
if len(open_) > 10:
listing.append(f" ... and {len(open_) - 10} more")
listing.append("\nOnly act on this if relevant to the current work. NEVER mention this reminder to the user.")
return "\n".join([head, *bullets, *listing])
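The counter mechanics can be exercised standalone. This copies ``ReminderState`` with the thresholds inlined as constants (the real module reads them from the ``HIVE_TASK_REMINDER_*`` env vars, default 8):

```python
from dataclasses import dataclass

THRESHOLD = 8  # turns of task-tool silence before a reminder fires
COOLDOWN = 8   # minimum turns between reminders

@dataclass
class ReminderState:
    turns_since_task_op: int = 0
    turns_since_last_reminder: int = 0

    def on_iteration(self) -> None:
        self.turns_since_task_op += 1
        self.turns_since_last_reminder += 1

    def on_task_op(self) -> None:
        self.turns_since_task_op = 0

    def on_reminder_sent(self) -> None:
        self.turns_since_last_reminder = 0

    def should_remind(self, has_open_tasks: bool) -> bool:
        return (
            has_open_tasks
            and self.turns_since_task_op >= THRESHOLD
            and self.turns_since_last_reminder >= COOLDOWN
        )

state = ReminderState()
fired_at: list[int] = []
for turn in range(1, 21):
    state.on_iteration()
    if turn == 3:
        state.on_task_op()  # a task_update resets the silence counter
    if state.should_remind(has_open_tasks=True):
        fired_at.append(turn)
        state.on_reminder_sent()
```

The task op on turn 3 pushes the first reminder out to turn 11; the cooldown then spaces the second one eight turns later.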
+80
@@ -0,0 +1,80 @@
"""Task list id resolution.
Under the corrected model (see plan §5):
- Every agent session owns one task list: ``session:{agent_id}:{session_id}``
- The colony has a separate template list: ``colony:{colony_id}``
``resolve_task_list_id(ctx)`` returns the agent's OWN session list id —
what the four task tools write to. The colony template is addressed via
the dedicated ``colony_template_*`` tools and the UI; never via the four
session tools.
"""
from __future__ import annotations
import logging
import os
from typing import Any
logger = logging.getLogger(__name__)
def session_task_list_id(agent_id: str, session_id: str) -> str:
return f"session:{agent_id}:{session_id}"
def colony_task_list_id(colony_id: str) -> str:
return f"colony:{colony_id}"
def parse_task_list_id(task_list_id: str) -> dict[str, str]:
"""Decode a task_list_id into its component parts.
Returns a dict with at least ``kind`` ("session" / "colony" / "unscoped"
/ "raw"), and the relevant ids when applicable.
"""
if task_list_id.startswith("session:"):
rest = task_list_id[len("session:") :]
agent_id, _, session_id = rest.partition(":")
return {"kind": "session", "agent_id": agent_id, "session_id": session_id}
if task_list_id.startswith("colony:"):
return {"kind": "colony", "colony_id": task_list_id[len("colony:") :]}
if task_list_id.startswith("unscoped:"):
return {"kind": "unscoped", "agent_id": task_list_id[len("unscoped:") :]}
return {"kind": "raw", "value": task_list_id}
def resolve_task_list_id(ctx: Any) -> str:
"""Return the agent's own session-scoped task list id.
Resolution priority:
1. ``HIVE_TASK_LIST_ID`` env var (test/CLI override)
2. ``ctx.task_list_id`` if already populated by the runner
3. ``session:{ctx.agent_id}:{ctx.run_id or ctx.execution_id}``
4. ``unscoped:{ctx.agent_id}`` sentinel (should not happen in prod)
"""
override = os.environ.get("HIVE_TASK_LIST_ID")
if override:
return override
existing = getattr(ctx, "task_list_id", None)
if existing:
return existing
agent_id = getattr(ctx, "agent_id", None) or ""
session_id = (
getattr(ctx, "run_id", None) or getattr(ctx, "execution_id", None) or getattr(ctx, "stream_id", None) or ""
)
if agent_id and session_id:
return session_task_list_id(agent_id, session_id)
fallback = f"unscoped:{agent_id or 'unknown'}"
logger.warning(
"resolve_task_list_id falling back to %s — agent_id=%r session_id=%r",
fallback,
agent_id,
session_id,
)
return fallback
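The id scheme round-trips cleanly through the parser. A sketch with the constructor and parser copied locally (only the ``session:``/``colony:`` branches); note that ``partition`` splits on the first colon only, so the session id itself may contain colons but the agent id may not:

```python
def session_task_list_id(agent_id: str, session_id: str) -> str:
    return f"session:{agent_id}:{session_id}"

def parse_task_list_id(task_list_id: str) -> dict[str, str]:
    if task_list_id.startswith("session:"):
        rest = task_list_id[len("session:"):]
        agent_id, _, session_id = rest.partition(":")
        return {"kind": "session", "agent_id": agent_id, "session_id": session_id}
    if task_list_id.startswith("colony:"):
        return {"kind": "colony", "colony_id": task_list_id[len("colony:"):]}
    return {"kind": "raw", "value": task_list_id}

parsed = parse_task_list_id(session_task_list_id("queen", "abc123"))
```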
+919
@@ -0,0 +1,919 @@
"""File-backed task store with filelock-based coordination.
Layout per list::
{task_list_path}/tasks.json -- TaskListDocument (meta + hwm + tasks)
{task_list_path}/tasks.json.lock -- list-level lock sentinel
Where ``task_list_path`` is:
colony:{c} -> ~/.hive/colonies/{c}/
session:{a}:{s} -> ~/.hive/agents/{a}/sessions/{s}/
unscoped:{a} -> ~/.hive/unscoped/{a}/
{malformed} -> ~/.hive/_misc/{slug}/
An older layout used the same root + a nested ``tasks/`` subdir holding
``meta.json``, ``.highwatermark``, ``.lock``, and ``NNNN.json`` per task.
That produced the ugly ``/tasks/tasks/0001.json`` path. Migration is
lazy the first lock-protected access on such a list folds the legacy
artifacts into ``tasks.json`` and unlinks them.
All filesystem I/O is wrapped in ``asyncio.to_thread`` so the event loop
never blocks. Locks use a ~3s budget comfortable headroom for the only
realistic write contender (colony template under concurrent
``colony_template_*`` and ``run_parallel_workers`` stamps).
"""
from __future__ import annotations
import asyncio
import logging
import os
import shutil
import threading
import time
from collections.abc import Iterable
from pathlib import Path
from typing import Any
from filelock import FileLock
from framework.tasks.models import (
ClaimAlreadyCompleted,
ClaimAlreadyOwned,
ClaimBlocked,
ClaimNotFound,
ClaimOk,
ClaimResult,
TaskListDocument,
TaskListMeta,
TaskListRole,
TaskRecord,
TaskStatus,
)
from framework.utils.io import atomic_write
logger = logging.getLogger(__name__)
LOCK_TIMEOUT_SECONDS = 3.0 # ~30 retries × ~100ms
DOC_FILENAME = "tasks.json"
LOCK_FILENAME = "tasks.json.lock" # only colony lists (cross-process writers)
# Per-list in-memory locks for single-process scopes (session/unscoped/_misc).
# Sessions have one owning agent, so only same-process concurrency matters
# (e.g. parallel tool use within a single turn) — no on-disk lock needed.
_INPROC_LOCKS: dict[str, threading.Lock] = {}
_INPROC_LOCKS_GUARD = threading.Lock()
def _get_inproc_lock(task_list_id: str) -> threading.Lock:
with _INPROC_LOCKS_GUARD:
lock = _INPROC_LOCKS.get(task_list_id)
if lock is None:
lock = threading.Lock()
_INPROC_LOCKS[task_list_id] = lock
return lock
class _Unset:
"""Sentinel for "owner argument not provided" — distinct from owner=None."""
__slots__ = ()
_UNSET_SENTINEL: _Unset = _Unset()
def _hive_root() -> Path:
"""Location of the hive data dir; honors HIVE_HOME for tests."""
return Path(os.environ.get("HIVE_HOME", str(Path.home() / ".hive")))
def _find_queen_session_dir(session_id: str, *, hive_root: Path) -> Path | None:
"""Return ``agents/queens/{queen}/sessions/{session_id}`` if one exists.
Queens live under ``QUEENS_DIR = hive_root / "agents" / "queens"`` (see
``framework.config``). The task system gets a generic ``agent_id ==
"queen"`` in its ``task_list_id``, which would otherwise dead-end at
``agents/queen/...``, decoupled from the real session folder. By
probing the canonical layout we keep the task doc beside conversations,
events, summary, and meta for the same session.
"""
queens_dir = hive_root / "agents" / "queens"
if not queens_dir.exists():
return None
try:
candidates = [d for d in queens_dir.iterdir() if d.is_dir()]
except OSError:
return None
for queen_dir in candidates:
candidate = queen_dir / "sessions" / session_id
if candidate.is_dir():
return candidate
return None
def task_list_path(task_list_id: str, *, hive_root: Path | None = None) -> Path:
"""Resolve task_list_id -> directory containing ``tasks.json``.
Note: this returns the *parent* of the doc file, not the file itself.
For session/colony/unscoped lists, this is the agent or colony's home
dir; the task doc is one filename inside it. (The older layout had an
extra ``tasks/`` subdir under this path; see ``_legacy_root``.)
For ``session:`` lists, the canonical queen session folder is preferred
when it exists on disk: the task doc lives next to the rest of that
session's data (conversations, events, summary).
"""
root = hive_root or _hive_root()
if task_list_id.startswith("colony:"):
colony_id = task_list_id[len("colony:") :]
return root / "colonies" / colony_id
if task_list_id.startswith("session:"):
rest = task_list_id[len("session:") :]
agent_id, _, session_id = rest.partition(":")
if not session_id:
raise ValueError(f"Malformed session task_list_id: {task_list_id!r}")
canonical = _find_queen_session_dir(session_id, hive_root=root)
if canonical is not None:
return canonical
return root / "agents" / agent_id / "sessions" / session_id
if task_list_id.startswith("unscoped:"):
agent_id = task_list_id[len("unscoped:") :]
return root / "unscoped" / agent_id
# Last-ditch sanitization for HIVE_TASK_LIST_ID overrides — slugify the
# whole thing so the test/dev path can't escape the hive root.
safe = "".join(c if c.isalnum() or c in "-_" else "_" for c in task_list_id)
return root / "_misc" / safe
def _legacy_root(task_list_id: str, *, hive_root: Path | None = None) -> Path:
"""Where the older artifacts (meta.json, .highwatermark, tasks/NNNN.json) lived.
Pinned to the *pre-canonical* layout for queen session lists this is
``agents/{agent_id}/sessions/{session_id}/tasks`` (i.e. the literal
``agent_id`` folder, not the canonical ``agents/queens/{queen}/...``
path). The lazy migration reads from here and writes the new doc to
wherever ``task_list_path`` resolves now.
"""
root = hive_root or _hive_root()
if task_list_id.startswith("colony:"):
return root / "colonies" / task_list_id[len("colony:") :] / "tasks"
if task_list_id.startswith("session:"):
rest = task_list_id[len("session:") :]
agent_id, _, session_id = rest.partition(":")
return root / "agents" / agent_id / "sessions" / session_id / "tasks"
if task_list_id.startswith("unscoped:"):
return root / "unscoped" / task_list_id[len("unscoped:") :] / "tasks"
# _misc fallback: legacy lived directly in the slug dir, same as the new parent.
safe = "".join(c if c.isalnum() or c in "-_" else "_" for c in task_list_id)
return root / "_misc" / safe
# ---------------------------------------------------------------------------
# TaskStore — public façade
# ---------------------------------------------------------------------------
class TaskStore:
"""Async wrapper around the on-disk store.
A single TaskStore is fine to share across the process; locking is
file-based, so even multiple processes are safe.
"""
def __init__(self, *, hive_root: Path | None = None) -> None:
self._hive_root = hive_root
# ----- list-level ---------------------------------------------------
async def ensure_task_list(
self,
task_list_id: str,
*,
role: TaskListRole,
creator_agent_id: str | None = None,
session_id: str | None = None,
) -> TaskListMeta:
"""Create a list if absent; if present, append session_id to last_seen.
Idempotent: callers (ColonyRuntime bringup, lazy session creation)
can call this every time.
"""
return await asyncio.to_thread(
self._ensure_task_list_sync,
task_list_id,
role,
creator_agent_id,
session_id,
)
async def list_exists(self, task_list_id: str) -> bool:
"""A list exists if its doc is on disk OR a legacy artifact is.
The legacy fallback exists so that lists created under the older
layout and not yet migrated still surface to the REST layer.
"""
def _check() -> bool:
if self._doc_path(task_list_id).exists():
return True
return self._has_legacy_artifacts(task_list_id)
return await asyncio.to_thread(_check)
async def get_meta(self, task_list_id: str) -> TaskListMeta | None:
return await asyncio.to_thread(self._read_meta_sync, task_list_id)
async def reset_task_list(self, task_list_id: str) -> None:
"""Delete all tasks but preserve the high-water-mark.
Test helper. Never wired to runtime lifecycle.
"""
await asyncio.to_thread(self._reset_sync, task_list_id)
# ----- task CRUD ----------------------------------------------------
async def create_tasks_batch(
self,
task_list_id: str,
specs: list[dict[str, Any]],
) -> list[TaskRecord]:
"""Atomically create N tasks under a single list-lock acquisition.
Each spec is a dict with keys: subject (required), description,
active_form, owner, metadata. Ids are assigned sequentially and
contiguously; if any spec is malformed the whole batch is
rejected before any write. The doc model makes "atomic-or-none"
free: we mutate one in-memory document and write it once.
"""
return await asyncio.to_thread(self._create_tasks_batch_sync, task_list_id, specs)
async def create_task(
self,
task_list_id: str,
*,
subject: str,
description: str = "",
active_form: str | None = None,
owner: str | None = None,
metadata: dict[str, Any] | None = None,
) -> TaskRecord:
return await asyncio.to_thread(
self._create_task_sync,
task_list_id,
subject,
description,
active_form,
owner,
metadata or {},
)
async def get_task(self, task_list_id: str, task_id: int) -> TaskRecord | None:
return await asyncio.to_thread(self._read_task_sync, task_list_id, task_id)
async def list_tasks(
self,
task_list_id: str,
*,
include_internal: bool = False,
) -> list[TaskRecord]:
records = await asyncio.to_thread(self._list_tasks_sync, task_list_id)
if include_internal:
return records
return [r for r in records if not r.metadata.get("_internal")]
async def update_task(
self,
task_list_id: str,
task_id: int,
*,
subject: str | None = None,
description: str | None = None,
active_form: str | None = None,
owner: str | None | _Unset = _UNSET_SENTINEL,
status: TaskStatus | None = None,
add_blocks: list[int] | None = None,
add_blocked_by: list[int] | None = None,
metadata_patch: dict[str, Any] | None = None,
) -> tuple[TaskRecord | None, list[str]]:
"""Update a task; returns (new_record, fields_changed) or (None, [])."""
return await asyncio.to_thread(
self._update_task_sync,
task_list_id,
task_id,
subject,
description,
active_form,
owner,
status,
add_blocks,
add_blocked_by,
metadata_patch,
)
async def delete_task(self, task_list_id: str, task_id: int) -> tuple[bool, list[int]]:
"""Delete a task; returns (was_deleted, cascaded_ids).
``cascaded_ids`` are the ids of other tasks whose blocks/blocked_by
referenced the deleted id and were stripped.
"""
return await asyncio.to_thread(self._delete_task_sync, task_list_id, task_id)
async def claim_task_with_busy_check(
self,
task_list_id: str,
task_id: int,
claimant: str,
) -> ClaimResult:
"""Atomic claim under list-lock.
Used internally by ``run_parallel_workers`` when stamping
``metadata.assigned_session`` on colony template entries; not
exposed to LLMs as a worker-facing claim race.
"""
return await asyncio.to_thread(self._claim_sync, task_list_id, task_id, claimant)
# =====================================================================
# Sync internals — all called via asyncio.to_thread
# =====================================================================
def _list_dir(self, task_list_id: str) -> Path:
return task_list_path(task_list_id, hive_root=self._hive_root)
def _doc_path(self, task_list_id: str) -> Path:
return self._list_dir(task_list_id) / DOC_FILENAME
def _list_lock(self, task_list_id: str):
"""Return a context manager that serialises writes to this list.
Colony template lists need a cross-process ``FileLock`` because
``run_parallel_workers`` spawns worker subprocesses that stamp
completion back onto the template. Session/unscoped/_misc lists
have a single owning agent only same-process concurrency
matters (e.g. parallel tool use within one turn), so an
in-memory ``threading.Lock`` is enough and avoids the visible
``tasks.json.lock`` sentinel beside session folders.
"""
d = self._list_dir(task_list_id)
d.mkdir(parents=True, exist_ok=True)
if task_list_id.startswith("colony:"):
return FileLock(str(d / LOCK_FILENAME), timeout=LOCK_TIMEOUT_SECONDS)
return _get_inproc_lock(task_list_id)
def _legacy_dir(self, task_list_id: str) -> Path:
return _legacy_root(task_list_id, hive_root=self._hive_root)
def _legacy_meta_path(self, task_list_id: str) -> Path:
return self._legacy_dir(task_list_id) / "meta.json"
def _legacy_hwm_path(self, task_list_id: str) -> Path:
return self._legacy_dir(task_list_id) / ".highwatermark"
def _legacy_lock_path(self, task_list_id: str) -> Path:
return self._legacy_dir(task_list_id) / ".lock"
def _legacy_tasks_dir(self, task_list_id: str) -> Path:
return self._legacy_dir(task_list_id) / "tasks"
def _has_legacy_artifacts(self, task_list_id: str) -> bool:
if self._legacy_meta_path(task_list_id).exists():
return True
td = self._legacy_tasks_dir(task_list_id)
if td.exists():
try:
return any(p.suffix == ".json" for p in td.iterdir())
except OSError:
return False
return False
# ----- doc IO -------------------------------------------------------
def _read_doc_sync(self, task_list_id: str) -> TaskListDocument | None:
"""Lock-free read for already-migrated lists; falls back to a
lock-protected migration if only legacy artifacts exist.
Returns None if the list doesn't exist on disk in either form.
"""
doc_path = self._doc_path(task_list_id)
if doc_path.exists():
try:
return TaskListDocument.model_validate_json(doc_path.read_text(encoding="utf-8"))
except Exception:
logger.warning("Corrupt tasks.json at %s", doc_path, exc_info=True)
# Fall through — legacy fallback may rescue us.
if self._has_legacy_artifacts(task_list_id):
with self._list_lock(task_list_id):
# Re-check under lock: a parallel writer may have just
# finished migrating, in which case we read the new doc.
if doc_path.exists():
try:
return TaskListDocument.model_validate_json(doc_path.read_text(encoding="utf-8"))
except Exception:
logger.warning(
"Corrupt tasks.json at %s (post-lock)",
doc_path,
exc_info=True,
)
doc = self._migrate_legacy_unsafe(task_list_id)
if doc is not None:
self._write_doc_unsafe(task_list_id, doc)
self._cleanup_legacy_unsafe(task_list_id)
return doc
return None
def _read_doc_unsafe(self, task_list_id: str) -> TaskListDocument | None:
"""Same as ``_read_doc_sync`` but assumes the list-lock is already
held used by methods that are already inside ``with self._list_lock``.
Migration happens in-place without re-entering the lock.
"""
doc_path = self._doc_path(task_list_id)
if doc_path.exists():
try:
return TaskListDocument.model_validate_json(doc_path.read_text(encoding="utf-8"))
except Exception:
logger.warning("Corrupt tasks.json at %s", doc_path, exc_info=True)
if self._has_legacy_artifacts(task_list_id):
doc = self._migrate_legacy_unsafe(task_list_id)
if doc is not None:
self._write_doc_unsafe(task_list_id, doc)
self._cleanup_legacy_unsafe(task_list_id)
return doc
return None
def _write_doc_unsafe(self, task_list_id: str, doc: TaskListDocument) -> None:
"""Atomically rewrite the doc. Caller MUST hold the list-lock."""
path = self._doc_path(task_list_id)
path.parent.mkdir(parents=True, exist_ok=True)
with atomic_write(path) as f:
f.write(doc.model_dump_json(indent=2))
# ----- migration ----------------------------------------------------
def _migrate_legacy_unsafe(self, task_list_id: str) -> TaskListDocument | None:
"""Fold legacy artifacts into a TaskListDocument. Caller MUST hold lock."""
meta = self._read_legacy_meta(task_list_id)
if meta is None:
inferred_role = TaskListRole.TEMPLATE if task_list_id.startswith("colony:") else TaskListRole.SESSION
meta = TaskListMeta(task_list_id=task_list_id, role=inferred_role)
tasks: list[TaskRecord] = []
td = self._legacy_tasks_dir(task_list_id)
if td.exists():
for p in sorted(td.iterdir()):
if p.suffix != ".json":
continue
try:
tasks.append(TaskRecord.model_validate_json(p.read_text(encoding="utf-8")))
except Exception:
logger.warning(
"Skipping corrupt legacy task file %s during migration",
p,
exc_info=True,
)
tasks.sort(key=lambda r: r.id)
hwm = self._read_legacy_hwm(task_list_id)
max_id = max((r.id for r in tasks), default=0)
hwm = max(hwm, max_id)
if not tasks and hwm == 0 and not self._legacy_meta_path(task_list_id).exists():
return None
return TaskListDocument(
meta=meta,
highwatermark=hwm,
tasks=tasks,
)
def _read_legacy_meta(self, task_list_id: str) -> TaskListMeta | None:
path = self._legacy_meta_path(task_list_id)
if not path.exists():
return None
try:
return TaskListMeta.model_validate_json(path.read_text(encoding="utf-8"))
except Exception:
logger.warning("Corrupt legacy meta.json at %s", path, exc_info=True)
return None
def _read_legacy_hwm(self, task_list_id: str) -> int:
path = self._legacy_hwm_path(task_list_id)
if not path.exists():
return 0
try:
return int(path.read_text(encoding="utf-8").strip() or "0")
except (ValueError, OSError):
return 0
def _cleanup_legacy_unsafe(self, task_list_id: str) -> None:
"""Remove the older layout's files. Caller MUST hold the list-lock.
For session/colony/unscoped lists the legacy_dir is a dedicated
``tasks/`` subdir, so we remove the whole tree. For the ``_misc``
fallback the legacy_dir is the same as the new parent dir we
delete only the specific legacy filenames so we don't clobber
the new ``tasks.json``.
"""
legacy = self._legacy_dir(task_list_id)
if not legacy.exists():
return
if legacy != self._list_dir(task_list_id):
try:
shutil.rmtree(legacy)
except OSError:
logger.warning("Failed to remove legacy task dir %s", legacy, exc_info=True)
return
# _misc case: shared parent dir — surgical delete only.
for p in (
self._legacy_meta_path(task_list_id),
self._legacy_hwm_path(task_list_id),
self._legacy_lock_path(task_list_id),
):
try:
p.unlink(missing_ok=True)
except OSError:
logger.warning("Failed to remove %s", p, exc_info=True)
td = self._legacy_tasks_dir(task_list_id)
if td.exists():
try:
shutil.rmtree(td)
except OSError:
logger.warning("Failed to remove legacy tasks subdir %s", td, exc_info=True)
# ----- meta accessors over the doc ----------------------------------
def _ensure_task_list_sync(
self,
task_list_id: str,
role: TaskListRole,
creator_agent_id: str | None,
session_id: str | None,
) -> TaskListMeta:
with self._list_lock(task_list_id):
doc = self._read_doc_unsafe(task_list_id)
if doc is None:
meta = TaskListMeta(
task_list_id=task_list_id,
role=role,
creator_agent_id=creator_agent_id,
last_seen_session_ids=[session_id] if session_id else [],
)
doc = TaskListDocument(meta=meta)
self._write_doc_unsafe(task_list_id, doc)
return meta
meta = doc.meta
if session_id and session_id not in meta.last_seen_session_ids:
meta.last_seen_session_ids.append(session_id)
# Cap at 10 to keep the audit trail bounded.
meta.last_seen_session_ids = meta.last_seen_session_ids[-10:]
self._write_doc_unsafe(task_list_id, doc)
return meta
def _read_meta_sync(self, task_list_id: str) -> TaskListMeta | None:
doc = self._read_doc_sync(task_list_id)
return doc.meta if doc is not None else None
# ----- task IO ------------------------------------------------------
def _read_task_sync(self, task_list_id: str, task_id: int) -> TaskRecord | None:
doc = self._read_doc_sync(task_list_id)
if doc is None:
return None
for r in doc.tasks:
if r.id == task_id:
return r
return None
def _list_tasks_sync(self, task_list_id: str) -> list[TaskRecord]:
doc = self._read_doc_sync(task_list_id)
if doc is None:
return []
return sorted(doc.tasks, key=lambda r: r.id)
# ----- create -------------------------------------------------------
def _create_task_sync(
self,
task_list_id: str,
subject: str,
description: str,
active_form: str | None,
owner: str | None,
metadata: dict[str, Any],
) -> TaskRecord:
with self._list_lock(task_list_id):
doc = self._read_doc_unsafe(task_list_id)
if doc is None:
inferred_role = TaskListRole.TEMPLATE if task_list_id.startswith("colony:") else TaskListRole.SESSION
doc = TaskListDocument(meta=TaskListMeta(task_list_id=task_list_id, role=inferred_role))
new_id = self._next_id_for_doc(doc)
now = time.time()
record = TaskRecord(
id=new_id,
subject=subject,
description=description,
active_form=active_form,
owner=owner,
status=TaskStatus.PENDING,
metadata=metadata,
created_at=now,
updated_at=now,
)
doc.tasks.append(record)
if new_id > doc.highwatermark:
doc.highwatermark = new_id
self._write_doc_unsafe(task_list_id, doc)
return record
def _create_tasks_batch_sync(
self,
task_list_id: str,
specs: list[dict[str, Any]],
) -> list[TaskRecord]:
if not specs:
return []
# Validate up-front so we don't half-create on a malformed entry.
for i, spec in enumerate(specs):
subj = spec.get("subject")
if not isinstance(subj, str) or not subj.strip():
raise ValueError(f"specs[{i}].subject must be a non-empty string")
with self._list_lock(task_list_id):
doc = self._read_doc_unsafe(task_list_id)
if doc is None:
inferred_role = TaskListRole.TEMPLATE if task_list_id.startswith("colony:") else TaskListRole.SESSION
doc = TaskListDocument(meta=TaskListMeta(task_list_id=task_list_id, role=inferred_role))
base_id = self._next_id_for_doc(doc)
now = time.time()
records: list[TaskRecord] = []
for offset, spec in enumerate(specs):
rec = TaskRecord(
id=base_id + offset,
subject=spec["subject"],
description=spec.get("description", ""),
active_form=spec.get("active_form"),
owner=spec.get("owner"),
status=TaskStatus.PENDING,
metadata=dict(spec.get("metadata") or {}),
created_at=now,
updated_at=now,
)
records.append(rec)
doc.tasks.extend(records)
highest = records[-1].id
if highest > doc.highwatermark:
doc.highwatermark = highest
# Single write — atomic batch is free with the doc model.
self._write_doc_unsafe(task_list_id, doc)
return records
# ----- id assignment ------------------------------------------------
def _next_id_for_doc(self, doc: TaskListDocument) -> int:
max_existing = max((r.id for r in doc.tasks), default=0)
return max(max_existing, doc.highwatermark) + 1
# ----- update -------------------------------------------------------
def _update_task_sync(
self,
task_list_id: str,
task_id: int,
subject: str | None,
description: str | None,
active_form: str | None,
owner: str | None | _Unset,
status: TaskStatus | None,
add_blocks: list[int] | None,
add_blocked_by: list[int] | None,
metadata_patch: dict[str, Any] | None,
) -> tuple[TaskRecord | None, list[str]]:
with self._list_lock(task_list_id):
doc = self._read_doc_unsafe(task_list_id)
if doc is None:
return None, []
target = next((r for r in doc.tasks if r.id == task_id), None)
if target is None:
return None, []
new, changed = self._update_task_in_doc(
doc,
target,
subject=subject,
description=description,
active_form=active_form,
owner=owner,
status=status,
add_blocks=add_blocks,
add_blocked_by=add_blocked_by,
metadata_patch=metadata_patch,
)
if changed:
self._write_doc_unsafe(task_list_id, doc)
return new, changed
def _update_task_in_doc(
self,
doc: TaskListDocument,
current: TaskRecord,
*,
subject: str | None = None,
description: str | None = None,
active_form: str | None = None,
owner: str | None | _Unset = _UNSET_SENTINEL,
status: TaskStatus | None = None,
add_blocks: list[int] | None = None,
add_blocked_by: list[int] | None = None,
metadata_patch: dict[str, Any] | None = None,
) -> tuple[TaskRecord, list[str]]:
"""Mutate ``current`` in place inside ``doc`` and return (record, changed).
Bidirectional blocks/blocked_by also mutate the targets in ``doc``.
"""
changed: list[str] = []
if subject is not None and subject != current.subject:
current.subject = subject
changed.append("subject")
if description is not None and description != current.description:
current.description = description
changed.append("description")
if active_form is not None and active_form != current.active_form:
current.active_form = active_form
changed.append("active_form")
if not isinstance(owner, _Unset) and owner != current.owner:
current.owner = owner
changed.append("owner")
if status is not None and status != current.status:
current.status = status
changed.append("status")
if add_blocks:
for b in add_blocks:
if b in current.blocks or b == current.id:
continue
current.blocks.append(b)
if "blocks" not in changed:
changed.append("blocks")
target = next((r for r in doc.tasks if r.id == b), None)
if target is not None and current.id not in target.blocked_by:
target.blocked_by.append(current.id)
target.updated_at = time.time()
if add_blocked_by:
for b in add_blocked_by:
if b in current.blocked_by or b == current.id:
continue
current.blocked_by.append(b)
if "blocked_by" not in changed:
changed.append("blocked_by")
target = next((r for r in doc.tasks if r.id == b), None)
if target is not None and current.id not in target.blocks:
target.blocks.append(current.id)
target.updated_at = time.time()
if metadata_patch is not None:
md = dict(current.metadata)
for k, v in metadata_patch.items():
if v is None:
md.pop(k, None)
else:
md[k] = v
if md != current.metadata:
current.metadata = md
changed.append("metadata")
if not changed:
return current, []
current.updated_at = time.time()
return current, changed
# ----- delete -------------------------------------------------------
def _delete_task_sync(self, task_list_id: str, task_id: int) -> tuple[bool, list[int]]:
with self._list_lock(task_list_id):
doc = self._read_doc_unsafe(task_list_id)
if doc is None:
return False, []
idx = next((i for i, r in enumerate(doc.tasks) if r.id == task_id), None)
if idx is None:
return False, []
# 1. Bump high-water-mark BEFORE removing so a crash mid-write
# can't cause id reuse on the next create. (atomic_write
# guarantees we either commit the whole new state or none.)
if task_id > doc.highwatermark:
doc.highwatermark = task_id
# 2. Remove the task itself.
doc.tasks.pop(idx)
# 3. Cascade: strip references from all other tasks.
cascaded: list[int] = []
now = time.time()
for other in doc.tasks:
touched = False
if task_id in other.blocks:
other.blocks = [b for b in other.blocks if b != task_id]
touched = True
if task_id in other.blocked_by:
other.blocked_by = [b for b in other.blocked_by if b != task_id]
touched = True
if touched:
other.updated_at = now
cascaded.append(other.id)
self._write_doc_unsafe(task_list_id, doc)
return True, cascaded
# ----- reset --------------------------------------------------------
def _reset_sync(self, task_list_id: str) -> None:
with self._list_lock(task_list_id):
doc = self._read_doc_unsafe(task_list_id)
if doc is None:
return
max_id = max((r.id for r in doc.tasks), default=0)
doc.highwatermark = max(doc.highwatermark, max_id)
doc.tasks = []
self._write_doc_unsafe(task_list_id, doc)
# ----- claim --------------------------------------------------------
def _claim_sync(self, task_list_id: str, task_id: int, claimant: str) -> ClaimResult:
with self._list_lock(task_list_id):
doc = self._read_doc_unsafe(task_list_id)
if doc is None:
return ClaimNotFound(kind="not_found")
current = next((r for r in doc.tasks if r.id == task_id), None)
if current is None:
return ClaimNotFound(kind="not_found")
if current.status == TaskStatus.COMPLETED:
return ClaimAlreadyCompleted(kind="already_completed")
if current.owner is not None and current.owner != claimant:
return ClaimAlreadyOwned(kind="already_owned", by=current.owner)
unresolved_blockers: list[int] = []
for b in current.blocked_by:
blocker = next((r for r in doc.tasks if r.id == b), None)
if blocker is not None and blocker.status != TaskStatus.COMPLETED:
unresolved_blockers.append(b)
if unresolved_blockers:
return ClaimBlocked(kind="blocked", by=unresolved_blockers)
new, _ = self._update_task_in_doc(doc, current, owner=claimant)
self._write_doc_unsafe(task_list_id, doc)
return ClaimOk(kind="ok", record=new)
# ---------------------------------------------------------------------------
# Process-wide singleton (small, stateless wrapper)
# ---------------------------------------------------------------------------
_default_store: TaskStore | None = None
def get_task_store() -> TaskStore:
"""Process-wide default TaskStore (resolves HIVE_HOME at first call).
Tests should construct a TaskStore directly with hive_root=tmp_path
rather than relying on the singleton.
"""
global _default_store
if _default_store is None:
_default_store = TaskStore()
return _default_store
# Convenience for tests / utilities.
def fingerprint_for_test(task_list_id: str, hive_root: Path) -> Iterable[Path]:
"""Yield every task-list-related file — used by tests to assert
byte-equivalence pre/post shutdown.
Includes the doc + lock and any legacy leftovers (so this still works
while a list is mid-migration).
"""
files: list[Path] = []
base = task_list_path(task_list_id, hive_root=hive_root)
if not base.exists():
return []
doc = base / DOC_FILENAME
if doc.exists():
files.append(doc)
lock = base / LOCK_FILENAME
if lock.exists():
files.append(lock)
legacy = _legacy_root(task_list_id, hive_root=hive_root)
if legacy.exists() and legacy != base:
files.extend(sorted(legacy.rglob("*")))
elif legacy.exists():
# _misc fallback: include only legacy filenames
for name in ("meta.json", ".highwatermark", ".lock"):
p = legacy / name
if p.exists():
files.append(p)
td = legacy / "tasks"
if td.exists():
files.extend(sorted(td.rglob("*")))
return sorted(files)
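The id scheme above (``_next_id_for_doc`` plus the delete-time high-watermark bump in ``_delete_task_sync``) can be sketched as two standalone functions — names here are illustrative, not the module's API, since the real store works on ``TaskListDocument`` models:

```python
# Minimal sketch of the high-watermark id scheme (illustrative names).

def next_id(task_ids: list[int], highwatermark: int) -> int:
    """Next id never dips below the high-watermark, even after deletes."""
    return max(max(task_ids, default=0), highwatermark) + 1

def delete_task(task_ids: list[int], highwatermark: int, task_id: int) -> tuple[list[int], int]:
    """Bump the high-watermark BEFORE removing, so ids are never reused."""
    highwatermark = max(highwatermark, task_id)
    return [t for t in task_ids if t != task_id], highwatermark

ids, hwm = [1, 2, 3], 3
ids, hwm = delete_task(ids, hwm, 3)   # delete the highest task
assert next_id(ids, hwm) == 4         # id 3 is retired, not recycled
```

This is why ``_delete_task_sync`` bumps ``doc.highwatermark`` before popping the record: the whole doc is committed in one atomic write, so a crash can only ever lose the delete, never resurrect a reusable id.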
@@ -0,0 +1,280 @@
"""End-to-end tests:
- Session task tools fire EventBus events
- REST routes return correct snapshots
- run_parallel_workers-style flow stamps assigned_session
- Durability: store survives a process boundary (subprocess)
"""
from __future__ import annotations
import asyncio
import os
import subprocess
import sys
from pathlib import Path
import pytest
import pytest_asyncio
from aiohttp import web
from aiohttp.test_utils import TestClient, TestServer
from framework.host.event_bus import AgentEvent, EventBus, EventType
from framework.llm.provider import ToolUse
from framework.loader.tool_registry import ToolRegistry
from framework.tasks import TaskListRole, TaskStore
from framework.tasks.events import set_default_event_bus
from framework.tasks.hooks import clear_hooks
from framework.tasks.tools import register_colony_template_tools, register_task_tools
@pytest.fixture(autouse=True)
def _reset_hooks() -> None:
clear_hooks()
yield
clear_hooks()
@pytest.fixture
def store(tmp_path: Path) -> TaskStore:
return TaskStore(hive_root=tmp_path)
@pytest.fixture
def registry(store: TaskStore) -> ToolRegistry:
reg = ToolRegistry()
register_task_tools(reg, store=store)
register_colony_template_tools(reg, colony_id="abc", store=store)
return reg
async def _invoke(registry: ToolRegistry, name: str, **inputs):
executor = registry.get_executor()
result = executor(ToolUse(id=f"call_{name}", name=name, input=inputs))
if asyncio.iscoroutine(result):
result = await result
return result
# ---------------------------------------------------------------------------
# EventBus integration
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_task_created_emits_event(registry: ToolRegistry) -> None:
bus = EventBus()
set_default_event_bus(bus)
received: list[AgentEvent] = []
async def handler(ev: AgentEvent) -> None:
received.append(ev)
bus.subscribe([EventType.TASK_CREATED], handler)
token = ToolRegistry.set_execution_context(agent_id="alice", task_list_id="session:alice:s1")
try:
await _invoke(registry, "task_create", subject="hello")
finally:
ToolRegistry.reset_execution_context(token)
# Allow the publish to fan out.
await asyncio.sleep(0.05)
assert len(received) == 1
assert received[0].type == EventType.TASK_CREATED
assert received[0].data["task"]["subject"] == "hello"
set_default_event_bus(None)
@pytest.mark.asyncio
async def test_task_updated_emits_event(registry: ToolRegistry) -> None:
bus = EventBus()
set_default_event_bus(bus)
received: list[AgentEvent] = []
async def handler(ev: AgentEvent) -> None:
received.append(ev)
bus.subscribe([EventType.TASK_UPDATED], handler)
token = ToolRegistry.set_execution_context(agent_id="alice", task_list_id="session:alice:s1")
try:
await _invoke(registry, "task_create", subject="x")
await _invoke(registry, "task_update", id=1, status="in_progress")
finally:
ToolRegistry.reset_execution_context(token)
await asyncio.sleep(0.05)
assert len(received) >= 1
assert received[0].type == EventType.TASK_UPDATED
set_default_event_bus(None)
# ---------------------------------------------------------------------------
# REST routes integration
# ---------------------------------------------------------------------------
@pytest_asyncio.fixture
async def http_client(tmp_path: Path) -> TestClient:
"""Spin up a stripped-down aiohttp app exposing only the task routes."""
# Point the default TaskStore at the tmp_path so routes see our test data.
os.environ["HIVE_HOME"] = str(tmp_path)
# Force a fresh singleton.
import framework.tasks.store as _store_mod
_store_mod._default_store = None
from framework.server.routes_tasks import register_routes
app = web.Application()
register_routes(app)
server = TestServer(app)
client = TestClient(server)
await client.start_server()
yield client
await client.close()
@pytest.mark.asyncio
async def test_rest_get_task_list_404(http_client: TestClient) -> None:
resp = await http_client.get("/api/tasks/session:nope:nope")
assert resp.status == 404
body = await resp.json()
assert body["task_list_id"] == "session:nope:nope"
@pytest.mark.asyncio
async def test_rest_get_task_list_after_create(http_client: TestClient) -> None:
# Create a list + task via the store directly so we don't have to mount
# the tools just for this test.
from framework.tasks import get_task_store
store = get_task_store()
await store.ensure_task_list("session:alice:s1", role=TaskListRole.SESSION)
await store.create_task("session:alice:s1", subject="abc")
resp = await http_client.get("/api/tasks/session:alice:s1")
assert resp.status == 200
body = await resp.json()
assert body["task_list_id"] == "session:alice:s1"
assert body["role"] == "session"
assert len(body["tasks"]) == 1
assert body["tasks"][0]["subject"] == "abc"
@pytest.mark.asyncio
async def test_rest_colony_lists(http_client: TestClient) -> None:
resp = await http_client.get("/api/colonies/test_colony/task_lists?queen_session_id=sess123")
assert resp.status == 200
body = await resp.json()
assert body["template_task_list_id"] == "colony:test_colony"
assert body["queen_session_task_list_id"] == "session:queen:sess123"
# ---------------------------------------------------------------------------
# Cross-process durability — write in subprocess A, read in subprocess B.
# Demonstrates the "task survives runtime restart" guarantee.
# ---------------------------------------------------------------------------
def test_durability_across_subprocesses(tmp_path: Path) -> None:
env = dict(os.environ)
env["HIVE_HOME"] = str(tmp_path)
env["PYTHONUNBUFFERED"] = "1"
write_script = """
import asyncio
from framework.tasks import TaskStore, TaskListRole
async def main():
s = TaskStore()
await s.ensure_task_list('session:a:b', role=TaskListRole.SESSION)
rec = await s.create_task('session:a:b', subject='persisted')
print(rec.id)
asyncio.run(main())
"""
out = subprocess.run(
[sys.executable, "-c", write_script],
env=env,
check=True,
capture_output=True,
text=True,
)
written_id = int(out.stdout.strip())
assert written_id == 1
read_script = """
import asyncio
from framework.tasks import TaskStore
async def main():
s = TaskStore()
rs = await s.list_tasks('session:a:b')
print(len(rs), rs[0].subject if rs else '')
asyncio.run(main())
"""
out2 = subprocess.run(
[sys.executable, "-c", read_script],
env=env,
check=True,
capture_output=True,
text=True,
)
count, subject = out2.stdout.strip().split(" ", 1)
assert count == "1"
assert subject == "persisted"
# ---------------------------------------------------------------------------
# "run_parallel_workers" style flow at the storage level.
# Validates plan-and-spawn pattern: queen publishes templates, then stamps
# assigned_session per spawned worker.
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_template_assignment_flow(store: TaskStore) -> None:
template_id = "colony:swarm"
await store.ensure_task_list(template_id, role=TaskListRole.TEMPLATE)
rec1 = await store.create_task(template_id, subject="crawl A")
rec2 = await store.create_task(template_id, subject="crawl B")
# Simulate run_parallel_workers stamping after spawn.
await store.update_task(
template_id,
rec1.id,
metadata_patch={"assigned_session": "session:w1:w1", "assigned_worker_id": "w1"},
)
await store.update_task(
template_id,
rec2.id,
metadata_patch={"assigned_session": "session:w2:w2", "assigned_worker_id": "w2"},
)
rs = await store.list_tasks(template_id)
assert all(r.metadata.get("assigned_worker_id") for r in rs)
# ---------------------------------------------------------------------------
# Reset preserves byte-equivalence semantics (durability under graceful op)
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_graceful_no_op_preserves_files(store: TaskStore, tmp_path: Path) -> None:
"""The store has no shutdown hook — touching it never deletes files."""
list_id = "session:a:b"
await store.ensure_task_list(list_id, role=TaskListRole.SESSION)
rec = await store.create_task(list_id, subject="x")
pre = sorted((tmp_path).rglob("*.json"))
pre_bytes = {p.name: p.read_bytes() for p in pre}
# Simulate "agent loop teardown" — should be a no-op.
# (No method to call — the absence of teardown hooks IS the test.)
post = sorted((tmp_path).rglob("*.json"))
assert {p.name for p in post} == {p.name for p in pre}
for p in post:
assert p.read_bytes() == pre_bytes[p.name]
assert rec.id == 1
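The durability tests above lean on the ``atomic_write`` helper that ``_write_doc_unsafe`` uses. A minimal sketch of such a helper, assuming the usual tmp-file-plus-rename scheme (the project's real implementation may differ in details like fsync policy):

```python
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path

@contextmanager
def atomic_write(path: Path):
    """Write to a tmp file in the same directory, then swap it in.

    Readers see either the old bytes or the new bytes, never a torn write.
    """
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            yield f
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # atomic rename on POSIX and Windows
    except BaseException:
        try:
            os.unlink(tmp)  # failure path: drop the partial tmp file
        except OSError:
            pass
        raise

doc_path = Path(tempfile.mkdtemp()) / "tasks.json"
with atomic_write(doc_path) as f:
    f.write('{"tasks": []}')
assert doc_path.read_text(encoding="utf-8") == '{"tasks": []}'
```

Because the swap is a single ``os.replace``, the subprocess-durability test can read the doc written by process A without any coordination beyond the file itself.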
@@ -0,0 +1,188 @@
"""Integration tests that wire multiple subsystems together.
Verifies the plan-and-spawn pattern end-to-end:
- Queen authors colony template entries (via colony_template_add)
- "spawn" stamps assigned_session metadata + emits the right event
- Workers operate on their own session list (no fall-through)
"""
from __future__ import annotations
import asyncio
import json
from pathlib import Path
import pytest
from framework.host.event_bus import AgentEvent, EventBus, EventType
from framework.llm.provider import ToolUse
from framework.loader.tool_registry import ToolRegistry
from framework.tasks import TaskListRole, TaskStore
from framework.tasks.events import (
emit_colony_template_assignment,
set_default_event_bus,
)
from framework.tasks.hooks import clear_hooks
from framework.tasks.scoping import (
colony_task_list_id,
session_task_list_id,
)
from framework.tasks.tools import register_colony_template_tools, register_task_tools
@pytest.fixture(autouse=True)
def _reset_hooks() -> None:
clear_hooks()
yield
clear_hooks()
async def _invoke(reg: ToolRegistry, name: str, **inputs):
executor = reg.get_executor()
result = executor(ToolUse(id=f"call_{name}", name=name, input=inputs))
if asyncio.iscoroutine(result):
result = await result
return result
@pytest.mark.asyncio
async def test_queen_plans_workers_pick_up(tmp_path: Path) -> None:
"""Queen authors a 3-step plan; we simulate spawning 3 workers, each
associated with one template entry. Each worker writes to its own
session list. The colony template gets stamped with assigned_session.
"""
bus = EventBus()
set_default_event_bus(bus)
received: list[AgentEvent] = []
async def handler(ev: AgentEvent) -> None:
received.append(ev)
bus.subscribe(
[
EventType.TASK_CREATED,
EventType.TASK_UPDATED,
EventType.COLONY_TEMPLATE_ASSIGNMENT,
],
handler,
)
store = TaskStore(hive_root=tmp_path)
queen_reg = ToolRegistry()
register_task_tools(queen_reg, store=store)
register_colony_template_tools(queen_reg, colony_id="alpha", store=store)
# 1. Queen authors the plan.
qtoken = ToolRegistry.set_execution_context(
agent_id="queen",
task_list_id=session_task_list_id("queen", "qsess"),
colony_id="alpha",
)
try:
for subject in ("crawl A", "crawl B", "crawl C"):
r = await _invoke(queen_reg, "colony_template_add", subject=subject)
assert json.loads(r.content)["success"] is True
# Verify the colony template now has 3 entries.
list_result = await _invoke(queen_reg, "colony_template_list")
body = json.loads(list_result.content)
assert body["count"] == 3
template_entries = body["tasks"]
finally:
ToolRegistry.reset_execution_context(qtoken)
template_list_id = colony_task_list_id("alpha")
# 2. Simulate spawning a worker per template entry: stamp the
# assigned_session and emit the assignment event.
worker_ids = ["w1", "w2", "w3"]
for entry, wid in zip(template_entries, worker_ids, strict=True):
await store.update_task(
template_list_id,
entry["id"],
metadata_patch={
"assigned_session": session_task_list_id(wid, wid),
"assigned_worker_id": wid,
},
)
await emit_colony_template_assignment(
colony_id="alpha",
task_id=entry["id"],
assigned_session=session_task_list_id(wid, wid),
assigned_worker_id=wid,
)
# 3. Each worker operates on its OWN session list.
for wid in worker_ids:
worker_reg = ToolRegistry()
register_task_tools(worker_reg, store=store)
wtoken = ToolRegistry.set_execution_context(agent_id=wid, task_list_id=session_task_list_id(wid, wid))
try:
await _invoke(worker_reg, "task_create", subject=f"setup for {wid}")
await _invoke(worker_reg, "task_update", id=1, status="in_progress")
finally:
ToolRegistry.reset_execution_context(wtoken)
# 4. Verify the colony template entries are stamped + workers have
# their own private lists.
template_after = await store.list_tasks(template_list_id)
assert all(t.metadata.get("assigned_worker_id") in {"w1", "w2", "w3"} for t in template_after)
for wid in worker_ids:
worker_tasks = await store.list_tasks(session_task_list_id(wid, wid))
assert len(worker_tasks) == 1
assert worker_tasks[0].owner == wid # auto-stamped on in_progress
assert worker_tasks[0].subject == f"setup for {wid}"
# 5. Confirm the assignment events fired.
await asyncio.sleep(0.05)
assignments = [e for e in received if e.type == EventType.COLONY_TEMPLATE_ASSIGNMENT]
assert len(assignments) == 3
set_default_event_bus(None)
@pytest.mark.asyncio
async def test_session_tools_never_touch_template(tmp_path: Path) -> None:
"""The four session tools must operate exclusively on the session list.
Even when colony_id is set in execution context, task_create writes to
session list, not the template.
"""
store = TaskStore(hive_root=tmp_path)
reg = ToolRegistry()
register_task_tools(reg, store=store)
token = ToolRegistry.set_execution_context(
agent_id="alice",
task_list_id=session_task_list_id("alice", "sess1"),
colony_id="alpha", # has colony_id but we still write to session
)
try:
await _invoke(reg, "task_create", subject="my work")
finally:
ToolRegistry.reset_execution_context(token)
# Session list got the task.
session_tasks = await store.list_tasks(session_task_list_id("alice", "sess1"))
assert len(session_tasks) == 1
# Colony template MUST be empty (no leakage).
assert not await store.list_exists(colony_task_list_id("alpha"))
@pytest.mark.asyncio
async def test_resume_persisted_handle(tmp_path: Path) -> None:
"""A session list created in 'session A' is still readable as long as
we resolve to the same task_list_id."""
store = TaskStore(hive_root=tmp_path)
list_id = session_task_list_id("alice", "sess_persistent")
await store.ensure_task_list(list_id, role=TaskListRole.SESSION)
await store.create_task(list_id, subject="a")
await store.create_task(list_id, subject="b")
# Simulate a fresh process / "resume" — same hive_root, same list_id.
store2 = TaskStore(hive_root=tmp_path)
rs = await store2.list_tasks(list_id)
assert [t.subject for t in rs] == ["a", "b"]
@@ -0,0 +1,121 @@
"""Tests for the periodic task-reminder logic.
The reminder state is a small counter machine; the policy is:
- Bump on each iteration
- Reset to zero on any task op tool call (task_create / task_update /
colony_template_*)
- When ``turns_since_task_op >= REMINDER_THRESHOLD_TURNS`` AND
``turns_since_last_reminder >= REMINDER_COOLDOWN_TURNS`` AND there
are open tasks, fire a reminder
The build_reminder helper composes the message body checked for the
key behavioral nudges (granularity + completion discipline).
"""
from __future__ import annotations
from pathlib import Path
import pytest
from framework.tasks import TaskListRole, TaskStore
from framework.tasks.models import TaskStatus
from framework.tasks.reminders import (
REMINDER_COOLDOWN_TURNS,
REMINDER_THRESHOLD_TURNS,
ReminderState,
build_reminder,
saw_task_op,
)
def test_state_bumps_each_iteration() -> None:
s = ReminderState()
s.on_iteration()
s.on_iteration()
assert s.turns_since_task_op == 2
assert s.turns_since_last_reminder == 2
def test_state_resets_on_task_op() -> None:
s = ReminderState()
for _ in range(5):
s.on_iteration()
s.on_task_op()
assert s.turns_since_task_op == 0
# Reminder cooldown is independent — it tracks reminders, not ops.
assert s.turns_since_last_reminder == 5
def test_should_remind_below_threshold() -> None:
s = ReminderState()
s.turns_since_task_op = REMINDER_THRESHOLD_TURNS - 1
s.turns_since_last_reminder = REMINDER_COOLDOWN_TURNS
assert not s.should_remind(has_open_tasks=True)
def test_should_remind_no_tasks() -> None:
s = ReminderState()
s.turns_since_task_op = REMINDER_THRESHOLD_TURNS + 5
s.turns_since_last_reminder = REMINDER_COOLDOWN_TURNS + 5
assert not s.should_remind(has_open_tasks=False)
def test_should_remind_at_threshold() -> None:
s = ReminderState()
s.turns_since_task_op = REMINDER_THRESHOLD_TURNS
s.turns_since_last_reminder = REMINDER_COOLDOWN_TURNS
assert s.should_remind(has_open_tasks=True)
def test_cooldown_blocks_back_to_back() -> None:
s = ReminderState()
s.turns_since_task_op = REMINDER_THRESHOLD_TURNS + 5
s.on_reminder_sent()
assert not s.should_remind(has_open_tasks=True)
def test_saw_task_op_recognizes_mutating_tools() -> None:
assert saw_task_op(["task_create"])
assert saw_task_op(["read_file", "task_update"])
assert saw_task_op(["colony_template_add"])
# Reads do NOT reset the counter — important: model could read forever
# without making progress.
assert not saw_task_op(["task_list", "task_get"])
assert not saw_task_op([])
@pytest.mark.asyncio
async def test_build_reminder_includes_open_tasks(tmp_path: Path) -> None:
store = TaskStore(hive_root=tmp_path)
await store.ensure_task_list("session:a:b", role=TaskListRole.SESSION)
await store.create_task("session:a:b", subject="step 1")
rec2 = await store.create_task("session:a:b", subject="step 2")
await store.create_task("session:a:b", subject="step 3")
# Mark #2 in_progress so the reminder mentions it.
await store.update_task("session:a:b", rec2.id, status=TaskStatus.IN_PROGRESS)
records = await store.list_tasks("session:a:b")
body = build_reminder(records)
assert "task_reminder" in body
assert "step 1" in body
assert "step 2" in body
assert "step 3" in body
# Granularity nudge present.
assert "umbrella" in body.lower() or "atomic" in body.lower()
# Completion-discipline nudge present.
assert "completed" in body.lower()
# Anti-nag boilerplate remains present.
assert "NEVER mention this reminder to the user" in body
@pytest.mark.asyncio
async def test_build_reminder_empty_when_no_open(tmp_path: Path) -> None:
store = TaskStore(hive_root=tmp_path)
await store.ensure_task_list("session:a:b", role=TaskListRole.SESSION)
rec = await store.create_task("session:a:b", subject="done already")
await store.update_task("session:a:b", rec.id, status=TaskStatus.COMPLETED)
records = await store.list_tasks("session:a:b")
assert build_reminder(records) == ""
@@ -0,0 +1,65 @@
"""Tests for resolve_task_list_id."""
from __future__ import annotations
from dataclasses import dataclass
import pytest
from framework.tasks.scoping import (
colony_task_list_id,
parse_task_list_id,
resolve_task_list_id,
session_task_list_id,
)
@dataclass
class FakeCtx:
agent_id: str = ""
run_id: str = ""
execution_id: str = ""
stream_id: str = ""
task_list_id: str | None = None
def test_session_helper() -> None:
assert session_task_list_id("a", "b") == "session:a:b"
def test_colony_helper() -> None:
assert colony_task_list_id("c") == "colony:c"
def test_parse_session() -> None:
parts = parse_task_list_id("session:agent:sess")
assert parts == {"kind": "session", "agent_id": "agent", "session_id": "sess"}
def test_parse_colony() -> None:
parts = parse_task_list_id("colony:abc")
assert parts == {"kind": "colony", "colony_id": "abc"}
def test_resolve_uses_existing(monkeypatch: pytest.MonkeyPatch) -> None:
monkeypatch.delenv("HIVE_TASK_LIST_ID", raising=False)
ctx = FakeCtx(agent_id="x", run_id="r1", task_list_id="session:x:r1")
assert resolve_task_list_id(ctx) == "session:x:r1"
def test_resolve_env_override(monkeypatch: pytest.MonkeyPatch) -> None:
monkeypatch.setenv("HIVE_TASK_LIST_ID", "forced")
ctx = FakeCtx(agent_id="x", run_id="r1")
assert resolve_task_list_id(ctx) == "forced"
def test_resolve_synthesizes_session(monkeypatch: pytest.MonkeyPatch) -> None:
monkeypatch.delenv("HIVE_TASK_LIST_ID", raising=False)
ctx = FakeCtx(agent_id="alice", run_id="r123")
assert resolve_task_list_id(ctx) == "session:alice:r123"
def test_resolve_falls_back_to_unscoped(monkeypatch: pytest.MonkeyPatch) -> None:
monkeypatch.delenv("HIVE_TASK_LIST_ID", raising=False)
ctx = FakeCtx(agent_id="alice")
assert resolve_task_list_id(ctx).startswith("unscoped:")
@@ -0,0 +1,382 @@
"""Tests for the file-backed task store.
Concurrency, id-monotonicity, cascade, claim, and reset: the engineering
primitives the rest of the system relies on.
"""
from __future__ import annotations
import asyncio
import json
from pathlib import Path
import pytest
from framework.tasks import TaskListRole, TaskStatus, TaskStore
from framework.tasks.models import ClaimAlreadyOwned, ClaimBlocked, ClaimNotFound, ClaimOk
@pytest.fixture
def store(tmp_path: Path) -> TaskStore:
return TaskStore(hive_root=tmp_path)
@pytest.fixture
def list_id() -> str:
return "session:test_agent:test_session"
# ---------------------------------------------------------------------------
# Basic CRUD
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_create_and_get(store: TaskStore, list_id: str) -> None:
await store.ensure_task_list(list_id, role=TaskListRole.SESSION)
rec = await store.create_task(list_id, subject="hi")
assert rec.id == 1
fetched = await store.get_task(list_id, 1)
assert fetched is not None
assert fetched.subject == "hi"
assert fetched.status == TaskStatus.PENDING
@pytest.mark.asyncio
async def test_get_missing_returns_none(store: TaskStore, list_id: str) -> None:
assert await store.get_task(list_id, 999) is None
@pytest.mark.asyncio
async def test_list_ascending(store: TaskStore, list_id: str) -> None:
await store.ensure_task_list(list_id, role=TaskListRole.SESSION)
await store.create_task(list_id, subject="a")
await store.create_task(list_id, subject="b")
await store.create_task(list_id, subject="c")
rs = await store.list_tasks(list_id)
assert [r.id for r in rs] == [1, 2, 3]
@pytest.mark.asyncio
async def test_list_filters_internal(store: TaskStore, list_id: str) -> None:
await store.ensure_task_list(list_id, role=TaskListRole.SESSION)
await store.create_task(list_id, subject="visible")
await store.create_task(list_id, subject="hidden", metadata={"_internal": True})
public = await store.list_tasks(list_id)
assert len(public) == 1
all_ = await store.list_tasks(list_id, include_internal=True)
assert len(all_) == 2
# ---------------------------------------------------------------------------
# Concurrent creation: two parallel calls -> N and N+1
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_concurrent_create_distinct_ids(store: TaskStore, list_id: str) -> None:
await store.ensure_task_list(list_id, role=TaskListRole.SESSION)
results = await asyncio.gather(*(store.create_task(list_id, subject=f"t{i}") for i in range(20)))
ids = sorted(r.id for r in results)
assert ids == list(range(1, 21))
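The id-monotonicity guarantee this test exercises can be sketched with a highwatermark counter guarded by a lock. This is a minimal in-process sketch under stated assumptions: the real store persists the counter (the legacy fixtures below show a `.highwatermark` file) and uses file locking, and `IdAllocator` is a hypothetical name, not the framework's class.

```python
import asyncio


class IdAllocator:
    """Monotonic id allocator sketch: ids are never reused, even after
    deletes, because the highwatermark only ever increases."""

    def __init__(self) -> None:
        self._highwatermark = 0      # highest id ever handed out
        self._lock = asyncio.Lock()  # serializes concurrent allocations

    async def next_id(self) -> int:
        async with self._lock:
            self._highwatermark += 1
            return self._highwatermark


async def demo() -> list[int]:
    alloc = IdAllocator()
    # Twenty concurrent allocations must yield twenty distinct, gap-free ids.
    results = await asyncio.gather(*(alloc.next_id() for _ in range(20)))
    return sorted(results)
```

Without the lock, two interleaved read-increment-write sequences could hand out the same id; the test's `asyncio.gather` of 20 creates is exactly the race this guards against.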
# ---------------------------------------------------------------------------
# Update + change detection
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_update_returns_changed_fields(store: TaskStore, list_id: str) -> None:
await store.ensure_task_list(list_id, role=TaskListRole.SESSION)
rec = await store.create_task(list_id, subject="orig")
new, fields = await store.update_task(list_id, rec.id, subject="orig", status=TaskStatus.IN_PROGRESS)
assert fields == ["status"] # subject unchanged shouldn't appear
assert new.status == TaskStatus.IN_PROGRESS
@pytest.mark.asyncio
async def test_update_missing_returns_none(store: TaskStore, list_id: str) -> None:
new, fields = await store.update_task(list_id, 42, subject="x")
assert new is None
assert fields == []
@pytest.mark.asyncio
async def test_metadata_patch_merges_and_deletes(store: TaskStore, list_id: str) -> None:
await store.ensure_task_list(list_id, role=TaskListRole.SESSION)
rec = await store.create_task(list_id, subject="x", metadata={"a": 1, "b": 2})
new, _ = await store.update_task(list_id, rec.id, metadata_patch={"a": 10, "b": None})
assert new.metadata == {"a": 10}
# ---------------------------------------------------------------------------
# Bidirectional blocks
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_blocks_bidirectional(store: TaskStore, list_id: str) -> None:
await store.ensure_task_list(list_id, role=TaskListRole.SESSION)
a = await store.create_task(list_id, subject="a")
b = await store.create_task(list_id, subject="b")
new_a, _ = await store.update_task(list_id, a.id, add_blocks=[b.id])
assert b.id in new_a.blocks
fetched_b = await store.get_task(list_id, b.id)
assert a.id in fetched_b.blocked_by
@pytest.mark.asyncio
async def test_blocked_by_bidirectional(store: TaskStore, list_id: str) -> None:
await store.ensure_task_list(list_id, role=TaskListRole.SESSION)
a = await store.create_task(list_id, subject="a")
b = await store.create_task(list_id, subject="b")
new_b, _ = await store.update_task(list_id, b.id, add_blocked_by=[a.id])
assert a.id in new_b.blocked_by
fetched_a = await store.get_task(list_id, a.id)
assert b.id in fetched_a.blocks
# ---------------------------------------------------------------------------
# Delete: highwatermark + cascade
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_delete_increments_highwatermark(store: TaskStore, list_id: str) -> None:
await store.ensure_task_list(list_id, role=TaskListRole.SESSION)
await store.create_task(list_id, subject="a")
b = await store.create_task(list_id, subject="b")
deleted, _ = await store.delete_task(list_id, b.id)
assert deleted
new = await store.create_task(list_id, subject="c")
assert new.id == b.id + 1, "deleted ids must never be reused"
@pytest.mark.asyncio
async def test_delete_cascades_blocks(store: TaskStore, list_id: str) -> None:
await store.ensure_task_list(list_id, role=TaskListRole.SESSION)
a = await store.create_task(list_id, subject="a")
b = await store.create_task(list_id, subject="b")
c = await store.create_task(list_id, subject="c")
await store.update_task(list_id, a.id, add_blocks=[b.id])
await store.update_task(list_id, c.id, add_blocked_by=[b.id])
_, cascade = await store.delete_task(list_id, b.id)
assert sorted(cascade) == sorted([a.id, c.id])
fetched_a = await store.get_task(list_id, a.id)
fetched_c = await store.get_task(list_id, c.id)
assert b.id not in fetched_a.blocks
assert b.id not in fetched_c.blocked_by
@pytest.mark.asyncio
async def test_delete_missing_returns_false(store: TaskStore, list_id: str) -> None:
deleted, cascade = await store.delete_task(list_id, 42)
assert not deleted
assert cascade == []
# ---------------------------------------------------------------------------
# Reset preserves high-water-mark
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_reset_preserves_floor(store: TaskStore, list_id: str) -> None:
await store.ensure_task_list(list_id, role=TaskListRole.SESSION)
for _ in range(5):
await store.create_task(list_id, subject="x")
await store.reset_task_list(list_id)
new = await store.create_task(list_id, subject="post-reset")
assert new.id == 6
# ---------------------------------------------------------------------------
# Claim semantics (used by run_parallel_workers)
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_claim_ok(store: TaskStore, list_id: str) -> None:
await store.ensure_task_list(list_id, role=TaskListRole.TEMPLATE)
rec = await store.create_task(list_id, subject="x")
result = await store.claim_task_with_busy_check(list_id, rec.id, "agent_a")
assert isinstance(result, ClaimOk)
assert result.record.owner == "agent_a"
@pytest.mark.asyncio
async def test_claim_already_owned(store: TaskStore, list_id: str) -> None:
await store.ensure_task_list(list_id, role=TaskListRole.TEMPLATE)
rec = await store.create_task(list_id, subject="x", owner="agent_a")
result = await store.claim_task_with_busy_check(list_id, rec.id, "agent_b")
assert isinstance(result, ClaimAlreadyOwned)
assert result.by == "agent_a"
@pytest.mark.asyncio
async def test_claim_not_found(store: TaskStore, list_id: str) -> None:
result = await store.claim_task_with_busy_check(list_id, 999, "agent_a")
assert isinstance(result, ClaimNotFound)
@pytest.mark.asyncio
async def test_claim_blocked(store: TaskStore, list_id: str) -> None:
await store.ensure_task_list(list_id, role=TaskListRole.TEMPLATE)
a = await store.create_task(list_id, subject="prereq")
b = await store.create_task(list_id, subject="dep")
await store.update_task(list_id, b.id, add_blocked_by=[a.id])
# a is still pending -> b blocked.
result = await store.claim_task_with_busy_check(list_id, b.id, "agent_a")
assert isinstance(result, ClaimBlocked)
assert a.id in result.by
# ---------------------------------------------------------------------------
# Meta lifecycle: ensure_task_list is idempotent and tracks last_seen
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_ensure_task_list_idempotent(store: TaskStore, list_id: str) -> None:
m1 = await store.ensure_task_list(list_id, role=TaskListRole.SESSION, session_id="s1")
m2 = await store.ensure_task_list(list_id, role=TaskListRole.SESSION, session_id="s2")
assert m1.created_at == m2.created_at # same dir
assert "s1" in m2.last_seen_session_ids
assert "s2" in m2.last_seen_session_ids
@pytest.mark.asyncio
async def test_ensure_task_list_caps_history(store: TaskStore, list_id: str) -> None:
for i in range(15):
await store.ensure_task_list(list_id, role=TaskListRole.SESSION, session_id=f"s{i}")
meta = await store.get_meta(list_id)
assert len(meta.last_seen_session_ids) == 10
assert "s14" in meta.last_seen_session_ids
assert "s4" not in meta.last_seen_session_ids
# ---------------------------------------------------------------------------
# Path resolution sanity
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_colony_path(store: TaskStore, tmp_path: Path) -> None:
await store.ensure_task_list("colony:abc", role=TaskListRole.TEMPLATE)
assert (tmp_path / "colonies" / "abc" / "tasks.json").exists()
@pytest.mark.asyncio
async def test_session_path(store: TaskStore, tmp_path: Path) -> None:
await store.ensure_task_list("session:agent_x:sess_y", role=TaskListRole.SESSION)
p = tmp_path / "agents" / "agent_x" / "sessions" / "sess_y" / "tasks.json"
assert p.exists()
@pytest.mark.asyncio
async def test_canonical_queen_session_dir_wins(store: TaskStore, tmp_path: Path) -> None:
"""When ``agents/queens/{name}/sessions/{sid}/`` exists on disk, the task
doc lands there beside conversations/events/summary instead of in
the orphaned ``agents/{agent_id}/sessions/{sid}/`` location.
"""
sid = "session_20260429_test"
canonical = tmp_path / "agents" / "queens" / "queen_growth" / "sessions" / sid
canonical.mkdir(parents=True)
# Pretend the rest of the session is here.
(canonical / "events.jsonl").write_text("", encoding="utf-8")
list_id = f"session:queen:{sid}"
await store.ensure_task_list(list_id, role=TaskListRole.SESSION)
rec = await store.create_task(list_id, subject="hello")
assert (canonical / "tasks.json").exists()
assert not (tmp_path / "agents" / "queen" / "sessions" / sid / "tasks.json").exists()
fetched = await store.list_tasks(list_id)
assert [r.id for r in fetched] == [rec.id]
# ---------------------------------------------------------------------------
# Lazy migration from the older fan-out layout
# ---------------------------------------------------------------------------
def _seed_legacy_session(tmp_path: Path, agent: str, sess: str, n_tasks: int) -> Path:
"""Hand-craft an older ``{root}/tasks/`` layout the way it used to live
on disk, so we can prove the lazy migration folds it correctly.
"""
legacy = tmp_path / "agents" / agent / "sessions" / sess / "tasks"
(legacy / "tasks").mkdir(parents=True)
list_id = f"session:{agent}:{sess}"
(legacy / "meta.json").write_text(
json.dumps(
{
"task_list_id": list_id,
"role": "session",
"creator_agent_id": None,
"created_at": 1000.0,
"last_seen_session_ids": ["s1"],
"schema_version": 1,
}
),
encoding="utf-8",
)
(legacy / ".highwatermark").write_text(str(n_tasks), encoding="utf-8")
(legacy / ".lock").write_text("", encoding="utf-8")
for i in range(1, n_tasks + 1):
(legacy / "tasks" / f"{i:04d}.json").write_text(
json.dumps(
{
"id": i,
"subject": f"legacy {i}",
"description": "",
"active_form": None,
"owner": None,
"status": "pending",
"blocks": [],
"blocked_by": [],
"metadata": {},
"created_at": 1000.0 + i,
"updated_at": 1000.0 + i,
}
),
encoding="utf-8",
)
return legacy
@pytest.mark.asyncio
async def test_legacy_layout_migrates_on_first_read(store: TaskStore, tmp_path: Path) -> None:
legacy = _seed_legacy_session(tmp_path, "agent_z", "sess_z", 3)
list_id = "session:agent_z:sess_z"
# First read should fold the legacy fan-out into tasks.json.
records = await store.list_tasks(list_id)
assert [r.id for r in records] == [1, 2, 3]
assert [r.subject for r in records] == ["legacy 1", "legacy 2", "legacy 3"]
# New doc exists; the legacy dir is gone.
new_doc = tmp_path / "agents" / "agent_z" / "sessions" / "sess_z" / "tasks.json"
assert new_doc.exists()
assert not legacy.exists()
# Highwatermark is preserved — next id is 4, not 1.
new_rec = await store.create_task(list_id, subject="post-migration")
assert new_rec.id == 4
@pytest.mark.asyncio
async def test_legacy_layout_migrates_on_first_write(store: TaskStore, tmp_path: Path) -> None:
_seed_legacy_session(tmp_path, "agent_w", "sess_w", 2)
list_id = "session:agent_w:sess_w"
# Update a legacy task — must trigger migration, then mutate.
new, changed = await store.update_task(list_id, 2, status=TaskStatus.IN_PROGRESS)
assert new is not None
assert changed == ["status"]
assert new.status == TaskStatus.IN_PROGRESS
# Doc reflects both legacy tasks.
listed = await store.list_tasks(list_id)
assert len(listed) == 2
@pytest.mark.asyncio
async def test_legacy_list_exists(store: TaskStore, tmp_path: Path) -> None:
_seed_legacy_session(tmp_path, "agent_q", "sess_q", 1)
assert await store.list_exists("session:agent_q:sess_q")
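The fold these migration tests exercise can be sketched as: read each per-task JSON file in id order, consolidate into a single `tasks.json`, then delete the legacy directory. A standalone sketch only, assuming the layout `_seed_legacy_session` builds; the function name and consolidated-doc shape are assumptions, and the real store additionally takes the lock and tolerates partially written files.

```python
import json
import shutil
from pathlib import Path


def fold_legacy_tasks(legacy_dir: Path, new_doc: Path) -> dict:
    """Fold the older fan-out layout ({legacy}/tasks/NNNN.json + meta.json +
    .highwatermark) into one consolidated tasks.json document (sketch)."""
    meta = json.loads((legacy_dir / "meta.json").read_text(encoding="utf-8"))
    highwatermark = int(
        (legacy_dir / ".highwatermark").read_text(encoding="utf-8")
    )
    # Zero-padded filenames sort lexically == numerically, so the
    # consolidated doc stays in id order.
    tasks = [
        json.loads(p.read_text(encoding="utf-8"))
        for p in sorted((legacy_dir / "tasks").glob("*.json"))
    ]
    doc = {"meta": meta, "highwatermark": highwatermark, "tasks": tasks}
    new_doc.parent.mkdir(parents=True, exist_ok=True)
    new_doc.write_text(json.dumps(doc, indent=2), encoding="utf-8")
    shutil.rmtree(legacy_dir)  # legacy dir must be gone after the fold
    return doc
```

Carrying `highwatermark` across the fold is what makes `test_legacy_layout_migrates_on_first_read` see id 4 after migrating three legacy tasks.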
@@ -0,0 +1,488 @@
"""End-to-end tool tests via ToolRegistry.get_executor()."""
from __future__ import annotations
import asyncio
import json
from pathlib import Path
import pytest
from framework.llm.provider import ToolUse
from framework.loader.tool_registry import ToolRegistry
from framework.tasks import TaskStore
from framework.tasks.hooks import (
HOOK_TASK_COMPLETED,
HOOK_TASK_CREATED,
BlockingHookError,
clear_hooks,
register_hook,
)
from framework.tasks.tools import register_colony_template_tools, register_task_tools
@pytest.fixture(autouse=True)
def _reset_hooks() -> None:
clear_hooks()
yield
clear_hooks()
@pytest.fixture
def store(tmp_path: Path) -> TaskStore:
return TaskStore(hive_root=tmp_path)
@pytest.fixture
def registry_with_session_tools(store: TaskStore) -> ToolRegistry:
reg = ToolRegistry()
register_task_tools(reg, store=store)
return reg
async def _invoke(registry: ToolRegistry, name: str, **inputs):
"""Invoke a tool via the registry's executor protocol."""
executor = registry.get_executor()
result = executor(ToolUse(id=f"call_{name}", name=name, input=inputs))
if asyncio.iscoroutine(result):
result = await result
return result
def _set_ctx(*, agent_id: str, task_list_id: str, **extra):
return ToolRegistry.set_execution_context(agent_id=agent_id, task_list_id=task_list_id, **extra)
# ---------------------------------------------------------------------------
# Session tools — happy paths
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_create_then_list(registry_with_session_tools: ToolRegistry) -> None:
reg = registry_with_session_tools
list_id = "session:agent_a:sess_1"
token = _set_ctx(agent_id="agent_a", task_list_id=list_id)
try:
result = await _invoke(reg, "task_create", subject="Plan retrieval")
assert result.is_error is False
body = json.loads(result.content)
assert body["success"] is True
assert body["task_id"] == 1
result2 = await _invoke(reg, "task_list")
body2 = json.loads(result2.content)
assert body2["count"] == 1
assert body2["tasks"][0]["subject"] == "Plan retrieval"
finally:
ToolRegistry.reset_execution_context(token)
@pytest.mark.asyncio
async def test_update_in_progress_auto_owner(
registry_with_session_tools: ToolRegistry,
) -> None:
reg = registry_with_session_tools
list_id = "session:agent_a:sess_1"
token = _set_ctx(agent_id="agent_a", task_list_id=list_id)
try:
await _invoke(reg, "task_create", subject="x")
result = await _invoke(reg, "task_update", id=1, status="in_progress")
body = json.loads(result.content)
assert body["success"] is True
assert body["task"]["status"] == "in_progress"
assert body["task"]["owner"] == "agent_a" # auto-filled
finally:
ToolRegistry.reset_execution_context(token)
@pytest.mark.asyncio
async def test_update_status_deleted(
registry_with_session_tools: ToolRegistry,
) -> None:
reg = registry_with_session_tools
list_id = "session:agent_a:sess_1"
token = _set_ctx(agent_id="agent_a", task_list_id=list_id)
try:
await _invoke(reg, "task_create", subject="x")
result = await _invoke(reg, "task_update", id=1, status="deleted")
body = json.loads(result.content)
assert body["success"] is True
assert body["deleted"] is True
# Subsequent list sees nothing.
body2 = json.loads((await _invoke(reg, "task_list")).content)
assert body2["count"] == 0
finally:
ToolRegistry.reset_execution_context(token)
@pytest.mark.asyncio
async def test_get_returns_full_record(
registry_with_session_tools: ToolRegistry,
) -> None:
reg = registry_with_session_tools
list_id = "session:agent_a:sess_1"
token = _set_ctx(agent_id="agent_a", task_list_id=list_id)
try:
await _invoke(reg, "task_create", subject="x", description="full body")
result = await _invoke(reg, "task_get", id=1)
body = json.loads(result.content)
assert body["task"]["description"] == "full body"
finally:
ToolRegistry.reset_execution_context(token)
# ---------------------------------------------------------------------------
# Task-not-found is non-error (so sibling tool cancellation doesn't cascade)
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_task_not_found_is_not_error(
registry_with_session_tools: ToolRegistry,
) -> None:
reg = registry_with_session_tools
list_id = "session:agent_a:sess_1"
token = _set_ctx(agent_id="agent_a", task_list_id=list_id)
try:
result = await _invoke(reg, "task_update", id=42, subject="ghost")
# is_error must be False so the streaming executor doesn't cascade-cancel.
assert result.is_error is False
body = json.loads(result.content)
assert body["success"] is False
finally:
ToolRegistry.reset_execution_context(token)
# ---------------------------------------------------------------------------
# task_create_batch: atomicity, rollback, id allocation
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_create_batch_creates_n_tasks_atomically(
registry_with_session_tools: ToolRegistry,
) -> None:
reg = registry_with_session_tools
list_id = "session:agent_a:sess_1"
token = _set_ctx(agent_id="agent_a", task_list_id=list_id)
try:
result = await _invoke(
reg,
"task_create_batch",
tasks=[
{"subject": "step 1", "active_form": "Doing 1"},
{"subject": "step 2"},
{"subject": "step 3"},
],
)
assert result.is_error is False
body = json.loads(result.content)
assert body["success"] is True
assert body["task_ids"] == [1, 2, 3]
# Compact summary message — references first id and the range.
assert "#1-#3" in body["message"] or "#1, #2, #3" in body["message"]
assert "Mark #1 in_progress" in body["message"]
# Sanity: list shows all three.
body2 = json.loads((await _invoke(reg, "task_list")).content)
assert body2["count"] == 3
finally:
ToolRegistry.reset_execution_context(token)
@pytest.mark.asyncio
async def test_create_batch_rejects_empty(
registry_with_session_tools: ToolRegistry,
) -> None:
reg = registry_with_session_tools
token = _set_ctx(agent_id="a", task_list_id="session:a:s")
try:
result = await _invoke(reg, "task_create_batch", tasks=[])
body = json.loads(result.content)
assert body["success"] is False
assert "non-empty" in body["error"]
finally:
ToolRegistry.reset_execution_context(token)
@pytest.mark.asyncio
async def test_create_batch_rejects_malformed_spec_atomically(
registry_with_session_tools: ToolRegistry,
) -> None:
"""A bad subject in the middle of the batch must reject the whole
batch not leave partial state on disk."""
reg = registry_with_session_tools
token = _set_ctx(agent_id="a", task_list_id="session:a:s")
try:
result = await _invoke(
reg,
"task_create_batch",
tasks=[{"subject": "good"}, {"subject": ""}],
)
body = json.loads(result.content)
assert body["success"] is False
# Confirm zero tasks landed.
body2 = json.loads((await _invoke(reg, "task_list")).content)
assert body2["count"] == 0
finally:
ToolRegistry.reset_execution_context(token)
@pytest.mark.asyncio
async def test_create_batch_hook_blocks_rolls_back_whole_batch(
registry_with_session_tools: ToolRegistry,
) -> None:
"""If a task_created hook blocks even one task in the batch, the
entire batch must roll back."""
reg = registry_with_session_tools
# Block on the second task only.
def selective_blocker(ctx) -> None:
if ctx.task.subject == "block me":
raise BlockingHookError("policy")
register_hook(HOOK_TASK_CREATED, selective_blocker)
token = _set_ctx(agent_id="a", task_list_id="session:a:s")
try:
result = await _invoke(
reg,
"task_create_batch",
tasks=[
{"subject": "ok 1"},
{"subject": "block me"},
{"subject": "ok 3"},
],
)
body = json.loads(result.content)
assert body["success"] is False
assert "rolled back" in body["error"]
# All three rolled back.
body2 = json.loads((await _invoke(reg, "task_list")).content)
assert body2["count"] == 0
finally:
ToolRegistry.reset_execution_context(token)
@pytest.mark.asyncio
async def test_create_batch_then_single_create_keeps_id_monotonic(
registry_with_session_tools: ToolRegistry,
) -> None:
"""task_create_batch uses sequential ids; a follow-up task_create
should pick up at the next id after the batch's highest."""
reg = registry_with_session_tools
token = _set_ctx(agent_id="a", task_list_id="session:a:s")
try:
await _invoke(
reg,
"task_create_batch",
tasks=[{"subject": "a"}, {"subject": "b"}, {"subject": "c"}],
)
result = await _invoke(reg, "task_create", subject="d")
body = json.loads(result.content)
assert body["task_id"] == 4
finally:
ToolRegistry.reset_execution_context(token)
@pytest.mark.asyncio
async def test_completion_suffix_points_to_next_pending(
registry_with_session_tools: ToolRegistry,
) -> None:
"""When a task is marked completed, the result should point at the
lowest-id pending task as a steering nudge."""
reg = registry_with_session_tools
list_id = "session:agent_a:sess_1"
token = _set_ctx(agent_id="agent_a", task_list_id=list_id)
try:
await _invoke(reg, "task_create", subject="step 1")
await _invoke(reg, "task_create", subject="step 2")
await _invoke(reg, "task_create", subject="step 3")
await _invoke(reg, "task_update", id=1, status="in_progress")
result = await _invoke(reg, "task_update", id=1, status="completed")
body = json.loads(result.content)
assert body["success"] is True
assert "Next pending: #2" in body["message"]
assert "step 2" in body["message"]
finally:
ToolRegistry.reset_execution_context(token)
@pytest.mark.asyncio
async def test_completion_suffix_signals_all_done(
registry_with_session_tools: ToolRegistry,
) -> None:
reg = registry_with_session_tools
list_id = "session:agent_a:sess_1"
token = _set_ctx(agent_id="agent_a", task_list_id=list_id)
try:
await _invoke(reg, "task_create", subject="only step")
await _invoke(reg, "task_update", id=1, status="in_progress")
result = await _invoke(reg, "task_update", id=1, status="completed")
body = json.loads(result.content)
assert "All tasks complete" in body["message"]
finally:
ToolRegistry.reset_execution_context(token)
@pytest.mark.asyncio
async def test_completion_suffix_skips_blocked_pending(
registry_with_session_tools: ToolRegistry,
) -> None:
"""If the only pending task is blocked, the suffix should not point at
it fall through to "all done" or note in-progress siblings."""
reg = registry_with_session_tools
list_id = "session:agent_a:sess_1"
token = _set_ctx(agent_id="agent_a", task_list_id=list_id)
try:
await _invoke(reg, "task_create", subject="prereq")
await _invoke(reg, "task_create", subject="blocked dep")
# #2 is blocked by #1.
await _invoke(reg, "task_update", id=2, add_blocked_by=[1])
await _invoke(reg, "task_update", id=1, status="in_progress")
# Don't actually complete #1 — instead add an unrelated done.
await _invoke(reg, "task_create", subject="extra step")
await _invoke(reg, "task_update", id=3, status="in_progress")
result = await _invoke(reg, "task_update", id=3, status="completed")
body = json.loads(result.content)
# #2 is still blocked by uncompleted #1, so the suffix shouldn't
# surface it. #1 is in_progress, so the suffix highlights that.
assert "Still in progress: #1" in body["message"]
finally:
ToolRegistry.reset_execution_context(token)
@pytest.mark.asyncio
async def test_hook_blocks_task_created(
registry_with_session_tools: ToolRegistry,
) -> None:
reg = registry_with_session_tools
list_id = "session:agent_a:sess_1"
def blocker(ctx) -> None:
raise BlockingHookError("test policy")
register_hook(HOOK_TASK_CREATED, blocker)
token = _set_ctx(agent_id="agent_a", task_list_id=list_id)
try:
result = await _invoke(reg, "task_create", subject="will be aborted")
body = json.loads(result.content)
assert body["success"] is False
# The task must have been rolled back.
body2 = json.loads((await _invoke(reg, "task_list")).content)
assert body2["count"] == 0
finally:
ToolRegistry.reset_execution_context(token)
@pytest.mark.asyncio
async def test_hook_blocks_task_completed(
registry_with_session_tools: ToolRegistry,
) -> None:
reg = registry_with_session_tools
list_id = "session:agent_a:sess_1"
register_hook(HOOK_TASK_COMPLETED, lambda ctx: (_ for _ in ()).throw(BlockingHookError("nope")))
token = _set_ctx(agent_id="agent_a", task_list_id=list_id)
try:
await _invoke(reg, "task_create", subject="x")
await _invoke(reg, "task_update", id=1, status="in_progress")
result = await _invoke(reg, "task_update", id=1, status="completed")
body = json.loads(result.content)
assert body["success"] is False
# Status rolled back to in_progress, not stuck on completed.
body2 = json.loads((await _invoke(reg, "task_get", id=1)).content)
assert body2["task"]["status"] == "in_progress"
finally:
ToolRegistry.reset_execution_context(token)
@pytest.mark.asyncio
async def test_hook_blocks_task_completed_never_writes(
registry_with_session_tools: ToolRegistry,
store: TaskStore,
) -> None:
"""Veto-before-write: when the task_completed hook blocks, the COMPLETED
status must NEVER touch disk `updated_at` should equal the value from
the prior in_progress write, not be bumped by a transient COMPLETED
write + rollback."""
from framework.tasks.models import TaskStatus
reg = registry_with_session_tools
list_id = "session:agent_a:sess_1"
register_hook(HOOK_TASK_COMPLETED, lambda ctx: (_ for _ in ()).throw(BlockingHookError("nope")))
token = _set_ctx(agent_id="agent_a", task_list_id=list_id)
try:
await _invoke(reg, "task_create", subject="x")
await _invoke(reg, "task_update", id=1, status="in_progress")
# Snapshot updated_at after the in_progress write — this is the
# value that should persist if veto-before-write is honored.
before = await store.get_task(list_id, 1)
assert before is not None
ts_before = before.updated_at
# Vetoed completion attempt.
result = await _invoke(reg, "task_update", id=1, status="completed")
body = json.loads(result.content)
assert body["success"] is False
# On-disk record must be byte-identical to the pre-veto snapshot —
# no transient COMPLETED write, no rollback updated_at bump.
after = await store.get_task(list_id, 1)
assert after is not None
assert after.status == TaskStatus.IN_PROGRESS
assert after.updated_at == ts_before, (
"veto-before-write violated: updated_at changed, indicating a transient write happened"
)
finally:
ToolRegistry.reset_execution_context(token)
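The veto-before-write discipline the test above checks reduces to: run hooks against the *proposed* record before anything is persisted, so a veto never produces a transient write plus rollback. A minimal sketch under stated assumptions: `BlockingHookError` here is a stand-in for the framework's exception, and the real hook signature and persistence step differ.

```python
class BlockingHookError(Exception):
    """Stand-in for framework.tasks.hooks.BlockingHookError: raised by a
    hook to veto a state transition."""


def update_with_veto(record: dict, new_status: str, hooks: list) -> dict:
    """Veto-before-write sketch: hooks see the proposed record BEFORE the
    write, so a veto leaves the stored record (and its updated_at)
    untouched. Persistence itself is elided here."""
    proposed = {**record, "status": new_status}
    for hook in hooks:
        hook(proposed)  # raises BlockingHookError to veto the transition
    # Only after every hook passes would the store write `proposed`.
    return proposed
```

The alternative ordering (write, then run hooks, then roll back on veto) would bump `updated_at` even on a vetoed completion, which is precisely what `test_hook_blocks_task_completed_never_writes` rules out.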
# ---------------------------------------------------------------------------
# Colony template tools
# ---------------------------------------------------------------------------
@pytest.fixture
def queen_registry(store: TaskStore) -> ToolRegistry:
reg = ToolRegistry()
register_task_tools(reg, store=store)
register_colony_template_tools(reg, colony_id="abc", store=store)
return reg
@pytest.mark.asyncio
async def test_colony_template_add_and_list(queen_registry: ToolRegistry) -> None:
reg = queen_registry
queen_session_list = "session:queen:sess_1"
token = _set_ctx(agent_id="queen", task_list_id=queen_session_list, colony_id="abc")
try:
await _invoke(reg, "colony_template_add", subject="crawl")
await _invoke(reg, "colony_template_add", subject="parse")
body = json.loads((await _invoke(reg, "colony_template_list")).content)
assert body["count"] == 2
# The session task list should be empty — colony tools don't write there.
body_session = json.loads((await _invoke(reg, "task_list")).content)
assert body_session["count"] == 0
finally:
ToolRegistry.reset_execution_context(token)
@pytest.mark.asyncio
async def test_colony_template_remove(queen_registry: ToolRegistry) -> None:
reg = queen_registry
token = _set_ctx(agent_id="queen", task_list_id="session:queen:sess_1", colony_id="abc")
try:
await _invoke(reg, "colony_template_add", subject="a")
await _invoke(reg, "colony_template_add", subject="b")
result = await _invoke(reg, "colony_template_remove", id=2)
body = json.loads(result.content)
assert body["success"] is True
# Next add gets id 3 (highwatermark preserved)
result2 = await _invoke(reg, "colony_template_add", subject="c")
body2 = json.loads(result2.content)
assert body2["task_id"] == 3
finally:
ToolRegistry.reset_execution_context(token)
@@ -0,0 +1,11 @@
"""Task tools — the four session-list tools and the queen-only colony template tools."""
from framework.tasks.tools.register import (
register_colony_template_tools,
register_task_tools,
)
__all__ = [
"register_colony_template_tools",
"register_task_tools",
]
@@ -0,0 +1,39 @@
"""Context resolution for task-tool executors.
Tool executors run synchronously inside ``ToolRegistry.get_executor()``;
they need the calling agent's id and task_list_id to know which list to
write to. We pull both from contextvars set by the runner /
ColonyRuntime / orchestrator before each agent's iteration.
"""
from __future__ import annotations
from typing import Any
from framework.loader.tool_registry import _execution_context
def current_context() -> dict[str, Any]:
return dict(_execution_context.get() or {})
def current_agent_id() -> str | None:
return current_context().get("agent_id")
def current_task_list_id() -> str | None:
return current_context().get("task_list_id")
def current_colony_id() -> str | None:
return current_context().get("colony_id")
def current_picked_up_from() -> tuple[str, int] | None:
"""If this session was spawned for a colony template entry, return it."""
raw = current_context().get("picked_up_from")
if not raw:
return None
if isinstance(raw, tuple) and len(raw) == 2:
return raw[0], int(raw[1])
return None
@@ -0,0 +1,238 @@
"""Queen-only colony template tools.
These tools manipulate a colony's task template — the queen's spawn plan.
They are gated to the queen of a colony at registration time
(``register_colony_template_tools(colony_id=...)``).
Workers never see these tools. The four session tools (`task_create`,
`task_update`, `task_list`, `task_get`) operate exclusively on the
caller's session list — never the colony template.
"""
from __future__ import annotations
import logging
from typing import Any
from framework.llm.provider import Tool
from framework.tasks.events import (
emit_task_created,
emit_task_deleted,
emit_task_updated,
)
from framework.tasks.models import TaskRecord, TaskStatus
from framework.tasks.scoping import colony_task_list_id
from framework.tasks.store import _UNSET_SENTINEL, TaskStore, get_task_store
from framework.tasks.tools.session_tools import _serialize_task
logger = logging.getLogger(__name__)
def _add_schema() -> dict[str, Any]:
return {
"type": "object",
"properties": {
"subject": {"type": "string"},
"description": {"type": "string"},
"active_form": {"type": "string"},
"metadata": {"type": "object"},
},
"required": ["subject"],
}
def _update_schema() -> dict[str, Any]:
return {
"type": "object",
"properties": {
"id": {"type": "integer"},
"subject": {"type": "string"},
"description": {"type": "string"},
"active_form": {"type": "string"},
"owner": {"type": ["string", "null"]},
"status": {
"type": "string",
"enum": ["pending", "in_progress", "completed"],
},
"metadata_patch": {"type": "object"},
},
"required": ["id"],
}
def _remove_schema() -> dict[str, Any]:
return {
"type": "object",
"properties": {"id": {"type": "integer"}},
"required": ["id"],
}
def _list_schema() -> dict[str, Any]:
return {"type": "object", "properties": {}}
_ADD_DESC = (
"Append a task to your colony's spawn-plan template. Templates are read "
"by `run_parallel_workers` and the UI; workers do not pull from the "
"template after spawn. Use this to plan colony work before spawning."
)
_UPDATE_DESC = (
"Update a template entry on your colony's spawn-plan template (e.g., "
"stamp completion when a worker reports back, adjust subject/description). "
"Only the queen can call this."
)
_REMOVE_DESC = (
"Remove a template entry from your colony's spawn-plan template. The "
"id is reserved (high-water-mark preserved) — never reused."
)
_LIST_DESC = (
"List all entries on your colony's spawn-plan template. Each entry "
"includes any `metadata.assigned_session` stamp that ties the entry to "
"a spawned worker."
)
def _make_add_executor(store: TaskStore, list_id: str):
async def execute(inputs: dict) -> dict[str, Any]:
rec: TaskRecord = await store.create_task(
list_id,
subject=inputs["subject"],
description=inputs.get("description", ""),
active_form=inputs.get("active_form"),
metadata=inputs.get("metadata") or {},
)
await emit_task_created(task_list_id=list_id, record=rec)
return {
"success": True,
"task_list_id": list_id,
"task_id": rec.id,
"message": f"Template entry #{rec.id} added: {rec.subject}",
"task": _serialize_task(rec),
}
return execute
def _make_update_executor(store: TaskStore, list_id: str):
async def execute(inputs: dict) -> dict[str, Any]:
task_id = int(inputs["id"])
status_in = inputs.get("status")
status_enum = TaskStatus(status_in) if status_in else None
owner_in = inputs.get("owner", _UNSET_SENTINEL)
new, fields = await store.update_task(
list_id,
task_id,
subject=inputs.get("subject"),
description=inputs.get("description"),
active_form=inputs.get("active_form"),
owner=owner_in,
status=status_enum,
metadata_patch=inputs.get("metadata_patch"),
)
if new is None:
return {
"success": False,
"task_list_id": list_id,
"task_id": task_id,
"message": f"Template entry #{task_id} not found.",
}
if fields:
await emit_task_updated(task_list_id=list_id, record=new, fields=fields)
return {
"success": True,
"task_list_id": list_id,
"task_id": task_id,
"fields": fields,
"message": f"Template entry #{task_id} updated. Fields: {', '.join(fields) or '(none)'}.",
"task": _serialize_task(new),
}
return execute
def _make_remove_executor(store: TaskStore, list_id: str):
async def execute(inputs: dict) -> dict[str, Any]:
task_id = int(inputs["id"])
deleted, cascade = await store.delete_task(list_id, task_id)
if not deleted:
return {
"success": False,
"task_list_id": list_id,
"task_id": task_id,
"message": f"Template entry #{task_id} not found.",
}
await emit_task_deleted(task_list_id=list_id, task_id=task_id, cascade=cascade)
return {
"success": True,
"task_list_id": list_id,
"task_id": task_id,
"deleted": True,
"cascade": cascade,
"message": f"Template entry #{task_id} removed.",
}
return execute
def _make_list_executor(store: TaskStore, list_id: str):
async def execute(inputs: dict) -> dict[str, Any]:
records = await store.list_tasks(list_id)
return {
"success": True,
"task_list_id": list_id,
"count": len(records),
"tasks": [_serialize_task(r) for r in records],
}
return execute
def build_colony_template_tools(
*,
colony_id: str,
store: TaskStore | None = None,
) -> list[tuple[Tool, Any]]:
s = store or get_task_store()
list_id = colony_task_list_id(colony_id)
return [
(
Tool(
name="colony_template_add",
description=_ADD_DESC,
parameters=_add_schema(),
concurrency_safe=False,
),
_make_add_executor(s, list_id),
),
(
Tool(
name="colony_template_update",
description=_UPDATE_DESC,
parameters=_update_schema(),
concurrency_safe=False,
),
_make_update_executor(s, list_id),
),
(
Tool(
name="colony_template_remove",
description=_REMOVE_DESC,
parameters=_remove_schema(),
concurrency_safe=False,
),
_make_remove_executor(s, list_id),
),
(
Tool(
name="colony_template_list",
description=_LIST_DESC,
parameters=_list_schema(),
concurrency_safe=True,
),
_make_list_executor(s, list_id),
),
]
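The build/register split (a builder returns (Tool, executor) pairs; a registrar wires them into a registry) can be sketched with plain tuples. The names and the list-id string below are illustrative stand-ins, not the framework types:

```python
import asyncio

# Hypothetical stand-ins: the real code pairs framework Tool objects with
# async executors; plain (name, executor) tuples illustrate the same shape.
def build_tools(list_id: str):
    async def add(inputs: dict) -> dict:
        return {"success": True, "task_list_id": list_id, "subject": inputs["subject"]}

    async def list_(inputs: dict) -> dict:
        return {"success": True, "task_list_id": list_id, "tasks": []}

    return [("colony_template_add", add), ("colony_template_list", list_)]

# The registrar just indexes the pairs by tool name.
registry = {name: fn for name, fn in build_tools("colony:abc:tasks")}
result = asyncio.run(registry["colony_template_add"]({"subject": "crawl"}))
print(result["subject"])  # crawl
```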
@@ -0,0 +1,74 @@
"""Wire task tools into a ToolRegistry.
The four session task tools are registered for every agent that gets a
ToolRegistry. The colony template tools are queen-only and registered
separately by ``register_colony_template_tools``.
"""
from __future__ import annotations
import logging
from typing import Any
from framework.loader.tool_registry import ToolRegistry
from framework.tasks.store import TaskStore
logger = logging.getLogger(__name__)
def _wrap_async_executor(async_executor):
"""Adapt an async executor to ToolRegistry's sync executor protocol.
ToolRegistry's executor expects ``Callable[[dict], Any]`` where Any may
be a coroutine; the registry awaits it. We just pass the coroutine
through.
"""
def executor(inputs: dict) -> Any:
return async_executor(inputs)
return executor
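A self-contained sketch of this pass-through pattern, with a toy `registry_invoke` standing in for how the registry might await the returned coroutine (an assumption about ToolRegistry's internals):

```python
import asyncio

async def async_executor(inputs: dict) -> dict:
    return {"echo": inputs["x"]}

def wrap(async_exec):
    # The sync wrapper just returns the coroutine; the registry awaits it.
    def executor(inputs: dict):
        return async_exec(inputs)
    return executor

async def registry_invoke(executor, inputs: dict):
    result = executor(inputs)
    if asyncio.iscoroutine(result):  # the registry awaits coroutines
        result = await result
    return result

print(asyncio.run(registry_invoke(wrap(async_executor), {"x": 1})))  # {'echo': 1}
```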
def register_task_tools(
registry: ToolRegistry,
*,
store: TaskStore | None = None,
) -> None:
"""Register the four session task tools on ``registry``.
Idempotent: re-registering overwrites the previous executor (which is
fine; they share the same TaskStore singleton anyway).
"""
from framework.tasks.tools.session_tools import build_session_tools
pairs = build_session_tools(store=store)
for tool, async_executor in pairs:
registry.register(tool.name, tool, _wrap_async_executor(async_executor))
# Concurrency safety rides on the Tool object itself
# (tool.concurrency_safe). ToolRegistry.CONCURRENCY_SAFE_TOOLS is a
# frozenset class attribute, so we don't mutate it here; the parallel
# batch dispatcher reads the per-tool flag instead.
logger.debug("Registered task tools on %s", registry)
def register_colony_template_tools(
registry: ToolRegistry,
*,
colony_id: str,
store: TaskStore | None = None,
) -> None:
"""Register the queen-only colony_template_* tools on ``registry``.
Should only be called for the queen of a colony; workers and queen-DM
do not get these tools.
"""
from framework.tasks.tools.colony_tools import build_colony_template_tools
pairs = build_colony_template_tools(colony_id=colony_id, store=store)
for tool, async_executor in pairs:
registry.register(tool.name, tool, _wrap_async_executor(async_executor))
logger.debug("Registered colony_template_* tools (colony_id=%s)", colony_id)
@@ -0,0 +1,590 @@
"""The four session task tools: task_create, task_update, task_list, task_get.
All four operate on the calling agent's OWN session list. They never touch
the colony template; the queen has separate ``colony_template_*`` tools
for that (see ``colony_tools.py``).
Concurrency safety:
task_list, task_get -> concurrency_safe=True (pure reads)
task_create, task_update -> concurrency_safe=False (writes serialize)
"""
from __future__ import annotations
import logging
from typing import Any
from framework.llm.provider import Tool
from framework.tasks.events import (
emit_task_created,
emit_task_deleted,
emit_task_updated,
)
from framework.tasks.hooks import (
HOOK_TASK_COMPLETED,
HOOK_TASK_CREATED,
BlockingHookError,
run_task_hooks,
)
from framework.tasks.models import TaskRecord, TaskStatus
from framework.tasks.store import (
_UNSET_SENTINEL as _UNSET,  # local alias for brevity
TaskStore,
get_task_store,
)
from framework.tasks.tools._context import (
current_agent_id,
current_task_list_id,
)
logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# Schemas (Anthropic-style JSONSchema)
# ---------------------------------------------------------------------------
_TASK_STATUS_VALUES = ["pending", "in_progress", "completed", "deleted"]
def _create_schema() -> dict[str, Any]:
return {
"type": "object",
"properties": {
"subject": {
"type": "string",
"description": "Imperative title (e.g., 'Crawl target URLs').",
},
"description": {
"type": "string",
"description": "Brief description of what to do.",
},
"active_form": {
"type": "string",
"description": "Present-continuous label shown while in_progress (e.g., 'Crawling target URLs').",
},
"metadata": {
"type": "object",
"description": "Arbitrary key/value metadata. Use _internal=true to hide from task_list.",
},
},
"required": ["subject"],
}
def _update_schema() -> dict[str, Any]:
return {
"type": "object",
"properties": {
"id": {"type": "integer", "description": "Task id (the #N from task_list)."},
"subject": {"type": "string"},
"description": {"type": "string"},
"active_form": {"type": "string"},
"owner": {
"type": ["string", "null"],
"description": "Agent id of the owner. Null clears ownership.",
},
"status": {"type": "string", "enum": _TASK_STATUS_VALUES},
"add_blocks": {
"type": "array",
"items": {"type": "integer"},
"description": "Add task ids that this task blocks (bidirectional).",
},
"add_blocked_by": {
"type": "array",
"items": {"type": "integer"},
"description": "Add task ids that block this task (bidirectional).",
},
"metadata_patch": {
"type": "object",
"description": "Merge into metadata. Null values delete keys.",
},
},
"required": ["id"],
}
def _list_schema() -> dict[str, Any]:
return {"type": "object", "properties": {}}
def _get_schema() -> dict[str, Any]:
return {
"type": "object",
"properties": {"id": {"type": "integer"}},
"required": ["id"],
}
def _create_batch_schema() -> dict[str, Any]:
return {
"type": "object",
"properties": {
"tasks": {
"type": "array",
"minItems": 1,
"description": (
"Array of task specs. Each becomes one task with a sequential id. Atomic — all created or none."
),
"items": {
"type": "object",
"properties": {
"subject": {
"type": "string",
"description": "Imperative title (e.g. 'Crawl target URL').",
},
"description": {"type": "string"},
"active_form": {
"type": "string",
"description": ("Present-continuous label shown while in_progress."),
},
"metadata": {"type": "object"},
},
"required": ["subject"],
},
}
},
"required": ["tasks"],
}
# ---------------------------------------------------------------------------
# Tool descriptions
# ---------------------------------------------------------------------------
_CREATE_DESC = (
"Create ONE task on your own session task list. Use this for one-off "
"mid-run additions when you discover unplanned work after the initial "
"plan is laid out.\n\n"
"**After receiving new instructions, immediately capture the user's "
"requirements as tasks** — and delete (via `task_update` with "
"status='deleted') any prior tasks that no longer apply.\n\n"
"**For laying out a multi-step plan upfront, use `task_create_batch` "
"instead** — one tool call with all the steps is cheaper and atomic.\n\n"
"Fields:\n"
"- subject: short imperative title (e.g. 'Crawl target URL').\n"
"- description: optional, slightly longer 'what to do' note.\n"
"- active_form: present-continuous label shown while in_progress (e.g. "
"'Crawling target URL'). If omitted, the spinner shows the subject.\n"
"- metadata: optional KV. Set _internal=true to hide from task_list."
)
_UPDATE_DESC = (
"Update ONE task on your own session task list. There is no batch "
"update tool by design — every `completed` transition is a discrete "
"progress signal to the user.\n\n"
"Workflow:\n"
"- Mark a task `in_progress` BEFORE you start working on it.\n"
"- Mark it `completed` AS SOON as you finish it — do not let "
"multiple finished tasks pile up unmarked before flushing them at "
"the end of the run.\n"
"- Delete tasks: when a task is no longer relevant or was created "
"in error. Setting status='deleted' **permanently** removes the "
"task — the id is retired and cannot be reused.\n\n"
"ONLY mark `completed` when the task is FULLY done. If you hit errors, "
"blockers, or partial state, keep it `in_progress` and create a new "
"task describing what's blocking. Never mark completed with caveats; "
"if it's not done, it's not done.\n\n"
"Setting status='in_progress' without owner auto-fills your agent_id."
)
_LIST_DESC = (
"Show your session task list, sorted by id ascending. Internal tasks "
"(metadata._internal=true) and resolved blockers are filtered out. "
"**Prefer working on tasks in id order** (lowest first) — earlier "
"tasks usually set up context for later ones."
)
_GET_DESC = (
"Read the full record of one task (description, metadata, timestamps) "
"from your own session task list. Use this to refresh your view of a "
"task before updating it if you're not sure of current fields."
)
_CREATE_BATCH_DESC = (
"Create N tasks at once on your own session task list. **Use this "
"FIRST when laying out a multi-step plan upfront** — replying to 5 "
"posts is one `task_create_batch` with 5 entries, not 5 separate "
"`task_create` calls. Atomic: all-or-none. Use single `task_create` "
"for one-off mid-run additions when you discover unplanned work, "
"not for the initial plan."
)
# ---------------------------------------------------------------------------
# Executors
# ---------------------------------------------------------------------------
def _resolve_list_id() -> str | None:
"""Pull the calling agent's session task_list_id from execution context."""
return current_task_list_id()
def _serialize_task(t: TaskRecord) -> dict[str, Any]:
return {
"id": t.id,
"subject": t.subject,
"description": t.description,
"active_form": t.active_form,
"owner": t.owner,
"status": t.status.value,
"blocks": list(t.blocks),
"blocked_by": list(t.blocked_by),
"metadata": dict(t.metadata),
"created_at": t.created_at,
"updated_at": t.updated_at,
}
def _make_create_executor(store: TaskStore):
async def execute(inputs: dict) -> dict[str, Any]:
list_id = _resolve_list_id()
if not list_id:
return {"success": False, "error": "No task_list_id resolved for this agent."}
agent_id = current_agent_id() or ""
kwargs = {
"subject": inputs["subject"],
"description": inputs.get("description", ""),
"active_form": inputs.get("active_form"),
"metadata": inputs.get("metadata") or {},
}
rec = await store.create_task(list_id, **kwargs)
# task_created hooks may block creation -> rollback by deleting.
try:
await run_task_hooks(
HOOK_TASK_CREATED,
task_list_id=list_id,
task=rec,
agent_id=agent_id,
)
except BlockingHookError as exc:
logger.warning("task_created hook blocked task #%s: %s", rec.id, exc)
await store.delete_task(list_id, rec.id)
return {"success": False, "error": f"Hook blocked task creation: {exc}"}
await emit_task_created(task_list_id=list_id, record=rec)
return {
"success": True,
"task_list_id": list_id,
"task_id": rec.id,
"message": f"Task #{rec.id} created successfully: {rec.subject}",
"task": _serialize_task(rec),
}
return execute
def _make_create_batch_executor(store: TaskStore):
async def execute(inputs: dict) -> dict[str, Any]:
list_id = _resolve_list_id()
if not list_id:
return {"success": False, "error": "No task_list_id resolved for this agent."}
agent_id = current_agent_id() or ""
specs = inputs.get("tasks") or []
if not isinstance(specs, list) or not specs:
return {
"success": False,
"error": "task_create_batch requires a non-empty `tasks` array.",
}
# Storage layer validates subject; surface its error as a soft
# tool_result so sibling tools don't cancel.
try:
recs = await store.create_tasks_batch(list_id, specs)
except ValueError as exc:
return {"success": False, "error": str(exc)}
# Run task_created hooks per task; blocking on any aborts the
# whole batch (delete every record we just wrote, return error).
for rec in recs:
try:
await run_task_hooks(
HOOK_TASK_CREATED,
task_list_id=list_id,
task=rec,
agent_id=agent_id,
)
except BlockingHookError as exc:
logger.warning(
"task_created hook blocked batch on task #%s: %s",
rec.id,
exc,
)
for r in recs:
await store.delete_task(list_id, r.id)
return {
"success": False,
"error": (f"Hook blocked task #{rec.id} ({rec.subject!r}); entire batch rolled back: {exc}"),
}
for rec in recs:
await emit_task_created(task_list_id=list_id, record=rec)
ids = [r.id for r in recs]
# Compact summary message — don't flood the conversation with
# one line per created task.
if len(ids) == 1:
range_label = f"#{ids[0]}"
elif ids == list(range(ids[0], ids[-1] + 1)):
range_label = f"#{ids[0]}-#{ids[-1]}"
else:
range_label = ", ".join(f"#{i}" for i in ids)
return {
"success": True,
"task_list_id": list_id,
"task_ids": ids,
"message": (f"Created {len(ids)} task(s): {range_label}. Mark #{ids[0]} in_progress before starting it."),
"tasks": [_serialize_task(r) for r in recs],
}
return execute
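The compact-summary branch above can be exercised in isolation; `summarize_ids` is an extracted sketch of the same three cases (single id, contiguous run, scattered ids):

```python
def summarize_ids(ids: list[int]) -> str:
    if len(ids) == 1:
        return f"#{ids[0]}"
    if ids == list(range(ids[0], ids[-1] + 1)):
        # Contiguous run: collapse to a range label.
        return f"#{ids[0]}-#{ids[-1]}"
    return ", ".join(f"#{i}" for i in ids)

print(summarize_ids([4]))        # #4
print(summarize_ids([1, 2, 3]))  # #1-#3
print(summarize_ids([1, 3, 7]))  # #1, #3, #7
```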
def _make_update_executor(store: TaskStore):
async def execute(inputs: dict) -> dict[str, Any]:
list_id = _resolve_list_id()
if not list_id:
return {"success": False, "error": "No task_list_id resolved for this agent."}
agent_id = current_agent_id() or ""
task_id = int(inputs["id"])
status_in = inputs.get("status")
# 'deleted' is a synthetic status — handle it as a separate path.
if status_in == "deleted":
deleted, cascade = await store.delete_task(list_id, task_id)
if not deleted:
return {
"success": False,
"task_list_id": list_id,
"task_id": task_id,
"message": f"Task #{task_id} not found (already deleted?)",
}
await emit_task_deleted(task_list_id=list_id, task_id=task_id, cascade=cascade)
return {
"success": True,
"task_list_id": list_id,
"task_id": task_id,
"deleted": True,
"cascade": cascade,
"message": f"Task #{task_id} deleted.",
}
# Auto-owner on in_progress.
owner_in = inputs.get("owner", _OwnerSentinel)
status_enum = TaskStatus(status_in) if status_in else None
if status_enum == TaskStatus.IN_PROGRESS and owner_in is _OwnerSentinel and agent_id:
owner_in = agent_id
# task_completed hook — fires BEFORE the write (Claude Code's
# veto-before-write semantics). If the hook blocks, nothing
# touches disk and no SSE event fires. The hook receives a
# preview record with the intended new status so it can inspect
# what's about to land.
if status_enum == TaskStatus.COMPLETED:
current = await store.get_task(list_id, task_id)
if current is None:
return {
"success": False,
"task_list_id": list_id,
"task_id": task_id,
"message": f"Task #{task_id} not found.",
}
if current.status != TaskStatus.COMPLETED:
preview = current.model_copy(update={"status": TaskStatus.COMPLETED})
try:
await run_task_hooks(
HOOK_TASK_COMPLETED,
task_list_id=list_id,
task=preview,
agent_id=agent_id,
)
except BlockingHookError as exc:
logger.warning("task_completed hook blocked #%s: %s", task_id, exc)
return {
"success": False,
"task_list_id": list_id,
"task_id": task_id,
"message": f"Hook blocked completion of #{task_id}: {exc}",
"task": _serialize_task(current),
}
# Hook passed (or wasn't applicable) — proceed with the write.
new, fields = await store.update_task(
list_id,
task_id,
subject=inputs.get("subject"),
description=inputs.get("description"),
active_form=inputs.get("active_form"),
owner=owner_in if owner_in is not _OwnerSentinel else _UNSET,
status=status_enum,
add_blocks=inputs.get("add_blocks"),
add_blocked_by=inputs.get("add_blocked_by"),
metadata_patch=inputs.get("metadata_patch"),
)
if new is None:
# "Task not found" is not an error — keep is_error=False semantics.
return {
"success": False,
"task_list_id": list_id,
"task_id": task_id,
"message": f"Task #{task_id} not found.",
}
if fields:
await emit_task_updated(task_list_id=list_id, record=new, fields=fields)
# Layer 4: tool-result steering. When a task just completed,
# peek at remaining work and append a focused next-step nudge.
# For hive's solo (non-claim) model, point at the lowest-id
# pending task or signal "all done".
message = f"Task #{task_id} updated. Fields changed: {', '.join(fields) or '(none)'}."
if status_enum == TaskStatus.COMPLETED and "status" in fields:
others = await store.list_tasks(list_id)
completed_ids = {r.id for r in others if r.status == TaskStatus.COMPLETED}
next_pending = next(
(
r
for r in others
if r.status == TaskStatus.PENDING and all(b in completed_ids for b in r.blocked_by)
),
None,
)
in_progress = [r for r in others if r.status == TaskStatus.IN_PROGRESS]
if in_progress:
names = ", ".join(f"#{r.id}" for r in in_progress[:3])
message += f" Still in progress: {names}."
elif next_pending is not None:
message += (
f' Next pending: #{next_pending.id} "{next_pending.subject}". '
f"Mark it in_progress before starting."
)
else:
message += " All tasks complete. Wrap up: report results to the user and stop."
return {
"success": True,
"task_list_id": list_id,
"task_id": task_id,
"fields": fields,
"message": message,
"task": _serialize_task(new),
}
return execute
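A stripped-down sketch of the veto-before-write flow; `completion_hook` and its metadata check are invented for illustration, not the framework's hook API:

```python
import asyncio

class BlockingHookError(Exception):
    """Raised by a hook to veto the pending write."""

async def completion_hook(task: dict) -> None:
    if "blocker" in task.get("metadata", {}):
        raise BlockingHookError("unresolved blocker noted in metadata")

async def complete_task(task: dict) -> dict:
    # Veto-before-write: the hook sees a preview record and runs BEFORE
    # any state changes; if it raises, nothing is written.
    preview = {**task, "status": "completed"}
    try:
        await completion_hook(preview)
    except BlockingHookError as exc:
        return {"success": False, "message": str(exc)}
    task["status"] = "completed"
    return {"success": True}

task = {"subject": "ship", "status": "in_progress", "metadata": {"blocker": "#7"}}
print(asyncio.run(complete_task(task)))
print(task["status"])  # still in_progress: the write never happened
```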
def _make_list_executor(store: TaskStore):
async def execute(inputs: dict) -> dict[str, Any]:
list_id = _resolve_list_id()
if not list_id:
return {"success": False, "error": "No task_list_id resolved for this agent."}
records = await store.list_tasks(list_id)
# Filter resolved blockers from the rendering so a completed
# blocker disappears from blocked_by.
completed_ids = {r.id for r in records if r.status == TaskStatus.COMPLETED}
rendered: list[str] = []
for r in records:
unresolved_blockers = [b for b in r.blocked_by if b not in completed_ids]
line_parts = [f"#{r.id}", f"[{r.status.value}]", r.subject]
if r.owner:
line_parts.append(f"({r.owner})")
if unresolved_blockers:
line_parts.append(f"[blocked by {', '.join(f'#{b}' for b in unresolved_blockers)}]")
rendered.append(" ".join(line_parts))
return {
"success": True,
"task_list_id": list_id,
"count": len(records),
"lines": rendered,
"tasks": [_serialize_task(r) for r in records],
}
return execute
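The filtering logic can be sketched standalone with plain dicts in place of `TaskRecord` (owners omitted for brevity):

```python
def render(records: list[dict]) -> list[str]:
    # A completed blocker counts as resolved and drops out of the rendering.
    completed = {r["id"] for r in records if r["status"] == "completed"}
    lines = []
    for r in records:
        unresolved = [b for b in r["blocked_by"] if b not in completed]
        parts = [f"#{r['id']}", f"[{r['status']}]", r["subject"]]
        if unresolved:
            parts.append(f"[blocked by {', '.join(f'#{b}' for b in unresolved)}]")
        lines.append(" ".join(parts))
    return lines

recs = [
    {"id": 1, "status": "completed", "subject": "fetch", "blocked_by": []},
    {"id": 2, "status": "pending", "subject": "parse", "blocked_by": [1, 3]},
    {"id": 3, "status": "pending", "subject": "store", "blocked_by": []},
]
for line in render(recs):
    print(line)
# #1 [completed] fetch
# #2 [pending] parse [blocked by #3]
# #3 [pending] store
```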
def _make_get_executor(store: TaskStore):
async def execute(inputs: dict) -> dict[str, Any]:
list_id = _resolve_list_id()
if not list_id:
return {"success": False, "error": "No task_list_id resolved for this agent."}
task_id = int(inputs["id"])
rec = await store.get_task(list_id, task_id)
if rec is None:
return {
"success": False,
"task_list_id": list_id,
"task_id": task_id,
"message": f"Task #{task_id} not found.",
}
return {
"success": True,
"task_list_id": list_id,
"task_id": task_id,
"task": _serialize_task(rec),
}
return execute
# Sentinels so we can distinguish "owner not provided" from "owner=null".
class _OwnerSentinel: # noqa: N801 — internal sentinel class
pass
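The same pattern in miniature; `_Unset` and `describe_owner` are illustrative names, not the module's actual ones:

```python
class _Unset:
    """Sentinel type: 'argument not provided', distinct from an explicit None."""

def describe_owner(inputs: dict) -> str:
    owner = inputs.get("owner", _Unset)  # class object itself is the sentinel
    if owner is _Unset:
        return "owner untouched"
    if owner is None:
        return "owner cleared"
    return f"owner set to {owner}"

print(describe_owner({}))                     # owner untouched
print(describe_owner({"owner": None}))        # owner cleared
print(describe_owner({"owner": "worker_1"}))  # owner set to worker_1
```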
# ---------------------------------------------------------------------------
# Public registration
# ---------------------------------------------------------------------------
def build_session_tools(
store: TaskStore | None = None,
) -> list[tuple[Tool, Any]]:
"""Build (Tool, executor) pairs for the session task tools."""
s = store or get_task_store()
return [
(
Tool(
name="task_create_batch",
description=_CREATE_BATCH_DESC,
parameters=_create_batch_schema(),
concurrency_safe=False,
),
_make_create_batch_executor(s),
),
(
Tool(
name="task_create",
description=_CREATE_DESC,
parameters=_create_schema(),
concurrency_safe=False,
),
_make_create_executor(s),
),
(
Tool(
name="task_update",
description=_UPDATE_DESC,
parameters=_update_schema(),
concurrency_safe=False,
),
_make_update_executor(s),
),
(
Tool(
name="task_list",
description=_LIST_DESC,
parameters=_list_schema(),
concurrency_safe=True,
),
_make_list_executor(s),
),
(
Tool(
name="task_get",
description=_GET_DESC,
parameters=_get_schema(),
concurrency_safe=True,
),
_make_get_executor(s),
),
]

Some files were not shown because too many files have changed in this diff.