Compare commits

...

50 Commits

Author SHA1 Message Date
Richard Tang 78fffa63ec chore: ci and release doc
2026-05-01 18:06:39 -07:00
Richard Tang 9a75d45351 chore: lint 2026-05-01 17:53:44 -07:00
Timothy 3a94f52009 feat: sync tool result contentful display 2026-05-01 17:44:19 -07:00
Timothy 522e0f511e fix: y-axis 2026-05-01 15:48:36 -07:00
Timothy e6310f1243 fix: normalize chart spec in renderer 2026-05-01 15:36:09 -07:00
Richard Tang 12ffacccab feat: tools config frontend grouping and tools cleanup 2026-05-01 15:28:40 -07:00
Timothy 8c36b1575c Merge branch 'feature/merge-to-file-ops' into feat/file-ops 2026-05-01 14:57:21 -07:00
Timothy 6540f7b31e feat: pura linea 2026-05-01 14:57:06 -07:00
Richard Tang a09eac06f1 feat: improve web search and consolidate browser open 2026-05-01 14:55:20 -07:00
Richard Tang b939a875a7 refactor: update autocompaction tools and concurrency tools 2026-05-01 14:27:38 -07:00
Richard Tang b826e70d8c feat: remove old lifecycle tools 2026-05-01 14:07:34 -07:00
Richard Tang 6f2f037c9c feat: remove other default tools 2026-05-01 13:35:04 -07:00
Richard Tang c147364d8c feat: browser tools audit and improvements 2026-05-01 13:22:31 -07:00
Richard Tang 35bd497750 feat: refactor edit file and update default tools 2026-05-01 12:40:53 -07:00
Richard Tang 574c4bbe33 Merge remote-tracking branch 'origin/feature/sync-20260430' into feat/file-ops 2026-05-01 07:42:20 -07:00
Richard Tang d22a01682a feat: major file ops refactor 2026-05-01 07:41:42 -07:00
Timothy 0c6f0f8aef refactor: rename shell tools to terminal tools 2026-04-30 19:52:34 -07:00
Richard Tang 0e8efa7bcc feat: vision fallback auth 2026-04-30 19:52:12 -07:00
Timothy 7b1dda7bf3 fix: mcp registry initialization 2026-04-30 19:52:04 -07:00
Timothy 725dd1f410 fix: shell split command 2026-04-30 19:52:01 -07:00
Timothy de4b2dc151 chore: give shell tools to queen 2026-04-30 19:51:57 -07:00
Timothy 0784cea314 fix: initial install 2026-04-30 19:51:55 -07:00
Timothy 20bbf08278 feat: perita manus 2026-04-30 19:51:44 -07:00
Richard Tang f8233bda56 feat: consolidate search and list file tools 2026-04-30 15:43:15 -07:00
Richard Tang 76a7dd4bd5 feat: loosen the max token limit for vision fallback as some models spend output on internal thinking 2026-04-30 13:24:49 -07:00
Richard Tang 73511a3c59 feat: vision fallback with intent 2026-04-30 13:02:57 -07:00
Richard Tang a0817fcde4 feat: vision model retry and fallback 2026-04-30 12:38:30 -07:00
Richard Tang 628ce9ca12 feat: use simple snapshot for auto_snapshot_mode 2026-04-30 10:43:14 -07:00
Richard Tang cc4213a942 fix: llm debugger tool call display 2026-04-30 10:31:21 -07:00
Richard Tang d12d5b7e8b fix: llm debugger timeline order 2026-04-30 08:05:38 -07:00
Harshit Shukla 038c5fd807 fix(credentials): align EnvVarStorage exists with load semantics (#5680)
* Return boolean from exists method for credential check

* Add test for empty value handling in EnvVarStorage

Add test to verify exists() and load() consistency for empty values in EnvVarStorage.
2026-04-30 19:40:28 +08:00
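A minimal sketch of the semantics this commit aligns, assuming EnvVarStorage reads plain environment variables (the free functions here are illustrative, not the real API):

```python
import os

def load(env_var: str) -> str | None:
    """load() treats an empty env value as missing."""
    value = os.environ.get(env_var)
    return value or None  # "" and unset both resolve to None

def exists(env_var: str) -> bool:
    """exists() mirrors load(): an empty string counts as absent."""
    return bool(os.environ.get(env_var))
```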
Leayx 3d5f2595c9 bug(test_zoho_crm_tool): remove orphan test directory under src (#7142)
Problem
- The Zoho CRM tool was refactored to an MCP-based architecture, making the old in-tree test suite obsolete
- The remaining tests under src were not executed by pytest; its testpaths setting only includes tools/tests, effectively making them dead code
- A proper MCP test suite already exists under tools/tests, providing coverage

Decision
- Removed the unused test directory under src/aden_tools/tools/zoho_crm_tool/tests
- Aligns project structure with existing tools, where tests are only located in tools/tests
- Avoids confusion and prevents future contributors from relying on outdated or non-executed tests
2026-04-30 18:57:49 +08:00
Hundao 7881177f1f fix: unbreak main CI — skills HIVE_HOME refactor + run_parallel_workers task text (#7149)
* fix(skills): restore module-level path constants for HIVE_HOME refactor

ae2aa30e replaced module-level USER_SKILLS_DIR / INSTALL_NOTICE_SENTINEL
in installer.py and _NOTICE_SENTINEL_PATH / _TRUSTED_REPOS_PATH in
trust.py with lazy helper functions, but left callers and tests still
referencing the original symbols. CI fails with ImportError /
AttributeError.

Restore them as module-level constants computed from HIVE_HOME so the
desktop-shell override still works, callers in cli.py keep importing
the same names, and existing test monkeypatches stay valid.

Refs #7148
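A minimal sketch of the restore described above, using the names from the commit (the skills directory layout and sentinel filename are assumptions):

```python
from pathlib import Path

from framework.config import HIVE_HOME  # reads the desktop's HIVE_HOME env var at import

# Module-level constants, computed once from HIVE_HOME. Callers keep
# importing the same names, and test monkeypatches of these module
# attributes stay valid; no lazy-helper indirection is needed.
USER_SKILLS_DIR: Path = HIVE_HOME / "skills"                         # layout assumed
INSTALL_NOTICE_SENTINEL: Path = USER_SKILLS_DIR / ".install_notice"  # filename assumed
```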

* fix(colony): preserve task text in run_parallel_workers spawn data

run_parallel_workers stamps __template_task_id into spec['data'] before
calling spawn_batch. Once that mutation makes spec['data'] non-empty,
colony_runtime.spawn()'s ``input_data or {"task": task}`` fallback no
longer fires and the task description disappears from the worker's
first user message. Workers loop on empty responses and never emit
SUBAGENT_REPORT.

Hoist the ``setdefault("task")`` step out of the template-publish try
block so task text survives even if the template store fails
non-fatally. Inner loop only stamps __template_task_id.

Refs #7148
2026-04-30 18:43:54 +08:00
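A self-contained sketch of the falsy-dict pitfall described above; the real plumbing lives in colony_runtime, and the function body here is reduced to the fallback expression from the commit message:

```python
def spawn(task: str, input_data: dict | None = None) -> dict:
    # colony_runtime.spawn()'s fallback, per the commit message
    return input_data or {"task": task}

spec_data: dict = {}
spec_data["__template_task_id"] = "tmpl-123"   # stamping first makes the dict truthy...
assert "task" not in spawn("summarize inbox", spec_data)  # ...so the fallback never fires

# The fix: stamp the task text unconditionally, outside the
# template-publish try block, so it survives a template-store failure.
spec_data.setdefault("task", "summarize inbox")
assert spawn("summarize inbox", spec_data)["task"] == "summarize inbox"
```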
Richard Tang 2cfea915f4 chore: ruff format 2026-04-29 19:23:31 -07:00
Richard Tang ac46a1be72 Merge branch 'feat/ask-user-chat-display' 2026-04-29 19:22:51 -07:00
Richard Tang 7b0b472167 chore: lint 2026-04-29 19:16:00 -07:00
Richard Tang 697aae33fe feat: prompts simplification 2026-04-29 19:13:01 -07:00
Richard Tang d26e7f33d2 fix: incubating mode approval guidence injection 2026-04-29 18:43:26 -07:00
Richard Tang 6357597e88 chore: improve llm visibility 2026-04-29 18:37:29 -07:00
Richard Tang 579f1d7512 feat(tasks): refactor task folder 2026-04-29 17:33:34 -07:00
bryan 965264c973 fix: defer ask_user question bubble until user answers 2026-04-29 16:31:19 -07:00
Richard Tang e80d275321 feat(queen): drop redundant _queen_style prompt block 2026-04-29 15:47:18 -07:00
Richard Tang 5b45fac435 feat: prompt improvements 2026-04-29 15:26:58 -07:00
Richard Tang 4794c8b816 chore: log the vision fallback model usage 2026-04-29 13:08:38 -07:00
Timothy 5492366c31 fix: recover frontend 2026-04-29 11:25:29 -07:00
Timothy ae2aa30edf fix: all agent path prefixed by HIVE_HOME 2026-04-28 19:16:35 -07:00
Timothy dd69a53de1 fix: hardcoded hive home 2026-04-28 18:25:21 -07:00
bryan 062a4e3166 feat: new-session navigation with queen warm-up UI 2026-04-28 18:17:25 -07:00
bryan fe9a903928 feat: surface ask_user questions in chat transcript 2026-04-28 18:16:47 -07:00
Timothy 7c3bada70c fix: patch litellm 2026-04-28 18:00:31 -07:00
205 changed files with 15364 additions and 12264 deletions
-1
View File
@@ -47,7 +47,6 @@
"Bash(grep -v ':0$')",
"Bash(curl -s -m 2 http://127.0.0.1:4002/sse -o /dev/null -w 'status=%{http_code} time=%{time_total}s\\\\n')",
"mcp__gcu-tools__browser_status",
"mcp__gcu-tools__browser_start",
"mcp__gcu-tools__browser_navigate",
"mcp__gcu-tools__browser_evaluate",
"mcp__gcu-tools__browser_screenshot",
@@ -214,7 +214,7 @@ Curated list of known browser automation edge cases with symptoms, causes, and f
| **Symptom** | `browser_open()` returns `"No group with id: XXXXXXX"` even though `browser_status` shows `running: true` |
| **Root Cause** | In-memory `_contexts` dict has a stale `groupId` from a Chrome tab group that was closed outside the tool (e.g. user closed the tab group) |
| **Detection** | `browser_status` returns `running: true` but `browser_open` fails with "No group with id" |
| **Fix** | Call `browser_stop()` to clear stale context from `_contexts`, then `browser_start()` again |
| **Fix** | Call `browser_stop()` to clear stale context from `_contexts`, then `browser_open(url)` to lazy-create a fresh one |
| **Code** | `tools/lifecycle.py:144-160` - `already_running` check uses cached dict without validating against Chrome |
| **Verified** | 2026-04-03 ✓ |
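A hedged sketch of the failure mode this row describes; the cache layout and return values are assumptions, and the real `already_running` check lives in `tools/lifecycle.py`:

```python
_contexts: dict[str, str] = {"default": "group-42"}  # cached Chrome tab-group id

def browser_open(url: str) -> str:
    group_id = _contexts.get("default")
    if group_id is not None:
        # already-running path: trusts the cache without validating against
        # Chrome, so a tab group the user closed by hand surfaces here as
        # "No group with id: group-42".
        return f"opening {url} in {group_id}"
    _contexts["default"] = "group-fresh"  # cold path: lazy-create a context
    return f"opening {url} in group-fresh"

def browser_stop() -> None:
    _contexts.clear()  # recovery: drop the stale cache, then browser_open(url)
```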
+3 -4
View File
@@ -72,17 +72,16 @@ Register an MCP server as a tool source for your agent.
"cwd": "../tools",
"description": "Aden tools..."
},
"tools_discovered": 6,
"tools_discovered": 5,
"tools": [
"web_search",
"web_scrape",
"file_read",
"file_write",
"pdf_read",
"example_tool"
"pdf_read"
],
"total_mcp_servers": 1,
"note": "MCP server 'tools' registered with 6 tools. These tools can now be used in event_loop nodes."
"note": "MCP server 'tools' registered with 5 tools. These tools can now be used in event_loop nodes."
}
```
+3 -3
View File
@@ -1,6 +1,6 @@
# MCP Server Guide - Agent Building Tools
> **Note:** The standalone `agent-builder` MCP server (`framework.mcp.agent_builder_server`) has been replaced. Agent building is now done via the `coder-tools` server's `initialize_and_build_agent` tool, with underlying logic in `tools/coder_tools_server.py`.
> **Note:** This document is stale. The previous `coder-tools` MCP server has been replaced by `files-tools` (`tools/files_server.py`), which only exposes file I/O (`read_file`, `write_file`, `edit_file`, `hashline_edit`, `search_files`). The agent-building, shell, and snapshot tools that used to live here have been removed.
This guide covers the MCP tools available for building goal-driven agents.
@@ -20,9 +20,9 @@ Add to your MCP client configuration (e.g., Claude Desktop):
```json
{
"mcpServers": {
"coder-tools": {
"files-tools": {
"command": "uv",
"args": ["run", "coder_tools_server.py", "--stdio"],
"args": ["run", "files_server.py", "--stdio"],
"cwd": "/path/to/hive/tools"
}
}
-2
View File
@@ -19,8 +19,6 @@ uv pip install -e .
## Agent Building
Agent scaffolding is handled by the `coder-tools` MCP server (in `tools/coder_tools_server.py`), which provides the `initialize_and_build_agent` tool and related utilities. The package generation logic lives directly in `tools/coder_tools_server.py`.
See the [Getting Started Guide](../docs/getting-started.md) for building agents.
## Quick Start
+36 -61
View File
@@ -14,7 +14,6 @@ from __future__ import annotations
import asyncio
import json
import logging
import os
import re
import time
import uuid
@@ -182,48 +181,6 @@ def _strip_internal_tags_from_snapshot(snapshot: str) -> str:
return cleaned
async def _describe_images_as_text(image_content: list[dict[str, Any]]) -> str | None:
"""Describe images using the best available vision model."""
import litellm
blocks: list[dict[str, Any]] = [
{
"type": "text",
"text": (
"Describe the following image(s) concisely but with enough detail "
"that a text-only AI assistant can understand the content and context."
),
}
]
blocks.extend(image_content)
candidates: list[str] = []
if os.environ.get("OPENAI_API_KEY"):
candidates.append("gpt-4o-mini")
if os.environ.get("ANTHROPIC_API_KEY"):
candidates.append("claude-3-haiku-20240307")
if os.environ.get("GOOGLE_API_KEY") or os.environ.get("GEMINI_API_KEY"):
candidates.append("gemini/gemini-1.5-flash")
for model in candidates:
try:
response = await litellm.acompletion(
model=model,
messages=[{"role": "user", "content": blocks}],
max_tokens=512,
)
description = (response.choices[0].message.content or "").strip()
if description:
count = len(image_content)
label = "image" if count == 1 else f"{count} images"
return f"[{label} attached — description: {description}]"
except Exception as exc:
logger.debug("Vision fallback model '%s' failed: %s", model, exc)
continue
return None
def _vision_fallback_active(model: str | None) -> bool:
"""Return True if tool-result images for *model* should be routed
through the vision-fallback chain rather than sent to the model.
@@ -247,21 +204,35 @@ def _vision_fallback_active(model: str | None) -> bool:
async def _captioning_chain(
intent: str,
image_content: list[dict[str, Any]],
) -> str | None:
"""Two-stage caption chain used by the agent-loop tool-result hook.
) -> tuple[str, str] | None:
"""Configured vision_fallback → retry → ``gemini/gemini-3-flash-preview``.
Stage 1: configured ``vision_fallback`` model with intent + images.
Stage 2: generic-caption rotation (gpt-4o-mini → claude-3-haiku →
gemini-flash) when stage 1 is unconfigured or fails.
Returns the caption text or None if both stages fail. Caller is
responsible for the placeholder-on-None and the splice into the
persisted tool-result content.
The Gemini override reuses the configured ``api_key`` / ``api_base``,
so a Hive subscriber (whose token routes to a multi-model proxy)
keeps coverage when their primary model glitches. Without
configured creds litellm falls through to env-based Gemini auth;
users with neither Hive nor a ``GEMINI_API_KEY`` simply lose the
third try.
"""
caption = await caption_tool_image(intent, image_content)
if not caption:
caption = await _describe_images_as_text(image_content)
return caption
if result := await caption_tool_image(intent, image_content):
return result
logger.warning("vision_fallback failed; retrying configured model")
if result := await caption_tool_image(intent, image_content):
return result
# Match the configured model's proxy prefix so the override is routed
# through the same endpoint with the same auth shape. Without this,
# a Hive subscriber's `hive/...` config would override to
# `gemini/...` — which sends Google's Gemini protocol to the
# Anthropic-compatible Hive proxy (404), not what we want.
configured = (get_vision_fallback_model() or "").lower()
if configured.startswith("hive/"):
override = "hive/gemini-3-flash-preview"
elif configured.startswith("kimi/"):
override = "kimi/gemini-3-flash-preview"
else:
override = "gemini/gemini-3-flash-preview"
logger.warning("vision_fallback retry failed; trying %s", override)
return await caption_tool_image(intent, image_content, model_override=override)
# Pattern for detecting context-window-exceeded errors across LLM providers.
@@ -1034,6 +1005,7 @@ class AgentLoop(AgentProtocol):
tool_calls=logged_tool_calls,
tool_results=real_tool_results,
token_counts=turn_tokens,
tools=tools,
)
# DS-13: inject context preservation warning once when token usage
@@ -3458,7 +3430,7 @@ class AgentLoop(AgentProtocol):
# single image needs captioning, this collapses to a
# single await with no overhead.
_model_text_only = ctx.llm and _vision_fallback_active(ctx.llm.model)
caption_tasks: dict[str, asyncio.Task[str | None]] = {}
caption_tasks: dict[str, asyncio.Task[tuple[str, str] | None]] = {}
if _model_text_only:
for tc in tool_calls[:executed_in_batch]:
res = results_by_id.get(tc.tool_use_id)
@@ -3499,14 +3471,17 @@ class AgentLoop(AgentProtocol):
# this tool).
vision_fallback_marker: str | None = None
if image_content and tc.tool_use_id in caption_tasks:
caption = await caption_tasks.pop(tc.tool_use_id)
if caption:
caption_result = await caption_tasks.pop(tc.tool_use_id)
if caption_result:
caption, vision_model = caption_result
vision_fallback_marker = f"[vision-fallback caption]\n{caption}"
logger.info(
"vision_fallback: captioned %d image(s) for tool '%s' (model '%s' routed through fallback)",
"vision_fallback: captioned %d image(s) for tool '%s' "
"(main model '%s' routed through fallback model '%s')",
len(image_content),
tc.tool_name,
ctx.llm.model if ctx.llm else "?",
vision_model,
)
else:
vision_fallback_marker = "[image stripped — vision fallback exhausted]"
@@ -4159,7 +4134,7 @@ class AgentLoop(AgentProtocol):
queue=self._injection_queue,
conversation=conversation,
ctx=ctx,
describe_images_as_text_fn=_describe_images_as_text,
caption_image_fn=_captioning_chain,
)
async def _drain_trigger_queue(self, conversation: NodeConversation) -> int:
@@ -16,7 +16,6 @@ import os
import re
import time
from datetime import UTC, datetime
from pathlib import Path
from typing import Any
from framework.agent_loop.conversation import Message, NodeConversation
@@ -31,19 +30,38 @@ logger = logging.getLogger(__name__)
LLM_COMPACT_CHAR_LIMIT: int = 240_000
LLM_COMPACT_MAX_DEPTH: int = 10
# Microcompaction: tools whose results can be safely cleared
# Microcompaction: tools whose results can be safely cleared from context
# because the agent can re-derive them on demand. The bar for inclusion is
# "old result has no irreversible value": file content can be re-read, a
# search can be re-run, a screenshot can be re-captured, terminal output can
# be re-fetched, etc. Write / edit results are short confirmations whose
# value is in the side effect, not the message — also fair game.
COMPACTABLE_TOOLS: frozenset[str] = frozenset(
{
# File ops — content lives on disk, re-readable.
"read_file",
"run_command",
"web_search",
"web_fetch",
"grep_search",
"glob_search",
"search_files",
"write_file",
"edit_file",
"pdf_read",
# Terminal — re-runnable; advanced job/output tools produce verbose
# logs whose recent state is what matters.
"terminal_exec",
"terminal_rg",
"terminal_find",
"terminal_output_get",
"terminal_job_logs",
# Web / research — pages and queries can be re-fetched.
"web_scrape",
"search_papers",
"download_paper",
"search_wikipedia",
# Browser read-only inspection — current page state is what matters,
# old snapshots are stale by definition.
"browser_screenshot",
"list_directory",
"browser_snapshot",
"browser_html",
"browser_get_text",
}
)
@@ -657,8 +675,10 @@ def write_compaction_debug_log(
level: str,
inventory: list[dict[str, Any]] | None,
) -> None:
"""Write detailed compaction analysis to ~/.hive/compaction_log/."""
log_dir = Path.home() / ".hive" / "compaction_log"
"""Write detailed compaction analysis to $HIVE_HOME/compaction_log/."""
from framework.config import HIVE_HOME
log_dir = HIVE_HOME / "compaction_log"
log_dir.mkdir(parents=True, exist_ok=True)
ts = datetime.now(UTC).strftime("%Y%m%dT%H%M%S_%f")
@@ -857,7 +877,7 @@ def build_emergency_summary(
if not all_files:
parts.append(
"NOTE: Large tool results may have been saved to files. "
"Use list_directory to check the data directory."
"Use search_files(target='files', path='.') to check the data directory."
)
except Exception:
parts.append("NOTE: Large tool results were saved to files. Use read_file(path='<path>') to read them.")
@@ -162,9 +162,18 @@ async def drain_injection_queue(
conversation: NodeConversation,
*,
ctx: NodeContext,
describe_images_as_text_fn: (Callable[[list[dict[str, Any]]], Awaitable[str | None]] | None) = None,
caption_image_fn: (Callable[[str, list[dict[str, Any]]], Awaitable[tuple[str, str] | None]] | None) = None,
) -> int:
"""Drain all pending injected events as user messages. Returns count."""
"""Drain all pending injected events as user messages. Returns count.
``caption_image_fn`` is the unified vision fallback hook. It takes
``(intent, image_content)`` and returns ``(caption, model)`` on
success; the model id is logged so the destination is observable.
The user's typed ``content`` (the injected message body) is passed
as the intent so the captioner can answer the user's specific
question about the image rather than producing a generic
description; an empty content falls back to a generic intent.
"""
count = 0
logger.debug(
"[drain_injection_queue] Starting to drain queue, initial queue size: %s",
@@ -184,11 +193,16 @@ async def drain_injection_queue(
"Model '%s' does not support images; attempting vision fallback",
ctx.llm.model,
)
if describe_images_as_text_fn is not None:
description = await describe_images_as_text_fn(image_content)
if description:
if caption_image_fn is not None:
intent = content or ("Describe these user-injected images for a text-only agent.")
caption_result = await caption_image_fn(intent, image_content)
if caption_result:
description, vision_model = caption_result
content = f"{content}\n\n{description}" if content else description
logger.info("[drain] image described as text via vision fallback")
logger.info(
"[drain] image described as text via vision fallback (model '%s')",
vision_model,
)
else:
logger.info("[drain] no vision fallback available; images dropped")
image_content = None
@@ -18,9 +18,10 @@ This module provides:
Both helpers degrade silently (return ``None`` / a placeholder rather
than raise) so a vision-fallback failure can never kill the main
agent's run. The agent-loop call site is responsible for chaining
through to the existing generic-caption rotation
(``_describe_images_as_text``) on a None return.
agent's run. The agent-loop call site retries the configured model
once on a None return, then falls back to
``gemini/gemini-3-flash-preview`` via the ``model_override`` parameter
of :func:`caption_tool_image`.
"""
from __future__ import annotations
@@ -156,24 +157,30 @@ async def caption_tool_image(
image_content: list[dict[str, Any]],
*,
timeout_s: float = 30.0,
) -> str | None:
model_override: str | None = None,
) -> tuple[str, str] | None:
"""Caption the given images using the configured ``vision_fallback`` model.
Returns the model's text response on success, or ``None`` on any
failure (no config, no API key, timeout, exception, empty
response). Callers chain to the next stage of the fallback on None.
Returns ``(caption, model)`` on success or ``None`` on any failure
(no config, no API key, timeout, exception, empty response).
Logs each call to ``~/.hive/llm_logs`` via ``log_llm_turn`` so the
cost / latency / quality are auditable post-hoc, tagged with
``execution_id="vision_fallback_subagent"``.
``model_override`` swaps in a different litellm model id while
keeping the configured ``vision_fallback`` ``api_key`` / ``api_base``
untouched. That's deliberate: Hive subscribers configure
``vision_fallback`` to point at the Hive proxy, which routes to
multiple models including Gemini, so reusing the credentials lets
a Gemini-3-flash override still work without a separate
``GEMINI_API_KEY``. When no creds are configured, litellm falls
back to env-var resolution.
Logs each call to ``~/.hive/llm_logs`` via ``log_llm_turn``.
"""
model = get_vision_fallback_model()
model = model_override or get_vision_fallback_model()
if not model:
return None
api_key = get_vision_fallback_api_key()
api_base = get_vision_fallback_api_base()
if not api_key:
if not api_key and not model_override:
logger.debug("vision_fallback configured but no API key resolved; skipping")
return None
@@ -189,15 +196,53 @@ async def caption_tool_image(
{"role": "user", "content": user_blocks},
]
# Apply the same proxy rewrites the main LLM provider uses so a
# `hive/...` / `kimi/...` model resolves to the right Anthropic-
# compatible endpoint with the right auth header. Without this,
# litellm doesn't know what `hive/kimi-k2.5` is and rejects the call
# with "LLM Provider NOT provided."
from framework.llm.litellm import rewrite_proxy_model
rewritten_model, rewritten_base, extra_headers = rewrite_proxy_model(model, api_key, api_base)
kwargs: dict[str, Any] = {
"model": model,
"model": rewritten_model,
"messages": messages,
"max_tokens": 1024,
"max_tokens": 8192,
"timeout": timeout_s,
"api_key": api_key,
}
if api_base:
kwargs["api_base"] = api_base
# Always pass api_key when we have one, even alongside proxy-rewritten
# extra_headers. litellm's anthropic handler refuses to dispatch
# without an api_key (it sends it as x-api-key); the proxy itself
# authenticates via the Authorization: Bearer header in
# extra_headers. Both are needed — matches LiteLLMProvider's path.
if api_key:
kwargs["api_key"] = api_key
if rewritten_base:
kwargs["api_base"] = rewritten_base
if extra_headers:
kwargs["extra_headers"] = extra_headers
# Surface where the request is going so the user can verify the
# vision fallback is hitting the expected proxy / model. Redacts
# the API key to a length+head+tail digest so it can be cross-
# correlated with other auth-related log lines.
key_digest = (
f"len={len(api_key)} {api_key[:8]}{api_key[-4:]}"
if api_key and len(api_key) >= 12
else f"len={len(api_key) if api_key else 0}"
)
logger.info(
"[vision_fallback] dispatching: configured_model=%s rewritten_model=%s "
"api_base=%s api_key=%s images=%d intent_chars=%d timeout_s=%.1f",
model,
rewritten_model,
rewritten_base or "<litellm-default>",
key_digest,
len(image_content),
len(intent),
timeout_s,
)
started = datetime.now()
caption: str | None = None
@@ -207,9 +252,21 @@ async def caption_tool_image(
text = (response.choices[0].message.content or "").strip()
if text:
caption = text
logger.info(
"[vision_fallback] response: model=%s api_base=%s elapsed_s=%.2f chars=%d",
rewritten_model,
rewritten_base or "<litellm-default>",
(datetime.now() - started).total_seconds(),
len(text),
)
except Exception as exc:
error_text = f"{type(exc).__name__}: {exc}"
logger.debug("vision_fallback model '%s' failed: %s", model, exc)
logger.warning(
"[vision_fallback] failed: model=%s api_base=%s error=%s",
rewritten_model,
rewritten_base or "<litellm-default>",
error_text,
)
# Best-effort audit log so users can grep ~/.hive/llm_logs/ for
# vision-fallback subagent calls. Failures here must not bubble.
@@ -241,7 +298,9 @@ async def caption_tool_image(
except Exception:
pass
return caption
if caption is None:
return None
return caption, model
__all__ = ["caption_tool_image", "extract_intent_for_tool"]
@@ -560,7 +560,9 @@ class CredentialTesterAgent:
if self._selected_account is None:
raise RuntimeError("No account selected. Call select_account() first.")
self._storage_path = Path.home() / ".hive" / "agents" / "credential_tester"
from framework.config import HIVE_HOME
self._storage_path = HIVE_HOME / "agents" / "credential_tester"
self._storage_path.mkdir(parents=True, exist_ok=True)
self._tool_registry = ToolRegistry()
+10 -4
View File
@@ -66,7 +66,9 @@ def _get_last_active(agent_path: Path) -> str | None:
latest: str | None = None
# 1. Worker sessions
sessions_dir = Path.home() / ".hive" / "agents" / agent_name / "sessions"
from framework.config import HIVE_HOME
sessions_dir = HIVE_HOME / "agents" / agent_name / "sessions"
if sessions_dir.exists():
for session_dir in sessions_dir.iterdir():
if not session_dir.is_dir() or not session_dir.name.startswith("session_"):
@@ -115,7 +117,9 @@ def _get_last_active(agent_path: Path) -> str | None:
def _count_sessions(agent_name: str) -> int:
"""Count session directories under ~/.hive/agents/{agent_name}/sessions/."""
sessions_dir = Path.home() / ".hive" / "agents" / agent_name / "sessions"
from framework.config import HIVE_HOME
sessions_dir = HIVE_HOME / "agents" / agent_name / "sessions"
if not sessions_dir.exists():
return 0
return sum(1 for d in sessions_dir.iterdir() if d.is_dir() and d.name.startswith("session_"))
@@ -123,7 +127,9 @@ def _count_sessions(agent_name: str) -> int:
def _count_runs(agent_name: str) -> int:
"""Count unique run_ids across all sessions for an agent."""
sessions_dir = Path.home() / ".hive" / "agents" / agent_name / "sessions"
from framework.config import HIVE_HOME
sessions_dir = HIVE_HOME / "agents" / agent_name / "sessions"
if not sessions_dir.exists():
return 0
run_ids: set[str] = set()
@@ -146,7 +152,7 @@ def _count_runs(agent_name: str) -> int:
return len(run_ids)
_EXCLUDED_JSON_STEMS = {"agent", "flowchart", "triggers", "configuration", "metadata"}
_EXCLUDED_JSON_STEMS = {"agent", "flowchart", "triggers", "configuration", "metadata", "tasks"}
def _is_colony_dir(path: Path) -> bool:
+4 -3
View File
@@ -2,12 +2,13 @@
import json
from dataclasses import dataclass, field
from pathlib import Path
def _load_preferred_model() -> str:
"""Load preferred model from ~/.hive/configuration.json."""
config_path = Path.home() / ".hive" / "configuration.json"
"""Load preferred model from $HIVE_HOME/configuration.json."""
from framework.config import HIVE_HOME
config_path = HIVE_HOME / "configuration.json"
if config_path.exists():
try:
with open(config_path, encoding="utf-8") as f:
@@ -1,9 +1,8 @@
"""One-shot LLM gate that decides if a queen DM is ready to fork a colony.
The queen's ``start_incubating_colony`` tool calls :func:`evaluate` with
the queen's recent conversation, a proposed ``colony_name``, and a
one-paragraph ``intended_purpose``. The evaluator returns a structured
verdict:
the queen's recent conversation and a proposed ``colony_name``. The
evaluator returns a structured verdict:
{
"ready": bool,
@@ -38,8 +37,8 @@ You gate whether a queen agent should commit to forking a persistent
expensive: it ends the user's chat with this queen and the worker runs
unattended afterward, so the spec must be settled before you approve.
Read the conversation excerpt and the queen's proposed colony_name +
intended_purpose, then decide.
Read the conversation excerpt and the queen's proposed colony_name,
then decide.
APPROVE (ready=true) only when ALL of the following hold:
1. The user has explicitly asked for work that needs to outlive this
@@ -128,11 +127,9 @@ def format_conversation_excerpt(messages: list[Message]) -> str:
def _build_user_message(
conversation_excerpt: str,
colony_name: str,
intended_purpose: str,
) -> str:
return (
f"## Proposed colony name\n{colony_name}\n\n"
f"## Queen's intended_purpose\n{intended_purpose.strip()}\n\n"
f"## Recent conversation (oldest → newest)\n{conversation_excerpt}\n\n"
"Decide: should this queen be approved to enter INCUBATING phase?"
)
@@ -189,7 +186,6 @@ async def evaluate(
llm: Any,
messages: list[Message],
colony_name: str,
intended_purpose: str,
) -> dict[str, Any]:
"""Run the incubating evaluator against the queen's conversation.
@@ -200,14 +196,13 @@ async def evaluate(
messages: The queen's conversation messages, oldest first. The
evaluator slices its own tail; pass the full list.
colony_name: Validated colony slug.
intended_purpose: Queen's one-paragraph brief.
Returns:
``{"ready": bool, "reasons": [str], "missing_prerequisites": [str]}``.
Fail-closed on any error.
"""
excerpt = format_conversation_excerpt(messages)
user_msg = _build_user_message(excerpt, colony_name, intended_purpose)
user_msg = _build_user_message(excerpt, colony_name)
try:
response = await llm.acomplete(
@@ -1,3 +1,3 @@
{
"include": ["gcu-tools", "hive_tools"]
"include": ["gcu-tools", "hive_tools", "terminal-tools", "chart-tools"]
}
+3 -3
View File
@@ -1,10 +1,10 @@
{
"coder-tools": {
"files-tools": {
"transport": "stdio",
"command": "uv",
"args": ["run", "python", "coder_tools_server.py", "--stdio"],
"args": ["run", "python", "files_server.py", "--stdio"],
"cwd": "../../../../tools",
"description": "Unsandboxed file system tools for code generation and validation"
"description": "File system tools (read/write/edit/search) for code generation"
},
"gcu-tools": {
"transport": "stdio",
+59 -85
View File
@@ -32,7 +32,7 @@ def finalize_queen_prompt(text: str, has_vision: bool) -> str:
# ---------------------------------------------------------------------------
# Independent phase: queen operates as a standalone agent — no worker.
# Core tools are listed here; MCP tools (coder-tools, gcu-tools) are added
# Core tools are listed here; MCP tools (files-tools, gcu-tools) are added
# dynamically in queen_orchestrator.py because their tool names aren't known
# at import time.
_QUEEN_INDEPENDENT_TOOLS = [
@@ -40,11 +40,7 @@ _QUEEN_INDEPENDENT_TOOLS = [
"read_file",
"write_file",
"edit_file",
"hashline_edit",
"list_directory",
"search_files",
"run_command",
"undo_changes",
# NOTE (2026-04-16): ``run_parallel_workers`` is not in the DM phase.
# Pure DM is for conversation with the user; fan out parallel work via
# ``start_incubating_colony`` (which gates the colony fork behind a
@@ -60,9 +56,7 @@ _QUEEN_INDEPENDENT_TOOLS = [
# (e.g. inspect an existing skill) before committing.
_QUEEN_INCUBATING_TOOLS = [
"read_file",
"list_directory",
"search_files",
"run_command",
# Schedule lives on the colony, not on the queen session — pass it
# inline as create_colony(triggers=[...]) instead of staging through
# set_trigger here.
@@ -76,9 +70,7 @@ _QUEEN_INCUBATING_TOOLS = [
_QUEEN_WORKING_TOOLS = [
# Read-only
"read_file",
"list_directory",
"search_files",
"run_command",
# Monitoring + worker dialogue
"get_worker_status",
"inject_message",
@@ -95,9 +87,7 @@ _QUEEN_WORKING_TOOLS = [
_QUEEN_REVIEWING_TOOLS = [
# Read-only
"read_file",
"list_directory",
"search_files",
"run_command",
# Status + escalation replies
"get_worker_status",
"list_worker_questions",
@@ -132,11 +122,10 @@ phase. Your identity tells you WHO you are.
# ---------------------------------------------------------------------------
_queen_role_independent = """\
You are in INDEPENDENT mode. No worker layout; you do the work yourself. \
You have full coding tools (read/write/edit/search/run) and MCP tools \
(file operations via coder-tools, browser automation via gcu-tools). \
Execute the user's task directly using conversation and tools. \
You are the agent. \
You are in INDEPENDENT mode. \
You have full coding tools (read/write/edit/search) and MCP tools \
(file operations via files-tools, browser automation via gcu-tools). \
Execute the user's task directly using planning, conversation and tools.
If you need a structured choice or approval gate, always use \
``ask_user``; otherwise ask in plain prose. ``ask_user`` takes a \
``questions`` array: pass a single entry for one question, or batch \
@@ -145,13 +134,12 @@ several entries when you have multiple clarifications. \
When the user clearly wants persistent / recurring / headless work that \
needs to outlive THIS chat (e.g. "every morning", "monitor X and alert \
me", "set up a job that"), call ``start_incubating_colony`` with a \
proposed colony_name and a one-paragraph intended_purpose. A side \
evaluator reads the conversation and decides if the spec is settled. If \
it returns ``not_ready``, you keep talking with the user to sort out \
whatever the evaluator said is missing, then retry. If it returns \
``incubating`` your phase flips and a new prompt takes over. Do not \
try to write SKILL.md, fork directories, or otherwise build the colony \
yourself in this phase.\
proposed colony_name. A side evaluator reads the conversation and \
decides if the spec is settled. If it returns ``not_ready``, you keep \
talking with the user to sort out whatever the evaluator said is \
missing, then retry. If it returns ``incubating`` your phase flips and \
a new prompt takes over. Do not try to write SKILL.md, fork \
directories, or otherwise build the colony yourself in this phase.\
"""
_queen_role_incubating = """\
@@ -179,7 +167,7 @@ no harm, you go back to INDEPENDENT and can retry later.
If the user explicitly asks for something UNRELATED to the current \
colony being drafted (a side question, a one-shot task, a different \
problem), don't try to handle it from this limited tool surface. Call \
problem), call \
``cancel_incubation`` first to switch back to INDEPENDENT where you \
have the full toolkit, handle their request there, and re-enter \
INCUBATING later via ``start_incubating_colony`` when they want to \
@@ -237,13 +225,12 @@ ceremony for a single-paragraph summary.
# ---------------------------------------------------------------------------
_queen_tools_independent = """
# Tools (INDEPENDENT mode)
# Tools
## Planning — use FIRST for multi-step work
- task_create_batch — When a request has 3+ atomic steps, your FIRST \
- task_create_batch — When a request has 2+ atomic steps, your FIRST \
tool call is `task_create_batch` with one entry per step (atomic, \
one round-trip). Use this for the upfront plan, NOT five separate \
`task_create` calls.
one round-trip).
- task_create — One-off mid-run additions when you discover \
unplanned work AFTER the initial plan is laid out.
- task_update / task_list / task_get — Mark progress, inspect, or \
@@ -251,28 +238,32 @@ re-read state.
See "Independent execution" for the per-step flow and granularity rule.
## File I/O (coder-tools MCP)
- read_file, write_file, edit_file, hashline_edit, list_directory, \
search_files, run_command, undo_changes
## File I/O (files-tools MCP)
- read_file, write_file, edit_file, search_files
- edit_file covers single-file fuzzy find/replace (mode='replace', default) \
and multi-file structured patches (mode='patch'). Patch mode supports \
Update / Add / Delete / Move atomically across many files in one call.
- search_files covers grep/find/ls in one tool: target='content' to \
search inside files, target='files' (with a glob like '*.py') to list \
or find files. Mtime-sorted in files mode.
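As a hedged illustration of the two surfaces just described, some hypothetical call payloads (argument names beyond `mode` and `target` are assumptions, not a documented schema):

```python
calls = [
    # grep: search inside files for a string
    {"tool": "search_files", "args": {"target": "content", "query": "vision_fallback"}},
    # find/ls: list *.py files, mtime-sorted
    {"tool": "search_files", "args": {"target": "files", "glob": "*.py"}},
    # single-file fuzzy find/replace (default mode)
    {"tool": "edit_file", "args": {"path": "app.py", "mode": "replace",
                                   "find": "old()", "replace": "new()"}},
]
for call in calls:
    print(call["tool"], call["args"])
```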
## Browser Automation (gcu-tools MCP)
- Use `browser_*` tools (browser_start, browser_navigate, browser_click, \
browser_fill, browser_snapshot, <!-- vision-only -->browser_screenshot, <!-- /vision-only -->browser_scroll, \
browser_tabs, browser_close, browser_evaluate, etc.).
- Use `browser_*` tools; `browser_open(url)` is the cold-start entry point \
(lazy-creates the context; no separate "start" call). Then `browser_navigate`, \
`browser_click`, `browser_type`, `browser_snapshot`, \
<!-- vision-only -->`browser_screenshot`, <!-- /vision-only -->`browser_scroll`, \
`browser_tabs`, `browser_close`, `browser_evaluate`, etc.
- MUST follow the browser-automation skill protocol before using browser tools.
## Hand off to a colony
- start_incubating_colony(colony_name, intended_purpose) — Use this when \
the user wants persistent / recurring / headless work that needs to \
outlive THIS chat. It does NOT fork on its own; it spawns a one-shot \
evaluator that reads this conversation and decides whether the spec \
is settled enough to proceed. On approval your phase flips to \
INCUBATING and a new tool surface (including create_colony itself) \
unlocks. On rejection you stay here and keep the conversation going \
to fill the gaps the evaluator named.
- ``intended_purpose`` is a one-paragraph brief: what the colony will \
do, on what cadence, why it must outlive this chat. Don't write a \
SKILL.md here; that comes in INCUBATING.
- start_incubating_colony(colony_name) — Use this when the user wants \
persistent / recurring / headless work that needs to outlive THIS \
chat. It does NOT fork on its own; it spawns a one-shot evaluator \
that reads this conversation and decides whether the spec is settled \
enough to proceed. On approval your phase flips to INCUBATING and a \
new tool surface (including create_colony itself) unlocks. On \
rejection you stay here and keep the conversation going to fill the \
gaps the evaluator named.
"""
_queen_tools_incubating = """
@@ -282,10 +273,11 @@ You've been approved to fork. The full coding toolkit is gone on \
purpose; your job in this phase is to nail the spec, not keep doing \
work. Available:
## Read-only inspection (coder-tools MCP)
- read_file, list_directory, search_files, run_command — for confirming \
details before you commit (e.g. peek at an existing skill in \
~/.hive/skills/, sanity-check an API URL).
## Read-only inspection (files-tools MCP)
- read_file, search_files — for confirming details before \
you commit (e.g. peek at an existing skill in ~/.hive/skills/, sanity-check \
an API URL). search_files covers both grep (target='content') and ls/find \
(target='files', glob like '*.py').
## Approved → operational checklist (use your judgement, ask only what's missing)
The conversation that got you here probably did NOT cover all of:
@@ -379,7 +371,8 @@ operational, not editorial.
born from a fresh chat via start_incubating_colony.
## Read-only inspection
- read_file, list_directory, search_files, run_command
- read_file, search_files (search_files covers grep/find/ls \
via target='content' or target='files')
When every worker has reported (success or failure), the phase \
auto-moves to REVIEWING. You do not need to call a transition tool \
@@ -398,7 +391,7 @@ _queen_tools_reviewing = """
# Tools (REVIEWING mode)
Workers have finished. You have:
- Read-only: read_file, list_directory, search_files, run_command
- Read-only: read_file, search_files (search_files = grep+find+ls)
- get_worker_status(focus?) — Pull the final status / per-worker reports
- list_worker_questions() / reply_to_worker(request_id, reply) — Answer any \
late escalations still in the inbox
@@ -418,10 +411,10 @@ asks for specifics. Do not invent a new pass unless the user asks for one.
_queen_behavior_independent = """
## Independent execution
You are the agent. **For multi-step work (3+ atomic actions): your FIRST \
tool call is `task_create_batch`** with one entry per atomic action, \
before you touch any other tool. (One call, atomic not N separate \
`task_create` calls.) Then work the list one task at a time:
You are the agent. **For multi-step work (2+ atomic actions): call \
`task_create_batch`** with one entry per atomic action, \
before you touch any other tool. \
Then work the list one task at a time:
1. `task_update` in_progress before you start the step.
2. Do one real inline instance: open the browser, call the real API, \
@@ -436,14 +429,18 @@ the panel shows a hung spinner no matter how much real work you got \
done.
**Granularity: one task per atomic action, not one umbrella per project.** \
Replying to 5 posts is 5 tasks, not 1. Crawling 3 sites is 3 tasks. \
An umbrella task that stays `in_progress` for the whole run looks, \
to the user, like "the queen is stuck".
Once one task succeeds inline, scale in order for the rest of that task's \
work: repeat inline (10 items) → `run_parallel_workers` (batch, \
results now) → `create_colony` (recurring / background).
Once all current tasks are finished, discuss with the user about building \
a colony so this success can be repeated or scaled
### How to handle large scale tasks
If the user asks you to finish the same task repeatedly or at large scale \
(more than 10 times), tell the user that you can do it once first and \
then build a colony to fulfill the request, since succeeding once makes \
it easier to run in the future; then focus on finishing the task once first.
### How to handle simple tasks (fewer than 2 atomic items)
For conceptual or strategic questions, single-tool-call work, \
greetings, or chat: answer directly in prose. Skip `task_*`, skip the \
planning ceremony; the bar is "real multi-step work the user benefits \
@@ -455,15 +452,8 @@ _queen_behavior_always = """
## Communication
- Your LLM reply text is what the user reads. Do NOT use \
`run_command`, `echo`, or any other tool to "say" something; tools \
are for work (read/search/edit/run), not speech.
- On a greeting or chat ("hi", "how's it going"), reply in plain \
prose and stop. Do not call tools to "discover" what the user wants. \
Check recall memory for name / role / past topics and weave them into \
a 1–2 sentence in-character greeting, then wait.
- On a clear ask (build, edit, run, investigate, search), call the \
appropriate tool on the same turn; don't narrate intent and stop.
appropriate tool following the user's intent. \
- You are curious to understand the user. Use `ask_user` when the user's \
response is needed to continue: to resolve ambiguity, collect missing \
information, request approval, compare real trade-offs, gather post-task \
@@ -491,20 +481,6 @@ asserting them as fact.
_queen_behavior_always = _queen_behavior_always + _queen_memory_instructions
_queen_style = """
# Communication
## Adaptive Calibration
Read the user's signals and calibrate your register:
- Short responses -> they want brevity. Match it.
- "Why?" questions -> they want reasoning. Provide it.
- Correct technical terms -> they know the domain. Skip basics.
- Terse or frustrated ("just do X") -> acknowledge and simplify.
- Exploratory ("what if...", "could we also...") -> slow down and explore.
"""
queen_node = NodeSpec(
id="queen",
name="Queen",
@@ -525,7 +501,6 @@ queen_node = NodeSpec(
system_prompt=(
_queen_character_core
+ _queen_role_independent
+ _queen_style
+ _queen_tools_independent
+ _queen_behavior_always
+ _queen_behavior_independent
@@ -555,5 +530,4 @@ __all__ = [
"_queen_tools_reviewing",
"_queen_behavior_always",
"_queen_behavior_independent",
"_queen_style",
]
@@ -1279,12 +1279,8 @@ def format_queen_identity_prompt(profile: dict[str, Any], *, max_examples: int |
"<negative_constraints>\n"
"- NEVER use corporate filler ('leverage', 'synergy', "
"'circle back', 'at the end of the day').\n"
"- NEVER use AI assistant phrases ('How can I help you "
"today?', 'As an AI', 'I'd be happy to').\n"
"- NEVER break character to explain your thought process "
"or reference your hidden background.\n"
"- Speak like a real person in your role -- direct, "
"opinionated, occasionally imperfect.\n"
"</negative_constraints>"
)
@@ -4,7 +4,7 @@ Every queen inherits the same MCP surface (all servers loaded for the
queen agent), but exposing 94+ tools to every persona clutters the LLM
tool catalog and wastes prompt tokens. This module defines a sensible
default allowlist per queen persona so, e.g., Head of Legal doesn't
see port scanners and Head of Finance doesn't see ``apply_patch``.
see port scanners and Head of Brand & Design doesn't see CSV/SQL tools.
Defaults apply only when the queen has no ``tools.json`` sidecar; the
moment the user saves an allowlist through the Tool Library, the
@@ -36,35 +36,39 @@ logger = logging.getLogger(__name__)
# the named entries only).
_TOOL_CATEGORIES: dict[str, list[str]] = {
# Read-only file operations — safe baseline for every knowledge queen.
"file_read": [
"read_file",
"list_directory",
"list_dir",
"list_files",
"search_files",
"grep_search",
# Unified file ops — read, write, edit, search across the files-tools
# MCP server (read_file, write_file, edit_file, search_files). pdf_read
# lives in hive_tools so it's listed explicitly; without it queens
# cannot read PDF documents by default.
"file_ops": [
"@server:files-tools",
"pdf_read",
],
# File mutation — only personas that author or edit artifacts.
"file_write": [
"write_file",
"edit_file",
"apply_diff",
"apply_patch",
"replace_file_content",
"hashline_edit",
"undo_changes",
# Terminal basic — the 3-tool subset queens get out of the box.
# terminal_exec — foreground command execution (Bash equivalent)
# terminal_rg — ripgrep content search (Grep equivalent)
# terminal_find — glob/find file listing (Glob equivalent)
"terminal_basic": [
"terminal_exec",
"terminal_rg",
"terminal_find",
],
# Shell + process control — engineering personas only.
"shell": [
"run_command",
"execute_command_tool",
"bash_kill",
"bash_output",
# Terminal advanced — the power-user tools beyond the basics. Not in
# any role default; opt in explicitly per-queen via the Tool Library.
# terminal_job_* — background job lifecycle (start/manage/logs)
# terminal_output_get — fetch captured output from foreground exec
# terminal_pty_* — persistent PTY sessions (open/run/close)
"terminal_advanced": [
"terminal_job_start",
"terminal_job_manage",
"terminal_job_logs",
"terminal_output_get",
"terminal_pty_open",
"terminal_pty_run",
"terminal_pty_close",
],
# Tabular data. CSV/Excel read/write + DuckDB SQL.
"data": [
"spreadsheet_advanced": [
"csv_read",
"csv_info",
"csv_write",
@@ -78,49 +82,74 @@ _TOOL_CATEGORIES: dict[str, list[str]] = {
"excel_sheet_list",
"excel_sql",
],
# Browser automation — every tool from the gcu-tools MCP server.
"browser": ["@server:gcu-tools"],
# External research / information-gathering.
"research": [
"search_papers",
"download_paper",
"search_wikipedia",
"web_scrape",
# Browser lifecycle + read-only inspection (navigation, snapshots, query).
# Split out from interaction so personas that only need to *observe* pages
# (e.g. research, status checks) don't pull in click/type/drag/etc.
"browser_basic": [
"browser_setup",
"browser_status",
"browser_stop",
"browser_tabs",
"browser_open",
"browser_close",
"browser_activate_tab",
"browser_navigate",
"browser_go_back",
"browser_go_forward",
"browser_reload",
"browser_screenshot",
"browser_snapshot",
"browser_html",
"browser_console",
"browser_evaluate",
"browser_get_text",
"browser_get_attribute",
"browser_get_rect",
"browser_shadow_query",
],
# Security scanners — pentest-ish, only for engineering/security roles.
# Browser interaction — anything that mutates page state (clicks, typing,
# drag, scrolling, dialogs, file uploads). Pair with browser_basic for
# full automation; omit for read-only personas.
"browser_interaction": [
"browser_click",
"browser_click_coordinate",
"browser_type",
"browser_type_focused",
"browser_press",
"browser_press_at",
"browser_hover",
"browser_hover_coordinate",
"browser_select",
"browser_scroll",
"browser_drag",
"browser_wait",
"browser_resize",
"browser_upload",
],
# Research — paper search, Wikipedia, ad-hoc web scrape. Pair with
# browser_basic for richer site-by-site research; this category is the
# lightweight always-available fallback.
"research": ["web_scrape", "pdf_read"],
# Security — defensive scanning and reconnaissance. Engineering-only
# surface; the rest of the queens shouldn't see port scanners.
"security": [
"port_scan",
"dns_security_scan",
"http_headers_scan",
"port_scan",
"ssl_tls_scan",
"subdomain_enumerate",
"tech_stack_detect",
"risk_score",
],
# Lightweight context helpers — good default for every queen.
"time_context": [
"context_awareness": [
"get_current_time",
"get_account_info",
],
# Runtime log inspection — debug/observability for builder personas.
"runtime_inspection": [
"query_runtime_logs",
"query_runtime_log_details",
"query_runtime_log_raw",
],
# Agent-management tools — building/validating/checking agents.
"agent_mgmt": [
"list_agents",
"list_agent_tools",
"list_agent_sessions",
"get_agent_checkpoint",
"list_agent_checkpoints",
"run_agent_tests",
"save_agent_draft",
"confirm_and_build",
"validate_agent_package",
"validate_agent_tools",
"enqueue_task",
# BI / financial chart + diagram rendering. Calling chart_render
# both embeds the chart live in chat and produces a downloadable PNG.
"charts": [
"@server:chart-tools",
],
}
@@ -139,81 +168,86 @@ _TOOL_CATEGORIES: dict[str, list[str]] = {
# user-added custom queen IDs that we don't know about.
QUEEN_DEFAULT_CATEGORIES: dict[str, list[str]] = {
# Head of Technology — builds and operates systems; full toolkit.
# Head of Technology — builds and operates systems. Security tools
# (port_scan, subdomain_enumerate, etc.) are intentionally NOT in the
# default — users opt in via the Tool Library when an engagement
# actually needs reconnaissance.
"queen_technology": [
"file_read",
"file_write",
"shell",
"data",
"browser",
"file_ops",
"terminal_basic",
"browser_basic",
"browser_interaction",
"research",
"security",
"time_context",
"runtime_inspection",
"agent_mgmt",
"context_awareness",
"charts",
],
# Head of Growth — data, experiments, competitor research; no shell/security.
# Head of Growth — data, experiments, competitor research; no security.
"queen_growth": [
"file_read",
"file_write",
"data",
"browser",
"file_ops",
"terminal_basic",
"browser_basic",
"browser_interaction",
"research",
"time_context",
"context_awareness",
"charts",
],
# Head of Product Strategy — user research + roadmaps; no shell/security.
# Head of Product Strategy — user research + roadmaps; no security.
"queen_product_strategy": [
"file_read",
"file_write",
"data",
"browser",
"file_ops",
"terminal_basic",
"browser_basic",
"browser_interaction",
"research",
"time_context",
"context_awareness",
"charts",
],
# Head of Finance — financial models (CSV/Excel heavy), market research.
"queen_finance_fundraising": [
"file_read",
"file_write",
"data",
"browser",
"file_ops",
"terminal_basic",
"spreadsheet_advanced",
"browser_basic",
"browser_interaction",
"research",
"time_context",
"context_awareness",
"charts",
],
# Head of Legal — reads contracts/PDFs, researches; no shell/data/security.
# Head of Legal — reads contracts/PDFs, researches; no data/security.
"queen_legal": [
"file_read",
"file_write",
"browser",
"file_ops",
"terminal_basic",
"browser_basic",
"browser_interaction",
"research",
"time_context",
"context_awareness",
],
# Head of Brand & Design — visual refs, style guides; no shell/data/security.
# Head of Brand & Design — visual refs, style guides; no data/security.
"queen_brand_design": [
"file_read",
"file_write",
"browser",
"file_ops",
"terminal_basic",
"browser_basic",
"browser_interaction",
"research",
"time_context",
"context_awareness",
],
# Head of Talent — candidate pipelines, resumes; data + browser heavy.
"queen_talent": [
"file_read",
"file_write",
"data",
"browser",
"file_ops",
"terminal_basic",
"browser_basic",
"browser_interaction",
"research",
"time_context",
"context_awareness",
],
# Head of Operations — processes, automation, observability.
"queen_operations": [
"file_read",
"file_write",
"data",
"browser",
"research",
"time_context",
"runtime_inspection",
"agent_mgmt",
"file_ops",
"terminal_basic",
"spreadsheet_advanced",
"browser_basic",
"browser_interaction",
"context_awareness",
"charts",
],
}
@@ -223,6 +257,49 @@ def has_role_default(queen_id: str) -> bool:
return queen_id in QUEEN_DEFAULT_CATEGORIES
def list_category_names() -> list[str]:
"""Return every category name defined in the table, in declaration order."""
return list(_TOOL_CATEGORIES.keys())
def queen_role_categories(queen_id: str) -> list[str]:
"""Return the category names assigned to ``queen_id`` by role default.
Returns an empty list for queens not in the persona table (they fall
through to allow-all and have no implicit category membership).
"""
return list(QUEEN_DEFAULT_CATEGORIES.get(queen_id, []))
def resolve_category_tools(
category: str,
mcp_catalog: dict[str, list[dict[str, Any]]] | None = None,
) -> list[str]:
"""Expand a single category to its concrete tool names.
Mirrors ``resolve_queen_default_tools`` but for a single category, so
callers (e.g. the Tool Library API) can present per-category tool
membership without re-implementing the ``@server:NAME`` shorthand
expansion.
"""
names: list[str] = []
seen: set[str] = set()
for entry in _TOOL_CATEGORIES.get(category, []):
if entry.startswith("@server:"):
server_name = entry[len("@server:") :]
if mcp_catalog is None:
continue
for tool in mcp_catalog.get(server_name, []) or []:
tname = tool.get("name") if isinstance(tool, dict) else None
if tname and tname not in seen:
seen.add(tname)
names.append(tname)
elif entry not in seen:
seen.add(entry)
names.append(entry)
return names
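A possible usage sketch for the helper above, assuming the catalog is shaped the way the signature suggests (server name mapping to a list of `{"name": ...}` tool dicts):

```python
catalog = {
    "files-tools": [
        {"name": "read_file"}, {"name": "write_file"},
        {"name": "edit_file"}, {"name": "search_files"},
    ],
}
# "file_ops" expands "@server:files-tools" first, then the explicit "pdf_read" entry
assert resolve_category_tools("file_ops", catalog) == [
    "read_file", "write_file", "edit_file", "search_files", "pdf_read",
]
```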
def resolve_queen_default_tools(
queen_id: str,
mcp_catalog: dict[str, list[dict[str, Any]]] | None = None,
@@ -17,8 +17,8 @@ Use browser nodes (with `tools: {policy: "all"}`) when:
## Available Browser Tools
All tools are prefixed with `browser_`:
- `browser_start`, `browser_open`, `browser_navigate` — launch/navigate
- `browser_click`, `browser_click_coordinate`, `browser_fill`, `browser_type`, `browser_type_focused` — interact
- `browser_open`, `browser_navigate` — both lazy-create the browser context, so a single `browser_open(url)` covers the cold path. To recover from a stale context, call `browser_stop` then `browser_open(url)` again.
- `browser_click`, `browser_click_coordinate`, `browser_type`, `browser_type_focused` — interact
- `browser_press` (with optional `modifiers=["ctrl"]` etc.) — keyboard shortcuts
- `browser_snapshot` — compact accessibility-tree read (structured)
<!-- vision-only -->
@@ -27,7 +27,7 @@ All tools are prefixed with `browser_`:
- `browser_shadow_query`, `browser_get_rect` — locate elements (shadow-piercing via `>>>`)
- `browser_scroll`, `browser_wait` — navigation helpers
- `browser_evaluate` — run JavaScript
- `browser_close`, `browser_close_finished` — tab cleanup
- `browser_close` — tab cleanup (call per tab; closes the active tab when `tab_id` is omitted)
## Pick the right reading tool
+4 -3
View File
@@ -162,9 +162,10 @@ def get_vision_fallback_model() -> str | None:
Used by the agent-loop hook that captions tool-result images when the
main agent's model cannot accept image content (text-only LLMs).
When this returns None the fallback chain skips the configured-subagent
stage and proceeds straight to the generic caption rotation
(``_describe_images_as_text``).
When this returns None the captioning chain's configured + retry
attempts both no-op (returning None), and only the final
``gemini/gemini-3-flash-preview`` override has a chance to succeed, and
only if a ``GEMINI_API_KEY`` is set in the environment.
"""
vision = get_hive_config().get("vision_fallback", {})
if vision.get("provider") and vision.get("model"):
+6 -1
View File
@@ -16,9 +16,14 @@ import os
import stat
from pathlib import Path
# Resolved once at module import. ``framework.config.HIVE_HOME`` reads
# the desktop's ``HIVE_HOME`` env var at its own import time, so the
# runtime always sees the per-user root before this constant is computed.
from framework.config import HIVE_HOME as _HIVE_HOME
logger = logging.getLogger(__name__)
CREDENTIAL_KEY_PATH = Path.home() / ".hive" / "secrets" / "credential_key"
CREDENTIAL_KEY_PATH = _HIVE_HOME / "secrets" / "credential_key"
CREDENTIAL_KEY_ENV_VAR = "HIVE_CREDENTIAL_KEY"
ADEN_CREDENTIAL_ID = "aden_api_key"
ADEN_ENV_VAR = "ADEN_API_KEY"
+12 -3
View File
@@ -128,7 +128,9 @@ class EncryptedFileStorage(CredentialStorage):
Initialize encrypted storage.
Args:
base_path: Directory for credential files. Defaults to ~/.hive/credentials.
base_path: Directory for credential files. Defaults to
``$HIVE_HOME/credentials`` (per-user) when HIVE_HOME is set,
else ``~/.hive/credentials``.
encryption_key: 32-byte Fernet key. If None, reads from env var.
key_env_var: Environment variable containing encryption key
"""
@@ -139,7 +141,14 @@ class EncryptedFileStorage(CredentialStorage):
"Encrypted storage requires 'cryptography'. Install with: uv pip install cryptography"
) from e
self.base_path = Path(base_path or self.DEFAULT_PATH).expanduser()
if base_path is None:
# Honor HIVE_HOME (set by the desktop shell to a per-user dir) so
# the encrypted store doesn't fork between ~/.hive and the desktop
# userData root. Falls back to ~/.hive/credentials when standalone.
from framework.config import HIVE_HOME
base_path = HIVE_HOME / "credentials"
self.base_path = Path(base_path).expanduser()
self._ensure_dirs()
self._key_env_var = key_env_var
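A short usage sketch of the resolution order above (class and parameter names come from this diff; the explicit path is a placeholder):

```python
# Explicit base_path wins: no HIVE_HOME lookup happens.
store = EncryptedFileStorage(base_path="/tmp/demo-creds")

# base_path=None resolves to $HIVE_HOME/credentials, which collapses to
# ~/.hive/credentials when HIVE_HOME is unset (standalone install).
store = EncryptedFileStorage()
```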
@@ -510,7 +519,7 @@ class EnvVarStorage(CredentialStorage):
def exists(self, credential_id: str) -> bool:
"""Check if credential is available in environment."""
env_var = self._get_env_var_name(credential_id)
return self._read_env_value(env_var) is not None
return bool(self._read_env_value(env_var))
def add_mapping(self, credential_id: str, env_var: str) -> None:
"""
+3 -2
View File
@@ -745,13 +745,14 @@ class CredentialStore:
token = store.get_key("hubspot", "access_token")
"""
import os
from pathlib import Path
from .storage import EncryptedFileStorage
# Determine local storage path
if local_path is None:
local_path = str(Path.home() / ".hive" / "credentials")
from framework.config import HIVE_HOME
local_path = str(HIVE_HOME / "credentials")
local_storage = EncryptedFileStorage(base_path=local_path)
@@ -258,6 +258,14 @@ class TestEnvVarStorage:
with pytest.raises(NotImplementedError):
storage.delete("test")
def test_exists_matches_load_for_empty_value(self):
"""Test exists() and load() stay consistent for empty values."""
storage = EnvVarStorage(env_mapping={"empty": "EMPTY_API_KEY"})
with patch.object(storage, "_read_env_value", return_value=""):
assert storage.load("empty") is None
assert not storage.exists("empty")
class TestEncryptedFileStorage:
"""Tests for EncryptedFileStorage."""
+3 -1
View File
@@ -42,7 +42,9 @@ def _open_event_log() -> IO[str] | None:
return None
raw = _DEBUG_EVENTS_RAW
if raw.lower() in ("1", "true", "full"):
log_dir = Path.home() / ".hive" / "event_logs"
from framework.config import HIVE_HOME
log_dir = HIVE_HOME / "event_logs"
else:
log_dir = Path(raw)
log_dir.mkdir(parents=True, exist_ok=True)
+4 -2
View File
@@ -7,7 +7,7 @@ verify SOP gates before marking a task done. This gives cross-run memory
that the existing per-iteration stall detectors don't have.
The DB is driven by agents via the ``sqlite3`` CLI through
``execute_command_tool``. This module handles framework-side lifecycle:
``terminal_exec``. This module handles framework-side lifecycle:
creation, migration, queen-side bulk seeding, stale-claim reclamation.
Concurrency model:
@@ -264,7 +264,9 @@ def ensure_all_colony_dbs(colonies_root: Path | None = None) -> list[Path]:
run the stale-claim reclaimer on all of them in one pass.
"""
if colonies_root is None:
colonies_root = Path.home() / ".hive" / "colonies"
from framework.config import COLONIES_DIR
colonies_root = COLONIES_DIR
if not colonies_root.is_dir():
return []
+3 -2
View File
@@ -23,6 +23,7 @@ from collections.abc import AsyncIterator, Callable, Iterator
from pathlib import Path
from typing import Any
from framework.config import HIVE_HOME as _HIVE_HOME
from framework.llm.provider import LLMProvider, LLMResponse, Tool
from framework.llm.stream_events import (
FinishEvent,
@@ -50,8 +51,8 @@ _ENDPOINTS = [
_DEFAULT_PROJECT_ID = "rising-fact-p41fc"
_TOKEN_REFRESH_BUFFER_SECS = 60
# Credentials file in ~/.hive/ (native implementation)
_ACCOUNTS_FILE = Path.home() / ".hive" / "antigravity-accounts.json"
# Credentials file in $HIVE_HOME (native implementation)
_ACCOUNTS_FILE = _HIVE_HOME / "antigravity-accounts.json"
_IDE_STATE_DB_MAC = (
Path.home() / "Library" / "Application Support" / "Antigravity" / "User" / "globalStorage" / "state.vscdb"
)
+100 -9
View File
@@ -44,6 +44,30 @@ logging.getLogger("httpx").setLevel(logging.WARNING)
logging.getLogger("httpcore").setLevel(logging.WARNING)
def _api_base_needs_bearer_auth(api_base: str | None) -> bool:
"""Return True when api_base points at an Anthropic-compatible endpoint
that authenticates via ``Authorization: Bearer`` rather than ``x-api-key``.
The Hive LLM proxy (Rust service in hive-backend/llm/) speaks the
Anthropic Messages API but mints user-scoped JWTs and validates them
via Bearer auth. Default upstream Anthropic endpoints (api.anthropic.com,
Kimi's api.kimi.com/coding) keep using x-api-key, so the override is
scoped to known hive-proxy hosts plus the env-configured override.
"""
if not api_base:
return False
# A substring match on the lowered URL covers the common cases; the
# known hosts are distinctive enough that parsing out scheme, port,
# and path isn't needed.
lowered = api_base.lower()
for host in ("adenhq.com", "open-hive.com", "127.0.0.1:8890", "localhost:8890"):
if host in lowered:
return True
override = os.environ.get("HIVE_LLM_BASE_URL")
if override and override.lower() in lowered:
return True
return False
def _patch_litellm_anthropic_oauth() -> None:
"""Patch litellm's Anthropic header construction to fix OAuth token handling.
@@ -187,6 +211,44 @@ def _ensure_ollama_chat_prefix(model: str) -> str:
return model
def rewrite_proxy_model(
model: str, api_key: str | None, api_base: str | None
) -> tuple[str, str | None, dict[str, str]]:
"""Apply Hive/Kimi proxy rewrites for any caller of ``litellm.acompletion``.
Both the Hive LLM proxy and Kimi For Coding expose Anthropic-API-
compatible endpoints. LiteLLM doesn't recognise the ``hive/`` or
``kimi/`` prefixes natively, so we rewrite them to ``anthropic/``
here. For the Hive proxy we also stamp a Bearer token into
``extra_headers`` because litellm's Anthropic handler only sends
``x-api-key`` and the proxy expects ``Authorization: Bearer``.
Used by ad-hoc ``litellm.acompletion`` callers (e.g. the vision-
fallback subagent in ``caption_tool_image``) so they hit the same
proxy with the same auth as the main agent's ``LiteLLMProvider``.
The provider's own ``__init__`` keeps its inlined rewrite for now —
this helper is the single source of truth for ad-hoc callers.
Returns: (rewritten_model, normalised_api_base, extra_headers).
The ``extra_headers`` dict is non-empty only for the Hive proxy
(and only when ``api_key`` is provided).
"""
extra_headers: dict[str, str] = {}
if model.lower().startswith("kimi/"):
model = "anthropic/" + model[len("kimi/") :]
if api_base and api_base.rstrip("/").endswith("/v1"):
api_base = api_base.rstrip("/")[:-3]
elif model.lower().startswith("hive/"):
model = "anthropic/" + model[len("hive/") :]
if api_base and api_base.rstrip("/").endswith("/v1"):
api_base = api_base.rstrip("/")[:-3]
# Hive proxy expects Bearer auth; litellm's Anthropic handler
# only sends x-api-key without this nudge.
if api_key:
extra_headers["Authorization"] = f"Bearer {api_key}"
return model, api_base, extra_headers
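A hedged sketch of the ad-hoc caller path the docstring describes (``litellm.acompletion`` is real; the model name, key, and base URL are placeholders):

```python
import litellm

async def caption_via_proxy() -> None:
    model, api_base, extra_headers = rewrite_proxy_model(
        "hive/claude-sonnet-4",                # placeholder model id
        api_key="user-scoped-jwt",             # minted by the Hive proxy
        api_base="https://llm.adenhq.com/v1",  # trailing "/v1" gets stripped
    )
    # model == "anthropic/claude-sonnet-4"; extra_headers carries the
    # Bearer token that litellm's Anthropic handler would otherwise omit.
    await litellm.acompletion(
        model=model,
        api_key="user-scoped-jwt",
        api_base=api_base,
        extra_headers=extra_headers,
        messages=[{"role": "user", "content": "Describe the image."}],
    )
```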
RATE_LIMIT_MAX_RETRIES = 10
RATE_LIMIT_BACKOFF_BASE = 2 # seconds
RATE_LIMIT_MAX_DELAY = 120 # seconds - cap to prevent absurd waits
@@ -353,10 +415,17 @@ OPENROUTER_TOOL_COMPAT_MODEL_CACHE: dict[str, float] = {}
# from rate-limit retries — 3 retries is sufficient for connection failures.
STREAM_TRANSIENT_MAX_RETRIES = 3
# Directory for dumping failed requests
FAILED_REQUESTS_DIR = Path.home() / ".hive" / "failed_requests"
# Maximum number of dump files to retain in ~/.hive/failed_requests/.
# Directory for dumping failed requests. Resolved lazily so HIVE_HOME
# overrides (set by the desktop shell) take effect even if this module
# is imported before framework.config picks up the override.
def _failed_requests_dir() -> Path:
from framework.config import HIVE_HOME
return HIVE_HOME / "failed_requests"
# Maximum number of dump files to retain in $HIVE_HOME/failed_requests/.
# Older files are pruned automatically to prevent unbounded disk growth.
MAX_FAILED_REQUEST_DUMPS = 50
@@ -548,7 +617,7 @@ def _prune_failed_request_dumps(max_files: int = MAX_FAILED_REQUEST_DUMPS) -> No
"""
try:
all_dumps = sorted(
FAILED_REQUESTS_DIR.glob("*.json"),
_failed_requests_dir().glob("*.json"),
key=lambda f: f.stat().st_mtime,
)
excess = len(all_dumps) - max_files
@@ -583,11 +652,12 @@ def _dump_failed_request(
) -> str:
"""Dump failed request to a file for debugging. Returns the file path."""
try:
FAILED_REQUESTS_DIR.mkdir(parents=True, exist_ok=True)
dump_dir = _failed_requests_dir()
dump_dir.mkdir(parents=True, exist_ok=True)
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S_%f")
filename = f"{error_type}_{model.replace('/', '_')}_{timestamp}.json"
filepath = FAILED_REQUESTS_DIR / filename
filepath = dump_dir / filename
# Build dump data
messages = kwargs.get("messages", [])
@@ -617,7 +687,7 @@ def _dump_failed_request(
return str(filepath)
except OSError as e:
logger.warning(f"Failed to dump request debug log to {FAILED_REQUESTS_DIR}: {e}")
logger.warning(f"Failed to dump request debug log to {_failed_requests_dir()}: {e}")
return "log_write_failed"
@@ -931,6 +1001,7 @@ class LiteLLMProvider(LLMProvider):
# Translate kimi/ prefix to anthropic/ so litellm uses the Anthropic
# Messages API handler and routes to that endpoint — no special headers needed.
_original_model = model
self._hive_proxy_auth = bool(_original_model.lower().startswith("hive/"))
if _is_ollama_model(model):
model = _ensure_ollama_chat_prefix(model)
elif model.lower().startswith("kimi/"):
@@ -984,6 +1055,7 @@ class LiteLLMProvider(LLMProvider):
these attributes in-place propagates to all callers on the next LLM call.
"""
_original_model = model
self._hive_proxy_auth = bool(_original_model.lower().startswith("hive/"))
if _is_ollama_model(model):
model = _ensure_ollama_chat_prefix(model)
elif model.lower().startswith("kimi/"):
@@ -1223,6 +1295,16 @@ class LiteLLMProvider(LLMProvider):
# Ollama requires explicit tool_choice=auto for function calling
# so future readers don't have to guess.
kwargs.setdefault("tool_choice", "auto")
elif self._hive_proxy_auth:
# The Hive LLM proxy fronts GLM, which drifts into "explain
# the plan" mode on long-context turns instead of emitting
# tool_use blocks (verified 2026-04-28: tool_choice=null →
# text-only stop=stop; tool_choice=required → clean
# tool_use). Force a tool call when tools are available
# so queens can't get stuck in chat mode. Callers that
# legitimately want a non-tool turn can override via
# extra_kwargs.
kwargs.setdefault("tool_choice", "required")
# Add response_format for structured output
# LiteLLM passes this through to the underlying provider
@@ -1460,6 +1542,10 @@ class LiteLLMProvider(LLMProvider):
# Ollama requires explicit tool_choice=auto for function calling
# so future readers don't have to guess.
kwargs.setdefault("tool_choice", "auto")
elif self._hive_proxy_auth:
# See `complete()` for the rationale: GLM behind the Hive
# proxy needs forcing or it goes chat-mode on long contexts.
kwargs.setdefault("tool_choice", "required")
if response_format:
kwargs["response_format"] = response_format
@@ -2170,9 +2256,10 @@ class LiteLLMProvider(LLMProvider):
if logger.isEnabledFor(logging.DEBUG) and full_messages:
import json as _json
from datetime import datetime as _dt
from pathlib import Path as _Path
_debug_dir = _Path.home() / ".hive" / "debug_logs"
from framework.config import HIVE_HOME as _HIVE_HOME
_debug_dir = _HIVE_HOME / "debug_logs"
_debug_dir.mkdir(parents=True, exist_ok=True)
_ts = _dt.now().strftime("%Y%m%d_%H%M%S_%f")
_dump_file = _debug_dir / f"llm_request_{_ts}.json"
@@ -2243,6 +2330,10 @@ class LiteLLMProvider(LLMProvider):
# Ollama requires explicit tool_choice=auto for function calling
# so future readers don't have to guess.
kwargs.setdefault("tool_choice", "auto")
elif self._hive_proxy_auth:
# See `complete()` for the rationale: GLM behind the Hive
# proxy needs forcing or it goes chat-mode on long contexts.
kwargs.setdefault("tool_choice", "required")
if response_format:
kwargs["response_format"] = response_format
# The Codex ChatGPT backend (Responses API) rejects several params.
+3 -3
View File
@@ -9,7 +9,7 @@ from datetime import UTC
from pathlib import Path
from typing import Any
from framework.config import get_hive_config, get_preferred_model
from framework.config import HIVE_HOME as _HIVE_HOME, get_hive_config, get_preferred_model
from framework.credentials.validation import (
ensure_credential_key_env as _ensure_credential_key_env,
)
@@ -558,7 +558,7 @@ ANTIGRAVITY_IDE_STATE_DB = (
# Linux fallback for the IDE state DB
ANTIGRAVITY_IDE_STATE_DB_LINUX = Path.home() / ".config" / "Antigravity" / "User" / "globalStorage" / "state.vscdb"
# Antigravity credentials stored by native OAuth implementation
ANTIGRAVITY_AUTH_FILE = Path.home() / ".hive" / "antigravity-accounts.json"
ANTIGRAVITY_AUTH_FILE = _HIVE_HOME / "antigravity-accounts.json"
ANTIGRAVITY_OAUTH_TOKEN_URL = "https://oauth2.googleapis.com/token"
_ANTIGRAVITY_TOKEN_LIFETIME_SECS = 3600 # Google access tokens expire in 1 hour
@@ -1389,7 +1389,7 @@ class AgentLoader:
)
if storage_path is None:
storage_path = Path.home() / ".hive" / "agents" / agent_path.name / worker_name
storage_path = _HIVE_HOME / "agents" / agent_path.name / worker_name
storage_path.mkdir(parents=True, exist_ok=True)
runner = cls(
+57 -13
View File
@@ -51,6 +51,14 @@ _DEFAULT_LOCAL_SERVERS: dict[str, dict[str, Any]] = {
"description": "File I/O: read, write, edit, search, list, run commands",
"args": ["run", "python", "files_server.py", "--stdio"],
},
"terminal-tools": {
"description": "Terminal capabilities",
"args": ["run", "python", "terminal_tools_server.py", "--stdio"],
},
"chart-tools": {
"description": "BI/financial chart + diagram rendering: ECharts, Mermaid",
"args": ["run", "python", "chart_tools_server.py", "--stdio"],
},
}
# Aliases that earlier versions of ensure_defaults wrote under the wrong name.
@@ -58,14 +66,22 @@ _DEFAULT_LOCAL_SERVERS: dict[str, dict[str, Any]] = {
# name so the active agents (queen, credential_tester) can find their tools.
_STALE_DEFAULT_ALIASES: dict[str, str] = {
"hive_tools": "hive-tools",
# 2026-04-30: shell-tools renamed to terminal-tools. Drop the stale name
# on next ensure_defaults() so the queen's allowlist (which now includes
# @server:terminal-tools) actually finds a server with the new name.
"terminal-tools": "shell-tools",
}
class MCPRegistry:
"""Manages local MCP server state in ~/.hive/mcp_registry/."""
"""Manages local MCP server state in $HIVE_HOME/mcp_registry/."""
def __init__(self, base_path: Path | None = None):
self._base = base_path or Path.home() / ".hive" / "mcp_registry"
if base_path is None:
from framework.config import HIVE_HOME
base_path = HIVE_HOME / "mcp_registry"
self._base = base_path
self._installed_path = self._base / "installed.json"
self._config_path = self._base / "config.json"
self._cache_dir = self._base / "cache"
@@ -73,7 +89,30 @@ class MCPRegistry:
# ── Initialization ──────────────────────────────────────────────
def initialize(self) -> None:
"""Create directory structure and default files if missing."""
"""Create directory structure, default files, and seed bundled servers.
Every read path (queen orchestrator, pipeline stage, CLI, routes)
calls this; keeping the seeding here means a fresh ``HIVE_HOME``
(e.g. the desktop's per-user dir under ``~/.config/Hive/users/<hash>/``
or ``~/Library/Application Support/Hive/users/<hash>/``) is always
populated with ``hive_tools`` / ``gcu-tools`` / ``files-tools`` /
``shell-tools`` before any agent code reads ``installed.json``.
Without this, ``load_agent_selection()`` resolves an empty registry
and emits "Server X requested but not installed" warnings even
though the server is bundled.
Idempotent: already-installed entries are left untouched.
"""
self._bootstrap_io()
self._seed_defaults()
def _bootstrap_io(self) -> None:
"""Create the registry directory + empty config/installed files.
Split out from ``initialize()`` so ``_seed_defaults()`` can call it
without re-entering the seeding logic (which would recurse via
``_read_installed()`` → ``initialize()``).
"""
self._base.mkdir(parents=True, exist_ok=True)
self._cache_dir.mkdir(parents=True, exist_ok=True)
@@ -84,21 +123,26 @@ class MCPRegistry:
self._write_json(self._installed_path, {"servers": {}})
def ensure_defaults(self) -> list[str]:
"""Seed the built-in local MCP servers (hive-tools, gcu-tools, files-tools).
"""Public alias kept for the ``hive mcp init`` CLI command.
Idempotent: servers already present are left untouched. Skips seeding
entirely when the source-tree ``tools/`` directory cannot be located
(e.g. when Hive is installed from a wheel rather than a checkout).
Returns the list of names that were newly registered.
Returns the list of newly-registered server names so the CLI can
print them. Same idempotent seeding logic as ``initialize()``.
"""
self.initialize()
self._bootstrap_io()
return self._seed_defaults()
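Both entry points funnel into the same idempotent seeding; a minimal sketch:

```python
registry = MCPRegistry()            # base defaults to $HIVE_HOME/mcp_registry
registry.initialize()               # bootstrap I/O, then seed bundled servers
newly = registry.ensure_defaults()  # CLI path: [] here, everything already seeded
```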
def _seed_defaults(self) -> list[str]:
"""Idempotently register the bundled default local servers.
Skips entirely when the source-tree ``tools/`` directory cannot
be located (e.g. wheel installs). Returns the list of names that
were newly registered.
"""
# parents: [0]=loader, [1]=framework, [2]=core, [3]=repo root
tools_dir = Path(__file__).resolve().parents[3] / "tools"
if not tools_dir.is_dir():
logger.debug(
"MCPRegistry.ensure_defaults: tools dir %s missing; skipping default seed",
"MCPRegistry._seed_defaults: tools dir %s missing; skipping default seed",
tools_dir,
)
return []
@@ -115,7 +159,7 @@ class MCPRegistry:
for canonical, stale in _STALE_DEFAULT_ALIASES.items():
if stale in existing and canonical not in existing:
logger.info(
"MCPRegistry.ensure_defaults: removing stale alias '%s' (canonical: '%s')",
"MCPRegistry._seed_defaults: removing stale alias '%s' (canonical: '%s')",
stale,
canonical,
)
@@ -138,7 +182,7 @@ class MCPRegistry:
)
added.append(name)
except MCPError as exc:
logger.warning("MCPRegistry.ensure_defaults: failed to seed '%s': %s", name, exc)
logger.warning("MCPRegistry._seed_defaults: failed to seed '%s': %s", name, exc)
if added:
logger.info("MCPRegistry: seeded default local servers: %s", added)
+23 -30
View File
@@ -71,25 +71,36 @@ class ToolRegistry:
{
# File system reads
"read_file",
"list_directory",
"grep",
"glob",
# Web reads
"web_search",
"web_fetch",
"search_files",
"pdf_read",
# Terminal reads (rg / find / output buffer polling; none of them
# changes process state)
"terminal_rg",
"terminal_find",
"terminal_output_get",
# Web / research reads (re-issuable, side-effect-free fetches)
"web_scrape",
"search_papers",
"search_wikipedia",
"download_paper",
# Browser read-only snapshots (mutate-free observations)
"browser_screenshot",
"browser_snapshot",
"browser_console",
"browser_get_text",
# Background bash polling - reads output buffers only, does
# not touch the subprocess itself.
"bash_output",
"browser_html",
"browser_get_attribute",
"browser_get_rect",
}
)
# Credential directory used for change detection
_CREDENTIAL_DIR = Path("~/.hive/credentials/credentials").expanduser()
# Credential directory used for change detection. Resolved at attribute
# access so HIVE_HOME overrides (set by the desktop) are honoured.
@property
def _CREDENTIAL_DIR(self) -> Path:
from framework.config import HIVE_HOME
return HIVE_HOME / "credentials" / "credentials"
def __init__(self):
self._tools: dict[str, RegisteredTool] = {}
@@ -457,7 +468,7 @@ class ToolRegistry:
else:
resolved_cwd = (base_dir / cwd).resolve()
# Find .py script in args (e.g. coder_tools_server.py, files_server.py)
# Find .py script in args (e.g. files_server.py)
script_name = None
for i, arg in enumerate(args):
if isinstance(arg, str) and arg.endswith(".py"):
@@ -497,24 +508,6 @@ class ToolRegistry:
config["cwd"] = str(resolved_cwd)
return config
# For coder_tools_server, inject --project-root so reads land
# in the expected workspace (hive repo, for framework skills
# and docs), and inject --write-root so writes land under
# ~/.hive/workspace/ instead of polluting the git checkout
# with queen-authored skills, ledgers, and scripts. Without
# the split, every ``write_file`` call from the queen landed
# in the hive repo root.
if script_name and "coder_tools" in script_name:
project_root = str(resolved_cwd.parent.resolve())
args = list(args)
if "--project-root" not in args:
args.extend(["--project-root", project_root])
if "--write-root" not in args:
_write_root = Path.home() / ".hive" / "workspace"
_write_root.mkdir(parents=True, exist_ok=True)
args.extend(["--write-root", str(_write_root)])
config["args"] = args
if os.name == "nt":
# Windows: cwd=None avoids WinError 267; use absolute script path
config["cwd"] = None
-2
View File
@@ -29,9 +29,7 @@ _ALWAYS_AVAILABLE_TOOLS: frozenset[str] = frozenset(
"read_file",
"write_file",
"edit_file",
"list_directory",
"search_files",
"hashline_edit",
"set_output",
"escalate",
}
+5 -4
View File
@@ -35,7 +35,7 @@ Follow these rules for reliable, efficient browser interaction.
Use snapshot first for structure and ordinary controls; switch to
screenshot when snapshot can't find or verify the target. Interaction
tools (`browser_click`, `browser_type`, `browser_type_focused`,
`browser_fill`, `browser_scroll`) wait 0.5 s for the page to settle
`browser_scroll`) wait 0.5 s for the page to settle
after a successful action, then attach a fresh snapshot under the
`snapshot` key of their result, so don't call `browser_snapshot`
separately after an interaction unless you need a newer view. Tune
@@ -140,8 +140,9 @@ shortcut dispatcher requires both), then releases in reverse order.
## Tab management
Close tabs as soon as you're done with them — not only at the end of
the task. `browser_close(target_id=...)` for one, `browser_close_finished()`
for a full cleanup. Never accumulate more than 3 open tabs.
the task. Use `browser_close(tab_id=...)` (or no arg to close the
active tab); call it for each tab when cleaning up after a multi-tab
workflow. Never accumulate more than 3 open tabs.
`browser_tabs` reports an `origin` field: `"agent"` (you own it, close
when done), `"popup"` (close after extracting), `"startup"`/`"user"`
(leave alone).
@@ -157,7 +158,7 @@ cookie consent banners if they block content.
- If `browser_snapshot` fails, try `browser_get_text` with a narrow
selector as fallback.
- If `browser_open` fails or the page seems stale, `browser_stop`
`browser_start` retry.
`browser_open(url)` to lazy-create a fresh context.
## `browser_evaluate`
+8 -9
View File
@@ -331,10 +331,10 @@ class Orchestrator:
# Strip tool names that aren't registered in this runtime instead of
# hard-failing. The worker is forked from the queen's tool snapshot
# which may include MCP tools the worker's runtime doesn't load (e.g.
# coder-tools agent-management tools). Blocking the worker on missing
# tools leaves the queen stranded mid-task; stripping + warning lets
# the worker proceed with what it does have.
# which may include MCP tools the worker's runtime doesn't load.
# Blocking the worker on missing tools leaves the queen stranded
# mid-task; stripping + warning lets the worker proceed with what
# it does have.
for node in graph.nodes:
if node.id not in reachable:
continue
@@ -683,11 +683,10 @@ class Orchestrator:
# Set per-execution data_dir and agent_id so data tools and
# spillover files share the same session-scoped directory, and
# so MCP tools whose server-side schemas mark agent_id as a
# required field (list_dir, hashline_edit, replace_file_content,
# execute_command_tool, …) get a valid value injected even on
# registry instances where agent_loader.setup() didn't populate
# the session_context. Without this, FastMCP rejects those
# calls with "agent_id is a required property".
# required field get a valid value injected even on registry
# instances where agent_loader.setup() didn't populate the
# session_context. Without this, FastMCP rejects those calls
# with "agent_id is a required property".
_ctx_token = None
if self._storage_path:
from framework.loader.tool_registry import ToolRegistry
@@ -44,6 +44,9 @@ class McpRegistryStage(PipelineStage):
from framework.loader.mcp_registry import MCPRegistry
from framework.orchestrator.files import FILES_MCP_SERVER_NAME
# Bundled defaults (hive_tools / gcu-tools / files-tools / shell-tools)
# are seeded inside MCPRegistry.initialize(); resolve_for_agent below
# will find them even on a fresh HIVE_HOME.
registry = MCPRegistry()
mcp_loaded = False
+60 -11
View File
@@ -1,5 +1,6 @@
"""aiohttp Application factory for the Hive HTTP API server."""
import hmac
import logging
import os
from pathlib import Path
@@ -21,7 +22,9 @@ _ALLOWED_AGENT_ROOTS: tuple[Path, ...] | None = None
def _has_encrypted_credentials() -> bool:
"""Return True when an encrypted credential store already exists on disk."""
cred_dir = Path.home() / ".hive" / "credentials" / "credentials"
from framework.config import HIVE_HOME
cred_dir = HIVE_HOME / "credentials" / "credentials"
return cred_dir.is_dir() and any(cred_dir.glob("*.enc"))
@@ -30,17 +33,18 @@ def _get_allowed_agent_roots() -> tuple[Path, ...]:
Roots are anchored to the repository root (derived from ``__file__``)
so the allowlist is correct regardless of the process's working
directory.
directory. The hive-home subtrees honour ``HIVE_HOME`` so the desktop's
per-user root is allowed in addition to (or instead of) ``~/.hive``.
"""
global _ALLOWED_AGENT_ROOTS
if _ALLOWED_AGENT_ROOTS is None:
from framework.config import COLONIES_DIR
from framework.config import COLONIES_DIR, HIVE_HOME
_ALLOWED_AGENT_ROOTS = (
COLONIES_DIR.resolve(), # ~/.hive/colonies/
COLONIES_DIR.resolve(), # $HIVE_HOME/colonies/
(_REPO_ROOT / "exports").resolve(), # compat fallback
(_REPO_ROOT / "examples").resolve(),
(Path.home() / ".hive" / "agents").resolve(),
(HIVE_HOME / "agents").resolve(),
)
return _ALLOWED_AGENT_ROOTS
@@ -62,7 +66,8 @@ def validate_agent_path(agent_path: str | Path) -> Path:
if resolved.is_relative_to(root) and resolved != root:
return resolved
raise ValueError(
"agent_path must be inside an allowed directory (~/.hive/colonies/, exports/, examples/, or ~/.hive/agents/)"
"agent_path must be inside an allowed directory "
"($HIVE_HOME/colonies/, exports/, examples/, or $HIVE_HOME/agents/)"
)
@@ -94,13 +99,15 @@ def resolve_session(request: web.Request):
def sessions_dir(session: Session) -> Path:
"""Resolve the worker sessions directory for a session.
Storage layout: ~/.hive/agents/{agent_name}/sessions/
Storage layout: $HIVE_HOME/agents/{agent_name}/sessions/
Requires a worker to be loaded (worker_path must be set).
"""
if session.worker_path is None:
raise ValueError("No worker loaded — no worker sessions directory")
from framework.config import HIVE_HOME
agent_name = session.worker_path.name
return Path.home() / ".hive" / "agents" / agent_name / "sessions"
return HIVE_HOME / "agents" / agent_name / "sessions"
# Allowed CORS origins (localhost on any port)
@@ -159,6 +166,28 @@ async def no_cache_api_middleware(request: web.Request, handler):
return response
# ---------------------------------------------------------------------------
# Desktop shared-secret auth middleware.
#
# When the runtime is spawned by the Electron main process, a fresh random
# token is passed via ``HIVE_DESKTOP_TOKEN``. Every request from main must
# carry the matching ``X-Hive-Token`` header. If the env var is unset (e.g.
# running ``hive serve`` directly from a terminal), the check is skipped —
# OSS behaviour is preserved.
# ---------------------------------------------------------------------------
_EXPECTED_DESKTOP_TOKEN: str | None = os.environ.get("HIVE_DESKTOP_TOKEN") or None
@web.middleware
async def desktop_auth_middleware(request: web.Request, handler):
if _EXPECTED_DESKTOP_TOKEN is None:
return await handler(request)
provided = request.headers.get("X-Hive-Token", "")
if not hmac.compare_digest(provided, _EXPECTED_DESKTOP_TOKEN):
return web.json_response({"error": "unauthorized"}, status=401)
return await handler(request)
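For illustration, how a spawning process would authenticate against this middleware (the real caller is the Electron main process in JS; this Python client, the port, and the route are assumptions):

```python
import os
import aiohttp

async def ping_runtime() -> bool:
    token = os.environ["HIVE_DESKTOP_TOKEN"]  # same value passed to the runtime
    async with aiohttp.ClientSession() as session:
        async with session.get(
            "http://127.0.0.1:9000/api/health",  # placeholder host/route
            headers={"X-Hive-Token": token},     # checked via compare_digest
        ) as resp:
            return resp.status != 401
```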
@web.middleware
async def error_middleware(request: web.Request, handler):
"""Catch exceptions and return JSON error responses.
@@ -287,7 +316,12 @@ def create_app(model: str | None = None) -> web.Application:
Returns:
Configured aiohttp Application ready to run.
"""
app = web.Application(middlewares=[cors_middleware, no_cache_api_middleware, error_middleware])
# Desktop mode: the runtime is always a subprocess of the Electron main
# process, which reaches it via IPC and the `hive://` custom protocol.
# There is no browser origin to authorize, so CORS is unnecessary.
# The auth middleware enforces the shared-secret token when the env var
# is set (i.e. when Electron spawned us); it is a no-op otherwise.
app = web.Application(middlewares=[desktop_auth_middleware, no_cache_api_middleware, error_middleware])
# Initialize credential store (before SessionManager so it can be shared)
from framework.credentials.store import CredentialStore
@@ -392,8 +426,23 @@ def create_app(model: str | None = None) -> web.Application:
register_skills_routes(app)
register_task_routes(app)
# Static file serving — Option C production mode
# If frontend/dist/ exists, serve built frontend files on /
# Commercial extensions (optional — only present in hive-desktop-runtime).
# Imported lazily so an OSS install without the `commercial` package keeps
# working unchanged.
try:
from commercial.middleware import setup_commercial_middleware
from commercial.routes import register_routes as register_commercial_routes
setup_commercial_middleware(app)
register_commercial_routes(app)
logger.info("Commercial extensions loaded")
except ImportError:
pass
# Serve the built frontend SPA (if frontend/dist exists) so hitting the
# API host in a browser loads the dashboard instead of 404'ing. In
# Electron/desktop mode the renderer still loads from file:// and
# ignores this; in dev mode Vite is used instead.
_setup_static_serving(app)
return app
+4 -18
View File
@@ -371,7 +371,6 @@ async def create_queen(
_queen_role_independent,
_queen_role_reviewing,
_queen_role_working,
_queen_style,
_queen_tools_incubating,
_queen_tools_independent,
_queen_tools_reviewing,
@@ -544,7 +543,7 @@ async def create_queen(
phase_state.incubating_tools = [t for t in queen_tools if t.name in incubating_names]
# Independent phase gets core tools + all MCP tools not claimed by any
# other phase (coder-tools file I/O, gcu-tools browser, etc.).
# other phase (files-tools file I/O, gcu-tools browser, etc.).
all_phase_names = independent_names | incubating_names | working_names | reviewing_names
mcp_tools = [t for t in queen_tools if t.name not in all_phase_names]
phase_state.independent_tools = [t for t in queen_tools if t.name in independent_names] + mcp_tools
@@ -646,7 +645,6 @@ async def create_queen(
(
_queen_character_core
+ _queen_role_independent
+ _queen_style
+ _queen_tools_independent
+ _queen_behavior_always
+ _queen_behavior_independent
@@ -654,27 +652,15 @@ async def create_queen(
_has_vision,
)
phase_state.prompt_incubating = finalize_queen_prompt(
(
_queen_character_core
+ _queen_role_incubating
+ _queen_style
+ _queen_tools_incubating
+ _queen_behavior_always
),
(_queen_character_core + _queen_role_incubating + _queen_tools_incubating + _queen_behavior_always),
_has_vision,
)
phase_state.prompt_working = finalize_queen_prompt(
(_queen_character_core + _queen_role_working + _queen_style + _queen_tools_working + _queen_behavior_always),
(_queen_character_core + _queen_role_working + _queen_tools_working + _queen_behavior_always),
_has_vision,
)
phase_state.prompt_reviewing = finalize_queen_prompt(
(
_queen_character_core
+ _queen_role_reviewing
+ _queen_style
+ _queen_tools_reviewing
+ _queen_behavior_always
),
(_queen_character_core + _queen_role_reviewing + _queen_tools_reviewing + _queen_behavior_always),
_has_vision,
)
+18 -28
View File
@@ -57,6 +57,7 @@ _SESSION_SEGMENT_RE = re.compile(r"^[a-z0-9_]+$")
# pushed wholesale anyway.
_MAX_UPLOAD_BYTES = 100 * 1024 * 1024
def _agents_dir() -> Path:
"""``COLONIES_DIR`` resolves to ``HIVE_HOME/colonies``; ``agents/`` is
the sibling. Resolved per-call so tests that monkeypatch
@@ -129,9 +130,7 @@ def _normalise_member_name(name: str) -> str:
return name
def _safe_extract_tar(
tf: tarfile.TarFile, dest: Path, *, strip_prefix: str
) -> tuple[int, str | None]:
def _safe_extract_tar(tf: tarfile.TarFile, dest: Path, *, strip_prefix: str) -> tuple[int, str | None]:
"""Extract every member of ``tf`` whose name starts with ``strip_prefix/``
into ``dest``, with the prefix stripped off.
@@ -159,7 +158,7 @@ def _safe_extract_tar(
if not name.startswith(prefix_with_sep):
# Belongs to a different root in a multi-root tar; skip.
continue
rel = name[len(prefix_with_sep):]
rel = name[len(prefix_with_sep) :]
else:
rel = name
if not rel:
@@ -299,9 +298,7 @@ async def _read_upload(
) -> tuple[bytes | None, str | None, dict[str, str], web.Response | None]:
"""Drain the multipart upload. Returns ``(bytes, filename, form, error)``."""
if not request.content_type.startswith("multipart/"):
return None, None, {}, web.json_response(
{"error": "expected multipart/form-data"}, status=400
)
return None, None, {}, web.json_response({"error": "expected multipart/form-data"}, status=400)
reader = await request.multipart()
upload: bytes | None = None
upload_filename: str | None = None
@@ -318,18 +315,21 @@ async def _read_upload(
break
buf.write(chunk)
if buf.tell() > _MAX_UPLOAD_BYTES:
return None, None, {}, web.json_response(
{"error": f"upload exceeds {_MAX_UPLOAD_BYTES} bytes"},
status=413,
return (
None,
None,
{},
web.json_response(
{"error": f"upload exceeds {_MAX_UPLOAD_BYTES} bytes"},
status=413,
),
)
upload = buf.getvalue()
upload_filename = part.filename or ""
else:
form[part.name or ""] = (await part.text()).strip()
if upload is None:
return None, None, {}, web.json_response(
{"error": "missing 'file' part"}, status=400
)
return None, None, {}, web.json_response({"error": "missing 'file' part"}, status=400)
return upload, upload_filename, form, None
@@ -346,18 +346,12 @@ async def handle_import_colony(request: web.Request) -> web.Response:
try:
tf = tarfile.open(fileobj=io.BytesIO(upload), mode="r:*")
except tarfile.TarError as err:
return web.json_response(
{"error": f"invalid tar archive: {err}"}, status=400
)
return web.json_response({"error": f"invalid tar archive: {err}"}, status=400)
try:
if _has_multi_root_prefix(tf):
return await _import_multi_root(
tf, replace_existing, upload_filename, len(upload)
)
return await _import_legacy_single_root(
tf, name_override, replace_existing, upload_filename, len(upload)
)
return await _import_multi_root(tf, replace_existing, upload_filename, len(upload))
return await _import_legacy_single_root(tf, name_override, replace_existing, upload_filename, len(upload))
finally:
tf.close()
@@ -478,15 +472,11 @@ async def _import_multi_root(
for kind in ("colonies", "agents_worker", "agents_queen"):
for prefix, dest in plan[kind].items():
target = Path(dest)
files_extracted, extract_err = _safe_extract_tar(
tf, target, strip_prefix=prefix
)
files_extracted, extract_err = _safe_extract_tar(tf, target, strip_prefix=prefix)
if extract_err:
return _abort(extract_err)
summary.setdefault(kind, {"files": 0})
summary[kind]["files"] = (
int(summary[kind].get("files", 0)) + files_extracted
)
summary[kind]["files"] = int(summary[kind].get("files", 0)) + files_extracted
extracted_dests.append(target)
total_files = sum(int(v.get("files", 0)) for v in summary.values())
@@ -235,10 +235,6 @@ _SYSTEM_TOOLS: frozenset[str] = frozenset(
{
"get_account_info",
"get_current_time",
"bash_kill",
"bash_output",
"execute_command_tool",
"example_tool",
}
)
@@ -294,7 +290,9 @@ def _resolve_progress_db_by_name(colony_name: str) -> Path | None:
"""
if not _COLONY_NAME_RE.match(colony_name):
return None
db_path = Path.home() / ".hive" / "colonies" / colony_name / "data" / "progress.db"
from framework.config import COLONIES_DIR
db_path = COLONIES_DIR / colony_name / "data" / "progress.db"
return db_path if db_path.exists() else None
+10 -8
View File
@@ -42,12 +42,11 @@ _WORKER_INHERITED_TOOLS: frozenset[str] = frozenset(
"read_file",
"write_file",
"edit_file",
"hashline_edit",
"list_directory",
"search_files",
"undo_changes",
# Shell
"run_command",
# Terminal (basics — exec + ripgrep + glob/find)
"terminal_exec",
"terminal_rg",
"terminal_find",
# Framework synthetics (always available to any AgentLoop node)
"set_output",
"escalate",
@@ -1181,7 +1180,6 @@ async def fork_session_into_colony(
import json
import shutil
from datetime import datetime
from pathlib import Path
from framework.agent_loop.agent_loop import AgentLoop, LoopConfig
from framework.agent_loop.types import AgentContext
@@ -1245,7 +1243,9 @@ async def fork_session_into_colony(
# would wrongly flag every fresh colony as "already-exists" if we
# used ``not colony_dir.exists()``. A colony is "new" until its
# worker config has actually been written.
colony_dir = Path.home() / ".hive" / "colonies" / colony_name
from framework.config import COLONIES_DIR
colony_dir = COLONIES_DIR / colony_name
worker_name = "worker"
worker_config_path = colony_dir / f"{worker_name}.json"
is_new = not worker_config_path.exists()
@@ -1469,7 +1469,9 @@ async def fork_session_into_colony(
compaction_status.mark_in_progress(dest_queen_dir)
_worker_storage = Path.home() / ".hive" / "agents" / colony_name / worker_name
from framework.config import HIVE_HOME
_worker_storage = HIVE_HOME / "agents" / colony_name / worker_name
_dest_queen_dir = dest_queen_dir
_queen_ctx = queen_ctx
_queen_loop = queen_loop
@@ -35,6 +35,11 @@ from framework.agents.queen.queen_tools_config import (
tools_config_exists,
update_queen_tools_config,
)
from framework.agents.queen.queen_tools_defaults import (
list_category_names,
queen_role_categories,
resolve_category_tools,
)
logger = logging.getLogger(__name__)
@@ -326,10 +331,36 @@ async def handle_get_tools(request: web.Request) -> web.Response:
mcp_tool_names_by_server=catalog,
enabled_mcp_tools=enabled_mcp_tools,
),
"categories": _render_categories(queen_id, catalog),
}
return web.json_response(response)
def _render_categories(
queen_id: str,
mcp_catalog: dict[str, list[dict[str, Any]]],
) -> list[dict[str, Any]]:
"""Expose the role-default category table to the frontend.
Each entry carries the category name, the resolved member tool names
(after ``@server:NAME`` shorthand expansion against the live catalog),
and ``in_role_default`` to flag categories that contribute to this
queen's role-based default. Lets the Tool Library group tools by
category alongside the per-server view.
"""
applied = set(queen_role_categories(queen_id))
out: list[dict[str, Any]] = []
for name in list_category_names():
out.append(
{
"name": name,
"tools": resolve_category_tools(name, mcp_catalog),
"in_role_default": name in applied,
}
)
return out
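An illustrative slice of the resulting payload (names and flags mirror the queen_technology tests later in this change; the values are not captured output):

```python
[
    {
        "name": "file_ops",
        "tools": ["read_file", "edit_file", "pdf_read"],  # after @server: expansion
        "in_role_default": True,   # file_ops feeds queen_technology's default
    },
    {
        "name": "security",
        "tools": ["port_scan"],
        "in_role_default": False,  # dropped from queen_technology's default
    },
]
```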
async def handle_patch_tools(request: web.Request) -> web.Response:
"""PATCH /api/queen/{queen_id}/tools — persist the MCP tool allowlist.
+4 -2
View File
@@ -1094,8 +1094,10 @@ async def handle_delete_agent(request: web.Request) -> web.Response:
except ValueError as exc:
return web.json_response({"error": str(exc)}, status=400)
# Reject deletion of framework agents (~/.hive/agents/) — those are internal
hive_agents_dir = Path.home() / ".hive" / "agents"
# Reject deletion of framework agents ($HIVE_HOME/agents/) — those are internal
from framework.config import HIVE_HOME
hive_agents_dir = HIVE_HOME / "agents"
if resolved.is_relative_to(hive_agents_dir):
return web.json_response({"error": "Cannot delete framework agents"}, status=403)
+3 -1
View File
@@ -936,7 +936,9 @@ async def handle_upload_skill(request: web.Request) -> web.Response:
# Resolve the write target
if scope_kind == "user":
write_dir = Path.home() / ".hive" / "skills"
from framework.config import HIVE_HOME
write_dir = HIVE_HOME / "skills"
overrides_path: Path | None = None
store: SkillOverrideStore | None = None
affected_runtimes: list = []
+2 -4
View File
@@ -67,11 +67,9 @@ async def handle_list_nodes(request: web.Request) -> web.Response:
worker_session_id = request.query.get("session_id")
if worker_session_id and session.worker_path:
worker_session_id = safe_path_segment(worker_session_id)
from pathlib import Path
from framework.config import HIVE_HOME
state_path = (
Path.home() / ".hive" / "agents" / session.worker_path.name / "sessions" / worker_session_id / "state.json"
)
state_path = HIVE_HOME / "agents" / session.worker_path.name / "sessions" / worker_session_id / "state.json"
if state_path.exists():
try:
state = json.loads(state_path.read_text(encoding="utf-8"))
+7 -3
View File
@@ -546,8 +546,10 @@ class SessionManager:
session.colony_name = colony_id
session.worker_path = agent_path
# Worker storage: ~/.hive/agents/{colony_name}/{worker_name}/
worker_storage = Path.home() / ".hive" / "agents" / colony_id / worker_name
# Worker storage: $HIVE_HOME/agents/{colony_name}/{worker_name}/
from framework.config import HIVE_HOME
worker_storage = HIVE_HOME / "agents" / colony_id / worker_name
worker_storage.mkdir(parents=True, exist_ok=True)
# Copy conversations from colony if fresh
@@ -927,7 +929,9 @@ class SessionManager:
that process is still running on the host. If it is, the session is
owned by another healthy worker process, so leave it alone.
"""
sessions_path = Path.home() / ".hive" / "agents" / agent_path.name / "sessions"
from framework.config import HIVE_HOME
sessions_path = HIVE_HOME / "agents" / agent_path.name / "sessions"
if not sessions_path.exists():
return
@@ -105,9 +105,7 @@ async def test_happy_path_imports_colony(colonies_dir: Path) -> None:
async def test_name_override(colonies_dir: Path) -> None:
archive = _build_tar({"x_daily/": None, "x_daily/file.txt": b"hi"})
async with await _client(_app()) as c:
resp = await c.post(
"/api/colonies/import", data=_form(archive, name="other_name")
)
resp = await c.post("/api/colonies/import", data=_form(archive, name="other_name"))
assert resp.status == 201
body = await resp.json()
assert body["name"] == "other_name"
@@ -202,7 +200,9 @@ async def test_rejects_invalid_colony_name(colonies_dir: Path) -> None:
@pytest.mark.asyncio
async def test_rejects_non_multipart(colonies_dir: Path) -> None:
async with await _client(_app()) as c:
resp = await c.post("/api/colonies/import", data=b"not multipart", headers={"Content-Type": "application/octet-stream"})
resp = await c.post(
"/api/colonies/import", data=b"not multipart", headers={"Content-Type": "application/octet-stream"}
)
assert resp.status == 400
@@ -266,7 +266,9 @@ async def test_multi_root_unpacks_three_subtrees(colonies_dir: Path) -> None:
assert (colonies_dir / "x_daily" / "data" / "progress.db").exists()
# Worker conversations under HIVE_HOME/agents/<colony>/worker/
hive_home = colonies_dir.parent
assert (hive_home / "agents" / "x_daily" / "worker" / "conversations" / "0001.json").read_bytes() == b'{"role":"user"}'
assert (
hive_home / "agents" / "x_daily" / "worker" / "conversations" / "0001.json"
).read_bytes() == b'{"role":"user"}'
# Queen forked session under HIVE_HOME/agents/queens/<queen>/sessions/<sid>/
assert (hive_home / "agents" / "queens" / "queen_alpha" / "sessions" / "session_x" / "queen.json").exists()
# Summary in response
@@ -131,7 +131,7 @@ async def test_get_tools_default_allow(colony_dir):
_, name = colony_dir
manager = _FakeManager(
_mcp_tool_catalog={
"coder-tools": [
"files-tools": [
{"name": "read_file", "description": "read", "input_schema": {}},
{"name": "write_file", "description": "write", "input_schema": {}},
],
@@ -153,7 +153,7 @@ async def test_patch_persists_and_validates(colony_dir):
colonies_dir, name = colony_dir
manager = _FakeManager(
_mcp_tool_catalog={
"coder-tools": [
"files-tools": [
{"name": "read_file", "description": "", "input_schema": {}},
{"name": "write_file", "description": "", "input_schema": {}},
]
@@ -201,7 +201,7 @@ async def test_patch_refreshes_live_runtime(colony_dir):
manager = _FakeManager(
_sessions={session.id: session},
_mcp_tool_catalog={
"coder-tools": [
"files-tools": [
{"name": "read_file", "description": "", "input_schema": {}},
{"name": "write_file", "description": "", "input_schema": {}},
]
@@ -132,7 +132,7 @@ async def test_list_servers_returns_built_in(registry):
assert "built-in-seed" in names
sources = {s["name"]: s["source"] for s in body["servers"]}
assert sources.get("built-in-seed") == "registry"
# The package-baked servers (coder-tools/gcu-tools/hive_tools) carry
# The package-baked servers (files-tools/gcu-tools/hive_tools) carry
# source=="built-in" and are non-removable.
pkg_entries = [s for s in body["servers"] if s["source"] == "built-in"]
assert pkg_entries, "expected at least one package-baked MCP server"
+69 -26
View File
@@ -159,7 +159,7 @@ async def test_get_tools_default_allows_everything_for_unknown_queen(queen_dir,
manager = _FakeManager()
manager._mcp_tool_catalog = {
"coder-tools": [
"files-tools": [
{"name": "read_file", "description": "read", "input_schema": {}},
{"name": "write_file", "description": "write", "input_schema": {}},
],
@@ -175,8 +175,8 @@ async def test_get_tools_default_allows_everything_for_unknown_queen(queen_dir,
assert body["is_role_default"] is True # no sidecar → default-allow
assert body["stale"] is False
servers = {s["name"]: s for s in body["mcp_servers"]}
assert set(servers) == {"coder-tools"}
for tool in servers["coder-tools"]["tools"]:
assert set(servers) == {"files-tools"}
for tool in servers["files-tools"]["tools"]:
assert tool["enabled"] is True
@@ -187,13 +187,16 @@ async def test_get_tools_applies_role_default(queen_dir, monkeypatch):
_, queen_id = queen_dir # queen_technology — has a role default
manager = _FakeManager()
# Seed a catalog covering tools the role default references so the
# response reflects what the queen would actually see on boot.
# Seed two MCP servers: files-tools is referenced by the technology
# role via the @server:files-tools shorthand in `file_ops`, so its
# tools should bubble into the default. unrelated-server is NOT
# referenced by any role category — its tools must NOT leak in.
manager._mcp_tool_catalog = {
"coder-tools": [
"files-tools": [
{"name": "read_file", "description": "", "input_schema": {}},
{"name": "port_scan", "description": "", "input_schema": {}}, # security
{"name": "excel_read", "description": "", "input_schema": {}}, # data
{"name": "edit_file", "description": "", "input_schema": {}},
],
"unrelated-server": [
{"name": "fluffy_unknown_tool", "description": "", "input_schema": {}},
],
}
@@ -204,32 +207,66 @@ async def test_get_tools_applies_role_default(queen_dir, monkeypatch):
assert resp.status == 200
body = await resp.json()
# queen_technology's role default includes file_read, data, security, etc.
assert body["is_role_default"] is True
enabled = set(body["enabled_mcp_tools"] or [])
# @server:files-tools shorthand pulls in every tool under that server.
assert "read_file" in enabled
assert "port_scan" in enabled # technology role includes security
assert "excel_read" in enabled
# Tools not in any category (and not in a @server: expansion target
# the role references) are NOT part of the default.
assert "edit_file" in enabled
# Tools registered under a server the role doesn't reference are NOT
# part of the default.
assert "fluffy_unknown_tool" not in enabled
@pytest.mark.asyncio
async def test_get_tools_exposes_categories(queen_dir, monkeypatch):
"""Response includes the category catalog with role-default flags."""
monkeypatch.setattr(routes_queen_tools, "ensure_default_queens", lambda: None)
_, queen_id = queen_dir # queen_technology
manager = _FakeManager()
manager._mcp_tool_catalog = {
"files-tools": [
{"name": "read_file", "description": "", "input_schema": {}},
{"name": "edit_file", "description": "", "input_schema": {}},
],
}
app = await _make_app(manager=manager)
async with TestClient(TestServer(app)) as client:
resp = await client.get(f"/api/queen/{queen_id}/tools")
assert resp.status == 200
body = await resp.json()
cats = {c["name"]: c for c in body["categories"]}
# Categories that contribute to queen_technology's role default
assert cats["file_ops"]["in_role_default"] is True
assert cats["browser_basic"]["in_role_default"] is True
# Spreadsheet category is exposed even though queen_technology doesn't
# use it — frontend can group/show it.
assert "spreadsheet_advanced" in cats
assert cats["spreadsheet_advanced"]["in_role_default"] is False
# Security was removed from queen_technology defaults.
assert cats["security"]["in_role_default"] is False
# @server:files-tools shorthand expanded against the catalog.
assert "read_file" in cats["file_ops"]["tools"]
assert "edit_file" in cats["file_ops"]["tools"]
def test_resolve_queen_default_tools_expands_server_shorthand():
"""@server:NAME shorthand expands against the provided catalog."""
from framework.agents.queen.queen_tools_defaults import resolve_queen_default_tools
catalog = {
"gcu-tools": [
{"name": "browser_navigate"},
{"name": "browser_click"},
"files-tools": [
{"name": "read_file"},
{"name": "write_file"},
],
}
# queen_brand_design uses "browser" category → expands via @server:gcu-tools.
# queen_brand_design uses "file_ops" category → expands via @server:files-tools.
result = resolve_queen_default_tools("queen_brand_design", catalog)
assert result is not None
assert "browser_navigate" in result
assert "browser_click" in result
assert "read_file" in result
assert "write_file" in result
def test_resolve_queen_default_tools_unknown_queen_returns_none():
@@ -245,7 +282,7 @@ async def test_patch_persists_and_validates(queen_dir, monkeypatch):
manager = _FakeManager()
manager._mcp_tool_catalog = {
"coder-tools": [
"files-tools": [
{"name": "read_file", "description": "", "input_schema": {}},
{"name": "write_file", "description": "", "input_schema": {}},
]
@@ -318,7 +355,7 @@ async def test_patch_hot_reloads_live_session(queen_dir, monkeypatch):
tools_by_name = {"read_file": _tool("read_file"), "write_file": _tool("write_file")}
registry = _FakeRegistry(
server_map={"coder-tools": {"read_file", "write_file"}},
server_map={"files-tools": {"read_file", "write_file"}},
tools_by_name=tools_by_name,
)
# Patch get_tools to return real Tool objects for name/description plumbing.
@@ -375,9 +412,12 @@ async def test_delete_restores_role_default(queen_dir, monkeypatch):
manager = _FakeManager()
manager._mcp_tool_catalog = {
"coder-tools": [
"files-tools": [
{"name": "read_file", "description": "", "input_schema": {}},
{"name": "port_scan", "description": "", "input_schema": {}},
# pdf_read lives in hive_tools but is named explicitly in the
# file_ops category, so we stage it in any server here just to
# surface it through the catalog.
{"name": "pdf_read", "description": "", "input_schema": {}},
],
}
@@ -398,11 +438,14 @@ async def test_delete_restores_role_default(queen_dir, monkeypatch):
assert body["is_role_default"] is True
assert not tools_path.exists()
# The new effective list is the role default for queen_technology,
# which includes both read_file (file_read) and port_scan (security).
# The new effective list is the role default for queen_technology;
# security tools were intentionally removed, so port_scan must NOT
# appear, while file_ops members like read_file/pdf_read do.
enabled = set(body["enabled_mcp_tools"] or [])
assert "read_file" in enabled
assert "port_scan" in enabled
assert "pdf_read" in enabled
assert "port_scan" not in enabled
assert "subdomain_enumerate" not in enabled
# GET confirms.
resp = await client.get(f"/api/queen/{queen_id}/tools")
@@ -11,7 +11,7 @@ metadata:
**Applies when** your spawn message has `db_path:` and `colony_id:` fields. The DB is your durable working memory — tells you what's done, what to skip, which SOP gates you owe.
Access via `execute_command_tool` running `sqlite3 "<db_path>" "..."`. Tables: `tasks` (queue), `steps` (per-task decomposition), `sop_checklist` (hard gates).
Access via `terminal_exec` running `sqlite3 "<db_path>" "..."`. Tables: `tasks` (queue), `steps` (per-task decomposition), `sop_checklist` (hard gates).
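For example, a read-only queue peek (the SQL columns and the `terminal_exec` argument name are assumptions; the `<db_path>` placeholder comes from your spawn message):

```python
# Illustrative claim-queue peek; column names are assumptions.
terminal_exec(
    command='sqlite3 "<db_path>" "SELECT id, status FROM tasks LIMIT 5;"'
)
```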
### Claim: assigned task (check this FIRST)
@@ -113,7 +113,7 @@ Even after `wait_until="load"`, React/Vue SPAs often render their real chrome in
### Reading pages efficiently
- **Prefer `browser_snapshot` over `browser_get_text("body")`** — returns a compact ~15 KB accessibility tree vs 100+ KB of raw HTML.
- Interaction tools `browser_click`, `browser_type`, `browser_type_focused`, `browser_fill`, and `browser_scroll` wait 0.5 s for the page to settle after a successful action, then attach a fresh accessibility snapshot under the `snapshot` key of their result. Use it to decide your next action — do NOT call `browser_snapshot` separately after every action. Tune the capture via `auto_snapshot_mode`: `"default"` (full tree, the default), `"simple"` (trims unnamed structural nodes), `"interactive"` (only controls — tightest token footprint), or `"off"` to skip the capture entirely (useful when batching several interactions and you don't need the intermediate trees). Call `browser_snapshot` explicitly only when you need a newer view or a different mode than what was auto-captured.
- Interaction tools `browser_click`, `browser_type`, `browser_type_focused`, and `browser_scroll` wait 0.5 s for the page to settle after a successful action, then attach a fresh accessibility snapshot under the `snapshot` key of their result. Use it to decide your next action — do NOT call `browser_snapshot` separately after every action. Tune the capture via `auto_snapshot_mode`: `"default"` (full tree, the default), `"simple"` (trims unnamed structural nodes), `"interactive"` (only controls — tightest token footprint), or `"off"` to skip the capture entirely (useful when batching several interactions and you don't need the intermediate trees). Call `browser_snapshot` explicitly only when you need a newer view or a different mode than what was auto-captured.
- Complex pages (LinkedIn, Twitter/X, SPAs with virtual scrolling) can have DOMs that don't match what's visually rendered — snapshot refs may be stale, missing, or misaligned with visible layout. Try the available snapshot first; when the target is not present in that snapshot or visual position matters, switch to `browser_screenshot` to orient yourself.
- Only fall back to `browser_get_text` for extracting specific small elements by CSS selector.
@@ -244,8 +244,8 @@ The highlight overlay stays visible on the page for **10 seconds** after each in
**Close tabs as soon as you are done with them** — not only at the end of the task. After reading or extracting data from a tab, close it immediately.
- Finished reading/extracting from a tab? `browser_close(target_id=...)`
- Completed a multi-tab workflow? `browser_close_finished()` to clean up all your tabs
- Finished reading/extracting from a tab? `browser_close(tab_id=...)` (or no arg to close the active tab)
- Completed a multi-tab workflow? Call `browser_close` for each tab you opened — list with `browser_tabs` first if you've lost track of IDs
- More than 3 tabs open? Stop and close finished ones before opening more
- Popup appeared that you didn't need? Close it immediately
@@ -410,7 +410,7 @@ In all of these cases the script is SHORT (< 10 lines) and the result is CONSUME
- If a tool fails, retry once with the same approach.
- If it fails a second time, STOP retrying and switch approach.
- If `browser_snapshot` fails, try `browser_get_text` with a specific small selector as fallback.
- If `browser_open` fails or page seems stale, `browser_stop`, then `browser_start`, then retry.
- If `browser_open` fails or page seems stale, `browser_stop`, then `browser_open(url)` again to recreate a fresh context.
## Verified workflows
@@ -0,0 +1,160 @@
---
name: hive.chart-creation-foundations
description: Required reading whenever any chart_* tool is available. Teaches the one-tool embedding contract (call chart_render → live chart appears in chat AND a downloadable PNG lands in the queen session dir), the ECharts (data viz) vs Mermaid (structural diagrams) decision, the BI/financial-grade aesthetic baseline (no chartjunk, restrained palette, proper typography, single message per chart), and the canonical spec patterns for the 12 most-common chart types. Skipping this leads to 1990s-Excel charts, missing downloads, and the agent writing markdown image links by hand instead of letting chart_render drive the UI.
metadata:
author: hive
type: preset-skill
version: "1.0"
---
# Chart creation foundations
These tools render BI/financial-analyst-grade charts and diagrams that show up live in the chat AND save as high-DPI PNGs in the user's queen session dir.
## The embedding contract — one rule
> **To put a chart in chat, call `chart_render`. The chat reads `result.spec` and renders the chart live in the message bubble. The download link is `result.file_url`. Do not write `![chart](...)` image markdown by hand — the tool's result drives the UI.**
That's it. One tool call, one chart in chat, one file on disk. No two-step "remember to also save it" pattern. The chat's chart-rendering UI is fed by the tool result envelope automatically.
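A minimal call shape under that contract (`kind` follows the picking rule below; the `option` argument name is an assumption, but only the result fields `spec` and `file_url` are fixed by the rule above):

```python
# Title lives in option.title.text and no colors are set; the
# OpenHive theme is applied automatically.
result = chart_render(
    kind="echarts",
    option={
        "title": {"text": "Revenue by quarter"},
        "xAxis": {"type": "category", "data": ["Q1", "Q2", "Q3", "Q4"]},
        "yAxis": {"type": "value", "name": "USD (M)"},
        "series": [{"type": "bar", "data": [3.1, 3.6, 3.9, 4.2]}],
    },
)
# result.spec drives the live chart in the chat bubble;
# result.file_url is the downloadable PNG.
```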
## When to chart at all
Chart when the data is **visual at heart**: trends over time, distributions, comparisons across categories, hierarchies, flows, geo. Skip the chart when:
- The point is one number → just say it. ("Revenue was $4.2M, up 12% YoY.")
- The point is a ranking of 5 things → use a markdown table with bold and emoji indicators.
- The data is so noisy a chart would mislead → describe the takeaway in prose.
A chart costs the user attention. It must repay that cost with a takeaway they couldn't get from prose.
## ECharts vs Mermaid — the picking rule
| Use ECharts (`kind: "echarts"`) when... | Use Mermaid (`kind: "mermaid"`) when... |
|---|---|
| You're plotting **numbers over categories or time** | You're showing **structure, not data** |
| Bar / line / area / scatter / candlestick / heatmap / treemap / sankey / parallel coordinates / calendar / gauge / pie / sunburst / geo map | Flowchart / sequence / gantt / ERD / state diagram / mindmap / class diagram / C4 architecture |
| The viewer's question is "how much / how many / what's the trend" | The viewer's question is "what calls what / what depends on what / what happens after what" |
If both fit (rare), prefer ECharts — its rasterized output is a proper data chart for slides; Mermaid's diagrams are for technical docs.
## The aesthetic baseline (non-negotiable)
These are the rules that turn an Excel-default chart into a Tableau-grade one. Every chart you produce must follow them.
### 1. Theme & background
- `chart_render` has **no `theme` parameter**. The renderer reads the user's UI theme from the desktop env (`HIVE_DESKTOP_THEME`) so the saved PNG matches what the user is actually looking at. You don't pick; the system does.
- Title goes in `option.title.text`, NOT in the message body. The chart is self-contained.
### 2. Palette discipline — DO NOT set `color` on series
The OpenHive ECharts theme is auto-applied to every `chart_render` call. It defines:
- An 8-hue **categorical palette** for multi-series charts (honey orange, slate blue, sage, terracotta, bronze, indigo, olive, rust)
- Cozy spacing (`grid.top: 90`, `grid.bottom: 56`, etc.)
- Brand typography (Inter Tight)
- Tasteful axis lines + dashed gridlines
**Do not set `option.color`, `option.title.textStyle`, `option.grid`, or `option.itemStyle.color` on series.** The theme covers it. If you do override, you'll fight the brand palette and the chart will look generic.
When you need data-encoded color (NOT category color):
- **Sequential** (magnitude): use `visualMap` with `inRange.color: ['#fff7e0', '#db6f02']` (light-to-honey)
- **Diverging** (positive/negative): use `visualMap` with `inRange.color: ['#a8453d', '#f5f5f5', '#3d7a4a']` (terracotta/neutral/sage)
- **Semantic up/down** (candlestick is auto-themed): for explicit gain/loss bars use `#3d7a4a` (gain) and `#a8453d` (loss), NOT `#27ae60` / `#e74c3c`.
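For example, a sketch of a sequential heatmap (data and labels invented) that routes color through `visualMap` instead of per-series overrides:
```
chart_render(
    kind="echarts",
    spec={
        "title": {"text": "Deploys per weekday", "left": "center"},
        "xAxis": {"type": "category", "data": ["Mon", "Tue", "Wed", "Thu", "Fri"]},
        "yAxis": {"type": "category", "data": ["Week 1", "Week 2"]},
        # sequential light-to-honey ramp from the palette rules above
        "visualMap": {"min": 0, "max": 20, "inRange": {"color": ["#fff7e0", "#db6f02"]}},
        "series": [{"type": "heatmap", "data": [[0, 0, 3], [1, 0, 8], [2, 1, 20]]}]
    },
    title="deploys-per-weekday"
)
```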
### 3. Typography
The default font (`-apple-system, "Inter Tight", system-ui`) is already wired in the renderer — don't override unless the user asked. Set `option.textStyle.fontSize: 13` for body labels, `16` for axis names, `18` bold for the title.
### 4. No chartjunk
- **No 3D**. Ever. 3D pie charts and 3D bar charts are visual lies.
- **No drop shadows** on bars or lines. The default flat ECharts look is correct.
- **No gradient fills** unless the gradient encodes data (e.g. heatmap fill).
- **No neon colors**. Saturation belongs on highlighted bars, not on every series.
- **No more than 5 stacked colors** in a stacked bar — past that the eye can't separate them.
### 5. Axis hygiene
- X-axis labels rotate 45° only when they overflow. Otherwise horizontal.
- Y-axis starts at 0 for bar/area charts (truncating misleads). Line charts can start at min - 5%.
- Use `option.yAxis.axisLabel.formatter: '{value} M'` to add units, NOT a separate "USD millions" subtitle.
- Date axes: pass ISO strings (`"2024-01-15"`) and ECharts handles the layout. Use `xAxis.type: "time"`.
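Pulling the axis rules together, a minimal time-series sketch (values invented): ISO strings on a `time` axis, units via the label formatter, no separate subtitle:
```
chart_render(
    kind="echarts",
    spec={
        "title": {"text": "Daily active users", "left": "center"},
        "tooltip": {"trigger": "axis"},
        "xAxis": {"type": "time"},
        "yAxis": {"type": "value", "axisLabel": {"formatter": "{value} M"}},
        "series": [{"type": "line", "data": [["2024-01-15", 1.2], ["2024-01-16", 1.4], ["2024-01-17", 1.3]]}]
    },
    title="daily-active-users"
)
```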
### 6. One message per chart
Every chart goes in its own assistant message (or its own `chart_render` call). Do not pile 4 charts into one wall of tool calls — the user can't focus and the chat gets noisy.
## Calling `chart_render` — the canonical pattern
```
chart_render(
kind="echarts",
spec={
"title": {"text": "Q4 revenue by region", "left": "center"},
"tooltip": {"trigger": "axis"},
"xAxis": {"type": "category", "data": ["NA", "EU", "APAC", "LATAM"]},
"yAxis": {"type": "value", "axisLabel": {"formatter": "${value}M"}},
"series": [{"type": "bar", "data": [12.4, 8.7, 5.3, 2.1], "itemStyle": {"color": "#db6f02"}}]
},
title="q4-revenue-by-region",
width=1600, height=900, dpi=300
)
```
Returns:
```
{
"kind": "echarts",
"spec": {...echoed...},
"file_path": "/.../charts/2026-04-30T...q4-revenue-by-region.png",
"file_url": "file:///.../q4-revenue-by-region.png",
"width": 1600, "height": 900, "dpi": 300, "bytes": 142318,
"title": "q4-revenue-by-region", "runtime_ms": 287
}
```
The chat panel reads `result.spec` and mounts ECharts in the message bubble. The user sees the chart immediately. The PNG is on disk and the chat shows a download link from `result.file_url`. **You don't write that link — it appears automatically.**
## The 12 chart types you'll use 95% of the time
| When | ECharts type | Notes |
|---|---|---|
| Trend over time | `series.type: "line"` | Smooth = `smooth: true` only when data is noisy |
| Multi-metric trend | Two `line` series with `yAxis: [{}, {}]` | Separate scales when units differ |
| Category comparison | `series.type: "bar"` | Sort by value descending, not alphabetically |
| Stacked composition | `bar` with `stack: "total"` | Cap at 5 categories |
| Distribution | `series.type: "boxplot"` or `bar` of bins | Boxplot for ≥3 groups; histogram for one |
| Two-variable correlation | `series.type: "scatter"` | Add `regression` markline if relevant |
| Candlestick / OHLC | `series.type: "candlestick"` | Date axis + `dataZoom` range slider |
| Geo distribution | `series.type: "map"` | Bundled `world` and country GeoJSONs |
| Hierarchy / share | `series.type: "treemap"` or `sunburst` | Use treemap for >12 leaves; pie only for 2-5 |
| Flow | `series.type: "sankey"` | Names matter — keep them short |
| Calendar density | `series.type: "heatmap"` + `calendar` | Daily metrics over a year |
| KPI scorecard | `series.type: "gauge"` | Set `min`, `max`, threshold band |
Worked specs for each are in `references/` — paste, modify, render.
## Mermaid quick rules
```
chart_render(
kind="mermaid",
spec="""
flowchart LR
A[Customer signs up] --> B{Onboarded?}
B -- yes --> C[Activate trial]
B -- no --> D[Email reminder]
""",
title="signup-flow"
)
```
- One diagram per chart_render call.
- Keep node labels short (≤20 chars).
- Use `flowchart LR` for left-to-right; `TD` for top-down. LR reads better in a chat bubble.
- For sequence diagrams, `->>` is a solid call arrow and `-->>` a dashed return arrow; for async sends use the open-arrow `-)` form.
- Don't try to encode data in mermaid (no widths, no quantities) — that's an ECharts job.
## Common mistakes the agent makes
1. **Writing `![chart](file://...)` markdown by hand.** Don't. The chat renders from the tool result automatically. Manual image markdown will display nothing (file:// is blocked from arbitrary chat content).
2. **Calling chart_render twice for the same chart "to embed and to save".** Only one call. The single call does both.
3. **Overriding fonts to fancy display faces.** Stay with the default; the agent's job is data, not typography.
4. **Pie charts with 12 slices.** Use a horizontal bar chart sorted by value. Pie is only for 2-5 mutually-exclusive shares.
5. **Forgetting `axisLabel.formatter` for currency / percentage.** A y-axis showing "12000000" is unreadable; "12M" is correct.
6. **Putting a chart's title in the message body.** Set `option.title.text` instead so the title is part of the saved PNG.
@@ -0,0 +1,139 @@
---
name: hive.terminal-tools-foundations
description: Required reading whenever any terminal_* tool is available. Teaches the foreground/background dichotomy (terminal_exec auto-promotes past 30s, returns a job_id you poll with terminal_job_logs), the standard envelope shape (exit_code, stdout, stdout_truncated_bytes, output_handle, semantic_status, warning, auto_backgrounded, job_id), output handle pagination via terminal_output_get, when to read semantic_status instead of raw exit_code (grep/rg/find/diff/test exit 1 is NOT an error), the destructive-warning surface (rm -rf, git push --force, DROP TABLE), tool preference (use files-tools / gcu-tools / hive_tools before raw shell), and the bash-only-on-macOS policy. Skipping this leads to "tool returned no output" surprises, orphaned jobs, and panic over benign grep exit codes.
metadata:
author: hive
type: preset-skill
version: "1.0"
---
# terminal-tools — foundations
These tools give you a real terminal: foreground exec with smart envelopes, background jobs with offset-based log streaming, persistent PTY shells, and filesystem search. Bash-only on POSIX.
## Tool preference (read first)
Before reaching for terminal-tools, check whether a higher-level tool already covers the task. Shell is for system operations the other servers don't reach.
- **Reading files** → `files-tools.read_file` (handles size, paging, line-numbered output) — NOT `terminal_exec("cat ...")`
- **Editing files** → `files-tools.edit_file` (atomic patch with diff verification) — NOT `terminal_exec("sed -i ...")`
- **Writing files** → `files-tools.write_file` — NOT `terminal_exec("echo > ...")`
- **In-project search** → `files-tools.search_files` (project-scoped, code-aware) — use `terminal_rg` only for raw paths outside the project (`/var/log`, `/etc`)
- **Browser / web pages** → `gcu-tools.browser_*` for rendered pages — NOT `terminal_exec("curl ...")`
- **Web search** → `hive_tools.web_search` — NOT scraping
- **System operations** (process exec, jobs, PTYs, raw fs search) → terminal-tools. This is its territory.
## The standard envelope
Every spawn-style call (`terminal_exec`, the auto-promoted job state) returns this shape:
```jsonc
{
"exit_code": 0, // null when auto-backgrounded or pre-spawn error
"stdout": "...", // decoded, truncated to max_output_kb (default 256 KB)
"stderr": "...",
"stdout_truncated_bytes": 0, // > 0 means more is in output_handle
"stderr_truncated_bytes": 0,
"runtime_ms": 42,
"pid": 12345,
"output_handle": null, // "out_<hex>" when truncated — paginate with terminal_output_get
"timed_out": false,
"semantic_status": "ok", // "ok" | "signal" | "error" — read THIS, not just exit_code
"semantic_message": null, // e.g. "No matches found" for grep exit 1
"warning": null, // e.g. "may force-remove files" for rm -rf
"auto_backgrounded": false,
"job_id": null // set when auto_backgrounded=true
}
```
## Auto-promotion (the core mental model)
`terminal_exec` runs commands in the foreground until the **auto-background budget** (default 30s) elapses. Past that point, the process is silently transferred to a background job and the call returns immediately with:
```jsonc
{ "auto_backgrounded": true, "exit_code": null, "job_id": "job_<hex>", ... }
```
When you see `auto_backgrounded: true`, **pivot to polling**. The job is still running:
```
terminal_job_logs(job_id, since_offset=0, wait_until_exit=true, wait_timeout_sec=60)
→ blocks server-side until the job exits or the timeout, returns logs + status
```
You're not failing — you're freed up to do other work while the long task runs.
To force pure-foreground (kill on `timeout_sec`), pass `auto_background_after_sec=0`. Use this when you genuinely don't want a background job (small commands where promotion would surprise you).
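For example (the command is illustrative; the parameters are the ones documented above):
```
# No promotion: the process is killed at timeout_sec instead of becoming a job
terminal_exec("pytest tests/unit -q", auto_background_after_sec=0, timeout_sec=120)
```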
## Semantic exit codes — read `semantic_status`, not raw `exit_code`
Several common commands use exit 1 for legitimate non-error states:
| Command | exit 0 | exit 1 |
|---|---|---|
| `grep` / `rg` | matches found | **no matches** (not an error) |
| `find` | success | **some dirs unreadable** (informational) |
| `diff` | identical | **files differ** (informational) |
| `test` / `[` | true | **false** (informational) |
For these, `semantic_status` will be `"ok"` even when `exit_code == 1`, with `semantic_message` describing why ("No matches found"). For everything else, `semantic_status` defaults to `"ok"` on 0 and `"error"` on nonzero.
**Rule**: always check `semantic_status` first. Only fall back to `exit_code` when you need the exact number (e.g. distinguishing `make` errors).
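A minimal handling sketch; `report` and `investigate` are hypothetical placeholders for whatever you do next:
```
result = terminal_exec("rg 'deprecated_api' src")
if result["semantic_status"] == "ok":
    # covers both exit 0 (matches) and exit 1 (no matches for rg)
    report(result["stdout"] or result["semantic_message"])
else:
    investigate(result["exit_code"], result["stderr"])
```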
## Destructive warnings — re-read your command
The envelope's `warning` field is set when the command matches a known destructive pattern (`rm -rf`, `git push --force`, `git reset --hard`, `DROP TABLE`, `kubectl delete`, `terraform destroy`, etc.). The command **still ran** — the warning is informational. Use it as a "did I mean to do that?" prompt before trusting subsequent steps that depend on the side effect.
If a `warning` appears unexpectedly, stop and verify: was the destructive action intended, or did a path/glob slip in?
## Output handles — never lose output
When `stdout_truncated_bytes > 0` or `stderr_truncated_bytes > 0`, the inline output was capped at `max_output_kb` (default 256 KB). The full bytes are stashed under `output_handle` for **5 minutes**. Paginate with:
```
terminal_output_get(output_handle, since_offset=0, max_kb=64)
→ { data, offset, next_offset, eof, expired }
```
Track `next_offset` across calls. If `expired: true`, re-run the command (the handle's TTL has lapsed).
The store has a 64 MB cap with LRU eviction. For huge outputs, prefer `terminal_job_start` + `terminal_job_logs` polling (4 MB ring buffer per stream, infinite total throughput).
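A drain loop sketched against the fields above (`handle` is the envelope's `output_handle`):
```
offset, chunks = 0, []
while True:
    page = terminal_output_get(handle, since_offset=offset, max_kb=64)
    if page["expired"]:
        break  # TTL lapsed mid-read: re-run the command
    chunks.append(page["data"])
    if page["eof"]:
        break
    offset = page["next_offset"]
full_output = "".join(chunks)
```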
## Bash, not zsh — even on macOS
`terminal_exec` and `terminal_pty_open` always invoke `/bin/bash`. The user's `$SHELL` is ignored. Explicit `shell="/bin/zsh"` is **rejected** with a clear error. This is a deliberate security stance, not aesthetic — zsh has command/builtin classes (`zmodload`, `=cmd` expansion, `zpty`, `ztcp`, `zf_*`) that bypass bash-shaped checks. The `terminal-tools-pty-sessions` skill explains the implications for PTY sessions specifically.
`ZDOTDIR` and `ZSH_*` env vars are stripped before exec to prevent zsh dotfiles leaking in. Bash dotfiles still apply when invoked interactively (e.g. PTY sessions use `bash --norc --noprofile` to keep things predictable).
## Pipelines and complex commands
Pipes (`|`), redirects (`>`, `<`, `>>`), conditionals (`&&`, `||`, `;`), and globs (`*`, `?`, `[`) are detected automatically. You can pass them with the default `shell=False` and the runtime will transparently route through `/bin/bash -c` and surface `auto_shell: true` in the envelope:
```
terminal_exec("ps aux | sort -k3 -rn | head -40")
→ { exit_code: 0, stdout: "...", auto_shell: true, ... }
```
For simple argv commands (no metacharacters) `shell=False` is faster and direct-execs the binary. For commands that need a shell but contain no metacharacters the detector recognizes (rare — exotic bash builtins, here-strings), pass `shell=True` explicitly:
```
terminal_exec("set -e; complicated bash logic", shell=True)
```
Quoted strings work either way — the detector uses `shlex.split` which handles `"quoted args with spaces"` correctly.
## When to use what (cheat sheet)
| Need | Tool |
|---|---|
| One-shot command, ≤30s | `terminal_exec` |
| One-shot command, might be longer | `terminal_exec` (auto-promotes) |
| Long-running job from the start | `terminal_job_start` |
| State across calls (cd, env, REPL) | `terminal_pty_open` + `terminal_pty_run` |
| Search file contents (raw paths) | `terminal_rg` |
| Find files by predicate | `terminal_find` |
| Retrieve truncated output | `terminal_output_get` |
| Tree / stat / du | `terminal_exec("ls -la"/"stat foo"/"du -sh path")` |
| HTTP / DNS / ping / archives | `terminal_exec("curl ..."/"dig ..."/"tar xzf ...")` |
See `references/exit_codes.md` for the full POSIX + signal-induced + semantic catalog.
@@ -0,0 +1,50 @@
# Exit code reference
## POSIX conventions
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | General error / catchall |
| 2 | Misuse of shell builtins, syntax error |
| 126 | Command found but not executable |
| 127 | Command not found |
| 128 | Invalid argument to `exit` |
| 128 + N | Killed by signal N |
| 130 | Killed by SIGINT (Ctrl-C) |
| 137 | Killed by SIGKILL |
| 143 | Killed by SIGTERM |
| 255 | Exit status out of range |
When `exit_code < 0` in the envelope, the process was killed by a signal: `abs(exit_code)` is the signal number (subprocess uses negative codes for signaled exits, separate from the `128 + N` shell convention).
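A sketch of mapping an envelope `exit_code` to a readable cause; Python's `signal.Signals` names the number:
```
import signal

def describe_exit(exit_code: int) -> str:
    if exit_code < 0:
        # subprocess convention: negative code = signaled exit
        return f"killed by {signal.Signals(-exit_code).name}"
    if exit_code > 128:
        try:
            # shell convention: 128 + N for a signaled child
            return f"killed by {signal.Signals(exit_code - 128).name} (shell-reported)"
        except ValueError:
            pass  # not a known signal number (e.g. exit 255)
    return f"exited with status {exit_code}"
```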
## Semantic exits — when exit 1 is NOT an error
terminal-tools encodes these in `semantic_status`. The agent should read `semantic_status` first.
| Command | Code 0 | Code 1 | Code ≥2 |
|---|---|---|---|
| `grep` / `rg` / `ripgrep` | matches found | **no matches** (ok) | error |
| `find` | success | **some dirs unreadable** (ok) | error |
| `diff` | files identical | **files differ** (ok) | error |
| `test` / `[` | condition true | **condition false** (ok) | error |
For any command not in this table, the default convention applies (0 = ok, nonzero = error).
## When `exit_code` is `null`
- `auto_backgrounded: true` — the process is still running under a `job_id`. Poll with `terminal_job_logs`.
- Pre-spawn error (command not found, exec failed) — see `error` field in the envelope.
- `timed_out: true` and the process refused to die — extremely rare; the kernel has the answer.
## Common signal-induced exits
| Signal | Number | Subprocess exit | Shell exit | Meaning |
|---|---|---|---|---|
| SIGHUP | 1 | -1 | 129 | Terminal hangup |
| SIGINT | 2 | -2 | 130 | Interrupt (Ctrl-C) |
| SIGQUIT | 3 | -3 | 131 | Quit (Ctrl-\\) |
| SIGKILL | 9 | -9 | 137 | Forced kill (uncatchable) |
| SIGTERM | 15 | -15 | 143 | Polite termination |
| SIGSEGV | 11 | -11 | 139 | Segmentation fault |
| SIGABRT | 6 | -6 | 134 | Abort (assertion failed, etc.) |
@@ -0,0 +1,96 @@
---
name: hive.terminal-tools-fs-search
description: Use terminal_rg / terminal_find when you need raw filesystem search outside the project tree — system configs, /var/log, /etc, archive contents — or when files-tools.search_files is too project-scoped. Teaches the rg vs find vs terminal_exec("ls/du/tree") split, common rg flag combos for code/logs/configs, find predicates for mtime/size/type queries, and the rule that for tree views or single-file stat info you should just use terminal_exec instead of inventing a tool. Read before reaching for raw shell to grep or find anything.
metadata:
author: hive
type: preset-skill
version: "1.0"
---
# Filesystem search
terminal-tools provides two structured search tools: `terminal_rg` (ripgrep for content) and `terminal_find` (find for predicates). Everything else (tree, stat, du) is just `terminal_exec`.
## When to use what
| Task | Tool |
|---|---|
| Find code/text matching a pattern in your **project** | `files-tools.search_files` (project-aware, ranks by relevance) |
| Find code/text matching a pattern in `/var/log`, `/etc`, archives, system dirs | `terminal_rg` |
| Find files matching name/glob/predicate | `terminal_find` |
| List a directory | `terminal_exec("ls -la /path")` |
| Tree view | `terminal_exec("tree -L 2 /path")` |
| Single-path stat | `terminal_exec("stat /path")` |
| Disk usage | `terminal_exec("du -sh /path")` or `terminal_exec("du -h --max-depth=2 /")` |
| Count matches across files | `terminal_rg(pattern, extra_args=["-c"])` |
## `terminal_rg` — content search
ripgrep is fast, gitignore-aware, and has a deep flag surface. The structured wrapper exposes the most useful flags directly; `extra_args` covers the rest.
### Common patterns
```
# All Python files containing "TODO"
terminal_rg(pattern="TODO", path=".", type_filter="py")
# Case-insensitive, with context
terminal_rg(pattern="error", path="/var/log", ignore_case=True, context=2)
# Search hidden files (rg ignores them by default)
terminal_rg(pattern="api_key", path="~", hidden=True)
# Don't respect .gitignore (find files git would ignore)
terminal_rg(pattern="generated", path=".", no_ignore=True)
# Multi-line pattern (e.g., function definitions spanning lines)
terminal_rg(pattern=r"def\s+\w+\(.*\n.*\n", path="src", extra_args=["--multiline"])
# Specific filename glob
terminal_rg(pattern="version", path=".", glob="*.toml")
```
### rg flag idioms
| Flag | Effect |
|---|---|
| `-tpy` (`type_filter="py"`) | Only Python files |
| `-uu` | Don't respect any ignores (incl. `.git/`) |
| `--multiline` (`extra_args`) | Allow regex spanning lines |
| `--max-count` (`max_count`) | Stop after N matches per file |
| `--max-depth` (`max_depth`) | Limit recursion |
| `-w` (`extra_args`) | Whole word match |
| `-F` (`extra_args`) | Fixed string (no regex) |
See `references/ripgrep_cheatsheet.md` for the long form.
## `terminal_find` — predicate search
`find` excels at "files matching N criteria". The wrapper surfaces the most common predicates; combine via the structured arguments.
```
# All .log files modified in the last 7 days, larger than 1MB
terminal_find(path="/var/log", iname="*.log", mtime_days=7, size_kb_min=1024)
# All directories named ".git" (find Git repos under a tree)
terminal_find(path="~/projects", name=".git", type_filter="d")
# Only the top three levels
terminal_find(path="/etc", max_depth=3, type_filter="f")
# Symlinks
terminal_find(path=".", type_filter="l")
```
See `references/find_predicates.md` for combinations not directly exposed.
## Output truncation
Both tools return `truncated: true` when their output exceeded the inline cap. For `terminal_rg`, this means matches were dropped (refine the pattern or narrow the path); for `terminal_find`, results past `max_results` (default 1000) are dropped. Tighten predicates rather than raising the cap.
## Anti-patterns
- **Don't `terminal_rg` your project tree** — `files-tools.search_files` is project-aware and ranks results.
- **Don't reach for `terminal_find` to list one directory** — `terminal_exec("ls -la /path")` is shorter.
- **Don't use `terminal_exec("grep ...")`** when `terminal_rg` exists — rg is faster, gitignore-aware, and returns structured matches.
- **Don't use `terminal_exec("find ...")`** to invent your own predicate combinations — use `terminal_find` and report missing capabilities.
@@ -0,0 +1,78 @@
# find predicate reference
The `terminal_find` wrapper exposes name/iname, type, mtime_days, size bounds, max_depth, max_results. For combinations beyond that, drop to `terminal_exec("find ...")`.
## Time predicates
| Need | find predicate |
|---|---|
| Modified within N days | `-mtime -N` (wrapper: `mtime_days=N`) |
| Modified more than N days ago | `-mtime +N` |
| Modified exactly N days ago | `-mtime N` |
| Accessed within N days | `-atime -N` |
| Inode changed within N days | `-ctime -N` |
| Modified in last N minutes | `-mmin -N` |
| Newer than reference file | `-newer ref` |
## Size predicates
| Need | find predicate |
|---|---|
| Bigger than N kilobytes | `-size +Nk` (wrapper: `size_kb_min`) |
| Smaller than N kilobytes | `-size -Nk` (wrapper: `size_kb_max`) |
| Exactly N kilobytes | `-size Nk` |
| Bigger than N megabytes | `-size +NM` |
| Empty files | `-empty` |
## Type predicates
| Need | find predicate |
|---|---|
| Regular file | `-type f` (wrapper: `type_filter="f"`) |
| Directory | `-type d` (wrapper: `type_filter="d"`) |
| Symlink | `-type l` (wrapper: `type_filter="l"`) |
| Block device | `-type b` |
| Character device | `-type c` |
| FIFO | `-type p` |
| Socket | `-type s` |
## Permission predicates
| Need | find predicate |
|---|---|
| Owned by user | `-user alice` |
| Owned by group | `-group dev` |
| Permission bits exact | `-perm 644` |
| Has any of these bits | `-perm /u+x` |
| Has all of these bits | `-perm -u+x` |
| Readable by current user | `-readable` |
| Writable | `-writable` |
| Executable | `-executable` |
## Composing
`find` evaluates predicates left-to-right with implicit AND. For OR, group with `\( expr1 -o expr2 \)` (the parens are escaped for the shell):
```
# .log OR .txt (drop to terminal_exec for OR)
terminal_exec(r"find /path \( -name '*.log' -o -name '*.txt' \) -type f", shell=True)
# NOT in a directory called node_modules
terminal_exec("find . -path '*/node_modules' -prune -o -name '*.js' -print", shell=True)
```
## Actions
| Need | predicate |
|---|---|
| Print path (default) | (implicit `-print`) |
| Print null-separated | `-print0` (for piping to xargs -0) |
| Delete | `-delete` (DANGEROUS — use terminal_exec with explicit confirmation) |
| Run command per match | `-exec cmd {} \;` (drop to terminal_exec) |
| Run command, batched | `-exec cmd {} +` |
## When NOT to use find
- **One directory listing**: `terminal_exec("ls -la /path")`
- **Recursive grep**: `terminal_rg`
- **Count files**: `terminal_exec("find /path -type f | wc -l")`
@@ -0,0 +1,70 @@
# ripgrep cheatsheet
For when the structured `terminal_rg` flags don't cover the case. Pass via `extra_args=[...]`.
## Filtering
| Need | Flag |
|---|---|
| Whole word | `-w` |
| Fixed string (no regex) | `-F` |
| Match files only (paths, not lines) | `-l` |
| Count matches per file | `-c` |
| Print only filenames with no matches | `--files-without-match` |
| Exclude binary files | (default) |
| Include binaries | `--binary` |
| Search archives transparently | (rg doesn't — extract first) |
## Output shape
| Need | Flag |
|---|---|
| Show only matched part | `-o` |
| Show byte offset of match | `-b` |
| No filename prefix | `-I` (`--no-filename`; `-N` drops line numbers) |
| Color always (for piping into a colorizer) | `--color=always` |
| JSON output | (the wrapper already uses `--json` internally) |
## Boundaries
| Need | Flag |
|---|---|
| Line-by-line (default) | (default) |
| Multi-line regex | `--multiline` (or `-U`) |
| Multi-line dotall (`.` matches `\n`) | `--multiline-dotall` |
| Crlf line endings | `--crlf` |
## Path control
| Need | Flag |
|---|---|
| Follow symlinks | `-L` |
| Don't follow | (default) |
| Search hidden | `-.` (also expressed as `hidden=True`) |
| Don't respect any ignores | `-uuu` |
| Glob include | `-g 'pattern'` (also `glob="..."`) |
| Glob exclude | `-g '!pattern'` |
## Performance
| Need | Flag |
|---|---|
| One thread | `-j 1` |
| Force memory-mapped search | `--mmap` (rg's default heuristics are usually fine) |
| Per-file match cap | `-m N` (also `max_count=N`) |
## Common composed queries
```
# Module-level single-name imports in Python (starting point for an unused-import audit)
terminal_rg(pattern=r"^import\s+\w+$", path="src", type_filter="py")
# All TODO/FIXME/XXX with file:line
terminal_rg(pattern=r"\b(TODO|FIXME|XXX)\b", path=".", extra_args=["-n"])
# Functions defined at module top-level
terminal_rg(pattern=r"^def\s+\w+", path=".", type_filter="py")
# Lines that DON'T match a pattern: pass rg's -v (--invert-match) via extra_args
terminal_rg(pattern="noise", path="logs", extra_args=["-v"])
```
@@ -0,0 +1,110 @@
---
name: hive.terminal-tools-job-control
description: Use when launching anything that runs longer than a minute, anything that streams logs, anything you want to keep running while doing other work — or when terminal_exec auto-backgrounded on you and returned a job_id. Teaches the start→poll→wait pattern with terminal_job_logs offset bookkeeping, the `wait_until_exit=True` blocking-poll idiom, the truncated_bytes_dropped resumption signal, the merge_stderr decision, the SIGINT→SIGTERM→SIGKILL escalation ladder via terminal_job_manage, and the hard rule that jobs die when the terminal-tools server restarts. Read before calling terminal_job_start, or right after terminal_exec auto-backgrounded.
metadata:
author: hive
type: preset-skill
version: "1.0"
---
# Background job control
Background jobs are how you do things that take time without blocking your conversation. Three tools cover the surface: `terminal_job_start`, `terminal_job_logs`, `terminal_job_manage`.
## When to use a job
- Builds, deploys, long tests
- Processes you want to monitor (streaming a log file, a dev server)
- Anything that auto-backgrounded from `terminal_exec` (you have a `job_id`; pivot to this skill's idioms)
For one-shot work expected to finish quickly, `terminal_exec` is simpler. The auto-promotion mechanic in `terminal_exec` is your safety net — start with `terminal_exec`, take over with this skill if needed.
## Lifecycle
```
terminal_job_start(command, ...)
→ { job_id, pid, started_at }
terminal_job_logs(job_id, since_offset=0, max_bytes=64000)
→ { data, offset, next_offset, status: "running"|"exited", exit_code, ... }
# Repeat with since_offset = previous next_offset until status == "exited"
# Or block once with wait_until_exit=True:
terminal_job_logs(job_id, since_offset=N, wait_until_exit=True, wait_timeout_sec=60)
→ blocks server-side until exit or timeout
```
After exit, the job is retained for inspection (`terminal_job_manage(action="list")`) until evicted by FIFO (50 most recent exits kept).
## Offset bookkeeping — the only rule that matters
The job's output lives in a 4 MB ring buffer per stream. Each call to `terminal_job_logs` returns:
- `data` — bytes between `since_offset` and `next_offset`
- `next_offset` — pass this as `since_offset` on your next call
- `truncated_bytes_dropped` — non-zero when your `since_offset` was older than the ring's floor (you fell behind)
**Always carry `next_offset` forward.** Don't replay from 0 — that's an offset reset, you'll see the same data twice and miss the part that fell off.
When `truncated_bytes_dropped > 0`, the buffer evicted N bytes between your last call and now. Treat it as a signal that the job is producing output faster than you're consuming. Either poll more often or accept the gap and read from `next_offset` going forward.
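The bookkeeping as a loop (a sketch; `process` and `log_gap` are hypothetical consumers):
```
offset = 0
while True:
    chunk = terminal_job_logs(job_id, since_offset=offset, max_bytes=64000)
    if chunk["truncated_bytes_dropped"]:
        log_gap(chunk["truncated_bytes_dropped"])  # you fell behind; those bytes are gone
    process(chunk["data"])
    offset = chunk["next_offset"]  # never reset to 0
    if chunk["status"] == "exited":
        break
```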
## merge_stderr — interleaved or separate
```
merge_stderr=False → two streams, request "stdout" or "stderr" by name
merge_stderr=True → one stream ("merged"), order preserved
```
Pick `merge_stderr=True` when:
- The job's logs are designed to be read together (most servers, build tools)
- You don't need to distinguish "this was stderr"
Pick `merge_stderr=False` when:
- stderr is genuinely error-only and stdout is data
- You'll process them differently
## Signal escalation
```
terminal_job_manage(action="signal_int", job_id=...) # graceful (Ctrl-C-equivalent)
terminal_job_manage(action="signal_term", job_id=...) # polite kill (SIGTERM)
terminal_job_manage(action="signal_kill", job_id=...) # forced kill (SIGKILL, uncatchable)
```
The idiom: `signal_int` → wait 2-5s → `signal_term` → wait 2-5s → `signal_kill`. Most well-behaved processes handle SIGINT (graceful) and SIGTERM (cleanup, then exit). SIGKILL bypasses cleanup — use only when the process is truly unresponsive.
After signaling, check exit with `terminal_job_logs(job_id, wait_until_exit=True, wait_timeout_sec=2)`.
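The ladder as a loop, sketched against the calls above:
```
for action in ("signal_int", "signal_term", "signal_kill"):
    terminal_job_manage(action=action, job_id=job_id)
    # the blocking poll doubles as the 2-5s wait between rungs
    status = terminal_job_logs(job_id, wait_until_exit=True, wait_timeout_sec=5)
    if status["status"] == "exited":
        break
```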
## Stdin
```
terminal_job_manage(action="stdin", job_id=..., data="some input\n")
terminal_job_manage(action="close_stdin", job_id=...)
```
For tools that read stdin to EOF, `close_stdin` after writing flushes them. For interactive tools that read line-by-line, just write each line.
## Take-over: when terminal_exec auto-backgrounds
When `terminal_exec` returned `auto_backgrounded: true, job_id: <X>`, the process is **already** in the JobManager with its output flowing into the ring buffer. Your transition is seamless:
```
# Already saw the start of output in terminal_exec's stdout/stderr.
# Pick up reading where the env left off — use the byte count of the
# initial stdout as your since_offset, OR just request tail output:
terminal_job_logs(job_id="job_xxx", tail=True, max_bytes=64000)
```
Or block until exit and grab everything:
```
terminal_job_logs(job_id="job_xxx", since_offset=0, wait_until_exit=True, wait_timeout_sec=120)
```
## Hard rules
- **Jobs die when the server restarts.** The desktop runtime restarts terminal-tools when Hive restarts. There's no re-attach. If you need durability, use `nohup` + `terminal_exec` to detach into the system's process tree and track the PID yourself.
- **Server-wide hard cap on concurrent jobs** (`TERMINAL_TOOLS_MAX_JOBS`, default 32). Past the cap, `terminal_job_start` returns an error. Wait for jobs to exit or kill old ones.
- **No cross-restart output.** Output handles and ring buffers are in-memory only.
See `references/signals.md` for the full signal catalog.
@@ -0,0 +1,41 @@
# Signal reference
terminal_job_manage exposes six signals via the action name.
| Action | Signal | Number | Purpose | Catchable? |
|---|---|---|---|---|
| `signal_int` | SIGINT | 2 | Interrupt — Ctrl-C equivalent. Most CLIs treat as "stop gracefully". | Yes |
| `signal_term` | SIGTERM | 15 | Polite termination request. Default for `kill`. | Yes |
| `signal_kill` | SIGKILL | 9 | Forced kill. Process can't catch, clean up, or finalize. Use sparingly. | **No** |
| `signal_hup` | SIGHUP | 1 | Hangup. Many daemons reload config on this. | Yes |
| `signal_usr1` | SIGUSR1 | 10 | User-defined #1. Common: dump state, rotate logs (nginx, etc). | Yes |
| `signal_usr2` | SIGUSR2 | 12 | User-defined #2. Common: graceful binary upgrade (unicorn, etc). | Yes |
## Escalation idiom
```
1. signal_int (Ctrl-C — graceful)
2. wait 2-5s, check status with terminal_job_logs(wait_until_exit=True, wait_timeout_sec=3)
3. if still running: signal_term (cleanup-then-exit)
4. wait 2-5s
5. if still running: signal_kill (forced)
```
The waits matter: SIGTERM handlers do real work (flush logs, close DBs, release locks) and need time. Skipping straight to SIGKILL leaks resources.
## When to use SIGUSR1 / SIGUSR2
These are application-defined. Read the target's docs first. Common:
- **nginx**: SIGUSR1 → reopen log files (for log rotation)
- **unicorn / puma**: SIGUSR2 → fork a new master with the latest binary (graceful restart)
- **rsync**: SIGUSR1 → print stats so far
## Reading exit codes after a signal
When a job exits via signal, `terminal_job_logs` returns `exit_code: -N` (subprocess convention) where `abs(N)` is the signal number. The shell convention `128 + N` doesn't apply to the JobManager — that's for shell-spawned children.
| exit_code | Means |
|---|---|
| -2 | Killed by SIGINT |
| -9 | Killed by SIGKILL |
| -15 | Killed by SIGTERM |
@@ -0,0 +1,127 @@
---
name: hive.terminal-tools-pty-sessions
description: Use when you need state across calls — building env vars, navigating with cd, driving REPLs (python -i, mysql, psql, node), or responding to interactive prompts (sudo password, ssh host-key confirmation, mysql connection). Teaches the prompt-sentinel exec pattern (default mode), raw I/O for REPLs (raw_send=True then read_only=True), the one-in-flight-per-session rule, and the close-or-leak-against-the-cap discipline. Bash on macOS — never zsh; explicit shell=/bin/zsh is rejected. Read before calling terminal_pty_open.
metadata:
author: hive
type: preset-skill
version: "1.0"
---
# Persistent PTY sessions
PTY sessions are how you talk to interactive programs — programs that detect a terminal (`isatty()`) and behave differently when they don't see one. Use a session when:
- You need state to persist across calls (`cd`, env vars, sourced scripts)
- You're driving a REPL (`python -i`, `mysql`, `psql`, `node`, `irb`)
- A program demands an interactive prompt (`sudo`, `ssh`, `npm login`, `gh auth login`)
For everything else, `terminal_exec` is simpler. Sessions cost more (per-session bash process, ring buffer, idle-reaping bookkeeping) and have a hard cap (`TERMINAL_TOOLS_MAX_PTY`, default 8).
## Why PTY (and not subprocess pipes)
Subprocess pipes break on every interactive program. The moment a program calls `isatty()` and sees False, it disables prompts, color, line-editing, password masking, progress bars — sometimes refuses to start. PTY makes us look like a real terminal so these programs work the same as in your shell.
The cost: PTY output includes terminal escape codes (cursor moves, color codes). The session captures them as-is; if you need clean text, strip ANSI escapes in your processing layer.
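A common stripping sketch (the regex covers CSI sequences and two-byte escapes; exotic sequences such as OSC titles may need more):
```
import re

# ESC [ params intermediates final-byte (CSI: colors, cursor moves) | ESC + one Fe byte
ANSI_RE = re.compile(r"\x1b\[[0-9;?]*[ -/]*[@-~]|\x1b[@-Z\\-_]")

def strip_ansi(text: str) -> str:
    return ANSI_RE.sub("", text)
```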
## Bash on macOS — by deliberate policy
`terminal_pty_open` always invokes `/bin/bash`, regardless of the user's `$SHELL`. macOS users: yes, even when zsh is your interactive default. This is the **terminal-tools-foundations** policy applied to PTYs.
Reasons:
- zsh has command/builtin classes (`zmodload`, `=cmd` expansion, `zpty`, `ztcp`) that bypass bash-shaped security checks
- One shell behavior across platforms eliminates "works on Linux, breaks on macOS" surprises
- Bash is universal: any shell you've used will accept the bash subset
The bash invocation uses `--norc --noprofile` so user dotfiles don't leak in. PS1 is set to a unique sentinel for prompt detection. PS2 is empty. PROMPT_COMMAND is empty.
## Three modes of `terminal_pty_run`
### 1. Default: send command, wait for prompt sentinel
```
terminal_pty_run(session_id, command="ls -la")
→ { output, prompt_after: True, ... }
```
The session writes `ls -la\n`, waits for the sentinel that its custom PS1 emits, returns the slice between submission and prompt. **One in-flight call per session** — a concurrent call returns a `"session busy"` error.
### 2. raw_send: send raw input, no waiting
```
terminal_pty_run(session_id, command="print('hi')\n", raw_send=True)
→ { bytes_sent: 12 }
```
For REPLs, vim keystrokes, password prompts. The session writes the bytes and returns immediately — it doesn't wait for a prompt (REPLs don't print bash's prompt; they print their own).
After a `raw_send`, you typically follow with:
### 3. read_only: drain currently-buffered output
```
terminal_pty_run(session_id, read_only=True, timeout_sec=2)
→ { output: "hi\n", more: False, ... }
```
Reads whatever the session has accumulated since the last drain, with a brief settle window. Use after raw_send to capture the REPL's response.
## Custom prompt detection (`expect`)
When the command launches a program with its own prompt (Python REPL's `>>> `, mysql's `mysql> `, sudo's password prompt), the bash sentinel won't appear until the program exits. Override:
```
terminal_pty_run(session_id, command="python3", expect=r">>>\s*$", timeout_sec=10)
→ output up to and including ">>>", then control returns
```
For sudo:
```
terminal_pty_run(session_id, command="sudo -k && sudo whoami", expect=r"[Pp]assword:")
terminal_pty_run(session_id, command="<password>", raw_send=True, command="<password>\n")
terminal_pty_run(session_id, read_only=True, timeout_sec=5)
```
(Treat passwords carefully — they end up in the ring buffer.)
## Always close
```
terminal_pty_close(session_id)
```
Leaked sessions count against `TERMINAL_TOOLS_MAX_PTY` (default 8). Idle reaping happens lazily on every `_open` call (sessions inactive longer than `idle_timeout_sec`, default 1800s, are dropped) — but don't rely on it. Close when you're done.
For unresponsive sessions, `force=True` skips the graceful "exit" attempt and goes straight to SIGTERM/SIGKILL.
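The close discipline as a pattern (commands are illustrative):
```
sid = terminal_pty_open(cwd="/tmp")
try:
    terminal_pty_run(sid, command="export STAGE=dev")
    terminal_pty_run(sid, command="./deploy.sh")
finally:
    terminal_pty_close(sid)  # never leak against TERMINAL_TOOLS_MAX_PTY
```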
## Common patterns
### Stateful navigation
```
sid = terminal_pty_open(cwd="/")
terminal_pty_run(sid, command="cd /var/log")
terminal_pty_run(sid, command="ls -la *.log | head")
terminal_pty_close(sid)
```
### Python REPL
```
sid = terminal_pty_open()
terminal_pty_run(sid, command="python3", expect=r">>>\s*$")
terminal_pty_run(sid, command="x = 42", raw_send=True)
terminal_pty_run(sid, command="print(x*x)\n", raw_send=True)
result = terminal_pty_run(sid, read_only=True) # → "1764\n>>> "
terminal_pty_run(sid, command="exit()", raw_send=True)
terminal_pty_close(sid)
```
### ssh with host-key prompt
```
sid = terminal_pty_open()
terminal_pty_run(sid, command="ssh user@new-host", expect=r"\(yes/no.*\)\?")
terminal_pty_run(sid, command="yes\n", raw_send=True)
terminal_pty_run(sid, read_only=True, timeout_sec=10) # password prompt or login
```
@@ -0,0 +1,92 @@
---
name: hive.terminal-tools-troubleshooting
description: Read when a terminal-tools call returned something surprising — empty stdout despite no error, exit_code is null, output_handle came back expired, "too many jobs" / "session busy" / "too many PTYs", warning was set unexpectedly, semantic_status disagrees with exit_code. Diagnostic recipes only — load on demand. Don't preload; the foundational skill covers the happy path.
metadata:
author: hive
type: preset-skill
version: "1.0"
---
# Troubleshooting terminal-tools
Recipes for surprising results. Match the symptom to the section.
## Empty `stdout` despite the command "should have" produced output
Possible causes:
1. Output went to **stderr** instead. Check `stderr` in the envelope (or use `merge_stderr=True` for jobs).
2. Output was **fully truncated** because `max_output_kb` is too small. Check `stdout_truncated_bytes > 0`. Bump `max_output_kb` or paginate via `output_handle`.
3. Command produced no output (correct, just unexpected — `silent` flags, no matches).
4. Pipeline issue: the last stage of a pipe ran but stdout went elsewhere (`> /dev/null`, redirected via `2>&1`).
5. Process is buffering its output and didn't flush before exit. Add `stdbuf -oL` (line-buffered) or `unbuffer` to the command.
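For cause 5, the typical wrap (the script name is illustrative; the pipe auto-routes through bash):
```
terminal_exec("stdbuf -oL ./slow_reporter.sh | tail -40")
```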
## `exit_code: null`
| Cause | Other field |
|---|---|
| Auto-backgrounded | `auto_backgrounded: true, job_id: <X>` |
| Hard timeout, process killed | `timed_out: true` |
| Pre-spawn failure (command not found) | `error: ...` set, `pid: null` |
| Still running (in `terminal_job_logs`) | `status: "running"` |
## `output_handle` returned `expired: true`
5-minute TTL. Either (a) you waited too long, or (b) the store evicted it under memory pressure (64 MB total cap, LRU eviction). Re-run the command.
To reduce risk: paginate the handle as soon as you receive it, or use `terminal_job_*` for huge outputs (4 MB ring buffer with offsets — no expiry).
## "too many jobs" / `JobLimitExceeded`
`TERMINAL_TOOLS_MAX_JOBS` (default 32) hit. Either:
- Wait for jobs to exit (poll with `terminal_job_logs(wait_until_exit=True)`)
- Kill old jobs: `terminal_job_manage(action="list")` to see what's running, then `signal_term` the abandoned ones
- Raise the cap via env (rare)
## "session busy"
A `terminal_pty_run` was issued while another `_run` is in flight on the same session. PTY sessions are single-threaded conversations. Wait for the prior call to return, or open a second session.
## "PTY cap reached"
`TERMINAL_TOOLS_MAX_PTY` (default 8) hit. Close idle sessions (`terminal_pty_close`). Idle reaping is lazy and only runs on `_open`, but opening throws once the cap is hit, so reaping can't save you here. Just close manually.
## `warning` is set, the command worked
Informational only. The pattern matched (e.g. `rm -rf` literally appears, or `git push --force` was used). The command ran. The warning is your "did I mean to do that?" prompt — verify the side effect was intended before continuing.
## `semantic_status: "ok"` but `exit_code: 1`
Working as designed. Some commands use exit 1 for legitimate non-error states:
- `grep` / `rg` exit 1 when **no matches** found
- `find` exit 1 when **some directories were unreadable** (typical on `/proc`, etc.)
- `diff` exit 1 when **files differ**
- `test` / `[` exit 1 when **condition is false**
The `semantic_message` field explains. Trust `semantic_status`, not raw `exit_code`.
## `semantic_status: "error"` but `exit_code: 0`
Shouldn't happen. If it does, file a bug.
## `truncated_bytes_dropped > 0` in `terminal_job_logs`
Your `since_offset` was older than the ring buffer's floor — bytes evicted before you could read them. Either:
- Poll faster (lower latency between calls)
- Use `merge_stderr=True` (single 4 MB ring instead of 4 MB × 2)
- Accept the gap and move forward from `next_offset`
## `terminal_pty_open` succeeds but the first `_run` times out
The session may not have produced its first prompt sentinel within the 2-second startup window. Try:
- A `terminal_pty_run(sid, read_only=True, timeout_sec=2)` to drain whatever's accumulated
- A noop command (`terminal_pty_run(sid, command="true")`) to force a prompt cycle
Could also indicate the bash process died at startup — `terminal_pty_run(sid, ...)` would then return `"session has exited"`.
## `shell="/bin/zsh"` returned an error
By design. terminal-tools is bash-only on POSIX. Use `shell=True` (default `/bin/bash`) or omit `shell=` to exec directly.
## A command in `shell=True` is interpreted differently than expected
Bash, not zsh, semantics. `**/*` doesn't recurse without `shopt -s globstar`; `=cmd` expansion doesn't work; arrays need the full `${arr[idx]}` form and index from 0, where zsh allows `$arr[idx]` and indexes from 1. When in doubt, the foundational skill's "bash, not zsh" section is the canonical statement.
+6 -2
@@ -136,8 +136,12 @@ class SkillDiscovery:
self._scanned_dirs.append(user_agents)
all_skills.extend(self._scan_scope(user_agents, "user"))
# Hive-specific (higher precedence within user scope)
user_hive = home / ".hive" / "skills"
# Hive-specific (higher precedence within user scope). Honors
# HIVE_HOME so the desktop's per-user root (set via env) wins
# over the shared ``~/.hive`` location.
from framework.config import HIVE_HOME
user_hive = HIVE_HOME / "skills"
if user_hive.is_dir():
self._scanned_dirs.append(user_hive)
all_skills.extend(self._scan_scope(user_hive, "user"))
+9 -5
@@ -15,14 +15,18 @@ import subprocess
import tempfile
from pathlib import Path
from framework.config import HIVE_HOME
from framework.skills.parser import ParsedSkill
from framework.skills.skill_errors import SkillError, SkillErrorCode
# Default install destination for user-scope skills
USER_SKILLS_DIR = Path.home() / ".hive" / "skills"
# Default install destination for user-scope skills.
# Anchored on HIVE_HOME so the desktop shell can override the install
# root via $HIVE_HOME without patching every call site.
USER_SKILLS_DIR = HIVE_HOME / "skills"
# Sentinel file for the one-time security notice on first install (NFR-5).
INSTALL_NOTICE_SENTINEL = HIVE_HOME / ".install_notice_shown"
# Sentinel file for the one-time security notice on first install (NFR-5)
INSTALL_NOTICE_SENTINEL = Path.home() / ".hive" / ".install_notice_shown"
_INSTALL_NOTICE = """\
@@ -44,7 +48,7 @@ _INSTALL_NOTICE = """\
def maybe_show_install_notice() -> None:
"""Print a one-time security notice before the first skill install (NFR-5).
Touches a sentinel file in ~/.hive/ after showing the notice so it is
Touches a sentinel file in $HIVE_HOME after showing the notice so it is
only displayed once across all future installs.
"""
if INSTALL_NOTICE_SENTINEL.exists():
+16 -4
@@ -26,9 +26,21 @@ _DEFAULT_REGISTRY_URL = (
"https://raw.githubusercontent.com/hive-skill-registry/hive-skill-registry/main/skill_index.json"
)
_CACHE_DIR = Path.home() / ".hive" / "registry_cache"
_CACHE_INDEX_PATH = _CACHE_DIR / "skill_index.json"
_CACHE_METADATA_PATH = _CACHE_DIR / "metadata.json"
def _cache_dir() -> Path:
from framework.config import HIVE_HOME
return HIVE_HOME / "registry_cache"
def _cache_index_path() -> Path:
return _cache_dir() / "skill_index.json"
def _cache_metadata_path() -> Path:
return _cache_dir() / "metadata.json"
_CACHE_TTL_SECONDS = 3600 # 1 hour
@@ -46,7 +58,7 @@ class RegistryClient:
cache_dir: Path | None = None,
) -> None:
self._url = registry_url or os.environ.get("HIVE_REGISTRY_URL", _DEFAULT_REGISTRY_URL)
cache_root = cache_dir or _CACHE_DIR
cache_root = cache_dir or _cache_dir()
self._index_path = cache_root / "skill_index.json"
self._metadata_path = cache_root / "metadata.json"
+2
@@ -33,6 +33,8 @@ _BUNDLED_DIRS: tuple[Path, ...] = (
# (tool-name prefix, skill directory name, display name)
_TOOL_GATED_SKILLS: list[tuple[str, str, str]] = [
("browser_", "browser-automation", "hive.browser-automation"),
("terminal_", "terminal-tools-foundations", "hive.terminal-tools-foundations"),
("chart_", "chart-creation-foundations", "hive.chart-creation-foundations"),
]
_BODY_CACHE: dict[str, str] = {}
+13 -6
@@ -20,6 +20,7 @@ from enum import StrEnum
from pathlib import Path
from urllib.parse import urlparse
from framework.config import HIVE_HOME
from framework.skills.parser import ParsedSkill
logger = logging.getLogger(__name__)
@@ -30,8 +31,11 @@ _ENV_TRUST_ALL = "HIVE_TRUST_PROJECT_SKILLS"
# Env var for comma-separated own-remote glob patterns (e.g. "github.com/myorg/*").
_ENV_OWN_REMOTES = "HIVE_OWN_REMOTES"
_TRUSTED_REPOS_PATH = Path.home() / ".hive" / "trusted_repos.json"
_NOTICE_SENTINEL_PATH = Path.home() / ".hive" / ".skill_trust_notice_shown"
# Persisted store of trusted git remotes (one-shot consent per repo).
_TRUSTED_REPOS_PATH = HIVE_HOME / "trusted_repos.json"
# Sentinel for the one-time security notice (NFR-5).
_NOTICE_SENTINEL_PATH = HIVE_HOME / ".skill_trust_notice_shown"
# ---------------------------------------------------------------------------
@@ -224,7 +228,9 @@ class ProjectTrustDetector:
patterns.extend(p.strip() for p in raw.split(",") if p.strip())
# From ~/.hive/own_remotes file
own_remotes_file = Path.home() / ".hive" / "own_remotes"
from framework.config import HIVE_HOME
own_remotes_file = HIVE_HOME / "own_remotes"
if own_remotes_file.is_file():
try:
for line in own_remotes_file.read_text(encoding="utf-8").splitlines():
@@ -415,7 +421,8 @@ class TrustGate:
def _maybe_show_security_notice(self, Colors) -> None: # noqa: N803
"""Show the one-time security notice if not already shown (NFR-5)."""
if _NOTICE_SENTINEL_PATH.exists():
sentinel = _NOTICE_SENTINEL_PATH
if sentinel.exists():
return
self._print("")
self._print(
@@ -427,8 +434,8 @@ class TrustGate:
)
self._print("")
try:
_NOTICE_SENTINEL_PATH.parent.mkdir(parents=True, exist_ok=True)
_NOTICE_SENTINEL_PATH.touch()
sentinel.parent.mkdir(parents=True, exist_ok=True)
sentinel.touch()
except OSError:
pass
+13 -1
@@ -49,7 +49,7 @@ class TaskRecord(BaseModel):
class TaskListMeta(BaseModel):
"""Per-list metadata stored in ``meta.json`` next to the task files."""
"""Per-list metadata. Embedded in ``TaskListDocument``."""
task_list_id: str
role: TaskListRole
@@ -59,6 +59,18 @@ class TaskListMeta(BaseModel):
schema_version: int = 1
class TaskListDocument(BaseModel):
"""Whole task list as a single JSON document on disk.
Lives at ``{task_list_path(list_id)}/tasks.json``; the list-lock
sentinel is its sibling ``tasks.json.lock``.
"""
meta: TaskListMeta
highwatermark: int = 0
tasks: list[TaskRecord] = Field(default_factory=list)
# Tagged union for claim_task_with_busy_check. Used by run_parallel_workers
# when stamping ``assigned_session`` on a colony template entry — the only
# place a "claim" actually happens under the hive model.
+4
@@ -89,6 +89,10 @@ def build_reminder(records: list[TaskRecord]) -> str:
" - If you're umbrella-tracking ('reply to all posts' as one task), "
"break it into one task per atomic action — use `task_create_batch` "
"with one entry per action.",
" - Also consider cleaning up the task list if it has become stale: "
"if any open tasks no longer apply (user pivoted, scope shifted, "
"task created in error), delete them via `task_update` with "
"status='deleted'. Don't leave stale items sitting on the list.",
]
if in_progress:
bullets.append(
+470 -273
@@ -2,29 +2,26 @@
Layout per list::
{root}/{task_list_id}/
meta.json -- TaskListMeta
tasks/
0001.json -- TaskRecord (zero-padded for ls-sort)
0002.json
...
.lock -- list-level lock
.highwatermark -- ID floor (deleted ids never reused)
{task_list_path}/tasks.json -- TaskListDocument (meta + hwm + tasks)
{task_list_path}/tasks.json.lock -- list-level lock sentinel
Two list-roots:
Where ``task_list_path`` is:
colony:{colony_id} -> ~/.hive/colonies/{colony_id}/tasks/
session:{a}:{s} -> ~/.hive/agents/{a}/sessions/{s}/tasks/
colony:{c} -> ~/.hive/colonies/{c}/
session:{a}:{s} -> ~/.hive/agents/{a}/sessions/{s}/
unscoped:{a} -> ~/.hive/unscoped/{a}/
{malformed} -> ~/.hive/_misc/{slug}/
An older layout used the same root + a nested ``tasks/`` subdir holding
``meta.json``, ``.highwatermark``, ``.lock``, and ``NNNN.json`` per task.
That produced the ugly ``/tasks/tasks/0001.json`` path. Migration is
lazy: the first lock-protected access on such a list folds the legacy
artifacts into ``tasks.json`` and unlinks them.
All filesystem I/O is wrapped in ``asyncio.to_thread`` so the event loop
never blocks. Locks use a 30-retry / ~2.6s budget, comfortable headroom
for the only realistic write contender (colony template under concurrent
never blocks. Locks use a ~3s budget, comfortable headroom for the only
realistic write contender (colony template under concurrent
``colony_template_*`` and ``run_parallel_workers`` stamps).
The "_unsafe" variants exist because filelock is **not re-entrant**: a
caller already holding a lock must NOT re-acquire it (would deadlock).
The unsafe path skips acquisition and is callable only from inside another
locked function. See ``claim_task_with_busy_check`` and ``delete_task``.
"""
from __future__ import annotations
@@ -32,6 +29,8 @@ from __future__ import annotations
import asyncio
import logging
import os
import shutil
import threading
import time
from collections.abc import Iterable
from pathlib import Path
@@ -46,6 +45,7 @@ from framework.tasks.models import (
ClaimNotFound,
ClaimOk,
ClaimResult,
TaskListDocument,
TaskListMeta,
TaskListRole,
TaskRecord,
@@ -57,6 +57,24 @@ logger = logging.getLogger(__name__)
LOCK_TIMEOUT_SECONDS = 3.0 # ~30 retries × ~100ms
DOC_FILENAME = "tasks.json"
LOCK_FILENAME = "tasks.json.lock" # only colony lists (cross-process writers)
# Per-list in-memory locks for single-process scopes (session/unscoped/_misc).
# Sessions have one owning agent, so only same-process concurrency matters
# (e.g. parallel tool use within a single turn) — no on-disk lock needed.
_INPROC_LOCKS: dict[str, threading.Lock] = {}
_INPROC_LOCKS_GUARD = threading.Lock()
def _get_inproc_lock(task_list_id: str) -> threading.Lock:
with _INPROC_LOCKS_GUARD:
lock = _INPROC_LOCKS.get(task_list_id)
if lock is None:
lock = threading.Lock()
_INPROC_LOCKS[task_list_id] = lock
return lock
class _Unset:
"""Sentinel for "owner argument not provided" — distinct from owner=None."""
@@ -72,27 +90,87 @@ def _hive_root() -> Path:
return Path(os.environ.get("HIVE_HOME", str(Path.home() / ".hive")))
def _find_queen_session_dir(session_id: str, *, hive_root: Path) -> Path | None:
"""Return ``agents/queens/{queen}/sessions/{session_id}`` if one exists.
Queens live under ``QUEENS_DIR = hive_root / "agents" / "queens"`` (see
``framework.config``). The task system gets a generic ``agent_id ==
"queen"`` in its ``task_list_id``, which would otherwise dead-end at
``agents/queen/...``, decoupled from the real session folder. By
probing the canonical layout we keep the task doc beside conversations,
events, summary, and meta for the same session.
"""
queens_dir = hive_root / "agents" / "queens"
if not queens_dir.exists():
return None
try:
candidates = [d for d in queens_dir.iterdir() if d.is_dir()]
except OSError:
return None
for queen_dir in candidates:
candidate = queen_dir / "sessions" / session_id
if candidate.is_dir():
return candidate
return None
def task_list_path(task_list_id: str, *, hive_root: Path | None = None) -> Path:
"""Resolve task_list_id -> on-disk root."""
"""Resolve task_list_id -> directory containing ``tasks.json``.
Note: this returns the *parent* of the doc file, not the file itself.
For session/colony/unscoped lists, this is the agent or colony's home
dir; the task doc is one filename inside it. (The older layout had an
extra ``tasks/`` subdir under this path; see ``_legacy_root``.)
For ``session:`` lists, the canonical queen session folder is preferred
when it exists on disk: the task doc lives next to the rest of that
session's data (conversations, events, summary).
"""
root = hive_root or _hive_root()
if task_list_id.startswith("colony:"):
colony_id = task_list_id[len("colony:") :]
return root / "colonies" / colony_id / "tasks"
return root / "colonies" / colony_id
if task_list_id.startswith("session:"):
rest = task_list_id[len("session:") :]
agent_id, _, session_id = rest.partition(":")
if not session_id:
raise ValueError(f"Malformed session task_list_id: {task_list_id!r}")
return root / "agents" / agent_id / "sessions" / session_id / "tasks"
canonical = _find_queen_session_dir(session_id, hive_root=root)
if canonical is not None:
return canonical
return root / "agents" / agent_id / "sessions" / session_id
if task_list_id.startswith("unscoped:"):
agent_id = task_list_id[len("unscoped:") :]
return root / "unscoped" / agent_id / "tasks"
return root / "unscoped" / agent_id
# Last-ditch sanitization for HIVE_TASK_LIST_ID overrides — slugify the
# whole thing so the test/dev path can't escape the hive root.
safe = "".join(c if c.isalnum() or c in "-_" else "_" for c in task_list_id)
return root / "_misc" / safe
def _legacy_root(task_list_id: str, *, hive_root: Path | None = None) -> Path:
"""Where the older artifacts (meta.json, .highwatermark, tasks/NNNN.json) lived.
Pinned to the *pre-canonical* layout: for queen session lists this is
``agents/{agent_id}/sessions/{session_id}/tasks`` (i.e. the literal
``agent_id`` folder, not the canonical ``agents/queens/{queen}/...``
path). The lazy migration reads from here and writes the new doc to
wherever ``task_list_path`` resolves now.
"""
root = hive_root or _hive_root()
if task_list_id.startswith("colony:"):
return root / "colonies" / task_list_id[len("colony:") :] / "tasks"
if task_list_id.startswith("session:"):
rest = task_list_id[len("session:") :]
agent_id, _, session_id = rest.partition(":")
return root / "agents" / agent_id / "sessions" / session_id / "tasks"
if task_list_id.startswith("unscoped:"):
return root / "unscoped" / task_list_id[len("unscoped:") :] / "tasks"
# _misc fallback: legacy lived directly in the slug dir, same as the new parent.
safe = "".join(c if c.isalnum() or c in "-_" else "_" for c in task_list_id)
return root / "_misc" / safe
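# Worked contrast for one session list (hypothetical ids), legacy fan-out
# versus the single-doc layout it migrates into:
#
#     legacy: agents/a/sessions/s/tasks/{meta.json, .highwatermark, .lock,
#                                        tasks/0001.json, ...}
#     new:    agents/a/sessions/s/tasks.json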
# ---------------------------------------------------------------------------
# TaskStore — public façade
# ---------------------------------------------------------------------------
@@ -132,23 +210,16 @@ class TaskStore:
)
async def list_exists(self, task_list_id: str) -> bool:
"""A list exists if its meta.json OR any task file is on disk.
"""A list exists if its doc is on disk OR a legacy artifact is.
meta.json is normally written by ``ensure_task_list``, but session
lists may be created lazily via the first ``task_create`` (see
``_create_task_sync``); in that case meta.json is backfilled the
first time the list is read. Until then, we still want to expose
the list's tasks via REST.
The legacy fallback exists so that lists created under the older
layout and not yet migrated still surface to the REST layer.
"""
def _check() -> bool:
root = self._list_root(task_list_id)
if (root / "meta.json").exists():
if self._doc_path(task_list_id).exists():
return True
tasks_dir = root / "tasks"
if tasks_dir.exists() and any(p.suffix == ".json" for p in tasks_dir.iterdir()):
return True
return False
return self._has_legacy_artifacts(task_list_id)
return await asyncio.to_thread(_check)
@@ -156,7 +227,7 @@ class TaskStore:
return await asyncio.to_thread(self._read_meta_sync, task_list_id)
async def reset_task_list(self, task_list_id: str) -> None:
"""Delete all task files but preserve the high-water-mark.
"""Delete all tasks but preserve the high-water-mark.
Test helper. Never wired to runtime lifecycle.
"""
@@ -173,17 +244,11 @@ class TaskStore:
Each spec is a dict with keys: subject (required), description,
active_form, owner, metadata. Ids are assigned sequentially and
contiguously; if any task fails to write, an exception is raised
and the whole batch is rolled back (files unlinked, high-water-mark
kept at the prior value).
Atomic-or-none semantics matter for the tool surface: a failed
partial batch would leave the LLM reasoning about cleanup, which
defeats the point of batching as a single decision.
contiguously; if any spec is malformed the whole batch is
rejected before any write. The doc model makes "atomic-or-none"
free: we mutate one in-memory document and write it once.
"""
return await asyncio.to_thread(
self._create_tasks_batch_sync, task_list_id, specs
)
return await asyncio.to_thread(self._create_tasks_batch_sync, task_list_id, specs)
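# Caller-side sketch (spec keys per the docstring above; assuming the
# async wrapper around ``_create_tasks_batch_sync`` is named
# ``create_tasks_batch`` — its def line falls outside this hunk):
#
#     records = await store.create_tasks_batch(
#         "session:agent_x:sess_y",
#         [{"subject": "plan"}, {"subject": "build", "description": "..."}],
#     )
#     # ids come back contiguous; one tasks.json write commits the batch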
async def create_task(
self,
@@ -274,30 +339,212 @@ class TaskStore:
# Sync internals — all called via asyncio.to_thread
# =====================================================================
def _list_root(self, task_list_id: str) -> Path:
def _list_dir(self, task_list_id: str) -> Path:
return task_list_path(task_list_id, hive_root=self._hive_root)
def _tasks_dir(self, task_list_id: str) -> Path:
return self._list_root(task_list_id) / "tasks"
def _doc_path(self, task_list_id: str) -> Path:
return self._list_dir(task_list_id) / DOC_FILENAME
def _list_lock(self, task_list_id: str) -> FileLock:
# FileLock targets a sentinel file; it tolerates the file being absent
# by creating it on first acquire. We use the .lock filename so it's
# visible alongside the other list files.
root = self._list_root(task_list_id)
root.mkdir(parents=True, exist_ok=True)
return FileLock(str(root / ".lock"), timeout=LOCK_TIMEOUT_SECONDS)
def _list_lock(self, task_list_id: str):
"""Return a context manager that serialises writes to this list.
def _highwatermark_path(self, task_list_id: str) -> Path:
return self._list_root(task_list_id) / ".highwatermark"
Colony template lists need a cross-process ``FileLock`` because
``run_parallel_workers`` spawns worker subprocesses that stamp
completion back onto the template. Session/unscoped/_misc lists
have a single owning agent; only same-process concurrency
matters (e.g. parallel tool use within one turn), so an
in-memory ``threading.Lock`` is enough and avoids the visible
``tasks.json.lock`` sentinel beside session folders.
"""
d = self._list_dir(task_list_id)
d.mkdir(parents=True, exist_ok=True)
if task_list_id.startswith("colony:"):
return FileLock(str(d / LOCK_FILENAME), timeout=LOCK_TIMEOUT_SECONDS)
return _get_inproc_lock(task_list_id)
def _meta_path(self, task_list_id: str) -> Path:
return self._list_root(task_list_id) / "meta.json"
def _legacy_dir(self, task_list_id: str) -> Path:
return _legacy_root(task_list_id, hive_root=self._hive_root)
def _task_path(self, task_list_id: str, task_id: int) -> Path:
return self._tasks_dir(task_list_id) / f"{task_id:04d}.json"
def _legacy_meta_path(self, task_list_id: str) -> Path:
return self._legacy_dir(task_list_id) / "meta.json"
# ----- meta ---------------------------------------------------------
def _legacy_hwm_path(self, task_list_id: str) -> Path:
return self._legacy_dir(task_list_id) / ".highwatermark"
def _legacy_lock_path(self, task_list_id: str) -> Path:
return self._legacy_dir(task_list_id) / ".lock"
def _legacy_tasks_dir(self, task_list_id: str) -> Path:
return self._legacy_dir(task_list_id) / "tasks"
def _has_legacy_artifacts(self, task_list_id: str) -> bool:
if self._legacy_meta_path(task_list_id).exists():
return True
td = self._legacy_tasks_dir(task_list_id)
if td.exists():
try:
return any(p.suffix == ".json" for p in td.iterdir())
except OSError:
return False
return False
# ----- doc IO -------------------------------------------------------
def _read_doc_sync(self, task_list_id: str) -> TaskListDocument | None:
"""Lock-free read for already-migrated lists; falls back to a
lock-protected migration if only legacy artifacts exist.
Returns None if the list doesn't exist on disk in either form.
"""
doc_path = self._doc_path(task_list_id)
if doc_path.exists():
try:
return TaskListDocument.model_validate_json(doc_path.read_text(encoding="utf-8"))
except Exception:
logger.warning("Corrupt tasks.json at %s", doc_path, exc_info=True)
# Fall through — legacy fallback may rescue us.
if self._has_legacy_artifacts(task_list_id):
with self._list_lock(task_list_id):
# Re-check under lock: a parallel writer may have just
# finished migrating, in which case we read the new doc.
if doc_path.exists():
try:
return TaskListDocument.model_validate_json(doc_path.read_text(encoding="utf-8"))
except Exception:
logger.warning(
"Corrupt tasks.json at %s (post-lock)",
doc_path,
exc_info=True,
)
doc = self._migrate_legacy_unsafe(task_list_id)
if doc is not None:
self._write_doc_unsafe(task_list_id, doc)
self._cleanup_legacy_unsafe(task_list_id)
return doc
return None
def _read_doc_unsafe(self, task_list_id: str) -> TaskListDocument | None:
"""Same as ``_read_doc_sync`` but assumes the list-lock is already
held; used by methods that are already inside ``with self._list_lock``.
Migration happens in-place without re-entering the lock.
"""
doc_path = self._doc_path(task_list_id)
if doc_path.exists():
try:
return TaskListDocument.model_validate_json(doc_path.read_text(encoding="utf-8"))
except Exception:
logger.warning("Corrupt tasks.json at %s", doc_path, exc_info=True)
if self._has_legacy_artifacts(task_list_id):
doc = self._migrate_legacy_unsafe(task_list_id)
if doc is not None:
self._write_doc_unsafe(task_list_id, doc)
self._cleanup_legacy_unsafe(task_list_id)
return doc
return None
def _write_doc_unsafe(self, task_list_id: str, doc: TaskListDocument) -> None:
"""Atomically rewrite the doc. Caller MUST hold the list-lock."""
path = self._doc_path(task_list_id)
path.parent.mkdir(parents=True, exist_ok=True)
with atomic_write(path) as f:
f.write(doc.model_dump_json(indent=2))
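# ``atomic_write`` is imported from a shared helper outside this hunk. A
# minimal sketch of the usual shape of such a helper, stated as an
# assumption rather than the actual implementation (contextlib + os):
#
#     @contextmanager
#     def atomic_write(path: Path):
#         tmp = path.with_name(path.name + ".tmp")
#         with open(tmp, "w", encoding="utf-8") as f:
#             yield f
#             f.flush()
#             os.fsync(f.fileno())
#         os.replace(tmp, path)  # atomic swap: readers see old or new, never partial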
# ----- migration ----------------------------------------------------
def _migrate_legacy_unsafe(self, task_list_id: str) -> TaskListDocument | None:
"""Fold legacy artifacts into a TaskListDocument. Caller MUST hold lock."""
meta = self._read_legacy_meta(task_list_id)
if meta is None:
inferred_role = TaskListRole.TEMPLATE if task_list_id.startswith("colony:") else TaskListRole.SESSION
meta = TaskListMeta(task_list_id=task_list_id, role=inferred_role)
tasks: list[TaskRecord] = []
td = self._legacy_tasks_dir(task_list_id)
if td.exists():
for p in sorted(td.iterdir()):
if p.suffix != ".json":
continue
try:
tasks.append(TaskRecord.model_validate_json(p.read_text(encoding="utf-8")))
except Exception:
logger.warning(
"Skipping corrupt legacy task file %s during migration",
p,
exc_info=True,
)
tasks.sort(key=lambda r: r.id)
hwm = self._read_legacy_hwm(task_list_id)
max_id = max((r.id for r in tasks), default=0)
hwm = max(hwm, max_id)
if not tasks and hwm == 0 and not self._legacy_meta_path(task_list_id).exists():
return None
return TaskListDocument(
meta=meta,
highwatermark=hwm,
tasks=tasks,
)
def _read_legacy_meta(self, task_list_id: str) -> TaskListMeta | None:
path = self._legacy_meta_path(task_list_id)
if not path.exists():
return None
try:
return TaskListMeta.model_validate_json(path.read_text(encoding="utf-8"))
except Exception:
logger.warning("Corrupt legacy meta.json at %s", path, exc_info=True)
return None
def _read_legacy_hwm(self, task_list_id: str) -> int:
path = self._legacy_hwm_path(task_list_id)
if not path.exists():
return 0
try:
return int(path.read_text(encoding="utf-8").strip() or "0")
except (ValueError, OSError):
return 0
def _cleanup_legacy_unsafe(self, task_list_id: str) -> None:
"""Remove the older layout's files. Caller MUST hold the list-lock.
For session/colony/unscoped lists the legacy_dir is a dedicated
``tasks/`` subdir, so we remove the whole tree. For the ``_misc``
fallback, the legacy_dir is the same as the new parent dir; we
delete only the specific legacy filenames so we don't clobber
the new ``tasks.json``.
"""
legacy = self._legacy_dir(task_list_id)
if not legacy.exists():
return
if legacy != self._list_dir(task_list_id):
try:
shutil.rmtree(legacy)
except OSError:
logger.warning("Failed to remove legacy task dir %s", legacy, exc_info=True)
return
# _misc case: shared parent dir — surgical delete only.
for p in (
self._legacy_meta_path(task_list_id),
self._legacy_hwm_path(task_list_id),
self._legacy_lock_path(task_list_id),
):
try:
p.unlink(missing_ok=True)
except OSError:
logger.warning("Failed to remove %s", p, exc_info=True)
td = self._legacy_tasks_dir(task_list_id)
if td.exists():
try:
shutil.rmtree(td)
except OSError:
logger.warning("Failed to remove legacy tasks subdir %s", td, exc_info=True)
# ----- meta accessors over the doc ----------------------------------
def _ensure_task_list_sync(
self,
@@ -306,107 +553,47 @@ class TaskStore:
creator_agent_id: str | None,
session_id: str | None,
) -> TaskListMeta:
root = self._list_root(task_list_id)
root.mkdir(parents=True, exist_ok=True)
(root / "tasks").mkdir(exist_ok=True)
meta_path = self._meta_path(task_list_id)
with self._list_lock(task_list_id):
if meta_path.exists():
meta = self._read_meta_sync(task_list_id)
if meta is None:
# File existed but failed to parse — rewrite fresh.
meta = TaskListMeta(
task_list_id=task_list_id,
role=role,
creator_agent_id=creator_agent_id,
)
if session_id and session_id not in meta.last_seen_session_ids:
meta.last_seen_session_ids.append(session_id)
# Cap at 10 to keep the audit trail bounded.
meta.last_seen_session_ids = meta.last_seen_session_ids[-10:]
self._write_meta_sync(task_list_id, meta)
doc = self._read_doc_unsafe(task_list_id)
if doc is None:
meta = TaskListMeta(
task_list_id=task_list_id,
role=role,
creator_agent_id=creator_agent_id,
last_seen_session_ids=[session_id] if session_id else [],
)
doc = TaskListDocument(meta=meta)
self._write_doc_unsafe(task_list_id, doc)
return meta
meta = TaskListMeta(
task_list_id=task_list_id,
role=role,
creator_agent_id=creator_agent_id,
last_seen_session_ids=[session_id] if session_id else [],
)
self._write_meta_sync(task_list_id, meta)
meta = doc.meta
if session_id and session_id not in meta.last_seen_session_ids:
meta.last_seen_session_ids.append(session_id)
# Cap at 10 to keep the audit trail bounded.
meta.last_seen_session_ids = meta.last_seen_session_ids[-10:]
self._write_doc_unsafe(task_list_id, doc)
return meta
def _read_meta_sync(self, task_list_id: str) -> TaskListMeta | None:
path = self._meta_path(task_list_id)
if not path.exists():
return None
try:
return TaskListMeta.model_validate_json(path.read_text(encoding="utf-8"))
except Exception:
logger.warning("Corrupt meta.json at %s", path, exc_info=True)
return None
def _write_meta_sync(self, task_list_id: str, meta: TaskListMeta) -> None:
path = self._meta_path(task_list_id)
path.parent.mkdir(parents=True, exist_ok=True)
with atomic_write(path) as f:
f.write(meta.model_dump_json(indent=2))
doc = self._read_doc_sync(task_list_id)
return doc.meta if doc is not None else None
# ----- task IO ------------------------------------------------------
def _read_task_sync(self, task_list_id: str, task_id: int) -> TaskRecord | None:
path = self._task_path(task_list_id, task_id)
if not path.exists():
doc = self._read_doc_sync(task_list_id)
if doc is None:
return None
try:
return TaskRecord.model_validate_json(path.read_text(encoding="utf-8"))
except Exception:
logger.warning("Corrupt task file at %s", path, exc_info=True)
return None
def _write_task_sync(self, task_list_id: str, record: TaskRecord) -> None:
path = self._task_path(task_list_id, record.id)
path.parent.mkdir(parents=True, exist_ok=True)
with atomic_write(path) as f:
f.write(record.model_dump_json(indent=2))
for r in doc.tasks:
if r.id == task_id:
return r
return None
def _list_tasks_sync(self, task_list_id: str) -> list[TaskRecord]:
d = self._tasks_dir(task_list_id)
if not d.exists():
doc = self._read_doc_sync(task_list_id)
if doc is None:
return []
records: list[TaskRecord] = []
for path in sorted(d.iterdir()):
if path.suffix != ".json":
continue
try:
records.append(TaskRecord.model_validate_json(path.read_text(encoding="utf-8")))
except Exception:
logger.warning("Skipping corrupt task file %s", path, exc_info=True)
records.sort(key=lambda r: r.id)
return records
# ----- highwatermark / id assignment --------------------------------
def _read_highwatermark_sync(self, task_list_id: str) -> int:
path = self._highwatermark_path(task_list_id)
if not path.exists():
return 0
try:
return int(path.read_text(encoding="utf-8").strip() or "0")
except (ValueError, OSError):
return 0
def _write_highwatermark_sync(self, task_list_id: str, value: int) -> None:
path = self._highwatermark_path(task_list_id)
path.parent.mkdir(parents=True, exist_ok=True)
with atomic_write(path) as f:
f.write(str(value))
def _next_id_sync(self, task_list_id: str) -> int:
"""Compute next id under the assumption the list-lock is held."""
existing = self._list_tasks_sync(task_list_id)
max_existing = max((r.id for r in existing), default=0)
floor = self._read_highwatermark_sync(task_list_id)
return max(max_existing, floor) + 1
return sorted(doc.tasks, key=lambda r: r.id)
# ----- create -------------------------------------------------------
@@ -420,20 +607,11 @@ class TaskStore:
metadata: dict[str, Any],
) -> TaskRecord:
with self._list_lock(task_list_id):
# Lazy-create meta.json on first task. Session lists are
# frequently created via the first task_create (no explicit
# ensure_task_list call); without this backfill the REST
# endpoint can't discover them. Role is inferred from prefix.
if not self._meta_path(task_list_id).exists():
doc = self._read_doc_unsafe(task_list_id)
if doc is None:
inferred_role = TaskListRole.TEMPLATE if task_list_id.startswith("colony:") else TaskListRole.SESSION
self._write_meta_sync(
task_list_id,
TaskListMeta(
task_list_id=task_list_id,
role=inferred_role,
),
)
new_id = self._next_id_sync(task_list_id)
doc = TaskListDocument(meta=TaskListMeta(task_list_id=task_list_id, role=inferred_role))
new_id = self._next_id_for_doc(doc)
now = time.time()
record = TaskRecord(
id=new_id,
@@ -446,11 +624,10 @@ class TaskStore:
created_at=now,
updated_at=now,
)
self._write_task_sync(task_list_id, record)
# Bump high-water-mark eagerly so even a concurrent racer that
# somehow missed the listing snapshot can't pick the same id.
if new_id > self._read_highwatermark_sync(task_list_id):
self._write_highwatermark_sync(task_list_id, new_id)
doc.tasks.append(record)
if new_id > doc.highwatermark:
doc.highwatermark = new_id
self._write_doc_unsafe(task_list_id, doc)
return record
def _create_tasks_batch_sync(
@@ -467,19 +644,12 @@ class TaskStore:
raise ValueError(f"specs[{i}].subject must be a non-empty string")
with self._list_lock(task_list_id):
# Same lazy meta backfill as _create_task_sync.
if not self._meta_path(task_list_id).exists():
inferred_role = (
TaskListRole.TEMPLATE
if task_list_id.startswith("colony:")
else TaskListRole.SESSION
)
self._write_meta_sync(
task_list_id,
TaskListMeta(task_list_id=task_list_id, role=inferred_role),
)
doc = self._read_doc_unsafe(task_list_id)
if doc is None:
inferred_role = TaskListRole.TEMPLATE if task_list_id.startswith("colony:") else TaskListRole.SESSION
doc = TaskListDocument(meta=TaskListMeta(task_list_id=task_list_id, role=inferred_role))
base_id = self._next_id_sync(task_list_id)
base_id = self._next_id_for_doc(doc)
now = time.time()
records: list[TaskRecord] = []
for offset, spec in enumerate(specs):
@@ -496,27 +666,20 @@ class TaskStore:
)
records.append(rec)
# Write all task files; on any failure, unlink everything we
# wrote so far and re-raise. High-water-mark is bumped only
# after a successful full-batch write.
written: list[Path] = []
try:
for rec in records:
self._write_task_sync(task_list_id, rec)
written.append(self._task_path(task_list_id, rec.id))
except Exception:
for path in written:
try:
path.unlink(missing_ok=True)
except OSError:
logger.warning("Failed to roll back batch task at %s", path, exc_info=True)
raise
doc.tasks.extend(records)
highest = records[-1].id
if highest > self._read_highwatermark_sync(task_list_id):
self._write_highwatermark_sync(task_list_id, highest)
if highest > doc.highwatermark:
doc.highwatermark = highest
# Single write — atomic batch is free with the doc model.
self._write_doc_unsafe(task_list_id, doc)
return records
# ----- id assignment ------------------------------------------------
def _next_id_for_doc(self, doc: TaskListDocument) -> int:
max_existing = max((r.id for r in doc.tasks), default=0)
return max(max_existing, doc.highwatermark) + 1
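# Worked example (made-up state): tasks with ids [1, 2, 5] and
# highwatermark 7 give max(5, 7) + 1 == 8, so id 6 or 7 is never handed
# out again even though no live task currently holds it.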
# ----- update -------------------------------------------------------
def _update_task_sync(
@@ -533,12 +696,15 @@ class TaskStore:
metadata_patch: dict[str, Any] | None,
) -> tuple[TaskRecord | None, list[str]]:
with self._list_lock(task_list_id):
current = self._read_task_sync(task_list_id, task_id)
if current is None:
doc = self._read_doc_unsafe(task_list_id)
if doc is None:
return None, []
return self._update_task_unsafe(
task_list_id,
current,
target = next((r for r in doc.tasks if r.id == task_id), None)
if target is None:
return None, []
new, changed = self._update_task_in_doc(
doc,
target,
subject=subject,
description=description,
active_form=active_form,
@@ -548,10 +714,13 @@ class TaskStore:
add_blocked_by=add_blocked_by,
metadata_patch=metadata_patch,
)
if changed:
self._write_doc_unsafe(task_list_id, doc)
return new, changed
def _update_task_unsafe(
def _update_task_in_doc(
self,
task_list_id: str,
doc: TaskListDocument,
current: TaskRecord,
*,
subject: str | None = None,
@@ -563,84 +732,89 @@ class TaskStore:
add_blocked_by: list[int] | None = None,
metadata_patch: dict[str, Any] | None = None,
) -> tuple[TaskRecord, list[str]]:
"""Update without acquiring the list-lock. Caller MUST hold it."""
"""Mutate ``current`` in place inside ``doc`` and return (record, changed).
Bidirectional blocks/blocked_by also mutate the targets in ``doc``.
"""
changed: list[str] = []
new = current.model_copy(deep=True)
if subject is not None and subject != new.subject:
new.subject = subject
if subject is not None and subject != current.subject:
current.subject = subject
changed.append("subject")
if description is not None and description != new.description:
new.description = description
if description is not None and description != current.description:
current.description = description
changed.append("description")
if active_form is not None and active_form != new.active_form:
new.active_form = active_form
if active_form is not None and active_form != current.active_form:
current.active_form = active_form
changed.append("active_form")
if not isinstance(owner, _Unset) and owner != new.owner:
new.owner = owner
if not isinstance(owner, _Unset) and owner != current.owner:
current.owner = owner
changed.append("owner")
if status is not None and status != new.status:
new.status = status
if status is not None and status != current.status:
current.status = status
changed.append("status")
if add_blocks:
for b in add_blocks:
if b not in new.blocks and b != new.id:
new.blocks.append(b)
if "blocks" not in changed:
changed.append("blocks")
# Maintain the bidirectional invariant by stamping
# blocked_by on the target as well.
target = self._read_task_sync(task_list_id, b)
if target and new.id not in target.blocked_by:
target.blocked_by.append(new.id)
target.updated_at = time.time()
self._write_task_sync(task_list_id, target)
if b in current.blocks or b == current.id:
continue
current.blocks.append(b)
if "blocks" not in changed:
changed.append("blocks")
target = next((r for r in doc.tasks if r.id == b), None)
if target is not None and current.id not in target.blocked_by:
target.blocked_by.append(current.id)
target.updated_at = time.time()
if add_blocked_by:
for b in add_blocked_by:
if b not in new.blocked_by and b != new.id:
new.blocked_by.append(b)
if "blocked_by" not in changed:
changed.append("blocked_by")
target = self._read_task_sync(task_list_id, b)
if target and new.id not in target.blocks:
target.blocks.append(new.id)
target.updated_at = time.time()
self._write_task_sync(task_list_id, target)
if b in current.blocked_by or b == current.id:
continue
current.blocked_by.append(b)
if "blocked_by" not in changed:
changed.append("blocked_by")
target = next((r for r in doc.tasks if r.id == b), None)
if target is not None and current.id not in target.blocks:
target.blocks.append(current.id)
target.updated_at = time.time()
if metadata_patch is not None:
md = dict(new.metadata)
md = dict(current.metadata)
for k, v in metadata_patch.items():
if v is None:
md.pop(k, None)
else:
md[k] = v
if md != new.metadata:
new.metadata = md
if md != current.metadata:
current.metadata = md
changed.append("metadata")
if not changed:
return new, []
return current, []
new.updated_at = time.time()
self._write_task_sync(task_list_id, new)
return new, changed
current.updated_at = time.time()
return current, changed
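# Invariant sketch (hypothetical ids): after
#     self._update_task_in_doc(doc, task1, add_blocks=[2])
# task 1 carries ``blocks == [2]`` and task 2 carries ``blocked_by == [1]``;
# both sides land in the same single doc write, so they cannot diverge.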
# ----- delete -------------------------------------------------------
def _delete_task_sync(self, task_list_id: str, task_id: int) -> tuple[bool, list[int]]:
with self._list_lock(task_list_id):
path = self._task_path(task_list_id, task_id)
if not path.exists():
doc = self._read_doc_unsafe(task_list_id)
if doc is None:
return False, []
# 1. Bump high-water-mark BEFORE unlinking so a crash mid-delete
# can't accidentally re-allocate the id.
current_floor = self._read_highwatermark_sync(task_list_id)
if task_id > current_floor:
self._write_highwatermark_sync(task_list_id, task_id)
# 2. Unlink the task itself.
path.unlink()
idx = next((i for i, r in enumerate(doc.tasks) if r.id == task_id), None)
if idx is None:
return False, []
# 1. Bump high-water-mark BEFORE removing so a crash mid-write
# can't cause id reuse on the next create. (atomic_write
# guarantees we either commit the whole new state or none.)
if task_id > doc.highwatermark:
doc.highwatermark = task_id
# 2. Remove the task itself.
doc.tasks.pop(idx)
# 3. Cascade: strip references from all other tasks.
cascaded: list[int] = []
for other in self._list_tasks_sync(task_list_id):
now = time.time()
for other in doc.tasks:
touched = False
if task_id in other.blocks:
other.blocks = [b for b in other.blocks if b != task_id]
@@ -649,31 +823,31 @@ class TaskStore:
other.blocked_by = [b for b in other.blocked_by if b != task_id]
touched = True
if touched:
other.updated_at = time.time()
self._write_task_sync(task_list_id, other)
other.updated_at = now
cascaded.append(other.id)
self._write_doc_unsafe(task_list_id, doc)
return True, cascaded
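# Delete sketch (made-up ids): removing task 3 from a doc where task 4 has
# ``blocked_by == [3]`` bumps ``doc.highwatermark`` to at least 3 (the id
# is retired), pops task 3, strips the reference from task 4, and returns
# ``(True, [4])``.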
# ----- reset --------------------------------------------------------
def _reset_sync(self, task_list_id: str) -> None:
with self._list_lock(task_list_id):
tasks = self._list_tasks_sync(task_list_id)
max_id = max((r.id for r in tasks), default=0)
floor = self._read_highwatermark_sync(task_list_id)
new_floor = max(max_id, floor)
self._write_highwatermark_sync(task_list_id, new_floor)
d = self._tasks_dir(task_list_id)
if d.exists():
for p in d.iterdir():
if p.suffix == ".json":
p.unlink()
doc = self._read_doc_unsafe(task_list_id)
if doc is None:
return
max_id = max((r.id for r in doc.tasks), default=0)
doc.highwatermark = max(doc.highwatermark, max_id)
doc.tasks = []
self._write_doc_unsafe(task_list_id, doc)
# ----- claim --------------------------------------------------------
def _claim_sync(self, task_list_id: str, task_id: int, claimant: str) -> ClaimResult:
with self._list_lock(task_list_id):
current = self._read_task_sync(task_list_id, task_id)
doc = self._read_doc_unsafe(task_list_id)
if doc is None:
return ClaimNotFound(kind="not_found")
current = next((r for r in doc.tasks if r.id == task_id), None)
if current is None:
return ClaimNotFound(kind="not_found")
if current.status == TaskStatus.COMPLETED:
@@ -682,12 +856,13 @@ class TaskStore:
return ClaimAlreadyOwned(kind="already_owned", by=current.owner)
unresolved_blockers: list[int] = []
for b in current.blocked_by:
blocker = self._read_task_sync(task_list_id, b)
blocker = next((r for r in doc.tasks if r.id == b), None)
if blocker is not None and blocker.status != TaskStatus.COMPLETED:
unresolved_blockers.append(b)
if unresolved_blockers:
return ClaimBlocked(kind="blocked", by=unresolved_blockers)
new, _ = self._update_task_unsafe(task_list_id, current, owner=claimant)
new, _ = self._update_task_in_doc(doc, current, owner=claimant)
self._write_doc_unsafe(task_list_id, doc)
return ClaimOk(kind="ok", record=new)
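# Handling sketch for the four ClaimResult variants constructed above
# (callers normally go through the async facade; its def line is outside
# this hunk):
#
#     result = store._claim_sync(list_id, 7, claimant="worker_1")
#     if result.kind == "ok":              ...  # result.record: claimed task
#     elif result.kind == "blocked":       ...  # result.by: unresolved blocker ids
#     elif result.kind == "already_owned": ...  # result.by: current owner
#     elif result.kind == "not_found":     ...  # list or task id missing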
@@ -713,10 +888,32 @@ def get_task_store() -> TaskStore:
# Convenience for tests / utilities.
def fingerprint_for_test(task_list_id: str, hive_root: Path) -> Iterable[Path]:
"""Yield every file under a list root — used by tests to assert
"""Yield every task-list-related file — used by tests to assert
byte-equivalence pre/post shutdown.
Includes the doc + lock and any legacy leftovers (so this still works
while a list is mid-migration).
"""
root = task_list_path(task_list_id, hive_root=hive_root)
if not root.exists():
files: list[Path] = []
base = task_list_path(task_list_id, hive_root=hive_root)
if not base.exists():
return []
return sorted(root.rglob("*"))
doc = base / DOC_FILENAME
if doc.exists():
files.append(doc)
lock = base / LOCK_FILENAME
if lock.exists():
files.append(lock)
legacy = _legacy_root(task_list_id, hive_root=hive_root)
if legacy.exists() and legacy != base:
files.extend(sorted(legacy.rglob("*")))
elif legacy.exists():
# _misc fallback: include only legacy filenames
for name in ("meta.json", ".highwatermark", ".lock"):
p = legacy / name
if p.exists():
files.append(p)
td = legacy / "tasks"
if td.exists():
files.extend(sorted(td.rglob("*")))
return sorted(files)
+111 -2
@@ -7,6 +7,7 @@ primitives the rest of the system relies on.
from __future__ import annotations
import asyncio
import json
from pathlib import Path
import pytest
@@ -263,11 +264,119 @@ async def test_ensure_task_list_caps_history(store: TaskStore, list_id: str) ->
@pytest.mark.asyncio
async def test_colony_path(store: TaskStore, tmp_path: Path) -> None:
await store.ensure_task_list("colony:abc", role=TaskListRole.TEMPLATE)
assert (tmp_path / "colonies" / "abc" / "tasks" / "meta.json").exists()
assert (tmp_path / "colonies" / "abc" / "tasks.json").exists()
@pytest.mark.asyncio
async def test_session_path(store: TaskStore, tmp_path: Path) -> None:
await store.ensure_task_list("session:agent_x:sess_y", role=TaskListRole.SESSION)
p = tmp_path / "agents" / "agent_x" / "sessions" / "sess_y" / "tasks" / "meta.json"
p = tmp_path / "agents" / "agent_x" / "sessions" / "sess_y" / "tasks.json"
assert p.exists()
@pytest.mark.asyncio
async def test_canonical_queen_session_dir_wins(store: TaskStore, tmp_path: Path) -> None:
"""When ``agents/queens/{name}/sessions/{sid}/`` exists on disk, the task
doc lands there beside conversations/events/summary instead of in
the orphaned ``agents/{agent_id}/sessions/{sid}/`` location.
"""
sid = "session_20260429_test"
canonical = tmp_path / "agents" / "queens" / "queen_growth" / "sessions" / sid
canonical.mkdir(parents=True)
# Pretend the rest of the session is here.
(canonical / "events.jsonl").write_text("", encoding="utf-8")
list_id = f"session:queen:{sid}"
await store.ensure_task_list(list_id, role=TaskListRole.SESSION)
rec = await store.create_task(list_id, subject="hello")
assert (canonical / "tasks.json").exists()
assert not (tmp_path / "agents" / "queen" / "sessions" / sid / "tasks.json").exists()
fetched = await store.list_tasks(list_id)
assert [r.id for r in fetched] == [rec.id]
# ---------------------------------------------------------------------------
# Lazy migration from the older fan-out layout
# ---------------------------------------------------------------------------
def _seed_legacy_session(tmp_path: Path, agent: str, sess: str, n_tasks: int) -> Path:
"""Hand-craft an older ``{root}/tasks/`` layout the way it used to live
on disk, so we can prove the lazy migration folds it correctly.
"""
legacy = tmp_path / "agents" / agent / "sessions" / sess / "tasks"
(legacy / "tasks").mkdir(parents=True)
list_id = f"session:{agent}:{sess}"
(legacy / "meta.json").write_text(
json.dumps(
{
"task_list_id": list_id,
"role": "session",
"creator_agent_id": None,
"created_at": 1000.0,
"last_seen_session_ids": ["s1"],
"schema_version": 1,
}
),
encoding="utf-8",
)
(legacy / ".highwatermark").write_text(str(n_tasks), encoding="utf-8")
(legacy / ".lock").write_text("", encoding="utf-8")
for i in range(1, n_tasks + 1):
(legacy / "tasks" / f"{i:04d}.json").write_text(
json.dumps(
{
"id": i,
"subject": f"legacy {i}",
"description": "",
"active_form": None,
"owner": None,
"status": "pending",
"blocks": [],
"blocked_by": [],
"metadata": {},
"created_at": 1000.0 + i,
"updated_at": 1000.0 + i,
}
),
encoding="utf-8",
)
return legacy
@pytest.mark.asyncio
async def test_legacy_layout_migrates_on_first_read(store: TaskStore, tmp_path: Path) -> None:
legacy = _seed_legacy_session(tmp_path, "agent_z", "sess_z", 3)
list_id = "session:agent_z:sess_z"
# First read should fold the legacy fan-out into tasks.json.
records = await store.list_tasks(list_id)
assert [r.id for r in records] == [1, 2, 3]
assert [r.subject for r in records] == ["legacy 1", "legacy 2", "legacy 3"]
# New doc exists; the legacy dir is gone.
new_doc = tmp_path / "agents" / "agent_z" / "sessions" / "sess_z" / "tasks.json"
assert new_doc.exists()
assert not legacy.exists()
# Highwatermark is preserved — next id is 4, not 1.
new_rec = await store.create_task(list_id, subject="post-migration")
assert new_rec.id == 4
@pytest.mark.asyncio
async def test_legacy_layout_migrates_on_first_write(store: TaskStore, tmp_path: Path) -> None:
_seed_legacy_session(tmp_path, "agent_w", "sess_w", 2)
list_id = "session:agent_w:sess_w"
# Update a legacy task — must trigger migration, then mutate.
new, changed = await store.update_task(list_id, 2, status=TaskStatus.IN_PROGRESS)
assert new is not None
assert changed == ["status"]
assert new.status == TaskStatus.IN_PROGRESS
# Doc reflects both legacy tasks.
listed = await store.list_tasks(list_id)
assert len(listed) == 2
@pytest.mark.asyncio
async def test_legacy_list_exists(store: TaskStore, tmp_path: Path) -> None:
_seed_legacy_session(tmp_path, "agent_q", "sess_q", 1)
assert await store.list_exists("session:agent_q:sess_q")
+10 -14
View File
@@ -125,8 +125,7 @@ def _create_batch_schema() -> dict[str, Any]:
"type": "array",
"minItems": 1,
"description": (
"Array of task specs. Each becomes one task with a "
"sequential id. Atomic — all created or none."
"Array of task specs. Each becomes one task with a sequential id. Atomic — all created or none."
),
"items": {
"type": "object",
@@ -138,9 +137,7 @@ def _create_batch_schema() -> dict[str, Any]:
"description": {"type": "string"},
"active_form": {
"type": "string",
"description": (
"Present-continuous label shown while in_progress."
),
"description": ("Present-continuous label shown while in_progress."),
},
"metadata": {"type": "object"},
},
@@ -160,6 +157,9 @@ _CREATE_DESC = (
"Create ONE task on your own session task list. Use this for one-off "
"mid-run additions when you discover unplanned work after the initial "
"plan is laid out.\n\n"
"**After receiving new instructions, immediately capture the user's "
"requirements as tasks** — and delete (via `task_update` with "
"status='deleted') any prior tasks that no longer apply.\n\n"
"**For laying out a multi-step plan upfront, use `task_create_batch` "
"instead** — one tool call with all the steps is cheaper and atomic.\n\n"
"Fields:\n"
@@ -179,7 +179,9 @@ _UPDATE_DESC = (
"- Mark it `completed` AS SOON as you finish it — do not let "
"multiple finished tasks pile up unmarked before flushing them at "
"the end of the run.\n"
"- Set status='deleted' to drop a task that's no longer relevant.\n\n"
"- Delete tasks: when a task is no longer relevant or was created "
"in error. Setting status='deleted' **permanently** removes the "
"task — the id is retired and cannot be reused.\n\n"
"ONLY mark `completed` when the task is FULLY done. If you hit errors, "
"blockers, or partial state, keep it `in_progress` and create a new "
"task describing what's blocking. Never mark completed with caveats; "
@@ -312,10 +314,7 @@ def _make_create_batch_executor(store: TaskStore):
await store.delete_task(list_id, r.id)
return {
"success": False,
"error": (
f"Hook blocked task #{rec.id} ({rec.subject!r}); "
f"entire batch rolled back: {exc}"
),
"error": (f"Hook blocked task #{rec.id} ({rec.subject!r}); entire batch rolled back: {exc}"),
}
for rec in recs:
@@ -334,10 +333,7 @@ def _make_create_batch_executor(store: TaskStore):
"success": True,
"task_list_id": list_id,
"task_ids": ids,
"message": (
f"Created {len(ids)} task(s): {range_label}. "
f"Mark #{ids[0]} in_progress before starting it."
),
"message": (f"Created {len(ids)} task(s): {range_label}. Mark #{ids[0]} in_progress before starting it."),
"tasks": [_serialize_task(r) for r in recs],
}
File diff suppressed because it is too large
+40 -5
@@ -9,13 +9,21 @@ write. Errors are silently swallowed — this must never break the agent.
import json
import logging
import os
from datetime import datetime
from datetime import UTC, datetime
from pathlib import Path
from typing import IO, Any
logger = logging.getLogger(__name__)
_LLM_DEBUG_DIR = Path.home() / ".hive" / "llm_logs"
def _llm_debug_dir() -> Path:
"""Resolve $HIVE_HOME/llm_logs lazily so the env override (set by the
desktop) takes effect. A module-level constant would freeze whatever
HIVE_HOME was at import time and miss late-bound test overrides."""
from framework.config import HIVE_HOME
return HIVE_HOME / "llm_logs"
_log_file: IO[str] | None = None
_log_ready = False # lazy init guard
@@ -23,13 +31,36 @@ _log_ready = False # lazy init guard
def _open_log() -> IO[str] | None:
"""Open the JSONL log file for this process."""
_LLM_DEBUG_DIR.mkdir(parents=True, exist_ok=True)
debug_dir = _llm_debug_dir()
debug_dir.mkdir(parents=True, exist_ok=True)
ts = datetime.now().strftime("%Y%m%d_%H%M%S")
path = _LLM_DEBUG_DIR / f"{ts}.jsonl"
path = debug_dir / f"{ts}.jsonl"
logger.info("LLM debug log → %s", path)
return open(path, "a", encoding="utf-8") # noqa: SIM115
def _serialize_tools(tools: Any) -> list[dict[str, Any]]:
"""Reduce a list of Tool dataclasses to the schema fields shown to the LLM.
Best-effort: unknown shapes fall back to ``str()`` so logging never raises.
"""
if not tools:
return []
out: list[dict[str, Any]] = []
for tool in tools:
try:
out.append(
{
"name": getattr(tool, "name", ""),
"description": getattr(tool, "description", ""),
"parameters": getattr(tool, "parameters", {}) or {},
}
)
except Exception:
out.append({"name": str(tool)})
return out
def log_llm_turn(
*,
node_id: str,
@@ -42,6 +73,7 @@ def log_llm_turn(
tool_calls: list[dict[str, Any]],
tool_results: list[dict[str, Any]],
token_counts: dict[str, Any],
tools: list[Any] | None = None,
) -> None:
"""Write one JSONL line capturing a complete LLM turn.
@@ -58,12 +90,15 @@ def log_llm_turn(
if _log_file is None:
return
record = {
"timestamp": datetime.now().isoformat(),
# UTC + offset matches tool_call start_timestamp (agent_loop.py)
# so the viewer can render every event in one consistent local zone.
"timestamp": datetime.now(UTC).isoformat(),
"node_id": node_id,
"stream_id": stream_id,
"execution_id": execution_id,
"iteration": iteration,
"system_prompt": system_prompt,
"tools": _serialize_tools(tools),
"messages": messages,
"assistant_text": assistant_text,
"tool_calls": tool_calls,
+1264 -2
File diff suppressed because it is too large
+2
@@ -11,7 +11,9 @@
},
"dependencies": {
"clsx": "^2.1.1",
"echarts": "^5.6.0",
"lucide-react": "^0.575.0",
"mermaid": "^11.14.0",
"react": "^18.3.1",
"react-dom": "^18.3.1",
"react-markdown": "^10.1.0",
+2
@@ -3,6 +3,7 @@ import AppLayout from "./layouts/AppLayout";
import Home from "./pages/home";
import ColonyChat from "./pages/colony-chat";
import QueenDM from "./pages/queen-dm";
import QueenRouting from "./pages/queen-routing";
import OrgChart from "./pages/org-chart";
import PromptLibrary from "./pages/prompt-library";
import SkillsLibrary from "./pages/skills-library";
@@ -16,6 +17,7 @@ function App() {
<Route element={<AppLayout />}>
<Route path="/" element={<Home />} />
<Route path="/colony/:colonyId" element={<ColonyChat />} />
<Route path="/queen-routing" element={<QueenRouting />} />
<Route path="/queen/:queenId" element={<QueenDM />} />
<Route path="/org-chart" element={<OrgChart />} />
<Route path="/skills-library" element={<SkillsLibrary />} />
+14
@@ -28,6 +28,16 @@ export interface McpServerTools {
tools: Array<ToolMeta & { enabled: boolean }>;
}
export interface ToolCategory {
/** Category id (e.g. "spreadsheet_advanced", "browser_basic"). */
name: string;
/** Concrete tool names that belong to this category, after expansion
* of any ``@server:NAME`` shorthands against the live MCP catalog. */
tools: string[];
/** True when this category contributes to the queen's role-based default. */
in_role_default: boolean;
}
export interface QueenToolsResponse {
queen_id: string;
enabled_mcp_tools: string[] | null;
@@ -39,6 +49,10 @@ export interface QueenToolsResponse {
lifecycle: ToolMeta[];
synthetic: ToolMeta[];
mcp_servers: McpServerTools[];
/** Curated category groupings (file_ops, browser_basic, security, …)
* with resolved tool members. ``in_role_default`` flags categories
* baked into this queen's default allowlist. */
categories: ToolCategory[];
}
export interface QueenToolsUpdateResult {
+83 -54
@@ -13,6 +13,9 @@ import {
} from "lucide-react";
import WorkerRunBubble from "@/components/WorkerRunBubble";
import type { WorkerRunGroup } from "@/components/WorkerRunBubble";
import ChartToolDetail, {
type ChartToolEntry,
} from "@/components/charts/ChartToolDetail";
export interface ImageContent {
type: "image_url";
@@ -91,6 +94,10 @@ interface ChatPanelProps {
activeThread: string;
/** When true, the input is disabled (e.g. during loading) */
disabled?: boolean;
/** When true, only the send button is locked — the textarea stays typable.
* Used during new-session bootstrap so the user can compose a follow-up
* while the queen finishes warming up / streaming her first reply. */
sendLocked?: boolean;
/** When false, the image attach button is hidden (model lacks vision support) */
supportsImages?: boolean;
/** Called when user clicks the stop button to cancel the queen's current turn */
@@ -201,7 +208,7 @@ export function toolHex(name: string): string {
}
export function ToolActivityRow({ content }: { content: string }) {
let tools: { name: string; done: boolean }[] = [];
let tools: ChartToolEntry[] = [];
try {
const parsed = JSON.parse(content);
tools = parsed.tools || [];
@@ -235,52 +242,65 @@ export function ToolActivityRow({ content }: { content: string }) {
if (counts.done > 0) donePills.push({ name, count: counts.done });
}
// Per-call chart embeds: chart_render's result envelope carries the
// spec back, so the chat renders the same chart the server
// rasterized to PNG. Other tools stay pill-only by design.
const chartDetails = tools.filter((t) => t.name.startsWith("chart_"));
return (
<div className="flex gap-3 pl-10">
<div className="flex flex-wrap items-center gap-1.5">
{runningPills.map((p) => {
const hex = toolHex(p.name);
return (
<span
key={`run-${p.name}`}
className="inline-flex items-center gap-1 text-[11px] px-2.5 py-0.5 rounded-full"
style={{
color: hex,
backgroundColor: `${hex}18`,
border: `1px solid ${hex}35`,
}}
>
<Loader2 className="w-2.5 h-2.5 animate-spin" />
{p.name}
{p.count > 1 && (
<span className="text-[10px] font-medium opacity-70">
×{p.count}
</span>
)}
</span>
);
})}
{donePills.map((p) => {
const hex = toolHex(p.name);
return (
<span
key={`done-${p.name}`}
className="inline-flex items-center gap-1 text-[11px] px-2.5 py-0.5 rounded-full"
style={{
color: hex,
backgroundColor: `${hex}18`,
border: `1px solid ${hex}35`,
}}
>
<Check className="w-2.5 h-2.5" />
{p.name}
{p.count > 1 && (
<span className="text-[10px] opacity-80">×{p.count}</span>
)}
</span>
);
})}
<div className="flex flex-col gap-0.5">
<div className="flex gap-3 pl-10">
<div className="flex flex-wrap items-center gap-1.5">
{runningPills.map((p) => {
const hex = toolHex(p.name);
return (
<span
key={`run-${p.name}`}
className="inline-flex items-center gap-1 text-[11px] px-2.5 py-0.5 rounded-full"
style={{
color: hex,
backgroundColor: `${hex}18`,
border: `1px solid ${hex}35`,
}}
>
<Loader2 className="w-2.5 h-2.5 animate-spin" />
{p.name}
{p.count > 1 && (
<span className="text-[10px] font-medium opacity-70">
×{p.count}
</span>
)}
</span>
);
})}
{donePills.map((p) => {
const hex = toolHex(p.name);
return (
<span
key={`done-${p.name}`}
className="inline-flex items-center gap-1 text-[11px] px-2.5 py-0.5 rounded-full"
style={{
color: hex,
backgroundColor: `${hex}18`,
border: `1px solid ${hex}35`,
}}
>
<Check className="w-2.5 h-2.5" />
{p.name}
{p.count > 1 && (
<span className="text-[10px] opacity-80">×{p.count}</span>
)}
</span>
);
})}
</div>
</div>
{chartDetails.map((t, idx) => (
<ChartToolDetail
key={t.callKey ?? `${t.name}-${idx}`}
entry={t}
/>
))}
</div>
);
}
@@ -916,6 +936,7 @@ export default function ChatPanel({
isBusy,
activeThread,
disabled,
sendLocked,
onCancel,
onSteer,
onCancelQueued,
@@ -1401,8 +1422,10 @@ export default function ChatPanel({
);
})}
{/* Show typing indicator while waiting for first queen response (disabled + empty chat) */}
{(isWaiting || (disabled && threadMessages.length === 0)) && (
{/* Show typing indicator while waiting for first queen response
(disabled / sendLocked + empty chat counts as warm-up). */}
{(isWaiting ||
((disabled || sendLocked) && threadMessages.length === 0)) && (
<div className="flex gap-3">
<div
className="flex-shrink-0 w-9 h-9 rounded-xl flex items-center justify-center overflow-hidden"
@@ -1669,9 +1692,11 @@ export default function ChatPanel({
placeholder={
disabled
? "Connecting to agent..."
: isBusy
? "Queue a message — or click Steer to inject now..."
: "Message Queen Bee..."
: sendLocked
? "Type ahead — send unlocks once the queen is ready..."
: isBusy
? "Queue a message — or click Steer to inject now..."
: "Message Queen Bee..."
}
disabled={disabled}
className="flex-1 bg-transparent text-sm text-foreground outline-none placeholder:text-muted-foreground disabled:opacity-50 disabled:cursor-not-allowed resize-none overflow-y-auto"
@@ -1689,12 +1714,16 @@ export default function ChatPanel({
<button
type="submit"
disabled={
(!input.trim() && pendingImages.length === 0) || disabled
(!input.trim() && pendingImages.length === 0) ||
disabled ||
sendLocked
}
title={
isBusy
? "Queue message — sent after the current turn, or click Steer on the bubble to send now"
: "Send"
sendLocked
? "Hold tight — the queen is starting up. Send unlocks once she's ready."
: isBusy
? "Queue message — sent after the current turn, or click Steer on the bubble to send now"
: "Send"
}
className={`p-2 rounded-lg disabled:opacity-30 hover:opacity-90 transition-opacity ${
isBusy
+92 -12
@@ -8,7 +8,7 @@ import {
Wrench,
AlertCircle,
} from "lucide-react";
import type { ToolMeta, McpServerTools } from "@/api/queens";
import type { ToolMeta, McpServerTools, ToolCategory } from "@/api/queens";
/** Shape every Tools section (Queen / Colony) shares. */
export interface ToolsSnapshot {
@@ -17,11 +17,86 @@ export interface ToolsSnapshot {
lifecycle: ToolMeta[];
synthetic: ToolMeta[];
mcp_servers: McpServerTools[];
/** Optional: curated category groupings (queens only today). When
* present, tools that belong to a category are grouped under that
* category instead of their MCP server. */
categories?: ToolCategory[];
/** Optional: when true, the allowlist came from the role-based
* default (no explicit save). Only queens surface this today. */
is_role_default?: boolean;
}
type ToolWithEnabled = ToolMeta & { enabled: boolean };
interface RenderGroup {
/** Stable key for expansion state and React keys. */
key: string;
/** Display title shown in the collapsible header. */
title: string;
tools: ToolWithEnabled[];
}
/** Snake_case / kebab-case → Title Case for category labels so they
* read naturally next to MCP server names. */
function formatCategoryTitle(name: string): string {
return name
.split(/[_-]+/)
.filter((w) => w.length > 0)
.map((w) => w.charAt(0).toUpperCase() + w.slice(1))
.join(" ");
}
/** Build display groups with the priority: category → MCP server → "Other tools".
* A tool that belongs to multiple categories lands in the first one (input order). */
function buildGroups(
mcpServers: McpServerTools[],
categories: ToolCategory[] | undefined,
): RenderGroup[] {
const toolCategory = new Map<string, string>();
categories?.forEach((cat) => {
cat.tools.forEach((toolName) => {
if (!toolCategory.has(toolName)) toolCategory.set(toolName, cat.name);
});
});
const groupMap = new Map<string, RenderGroup>();
// Pre-seed category groups in their original order so categories
// come before MCP servers regardless of which tool we encounter first.
categories?.forEach((cat) => {
groupMap.set(`cat:${cat.name}`, {
key: `cat:${cat.name}`,
title: formatCategoryTitle(cat.name),
tools: [],
});
});
mcpServers.forEach((srv) => {
srv.tools.forEach((t) => {
const cat = toolCategory.get(t.name);
let key: string;
let title: string;
if (cat) {
key = `cat:${cat}`;
title = formatCategoryTitle(cat);
} else if (srv.name && srv.name !== "(unknown)") {
key = `srv:${srv.name}`;
title = srv.name;
} else {
key = "other";
title = "Other tools";
}
let group = groupMap.get(key);
if (!group) {
group = { key, title, tools: [] };
groupMap.set(key, group);
}
group.tools.push(t);
});
});
return Array.from(groupMap.values()).filter((g) => g.tools.length > 0);
}
export interface ToolsEditorProps {
/** Stable identifier — refetches when it changes. */
subjectKey: string;
@@ -219,6 +294,11 @@ export default function ToolsEditor({
return s;
}, [data]);
const groups = useMemo(
() => (data ? buildGroups(data.mcp_servers, data.categories) : []),
[data],
);
const dirty = useMemo(() => {
const a = draftAllowed;
const b = baselineRef.current;
@@ -401,10 +481,10 @@ export default function ToolsEditor({
</CollapsibleGroup>
)}
{data.mcp_servers.map((srv) => {
const toolNames = srv.tools.map((t) => t.name);
{groups.map((group) => {
const toolNames = group.tools.map((t) => t.name);
const state = triStateForServer(toolNames, draftAllowed);
const enabledInServer =
const enabledInGroup =
draftAllowed === null
? toolNames.length
: toolNames.reduce(
@@ -413,13 +493,13 @@ export default function ToolsEditor({
);
return (
<CollapsibleGroup
key={srv.name}
title={srv.name === "(unknown)" ? "MCP Tools" : srv.name}
count={srv.tools.length}
badge={`${enabledInServer}/${srv.tools.length}`}
expanded={!!expanded[srv.name]}
key={group.key}
title={group.title}
count={group.tools.length}
badge={`${enabledInGroup}/${group.tools.length}`}
expanded={!!expanded[group.key]}
onToggle={() =>
setExpanded((p) => ({ ...p, [srv.name]: !p[srv.name] }))
setExpanded((p) => ({ ...p, [group.key]: !p[group.key] }))
}
leading={
<TriStateCheckbox
@@ -429,12 +509,12 @@ export default function ToolsEditor({
}
>
<div className="flex flex-col">
{srv.tools.map((t) => {
{group.tools.map((t) => {
const enabled =
draftAllowed === null ? true : draftAllowed.has(t.name);
return (
<ToolRow
key={`${srv.name}-${t.name}`}
key={`${group.key}-${t.name}`}
name={t.name}
description={t.description}
enabled={enabled}
@@ -0,0 +1,158 @@
/**
* Per-call detail row for ``chart_*`` tool calls.
*
* The canonical embedding mechanism: when the agent invokes
* ``chart_render``, the runtime stores the result envelope in
* ``events.jsonl``; ``chat-helpers.replayEvent`` retains it and the
* chat panel dispatches it here. We read ``result.spec`` and mount
* the live renderer; ``result.file_url`` becomes the download link.
*
* Rules baked in:
* - The chart is reconstructed FROM THE TOOL RESULT, not from any
* markdown fence the agent might have written. Calling the tool
* IS the embedding — there's nothing else to remember.
* - The chart survives session reload because the spec lives in
* events.jsonl alongside the tool_call_completed event.
* - The downloadable PNG lives at ``result.file_url`` (a ``file://``
* URI on the runtime host). The web frontend can't open file://
* directly; we surface ``file_path`` as text and give a Copy
* button so the user can paste it into a file manager. (The
* desktop renderer has an Electron IPC bridge — not available
* in OSS.)
*/
import { lazy, Suspense, useState } from "react";
import { Copy, Loader2, Check } from "lucide-react";
// Lazy chunks so non-chart messages don't drag in echarts/mermaid.
const EChartsBlock = lazy(() => import("./EChartsBlock"));
const MermaidBlock = lazy(() => import("./MermaidBlock"));
export interface ChartToolEntry {
name: string;
done: boolean;
args?: unknown;
result?: unknown;
isError?: boolean;
callKey?: string;
}
interface ChartResult {
kind?: "echarts" | "mermaid";
spec?: unknown;
file_path?: string;
file_url?: string;
title?: string;
error?: string;
// Width/height come back from the server tool but are NOT displayed
// in the footer. Kept here so the live in-chat render can match the
// spec's native aspect ratio instead of forcing a 16:9 box that
// clips wide dashboards.
width?: number;
height?: number;
}
function asResult(v: unknown): ChartResult {
if (v && typeof v === "object") return v as ChartResult;
return {};
}
export default function ChartToolDetail({ entry }: { entry: ChartToolEntry }) {
const [copyState, setCopyState] = useState<"idle" | "copied">("idle");
// Still running: show a tiny inline spinner. Charts render fast (a
// few hundred ms), so a full skeleton would flash and feel janky.
if (!entry.done) {
return (
<div className="pl-10 mt-1.5">
<div className="flex items-center gap-2 text-[11px] text-muted-foreground">
<Loader2 className="w-3 h-3 animate-spin shrink-0" />
<span>rendering chart</span>
</div>
</div>
);
}
const result = asResult(entry.result);
if (result.error) {
// Errors are intentionally NOT shown to the user — the agent sees
// them in the tool result envelope and is expected to retry with a
// fixed spec.
return null;
}
const kind = result.kind;
const spec = result.spec;
if (!kind || spec === undefined) {
return null;
}
const handleCopyPath = async () => {
if (!result.file_path) return;
try {
await navigator.clipboard.writeText(result.file_path);
setCopyState("copied");
window.setTimeout(() => setCopyState("idle"), 2000);
} catch {
// Clipboard API unavailable (insecure context); silently no-op.
}
};
// Honor the spec's native aspect ratio when both dimensions are
// known (the server tool always returns them).
const aspectRatio =
result.width && result.height ? result.width / result.height : undefined;
return (
<div className="pl-10 mt-1.5 max-w-5xl">
<Suspense
fallback={
<div className="flex items-center gap-2 text-[11px] text-muted-foreground">
<Loader2 className="w-3 h-3 animate-spin" />
<span>loading chart engine</span>
</div>
}
>
{kind === "echarts" ? (
<EChartsBlock spec={spec} aspectRatio={aspectRatio} />
) : kind === "mermaid" ? (
<MermaidBlock source={typeof spec === "string" ? spec : ""} />
) : (
<div className="text-[11px] text-muted-foreground">
unknown chart kind: {String(kind)}
</div>
)}
</Suspense>
{/* Footer: title + path-copy. The PNG lives on the runtime host;
web browsers can't open file:// URIs from a hosted page, so
we surface the path as a copyable string instead of a fake
Download button. */}
<div className="flex items-center justify-between mt-2 px-1 text-[10.5px] text-muted-foreground/80">
<span className="truncate min-w-0 flex-1">
{result.title || kind}
</span>
{result.file_path && (
<button
type="button"
onClick={handleCopyPath}
className="inline-flex items-center gap-1 hover:text-foreground transition shrink-0 cursor-pointer"
title={
copyState === "copied"
? "Copied to clipboard"
: `Copy path: ${result.file_path}`
}
>
{copyState === "copied" ? (
<Check className="w-3 h-3 text-primary" />
) : (
<Copy className="w-3 h-3" />
)}
{copyState === "copied" ? "Copied" : "Copy path"}
</button>
)}
</div>
</div>
);
}
@@ -0,0 +1,232 @@
/**
* Live ECharts renderer for the chat bubble.
*
* Mounts an ECharts instance into a sized div and feeds it the spec
* the agent passed to `chart_render`. The same spec is rendered
* server-side to a PNG; the live render is the in-chat experience,
* the PNG is the downloadable.
*
* - Lazy-loaded via dynamic import so non-chart messages don't pay
* the ~1 MB bundle cost.
* - SVG renderer (`{ renderer: 'svg' }`) for crisp scaling and lower
* memory than canvas. Looks identical at chat-bubble sizes.
* - Resize handled via ResizeObserver; charts adapt to the bubble's
* width while keeping a fixed aspect ratio.
* - Error boundary inside the component itself: invalid specs render
* a tiny "spec invalid" pill with a copy-spec button so the agent
* can self-correct on the next turn.
*/
import { useEffect, useRef, useState } from "react";
import { AlertCircle } from "lucide-react";
import { buildOpenHiveTheme } from "./openhiveTheme";
interface Props {
spec: unknown;
/** Aspect ratio kept while the width adapts to the bubble. Defaults
* to 16:9 — the standard chart shape that fits in slide decks. */
aspectRatio?: number;
/** Hard cap on rendered height (px). Prevents very-tall charts from
* dominating the chat scroll. */
maxHeight?: number;
}
const _themeRegistered: Record<"light" | "dark", boolean> = {
light: false,
dark: false,
};
/**
* Detect the user's current UI theme from the DOM. The OpenHive
* desktop app applies a `dark` class to <html> in dark mode (see
* index.css). We use the same signal here so the live chart matches
* the surrounding chat — neither the agent nor the caller picks the
* theme, and the PNG download is rendered server-side from the same
* source of truth (HIVE_DESKTOP_THEME env, set by Electron from
* nativeTheme.shouldUseDarkColors).
*/
function useDocumentTheme(): "light" | "dark" {
const [theme, setTheme] = useState<"light" | "dark">(() =>
document.documentElement.classList.contains("dark") ? "dark" : "light",
);
useEffect(() => {
const obs = new MutationObserver(() => {
setTheme(
document.documentElement.classList.contains("dark") ? "dark" : "light",
);
});
obs.observe(document.documentElement, {
attributes: true,
attributeFilter: ["class"],
});
return () => obs.disconnect();
}, []);
return theme;
}
export default function EChartsBlock({
spec,
aspectRatio = 16 / 9,
maxHeight = 480,
}: Props) {
const containerRef = useRef<HTMLDivElement>(null);
const chartRef = useRef<unknown>(null); // echarts.ECharts instance, kept untyped to avoid coupling the type import
const [error, setError] = useState<string | null>(null);
// Theme follows the user's OpenHive UI mode automatically. Same
// signal feeds the server-side PNG render via HIVE_DESKTOP_THEME, so
// live chart and downloaded file always match.
const theme = useDocumentTheme();
useEffect(() => {
if (!containerRef.current) return;
let disposed = false;
let resizeObserver: ResizeObserver | null = null;
(async () => {
try {
const echarts = await import("echarts");
if (disposed || !containerRef.current) return;
// Register the OpenHive brand theme once per (theme, mode) so
// bar/line/etc. inherit our palette + cozy spacing instead of
// ECharts' generic-web-2010 defaults. Theme matches the
// server-side render via tools/src/chart_tools/theme.py.
const themeName = theme === "dark" ? "openhive-dark" : "openhive-light";
if (!_themeRegistered[theme]) {
echarts.registerTheme(themeName, buildOpenHiveTheme(theme));
_themeRegistered[theme] = true;
}
// Coerce string specs to objects (defensive — the agent should
// pass dicts but LLMs sometimes serialize before sending).
let parsedSpec: Record<string, unknown>;
if (typeof spec === "string") {
try {
parsedSpec = JSON.parse(spec);
} catch {
throw new Error("spec is a string and not valid JSON");
}
} else {
parsedSpec = spec as Record<string, unknown>;
}
// Disjoint-region layout policy. ECharts has no auto-layout
// for component overlap (verified against the option ref):
// title/legend/grid are absolutely positioned and ignore each
// other. We enforce three non-overlapping regions:
// - Title: anchored to TOP (top:16, no bottom)
// - Legend: anchored to BOTTOM (bottom:16, no top) except
// when orient:'vertical' (side legend stays where placed)
// - Grid: middle, with containLabel for axis labels
// Strips user-supplied vertical positions so an agent spec
// like `legend.top:"8%"` (which lands inside the title at
// chat-bubble dimensions — the 2026-05-01 bug) can't collide.
// Horizontal anchoring is preserved so left-aligned legends
// still work. Must mirror chart_tools/renderer.py exactly so
// the live chart and downloaded PNG look the same.
const userTitle = (parsedSpec.title as Record<string, unknown> | undefined) ?? {};
const userLegend = parsedSpec.legend as Record<string, unknown> | undefined;
const userGrid = (parsedSpec.grid as Record<string, unknown> | undefined) ?? {};
const legendVertical = userLegend?.orient === "vertical";
const stripV = (o: Record<string, unknown>) => {
const c = { ...o };
delete c.top;
delete c.bottom;
return c;
};
const normalizedSpec: Record<string, unknown> = {
...parsedSpec,
title: { left: "center", ...stripV(userTitle), top: 16 },
grid: {
left: 56,
right: 56,
...stripV(userGrid),
// Force vertical bounds — user-supplied grid.top/bottom
// (often percentage strings like "8%" the agent picks at
// default dims) don't generalize across chat-bubble sizes.
// 96 covers: bottom legend (~36) + xAxis name (containLabel
// handles tick labels but NOT axis name; outerBoundsMode is
// v6+ and we're on v5). 40 when no legend.
top: 64,
bottom: userLegend && !legendVertical ? 96 : 40,
containLabel: true,
},
};
if (userLegend) {
const legendDefaults = {
icon: "roundRect",
itemWidth: 12,
itemHeight: 12,
itemGap: 16,
};
normalizedSpec.legend = legendVertical
? { ...legendDefaults, ...userLegend }
: { ...legendDefaults, ...stripV(userLegend), bottom: 16 };
}
// Fresh chart instance per spec; cheaper than reuse + setOption
// for our sizes and avoids stale state between specs.
const chart = echarts.init(containerRef.current, themeName, {
renderer: "svg",
});
chartRef.current = chart;
chart.setOption(normalizedSpec, {
notMerge: true,
lazyUpdate: false,
});
// Resize on container size change.
resizeObserver = new ResizeObserver(() => {
if (chartRef.current && containerRef.current) {
(chartRef.current as { resize: () => void }).resize();
}
});
resizeObserver.observe(containerRef.current);
} catch (e) {
if (!disposed) {
setError(e instanceof Error ? e.message : String(e));
}
}
})();
return () => {
disposed = true;
if (resizeObserver) resizeObserver.disconnect();
if (chartRef.current) {
try {
(chartRef.current as { dispose: () => void }).dispose();
} catch {
// best-effort cleanup
}
chartRef.current = null;
}
};
}, [spec, theme]);
if (error) {
return (
<div
className="flex items-center gap-2 text-[11px] text-muted-foreground px-2.5 py-1.5 rounded-md border border-border/40 bg-muted/30"
role="alert"
>
<AlertCircle className="w-3 h-3 shrink-0" />
<span>chart spec invalid: {error}</span>
</div>
);
}
return (
<div
ref={containerRef}
// Transparent background so the chart blends with the chat bubble
// instead of sitting in an obtrusive white card. The OpenHive
// ECharts theme also sets backgroundColor: 'transparent' so the
// chart itself is see-through. Subtle rounded corners only.
className="w-full rounded-lg bg-transparent"
style={{
// Reserve aspect-ratio space so the chart doesn't pop in.
// ECharts will overwrite the inline style as it lays out.
aspectRatio,
maxHeight,
}}
/>
);
}
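For quick reference, a minimal mount of the component above — the spec literal is illustrative; only the `spec` / `aspectRatio` / `maxHeight` props come from the component's actual interface:

```tsx
import EChartsBlock from "./EChartsBlock";

// Illustrative ECharts option object. Vertical positions the caller
// sets (title/legend/grid top+bottom) get overridden by the
// normalization pass inside the component.
const demoSpec = {
  title: { text: "Weekly sessions" },
  xAxis: { type: "category", data: ["Mon", "Tue", "Wed", "Thu", "Fri"] },
  yAxis: { type: "value" },
  series: [{ type: "bar", data: [120, 200, 150, 80, 170] }],
};

export function ChartPreview() {
  // 16:9 and 480 are already the defaults; shown here for clarity.
  return <EChartsBlock spec={demoSpec} aspectRatio={16 / 9} maxHeight={480} />;
}
```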
@@ -0,0 +1,81 @@
/**
* Live Mermaid renderer for the chat bubble.
*
* Renders a mermaid source string to an inline SVG. Mermaid is
* lazy-loaded so non-diagram messages don't pay the ~600 KB cost.
*
* Theme follows the OpenHive light/dark setting. Errors render a
* tiny pill so the agent gets feedback for the next turn.
*/
import { useEffect, useRef, useState } from "react";
import { AlertCircle } from "lucide-react";
interface Props {
source: string;
theme?: "light" | "dark";
}
// Track which theme mermaid was initialized with so a light/dark
// flip re-initializes instead of being silently ignored.
let _mermaidInitializedTheme: "light" | "dark" | null = null;
export default function MermaidBlock({ source, theme = "light" }: Props) {
const ref = useRef<HTMLDivElement>(null);
const [error, setError] = useState<string | null>(null);
useEffect(() => {
let disposed = false;
(async () => {
try {
const mermaid = (await import("mermaid")).default;
        if (_mermaidInitializedTheme !== theme) {
          mermaid.initialize({
            startOnLoad: false,
            theme: theme === "dark" ? "dark" : "default",
            securityLevel: "loose",
            fontFamily:
              "'Inter Tight', -apple-system, BlinkMacSystemFont, system-ui, sans-serif",
          });
          _mermaidInitializedTheme = theme;
        }
if (disposed || !ref.current) return;
// Unique id per render to avoid conflicting injected styles.
const id = `mmd-${Math.random().toString(36).slice(2, 10)}`;
const { svg } = await mermaid.render(id, source);
if (disposed || !ref.current) return;
ref.current.innerHTML = svg;
} catch (e) {
if (!disposed) {
setError(e instanceof Error ? e.message : String(e));
}
}
})();
return () => {
disposed = true;
};
}, [source, theme]);
if (error) {
return (
<div
className="flex items-center gap-2 text-[11px] text-muted-foreground px-2.5 py-1.5 rounded-md border border-border/40 bg-muted/30"
role="alert"
>
<AlertCircle className="w-3 h-3 shrink-0" />
<span>diagram syntax invalid: {error}</span>
</div>
);
}
return (
<div
ref={ref}
// Match EChartsBlock: transparent so the diagram blends with the
// chat bubble; rounded corners and inner padding give breathing
// room without adding a visible card.
className="w-full overflow-x-auto rounded-lg bg-transparent p-4 [&_svg]:max-w-full [&_svg]:h-auto [&_svg]:mx-auto"
/>
);
}
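A matching usage sketch for the diagram renderer; the source string is illustrative, and invalid syntax would surface the error pill instead of an SVG:

```tsx
import MermaidBlock from "./MermaidBlock";

// Illustrative flowchart source mirroring the chart_render dispatch.
const flow = `graph LR
  A[chart_render] --> B{kind?}
  B -->|echarts| C[EChartsBlock]
  B -->|mermaid| D[MermaidBlock]`;

export function DiagramPreview() {
  return <MermaidBlock source={flow} theme="dark" />;
}
```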
@@ -0,0 +1,131 @@
/**
* OpenHive ECharts theme — must stay in sync with
* tools/src/chart_tools/theme.py on the runtime side.
*
* Same palette + spacing both for the live in-chat ECharts mount
* (see EChartsBlock.tsx) and the headless server-side render that
* produces the downloadable PNG. Without this both diverge: the chat
* shows ECharts default colors and the PNG shows OpenHive colors,
* confusing the user.
*/
const PALETTE_LIGHT = [
"#db6f02", // honey orange (primary)
"#456a8d", // slate blue
"#3d7a4a", // sage green
"#a8453d", // terracotta brick
"#c48820", // warm bronze
"#5d5b88", // indigo
"#7d6b51", // olive
"#8e4200", // rust
];
const PALETTE_DARK = [
"#ffb825",
"#7ba2c4",
"#7bb285",
"#d97470",
"#e0a83a",
"#9892c4",
"#b8a685",
"#d97e3a",
];
export function buildOpenHiveTheme(theme: "light" | "dark" = "light") {
const isDark = theme === "dark";
const fg = isDark ? "#e8e6e0" : "#1a1a1a";
const fgMuted = isDark ? "#8a8a8a" : "#6b6b6b";
const gridLine = isDark ? "#2a2724" : "#ebe9e2";
const axisLine = isDark ? "#3a3733" : "#d0cfca";
const tooltipBg = isDark ? "#181715" : "#ffffff";
const tooltipBorder = isDark ? "#2a2724" : "#d0cfca";
const palette = isDark ? PALETTE_DARK : PALETTE_LIGHT;
return {
color: palette,
backgroundColor: "transparent",
textStyle: {
fontFamily:
'"Inter Tight", -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, "Helvetica Neue", Arial, sans-serif',
color: fg,
fontSize: 12,
},
title: {
left: "center",
top: 28,
textStyle: { color: fg, fontSize: 16, fontWeight: 600 },
subtextStyle: { color: fgMuted, fontSize: 12 },
},
legend: {
top: 64,
icon: "roundRect",
itemWidth: 12,
itemHeight: 12,
itemGap: 20,
textStyle: { color: fgMuted, fontSize: 12 },
},
grid: {
top: 116,
left: 48,
right: 48,
bottom: 72,
containLabel: true,
},
categoryAxis: {
axisLine: { show: true, lineStyle: { color: axisLine } },
axisTick: { show: false },
axisLabel: { color: fgMuted, fontSize: 11, margin: 14 },
splitLine: { show: false },
nameLocation: "middle",
nameGap: 36,
nameTextStyle: { color: fgMuted, fontSize: 12 },
},
valueAxis: {
axisLine: { show: false },
axisTick: { show: false },
axisLabel: { color: fgMuted, fontSize: 11, margin: 14 },
splitLine: { lineStyle: { color: gridLine, type: "dashed" } },
nameLocation: "middle",
nameGap: 42,
nameTextStyle: { color: fgMuted, fontSize: 12, fontWeight: 500 },
// Don't auto-rotate value-axis names — the theme can't tell xAxis
// (horizontal bar) from yAxis (vertical bar), so rotating both at
// 90° vertical-mounts the xAxis name on horizontal-bar charts and
// it collides with the legend (peer_val regression). Let specs
// set nameRotate explicitly when they want a vertical y-name.
},
timeAxis: {
axisLine: { show: true, lineStyle: { color: axisLine } },
axisLabel: { color: fgMuted, fontSize: 11, margin: 14 },
splitLine: { show: false },
nameLocation: "middle",
nameGap: 36,
nameTextStyle: { color: fgMuted, fontSize: 12 },
},
tooltip: {
backgroundColor: tooltipBg,
borderColor: tooltipBorder,
borderWidth: 1,
padding: [8, 12],
textStyle: { color: fg, fontSize: 12 },
axisPointer: {
lineStyle: { color: axisLine, type: "dashed" },
crossStyle: { color: axisLine },
},
},
bar: { itemStyle: { borderRadius: [3, 3, 0, 0] } },
line: {
lineStyle: { width: 2.5 },
symbol: "circle",
symbolSize: 6,
},
candlestick: {
itemStyle: {
color: "#3d7a4a",
color0: "#a8453d",
borderColor: "#3d7a4a",
borderColor0: "#a8453d",
},
},
};
}
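For reference, this is how the theme is consumed; the sequence mirrors the registration in EChartsBlock.tsx above, with a static import standing in for the dynamic one:

```ts
import * as echarts from "echarts";
import { buildOpenHiveTheme } from "./openhiveTheme";

// Register once per mode, then init with the theme name. Assumes a
// #chart element exists; EChartsBlock uses its own ref'd div instead.
echarts.registerTheme("openhive-dark", buildOpenHiveTheme("dark"));
const chart = echarts.init(document.getElementById("chart")!, "openhive-dark", {
  renderer: "svg",
});
chart.setOption({
  xAxis: { type: "category", data: ["a", "b", "c"] },
  yAxis: { type: "value" },
  series: [{ type: "line", data: [1, 2, 3] }],
});
```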
+35 -3
@@ -309,12 +309,44 @@ describe("sseEventToChatMessage", () => {
expect(result!.id).toMatch(/^stream-t-\d+-chat$/);
});
it("returns null for client_input_requested (handled in workspace.tsx)", () => {
it("converts single client_input_requested question to a queen-style bubble", () => {
const event = makeEvent({
type: "client_input_requested",
node_id: "chat",
node_id: "queen",
execution_id: "abc",
data: { prompt: "What next?" },
data: {
questions: [{ id: "q0", prompt: "Which folder?" }],
},
});
const result = sseEventToChatMessage(event, "t");
expect(result).not.toBeNull();
expect(result!.content).toBe("Which folder?");
expect(result!.id).toMatch(/^ask-user-abc-/);
});
it("converts multi-question client_input_requested to a numbered list", () => {
const event = makeEvent({
type: "client_input_requested",
node_id: "queen",
execution_id: "abc",
data: {
questions: [
{ id: "q0", prompt: "Which folder?" },
{ id: "q1", prompt: "Which date range?" },
],
},
});
const result = sseEventToChatMessage(event, "t");
expect(result).not.toBeNull();
expect(result!.content).toBe("1. Which folder?\n2. Which date range?");
});
it("returns null for client_input_requested with no questions (auto-wait park)", () => {
const event = makeEvent({
type: "client_input_requested",
node_id: "queen",
execution_id: "abc",
data: {},
});
expect(sseEventToChatMessage(event, "t")).toBeNull();
});
+124 -11
@@ -140,9 +140,47 @@ export function sseEventToChatMessage(
};
}
case "client_input_requested":
// Handled explicitly in handleSSEEvent (workspace.tsx) for queen input widgets.
return null;
case "client_input_requested": {
// Surface the question(s) as a queen bubble in the chat history so the
// transcript records what was asked alongside the user's answer. The
// input widget at the bottom of the panel still drives the actual
// answer flow — this bubble is read-only context.
const rawQuestions = event.data?.questions;
if (!Array.isArray(rawQuestions) || rawQuestions.length === 0) return null;
const prompts: string[] = [];
for (const q of rawQuestions) {
if (!q || typeof q !== "object") continue;
const qo = q as Record<string, unknown>;
const prompt =
typeof qo.prompt === "string"
? qo.prompt
: typeof qo.question === "string"
? (qo.question as string)
: null;
if (prompt) prompts.push(prompt);
}
if (prompts.length === 0) return null;
const content =
prompts.length === 1
? prompts[0]
: prompts.map((p, i) => `${i + 1}. ${p}`).join("\n");
return {
// Stable per-request id so live + replay paths upsert the same row.
id: `ask-user-${event.execution_id ?? ""}-${event.timestamp ?? createdAt}`,
agent: agentDisplayName || event.node_id || "Agent",
agentColor: "",
content,
timestamp: "",
// Default to worker; the replayEvent wrapper upgrades to "queen"
// when stream_id === "queen". Mirrors llm_text_delta's pattern.
role: "worker",
thread,
createdAt,
nodeId: event.node_id || undefined,
executionId: event.execution_id || undefined,
streamId: event.stream_id || undefined,
};
}
case "client_input_received": {
const userContent = (event.data?.content as string) || "";
@@ -257,12 +295,40 @@ export function sseEventToChatMessage(
* deferred `tool_call_completed` events can find the exact pill they belong
* to after the turn counter moves on.
*/
/**
* For chart_* tools we retain the args (from tool_call_started) and
* result envelope (from tool_call_completed) so the chat panel can
* render the live chart inline from the same spec the runtime
* rasterized to PNG. Other tools omit these fields to keep the
* tool_status content payload small (catalogs are pill-only).
*/
type ToolEntry = {
name: string;
done: boolean;
/** opaque per-call id surfaced to the UI; used to key React rows */
callKey?: string;
/** present only for tools whose name matches shouldRetainDetail */
args?: unknown;
result?: unknown;
isError?: boolean;
};
type ToolRowState = {
streamId: string;
executionId: string;
tools: Record<string, { name: string; done: boolean }>;
tools: Record<string, ToolEntry>;
};
/**
* Names whose detail (args + result envelope) we surface in the chat.
* Other tools stay pill-only — keeping their args/results out of the
* message content avoids ballooning the chat history with tool
* catalogs, file blobs, etc.
*/
function shouldRetainDetail(toolName: string): boolean {
return toolName.startsWith("chart_");
}
export interface ReplayState {
turnCounters: Record<string, number>;
toolRows: Record<string, ToolRowState>;
@@ -311,10 +377,20 @@ function toolLookupKey(
}
function toolRowContent(row: ToolRowState): string {
const tools = Object.values(row.tools).map((t) => ({
name: t.name,
done: t.done,
}));
const tools = Object.values(row.tools).map((t) => {
const out: ToolEntry = { name: t.name, done: t.done };
// Carry callKey + retained fields only for tools whose detail the
// UI mounts (chart_*). Pill-only tools stay terse so the
// tool_status payload doesn't grow with every catalog/file_ops
// call and existing snapshot tests stay valid.
if (shouldRetainDetail(t.name)) {
if (t.callKey !== undefined) out.callKey = t.callKey;
if (t.args !== undefined) out.args = t.args;
if (t.result !== undefined) out.result = t.result;
if (t.isError !== undefined) out.isError = t.isError;
}
return out;
});
const allDone = tools.length > 0 && tools.every((t) => t.done);
return JSON.stringify({ tools, allDone });
}
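Concretely, the `tool_status` content produced above parses to a shape like this; the field names come from `ToolEntry`, while the values are illustrative:

```ts
// Parsed tool_status content after a chart_render call completes.
// Pill-only tools stay terse ({ name, done }); chart_* entries retain
// callKey/args/result/isError per shouldRetainDetail.
const parsed = {
  tools: [
    { name: "read_file", done: true },
    {
      name: "chart_render",
      done: true,
      callKey: "toolu_abc123",                      // opaque per-call id
      args: { /* tool_input captured at start */ },
      result: { /* parsed envelope, e.g. title / file_path */ },
      isError: false,
    },
  ],
  allDone: true,
};
```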
@@ -379,10 +455,19 @@ export function replayEvent(
tools: {},
});
const toolKey = toolUseId || `anonymous-${Object.keys(row.tools).length}`;
row.tools[toolKey] = {
const entry: ToolEntry = {
name: toolName,
done: false,
callKey: toolKey,
};
// Capture args at start for retained-detail tools so the chat
// can show what the agent rendered. Other tools' arguments are
// intentionally dropped to keep the tool_status JSON small.
if (shouldRetainDetail(toolName)) {
const toolInput = event.data?.tool_input;
if (toolInput !== undefined) entry.args = toolInput;
}
row.tools[toolKey] = entry;
if (toolUseId) {
state.toolUseToPill[toolLookupKey(streamId, event.execution_id, toolUseId)] = {
msgId: pillId,
@@ -415,10 +500,38 @@ export function replayEvent(
if (!tracked) break;
const row = state.toolRows[tracked.msgId];
if (!row) break;
row.tools[tracked.toolKey] = {
name: row.tools[tracked.toolKey]?.name || tracked.name,
const prior = row.tools[tracked.toolKey];
const completedName = prior?.name || tracked.name;
const completed: ToolEntry = {
name: completedName,
done: true,
callKey: tracked.toolKey,
};
// Preserve any args captured at start; capture the result
// envelope for retained-detail tools (chart_* needs spec/file_url
// to mount the live chart).
if (shouldRetainDetail(completedName)) {
if (prior?.args !== undefined) completed.args = prior.args;
const rawResult = event.data?.result;
if (rawResult !== undefined) {
// The framework serializes envelopes as JSON strings. Try to
// parse so the renderer can pick fields cheaply; fall back to
// the raw value when parsing fails (already-an-object or
// non-JSON string).
if (typeof rawResult === "string") {
try {
completed.result = JSON.parse(rawResult);
} catch {
completed.result = rawResult;
}
} else {
completed.result = rawResult;
}
}
const isErr = event.data?.is_error;
if (typeof isErr === "boolean") completed.isError = isErr;
}
row.tools[tracked.toolKey] = completed;
out.push({
id: tracked.msgId,
agent: effectiveName || event.node_id || "Agent",
+31
@@ -326,6 +326,11 @@ export default function ColonyChat() {
// client_input_requested so we don't flicker the typing bubble off while
// the queen is about to resume on the flushed input.
const queenAboutToResumeRef = useRef(false);
// Question bubble for an ask_user that's actively awaiting an answer.
// Stashed instead of pushed into messages so the user only sees ONE copy
// of the question (the popup widget) while answering. Committed to the
// transcript on client_input_received so it lands above the user's reply.
const pendingAskUserBubbleRef = useRef<ChatMessage | null>(null);
const suppressIntroRef = useRef(false);
const loadingRef = useRef(false);
@@ -710,8 +715,34 @@ export default function ColonyChat() {
case "client_input_received":
case "client_input_requested":
case "llm_text_delta": {
// Defer the queen's ask_user bubble so it doesn't render alongside
// the popup widget. Stash on request, commit on receive — see
// pendingAskUserBubbleRef declaration above for rationale.
let stashedAskUserBubble: ChatMessage | null = null;
if (
event.type === "client_input_requested" &&
isQueen &&
emittedMessages.length > 0
) {
const rawQuestions = event.data?.questions;
if (Array.isArray(rawQuestions) && rawQuestions.length > 0) {
stashedAskUserBubble = emittedMessages[0];
pendingAskUserBubbleRef.current = stashedAskUserBubble;
}
}
if (
event.type === "client_input_received" &&
pendingAskUserBubbleRef.current &&
!suppressQueenMessages
) {
// Commit the stashed bubble first; createdAt predates this
// event so timestamp-ordered insert places it above the answer.
upsertMessage(pendingAskUserBubbleRef.current);
pendingAskUserBubbleRef.current = null;
}
if (!suppressQueenMessages) {
for (const msg of emittedMessages) {
if (msg === stashedAskUserBubble) continue;
if (isQueen) {
msg.phase = queenPhaseRef.current as ChatMessage["phase"];
}
+15 -39
@@ -1,8 +1,8 @@
import { useState, useRef } from "react";
import { useNavigate } from "react-router-dom";
import { Loader2, Send } from "lucide-react";
import { messagesApi } from "@/api/messages";
import { Send } from "lucide-react";
import { useColony } from "@/context/ColonyContext";
import { PENDING_CLASSIFY_KEY } from "./queen-routing";
const promptHints = [
"Check my inbox for urgent emails",
@@ -13,32 +13,25 @@ const promptHints = [
export default function Home() {
const navigate = useNavigate();
const { userProfile, refresh } = useColony();
const { userProfile } = useColony();
const [inputValue, setInputValue] = useState("");
const [submitting, setSubmitting] = useState(false);
const [activePrompt, setActivePrompt] = useState<string | null>(null);
const textareaRef = useRef<HTMLTextAreaElement>(null);
const displayName = userProfile.displayName || "there";
const startQueenSession = async (text: string) => {
// Stash the prompt and bounce to /queen-routing immediately. The classify
// LLM call (2-5s) runs on the routing screen rather than blocking nav, so
// the user never watches a spinner on the home page.
const startQueenSession = (text: string) => {
const trimmed = text.trim();
if (!trimmed || submitting) return;
setSubmitting(true);
setActivePrompt(trimmed);
if (!trimmed) return;
try {
const { queen_id } = await messagesApi.classify(trimmed);
// Hand the first message to queen-dm via sessionStorage so it
// survives the navigation without leaking into the URL/history.
sessionStorage.setItem(`queenFirstMessage:${queen_id}`, trimmed);
refresh();
navigate(`/queen/${queen_id}?new=1`);
sessionStorage.setItem(PENDING_CLASSIFY_KEY, trimmed);
} catch {
// Keep the user on home if bootstrap fails.
} finally {
setSubmitting(false);
setActivePrompt(null);
// sessionStorage disabled — fall through; the routing page will
// redirect back to home when the key is missing.
}
navigate("/queen-routing");
};
const handlePromptHint = (text: string) => {
@@ -97,14 +90,10 @@ export default function Home() {
<div className="absolute right-3 bottom-2.5">
<button
type="submit"
disabled={!inputValue.trim() || submitting}
disabled={!inputValue.trim()}
className="w-8 h-8 rounded-lg bg-primary/90 hover:bg-primary text-primary-foreground flex items-center justify-center transition-colors disabled:opacity-30 disabled:cursor-not-allowed"
>
{submitting && !activePrompt ? (
<Loader2 className="w-3.5 h-3.5 animate-spin" />
) : (
<Send className="w-3.5 h-3.5" />
)}
<Send className="w-3.5 h-3.5" />
</button>
</div>
</div>
@@ -116,25 +105,12 @@ export default function Home() {
<button
key={hint}
onClick={() => handlePromptHint(hint)}
disabled={submitting}
className="text-xs text-muted-foreground hover:text-foreground border border-border/50 hover:border-primary/30 rounded-full px-3.5 py-1.5 transition-all hover:bg-primary/[0.03] disabled:opacity-60 disabled:cursor-not-allowed"
className="text-xs text-muted-foreground hover:text-foreground border border-border/50 hover:border-primary/30 rounded-full px-3.5 py-1.5 transition-all hover:bg-primary/[0.03]"
>
{hint}
</button>
))}
</div>
{submitting && activePrompt && (
<p className="mt-4 text-center text-xs">
<span className="queen-debate-line">
<span>The queens are debating who should take this on</span>
<span aria-hidden="true">
{[0, 1, 2].map((dot) => (
<span key={dot}>.</span>
))}
</span>
</span>
</p>
)}
</div>
</div>
);
+32 -28
@@ -1,6 +1,6 @@
import { useState, useCallback, useRef, useEffect, useMemo } from "react";
import { useParams, useSearchParams } from "react-router-dom";
import { Loader2, Minus, Plus } from "lucide-react";
import { Minus, Plus } from "lucide-react";
import ChatPanel, {
type ChatMessage,
type ImageContent,
@@ -117,6 +117,12 @@ export default function QueenDM() {
// client_input_requested so we don't flicker the typing bubble off while
// the queen is about to resume on the flushed input.
const queenAboutToResumeRef = useRef(false);
// Question bubble for an ask_user that's actively awaiting an answer. We
// stash it here instead of pushing it into messages so the user only sees
// ONE copy of the question (the popup widget) while answering. Committed
// to the transcript on client_input_received so the bubble lands right
// above the user's answer for scroll-back context.
const pendingAskUserBubbleRef = useRef<ChatMessage | null>(null);
const [queenPhase, setQueenPhase] = useState<
"independent" | "incubating" | "working" | "reviewing"
>("independent");
@@ -541,19 +547,11 @@ export default function QueenDM() {
const handleCreateNewSession = useCallback(() => {
if (!queenId) return;
setCreatingNewSession(true);
const request = queensApi.createNewSession(
queenId,
undefined,
"independent",
);
request
.then((result) => {
setSearchParams({ session: result.session_id });
})
.catch(() => {
setCreatingNewSession(false);
});
// Bounce through the ?new=1 bootstrap path so the chat shell appears
// immediately with a typing indicator while createNewSession runs in
// the background. URL is replaced with ?session=<id> when it resolves.
// Avoids the 5s "nothing happens, then chat appears" dead window.
setSearchParams({ new: "1" });
}, [queenId, setSearchParams]);
useEffect(() => {
@@ -662,6 +660,14 @@ export default function QueenDM() {
queenAboutToResumeRef.current = false;
break;
}
// Stash the question bubble (synthesized by replayEvent) instead
// of upserting now: while the popup widget is open the user only
// wants to see ONE copy of the question. We commit the bubble on
// client_input_received so it lands right above the user's
// answer in the transcript.
if (emittedMessages.length > 0) {
pendingAskUserBubbleRef.current = emittedMessages[0];
}
setAwaitingInput(true);
setIsTyping(false);
setIsStreaming(false);
@@ -670,6 +676,14 @@ export default function QueenDM() {
}
case "client_input_received": {
// Commit the stashed ask_user bubble first so it appears above
// the user's reply in scroll-back. Its createdAt predates this
// event's, so the timestamp-ordered insert in upsertMessage
// places it correctly.
if (pendingAskUserBubbleRef.current) {
upsertMessage(pendingAskUserBubbleRef.current);
pendingAskUserBubbleRef.current = null;
}
for (const msg of emittedMessages) {
upsertMessage(msg, { reconcileOptimisticUser: true });
}
@@ -918,19 +932,6 @@ export default function QueenDM() {
<div className="flex flex-col h-full">
{/* Chat */}
<div className="flex-1 min-h-0 relative">
{loading && (
<div className="absolute inset-0 z-10 flex items-center justify-center bg-background/60 backdrop-blur-sm">
<div className="flex items-center gap-3 text-muted-foreground">
<Loader2 className="w-5 h-5 animate-spin" />
<span className="text-sm">
{selectedSessionParam?.startsWith("session_")
? "Connecting to session..."
: `Connecting to ${queenName}...`}
</span>
</div>
</div>
)}
<ChatPanel
messages={messages}
onSend={handleSend}
@@ -940,7 +941,10 @@ export default function QueenDM() {
activeThread="queen-dm"
isWaiting={isTyping && !isStreaming}
isBusy={isTyping}
disabled={loading || !queenReady}
// Keep the textarea typable while the queen is warming up so the
// user can compose a follow-up immediately. Send stays locked
// until the session is live and the queen is ready.
sendLocked={loading || !queenReady}
queenPhase={queenPhase}
showQueenPhaseBadge
pendingQuestions={awaitingInput ? pendingQuestions : null}
+92
@@ -0,0 +1,92 @@
import { useEffect, useRef, useState } from "react";
import { useNavigate } from "react-router-dom";
import { Loader2 } from "lucide-react";
import { messagesApi } from "@/api/messages";
import { useColony } from "@/context/ColonyContext";
/**
* Transient routing screen the user lands on right after submitting from the
* home page. Reads the pending prompt from sessionStorage, runs queen
* classification, and redirects (replace) to the resolved queen DM with
* ?new=1 so the existing bootstrap flow takes over.
*
* The point of this page is to get the user out of the home screen *before*
 * the classify LLM call runs; they should never sit on the home page
* watching a spinner.
*/
export const PENDING_CLASSIFY_KEY = "hive:pendingClassifyMessage";
export default function QueenRouting() {
const navigate = useNavigate();
const { refresh } = useColony();
const [error, setError] = useState<string | null>(null);
// Re-runs of this effect (StrictMode, fast re-mounts) must not re-fire the
// classify call — once we've grabbed the pending message we own it.
const startedRef = useRef(false);
useEffect(() => {
if (startedRef.current) return;
startedRef.current = true;
let pending: string | null = null;
try {
pending = sessionStorage.getItem(PENDING_CLASSIFY_KEY);
if (pending) sessionStorage.removeItem(PENDING_CLASSIFY_KEY);
} catch {
pending = null;
}
if (!pending || !pending.trim()) {
navigate("/", { replace: true });
return;
}
const trimmed = pending.trim();
let cancelled = false;
(async () => {
try {
const { queen_id } = await messagesApi.classify(trimmed);
if (cancelled) return;
// Hand the prompt off to queen-dm via the same key its bootstrap
// path already consumes. Avoids leaking the message into the URL.
sessionStorage.setItem(`queenFirstMessage:${queen_id}`, trimmed);
refresh();
navigate(`/queen/${queen_id}?new=1`, { replace: true });
} catch {
if (cancelled) return;
setError("Couldn't route your request. Try again from the home screen.");
}
})();
return () => {
cancelled = true;
};
}, [navigate, refresh]);
return (
<div className="flex-1 flex flex-col items-center justify-center p-6">
<div className="flex items-center gap-3 text-muted-foreground">
<Loader2 className="w-5 h-5 animate-spin" />
<span className="queen-debate-line text-sm">
<span>The queens are debating who should take this on</span>
<span aria-hidden="true">
{[0, 1, 2].map((dot) => (
<span key={dot}>.</span>
))}
</span>
</span>
</div>
{error && (
<div className="mt-6 flex flex-col items-center gap-3">
<p className="text-sm text-destructive">{error}</p>
<button
onClick={() => navigate("/", { replace: true })}
className="text-xs text-muted-foreground hover:text-foreground border border-border/50 hover:border-primary/30 rounded-full px-3.5 py-1.5 transition-all"
>
Back to home
</button>
</div>
)}
</div>
);
}
+15
@@ -60,6 +60,21 @@ _HIVE_PATH_NAMES = (
)
@pytest.fixture(autouse=True)
def _no_seed_mcp_defaults(monkeypatch):
"""Skip bundled-server seeding in MCPRegistry.initialize() for tests.
Production wants ``initialize()`` to seed ``hive_tools`` / ``gcu-tools``
/ ``files-tools`` / ``terminal-tools`` / ``chart-tools`` so a fresh
HIVE_HOME comes up with working defaults. Tests want a deterministic
    empty registry; every assertion about counts, "no servers installed"
output, or first-element identity breaks otherwise. Patching here
keeps the production API clean and avoids a test-only flag on
``initialize()``.
"""
monkeypatch.setattr(_mcp_registry.MCPRegistry, "_seed_defaults", lambda self: [])
@pytest.fixture(autouse=True)
def _isolate_hive_home_autouse(tmp_path, monkeypatch):
"""Per-test isolation of ``~/.hive`` to ``tmp_path/.hive``.
+1 -3
@@ -25,9 +25,7 @@ def test_is_client_facing() -> None:
assert session_summary.is_client_facing({"role": "user", "content": "hi"})
assert session_summary.is_client_facing({"role": "assistant", "content": "ok"})
assert not session_summary.is_client_facing({"role": "tool", "content": "x"})
assert not session_summary.is_client_facing(
{"role": "assistant", "content": "", "tool_calls": [{"id": "1"}]}
)
assert not session_summary.is_client_facing({"role": "assistant", "content": "", "tool_calls": [{"id": "1"}]})
assert not session_summary.is_client_facing({"is_transition_marker": True})
+1 -1
@@ -889,7 +889,7 @@ def test_concurrency_safe_allowlist_is_conservative():
allowlist = ToolRegistry.CONCURRENCY_SAFE_TOOLS
# Positive assertions: known-safe read operations are present.
for name in ("read_file", "grep", "glob", "list_directory", "web_search"):
for name in ("read_file", "terminal_rg", "terminal_find", "search_files", "web_scrape"):
assert name in allowlist, f"{name} should be concurrency-safe"
# Negative assertions: nothing that mutates state is allowed in.
+3 -3
@@ -184,9 +184,9 @@ MCP (Model Context Protocol) servers are configured in `.mcp.json` at the projec
```json
{
"mcpServers": {
"coder-tools": {
"files-tools": {
"command": "uv",
"args": ["run", "coder_tools_server.py", "--stdio"],
"args": ["run", "files_server.py", "--stdio"],
"cwd": "tools"
},
"tools": {
@@ -198,7 +198,7 @@ MCP (Model Context Protocol) servers are configured in `.mcp.json` at the projec
}
```
The `coder-tools` server provides agent scaffolding via `initialize_and_build_agent` and related tools. The `tools` MCP server exposes tools including web search, PDF reading, CSV processing, and file system operations.
The `files-tools` server exposes file I/O (`read_file`, `write_file`, `edit_file`, `hashline_edit`, `search_files`). The `tools` MCP server exposes integration tools including web search, PDF reading, CSV processing, and file system operations.
## Storage
+5 -5
@@ -119,7 +119,7 @@ hive/ # Repository root
│ └── README.md # Tools documentation
├── exports/ # AGENT PACKAGES (user-created, gitignored)
│ └── your_agent_name/ # Created via coder-tools workflow
│ └── your_agent_name/ # Created via files-tools workflow
├── examples/ # Example agents
│ └── templates/ # Pre-built template agents
@@ -157,7 +157,7 @@ The fastest way to build agents is with the configured MCP workflow:
./quickstart.sh
# Build a new agent
Use the coder-tools MCP tools from your IDE agent chat (e.g., initialize_and_build_agent)
Use the files-tools MCP tools from your IDE agent chat (e.g., initialize_and_build_agent)
```
### Agent Development Workflow
@@ -165,7 +165,7 @@ Use the coder-tools MCP tools from your IDE agent chat (e.g., initialize_and_bui
1. **Define Your Goal**
```
Use the coder-tools initialize_and_build_agent tool
Use the files-tools initialize_and_build_agent tool
Enter goal: "Build an agent that processes customer support tickets"
```
@@ -569,7 +569,7 @@ uv add <package>
```bash
# Option 1: Use Claude Code skill (recommended)
Use the coder-tools initialize_and_build_agent tool
Use the files-tools initialize_and_build_agent tool
# Option 2: Create manually
# Note: exports/ is initially empty (gitignored). Create your agent directory:
@@ -577,7 +577,7 @@ mkdir -p exports/my_new_agent
cd exports/my_new_agent
# Create agent.json, tools.py, README.md (see Agent Package Structure below)
# Option 3: Use the coder-tools MCP tools (advanced)
# Option 3: Use the files-tools MCP tools (advanced)
# See core/MCP_BUILDER_TOOLS_GUIDE.md
```
+6 -6
@@ -110,7 +110,7 @@ MCP tools are also available in Cursor. To enable:
**Claude Code:**
```
Use the coder-tools initialize_and_build_agent tool to scaffold a new agent
Use the files-tools initialize_and_build_agent tool to scaffold a new agent
```
**Codex CLI:**
@@ -261,7 +261,7 @@ hive/
│ └── pyproject.toml
├── exports/ # Agent packages (user-created, gitignored)
│ └── your_agent_name/ # Created via coder-tools workflow
│ └── your_agent_name/ # Created via files-tools workflow
└── examples/
└── templates/ # Pre-built template agents
@@ -313,9 +313,9 @@ The `.mcp.json` at project root configures MCP servers to run through `uv run` i
```json
{
"mcpServers": {
"coder-tools": {
"files-tools": {
"command": "uv",
"args": ["run", "coder_tools_server.py", "--stdio"],
"args": ["run", "files_server.py", "--stdio"],
"cwd": "tools"
},
"tools": {
@@ -353,7 +353,7 @@ This design allows agents in `exports/` to be:
### 2. Build Agent (Claude Code)
```
Use the coder-tools initialize_and_build_agent tool
Use the files-tools initialize_and_build_agent tool
Enter goal: "Build an agent that processes customer support tickets"
```
@@ -414,7 +414,7 @@ cd core && uv run python tests/dummy_agents/run_all.py --verbose
| parallel_merge | 4 | Fan-out/fan-in, failure strategies |
| retry | 4 | Retry mechanics, exhaustion, `ON_FAILURE` edges |
| feedback_loop | 3 | Feedback cycles, `max_node_visits` |
| worker | 4 | Real MCP tools (`example_tool`, `get_current_time`, `save_data`/`load_data`) |
| worker | 4 | Real MCP tools (`get_current_time`, `save_data`/`load_data`) |
Typical runtime is 1–3 minutes depending on provider latency.
+4 -4
@@ -65,7 +65,7 @@ This is the recommended way to create your first agent.
# Setup already done via quickstart.sh above
# Start Claude Code and build an agent
Use the coder-tools initialize_and_build_agent tool
Use the files-tools initialize_and_build_agent tool
```
Follow the interactive prompts to:
@@ -133,7 +133,7 @@ hive/
│ └── file_system_toolkits/
├── exports/ # Agent Packages (user-generated, not in repo)
│ └── your_agent/ # Your agents created via coder-tools workflow
│ └── your_agent/ # Your agents created via files-tools workflow
├── examples/
│ └── templates/ # Pre-built template agents
@@ -191,7 +191,7 @@ PYTHONPATH=exports uv run python -m my_agent test --type success
1. **Dashboard**: Run `hive open` to launch the web dashboard
2. **Detailed Setup**: See [environment-setup.md](./environment-setup.md)
3. **Developer Guide**: See [developer-guide.md](./developer-guide.md)
4. **Build Agents**: Use the coder-tools `initialize_and_build_agent` tool in Claude Code
4. **Build Agents**: Use the files-tools `initialize_and_build_agent` tool in Claude Code
5. **Custom Tools**: Learn to integrate MCP servers
6. **Join Community**: [Discord](https://discord.com/invite/MXE49hrKDk)
@@ -236,4 +236,4 @@ pip uninstall -y framework tools
- **Documentation**: Check the `/docs` folder
- **Issues**: [github.com/adenhq/hive/issues](https://github.com/aden-hive/hive/issues)
- **Discord**: [discord.com/invite/MXE49hrKDk](https://discord.com/invite/MXE49hrKDk)
- **Build Agents**: Use the coder-tools workflow to create agents
- **Build Agents**: Use the files-tools workflow to create agents
+127
@@ -0,0 +1,127 @@
# 🐝 Hive Agent v0.11.0: Action Plans, Charts, and a Cleaner Queen
> Major features in Hive 0.11: Queen now keeps an action plan for everything and can render charts to do analytics for you. The overall conversation and agent experience also improves substantially thanks to a major Queen prompt and tools refactor.
---
## ✨ Highlights
### 📋 Queen now keeps an action plan for everything
A new file-backed task system gives Queen a persistent, structured plan for every conversation — visible to the user, editable on the fly, and durable across session reloads.
- **File-backed task store** under `core/framework/tasks/` with full CRUD, scoping, hooks, and reminders. Tasks live on disk so they outlast a single agent run and can be inspected, replayed, or shared between Queen and colony workers.
- **Multi-task creation in one call** — Queen can stage a whole plan up front instead of dripping out one task at a time, then tick items off as it works (see the sketch after this list).
- **Colony task templates** — colonies can publish a template task list that Queen picks up when the colony is invoked, so recurring workflows start with the same plan every time.
- **Live task list in the UI** — a new `TaskListPanel` renders the plan in real time next to the chat, with item status flowing through the event bus as Queen marks tasks done.
- **Task reminders + hooks** wire into Queen's loop so the plan stays in front of the model; structural blockers that prevented tool calls on `task_*` are now resolved.
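A sketch of the shape multi-task creation enables. The helper, tool name, and field names below are hypothetical; the release only states that Queen can stage a whole plan in one call via the `task_*` tools:

```ts
// Hypothetical helper + schema: callTool, the tool name, and the
// field names are assumptions, not the actual task_* API.
declare function callTool(name: string, args: unknown): Promise<unknown>;

await callTool("task_create", {
  tasks: [
    { title: "Check inbox for urgent emails" },
    { title: "Draft replies for the top three" },
    { title: "Summarize the rest for review" },
  ],
});
```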
### 📊 Charting capability for analytics
Queen can now produce real charts inline in the conversation, not just describe them.
- **New `chart_tools` MCP server** with ECharts and Mermaid renderers, an OpenHive theme, and a `chart-creation-foundations` skill that teaches Queen when to chart vs. when to table.
- **Inline chart rendering in chat** — `EChartsBlock` and `MermaidBlock` components render the chart spec directly in the transcript; tool results get a contentful display with `ChartToolDetail` instead of a JSON dump.
- **Chart spec normalization** in the renderer keeps Y-axis scaling, series colors, and theme tokens consistent regardless of how Queen phrases the spec (see the sketch below).
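A sketch of what normalization enforces, using the anchor values from the `EChartsBlock` diff earlier in this comparison; the input spec is illustrative:

```ts
// Agent-supplied spec with a colliding vertical legend position...
const agentSpec = {
  title: { text: "Revenue by region" },
  legend: { top: "8%", data: ["NA", "EMEA"] }, // would land inside the title
  series: [{ type: "bar", data: [42, 17] }],
};

// ...re-anchored into disjoint regions: title top, legend bottom,
// grid between them with containLabel for axis labels.
const normalized = {
  ...agentSpec,
  title: { left: "center", text: "Revenue by region", top: 16 },
  legend: {
    icon: "roundRect", itemWidth: 12, itemHeight: 12, itemGap: 16,
    data: ["NA", "EMEA"], bottom: 16,
  },
  grid: { left: 56, right: 56, top: 64, bottom: 96, containLabel: true },
};
```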
### 🧹 Major Queen prompt + tools refactor
The biggest cleanup of Queen's tool surface and prompt since v0.7. Fewer, sharper tools; a shorter, more focused prompt; and a clearer model of what Queen has access to vs. what colonies do.
- **File ops consolidated** — `apply_diff`, `apply_patch`, `hashline_edit`, the old `data_tools`, `grep_search`, and the legacy `coder_tools_server` are gone. A single rewritten `file_ops` module covers read / search / list / edit with a more predictable interface and ~1.7k fewer lines on net.
- **Search and list-files unified** into one toolkit so Queen stops juggling near-duplicate variants.
- **Browser tools audit** — interactions, navigation, tabs, and lifecycle trimmed and consolidated; `web_scrape` and `browser_open` merged into a single web-search-and-open path.
- **New shell/terminal toolkit** (`shell_tools`) — replaces the old `execute_command_tool` and the inline command sanitizer with a typed module that has proper job control, PTY sessions, ring-buffered output, semantic exit codes, and a destructive-command warning gate. Five new preset skills (`shell-tools-foundations`, `-fs-search`, `-job-control`, `-pty-sessions`, `-troubleshooting`) teach Queen the new surface.
- **Old lifecycle tools removed** — `queen_lifecycle_tools.py` shrank by ~900 lines as deprecated default tools came out.
- **Prompt simplification + improvements** — Queen's node prompts dropped redundant `_queen_style` blocks, tightened phrasing, and now lean on the task system for plan-keeping instead of restating the plan every turn.
- **Tools editor frontend grouping** — `ToolsEditor.tsx` groups tools by category so configuring a queen profile is no longer a flat scroll through 80+ entries.
---
## 🆕 What's New
### Tasks & Action Plans
- **`core/framework/tasks/`** — full task subsystem: `store`, `models`, `events`, `hooks`, `reminders`, `scoping`, plus a `tools/` package exposing session and colony task tools to Queen. (@RichardTang-Aden)
- **`POST /api/tasks` routes** for the frontend to read and mutate the live plan. (@RichardTang-Aden)
- **`TaskListPanel` + `TaskItem` + `TaskListContext`** on the frontend render the plan in real time. (@RichardTang-Aden)
- **Multi-task creation tool** lets Queen stage a whole plan in one call. (@RichardTang-Aden)
- **Colony task templates** — colonies ship with a default task list that Queen adopts on entry. (@RichardTang-Aden)
- **Hook + reminder fixes** so Queen reliably uses `task_*` tools instead of skipping them. (@RichardTang-Aden)
### Charts
- **`tools/src/chart_tools/`** — new MCP server with `renderer.py`, `theme.py`, `tools.py`, plus bundled `echarts.min.js` and `mermaid.min.js`. (@TimothyZhang7)
- **`chart-creation-foundations` skill** teaches Queen when and how to chart. (@TimothyZhang7)
- **`EChartsBlock` / `MermaidBlock` / `ChartToolDetail`** components render charts inline. (@TimothyZhang7)
- **OpenHive chart theme** (`openhiveTheme.ts`) keeps chart styling consistent with the rest of the UI. (@TimothyZhang7)
- **Chart spec normalization** in the renderer fixes Y-axis edge cases and series defaults. (@TimothyZhang7)
### Queen Prompt & Tools Refactor
- **Major file ops refactor** — single rewritten `file_ops` module replaces `apply_diff`, `apply_patch`, `hashline_edit`, `grep_search`, `data_tools`, and the legacy `coder_tools_server`. (@RichardTang-Aden)
- **Edit-file refactor** with a tighter API surface and ~560 lines of dead `test_file_ops_hashline.py` removed. (@RichardTang-Aden)
- **Search + list-files consolidation** into one toolkit. (@RichardTang-Aden)
- **Browser tools audit** — navigation, interactions, lifecycle, and tabs trimmed; `web_scrape` and browser-open merged. (@RichardTang-Aden)
- **`shell_tools` package** replaces `execute_command_tool` with proper job control, PTY sessions, ring-buffered output, semantic exit codes, and destructive-command warnings. (@TimothyZhang7)
- **Five new shell preset skills** plus reference docs (`exit_codes.md`, `find_predicates.md`, `ripgrep_cheatsheet.md`, `signals.md`). (@TimothyZhang7)
- **Old lifecycle tools removed** — `queen_lifecycle_tools.py` lost ~900 lines. (@RichardTang-Aden)
- **Autocompaction + concurrency tools updated** to play nicely with the new tool registry. (@RichardTang-Aden)
- **Prompt simplification** — `nodes/__init__.py` dropped the redundant `_queen_style` block and tightened phrasing across nodes. (@RichardTang-Aden)
- **`ToolsEditor` grouping** — frontend tool-config screen now groups tools by category. (@RichardTang-Aden)
### Conversation & Agent Experience
- **`ask_user` questions surface in the chat transcript** instead of vanishing into a side panel, and the question bubble now defers until the user actually answers. (@bryan)
- **New-session navigation with Queen warm-up UI** — new `queen-routing.tsx` page handles the warm-up so the user sees progress instead of a blank screen. (@bryan)
- **Sync tool result contentful display** — tool results render as structured cards (charts, file diffs, etc.) instead of raw JSON. (@TimothyZhang7)
### Vision Fallback
- **Vision model retry + fallback** — non-vision models can now route image inputs through a captioning step instead of failing. (@RichardTang-Aden)
- **Vision fallback with intent** — caption prompts incorporate the user's intent so the caption is task-relevant. (@RichardTang-Aden)
- **Vision fallback auth** — fallback path now uses the right credentials per provider. (@RichardTang-Aden)
- **Looser max-token cap** on vision fallback for models that spend output tokens on internal thinking. (@RichardTang-Aden)
- **Vision fallback model usage logging** for cost visibility. (@RichardTang-Aden)
### Colonies
- **`POST /api/colonies/import`** — onboard a colony from a `tar` / `tar.gz` upload. 50 MB cap, manual path-traversal validation (Python 3.11 compatible), symlinks/hardlinks/devices rejected, mode bits masked. Tests cover happy path, name override, replace flag, traversal, absolute paths, and corrupt archives. (@RichardTang-Aden)
- **Refactored colony routes** — `routes_colonies.py` gained ~450 lines of structure for import/export/list flows. (@TimothyZhang7)
### MCP & Tools
- **SimilarWeb V5 integration** — 29 new MCP tools covering traffic & engagement, competitor intelligence, keywords/SERP, audience demographics, and segment analysis. Includes credential spec, health checker, README, and tests on Ubuntu and Windows. (#7066)
- **MCP registry initialization fix** — registry no longer races on first install. (@RichardTang-Aden)
---
## 🐛 Bug Fixes
- **Initial install** path resolution — hardcoded `HIVE_HOME` references replaced; all agent paths now prefixed by the resolved `HIVE_HOME`. (@RichardTang-Aden)
- **Frontend recovery** after a broken state on session reload. (@RichardTang-Aden)
- **Compaction issues** when the agent loop runs into the buffer mid-stream. (@RichardTang-Aden)
- **LiteLLM patch** for a streaming-usage edge case. (@RichardTang-Aden)
- **`ask_user` question bubble** now defers until the user answers. (@bryan)
- **Incubating-mode approval guidance** correctly injects into the prompt. (@RichardTang-Aden)
- **LLM debugger** — fixed timeline order and tool-call display. (@RichardTang-Aden)
- **Shell split-command** parsing fix. (@TimothyZhang7)
- **Chart Y-axis** + **chart spec normalization** edge cases. (@TimothyZhang7)
- **Scroll behavior** on certain element selectors. (@bryan)
- **CI fixes**: skills `HIVE_HOME` refactor regressions, `run_parallel_workers` losing task text on spawn, `test_capabilities` deprecated model identifiers, `test_colony_runtime_overseer` Windows flake. (#7141, #7149)
- **Orphan Zoho CRM test directory** removed under `src/` after the MCP refactor. (#7142)
- **Credentials** — `EnvVarStorage.exists` now matches `load` semantics for empty values. (#5680)
---
## 🚀 Upgrading from v0.10.5
No migration required. Pull `main` at `v0.11.0` and restart Hive — existing `~/.hive/` profiles, queens, colonies, and sessions keep working.
A few things to know:
1. **Queen's default tool surface changed.** If you have a queen profile pinned to a removed tool (e.g. `apply_diff`, `apply_patch`, `hashline_edit`, `grep_search`, the old `execute_command_tool`), it'll fall back to the consolidated replacements. Custom profiles referencing those tool names should be updated.
2. **Old `queen_lifecycle_tools` entries are gone.** If you wired any external code against those defaults, switch to the new task system.
3. **Task plan is now persistent.** Queen will start staging a plan automatically on new sessions — if you don't want the panel, you can collapse it from the layout.
Plan the work. Chart the result. 🐝
+1 -3
@@ -369,7 +369,6 @@ Port popular tools, and build out the Runtime Log, Audit Trail, Excel, and Email
- [x] Subdomain Enumerator (tools/subdomain_enumerator/)
- [x] Tech Stack Detector (tools/tech_stack_detector/)
- [x] **Runtime & Logging**
- [x] Runtime Log Tool (tools/runtime_logs_tool/)
- [x] Runtime Logger with L1/L2/L3 levels (runtime/runtime_logger.py)
- [ ] **Audit Trail System**
- [ ] Decision tracing beyond logs
@@ -802,8 +801,7 @@ Port the existing Terminal User Interface (TUI) into a rich web application, all
### Memory & State Inspector
Create a UI component to inspect the Shared Memory and Write-Through Conversation Memory, allowing developers to click on any node and see exactly what it is thinking.
- [x] **Runtime Logs Tool**
- [x] Inspect agent session logs (tools/runtime_logs_tool/)
- [x] **Session State**
- [x] Session state retrieval (builder/package_generator.py)
- [ ] **Memory Inspector UI**
- [ ] Shared Memory visualization
+1 -1
@@ -334,7 +334,7 @@ Update incrementally — do not rewrite from scratch each time.
**Background:** Replaces the older in-memory `_batch_ledger` (and `_working_notes → Current Plan` decomposition) — both were removed on 2026-04-15 because they duplicated state that belongs in SQLite. The queue, per-task `steps` decomposition, and `sop_checklist` hard-gates now all live in `progress.db` and are authoritative.
**Protocol (injected into system prompt):** Workers receive `db_path` and `colony_id` (and optionally `task_id`) in their spawn message and interact with the ledger via `sqlite3` through `execute_command_tool`. The full claim → load plan → execute step → SOP-gate → mark done loop is documented in the skill's `SKILL.md`.
**Protocol (injected into system prompt):** Workers receive `db_path` and `colony_id` (and optionally `task_id`) in their spawn message and interact with the ledger via `sqlite3` through `terminal_exec`. The full claim → load plan → execute step → SOP-gate → mark done loop is documented in the skill's `SKILL.md`.
**Tables:**
- `tasks` — queue: pending → claimed → done|failed, with `worker_id` and atomic claim tokens
+1 -1
@@ -22,7 +22,7 @@ template_name/
### Option 1: Build from template (recommended)
Use the `coder-tools` `initialize_and_build_agent` tool and select "From a template" to interactively pick a template, customize the goal/nodes/graph, and export a new agent.
Use the `files-tools` `initialize_and_build_agent` tool and select "From a template" to interactively pick a template, customize the goal/nodes/graph, and export a new agent.
### Option 2: Manual copy
Some files were not shown because too many files have changed in this diff.