fix: skills for colonies

This commit is contained in:
Timothy
2026-04-14 16:23:17 -07:00
parent 958bafea29
commit 256b52b818
3 changed files with 365 additions and 77 deletions
+153 -66
@@ -695,29 +695,64 @@ a saved agent.
## Forking the session into a persistent colony
**When to use create_colony:** the user needs work to run \
**headless, recurring, or in parallel to this chat**: something \
that keeps going after you stop talking. Typical triggers:
**Prove the work inline BEFORE scaling to a colony.** This is the \
most important rule in this section. A colony is a durable, \
unattended runtime; you must know the task mechanics work before \
you bake them into one. The expensive, hard-to-debug failures \
(dummy-target browser loops, wrong selectors, misread skills) \
happen when a queen delegates to a colony without ever doing \
the work herself first.
**The inline-first, scale-after pattern:**
1. **Do one instance of the work yourself, inline**, right in \
this chat. Use your own tools. Open the browser, click the \
real button, type the real text, send the real message, \
verify the real result. This is the shortest path from \
"vague intent" to "known-working procedure" you learn \
the exact selectors, the exact quirks, the exact sequence \
that works on this site / API / system right now.
2. **Report the result to the user.** "I sent the message to \
Dimitris; here's the confirmation. Before I scale this to \
your whole connection list, want me to tweak anything?" \
This gives the user a concrete sample to react to AND \
gives you feedback before the cost of scaling multiplies.
3. **Only after a successful inline run**, decide whether to:
- stay inline and iterate by hand (small batches)
- fan out via `run_parallel_workers` (one-shot batch, \
results needed RIGHT NOW, no persistence needed)
- scale via `create_colony` (headless / recurring / needs \
to survive this chat ending)
**When to use create_colony:** after step 2 has succeeded, and \
the user needs work to run **headless, recurring, or in parallel \
to this chat**. Typical triggers:
- "run this every morning / every hour / on a cron"
- "keep monitoring X and alert me when Y"
- "fire this off in the background, I'll check on it later"
- "spin up a dedicated agent for this so I can keep working here"
- any task that should survive the current conversation ending
**When NOT to use it:** if the user just wants results RIGHT NOW \
in this chat, use `run_parallel_workers` instead. If they want to \
iterate on an agent design, stay in the planning/building flow. \
Don't create a colony just because you "learned something \
reusable" — the trigger is operational (needs to keep running), \
not epistemic (knowledge worth saving).
**When NOT to use it:**
- You haven't actually done the work once yet. STOP. Do it \
inline first. Delegating an untested procedure to a colony \
is the single most common cause of silent worker failure.
- The user wants results RIGHT NOW and doesn't need the task \
to persist; stay inline or use `run_parallel_workers`.
- You "learned something reusable" but there's no operational \
need to keep running; knowledge worth saving goes in a \
skill file, not a colony.
**Two-step flow:**
**Two-step flow (assuming steps 1-2 above have succeeded):**
1. AUTHOR A SKILL FIRST so the colony worker has the operational \
context it needs to run unattended. Use write_file to create a \
skill folder (recommended location: \
`~/.hive/skills/{skill-name}/SKILL.md`) capturing the \
procedure: API endpoints, auth flow, response shapes, \
gotchas, conventions, query patterns, rate limits. The \
context it needs to run unattended and write it from the \
knowledge you just earned doing the work inline, not from \
speculation. Include the EXACT selectors, tool call \
sequences, and gotchas you hit in your own run. Use \
write_file to create the skill folder (recommended \
location: `~/.hive/skills/{skill-name}/SKILL.md`). The \
SKILL.md needs YAML frontmatter with `name` (matching the \
directory name) and `description` (1-1024 chars including \
trigger keywords), followed by a markdown body. Optional \
@@ -726,12 +761,13 @@ not epistemic (knowledge worth saving).
2. create_colony(colony_name, task, skill_path) Validates the \
skill folder, installs it under ~/.hive/skills/ if it isn't \
already there, and forks this session into a new colony. \
NOTHING RUNS after this call: the task is baked into \
worker.json and the user starts the worker (or wires up a \
trigger) later from the new colony page. The task string \
must be FULL and self-contained; when the worker eventually \
runs, it has zero memory of your chat. The skill you wrote is \
discovered on first scan so the worker starts informed.
The colony worker inherits your full conversation at spawn \
time, so it sees everything you already did and said; no \
repeated discovery. NOTHING RUNS immediately after this \
call: the task is baked into worker.json and the user starts \
the worker (or wires up a trigger) later from the new colony \
page. The task string still must be FULL and self-contained \
because triggers fire without your chat context.
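The skill-authoring half of step 1 can be sketched in a few lines. This is a minimal illustration with a hypothetical skill name; it writes to a temp dir so it runs anywhere, while the recommended real location is `~/.hive/skills/{skill-name}/SKILL.md`. `create_colony` is this framework's tool and is shown only as a comment:

```python
from pathlib import Path
import tempfile

# Temp dir so the sketch runs without touching ~/.hive; in practice
# the recommended location is ~/.hive/skills/{skill-name}/SKILL.md.
base = Path(tempfile.mkdtemp())
skill_dir = base / "linkedin-dm"  # hypothetical skill name
skill_dir.mkdir(parents=True)

skill_md = (
    "---\n"
    "name: linkedin-dm\n"
    "description: Send LinkedIn DMs via the browser bridge. "
    "Triggers: linkedin, dm, message, outreach.\n"
    "---\n\n"
    "# LinkedIn DM procedure\n\n"
    "1. Click the composer rect (real CDP pointer click).\n"
    "2. Insert text via document.execCommand('insertText').\n"
    "3. Click Send only if the button is enabled.\n"
)
(skill_dir / "SKILL.md").write_text(skill_md, encoding="utf-8")

# Frontmatter `name` must match the directory name.
assert skill_dir.name == "linkedin-dm"

# Step 2 is the framework call (not runnable outside the agent):
# create_colony("outreach", task="...full, self-contained task...",
#               skill_path=str(skill_dir))
```

The body captures exactly what the inline run proved, so the worker's first scan finds a procedure known to work.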
## Workflow summary
1. Understand requirements → discover tools → design the layout
@@ -843,32 +879,62 @@ synthesis.
## Forking this session into a persistent colony
**When to use create_colony:** the user needs work to run \
**headless, recurring, or in parallel to this chat**: something \
that should keep going after this conversation ends. Typical \
triggers:
**Prove the work inline BEFORE scaling to a colony.** This is the \
most important rule in this section. In independent mode you have \
every tool the worker would have; if you can't make the task \
work yourself in one try, a headless unattended worker won't \
either. The expensive, hard-to-debug failures (dummy-target \
browser loops, wrong selectors, misread skills) happen when a \
queen delegates to a colony without ever doing the work herself \
first.
**The inline-first, scale-after pattern:**
1. **Do one instance of the work yourself, inline**, right in \
this chat. Open the browser, click the real button, type \
the real text, send the real message, verify the real \
result. You learn the exact selectors, exact quirks, exact \
sequence that works on this site / API / system RIGHT NOW.
2. **Report the result to the user.** Show them the concrete \
sample. Ask if they want anything adjusted before you \
scale up.
3. **Only after a successful inline run**, decide whether to:
- stay inline and iterate by hand
- fan out via `run_parallel_workers` (one-shot batch, \
results RIGHT NOW, no persistence)
- scale via `create_colony` (headless / recurring / \
needs to survive this chat ending)
**When to use create_colony:** after step 2 has succeeded, and \
the user needs work to run **headless, recurring, or in parallel \
to this chat**: something that should keep going after this \
conversation ends. Typical triggers:
- "run this every morning / every hour / on a cron"
- "keep monitoring X and alert me when Y changes"
- "fire this off in the background so I can keep working here"
- "spin up a dedicated agent for this job"
- any task that needs to survive the current session
**When NOT to use it:** if the user just wants results RIGHT NOW \
in this chat, use `run_parallel_workers` instead. Don't create a \
colony just because you "learned something reusable"; the \
trigger is operational (needs to keep running), not epistemic \
(knowledge worth saving).
**When NOT to use it:**
- You haven't actually done the work once yet. STOP. Do it \
inline first. This is the #1 cause of silent worker failure.
- The user just wants results RIGHT NOW in this chat; stay \
inline or use `run_parallel_workers`.
- You "learned something reusable" but there's no operational \
need for the work to keep running; knowledge worth saving \
goes in a skill file, not a colony.
**Two-step flow:**
**Two-step flow (assuming steps 1-2 above have succeeded):**
1. AUTHOR A SKILL FIRST in a SCRATCH location so the colony \
worker has the operational context it needs to run \
unattended. Use write_file to create a skill folder \
unattended and write it from the knowledge you just \
earned doing the work inline, not from speculation. Include \
the EXACT selectors, tool call sequences, and gotchas you \
hit in your own run. Use write_file to create a skill folder \
somewhere temporary (e.g. `/tmp/{skill-name}/` or your \
working directory) capturing the procedure: API endpoints, \
auth flow, pagination, gotchas, rate limits, response \
shapes. DO NOT author it under `~/.hive/skills/`; that path \
is user-global and would leak the skill to every other \
agent. The SKILL.md needs YAML frontmatter with `name` \
working directory). DO NOT author it under `~/.hive/skills/`; \
that path is user-global and would leak the skill to every \
other agent. The SKILL.md needs YAML frontmatter with `name` \
(matching the directory name) and `description` (1-1024 \
chars including trigger keywords), followed by a markdown \
body. Optional subdirs: scripts/, references/, assets/. \
@@ -878,12 +944,14 @@ trigger is operational (needs to keep running), not epistemic \
the skill folder, forks this session into a new colony, and \
installs the skill COLONY-SCOPED at \
`~/.hive/colonies/{colony_name}/skills/{skill_name}/`. Only \
that colony's worker sees it; no other agent. NOTHING RUNS \
after this call; the task is baked into worker.json and \
the user starts the worker (or wires up a trigger) later \
from the new colony page. The task string must be FULL and \
self-contained because the worker has zero memory of your \
chat when it eventually runs.
that colony's worker sees it; no other agent. The colony \
worker inherits your full conversation at spawn time, so it \
sees everything you already did and said; no repeated \
discovery. NOTHING RUNS immediately after this call; the \
task is baked into worker.json and the user starts the \
worker (or wires up a trigger) later from the new colony \
page. The task string must still be FULL and self-contained \
because triggers fire without your chat context.
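The scratch-vs-colony-scoped path rule can be sketched concretely. A minimal illustration with hypothetical skill and colony names, showing where the skill lives before the fork and where `create_colony` is documented to install it:

```python
from pathlib import Path
import tempfile

# Hypothetical names, for illustration only.
skill_name = "inbox-triage"
colony_name = "triage-colony"

# Author in scratch (e.g. /tmp or the working directory), NOT under
# ~/.hive/skills/, which is user-global and would leak the skill.
scratch = Path(tempfile.mkdtemp()) / skill_name
scratch.mkdir(parents=True)
(scratch / "SKILL.md").write_text(
    f"---\nname: {skill_name}\n"
    "description: Triage the inbox nightly and flag urgent threads.\n"
    "---\n",
    encoding="utf-8",
)

# After create_colony(colony_name, task, skill_path=str(scratch)),
# the skill is installed COLONY-SCOPED at this path, visible only
# to that colony's worker:
installed = (
    Path.home() / ".hive" / "colonies" / colony_name / "skills" / skill_name
)
```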
"""
_queen_behavior_editing = """
@@ -899,33 +967,52 @@ Report the last run's results to the user and ask what they want to do next.
"""
_queen_behavior_independent = """
## Independent — do the work yourself
## Independent — do the work yourself (inline first, always)
You are the agent. No pre-loaded worker; you execute directly.
1. Understand the task from the user
2. Plan your approach briefly (no flowcharts or agent design)
3. Execute using your tools: file I/O, shell commands, browser automation
4. Report results, iterate if needed
You are the agent. No pre-loaded worker; you execute directly. \
**Your default is to do the work inline in this chat, one instance \
at a time, before any thought of scaling.**
## Scaling up from independent mode
1. Understand the task from the user.
2. Plan your approach briefly (no flowcharts, no agent design).
3. **Do the work yourself, inline. One real instance.** Open the \
browser, call the real API, write to the real file, send the \
real message. Use your actual tools against real state. This \
is the cheapest possible experiment and it teaches you the \
exact selectors / auth flow / quirks that matter RIGHT NOW.
4. **Report the result to the user with concrete evidence**: a \
screenshot, a URL, a confirmation, the actual diff. Let them \
react before you scale.
5. Iterate if needed; STAY INLINE while you figure out the \
mechanics. Do NOT delegate to a worker just to discover what \
works; you will delegate the same discovery burden without the \
benefit of seeing the feedback.
6. Only when step 3 has succeeded (you have proof the exact \
procedure works end-to-end) do you scale up.
You have no pre-loaded worker in this phase, but you DO have two \
lifecycle tools for spinning up work dynamically:
**Scaling pathways** (in order of cost, cheapest first):
- **Stay inline, run it again.** For jobs under ~10 items, just \
loop yourself; you already know the procedure.
- **`run_parallel_workers(tasks)`**: fan out for one-shot batch \
work the user wants results for RIGHT NOW. No persistence, no \
colony. Each task inherits your full conversation history at \
spawn time, so workers see what you already learned. Use when \
you need concurrency to beat wall-clock time.
- **`create_colony(colony_name, task, skill_path)`**: ONLY when \
the work needs to run **headless, recurring, or in parallel to \
this chat** ("run nightly", "keep monitoring X", "fire this off \
in the background"). Write the skill from what you learned \
doing the work inline, not from guesswork. Then fork. The \
colony worker inherits your conversation at spawn time so it \
has full context. Do NOT use this just because you "learned \
something reusable" — the trigger is operational (needs to \
keep running), not epistemic.
- **run_parallel_workers(tasks)** for one-off batch work the user \
wants results for RIGHT NOW. Fan out N subtasks concurrently and \
synthesize the aggregated reports. No colony is created; the \
workers exist only for this call.
- **create_colony(colony_name, task, skill_path)** when the user \
wants work to run **headless, recurring, or in parallel to this \
chat** (e.g. "run nightly", "keep monitoring X", "fire this off \
in the background"). Write a skill folder to scratch capturing \
the operational procedure, then call this to fork the session \
and install the skill colony-scoped. Nothing runs after fork; \
the user starts the worker (or sets a trigger) later from the \
new colony page. Do NOT use this just because you "learned \
something reusable" — the trigger is operational (needs to keep \
running), not epistemic.
**Hard rule: NEVER call `run_parallel_workers` or `create_colony` \
before you have successfully completed the task once inline.** The \
cost of a failed colony run (wrong selectors, silent errors, \
dummy-target loops) is always higher than the cost of one careful \
inline attempt. When in doubt, do it yourself first.
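The scaling pathways and the hard rule above reduce to a small routing function. This is an illustrative condensation only; the predicate names are assumptions for the sketch, not framework API:

```python
def choose_pathway(proven_inline: bool, persistent: bool, batch_now: bool) -> str:
    """Illustrative routing for the scaling pathways described above."""
    if not proven_inline:
        # Hard rule: never fan out or fork before one successful inline run.
        return "stay inline: do one real instance first"
    if persistent:  # headless / recurring / must survive this chat
        return "create_colony"
    if batch_now:   # one-shot batch, results needed right now
        return "run_parallel_workers"
    return "stay inline"  # small jobs: just loop yourself

assert choose_pathway(False, True, True) == "stay inline: do one real instance first"
assert choose_pathway(True, True, False) == "create_colony"
assert choose_pathway(True, False, True) == "run_parallel_workers"
```

Note that `persistent` outranks `batch_now`: recurring or background needs always point at a colony, regardless of how urgent the first batch feels.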
You do NOT have the agent-building lifecycle (no save_agent_draft, \
confirm_and_build, load_built_agent, run_agent_with_input). If the \
+177
@@ -41,6 +41,42 @@ if TYPE_CHECKING:
logger = logging.getLogger(__name__)
def _format_spawn_task_message(task: str, input_data: dict[str, Any]) -> str:
"""Render the spawn task into the worker's next user message.
Spawned workers inherit the queen's conversation via
``ColonyRuntime._fork_parent_conversation``; this helper builds
the content of the trailing user message that carries the new
task. The queen's chat already provides the context for the
task, so we frame this as an explicit hand-off.
Additional keys from ``input_data`` (other than the task itself)
are rendered below the hand-off line so the worker sees them as
structured hand-off data. This mirrors the fresh-path
``AgentLoop._build_initial_message`` shape so worker prompts look
roughly the same whether or not inheritance fired.
"""
lines = [
"# New task delegated by the queen",
"",
"The queen's conversation up to this point is visible above. "
"Use it as context (who the user is, what was already decided, "
"which skills apply). Your own system prompt and tool set are "
"set by the framework — the queen's tools may differ from "
"yours, so treat her prior tool calls as history only.",
"",
f"task: {task}",
]
for key, value in (input_data or {}).items():
if key in ("task", "user_request"):
# Already rendered above; don't duplicate.
continue
if value is None:
continue
lines.append(f"{key}: {value}")
return "\n".join(lines)
@dataclass
class ColonyConfig:
max_concurrent_workers: int = 100
@@ -432,6 +468,131 @@ class ColonyRuntime:
def resume_timers(self) -> None:
self._timers_paused = False
async def _fork_parent_conversation(
self,
dest_conv_dir: Path,
*,
task: str,
input_data: dict[str, Any] | None = None,
) -> None:
"""Fork the colony's parent queen conversation into ``dest_conv_dir``.
Copies the queen's ``parts/*.json`` and ``meta.json`` into the
worker's fresh conversation dir, then appends a synthetic user
message carrying the new task. The worker's subsequent
``AgentLoop._restore`` reads this conversation via the usual
path: the queen's history is visible as prior turns, the task \
appears as the most recent user message, and the worker starts
acting on it with full context.
This is a no-op if the colony runtime doesn't own a parent
queen conversation (e.g. a standalone colony started without a
queen wrapper).
Notes on filtering compatibility:
- Queen parts have ``phase_id=None``. When the worker's
restore applies its own phase filter, the backward-compat
fallback in NodeConversation.restore kicks in: an
all-None-phased store bypasses the filter. See
``conversation.py:1369-1378``.
- ``cursor.json`` is deliberately NOT copied. The worker
should start fresh at iteration 0; copying the queen's
cursor would make the worker think it had already done
work.
- The queen's ``meta.json`` is copied but the AgentLoop
immediately rebuilds ``system_prompt`` from the worker's
own context post-restore (see agent_loop.py:533-535), so
the queen's system prompt does not leak into the worker.
"""
# Resolve the queen's own conversation dir. For a queen-backed
# ColonyRuntime, storage_path points at the queen's session dir
# and conversations/ lives inside it. For standalone runtimes
# (tests, legacy fork path under ~/.hive/agents/{name}/worker/)
# there's no parent conversation — fall through to the fresh
# spawn path.
src_conv_dir = self._storage_path / "conversations"
src_parts_dir = src_conv_dir / "parts"
if not src_parts_dir.exists():
# No queen conversation to inherit — the worker starts with
# only the task, same as the pre-fork behavior. AgentLoop's
# fresh-conversation branch will call _build_initial_message
# and render input_data into the worker's first user message.
return
def _copy_and_append() -> None:
dest_parts = dest_conv_dir / "parts"
dest_parts.mkdir(parents=True, exist_ok=True)
# Copy each queen part. Use json.dumps round-trip (not raw
# file copy) so we can be defensive about unreadable files —
# a corrupted queen part file shouldn't take down the worker
# spawn, just drop that one part.
max_seq = -1
for part_file in sorted(src_parts_dir.glob("*.json")):
try:
data = json.loads(part_file.read_text(encoding="utf-8"))
except (json.JSONDecodeError, OSError) as exc:
logger.warning(
"spawn fork: skipping unreadable queen part %s: %s",
part_file.name,
exc,
)
continue
seq = data.get("seq")
if isinstance(seq, int) and seq > max_seq:
max_seq = seq
(dest_parts / part_file.name).write_text(
json.dumps(data, ensure_ascii=False),
encoding="utf-8",
)
# Copy the queen's meta.json so the worker's restore finds
# the conversation during its first run. The meta fields
# (system_prompt, max_context_tokens, etc.) get overridden
# by the worker's own AgentLoop config + context after
# restore, so nothing here bleeds into runtime behavior.
src_meta = src_conv_dir / "meta.json"
if src_meta.exists():
try:
meta_data = json.loads(src_meta.read_text(encoding="utf-8"))
(dest_conv_dir / "meta.json").write_text(
json.dumps(meta_data, ensure_ascii=False),
encoding="utf-8",
)
except (json.JSONDecodeError, OSError) as exc:
logger.warning(
"spawn fork: failed to copy queen meta.json: %s", exc
)
# Append the task as the next user message so the worker's
# LLM sees it as the most recent turn in the conversation
# after restore. This replaces the fresh-path call to
# _build_initial_message for spawned workers.
task_content = _format_spawn_task_message(task, input_data or {})
next_seq = max_seq + 1
task_part = {
"seq": next_seq,
"role": "user",
"content": task_content,
# phase_id omitted (None) so the backward-compat
# fallback in NodeConversation.restore keeps it visible
# to both queen-style and phase-filtered restores.
# run_id omitted so the worker's run_id filter (off by
# default since ctx.run_id is empty) doesn't reject it.
}
task_filename = f"{next_seq:010d}.json"
(dest_parts / task_filename).write_text(
json.dumps(task_part, ensure_ascii=False),
encoding="utf-8",
)
logger.info(
"spawn fork: inherited %d queen parts + appended task at seq %d",
max_seq + 1,
next_seq,
)
await asyncio.to_thread(_copy_and_append)
# ── Worker Spawning ─────────────────────────────────────────
async def spawn(
@@ -497,6 +658,22 @@ class ColonyRuntime:
# (worse) the process CWD.
worker_storage = self._storage_path / "workers" / worker_id
worker_storage.mkdir(parents=True, exist_ok=True)
# Fork the queen's conversation into the worker's store.
# The queen already accumulated the user chat, read relevant
# skills, and made decisions about how to approach the task;
# the worker would repeat that discovery work (and often
# mis-step — see the 2026-04-14 "dummy-target" incident)
# if spawned with a blank store. We snapshot the queen's
# parts + meta at spawn time, then append the task as the
# next user message so the worker's AgentLoop restores into
# a conversation that already ends with its new instruction.
await self._fork_parent_conversation(
worker_storage / "conversations",
task=task,
input_data=input_data,
)
worker_conv_store = FileConversationStore(
worker_storage / "conversations"
)
@@ -98,10 +98,20 @@ textarea = browser_evaluate("""
browser_click_coordinate(textarea['cx'], textarea['cy'])
sleep(0.6)
# 6. Insert text via CDP Input.insertText (browser_type does this by default now).
# Per-char keyDown fails on Lexical composers — the keys dispatch but
# the editor never turns them into text.
browser_type(<appropriate-selector-or-skip-selector-and-use-bridge-insertText>, text)
# 6. Insert text via document.execCommand('insertText') through browser_evaluate.
# This is the ONLY reliable approach for LinkedIn's Lexical composer.
# See the "Lexical composer quirks" section below for why browser_type
# with a selector does NOT work here (the contenteditable lives inside
# the #interop-outlet shadow root which document.querySelector can't
# reach). The click in step 5 already put Lexical into edit mode, so
# execCommand injects straight into the focused editor's state.
browser_evaluate("""
(function(){
document.execCommand('insertText', false, %s);
return true;
})();
""" % json.dumps(message_text)) # json.dumps gives you a safely-escaped JS string literal
sleep(1.0) # let Lexical commit state + enable Send button
# 7. Find the modal Send button (filter by in-viewport, reject pinned bar)
send = browser_evaluate("""
@@ -133,11 +143,24 @@ send = browser_evaluate("""
})();
""")
# 8. ONLY click Send if it's enabled — if disabled, the editor didn't register the input.
# Don't click blindly; the framework state is the source of truth, not the DOM text.
if not send['disabled']:
browser_click_coordinate(send['cx'], send['cy'])
sleep(2.5) # wait for send + bubble render
# 8. ONLY click Send if it's enabled — if disabled, the execCommand
# didn't land. DO NOT retry with a different tool; the fix is
# always: re-click the composer rect, re-run execCommand, re-check.
# The Send button's `disabled` state IS the ground truth — if
# Lexical registered your text, it enables the button. If it's
# still disabled, your text did not reach the editor, regardless
# of what any tool call claims.
if send['disabled']:
# The editor didn't receive your text. Do NOT click Send. Do NOT
# fall back to browser_type with a dummy selector (see anti-pattern
# in Common Pitfalls). Instead: re-click the textarea rect from
# step 4, wait a beat, re-run the execCommand insertText from step
# 6. If that still fails after 2 retries, bail and surface — the
# modal may have been reclaimed by a stale state or auth wall.
raise Exception("Send button disabled after insertText — editor did not receive input")
browser_click_coordinate(send['cx'], send['cy'])
sleep(2.5) # wait for send + bubble render
```
**Verify post-send**: the composer textarea should now be empty (`innerText === ''`) and `.msg-s-event-listitem__message-bubble` count should have grown by 1. Walk the shadow tree via `browser_evaluate` to check.
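A sketch of that post-send check in the same bridge-call style as the flows above. The shadow-walk is shortened to a direct `shadowRoot` access, which assumes the `#interop-outlet` shadow root is open; if your walk differs, reuse the shadow-walk helper from the earlier steps:

```
# Post-send verification sketch (hypothetical, matching the skill's style)
state = browser_evaluate("""
(function(){
  // assumes an open shadow root on #interop-outlet
  var outlet = document.querySelector('#interop-outlet');
  var root = (outlet && outlet.shadowRoot) ? outlet.shadowRoot : document;
  var composer = root.querySelector('div.msg-form__contenteditable');
  var bubbles = root.querySelectorAll('.msg-s-event-listitem__message-bubble');
  return {
    composer_empty: !composer || composer.innerText.trim() === '',
    bubble_count: bubbles.length
  };
})();
""")
# Compare bubble_count against the count captured BEFORE clicking Send;
# success means composer_empty AND bubble_count grew by exactly 1.
```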
@@ -247,8 +270,9 @@ If any of those show up, **stop the run, screenshot the state, and surface the i
## Common pitfalls
- **`innerHTML` injection is silently dropped** — LinkedIn's Trusted Types CSP discards any `innerHTML = "<...>"` from injected scripts, no console error. Always use `createElement` + `appendChild` + `setAttribute` for DOM injection. `textContent`, `style.cssText`, and `.value` assignments are fine.
- **Per-char keyDown on the message composer produces empty text** — Lexical intercepts `beforeinput` and drops raw keys. Use `browser_type` (which now routes through CDP `Input.insertText`), or call `Input.insertText` directly via the bridge on the focused shadow element.
- **`browser_type(selector=...)` can't see the message composer** — it's inside `#interop-outlet` shadow. `document.querySelector('div.msg-form__contenteditable')` returns nothing. Use the shadow-walk + click-to-focus pattern above.
- **Do NOT use `browser_type` on the message composer — use `document.execCommand('insertText', false, text)` via `browser_evaluate` instead.** The Lexical contenteditable lives inside the `#interop-outlet` shadow root which `document.querySelector` (what `browser_type` uses under the hood) cannot see. Attempts to work around this with `browser_shadow_query` fail because `browser_type` doesn't support the `>>>` shadow-pierce syntax. The ONLY reliable insert path is: (1) `browser_click_coordinate` on the composer rect (put Lexical in edit mode via a real CDP pointer click) → (2) `browser_evaluate` with `document.execCommand('insertText', false, <message>)` against the focused editor. This pattern is verified end-to-end across 15+ successful sends in session `session_20260414_113244_a98cfd66` (2026-04-14).
- **Per-char keyDown on the message composer produces empty text** — Lexical intercepts `beforeinput` and drops raw keys. Ignore `browser_type` entirely for LinkedIn DMs; use the `execCommand('insertText')` path above.
- **ANTI-PATTERN: "inject a dummy `<div id='dummy-target'>` and pass it as the `selector` arg to `browser_type`".** This looks tempting but fails compoundingly: `browser_type` clicks the **dummy div's** rect (not the editor's), the click lands on the Lexical wrapper's non-editable chrome, the contenteditable never receives focus, and `Input.insertText` fires against nothing. The bridge will still return `{"ok": true, "action": "type", "length": N}` because it has no way to verify the text actually landed. Symptom: Send button stays `disabled: true` forever. Fix: use `execCommand('insertText')` exactly as shown in the profile-message flow above. (See `session_20260414_114820_08bd3c4d` for the failed attempt.)
- **Multiple Send buttons on the page** — the pinned bottom-right messaging bar has its own `msg-form__send-button` that's usually below `innerHeight`. Filter by in-viewport before clicking.
- **`window.onbeforeunload` hangs navigation/close** — after typing in a composer, any `browser_navigate` or `close_tab` can pop a native "unsent message, leave?" confirm dialog that deadlocks the bridge. Always strip `onbeforeunload` before any navigation, and wrap composer flows in a `try/finally` that runs the cleanup block: