fix: skills for colonies
@@ -695,29 +695,64 @@ a saved agent.

 ## Forking the session into a persistent colony

-**When to use create_colony:** the user needs work to run \
-**headless, recurring, or in parallel to this chat** — something \
-that keeps going after you stop talking. Typical triggers:
+**Prove the work inline BEFORE scaling to a colony.** This is the \
+most important rule in this section. A colony is a durable, \
+unattended runtime — you must know the task mechanics work before \
+you bake them into one. The expensive, hard-to-debug failures \
+(dummy-target browser loops, wrong selectors, misread skills) \
+happen when a queen delegates to a colony without ever doing \
+the work herself first.
+
+**The inline-first, scale-after pattern:**
+
+1. **Do one instance of the work yourself, inline**, right in \
+   this chat. Use your own tools. Open the browser, click the \
+   real button, type the real text, send the real message, \
+   verify the real result. This is the shortest path from \
+   "vague intent" to "known-working procedure" — you learn \
+   the exact selectors, the exact quirks, the exact sequence \
+   that works on this site / API / system right now.
+
+2. **Report the result to the user.** "I sent the message to \
+   Dimitris — here's the confirmation. Before I scale this to \
+   your whole connection list, want me to tweak anything?" \
+   This gives the user a concrete sample to react to AND \
+   gives you feedback before the cost of scaling multiplies.
+
+3. **Only after a successful inline run**, decide whether to:
+   - stay inline and iterate by hand (small batches)
+   - fan out via `run_parallel_workers` (one-shot batch, \
+     results needed RIGHT NOW, no persistence needed)
+   - scale via `create_colony` (headless / recurring / needs \
+     to survive this chat ending)
+
+**When to use create_colony:** after step 2 has succeeded, and \
+the user needs work to run **headless, recurring, or in parallel \
+to this chat**. Typical triggers:
 - "run this every morning / every hour / on a cron"
 - "keep monitoring X and alert me when Y"
 - "fire this off in the background, I'll check on it later"
 - "spin up a dedicated agent for this so I can keep working here"
 - any task that should survive the current conversation ending

-**When NOT to use it:** if the user just wants results RIGHT NOW \
-in this chat, use `run_parallel_workers` instead. If they want to \
-iterate on an agent design, stay in the planning/building flow. \
-Don't create a colony just because you "learned something \
-reusable" — the trigger is operational (needs to keep running), \
-not epistemic (knowledge worth saving).
+**When NOT to use it:**
+- You haven't actually done the work once yet. STOP. Do it \
+  inline first. Delegating an untested procedure to a colony \
+  is the single most common cause of silent worker failure.
+- The user wants results RIGHT NOW and doesn't need the task \
+  to persist → stay inline or use `run_parallel_workers`.
+- You "learned something reusable" but there's no operational \
+  need to keep running — knowledge worth saving goes in a \
+  skill file, not a colony.

-**Two-step flow:**
+**Two-step flow (assuming step 1-2 above have succeeded):**
 1. AUTHOR A SKILL FIRST so the colony worker has the operational \
-   context it needs to run unattended. Use write_file to create a \
-   skill folder (recommended location: \
-   `~/.hive/skills/{skill-name}/SKILL.md`) capturing the \
-   procedure — API endpoints, auth flow, response shapes, \
-   gotchas, conventions, query patterns, rate limits. The \
+   context it needs to run unattended — and write it from the \
+   knowledge you just earned doing the work inline, not from \
+   speculation. Include the EXACT selectors, tool call \
+   sequences, and gotchas you hit in your own run. Use \
+   write_file to create the skill folder (recommended \
+   location: `~/.hive/skills/{skill-name}/SKILL.md`). The \
    SKILL.md needs YAML frontmatter with `name` (matching the \
    directory name) and `description` (1-1024 chars including \
    trigger keywords), followed by a markdown body. Optional \
@@ -726,12 +761,13 @@ not epistemic (knowledge worth saving).
 2. create_colony(colony_name, task, skill_path) — Validates the \
    skill folder, installs it under ~/.hive/skills/ if it isn't \
    already there, and forks this session into a new colony. \
-   NOTHING RUNS after this call: the task is baked into \
-   worker.json and the user starts the worker (or wires up a \
-   trigger) later from the new colony page. The task string \
-   must be FULL and self-contained — when the worker eventually \
-   runs it has zero memory of your chat. The skill you wrote is \
-   discovered on first scan so the worker starts informed.
+   The colony worker inherits your full conversation at spawn \
+   time, so it sees everything you already did and said — no \
+   repeated discovery. NOTHING RUNS immediately after this \
+   call: the task is baked into worker.json and the user starts \
+   the worker (or wires up a trigger) later from the new colony \
+   page. The task string still must be FULL and self-contained \
+   because triggers fire without your chat context.

 ## Workflow summary
 1. Understand requirements → discover tools → design the layout
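The frontmatter rules stated in step 1 of the two-step flow (`name` must match the directory name, `description` must be 1-1024 characters) can be checked before `create_colony` is called. The sketch below is only an illustration, not the framework's own validator: `check_skill_folder` and `parse_frontmatter` are hypothetical helpers, and the parser is a deliberately naive `key: value` reader rather than a real YAML parser.

```python
import tempfile
from pathlib import Path


def parse_frontmatter(text: str) -> dict[str, str]:
    """Naively read `key: value` pairs between the leading '---' fences.

    Assumption: flat, single-line values only (no nested YAML).
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' frontmatter fence")
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    raise ValueError("frontmatter is missing its closing '---' fence")


def check_skill_folder(skill_dir: Path) -> list[str]:
    """Return a list of problems; an empty list means the folder looks valid."""
    skill_md = skill_dir / "SKILL.md"
    if not skill_md.is_file():
        return [f"{skill_md} does not exist"]
    try:
        fields = parse_frontmatter(skill_md.read_text(encoding="utf-8"))
    except ValueError as exc:
        return [str(exc)]
    problems = []
    if fields.get("name") != skill_dir.name:
        problems.append("frontmatter `name` must match the directory name")
    if not 1 <= len(fields.get("description", "")) <= 1024:
        problems.append("`description` must be 1-1024 characters")
    return problems


def demo() -> tuple[list[str], list[str]]:
    """Build one valid and one invalid skill folder and check both."""
    with tempfile.TemporaryDirectory() as tmp:
        good = Path(tmp) / "linkedin-dm"
        good.mkdir()
        (good / "SKILL.md").write_text(
            "---\nname: linkedin-dm\n"
            "description: Send a LinkedIn DM; triggers: message, DM\n"
            "---\n# Procedure\n",
            encoding="utf-8",
        )
        bad = Path(tmp) / "other-skill"
        bad.mkdir()
        (bad / "SKILL.md").write_text(
            "---\nname: wrong-name\ndescription:\n---\n", encoding="utf-8"
        )
        return check_skill_folder(good), check_skill_folder(bad)
```

Returning a list of problems instead of raising on the first one lets the queen report every issue in a single pass before retrying.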
@@ -843,32 +879,62 @@ synthesis.

 ## Forking this session into a persistent colony

-**When to use create_colony:** the user needs work to run \
-**headless, recurring, or in parallel to this chat** — something \
-that should keep going after this conversation ends. Typical \
-triggers:
+**Prove the work inline BEFORE scaling to a colony.** This is the \
+most important rule in this section. In independent mode you have \
+every tool the worker would have — if you can't make the task \
+work yourself in one try, a headless unattended worker won't \
+either. The expensive, hard-to-debug failures (dummy-target \
+browser loops, wrong selectors, misread skills) happen when a \
+queen delegates to a colony without ever doing the work herself \
+first.
+
+**The inline-first, scale-after pattern:**
+
+1. **Do one instance of the work yourself, inline**, right in \
+   this chat. Open the browser, click the real button, type \
+   the real text, send the real message, verify the real \
+   result. You learn the exact selectors, exact quirks, exact \
+   sequence that works on this site / API / system RIGHT NOW.
+2. **Report the result to the user.** Show them the concrete \
+   sample. Ask if they want anything adjusted before you \
+   scale up.
+3. **Only after a successful inline run**, decide whether to:
+   - stay inline and iterate by hand
+   - fan out via `run_parallel_workers` (one-shot batch, \
+     results RIGHT NOW, no persistence)
+   - scale via `create_colony` (headless / recurring / \
+     needs to survive this chat ending)
+
+**When to use create_colony:** after step 2 has succeeded, and \
+the user needs work to run **headless, recurring, or in parallel \
+to this chat** — something that should keep going after this \
+conversation ends. Typical triggers:
 - "run this every morning / every hour / on a cron"
 - "keep monitoring X and alert me when Y changes"
 - "fire this off in the background so I can keep working here"
 - "spin up a dedicated agent for this job"
 - any task that needs to survive the current session

-**When NOT to use it:** if the user just wants results RIGHT NOW \
-in this chat, use `run_parallel_workers` instead. Don't create a \
-colony just because you "learned something reusable" — the \
-trigger is operational (needs to keep running), not epistemic \
-(knowledge worth saving).
+**When NOT to use it:**
+- You haven't actually done the work once yet. STOP. Do it \
+  inline first. This is the #1 cause of silent worker failure.
+- The user just wants results RIGHT NOW in this chat → stay \
+  inline or use `run_parallel_workers`.
+- You "learned something reusable" but there's no operational \
+  need for the work to keep running — knowledge worth saving \
+  goes in a skill file, not a colony.

-**Two-step flow:**
+**Two-step flow (assuming step 1-2 above have succeeded):**
 1. AUTHOR A SKILL FIRST in a SCRATCH location so the colony \
    worker has the operational context it needs to run \
-   unattended. Use write_file to create a skill folder \
+   unattended — and write it from the knowledge you just \
+   earned doing the work inline, not from speculation. Include \
+   the EXACT selectors, tool call sequences, and gotchas you \
+   hit in your own run. Use write_file to create a skill folder \
    somewhere temporary (e.g. `/tmp/{skill-name}/` or your \
-   working directory) capturing the procedure — API endpoints, \
-   auth flow, pagination, gotchas, rate limits, response \
-   shapes. DO NOT author it under `~/.hive/skills/` — that path \
-   is user-global and would leak the skill to every other \
-   agent. The SKILL.md needs YAML frontmatter with `name` \
+   working directory). DO NOT author it under `~/.hive/skills/` \
+   — that path is user-global and would leak the skill to every \
+   other agent. The SKILL.md needs YAML frontmatter with `name` \
    (matching the directory name) and `description` (1-1024 \
    chars including trigger keywords), followed by a markdown \
    body. Optional subdirs: scripts/, references/, assets/. \
@@ -878,12 +944,14 @@ trigger is operational (needs to keep running), not epistemic \
    the skill folder, forks this session into a new colony, and \
    installs the skill COLONY-SCOPED at \
    `~/.hive/colonies/{colony_name}/skills/{skill_name}/`. Only \
-   that colony's worker sees it, no other agent. NOTHING RUNS \
-   after this call — the task is baked into worker.json and \
-   the user starts the worker (or wires up a trigger) later \
-   from the new colony page. The task string must be FULL and \
-   self-contained because the worker has zero memory of your \
-   chat when it eventually runs.
+   that colony's worker sees it, no other agent. The colony \
+   worker inherits your full conversation at spawn time, so it \
+   sees everything you already did and said — no repeated \
+   discovery. NOTHING RUNS immediately after this call — the \
+   task is baked into worker.json and the user starts the \
+   worker (or wires up a trigger) later from the new colony \
+   page. The task string must still be FULL and self-contained \
+   because triggers fire without your chat context.
 """

 _queen_behavior_editing = """
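The scratch-then-install layout described in this prompt (author the skill somewhere temporary, let `create_colony` install it colony-scoped) can be pictured with the sketch below. It is an assumption-laden illustration: `write_scratch_skill` and `colony_scoped_destination` are hypothetical helpers, the real copy is performed by `create_colony` itself, and only the path shapes `/tmp/{skill-name}/` and `~/.hive/colonies/{colony_name}/skills/{skill_name}/` come from the text above.

```python
import tempfile
from pathlib import Path
from typing import Optional


def write_scratch_skill(
    scratch_root: Path, skill_name: str, description: str, body: str
) -> Path:
    """Create {scratch_root}/{skill_name}/SKILL.md and return the folder."""
    global_skills = (Path.home() / ".hive" / "skills").resolve()
    skill_dir = (scratch_root / skill_name).resolve()
    # The prompt forbids authoring under the user-global skills directory.
    if skill_dir.is_relative_to(global_skills):
        raise ValueError("do not author skills under ~/.hive/skills/")
    if not 1 <= len(description) <= 1024:
        raise ValueError("description must be 1-1024 characters")
    skill_dir.mkdir(parents=True, exist_ok=True)
    frontmatter = f"---\nname: {skill_name}\ndescription: {description}\n---\n\n"
    (skill_dir / "SKILL.md").write_text(frontmatter + body, encoding="utf-8")
    return skill_dir


def colony_scoped_destination(
    colony_name: str, skill_name: str, home: Optional[Path] = None
) -> Path:
    """Where the prompt says the skill ends up after create_colony runs."""
    base = home if home is not None else Path.home()
    return base / ".hive" / "colonies" / colony_name / "skills" / skill_name


def demo() -> str:
    """Write a scratch skill into a temp dir and return the SKILL.md text."""
    with tempfile.TemporaryDirectory() as tmp:
        skill_dir = write_scratch_skill(
            Path(tmp),
            "linkedin-dm",
            "Send one LinkedIn DM; triggers: message, DM, outreach",
            "# Procedure\n\n1. Open the profile.\n",
        )
        return (skill_dir / "SKILL.md").read_text(encoding="utf-8")
```

Keeping the scratch folder outside `~/.hive/` means a failed or abandoned fork leaves nothing behind that another agent could discover.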
@@ -899,33 +967,52 @@ Report the last run's results to the user and ask what they want to do next.
 """

 _queen_behavior_independent = """
-## Independent — do the work yourself
+## Independent — do the work yourself (inline first, always)

-You are the agent. No pre-loaded worker — you execute directly.
-1. Understand the task from the user
-2. Plan your approach briefly (no flowcharts or agent design)
-3. Execute using your tools: file I/O, shell commands, browser automation
-4. Report results, iterate if needed
+You are the agent. No pre-loaded worker — you execute directly. \
+**Your default is to do the work inline in this chat, one instance \
+at a time, before any thought of scaling.**

-## Scaling up from independent mode
+1. Understand the task from the user.
+2. Plan your approach briefly (no flowcharts, no agent design).
+3. **Do the work yourself, inline. One real instance.** Open the \
+   browser, call the real API, write to the real file, send the \
+   real message. Use your actual tools against real state. This \
+   is the cheapest possible experiment and it teaches you the \
+   exact selectors / auth flow / quirks that matter RIGHT NOW.
+4. **Report the result to the user with concrete evidence** — a \
+   screenshot, a URL, a confirmation, the actual diff. Let them \
+   react before you scale.
+5. Iterate if needed — STAY INLINE while you figure out the \
+   mechanics. Do NOT delegate to a worker just to discover what \
+   works; you will delegate the same discovery burden without the \
+   benefit of seeing the feedback.
+6. Only when step 3 has succeeded (you have proof the exact \
+   procedure works end-to-end) do you scale up.

-You have no pre-loaded worker in this phase, but you DO have two \
-lifecycle tools for spinning up work dynamically:
+**Scaling pathways** (in order of cost, cheapest first):
+- **Stay inline, run it again.** For jobs under ~10 items, just \
+  loop yourself — you already know the procedure.
+- **`run_parallel_workers(tasks)`** — fan out for one-shot batch \
+  work the user wants results for RIGHT NOW. No persistence, no \
+  colony. Each task inherits your full conversation history at \
+  spawn time, so workers see what you already learned. Use when \
+  you need concurrency to beat wall-clock time.
+- **`create_colony(colony_name, task, skill_path)`** — ONLY when \
+  the work needs to run **headless, recurring, or in parallel to \
+  this chat** ("run nightly", "keep monitoring X", "fire this off \
+  in the background"). Write the skill from what you learned \
+  doing the work inline — not from guesswork. Then fork. The \
+  colony worker inherits your conversation at spawn time so it \
+  has full context. Do NOT use this just because you "learned \
+  something reusable" — the trigger is operational (needs to \
+  keep running), not epistemic.

-- **run_parallel_workers(tasks)** — for one-off batch work the user \
-  wants results for RIGHT NOW. Fan out N subtasks concurrently and \
-  synthesize the aggregated reports. No colony is created; the \
-  workers exist only for this call.
-- **create_colony(colony_name, task, skill_path)** — when the user \
-  wants work to run **headless, recurring, or in parallel to this \
-  chat** (e.g. "run nightly", "keep monitoring X", "fire this off \
-  in the background"). Write a skill folder to scratch capturing \
-  the operational procedure, then call this to fork the session \
-  and install the skill colony-scoped. Nothing runs after fork — \
-  the user starts the worker (or sets a trigger) later from the \
-  new colony page. Do NOT use this just because you "learned \
-  something reusable" — the trigger is operational (needs to keep \
-  running), not epistemic.
+**Hard rule: NEVER call `run_parallel_workers` or `create_colony` \
+before you have successfully completed the task once inline.** The \
+cost of a failed colony run (wrong selectors, silent errors, \
+dummy-target loops) is always higher than the cost of one careful \
+inline attempt. When in doubt, do it yourself first.

 You do NOT have the agent-building lifecycle (no save_agent_draft, \
 confirm_and_build, load_built_agent, run_agent_with_input). If the \
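The scaling rules in the independent-mode prompt above reduce to a small decision procedure. The sketch below is one hedged reading of them, not framework code: `TaskFacts` and `choose_scaling_path` are hypothetical names, and the threshold of 10 items is taken from the prompt's approximate "under ~10 items" guidance.

```python
from dataclasses import dataclass


@dataclass
class TaskFacts:
    inline_run_succeeded: bool  # has the queen completed one real instance?
    must_outlive_chat: bool     # headless / recurring / parallel to the chat
    item_count: int             # how many instances remain to be processed


def choose_scaling_path(facts: TaskFacts, inline_limit: int = 10) -> str:
    """Pick the cheapest pathway the prompt's rules allow."""
    # Hard rule: never delegate before one successful inline run.
    if not facts.inline_run_succeeded:
        return "inline"
    # The colony trigger is operational: the work must keep running.
    if facts.must_outlive_chat:
        return "create_colony"
    # Small one-shot jobs are cheaper to loop by hand.
    if facts.item_count < inline_limit:
        return "inline"
    # Large one-shot batches benefit from concurrency.
    return "run_parallel_workers"
```

Checking `inline_run_succeeded` first mirrors the prompt's ordering: no amount of operational need justifies a colony built on an untested procedure.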
@@ -41,6 +41,42 @@ if TYPE_CHECKING:
 logger = logging.getLogger(__name__)


+def _format_spawn_task_message(task: str, input_data: dict[str, Any]) -> str:
+    """Render the spawn task into the worker's next user message.
+
+    Spawned workers inherit the queen's conversation via
+    ``ColonyRuntime._fork_parent_conversation``; this helper builds
+    the content of the trailing user message that carries the new
+    task. The queen's chat already provides the context for the
+    task, so we frame this as an explicit hand-off.
+
+    Additional keys from ``input_data`` (other than the task itself)
+    are rendered below the hand-off line so the worker sees them as
+    structured hand-off data. This mirrors the fresh-path
+    ``AgentLoop._build_initial_message`` shape so worker prompts look
+    roughly the same whether or not inheritance fired.
+    """
+    lines = [
+        "# New task delegated by the queen",
+        "",
+        "The queen's conversation up to this point is visible above. "
+        "Use it as context (who the user is, what was already decided, "
+        "which skills apply). Your own system prompt and tool set are "
+        "set by the framework — the queen's tools may differ from "
+        "yours, so treat her prior tool calls as history only.",
+        "",
+        f"task: {task}",
+    ]
+    for key, value in (input_data or {}).items():
+        if key in ("task", "user_request"):
+            # Already rendered above; don't duplicate.
+            continue
+        if value is None:
+            continue
+        lines.append(f"{key}: {value}")
+    return "\n".join(lines)
+
+
 @dataclass
 class ColonyConfig:
     max_concurrent_workers: int = 100
@@ -432,6 +468,131 @@ class ColonyRuntime:
     def resume_timers(self) -> None:
         self._timers_paused = False

+    async def _fork_parent_conversation(
+        self,
+        dest_conv_dir: Path,
+        *,
+        task: str,
+        input_data: dict[str, Any] | None = None,
+    ) -> None:
+        """Fork the colony's parent queen conversation into ``dest_conv_dir``.
+
+        Copies the queen's ``parts/*.json`` and ``meta.json`` into the
+        worker's fresh conversation dir, then appends a synthetic user
+        message carrying the new task. The worker's subsequent
+        ``AgentLoop._restore`` reads this conversation via the usual
+        path — the queen's history is visible as prior turns, the task
+        appears as the most recent user message, and the worker starts
+        acting on it with full context.
+
+        This is a no-op if the colony runtime doesn't own a parent
+        queen conversation (e.g. a standalone colony started without a
+        queen wrapper).
+
+        Notes on filtering compatibility:
+        - Queen parts have ``phase_id=None``. When the worker's
+          restore applies its own phase filter, the backward-compat
+          fallback in NodeConversation.restore kicks in: an
+          all-None-phased store bypasses the filter. See
+          ``conversation.py:1369-1378``.
+        - ``cursor.json`` is deliberately NOT copied. The worker
+          should start fresh at iteration 0; copying the queen's
+          cursor would make the worker think it had already done
+          work.
+        - The queen's ``meta.json`` is copied but the AgentLoop
+          immediately rebuilds ``system_prompt`` from the worker's
+          own context post-restore (see agent_loop.py:533-535), so
+          the queen's system prompt does not leak into the worker.
+        """
+        # Resolve the queen's own conversation dir. For a queen-backed
+        # ColonyRuntime, storage_path points at the queen's session dir
+        # and conversations/ lives inside it. For standalone runtimes
+        # (tests, legacy fork path under ~/.hive/agents/{name}/worker/)
+        # there's no parent conversation — fall through to the fresh
+        # spawn path.
+        src_conv_dir = self._storage_path / "conversations"
+        src_parts_dir = src_conv_dir / "parts"
+        if not src_parts_dir.exists():
+            # No queen conversation to inherit — the worker starts with
+            # only the task, same as the pre-fork behavior. AgentLoop's
+            # fresh-conversation branch will call _build_initial_message
+            # and render input_data into the worker's first user message.
+            return
+
+        def _copy_and_append() -> None:
+            dest_parts = dest_conv_dir / "parts"
+            dest_parts.mkdir(parents=True, exist_ok=True)
+
+            # Copy each queen part. Use json.dumps round-trip (not raw
+            # file copy) so we can be defensive about unreadable files —
+            # a corrupted queen part file shouldn't take down the worker
+            # spawn, just drop that one part.
+            max_seq = -1
+            for part_file in sorted(src_parts_dir.glob("*.json")):
+                try:
+                    data = json.loads(part_file.read_text(encoding="utf-8"))
+                except (json.JSONDecodeError, OSError) as exc:
+                    logger.warning(
+                        "spawn fork: skipping unreadable queen part %s: %s",
+                        part_file.name,
+                        exc,
+                    )
+                    continue
+                seq = data.get("seq")
+                if isinstance(seq, int) and seq > max_seq:
+                    max_seq = seq
+                (dest_parts / part_file.name).write_text(
+                    json.dumps(data, ensure_ascii=False),
+                    encoding="utf-8",
+                )
+
+            # Copy the queen's meta.json so the worker's restore finds
+            # the conversation during its first run. The meta fields
+            # (system_prompt, max_context_tokens, etc.) get overridden
+            # by the worker's own AgentLoop config + context after
+            # restore, so nothing here bleeds into runtime behavior.
+            src_meta = src_conv_dir / "meta.json"
+            if src_meta.exists():
+                try:
+                    meta_data = json.loads(src_meta.read_text(encoding="utf-8"))
+                    (dest_conv_dir / "meta.json").write_text(
+                        json.dumps(meta_data, ensure_ascii=False),
+                        encoding="utf-8",
+                    )
+                except (json.JSONDecodeError, OSError) as exc:
+                    logger.warning(
+                        "spawn fork: failed to copy queen meta.json: %s", exc
+                    )
+
+            # Append the task as the next user message so the worker's
+            # LLM sees it as the most recent turn in the conversation
+            # after restore. This replaces the fresh-path call to
+            # _build_initial_message for spawned workers.
+            task_content = _format_spawn_task_message(task, input_data or {})
+            next_seq = max_seq + 1
+            task_part = {
+                "seq": next_seq,
+                "role": "user",
+                "content": task_content,
+                # phase_id omitted (None) so the backward-compat
+                # fallback in NodeConversation.restore keeps it visible
+                # to both queen-style and phase-filtered restores.
+                # run_id omitted so the worker's run_id filter (off by
+                # default since ctx.run_id is empty) doesn't reject it.
+            }
+            task_filename = f"{next_seq:010d}.json"
+            (dest_parts / task_filename).write_text(
+                json.dumps(task_part, ensure_ascii=False),
+                encoding="utf-8",
+            )
+            logger.info(
+                "spawn fork: inherited %d queen parts + appended task at seq %d",
+                max_seq + 1,
+                next_seq,
+            )
+
+        await asyncio.to_thread(_copy_and_append)
+
     # ── Worker Spawning ─────────────────────────────────────────

     async def spawn(
@@ -497,6 +658,22 @@ class ColonyRuntime:
         # (worse) the process CWD.
         worker_storage = self._storage_path / "workers" / worker_id
         worker_storage.mkdir(parents=True, exist_ok=True)
+
+        # Fork the queen's conversation into the worker's store.
+        # The queen already accumulated the user chat, read relevant
+        # skills, and made decisions about how to approach the task;
+        # the worker would repeat that discovery work (and often
+        # mis-step — see the 2026-04-14 "dummy-target" incident)
+        # if spawned with a blank store. We snapshot the queen's
+        # parts + meta at spawn time, then append the task as the
+        # next user message so the worker's AgentLoop restores into
+        # a conversation that already ends with its new instruction.
+        await self._fork_parent_conversation(
+            worker_storage / "conversations",
+            task=task,
+            input_data=input_data,
+        )
+
         worker_conv_store = FileConversationStore(
             worker_storage / "conversations"
         )

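To see what the fork leaves on disk, the following is a minimal synchronous sketch of the same copy-then-append idea, run against a temporary directory. It is a simplified stand-in, not the runtime's code: `fork_parts` and `demo` are hypothetical names, `meta.json` handling and logging are omitted, and the extra `isinstance(data, dict)` guard is an addition of this sketch.

```python
import json
import tempfile
from pathlib import Path


def fork_parts(src_parts: Path, dest_parts: Path, task_content: str) -> int:
    """Copy readable parts, append the task part, and return the task's seq."""
    dest_parts.mkdir(parents=True, exist_ok=True)
    max_seq = -1
    for part_file in sorted(src_parts.glob("*.json")):
        try:
            data = json.loads(part_file.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, OSError):
            continue  # drop the unreadable part, keep the spawn alive
        if not isinstance(data, dict):
            continue  # sketch-only guard: a part must be a JSON object
        seq = data.get("seq")
        if isinstance(seq, int) and seq > max_seq:
            max_seq = seq
        (dest_parts / part_file.name).write_text(
            json.dumps(data, ensure_ascii=False), encoding="utf-8"
        )
    next_seq = max_seq + 1
    task_part = {"seq": next_seq, "role": "user", "content": task_content}
    # Zero-padded names keep lexical order equal to numeric order.
    (dest_parts / f"{next_seq:010d}.json").write_text(
        json.dumps(task_part, ensure_ascii=False), encoding="utf-8"
    )
    return next_seq


def demo() -> tuple[int, list[str]]:
    """Fork three source parts (one corrupted) and list what was written."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "queen" / "parts"
        dest = Path(tmp) / "worker" / "parts"
        src.mkdir(parents=True)
        (src / "0000000000.json").write_text(
            json.dumps({"seq": 0, "role": "user", "content": "hi"}),
            encoding="utf-8",
        )
        (src / "0000000001.json").write_text("{not json", encoding="utf-8")
        (src / "0000000002.json").write_text(
            json.dumps({"seq": 2, "role": "assistant", "content": "ok"}),
            encoding="utf-8",
        )
        task_seq = fork_parts(src, dest, "task: send the follow-up")
        return task_seq, sorted(p.name for p in dest.glob("*.json"))
```

Because the task's sequence number is derived from the highest readable `seq` rather than from a file count, a skipped part leaves a gap in the numbering but never causes the task to overwrite an inherited part.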
@@ -98,10 +98,20 @@ textarea = browser_evaluate("""
 browser_click_coordinate(textarea['cx'], textarea['cy'])
 sleep(0.6)

-# 6. Insert text via CDP Input.insertText (browser_type does this by default now).
-#    Per-char keyDown fails on Lexical composers — the keys dispatch but
-#    the editor never turns them into text.
-browser_type(<appropriate-selector-or-skip-selector-and-use-bridge-insertText>, text)
+# 6. Insert text via document.execCommand('insertText') through browser_evaluate.
+#    This is the ONLY reliable approach for LinkedIn's Lexical composer.
+#    See the "Lexical composer quirks" section below for why browser_type
+#    with a selector does NOT work here (the contenteditable lives inside
+#    the #interop-outlet shadow root which document.querySelector can't
+#    reach). The click in step 5 already put Lexical into edit mode, so
+#    execCommand injects straight into the focused editor's state.
+browser_evaluate("""
+  (function(){
+    document.execCommand('insertText', false, %s);
+    return true;
+  })();
+""" % json.dumps(message_text))  # json.dumps gives you a safely-escaped JS string literal
+sleep(1.0)  # let Lexical commit state + enable Send button

 # 7. Find the modal Send button (filter by in-viewport, reject pinned bar)
 send = browser_evaluate("""
@@ -133,11 +143,24 @@ send = browser_evaluate("""
   })();
 """)

-# 8. ONLY click Send if it's enabled — if disabled, the editor didn't register the input.
-#    Don't click blindly; the framework state is the source of truth, not the DOM text.
-if not send['disabled']:
-    browser_click_coordinate(send['cx'], send['cy'])
-    sleep(2.5)  # wait for send + bubble render
+# 8. ONLY click Send if it's enabled — if disabled, the execCommand
+#    didn't land. DO NOT retry with a different tool; the fix is
+#    always: re-click the composer rect, re-run execCommand, re-check.
+#    The Send button's `disabled` state IS the ground truth — if
+#    Lexical registered your text, it enables the button. If it's
+#    still disabled, your text did not reach the editor, regardless
+#    of what any tool call claims.
+if send['disabled']:
+    # The editor didn't receive your text. Do NOT click Send. Do NOT
+    # fall back to browser_type with a dummy selector (see anti-pattern
+    # in Common Pitfalls). Instead: re-click the textarea rect from
+    # step 4, wait a beat, re-run the execCommand insertText from step
+    # 6. If that still fails after 2 retries, bail and surface — the
+    # modal may have been reclaimed by a stale state or auth wall.
+    raise Exception("Send button disabled after insertText — editor did not receive input")
+
+browser_click_coordinate(send['cx'], send['cy'])
+sleep(2.5)  # wait for send + bubble render
 ```

 **Verify post-send**: the composer textarea should now be empty (`innerText === ''`) and `.msg-s-event-listitem__message-bubble` count should have grown by 1. Walk the shadow tree via `browser_evaluate` to check.
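The retry guidance in step 8 (re-click the composer, re-run `execCommand`, re-check the Send button, give up after 2 retries) can be wrapped in a small loop. The sketch below assumes the three browser actions are passed in as plain callables; `insert_with_retry`, `build_insert_js`, and the callable parameters are hypothetical names rather than bridge APIs, and the sleep durations simply reuse the values from the flow above.

```python
import json
import time
from typing import Callable


def build_insert_js(message_text: str) -> str:
    """Build the execCommand snippet; json.dumps yields an escaped JS literal."""
    return (
        "(function(){ document.execCommand('insertText', false, %s);"
        " return true; })();" % json.dumps(message_text)
    )


def insert_with_retry(
    click_composer: Callable[[], None],
    run_js: Callable[[str], object],
    read_send_state: Callable[[], dict],
    message_text: str,
    max_retries: int = 2,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Insert text and return the Send button state once it is enabled.

    Raises RuntimeError if the button is still disabled after the first
    attempt plus `max_retries` retries.
    """
    js = build_insert_js(message_text)
    for _attempt in range(1 + max_retries):
        click_composer()   # real pointer click puts the editor in edit mode
        sleep(0.6)
        run_js(js)
        sleep(1.0)         # let the editor commit state
        send = read_send_state()
        if not send["disabled"]:
            return send    # the enabled button is the ground truth
    raise RuntimeError(
        "Send button disabled after insertText; editor did not receive input"
    )
```

Injecting `sleep` and the browser actions keeps the loop testable without a live browser, and the loop never switches to a different input tool between attempts, as step 8 requires.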
@@ -247,8 +270,9 @@ If any of those show up, **stop the run, screenshot the state, and surface the i
 ## Common pitfalls

 - **`innerHTML` injection is silently dropped** — LinkedIn's Trusted Types CSP discards any `innerHTML = "<...>"` from injected scripts, no console error. Always use `createElement` + `appendChild` + `setAttribute` for DOM injection. `textContent`, `style.cssText`, and `.value` assignments are fine.
-- **Per-char keyDown on the message composer produces empty text** — Lexical intercepts `beforeinput` and drops raw keys. Use `browser_type` (which now routes through CDP `Input.insertText`), or call `Input.insertText` directly via the bridge on the focused shadow element.
-- **`browser_type(selector=...)` can't see the message composer** — it's inside `#interop-outlet` shadow. `document.querySelector('div.msg-form__contenteditable')` returns nothing. Use the shadow-walk + click-to-focus pattern above.
+- **Do NOT use `browser_type` on the message composer — use `document.execCommand('insertText', false, text)` via `browser_evaluate` instead.** The Lexical contenteditable lives inside the `#interop-outlet` shadow root which `document.querySelector` (what `browser_type` uses under the hood) cannot see. Attempts to work around this with `browser_shadow_query` fail because `browser_type` doesn't support the `>>>` shadow-pierce syntax. The ONLY reliable insert path is: (1) `browser_click_coordinate` on the composer rect (put Lexical in edit mode via a real CDP pointer click) → (2) `browser_evaluate` with `document.execCommand('insertText', false, <message>)` against the focused editor. This pattern is verified end-to-end across 15+ successful sends in session `session_20260414_113244_a98cfd66` (2026-04-14).
+- **Per-char keyDown on the message composer produces empty text** — Lexical intercepts `beforeinput` and drops raw keys. Ignore `browser_type` entirely for LinkedIn DMs; use the `execCommand('insertText')` path above.
+- **ANTI-PATTERN: "inject a dummy `<div id='dummy-target'>` and pass it as the `selector` arg to `browser_type`".** This looks tempting but fails compoundingly: `browser_type` clicks the **dummy div's** rect (not the editor's), the click lands on the Lexical wrapper's non-editable chrome, the contenteditable never receives focus, and `Input.insertText` fires against nothing. The bridge will still return `{"ok": true, "action": "type", "length": N}` because it has no way to verify the text actually landed. Symptom: Send button stays `disabled: true` forever. Fix: use `execCommand('insertText')` exactly as shown in the profile-message flow above. (See `session_20260414_114820_08bd3c4d` for the failed attempt.)
 - **Multiple Send buttons on the page** — the pinned bottom-right messaging bar has its own `msg-form__send-button` that's usually below `innerHeight`. Filter by in-viewport before clicking.
 - **`window.onbeforeunload` hangs navigation/close** — after typing in a composer, any `browser_navigate` or `close_tab` can pop a native "unsent message, leave?" confirm dialog that deadlocks the bridge. Always strip `onbeforeunload` before any navigation, and wrap composer flows in a `try/finally` that runs the cleanup block: