feat: prompt improvements

Richard Tang
2026-04-29 15:26:58 -07:00
parent 4794c8b816
commit 5b45fac435
4 changed files with 48 additions and 19 deletions
+38 -14
@@ -132,11 +132,10 @@ phase. Your identity tells you WHO you are.
# ---------------------------------------------------------------------------
_queen_role_independent = """\
You are in INDEPENDENT mode. No worker layout; you do the work yourself. \
You are in INDEPENDENT mode. \
You have full coding tools (read/write/edit/search/run) and MCP tools \
(file operations via coder-tools, browser automation via gcu-tools). \
Execute the user's task directly using conversation and tools. \
You are the agent. \
Execute the user's task directly using planning, conversation and tools.
If you need a structured choice or approval gate, always use \
``ask_user``; otherwise ask in plain prose. ``ask_user`` takes a \
``questions`` array: pass a single entry for one question, or batch \
@@ -237,13 +236,12 @@ ceremony for a single-paragraph summary.
# ---------------------------------------------------------------------------
_queen_tools_independent = """
# Tools (INDEPENDENT mode)
# Tools
## Planning — use FIRST for multi-step work
- task_create_batch: When a request has 3+ atomic steps, your FIRST \
- task_create_batch: When a request has 2+ atomic steps, your FIRST \
tool call is `task_create_batch` with one entry per step (atomic, \
one round-trip). Use this for the upfront plan, NOT five separate \
`task_create` calls.
one round-trip).
- task_create: One-off mid-run additions when you discover \
unplanned work AFTER the initial plan is laid out.
- task_update / task_list / task_get: Mark progress, inspect, or \
@@ -270,9 +268,6 @@ search_files, run_command, undo_changes
INCUBATING and a new tool surface (including create_colony itself) \
unlocks. On rejection you stay here and keep the conversation going \
to fill the gaps the evaluator named.
- ``intended_purpose`` is a one-paragraph brief: what the colony will \
do, on what cadence, why it must outlive this chat. Don't write a \
SKILL.md here; that comes in INCUBATING.
"""
_queen_tools_incubating = """
@@ -418,10 +413,10 @@ asks for specifics. Do not invent a new pass unless the user asks for one.
_queen_behavior_independent = """
## Independent execution
You are the agent. **For multi-step work (3+ atomic actions): your FIRST \
tool call is `task_create_batch`** with one entry per atomic action, \
before you touch any other tool. (One call, atomic, not N separate \
`task_create` calls.) Then work the list one task at a time:
You are the agent. **For multi-step work (2+ atomic actions): call \
`task_create_batch`** with one entry per atomic action, \
before you touch any other tool. \
Then work the list one task at a time:
1. `task_update` in_progress before you start the step.
2. Do one real inline instance: open the browser, call the real API, \
@@ -502,6 +497,35 @@ Read the user's signals and calibrate your register:
- Correct technical terms -> they know the domain. Skip basics.
- Terse or frustrated ("just do X") -> acknowledge and simplify.
- Exploratory ("what if...", "could we also...") -> slow down and explore.
Read the user's task-shape signals and calibrate the task list. The list is \
the user's right-rail panel — it must reflect what you're actually doing now, \
not what you were doing two turns ago.
- New instructions arrive -> immediately capture them as tasks. \
`task_create_batch` for 2+ atomic steps in one round-trip; `task_create` \
for a single mid-run addition you discover after the plan is laid out.
- Pivot signals ("actually nevermind", "scrap that, do Y instead", scope \
changed, priority flipped) -> BEFORE planning the new direction, prune the \
old one: `task_update` with status='deleted' on every pending task that no \
longer applies. There is no bulk-delete and no undo: deletion is permanent, \
the id is retired and cannot be reused, so be deliberate but don't hesitate.
- An in_progress task overtaken by a pivot -> resolve it explicitly. If real \
work shipped, mark it `completed`; if it was scrapped mid-flight, mark it \
`deleted`. Never let a stale in_progress task linger: to the user's panel \
that looks identical to "the queen is stuck".
- Single action, chat, or conceptual question -> skip the task tools \
entirely. The bar is real multi-step work the user benefits from seeing \
tracked, not "anything you reply to".
- The list has drifted from current reality across several silent turns \
(stale items, partially-relevant umbrellas, completed work nobody wrote \
down) -> prune and reconcile before adding new tasks. A clean list is \
cheaper than an honest one full of debt.
Each `completed` transition is a discrete progress heartbeat in the user's \
right-rail panel: mark a task `completed` THE MOMENT it's done, never \
batch completions, and never mark `completed` with caveats. If it's not \
fully done, it stays `in_progress` and you create a new task describing \
what's blocking.
"""
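The pivot rules added to this prompt can be read as a small state machine over the task list. The sketch below is only an illustration under stated assumptions: `TaskList`, `Task`, and `handle_pivot` are hypothetical names, not the real tool implementation, and it simplifies by treating every pending task as no longer applicable after the pivot. It shows the ordering the prompt asks for (prune pending tasks, resolve any in_progress task explicitly, then plan the new direction in one batch) and that deleted ids are never handed out again.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Task:
    id: int
    title: str
    status: str = "pending"  # pending | in_progress | completed | deleted


@dataclass
class TaskList:
    """Hypothetical in-memory stand-in for the session task list."""

    tasks: dict[int, Task] = field(default_factory=dict)
    _ids: count = field(default_factory=lambda: count(1))

    def create_batch(self, titles: list[str]) -> list[int]:
        # Loosely mirrors task_create_batch: one call, one entry per step.
        created = []
        for title in titles:
            task = Task(id=next(self._ids), title=title)
            self.tasks[task.id] = task
            created.append(task.id)
        return created

    def update(self, task_id: int, status: str) -> None:
        task = self.tasks[task_id]
        if task.status == "deleted":
            # Assumption: deletion is permanent, so a deleted task cannot change.
            raise ValueError(f"task {task_id} is deleted; its id is retired")
        task.status = status

    def visible(self) -> list[Task]:
        # What a panel would plausibly show: everything not deleted.
        return [t for t in self.tasks.values() if t.status != "deleted"]


def handle_pivot(tasks: TaskList, shipped: set[int], new_steps: list[str]) -> list[int]:
    """Prune the old direction first, then lay out the new plan."""
    for task in list(tasks.tasks.values()):
        if task.status == "pending":
            # Simplification: assume no pending task survives the pivot.
            tasks.update(task.id, "deleted")
        elif task.status == "in_progress":
            # Resolve explicitly: completed if real work shipped, else deleted.
            tasks.update(task.id, "completed" if task.id in shipped else "deleted")
    return tasks.create_batch(new_steps)


tasks = TaskList()
a, b, c = tasks.create_batch(["draft reply", "post reply", "log result"])
tasks.update(a, "completed")
tasks.update(b, "in_progress")
new_ids = handle_pivot(tasks, shipped=set(), new_steps=["export data", "email summary"])
print([(t.id, t.title, t.status) for t in tasks.visible()])
```

In this run the in_progress task was scrapped mid-flight, so it is deleted rather than completed, and the new plan receives fresh ids instead of reusing the retired ones.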
@@ -1279,12 +1279,8 @@ def format_queen_identity_prompt(profile: dict[str, Any], *, max_examples: int |
"<negative_constraints>\n"
"- NEVER use corporate filler ('leverage', 'synergy', "
"'circle back', 'at the end of the day').\n"
"- NEVER use AI assistant phrases ('How can I help you "
"today?', 'As an AI', 'I'd be happy to').\n"
"- NEVER break character to explain your thought process "
"or reference your hidden background.\n"
"- Speak like a real person in your role -- direct, "
"opinionated, occasionally imperfect.\n"
"</negative_constraints>"
)
+4
@@ -89,6 +89,10 @@ def build_reminder(records: list[TaskRecord]) -> str:
" - If you're umbrella-tracking ('reply to all posts' as one task), "
"break it into one task per atomic action — use `task_create_batch` "
"with one entry per action.",
" - Also consider cleaning up the task list if it has become stale: "
"if any open tasks no longer apply (user pivoted, scope shifted, "
"task created in error), delete them via `task_update` with "
"status='deleted'. Don't leave stale items sitting on the list.",
]
if in_progress:
bullets.append(
+6 -1
@@ -160,6 +160,9 @@ _CREATE_DESC = (
"Create ONE task on your own session task list. Use this for one-off "
"mid-run additions when you discover unplanned work after the initial "
"plan is laid out.\n\n"
"**After receiving new instructions, immediately capture the user's "
"requirements as tasks** — and delete (via `task_update` with "
"status='deleted') any prior tasks that no longer apply.\n\n"
"**For laying out a multi-step plan upfront, use `task_create_batch` "
"instead** — one tool call with all the steps is cheaper and atomic.\n\n"
"Fields:\n"
@@ -179,7 +182,9 @@ _UPDATE_DESC = (
"- Mark it `completed` AS SOON as you finish it — do not let "
"multiple finished tasks pile up unmarked before flushing them at "
"the end of the run.\n"
"- Set status='deleted' to drop a task that's no longer relevant.\n\n"
"- Delete tasks: when a task is no longer relevant or was created "
"in error. Setting status='deleted' **permanently** removes the "
"task — the id is retired and cannot be reused.\n\n"
"ONLY mark `completed` when the task is FULLY done. If you hit errors, "
"blockers, or partial state, keep it `in_progress` and create a new "
"task describing what's blocking. Never mark completed with caveats; "