feat: prompt improvements

2026-04-29 15:26:58 -07:00
parent 4794c8b816
commit 5b45fac435
4 changed files with 48 additions and 19 deletions
@@ -132,11 +132,10 @@ phase. Your identity tells you WHO you are.
 # ---------------------------------------------------------------------------

 _queen_role_independent = """\
-You are in INDEPENDENT mode. No worker layout — you do the work yourself. \
+You are in INDEPENDENT mode. \
 You have full coding tools (read/write/edit/search/run) and MCP tools \
 (file operations via coder-tools, browser automation via gcu-tools). \
-Execute the user's task directly using conversation and tools. \
-You are the agent. \
+Execute the user's task directly using planning, conversation and tools.
 If you need a structured choice or approval gate, always use \
 ``ask_user``; otherwise ask in plain prose. ``ask_user`` takes a \
 ``questions`` array — pass a single entry for one question, or batch \
@@ -237,13 +236,12 @@ ceremony for a single-paragraph summary.
 # ---------------------------------------------------------------------------

 _queen_tools_independent = """
-# Tools (INDEPENDENT mode)
+# Tools

 ## Planning — use FIRST for multi-step work
- task_create_batch — When a request has 3+ atomic steps, your FIRST \
+- task_create_batch — When a request has 2+ atomic steps, your FIRST \
 tool call is `task_create_batch` with one entry per step (atomic, \
-one round-trip). Use this for the upfront plan, NOT five separate \
-`task_create` calls.
+one round-trip).
 - task_create — One-off mid-run additions when you discover \
 unplanned work AFTER the initial plan is laid out.
 - task_update / task_list / task_get — Mark progress, inspect, or \
@@ -270,9 +268,6 @@ search_files, run_command, undo_changes
  INCUBATING and a new tool surface (including create_colony itself) \
  unlocks. On rejection you stay here and keep the conversation going \
  to fill the gaps the evaluator named.
- ``intended_purpose`` is a one-paragraph brief: what the colony will \
-  do, on what cadence, why it must outlive this chat. Don't write a \
-  SKILL.md here — that comes in INCUBATING.
 """

 _queen_tools_incubating = """
@@ -418,10 +413,10 @@ asks for specifics. Do not invent a new pass unless the user asks for one.
 _queen_behavior_independent = """
 ## Independent execution

-You are the agent. **For multi-step work (3+ atomic actions): your FIRST \
-tool call is `task_create_batch`** with one entry per atomic action, \
-before you touch any other tool. (One call, atomic — not N separate \
-`task_create` calls.) Then work the list one task at a time:
+You are the agent. **For multi-step work (2+ atomic actions): call \
+`task_create_batch`** with one entry per atomic action, \
+before you touch any other tool. \
+Then work the list one task at a time:

 1. `task_update` → in_progress before you start the step.
 2. Do one real inline instance — open the browser, call the real API, \
@@ -502,6 +497,35 @@ Read the user's signals and calibrate your register:
 - Correct technical terms -> they know the domain. Skip basics.
 - Terse or frustrated ("just do X") -> acknowledge and simplify.
 - Exploratory ("what if...", "could we also...") -> slow down and explore.
+
+Read the user's task-shape signals and calibrate the task list. The list is \
+the user's right-rail panel — it must reflect what you're actually doing now, \
+not what you were doing two turns ago.
+- New instructions arrive -> immediately capture them as tasks. \
+`task_create_batch` for ≥2 atomic steps in one round-trip; `task_create` \
+for a single mid-run addition you discover after the plan is laid out.
+- Pivot signals ("actually nevermind", "scrap that, do Y instead", scope \
+changed, priority flipped) -> BEFORE planning the new direction, prune the \
+old one: `task_update` with status='deleted' on every pending task that no \
+longer applies. There is no bulk-delete and no undo — deletion is permanent, \
+the id is retired and cannot be reused, so be deliberate but don't hesitate.
+- An in_progress task overtaken by a pivot -> resolve it explicitly. If real \
+work shipped, mark it `completed`; if it was scrapped mid-flight, mark it \
+`deleted`. Never let a stale in_progress task linger — to the user's panel \
+that looks identical to "the queen is stuck".
+- Single action, chat, or conceptual question -> skip the task tools \
+entirely. The bar is real multi-step work the user benefits from seeing \
+tracked, not "anything you reply to".
+- The list has drifted from current reality across several silent turns \
+(stale items, partially-relevant umbrellas, completed work nobody wrote \
+down) -> prune and reconcile before adding new tasks. A clean list is \
+cheaper than an honest one full of debt.
+
+Each `completed` transition is a discrete progress heartbeat in the user's \
+right-rail panel — mark a task `completed` THE MOMENT it's done, never \
+batch completions, and never mark `completed` with caveats. If it's not \
+fully done, it stays `in_progress` and you create a new task describing \
+what's blocking.
 """


@@ -1279,12 +1279,8 @@ def format_queen_identity_prompt(profile: dict[str, Any], *, max_examples: int |
        "<negative_constraints>\n"
        "- NEVER use corporate filler ('leverage', 'synergy', "
        "'circle back', 'at the end of the day').\n"
-        "- NEVER use AI assistant phrases ('How can I help you "
-        "today?', 'As an AI', 'I'd be happy to').\n"
        "- NEVER break character to explain your thought process "
        "or reference your hidden background.\n"
-        "- Speak like a real person in your role -- direct, "
-        "opinionated, occasionally imperfect.\n"
        "</negative_constraints>"
    )

@@ -89,6 +89,10 @@ def build_reminder(records: list[TaskRecord]) -> str:
        "  - If you're umbrella-tracking ('reply to all posts' as one task), "
        "break it into one task per atomic action — use `task_create_batch` "
        "with one entry per action.",
+        "  - Also consider cleaning up the task list if it has become stale: "
+        "if any open tasks no longer apply (user pivoted, scope shifted, "
+        "task created in error), delete them via `task_update` with "
+        "status='deleted'. Don't leave stale items sitting on the list.",
    ]
    if in_progress:
        bullets.append(
@@ -160,6 +160,9 @@ _CREATE_DESC = (
    "Create ONE task on your own session task list. Use this for one-off "
    "mid-run additions when you discover unplanned work after the initial "
    "plan is laid out.\n\n"
+    "**After receiving new instructions, immediately capture the user's "
+    "requirements as tasks** — and delete (via `task_update` with "
+    "status='deleted') any prior tasks that no longer apply.\n\n"
    "**For laying out a multi-step plan upfront, use `task_create_batch` "
    "instead** — one tool call with all the steps is cheaper and atomic.\n\n"
    "Fields:\n"
@@ -179,7 +182,9 @@ _UPDATE_DESC = (
    "- Mark it `completed` AS SOON as you finish it — do not let "
    "multiple finished tasks pile up unmarked before flushing them at "
    "the end of the run.\n"
-    "- Set status='deleted' to drop a task that's no longer relevant.\n\n"
+    "- Delete tasks: when a task is no longer relevant or was created "
+    "in error. Setting status='deleted' **permanently** removes the "
+    "task — the id is retired and cannot be reused.\n\n"
    "ONLY mark `completed` when the task is FULLY done. If you hit errors, "
    "blockers, or partial state, keep it `in_progress` and create a new "
    "task describing what's blocking. Never mark completed with caveats; "