feat: queen prompt optimization

Richard Tang
2026-03-06 12:27:08 -08:00
parent 4de140a170
commit 1f7efcd940
3 changed files with 51 additions and 105 deletions
@@ -125,17 +125,10 @@ what they want before this step, skip the question and proceed directly.
# Core Mandates
- **DO NOT propose a complete goal on your own.** Instead, \
collaborate with the user to define it.
- **Read before writing.** NEVER write code from assumptions. Read \
reference agents and templates first. Read every file before editing.
- **Conventions first.** Follow existing project patterns exactly. \
Analyze imports, structure, and style in reference agents.
- **Verify assumptions.** Never assume a class, import, or pattern \
exists. Read actual source to confirm. Search if unsure.
- **Discover tools dynamically.** NEVER reference tools from static \
docs. Always run list_agent_tools() to see what actually exists.
- **Professional objectivity.** If a use case is a poor fit for the \
framework, say so. Technical accuracy over validation.
- **Concise.** No emojis. No preambles. No postambles. Substance only.
- **Self-verify.** After writing code, run validation and tests. Fix \
errors yourself. Don't declare success until validation passes.
@@ -230,28 +223,15 @@ If a question doesn't do one of these, don't ask it. Make an assumption, state i
---
### 1.1: Let Them Talk, But Listen Like an Architect
### 1.1: Let Them Talk, But Listen Like a Solution Architect
When the stakeholder describes what they want, don't just hear the words — \
listen for the architecture underneath. While they talk, mentally construct:
When the stakeholder describes what they want, mentally construct:
- **The pain**: What about today's situation is broken, slow, or missing?
- **The actors**: Who are the people/systems involved?
- **The trigger**: What kicks off the workflow?
- **The core loop**: What's the main thing that happens repeatedly?
- **The output**: What's the valuable thing produced at the end?
- **The pain**: What about today's situation is broken, slow, or missing?
You are extracting a **domain model** from natural language in real time. \
Most stakeholders won't give you this structure explicitly — they'll give you a story. \
Your job is to hear the structure inside the story.
| They say... | You're hearing... |
|-------------|-------------------|
| Nouns they repeat | Your entities |
| Verbs they emphasize | Your core operations |
| Frustrations they mention | Your design constraints |
| Workarounds they describe | What the system must replace |
| People they name | Your user types |
---
@@ -317,13 +297,11 @@ Never ask what you could answer yourself.
| Turn | Who | What |
|------|-----|------|
| 1 | User | Describes what they need |
| 2 | Agent | Plays back understanding as a proposed model. Asks 1-2 critical questions max. |
| 2 | Agent | Plays back understanding as a proposed model. Asks 1-2 critical questions. |
| 3 | User | Corrects, confirms, or adds detail |
| 4 | Agent | Adjusts model, confirms MVP scope, states assumptions, declares starting point |
| *(5)* | *(Only if Turn 3 revealed something that fundamentally changes the approach)* |
**AFTER the conversation, IMMEDIATELY proceed to 2b. DO NOT skip to building.**
---
#### Anti-Patterns
@@ -331,11 +309,8 @@ Never ask what you could answer yourself.
| Don't | Do Instead |
|-------|------------|
| Open with a list of questions | Open with what you understood from their request |
| "What are your requirements?" | "Here's what I think you need — am I right?" |
| Ask about every edge case | Handle with smart defaults, flag in summary |
| 10+ turn discovery conversation | 3-8 turns. Start building, iterate with real software. |
| Being lazy nd not understand what user want to achieve | Understand "what" and "why |
| Ask for permission to start | State your plan and start |
| Being lazy and not understanding what the user wants to achieve | Understand the "what" and the "why" |
| Wait for certainty | Start at 80% confidence, iterate the rest |
| Ask what tech/tools to use | That's your job. Decide, disclose, move on. |
@@ -386,21 +361,13 @@ database, explain that's not how the framework works.
## 4: Design Graph and Propose
Design the agent architecture:
Act like an experienced AI solution architect. Design the agent architecture:
- Goal: id, name, description, 3-5 success criteria, 2-4 constraints
- Nodes: **2-5 nodes** (warn if <2 or >5)
- Edges: on_success for linear, conditional for routing
- Lifecycle: ALWAYS mark the primary event_loop node as terminal \
(`terminal_nodes=["process"]`). The node has `output_keys` and can \
complete when the agent finishes its work. This is the standard \
pattern for all interactive agents.
### Node Design Rules
Each node boundary serializes outputs to shared memory \
and DESTROYS all in-context information (tool results, reasoning, history). \
- Nodes: **3-6 nodes** (warn if <3 or >6). \
Use as many nodes as the use case requires, but don't create nodes without \
tools; merge them into nodes that do real work.
- Edges: on_success for linear, conditional for routing
- Lifecycle: ALWAYS have terminal_nodes
**MERGE nodes when:**
- Node has NO tools (pure LLM reasoning) -> merge into predecessor/successor
@@ -414,10 +381,8 @@ tools — merge them into nodes that do real work.
- Fan-out parallelism (parallel branches MUST be separate)
**Typical patterns (queen manages all user interaction):**
- 2 nodes: `process (autonomous) -> validate (autonomous) -> process`
- 1 node: `process (autonomous)` (simplest; queen handles intake/review)
- 3 nodes: `gather -> work -> review` (review loops back to gather if not satisfied)
- WRONG: 7 nodes where half have no tools and just do LLM reasoning
- WRONG: Any worker node with `client_facing=True`
Read reference agents before designing:
list_agents()
@@ -430,10 +395,19 @@ use box-drawing characters and clear flow arrows:
```
process
in: user_request
gather
subagent: gcu_search
input: user_request
tools: web_search,
save_data
escalate
on_success
work
subagent: gcu_interact
tools: save_data,
write_file
on_success
@@ -441,8 +415,8 @@ use box-drawing characters and clear flow arrows:
review
tools: set_output
on_success
back to process
on_failure
back to gather
```
The queen owns intake: she gathers user requirements, then calls \
@@ -465,7 +439,6 @@ Get user approval before implementing.
]
**WAIT for user response.**
- If **Proceed**: Move on to implementing
- If **Adjust scope**: Discuss what to change, update your notes, re-assess if needed
- If **More questions**: Answer them honestly, then ask again
@@ -526,18 +499,6 @@ run_agent_tests("{name}")
If anything fails: read error, fix with edit_file, re-validate. Up to 3x.
**CRITICAL: Testing continuous-loop agents**
Most agents mark the primary event_loop node as terminal \
(`terminal_nodes=["process"]`). This means the agent can complete \
when it finishes its work. Agent tests MUST be structural:
- Validate graph, node specs, edges, tools, prompts
- Check goal/constraints/success criteria definitions
- Test `AgentRunner.load()` succeeds (structural, no API key needed)
- NEVER call `runner.run()` or `trigger_and_wait()` in tests for \
interactive agents; they run indefinitely waiting for user input.
When you restructure an agent (change nodes/edges), always update \
the tests to match. Stale tests referencing old node names will fail.
## 6. Present
Show the user what you built: agent name, goal summary, graph (same \
@@ -665,8 +626,8 @@ _queen_behavior_always = """
## CRITICAL RULE — ask_user tool
Every response that ends with a question, a prompt, or expects user \
input MUST finish with a call to ask_user(prompt, options). This is \
NON-NEGOTIABLE. The system CANNOT detect that you are waiting for \
input MUST finish with a call to ask_user(prompt, options). \
The system CANNOT detect that you are waiting for \
input unless you call ask_user. You MUST call ask_user as the LAST \
action in your response.
@@ -680,7 +641,7 @@ Examples:
- ask_user("What do you need?",
["Build a new agent", "Run the loaded worker", "Help with code"])
- ask_user("Which pattern?",
["Simple 2-node", "Rich with feedback", "Custom"])
["Simple 3-node", "Rich with feedback", "Custom"])
- ask_user("Ready to proceed?",
["Yes, go ahead", "Let me change something"])
@@ -697,15 +658,12 @@ If no worker is loaded, say so.
# -- BUILDING phase behavior --
_queen_behavior_building = """
## Worker delegation
The worker is a specialized agent (see Worker Profile at the end of this \
prompt). It can ONLY do what its goal and tools allow.
## Direct coding
You can do any coding task directly: reading files, writing code, running \
commands, building agents, debugging. For quick tasks, do them yourself.
**Decision rule (read the Worker Profile first):**
**Decision rule (if a worker exists, read the Worker Profile first):**
- The user's request directly matches the worker's goal -> use \
run_agent_with_input(task) (if in staging) or load then run (if in building)
- Anything else -> do it yourself. Do NOT reframe user requests into \
@@ -726,8 +684,8 @@ prompt). It can ONLY do what its goal and tools allow.
run_agent_with_input(task) (if in staging) or load then run (if in building)
- Anything else -> do it yourself. Do NOT reframe user requests into \
subtasks to justify delegation.
- Building, modifying, or configuring agents is ALWAYS your job. Never \
delegate agent construction to the worker, even as a "research" subtask.
- Building, modifying, or configuring agents is ALWAYS your job. \
Use stop_worker_and_edit when you need to.
## When the user says "run", "execute", or "start" (without specifics)
@@ -782,17 +740,6 @@ When the user asks to change, modify, or update the loaded worker \
1. Call stop_worker_and_edit(): this stops the worker and gives you \
coding tools (switches to BUILDING phase).
2. Use the **Path** from the Worker Profile to locate the agent files.
3. Read the relevant files (nodes/__init__.py, agent.py, etc.).
4. Make the requested changes using edit_file / write_file.
5. Run validation (default_agent.validate(), AgentRunner.load(), \
validate_agent_tools()).
6. **Reload the modified worker**: call load_built_agent("{path}") \
so the changes take effect immediately (switches to STAGING phase). \
Then call run_agent_with_input(task) to restart execution.
Do NOT skip step 6; without reloading, the user will still be \
interacting with the old version.
"""
# -- RUNNING phase behavior --
@@ -883,17 +830,6 @@ When the user asks to change, modify, or update the loaded worker \
1. Call stop_worker_and_edit(): this stops the worker and gives you \
coding tools (switches to BUILDING phase).
2. Use the **Path** from the Worker Profile to locate the agent files.
3. Read the relevant files (nodes/__init__.py, agent.py, etc.).
4. Make the requested changes using edit_file / write_file.
5. Run validation (default_agent.validate(), AgentRunner.load(), \
validate_agent_tools()).
6. **Reload the modified worker**: call load_built_agent("{path}") \
so the changes take effect immediately (switches to STAGING phase). \
Then call run_agent_with_input(task) to restart execution.
Do NOT skip step 6; without reloading, the user will still be \
interacting with the old version.
"""
# -- Backward-compatible composed versions (used by queen_node.system_prompt default) --
@@ -934,11 +870,8 @@ Do NOT tell the user to run `python -m {name} run` — load and run it here.
_queen_style = """
# Style
- Responsible and thoughtful
- Concise. No fluff. Direct. No emojis.
- **One phase per response.** Stop after each phase and get user \
confirmation before moving on. Never combine understand + design + \
implement in one response.
- When starting the worker, describe what you told it in one sentence.
- When an escalation arrives, lead with severity and recommended action.
"""
@@ -108,7 +108,7 @@ This prevents premature set_output before user interaction.
### Fewer, Richer Nodes (CRITICAL)
**Hard limit: 2-4 nodes for most agents.** Never exceed 5 unless the user
**Hard limit: 3-6 nodes for most agents.** Never exceed 6 unless the user
explicitly requests a complex multi-phase pipeline.
Each node boundary serializes outputs to shared memory and **destroys** all
+19 -6
@@ -1403,6 +1403,19 @@ def validate_graph() -> str:
f"must be a subset of output_keys {node.output_keys}"
)
# Node count warning (prefer 3-6 nodes)
node_count = len(session.nodes)
if node_count < 3:
warnings.append(
f"Agent has only {node_count} node(s). "
"Consider adding nodes for better separation of concerns (recommend 3-6)."
)
elif node_count > 6:
warnings.append(
f"Agent has {node_count} nodes. "
"Consider consolidating to 3-6 nodes for simpler architecture."
)
# Worker nodes should be autonomous; queen owns user interaction.
el_nodes = [n for n in session.nodes if n.node_type == "event_loop"]
cf_el_nodes = [n for n in el_nodes if n.client_facing]
@@ -2777,28 +2790,28 @@ def initialize_agent_package(
}
)
# Warn about node count (prefer 2-5 nodes)
# Warn about node count (prefer 3-6 nodes)
node_count = len(session.nodes)
if node_count < 2:
if node_count < 3:
design_warnings.append(
{
"node_id": None,
"type": "too_few_nodes",
"message": (
f"Agent has only {node_count} node. "
"Consider adding nodes for better separation of concerns."
f"Agent has only {node_count} node(s). "
"Consider adding nodes for better separation of concerns (recommend 3-6)."
),
"severity": "warning",
}
)
elif node_count > 5:
elif node_count > 6:
design_warnings.append(
{
"node_id": None,
"type": "too_many_nodes",
"message": (
f"Agent has {node_count} nodes. "
"Consider consolidating to 2-5 nodes for simpler architecture."
"Consider consolidating to 3-6 nodes for simpler architecture."
),
"severity": "warning",
}
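
Reviewer's sketch (not part of the commit): both hunks above apply the same 3-6 node thresholds, once in `validate_graph()` and once in `initialize_agent_package()`. A minimal way to keep the two checks in sync would be a shared helper along these lines. The names `MIN_NODES`, `MAX_NODES`, `node_count_warning`, and `as_design_warning` are hypothetical; only the thresholds, message text, and dictionary keys are taken from the diff.

```python
from typing import Optional

# Hypothetical constants mirroring the thresholds introduced by this commit.
MIN_NODES = 3
MAX_NODES = 6


def node_count_warning(node_count: int) -> Optional[str]:
    """Return a warning message when node_count is outside 3-6, else None."""
    if node_count < MIN_NODES:
        return (
            f"Agent has only {node_count} node(s). "
            "Consider adding nodes for better separation of concerns (recommend 3-6)."
        )
    if node_count > MAX_NODES:
        return (
            f"Agent has {node_count} nodes. "
            "Consider consolidating to 3-6 nodes for simpler architecture."
        )
    return None


def as_design_warning(node_count: int) -> Optional[dict]:
    """Wrap the message in the dict shape used by the design_warnings list."""
    message = node_count_warning(node_count)
    if message is None:
        return None
    return {
        "node_id": None,
        "type": "too_few_nodes" if node_count < MIN_NODES else "too_many_nodes",
        "message": message,
        "severity": "warning",
    }
```

With a helper like this, `validate_graph()` could append the plain string to `warnings` and `initialize_agent_package()` could append the dict to `design_warnings`, so a future threshold change would touch one place.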