fix: simplify system prompt

Timothy
2026-04-17 04:47:51 -07:00
parent dde4dfaec9
commit 2b055d4d42
7 changed files with 130 additions and 114 deletions
+24 -97
@@ -929,45 +929,16 @@ Report the last run's results to the user and ask what they want to do next.
"""
_queen_behavior_independent = """
## Independent execution first (inline by default)
## Independent execution
You are the agent. You execute directly.
**Default behavior: do one real instance inline before any scaling.**
0. **Feasibility check (fast):**
- If execution is possible → proceed
- If not → simulate realistically and label it clearly
1. Understand the task
2. Plan briefly (1-5 bullets, no system design)
3. **Do the work yourself, inline. One real instance.** Open the \
browser, call the real API, write to the real file, send the \
real message. Use your actual tools against real state. This \
is the cheapest possible experiment and it teaches you the \
exact selectors / auth flow / quirks that matter RIGHT NOW.
**Risk check:**
If action is irreversible or affects real systems → show and confirm before executing
4. **Report with concrete evidence**
- Actual output / result
- What worked / failed
- Key learnings
5. Iterate inline until the process is reliable
6. Only then consider scaling
**Hard rule:** no scaling before one successful inline run
If you finish one successful inline run, follow **Scaling order:**
- Repeat inline (10 items)
- Parallel workers (batch, immediate results)
- Colony (only for recurring/background tasks)
**Exception:**
If task is conceptual/strategic → skip execution and answer directly
You are the agent. Do one real inline instance before any scaling: \
open the browser, call the real API, write to the real file. If the \
action is irreversible or touches shared systems, show and confirm \
before executing. Report concrete evidence (actual output, what \
worked / failed) after the run. Scale order once inline succeeds: \
repeat inline (10 items) → `run_parallel_workers` (batch, results \
now) → `create_colony` (recurring / background). Conceptual or \
strategic questions: answer directly, skip execution.
"""
# -- Behavior shared across all phases --
@@ -977,65 +948,21 @@ _queen_behavior_always = """
## Communication
Plain-text output IS how you talk to the user: your response is \
displayed directly in the chat. Use text for conversational replies, \
open-ended questions, explanations, and short status updates before \
tool calls. When the user just wants to chat, chat back naturally; \
you don't need a tool call to "hand off" the turn — the system \
detects the end of your response and waits for their next message.
## Visible response channel
Your visible response is the plain text in your LLM reply: the text \
you write after the closing `<tone>` tag of your internal assessment. \
NEVER use `run_command`, `echo`, or any other tool to emit what you \
want the user to read. Tools are for work: reading files, running \
commands, searching, editing. Tools are not for speaking. If you \
ever find yourself about to call `run_command("echo ...")` to say \
something, stop; write it as plain text instead. The LLM reply \
itself is the channel; there is no other.
## ask_user / ask_user_multiple
Use these tools ONLY when you need the user to pick from a small set \
of concrete options: approval gates, structured preference questions, \
decision points with 2-4 clear alternatives. Typical triggers:
- "Postgres or SQLite?" → use ask_user tool with options
- "Approve this draft?" → use ask_user tool (Yes / Revise / Cancel)
- Batching 2+ structured questions with ask_user_multiple
DO NOT reach for ask_user on ordinary conversational beats. "What's \
your name?", "Tell me more about that", "How are you?" — just write \
those as text. Free-form questions belong in prose. Using ask_user \
for every reply feels robotic and blocks natural conversation. \
When you do use it, keep your text to a brief intro; the widget \
renders the question and options.
## Chatting vs acting
**When the user greets you or chats, reply in plain prose, no tool \
calls.** A bare "hi", "hey", "hello", "how's it going" is a \
conversational opener, not a hidden task. Do NOT call `list_directory`, \
`search_files`, `run_command`, `ask_user`, or any other tool to \
"discover" what they want. Instead, check what you already know about \
this user from your recall memory (their name, role, past topics, \
preferences) and write a 1-2 sentence greeting in character that \
references it. If you know their name, use it. If you remember what \
you last worked on together, reference it. Then stop and wait. They \
will bring the task when they have one. Presuming a task that wasn't \
stated is worse than waiting a turn.
**When the user asks you to DO something** (build, edit, run, \
investigate, search), call the appropriate tool directly on the same \
turn; don't narrate intent and stop. "Let me check that file." \
followed by an immediate read_file is fine; "I'll check that file." \
with no tool call and then waiting is not. If you can act now, act now.
## Images
Users can attach images to messages. Analyze them directly using your \
vision capability; the image is embedded, no tool call needed.
- Your LLM reply text is what the user reads. Do NOT use \
`run_command`, `echo`, or any other tool to "say" something; tools \
are for work (read/search/edit/run), not speech.
- On a greeting or chat ("hi", "how's it going"), reply in plain \
prose and stop. Do not call tools to "discover" what the user wants. \
Check recall memory for name / role / past topics and weave them into \
a 1-2 sentence in-character greeting, then wait.
- On a clear ask (build, edit, run, investigate, search), call the \
appropriate tool on the same turn; don't narrate intent and stop.
- Use `ask_user` / `ask_user_multiple` only for structured choices \
(approvals, 2-4 concrete options like "Postgres or SQLite?"). \
Free-form questions belong in prose; reaching for `ask_user` on \
every reply blocks natural conversation.
- Images attached by the user are analyzed directly via your vision \
capability; no tool call needed.
"""
# -- PLANNING phase behavior --
+10 -1
@@ -1160,7 +1160,9 @@ def update_queen_profile(queen_id: str, updates: dict[str, Any]) -> dict[str, An
# ---------------------------------------------------------------------------
def format_queen_identity_prompt(profile: dict[str, Any]) -> str:
def format_queen_identity_prompt(
profile: dict[str, Any], *, max_examples: int | None = None
) -> str:
"""Convert a queen profile into a high-dimensional character prompt.
Uses the 5-pillar character construction system: core identity,
@@ -1168,6 +1170,11 @@ def format_queen_identity_prompt(profile: dict[str, Any]) -> str:
behavior rules, and world lore. The hidden background and
psychological profile are never shown to the user but shape
every response.
``max_examples`` caps the roleplay_examples block: profiles ship
four worked examples (~2.4 KB) but one is enough at runtime to show
the internal-then-external pattern. Full rendering stays available
for profile authoring / eval playback by leaving ``max_examples=None``.
"""
name = profile.get("name", "the Queen")
title = profile.get("title", "Senior Advisor")
@@ -1248,6 +1255,8 @@ def format_queen_identity_prompt(profile: dict[str, Any]) -> str:
# Few-shot examples showing the full internal process
examples = profile.get("examples", [])
if examples and max_examples is not None:
examples = examples[:max_examples]
if examples:
example_parts: list[str] = []
for ex in examples:
+10 -8
@@ -175,12 +175,10 @@ def _build_credentials_provider() -> Any:
adapter = CredentialStoreAdapter.default()
accounts = adapter.get_all_account_info()
tool_provider_map = adapter.get_tool_provider_map()
rendered = build_accounts_prompt(
accounts,
tool_provider_map=tool_provider_map,
node_tool_names=None,
)
# Compact form (no tool_provider_map) — tool schemas already
# surface function names; baking the full per-provider list
# into the system prompt on every turn was ~2 KB of redundancy.
rendered = build_accounts_prompt(accounts)
except Exception:
logger.debug("Failed to render ambient credentials block", exc_info=True)
rendered = ""
@@ -231,7 +229,7 @@ async def materialize_queen_identity(
phase_state.queen_id = queen_id
phase_state.queen_profile = queen_profile
phase_state.queen_identity_prompt = format_queen_identity_prompt(queen_profile)
phase_state.queen_identity_prompt = format_queen_identity_prompt(queen_profile, max_examples=1)
if event_bus is not None:
await event_bus.publish(
@@ -565,6 +563,10 @@ async def create_queen(
_queen_skills_mgr.load()
phase_state.protocols_prompt = _queen_skills_mgr.protocols_prompt
phase_state.skills_catalog_prompt = _queen_skills_mgr.skills_catalog_prompt
# Also store the manager so get_current_prompt() can render a
# phase-filtered catalog on each turn (skills with a `visibility`
# frontmatter that excludes the current phase are dropped).
phase_state.skills_manager = _queen_skills_mgr
_queen_skill_dirs = _queen_skills_mgr.allowlisted_dirs
except Exception:
logger.debug("Queen skill loading failed (non-fatal)", exc_info=True)
@@ -632,7 +634,7 @@ async def create_queen(
except FileNotFoundError:
logger.warning("Queen profile %s not found after selection", queen_id)
return None
identity_prompt = format_queen_identity_prompt(profile)
identity_prompt = format_queen_identity_prompt(profile, max_examples=1)
# Store on phase_state so identity persists across dynamic prompt refreshes
phase_state.queen_id = queen_id
phase_state.queen_profile = profile
+39 -6
@@ -20,6 +20,11 @@ logger = logging.getLogger(__name__)
# visible. Preserving awareness of every skill beats truncating entries.
_COMPACT_THRESHOLD_CHARS = 5000
# Per-skill description cap. Descriptions often run 300-500 chars of
# context that's only useful once — the first sentence is enough to
# decide whether a skill applies. Truncated entries get a trailing "…".
_DESCRIPTION_CAP_CHARS = 140
_MANDATORY_HEADER_FULL = """## Skills (mandatory)
Before replying: scan <available_skills> <description> entries.
- If exactly one skill clearly applies: read its SKILL.md at <location> with `read_file`, then follow it.
@@ -88,18 +93,27 @@ class SkillCatalog:
"""All skill base directories for file access allowlisting."""
return [skill.base_dir for skill in self._skills.values()]
def to_prompt(self) -> str:
def to_prompt(self, *, phase: str | None = None) -> str:
"""Generate the catalog prompt for system prompt injection.
Returns empty string when no skills are present. Otherwise returns
a mandatory pre-reply checklist + decision rules + rate-limit note,
followed by the <available_skills> XML body.
When the full XML body exceeds ``_COMPACT_THRESHOLD_CHARS``, the
compact variant is emitted instead: <description> elements are
dropped so every skill stays visible before any gets truncated.
When ``phase`` is set, skills whose ``visibility`` list is present
and does not include that phase are filtered out. Skills with
``visibility=None`` always appear.
Descriptions are capped to the first sentence or
``_DESCRIPTION_CAP_CHARS`` (whichever is shorter) with a trailing
"…" on truncation. When the full XML body still exceeds
``_COMPACT_THRESHOLD_CHARS`` the compact variant is emitted:
<description> elements are dropped so every skill stays visible
before any gets truncated.
"""
all_skills = sorted(self._skills.values(), key=lambda s: s.name)
if phase is not None:
all_skills = [s for s in all_skills if s.visibility is None or phase in s.visibility]
if not all_skills:
return ""
@@ -111,7 +125,25 @@ class SkillCatalog:
return f"{_MANDATORY_HEADER_COMPACT}\n\n{compact_xml}"
@staticmethod
def _render_xml(skills: list[ParsedSkill], *, compact: bool) -> str:
def _cap_description(description: str) -> str:
"""Return the first sentence or first ``_DESCRIPTION_CAP_CHARS`` chars."""
text = description.strip()
if not text:
return text
# First sentence boundary — look for '. ', '! ', '? '. Avoid matching
# decimals or abbreviations by requiring whitespace after the mark.
for i, ch in enumerate(text):
if ch in ".!?" and (i + 1 == len(text) or text[i + 1].isspace()):
sentence = text[: i + 1]
if len(sentence) <= _DESCRIPTION_CAP_CHARS:
return sentence
break
if len(text) <= _DESCRIPTION_CAP_CHARS:
return text
return text[: _DESCRIPTION_CAP_CHARS - 1].rstrip() + "…"
@classmethod
def _render_xml(cls, skills: list[ParsedSkill], *, compact: bool) -> str:
"""Render the `<available_skills>` block.
``compact=True`` drops `<description>` to preserve skill awareness
@@ -122,7 +154,8 @@ class SkillCatalog:
lines.append(" <skill>")
lines.append(f" <name>{escape(skill.name)}</name>")
if not compact:
lines.append(f" <description>{escape(skill.description)}</description>")
capped = cls._cap_description(skill.description)
lines.append(f" <description>{escape(capped)}</description>")
lines.append(f" <location>{escape(skill.location)}</location>")
lines.append(" </skill>")
lines.append("</available_skills>")
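Reviewer note: the capping rule in `_cap_description` is easier to check in isolation. The sketch below is a standalone copy of that logic, not the module itself; the free function `cap_description` and its `cap` parameter are hypothetical names introduced here, and the sample descriptions are made up for illustration.

```python
DESCRIPTION_CAP_CHARS = 140  # mirrors _DESCRIPTION_CAP_CHARS from the diff


def cap_description(description: str, cap: int = DESCRIPTION_CAP_CHARS) -> str:
    """Return the first sentence, or the first `cap` chars ending in an ellipsis."""
    text = description.strip()
    if not text:
        return text
    # A sentence ends at '.', '!' or '?' followed by whitespace or end of text,
    # so a decimal such as "1.5" does not count as a boundary.
    for i, ch in enumerate(text):
        if ch in ".!?" and (i + 1 == len(text) or text[i + 1].isspace()):
            sentence = text[: i + 1]
            if len(sentence) <= cap:
                return sentence
            break  # first sentence is too long: fall through to the hard cap
    if len(text) <= cap:
        return text
    # Reserve one character for the trailing ellipsis.
    return text[: cap - 1].rstrip() + "…"


short = cap_description("Scrape a page. Then parse it.")
decimal = cap_description("Uses v1.5 of the API. More detail follows.")
long_capped = cap_description("x" * 200)
```

With these inputs, `short` keeps only the first sentence, `decimal` is not cut at "1.5", and `long_capped` is exactly 140 characters including the ellipsis.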
+15
@@ -64,6 +64,7 @@ class SkillsManager:
def __init__(self, config: SkillsManagerConfig | None = None) -> None:
self._config = config or SkillsManagerConfig()
self._loaded = False
self._catalog: object = None # SkillCatalog, set after load()
self._catalog_prompt: str = ""
self._protocols_prompt: str = ""
self._allowlisted_dirs: list[str] = []
@@ -91,6 +92,7 @@ class SkillsManager:
mgr = cls.__new__(cls)
mgr._config = SkillsManagerConfig()
mgr._loaded = True # skip load()
mgr._catalog = None
mgr._catalog_prompt = skills_catalog_prompt
mgr._protocols_prompt = protocols_prompt
mgr._allowlisted_dirs = []
@@ -140,6 +142,7 @@ class SkillsManager:
)
catalog = SkillCatalog(discovered)
self._catalog = catalog
self._allowlisted_dirs = catalog.allowlisted_dirs
catalog_prompt = catalog.to_prompt()
@@ -271,6 +274,18 @@ class SkillsManager:
"""Community skills XML catalog for system prompt injection."""
return self._catalog_prompt
def skills_catalog_prompt_for_phase(self, phase: str | None) -> str:
"""Render the catalog filtered for the given queen phase.
Skills whose frontmatter ``visibility`` list is present and
excludes ``phase`` are dropped. Falls back to the cached
phase-agnostic prompt when no live catalog is available
(e.g. ``from_precomputed``).
"""
if self._catalog is None or phase is None:
return self._catalog_prompt
return self._catalog.to_prompt(phase=phase) # type: ignore[attr-defined]
@property
def protocols_prompt(self) -> str:
"""Default skill operational protocols for system prompt injection."""
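Reviewer note: a minimal sketch of the phase filter that `to_prompt(phase=...)` applies before rendering, assuming a stand-in `Skill` dataclass in place of `ParsedSkill`. The skill names and the lowercase phase string "independent" are assumptions for illustration; only "planning" and "building" appear in the diff.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Skill:
    """Hypothetical stand-in for ParsedSkill; only the fields the filter reads."""
    name: str
    visibility: Optional[list[str]] = None  # None = visible in every phase


def visible_in_phase(skills: list[Skill], phase: Optional[str]) -> list[str]:
    """Return sorted skill names that should appear in the catalog for `phase`."""
    ordered = sorted(skills, key=lambda s: s.name)
    if phase is not None:
        # Drop skills whose visibility list exists and excludes this phase.
        ordered = [s for s in ordered if s.visibility is None or phase in s.visibility]
    return [s.name for s in ordered]


skills = [
    Skill("web-scrape"),
    Skill("framework-authoring", ["planning", "building"]),
]
independent_names = visible_in_phase(skills, "independent")
planning_names = visible_in_phase(skills, "planning")
unfiltered_names = visible_in_phase(skills, None)
```

Passing `phase=None` skips the filter entirely, which corresponds to the cached phase-agnostic prompt that `skills_catalog_prompt_for_phase` falls back to.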
+18
@@ -37,6 +37,10 @@ class ParsedSkill:
compatibility: list[str] | None = None
metadata: dict[str, Any] | None = None
allowed_tools: list[str] | None = None
# List of queen phases in which this skill appears in the catalog.
# None = visible in all phases. Example: ["planning", "building"]
# hides a framework-authoring skill from the INDEPENDENT/DM prompt.
visibility: list[str] | None = None
def _try_fix_yaml(raw: str) -> str:
@@ -219,6 +223,19 @@ def parse_skill_md(path: Path, source_scope: str = "project") -> ParsedSkill | N
raw_tools = frontmatter.get("allowed-tools")
if isinstance(raw_tools, str):
raw_tools = [raw_tools]
# `visibility` lives under `metadata.visibility` so it stays inside
# the open `metadata` map (the skill-file schema used by the IDE
# and other tooling only allows a fixed set of top-level keys).
raw_metadata = frontmatter.get("metadata")
raw_visibility: Any = None
if isinstance(raw_metadata, dict):
raw_visibility = raw_metadata.get("visibility")
if isinstance(raw_visibility, str):
raw_visibility = [raw_visibility]
if isinstance(raw_visibility, list):
raw_visibility = [str(v).strip() for v in raw_visibility if str(v).strip()] or None
else:
raw_visibility = None
return ParsedSkill(
name=name,
@@ -231,4 +248,5 @@ def parse_skill_md(path: Path, source_scope: str = "project") -> ParsedSkill | N
compatibility=raw_compat,
metadata=frontmatter.get("metadata"),
allowed_tools=raw_tools,
visibility=raw_visibility,
)
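Reviewer note: the normalization of `metadata.visibility` accepts several input shapes, so a standalone sketch may help when writing tests. `normalize_visibility` is a hypothetical helper name; the logic copies the hunk above, and the frontmatter dicts are made-up inputs.

```python
from typing import Any, Optional


def normalize_visibility(frontmatter: dict[str, Any]) -> Optional[list[str]]:
    """Return the cleaned visibility list, or None when the skill is unrestricted."""
    raw_metadata = frontmatter.get("metadata")
    raw: Any = None
    if isinstance(raw_metadata, dict):
        raw = raw_metadata.get("visibility")
    if isinstance(raw, str):
        raw = [raw]  # a bare string is treated as a one-element list
    if isinstance(raw, list):
        # Strip whitespace, drop empty entries; an empty result means "all phases".
        return [str(v).strip() for v in raw if str(v).strip()] or None
    return None


from_string = normalize_visibility({"metadata": {"visibility": "planning"}})
from_list = normalize_visibility({"metadata": {"visibility": [" planning ", "", "building"]}})
top_level = normalize_visibility({"visibility": ["planning"]})
```

Note that a top-level `visibility` key is ignored: only the value under `metadata` is read, so `top_level` comes back as `None`.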
+14 -2
@@ -194,6 +194,12 @@ class QueenPhaseState:
protocols_prompt: str = ""
# Community skills catalog (XML) — appended after protocols
skills_catalog_prompt: str = ""
# Optional SkillsManager reference. When set, get_current_prompt()
# re-renders the catalog filtered by the current phase so skills
# whose frontmatter `visibility` list excludes this phase are
# dropped (shaves ~1 KB of DM-irrelevant framework skills on
# independent-phase turns).
skills_manager: Any = None
# Provider for the ambient "Connected integrations" block. See
# docstring on the simpler QueenPhaseState above.
@@ -250,8 +256,14 @@ class QueenPhaseState:
credentials_block = _render_credentials_block(self.credentials_prompt_provider)
if credentials_block:
parts.append(credentials_block)
if self.skills_catalog_prompt:
parts.append(self.skills_catalog_prompt)
catalog_prompt = self.skills_catalog_prompt
if self.skills_manager is not None:
try:
catalog_prompt = self.skills_manager.skills_catalog_prompt_for_phase(self.phase)
except Exception:
catalog_prompt = self.skills_catalog_prompt
if catalog_prompt:
parts.append(catalog_prompt)
if self.protocols_prompt:
parts.append(self.protocols_prompt)
if self._cached_global_recall_block: