fix: simplify system prompt
@@ -929,45 +929,16 @@ Report the last run's results to the user and ask what they want to do next.
 """

 _queen_behavior_independent = """
-## Independent — execution first (inline by default)
+## Independent execution

-You are the agent. You execute directly.
-
-**Default behavior: do one real instance inline before any scaling.**
-
-0. **Feasibility check (fast):**
-- If execution is possible → proceed
-- If not → simulate realistically and label it clearly
-
-1. Understand the task
-2. Plan briefly (1–5 bullets, no system design)
-3. **Do the work yourself, inline. One real instance.** Open the \
-browser, call the real API, write to the real file, send the \
-real message. Use your actual tools against real state. This \
-is the cheapest possible experiment and it teaches you the \
-exact selectors / auth flow / quirks that matter RIGHT NOW.
-
-**Risk check:**
-If action is irreversible or affects real systems → show and confirm before executing
-
-4. **Report with concrete evidence**
-- Actual output / result
-- What worked / failed
-- Key learnings
-
-5. Iterate inline until the process is reliable
-
-6. Only then consider scaling
-
-**Hard rule:** no scaling before one successful inline run
-if you finish one sucessful inline run, follow **Scaling order:**
-- Repeat inline (≤10 items)
-- Parallel workers (batch, immediate results)
-- Colony (only for recurring/background tasks)
-
-
-**Exception:**
-If task is conceptual/strategic → skip execution and answer directly
+You are the agent. Do one real inline instance before any scaling — \
+open the browser, call the real API, write to the real file. If the \
+action is irreversible or touches shared systems, show and confirm \
+before executing. Report concrete evidence (actual output, what \
+worked / failed) after the run. Scale order once inline succeeds: \
+repeat inline (≤10 items) → `run_parallel_workers` (batch, results \
+now) → `create_colony` (recurring / background). Conceptual or \
+strategic questions: answer directly, skip execution.
 """

 # -- Behavior shared across all phases --
@@ -977,65 +948,21 @@ _queen_behavior_always = """

 ## Communication

-Plain-text output IS how you talk to the user — your response is \
-displayed directly in the chat. Use text for conversational replies, \
-open-ended questions, explanations, and short status updates before \
-tool calls. When the user just wants to chat, chat back naturally; \
-you don't need a tool call to "hand off" the turn — the system \
-detects the end of your response and waits for their next message.
-
-## Visible response channel
-
-Your visible response is the plain text in your LLM reply — the text \
-you write after the closing `<tone>` tag of your internal assessment. \
-NEVER use `run_command`, `echo`, or any other tool to emit what you \
-want the user to read. Tools are for work: reading files, running \
-commands, searching, editing. Tools are not for speaking. If you \
-ever find yourself about to call `run_command("echo ...")` to say \
-something, stop — write it as plain text instead. The LLM reply \
-itself is the channel; there is no other.
-
-## ask_user / ask_user_multiple
-
-Use these tools ONLY when you need the user to pick from a small set \
-of concrete options — approval gates, structured preference questions, \
-decision points with 2-4 clear alternatives. Typical triggers:
-- "Postgres or SQLite?" use ask_user tool with options
-- "Approve this draft? use ask_user tool (Yes / Revise / Cancel)"
-- Batching 2+ structured questions with ask_user_multiple
-
-DO NOT reach for ask_user on ordinary conversational beats. "What's \
-your name?", "Tell me more about that", "How are you?" — just write \
-those as text. Free-form questions belong in prose. Using ask_user \
-for every reply feels robotic and blocks natural conversation. \
-When you do use it, keep your text to a brief intro; the widget \
-renders the question and options.
-
-## Chatting vs acting
-
-**When the user greets you or chats, reply in plain prose — no tool \
-calls.** A bare "hi", "hey", "hello", "how's it going" is a \
-conversational opener, not a hidden task. Do NOT call `list_directory`, \
-`search_files`, `run_command`, `ask_user`, or any other tool to \
-"discover" what they want. Instead, check what you already know about \
-this user from your recall memory — their name, role, past topics, \
-preferences — and write a 1–2 sentence greeting in character that \
-references it. If you know their name, use it. If you remember what \
-you last worked on together, reference it. Then stop and wait. They \
-will bring the task when they have one. Presuming a task that wasn't \
-stated is worse than waiting a turn.
-
-**When the user asks you to DO something** (build, edit, run, \
-investigate, search), call the appropriate tool directly on the same \
-turn — don't narrate intent and stop. "Let me check that file." \
-followed by an immediate read_file is fine; "I'll check that file." \
-with no tool call and then waiting is not. If you can act now, act now.
-
-
-## Images
-
-Users can attach images to messages. Analyze them directly using your \
-vision capability — the image is embedded, no tool call needed.
+- Your LLM reply text is what the user reads. Do NOT use \
+`run_command`, `echo`, or any other tool to "say" something — tools \
+are for work (read/search/edit/run), not speech.
+- On a greeting or chat ("hi", "how's it going"), reply in plain \
+prose and stop. Do not call tools to "discover" what the user wants. \
+Check recall memory for name / role / past topics and weave them into \
+a 1–2 sentence in-character greeting, then wait.
+- On a clear ask (build, edit, run, investigate, search), call the \
+appropriate tool on the same turn — don't narrate intent and stop.
+- Use `ask_user` / `ask_user_multiple` only for structured choices \
+(approvals, 2–4 concrete options like "Postgres or SQLite?"). \
+Free-form questions belong in prose; reaching for `ask_user` on \
+every reply blocks natural conversation.
+- Images attached by the user are analyzed directly via your vision \
+capability — no tool call needed.
 """

 # -- PLANNING phase behavior --
@@ -1160,7 +1160,9 @@ def update_queen_profile(queen_id: str, updates: dict[str, Any]) -> dict[str, An
 # ---------------------------------------------------------------------------


-def format_queen_identity_prompt(profile: dict[str, Any]) -> str:
+def format_queen_identity_prompt(
+    profile: dict[str, Any], *, max_examples: int | None = None
+) -> str:
     """Convert a queen profile into a high-dimensional character prompt.

     Uses the 5-pillar character construction system: core identity,
@@ -1168,6 +1170,11 @@ def format_queen_identity_prompt(profile: dict[str, Any]) -> str:
     behavior rules, and world lore. The hidden background and
     psychological profile are never shown to the user but shape
     every response.
+
+    ``max_examples`` caps the roleplay_examples block — profiles ship
+    four worked examples (~2.4 KB) but one is enough at runtime to show
+    the internal-then-external pattern. Full rendering stays available
+    for profile authoring / eval playback by leaving ``max_examples=None``.
     """
     name = profile.get("name", "the Queen")
     title = profile.get("title", "Senior Advisor")
@@ -1248,6 +1255,8 @@ def format_queen_identity_prompt(profile: dict[str, Any]) -> str:

     # Few-shot examples showing the full internal process
     examples = profile.get("examples", [])
+    if examples and max_examples is not None:
+        examples = examples[:max_examples]
     if examples:
         example_parts: list[str] = []
         for ex in examples:
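The `max_examples` cap in the hunk above is a plain list slice applied before rendering. A minimal sketch of that selection step on its own, assuming a profile dict with an `examples` list; the helper name `select_examples` and the sample profile are hypothetical and do not appear in the commit:

```python
from typing import Any, Optional


def select_examples(profile: dict[str, Any], max_examples: Optional[int] = None) -> list[Any]:
    """Return the examples that would be rendered (hypothetical helper)."""
    examples = profile.get("examples", [])
    # Same guard as the diff: only slice when a cap was requested.
    if examples and max_examples is not None:
        examples = examples[:max_examples]
    return examples


profile = {"examples": ["ex1", "ex2", "ex3", "ex4"]}  # illustrative data
runtime_examples = select_examples(profile, max_examples=1)
authoring_examples = select_examples(profile)
```

Leaving `max_examples=None` keeps the full list, which matches the docstring's note about profile authoring and eval playback.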
@@ -175,12 +175,10 @@ def _build_credentials_provider() -> Any:

             adapter = CredentialStoreAdapter.default()
             accounts = adapter.get_all_account_info()
-            tool_provider_map = adapter.get_tool_provider_map()
-            rendered = build_accounts_prompt(
-                accounts,
-                tool_provider_map=tool_provider_map,
-                node_tool_names=None,
-            )
+            # Compact form (no tool_provider_map) — tool schemas already
+            # surface function names; baking the full per-provider list
+            # into the system prompt on every turn was ~2 KB of redundancy.
+            rendered = build_accounts_prompt(accounts)
         except Exception:
             logger.debug("Failed to render ambient credentials block", exc_info=True)
             rendered = ""
@@ -231,7 +229,7 @@ async def materialize_queen_identity(

     phase_state.queen_id = queen_id
     phase_state.queen_profile = queen_profile
-    phase_state.queen_identity_prompt = format_queen_identity_prompt(queen_profile)
+    phase_state.queen_identity_prompt = format_queen_identity_prompt(queen_profile, max_examples=1)

     if event_bus is not None:
         await event_bus.publish(
@@ -565,6 +563,10 @@ async def create_queen(
         _queen_skills_mgr.load()
         phase_state.protocols_prompt = _queen_skills_mgr.protocols_prompt
         phase_state.skills_catalog_prompt = _queen_skills_mgr.skills_catalog_prompt
+        # Also store the manager so get_current_prompt() can render a
+        # phase-filtered catalog on each turn (skills with a `visibility`
+        # frontmatter that excludes the current phase are dropped).
+        phase_state.skills_manager = _queen_skills_mgr
         _queen_skill_dirs = _queen_skills_mgr.allowlisted_dirs
     except Exception:
         logger.debug("Queen skill loading failed (non-fatal)", exc_info=True)
@@ -632,7 +634,7 @@ async def create_queen(
         except FileNotFoundError:
             logger.warning("Queen profile %s not found after selection", queen_id)
             return None
-        identity_prompt = format_queen_identity_prompt(profile)
+        identity_prompt = format_queen_identity_prompt(profile, max_examples=1)
         # Store on phase_state so identity persists across dynamic prompt refreshes
         phase_state.queen_id = queen_id
         phase_state.queen_profile = profile
@@ -20,6 +20,11 @@ logger = logging.getLogger(__name__)
 # visible. Preserving awareness of every skill beats truncating entries.
 _COMPACT_THRESHOLD_CHARS = 5000

+# Per-skill description cap. Descriptions often run 300–500 chars of
+# context that's only useful once — the first sentence is enough to
+# decide whether a skill applies. Truncated entries get a trailing "…".
+_DESCRIPTION_CAP_CHARS = 140
+
 _MANDATORY_HEADER_FULL = """## Skills (mandatory)
 Before replying: scan <available_skills> <description> entries.
 - If exactly one skill clearly applies: read its SKILL.md at <location> with `read_file`, then follow it.
@@ -88,18 +93,27 @@ class SkillCatalog:
         """All skill base directories for file access allowlisting."""
         return [skill.base_dir for skill in self._skills.values()]

-    def to_prompt(self) -> str:
+    def to_prompt(self, *, phase: str | None = None) -> str:
         """Generate the catalog prompt for system prompt injection.

         Returns empty string when no skills are present. Otherwise returns
         a mandatory pre-reply checklist + decision rules + rate-limit note,
         followed by the <available_skills> XML body.

-        When the full XML body exceeds ``_COMPACT_THRESHOLD_CHARS``, the
-        compact variant is emitted instead: <description> elements are
-        dropped so every skill stays visible before any gets truncated.
+        When ``phase`` is set, skills whose ``visibility`` list is present
+        and does not include that phase are filtered out. Skills with
+        ``visibility=None`` always appear.
+
+        Descriptions are capped to the first sentence or
+        ``_DESCRIPTION_CAP_CHARS`` (whichever is shorter) with a trailing
+        "…" on truncation. When the full XML body still exceeds
+        ``_COMPACT_THRESHOLD_CHARS`` the compact variant is emitted:
+        <description> elements are dropped so every skill stays visible
+        before any gets truncated.
         """
         all_skills = sorted(self._skills.values(), key=lambda s: s.name)
+        if phase is not None:
+            all_skills = [s for s in all_skills if s.visibility is None or phase in s.visibility]
         if not all_skills:
             return ""

@@ -111,7 +125,25 @@ class SkillCatalog:
         return f"{_MANDATORY_HEADER_COMPACT}\n\n{compact_xml}"

     @staticmethod
-    def _render_xml(skills: list[ParsedSkill], *, compact: bool) -> str:
+    def _cap_description(description: str) -> str:
+        """Return the first sentence or first ``_DESCRIPTION_CAP_CHARS`` chars."""
+        text = description.strip()
+        if not text:
+            return text
+        # First sentence boundary — look for '. ', '! ', '? '. Avoid matching
+        # decimals or abbreviations by requiring whitespace after the mark.
+        for i, ch in enumerate(text):
+            if ch in ".!?" and (i + 1 == len(text) or text[i + 1].isspace()):
+                sentence = text[: i + 1]
+                if len(sentence) <= _DESCRIPTION_CAP_CHARS:
+                    return sentence
+                break
+        if len(text) <= _DESCRIPTION_CAP_CHARS:
+            return text
+        return text[: _DESCRIPTION_CAP_CHARS - 1].rstrip() + "…"
+
+    @classmethod
+    def _render_xml(cls, skills: list[ParsedSkill], *, compact: bool) -> str:
         """Render the `<available_skills>` block.

         ``compact=True`` drops `<description>` to preserve skill awareness
@@ -122,7 +154,8 @@ class SkillCatalog:
             lines.append(" <skill>")
             lines.append(f" <name>{escape(skill.name)}</name>")
             if not compact:
-                lines.append(f" <description>{escape(skill.description)}</description>")
+                capped = cls._cap_description(skill.description)
+                lines.append(f" <description>{escape(capped)}</description>")
             lines.append(f" <location>{escape(skill.location)}</location>")
             lines.append(" </skill>")
         lines.append("</available_skills>")
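The description cap added above can be exercised in isolation. The following is a minimal standalone sketch of the same logic, lifted out of `SkillCatalog` as a free function; the name `cap_description`, the `cap` parameter, and the sample strings are illustrative assumptions, not part of the commit:

```python
_DESCRIPTION_CAP_CHARS = 140  # same value as the constant added in this commit


def cap_description(description: str, cap: int = _DESCRIPTION_CAP_CHARS) -> str:
    """Return the first sentence, or the first ``cap`` chars ending in an ellipsis."""
    text = description.strip()
    if not text:
        return text
    for i, ch in enumerate(text):
        # A sentence mark only counts when followed by whitespace or end of
        # text, so "v1.5" does not end the sentence early.
        if ch in ".!?" and (i + 1 == len(text) or text[i + 1].isspace()):
            sentence = text[: i + 1]
            if len(sentence) <= cap:
                return sentence
            break  # first sentence is too long: fall through to truncation
    if len(text) <= cap:
        return text
    return text[: cap - 1].rstrip() + "…"


short = cap_description("Scrape a site. Handles pagination and retries.")
decimal = cap_description("Uses v1.5 of the API")
long_text = cap_description("x" * 200)
```

Note that only the first sentence boundary is considered: if that sentence alone exceeds the cap, the function truncates the whole text rather than searching for a later boundary.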
@@ -64,6 +64,7 @@ class SkillsManager:
     def __init__(self, config: SkillsManagerConfig | None = None) -> None:
         self._config = config or SkillsManagerConfig()
         self._loaded = False
+        self._catalog: object = None  # SkillCatalog, set after load()
         self._catalog_prompt: str = ""
         self._protocols_prompt: str = ""
         self._allowlisted_dirs: list[str] = []
@@ -91,6 +92,7 @@ class SkillsManager:
         mgr = cls.__new__(cls)
         mgr._config = SkillsManagerConfig()
         mgr._loaded = True  # skip load()
+        mgr._catalog = None
         mgr._catalog_prompt = skills_catalog_prompt
         mgr._protocols_prompt = protocols_prompt
         mgr._allowlisted_dirs = []
@@ -140,6 +142,7 @@ class SkillsManager:
         )

         catalog = SkillCatalog(discovered)
+        self._catalog = catalog
         self._allowlisted_dirs = catalog.allowlisted_dirs
         catalog_prompt = catalog.to_prompt()

@@ -271,6 +274,18 @@ class SkillsManager:
         """Community skills XML catalog for system prompt injection."""
         return self._catalog_prompt

+    def skills_catalog_prompt_for_phase(self, phase: str | None) -> str:
+        """Render the catalog filtered for the given queen phase.
+
+        Skills whose frontmatter ``visibility`` list is present and
+        excludes ``phase`` are dropped. Falls back to the cached
+        phase-agnostic prompt when no live catalog is available
+        (e.g. ``from_precomputed``).
+        """
+        if self._catalog is None or phase is None:
+            return self._catalog_prompt
+        return self._catalog.to_prompt(phase=phase)  # type: ignore[attr-defined]
+
     @property
     def protocols_prompt(self) -> str:
         """Default skill operational protocols for system prompt injection."""
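The phase filter behind `skills_catalog_prompt_for_phase` reduces to one list comprehension over the sorted skills. A minimal sketch, assuming a stripped-down stand-in for `ParsedSkill`; the `Skill` class, the `filter_for_phase` helper, and the skill names are hypothetical and only illustrate the rule that `visibility=None` means "visible everywhere":

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Skill:
    """Hypothetical stand-in for ParsedSkill with only the fields used here."""

    name: str
    visibility: Optional[list[str]] = None


def filter_for_phase(skills: list[Skill], phase: Optional[str]) -> list[Skill]:
    """Keep skills with no visibility list, or whose list includes ``phase``."""
    ordered = sorted(skills, key=lambda s: s.name)
    if phase is None:
        return ordered  # phase-agnostic rendering: nothing is filtered
    return [s for s in ordered if s.visibility is None or phase in s.visibility]


skills = [
    Skill("writer"),
    Skill("framework-authoring", ["planning", "building"]),
]
independent_names = [s.name for s in filter_for_phase(skills, "independent")]
planning_names = [s.name for s in filter_for_phase(skills, "planning")]
```

In this sketch the framework skill disappears from the independent phase but stays visible while planning, which is the behavior the `visibility` comment in the commit describes.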
@@ -37,6 +37,10 @@ class ParsedSkill:
     compatibility: list[str] | None = None
     metadata: dict[str, Any] | None = None
     allowed_tools: list[str] | None = None
+    # List of queen phases in which this skill appears in the catalog.
+    # None = visible in all phases. Example: ["planning", "building"]
+    # hides a framework-authoring skill from the INDEPENDENT/DM prompt.
+    visibility: list[str] | None = None


 def _try_fix_yaml(raw: str) -> str:
@@ -219,6 +223,19 @@ def parse_skill_md(path: Path, source_scope: str = "project") -> ParsedSkill | N
     raw_tools = frontmatter.get("allowed-tools")
     if isinstance(raw_tools, str):
         raw_tools = [raw_tools]
+    # `visibility` lives under `metadata.visibility` so it stays inside
+    # the open `metadata` map (the skill-file schema used by the IDE
+    # and other tooling only allows a fixed set of top-level keys).
+    raw_metadata = frontmatter.get("metadata")
+    raw_visibility: Any = None
+    if isinstance(raw_metadata, dict):
+        raw_visibility = raw_metadata.get("visibility")
+    if isinstance(raw_visibility, str):
+        raw_visibility = [raw_visibility]
+    if isinstance(raw_visibility, list):
+        raw_visibility = [str(v).strip() for v in raw_visibility if str(v).strip()] or None
+    else:
+        raw_visibility = None

     return ParsedSkill(
         name=name,
@@ -231,4 +248,5 @@ def parse_skill_md(path: Path, source_scope: str = "project") -> ParsedSkill | N
         compatibility=raw_compat,
         metadata=frontmatter.get("metadata"),
         allowed_tools=raw_tools,
+        visibility=raw_visibility,
     )
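The `metadata.visibility` normalization above accepts a string, a list, or nothing, and collapses empty results to `None`. A minimal sketch of the same rules as a standalone function operating on an already-parsed frontmatter dict; the function name `normalize_visibility` and the sample inputs are assumptions made for illustration:

```python
from typing import Any, Optional


def normalize_visibility(frontmatter: dict[str, Any]) -> Optional[list[str]]:
    """Return a cleaned phase list from ``metadata.visibility``, or None."""
    raw_metadata = frontmatter.get("metadata")
    raw: Any = None
    if isinstance(raw_metadata, dict):
        raw = raw_metadata.get("visibility")
    if isinstance(raw, str):
        raw = [raw]  # a bare string is treated as a one-item list
    if isinstance(raw, list):
        # Drop blank entries; an empty result means "no restriction".
        return [str(v).strip() for v in raw if str(v).strip()] or None
    return None


single = normalize_visibility({"metadata": {"visibility": "planning"}})
cleaned = normalize_visibility({"metadata": {"visibility": [" planning ", "", "building"]}})
top_level = normalize_visibility({"visibility": ["planning"]})
```

A top-level `visibility` key is ignored in this sketch, consistent with the diff's comment that the value lives under the open `metadata` map.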
@@ -194,6 +194,12 @@ class QueenPhaseState:
     protocols_prompt: str = ""
     # Community skills catalog (XML) — appended after protocols
     skills_catalog_prompt: str = ""
+    # Optional SkillsManager reference. When set, get_current_prompt()
+    # re-renders the catalog filtered by the current phase so skills
+    # whose frontmatter `visibility` list excludes this phase are
+    # dropped (shaves ~1 KB of DM-irrelevant framework skills on
+    # independent-phase turns).
+    skills_manager: Any = None

     # Provider for the ambient "Connected integrations" block. See
     # docstring on the simpler QueenPhaseState above.
@@ -250,8 +256,14 @@ class QueenPhaseState:
         credentials_block = _render_credentials_block(self.credentials_prompt_provider)
         if credentials_block:
             parts.append(credentials_block)
-        if self.skills_catalog_prompt:
-            parts.append(self.skills_catalog_prompt)
+        catalog_prompt = self.skills_catalog_prompt
+        if self.skills_manager is not None:
+            try:
+                catalog_prompt = self.skills_manager.skills_catalog_prompt_for_phase(self.phase)
+            except Exception:
+                catalog_prompt = self.skills_catalog_prompt
+        if catalog_prompt:
+            parts.append(catalog_prompt)
         if self.protocols_prompt:
             parts.append(self.protocols_prompt)
         if self._cached_global_recall_block: