feat: improve validation

Richard Tang
2026-03-06 16:34:28 -08:00
parent e2558e3f95
commit 1e06e87f4c
2 changed files with 41 additions and 32 deletions
@@ -244,10 +244,6 @@ use them as your starting mental model and let their specifics override your def
If they say "I need to monitor files and alert me," you know this probably involves: \
watch patterns, triggers, notifications, and state tracking.
**The key move**: Take your general knowledge of the domain and merge it with the \
specifics they've given you. The result is a draft understanding that's 60-80% right \
before you've asked a single question. Your questions close the remaining 20-40%.
---
### 1.3: Play Back a Proposed Model (Not a List of Questions)
@@ -353,7 +349,8 @@ database, explain that's not how the framework works.
Act like an experienced AI solution architect. Design the agent architecture:
- Goal: id, name, description, 3-5 success criteria, 2-4 constraints
- Nodes: **3-6 nodes** (warn if <3 or >6). \
- Nodes: **3-6 nodes** (HARD RULE: never fewer than 3, never more than 6). \
2 nodes is ALWAYS wrong: it means you under-decomposed the task. \
Use as many nodes as the use case requires, but don't create nodes without \
tools; merge them into nodes that do real work.
- Edges: on_success for linear, conditional for routing
@@ -362,16 +359,19 @@ tools — merge them into nodes that do real work.
**MERGE nodes when:**
- Node has NO tools (pure LLM reasoning) → merge into predecessor/successor
- Node sets only 1 trivial output → collapse into predecessor
- Multiple consecutive autonomous nodes → combine into one rich node
- A "report" or "summary" node → merge into a processing node and return results to queen
- A "confirm" or "schedule" node that calls no external service → remove
**SEPARATE nodes only when:**
- Fundamentally different tool sets
**SEPARATE nodes when:**
- Fundamentally different tool sets (e.g., search vs. write vs. validate)
- Fan-out parallelism (parallel branches MUST be separate)
- Different failure/retry semantics (e.g., gather can retry, transform cannot)
- Distinct phases of work (e.g., research, transform, validate, deliver)
- A node would need more than ~5 tools → split by responsibility
**Typical patterns (queen manages all user interaction):**
- 3 nodes: `gather → work → review` (review loops back to gather if not satisfied)
- 3 nodes: `gather → work → review`
- 4 nodes: `gather → analyze → transform → review`
- 5 nodes: `gather → research → transform → validate → deliver`
- WRONG: 2 nodes where everything is crammed into one giant node
- WRONG: 7 nodes where half have no tools and just do LLM reasoning
Read reference agents before designing:
@@ -447,18 +447,23 @@ and AgentRuntimeConfig to agent.py manually
Do NOT manually write these files from scratch; always use the tool.
## 7. Verify
## 7. Verify and Load
Call `validate_agent_package("{name}")` after initialization. \
It runs all checks (class validation, runner load, tool validation, \
tests) and returns a consolidated result. If anything fails: read \
the error, fix with edit_file, re-validate. Up to 3x.
It runs structural checks (class validation, graph validation, tool \
validation, tests) and returns a consolidated result. If anything \
fails: read the error, fix with edit_file, re-validate. Up to 3x.
## 6. Present
When validation passes, immediately call \
`load_built_agent("exports/{name}")` to load the agent into the \
session. This switches to STAGING phase and shows the graph in the \
visualizer. Do NOT wait for user input between validation and loading.
## 8. Present
Show the user what you built: agent name, goal summary, graph (same \
ASCII style as Design), files created, validation status. Offer to \
revise or build another.
ASCII style as Design), files created, validation status. The agent \
is already loaded; offer to run it, revise, or build another.
"""
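The "Verify and Load" step above describes a validate, fix, re-validate loop followed by an immediate load. A minimal sketch of that control flow follows. It assumes "Up to 3x" means at most three validation attempts, and it passes the tools in as hypothetical callables (`validate`, `fix`, `load`) that stand in for `validate_agent_package`, an `edit_file`-based fix, and `load_built_agent`; their real signatures and return types are not shown in this diff.

```python
from typing import Callable

MAX_ATTEMPTS = 3  # assumption: "Up to 3x" = at most three validation attempts


def validate_then_load(
    name: str,
    validate: Callable[[str], dict],  # hypothetical stand-in for validate_agent_package
    fix: Callable[[dict], None],      # hypothetical stand-in for an edit_file-based fix
    load: Callable[[str], None],      # hypothetical stand-in for load_built_agent
) -> bool:
    """Validate the package, fixing and re-validating on failure, then load it.

    Returns True if validation passed and the agent was loaded.
    """
    for attempt in range(1, MAX_ATTEMPTS + 1):
        result = validate(name)
        if result.get("passed"):
            # Load immediately; no user input between validation and loading.
            load(f"exports/{name}")
            return True
        if attempt < MAX_ATTEMPTS:
            # Only fix when another validation attempt will follow.
            fix(result)
    return False
```

The function loads only on a passing result, so a package that still fails after the last attempt is never loaded into the session.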
@@ -825,13 +830,11 @@ _queen_behavior = (
)
_queen_phase_7 = """
## 7. Load into Session
## Running the Agent
After building and verifying, load the agent into the current session:
load_built_agent("exports/{name}")
This switches to STAGING phase; the user sees the agent's graph and \
the tab name updates. Then call run_agent_with_input(task) to start it. \
Do NOT tell the user to run `python -m {name} run`; load and run it here.
After validation passes and load_built_agent succeeds (STAGING phase), \
offer to run the agent. Call run_agent_with_input(task) to start it. \
Do NOT tell the user to run `python -m {name} run`; run it here.
"""
_queen_style = """
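The node-design rules in the prompt changes above (3-6 nodes, no tool-less nodes, split nodes that need more than about 5 tools) could also be checked mechanically. The sketch below is illustrative only: `NodeSpec` and `check_design` are hypothetical names, not part of the framework, and the threshold of 5 tools is taken from the "~5 tools" wording.

```python
from dataclasses import dataclass, field


@dataclass
class NodeSpec:
    """Hypothetical, simplified view of a node: a name and its tool names."""
    name: str
    tools: list[str] = field(default_factory=list)


def check_design(nodes: list[NodeSpec]) -> list[str]:
    """Return a list of rule violations; an empty list means the design passes."""
    problems: list[str] = []
    # Hard rule from the prompt: never fewer than 3, never more than 6 nodes.
    if not 3 <= len(nodes) <= 6:
        problems.append(f"expected 3-6 nodes, got {len(nodes)}")
    for node in nodes:
        if not node.tools:
            # Pure LLM-reasoning nodes should be merged into a neighbor.
            problems.append(f"node '{node.name}' has no tools; merge it")
        elif len(node.tools) > 5:
            # Assumed cutoff for "more than ~5 tools".
            problems.append(f"node '{node.name}' has {len(node.tools)} tools; split it")
    return problems
```

For example, a two-node design where one node has no tools would report both the node-count violation and the tool-less node.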
+14 -8
@@ -1215,14 +1215,17 @@ def run_agent_tests(
@mcp.tool()
def validate_agent_package(agent_name: str) -> str:
"""Run all validation checks on a built agent package in one call.
"""Run structural validation checks on a built agent package in one call.
Executes 4 steps and reports all results (does not stop on first failure):
1. Class validation: checks graph structure and entry_points contract
2. Runner load: checks package export contract (same path the UI uses)
2. Graph validation: loads the agent graph without credential checks
3. Tool validation: checks declared tools exist in MCP servers
4. Tests: runs the agent's pytest suite
Note: Credential validation is intentionally skipped here (building phase).
Credentials are validated at run time by run_agent_with_input() preflight.
Args:
agent_name: Agent package name (e.g. 'my_agent'). Must exist in exports/.
@@ -1267,14 +1270,17 @@ def validate_agent_package(agent_name: str) -> str:
except Exception as e:
steps["class_validation"] = {"passed": False, "error": str(e)}
# Step B: Runner load test (subprocess for import isolation)
# Step B: Graph validation (subprocess for import isolation)
# Credentials are checked at run time (run_agent_with_input preflight),
# not at build time.
try:
proc = subprocess.run(
[
"uv", "run", "python", "-c",
f'from framework.runner.runner import AgentRunner; '
f'r = AgentRunner.load("exports/{agent_name}"); '
f'print("AgentRunner.load: OK")',
f'r = AgentRunner.load("exports/{agent_name}", '
f'skip_credential_validation=True); '
f'print("AgentRunner.load (graph-only): OK")',
],
capture_output=True,
text=True,
@@ -1284,14 +1290,14 @@ def validate_agent_package(agent_name: str) -> str:
stdin=subprocess.DEVNULL,
)
passed = proc.returncode == 0
steps["runner_load"] = {
steps["graph_validation"] = {
"passed": passed,
"output": (proc.stdout.strip() or proc.stderr.strip())[:2000],
}
if not passed:
steps["runner_load"]["error"] = proc.stderr.strip()[:2000]
steps["graph_validation"]["error"] = proc.stderr.strip()[:2000]
except Exception as e:
steps["runner_load"] = {"passed": False, "error": str(e)}
steps["graph_validation"] = {"passed": False, "error": str(e)}
# Step C: Tool validation (direct call)
try:
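The pattern visible in these hunks, where each step is wrapped in its own `try`/`except` and recorded in a `steps` dict so that one failure does not stop the others, can be sketched in isolation. This is a simplified illustration, not the actual body of `validate_agent_package`: the `run_all_steps` helper, the `checks` mapping, and the top-level `"valid"` key are assumptions introduced for the example.

```python
import json
from typing import Callable


def run_all_steps(checks: dict[str, Callable[[], str]]) -> str:
    """Run every check, record pass/fail per step, and return a JSON report.

    A check signals failure by raising; its return value is kept as output.
    """
    steps: dict[str, dict] = {}
    for name, check in checks.items():
        try:
            # Truncate output to 2000 characters, mirroring the diff above.
            steps[name] = {"passed": True, "output": check()[:2000]}
        except Exception as e:
            # Record the failure and continue with the remaining steps.
            steps[name] = {"passed": False, "error": str(e)}
    # Assumed summary field: overall result is True only if every step passed.
    valid = all(step["passed"] for step in steps.values())
    return json.dumps({"valid": valid, "steps": steps}, indent=2)
```

Because every step gets an entry even after an earlier one fails, a caller can fix several problems in one pass before re-validating.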