feat: improve validation
@@ -244,10 +244,6 @@ use them as your starting mental model and let their specifics override your def
 If they say "I need to monitor files and alert me," you know this probably involves: \
 watch patterns, triggers, notifications, and state tracking.
 
-**The key move**: Take your general knowledge of the domain and merge it with the \
-specifics they've given you. The result is a draft understanding that's 60-80% right \
-before you've asked a single question. Your questions close the remaining 20-40%.
-
 ---
 
 ### 1.3: Play Back a Proposed Model (Not a List of Questions)
@@ -353,7 +349,8 @@ database, explain that's not how the framework works.
 
 Act like an experienced AI solution architect Design the agent architecture:
 - Goal: id, name, description, 3-5 success criteria, 2-4 constraints
-- Nodes: **3-6 nodes** (warn if <3 or >6). \
+- Nodes: **3-6 nodes** (HARD RULE: never fewer than 3, never more than 6). \
+2 nodes is ALWAYS wrong — it means you under-decomposed the task. \
 Use as many nodes as the use case requires, but don't create nodes without \
 tools — merge them into nodes that do real work.
 - Edges: on_success for linear, conditional for routing
@@ -362,16 +359,19 @@ tools — merge them into nodes that do real work.
 **MERGE nodes when:**
 - Node has NO tools (pure LLM reasoning) → merge into predecessor/successor
 - Node sets only 1 trivial output → collapse into predecessor
-- Multiple consecutive autonomous nodes → combine into one rich node
-- A "report" or "summary" node → merge into a processing node and return results to queen
-- A "confirm" or "schedule" node that calls no external service → remove
 
-**SEPARATE nodes only when:**
-- Fundamentally different tool sets
+**SEPARATE nodes when:**
+- Fundamentally different tool sets (e.g., search vs. write vs. validate)
 - Fan-out parallelism (parallel branches MUST be separate)
+- Different failure/retry semantics (e.g., gather can retry, transform cannot)
+- Distinct phases of work (e.g., research, transform, validate, deliver)
+- A node would need more than ~5 tools — split by responsibility
 
 **Typical patterns (queen manages all user interaction):**
-- 3 nodes: `gather → work → review` (review loops back to gather if not satisfied)
+- 3 nodes: `gather → work → review`
+- 4 nodes: `gather → analyze → transform → review`
+- 5 nodes: `gather → research → transform → validate → deliver`
+- WRONG: 2 nodes where everything is crammed into one giant node
 - WRONG: 7 nodes where half have no tools and just do LLM reasoning
 
 Read reference agents before designing:
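
Reviewer note: the node-count and merge/split rules in this hunk are stated only as prompt text. A minimal sketch of how they could be checked mechanically is shown here. `Node`, `check_graph_shape`, and the tool names in the usage note are hypothetical illustrations, not part of this commit or of the framework's API, and the 5-tool threshold is taken from the "~5 tools" wording above.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a graph node; not the framework's real class."""
    id: str
    tools: list[str] = field(default_factory=list)


def check_graph_shape(nodes: list[Node], max_tools: int = 5) -> list[str]:
    """Return rule violations for a proposed node list (empty list means OK)."""
    problems = []
    # Hard rule from the prompt: never fewer than 3 nodes, never more than 6.
    if not 3 <= len(nodes) <= 6:
        problems.append(f"expected 3-6 nodes, got {len(nodes)}")
    for node in nodes:
        if not node.tools:
            # Tool-less nodes (pure LLM reasoning) should be merged.
            problems.append(f"{node.id}: no tools, merge into a neighbor")
        elif len(node.tools) > max_tools:
            # Overloaded nodes should be split by responsibility.
            problems.append(
                f"{node.id}: {len(node.tools)} tools, split by responsibility"
            )
    return problems
```

For example, a two-node design with one tool-less node would yield two violations: one for the node count and one for the node that should be merged.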
@@ -447,18 +447,23 @@ and AgentRuntimeConfig to agent.py manually
 
 Do NOT manually write these files from scratch — always use the tool.
 
-## 7. Verify
+## 7. Verify and Load
 
 Call `validate_agent_package("{name}")` after initialization. \
-It runs all checks (class validation, runner load, tool validation, \
-tests) and returns a consolidated result. If anything fails: read \
-the error, fix with edit_file, re-validate. Up to 3x.
+It runs structural checks (class validation, graph validation, tool \
+validation, tests) and returns a consolidated result. If anything \
+fails: read the error, fix with edit_file, re-validate. Up to 3x.
 
-## 6. Present
+When validation passes, immediately call \
+`load_built_agent("exports/{name}")` to load the agent into the \
+session. This switches to STAGING phase and shows the graph in the \
+visualizer. Do NOT wait for user input between validation and loading.
+
+## 8. Present
 
 Show the user what you built: agent name, goal summary, graph (same \
-ASCII style as Design), files created, validation status. Offer to \
-revise or build another.
+ASCII style as Design), files created, validation status. The agent \
+is already loaded — offer to run it, revise, or build another.
 """
 
 
@@ -825,13 +830,11 @@ _queen_behavior = (
 )
 
 _queen_phase_7 = """
-## 7. Load into Session
+## Running the Agent
 
-After building and verifying, load the agent into the current session:
-load_built_agent("exports/{name}")
-This switches to STAGING phase — the user sees the agent's graph and \
-the tab name updates. Then call run_agent_with_input(task) to start it. \
-Do NOT tell the user to run `python -m {name} run` — load and run it here.
+After validation passes and load_built_agent succeeds (STAGING phase), \
+offer to run the agent. Call run_agent_with_input(task) to start it. \
+Do NOT tell the user to run `python -m {name} run` — run it here.
 """
 
 _queen_style = """
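
Reviewer note: the prompt changes above describe a control flow (validate, fix and re-validate up to 3x, then load immediately without waiting for the user). The sketch below illustrates that flow under stated assumptions. `build_flow` and its injected `validate`, `fix`, and `load` callables are hypothetical stand-ins for `validate_agent_package`, `edit_file`, and `load_built_agent`; the sketch also assumes "Up to 3x" means three validation attempts in total, which the prompt text does not spell out.

```python
from typing import Callable


def build_flow(
    name: str,
    validate: Callable[[str], bool],
    fix: Callable[[str], None],
    load: Callable[[str], None],
    max_attempts: int = 3,
) -> bool:
    """Validate, retrying after fixes, then load without waiting for the user."""
    for attempt in range(1, max_attempts + 1):
        if validate(name):
            # Load right after a passing validation; no user prompt in between.
            load(f"exports/{name}")
            return True
        if attempt < max_attempts:
            # Only attempt a fix if another validation attempt remains.
            fix(name)
    return False
```

Passing the steps in as callables keeps the sketch independent of the real tool signatures, which are not shown in full in this diff.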
@@ -1215,14 +1215,17 @@ def run_agent_tests(
 
 @mcp.tool()
 def validate_agent_package(agent_name: str) -> str:
-    """Run all validation checks on a built agent package in one call.
+    """Run structural validation checks on a built agent package in one call.
 
     Executes 4 steps and reports all results (does not stop on first failure):
     1. Class validation — checks graph structure and entry_points contract
-    2. Runner load — checks package export contract (same path the UI uses)
+    2. Graph validation — loads the agent graph without credential checks
     3. Tool validation — checks declared tools exist in MCP servers
     4. Tests — runs the agent's pytest suite
 
+    Note: Credential validation is intentionally skipped here (building phase).
+    Credentials are validated at run time by run_agent_with_input() preflight.
+
     Args:
         agent_name: Agent package name (e.g. 'my_agent'). Must exist in exports/.
 
@@ -1267,14 +1270,17 @@ def validate_agent_package(agent_name: str) -> str:
     except Exception as e:
         steps["class_validation"] = {"passed": False, "error": str(e)}
 
-    # Step B: Runner load test (subprocess for import isolation)
+    # Step B: Graph validation (subprocess for import isolation)
+    # Credentials are checked at run time (run_agent_with_input preflight),
+    # not at build time.
     try:
         proc = subprocess.run(
             [
                 "uv", "run", "python", "-c",
                 f'from framework.runner.runner import AgentRunner; '
-                f'r = AgentRunner.load("exports/{agent_name}"); '
-                f'print("AgentRunner.load: OK")',
+                f'r = AgentRunner.load("exports/{agent_name}", '
+                f'skip_credential_validation=True); '
+                f'print("AgentRunner.load (graph-only): OK")',
             ],
             capture_output=True,
             text=True,
@@ -1284,14 +1290,14 @@ def validate_agent_package(agent_name: str) -> str:
             stdin=subprocess.DEVNULL,
         )
         passed = proc.returncode == 0
-        steps["runner_load"] = {
+        steps["graph_validation"] = {
             "passed": passed,
             "output": (proc.stdout.strip() or proc.stderr.strip())[:2000],
         }
         if not passed:
-            steps["runner_load"]["error"] = proc.stderr.strip()[:2000]
+            steps["graph_validation"]["error"] = proc.stderr.strip()[:2000]
     except Exception as e:
-        steps["runner_load"] = {"passed": False, "error": str(e)}
+        steps["graph_validation"] = {"passed": False, "error": str(e)}
 
     # Step C: Tool validation (direct call)
     try:
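
Reviewer note: the docstring says the tool "reports all results (does not stop on first failure)", and the hunks above show each step wrapped in its own `try`/`except` with output capped at 2000 characters. A minimal, self-contained sketch of that collect-everything pattern follows. `run_all_steps` and the report layout (`valid` plus a `steps` map) are assumptions made for illustration; the actual return format of `validate_agent_package` is not visible in this diff.

```python
import json
from typing import Callable


def run_all_steps(steps: dict[str, Callable[[], str]], limit: int = 2000) -> str:
    """Run every step, record pass/fail per step, and return a JSON report."""
    results = {}
    for name, step in steps.items():
        try:
            # Truncate output so one noisy step cannot flood the report.
            results[name] = {"passed": True, "output": step()[:limit]}
        except Exception as e:
            # A failing step is recorded, and the remaining steps still run.
            results[name] = {"passed": False, "error": str(e)[:limit]}
    valid = all(r["passed"] for r in results.values())
    return json.dumps({"valid": valid, "steps": results})
```

Catching the exception per step, rather than around the whole loop, is what lets a single call surface every problem at once instead of one failure per re-validation.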