feat: automated test agent skill
@@ -138,10 +138,10 @@ Two execution paths, use the right one for your situation.
 
 Run the agent via CLI. This creates sessions with checkpoints at `~/.hive/agents/{agent_name}/sessions/`:
 
 ```bash
-PYTHONPATH=core:exports uv run python -m {agent_name} --tui
+uv run hive run exports/{agent_name} --input '{"query": "test topic"}'
 ```
 
-The TUI lets you interact with client-facing nodes and see real-time execution. Sessions and checkpoints are saved automatically.
+Sessions and checkpoints are saved automatically. For agents with client-facing nodes that require user interaction, the user must launch the TUI manually in a separate terminal (Claude Code cannot interact with TUI apps).
 
 #### Automated regression (for CI or final verification)
 
@@ -334,7 +334,7 @@ Resume when ALL of these are true:
 
 ```bash
 # Resume from the last clean checkpoint before the failing node
-PYTHONPATH=core:exports uv run python -m {agent_name} --tui \
+uv run hive run exports/{agent_name} \
   --resume-session session_20260209_143022_abc12345 \
  --checkpoint cp_node_complete_research_143030
 ```
@@ -350,7 +350,7 @@ Re-run when ANY of these are true:
 - You changed the graph structure (added/removed nodes/edges)
 
 ```bash
-PYTHONPATH=core:exports uv run python -m {agent_name} --tui
+uv run hive run exports/{agent_name} --input '{"query": "test topic"}'
 ```
 
 #### Inspecting a checkpoint before resuming
@@ -696,7 +696,7 @@ run_tests(goal_id, agent_path, test_types='["success"]')
 
 ```bash
 # Iterative debugging with checkpoints (via CLI)
-PYTHONPATH=core:exports uv run python -m {agent_name} --tui
+uv run hive run exports/{agent_name} --input '{"query": "test"}'
 ```
 
 ### Phase 3: Analysis
@@ -739,8 +739,8 @@ get_agent_checkpoint(agent_work_dir, session_id, checkpoint_id)
 ```
 
 ```bash
-# Resume from checkpoint via CLI
-PYTHONPATH=core:exports uv run python -m {agent_name} --tui \
+# Resume from checkpoint via CLI (headless)
+uv run hive run exports/{agent_name} \
   --resume-session {session_id} --checkpoint {checkpoint_id}
 ```
 
@@ -757,8 +757,9 @@ PYTHONPATH=core:exports uv run python -m {agent_name} --tui \
 | Write 30+ tests | Write 8-15 focused tests |
 | Skip credential check | Use `/hive-credentials` before testing |
 | Confuse `exports/` with `~/.hive/agents/` | Code in `exports/`, runtime data in `~/.hive/` |
-| Use `run_tests` for iterative debugging | Use CLI with checkpoints for iterative debugging |
-| Use CLI for final regression | Use `run_tests` for automated regression |
+| Use `run_tests` for iterative debugging | Use headless CLI with checkpoints for iterative debugging |
+| Use headless CLI for final regression | Use `run_tests` for automated regression |
+| Use `--tui` from Claude Code | Use headless `run` command — TUI hangs in non-interactive shells |
 | Run tests without reading goal first | Always understand the goal before writing tests |
 | Skip Phase 3 analysis and guess | Use session + log tools to identify root cause |
 
@@ -866,7 +867,7 @@ list_agent_checkpoints(
 # → cp_node_complete_intake_150005
 
 # Resume from after intake, re-run research with fixed prompt
-PYTHONPATH=core:exports uv run python -m deep_research_agent --tui \
+uv run hive run exports/deep_research_agent \
   --resume-session session_20260209_150000_abc12345 \
   --checkpoint cp_node_complete_intake_150005
 ```
@@ -874,7 +875,7 @@ PYTHONPATH=core:exports uv run python -m deep_research_agent --tui \
 Or for this simple case (intake is fast), just re-run:
 
 ```bash
-PYTHONPATH=core:exports uv run python -m deep_research_agent --tui
+uv run hive run exports/deep_research_agent --input '{"topic": "test"}'
 ```
 
 ### Phase 6: Final verification
 
@@ -259,7 +259,7 @@ The fix is to the `report` node (the last node). To demonstrate checkpoint recovery:
 
 ```bash
 # Run via CLI to get checkpoints
-PYTHONPATH=core:exports uv run python -m deep_research_agent --tui
+uv run hive run exports/deep_research_agent --input '{"topic": "climate change effects"}'
 
 # After it runs, find the clean checkpoint before report
 list_agent_checkpoints(
@@ -270,7 +270,7 @@ list_agent_checkpoints(
 # → cp_node_complete_review_152100 (after review, before report)
 
 # Resume — skips intake, research, review entirely
-PYTHONPATH=core:exports uv run python -m deep_research_agent --tui \
+uv run hive run exports/deep_research_agent \
   --resume-session session_20260209_152000_ghi34567 \
   --checkpoint cp_node_complete_review_152100
 ```
 
@@ -332,6 +332,60 @@ def register_commands(subparsers: argparse._SubParsersAction) -> None:
     resume_parser.set_defaults(func=cmd_resume)
 
 
+def _load_resume_state(
+    agent_path: str, session_id: str, checkpoint_id: str | None = None
+) -> dict | None:
+    """Load session or checkpoint state for headless resume.
+
+    Args:
+        agent_path: Path to the agent folder (e.g., exports/my_agent)
+        session_id: Session ID to resume from
+        checkpoint_id: Optional checkpoint ID within the session
+
+    Returns:
+        session_state dict for executor, or None if not found
+    """
+    agent_name = Path(agent_path).name
+    agent_work_dir = Path.home() / ".hive" / "agents" / agent_name
+    session_dir = agent_work_dir / "sessions" / session_id
+
+    if not session_dir.exists():
+        return None
+
+    if checkpoint_id:
+        # Checkpoint-based resume: load checkpoint and extract state
+        cp_path = session_dir / "checkpoints" / f"{checkpoint_id}.json"
+        if not cp_path.exists():
+            return None
+        try:
+            cp_data = json.loads(cp_path.read_text())
+        except (json.JSONDecodeError, OSError):
+            return None
+        return {
+            "memory": cp_data.get("shared_memory", {}),
+            "paused_at": cp_data.get("next_node") or cp_data.get("current_node"),
+            "execution_path": cp_data.get("execution_path", []),
+            "node_visit_counts": {},
+        }
+    else:
+        # Session state resume: load state.json
+        state_path = session_dir / "state.json"
+        if not state_path.exists():
+            return None
+        try:
+            state_data = json.loads(state_path.read_text())
+        except (json.JSONDecodeError, OSError):
+            return None
+        progress = state_data.get("progress", {})
+        paused_at = progress.get("paused_at") or progress.get("resume_from")
+        return {
+            "memory": state_data.get("memory", {}),
+            "paused_at": paused_at,
+            "execution_path": progress.get("path", []),
+            "node_visit_counts": progress.get("node_visit_counts", {}),
+        }
+
+
 def cmd_run(args: argparse.Namespace) -> int:
     """Run an exported agent."""
     import logging
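A minimal round-trip sketch of the checkpoint branch above. The directory layout and field names follow the diff; the temp directory stands in for `~/.hive/agents/{agent_name}`, and the checkpoint payload is illustrative:

```python
import json
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Fake session layout: sessions/{session_id}/checkpoints/{checkpoint_id}.json
    session_dir = Path(tmp) / "sessions" / "session_20260209_143022_abc12345"
    cp_dir = session_dir / "checkpoints"
    cp_dir.mkdir(parents=True)
    cp_path = cp_dir / "cp_node_complete_research_143030.json"
    cp_path.write_text(json.dumps({
        "shared_memory": {"query": "test topic"},
        "current_node": "research",
        "next_node": "review",
        "execution_path": ["intake", "research"],
    }))

    # Same extraction as the checkpoint branch: prefer next_node over current_node,
    # so execution resumes at the node AFTER the completed one.
    cp_data = json.loads(cp_path.read_text())
    state = {
        "memory": cp_data.get("shared_memory", {}),
        "paused_at": cp_data.get("next_node") or cp_data.get("current_node"),
        "execution_path": cp_data.get("execution_path", []),
        "node_visit_counts": {},
    }

print(state["paused_at"])  # → review
```

Falling back to `current_node` covers checkpoints written at the graph's last node, where no `next_node` exists.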
@@ -421,6 +475,27 @@ def cmd_run(args: argparse.Namespace) -> int:
         print(f"Error: {e}", file=sys.stderr)
         return 1
 
+    # Load session/checkpoint state for resume (headless mode)
+    session_state = None
+    resume_session = getattr(args, "resume_session", None)
+    checkpoint = getattr(args, "checkpoint", None)
+    if resume_session:
+        session_state = _load_resume_state(args.agent_path, resume_session, checkpoint)
+        if session_state is None:
+            print(
+                f"Error: Could not load session state for {resume_session}",
+                file=sys.stderr,
+            )
+            return 1
+        if not args.quiet:
+            resume_node = session_state.get("paused_at", "unknown")
+            if checkpoint:
+                print(f"Resuming from checkpoint: {checkpoint}")
+            else:
+                print(f"Resuming session: {resume_session}")
+            print(f"Resume point: {resume_node}")
+            print()
+
     # Auto-inject user_id if the agent expects it but it's not provided
     entry_input_keys = runner.graph.nodes[0].input_keys if runner.graph.nodes else []
     if "user_id" in entry_input_keys and context.get("user_id") is None:
@@ -440,7 +515,7 @@ def cmd_run(args: argparse.Namespace) -> int:
         print("=" * 60)
         print()
 
-    result = asyncio.run(runner.run(context))
+    result = asyncio.run(runner.run(context, session_state=session_state))
 
     # Format output
     output = {
 
@@ -0,0 +1,42 @@
+# Why Conditional Edges Need Priority (Function Nodes)
+
+## The problem
+
+Function nodes return everything they computed. They don't pick one output key — they return all of them.
+
+```python
+def score_lead(inputs):
+    score = compute_score(inputs["profile"])
+    return {
+        "score": score,
+        "is_high_value": score > 80,
+        "needs_enrichment": score > 50 and not inputs["profile"].get("company"),
+    }
+```
+
+Lead comes in: score 92, no company on file. Output: `{"score": 92, "is_high_value": True, "needs_enrichment": True}`.
+
+Two conditional edges leaving this node:
+
+```
+Edge A: needs_enrichment == True → enrichment node
+Edge B: is_high_value == True    → outreach node
+```
+
+Both are true. Without priority, the graph either fans out to both (wrong — you'd email someone while still enriching their data) or picks one randomly (wrong — non-deterministic).
+
+## Priority fixes it
+
+```
+Edge A: needs_enrichment == True   priority=2  (higher = checked first)
+Edge B: is_high_value == True      priority=1
+Edge C: is_high_value == False     priority=0
+```
+
+Executor keeps only the highest-priority matching group. A wins. Lead gets enriched first, loops back, gets re-scored — now `needs_enrichment` is false, B wins, outreach happens.
+
+## Why event loop nodes don't need this
+
+The LLM understands "if/else." You tell it in the prompt: "if needs enrichment, set `needs_enrichment`. Otherwise if high value, set `approved`." It picks one. Only one conditional edge matches.
+
+A function just returns a dict. It doesn't do "otherwise." Priority is the "otherwise" for function nodes.
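The selection rule the new file describes can be sketched in a few lines of Python. The `pick_edge` helper and the edge dicts are illustrative, not the framework's actual API: conditions are modeled as a key/value match on the node's outputs, and ties never arise because only the single highest-priority match survives:

```python
def pick_edge(edges, outputs):
    """Return the highest-priority edge whose condition matches the outputs."""
    matching = [e for e in edges if outputs.get(e["key"]) == e["value"]]
    if not matching:
        return None
    # Keep only the highest-priority matching edge (higher = checked first).
    return max(matching, key=lambda e: e["priority"])

edges = [
    {"key": "needs_enrichment", "value": True,  "priority": 2, "target": "enrichment"},
    {"key": "is_high_value",    "value": True,  "priority": 1, "target": "outreach"},
    {"key": "is_high_value",    "value": False, "priority": 0, "target": "archive"},
]

# Score 92, no company on file: both flags true, enrichment wins on priority.
outputs = {"score": 92, "is_high_value": True, "needs_enrichment": True}
print(pick_edge(edges, outputs)["target"])  # → enrichment

# After the enrichment loop re-scores the lead, needs_enrichment is false.
outputs = {"score": 92, "is_high_value": True, "needs_enrichment": False}
print(pick_edge(edges, outputs)["target"])  # → outreach
```

This reproduces the walkthrough above deterministically: the same outputs always select the same single edge, which is exactly what priority buys over fan-out or random choice.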