feat: automated test agent skill

Timothy
2026-02-09 12:39:20 -08:00
parent 9d11f834b8
commit faf534511b
4 changed files with 132 additions and 14 deletions
+12 -11
@@ -138,10 +138,10 @@ Two execution paths, use the right one for your situation.
Run the agent via CLI. This creates sessions with checkpoints at `~/.hive/agents/{agent_name}/sessions/`:
```bash
-PYTHONPATH=core:exports uv run python -m {agent_name} --tui
+uv run hive run exports/{agent_name} --input '{"query": "test topic"}'
```
-The TUI lets you interact with client-facing nodes and see real-time execution. Sessions and checkpoints are saved automatically.
+Sessions and checkpoints are saved automatically. For agents with client-facing nodes that require user interaction, the user must launch the TUI manually in a separate terminal (Claude Code cannot interact with TUI apps).
#### Automated regression (for CI or final verification)
@@ -334,7 +334,7 @@ Resume when ALL of these are true:
```bash
# Resume from the last clean checkpoint before the failing node
-PYTHONPATH=core:exports uv run python -m {agent_name} --tui \
+uv run hive run exports/{agent_name} \
--resume-session session_20260209_143022_abc12345 \
--checkpoint cp_node_complete_research_143030
```
@@ -350,7 +350,7 @@ Re-run when ANY of these are true:
- You changed the graph structure (added/removed nodes/edges)
```bash
-PYTHONPATH=core:exports uv run python -m {agent_name} --tui
+uv run hive run exports/{agent_name} --input '{"query": "test topic"}'
```
#### Inspecting a checkpoint before resuming
@@ -696,7 +696,7 @@ run_tests(goal_id, agent_path, test_types='["success"]')
```bash
# Iterative debugging with checkpoints (via CLI)
-PYTHONPATH=core:exports uv run python -m {agent_name} --tui
+uv run hive run exports/{agent_name} --input '{"query": "test"}'
```
### Phase 3: Analysis
@@ -739,8 +739,8 @@ get_agent_checkpoint(agent_work_dir, session_id, checkpoint_id)
```
```bash
-# Resume from checkpoint via CLI
-PYTHONPATH=core:exports uv run python -m {agent_name} --tui \
+# Resume from checkpoint via CLI (headless)
+uv run hive run exports/{agent_name} \
--resume-session {session_id} --checkpoint {checkpoint_id}
```
@@ -757,8 +757,9 @@ PYTHONPATH=core:exports uv run python -m {agent_name} --tui \
| Write 30+ tests | Write 8-15 focused tests |
| Skip credential check | Use `/hive-credentials` before testing |
| Confuse `exports/` with `~/.hive/agents/` | Code in `exports/`, runtime data in `~/.hive/` |
-| Use `run_tests` for iterative debugging | Use CLI with checkpoints for iterative debugging |
-| Use CLI for final regression | Use `run_tests` for automated regression |
+| Use `run_tests` for iterative debugging | Use headless CLI with checkpoints for iterative debugging |
+| Use headless CLI for final regression | Use `run_tests` for automated regression |
+| Use `--tui` from Claude Code | Use headless `run` command — TUI hangs in non-interactive shells |
| Run tests without reading goal first | Always understand the goal before writing tests |
| Skip Phase 3 analysis and guess | Use session + log tools to identify root cause |
@@ -866,7 +867,7 @@ list_agent_checkpoints(
# → cp_node_complete_intake_150005
# Resume from after intake, re-run research with fixed prompt
-PYTHONPATH=core:exports uv run python -m deep_research_agent --tui \
+uv run hive run exports/deep_research_agent \
--resume-session session_20260209_150000_abc12345 \
--checkpoint cp_node_complete_intake_150005
```
@@ -874,7 +875,7 @@ PYTHONPATH=core:exports uv run python -m deep_research_agent --tui \
Or for this simple case (intake is fast), just re-run:
```bash
-PYTHONPATH=core:exports uv run python -m deep_research_agent --tui
+uv run hive run exports/deep_research_agent --input '{"topic": "test"}'
```
### Phase 6: Final verification
@@ -259,7 +259,7 @@ The fix is to the `report` node (the last node). To demonstrate checkpoint recov
```bash
# Run via CLI to get checkpoints
-PYTHONPATH=core:exports uv run python -m deep_research_agent --tui
+uv run hive run exports/deep_research_agent --input '{"topic": "climate change effects"}'
# After it runs, find the clean checkpoint before report
list_agent_checkpoints(
@@ -270,7 +270,7 @@ list_agent_checkpoints(
# → cp_node_complete_review_152100 (after review, before report)
# Resume — skips intake, research, review entirely
-PYTHONPATH=core:exports uv run python -m deep_research_agent --tui \
+uv run hive run exports/deep_research_agent \
--resume-session session_20260209_152000_ghi34567 \
--checkpoint cp_node_complete_review_152100
```
+76 -1
@@ -332,6 +332,60 @@ def register_commands(subparsers: argparse._SubParsersAction) -> None:
resume_parser.set_defaults(func=cmd_resume)
def _load_resume_state(
agent_path: str, session_id: str, checkpoint_id: str | None = None
) -> dict | None:
"""Load session or checkpoint state for headless resume.
Args:
agent_path: Path to the agent folder (e.g., exports/my_agent)
session_id: Session ID to resume from
checkpoint_id: Optional checkpoint ID within the session
Returns:
session_state dict for executor, or None if not found
"""
agent_name = Path(agent_path).name
agent_work_dir = Path.home() / ".hive" / "agents" / agent_name
session_dir = agent_work_dir / "sessions" / session_id
if not session_dir.exists():
return None
if checkpoint_id:
# Checkpoint-based resume: load checkpoint and extract state
cp_path = session_dir / "checkpoints" / f"{checkpoint_id}.json"
if not cp_path.exists():
return None
try:
cp_data = json.loads(cp_path.read_text())
except (json.JSONDecodeError, OSError):
return None
return {
"memory": cp_data.get("shared_memory", {}),
"paused_at": cp_data.get("next_node") or cp_data.get("current_node"),
"execution_path": cp_data.get("execution_path", []),
"node_visit_counts": {},
}
else:
# Session state resume: load state.json
state_path = session_dir / "state.json"
if not state_path.exists():
return None
try:
state_data = json.loads(state_path.read_text())
except (json.JSONDecodeError, OSError):
return None
progress = state_data.get("progress", {})
paused_at = progress.get("paused_at") or progress.get("resume_from")
return {
"memory": state_data.get("memory", {}),
"paused_at": paused_at,
"execution_path": progress.get("path", []),
"node_visit_counts": progress.get("node_visit_counts", {}),
}
def cmd_run(args: argparse.Namespace) -> int:
"""Run an exported agent."""
import logging
@@ -421,6 +475,27 @@ def cmd_run(args: argparse.Namespace) -> int:
print(f"Error: {e}", file=sys.stderr)
return 1
# Load session/checkpoint state for resume (headless mode)
session_state = None
resume_session = getattr(args, "resume_session", None)
checkpoint = getattr(args, "checkpoint", None)
if resume_session:
session_state = _load_resume_state(args.agent_path, resume_session, checkpoint)
if session_state is None:
print(
f"Error: Could not load session state for {resume_session}",
file=sys.stderr,
)
return 1
if not args.quiet:
# paused_at may be stored as None; fall back to "unknown" in that case too
resume_node = session_state.get("paused_at") or "unknown"
if checkpoint:
print(f"Resuming from checkpoint: {checkpoint}")
else:
print(f"Resuming session: {resume_session}")
print(f"Resume point: {resume_node}")
print()
# Auto-inject user_id if the agent expects it but it's not provided
entry_input_keys = runner.graph.nodes[0].input_keys if runner.graph.nodes else []
if "user_id" in entry_input_keys and context.get("user_id") is None:
@@ -440,7 +515,7 @@ def cmd_run(args: argparse.Namespace) -> int:
print("=" * 60)
print()
-result = asyncio.run(runner.run(context))
+result = asyncio.run(runner.run(context, session_state=session_state))
# Format output
output = {
+42
@@ -0,0 +1,42 @@
# Why Conditional Edges Need Priority (Function Nodes)
## The problem
Function nodes return everything they computed. They don't pick one output key — they return all of them.
```python
def score_lead(inputs):
score = compute_score(inputs["profile"])
return {
"score": score,
"is_high_value": score > 80,
"needs_enrichment": score > 50 and not inputs["profile"].get("company"),
}
```
Lead comes in: score 92, no company on file. Output: `{"score": 92, "is_high_value": True, "needs_enrichment": True}`.
Two conditional edges leaving this node:
```
Edge A: needs_enrichment == True → enrichment node
Edge B: is_high_value == True → outreach node
```
Both are true. Without priority, the graph either fans out to both (wrong — you'd email someone while still enriching their data) or picks one randomly (wrong — non-deterministic).
## Priority fixes it
```
Edge A: needs_enrichment == True priority=2 (higher = checked first)
Edge B: is_high_value == True priority=1
Edge C: is_high_value == False priority=0
```
Executor keeps only the highest-priority matching group. A wins. Lead gets enriched first, loops back, gets re-scored — now `needs_enrichment` is false, B wins, outreach happens.
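As a sketch, the selection rule amounts to "filter edges whose condition matches, then take the max by priority." This is an illustrative standalone function with hypothetical names, not the actual executor API:

```python
# Illustrative sketch of priority-based edge selection (hypothetical names,
# not the real executor). Each edge is (output_key, expected_value, priority).
def select_edge(output: dict, edges: list[tuple[str, object, int]]):
    matching = [edge for edge in edges if output.get(edge[0]) == edge[1]]
    # Keep only the highest-priority matching edge; None if nothing matches.
    return max(matching, key=lambda edge: edge[2]) if matching else None

edges = [
    ("needs_enrichment", True, 2),   # Edge A: checked first
    ("is_high_value", True, 1),      # Edge B
    ("is_high_value", False, 0),     # Edge C
]

# First pass: both A and B match; A wins on priority, so enrichment runs.
first = select_edge(
    {"score": 92, "is_high_value": True, "needs_enrichment": True}, edges
)
# After enrichment and re-scoring, only B matches, so outreach runs.
second = select_edge(
    {"score": 92, "is_high_value": True, "needs_enrichment": False}, edges
)
```

The two calls trace the loop described above: the first pass routes to enrichment, and once re-scoring flips `needs_enrichment` to false, the same rule falls through to Edge B.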
## Why event loop nodes don't need this
The LLM understands "if/else." You tell it in the prompt: "if needs enrichment, set `needs_enrichment`. Otherwise if high value, set `approved`." It picks one. Only one conditional edge matches.
A function just returns a dict. It doesn't do "otherwise." Priority is the "otherwise" for function nodes.