feat: worker agent memory
@@ -406,15 +406,15 @@ flowchart TB
### How It Works
-**1. Outputs are persisted via the accumulator.** When the LLM calls `set_output(key, value)`, the `OutputAccumulator` stores the value in memory and writes through to the `ConversationStore` cursor (for crash recovery).
+**1. Outputs are persisted via the accumulator.** When the LLM calls `set_output(key, value)`, the `OutputAccumulator` stores the value in the data buffer and writes through to the `ConversationStore` cursor (for crash recovery).
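The write-through behavior in point 1 can be sketched as follows. Only `set_output`, `OutputAccumulator`, and `ConversationStore` come from the text; the cursor interface and all internals here are assumptions, not Hive's actual API:

```python
# Hypothetical sketch of accumulator write-through; the cursor's
# write() method and the attribute names are invented for illustration.
class OutputAccumulator:
    def __init__(self, cursor):
        self.values = {}      # fast working copy of outputs
        self.cursor = cursor  # ConversationStore cursor (durable)

    def set_output(self, key, value):
        self.values[key] = value       # keep the in-process copy
        self.cursor.write(key, value)  # persist immediately for crash recovery


class FakeCursor:
    """Stand-in for a ConversationStore cursor, for demonstration only."""
    def __init__(self):
        self.persisted = {}

    def write(self, key, value):
        self.persisted[key] = value


acc = OutputAccumulator(FakeCursor())
acc.set_output("draft", "Hello Acme!")
```

Because every `set_output` lands in durable storage before the node continues, a crash mid-node loses no accepted output.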
**2. Judge feedback becomes conversation memory.** When the judge issues a RETRY verdict with feedback, that feedback is injected as a `[Judge feedback]: ...` user message into the conversation. On the next LLM turn, the agent sees its prior attempt, the judge's critique, and can adjust. This is the core reflexion mechanism — in-context learning without model retraining.
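The feedback injection in point 2 amounts to a single message append. The `[Judge feedback]: ...` prefix is quoted from the text; the message shape and helper name are assumptions:

```python
def inject_judge_feedback(messages, feedback):
    # Hypothetical helper: the judge's critique enters the conversation
    # as an ordinary user message, visible on the next LLM turn.
    messages.append({"role": "user", "content": f"[Judge feedback]: {feedback}"})
    return messages


history = [{"role": "assistant", "content": "Draft sent for review."}]
inject_judge_feedback(history, "The draft misses the pricing question.")
```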
**3. The three-layer prompt onion refreshes each turn.** Layer 1 (identity) is static. Layer 2 (narrative) is rebuilt deterministically from `DataBuffer.read_all()` and the execution path — listing completed phases and current state values. Layer 3 (focus) is the current node's `system_prompt`. At phase transitions in continuous mode, Layer 3 swaps while Layers 1-2 and the full conversation history carry forward.
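One way to picture the per-turn assembly in point 3 — `DataBuffer.read_all()` comes from the text, but the function shape, the stub buffer, and the formatting are illustrative only:

```python
class Buffer:
    """Minimal stand-in for DataBuffer, for demonstration only."""
    def __init__(self, data):
        self._data = data

    def read_all(self):
        return dict(self._data)


def build_system_prompt(identity, data_buffer, execution_path, node_prompt):
    # Layer 1: static identity. Layer 2: narrative rebuilt deterministically
    # from the buffer and the execution path. Layer 3: the node's focus prompt.
    narrative = "Completed phases: " + " -> ".join(execution_path)
    for key, value in data_buffer.read_all().items():
        narrative += f"\n- {key}: {value}"
    return "\n\n".join([identity, narrative, node_prompt])


prompt = build_system_prompt(
    "You are an outreach agent.",
    Buffer({"prospect": "Acme"}),
    ["research"],
    "Draft the message now.",
)
```

Because Layer 2 is rebuilt from state rather than accumulated, it never drifts from what the buffer actually contains.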
-**4. Phase transitions inject structured reflection.** When execution moves between nodes, a transition marker is inserted into the conversation containing: what phase completed, all outputs in memory, available data files, available tools, and an explicit reflection prompt: *"Before proceeding, briefly reflect: what went well in the previous phase? Are there any gaps or surprises worth noting?"* This engineered metacognition surfaces issues before they compound.
+**4. Phase transitions inject structured reflection.** When execution moves between nodes, a transition marker is inserted into the conversation containing: what phase completed, all outputs in the data buffer, available data files, available tools, and an explicit reflection prompt: *"Before proceeding, briefly reflect: what went well in the previous phase? Are there any gaps or surprises worth noting?"* This engineered metacognition surfaces issues before they compound.
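The transition marker in point 4 could be rendered like this. The reflection prompt is quoted from the text; the surrounding format and function name are invented:

```python
# Quoted from the spec above; the formatter around it is hypothetical.
REFLECTION = ("Before proceeding, briefly reflect: what went well in the "
              "previous phase? Are there any gaps or surprises worth noting?")


def transition_marker(completed_phase, outputs, files, tools):
    # Assemble the structured reflection message inserted between nodes.
    return "\n".join([
        f"[Phase complete]: {completed_phase}",
        "Outputs in buffer: " + ", ".join(outputs),
        "Data files: " + ", ".join(files),
        "Tools: " + ", ".join(tools),
        REFLECTION,
    ])


marker = transition_marker("research", ["prospect_profile"], ["notes.md"], ["web_search"])
```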
-**5. Data buffer connects phases.** On ACCEPT, the accumulator's outputs are written to `DataBuffer`. The narrative layer reads these values to describe progress. In continuous mode, subsequent nodes see both the conversation history (what was discussed) and the structured memory (what was decided). In isolated mode, a `ContextHandoff` summarizes the prior node's conversation for the next node's input.
+**5. Data buffer connects phases.** On ACCEPT, the accumulator's outputs are written to `DataBuffer`. The narrative layer reads these values to describe progress. In continuous mode, subsequent nodes see both the conversation history (what was discussed) and the structured buffer state (what was decided). In isolated mode, a `ContextHandoff` summarizes the prior node's conversation for the next node's input.
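The ACCEPT path in point 5 reduces to a copy from the accumulator into the buffer. Apart from the ACCEPT-triggers-write semantics stated above, everything here is an assumed sketch:

```python
def commit_on_accept(accumulated, data_buffer):
    # Hypothetical: on an ACCEPT verdict, the accumulator's outputs
    # become durable buffer state that later nodes can read.
    for key, value in accumulated.items():
        data_buffer[key] = value
    return data_buffer


state = commit_on_accept({"draft_message": "Hello Acme!"}, {"prospect": "Acme"})
```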
### The Judge Evaluation Pipeline
@@ -90,7 +90,6 @@ hive/ # Repository root

│ │ ├── graph/ # GraphExecutor - executes node graphs
│ │ ├── llm/ # LLM provider integrations (Anthropic, OpenAI, OpenRouter, Hive, etc.)
│ │ ├── mcp/ # MCP server integration
│ │ ├── monitoring/ # Runtime monitoring
│ │ ├── observability/ # Structured logging - human-readable and machine-parseable tracing
│ │ ├── runner/ # AgentRunner - loads and runs agents
│ │ ├── runtime/ # Runtime environment
@@ -127,12 +127,12 @@ decisions │ │

"input_keys": {
  "type": "array",
  "items": { "type": "string" },
-  "description": "Expected input memory keys"
+  "description": "Expected input buffer keys"
},
"output_keys": {
  "type": "array",
  "items": { "type": "string" },
-  "description": "Expected output memory keys"
+  "description": "Expected output buffer keys"
},
"success_criteria": {
  "type": "string",
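A node definition matching the schema fragment above might look like this. The three field names come from the schema; the concrete keys and criteria text are invented for illustration:

```python
# Illustrative node config consistent with the schema fragment above.
node = {
    "input_keys": ["prospect_profile"],   # read from the shared buffer
    "output_keys": ["draft_message"],     # written back on completion
    "success_criteria": "Draft is personalized and under 150 words.",
}
```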
@@ -4,13 +4,13 @@

Real business processes aren't linear. A sales outreach might go: research a prospect, draft a message, realize the research is thin, go back and dig deeper, draft again, get human approval, send. There are loops, branches, fallbacks, and decision points.

-Hive models this as a directed graph. Nodes do work, edges connect them, and shared memory lets them pass data. The framework walks this structure — running nodes, following edges, managing retries — until the agent reaches its goal or exhausts its step budget.
+Hive models this as a directed graph. Nodes do work, edges connect them, and a shared data buffer lets them pass data. The framework walks this structure — running nodes, following edges, managing retries — until the agent reaches its goal or exhausts its step budget.

Edges can loop back, creating feedback cycles where an agent retries a step or takes a different path. That's intentional. A graph that only moves forward can't self-correct.

## Nodes

-A node is a unit of work. Each node reads inputs from shared memory, does something, and writes outputs back.
+A node is a unit of work. Each node reads inputs from the shared buffer, does something, and writes outputs back.

**`event_loop`** — This is the only node type in Hive. It's a multi-turn LLM loop where the model reasons about the current state, calls tools, observes results, and keeps going until it has produced the required outputs. All agent behavior happens in these nodes. They handle long-running tasks, manage their own context window, and can recover from crashes mid-conversation.
@@ -47,11 +47,11 @@ Edges also handle data plumbing between nodes — mapping one node's outputs to

When a node has multiple outgoing edges, the framework can run those branches in parallel and reconverge when they're all done. This is useful for tasks like researching a prospect from multiple sources simultaneously.

-## Shared Memory
+## Shared Buffer

-Shared memory is how nodes communicate. It's a key-value store scoped to a single [session](./worker_agent.md). Every node declares which keys it reads and which it writes, and the framework enforces those boundaries — a node can't quietly access data it hasn't declared.
+The shared buffer is how nodes communicate. It's a key-value store scoped to a single [session](./worker_agent.md). Every node declares which keys it reads and which it writes, and the framework enforces those boundaries — a node can't quietly access data it hasn't declared.

-Data flows through the graph in a natural way: input arrives at the start, each node reads what it needs and writes what it produces, and edges map outputs to inputs as data moves between nodes. At the end, the full memory state is the execution result.
+Data flows through the graph in a natural way: input arrives at the start, each node reads what it needs and writes what it produces, and edges map outputs to inputs as data moves between nodes. At the end, the full buffer state is the execution result.

## Human-in-the-Loop
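The declared-keys boundary described in the Shared Buffer hunk above could be enforced like this — a sketch under assumed names, not Hive's implementation:

```python
class SharedBuffer:
    """Session-scoped key-value store; nodes may only touch declared keys."""
    def __init__(self):
        self._data = {}

    def read(self, key, declared_inputs):
        # A node cannot quietly read data it hasn't declared.
        if key not in declared_inputs:
            raise KeyError(f"undeclared input key: {key!r}")
        return self._data[key]

    def write(self, key, value, declared_outputs):
        if key not in declared_outputs:
            raise KeyError(f"undeclared output key: {key!r}")
        self._data[key] = value


buf = SharedBuffer()
buf.write("profile", {"name": "Acme"}, declared_outputs={"profile"})
```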
@@ -10,7 +10,7 @@ In Hive, a **Coding Agent** (like Claude Code or Cursor) generates worker agents

A session is a single execution of a worker agent against a specific input. If your outreach agent processes 50 prospects, that's 50 sessions.

-Each session is isolated — it has its own shared memory, its own execution state, and its own history. This matters because sessions can be long-running. An agent might start researching a prospect, pause for human approval, wait hours or days, and then resume to send the message. The session preserves everything across that gap.
+Each session is isolated — it has its own shared buffer, its own execution state, and its own history. This matters because sessions can be long-running. An agent might start researching a prospect, pause for human approval, wait hours or days, and then resume to send the message. The session preserves everything across that gap.

Sessions also make debugging straightforward. Every decision the agent made, every tool it called, every retry it attempted — it's all captured in the session. When something goes wrong, you can trace exactly what happened.
@@ -32,7 +32,7 @@ This is the operational model Hive is designed for: agents that run 24/7 as part

## The Runtime

-The worker agent runtime manages the lifecycle: starting sessions, executing the graph, handling pauses and resumes, tracking costs, and collecting metrics. It coordinates everything the agent needs — LLM access, tool execution, shared memory, credential management — so individual nodes can focus on their specific job.
+The worker agent runtime manages the lifecycle: starting sessions, executing the graph, handling pauses and resumes, tracking costs, and collecting metrics. It coordinates everything the agent needs — LLM access, tool execution, shared buffer state, credential management — so individual nodes can focus on their specific job.

Key things the runtime handles:
@@ -42,7 +42,7 @@ Key things the runtime handles:

**Event streaming** — The runtime emits events as the agent works. You can wire these up to dashboards, logs, or alerting systems to monitor agents in real time.

-**Crash recovery** — If execution is interrupted (process crash, deployment, anything), the runtime can resume from the last checkpoint. Conversation state and memory are persisted, so the agent picks up where it left off rather than starting over.
+**Crash recovery** — If execution is interrupted (process crash, deployment, anything), the runtime can resume from the last checkpoint. Conversation state and buffer state are persisted, so the agent picks up where it left off rather than starting over.

## The Big Picture
@@ -268,7 +268,7 @@ Default skills differ from community skills in how they integrate:

| Aspect | Default Skills | Community Skills |
| ------------ | ---------------------------------------------- | ----------------------------------------------------- |
| Loaded by | Framework automatically | Agent decides at runtime (or pre-activated in config) |
-| Integration | System prompt injection + shared memory hooks | Instruction-following (standard Agent Skills) |
+| Integration | System prompt injection + shared buffer hooks | Instruction-following (standard Agent Skills) |
| Graph impact | No dedicated nodes — woven into existing nodes | None (just context) |
| Overridable | Yes (disable, configure, or replace) | N/A |
@@ -294,7 +294,7 @@ Six default skills ship with Hive:

```markdown
## Operational Protocol: Structured Note-Taking

-Maintain structured working notes in shared memory key `_working_notes`.
+Maintain structured working notes in shared buffer key `_working_notes`.
Update at these checkpoints:

- After completing each discrete subtask or batch item
@@ -503,7 +503,7 @@ All default skill protocols combined must total under **2000 tokens** to minimiz

### 5.6 Shared Memory Convention

-All default skill shared memory keys use the `_` prefix (`_working_notes`, `_batch_ledger`, etc.) to avoid collisions with domain-level keys. These keys are:
+All default skill shared buffer keys use the `_` prefix (`_working_notes`, `_batch_ledger`, etc.) to avoid collisions with domain-level keys. These keys are:

- Visible to the agent (for self-reference)
- Visible to the judge (for evaluation context)
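The prefix convention above makes skill keys trivially distinguishable from domain keys — a hypothetical helper, not part of the framework:

```python
def is_default_skill_key(key: str) -> bool:
    # `_working_notes`, `_batch_ledger`, etc. are default-skill keys;
    # anything unprefixed belongs to the domain.
    return key.startswith("_")
```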
@@ -651,7 +651,7 @@ CI runs these evals on submitted skills to validate quality.

| DS-2 | Default skills are valid Agent Skills packages (`SKILL.md` format) in the framework install directory | P0 |
| DS-3 | All default skills loaded automatically for every worker agent unless explicitly disabled | P0 |
| DS-4 | Default skills integrate via system prompt injection — no additional graph nodes | P0 |
-| DS-5 | Default skills use `_`-prefixed shared memory keys to avoid domain collisions | P0 |
+| DS-5 | Default skills use `_`-prefixed shared buffer keys to avoid domain collisions | P0 |
| DS-6 | Each default skill independently configurable via `default_skills` in agent config | P0 |
| DS-7 | All defaults disableable at once: `{"_all": {"enabled": false}}` | P0 |
| DS-8 | Default skill protocols appended in a `## Operational Protocols` system prompt section | P0 |
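Per DS-6 and DS-7 above, configuration lives under `default_skills` in the agent config, and `{"_all": {"enabled": false}}` disables everything at once. The surrounding config shape and the helper below are assumptions for illustration:

```python
# Hypothetical agent-config fragment: disable every default skill at once.
agent_config = {
    "default_skills": {
        "_all": {"enabled": False},
    }
}


def defaults_enabled(config):
    # Default skills are on unless the `_all` switch turns them off.
    return config.get("default_skills", {}).get("_all", {}).get("enabled", True)
```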
@@ -853,7 +853,7 @@ Skills and MCP servers are complementary:

| "Install and use your first skill" | Users | From `hive skill search` to skill activating in a session |
| "Write your first skill" | Contributors | Step-by-step: `hive skill init` → write SKILL.md → validate → submit PR |
| "Port a skill from Claude Code/Cursor" | Contributors | Usually just install it — guide explains verification |
-| "Default skills reference" | All users | All 6 defaults: purpose, config, shared memory keys, tuning |
+| "Default skills reference" | All users | All 6 defaults: purpose, config, shared buffer keys, tuning |
| "Tuning default skills" | Advanced builders | When to disable vs. configure; per-agent overrides; measuring impact |
| Skill cookbook | Contributors | Annotated examples: research, triage, draft, review, outreach, data extraction |
| "Evaluating skill quality" | Contributors | Setting up evals, writing assertions, iterating with the eval-driven loop |
@@ -865,7 +865,7 @@ Skills and MCP servers are complementary:

| Phase | Scope | Depends On |
| --------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------- |
-| **Phase 0: Default Skills** | Implement 6 default skills as `SKILL.md` packages; `DefaultSkillManager` with system prompt injection, iteration callbacks, node completion hooks, phase transition hooks; `DefaultSkillConfig` in Python API and `agent.json`; `_`-prefixed shared memory convention; startup logging | — |
+| **Phase 0: Default Skills** | Implement 6 default skills as `SKILL.md` packages; `DefaultSkillManager` with system prompt injection, iteration callbacks, node completion hooks, phase transition hooks; `DefaultSkillConfig` in Python API and `agent.json`; `_`-prefixed shared buffer convention; startup logging | — |
| **Phase 1: Agent Skills Standard** | `SkillDiscovery` scanning `.agents/skills/` and `.hive/skills/`; `SKILL.md` parsing with lenient validation; progressive disclosure (catalog injection, activation, resource loading); model-driven and user-driven activation; context protection; deduplication; pre-activated skills config; compatibility tests against `github.com/anthropics/skills` | — |
| **Phase 2: CLI & Contributor Tooling** | `hive skill init`, `validate`, `test`, `fork`; `hive skill doctor`; `hive skill install/remove/list/search/info/update`; version pinning; `skills-ref` integration for validation | Phase 1 |
| **Phase 3: Registry Repo** | Create `hive-skill-registry` GitHub repo; CI validation using `skills-ref`; `_template/`; `CONTRIBUTING.md`; seed with 10+ skills (extracted from templates + ported from anthropics/skills); eval CI | Phase 1 |
@@ -249,7 +249,7 @@ Hive ships with six built-in operational skills that provide runtime resilience.

| Skill | Purpose |
|-------|---------|
-| `hive.note-taking` | Structured working notes in shared memory |
+| `hive.note-taking` | Structured working notes in the shared buffer |
| `hive.batch-ledger` | Track per-item status in batch operations |
| `hive.context-preservation` | Save context before context window pruning |
| `hive.quality-monitor` | Self-assess output quality periodically |
@@ -287,4 +287,4 @@ Skills written for any Agent Skills-compatible agent work in Hive:

- The `SKILL.md` format is identical across Claude Code, Cursor, Gemini CLI, and others.
- Skills installed at `~/.agents/skills/` are visible to all compatible agents on your machine.

See the [Agent Skills specification](https://agentskills.io/specification) for the full format reference.