Compare commits
20 Commits

| SHA1 |
|---|
| b42a3293f1 |
| ba02e53bdd |
| 40d32f2e01 |
| 7779bc5336 |
| a2d21ec7bc |
| 06ccc853ee |
| 4847332161 |
| 8c1ee54725 |
| a12163d63f |
| 0cd6f21980 |
| a88fc1d75c |
| e9bde26611 |
| c02f40622c |
| 3328a388b3 |
| 8f632eb005 |
| c8ee961436 |
| bc9f6b0af8 |
| 594bceb8f5 |
| 47cd55052f |
| fb203b5bdf |
@@ -1,10 +1,10 @@
---
name: hive-create
description: Step-by-step guide for building goal-driven agents. Creates package structure, defines goals, adds nodes, connects edges, and finalizes agent class. Use when actively building an agent.
description: Step-by-step guide for building goal-driven agents. Qualifies use cases first (the good, bad, and ugly), then creates package structure, defines goals, adds nodes, connects edges, and finalizes agent class. Use when actively building an agent.
license: Apache-2.0
metadata:
  author: hive
  version: "2.1"
  version: "2.2"
  type: procedural
part_of: hive
requires: hive-concepts
@@ -85,7 +85,7 @@ mcp__agent-builder__list_mcp_tools()

mkdir -p exports/AGENT_NAME/nodes
```

**Save the tool list for step 3** — you will need it for node design in STEP 3.
**Save the tool list for STEP 4** — you will need it for node design.

**THEN immediately proceed to STEP 2** (do NOT display setup results to the user — just move on).
@@ -194,6 +194,7 @@ This reads the agent.json and populates the builder session with the goal, all n

---

## STEP 2: Define Goal Together with User

**A responsible engineer doesn't jump into building. First, understand the problem and be transparent about what the framework can and cannot do.**

**If starting from a template**, the goal is already loaded in the builder session. Present the existing goal to the user using the format below and ask for approval. Skip the collaborative drafting questions — go straight to presenting and asking "Do you approve this goal, or would you like to modify it?"
@@ -201,7 +202,7 @@ This reads the agent.json and populates the builder session with the goal, all n

```
AskUserQuestion(questions=[{
    "question": "What kind of agent do you want to build?",
    "question": "What kind of agent do you want to build? Select an option below, or choose 'Other' to describe your own.",
    "header": "Agent type",
    "options": [
        {"label": "Data collection", "description": "Gathers information from the web, analyzes it, and produces a report or sends outreach (e.g. market research, news digest, email campaigns, competitive analysis)"},
@@ -216,15 +217,222 @@ Use the user's selection (or their custom description if they chose "Other") as

**DO NOT propose a complete goal on your own.** Instead, collaborate with the user to define it.

**START by asking the user to help shape the goal:**

### 2a: Fast Discovery (3-8 Turns)

> I've set up the build environment and discovered [N] available tools. Let's define the goal for your agent together.

**The core principle**: Discovery should feel like progress, not paperwork. The stakeholder should walk away feeling like you understood them faster than anyone else would have.

**Communication style**: Be concise. Say less. Mean more. Impatient stakeholders don't want a wall of text — they want to know you get it. Every sentence you say should either move the conversation forward or prove you understood something. If it does neither, cut it.

**Question Rules: Respect Their Time.** Every question must earn its place by doing one of the following:

1. **Preventing a costly wrong turn** — you're about to build the wrong thing
2. **Unlocking a shortcut** — their answer lets you simplify the design
3. **Surfacing a dealbreaker** — there's a constraint that changes everything
4. **Providing options** — offer options with your questions where possible, but always allow the user to type something beyond them

If a question doesn't do one of these, don't ask it. Make an assumption, state it, and move on.
---

#### 2a.1: Let Them Talk, But Listen Like an Architect

When the stakeholder describes what they want, don't just hear the words — listen for the architecture underneath. While they talk, mentally construct:

- **The actors**: Who are the people/systems involved?
- **The trigger**: What kicks off the workflow?
- **The core loop**: What's the main thing that happens repeatedly?
- **The output**: What's the valuable thing produced at the end?
- **The pain**: What about today's situation is broken, slow, or missing?

You are extracting a **domain model** from natural language in real time. Most stakeholders won't give you this structure explicitly — they'll give you a story. Your job is to hear the structure inside the story.

| They say... | You're hearing... |
|-------------|-------------------|
| Nouns they repeat | Your entities |
| Verbs they emphasize | Your core operations |
| Frustrations they mention | Your design constraints |
| Workarounds they describe | What the system must replace |
| People they name | Your user types |
---

#### 2a.2: Use Domain Knowledge to Fill In the Blanks

You have broad knowledge of how systems work. Use it aggressively.

If they say "I need a research agent," you already know it probably involves: search, summarization, source tracking, and iteration. Don't ask about each — use them as your starting mental model and let their specifics override your defaults.

If they say "I need to monitor files and alert me," you know this probably involves: watch patterns, triggers, notifications, and state tracking.

**The key move**: Take your general knowledge of the domain and merge it with the specifics they've given you. The result is a draft understanding that's 60-80% right before you've asked a single question. Your questions close the remaining 20-40%.

---

#### 2a.3: Play Back a Proposed Model (Not a List of Questions)

After listening, present a **concrete picture** of what you think they need. Make it specific enough that they can spot what's wrong.

**Pattern: "Here's what I heard — tell me where I'm off"**
> "OK here's how I'm picturing this: [User type] needs to [core action]. Right now they're [current painful workflow]. What you want is [proposed solution that replaces the pain].
>
> To get started, can you help me understand:
>
> The way I'd structure this: [key entities] connected by [key relationships], with the main flow being [trigger → steps → outcome].
>
> 1. **What should this agent accomplish?** (the core purpose)
> 2. **How will we know it succeeded?** (what does "done" look like)
> 3. **Are there any hard constraints?** (things it must never do, quality bars, etc.)
>
> For the MVP, I'd focus on [the one thing that delivers the most value] and hold off on [things that can wait].
>
> Before I start — [1-2 specific questions you genuinely can't infer]."
Why this works:

- **Proves you were listening** — they don't feel like they have to repeat themselves
- **Shows competence** — you're already thinking in systems
- **Fast to correct** — "no, it's more like X" takes 10 seconds vs. answering 15 questions
- **Creates momentum** — heading toward building, not more talking

---

#### 2a.4: Ask Only What You Cannot Infer

Your questions should be **narrow, specific, and consequential**. Never ask what you could answer yourself.

**Good questions** (high-stakes, can't infer):
- "Who's the primary user — you or your end customers?"
- "Is this replacing a spreadsheet, or is there literally nothing today?"
- "Does this need to integrate with anything, or standalone?"
- "Is there existing data to migrate, or starting fresh?"

**Bad questions** (low-stakes, inferable):
- "What should happen if there's an error?" *(handle gracefully, obviously)*
- "Should it have search?" *(if there's a list, yes)*
- "How should we handle permissions?" *(follow standard patterns)*
- "What tools should I use?" *(your call, not theirs)*
---

#### Conversation Flow (3-5 Turns)

| Turn | Who | What |
|------|-----|------|
| 1 | User | Describes what they need |
| 2 | Agent | Plays back understanding as a proposed model. Asks 1-2 critical questions max. |
| 3 | User | Corrects, confirms, or adds detail |
| 4 | Agent | Adjusts model, confirms MVP scope, states assumptions, declares starting point |
| *(5)* | *Either* | *Only if Turn 3 revealed something that fundamentally changes the approach* |

**AFTER the conversation, IMMEDIATELY proceed to 2b. DO NOT skip to building.**
---

#### Anti-Patterns

| Don't | Do Instead |
|-------|------------|
| Open with a list of questions | Open with what you understood from their request |
| "What are your requirements?" | "Here's what I think you need — am I right?" |
| Ask about every edge case | Handle with smart defaults, flag in summary |
| 10+ turn discovery conversation | 3-8 turns. Start building, iterate with real software. |
| Being lazy and not understanding what the user wants to achieve | Understand the "what" and the "why" |
| Ask for permission to start | State your plan and start |
| Wait for certainty | Start at 80% confidence, iterate the rest |
| Ask what tech/tools to use | That's your job. Decide, disclose, move on. |
---

### 2b: Capability Assessment

**After the user responds, analyze the fit.** Present this assessment honestly:

> **Framework Fit Assessment**
>
> Based on what you've described, here's my honest assessment of how well this framework fits your use case:
>
> **What Works Well (The Good):**
> - [List 2-4 things the framework handles well for this use case]
> - Examples: multi-turn conversations, human-in-the-loop review, tool orchestration, structured outputs
>
> **Limitations to Be Aware Of (The Bad):**
> - [List 2-3 limitations that apply but are workable]
> - Examples: LLM latency means not suitable for sub-second responses, context window limits for very large documents, cost per run for heavy tool usage
>
> **Potential Deal-Breakers (The Ugly):**
> - [List any significant challenges or missing capabilities — be honest]
> - Examples: no tool available for X, would require custom MCP server, framework not designed for Y
**Be specific.** Reference the actual tools discovered in Step 1. If the user needs `send_email` but it's not available, say so. If they need real-time streaming from a database, explain that's not how the framework works.

### 2c: Gap Analysis

**Identify specific gaps** between what the user wants and what you can deliver:

| Requirement | Framework Support | Gap/Workaround |
|-------------|-------------------|----------------|
| [User need] | [✅ Supported / ⚠️ Partial / ❌ Not supported] | [How to handle or why it's a problem] |
**Examples of gaps to identify:**

- Missing tools (user needs X, but only Y and Z are available)
- Scope issues (user wants to process 10,000 items, but LLM rate limits apply)
- Interaction mismatches (user wants CLI-only, but agent is designed for TUI)
- Data flow issues (user needs to persist state across runs, but sessions are isolated)
- Latency requirements (user needs instant responses, but LLM calls take seconds)
### 2d: Recommendation

**Give a clear recommendation:**

> **My Recommendation:**
>
> [One of these three:]
>
> **✅ PROCEED** — This is a good fit. The framework handles your core needs well. [List any minor caveats.]
>
> **⚠️ PROCEED WITH SCOPE ADJUSTMENT** — This can work, but we should adjust: [specific changes]. Without these adjustments, you'll hit [specific problems].
>
> **🛑 RECONSIDER** — This framework may not be the right tool for this job because [specific reasons]. Consider instead: [alternatives — simpler script, different framework, custom solution].
### 2e: Get Explicit Acknowledgment

**CALL AskUserQuestion:**

```
AskUserQuestion(questions=[{
    "question": "Based on this assessment, how would you like to proceed?",
    "header": "Proceed",
    "options": [
        {"label": "Proceed as described", "description": "I understand the limitations, let's build it"},
        {"label": "Adjust scope", "description": "Let's modify the requirements to fit better"},
        {"label": "More questions", "description": "I have questions about the assessment"},
        {"label": "Reconsider", "description": "Maybe this isn't the right approach"}
    ],
    "multiSelect": false
}])
```
**WAIT for user response.**

- If **Proceed**: Move to STEP 3
- If **Adjust scope**: Discuss what to change, update your notes, re-assess if needed
- If **More questions**: Answer them honestly, then ask again
- If **Reconsider**: Discuss alternatives. If they decide to proceed anyway, that's their informed choice

---
## STEP 3: Define Goal Together with User

**Now that the use case is qualified, collaborate on the goal definition.**

**START by synthesizing what you learned:**

> Based on our discussion, here's my understanding of the goal:
>
> **Core purpose:** [what you understood from 2a]
> **Success looks like:** [what you inferred]
> **Key constraints:** [what you inferred]
>
> Let me refine this with you:
>
> 1. **What should this agent accomplish?** (confirm or correct my understanding)
> 2. **How will we know it succeeded?** (what specific outcomes matter)
> 3. **Are there any hard constraints?** (things it must never do, quality bars)

**WAIT for the user to respond.** Use their input (and the agent type they selected) to draft:
@@ -268,12 +476,12 @@ AskUserQuestion(questions=[{

**WAIT for user response.**

- If **Approve**: Call `mcp__agent-builder__set_goal(...)` with the goal details, then proceed to STEP 3
- If **Approve**: Call `mcp__agent-builder__set_goal(...)` with the goal details, then proceed to STEP 4
- If **Modify**: Ask what they want to change, update the draft, ask again

---

## STEP 3: Design Conceptual Nodes
## STEP 4: Design Conceptual Nodes

**If starting from a template**, the nodes are already loaded in the builder session. Present the existing nodes using the table format below and ask for approval. Skip the design phase.
@@ -328,12 +536,12 @@ AskUserQuestion(questions=[{

**WAIT for user response.**

- If **Approve**: Proceed to STEP 4
- If **Approve**: Proceed to STEP 5
- If **Modify**: Ask what they want to change, update design, ask again

---

## STEP 4: Design Full Graph and Review
## STEP 5: Design Full Graph and Review

**If starting from a template**, the edges are already loaded in the builder session. Render the existing graph as ASCII art and present it to the user for approval. Skip the edge design phase.
@@ -445,15 +653,16 @@ AskUserQuestion(questions=[{

**WAIT for user response.**

- If **Approve**: Proceed to STEP 5
- If **Approve**: Proceed to STEP 6
- If **Modify**: Ask what they want to change, update the graph, re-render, ask again

---

## STEP 5: Build the Agent
## STEP 6: Build the Agent

**NOW — and only now — write the actual code.** The user has approved the goal, nodes, and graph.

### 6a: Register nodes and edges with MCP

**If starting from a template**, the copied files will be overwritten with the approved design. You MUST replace every occurrence of the old template name with the new agent name. Here is the complete checklist — miss NONE of these:

| File | What to rename |
@@ -474,9 +683,7 @@ AskUserQuestion(questions=[{
| `__init__.py` | `from .agent import OldNameAgent` import |
| `__init__.py` | `__all__` list entry |
### 5a: Register nodes and edges with MCP

**If starting from a template and no modifications were made in Steps 2-4**, the nodes and edges are already registered. Skip to validation (`mcp__agent-builder__validate_graph()`). If modifications were made, re-register the changed nodes/edges (the MCP tools handle duplicates by overwriting).
**If starting from a template and no modifications were made in Steps 2-5**, the nodes and edges are already registered. Skip to validation (`mcp__agent-builder__validate_graph()`). If modifications were made, re-register the changed nodes/edges (the MCP tools handle duplicates by overwriting).

**FOR EACH approved node**, call:
@@ -516,9 +723,9 @@ mcp__agent-builder__validate_graph()
```

- If invalid: Fix the issues and re-validate
- If valid: Continue to 5b
- If valid: Continue to 6b

### 5b: Write Python package files
### 6b: Write Python package files

**EXPORT the graph data:**
@@ -578,7 +785,7 @@ mcp__agent-builder__export_graph()

---

## STEP 6: Verify and Test
## STEP 7: Verify and Test

**RUN validation:**

@@ -704,16 +911,70 @@ result = await executor.execute(graph=graph, goal=goal, input_data=input_data)
---

## REFERENCE: Framework Capabilities for Qualification

Use this reference during STEP 2 to give accurate, honest assessments.

### What the Framework Does Well (The Good)

| Capability | Description |
|------------|-------------|
| Multi-turn conversations | Client-facing nodes stream to users and block for input |
| Human-in-the-loop review | Approval checkpoints with feedback loops back to earlier nodes |
| Tool orchestration | LLM can call multiple tools, framework handles execution |
| Structured outputs | `set_output` produces validated, typed outputs |
| Parallel execution | Fan-out/fan-in for concurrent node execution |
| Context management | Automatic compaction and spillover for large data |
| Error recovery | Retry logic, judges, and feedback edges for self-correction |
| Session persistence | State saved to disk, resumable sessions |
### Framework Limitations (The Bad)

| Limitation | Impact | Workaround |
|------------|--------|------------|
| LLM latency | 2-10+ seconds per turn | Not suitable for real-time/low-latency needs |
| Context window limits | ~128K tokens max | Use data tools for spillover, design for chunking |
| Cost per run | LLM API calls cost money | Budget planning, caching where possible |
| Rate limits | API throttling on heavy usage | Backoff, queue management |
| Node boundaries lose context | Outputs must be serialized | Prefer fewer, richer nodes |
| Single-threaded within node | One LLM call at a time per node | Use fan-out for parallelism |
### Not Designed For (The Ugly)

| Use Case | Why It's Problematic | Alternative |
|----------|---------------------|-------------|
| Long-running daemons | Framework is request-response, not persistent | External scheduler + agent |
| Sub-second responses | LLM latency is inherent | Traditional code, no LLM |
| Processing millions of items | Context windows and rate limits | Batch processing + sampling |
| Real-time streaming data | No built-in pub/sub or streaming input | Custom MCP server + agent |
| Guaranteed determinism | LLM outputs vary | Function nodes for deterministic parts |
| Offline/air-gapped | Requires LLM API access | Local models (not currently supported) |
| Multi-user concurrency | Single-user session model | Separate agent instances per user |
### Tool Availability Reality Check

**Before promising any capability, check `list_mcp_tools()`.** Common gaps:

- **Email**: May not have `send_email` — check before promising email automation
- **Calendar**: May not have calendar APIs — check before promising scheduling
- **Database**: May not have SQL tools — check before promising data queries
- **File system**: Has data tools but not arbitrary filesystem access
- **External APIs**: Depends entirely on what MCP servers are registered
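The reality check above boils down to a set difference between what the use case needs and what `list_mcp_tools()` actually returned. A minimal sketch, assuming a helper of my own invention — `find_tool_gaps` is not a framework function, and the tool names are illustrative:

```python
# Hypothetical sketch of the tool-availability check described above.
# Compare the tools a use case requires against the discovered tool list.

def find_tool_gaps(required: set, available: set) -> dict:
    """Return which required tools are covered and which are missing."""
    missing = required - available
    return {
        "covered": sorted(required & available),
        "missing": sorted(missing),
        "ok": not missing,
    }

# Example: the user wants email outreach, but no send_email tool exists.
available = {"web_search", "read_file", "write_file"}
required = {"web_search", "send_email"}

gaps = find_tool_gaps(required, available)
print(gaps["missing"])  # ['send_email'] -> flag this in the 2b assessment
```

Anything that lands in `missing` belongs in the "Potential Deal-Breakers" section of the assessment.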
---

## COMMON MISTAKES TO AVOID

1. **Using tools that don't exist** - Always check `mcp__agent-builder__list_mcp_tools()` first
2. **Wrong entry_points format** - Must be `{"start": "node-id"}`, NOT a set or list
3. **Skipping validation** - Always validate nodes and graph before proceeding
4. **Not waiting for approval** - Always ask user before major steps
5. **Displaying this file** - Execute the steps, don't show documentation
6. **Too many thin nodes** - Prefer fewer, richer nodes (4 nodes > 8 nodes)
7. **Missing STEP 1/STEP 2 in client-facing prompts** - Client-facing nodes need explicit phases to prevent premature set_output
8. **Forgetting nullable_output_keys** - Mark input_keys that only arrive on certain edges (e.g., feedback) as nullable on the receiving node
9. **Adding framework gating for LLM behavior** - Fix prompts or use judges, not ad-hoc code
10. **Writing code before user approves the graph** - Always get approval on goal, nodes, and graph BEFORE writing any agent code
11. **Wrong mcp_servers.json format** - Use flat format (no `"mcpServers"` wrapper), `cwd` must be `"../../tools"`, and `command` must be `"uv"` with args `["run", "python", ...]`
1. **Skipping use case qualification** - A responsible engineer qualifies the use case BEFORE building. Be transparent about what works, what doesn't, and what's problematic
2. **Hiding limitations** - Don't oversell the framework. If a tool doesn't exist or a capability is missing, say so upfront
3. **Using tools that don't exist** - Always check `mcp__agent-builder__list_mcp_tools()` first
4. **Wrong entry_points format** - Must be `{"start": "node-id"}`, NOT a set or list
5. **Skipping validation** - Always validate nodes and graph before proceeding
6. **Not waiting for approval** - Always ask user before major steps
7. **Displaying this file** - Execute the steps, don't show documentation
8. **Too many thin nodes** - Prefer fewer, richer nodes (4 nodes > 8 nodes)
9. **Missing STEP 1/STEP 2 in client-facing prompts** - Client-facing nodes need explicit phases to prevent premature set_output
10. **Forgetting nullable_output_keys** - Mark input_keys that only arrive on certain edges (e.g., feedback) as nullable on the receiving node
11. **Adding framework gating for LLM behavior** - Fix prompts or use judges, not ad-hoc code
12. **Writing code before user approves the graph** - Always get approval on goal, nodes, and graph BEFORE writing any agent code
13. **Wrong mcp_servers.json format** - Use flat format (no `"mcpServers"` wrapper), `cwd` must be `"../../tools"`, and `command` must be `"uv"` with args `["run", "python", ...]`
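The entry_points mistake in the list above (a set or list instead of a dict) is cheap to catch before registration. A sketch of a pre-flight check — `check_entry_points` is a hypothetical helper, not part of the framework:

```python
# Sketch of a pre-flight check for the entry_points format mistake:
# entry_points must be a dict mapping entry-point id -> node id,
# never a set or a list.

def check_entry_points(entry_points) -> bool:
    """Return True only for the correct {"start": "node-id"} shape."""
    return (
        isinstance(entry_points, dict)
        and all(
            isinstance(k, str) and isinstance(v, str)
            for k, v in entry_points.items()
        )
    )

assert check_entry_points({"start": "node-id"})      # correct shape
assert not check_entry_points({"start", "node-id"})  # a set — wrong
assert not check_entry_points(["start", "node-id"])  # a list — wrong
```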
@@ -19,14 +19,18 @@ metadata:

**THIS IS AN EXECUTABLE WORKFLOW. DO NOT explore the codebase or read source files. ROUTE to the correct skill IMMEDIATELY.**

When this skill is loaded, determine what the user needs and invoke the appropriate skill NOW:
- **User wants to build an agent** (from scratch or from a template) → Invoke `/hive-create` immediately
- **User wants to test an agent** → Invoke `/hive-test` immediately
- **User wants to learn concepts** → Invoke `/hive-concepts` immediately
- **User wants patterns/optimization** → Invoke `/hive-patterns` immediately
- **User wants to set up credentials** → Invoke `/hive-credentials` immediately
- **User has a failing/broken agent** → Invoke `/hive-debugger` immediately
- **Unclear what user needs** → Ask the user (do NOT explore the codebase to figure it out)

When this skill is loaded, **ALWAYS use the AskUserQuestion tool** to present options:
```
Use AskUserQuestion with these options:
- "Build a new agent" → Then invoke /hive-create
- "Test an existing agent" → Then invoke /hive-test
- "Learn agent concepts" → Then invoke /hive-concepts
- "Optimize agent design" → Then invoke /hive-patterns
- "Set up credentials" → Then invoke /hive-credentials
- "Debug a failing agent" → Then invoke /hive-debugger
- "Other" (please describe what you want to achieve)
```

**DO NOT:** Read source files, explore the codebase, search for code, or do any investigation before routing. The sub-skills handle all of that.
@@ -73,7 +77,6 @@ Use this meta-skill when:

## Phase 0: Understand Concepts (Optional)

**Duration**: 5-10 minutes
**Skill**: `/hive-concepts`
**Input**: Questions about agent architecture

@@ -95,7 +98,6 @@ Use this meta-skill when:

## Phase 1: Build Agent Structure

**Duration**: 15-30 minutes
**Skill**: `/hive-create`
**Input**: User requirements ("Build an agent that...") or a template to start from

@@ -166,7 +168,6 @@ exports/agent_name/

## Phase 1.5: Optimize Design (Optional)

**Duration**: 10-15 minutes
**Skill**: `/hive-patterns`
**Input**: Completed agent structure
@@ -191,14 +192,11 @@ exports/agent_name/

## Phase 2: Test & Validate

**Duration**: 20-40 minutes
**Skill**: `/hive-test`
**Input**: Working agent from Phase 1

### What This Phase Does

Guides the creation and execution of a comprehensive test suite:
- Constraint tests
- Success criteria tests

@@ -0,0 +1,172 @@
# Agent Runtime

Unified execution system for all Hive agents. Every agent — single-entry or multi-entry, headless or TUI — runs through the same runtime stack.

## Topology
```
AgentRunner.load(agent_path)
            |
       AgentRunner
  (factory + public API)
            |
  _setup_agent_runtime()
            |
       AgentRuntime
(lifecycle + orchestration)
        /   |   \
 Stream A  Stream B  Stream C   ← one per entry point
    |         |         |
GraphExecutor GraphExecutor GraphExecutor
    |         |         |
 Node → Node → Node (graph traversal)
```
Single-entry agents get a `"default"` entry point automatically. There is no separate code path.
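That normalization can be sketched in a few lines. This is an assumption about the behavior described above, not the framework's actual code — `normalize_entry_points` and the `"start"` default node name are hypothetical:

```python
# Illustrative sketch: an agent that declares no entry points gets a
# single "default" entry point, so single- and multi-entry agents share
# one code path. Names here are assumptions, not the framework's API.

def normalize_entry_points(declared, start_node="start"):
    """Fall back to a single 'default' entry point when none are declared."""
    return declared if declared else {"default": start_node}

print(normalize_entry_points(None))                 # {'default': 'start'}
print(normalize_entry_points({"webhook": "ingest"}))  # unchanged
```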
## Components

| Component | File | Role |
| --- | --- | --- |
| `AgentRunner` | `runner/runner.py` | Load agents, configure tools/LLM, expose high-level API |
| `AgentRuntime` | `runtime/agent_runtime.py` | Lifecycle management, entry point routing, event bus |
| `ExecutionStream` | `runtime/execution_stream.py` | Per-entry-point execution queue, session persistence |
| `GraphExecutor` | `graph/executor.py` | Node traversal, tool dispatch, checkpointing |
| `EventBus` | `runtime/event_bus.py` | Pub/sub for execution events (streaming, I/O) |
| `SharedStateManager` | `runtime/shared_state.py` | Cross-stream state with isolation levels |
| `OutcomeAggregator` | `runtime/outcome_aggregator.py` | Goal progress tracking across streams |
| `SessionStore` | `storage/session_store.py` | Session state persistence (`sessions/{id}/state.json`) |
## Programming Interface

### AgentRunner (high-level)

```python
from framework.runner import AgentRunner

# Load and run
runner = AgentRunner.load("exports/my_agent", model="anthropic/claude-sonnet-4-20250514")
result = await runner.run({"query": "hello"})

# Resume from paused session
result = await runner.run({"query": "continue"}, session_state=saved_state)

# Lifecycle
await runner.start()                           # Start the runtime
await runner.stop()                            # Stop the runtime
exec_id = await runner.trigger("default", {})  # Non-blocking trigger
progress = await runner.get_goal_progress()    # Goal evaluation
entry_points = runner.get_entry_points()       # List entry points

# Context manager
async with AgentRunner.load("exports/my_agent") as runner:
    result = await runner.run({"query": "hello"})

# Cleanup
runner.cleanup()              # Synchronous
await runner.cleanup_async()  # Asynchronous
```
### AgentRuntime (lower-level)

```python
from framework.runtime.agent_runtime import AgentRuntime, create_agent_runtime
from framework.runtime.execution_stream import EntryPointSpec

# Create runtime with entry points
runtime = create_agent_runtime(
    graph=graph,
    goal=goal,
    storage_path=Path("~/.hive/agents/my_agent"),
    entry_points=[
        EntryPointSpec(id="default", name="Default", entry_node="start", trigger_type="manual"),
    ],
    llm=llm,
    tools=tools,
    tool_executor=tool_executor,
    checkpoint_config=checkpoint_config,
)

# Lifecycle
await runtime.start()
await runtime.stop()

# Execution
exec_id = await runtime.trigger("default", {"query": "hello"})               # Non-blocking
result = await runtime.trigger_and_wait("default", {"query": "hello"})       # Blocking
result = await runtime.trigger_and_wait("default", {}, session_state=state)  # Resume

# Client-facing node I/O
await runtime.inject_input(node_id="chat", content="user response")

# Events
sub_id = runtime.subscribe_to_events(
    event_types=[EventType.CLIENT_OUTPUT_DELTA],
    handler=my_handler,
)
runtime.unsubscribe_from_events(sub_id)

# Inspection
runtime.is_running     # bool
runtime.event_bus      # EventBus
runtime.state_manager  # SharedStateManager
runtime.get_stats()    # Runtime statistics
```
## Execution Flow
|
||||
|
||||
1. `AgentRunner.run()` calls `AgentRuntime.trigger_and_wait()`
|
||||
2. `AgentRuntime` routes to the `ExecutionStream` for the entry point
|
||||
3. `ExecutionStream` creates a `GraphExecutor` and calls `execute()`
|
||||
4. `GraphExecutor` traverses nodes, dispatches tools, manages checkpoints
|
||||
5. `ExecutionResult` flows back up through the stack
|
||||
6. `ExecutionStream` writes session state to disk
|
||||
|
## Session Resume

All execution paths support session resume:

```python
# First run (agent pauses at a client-facing node)
result = await runner.run({"query": "start task"})
# result.paused_at = "review-node"
# result.session_state = {"memory": {...}, "paused_at": "review-node", ...}

# Resume
result = await runner.run({"input": "approved"}, session_state=result.session_state)
```

Session state flows: `AgentRunner.run()` → `AgentRuntime.trigger_and_wait()` → `ExecutionStream.execute()` → `GraphExecutor.execute()`.

Checkpoints are saved at node boundaries (`sessions/{id}/checkpoints/`) for crash recovery.

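The pause/resume contract above can be sketched as a tiny stand-in, independent of the framework. This is illustrative only, not the real API; the `review-node` name and the `paused_at`/`session_state` fields mirror the example above:

```python
# Illustrative stand-in for the pause/resume contract -- not the framework's API.
def run(inputs, session_state=None):
    state = session_state or {"memory": [], "paused_at": None}
    state["memory"].append(inputs)
    if state["paused_at"] is None:
        # First run: pause at a client-facing node and hand state back to the caller.
        state["paused_at"] = "review-node"
        return {"paused_at": "review-node", "session_state": state}
    # Resume: the pause marker is cleared and execution completes.
    state["paused_at"] = None
    return {"paused_at": None, "session_state": state, "result": "done"}

first = run({"query": "start task"})
resumed = run({"input": "approved"}, session_state=first["session_state"])
print(first["paused_at"], resumed["result"])
```

The key point of the design: the caller, not the runtime, holds the session state between runs, so any transport (CLI, HTTP, queue) can carry a paused session.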
## Event Bus

The `EventBus` provides real-time execution visibility:

| Event | When |
| --- | --- |
| `NODE_STARTED` | Node begins execution |
| `NODE_COMPLETED` | Node finishes |
| `TOOL_CALL_STARTED` | Tool invocation begins |
| `TOOL_CALL_COMPLETED` | Tool invocation finishes |
| `CLIENT_OUTPUT_DELTA` | Agent streams text to user |
| `CLIENT_INPUT_REQUESTED` | Agent needs user input |
| `EXECUTION_COMPLETED` | Full execution finishes |

In headless mode, `AgentRunner` subscribes to `CLIENT_OUTPUT_DELTA` and `CLIENT_INPUT_REQUESTED` to print output and read stdin. In TUI mode, `AdenTUI` subscribes to route events to UI widgets.

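The subscription pattern can be illustrated with a minimal stand-in bus. The real `EventBus` lives in the runtime; this sketch only mirrors the subscribe/publish shape described above, with a handler that collects streamed `CLIENT_OUTPUT_DELTA` chunks the way a headless runner would print them:

```python
from collections import defaultdict

# Minimal stand-in for the EventBus pattern -- names here are illustrative,
# not the framework's actual class.
class MiniEventBus:
    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        for handler in self._handlers[event_type]:
            handler(payload)

bus = MiniEventBus()
chunks = []
# A headless runner would print each delta as it arrives; here we collect them.
bus.subscribe("CLIENT_OUTPUT_DELTA", chunks.append)
for delta in ["Hel", "lo"]:
    bus.publish("CLIENT_OUTPUT_DELTA", delta)
print("".join(chunks))
```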
## Storage Layout

```
~/.hive/agents/{agent_name}/
  sessions/
    session_YYYYMMDD_HHMMSS_{uuid}/
      state.json        # Session state (status, memory, progress)
      checkpoints/      # Node-boundary snapshots
      logs/
        summary.json    # Execution summary
        details.jsonl   # Detailed event log
        tool_logs.jsonl # Tool call log
  runtime_logs/         # Cross-session runtime logs
```

@@ -0,0 +1,261 @@
# Developer success

Our core value and principle is developer success. We truly care about helping developers achieve their goals — not just shipping features, but ensuring every developer who uses Hive can build, debug, deploy, and iterate on agents that work in production. Developer success means our developers succeed in their own work: automating real business processes, shipping products, and growing their capabilities. If our developers aren't winning, we aren't winning.

## Developer profiles

From what we currently see, these are the developers who will achieve success with our framework the earliest:

- IT Specialists and Consultants
- Individual developers who want to build a product
- Developers who want to get a job done (they have a real-world business process)
- Developers who want to learn and become a business process owner
- One-man CEOs

## How They Find Us & Why They Use Us

**IT Specialists and Consultants:**
Always trying to learn and find the state-of-the-art tools on the market, as it defines their career. They tried Claude but found it hard to apply to their customers' needs. They received Vincent's email and wanted to give it a try. They see the opportunity to resell this product and become active users of ours.

**Developers Who Want to Get a Job Done:**
They find us through our marketing efforts selling the sample agents and our SEO pages for business processes while researching solutions to the problems they're trying to solve.

**Developers Who Want to Learn and Become a Business Process Owner:**
They find us through the rage-bait post "If you're a developer that doesn't own a business process, you'll lose your job" and the seminars we host. They believe they need to upgrade themselves from just a coder to somebody who can own a process. They check the GitHub repo and find the templates interesting. Then they join our Discord to discover more agent ideas developed by the community.

**One-Man CEO:**
Has a business idea and might have some traction, but is overwhelmed by too much work. They saw news saying AI agents can handle all their repetitive tasks. During research, they found us and our tutorials. After seeing a wall of sample agents and playing with them, they couldn't refuse the value and joined our Discord. [See roadmap — Hosted sample agent playgrounds]

**Individual Product Developer:**
Has a product idea and is trying to find the best framework. They encounter a post from Patrick: "I built an AI agent that does market research for me every day using this new framework." They go to our GitHub, find the idea aligned with their vision, and join our Discord.

> **Note:** Individual product developers want to do one thing well and resell it. One-man CEOs have many things to do and need multiple agents.

> **Note:** The profiles above are ordered by importance. Here is the rationale: among all developers, IT people are going to be the first group to truly deploy their work in production and achieve real developer success. They are also likely to contribute to the framework. Developers who want to learn are the group who won't get things deployed anytime soon but can be good community members. The product developer is the longer-term play. As a dev tool, it would be a huge developer success if we have them building a product with it. It is the hardest challenge for our framework and also requires good product developers to spend time figuring things out. This is not going to happen in two months.

## What Is Their Success

**IT Specialists and Consultants:**
Success means they're able to resell our framework to their customers and deliver use cases in a production environment. It will be critical for us to have a few "less serious" use cases so people know where to start.

**Developers Who Want to Get a Job Done:**
The framework is flexible enough for developers to either start from scratch or build from templates to get the job done.

"Job done" means:
1. The developer deploys it to production and gets users to use it
2. The developer starts to own the business process and knows how to maintain it
3. The developer can add more features and integrations to expand the agent's capability as the business process updates
4. The developer is alerted when any failure/escalation happens and is able to debug the agent when sessions go wrong

**Developers Who Want to Learn and Become a Business Process Owner:**
1. The developer learns from sample agents how business processes are done
2. The developer can deploy a sample agent for their team to automate some processes
3. The developer starts to own the business process and knows how to maintain it
4. The developer can add more features and integrations to expand the agent's capability as the business process updates
5. The developer is able to debug the agent when sessions go wrong

**One-Man CEO:**
1. The developer can deploy multiple agents from sample agents
2. The developer can tweak the agent according to their needs
3. The developer can easily program a human-in-the-loop fallback so that when the agent can't handle a problem, they receive a notification and fix the issue themselves
4. The developer can generate ad-hoc agents that solve new issues for their business
5. The developer can turn an ad-hoc agent into an agent that runs repeatedly
6. The developer can turn a repeatedly running agent into one that runs autonomously
7. When the agent fails, the developer receives an alert

**Individual Product Developer:**
1. The developer can develop an MVP with our generation framework
2. The developer can easily add more capabilities
3. The developer can trust that the framework is future-proof for them
4. The developer can have a deployment strategy where they wrap the agent as part of their product
5. The developer can monitor the logs and costs for their users
6. The product achieves long-term success (like Unity)

**Summary:** The common denominator:

1. Can create an agent
2. Can debug the agent
3. Can maintain the agent
4. Can deploy the agent
5. Can iterate on the agent

## Basic use cases (we shall have a template for each one of these)

- GitHub issue triaging agent
- Tech & AI news digest agent
- Research report agent
- Teams daily digest and to-dos
- Discord auto-reply bot
- Finance stock digest
- WhatsApp auto-response agent
- Email follow-up agent
- Meeting time coordination agent

## Intermediate use cases

### 1. Sales & Marketing
Marketing is often the most time-consuming "distraction" for a CEO. You provide the vision; they provide the volume.

- [Social Media Management](../examples/recipes/social_media_management/): Scheduling posts, replying to comments, and monitoring trends.
- [News Jacking](../examples/recipes/news_jacking/): Personalized outreach triggered by real-time company news (funding, hires, press mentions).
- [Newsletter Production](../examples/recipes/newsletter_production/): Taking your raw ideas or voice memos and turning them into a polished weekly email.
- [CRM Update Agent](../examples/recipes/crm_hygiene/): Ensuring every lead has a follow-up date and a status update.

### 2. Customer Success
You shouldn't be the one answering "How do I reset my password?"; you should be the one closing $10k deals.

- [Inquiry Triaging](../examples/recipes/inquiry_triaging/): Sorting the "tire kickers" from the "hot leads."
- [Onboarding Assistance](../examples/recipes/onboarding_assistance/): Helping new clients set up their accounts or sending out "Welcome" kits.
- [Customer Support & Troubleshooting](../examples/recipes/support_troubleshooting/): Handling "Level 1" tech support for your platform or website.

### 3. Operations Automation
This is your right hand. It keeps the gears greased so you don't get stuck in the "admin trap."

- [Email Inbox Management](../examples/recipes/inbox_management/): Clearing out the spam and highlighting the three emails that actually need your brain.
- [Invoicing & Collections](../examples/recipes/invoicing_collections/): Sending out bills and—more importantly—politely chasing down the people who haven't paid them.
- [Data Keeper](../examples/recipes/data_keeper/): Pulling data and reports from multiple data sources and consolidating them in one place.
- [Travel & Calendar Coordination](../examples/recipes/calendar_coordination/): Protecting your "Deep Work" time from getting fragmented by random 15-minute meetings.

### 4. Technical & Product Maintenance
Unless you are a developer, tech debt will kill your productivity. A part-timer can keep the lights on.

- [Quality Assurance](../examples/recipes/quality_assurance/): Testing new features or links before they go live to ensure nothing is broken.
- [Documentation](../examples/recipes/documentation/): Turning your messy processes into clean Standard Operating Procedures (SOPs).
- [Issue Triaging](../examples/recipes/issue_triaging/): Categorizing and routing incoming bug reports by severity.

## Installation

Install the prerequisites (such as Python), then install the quickstart package.

## Use Existing Agent

To run an existing agent:

1. Run `hive run <agent_name>` or `hive tui <agent_name>`
2. Hive automatically validates that your agent has all required prerequisites
3. Type something in the TUI or trigger an event source (like receiving an email)
4. Your agent runs, and the outcome is recorded
5. If something fails, you'll see where the logs are saved

## Agent Generation (Alternative to Using Existing Agent)

If you want to build something custom, you can generate your own agent from scratch. See [Agent Generation](#agent-generation).

If you prefer to start with a working example first, try running an existing agent to see how it works. See [Use Existing Agent](#use-existing-agent).

If you find something you can't accomplish with the framework, you can contribute by opening an issue or sharing your feedback in our Discord channel.

## Agent Testing

**Interactive testing:** Run `hive tui` to test your agent in a terminal UI.

**Autonomous testing:** Run `hive run <agent_name> --debug` and trigger the event source. Testing scheduled events can be tricky—Hive provides developer tools to help you simulate them.

**Try before you install:** You can test sample agents hosted in the cloud without any local installation.

## Integration

You need to set up integrations correctly before testing can succeed.

**Happy path:** Your agent accomplishes the goal exactly as specified.

**Mid path:** After negotiation, your agent explicitly tells you what it can and cannot do.

**Sad path:** After negotiation, you may need to build a one-off integration for certain tools.

## Agent Debugging

When errors or unexpected behavior happen during testing, you need to be able to debug your agent effectively.

## Logging

Hive gives you an AI-assisted experience for checking logs and getting high signal-to-noise insights.

Hive uses **three-level observability** for tracking agent execution:

| Level | What it captures | File |
|-------|------------------|------|
| **L1 (Summary)** | Run outcomes — success/failure, execution quality, attention flags | `summary.json` |
| **L2 (Details)** | Per-node results — retries, verdicts, latency, attention reasons | `details.jsonl` |
| **L3 (Tool Logs)** | Step-by-step execution — tool calls, LLM responses, judge feedback | `tool_logs.jsonl` |

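As a sketch of how the three levels compose, the following reads an L1 summary and scans L2 details for failing nodes. The file names come from the table above; the field names (`attention`, `node_id`, `verdict`, `retries`) are assumptions for illustration, not a documented schema:

```python
import json
import tempfile
from pathlib import Path

# Hypothetical log contents -- field names are illustrative, not a documented schema.
log_dir = Path(tempfile.mkdtemp())
(log_dir / "summary.json").write_text(
    json.dumps({"status": "failure", "attention": True})
)
(log_dir / "details.jsonl").write_text(
    "\n".join(json.dumps(e) for e in [
        {"node_id": "fetch", "verdict": "pass", "retries": 0},
        {"node_id": "send", "verdict": "fail", "retries": 2},
    ])
)

# L1: did the run need attention?
summary = json.loads((log_dir / "summary.json").read_text())
needs_attention = summary.get("attention", False)

# L2: which nodes failed?
failed_nodes = [
    entry["node_id"]
    for line in (log_dir / "details.jsonl").read_text().splitlines()
    if (entry := json.loads(line))["verdict"] == "fail"
]
print(needs_attention, failed_nodes)
```

The JSONL format means L2 and L3 can be scanned line by line without loading a whole session's history into memory.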
## (Optional) How the Graph Works

To fix and improve your agent, you need to understand how node memory works and how tools are called. See `docs/key_concepts` for details.

## First Success

By this point, you should have run your first agent and understand how the framework works. You're ready to use it for real use cases, which often means updating and customizing your agent.

Everything before your first success should run as smoothly as possible—this is non-negotiable.

## Contribution

If you encounter issues creating your desired agent, or find that the integrations aren't sufficient for your use case, open an issue or let us know in our Discord channel.

## Iteration (Building) - More Like Debugging

After your MVP agent or sample agent runs, you'll want to iterate by expanding the use cases.

## Iteration (Production) - Evolution and Inventiveness

After your MVP is deployed, your taste and judgment still drive the direction—AI is a significant force multiplier for rapidly iterating and solving problems.

With Aden Cloud Hive, production evolution is fully automatic. The Aden Queen Bee runs natural selection by deploying, evaluating, and improving your agents.

## Version Control

Iteration doesn't always improve everything. Version control helps you get back to a previous version, like how git works. Run `hive git restore` to revert changes.

## Agent Personality

You can put your own soul into your agent. What remains constant across evolution matters. Success isn't about having your agent constantly changing—it's about knowing that your goal and personality stay fixed while your agent adapts to solve problems.

## Memory Management

Hive nodes have a built-in mechanism for handling node memory and passing memory between nodes. To implement cross-session memory or custom memory logic, use the memory tools.

# Deployment

## (Optional) How the Agent Runtime Works

To fix and improve your agent, you need to understand how data transfers during runtime, how memory works, and how tools work. See `./agent_runtime.md` for details.

## Local Deployment

By default, Hive supports deployment through Docker.

1. Pre-flight validation (critical)
2. One-command deployment (`hive deploy local my_agent`)
3. Credential handling in containers (local credentials + Aden Cloud credentials for OAuth)
4. Persistence & state
5. Debugging/logging/memory access (start with CLI commands)
6. Expose hooks and APIs as an SDK
7. Documentation deliverables

## Cloud Deployment

If you want zero-ops deployment, easier integration and credential management, and built-in logging, Aden Cloud is ideal. You get secure defaults, scaling, and observability out of the box—at the cost of less low-level control and some vendor lock-in.

## Deployment Strategy

Autonomous and interactive modes look different, but the core remains the same, and your deployment strategy should be consistent across both.

## Performance

Not a focus at the moment. Speed of execution, process pools, and hallucination handling are future considerations.

## How We Collect Data

Through self-reported issues and cloud observability products.

## Runtime Guardrails

Hive provides built-in safety mechanisms to keep your agents within bounds.

## How We Ensure Reliability

Breakages still happen, even in the best business processes. Being reliable means being adaptive and fixing problems when they arise.

## Developer Trust

To deploy your agent for production use, Hive provides transparency in runtime, sufficient control, and guardrails to avoid catastrophic results.
@@ -22,6 +22,32 @@ Each recipe is a markdown file (or folder with a markdown file) containing:

## Available recipes

### Sales & Marketing
| Recipe | Description |
|--------|-------------|
| [marketing_agent](marketing_agent/) | Multi-channel marketing content generator with audience analysis and A/B copy variants |
| [social_media_management](social_media_management/) | Schedule posts, reply to comments, monitor trends |
| [newsletter_production](newsletter_production/) | Transform voice memos and ideas into polished emails |
| [news_jacking](news_jacking/) | Personalized outreach triggered by real-time company news |
| [crm_hygiene](crm_hygiene/) | Ensure every lead has follow-up dates and status |

### Customer Success
| Recipe | Description |
|--------|-------------|
| [inquiry_triaging](inquiry_triaging/) | Sort tire kickers from hot leads |
| [onboarding_assistance](onboarding_assistance/) | Guide new clients through setup and welcome kits |

### Operations Automation
| Recipe | Description |
|--------|-------------|
| [inbox_management](inbox_management/) | Clear spam and surface emails that need your brain |
| [invoicing_collections](invoicing_collections/) | Send invoices and chase overdue payments |
| [data_keeper](data_keeper/) | Pull data from multiple sources into unified reports |
| [calendar_coordination](calendar_coordination/) | Protect Deep Work time and book travel |

### Technical & Product Maintenance
| Recipe | Description |
|--------|-------------|
| [quality_assurance](quality_assurance/) | Test features and links before they go live |
| [documentation](documentation/) | Turn messy processes into clean SOPs |
| [basic_troubleshooting](basic_troubleshooting/) | Handle Level 1 tech support |
| [issue_triaging](issue_triaging/) | Categorize and route bug reports by severity |
@@ -0,0 +1,36 @@
# Recipe: Ad Campaign Monitoring

Checking daily spend on Meta/Google ads and flagging if the Cost Per Acquisition (CPA) spikes.

## Why

Ad platforms are designed to spend your money. Without daily oversight, a $50/day campaign can quietly become a $500 disaster. This agent watches your campaigns like a hawk, catching anomalies before they drain your budget and surfacing optimization opportunities you'd otherwise miss.

## What

- Monitor daily spend across all active campaigns
- Track CPA, ROAS, CTR, and conversion metrics
- Compare performance against historical benchmarks
- Identify underperforming ads and audiences
- Generate daily/weekly performance summaries

## Integrations

| Platform | Purpose |
|----------|---------|
| Meta Ads API | Facebook/Instagram campaign data |
| Google Ads API | Search/Display/YouTube campaign data |
| Google Analytics 4 | Conversion tracking and attribution |
| Google Sheets | Performance dashboards and reporting |
| Slack | Alerts and daily summaries |

## Escalation Path

| Trigger | Action |
|---------|--------|
| CPA spikes >30% above target | Alert with breakdown by ad set and pause recommendation |
| Daily budget exhausted before noon | Immediate alert — possible click fraud or viral ad |
| ROAS drops below profitability threshold | Pause campaign and notify with optimization suggestions |
| Ad rejected by platform | Alert with rejection reason and suggested fix |
| Competitor running aggressive campaign | Flag if detected through auction insights |
| Budget pacing off by >20% | Alert with projected monthly spend |
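The first escalation trigger above ("CPA spikes >30% above target") reduces to a simple relative-change check. This is a hypothetical sketch; the recipe's actual implementation is not shown:

```python
# Hypothetical check for the "CPA spikes >30% above target" trigger.
def cpa_spiked(cpa_today: float, cpa_target: float, threshold: float = 0.30) -> bool:
    """Return True when today's CPA exceeds target by more than `threshold` (relative)."""
    return (cpa_today - cpa_target) / cpa_target > threshold

print(cpa_spiked(66.0, 50.0), cpa_spiked(60.0, 50.0))
```

Note the comparison is strict: a CPA exactly 30% over target does not fire, which avoids alert flapping right at the boundary.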
@@ -0,0 +1,37 @@
# Recipe: Travel & Calendar Coordination

Protecting your "Deep Work" time from getting fragmented by random 15-minute meetings.

## Why

Your calendar is a battlefield. Everyone wants a slice of your time, and without protection, your days become a patchwork of 30-minute meetings with no room for actual work. This agent defends your schedule — booking travel, consolidating meetings, and protecting the focus time you need to think.

## What

- Block and protect "Deep Work" time slots
- Batch similar meetings together to reduce context switching
- Book travel (flights, hotels, ground transport)
- Handle meeting requests and rescheduling
- Prep briefing docs before important meetings

## Integrations

| Platform | Purpose |
|----------|---------|
| Google Calendar / Outlook | Calendar management |
| Calendly / Cal.com | External scheduling |
| TripIt / Google Flights / Kayak | Travel booking |
| Expensify / Ramp | Travel expense tracking |
| Notion / Google Docs | Meeting prep documents |
| Slack | Schedule alerts and confirmations |

## Escalation Path

| Trigger | Action |
|---------|--------|
| Someone tries to book over Deep Work time | Decline and offer alternatives, alert you if they push back |
| VIP requests meeting during protected time | Flag for your decision — worth the exception? |
| Flight cancelled or significantly delayed | Immediate alert with rebooking options |
| Double-booking conflict | Alert with suggested resolution |
| Meeting with no agenda 24h before | Prompt organizer for agenda, flag if none provided |
| Travel cost exceeds budget threshold | Queue for approval before booking |
@@ -0,0 +1,35 @@
# Recipe: CRM Update

Ensuring every lead has a follow-up date and a status update.

## Why

A messy CRM is a leaky pipeline. Leads without follow-up dates get forgotten. Deals without status updates go stale. This agent keeps your CRM clean and actionable — so when you open it, you see exactly what needs your attention today.

## What

- Audit leads missing follow-up dates or status updates
- Flag stale deals that haven't been touched in X days
- Merge duplicate contacts and companies
- Enrich records with missing data (email, phone, company info)
- Generate daily "pipeline hygiene" reports

## Integrations

| Platform | Purpose |
|----------|---------|
| HubSpot / Salesforce / Pipedrive | CRM management |
| Clearbit / Apollo / ZoomInfo | Data enrichment |
| Google Sheets | Hygiene reports and audits |
| Slack | Daily pipeline summary and action items |
| Zapier / Make | Cross-platform data sync |

## Escalation Path

| Trigger | Action |
|---------|--------|
| High-value deal stale >14 days | Alert with deal history and suggested re-engagement |
| Duplicate detected for active deal | Flag before merging — might be intentional |
| Lead data conflicts with enrichment | Queue for human verification |
| Pipeline value drops significantly week-over-week | Alert with analysis of what changed |
| Follow-up overdue for >5 leads | Daily digest with prioritized action list |
@@ -0,0 +1,38 @@
# Recipe: Data Keeper

Pull data and reports from multiple data sources.

## Why

You can't steer the ship if you're the one manually copying and pasting numbers from Google Analytics into an Excel sheet. Every hour spent wrangling data is an hour not spent making decisions based on that data. This agent becomes your "Data DJ" — mixing sources, syncing sheets, and serving up the numbers you need when you need them.

## What

- Pull metrics from analytics, ads, CRM, and other platforms
- Consolidate data into unified dashboards and spreadsheets
- Generate daily/weekly/monthly reports automatically
- Track KPIs and flag anomalies or trends
- Keep data sources in sync (no more stale spreadsheets)

## Integrations

| Platform | Purpose |
|----------|---------|
| Google Analytics 4 | Website traffic and conversion data |
| Google Sheets / Excel | Report destination and dashboards |
| Meta Ads / Google Ads | Ad performance metrics |
| Stripe / QuickBooks | Revenue and financial data |
| HubSpot / Salesforce | Sales pipeline and CRM metrics |
| Slack | Report delivery and anomaly alerts |
| BigQuery / Snowflake | Data warehouse queries (if applicable) |

## Escalation Path

| Trigger | Action |
|---------|--------|
| Data source API fails or returns errors | Alert with error details and last successful sync time |
| KPI drops >20% week-over-week | Immediate alert with breakdown by segment |
| Data discrepancy between sources | Flag for investigation — which source is correct? |
| Report generation fails | Notify with error and offer manual trigger |
| Unusual spike in any metric | Alert with context — is this real or a tracking bug? |
| New data source requested | Queue for setup — may need credentials or API access |
@@ -0,0 +1,37 @@
# Recipe: Documentation

Turning your messy processes into clean Standard Operating Procedures (SOPs).

## Why

Knowledge trapped in your head is a liability. When you're the only one who knows how things work, you become the bottleneck for everything. This agent captures your processes, cleans them up, and turns them into documentation anyone can follow — including your future self.

## What

- Watch you perform processes and document the steps
- Convert rough notes and recordings into structured SOPs
- Maintain and update existing documentation
- Identify undocumented processes that need capture
- Create quick-reference guides and checklists

## Integrations

| Platform | Purpose |
|----------|---------|
| Notion / Confluence / GitBook | Documentation hosting |
| Loom / Screen recording | Process capture |
| Otter.ai / Whisper | Meeting and explanation transcription |
| Slack | Documentation requests and updates |
| GitHub | Technical documentation and READMEs |
| Google Docs | Collaborative editing |

## Escalation Path

| Trigger | Action |
|---------|--------|
| Process has conflicting documentation | Flag discrepancy for clarification |
| SOP referenced but outdated >6 months | Queue for your review and update |
| Someone asks question not covered by docs | Note the gap, draft new section for approval |
| Critical process has no documentation | Alert as priority documentation needed |
| Documentation contradicts current practice | Flag for reconciliation — update docs or process? |
| External compliance requirement needs docs | Escalate with deadline and requirements |
@@ -0,0 +1,35 @@
# Recipe: Inbox Management

Clearing out the spam and highlighting the three emails that actually need your brain.

## Why

Email is where productivity goes to die. The average CEO gets 120+ emails per day, but only a handful actually matter. This agent acts as your email bouncer — filtering the noise so you can focus on the messages that move the needle.

## What

- Filter and archive spam, newsletters, and low-priority messages
- Categorize emails by urgency and type (action needed, FYI, waiting on)
- Summarize long email threads into key points
- Draft responses for routine inquiries
- Surface the 3-5 emails that truly need your attention

## Integrations

| Platform | Purpose |
|----------|---------|
| Gmail API / Microsoft Graph | Email access and management |
| Google Calendar | Context for scheduling-related emails |
| Slack | Daily inbox briefing and urgent alerts |
| Notion | Email summary archive for reference |
| Your CRM | Cross-reference with known contacts and deals |

## Escalation Path

| Trigger | Action |
|---------|--------|
| Email from VIP contact (investor, key client, partner) | Surface immediately, never auto-respond |
| Legal or compliance language detected | Flag for your review — do not respond |
| Angry or escalation tone detected | Alert with suggested de-escalation response |
| Email requires decision with financial impact | Queue for your review with context |
| Unrecognized sender with urgent request | Flag as potential phishing or verify before acting |
||||
@@ -0,0 +1,35 @@
# Recipe: Inquiry Triaging

Sorting the "tire kickers" from the "hot leads."

## Why

Not all leads are created equal. For every serious buyer, there are ten people who'll never purchase. Your time should go to the prospects most likely to close — this agent scores and routes inquiries so you only see the ones worth your attention.

## What

- Analyze incoming inquiries for buying signals
- Score leads based on company size, budget mentions, urgency, and fit
- Route hot leads to your calendar immediately
- Nurture warm leads with automated sequences
- Politely deflect poor-fit inquiries

## Integrations

| Platform | Purpose |
|----------|---------|
| HubSpot / Salesforce / Pipedrive | CRM and lead management |
| Intercom / Drift / Crisp | Live chat and inquiry capture |
| Calendly / Cal.com | Meeting scheduling for qualified leads |
| Clearbit / Apollo | Company enrichment and firmographics |
| Slack / Email | Hot lead alerts |

## Escalation Path

| Trigger | Action |
|---------|--------|
| Enterprise lead detected (>500 employees) | Immediate alert with company brief and suggested approach |
| Lead mentions competitor by name | Flag for competitive positioning response |
| Urgent language detected ("need this week", "ASAP") | Fast-track to your calendar |
| Lead asks question outside playbook | Queue for your personal response |
| High-value lead goes cold (no response in 48h) | Alert with re-engagement suggestions |
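The scoring-and-routing step above can be sketched as a simple weighted rubric. The weights, signal names, and thresholds below are illustrative assumptions for this sketch, not settings defined by the recipe:

```python
def score_lead(company_size: int, mentions_budget: bool,
               urgency: bool, fit_keywords: int) -> int:
    """Toy lead score on a 0-100 scale; weights are illustrative."""
    score = 0
    if company_size >= 500:
        score += 40          # enterprise signal
    elif company_size >= 50:
        score += 20
    if mentions_budget:
        score += 25          # explicit budget talk is a strong buying signal
    if urgency:
        score += 20          # "need this week", "ASAP"
    score += min(fit_keywords, 3) * 5  # capped bonus for fit keywords
    return min(score, 100)

def route(score: int) -> str:
    """Hot leads go to the calendar, warm leads to nurture, the rest are deflected."""
    if score >= 70:
        return "calendar"
    if score >= 40:
        return "nurture"
    return "deflect"
```

In practice the agent would derive these signals from the enrichment data (e.g. Clearbit/Apollo firmographics) rather than take them as booleans, but the routing shape is the same.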
@@ -0,0 +1,36 @@
# Recipe: Invoicing & Collections

Sending out bills and — more importantly — politely chasing down the people who haven't paid them.

## Why

Cash flow is oxygen. But chasing invoices is awkward and time-consuming. This agent handles the uncomfortable job of asking for money — sending invoices on time, following up persistently but politely, and only escalating when the situation requires your personal touch.

## What

- Generate and send invoices on schedule
- Track payment status across all outstanding invoices
- Send automated payment reminders (friendly → firm → final)
- Reconcile payments with bank transactions
- Report on AR aging and cash flow projections

## Integrations

| Platform | Purpose |
|----------|---------|
| QuickBooks / Xero / FreshBooks | Invoicing and accounting |
| Stripe / PayPal | Payment processing and status |
| Plaid / Mercury | Bank transaction reconciliation |
| Slack / Email | Collection alerts and summaries |
| Google Sheets | AR aging reports and forecasts |

## Escalation Path

| Trigger | Action |
|---------|--------|
| Invoice overdue >30 days | Escalate with payment history and suggested next steps |
| Large invoice (>$5k) goes overdue | Alert immediately with client context |
| Client disputes invoice | Flag for your review with dispute details |
| Payment bounces or fails | Alert with retry options |
| Client requests payment plan | Queue for your approval with suggested terms |
| Collections threshold reached (>60 days) | Recommend formal collection action |
@@ -0,0 +1,38 @@
# Recipe: Issue Triaging

Categorizing and routing incoming bug reports by severity and type.

## Why

Not all bugs are equal. A typo in the footer can wait; a checkout failure cannot. This agent sorts the incoming chaos — categorizing issues by severity, gathering reproduction steps, and routing them to the right person — so critical bugs get fixed fast and minor ones don't clog the queue.

## What

- Categorize incoming issues by type (bug, feature request, question)
- Assess severity based on impact and frequency
- Gather reproduction steps and environment details
- Route to appropriate team member or queue
- Track issue lifecycle from report to resolution

## Integrations

| Platform | Purpose |
|----------|---------|
| GitHub Issues / Linear / Jira | Issue tracking |
| Sentry / LogRocket / Datadog | Error context and logs |
| Slack | Triage notifications and discussion |
| Intercom / Zendesk | Customer-reported issue intake |
| Notion | Issue categorization rules and playbooks |
| PagerDuty | Critical issue escalation |

## Escalation Path

| Trigger | Action |
|---------|--------|
| Security vulnerability reported | Immediate escalation, mark as confidential |
| Data loss or corruption issue | P0 alert with all available context |
| Issue affecting >10% of users | Escalate as incident with scope estimate |
| Issue unsolvable within 30 minutes | Escalate with what was tried and ruled out |
| Customer-reported issue from enterprise account | Priority flag regardless of severity assessment |
| Same issue reported 5+ times in 24h | Alert as emerging pattern, consider incident |
| Issue requires architecture decision | Queue for tech lead review |
@@ -1,156 +0,0 @@
# Recipe: Marketing Content Agent

A multi-channel marketing content generator. Given a product description and target audience, this agent analyzes the audience, generates tailored copy for multiple channels, and produces A/B variants.

## Goal

```
Name: Marketing Content Generator
Description: Generate targeted marketing content across multiple channels
for a given product and audience.

Success criteria:
- Audience analysis is produced with demographics and pain points
- At least 2 channel-specific content pieces are generated
- A/B variants are provided for each piece
- All content aligns with the specified brand voice

Constraints:
- (hard) No competitor brand names in generated content
- (soft) Content should be under 280 characters for social media channels
```

## Input / Output

**Input:**
- `product_description` (str) — What the product is and does
- `target_audience` (str) — Who the content is for
- `brand_voice` (str) — Tone and style guidelines (e.g., "professional but approachable")
- `channels` (list[str]) — Target channels, e.g. `["email", "twitter", "linkedin"]`

**Output:**
- `audience_analysis` (dict) — Demographics, pain points, motivations
- `content` (list[dict]) — Per-channel content with A/B variants

## Workflow

```
[analyze_audience] → [generate_content] → [review_and_refine]
                                                  │
                                            (conditional)
                                                  │
                        needs_revision == True  → [generate_content]
                        needs_revision == False → (done)
```

## Nodes

### 1. analyze_audience

| Field | Value |
|-------|-------|
| Type | `llm_generate` |
| Input keys | `product_description`, `target_audience` |
| Output keys | `audience_analysis` |
| Tools | None |

**System prompt:**
```
You are a marketing strategist. Analyze the target audience for a product.

Product: {product_description}
Target audience: {target_audience}

Produce a structured analysis in JSON:
{{
  "audience_analysis": {{
    "demographics": "...",
    "pain_points": ["..."],
    "motivations": ["..."],
    "preferred_channels": ["..."],
    "messaging_angle": "..."
  }}
}}
```

### 2. generate_content

| Field | Value |
|-------|-------|
| Type | `llm_generate` |
| Input keys | `product_description`, `audience_analysis`, `brand_voice`, `channels` |
| Output keys | `content` |
| Tools | None |

**System prompt:**
```
You are a marketing copywriter. Generate content for each channel.

Product: {product_description}
Audience analysis: {audience_analysis}
Brand voice: {brand_voice}
Channels: {channels}

For each channel, produce two variants (A and B).

Output as JSON:
{{
  "content": [
    {{
      "channel": "twitter",
      "variant_a": "...",
      "variant_b": "..."
    }}
  ]
}}
```

### 3. review_and_refine

| Field | Value |
|-------|-------|
| Type | `llm_generate` |
| Input keys | `content`, `brand_voice` |
| Output keys | `content`, `needs_revision` |
| Tools | None |

**System prompt:**
```
You are a senior marketing editor. Review the following content for brand
voice alignment, clarity, and channel appropriateness.

Content: {content}
Brand voice: {brand_voice}

If any piece needs revision, fix it and set needs_revision to true.
If everything looks good, return the content unchanged with needs_revision false.

Output as JSON:
{{
  "content": [...],
  "needs_revision": false
}}
```

## Edges

| Source | Target | Condition | Priority |
|--------|--------|-----------|----------|
| analyze_audience | generate_content | `on_success` | 0 |
| generate_content | review_and_refine | `on_success` | 0 |
| review_and_refine | generate_content | `conditional: needs_revision == True` | 10 |

The `review_and_refine → generate_content` loop has higher priority so it's checked first. If `needs_revision` is false, execution ends at `review_and_refine` (terminal node).
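The priority-ordered edge evaluation can be sketched in a few lines. This is an illustration of the routing logic, not the hive framework's actual API; the `Edge` type and its field names are assumptions made for the sketch:

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Edge:
    source: str
    target: str
    priority: int = 0
    # None means an unconditional on_success edge
    condition: Optional[Callable[[dict], bool]] = None

def next_node(current: str, state: dict, edges: list[Edge]) -> Optional[str]:
    """Pick the highest-priority outgoing edge whose condition holds."""
    candidates = [e for e in edges if e.source == current]
    for edge in sorted(candidates, key=lambda e: e.priority, reverse=True):
        if edge.condition is None or edge.condition(state):
            return edge.target
    return None  # no edge fired: current node is terminal

edges = [
    Edge("analyze_audience", "generate_content"),
    Edge("generate_content", "review_and_refine"),
    Edge("review_and_refine", "generate_content", priority=10,
         condition=lambda s: s.get("needs_revision") is True),
]
```

Because the loop edge has priority 10, it is checked before any default edge out of `review_and_refine`; when `needs_revision` is false no edge fires and execution ends there.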
## Tools

This recipe uses no external tools — all nodes are `llm_generate`. To extend it, consider adding:
- A web search tool for competitive analysis in the `analyze_audience` node
- A URL shortener tool for social media content
- An image generation tool for visual content variants

## Variations

- **Single-channel mode**: Remove the `channels` input and hardcode to one channel for simpler output
- **With approval gate**: Add a `human_input` node between `review_and_refine` and the terminal to require human sign-off
- **With analytics**: Add a `function` node that logs generated content to a tracking system
@@ -0,0 +1,61 @@
# Recipe: News Jacking

Automated personalized outreach triggered by real-time company news.

## Why

Cold outreach gets ignored. But when you reference something that *just* happened to someone — a funding round, a podcast appearance, a new hire announcement — suddenly you're not a stranger, you're someone who pays attention. The problem is that manually monitoring hundreds of leads for these moments is impossible. This agent does the watching so you can do the reaching.

## What

- Monitor news sources for lead companies (LinkedIn, Google News, TechCrunch, press releases)
- Detect trigger events: funding announcements, executive hires, podcast appearances, product launches, awards
- Draft hyper-personalized outreach referencing the specific event
- Queue emails for human review or auto-send based on confidence score
- Track response rates by trigger type to optimize over time

## Integrations

| Platform | Purpose |
|----------|---------|
| Google News API / NewsAPI | Monitor company mentions |
| LinkedIn Sales Navigator | Track company updates and job changes |
| Apollo / Clearbit | Enrich lead data and find contact info |
| Gmail / Outlook | Send personalized outreach |
| CRM (HubSpot, Salesforce) | Log outreach and track responses |
| Slack | Notify when high-value triggers detected |

## Escalation Path

| Trigger | Action |
|---------|--------|
| High-value lead (enterprise, known target account) | Queue for human review before sending |
| Confidence score < 80% on event details | Flag for verification — do NOT auto-send |
| Unable to verify news source | Skip outreach, log for manual review |
| Lead responds | Alert immediately, pause automation for this lead |
| Bounce or unsubscribe | Remove from automation, update CRM |
| Same lead triggered multiple times in 30 days | Consolidate into single touchpoint |

## Guardrails

This agent has high "spam potential" if not configured carefully:

| Risk | Mitigation |
|------|------------|
| Hallucinated event details | Always include source URL, verify against multiple sources |
| Tone-deaf timing (layoffs, bad news) | Filter out negative events, require human review for ambiguous |
| Over-automation feels robotic | Randomize send times, vary templates, cap frequency per lead |
| Referencing wrong person/company | Double-check entity resolution before drafting |

## Example Flow

```
1. Agent detects: "[Lead's Company] raises $5M Series A" on TechCrunch
2. Enriches: Finds CEO email via Apollo, confirms company match
3. Drafts: "Hey [Name], congrats on the Series A! Saw the TechCrunch piece
   this morning. Scaling the team post-raise is always a ride — we help
   [Company Type] with [Value Prop]..."
4. Scores: 92% confidence (verified source, exact name match)
5. Routes: Auto-queue for send at 9:15 AM recipient's timezone
6. Logs: Records in CRM with trigger type "funding_announcement"
```
@@ -0,0 +1,35 @@
# Recipe: Newsletter Production

Taking your raw ideas or voice memos and turning them into a polished weekly email.

## Why

Your audience wants to hear from you, not your ghostwriter. But you don't have 4 hours to craft the perfect newsletter. This agent captures your voice from quick inputs — voice memos, bullet points, Slack messages — and transforms them into publish-ready emails that sound like you.

## What

- Ingest raw content (voice memos, notes, bullet points)
- Draft newsletter in your voice and style
- Format with headers, links, and CTAs
- Schedule for optimal send time
- Track open rates and click-through for future optimization

## Integrations

| Platform | Purpose |
|----------|---------|
| Otter.ai / Whisper | Voice memo transcription |
| Notion / Google Docs | Draft storage and editing |
| Mailchimp / ConvertKit / Beehiiv | Newsletter distribution |
| Slack | Content intake and approvals |
| Google Analytics / UTM tracking | Performance measurement |

## Escalation Path

| Trigger | Action |
|---------|--------|
| Draft ready for review | Send preview link and summary for your approval |
| Unusually low open rate on last send | Alert with analysis and A/B test suggestions |
| Subscriber replies with question | Forward replies that need your expertise |
| Unsubscribe spike after send | Flag with content analysis — what went wrong? |
| Sponsor or partnership mention required | Queue for your review before sending |
@@ -0,0 +1,36 @@
# Recipe: Onboarding Assistance

Helping new clients set up their accounts or sending out "Welcome" kits.

## Why

First impressions stick. A smooth onboarding experience sets the tone for the entire customer relationship — but walking each new client through the same steps is a time sink. This agent delivers a white-glove experience at scale, making every customer feel personally welcomed.

## What

- Send personalized welcome emails and kits
- Guide clients through account setup step-by-step
- Answer common "getting started" questions
- Track onboarding completion and milestone progress
- Follow up on incomplete setups

## Integrations

| Platform | Purpose |
|----------|---------|
| Intercom / Customer.io | Onboarding email sequences |
| Notion / Loom | Tutorial content and documentation |
| Calendly | Onboarding call scheduling |
| Slack / Email | Progress updates and escalations |
| Your product's API | Track setup completion status |
| Typeform / Tally | Onboarding surveys and data collection |

## Escalation Path

| Trigger | Action |
|---------|--------|
| Client stuck on setup >48 hours | Alert with where they're stuck and offer to schedule call |
| Technical blocker during setup | Route to support with context already gathered |
| High-value client starts onboarding | Notify so you can send personal welcome |
| Client expresses frustration | Immediate flag for human intervention |
| Onboarding incomplete after 7 days | Escalate with churn risk assessment |
@@ -0,0 +1,37 @@
# Recipe: Quality Assurance (QA)

Testing new features or links before they go live to ensure nothing is broken.

## Why

Broken features kill trust. One bad deploy can undo months of goodwill with your users. This agent runs systematic checks before anything goes live — catching the broken links, form errors, and edge cases that would otherwise reach your customers first.

## What

- Run automated test suites before deploys
- Manually verify critical user flows (signup, checkout, core features)
- Check all links for 404s and broken redirects
- Test across browsers and device sizes
- Verify integrations are responding correctly

## Integrations

| Platform | Purpose |
|----------|---------|
| GitHub Actions / CircleCI | CI/CD pipeline integration |
| Playwright / Cypress / Selenium | Automated browser testing |
| BrowserStack / LambdaTest | Cross-browser testing |
| Checkly / Uptrends | Synthetic monitoring |
| Slack / PagerDuty | Test failure alerts |
| Linear / Jira | Bug ticket creation |

## Escalation Path

| Trigger | Action |
|---------|--------|
| Critical test fails (auth, checkout, data) | Block deploy, alert immediately with failure details |
| Flaky test (passes sometimes, fails others) | Flag for investigation but don't block |
| New feature breaks existing functionality | Alert with regression details and affected areas |
| Performance degradation detected | Flag with before/after metrics |
| Security scan finds vulnerability | Immediate escalation with severity and remediation |
| All tests pass but something "feels off" | Document observation and flag for human review |
@@ -0,0 +1,34 @@
# Recipe: Social Media Management

Scheduling posts, replying to comments, and monitoring trends.

## Why

Consistency kills on social media — but it also kills your time. One "quick post" turns into an hour of tweaking copy, finding hashtags, and responding to comments. This agent maintains your social presence so you stay visible without staying glued to your phone.

## What

- Schedule posts across platforms (Twitter/X, LinkedIn, Instagram, Facebook)
- Reply to comments and DMs with on-brand responses
- Monitor trending topics and hashtags in your niche
- Track engagement metrics and surface what's working

## Integrations

| Platform | Purpose |
|----------|---------|
| Buffer / Hootsuite / Later | Post scheduling and publishing |
| Twitter/X API | Direct posting and engagement |
| LinkedIn API | Professional network management |
| Meta Graph API | Facebook/Instagram management |
| Slack | Notifications and escalations |

## Escalation Path

| Trigger | Action |
|---------|--------|
| Post goes viral (>10x normal engagement) | Alert with engagement stats and suggested follow-up content |
| Negative viral moment | Immediate alert — do NOT auto-respond, queue for human review |
| Influencer or press mentions you | Flag for personal response opportunity |
| Controversial topic trending in your space | Alert before posting scheduled content that might be tone-deaf |
| DM from verified account or known lead | Route directly to you |
@@ -0,0 +1,37 @@
# Recipe: Support Troubleshooting

Handling "Level 1" tech support for your platform or website.

## Why

Most support tickets are the same 20 questions over and over: password resets, access issues, "how do I..." questions. You don't need to answer these — but someone does. This agent handles the repetitive tier-1 support so your users get fast answers and you get your time back.

## What

- Handle password resets and account access issues
- Answer common "how do I" questions from the knowledge base
- Walk users through basic setup and configuration
- Collect diagnostic information for complex issues
- Log all support interactions for pattern analysis

## Integrations

| Platform | Purpose |
|----------|---------|
| Intercom / Zendesk / Freshdesk | Support ticket management |
| Notion / Confluence | Knowledge base for answers |
| Slack | Internal escalation channel |
| Your product's API | Account status, password reset triggers |
| LogRocket / FullStory | Session replay for debugging |
| PagerDuty | Urgent escalation routing |

## Escalation Path

| Trigger | Action |
|---------|--------|
| Issue not resolved within 30 minutes | Escalate with full context gathered |
| User expresses frustration or anger | Immediate handoff to human with de-escalation note |
| Security-related issue (account compromise, data concern) | Escalate immediately, do not attempt to resolve |
| Bug discovered during troubleshooting | Create ticket and escalate to engineering |
| VIP or enterprise customer | Flag for priority handling regardless of issue |
| Same issue reported by 3+ users | Alert as potential systemic problem |
@@ -353,7 +353,18 @@ class CredentialStoreAdapter:
        cls,
        specs: dict[str, CredentialSpec] | None = None,
    ) -> CredentialStoreAdapter:
        """Create adapter with encrypted storage primary and env var fallback.

        When ADEN_API_KEY is set, builds the store with AdenSyncProvider and
        AdenCachedStorage so that OAuth credentials (Google, HubSpot, Slack)
        auto-refresh via the Aden server. Non-Aden credentials (brave_search,
        anthropic, resend) still resolve from environment variables.

        When ADEN_API_KEY is not set, behaves identically to before.
        """
        import logging
        import os

        from framework.credentials import CredentialStore
        from framework.credentials.storage import (
            CompositeStorage,
@@ -361,6 +372,8 @@ class CredentialStoreAdapter:
            EnvVarStorage,
        )

        log = logging.getLogger(__name__)

        if specs is None:
            from . import CREDENTIAL_SPECS

@@ -368,17 +381,69 @@ class CredentialStoreAdapter:

        env_mapping = {name: spec.env_var for name, spec in specs.items()}

        # --- Aden sync branch ---
        # Note: we don't use CredentialStore.with_aden_sync() here because it
        # only wraps EncryptedFileStorage. We need CompositeStorage (encrypted
        # + env var fallback) so non-Aden credentials like brave_search still
        # resolve from environment variables.
        aden_api_key = os.environ.get("ADEN_API_KEY")
        if aden_api_key:
            try:
                from framework.credentials.aden import (
                    AdenCachedStorage,
                    AdenClientConfig,
                    AdenCredentialClient,
                    AdenSyncProvider,
                )

                # Local storage: encrypted primary + env var fallback
                encrypted = EncryptedFileStorage()
                env = EnvVarStorage(env_mapping)
                local_composite = CompositeStorage(primary=encrypted, fallbacks=[env])

                # Aden components
                client = AdenCredentialClient(
                    AdenClientConfig(
                        base_url=os.environ.get("ADEN_API_URL", "https://api.adenhq.com"),
                    )
                )
                provider = AdenSyncProvider(client=client)

                # AdenCachedStorage wraps composite, giving Aden priority
                cached_storage = AdenCachedStorage(
                    local_storage=local_composite,
                    aden_provider=provider,
                    cache_ttl_seconds=300,
                )

                store = CredentialStore(
                    storage=cached_storage,
                    providers=[provider],
                    auto_refresh=True,
                )

                # Initial sync: populate local cache from Aden
                try:
                    synced = provider.sync_all(store)
                    log.info("Aden credential sync complete: %d credentials synced", synced)
                except Exception as e:
                    log.warning("Aden initial sync failed (will retry on access): %s", e)

                return cls(store=store, specs=specs)

            except Exception as e:
                log.warning(
                    "Aden credential sync unavailable, falling back to default storage: %s", e
                )

        # --- Default branch (no ADEN_API_KEY or Aden setup failed) ---
        try:
            encrypted = EncryptedFileStorage()
            env = EnvVarStorage(env_mapping)
            composite = CompositeStorage(primary=encrypted, fallbacks=[env])
            store = CredentialStore(storage=composite)
        except Exception as e:
            log.warning("Encrypted credential storage unavailable, falling back to env vars: %s", e)
            store = CredentialStore.with_env_storage(env_mapping)

        return cls(store=store, specs=specs)
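The primary-plus-fallback resolution that CompositeStorage provides in the code above can be illustrated with a minimal standalone sketch. The class below is an assumption made for illustration only, not the framework's actual `CompositeStorage` API:

```python
import os
from typing import Optional

class MiniCompositeStorage:
    """Resolve a credential from a primary dict, falling back to env vars."""

    def __init__(self, primary: dict[str, str], env_mapping: dict[str, str]):
        self.primary = primary          # stands in for encrypted storage
        self.env_mapping = env_mapping  # credential name -> env var name

    def get(self, name: str) -> Optional[str]:
        # Primary storage wins when it has the credential...
        if name in self.primary:
            return self.primary[name]
        # ...otherwise fall back to the mapped environment variable.
        env_var = self.env_mapping.get(name)
        return os.environ.get(env_var) if env_var else None

os.environ["BRAVE_SEARCH_API_KEY"] = "brave-from-env"
storage = MiniCompositeStorage(
    primary={"slack": "slack-from-encrypted-store"},
    env_mapping={"brave_search": "BRAVE_SEARCH_API_KEY"},
)
```

This is the property the change preserves: credentials synced from Aden (or stored encrypted) take priority, while anything absent there, such as `brave_search`, still resolves from the environment.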
@@ -1,5 +1,7 @@
"""Tests for CredentialStoreAdapter."""

from unittest.mock import MagicMock, patch

import pytest

from aden_tools.credentials import (
@@ -484,3 +486,130 @@ class TestSpecCompleteness:
            assert spec.credential_group == "", (
                f"Credential '{name}' has unexpected credential_group='{spec.credential_group}'"
            )


class TestCredentialStoreAdapterAdenSync:
    """Tests for Aden sync branch in CredentialStoreAdapter.default()."""

    def _patch_encrypted_storage(self, tmp_path):
        """Patch EncryptedFileStorage to use a temp directory."""
        from framework.credentials.storage import EncryptedFileStorage

        original_init = EncryptedFileStorage.__init__

        def patched_init(self_inner, base_path=None, **kwargs):
            original_init(self_inner, base_path=str(tmp_path / "creds"), **kwargs)

        return patch.object(EncryptedFileStorage, "__init__", patched_init)

    def test_default_with_aden_key_creates_aden_store(self, monkeypatch, tmp_path):
        """When ADEN_API_KEY is set, default() wires up AdenSyncProvider."""
        monkeypatch.setenv("ADEN_API_KEY", "test-aden-key")
        monkeypatch.setenv("ADEN_API_URL", "https://test.adenhq.com")

        mock_client = MagicMock()
        mock_client.list_integrations.return_value = []

        with (
            self._patch_encrypted_storage(tmp_path),
            patch(
                "framework.credentials.aden.AdenCredentialClient",
                return_value=mock_client,
            ),
            patch(
                "framework.credentials.aden.AdenClientConfig",
            ),
        ):
            adapter = CredentialStoreAdapter.default()

        # Verify AdenSyncProvider is registered
        provider = adapter.store.get_provider("aden_sync")
        assert provider is not None

    def test_default_without_aden_key_uses_env_fallback(self, monkeypatch, tmp_path):
        """When ADEN_API_KEY is not set, default() uses env-only storage."""
        monkeypatch.delenv("ADEN_API_KEY", raising=False)
        monkeypatch.setenv("BRAVE_SEARCH_API_KEY", "test-brave-key")

        with self._patch_encrypted_storage(tmp_path):
            adapter = CredentialStoreAdapter.default()

        # No Aden provider should be registered
        assert adapter.store.get_provider("aden_sync") is None
        # Env vars still work
        assert adapter.get("brave_search") == "test-brave-key"

    def test_default_aden_non_aden_cred_falls_through_to_env(self, monkeypatch, tmp_path):
        """Non-Aden credentials (e.g. brave_search) resolve from env vars even with Aden."""
        monkeypatch.setenv("ADEN_API_KEY", "test-aden-key")
        monkeypatch.setenv("ADEN_API_URL", "https://test.adenhq.com")
        monkeypatch.setenv("BRAVE_SEARCH_API_KEY", "brave-from-env")

        mock_client = MagicMock()
        mock_client.list_integrations.return_value = []
        # Aden returns None for brave_search (404 → None)
        mock_client.get_credential.return_value = None

        with (
            self._patch_encrypted_storage(tmp_path),
            patch(
                "framework.credentials.aden.AdenCredentialClient",
                return_value=mock_client,
            ),
            patch(
                "framework.credentials.aden.AdenClientConfig",
            ),
        ):
            adapter = CredentialStoreAdapter.default()

        assert adapter.get("brave_search") == "brave-from-env"

    def test_default_aden_sync_failure_falls_back_gracefully(self, monkeypatch, tmp_path):
        """If Aden initial sync fails, adapter is still created and env vars work."""
        monkeypatch.setenv("ADEN_API_KEY", "test-aden-key")
        monkeypatch.setenv("ADEN_API_URL", "https://test.adenhq.com")
        monkeypatch.setenv("BRAVE_SEARCH_API_KEY", "brave-fallback")

        mock_client = MagicMock()
        mock_client.list_integrations.side_effect = Exception("Connection refused")
        mock_client.get_credential.return_value = None

        with (
            self._patch_encrypted_storage(tmp_path),
            patch(
                "framework.credentials.aden.AdenCredentialClient",
                return_value=mock_client,
            ),
            patch(
                "framework.credentials.aden.AdenClientConfig",
            ),
        ):
            adapter = CredentialStoreAdapter.default()

        # Adapter was created despite sync failure
        assert adapter is not None
        assert adapter.get("brave_search") == "brave-fallback"

    def test_default_aden_import_error_falls_back(self, monkeypatch, tmp_path):
        """If Aden imports fail (e.g. missing httpx), fall back to default storage."""
        monkeypatch.setenv("ADEN_API_KEY", "test-aden-key")
        monkeypatch.setenv("BRAVE_SEARCH_API_KEY", "brave-fallback")

        import builtins

        real_import = builtins.__import__

        def mock_import(name, *args, **kwargs):
            if name == "framework.credentials.aden":
                raise ImportError(f"No module named '{name}'")
            return real_import(name, *args, **kwargs)

        with (
            self._patch_encrypted_storage(tmp_path),
            patch.object(builtins, "__import__", side_effect=mock_import),
        ):
            adapter = CredentialStoreAdapter.default()

        # Fell back to default — env vars still work, no Aden provider
        assert adapter.store.get_provider("aden_sync") is None
        assert adapter.get("brave_search") == "brave-fallback"