Merge branch 'main' into feat/core-framework
@@ -43,10 +43,19 @@ pnpm-debug.log*
# Testing
coverage/
.nyc_output/
.pytest_cache/

# TypeScript
*.tsbuildinfo

# Python
__pycache__/
*.py[cod]
*$py.class
*.egg-info/
.eggs/
*.egg

# Misc
*.local
.cache/

@@ -9,6 +9,20 @@
[](https://x.com/aden_hq)
[](https://www.linkedin.com/company/teamaden/)

<p align="center">
  <img src="https://img.shields.io/badge/AI_Agents-Self--Improving-brightgreen?style=flat-square" alt="AI Agents" />
  <img src="https://img.shields.io/badge/Multi--Agent-Systems-blue?style=flat-square" alt="Multi-Agent" />
  <img src="https://img.shields.io/badge/Goal--Driven-Development-purple?style=flat-square" alt="Goal-Driven" />
  <img src="https://img.shields.io/badge/Human--in--the--Loop-orange?style=flat-square" alt="HITL" />
  <img src="https://img.shields.io/badge/Production--Ready-red?style=flat-square" alt="Production" />
</p>
<p align="center">
  <img src="https://img.shields.io/badge/OpenAI-supported-412991?style=flat-square&logo=openai" alt="OpenAI" />
  <img src="https://img.shields.io/badge/Anthropic-supported-d4a574?style=flat-square" alt="Anthropic" />
  <img src="https://img.shields.io/badge/Google_Gemini-supported-4285F4?style=flat-square&logo=google" alt="Gemini" />
  <img src="https://img.shields.io/badge/MCP-19_Tools-00ADD8?style=flat-square" alt="MCP" />
</p>

## Overview

Build reliable, self-improving AI agents without hardcoding workflows. Define your goal through conversation with a coding agent, and the framework generates a node graph with dynamically created connection code. When things break, the framework captures failure data, evolves the agent through the coding agent, and redeploys. Built-in human-in-the-loop nodes, credential management, and real-time monitoring give you control without sacrificing adaptability.
@@ -62,6 +76,139 @@ docker compose up
- **Cost & Budget Control** - Set spending limits, throttles, and automatic model degradation policies
- **Production-Ready** - Self-hostable, built for scale and reliability

## Why Aden

Traditional agent frameworks require you to manually design workflows, define agent interactions, and handle failures reactively. Aden flips this paradigm—**you describe outcomes, and the system builds itself**.

```mermaid
flowchart TB
    subgraph USER["👤 User"]
        GOAL[("🎯 Define Goal<br/>(Natural Language)")]
    end

    subgraph CODING["🤖 Coding Agent"]
        direction TB
        GENERATE["Generate Agent Graph"]
        CONNECTION["Create Connection Code"]
        TESTGEN["Generate Test Cases"]
        EVOLVE["Evolve on Failure"]
    end

    subgraph WORKERS["⚙️ Worker Agents"]
        direction TB
        subgraph NODE1["SDK-Wrapped Node"]
            N1_MEM["Memory (STM/LTM)"]
            N1_TOOLS["Tools Access"]
            N1_LLM["LLM Integration"]
            N1_MON["Monitoring"]
        end
        subgraph NODE2["SDK-Wrapped Node"]
            N2_MEM["Memory (STM/LTM)"]
            N2_TOOLS["Tools Access"]
            N2_LLM["LLM Integration"]
            N2_MON["Monitoring"]
        end
        HITL["🙋 Human-in-the-Loop<br/>Intervention Points"]
    end

    subgraph CONTROL["🎛️ Hive Control Plane"]
        direction TB
        BUDGET["Budget & Cost Control"]
        POLICY["Policy Management"]
        METRICS["Real-time Metrics"]
        MCP["19 MCP Tools"]
    end

    subgraph STORAGE["💾 Storage Layer"]
        TSDB[("TimescaleDB<br/>Metrics & Events")]
        MONGO[("MongoDB<br/>Policies")]
        POSTGRES[("PostgreSQL<br/>Users & Config")]
    end

    subgraph DASHBOARD["📊 Dashboard (Honeycomb)"]
        ANALYTICS["Analytics & KPIs"]
        AGENTS["Agent Monitoring"]
        COSTS["Cost Tracking"]
    end

    GOAL --> GENERATE
    GENERATE --> CONNECTION
    CONNECTION --> TESTGEN
    TESTGEN --> NODE1
    TESTGEN --> NODE2

    NODE1 <--> NODE2
    NODE1 & NODE2 --> HITL

    NODE1 & NODE2 -->|Events| CONTROL
    CONTROL -->|Policies| NODE1 & NODE2
    CONTROL <-->|WebSocket| DASHBOARD

    CONTROL --> STORAGE

    NODE1 & NODE2 -->|Failure Data| EVOLVE
    EVOLVE -->|Updated Graph| GENERATE

    style USER fill:#e8f5e9,stroke:#2e7d32
    style CODING fill:#e3f2fd,stroke:#1565c0
    style WORKERS fill:#fff3e0,stroke:#ef6c00
    style CONTROL fill:#fce4ec,stroke:#c2185b
    style STORAGE fill:#f3e5f5,stroke:#7b1fa2
    style DASHBOARD fill:#e0f7fa,stroke:#00838f
```

### The Aden Advantage

| Traditional Frameworks | Aden |
|------------------------|------|
| Hardcode agent workflows | Describe goals in natural language |
| Manual graph definition | Auto-generated agent graphs |
| Reactive error handling | Proactive self-evolution |
| Static tool configurations | Dynamic SDK-wrapped nodes |
| Separate monitoring setup | Built-in real-time observability |
| DIY budget management | Integrated cost controls & degradation |

### How It Works

1. **Define Your Goal** → Describe what you want to achieve in plain English
2. **Coding Agent Generates** → Creates the agent graph, connection code, and test cases
3. **Workers Execute** → SDK-wrapped nodes run with full observability and tool access
4. **Control Plane Monitors** → Real-time metrics, budget enforcement, policy management
5. **Self-Improve** → On failure, the system evolves the graph and redeploys automatically

## How Aden Compares

Aden takes a fundamentally different approach to agent development. While most frameworks require you to hardcode workflows or manually define agent graphs, Aden uses a **coding agent to generate your entire agent system** from natural language goals. When agents fail, the framework doesn't just log errors—it **automatically evolves the agent graph** and redeploys.

### Comparison Table

| Framework | Category | Approach | Aden Difference |
|-----------|----------|----------|-----------------|
| **LangChain, LlamaIndex, Haystack** | Component Libraries | Predefined components for RAG/LLM apps; manual connection logic | Generates entire graph and connection code upfront |
| **CrewAI, AutoGen, Swarm** | Multi-Agent Orchestration | Role-based agents with predefined collaboration patterns | Dynamically creates agents/connections; adapts on failure |
| **PydanticAI, Mastra, Agno** | Type-Safe Frameworks | Structured outputs and validation for known workflows | Evolving workflows; structure emerges through iteration |
| **Agent Zero, Letta** | Personal AI Assistants | Memory and learning; OS-as-tool or stateful memory focus | Production multi-agent systems with self-healing |
| **CAMEL** | Research Framework | Emergent behavior in large-scale simulations (up to 1M agents) | Production-oriented with reliable execution and recovery |
| **TEN Framework, Genkit** | Infrastructure Frameworks | Real-time multimodal (TEN) or full-stack AI (Genkit) | Higher abstraction—generates and evolves agent logic |
| **GPT Engineer, Motia** | Code Generation | Code from specs (GPT Engineer) or "Step" primitive (Motia) | Self-adapting graphs with automatic failure recovery |
| **Trading Agents** | Domain-Specific | Hardcoded trading firm roles on LangGraph | Domain-agnostic; generates structures for any use case |

### When to Choose Aden

Choose Aden when you need:
- Agents that **self-improve from failures** without manual intervention
- **Goal-driven development** where you describe outcomes, not workflows
- **Production reliability** with automatic recovery and redeployment
- **Rapid iteration** on agent architectures without rewriting code
- **Full observability** with real-time monitoring and human oversight

Choose other frameworks when you need:
- **Type-safe, predictable workflows** (PydanticAI, Mastra)
- **RAG and document processing** (LlamaIndex, Haystack)
- **Research on agent emergence** (CAMEL)
- **Real-time voice/multimodal** (TEN Framework)
- **Simple component chaining** (LangChain, Swarm)

## Project Structure

```
@@ -111,10 +258,26 @@ cd hive && npm run dev

## Roadmap

Aden Agent Framework aims to help developers build outcome-oriented, self-adaptive agents. You can find our roadmap here:

[ROADMAP.md](ROADMAP.md)

```mermaid
timeline
    title Aden Agent Framework Roadmap
    section Foundation
        Architecture : Node-Based Architecture : Python SDK : LLM Integration (OpenAI, Anthropic, Google) : Communication Protocol
        Coding Agent : Goal Creation Session : Worker Agent Creation : MCP Tools Integration
        Worker Agent : Human-in-the-Loop : Callback Handlers : Intervention Points : Streaming Interface
        Tools : File Use : Memory (STM/LTM) : Web Search : Web Scraper : Audit Trail
        Core : Eval System : Pydantic Validation : Docker Deployment : Documentation : Sample Agents
    section Expansion
        Intelligence : Guardrails : Streaming Mode : Semantic Search
        Platform : JavaScript SDK : Custom Tool Integrator : Credential Store
        Deployment : Self-Hosted : Cloud Services : CI/CD Pipeline
        Templates : Sales Agent : Marketing Agent : Analytics Agent : Training Agent : Smart Form Agent
```

## Community & Support

We use [Discord](https://discord.com/invite/MXE49hrKDk) for support, feature requests, and community discussions.
@@ -147,8 +310,74 @@ For security concerns, please see [SECURITY.md](SECURITY.md).

This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details.

## Frequently Asked Questions (FAQ)

**Q: Does Aden depend on LangChain or other agent frameworks?**

No. Aden is built from the ground up with no dependencies on LangChain, CrewAI, or other agent frameworks. The framework is designed to be lean and flexible, generating agent graphs dynamically rather than relying on predefined components.

**Q: What LLM providers does Aden support?**

Aden supports OpenAI (GPT-4, GPT-4o), Anthropic (Claude models), and Google Gemini out of the box. The architecture is provider-agnostic through SDK abstraction, with LiteLLM integration on the roadmap for expanded model support.

**Q: Can I use Aden with local AI models like Ollama?**

Local model support through LiteLLM integration is on our roadmap. The SDK's provider-agnostic design means adding local model support will be straightforward once implemented.

**Q: What makes Aden different from other agent frameworks?**

Aden generates your entire agent system from natural language goals using a coding agent—you don't hardcode workflows or manually define graphs. When agents fail, the framework automatically captures failure data, evolves the agent graph, and redeploys. This self-improving loop is unique to Aden.

**Q: Is Aden open-source?**

Yes, Aden is fully open-source under the Apache License 2.0. We actively encourage community contributions and collaboration.

**Q: Does Aden collect data from users?**

Aden collects telemetry data for monitoring and observability purposes, including token usage, latency metrics, and cost tracking. Content capture (prompts and responses) is configurable and stored with team-scoped data isolation. All data stays within your infrastructure when self-hosted.

**Q: What deployment options does Aden support?**

Aden supports Docker Compose deployment out of the box, with both production and development configurations. Self-hosted deployments work on any infrastructure supporting Docker. Cloud deployment options and Kubernetes-ready configurations are on the roadmap.

**Q: Can Aden handle complex, production-scale use cases?**

Yes. Aden is explicitly designed for production environments with features like automatic failure recovery, real-time observability, cost controls, and horizontal scaling support. The framework handles both simple automations and complex multi-agent workflows.

**Q: Does Aden support human-in-the-loop workflows?**

Yes, Aden fully supports human-in-the-loop workflows through intervention nodes that pause execution for human input. These include configurable timeouts and escalation policies, allowing seamless collaboration between human experts and AI agents.

**Q: What monitoring and debugging tools does Aden provide?**

Aden includes comprehensive observability features: real-time WebSocket streaming for live agent execution monitoring, TimescaleDB-powered analytics for cost and performance metrics, health check endpoints for Kubernetes integration, and 19 MCP tools for budget management, agent status, and policy control.

**Q: What programming languages does Aden support?**

Aden provides SDKs for both Python and JavaScript/TypeScript. The Python SDK includes integration templates for LangGraph, LangFlow, and LiveKit. The backend is Node.js/TypeScript, and the frontend is React/TypeScript.

**Q: Can Aden agents interact with external tools and APIs?**

Yes. Aden's SDK-wrapped nodes provide built-in tool access, and the framework supports flexible tool ecosystems. Agents can integrate with external APIs, databases, and services through the node architecture.

**Q: How does cost control work in Aden?**

Aden provides granular budget controls including spending limits, throttles, and automatic model degradation policies. You can set budgets at the team, agent, or workflow level, with real-time cost tracking and alerts.

**Q: Where can I find examples and documentation?**

Visit [docs.adenhq.com](https://docs.adenhq.com/) for complete guides, API reference, and getting started tutorials. The repository also includes documentation in the `docs/` folder and a comprehensive [DEVELOPER.md](DEVELOPER.md) guide.

**Q: How can I contribute to Aden?**

Contributions are welcome! Fork the repository, create your feature branch, implement your changes, and submit a pull request. See [CONTRIBUTING.md](CONTRIBUTING.md) for detailed guidelines.

**Q: Does Aden offer enterprise support?**

For enterprise inquiries, contact the Aden team through [adenhq.com](https://adenhq.com) or join our [Discord community](https://discord.com/invite/MXE49hrKDk) for support and discussions.

---

<p align="center">
Made with care by the <a href="https://adenhq.com">Aden</a> team
Made with 🔥 Passion in San Francisco
</p>

@@ -1,56 +1,74 @@
Product Roadmap

Aden Agent Framework aims to help developers build outcome-oriented, self-adaptive agents. Our roadmap is summarized below.

```mermaid
timeline
    title Aden Agent Framework Roadmap
    section Foundation
        Architecture : Node-Based Architecture : Python SDK : LLM Integration (OpenAI, Anthropic, Google) : Communication Protocol
        Coding Agent : Goal Creation Session : Worker Agent Creation : MCP Tools Integration
        Worker Agent : Human-in-the-Loop : Callback Handlers : Intervention Points : Streaming Interface
        Tools : File Use : Memory (STM/LTM) : Web Search : Web Scraper : Audit Trail
        Core : Eval System : Pydantic Validation : Docker Deployment : Documentation : Sample Agents
    section Expansion
        Intelligence : Guardrails : Streaming Mode : Semantic Search
        Platform : JavaScript SDK : Custom Tool Integrator : Credential Store
        Deployment : Self-Hosted : Cloud Services : CI/CD Pipeline
        Templates : Sales Agent : Marketing Agent : Analytics Agent : Training Agent : Smart Form Agent
```

---

## Phase 1: Foundation

### Backbone Architecture
- [ ] **Node-Based Architecture (Agent as a node)**
  - [x] Object schema definition
  - [x] Node wrapper SDK
  - [ ] Shared memory access
  - [ ] Default monitoring hooks
  - [ ] Tool access layer
  - [x] LLM integration layer (natively supports all mainstream LLMs through LiteLLM)
    - [x] Anthropic
    - [x] OpenAI
    - [x] Google
- [ ] **Communication protocol between nodes**
- [ ] **[Coding Agent] Goal Creation Session** (separate from coding session)
  - [ ] Instructions that let the coding agent refine the goal over multiple rounds of conversation
  - [x] Goal Object schema definition
  - [ ] Generate test cases for the goal
  - [ ] Test case validation for worker agent (outcome driven)
- [ ] **[Coding Agent] Worker Agent Creation**
  - [x] Coding Agent tools
  - [ ] Use Template Agent as a start
  - [x] Use our MCP tools
- [ ] **[Worker Agent] Human-in-the-Loop**
  - [x] Worker Agents request with questions and options
  - [x] Callback Handler System to receive events throughout execution
  - [ ] Tool-Based Intervention Points (tool to pause execution and request human input)
  - [x] Multiple entrypoints for different event sources (e.g. human input, webhook)
  - [ ] Streaming Interface for Real-time Monitoring
  - [ ] Request State Management

### Essential Tools
- [x] **File Use Tool Kit**
- [ ] **Memory Tools**
  - [x] STM Layer Tool (state-based short-term memory)
  - [x] LTM Layer Tool (RLM - long-term memory)
- [ ] **Infrastructure Tools**
  - [x] Runtime Log Tool (logs for coding agent)
  - [ ] Audit Trail Tool (decision timeline generation)
- [ ] Web Search
- [ ] Web Scraper
- [ ] Recipe for "Add your own tools"

### Memory & File System
- [x] DB for long-term persistent memory (Filesystem as durable scratchpad pattern)
- [x] Session Local memory isolation

### Eval System (Basic)
- [x] Test Driven - Run test cases for every agent iteration
- [ ] Failure recording mechanism
- [ ] SDK for defining failure conditions
- [ ] Basic observability hooks
@@ -59,11 +77,15 @@ timeline
### Data Validation
- [ ] Natively support data validation of LLM outputs with Pydantic (sketched below)
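As a rough sketch of what this item means in practice (the schema below is hypothetical, not part of the SDK), Pydantic can reject malformed LLM output before it flows further into the graph:

```python
from pydantic import BaseModel, ValidationError


class TaskResult(BaseModel):
    # Hypothetical shape of a worker agent's structured output
    summary: str
    confidence: float


def validate_llm_output(raw_json: str) -> TaskResult | None:
    """Return a validated TaskResult, or None if the LLM output does not conform."""
    try:
        return TaskResult.model_validate_json(raw_json)
    except ValidationError:
        return None
```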

### Developer Experience
- [ ] **Debugging mode**
- [ ] **Documentation**
  - [ ] Quick start guide
  - [ ] Goal creation guide
  - [ ] Agent creation guide
  - [ ] GitHub Page setup
  - [ ] README with examples
  - [ ] Contributing guidelines
- [ ] **Distribution**
  - [ ] PyPI package
  - [ ] Docker image on Docker Hub
@@ -75,7 +97,7 @@ timeline

---

## Phase 2: Expansion

### Basic Guardrails
- [ ] Support Basic Monitoring from Agent node SDK
@@ -86,7 +108,7 @@ timeline
- [ ] Streaming mode support

### Cross-Platform
- [ ] JavaScript / TypeScript Version SDK

### File System Enhancement
- [ ] Semantic Search integration
@@ -96,7 +118,7 @@ timeline
- [ ] Custom Tool Integrator
- [ ] Integration as a tool (Credential Store & Support)
- [ ] **Core Agent Tools**
  - [ ] Node Discovery Tool (find other agents in the graph)
  - [ ] HITL Tool (pause execution for human approval)
  - [ ] Wake-up Tool (resume agent tasks)

@@ -117,8 +139,7 @@ timeline
- [ ] All tests must pass for deployment

### Developer Experience Enhancement
- [ ] Tool usage documentation
- [ ] Discord Support Channel

### More Agent Templates
@@ -126,4 +147,4 @@ timeline
- [ ] GTM Marketing Agent (workflow)
- [ ] Analytics Agent
- [ ] Training Agent
- [ ] Smart Entry / Form Agent (self-evolution emphasis)

@@ -0,0 +1,186 @@
# Building Tools for Aden

This guide explains how to create new tools for the Aden agent framework using FastMCP.

## Quick Start Checklist

1. Create folder under `src/aden_tools/tools/<tool_name>/`
2. Implement a `register_tools(mcp: FastMCP)` function using the `@mcp.tool()` decorator
3. Add a `README.md` documenting your tool
4. Register in `src/aden_tools/tools/__init__.py`
5. Add tests in `tests/tools/`

## Tool Structure

Each tool lives in its own folder:

```
src/aden_tools/tools/my_tool/
├── __init__.py    # Export register_tools function
├── my_tool.py     # Tool implementation
└── README.md      # Documentation
```

## Implementation Pattern

Tools use FastMCP's native decorator pattern:

```python
from fastmcp import FastMCP


def register_tools(mcp: FastMCP) -> None:
    """Register my tools with the MCP server."""

    @mcp.tool()
    def my_tool(
        query: str,
        limit: int = 10,
    ) -> dict:
        """
        Search for items matching a query.

        Use this when you need to find specific information.

        Args:
            query: The search query (1-500 chars)
            limit: Maximum number of results (1-100)

        Returns:
            Dict with search results or error dict
        """
        # Validate inputs
        if not query or len(query) > 500:
            return {"error": "Query must be 1-500 characters"}
        if limit < 1 or limit > 100:
            limit = max(1, min(100, limit))

        try:
            # Your implementation here
            results = do_search(query, limit)
            return {
                "query": query,
                "results": results,
                "total": len(results),
            }
        except Exception as e:
            return {"error": f"Search failed: {str(e)}"}
```

## Exporting the Tool

In `src/aden_tools/tools/my_tool/__init__.py`:
```python
from .my_tool import register_tools

__all__ = ["register_tools"]
```

In `src/aden_tools/tools/__init__.py`, add to `_TOOL_MODULES`:
```python
_TOOL_MODULES = [
    # ... existing tools
    "my_tool",
]
```

## Environment Variables

For tools requiring API keys or configuration, check environment variables at runtime:

```python
import os


def register_tools(mcp: FastMCP) -> None:
    @mcp.tool()
    def my_api_tool(query: str) -> dict:
        """Tool that requires an API key."""
        api_key = os.getenv("MY_API_KEY")
        if not api_key:
            return {
                "error": "MY_API_KEY environment variable not set",
                "help": "Get an API key at https://example.com/api",
            }

        # Use the API key...
```

## Best Practices

### Error Handling

Return error dicts instead of raising exceptions:

```python
@mcp.tool()
def my_tool(**kwargs) -> dict:
    try:
        result = do_work()
        return {"success": True, "data": result}
    except SpecificError as e:
        return {"error": f"Failed to process: {str(e)}"}
    except Exception as e:
        return {"error": f"Unexpected error: {str(e)}"}
```

### Return Values

- Return dicts for structured data
- Include relevant metadata (query, total count, etc.)
- Use `{"error": "message"}` for errors

### Documentation

The docstring becomes the tool description in MCP. Include:
- What the tool does
- When to use it
- Args with types and constraints
- What it returns

Every tool folder needs a `README.md` with:
- Description and use cases
- Usage examples
- Argument table
- Environment variables (if any)
- Error handling notes

## Testing

Place tests in `tests/tools/test_my_tool.py`:

```python
import pytest
from fastmcp import FastMCP

from aden_tools.tools.my_tool import register_tools


@pytest.fixture
def mcp():
    """Create a FastMCP instance with tools registered."""
    server = FastMCP("test")
    register_tools(server)
    return server


def test_my_tool_basic(mcp):
    """Test basic tool functionality."""
    tool_fn = mcp._tool_manager._tools["my_tool"].fn
    result = tool_fn(query="test")
    assert "results" in result


def test_my_tool_validation(mcp):
    """Test input validation."""
    tool_fn = mcp._tool_manager._tools["my_tool"].fn
    result = tool_fn(query="")
    assert "error" in result
```

Mock external APIs to keep tests fast and deterministic.
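For instance, if the tool implementation calls a module-level helper (the `do_search` name below is the hypothetical helper from the pattern above), pytest's `monkeypatch` keeps the test offline:

```python
def test_my_tool_mocked(mcp, monkeypatch):
    """Replace the external search call with a canned response."""
    from aden_tools.tools.my_tool import my_tool as my_tool_module

    # do_search is the hypothetical helper used inside the tool implementation
    monkeypatch.setattr(my_tool_module, "do_search", lambda query, limit: ["stub result"])

    tool_fn = mcp._tool_manager._tools["my_tool"].fn
    result = tool_fn(query="anything")
    assert result["results"] == ["stub result"]
```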

## Naming Conventions

- **Folder name**: `snake_case` with `_tool` suffix (e.g., `file_read_tool`)
- **Function name**: `snake_case` (e.g., `file_read`)
- **Tool description**: Clear, actionable docstring
@@ -0,0 +1,29 @@
# Aden Tools MCP Server
# Exposes aden-tools via Model Context Protocol

FROM python:3.11-slim

WORKDIR /app

# Copy project files
COPY pyproject.toml ./
COPY README.md ./
COPY src ./src
COPY mcp_server.py ./

# Install package with all dependencies
RUN pip install --no-cache-dir -e .

# Create non-root user for security
RUN useradd -m -u 1001 appuser && chown -R appuser:appuser /app
USER appuser

# Expose MCP server port
EXPOSE 4001

# Health check - verify server is responding
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
    CMD python -c "import httpx; httpx.get('http://localhost:4001/health').raise_for_status()" || exit 1

# Run MCP server with HTTP transport
CMD ["python", "mcp_server.py"]
@@ -0,0 +1,103 @@
# Aden Tools

Tool library for the Aden agent framework. Provides a collection of tools that AI agents can use to interact with external systems, process data, and perform actions via the Model Context Protocol (MCP).

## Installation

```bash
pip install -e aden-tools
```

For development:
```bash
pip install -e "aden-tools[dev]"
```

## Quick Start

### As an MCP Server

```python
from fastmcp import FastMCP
from aden_tools.tools import register_all_tools

mcp = FastMCP("aden-tools")
register_all_tools(mcp)
mcp.run()
```

Or run directly:
```bash
python mcp_server.py
```
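With the server running, tools can also be called from Python through a FastMCP client. A minimal sketch (the exact URL path depends on your FastMCP version and transport configuration):

```python
import asyncio

from fastmcp import Client


async def main():
    # Assumes mcp_server.py is listening on the default port 4001
    async with Client("http://localhost:4001/mcp") as client:
        tools = await client.list_tools()
        print("available:", [tool.name for tool in tools])

        result = await client.call_tool("file_read", {"file_path": "README.md"})
        print(result)


asyncio.run(main())
```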

## Available Tools

| Tool | Description |
|------|-------------|
| `example_tool` | Template tool demonstrating the pattern |
| `file_read` | Read contents of local files |
| `file_write` | Write content to local files |
| `web_search` | Search the web using Brave Search API |
| `web_scrape` | Scrape and extract content from webpages |
| `pdf_read` | Read and extract text from PDF files |

## Project Structure

```
aden-tools/
├── src/aden_tools/
│   ├── __init__.py          # Main exports
│   ├── utils/               # Utility functions
│   └── tools/               # Tool implementations
│       ├── example_tool/
│       ├── file_read_tool/
│       ├── file_write_tool/
│       ├── web_search_tool/
│       ├── web_scrape_tool/
│       └── pdf_read_tool/
├── tests/                   # Test suite
├── mcp_server.py            # MCP server entry point
├── README.md
├── BUILDING_TOOLS.md        # Tool development guide
└── pyproject.toml
```

## Creating Custom Tools

Tools use FastMCP's native decorator pattern:

```python
from fastmcp import FastMCP


def register_tools(mcp: FastMCP) -> None:
    @mcp.tool()
    def my_tool(query: str, limit: int = 10) -> dict:
        """
        Search for items matching the query.

        Args:
            query: The search query
            limit: Max results to return

        Returns:
            Dict with results or error
        """
        try:
            results = do_search(query, limit)
            return {"results": results, "total": len(results)}
        except Exception as e:
            return {"error": str(e)}
```

See [BUILDING_TOOLS.md](BUILDING_TOOLS.md) for the full guide.

## Documentation

- [Building Tools Guide](BUILDING_TOOLS.md) - How to create new tools
- Individual tool READMEs in `src/aden_tools/tools/*/README.md`

## License

This project is licensed under the Apache License 2.0 - see the [LICENSE](../LICENSE) file for details.

@@ -0,0 +1,79 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Aden Tools MCP Server
|
||||
|
||||
Exposes all aden-tools via Model Context Protocol using FastMCP.
|
||||
|
||||
Usage:
|
||||
# Run with HTTP transport (default, for Docker)
|
||||
python mcp_server.py
|
||||
|
||||
# Run with custom port
|
||||
python mcp_server.py --port 8001
|
||||
|
||||
# Run with STDIO transport (for local testing)
|
||||
python mcp_server.py --stdio
|
||||
|
||||
Environment Variables:
|
||||
MCP_PORT - Server port (default: 4001)
|
||||
BRAVE_SEARCH_API_KEY - Required for web_search tool
|
||||
"""
|
||||
import argparse
|
||||
import os
|
||||
|
||||
from fastmcp import FastMCP
|
||||
from starlette.requests import Request
|
||||
from starlette.responses import PlainTextResponse
|
||||
|
||||
mcp = FastMCP("aden-tools")
|
||||
|
||||
# Register all tools with the MCP server
|
||||
from aden_tools.tools import register_all_tools
|
||||
|
||||
tools = register_all_tools(mcp)
|
||||
print(f"[MCP] Registered {len(tools)} tools: {tools}")
|
||||
|
||||
|
||||
@mcp.custom_route("/health", methods=["GET"])
|
||||
async def health_check(request: Request) -> PlainTextResponse:
|
||||
"""Health check endpoint for container orchestration."""
|
||||
return PlainTextResponse("OK")
|
||||
|
||||
|
||||
@mcp.custom_route("/", methods=["GET"])
|
||||
async def index(request: Request) -> PlainTextResponse:
|
||||
"""Landing page for browser visits."""
|
||||
return PlainTextResponse("Welcome to the Hive MCP Server")
|
||||
|
||||
|
||||
def main() -> None:
|
||||
"""Entry point for the MCP server."""
|
||||
parser = argparse.ArgumentParser(description="Aden Tools MCP Server")
|
||||
parser.add_argument(
|
||||
"--port",
|
||||
type=int,
|
||||
default=int(os.getenv("MCP_PORT", "4001")),
|
||||
help="HTTP server port (default: 4001)",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--host",
|
||||
default="0.0.0.0",
|
||||
help="HTTP server host (default: 0.0.0.0)",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--stdio",
|
||||
action="store_true",
|
||||
help="Use STDIO transport instead of HTTP",
|
||||
)
|
||||
args = parser.parse_args()
|
||||
|
||||
if args.stdio:
|
||||
print("[MCP] Starting with STDIO transport")
|
||||
mcp.run(transport="stdio")
|
||||
else:
|
||||
print(f"[MCP] Starting HTTP server on {args.host}:{args.port}")
|
||||
mcp.run(transport="http", host=args.host, port=args.port)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
@@ -0,0 +1,59 @@
|
||||
[project]
|
||||
name = "aden-tools"
|
||||
version = "0.1.0"
|
||||
description = "Tools library for the Aden agent framework"
|
||||
readme = "README.md"
|
||||
requires-python = ">=3.10"
|
||||
license = { text = "Apache-2.0" }
|
||||
authors = [
|
||||
{ name = "Aden", email = "team@aden.ai" }
|
||||
]
|
||||
keywords = ["ai", "agents", "tools", "llm"]
|
||||
classifiers = [
|
||||
"Development Status :: 3 - Alpha",
|
||||
"Intended Audience :: Developers",
|
||||
"License :: OSI Approved :: Apache Software License",
|
||||
"Programming Language :: Python :: 3",
|
||||
"Programming Language :: Python :: 3.10",
|
||||
"Programming Language :: Python :: 3.11",
|
||||
"Programming Language :: Python :: 3.12",
|
||||
]
|
||||
|
||||
dependencies = [
|
||||
"pydantic>=2.0.0",
|
||||
"httpx>=0.27.0",
|
||||
"beautifulsoup4>=4.12.0",
|
||||
"pypdf>=4.0.0",
|
||||
"pandas>=2.0.0",
|
||||
"jsonpath-ng>=1.6.0",
|
||||
"fastmcp>=2.0.0",
|
||||
]
|
||||
|
||||
[project.optional-dependencies]
|
||||
dev = [
|
||||
"pytest>=7.0.0",
|
||||
"pytest-asyncio>=0.21.0",
|
||||
]
|
||||
sandbox = [
|
||||
"RestrictedPython>=7.0",
|
||||
]
|
||||
ocr = [
|
||||
"pytesseract>=0.3.10",
|
||||
"pillow>=10.0.0",
|
||||
]
|
||||
all = [
|
||||
"RestrictedPython>=7.0",
|
||||
"pytesseract>=0.3.10",
|
||||
"pillow>=10.0.0",
|
||||
]
|
||||
|
||||
[build-system]
|
||||
requires = ["hatchling"]
|
||||
build-backend = "hatchling.build"
|
||||
|
||||
[tool.hatch.build.targets.wheel]
|
||||
packages = ["src/aden_tools"]
|
||||
|
||||
[tool.pytest.ini_options]
|
||||
testpaths = ["tests"]
|
||||
asyncio_mode = "auto"
|
||||
@@ -0,0 +1,30 @@
|
||||
"""
|
||||
Aden Tools - Tool library for the Aden agent framework.
|
||||
|
||||
Tools provide capabilities that AI agents can use to interact with
|
||||
external systems, process data, and perform actions.
|
||||
|
||||
Usage:
|
||||
from fastmcp import FastMCP
|
||||
from aden_tools.tools import register_all_tools
|
||||
|
||||
mcp = FastMCP("my-server")
|
||||
register_all_tools(mcp)
|
||||
"""
|
||||
|
||||
__version__ = "0.1.0"
|
||||
|
||||
# Utilities
|
||||
from .utils import get_env_var
|
||||
|
||||
# MCP registration
|
||||
from .tools import register_all_tools
|
||||
|
||||
__all__ = [
|
||||
# Version
|
||||
"__version__",
|
||||
# Utilities
|
||||
"get_env_var",
|
||||
# MCP registration
|
||||
"register_all_tools",
|
||||
]
|
||||
@@ -0,0 +1,51 @@
|
||||
"""
|
||||
Aden Tools - Tool implementations for FastMCP.
|
||||
|
||||
Usage:
|
||||
from fastmcp import FastMCP
|
||||
from aden_tools.tools import register_all_tools
|
||||
|
||||
mcp = FastMCP("my-server")
|
||||
register_all_tools(mcp)
|
||||
"""
|
||||
from typing import List
|
||||
|
||||
from fastmcp import FastMCP
|
||||
|
||||
# Import register_tools from each tool module
|
||||
from .example_tool import register_tools as register_example
|
||||
from .file_read_tool import register_tools as register_file_read
|
||||
from .file_write_tool import register_tools as register_file_write
|
||||
from .web_search_tool import register_tools as register_web_search
|
||||
from .web_scrape_tool import register_tools as register_web_scrape
|
||||
from .pdf_read_tool import register_tools as register_pdf_read
|
||||
|
||||
|
||||
def register_all_tools(mcp: FastMCP) -> List[str]:
|
||||
"""
|
||||
Register all aden-tools with a FastMCP server.
|
||||
|
||||
Args:
|
||||
mcp: FastMCP server instance
|
||||
|
||||
Returns:
|
||||
List of registered tool names
|
||||
"""
|
||||
register_example(mcp)
|
||||
register_file_read(mcp)
|
||||
register_file_write(mcp)
|
||||
register_web_search(mcp)
|
||||
register_web_scrape(mcp)
|
||||
register_pdf_read(mcp)
|
||||
|
||||
return [
|
||||
"example_tool",
|
||||
"file_read",
|
||||
"file_write",
|
||||
"web_search",
|
||||
"web_scrape",
|
||||
"pdf_read",
|
||||
]
|
||||
|
||||
|
||||
__all__ = ["register_all_tools"]
|
||||
@@ -0,0 +1,26 @@
# Example Tool

A template tool demonstrating the Aden tools pattern.

## Description

This tool processes text messages with optional transformations. It serves as a reference implementation for creating new tools using the FastMCP decorator pattern.

## Arguments

| Argument | Type | Required | Default | Description |
|----------|------|----------|---------|-------------|
| `message` | str | Yes | - | The message to process (1-1000 chars) |
| `uppercase` | bool | No | `False` | Convert message to uppercase |
| `repeat` | int | No | `1` | Number of times to repeat (1-10) |

## Environment Variables

This tool does not require any environment variables.

## Error Handling

Returns error strings for validation issues:
- `Error: message must be 1-1000 characters` - Empty or too long message
- `Error: repeat must be 1-10` - Repeat value out of range
- `Error processing message: <error>` - Unexpected error
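For reference, calling the tool function directly behaves like this (based on the implementation in `example_tool.py`):

```python
example_tool("hello", uppercase=True, repeat=2)
# -> "HELLO HELLO"

example_tool("")
# -> "Error: message must be 1-1000 characters"
```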
|
||||
@@ -0,0 +1,4 @@
|
||||
"""Example Tool package."""
|
||||
from .example_tool import register_tools
|
||||
|
||||
__all__ = ["register_tools"]
|
||||
@@ -0,0 +1,51 @@
|
||||
"""
|
||||
Example Tool - A simple text processing tool for FastMCP.
|
||||
|
||||
Demonstrates native FastMCP tool registration pattern.
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
from fastmcp import FastMCP
|
||||
|
||||
|
||||
def register_tools(mcp: FastMCP) -> None:
|
||||
"""Register example tools with the MCP server."""
|
||||
|
||||
@mcp.tool()
|
||||
def example_tool(
|
||||
message: str,
|
||||
uppercase: bool = False,
|
||||
repeat: int = 1,
|
||||
) -> str:
|
||||
"""
|
||||
A simple example tool that processes text messages.
|
||||
Use this tool when you need to transform or repeat text.
|
||||
|
||||
Args:
|
||||
message: The message to process (1-1000 chars)
|
||||
uppercase: If True, convert the message to uppercase
|
||||
repeat: Number of times to repeat the message (1-10)
|
||||
|
||||
Returns:
|
||||
The processed message string
|
||||
"""
|
||||
try:
|
||||
# Validate inputs
|
||||
if not message or len(message) > 1000:
|
||||
return "Error: message must be 1-1000 characters"
|
||||
if repeat < 1 or repeat > 10:
|
||||
return "Error: repeat must be 1-10"
|
||||
|
||||
# Process the message
|
||||
result = message
|
||||
if uppercase:
|
||||
result = result.upper()
|
||||
|
||||
# Repeat if requested
|
||||
if repeat > 1:
|
||||
result = " ".join([result] * repeat)
|
||||
|
||||
return result
|
||||
|
||||
except Exception as e:
|
||||
return f"Error processing message: {str(e)}"
|
||||
@@ -0,0 +1,28 @@
# File Read Tool

Read contents of local files with encoding support.

## Description

Use for reading configs, data files, source code, logs, or any text file. Returns file content along with path, name, size, and encoding metadata.

## Arguments

| Argument | Type | Required | Default | Description |
|----------|------|----------|---------|-------------|
| `file_path` | str | Yes | - | Path to the file to read (absolute or relative) |
| `encoding` | str | No | `utf-8` | File encoding (utf-8, latin-1, etc.) |
| `max_size` | int | No | `10000000` | Maximum file size to read in bytes (default 10MB) |

## Environment Variables

This tool does not require any environment variables.

## Error Handling

Returns error dicts for common issues:
- `File not found: <path>` - File does not exist
- `Not a file: <path>` - Path points to a directory
- `File too large: <size> bytes (max: <max_size>)` - File exceeds max_size limit
- `Failed to decode file with encoding '<encoding>'` - Wrong encoding specified
- `Permission denied: <path>` - No read access to file
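A successful call returns the content plus metadata, roughly like this (values are illustrative):

```python
file_read("config/settings.json")
# -> {
#     "path": "/abs/path/config/settings.json",
#     "name": "settings.json",
#     "content": "{ ... }",
#     "size": 1287,
#     "encoding": "utf-8",
# }
```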
|
||||
@@ -0,0 +1,4 @@
|
||||
"""File Read Tool - Read contents of local files."""
|
||||
from .file_read_tool import register_tools
|
||||
|
||||
__all__ = ["register_tools"]
|
||||
@@ -0,0 +1,75 @@
|
||||
"""
|
||||
File Read Tool - Read contents of local files.
|
||||
|
||||
Supports reading text files with various encodings.
|
||||
Returns file content along with metadata.
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
from pathlib import Path
|
||||
|
||||
from fastmcp import FastMCP
|
||||
|
||||
|
||||
def register_tools(mcp: FastMCP) -> None:
|
||||
"""Register file read tools with the MCP server."""
|
||||
|
||||
@mcp.tool()
|
||||
def file_read(
|
||||
file_path: str,
|
||||
encoding: str = "utf-8",
|
||||
max_size: int = 10_000_000,
|
||||
) -> dict:
|
||||
"""
|
||||
Read the contents of a local file.
|
||||
|
||||
Use for reading configs, data files, source code, logs, or any text file.
|
||||
Returns file content along with path, name, size, and encoding.
|
||||
|
||||
Args:
|
||||
file_path: Path to the file to read (absolute or relative)
|
||||
encoding: File encoding (utf-8, latin-1, etc.)
|
||||
max_size: Maximum file size to read in bytes (default 10MB)
|
||||
|
||||
Returns:
|
||||
Dict with file content and metadata, or error dict
|
||||
"""
|
||||
try:
|
||||
path = Path(file_path).resolve()
|
||||
|
||||
# Check if file exists
|
||||
if not path.exists():
|
||||
return {"error": f"File not found: {file_path}"}
|
||||
|
||||
# Check if it's a file (not directory)
|
||||
if not path.is_file():
|
||||
return {"error": f"Not a file: {file_path}"}
|
||||
|
||||
# Check file size
|
||||
file_size = path.stat().st_size
|
||||
if max_size > 0 and file_size > max_size:
|
||||
return {
|
||||
"error": f"File too large: {file_size} bytes (max: {max_size})",
|
||||
"file_size": file_size,
|
||||
}
|
||||
|
||||
# Read the file
|
||||
content = path.read_text(encoding=encoding)
|
||||
|
||||
return {
|
||||
"path": str(path),
|
||||
"name": path.name,
|
||||
"content": content,
|
||||
"size": len(content),
|
||||
"encoding": encoding,
|
||||
}
|
||||
|
||||
except UnicodeDecodeError as e:
|
||||
return {
|
||||
"error": f"Failed to decode file with encoding '{encoding}': {str(e)}",
|
||||
"suggestion": "Try a different encoding like 'latin-1' or 'cp1252'",
|
||||
}
|
||||
except PermissionError:
|
||||
return {"error": f"Permission denied: {file_path}"}
|
||||
except Exception as e:
|
||||
return {"error": f"Failed to read file: {str(e)}"}
|
||||
@@ -0,0 +1,29 @@
# File Write Tool

Write content to local files with encoding support.

## Description

Can create new files or overwrite/append to existing ones. Use for saving data, creating configs, writing reports, or exporting results. Optionally creates parent directories if they don't exist.

## Arguments

| Argument | Type | Required | Default | Description |
|----------|------|----------|---------|-------------|
| `file_path` | str | Yes | - | Path to the file to write (absolute or relative) |
| `content` | str | Yes | - | Content to write to the file |
| `encoding` | str | No | `utf-8` | File encoding (utf-8, latin-1, etc.) |
| `mode` | str | No | `write` | Write mode - 'write' (overwrite) or 'append' |
| `create_dirs` | bool | No | `True` | Create parent directories if they don't exist |

## Environment Variables

This tool does not require any environment variables.

## Error Handling

Returns error dicts for common issues:
- `Parent directory does not exist: <path>` - Parent dir missing and create_dirs=False
- `Invalid mode: <mode>. Use 'write' or 'append'.` - Invalid mode specified
- `Permission denied: <path>` - No write access to file/directory
- `OS error writing file: <error>` - Filesystem error
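A successful call returns a summary of what was written, roughly like this (values are illustrative):

```python
file_write("reports/summary.md", "# Weekly Summary\n")
# -> {
#     "path": "/abs/path/reports/summary.md",
#     "name": "summary.md",
#     "bytes_written": 17,
#     "total_size": 17,
#     "mode": "write",
#     "created": True,
#     "previous_size": None,
# }
```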
|
||||
@@ -0,0 +1,4 @@
|
||||
"""File Write Tool - Create or update local files."""
|
||||
from .file_write_tool import register_tools
|
||||
|
||||
__all__ = ["register_tools"]
|
||||
@@ -0,0 +1,83 @@
|
||||
"""
|
||||
File Write Tool - Create or update local files.
|
||||
|
||||
Supports writing text files with various encodings.
|
||||
Can create directories if they don't exist.
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
from pathlib import Path
|
||||
|
||||
from fastmcp import FastMCP
|
||||
|
||||
|
||||
def register_tools(mcp: FastMCP) -> None:
|
||||
"""Register file write tools with the MCP server."""
|
||||
|
||||
@mcp.tool()
|
||||
def file_write(
|
||||
file_path: str,
|
||||
content: str,
|
||||
encoding: str = "utf-8",
|
||||
mode: str = "write",
|
||||
create_dirs: bool = True,
|
||||
) -> dict:
|
||||
"""
|
||||
Write content to a local file.
|
||||
|
||||
Can create new files or overwrite/append to existing ones.
|
||||
Use for saving data, creating configs, writing reports, or exporting results.
|
||||
|
||||
Args:
|
||||
file_path: Path to the file to write (absolute or relative)
|
||||
content: Content to write to the file
|
||||
encoding: File encoding (utf-8, latin-1, etc.)
|
||||
mode: Write mode - 'write' (overwrite) or 'append'
|
||||
create_dirs: Create parent directories if they don't exist
|
||||
|
||||
Returns:
|
||||
Dict with write result or error dict
|
||||
"""
|
||||
try:
|
||||
path = Path(file_path).resolve()
|
||||
|
||||
# Create parent directories if requested
|
||||
if create_dirs:
|
||||
path.parent.mkdir(parents=True, exist_ok=True)
|
||||
elif not path.parent.exists():
|
||||
return {"error": f"Parent directory does not exist: {path.parent}"}
|
||||
|
||||
# Determine write mode
|
||||
if mode == "append":
|
||||
write_mode = "a"
|
||||
elif mode == "write":
|
||||
write_mode = "w"
|
||||
else:
|
||||
return {"error": f"Invalid mode: {mode}. Use 'write' or 'append'."}
|
||||
|
||||
# Check if we're overwriting
|
||||
existed = path.exists()
|
||||
previous_size = path.stat().st_size if existed else 0
|
||||
|
||||
# Write the file
|
||||
with open(path, write_mode, encoding=encoding) as f:
|
||||
f.write(content)
|
||||
|
||||
new_size = path.stat().st_size
|
||||
|
||||
return {
|
||||
"path": str(path),
|
||||
"name": path.name,
|
||||
"bytes_written": len(content.encode(encoding)),
|
||||
"total_size": new_size,
|
||||
"mode": mode,
|
||||
"created": not existed,
|
||||
"previous_size": previous_size if existed else None,
|
||||
}
|
||||
|
||||
except PermissionError:
|
||||
return {"error": f"Permission denied: {file_path}"}
|
||||
except OSError as e:
|
||||
return {"error": f"OS error writing file: {str(e)}"}
|
||||
except Exception as e:
|
||||
return {"error": f"Failed to write file: {str(e)}"}
|
||||
@@ -0,0 +1,37 @@
# PDF Read Tool

Read and extract text content from PDF files.

## Description

Returns text content with page markers and optional metadata. Use for reading PDFs, reports, documents, or any PDF file.

## Arguments

| Argument | Type | Required | Default | Description |
|----------|------|----------|---------|-------------|
| `file_path` | str | Yes | - | Path to the PDF file to read (absolute or relative) |
| `pages` | str | No | `None` | Page range - 'all'/None for all, '5' for single, '1-10' for range, '1,3,5' for specific |
| `max_pages` | int | No | `100` | Maximum pages to process (1-1000, for memory safety) |
| `include_metadata` | bool | No | `True` | Include PDF metadata (author, title, creation date, etc.) |

## Environment Variables

This tool does not require any environment variables.

## Error Handling

Returns error dicts for common issues:
- `PDF file not found: <path>` - File does not exist
- `Not a file: <path>` - Path points to a directory
- `Not a PDF file (expected .pdf): <path>` - Wrong file extension
- `Cannot read encrypted PDF. Password required.` - PDF is password-protected
- `Page <num> out of range. PDF has <total> pages.` - Invalid page number
- `Invalid page format: '<pages>'` - Malformed page range string
- `Permission denied: <path>` - No read access to file

## Notes

- Page numbers in the `pages` argument are 1-indexed (first page is 1, not 0)
- Text is extracted with page markers: `--- Page N ---`
- Metadata includes: title, author, subject, creator, producer, created, modified
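A typical call and result look like this (values are illustrative):

```python
pdf_read("reports/q3.pdf", pages="1-2", include_metadata=False)
# -> {
#     "path": "/abs/path/reports/q3.pdf",
#     "name": "q3.pdf",
#     "total_pages": 12,
#     "pages_extracted": 2,
#     "content": "--- Page 1 ---\n...\n\n--- Page 2 ---\n...",
#     "char_count": 3514,
# }
```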
|
||||
@@ -0,0 +1,4 @@
|
||||
"""PDF Read Tool - Parse and extract text from PDF files."""
|
||||
from .pdf_read_tool import register_tools
|
||||
|
||||
__all__ = ["register_tools"]
|
||||
@@ -0,0 +1,157 @@
|
||||
"""
|
||||
PDF Read Tool - Parse and extract text from PDF files.
|
||||
|
||||
Uses pypdf to read PDF documents and extract text content
|
||||
along with metadata.
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
from pathlib import Path
|
||||
from typing import Any, List
|
||||
|
||||
from fastmcp import FastMCP
|
||||
from pypdf import PdfReader
|
||||
|
||||
|
||||
def register_tools(mcp: FastMCP) -> None:
|
||||
"""Register PDF read tools with the MCP server."""
|
||||
|
||||
def parse_page_range(
|
||||
pages: str | None, total_pages: int, max_pages: int
|
||||
) -> List[int] | dict:
|
||||
"""
|
||||
Parse page range string into list of 0-indexed page numbers.
|
||||
|
||||
Returns list of indices or error dict.
|
||||
"""
|
||||
if pages is None or pages.lower() == "all":
|
||||
indices = list(range(min(total_pages, max_pages)))
|
||||
return indices
|
||||
|
||||
try:
|
||||
# Single page: "5"
|
||||
if pages.isdigit():
|
||||
page_num = int(pages)
|
||||
if page_num < 1 or page_num > total_pages:
|
||||
return {"error": f"Page {page_num} out of range. PDF has {total_pages} pages."}
|
||||
return [page_num - 1]
|
||||
|
||||
# Range: "1-10"
|
||||
if "-" in pages and "," not in pages:
|
||||
start_str, end_str = pages.split("-", 1)
|
||||
start, end = int(start_str), int(end_str)
|
||||
if start > end:
|
||||
return {"error": f"Invalid page range: {pages}. Start must be less than end."}
|
||||
if start < 1:
|
||||
return {"error": f"Page numbers start at 1, got {start}."}
|
||||
if end > total_pages:
|
||||
return {"error": f"Page {end} out of range. PDF has {total_pages} pages."}
|
||||
indices = list(range(start - 1, min(end, start - 1 + max_pages)))
|
||||
return indices
|
||||
|
||||
# Comma-separated: "1,3,5"
|
||||
if "," in pages:
|
||||
page_nums = [int(p.strip()) for p in pages.split(",")]
|
||||
for p in page_nums:
|
||||
if p < 1 or p > total_pages:
|
||||
return {"error": f"Page {p} out of range. PDF has {total_pages} pages."}
|
||||
indices = [p - 1 for p in page_nums[:max_pages]]
|
||||
return indices
|
||||
|
||||
return {"error": f"Invalid page format: '{pages}'. Use 'all', '5', '1-10', or '1,3,5'."}
|
||||
|
||||
except ValueError as e:
|
||||
return {"error": f"Invalid page format: '{pages}'. {str(e)}"}
|
||||
|
||||
@mcp.tool()
|
||||
def pdf_read(
|
||||
file_path: str,
|
||||
pages: str | None = None,
|
||||
max_pages: int = 100,
|
||||
include_metadata: bool = True,
|
||||
) -> dict:
|
||||
"""
|
||||
Read and extract text content from a PDF file.
|
||||
|
||||
Returns text content with page markers and optional metadata.
|
||||
Use for reading PDFs, reports, documents, or any PDF file.
|
||||
|
||||
Args:
|
||||
file_path: Path to the PDF file to read (absolute or relative)
|
||||
pages: Page range to extract - 'all'/None for all, '5' for single, '1-10' for range, '1,3,5' for specific
|
||||
max_pages: Maximum number of pages to process (1-1000, memory safety)
|
||||
include_metadata: Include PDF metadata (author, title, creation date, etc.)
|
||||
|
||||
Returns:
|
||||
Dict with extracted text and metadata, or error dict
|
||||
"""
|
||||
try:
|
||||
path = Path(file_path).resolve()
|
||||
|
||||
# Validate file exists
|
||||
if not path.exists():
|
||||
return {"error": f"PDF file not found: {file_path}"}
|
||||
|
||||
if not path.is_file():
|
||||
return {"error": f"Not a file: {file_path}"}
|
||||
|
||||
# Check extension
|
||||
if path.suffix.lower() != ".pdf":
|
||||
return {"error": f"Not a PDF file (expected .pdf): {file_path}"}
|
||||
|
||||
# Validate max_pages
|
||||
if max_pages < 1:
|
||||
max_pages = 1
|
||||
elif max_pages > 1000:
|
||||
max_pages = 1000
|
||||
|
||||
# Open and read PDF
|
||||
reader = PdfReader(path)
|
||||
|
||||
# Check for encryption
|
||||
if reader.is_encrypted:
|
||||
return {"error": "Cannot read encrypted PDF. Password required."}
|
||||
|
||||
total_pages = len(reader.pages)
|
||||
|
||||
# Parse page range
|
||||
page_indices = parse_page_range(pages, total_pages, max_pages)
|
||||
if isinstance(page_indices, dict): # Error dict
|
||||
return page_indices
|
||||
|
||||
# Extract text from pages
|
||||
content_parts = []
|
||||
for i in page_indices:
|
||||
page_text = reader.pages[i].extract_text() or ""
|
||||
content_parts.append(f"--- Page {i + 1} ---\n{page_text}")
|
||||
|
||||
content = "\n\n".join(content_parts)
|
||||
|
||||
result: dict[str, Any] = {
|
||||
"path": str(path),
|
||||
"name": path.name,
|
||||
"total_pages": total_pages,
|
||||
"pages_extracted": len(page_indices),
|
||||
"content": content,
|
||||
"char_count": len(content),
|
||||
}
|
||||
|
||||
# Add metadata if requested
|
||||
if include_metadata and reader.metadata:
|
||||
meta = reader.metadata
|
||||
result["metadata"] = {
|
||||
"title": meta.get("/Title"),
|
||||
"author": meta.get("/Author"),
|
||||
"subject": meta.get("/Subject"),
|
||||
"creator": meta.get("/Creator"),
|
||||
"producer": meta.get("/Producer"),
|
||||
"created": str(meta.get("/CreationDate")) if meta.get("/CreationDate") else None,
|
||||
"modified": str(meta.get("/ModDate")) if meta.get("/ModDate") else None,
|
||||
}
|
||||
|
||||
return result
|
||||
|
||||
except PermissionError:
|
||||
return {"error": f"Permission denied: {file_path}"}
|
||||
except Exception as e:
|
||||
return {"error": f"Failed to read PDF: {str(e)}"}
|
||||
@@ -0,0 +1,36 @@
|
||||
# Web Scrape Tool
|
||||
|
||||
Scrape and extract text content from webpages.
|
||||
|
||||
## Description
|
||||
|
||||
Use when you need to read the content of a specific URL, extract data from a website, or read articles/documentation. Automatically removes noise elements (scripts, navigation, footers) and extracts the main content.
|
||||
|
||||
## Arguments
|
||||
|
||||
| Argument | Type | Required | Default | Description |
|
||||
|----------|------|----------|---------|-------------|
|
||||
| `url` | str | Yes | - | URL of the webpage to scrape |
|
||||
| `selector` | str | No | `None` | CSS selector to target specific content (e.g., 'article', '.main-content') |
|
||||
| `include_links` | bool | No | `False` | Include extracted links in the response |
|
||||
| `max_length` | int | No | `50000` | Maximum length of extracted text (1000-500000) |
|
||||
|
||||
## Environment Variables
|
||||
|
||||
This tool does not require any environment variables.
|
||||
|
||||
## Error Handling
|
||||
|
||||
Returns error dicts for common issues:
|
||||
- `HTTP <status>: Failed to fetch URL` - Server returned error status
|
||||
- `No elements found matching selector: <selector>` - CSS selector matched nothing
|
||||
- `Request timed out` - Request exceeded 30s timeout
|
||||
- `Network error: <error>` - Connection or DNS issues
|
||||
- `Scraping failed: <error>` - HTML parsing or other error
|
||||
|
||||
## Notes
|
||||
|
||||
- URLs without protocol are automatically prefixed with `https://`
|
||||
- Follows redirects automatically
|
||||
- Removes script, style, nav, footer, header, aside, noscript, and iframe elements
|
||||
- Auto-detects main content using article, main, or common content class selectors
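
## Example

A minimal sketch of calling the tool function directly in-process (the exact MCP client invocation may differ; the URL and selector below are placeholders):

```python
result = web_scrape(
    url="https://example.com/docs",  # placeholder URL
    selector="article",
    include_links=True,
)
# Success: {"url": ..., "title": ..., "description": ..., "content": ..., "length": ..., "links": [...]}
# Failure: {"error": "..."}
```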
|
||||
@@ -0,0 +1,4 @@
|
||||
"""Web Scrape Tool - Extract content from web pages."""
|
||||
from .web_scrape_tool import register_tools
|
||||
|
||||
__all__ = ["register_tools"]
|
||||
@@ -0,0 +1,134 @@
|
||||
"""
|
||||
Web Scrape Tool - Extract content from web pages.
|
||||
|
||||
Uses httpx for requests and BeautifulSoup for HTML parsing.
|
||||
Returns clean text content from web pages.
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
from typing import Any, List
|
||||
|
||||
import httpx
|
||||
from bs4 import BeautifulSoup
|
||||
from fastmcp import FastMCP
|
||||
|
||||
|
||||
def register_tools(mcp: FastMCP) -> None:
|
||||
"""Register web scrape tools with the MCP server."""
|
||||
|
||||
@mcp.tool()
|
||||
def web_scrape(
|
||||
url: str,
|
||||
selector: str | None = None,
|
||||
include_links: bool = False,
|
||||
max_length: int = 50000,
|
||||
) -> dict:
|
||||
"""
|
||||
Scrape and extract text content from a webpage.
|
||||
|
||||
Use when you need to read the content of a specific URL,
|
||||
extract data from a website, or read articles/documentation.
|
||||
|
||||
Args:
|
||||
url: URL of the webpage to scrape
|
||||
selector: CSS selector to target specific content (e.g., 'article', '.main-content')
|
||||
include_links: Include extracted links in the response
|
||||
max_length: Maximum length of extracted text (1000-500000)
|
||||
|
||||
Returns:
|
||||
Dict with scraped content (url, title, description, content, length) or error dict
|
||||
"""
|
||||
try:
|
||||
# Validate URL
|
||||
if not url.startswith(("http://", "https://")):
|
||||
url = "https://" + url
|
||||
|
||||
# Validate max_length
|
||||
if max_length < 1000:
|
||||
max_length = 1000
|
||||
elif max_length > 500000:
|
||||
max_length = 500000
|
||||
|
||||
# Make request
|
||||
response = httpx.get(
|
||||
url,
|
||||
headers={
|
||||
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
|
||||
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
|
||||
"Accept-Language": "en-US,en;q=0.5",
|
||||
},
|
||||
follow_redirects=True,
|
||||
timeout=30.0,
|
||||
)
|
||||
|
||||
if response.status_code != 200:
|
||||
return {"error": f"HTTP {response.status_code}: Failed to fetch URL"}
|
||||
|
||||
# Parse HTML
|
||||
soup = BeautifulSoup(response.text, "html.parser")
|
||||
|
||||
# Remove noise elements
|
||||
for tag in soup(["script", "style", "nav", "footer", "header", "aside", "noscript", "iframe"]):
|
||||
tag.decompose()
|
||||
|
||||
# Get title and description
|
||||
title = ""
|
||||
title_tag = soup.find("title")
|
||||
if title_tag:
|
||||
title = title_tag.get_text(strip=True)
|
||||
|
||||
description = ""
|
||||
meta_desc = soup.find("meta", attrs={"name": "description"})
|
||||
if meta_desc:
|
||||
description = meta_desc.get("content", "")
|
||||
|
||||
# Target content
|
||||
if selector:
|
||||
content_elem = soup.select_one(selector)
|
||||
if not content_elem:
|
||||
return {"error": f"No elements found matching selector: {selector}"}
|
||||
text = content_elem.get_text(separator=" ", strip=True)
|
||||
else:
|
||||
# Auto-detect main content
|
||||
main_content = (
|
||||
soup.find("article")
|
||||
or soup.find("main")
|
||||
or soup.find(attrs={"role": "main"})
|
||||
or soup.find(class_=["content", "post", "entry", "article-body"])
|
||||
or soup.find("body")
|
||||
)
|
||||
text = main_content.get_text(separator=" ", strip=True) if main_content else ""
|
||||
|
||||
# Clean up whitespace
|
||||
text = " ".join(text.split())
|
||||
|
||||
# Truncate if needed
|
||||
if len(text) > max_length:
|
||||
text = text[:max_length] + "..."
|
||||
|
||||
result: dict[str, Any] = {
|
||||
"url": str(response.url),
|
||||
"title": title,
|
||||
"description": description,
|
||||
"content": text,
|
||||
"length": len(text),
|
||||
}
|
||||
|
||||
# Extract links if requested
|
||||
if include_links:
|
||||
links: List[dict[str, str]] = []
|
||||
for a in soup.find_all("a", href=True)[:50]:
|
||||
href = a["href"]
|
||||
link_text = a.get_text(strip=True)
|
||||
if link_text and href:
|
||||
links.append({"text": link_text, "href": href})
|
||||
result["links"] = links
|
||||
|
||||
return result
|
||||
|
||||
except httpx.TimeoutException:
|
||||
return {"error": "Request timed out"}
|
||||
except httpx.RequestError as e:
|
||||
return {"error": f"Network error: {str(e)}"}
|
||||
except Exception as e:
|
||||
return {"error": f"Scraping failed: {str(e)}"}
|
||||
@@ -0,0 +1,31 @@
|
||||
# Web Search Tool
|
||||
|
||||
Search the web using the Brave Search API.
|
||||
|
||||
## Description
|
||||
|
||||
Returns titles, URLs, and snippets for search results. Use it when you need current information, want to research a topic, or need to find websites.
|
||||
|
||||
## Arguments
|
||||
|
||||
| Argument | Type | Required | Default | Description |
|
||||
|----------|------|----------|---------|-------------|
|
||||
| `query` | str | Yes | - | The search query (1-500 chars) |
|
||||
| `num_results` | int | No | `10` | Number of results to return (1-20) |
|
||||
| `country` | str | No | `us` | Country code for localized results (us, uk, de, etc.) |
|
||||
|
||||
## Environment Variables
|
||||
|
||||
| Variable | Required | Description |
|
||||
|----------|----------|-------------|
|
||||
| `BRAVE_SEARCH_API_KEY` | Yes | API key from [Brave Search API](https://brave.com/search/api/) |
|
||||
|
||||
## Error Handling
|
||||
|
||||
Returns error dicts for common issues:
|
||||
- `BRAVE_SEARCH_API_KEY environment variable not set` - Missing API key
|
||||
- `Query must be 1-500 characters` - Empty or too long query
|
||||
- `Invalid API key` - API key rejected (HTTP 401)
|
||||
- `Rate limit exceeded. Try again later.` - Too many requests (HTTP 429)
|
||||
- `Search request timed out` - Request exceeded 30s timeout
|
||||
- `Network error: <error>` - Connection or DNS issues
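
## Example

A minimal sketch of calling the tool function directly in-process, assuming `BRAVE_SEARCH_API_KEY` is set (the exact MCP client invocation may differ):

```python
result = web_search(query="ai agent frameworks", num_results=5, country="us")
# Success: {"query": "...", "results": [{"title": ..., "url": ..., "snippet": ...}, ...], "total": <number of results>}
# Failure: {"error": "..."}
```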
|
||||
@@ -0,0 +1,4 @@
|
||||
"""Web Search Tool - Search the web using Brave Search API."""
|
||||
from .web_search_tool import register_tools
|
||||
|
||||
__all__ = ["register_tools"]
|
||||
@@ -0,0 +1,100 @@
|
||||
"""
|
||||
Web Search Tool - Search the web using Brave Search API.
|
||||
|
||||
Requires BRAVE_SEARCH_API_KEY environment variable.
|
||||
Returns search results with titles, URLs, and snippets.
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import os
|
||||
|
||||
import httpx
|
||||
from fastmcp import FastMCP
|
||||
|
||||
|
||||
def register_tools(mcp: FastMCP) -> None:
|
||||
"""Register web search tools with the MCP server."""
|
||||
|
||||
@mcp.tool()
|
||||
def web_search(
|
||||
query: str,
|
||||
num_results: int = 10,
|
||||
country: str = "us",
|
||||
) -> dict:
|
||||
"""
|
||||
Search the web for information using Brave Search API.
|
||||
|
||||
Returns titles, URLs, and snippets. Use when you need current
information, want to research a topic, or need to find websites.
|
||||
|
||||
Requires BRAVE_SEARCH_API_KEY environment variable.
|
||||
|
||||
Args:
|
||||
query: The search query (1-500 chars)
|
||||
num_results: Number of results to return (1-20)
|
||||
country: Country code for localized results (us, uk, de, etc.)
|
||||
|
||||
Returns:
|
||||
Dict with search results or error dict
|
||||
"""
|
||||
api_key = os.getenv("BRAVE_SEARCH_API_KEY")
|
||||
if not api_key:
|
||||
return {
|
||||
"error": "BRAVE_SEARCH_API_KEY environment variable not set",
|
||||
"help": "Get an API key at https://brave.com/search/api/",
|
||||
}
|
||||
|
||||
# Validate inputs
|
||||
if not query or len(query) > 500:
|
||||
return {"error": "Query must be 1-500 characters"}
|
||||
if num_results < 1 or num_results > 20:
|
||||
num_results = max(1, min(20, num_results))
|
||||
|
||||
try:
|
||||
# Make request to Brave Search API
|
||||
response = httpx.get(
|
||||
"https://api.search.brave.com/res/v1/web/search",
|
||||
params={
|
||||
"q": query,
|
||||
"count": num_results,
|
||||
"country": country,
|
||||
},
|
||||
headers={
|
||||
"X-Subscription-Token": api_key,
|
||||
"Accept": "application/json",
|
||||
},
|
||||
timeout=30.0,
|
||||
)
|
||||
|
||||
if response.status_code == 401:
|
||||
return {"error": "Invalid API key"}
|
||||
elif response.status_code == 429:
|
||||
return {"error": "Rate limit exceeded. Try again later."}
|
||||
elif response.status_code != 200:
|
||||
return {"error": f"API request failed: HTTP {response.status_code}"}
|
||||
|
||||
data = response.json()
|
||||
|
||||
# Extract results
|
||||
results = []
|
||||
web_results = data.get("web", {}).get("results", [])
|
||||
|
||||
for item in web_results[:num_results]:
|
||||
results.append({
|
||||
"title": item.get("title", ""),
|
||||
"url": item.get("url", ""),
|
||||
"snippet": item.get("description", ""),
|
||||
})
|
||||
|
||||
return {
|
||||
"query": query,
|
||||
"results": results,
|
||||
"total": len(results),
|
||||
}
|
||||
|
||||
except httpx.TimeoutException:
|
||||
return {"error": "Search request timed out"}
|
||||
except httpx.RequestError as e:
|
||||
return {"error": f"Network error: {str(e)}"}
|
||||
except Exception as e:
|
||||
return {"error": f"Search failed: {str(e)}"}
|
||||
@@ -0,0 +1,6 @@
|
||||
"""
|
||||
Utility functions for Aden Tools.
|
||||
"""
|
||||
from .env_helpers import get_env_var
|
||||
|
||||
__all__ = ["get_env_var"]
|
||||
@@ -0,0 +1,35 @@
|
||||
"""
|
||||
Environment variable helpers for Aden Tools.
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import os
|
||||
from typing import Optional
|
||||
|
||||
|
||||
def get_env_var(
|
||||
name: str,
|
||||
default: Optional[str] = None,
|
||||
required: bool = False,
|
||||
) -> Optional[str]:
|
||||
"""
|
||||
Get an environment variable with optional default and required validation.
|
||||
|
||||
Args:
|
||||
name: Name of the environment variable
|
||||
default: Default value if not set
|
||||
required: If True, raises ValueError when not set and no default
|
||||
|
||||
Returns:
|
||||
The environment variable value or default
|
||||
|
||||
Raises:
|
||||
ValueError: If required=True and variable is not set with no default
|
||||
"""
|
||||
value = os.environ.get(name, default)
|
||||
if required and value is None:
|
||||
raise ValueError(
|
||||
f"Required environment variable '{name}' is not set. "
|
||||
f"Please set it before using this tool."
|
||||
)
|
||||
return value
|
||||
@@ -0,0 +1 @@
|
||||
"""Aden Tools test suite."""
|
||||
@@ -0,0 +1,43 @@
|
||||
"""Shared fixtures for aden-tools tests."""
|
||||
import pytest
|
||||
from pathlib import Path
|
||||
|
||||
from fastmcp import FastMCP
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def mcp() -> FastMCP:
|
||||
"""Create a fresh FastMCP instance for testing."""
|
||||
return FastMCP("test-server")
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def sample_text_file(tmp_path: Path) -> Path:
|
||||
"""Create a simple text file for testing."""
|
||||
txt_file = tmp_path / "test.txt"
|
||||
txt_file.write_text("Hello, World!\nLine 2\nLine 3")
|
||||
return txt_file
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def sample_csv(tmp_path: Path) -> Path:
|
||||
"""Create a simple CSV file for testing."""
|
||||
csv_file = tmp_path / "test.csv"
|
||||
csv_file.write_text("name,age,city\nAlice,30,NYC\nBob,25,LA\nCharlie,35,Chicago\n")
|
||||
return csv_file
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def sample_json(tmp_path: Path) -> Path:
|
||||
"""Create a simple JSON file for testing."""
|
||||
json_file = tmp_path / "test.json"
|
||||
json_file.write_text('{"users": [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]}')
|
||||
return json_file
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def large_text_file(tmp_path: Path) -> Path:
|
||||
"""Create a large text file for size limit testing."""
|
||||
large_file = tmp_path / "large.txt"
|
||||
large_file.write_text("x" * 20_000_000) # 20MB
|
||||
return large_file
|
||||
@@ -0,0 +1,50 @@
|
||||
"""Tests for environment variable helpers."""
|
||||
import pytest
|
||||
|
||||
from aden_tools.utils import get_env_var
|
||||
|
||||
|
||||
class TestGetEnvVar:
|
||||
"""Tests for get_env_var function."""
|
||||
|
||||
def test_returns_value_when_set(self, monkeypatch):
|
||||
"""Returns the environment variable value when set."""
|
||||
monkeypatch.setenv("TEST_VAR", "test_value")
|
||||
|
||||
result = get_env_var("TEST_VAR")
|
||||
|
||||
assert result == "test_value"
|
||||
|
||||
def test_returns_default_when_not_set(self, monkeypatch):
|
||||
"""Returns default value when variable is not set."""
|
||||
monkeypatch.delenv("UNSET_VAR", raising=False)
|
||||
|
||||
result = get_env_var("UNSET_VAR", default="default_value")
|
||||
|
||||
assert result == "default_value"
|
||||
|
||||
def test_returns_none_when_not_set_and_no_default(self, monkeypatch):
|
||||
"""Returns None when variable is not set and no default provided."""
|
||||
monkeypatch.delenv("UNSET_VAR", raising=False)
|
||||
|
||||
result = get_env_var("UNSET_VAR")
|
||||
|
||||
assert result is None
|
||||
|
||||
def test_raises_when_required_and_missing(self, monkeypatch):
|
||||
"""Raises ValueError when required=True and variable is missing."""
|
||||
monkeypatch.delenv("REQUIRED_VAR", raising=False)
|
||||
|
||||
with pytest.raises(ValueError) as exc_info:
|
||||
get_env_var("REQUIRED_VAR", required=True)
|
||||
|
||||
assert "REQUIRED_VAR" in str(exc_info.value)
|
||||
assert "not set" in str(exc_info.value)
|
||||
|
||||
def test_returns_value_when_required_and_set(self, monkeypatch):
|
||||
"""Returns value when required=True and variable is set."""
|
||||
monkeypatch.setenv("REQUIRED_VAR", "my_value")
|
||||
|
||||
result = get_env_var("REQUIRED_VAR", required=True)
|
||||
|
||||
assert result == "my_value"
|
||||
@@ -0,0 +1 @@
|
||||
"""Tool-specific tests."""
|
||||
@@ -0,0 +1,96 @@
|
||||
"""Tests for file_read tool (FastMCP)."""
|
||||
import pytest
|
||||
from pathlib import Path
|
||||
|
||||
from fastmcp import FastMCP
|
||||
from aden_tools.tools.file_read_tool import register_tools
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def file_read_fn(mcp: FastMCP):
|
||||
"""Register and return the file_read tool function."""
|
||||
register_tools(mcp)
|
||||
# Access the registered tool's function directly
|
||||
return mcp._tool_manager._tools["file_read"].fn
|
||||
|
||||
|
||||
class TestFileReadTool:
|
||||
"""Tests for file_read tool."""
|
||||
|
||||
def test_read_existing_file(self, file_read_fn, sample_text_file: Path):
|
||||
"""Reading an existing file returns content and metadata."""
|
||||
result = file_read_fn(file_path=str(sample_text_file))
|
||||
|
||||
assert "error" not in result
|
||||
assert result["content"] == "Hello, World!\nLine 2\nLine 3"
|
||||
assert result["name"] == "test.txt"
|
||||
assert result["encoding"] == "utf-8"
|
||||
assert "size" in result
|
||||
|
||||
def test_read_file_not_found(self, file_read_fn, tmp_path: Path):
|
||||
"""Reading a non-existent file returns an error dict."""
|
||||
missing_file = tmp_path / "does_not_exist.txt"
|
||||
|
||||
result = file_read_fn(file_path=str(missing_file))
|
||||
|
||||
assert "error" in result
|
||||
assert "not found" in result["error"].lower()
|
||||
|
||||
def test_read_directory_returns_error(self, file_read_fn, tmp_path: Path):
|
||||
"""Reading a directory (not a file) returns an error."""
|
||||
result = file_read_fn(file_path=str(tmp_path))
|
||||
|
||||
assert "error" in result
|
||||
assert "not a file" in result["error"].lower()
|
||||
|
||||
def test_read_file_too_large(self, file_read_fn, tmp_path: Path):
|
||||
"""Reading a file exceeding max_size returns an error."""
|
||||
large_file = tmp_path / "large.txt"
|
||||
large_file.write_text("x" * 1000)
|
||||
|
||||
result = file_read_fn(file_path=str(large_file), max_size=100)
|
||||
|
||||
assert "error" in result
|
||||
assert "too large" in result["error"].lower()
|
||||
assert "file_size" in result
|
||||
|
||||
def test_read_with_no_size_limit(self, file_read_fn, tmp_path: Path):
|
||||
"""Reading with max_size=0 allows any file size."""
|
||||
large_file = tmp_path / "large.txt"
|
||||
content = "x" * 100_000
|
||||
large_file.write_text(content)
|
||||
|
||||
# max_size=0 means no limit in the implementation
|
||||
result = file_read_fn(file_path=str(large_file), max_size=0)
|
||||
|
||||
assert "error" not in result
|
||||
assert result["content"] == content
|
||||
|
||||
def test_read_with_different_encoding(self, file_read_fn, tmp_path: Path):
|
||||
"""Reading with a specific encoding works."""
|
||||
latin_file = tmp_path / "latin.txt"
|
||||
# Write bytes directly with latin-1 encoding
|
||||
latin_file.write_bytes("café".encode("latin-1"))
|
||||
|
||||
result = file_read_fn(file_path=str(latin_file), encoding="latin-1")
|
||||
|
||||
assert "error" not in result
|
||||
assert result["content"] == "café"
|
||||
assert result["encoding"] == "latin-1"
|
||||
|
||||
def test_read_with_wrong_encoding_returns_error(self, file_read_fn, tmp_path: Path):
|
||||
"""Reading with wrong encoding returns helpful error."""
|
||||
# Create a file with bytes that aren't valid UTF-8
|
||||
binary_file = tmp_path / "binary.txt"
|
||||
binary_file.write_bytes(b"\xff\xfe")
|
||||
|
||||
result = file_read_fn(file_path=str(binary_file), encoding="utf-8")
|
||||
|
||||
assert "error" in result
|
||||
assert "suggestion" in result
|
||||
|
||||
def test_returns_absolute_path(self, file_read_fn, sample_text_file: Path):
|
||||
"""Result includes the absolute path."""
|
||||
result = file_read_fn(file_path=str(sample_text_file))
|
||||
|
||||
assert result["path"] == str(sample_text_file.resolve())
|
||||
@@ -0,0 +1,99 @@
|
||||
"""Tests for file_write tool (FastMCP)."""
|
||||
import pytest
|
||||
from pathlib import Path
|
||||
|
||||
from fastmcp import FastMCP
|
||||
from aden_tools.tools.file_write_tool import register_tools
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def file_write_fn(mcp: FastMCP):
|
||||
"""Register and return the file_write tool function."""
|
||||
register_tools(mcp)
|
||||
return mcp._tool_manager._tools["file_write"].fn
|
||||
|
||||
|
||||
class TestFileWriteTool:
|
||||
"""Tests for file_write tool."""
|
||||
|
||||
def test_write_creates_new_file(self, file_write_fn, tmp_path: Path):
|
||||
"""Writing to a new file creates it with content."""
|
||||
new_file = tmp_path / "new.txt"
|
||||
|
||||
result = file_write_fn(file_path=str(new_file), content="Hello, World!")
|
||||
|
||||
assert "error" not in result
|
||||
assert result["created"] is True
|
||||
assert result["name"] == "new.txt"
|
||||
assert new_file.read_text() == "Hello, World!"
|
||||
|
||||
def test_write_overwrites_existing(self, file_write_fn, tmp_path: Path):
|
||||
"""Writing to existing file overwrites by default."""
|
||||
existing = tmp_path / "existing.txt"
|
||||
existing.write_text("old content")
|
||||
|
||||
result = file_write_fn(file_path=str(existing), content="new content")
|
||||
|
||||
assert "error" not in result
|
||||
assert result["created"] is False
|
||||
assert result["previous_size"] is not None
|
||||
assert existing.read_text() == "new content"
|
||||
|
||||
def test_write_appends_to_existing(self, file_write_fn, tmp_path: Path):
|
||||
"""Writing with mode='append' adds to existing content."""
|
||||
existing = tmp_path / "existing.txt"
|
||||
existing.write_text("line1\n")
|
||||
|
||||
result = file_write_fn(file_path=str(existing), content="line2\n", mode="append")
|
||||
|
||||
assert "error" not in result
|
||||
assert result["mode"] == "append"
|
||||
assert existing.read_text() == "line1\nline2\n"
|
||||
|
||||
def test_write_creates_parent_dirs(self, file_write_fn, tmp_path: Path):
|
||||
"""Writing with create_dirs=True creates missing directories."""
|
||||
deep_path = tmp_path / "nested" / "dirs" / "file.txt"
|
||||
|
||||
result = file_write_fn(file_path=str(deep_path), content="content", create_dirs=True)
|
||||
|
||||
assert "error" not in result
|
||||
assert deep_path.exists()
|
||||
assert deep_path.read_text() == "content"
|
||||
|
||||
def test_write_fails_without_parent_dir(self, file_write_fn, tmp_path: Path):
|
||||
"""Writing with create_dirs=False fails if parent doesn't exist."""
|
||||
missing_dir = tmp_path / "missing" / "file.txt"
|
||||
|
||||
result = file_write_fn(file_path=str(missing_dir), content="content", create_dirs=False)
|
||||
|
||||
assert "error" in result
|
||||
assert "parent directory" in result["error"].lower()
|
||||
|
||||
def test_write_invalid_mode(self, file_write_fn, tmp_path: Path):
|
||||
"""Writing with invalid mode returns error."""
|
||||
result = file_write_fn(
|
||||
file_path=str(tmp_path / "test.txt"),
|
||||
content="content",
|
||||
mode="invalid"
|
||||
)
|
||||
|
||||
assert "error" in result
|
||||
assert "invalid mode" in result["error"].lower()
|
||||
|
||||
def test_write_returns_bytes_written(self, file_write_fn, tmp_path: Path):
|
||||
"""Result includes accurate bytes_written count."""
|
||||
content = "Hello, World!"
|
||||
|
||||
result = file_write_fn(file_path=str(tmp_path / "test.txt"), content=content)
|
||||
|
||||
assert result["bytes_written"] == len(content.encode("utf-8"))
|
||||
|
||||
def test_write_with_encoding(self, file_write_fn, tmp_path: Path):
|
||||
"""Writing with specific encoding works."""
|
||||
file_path = tmp_path / "latin.txt"
|
||||
|
||||
result = file_write_fn(file_path=str(file_path), content="café", encoding="latin-1")
|
||||
|
||||
assert "error" not in result
|
||||
# Verify it was written with latin-1 encoding
|
||||
assert file_path.read_bytes() == "café".encode("latin-1")
|
||||
@@ -0,0 +1,80 @@
|
||||
"""Tests for pdf_read tool (FastMCP)."""
|
||||
import pytest
|
||||
from pathlib import Path
|
||||
|
||||
from fastmcp import FastMCP
|
||||
from aden_tools.tools.pdf_read_tool import register_tools
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def pdf_read_fn(mcp: FastMCP):
|
||||
"""Register and return the pdf_read tool function."""
|
||||
register_tools(mcp)
|
||||
return mcp._tool_manager._tools["pdf_read"].fn
|
||||
|
||||
|
||||
class TestPdfReadTool:
|
||||
"""Tests for pdf_read tool."""
|
||||
|
||||
def test_read_pdf_file_not_found(self, pdf_read_fn, tmp_path: Path):
|
||||
"""Reading non-existent PDF returns error."""
|
||||
result = pdf_read_fn(file_path=str(tmp_path / "missing.pdf"))
|
||||
|
||||
assert "error" in result
|
||||
assert "not found" in result["error"].lower()
|
||||
|
||||
def test_read_pdf_invalid_extension(self, pdf_read_fn, tmp_path: Path):
|
||||
"""Reading non-PDF file returns error."""
|
||||
txt_file = tmp_path / "test.txt"
|
||||
txt_file.write_text("not a pdf")
|
||||
|
||||
result = pdf_read_fn(file_path=str(txt_file))
|
||||
|
||||
assert "error" in result
|
||||
assert "not a pdf" in result["error"].lower()
|
||||
|
||||
def test_read_pdf_directory(self, pdf_read_fn, tmp_path: Path):
|
||||
"""Reading a directory returns error."""
|
||||
result = pdf_read_fn(file_path=str(tmp_path))
|
||||
|
||||
assert "error" in result
|
||||
assert "not a file" in result["error"].lower()
|
||||
|
||||
def test_max_pages_clamped_low(self, pdf_read_fn, tmp_path: Path):
|
||||
"""max_pages below 1 is clamped to 1."""
|
||||
pdf_file = tmp_path / "test.pdf"
|
||||
pdf_file.write_bytes(b"%PDF-1.4") # Minimal PDF header (will fail to parse)
|
||||
|
||||
result = pdf_read_fn(file_path=str(pdf_file), max_pages=0)
|
||||
# Will error due to invalid PDF, but max_pages should be accepted
|
||||
assert isinstance(result, dict)
|
||||
|
||||
def test_max_pages_clamped_high(self, pdf_read_fn, tmp_path: Path):
|
||||
"""max_pages above 1000 is clamped to 1000."""
|
||||
pdf_file = tmp_path / "test.pdf"
|
||||
pdf_file.write_bytes(b"%PDF-1.4")
|
||||
|
||||
result = pdf_read_fn(file_path=str(pdf_file), max_pages=2000)
|
||||
# Will error due to invalid PDF, but max_pages should be accepted
|
||||
assert isinstance(result, dict)
|
||||
|
||||
def test_pages_parameter_accepted(self, pdf_read_fn, tmp_path: Path):
|
||||
"""Various pages parameter formats are accepted."""
|
||||
pdf_file = tmp_path / "test.pdf"
|
||||
pdf_file.write_bytes(b"%PDF-1.4")
|
||||
|
||||
# Test different page formats - all should be accepted
|
||||
for pages in ["all", "1", "1-5", "1,3,5", None]:
|
||||
result = pdf_read_fn(file_path=str(pdf_file), pages=pages)
|
||||
assert isinstance(result, dict)
|
||||
|
||||
def test_include_metadata_parameter(self, pdf_read_fn, tmp_path: Path):
|
||||
"""include_metadata parameter is accepted."""
|
||||
pdf_file = tmp_path / "test.pdf"
|
||||
pdf_file.write_bytes(b"%PDF-1.4")
|
||||
|
||||
result = pdf_read_fn(file_path=str(pdf_file), include_metadata=False)
|
||||
assert isinstance(result, dict)
|
||||
|
||||
result = pdf_read_fn(file_path=str(pdf_file), include_metadata=True)
|
||||
assert isinstance(result, dict)
|
||||
@@ -0,0 +1,52 @@
|
||||
"""Tests for web_scrape tool (FastMCP)."""
|
||||
import pytest
|
||||
|
||||
from fastmcp import FastMCP
|
||||
from aden_tools.tools.web_scrape_tool import register_tools
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def web_scrape_fn(mcp: FastMCP):
|
||||
"""Register and return the web_scrape tool function."""
|
||||
register_tools(mcp)
|
||||
return mcp._tool_manager._tools["web_scrape"].fn
|
||||
|
||||
|
||||
class TestWebScrapeTool:
|
||||
"""Tests for web_scrape tool."""
|
||||
|
||||
def test_url_auto_prefixed_with_https(self, web_scrape_fn):
|
||||
"""URLs without scheme get https:// prefix."""
|
||||
# The request may or may not reach example.com; we only verify the URL handling
|
||||
result = web_scrape_fn(url="example.com")
|
||||
# Should either succeed or have a network error (not a validation error)
|
||||
assert isinstance(result, dict)
|
||||
|
||||
def test_max_length_clamped_low(self, web_scrape_fn):
|
||||
"""max_length below 1000 is clamped to 1000."""
|
||||
# Test with a very low max_length - implementation clamps to 1000
|
||||
result = web_scrape_fn(url="https://example.com", max_length=500)
|
||||
# Should not error due to invalid max_length
|
||||
assert isinstance(result, dict)
|
||||
|
||||
def test_max_length_clamped_high(self, web_scrape_fn):
|
||||
"""max_length above 500000 is clamped to 500000."""
|
||||
# Test with a very high max_length - implementation clamps to 500000
|
||||
result = web_scrape_fn(url="https://example.com", max_length=600000)
|
||||
# Should not error due to invalid max_length
|
||||
assert isinstance(result, dict)
|
||||
|
||||
def test_valid_max_length_accepted(self, web_scrape_fn):
|
||||
"""Valid max_length values are accepted."""
|
||||
result = web_scrape_fn(url="https://example.com", max_length=10000)
|
||||
assert isinstance(result, dict)
|
||||
|
||||
def test_include_links_option(self, web_scrape_fn):
|
||||
"""include_links parameter is accepted."""
|
||||
result = web_scrape_fn(url="https://example.com", include_links=True)
|
||||
assert isinstance(result, dict)
|
||||
|
||||
def test_selector_option(self, web_scrape_fn):
|
||||
"""selector parameter is accepted."""
|
||||
result = web_scrape_fn(url="https://example.com", selector=".content")
|
||||
assert isinstance(result, dict)
|
||||
@@ -0,0 +1,57 @@
|
||||
"""Tests for web_search tool (FastMCP)."""
|
||||
import pytest
|
||||
|
||||
from fastmcp import FastMCP
|
||||
from aden_tools.tools.web_search_tool import register_tools
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def web_search_fn(mcp: FastMCP):
|
||||
"""Register and return the web_search tool function."""
|
||||
register_tools(mcp)
|
||||
return mcp._tool_manager._tools["web_search"].fn
|
||||
|
||||
|
||||
class TestWebSearchTool:
|
||||
"""Tests for web_search tool."""
|
||||
|
||||
def test_search_missing_api_key(self, web_search_fn, monkeypatch):
|
||||
"""Search without API key returns helpful error."""
|
||||
monkeypatch.delenv("BRAVE_SEARCH_API_KEY", raising=False)
|
||||
|
||||
result = web_search_fn(query="test query")
|
||||
|
||||
assert "error" in result
|
||||
assert "BRAVE_SEARCH_API_KEY" in result["error"]
|
||||
assert "help" in result
|
||||
|
||||
def test_empty_query_returns_error(self, web_search_fn, monkeypatch):
|
||||
"""Empty query returns error."""
|
||||
monkeypatch.setenv("BRAVE_SEARCH_API_KEY", "test-key")
|
||||
|
||||
result = web_search_fn(query="")
|
||||
|
||||
assert "error" in result
|
||||
assert "1-500" in result["error"].lower() or "character" in result["error"].lower()
|
||||
|
||||
def test_long_query_returns_error(self, web_search_fn, monkeypatch):
|
||||
"""Query exceeding 500 chars returns error."""
|
||||
monkeypatch.setenv("BRAVE_SEARCH_API_KEY", "test-key")
|
||||
|
||||
result = web_search_fn(query="x" * 501)
|
||||
|
||||
assert "error" in result
|
||||
|
||||
def test_num_results_clamped_to_valid_range(self, web_search_fn, monkeypatch):
|
||||
"""num_results outside 1-20 is clamped (not error)."""
|
||||
monkeypatch.setenv("BRAVE_SEARCH_API_KEY", "test-key")
|
||||
|
||||
# Test that the function handles out-of-range values gracefully
|
||||
# The implementation clamps values, so we just verify it doesn't crash
|
||||
# (actual API call would fail with invalid key, but that's expected)
|
||||
result = web_search_fn(query="test", num_results=0)
|
||||
# Should either clamp or error - both are acceptable
|
||||
assert isinstance(result, dict)
|
||||
|
||||
result = web_search_fn(query="test", num_results=100)
|
||||
assert isinstance(result, dict)
|
||||
@@ -131,6 +131,31 @@ services:
|
||||
networks:
|
||||
- honeycomb-network
|
||||
|
||||
# Aden Tools MCP Server - Python tools via Model Context Protocol
|
||||
aden-tools-mcp:
|
||||
build:
|
||||
context: ./aden-tools
|
||||
container_name: honeycomb-aden-tools-mcp
|
||||
ports:
|
||||
- "${ADEN_TOOLS_MCP_PORT:-4001}:4001"
|
||||
environment:
|
||||
- MCP_PORT=4001
|
||||
# Pass through tool-specific env vars
|
||||
- BRAVE_SEARCH_API_KEY=${BRAVE_SEARCH_API_KEY:-}
|
||||
volumes:
|
||||
- .:/workspace:rw # Mount project root for file access
|
||||
working_dir: /workspace # Set working directory so relative paths work
|
||||
command: ["python", "/app/mcp_server.py"] # Use absolute path since working_dir changed
|
||||
healthcheck:
|
||||
test: ["CMD", "python", "-c", "import httpx; httpx.get('http://localhost:4001/health').raise_for_status()"]
|
||||
interval: 30s
|
||||
timeout: 5s
|
||||
retries: 5
|
||||
start_period: 10s
|
||||
restart: unless-stopped
|
||||
networks:
|
||||
- honeycomb-network
|
||||
|
||||
networks:
|
||||
honeycomb-network:
|
||||
driver: bridge
|
||||
|
||||
@@ -0,0 +1,30 @@
|
||||
# Aden Listicles & Comparisons
|
||||
|
||||
Educational content comparing AI agent frameworks and exploring the agent development landscape.
|
||||
|
||||
## Articles
|
||||
|
||||
| Article | Topic | Keywords |
|
||||
|---------|-------|----------|
|
||||
| [Top 10 AI Agent Frameworks in 2025](./top-10-ai-agent-frameworks-2025.md) | Overview | ai agents, frameworks, comparison |
|
||||
| [Aden vs LangChain](./aden-vs-langchain.md) | Comparison | langchain, rag, llm apps |
|
||||
| [Aden vs CrewAI](./aden-vs-crewai.md) | Comparison | crewai, multi-agent, orchestration |
|
||||
| [Aden vs AutoGen](./aden-vs-autogen.md) | Comparison | autogen, microsoft, conversational |
|
||||
| [Self-Improving vs Static Agents](./self-improving-vs-static-agents.md) | Concept | self-evolution, adaptation |
|
||||
| [Human-in-the-Loop Guide](./human-in-the-loop-ai-agents.md) | Guide | hitl, human oversight, safety |
|
||||
| [AI Agent Cost Management](./ai-agent-cost-management-guide.md) | Guide | cost control, budget, optimization |
|
||||
| [Building Production AI Agents](./building-production-ai-agents.md) | Guide | production, deployment, reliability |
|
||||
| [Multi-Agent vs Single-Agent](./multi-agent-vs-single-agent-systems.md) | Concept | architecture, design patterns |
|
||||
| [AI Agent Observability](./ai-agent-observability-monitoring.md) | Guide | monitoring, observability, debugging |
|
||||
|
||||
## Purpose
|
||||
|
||||
These articles help developers:
|
||||
- Understand the AI agent landscape
|
||||
- Make informed framework choices
|
||||
- Learn best practices for agent development
|
||||
- Compare different approaches objectively
|
||||
|
||||
## Contributing
|
||||
|
||||
Want to add or improve an article? See [CONTRIBUTING.md](../../CONTRIBUTING.md).
|
||||
@@ -0,0 +1,366 @@
|
||||
# Aden vs AutoGen: A Detailed Comparison
|
||||
|
||||
*Comparing self-evolving agents with conversational multi-agent systems*
|
||||
|
||||
---
|
||||
|
||||
Microsoft's AutoGen and Aden both enable multi-agent systems but serve different purposes. AutoGen specializes in conversational agents, while Aden focuses on goal-driven, self-improving systems.
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
| Aspect | AutoGen | Aden |
|
||||
|--------|---------|------|
|
||||
| **Developed By** | Microsoft | Aden |
|
||||
| **Philosophy** | Conversational agents | Goal-driven, self-evolving |
|
||||
| **Primary Pattern** | Multi-agent conversations | Node-based agent graphs |
|
||||
| **Communication** | Natural language dialogue | Generated connection code |
|
||||
| **Self-Improvement** | No | Yes |
|
||||
| **Best For** | Dialogue-heavy applications | Production agent systems |
|
||||
| **License** | MIT | Apache 2.0 |
|
||||
|
||||
---
|
||||
|
||||
## Philosophy & Approach
|
||||
|
||||
### AutoGen
|
||||
AutoGen enables agents to **communicate through natural language conversations**. Agents chat with each other to solve problems collaboratively.
|
||||
|
||||
```python
|
||||
# AutoGen: Conversation-based agents
|
||||
from autogen import AssistantAgent, UserProxyAgent
|
||||
|
||||
assistant = AssistantAgent(
|
||||
name="assistant",
|
||||
llm_config={"model": "gpt-4"}
|
||||
)
|
||||
|
||||
user_proxy = UserProxyAgent(
|
||||
name="user_proxy",
|
||||
human_input_mode="TERMINATE",
|
||||
code_execution_config={"work_dir": "coding"}
|
||||
)
|
||||
|
||||
# Agents solve problems through conversation
|
||||
user_proxy.initiate_chat(
|
||||
assistant,
|
||||
message="Create a Python script to analyze sales data"
|
||||
)
|
||||
```
|
||||
|
||||
### Aden
|
||||
Aden uses a **coding agent to generate complete agent systems** from goals. Agents are connected through generated code, not just conversation.
|
||||
|
||||
```python
|
||||
# Aden: Goal-driven agent generation
|
||||
goal = """
|
||||
Build a data analysis system that:
|
||||
1. Ingests sales data from multiple sources
|
||||
2. Generates insights and visualizations
|
||||
3. Creates weekly summary reports
|
||||
4. Escalates anomalies to the data team
|
||||
|
||||
When analysis fails or produces incorrect results,
|
||||
learn from the corrections to improve accuracy.
|
||||
"""
|
||||
|
||||
# Aden generates specialized agents with:
|
||||
# - Data ingestion tools
|
||||
# - Analysis capabilities
|
||||
# - Visualization outputs
|
||||
# - Human escalation for anomalies
|
||||
# - Self-improvement from feedback
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Feature Comparison
|
||||
|
||||
### Communication Model
|
||||
|
||||
| Feature | AutoGen | Aden |
|
||||
|---------|---------|------|
|
||||
| Agent-to-agent | Natural language | Generated connections |
|
||||
| Conversation history | Built-in | Via memory nodes |
|
||||
| Message passing | Sequential turns | Async/event-driven |
|
||||
| Human interaction | Via UserProxyAgent | Native HITL nodes |
|
||||
|
||||
**Verdict:** AutoGen is more natural for dialogue; Aden is more flexible for diverse patterns.
|
||||
|
||||
### Code Execution
|
||||
|
||||
| Feature | AutoGen | Aden |
|
||||
|---------|---------|------|
|
||||
| Code execution | Built-in (sandboxed) | Via tools |
|
||||
| Language support | Python (primarily) | Multi-language via tools |
|
||||
| Execution safety | Docker containers | Tool-level sandboxing |
|
||||
| Result handling | Conversation flow | Structured outputs |
|
||||
|
||||
**Verdict:** AutoGen has stronger built-in code execution; Aden uses tool abstraction.
|
||||
|
||||
### Multi-Agent Patterns
|
||||
|
||||
| Feature | AutoGen | Aden |
|
||||
|---------|---------|------|
|
||||
| Group chat | Native support | Via graph connections |
|
||||
| Hierarchical | Nested conversations | Node hierarchies |
|
||||
| Dynamic agents | Limited | Coding agent creates as needed |
|
||||
| Agent discovery | Manual | Auto-generated |
|
||||
|
||||
**Verdict:** AutoGen excels at chat patterns; Aden is more flexible for non-chat workflows.
|
||||
|
||||
### Production Features
|
||||
|
||||
| Feature | AutoGen | Aden |
|
||||
|---------|---------|------|
|
||||
| Monitoring | Basic logging | Full dashboard |
|
||||
| Cost tracking | Manual | Automatic |
|
||||
| Budget controls | Not built-in | Native |
|
||||
| Self-improvement | No | Yes |
|
||||
|
||||
**Verdict:** Aden is significantly more production-ready.
|
||||
|
||||
---
|
||||
|
||||
## Code Comparison
|
||||
|
||||
### Building a Coding Assistant
|
||||
|
||||
#### AutoGen Approach
|
||||
```python
|
||||
from autogen import AssistantAgent, UserProxyAgent, GroupChat, GroupChatManager
|
||||
|
||||
# Define specialized agents
|
||||
coder = AssistantAgent(
|
||||
name="coder",
|
||||
system_message="You are a Python expert...",
|
||||
llm_config=llm_config
|
||||
)
|
||||
|
||||
reviewer = AssistantAgent(
|
||||
name="reviewer",
|
||||
system_message="You review code for bugs and improvements...",
|
||||
llm_config=llm_config
|
||||
)
|
||||
|
||||
executor = UserProxyAgent(
|
||||
name="executor",
|
||||
human_input_mode="NEVER",
|
||||
code_execution_config={"work_dir": "workspace"}
|
||||
)
|
||||
|
||||
# Create group chat
|
||||
group_chat = GroupChat(
|
||||
agents=[coder, reviewer, executor],
|
||||
messages=[],
|
||||
max_round=10
|
||||
)
|
||||
|
||||
manager = GroupChatManager(groupchat=group_chat, llm_config=llm_config)
|
||||
|
||||
# Start conversation
|
||||
executor.initiate_chat(
|
||||
manager,
|
||||
message="Create a data processing pipeline"
|
||||
)
|
||||
|
||||
# Conversation happens naturally between agents
|
||||
# Each agent responds based on their role
|
||||
```
|
||||
|
||||
#### Aden Approach
|
||||
```python
|
||||
# Define goal for coding assistant system
|
||||
goal = """
|
||||
Build a code development system that:
|
||||
1. Understands coding requests and breaks them into tasks
|
||||
2. Writes Python code following best practices
|
||||
3. Reviews code for bugs, security issues, and improvements
|
||||
4. Executes code in a safe environment
|
||||
5. Iterates based on execution results
|
||||
|
||||
Human review required for:
|
||||
- Code that accesses external services
|
||||
- Changes to production systems
|
||||
- Code handling sensitive data
|
||||
|
||||
Self-improvement:
|
||||
- Learn from code review feedback
|
||||
- Track which patterns cause bugs
|
||||
- Improve based on execution failures
|
||||
"""
|
||||
|
||||
# Aden creates:
|
||||
# - Task decomposition agent
|
||||
# - Coder agent with best practices
|
||||
# - Reviewer agent with learned patterns
|
||||
# - Safe execution environment
|
||||
# - Human checkpoints for sensitive operations
|
||||
# - Feedback loop for continuous improvement
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Use Case Comparison
|
||||
|
||||
### Best for AutoGen
|
||||
|
||||
1. **Conversational AI applications**
|
||||
- Chatbots with multiple personalities
|
||||
- Customer service with specialist handoffs
|
||||
- Interactive tutoring systems
|
||||
|
||||
2. **Code generation through dialogue**
|
||||
- Pair programming assistants
|
||||
- Code review discussions
|
||||
- Debugging conversations
|
||||
|
||||
3. **Research and exploration**
|
||||
- Collaborative problem solving
|
||||
- Multi-perspective analysis
|
||||
- Brainstorming sessions
|
||||
|
||||
### Best for Aden
|
||||
|
||||
1. **Production agent systems**
|
||||
- Customer support with evolution
|
||||
- Data pipelines that self-correct
|
||||
- Content systems that improve
|
||||
|
||||
2. **Goal-oriented automation**
|
||||
- Business process automation
|
||||
- Monitoring and alerting
|
||||
- Report generation
|
||||
|
||||
3. **Systems requiring adaptation**
|
||||
- Changing requirements
|
||||
- Learning from failures
|
||||
- Continuous improvement
|
||||
|
||||
---
|
||||
|
||||
## Detailed Comparisons
|
||||
|
||||
### Conversation Management
|
||||
|
||||
| Aspect | AutoGen | Aden |
|
||||
|--------|---------|------|
|
||||
| Turn management | Automatic | Event-driven |
|
||||
| Context window | Managed | Via memory tools |
|
||||
| History persistence | Session-based | Durable storage |
|
||||
| Branching conversations | Supported | Via graph structure |
|
||||
|
||||
### Error Handling
|
||||
|
||||
| Aspect | AutoGen | Aden |
|
||||
|--------|---------|------|
|
||||
| Execution errors | Retry in conversation | Capture and evolve |
|
||||
| Logic errors | Agent discussion | Failure analysis |
|
||||
| Recovery | Manual intervention | Automatic adaptation |
|
||||
| Learning | No | Built-in |
|
||||
|
||||
### Integration
|
||||
|
||||
| Aspect | AutoGen | Aden |
|
||||
|--------|---------|------|
|
||||
| External tools | Function calling | Tool nodes |
|
||||
| APIs | Custom integration | SDK support |
|
||||
| Databases | Via code execution | Native connections |
|
||||
| Enterprise systems | Custom | MCP tools |
|
||||
|
||||
---
|
||||
|
||||
## When to Choose AutoGen
|
||||
|
||||
AutoGen is the better choice when:
|
||||
|
||||
1. **Conversation is the core pattern** - Your agents primarily communicate through dialogue
|
||||
2. **Code execution is central** - Need built-in sandboxed execution
|
||||
3. **Microsoft ecosystem** - Already invested in Microsoft AI tools
|
||||
4. **Research applications** - Exploring multi-agent conversations
|
||||
5. **Flexible dialogue** - Agents need natural back-and-forth
|
||||
6. **Quick prototypes** - Simple multi-agent conversations
|
||||
|
||||
---
|
||||
|
||||
## When to Choose Aden
|
||||
|
||||
Aden is the better choice when:
|
||||
|
||||
1. **Production requirements** - Need monitoring, cost control, health checks
|
||||
2. **Self-improvement matters** - System should evolve from failures
|
||||
3. **Goal-driven development** - Prefer describing outcomes
|
||||
4. **Non-conversational patterns** - Workflows beyond dialogue
|
||||
5. **Cost management** - Need budget enforcement
|
||||
6. **Human-in-the-loop** - Require structured intervention points
|
||||
7. **Long-running systems** - Agents operating continuously
|
||||
|
||||
---
|
||||
|
||||
## Hybrid Architectures
|
||||
|
||||
### AutoGen Agents in Aden
|
||||
AutoGen conversations can be wrapped as Aden nodes:
|
||||
|
||||
```python
|
||||
# AutoGen conversation as a node in Aden's graph (illustrative sketch)
from autogen import AssistantAgent, UserProxyAgent

class AutoGenConversationNode:
    def execute(self, input):
        # Run an AutoGen conversation for this step
        assistant = AssistantAgent(name="assistant", llm_config={"model": "gpt-4"})
        user_proxy = UserProxyAgent(name="user_proxy", human_input_mode="NEVER", code_execution_config=False)
        user_proxy.initiate_chat(assistant, message=input["message"])
        # Return structured output for downstream nodes
        return {"messages": user_proxy.chat_messages[assistant]}
|
||||
```
|
||||
|
||||
### Benefits of Hybrid
|
||||
- Use AutoGen's conversation for dialogue-heavy tasks
|
||||
- Use Aden's orchestration and monitoring
|
||||
- Get self-improvement across the system
|
||||
- Maintain cost controls
|
||||
|
||||
---
|
||||
|
||||
## Performance Considerations
|
||||
|
||||
| Metric | AutoGen | Aden |
|
||||
|--------|---------|------|
|
||||
| Latency per turn | Higher (full responses) | Optimized per node |
|
||||
| Token efficiency | Conversation overhead | Direct communication |
|
||||
| Scalability | Memory-bound | Distributed-ready |
|
||||
| Cost tracking | Manual | Automatic |
|
||||
|
||||
---
|
||||
|
||||
## Community & Support
|
||||
|
||||
| Aspect | AutoGen | Aden |
|
||||
|--------|---------|------|
|
||||
| Backing | Microsoft Research | Y Combinator startup |
|
||||
| Community | Large, active | Growing |
|
||||
| Documentation | Comprehensive | Good and improving |
|
||||
| Enterprise support | Microsoft channels | Direct team support |
|
||||
|
||||
---
|
||||
|
||||
## Conclusion
|
||||
|
||||
**AutoGen** excels at creating agents that collaborate through natural language conversations. It's ideal for dialogue-heavy applications and leverages Microsoft's AI expertise.
|
||||
|
||||
**Aden** provides goal-driven, self-improving agent systems with production features built-in. It's better for systems that need to evolve and require operational visibility.
|
||||
|
||||
### Quick Decision Guide
|
||||
|
||||
| Your Need | Choose |
|
||||
|-----------|--------|
|
||||
| Conversational agents | AutoGen |
|
||||
| Code execution focus | AutoGen |
|
||||
| Self-improving systems | Aden |
|
||||
| Production monitoring | Aden |
|
||||
| Microsoft ecosystem | AutoGen |
|
||||
| Cost management | Aden |
|
||||
| Natural dialogue | AutoGen |
|
||||
| Goal-driven development | Aden |
|
||||
|
||||
---
|
||||
|
||||
*Last updated: January 2025*
|
||||
@@ -0,0 +1,346 @@
|
||||
# Aden vs CrewAI: A Detailed Comparison
|
||||
|
||||
*Comparing self-evolving agents with role-based agent teams*
|
||||
|
||||
---
|
||||
|
||||
CrewAI and Aden both focus on multi-agent systems but take fundamentally different approaches. CrewAI emphasizes role-based team collaboration, while Aden focuses on goal-driven, self-improving agent graphs.
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
| Aspect | CrewAI | Aden |
|
||||
|--------|--------|------|
|
||||
| **Philosophy** | Role-based agent teams | Goal-driven, self-evolving agents |
|
||||
| **Architecture** | Crews with roles | Node-based agent graphs |
|
||||
| **Workflow** | Predefined collaboration | Dynamically generated |
|
||||
| **Self-Improvement** | No | Yes |
|
||||
| **Human-in-the-Loop** | Basic support | Native intervention points |
|
||||
| **Monitoring** | Basic logging | Full dashboard |
|
||||
| **License** | MIT | Apache 2.0 |
|
||||
|
||||
---
|
||||
|
||||
## Philosophy & Approach
|
||||
|
||||
### CrewAI
|
||||
CrewAI organizes agents as a **crew** with defined **roles**. Each agent has a specific job, and they collaborate in predefined patterns to accomplish tasks.
|
||||
|
||||
```python
|
||||
# CrewAI: Role-based team definition
|
||||
from crewai import Agent, Task, Crew, Process
|
||||
|
||||
researcher = Agent(
|
||||
role="Senior Research Analyst",
|
||||
goal="Uncover cutting-edge developments",
|
||||
backstory="You are an expert at finding information...",
|
||||
tools=[search_tool, web_scraper]
|
||||
)
|
||||
|
||||
writer = Agent(
|
||||
role="Content Writer",
|
||||
goal="Create engaging content from research",
|
||||
backstory="You are a skilled writer..."
|
||||
)
|
||||
|
||||
# Define tasks and crew
|
||||
crew = Crew(
|
||||
agents=[researcher, writer],
|
||||
tasks=[research_task, writing_task],
|
||||
process=Process.sequential
|
||||
)
|
||||
```
|
||||
|
||||
### Aden
|
||||
Aden uses a **coding agent** to generate agent systems from natural language goals. The system creates agents, connections, and evolves based on failures.
|
||||
|
||||
```python
|
||||
# Aden: Goal-driven generation
|
||||
goal = """
|
||||
Research cutting-edge developments in AI and create
|
||||
engaging blog content. When content is rejected by
|
||||
editors, learn from the feedback to improve future posts.
|
||||
"""
|
||||
|
||||
# Aden generates:
|
||||
# - Research agent with appropriate tools
|
||||
# - Writer agent with learned preferences
|
||||
# - Editor checkpoint (human-in-the-loop)
|
||||
# - Feedback loop for improvement
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Feature Comparison
|
||||
|
||||
### Agent Definition
|
||||
|
||||
| Feature | CrewAI | Aden |
|
||||
|---------|--------|------|
|
||||
| Agent creation | Manual role definition | Generated from goals |
|
||||
| Roles | Explicit (role, goal, backstory) | Inferred from requirements |
|
||||
| Tools assignment | Manual per agent | Auto-configured |
|
||||
| Customization | High | High (via goal refinement) |
|
||||
|
||||
**Verdict:** CrewAI offers more explicit control; Aden reduces boilerplate through generation.
|
||||
|
||||
### Team Collaboration
|
||||
|
||||
| Feature | CrewAI | Aden |
|
||||
|---------|--------|------|
|
||||
| Collaboration patterns | Sequential, hierarchical | Dynamic, goal-based |
|
||||
| Communication | Predefined handoffs | Generated connection code |
|
||||
| Flexibility | Within defined patterns | Fully dynamic |
|
||||
| Adaptation | Manual updates | Automatic evolution |
|
||||
|
||||
**Verdict:** CrewAI is more predictable; Aden is more adaptive.
|
||||
|
||||
### Failure Handling
|
||||
|
||||
| Feature | CrewAI | Aden |
|
||||
|---------|--------|------|
|
||||
| Error handling | Try/catch | Automatic capture |
|
||||
| Learning from failures | Not built-in | Core feature |
|
||||
| Agent evolution | Manual updates | Automatic |
|
||||
| Recovery strategies | Custom code | Built-in policies |
|
||||
|
||||
**Verdict:** Aden's failure handling and evolution are significantly more advanced.
|
||||
|
||||
### Production Features
|
||||
|
||||
| Feature | CrewAI | Aden |
|
||||
|---------|--------|------|
|
||||
| Monitoring dashboard | No | Yes |
|
||||
| Cost tracking | No | Yes |
|
||||
| Budget enforcement | No | Yes |
|
||||
| Health checks | Basic | Comprehensive |
|
||||
|
||||
**Verdict:** Aden is more production-ready out of the box.
|
||||
|
||||
---
|
||||
|
||||
## Code Comparison
|
||||
|
||||
### Building a Content Creation Team
|
||||
|
||||
#### CrewAI Approach
|
||||
```python
|
||||
from crewai import Agent, Task, Crew, Process
|
||||
|
||||
# Define agents with explicit roles
|
||||
researcher = Agent(
|
||||
role="Research Specialist",
|
||||
goal="Find accurate, relevant information",
|
||||
backstory="Expert researcher with attention to detail",
|
||||
verbose=True,
|
||||
tools=[search_tool, scrape_tool]
|
||||
)
|
||||
|
||||
writer = Agent(
|
||||
role="Content Writer",
|
||||
goal="Create engaging, SEO-friendly content",
|
||||
backstory="Experienced content creator",
|
||||
verbose=True
|
||||
)
|
||||
|
||||
editor = Agent(
|
||||
role="Editor",
|
||||
goal="Ensure quality and accuracy",
|
||||
backstory="Meticulous editor with high standards"
|
||||
)
|
||||
|
||||
# Define tasks
|
||||
research_task = Task(
|
||||
description="Research {topic} thoroughly",
|
||||
agent=researcher,
|
||||
expected_output="Comprehensive research notes"
|
||||
)
|
||||
|
||||
writing_task = Task(
|
||||
description="Write article based on research",
|
||||
agent=writer,
|
||||
expected_output="Draft article"
|
||||
)
|
||||
|
||||
editing_task = Task(
|
||||
description="Edit and polish the article",
|
||||
agent=editor,
|
||||
expected_output="Final article"
|
||||
)
|
||||
|
||||
# Create and run crew
|
||||
crew = Crew(
|
||||
agents=[researcher, writer, editor],
|
||||
tasks=[research_task, writing_task, editing_task],
|
||||
process=Process.sequential
|
||||
)
|
||||
|
||||
result = crew.kickoff(inputs={"topic": "AI trends 2025"})
|
||||
```
|
||||
|
||||
#### Aden Approach
|
||||
```python
|
||||
# Define goal - system generates the team
|
||||
goal = """
|
||||
Create a content creation system that:
|
||||
1. Researches topics thoroughly using web search
|
||||
2. Writes engaging, SEO-optimized articles
|
||||
3. Gets human editor approval before publishing
|
||||
4. Learns from editor feedback to improve over time
|
||||
|
||||
When articles are rejected:
|
||||
- Capture the feedback
|
||||
- Identify patterns in rejections
|
||||
- Adjust writing style and quality criteria
|
||||
"""
|
||||
|
||||
# Aden automatically:
|
||||
# - Creates research, writer nodes
|
||||
# - Sets up human-in-the-loop for editor
|
||||
# - Establishes feedback learning loop
|
||||
# - Monitors cost and quality metrics
|
||||
|
||||
# The system evolves:
|
||||
# - Writing improves based on rejections
|
||||
# - Research depth adjusts based on needs
|
||||
# - Quality thresholds adapt
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Detailed Comparisons
|
||||
|
||||
### Ease of Use
|
||||
|
||||
| Aspect | CrewAI | Aden |
|
||||
|--------|--------|------|
|
||||
| Learning curve | Moderate | Moderate |
|
||||
| Initial setup | Define roles/tasks | Define goals |
|
||||
| Iteration speed | Requires code changes | Goal refinement |
|
||||
| Documentation | Good | Growing |
|
||||
|
||||
### Scalability
|
||||
|
||||
| Aspect | CrewAI | Aden |
|
||||
|--------|--------|------|
|
||||
| Agent count | Grows with complexity | Managed automatically |
|
||||
| Task complexity | Manual orchestration | Dynamic handling |
|
||||
| Resource management | Manual | Built-in controls |
|
||||
|
||||
### Customization
|
||||
|
||||
| Aspect | CrewAI | Aden |
|
||||
|--------|--------|------|
|
||||
| Agent behavior | Full control via role/backstory | Via goals and feedback |
|
||||
| Tools | Assign per agent | Auto-configured + custom |
|
||||
| Workflows | Predefined processes | Generated + evolved |
|
||||
| Prompts | Full access | Goal-based abstraction |
|
||||
|
||||
---
|
||||
|
||||
## When to Choose CrewAI
|
||||
|
||||
CrewAI is the better choice when:
|
||||
|
||||
1. **Roles are well-defined** - You know exactly what each agent should do
|
||||
2. **Predictable workflows** - Sequential or hierarchical processes work
|
||||
3. **Direct control needed** - Want to define every aspect of agent behavior
|
||||
4. **Simple team structures** - Small crews with clear responsibilities
|
||||
5. **Quick prototyping** - Get a multi-agent system running fast
|
||||
6. **No evolution needed** - Workflow won't need to adapt over time
|
||||
|
||||
---
|
||||
|
||||
## When to Choose Aden
|
||||
|
||||
Aden is the better choice when:
|
||||
|
||||
1. **Goals over roles** - Know what to achieve, not how to organize
|
||||
2. **Adaptation required** - System needs to improve from failures
|
||||
3. **Complex workflows** - Dynamic connections between many agents
|
||||
4. **Production deployment** - Need monitoring, cost controls, health checks
|
||||
5. **Human oversight** - Require native HITL with escalation policies
|
||||
6. **Continuous improvement** - Want agents to get better automatically
|
||||
7. **Cost management** - Need budget enforcement and model degradation
|
||||
|
||||
---
|
||||
|
||||
## Hybrid Approaches
|
||||
|
||||
Some teams use both frameworks:
|
||||
|
||||
### CrewAI for Specific Tasks
|
||||
```python
|
||||
# Use CrewAI for well-defined sub-tasks
|
||||
research_crew = Crew(agents=[...], tasks=[...])
|
||||
```
|
||||
|
||||
### Aden for Orchestration
|
||||
```python
|
||||
# Aden orchestrates and evolves the overall system
|
||||
# CrewAI crews can be nodes in Aden's graph
|
||||
```
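
A minimal sketch of that idea, assuming a plain node class with an `execute` method (the actual Aden node interface may differ):

```python
from crewai import Crew

class CrewAINode:
    """Wraps a CrewAI crew so it can run as one node in Aden's agent graph."""

    def __init__(self, crew: Crew):
        self.crew = crew

    def execute(self, input: dict) -> dict:
        # Run the crew and hand a structured result to downstream nodes
        result = self.crew.kickoff(inputs=input)
        return {"output": str(result)}
```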
|
||||
|
||||
---
|
||||
|
||||
## Migration Considerations
|
||||
|
||||
### CrewAI to Aden
|
||||
- Map roles to goal descriptions (see the sketch after this list)
|
||||
- Convert tasks to expected outcomes
|
||||
- Existing tools often transfer directly
|
||||
- Add failure scenarios to enable evolution
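
As an illustration, a CrewAI role maps roughly onto a sentence or two of goal text; the exact wording below is an assumption, not a prescribed format:

```python
from crewai import Agent

# CrewAI: an explicit role definition
researcher = Agent(
    role="Senior Research Analyst",
    goal="Uncover cutting-edge developments",
    backstory="You are an expert at finding information...",
)

# Aden: the same responsibility folded into the goal text
goal = """
Research cutting-edge developments thoroughly and summarize them.
When research misses key sources, learn from the feedback to improve.
"""
```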
|
||||
|
||||
### Aden to CrewAI
|
||||
- Analyze generated agent graph for roles
|
||||
- Define explicit role/backstory from behavior
|
||||
- Recreate evolution logic manually if needed
|
||||
- Set up external monitoring
|
||||
|
||||
---
|
||||
|
||||
## Performance Comparison
|
||||
|
||||
| Metric | CrewAI | Aden |
|
||||
|--------|--------|------|
|
||||
| Startup time | Fast | Moderate (includes setup) |
|
||||
| Execution overhead | Low | Low |
|
||||
| Memory usage | Depends on agents | Includes monitoring |
|
||||
| LLM calls | As defined | Optimized + tracked |
|
||||
|
||||
---
|
||||
|
||||
## Community & Ecosystem
|
||||
|
||||
| Aspect | CrewAI | Aden |
|
||||
|--------|--------|------|
|
||||
| GitHub stars | High | Growing |
|
||||
| Community size | Large | Growing |
|
||||
| Enterprise users | Many | Early adopters |
|
||||
| Third-party tools | Growing ecosystem | Integrated platform |
|
||||
|
||||
---
|
||||
|
||||
## Conclusion
|
||||
|
||||
**CrewAI** excels at creating predictable, role-based agent teams with explicit control over behavior and collaboration patterns. It's ideal for well-defined workflows.
|
||||
|
||||
**Aden** shines when you need agents that evolve and improve, with built-in production features like monitoring and cost control. It's better for systems that need to adapt.
|
||||
|
||||
### Decision Matrix
|
||||
|
||||
| Your Situation | Choose |
|
||||
|----------------|--------|
|
||||
| Know exact roles needed | CrewAI |
|
||||
| Know outcomes, not structure | Aden |
|
||||
| Need predictable behavior | CrewAI |
|
||||
| Need adaptive behavior | Aden |
|
||||
| Simple prototyping | CrewAI |
|
||||
| Production deployment | Aden |
|
||||
| Cost management important | Aden |
|
||||
| Maximum control | CrewAI |
|
||||
|
||||
---
|
||||
|
||||
*Last updated: January 2025*
|
||||
@@ -0,0 +1,266 @@
|
||||
# Aden vs LangChain: A Detailed Comparison
|
||||
|
||||
*Choosing between goal-driven agents and component-based development*
|
||||
|
||||
---
|
||||
|
||||
LangChain and Aden represent two different philosophies for building AI agent systems. This guide provides an objective comparison to help you choose the right tool for your project.
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
| Aspect | LangChain | Aden |
|
||||
|--------|-----------|------|
|
||||
| **Philosophy** | Component library for LLM apps | Goal-driven, self-improving agents |
|
||||
| **Primary Language** | Python, JavaScript | Python SDK, TypeScript backend |
|
||||
| **Architecture** | Chains and components | Node-based agent graphs |
|
||||
| **Workflow Definition** | Manual chain creation | Generated from natural language |
|
||||
| **Self-Improvement** | No | Yes, automatic evolution |
|
||||
| **Monitoring** | Third-party integrations | Built-in dashboard |
|
||||
| **License** | MIT | Apache 2.0 |
|
||||
|
||||
---
|
||||
|
||||
## Philosophy & Approach
|
||||
|
||||
### LangChain
|
||||
LangChain follows a **component-based approach**. You manually select and connect components (LLMs, retrievers, tools, memory) to build chains and agents. This gives you fine-grained control but requires explicit workflow definition.
|
||||
|
||||
```python
|
||||
# LangChain: Manual chain construction
|
||||
from langchain import LLMChain, PromptTemplate
|
||||
from langchain.agents import create_react_agent
|
||||
|
||||
# You define every component and connection
|
||||
prompt = PromptTemplate(...)
|
||||
chain = LLMChain(llm=llm, prompt=prompt)
|
||||
agent = create_react_agent(llm, tools, prompt)
|
||||
```
|
||||
|
||||
### Aden
|
||||
Aden follows a **goal-driven approach**. You describe what you want to achieve in natural language, and a coding agent generates the agent graph and connection code. When things fail, the system evolves automatically.
|
||||
|
||||
```python
|
||||
# Aden: Goal-driven generation
|
||||
# Describe your goal, the coding agent generates the system
|
||||
goal = """
|
||||
Create a system that monitors customer feedback,
|
||||
categorizes sentiment, and escalates negative reviews
|
||||
to the support team with suggested responses.
|
||||
"""
|
||||
# The framework generates agents, connections, and tests
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Feature Comparison
|
||||
|
||||
### RAG & Document Processing
|
||||
|
||||
| Feature | LangChain | Aden |
|
||||
|---------|-----------|------|
|
||||
| Vector store integrations | Extensive (50+) | Growing |
|
||||
| Document loaders | Comprehensive | Via tools |
|
||||
| Retrieval strategies | Multiple built-in | Customizable |
|
||||
| Query transformation | Built-in | Agent-defined |
|
||||
|
||||
**Verdict:** LangChain excels at RAG with its mature ecosystem of integrations.
|
||||
|
||||
### Agent Architecture
|
||||
|
||||
| Feature | LangChain | Aden |
|
||||
|---------|-----------|------|
|
||||
| Agent types | ReAct, OpenAI Functions, etc. | SDK-wrapped nodes |
|
||||
| Multi-agent | Requires orchestration | Native multi-agent |
|
||||
| Communication | Manual setup | Auto-generated connections |
|
||||
| Graph visualization | Third-party | Built-in dashboard |
|
||||
|
||||
**Verdict:** Aden provides more native multi-agent support; LangChain offers more agent type options.
|
||||
|
||||
### Self-Improvement & Adaptation
|
||||
|
||||
| Feature | LangChain | Aden |
|
||||
|---------|-----------|------|
|
||||
| Failure handling | Manual try/catch | Automatic capture |
|
||||
| Learning from failures | Not built-in | Automatic evolution |
|
||||
| Agent graph updates | Manual code changes | Automated via coding agent |
|
||||
| A/B testing agents | Manual | Roadmap |
|
||||
|
||||
**Verdict:** Aden's self-improvement is a unique differentiator not found in LangChain.
|
||||
|
||||
### Observability & Monitoring
|
||||
|
||||
| Feature | LangChain | Aden |
|
||||
|---------|-----------|------|
|
||||
| Tracing | LangSmith (paid), third-party | Built-in |
|
||||
| Cost tracking | Third-party | Native |
|
||||
| Real-time monitoring | LangSmith | WebSocket dashboard |
|
||||
| Budget controls | Not built-in | Native with auto-degradation |
|
||||
|
||||
**Verdict:** Aden includes monitoring out of the box; LangChain requires LangSmith or third-party tools.
|
||||
|
||||
### Human-in-the-Loop
|
||||
|
||||
| Feature | LangChain | Aden |
|
||||
|---------|-----------|------|
|
||||
| Human approval | Manual implementation | Native intervention nodes |
|
||||
| Escalation policies | Custom code | Configurable timeouts |
|
||||
| Input collection | Custom | Built-in request system |
|
||||
|
||||
**Verdict:** Aden has more built-in HITL support; LangChain requires custom implementation.
|
||||
|
||||
---
|
||||
|
||||
## Code Comparison
|
||||
|
||||
### Building a Customer Support Agent
|
||||
|
||||
#### LangChain Approach
|
||||
```python
|
||||
from langchain.agents import AgentExecutor, create_openai_tools_agent
|
||||
from langchain_openai import ChatOpenAI
|
||||
from langchain.tools import Tool
|
||||
from langchain.memory import ConversationBufferMemory
|
||||
|
||||
# Define tools manually
|
||||
tools = [
|
||||
Tool(name="search_kb", func=search_knowledge_base, description="..."),
|
||||
Tool(name="create_ticket", func=create_support_ticket, description="..."),
|
||||
Tool(name="escalate", func=escalate_to_human, description="..."),
|
||||
]
|
||||
|
||||
# Create agent with explicit configuration
|
||||
llm = ChatOpenAI(model="gpt-4")
|
||||
memory = ConversationBufferMemory()
|
||||
agent = create_openai_tools_agent(llm, tools, prompt)  # prompt: a ChatPromptTemplate you define
|
||||
executor = AgentExecutor(agent=agent, tools=tools, memory=memory)
|
||||
|
||||
# Run agent
|
||||
response = executor.invoke({"input": customer_query})
|
||||
|
||||
# Error handling is manual
|
||||
try:
|
||||
response = executor.invoke({"input": query})
|
||||
except Exception as e:
|
||||
log_error(e)
|
||||
# Manual recovery logic
|
||||
```
|
||||
|
||||
#### Aden Approach
|
||||
```python
|
||||
# Define goal - system generates the agent graph
|
||||
goal = """
|
||||
Build a customer support agent that:
|
||||
1. Searches our knowledge base for answers
|
||||
2. Creates tickets for unresolved issues
|
||||
3. Escalates to humans when confidence is low
|
||||
4. Learns from resolved tickets to improve responses
|
||||
|
||||
When the agent fails to help a customer, capture the failure
|
||||
and improve the response strategy.
|
||||
"""
|
||||
|
||||
# Aden generates:
|
||||
# - Agent graph with specialized nodes
|
||||
# - Connection code between nodes
|
||||
# - Test cases for validation
|
||||
# - Monitoring hooks
|
||||
|
||||
# The SDK handles:
|
||||
# - Automatic failure capture
|
||||
# - Evolution based on failures
|
||||
# - Cost tracking and budget enforcement
|
||||
# - Human escalation at intervention points
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Production Considerations
|
||||
|
||||
### Deployment
|
||||
|
||||
| Aspect | LangChain | Aden |
|
||||
|--------|-----------|------|
|
||||
| Deployment model | Library in your app | Self-hosted platform |
|
||||
| Infrastructure | You manage | Docker Compose included |
|
||||
| Scaling | Your responsibility | Built-in considerations |
|
||||
| Database requirements | Optional | TimescaleDB, MongoDB, PostgreSQL |
|
||||
|
||||
### Cost Management
|
||||
|
||||
| Aspect | LangChain | Aden |
|
||||
|--------|-----------|------|
|
||||
| Token tracking | Manual or LangSmith | Automatic |
|
||||
| Budget limits | Not built-in | Native with enforcement |
|
||||
| Model degradation | Manual | Automatic fallback |
|
||||
| Cost alerts | Third-party | Built-in |
|
||||
|
||||
### Reliability
|
||||
|
||||
| Aspect | LangChain | Aden |
|
||||
|--------|-----------|------|
|
||||
| Retry logic | Manual | Built-in |
|
||||
| Fallback chains | Manual | Automatic |
|
||||
| Health monitoring | Third-party | Native endpoints |
|
||||
| Self-healing | No | Yes |
|
||||
|
||||
---
|
||||
|
||||
## When to Choose LangChain
|
||||
|
||||
LangChain is the better choice when:
|
||||
|
||||
1. **Building RAG applications** - LangChain's retrieval ecosystem is unmatched
|
||||
2. **Need extensive integrations** - 50+ vector stores, document loaders, etc.
|
||||
3. **Want fine-grained control** - Every component is explicitly configured
|
||||
4. **Already invested** - Large existing LangChain codebase
|
||||
5. **Simple agent needs** - Single-purpose agents without complex orchestration
|
||||
6. **Prefer library over platform** - Want to embed in existing infrastructure
|
||||
|
||||
---
|
||||
|
||||
## When to Choose Aden
|
||||
|
||||
Aden is the better choice when:
|
||||
|
||||
1. **Agents need to evolve** - Systems should improve from failures automatically
|
||||
2. **Goal-driven development** - Prefer describing outcomes over coding workflows
|
||||
3. **Multi-agent systems** - Complex agent graphs with dynamic connections
|
||||
4. **Production monitoring is critical** - Need built-in observability
|
||||
5. **Cost control matters** - Require budget enforcement and auto-degradation
|
||||
6. **Human oversight needed** - Native HITL support with escalation
|
||||
7. **Rapid iteration** - Want to change agent behavior without code rewrites
|
||||
|
||||
---
|
||||
|
||||
## Migration Considerations
|
||||
|
||||
### LangChain to Aden
|
||||
- LangChain tools can often be adapted as Aden node tools
|
||||
- Existing prompts can inform goal definitions
|
||||
- Consider gradual migration, running systems in parallel
|
||||
|
||||
### Aden to LangChain
|
||||
- Agent graphs can be manually reimplemented as chains
|
||||
- Monitoring would need replacement (LangSmith or alternatives)
|
||||
- Self-improvement logic would need custom implementation
|
||||
|
||||
---
|
||||
|
||||
## Conclusion
|
||||
|
||||
**LangChain** is a mature, flexible component library ideal for RAG applications and developers who want explicit control over every aspect of their agent.
|
||||
|
||||
**Aden** offers a paradigm shift with goal-driven, self-improving agents, better suited for production systems that need to adapt and evolve over time with built-in monitoring.
|
||||
|
||||
The choice depends on:
|
||||
- **Control vs. Automation**: LangChain for control, Aden for automation
|
||||
- **Static vs. Evolving**: LangChain for stable workflows, Aden for adaptive systems
|
||||
- **Library vs. Platform**: LangChain as a library, Aden as a platform
|
||||
|
||||
Many teams use both: LangChain for specific RAG components, Aden for orchestration and evolution.
|
||||
|
||||
---
|
||||
|
||||
*Last updated: January 2025*
|
||||
@@ -0,0 +1,465 @@
|
||||
# AI Agent Cost Management: A Complete Guide
|
||||
|
||||
*Control spending, optimize efficiency, and prevent budget disasters*
|
||||
|
||||
---
|
||||
|
||||
AI agents can burn through budgets faster than you expect. A single runaway agent loop can cost thousands of dollars in minutes. This guide covers strategies, tools, and best practices for managing AI agent costs.
|
||||
|
||||
---
|
||||
|
||||
## The Cost Problem
|
||||
|
||||
### Why AI Agents Are Expensive
|
||||
|
||||
| Factor | Impact |
|
||||
|--------|--------|
|
||||
| LLM API calls | $0.01 - $0.10+ per call |
|
||||
| Token usage | Input + output tokens |
|
||||
| Agent loops | Multiple calls per task |
|
||||
| Retries | Failed calls still cost money |
|
||||
| Verbose prompts | More tokens = more cost |
|
||||
| Tool usage | Additional API calls |
|
||||
|
||||
### Real-World Example
|
||||
```
|
||||
Simple customer support agent:
|
||||
- 5 LLM calls per interaction
|
||||
- 2000 tokens average per call
|
||||
- GPT-4: ~$0.06 per call
|
||||
- 100 interactions/day = $30/day
|
||||
|
||||
Complex research agent:
|
||||
- 50+ LLM calls per task
|
||||
- 10000 tokens average per call
|
||||
- GPT-4: ~$0.30 per call
|
||||
- 10 tasks/day = $150/day
|
||||
|
||||
Runaway agent loop:
|
||||
- 1000 calls in 10 minutes
|
||||
- $300+ before detection
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Cost Control Strategies
|
||||
|
||||
### Strategy 1: Budget Limits
|
||||
|
||||
Set hard limits on spending per:
|
||||
- Time period (daily, weekly, monthly)
|
||||
- Agent
|
||||
- Task
|
||||
- Team
|
||||
- User
|
||||
|
||||
```python
|
||||
budget_config = {
|
||||
"daily_limit": 100.00,
|
||||
"per_task_limit": 5.00,
|
||||
"per_agent_limit": 50.00,
|
||||
"alert_at_percentage": 80,
|
||||
"action_on_limit": "block" # or "degrade", "alert"
|
||||
}
|
||||
```
|
||||
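A minimal sketch of how these limits might be enforced before each LLM call. The `budget_config` keys come from the snippet above; `daily_spend` and `task_spend` are assumed to be tracked elsewhere, and this is not any particular framework's API.

```python
class BudgetExceeded(Exception):
    pass

def check_budget(budget_config, daily_spend, task_spend, estimated_cost):
    """Decide what to do with the next call under the configured limits."""
    if task_spend + estimated_cost > budget_config["per_task_limit"]:
        return "block"
    projected = daily_spend + estimated_cost
    if projected > budget_config["daily_limit"]:
        return budget_config["action_on_limit"]   # "block", "degrade", or "alert"
    if projected > budget_config["daily_limit"] * budget_config["alert_at_percentage"] / 100:
        return "alert"
    return "allow"

# Usage sketch
action = check_budget(budget_config, daily_spend=82.50, task_spend=1.20, estimated_cost=0.30)
if action == "block":
    raise BudgetExceeded("Daily or per-task budget would be exceeded")
```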
|
||||
### Strategy 2: Model Degradation
|
||||
|
||||
Automatically switch to cheaper models as budget is consumed:
|
||||
|
||||
```
|
||||
Budget usage:
|
||||
0-70% → Use GPT-4 (best quality)
|
||||
70-90% → Use GPT-3.5-turbo (good quality)
|
||||
90-100% → Use GPT-3.5-turbo with shorter prompts
|
||||
100%+ → Block or queue requests
|
||||
```
|
||||
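The same policy expressed as code, as a minimal sketch: it assumes spend is tracked elsewhere and that the listed model names are valid for your provider.

```python
def pick_model(daily_spend: float, daily_limit: float) -> dict:
    """Map budget consumption to a model choice, mirroring the tiers above."""
    used = daily_spend / daily_limit
    if used < 0.70:
        return {"model": "gpt-4"}
    if used < 0.90:
        return {"model": "gpt-3.5-turbo"}
    if used < 1.00:
        return {"model": "gpt-3.5-turbo", "max_tokens": 256, "trim_prompt": True}
    return {"blocked": True}  # or queue the request until the budget resets
```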
|
||||
### Strategy 3: Request Throttling
|
||||
|
||||
Limit request rate to control burn rate:
|
||||
|
||||
```python
|
||||
throttle_config = {
|
||||
"requests_per_minute": 10,
|
||||
"requests_per_hour": 200,
|
||||
"backoff_multiplier": 2,
|
||||
"max_backoff_seconds": 60
|
||||
}
|
||||
```
|
||||
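One way to apply the per-minute limit is a simple sliding-window limiter. This is an illustrative sketch, not a library API; it honors `requests_per_minute` and the backoff settings, and the hourly limit would be enforced the same way with a larger window.

```python
import asyncio
import time

class Throttle:
    def __init__(self, config):
        self.config = config
        self.timestamps = []  # request times within the last 60 seconds

    async def acquire(self):
        delay = 1
        while True:
            now = time.time()
            self.timestamps = [t for t in self.timestamps if now - t < 60]
            if len(self.timestamps) < self.config["requests_per_minute"]:
                self.timestamps.append(now)
                return
            # Window is full: back off exponentially up to the configured cap
            await asyncio.sleep(delay)
            delay = min(delay * self.config["backoff_multiplier"],
                        self.config["max_backoff_seconds"])

# Usage: `await throttle.acquire()` before every LLM call
throttle = Throttle(throttle_config)
```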
|
||||
### Strategy 4: Token Optimization
|
||||
|
||||
Reduce tokens per request:
|
||||
|
||||
| Technique | Savings |
|
||||
|-----------|---------|
|
||||
| Shorter system prompts | 20-40% |
|
||||
| Compressed context | 30-50% |
|
||||
| Response length limits | 20-30% |
|
||||
| Remove unnecessary examples | 10-20% |
|
||||
|
||||
### Strategy 5: Caching
|
||||
|
||||
Cache common requests and responses:
|
||||
|
||||
```python
|
||||
# Before: Every request hits the API
|
||||
result = llm.complete(prompt) # Costs money
|
||||
|
||||
# After: Cache frequent patterns
import hashlib

prompt_hash = hashlib.sha256(prompt.encode()).hexdigest()
cached = cache.get(prompt_hash)
if cached:
    result = cached  # Free
else:
    result = llm.complete(prompt)
    cache.set(prompt_hash, result)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Framework Comparison: Cost Features
|
||||
|
||||
| Framework | Budget Limits | Degradation | Tracking | Alerts |
|
||||
|-----------|--------------|-------------|----------|--------|
|
||||
| LangChain | Third-party | Manual | LangSmith | Manual |
|
||||
| CrewAI | Not built-in | Manual | Basic | Manual |
|
||||
| AutoGen | Not built-in | Manual | Manual | Manual |
|
||||
| **Aden** | **Native** | **Automatic** | **Built-in** | **Native** |
|
||||
|
||||
### Aden's Cost Controls
|
||||
Aden includes comprehensive cost management:
|
||||
|
||||
```python
|
||||
# Budget configuration in Aden
|
||||
budget_rules = {
|
||||
"budget_id": "team_engineering",
|
||||
"limits": {
|
||||
"daily": 500.00,
|
||||
"monthly": 10000.00,
|
||||
"per_agent": 100.00
|
||||
},
|
||||
"degradation": {
|
||||
"80_percent": "switch_to_gpt35",
|
||||
"95_percent": "throttle",
|
||||
"100_percent": "block"
|
||||
},
|
||||
"alerts": {
|
||||
"channels": ["slack", "email"],
|
||||
"thresholds": [50, 80, 95, 100]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Implementing Cost Tracking
|
||||
|
||||
### Basic Tracking
|
||||
```python
|
||||
class CostTracker:
|
||||
def __init__(self):
|
||||
self.total_cost = 0
|
||||
self.cost_by_agent = {}
|
||||
self.cost_by_model = {}
|
||||
|
||||
def track(self, request, response, model):
|
||||
input_tokens = count_tokens(request)
|
||||
output_tokens = count_tokens(response)
|
||||
|
||||
cost = self.calculate_cost(model, input_tokens, output_tokens)
|
||||
|
||||
self.total_cost += cost
|
||||
self.cost_by_agent[request.agent_id] = \
|
||||
self.cost_by_agent.get(request.agent_id, 0) + cost
|
||||
self.cost_by_model[model] = \
|
||||
self.cost_by_model.get(model, 0) + cost
|
||||
|
||||
return cost
|
||||
|
||||
def calculate_cost(self, model, input_tokens, output_tokens):
|
||||
rates = {
|
||||
"gpt-4": {"input": 0.03, "output": 0.06}, # per 1K tokens
|
||||
"gpt-3.5-turbo": {"input": 0.0005, "output": 0.0015},
|
||||
"claude-3-opus": {"input": 0.015, "output": 0.075},
|
||||
"claude-3-sonnet": {"input": 0.003, "output": 0.015},
|
||||
}
|
||||
rate = rates.get(model, rates["gpt-3.5-turbo"])
|
||||
return (input_tokens * rate["input"] + output_tokens * rate["output"]) / 1000
|
||||
```
|
||||
|
||||
### Advanced Tracking with Attribution
|
||||
```python
|
||||
cost_record = {
|
||||
"timestamp": "2025-01-15T10:30:00Z",
|
||||
"request_id": "req_123",
|
||||
"agent_id": "support_agent_1",
|
||||
"task_id": "task_456",
|
||||
"team_id": "customer_success",
|
||||
"model": "gpt-4",
|
||||
"input_tokens": 1500,
|
||||
"output_tokens": 500,
|
||||
"cost_usd": 0.075,
|
||||
"cached": False,
|
||||
"degraded": False
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Alert Configuration
|
||||
|
||||
### Threshold Alerts
|
||||
```yaml
|
||||
alerts:
|
||||
- name: "Budget Warning"
|
||||
condition: "daily_spend > daily_budget * 0.8"
|
||||
channels: ["slack"]
|
||||
message: "80% of daily budget consumed"
|
||||
|
||||
- name: "Budget Critical"
|
||||
condition: "daily_spend > daily_budget * 0.95"
|
||||
channels: ["slack", "pagerduty"]
|
||||
message: "95% of daily budget - taking action"
|
||||
action: "degrade_models"
|
||||
|
||||
- name: "Runaway Agent"
|
||||
condition: "requests_per_minute > 100"
|
||||
channels: ["pagerduty"]
|
||||
message: "Possible runaway agent detected"
|
||||
action: "pause_agent"
|
||||
```
|
||||
|
||||
### Anomaly Detection
|
||||
```python
|
||||
def detect_anomalies(recent_costs, historical_average):
|
||||
"""Alert if costs significantly exceed historical patterns"""
|
||||
threshold = historical_average * 3 # 3x normal
|
||||
|
||||
if recent_costs > threshold:
|
||||
alert(
|
||||
level="critical",
|
||||
message=f"Cost anomaly: ${recent_costs:.2f} vs avg ${historical_average:.2f}",
|
||||
action="investigate"
|
||||
)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Model Selection Strategies
|
||||
|
||||
### Cost vs Quality Matrix
|
||||
|
||||
| Model | Cost (per 1K tokens) | Quality | Best For |
|
||||
|-------|---------------------|---------|----------|
|
||||
| GPT-4 | $0.03-0.06 | Highest | Complex reasoning |
|
||||
| GPT-4-turbo | $0.01-0.03 | High | Balance cost/quality |
|
||||
| GPT-3.5-turbo | $0.0005-0.0015 | Good | High volume, simple |
|
||||
| Claude 3 Opus | $0.015-0.075 | Highest | Long context |
|
||||
| Claude 3 Sonnet | $0.003-0.015 | High | Good balance |
|
||||
| Claude 3 Haiku | $0.00025-0.00125 | Good | Fast, cheap |
|
||||
|
||||
### Dynamic Model Selection
|
||||
```python
|
||||
def select_model(task_complexity, budget_remaining, daily_limit):
|
||||
budget_percentage = (daily_limit - budget_remaining) / daily_limit
|
||||
|
||||
if task_complexity == "simple":
|
||||
return "gpt-3.5-turbo" # Always cheap for simple
|
||||
elif budget_percentage < 0.5:
|
||||
return "gpt-4" # Best model when budget healthy
|
||||
elif budget_percentage < 0.8:
|
||||
return "gpt-4-turbo" # Balanced
|
||||
else:
|
||||
return "gpt-3.5-turbo" # Preserve budget
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Optimization Techniques
|
||||
|
||||
### 1. Prompt Engineering for Cost
|
||||
```python
|
||||
# Expensive: Long system prompt
|
||||
system_prompt = """
|
||||
You are a helpful assistant that specializes in customer support.
|
||||
You should always be polite, professional, and helpful.
|
||||
When answering questions, provide detailed explanations.
|
||||
Always consider the customer's perspective.
|
||||
Remember to be empathetic and understanding.
|
||||
[... 500 more tokens ...]
|
||||
"""
|
||||
|
||||
# Cheaper: Concise system prompt
|
||||
system_prompt = """
|
||||
Customer support agent. Be helpful, polite, concise.
|
||||
Resolve issues efficiently.
|
||||
"""
|
||||
# Savings: ~400 tokens × 1000 requests = $12/day
|
||||
```
|
||||
|
||||
### 2. Context Window Management
|
||||
```python
|
||||
def manage_context(messages, max_tokens=4000):
|
||||
"""Keep context within budget by summarizing old messages"""
|
||||
current_tokens = count_tokens(messages)
|
||||
|
||||
if current_tokens > max_tokens:
|
||||
# Summarize older messages
|
||||
old_messages = messages[:-5] # Keep recent
|
||||
summary = summarize(old_messages)
|
||||
|
||||
return [{"role": "system", "content": f"Previous context: {summary}"}] + messages[-5:]
|
||||
|
||||
return messages
|
||||
```
|
||||
|
||||
### 3. Batch Processing
|
||||
```python
|
||||
# Expensive: Individual requests
|
||||
for item in items:
|
||||
result = llm.complete(f"Process: {item}")
|
||||
|
||||
# Cheaper: Batch when possible
|
||||
batch_prompt = "Process these items:\n" + "\n".join(items)
|
||||
results = llm.complete(batch_prompt)
|
||||
```
|
||||
|
||||
### 4. Response Length Control
|
||||
```python
|
||||
# Add to system prompt
|
||||
system_prompt += "\nKeep responses under 200 words."
|
||||
|
||||
# Or use max_tokens parameter
|
||||
response = llm.complete(
|
||||
prompt,
|
||||
max_tokens=300 # Hard limit
|
||||
)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Runaway Agent Prevention
|
||||
|
||||
### Detection Mechanisms
|
||||
```python
|
||||
class RunawayDetector:
|
||||
def __init__(self):
|
||||
self.request_times = []
|
||||
self.max_requests_per_minute = 50
|
||||
self.max_cost_per_minute = 10.00
|
||||
|
||||
def check(self, cost):
|
||||
now = time.time()
|
||||
self.request_times.append((now, cost))
|
||||
|
||||
# Clean old entries
|
||||
self.request_times = [
|
||||
(t, c) for t, c in self.request_times
|
||||
if now - t < 60
|
||||
]
|
||||
|
||||
# Check thresholds
|
||||
requests_per_minute = len(self.request_times)
|
||||
cost_per_minute = sum(c for _, c in self.request_times)
|
||||
|
||||
if requests_per_minute > self.max_requests_per_minute:
|
||||
return "RUNAWAY_REQUESTS"
|
||||
if cost_per_minute > self.max_cost_per_minute:
|
||||
return "RUNAWAY_COST"
|
||||
|
||||
return "OK"
|
||||
```
|
||||
|
||||
### Circuit Breakers
|
||||
```python
|
||||
class CostCircuitBreaker:
|
||||
def __init__(self, threshold, window_seconds=60):
|
||||
self.threshold = threshold
|
||||
self.window_seconds = window_seconds
|
||||
self.costs = []
|
||||
self.is_open = False
|
||||
|
||||
def record_cost(self, cost):
|
||||
now = time.time()
|
||||
self.costs.append((now, cost))
|
||||
        # Drop cost entries that have aged out of the rolling window
        self.costs = [(t, c) for t, c in self.costs if now - t < self.window_seconds]
|
||||
|
||||
total_cost = sum(c for _, c in self.costs)
|
||||
if total_cost > self.threshold:
|
||||
self.is_open = True
|
||||
alert("Circuit breaker opened - costs exceeded threshold")
|
||||
|
||||
def allow_request(self):
|
||||
if self.is_open:
|
||||
# Check if we should reset
|
||||
if time.time() - self.costs[-1][0] > self.window_seconds:
|
||||
self.is_open = False
|
||||
self.costs = []
|
||||
return True
|
||||
return False
|
||||
return True
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Dashboard Metrics
|
||||
|
||||
### Essential Cost Metrics
|
||||
|
||||
| Metric | Description | Alert Threshold |
|
||||
|--------|-------------|-----------------|
|
||||
| Hourly spend | Cost in last hour | > 2x average |
|
||||
| Daily spend | Cost today | > 80% budget |
|
||||
| Cost per task | Average task cost | > expected |
|
||||
| Token efficiency | Output/input ratio | < 0.3 |
|
||||
| Cache hit rate | Cached vs new requests | < 50% |
|
||||
| Model distribution | % by model | Unexpected shifts |
|
||||
|
||||
### Aden Dashboard
|
||||
Aden provides built-in cost visualization:
|
||||
- Real-time cost tracking
|
||||
- Budget gauges with alerts
|
||||
- Cost by agent/model breakdown
|
||||
- Historical trends
|
||||
- Anomaly detection
|
||||
|
||||
---
|
||||
|
||||
## Best Practices Summary
|
||||
|
||||
### Do's
|
||||
1. ✅ Set budget limits before deployment
|
||||
2. ✅ Implement automatic degradation
|
||||
3. ✅ Monitor costs in real-time
|
||||
4. ✅ Alert on anomalies
|
||||
5. ✅ Optimize prompts for token efficiency
|
||||
6. ✅ Cache common requests
|
||||
7. ✅ Use appropriate models for task complexity
|
||||
8. ✅ Review costs regularly
|
||||
|
||||
### Don'ts
|
||||
1. ❌ Deploy without budget limits
|
||||
2. ❌ Use GPT-4 for everything
|
||||
3. ❌ Ignore cost metrics
|
||||
4. ❌ Allow unlimited retries
|
||||
5. ❌ Store full context forever
|
||||
6. ❌ Skip testing cost scenarios
|
||||
7. ❌ Forget about tool API costs
|
||||
|
||||
---
|
||||
|
||||
## Conclusion
|
||||
|
||||
AI agent cost management requires:
|
||||
|
||||
1. **Prevention**: Budget limits, degradation policies
|
||||
2. **Detection**: Real-time tracking, anomaly alerts
|
||||
3. **Optimization**: Smart model selection, token efficiency
|
||||
4. **Protection**: Circuit breakers, runaway detection
|
||||
|
||||
Frameworks like Aden with built-in cost controls make this easier, but the principles apply to any agent system. Start with conservative limits and adjust based on real usage patterns.
|
||||
|
||||
---
|
||||
|
||||
*Last updated: January 2025*
|
||||
@@ -0,0 +1,423 @@
|
||||
# AI Agent Observability & Monitoring: The Complete Guide
|
||||
|
||||
*How to know what your AI agents are actually doing*
|
||||
|
||||
---
|
||||
|
||||
AI agents are autonomous systems that make decisions, call tools, and interact with the world. Without proper observability, they become black boxes. This guide covers everything you need to monitor AI agents effectively.
|
||||
|
||||
---
|
||||
|
||||
## Why Agent Observability Is Different
|
||||
|
||||
Traditional application monitoring tracks requests and responses. Agent monitoring must track:
|
||||
|
||||
| Traditional Apps | AI Agents |
|
||||
|------------------|-----------|
|
||||
| Request/Response | Multi-step reasoning chains |
|
||||
| Deterministic behavior | Probabilistic decisions |
|
||||
| Fixed execution paths | Dynamic tool selection |
|
||||
| Predictable costs | Variable LLM spending |
|
||||
| Clear errors | Subtle quality degradation |
|
||||
|
||||
---
|
||||
|
||||
## The Four Pillars of Agent Observability
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ Agent Observability Stack │
|
||||
│ │
|
||||
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────┐ │
|
||||
│ │ Metrics │ │ Logs │ │ Traces │ │
|
||||
│ │ (Numbers) │ │ (Events) │ │ (Execution Flow) │ │
|
||||
│ └─────────────┘ └─────────────┘ └─────────────────────┘ │
|
||||
│ │ │
|
||||
│ ▼ │
|
||||
│ ┌───────────────────────┐ │
|
||||
│ │ Quality Evals │ │
|
||||
│ │ (Output Assessment) │ │
|
||||
│ └───────────────────────┘ │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### 1. Metrics
|
||||
Quantitative measurements over time:
|
||||
- Requests per minute
|
||||
- Success/failure rates
|
||||
- Latency distributions
|
||||
- Token usage
|
||||
- Cost per request
|
||||
- Tool call frequencies
|
||||
|
||||
### 2. Logs
|
||||
Discrete events with context:
|
||||
- Agent decisions
|
||||
- Tool inputs/outputs
|
||||
- Error messages
|
||||
- User interactions
|
||||
- System events
|
||||
|
||||
### 3. Traces
|
||||
End-to-end execution flows:
|
||||
- Full reasoning chains
|
||||
- Token-by-token generation
|
||||
- Tool call sequences
|
||||
- Parent-child relationships
|
||||
- Cross-agent communication
|
||||
|
||||
### 4. Quality Evals
|
||||
Output quality assessment:
|
||||
- Accuracy scoring
|
||||
- Hallucination detection
|
||||
- Task completion rates
|
||||
- User satisfaction
|
||||
- Regression detection
|
||||
|
||||
---
|
||||
|
||||
## Key Metrics to Track
|
||||
|
||||
### Performance Metrics
|
||||
| Metric | Description | Alert Threshold |
|
||||
|--------|-------------|-----------------|
|
||||
| `agent.latency.p50` | Median response time | > 5s |
|
||||
| `agent.latency.p99` | 99th percentile latency | > 30s |
|
||||
| `agent.throughput` | Requests/second | < baseline * 0.5 |
|
||||
| `agent.queue.depth` | Pending requests | > 100 |
|
||||
| `agent.timeout.rate` | Timeout percentage | > 5% |
|
||||
|
||||
### Reliability Metrics
|
||||
| Metric | Description | Alert Threshold |
|
||||
|--------|-------------|-----------------|
|
||||
| `agent.success.rate` | Successful completions | < 95% |
|
||||
| `agent.error.rate` | Error percentage | > 5% |
|
||||
| `agent.retry.rate` | Retries needed | > 10% |
|
||||
| `agent.fallback.rate` | Fallback usage | > 20% |
|
||||
| `agent.circuit.open` | Circuit breaker status | true |
|
||||
|
||||
### Cost Metrics
|
||||
| Metric | Description | Alert Threshold |
|
||||
|--------|-------------|-----------------|
|
||||
| `agent.cost.total` | Total spend | > budget * 0.9 |
|
||||
| `agent.cost.per.request` | Cost per request | > $0.50 |
|
||||
| `agent.tokens.input` | Input tokens used | anomaly detection |
|
||||
| `agent.tokens.output` | Output tokens used | anomaly detection |
|
||||
| `agent.model.usage` | Calls by model | unusual patterns |
|
||||
|
||||
### Quality Metrics
|
||||
| Metric | Description | Alert Threshold |
|
||||
|--------|-------------|-----------------|
|
||||
| `agent.quality.score` | Output quality (0-1) | < 0.7 |
|
||||
| `agent.hallucination.rate` | Detected hallucinations | > 5% |
|
||||
| `agent.task.completion` | Tasks fully completed | < 80% |
|
||||
| `agent.user.satisfaction` | User ratings | < 4.0/5.0 |
|
||||
|
||||
---
|
||||
|
||||
## Logging Best Practices
|
||||
|
||||
### Structured Logging Format
|
||||
```json
|
||||
{
|
||||
"timestamp": "2025-01-15T10:30:00Z",
|
||||
"level": "info",
|
||||
"event": "agent_tool_call",
|
||||
"agent_id": "agent-123",
|
||||
"session_id": "session-456",
|
||||
"trace_id": "trace-789",
|
||||
"tool": "search_web",
|
||||
"input": {"query": "latest AI news"},
|
||||
"output_tokens": 150,
|
||||
"latency_ms": 1200,
|
||||
"success": true
|
||||
}
|
||||
```
|
||||
|
||||
### What to Log
|
||||
|
||||
**Always Log:**
|
||||
- Agent start/stop
|
||||
- Tool calls (name, duration, success)
|
||||
- LLM calls (model, tokens, latency)
|
||||
- Errors and exceptions
|
||||
- Human interventions
|
||||
- Budget events
|
||||
|
||||
**Log Carefully (PII concerns):**
|
||||
- User inputs (may need redaction)
|
||||
- Agent outputs (may contain sensitive data)
|
||||
- Full prompts (can be large)
|
||||
|
||||
**Never Log:**
|
||||
- API keys
|
||||
- User credentials
|
||||
- Full conversation transcripts in production
|
||||
- Raw model weights
|
||||
|
||||
### Log Levels for Agents
|
||||
|
||||
| Level | Use Case |
|
||||
|-------|----------|
|
||||
| DEBUG | Full prompts, token-level details |
|
||||
| INFO | Tool calls, completions, metrics |
|
||||
| WARN | Retries, degradation, budget warnings |
|
||||
| ERROR | Failures, exceptions, circuit breaks |
|
||||
| FATAL | System crashes, unrecoverable errors |
|
||||
|
||||
---
|
||||
|
||||
## Distributed Tracing for Agents
|
||||
|
||||
### Why Tracing Matters
|
||||
Agents involve multiple steps, LLM calls, and tool invocations. Tracing connects them all.
|
||||
|
||||
```
|
||||
Trace: "Process customer refund"
|
||||
├── Span: Agent Initialize (5ms)
|
||||
├── Span: LLM Planning Call (800ms)
|
||||
│ └── Attribute: model=gpt-4, tokens=500
|
||||
├── Span: Tool: fetch_order (200ms)
|
||||
│ └── Attribute: order_id=12345
|
||||
├── Span: Tool: check_policy (50ms)
|
||||
├── Span: LLM Decision Call (600ms)
|
||||
│ └── Attribute: decision=approve
|
||||
├── Span: Tool: process_refund (300ms)
|
||||
└── Span: Agent Complete (10ms)
|
||||
└── Attribute: success=true, cost=$0.08
|
||||
```
|
||||
|
||||
### Key Trace Attributes
|
||||
- `agent.id`: Unique agent identifier
|
||||
- `agent.type`: Agent type/role
|
||||
- `session.id`: User session
|
||||
- `parent.agent`: For multi-agent systems
|
||||
- `llm.model`: Model used
|
||||
- `llm.tokens`: Token counts
|
||||
- `tool.name`: Tool being called
|
||||
- `tool.success`: Tool outcome
|
||||
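As a rough sketch, these attributes map directly onto OpenTelemetry span attributes. The attribute names follow the list above; the `agent` object and `run_tool` are placeholders for your own objects and tool runner.

```python
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

def traced_tool_call(agent, session_id, tool_name, payload):
    with tracer.start_as_current_span("tool_call") as span:
        span.set_attribute("agent.id", agent.id)
        span.set_attribute("agent.type", agent.type)
        span.set_attribute("session.id", session_id)
        span.set_attribute("tool.name", tool_name)
        try:
            result = run_tool(tool_name, payload)  # placeholder tool runner
            span.set_attribute("tool.success", True)
            return result
        except Exception:
            span.set_attribute("tool.success", False)
            raise
```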
|
||||
---
|
||||
|
||||
## Dashboard Design
|
||||
|
||||
### Dashboard 1: Operations Overview
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ Agent Operations │
|
||||
├─────────────────┬─────────────────┬─────────────────────────┤
|
||||
│ Active Agents │ Requests/Min │ Error Rate │
|
||||
│ 42 │ 1,234 │ 0.3% ✓ │
|
||||
├─────────────────┴─────────────────┴─────────────────────────┤
|
||||
│ │
|
||||
│ Request Latency (p50/p99) Success Rate (24h) │
|
||||
│ ████████████████░░░░ ██████████████████████ │
|
||||
│ 1.2s / 4.5s 99.2% │
|
||||
│ │
|
||||
├─────────────────────────────────────────────────────────────┤
|
||||
│ Top Errors Active Alerts │
|
||||
│ • Rate limit exceeded (12) ⚠️ High latency p99 │
|
||||
│ • Tool timeout (5) ⚠️ Budget at 85% │
|
||||
│ • Validation failed (3) │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### Dashboard 2: Cost & Usage
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ Cost & Usage │
|
||||
├─────────────────┬─────────────────┬─────────────────────────┤
|
||||
│ Today's Spend │ Budget Used │ Projected Monthly │
|
||||
│ $127.50 │ 67% │ $3,825 │
|
||||
├─────────────────┴─────────────────┴─────────────────────────┤
|
||||
│ │
|
||||
│ Cost by Model │ Cost by Agent │
|
||||
│ ■ GPT-4: $89 │ ■ Support: $45 │
|
||||
│ ■ Claude: $28 │ ■ Research: $52 │
|
||||
│ ■ GPT-3.5: $10 │ ■ Writer: $30 │
|
||||
│ │
|
||||
├─────────────────────────────────────────────────────────────┤
|
||||
│ Token Usage Trend (7 days) │
|
||||
│ ▁▂▃▄▅▆▇█▇▆▅▄▃▂▁▂▃▄▅▆ │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### Dashboard 3: Quality & Reliability
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ Quality & Reliability │
|
||||
├─────────────────┬─────────────────┬─────────────────────────┤
|
||||
│ Quality Score │ Task Complete │ User Satisfaction │
|
||||
│ 0.92/1.0 │ 94.5% │ 4.6/5.0 │
|
||||
├─────────────────┴─────────────────┴─────────────────────────┤
|
||||
│ │
|
||||
│ Quality Trend (30 days) │ Failure Analysis │
|
||||
│ ████████████████████████ │ ■ LLM errors: 2% │
|
||||
│ ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔ │ ■ Tool errors: 1% │
|
||||
│ Target: 0.90 │ ■ Timeouts: 0.5% │
|
||||
│ │ ■ Logic errors: 0.5% │
|
||||
├─────────────────────────────────────────────────────────────┤
|
||||
│ Recent Quality Issues │
|
||||
│ • Agent-42 hallucination detected (15 min ago) │
|
||||
│ • Agent-17 task incomplete (1 hour ago) │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Alerting Strategy
|
||||
|
||||
### Critical Alerts (Page immediately)
|
||||
- Error rate > 10% for 5 minutes
|
||||
- All agents offline
|
||||
- Budget exceeded
|
||||
- Security anomaly detected
|
||||
|
||||
### Warning Alerts (Notify during business hours)
|
||||
- Error rate > 5% for 15 minutes
|
||||
- Latency p99 > 30s
|
||||
- Budget > 90% of limit
|
||||
- Quality score drops > 10%
|
||||
|
||||
### Informational (Daily digest)
|
||||
- Token usage trends
|
||||
- Cost projections
|
||||
- Quality score changes
|
||||
- New error types detected
|
||||
|
||||
### Alert Fatigue Prevention
|
||||
- Use anomaly detection vs fixed thresholds
|
||||
- Group related alerts
|
||||
- Implement progressive escalation
|
||||
- Review and tune alert thresholds monthly
|
||||
|
||||
---
|
||||
|
||||
## Tool Comparison
|
||||
|
||||
| Tool | Best For | Agent-Specific Features |
|
||||
|------|----------|------------------------|
|
||||
| Datadog | Enterprise, full-stack | APM for LLM calls |
|
||||
| Grafana | Self-hosted, flexibility | Custom dashboards |
|
||||
| LangSmith | LangChain users | Prompt tracing |
|
||||
| Weights & Biases | ML teams | Experiment tracking |
|
||||
| Helicone | LLM-focused | Token analytics |
|
||||
| Aden | Production agents | Built-in observability |
|
||||
|
||||
---
|
||||
|
||||
## How Aden Handles Observability
|
||||
|
||||
Aden provides built-in observability without additional setup:
|
||||
|
||||
### Automatic Collection
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ Aden Observability │
|
||||
│ │
|
||||
│ ┌───────────────┐ ┌───────────────────────────────┐ │
|
||||
│ │ SDK-Wrapped │──────▶│ Event Stream │ │
|
||||
│ │ Nodes │ │ • Metrics • Logs • Traces │ │
|
||||
│ └───────────────┘ └───────────────────────────────┘ │
|
||||
│ │ │
|
||||
│ ▼ │
|
||||
│ ┌───────────────────────────────────────────────────────┐ │
|
||||
│ │ Honeycomb Dashboard │ │
|
||||
│ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ │
|
||||
│ │ │ Metrics │ │ Costs │ │ Quality │ │ Alerts │ │ │
|
||||
│ │ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │ │
|
||||
│ └───────────────────────────────────────────────────────┘ │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### What Aden Tracks Automatically
|
||||
- Every LLM call (model, tokens, latency, cost)
|
||||
- Every tool invocation (name, duration, success)
|
||||
- Agent lifecycle events (start, stop, error)
|
||||
- Budget consumption in real-time
|
||||
- Quality metrics via failure tracking
|
||||
- HITL intervention points
|
||||
|
||||
### Built-in Dashboards
|
||||
- Real-time agent status
|
||||
- Cost breakdown by agent/model
|
||||
- Quality trends over time
|
||||
- Failure analysis
|
||||
- Self-improvement metrics
|
||||
|
||||
### No Configuration Required
|
||||
Unlike external tools, Aden's observability requires no setup:
|
||||
```python
|
||||
# Just wrap your node with the SDK
|
||||
from aden import sdk
|
||||
|
||||
@sdk.node
|
||||
async def my_agent(input):
|
||||
# All metrics automatically collected
|
||||
return await process(input)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Implementation Checklist
|
||||
|
||||
### Phase 1: Basic (Week 1)
|
||||
- [ ] Structured logging in place
|
||||
- [ ] Basic metrics: latency, errors, throughput
|
||||
- [ ] Cost tracking per request
|
||||
- [ ] Simple dashboard with key metrics
|
||||
|
||||
### Phase 2: Comprehensive (Week 2-3)
|
||||
- [ ] Distributed tracing implemented
|
||||
- [ ] Quality evaluation pipeline
|
||||
- [ ] Alerting rules configured
|
||||
- [ ] Full dashboards built
|
||||
|
||||
### Phase 3: Advanced (Week 4+)
|
||||
- [ ] Anomaly detection
|
||||
- [ ] Automated regression detection
|
||||
- [ ] Cost optimization insights
|
||||
- [ ] Self-healing triggers
|
||||
|
||||
---
|
||||
|
||||
## Common Pitfalls
|
||||
|
||||
### 1. Logging Too Much
|
||||
**Problem:** Full prompts in production logs
|
||||
**Solution:** Log hashes or summaries, full content only for debugging
|
||||
|
||||
### 2. Alert Fatigue
|
||||
**Problem:** Too many non-actionable alerts
|
||||
**Solution:** Use anomaly detection, tune thresholds, require action plans
|
||||
|
||||
### 3. Missing Context
|
||||
**Problem:** Can't correlate events across agents
|
||||
**Solution:** Propagate trace IDs, use correlation IDs
|
||||
|
||||
### 4. Ignoring Quality
|
||||
**Problem:** Only track operational metrics
|
||||
**Solution:** Implement quality scoring, track user feedback
|
||||
|
||||
### 5. No Baselines
|
||||
**Problem:** Don't know what "normal" looks like
|
||||
**Solution:** Establish baselines before alerting, use relative thresholds
|
||||
|
||||
---
|
||||
|
||||
## Conclusion
|
||||
|
||||
Effective agent observability requires:
|
||||
|
||||
1. **Metrics**: Know your numbers (latency, errors, cost)
|
||||
2. **Logs**: Capture events with context
|
||||
3. **Traces**: Follow execution flows end-to-end
|
||||
4. **Quality**: Assess output, not just uptime
|
||||
|
||||
Modern agent platforms like Aden provide this built-in. For other frameworks, plan to invest significant effort in observability infrastructure.
|
||||
|
||||
The goal: Never wonder what your agents are doing—always know.
|
||||
|
||||
---
|
||||
|
||||
*Last updated: January 2025*
|
||||
@@ -0,0 +1,551 @@
|
||||
# Building Production AI Agents: From Prototype to Deployment
|
||||
|
||||
*A practical guide to taking AI agents from demo to production*
|
||||
|
||||
---
|
||||
|
||||
Getting an AI agent working in a demo is easy. Getting it to work reliably in production is hard. This guide covers the critical differences and how to bridge the gap.
|
||||
|
||||
---
|
||||
|
||||
## Demo vs Production
|
||||
|
||||
| Aspect | Demo | Production |
|
||||
|--------|------|------------|
|
||||
| Traffic | Just you, testing it | Hundreds/thousands of users |
|
||||
| Uptime | "It worked when I tried" | 99.9% required |
|
||||
| Errors | "Let me restart it" | Must handle gracefully |
|
||||
| Cost | "It's just a demo" | Every dollar matters |
|
||||
| Security | None | Critical |
|
||||
| Monitoring | Print statements | Full observability |
|
||||
| Recovery | Manual restart | Automatic healing |
|
||||
|
||||
---
|
||||
|
||||
## The Production Readiness Checklist
|
||||
|
||||
### 1. Reliability
|
||||
|
||||
- [ ] Retry logic with exponential backoff
|
||||
- [ ] Circuit breakers for failing services
|
||||
- [ ] Graceful degradation (fallbacks)
|
||||
- [ ] Health check endpoints (see the sketch after this list)
|
||||
- [ ] Automatic recovery from crashes
|
||||
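A minimal health-check sketch using FastAPI (an assumption for illustration, not a requirement of any framework). The dependency probes are placeholders for your own database and LLM clients.

```python
from fastapi import FastAPI, Response

app = FastAPI()

@app.get("/healthz")
async def healthz():
    # Liveness: the process is up and serving requests
    return {"status": "ok"}

@app.get("/readyz")
async def readyz(response: Response):
    # Readiness: critical dependencies are reachable (placeholder checks)
    checks = {
        "database": await database_is_reachable(),
        "llm_api": await llm_api_is_reachable(),
    }
    if not all(checks.values()):
        response.status_code = 503
    return {"status": "ok" if all(checks.values()) else "degraded", "checks": checks}
```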
|
||||
### 2. Scalability
|
||||
|
||||
- [ ] Horizontal scaling capability
|
||||
- [ ] Stateless design (or managed state)
|
||||
- [ ] Queue-based processing for bursts
|
||||
- [ ] Database connection pooling
|
||||
- [ ] Caching layer
|
||||
|
||||
### 3. Observability
|
||||
|
||||
- [ ] Structured logging
|
||||
- [ ] Metrics collection
|
||||
- [ ] Distributed tracing
|
||||
- [ ] Alerting rules
|
||||
- [ ] Dashboard for monitoring
|
||||
|
||||
### 4. Security
|
||||
|
||||
- [ ] API authentication
|
||||
- [ ] Input validation
|
||||
- [ ] Output sanitization
|
||||
- [ ] Secrets management
|
||||
- [ ] Audit logging
|
||||
|
||||
### 5. Cost Control
|
||||
|
||||
- [ ] Budget limits
|
||||
- [ ] Usage tracking
|
||||
- [ ] Model degradation policies
|
||||
- [ ] Anomaly detection
|
||||
|
||||
### 6. Human Oversight
|
||||
|
||||
- [ ] HITL checkpoints
|
||||
- [ ] Escalation policies
|
||||
- [ ] Audit trails
|
||||
- [ ] Manual override capability
|
||||
|
||||
---
|
||||
|
||||
## Architecture Patterns
|
||||
|
||||
### Pattern 1: Simple Agent Service
|
||||
|
||||
```
|
||||
┌──────────────────────────────────────────┐
|
||||
│ Agent Service │
|
||||
│ ┌────────────────────────────────────┐ │
|
||||
│ │ Request Handler │ │
|
||||
│ │ ┌──────┐ ┌──────┐ ┌──────┐ │ │
|
||||
│ │ │Validate│→│Agent │→│Format │ │ │
|
||||
│ │ │ Input │ │Execute│ │Output│ │ │
|
||||
│ │ └──────┘ └──────┘ └──────┘ │ │
|
||||
│ └────────────────────────────────────┘ │
|
||||
│ │ │
|
||||
│ ┌─────────────────────────────────────┐│
|
||||
│ │ Dependencies ││
|
||||
│ │ • LLM API • Tools • Database ││
|
||||
│ └─────────────────────────────────────┘│
|
||||
└──────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
**Best for:** Simple use cases, low volume
|
||||
|
||||
### Pattern 2: Queue-Based Processing
|
||||
|
||||
```
|
||||
┌───────┐ ┌───────┐ ┌───────────────┐
|
||||
│Request│───▶│ Queue │───▶│ Agent Workers │
|
||||
│ API │ │ │ │ (N copies) │
|
||||
└───────┘ └───────┘ └───────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────┐
|
||||
│ Results │
|
||||
│ DB │
|
||||
└─────────┘
|
||||
```
|
||||
|
||||
**Best for:** High volume, async processing
|
||||
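A minimal in-process sketch of this pattern with `asyncio.Queue`; in production the queue would usually be an external broker such as Redis or SQS, and `agent.execute` / `results_db.save` are placeholders.

```python
import asyncio

async def worker(name: str, queue: asyncio.Queue):
    while True:
        task = await queue.get()
        try:
            result = await agent.execute(task)            # placeholder agent call
            await results_db.save(task["id"], result)     # placeholder persistence
        except Exception as exc:
            await results_db.save(task["id"], {"error": str(exc)})
        finally:
            queue.task_done()

async def run(tasks):
    queue: asyncio.Queue = asyncio.Queue()
    workers = [asyncio.create_task(worker(f"worker-{i}", queue)) for i in range(4)]
    for task in tasks:
        await queue.put(task)
    await queue.join()   # wait until every queued task is processed
    for w in workers:
        w.cancel()
```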
|
||||
### Pattern 3: Event-Driven Agents
|
||||
|
||||
```
|
||||
┌─────────────┐
|
||||
│ Event Source│─────┐
|
||||
└─────────────┘ │
|
||||
▼
|
||||
┌─────────────┐ ┌─────────┐ ┌─────────────┐
|
||||
│ Event Source│─▶│ Event │─▶│ Agent │
|
||||
└─────────────┘ │ Bus │ │ Processors │
|
||||
└─────────┘ └─────────────┘
|
||||
┌─────────────┐ │
|
||||
│ Event Source│─────┘
|
||||
└─────────────┘
|
||||
```
|
||||
|
||||
**Best for:** Reactive systems, integrations
|
||||
|
||||
### Pattern 4: Full Platform (Aden)
|
||||
|
||||
```
|
||||
┌────────────────────────────────────────────────────────┐
|
||||
│ Aden Platform │
|
||||
│ │
|
||||
│ ┌──────────────┐ ┌──────────────┐ ┌─────────────┐ │
|
||||
│ │ Coding Agent │ │Worker Agents │ │ Dashboard │ │
|
||||
│ │ (Generate) │ │ (Execute) │ │ (Monitor) │ │
|
||||
│ └──────────────┘ └──────────────┘ └─────────────┘ │
|
||||
│ │ │ │ │
|
||||
│ ▼ ▼ ▼ │
|
||||
│ ┌────────────────────────────────────────────────┐ │
|
||||
│ │ Control Plane │ │
|
||||
│ │ • Budget • Policies • Metrics • HITL │ │
|
||||
│ └────────────────────────────────────────────────┘ │
|
||||
│ │ │
|
||||
│ ┌────────────────────────────────────────────────┐ │
|
||||
│ │ Storage Layer │ │
|
||||
│ │ • Events • Policies • Config │ │
|
||||
│ └────────────────────────────────────────────────┘ │
|
||||
└────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
**Best for:** Complex systems, self-improving agents
|
||||
|
||||
---
|
||||
|
||||
## Implementing Reliability
|
||||
|
||||
### Retry Logic
|
||||
```python
|
||||
import asyncio
import logging
import time
from functools import wraps

logger = logging.getLogger(__name__)
# RateLimitError is assumed to come from your LLM client library
|
||||
|
||||
def retry_with_backoff(max_retries=3, base_delay=1, max_delay=60):
|
||||
def decorator(func):
|
||||
@wraps(func)
|
||||
async def wrapper(*args, **kwargs):
|
||||
retries = 0
|
||||
while True:
|
||||
try:
|
||||
return await func(*args, **kwargs)
|
||||
except (RateLimitError, TimeoutError) as e:
|
||||
retries += 1
|
||||
if retries > max_retries:
|
||||
raise
|
||||
|
||||
delay = min(base_delay * (2 ** retries), max_delay)
|
||||
logger.warning(f"Retry {retries}/{max_retries} after {delay}s: {e}")
|
||||
await asyncio.sleep(delay)
|
||||
return wrapper
|
||||
return decorator
|
||||
|
||||
@retry_with_backoff(max_retries=3)
|
||||
async def call_llm(prompt):
|
||||
return await llm_client.complete(prompt)
|
||||
```
|
||||
|
||||
### Circuit Breaker
|
||||
```python
|
||||
class CircuitBreaker:
|
||||
def __init__(self, failure_threshold=5, recovery_time=60):
|
||||
self.failure_count = 0
|
||||
self.failure_threshold = failure_threshold
|
||||
self.recovery_time = recovery_time
|
||||
self.last_failure_time = None
|
||||
self.state = "closed" # closed, open, half-open
|
||||
|
||||
async def call(self, func, *args, **kwargs):
|
||||
if self.state == "open":
|
||||
if time.time() - self.last_failure_time > self.recovery_time:
|
||||
self.state = "half-open"
|
||||
else:
|
||||
raise CircuitOpenError("Circuit breaker is open")
|
||||
|
||||
try:
|
||||
result = await func(*args, **kwargs)
|
||||
if self.state == "half-open":
|
||||
self.state = "closed"
|
||||
self.failure_count = 0
|
||||
return result
|
||||
except Exception as e:
|
||||
self.failure_count += 1
|
||||
self.last_failure_time = time.time()
|
||||
if self.failure_count >= self.failure_threshold:
|
||||
self.state = "open"
|
||||
raise
|
||||
```
|
||||
|
||||
### Graceful Degradation
|
||||
```python
|
||||
async def process_with_fallback(task):
|
||||
try:
|
||||
# Try primary approach
|
||||
return await primary_agent.execute(task)
|
||||
except AgentError:
|
||||
try:
|
||||
# Fall back to simpler approach
|
||||
return await fallback_agent.execute(task)
|
||||
except AgentError:
|
||||
# Last resort: static response
|
||||
return create_static_response(task)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Implementing Observability
|
||||
|
||||
### Structured Logging
|
||||
```python
|
||||
import structlog
|
||||
|
||||
logger = structlog.get_logger()
|
||||
|
||||
async def execute_agent(task):
|
||||
logger.info("agent_execution_started",
|
||||
task_id=task.id,
|
||||
agent_id=agent.id,
|
||||
input_tokens=count_tokens(task.input))
|
||||
|
||||
try:
|
||||
result = await agent.run(task)
|
||||
logger.info("agent_execution_completed",
|
||||
task_id=task.id,
|
||||
duration_ms=duration,
|
||||
output_tokens=count_tokens(result),
|
||||
cost_usd=calculate_cost(result))
|
||||
return result
|
||||
except Exception as e:
|
||||
logger.error("agent_execution_failed",
|
||||
task_id=task.id,
|
||||
error=str(e),
|
||||
error_type=type(e).__name__)
|
||||
raise
|
||||
```
|
||||
|
||||
### Metrics Collection
|
||||
```python
|
||||
from prometheus_client import Counter, Histogram, Gauge
|
||||
|
||||
# Counters
|
||||
agent_requests_total = Counter(
|
||||
'agent_requests_total',
|
||||
'Total agent requests',
|
||||
['agent_id', 'status']
|
||||
)
|
||||
|
||||
# Histograms
|
||||
agent_duration_seconds = Histogram(
|
||||
'agent_duration_seconds',
|
||||
'Agent execution duration',
|
||||
['agent_id']
|
||||
)
|
||||
|
||||
# Gauges
|
||||
agent_active_tasks = Gauge(
|
||||
'agent_active_tasks',
|
||||
'Currently running agent tasks',
|
||||
['agent_id']
|
||||
)
|
||||
|
||||
async def execute_with_metrics(agent, task):
|
||||
agent_active_tasks.labels(agent_id=agent.id).inc()
|
||||
start = time.time()
|
||||
|
||||
try:
|
||||
result = await agent.run(task)
|
||||
agent_requests_total.labels(agent_id=agent.id, status='success').inc()
|
||||
return result
|
||||
except Exception:
|
||||
agent_requests_total.labels(agent_id=agent.id, status='error').inc()
|
||||
raise
|
||||
finally:
|
||||
duration = time.time() - start
|
||||
agent_duration_seconds.labels(agent_id=agent.id).observe(duration)
|
||||
agent_active_tasks.labels(agent_id=agent.id).dec()
|
||||
```
|
||||
|
||||
### Distributed Tracing
|
||||
```python
|
||||
from opentelemetry import trace
|
||||
|
||||
tracer = trace.get_tracer(__name__)
|
||||
|
||||
async def execute_with_tracing(agent, task):
|
||||
with tracer.start_as_current_span("agent_execution") as span:
|
||||
span.set_attribute("agent.id", agent.id)
|
||||
span.set_attribute("task.id", task.id)
|
||||
|
||||
# LLM call
|
||||
with tracer.start_as_current_span("llm_call") as llm_span:
|
||||
llm_span.set_attribute("model", agent.model)
|
||||
result = await call_llm(task.prompt)
|
||||
llm_span.set_attribute("tokens", result.usage.total_tokens)
|
||||
|
||||
# Tool execution
|
||||
with tracer.start_as_current_span("tool_execution") as tool_span:
|
||||
tool_span.set_attribute("tool", tool.name)
|
||||
tool_result = await execute_tool(tool, result)
|
||||
|
||||
return tool_result
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Security Best Practices
|
||||
|
||||
### Input Validation
|
||||
```python
|
||||
from pydantic import BaseModel, validator
|
||||
|
||||
class AgentRequest(BaseModel):
|
||||
task: str
|
||||
context: dict = {}
|
||||
max_tokens: int = 1000
|
||||
|
||||
@validator('task')
|
||||
def validate_task(cls, v):
|
||||
if len(v) > 10000:
|
||||
raise ValueError('Task too long')
|
||||
if contains_injection_attempt(v):
|
||||
raise ValueError('Invalid input detected')
|
||||
return v
|
||||
|
||||
@validator('max_tokens')
|
||||
def validate_max_tokens(cls, v):
|
||||
if v > 4000:
|
||||
raise ValueError('max_tokens too high')
|
||||
return v
|
||||
```
|
||||
|
||||
### Output Sanitization
|
||||
```python
|
||||
def sanitize_output(result):
|
||||
# Remove any leaked secrets
|
||||
result = mask_patterns(result, SECRET_PATTERNS)
|
||||
|
||||
# Validate structure
|
||||
if not is_valid_response(result):
|
||||
raise OutputValidationError("Invalid response structure")
|
||||
|
||||
# Check for harmful content
|
||||
if contains_harmful_content(result):
|
||||
raise ContentPolicyError("Response violates content policy")
|
||||
|
||||
return result
|
||||
```
|
||||
|
||||
### Audit Logging
|
||||
```python
|
||||
async def audit_log(event):
|
||||
log_entry = {
|
||||
"timestamp": datetime.utcnow().isoformat(),
|
||||
"event_type": event.type,
|
||||
"agent_id": event.agent_id,
|
||||
"user_id": event.user_id,
|
||||
"action": event.action,
|
||||
"input_hash": hash_content(event.input), # Don't log full input
|
||||
"output_hash": hash_content(event.output),
|
||||
"metadata": event.metadata
|
||||
}
|
||||
await audit_db.insert(log_entry)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Deployment Strategies
|
||||
|
||||
### Blue-Green Deployment
|
||||
```
|
||||
Load Balancer
|
||||
│
|
||||
┌───────────┴───────────┐
|
||||
│ │
|
||||
┌─────▼─────┐ ┌─────▼─────┐
|
||||
│ Blue │ │ Green │
|
||||
│ (Current) │ │ (New) │
|
||||
└───────────┘ └───────────┘
|
||||
|
||||
1. Deploy new version to Green
|
||||
2. Test Green environment
|
||||
3. Switch traffic Blue → Green
|
||||
4. Keep Blue for rollback
|
||||
```
|
||||
|
||||
### Canary Deployment
|
||||
```
|
||||
Load Balancer
|
||||
│
|
||||
┌───────────┴───────────┐
|
||||
│ 95% 5% │
|
||||
┌─────▼─────┐ ┌─────▼─────┐
|
||||
│ Stable │ │ Canary │
|
||||
│ (v1.0) │ │ (v1.1) │
|
||||
└───────────┘ └───────────┘
|
||||
|
||||
1. Deploy new version as Canary
|
||||
2. Route 5% traffic to Canary
|
||||
3. Monitor metrics
|
||||
4. Gradually increase or rollback
|
||||
```
|
||||
|
||||
### Feature Flags
|
||||
```python
|
||||
async def execute_agent(task, user):
|
||||
if feature_flags.is_enabled("new_agent_v2", user.id):
|
||||
return await agent_v2.execute(task)
|
||||
else:
|
||||
return await agent_v1.execute(task)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Framework Comparison: Production Readiness
|
||||
|
||||
| Feature | DIY | LangChain | CrewAI | Aden |
|
||||
|---------|-----|-----------|--------|------|
|
||||
| Retry logic | Build | Partial | Basic | Built-in |
|
||||
| Circuit breakers | Build | No | No | Built-in |
|
||||
| Health checks | Build | No | No | Built-in |
|
||||
| Monitoring | Build | LangSmith | Build | Built-in |
|
||||
| Cost control | Build | No | No | Built-in |
|
||||
| HITL | Build | Build | Basic | Native |
|
||||
| Self-healing | Build | No | No | Native |
|
||||
| Dashboard | Build | LangSmith | No | Built-in |
|
||||
|
||||
---
|
||||
|
||||
## Testing for Production
|
||||
|
||||
### Unit Tests
|
||||
```python
|
||||
def test_agent_handles_rate_limit():
|
||||
with mock.patch('llm.complete', side_effect=RateLimitError()):
|
||||
result = agent.execute(task)
|
||||
assert result.status == "retried"
|
||||
|
||||
def test_agent_validates_input():
|
||||
with pytest.raises(ValidationError):
|
||||
agent.execute({"task": "x" * 100000}) # Too long
|
||||
```
|
||||
|
||||
### Integration Tests
|
||||
```python
|
||||
async def test_full_agent_flow():
|
||||
# Create test task
|
||||
task = create_test_task()
|
||||
|
||||
# Execute agent
|
||||
result = await agent.execute(task)
|
||||
|
||||
# Verify result
|
||||
assert result.success
|
||||
assert result.output is not None
|
||||
|
||||
# Verify monitoring
|
||||
assert metrics.request_count > 0
|
||||
assert metrics.last_cost < 1.0
|
||||
```
|
||||
|
||||
### Load Tests
|
||||
```python
|
||||
async def load_test_agent():
|
||||
tasks = [create_test_task() for _ in range(100)]
|
||||
|
||||
start = time.time()
|
||||
results = await asyncio.gather(*[
|
||||
agent.execute(task) for task in tasks
|
||||
])
|
||||
duration = time.time() - start
|
||||
|
||||
success_rate = sum(1 for r in results if r.success) / len(results)
|
||||
avg_latency = duration / len(tasks)
|
||||
|
||||
assert success_rate > 0.95
|
||||
assert avg_latency < 5.0 # seconds
|
||||
```
|
||||
|
||||
### Chaos Tests
|
||||
```python
|
||||
async def test_agent_survives_llm_outage():
|
||||
with mock.patch('llm.complete', side_effect=ConnectionError()):
|
||||
# Should use fallback or degrade gracefully
|
||||
result = await agent.execute(task)
|
||||
assert result.status in ["fallback", "degraded"]
|
||||
|
||||
async def test_agent_survives_high_load():
|
||||
# Simulate burst traffic
|
||||
tasks = [create_test_task() for _ in range(1000)]
|
||||
results = await asyncio.gather(*[
|
||||
agent.execute(task) for task in tasks
|
||||
], return_exceptions=True)
|
||||
|
||||
# Should not crash, may throttle
|
||||
errors = [r for r in results if isinstance(r, Exception)]
|
||||
assert len(errors) / len(results) < 0.1 # <10% error rate
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Conclusion
|
||||
|
||||
Production AI agents require:
|
||||
|
||||
1. **Reliability**: Retries, circuit breakers, fallbacks
|
||||
2. **Observability**: Logs, metrics, traces, dashboards
|
||||
3. **Security**: Validation, sanitization, auditing
|
||||
4. **Cost Control**: Budgets, tracking, degradation
|
||||
5. **Human Oversight**: HITL, escalation, override
|
||||
|
||||
Frameworks like Aden provide many of these out of the box. For other frameworks, you'll need to build this infrastructure yourself.
|
||||
|
||||
The gap between demo and production is significant—plan for it from the start.
|
||||
|
||||
---
|
||||
|
||||
*Last updated: January 2025*
|
||||
@@ -0,0 +1,441 @@
|
||||
# Human-in-the-Loop for AI Agents: A Complete Guide
|
||||
|
||||
*Balancing automation with human oversight for safe, effective AI systems*
|
||||
|
||||
---
|
||||
|
||||
Human-in-the-Loop (HITL) is a critical design pattern for AI agents. It ensures that humans remain in control of important decisions while still benefiting from AI automation. This guide covers everything you need to know about implementing HITL in agent systems.
|
||||
|
||||
---
|
||||
|
||||
## What is Human-in-the-Loop?
|
||||
|
||||
HITL refers to **incorporating human judgment into automated AI workflows**. Instead of fully autonomous operation, agents pause at critical points to request human input, approval, or guidance.
|
||||
|
||||
```
|
||||
Agent working → Critical decision → PAUSE → Human reviews → Continue/Modify
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Why HITL Matters
|
||||
|
||||
### Safety
|
||||
- Prevents harmful actions before they occur
|
||||
- Catches AI errors and hallucinations
|
||||
- Maintains accountability
|
||||
|
||||
### Quality
|
||||
- Ensures outputs meet standards
|
||||
- Incorporates domain expertise
|
||||
- Validates complex decisions
|
||||
|
||||
### Trust
|
||||
- Builds user confidence in AI systems
|
||||
- Provides transparency
|
||||
- Enables gradual autonomy increase
|
||||
|
||||
### Compliance
|
||||
- Meets regulatory requirements
|
||||
- Creates audit trails
|
||||
- Maintains human responsibility
|
||||
|
||||
---
|
||||
|
||||
## HITL Patterns
|
||||
|
||||
### Pattern 1: Approval Gates
|
||||
Agent completes work, then waits for human approval before proceeding.
|
||||
|
||||
```
|
||||
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
|
||||
│ Agent │────▶│ APPROVE? │────▶│ Action │
|
||||
│ works │ │ (Human) │ │ taken │
|
||||
└─────────────┘ └─────────────┘ └─────────────┘
|
||||
│
|
||||
│ Reject
|
||||
▼
|
||||
┌─────────────┐
|
||||
│ Revise │
|
||||
└─────────────┘
|
||||
```
|
||||
|
||||
**Use when:** Actions are irreversible or high-impact
|
||||
|
||||
**Example:**
|
||||
- Publishing content
|
||||
- Sending emails to customers
|
||||
- Making financial transactions
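
A minimal approval-gate sketch, assuming hypothetical `request_approval` and `send_email` helpers that surface the draft to a reviewer and perform the final action:

```python
async def send_with_approval(draft_email, request_approval, send_email):
    """Hold the agent's output until a human explicitly approves it."""
    decision = await request_approval(
        summary=f"Send email to {draft_email.recipient}?",
        payload=draft_email,
        options=["approve", "reject", "edit"],
    )
    if decision.choice == "approve":
        return await send_email(draft_email)              # irreversible action
    if decision.choice == "edit":
        return await send_email(decision.edited_payload)  # human-corrected version
    return None  # rejected: nothing leaves the system
```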
|
||||
|
||||
### Pattern 2: Confidence-Based Escalation
|
||||
Agent handles confident decisions autonomously and escalates uncertain ones.
|
||||
|
||||
```
|
||||
Agent decision
|
||||
│
|
||||
▼
|
||||
┌─────────────────┐
|
||||
│ Confidence? │
|
||||
└─────────────────┘
|
||||
│
|
||||
├── High ──▶ Proceed autonomously
|
||||
│
|
||||
└── Low ───▶ Request human input
|
||||
```
|
||||
|
||||
**Use when:** Volume is high and most cases are straightforward
|
||||
|
||||
**Example:**
|
||||
- Customer support ticket routing
|
||||
- Content moderation
|
||||
- Data classification
|
||||
|
||||
### Pattern 3: Sampling/Audit
|
||||
Agent operates autonomously while humans review a sample of decisions.
|
||||
|
||||
```
|
||||
Agent decisions: [1] [2] [3] [4] [5] [6] [7] [8] [9] [10]
|
||||
│ │
|
||||
▼ ▼
|
||||
Human reviews sample
|
||||
│
|
||||
▼
|
||||
Feedback loop to agent
|
||||
```
|
||||
|
||||
**Use when:** Scale makes full review impossible
|
||||
|
||||
**Example:**
|
||||
- Fraud detection review
|
||||
- Quality assurance
|
||||
- Model monitoring
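
A sketch of a sampling audit, assuming decisions are written to a hypothetical `decision_log` and a fixed review rate; the feedback hook on the agent is also an assumption:

```python
import random

REVIEW_RATE = 0.05  # audit roughly 5% of autonomous decisions

def maybe_queue_for_audit(decision, decision_log, review_queue):
    """Log every decision; send a random sample to human reviewers."""
    decision_log.append(decision)
    if random.random() < REVIEW_RATE:
        review_queue.put(decision)

def apply_audit_feedback(agent, reviewed_decisions):
    """Feed reviewer corrections back into the agent's examples or prompts."""
    corrections = [d for d in reviewed_decisions if d.human_verdict != d.agent_verdict]
    agent.add_feedback(corrections)  # hypothetical learning hook
```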
|
||||
|
||||
### Pattern 4: Collaborative Editing
|
||||
Human and agent work together in real-time.
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────┐
|
||||
│ │
|
||||
│ Agent suggests ←→ Human edits │
|
||||
│ │
|
||||
│ Iterative refinement │
|
||||
│ │
|
||||
└─────────────────────────────────────┘
|
||||
```
|
||||
|
||||
**Use when:** Output quality is paramount
|
||||
|
||||
**Example:**
|
||||
- Document drafting
|
||||
- Code review
|
||||
- Creative content
|
||||
|
||||
---
|
||||
|
||||
## Implementing HITL
|
||||
|
||||
### Key Components
|
||||
|
||||
1. **Intervention Points**
|
||||
- Where in the workflow to pause
|
||||
- What triggers human involvement
|
||||
|
||||
2. **Request Interface**
|
||||
- How to present information to humans
|
||||
- What context to provide
|
||||
|
||||
3. **Response Handling**
|
||||
- How to process human input
|
||||
- Timeout and escalation policies
|
||||
|
||||
4. **Learning Loop**
|
||||
- Capturing human decisions for improvement
|
||||
- Reducing future intervention needs
|
||||
|
||||
### Implementation Example
|
||||
|
||||
```python
|
||||
class HITLAgent:
|
||||
def __init__(self, config):
|
||||
self.confidence_threshold = config.confidence_threshold
|
||||
self.timeout = config.human_timeout
|
||||
self.escalation_policy = config.escalation
|
||||
|
||||
async def execute(self, task):
|
||||
# Agent works on task
|
||||
result = await self.process(task)
|
||||
|
||||
# Check if human review needed
|
||||
if self.needs_human_review(result):
|
||||
# Create intervention request
|
||||
request = InterventionRequest(
|
||||
task=task,
|
||||
result=result,
|
||||
context=self.get_context(),
|
||||
options=self.get_options(result),
|
||||
deadline=self.timeout
|
||||
)
|
||||
|
||||
# Wait for human response
|
||||
human_response = await self.request_human_input(request)
|
||||
|
||||
if human_response.approved:
|
||||
return self.finalize(result, human_response.modifications)
|
||||
else:
|
||||
return self.handle_rejection(human_response.feedback)
|
||||
else:
|
||||
return result
|
||||
|
||||
def needs_human_review(self, result):
|
||||
# Determine based on:
|
||||
# - Confidence score
|
||||
# - Action type (high-impact?)
|
||||
# - Policy rules
|
||||
# - Historical patterns
|
||||
# Minimal example: escalate when confidence falls below the threshold
# (result.confidence is an assumed field on the result object)
return result.confidence < self.confidence_threshold
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## HITL in Different Frameworks
|
||||
|
||||
### Basic Implementation (Most Frameworks)
|
||||
```python
|
||||
# Manual HITL implementation
|
||||
def agent_with_approval(task):
|
||||
result = agent.execute(task)
|
||||
|
||||
print(f"Agent proposes: {result}")
|
||||
approved = input("Approve? (y/n): ")
|
||||
|
||||
if approved == 'y':
|
||||
return execute_action(result)
|
||||
else:
|
||||
feedback = input("Feedback: ")
|
||||
return agent.revise(task, feedback)
|
||||
```
|
||||
|
||||
### CrewAI HITL
|
||||
```python
|
||||
from crewai import Agent
|
||||
|
||||
agent = Agent(
|
||||
role="Content Writer",
|
||||
human_input=True, # Enable human input
|
||||
# Agent will request input when uncertain
|
||||
)
|
||||
```
|
||||
|
||||
### AutoGen HITL
|
||||
```python
|
||||
from autogen import UserProxyAgent
|
||||
|
||||
user_proxy = UserProxyAgent(
|
||||
name="human",
|
||||
human_input_mode="ALWAYS", # or "TERMINATE", "NEVER"
|
||||
# Controls when human input is requested
|
||||
)
|
||||
```
|
||||
|
||||
### Aden HITL
|
||||
Aden has native support for HITL with:
|
||||
|
||||
```python
|
||||
# Goal definition includes HITL requirements
|
||||
goal = """
|
||||
Create a customer response system that:
|
||||
1. Drafts responses to customer inquiries
|
||||
2. Requires human approval for:
|
||||
- Refund requests over $100
|
||||
- Escalation decisions
|
||||
- Responses to VIP customers
|
||||
3. Auto-sends low-risk responses after 2-hour timeout
|
||||
4. Learns from approved/rejected responses
|
||||
"""
|
||||
|
||||
# Aden creates intervention nodes automatically
|
||||
# Dashboard shows pending approvals
|
||||
# Configurable timeout and escalation policies
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Timeout and Escalation Strategies
|
||||
|
||||
### What Happens When Humans Don't Respond?
|
||||
|
||||
| Strategy | When to Use | Implementation |
|
||||
|----------|-------------|----------------|
|
||||
| **Wait indefinitely** | Critical decisions | No timeout |
|
||||
| **Auto-approve** | Low-risk, time-sensitive | Proceed after timeout |
|
||||
| **Auto-reject** | Safety-first approach | Cancel after timeout |
|
||||
| **Escalate** | Important but time-sensitive | Notify additional humans |
|
||||
| **Fallback** | Must complete | Use safe default |
|
||||
|
||||
### Escalation Chain Example
|
||||
```
|
||||
Request sent
|
||||
│
|
||||
├── 30 min: Reminder to original reviewer
|
||||
│
|
||||
├── 1 hour: Escalate to team lead
|
||||
│
|
||||
├── 2 hours: Escalate to manager
|
||||
│
|
||||
└── 4 hours: Auto-reject with notification
|
||||
```
|
||||
|
||||
### Timeout Configuration
|
||||
```python
|
||||
intervention_config = {
|
||||
"timeout_minutes": 60,
|
||||
"reminders": [30, 45],
|
||||
"escalation_chain": ["team_lead", "manager"],
|
||||
"fallback_action": "reject",
|
||||
"notification_channels": ["email", "slack"]
|
||||
}
|
||||
```
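
A sketch of how such a configuration might be enforced, using `asyncio.wait_for` plus hypothetical `notify` and `wait_for_response` helpers; the escalation chain and fallback are read from the config above:

```python
import asyncio

async def await_human_decision(request, intervention_config, notify, wait_for_response):
    """Walk the escalation chain, then fall back if nobody responds in time."""
    total_timeout = intervention_config["timeout_minutes"] * 60
    reviewers = ["original_reviewer"] + intervention_config["escalation_chain"]
    per_step = total_timeout / len(reviewers)

    for reviewer in reviewers:
        await notify(reviewer, request, channels=intervention_config["notification_channels"])
        try:
            return await asyncio.wait_for(wait_for_response(request.id), timeout=per_step)
        except asyncio.TimeoutError:
            continue  # escalate to the next reviewer

    # Nobody responded: apply the configured fallback action
    return intervention_config["fallback_action"]
```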
|
||||
|
||||
---
|
||||
|
||||
## Best Practices
|
||||
|
||||
### 1. Minimize Friction
|
||||
- **Good:** Clear, actionable requests
|
||||
- **Bad:** Vague requests requiring investigation
|
||||
|
||||
```
|
||||
# Good
|
||||
"Approve sending this email to john@example.com?
|
||||
Subject: Order Confirmation
|
||||
[View full email] [Approve] [Reject] [Edit]"
|
||||
|
||||
# Bad
|
||||
"Agent completed task. Review?"
|
||||
```
|
||||
|
||||
### 2. Provide Context
|
||||
Include everything humans need to decide:
|
||||
- What the agent did
|
||||
- Why it's asking (confidence, rules)
|
||||
- Relevant history
|
||||
- Available options
|
||||
|
||||
### 3. Make Actions Easy
|
||||
- One-click approval for clear cases
|
||||
- Pre-filled options
|
||||
- Keyboard shortcuts for power users
|
||||
|
||||
### 4. Learn from Decisions
|
||||
Track human decisions to:
|
||||
- Improve agent confidence calibration
|
||||
- Identify patterns for automation
|
||||
- Reduce future intervention needs
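
A small sketch of closing the loop, assuming reviews are logged with the agent's original confidence; the escalation threshold drifts toward whatever confidence level humans actually agree with:

```python
def recalibrate_threshold(review_log, current_threshold, step=0.01):
    """Nudge the escalation threshold based on recent human decisions."""
    recent = review_log[-200:]
    approval_rate = sum(1 for r in recent if r.approved) / max(len(recent), 1)

    if approval_rate > 0.95:
        return max(0.0, current_threshold - step)  # humans nearly always agree: ask less
    if approval_rate < 0.80:
        return min(1.0, current_threshold + step)  # frequent overrides: ask more
    return current_threshold
```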
|
||||
|
||||
### 5. Design for Scale
|
||||
Consider what happens with:
|
||||
- 10 requests per day
|
||||
- 100 requests per day
|
||||
- 1000 requests per day
|
||||
|
||||
### 6. Handle Edge Cases
|
||||
- What if the reviewer is unavailable?
|
||||
- What if multiple reviewers conflict?
|
||||
- What if a reviewer makes a mistake?
|
||||
|
||||
---
|
||||
|
||||
## Metrics to Track
|
||||
|
||||
| Metric | What it Measures | Target |
|
||||
|--------|------------------|--------|
|
||||
| Intervention rate | % of tasks needing human | Minimize over time |
|
||||
| Response time | How fast humans respond | Optimize |
|
||||
| Approval rate | % of requests approved | Monitor for drift |
|
||||
| Override rate | Humans changing agent decisions | Quality indicator |
|
||||
| Timeout rate | % of requests timing out | Keep low |
|
||||
| Learning impact | Reduction in interventions over time | Interventions should keep falling |
|
||||
|
||||
---
|
||||
|
||||
## Common Mistakes
|
||||
|
||||
### 1. Too Many Interventions
|
||||
**Problem:** Humans overwhelmed, start rubber-stamping
|
||||
**Solution:** Reserve for truly important decisions
|
||||
|
||||
### 2. Too Few Interventions
|
||||
**Problem:** Errors slip through, trust erodes
|
||||
**Solution:** Start conservative, reduce over time
|
||||
|
||||
### 3. Poor Context
|
||||
**Problem:** Humans can't make informed decisions
|
||||
**Solution:** Include all relevant information
|
||||
|
||||
### 4. Slow Response
|
||||
**Problem:** Workflow bottlenecked on humans
|
||||
**Solution:** Timeouts, escalation, parallelization
|
||||
|
||||
### 5. No Learning
|
||||
**Problem:** Same interventions forever
|
||||
**Solution:** Track patterns, improve agent
|
||||
|
||||
---
|
||||
|
||||
## HITL and Compliance
|
||||
|
||||
### Audit Trail Requirements
|
||||
```python
|
||||
audit_log = {
|
||||
"timestamp": "2025-01-15T10:30:00Z",
|
||||
"task_id": "task_123",
|
||||
"agent_decision": "send_refund",
|
||||
"intervention_requested": True,
|
||||
"reviewer": "jane@company.com",
|
||||
"review_timestamp": "2025-01-15T10:45:00Z",
|
||||
"decision": "approved",
|
||||
"modifications": None,
|
||||
"rationale": "Within policy limits"
|
||||
}
|
||||
```
|
||||
|
||||
### Regulatory Considerations
|
||||
- GDPR: Human review for automated decisions affecting individuals
|
||||
- Financial: Approval requirements for transactions
|
||||
- Healthcare: Clinical decision support guidelines
|
||||
- AI regulations: Explainability and human oversight requirements
|
||||
|
||||
---
|
||||
|
||||
## Future of HITL
|
||||
|
||||
### Trends
|
||||
1. **Adaptive intervention** - AI learns when to ask
|
||||
2. **Predictive escalation** - Anticipate human needs
|
||||
3. **Collaborative interfaces** - Better human-AI interaction
|
||||
4. **Gradual autonomy** - Systems earn more independence
|
||||
|
||||
### Aden's Approach
|
||||
Aden is built around native HITL:
|
||||
- Intervention nodes are first-class citizens
|
||||
- Dashboard for managing approvals
|
||||
- Configurable policies per agent
|
||||
- Learning from human feedback
|
||||
- Self-improvement reduces intervention over time
|
||||
|
||||
---
|
||||
|
||||
## Conclusion
|
||||
|
||||
Human-in-the-Loop isn't about limiting AI—it's about **building AI systems that humans can trust and control**. The best HITL implementations:
|
||||
|
||||
1. Start conservative and earn autonomy
|
||||
2. Make human interaction effortless
|
||||
3. Learn from every decision
|
||||
4. Balance automation with oversight
|
||||
|
||||
As AI agents become more capable, thoughtful HITL design becomes more important, not less. The goal is collaboration, not competition, between human and artificial intelligence.
|
||||
|
||||
---
|
||||
|
||||
*Last updated: January 2025*
|
||||
@@ -0,0 +1,289 @@
|
||||
# Multi-Agent vs Single-Agent Systems: When to Use Each
|
||||
|
||||
*A practical guide to choosing the right architecture for your AI application*
|
||||
|
||||
---
|
||||
|
||||
When building AI applications, one of the first architectural decisions is whether to use a single agent or multiple agents working together. This guide breaks down when each approach makes sense.
|
||||
|
||||
---
|
||||
|
||||
## Single-Agent Systems
|
||||
|
||||
### What They Are
|
||||
A single agent handles all tasks, tool calls, and decision-making within one unified process.
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────┐
|
||||
│ Single Agent │
|
||||
│ ┌─────────────────────────────────┐ │
|
||||
│ │ LLM Brain │ │
|
||||
│ │ • Reasoning │ │
|
||||
│ │ • Planning │ │
|
||||
│ │ • Tool Selection │ │
|
||||
│ │ • Execution │ │
|
||||
│ └─────────────────────────────────┘ │
|
||||
│ │ │
|
||||
│ ┌───────────────┴───────────────┐ │
|
||||
│ │ Tools │ │
|
||||
│ │ [A] [B] [C] [D] [E] [F] │ │
|
||||
│ └───────────────────────────────┘ │
|
||||
└─────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### Advantages
|
||||
- **Simpler to build**: One agent, one context, one conversation
|
||||
- **Lower latency**: No inter-agent communication overhead
|
||||
- **Easier debugging**: Single point of execution to trace
|
||||
- **Lower cost**: Fewer LLM calls overall
|
||||
- **Unified context**: All information in one place
|
||||
|
||||
### Disadvantages
|
||||
- **Context limits**: One agent must fit everything in its context window
|
||||
- **Jack of all trades**: Hard to optimize for specialized tasks
|
||||
- **Single point of failure**: If the agent fails, everything fails
|
||||
- **Limited parallelism**: Sequential execution of tasks
|
||||
|
||||
### Best Use Cases
|
||||
1. **Simple Q&A chatbots**: Direct user interaction
|
||||
2. **Single-purpose tools**: One task done well
|
||||
3. **Prototype development**: Quick iteration
|
||||
4. **Low-complexity workflows**: Linear task sequences
|
||||
5. **Cost-sensitive applications**: Minimizing LLM usage
|
||||
|
||||
---
|
||||
|
||||
## Multi-Agent Systems
|
||||
|
||||
### What They Are
|
||||
Multiple specialized agents collaborate, each handling specific tasks or domains.
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────┐
|
||||
│ Multi-Agent System │
|
||||
│ │
|
||||
│ ┌───────────┐ ┌───────────┐ ┌───────────┐ │
|
||||
│ │ Agent A │ │ Agent B │ │ Agent C │ │
|
||||
│ │ Researcher│ │ Writer │ │ Reviewer │ │
|
||||
│ │ [🔍] │ │ [✍️] │ │ [✓] │ │
|
||||
│ └─────┬─────┘ └─────┬─────┘ └─────┬─────┘ │
|
||||
│ │ │ │ │
|
||||
│ └───────────────┼───────────────┘ │
|
||||
│ ▼ │
|
||||
│ ┌─────────────────┐ │
|
||||
│ │ Coordinator │ │
|
||||
│ │ / Orchestrator│ │
|
||||
│ └─────────────────┘ │
|
||||
└─────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### Advantages
|
||||
- **Specialization**: Each agent optimized for its domain
|
||||
- **Scalability**: Add new agents for new capabilities
|
||||
- **Parallelism**: Multiple agents work simultaneously
|
||||
- **Fault isolation**: One agent failing doesn't crash everything
|
||||
- **Better context management**: Each agent has focused context
|
||||
|
||||
### Disadvantages
|
||||
- **Coordination complexity**: Managing agent communication
|
||||
- **Higher latency**: Inter-agent handoffs add time
|
||||
- **More expensive**: More LLM calls for coordination
|
||||
- **Debugging difficulty**: Distributed execution traces
|
||||
- **Potential conflicts**: Agents may have conflicting outputs
|
||||
|
||||
### Best Use Cases
|
||||
1. **Complex research tasks**: Multiple perspectives needed
|
||||
2. **Content pipelines**: Research → Write → Edit → Publish
|
||||
3. **Enterprise workflows**: Different departments/functions
|
||||
4. **Self-improving systems**: Separate learning from execution
|
||||
5. **High-reliability systems**: Redundancy and verification
|
||||
|
||||
---
|
||||
|
||||
## Framework Comparison
|
||||
|
||||
| Framework | Single-Agent | Multi-Agent | Coordination Style |
|
||||
|-----------|--------------|-------------|-------------------|
|
||||
| LangChain | Excellent | Basic | Manual chains |
|
||||
| CrewAI | Good | Excellent | Role-based crews |
|
||||
| AutoGen | Good | Excellent | Conversation-based |
|
||||
| Aden | Excellent | Excellent | Goal-driven + Self-improving |
|
||||
|
||||
---
|
||||
|
||||
## Aden's Hybrid Approach
|
||||
|
||||
Aden takes a unique approach by combining both paradigms:
|
||||
|
||||
### The Two-Agent Core
|
||||
```
|
||||
┌────────────────────────────────────────────────────────────┐
|
||||
│ Aden System │
|
||||
│ │
|
||||
│ ┌──────────────────┐ ┌──────────────────────────┐ │
|
||||
│ │ Coding Agent │ │ Worker Agents │ │
|
||||
│ │ (Single, Meta) │────▶│ (Multi, Specialized) │ │
|
||||
│ │ │ │ ┌──────┐ ┌──────┐ │ │
|
||||
│ │ • Generates │ │ │Agent1│ │Agent2│ ... │ │
|
||||
│ │ • Improves │ │ └──────┘ └──────┘ │ │
|
||||
│ │ • Orchestrates │ │ │ │
|
||||
│ └──────────────────┘ └──────────────────────────┘ │
|
||||
│ │ │ │
|
||||
│ └───────────────────────────┘ │
|
||||
│ │ │
|
||||
│ ┌──────────▼──────────┐ │
|
||||
│ │ Control Plane │ │
|
||||
│ │ Budgets • Policies │ │
|
||||
│ └─────────────────────┘ │
|
||||
└────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### How It Works
|
||||
1. **Single Meta-Agent**: The Coding Agent acts as a single intelligent orchestrator
|
||||
2. **Multi-Agent Execution**: Worker Agents are specialized and run in parallel
|
||||
3. **Best of Both**: Simple development (goal-based) with multi-agent power
|
||||
4. **Self-Improving**: The system evolves based on execution feedback
|
||||
|
||||
### When Aden Shines
|
||||
- You want multi-agent power without multi-agent complexity
|
||||
- Your system needs to improve itself over time
|
||||
- You need production controls (budgets, HITL, monitoring)
|
||||
- You're building complex workflows from natural language goals
|
||||
|
||||
---
|
||||
|
||||
## Decision Framework
|
||||
|
||||
Use this flowchart to decide:
|
||||
|
||||
```
|
||||
Start
|
||||
│
|
||||
▼
|
||||
┌─────────────────────┐
|
||||
│ Is the task │
|
||||
│ single-purpose? │
|
||||
└──────────┬──────────┘
|
||||
│
|
||||
Yes ◄─────┴─────► No
|
||||
│ │
|
||||
▼ ▼
|
||||
┌───────────────┐ ┌────────────────────┐
|
||||
│ Single Agent │ │ Do tasks need │
|
||||
│ is sufficient │ │ different expertise?│
|
||||
└───────────────┘ └─────────┬──────────┘
|
||||
│
|
||||
Yes ◄─────┴─────► No
|
||||
│ │
|
||||
▼ ▼
|
||||
┌────────────────┐ ┌────────────────┐
|
||||
│ Multi-Agent │ │ Could benefit │
|
||||
│ Recommended │ │ from parallel │
|
||||
└────────────────┘ │ execution? │
|
||||
└────────┬───────┘
|
||||
│
|
||||
Yes ◄─────┴─────► No
|
||||
│ │
|
||||
▼ ▼
|
||||
┌────────────────┐ ┌────────────┐
|
||||
│ Multi-Agent │ │ Single │
|
||||
│ for speed │ │ Agent OK │
|
||||
└────────────────┘ └────────────┘
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Practical Examples
|
||||
|
||||
### Example 1: Customer Support Bot
|
||||
**Recommended: Single Agent**
|
||||
|
||||
Why: Direct Q&A, unified context, low latency needed
|
||||
```
|
||||
User Question → Single Agent → Answer
|
||||
```
|
||||
|
||||
### Example 2: Research Report Generator
|
||||
**Recommended: Multi-Agent**
|
||||
|
||||
Why: Multiple sources, different skills, quality review
|
||||
```
|
||||
Topic → Researcher Agent → Writer Agent → Editor Agent → Report
|
||||
```
|
||||
|
||||
### Example 3: E-commerce Order Processing
|
||||
**Recommended: Multi-Agent with Aden**
|
||||
|
||||
Why: Multiple systems, needs reliability, self-improvement valuable
|
||||
```
|
||||
Order → Inventory Agent ─┐
|
||||
├──► Coordinator → Fulfillment
|
||||
Payment → Finance Agent ─┘
|
||||
```
|
||||
|
||||
### Example 4: Code Review Assistant
|
||||
**Recommended: Hybrid (Aden)**
|
||||
|
||||
Why: Needs specialization but also coordination
|
||||
```
|
||||
PR → Coding Agent generates → [Security Agent, Style Agent, Logic Agent]
|
||||
→ Synthesize Review
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Migration Strategies
|
||||
|
||||
### Single → Multi-Agent
|
||||
1. Identify natural task boundaries
|
||||
2. Extract specialized agents one at a time
|
||||
3. Add coordination layer
|
||||
4. Implement inter-agent communication
|
||||
5. Add monitoring for new failure modes
|
||||
|
||||
### Multi → Single-Agent
|
||||
1. Consolidate related agents
|
||||
2. Merge context and tools
|
||||
3. Simplify coordination logic
|
||||
4. Reduce LLM calls
|
||||
5. Improve response latency
|
||||
|
||||
---
|
||||
|
||||
## Key Metrics to Track
|
||||
|
||||
| Metric | Single-Agent | Multi-Agent |
|
||||
|--------|--------------|-------------|
|
||||
| Latency | Lower baseline | Higher, but parallelizable |
|
||||
| Cost/Request | Predictable | Variable, needs budgets |
|
||||
| Success Rate | Simpler to optimize | More failure points |
|
||||
| Throughput | Limited by one agent | Scales with agents |
|
||||
| Debugging Time | Linear | Exponential without tooling |
|
||||
|
||||
---
|
||||
|
||||
## Conclusion
|
||||
|
||||
**Choose Single-Agent when:**
|
||||
- Building simple, focused applications
|
||||
- Latency is critical
|
||||
- Budget is tight
|
||||
- Quick iteration is needed
|
||||
|
||||
**Choose Multi-Agent when:**
|
||||
- Tasks require different expertise
|
||||
- Parallelism improves outcomes
|
||||
- Reliability through redundancy matters
|
||||
- System complexity warrants specialization
|
||||
|
||||
**Choose Aden's Hybrid Approach when:**
|
||||
- You want multi-agent power with single-agent simplicity
|
||||
- Self-improvement is valuable
|
||||
- Production controls are essential
|
||||
- You're scaling from prototype to production
|
||||
|
||||
The right architecture depends on your specific use case. Start simple, measure results, and evolve your architecture as needs become clearer.
|
||||
|
||||
---
|
||||
|
||||
*Last updated: January 2025*
|
||||
@@ -0,0 +1,415 @@
|
||||
# Self-Improving vs Static Agents: Understanding the Paradigm Shift
|
||||
|
||||
*Why adaptive AI agents are changing how we build intelligent systems*
|
||||
|
||||
---
|
||||
|
||||
The AI agent landscape is divided between two fundamentally different approaches: **static agents** that execute predefined logic, and **self-improving agents** that evolve based on experience. Understanding this distinction is crucial for choosing the right architecture.
|
||||
|
||||
---
|
||||
|
||||
## The Core Difference
|
||||
|
||||
### Static Agents
|
||||
Static agents follow **predefined workflows** that remain constant until a developer manually updates them. They're predictable but require human intervention to improve.
|
||||
|
||||
```
|
||||
User Request → Fixed Logic → Response
|
||||
↓
|
||||
(If failure)
|
||||
↓
|
||||
Human fixes code
|
||||
↓
|
||||
Redeploy
|
||||
```
|
||||
|
||||
### Self-Improving Agents
|
||||
Self-improving agents **learn from their experiences**, automatically adjusting their behavior based on successes and failures.
|
||||
|
||||
```
|
||||
User Request → Adaptive Logic → Response
|
||||
↓
|
||||
(If failure)
|
||||
↓
|
||||
Capture failure data
|
||||
↓
|
||||
Evolve agent graph
|
||||
↓
|
||||
Auto-redeploy (improved)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Comparison Table
|
||||
|
||||
| Aspect | Static Agents | Self-Improving Agents |
|
||||
|--------|---------------|----------------------|
|
||||
| Behavior change | Manual code updates | Automatic evolution |
|
||||
| Failure response | Log and alert | Learn and adapt |
|
||||
| Improvement cycle | Days/weeks | Minutes/hours |
|
||||
| Human involvement | Required for changes | Optional oversight |
|
||||
| Predictability | High | Moderate (with guardrails) |
|
||||
| Long-term maintenance | Higher | Lower |
|
||||
| Initial complexity | Lower | Higher |
|
||||
|
||||
---
|
||||
|
||||
## How Static Agents Work
|
||||
|
||||
### Architecture
|
||||
```
|
||||
┌─────────────────────────────────────┐
|
||||
│ Static Agent │
|
||||
├─────────────────────────────────────┤
|
||||
│ ┌─────────────────────────────┐ │
|
||||
│ │ Hardcoded Workflow │ │
|
||||
│ │ ┌───┐ ┌───┐ ┌───┐ │ │
|
||||
│ │ │ A │→│ B │→│ C │ │ │
|
||||
│ │ └───┘ └───┘ └───┘ │ │
|
||||
│ └─────────────────────────────┘ │
|
||||
│ │
|
||||
│ • Fixed decision logic │
|
||||
│ • Predefined tool usage │
|
||||
│ • Static prompts │
|
||||
│ • Manual error handling │
|
||||
└─────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### Typical Improvement Cycle
|
||||
|
||||
1. **Agent deployed** with initial logic
|
||||
2. **Failures occur** in production
|
||||
3. **Developers analyze** logs and errors
|
||||
4. **Code changes** made manually
|
||||
5. **Testing** in staging environment
|
||||
6. **Redeployment** to production
|
||||
7. **Repeat** for each issue
|
||||
|
||||
**Timeline:** Days to weeks per improvement
|
||||
|
||||
### Examples of Static Agent Frameworks
|
||||
- LangChain agents
|
||||
- Basic CrewAI implementations
|
||||
- Custom ReAct agents
|
||||
- Simple AutoGen conversations
|
||||
|
||||
---
|
||||
|
||||
## How Self-Improving Agents Work
|
||||
|
||||
### Architecture
|
||||
```
|
||||
┌─────────────────────────────────────────────────┐
|
||||
│ Self-Improving Agent System │
|
||||
├─────────────────────────────────────────────────┤
|
||||
│ ┌─────────────────────────────────────────┐ │
|
||||
│ │ Adaptive Agent Graph │ │
|
||||
│ │ ┌───┐ ┌───┐ ┌───┐ │ │
|
||||
│ │ │ A │→│ B │→│ C │ ← Can change │ │
|
||||
│ │ └───┘ └───┘ └───┘ │ │
|
||||
│ └─────────────────────────────────────────┘ │
|
||||
│ ↑ │
|
||||
│ │ Evolution │
|
||||
│ │ │
|
||||
│ ┌─────────────────────────────────────────┐ │
|
||||
│ │ Coding Agent │ │
|
||||
│ │ • Analyzes failures │ │
|
||||
│ │ • Generates improvements │ │
|
||||
│ │ • Updates agent graph │ │
|
||||
│ └─────────────────────────────────────────┘ │
|
||||
│ ↑ │
|
||||
│ │ │
|
||||
│ ┌─────────────────────────────────────────┐ │
|
||||
│ │ Failure Capture │ │
|
||||
│ │ • Error context │ │
|
||||
│ │ • Input/output data │ │
|
||||
│ │ • User feedback │ │
|
||||
│ └─────────────────────────────────────────┘ │
|
||||
└─────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### Typical Improvement Cycle
|
||||
|
||||
1. **Agent deployed** with initial goal-derived logic
|
||||
2. **Failures captured** automatically with full context
|
||||
3. **Coding agent analyzes** failure patterns
|
||||
4. **Graph evolved** with improved logic
|
||||
5. **Automatic validation** via test cases
|
||||
6. **Auto-redeployment** (with optional human approval)
|
||||
7. **Continuous improvement** as more data arrives
|
||||
|
||||
**Timeline:** Minutes to hours per improvement
|
||||
|
||||
### Examples of Self-Improving Systems
|
||||
- Aden's goal-driven agents
|
||||
- Custom evolutionary architectures
|
||||
- Reinforcement learning agents
|
||||
- Meta-learning systems
|
||||
|
||||
---
|
||||
|
||||
## When Failures Happen
|
||||
|
||||
### Static Agent Response
|
||||
```python
|
||||
# Static agent: failures require manual intervention
|
||||
try:
|
||||
result = agent.execute(task)
|
||||
except AgentError as e:
|
||||
logger.error(f"Agent failed: {e}")
|
||||
alert_team(e) # Human must investigate
|
||||
return fallback_response()
|
||||
|
||||
# Improvement requires:
|
||||
# 1. Developer reviews logs
|
||||
# 2. Identifies root cause
|
||||
# 3. Writes fix
|
||||
# 4. Tests fix
|
||||
# 5. Deploys update
|
||||
```
|
||||
|
||||
### Self-Improving Agent Response
|
||||
```python
|
||||
# Self-improving agent: failures trigger evolution
|
||||
try:
|
||||
result = agent.execute(task)
|
||||
except AgentError as e:
|
||||
# Automatic failure capture
|
||||
failure_data = {
|
||||
"error": e,
|
||||
"input": task,
|
||||
"context": agent.get_context(),
|
||||
"trace": agent.get_execution_trace()
|
||||
}
|
||||
|
||||
# Coding agent evolves the system
|
||||
improved_graph = coding_agent.evolve(
|
||||
current_graph=agent.graph,
|
||||
failure_data=failure_data
|
||||
)
|
||||
|
||||
# Validate and redeploy
|
||||
if improved_graph.passes_tests():
|
||||
agent.update_graph(improved_graph)
|
||||
|
||||
# Retry with improved agent
|
||||
result = agent.execute(task)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Advantages of Each Approach
|
||||
|
||||
### Static Agents: Advantages
|
||||
|
||||
1. **Predictability**
|
||||
- Behavior is deterministic
|
||||
- Easy to test and verify
|
||||
- No unexpected changes
|
||||
|
||||
2. **Simplicity**
|
||||
- Easier to understand
|
||||
- Straightforward debugging
|
||||
- Lower initial complexity
|
||||
|
||||
3. **Control**
|
||||
- Full visibility into logic
|
||||
- Manual approval of all changes
|
||||
- Compliance-friendly
|
||||
|
||||
4. **Stability**
|
||||
- No regression from auto-changes
|
||||
- Consistent performance
|
||||
- Known failure modes
|
||||
|
||||
### Self-Improving Agents: Advantages
|
||||
|
||||
1. **Adaptability**
|
||||
- Improves without human intervention
|
||||
- Handles novel situations
|
||||
- Evolves with changing needs
|
||||
|
||||
2. **Efficiency**
|
||||
- Faster improvement cycles
|
||||
- Reduced developer time
|
||||
- Lower maintenance burden
|
||||
|
||||
3. **Resilience**
|
||||
- Self-healing from failures
|
||||
- Automatic recovery
|
||||
- Continuous optimization
|
||||
|
||||
4. **Scale**
|
||||
- Handles more edge cases
|
||||
- Improves across all instances
|
||||
- Compounds improvements over time
|
||||
|
||||
---
|
||||
|
||||
## Challenges of Each Approach
|
||||
|
||||
### Static Agents: Challenges
|
||||
|
||||
- **Slow iteration**: Days/weeks to improve
|
||||
- **Developer bottleneck**: Changes require engineering time
|
||||
- **Scaling issues**: More edge cases = more manual work
|
||||
- **Technical debt**: Accumulated workarounds
|
||||
|
||||
### Self-Improving Agents: Challenges
|
||||
|
||||
- **Unpredictability**: Behavior may change unexpectedly
|
||||
- **Complexity**: Harder to understand current state
|
||||
- **Guardrails needed**: Must prevent harmful evolution
|
||||
- **Debugging**: Tracing why an agent behaves a certain way
|
||||
|
||||
---
|
||||
|
||||
## Guardrails for Self-Improving Agents
|
||||
|
||||
Successful self-improving systems need safety mechanisms:
|
||||
|
||||
### 1. Human-in-the-Loop Checkpoints
|
||||
```
|
||||
Evolution proposed → Human review → Approve/Reject
|
||||
```
|
||||
|
||||
### 2. Test Case Validation
|
||||
```
|
||||
Improved agent must pass:
|
||||
- Original test cases
|
||||
- Regression tests
|
||||
- New edge case tests
|
||||
```
|
||||
|
||||
### 3. Gradual Rollout
|
||||
```
|
||||
Evolution stages:
|
||||
1. Shadow mode (compare outputs)
|
||||
2. Canary deployment (small traffic)
|
||||
3. Full rollout (all traffic)
|
||||
```
|
||||
|
||||
### 4. Rollback Capability
|
||||
```
|
||||
If metrics degrade:
|
||||
- Automatic revert to previous version
|
||||
- Alert team for investigation
|
||||
```
|
||||
|
||||
### 5. Evolution Constraints
|
||||
```
|
||||
Coding agent cannot:
|
||||
- Remove human checkpoints
|
||||
- Bypass security measures
|
||||
- Exceed cost budgets
|
||||
- Change core objectives
|
||||
```
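
A sketch of how these guardrails might compose into a single deployment gate; `candidate`, `baseline`, and the metric helpers are hypothetical stand-ins for your own evolution pipeline:

```python
async def gated_deploy(candidate, baseline, test_suite, metrics, deploy, rollback):
    """Only promote an evolved agent graph that passes every guardrail."""
    # 1. Test case validation: original + regression tests must still pass
    if not await test_suite.run(candidate):
        return baseline

    # 2. Shadow mode: compare outputs on live traffic without serving them
    shadow_report = await metrics.shadow_compare(candidate, baseline, sample=200)
    if shadow_report.regression_detected:
        return baseline

    # 3. Canary, then full rollout; revert automatically if metrics degrade
    await deploy(candidate, traffic_share=0.05)
    if await metrics.degraded(candidate):
        await rollback(baseline)
        return baseline
    await deploy(candidate, traffic_share=1.0)
    return candidate
```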
|
||||
|
||||
---
|
||||
|
||||
## Real-World Scenarios
|
||||
|
||||
### Scenario 1: Customer Support Agent
|
||||
|
||||
**Static Approach:**
|
||||
- Agent handles known query types
|
||||
- New query types → escalate to human
|
||||
- Developer adds new handlers quarterly
|
||||
- Slow to adapt to trends
|
||||
|
||||
**Self-Improving Approach:**
|
||||
- Agent learns from successful resolutions
|
||||
- New patterns automatically incorporated
|
||||
- Escalation rules evolve based on outcomes
|
||||
- Continuously adapts to customer needs
|
||||
|
||||
### Scenario 2: Data Processing Pipeline
|
||||
|
||||
**Static Approach:**
|
||||
- Fixed schema expectations
|
||||
- New data formats → pipeline breaks
|
||||
- Manual updates for each change
|
||||
- High maintenance burden
|
||||
|
||||
**Self-Improving Approach:**
|
||||
- Learns new data patterns
|
||||
- Automatically adapts to schema changes
|
||||
- Self-corrects processing errors
|
||||
- Lower long-term maintenance
|
||||
|
||||
### Scenario 3: Content Generation
|
||||
|
||||
**Static Approach:**
|
||||
- Fixed style and structure
|
||||
- All changes require prompt updates
|
||||
- No learning from feedback
|
||||
- Consistent but may become stale
|
||||
|
||||
**Self-Improving Approach:**
|
||||
- Learns from editor feedback
|
||||
- Style evolves with brand changes
|
||||
- Improves quality over time
|
||||
- Balances consistency with growth
|
||||
|
||||
---
|
||||
|
||||
## Making the Choice
|
||||
|
||||
### Choose Static Agents When:
|
||||
|
||||
| Situation | Reason |
|
||||
|-----------|--------|
|
||||
| Regulatory requirements | Need audit trail of logic |
|
||||
| Safety-critical systems | Predictability essential |
|
||||
| Simple, stable workflows | No need for adaptation |
|
||||
| Small scale | Manual updates manageable |
|
||||
| High trust requirements | Must explain all decisions |
|
||||
|
||||
### Choose Self-Improving Agents When:
|
||||
|
||||
| Situation | Reason |
|
||||
|-----------|--------|
|
||||
| Rapidly changing requirements | Manual updates too slow |
|
||||
| High volume of edge cases | Can't manually handle all |
|
||||
| Continuous improvement needed | Competitive advantage |
|
||||
| Developer time is limited | Automation essential |
|
||||
| Long-running systems | Evolution provides value |
|
||||
|
||||
---
|
||||
|
||||
## Implementing Self-Improvement
|
||||
|
||||
### With Aden
|
||||
Aden provides built-in self-improvement through:
|
||||
|
||||
1. **Goal-driven generation**: Coding agent creates initial system
|
||||
2. **Failure capture**: Automatic context collection
|
||||
3. **Evolution engine**: Coding agent improves graph
|
||||
4. **Validation**: Test cases verify improvements
|
||||
5. **Deployment**: Automatic with optional approval
|
||||
|
||||
### DIY Approach
|
||||
Building your own requires:
|
||||
|
||||
1. **Failure logging**: Comprehensive context capture
|
||||
2. **Analysis system**: Pattern recognition in failures
|
||||
3. **Code generation**: LLM-based improvement proposals
|
||||
4. **Testing framework**: Automated validation
|
||||
5. **Deployment pipeline**: Safe rollout mechanism
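
A sketch of the failure-logging piece, using a decorator to capture context for later analysis; `failure_store` and the shape of the record are assumptions:

```python
import functools
import traceback

def capture_failures(failure_store):
    """Record full context on every failure so an improvement pass can learn from it."""
    def decorator(fn):
        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            try:
                return await fn(*args, **kwargs)
            except Exception as exc:
                failure_store.append({
                    "function": fn.__name__,
                    "args": repr(args),
                    "kwargs": repr(kwargs),
                    "error": repr(exc),
                    "trace": traceback.format_exc(),
                })
                raise
        return wrapper
    return decorator
```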
|
||||
|
||||
---
|
||||
|
||||
## Conclusion
|
||||
|
||||
The choice between static and self-improving agents depends on your priorities:
|
||||
|
||||
- **Static agents** offer predictability and control, ideal for stable, regulated environments
|
||||
- **Self-improving agents** offer adaptability and efficiency, ideal for dynamic, scaling systems
|
||||
|
||||
The future likely belongs to **hybrid approaches**: core logic that's stable and auditable, with adaptive components that evolve safely within guardrails.
|
||||
|
||||
Frameworks like Aden are pioneering this space, making self-improvement accessible while maintaining the safety and oversight that production systems require.
|
||||
|
||||
---
|
||||
|
||||
*Last updated: January 2025*
|
||||
@@ -0,0 +1,326 @@
|
||||
# Top 10 AI Agent Frameworks in 2025
|
||||
|
||||
*A comprehensive guide to the leading frameworks for building AI agents*
|
||||
|
||||
---
|
||||
|
||||
The AI agent landscape has exploded with options for developers. Whether you're building RAG applications, multi-agent systems, or autonomous workflows, choosing the right framework can significantly impact your project's success.
|
||||
|
||||
This guide objectively compares the top 10 AI agent frameworks based on architecture, use cases, and production readiness.
|
||||
|
||||
---
|
||||
|
||||
## Quick Comparison
|
||||
|
||||
| Framework | Best For | Language | Open Source | Self-Improving |
|
||||
|-----------|----------|----------|-------------|----------------|
|
||||
| LangChain | RAG & LLM apps | Python/JS | Yes | No |
|
||||
| CrewAI | Role-based teams | Python | Yes | No |
|
||||
| AutoGen | Conversational agents | Python | Yes | No |
|
||||
| Aden | Self-evolving agents | Python/TS | Yes | Yes |
|
||||
| PydanticAI | Type-safe workflows | Python | Yes | No |
|
||||
| Swarm | Simple orchestration | Python | Yes | No |
|
||||
| CAMEL | Research simulations | Python | Yes | No |
|
||||
| Letta | Stateful memory | Python | Yes | No |
|
||||
| Mastra | Full-stack AI | TypeScript | Yes | No |
|
||||
| Haystack | Search & RAG | Python | Yes | No |
|
||||
|
||||
---
|
||||
|
||||
## 1. LangChain
|
||||
|
||||
**Category:** Component Library
|
||||
**Best For:** RAG applications, LLM-powered apps
|
||||
**Language:** Python, JavaScript
|
||||
|
||||
### Overview
|
||||
LangChain is one of the most popular frameworks for building LLM applications. It provides a comprehensive set of components for chains, agents, and retrieval-augmented generation.
|
||||
|
||||
### Strengths
|
||||
- Extensive documentation and community
|
||||
- Wide integration ecosystem
|
||||
- Flexible component architecture
|
||||
- Strong RAG capabilities
|
||||
|
||||
### Limitations
|
||||
- Can be complex for simple use cases
|
||||
- Requires manual workflow definition
|
||||
- No built-in self-improvement mechanisms
|
||||
- Debugging can be challenging
|
||||
|
||||
### When to Use
|
||||
Choose LangChain when you need a mature ecosystem with lots of integrations and are building document-centric applications.
|
||||
|
||||
---
|
||||
|
||||
## 2. CrewAI
|
||||
|
||||
**Category:** Multi-Agent Orchestration
|
||||
**Best For:** Role-based agent teams
|
||||
**Language:** Python
|
||||
|
||||
### Overview
|
||||
CrewAI enables you to create teams of AI agents with defined roles that collaborate to accomplish tasks. It emphasizes simplicity and role-based organization.
|
||||
|
||||
### Strengths
|
||||
- Intuitive role-based design
|
||||
- Clean API for team creation
|
||||
- Good for collaborative workflows
|
||||
- Active community
|
||||
|
||||
### Limitations
|
||||
- Predefined collaboration patterns
|
||||
- Limited adaptation to failures
|
||||
- Manual workflow definition required
|
||||
- Scaling can be complex
|
||||
|
||||
### When to Use
|
||||
Choose CrewAI when you have well-defined roles and want agents to collaborate in predictable patterns.
|
||||
|
||||
---
|
||||
|
||||
## 3. AutoGen
|
||||
|
||||
**Category:** Conversational Agents
|
||||
**Best For:** Multi-agent conversations
|
||||
**Language:** Python
|
||||
|
||||
### Overview
|
||||
Microsoft's AutoGen framework specializes in conversational AI agents that can engage in complex multi-turn dialogues and collaborate through conversation.
|
||||
|
||||
### Strengths
|
||||
- Strong conversational capabilities
|
||||
- Microsoft backing and support
|
||||
- Good for dialogue-heavy applications
|
||||
- Flexible agent configuration
|
||||
|
||||
### Limitations
|
||||
- Conversation-centric (less suited for other patterns)
|
||||
- Complex setup for non-conversational tasks
|
||||
- No automatic evolution
|
||||
|
||||
### When to Use
|
||||
Choose AutoGen when your agents primarily need to communicate through natural language conversations.
|
||||
|
||||
---
|
||||
|
||||
## 4. Aden
|
||||
|
||||
**Category:** Self-Evolving Agent Framework
|
||||
**Best For:** Production systems that need to adapt
|
||||
**Language:** Python SDK, TypeScript backend
|
||||
|
||||
### Overview
|
||||
Aden takes a fundamentally different approach by using a coding agent to generate agent systems from natural language goals. When agents fail, the framework automatically captures failure data, evolves the agent graph, and redeploys.
|
||||
|
||||
### Strengths
|
||||
- Goal-driven development (describe outcomes, not workflows)
|
||||
- Automatic self-improvement from failures
|
||||
- Built-in observability and cost controls
|
||||
- Human-in-the-loop support
|
||||
- Production-ready with monitoring dashboard
|
||||
|
||||
### Limitations
|
||||
- Newer framework with growing ecosystem
|
||||
- Requires understanding of goal-driven paradigm
|
||||
- More suited for complex, evolving systems
|
||||
|
||||
### When to Use
|
||||
Choose Aden when you need agents that improve over time, want to define goals rather than workflows, or require production-grade observability and cost management.
|
||||
|
||||
---
|
||||
|
||||
## 5. PydanticAI
|
||||
|
||||
**Category:** Type-Safe Framework
|
||||
**Best For:** Structured, validated outputs
|
||||
**Language:** Python
|
||||
|
||||
### Overview
|
||||
PydanticAI brings type safety and validation to AI agent development, ensuring outputs conform to defined schemas.
|
||||
|
||||
### Strengths
|
||||
- Strong type validation
|
||||
- Clean, Pythonic API
|
||||
- Good for structured outputs
|
||||
- Reliable data handling
|
||||
|
||||
### Limitations
|
||||
- Best for known workflow patterns
|
||||
- Less flexible for dynamic scenarios
|
||||
- No self-adaptation
|
||||
|
||||
### When to Use
|
||||
Choose PydanticAI when output structure and validation are critical to your application.
|
||||
|
||||
---
|
||||
|
||||
## 6. Swarm
|
||||
|
||||
**Category:** Lightweight Orchestration
|
||||
**Best For:** Simple multi-agent setups
|
||||
**Language:** Python
|
||||
|
||||
### Overview
|
||||
OpenAI's Swarm provides a minimal framework for orchestrating multiple agents with simple handoff patterns.
|
||||
|
||||
### Strengths
|
||||
- Extremely simple API
|
||||
- Easy to understand and use
|
||||
- Good for learning
|
||||
- Minimal overhead
|
||||
|
||||
### Limitations
|
||||
- Limited features for production
|
||||
- No built-in monitoring
|
||||
- Simple handoff patterns only
|
||||
|
||||
### When to Use
|
||||
Choose Swarm for prototyping or simple multi-agent interactions where complexity isn't needed.
|
||||
|
||||
---
|
||||
|
||||
## 7. CAMEL
|
||||
|
||||
**Category:** Research Framework
|
||||
**Best For:** Large-scale agent simulations
|
||||
**Language:** Python
|
||||
|
||||
### Overview
|
||||
CAMEL is designed for studying emergent behavior in large-scale multi-agent systems, supporting up to 1M agents.
|
||||
|
||||
### Strengths
|
||||
- Massive scale support
|
||||
- Research-oriented features
|
||||
- Good for studying emergence
|
||||
- Academic backing
|
||||
|
||||
### Limitations
|
||||
- Research-focused, not production-ready
|
||||
- Steep learning curve
|
||||
- Limited production tooling
|
||||
|
||||
### When to Use
|
||||
Choose CAMEL for academic research or when studying large-scale agent interactions.
|
||||
|
||||
---
|
||||
|
||||
## 8. Letta (formerly MemGPT)
|
||||
|
||||
**Category:** Stateful Memory
|
||||
**Best For:** Long-term memory agents
|
||||
**Language:** Python
|
||||
|
||||
### Overview
|
||||
Letta specializes in agents with sophisticated long-term memory, allowing agents to maintain context across extended interactions.
|
||||
|
||||
### Strengths
|
||||
- Advanced memory management
|
||||
- Long-term context retention
|
||||
- Good for personal assistants
|
||||
- Unique memory architecture
|
||||
|
||||
### Limitations
|
||||
- Memory-focused (less general purpose)
|
||||
- Complex memory tuning
|
||||
- Specific use cases
|
||||
|
||||
### When to Use
|
||||
Choose Letta when long-term memory and context retention are primary requirements.
|
||||
|
||||
---
|
||||
|
||||
## 9. Mastra
|
||||
|
||||
**Category:** Full-Stack AI Framework
|
||||
**Best For:** TypeScript developers
|
||||
**Language:** TypeScript
|
||||
|
||||
### Overview
|
||||
Mastra provides a TypeScript-first approach to building AI applications with integrated tooling.
|
||||
|
||||
### Strengths
|
||||
- TypeScript native
|
||||
- Full-stack integration
|
||||
- Modern developer experience
|
||||
- Good for web applications
|
||||
|
||||
### Limitations
|
||||
- TypeScript only
|
||||
- Smaller ecosystem
|
||||
- Less mature than alternatives
|
||||
|
||||
### When to Use
|
||||
Choose Mastra when building TypeScript applications and want tight integration with web technologies.
|
||||
|
||||
---
|
||||
|
||||
## 10. Haystack
|
||||
|
||||
**Category:** Search & RAG
|
||||
**Best For:** Document processing pipelines
|
||||
**Language:** Python
|
||||
|
||||
### Overview
|
||||
Haystack excels at building search and retrieval systems, with strong support for document processing pipelines.
|
||||
|
||||
### Strengths
|
||||
- Excellent for search applications
|
||||
- Strong document processing
|
||||
- Production-tested
|
||||
- Good pipeline abstractions
|
||||
|
||||
### Limitations
|
||||
- Search/RAG focused
|
||||
- Less suited for general agents
|
||||
- Pipeline-centric design
|
||||
|
||||
### When to Use
|
||||
Choose Haystack when building search, Q&A, or document processing systems.
|
||||
|
||||
---
|
||||
|
||||
## Decision Framework
|
||||
|
||||
### Choose Based on Your Primary Need
|
||||
|
||||
| Need | Recommended Framework |
|
||||
|------|----------------------|
|
||||
| RAG / Document apps | LangChain, Haystack |
|
||||
| Role-based teams | CrewAI |
|
||||
| Conversational agents | AutoGen |
|
||||
| Self-improving systems | Aden |
|
||||
| Type-safe outputs | PydanticAI |
|
||||
| Simple prototypes | Swarm |
|
||||
| Research simulations | CAMEL |
|
||||
| Long-term memory | Letta |
|
||||
| TypeScript apps | Mastra |
|
||||
|
||||
### Choose Based on Production Requirements
|
||||
|
||||
| Requirement | Best Options |
|
||||
|-------------|--------------|
|
||||
| Self-healing & adaptation | Aden |
|
||||
| Mature ecosystem | LangChain |
|
||||
| Cost management built-in | Aden |
|
||||
| Simple deployment | Swarm, CrewAI |
|
||||
| Enterprise support | LangChain, AutoGen |
|
||||
| Real-time monitoring | Aden |
|
||||
|
||||
---
|
||||
|
||||
## Conclusion
|
||||
|
||||
The "best" framework depends on your specific needs:
|
||||
|
||||
- **For most RAG applications:** LangChain remains the standard
|
||||
- **For collaborative agent teams:** CrewAI offers intuitive design
|
||||
- **For systems that need to evolve:** Aden's self-improving approach is unique
|
||||
- **For research:** CAMEL provides scale
|
||||
- **For simplicity:** Swarm is hard to beat
|
||||
|
||||
Consider your production requirements, team expertise, and whether you need agents that can adapt and improve over time when making your decision.
|
||||
|
||||
---
|
||||
|
||||
*Last updated: January 2025*
|
||||
@@ -0,0 +1,165 @@
|
||||
# 🚀 Getting Started Challenge
|
||||
|
||||
Welcome to Aden! This challenge will help you get familiar with our project and community. Complete all tasks to earn your first badge!
|
||||
|
||||
**Difficulty:** Beginner
|
||||
**Time:** ~30 minutes
|
||||
**Prerequisites:** GitHub account
|
||||
|
||||
---
|
||||
|
||||
## Part 1: Join the Aden Community (10 points)
|
||||
|
||||
### Task 1.1: Star the Repository ⭐
|
||||
Show your support by starring our repo!
|
||||
|
||||
1. Go to [github.com/adenhq/hive](https://github.com/adenhq/hive)
|
||||
2. Click the **Star** button in the top right
|
||||
3. **Screenshot** your starred repo (showing the star count)
|
||||
|
||||
### Task 1.2: Watch the Repository 👁️
|
||||
Stay updated with our latest changes!
|
||||
|
||||
1. Click the **Watch** button
|
||||
2. Select **"All Activity"** to get notifications
|
||||
3. **Screenshot** your watch settings
|
||||
|
||||
### Task 1.3: Fork the Repository 🍴
|
||||
Create your own copy to experiment with!
|
||||
|
||||
1. Click the **Fork** button
|
||||
2. Keep the default settings and create the fork
|
||||
3. **Screenshot** your forked repository
|
||||
|
||||
### Task 1.4: Join Discord 💬
|
||||
Connect with our community!
|
||||
|
||||
1. Join our [Discord server](https://discord.com/invite/MXE49hrKDk)
|
||||
2. Introduce yourself in `#introductions`
|
||||
3. **Screenshot** your introduction message
|
||||
|
||||
---
|
||||
|
||||
## Part 2: Explore Aden (15 points)
|
||||
|
||||
### Task 2.1: README Scavenger Hunt 🔍
|
||||
Find the answers to these questions by reading our README:
|
||||
|
||||
1. What are the **three LLM providers** Aden supports out of the box?
|
||||
2. How many **MCP tools** does the Hive Control Plane provide?
|
||||
3. What is the name of the **frontend dashboard**?
|
||||
4. In the "How It Works" section, what is **Step 5**?
|
||||
5. What city is Aden made with passion in?
|
||||
|
||||
### Task 2.2: Architecture Quiz 🏗️
|
||||
Based on the architecture diagram in the README:
|
||||
|
||||
1. What are the three databases in the Storage Layer?
|
||||
2. Name two components inside an "SDK-Wrapped Node"
|
||||
3. What connects the Control Plane to the Dashboard?
|
||||
4. Where does "Failure Data" flow to in the diagram?
|
||||
|
||||
### Task 2.3: Comparison Challenge 📊
|
||||
From the Comparison Table, answer:
|
||||
|
||||
1. What category is CrewAI in?
|
||||
2. What's the Aden difference compared to LangChain?
|
||||
3. Which framework focuses on "emergent behavior in large-scale simulations"?
|
||||
|
||||
---
|
||||
|
||||
## Part 3: Quick Code Exploration (15 points)
|
||||
|
||||
### Task 3.1: Project Structure 📁
|
||||
Clone your fork and explore the codebase:
|
||||
|
||||
```bash
|
||||
git clone https://github.com/YOUR_USERNAME/hive.git
|
||||
cd hive
|
||||
```
|
||||
|
||||
Answer these questions:
|
||||
|
||||
1. What is the main frontend folder called?
|
||||
2. What is the main backend folder called?
|
||||
3. What file would you edit to configure the application?
|
||||
4. What's the Docker command to start all services (hint: check README)?
|
||||
|
||||
### Task 3.2: Find the Features 🎯
|
||||
Look through the codebase to find:
|
||||
|
||||
1. Where are the MCP tools defined? (provide the file path)
|
||||
2. What port does the API run on? (hint: check README or docker-compose)
|
||||
3. Find one TypeScript interface related to agents (provide file path and interface name)
|
||||
|
||||
---
|
||||
|
||||
## Part 4: Creative Challenge (10 points)
|
||||
|
||||
### Task 4.1: Agent Idea 💡
|
||||
Aden can build self-improving agents for any use case. Propose ONE creative agent idea:
|
||||
|
||||
1. **Name:** Give your agent a catchy name
|
||||
2. **Goal:** What problem does it solve? (2-3 sentences)
|
||||
3. **Self-Improvement:** How would it get better over time when things fail?
|
||||
4. **Human-in-the-Loop:** When would it need human input?
|
||||
|
||||
Example format:
|
||||
```
|
||||
Name: DocBot
|
||||
Goal: Automatically keeps documentation in sync with code changes.
|
||||
Monitors PRs and updates relevant docs.
|
||||
Self-Improvement: When docs get rejected in review, it learns the feedback
|
||||
and adjusts its writing style and coverage.
|
||||
Human-in-the-Loop: Major architectural changes require human approval
|
||||
before doc updates go live.
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Submission Checklist
|
||||
|
||||
Before submitting, make sure you have:
|
||||
|
||||
- [ ] Screenshots from Part 1 (Star, Watch, Fork, Discord)
|
||||
- [ ] Answers to all Part 2 questions
|
||||
- [ ] Answers to all Part 3 questions
|
||||
- [ ] Your creative agent idea from Part 4
|
||||
|
||||
### How to Submit
|
||||
|
||||
1. Create a GitHub Gist at [gist.github.com](https://gist.github.com)
|
||||
2. Name it `aden-getting-started-YOURNAME.md`
|
||||
3. Include all your answers and screenshots (use image hosting like imgur for screenshots)
|
||||
4. Email the Gist link to `careers@adenhq.com`
|
||||
- Subject: `[Getting Started Challenge] Your Name`
|
||||
- Include your GitHub username
|
||||
|
||||
---
|
||||
|
||||
## Scoring
|
||||
|
||||
| Section | Points |
|
||||
|---------|--------|
|
||||
| Part 1: Community | 10 |
|
||||
| Part 2: Explore | 15 |
|
||||
| Part 3: Code | 15 |
|
||||
| Part 4: Creative | 10 |
|
||||
| **Total** | **50** |
|
||||
|
||||
**Passing score:** 40+ points
|
||||
|
||||
---
|
||||
|
||||
## What's Next?
|
||||
|
||||
After completing this challenge, choose your specialization:
|
||||
|
||||
- **Backend Engineers:** [🧠 Architecture Deep Dive](./02-architecture-deep-dive.md)
|
||||
- **AI/ML Engineers:** [🤖 Build Your First Agent](./03-build-your-first-agent.md)
|
||||
- **Frontend Engineers:** [🎨 Frontend Challenge](./04-frontend-challenge.md)
|
||||
- **DevOps Engineers:** [🔧 DevOps Challenge](./05-devops-challenge.md)
|
||||
|
||||
---
|
||||
|
||||
Good luck! We're excited to see your submissions! 🎉
|
||||
@@ -0,0 +1,195 @@
|
||||
# 🧠 Architecture Deep Dive Challenge
|
||||
|
||||
Test your understanding of Aden's architecture and backend systems. This challenge is perfect for backend engineers who want to contribute to the core framework.
|
||||
|
||||
**Difficulty:** Intermediate
|
||||
**Time:** 1-2 hours
|
||||
**Prerequisites:** Completion of [Getting Started](./01-getting-started.md); familiarity with Node.js/TypeScript
|
||||
|
||||
---
|
||||
|
||||
## Part 1: System Architecture (20 points)
|
||||
|
||||
### Task 1.1: Component Mapping 🗺️
|
||||
Study the Aden architecture and answer:
|
||||
|
||||
1. Describe the data flow from when a user defines a goal to when worker agents execute. Include all major components.
|
||||
|
||||
2. Explain the "self-improvement loop" - what happens when an agent fails?
|
||||
|
||||
3. What's the difference between:
|
||||
- Coding Agent vs Worker Agent
|
||||
- STM (Short-Term Memory) vs LTM (Long-Term Memory)
|
||||
- Hot storage vs Cold storage for events
|
||||
|
||||
### Task 1.2: Database Design 💾
|
||||
Aden uses three databases. For each, explain:
|
||||
|
||||
1. **TimescaleDB:** What type of data is stored? Why TimescaleDB specifically?
|
||||
2. **MongoDB:** What is stored here? Why a document database?
|
||||
3. **PostgreSQL:** What is its primary purpose?
|
||||
|
||||
### Task 1.3: Real-time Communication 📡
|
||||
Answer these about the real-time systems:
|
||||
|
||||
1. What protocol connects the SDK to the Hive backend for policy updates?
|
||||
2. How does the dashboard receive live agent metrics?
|
||||
3. What is the heartbeat interval for SDK health checks?
|
||||
|
||||
---
|
||||
|
||||
## Part 2: Code Analysis (25 points)
|
||||
|
||||
### Task 2.1: API Routes 🛣️
|
||||
Explore the backend code and document:
|
||||
|
||||
1. List all the main API route prefixes (e.g., `/user`, `/v1/control`, etc.)
|
||||
2. For the `/v1/control` routes, what are the main endpoints and their purposes?
|
||||
3. What authentication method is used for API requests?
|
||||
|
||||
### Task 2.2: MCP Tools Deep Dive 🔧
|
||||
The MCP server provides 19 tools. Categorize them and answer:
|
||||
|
||||
1. List all **Budget tools** (tools with "budget" in the name)
|
||||
2. List all **Analytics tools**
|
||||
3. List all **Policy tools**
|
||||
4. Pick ONE tool and explain:
|
||||
- What parameters does it accept?
|
||||
- What does it return?
|
||||
- When would the Coding Agent use it?
|
||||
|
||||
### Task 2.3: Event Specification 📊
|
||||
Find and analyze the SDK event specification:
|
||||
|
||||
1. What are the four event types that can be sent from SDK to server?
|
||||
2. For a `MetricEvent`, list at least 5 fields that are captured
|
||||
3. What is "Layer 0 content capture" and when is it used?
|
||||
|
||||
---
|
||||
|
||||
## Part 3: Design Questions (25 points)
|
||||
|
||||
### Task 3.1: Scaling Scenario 📈
|
||||
Imagine Aden needs to handle 1000 concurrent agents across 50 teams:
|
||||
|
||||
1. Which components would be the bottleneck? Why?
|
||||
2. How would you horizontally scale the system?
|
||||
3. What database optimizations would you recommend?
|
||||
4. How would you ensure team data isolation at scale?
|
||||
|
||||
### Task 3.2: New Feature Design 🆕
|
||||
Design a new feature: **Agent Collaboration Logs**
|
||||
|
||||
Requirements:
|
||||
- Track when agents communicate with each other
|
||||
- Store the message content and metadata
|
||||
- Support querying by time range, agent, or conversation thread
|
||||
- Real-time streaming to the dashboard
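
For orientation, one possible shape for a single collaboration-log record is sketched below; the field names and types are illustrative assumptions for this exercise, not an existing Aden schema.

```typescript
// Illustrative only: field names and types are assumptions, not Aden's actual schema.
interface AgentCollaborationLogEntry {
  id: string;                          // unique event id
  threadId: string;                    // conversation thread the message belongs to
  fromAgentId: string;                 // sending agent
  toAgentId: string;                   // receiving agent
  sentAt: string;                      // ISO-8601 timestamp, used for time-range queries
  content: string;                     // message body
  metadata: Record<string, unknown>;   // tokens, model, latency, etc.
}
```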

Provide:
1. Database schema design (which DB and table structure)
2. API endpoint design (routes and payloads)
3. How would this integrate with existing event batching?

### Task 3.3: Failure Handling ⚠️
The self-healing loop is core to Aden. Design the detailed flow:

1. How should failures be categorized (types of failures)?
2. What data should be captured for the Coding Agent to improve?
3. How do you prevent infinite failure loops?
4. When should the system escalate to human intervention?

---

## Part 4: Practical Implementation (30 points)

### Task 4.1: Write a New MCP Tool 🛠️
Create a new MCP tool called `hive_agent_performance_report`:

**Requirements:**
- Returns performance metrics for a specific agent over a time period
- Includes: total requests, success rate, avg latency, total cost
- Accepts parameters: `agent_id`, `start_time`, `end_time`
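
For orientation, a tool definition could take roughly the following shape; this is a hedged sketch with an assumed JSON-Schema-style input schema, not the actual MCP server code, and your answer does not have to match it.

```typescript
// Sketch only: the registration mechanism and exact schema format depend on the
// MCP server implementation; treat all names and fields as assumptions.
const hiveAgentPerformanceReport = {
  name: 'hive_agent_performance_report',
  description: 'Return performance metrics for one agent over a time period.',
  inputSchema: {
    type: 'object',
    properties: {
      agent_id: { type: 'string', description: 'Agent to report on' },
      start_time: { type: 'string', description: 'ISO-8601 start of the window' },
      end_time: { type: 'string', description: 'ISO-8601 end of the window' },
    },
    required: ['agent_id', 'start_time', 'end_time'],
  },
};
```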

Provide:
1. Tool definition (name, description, input schema)
2. Implementation pseudocode or actual TypeScript
3. Example request and response

### Task 4.2: Budget Enforcement Algorithm 💰
Implement the logic for budget enforcement:

```typescript
interface BudgetCheck {
  action: 'allow' | 'block' | 'throttle' | 'degrade';
  reason: string;
  degradedModel?: string;
  delayMs?: number;
}

function checkBudget(
  currentSpend: number,
  budgetLimit: number,
  requestedModel: string,
  estimatedCost: number
): BudgetCheck {
  // Your implementation here
}
```

Requirements:
- Block if the budget would be exceeded
- Throttle (2000ms delay) if ≥95% used
- Degrade to a cheaper model if ≥80% used
- Allow otherwise
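
To make the expected behavior concrete, here is one minimal sketch of how the checks could be ordered, using the `BudgetCheck` interface above. The fallback model name is an assumption, and your answer does not have to match this.

```typescript
// Minimal sketch of the rules above; the fallback model name is an assumption.
function checkBudget(
  currentSpend: number,
  budgetLimit: number,
  requestedModel: string,
  estimatedCost: number
): BudgetCheck {
  const projected = currentSpend + estimatedCost;
  const usedRatio = budgetLimit > 0 ? currentSpend / budgetLimit : 1;

  if (projected > budgetLimit) {
    return { action: 'block', reason: 'Request would exceed the budget limit' };
  }
  if (usedRatio >= 0.95) {
    return { action: 'throttle', reason: '95% of budget used', delayMs: 2000 };
  }
  if (usedRatio >= 0.8 && requestedModel !== 'gpt-4o-mini') {
    return {
      action: 'degrade',
      reason: '80% of budget used',
      degradedModel: 'gpt-4o-mini', // assumed cheaper fallback
    };
  }
  return { action: 'allow', reason: 'Within budget' };
}
```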

### Task 4.3: Event Aggregation Query 📈
Write a SQL query for TimescaleDB that:

1. Aggregates metrics by hour for the last 24 hours
2. Groups by model and provider
3. Calculates: total tokens, total cost, avg latency, request count
4. Orders by total cost descending

---

## Submission Checklist

- [ ] All Part 1 architecture answers
- [ ] All Part 2 code analysis answers
- [ ] All Part 3 design documents
- [ ] All Part 4 implementations

### How to Submit

1. Create a GitHub Gist with your answers
2. Name it `aden-architecture-YOURNAME.md`
3. Include any code files as separate files in the Gist
4. Email to `careers@adenhq.com`
   - Subject: `[Architecture Challenge] Your Name`

---

## Scoring

| Section | Points |
|---------|--------|
| Part 1: System Architecture | 20 |
| Part 2: Code Analysis | 25 |
| Part 3: Design Questions | 25 |
| Part 4: Implementation | 30 |
| **Total** | **100** |

**Passing score:** 75+ points

---

## Bonus Points (+20)

- Identify a bug or improvement in the actual codebase and open an issue
- Submit a PR fixing a documentation issue
- Create a diagram of your design using Mermaid or similar

---

Good luck! We're looking for engineers who can think systematically about distributed systems! 🏗️
@@ -0,0 +1,277 @@
# 🤖 Build Your First Agent Challenge

Get hands-on with AI agents! This challenge is for AI/ML engineers who want to understand agent development and contribute to Aden's agent ecosystem.

**Difficulty:** Intermediate
**Time:** 2-3 hours
**Prerequisites:** Complete [Getting Started](./01-getting-started.md), Python experience, basic LLM knowledge

---

## Part 1: Agent Fundamentals (20 points)

### Task 1.1: Core Concepts 📚
Answer these questions about Aden's agent architecture:

1. What is a "node" in Aden's architecture? How does it differ from a traditional function?

2. Explain the SDK-wrapped node concept. What four capabilities does every node get automatically?

3. What's the difference between:
   - A Coding Agent and a Worker Agent
   - Goal-driven vs workflow-driven development
   - Predefined edges vs dynamic connections

4. Why does Aden generate "connection code" instead of using a fixed graph structure?

### Task 1.2: Memory Systems 🧠
Aden has sophisticated memory management:

1. Describe the three types of memory available to agents:
   - Shared Memory
   - STM (Short-Term Memory)
   - LTM (Long-Term Memory / RLM)

2. When would an agent use each type?

3. How does "Session Local memory isolation" work?

### Task 1.3: Human-in-the-Loop 🙋
Explain the HITL system:

1. What triggers a human intervention point?
2. What happens if a human doesn't respond within the timeout?
3. List three scenarios where HITL would be essential

---

## Part 2: Agent Design (25 points)

### Task 2.1: Design a Multi-Agent System 🎭
Design a **Content Marketing Agent System** with multiple worker agents:

**Goal:** Automatically create and publish blog posts based on company news

Requirements:
- Must use at least 3 specialized worker agents
- Include human approval before publishing
- Handle failures gracefully

Provide:
1. **Agent Diagram:** Show all agents and how they connect
2. **Agent Descriptions:** For each agent, describe:
   - Name and role
   - Inputs and outputs
   - Tools it needs
   - Failure scenarios
3. **Human Checkpoints:** Where would humans intervene?
4. **Self-Improvement:** How would this system learn from failures?

### Task 2.2: Goal Definition 🎯
Write a natural language goal that a user might give to create your system:

```
Example Goal:
"Create a system that monitors our company RSS feed for news,
writes engaging blog posts about each news item, gets approval
from the marketing team, and publishes to our WordPress site.
If a post is rejected, learn from the feedback to write better
posts in the future."
```

Your goal should be:
- Clear and specific
- Include success criteria
- Mention failure handling
- Specify human touchpoints

### Task 2.3: Test Cases 📋
Design 5 test cases for your agent system:

| Test Case | Input | Expected Output | Success Criteria |
|-----------|-------|-----------------|------------------|
| Happy Path | Normal news item | Published blog post | Post live on site |
| ... | ... | ... | ... |

Include at least:
- 1 happy path
- 2 edge cases
- 2 failure scenarios

---

## Part 3: Practical Implementation (30 points)

### Task 3.1: Agent Pseudocode 💻
Write pseudocode for ONE of your worker agents:

```python
class ContentWriterAgent:
    """
    Agent that takes news items and writes blog posts.
    """

    def __init__(self, config):
        # Initialize with tools, memory, LLM access
        pass

    async def execute(self, input_data):
        # Main execution logic
        pass

    async def handle_failure(self, error, context):
        # How to handle different types of failures
        pass

    async def learn_from_feedback(self, feedback):
        # How to improve based on rejection feedback
        pass
```

Provide detailed pseudocode with:
- LLM calls and prompts
- Memory reads/writes
- Tool usage
- Error handling

### Task 3.2: Prompt Engineering 📝
Write the actual prompts for your agent:

1. **System Prompt:** The core instructions for your agent
2. **Task Prompt Template:** How tasks are presented to the agent
3. **Feedback Learning Prompt:** How rejection feedback is processed

Example format:
```
SYSTEM PROMPT:
You are a professional content writer for {company_name}...

TASK PROMPT:
Given the following news item:
{news_content}

Write a blog post that...

FEEDBACK PROMPT:
Your previous post was rejected with this feedback:
{feedback}

Analyze what went wrong and...
```

### Task 3.3: Tool Definitions 🔧
Define the tools your agent needs:

```python
tools = [
    {
        "name": "search_company_knowledge",
        "description": "Search internal knowledge base for relevant context",
        "parameters": {
            "query": "string - search query",
            "limit": "int - max results (default 5)"
        },
        "returns": "List of relevant documents"
    },
    # Add more tools...
]
```

Define at least 3 tools with:
- Clear name and description
- Input parameters with types
- Return value description
- Example usage

---

## Part 4: Advanced Challenges (25 points)

### Task 4.1: Failure Evolution Design 🔄
Design the self-improvement mechanism in detail:

1. **Failure Classification:** Create a taxonomy of failures for your agent
   ```
   - LLM Failures: rate limit, content filter, hallucination
   - Tool Failures: API down, invalid response, timeout
   - Logic Failures: wrong output format, missing data
   - Human Rejection: quality issues, off-brand, factual error
   ```

2. **Learning Storage:** What data do you store for each failure type?

3. **Evolution Strategy:** How does the Coding Agent use failure data to improve?

4. **Guardrails:** What prevents the system from making things worse?

### Task 4.2: Cost Optimization 💰
Your agent system will be called frequently. Design cost optimizations:

1. **Model Selection:** When to use GPT-4 vs GPT-3.5 vs Claude Haiku?
2. **Caching Strategy:** What can be cached to reduce LLM calls?
3. **Batching:** How can you batch operations for efficiency?
4. **Budget Rules:** Design budget rules for your system

### Task 4.3: Observability Dashboard 📊
Design what metrics should be tracked for your agent system:

1. **Performance Metrics:** (at least 5)
2. **Quality Metrics:** (at least 3)
3. **Cost Metrics:** (at least 3)
4. **Alert Conditions:** When should the system alert humans?

---

## Submission Checklist

- [ ] All Part 1 concept answers
- [ ] Complete multi-agent design (Part 2)
- [ ] Implementation code/pseudocode (Part 3)
- [ ] Advanced challenge solutions (Part 4)

### How to Submit

1. Create a GitHub Gist with your answers
2. Name it `aden-agent-challenge-YOURNAME.md`
3. Include code files separately
4. If you created diagrams, include images
5. Email to `careers@adenhq.com`
   - Subject: `[Agent Challenge] Your Name`

---

## Scoring

| Section | Points |
|---------|--------|
| Part 1: Fundamentals | 20 |
| Part 2: Design | 25 |
| Part 3: Implementation | 30 |
| Part 4: Advanced | 25 |
| **Total** | **100** |

**Passing score:** 75+ points

---

## Bonus Points (+25)

- **+10:** Actually implement a working prototype using any framework
- **+10:** Create a demo video of your agent in action
- **+5:** Submit a PR adding your agent as a template to the repo

---

## Example Agent Templates

Need inspiration? Here are some agent ideas:

1. **Research Agent:** Gathers information from multiple sources
2. **Code Review Agent:** Reviews PRs and suggests improvements
3. **Customer Support Agent:** Handles support tickets with escalation
4. **Data Pipeline Agent:** Monitors and fixes data quality issues
5. **Meeting Agent:** Summarizes meetings and creates action items

---

Good luck! We're excited to see your creative agent designs! 🤖✨
@@ -0,0 +1,277 @@
# 🎨 Frontend Challenge

Build beautiful, functional interfaces for AI agent management! This challenge is for frontend engineers who want to contribute to Honeycomb, Aden's dashboard.

**Difficulty:** Intermediate
**Time:** 1-2 hours
**Prerequisites:** Complete [Getting Started](./01-getting-started.md), React/TypeScript experience

---

## Part 1: Codebase Exploration (15 points)

### Task 1.1: Tech Stack Analysis 🔍
Explore the `honeycomb/` directory and answer:

1. What React version is used?
2. What styling solution is used? (Tailwind, CSS Modules, etc.)
3. What state management approach is used?
4. What charting library is used for analytics?
5. How does the frontend communicate with the backend in real-time?

### Task 1.2: Component Structure 📁
Map out the component architecture:

1. List the main page components (routes)
2. Find and describe 3 reusable components
3. Where are TypeScript types defined for agent data?
4. How is authentication handled in the frontend?

### Task 1.3: Design System 🎨
Analyze the UI patterns:

1. What UI component library is used? (Radix, shadcn, etc.)
2. Find 3 custom components that aren't from a library
3. What color scheme/theme approach is used?
4. How are loading and error states typically handled?

---

## Part 2: UI/UX Analysis (20 points)

### Task 2.1: Dashboard Critique 📊
Based on the codebase and agent control types, analyze what the dashboard likely shows:

1. What key metrics would you display for agent monitoring?
2. How would you visualize the agent graph/connections?
3. What real-time updates are most important to show?
4. Critique: What could be improved in the current approach?

### Task 2.2: User Flow Design 🔄
Design the user flow for this feature:

**Feature:** "Create New Agent from Goal"

Map out:
1. Entry point (where does the user start?)
2. Step-by-step screens needed
3. Form fields and validation
4. Success/error states
5. How to show agent generation progress

Provide a wireframe (can be ASCII, hand-drawn, or Figma):

```
+----------------------------------+
|  Create New Agent                |
|----------------------------------|
|  Step 1: Define Your Goal        |
|  +----------------------------+  |
|  | Describe what you want     |  |
|  | your agent to achieve...   |  |
|  +----------------------------+  |
|                                  |
|  [ ] Include human checkpoints   |
|  [ ] Enable cost controls        |
|                                  |
|  [Cancel]           [Next Step]  |
+----------------------------------+
```

### Task 2.3: Accessibility Audit ♿
Consider accessibility for the agent dashboard:

1. List 5 accessibility requirements for a data-heavy dashboard
2. How would you make real-time updates accessible?
3. What keyboard navigation is essential?
4. How would you handle screen readers for the agent graph visualization?

---

## Part 3: Implementation Challenges (35 points)

### Task 3.1: Build a Component 🧱
Create a React component: `AgentStatusCard`

Requirements:
- Display agent name, status, and key metrics
- Status: online (green), degraded (yellow), offline (red), unknown (gray)
- Show: requests/min, success rate, avg latency, cost today
- Include a mini sparkline chart for requests over last hour
- Expandable to show more details
- TypeScript with proper types

```tsx
interface AgentStatusCardProps {
  agent: {
    id: string;
    name: string;
    status: 'online' | 'degraded' | 'offline' | 'unknown';
    metrics: {
      requestsPerMinute: number;
      successRate: number;
      avgLatency: number;
      costToday: number;
      requestHistory: number[]; // last 60 minutes
    };
  };
  onExpand?: () => void;
  expanded?: boolean;
}

export function AgentStatusCard({ agent, onExpand, expanded }: AgentStatusCardProps) {
  // Your implementation
}
```
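
For reference, a card built against this interface can be exercised with mock data along these lines; all values are invented for illustration (useful for a Storybook story or a quick smoke test).

```tsx
// Example usage with invented mock data; AgentStatusCard and AgentStatusCardProps
// from the snippet above are assumed to be in scope.
const mockAgent: AgentStatusCardProps['agent'] = {
  id: 'agent-123',
  name: 'Content Writer',
  status: 'online',
  metrics: {
    requestsPerMinute: 42,
    successRate: 0.985,
    avgLatency: 820, // ms
    costToday: 3.17, // USD
    requestHistory: Array.from({ length: 60 }, () => Math.round(Math.random() * 60)),
  },
};

<AgentStatusCard agent={mockAgent} expanded={false} onExpand={() => console.log('expand')} />;
```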

### Task 3.2: Real-time Hook 🔌
Create a custom hook for real-time agent metrics:

```tsx
interface UseAgentMetricsOptions {
  agentId: string;
  refreshInterval?: number;
}

interface UseAgentMetricsResult {
  metrics: AgentMetrics | null;
  isLoading: boolean;
  error: Error | null;
  lastUpdated: Date | null;
}

function useAgentMetrics(options: UseAgentMetricsOptions): UseAgentMetricsResult {
  // Your implementation
  // Should handle:
  // - WebSocket subscription for real-time updates
  // - Fallback to polling if WebSocket unavailable
  // - Cleanup on unmount
  // - Error handling and retry logic
}
```

### Task 3.3: Data Visualization 📈
Design and implement a cost breakdown chart component:

Requirements:
- Show cost by model (GPT-4, Claude, etc.) as a donut/pie chart
- Show cost over time as a line/area chart
- Toggle between daily/weekly/monthly views
- Animate transitions between views
- Show tooltip with details on hover

Provide:
1. Component interface/props
2. Implementation (can use Recharts, Vega, or any library)
3. Example mock data
4. Responsive design considerations

---

## Part 4: Advanced Frontend (30 points)

### Task 4.1: Agent Graph Visualization 🕸️
Design how to visualize the agent graph:

**Challenge:** Show a dynamic graph where:
- Nodes are agents
- Edges are connections between agents
- Real-time data flows are animated
- Users can zoom, pan, and click for details

Provide:
1. Library choice and justification (D3, React Flow, Cytoscape, etc.)
2. Component architecture
3. Performance considerations for 50+ nodes
4. Interaction design (how users explore the graph)
5. Code sketch for the main component

### Task 4.2: Optimistic UI for Budget Controls 💰
Implement optimistic UI for budget updates:

**Scenario:** User changes an agent's budget limit
- Update should appear instantly
- Backend validation may reject the change
- Must handle race conditions with real-time updates

Provide:
1. State management approach
2. Rollback mechanism on failure
3. Conflict resolution strategy
4. User feedback design

```tsx
function useBudgetUpdate(agentId: string) {
  // Your implementation showing:
  // - Optimistic update
  // - Server sync
  // - Rollback on error
  // - Conflict handling
}
```
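
One minimal sketch of the optimistic-update and rollback portion is shown below. The `PATCH /v1/agents/:id/budget` endpoint, the response shape, and the extra `initialLimit` argument are assumptions for illustration, not the actual API, and conflict handling with live WebSocket updates is left to your answer.

```tsx
import { useCallback, useRef, useState } from 'react';

// Sketch only: endpoint, payload, and response shape are assumed for illustration.
export function useBudgetUpdate(agentId: string, initialLimit: number) {
  const [limit, setLimit] = useState(initialLimit);
  const lastConfirmed = useRef(initialLimit);

  const updateLimit = useCallback(async (nextLimit: number) => {
    const previous = lastConfirmed.current;
    setLimit(nextLimit); // optimistic: reflect the change immediately

    try {
      const res = await fetch(`/v1/agents/${agentId}/budget`, {
        method: 'PATCH',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ limit: nextLimit }),
      });
      if (!res.ok) throw new Error(`Budget update rejected (${res.status})`);
      const confirmed = await res.json(); // treat the server response as the source of truth
      lastConfirmed.current = confirmed.limit ?? nextLimit;
      setLimit(lastConfirmed.current);
    } catch (err) {
      setLimit(previous); // roll back to the last confirmed value on failure
      lastConfirmed.current = previous;
      throw err;
    }
  }, [agentId]);

  return { limit, updateLimit };
}
```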

### Task 4.3: Performance Optimization ⚡
The dashboard shows data for 100+ agents with real-time updates.

Design optimizations for:

1. **Rendering:** How to prevent unnecessary re-renders?
2. **Data:** How to handle high-frequency WebSocket updates?
3. **Memory:** How to prevent memory leaks with subscriptions?
4. **Initial Load:** How to prioritize visible content?

Provide specific techniques and code examples for each.
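
As a starting point for the rendering and data questions, one common approach is to memoize each card and coalesce high-frequency updates before they reach React state. This is a sketch under stated assumptions (it assumes the `AgentStatusCard` component from Task 3.1 is in scope and that the caller passes a stable `subscribe` function), not the only acceptable answer.

```tsx
import { memo, useEffect, useState } from 'react';

// Re-render a card only when its own agent prop changes (assumes AgentStatusCard is in scope).
const MemoAgentCard = memo(AgentStatusCard);

// Coalesce bursts of WebSocket messages into at most one state update per second.
function useBatchedUpdates<T>(subscribe: (onMessage: (msg: T) => void) => () => void) {
  const [batch, setBatch] = useState<T[]>([]);

  useEffect(() => {
    let pending: T[] = [];
    const flush = setInterval(() => {
      if (pending.length > 0) {
        setBatch(pending);
        pending = [];
      }
    }, 1000);
    const unsubscribe = subscribe((msg) => pending.push(msg));
    return () => {
      clearInterval(flush); // avoid leaking the timer
      unsubscribe();        // avoid leaking the subscription
    };
  }, [subscribe]); // pass a stable subscribe (e.g. wrapped in useCallback)

  return batch;
}
```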

---

## Submission Checklist

- [ ] All Part 1 exploration answers
- [ ] Part 2 wireframes and design analysis
- [ ] Part 3 component implementations
- [ ] Part 4 advanced designs

### How to Submit

1. Create a GitHub Gist with your answers
2. Name it `aden-frontend-YOURNAME.md`
3. Include code files as separate Gist files
4. If you created working code, include a CodeSandbox/StackBlitz link
5. Email to `careers@adenhq.com`
   - Subject: `[Frontend Challenge] Your Name`

---

## Scoring

| Section | Points |
|---------|--------|
| Part 1: Exploration | 15 |
| Part 2: UI/UX | 20 |
| Part 3: Implementation | 35 |
| Part 4: Advanced | 30 |
| **Total** | **100** |

**Passing score:** 75+ points

---

## Bonus Points (+20)

- **+10:** Create a working prototype in CodeSandbox
- **+5:** Submit a PR improving existing UI
- **+5:** Create a Figma design for a new feature

---

## Resources

- [React Documentation](https://react.dev)
- [Tailwind CSS](https://tailwindcss.com)
- [Radix UI](https://radix-ui.com)
- [Recharts](https://recharts.org)
- [React Flow](https://reactflow.dev) (for graph visualization)

---

Good luck! We love engineers who care about user experience! 🎨✨
@@ -0,0 +1,309 @@
# 🔧 DevOps Challenge

Master the deployment and operations of AI agent infrastructure! This challenge is for DevOps and Platform engineers who want to ensure Aden runs reliably at scale.

**Difficulty:** Advanced
**Time:** 2-3 hours
**Prerequisites:** Complete [Getting Started](./01-getting-started.md), Docker, Linux, CI/CD experience

---

## Part 1: Infrastructure Analysis (20 points)

### Task 1.1: Docker Deep Dive 🐳
Analyze the Aden Docker setup:

1. List all services defined in `docker-compose.yml`
2. What's the purpose of `docker-compose.override.yml`?
3. How is hot reload enabled for development?
4. What volumes are mounted and why?
5. What networking mode is used between services?

### Task 1.2: Service Dependencies 🔗
Map the service dependencies:

1. Create a dependency diagram showing which services depend on which
2. What's the startup order? Does it matter?
3. What happens if MongoDB is unavailable?
4. What happens if Redis is unavailable?
5. Which services are stateless vs stateful?

### Task 1.3: Configuration Management ⚙️
Analyze how configuration works:

1. How does `config.yaml` get generated?
2. What environment variables are required?
3. How are secrets managed? (API keys, database passwords)
4. What's the difference between dev and prod configs?

---

## Part 2: Deployment Scenarios (25 points)

### Task 2.1: Production Deployment Plan 📋
Design a production deployment for a company with:
- 100 active agents
- 10,000 LLM requests/day
- 99.9% uptime requirement
- Multi-region support needed

Provide:
1. **Infrastructure diagram** (cloud provider of your choice)
2. **Service sizing** (CPU, memory for each component)
3. **Database setup** (primary/replica, backups)
4. **Load balancing strategy**
5. **Estimated monthly cost**

### Task 2.2: Kubernetes Migration 🚢
Convert the Docker Compose setup to Kubernetes:

1. Create a Kubernetes deployment manifest for the Hive backend
2. Create a Service and Ingress for external access
3. Design a ConfigMap for configuration
4. Create a Secret for sensitive data
5. Set up a HorizontalPodAutoscaler

```yaml
# Provide your manifests here
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hive-backend
spec:
  # Your implementation
```

### Task 2.3: High Availability Design 🔄
Design for high availability:

1. How would you handle backend service failures?
2. How would you handle database failover?
3. What's your strategy for zero-downtime deployments?
4. How would you handle WebSocket connections during rolling updates?
5. Design a disaster recovery plan

---

## Part 3: CI/CD Pipeline (25 points)

### Task 3.1: GitHub Actions Pipeline 🔄
Create a complete CI/CD pipeline:

```yaml
# .github/workflows/ci-cd.yml
name: Aden CI/CD

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

jobs:
  # Your implementation should include:
  # - Linting
  # - Type checking
  # - Unit tests
  # - Integration tests
  # - Build Docker images
  # - Push to registry
  # - Deploy to staging (on develop)
  # - Deploy to production (on main, with approval)
```

Include:
1. Separate jobs for frontend and backend
2. Matrix testing for multiple Node versions
3. Docker layer caching
4. Deployment gates/approvals
5. Rollback strategy

### Task 3.2: Testing Strategy 🧪
Design the testing infrastructure:

1. **Unit Tests:** What to test? How to mock LLM calls?
2. **Integration Tests:** How to test with real databases?
3. **E2E Tests:** What user flows to test?
4. **Load Tests:** How to simulate agent traffic?
5. **Chaos Tests:** What failures to simulate?

Provide example test configurations for each type.

### Task 3.3: Environment Management 🌍
Design environment strategy:

| Environment | Purpose | Data | Who Can Access |
|-------------|---------|------|----------------|
| Local | Development | Mock | Developers |
| Dev | Integration | Sanitized | Engineering |
| Staging | Pre-prod | Copy of prod | Engineering + QA |
| Production | Live | Real | Restricted |

For each environment, specify:
1. How it's provisioned
2. How data is managed
3. How deployments happen
4. Access control

---

## Part 4: Observability & Operations (30 points)

### Task 4.1: Monitoring Stack 📊
Design a comprehensive monitoring solution:

1. **Metrics:** What to collect? (list at least 10 key metrics)
2. **Logs:** Logging strategy and aggregation
3. **Traces:** Distributed tracing for agent flows
4. **Dashboards:** Design 3 key dashboards

```yaml
# Provide a docker-compose addition for monitoring
services:
  prometheus:
    # Your config
  grafana:
    # Your config
  # Add more as needed
```

### Task 4.2: Alerting Rules 🚨
Create alerting rules for critical scenarios:

```yaml
# Prometheus alerting rules
groups:
  - name: aden-critical
    rules:
      - alert: HighErrorRate
        expr: # Your expression
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate detected"
          description: # Your description

# Add more alerts for:
# - Service down
# - High latency
# - Budget exceeded
# - Database connection issues
# - Memory pressure
```

Create at least 8 alert rules covering different failure modes.

### Task 4.3: Incident Response 🆘
Create an incident response runbook:

**Scenario:** Agent response times spike to 30 seconds (normal: 2 seconds)

Provide:
1. **Detection:** How was this discovered?
2. **Triage:** Initial investigation steps
3. **Diagnosis:** Decision tree for root causes
4. **Resolution:** Steps for each root cause
5. **Post-mortem:** Template for incident review

```markdown
# Runbook: High Agent Latency

## Symptoms
- Agent response times > 10s
- Dashboard showing degraded status

## Initial Triage
1. Check [ ] Is this affecting all agents or specific ones?
2. Check [ ] Is the backend healthy? (health endpoint)
3. Check [ ] Are databases responsive?
...

## Diagnostic Steps
...

## Resolution Steps
### If LLM Provider Issue:
...

### If Database Issue:
...
```

---

## Part 5: Security Hardening (Bonus - 20 points)

### Task 5.1: Security Audit 🔒
Perform a security analysis:

1. **Network:** What ports are exposed? Are they necessary?
2. **Secrets:** How are secrets currently handled? Improvements?
3. **Authentication:** How is API auth implemented?
4. **Container Security:** What image scanning would you add?
5. **Database Security:** What hardening is needed?

### Task 5.2: Compliance Checklist ✅
For SOC 2 compliance, what changes are needed?

1. Access control improvements
2. Audit logging requirements
3. Encryption requirements
4. Data retention policies
5. Incident response requirements

---

## Submission Checklist

- [ ] Part 1 infrastructure analysis
- [ ] Part 2 deployment designs and manifests
- [ ] Part 3 CI/CD pipeline YAML
- [ ] Part 4 monitoring and alerting configs
- [ ] (Bonus) Part 5 security analysis

### How to Submit

1. Create a GitHub Gist with your answers
2. Name it `aden-devops-YOURNAME.md`
3. Include all YAML/configuration files
4. Include any diagrams (use Mermaid, ASCII, or image links)
5. Email to `careers@adenhq.com`
   - Subject: `[DevOps Challenge] Your Name`

---

## Scoring

| Section | Points |
|---------|--------|
| Part 1: Infrastructure | 20 |
| Part 2: Deployment | 25 |
| Part 3: CI/CD | 25 |
| Part 4: Observability | 30 |
| Part 5: Security (Bonus) | +20 |
| **Total** | **100 (+20)** |

**Passing score:** 75+ points

---

## Bonus Points (+15)

- **+5:** Set up a working local Kubernetes cluster with Aden
- **+5:** Create a Terraform module for cloud deployment
- **+5:** Submit a PR improving deployment documentation

---

## Resources

- [Docker Documentation](https://docs.docker.com)
- [Kubernetes Documentation](https://kubernetes.io/docs)
- [GitHub Actions](https://docs.github.com/en/actions)
- [Prometheus](https://prometheus.io/docs)
- [Grafana](https://grafana.com/docs)

---

Good luck! We're looking for engineers who keep systems running smoothly! 🔧✨
@@ -0,0 +1,46 @@
# Aden Engineering Challenges

Welcome to the Aden Engineering Challenges! These quizzes are designed for students and applicants who want to join the Aden team or contribute to our open-source projects.

## How It Works

1. **Choose your track** based on your interests and skill level
2. **Complete the challenges** in order
3. **Submit your work** as instructed in each challenge
4. **Get noticed** by the Aden team!

## Available Tracks

| Track | Difficulty | Time Estimate | Best For |
|-------|------------|---------------|----------|
| [🚀 Getting Started](./01-getting-started.md) | Beginner | 30 min | Everyone - Start Here! |
| [🧠 Architecture Deep Dive](./02-architecture-deep-dive.md) | Intermediate | 1-2 hours | Backend Engineers |
| [🤖 Build Your First Agent](./03-build-your-first-agent.md) | Intermediate | 2-3 hours | AI/ML Engineers |
| [🎨 Frontend Challenge](./04-frontend-challenge.md) | Intermediate | 1-2 hours | Frontend Engineers |
| [🔧 DevOps Challenge](./05-devops-challenge.md) | Advanced | 2-3 hours | DevOps/Platform Engineers |

## Why Complete These Challenges?

- 📚 **Learn** about cutting-edge AI agent technology
- 🏆 **Stand out** in your application to Aden
- 🤝 **Connect** with the Aden engineering team
- 🌟 **Contribute** to an exciting open-source project
- 💼 **Showcase** your skills with real-world projects

## Submission Guidelines

After completing challenges, submit your work by:

1. Creating a GitHub Gist with your answers
2. Emailing the link to `careers@adenhq.com` with subject: `[Engineering Challenge] Your Name - Track Name`
3. Including your GitHub username in the email

## Getting Help

- Join our [Discord](https://discord.com/invite/MXE49hrKDk) and ask in #applicant-challenges
- Check out the [documentation](https://docs.adenhq.com/)
- Review the [README](../../README.md) for project overview

---

**Ready to begin?** Start with [🚀 Getting Started](./01-getting-started.md)!