Merge branch 'main' into feat/core-framework

Timothy
2026-01-19 20:08:01 -08:00
57 changed files with 7375 additions and 36 deletions
+9
@@ -43,10 +43,19 @@ pnpm-debug.log*
# Testing
coverage/
.nyc_output/
.pytest_cache/
# TypeScript
*.tsbuildinfo
# Python
__pycache__/
*.py[cod]
*$py.class
*.egg-info/
.eggs/
*.egg
# Misc
*.local
.cache/
+232 -3
@@ -9,6 +9,20 @@
[![Twitter Follow](https://img.shields.io/twitter/follow/teamaden?logo=X&color=%23f5f5f5)](https://x.com/aden_hq)
[![LinkedIn](https://custom-icon-badges.demolab.com/badge/LinkedIn-0A66C2?logo=linkedin-white&logoColor=fff)](https://www.linkedin.com/company/teamaden/)
<p align="center">
<img src="https://img.shields.io/badge/AI_Agents-Self--Improving-brightgreen?style=flat-square" alt="AI Agents" />
<img src="https://img.shields.io/badge/Multi--Agent-Systems-blue?style=flat-square" alt="Multi-Agent" />
<img src="https://img.shields.io/badge/Goal--Driven-Development-purple?style=flat-square" alt="Goal-Driven" />
<img src="https://img.shields.io/badge/Human--in--the--Loop-orange?style=flat-square" alt="HITL" />
<img src="https://img.shields.io/badge/Production--Ready-red?style=flat-square" alt="Production" />
</p>
<p align="center">
<img src="https://img.shields.io/badge/OpenAI-supported-412991?style=flat-square&logo=openai" alt="OpenAI" />
<img src="https://img.shields.io/badge/Anthropic-supported-d4a574?style=flat-square" alt="Anthropic" />
<img src="https://img.shields.io/badge/Google_Gemini-supported-4285F4?style=flat-square&logo=google" alt="Gemini" />
<img src="https://img.shields.io/badge/MCP-19_Tools-00ADD8?style=flat-square" alt="MCP" />
</p>
## Overview
Build reliable, self-improving AI agents without hardcoding workflows. Define your goal through conversation with a coding agent, and the framework generates a node graph with dynamically created connection code. When things break, the framework captures failure data, evolves the agent through the coding agent, and redeploys. Built-in human-in-the-loop nodes, credential management, and real-time monitoring give you control without sacrificing adaptability.
@@ -62,6 +76,139 @@ docker compose up
- **Cost & Budget Control** - Set spending limits, throttles, and automatic model degradation policies
- **Production-Ready** - Self-hostable, built for scale and reliability
## Why Aden
Traditional agent frameworks require you to manually design workflows, define agent interactions, and handle failures reactively. Aden flips this paradigm—**you describe outcomes, and the system builds itself**.
```mermaid
flowchart TB
subgraph USER["👤 User"]
GOAL[("🎯 Define Goal<br/>(Natural Language)")]
end
subgraph CODING["🤖 Coding Agent"]
direction TB
GENERATE["Generate Agent Graph"]
CONNECTION["Create Connection Code"]
TESTGEN["Generate Test Cases"]
EVOLVE["Evolve on Failure"]
end
subgraph WORKERS["⚙️ Worker Agents"]
direction TB
subgraph NODE1["SDK-Wrapped Node"]
N1_MEM["Memory (STM/LTM)"]
N1_TOOLS["Tools Access"]
N1_LLM["LLM Integration"]
N1_MON["Monitoring"]
end
subgraph NODE2["SDK-Wrapped Node"]
N2_MEM["Memory (STM/LTM)"]
N2_TOOLS["Tools Access"]
N2_LLM["LLM Integration"]
N2_MON["Monitoring"]
end
HITL["🙋 Human-in-the-Loop<br/>Intervention Points"]
end
subgraph CONTROL["🎛️ Hive Control Plane"]
direction TB
BUDGET["Budget & Cost Control"]
POLICY["Policy Management"]
METRICS["Real-time Metrics"]
MCP["19 MCP Tools"]
end
subgraph STORAGE["💾 Storage Layer"]
TSDB[("TimescaleDB<br/>Metrics & Events")]
MONGO[("MongoDB<br/>Policies")]
POSTGRES[("PostgreSQL<br/>Users & Config")]
end
subgraph DASHBOARD["📊 Dashboard (Honeycomb)"]
ANALYTICS["Analytics & KPIs"]
AGENTS["Agent Monitoring"]
COSTS["Cost Tracking"]
end
GOAL --> GENERATE
GENERATE --> CONNECTION
CONNECTION --> TESTGEN
TESTGEN --> NODE1
TESTGEN --> NODE2
NODE1 <--> NODE2
NODE1 & NODE2 --> HITL
NODE1 & NODE2 -->|Events| CONTROL
CONTROL -->|Policies| NODE1 & NODE2
CONTROL <-->|WebSocket| DASHBOARD
CONTROL --> STORAGE
NODE1 & NODE2 -->|Failure Data| EVOLVE
EVOLVE -->|Updated Graph| GENERATE
style USER fill:#e8f5e9,stroke:#2e7d32
style CODING fill:#e3f2fd,stroke:#1565c0
style WORKERS fill:#fff3e0,stroke:#ef6c00
style CONTROL fill:#fce4ec,stroke:#c2185b
style STORAGE fill:#f3e5f5,stroke:#7b1fa2
style DASHBOARD fill:#e0f7fa,stroke:#00838f
```
### The Aden Advantage
| Traditional Frameworks | Aden |
|------------------------|------|
| Hardcode agent workflows | Describe goals in natural language |
| Manual graph definition | Auto-generated agent graphs |
| Reactive error handling | Proactive self-evolution |
| Static tool configurations | Dynamic SDK-wrapped nodes |
| Separate monitoring setup | Built-in real-time observability |
| DIY budget management | Integrated cost controls & degradation |
### How It Works
1. **Define Your Goal** → Describe what you want to achieve in plain English
2. **Coding Agent Generates** → Creates the agent graph, connection code, and test cases
3. **Workers Execute** → SDK-wrapped nodes run with full observability and tool access
4. **Control Plane Monitors** → Real-time metrics, budget enforcement, policy management
5. **Self-Improve** → On failure, the system evolves the graph and redeploys automatically
## How Aden Compares
Aden takes a fundamentally different approach to agent development. While most frameworks require you to hardcode workflows or manually define agent graphs, Aden uses a **coding agent to generate your entire agent system** from natural language goals. When agents fail, the framework doesn't just log errors—it **automatically evolves the agent graph** and redeploys.
### Comparison Table
| Framework | Category | Approach | Aden Difference |
|-----------|----------|----------|-----------------|
| **LangChain, LlamaIndex, Haystack** | Component Libraries | Predefined components for RAG/LLM apps; manual connection logic | Generates entire graph and connection code upfront |
| **CrewAI, AutoGen, Swarm** | Multi-Agent Orchestration | Role-based agents with predefined collaboration patterns | Dynamically creates agents/connections; adapts on failure |
| **PydanticAI, Mastra, Agno** | Type-Safe Frameworks | Structured outputs and validation for known workflows | Evolving workflows; structure emerges through iteration |
| **Agent Zero, Letta** | Personal AI Assistants | Memory and learning; OS-as-tool or stateful memory focus | Production multi-agent systems with self-healing |
| **CAMEL** | Research Framework | Emergent behavior in large-scale simulations (up to 1M agents) | Production-oriented with reliable execution and recovery |
| **TEN Framework, Genkit** | Infrastructure Frameworks | Real-time multimodal (TEN) or full-stack AI (Genkit) | Higher abstraction—generates and evolves agent logic |
| **GPT Engineer, Motia** | Code Generation | Code from specs (GPT Engineer) or "Step" primitive (Motia) | Self-adapting graphs with automatic failure recovery |
| **Trading Agents** | Domain-Specific | Hardcoded trading firm roles on LangGraph | Domain-agnostic; generates structures for any use case |
### When to Choose Aden
Choose Aden when you need:
- Agents that **self-improve from failures** without manual intervention
- **Goal-driven development** where you describe outcomes, not workflows
- **Production reliability** with automatic recovery and redeployment
- **Rapid iteration** on agent architectures without rewriting code
- **Full observability** with real-time monitoring and human oversight
Choose other frameworks when you need:
- **Type-safe, predictable workflows** (PydanticAI, Mastra)
- **RAG and document processing** (LlamaIndex, Haystack)
- **Research on agent emergence** (CAMEL)
- **Real-time voice/multimodal** (TEN Framework)
- **Simple component chaining** (LangChain, Swarm)
## Project Structure
```
@@ -111,10 +258,26 @@ cd hive && npm run dev
## Roadmap
Aden Agent Framework aims to help developers build outcome-oriented, self-adaptive agents. You can find our roadmap in [ROADMAP.md](ROADMAP.md).
```mermaid
timeline
title Aden Agent Framework Roadmap
section Foundation
Architecture : Node-Based Architecture : Python SDK : LLM Integration (OpenAI, Anthropic, Google) : Communication Protocol
Coding Agent : Goal Creation Session : Worker Agent Creation : MCP Tools Integration
Worker Agent : Human-in-the-Loop : Callback Handlers : Intervention Points : Streaming Interface
Tools : File Use : Memory (STM/LTM) : Web Search : Web Scraper : Audit Trail
Core : Eval System : Pydantic Validation : Docker Deployment : Documentation : Sample Agents
section Expansion
Intelligence : Guardrails : Streaming Mode : Semantic Search
Platform : JavaScript SDK : Custom Tool Integrator : Credential Store
Deployment : Self-Hosted : Cloud Services : CI/CD Pipeline
Templates : Sales Agent : Marketing Agent : Analytics Agent : Training Agent : Smart Form Agent
```
## Community & Support
We use [Discord](https://discord.com/invite/MXE49hrKDk) for support, feature requests, and community discussions.
@@ -147,8 +310,74 @@ For security concerns, please see [SECURITY.md](SECURITY.md).
This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details.
## Frequently Asked Questions (FAQ)
**Q: Does Aden depend on LangChain or other agent frameworks?**
No. Aden is built from the ground up with no dependencies on LangChain, CrewAI, or other agent frameworks. The framework is designed to be lean and flexible, generating agent graphs dynamically rather than relying on predefined components.
**Q: What LLM providers does Aden support?**
Aden supports OpenAI (GPT-4, GPT-4o), Anthropic (Claude models), and Google Gemini out of the box. The architecture is provider-agnostic through SDK abstraction, with LiteLLM integration on the roadmap for expanded model support.
**Q: Can I use Aden with local AI models like Ollama?**
Local model support through LiteLLM integration is on our roadmap. The SDK's provider-agnostic design means adding local model support will be straightforward once implemented.
**Q: What makes Aden different from other agent frameworks?**
Aden generates your entire agent system from natural language goals using a coding agent—you don't hardcode workflows or manually define graphs. When agents fail, the framework automatically captures failure data, evolves the agent graph, and redeploys. This self-improving loop is unique to Aden.
**Q: Is Aden open-source?**
Yes, Aden is fully open-source under the Apache License 2.0. We actively encourage community contributions and collaboration.
**Q: Does Aden collect data from users?**
Aden collects telemetry data for monitoring and observability purposes, including token usage, latency metrics, and cost tracking. Content capture (prompts and responses) is configurable and stored with team-scoped data isolation. All data stays within your infrastructure when self-hosted.
**Q: What deployment options does Aden support?**
Aden supports Docker Compose deployment out of the box, with both production and development configurations. Self-hosted deployments work on any infrastructure supporting Docker. Cloud deployment options and Kubernetes-ready configurations are on the roadmap.
**Q: Can Aden handle complex, production-scale use cases?**
Yes. Aden is explicitly designed for production environments with features like automatic failure recovery, real-time observability, cost controls, and horizontal scaling support. The framework handles both simple automations and complex multi-agent workflows.
**Q: Does Aden support human-in-the-loop workflows?**
Yes, Aden fully supports human-in-the-loop workflows through intervention nodes that pause execution for human input. These include configurable timeouts and escalation policies, allowing seamless collaboration between human experts and AI agents.
**Q: What monitoring and debugging tools does Aden provide?**
Aden includes comprehensive observability features: real-time WebSocket streaming for live agent execution monitoring, TimescaleDB-powered analytics for cost and performance metrics, health check endpoints for Kubernetes integration, and 19 MCP tools for budget management, agent status, and policy control.
**Q: What programming languages does Aden support?**
Aden provides SDKs for both Python and JavaScript/TypeScript. The Python SDK includes integration templates for LangGraph, LangFlow, and LiveKit. The backend is Node.js/TypeScript, and the frontend is React/TypeScript.
**Q: Can Aden agents interact with external tools and APIs?**
Yes. Aden's SDK-wrapped nodes provide built-in tool access, and the framework supports flexible tool ecosystems. Agents can integrate with external APIs, databases, and services through the node architecture.
**Q: How does cost control work in Aden?**
Aden provides granular budget controls including spending limits, throttles, and automatic model degradation policies. You can set budgets at the team, agent, or workflow level, with real-time cost tracking and alerts.
**Q: Where can I find examples and documentation?**
Visit [docs.adenhq.com](https://docs.adenhq.com/) for complete guides, API reference, and getting started tutorials. The repository also includes documentation in the `docs/` folder and a comprehensive [DEVELOPER.md](DEVELOPER.md) guide.
**Q: How can I contribute to Aden?**
Contributions are welcome! Fork the repository, create your feature branch, implement your changes, and submit a pull request. See [CONTRIBUTING.md](CONTRIBUTING.md) for detailed guidelines.
**Q: Does Aden offer enterprise support?**
For enterprise inquiries, contact the Aden team through [adenhq.com](https://adenhq.com) or join our [Discord community](https://discord.com/invite/MXE49hrKDk) for support and discussions.
---
<p align="center">
Made with 🔥 Passion in San Francisco
</p>
+54 -33
@@ -1,56 +1,74 @@
Product Roadmap
Aden Agent Framework aims to help developers build outcome-oriented, self-adaptive agents. Please find our roadmap below.
```mermaid
timeline
title Aden Agent Framework Roadmap
section Foundation
Architecture : Node-Based Architecture : Python SDK : LLM Integration (OpenAI, Anthropic, Google) : Communication Protocol
Coding Agent : Goal Creation Session : Worker Agent Creation : MCP Tools Integration
Worker Agent : Human-in-the-Loop : Callback Handlers : Intervention Points : Streaming Interface
Tools : File Use : Memory (STM/LTM) : Web Search : Web Scraper : Audit Trail
Core : Eval System : Pydantic Validation : Docker Deployment : Documentation : Sample Agents
section Expansion
Intelligence : Guardrails : Streaming Mode : Semantic Search
Platform : JavaScript SDK : Custom Tool Integrator : Credential Store
Deployment : Self-Hosted : Cloud Services : CI/CD Pipeline
Templates : Sales Agent : Marketing Agent : Analytics Agent : Training Agent : Smart Form Agent
```
---
## Phase 1: Foundation
### Backbone Architecture
- [ ] **Node-Based Architecture (Agent as a node)**
- [x] Object schema definition
- [x] Node wrapper SDK
- [ ] Shared memory access
- [ ] Default monitoring hooks
- [ ] Tool access layer
- [x] LLM integration layer (Natively supports all mainstream LLMs through LiteLLM)
- [x] Anthropic
- [x] OpenAI
- [x] Google
- [ ] **Communication protocol between nodes**
- [ ] **[Coding Agent] Goal Creation Session** (separate from coding session)
- [ ] Multi-round instruction exchange for goal generation
- [x] Goal Object schema definition
- [ ] Test case generation for goals
- [ ] Test case validation for worker agent (Outcome driven)
- [ ] **[Coding Agent] Worker Agent Creation**
- [x] Coding Agent tools
- [ ] Use Template Agent as a start
- [x] Use our MCP tools
- [ ] **[Worker Agent] Human-in-the-Loop**
- [x] Worker Agent requests with questions and options
- [x] Callback Handler System to receive events throughout execution
- [ ] Tool-Based Intervention Points (tool to pause execution and request human input)
- [x] Multiple entrypoints for different event sources (e.g. human input, webhook)
- [ ] Streaming Interface for Real-time Monitoring
- [ ] Request State Management
### Essential Tools
- [x] **File Use Tool Kit**
- [ ] **Memory Tools**
- [x] STM Layer Tool (state-based short-term memory)
- [x] LTM Layer Tool (RLM - long-term memory)
- [ ] **Infrastructure Tools**
- [x] Runtime Log Tool (logs for coding agent)
- [ ] Audit Trail Tool (decision timeline generation)
- [ ] Web Search
- [ ] Web Scraper
- [ ] Recipe for "Add your own tools"
### Memory & File System
- [x] DB for long-term persistent memory (Filesystem as durable scratchpad pattern)
- [x] Session Local memory isolation
### Eval System (Basic)
- [x] Test Driven - Run test cases for every agent iteration
- [ ] Failure recording mechanism
- [ ] SDK for defining failure conditions
- [ ] Basic observability hooks
@@ -59,11 +77,15 @@ timeline
### Data Validation
- [ ] Natively Support data validation of LLMs output with Pydantic
### Developer Experience
- [ ] **Debugging mode**
- [ ] **Documentation**
- [ ] Quick start guide
- [ ] Goal creation guide
- [ ] Agent creation guide
- [ ] GitHub Page setup
- [ ] README with examples
- [ ] Contributing guidelines
- [ ] **Distribution**
- [ ] PyPI package
- [ ] Docker image on Docker Hub
@@ -75,7 +97,7 @@ timeline
---
## Phase 2: Expansion
### Basic Guardrails
- [ ] Support Basic Monitoring from Agent node SDK
@@ -86,7 +108,7 @@ timeline
- [ ] Streaming mode support
### Cross-Platform
- [ ] JavaScript / TypeScript Version SDK
### File System Enhancement
- [ ] Semantic Search integration
@@ -96,7 +118,7 @@ timeline
- [ ] Custom Tool Integrator
- [ ] Integration as a tool (Credential Store & Support)
- [ ] **Core Agent Tools**
- [ ] Node Discovery Tool (find other agents in the graph)
- [ ] HITL Tool (pause execution for human approval)
- [ ] Wake-up Tool (resume agent tasks)
@@ -117,8 +139,7 @@ timeline
- [ ] All tests must pass for deployment
### Developer Experience Enhancement
- [ ] Tool usage documentation
- [ ] Discord Support Channel
### More Agent Templates
@@ -126,4 +147,4 @@ timeline
- [ ] GTM Marketing Agent (workflow)
- [ ] Analytics Agent
- [ ] Training Agent
- [ ] Smart Entry / Form Agent (self-evolution emphasis)
+186
@@ -0,0 +1,186 @@
# Building Tools for Aden
This guide explains how to create new tools for the Aden agent framework using FastMCP.
## Quick Start Checklist
1. Create folder under `src/aden_tools/tools/<tool_name>/`
2. Implement a `register_tools(mcp: FastMCP)` function using the `@mcp.tool()` decorator
3. Add a `README.md` documenting your tool
4. Register in `src/aden_tools/tools/__init__.py`
5. Add tests in `tests/tools/`
## Tool Structure
Each tool lives in its own folder:
```
src/aden_tools/tools/my_tool/
├── __init__.py # Export register_tools function
├── my_tool.py # Tool implementation
└── README.md # Documentation
```
## Implementation Pattern
Tools use FastMCP's native decorator pattern:
```python
from fastmcp import FastMCP
def register_tools(mcp: FastMCP) -> None:
"""Register my tools with the MCP server."""
@mcp.tool()
def my_tool(
query: str,
limit: int = 10,
) -> dict:
"""
Search for items matching a query.
Use this when you need to find specific information.
Args:
query: The search query (1-500 chars)
limit: Maximum number of results (1-100)
Returns:
Dict with search results or error dict
"""
# Validate inputs
if not query or len(query) > 500:
return {"error": "Query must be 1-500 characters"}
if limit < 1 or limit > 100:
limit = max(1, min(100, limit))
try:
# Your implementation here
results = do_search(query, limit)
return {
"query": query,
"results": results,
"total": len(results),
}
except Exception as e:
return {"error": f"Search failed: {str(e)}"}
```
## Exporting the Tool
In `src/aden_tools/tools/my_tool/__init__.py`:
```python
from .my_tool import register_tools
__all__ = ["register_tools"]
```
In `src/aden_tools/tools/__init__.py`, import your registrar and wire it into `register_all_tools` (mirroring the module's existing imports):
```python
from .my_tool import register_tools as register_my_tool

# Inside register_all_tools(mcp), alongside the existing register_* calls:
register_my_tool(mcp)
# ...then add "my_tool" to the returned list of tool names.
```
## Environment Variables
For tools requiring API keys or configuration, check environment variables at runtime:
```python
import os
def register_tools(mcp: FastMCP) -> None:
@mcp.tool()
def my_api_tool(query: str) -> dict:
"""Tool that requires an API key."""
api_key = os.getenv("MY_API_KEY")
if not api_key:
return {
"error": "MY_API_KEY environment variable not set",
"help": "Get an API key at https://example.com/api",
}
# Use the API key...
```
## Best Practices
### Error Handling
Return error dicts instead of raising exceptions:
```python
@mcp.tool()
def my_tool(**kwargs) -> dict:
try:
result = do_work()
return {"success": True, "data": result}
except SpecificError as e:
return {"error": f"Failed to process: {str(e)}"}
except Exception as e:
return {"error": f"Unexpected error: {str(e)}"}
```
### Return Values
- Return dicts for structured data
- Include relevant metadata (query, total count, etc.)
- Use `{"error": "message"}` for errors
### Documentation
The docstring becomes the tool description in MCP. Include:
- What the tool does
- When to use it
- Args with types and constraints
- What it returns
Every tool folder needs a `README.md` with:
- Description and use cases
- Usage examples
- Argument table
- Environment variables (if any)
- Error handling notes
## Testing
Place tests in `tests/tools/test_my_tool.py`:
```python
import pytest
from fastmcp import FastMCP
from aden_tools.tools.my_tool import register_tools
@pytest.fixture
def mcp():
"""Create a FastMCP instance with tools registered."""
server = FastMCP("test")
register_tools(server)
return server
def test_my_tool_basic(mcp):
"""Test basic tool functionality."""
tool_fn = mcp._tool_manager._tools["my_tool"].fn
result = tool_fn(query="test")
assert "results" in result
def test_my_tool_validation(mcp):
"""Test input validation."""
tool_fn = mcp._tool_manager._tools["my_tool"].fn
result = tool_fn(query="")
assert "error" in result
```
Mock external APIs to keep tests fast and deterministic.
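For example, a sketch that stubs `httpx.get` for the `web_search` tool so the test never touches the network (the fake response mirrors only the fields the tool actually reads):

```python
import httpx
from fastmcp import FastMCP
from aden_tools.tools.web_search_tool import register_tools

def test_web_search_mocked(monkeypatch):
    """Stub the HTTP layer so the test is fast and deterministic."""
    mcp = FastMCP("test")
    register_tools(mcp)

    class FakeResponse:
        status_code = 200

        def json(self):
            return {"web": {"results": [
                {"title": "t", "url": "https://example.com", "description": "d"},
            ]}}

    monkeypatch.setenv("BRAVE_SEARCH_API_KEY", "test-key")
    monkeypatch.setattr(httpx, "get", lambda *args, **kwargs: FakeResponse())

    tool_fn = mcp._tool_manager._tools["web_search"].fn
    result = tool_fn(query="test")
    assert result["total"] == 1
```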
## Naming Conventions
- **Folder name**: `snake_case` with `_tool` suffix (e.g., `file_read_tool`)
- **Function name**: `snake_case` (e.g., `file_read`)
- **Tool description**: Clear, actionable docstring
+29
@@ -0,0 +1,29 @@
# Aden Tools MCP Server
# Exposes aden-tools via Model Context Protocol
FROM python:3.11-slim
WORKDIR /app
# Copy project files
COPY pyproject.toml ./
COPY README.md ./
COPY src ./src
COPY mcp_server.py ./
# Install package with all dependencies
RUN pip install --no-cache-dir -e .
# Create non-root user for security
RUN useradd -m -u 1001 appuser && chown -R appuser:appuser /app
USER appuser
# Expose MCP server port
EXPOSE 4001
# Health check - verify server is responding
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
CMD python -c "import httpx; httpx.get('http://localhost:4001/health').raise_for_status()" || exit 1
# Run MCP server with HTTP transport
CMD ["python", "mcp_server.py"]
+103
@@ -0,0 +1,103 @@
# Aden Tools
Tool library for the Aden agent framework. Provides a collection of tools that AI agents can use to interact with external systems, process data, and perform actions via the Model Context Protocol (MCP).
## Installation
```bash
pip install -e aden-tools
```
For development:
```bash
pip install -e "aden-tools[dev]"
```
## Quick Start
### As an MCP Server
```python
from fastmcp import FastMCP
from aden_tools.tools import register_all_tools
mcp = FastMCP("aden-tools")
register_all_tools(mcp)
mcp.run()
```
Or run directly:
```bash
python mcp_server.py
```
## Available Tools
| Tool | Description |
|------|-------------|
| `example_tool` | Template tool demonstrating the pattern |
| `file_read` | Read contents of local files |
| `file_write` | Write content to local files |
| `web_search` | Search the web using Brave Search API |
| `web_scrape` | Scrape and extract content from webpages |
| `pdf_read` | Read and extract text from PDF files |
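To exercise a tool without deploying anything, you can drive the server in-process. A minimal sketch, assuming FastMCP 2.x's in-memory client transport (`fastmcp.Client` accepts a server instance directly):

```python
import asyncio
from fastmcp import FastMCP, Client
from aden_tools.tools import register_all_tools

mcp = FastMCP("aden-tools")
register_all_tools(mcp)

async def main() -> None:
    # In-memory transport: the client talks to this server instance directly.
    async with Client(mcp) as client:
        result = await client.call_tool("example_tool", {"message": "hello", "repeat": 2})
        print(result)

asyncio.run(main())
```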
## Project Structure
```
aden-tools/
├── src/aden_tools/
│ ├── __init__.py # Main exports
│ ├── utils/ # Utility functions
│ └── tools/ # Tool implementations
│ ├── example_tool/
│ ├── file_read_tool/
│ ├── file_write_tool/
│ ├── web_search_tool/
│ ├── web_scrape_tool/
│ └── pdf_read_tool/
├── tests/ # Test suite
├── mcp_server.py # MCP server entry point
├── README.md
├── BUILDING_TOOLS.md # Tool development guide
└── pyproject.toml
```
## Creating Custom Tools
Tools use FastMCP's native decorator pattern:
```python
from fastmcp import FastMCP
def register_tools(mcp: FastMCP) -> None:
@mcp.tool()
def my_tool(query: str, limit: int = 10) -> dict:
"""
Search for items matching the query.
Args:
query: The search query
limit: Max results to return
Returns:
Dict with results or error
"""
try:
results = do_search(query, limit)
return {"results": results, "total": len(results)}
except Exception as e:
return {"error": str(e)}
```
See [BUILDING_TOOLS.md](BUILDING_TOOLS.md) for the full guide.
## Documentation
- [Building Tools Guide](BUILDING_TOOLS.md) - How to create new tools
- Individual tool READMEs in `src/aden_tools/tools/*/README.md`
## License
This project is licensed under the Apache License 2.0 - see the [LICENSE](../LICENSE) file for details.
+79
@@ -0,0 +1,79 @@
#!/usr/bin/env python3
"""
Aden Tools MCP Server
Exposes all aden-tools via Model Context Protocol using FastMCP.
Usage:
# Run with HTTP transport (default, for Docker)
python mcp_server.py
# Run with custom port
python mcp_server.py --port 8001
# Run with STDIO transport (for local testing)
python mcp_server.py --stdio
Environment Variables:
MCP_PORT - Server port (default: 4001)
BRAVE_SEARCH_API_KEY - Required for web_search tool
"""
import argparse
import os
from fastmcp import FastMCP
from starlette.requests import Request
from starlette.responses import PlainTextResponse
mcp = FastMCP("aden-tools")
# Register all tools with the MCP server
from aden_tools.tools import register_all_tools
tools = register_all_tools(mcp)
print(f"[MCP] Registered {len(tools)} tools: {tools}")
@mcp.custom_route("/health", methods=["GET"])
async def health_check(request: Request) -> PlainTextResponse:
"""Health check endpoint for container orchestration."""
return PlainTextResponse("OK")
@mcp.custom_route("/", methods=["GET"])
async def index(request: Request) -> PlainTextResponse:
"""Landing page for browser visits."""
return PlainTextResponse("Welcome to the Hive MCP Server")
def main() -> None:
"""Entry point for the MCP server."""
parser = argparse.ArgumentParser(description="Aden Tools MCP Server")
parser.add_argument(
"--port",
type=int,
default=int(os.getenv("MCP_PORT", "4001")),
help="HTTP server port (default: 4001)",
)
parser.add_argument(
"--host",
default="0.0.0.0",
help="HTTP server host (default: 0.0.0.0)",
)
parser.add_argument(
"--stdio",
action="store_true",
help="Use STDIO transport instead of HTTP",
)
args = parser.parse_args()
if args.stdio:
print("[MCP] Starting with STDIO transport")
mcp.run(transport="stdio")
else:
print(f"[MCP] Starting HTTP server on {args.host}:{args.port}")
mcp.run(transport="http", host=args.host, port=args.port)
if __name__ == "__main__":
main()
+59
@@ -0,0 +1,59 @@
[project]
name = "aden-tools"
version = "0.1.0"
description = "Tools library for the Aden agent framework"
readme = "README.md"
requires-python = ">=3.10"
license = { text = "Apache-2.0" }
authors = [
{ name = "Aden", email = "team@aden.ai" }
]
keywords = ["ai", "agents", "tools", "llm"]
classifiers = [
"Development Status :: 3 - Alpha",
"Intended Audience :: Developers",
"License :: OSI Approved :: Apache Software License",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"Programming Language :: Python :: 3.12",
]
dependencies = [
"pydantic>=2.0.0",
"httpx>=0.27.0",
"beautifulsoup4>=4.12.0",
"pypdf>=4.0.0",
"pandas>=2.0.0",
"jsonpath-ng>=1.6.0",
"fastmcp>=2.0.0",
]
[project.optional-dependencies]
dev = [
"pytest>=7.0.0",
"pytest-asyncio>=0.21.0",
]
sandbox = [
"RestrictedPython>=7.0",
]
ocr = [
"pytesseract>=0.3.10",
"pillow>=10.0.0",
]
all = [
"RestrictedPython>=7.0",
"pytesseract>=0.3.10",
"pillow>=10.0.0",
]
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
[tool.hatch.build.targets.wheel]
packages = ["src/aden_tools"]
[tool.pytest.ini_options]
testpaths = ["tests"]
asyncio_mode = "auto"
+30
@@ -0,0 +1,30 @@
"""
Aden Tools - Tool library for the Aden agent framework.
Tools provide capabilities that AI agents can use to interact with
external systems, process data, and perform actions.
Usage:
from fastmcp import FastMCP
from aden_tools.tools import register_all_tools
mcp = FastMCP("my-server")
register_all_tools(mcp)
"""
__version__ = "0.1.0"
# Utilities
from .utils import get_env_var
# MCP registration
from .tools import register_all_tools
__all__ = [
# Version
"__version__",
# Utilities
"get_env_var",
# MCP registration
"register_all_tools",
]
@@ -0,0 +1,51 @@
"""
Aden Tools - Tool implementations for FastMCP.
Usage:
from fastmcp import FastMCP
from aden_tools.tools import register_all_tools
mcp = FastMCP("my-server")
register_all_tools(mcp)
"""
from typing import List
from fastmcp import FastMCP
# Import register_tools from each tool module
from .example_tool import register_tools as register_example
from .file_read_tool import register_tools as register_file_read
from .file_write_tool import register_tools as register_file_write
from .web_search_tool import register_tools as register_web_search
from .web_scrape_tool import register_tools as register_web_scrape
from .pdf_read_tool import register_tools as register_pdf_read
def register_all_tools(mcp: FastMCP) -> List[str]:
"""
Register all aden-tools with a FastMCP server.
Args:
mcp: FastMCP server instance
Returns:
List of registered tool names
"""
register_example(mcp)
register_file_read(mcp)
register_file_write(mcp)
register_web_search(mcp)
register_web_scrape(mcp)
register_pdf_read(mcp)
return [
"example_tool",
"file_read",
"file_write",
"web_search",
"web_scrape",
"pdf_read",
]
__all__ = ["register_all_tools"]
@@ -0,0 +1,26 @@
# Example Tool
A template tool demonstrating the Aden tools pattern.
## Description
This tool processes text messages with optional transformations. It serves as a reference implementation for creating new tools using the FastMCP decorator pattern.
## Arguments
| Argument | Type | Required | Default | Description |
|----------|------|----------|---------|-------------|
| `message` | str | Yes | - | The message to process (1-1000 chars) |
| `uppercase` | bool | No | `False` | Convert message to uppercase |
| `repeat` | int | No | `1` | Number of times to repeat (1-10) |
## Environment Variables
This tool does not require any environment variables.
## Error Handling
Returns error strings for validation issues:
- `Error: message must be 1-1000 characters` - Empty or too long message
- `Error: repeat must be 1-10` - Repeat value out of range
- `Error processing message: <error>` - Unexpected error
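## Usage Example

A minimal in-process sketch. It reaches the registered function through FastMCP's private `_tool_manager`, the same shortcut the test suite uses; a deployed agent would call the tool through the MCP server instead:

```python
from fastmcp import FastMCP
from aden_tools.tools.example_tool import register_tools

mcp = FastMCP("demo")
register_tools(mcp)

# Test-style shortcut to the registered function
example_tool = mcp._tool_manager._tools["example_tool"].fn

print(example_tool(message="hello", uppercase=True, repeat=3))
# -> HELLO HELLO HELLO
```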
@@ -0,0 +1,4 @@
"""Example Tool package."""
from .example_tool import register_tools
__all__ = ["register_tools"]
@@ -0,0 +1,51 @@
"""
Example Tool - A simple text processing tool for FastMCP.
Demonstrates native FastMCP tool registration pattern.
"""
from __future__ import annotations
from fastmcp import FastMCP
def register_tools(mcp: FastMCP) -> None:
"""Register example tools with the MCP server."""
@mcp.tool()
def example_tool(
message: str,
uppercase: bool = False,
repeat: int = 1,
) -> str:
"""
A simple example tool that processes text messages.
Use this tool when you need to transform or repeat text.
Args:
message: The message to process (1-1000 chars)
uppercase: If True, convert the message to uppercase
repeat: Number of times to repeat the message (1-10)
Returns:
The processed message string
"""
try:
# Validate inputs
if not message or len(message) > 1000:
return "Error: message must be 1-1000 characters"
if repeat < 1 or repeat > 10:
return "Error: repeat must be 1-10"
# Process the message
result = message
if uppercase:
result = result.upper()
# Repeat if requested
if repeat > 1:
result = " ".join([result] * repeat)
return result
except Exception as e:
return f"Error processing message: {str(e)}"
@@ -0,0 +1,28 @@
# File Read Tool
Read contents of local files with encoding support.
## Description
Use for reading configs, data files, source code, logs, or any text file. Returns file content along with path, name, size, and encoding metadata.
## Arguments
| Argument | Type | Required | Default | Description |
|----------|------|----------|---------|-------------|
| `file_path` | str | Yes | - | Path to the file to read (absolute or relative) |
| `encoding` | str | No | `utf-8` | File encoding (utf-8, latin-1, etc.) |
| `max_size` | int | No | `10000000` | Maximum file size to read in bytes (default 10MB) |
## Environment Variables
This tool does not require any environment variables.
## Error Handling
Returns error dicts for common issues:
- `File not found: <path>` - File does not exist
- `Not a file: <path>` - Path points to a directory
- `File too large: <size> bytes (max: <max_size>)` - File exceeds max_size limit
- `Failed to decode file with encoding '<encoding>'` - Wrong encoding specified
- `Permission denied: <path>` - No read access to file
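## Usage Example

A minimal in-process sketch using the same test-style shortcut as the test suite (`notes.txt` is a placeholder path):

```python
from fastmcp import FastMCP
from aden_tools.tools.file_read_tool import register_tools

mcp = FastMCP("demo")
register_tools(mcp)
file_read = mcp._tool_manager._tools["file_read"].fn

result = file_read(file_path="notes.txt", encoding="utf-8")
if "error" in result:
    print(result["error"])
else:
    print(result["name"], result["size"], result["content"][:80])
```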
@@ -0,0 +1,4 @@
"""File Read Tool - Read contents of local files."""
from .file_read_tool import register_tools
__all__ = ["register_tools"]
@@ -0,0 +1,75 @@
"""
File Read Tool - Read contents of local files.
Supports reading text files with various encodings.
Returns file content along with metadata.
"""
from __future__ import annotations
from pathlib import Path
from fastmcp import FastMCP
def register_tools(mcp: FastMCP) -> None:
"""Register file read tools with the MCP server."""
@mcp.tool()
def file_read(
file_path: str,
encoding: str = "utf-8",
max_size: int = 10_000_000,
) -> dict:
"""
Read the contents of a local file.
Use for reading configs, data files, source code, logs, or any text file.
Returns file content along with path, name, size, and encoding.
Args:
file_path: Path to the file to read (absolute or relative)
encoding: File encoding (utf-8, latin-1, etc.)
max_size: Maximum file size to read in bytes (default 10MB)
Returns:
Dict with file content and metadata, or error dict
"""
try:
path = Path(file_path).resolve()
# Check if file exists
if not path.exists():
return {"error": f"File not found: {file_path}"}
# Check if it's a file (not directory)
if not path.is_file():
return {"error": f"Not a file: {file_path}"}
# Check file size
file_size = path.stat().st_size
if max_size > 0 and file_size > max_size:
return {
"error": f"File too large: {file_size} bytes (max: {max_size})",
"file_size": file_size,
}
# Read the file
content = path.read_text(encoding=encoding)
return {
"path": str(path),
"name": path.name,
"content": content,
"size": len(content),
"encoding": encoding,
}
except UnicodeDecodeError as e:
return {
"error": f"Failed to decode file with encoding '{encoding}': {str(e)}",
"suggestion": "Try a different encoding like 'latin-1' or 'cp1252'",
}
except PermissionError:
return {"error": f"Permission denied: {file_path}"}
except Exception as e:
return {"error": f"Failed to read file: {str(e)}"}
@@ -0,0 +1,29 @@
# File Write Tool
Write content to local files with encoding support.
## Description
Can create new files or overwrite/append to existing ones. Use for saving data, creating configs, writing reports, or exporting results. Optionally creates parent directories if they don't exist.
## Arguments
| Argument | Type | Required | Default | Description |
|----------|------|----------|---------|-------------|
| `file_path` | str | Yes | - | Path to the file to write (absolute or relative) |
| `content` | str | Yes | - | Content to write to the file |
| `encoding` | str | No | `utf-8` | File encoding (utf-8, latin-1, etc.) |
| `mode` | str | No | `write` | Write mode - 'write' (overwrite) or 'append' |
| `create_dirs` | bool | No | `True` | Create parent directories if they don't exist |
## Environment Variables
This tool does not require any environment variables.
## Error Handling
Returns error dicts for common issues:
- `Parent directory does not exist: <path>` - Parent dir missing and create_dirs=False
- `Invalid mode: <mode>. Use 'write' or 'append'.` - Invalid mode specified
- `Permission denied: <path>` - No write access to file/directory
- `OS error writing file: <error>` - Filesystem error
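## Usage Example

A minimal in-process sketch (test-style access; `out/run.log` is a placeholder path):

```python
from fastmcp import FastMCP
from aden_tools.tools.file_write_tool import register_tools

mcp = FastMCP("demo")
register_tools(mcp)
file_write = mcp._tool_manager._tools["file_write"].fn

# 'append' adds to the end of the file; parent dirs are created by default.
result = file_write(file_path="out/run.log", content="run complete\n", mode="append")
print(result.get("bytes_written"), result.get("created"))
```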
@@ -0,0 +1,4 @@
"""File Write Tool - Create or update local files."""
from .file_write_tool import register_tools
__all__ = ["register_tools"]
@@ -0,0 +1,83 @@
"""
File Write Tool - Create or update local files.
Supports writing text files with various encodings.
Can create directories if they don't exist.
"""
from __future__ import annotations
from pathlib import Path
from fastmcp import FastMCP
def register_tools(mcp: FastMCP) -> None:
"""Register file write tools with the MCP server."""
@mcp.tool()
def file_write(
file_path: str,
content: str,
encoding: str = "utf-8",
mode: str = "write",
create_dirs: bool = True,
) -> dict:
"""
Write content to a local file.
Can create new files or overwrite/append to existing ones.
Use for saving data, creating configs, writing reports, or exporting results.
Args:
file_path: Path to the file to write (absolute or relative)
content: Content to write to the file
encoding: File encoding (utf-8, latin-1, etc.)
mode: Write mode - 'write' (overwrite) or 'append'
create_dirs: Create parent directories if they don't exist
Returns:
Dict with write result or error dict
"""
try:
path = Path(file_path).resolve()
# Create parent directories if requested
if create_dirs:
path.parent.mkdir(parents=True, exist_ok=True)
elif not path.parent.exists():
return {"error": f"Parent directory does not exist: {path.parent}"}
# Determine write mode
if mode == "append":
write_mode = "a"
elif mode == "write":
write_mode = "w"
else:
return {"error": f"Invalid mode: {mode}. Use 'write' or 'append'."}
# Check if we're overwriting
existed = path.exists()
previous_size = path.stat().st_size if existed else 0
# Write the file
with open(path, write_mode, encoding=encoding) as f:
f.write(content)
new_size = path.stat().st_size
return {
"path": str(path),
"name": path.name,
"bytes_written": len(content.encode(encoding)),
"total_size": new_size,
"mode": mode,
"created": not existed,
"previous_size": previous_size if existed else None,
}
except PermissionError:
return {"error": f"Permission denied: {file_path}"}
except OSError as e:
return {"error": f"OS error writing file: {str(e)}"}
except Exception as e:
return {"error": f"Failed to write file: {str(e)}"}
@@ -0,0 +1,37 @@
# PDF Read Tool
Read and extract text content from PDF files.
## Description
Returns text content with page markers and optional metadata. Use for reading reports, documents, or any other PDF file.
## Arguments
| Argument | Type | Required | Default | Description |
|----------|------|----------|---------|-------------|
| `file_path` | str | Yes | - | Path to the PDF file to read (absolute or relative) |
| `pages` | str | No | `None` | Page range - 'all'/None for all, '5' for single, '1-10' for range, '1,3,5' for specific |
| `max_pages` | int | No | `100` | Maximum pages to process (1-1000, for memory safety) |
| `include_metadata` | bool | No | `True` | Include PDF metadata (author, title, creation date, etc.) |
## Environment Variables
This tool does not require any environment variables.
## Error Handling
Returns error dicts for common issues:
- `PDF file not found: <path>` - File does not exist
- `Not a file: <path>` - Path points to a directory
- `Not a PDF file (expected .pdf): <path>` - Wrong file extension
- `Cannot read encrypted PDF. Password required.` - PDF is password-protected
- `Page <num> out of range. PDF has <total> pages.` - Invalid page number
- `Invalid page format: '<pages>'` - Malformed page range string
- `Permission denied: <path>` - No read access to file
## Notes
- Page numbers in the `pages` argument are 1-indexed (first page is 1, not 0)
- Text is extracted with page markers: `--- Page N ---`
- Metadata includes: title, author, subject, creator, producer, created, modified
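## Usage Example

A minimal in-process sketch (test-style access; `report.pdf` is a placeholder path):

```python
from fastmcp import FastMCP
from aden_tools.tools.pdf_read_tool import register_tools

mcp = FastMCP("demo")
register_tools(mcp)
pdf_read = mcp._tool_manager._tools["pdf_read"].fn

# Pages are 1-indexed: '1-3' extracts the first three pages.
result = pdf_read(file_path="report.pdf", pages="1-3", include_metadata=True)
print(result.get("pages_extracted"), result.get("char_count"))
```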
@@ -0,0 +1,4 @@
"""PDF Read Tool - Parse and extract text from PDF files."""
from .pdf_read_tool import register_tools
__all__ = ["register_tools"]
@@ -0,0 +1,157 @@
"""
PDF Read Tool - Parse and extract text from PDF files.
Uses pypdf to read PDF documents and extract text content
along with metadata.
"""
from __future__ import annotations
from pathlib import Path
from typing import Any, List
from fastmcp import FastMCP
from pypdf import PdfReader
def register_tools(mcp: FastMCP) -> None:
"""Register PDF read tools with the MCP server."""
def parse_page_range(
pages: str | None, total_pages: int, max_pages: int
) -> List[int] | dict:
"""
Parse page range string into list of 0-indexed page numbers.
Returns list of indices or error dict.
"""
if pages is None or pages.lower() == "all":
indices = list(range(min(total_pages, max_pages)))
return indices
try:
# Single page: "5"
if pages.isdigit():
page_num = int(pages)
if page_num < 1 or page_num > total_pages:
return {"error": f"Page {page_num} out of range. PDF has {total_pages} pages."}
return [page_num - 1]
# Range: "1-10"
if "-" in pages and "," not in pages:
start_str, end_str = pages.split("-", 1)
start, end = int(start_str), int(end_str)
if start > end:
return {"error": f"Invalid page range: {pages}. Start must be less than end."}
if start < 1:
return {"error": f"Page numbers start at 1, got {start}."}
if end > total_pages:
return {"error": f"Page {end} out of range. PDF has {total_pages} pages."}
indices = list(range(start - 1, min(end, start - 1 + max_pages)))
return indices
# Comma-separated: "1,3,5"
if "," in pages:
page_nums = [int(p.strip()) for p in pages.split(",")]
for p in page_nums:
if p < 1 or p > total_pages:
return {"error": f"Page {p} out of range. PDF has {total_pages} pages."}
indices = [p - 1 for p in page_nums[:max_pages]]
return indices
return {"error": f"Invalid page format: '{pages}'. Use 'all', '5', '1-10', or '1,3,5'."}
except ValueError as e:
return {"error": f"Invalid page format: '{pages}'. {str(e)}"}
@mcp.tool()
def pdf_read(
file_path: str,
pages: str | None = None,
max_pages: int = 100,
include_metadata: bool = True,
) -> dict:
"""
Read and extract text content from a PDF file.
Returns text content with page markers and optional metadata.
Use for reading reports, documents, or any other PDF file.
Args:
file_path: Path to the PDF file to read (absolute or relative)
pages: Page range to extract - 'all'/None for all, '5' for single, '1-10' for range, '1,3,5' for specific
max_pages: Maximum number of pages to process (1-1000, memory safety)
include_metadata: Include PDF metadata (author, title, creation date, etc.)
Returns:
Dict with extracted text and metadata, or error dict
"""
try:
path = Path(file_path).resolve()
# Validate file exists
if not path.exists():
return {"error": f"PDF file not found: {file_path}"}
if not path.is_file():
return {"error": f"Not a file: {file_path}"}
# Check extension
if path.suffix.lower() != ".pdf":
return {"error": f"Not a PDF file (expected .pdf): {file_path}"}
# Validate max_pages
if max_pages < 1:
max_pages = 1
elif max_pages > 1000:
max_pages = 1000
# Open and read PDF
reader = PdfReader(path)
# Check for encryption
if reader.is_encrypted:
return {"error": "Cannot read encrypted PDF. Password required."}
total_pages = len(reader.pages)
# Parse page range
page_indices = parse_page_range(pages, total_pages, max_pages)
if isinstance(page_indices, dict): # Error dict
return page_indices
# Extract text from pages
content_parts = []
for i in page_indices:
page_text = reader.pages[i].extract_text() or ""
content_parts.append(f"--- Page {i + 1} ---\n{page_text}")
content = "\n\n".join(content_parts)
result: dict[str, Any] = {
"path": str(path),
"name": path.name,
"total_pages": total_pages,
"pages_extracted": len(page_indices),
"content": content,
"char_count": len(content),
}
# Add metadata if requested
if include_metadata and reader.metadata:
meta = reader.metadata
result["metadata"] = {
"title": meta.get("/Title"),
"author": meta.get("/Author"),
"subject": meta.get("/Subject"),
"creator": meta.get("/Creator"),
"producer": meta.get("/Producer"),
"created": str(meta.get("/CreationDate")) if meta.get("/CreationDate") else None,
"modified": str(meta.get("/ModDate")) if meta.get("/ModDate") else None,
}
return result
except PermissionError:
return {"error": f"Permission denied: {file_path}"}
except Exception as e:
return {"error": f"Failed to read PDF: {str(e)}"}
@@ -0,0 +1,36 @@
# Web Scrape Tool
Scrape and extract text content from webpages.
## Description
Use when you need to read the content of a specific URL, extract data from a website, or read articles/documentation. Automatically removes noise elements (scripts, navigation, footers) and extracts the main content.
## Arguments
| Argument | Type | Required | Default | Description |
|----------|------|----------|---------|-------------|
| `url` | str | Yes | - | URL of the webpage to scrape |
| `selector` | str | No | `None` | CSS selector to target specific content (e.g., 'article', '.main-content') |
| `include_links` | bool | No | `False` | Include extracted links in the response |
| `max_length` | int | No | `50000` | Maximum length of extracted text (1000-500000) |
## Environment Variables
This tool does not require any environment variables.
## Error Handling
Returns error dicts for common issues:
- `HTTP <status>: Failed to fetch URL` - Server returned error status
- `No elements found matching selector: <selector>` - CSS selector matched nothing
- `Request timed out` - Request exceeded 30s timeout
- `Network error: <error>` - Connection or DNS issues
- `Scraping failed: <error>` - HTML parsing or other error
## Notes
- URLs without protocol are automatically prefixed with `https://`
- Follows redirects automatically
- Removes script, style, nav, footer, header, aside, noscript, and iframe elements
- Auto-detects main content using article, main, or common content class selectors
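## Usage Example

A minimal in-process sketch (test-style access; the URL is a placeholder):

```python
from fastmcp import FastMCP
from aden_tools.tools.web_scrape_tool import register_tools

mcp = FastMCP("demo")
register_tools(mcp)
web_scrape = mcp._tool_manager._tools["web_scrape"].fn

# A bare domain is auto-prefixed with https://
result = web_scrape(url="example.com", include_links=True)
print(result.get("title"), result.get("length"))
```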
@@ -0,0 +1,4 @@
"""Web Scrape Tool - Extract content from web pages."""
from .web_scrape_tool import register_tools
__all__ = ["register_tools"]
@@ -0,0 +1,134 @@
"""
Web Scrape Tool - Extract content from web pages.
Uses httpx for requests and BeautifulSoup for HTML parsing.
Returns clean text content from web pages.
"""
from __future__ import annotations
from typing import Any, List
import httpx
from bs4 import BeautifulSoup
from fastmcp import FastMCP
def register_tools(mcp: FastMCP) -> None:
"""Register web scrape tools with the MCP server."""
@mcp.tool()
def web_scrape(
url: str,
selector: str | None = None,
include_links: bool = False,
max_length: int = 50000,
) -> dict:
"""
Scrape and extract text content from a webpage.
Use when you need to read the content of a specific URL,
extract data from a website, or read articles/documentation.
Args:
url: URL of the webpage to scrape
selector: CSS selector to target specific content (e.g., 'article', '.main-content')
include_links: Include extracted links in the response
max_length: Maximum length of extracted text (1000-500000)
Returns:
Dict with scraped content (url, title, description, content, length) or error dict
"""
try:
# Validate URL
if not url.startswith(("http://", "https://")):
url = "https://" + url
# Validate max_length
if max_length < 1000:
max_length = 1000
elif max_length > 500000:
max_length = 500000
# Make request
response = httpx.get(
url,
headers={
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"Accept-Language": "en-US,en;q=0.5",
},
follow_redirects=True,
timeout=30.0,
)
if response.status_code != 200:
return {"error": f"HTTP {response.status_code}: Failed to fetch URL"}
# Parse HTML
soup = BeautifulSoup(response.text, "html.parser")
# Remove noise elements
for tag in soup(["script", "style", "nav", "footer", "header", "aside", "noscript", "iframe"]):
tag.decompose()
# Get title and description
title = ""
title_tag = soup.find("title")
if title_tag:
title = title_tag.get_text(strip=True)
description = ""
meta_desc = soup.find("meta", attrs={"name": "description"})
if meta_desc:
description = meta_desc.get("content", "")
# Target content
if selector:
content_elem = soup.select_one(selector)
if not content_elem:
return {"error": f"No elements found matching selector: {selector}"}
text = content_elem.get_text(separator=" ", strip=True)
else:
# Auto-detect main content
main_content = (
soup.find("article")
or soup.find("main")
or soup.find(attrs={"role": "main"})
or soup.find(class_=["content", "post", "entry", "article-body"])
or soup.find("body")
)
text = main_content.get_text(separator=" ", strip=True) if main_content else ""
# Clean up whitespace
text = " ".join(text.split())
# Truncate if needed
if len(text) > max_length:
text = text[:max_length] + "..."
result: dict[str, Any] = {
"url": str(response.url),
"title": title,
"description": description,
"content": text,
"length": len(text),
}
# Extract links if requested
if include_links:
links: List[dict[str, str]] = []
for a in soup.find_all("a", href=True)[:50]:
href = a["href"]
link_text = a.get_text(strip=True)
if link_text and href:
links.append({"text": link_text, "href": href})
result["links"] = links
return result
except httpx.TimeoutException:
return {"error": "Request timed out"}
except httpx.RequestError as e:
return {"error": f"Network error: {str(e)}"}
except Exception as e:
return {"error": f"Scraping failed: {str(e)}"}
@@ -0,0 +1,31 @@
# Web Search Tool
Search the web using the Brave Search API.
## Description
Returns titles, URLs, and snippets for search results. Use when you need current information, want to research a topic, or need to find websites.
## Arguments
| Argument | Type | Required | Default | Description |
|----------|------|----------|---------|-------------|
| `query` | str | Yes | - | The search query (1-500 chars) |
| `num_results` | int | No | `10` | Number of results to return (1-20) |
| `country` | str | No | `us` | Country code for localized results (us, uk, de, etc.) |
## Environment Variables
| Variable | Required | Description |
|----------|----------|-------------|
| `BRAVE_SEARCH_API_KEY` | Yes | API key from [Brave Search API](https://brave.com/search/api/) |
## Error Handling
Returns error dicts for common issues:
- `BRAVE_SEARCH_API_KEY environment variable not set` - Missing API key
- `Query must be 1-500 characters` - Empty or too long query
- `Invalid API key` - API key rejected (HTTP 401)
- `Rate limit exceeded. Try again later.` - Too many requests (HTTP 429)
- `Search request timed out` - Request exceeded 30s timeout
- `Network error: <error>` - Connection or DNS issues
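## Usage Example

A minimal in-process sketch (test-style access; requires `BRAVE_SEARCH_API_KEY`, otherwise the tool returns an error dict):

```python
from fastmcp import FastMCP
from aden_tools.tools.web_search_tool import register_tools

mcp = FastMCP("demo")
register_tools(mcp)
web_search = mcp._tool_manager._tools["web_search"].fn

result = web_search(query="open source agent frameworks", num_results=5)
for item in result.get("results", []):
    print(item["title"], "-", item["url"])
```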
@@ -0,0 +1,4 @@
"""Web Search Tool - Search the web using Brave Search API."""
from .web_search_tool import register_tools
__all__ = ["register_tools"]
@@ -0,0 +1,100 @@
"""
Web Search Tool - Search the web using Brave Search API.
Requires BRAVE_SEARCH_API_KEY environment variable.
Returns search results with titles, URLs, and snippets.
"""
from __future__ import annotations
import os
import httpx
from fastmcp import FastMCP
def register_tools(mcp: FastMCP) -> None:
"""Register web search tools with the MCP server."""
@mcp.tool()
def web_search(
query: str,
num_results: int = 10,
country: str = "us",
) -> dict:
"""
Search the web for information using Brave Search API.
Returns titles, URLs, and snippets. Use when you need current
information, research, or to find websites.
Requires BRAVE_SEARCH_API_KEY environment variable.
Args:
query: The search query (1-500 chars)
num_results: Number of results to return (1-20)
country: Country code for localized results (us, uk, de, etc.)
Returns:
Dict with search results or error dict
"""
api_key = os.getenv("BRAVE_SEARCH_API_KEY")
if not api_key:
return {
"error": "BRAVE_SEARCH_API_KEY environment variable not set",
"help": "Get an API key at https://brave.com/search/api/",
}
# Validate inputs
if not query or len(query) > 500:
return {"error": "Query must be 1-500 characters"}
if num_results < 1 or num_results > 20:
num_results = max(1, min(20, num_results))
try:
# Make request to Brave Search API
response = httpx.get(
"https://api.search.brave.com/res/v1/web/search",
params={
"q": query,
"count": num_results,
"country": country,
},
headers={
"X-Subscription-Token": api_key,
"Accept": "application/json",
},
timeout=30.0,
)
if response.status_code == 401:
return {"error": "Invalid API key"}
elif response.status_code == 429:
return {"error": "Rate limit exceeded. Try again later."}
elif response.status_code != 200:
return {"error": f"API request failed: HTTP {response.status_code}"}
data = response.json()
# Extract results
results = []
web_results = data.get("web", {}).get("results", [])
for item in web_results[:num_results]:
results.append({
"title": item.get("title", ""),
"url": item.get("url", ""),
"snippet": item.get("description", ""),
})
return {
"query": query,
"results": results,
"total": len(results),
}
except httpx.TimeoutException:
return {"error": "Search request timed out"}
except httpx.RequestError as e:
return {"error": f"Network error: {str(e)}"}
except Exception as e:
return {"error": f"Search failed: {str(e)}"}
@@ -0,0 +1,6 @@
"""
Utility functions for Aden Tools.
"""
from .env_helpers import get_env_var
__all__ = ["get_env_var"]
@@ -0,0 +1,35 @@
"""
Environment variable helpers for Aden Tools.
"""
from __future__ import annotations
import os
from typing import Optional
def get_env_var(
name: str,
default: Optional[str] = None,
required: bool = False,
) -> Optional[str]:
"""
Get an environment variable with optional default and required validation.
Args:
name: Name of the environment variable
default: Default value if not set
required: If True, raises ValueError when not set and no default
Returns:
The environment variable value or default
Raises:
ValueError: If required=True and variable is not set with no default
"""
value = os.environ.get(name, default)
if required and value is None:
raise ValueError(
f"Required environment variable '{name}' is not set. "
f"Please set it before using this tool."
)
return value
+1
@@ -0,0 +1 @@
"""Aden Tools test suite."""
+43
@@ -0,0 +1,43 @@
"""Shared fixtures for aden-tools tests."""
import pytest
from pathlib import Path
from fastmcp import FastMCP
@pytest.fixture
def mcp() -> FastMCP:
"""Create a fresh FastMCP instance for testing."""
return FastMCP("test-server")
@pytest.fixture
def sample_text_file(tmp_path: Path) -> Path:
"""Create a simple text file for testing."""
txt_file = tmp_path / "test.txt"
txt_file.write_text("Hello, World!\nLine 2\nLine 3")
return txt_file
@pytest.fixture
def sample_csv(tmp_path: Path) -> Path:
"""Create a simple CSV file for testing."""
csv_file = tmp_path / "test.csv"
csv_file.write_text("name,age,city\nAlice,30,NYC\nBob,25,LA\nCharlie,35,Chicago\n")
return csv_file
@pytest.fixture
def sample_json(tmp_path: Path) -> Path:
"""Create a simple JSON file for testing."""
json_file = tmp_path / "test.json"
json_file.write_text('{"users": [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]}')
return json_file
@pytest.fixture
def large_text_file(tmp_path: Path) -> Path:
"""Create a large text file for size limit testing."""
large_file = tmp_path / "large.txt"
large_file.write_text("x" * 20_000_000) # 20MB
return large_file
+50
View File
@@ -0,0 +1,50 @@
"""Tests for environment variable helpers."""
import pytest
from aden_tools.utils import get_env_var
class TestGetEnvVar:
"""Tests for get_env_var function."""
def test_returns_value_when_set(self, monkeypatch):
"""Returns the environment variable value when set."""
monkeypatch.setenv("TEST_VAR", "test_value")
result = get_env_var("TEST_VAR")
assert result == "test_value"
def test_returns_default_when_not_set(self, monkeypatch):
"""Returns default value when variable is not set."""
monkeypatch.delenv("UNSET_VAR", raising=False)
result = get_env_var("UNSET_VAR", default="default_value")
assert result == "default_value"
def test_returns_none_when_not_set_and_no_default(self, monkeypatch):
"""Returns None when variable is not set and no default provided."""
monkeypatch.delenv("UNSET_VAR", raising=False)
result = get_env_var("UNSET_VAR")
assert result is None
def test_raises_when_required_and_missing(self, monkeypatch):
"""Raises ValueError when required=True and variable is missing."""
monkeypatch.delenv("REQUIRED_VAR", raising=False)
with pytest.raises(ValueError) as exc_info:
get_env_var("REQUIRED_VAR", required=True)
assert "REQUIRED_VAR" in str(exc_info.value)
assert "not set" in str(exc_info.value)
def test_returns_value_when_required_and_set(self, monkeypatch):
"""Returns value when required=True and variable is set."""
monkeypatch.setenv("REQUIRED_VAR", "my_value")
result = get_env_var("REQUIRED_VAR", required=True)
assert result == "my_value"
+1
View File
@@ -0,0 +1 @@
"""Tool-specific tests."""
@@ -0,0 +1,96 @@
"""Tests for file_read tool (FastMCP)."""
import pytest
from pathlib import Path
from fastmcp import FastMCP
from aden_tools.tools.file_read_tool import register_tools
@pytest.fixture
def file_read_fn(mcp: FastMCP):
"""Register and return the file_read tool function."""
register_tools(mcp)
# Access the registered tool's function directly
return mcp._tool_manager._tools["file_read"].fn
class TestFileReadTool:
"""Tests for file_read tool."""
def test_read_existing_file(self, file_read_fn, sample_text_file: Path):
"""Reading an existing file returns content and metadata."""
result = file_read_fn(file_path=str(sample_text_file))
assert "error" not in result
assert result["content"] == "Hello, World!\nLine 2\nLine 3"
assert result["name"] == "test.txt"
assert result["encoding"] == "utf-8"
assert "size" in result
def test_read_file_not_found(self, file_read_fn, tmp_path: Path):
"""Reading a non-existent file returns an error dict."""
missing_file = tmp_path / "does_not_exist.txt"
result = file_read_fn(file_path=str(missing_file))
assert "error" in result
assert "not found" in result["error"].lower()
def test_read_directory_returns_error(self, file_read_fn, tmp_path: Path):
"""Reading a directory (not a file) returns an error."""
result = file_read_fn(file_path=str(tmp_path))
assert "error" in result
assert "not a file" in result["error"].lower()
def test_read_file_too_large(self, file_read_fn, tmp_path: Path):
"""Reading a file exceeding max_size returns an error."""
large_file = tmp_path / "large.txt"
large_file.write_text("x" * 1000)
result = file_read_fn(file_path=str(large_file), max_size=100)
assert "error" in result
assert "too large" in result["error"].lower()
assert "file_size" in result
def test_read_with_no_size_limit(self, file_read_fn, tmp_path: Path):
"""Reading with max_size=0 allows any file size."""
large_file = tmp_path / "large.txt"
content = "x" * 100_000
large_file.write_text(content)
# max_size=0 means no limit in the implementation
result = file_read_fn(file_path=str(large_file), max_size=0)
assert "error" not in result
assert result["content"] == content
def test_read_with_different_encoding(self, file_read_fn, tmp_path: Path):
"""Reading with a specific encoding works."""
latin_file = tmp_path / "latin.txt"
# Write bytes directly with latin-1 encoding
latin_file.write_bytes("café".encode("latin-1"))
result = file_read_fn(file_path=str(latin_file), encoding="latin-1")
assert "error" not in result
assert result["content"] == "café"
assert result["encoding"] == "latin-1"
def test_read_with_wrong_encoding_returns_error(self, file_read_fn, tmp_path: Path):
"""Reading with wrong encoding returns helpful error."""
# Create a file with bytes that aren't valid UTF-8
binary_file = tmp_path / "binary.txt"
binary_file.write_bytes(b"\xff\xfe")
result = file_read_fn(file_path=str(binary_file), encoding="utf-8")
assert "error" in result
assert "suggestion" in result
def test_returns_absolute_path(self, file_read_fn, sample_text_file: Path):
"""Result includes the absolute path."""
result = file_read_fn(file_path=str(sample_text_file))
assert result["path"] == str(sample_text_file.resolve())
@@ -0,0 +1,99 @@
"""Tests for file_write tool (FastMCP)."""
import pytest
from pathlib import Path
from fastmcp import FastMCP
from aden_tools.tools.file_write_tool import register_tools
@pytest.fixture
def file_write_fn(mcp: FastMCP):
"""Register and return the file_write tool function."""
register_tools(mcp)
return mcp._tool_manager._tools["file_write"].fn
class TestFileWriteTool:
"""Tests for file_write tool."""
def test_write_creates_new_file(self, file_write_fn, tmp_path: Path):
"""Writing to a new file creates it with content."""
new_file = tmp_path / "new.txt"
result = file_write_fn(file_path=str(new_file), content="Hello, World!")
assert "error" not in result
assert result["created"] is True
assert result["name"] == "new.txt"
assert new_file.read_text() == "Hello, World!"
def test_write_overwrites_existing(self, file_write_fn, tmp_path: Path):
"""Writing to existing file overwrites by default."""
existing = tmp_path / "existing.txt"
existing.write_text("old content")
result = file_write_fn(file_path=str(existing), content="new content")
assert "error" not in result
assert result["created"] is False
assert result["previous_size"] is not None
assert existing.read_text() == "new content"
def test_write_appends_to_existing(self, file_write_fn, tmp_path: Path):
"""Writing with mode='append' adds to existing content."""
existing = tmp_path / "existing.txt"
existing.write_text("line1\n")
result = file_write_fn(file_path=str(existing), content="line2\n", mode="append")
assert "error" not in result
assert result["mode"] == "append"
assert existing.read_text() == "line1\nline2\n"
def test_write_creates_parent_dirs(self, file_write_fn, tmp_path: Path):
"""Writing with create_dirs=True creates missing directories."""
deep_path = tmp_path / "nested" / "dirs" / "file.txt"
result = file_write_fn(file_path=str(deep_path), content="content", create_dirs=True)
assert "error" not in result
assert deep_path.exists()
assert deep_path.read_text() == "content"
def test_write_fails_without_parent_dir(self, file_write_fn, tmp_path: Path):
"""Writing with create_dirs=False fails if parent doesn't exist."""
missing_dir = tmp_path / "missing" / "file.txt"
result = file_write_fn(file_path=str(missing_dir), content="content", create_dirs=False)
assert "error" in result
assert "parent directory" in result["error"].lower()
def test_write_invalid_mode(self, file_write_fn, tmp_path: Path):
"""Writing with invalid mode returns error."""
result = file_write_fn(
file_path=str(tmp_path / "test.txt"),
content="content",
mode="invalid"
)
assert "error" in result
assert "invalid mode" in result["error"].lower()
def test_write_returns_bytes_written(self, file_write_fn, tmp_path: Path):
"""Result includes accurate bytes_written count."""
content = "Hello, World!"
result = file_write_fn(file_path=str(tmp_path / "test.txt"), content=content)
assert result["bytes_written"] == len(content.encode("utf-8"))
def test_write_with_encoding(self, file_write_fn, tmp_path: Path):
"""Writing with specific encoding works."""
file_path = tmp_path / "latin.txt"
result = file_write_fn(file_path=str(file_path), content="café", encoding="latin-1")
assert "error" not in result
# Verify it was written with latin-1 encoding
assert file_path.read_bytes() == "café".encode("latin-1")
@@ -0,0 +1,80 @@
"""Tests for pdf_read tool (FastMCP)."""
import pytest
from pathlib import Path
from fastmcp import FastMCP
from aden_tools.tools.pdf_read_tool import register_tools
@pytest.fixture
def pdf_read_fn(mcp: FastMCP):
"""Register and return the pdf_read tool function."""
register_tools(mcp)
return mcp._tool_manager._tools["pdf_read"].fn
class TestPdfReadTool:
"""Tests for pdf_read tool."""
def test_read_pdf_file_not_found(self, pdf_read_fn, tmp_path: Path):
"""Reading non-existent PDF returns error."""
result = pdf_read_fn(file_path=str(tmp_path / "missing.pdf"))
assert "error" in result
assert "not found" in result["error"].lower()
def test_read_pdf_invalid_extension(self, pdf_read_fn, tmp_path: Path):
"""Reading non-PDF file returns error."""
txt_file = tmp_path / "test.txt"
txt_file.write_text("not a pdf")
result = pdf_read_fn(file_path=str(txt_file))
assert "error" in result
assert "not a pdf" in result["error"].lower()
def test_read_pdf_directory(self, pdf_read_fn, tmp_path: Path):
"""Reading a directory returns error."""
result = pdf_read_fn(file_path=str(tmp_path))
assert "error" in result
assert "not a file" in result["error"].lower()
def test_max_pages_clamped_low(self, pdf_read_fn, tmp_path: Path):
"""max_pages below 1 is clamped to 1."""
pdf_file = tmp_path / "test.pdf"
pdf_file.write_bytes(b"%PDF-1.4") # Minimal PDF header (will fail to parse)
result = pdf_read_fn(file_path=str(pdf_file), max_pages=0)
# Will error due to invalid PDF, but max_pages should be accepted
assert isinstance(result, dict)
def test_max_pages_clamped_high(self, pdf_read_fn, tmp_path: Path):
"""max_pages above 1000 is clamped to 1000."""
pdf_file = tmp_path / "test.pdf"
pdf_file.write_bytes(b"%PDF-1.4")
result = pdf_read_fn(file_path=str(pdf_file), max_pages=2000)
# Will error due to invalid PDF, but max_pages should be accepted
assert isinstance(result, dict)
def test_pages_parameter_accepted(self, pdf_read_fn, tmp_path: Path):
"""Various pages parameter formats are accepted."""
pdf_file = tmp_path / "test.pdf"
pdf_file.write_bytes(b"%PDF-1.4")
# Test different page formats - all should be accepted
for pages in ["all", "1", "1-5", "1,3,5", None]:
result = pdf_read_fn(file_path=str(pdf_file), pages=pages)
assert isinstance(result, dict)
def test_include_metadata_parameter(self, pdf_read_fn, tmp_path: Path):
"""include_metadata parameter is accepted."""
pdf_file = tmp_path / "test.pdf"
pdf_file.write_bytes(b"%PDF-1.4")
result = pdf_read_fn(file_path=str(pdf_file), include_metadata=False)
assert isinstance(result, dict)
result = pdf_read_fn(file_path=str(pdf_file), include_metadata=True)
assert isinstance(result, dict)
@@ -0,0 +1,52 @@
"""Tests for web_scrape tool (FastMCP)."""
import pytest
from fastmcp import FastMCP
from aden_tools.tools.web_scrape_tool import register_tools
@pytest.fixture
def web_scrape_fn(mcp: FastMCP):
"""Register and return the web_scrape tool function."""
register_tools(mcp)
return mcp._tool_manager._tools["web_scrape"].fn
class TestWebScrapeTool:
"""Tests for web_scrape tool."""
def test_url_auto_prefixed_with_https(self, web_scrape_fn):
"""URLs without scheme get https:// prefix."""
# This will fail to connect, but we can verify the behavior
result = web_scrape_fn(url="example.com")
# Should either succeed or have a network error (not a validation error)
assert isinstance(result, dict)
def test_max_length_clamped_low(self, web_scrape_fn):
"""max_length below 1000 is clamped to 1000."""
# Test with a very low max_length - implementation clamps to 1000
result = web_scrape_fn(url="https://example.com", max_length=500)
# Should not error due to invalid max_length
assert isinstance(result, dict)
def test_max_length_clamped_high(self, web_scrape_fn):
"""max_length above 500000 is clamped to 500000."""
# Test with a very high max_length - implementation clamps to 500000
result = web_scrape_fn(url="https://example.com", max_length=600000)
# Should not error due to invalid max_length
assert isinstance(result, dict)
def test_valid_max_length_accepted(self, web_scrape_fn):
"""Valid max_length values are accepted."""
result = web_scrape_fn(url="https://example.com", max_length=10000)
assert isinstance(result, dict)
def test_include_links_option(self, web_scrape_fn):
"""include_links parameter is accepted."""
result = web_scrape_fn(url="https://example.com", include_links=True)
assert isinstance(result, dict)
def test_selector_option(self, web_scrape_fn):
"""selector parameter is accepted."""
result = web_scrape_fn(url="https://example.com", selector=".content")
assert isinstance(result, dict)
@@ -0,0 +1,57 @@
"""Tests for web_search tool (FastMCP)."""
import pytest
from fastmcp import FastMCP
from aden_tools.tools.web_search_tool import register_tools
@pytest.fixture
def web_search_fn(mcp: FastMCP):
"""Register and return the web_search tool function."""
register_tools(mcp)
return mcp._tool_manager._tools["web_search"].fn
class TestWebSearchTool:
"""Tests for web_search tool."""
def test_search_missing_api_key(self, web_search_fn, monkeypatch):
"""Search without API key returns helpful error."""
monkeypatch.delenv("BRAVE_SEARCH_API_KEY", raising=False)
result = web_search_fn(query="test query")
assert "error" in result
assert "BRAVE_SEARCH_API_KEY" in result["error"]
assert "help" in result
def test_empty_query_returns_error(self, web_search_fn, monkeypatch):
"""Empty query returns error."""
monkeypatch.setenv("BRAVE_SEARCH_API_KEY", "test-key")
result = web_search_fn(query="")
assert "error" in result
assert "1-500" in result["error"].lower() or "character" in result["error"].lower()
def test_long_query_returns_error(self, web_search_fn, monkeypatch):
"""Query exceeding 500 chars returns error."""
monkeypatch.setenv("BRAVE_SEARCH_API_KEY", "test-key")
result = web_search_fn(query="x" * 501)
assert "error" in result
def test_num_results_clamped_to_valid_range(self, web_search_fn, monkeypatch):
"""num_results outside 1-20 is clamped (not error)."""
monkeypatch.setenv("BRAVE_SEARCH_API_KEY", "test-key")
# Test that the function handles out-of-range values gracefully
# The implementation clamps values, so we just verify it doesn't crash
# (actual API call would fail with invalid key, but that's expected)
result = web_search_fn(query="test", num_results=0)
# Should either clamp or error - both are acceptable
assert isinstance(result, dict)
result = web_search_fn(query="test", num_results=100)
assert isinstance(result, dict)
+25
View File
@@ -131,6 +131,31 @@ services:
networks:
- honeycomb-network
# Aden Tools MCP Server - Python tools via Model Context Protocol
aden-tools-mcp:
build:
context: ./aden-tools
container_name: honeycomb-aden-tools-mcp
ports:
- "${ADEN_TOOLS_MCP_PORT:-4001}:4001"
environment:
- MCP_PORT=4001
# Pass through tool-specific env vars
- BRAVE_SEARCH_API_KEY=${BRAVE_SEARCH_API_KEY:-}
volumes:
- .:/workspace:rw # Mount project root for file access
working_dir: /workspace # Set working directory so relative paths work
command: ["python", "/app/mcp_server.py"] # Use absolute path since working_dir changed
healthcheck:
test: ["CMD", "python", "-c", "import httpx; httpx.get('http://localhost:4001/health').raise_for_status()"]
interval: 30s
timeout: 5s
retries: 5
start_period: 10s
restart: unless-stopped
networks:
- honeycomb-network
networks:
honeycomb-network:
driver: bridge
+30
View File
@@ -0,0 +1,30 @@
# Aden Listicles & Comparisons
Educational content comparing AI agent frameworks and exploring the agent development landscape.
## Articles
| Article | Topic | Keywords |
|---------|-------|----------|
| [Top 10 AI Agent Frameworks in 2025](./top-10-ai-agent-frameworks-2025.md) | Overview | ai agents, frameworks, comparison |
| [Aden vs LangChain](./aden-vs-langchain.md) | Comparison | langchain, rag, llm apps |
| [Aden vs CrewAI](./aden-vs-crewai.md) | Comparison | crewai, multi-agent, orchestration |
| [Aden vs AutoGen](./aden-vs-autogen.md) | Comparison | autogen, microsoft, conversational |
| [Self-Improving vs Static Agents](./self-improving-vs-static-agents.md) | Concept | self-evolution, adaptation |
| [Human-in-the-Loop Guide](./human-in-the-loop-ai-agents.md) | Guide | hitl, human oversight, safety |
| [AI Agent Cost Management](./ai-agent-cost-management-guide.md) | Guide | cost control, budget, optimization |
| [Building Production AI Agents](./building-production-ai-agents.md) | Guide | production, deployment, reliability |
| [Multi-Agent vs Single-Agent](./multi-agent-vs-single-agent-systems.md) | Concept | architecture, design patterns |
| [AI Agent Observability](./ai-agent-observability-monitoring.md) | Guide | monitoring, observability, debugging |
## Purpose
These articles help developers:
- Understand the AI agent landscape
- Make informed framework choices
- Learn best practices for agent development
- Compare different approaches objectively
## Contributing
Want to add or improve an article? See [CONTRIBUTING.md](../../CONTRIBUTING.md).
+366
View File
@@ -0,0 +1,366 @@
# Aden vs AutoGen: A Detailed Comparison
*Comparing self-evolving agents with conversational multi-agent systems*
---
Microsoft's AutoGen and Aden both enable multi-agent systems but serve different purposes. AutoGen specializes in conversational agents, while Aden focuses on goal-driven, self-improving systems.
---
## Overview
| Aspect | AutoGen | Aden |
|--------|---------|------|
| **Developed By** | Microsoft | Aden |
| **Philosophy** | Conversational agents | Goal-driven, self-evolving |
| **Primary Pattern** | Multi-agent conversations | Node-based agent graphs |
| **Communication** | Natural language dialogue | Generated connection code |
| **Self-Improvement** | No | Yes |
| **Best For** | Dialogue-heavy applications | Production agent systems |
| **License** | MIT | Apache 2.0 |
---
## Philosophy & Approach
### AutoGen
AutoGen enables agents to **communicate through natural language conversations**. Agents chat with each other to solve problems collaboratively.
```python
# AutoGen: Conversation-based agents
from autogen import AssistantAgent, UserProxyAgent
assistant = AssistantAgent(
name="assistant",
llm_config={"model": "gpt-4"}
)
user_proxy = UserProxyAgent(
name="user_proxy",
human_input_mode="TERMINATE",
code_execution_config={"work_dir": "coding"}
)
# Agents solve problems through conversation
user_proxy.initiate_chat(
assistant,
message="Create a Python script to analyze sales data"
)
```
### Aden
Aden uses a **coding agent to generate complete agent systems** from goals. Agents are connected through generated code, not just conversation.
```python
# Aden: Goal-driven agent generation
goal = """
Build a data analysis system that:
1. Ingests sales data from multiple sources
2. Generates insights and visualizations
3. Creates weekly summary reports
4. Escalates anomalies to the data team
When analysis fails or produces incorrect results,
learn from the corrections to improve accuracy.
"""
# Aden generates specialized agents with:
# - Data ingestion tools
# - Analysis capabilities
# - Visualization outputs
# - Human escalation for anomalies
# - Self-improvement from feedback
```
---
## Feature Comparison
### Communication Model
| Feature | AutoGen | Aden |
|---------|---------|------|
| Agent-to-agent | Natural language | Generated connections |
| Conversation history | Built-in | Via memory nodes |
| Message passing | Sequential turns | Async/event-driven |
| Human interaction | Via UserProxyAgent | Native HITL nodes |
**Verdict:** AutoGen is more natural for dialogue; Aden is more flexible for diverse patterns.
### Code Execution
| Feature | AutoGen | Aden |
|---------|---------|------|
| Code execution | Built-in (sandboxed) | Via tools |
| Language support | Python (primarily) | Multi-language via tools |
| Execution safety | Docker containers | Tool-level sandboxing |
| Result handling | Conversation flow | Structured outputs |
**Verdict:** AutoGen has stronger built-in code execution; Aden uses tool abstraction.
### Multi-Agent Patterns
| Feature | AutoGen | Aden |
|---------|---------|------|
| Group chat | Native support | Via graph connections |
| Hierarchical | Nested conversations | Node hierarchies |
| Dynamic agents | Limited | Coding agent creates as needed |
| Agent discovery | Manual | Auto-generated |
**Verdict:** AutoGen excels at chat patterns; Aden is more flexible for non-chat workflows.
### Production Features
| Feature | AutoGen | Aden |
|---------|---------|------|
| Monitoring | Basic logging | Full dashboard |
| Cost tracking | Manual | Automatic |
| Budget controls | Not built-in | Native |
| Self-improvement | No | Yes |
**Verdict:** Aden is significantly more production-ready.
---
## Code Comparison
### Building a Coding Assistant
#### AutoGen Approach
```python
from autogen import AssistantAgent, UserProxyAgent, GroupChat, GroupChatManager
# Define specialized agents
coder = AssistantAgent(
name="coder",
system_message="You are a Python expert...",
llm_config=llm_config
)
reviewer = AssistantAgent(
name="reviewer",
system_message="You review code for bugs and improvements...",
llm_config=llm_config
)
executor = UserProxyAgent(
name="executor",
human_input_mode="NEVER",
code_execution_config={"work_dir": "workspace"}
)
# Create group chat
group_chat = GroupChat(
agents=[coder, reviewer, executor],
messages=[],
max_round=10
)
manager = GroupChatManager(groupchat=group_chat, llm_config=llm_config)
# Start conversation
executor.initiate_chat(
manager,
message="Create a data processing pipeline"
)
# Conversation happens naturally between agents
# Each agent responds based on their role
```
#### Aden Approach
```python
# Define goal for coding assistant system
goal = """
Build a code development system that:
1. Understands coding requests and breaks them into tasks
2. Writes Python code following best practices
3. Reviews code for bugs, security issues, and improvements
4. Executes code in a safe environment
5. Iterates based on execution results
Human review required for:
- Code that accesses external services
- Changes to production systems
- Code handling sensitive data
Self-improvement:
- Learn from code review feedback
- Track which patterns cause bugs
- Improve based on execution failures
"""
# Aden creates:
# - Task decomposition agent
# - Coder agent with best practices
# - Reviewer agent with learned patterns
# - Safe execution environment
# - Human checkpoints for sensitive operations
# - Feedback loop for continuous improvement
```
---
## Use Case Comparison
### Best for AutoGen
1. **Conversational AI applications**
- Chatbots with multiple personalities
- Customer service with specialist handoffs
- Interactive tutoring systems
2. **Code generation through dialogue**
- Pair programming assistants
- Code review discussions
- Debugging conversations
3. **Research and exploration**
- Collaborative problem solving
- Multi-perspective analysis
- Brainstorming sessions
### Best for Aden
1. **Production agent systems**
- Customer support with evolution
- Data pipelines that self-correct
- Content systems that improve
2. **Goal-oriented automation**
- Business process automation
- Monitoring and alerting
- Report generation
3. **Systems requiring adaptation**
- Changing requirements
- Learning from failures
- Continuous improvement
---
## Detailed Comparisons
### Conversation Management
| Aspect | AutoGen | Aden |
|--------|---------|------|
| Turn management | Automatic | Event-driven |
| Context window | Managed | Via memory tools |
| History persistence | Session-based | Durable storage |
| Branching conversations | Supported | Via graph structure |
### Error Handling
| Aspect | AutoGen | Aden |
|--------|---------|------|
| Execution errors | Retry in conversation | Capture and evolve |
| Logic errors | Agent discussion | Failure analysis |
| Recovery | Manual intervention | Automatic adaptation |
| Learning | No | Built-in |
### Integration
| Aspect | AutoGen | Aden |
|--------|---------|------|
| External tools | Function calling | Tool nodes |
| APIs | Custom integration | SDK support |
| Databases | Via code execution | Native connections |
| Enterprise systems | Custom | MCP tools |
---
## When to Choose AutoGen
AutoGen is the better choice when:
1. **Conversation is the core pattern** - Your agents primarily communicate through dialogue
2. **Code execution is central** - Need built-in sandboxed execution
3. **Microsoft ecosystem** - Already invested in Microsoft AI tools
4. **Research applications** - Exploring multi-agent conversations
5. **Flexible dialogue** - Agents need natural back-and-forth
6. **Quick prototypes** - Simple multi-agent conversations
---
## When to Choose Aden
Aden is the better choice when:
1. **Production requirements** - Need monitoring, cost control, health checks
2. **Self-improvement matters** - System should evolve from failures
3. **Goal-driven development** - Prefer describing outcomes
4. **Non-conversational patterns** - Workflows beyond dialogue
5. **Cost management** - Need budget enforcement
6. **Human-in-the-loop** - Require structured intervention points
7. **Long-running systems** - Agents operating continuously
---
## Hybrid Architectures
### AutoGen Agents in Aden
AutoGen conversations can be wrapped as Aden nodes. The sketch below is illustrative, not Aden's actual node API:
```python
# AutoGen conversation as a node in Aden's graph
class AutoGenConversationNode:
    def __init__(self, user_proxy, assistant):
        self.user_proxy = user_proxy
        self.assistant = assistant

    def execute(self, input):
        # Run the AutoGen conversation to completion
        chat_result = self.user_proxy.initiate_chat(self.assistant, message=input)
        # Return structured output for downstream nodes
        # (ChatResult fields vary by AutoGen version)
        return {"summary": chat_result.summary}
```
### Benefits of Hybrid
- Use AutoGen's conversation for dialogue-heavy tasks
- Use Aden's orchestration and monitoring
- Get self-improvement across the system
- Maintain cost controls
---
## Performance Considerations
| Metric | AutoGen | Aden |
|--------|---------|------|
| Latency per turn | Higher (full responses) | Optimized per node |
| Token efficiency | Conversation overhead | Direct communication |
| Scalability | Memory-bound | Distributed-ready |
| Cost tracking | Manual | Automatic |
---
## Community & Support
| Aspect | AutoGen | Aden |
|--------|---------|------|
| Backing | Microsoft Research | Y Combinator startup |
| Community | Large, active | Growing |
| Documentation | Comprehensive | Good and improving |
| Enterprise support | Microsoft channels | Direct team support |
---
## Conclusion
**AutoGen** excels at creating agents that collaborate through natural language conversations. It's ideal for dialogue-heavy applications and leverages Microsoft's AI expertise.
**Aden** provides goal-driven, self-improving agent systems with production features built-in. It's better for systems that need to evolve and require operational visibility.
### Quick Decision Guide
| Your Need | Choose |
|-----------|--------|
| Conversational agents | AutoGen |
| Code execution focus | AutoGen |
| Self-improving systems | Aden |
| Production monitoring | Aden |
| Microsoft ecosystem | AutoGen |
| Cost management | Aden |
| Natural dialogue | AutoGen |
| Goal-driven development | Aden |
---
*Last updated: January 2025*
+346
View File
@@ -0,0 +1,346 @@
# Aden vs CrewAI: A Detailed Comparison
*Comparing self-evolving agents with role-based agent teams*
---
CrewAI and Aden both focus on multi-agent systems but take fundamentally different approaches. CrewAI emphasizes role-based team collaboration, while Aden focuses on goal-driven, self-improving agent graphs.
---
## Overview
| Aspect | CrewAI | Aden |
|--------|--------|------|
| **Philosophy** | Role-based agent teams | Goal-driven, self-evolving agents |
| **Architecture** | Crews with roles | Node-based agent graphs |
| **Workflow** | Predefined collaboration | Dynamically generated |
| **Self-Improvement** | No | Yes |
| **Human-in-the-Loop** | Basic support | Native intervention points |
| **Monitoring** | Basic logging | Full dashboard |
| **License** | MIT | Apache 2.0 |
---
## Philosophy & Approach
### CrewAI
CrewAI organizes agents as a **crew** with defined **roles**. Each agent has a specific job, and they collaborate in predefined patterns to accomplish tasks.
```python
# CrewAI: Role-based team definition
from crewai import Agent, Task, Crew
researcher = Agent(
role="Senior Research Analyst",
goal="Uncover cutting-edge developments",
backstory="You are an expert at finding information...",
tools=[search_tool, web_scraper]
)
writer = Agent(
role="Content Writer",
goal="Create engaging content from research",
backstory="You are a skilled writer..."
)
# Define tasks and crew
crew = Crew(
agents=[researcher, writer],
tasks=[research_task, writing_task],
process=Process.sequential
)
```
### Aden
Aden uses a **coding agent** to generate agent systems from natural language goals. The system creates agents, connections, and evolves based on failures.
```python
# Aden: Goal-driven generation
goal = """
Research cutting-edge developments in AI and create
engaging blog content. When content is rejected by
editors, learn from the feedback to improve future posts.
"""
# Aden generates:
# - Research agent with appropriate tools
# - Writer agent with learned preferences
# - Editor checkpoint (human-in-the-loop)
# - Feedback loop for improvement
```
---
## Feature Comparison
### Agent Definition
| Feature | CrewAI | Aden |
|---------|--------|------|
| Agent creation | Manual role definition | Generated from goals |
| Roles | Explicit (role, goal, backstory) | Inferred from requirements |
| Tools assignment | Manual per agent | Auto-configured |
| Customization | High | High (via goal refinement) |
**Verdict:** CrewAI offers more explicit control; Aden reduces boilerplate through generation.
### Team Collaboration
| Feature | CrewAI | Aden |
|---------|--------|------|
| Collaboration patterns | Sequential, hierarchical | Dynamic, goal-based |
| Communication | Predefined handoffs | Generated connection code |
| Flexibility | Within defined patterns | Fully dynamic |
| Adaptation | Manual updates | Automatic evolution |
**Verdict:** CrewAI is more predictable; Aden is more adaptive.
### Failure Handling
| Feature | CrewAI | Aden |
|---------|--------|------|
| Error handling | Try/catch | Automatic capture |
| Learning from failures | Not built-in | Core feature |
| Agent evolution | Manual updates | Automatic |
| Recovery strategies | Custom code | Built-in policies |
**Verdict:** Aden's failure handling and evolution are significantly more advanced.
### Production Features
| Feature | CrewAI | Aden |
|---------|--------|------|
| Monitoring dashboard | No | Yes |
| Cost tracking | No | Yes |
| Budget enforcement | No | Yes |
| Health checks | Basic | Comprehensive |
**Verdict:** Aden is more production-ready out of the box.
---
## Code Comparison
### Building a Content Creation Team
#### CrewAI Approach
```python
from crewai import Agent, Task, Crew, Process
# Define agents with explicit roles
researcher = Agent(
role="Research Specialist",
goal="Find accurate, relevant information",
backstory="Expert researcher with attention to detail",
verbose=True,
tools=[search_tool, scrape_tool]
)
writer = Agent(
role="Content Writer",
goal="Create engaging, SEO-friendly content",
backstory="Experienced content creator",
verbose=True
)
editor = Agent(
role="Editor",
goal="Ensure quality and accuracy",
backstory="Meticulous editor with high standards"
)
# Define tasks
research_task = Task(
description="Research {topic} thoroughly",
agent=researcher,
expected_output="Comprehensive research notes"
)
writing_task = Task(
description="Write article based on research",
agent=writer,
expected_output="Draft article"
)
editing_task = Task(
description="Edit and polish the article",
agent=editor,
expected_output="Final article"
)
# Create and run crew
crew = Crew(
agents=[researcher, writer, editor],
tasks=[research_task, writing_task, editing_task],
process=Process.sequential
)
result = crew.kickoff(inputs={"topic": "AI trends 2025"})
```
#### Aden Approach
```python
# Define goal - system generates the team
goal = """
Create a content creation system that:
1. Researches topics thoroughly using web search
2. Writes engaging, SEO-optimized articles
3. Gets human editor approval before publishing
4. Learns from editor feedback to improve over time
When articles are rejected:
- Capture the feedback
- Identify patterns in rejections
- Adjust writing style and quality criteria
"""
# Aden automatically:
# - Creates research, writer nodes
# - Sets up human-in-the-loop for editor
# - Establishes feedback learning loop
# - Monitors cost and quality metrics
# The system evolves:
# - Writing improves based on rejections
# - Research depth adjusts based on needs
# - Quality thresholds adapt
```
---
## Detailed Comparisons
### Ease of Use
| Aspect | CrewAI | Aden |
|--------|--------|------|
| Learning curve | Moderate | Moderate |
| Initial setup | Define roles/tasks | Define goals |
| Iteration speed | Requires code changes | Goal refinement |
| Documentation | Good | Growing |
### Scalability
| Aspect | CrewAI | Aden |
|--------|--------|------|
| Agent count | Grows with complexity | Managed automatically |
| Task complexity | Manual orchestration | Dynamic handling |
| Resource management | Manual | Built-in controls |
### Customization
| Aspect | CrewAI | Aden |
|--------|--------|------|
| Agent behavior | Full control via role/backstory | Via goals and feedback |
| Tools | Assign per agent | Auto-configured + custom |
| Workflows | Predefined processes | Generated + evolved |
| Prompts | Full access | Goal-based abstraction |
---
## When to Choose CrewAI
CrewAI is the better choice when:
1. **Roles are well-defined** - You know exactly what each agent should do
2. **Predictable workflows** - Sequential or hierarchical processes work
3. **Direct control needed** - Want to define every aspect of agent behavior
4. **Simple team structures** - Small crews with clear responsibilities
5. **Quick prototyping** - Get a multi-agent system running fast
6. **No evolution needed** - Workflow won't need to adapt over time
---
## When to Choose Aden
Aden is the better choice when:
1. **Goals over roles** - Know what to achieve, not how to organize
2. **Adaptation required** - System needs to improve from failures
3. **Complex workflows** - Dynamic connections between many agents
4. **Production deployment** - Need monitoring, cost controls, health checks
5. **Human oversight** - Require native HITL with escalation policies
6. **Continuous improvement** - Want agents to get better automatically
7. **Cost management** - Need budget enforcement and model degradation
---
## Hybrid Approaches
Some teams use both frameworks:
### CrewAI for Specific Tasks
```python
# Use CrewAI for well-defined sub-tasks
research_crew = Crew(agents=[...], tasks=[...])
```
### Aden for Orchestration
```python
# Aden orchestrates and evolves the overall system
# CrewAI crews can be nodes in Aden's graph
```
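As a rough illustration of that layering, a crew can sit behind a single node interface. The `CrewNode` class and its dict-in/dict-out contract below are hypothetical, not part of Aden's actual SDK:

```python
# Hypothetical adapter: a CrewAI crew exposed as one node in an Aden-style graph
class CrewNode:
    def __init__(self, crew):
        self.crew = crew

    def execute(self, inputs: dict) -> dict:
        # kickoff() runs the whole crew and returns its final output
        result = self.crew.kickoff(inputs=inputs)
        return {"output": str(result)}

# research_node = CrewNode(research_crew)
```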
---
## Migration Considerations
### CrewAI to Aden
- Map roles to goal descriptions
- Convert tasks to expected outcomes
- Existing tools often transfer directly
- Add failure scenarios to enable evolution
### Aden to CrewAI
- Analyze generated agent graph for roles
- Define explicit role/backstory from behavior
- Recreate evolution logic manually if needed
- Set up external monitoring
---
## Performance Comparison
| Metric | CrewAI | Aden |
|--------|--------|------|
| Startup time | Fast | Moderate (includes setup) |
| Execution overhead | Low | Low |
| Memory usage | Depends on agents | Includes monitoring |
| LLM calls | As defined | Optimized + tracked |
---
## Community & Ecosystem
| Aspect | CrewAI | Aden |
|--------|--------|------|
| GitHub stars | High | Growing |
| Community size | Large | Growing |
| Enterprise users | Many | Early adopters |
| Third-party tools | Growing ecosystem | Integrated platform |
---
## Conclusion
**CrewAI** excels at creating predictable, role-based agent teams with explicit control over behavior and collaboration patterns. It's ideal for well-defined workflows.
**Aden** shines when you need agents that evolve and improve, with built-in production features like monitoring and cost control. It's better for systems that need to adapt.
### Decision Matrix
| Your Situation | Choose |
|----------------|--------|
| Know exact roles needed | CrewAI |
| Know outcomes, not structure | Aden |
| Need predictable behavior | CrewAI |
| Need adaptive behavior | Aden |
| Simple prototyping | CrewAI |
| Production deployment | Aden |
| Cost management important | Aden |
| Maximum control | CrewAI |
---
*Last updated: January 2025*
+266
View File
@@ -0,0 +1,266 @@
# Aden vs LangChain: A Detailed Comparison
*Choosing between goal-driven agents and component-based development*
---
LangChain and Aden represent two different philosophies for building AI agent systems. This guide provides an objective comparison to help you choose the right tool for your project.
---
## Overview
| Aspect | LangChain | Aden |
|--------|-----------|------|
| **Philosophy** | Component library for LLM apps | Goal-driven, self-improving agents |
| **Primary Language** | Python, JavaScript | Python SDK, TypeScript backend |
| **Architecture** | Chains and components | Node-based agent graphs |
| **Workflow Definition** | Manual chain creation | Generated from natural language |
| **Self-Improvement** | No | Yes, automatic evolution |
| **Monitoring** | Third-party integrations | Built-in dashboard |
| **License** | MIT | Apache 2.0 |
---
## Philosophy & Approach
### LangChain
LangChain follows a **component-based approach**. You manually select and connect components (LLMs, retrievers, tools, memory) to build chains and agents. This gives you fine-grained control but requires explicit workflow definition.
```python
# LangChain: Manual chain construction
from langchain import LLMChain, PromptTemplate
from langchain.agents import create_react_agent
# You define every component and connection
prompt = PromptTemplate(...)
chain = LLMChain(llm=llm, prompt=prompt)
agent = create_react_agent(llm, tools, prompt)
```
### Aden
Aden follows a **goal-driven approach**. You describe what you want to achieve in natural language, and a coding agent generates the agent graph and connection code. When things fail, the system evolves automatically.
```python
# Aden: Goal-driven generation
# Describe your goal, the coding agent generates the system
goal = """
Create a system that monitors customer feedback,
categorizes sentiment, and escalates negative reviews
to the support team with suggested responses.
"""
# The framework generates agents, connections, and tests
```
---
## Feature Comparison
### RAG & Document Processing
| Feature | LangChain | Aden |
|---------|-----------|------|
| Vector store integrations | Extensive (50+) | Growing |
| Document loaders | Comprehensive | Via tools |
| Retrieval strategies | Multiple built-in | Customizable |
| Query transformation | Built-in | Agent-defined |
**Verdict:** LangChain excels at RAG with its mature ecosystem of integrations.
### Agent Architecture
| Feature | LangChain | Aden |
|---------|-----------|------|
| Agent types | ReAct, OpenAI Functions, etc. | SDK-wrapped nodes |
| Multi-agent | Requires orchestration | Native multi-agent |
| Communication | Manual setup | Auto-generated connections |
| Graph visualization | Third-party | Built-in dashboard |
**Verdict:** Aden provides more native multi-agent support; LangChain offers more agent type options.
### Self-Improvement & Adaptation
| Feature | LangChain | Aden |
|---------|-----------|------|
| Failure handling | Manual try/catch | Automatic capture |
| Learning from failures | Not built-in | Automatic evolution |
| Agent graph updates | Manual code changes | Automated via coding agent |
| A/B testing agents | Manual | Roadmap |
**Verdict:** Aden's self-improvement is a unique differentiator not found in LangChain.
### Observability & Monitoring
| Feature | LangChain | Aden |
|---------|-----------|------|
| Tracing | LangSmith (paid), third-party | Built-in |
| Cost tracking | Third-party | Native |
| Real-time monitoring | LangSmith | WebSocket dashboard |
| Budget controls | Not built-in | Native with auto-degradation |
**Verdict:** Aden includes monitoring out of the box; LangChain requires LangSmith or third-party tools.
### Human-in-the-Loop
| Feature | LangChain | Aden |
|---------|-----------|------|
| Human approval | Manual implementation | Native intervention nodes |
| Escalation policies | Custom code | Configurable timeouts |
| Input collection | Custom | Built-in request system |
**Verdict:** Aden has more built-in HITL support; LangChain requires custom implementation.
---
## Code Comparison
### Building a Customer Support Agent
#### LangChain Approach
```python
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain_openai import ChatOpenAI
from langchain.tools import Tool
from langchain.memory import ConversationBufferMemory
# Define tools manually
tools = [
Tool(name="search_kb", func=search_knowledge_base, description="..."),
Tool(name="create_ticket", func=create_support_ticket, description="..."),
Tool(name="escalate", func=escalate_to_human, description="..."),
]
# Create agent with explicit configuration
llm = ChatOpenAI(model="gpt-4")
memory = ConversationBufferMemory()
agent = create_openai_tools_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, memory=memory)
# Run agent
response = executor.invoke({"input": customer_query})
# Error handling is manual
try:
response = executor.invoke({"input": query})
except Exception as e:
log_error(e)
# Manual recovery logic
```
#### Aden Approach
```python
# Define goal - system generates the agent graph
goal = """
Build a customer support agent that:
1. Searches our knowledge base for answers
2. Creates tickets for unresolved issues
3. Escalates to humans when confidence is low
4. Learns from resolved tickets to improve responses
When the agent fails to help a customer, capture the failure
and improve the response strategy.
"""
# Aden generates:
# - Agent graph with specialized nodes
# - Connection code between nodes
# - Test cases for validation
# - Monitoring hooks
# The SDK handles:
# - Automatic failure capture
# - Evolution based on failures
# - Cost tracking and budget enforcement
# - Human escalation at intervention points
```
---
## Production Considerations
### Deployment
| Aspect | LangChain | Aden |
|--------|-----------|------|
| Deployment model | Library in your app | Self-hosted platform |
| Infrastructure | You manage | Docker Compose included |
| Scaling | Your responsibility | Built-in considerations |
| Database requirements | Optional | TimescaleDB, MongoDB, PostgreSQL |
### Cost Management
| Aspect | LangChain | Aden |
|--------|-----------|------|
| Token tracking | Manual or LangSmith | Automatic |
| Budget limits | Not built-in | Native with enforcement |
| Model degradation | Manual | Automatic fallback |
| Cost alerts | Third-party | Built-in |
### Reliability
| Aspect | LangChain | Aden |
|--------|-----------|------|
| Retry logic | Manual | Built-in |
| Fallback chains | Manual | Automatic |
| Health monitoring | Third-party | Native endpoints |
| Self-healing | No | Yes |
---
## When to Choose LangChain
LangChain is the better choice when:
1. **Building RAG applications** - LangChain's retrieval ecosystem is unmatched
2. **Need extensive integrations** - 50+ vector stores, document loaders, etc.
3. **Want fine-grained control** - Every component is explicitly configured
4. **Already invested** - Large existing LangChain codebase
5. **Simple agent needs** - Single-purpose agents without complex orchestration
6. **Prefer library over platform** - Want to embed in existing infrastructure
---
## When to Choose Aden
Aden is the better choice when:
1. **Agents need to evolve** - Systems should improve from failures automatically
2. **Goal-driven development** - Prefer describing outcomes over coding workflows
3. **Multi-agent systems** - Complex agent graphs with dynamic connections
4. **Production monitoring is critical** - Need built-in observability
5. **Cost control matters** - Require budget enforcement and auto-degradation
6. **Human oversight needed** - Native HITL support with escalation
7. **Rapid iteration** - Want to change agent behavior without code rewrites
---
## Migration Considerations
### LangChain to Aden
- LangChain tools can often be adapted as Aden node tools
- Existing prompts can inform goal definitions
- Consider gradual migration, running systems in parallel
### Aden to LangChain
- Agent graphs can be manually reimplemented as chains
- Monitoring would need replacement (LangSmith or alternatives)
- Self-improvement logic would need custom implementation
---
## Conclusion
**LangChain** is a mature, flexible component library ideal for RAG applications and developers who want explicit control over every aspect of their agent.
**Aden** offers a paradigm shift with goal-driven, self-improving agents, better suited for production systems that need to adapt and evolve over time with built-in monitoring.
The choice depends on:
- **Control vs. Automation**: LangChain for control, Aden for automation
- **Static vs. Evolving**: LangChain for stable workflows, Aden for adaptive systems
- **Library vs. Platform**: LangChain as a library, Aden as a platform
Many teams use both: LangChain for specific RAG components, Aden for orchestration and evolution.
---
*Last updated: January 2025*
@@ -0,0 +1,465 @@
# AI Agent Cost Management: A Complete Guide
*Control spending, optimize efficiency, and prevent budget disasters*
---
AI agents can burn through budgets faster than you expect. A single runaway agent loop can cost thousands of dollars in minutes. This guide covers strategies, tools, and best practices for managing AI agent costs.
---
## The Cost Problem
### Why AI Agents Are Expensive
| Factor | Impact |
|--------|--------|
| LLM API calls | $0.01 - $0.10+ per call |
| Token usage | Input + output tokens |
| Agent loops | Multiple calls per task |
| Retries | Failed calls still cost money |
| Verbose prompts | More tokens = more cost |
| Tool usage | Additional API calls |
### Real-World Example
```
Simple customer support agent:
- 5 LLM calls per interaction
- 2000 tokens average per call
- GPT-4: ~$0.06 per call
- 100 interactions/day = $30/day
Complex research agent:
- 50+ LLM calls per task
- 10000 tokens average per call
- GPT-4: ~$0.30 per call
- 10 tasks/day = $150/day
Runaway agent loop:
- 1000 calls in 10 minutes
- $300+ before detection
```
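These figures fall out of a simple product of call count, tokens, and rate. A back-of-envelope estimator, using an illustrative blended GPT-4 rate of $0.03 per 1K tokens to match the examples above, looks like this:

```python
def daily_cost(calls_per_task: int, tokens_per_call: int,
               tasks_per_day: int, rate_per_1k: float = 0.03) -> float:
    """Estimated daily spend in USD for one agent workload."""
    return calls_per_task * tokens_per_call / 1000 * rate_per_1k * tasks_per_day

# Support agent example: 5 calls x 2000 tokens x 100 interactions/day
print(round(daily_cost(5, 2000, 100), 2))   # 30.0
# Research agent example: 50 calls x 10000 tokens x 10 tasks/day
print(round(daily_cost(50, 10000, 10), 2))  # 150.0
```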
---
## Cost Control Strategies
### Strategy 1: Budget Limits
Set hard limits on spending per:
- Time period (daily, weekly, monthly)
- Agent
- Task
- Team
- User
```python
budget_config = {
"daily_limit": 100.00,
"per_task_limit": 5.00,
"per_agent_limit": 50.00,
"alert_at_percentage": 80,
"action_on_limit": "block" # or "degrade", "alert"
}
```
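A minimal enforcement sketch, assuming the `budget_config` shape above; the spend counters would come from your cost tracker, and the returned action string is whatever your dispatcher understands:

```python
def check_budget(spend_today: float, task_spend: float, config: dict) -> str:
    """Decide what to do with the next request under the current budget."""
    if spend_today >= config["daily_limit"] or task_spend >= config["per_task_limit"]:
        return config["action_on_limit"]  # "block", "degrade", or "alert"
    if spend_today >= config["daily_limit"] * config["alert_at_percentage"] / 100:
        return "alert"
    return "allow"
```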
### Strategy 2: Model Degradation
Automatically switch to cheaper models as budget is consumed:
```
Budget usage:
0-70% → Use GPT-4 (best quality)
70-90% → Use GPT-3.5-turbo (good quality)
90-100% → Use GPT-3.5-turbo with shorter prompts
100%+ → Block or queue requests
```
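Expressed as code, with model names and thresholds mirroring the tiers above (the later "Dynamic Model Selection" section shows a task-aware variant):

```python
def degraded_model(budget_used: float) -> tuple[str, bool]:
    """Return (model, use_short_prompts) for the current budget fraction."""
    if budget_used < 0.70:
        return "gpt-4", False
    if budget_used < 0.90:
        return "gpt-3.5-turbo", False
    if budget_used < 1.00:
        return "gpt-3.5-turbo", True   # also trim prompts near the limit
    raise RuntimeError("budget exhausted: block or queue the request")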
### Strategy 3: Request Throttling
Limit request rate to control burn rate:
```python
throttle_config = {
"requests_per_minute": 10,
"requests_per_hour": 200,
"backoff_multiplier": 2,
"max_backoff_seconds": 60
}
```
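A sliding-window throttle covering the `requests_per_minute` setting might look like the sketch below; the backoff settings are assumed to be applied by the caller:

```python
import time

class Throttle:
    """Allow at most `per_minute` requests in any rolling 60-second window."""

    def __init__(self, per_minute: int):
        self.per_minute = per_minute
        self.stamps: list[float] = []

    def acquire(self) -> bool:
        now = time.time()
        # Keep only timestamps from the last 60 seconds
        self.stamps = [t for t in self.stamps if now - t < 60]
        if len(self.stamps) >= self.per_minute:
            return False  # caller should back off and retry
        self.stamps.append(now)
        return True
```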
### Strategy 4: Token Optimization
Reduce tokens per request:
| Technique | Savings |
|-----------|---------|
| Shorter system prompts | 20-40% |
| Compressed context | 30-50% |
| Response length limits | 20-30% |
| Remove unnecessary examples | 10-20% |
### Strategy 5: Caching
Cache common requests and responses:
```python
import hashlib

# Before: every request hits the API
result = llm.complete(prompt)  # Costs money

# After: cache frequent patterns, keyed by a hash of the prompt
prompt_hash = hashlib.sha256(prompt.encode()).hexdigest()
cached = cache.get(prompt_hash)
if cached is not None:
    result = cached  # Free
else:
    result = llm.complete(prompt)
    cache.set(prompt_hash, result)
```
---
## Framework Comparison: Cost Features
| Framework | Budget Limits | Degradation | Tracking | Alerts |
|-----------|--------------|-------------|----------|--------|
| LangChain | Third-party | Manual | LangSmith | Manual |
| CrewAI | Not built-in | Manual | Basic | Manual |
| AutoGen | Not built-in | Manual | Manual | Manual |
| **Aden** | **Native** | **Automatic** | **Built-in** | **Native** |
### Aden's Cost Controls
Aden includes comprehensive cost management:
```python
# Budget configuration in Aden
budget_rules = {
"budget_id": "team_engineering",
"limits": {
"daily": 500.00,
"monthly": 10000.00,
"per_agent": 100.00
},
"degradation": {
"80_percent": "switch_to_gpt35",
"95_percent": "throttle",
"100_percent": "block"
},
"alerts": {
"channels": ["slack", "email"],
"thresholds": [50, 80, 95, 100]
}
}
```
---
## Implementing Cost Tracking
### Basic Tracking
```python
class CostTracker:
def __init__(self):
self.total_cost = 0
self.cost_by_agent = {}
self.cost_by_model = {}
def track(self, request, response, model):
input_tokens = count_tokens(request)
output_tokens = count_tokens(response)
cost = self.calculate_cost(model, input_tokens, output_tokens)
self.total_cost += cost
self.cost_by_agent[request.agent_id] = \
self.cost_by_agent.get(request.agent_id, 0) + cost
self.cost_by_model[model] = \
self.cost_by_model.get(model, 0) + cost
return cost
def calculate_cost(self, model, input_tokens, output_tokens):
rates = {
"gpt-4": {"input": 0.03, "output": 0.06}, # per 1K tokens
"gpt-3.5-turbo": {"input": 0.0005, "output": 0.0015},
"claude-3-opus": {"input": 0.015, "output": 0.075},
"claude-3-sonnet": {"input": 0.003, "output": 0.015},
}
rate = rates.get(model, rates["gpt-3.5-turbo"])
return (input_tokens * rate["input"] + output_tokens * rate["output"]) / 1000
```
### Advanced Tracking with Attribution
```python
cost_record = {
"timestamp": "2025-01-15T10:30:00Z",
"request_id": "req_123",
"agent_id": "support_agent_1",
"task_id": "task_456",
"team_id": "customer_success",
"model": "gpt-4",
"input_tokens": 1500,
"output_tokens": 500,
"cost_usd": 0.075,
"cached": False,
"degraded": False
}
```
---
## Alert Configuration
### Threshold Alerts
```yaml
alerts:
- name: "Budget Warning"
condition: "daily_spend > daily_budget * 0.8"
channels: ["slack"]
message: "80% of daily budget consumed"
- name: "Budget Critical"
condition: "daily_spend > daily_budget * 0.95"
channels: ["slack", "pagerduty"]
message: "95% of daily budget - taking action"
action: "degrade_models"
- name: "Runaway Agent"
condition: "requests_per_minute > 100"
channels: ["pagerduty"]
message: "Possible runaway agent detected"
action: "pause_agent"
```
### Anomaly Detection
```python
def detect_anomalies(recent_costs, historical_average):
"""Alert if costs significantly exceed historical patterns"""
threshold = historical_average * 3 # 3x normal
if recent_costs > threshold:
alert(
level="critical",
message=f"Cost anomaly: ${recent_costs:.2f} vs avg ${historical_average:.2f}",
action="investigate"
)
```
---
## Model Selection Strategies
### Cost vs Quality Matrix
| Model | Cost (per 1K tokens) | Quality | Best For |
|-------|---------------------|---------|----------|
| GPT-4 | $0.03-0.06 | Highest | Complex reasoning |
| GPT-4-turbo | $0.01-0.03 | High | Balance cost/quality |
| GPT-3.5-turbo | $0.0005-0.0015 | Good | High volume, simple |
| Claude 3 Opus | $0.015-0.075 | Highest | Long context |
| Claude 3 Sonnet | $0.003-0.015 | High | Good balance |
| Claude 3 Haiku | $0.00025-0.00125 | Good | Fast, cheap |
### Dynamic Model Selection
```python
def select_model(task_complexity, budget_remaining, daily_limit):
budget_percentage = (daily_limit - budget_remaining) / daily_limit
if task_complexity == "simple":
return "gpt-3.5-turbo" # Always cheap for simple
elif budget_percentage < 0.5:
return "gpt-4" # Best model when budget healthy
elif budget_percentage < 0.8:
return "gpt-4-turbo" # Balanced
else:
return "gpt-3.5-turbo" # Preserve budget
```
---
## Optimization Techniques
### 1. Prompt Engineering for Cost
```python
# Expensive: Long system prompt
system_prompt = """
You are a helpful assistant that specializes in customer support.
You should always be polite, professional, and helpful.
When answering questions, provide detailed explanations.
Always consider the customer's perspective.
Remember to be empathetic and understanding.
[... 500 more tokens ...]
"""
# Cheaper: Concise system prompt
system_prompt = """
Customer support agent. Be helpful, polite, concise.
Resolve issues efficiently.
"""
# Savings: ~400 tokens × 1000 requests = $12/day
```
### 2. Context Window Management
```python
def manage_context(messages, max_tokens=4000):
"""Keep context within budget by summarizing old messages"""
current_tokens = count_tokens(messages)
if current_tokens > max_tokens:
# Summarize older messages
        old_messages = messages[:-5]  # Summarize all but the 5 most recent
summary = summarize(old_messages)
return [{"role": "system", "content": f"Previous context: {summary}"}] + messages[-5:]
return messages
```
### 3. Batch Processing
```python
# Expensive: Individual requests
for item in items:
result = llm.complete(f"Process: {item}")
# Cheaper: Batch when possible
batch_prompt = "Process these items:\n" + "\n".join(items)
results = llm.complete(batch_prompt)
```
### 4. Response Length Control
```python
# Add to system prompt
system_prompt += "\nKeep responses under 200 words."
# Or use max_tokens parameter
response = llm.complete(
prompt,
max_tokens=300 # Hard limit
)
```
---
## Runaway Agent Prevention
### Detection Mechanisms
```python
class RunawayDetector:
def __init__(self):
self.request_times = []
self.max_requests_per_minute = 50
self.max_cost_per_minute = 10.00
def check(self, cost):
now = time.time()
self.request_times.append((now, cost))
# Clean old entries
self.request_times = [
(t, c) for t, c in self.request_times
if now - t < 60
]
# Check thresholds
requests_per_minute = len(self.request_times)
cost_per_minute = sum(c for _, c in self.request_times)
if requests_per_minute > self.max_requests_per_minute:
return "RUNAWAY_REQUESTS"
if cost_per_minute > self.max_cost_per_minute:
return "RUNAWAY_COST"
return "OK"
```
### Circuit Breakers
```python
class CostCircuitBreaker:
def __init__(self, threshold, window_seconds=60):
self.threshold = threshold
self.window_seconds = window_seconds
self.costs = []
self.is_open = False
def record_cost(self, cost):
now = time.time()
self.costs.append((now, cost))
self._cleanup()
total_cost = sum(c for _, c in self.costs)
if total_cost > self.threshold:
self.is_open = True
alert("Circuit breaker opened - costs exceeded threshold")
def allow_request(self):
if self.is_open:
# Check if we should reset
            if not self.costs or time.time() - self.costs[-1][0] > self.window_seconds:
                self.is_open = False
                self.costs = []
                return True
            return False
        return True

    def _cleanup(self):
        # Drop cost entries that have aged out of the rolling window
        now = time.time()
        self.costs = [(t, c) for t, c in self.costs if now - t < self.window_seconds]
```
---
## Dashboard Metrics
### Essential Cost Metrics
| Metric | Description | Alert Threshold |
|--------|-------------|-----------------|
| Hourly spend | Cost in last hour | > 2x average |
| Daily spend | Cost today | > 80% budget |
| Cost per task | Average task cost | > expected |
| Token efficiency | Output/input ratio | < 0.3 |
| Cache hit rate | Cached vs new requests | < 50% |
| Model distribution | % by model | Unexpected shifts |
### Aden Dashboard
Aden provides built-in cost visualization:
- Real-time cost tracking
- Budget gauges with alerts
- Cost by agent/model breakdown
- Historical trends
- Anomaly detection
---
## Best Practices Summary
### Do's
1. ✅ Set budget limits before deployment
2. ✅ Implement automatic degradation
3. ✅ Monitor costs in real-time
4. ✅ Alert on anomalies
5. ✅ Optimize prompts for token efficiency
6. ✅ Cache common requests
7. ✅ Use appropriate models for task complexity
8. ✅ Review costs regularly
### Don'ts
1. ❌ Deploy without budget limits
2. ❌ Use GPT-4 for everything
3. ❌ Ignore cost metrics
4. ❌ Allow unlimited retries
5. ❌ Store full context forever
6. ❌ Skip testing cost scenarios
7. ❌ Forget about tool API costs
---
## Conclusion
AI agent cost management requires:
1. **Prevention**: Budget limits, degradation policies
2. **Detection**: Real-time tracking, anomaly alerts
3. **Optimization**: Smart model selection, token efficiency
4. **Protection**: Circuit breakers, runaway detection
Frameworks like Aden with built-in cost controls make this easier, but the principles apply to any agent system. Start with conservative limits and adjust based on real usage patterns.
---
*Last updated: January 2025*
@@ -0,0 +1,423 @@
# AI Agent Observability & Monitoring: The Complete Guide
*How to know what your AI agents are actually doing*
---
AI agents are autonomous systems that make decisions, call tools, and interact with the world. Without proper observability, they become black boxes. This guide covers everything you need to monitor AI agents effectively.
---
## Why Agent Observability Is Different
Traditional application monitoring tracks requests and responses. Agent monitoring must track:
| Traditional Apps | AI Agents |
|------------------|-----------|
| Request/Response | Multi-step reasoning chains |
| Deterministic behavior | Probabilistic decisions |
| Fixed execution paths | Dynamic tool selection |
| Predictable costs | Variable LLM spending |
| Clear errors | Subtle quality degradation |
---
## The Four Pillars of Agent Observability
```
┌─────────────────────────────────────────────────────────────┐
│ Agent Observability Stack │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────┐ │
│ │ Metrics │ │ Logs │ │ Traces │ │
│ │ (Numbers) │ │ (Events) │ │ (Execution Flow) │ │
│ └─────────────┘ └─────────────┘ └─────────────────────┘ │
│ │ │
│ ▼ │
│ ┌───────────────────────┐ │
│ │ Quality Evals │ │
│ │ (Output Assessment) │ │
│ └───────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
```
### 1. Metrics
Quantitative measurements over time:
- Requests per minute
- Success/failure rates
- Latency distributions
- Token usage
- Cost per request
- Tool call frequencies
### 2. Logs
Discrete events with context:
- Agent decisions
- Tool inputs/outputs
- Error messages
- User interactions
- System events
### 3. Traces
End-to-end execution flows:
- Full reasoning chains
- Token-by-token generation
- Tool call sequences
- Parent-child relationships
- Cross-agent communication
### 4. Quality Evals
Output quality assessment:
- Accuracy scoring
- Hallucination detection
- Task completion rates
- User satisfaction
- Regression detection
---
## Key Metrics to Track
### Performance Metrics
| Metric | Description | Alert Threshold |
|--------|-------------|-----------------|
| `agent.latency.p50` | Median response time | > 5s |
| `agent.latency.p99` | 99th percentile latency | > 30s |
| `agent.throughput` | Requests/second | < baseline * 0.5 |
| `agent.queue.depth` | Pending requests | > 100 |
| `agent.timeout.rate` | Timeout percentage | > 5% |
### Reliability Metrics
| Metric | Description | Alert Threshold |
|--------|-------------|-----------------|
| `agent.success.rate` | Successful completions | < 95% |
| `agent.error.rate` | Error percentage | > 5% |
| `agent.retry.rate` | Retries needed | > 10% |
| `agent.fallback.rate` | Fallback usage | > 20% |
| `agent.circuit.open` | Circuit breaker status | true |
### Cost Metrics
| Metric | Description | Alert Threshold |
|--------|-------------|-----------------|
| `agent.cost.total` | Total spend | > budget * 0.9 |
| `agent.cost.per.request` | Cost per request | > $0.50 |
| `agent.tokens.input` | Input tokens used | anomaly detection |
| `agent.tokens.output` | Output tokens used | anomaly detection |
| `agent.model.usage` | Calls by model | unusual patterns |
### Quality Metrics
| Metric | Description | Alert Threshold |
|--------|-------------|-----------------|
| `agent.quality.score` | Output quality (0-1) | < 0.7 |
| `agent.hallucination.rate` | Detected hallucinations | > 5% |
| `agent.task.completion` | Tasks fully completed | < 80% |
| `agent.user.satisfaction` | User ratings | < 4.0/5.0 |
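Unlike the operational metrics above, quality metrics need an explicit evaluation step. A minimal LLM-as-judge sketch — `judge_llm`, `emit_metric`, and the JSON rubric are placeholder assumptions, not a standard API:
```python
import json

def score_output(task, output):
    """Ask a judge model to rate an agent output from 0.0 to 1.0."""
    rubric = ('Rate the response for accuracy, completeness, and relevance. '
              'Reply with JSON: {"score": <0.0-1.0>, "reason": "<one sentence>"}')
    verdict = judge_llm.complete(f"Task: {task}\nResponse: {output}\n{rubric}")
    result = json.loads(verdict)
    emit_metric("agent.quality.score", result["score"])  # placeholder metrics hook
    return result
```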
---
## Logging Best Practices
### Structured Logging Format
```json
{
  "timestamp": "2025-01-15T10:30:00Z",
  "level": "info",
  "event": "agent_tool_call",
  "agent_id": "agent-123",
  "session_id": "session-456",
  "trace_id": "trace-789",
  "tool": "search_web",
  "input": {"query": "latest AI news"},
  "output_tokens": 150,
  "latency_ms": 1200,
  "success": true
}
```
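Producing records in this shape needs no special tooling. A minimal sketch using only the Python standard library:
```python
import json
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("agent")

def log_event(event, **fields):
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "level": "info",
        "event": event,
        **fields,
    }
    log.info(json.dumps(record))

log_event("agent_tool_call",
          agent_id="agent-123", session_id="session-456", trace_id="trace-789",
          tool="search_web", latency_ms=1200, success=True)
```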
### What to Log
**Always Log:**
- Agent start/stop
- Tool calls (name, duration, success)
- LLM calls (model, tokens, latency)
- Errors and exceptions
- Human interventions
- Budget events
**Log Carefully (PII concerns):**
- User inputs (may need redaction)
- Agent outputs (may contain sensitive data)
- Full prompts (can be large)
**Never Log:**
- API keys
- User credentials
- Full conversation transcripts in production
- Raw model weights
### Log Levels for Agents
| Level | Use Case |
|-------|----------|
| DEBUG | Full prompts, token-level details |
| INFO | Tool calls, completions, metrics |
| WARN | Retries, degradation, budget warnings |
| ERROR | Failures, exceptions, circuit breaks |
| FATAL | System crashes, unrecoverable errors |
---
## Distributed Tracing for Agents
### Why Tracing Matters
Agents involve multiple steps, LLM calls, and tool invocations. Tracing connects them all.
```
Trace: "Process customer refund"
├── Span: Agent Initialize (5ms)
├── Span: LLM Planning Call (800ms)
│ └── Attribute: model=gpt-4, tokens=500
├── Span: Tool: fetch_order (200ms)
│ └── Attribute: order_id=12345
├── Span: Tool: check_policy (50ms)
├── Span: LLM Decision Call (600ms)
│ └── Attribute: decision=approve
├── Span: Tool: process_refund (300ms)
└── Span: Agent Complete (10ms)
└── Attribute: success=true, cost=$0.08
```
### Key Trace Attributes
- `agent.id`: Unique agent identifier
- `agent.type`: Agent type/role
- `session.id`: User session
- `parent.agent`: For multi-agent systems
- `llm.model`: Model used
- `llm.tokens`: Token counts
- `tool.name`: Tool being called
- `tool.success`: Tool outcome
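With OpenTelemetry, these map directly onto span attributes. A short sketch (the attribute names follow the list above rather than an official semantic convention):
```python
from opentelemetry import trace

tracer = trace.get_tracer("agent")

with tracer.start_as_current_span("process_customer_refund") as span:
    span.set_attribute("agent.id", "agent-123")
    span.set_attribute("agent.type", "refund-processor")
    span.set_attribute("session.id", "session-456")
    with tracer.start_as_current_span("tool.fetch_order") as tool_span:
        tool_span.set_attribute("tool.name", "fetch_order")
        tool_span.set_attribute("tool.success", True)
```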
---
## Dashboard Design
### Dashboard 1: Operations Overview
```
┌─────────────────────────────────────────────────────────────┐
│ Agent Operations │
├─────────────────┬─────────────────┬─────────────────────────┤
│ Active Agents │ Requests/Min │ Error Rate │
│ 42 │ 1,234 │ 0.3% ✓ │
├─────────────────┴─────────────────┴─────────────────────────┤
│ │
│ Request Latency (p50/p99) Success Rate (24h) │
│ ████████████████░░░░ ██████████████████████ │
│ 1.2s / 4.5s 99.2% │
│ │
├─────────────────────────────────────────────────────────────┤
│ Top Errors Active Alerts │
│ • Rate limit exceeded (12) ⚠️ High latency p99 │
│ • Tool timeout (5) ⚠️ Budget at 85% │
│ • Validation failed (3) │
└─────────────────────────────────────────────────────────────┘
```
### Dashboard 2: Cost & Usage
```
┌─────────────────────────────────────────────────────────────┐
│ Cost & Usage │
├─────────────────┬─────────────────┬─────────────────────────┤
│ Today's Spend │ Budget Used │ Projected Monthly │
│ $127.50 │ 67% │ $3,825 │
├─────────────────┴─────────────────┴─────────────────────────┤
│ │
│ Cost by Model │ Cost by Agent │
│ ■ GPT-4: $89 │ ■ Support: $45 │
│ ■ Claude: $28 │ ■ Research: $52 │
│ ■ GPT-3.5: $10 │ ■ Writer: $30 │
│ │
├─────────────────────────────────────────────────────────────┤
│ Token Usage Trend (7 days) │
│ ▁▂▃▄▅▆▇█▇▆▅▄▃▂▁▂▃▄▅▆ │
└─────────────────────────────────────────────────────────────┘
```
### Dashboard 3: Quality & Reliability
```
┌─────────────────────────────────────────────────────────────┐
│ Quality & Reliability │
├─────────────────┬─────────────────┬─────────────────────────┤
│ Quality Score │ Task Complete │ User Satisfaction │
│ 0.92/1.0 │ 94.5% │ 4.6/5.0 │
├─────────────────┴─────────────────┴─────────────────────────┤
│ │
│ Quality Trend (30 days) │ Failure Analysis │
│ ████████████████████████ │ ■ LLM errors: 2% │
│ ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔ │ ■ Tool errors: 1% │
│ Target: 0.90 │ ■ Timeouts: 0.5% │
│ │ ■ Logic errors: 0.5% │
├─────────────────────────────────────────────────────────────┤
│ Recent Quality Issues │
│ • Agent-42 hallucination detected (15 min ago) │
│ • Agent-17 task incomplete (1 hour ago) │
└─────────────────────────────────────────────────────────────┘
```
---
## Alerting Strategy
### Critical Alerts (Page immediately)
- Error rate > 10% for 5 minutes
- All agents offline
- Budget exceeded
- Security anomaly detected
### Warning Alerts (Notify during business hours)
- Error rate > 5% for 15 minutes
- Latency p99 > 30s
- Budget > 90% of limit
- Quality score drops > 10%
### Informational (Daily digest)
- Token usage trends
- Cost projections
- Quality score changes
- New error types detected
### Alert Fatigue Prevention
- Prefer anomaly detection over fixed thresholds (see the sketch below)
- Group related alerts
- Implement progressive escalation
- Review and tune alert thresholds monthly
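A rolling z-score is often enough to replace a brittle fixed threshold. A minimal sketch:
```python
import statistics
from collections import deque

class AnomalyAlert:
    """Flags a metric value that deviates sharply from its recent history."""

    def __init__(self, window=60, z_threshold=3.0):
        self.history = deque(maxlen=window)
        self.z_threshold = z_threshold

    def observe(self, value):
        anomalous = False
        if len(self.history) >= 10:  # wait for a baseline before alerting
            mean = statistics.fmean(self.history)
            stdev = statistics.pstdev(self.history) or 1e-9
            anomalous = abs(value - mean) / stdev > self.z_threshold
        self.history.append(value)
        return anomalous
```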
---
## Tool Comparison
| Tool | Best For | Agent-Specific Features |
|------|----------|------------------------|
| Datadog | Enterprise, full-stack | APM for LLM calls |
| Grafana | Self-hosted, flexibility | Custom dashboards |
| LangSmith | LangChain users | Prompt tracing |
| Weights & Biases | ML teams | Experiment tracking |
| Helicone | LLM-focused | Token analytics |
| Aden | Production agents | Built-in observability |
---
## How Aden Handles Observability
Aden provides built-in observability without additional setup:
### Automatic Collection
```
┌─────────────────────────────────────────────────────────────┐
│ Aden Observability │
│ │
│ ┌───────────────┐ ┌───────────────────────────────┐ │
│ │ SDK-Wrapped │──────▶│ Event Stream │ │
│ │ Nodes │ │ • Metrics • Logs • Traces │ │
│ └───────────────┘ └───────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ Honeycomb Dashboard │ │
│ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ │
│ │ │ Metrics │ │ Costs │ │ Quality │ │ Alerts │ │ │
│ │ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │ │
│ └───────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
```
### What Aden Tracks Automatically
- Every LLM call (model, tokens, latency, cost)
- Every tool invocation (name, duration, success)
- Agent lifecycle events (start, stop, error)
- Budget consumption in real-time
- Quality metrics via failure tracking
- HITL intervention points
### Built-in Dashboards
- Real-time agent status
- Cost breakdown by agent/model
- Quality trends over time
- Failure analysis
- Self-improvement metrics
### No Configuration Required
Unlike external tools, Aden's observability requires no setup:
```python
# Just wrap your node with the SDK
from aden import sdk

@sdk.node
async def my_agent(input):
    # All metrics automatically collected
    return await process(input)
```
---
## Implementation Checklist
### Phase 1: Basic (Week 1)
- [ ] Structured logging in place
- [ ] Basic metrics: latency, errors, throughput
- [ ] Cost tracking per request
- [ ] Simple dashboard with key metrics
### Phase 2: Comprehensive (Week 2-3)
- [ ] Distributed tracing implemented
- [ ] Quality evaluation pipeline
- [ ] Alerting rules configured
- [ ] Full dashboards built
### Phase 3: Advanced (Week 4+)
- [ ] Anomaly detection
- [ ] Automated regression detection
- [ ] Cost optimization insights
- [ ] Self-healing triggers
---
## Common Pitfalls
### 1. Logging Too Much
**Problem:** Full prompts in production logs
**Solution:** Log hashes or summaries, full content only for debugging
### 2. Alert Fatigue
**Problem:** Too many non-actionable alerts
**Solution:** Use anomaly detection, tune thresholds, require action plans
### 3. Missing Context
**Problem:** Can't correlate events across agents
**Solution:** Propagate trace IDs, use correlation IDs
### 4. Ignoring Quality
**Problem:** Only track operational metrics
**Solution:** Implement quality scoring, track user feedback
### 5. No Baselines
**Problem:** Don't know what "normal" looks like
**Solution:** Establish baselines before alerting, use relative thresholds
---
## Conclusion
Effective agent observability requires:
1. **Metrics**: Know your numbers (latency, errors, cost)
2. **Logs**: Capture events with context
3. **Traces**: Follow execution flows end-to-end
4. **Quality**: Assess output, not just uptime
Modern agent platforms like Aden provide this built-in. For other frameworks, plan to invest significant effort in observability infrastructure.
The goal: Never wonder what your agents are doing—always know.
---
*Last updated: January 2025*
@@ -0,0 +1,551 @@
# Building Production AI Agents: From Prototype to Deployment
*A practical guide to taking AI agents from demo to production*
---
Getting an AI agent working in a demo is easy. Getting it to work reliably in production is hard. This guide covers the critical differences and how to bridge the gap.
---
## Demo vs Production
| Aspect | Demo | Production |
|--------|------|------------|
| Traffic | You testing it | Hundreds/thousands of users |
| Uptime | "It worked when I tried" | 99.9% required |
| Errors | "Let me restart it" | Must handle gracefully |
| Cost | "It's just a demo" | Every dollar matters |
| Security | None | Critical |
| Monitoring | Print statements | Full observability |
| Recovery | Manual restart | Automatic healing |
---
## The Production Readiness Checklist
### 1. Reliability
- [ ] Retry logic with exponential backoff
- [ ] Circuit breakers for failing services
- [ ] Graceful degradation (fallbacks)
- [ ] Health check endpoints
- [ ] Automatic recovery from crashes
### 2. Scalability
- [ ] Horizontal scaling capability
- [ ] Stateless design (or managed state)
- [ ] Queue-based processing for bursts
- [ ] Database connection pooling
- [ ] Caching layer
### 3. Observability
- [ ] Structured logging
- [ ] Metrics collection
- [ ] Distributed tracing
- [ ] Alerting rules
- [ ] Dashboard for monitoring
### 4. Security
- [ ] API authentication
- [ ] Input validation
- [ ] Output sanitization
- [ ] Secrets management
- [ ] Audit logging
### 5. Cost Control
- [ ] Budget limits
- [ ] Usage tracking
- [ ] Model degradation policies
- [ ] Anomaly detection
### 6. Human Oversight
- [ ] HITL checkpoints
- [ ] Escalation policies
- [ ] Audit trails
- [ ] Manual override capability
---
## Architecture Patterns
### Pattern 1: Simple Agent Service
```
┌──────────────────────────────────────────┐
│ Agent Service │
│ ┌────────────────────────────────────┐ │
│ │ Request Handler │ │
│  │  ┌────────┐ ┌─────────┐ ┌────────┐ │  │
│  │  │Validate│→│  Agent  │→│ Format │ │  │
│  │  │ Input  │ │ Execute │ │ Output │ │  │
│  │  └────────┘ └─────────┘ └────────┘ │  │
│ └────────────────────────────────────┘ │
│ │ │
│ ┌─────────────────────────────────────┐│
│ │ Dependencies ││
│ │ • LLM API • Tools • Database ││
│ └─────────────────────────────────────┘│
└──────────────────────────────────────────┘
```
**Best for:** Simple use cases, low volume
### Pattern 2: Queue-Based Processing
```
┌───────┐ ┌───────┐ ┌───────────────┐
│Request│───▶│ Queue │───▶│ Agent Workers │
│ API │ │ │ │ (N copies) │
└───────┘ └───────┘ └───────────────┘
                               ┌─────────┐
                               │ Results │
                               │   DB    │
                               └─────────┘
```
**Best for:** High volume, async processing
### Pattern 3: Event-Driven Agents
```
┌─────────────┐
│ Event Source│─────┐
└─────────────┘ │
┌─────────────┐ ┌─────────┐ ┌─────────────┐
│ Event Source│─▶│ Event │─▶│ Agent │
└─────────────┘ │ Bus │ │ Processors │
└─────────┘ └─────────────┘
┌─────────────┐ │
│ Event Source│─────┘
└─────────────┘
```
**Best for:** Reactive systems, integrations
### Pattern 4: Full Platform (Aden)
```
┌────────────────────────────────────────────────────────┐
│ Aden Platform │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌─────────────┐ │
│ │ Coding Agent │ │Worker Agents │ │ Dashboard │ │
│ │ (Generate) │ │ (Execute) │ │ (Monitor) │ │
│ └──────────────┘ └──────────────┘ └─────────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌────────────────────────────────────────────────┐ │
│ │ Control Plane │ │
│ │ • Budget • Policies • Metrics • HITL │ │
│ └────────────────────────────────────────────────┘ │
│ │ │
│ ┌────────────────────────────────────────────────┐ │
│ │ Storage Layer │ │
│ │ • Events • Policies • Config │ │
│ └────────────────────────────────────────────────┘ │
└────────────────────────────────────────────────────────┘
```
**Best for:** Complex systems, self-improving agents
---
## Implementing Reliability
### Retry Logic
```python
import asyncio
import logging
from functools import wraps

logger = logging.getLogger(__name__)

def retry_with_backoff(max_retries=3, base_delay=1, max_delay=60):
    def decorator(func):
        @wraps(func)
        async def wrapper(*args, **kwargs):
            retries = 0
            while True:
                try:
                    return await func(*args, **kwargs)
                except (RateLimitError, TimeoutError) as e:  # RateLimitError comes from your LLM client
                    retries += 1
                    if retries > max_retries:
                        raise
                    # Exponential backoff, capped at max_delay
                    delay = min(base_delay * (2 ** retries), max_delay)
                    logger.warning(f"Retry {retries}/{max_retries} after {delay}s: {e}")
                    await asyncio.sleep(delay)
        return wrapper
    return decorator

@retry_with_backoff(max_retries=3)
async def call_llm(prompt):
    return await llm_client.complete(prompt)
```
### Circuit Breaker
```python
import time

class CircuitOpenError(Exception):
    """Raised when a request is refused because the circuit is open."""

class CircuitBreaker:
    def __init__(self, failure_threshold=5, recovery_time=60):
        self.failure_count = 0
        self.failure_threshold = failure_threshold
        self.recovery_time = recovery_time
        self.last_failure_time = None
        self.state = "closed"  # closed, open, half-open

    async def call(self, func, *args, **kwargs):
        if self.state == "open":
            if time.time() - self.last_failure_time > self.recovery_time:
                self.state = "half-open"  # allow one probe request through
            else:
                raise CircuitOpenError("Circuit breaker is open")
        try:
            result = await func(*args, **kwargs)
            if self.state == "half-open":
                self.state = "closed"  # probe succeeded; resume normal operation
            self.failure_count = 0  # any success resets the failure streak
            return result
        except Exception:
            self.failure_count += 1
            self.last_failure_time = time.time()
            if self.failure_count >= self.failure_threshold:
                self.state = "open"
            raise
```
### Graceful Degradation
```python
async def process_with_fallback(task):
    try:
        # Try primary approach
        return await primary_agent.execute(task)
    except AgentError:
        try:
            # Fall back to simpler approach
            return await fallback_agent.execute(task)
        except AgentError:
            # Last resort: static response
            return create_static_response(task)
```
---
## Implementing Observability
### Structured Logging
```python
import time

import structlog

logger = structlog.get_logger()

async def execute_agent(task):
    start = time.monotonic()
    logger.info("agent_execution_started",
                task_id=task.id,
                agent_id=agent.id,
                input_tokens=count_tokens(task.input))
    try:
        result = await agent.run(task)
        logger.info("agent_execution_completed",
                    task_id=task.id,
                    duration_ms=int((time.monotonic() - start) * 1000),
                    output_tokens=count_tokens(result),
                    cost_usd=calculate_cost(result))
        return result
    except Exception as e:
        logger.error("agent_execution_failed",
                     task_id=task.id,
                     error=str(e),
                     error_type=type(e).__name__)
        raise
```
### Metrics Collection
```python
import time

from prometheus_client import Counter, Histogram, Gauge

# Counters
agent_requests_total = Counter(
    'agent_requests_total',
    'Total agent requests',
    ['agent_id', 'status']
)

# Histograms
agent_duration_seconds = Histogram(
    'agent_duration_seconds',
    'Agent execution duration',
    ['agent_id']
)

# Gauges
agent_active_tasks = Gauge(
    'agent_active_tasks',
    'Currently running agent tasks',
    ['agent_id']
)

async def execute_with_metrics(agent, task):
    agent_active_tasks.labels(agent_id=agent.id).inc()
    start = time.time()
    try:
        result = await agent.run(task)
        agent_requests_total.labels(agent_id=agent.id, status='success').inc()
        return result
    except Exception:
        agent_requests_total.labels(agent_id=agent.id, status='error').inc()
        raise
    finally:
        duration = time.time() - start
        agent_duration_seconds.labels(agent_id=agent.id).observe(duration)
        agent_active_tasks.labels(agent_id=agent.id).dec()
```
### Distributed Tracing
```python
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

async def execute_with_tracing(agent, task):
    with tracer.start_as_current_span("agent_execution") as span:
        span.set_attribute("agent.id", agent.id)
        span.set_attribute("task.id", task.id)
        # LLM call
        with tracer.start_as_current_span("llm_call") as llm_span:
            llm_span.set_attribute("model", agent.model)
            result = await call_llm(task.prompt)
            llm_span.set_attribute("tokens", result.usage.total_tokens)
        # Tool execution (hypothetical helper picks the tool the LLM requested)
        tool = agent.select_tool(result)
        with tracer.start_as_current_span("tool_execution") as tool_span:
            tool_span.set_attribute("tool", tool.name)
            tool_result = await execute_tool(tool, result)
        return tool_result
```
---
## Security Best Practices
### Input Validation
```python
from pydantic import BaseModel, validator

class AgentRequest(BaseModel):
    task: str
    context: dict = {}
    max_tokens: int = 1000

    @validator('task')
    def validate_task(cls, v):
        if len(v) > 10000:
            raise ValueError('Task too long')
        if contains_injection_attempt(v):  # your own prompt-injection heuristic
            raise ValueError('Invalid input detected')
        return v

    @validator('max_tokens')
    def validate_max_tokens(cls, v):
        if v > 4000:
            raise ValueError('max_tokens too high')
        return v
```
### Output Sanitization
```python
def sanitize_output(result):
    # Remove any leaked secrets
    result = mask_patterns(result, SECRET_PATTERNS)

    # Validate structure
    if not is_valid_response(result):
        raise OutputValidationError("Invalid response structure")

    # Check for harmful content
    if contains_harmful_content(result):
        raise ContentPolicyError("Response violates content policy")

    return result
```
### Audit Logging
```python
from datetime import datetime

async def audit_log(event):
    log_entry = {
        "timestamp": datetime.utcnow().isoformat(),
        "event_type": event.type,
        "agent_id": event.agent_id,
        "user_id": event.user_id,
        "action": event.action,
        "input_hash": hash_content(event.input),  # Don't log full input
        "output_hash": hash_content(event.output),
        "metadata": event.metadata
    }
    await audit_db.insert(log_entry)
```
---
## Deployment Strategies
### Blue-Green Deployment
```
      Load Balancer
┌───────────┴───────────┐
│ │
┌─────▼─────┐ ┌─────▼─────┐
│ Blue │ │ Green │
│ (Current) │ │ (New) │
└───────────┘ └───────────┘
1. Deploy new version to Green
2. Test Green environment
3. Switch traffic Blue → Green
4. Keep Blue for rollback
```
### Canary Deployment
```
      Load Balancer
┌───────────┴───────────┐
│ 95% 5% │
┌─────▼─────┐ ┌─────▼─────┐
│ Stable │ │ Canary │
│ (v1.0) │ │ (v1.1) │
└───────────┘ └───────────┘
1. Deploy new version as Canary
2. Route 5% traffic to Canary
3. Monitor metrics
4. Gradually increase or rollback
```
### Feature Flags
```python
async def execute_agent(task, user):
    if feature_flags.is_enabled("new_agent_v2", user.id):
        return await agent_v2.execute(task)
    else:
        return await agent_v1.execute(task)
```
---
## Framework Comparison: Production Readiness
| Feature | DIY | LangChain | CrewAI | Aden |
|---------|-----|-----------|--------|------|
| Retry logic | Build | Partial | Basic | Built-in |
| Circuit breakers | Build | No | No | Built-in |
| Health checks | Build | No | No | Built-in |
| Monitoring | Build | LangSmith | Build | Built-in |
| Cost control | Build | No | No | Built-in |
| HITL | Build | Build | Basic | Native |
| Self-healing | Build | No | No | Native |
| Dashboard | Build | LangSmith | No | Built-in |
---
## Testing for Production
### Unit Tests
```python
from unittest import mock

import pytest

def test_agent_handles_rate_limit():
    with mock.patch('llm.complete', side_effect=RateLimitError()):
        result = agent.execute(task)
        assert result.status == "retried"

def test_agent_validates_input():
    with pytest.raises(ValidationError):
        agent.execute({"task": "x" * 100000})  # Too long
```
### Integration Tests
```python
async def test_full_agent_flow():
    # Create test task
    task = create_test_task()

    # Execute agent
    result = await agent.execute(task)

    # Verify result
    assert result.success
    assert result.output is not None

    # Verify monitoring
    assert metrics.request_count > 0
    assert metrics.last_cost < 1.0
```
### Load Tests
```python
import asyncio
import time

async def load_test_agent():
    tasks = [create_test_task() for _ in range(100)]

    start = time.time()
    results = await asyncio.gather(*[
        agent.execute(task) for task in tasks
    ])
    duration = time.time() - start

    success_rate = sum(1 for r in results if r.success) / len(results)
    avg_latency = duration / len(tasks)

    assert success_rate > 0.95
    assert avg_latency < 5.0  # seconds
```
### Chaos Tests
```python
async def test_agent_survives_llm_outage():
    with mock.patch('llm.complete', side_effect=ConnectionError()):
        # Should use fallback or degrade gracefully
        result = await agent.execute(task)
        assert result.status in ["fallback", "degraded"]

async def test_agent_survives_high_load():
    # Simulate burst traffic
    tasks = [create_test_task() for _ in range(1000)]
    results = await asyncio.gather(*[
        agent.execute(task) for task in tasks
    ], return_exceptions=True)

    # Should not crash, may throttle
    errors = [r for r in results if isinstance(r, Exception)]
    assert len(errors) / len(results) < 0.1  # <10% error rate
```
---
## Conclusion
Production AI agents require:
1. **Reliability**: Retries, circuit breakers, fallbacks
2. **Observability**: Logs, metrics, traces, dashboards
3. **Security**: Validation, sanitization, auditing
4. **Cost Control**: Budgets, tracking, degradation
5. **Human Oversight**: HITL, escalation, override
Frameworks like Aden provide many of these out of the box. For other frameworks, you'll need to build this infrastructure yourself.
The gap between demo and production is significant—plan for it from the start.
---
*Last updated: January 2025*
@@ -0,0 +1,441 @@
# Human-in-the-Loop for AI Agents: A Complete Guide
*Balancing automation with human oversight for safe, effective AI systems*
---
Human-in-the-Loop (HITL) is a critical design pattern for AI agents. It ensures that humans remain in control of important decisions while still benefiting from AI automation. This guide covers everything you need to know about implementing HITL in agent systems.
---
## What is Human-in-the-Loop?
HITL refers to **incorporating human judgment into automated AI workflows**. Instead of fully autonomous operation, agents pause at critical points to request human input, approval, or guidance.
```
Agent working → Critical decision → PAUSE → Human reviews → Continue/Modify
```
---
## Why HITL Matters
### Safety
- Prevents harmful actions before they occur
- Catches AI errors and hallucinations
- Maintains accountability
### Quality
- Ensures outputs meet standards
- Incorporates domain expertise
- Validates complex decisions
### Trust
- Builds user confidence in AI systems
- Provides transparency
- Enables gradual autonomy increase
### Compliance
- Meets regulatory requirements
- Creates audit trails
- Maintains human responsibility
---
## HITL Patterns
### Pattern 1: Approval Gates
Agent completes work, then waits for human approval before proceeding.
```
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Agent │────▶│ APPROVE? │────▶│ Action │
│ works │ │ (Human) │ │ taken │
└─────────────┘ └─────────────┘ └─────────────┘
                           │ Reject
                    ┌─────────────┐
                    │   Revise    │
                    └─────────────┘
```
**Use when:** Actions are irreversible or high-impact
**Example:**
- Publishing content
- Sending emails to customers
- Making financial transactions
### Pattern 2: Confidence-Based Escalation
Agent handles confident decisions autonomously, escalates uncertain ones.
```
Agent decision
┌─────────────────┐
│ Confidence? │
└─────────────────┘
├── High ──▶ Proceed autonomously
└── Low ───▶ Request human input
```
**Use when:** Volume is high, most cases are straightforward
**Example:**
- Customer support ticket routing
- Content moderation
- Data classification
### Pattern 3: Sampling/Audit
Agent operates autonomously, humans review a sample of decisions.
```
Agent decisions: [1] [2] [3] [4] [5] [6] [7] [8] [9] [10]
                          │                   │
                          ▼                   ▼
                      Human reviews sample
                     Feedback loop to agent
```
**Use when:** Scale makes full review impossible
**Example:**
- Fraud detection review
- Quality assurance
- Model monitoring
### Pattern 4: Collaborative Editing
Human and agent work together in real-time.
```
┌─────────────────────────────────────┐
│ │
│ Agent suggests ←→ Human edits │
│ │
│ Iterative refinement │
│ │
└─────────────────────────────────────┘
```
**Use when:** Output quality is paramount
**Example:**
- Document drafting
- Code review
- Creative content
---
## Implementing HITL
### Key Components
1. **Intervention Points**
- Where in the workflow to pause
- What triggers human involvement
2. **Request Interface**
- How to present information to humans
- What context to provide
3. **Response Handling**
- How to process human input
- Timeout and escalation policies
4. **Learning Loop**
- Capturing human decisions for improvement
- Reducing future intervention needs
### Implementation Example
```python
class HITLAgent:
    def __init__(self, config):
        self.confidence_threshold = config.confidence_threshold
        self.timeout = config.human_timeout
        self.escalation_policy = config.escalation

    async def execute(self, task):
        # Agent works on task
        result = await self.process(task)

        # Check if human review needed
        if self.needs_human_review(result):
            # Create intervention request
            request = InterventionRequest(
                task=task,
                result=result,
                context=self.get_context(),
                options=self.get_options(result),
                deadline=self.timeout
            )
            # Wait for human response
            human_response = await self.request_human_input(request)
            if human_response.approved:
                return self.finalize(result, human_response.modifications)
            else:
                return self.handle_rejection(human_response.feedback)
        else:
            return result

    def needs_human_review(self, result):
        # Escalate when the agent is unsure...
        if result.confidence < self.confidence_threshold:
            return True
        # ...or when the action is high-impact or flagged by policy rules
        # (is_high_impact and requires_review are assumed attributes/hooks)
        return result.is_high_impact or self.escalation_policy.requires_review(result)
```
---
## HITL in Different Frameworks
### Basic Implementation (Most Frameworks)
```python
# Manual HITL implementation
def agent_with_approval(task):
    result = agent.execute(task)
    print(f"Agent proposes: {result}")
    approved = input("Approve? (y/n): ")
    if approved == 'y':
        return execute_action(result)
    else:
        feedback = input("Feedback: ")
        return agent.revise(task, feedback)
```
### CrewAI HITL
```python
from crewai import Agent, Task

writer = Agent(role="Content Writer", goal="Draft marketing copy",
               backstory="An experienced copywriter")
task = Task(
    description="Draft the launch announcement",
    expected_output="A short announcement post",
    agent=writer,
    human_input=True,  # CrewAI requests human feedback on this task
)
```
### AutoGen HITL
```python
from autogen import UserProxyAgent
user_proxy = UserProxyAgent(
name="human",
human_input_mode="ALWAYS", # or "TERMINATE", "NEVER"
# Controls when human input is requested
)
```
### Aden HITL
Aden has native support for HITL with:
```python
# Goal definition includes HITL requirements
goal = """
Create a customer response system that:
1. Drafts responses to customer inquiries
2. Requires human approval for:
- Refund requests over $100
- Escalation decisions
- Responses to VIP customers
3. Auto-sends low-risk responses after 2-hour timeout
4. Learns from approved/rejected responses
"""
# Aden creates intervention nodes automatically
# Dashboard shows pending approvals
# Configurable timeout and escalation policies
```
---
## Timeout and Escalation Strategies
### What Happens When Humans Don't Respond?
| Strategy | When to Use | Implementation |
|----------|-------------|----------------|
| **Wait indefinitely** | Critical decisions | No timeout |
| **Auto-approve** | Low-risk, time-sensitive | Proceed after timeout |
| **Auto-reject** | Safety-first approach | Cancel after timeout |
| **Escalate** | Important but time-sensitive | Notify additional humans |
| **Fallback** | Must complete | Use safe default |
### Escalation Chain Example
```
Request sent
├── 30 min: Reminder to original reviewer
├── 1 hour: Escalate to team lead
├── 2 hours: Escalate to manager
└── 4 hours: Auto-reject with notification
```
### Timeout Configuration
```python
intervention_config = {
    "timeout_minutes": 60,
    "reminders": [30, 45],
    "escalation_chain": ["team_lead", "manager"],
    "fallback_action": "reject",
    "notification_channels": ["email", "slack"]
}
```
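A sketch of the loop that would enforce this configuration — `notify` and the `request` object's methods are placeholder hooks:
```python
import asyncio

async def await_human_decision(request, config):
    """Poll for a reviewer response, reminding and escalating as time passes."""
    for minute in range(config["timeout_minutes"]):
        if request.has_response():          # placeholder: set by the review UI
            return request.response
        if minute in config["reminders"]:
            notify(request.reviewer, "Approval still pending")  # placeholder hook
        await asyncio.sleep(60)
    for role in config["escalation_chain"]:
        notify(role, f"Escalated approval request: {request.id}")
    return config["fallback_action"]        # e.g. "reject" if nobody responds
```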
---
## Best Practices
### 1. Minimize Friction
- **Good:** Clear, actionable requests
- **Bad:** Vague requests requiring investigation
```
# Good
"Approve sending this email to john@example.com?
Subject: Order Confirmation
[View full email] [Approve] [Reject] [Edit]"
# Bad
"Agent completed task. Review?"
```
### 2. Provide Context
Include everything humans need to decide:
- What the agent did
- Why it's asking (confidence, rules)
- Relevant history
- Available options
### 3. Make Actions Easy
- One-click approval for clear cases
- Pre-filled options
- Keyboard shortcuts for power users
### 4. Learn from Decisions
Track human decisions to:
- Improve agent confidence calibration
- Identify patterns for automation
- Reduce future intervention needs
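One concrete way to close this loop is to log each review outcome as a `(confidence, approved)` pair and periodically recalibrate the threshold. A minimal sketch:
```python
def recalibrate_threshold(decision_log, current_threshold):
    """Lower the review threshold once high-confidence outputs are reliably approved.

    decision_log is a list of (agent_confidence, human_approved) pairs.
    """
    high_conf = [approved for conf, approved in decision_log
                 if conf >= current_threshold]
    if len(high_conf) >= 50 and sum(high_conf) / len(high_conf) > 0.98:
        return max(current_threshold - 0.05, 0.5)  # cautiously expand autonomy
    return current_threshold
```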
### 5. Design for Scale
Consider what happens with:
- 10 requests per day
- 100 requests per day
- 1000 requests per day
### 6. Handle Edge Cases
- What if reviewer is unavailable?
- What if multiple reviewers conflict?
- What if reviewer makes a mistake?
---
## Metrics to Track
| Metric | What it Measures | Target |
|--------|------------------|--------|
| Intervention rate | % of tasks needing human | Minimize over time |
| Response time | How fast humans respond | Optimize |
| Approval rate | % of requests approved | Monitor for drift |
| Override rate | Humans changing agent decisions | Quality indicator |
| Timeout rate | % of requests timing out | Keep low |
| Learning impact | Reduction in interventions | Should decrease |
---
## Common Mistakes
### 1. Too Many Interventions
**Problem:** Humans overwhelmed, start rubber-stamping
**Solution:** Reserve for truly important decisions
### 2. Too Few Interventions
**Problem:** Errors slip through, trust erodes
**Solution:** Start conservative, reduce over time
### 3. Poor Context
**Problem:** Humans can't make informed decisions
**Solution:** Include all relevant information
### 4. Slow Response
**Problem:** Workflow bottlenecked on humans
**Solution:** Timeouts, escalation, parallelization
### 5. No Learning
**Problem:** Same interventions forever
**Solution:** Track patterns, improve agent
---
## HITL and Compliance
### Audit Trail Requirements
```python
audit_log = {
    "timestamp": "2025-01-15T10:30:00Z",
    "task_id": "task_123",
    "agent_decision": "send_refund",
    "intervention_requested": True,
    "reviewer": "jane@company.com",
    "review_timestamp": "2025-01-15T10:45:00Z",
    "decision": "approved",
    "modifications": None,
    "rationale": "Within policy limits"
}
```
### Regulatory Considerations
- GDPR: Human review for automated decisions affecting individuals
- Financial: Approval requirements for transactions
- Healthcare: Clinical decision support guidelines
- AI regulations: Explainability and human oversight requirements
---
## Future of HITL
### Trends
1. **Adaptive intervention** - AI learns when to ask
2. **Predictive escalation** - Anticipate human needs
3. **Collaborative interfaces** - Better human-AI interaction
4. **Gradual autonomy** - Systems earn more independence
### Aden's Approach
Aden is built around native HITL:
- Intervention nodes are first-class citizens
- Dashboard for managing approvals
- Configurable policies per agent
- Learning from human feedback
- Self-improvement reduces intervention over time
---
## Conclusion
Human-in-the-Loop isn't about limiting AI—it's about **building AI systems that humans can trust and control**. The best HITL implementations:
1. Start conservative and earn autonomy
2. Make human interaction effortless
3. Learn from every decision
4. Balance automation with oversight
As AI agents become more capable, thoughtful HITL design becomes more important, not less. The goal is collaboration, not competition, between human and artificial intelligence.
---
*Last updated: January 2025*
@@ -0,0 +1,289 @@
# Multi-Agent vs Single-Agent Systems: When to Use Each
*A practical guide to choosing the right architecture for your AI application*
---
When building AI applications, one of the first architectural decisions is whether to use a single agent or multiple agents working together. This guide breaks down when each approach makes sense.
---
## Single-Agent Systems
### What They Are
A single agent handles all tasks, tool calls, and decision-making within one unified process.
```
┌─────────────────────────────────────────┐
│ Single Agent │
│ ┌─────────────────────────────────┐ │
│ │ LLM Brain │ │
│ │ • Reasoning │ │
│ │ • Planning │ │
│ │ • Tool Selection │ │
│ │ • Execution │ │
│ └─────────────────────────────────┘ │
│ │ │
│ ┌───────────────┴───────────────┐ │
│ │ Tools │ │
│ │ [A] [B] [C] [D] [E] [F] │ │
│ └───────────────────────────────┘ │
└─────────────────────────────────────────┘
```
### Advantages
- **Simpler to build**: One agent, one context, one conversation
- **Lower latency**: No inter-agent communication overhead
- **Easier debugging**: Single point of execution to trace
- **Lower cost**: Fewer LLM calls overall
- **Unified context**: All information in one place
### Disadvantages
- **Context limits**: One agent must fit everything in its context window
- **Jack of all trades**: Hard to optimize for specialized tasks
- **Single point of failure**: If the agent fails, everything fails
- **Limited parallelism**: Sequential execution of tasks
### Best Use Cases
1. **Simple Q&A chatbots**: Direct user interaction
2. **Single-purpose tools**: One task done well
3. **Prototype development**: Quick iteration
4. **Low-complexity workflows**: Linear task sequences
5. **Cost-sensitive applications**: Minimizing LLM usage
---
## Multi-Agent Systems
### What They Are
Multiple specialized agents collaborate, each handling specific tasks or domains.
```
┌─────────────────────────────────────────────────────────┐
│ Multi-Agent System │
│ │
│ ┌───────────┐ ┌───────────┐ ┌───────────┐ │
│ │ Agent A │ │ Agent B │ │ Agent C │ │
│ │ Researcher│ │ Writer │ │ Reviewer │ │
│ │ [🔍] │ │ [✍️] │ │ [✓] │ │
│ └─────┬─────┘ └─────┬─────┘ └─────┬─────┘ │
│ │ │ │ │
│ └───────────────┼───────────────┘ │
│ ▼ │
│ ┌─────────────────┐ │
│ │ Coordinator │ │
│ │ / Orchestrator│ │
│ └─────────────────┘ │
└─────────────────────────────────────────────────────────┘
```
### Advantages
- **Specialization**: Each agent optimized for its domain
- **Scalability**: Add new agents for new capabilities
- **Parallelism**: Multiple agents work simultaneously
- **Fault isolation**: One agent failing doesn't crash everything
- **Better context management**: Each agent has focused context
### Disadvantages
- **Coordination complexity**: Managing agent communication
- **Higher latency**: Inter-agent handoffs add time
- **More expensive**: More LLM calls for coordination
- **Debugging difficulty**: Distributed execution traces
- **Potential conflicts**: Agents may have conflicting outputs
### Best Use Cases
1. **Complex research tasks**: Multiple perspectives needed
2. **Content pipelines**: Research → Write → Edit → Publish
3. **Enterprise workflows**: Different departments/functions
4. **Self-improving systems**: Separate learning from execution
5. **High-reliability systems**: Redundancy and verification
---
## Framework Comparison
| Framework | Single-Agent | Multi-Agent | Coordination Style |
|-----------|--------------|-------------|-------------------|
| LangChain | Excellent | Basic | Manual chains |
| CrewAI | Good | Excellent | Role-based crews |
| AutoGen | Good | Excellent | Conversation-based |
| Aden | Excellent | Excellent | Goal-driven + Self-improving |
---
## Aden's Hybrid Approach
Aden takes a unique approach by combining both paradigms:
### The Two-Agent Core
```
┌────────────────────────────────────────────────────────────┐
│ Aden System │
│ │
│ ┌──────────────────┐ ┌──────────────────────────┐ │
│ │ Coding Agent │ │ Worker Agents │ │
│ │ (Single, Meta) │────▶│ (Multi, Specialized) │ │
│ │ │ │ ┌──────┐ ┌──────┐ │ │
│ │ • Generates │ │ │Agent1│ │Agent2│ ... │ │
│ │ • Improves │ │ └──────┘ └──────┘ │ │
│ │ • Orchestrates │ │ │ │
│ └──────────────────┘ └──────────────────────────┘ │
│ │ │ │
│ └───────────────────────────┘ │
│ │ │
│ ┌──────────▼──────────┐ │
│ │ Control Plane │ │
│ │ Budgets • Policies │ │
│ └─────────────────────┘ │
└────────────────────────────────────────────────────────────┘
```
### How It Works
1. **Single Meta-Agent**: The Coding Agent acts as a single intelligent orchestrator
2. **Multi-Agent Execution**: Worker Agents are specialized and run in parallel
3. **Best of Both**: Simple development (goal-based) with multi-agent power
4. **Self-Improving**: The system evolves based on execution feedback
### When Aden Shines
- You want multi-agent power without multi-agent complexity
- Your system needs to improve itself over time
- You need production controls (budgets, HITL, monitoring)
- You're building complex workflows from natural language goals
---
## Decision Framework
Use this flowchart to decide:
```
Start
┌─────────────────────┐
│ Is the task │
│ single-purpose? │
└──────────┬──────────┘
Yes ◄─────┴─────► No
│ │
▼ ▼
┌───────────────┐ ┌────────────────────┐
│ Single Agent │ │ Do tasks need │
│ is sufficient │ │ different expertise?│
└───────────────┘ └─────────┬──────────┘
Yes ◄─────┴─────► No
│ │
▼ ▼
┌────────────────┐ ┌────────────────┐
│ Multi-Agent │ │ Could benefit │
│ Recommended │ │ from parallel │
└────────────────┘ │ execution? │
└────────┬───────┘
Yes ◄─────┴─────► No
│ │
▼ ▼
┌────────────────┐ ┌────────────┐
│ Multi-Agent │ │ Single │
│ for speed │ │ Agent OK │
└────────────────┘ └────────────┘
```
---
## Practical Examples
### Example 1: Customer Support Bot
**Recommended: Single Agent**
Why: Direct Q&A, unified context, low latency needed
```
User Question → Single Agent → Answer
```
### Example 2: Research Report Generator
**Recommended: Multi-Agent**
Why: Multiple sources, different skills, quality review
```
Topic → Researcher Agent → Writer Agent → Editor Agent → Report
```
### Example 3: E-commerce Order Processing
**Recommended: Multi-Agent with Aden**
Why: Multiple systems, needs reliability, self-improvement valuable
```
Order → Inventory Agent ─┐
├──► Coordinator → Fulfillment
Payment → Finance Agent ─┘
```
### Example 4: Code Review Assistant
**Recommended: Hybrid (Aden)**
Why: Needs specialization but also coordination
```
PR → Coding Agent generates → [Security Agent, Style Agent, Logic Agent]
→ Synthesize Review
```
---
## Migration Strategies
### Single → Multi-Agent
1. Identify natural task boundaries
2. Extract specialized agents one at a time
3. Add coordination layer (see the sketch below)
4. Implement inter-agent communication
5. Add monitoring for new failure modes
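For step 3, the coordination layer can start as a plain dispatcher. A minimal sketch, assuming each extracted agent exposes an async `execute()` method:
```python
class Coordinator:
    """Routes a task through specialist agents in sequence."""

    def __init__(self, agents):
        self.agents = agents  # e.g. {"research": ..., "write": ..., "review": ...}

    async def run(self, task):
        research = await self.agents["research"].execute(task)
        draft = await self.agents["write"].execute(research)
        return await self.agents["review"].execute(draft)
```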
### Multi → Single-Agent
1. Consolidate related agents
2. Merge context and tools
3. Simplify coordination logic
4. Reduce LLM calls
5. Improve response latency
---
## Key Metrics to Track
| Metric | Single-Agent | Multi-Agent |
|--------|--------------|-------------|
| Latency | Lower baseline | Higher, but parallelizable |
| Cost/Request | Predictable | Variable, needs budgets |
| Success Rate | Simpler to optimize | More failure points |
| Throughput | Limited by one agent | Scales with agents |
| Debugging Time | Linear | Exponential without tooling |
---
## Conclusion
**Choose Single-Agent when:**
- Building simple, focused applications
- Latency is critical
- Budget is tight
- Quick iteration is needed
**Choose Multi-Agent when:**
- Tasks require different expertise
- Parallelism improves outcomes
- Reliability through redundancy matters
- System complexity warrants specialization
**Choose Aden's Hybrid Approach when:**
- You want multi-agent power with single-agent simplicity
- Self-improvement is valuable
- Production controls are essential
- You're scaling from prototype to production
The right architecture depends on your specific use case. Start simple, measure results, and evolve your architecture as needs become clearer.
---
*Last updated: January 2025*
@@ -0,0 +1,415 @@
# Self-Improving vs Static Agents: Understanding the Paradigm Shift
*Why adaptive AI agents are changing how we build intelligent systems*
---
The AI agent landscape is divided between two fundamentally different approaches: **static agents** that execute predefined logic, and **self-improving agents** that evolve based on experience. Understanding this distinction is crucial for choosing the right architecture.
---
## The Core Difference
### Static Agents
Static agents follow **predefined workflows** that remain constant until a developer manually updates them. They're predictable but require human intervention to improve.
```
User Request → Fixed Logic → Response
(If failure)
Human fixes code
Redeploy
```
### Self-Improving Agents
Self-improving agents **learn from their experiences**, automatically adjusting their behavior based on successes and failures.
```
User Request → Adaptive Logic → Response
(If failure)
Capture failure data
Evolve agent graph
Auto-redeploy (improved)
```
---
## Comparison Table
| Aspect | Static Agents | Self-Improving Agents |
|--------|---------------|----------------------|
| Behavior change | Manual code updates | Automatic evolution |
| Failure response | Log and alert | Learn and adapt |
| Improvement cycle | Days/weeks | Minutes/hours |
| Human involvement | Required for changes | Optional oversight |
| Predictability | High | Moderate (with guardrails) |
| Long-term maintenance | Higher | Lower |
| Initial complexity | Lower | Higher |
---
## How Static Agents Work
### Architecture
```
┌─────────────────────────────────────┐
│ Static Agent │
├─────────────────────────────────────┤
│ ┌─────────────────────────────┐ │
│ │ Hardcoded Workflow │ │
│ │ ┌───┐ ┌───┐ ┌───┐ │ │
│ │ │ A │→│ B │→│ C │ │ │
│ │ └───┘ └───┘ └───┘ │ │
│ └─────────────────────────────┘ │
│ │
│ • Fixed decision logic │
│ • Predefined tool usage │
│ • Static prompts │
│ • Manual error handling │
└─────────────────────────────────────┘
```
### Typical Improvement Cycle
1. **Agent deployed** with initial logic
2. **Failures occur** in production
3. **Developers analyze** logs and errors
4. **Code changes** made manually
5. **Testing** in staging environment
6. **Redeployment** to production
7. **Repeat** for each issue
**Timeline:** Days to weeks per improvement
### Examples of Static Agent Frameworks
- LangChain agents
- Basic CrewAI implementations
- Custom ReAct agents
- Simple AutoGen conversations
---
## How Self-Improving Agents Work
### Architecture
```
┌─────────────────────────────────────────────────┐
│ Self-Improving Agent System │
├─────────────────────────────────────────────────┤
│ ┌─────────────────────────────────────────┐ │
│ │ Adaptive Agent Graph │ │
│ │ ┌───┐ ┌───┐ ┌───┐ │ │
│ │ │ A │→│ B │→│ C │ ← Can change │ │
│ │ └───┘ └───┘ └───┘ │ │
│ └─────────────────────────────────────────┘ │
│ ↑ │
│ │ Evolution │
│ │ │
│ ┌─────────────────────────────────────────┐ │
│ │ Coding Agent │ │
│ │ • Analyzes failures │ │
│ │ • Generates improvements │ │
│ │ • Updates agent graph │ │
│ └─────────────────────────────────────────┘ │
│ ↑ │
│ │ │
│ ┌─────────────────────────────────────────┐ │
│ │ Failure Capture │ │
│ │ • Error context │ │
│ │ • Input/output data │ │
│ │ • User feedback │ │
│ └─────────────────────────────────────────┘ │
└─────────────────────────────────────────────────┘
```
### Typical Improvement Cycle
1. **Agent deployed** with initial goal-derived logic
2. **Failures captured** automatically with full context
3. **Coding agent analyzes** failure patterns
4. **Graph evolved** with improved logic
5. **Automatic validation** via test cases
6. **Auto-redeployment** (with optional human approval)
7. **Continuous improvement** as more data arrives
**Timeline:** Minutes to hours per improvement
### Examples of Self-Improving Systems
- Aden's goal-driven agents
- Custom evolutionary architectures
- Reinforcement learning agents
- Meta-learning systems
---
## When Failures Happen
### Static Agent Response
```python
# Static agent: failures require manual intervention
try:
    result = agent.execute(task)
except AgentError as e:
    logger.error(f"Agent failed: {e}")
    alert_team(e)  # Human must investigate
    return fallback_response()

# Improvement requires:
# 1. Developer reviews logs
# 2. Identifies root cause
# 3. Writes fix
# 4. Tests fix
# 5. Deploys update
```
### Self-Improving Agent Response
```python
# Self-improving agent: failures trigger evolution
try:
    result = agent.execute(task)
except AgentError as e:
    # Automatic failure capture
    failure_data = {
        "error": e,
        "input": task,
        "context": agent.get_context(),
        "trace": agent.get_execution_trace()
    }
    # Coding agent evolves the system
    improved_graph = coding_agent.evolve(
        current_graph=agent.graph,
        failure_data=failure_data
    )
    # Validate and redeploy
    if improved_graph.passes_tests():
        agent.update_graph(improved_graph)
        # Retry with improved agent
        result = agent.execute(task)
```
---
## Advantages of Each Approach
### Static Agents: Advantages
1. **Predictability**
- Behavior is deterministic
- Easy to test and verify
- No unexpected changes
2. **Simplicity**
- Easier to understand
- Straightforward debugging
- Lower initial complexity
3. **Control**
- Full visibility into logic
- Manual approval of all changes
- Compliance-friendly
4. **Stability**
- No regression from auto-changes
- Consistent performance
- Known failure modes
### Self-Improving Agents: Advantages
1. **Adaptability**
- Improves without human intervention
- Handles novel situations
- Evolves with changing needs
2. **Efficiency**
- Faster improvement cycles
- Reduced developer time
- Lower maintenance burden
3. **Resilience**
- Self-healing from failures
- Automatic recovery
- Continuous optimization
4. **Scale**
- Handles more edge cases
- Improves across all instances
- Compounds improvements over time
---
## Challenges of Each Approach
### Static Agents: Challenges
- **Slow iteration**: Days/weeks to improve
- **Developer bottleneck**: Changes require engineering time
- **Scaling issues**: More edge cases = more manual work
- **Technical debt**: Accumulated workarounds
### Self-Improving Agents: Challenges
- **Unpredictability**: Behavior may change unexpectedly
- **Complexity**: Harder to understand current state
- **Guardrails needed**: Must prevent harmful evolution
- **Debugging**: Tracing why an agent behaves the way it does
---
## Guardrails for Self-Improving Agents
Successful self-improving systems need safety mechanisms:
### 1. Human-in-the-Loop Checkpoints
```
Evolution proposed → Human review → Approve/Reject
```
### 2. Test Case Validation
```
Improved agent must pass:
- Original test cases
- Regression tests
- New edge case tests
```
### 3. Gradual Rollout
```
Evolution stages:
1. Shadow mode (compare outputs)
2. Canary deployment (small traffic)
3. Full rollout (all traffic)
```
### 4. Rollback Capability
```
If metrics degrade:
- Automatic revert to previous version
- Alert team for investigation
```
### 5. Evolution Constraints
```
Coding agent cannot:
- Remove human checkpoints
- Bypass security measures
- Exceed cost budgets
- Change core objectives
```
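Taken together, guardrails 2-4 amount to a gated deployment step. A minimal sketch with placeholder `run_tests`, `deploy`, `metrics_degraded`, and `rollback` hooks:
```python
def gated_deploy(current_graph, improved_graph):
    # Guardrail 2: must pass original, regression, and edge-case suites
    if not run_tests(improved_graph, suites=["original", "regression", "edge_cases"]):
        return current_graph
    # Guardrail 3: gradual rollout, starting with a small slice of traffic
    deploy(improved_graph, traffic_share=0.05)
    if metrics_degraded(improved_graph):
        # Guardrail 4: automatic revert when metrics degrade
        rollback(to=current_graph)
        return current_graph
    deploy(improved_graph, traffic_share=1.0)
    return improved_graph
```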
---
## Real-World Scenarios
### Scenario 1: Customer Support Agent
**Static Approach:**
- Agent handles known query types
- New query types → escalate to human
- Developer adds new handlers quarterly
- Slow to adapt to trends
**Self-Improving Approach:**
- Agent learns from successful resolutions
- New patterns automatically incorporated
- Escalation rules evolve based on outcomes
- Continuously adapts to customer needs
### Scenario 2: Data Processing Pipeline
**Static Approach:**
- Fixed schema expectations
- New data formats → pipeline breaks
- Manual updates for each change
- High maintenance burden
**Self-Improving Approach:**
- Learns new data patterns
- Automatically adapts to schema changes
- Self-corrects processing errors
- Lower long-term maintenance
### Scenario 3: Content Generation
**Static Approach:**
- Fixed style and structure
- All changes require prompt updates
- No learning from feedback
- Consistent but may become stale
**Self-Improving Approach:**
- Learns from editor feedback
- Style evolves with brand changes
- Improves quality over time
- Balances consistency with growth
---
## Making the Choice
### Choose Static Agents When:
| Situation | Reason |
|-----------|--------|
| Regulatory requirements | Need audit trail of logic |
| Safety-critical systems | Predictability essential |
| Simple, stable workflows | No need for adaptation |
| Small scale | Manual updates manageable |
| High trust requirements | Must explain all decisions |
### Choose Self-Improving Agents When:
| Situation | Reason |
|-----------|--------|
| Rapidly changing requirements | Manual updates too slow |
| High volume of edge cases | Can't manually handle all |
| Continuous improvement needed | Competitive advantage |
| Developer time is limited | Automation essential |
| Long-running systems | Evolution provides value |
---
## Implementing Self-Improvement
### With Aden
Aden provides built-in self-improvement through:
1. **Goal-driven generation**: Coding agent creates initial system
2. **Failure capture**: Automatic context collection
3. **Evolution engine**: Coding agent improves graph
4. **Validation**: Test cases verify improvements
5. **Deployment**: Automatic with optional approval
### DIY Approach
Building your own requires:
1. **Failure logging**: Comprehensive context capture (sketched below)
2. **Analysis system**: Pattern recognition in failures
3. **Code generation**: LLM-based improvement proposals
4. **Testing framework**: Automated validation
5. **Deployment pipeline**: Safe rollout mechanism
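As a starting point for step 1, a failure record needs enough context for an LLM to later propose a fix. A minimal sketch (the `get_execution_trace()` accessor is an assumption):
```python
import json
import traceback
from datetime import datetime, timezone

def capture_failure(task, agent, error, path="failures.jsonl"):
    # Call this from inside the except block so the traceback is available
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "agent_id": agent.id,
        "input": str(task),
        "error": repr(error),
        "traceback": traceback.format_exc(),
        "execution_trace": agent.get_execution_trace(),  # assumed accessor
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
```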
---
## Conclusion
The choice between static and self-improving agents depends on your priorities:
- **Static agents** offer predictability and control, ideal for stable, regulated environments
- **Self-improving agents** offer adaptability and efficiency, ideal for dynamic, scaling systems
The future likely belongs to **hybrid approaches**: core logic that's stable and auditable, with adaptive components that evolve safely within guardrails.
Frameworks like Aden are pioneering this space, making self-improvement accessible while maintaining the safety and oversight that production systems require.
---
*Last updated: January 2025*
@@ -0,0 +1,326 @@
# Top 10 AI Agent Frameworks in 2025
*A comprehensive guide to the leading frameworks for building AI agents*
---
The AI agent landscape has exploded with options for developers. Whether you're building RAG applications, multi-agent systems, or autonomous workflows, choosing the right framework can significantly impact your project's success.
This guide objectively compares the top 10 AI agent frameworks based on architecture, use cases, and production readiness.
---
## Quick Comparison
| Framework | Best For | Language | Open Source | Self-Improving |
|-----------|----------|----------|-------------|----------------|
| LangChain | RAG & LLM apps | Python/JS | Yes | No |
| CrewAI | Role-based teams | Python | Yes | No |
| AutoGen | Conversational agents | Python | Yes | No |
| Aden | Self-evolving agents | Python/TS | Yes | Yes |
| PydanticAI | Type-safe workflows | Python | Yes | No |
| Swarm | Simple orchestration | Python | Yes | No |
| CAMEL | Research simulations | Python | Yes | No |
| Letta | Stateful memory | Python | Yes | No |
| Mastra | Full-stack AI | TypeScript | Yes | No |
| Haystack | Search & RAG | Python | Yes | No |
---
## 1. LangChain
**Category:** Component Library
**Best For:** RAG applications, LLM-powered apps
**Language:** Python, JavaScript
### Overview
LangChain is one of the most popular frameworks for building LLM applications. It provides a comprehensive set of components for chains, agents, and retrieval-augmented generation.
### Strengths
- Extensive documentation and community
- Wide integration ecosystem
- Flexible component architecture
- Strong RAG capabilities
### Limitations
- Can be complex for simple use cases
- Requires manual workflow definition
- No built-in self-improvement mechanisms
- Debugging can be challenging
### When to Use
Choose LangChain when you need a mature ecosystem with lots of integrations and are building document-centric applications.
---
## 2. CrewAI
**Category:** Multi-Agent Orchestration
**Best For:** Role-based agent teams
**Language:** Python
### Overview
CrewAI enables you to create teams of AI agents with defined roles that collaborate to accomplish tasks. It emphasizes simplicity and role-based organization.
### Strengths
- Intuitive role-based design
- Clean API for team creation
- Good for collaborative workflows
- Active community
### Limitations
- Predefined collaboration patterns
- Limited adaptation to failures
- Manual workflow definition required
- Scaling can be complex
### When to Use
Choose CrewAI when you have well-defined roles and want agents to collaborate in predictable patterns.
---
## 3. AutoGen
**Category:** Conversational Agents
**Best For:** Multi-agent conversations
**Language:** Python
### Overview
Microsoft's AutoGen framework specializes in conversational AI agents that can engage in complex multi-turn dialogues and collaborate through conversation.
### Strengths
- Strong conversational capabilities
- Microsoft backing and support
- Good for dialogue-heavy applications
- Flexible agent configuration
### Limitations
- Conversation-centric (less suited for other patterns)
- Complex setup for non-conversational tasks
- No automatic evolution
### When to Use
Choose AutoGen when your agents primarily need to communicate through natural language conversations.
---
## 4. Aden
**Category:** Self-Evolving Agent Framework
**Best For:** Production systems that need to adapt
**Language:** Python SDK, TypeScript backend
### Overview
Aden takes a fundamentally different approach by using a coding agent to generate agent systems from natural language goals. When agents fail, the framework automatically captures failure data, evolves the agent graph, and redeploys.
### Strengths
- Goal-driven development (describe outcomes, not workflows)
- Automatic self-improvement from failures
- Built-in observability and cost controls
- Human-in-the-loop support
- Production-ready with monitoring dashboard
### Limitations
- Newer framework with growing ecosystem
- Requires understanding of goal-driven paradigm
- More suited for complex, evolving systems
### When to Use
Choose Aden when you need agents that improve over time, want to define goals rather than workflows, or require production-grade observability and cost management.
---
## 5. PydanticAI
**Category:** Type-Safe Framework
**Best For:** Structured, validated outputs
**Language:** Python
### Overview
PydanticAI brings type safety and validation to AI agent development, ensuring outputs conform to defined schemas.
### Strengths
- Strong type validation
- Clean, Pythonic API
- Good for structured outputs
- Reliable data handling
### Limitations
- Best for known workflow patterns
- Less flexible for dynamic scenarios
- No self-adaptation
### When to Use
Choose PydanticAI when output structure and validation are critical to your application.
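A minimal sketch of the type-safe model: define a Pydantic schema, attach it to an agent, and get validated output back. It assumes the `pydantic-ai` package, and the parameter spellings below have been renamed across releases, so check your installed version.

```python
# Typed-output sketch. Assumes the pydantic-ai package; early versions
# used result_type and result.data instead of output_type and result.output.
from pydantic import BaseModel
from pydantic_ai import Agent

class CityInfo(BaseModel):
    city: str
    country: str
    population: int

agent = Agent("openai:gpt-4o-mini", output_type=CityInfo)

# The model's reply is validated against CityInfo before being returned,
# so downstream code can rely on the schema.
result = agent.run_sync("Tell me about Tokyo.")
print(result.output.city, result.output.population)
```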
---
## 6. Swarm
**Category:** Lightweight Orchestration
**Best For:** Simple multi-agent setups
**Language:** Python
### Overview
OpenAI's Swarm provides a minimal framework for orchestrating multiple agents with simple handoff patterns.
### Strengths
- Extremely simple API
- Easy to understand and use
- Good for learning
- Minimal overhead
### Limitations
- Limited features for production
- No built-in monitoring
- Simple handoff patterns only
### When to Use
Choose Swarm for prototyping or simple multi-agent interactions where complexity isn't needed.
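Swarm's core idea fits in a few lines: an agent "hands off" by returning another agent from a tool function. This sketch assumes OpenAI's experimental `swarm` package, which is a teaching tool rather than a production framework.

```python
# Handoff sketch. Assumes OpenAI's experimental swarm package.
from swarm import Agent, Swarm

def transfer_to_spanish():
    """Hand the conversation off to the Spanish-speaking agent."""
    return spanish_agent

spanish_agent = Agent(
    name="Spanish Agent",
    instructions="Reply only in Spanish.",
)
english_agent = Agent(
    name="English Agent",
    instructions="If the user writes in Spanish, call transfer_to_spanish.",
    functions=[transfer_to_spanish],
)

client = Swarm()
response = client.run(
    agent=english_agent,
    messages=[{"role": "user", "content": "Hola, ¿cómo estás?"}],
)
print(response.messages[-1]["content"])
```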
---
## 7. CAMEL
**Category:** Research Framework
**Best For:** Large-scale agent simulations
**Language:** Python
### Overview
CAMEL is designed for studying emergent behavior in large-scale multi-agent systems, supporting up to 1M agents.
### Strengths
- Massive scale support
- Research-oriented features
- Good for studying emergence
- Academic backing
### Limitations
- Research-focused, not production-ready
- Steep learning curve
- Limited production tooling
### When to Use
Choose CAMEL for academic research or when studying large-scale agent interactions.
---
## 8. Letta (formerly MemGPT)
**Category:** Stateful Memory
**Best For:** Long-term memory agents
**Language:** Python
### Overview
Letta specializes in agents with sophisticated long-term memory, allowing agents to maintain context across extended interactions.
### Strengths
- Advanced memory management
- Long-term context retention
- Good for personal assistants
- Unique memory architecture
### Limitations
- Memory-focused (less general purpose)
- Complex memory tuning
- Specific use cases
### When to Use
Choose Letta when long-term memory and context retention are primary requirements.
---
## 9. Mastra
**Category:** Full-Stack AI Framework
**Best For:** TypeScript developers
**Language:** TypeScript
### Overview
Mastra provides a TypeScript-first approach to building AI applications with integrated tooling.
### Strengths
- TypeScript native
- Full-stack integration
- Modern developer experience
- Good for web applications
### Limitations
- TypeScript only
- Smaller ecosystem
- Less mature than alternatives
### When to Use
Choose Mastra when building TypeScript applications and want tight integration with web technologies.
---
## 10. Haystack
**Category:** Search & RAG
**Best For:** Document processing pipelines
**Language:** Python
### Overview
Haystack excels at building search and retrieval systems, with strong support for document processing pipelines.
### Strengths
- Excellent for search applications
- Strong document processing
- Production-tested
- Good pipeline abstractions
### Limitations
- Search/RAG focused
- Less suited for general agents
- Pipeline-centric design
### When to Use
Choose Haystack when building search, Q&A, or document processing systems.
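The pipeline-centric design is easiest to see in code. Here is a minimal sketch of a Haystack 2.x pipeline with the in-memory BM25 retriever; it assumes the `haystack-ai` package, and import paths differ between Haystack 1.x and 2.x.

```python
# Retrieval pipeline sketch. Assumes the haystack-ai (2.x) package; adjust
# import paths for your version.
from haystack import Document, Pipeline
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.document_stores.in_memory import InMemoryDocumentStore

store = InMemoryDocumentStore()
store.write_documents([
    Document(content="Haystack focuses on search and RAG pipelines."),
    Document(content="Pipelines are built from named, connected components."),
])

pipeline = Pipeline()
pipeline.add_component("retriever", InMemoryBM25Retriever(document_store=store))

# Each component receives its inputs by name when the pipeline runs.
result = pipeline.run({"retriever": {"query": "What does Haystack focus on?"}})
for doc in result["retriever"]["documents"]:
    print(doc.content)
```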
---
## Decision Framework
### Choose Based on Your Primary Need
| Need | Recommended Framework |
|------|----------------------|
| RAG / Document apps | LangChain, Haystack |
| Role-based teams | CrewAI |
| Conversational agents | AutoGen |
| Self-improving systems | Aden |
| Type-safe outputs | PydanticAI |
| Simple prototypes | Swarm |
| Research simulations | CAMEL |
| Long-term memory | Letta |
| TypeScript apps | Mastra |
### Choose Based on Production Requirements
| Requirement | Best Options |
|-------------|--------------|
| Self-healing & adaptation | Aden |
| Mature ecosystem | LangChain |
| Cost management built-in | Aden |
| Simple deployment | Swarm, CrewAI |
| Enterprise support | LangChain, AutoGen |
| Real-time monitoring | Aden |
---
## Conclusion
The "best" framework depends on your specific needs:
- **For most RAG applications:** LangChain remains the standard
- **For collaborative agent teams:** CrewAI offers intuitive design
- **For systems that need to evolve:** Aden's self-improving approach is unique
- **For research:** CAMEL provides scale
- **For simplicity:** Swarm is hard to beat
Consider your production requirements, team expertise, and whether you need agents that can adapt and improve over time when making your decision.
---
*Last updated: January 2025*
+165
@@ -0,0 +1,165 @@
# 🚀 Getting Started Challenge
Welcome to Aden! This challenge will help you get familiar with our project and community. Complete all tasks to earn your first badge!
**Difficulty:** Beginner
**Time:** ~30 minutes
**Prerequisites:** GitHub account
---
## Part 1: Join the Aden Community (10 points)
### Task 1.1: Star the Repository ⭐
Show your support by starring our repo!
1. Go to [github.com/adenhq/hive](https://github.com/adenhq/hive)
2. Click the **Star** button in the top right
3. **Screenshot** your starred repo (showing the star count)
### Task 1.2: Watch the Repository 👁️
Stay updated with our latest changes!
1. Click the **Watch** button
2. Select **"All Activity"** to get notifications
3. **Screenshot** your watch settings
### Task 1.3: Fork the Repository 🍴
Create your own copy to experiment with!
1. Click the **Fork** button
2. Keep the default settings and create the fork
3. **Screenshot** your forked repository
### Task 1.4: Join Discord 💬
Connect with our community!
1. Join our [Discord server](https://discord.com/invite/MXE49hrKDk)
2. Introduce yourself in `#introductions`
3. **Screenshot** your introduction message
---
## Part 2: Explore Aden (15 points)
### Task 2.1: README Scavenger Hunt 🔍
Find the answers to these questions by reading our README:
1. What are the **three LLM providers** Aden supports out of the box?
2. How many **MCP tools** does the Hive Control Plane provide?
3. What is the name of the **frontend dashboard**?
4. In the "How It Works" section, what is **Step 5**?
5. In which city is Aden made with passion?
### Task 2.2: Architecture Quiz 🏗️
Based on the architecture diagram in the README:
1. What are the three databases in the Storage Layer?
2. Name two components inside an "SDK-Wrapped Node"
3. What connects the Control Plane to the Dashboard?
4. Where does "Failure Data" flow to in the diagram?
### Task 2.3: Comparison Challenge 📊
From the Comparison Table, answer:
1. What category is CrewAI in?
2. What's the Aden difference compared to LangChain?
3. Which framework focuses on "emergent behavior in large-scale simulations"?
---
## Part 3: Quick Code Exploration (15 points)
### Task 3.1: Project Structure 📁
Clone your fork and explore the codebase:
```bash
git clone https://github.com/YOUR_USERNAME/hive.git
cd hive
```
Answer these questions:
1. What is the main frontend folder called?
2. What is the main backend folder called?
3. What file would you edit to configure the application?
4. What's the Docker command to start all services (hint: check README)?
### Task 3.2: Find the Features 🎯
Look through the codebase to find:
1. Where are the MCP tools defined? (provide the file path)
2. What port does the API run on? (hint: check README or docker-compose)
3. Find one TypeScript interface related to agents (provide file path and interface name)
---
## Part 4: Creative Challenge (10 points)
### Task 4.1: Agent Idea 💡
Aden can build self-improving agents for any use case. Propose ONE creative agent idea:
1. **Name:** Give your agent a catchy name
2. **Goal:** What problem does it solve? (2-3 sentences)
3. **Self-Improvement:** How would it get better over time when things fail?
4. **Human-in-the-Loop:** When would it need human input?
Example format:
```
Name: DocBot
Goal: Automatically keeps documentation in sync with code changes.
Monitors PRs and updates relevant docs.
Self-Improvement: When docs get rejected in review, it learns the feedback
and adjusts its writing style and coverage.
Human-in-the-Loop: Major architectural changes require human approval
before doc updates go live.
```
---
## Submission Checklist
Before submitting, make sure you have:
- [ ] Screenshots from Part 1 (Star, Watch, Fork, Discord)
- [ ] Answers to all Part 2 questions
- [ ] Answers to all Part 3 questions
- [ ] Your creative agent idea from Part 4
### How to Submit
1. Create a GitHub Gist at [gist.github.com](https://gist.github.com)
2. Name it `aden-getting-started-YOURNAME.md`
3. Include all your answers and screenshots (use image hosting like imgur for screenshots)
4. Email the Gist link to `careers@adenhq.com`
- Subject: `[Getting Started Challenge] Your Name`
- Include your GitHub username
---
## Scoring
| Section | Points |
|---------|--------|
| Part 1: Community | 10 |
| Part 2: Explore | 15 |
| Part 3: Code | 15 |
| Part 4: Creative | 10 |
| **Total** | **50** |
**Passing score:** 40+ points
---
## What's Next?
After completing this challenge, choose your specialization:
- **Backend Engineers:** [🧠 Architecture Deep Dive](./02-architecture-deep-dive.md)
- **AI/ML Engineers:** [🤖 Build Your First Agent](./03-build-your-first-agent.md)
- **Frontend Engineers:** [🎨 Frontend Challenge](./04-frontend-challenge.md)
- **DevOps Engineers:** [🔧 DevOps Challenge](./05-devops-challenge.md)
---
Good luck! We're excited to see your submissions! 🎉
+195
@@ -0,0 +1,195 @@
# 🧠 Architecture Deep Dive Challenge
Test your understanding of Aden's architecture and backend systems. This challenge is perfect for backend engineers who want to contribute to the core framework.
**Difficulty:** Intermediate
**Time:** 1-2 hours
**Prerequisites:** Complete [Getting Started](./01-getting-started.md), familiarity with Node.js/TypeScript
---
## Part 1: System Architecture (20 points)
### Task 1.1: Component Mapping 🗺️
Study the Aden architecture and answer:
1. Describe the data flow from when a user defines a goal to when worker agents execute. Include all major components.
2. Explain the "self-improvement loop" - what happens when an agent fails?
3. What's the difference between:
- Coding Agent vs Worker Agent
- STM (Short-Term Memory) vs LTM (Long-Term Memory)
- Hot storage vs Cold storage for events
### Task 1.2: Database Design 💾
Aden uses three databases. For each, explain:
1. **TimescaleDB:** What type of data is stored? Why TimescaleDB specifically?
2. **MongoDB:** What is stored here? Why a document database?
3. **PostgreSQL:** What is its primary purpose?
### Task 1.3: Real-time Communication 📡
Answer these about the real-time systems:
1. What protocol connects the SDK to the Hive backend for policy updates?
2. How does the dashboard receive live agent metrics?
3. What is the heartbeat interval for SDK health checks?
---
## Part 2: Code Analysis (25 points)
### Task 2.1: API Routes 🛣️
Explore the backend code and document:
1. List all the main API route prefixes (e.g., `/user`, `/v1/control`, etc.)
2. For the `/v1/control` routes, what are the main endpoints and their purposes?
3. What authentication method is used for API requests?
### Task 2.2: MCP Tools Deep Dive 🔧
The MCP server provides 19 tools. Categorize them and answer:
1. List all **Budget tools** (tools with "budget" in the name)
2. List all **Analytics tools**
3. List all **Policy tools**
4. Pick ONE tool and explain:
- What parameters does it accept?
- What does it return?
- When would the Coding Agent use it?
### Task 2.3: Event Specification 📊
Find and analyze the SDK event specification:
1. What are the four event types that can be sent from SDK to server?
2. For a `MetricEvent`, list at least 5 fields that are captured
3. What is "Layer 0 content capture" and when is it used?
---
## Part 3: Design Questions (25 points)
### Task 3.1: Scaling Scenario 📈
Imagine Aden needs to handle 1000 concurrent agents across 50 teams:
1. Which components would be the bottleneck? Why?
2. How would you horizontally scale the system?
3. What database optimizations would you recommend?
4. How would you ensure team data isolation at scale?
### Task 3.2: New Feature Design 🆕
Design a new feature: **Agent Collaboration Logs**
Requirements:
- Track when agents communicate with each other
- Store the message content and metadata
- Support querying by time range, agent, or conversation thread
- Real-time streaming to the dashboard
Provide:
1. Database schema design (which DB and table structure)
2. API endpoint design (routes and payloads)
3. How would this integrate with existing event batching?
### Task 3.3: Failure Handling ⚠️
The self-healing loop is core to Aden. Design the detailed flow:
1. How should failures be categorized (types of failures)?
2. What data should be captured for the Coding Agent to improve?
3. How do you prevent infinite failure loops?
4. When should the system escalate to human intervention?
---
## Part 4: Practical Implementation (30 points)
### Task 4.1: Write a New MCP Tool 🛠️
Create a new MCP tool called `hive_agent_performance_report`:
**Requirements:**
- Returns performance metrics for a specific agent over a time period
- Includes: total requests, success rate, avg latency, total cost
- Accepts parameters: `agent_id`, `start_time`, `end_time`
Provide:
1. Tool definition (name, description, input schema)
2. Implementation pseudocode or actual TypeScript
3. Example request and response
### Task 4.2: Budget Enforcement Algorithm 💰
Implement the logic for budget enforcement:
```typescript
interface BudgetCheck {
action: 'allow' | 'block' | 'throttle' | 'degrade';
reason: string;
degradedModel?: string;
delayMs?: number;
}
function checkBudget(
currentSpend: number,
budgetLimit: number,
requestedModel: string,
estimatedCost: number
): BudgetCheck {
// Your implementation here
}
```
Requirements:
- Block if budget would be exceeded
- Throttle (2000ms delay) if ≥95% used
- Degrade to cheaper model if ≥80% used
- Allow otherwise
### Task 4.3: Event Aggregation Query 📈
Write a SQL query for TimescaleDB that:
1. Aggregates metrics by hour for the last 24 hours
2. Groups by model and provider
3. Calculates: total tokens, total cost, avg latency, request count
4. Orders by total cost descending
---
## Submission Checklist
- [ ] All Part 1 architecture answers
- [ ] All Part 2 code analysis answers
- [ ] All Part 3 design documents
- [ ] All Part 4 implementations
### How to Submit
1. Create a GitHub Gist with your answers
2. Name it `aden-architecture-YOURNAME.md`
3. Include any code files as separate files in the Gist
4. Email to `careers@adenhq.com`
- Subject: `[Architecture Challenge] Your Name`
---
## Scoring
| Section | Points |
|---------|--------|
| Part 1: System Architecture | 20 |
| Part 2: Code Analysis | 25 |
| Part 3: Design Questions | 25 |
| Part 4: Implementation | 30 |
| **Total** | **100** |
**Passing score:** 75+ points
---
## Bonus Points (+20)
- Identify a bug or improvement in the actual codebase and open an issue
- Submit a PR fixing a documentation issue
- Create a diagram of your design using Mermaid or similar
---
Good luck! We're looking for engineers who can think systematically about distributed systems! 🏗️
+277
View File
@@ -0,0 +1,277 @@
# 🤖 Build Your First Agent Challenge
Get hands-on with AI agents! This challenge is for AI/ML engineers who want to understand agent development and contribute to Aden's agent ecosystem.
**Difficulty:** Intermediate
**Time:** 2-3 hours
**Prerequisites:** Complete [Getting Started](./01-getting-started.md), Python experience, basic LLM knowledge
---
## Part 1: Agent Fundamentals (20 points)
### Task 1.1: Core Concepts 📚
Answer these questions about Aden's agent architecture:
1. What is a "node" in Aden's architecture? How does it differ from a traditional function?
2. Explain the SDK-wrapped node concept. What four capabilities does every node get automatically?
3. What's the difference between:
- A Coding Agent and a Worker Agent
- Goal-driven vs workflow-driven development
- Predefined edges vs dynamic connections
4. Why does Aden generate "connection code" instead of using a fixed graph structure?
### Task 1.2: Memory Systems 🧠
Aden has sophisticated memory management:
1. Describe the three types of memory available to agents:
- Shared Memory
- STM (Short-Term Memory)
- LTM (Long-Term Memory / RLM)
2. When would an agent use each type?
3. How does "Session Local memory isolation" work?
### Task 1.3: Human-in-the-Loop 🙋
Explain the HITL system:
1. What triggers a human intervention point?
2. What happens if a human doesn't respond within the timeout?
3. List three scenarios where HITL would be essential
---
## Part 2: Agent Design (25 points)
### Task 2.1: Design a Multi-Agent System 🎭
Design a **Content Marketing Agent System** with multiple worker agents:
**Goal:** Automatically create and publish blog posts based on company news
Requirements:
- Must use at least 3 specialized worker agents
- Include human approval before publishing
- Handle failures gracefully
Provide:
1. **Agent Diagram:** Show all agents and how they connect
2. **Agent Descriptions:** For each agent, describe:
- Name and role
- Inputs and outputs
- Tools it needs
- Failure scenarios
3. **Human Checkpoints:** Where would humans intervene?
4. **Self-Improvement:** How would this system learn from failures?
### Task 2.2: Goal Definition 🎯
Write a natural language goal that a user might give to create your system:
```
Example Goal:
"Create a system that monitors our company RSS feed for news,
writes engaging blog posts about each news item, gets approval
from the marketing team, and publishes to our WordPress site.
If a post is rejected, learn from the feedback to write better
posts in the future."
```
Your goal should be:
- Clear and specific
- Include success criteria
- Mention failure handling
- Specify human touchpoints
### Task 2.3: Test Cases 📋
Design 5 test cases for your agent system:
| Test Case | Input | Expected Output | Success Criteria |
|-----------|-------|-----------------|------------------|
| Happy Path | Normal news item | Published blog post | Post live on site |
| ... | ... | ... | ... |
Include at least:
- 1 happy path
- 2 edge cases
- 2 failure scenarios
---
## Part 3: Practical Implementation (30 points)
### Task 3.1: Agent Pseudocode 💻
Write pseudocode for ONE of your worker agents:
```python
class ContentWriterAgent:
"""
Agent that takes news items and writes blog posts.
"""
def __init__(self, config):
# Initialize with tools, memory, LLM access
pass
async def execute(self, input_data):
# Main execution logic
pass
async def handle_failure(self, error, context):
# How to handle different types of failures
pass
async def learn_from_feedback(self, feedback):
# How to improve based on rejection feedback
pass
```
Provide detailed pseudocode with:
- LLM calls and prompts
- Memory reads/writes
- Tool usage
- Error handling
### Task 3.2: Prompt Engineering 📝
Write the actual prompts for your agent:
1. **System Prompt:** The core instructions for your agent
2. **Task Prompt Template:** How tasks are presented to the agent
3. **Feedback Learning Prompt:** How rejection feedback is processed
Example format:
```
SYSTEM PROMPT:
You are a professional content writer for {company_name}...
TASK PROMPT:
Given the following news item:
{news_content}
Write a blog post that...
FEEDBACK PROMPT:
Your previous post was rejected with this feedback:
{feedback}
Analyze what went wrong and...
```
### Task 3.3: Tool Definitions 🔧
Define the tools your agent needs:
```python
tools = [
{
"name": "search_company_knowledge",
"description": "Search internal knowledge base for relevant context",
"parameters": {
"query": "string - search query",
"limit": "int - max results (default 5)"
},
"returns": "List of relevant documents"
},
# Add more tools...
]
```
Define at least 3 tools with:
- Clear name and description
- Input parameters with types
- Return value description
- Example usage
---
## Part 4: Advanced Challenges (25 points)
### Task 4.1: Failure Evolution Design 🔄
Design the self-improvement mechanism in detail:
1. **Failure Classification:** Create a taxonomy of failures for your agent
```
- LLM Failures: rate limit, content filter, hallucination
- Tool Failures: API down, invalid response, timeout
- Logic Failures: wrong output format, missing data
- Human Rejection: quality issues, off-brand, factual error
```
2. **Learning Storage:** What data do you store for each failure type?
3. **Evolution Strategy:** How does the Coding Agent use failure data to improve?
4. **Guardrails:** What prevents the system from making things worse?
### Task 4.2: Cost Optimization 💰
Your agent system will be called frequently. Design cost optimizations:
1. **Model Selection:** When to use GPT-4 vs GPT-3.5 vs Claude Haiku?
2. **Caching Strategy:** What can be cached to reduce LLM calls?
3. **Batching:** How can you batch operations for efficiency?
4. **Budget Rules:** Design budget rules for your system
### Task 4.3: Observability Dashboard 📊
Design what metrics should be tracked for your agent system:
1. **Performance Metrics:** (at least 5)
2. **Quality Metrics:** (at least 3)
3. **Cost Metrics:** (at least 3)
4. **Alert Conditions:** When should the system alert humans?
---
## Submission Checklist
- [ ] All Part 1 concept answers
- [ ] Complete multi-agent design (Part 2)
- [ ] Implementation code/pseudocode (Part 3)
- [ ] Advanced challenge solutions (Part 4)
### How to Submit
1. Create a GitHub Gist with your answers
2. Name it `aden-agent-challenge-YOURNAME.md`
3. Include code files separately
4. If you created diagrams, include images
5. Email to `careers@adenhq.com`
- Subject: `[Agent Challenge] Your Name`
---
## Scoring
| Section | Points |
|---------|--------|
| Part 1: Fundamentals | 20 |
| Part 2: Design | 25 |
| Part 3: Implementation | 30 |
| Part 4: Advanced | 25 |
| **Total** | **100** |
**Passing score:** 75+ points
---
## Bonus Points (+25)
- **+10:** Actually implement a working prototype using any framework
- **+10:** Create a demo video of your agent in action
- **+5:** Submit a PR adding your agent as a template to the repo
---
## Example Agent Templates
Need inspiration? Here are some agent ideas:
1. **Research Agent:** Gathers information from multiple sources
2. **Code Review Agent:** Reviews PRs and suggests improvements
3. **Customer Support Agent:** Handles support tickets with escalation
4. **Data Pipeline Agent:** Monitors and fixes data quality issues
5. **Meeting Agent:** Summarizes meetings and creates action items
---
Good luck! We're excited to see your creative agent designs! 🤖✨
+277
View File
@@ -0,0 +1,277 @@
# 🎨 Frontend Challenge
Build beautiful, functional interfaces for AI agent management! This challenge is for frontend engineers who want to contribute to Honeycomb, Aden's dashboard.
**Difficulty:** Intermediate
**Time:** 1-2 hours
**Prerequisites:** Complete [Getting Started](./01-getting-started.md), React/TypeScript experience
---
## Part 1: Codebase Exploration (15 points)
### Task 1.1: Tech Stack Analysis 🔍
Explore the `honeycomb/` directory and answer:
1. What React version is used?
2. What styling solution is used? (Tailwind, CSS Modules, etc.)
3. What state management approach is used?
4. What charting library is used for analytics?
5. How does the frontend communicate with the backend in real-time?
### Task 1.2: Component Structure 📁
Map out the component architecture:
1. List the main page components (routes)
2. Find and describe 3 reusable components
3. Where are TypeScript types defined for agent data?
4. How is authentication handled in the frontend?
### Task 1.3: Design System 🎨
Analyze the UI patterns:
1. What UI component library is used? (Radix, shadcn, etc.)
2. Find 3 custom components that aren't from a library
3. What color scheme/theme approach is used?
4. How are loading and error states typically handled?
---
## Part 2: UI/UX Analysis (20 points)
### Task 2.1: Dashboard Critique 📊
Based on the codebase and agent control types, analyze what the dashboard likely shows:
1. What key metrics would you display for agent monitoring?
2. How would you visualize the agent graph/connections?
3. What real-time updates are most important to show?
4. Critique: What could be improved in the current approach?
### Task 2.2: User Flow Design 🔄
Design the user flow for this feature:
**Feature:** "Create New Agent from Goal"
Map out:
1. Entry point (where does the user start?)
2. Step-by-step screens needed
3. Form fields and validation
4. Success/error states
5. How to show agent generation progress
Provide a wireframe (can be ASCII, hand-drawn, or Figma):
```
+----------------------------------+
| Create New Agent |
|----------------------------------|
| Step 1: Define Your Goal |
| +----------------------------+ |
| | Describe what you want | |
| | your agent to achieve... | |
| +----------------------------+ |
| |
| [ ] Include human checkpoints |
| [ ] Enable cost controls |
| |
| [Cancel] [Next Step] |
+----------------------------------+
```
### Task 2.3: Accessibility Audit ♿
Consider accessibility for the agent dashboard:
1. List 5 accessibility requirements for a data-heavy dashboard
2. How would you make real-time updates accessible?
3. What keyboard navigation is essential?
4. How would you handle screen readers for the agent graph visualization?
---
## Part 3: Implementation Challenges (35 points)
### Task 3.1: Build a Component 🧱
Create a React component: `AgentStatusCard`
Requirements:
- Display agent name, status, and key metrics
- Status: online (green), degraded (yellow), offline (red), unknown (gray)
- Show: requests/min, success rate, avg latency, cost today
- Include a mini sparkline chart for requests over last hour
- Expandable to show more details
- TypeScript with proper types
```tsx
interface AgentStatusCardProps {
agent: {
id: string;
name: string;
status: 'online' | 'degraded' | 'offline' | 'unknown';
metrics: {
requestsPerMinute: number;
successRate: number;
avgLatency: number;
costToday: number;
requestHistory: number[]; // last 60 minutes
};
};
onExpand?: () => void;
expanded?: boolean;
}
export function AgentStatusCard({ agent, onExpand, expanded }: AgentStatusCardProps) {
// Your implementation
}
```
### Task 3.2: Real-time Hook 🔌
Create a custom hook for real-time agent metrics:
```tsx
interface UseAgentMetricsOptions {
agentId: string;
refreshInterval?: number;
}
interface UseAgentMetricsResult {
metrics: AgentMetrics | null;
isLoading: boolean;
error: Error | null;
lastUpdated: Date | null;
}
function useAgentMetrics(options: UseAgentMetricsOptions): UseAgentMetricsResult {
// Your implementation
// Should handle:
// - WebSocket subscription for real-time updates
// - Fallback to polling if WebSocket unavailable
// - Cleanup on unmount
// - Error handling and retry logic
}
```
### Task 3.3: Data Visualization 📈
Design and implement a cost breakdown chart component:
Requirements:
- Show cost by model (GPT-4, Claude, etc.) as a donut/pie chart
- Show cost over time as a line/area chart
- Toggle between daily/weekly/monthly views
- Animate transitions between views
- Show tooltip with details on hover
Provide:
1. Component interface/props
2. Implementation (can use Recharts, Vega, or any library)
3. Example mock data
4. Responsive design considerations
---
## Part 4: Advanced Frontend (30 points)
### Task 4.1: Agent Graph Visualization 🕸️
Design how to visualize the agent graph:
**Challenge:** Show a dynamic graph where:
- Nodes are agents
- Edges are connections between agents
- Real-time data flows are animated
- Users can zoom, pan, and click for details
Provide:
1. Library choice and justification (D3, React Flow, Cytoscape, etc.)
2. Component architecture
3. Performance considerations for 50+ nodes
4. Interaction design (how users explore the graph)
5. Code sketch for the main component
### Task 4.2: Optimistic UI for Budget Controls 💰
Implement optimistic UI for budget updates:
**Scenario:** User changes an agent's budget limit
- Update should appear instantly
- Backend validation may reject the change
- Must handle race conditions with real-time updates
Provide:
1. State management approach
2. Rollback mechanism on failure
3. Conflict resolution strategy
4. User feedback design
```tsx
function useBudgetUpdate(agentId: string) {
// Your implementation showing:
// - Optimistic update
// - Server sync
// - Rollback on error
// - Conflict handling
}
```
### Task 4.3: Performance Optimization ⚡
The dashboard shows data for 100+ agents with real-time updates.
Design optimizations for:
1. **Rendering:** How to prevent unnecessary re-renders?
2. **Data:** How to handle high-frequency WebSocket updates?
3. **Memory:** How to prevent memory leaks with subscriptions?
4. **Initial Load:** How to prioritize visible content?
Provide specific techniques and code examples for each.
---
## Submission Checklist
- [ ] All Part 1 exploration answers
- [ ] Part 2 wireframes and design analysis
- [ ] Part 3 component implementations
- [ ] Part 4 advanced designs
### How to Submit
1. Create a GitHub Gist with your answers
2. Name it `aden-frontend-YOURNAME.md`
3. Include code files as separate Gist files
4. If you created working code, include a CodeSandbox/StackBlitz link
5. Email to `careers@adenhq.com`
- Subject: `[Frontend Challenge] Your Name`
---
## Scoring
| Section | Points |
|---------|--------|
| Part 1: Exploration | 15 |
| Part 2: UI/UX | 20 |
| Part 3: Implementation | 35 |
| Part 4: Advanced | 30 |
| **Total** | **100** |
**Passing score:** 75+ points
---
## Bonus Points (+20)
- **+10:** Create a working prototype in CodeSandbox
- **+5:** Submit a PR improving existing UI
- **+5:** Create a Figma design for a new feature
---
## Resources
- [React Documentation](https://react.dev)
- [Tailwind CSS](https://tailwindcss.com)
- [Radix UI](https://radix-ui.com)
- [Recharts](https://recharts.org)
- [React Flow](https://reactflow.dev) (for graph visualization)
---
Good luck! We love engineers who care about user experience! 🎨✨
+309
@@ -0,0 +1,309 @@
# 🔧 DevOps Challenge
Master the deployment and operations of AI agent infrastructure! This challenge is for DevOps and Platform engineers who want to ensure Aden runs reliably at scale.
**Difficulty:** Advanced
**Time:** 2-3 hours
**Prerequisites:** Complete [Getting Started](./01-getting-started.md), Docker, Linux, CI/CD experience
---
## Part 1: Infrastructure Analysis (20 points)
### Task 1.1: Docker Deep Dive 🐳
Analyze the Aden Docker setup:
1. List all services defined in `docker-compose.yml`
2. What's the purpose of `docker-compose.override.yml`?
3. How is hot reload enabled for development?
4. What volumes are mounted and why?
5. What networking mode is used between services?
### Task 1.2: Service Dependencies 🔗
Map the service dependencies:
1. Create a dependency diagram showing which services depend on which
2. What's the startup order? Does it matter?
3. What happens if MongoDB is unavailable?
4. What happens if Redis is unavailable?
5. Which services are stateless vs stateful?
### Task 1.3: Configuration Management ⚙️
Analyze how configuration works:
1. How does `config.yaml` get generated?
2. What environment variables are required?
3. How are secrets managed? (API keys, database passwords)
4. What's the difference between dev and prod configs?
---
## Part 2: Deployment Scenarios (25 points)
### Task 2.1: Production Deployment Plan 📋
Design a production deployment for a company with:
- 100 active agents
- 10,000 LLM requests/day
- 99.9% uptime requirement
- Multi-region support needed
Provide:
1. **Infrastructure diagram** (cloud provider of your choice)
2. **Service sizing** (CPU, memory for each component)
3. **Database setup** (primary/replica, backups)
4. **Load balancing strategy**
5. **Estimated monthly cost**
### Task 2.2: Kubernetes Migration 🚢
Convert the Docker Compose setup to Kubernetes:
1. Create a Kubernetes deployment manifest for the Hive backend
2. Create a Service and Ingress for external access
3. Design a ConfigMap for configuration
4. Create a Secret for sensitive data
5. Set up a HorizontalPodAutoscaler
```yaml
# Provide your manifests here
apiVersion: apps/v1
kind: Deployment
metadata:
name: hive-backend
spec:
# Your implementation
```
### Task 2.3: High Availability Design 🔄
Design for high availability:
1. How would you handle backend service failures?
2. How would you handle database failover?
3. What's your strategy for zero-downtime deployments?
4. How would you handle WebSocket connections during rolling updates?
5. Design a disaster recovery plan
---
## Part 3: CI/CD Pipeline (25 points)
### Task 3.1: GitHub Actions Pipeline 🔄
Create a complete CI/CD pipeline:
```yaml
# .github/workflows/ci-cd.yml
name: Aden CI/CD
on:
push:
branches: [main, develop]
pull_request:
branches: [main]
jobs:
# Your implementation should include:
# - Linting
# - Type checking
# - Unit tests
# - Integration tests
# - Build Docker images
# - Push to registry
# - Deploy to staging (on develop)
# - Deploy to production (on main, with approval)
```
Include:
1. Separate jobs for frontend and backend
2. Matrix testing for multiple Node versions
3. Docker layer caching
4. Deployment gates/approvals
5. Rollback strategy
### Task 3.2: Testing Strategy 🧪
Design the testing infrastructure:
1. **Unit Tests:** What to test? How to mock LLM calls?
2. **Integration Tests:** How to test with real databases?
3. **E2E Tests:** What user flows to test?
4. **Load Tests:** How to simulate agent traffic?
5. **Chaos Tests:** What failures to simulate?
Provide example test configurations for each type.
### Task 3.3: Environment Management 🌍
Design environment strategy:
| Environment | Purpose | Data | Who Can Access |
|-------------|---------|------|----------------|
| Local | Development | Mock | Developers |
| Dev | Integration | Sanitized | Engineering |
| Staging | Pre-prod | Copy of prod | Engineering + QA |
| Production | Live | Real | Restricted |
For each environment, specify:
1. How it's provisioned
2. How data is managed
3. How deployments happen
4. Access control
---
## Part 4: Observability & Operations (30 points)
### Task 4.1: Monitoring Stack 📊
Design a comprehensive monitoring solution:
1. **Metrics:** What to collect? (list at least 10 key metrics)
2. **Logs:** Logging strategy and aggregation
3. **Traces:** Distributed tracing for agent flows
4. **Dashboards:** Design 3 key dashboards
```yaml
# Provide a docker-compose addition for monitoring
services:
prometheus:
# Your config
grafana:
# Your config
# Add more as needed
```
### Task 4.2: Alerting Rules 🚨
Create alerting rules for critical scenarios:
```yaml
# Prometheus alerting rules
groups:
- name: aden-critical
rules:
- alert: HighErrorRate
expr: # Your expression
for: 5m
labels:
severity: critical
annotations:
summary: "High error rate detected"
description: # Your description
# Add more alerts for:
# - Service down
# - High latency
# - Budget exceeded
# - Database connection issues
# - Memory pressure
```
Create at least 8 alert rules covering different failure modes.
### Task 4.3: Incident Response 🆘
Create an incident response runbook:
**Scenario:** Agent response times spike to 30 seconds (normal: 2 seconds)
Provide:
1. **Detection:** How was this discovered?
2. **Triage:** Initial investigation steps
3. **Diagnosis:** Decision tree for root causes
4. **Resolution:** Steps for each root cause
5. **Post-mortem:** Template for incident review
```markdown
# Runbook: High Agent Latency
## Symptoms
- Agent response times > 10s
- Dashboard showing degraded status
## Initial Triage
- [ ] Is this affecting all agents or specific ones?
- [ ] Is the backend healthy? (check the health endpoint)
- [ ] Are the databases responsive?
...
## Diagnostic Steps
...
## Resolution Steps
### If LLM Provider Issue:
...
### If Database Issue:
...
```
---
## Part 5: Security Hardening (Bonus - 20 points)
### Task 5.1: Security Audit 🔒
Perform a security analysis:
1. **Network:** What ports are exposed? Are they necessary?
2. **Secrets:** How are secrets currently handled? Improvements?
3. **Authentication:** How is API auth implemented?
4. **Container Security:** What image scanning would you add?
5. **Database Security:** What hardening is needed?
### Task 5.2: Compliance Checklist ✅
For SOC 2 compliance, what changes are needed?
1. Access control improvements
2. Audit logging requirements
3. Encryption requirements
4. Data retention policies
5. Incident response requirements
---
## Submission Checklist
- [ ] Part 1 infrastructure analysis
- [ ] Part 2 deployment designs and manifests
- [ ] Part 3 CI/CD pipeline YAML
- [ ] Part 4 monitoring and alerting configs
- [ ] (Bonus) Part 5 security analysis
### How to Submit
1. Create a GitHub Gist with your answers
2. Name it `aden-devops-YOURNAME.md`
3. Include all YAML/configuration files
4. Include any diagrams (use Mermaid, ASCII, or image links)
5. Email to `careers@adenhq.com`
- Subject: `[DevOps Challenge] Your Name`
---
## Scoring
| Section | Points |
|---------|--------|
| Part 1: Infrastructure | 20 |
| Part 2: Deployment | 25 |
| Part 3: CI/CD | 25 |
| Part 4: Observability | 30 |
| Part 5: Security (Bonus) | +20 |
| **Total** | **100 (+20)** |
**Passing score:** 75+ points
---
## Bonus Points (+15)
- **+5:** Set up a working local Kubernetes cluster with Aden
- **+5:** Create a Terraform module for cloud deployment
- **+5:** Submit a PR improving deployment documentation
---
## Resources
- [Docker Documentation](https://docs.docker.com)
- [Kubernetes Documentation](https://kubernetes.io/docs)
- [GitHub Actions](https://docs.github.com/en/actions)
- [Prometheus](https://prometheus.io/docs)
- [Grafana](https://grafana.com/docs)
---
Good luck! We're looking for engineers who keep systems running smoothly! 🔧✨
+46
@@ -0,0 +1,46 @@
# Aden Engineering Challenges
Welcome to the Aden Engineering Challenges! These hands-on tracks are designed for students and applicants who want to join the Aden team or contribute to our open-source projects.
## How It Works
1. **Choose your track** based on your interests and skill level
2. **Complete the challenges** in order
3. **Submit your work** as instructed in each challenge
4. **Get noticed** by the Aden team!
## Available Tracks
| Track | Difficulty | Time Estimate | Best For |
|-------|------------|---------------|----------|
| [🚀 Getting Started](./01-getting-started.md) | Beginner | 30 min | Everyone - Start Here! |
| [🧠 Architecture Deep Dive](./02-architecture-deep-dive.md) | Intermediate | 1-2 hours | Backend Engineers |
| [🤖 Build Your First Agent](./03-build-your-first-agent.md) | Intermediate | 2-3 hours | AI/ML Engineers |
| [🎨 Frontend Challenge](./04-frontend-challenge.md) | Intermediate | 1-2 hours | Frontend Engineers |
| [🔧 DevOps Challenge](./05-devops-challenge.md) | Advanced | 2-3 hours | DevOps/Platform Engineers |
## Why Complete These Challenges?
- 📚 **Learn** about cutting-edge AI agent technology
- 🏆 **Stand out** in your application to Aden
- 🤝 **Connect** with the Aden engineering team
- 🌟 **Contribute** to an exciting open-source project
- 💼 **Showcase** your skills with real-world projects
## Submission Guidelines
After completing challenges, submit your work by:
1. Creating a GitHub Gist with your answers
2. Emailing the link to `careers@adenhq.com` with subject: `[Engineering Challenge] Your Name - Track Name`
3. Include your GitHub username in the email
## Getting Help
- Join our [Discord](https://discord.com/invite/MXE49hrKDk) and ask in `#applicant-challenges`
- Check out the [documentation](https://docs.adenhq.com/)
- Review the [README](../../README.md) for project overview
---
**Ready to begin?** Start with [🚀 Getting Started](./01-getting-started.md)!