docs: remove outdated documents

Richard Tang
2026-03-24 18:23:38 -07:00
parent 3154e34c7a
commit 645792fb1a
16 changed files with 0 additions and 4232 deletions
# Aden Listicles & Comparisons
Educational content comparing AI agent frameworks and exploring the agent development landscape.
## Articles
| Article | Topic | Keywords |
|---------|-------|----------|
| [Top 10 AI Agent Frameworks in 2025](./top-10-ai-agent-frameworks-2025.md) | Overview | ai agents, frameworks, comparison |
| [Aden vs LangChain](./aden-vs-langchain.md) | Comparison | langchain, rag, llm apps |
| [Aden vs CrewAI](./aden-vs-crewai.md) | Comparison | crewai, multi-agent, orchestration |
| [Aden vs AutoGen](./aden-vs-autogen.md) | Comparison | autogen, microsoft, conversational |
| [Self-Improving vs Static Agents](./self-improving-vs-static-agents.md) | Concept | self-evolution, adaptation |
| [Human-in-the-Loop Guide](./human-in-the-loop-ai-agents.md) | Guide | hitl, human oversight, safety |
| [AI Agent Cost Management](./ai-agent-cost-management-guide.md) | Guide | cost control, budget, optimization |
| [Building Production AI Agents](./building-production-ai-agents.md) | Guide | production, deployment, reliability |
| [Multi-Agent vs Single-Agent](./multi-agent-vs-single-agent-systems.md) | Concept | architecture, design patterns |
| [AI Agent Observability](./ai-agent-observability-monitoring.md) | Guide | monitoring, observability, debugging |
## Purpose
These articles help developers:
- Understand the AI agent landscape
- Make informed framework choices
- Learn best practices for agent development
- Compare different approaches objectively
## Contributing
Want to add or improve an article? See [CONTRIBUTING.md](../../CONTRIBUTING.md).
# Aden vs AutoGen: A Detailed Comparison
*Comparing self-evolving agents with conversational multi-agent systems*
---
Microsoft's AutoGen and Aden both enable multi-agent systems but serve different purposes. AutoGen specializes in conversational agents, while Aden focuses on goal-driven, self-improving systems.
---
## Overview
| Aspect | AutoGen | Aden |
|--------|---------|------|
| **Developed By** | Microsoft | Aden |
| **Philosophy** | Conversational agents | Goal-driven, self-evolving |
| **Primary Pattern** | Multi-agent conversations | Node-based agent graphs |
| **Communication** | Natural language dialogue | Generated connection code |
| **Self-Improvement** | No | Yes |
| **Best For** | Dialogue-heavy applications | Production agent systems |
| **License** | MIT | Apache 2.0 |
---
## Philosophy & Approach
### AutoGen
AutoGen enables agents to **communicate through natural language conversations**. Agents chat with each other to solve problems collaboratively.
```python
# AutoGen: Conversation-based agents
from autogen import AssistantAgent, UserProxyAgent
assistant = AssistantAgent(
    name="assistant",
    llm_config={"model": "gpt-4"}
)

user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="TERMINATE",
    code_execution_config={"work_dir": "coding"}
)

# Agents solve problems through conversation
user_proxy.initiate_chat(
    assistant,
    message="Create a Python script to analyze sales data"
)
```
### Aden
Aden uses a **coding agent to generate complete agent systems** from goals. Agents are connected through generated code, not just conversation.
```python
# Aden: Goal-driven agent generation
goal = """
Build a data analysis system that:
1. Ingests sales data from multiple sources
2. Generates insights and visualizations
3. Creates weekly summary reports
4. Escalates anomalies to the data team
When analysis fails or produces incorrect results,
learn from the corrections to improve accuracy.
"""
# Aden generates specialized agents with:
# - Data ingestion tools
# - Analysis capabilities
# - Visualization outputs
# - Human escalation for anomalies
# - Self-improvement from feedback
```
---
## Feature Comparison
### Communication Model
| Feature | AutoGen | Aden |
|---------|---------|------|
| Agent-to-agent | Natural language | Generated connections |
| Conversation history | Built-in | Via shared memory |
| Message passing | Sequential turns | Async/event-driven |
| Human interaction | Via UserProxyAgent | Client-facing nodes |
**Verdict:** AutoGen is more natural for dialogue; Aden is more flexible for diverse patterns.
### Code Execution
| Feature | AutoGen | Aden |
|---------|---------|------|
| Code execution | Built-in (sandboxed) | Via tools |
| Language support | Python (primarily) | Multi-language via tools |
| Execution safety | Docker containers | Tool-level sandboxing |
| Result handling | Conversation flow | Structured outputs |
**Verdict:** AutoGen has stronger built-in code execution; Aden uses tool abstraction.
### Multi-Agent Patterns
| Feature | AutoGen | Aden |
|---------|---------|------|
| Group chat | Native support | Via graph connections |
| Hierarchical | Nested conversations | Node hierarchies |
| Dynamic agents | Limited | Coding agent creates as needed |
| Agent discovery | Manual | Auto-generated |
**Verdict:** AutoGen excels at chat patterns; Aden is more flexible for non-chat workflows.
### Production Features
| Feature | AutoGen | Aden |
|---------|---------|------|
| Monitoring | Basic logging | Full dashboard |
| Cost tracking | Manual | Automatic |
| Budget controls | Not built-in | Native |
| Self-improvement | No | Yes |
**Verdict:** Aden is significantly more production-ready.
---
## Code Comparison
### Building a Coding Assistant
#### AutoGen Approach
```python
from autogen import AssistantAgent, UserProxyAgent, GroupChat, GroupChatManager
# Define specialized agents
coder = AssistantAgent(
    name="coder",
    system_message="You are a Python expert...",
    llm_config=llm_config
)

reviewer = AssistantAgent(
    name="reviewer",
    system_message="You review code for bugs and improvements...",
    llm_config=llm_config
)

executor = UserProxyAgent(
    name="executor",
    human_input_mode="NEVER",
    code_execution_config={"work_dir": "workspace"}
)

# Create group chat
group_chat = GroupChat(
    agents=[coder, reviewer, executor],
    messages=[],
    max_round=10
)
manager = GroupChatManager(groupchat=group_chat, llm_config=llm_config)

# Start conversation
executor.initiate_chat(
    manager,
    message="Create a data processing pipeline"
)
# Conversation happens naturally between agents
# Each agent responds based on their role
```
#### Aden Approach
```python
# Define goal for coding assistant system
goal = """
Build a code development system that:
1. Understands coding requests and breaks them into tasks
2. Writes Python code following best practices
3. Reviews code for bugs, security issues, and improvements
4. Executes code in a safe environment
5. Iterates based on execution results
Human review required for:
- Code that accesses external services
- Changes to production systems
- Code handling sensitive data
Self-improvement:
- Learn from code review feedback
- Track which patterns cause bugs
- Improve based on execution failures
"""
# Aden creates:
# - Task decomposition agent
# - Coder agent with best practices
# - Reviewer agent with learned patterns
# - Safe execution environment
# - Human checkpoints for sensitive operations
# - Feedback loop for continuous improvement
```
---
## Use Case Comparison
### Best for AutoGen
1. **Conversational AI applications**
- Chatbots with multiple personalities
- Customer service with specialist handoffs
- Interactive tutoring systems
2. **Code generation through dialogue**
- Pair programming assistants
- Code review discussions
- Debugging conversations
3. **Research and exploration**
- Collaborative problem solving
- Multi-perspective analysis
- Brainstorming sessions
### Best for Aden
1. **Production agent systems**
- Customer support with evolution
- Data pipelines that self-correct
- Content systems that improve
2. **Goal-oriented automation**
- Business process automation
- Monitoring and alerting
- Report generation
3. **Systems requiring adaptation**
- Changing requirements
- Learning from failures
- Continuous improvement
---
## Detailed Comparisons
### Conversation Management
| Aspect | AutoGen | Aden |
|--------|---------|------|
| Turn management | Automatic | Event-driven |
| Context window | Managed | Via memory tools |
| History persistence | Session-based | Durable storage |
| Branching conversations | Supported | Via graph structure |
### Error Handling
| Aspect | AutoGen | Aden |
|--------|---------|------|
| Execution errors | Retry in conversation | Capture and evolve |
| Logic errors | Agent discussion | Failure analysis |
| Recovery | Manual intervention | Automatic adaptation |
| Learning | No | Built-in |
### Integration
| Aspect | AutoGen | Aden |
|--------|---------|------|
| External tools | Function calling | Tool nodes |
| APIs | Custom integration | SDK support |
| Databases | Via code execution | Native connections |
| Enterprise systems | Custom | MCP tools |
---
## When to Choose AutoGen
AutoGen is the better choice when:
1. **Conversation is the core pattern** - Your agents primarily communicate through dialogue
2. **Code execution is central** - Need built-in sandboxed execution
3. **Microsoft ecosystem** - Already invested in Microsoft AI tools
4. **Research applications** - Exploring multi-agent conversations
5. **Flexible dialogue** - Agents need natural back-and-forth
6. **Quick prototypes** - Simple multi-agent conversations
---
## When to Choose Aden
Aden is the better choice when:
1. **Production requirements** - Need monitoring, cost control, health checks
2. **Self-improvement matters** - System should evolve from failures
3. **Goal-driven development** - Prefer describing outcomes
4. **Non-conversational patterns** - Workflows beyond dialogue
5. **Cost management** - Need budget enforcement
6. **Human-in-the-loop** - Require structured intervention points
7. **Long-running systems** - Agents operating continuously
---
## Hybrid Architectures
### AutoGen Agents in Aden
AutoGen conversations can be wrapped as Aden nodes:
```python
# AutoGen conversation as a node in Aden's graph
class AutoGenConversationNode:
    """Wraps an AutoGen chat so it can run as a single Aden node (sketch)."""

    def execute(self, input):
        # Run the AutoGen conversation to completion here,
        # then return its result as structured output that
        # downstream nodes in the graph can consume.
        pass
```
### Benefits of Hybrid
- Use AutoGen's conversation for dialogue-heavy tasks
- Use Aden's orchestration and monitoring
- Get self-improvement across the system
- Maintain cost controls
---
## Performance Considerations
| Metric | AutoGen | Aden |
|--------|---------|------|
| Latency per turn | Higher (full responses) | Optimized per node |
| Token efficiency | Conversation overhead | Direct communication |
| Scalability | Memory-bound | Distributed-ready |
| Cost tracking | Manual | Automatic |
---
## Community & Support
| Aspect | AutoGen | Aden |
|--------|---------|------|
| Backing | Microsoft Research | Y Combinator startup |
| Community | Large, active | Growing |
| Documentation | Comprehensive | Good and improving |
| Enterprise support | Microsoft channels | Direct team support |
---
## Conclusion
**AutoGen** excels at creating agents that collaborate through natural language conversations. It's ideal for dialogue-heavy applications and leverages Microsoft's AI expertise.
**Aden** provides goal-driven, self-improving agent systems with production features built-in. It's better for systems that need to evolve and require operational visibility.
### Quick Decision Guide
| Your Need | Choose |
|-----------|--------|
| Conversational agents | AutoGen |
| Code execution focus | AutoGen |
| Self-improving systems | Aden |
| Production monitoring | Aden |
| Microsoft ecosystem | AutoGen |
| Cost management | Aden |
| Natural dialogue | AutoGen |
| Goal-driven development | Aden |
---
*Last updated: January 2025*
# Aden vs CrewAI: A Detailed Comparison
*Comparing self-evolving agents with role-based agent teams*
---
CrewAI and Aden both focus on multi-agent systems but take fundamentally different approaches. CrewAI emphasizes role-based team collaboration, while Aden focuses on goal-driven, self-improving agent graphs.
---
## Overview
| Aspect | CrewAI | Aden |
|--------|--------|------|
| **Philosophy** | Role-based agent teams | Goal-driven, self-evolving agents |
| **Architecture** | Crews with roles | Node-based agent graphs |
| **Workflow** | Predefined collaboration | Dynamically generated |
| **Self-Improvement** | No | Yes |
| **Human-in-the-Loop** | Basic support | Native intervention points |
| **Monitoring** | Basic logging | Full dashboard |
| **License** | MIT | Apache 2.0 |
---
## Philosophy & Approach
### CrewAI
CrewAI organizes agents as a **crew** with defined **roles**. Each agent has a specific job, and they collaborate in predefined patterns to accomplish tasks.
```python
# CrewAI: Role-based team definition
from crewai import Agent, Task, Crew, Process

researcher = Agent(
    role="Senior Research Analyst",
    goal="Uncover cutting-edge developments",
    backstory="You are an expert at finding information...",
    tools=[search_tool, web_scraper]
)

writer = Agent(
    role="Content Writer",
    goal="Create engaging content from research",
    backstory="You are a skilled writer..."
)

# Define tasks and crew
crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, writing_task],
    process=Process.sequential
)
```
### Aden
Aden uses a **coding agent** to generate agent systems from natural language goals. The system creates agents, connections, and evolves based on failures.
```python
# Aden: Goal-driven generation
goal = """
Research cutting-edge developments in AI and create
engaging blog content. When content is rejected by
editors, learn from the feedback to improve future posts.
"""
# Aden generates:
# - Research agent with appropriate tools
# - Writer agent with learned preferences
# - Editor checkpoint (human-in-the-loop)
# - Feedback loop for improvement
```
---
## Feature Comparison
### Agent Definition
| Feature | CrewAI | Aden |
|---------|--------|------|
| Agent creation | Manual role definition | Generated from goals |
| Roles | Explicit (role, goal, backstory) | Inferred from requirements |
| Tools assignment | Manual per agent | Auto-configured |
| Customization | High | High (via goal refinement) |
**Verdict:** CrewAI offers more explicit control; Aden reduces boilerplate through generation.
### Team Collaboration
| Feature | CrewAI | Aden |
|---------|--------|------|
| Collaboration patterns | Sequential, hierarchical | Dynamic, goal-based |
| Communication | Predefined handoffs | Generated connection code |
| Flexibility | Within defined patterns | Fully dynamic |
| Adaptation | Manual updates | Automatic evolution |
**Verdict:** CrewAI is more predictable; Aden is more adaptive.
### Failure Handling
| Feature | CrewAI | Aden |
|---------|--------|------|
| Error handling | Try/catch | Automatic capture |
| Learning from failures | Not built-in | Core feature |
| Agent evolution | Manual updates | Automatic |
| Recovery strategies | Custom code | Built-in policies |
**Verdict:** Aden's failure handling and evolution are significantly more advanced.
### Production Features
| Feature | CrewAI | Aden |
|---------|--------|------|
| Monitoring dashboard | No | Yes |
| Cost tracking | No | Yes |
| Budget enforcement | No | Yes |
| Health checks | Basic | Comprehensive |
**Verdict:** Aden is more production-ready out of the box.
---
## Code Comparison
### Building a Content Creation Team
#### CrewAI Approach
```python
from crewai import Agent, Task, Crew, Process
# Define agents with explicit roles
researcher = Agent(
    role="Research Specialist",
    goal="Find accurate, relevant information",
    backstory="Expert researcher with attention to detail",
    verbose=True,
    tools=[search_tool, scrape_tool]
)

writer = Agent(
    role="Content Writer",
    goal="Create engaging, SEO-friendly content",
    backstory="Experienced content creator",
    verbose=True
)

editor = Agent(
    role="Editor",
    goal="Ensure quality and accuracy",
    backstory="Meticulous editor with high standards"
)

# Define tasks
research_task = Task(
    description="Research {topic} thoroughly",
    agent=researcher,
    expected_output="Comprehensive research notes"
)

writing_task = Task(
    description="Write article based on research",
    agent=writer,
    expected_output="Draft article"
)

editing_task = Task(
    description="Edit and polish the article",
    agent=editor,
    expected_output="Final article"
)

# Create and run crew
crew = Crew(
    agents=[researcher, writer, editor],
    tasks=[research_task, writing_task, editing_task],
    process=Process.sequential
)

result = crew.kickoff(inputs={"topic": "AI trends 2025"})
```
#### Aden Approach
```python
# Define goal - system generates the team
goal = """
Create a content creation system that:
1. Researches topics thoroughly using web search
2. Writes engaging, SEO-optimized articles
3. Gets human editor approval before publishing
4. Learns from editor feedback to improve over time
When articles are rejected:
- Capture the feedback
- Identify patterns in rejections
- Adjust writing style and quality criteria
"""
# Aden automatically:
# - Creates research, writer nodes
# - Sets up human-in-the-loop for editor
# - Establishes feedback learning loop
# - Monitors cost and quality metrics
# The system evolves:
# - Writing improves based on rejections
# - Research depth adjusts based on needs
# - Quality thresholds adapt
```
---
## Detailed Comparisons
### Ease of Use
| Aspect | CrewAI | Aden |
|--------|--------|------|
| Learning curve | Moderate | Moderate |
| Initial setup | Define roles/tasks | Define goals |
| Iteration speed | Requires code changes | Goal refinement |
| Documentation | Good | Growing |
### Scalability
| Aspect | CrewAI | Aden |
|--------|--------|------|
| Agent count | Grows with complexity | Managed automatically |
| Task complexity | Manual orchestration | Dynamic handling |
| Resource management | Manual | Built-in controls |
### Customization
| Aspect | CrewAI | Aden |
|--------|--------|------|
| Agent behavior | Full control via role/backstory | Via goals and feedback |
| Tools | Assign per agent | Auto-configured + custom |
| Workflows | Predefined processes | Generated + evolved |
| Prompts | Full access | Goal-based abstraction |
---
## When to Choose CrewAI
CrewAI is the better choice when:
1. **Roles are well-defined** - You know exactly what each agent should do
2. **Predictable workflows** - Sequential or hierarchical processes work
3. **Direct control needed** - Want to define every aspect of agent behavior
4. **Simple team structures** - Small crews with clear responsibilities
5. **Quick prototyping** - Get a multi-agent system running fast
6. **No evolution needed** - Workflow won't need to adapt over time
---
## When to Choose Aden
Aden is the better choice when:
1. **Goals over roles** - Know what to achieve, not how to organize
2. **Adaptation required** - System needs to improve from failures
3. **Complex workflows** - Dynamic connections between many agents
4. **Production deployment** - Need monitoring, cost controls, health checks
5. **Human oversight** - Require native HITL with escalation policies
6. **Continuous improvement** - Want agents to get better automatically
7. **Cost management** - Need budget enforcement and model degradation
---
## Hybrid Approaches
Some teams use both frameworks:
### CrewAI for Specific Tasks
```python
# Use CrewAI for well-defined sub-tasks
research_crew = Crew(agents=[...], tasks=[...])
```
### Aden for Orchestration
```python
# Aden orchestrates and evolves the overall system
# CrewAI crews can be nodes in Aden's graph
```
---
## Migration Considerations
### CrewAI to Aden
- Map roles to goal descriptions
- Convert tasks to expected outcomes
- Existing tools often transfer directly
- Add failure scenarios to enable evolution
### Aden to CrewAI
- Analyze generated agent graph for roles
- Define explicit role/backstory from behavior
- Recreate evolution logic manually if needed
- Set up external monitoring
---
## Performance Comparison
| Metric | CrewAI | Aden |
|--------|--------|------|
| Startup time | Fast | Moderate (includes setup) |
| Execution overhead | Low | Low |
| Memory usage | Depends on agents | Includes monitoring |
| LLM calls | As defined | Optimized + tracked |
---
## Community & Ecosystem
| Aspect | CrewAI | Aden |
|--------|--------|------|
| GitHub stars | High | Growing |
| Community size | Large | Growing |
| Enterprise users | Many | Early adopters |
| Third-party tools | Growing ecosystem | Integrated platform |
---
## Conclusion
**CrewAI** excels at creating predictable, role-based agent teams with explicit control over behavior and collaboration patterns. It's ideal for well-defined workflows.
**Aden** shines when you need agents that evolve and improve, with built-in production features like monitoring and cost control. It's better for systems that need to adapt.
### Decision Matrix
| Your Situation | Choose |
|----------------|--------|
| Know exact roles needed | CrewAI |
| Know outcomes, not structure | Aden |
| Need predictable behavior | CrewAI |
| Need adaptive behavior | Aden |
| Simple prototyping | CrewAI |
| Production deployment | Aden |
| Cost management important | Aden |
| Maximum control | CrewAI |
---
*Last updated: January 2025*
# Aden vs LangChain: A Detailed Comparison
*Choosing between goal-driven agents and component-based development*
---
LangChain and Aden represent two different philosophies for building AI agent systems. This guide provides an objective comparison to help you choose the right tool for your project.
---
## Overview
| Aspect | LangChain | Aden |
|--------|-----------|------|
| **Philosophy** | Component library for LLM apps | Goal-driven, self-improving agents |
| **Primary Language** | Python, JavaScript | Python SDK, TypeScript backend |
| **Architecture** | Chains and components | Node-based agent graphs |
| **Workflow Definition** | Manual chain creation | Generated from natural language |
| **Self-Improvement** | No | Yes, automatic evolution |
| **Monitoring** | Third-party integrations | Built-in dashboard |
| **License** | MIT | Apache 2.0 |
---
## Philosophy & Approach
### LangChain
LangChain follows a **component-based approach**. You manually select and connect components (LLMs, retrievers, tools, memory) to build chains and agents. This gives you fine-grained control but requires explicit workflow definition.
```python
# LangChain: Manual chain construction
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.agents import create_react_agent

# You define every component and connection
prompt = PromptTemplate(...)
chain = LLMChain(llm=llm, prompt=prompt)
agent = create_react_agent(llm, tools, prompt)
```
### Aden
Aden follows a **goal-driven approach**. You describe what you want to achieve in natural language, and a coding agent generates the agent graph and connection code. When things fail, the system evolves automatically.
```python
# Aden: Goal-driven generation
# Describe your goal, the coding agent generates the system
goal = """
Create a system that monitors customer feedback,
categorizes sentiment, and escalates negative reviews
to the support team with suggested responses.
"""
# The framework generates agents, connections, and tests
```
---
## Feature Comparison
### RAG & Document Processing
| Feature | LangChain | Aden |
|---------|-----------|------|
| Vector store integrations | Extensive (50+) | Growing |
| Document loaders | Comprehensive | Via tools |
| Retrieval strategies | Multiple built-in | Customizable |
| Query transformation | Built-in | Agent-defined |
**Verdict:** LangChain excels at RAG with its mature ecosystem of integrations.
### Agent Architecture
| Feature | LangChain | Aden |
|---------|-----------|------|
| Agent types | ReAct, OpenAI Functions, etc. | SDK-wrapped nodes |
| Multi-agent | Requires orchestration | Native multi-agent |
| Communication | Manual setup | Auto-generated connections |
| Graph visualization | Third-party | Built-in dashboard |
**Verdict:** Aden provides more native multi-agent support; LangChain offers more agent type options.
### Self-Improvement & Adaptation
| Feature | LangChain | Aden |
|---------|-----------|------|
| Failure handling | Manual try/catch | Automatic capture |
| Learning from failures | Not built-in | Automatic evolution |
| Agent graph updates | Manual code changes | Automated via coding agent |
| A/B testing agents | Manual | Roadmap |
**Verdict:** Aden's self-improvement is a unique differentiator not found in LangChain.
### Observability & Monitoring
| Feature | LangChain | Aden |
|---------|-----------|------|
| Tracing | LangSmith (paid), third-party | Built-in |
| Cost tracking | Third-party | Native |
| Real-time monitoring | LangSmith | WebSocket dashboard |
| Budget controls | Not built-in | Native with auto-degradation |
**Verdict:** Aden includes monitoring out of the box; LangChain requires LangSmith or third-party tools.
### Human-in-the-Loop
| Feature | LangChain | Aden |
|---------|-----------|------|
| Human approval | Manual implementation | Native intervention nodes |
| Escalation policies | Custom code | Configurable timeouts |
| Input collection | Custom | Built-in request system |
**Verdict:** Aden has more built-in HITL support; LangChain requires custom implementation.
---
## Code Comparison
### Building a Customer Support Agent
#### LangChain Approach
```python
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain_openai import ChatOpenAI
from langchain.tools import Tool
from langchain.memory import ConversationBufferMemory
# Define tools manually
tools = [
    Tool(name="search_kb", func=search_knowledge_base, description="..."),
    Tool(name="create_ticket", func=create_support_ticket, description="..."),
    Tool(name="escalate", func=escalate_to_human, description="..."),
]

# Create agent with explicit configuration
# (prompt is assumed to be defined elsewhere)
llm = ChatOpenAI(model="gpt-4")
memory = ConversationBufferMemory()
agent = create_openai_tools_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, memory=memory)

# Run agent; error handling is manual
try:
    response = executor.invoke({"input": customer_query})
except Exception as e:
    log_error(e)
    # Manual recovery logic
```
#### Aden Approach
```python
# Define goal - system generates the agent graph
goal = """
Build a customer support agent that:
1. Searches our knowledge base for answers
2. Creates tickets for unresolved issues
3. Escalates to humans when confidence is low
4. Learns from resolved tickets to improve responses
When the agent fails to help a customer, capture the failure
and improve the response strategy.
"""
# Aden generates:
# - Agent graph with specialized nodes
# - Connection code between nodes
# - Test cases for validation
# - Monitoring hooks
# The SDK handles:
# - Automatic failure capture
# - Evolution based on failures
# - Cost tracking and budget enforcement
# - Human escalation at intervention points
```
---
## Production Considerations
### Deployment
| Aspect | LangChain | Aden |
|--------|-----------|------|
| Deployment model | Library in your app | Self-hosted platform |
| Infrastructure | You manage | Docker Compose included |
| Scaling | Your responsibility | Built-in considerations |
| Database requirements | Optional | TimescaleDB, MongoDB, PostgreSQL |
### Cost Management
| Aspect | LangChain | Aden |
|--------|-----------|------|
| Token tracking | Manual or LangSmith | Automatic |
| Budget limits | Not built-in | Native with enforcement |
| Model degradation | Manual | Automatic fallback |
| Cost alerts | Third-party | Built-in |
### Reliability
| Aspect | LangChain | Aden |
|--------|-----------|------|
| Retry logic | Manual | Built-in |
| Fallback chains | Manual | Automatic |
| Health monitoring | Third-party | Native endpoints |
| Self-healing | No | Yes |
---
## When to Choose LangChain
LangChain is the better choice when:
1. **Building RAG applications** - LangChain's retrieval ecosystem is unmatched
2. **Need extensive integrations** - 50+ vector stores, document loaders, etc.
3. **Want fine-grained control** - Every component is explicitly configured
4. **Already invested** - Large existing LangChain codebase
5. **Simple agent needs** - Single-purpose agents without complex orchestration
6. **Prefer library over platform** - Want to embed in existing infrastructure
---
## When to Choose Aden
Aden is the better choice when:
1. **Agents need to evolve** - Systems should improve from failures automatically
2. **Goal-driven development** - Prefer describing outcomes over coding workflows
3. **Multi-agent systems** - Complex agent graphs with dynamic connections
4. **Production monitoring is critical** - Need built-in observability
5. **Cost control matters** - Require budget enforcement and auto-degradation
6. **Human oversight needed** - Native HITL support with escalation
7. **Rapid iteration** - Want to change agent behavior without code rewrites
---
## Migration Considerations
### LangChain to Aden
- LangChain tools can often be adapted as Aden node tools
- Existing prompts can inform goal definitions
- Consider gradual migration, running systems in parallel
### Aden to LangChain
- Agent graphs can be manually reimplemented as chains
- Monitoring would need replacement (LangSmith or alternatives)
- Self-improvement logic would need custom implementation
---
## Conclusion
**LangChain** is a mature, flexible component library ideal for RAG applications and developers who want explicit control over every aspect of their agent.
**Aden** offers a paradigm shift with goal-driven, self-improving agents, better suited for production systems that need to adapt and evolve over time with built-in monitoring.
The choice depends on:
- **Control vs. Automation**: LangChain for control, Aden for automation
- **Static vs. Evolving**: LangChain for stable workflows, Aden for adaptive systems
- **Library vs. Platform**: LangChain as a library, Aden as a platform
Many teams use both: LangChain for specific RAG components, Aden for orchestration and evolution.
---
*Last updated: January 2025*
# AI Agent Cost Management: A Complete Guide
*Control spending, optimize efficiency, and prevent budget disasters*
---
AI agents can burn through budgets faster than you expect. A single runaway agent loop can cost thousands of dollars in minutes. This guide covers strategies, tools, and best practices for managing AI agent costs.
---
## The Cost Problem
### Why AI Agents Are Expensive
| Factor | Impact |
|--------|--------|
| LLM API calls | $0.01 - $0.10+ per call |
| Token usage | Input + output tokens |
| Agent loops | Multiple calls per task |
| Retries | Failed calls still cost money |
| Verbose prompts | More tokens = more cost |
| Tool usage | Additional API calls |
### Real-World Example
```
Simple customer support agent:
- 5 LLM calls per interaction
- 2000 tokens average per call
- GPT-4: ~$0.06 per call
- 100 interactions/day = $30/day
Complex research agent:
- 50+ LLM calls per task
- 10000 tokens average per call
- GPT-4: ~$0.30 per call
- 10 tasks/day = $150/day
Runaway agent loop:
- 1000 calls in 10 minutes
- $300+ before detection
```
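The arithmetic above is easy to sanity-check with a few lines of Python. This is a back-of-the-envelope helper, not part of any framework:

```python
# Estimated daily spend: calls per task x cost per call x tasks per day.
def daily_cost(calls_per_task, cost_per_call, tasks_per_day):
    return calls_per_task * cost_per_call * tasks_per_day

support_agent = daily_cost(5, 0.06, 100)    # ~$30/day
research_agent = daily_cost(50, 0.30, 10)   # ~$150/day
```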
---
## Cost Control Strategies
### Strategy 1: Budget Limits
Set hard limits on spending per:
- Time period (daily, weekly, monthly)
- Agent
- Task
- Team
- User
```python
budget_config = {
    "daily_limit": 100.00,
    "per_task_limit": 5.00,
    "per_agent_limit": 50.00,
    "alert_at_percentage": 80,
    "action_on_limit": "block"  # or "degrade", "alert"
}
```
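A minimal sketch of how such a config could be enforced. The `budget_action` helper is hypothetical, not a framework API; it maps the current daily spend to a decision:

```python
def budget_action(config, spent_today):
    """Decide what to do given today's spend (hypothetical helper)."""
    limit = config["daily_limit"]
    if spent_today >= limit:
        return config["action_on_limit"]  # "block", "degrade", or "alert"
    if spent_today >= limit * config["alert_at_percentage"] / 100:
        return "alert"
    return "allow"

config = {"daily_limit": 100.00, "alert_at_percentage": 80, "action_on_limit": "block"}
# budget_action(config, 50.0) -> "allow"; at 85.0 -> "alert"; at 100.0 -> "block"
```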
### Strategy 2: Model Degradation
Automatically switch to cheaper models as budget is consumed:
```
Budget usage:
0-70% → Use GPT-4 (best quality)
70-90% → Use GPT-3.5-turbo (good quality)
90-100% → Use GPT-3.5-turbo with shorter prompts
100%+ → Block or queue requests
```
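One way to implement that tiering is a small selector keyed on the fraction of budget consumed. The thresholds and model names follow the table above; the function itself is illustrative:

```python
def pick_model(budget_used):
    """Map budget consumption (0.0-1.0+) to a model tier."""
    if budget_used >= 1.0:
        return None                          # block or queue the request
    if budget_used >= 0.9:
        return ("gpt-3.5-turbo", "short_prompts")
    if budget_used >= 0.7:
        return ("gpt-3.5-turbo", "normal")
    return ("gpt-4", "normal")
```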
### Strategy 3: Request Throttling
Limit request rate to control burn rate:
```python
throttle_config = {
    "requests_per_minute": 10,
    "requests_per_hour": 200,
    "backoff_multiplier": 2,
    "max_backoff_seconds": 60
}
```
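A config like the one above maps naturally onto a token-bucket limiter. A minimal sketch of the per-minute cap (backoff handling omitted):

```python
import time

class TokenBucket:
    """Simple token bucket enforcing a requests-per-minute cap."""
    def __init__(self, requests_per_minute: int):
        self.capacity = requests_per_minute
        self.tokens = float(requests_per_minute)
        self.refill_rate = requests_per_minute / 60.0  # tokens per second
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should back off and retry later
```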
### Strategy 4: Token Optimization
Reduce tokens per request:
| Technique | Savings |
|-----------|---------|
| Shorter system prompts | 20-40% |
| Compressed context | 30-50% |
| Response length limits | 20-30% |
| Remove unnecessary examples | 10-20% |
### Strategy 5: Caching
Cache common requests and responses:
```python
import hashlib

# Before: every request hits the API
result = llm.complete(prompt)  # costs money

# After: cache frequent patterns, keyed by a hash of the prompt
prompt_hash = hashlib.sha256(prompt.encode()).hexdigest()
cached = cache.get(prompt_hash)
if cached:
    result = cached  # free
else:
    result = llm.complete(prompt)
    cache.set(prompt_hash, result)
```
---
## Framework Comparison: Cost Features
| Framework | Budget Limits | Degradation | Tracking | Alerts |
|-----------|--------------|-------------|----------|--------|
| LangChain | Third-party | Manual | LangSmith | Manual |
| CrewAI | Not built-in | Manual | Basic | Manual |
| AutoGen | Not built-in | Manual | Manual | Manual |
| **Aden** | **Native** | **Automatic** | **Built-in** | **Native** |
### Aden's Cost Controls
Aden includes comprehensive cost management:
```python
# Budget configuration in Aden
budget_rules = {
    "budget_id": "team_engineering",
    "limits": {
        "daily": 500.00,
        "monthly": 10000.00,
        "per_agent": 100.00
    },
    "degradation": {
        "80_percent": "switch_to_gpt35",
        "95_percent": "throttle",
        "100_percent": "block"
    },
    "alerts": {
        "channels": ["slack", "email"],
        "thresholds": [50, 80, 95, 100]
    }
}
```
---
## Implementing Cost Tracking
### Basic Tracking
```python
class CostTracker:
    def __init__(self):
        self.total_cost = 0
        self.cost_by_agent = {}
        self.cost_by_model = {}

    def track(self, request, response, model):
        input_tokens = count_tokens(request)
        output_tokens = count_tokens(response)
        cost = self.calculate_cost(model, input_tokens, output_tokens)
        self.total_cost += cost
        self.cost_by_agent[request.agent_id] = \
            self.cost_by_agent.get(request.agent_id, 0) + cost
        self.cost_by_model[model] = \
            self.cost_by_model.get(model, 0) + cost
        return cost

    def calculate_cost(self, model, input_tokens, output_tokens):
        rates = {
            "gpt-4": {"input": 0.03, "output": 0.06},  # per 1K tokens
            "gpt-3.5-turbo": {"input": 0.0005, "output": 0.0015},
            "claude-3-opus": {"input": 0.015, "output": 0.075},
            "claude-3-sonnet": {"input": 0.003, "output": 0.015},
        }
        rate = rates.get(model, rates["gpt-3.5-turbo"])
        return (input_tokens * rate["input"] + output_tokens * rate["output"]) / 1000
```
### Advanced Tracking with Attribution
```python
cost_record = {
    "timestamp": "2025-01-15T10:30:00Z",
    "request_id": "req_123",
    "agent_id": "support_agent_1",
    "task_id": "task_456",
    "team_id": "customer_success",
    "model": "gpt-4",
    "input_tokens": 1500,
    "output_tokens": 500,
    "cost_usd": 0.075,
    "cached": False,
    "degraded": False
}
```
---
## Alert Configuration
### Threshold Alerts
```yaml
alerts:
  - name: "Budget Warning"
    condition: "daily_spend > daily_budget * 0.8"
    channels: ["slack"]
    message: "80% of daily budget consumed"
  - name: "Budget Critical"
    condition: "daily_spend > daily_budget * 0.95"
    channels: ["slack", "pagerduty"]
    message: "95% of daily budget - taking action"
    action: "degrade_models"
  - name: "Runaway Agent"
    condition: "requests_per_minute > 100"
    channels: ["pagerduty"]
    message: "Possible runaway agent detected"
    action: "pause_agent"
```
### Anomaly Detection
```python
def detect_anomalies(recent_costs, historical_average):
    """Alert if costs significantly exceed historical patterns"""
    threshold = historical_average * 3  # 3x normal
    if recent_costs > threshold:
        alert(
            level="critical",
            message=f"Cost anomaly: ${recent_costs:.2f} vs avg ${historical_average:.2f}",
            action="investigate"
        )
```
---
## Model Selection Strategies
### Cost vs Quality Matrix
| Model | Cost (per 1K tokens) | Quality | Best For |
|-------|---------------------|---------|----------|
| GPT-4 | $0.03-0.06 | Highest | Complex reasoning |
| GPT-4-turbo | $0.01-0.03 | High | Balance cost/quality |
| GPT-3.5-turbo | $0.0005-0.0015 | Good | High volume, simple |
| Claude 3 Opus | $0.015-0.075 | Highest | Long context |
| Claude 3 Sonnet | $0.003-0.015 | High | Good balance |
| Claude 3 Haiku | $0.00025-0.00125 | Good | Fast, cheap |
### Dynamic Model Selection
```python
def select_model(task_complexity, budget_remaining, daily_limit):
    budget_used = (daily_limit - budget_remaining) / daily_limit
    if task_complexity == "simple":
        return "gpt-3.5-turbo"  # always cheap for simple tasks
    elif budget_used < 0.5:
        return "gpt-4"  # best model while budget is healthy
    elif budget_used < 0.8:
        return "gpt-4-turbo"  # balanced
    else:
        return "gpt-3.5-turbo"  # preserve budget
```
---
## Optimization Techniques
### 1. Prompt Engineering for Cost
```python
# Expensive: Long system prompt
system_prompt = """
You are a helpful assistant that specializes in customer support.
You should always be polite, professional, and helpful.
When answering questions, provide detailed explanations.
Always consider the customer's perspective.
Remember to be empathetic and understanding.
[... 500 more tokens ...]
"""
# Cheaper: Concise system prompt
system_prompt = """
Customer support agent. Be helpful, polite, concise.
Resolve issues efficiently.
"""
# Savings: ~400 tokens × 1000 requests = $12/day
```
### 2. Context Window Management
```python
def manage_context(messages, max_tokens=4000):
    """Keep context within budget by summarizing old messages"""
    current_tokens = count_tokens(messages)
    if current_tokens > max_tokens:
        # Summarize older messages, keep the most recent five verbatim
        old_messages = messages[:-5]
        summary = summarize(old_messages)
        return [{"role": "system", "content": f"Previous context: {summary}"}] + messages[-5:]
    return messages
```
### 3. Batch Processing
```python
# Expensive: individual requests
for item in items:
    result = llm.complete(f"Process: {item}")

# Cheaper: batch when possible
batch_prompt = "Process these items:\n" + "\n".join(items)
results = llm.complete(batch_prompt)
```
### 4. Response Length Control
```python
# Add to system prompt
system_prompt += "\nKeep responses under 200 words."

# Or use the max_tokens parameter
response = llm.complete(
    prompt,
    max_tokens=1024  # hard limit
)
```
---
## Runaway Agent Prevention
### Detection Mechanisms
```python
class RunawayDetector:
    def __init__(self):
        self.request_times = []
        self.max_requests_per_minute = 50
        self.max_cost_per_minute = 10.00

    def check(self, cost):
        now = time.time()
        self.request_times.append((now, cost))
        # Drop entries older than the 60-second window
        self.request_times = [
            (t, c) for t, c in self.request_times
            if now - t < 60
        ]
        # Check thresholds
        requests_per_minute = len(self.request_times)
        cost_per_minute = sum(c for _, c in self.request_times)
        if requests_per_minute > self.max_requests_per_minute:
            return "RUNAWAY_REQUESTS"
        if cost_per_minute > self.max_cost_per_minute:
            return "RUNAWAY_COST"
        return "OK"
```
### Circuit Breakers
```python
class CostCircuitBreaker:
    def __init__(self, threshold, window_seconds=60):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self.costs = []
        self.is_open = False

    def _cleanup(self):
        """Drop cost records that have aged out of the window."""
        cutoff = time.time() - self.window_seconds
        self.costs = [(t, c) for t, c in self.costs if t > cutoff]

    def record_cost(self, cost):
        now = time.time()
        self.costs.append((now, cost))
        self._cleanup()
        total_cost = sum(c for _, c in self.costs)
        if total_cost > self.threshold:
            self.is_open = True
            alert("Circuit breaker opened - costs exceeded threshold")

    def allow_request(self):
        if self.is_open:
            # Check if we should reset
            if time.time() - self.costs[-1][0] > self.window_seconds:
                self.is_open = False
                self.costs = []
                return True
            return False
        return True
```
---
## Dashboard Metrics
### Essential Cost Metrics
| Metric | Description | Alert Threshold |
|--------|-------------|-----------------|
| Hourly spend | Cost in last hour | > 2x average |
| Daily spend | Cost today | > 80% budget |
| Cost per task | Average task cost | > expected |
| Token efficiency | Output/input ratio | < 0.3 |
| Cache hit rate | Cached vs new requests | < 50% |
| Model distribution | % by model | Unexpected shifts |
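Two of the derived metrics above reduce to simple ratios over raw counters; a minimal sketch:

```python
def token_efficiency(input_tokens: int, output_tokens: int) -> float:
    """Output/input token ratio; a low value suggests bloated prompts."""
    return output_tokens / input_tokens if input_tokens else 0.0

def cache_hit_rate(cache_hits: int, cache_misses: int) -> float:
    """Fraction of requests served from cache instead of the API."""
    total = cache_hits + cache_misses
    return cache_hits / total if total else 0.0
```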
### Aden Dashboard
Aden provides built-in cost visualization:
- Real-time cost tracking
- Budget gauges with alerts
- Cost by agent/model breakdown
- Historical trends
- Anomaly detection
---
## Best Practices Summary
### Do's
1. ✅ Set budget limits before deployment
2. ✅ Implement automatic degradation
3. ✅ Monitor costs in real-time
4. ✅ Alert on anomalies
5. ✅ Optimize prompts for token efficiency
6. ✅ Cache common requests
7. ✅ Use appropriate models for task complexity
8. ✅ Review costs regularly
### Don'ts
1. ❌ Deploy without budget limits
2. ❌ Use GPT-4 for everything
3. ❌ Ignore cost metrics
4. ❌ Allow unlimited retries
5. ❌ Store full context forever
6. ❌ Skip testing cost scenarios
7. ❌ Forget about tool API costs
---
## Conclusion
AI agent cost management requires:
1. **Prevention**: Budget limits, degradation policies
2. **Detection**: Real-time tracking, anomaly alerts
3. **Optimization**: Smart model selection, token efficiency
4. **Protection**: Circuit breakers, runaway detection
Frameworks like Aden with built-in cost controls make this easier, but the principles apply to any agent system. Start with conservative limits and adjust based on real usage patterns.
---
*Last updated: January 2025*
@@ -1,423 +0,0 @@
# AI Agent Observability & Monitoring: The Complete Guide
*How to know what your AI agents are actually doing*
---
AI agents are autonomous systems that make decisions, call tools, and interact with the world. Without proper observability, they become black boxes. This guide covers everything you need to monitor AI agents effectively.
---
## Why Agent Observability Is Different
Traditional application monitoring tracks requests and responses. Agent monitoring must track:
| Traditional Apps | AI Agents |
|------------------|-----------|
| Request/Response | Multi-step reasoning chains |
| Deterministic behavior | Probabilistic decisions |
| Fixed execution paths | Dynamic tool selection |
| Predictable costs | Variable LLM spending |
| Clear errors | Subtle quality degradation |
---
## The Four Pillars of Agent Observability
```
┌─────────────────────────────────────────────────────────────┐
│ Agent Observability Stack │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────┐ │
│ │ Metrics │ │ Logs │ │ Traces │ │
│ │ (Numbers) │ │ (Events) │ │ (Execution Flow) │ │
│ └─────────────┘ └─────────────┘ └─────────────────────┘ │
│ │ │
│ ▼ │
│ ┌───────────────────────┐ │
│ │ Quality Evals │ │
│ │ (Output Assessment) │ │
│ └───────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
```
### 1. Metrics
Quantitative measurements over time:
- Requests per minute
- Success/failure rates
- Latency distributions
- Token usage
- Cost per request
- Tool call frequencies
### 2. Logs
Discrete events with context:
- Agent decisions
- Tool inputs/outputs
- Error messages
- User interactions
- System events
### 3. Traces
End-to-end execution flows:
- Full reasoning chains
- Token-by-token generation
- Tool call sequences
- Parent-child relationships
- Cross-agent communication
### 4. Quality Evals
Output quality assessment:
- Accuracy scoring
- Hallucination detection
- Task completion rates
- User satisfaction
- Regression detection
---
## Key Metrics to Track
### Performance Metrics
| Metric | Description | Alert Threshold |
|--------|-------------|-----------------|
| `agent.latency.p50` | Median response time | > 5s |
| `agent.latency.p99` | 99th percentile latency | > 30s |
| `agent.throughput` | Requests/second | < baseline * 0.5 |
| `agent.queue.depth` | Pending requests | > 100 |
| `agent.timeout.rate` | Timeout percentage | > 5% |
### Reliability Metrics
| Metric | Description | Alert Threshold |
|--------|-------------|-----------------|
| `agent.success.rate` | Successful completions | < 95% |
| `agent.error.rate` | Error percentage | > 5% |
| `agent.retry.rate` | Retries needed | > 10% |
| `agent.fallback.rate` | Fallback usage | > 20% |
| `agent.circuit.open` | Circuit breaker status | true |
### Cost Metrics
| Metric | Description | Alert Threshold |
|--------|-------------|-----------------|
| `agent.cost.total` | Total spend | > budget * 0.9 |
| `agent.cost.per.request` | Cost per request | > $0.50 |
| `agent.tokens.input` | Input tokens used | anomaly detection |
| `agent.tokens.output` | Output tokens used | anomaly detection |
| `agent.model.usage` | Calls by model | unusual patterns |
### Quality Metrics
| Metric | Description | Alert Threshold |
|--------|-------------|-----------------|
| `agent.quality.score` | Output quality (0-1) | < 0.7 |
| `agent.hallucination.rate` | Detected hallucinations | > 5% |
| `agent.task.completion` | Tasks fully completed | < 80% |
| `agent.user.satisfaction` | User ratings | < 4.0/5.0 |
---
## Logging Best Practices
### Structured Logging Format
```json
{
  "timestamp": "2025-01-15T10:30:00Z",
  "level": "info",
  "event": "agent_tool_call",
  "agent_id": "agent-123",
  "session_id": "session-456",
  "trace_id": "trace-789",
  "tool": "search_web",
  "input": {"query": "latest AI news"},
  "output_tokens": 150,
  "latency_ms": 1200,
  "success": true
}
```
### What to Log
**Always Log:**
- Agent start/stop
- Tool calls (name, duration, success)
- LLM calls (model, tokens, latency)
- Errors and exceptions
- Human interventions
- Budget events
**Log Carefully (PII concerns):**
- User inputs (may need redaction)
- Agent outputs (may contain sensitive data)
- Full prompts (can be large)
**Never Log:**
- API keys
- User credentials
- Full conversation transcripts in production
- Raw model weights
### Log Levels for Agents
| Level | Use Case |
|-------|----------|
| DEBUG | Full prompts, token-level details |
| INFO | Tool calls, completions, metrics |
| WARN | Retries, degradation, budget warnings |
| ERROR | Failures, exceptions, circuit breaks |
| FATAL | System crashes, unrecoverable errors |
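A small helper can emit events in the structured shape shown earlier. This is a sketch; the `log_event` name and field choices are illustrative, not part of any library:

```python
import json
import logging
import time

logger = logging.getLogger("agent")

def log_event(level: str, event: str, **fields) -> dict:
    """Build one structured agent event and emit it as a single JSON line."""
    record = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "level": level,
        "event": event,
        **fields,  # agent_id, trace_id, tool, latency_ms, ...
    }
    logger.log(getattr(logging, level.upper()), json.dumps(record))
    return record

log_event("info", "agent_tool_call",
          agent_id="agent-123", tool="search_web",
          latency_ms=1200, success=True)
```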
---
## Distributed Tracing for Agents
### Why Tracing Matters
Agents involve multiple steps, LLM calls, and tool invocations. Tracing connects them all.
```
Trace: "Process customer refund"
├── Span: Agent Initialize (5ms)
├── Span: LLM Planning Call (800ms)
│ └── Attribute: model=gpt-4, tokens=500
├── Span: Tool: fetch_order (200ms)
│ └── Attribute: order_id=12345
├── Span: Tool: check_policy (50ms)
├── Span: LLM Decision Call (600ms)
│ └── Attribute: decision=approve
├── Span: Tool: process_refund (300ms)
└── Span: Agent Complete (10ms)
└── Attribute: success=true, cost=$0.08
```
### Key Trace Attributes
- `agent.id`: Unique agent identifier
- `agent.type`: Agent type/role
- `session.id`: User session
- `parent.agent`: For multi-agent systems
- `llm.model`: Model used
- `llm.tokens`: Token counts
- `tool.name`: Tool being called
- `tool.success`: Tool outcome
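Lightweight tracing can be sketched as a context manager that records spans carrying these attributes (in practice an OpenTelemetry SDK would do this; the structure here is illustrative):

```python
import time
from contextlib import contextmanager

spans = []  # spans collected for the current trace

@contextmanager
def span(name, **attributes):
    """Record a named span with its duration and trace attributes."""
    start = time.monotonic()
    try:
        yield attributes  # the caller can add attributes mid-span
    finally:
        spans.append({
            "name": name,
            "duration_ms": (time.monotonic() - start) * 1000,
            **attributes,
        })

# Example: one tool-call span from the refund trace above
with span("tool:fetch_order", **{"tool.name": "fetch_order", "agent.id": "agent-123"}) as attrs:
    attrs["tool.success"] = True  # set once the call completes
```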
---
## Dashboard Design
### Dashboard 1: Operations Overview
```
┌─────────────────────────────────────────────────────────────┐
│ Agent Operations │
├─────────────────┬─────────────────┬─────────────────────────┤
│ Active Agents │ Requests/Min │ Error Rate │
│ 42 │ 1,234 │ 0.3% ✓ │
├─────────────────┴─────────────────┴─────────────────────────┤
│ │
│ Request Latency (p50/p99) Success Rate (24h) │
│ ████████████████░░░░ ██████████████████████ │
│ 1.2s / 4.5s 99.2% │
│ │
├─────────────────────────────────────────────────────────────┤
│ Top Errors Active Alerts │
│ • Rate limit exceeded (12) ⚠️ High latency p99 │
│ • Tool timeout (5) ⚠️ Budget at 85% │
│ • Validation failed (3) │
└─────────────────────────────────────────────────────────────┘
```
### Dashboard 2: Cost & Usage
```
┌─────────────────────────────────────────────────────────────┐
│ Cost & Usage │
├─────────────────┬─────────────────┬─────────────────────────┤
│ Today's Spend │ Budget Used │ Projected Monthly │
│ $127.50 │ 67% │ $3,825 │
├─────────────────┴─────────────────┴─────────────────────────┤
│ │
│ Cost by Model │ Cost by Agent │
│ ■ GPT-4: $89 │ ■ Support: $45 │
│ ■ Claude: $28 │ ■ Research: $52 │
│ ■ GPT-3.5: $10 │ ■ Writer: $30 │
│ │
├─────────────────────────────────────────────────────────────┤
│ Token Usage Trend (7 days) │
│ ▁▂▃▄▅▆▇█▇▆▅▄▃▂▁▂▃▄▅▆ │
└─────────────────────────────────────────────────────────────┘
```
### Dashboard 3: Quality & Reliability
```
┌─────────────────────────────────────────────────────────────┐
│ Quality & Reliability │
├─────────────────┬─────────────────┬─────────────────────────┤
│ Quality Score │ Task Complete │ User Satisfaction │
│ 0.92/1.0 │ 94.5% │ 4.6/5.0 │
├─────────────────┴─────────────────┴─────────────────────────┤
│ │
│ Quality Trend (30 days) │ Failure Analysis │
│ ████████████████████████ │ ■ LLM errors: 2% │
│ ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔ │ ■ Tool errors: 1% │
│ Target: 0.90 │ ■ Timeouts: 0.5% │
│ │ ■ Logic errors: 0.5% │
├─────────────────────────────────────────────────────────────┤
│ Recent Quality Issues │
│ • Agent-42 hallucination detected (15 min ago) │
│ • Agent-17 task incomplete (1 hour ago) │
└─────────────────────────────────────────────────────────────┘
```
---
## Alerting Strategy
### Critical Alerts (Page immediately)
- Error rate > 10% for 5 minutes
- All agents offline
- Budget exceeded
- Security anomaly detected
### Warning Alerts (Notify during business hours)
- Error rate > 5% for 15 minutes
- Latency p99 > 30s
- Budget > 90% of limit
- Quality score drops > 10%
### Informational (Daily digest)
- Token usage trends
- Cost projections
- Quality score changes
- New error types detected
### Alert Fatigue Prevention
- Use anomaly detection vs fixed thresholds
- Group related alerts
- Implement progressive escalation
- Review and tune alert thresholds monthly
---
## Tool Comparison
| Tool | Best For | Agent-Specific Features |
|------|----------|------------------------|
| Datadog | Enterprise, full-stack | APM for LLM calls |
| Grafana | Self-hosted, flexibility | Custom dashboards |
| LangSmith | LangChain users | Prompt tracing |
| Weights & Biases | ML teams | Experiment tracking |
| Helicone | LLM-focused | Token analytics |
| Aden | Production agents | Built-in observability |
---
## How Aden Handles Observability
Aden provides built-in observability without additional setup:
### Automatic Collection
```
┌─────────────────────────────────────────────────────────────┐
│ Aden Observability │
│ │
│ ┌───────────────┐ ┌───────────────────────────────┐ │
│ │ SDK-Wrapped │──────▶│ Event Stream │ │
│ │ Nodes │ │ • Metrics • Logs • Traces │ │
│ └───────────────┘ └───────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ Honeycomb Dashboard │ │
│ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ │
│ │ │ Metrics │ │ Costs │ │ Quality │ │ Alerts │ │ │
│ │ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │ │
│ └───────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
```
### What Aden Tracks Automatically
- Every LLM call (model, tokens, latency, cost)
- Every tool invocation (name, duration, success)
- Agent lifecycle events (start, stop, error)
- Budget consumption in real-time
- Quality metrics via failure tracking
- HITL intervention points
### Built-in Dashboards
- Real-time agent status
- Cost breakdown by agent/model
- Quality trends over time
- Failure analysis
- Self-improvement metrics
### No Configuration Required
Unlike external tools, Aden's observability requires no setup:
```python
# Just wrap your node with the SDK
from aden import sdk

@sdk.node
async def my_agent(input):
    # All metrics automatically collected
    return await process(input)
```
---
## Implementation Checklist
### Phase 1: Basic (Week 1)
- [ ] Structured logging in place
- [ ] Basic metrics: latency, errors, throughput
- [ ] Cost tracking per request
- [ ] Simple dashboard with key metrics
### Phase 2: Comprehensive (Week 2-3)
- [ ] Distributed tracing implemented
- [ ] Quality evaluation pipeline
- [ ] Alerting rules configured
- [ ] Full dashboards built
### Phase 3: Advanced (Week 4+)
- [ ] Anomaly detection
- [ ] Automated regression detection
- [ ] Cost optimization insights
- [ ] Self-healing triggers
---
## Common Pitfalls
### 1. Logging Too Much
**Problem:** Full prompts in production logs
**Solution:** Log hashes or summaries, full content only for debugging
### 2. Alert Fatigue
**Problem:** Too many non-actionable alerts
**Solution:** Use anomaly detection, tune thresholds, require action plans
### 3. Missing Context
**Problem:** Can't correlate events across agents
**Solution:** Propagate trace IDs, use correlation IDs
### 4. Ignoring Quality
**Problem:** Only track operational metrics
**Solution:** Implement quality scoring, track user feedback
### 5. No Baselines
**Problem:** Don't know what "normal" looks like
**Solution:** Establish baselines before alerting, use relative thresholds
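One simple way to establish a baseline and alert on relative thresholds is an exponentially weighted moving average. A sketch, where the smoothing factor and 3x multiplier are illustrative:

```python
class Baseline:
    """EWMA baseline; flag values that exceed a multiple of 'normal'."""
    def __init__(self, alpha: float = 0.1, multiplier: float = 3.0):
        self.alpha = alpha
        self.multiplier = multiplier
        self.average = None

    def observe(self, value: float) -> bool:
        """Update the baseline and return True if value is anomalous."""
        if self.average is None:
            self.average = value  # first sample seeds the baseline
            return False
        anomalous = value > self.average * self.multiplier
        if not anomalous:  # don't let spikes poison the baseline
            self.average = (1 - self.alpha) * self.average + self.alpha * value
        return anomalous
```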
---
## Conclusion
Effective agent observability requires:
1. **Metrics**: Know your numbers (latency, errors, cost)
2. **Logs**: Capture events with context
3. **Traces**: Follow execution flows end-to-end
4. **Quality**: Assess output, not just uptime
Modern agent platforms like Aden provide this built-in. For other frameworks, plan to invest significant effort in observability infrastructure.
The goal: Never wonder what your agents are doing—always know.
---
*Last updated: January 2025*
@@ -1,415 +0,0 @@
# Self-Improving vs Static Agents: Understanding the Paradigm Shift
*Why adaptive AI agents are changing how we build intelligent systems*
---
The AI agent landscape is divided between two fundamentally different approaches: **static agents** that execute predefined logic, and **self-improving agents** that evolve based on experience. Understanding this distinction is crucial for choosing the right architecture.
---
## The Core Difference
### Static Agents
Static agents follow **predefined workflows** that remain constant until a developer manually updates them. They're predictable but require human intervention to improve.
```
User Request → Fixed Logic → Response
(If failure)
Human fixes code
Redeploy
```
### Self-Improving Agents
Self-improving agents **learn from their experiences**, automatically adjusting their behavior based on successes and failures.
```
User Request → Adaptive Logic → Response
(If failure)
Capture failure data
Evolve agent graph
Auto-redeploy (improved)
```
---
## Comparison Table
| Aspect | Static Agents | Self-Improving Agents |
|--------|---------------|----------------------|
| Behavior change | Manual code updates | Automatic evolution |
| Failure response | Log and alert | Learn and adapt |
| Improvement cycle | Days/weeks | Minutes/hours |
| Human involvement | Required for changes | Optional oversight |
| Predictability | High | Moderate (with guardrails) |
| Long-term maintenance | Higher | Lower |
| Initial complexity | Lower | Higher |
---
## How Static Agents Work
### Architecture
```
┌─────────────────────────────────────┐
│ Static Agent │
├─────────────────────────────────────┤
│ ┌─────────────────────────────┐ │
│ │ Hardcoded Workflow │ │
│ │ ┌───┐ ┌───┐ ┌───┐ │ │
│ │ │ A │→│ B │→│ C │ │ │
│ │ └───┘ └───┘ └───┘ │ │
│ └─────────────────────────────┘ │
│ │
│ • Fixed decision logic │
│ • Predefined tool usage │
│ • Static prompts │
│ • Manual error handling │
└─────────────────────────────────────┘
```
### Typical Improvement Cycle
1. **Agent deployed** with initial logic
2. **Failures occur** in production
3. **Developers analyze** logs and errors
4. **Code changes** made manually
5. **Testing** in staging environment
6. **Redeployment** to production
7. **Repeat** for each issue
**Timeline:** Days to weeks per improvement
### Examples of Static Agent Frameworks
- LangChain agents
- Basic CrewAI implementations
- Custom ReAct agents
- Simple AutoGen conversations
---
## How Self-Improving Agents Work
### Architecture
```
┌─────────────────────────────────────────────────┐
│ Self-Improving Agent System │
├─────────────────────────────────────────────────┤
│ ┌─────────────────────────────────────────┐ │
│ │ Adaptive Agent Graph │ │
│ │ ┌───┐ ┌───┐ ┌───┐ │ │
│ │ │ A │→│ B │→│ C │ ← Can change │ │
│ │ └───┘ └───┘ └───┘ │ │
│ └─────────────────────────────────────────┘ │
│ ↑ │
│ │ Evolution │
│ │ │
│ ┌─────────────────────────────────────────┐ │
│ │ Coding Agent │ │
│ │ • Analyzes failures │ │
│ │ • Generates improvements │ │
│ │ • Updates agent graph │ │
│ └─────────────────────────────────────────┘ │
│ ↑ │
│ │ │
│ ┌─────────────────────────────────────────┐ │
│ │ Failure Capture │ │
│ │ • Error context │ │
│ │ • Input/output data │ │
│ │ • User feedback │ │
│ └─────────────────────────────────────────┘ │
└─────────────────────────────────────────────────┘
```
### Typical Improvement Cycle
1. **Agent deployed** with initial goal-derived logic
2. **Failures captured** automatically with full context
3. **Coding agent analyzes** failure patterns
4. **Graph evolved** with improved logic
5. **Automatic validation** via test cases
6. **Auto-redeployment** (with optional human approval)
7. **Continuous improvement** as more data arrives
**Timeline:** Minutes to hours per improvement
### Examples of Self-Improving Systems
- Aden's goal-driven agents
- Custom evolutionary architectures
- Reinforcement learning agents
- Meta-learning systems
---
## When Failures Happen
### Static Agent Response
```python
# Static agent: failures require manual intervention
try:
    result = agent.execute(task)
except AgentError as e:
    logger.error(f"Agent failed: {e}")
    alert_team(e)  # a human must investigate
    return fallback_response()

# Improvement requires:
# 1. Developer reviews logs
# 2. Identifies root cause
# 3. Writes fix
# 4. Tests fix
# 5. Deploys update
```
### Self-Improving Agent Response
```python
# Self-improving agent: failures trigger evolution
try:
    result = agent.execute(task)
except AgentError as e:
    # Automatic failure capture
    failure_data = {
        "error": e,
        "input": task,
        "context": agent.get_context(),
        "trace": agent.get_execution_trace()
    }
    # Coding agent evolves the system
    improved_graph = coding_agent.evolve(
        current_graph=agent.graph,
        failure_data=failure_data
    )
    # Validate and redeploy
    if improved_graph.passes_tests():
        agent.update_graph(improved_graph)
        # Retry with improved agent
        result = agent.execute(task)
```
---
## Advantages of Each Approach
### Static Agents: Advantages
1. **Predictability**
- Behavior is deterministic
- Easy to test and verify
- No unexpected changes
2. **Simplicity**
- Easier to understand
- Straightforward debugging
- Lower initial complexity
3. **Control**
- Full visibility into logic
- Manual approval of all changes
- Compliance-friendly
4. **Stability**
- No regression from auto-changes
- Consistent performance
- Known failure modes
### Self-Improving Agents: Advantages
1. **Adaptability**
- Improves without human intervention
- Handles novel situations
- Evolves with changing needs
2. **Efficiency**
- Faster improvement cycles
- Reduced developer time
- Lower maintenance burden
3. **Resilience**
- Self-healing from failures
- Automatic recovery
- Continuous optimization
4. **Scale**
- Handles more edge cases
- Improves across all instances
- Compounds improvements over time
---
## Challenges of Each Approach
### Static Agents: Challenges
- **Slow iteration**: Days/weeks to improve
- **Developer bottleneck**: Changes require engineering time
- **Scaling issues**: More edge cases = more manual work
- **Technical debt**: Accumulated workarounds
### Self-Improving Agents: Challenges
- **Unpredictability**: Behavior may change unexpectedly
- **Complexity**: Harder to understand current state
- **Guardrails needed**: Must prevent harmful evolution
- **Debugging**: Harder to trace why an agent behaves a certain way
---
## Guardrails for Self-Improving Agents
Successful self-improving systems need safety mechanisms:
### 1. Human-in-the-Loop Checkpoints
```
Evolution proposed → Human review → Approve/Reject
```
### 2. Test Case Validation
```
Improved agent must pass:
- Original test cases
- Regression tests
- New edge case tests
```
### 3. Gradual Rollout
```
Evolution stages:
1. Shadow mode (compare outputs)
2. Canary deployment (small traffic)
3. Full rollout (all traffic)
```
### 4. Rollback Capability
```
If metrics degrade:
- Automatic revert to previous version
- Alert team for investigation
```
### 5. Evolution Constraints
```
Coding agent cannot:
- Remove human checkpoints
- Bypass security measures
- Exceed cost budgets
- Change core objectives
```
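A check along these lines could gate every proposed change before it is applied; the change representation here is hypothetical:

```python
FORBIDDEN_ACTIONS = {
    "remove_human_checkpoint",
    "disable_security_check",
    "raise_cost_budget",
    "modify_core_objective",
}

def validate_evolution(proposed_changes: list) -> list:
    """Return the list of constraint violations; empty means safe to apply."""
    return [
        f"forbidden: {change['action']}"
        for change in proposed_changes
        if change["action"] in FORBIDDEN_ACTIONS
    ]

violations = validate_evolution([
    {"action": "add_retry_node"},
    {"action": "remove_human_checkpoint"},
])
```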
---
## Real-World Scenarios
### Scenario 1: Customer Support Agent
**Static Approach:**
- Agent handles known query types
- New query types → escalate to human
- Developer adds new handlers quarterly
- Slow to adapt to trends
**Self-Improving Approach:**
- Agent learns from successful resolutions
- New patterns automatically incorporated
- Escalation rules evolve based on outcomes
- Continuously adapts to customer needs
### Scenario 2: Data Processing Pipeline
**Static Approach:**
- Fixed schema expectations
- New data formats → pipeline breaks
- Manual updates for each change
- High maintenance burden
**Self-Improving Approach:**
- Learns new data patterns
- Automatically adapts to schema changes
- Self-corrects processing errors
- Lower long-term maintenance
### Scenario 3: Content Generation
**Static Approach:**
- Fixed style and structure
- All changes require prompt updates
- No learning from feedback
- Consistent but may become stale
**Self-Improving Approach:**
- Learns from editor feedback
- Style evolves with brand changes
- Improves quality over time
- Balances consistency with growth
---
## Making the Choice
### Choose Static Agents When:
| Situation | Reason |
|-----------|--------|
| Regulatory requirements | Need audit trail of logic |
| Safety-critical systems | Predictability essential |
| Simple, stable workflows | No need for adaptation |
| Small scale | Manual updates manageable |
| High trust requirements | Must explain all decisions |
### Choose Self-Improving Agents When:
| Situation | Reason |
|-----------|--------|
| Rapidly changing requirements | Manual updates too slow |
| High volume of edge cases | Can't manually handle all |
| Continuous improvement needed | Competitive advantage |
| Developer time is limited | Automation essential |
| Long-running systems | Evolution provides value |
---
## Implementing Self-Improvement
### With Aden
Aden provides built-in self-improvement through:
1. **Goal-driven generation**: Coding agent creates initial system
2. **Failure capture**: Automatic context collection
3. **Evolution engine**: Coding agent improves graph
4. **Validation**: Test cases verify improvements
5. **Deployment**: Automatic with optional approval
### DIY Approach
Building your own requires:
1. **Failure logging**: Comprehensive context capture
2. **Analysis system**: Pattern recognition in failures
3. **Code generation**: LLM-based improvement proposals
4. **Testing framework**: Automated validation
5. **Deployment pipeline**: Safe rollout mechanism
---
## Conclusion
The choice between static and self-improving agents depends on your priorities:
- **Static agents** offer predictability and control, ideal for stable, regulated environments
- **Self-improving agents** offer adaptability and efficiency, ideal for dynamic, scaling systems
The future likely belongs to **hybrid approaches**: core logic that's stable and auditable, with adaptive components that evolve safely within guardrails.
Frameworks like Aden are pioneering this space, making self-improvement accessible while maintaining the safety and oversight that production systems require.
---
*Last updated: January 2025*
@@ -1,326 +0,0 @@
# Top 10 AI Agent Frameworks in 2025
*A comprehensive guide to the leading frameworks for building AI agents*
---
The AI agent landscape has exploded with options for developers. Whether you're building RAG applications, multi-agent systems, or autonomous workflows, choosing the right framework can significantly impact your project's success.
This guide objectively compares the top 10 AI agent frameworks based on architecture, use cases, and production readiness.
---
## Quick Comparison
| Framework | Best For | Language | Open Source | Self-Improving |
|-----------|----------|----------|-------------|----------------|
| LangChain | RAG & LLM apps | Python/JS | Yes | No |
| CrewAI | Role-based teams | Python | Yes | No |
| AutoGen | Conversational agents | Python | Yes | No |
| Aden | Self-evolving agents | Python/TS | Yes | Yes |
| PydanticAI | Type-safe workflows | Python | Yes | No |
| Swarm | Simple orchestration | Python | Yes | No |
| CAMEL | Research simulations | Python | Yes | No |
| Letta | Stateful memory | Python | Yes | No |
| Mastra | Full-stack AI | TypeScript | Yes | No |
| Haystack | Search & RAG | Python | Yes | No |
---
## 1. LangChain
**Category:** Component Library
**Best For:** RAG applications, LLM-powered apps
**Language:** Python, JavaScript
### Overview
LangChain is one of the most popular frameworks for building LLM applications. It provides a comprehensive set of components for chains, agents, and retrieval-augmented generation.
### Strengths
- Extensive documentation and community
- Wide integration ecosystem
- Flexible component architecture
- Strong RAG capabilities
### Limitations
- Can be complex for simple use cases
- Requires manual workflow definition
- No built-in self-improvement mechanisms
- Debugging can be challenging
### When to Use
Choose LangChain when you need a mature ecosystem with lots of integrations and are building document-centric applications.
---
## 2. CrewAI
**Category:** Multi-Agent Orchestration
**Best For:** Role-based agent teams
**Language:** Python
### Overview
CrewAI enables you to create teams of AI agents with defined roles that collaborate to accomplish tasks. It emphasizes simplicity and role-based organization.
### Strengths
- Intuitive role-based design
- Clean API for team creation
- Good for collaborative workflows
- Active community
### Limitations
- Predefined collaboration patterns
- Limited adaptation to failures
- Manual workflow definition required
- Scaling can be complex
### When to Use
Choose CrewAI when you have well-defined roles and want agents to collaborate in predictable patterns.
---
## 3. AutoGen
**Category:** Conversational Agents
**Best For:** Multi-agent conversations
**Language:** Python
### Overview
Microsoft's AutoGen framework specializes in conversational AI agents that can engage in complex multi-turn dialogues and collaborate through conversation.
### Strengths
- Strong conversational capabilities
- Microsoft backing and support
- Good for dialogue-heavy applications
- Flexible agent configuration
### Limitations
- Conversation-centric (less suited for other patterns)
- Complex setup for non-conversational tasks
- No automatic evolution
### When to Use
Choose AutoGen when your agents primarily need to communicate through natural language conversations.
---
## 4. Aden
**Category:** Self-Evolving Agent Framework
**Best For:** Production systems that need to adapt
**Language:** Python SDK, TypeScript backend
### Overview
Aden takes a fundamentally different approach by using a coding agent to generate agent systems from natural language goals. When agents fail, the framework automatically captures failure data, evolves the agent graph, and redeploys.
### Strengths
- Goal-driven development (describe outcomes, not workflows)
- Automatic self-improvement from failures
- Built-in observability and cost controls
- Human-in-the-loop support
- Production-ready with monitoring dashboard
### Limitations
- Newer framework with growing ecosystem
- Requires understanding of goal-driven paradigm
- Overkill for simple, static workflows
### When to Use
Choose Aden when you need agents that improve over time, want to define goals rather than workflows, or require production-grade observability and cost management.
---
## 5. PydanticAI
**Category:** Type-Safe Framework
**Best For:** Structured, validated outputs
**Language:** Python
### Overview
PydanticAI brings type safety and validation to AI agent development, ensuring outputs conform to defined schemas.
### Strengths
- Strong type validation
- Clean, Pythonic API
- Good for structured outputs
- Reliable data handling
### Limitations
- Best for known workflow patterns
- Less flexible for dynamic scenarios
- No self-adaptation
### When to Use
Choose PydanticAI when output structure and validation are critical to your application.
---
## 6. Swarm
**Category:** Lightweight Orchestration
**Best For:** Simple multi-agent setups
**Language:** Python
### Overview
OpenAI's Swarm provides a minimal framework for orchestrating multiple agents with simple handoff patterns.
### Strengths
- Extremely simple API
- Easy to understand and use
- Good for learning
- Minimal overhead
### Limitations
- Limited features for production
- No built-in monitoring
- Simple handoff patterns only
### When to Use
Choose Swarm for prototyping or simple multi-agent interactions where complexity isn't needed.
---
## 7. CAMEL
**Category:** Research Framework
**Best For:** Large-scale agent simulations
**Language:** Python
### Overview
CAMEL is designed for studying emergent behavior in large-scale multi-agent systems, supporting up to 1M agents.
### Strengths
- Massive scale support
- Research-oriented features
- Good for studying emergence
- Academic backing
### Limitations
- Research-focused, not production-ready
- Steep learning curve
- Limited production tooling
### When to Use
Choose CAMEL for academic research or when studying large-scale agent interactions.
---
## 8. Letta (formerly MemGPT)
**Category:** Stateful Memory
**Best For:** Long-term memory agents
**Language:** Python
### Overview
Letta specializes in agents with sophisticated long-term memory, allowing agents to maintain context across extended interactions.
### Strengths
- Advanced memory management
- Long-term context retention
- Good for personal assistants
- Unique memory architecture
### Limitations
- Memory-focused (less general purpose)
- Complex memory tuning
- Specific use cases
### When to Use
Choose Letta when long-term memory and context retention are primary requirements.
---
## 9. Mastra
**Category:** Full-Stack AI Framework
**Best For:** TypeScript developers
**Language:** TypeScript
### Overview
Mastra provides a TypeScript-first approach to building AI applications with integrated tooling.
### Strengths
- TypeScript native
- Full-stack integration
- Modern developer experience
- Good for web applications
### Limitations
- TypeScript only
- Smaller ecosystem
- Less mature than alternatives
### When to Use
Choose Mastra when building TypeScript applications and want tight integration with web technologies.
---
## 10. Haystack
**Category:** Search & RAG
**Best For:** Document processing pipelines
**Language:** Python
### Overview
Haystack excels at building search and retrieval systems, with strong support for document processing pipelines.
### Strengths
- Excellent for search applications
- Strong document processing
- Production-tested
- Good pipeline abstractions
### Limitations
- Search/RAG focused
- Less suited for general agents
- Pipeline-centric design
### When to Use
Choose Haystack when building search, Q&A, or document processing systems.
---
## Decision Framework
### Choose Based on Your Primary Need
| Need | Recommended Framework |
|------|----------------------|
| RAG / Document apps | LangChain, Haystack |
| Role-based teams | CrewAI |
| Conversational agents | AutoGen |
| Self-improving systems | Aden |
| Type-safe outputs | PydanticAI |
| Simple prototypes | Swarm |
| Research simulations | CAMEL |
| Long-term memory | Letta |
| TypeScript apps | Mastra |
### Choose Based on Production Requirements
| Requirement | Best Options |
|-------------|--------------|
| Self-healing & adaptation | Aden |
| Mature ecosystem | LangChain |
| Cost management built-in | Aden |
| Simple deployment | Swarm, CrewAI |
| Enterprise support | LangChain, AutoGen |
| Real-time monitoring | Aden |
---
## Conclusion
The "best" framework depends on your specific needs:
- **For most RAG applications:** LangChain remains the standard
- **For collaborative agent teams:** CrewAI offers intuitive design
- **For systems that need to evolve:** Aden's self-improving approach is unique
- **For research:** CAMEL provides scale
- **For simplicity:** Swarm is hard to beat
Consider your production requirements, team expertise, and whether you need agents that can adapt and improve over time when making your decision.
---
*Last updated: January 2025*
@@ -1,161 +0,0 @@
# Phase 2: FunctionNode Removal + Dead Code Cleanup
> Ref: [GitHub Issue #4753](https://github.com/adenhq/hive/issues/4753)
## Context
`FunctionNode` (`node_type="function"`) breaks three core agent principles: conversation continuity, cumulative tools, and user interruptibility. Phase 1 (soft deprecation warnings) is complete. This plan covers Phase 2 (hard removal) plus cleanup of other dead code discovered during scoping.
**Total estimated removal: more than 5,000 lines** across production code, tests, docs, and examples.
---
## Part 1: Remove `FunctionNode` class and `"function"` node type
### 1.1 Core framework
| File | What to remove/change |
|---|---|
| `core/framework/graph/node.py` | Delete `FunctionNode` class (~L1878-1985). Remove `function` field from `NodeSpec` (~L200). |
| `core/framework/graph/executor.py` | Remove `FunctionNode` import (~L24). Remove `"function"` from `VALID_NODE_TYPES` (~L1473). Remove `node_type == "function"` branch (~L1529-1533). Remove `register_function()` (~L1975-1977). Add migration error for graphs with `node_type="function"`. |
| `core/framework/builder/workflow.py` | Remove `node_type == "function"` validation block (~L258-260). |
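As a sketch, the migration error called for above might look like this (hypothetical names; the real validation hook in `executor.py` will differ):

```python
# Hypothetical sketch of the Phase 2 migration error (illustrative names only).
REMOVED_NODE_TYPES = {
    "function": 'removed in Phase 2; rewrite the node as an "event_loop" node (see issue #4753).',
}


def validate_node_type(node_type: str) -> None:
    """Fail fast with migration guidance when a graph uses a removed node type."""
    if node_type in REMOVED_NODE_TYPES:
        raise RuntimeError(
            f'node_type="{node_type}" was ' + REMOVED_NODE_TYPES[node_type]
        )


validate_node_type("event_loop")  # still a valid type, passes silently
```

The key property is that loading an old graph fails loudly at validation time with a pointer to the replacement, rather than silently misbehaving at execution time.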
### 1.2 Builder Package Generator
| File | What to change |
|---|---|
| `core/framework/builder/package_generator.py` | Remove `"function"` from `node_type` description in `add_node` and `update_node`. Remove `node_type == "function"` simulation branch in `test_node`. |
### 1.3 Examples & demos
| File | Action |
|---|---|
| `core/examples/manual_agent.py` | Rewrite to use `event_loop` nodes |
| `core/demos/github_outreach_demo.py` | Convert `Sender` node from `function` to `event_loop` |
| `core/examples/mcp_integration_example.py` | Rewrite to use `event_loop` nodes |
### 1.4 Docs & skills
| File | Action |
|---|---|
| `docs/developer-guide.md` | Remove `"function"` from node type table (~L495, L856) |
| `docs/developer-guide.md` | Remove `"function"` node type reference (~L613) |
| `core/MCP_SERVER_GUIDE.md` | Audit for `"function"` references |
| `docs/why-conditional-edge-priority.md` | Remove or repurpose (entire doc framed around function nodes) |
| `docs/environment-setup.md` | Remove "function" from node types list (~L216) |
| `docs/i18n/*.md` | Update BUILD diagrams in 7 i18n files (ja, ko, pt, hi, es, ru, zh-CN) removing "Function" |
| `core/framework/runtime/runtime_log_schemas.py` | Remove `"function"` from node_type comment (~L40) |
---
## Part 2: Remove deprecated `LLMNode` + `llm_tool_use` / `llm_generate`
Already soft-deprecated with `DeprecationWarning`. No template agent uses them. Only `mcp_integration_example.py` references them.
| File | What to remove/change |
|---|---|
| `core/framework/graph/node.py` | Delete `LLMNode` class (~L660-1689, ~1000 lines). Largest single removal. |
| `core/framework/graph/executor.py` | Remove `LLMNode` import. Remove `"llm_tool_use"`/`"llm_generate"` from `VALID_NODE_TYPES`. Remove `DEPRECATED_NODE_TYPES` dict. Remove their branches in `_get_node_implementation` (~L1507-1523). Update `human_input` branch to use `EventLoopNode` instead of `LLMNode`. Add migration error for deprecated types. |
| `core/framework/builder/package_generator.py` | Remove `llm_tool_use`/`llm_generate` validation warnings and branches |
---
## Part 3: Rewrite tests using `function` nodes as fixtures
These tests use `node_type="function"` as convenient scaffolding but actually test graph execution features (retries, fan-out, feedback edges, etc.). They all need rewriting.
| Test file | What it tests |
|---|---|
| `core/tests/test_on_failure_edges.py` | On-failure edge routing (~10 function nodes) |
| `core/tests/test_executor_feedback_edges.py` | Max node visits, feedback loops (~20+ function nodes) |
| `core/tests/test_executor_max_retries.py` | Retry behavior (~7 function nodes) |
| `core/tests/test_fanout.py` | Fan-out/fan-in parallel execution (~20+ function nodes) |
| `core/tests/test_execution_quality.py` | Retry + quality scoring (~8 function nodes) |
| `core/tests/test_conditional_edge_direct_key.py` | Conditional edge evaluation (~8 function nodes) |
| `core/tests/test_event_loop_integration.py` | Mixed node graph test (~2 function nodes) |
| `core/tests/test_runtime_logger.py` | Runtime log schema (~2 references) |
| `tools/tests/tools/test_runtime_logs_tool.py` | Log tool output (~2 references) |
**Strategy:** Create a `MockNode(NodeProtocol)` test helper that wraps a callable, providing the same convenience as `FunctionNode` but scoped to tests only. Tests swap `node_type="function"` for a neutral `node_type="event_loop"` and register a `MockNode` in the executor's `node_registry`. This minimizes rewrite effort.
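A minimal sketch of that helper (the real `NodeProtocol` interface in `core/framework/graph/node.py` will differ; the names and `execute` signature here are illustrative):

```python
from typing import Any, Callable


class MockNode:
    """Test-only node that wraps a plain callable, standing in for FunctionNode.

    Registered in the executor's node_registry under a neutral node_type so
    graph-level tests (retries, fan-out, feedback edges) keep their fixtures
    without depending on the removed "function" node type.
    """

    def __init__(self, fn: Callable[[dict], Any]):
        self.fn = fn

    def execute(self, state: dict) -> Any:
        # Delegate to the wrapped callable, mirroring FunctionNode's old role
        return self.fn(state)


# In a test, a fixture node is then one line:
node = MockNode(lambda state: state["x"] * 2)
print(node.execute({"x": 21}))  # 42
```

Because the helper lives in the test tree, it carries none of `FunctionNode`'s production-code obligations (conversation continuity, cumulative tools, interruptibility) while preserving the convenience the fixtures relied on.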
---
## Part 4: Items NOT recommended for removal
| Item | Reason to keep |
|---|---|
| `RouterNode` | Architecturally sound (deterministic routing), just lacks template examples |
| `human_input` node type | Valid HITL pattern, but switch implementation from `LLMNode` to `EventLoopNode` |
| `register_function` in `tool_registry.py` | For **tool** registration — completely different concept from function nodes |
---
## Part 5: Remove the Planner-Worker subsystem (~3,900 lines dead code)
The entire Planner-Worker-Judge pattern has **zero external consumers**. No template agent, example, demo, or runner references it. It is only consumed by:
- Its own internal files (self-referential imports)
- The builder package generator (exposes tools for it)
- Its own dedicated tests
### 5.1 Delete these files entirely
| File | Lines | What |
|---|---|---|
| `core/framework/graph/flexible_executor.py` | 552 | `FlexibleGraphExecutor` — Worker-Judge orchestrator |
| `core/framework/graph/worker_node.py` | 620 | `WorkerNode` — plan step dispatcher |
| `core/framework/graph/plan.py` | 513 | `Plan`, `PlanStep`, `ActionType`, `ActionSpec` data structures |
| `core/framework/graph/judge.py` | 406 | `HybridJudge` — step result evaluator |
| `core/framework/graph/code_sandbox.py` | 413 | `CodeSandbox` — sandboxed code execution |
| `core/tests/test_flexible_executor.py` | 442 | FlexibleGraphExecutor tests |
| `core/tests/test_plan.py` | 592 | Plan data structure tests |
| `core/tests/test_plan_dependency_resolution.py` | 384 | Plan dependency resolution tests |
### 5.2 Clean up exports
`core/framework/graph/__init__.py` — Remove all planner-worker exports: `FlexibleGraphExecutor`, `ExecutorConfig`, `WorkerNode`, `StepExecutionResult`, `HybridJudge`, `create_default_judge`, `CodeSandbox`, `safe_eval`, `safe_exec`, `Plan`, `PlanStep`, `ActionType`, `ActionSpec`, and all related symbols.
### 5.3 Remove MCP tools from builder package generator
`core/framework/builder/package_generator.py` — Remove these 7 MCP tools:
| MCP tool | Description |
|---|---|
| `create_plan` | Creates a plan with steps |
| `validate_plan` | Validates plan structure |
| `simulate_plan_execution` | Dry-run simulation |
| `load_exported_plan` | Loads plan from JSON |
| `add_evaluation_rule` | Adds HybridJudge rule |
| `list_evaluation_rules` | Lists evaluation rules |
| `remove_evaluation_rule` | Removes evaluation rule |
Also remove:
- `from framework.graph.plan import Plan` import (~L39, L3731)
- `_evaluation_rules` global list (~L2528)
- `"evaluation_rules"` from export/session data (~L1859)
- `load_plan_from_json()` helper function (~L3721-3733)
---
## Execution order
1. **Create `MockNode` test helper** — unblocks all test rewrites
2. **Rewrite tests** using function nodes as fixtures (Part 3)
3. **Remove `FunctionNode` class + all references** (Part 1)
4. **Remove `LLMNode` class + deprecated types** (Part 2)
5. **Delete Planner-Worker subsystem files** (Part 5.1)
6. **Clean up `__init__.py` exports** (Part 5.2)
7. **Remove MCP tools** for plans/evaluation from builder package generator (Part 5.3)
8. **Update examples/demos/docs/skills** (Parts 1.3, 1.4)
9. **Run full test suite** to verify
---
## Verification
1. `pytest core/tests/` — all tests pass
2. `pytest tools/tests/` — runtime log tests pass
3. Load any template agent JSON — no errors
4. Attempt to load a graph with `node_type="function"` — clear `RuntimeError` with migration guidance
5. Attempt to load a graph with `node_type="llm_tool_use"` — clear `RuntimeError` with migration guidance
6. Builder package generator: `add_node` with `node_type="function"` — rejected with helpful message
7. Plan/evaluation MCP tools no longer appear in tool list
@@ -1,157 +0,0 @@
# 🚀 Software Development Engineer
**Location:** San Francisco, CA (Hybrid) or Remote
**Type:** Full-time
**Team:** Engineering
---
## About Aden
We're building the future of AI agents. Aden is an open-source framework for creating self-improving, production-ready AI agents with built-in cost controls, human-in-the-loop capabilities, and comprehensive observability.
Our mission: Make AI agents reliable enough for real-world production use.
---
## The Role
We're looking for a Software Development Engineer to help build and scale our AI agent platform. You'll work across the full stack, from our React dashboard to our Node.js backend, contributing to core infrastructure that powers autonomous AI systems.
This is an opportunity to work on cutting-edge AI infrastructure alongside a small, experienced team passionate about shipping great software.
---
## What You'll Do
- Build and maintain features across our full-stack TypeScript codebase
- Design and implement APIs for agent management, monitoring, and control
- Work with real-time systems (WebSockets, event streaming)
- Optimize database performance (TimescaleDB, MongoDB, Redis)
- Contribute to our Model Context Protocol (MCP) server and tooling
- Collaborate on architecture decisions for scalability and reliability
- Write clean, tested, well-documented code
- Participate in code reviews and help maintain code quality
---
## Tech Stack
**Frontend (Honeycomb Dashboard)**
- React 18 + TypeScript
- Vite
- Tailwind CSS + Radix UI
- Zustand (state management)
- TanStack Query
- Recharts + Vega (data visualization)
- Socket.io (real-time updates)
**Backend (Hive)**
- Node.js + Express + TypeScript
- Socket.io (WebSocket)
- Model Context Protocol (MCP)
- Zod (validation)
- Passport + JWT (authentication)
**Data Layer**
- TimescaleDB (time-series metrics)
- MongoDB (policies, configuration)
- Redis (caching, pub/sub)
**Infrastructure**
- Docker + Docker Compose
- Kubernetes + Kustomize
- GitHub Actions (CI/CD)
- Nginx
---
## What We're Looking For
**Required:**
- 2+ years of professional software development experience
- Strong proficiency in TypeScript and Node.js
- Experience with React and modern frontend development
- Familiarity with SQL and NoSQL databases
- Understanding of RESTful APIs and WebSocket communication
- Comfortable with Git and collaborative development workflows
- Strong problem-solving skills and attention to detail
**Nice to Have:**
- Experience with AI/LLM applications or agent frameworks
- Knowledge of time-series databases (TimescaleDB, InfluxDB)
- Kubernetes and container orchestration experience
- Experience with real-time systems at scale
- Contributions to open-source projects
- Familiarity with Model Context Protocol (MCP)
---
## What We Offer
- Competitive salary + equity
- Health, dental, and vision insurance
- Flexible work arrangements (hybrid/remote)
- Learning & development budget
- Home office setup stipend
- Opportunity to work on open-source AI infrastructure
- Small team, big impact
---
## How to Apply
**Show us what you can do by contributing to our open-source project:**
1. **Solve an existing issue**
- Browse our [GitHub Issues](https://github.com/adenhq/hive/issues)
- Look for issues labeled `good first issue` or `help wanted`
- Comment on the issue to claim it
- Submit a Pull Request with your solution
2. **Create new issues**
- Found a bug? Report it with clear reproduction steps
- Have an idea? Open a feature request with your proposal
- Spotted documentation gaps? Suggest improvements
- Quality issues that show you understand the codebase stand out
3. **Submit Pull Requests**
- Fix bugs, add features, or improve documentation
- Follow our contribution guidelines
- Write clear PR descriptions explaining your changes
- Respond to code review feedback
4. **Submit your application:**
- Email: `contact@adenhq.com`
- Subject: `[SDE] Your Name`
- Include:
- Resume/CV
- GitHub profile
- Links to your Issues and/or PRs on our repo
- Brief intro about yourself
5. **What happens next:**
- We review your contributions (1-2 weeks)
- Technical interview (60 min)
- Team interview (45 min)
- Offer 🎉
---
## Why Join Us?
- **Impact:** Your code will power AI agents used by developers worldwide
- **Open Source:** Everything we build is open source
- **Learning:** Work with cutting-edge AI and distributed systems
- **Culture:** Small team, low ego, high trust, ship fast
- **Growth:** Early-stage company with room to grow
---
*Aden is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.*
---
**Questions?** Email us at `contact@adenhq.com` or open an issue on [GitHub](https://github.com/adenhq/hive).
Made with 🔥 Passion in San Francisco
@@ -1,165 +0,0 @@
# 🚀 Getting Started Challenge
Welcome to Aden! This challenge will help you get familiar with our project and community. Complete all tasks to earn your first badge!
**Difficulty:** Beginner
**Time:** ~30 minutes
**Prerequisites:** GitHub account
---
## Part 1: Join the Aden Community (10 points)
### Task 1.1: Star the Repository ⭐
Show your support by starring our repo!
1. Go to [github.com/adenhq/hive](https://github.com/adenhq/hive)
2. Click the **Star** button in the top right
3. **Screenshot** your starred repo (showing the star count)
### Task 1.2: Watch the Repository 👁️
Stay updated with our latest changes!
1. Click the **Watch** button
2. Select **"All Activity"** to get notifications
3. **Screenshot** your watch settings
### Task 1.3: Fork the Repository 🍴
Create your own copy to experiment with!
1. Click the **Fork** button
2. Keep the default settings and create the fork
3. **Screenshot** your forked repository
### Task 1.4: Join Discord 💬
Connect with our community!
1. Join our [Discord server](https://discord.com/invite/MXE49hrKDk)
2. Introduce yourself in `#introductions`
3. **Screenshot** your introduction message
---
## Part 2: Explore Aden (15 points)
### Task 2.1: README Scavenger Hunt 🔍
Find the answers to these questions by reading our README:
1. What are the **three LLM providers** Aden supports out of the box?
2. How many **MCP tools** does the Hive Control Plane provide?
3. What is the name of the **frontend dashboard**?
4. In the "How It Works" section, what is **Step 5**?
5. What city is Aden made with passion in?
### Task 2.2: Architecture Quiz 🏗️
Based on the architecture diagram in the README:
1. What are the three databases in the Storage Layer?
2. Name two components inside an "SDK-Wrapped Node"
3. What connects the Control Plane to the Dashboard?
4. Where does "Failure Data" flow to in the diagram?
### Task 2.3: Comparison Challenge 📊
From the Comparison Table, answer:
1. What category is CrewAI in?
2. What's the Aden difference compared to LangChain?
3. Which framework focuses on "emergent behavior in large-scale simulations"?
---
## Part 3: Quick Code Exploration (15 points)
### Task 3.1: Project Structure 📁
Clone your fork and explore the codebase:
```bash
git clone https://github.com/YOUR_USERNAME/hive.git
cd hive
```
Answer these questions:
1. What is the main frontend folder called?
2. What is the main backend folder called?
3. What file would you edit to configure the application?
4. What's the command to set up the Python environment (hint: check README)?
### Task 3.2: Find the Features 🎯
Look through the codebase to find:
1. Where are the MCP tools defined? (provide the file path)
2. What port does the MCP server run on? (hint: check the tools/Dockerfile)
3. Find one TypeScript interface related to agents (provide file path and interface name)
---
## Part 4: Creative Challenge (10 points)
### Task 4.1: Agent Idea 💡
Aden can build self-improving agents for any use case. Propose ONE creative agent idea:
1. **Name:** Give your agent a catchy name
2. **Goal:** What problem does it solve? (2-3 sentences)
3. **Self-Improvement:** How would it get better over time when things fail?
4. **Human-in-the-Loop:** When would it need human input?
Example format:
```
Name: DocBot
Goal: Automatically keeps documentation in sync with code changes.
Monitors PRs and updates relevant docs.
Self-Improvement: When docs get rejected in review, it learns the feedback
and adjusts its writing style and coverage.
Human-in-the-Loop: Major architectural changes require human approval
before doc updates go live.
```
---
## Submission Checklist
Before submitting, make sure you have:
- [ ] Screenshots from Part 1 (Star, Watch, Fork, Discord)
- [ ] Answers to all Part 2 questions
- [ ] Answers to all Part 3 questions
- [ ] Your creative agent idea from Part 4
### How to Submit
1. Create a GitHub Gist at [gist.github.com](https://gist.github.com)
2. Name it `aden-getting-started-YOURNAME.md`
3. Include all your answers and screenshots (use image hosting like imgur for screenshots)
4. Email the Gist link to `careers@adenhq.com`
- Subject: `[Getting Started Challenge] Your Name`
- Include your GitHub username
---
## Scoring
| Section | Points |
|---------|--------|
| Part 1: Community | 10 |
| Part 2: Explore | 15 |
| Part 3: Code | 15 |
| Part 4: Creative | 10 |
| **Total** | **50** |
**Passing score:** 40+ points
---
## What's Next?
After completing this challenge, choose your specialization:
- **Backend Engineers:** [🧠 Architecture Deep Dive](./02-architecture-deep-dive.md)
- **AI/ML Engineers:** [🤖 Build Your First Agent](./03-build-your-first-agent.md)
- **Frontend Engineers:** [🎨 Frontend Challenge](./04-frontend-challenge.md)
- **DevOps Engineers:** [🔧 DevOps Challenge](./05-devops-challenge.md)
---
Good luck! We're excited to see your submissions! 🎉
@@ -1,195 +0,0 @@
# 🧠 Architecture Deep Dive Challenge
Test your understanding of Aden's architecture and backend systems. This challenge is perfect for backend engineers who want to contribute to the core framework.
**Difficulty:** Intermediate
**Time:** 1-2 hours
**Prerequisites:** Complete [Getting Started](./01-getting-started.md), familiarity with Node.js/TypeScript
---
## Part 1: System Architecture (20 points)
### Task 1.1: Component Mapping 🗺️
Study the Aden architecture and answer:
1. Describe the data flow from when a user defines a goal to when worker agents execute. Include all major components.
2. Explain the "self-improvement loop" - what happens when an agent fails?
3. What's the difference between:
- Coding Agent vs Worker Agent
- STM (Short-Term Memory) vs LTM (Long-Term Memory)
- Hot storage vs Cold storage for events
### Task 1.2: Database Design 💾
Aden uses three databases. For each, explain:
1. **TimescaleDB:** What type of data is stored? Why TimescaleDB specifically?
2. **MongoDB:** What is stored here? Why a document database?
3. **PostgreSQL:** What is its primary purpose?
### Task 1.3: Real-time Communication 📡
Answer these about the real-time systems:
1. What protocol connects the SDK to the Hive backend for policy updates?
2. How does the dashboard receive live agent metrics?
3. What is the heartbeat interval for SDK health checks?
---
## Part 2: Code Analysis (25 points)
### Task 2.1: API Routes 🛣️
Explore the backend code and document:
1. List all the main API route prefixes (e.g., `/user`, `/v1/control`, etc.)
2. For the `/v1/control` routes, what are the main endpoints and their purposes?
3. What authentication method is used for API requests?
### Task 2.2: MCP Tools Deep Dive 🔧
The MCP server provides 19 tools. Categorize them and answer:
1. List all **Budget tools** (tools with "budget" in the name)
2. List all **Analytics tools**
3. List all **Policy tools**
4. Pick ONE tool and explain:
- What parameters does it accept?
- What does it return?
- When would the Coding Agent use it?
### Task 2.3: Event Specification 📊
Find and analyze the SDK event specification:
1. What are the four event types that can be sent from SDK to server?
2. For a `MetricEvent`, list at least 5 fields that are captured
3. What is "Layer 0 content capture" and when is it used?
---
## Part 3: Design Questions (25 points)
### Task 3.1: Scaling Scenario 📈
Imagine Aden needs to handle 1000 concurrent agents across 50 teams:
1. Which components would be the bottleneck? Why?
2. How would you horizontally scale the system?
3. What database optimizations would you recommend?
4. How would you ensure team data isolation at scale?
### Task 3.2: New Feature Design 🆕
Design a new feature: **Agent Collaboration Logs**
Requirements:
- Track when agents communicate with each other
- Store the message content and metadata
- Support querying by time range, agent, or conversation thread
- Real-time streaming to the dashboard
Provide:
1. Database schema design (which DB and table structure)
2. API endpoint design (routes and payloads)
3. How would this integrate with existing event batching?
### Task 3.3: Failure Handling ⚠️
The self-healing loop is core to Aden. Design the detailed flow:
1. How should failures be categorized (types of failures)?
2. What data should be captured for the Coding Agent to improve?
3. How do you prevent infinite failure loops?
4. When should the system escalate to human intervention?
---
## Part 4: Practical Implementation (30 points)
### Task 4.1: Write a New MCP Tool 🛠️
Create a new MCP tool called `hive_agent_performance_report`:
**Requirements:**
- Returns performance metrics for a specific agent over a time period
- Includes: total requests, success rate, avg latency, total cost
- Accepts parameters: `agent_id`, `start_time`, `end_time`
Provide:
1. Tool definition (name, description, input schema)
2. Implementation pseudocode or actual TypeScript
3. Example request and response
### Task 4.2: Budget Enforcement Algorithm 💰
Implement the logic for budget enforcement:
```typescript
interface BudgetCheck {
action: 'allow' | 'block' | 'throttle' | 'degrade';
reason: string;
degradedModel?: string;
delayMs?: number;
}
function checkBudget(
currentSpend: number,
budgetLimit: number,
requestedModel: string,
estimatedCost: number
): BudgetCheck {
// Your implementation here
}
```
Requirements:
- Block if budget would be exceeded
- Throttle (2000ms delay) if ≥95% used
- Degrade to cheaper model if ≥80% used
- Allow otherwise
### Task 4.3: Event Aggregation Query 📈
Write a SQL query for TimescaleDB that:
1. Aggregates metrics by hour for the last 24 hours
2. Groups by model and provider
3. Calculates: total tokens, total cost, avg latency, request count
4. Orders by total cost descending
---
## Submission Checklist
- [ ] All Part 1 architecture answers
- [ ] All Part 2 code analysis answers
- [ ] All Part 3 design documents
- [ ] All Part 4 implementations
### How to Submit
1. Create a GitHub Gist with your answers
2. Name it `aden-architecture-YOURNAME.md`
3. Include any code files as separate files in the Gist
4. Email to `careers@adenhq.com`
- Subject: `[Architecture Challenge] Your Name`
---
## Scoring
| Section | Points |
|---------|--------|
| Part 1: System Architecture | 20 |
| Part 2: Code Analysis | 25 |
| Part 3: Design Questions | 25 |
| Part 4: Implementation | 30 |
| **Total** | **100** |
**Passing score:** 75+ points
---
## Bonus Points (+20)
- Identify a bug or improvement in the actual codebase and open an issue
- Submit a PR fixing a documentation issue
- Create a diagram of your design using Mermaid or similar
---
Good luck! We're looking for engineers who can think systematically about distributed systems! 🏗️
# 🤖 Build Your First Agent Challenge
Get hands-on with AI agents! This challenge is for AI/ML engineers who want to understand agent development and contribute to Aden's agent ecosystem.
**Difficulty:** Intermediate
**Time:** 2-3 hours
**Prerequisites:** Complete [Getting Started](./01-getting-started.md), Python experience, basic LLM knowledge
---
## Part 1: Agent Fundamentals (20 points)
### Task 1.1: Core Concepts 📚
Answer these questions about Aden's agent architecture:
1. What is a "node" in Aden's architecture? How does it differ from a traditional function?
2. Explain the SDK-wrapped node concept. What four capabilities does every node get automatically?
3. What's the difference between:
- A Coding Agent and a Worker Agent
- Goal-driven vs workflow-driven development
- Predefined edges vs dynamic connections
4. Why does Aden generate "connection code" instead of using a fixed graph structure?
### Task 1.2: Memory Systems 🧠
Aden has sophisticated memory management:
1. Describe the three types of memory available to agents:
- Shared Memory
- STM (Short-Term Memory)
- LTM (Long-Term Memory / RLM)
2. When would an agent use each type?
3. How does "Session Local memory isolation" work?
### Task 1.3: Human-in-the-Loop 🙋
Explain the HITL system:
1. What triggers a human intervention point?
2. What happens if a human doesn't respond within the timeout?
3. List three scenarios where HITL would be essential
---
## Part 2: Agent Design (25 points)
### Task 2.1: Design a Multi-Agent System 🎭
Design a **Content Marketing Agent System** with multiple worker agents:
**Goal:** Automatically create and publish blog posts based on company news
Requirements:
- Must use at least 3 specialized worker agents
- Include human approval before publishing
- Handle failures gracefully
Provide:
1. **Agent Diagram:** Show all agents and how they connect
2. **Agent Descriptions:** For each agent, describe:
- Name and role
- Inputs and outputs
- Tools it needs
- Failure scenarios
3. **Human Checkpoints:** Where would humans intervene?
4. **Self-Improvement:** How would this system learn from failures?
### Task 2.2: Goal Definition 🎯
Write a natural language goal that a user might give to create your system:
```
Example Goal:
"Create a system that monitors our company RSS feed for news,
writes engaging blog posts about each news item, gets approval
from the marketing team, and publishes to our WordPress site.
If a post is rejected, learn from the feedback to write better
posts in the future."
```
Your goal should be:
- Clear and specific
- Include success criteria
- Mention failure handling
- Specify human touchpoints
### Task 2.3: Test Cases 📋
Design 5 test cases for your agent system:
| Test Case | Input | Expected Output | Success Criteria |
|-----------|-------|-----------------|------------------|
| Happy Path | Normal news item | Published blog post | Post live on site |
| ... | ... | ... | ... |
Include at least:
- 1 happy path
- 2 edge cases
- 2 failure scenarios
---
## Part 3: Practical Implementation (30 points)
### Task 3.1: Agent Pseudocode 💻
Write pseudocode for ONE of your worker agents:
```python
class ContentWriterAgent:
    """
    Agent that takes news items and writes blog posts.
    """

    def __init__(self, config):
        # Initialize with tools, memory, LLM access
        pass

    async def execute(self, input_data):
        # Main execution logic
        pass

    async def handle_failure(self, error, context):
        # How to handle different types of failures
        pass

    async def learn_from_feedback(self, feedback):
        # How to improve based on rejection feedback
        pass
```
Provide detailed pseudocode with:
- LLM calls and prompts
- Memory reads/writes
- Tool usage
- Error handling
### Task 3.2: Prompt Engineering 📝
Write the actual prompts for your agent:
1. **System Prompt:** The core instructions for your agent
2. **Task Prompt Template:** How tasks are presented to the agent
3. **Feedback Learning Prompt:** How rejection feedback is processed
Example format:
```
SYSTEM PROMPT:
You are a professional content writer for {company_name}...
TASK PROMPT:
Given the following news item:
{news_content}
Write a blog post that...
FEEDBACK PROMPT:
Your previous post was rejected with this feedback:
{feedback}
Analyze what went wrong and...
```
### Task 3.3: Tool Definitions 🔧
Define the tools your agent needs:
```python
tools = [
    {
        "name": "search_company_knowledge",
        "description": "Search internal knowledge base for relevant context",
        "parameters": {
            "query": "string - search query",
            "limit": "int - max results (default 5)"
        },
        "returns": "List of relevant documents"
    },
    # Add more tools...
]
```
Define at least 3 tools with:
- Clear name and description
- Input parameters with types
- Return value description
- Example usage
---
## Part 4: Advanced Challenges (25 points)
### Task 4.1: Failure Evolution Design 🔄
Design the self-improvement mechanism in detail:
1. **Failure Classification:** Create a taxonomy of failures for your agent
```
- LLM Failures: rate limit, content filter, hallucination
- Tool Failures: API down, invalid response, timeout
- Logic Failures: wrong output format, missing data
- Human Rejection: quality issues, off-brand, factual error
```
2. **Learning Storage:** What data do you store for each failure type?
3. **Evolution Strategy:** How does the Coding Agent use failure data to improve?
4. **Guardrails:** What prevents the system from making things worse?
### Task 4.2: Cost Optimization 💰
Your agent system will be called frequently. Design cost optimizations:
1. **Model Selection:** When to use GPT-4 vs GPT-3.5 vs Claude Haiku?
2. **Caching Strategy:** What can be cached to reduce LLM calls?
3. **Batching:** How can you batch operations for efficiency?
4. **Budget Rules:** Design budget rules for your system
### Task 4.3: Observability Dashboard 📊
Design what metrics should be tracked for your agent system:
1. **Performance Metrics:** (at least 5)
2. **Quality Metrics:** (at least 3)
3. **Cost Metrics:** (at least 3)
4. **Alert Conditions:** When should the system alert humans?
---
## Submission Checklist
- [ ] All Part 1 concept answers
- [ ] Complete multi-agent design (Part 2)
- [ ] Implementation code/pseudocode (Part 3)
- [ ] Advanced challenge solutions (Part 4)
### How to Submit
1. Create a GitHub Gist with your answers
2. Name it `aden-agent-challenge-YOURNAME.md`
3. Include code files separately
4. If you created diagrams, include images
5. Email to `careers@adenhq.com`
- Subject: `[Agent Challenge] Your Name`
---
## Scoring
| Section | Points |
|---------|--------|
| Part 1: Fundamentals | 20 |
| Part 2: Design | 25 |
| Part 3: Implementation | 30 |
| Part 4: Advanced | 25 |
| **Total** | **100** |
**Passing score:** 75+ points
---
## Bonus Points (+25)
- **+10:** Actually implement a working prototype using any framework
- **+10:** Create a demo video of your agent in action
- **+5:** Submit a PR adding your agent as a template to the repo
---
## Example Agent Templates
Need inspiration? Here are some agent ideas:
1. **Research Agent:** Gathers information from multiple sources
2. **Code Review Agent:** Reviews PRs and suggests improvements
3. **Customer Support Agent:** Handles support tickets with escalation
4. **Data Pipeline Agent:** Monitors and fixes data quality issues
5. **Meeting Agent:** Summarizes meetings and creates action items
---
Good luck! We're excited to see your creative agent designs! 🤖✨
# 🎨 Frontend Challenge
Build beautiful, functional interfaces for AI agent management! This challenge is for frontend engineers who want to contribute to Honeycomb, Aden's dashboard.
**Difficulty:** Intermediate
**Time:** 1-2 hours
**Prerequisites:** Complete [Getting Started](./01-getting-started.md), React/TypeScript experience
---
## Part 1: Codebase Exploration (15 points)
### Task 1.1: Tech Stack Analysis 🔍
Explore the `honeycomb/` directory and answer:
1. What React version is used?
2. What styling solution is used? (Tailwind, CSS Modules, etc.)
3. What state management approach is used?
4. What charting library is used for analytics?
5. How does the frontend communicate with the backend in real-time?
### Task 1.2: Component Structure 📁
Map out the component architecture:
1. List the main page components (routes)
2. Find and describe 3 reusable components
3. Where are TypeScript types defined for agent data?
4. How is authentication handled in the frontend?
### Task 1.3: Design System 🎨
Analyze the UI patterns:
1. What UI component library is used? (Radix, shadcn, etc.)
2. Find 3 custom components that aren't from a library
3. What color scheme/theme approach is used?
4. How are loading and error states typically handled?
---
## Part 2: UI/UX Analysis (20 points)
### Task 2.1: Dashboard Critique 📊
Based on the codebase and agent control types, analyze what the dashboard likely shows:
1. What key metrics would you display for agent monitoring?
2. How would you visualize the agent graph/connections?
3. What real-time updates are most important to show?
4. Critique: What could be improved in the current approach?
### Task 2.2: User Flow Design 🔄
Design the user flow for this feature:
**Feature:** "Create New Agent from Goal"
Map out:
1. Entry point (where does the user start?)
2. Step-by-step screens needed
3. Form fields and validation
4. Success/error states
5. How to show agent generation progress
Provide a wireframe (can be ASCII, hand-drawn, or Figma):
```
+----------------------------------+
| Create New Agent |
|----------------------------------|
| Step 1: Define Your Goal |
| +----------------------------+ |
| | Describe what you want | |
| | your agent to achieve... | |
| +----------------------------+ |
| |
| [ ] Include human checkpoints |
| [ ] Enable cost controls |
| |
| [Cancel] [Next Step] |
+----------------------------------+
```
### Task 2.3: Accessibility Audit ♿
Consider accessibility for the agent dashboard:
1. List 5 accessibility requirements for a data-heavy dashboard
2. How would you make real-time updates accessible?
3. What keyboard navigation is essential?
4. How would you handle screen readers for the agent graph visualization?
---
## Part 3: Implementation Challenges (35 points)
### Task 3.1: Build a Component 🧱
Create a React component: `AgentStatusCard`
Requirements:
- Display agent name, status, and key metrics
- Status: online (green), degraded (yellow), offline (red), unknown (gray)
- Show: requests/min, success rate, avg latency, cost today
- Include a mini sparkline chart for requests over last hour
- Expandable to show more details
- TypeScript with proper types
```tsx
interface AgentStatusCardProps {
  agent: {
    id: string;
    name: string;
    status: 'online' | 'degraded' | 'offline' | 'unknown';
    metrics: {
      requestsPerMinute: number;
      successRate: number;
      avgLatency: number;
      costToday: number;
      requestHistory: number[]; // last 60 minutes
    };
  };
  onExpand?: () => void;
  expanded?: boolean;
}

export function AgentStatusCard({ agent, onExpand, expanded }: AgentStatusCardProps) {
  // Your implementation
}
```
### Task 3.2: Real-time Hook 🔌
Create a custom hook for real-time agent metrics:
```tsx
interface UseAgentMetricsOptions {
  agentId: string;
  refreshInterval?: number;
}

interface UseAgentMetricsResult {
  metrics: AgentMetrics | null;
  isLoading: boolean;
  error: Error | null;
  lastUpdated: Date | null;
}

function useAgentMetrics(options: UseAgentMetricsOptions): UseAgentMetricsResult {
  // Your implementation
  // Should handle:
  // - WebSocket subscription for real-time updates
  // - Fallback to polling if WebSocket unavailable
  // - Cleanup on unmount
  // - Error handling and retry logic
}
```
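
For the retry piece specifically, capped exponential backoff is a common pattern; a minimal helper sketch (function name and defaults are illustrative):

```typescript
// Illustrative helper for the hook's retry logic: exponential backoff
// with a ceiling, so reconnect attempts slow down but never stall forever.
function backoffDelay(attempt: number, baseMs = 500, maxMs = 30_000): number {
  // attempt 0 -> 500ms, 1 -> 1s, 2 -> 2s, ... capped at 30s
  return Math.min(maxMs, baseMs * 2 ** attempt);
}
```

Your hook would reset the attempt counter on a successful reconnect; adding jitter is a worthwhile refinement to discuss.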
### Task 3.3: Data Visualization 📈
Design and implement a cost breakdown chart component:
Requirements:
- Show cost by model (GPT-4, Claude, etc.) as a donut/pie chart
- Show cost over time as a line/area chart
- Toggle between daily/weekly/monthly views
- Animate transitions between views
- Show tooltip with details on hover
Provide:
1. Component interface/props
2. Implementation (can use Recharts, Vega, or any library)
3. Example mock data
4. Responsive design considerations
---
## Part 4: Advanced Frontend (30 points)
### Task 4.1: Agent Graph Visualization 🕸️
Design how to visualize the agent graph:
**Challenge:** Show a dynamic graph where:
- Nodes are agents
- Edges are connections between agents
- Real-time data flows are animated
- Users can zoom, pan, and click for details
Provide:
1. Library choice and justification (D3, React Flow, Cytoscape, etc.)
2. Component architecture
3. Performance considerations for 50+ nodes
4. Interaction design (how users explore the graph)
5. Code sketch for the main component
### Task 4.2: Optimistic UI for Budget Controls 💰
Implement optimistic UI for budget updates:
**Scenario:** User changes an agent's budget limit
- Update should appear instantly
- Backend validation may reject the change
- Must handle race conditions with real-time updates
Provide:
1. State management approach
2. Rollback mechanism on failure
3. Conflict resolution strategy
4. User feedback design
```tsx
function useBudgetUpdate(agentId: string) {
  // Your implementation showing:
  // - Optimistic update
  // - Server sync
  // - Rollback on error
  // - Conflict handling
}
```
### Task 4.3: Performance Optimization ⚡
The dashboard shows data for 100+ agents with real-time updates.
Design optimizations for:
1. **Rendering:** How to prevent unnecessary re-renders?
2. **Data:** How to handle high-frequency WebSocket updates?
3. **Memory:** How to prevent memory leaks with subscriptions?
4. **Initial Load:** How to prioritize visible content?
Provide specific techniques and code examples for each.
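
For the data question, one sketch is to coalesce high-frequency WebSocket messages into batched flushes (class and field names here are illustrative, not part of the codebase):

```typescript
// Illustrative sketch: keep only the latest update per agent and flush
// in batches, so the UI re-renders once per tick instead of per message.
type MetricsUpdate = { agentId: string; value: number };

class UpdateBuffer {
  private pending = new Map<string, MetricsUpdate>();

  push(update: MetricsUpdate): void {
    // Later updates for the same agent overwrite stale intermediates.
    this.pending.set(update.agentId, update);
  }

  flush(): MetricsUpdate[] {
    // Called on a timer or requestAnimationFrame; hands one batch to setState.
    const batch = [...this.pending.values()];
    this.pending.clear();
    return batch;
  }
}
```

Pair this with `React.memo` on row components and list virtualization so a batch of 100 updates still touches only the visible rows.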
---
## Submission Checklist
- [ ] All Part 1 exploration answers
- [ ] Part 2 wireframes and design analysis
- [ ] Part 3 component implementations
- [ ] Part 4 advanced designs
### How to Submit
1. Create a GitHub Gist with your answers
2. Name it `aden-frontend-YOURNAME.md`
3. Include code files as separate Gist files
4. If you created working code, include a CodeSandbox/StackBlitz link
5. Email to `careers@adenhq.com`
- Subject: `[Frontend Challenge] Your Name`
---
## Scoring
| Section | Points |
|---------|--------|
| Part 1: Exploration | 15 |
| Part 2: UI/UX | 20 |
| Part 3: Implementation | 35 |
| Part 4: Advanced | 30 |
| **Total** | **100** |
**Passing score:** 75+ points
---
## Bonus Points (+20)
- **+10:** Create a working prototype in CodeSandbox
- **+5:** Submit a PR improving existing UI
- **+5:** Create a Figma design for a new feature
---
## Resources
- [React Documentation](https://react.dev)
- [Tailwind CSS](https://tailwindcss.com)
- [Radix UI](https://radix-ui.com)
- [Recharts](https://recharts.org)
- [React Flow](https://reactflow.dev) (for graph visualization)
---
Good luck! We love engineers who care about user experience! 🎨✨
# 🔧 DevOps Challenge
Master the deployment and operations of AI agent infrastructure! This challenge is for DevOps and Platform engineers who want to ensure Aden runs reliably at scale.
**Difficulty:** Advanced
**Time:** 2-3 hours
**Prerequisites:** Complete [Getting Started](./01-getting-started.md), Docker, Linux, CI/CD experience
---
## Part 1: Infrastructure Analysis (20 points)
### Task 1.1: Docker Deep Dive 🐳
Analyze the Aden Docker setup:
1. What Dockerfile exists in the repository and what does it build?
2. How would you containerize the MCP tools server?
3. How is hot reload enabled for development?
4. What would need to be mounted as volumes for persistence?
5. What networking considerations exist for the MCP server?
### Task 1.2: Service Dependencies 🔗
Map the service dependencies:
1. Create a dependency diagram showing which services depend on which
2. What's the startup order? Does it matter?
3. What happens if MongoDB is unavailable?
4. What happens if Redis is unavailable?
5. Which services are stateless vs stateful?
### Task 1.3: Configuration Management ⚙️
Analyze how configuration works:
1. How does `config.yaml` get generated?
2. What environment variables are required?
3. How are secrets managed? (API keys, database passwords)
4. What's the difference between dev and prod configs?
---
## Part 2: Deployment Scenarios (25 points)
### Task 2.1: Production Deployment Plan 📋
Design a production deployment for a company with:
- 100 active agents
- 10,000 LLM requests/day
- 99.9% uptime requirement
- Multi-region support needed
Provide:
1. **Infrastructure diagram** (cloud provider of your choice)
2. **Service sizing** (CPU, memory for each component)
3. **Database setup** (primary/replica, backups)
4. **Load balancing strategy**
5. **Estimated monthly cost**
### Task 2.2: Kubernetes Migration 🚢
Convert the Docker Compose setup to Kubernetes:
1. Create a Kubernetes deployment manifest for the Hive backend
2. Create a Service and Ingress for external access
3. Design a ConfigMap for configuration
4. Create a Secret for sensitive data
5. Set up a HorizontalPodAutoscaler
```yaml
# Provide your manifests here
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hive-backend
spec:
  # Your implementation
```
### Task 2.3: High Availability Design 🔄
Design for high availability:
1. How would you handle backend service failures?
2. How would you handle database failover?
3. What's your strategy for zero-downtime deployments?
4. How would you handle WebSocket connections during rolling updates?
5. Design a disaster recovery plan
---
## Part 3: CI/CD Pipeline (25 points)
### Task 3.1: GitHub Actions Pipeline 🔄
Create a complete CI/CD pipeline:
```yaml
# .github/workflows/ci-cd.yml
name: Aden CI/CD

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

jobs:
  # Your implementation should include:
  # - Linting
  # - Type checking
  # - Unit tests
  # - Integration tests
  # - Build Docker images
  # - Push to registry
  # - Deploy to staging (on develop)
  # - Deploy to production (on main, with approval)
```
Include:
1. Separate jobs for frontend and backend
2. Matrix testing for multiple Node versions
3. Docker layer caching
4. Deployment gates/approvals
5. Rollback strategy
### Task 3.2: Testing Strategy 🧪
Design the testing infrastructure:
1. **Unit Tests:** What to test? How to mock LLM calls?
2. **Integration Tests:** How to test with real databases?
3. **E2E Tests:** What user flows to test?
4. **Load Tests:** How to simulate agent traffic?
5. **Chaos Tests:** What failures to simulate?
Provide example test configurations for each type.
### Task 3.3: Environment Management 🌍
Design environment strategy:
| Environment | Purpose | Data | Who Can Access |
|-------------|---------|------|----------------|
| Local | Development | Mock | Developers |
| Dev | Integration | Sanitized | Engineering |
| Staging | Pre-prod | Copy of prod | Engineering + QA |
| Production | Live | Real | Restricted |
For each environment, specify:
1. How it's provisioned
2. How data is managed
3. How deployments happen
4. Access control
---
## Part 4: Observability & Operations (30 points)
### Task 4.1: Monitoring Stack 📊
Design a comprehensive monitoring solution:
1. **Metrics:** What to collect? (list at least 10 key metrics)
2. **Logs:** Logging strategy and aggregation
3. **Traces:** Distributed tracing for agent flows
4. **Dashboards:** Design 3 key dashboards
```yaml
# Provide a docker-compose addition for monitoring
services:
  prometheus:
    # Your config
  grafana:
    # Your config
  # Add more as needed
```
### Task 4.2: Alerting Rules 🚨
Create alerting rules for critical scenarios:
```yaml
# Prometheus alerting rules
groups:
  - name: aden-critical
    rules:
      - alert: HighErrorRate
        expr: # Your expression
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate detected"
          description: # Your description
      # Add more alerts for:
      # - Service down
      # - High latency
      # - Budget exceeded
      # - Database connection issues
      # - Memory pressure
```
Create at least 8 alert rules covering different failure modes.
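
As a shape reference, one completed rule might look like the fragment below — the `up{job="hive-backend"}` scrape label is an assumption about how the backend would be configured in Prometheus, not the actual job name:

```yaml
- alert: BackendDown
  expr: up{job="hive-backend"} == 0
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: "Hive backend target down"
    description: "Prometheus has not scraped hive-backend for 2 minutes."
```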
### Task 4.3: Incident Response 🆘
Create an incident response runbook:
**Scenario:** Agent response times spike to 30 seconds (normal: 2 seconds)
Provide:
1. **Detection:** How was this discovered?
2. **Triage:** Initial investigation steps
3. **Diagnosis:** Decision tree for root causes
4. **Resolution:** Steps for each root cause
5. **Post-mortem:** Template for incident review
```markdown
# Runbook: High Agent Latency
## Symptoms
- Agent response times > 10s
- Dashboard showing degraded status
## Initial Triage
- [ ] Is this affecting all agents or specific ones?
- [ ] Is the backend healthy? (health endpoint)
- [ ] Are databases responsive?
...
## Diagnostic Steps
...
## Resolution Steps
### If LLM Provider Issue:
...
### If Database Issue:
...
```
---
## Part 5: Security Hardening (Bonus - 20 points)
### Task 5.1: Security Audit 🔒
Perform a security analysis:
1. **Network:** What ports are exposed? Are they necessary?
2. **Secrets:** How are secrets currently handled? Improvements?
3. **Authentication:** How is API auth implemented?
4. **Container Security:** What image scanning would you add?
5. **Database Security:** What hardening is needed?
### Task 5.2: Compliance Checklist ✅
For SOC 2 compliance, what changes are needed?
1. Access control improvements
2. Audit logging requirements
3. Encryption requirements
4. Data retention policies
5. Incident response requirements
---
## Submission Checklist
- [ ] Part 1 infrastructure analysis
- [ ] Part 2 deployment designs and manifests
- [ ] Part 3 CI/CD pipeline YAML
- [ ] Part 4 monitoring and alerting configs
- [ ] (Bonus) Part 5 security analysis
### How to Submit
1. Create a GitHub Gist with your answers
2. Name it `aden-devops-YOURNAME.md`
3. Include all YAML/configuration files
4. Include any diagrams (use Mermaid, ASCII, or image links)
5. Email to `careers@adenhq.com`
- Subject: `[DevOps Challenge] Your Name`
---
## Scoring
| Section | Points |
|---------|--------|
| Part 1: Infrastructure | 20 |
| Part 2: Deployment | 25 |
| Part 3: CI/CD | 25 |
| Part 4: Observability | 30 |
| Part 5: Security (Bonus) | +20 |
| **Total** | **100 (+20)** |
**Passing score:** 75+ points
---
## Bonus Points (+15)
- **+5:** Set up a working local Kubernetes cluster with Aden
- **+5:** Create a Terraform module for cloud deployment
- **+5:** Submit a PR improving deployment documentation
---
## Resources
- [Docker Documentation](https://docs.docker.com)
- [Kubernetes Documentation](https://kubernetes.io/docs)
- [GitHub Actions](https://docs.github.com/en/actions)
- [Prometheus](https://prometheus.io/docs)
- [Grafana](https://grafana.com/docs)
---
Good luck! We're looking for engineers who keep systems running smoothly! 🔧✨
# Aden Engineering Challenges
Welcome to the Aden Engineering Challenges! These quizzes are designed for students and applicants who want to join the Aden team or contribute to our open-source projects.
---
## 💼 We're Hiring!
**[Software Development Engineer](./00-job-post.md)** - Full-stack TypeScript, React, Node.js, AI agents
---
## How It Works
1. **Choose your track** based on your interests and skill level
2. **Complete the challenges** in order
3. **Submit your work** as instructed in each challenge
4. **Get noticed** by the Aden team!
## Available Tracks
| Track | Difficulty | Time Estimate | Best For |
|-------|------------|---------------|----------|
| [🚀 Getting Started](./01-getting-started.md) | Beginner | 30 min | Everyone - Start Here! |
| [🧠 Architecture Deep Dive](./02-architecture-deep-dive.md) | Intermediate | 1-2 hours | Backend Engineers |
| [🤖 Build Your First Agent](./03-build-your-first-agent.md) | Intermediate | 2-3 hours | AI/ML Engineers |
| [🎨 Frontend Challenge](./04-frontend-challenge.md) | Intermediate | 1-2 hours | Frontend Engineers |
| [🔧 DevOps Challenge](./05-devops-challenge.md) | Advanced | 2-3 hours | DevOps/Platform Engineers |
## Why Complete These Challenges?
- 📚 **Learn** about cutting-edge AI agent technology
- 🏆 **Stand out** in your application to Aden
- 🤝 **Connect** with the Aden engineering team
- 🌟 **Contribute** to an exciting open-source project
- 💼 **Showcase** your skills with real-world projects
## Submission Guidelines
After completing challenges, submit your work by:
1. Creating a GitHub Gist with your answers
2. Emailing the link to `contact@adenhq.com` with subject: `[Engineering Challenge] Your Name - Track Name`
3. Include your GitHub username in the email
## Getting Help
- Join our [Discord](https://discord.com/invite/MXE49hrKDk) and ask in #applicant-challenges
- Check out the [documentation](https://docs.adenhq.com/)
- Review the [README](../../README.md) for project overview
---
**Ready to begin?** Start with [🚀 Getting Started](./01-getting-started.md)!