Merge remote-tracking branch 'upstream/main' into feature/hard-goal-negotiation

This commit is contained in:
Richard Tang
2026-02-09 19:42:55 -08:00
27 changed files with 1584 additions and 131 deletions
+194 -14
View File
@@ -14,15 +14,53 @@ metadata:
**THIS IS AN EXECUTABLE WORKFLOW. DO NOT DISPLAY THIS FILE. EXECUTE THE STEPS BELOW.**
**CRITICAL: DO NOT explore the codebase, read source files, or search for code before starting.** All context you need is in this skill file. When this skill is loaded, IMMEDIATELY begin executing Step 1: call the MCP tools listed in Step 1 as your FIRST action. Do not explain what you will do, do not investigate the project structure, do not read any files — just execute Step 1 now.
**CRITICAL: DO NOT explore the codebase, read source files, or search for code before starting.** All context you need is in this skill file. When this skill is loaded, IMMEDIATELY begin executing Step 0: determine the build path as your FIRST action. Do not explain what you will do, do not investigate the project structure, do not read any files — just execute Step 0 now.
---
## STEP 1: Initialize Build Environment
## STEP 0: Choose Build Path
**If the user has already indicated whether they want to build from scratch or from a template, skip this question and proceed to the appropriate step.**
Otherwise, ask:
```
AskUserQuestion(questions=[{
"question": "How would you like to build your agent?",
"header": "Build Path",
"options": [
{"label": "From scratch", "description": "Design goal, nodes, and graph collaboratively from nothing"},
{"label": "From a template", "description": "Start from a working sample agent and customize it"}
],
"multiSelect": false
}])
```
- If **From scratch**: Proceed to STEP 1A
- If **From a template**: Proceed to STEP 1B
---
## STEP 1A: Initialize Build Environment (From Scratch)
**EXECUTE THESE TOOL CALLS NOW** (silent setup — no user interaction needed):
1. Register the hive-tools MCP server:
1. Check for existing sessions:
```
mcp__agent-builder__list_sessions()
```
- If a session with this agent name already exists, load it with `mcp__agent-builder__load_session_by_id(session_id="...")` and skip to step 3.
- If no matching session exists, proceed to step 2.
2. Create a build session (replace AGENT_NAME with the user's requested agent name in snake_case):
```
mcp__agent-builder__create_session(name="AGENT_NAME")
```
3. Register the hive-tools MCP server:
```
mcp__agent-builder__add_mcp_server(
@@ -35,19 +73,13 @@ mcp__agent-builder__add_mcp_server(
)
```
2. Create a build session (replace AGENT_NAME with the user's requested agent name in snake_case):
```
mcp__agent-builder__create_session(name="AGENT_NAME")
```
3. Discover available tools:
4. Discover available tools:
```
mcp__agent-builder__list_mcp_tools()
```
4. Create the package directory:
5. Create the package directory:
```bash
mkdir -p exports/AGENT_NAME/nodes
@@ -59,9 +91,130 @@ mkdir -p exports/AGENT_NAME/nodes
---
## STEP 2: Qualify the Use Case
## STEP 1B: Initialize Build Environment (From Template)
**A responsible engineer doesn't jump into building. First, understand the problem and be transparent about what the framework can and cannot do.**
**EXECUTE THESE STEPS NOW:**
### 1B.1: Discover available templates
List the template directories and read each template's `agent.json` to get its name and description:
```bash
ls examples/templates/
```
For each directory found, read `examples/templates/TEMPLATE_DIR/agent.json` with the Read tool and extract:
- `agent.name` — the template's display name
- `agent.description` — what the template does
### 1B.2: Present templates to user
Show the user a table of available templates:
> **Available Templates:**
>
> | # | Template | Description |
> |---|----------|-------------|
> | 1 | [name from agent.json] | [description from agent.json] |
> | 2 | ... | ... |
Then ask the user to pick a template and provide a name for their new agent:
```
AskUserQuestion(questions=[{
"question": "Which template would you like to start from?",
"header": "Template",
"options": [
{"label": "[template 1 name]", "description": "[template 1 description]"},
{"label": "[template 2 name]", "description": "[template 2 description]"},
...
],
"multiSelect": false
}, {
"question": "What should the new agent be named? (snake_case)",
"header": "Agent Name",
"options": [
{"label": "Use template name", "description": "Keep the original template name as-is"},
{"label": "Custom name", "description": "I'll provide a new snake_case name"}
],
"multiSelect": false
}])
```
### 1B.3: Copy template to exports
```bash
cp -r examples/templates/TEMPLATE_DIR exports/NEW_AGENT_NAME
```
### 1B.4: Create session and register MCP (same logic as STEP 1A)
First, check for existing sessions:
```
mcp__agent-builder__list_sessions()
```
- If a session with this agent name already exists, load it with `mcp__agent-builder__load_session_by_id(session_id="...")` and skip to `list_mcp_tools`.
- If no matching session exists, create one:
```
mcp__agent-builder__create_session(name="NEW_AGENT_NAME")
```
Then register MCP and discover tools:
```
mcp__agent-builder__add_mcp_server(
name="hive-tools",
transport="stdio",
command="uv",
args='["run", "python", "mcp_server.py", "--stdio"]',
cwd="tools",
description="Hive tools MCP server"
)
```
```
mcp__agent-builder__list_mcp_tools()
```
### 1B.5: Load template into builder session
Import the entire agent definition in one call:
```
mcp__agent-builder__import_from_export(agent_json_path="exports/NEW_AGENT_NAME/agent.json")
```
This reads the agent.json and populates the builder session with the goal, all nodes, and all edges.
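The tool replies with a JSON summary of what was imported. For the deep research template bundled with the repo, the reply would look roughly like this (illustrative):
```json
{
  "success": true,
  "goal": "Rigorous Interactive Research",
  "nodes_count": 4,
  "edges_count": 4,
  "node_ids": ["intake", "research", "review", "report"],
  "edge_ids": ["intake-to-research", "research-to-review", "review-to-research-feedback", "review-to-report"]
}
```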
**THEN immediately proceed to STEP 2.**
---
## STEP 2: Define Goal Together with User
**If starting from a template**, the goal is already loaded in the builder session. Present the existing goal to the user using the format below and ask for approval. Skip the collaborative drafting questions — go straight to presenting and asking "Do you approve this goal, or would you like to modify it?"
**If the user has NOT already described what they want to build**, start by asking what kind of agent they have in mind:
```
AskUserQuestion(questions=[{
"question": "What kind of agent do you want to build?",
"header": "Agent type",
"options": [
{"label": "Data collection", "description": "Gathers information from the web, analyzes it, and produces a report or sends outreach (e.g. market research, news digest, email campaigns, competitive analysis)"},
{"label": "Workflow automation", "description": "Automates a multi-step business process end-to-end (e.g. lead qualification, content publishing pipeline, data entry)"},
{"label": "Personal assistant", "description": "Handles recurring tasks or monitors for events and acts on them (e.g. daily briefings, meeting prep, file organization)"}
],
"multiSelect": false
}])
```
Use the user's selection (or their custom description if they chose "Other") as context when shaping the goal below. If the user already described what they want before this step, skip the question and proceed directly.
**DO NOT propose a complete goal on your own.** Instead, collaborate with the user to define it.
### 2a: Fast Discovery (3-5 Turns)
@@ -310,7 +463,7 @@ AskUserQuestion(questions=[{
> 2. **How will we know it succeeded?** (what specific outcomes matter)
> 3. **Are there any hard constraints?** (things it must never do, quality bars)
**WAIT for the user to respond.** Use their input to draft:
**WAIT for the user to respond.** Use their input (and the agent type they selected) to draft:
- Goal ID (kebab-case)
- Goal name
@@ -359,6 +512,8 @@ AskUserQuestion(questions=[{
## STEP 4: Design Conceptual Nodes
**If starting from a template**, the nodes are already loaded in the builder session. Present the existing nodes using the table format below and ask for approval. Skip the design phase.
**BEFORE designing nodes**, review the available tools from Step 1. Nodes can ONLY use tools that exist.
**DESIGN the workflow** as a series of nodes. For each node, determine:
@@ -417,6 +572,8 @@ AskUserQuestion(questions=[{
## STEP 5: Design Full Graph and Review
**If starting from a template**, the edges are already loaded in the builder session. Render the existing graph as ASCII art and present it to the user for approval. Skip the edge design phase.
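For example, if the loaded template were the deep research agent, such a rendering might look roughly like this (an illustrative sketch, not a required format):
```
intake --> research --> review --> report
              ^            |
              +------------+
        (loops back while needs_more_research == "true")
```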
**DETERMINE the edges** connecting the approved nodes. For each edge:
- edge_id (kebab-case)
@@ -535,6 +692,29 @@ AskUserQuestion(questions=[{
**NOW — and only now — write the actual code.** The user has approved the goal, nodes, and graph.
### 6a: Register nodes and edges with MCP
**If starting from a template**, the copied files will be overwritten with the approved design. You MUST replace every occurrence of the old template name with the new agent name. Here is the complete checklist — miss NONE of these:
| File | What to rename |
|------|---------------|
| `config.py` | `AgentMetadata.name` — the display name shown in TUI agent selection |
| `config.py` | `AgentMetadata.description` — agent description |
| `agent.py` | Module docstring (line 1) |
| `agent.py` | `class OldNameAgent:` → `class NewNameAgent:` |
| `agent.py` | `GraphSpec(id="old-name-graph")` → `GraphSpec(id="new-name-graph")` — shown in TUI status bar |
| `agent.py` | Storage path: `Path.home() / ".hive" / "agents" / "old_name"` → `"new_name"` |
| `__main__.py` | Module docstring (line 1) |
| `__main__.py` | `from .agent import ... OldNameAgent` → `NewNameAgent` |
| `__main__.py` | CLI help string in `def cli()` docstring |
| `__main__.py` | All `OldNameAgent()` instantiations |
| `__main__.py` | Storage path (duplicated from agent.py) |
| `__main__.py` | Shell banner string (e.g. `"=== Old Name Agent ==="`) |
| `__init__.py` | Package docstring |
| `__init__.py` | `from .agent import OldNameAgent` import |
| `__init__.py` | `__all__` list entry |
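After working through the checklist, a quick scan for stragglers is cheap insurance. The sketch below is not a required skill step, just one way to verify the rename; the names and path are placeholders for whatever template and agent are actually in play:
```python
from pathlib import Path

# Placeholders -- substitute the real old template name and new package path.
OLD_SNAKE = "deep_research_agent"            # old name, snake_case
OLD_KEBAB = OLD_SNAKE.replace("_", "-")      # kebab-case form used in graph ids
OLD_CLASS = "DeepResearchAgent"              # assumed class-name form of the old name
PACKAGE = Path("exports/my_research_agent")  # the copied package

for path in PACKAGE.rglob("*.py"):
    text = path.read_text()
    hits = [t for t in (OLD_SNAKE, OLD_KEBAB, OLD_CLASS) if t in text]
    if hits:
        print(f"{path}: still mentions {hits}")
```
No output means the Python files no longer mention the old name in any of those three forms.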
### 5a: Register nodes and edges with MCP
**If starting from a template and no modifications were made in Steps 2-4**, the nodes and edges are already registered. Skip to validation (`mcp__agent-builder__validate_graph()`). If modifications were made, re-register the changed nodes/edges (the MCP tools handle duplicates by overwriting).
**FOR EACH approved node**, call:
+15 -1
View File
@@ -99,7 +99,7 @@ Use this meta-skill when:
## Phase 1: Build Agent Structure
**Skill**: `/hive-create`
**Input**: User requirements ("Build an agent that...")
**Input**: User requirements ("Build an agent that...") or a template to start from
### What This Phase Does
@@ -287,6 +287,19 @@ User: "Build an agent (first time)"
→ Done: Production-ready agent
```
### Pattern 1c: Build from Template
```
User: "Build an agent based on the deep research template"
→ Use /hive-create
→ Select "From a template" path
→ Pick template, name new agent
→ Review/modify goal, nodes, graph
→ Agent exported with customizations
→ Use /hive-test
→ Done: Customized agent
```
### Pattern 2: Test Existing Agent
```
@@ -490,6 +503,7 @@ The workflow is **flexible** - skip phases as needed, iterate freely, and adapt
- Have clear requirements
- Ready to write code
- Want step-by-step guidance
- Want to start from an existing template and customize it
**Choose hive-patterns when:**
- Agent structure complete
-65
View File
@@ -1,65 +0,0 @@
# Changelog
All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [Unreleased]
### Added
### Changed
### Fixed
### Security
## [0.4.2] - 2026-02-08
### Added
- Resumable sessions: agents now automatically save state and can resume after interruptions
- `/resume` command in TUI to resume latest paused/failed session
- `/resume <session_id>` command to resume specific sessions
- `/sessions` command to list all sessions for current agent
- `--resume-session` CLI flag for automatic session resumption on startup
- `--checkpoint <checkpoint_id>` CLI flag for checkpoint-based recovery
- Ctrl+Z now immediately pauses execution with full state capture
- `/pause` command for immediate pause during execution
- Session state persistence: memory, execution path, node positions, visit counts
- Unified session storage at `~/.hive/agents/{agent_name}/sessions/`
- Automatic memory restoration on resume with full conversation history
### Changed
- TUI quit now pauses execution and saves state instead of cancelling
- Pause operations now use immediate task cancellation instead of waiting for node boundaries
- Session cleanup timeout increased from 0.5s to 5s to ensure proper state saving
- Session status now tracked as: active, paused, completed, failed, cancelled
### Deprecated
- Pause nodes (use client-facing EventLoopNodes instead)
- `request_pause()` method (replaced with immediate task cancellation)
### Removed
- N/A
### Fixed
- Memory persistence: ExecutionResult.session_state["memory"] now populated at all exit points
- Resume now starts at correct paused_at node instead of intake node
- Visit count double-counting on resume (paused node count now properly adjusted)
- Session selection now picks most recent session instead of oldest
- Quit state save failures due to insufficient timeout
- Ctrl+Z pause implementation (was only showing notification without pausing)
- Empty memory on resume by ensuring session_state["memory"] is properly populated
### Security
- N/A
## [0.1.0] - 2025-01-13
### Added
- Initial release
[Unreleased]: https://github.com/adenhq/hive/compare/v0.4.2...HEAD
[0.4.2]: https://github.com/adenhq/hive/compare/v0.4.0...v0.4.2
[0.1.0]: https://github.com/adenhq/hive/releases/tag/v0.1.0
+1
View File
@@ -1,4 +1,5 @@
exports/
docs/
.agent-builder-sessions/
.pytest_cache/
**/__pycache__/
+4 -1
View File
@@ -33,6 +33,7 @@ from framework.graph.node import (
from framework.graph.output_cleaner import CleansingConfig, OutputCleaner
from framework.graph.validator import OutputValidator
from framework.llm.provider import LLMProvider, Tool
from framework.observability import set_trace_context
from framework.runtime.core import Runtime
from framework.schemas.checkpoint import Checkpoint
from framework.storage.checkpoint_store import CheckpointStore
@@ -228,6 +229,9 @@ class GraphExecutor:
Returns:
ExecutionResult with output and metrics
"""
# Add agent_id to trace context for correlation
set_trace_context(agent_id=graph.id)
# Validate graph
errors = graph.validate()
if errors:
@@ -404,7 +408,6 @@ class GraphExecutor:
if self.runtime_logger:
# Extract session_id from storage_path if available (for unified sessions)
# storage_path format: base_path/sessions/{session_id}/
session_id = ""
if self._storage_path and self._storage_path.name.startswith("session_"):
session_id = self._storage_path.name
@@ -23,6 +23,7 @@ if _exports_dir.is_dir() and str(_exports_dir) not in sys.path:
del _framework_dir, _project_root, _exports_dir
from mcp.server import FastMCP # noqa: E402
from pydantic import ValidationError # noqa: E402
from framework.graph import ( # noqa: E402
Constraint,
@@ -1856,6 +1857,85 @@ def export_graph() -> str:
)
@mcp.tool()
def import_from_export(
agent_json_path: Annotated[str, "Path to the agent.json file to import"],
) -> str:
"""
Import an agent definition from an exported agent.json file into the current build session.
Reads the agent.json, parses goal/nodes/edges, and populates the current session.
This is the reverse of export_graph().
Args:
agent_json_path: Path to the agent.json file to import
Returns:
JSON summary of what was imported (goal name, node count, edge count)
"""
session = get_session()
path = Path(agent_json_path)
if not path.exists():
return json.dumps({"success": False, "error": f"File not found: {agent_json_path}"})
try:
data = json.loads(path.read_text())
except json.JSONDecodeError as e:
return json.dumps({"success": False, "error": f"Invalid JSON: {e}"})
try:
# Parse goal (same pattern as BuildSession.from_dict lines 88-99)
goal_data = data.get("goal")
if goal_data:
session.goal = Goal(
id=goal_data["id"],
name=goal_data["name"],
description=goal_data["description"],
success_criteria=[
SuccessCriterion(**sc) for sc in goal_data.get("success_criteria", [])
],
constraints=[Constraint(**c) for c in goal_data.get("constraints", [])],
)
# Parse nodes (same pattern as BuildSession.from_dict line 102)
graph_data = data.get("graph", {})
nodes_data = graph_data.get("nodes", [])
session.nodes = [NodeSpec(**n) for n in nodes_data]
# Parse edges (same pattern as BuildSession.from_dict lines 105-118)
edges_data = graph_data.get("edges", [])
session.edges = []
for e in edges_data:
condition_str = e.get("condition")
if isinstance(condition_str, str):
condition_map = {
"always": EdgeCondition.ALWAYS,
"on_success": EdgeCondition.ON_SUCCESS,
"on_failure": EdgeCondition.ON_FAILURE,
"conditional": EdgeCondition.CONDITIONAL,
"llm_decide": EdgeCondition.LLM_DECIDE,
}
e["condition"] = condition_map.get(condition_str, EdgeCondition.ON_SUCCESS)
session.edges.append(EdgeSpec(**e))
except (KeyError, TypeError, ValueError, ValidationError) as e:
return json.dumps({"success": False, "error": f"Malformed agent.json: {e}"})
# Persist updated session
_save_session(session)
return json.dumps(
{
"success": True,
"goal": session.goal.name if session.goal else None,
"nodes_count": len(session.nodes),
"edges_count": len(session.edges),
"node_ids": [n.id for n in session.nodes],
"edge_ids": [e.id for e in session.edges],
}
)
@mcp.tool()
def get_session_status() -> str:
"""Get the current status of the build session."""
+236
View File
@@ -0,0 +1,236 @@
# Observability - Structured Logging
## Configuration via Environment Variables
Control logging format using environment variables:
```bash
# JSON logging (production) - Machine-parseable, one line per log
export LOG_FORMAT=json
python -m my_agent run
# Human-readable (development) - Color-coded, easy to read
# Default if LOG_FORMAT is not set
python -m my_agent run
```
**Alternative:** Set `ENV=production` to automatically use JSON format:
```bash
export ENV=production
python -m my_agent run
```
---
## Overview
The Hive framework provides automatic structured logging with trace context propagation. Logs include correlation IDs (`trace_id`, `execution_id`) that automatically follow your agent execution flow.
**Features:**
- **Zero developer friction**: Standard `logger.info()` calls automatically get trace context
- **ContextVar-based propagation**: Thread-safe and async-safe for concurrent executions
- **Dual output modes**: JSON for production, human-readable for development
- **Automatic correlation**: `trace_id` and `execution_id` propagate through all logs
## Quick Start
Logging is automatically configured when you use `AgentRunner`. No setup required:
```python
from framework.runner import AgentRunner
runner = AgentRunner(graph=my_graph, goal=my_goal)
result = await runner.run({"input": "data"})
# Logs automatically include trace_id, execution_id, agent_id, etc.
```
## Programmatic Configuration
Configure logging explicitly in your code:
```python
from framework.observability import configure_logging
# Human-readable (development)
configure_logging(level="DEBUG", format="human")
# JSON (production)
configure_logging(level="INFO", format="json")
# Auto-detect from environment
configure_logging(level="INFO", format="auto")
```
### Configuration Options
- **level**: `"DEBUG"`, `"INFO"`, `"WARNING"`, `"ERROR"`, `"CRITICAL"`
- **format**:
- `"json"` - Machine-parseable JSON (one line per log entry)
- `"human"` - Human-readable with colors
- `"auto"` - Detects from `LOG_FORMAT` env var or `ENV=production`
## Log Format Examples
### JSON Format (Machine-parseable)
```json
{"timestamp": "2026-01-28T15:01:02.671126+00:00", "level": "info", "logger": "framework.runtime", "message": "Starting agent execution", "trace_id": "54e80d7b5bd6409dbc3217e5cd16a4fd", "execution_id": "b4c348ec54e80d7b5bd6409dbc3217e50", "agent_id": "sales-agent", "goal_id": "qualify-leads"}
```
**Features:**
- `trace_id` and `execution_id` are 32 hex chars (W3C/OTel-aligned, no prefixes)
- Compact single-line format (easy to stream/parse)
- All trace context fields included automatically
### Human-Readable Format (Development)
```
[INFO ] [trace:12345678 | exec:a1b2c3d4 | agent:sales-agent] Starting agent execution
[INFO ] [trace:12345678 | exec:a1b2c3d4 | agent:sales-agent] Processing input data [node_id:input-processor]
[INFO ] [trace:12345678 | exec:a1b2c3d4 | agent:sales-agent] LLM call completed [latency_ms:1250] [tokens_used:450]
```
**Features:**
- Color-coded log levels
- Shortened IDs for readability (first 8 chars)
- Context prefix shows trace correlation
## Trace Context Fields
When the framework sets trace context, these fields are included in all logs. IDs are 32 hex characters (W3C/OTel-aligned, no prefixes).
- **trace_id**: Trace identifier
- **execution_id**: Run/session correlation
- **agent_id**: Agent/graph identifier
- **goal_id**: Goal being pursued
- **node_id**: Current node (when set)
## Custom Log Fields
Add custom fields using the `extra` parameter:
```python
import logging
logger = logging.getLogger("my_module")
# Add custom fields
logger.info("LLM call completed", extra={
"latency_ms": 1250,
"tokens_used": 450,
"model": "claude-3-5-sonnet-20241022",
"node_id": "web-search"
})
```
These fields appear in both JSON and human-readable formats.
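For instance, with JSON output the call above would produce a single line along these lines (values illustrative; the trace fields come from the current context):
```json
{"timestamp": "2026-01-28T15:01:03.921126+00:00", "level": "info", "logger": "my_module", "message": "LLM call completed", "trace_id": "54e80d7b5bd6409dbc3217e5cd16a4fd", "execution_id": "b4c348ec54e80d7b5bd6409dbc3217e50", "agent_id": "sales-agent", "goal_id": "qualify-leads", "latency_ms": 1250, "tokens_used": 450, "model": "claude-3-5-sonnet-20241022", "node_id": "web-search"}
```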
## Usage in Your Code
### Standard Logging (Recommended)
Just use Python's standard logging - context is automatic:
```python
import logging
logger = logging.getLogger(__name__)
def my_function():
# This log automatically includes trace_id, execution_id, etc.
logger.info("Processing data")
try:
result = do_work()
logger.info("Work completed", extra={"result_count": len(result)})
except Exception as e:
logger.error("Work failed", exc_info=True)
```
### Framework-Managed Context
The framework automatically sets trace context at key points:
- **Runtime.start_run()**: Sets `trace_id`, `execution_id`, `goal_id`
- **GraphExecutor.execute()**: Adds `agent_id`
- **Node execution**: Adds `node_id`
Propagation is automatic via ContextVar.
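If it helps to see what "automatic" means in practice, here is a minimal standalone sketch (plain Python, not framework code) of the ContextVar behavior the framework relies on: a value set in the parent coroutine is visible in everything it awaits or spawns.
```python
import asyncio
from contextvars import ContextVar

# Standalone illustration of ContextVar propagation (not framework API).
request_ctx: ContextVar[dict] = ContextVar("request_ctx", default={})

async def child() -> None:
    # Sees the context set by the parent without any explicit argument passing.
    print(request_ctx.get())  # {'trace_id': 'abc123'}

async def parent() -> None:
    request_ctx.set({"trace_id": "abc123"})
    await child()                          # same context
    await asyncio.create_task(child())     # tasks copy the context at creation

asyncio.run(parent())
```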
## Advanced Usage
### Manual Context Management
If you need to set trace context manually (rare):
```python
from framework.observability import set_trace_context, get_trace_context
# Set context (32-hex, no prefixes)
set_trace_context(
trace_id="54e80d7b5bd6409dbc3217e5cd16a4fd",
execution_id="b4c348ec54e80d7b5bd6409dbc3217e50",
agent_id="my-agent"
)
# Get current context
context = get_trace_context()
print(context["execution_id"])
# Clear context (usually not needed)
from framework.observability import clear_trace_context
clear_trace_context()
```
### Testing
For tests, you may want to configure logging explicitly:
```python
import pytest
from framework.observability import configure_logging
@pytest.fixture(autouse=True)
def setup_logging():
configure_logging(level="DEBUG", format="human")
```
## Best Practices
1. **Production**: Use JSON format (`LOG_FORMAT=json` or `ENV=production`)
2. **Development**: Use human-readable format (default)
3. **Don't manually set context**: Let the framework manage it
4. **Use standard logging**: No special APIs needed - just `logger.info()`
5. **Add custom fields**: Use `extra` dict for additional metadata
## Troubleshooting
### Logs missing trace context
Ensure `configure_logging()` has been called (usually automatic via `AgentRunner._setup()`).
### JSON logs not appearing
Check environment variables:
```bash
echo $LOG_FORMAT
echo $ENV
```
Or explicitly set:
```python
configure_logging(format="json")
```
### Context not propagating
ContextVar automatically propagates through async calls. If context seems lost, check:
- Are you in the same async execution context?
- Has `set_trace_context()` been called for this execution?
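A quick way to confirm which case you are in is to dump the current context right where the logs look bare:
```python
from framework.observability import get_trace_context

# An empty dict means no trace context was set for this execution context,
# e.g. the code is running outside a framework-managed run.
print(get_trace_context())
```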
## See Also
- [Logging Implementation](../observability/logging.py) - Source code
- [AgentRunner](../runner/runner.py) - Where logging is configured
- [Runtime Core](../runtime/core.py) - Where trace context is set
+23
View File
@@ -0,0 +1,23 @@
"""
Observability module for automatic trace correlation and structured logging.
This module provides zero-friction observability:
- Automatic trace context propagation via ContextVar
- Structured JSON logging for production
- Human-readable logging for development
- No manual ID passing required
"""
from framework.observability.logging import (
clear_trace_context,
configure_logging,
get_trace_context,
set_trace_context,
)
__all__ = [
"configure_logging",
"get_trace_context",
"set_trace_context",
"clear_trace_context",
]
+302
View File
@@ -0,0 +1,302 @@
"""
Structured logging with automatic trace context propagation.
Key Features:
- Zero developer friction: Standard logger.info() calls get automatic context
- ContextVar-based propagation: Thread-safe and async-safe
- Dual output modes: JSON for production, human-readable for development
- Correlation IDs: trace_id follows entire request flow automatically
Architecture:
Runtime.start_run()      -> Generates trace_id, sets context once
    ↓ (automatic propagation via ContextVar)
GraphExecutor.execute()  -> Adds agent_id to context
    ↓ (automatic propagation)
Node.execute()           -> Adds node_id to context
    ↓ (automatic propagation)
User code: logger.info("message") -> Gets ALL context automatically!
"""
import json
import logging
import os
import re
from contextvars import ContextVar
from datetime import UTC, datetime
from typing import Any
# Context variable for trace propagation
# ContextVar is thread-safe and async-safe - perfect for concurrent agent execution
trace_context: ContextVar[dict[str, Any] | None] = ContextVar("trace_context", default=None)
# ANSI escape code pattern (matches \033[...m or \x1b[...m)
ANSI_ESCAPE_PATTERN = re.compile(r"\x1b\[[0-9;]*m|\033\[[0-9;]*m")
def strip_ansi_codes(text: str) -> str:
"""Remove ANSI escape codes from text for clean JSON logging."""
return ANSI_ESCAPE_PATTERN.sub("", text)
class StructuredFormatter(logging.Formatter):
"""
JSON formatter for structured logging.
Produces machine-parseable log entries with:
- Standard fields (timestamp, level, logger, message)
- Trace context (trace_id, execution_id, agent_id, etc.) - AUTOMATIC
- Custom fields from extra dict
"""
def format(self, record: logging.LogRecord) -> str:
"""Format log record as JSON."""
# Get trace context for correlation - AUTOMATIC!
context = trace_context.get() or {}
# Strip ANSI codes from message for clean JSON output
message = strip_ansi_codes(record.getMessage())
# Build base log entry
log_entry = {
"timestamp": datetime.now(UTC).isoformat(),
"level": record.levelname.lower(),
"logger": record.name,
"message": message,
}
# Add trace context (trace_id, execution_id, agent_id, etc.) - AUTOMATIC!
log_entry.update(context)
# Add custom fields from extra (optional)
event = getattr(record, "event", None)
if event is not None:
if isinstance(event, str):
log_entry["event"] = strip_ansi_codes(str(event))
else:
log_entry["event"] = event
latency_ms = getattr(record, "latency_ms", None)
if latency_ms is not None:
log_entry["latency_ms"] = latency_ms
tokens_used = getattr(record, "tokens_used", None)
if tokens_used is not None:
log_entry["tokens_used"] = tokens_used
node_id = getattr(record, "node_id", None)
if node_id is not None:
log_entry["node_id"] = node_id
model = getattr(record, "model", None)
if model is not None:
log_entry["model"] = model
# Add exception info if present (strip ANSI codes from exception text too)
if record.exc_info:
exception_text = self.formatException(record.exc_info)
log_entry["exception"] = strip_ansi_codes(exception_text)
return json.dumps(log_entry)
class HumanReadableFormatter(logging.Formatter):
"""
Human-readable formatter for development.
Provides colorized logs with trace context for local debugging.
Includes trace_id prefix for correlation - AUTOMATIC!
"""
COLORS = {
"DEBUG": "\033[36m", # Cyan
"INFO": "\033[32m", # Green
"WARNING": "\033[33m", # Yellow
"ERROR": "\033[31m", # Red
"CRITICAL": "\033[35m", # Magenta
}
RESET = "\033[0m"
def format(self, record: logging.LogRecord) -> str:
"""Format log record as human-readable string."""
# Get trace context - AUTOMATIC!
context = trace_context.get() or {}
trace_id = context.get("trace_id", "")
execution_id = context.get("execution_id", "")
agent_id = context.get("agent_id", "")
# Build context prefix
prefix_parts = []
if trace_id:
prefix_parts.append(f"trace:{trace_id[:8]}")
if execution_id:
prefix_parts.append(f"exec:{execution_id[-8:]}")
if agent_id:
prefix_parts.append(f"agent:{agent_id}")
context_prefix = f"[{' | '.join(prefix_parts)}] " if prefix_parts else ""
# Get color
color = self.COLORS.get(record.levelname, "")
reset = self.RESET
# Format log level (8 chars wide for alignment)
level = f"{record.levelname:<8}"
# Add event if present
event = ""
record_event = getattr(record, "event", None)
if record_event is not None:
event = f" [{record_event}]"
# Format message: [LEVEL] [trace context] message
return f"{color}[{level}]{reset} {context_prefix}{record.getMessage()}{event}"
def configure_logging(
level: str = "INFO",
format: str = "auto", # "json", "human", or "auto"
) -> None:
"""
Configure structured logging for the application.
This should be called ONCE at application startup, typically in:
- AgentRunner._setup()
- Main entry point
- Test fixtures
Args:
level: Log level (DEBUG, INFO, WARNING, ERROR, CRITICAL)
format: Output format:
- "json": Machine-parseable JSON (for production)
- "human": Human-readable with colors (for development)
- "auto": JSON if LOG_FORMAT=json or ENV=production, else human
Examples:
# Development mode (human-readable)
configure_logging(level="DEBUG", format="human")
# Production mode (JSON)
configure_logging(level="INFO", format="json")
# Auto-detect from environment
configure_logging(level="INFO", format="auto")
"""
# Auto-detect format
if format == "auto":
# Use JSON if LOG_FORMAT=json or ENV=production
log_format_env = os.getenv("LOG_FORMAT", "").lower()
env = os.getenv("ENV", "development").lower()
if log_format_env == "json" or env == "production":
format = "json"
else:
format = "human"
# Select formatter
if format == "json":
formatter = StructuredFormatter()
# Disable colors in third-party libraries when using JSON format
_disable_third_party_colors()
else:
formatter = HumanReadableFormatter()
# Configure handler
handler = logging.StreamHandler()
handler.setFormatter(formatter)
# Configure root logger
root_logger = logging.getLogger()
root_logger.handlers.clear()
root_logger.addHandler(handler)
root_logger.setLevel(level.upper())
# When in JSON mode, configure known third-party loggers to use JSON formatter
# This ensures libraries like LiteLLM, httpcore also output clean JSON
if format == "json":
third_party_loggers = [
"LiteLLM",
"httpcore",
"httpx",
"openai",
]
for logger_name in third_party_loggers:
logger = logging.getLogger(logger_name)
# Clear existing handlers so records propagate to root and use our formatter there
logger.handlers.clear()
logger.propagate = True # Still propagate to root for consistency
def _disable_third_party_colors() -> None:
"""Disable color output in third-party libraries for clean JSON logging."""
# Set NO_COLOR environment variable (common convention for disabling colors)
os.environ["NO_COLOR"] = "1"
os.environ["FORCE_COLOR"] = "0"
# Disable LiteLLM debug/verbose output colors if available
try:
import litellm
# LiteLLM respects NO_COLOR, but we can also suppress debug info
if hasattr(litellm, "suppress_debug_info"):
litellm.suppress_debug_info = True # type: ignore[attr-defined]
except (ImportError, AttributeError):
pass
def set_trace_context(**kwargs: Any) -> None:
"""
Set trace context for current execution.
Context is stored in a ContextVar and AUTOMATICALLY propagates
through async calls within the same execution context.
This is called by the framework at key points:
- Runtime.start_run(): Sets trace_id, execution_id, goal_id
- GraphExecutor.execute(): Adds agent_id
- Node execution: Adds node_id
Developers/agents NEVER call this directly - it's framework-managed.
Args:
**kwargs: Context fields (trace_id, execution_id, agent_id, etc.)
Example (framework code):
# In Runtime.start_run()
trace_id = uuid.uuid4().hex # 32 hex, W3C Trace Context compliant
execution_id = uuid.uuid4().hex # 32 hex, OTel-aligned for correlation
set_trace_context(
trace_id=trace_id,
execution_id=execution_id,
goal_id=goal_id
)
# All subsequent logs in this execution get these fields automatically!
"""
current = trace_context.get() or {}
trace_context.set({**current, **kwargs})
def get_trace_context() -> dict:
"""
Get current trace context.
Returns:
Dict with trace_id, execution_id, agent_id, etc.
Empty dict if no context set.
"""
context = trace_context.get() or {}
return context.copy()
def clear_trace_context() -> None:
"""
Clear trace context.
Useful for:
- Cleanup between test runs
- Starting a completely new execution context
- Manual context management (rare)
Note: Framework typically doesn't need to call this - ContextVar
is execution-scoped and cleans itself up automatically.
"""
trace_context.set(None)
+5
View File
@@ -562,6 +562,11 @@ class AgentRunner:
def _setup(self) -> None:
"""Set up runtime, LLM, and executor."""
# Configure structured logging (auto-detects JSON vs human-readable)
from framework.observability import configure_logging
configure_logging(level="INFO", format="auto")
# Set up session context for tools (workspace_id, agent_id, session_id)
workspace_id = "default" # Could be derived from storage path
agent_id = self.graph.id or "unknown"
+12 -2
View File
@@ -197,8 +197,17 @@ class NodeStepLog:
tokens_used: int
latency_ms: int
# ... detailed execution state
# Trace context (OTel-aligned; empty if observability context not set):
trace_id: str # From set_trace_context (OTel trace)
span_id: str # 16 hex chars per step (OTel span)
parent_span_id: str # Optional; for nested span hierarchy
execution_id: str # Session/run correlation id
```
L3 entries include `trace_id`, `span_id`, and `execution_id` for correlation and **OpenTelemetry (OTel) compatibility**. When the framework sets trace context (e.g. via `Runtime.start_run()` or `StreamRuntime.start_run()`), these fields are populated automatically so L3 data can be exported to OTel backends without schema changes.
**L2: NodeDetail** also includes `trace_id` and `span_id`; **L1: RunSummaryLog** includes `trace_id` and `execution_id` for the same correlation.
---
## Querying Logs (MCP Tools)
@@ -520,9 +529,10 @@ logger.start_run(goal_id, session_id=execution_id)
**Written:** Incrementally (append per step)
**Format:** JSONL (one JSON object per line)
Each line includes **trace context** when the framework has set it (via the observability module): `trace_id`, `span_id`, `parent_span_id` (optional), and `execution_id`. These align with OpenTelemetry/W3C TraceContext so L3 data can be exported to OTel backends without schema changes.
```jsonl
{"node_id":"intake-collector","step_index":3,"tool_calls":[{"tool":"web_search","args":{"query":"@RomuloNevesOf"}}],"tool_results":[{"status":"success","data":"..."}],"verdict":"RETRY","verdict_feedback":"Missing required output 'twitter_handles'. You found the handle but didn't call set_output.","llm_response_text":"I found the profile...","tokens_used":1234,"latency_ms":2500}
{"node_id":"intake-collector","step_index":4,"tool_calls":[{"tool":"web_search","args":{"query":"@RomuloNevesOf twitter"}}],"tool_results":[{"status":"success","data":"..."}],"verdict":"RETRY","verdict_feedback":"Still missing 'twitter_handles'.","llm_response_text":"Found more info...","tokens_used":1456,"latency_ms":2300}
{"node_id":"intake-collector","step_index":3,"trace_id":"54e80d7b5bd6409dbc3217e5cd16a4fd","span_id":"a1b2c3d4e5f67890","execution_id":"b4c348ec54e80d7b5bd6409dbc3217e50","tool_calls":[...],"verdict":"RETRY",...}
```
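For offline inspection, a few lines of Python are enough to pull every step belonging to one trace out of the file (the path and filename here are illustrative; substitute the actual L3 log location):
```python
import json
from pathlib import Path

# Illustrative location only; use the real sessions/.../logs path.
log_file = Path("sessions/session_20260209_194255/logs/steps.jsonl")
trace_id = "54e80d7b5bd6409dbc3217e5cd16a4fd"

for line in log_file.read_text().splitlines():
    if not line.strip():
        continue
    step = json.loads(line)
    if step.get("trace_id") == trace_id:
        print(step["node_id"], step["step_index"], step.get("verdict", ""))
```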
**Why JSONL?**
+9
View File
@@ -13,6 +13,7 @@ from datetime import datetime
from pathlib import Path
from typing import Any
from framework.observability import set_trace_context
from framework.schemas.decision import Decision, DecisionType, Option, Outcome
from framework.schemas.run import Run, RunStatus
from framework.storage.backend import FileStorage
@@ -79,6 +80,14 @@ class Runtime:
The run ID
"""
run_id = f"run_{datetime.now().strftime('%Y%m%d_%H%M%S')}_{uuid.uuid4().hex[:8]}"
trace_id = uuid.uuid4().hex
execution_id = uuid.uuid4().hex # 32 hex, OTel/W3C-aligned for logs
set_trace_context(
trace_id=trace_id,
execution_id=execution_id,
goal_id=goal_id,
)
self._current_run = Run(
id=run_id,
@@ -361,6 +361,13 @@ class ExecutionStream:
# Create runtime adapter for this execution
runtime_adapter = StreamRuntimeAdapter(self._runtime, execution_id)
# Start run to set trace context (CRITICAL for observability)
runtime_adapter.start_run(
goal_id=self.goal.id,
goal_description=self.goal.description,
input_data=ctx.input_data,
)
# Create per-execution runtime logger
runtime_logger = None
if self._runtime_log_store:
@@ -413,6 +420,13 @@ class ExecutionStream:
# Store result with retention
self._record_execution_result(execution_id, result)
# End run to complete trace (for observability)
runtime_adapter.end_run(
success=result.success,
narrative=f"Execution {'succeeded' if result.success else 'failed'}",
output_data=result.output,
)
# Update context
ctx.completed_at = datetime.now()
ctx.status = "completed" if result.success else "failed"
@@ -495,6 +509,16 @@ class ExecutionStream:
# Write error session state
await self._write_session_state(execution_id, ctx, error=str(e))
# End run with failure (for observability)
try:
runtime_adapter.end_run(
success=False,
narrative=f"Execution failed: {str(e)}",
output_data={},
)
except Exception:
pass # Don't let end_run errors mask the original error
# Emit failure event
if self._event_bus:
await self._event_bus.emit_execution_failed(
+22 -2
View File
@@ -31,6 +31,9 @@ class NodeStepLog(BaseModel):
For EventLoopNode, each iteration is a step. For single-step nodes
(LLMNode, FunctionNode, RouterNode), step_index is 0.
OTel-aligned fields (trace_id, span_id, execution_id) enable correlation
and future OpenTelemetry export without schema changes.
"""
node_id: str
@@ -48,6 +51,11 @@ class NodeStepLog(BaseModel):
error: str = "" # Error message if step failed
stacktrace: str = "" # Full stack trace if exception occurred
is_partial: bool = False # True if step didn't complete normally
# OTel / trace context (from observability; empty if not set):
trace_id: str = "" # OTel trace id (e.g. from set_trace_context)
span_id: str = "" # OTel span id (16 hex chars per step)
parent_span_id: str = "" # Optional; for nested span hierarchy
execution_id: str = "" # Session/run correlation id
# ---------------------------------------------------------------------------
@@ -56,7 +64,10 @@ class NodeStepLog(BaseModel):
class NodeDetail(BaseModel):
"""Per-node completion result and attention flags."""
"""Per-node completion result and attention flags.
OTel-aligned fields (trace_id, span_id) tie L2 to the same trace as L3.
"""
node_id: str
node_name: str = ""
@@ -78,6 +89,9 @@ class NodeDetail(BaseModel):
continue_count: int = 0
needs_attention: bool = False
attention_reasons: list[str] = Field(default_factory=list)
# OTel / trace context (from observability; empty if not set):
trace_id: str = ""
span_id: str = "" # Optional node-level span for hierarchy
# ---------------------------------------------------------------------------
@@ -86,7 +100,10 @@ class NodeDetail(BaseModel):
class RunSummaryLog(BaseModel):
"""Run-level summary for a full graph execution."""
"""Run-level summary for a full graph execution.
OTel-aligned fields (trace_id, execution_id) tie L1 to the same trace as L2/L3.
"""
run_id: str
agent_id: str = ""
@@ -101,6 +118,9 @@ class RunSummaryLog(BaseModel):
started_at: str = "" # ISO timestamp
duration_ms: int = 0
execution_quality: str = "" # "clean"|"degraded"|"failed"
# OTel / trace context (from observability; empty if not set):
trace_id: str = ""
execution_id: str = ""
# ---------------------------------------------------------------------------
+8 -17
View File
@@ -52,29 +52,20 @@ class RuntimeLogStore:
- New format (session_*): {storage_root}/sessions/{run_id}/logs/
- Old format (anything else): {base_path}/runs/{run_id}/ (deprecated)
When base_path ends with 'runtime_logs', we use the parent directory
to avoid nesting under runtime_logs/.
This allows backward compatibility for reading old logs.
"""
if run_id.startswith("session_"):
# New: sessions/{session_id}/logs/
# If base_path ends with runtime_logs, use parent (storage root)
is_runtime_logs = self._base_path.name == "runtime_logs"
root = self._base_path.parent if is_runtime_logs else self._base_path
return root / "sessions" / run_id / "logs"
else:
# Old: runs/{run_id}/ (deprecated, backward compatibility only)
import warnings
import warnings
warnings.warn(
f"Reading logs from deprecated location for run_id={run_id}. "
"New sessions use unified storage at sessions/session_*/logs/",
DeprecationWarning,
stacklevel=3,
)
return self._base_path / "runs" / run_id
warnings.warn(
f"Reading logs from deprecated location for run_id={run_id}. "
"New sessions use unified storage at sessions/session_*/logs/",
DeprecationWarning,
stacklevel=3,
)
return self._base_path / "runs" / run_id
# -------------------------------------------------------------------
# Incremental write (sync — called from locked sections)
+24 -2
View File
@@ -26,6 +26,7 @@ import uuid
from datetime import UTC, datetime
from typing import Any
from framework.observability import get_trace_context
from framework.runtime.runtime_log_schemas import (
NodeDetail,
NodeStepLog,
@@ -64,10 +65,8 @@ class RuntimeLogger:
The run_id (same as session_id if provided)
"""
if session_id:
# Use provided session_id as run_id (unified sessions)
self._run_id = session_id
else:
# Generate run_id in old format (backward compatibility)
ts = datetime.now(UTC).strftime("%Y%m%dT%H%M%S")
short_uuid = uuid.uuid4().hex[:8]
self._run_id = f"{ts}_{short_uuid}"
@@ -118,6 +117,12 @@ class RuntimeLogger:
)
)
# OTel / trace context: from observability ContextVar (empty if not set)
ctx = get_trace_context()
trace_id = ctx.get("trace_id", "")
execution_id = ctx.get("execution_id", "")
span_id = uuid.uuid4().hex[:16] # OTel 16-hex span_id per step
step_log = NodeStepLog(
node_id=node_id,
node_type=node_type,
@@ -132,6 +137,9 @@ class RuntimeLogger:
error=error,
stacktrace=stacktrace,
is_partial=is_partial,
trace_id=trace_id,
span_id=span_id,
execution_id=execution_id,
)
with self._lock:
@@ -190,6 +198,11 @@ class RuntimeLogger:
needs_attention = True
attention_reasons.append(f"Many iterations: {total_steps}")
# OTel / trace context for L2 correlation
ctx = get_trace_context()
trace_id = ctx.get("trace_id", "")
span_id = uuid.uuid4().hex[:16] # Optional node-level span
detail = NodeDetail(
node_id=node_id,
node_name=node_name,
@@ -210,6 +223,8 @@ class RuntimeLogger:
continue_count=continue_count,
needs_attention=needs_attention,
attention_reasons=attention_reasons,
trace_id=trace_id,
span_id=span_id,
)
with self._lock:
@@ -274,6 +289,11 @@ class RuntimeLogger:
for nd in node_details:
attention_reasons.extend(nd.attention_reasons)
# OTel / trace context for L1 correlation
ctx = get_trace_context()
trace_id = ctx.get("trace_id", "")
execution_id = ctx.get("execution_id", "")
summary = RunSummaryLog(
run_id=self._run_id,
agent_id=self._agent_id,
@@ -288,6 +308,8 @@ class RuntimeLogger:
started_at=self._started_at,
duration_ms=duration_ms,
execution_quality=execution_quality,
trace_id=trace_id,
execution_id=execution_id,
)
await self._store.save_summary(self._run_id, summary)
+11
View File
@@ -12,6 +12,7 @@ import uuid
from datetime import datetime
from typing import TYPE_CHECKING, Any
from framework.observability import set_trace_context
from framework.schemas.decision import Decision, DecisionType, Option, Outcome
from framework.schemas.run import Run, RunStatus
from framework.storage.concurrent import ConcurrentStorage
@@ -119,6 +120,16 @@ class StreamRuntime:
"""
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
run_id = f"run_{self.stream_id}_{timestamp}_{uuid.uuid4().hex[:8]}"
trace_id = uuid.uuid4().hex
otel_execution_id = uuid.uuid4().hex # 32 hex, OTel/W3C-aligned for logs
set_trace_context(
trace_id=trace_id,
execution_id=otel_execution_id,
run_id=run_id,
goal_id=goal_id,
stream_id=self.stream_id,
)
run = Run(
id=run_id,
+109
View File
@@ -11,6 +11,7 @@ from pathlib import Path
import pytest
from framework.observability import clear_trace_context, set_trace_context
from framework.runtime.runtime_log_schemas import (
NodeDetail,
NodeStepLog,
@@ -464,6 +465,114 @@ class TestRuntimeLogger:
assert tool_logs.steps[0].verdict == "RETRY"
assert tool_logs.steps[2].verdict == "ACCEPT"
@pytest.mark.asyncio
async def test_trace_context_populated_in_l1_l2_l3(self, tmp_path: Path):
"""With trace context set, L3/L2/L1 entries include trace_id, span_id, execution_id."""
set_trace_context(
trace_id="a1b2c3d4e5f6789012345678abcdef01",
execution_id="b2c3d4e5f6789012345678abcdef0123",
)
try:
store = RuntimeLogStore(tmp_path / "logs")
rl = RuntimeLogger(store=store, agent_id="test-agent")
run_id = rl.start_run("goal-1")
rl.log_step(
node_id="node-1",
node_type="event_loop",
step_index=0,
llm_text="Step.",
input_tokens=10,
output_tokens=5,
)
rl.log_node_complete(
node_id="node-1",
node_name="Search",
node_type="event_loop",
success=True,
exit_status="success",
)
await rl.end_run(
status="success",
duration_ms=100,
node_path=["node-1"],
execution_quality="clean",
)
# L3: tool_logs
tool_logs = await store.load_tool_logs(run_id)
assert tool_logs is not None
assert len(tool_logs.steps) == 1
step = tool_logs.steps[0]
assert step.trace_id == "a1b2c3d4e5f6789012345678abcdef01"
assert step.execution_id == "b2c3d4e5f6789012345678abcdef0123"
assert len(step.span_id) == 16
assert all(c in "0123456789abcdef" for c in step.span_id)
# L2: details
details = await store.load_details(run_id)
assert details is not None
assert len(details.nodes) == 1
nd = details.nodes[0]
assert nd.trace_id == "a1b2c3d4e5f6789012345678abcdef01"
assert len(nd.span_id) == 16
# L1: summary
summary = await store.load_summary(run_id)
assert summary is not None
assert summary.trace_id == "a1b2c3d4e5f6789012345678abcdef01"
assert summary.execution_id == "b2c3d4e5f6789012345678abcdef0123"
finally:
clear_trace_context()
@pytest.mark.asyncio
async def test_trace_context_empty_when_not_set(self, tmp_path: Path):
"""Without trace context, L3/L2/L1 trace_id and execution_id are empty."""
clear_trace_context()
store = RuntimeLogStore(tmp_path / "logs")
rl = RuntimeLogger(store=store, agent_id="test-agent")
run_id = rl.start_run("goal-1")
rl.log_step(
node_id="node-1",
node_type="event_loop",
step_index=0,
llm_text="Step.",
input_tokens=10,
output_tokens=5,
)
rl.log_node_complete(
node_id="node-1",
node_name="Search",
node_type="event_loop",
success=True,
exit_status="success",
)
await rl.end_run(
status="success",
duration_ms=100,
node_path=["node-1"],
execution_quality="clean",
)
# L3: trace_id and execution_id from context should be empty
tool_logs = await store.load_tool_logs(run_id)
assert tool_logs is not None
assert len(tool_logs.steps) == 1
assert tool_logs.steps[0].trace_id == ""
assert tool_logs.steps[0].execution_id == ""
# L2
details = await store.load_details(run_id)
assert details is not None
assert details.nodes[0].trace_id == ""
# L1
summary = await store.load_summary(run_id)
assert summary is not None
assert summary.trace_id == ""
assert summary.execution_id == ""
@pytest.mark.asyncio
async def test_multi_node_lifecycle(self, tmp_path: Path):
"""Test logging across multiple nodes in a graph run."""
+1
View File
@@ -163,6 +163,7 @@ hive/ # Repository root
│ │ ├── llm/ # LLM provider integrations (Anthropic, OpenAI, etc.)
│ │ ├── mcp/ # MCP server integration
│ │ ├── runner/ # AgentRunner - loads and runs agents
│ │ ├── observability/ # Structured logging - human-readable and machine-parseable tracing
│ │ ├── runtime/ # Runtime environment
│ │ ├── schemas/ # Data schemas
│ │ ├── storage/ # File-based persistence
+12 -3
View File
@@ -11,6 +11,7 @@ template_name/
├── __init__.py # Package exports
├── __main__.py # CLI entry point
├── agent.py # Goal, edges, graph spec, agent class
├── agent.json # Agent definition (used by build-from-template)
├── config.py # Runtime configuration
├── nodes/
│ └── __init__.py # Node definitions (NodeSpec instances)
@@ -19,20 +20,28 @@ template_name/
## How to use a template
### Option 1: Build from template (recommended)
Use the `/hive-create` skill and select "From a template" to interactively pick a template, customize the goal/nodes/graph, and export a new agent.
### Option 2: Manual copy
```bash
# 1. Copy to your exports directory
cp -r examples/templates/marketing_agent exports/my_marketing_agent
cp -r examples/templates/deep_research_agent exports/my_research_agent
# 2. Update the module references in __main__.py and __init__.py
# 3. Customize goal, nodes, edges, and prompts
# 4. Run it
uv run python -m exports.my_marketing_agent --input '{"product_description": "..."}'
uv run python -m exports.my_research_agent --input '{"topic": "..."}'
```
## Available templates
| Template | Description |
|----------|-------------|
| [marketing_agent](marketing_agent/) | Multi-channel marketing content generator with audience analysis, content generation, and editorial review nodes |
| [deep_research_agent](deep_research_agent/) | Interactive research agent that searches diverse sources, evaluates findings with user checkpoints, and produces a cited HTML report |
| [tech_news_reporter](tech_news_reporter/) | Researches the latest technology and AI news from the web and produces a well-organized report |
| [twitter_outreach](twitter_outreach/) | Researches a Twitter/X profile, crafts a personalized outreach email, and sends it after user approval |
@@ -207,17 +207,8 @@ async def _interactive_shell(verbose=False):
if result.success:
output = result.output
if "report_content" in output:
click.echo("\n--- Report ---\n")
click.echo(output["report_content"])
click.echo("\n")
if "references" in output:
click.echo("--- References ---\n")
for ref in output.get("references", []):
click.echo(
f" [{ref.get('number', '?')}] {ref.get('title', '')} - {ref.get('url', '')}"
)
click.echo("\n")
status = output.get("delivery_status", "unknown")
click.echo(f"\nResearch complete (status: {status})\n")
else:
click.echo(f"\nResearch failed: {result.error}\n")
@@ -0,0 +1,276 @@
{
"agent": {
"id": "deep_research_agent",
"name": "Deep Research Agent",
"version": "1.0.0",
"description": "Interactive research agent that rigorously investigates topics through multi-source search, quality evaluation, and synthesis - with TUI conversation at key checkpoints for user guidance and feedback."
},
"graph": {
"id": "deep-research-agent-graph",
"goal_id": "rigorous-interactive-research",
"version": "1.0.0",
"entry_node": "intake",
"entry_points": {
"start": "intake"
},
"pause_nodes": [],
"terminal_nodes": [
"report"
],
"nodes": [
{
"id": "intake",
"name": "Research Intake",
"description": "Discuss the research topic with the user, clarify scope, and confirm direction",
"node_type": "event_loop",
"input_keys": [
"topic"
],
"output_keys": [
"research_brief"
],
"nullable_output_keys": [],
"input_schema": {},
"output_schema": {},
"system_prompt": "You are a research intake specialist. The user wants to research a topic.\nHave a brief conversation to clarify what they need.\n\n**STEP 1 \u2014 Read and respond (text only, NO tool calls):**\n1. Read the topic provided\n2. If it's vague, ask 1-2 clarifying questions (scope, angle, depth)\n3. If it's already clear, confirm your understanding and ask the user to confirm\n\nKeep it short. Don't over-ask.\n\nAfter your message, call ask_user() to wait for the user's response.\n\n**STEP 2 \u2014 After the user confirms, call set_output:**\n- set_output(\"research_brief\", \"A clear paragraph describing exactly what to research, what questions to answer, what scope to cover, and how deep to go.\")",
"tools": [],
"model": null,
"function": null,
"routes": {},
"max_retries": 3,
"retry_on": [],
"max_node_visits": 1,
"output_model": null,
"max_validation_retries": 2,
"client_facing": true
},
{
"id": "research",
"name": "Research",
"description": "Search the web, fetch source content, and compile findings",
"node_type": "event_loop",
"input_keys": [
"research_brief",
"feedback"
],
"output_keys": [
"findings",
"sources",
"gaps"
],
"nullable_output_keys": [
"feedback"
],
"input_schema": {},
"output_schema": {},
"system_prompt": "You are a research agent. Given a research brief, find and analyze sources.\n\nIf feedback is provided, this is a follow-up round \u2014 focus on the gaps identified.\n\nWork in phases:\n1. **Search**: Use web_search with 3-5 diverse queries covering different angles.\n Prioritize authoritative sources (.edu, .gov, established publications).\n2. **Fetch**: Use web_scrape on the most promising URLs (aim for 5-8 sources).\n Skip URLs that fail. Extract the substantive content.\n3. **Analyze**: Review what you've collected. Identify key findings, themes,\n and any contradictions between sources.\n\nImportant:\n- Work in batches of 3-4 tool calls at a time to manage context\n- After each batch, assess whether you have enough material\n- Prefer quality over quantity \u2014 5 good sources beat 15 thin ones\n- Track which URL each finding comes from (you'll need citations later)\n\nWhen done, use set_output:\n- set_output(\"findings\", \"Structured summary: key findings with source URLs for each claim. Include themes, contradictions, and confidence levels.\")\n- set_output(\"sources\", [{\"url\": \"...\", \"title\": \"...\", \"summary\": \"...\"}])\n- set_output(\"gaps\", \"What aspects of the research brief are NOT well-covered yet, if any.\")",
"tools": [
"web_search",
"web_scrape",
"load_data",
"save_data",
"list_data_files"
],
"model": null,
"function": null,
"routes": {},
"max_retries": 3,
"retry_on": [],
"max_node_visits": 3,
"output_model": null,
"max_validation_retries": 2,
"client_facing": false
},
{
"id": "review",
"name": "Review Findings",
"description": "Present findings to user and decide whether to research more or write the report",
"node_type": "event_loop",
"input_keys": [
"findings",
"sources",
"gaps",
"research_brief"
],
"output_keys": [
"needs_more_research",
"feedback"
],
"nullable_output_keys": [],
"input_schema": {},
"output_schema": {},
"system_prompt": "Present the research findings to the user clearly and concisely.\n\n**STEP 1 \u2014 Present (your first message, text only, NO tool calls):**\n1. **Summary** (2-3 sentences of what was found)\n2. **Key Findings** (bulleted, with confidence levels)\n3. **Sources Used** (count and quality assessment)\n4. **Gaps** (what's still unclear or under-covered)\n\nEnd by asking: Are they satisfied, or do they want deeper research? Should we proceed to writing the final report?\n\nAfter your presentation, call ask_user() to wait for the user's response.\n\n**STEP 2 \u2014 After the user responds, call set_output:**\n- set_output(\"needs_more_research\", \"true\") \u2014 if they want more\n- set_output(\"needs_more_research\", \"false\") \u2014 if they're satisfied\n- set_output(\"feedback\", \"What the user wants explored further, or empty string\")",
"tools": [],
"model": null,
"function": null,
"routes": {},
"max_retries": 3,
"retry_on": [],
"max_node_visits": 3,
"output_model": null,
"max_validation_retries": 2,
"client_facing": true
},
{
"id": "report",
"name": "Write & Deliver Report",
"description": "Write a cited HTML report from the findings and present it to the user",
"node_type": "event_loop",
"input_keys": [
"findings",
"sources",
"research_brief"
],
"output_keys": [
"delivery_status"
],
"nullable_output_keys": [],
"input_schema": {},
"output_schema": {},
"system_prompt": "Write a comprehensive research report as an HTML file and present it to the user.\n\n**STEP 1 \u2014 Write the HTML report (tool calls, NO text to user yet):**\n\n1. Compose a complete, self-contained HTML document with embedded CSS styling.\n Use a clean, readable design: max-width container, pleasant typography,\n numbered citation links, a table of contents, and a references section.\n\n Report structure inside the HTML:\n - Title & date\n - Executive Summary (2-3 paragraphs)\n - Table of Contents\n - Findings (organized by theme, with [n] citation links)\n - Analysis (synthesis, implications, areas of debate)\n - Conclusion (key takeaways, confidence assessment)\n - References (numbered list with clickable URLs)\n\n Requirements:\n - Every factual claim must cite its source with [n] notation\n - Be objective \u2014 present multiple viewpoints where sources disagree\n - Distinguish well-supported conclusions from speculation\n - Answer the original research questions from the brief\n\n2. Save the HTML file:\n save_data(filename=\"report.html\", data=<your_html>)\n\n3. Get the clickable link:\n serve_file_to_user(filename=\"report.html\", label=\"Research Report\")\n\n**STEP 2 \u2014 Present the link to the user (text only, NO tool calls):**\n\nTell the user the report is ready and include the file:// URI from\nserve_file_to_user so they can click it to open. Give a brief summary\nof what the report covers. Ask if they have questions.\n\nAfter presenting the link, call ask_user() to wait for the user's response.\n\n**STEP 3 \u2014 After the user responds:**\n- Answer follow-up questions from the research material\n- Call ask_user() again if they might have more questions\n- When the user is satisfied: set_output(\"delivery_status\", \"completed\")",
"tools": [
"save_data",
"serve_file_to_user",
"load_data",
"list_data_files"
],
"model": null,
"function": null,
"routes": {},
"max_retries": 3,
"retry_on": [],
"max_node_visits": 1,
"output_model": null,
"max_validation_retries": 2,
"client_facing": true
}
],
"edges": [
{
"id": "intake-to-research",
"source": "intake",
"target": "research",
"condition": "on_success",
"condition_expr": null,
"priority": 1,
"input_mapping": {}
},
{
"id": "research-to-review",
"source": "research",
"target": "review",
"condition": "on_success",
"condition_expr": null,
"priority": 1,
"input_mapping": {}
},
{
"id": "review-to-research-feedback",
"source": "review",
"target": "research",
"condition": "conditional",
"condition_expr": "str(needs_more_research).lower() == 'true'",
"priority": 2,
"input_mapping": {}
},
{
"id": "review-to-report",
"source": "review",
"target": "report",
"condition": "conditional",
"condition_expr": "str(needs_more_research).lower() != 'true'",
"priority": 1,
"input_mapping": {}
}
],
"max_steps": 100,
"max_retries_per_node": 3,
"description": "Interactive research agent that rigorously investigates topics through multi-source search, quality evaluation, and synthesis - with TUI conversation at key checkpoints for user guidance and feedback.",
"created_at": "2026-02-06T00:00:00.000000"
},
"goal": {
"id": "rigorous-interactive-research",
"name": "Rigorous Interactive Research",
"description": "Research any topic by searching diverse sources, analyzing findings, and producing a cited report \u2014 with user checkpoints to guide direction.",
"status": "draft",
"success_criteria": [
{
"id": "source-diversity",
"description": "Use multiple diverse, authoritative sources",
"metric": "source_count",
"target": ">=5",
"weight": 0.25,
"met": false
},
{
"id": "citation-coverage",
"description": "Every factual claim in the report cites its source",
"metric": "citation_coverage",
"target": "100%",
"weight": 0.25,
"met": false
},
{
"id": "user-satisfaction",
"description": "User reviews findings before report generation",
"metric": "user_approval",
"target": "true",
"weight": 0.25,
"met": false
},
{
"id": "report-completeness",
"description": "Final report answers the original research questions",
"metric": "question_coverage",
"target": "90%",
"weight": 0.25,
"met": false
}
],
"constraints": [
{
"id": "no-hallucination",
"description": "Only include information found in fetched sources",
"constraint_type": "quality",
"category": "accuracy",
"check": ""
},
{
"id": "source-attribution",
"description": "Every claim must cite its source with a numbered reference",
"constraint_type": "quality",
"category": "accuracy",
"check": ""
},
{
"id": "user-checkpoint",
"description": "Present findings to the user before writing the final report",
"constraint_type": "functional",
"category": "interaction",
"check": ""
}
],
"context": {},
"required_capabilities": [],
"input_schema": {},
"output_schema": {},
"version": "1.0.0",
"parent_version": null,
"evolution_reason": null,
"created_at": "2026-02-06 00:00:00.000000",
"updated_at": "2026-02-06 00:00:00.000000"
},
"required_tools": [
"list_data_files",
"load_data",
"save_data",
"serve_file_to_user",
"web_scrape",
"web_search"
],
"metadata": {
"created_at": "2026-02-06T00:00:00.000000",
"node_count": 4,
"edge_count": 4
}
}
@@ -102,23 +102,23 @@ edges = [
condition=EdgeCondition.ON_SUCCESS,
priority=1,
),
# review -> research (feedback loop)
# review -> research (feedback loop, checked first)
EdgeSpec(
id="review-to-research-feedback",
source="review",
target="research",
condition=EdgeCondition.CONDITIONAL,
condition_expr="needs_more_research == True",
priority=1,
condition_expr="str(needs_more_research).lower() == 'true'",
priority=2,
),
# review -> report (user satisfied)
# review -> report (complementary condition — proceed to report when no more research needed)
EdgeSpec(
id="review-to-report",
source="review",
target="report",
condition=EdgeCondition.CONDITIONAL,
condition_expr="needs_more_research == False",
priority=2,
condition_expr="str(needs_more_research).lower() != 'true'",
priority=1,
),
]
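The two conditional edges out of review are intended to be complementary, and the coerced comparison keeps them complementary whether `needs_more_research` reaches the edge evaluator as a Python bool or as the string written by `set_output`. A minimal sketch of that property, assuming a simple eval-based evaluator (the `eval_edge` helper is illustrative, not part of the framework):

```python
# Illustrative only: demonstrates why str(...).lower() == 'true' routes correctly
# for both boolean and string representations of needs_more_research.
def eval_edge(condition_expr: str, state: dict) -> bool:
    # Stand-in for the framework's edge evaluation; plain eval() over the state.
    return bool(eval(condition_expr, {}, dict(state)))

for value in (True, False, "true", "false"):
    state = {"needs_more_research": value}
    to_research = eval_edge("str(needs_more_research).lower() == 'true'", state)
    to_report = eval_edge("str(needs_more_research).lower() != 'true'", state)
    assert to_research != to_report  # exactly one outgoing edge fires per representation
```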
@@ -23,6 +23,8 @@ Have a brief conversation to clarify what they need.
Keep it short. Don't over-ask.
After your message, call ask_user() to wait for the user's response.
**STEP 2 — After the user confirms, call set_output:**
- set_output("research_brief", "A clear paragraph describing exactly what to research, \
what questions to answer, what scope to cover, and how deep to go.")
@@ -93,6 +95,8 @@ Present the research findings to the user clearly and concisely.
End by asking: Are they satisfied, or do they want deeper research? \
Should we proceed to writing the final report?
After your presentation, call ask_user() to wait for the user's response.
**STEP 2 — After the user responds, call set_output:**
- set_output("needs_more_research", "true") — if they want more
- set_output("needs_more_research", "false") — if they're satisfied
@@ -147,8 +151,11 @@ Tell the user the report is ready and include the file:// URI from
serve_file_to_user so they can click it to open. Give a brief summary
of what the report covers. Ask if they have questions.
After presenting the link, call ask_user() to wait for the user's response.
**STEP 3 — After the user responds:**
- Answer follow-up questions from the research material
- Call ask_user() again if they might have more questions
- When the user is satisfied: set_output("delivery_status", "completed")
""",
tools=["save_data", "serve_file_to_user", "load_data", "list_data_files"],
@@ -353,7 +353,18 @@ class CredentialStoreAdapter:
cls,
specs: dict[str, CredentialSpec] | None = None,
) -> CredentialStoreAdapter:
"""Create adapter with encrypted storage primary and env var fallback."""
"""Create adapter with encrypted storage primary and env var fallback.
When ADEN_API_KEY is set, builds the store with AdenSyncProvider and
AdenCachedStorage so that OAuth credentials (Google, HubSpot, Slack)
auto-refresh via the Aden server. Non-Aden credentials (brave_search,
anthropic, resend) still resolve from environment variables.
When ADEN_API_KEY is not set, behaves identically to before.
"""
import logging
import os
from framework.credentials import CredentialStore
from framework.credentials.storage import (
CompositeStorage,
@@ -361,6 +372,8 @@ class CredentialStoreAdapter:
EnvVarStorage,
)
log = logging.getLogger(__name__)
if specs is None:
from . import CREDENTIAL_SPECS
@@ -368,17 +381,69 @@ class CredentialStoreAdapter:
env_mapping = {name: spec.env_var for name, spec in specs.items()}
# --- Aden sync branch ---
# Note: we don't use CredentialStore.with_aden_sync() here because it
# only wraps EncryptedFileStorage. We need CompositeStorage (encrypted
# + env var fallback) so non-Aden credentials like brave_search still
# resolve from environment variables.
aden_api_key = os.environ.get("ADEN_API_KEY")
if aden_api_key:
try:
from framework.credentials.aden import (
AdenCachedStorage,
AdenClientConfig,
AdenCredentialClient,
AdenSyncProvider,
)
# Local storage: encrypted primary + env var fallback
encrypted = EncryptedFileStorage()
env = EnvVarStorage(env_mapping)
local_composite = CompositeStorage(primary=encrypted, fallbacks=[env])
# Aden components
client = AdenCredentialClient(
AdenClientConfig(
base_url=os.environ.get("ADEN_API_URL", "https://api.adenhq.com"),
)
)
provider = AdenSyncProvider(client=client)
# AdenCachedStorage wraps composite, giving Aden priority
cached_storage = AdenCachedStorage(
local_storage=local_composite,
aden_provider=provider,
cache_ttl_seconds=300,
)
store = CredentialStore(
storage=cached_storage,
providers=[provider],
auto_refresh=True,
)
# Initial sync: populate local cache from Aden
try:
synced = provider.sync_all(store)
log.info("Aden credential sync complete: %d credentials synced", synced)
except Exception as e:
log.warning("Aden initial sync failed (will retry on access): %s", e)
return cls(store=store, specs=specs)
except Exception as e:
log.warning(
"Aden credential sync unavailable, falling back to default storage: %s", e
)
# --- Default branch (no ADEN_API_KEY or Aden setup failed) ---
try:
encrypted = EncryptedFileStorage()
env = EnvVarStorage(env_mapping)
composite = CompositeStorage(primary=encrypted, fallbacks=[env])
store = CredentialStore(storage=composite)
except Exception as e:
import logging
logging.getLogger(__name__).warning(
"Encrypted credential storage unavailable, falling back to env vars: %s", e
)
log.warning("Encrypted credential storage unavailable, falling back to env vars: %s", e)
store = CredentialStore.with_env_storage(env_mapping)
return cls(store=store, specs=specs)
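From the caller's side the contract is unchanged: `CredentialStoreAdapter.default()` remains the single entry point, and only the wiring behind it varies with the environment. A brief usage sketch, with placeholder key values and behavior as described in the docstring and exercised by the tests below:

```python
import os

from aden_tools.credentials import CredentialStoreAdapter

# Without ADEN_API_KEY: encrypted file storage with env-var fallback.
os.environ.pop("ADEN_API_KEY", None)
os.environ["BRAVE_SEARCH_API_KEY"] = "example-brave-key"   # placeholder value
adapter = CredentialStoreAdapter.default()
print(adapter.get("brave_search"))                         # "example-brave-key"
print(adapter.store.get_provider("aden_sync") is None)     # True: no Aden provider registered

# With ADEN_API_KEY: OAuth credentials sync and auto-refresh via the Aden server,
# while non-Aden credentials such as brave_search still resolve from the environment.
os.environ["ADEN_API_KEY"] = "example-aden-key"            # placeholder value
adapter = CredentialStoreAdapter.default()
print(adapter.get("brave_search"))                         # still "example-brave-key"
```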
+129
View File
@@ -1,5 +1,7 @@
"""Tests for CredentialStoreAdapter."""
from unittest.mock import MagicMock, patch
import pytest
from aden_tools.credentials import (
@@ -484,3 +486,130 @@ class TestSpecCompleteness:
assert spec.credential_group == "", (
f"Credential '{name}' has unexpected credential_group='{spec.credential_group}'"
)
class TestCredentialStoreAdapterAdenSync:
"""Tests for Aden sync branch in CredentialStoreAdapter.default()."""
def _patch_encrypted_storage(self, tmp_path):
"""Patch EncryptedFileStorage to use a temp directory."""
from framework.credentials.storage import EncryptedFileStorage
original_init = EncryptedFileStorage.__init__
def patched_init(self_inner, base_path=None, **kwargs):
original_init(self_inner, base_path=str(tmp_path / "creds"), **kwargs)
return patch.object(EncryptedFileStorage, "__init__", patched_init)
def test_default_with_aden_key_creates_aden_store(self, monkeypatch, tmp_path):
"""When ADEN_API_KEY is set, default() wires up AdenSyncProvider."""
monkeypatch.setenv("ADEN_API_KEY", "test-aden-key")
monkeypatch.setenv("ADEN_API_URL", "https://test.adenhq.com")
mock_client = MagicMock()
mock_client.list_integrations.return_value = []
with (
self._patch_encrypted_storage(tmp_path),
patch(
"framework.credentials.aden.AdenCredentialClient",
return_value=mock_client,
),
patch(
"framework.credentials.aden.AdenClientConfig",
),
):
adapter = CredentialStoreAdapter.default()
# Verify AdenSyncProvider is registered
provider = adapter.store.get_provider("aden_sync")
assert provider is not None
def test_default_without_aden_key_uses_env_fallback(self, monkeypatch, tmp_path):
"""When ADEN_API_KEY is not set, default() uses env-only storage."""
monkeypatch.delenv("ADEN_API_KEY", raising=False)
monkeypatch.setenv("BRAVE_SEARCH_API_KEY", "test-brave-key")
with self._patch_encrypted_storage(tmp_path):
adapter = CredentialStoreAdapter.default()
# No Aden provider should be registered
assert adapter.store.get_provider("aden_sync") is None
# Env vars still work
assert adapter.get("brave_search") == "test-brave-key"
def test_default_aden_non_aden_cred_falls_through_to_env(self, monkeypatch, tmp_path):
"""Non-Aden credentials (e.g. brave_search) resolve from env vars even with Aden."""
monkeypatch.setenv("ADEN_API_KEY", "test-aden-key")
monkeypatch.setenv("ADEN_API_URL", "https://test.adenhq.com")
monkeypatch.setenv("BRAVE_SEARCH_API_KEY", "brave-from-env")
mock_client = MagicMock()
mock_client.list_integrations.return_value = []
# Aden returns None for brave_search (404 → None)
mock_client.get_credential.return_value = None
with (
self._patch_encrypted_storage(tmp_path),
patch(
"framework.credentials.aden.AdenCredentialClient",
return_value=mock_client,
),
patch(
"framework.credentials.aden.AdenClientConfig",
),
):
adapter = CredentialStoreAdapter.default()
assert adapter.get("brave_search") == "brave-from-env"
def test_default_aden_sync_failure_falls_back_gracefully(self, monkeypatch, tmp_path):
"""If Aden initial sync fails, adapter is still created and env vars work."""
monkeypatch.setenv("ADEN_API_KEY", "test-aden-key")
monkeypatch.setenv("ADEN_API_URL", "https://test.adenhq.com")
monkeypatch.setenv("BRAVE_SEARCH_API_KEY", "brave-fallback")
mock_client = MagicMock()
mock_client.list_integrations.side_effect = Exception("Connection refused")
mock_client.get_credential.return_value = None
with (
self._patch_encrypted_storage(tmp_path),
patch(
"framework.credentials.aden.AdenCredentialClient",
return_value=mock_client,
),
patch(
"framework.credentials.aden.AdenClientConfig",
),
):
adapter = CredentialStoreAdapter.default()
# Adapter was created despite sync failure
assert adapter is not None
assert adapter.get("brave_search") == "brave-fallback"
def test_default_aden_import_error_falls_back(self, monkeypatch, tmp_path):
"""If Aden imports fail (e.g. missing httpx), fall back to default storage."""
monkeypatch.setenv("ADEN_API_KEY", "test-aden-key")
monkeypatch.setenv("BRAVE_SEARCH_API_KEY", "brave-fallback")
import builtins
real_import = builtins.__import__
def mock_import(name, *args, **kwargs):
if name == "framework.credentials.aden":
raise ImportError(f"No module named '{name}'")
return real_import(name, *args, **kwargs)
with (
self._patch_encrypted_storage(tmp_path),
patch.object(builtins, "__import__", side_effect=mock_import),
):
adapter = CredentialStoreAdapter.default()
# Fell back to default — env vars still work, no Aden provider
assert adapter.store.get_provider("aden_sync") is None
assert adapter.get("brave_search") == "brave-fallback"
Generated
+1 -1
View File
@@ -754,7 +754,7 @@ wheels = [
[[package]]
name = "framework"
version = "0.1.0"
version = "0.4.2"
source = { editable = "core" }
dependencies = [
{ name = "anthropic" },