feat(observability): Structured logging with automatic trace context propagation -- adjust all logs to print full uuids when in prod mode and include documentation

2026-01-28 12:31:11 -03:00
parent 9bee1666f1
commit d090c954ae
5 changed files with 265 additions and 3 deletions
@@ -160,6 +160,7 @@ orchestration
 │   │   ├── protocols/                   # Standard protocols (hooks, tracing, etc.)
 │   │   ├── llm/                         # LLM provider integrations (Anthropic, OpenAI, etc.)
 │   │   ├── memory/                      # Memory systems (STM, LTM/RLM)
+|   |   ├── observability/               # Structured logging - human-readable and machine-parseable tracing
 │   │   ├── tools/                       # Tool registry and management
 │   │   └── __init__.py
 │   ├── pyproject.toml                   # Package metadata and dependencies
@@ -0,0 +1,261 @@
+# Observability - Structured Logging
+
+## Configuration via Environment Variables
+
+Control logging format using environment variables:
+
+```bash
+# JSON logging (production) - Machine-parseable, one line per log
+export LOG_FORMAT=json
+python -m my_agent run
+
+# Human-readable (development) - Color-coded, easy to read
+# Default if LOG_FORMAT is not set
+python -m my_agent run
+```
+
+**Alternative:** Set `ENV=production` to automatically use JSON format:
+
+```bash
+export ENV=production
+python -m my_agent run
+```
+
+---
+
+## Overview
+
+The Hive framework provides automatic structured logging with trace context propagation. Logs include correlation IDs (`trace_id`, `execution_id`) that automatically follow your agent execution flow.
+
+**Features:**
+- **Zero developer friction**: Standard `logger.info()` calls automatically get trace context
+- **ContextVar-based propagation**: Thread-safe and async-safe for concurrent executions
+- **Dual output modes**: JSON for production, human-readable for development
+- **Automatic correlation**: `trace_id` and `execution_id` propagate through all logs
+
+## Quick Start
+
+Logging is automatically configured when you use `AgentRunner`. No setup required:
+
+```python
+from framework.runner import AgentRunner
+
+runner = AgentRunner(graph=my_graph, goal=my_goal)
+result = await runner.run({"input": "data"})
+# Logs automatically include trace_id, execution_id, agent_id, etc.
+```
+
+## Programmatic Configuration
+
+Configure logging explicitly in your code:
+
+```python
+from framework.observability import configure_logging
+
+# Human-readable (development)
+configure_logging(level="DEBUG", format="human")
+
+# JSON (production)
+configure_logging(level="INFO", format="json")
+
+# Auto-detect from environment
+configure_logging(level="INFO", format="auto")
+```
+
+### Configuration Options
+
+- **level**: `"DEBUG"`, `"INFO"`, `"WARNING"`, `"ERROR"`, `"CRITICAL"`
+- **format**: 
+  - `"json"` - Machine-parseable JSON (one line per log entry)
+  - `"human"` - Human-readable with colors
+  - `"auto"` - Detects from `LOG_FORMAT` env var or `ENV=production`
+
+### Programmatic Configuration
+
+Configure logging explicitly in your code:
+
+```python
+from framework.observability import configure_logging
+
+# Human-readable (development)
+configure_logging(level="DEBUG", format="human")
+
+# JSON (production)
+configure_logging(level="INFO", format="json")
+
+# Auto-detect from environment
+configure_logging(level="INFO", format="auto")
+```
+
+### Configuration Options
+
+- **level**: `"DEBUG"`, `"INFO"`, `"WARNING"`, `"ERROR"`, `"CRITICAL"`
+- **format**: 
+  - `"json"` - Machine-parseable JSON (one line per log entry)
+  - `"human"` - Human-readable with colors
+  - `"auto"` - Detects from `LOG_FORMAT` env var or `ENV=production`
+
+## Log Format Examples
+
+### JSON Format (Machine-parseable)
+
+```json
+{"timestamp": "2026-01-28T15:01:02.671126+00:00", "level": "info", "logger": "framework.runtime", "message": "Starting agent execution", "trace_id": "tr_1234567890abcdef...", "execution_id": "exec_start_a1b2c3d4e5f6...", "agent_id": "sales-agent", "goal_id": "qualify-leads"}
+```
+
+**Features:**
+- Full UUIDs for `execution_id` and `trace_id` (for machine correlation)
+- Compact single-line format (easy to stream/parse)
+- All trace context fields included automatically
+
+### Human-Readable Format (Development)
+
+```
+[INFO    ] [trace:12345678 | exec:a1b2c3d4 | agent:sales-agent] Starting agent execution
+[INFO    ] [trace:12345678 | exec:a1b2c3d4 | agent:sales-agent] Processing input data [node_id:input-processor]
+[INFO    ] [trace:12345678 | exec:a1b2c3d4 | agent:sales-agent] LLM call completed [latency_ms:1250] [tokens_used:450]
+```
+
+**Features:**
+- Color-coded log levels
+- Shortened IDs for readability (first 8 chars)
+- Context prefix shows trace correlation
+
+## Trace Context Fields
+
+The following fields are automatically included in all logs:
+
+- **trace_id**: Full UUID for tracing entire request flow
+- **execution_id**: Full UUID for specific execution (format: `exec_{stream_id}_{uuid}`)
+- **agent_id**: Agent/graph identifier
+- **goal_id**: Goal being pursued
+- **node_id**: Current node (when set)
+
+## Custom Log Fields
+
+Add custom fields using the `extra` parameter:
+
+```python
+import logging
+
+logger = logging.getLogger("my_module")
+
+# Add custom fields
+logger.info("LLM call completed", extra={
+    "latency_ms": 1250,
+    "tokens_used": 450,
+    "model": "claude-3-5-sonnet-20241022",
+    "node_id": "web-search"
+})
+```
+
+These fields appear in both JSON and human-readable formats.
+
+## Usage in Your Code
+
+### Standard Logging (Recommended)
+
+Just use Python's standard logging - context is automatic:
+
+```python
+import logging
+
+logger = logging.getLogger(__name__)
+
+def my_function():
+    # This log automatically includes trace_id, execution_id, etc.
+    logger.info("Processing data")
+    
+    try:
+        result = do_work()
+        logger.info("Work completed", extra={"result_count": len(result)})
+    except Exception as e:
+        logger.error("Work failed", exc_info=True)
+```
+
+### Framework-Managed Context
+
+The framework automatically sets trace context at key points:
+
+- **Runtime.start_run()**: Sets `trace_id`, `execution_id`, `goal_id`
+- **GraphExecutor.execute()**: Adds `agent_id`
+- **Node execution**: Adds `node_id`
+
+You don't need to manually set context - it propagates automatically via ContextVar.
+
+## Advanced Usage
+
+### Manual Context Management
+
+If you need to set trace context manually (rare):
+
+```python
+from framework.observability import set_trace_context, get_trace_context
+
+# Set context
+set_trace_context(
+    trace_id="tr_abc123",
+    execution_id="exec_xyz789",
+    agent_id="my-agent"
+)
+
+# Get current context
+context = get_trace_context()
+print(context["execution_id"])
+
+# Clear context (usually not needed)
+from framework.observability import clear_trace_context
+clear_trace_context()
+```
+
+### Testing
+
+For tests, you may want to configure logging explicitly:
+
+```python
+import pytest
+from framework.observability import configure_logging
+
+@pytest.fixture(autouse=True)
+def setup_logging():
+    configure_logging(level="DEBUG", format="human")
+```
+
+## Best Practices
+
+1. **Production**: Use JSON format (`LOG_FORMAT=json` or `ENV=production`)
+2. **Development**: Use human-readable format (default)
+3. **Don't manually set context**: Let the framework manage it
+4. **Use standard logging**: No special APIs needed - just `logger.info()`
+5. **Add custom fields**: Use `extra` dict for additional metadata
+
+## Troubleshooting
+
+### Logs missing trace context
+
+Ensure `configure_logging()` has been called (usually automatic via `AgentRunner._setup()`).
+
+### JSON logs not appearing
+
+Check environment variables:
+```bash
+echo $LOG_FORMAT
+echo $ENV
+```
+
+Or explicitly set:
+```python
+configure_logging(format="json")
+```
+
+### Context not propagating
+
+ContextVar automatically propagates through async calls. If context seems lost, check:
+- Are you in the same async execution context?
+- Has `set_trace_context()` been called for this execution?
+
+## See Also
+
+- [Logging Implementation](../observability/logging.py) - Source code
+- [AgentRunner](../runner/runner.py) - Where logging is configured
+- [Runtime Core](../runtime/core.py) - Where trace context is set
@@ -79,7 +79,7 @@ class Runtime:
        Returns:
            The run ID
        """
-        run_id = f"run_{datetime.now().strftime('%Y%m%d_%H%M%S')}_{uuid.uuid4().hex[:8]}"
+        run_id = f"run_{datetime.now().strftime('%Y%m%d_%H%M%S')}_{uuid.uuid4().hex}"

        # Generate trace_id for correlation across entire execution
        # Full UUID (128 bits) for uniqueness, displayed shortened in human mode
@@ -260,7 +260,7 @@ class ExecutionStream:
            raise RuntimeError(f"ExecutionStream '{self.stream_id}' is not running")

        # Generate execution ID
-        execution_id = f"exec_{self.stream_id}_{uuid.uuid4().hex[:8]}"
+        execution_id = f"exec_{self.stream_id}_{uuid.uuid4().hex}"
        if correlation_id is None:
            correlation_id = execution_id

@@ -119,7 +119,7 @@ class StreamRuntime:
            The run ID
        """
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
-        run_id = f"run_{self.stream_id}_{timestamp}_{uuid.uuid4().hex[:8]}"
+        run_id = f"run_{self.stream_id}_{timestamp}_{uuid.uuid4().hex}"

        # Generate trace_id for correlation across entire execution
        # Full UUID (128 bits) for uniqueness, displayed shortened in human mode