Validate that resolved session path stays within the sessions directory
using Path.is_relative_to(). Prevents session_id values like
"../../something" from escaping the sandbox.
Also guard the caller in _write_run_event where get_session_path is
called outside the existing OSError try/except block.
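A minimal sketch of the check described above, assuming a helper shaped like get_session_path (hypothetical name); only the is_relative_to() containment check comes from the commit text:
```python
# Hypothetical helper shape; the containment check matches the commit text.
from pathlib import Path

def get_session_path(sessions_dir: Path, session_id: str) -> Path:
    resolved = (sessions_dir / session_id).resolve()
    if not resolved.is_relative_to(sessions_dir.resolve()):
        raise ValueError(f"session_id escapes the sessions directory: {session_id!r}")
    return resolved
```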
Fixes #1000
Co-authored-by: Sidhartha kumar <Alearner12@users.noreply.github.com>
* refactor: remove deprecated storage/backend.py (267 lines)
Delete the fully deprecated FileStorage class and inline its 5 still-active
methods (_validate_key, _load_run_sync, _load_summary_sync, _delete_run_sync,
_list_all_runs_sync) directly into ConcurrentStorage.
Changes:
- Delete core/framework/storage/backend.py (267 lines of no-op/deprecated code)
- Inline active read methods into ConcurrentStorage (no new FileStorage dep)
- Remove deprecated index operations (get_runs_by_goal, get_runs_by_status,
get_runs_by_node, list_all_goals) and their associated locking
- Update __init__.py to export ConcurrentStorage instead of FileStorage
- Update runtime/core.py to use ConcurrentStorage directly
- Fix Runtime.end_run() to call save_run_sync() (sync wrapper) instead of
the async save_run(), which was silently dropping the coroutine
- Update test_path_traversal_fix.py to test ConcurrentStorage._validate_key()
- Clean up test_storage.py — remove all FileStorage test classes, un-skip
ConcurrentStorage tests now that it's self-contained
- Remove stale FileStorage references from testing/test_storage.py docstring,
testing/debug_tool.py docstring, and test_runtime.py skip reasons
All 44 tests pass, ruff check and ruff format clean.
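Illustrative sketch (not the project's actual classes) of why end_run() must call the sync wrapper; calling the async save_run() from synchronous code creates a coroutine that is never awaited, so nothing is persisted:
```python
# Illustrative only; method names follow the commit text above.
import asyncio

class ConcurrentStorage:
    async def save_run(self, run):
        # async API for event-loop callers
        await asyncio.to_thread(self._save_run_sync, run)

    def save_run_sync(self, run):
        # sync wrapper for non-async callers such as Runtime.end_run()
        self._save_run_sync(run)

    def _save_run_sync(self, run):
        ...  # atomic write of the run record to disk

# Wrong from sync code (coroutine silently dropped):  storage.save_run(run)
# Right from synchronous Runtime.end_run():           storage.save_run_sync(run)
```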
Fixes #6797
* fix(core): address CodeRabbitAI PR review feedback
- Fix critical no-op in ConcurrentStorage._save_run_sync by implementing atomic persistence to
runs/{run_id}.json.
- Update test_path_traversal_fix.py to test ConcurrentStorage directly and use real file paths for end-to-end validation.
- Unskip test_run_saved_on_end and assert actual run file persistence.
- Fix debug_tool.py to use load_run_sync() instead of the async load_run().
* fix(core): address round 2 of CodeRabbitAI reviews
- Add _validate_key to _save_run_sync and _load_summary_sync to enforce path traversal protections on the lowest level APIs.
- Invalidate summary cache and refresh run cache in save_run_sync() to match the async save_run() cache coherence behavior.
- Add tests for load_summary and save_run_sync path traversal rejection.
* feat(scripts): add support for more LLM providers in check_llm_key.py
* fix(scripts): correct perplexity endpoint to /v1/models and simplify lambda kwargs to **_
* feat(quickstart): add Local (Ollama) LLM provider option
- Detect Ollama via 'ollama list' in quickstart.sh and quickstart.ps1
- Add 'Local (Ollama)' menu option with interactive model picker
- Save provider=ollama, model=<selected> to ~/.hive/configuration.json
- Omit api_key_env_var for Ollama (no API key required)
Refs #5154, #5231
* feat: add local Ollama support and resolve native tool calling
This integrates Ollama as a first-class local provider choice during quickstart, and patches several configuration barriers preventing local models from safely executing the framework's agent graphs.
* **Quickstart Integration**: Added `Local (Ollama)` to the provider menu in both quickstart.sh and quickstart.ps1. When selected, it automatically queries `ollama list` and allows the user to pick an installed model without prompting for an API key.
* **Routing & Configuration**: Automatically sets `"api_base": "http://localhost:11434"` so LiteLLM routes correctly to the local daemon, and raises the default max_tokens allocation in config.py to `32768`.
* **Native Tool Calling**: Normalized Ollama models to strictly use the ollama_chat provider prefix inside litellm.py and registered them as `supports_function_calling: True`. This forces native structured function calling and fixes the infinite loop caused by JSON-mode text fallbacks.
* **Context Truncation Fix**: Updated config.py to explicitly pass `"num_ctx": 16384` to Ollama. This prevents the local daemon from silently truncating the Queen agent's ~9,500 token system prompt (Ollama defaults to 2048 `num_ctx`).
* **UX Warnings**: Added terminal notices warning users to select high-parameter models (e.g., `qwen2.5:72b+`) to ensure sufficient contextual reasoning abilities.
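A hedged sketch combining the settings above; the helper names follow the test commit below, and the exact signatures in litellm.py/config.py may differ:
```python
# Hedged sketch; helper names come from the test commit, implementation details assumed.
def _is_ollama_model(model: str) -> bool:
    return model.startswith(("ollama/", "ollama_chat/"))

def _ensure_ollama_chat_prefix(model: str) -> str:
    # Force the ollama_chat provider so LiteLLM uses native tool calling
    # instead of JSON-mode text fallbacks.
    return "ollama_chat/" + model.removeprefix("ollama_chat/").removeprefix("ollama/")

def get_llm_extra_kwargs(model: str) -> dict:
    if _is_ollama_model(model):
        # api_base http://localhost:11434 is written by quickstart; num_ctx keeps the
        # daemon from truncating long system prompts at its 2048-token default.
        return {"num_ctx": 16384}
    return {}
```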
Resolves #6027. Resolves #6028.
* test: add unit tests for Ollama helper functions
Cover _is_ollama_model(), _ensure_ollama_chat_prefix(), and num_ctx
injection in get_llm_extra_kwargs() as requested in PR review.
Fix existing test_init_ollama_no_key_needed assertion to expect the
normalised ollama_chat/ prefix.
Made-with: Cursor
* chore: fix merge conflict
* fix(ollama): address PR review comments and normalize provider config
* fix(ollama): align quickstart defaults and add tool_choice comment
* fix(ollama): enforce OLLAMA_DETECTED logic and resolve quickstart script syntax errors
* fix(ollama): align quickstart logic and cleanup test imports
* feat(mcp-cli): add hive mcp CLI management commands (#6350)
Implement the hive mcp subcommand group with shared helpers and all
P0/P1 management commands: install, add, remove, enable, disable,
list, info, config, search, health, update.
Includes update bridge (remove+reinstall with rollback on failure),
first-use security notice, credential prompting, secret masking,
and agent usage detection via load_agent_selection().
* test(mcp-cli): add CLI integration and handler tests (#6350)
58 tests covering all commands end-to-end:
- Real framework.cli.main() entrypoint dispatch (list, install, update)
- Real registry-on-disk integration (install, list, config, info, remove)
- All 11 command handlers (install, add, remove, enable, disable, list,
info, config, search, health, update)
- Security notice shown only once
- Credential prompting stores overrides, skips when env set, handles cancel
- Secret masking in human output, JSON output, and config display
- Index refresh semantics (stale cache fallback vs no-cache hard fail)
- Update rollback on reinstall failure preserves original entry
- Update rejects local servers and pinned servers with correct remediation
- Bulk update skips local and pinned servers
- Argparse registration validates all 11 subcommands present
- _find_agents_using_server resolves via real load_agent_selection
- _parse_key_value_pairs validates KEY=VAL format
* fix(mcp-cli): mask list --json secrets, preserve enabled state on update, defer security sentinel (#6350)
- list --json now masks override values as <set> before emitting
- update preserves enabled=False state across reinstall
- security notice sentinel only written after successful install
* refactor(mcp-cli): fix docstring, share registry instance in update, extract _mask_overrides helper (#6350)
- Fix module docstring to reflect update's full behavior
- Pass registry instance to _cmd_mcp_update_server to avoid redundant disk I/O
- Extract _mask_overrides() used by list --json, info --json, info human, and config display
- Add comment about _find_agents_using_server path arithmetic limitation
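A plausible shape for the _mask_overrides() helper, assuming it simply replaces stored secret values with the `<set>` placeholder used by `list --json`:
```python
# Assumed helper; keys are kept so operators can see which overrides exist.
def _mask_overrides(overrides: dict[str, str]) -> dict[str, str]:
    return {key: "<set>" for key in overrides}
```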
* docs: add Windows quickstart.ps1 command in Quick Start section
* fix: restore closing code fence and comment out Windows command
---------
Co-authored-by: hundao <alchemy_wimp@hotmail.com>
Python 3.9+ no longer wraps subscript slices in ast.Index, and
Python 3.12 removed ast.Index entirely. The project requires
Python >=3.11, so this is dead code.
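A quick, runnable illustration of the claim (Python 3.11+):
```python
import ast

node = ast.parse("x[0]", mode="eval").body   # an ast.Subscript
print(type(node.slice).__name__)             # 'Constant': no ast.Index wrapper since 3.9
print(hasattr(ast, "Index"))                 # False on 3.12+, where ast.Index was removed
```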
litellm>=1.82.7 contains a malicious .pth file that auto-executes at
Python startup and exfiltrates env vars, SSH keys, cloud credentials,
and CI/CD secrets to an attacker-controlled domain.
Pin to last known-safe version (currently installed). Unpin once a
verified-clean upstream release is available.
Closes #6783
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- The file `tools/test_schema_discovery.py` was being incorrectly collected by pytest as a test module
- Since the file is actually a standalone script, this caused import errors during test collection
- Rename the file to remove the `test_` prefix so pytest no longer treats it as a test file
- Pytest test discovery no longer includes the script, eliminating the import error and restoring a clean test run
- multiple test files shared the same module name "test_structure.py"
- this caused pytest import mismatches during collection
- renamed test files to "test_email_reply_agent" and "test_meeting_scheduler"
- eliminated module name collisions and fixed test discovery
- the test suite still referenced _PushoverClient, which no longer exists
- this caused import errors and failing pytest runs
- removed all tests related to _PushoverClient
- fixed pytest execution errors
- removed dead test code
- ensured test coverage reflects the current implementation
Add Mattermost as a new messaging tool following the existing Discord/Telegram
pattern. Supports self-hosted and cloud instances via personal access tokens.
Tools: list_teams, list_channels, get_channel, send_message, get_posts,
create_reaction, delete_post. Includes rate limit retry logic, credential
store + env var fallback, and comprehensive tests (41 unit + 50 conformance).
Closes #6746
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add 30s transition timeouts to prevent deadlocks on stuck connections.
Split SSE from HTTP in health_check: SSE uses client.list_tools() instead
of hitting /health (SSE servers use event-stream protocol, not REST).
Add has_connection() for MCPRegistry health check integration. Handle
disconnect failures in release, reconnect, and cleanup_all. Guard
reconnect against refcount dropping to zero mid-reconnect.
Covers install/add_local/remove/enable/disable, resolve_for_agent selection
precedence, health checks with pooled connections, cache fallback (defect 1),
SSE health check (defect 2), tomllib version parsing (defect 3), JSON type
validation for mcp_registry.json fields, malformed JSON error handling,
structured log emission, and retry-on-zero-tools behavior.
Local state management for installed MCP servers in ~/.hive/mcp_registry/.
Supports install from registry index, add_local for running servers,
resolve_for_agent with include/tags/exclude/profile/max_tools/versions
selection, health checks via MCPConnectionManager, and JSON type
validation at the mcp_registry.json boundary.
Integration points: AgentRunner, queen orchestrator, credential tester
all load mcp_registry.json with error handling. ToolRegistry gains
load_registry_servers() with retry and structured DX-4 logging.
Fixes #6484
- Replace 8 raw print() calls with logger.warning() in runner.py
- Uses lazy % formatting instead of f-strings
- Warnings about missing tokens/API keys now go through logging framework
- Visible in log files when agents run headlessly
* fix: improve tool_registry error handling with stack traces and context
When tool execution fails, errors now include:
- Stack traces for debugging
- Tool name, tool_use_id, and inputs in error logs
- Same behavior for both sync and async tools
Fixes #2447
* fix: use exc_info=True and truncate inputs in tool error logs
- Replace traceback.format_exc() with exc_info=True (codebase convention)
- Truncate tool inputs to 500 chars to prevent log flooding
- Add test for input truncation
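Hedged sketch of the resulting logging pattern; the helper shape is assumed, only the context fields, 500-char cap, and exc_info usage come from the commits above:
```python
# Helper shape assumed; context fields and truncation follow the commit text.
import logging

logger = logging.getLogger(__name__)
MAX_INPUT_LOG_CHARS = 500

def run_tool(tool_fn, tool_name: str, tool_use_id: str, inputs: dict):
    try:
        return tool_fn(**inputs)
    except Exception:
        logger.error(
            "Tool %s (%s) failed; inputs=%s",
            tool_name,
            tool_use_id,
            repr(inputs)[:MAX_INPUT_LOG_CHARS],
            exc_info=True,
        )
        raise
```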
* feat(tools): add URL support to pdf_read tool
Enable pdf_read to accept both local file paths and HTTP/HTTPS URLs.
Downloads PDF content to temporary file when URL is provided, validates
content-type, and cleans up automatically after extraction.
- Detect URL inputs (http:// or https://)
- Download PDF with httpx (60s timeout)
- Validate Content-Type is application/pdf
- Use temporary file for URL-based PDFs
- Automatic cleanup in finally block
- Maintains backward compatibility with local paths
Completes the workflow: web_scrape error on PDF → pdf_read from URL
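A minimal sketch of the URL branch described above, assuming httpx as stated; the helper name is hypothetical and the real tool wraps this in its existing extraction and cleanup logic:
```python
# Assumes httpx as stated above; helper name is hypothetical.
import os
import tempfile

import httpx

def _download_pdf(url: str) -> str:
    resp = httpx.get(url, timeout=60.0, follow_redirects=True)
    resp.raise_for_status()
    if "application/pdf" not in resp.headers.get("content-type", ""):
        raise ValueError(f"URL did not return a PDF: {url}")
    fd, path = tempfile.mkstemp(suffix=".pdf")
    with os.fdopen(fd, "wb") as fh:
        fh.write(resp.content)
    return path  # caller extracts text, then removes the file in a finally block
```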
* test(tools): Add test coverage for new features in web_scrape and pdf_read tools
* style: fix lint issues in pdf_read URL support
---------
Co-authored-by: Anurag <anuragkr-codes@users.noreply.github.com>
Co-authored-by: hundao <alchemy_wimp@hotmail.com>
Move _get_client() before JSON deserialization so missing-credentials
errors aren't masked by input validation. Wrap json.loads in try/except
for non-JSON string inputs.
Apply Ruff formatting to the extracted event loop modules, the EventLoopNode wrappers, and the OpenRouter key check script so the lint CI format check passes cleanly.
Extract EventLoopNode helper logic into focused event_loop modules while keeping the node responsible for orchestration.
Preserve the existing behavior and compatibility for compaction, event publishing, cursor persistence, synthetic tools, judge evaluation, stall detection, tool result handling, and subagent escalation wiring.
Anthropic tightened OAuth validation on 2026-03-17, requiring a
specific User-Agent header and a billing integrity system block for
subscription-authenticated requests. Without these, all OAuth calls
return HTTP 400 with a generic "Error" message.
Changes:
- Add billing integrity system block (SHA-256 hash derived from first
user message content) prepended to system messages on OAuth requests
- Set User-Agent to claude-code/<version> for OAuth sessions
- Fix OAuth header patch to detect tokens in x-api-key (not just
Authorization) and add required beta/browser-access headers
- Set litellm.drop_params=True to prevent unsupported params like
stream_options from leaking to Anthropic (causes 400)
- Skip stream_options entirely for Anthropic models
- Honour LITELLM_LOG env var for debug logging instead of hardcoding
LiteLLM logger to WARNING
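The LiteLLM-side settings referenced above, as a sketch; litellm.drop_params is a real library flag, while the "LiteLLM" logger name is an assumption:
```python
import logging
import os

import litellm

litellm.drop_params = True  # strip unsupported kwargs (e.g. stream_options) before the call

# Honour LITELLM_LOG instead of always forcing the logger to WARNING.
if not os.environ.get("LITELLM_LOG"):
    logging.getLogger("LiteLLM").setLevel(logging.WARNING)  # logger name assumed
```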
Integrate SkillsManager refactor from base branch. Trust gating (AS-13)
is now wired into SkillsManager._do_load() instead of inline in runner.py,
with the interactive flag passed through SkillsManagerConfig.
Restore .claude/settings.json and revert .gitignore change
that were accidentally included in the sdr-agent refactor commit.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Replace duplicated setup code in tui command with agent.start(mock_mode=mock)
- Fix mock mode to use MockLLMProvider instead of llm=None
- Add demo_contacts.json sample data for template testing
- Untrack .claude/settings.json and add to .gitignore
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Replace self._executor with self._agent_runtime (AgentRuntime | None)
- Import AgentRuntime for proper type annotation
- Add missing await self._agent_runtime.start() in start() — runtime
was created but never started, causing silent failures at runtime
- Add self._agent_runtime = None reset in stop() for clean restart
- Remove redundant self._graph is None guard in trigger_and_wait()
- Update mcp_servers.json with hive-tools server config
- Add credential file patterns to .gitignore
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* test: add comprehensive test suite for safe_eval sandboxed evaluator
Adds 113 tests across 14 test classes covering the full surface area of
the safe_eval expression evaluator used by edge conditions:
- Literals, data structures, arithmetic, unary/binary/boolean operators
- Short-circuit semantics for `and`/`or` (including guard patterns)
- Ternary expressions, variable lookup, subscript/attribute access
- Whitelisted function and method calls
- Security boundaries (private attrs, disallowed AST nodes, blocked builtins)
- Real-world EdgeSpec.condition_expr patterns from graph executor usage
* style: fix import sort order
---------
Co-authored-by: mma2027 <mma2027@users.noreply.github.com>
Co-authored-by: hundao <alchemy_wimp@hotmail.com>
- Add name and entry_node to all trigger SSE events (TRIGGER_AVAILABLE,
TRIGGER_ACTIVATED, TRIGGER_DEACTIVATED) so frontend gets correct data
immediately instead of guessing
- Use ep.entry_node from backend in polling instead of guessing first
non-trigger node
- Compute cronToLabel from trigger config during polling so pill labels
show human-readable schedule
- Fix AsyncMock for event_bus.publish in tests
- Handle trigger_updated SSE event to update graph node label and
config in real time when cron or task is saved
- Use cronToLabel for human-readable schedule display in detail panel
- Add "Saved" button feedback for Save Cron and Save Task (2s toast)
- Update trigger pill label to reflect new schedule on cron save
- Extend PATCH /triggers/{id} to accept trigger_config with cron
validation via croniter and active timer restart
- Add TRIGGER_UPDATED SSE event so frontend updates in real time
- Update frontend API client to use updateTrigger with config support
- Add tests for task update, cron restart, and invalid cron rejection
Trigger nodes (scheduler, webhook, etc.) stopped appearing after the
v0.7.0 refactor because DraftGraph had no trigger awareness.
- Extract shared utilities (cssVar, truncateLabel, trigger colors/icons,
useTriggerColors, cronToLabel) into lib/graphUtils.ts
- Render trigger pills above the draft flowchart with pill shape, icons,
countdown timers, active/inactive status, and click handling
- Draw dashed edges from trigger pills to the correct draft node using
flowchartMap lookup
- Name all trigger layout constants, fix countdown text color bug
- Include trigger pill extent in SVG viewBox width
Closes #6344
Only extend read_keys/write_keys with skill memory keys when the
list was already non-empty (restricted). An empty list means "allow
all" — adding _-prefixed skill keys to an empty list accidentally
activated the permission check and blocked legitimate reads.
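Sketch of the guard, with an assumed helper name; only an already-restricted list is widened:
```python
# Assumed helper name; an empty list means "allow all" and must stay empty.
def extend_with_skill_keys(keys: list[str], skill_keys: list[str]) -> list[str]:
    if not keys:  # unrestricted: adding keys here would wrongly enable the permission check
        return keys
    return keys + [k for k in skill_keys if k not in keys]
```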
- add OpenRouter chat completion validation to key checks for quickstart flows
- improve OpenRouter compat parsing to convert plain textual tool calls into real tool events
- prevent tool-call text from leaking into assistant responses
- add regression tests for OpenRouter key checks and LiteLLM tool compat parsing
quickstart.ps1 and hive.ps1 provide full native Windows support.
Update README, CONTRIBUTING, and environment-setup docs to stop
recommending WSL as the primary path. Also add Windows alternatives
for make check/test commands in CONTRIBUTING.md.
Fixes #3835. Fixes #3839.
* feat(tools): add command sanitizer module with blocklists for shell injection prevention
* fix(tools): validate commands in execute_command_tool before execution
* fix(tools): validate commands in coder_tools_server run_command before execution
* test(tools): add 109 tests for command sanitizer covering safe, blocked, and edge cases
* fix(tools): normalize executable sanitizer matching
Replace the suffix-stripping call flagged by Ruff B005 with explicit .exe suffix normalization in sanitizer paths while preserving blocking behavior for executable names.
Also apply the same normalization in coder_tools_server fallback sanitizer and clean a test-file formatting lint issue.
* fix(tools): harden command sanitizer handling
Normalize executable path matching, tighten python -c detection, and remove the duplicated coder_tools_server fallback by importing the shared sanitizer reliably.
Document the shell=True limitation in the command runners and add regression tests for absolute executable paths plus quoted python -c forms.
ParallelExecutionConfig.branch_timeout_seconds and memory_conflict_strategy
were declared but never read by any code. This caused branches to run
indefinitely and memory conflicts to go undetected.
Changes:
- Wrap parallel branch tasks with asyncio.wait_for() using configured timeout
- Switch asyncio.gather to return_exceptions=True so one timeout doesn't cancel siblings
- Handle asyncio.TimeoutError in result processing loop
- Implement last_wins/first_wins/error memory conflict strategies
- Track which branch wrote which key during fan-out for conflict detection
- Add 6 new tests covering timeout and conflict scenarios
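Hedged sketch of the timeout wiring; the config field names follow the commit text, the surrounding executor code is assumed:
```python
# Config field names follow the commit; branch objects and result handling are assumed.
import asyncio

async def run_branches(branches, config):
    tasks = [
        asyncio.wait_for(branch.run(), timeout=config.branch_timeout_seconds)
        for branch in branches
    ]
    # return_exceptions=True keeps one timed-out branch from cancelling its siblings.
    results = await asyncio.gather(*tasks, return_exceptions=True)
    for branch, result in zip(branches, results):
        if isinstance(result, asyncio.TimeoutError):
            ...  # record the timeout for this branch instead of raising
    return results
```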
Closes #5706
croniter is used for cron-based timer entry points but was never
declared in pyproject.toml. A fresh install would silently skip
all cron triggers. Add croniter>=1.4.0 to dependencies and raise
RuntimeError instead of silently continuing on ImportError.
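Sketch of the stricter import, so a missing croniter fails loudly instead of silently dropping cron triggers:
```python
try:
    from croniter import croniter
except ImportError as exc:
    raise RuntimeError(
        "croniter is required for cron-based timer entry points; "
        "install the project dependencies to enable cron triggers"
    ) from exc
```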
Fixes #5353
- Add Windows (PowerShell) section alongside Linux/macOS
- Reference .\quickstart.ps1 for native Windows users
- Add Set-ExecutionPolicy note for script execution
- Link to environment-setup.md for WSL alternatives
Closes #5753
_patch_litellm_anthropic_oauth and _patch_litellm_metadata_nonetype
silently return when litellm internal modules change. This adds
logger.warning() calls so operators are alerted when patches cannot be
applied, instead of encountering cryptic 401 or TypeError at runtime.
Co-authored-by: GowthamT-1610 <gowthamt@umd.edu>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Add MAX_FAILED_REQUEST_DUMPS = 50 cap and _prune_failed_request_dumps()
helper. After each _dump_failed_request() call the oldest files beyond
the cap are deleted so the directory never grows without bound.
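Rough sketch of the pruning helper; the dump directory layout and file extension are assumptions:
```python
# Directory layout and file extension assumed; oldest files beyond the cap are removed.
from pathlib import Path

MAX_FAILED_REQUEST_DUMPS = 50

def _prune_failed_request_dumps(dump_dir: Path) -> None:
    dumps = sorted(dump_dir.glob("*.json"), key=lambda p: p.stat().st_mtime)
    for stale in dumps[:-MAX_FAILED_REQUEST_DUMPS]:
        stale.unlink(missing_ok=True)
```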
Fixes #5696
* fix: preserve custom session ids in runtime logs
Treat any execution stored under sessions/<id> as a session-backed run so custom IDs stay visible in worker-session browsing and unified log APIs. Add regression coverage for custom IDs across executor path selection, log directory creation, and API listing.
Made-with: Cursor
* fix: ignore stray session directories in listing
Keep the session_ prefix as the fast path for worker session discovery, but allow custom IDs when a backing state.json exists. This avoids ghost directories in the UI while preserving the custom session ID support from the original fix.
Made-with: Cursor
* fix(windows): verify uv is runnable before launch
* fix(windows): use validated uv path for kimi health check
* fix(windows): dedupe uv discovery and keep quickstart scoped
* chore: refresh uv lockfile
Use atomic_write for GraphExecutor._write_progress and log persistence failures instead of silently swallowing exceptions. Add regression tests for atomic write usage and warning logs on write failure.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Align POST /api/sessions behavior across queen-only and one-step worker creation so callers can rely on deterministic session IDs. Add a regression test covering the forwarded session_id contract.
Made-with: Cursor
Pass browser_wait text through Playwright's function argument channel so quoted and multiline strings do not break the generated wait expression. Add a regression test covering text that previously would have been interpolated unsafely.
Made-with: Cursor
Changed HOW_TO_CONTRIBUTE.md back to CONTRIBUTING.md to comply with
GitHub's standard for contributing guidelines files.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Renamed CONTRIBUTING.md to HOW_TO_CONTRIBUTE.md and significantly expanded
the documentation with detailed sections on development setup, OS support,
tooling requirements, performance metrics, and contribution workflows.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Validate that agent names passed to --agents do not contain path
separators. Previously, passing 'exports/my_agent' would result in
the doubled path 'exports/exports/my_agent' with a confusing error.
Now a clear error message is shown suggesting the correct usage.
Fixes #6208
Co-authored-by: nightcityblade <nightcityblade@gmail.com>
Replace individual recipe READMEs with a comprehensive collection of 100 real-world agent prompt examples across marketing, sales, operations, engineering, and finance. This provides users with a broader range of use case inspiration in a single, organized reference document.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Replace bare except Exception: clauses with specific exception handling:
- delete_aden_api_key(): Catch FileNotFoundError, PermissionError at debug
level; log unexpected errors at WARNING with exc_info=True
- _read_credential_key_file(): Catch FileNotFoundError, PermissionError at
debug level; log unexpected errors at WARNING with exc_info=True
- _read_aden_from_encrypted_store(): Catch FileNotFoundError, PermissionError,
KeyError at debug level; log unexpected errors at WARNING with exc_info=True
This makes credential issues easier to diagnose by:
- Logging unexpected errors at WARNING level (visible in production)
- Including full stack traces with exc_info=True
- Keeping expected failures (file not found, permissions) at debug level
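Pattern sketch for the handlers listed above (the function body is illustrative):
```python
# Expected failures stay at debug level; anything else logs at WARNING with a trace.
import logging
from pathlib import Path

logger = logging.getLogger(__name__)

def _read_credential_key_file(path: Path) -> bytes | None:
    try:
        return path.read_bytes()
    except (FileNotFoundError, PermissionError) as exc:
        logger.debug("Credential key file unavailable: %s", exc)
        return None
    except Exception:
        logger.warning("Unexpected error reading credential key file", exc_info=True)
        return None
```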
Fixes #5931
Tests were asserting the old CLIENT_OUTPUT_DELTA + CLIENT_INPUT_REQUESTED
pattern; the fix in 89ccd66f routes escalations through the queen via
ESCALATION_REQUESTED instead.
- Added the Notion tool registration to the _register_verified function.
- Removed the Notion tool registration from the _register_unverified function to ensure proper handling.
- Introduced a comprehensive README.md for the Notion tool.
- Included setup instructions for the Notion API token and credential store configuration.
- Documented available tools and their functionalities.
- Provided usage examples for searching, creating, updating, and managing pages and databases.
- Implemented tests for HTTP error codes, timeouts, and generic exceptions in _request.
- Added tests to verify the use of credential store when provided.
- Enhanced tests for notion_search to include filter types and page size clamping.
- Updated test assertions for successful responses from notion_get_page.
- Added BlockType enum for various Notion block types.
- Updated notion_create_page to allow specifying parent_page_id and title_property.
- Enhanced notion_query_database to support sorting and pagination.
- Introduced notion_create_database for creating databases under a parent page.
- Improved error handling for required parameters in page and database creation.
Track the errlog file handle opened on non-Windows systems and
properly close it during cleanup to prevent file descriptor leaks.
Changes:
- Add _errlog_handle instance variable to track the file handle
- Store handle reference when opening os.devnull
- Close handle in _cleanup_stdio_async() after other cleanup
- Clear reference in disconnect() for safety
Fixes #6002
The handle_pause endpoint referenced session.mode_state (lines 360-361),
which does not exist on the Session dataclass. This caused an
AttributeError every time the pause endpoint reached the phase transition
step, preventing the queen phase from transitioning to staging and
returning a 500 error to the frontend.
Changed to session.phase_state, consistent with handle_stop (line 412),
handle_run (line 75), and the Session dataclass definition
(session_manager.py line 44).
Turns that call report_to_parent were incorrectly treated as "truly
empty" because the flag was not propagated. Thread it through
_run_single_turn and include it in the empty-turn guard.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add pg_get_table_stats for row counts and size info,
pg_list_indexes for index details, and pg_get_foreign_keys
for relationship discovery with both outgoing and incoming FKs.
Add lusha_bulk_enrich_persons for batch enrichment,
lusha_get_technologies for company tech stack lookup, and
lusha_search_decision_makers for senior contact discovery.
Add s3_copy_object for copying within/between buckets,
s3_get_object_metadata for HEAD-based metadata retrieval, and
s3_generate_presigned_url for temporary access URL generation.
Add pushover_cancel_receipt for stopping emergency retries,
pushover_send_glance for widget data updates, and
pushover_get_limits for checking message usage.
Add news_latest for breaking news without query, news_by_source
for source-filtered articles, and news_by_topic for topic-based
discovery with automatic date ranges.
Add scholar_cited_by for finding papers citing a given paper,
scholar_search_profiles for author profile discovery, and
serpapi_google_search for structured Google web results.
Add exa_search_news, exa_search_papers, and exa_search_companies
convenience wrappers with pre-configured category filters and
automatic date/domain filtering.
- greenhouse_list_offers: GET /offers or /applications/{id}/offers
- greenhouse_add_candidate_note: POST /candidates/{id}/activity_feed/notes
- greenhouse_list_scorecards: GET /applications/{id}/scorecards
- Add _post helper for POST requests
- brevo_list_contacts: GET /contacts with pagination and modified_since filter
- brevo_delete_contact: DELETE /contacts/{email} to remove contacts
- brevo_list_email_campaigns: GET /emailCampaigns with status filter and stats
- quickbooks_list_invoices: query invoices with status/customer filters
- quickbooks_get_customer: GET /customer/{id} with address and contact info
- quickbooks_create_payment: POST /payment with optional invoice linking
- cloudinary_get_usage: GET /usage for storage, bandwidth, transformation limits
- cloudinary_rename_resource: POST /rename to change public_id
- cloudinary_add_tag: POST /tags to add tags to resources
- twitter_get_user_followers: GET /users/{id}/followers with profile details
- twitter_get_tweet_replies: search recent replies via conversation_id
- twitter_get_list_tweets: GET /lists/{id}/tweets with author expansion
- apollo_get_person_activities: GET /activities for contact activity history
- apollo_list_email_accounts: GET /email_accounts for connected sending accounts
- apollo_bulk_enrich_people: POST /people/bulk_match for batch enrichment (up to 10)
- calendly_cancel_event: POST /scheduled_events/{id}/cancellation
- calendly_list_webhooks: GET /webhook_subscriptions for org/user scope
- calendly_get_event_type: GET /event_types/{id} for meeting template details
- Add _post helper for POST requests
- pagerduty_list_oncalls: GET /oncalls with schedule/policy filters
- pagerduty_add_incident_note: POST /incidents/{id}/notes to add notes
- pagerduty_list_escalation_policies: GET /escalation_policies with search
- airtable_delete_records: DELETE records by comma-separated IDs (up to 10)
- airtable_search_records: search records using FIND formula for partial matching
- airtable_list_collaborators: list base collaborators via meta API
- Add _delete helper for DELETE requests
- reddit_get_subreddit_info: GET /r/{name}/about for subscriber count, description
- reddit_get_post_detail: GET /by_id/t3_{id} for full post details with flair, ratios
- reddit_get_user_posts: GET /user/{name}/submitted for user's post history
Add confluence_update_page, confluence_delete_page, and
confluence_get_page_children tools using Confluence REST API v2.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add intercom_close_conversation, intercom_create_contact, and
intercom_list_conversations to Intercom credential spec.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add close_conversation, create_contact, and list_conversations client
methods plus intercom_close_conversation, intercom_create_contact, and
intercom_list_conversations MCP tools using Intercom API v2.11.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add gitlab_update_issue, gitlab_get_merge_request, and
gitlab_create_merge_request_note to GitLab credential spec.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add _put helper and gitlab_update_issue, gitlab_get_merge_request,
and gitlab_create_merge_request_note tools using GitLab REST API v4.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add slack_get_channel_info, slack_list_files, and slack_get_file_info
to Slack credential spec.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add get_channel_info, list_files, and get_file_info client methods
plus slack_get_channel_info, slack_list_files, and slack_get_file_info
MCP tools using Slack Web API.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add stripe_list_disputes, stripe_list_events, and
stripe_create_checkout_session to Stripe credential spec.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add list_disputes, list_events, and create_checkout_session client
methods plus stripe_list_disputes, stripe_list_events, and
stripe_create_checkout_session MCP tools using Stripe API.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add linear_cycles_list, linear_issue_comments_list, and
linear_issue_relation_create to Linear credential spec.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add list_cycles, list_issue_comments, and create_issue_relation client
methods plus linear_cycles_list, linear_issue_comments_list, and
linear_issue_relation_create MCP tools using Linear GraphQL API.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add zoom_update_meeting, zoom_list_meeting_participants, and
zoom_list_meeting_registrants to Zoom credential spec.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add zoom_update_meeting (PATCH), zoom_list_meeting_participants
(past meeting attendees), and zoom_list_meeting_registrants
using Zoom REST API v2.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add twilio_list_phone_numbers, twilio_list_calls, and
twilio_delete_message to both Twilio credential specs.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add twilio_list_phone_numbers, twilio_list_calls, and
twilio_delete_message tools using Twilio REST API.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add shopify_update_product, shopify_get_customer, and
shopify_create_draft_order to both Shopify credential specs.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add shopify_update_product, shopify_get_customer, and
shopify_create_draft_order tools using Shopify Admin REST API.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add zendesk_get_ticket_comments, zendesk_add_ticket_comment, and
zendesk_list_users to all three Zendesk credential specs.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add zendesk_get_ticket_comments, zendesk_add_ticket_comment, and
zendesk_list_users tools using Zendesk Support API v2.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Register asana_update_task, asana_add_comment, and
asana_create_subtask in the Asana credential spec.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add _put helper and three new Asana MCP tools:
- asana_update_task: modify name, notes, completion, due date, assignee
- asana_add_comment: post comment stories on tasks
- asana_create_subtask: create subtasks under existing tasks
API ref: https://developers.asana.com/docs
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add three new Trello MCP tools:
- trello_get_card: retrieve full card details with members/checklists/attachments
- trello_create_list: create new lists on boards
- trello_search_cards: full-text search across cards with board scoping
Update credential spec to include the new tool names.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add three new GitHub MCP tools:
- github_list_commits: query commits with author/date/branch filters
- github_create_release: create tagged releases with notes and draft support
- github_list_workflow_runs: monitor CI/CD pipeline runs with status filters
Update credential spec to include the new tool names.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add _GitHubClient methods for:
- list_commits: GET /repos/{owner}/{repo}/commits with sha/author/date filters
- create_release: POST /repos/{owner}/{repo}/releases with tag, notes, draft
- list_workflow_runs: GET /repos/{owner}/{repo}/actions/runs with filters
API ref: https://docs.github.com/en/rest
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add three new Telegram MCP tools:
- telegram_get_chat_member_count: retrieve group/channel membership size
- telegram_send_video: send video files via URL or file_id
- telegram_set_chat_description: update group/channel descriptions
Update credential spec to include the new tool names.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add _TelegramClient methods for:
- get_chat_member_count: getChatMemberCount API endpoint
- send_video: sendVideo with caption, parse_mode, duration support
- set_chat_description: setChatDescription for groups/channels
API ref: https://core.telegram.org/bots/api
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Register salesforce_delete_record, salesforce_search_records, and
salesforce_get_record_count in both Salesforce credential specs.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add three new Salesforce MCP tools:
- salesforce_delete_record: DELETE /sobjects/{type}/{id}
- salesforce_search_records: SOSL full-text search via /search/
- salesforce_get_record_count: efficient COUNT() query for any SObject
API ref: https://developer.salesforce.com/docs/atlas.en-us.api_rest.meta
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add three new Discord MCP tools:
- discord_get_channel: retrieve channel metadata (name, topic, type)
- discord_create_reaction: add emoji reactions to messages
- discord_delete_message: remove messages from channels
Update credential spec to include the new tool names.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add _DiscordClient methods for:
- get_channel: retrieve channel metadata via GET /channels/{id}
- create_reaction: add emoji reaction via PUT reactions endpoint
- delete_message: remove a message via DELETE messages endpoint
API ref: https://discord.com/developers/docs/resources
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Register notion_update_page, notion_archive_page, and
notion_append_blocks in the Notion credential spec.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add three new Notion MCP tools:
- notion_update_page: modify page properties via PATCH /pages/{id}
- notion_archive_page: archive or restore pages
- notion_append_blocks: add paragraphs, headings, lists, todos, etc.
API ref: https://developers.notion.com/reference
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Register jira_update_issue, jira_list_transitions, and
jira_transition_issue in all three Jira credential specs
(domain, email, token).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add three new Jira MCP tools:
- jira_update_issue: modify summary, description, priority, labels, assignee
- jira_list_transitions: discover available status transitions for an issue
- jira_transition_issue: move an issue to a new status with optional comment
API ref: https://developer.atlassian.com/cloud/jira/platform/rest/v3/
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add three new MCP tools:
- hubspot_delete_object: archive contacts, companies, or deals
- hubspot_list_associations: query links between CRM objects (v4 API)
- hubspot_create_association: link two CRM records together
Update credential spec to include the new tool names.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add _HubSpotClient methods for:
- delete_object: archive a CRM object via DELETE /crm/v3/objects
- list_associations: query associations via GET /crm/v4/objects associations endpoint
- create_association: link two CRM objects via PUT /crm/v4/objects associations endpoint
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The google_sheets.py file defined GOOGLE_SHEETS_CREDENTIALS (an API-key
based credential for reading public sheets via GOOGLE_SHEETS_API_KEY) but
was never wired into the package:
- Never imported in credentials/__init__.py
- Never merged into CREDENTIAL_SPECS
- Never listed in __all__
- Tool never calls credentials.get('google_sheets_key') — uses 'google' (OAuth2)
- Tool names in the spec were stale and did not match actual function names
The 'google' credential in email.py already correctly covers all Google
Sheets tools via OAuth2. This file was dead code with no referencing
imports anywhere in the repository.
Closes #6077
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Resolves inconsistency between CLAUDE.md/AGENTS.md (TUI deprecated) and
docs/roadmap.md (TUI listed as completed feature).
- Strike through TUI items in 3 roadmap sections
- Add deprecation note to TUI-to-GUI upgrade section
- Reference AGENTS.md and hive open as replacement
Fixes #5941
Signed-off-by: Robert Hallers <robert@terplabs.ai>
- Change get_chat client method from httpx.get+params to httpx.post+json
to avoid URL-encoding issues with @username chat IDs
- Remove {"success": True} normalization from delete_message,
send_chat_action, pin_message, and unpin_message MCP tools;
return raw Telegram API response consistently
- Update corresponding test mocks and assertions to match
Add 8 new operations to the Telegram Bot tool, bringing it from 2 to 10
operations. This covers message lifecycle (edit, delete, forward), media
(send photo), chat info (get chat), UX (typing indicators), and pin
management — making the tool practical for agent workflows beyond
fire-and-forget messaging.
New operations:
- telegram_edit_message: edit previously sent messages
- telegram_delete_message: delete messages
- telegram_forward_message: forward between chats
- telegram_send_photo: send photos via URL or file_id
- telegram_send_chat_action: show typing/uploading indicators
- telegram_get_chat: retrieve chat metadata
- telegram_pin_message: pin important messages
- telegram_unpin_message: unpin stale messages
Also includes input validation for chat actions, credential spec updates,
central registry wiring, and 31 new tests (52 total).
Closes #4808
Use is_file() instead of exists() to reject directories, and check
for empty content before passing to json parser. Prevents raw
tracebacks on invalid agent.json inputs.
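Sketch of the stricter loading, with a hypothetical function name:
```python
# Hypothetical loader name; the checks mirror the commit text.
import json
from pathlib import Path

def load_agent_json(path: Path) -> dict:
    if not path.is_file():  # is_file() also rejects directories, unlike exists()
        raise ValueError(f"agent.json not found: {path}")
    raw = path.read_text(encoding="utf-8").strip()
    if not raw:
        raise ValueError(f"agent.json is empty: {path}")
    return json.loads(raw)
```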
Fixes #5787
- Replace joined.replace("\n", "\r\n") with re.sub(r"(?<!\r)\n", "\r\n", joined) to prevent \r\n in replace op new_content from becoming \r\r\n (fixed in both hashline_edit.py and file_ops.py)
- Track and report skipped large files in grep_search instead of silently skipping them
- Extract HASHLINE_MAX_FILE_BYTES constant to hashline.py as single source of truth, imported by view_file, grep_search, hashline_edit, and file_ops
- Add tests for CRLF replace op (both copies) and large file skip reporting
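The CRLF-safe conversion from the first bullet, with a hypothetical wrapper name:
```python
import re

def to_crlf(text: str) -> str:
    # Only bare \n is rewritten, so existing \r\n is not doubled into \r\r\n.
    return re.sub(r"(?<!\r)\n", "\r\n", text)

assert to_crlf("a\nb\r\nc") == "a\r\nb\r\nc"
```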
register_all_tools() now only loads verified (stable) tools by default.
Pass include_unverified=True to also load new/community integrations.
This prevents unverified tools from being loaded in production.
Also fixes duplicate register_brevo and register_pushover calls.
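Assumed signature for the gating; _register_verified and _register_unverified are the registration helpers mentioned earlier in this log:
```python
def _register_verified(mcp, credentials) -> None: ...    # stable tools
def _register_unverified(mcp, credentials) -> None: ...  # new/community integrations

def register_all_tools(mcp, credentials, include_unverified: bool = False) -> None:
    _register_verified(mcp, credentials)
    if include_unverified:
        _register_unverified(mcp, credentials)
```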
- Remove s3_tool (duplicate of aws_s3_tool), power_bi_tool (duplicate of
powerbi_tool), x_tool (duplicate of twitter_tool)
- Remove integrations/plaid (duplicate of plaid_tool), integrations/sap_s4hana
(duplicate of sap_tool), stray tools/mssql.py
- Add help key to credential error responses across 14 tool modules
- Fix health checker registry keys (calendly -> calendly_pat, lusha -> lusha_api_key)
- Add health_check_endpoint to calendly and lusha credential specs
- Fix Trello env var (TRELLO_TOKEN -> TRELLO_API_TOKEN) and remove duplicate
Trello specs from hubspot.py
- Add credential_group="aws" to AWS S3 and Redshift specs sharing env vars
- Update conftest UNREGISTERED_COMMUNITY_MODULES to only contain mssql_tool
Implements 5 tools via Tines REST API:
- tines_list_stories: List workflow stories with search/filter
- tines_get_story: Get story details including entry/exit agents
- tines_list_actions: List actions (agents) in stories
- tines_get_action: Get action details with sources/receivers
- tines_get_action_logs: Get action execution logs by level
Uses Bearer token auth with tenant domain.
Implements 4 tools via X API v2:
- twitter_search_tweets: Search recent tweets with query operators
- twitter_get_user: Get user profile by username
- twitter_get_user_tweets: Get user timeline
- twitter_get_tweet: Get tweet details by ID
Uses Bearer token auth (app-only, read access).
Implements 5 tools via QuickBooks Online API v3:
- quickbooks_query: Query entities with SQL-like syntax
- quickbooks_get_entity: Get entity by type and ID
- quickbooks_create_customer: Create customers
- quickbooks_create_invoice: Create invoices with line items
- quickbooks_get_company_info: Get company details
Uses OAuth 2.0 Bearer token auth. Supports sandbox mode.
Implements 5 tools via AWS S3 REST API:
- s3_list_buckets: List all buckets in the account
- s3_list_objects: List objects with prefix/delimiter filtering
- s3_get_object: Get object content and metadata
- s3_put_object: Upload text objects
- s3_delete_object: Delete objects
Uses AWS Signature V4 signing (no boto3 dependency).
Implements 5 tools via Calendly API v2:
- calendly_get_current_user: Get user URI and profile info
- calendly_list_event_types: List meeting templates
- calendly_list_scheduled_events: List booked meetings with date filters
- calendly_get_scheduled_event: Get event details by URI
- calendly_list_invitees: List invitees for an event
Uses Bearer token auth (Personal Access Token).
Implements 5 tools via PagerDuty REST API v2:
- pagerduty_list_incidents: List incidents with status/urgency/date filters
- pagerduty_get_incident: Get incident details by ID
- pagerduty_create_incident: Create incidents on a service
- pagerduty_update_incident: Acknowledge or resolve incidents
- pagerduty_list_services: List services with name search
Uses Token auth header, From header for write operations.
Implements 6 tools via Airtable Web API:
- airtable_list_records: List records with filters, sort, field selection
- airtable_get_record: Get a single record by ID
- airtable_create_records: Create up to 10 records per request
- airtable_update_records: Partial update up to 10 records per request
- airtable_list_bases: List accessible bases
- airtable_get_base_schema: Get table and field schema for a base
Uses Bearer token auth (Personal Access Token).
Implements 6 tools via MongoDB Atlas Data API:
- mongodb_find: Find documents with filters, projection, sort, limit
- mongodb_find_one: Find a single document
- mongodb_insert_one: Insert a document
- mongodb_update_one: Update a document with MongoDB operators
- mongodb_delete_one: Delete a document
- mongodb_aggregate: Run aggregation pipelines
Uses API key auth header. All endpoints are POST.
Implements 5 tools via Zendesk Support API v2:
- zendesk_list_tickets: List tickets with status/sort filters
- zendesk_get_ticket: Get ticket details by ID
- zendesk_create_ticket: Create tickets with priority/type/tags
- zendesk_update_ticket: Update ticket fields and add comments
- zendesk_search_tickets: Search tickets with Zendesk query syntax
Uses Basic auth (email/token:api_token).
6 tools: jira_search_issues, jira_get_issue, jira_create_issue,
jira_list_projects, jira_get_project, jira_add_comment. Uses Basic auth
with email + API token and Atlassian Document Format for text fields.
5 tools: cloudinary_upload, cloudinary_list_resources, cloudinary_get_resource,
cloudinary_delete_resource, cloudinary_search. Uses Basic auth with
API key/secret and supports image, video, and raw resource types.
- Implement 6 YouTube API tools (search videos, get video/channel details, list channel videos, get playlist items, search channels)
- Add YOUTUBE_API_KEY credential spec with help_url and description
- Register YouTube tool in tools/__init__.py
- Add comprehensive test coverage (18 tests) with mocking
- Add detailed README with setup instructions and examples
- Use httpx for HTTP requests to YouTube Data API v3
- Verified with real API integration testing
Implements #5603
The grace period logic for client-facing auto-blocks was placed after
_await_user_input(), which blocks forever since no inject_event is
scheduled for text-only turns. This caused test_text_after_user_input_goes_to_judge
to hang indefinitely, blocking CI framework tests.
Move the grace period check before the blocking call so that within
the grace window, auto-blocks with missing outputs skip blocking
entirely and continue to the next LLM turn for judge RETRY pressure.
Also adds an _auto_missing check: nodes with no missing outputs
(e.g. queen monitoring with output_keys=[]) should still block
as their text-only output is legitimate conversation.
Fixes #5633
* fix: enforce 0600 permissions on OAuth token files
Credential files were written with default umask permissions.
Use os.open with explicit 0o600 mode to ensure token files
are always owner-read/write only, regardless of umask.
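Sketch of the umask-independent write described above (function name hypothetical):
```python
import json
import os

def write_token_file(path: str, payload: dict) -> None:
    # An explicit 0o600 caps permissions at owner read/write even with a permissive umask.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        json.dump(payload, fh)
```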
Fixes #5530
* style: fix line too long in checkpoint_store.py
- Expose page parameter on search_people and search_companies
(client + MCP tool) enabling access beyond the first 50 results
- Add guard requiring at least one filter on both search endpoints
to prevent broad requests that burn API credits
- Add unit tests for pagination and empty filter validation
Add complete SAP S/4HANA integration with:
- Connector for OData API access
- Credential management following Hive patterns
- Unit tests with mocked responses
- Documentation and usage examples
Refs #3182
- Add S3Storage class with upload, download, list, delete operations
- Support IAM roles, environment variables, and credential store
- Implement retry logic with adaptive backoff
- Add MCP tools: s3_upload, s3_download, s3_list, s3_delete, s3_check_credentials
- Include comprehensive tests with moto mocking
- Add documentation for setup and IAM permissions
Closes #3012
- Replace non-existent CLI commands (calculate, interactive, analyze)
with actual commands (run, shell, info) in core/README.md
- Fix test-list argument from <goal_id> to <agent_path> in core/README.md
- Fix misleading docstring on MockProvider.complete_with_tools()
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: hundao <alchemy_wimp@hotmail.com>
* micro-fix: fix wrong credential path and env var in docs
Both docs/configuration.md and docs/environment-setup.md reference a
non-existent ADEN_CREDENTIALS_PATH env var and wrong default path
(~/.aden/credentials). The actual env var is HIVE_CREDENTIAL_KEY and
the default path is ~/.hive/credentials (see storage.py:119,125).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* micro-fix: clarify HIVE_CREDENTIAL_KEY comment wording
Reword comment to avoid implying the env var controls the path.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
- Use Credentials.from_service_account_file() instead of mutating os.environ
- Remove unused dimensions param from _format_report_response
- Remove unused metrics param from _format_realtime_response
- Extract duplicated property_id/limit validation into _validate_inputs helper
- Add credential_group="google_cloud" to GA and BigQuery specs
- Update tests to mock Credentials class
Add read-only GA4 Data API v1 tools: ga_run_report, ga_get_realtime,
ga_get_top_pages, and ga_get_traffic_sources. Includes credential spec,
unit tests, and README.
Add IntercomHealthChecker (subclass of OAuthBearerHealthChecker) and
register it in HEALTH_CHECKERS so the credential registry completeness
test passes in CI.
- Pass assignee_type from intercom_assign_conversation tool function
through to _IntercomClient.assign_conversation() and into the API payload
- Add tests for assignee_type="team" passthrough at client and tool levels
- Add tool README with setup, usage examples, and error handling
Addresses PR #5171 review feedback from @bryanadenhq
Client-facing nodes auto-block on text-only turns (wait for user input).
This breaks weak models (Codex) that output text like "Understood" instead
of calling tools after user responds.
Add _cf_expecting_work state: after user input, text-only turns with
missing output keys skip auto-block and go to judge, which pushes the
LLM to call set_output. Tool calls reset the state back to presenting
mode (auto-block on next text-only).
No behavioral change for strong models (they always call tools after
user input, so the new code path is never triggered).
Judge feedback was saying "Use set_output tool to provide them" which
caused Codex to skip all work and call set_output directly. Changed to
"Follow your system prompt instructions to complete the work."
* Enhance ToolRegistry type inference for function parameters
- Add _infer_schema() helper to handle Union types (Union[T, U] and T | U)
- Support Optional[T] and Union[T, None] with correct optional flag
- Infer generic types: list[T] -> array with items schema, dict[K, V] -> object with additionalProperties
- Detect Pydantic BaseModel parameters and use model_json_schema()
- Correctly mark parameters as required/optional based on type annotations
- Add comprehensive test suite covering all type inference scenarios
- Maintain backward compatibility for unannotated parameters
* Fix asyncio.run crash in GraphBuilder.run_test
* Revert "Enhance ToolRegistry type inference for function parameters"
This reverts commit dacd0fa8b926e01d3f29e7c9b2ff5101b4a52c3b.
* feat(arxiv): implement search_papers and initial download_paper tools
* feat(arxiv): improve PDF download handling with temp files and validation (WIP)
Switch to NamedTemporaryFile for safer temp file handling
Force export.arxiv.org domain for PDF downloads
Add custom User-Agent header
Validate Content-Type to ensure PDF response
Improve error handling and cleanup logic
Add timeout to requests
Work in progress – download_paper still under refinement.
* feat(arxiv): replace NamedTemporaryFile with module-level TemporaryDirectory
Switch from NamedTemporaryFile(delete=False) to a shared _TEMP_DIR for
the lifetime of the server process. Scopes file lifetime to the session,
guarantees cleanup via atexit, and removes the need for manual file
handle management.
Expand README with full args/returns/error reference and implementation
notes explaining the temp storage design decision.
* test(arxiv): add comprehensive tests for search_papers and download_paper
fix(arxiv): return structured error instead of raising on invalid PDF content type
- Add full test coverage for search_papers (validation, success, id_list, errors)
- Add full test coverage for download_paper (success, network errors, invalid content, cleanup)
- Mock arxiv client and requests to isolate behavior
- Ensure partial files are cleaned up on failure
- Align download_paper behavior with tool contract (no exceptions, structured responses)
* style(tools): apply ruff formatting to arxiv tool and update lockfile
- Adds a new autonomous agent template that monitors competitor websites, news, and GitHub
- Implements a 7-node graph workflow to collect, aggregate, and analyze competitive data
- Generates a weekly structured HTML digest with key highlights and 30-day trends
- Utilizes existing web_scrape, web_search, and github MCP tools
- Addresses issue #4153
Closes #4153
* docs: fix CLI usage args for test-debug/test-list to match implementation
* docs: restore 'uv run' prefix to test commands
Reverts unintentional removal of 'uv run' in usage examples as requested in code review.
* chore: changes to .gitignore
* perf(json): add json.loads fast path + asyncio.to_thread for extract_json
Addresses maintainer feedback:
- json.loads candidate fast path in find_json_object (300x speedup source)
- asyncio.to_thread wrappers for both _extract_json call sites (unblocks event loop)
- Remove ~480 lines of over-engineered incremental parsing logic
Total: ~16 lines, zero duplication, zero API surface change
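Sketch of the two changes; find_json_object comes from the commit text, while the wrapper name and fallback are assumptions:
```python
import asyncio
import json

def find_json_object(text: str):
    try:
        return json.loads(text)  # fast path for already-valid JSON
    except json.JSONDecodeError:
        return None  # the existing scanning parser takes over here

async def extract_json_in_thread(text: str):
    # Run the potentially slow extraction off the event loop.
    return await asyncio.to_thread(find_json_object, text)
```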
* fix: simplify async JSON handling per maintainer feedback and align tests
* fix(test): replace tautology assertion in test_mismatched_then_valid
The original assertion `assert result is not None or result is None`
is always true. Replace with a meaningful type check.
---------
Co-authored-by: hundao <alchemy_wimp@hotmail.com>
- Add brevo_tool with 6 MCP tools: brevo_send_email, brevo_send_sms,
brevo_create_contact, brevo_get_contact, brevo_update_contact,
brevo_get_email_stats
- Add CredentialSpec for BREVO_API_KEY in credentials/brevo.py
- Register brevo_tool in tools/__init__.py and credentials/__init__.py
- Add README with setup instructions and usage examples
- Add 34 unit tests covering all tools, validation and error handling
Closes #5127
- Bump framework version 0.5.0 → 0.5.1
- Add CHANGELOG.md with full release notes
Highlights: Hive Coder meta-agent, multi-graph runtime, TUI revamp,
subscription model support, 5 new tool integrations, deprecated node
type removal.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Previously the table listed ~20 of ~50 available tools. This expands
it to cover all tools, grouped into categories: File System, Data Files,
Web & Search, Communication, Productivity & CRM, Cloud & APIs,
Security, and Utilities.
All tool names verified against registered @mcp.tool() functions in source.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- search_people: replaced freetext searchText concatenation with proper
structured Lusha API filters (jobTitles, seniority as list[int],
departments, locations as dict, company_names, industry_ids, search_text)
- search_companies: added locations, company_names, search_text params;
made all params optional for flexible queries
- Pagination: exposed limit param (clamped 10-50 per Lusha API constraints)
on both search tools, replacing hardcoded size=25
- get_signals: changed ids from list[str] to list[int], removed internal
str-to-int conversion as Lusha IDs are always numeric
- seniority type corrected to list[int] (API rejects string-encoded values
despite OpenAPI spec suggesting strings — verified via live integration)
- Unit tests updated for all changes (19/19 pass)
Verified against live Lusha API: all 6 tools return correct responses.
- search_companies: replace names filter with mainIndustriesIds (numeric
industry IDs) per Lusha API schema. Parameter changed from
industry: str to industry_ids: list[int] | None.
- _get_api_key: return None instead of raising TypeError on unexpected
credential type. Lets _get_client handle it with the standard error dict
pattern used across all tools.
- Updated unit tests for new industry_ids parameter and added test for
non-string credential handling.
Lusha API rejects filters.companies.include.searchText (HTTP 400).
Replaced with valid 'names' field in search_companies and removed
redundant company searchText from search_people. Updated unit tests.
Implements AI-powered web search, content extraction, and research tools
via the Exa API for agent workflows.
Tools: exa_search, exa_find_similar, exa_get_contents, exa_answer
Follows existing tool pattern (web_search_tool, hubspot_tool, slack_tool):
- register_tools(mcp, credentials) with @mcp.tool() decorators
- Credential fallback: CredentialStoreAdapter -> EXA_API_KEY env var
- Error handling: always returns dicts, never raises
- Retry with exponential backoff on HTTP 429
Includes:
- Neural/keyword search with domain, date, and category filters
- Similar page discovery via neural embeddings
- Content extraction from up to 10 URLs per request
- Citation-backed answer generation
- CredentialSpec in credentials/search.py
- Comprehensive unit tests (21 tests)
- 500/500 integration CI tests passing
Fixes#4177
- Fix export endpoint: /Export -> /ExportTo
- Add 202 Accepted response handling
- Add notifyOption to refresh_dataset API call
- Rename format parameter to export_format (avoid shadowing builtin)
- Add PNG support to export formats
- All critical API issues from review addressed
Terminals without extended key reporting (VS Code, Cursor) send
identical events for Enter and Shift+Enter, making it impossible
to insert newlines. Ctrl+J produces a distinct key event in all
terminals.
* feat(tools): add Razorpay payment processing integration
Add Razorpay MCP tool integration for payment processing, invoicing,
and refund management. Implements 6 MCP tools:
- razorpay_list_payments: List recent payments with filters (pagination, date range)
- razorpay_get_payment: Fetch detailed payment information by ID
- razorpay_create_payment_link: Create one-time payment links with shareable URLs
- razorpay_list_invoices: List invoices with status and type filtering
- razorpay_get_invoice: Fetch invoice details including line items
- razorpay_create_refund: Create full or partial refunds for payments
Features:
- Authentication via HTTP Basic Auth (RAZORPAY_API_KEY + RAZORPAY_API_SECRET)
- Credential spec in dedicated razorpay.py (follows repo pattern)
- Comprehensive error handling (401, 403, 404, 400, 429, 500, timeouts)
- Input validation (payment IDs, invoice IDs, amounts, currencies)
- Full test coverage (42 unit tests, 26 integration tests)
Closes#4404
* style: fix ruff I001 import order and W291 in tools
* fix: improve Razorpay credential tracking and validation
- Add razorpay_secret CredentialSpec with credential_group
- Fix amount=0 bug by using 'is not None' checks
- Add regex validation for payment/invoice IDs
* fix: use graceful credential handling instead of raising TypeError
Match codebase convention (calcom, lusha) - return None for non-string
credentials instead of raising TypeError, so the tool returns an error
dict instead of crashing.
---------
Co-authored-by: hundao <alchemy_wimp@hotmail.com>
* feat(calendar): add Google Calendar integration with event management tools and health checks
* fix(calendar): align google_calendar_oauth credential spec with codebase pattern
Correct hyphenation and spelling in the product roadmap: change 'outcome oriented' to 'outcome-oriented' and fix 'Workder' to 'Worker' in the Deployment section.
Implements comprehensive Apify integration for web scraping and automation:
- Added 4 new tools: apify_run_actor, apify_get_dataset, apify_get_run, apify_search_actors
- Credential management for APIFY_API_TOKEN with health check
- Support for synchronous (wait=True) and asynchronous (wait=False) actor execution
- Actor ID validation and comprehensive error handling
- Full test coverage (26 tests passing)
- README with usage examples and documentation
Addresses #4510
The bullet character (•) cannot be displayed properly in PowerShell
on some Windows systems. Use ASCII dash (-) instead for compatibility.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add Zoho CRM MCP integration for lead/contact/account/deal workflows with notes support. Implements 5 MCP tools:
- zoho_crm_search: Search Leads/Contacts/Accounts/Deals by criteria or word with pagination
- zoho_crm_get_record: Fetch a single record by module and ID
- zoho_crm_create_record: Create records with pass-through field payloads
- zoho_crm_update_record: Update records by ID with partial field payloads
- zoho_crm_add_note: Create notes linked to CRM records via Parent_Id mapping
Features:
- Zoho OAuth2 provider added in core credentials (refresh-token flow)
- Zoho auth format: Authorization: Zoho-oauthtoken <token>
- Region/DC-aware routing using accounts domain/region + api_domain usage
- Persisted DC metadata on refresh (api_domain/accounts_domain/location)
- Credential spec and health check registration for zoho_crm
- Tool registration and allowed-tool list updates
- Normalized tool responses with retriable 429 handling
- README with setup, auth modes, usage, and testing instructions
- Comprehensive unit/integration coverage updates for tool, provider, and health checks
Validation:
- Scoped ruff lint/format checks passed
- Targeted test suite passed: 563 passed, 18 skipped
Closes#4418
* feat(tools): add Excel tool for reading/writing .xlsx/.xlsm files
Add excel_tool module with 7 MCP tools:
- excel_read: Read data from Excel files with pagination
- excel_write: Create new Excel files
- excel_append: Append rows to existing Excel files
- excel_info: Get file metadata (sheets, columns, rows)
- excel_sheet_list: List sheet names in a file
- excel_sql: Query Excel with SQL (DuckDB), multi-sheet support
- excel_search: Search values across sheets with match options
Includes 56 tests, openpyxl optional dependency, and documentation.
Fixes#2675
* fix(tools): address excel review feedback and stabilize tests
Implements geocoding, routing, and location intelligence via Google Maps
Platform Web Services APIs for logistics, delivery, and location-based
agent workflows.
Tools: maps_geocode, maps_reverse_geocode, maps_directions,
maps_distance_matrix, maps_place_details, maps_place_search
Includes:
- page_token support for paginated place search results
- GoogleMapsHealthChecker for credential validation
- Comprehensive unit tests (42 tool tests, 30 health check tests)
Closes#3179
* feat(vision): add GCP Vision API integration
* refactor(vision): move GCP Vision credentials to dedicated folder
* fix: clean up credentials imports and updated gitignore
* followed ruff alphabetic order for credentials
Brings in append_data tool, continuous conversation mode, conversation
judge, phase compaction, and prompt composer from the memory-inheritance
feature branch.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Shows clear success rate when some exclusions fail (e.g., "Only 2/3
exclusions added (67%)"). Helps users understand that performance
benefit may be reduced when not all paths are excluded.
- Calculates success rate as percentage
- Shows "X/Y added (Z%)" format for clarity
- Warns that performance benefit may be reduced
- Better visibility of partial failures
Improves user awareness of partial installation issues.
Adds Test-IsDefenderEnabled helper and rechecks Defender status and
paths before adding exclusions. Prevents adding ineffective exclusions
if Defender was disabled during user prompt (race condition).
- New helper: Test-IsDefenderEnabled() for quick boolean check
- Recheck Defender status immediately before Add operation
- Recheck paths to detect if added by another process
- Clear messages if status changed or already added
Fixes race condition where user could disable Defender or another
process could add exclusions during the prompt delay.
Validates that paths are within safe boundaries (project directory or
user AppData) before excluding them from Defender. Prevents accidental
or malicious exclusion of system directories.
- Adds safePrefixes validation (project dir + user AppData only)
- Checks each path against allowed prefixes
- Normalizes paths for consistent comparison
- Warns but processes non-existent paths (they may be created later)
Fixes potential security issue where modified script could exclude
system directories like C:\Windows or C:\Program Files.
- Subclass TextArea as ChatTextArea to intercept Enter key before
the base class swallows it (fixes submission not triggering)
- Remove event.shift access that raises AttributeError on Key events
- Make action_show_sessions directly call _submit_input instead of
just placing text in the widget
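A rough sketch of the interception pattern described above, assuming Textual's `TextArea` widget and its `_on_key` hook (exact hook names and signatures vary between Textual versions):
```python
from textual import events
from textual.message import Message
from textual.widgets import TextArea


class ChatTextArea(TextArea):
    class Submitted(Message):
        def __init__(self, text: str) -> None:
            self.text = text
            super().__init__()

    async def _on_key(self, event: events.Key) -> None:
        if event.key == "enter":
            # Intercept Enter before TextArea inserts a newline and submit the buffer instead.
            event.prevent_default()
            event.stop()
            self.post_message(self.Submitted(self.text))
            return
        # All other keys (including Ctrl+J for a literal newline) keep the default behavior.
        await super()._on_key(event)
```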
- Define repo root at top; lead with quick start (3 steps)
- Add 'What you get' and prerequisites in one place
- Full setup step-by-step; troubleshooting: problem → fix
- Manual MCP config as single section; verification optional
- Plain language, scannable structure, no duplicate sections
- Add scripts/setup-antigravity-mcp.sh to auto-detect repo root and write
~/.gemini/mcp.json with absolute paths (no manual path editing)
- Lead docs with Quick start (3 steps) and note ./ vs / for the script
- README: point to one-command setup; clarify script runs from repo folder
- Add step to run core/setup_mcp.sh first
- Document that IDE often loads ~/.claude/mcp.json, not project config
- Add Option A: copy to ~/.claude/mcp.json with absolute cwd paths
- Note that cwd schema warning in IDE is a false positive
- Renumber setup steps (1–5)
- Add .antigravity/mcp_config.json with agent-builder and tools MCP servers
- Add .antigravity/skills/ with symlinks to .claude/skills/ (5 skills)
- Add docs/antigravity-setup.md with setup and troubleshooting
- Update README.md with Antigravity IDE support section
- Update DEVELOPER.md and docs/contributing-lint-setup.md with Antigravity refs
Mirrors Cursor integration for consistent multi-IDE support.
Fixes#4445
Moved repeated inline imports (logging, json, re) to module-level:
- Eliminates import overhead on every method call
- Follows PEP 8 conventions
- Added module-level logger instance
- re is used at line 259 (re.search)
Changes:
- 4 lines added (imports + logger)
- 13 lines removed (inline imports)
- No functional changes
Replace single-line Input widget with TextArea in chat REPL so
Shift+Enter inserts newlines and multiline paste works correctly.
Add Windows clipboard support (clip.exe) and xsel fallback for Linux.
Implements 5 tools as proposed in #3224:
- scholar_search: Search Google Scholar for academic papers
- scholar_get_citations: Get citation formats (MLA, APA, Chicago, etc.)
- scholar_get_author: Author profiles with h-index, i10-index, metrics
- patents_search: Search Google Patents with filters
- patents_get_details: Detailed patent information by publication number
Follows existing tool pattern (web_search_tool, hubspot_tool, slack_tool):
- register_tools(mcp, credentials) with @mcp.tool() decorators
- _SerpAPIClient internal class for HTTP calls via httpx
- Credential fallback: CredentialStoreAdapter -> SERPAPI_API_KEY env var
- Error handling: always returns dicts, never raises
25 unit tests + live integration tests verified.
697/697 full test suite passing.
Fixes#3224
Changes logger.info() with debug prefix to logger.debug() for
session state resume information. This prevents debug-level
information from appearing in production logs at INFO level.
- Removes redundant '🔍 Debug:' prefix
- Uses appropriate debug logging level
- Follows Python logging best practices
- Improves production log clarity
Addresses #4377
- Fix Getting Started steps 6/7 duplicated; renumber to 8 and 9
- Add command to run tools package tests (cd tools && uv run pytest)
Co-authored-by: Cursor <cursoragent@cursor.com>
- Keep both APOLLO_CREDENTIALS and AIRTABLE_CREDENTIALS
- Keep both apollo_tool and airtable_tool imports (alphabetical)
Co-authored-by: Cursor <cursoragent@cursor.com>
- Keep both APOLLO_CREDENTIALS and CALENDLY_CREDENTIALS
- Keep both apollo_tool and calendly_tool imports (alphabetical)
Co-authored-by: Cursor <cursoragent@cursor.com>
- Add 429 handling with retry_after from Retry-After header
- Add _request_with_retry (2 retries) for all API calls
- Update tests to use httpx.request
Co-authored-by: Cursor <cursoragent@cursor.com>
- Add 429 handling with retry_after from Retry-After header
- Add _request_with_retry (2 retries) for all API calls
- Validate get_availability date range <= 7 days
- Update tests to use httpx.request
Co-authored-by: Cursor <cursoragent@cursor.com>
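A minimal sketch of the retry helper described in these changes, assuming httpx and a simple blocking sleep (names are illustrative):
```python
import time

import httpx


def _request_with_retry(method: str, url: str, *, retries: int = 2, **kwargs) -> httpx.Response:
    response = httpx.request(method, url, **kwargs)
    for _ in range(retries):
        if response.status_code != 429:
            break
        # Honor the server's Retry-After header, defaulting to 1 second if absent.
        retry_after = float(response.headers.get("Retry-After", "1"))
        time.sleep(retry_after)
        response = httpx.request(method, url, **kwargs)
    return response
```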
- Add comprehensive resumable session functionality
- Immediate pause with Ctrl+Z and /pause command
- Auto-save state on quit
- Session management with /resume and /sessions commands
- Full memory and conversation history restoration
- See CHANGELOG.md for complete list of changes
Reorganized imports in __init__.py for clarity and consistency. Cleaned up formatting and comments in wikipedia_tool.py. Enhanced test_wikipedia_tool.py by improving patching targets, clarifying comments, and refining test structure for better maintainability.
Reorders imports in tools/__init__.py for clarity and groups web and PDF tools together. Updates Wikipedia tool tests to patch httpx.get using the correct import path, ensuring mocks work as intended. Removes unnecessary print statement in Wikipedia tool error handling.
Introduces a new 'search_wikipedia' tool for searching Wikipedia and retrieving article summaries using the public Wikipedia REST API. Updates documentation and tool registration, and adds unit tests for the new tool.
Implements Reddit API integration for community management and content monitoring.
Features:
- Search & Monitoring: search posts/comments, get subreddit feeds (new/hot), get posts/comments (6 tools)
- Content Creation: submit posts, reply, edit, delete comments (5 tools)
- User Engagement: get profiles, upvote, downvote, save posts (4 tools)
- Moderation: remove/approve posts, ban users (3 tools)
Implementation:
- OAuth 2.0 authentication via REDDIT_CREDENTIALS
- PRAW library for Reddit API integration
- Comprehensive error handling and validation
- Full test coverage (25 tests passing)
Resolves#3595
Add test_x_credentials_share_credential_group to verify all X credentials
share the 'x' credential group. Update test_credential_group_default_empty
to account for X credentials alongside existing Google exceptions.
- Added `x_send_dm` tool using v2 endpoint (`POST /dm_conversations/with/:id/messages`) for reliable 1:1 messaging.
- Fixed 403 Forbidden payload validation errors by simplifying DM payload structure.
- Enhanced `_handle_response` to verify `x_tool.py` returns raw API error details for 403/400 responses, aiding in permission debugging.
- Updated `demo_x_tools.py` to support standard `.env` variable names (e.g., `X_API_KEY`) and added user lookup for DM testing.
- Added unit tests covering new DM functionality and payload verification in `test_x_tool.py`.
- Audited credential handling: Read-only tools (Search/Mentions) correctly use Bearer Token, while Write tools (Post/Reply/Delete/DM) enforce OAuth 1.0a User Context.
Verified with live API tests (see PR description for logs).
Fixes#2692
Added steps to configure the upstream remote and sync the main branch
before creating a feature branch. This helps contributors avoid starting
from stale code and reduces merge conflicts.
Covers:
- ON_FAILURE edge followed when node fails after max retries
- Original termination behavior preserved when no ON_FAILURE edge exists
- ON_FAILURE edge not followed on success (only ON_SUCCESS fires)
- ON_FAILURE routing with max_retries=0 (no retries)
- Failure handler appears in execution path and node_visit_counts
- Implement 5 core functions for data warehouse querying
- Add boto3 integration with Redshift Data API
- Security: Read-only SELECT queries by default
- Full credential store support
- 26/26 tests passing (100% coverage)
- Complete documentation with examples
Resolve conflict in tools/mcp_server.py: take main's
CredentialStoreAdapter.default() which encapsulates the same
CompositeStorage logic our branch had inline.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix max_node_visits blocking executor retries: the visit count was
incremented on every loop iteration including retries, causing nodes
with max_node_visits=1 (default) to be skipped on retry. Added
_is_retry flag to distinguish retries from new visits via edge
traversal.
- Fix 20 UP042 lint errors: replace (str, Enum) with StrEnum across
14 files. Python 3.11+ StrEnum is preferred and enforced by ruff.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
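For the StrEnum part of this change, the mechanical pattern looks like the following (enum name is illustrative):
```python
from enum import StrEnum


class NodeState(StrEnum):  # previously: class NodeState(str, Enum)
    PENDING = "pending"
    RUNNING = "running"
    FAILED = "failed"
```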
* fix(mcp): handle missing exports path in test generation tools
* fix(mcp): centralize agent path validation across test tools
* fix: remove duplicate if blocks and improve error hint message
---------
Co-authored-by: hundao <alchemy_wimp@hotmail.com>
prompt_choice used return codes to pass selections. Combined with set -e,
non-zero returns (options 2-6) caused immediate script exit.
Fix: Use global variable PROMPT_CHOICE instead of return codes.
Addresses all blockers and suggestions from code review:
**Blockers fixed:**
1. Register tools in tools/__init__.py - Added import, registration call,
and all 13 tool names to return list
2. Add credential spec - Created GitHub entry in credentials/integrations.py
with env_var, tools list, help URL, and health check config
3. Move tests to correct location - Relocated from
tools/src/.../github_tool/tests/ to tools/tests/tools/test_github_tool.py
4. Removed .claude/settings.local.json from PR
**Security improvements:**
1. URL parameter sanitization - Added _sanitize_path_param() to reject
path traversal attempts (/ or ..) in owner, repo, branch, username params
2. Error message sanitization - Added _sanitize_error_message() to prevent
token leaks from httpx.RequestError exceptions
All 38 tests passing.
Add optional cc and bcc parameters to send_email and
send_budget_alert_email. Empty strings and whitespace-only values are
filtered out via _normalize_recipients to prevent invalid payloads.
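A minimal sketch of the normalization described above (the real helper may differ in details):
```python
def _normalize_recipients(recipients: list[str] | None) -> list[str]:
    # Drop empty strings and whitespace-only entries so optional cc/bcc values
    # never end up in the provider payload.
    if not recipients:
        return []
    return [r.strip() for r in recipients if r and r.strip()]
```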
Integrate a mail service to enable email notifications for budget alerts.
Closes#7.
New tools:
- send_email: general-purpose email sending with multi-provider support
- send_budget_alert_email: formatted budget alert notifications with
severity levels (INFO/WARNING/CRITICAL/EXCEEDED)
Architecture:
- Multi-provider pattern (matching web_search_tool), Resend as primary
- from_email resolved via explicit param or EMAIL_FROM env var
- Credential integration via CredentialManager with env var fallback
Also fixes: web_scrape_tool test mock missing content-type header
- Fix ScreenStackError crash by moving runtime init inside async context
- Implement selectable logging with TextArea widget
- Create interactive ChatREPL for agent execution
- Optimize 3-pane layout: logs/graph on left (60%), chat on right (40%)
- Add thread-safe event handling with call_from_thread
- Add TUI selection guide documentation
All features tested and working.
The TUI feature is fully functional with a minimal, stable interface:
- Header with title
- Central display area
- Footer with keybindings
This provides a foundation for future enhancements. Custom widgets
(LogPane, GraphOverview, ChatRepl) are available in the codebase
and can be integrated once Textual rendering issues on Windows are
resolved.
The --tui flag successfully launches the TUI dashboard for any agent:
hive run <agent_path> --tui
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Defer widget creation from __init__ to compose() and store references
as instance variables to prevent garbage collection and initialization
order issues. This resolves ScreenStackError during TUI startup.
Changes:
- Move LogPane, GraphOverview, ChatRepl creation to compose()
- Store widgets as instance variables (self.log_pane, etc)
- Restore Horizontal/Vertical container layout
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
The compose() method was missing the proper container structure
for the layout, which caused initialization to fail. Restored the
Horizontal/Vertical container layout with proper pane structure.
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Implement a Textual-based terminal UI for the hive framework that displays:
- Agent execution status and progress
- Log output in real-time
- Graph visualization of agent execution flow
- Interactive REPL for user input/feedback
Changes:
- Add core/framework/tui/ module with AdenTUI app and custom widgets
- Add LogPane widget for streaming log output
- Add GraphOverview widget for execution graph visualization
- Add ChatRepl widget for interactive user input
- Add TUILogHandler for capturing framework logs to TUI
- Update cli.py to support --tui flag for launching dashboard
- Update runner.py to enable AgentRuntime when TUI is active
- Fix missing Textual container imports (Horizontal, Vertical, Container)
- Fix race conditions in async TUI initialization
- Fix threading issues in app event handling
The TUI is launched via: hive run <agent_path> --tui
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Guard against empty parent_dir when path has no directory component (e.g., 'data.csv'). Prevents FileNotFoundError from os.makedirs(''). Adds test coverage for root-level file writes.
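The guard described above amounts to the following pattern (sketch with an illustrative helper name):
```python
import os


def write_text_file(path: str, data: str) -> None:
    parent_dir = os.path.dirname(path)
    # For root-level names like 'data.csv', dirname() returns '' and
    # os.makedirs('') raises FileNotFoundError, so only create when non-empty.
    if parent_dir:
        os.makedirs(parent_dir, exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        f.write(data)
```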
- Fix race condition: cache now updates only after successful write
- Fix cache invalidation: summary cache invalidated on save_run()
- Add 4 tests to verify the fixes
* Fixed Python and pip version mismatch with more robust interpreter detection (#476)
- Detect the Python version across all available interpreters (python3, python, and py -3), structured so new interpreters are easy to add.
- Ensure pip is resolved for the selected Python interpreter.
- Generalized variables such as PYTHON_VERSION for flexibility.
- Split PYTHON_VERSION into major and minor components for more robust version checks.
- Added clear documentation throughout the script.
Related to issue #476
* fix(setup): Code fixes raised during review by @Hundao
- PYTHON_CMD is now initialized to an empty value, fixing the reported bug.
- Renamed the generalized PYTHON_VERSION to REQUIRED_PYTHON_VERSION to avoid a name collision.
- Quoted "${POSSIBLE_PYTHONS[@]}" so entries like "py -3" work correctly.
Pending:
eval related issues pending.
* fix(setup): Code fixes raised during review by @Hundao
- eval removed altogether.
- Replaced py -3 with py in POSSIBLE_PYTHONS; it is swapped back to py -3 after interpreter selection.
* fix(setup): Code fixes raised during review by @bryanadenhq
- Implemented an array and refactored the script; PYTHON_CMD is now used consistently throughout.
- Removed redundant code and adjusted the flow slightly for readability (see screenshots).
- Standardized on 2>&1 for redirection, fixing the earlier inconsistent usage.
Resolved conflict in scripts/setup-python.sh by keeping upstream's
improved formatting with color codes and ${PYTHON_CMD} variable.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Restore MCP server configurations in .mcp.json with updated paths
for separate virtual environments (core/.venv and tools/.venv)
- Align Python version: change .python-version from 3.13 to 3.11
to match pyproject.toml target version
- Remove AGENTS.md as suggested (quickstart is sufficient)
- Document cross-package imports and separate venv architecture
in ENVIRONMENT_SETUP.md
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Resolved conflict in tools/pyproject.toml by keeping the expanded format
with sql dependency from main.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Skip closing an issue as duplicate of another that is already closed
(avoids circular closure when bot and human close in opposite order).
- Skip when duplicate target is self (same issue number).
- Extract testable helpers: isDupeComment, isDupeCommentOldEnough,
authorDisagreedWithDupe, getLastDupeComment, decideAutoClose.
- Add 23 unit tests (Bun) and run them in CI before auto-close step.
- Add scripts/AUTO_CLOSE_DUPLICATES_CROSS_VERIFY.md for impact summary.
Co-authored-by: Cursor <cursoragent@cursor.com>
Track retries, failed nodes, and execution quality (clean/degraded/failed) to expose retry metrics in ExecutionResult. This allows dashboards and monitoring to distinguish between clean success and degraded success with retries.
The condition_map in _load_from_dict was missing the llm_decide
mapping, causing goal-aware routing to break for exported agents.
When agents with LLM_DECIDE edges were exported and re-imported,
the edge conditions were silently defaulted to ON_SUCCESS instead
of preserving the LLM_DECIDE routing logic.
This fix ensures that agents exported with goal-aware routing edges
maintain their correct behavior after re-import.
- Add Content-Type validation to skip non-HTML content
- Simplify max_length validation using max() and min()
- Improve title extraction with cleaner code
The MCP integration example referenced WorkflowBuilder which doesn't exist.
Changed to GraphBuilder which is the correct class name.
Fixes import error when running: python core/examples/mcp_integration_example.py
* config: add .gitattributes for cross-platform line ending consistency
- Add comprehensive .gitattributes to normalize line endings
- Ensure shell scripts always use LF (required for Unix execution)
- Mark binary files explicitly to prevent corruption
- Eliminate CRLF warnings for Windows contributors
- Follow cross-platform best practices
This fixes persistent 'LF will be replaced by CRLF' warnings that
confuse Windows contributors during normal git operations.
Fixes#950
* fix: add trailing newline at end of file
Per review feedback from @Hundao
* feat(tools): add Google Custom Search as alternative to Brave Search
Adds google_search tool using Google Custom Search API as an alternative
to the existing web_search tool (Brave Search).
Changes:
- Add google_search_tool with full implementation
- Register Google credentials (GOOGLE_API_KEY, GOOGLE_CSE_ID)
- Register tool in tools/__init__.py
- Add README with setup instructions
Closes#793
* test(tools): add unit tests for google_search tool
Adds 7 tests mirroring web_search_tool test patterns:
- Missing API key error handling
- Missing CSE ID error handling
- Empty query validation
- Long query validation
- num_results clamping
- Default parameters
- Custom language/country parameters
All tests pass.
* refactor(tools): add multi-provider support to web_search tool
BREAKING CHANGE: None - backward compatible. Brave remains default.
- Add Google Custom Search as alternative provider in web_search
- Add 'provider' parameter: 'auto' (default), 'google', 'brave'
- Auto mode tries Brave first for backward compatibility
- Remove separate google_search_tool (consolidated into web_search)
- Update tests to cover multi-provider functionality (13 tests)
- Update README documentation
Users with BRAVE_SEARCH_API_KEY: No changes needed
Users with GOOGLE_API_KEY + GOOGLE_CSE_ID: Can use provider='google'
Users with both: Brave preferred by default, use provider='google' to force
Closes#793
* feat(tools): fixed readme
---------
Co-authored-by: Mustafa Abdat <abdamus@hilti.com>
The repository does not include docker-compose files, but multiple docs
claimed "Docker Compose deployment out of the box." This was left over
from a previous release.
Changes:
- README.md: Update FAQ to describe Python package deployment
- README.ko.md: Same update for Korean translation
- docs/configuration.md: Remove "Docker Compose Integration" section
and docker compose commands
- docs/quizzes: Update tasks that referenced docker-compose.yml
- .github/CODEOWNERS: Remove docker-compose*.yml entry
- scripts/setup.sh: Remove docker-compose.override.yml copy step
Fixes#923
Centralized _get_api_key in prompts.py to support OpenAI, Cerebras, and Groq via environment variables while maintaining Anthropic support through CredentialManager.
Previously, when exports/ was missing or empty, the bash glob
`exports/*/` would not match anything and the loop would silently
do nothing. The job would pass without actually validating anything,
which was misleading.
Now the job:
- Explicitly checks if exports/ directory exists
- Uses nullglob to handle empty directories properly
- Logs clear messages when skipping validation
- Reports the number of agents validated when successful
Fixes#887
The "Available Tools" table listed `execute_command` but the actual
registered name is `execute_command_tool`. This aligns the docs with
the runtime name in __init__.py and the tool's own README.
Fixes#901
Wrap json.loads() calls in try-catch blocks for add_node() and update_node()
functions to match the error handling pattern used elsewhere in the file.
Fixes#907
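A minimal sketch of the pattern, with an illustrative function name and error shape:
```python
import json


def add_node(node_json: str) -> dict:
    try:
        node = json.loads(node_json)
    except json.JSONDecodeError as exc:
        # Match the structured-error convention used elsewhere instead of raising.
        return {"success": False, "error": f"Invalid JSON for node definition: {exc}"}
    return {"success": True, "node": node}
```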
- Add output_model field to NodeSpec for specifying Pydantic model
- Add max_validation_retries field (default: 2) for retry configuration
- Add validation_errors field to NodeResult for error tracking
- Implement validate_with_pydantic() in OutputValidator
- Implement format_validation_feedback() for LLM retry prompts
- Auto-generate JSON schema from Pydantic model for response_format
- Add retry loop that feeds validation errors back to LLM
- Add 28 comprehensive tests covering all new functionality
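A compressed sketch of the validate-and-retry loop described above, assuming Pydantic v2 and illustrative names for the framework pieces:
```python
from collections.abc import Callable

from pydantic import BaseModel, ValidationError


def validate_with_pydantic(raw: str, model: type[BaseModel]) -> tuple[BaseModel | None, list[str]]:
    try:
        return model.model_validate_json(raw), []
    except ValidationError as exc:
        return None, [f"{'.'.join(map(str, e['loc']))}: {e['msg']}" for e in exc.errors()]


def run_with_validation(call_llm: Callable[[str], str], model: type[BaseModel],
                        prompt: str, max_validation_retries: int = 2) -> BaseModel:
    errors: list[str] = []
    feedback = ""
    for _ in range(max_validation_retries + 1):
        raw = call_llm(prompt + feedback)
        parsed, errors = validate_with_pydantic(raw, model)
        if parsed is not None:
            return parsed
        # Feed the validation errors back so the next attempt can correct them.
        feedback = "\n\nYour previous output failed validation:\n- " + "\n- ".join(errors)
    raise ValueError(f"Output failed validation after retries: {errors}")
```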
Allow LLMJudge to accept any LLMProvider instance instead of being
hardcoded to use Anthropic. This aligns with the framework's pluggable
LLM design and enables users to:
- Use the same LLM provider across their agent and tests
- Run tests with cheaper or local models
- Avoid requiring an Anthropic API key for testing
Backward compatible: existing code using LLMJudge() without arguments
continues to work by falling back to Anthropic.
Closes#477
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
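The backward-compatible constructor amounts to roughly the following (import path and class names are assumed from the description, not copied from the source):
```python
from framework.llm import AnthropicProvider, LLMProvider  # assumed framework imports


class LLMJudge:
    def __init__(self, provider: LLMProvider | None = None) -> None:
        # No provider supplied: keep the old behavior and fall back to Anthropic,
        # so existing LLMJudge() call sites keep working unchanged.
        self.provider = provider if provider is not None else AnthropicProvider()
```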
Fixes#755
Problem:
The stop() method had a critical race condition where _flush_pending() and
_batch_task competed for queue items, causing:
- Data loss during shutdown
- Queue items processed twice or lost
- Batch writer cancelled mid-write
Root Cause:
The method called _flush_pending() while _batch_task was still running.
Both operations drained the same queue simultaneously, leading to conflicts.
Solution:
Reordered shutdown sequence to:
1. Cancel batch task first
2. Wait for task completion (handles CancelledError with final flush)
3. Then flush any remaining items
This eliminates queue competition because:
- _batch_writer() flushes its current batch when cancelled
- After cancellation completes, _flush_pending() safely processes remaining items
- No race condition, no data loss
Changes:
- Moved batch task cancellation before _flush_pending()
- Ensures clean shutdown sequence
- Prevents queue drain conflicts
Testing:
- All 209 tests pass
- No duplicate flushes
- Clean shutdown guaranteed
Impact:
- Prevents data loss during graceful shutdown
- Eliminates race condition between flush operations
- Ensures all writes complete before stop returns
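The reordered shutdown sequence, sketched with the method names used in the description (details of the real class are simplified):
```python
import asyncio


async def stop(self) -> None:
    # 1. Cancel the batch writer first so it stops competing for queue items.
    self._batch_task.cancel()
    try:
        # 2. Wait for it to finish; _batch_writer flushes its current batch on CancelledError.
        await self._batch_task
    except asyncio.CancelledError:
        pass
    # 3. Only now drain whatever is still left in the queue.
    await self._flush_pending()
```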
- Treat run_coroutine_threadsafe race (RuntimeError) as expected: mark cleanup_attempted and log debug
- Mark cleanup_attempted on timeout/errors to avoid misleading fallback
- Add warning when loop thread fails to terminate within join timeout
- Make CancelledError best-effort (log, no re-raise) for session and stdio cleanup
Changes based on Copilot AI review (2 issues):
1. Thread join timeout was shorter than cleanup timeout (Issue #1):
- Changed _THREAD_JOIN_TIMEOUT from 5 to 12 seconds
- Must be >= cleanup timeout (10s) plus buffer for loop.stop()
- Prevents thread abandonment during active cleanup
2. Added detailed comment for redundant None assignments (Issue #2):
- Explained why we set _session/_stdio_context to None even if
_cleanup_stdio_async() already did it
- Documents the safety cases: timeout, failure, skip, cancellation
- Makes code intent clear for future maintainers
Changes based on Copilot AI review (3 issues):
1. Increased thread join timeout (Issue #1):
- Changed from 2 to 5 seconds
- Made proportional to cleanup timeout
- Defined as class constant _THREAD_JOIN_TIMEOUT
2. Handle asyncio.CancelledError explicitly (Issue #2):
- Added separate except clause for CancelledError
- Logs specific warning for cancelled cleanup
- Re-raises CancelledError as per asyncio best practices
- Added for both session and stdio_context cleanup
3. Increased cleanup timeout to match connection timeout (Issue #3):
- Changed from 5 to 10 seconds (matches _connect_stdio timeout)
- Defined as class constant _CLEANUP_TIMEOUT
- Prevents incomplete cleanup with slow MCP servers
Per Copilot AI review: distinguish timeout scenarios from actual
cleanup failures by catching TimeoutError separately. This helps
with debugging by providing clearer error messages.
Changes based on Copilot AI review (5 issues):
1. Simplified _cleanup_stdio_async():
- Used try/finally pattern for cleaner reference clearing
- References cleared in finally block (always executed)
2. Removed deprecated asyncio.get_event_loop():
- Removed complex temp loop pattern entirely
- Simplified fallback to just log warning and clear refs
3. Simplified fallback path (Issue #4):
- When loop exists but not running, resources are in undefined state
- Complex event loop manipulation removed
- Just log warning and proceed with reference clearing
- OS will reclaim resources on process exit
4. Handled race condition (Issue #5):
- Added comment documenting the inherent race condition
- Added try/except around loop.call_soon_threadsafe()
- Track cleanup_attempted flag for proper fallback handling
5. Added explanatory comments:
- Documented why redundant None assignments exist (safety)
- Explained race condition handling approach
Note: Test coverage suggestion (#3) acknowledged but deferred
to separate PR to keep this fix focused.
Changes based on Copilot AI review:
1. Fixed fallback path using temp event loop pattern:
- asyncio.run() may fail if there's already an event loop in current thread
- Now uses new_event_loop() + set_event_loop() + run_until_complete() pattern
- Preserves and restores original loop if one existed
2. Set references to None immediately after __aexit__:
- self._session = None after closing session
- self._stdio_context = None after closing context
- Prevents window where closed objects are still referenced
- Also clears on error to prevent reuse of broken objects
3. Added documentation for critical cleanup order:
- Session must close BEFORE stdio_context
- Session depends on streams provided by stdio_context
- Mirrors initialization order in _connect_stdio()
- Added warning comment to prevent future breakage
When self._loop exists but is not running or is closed (e.g., crashed,
stopped externally, or closed), the code now falls through to the
standard approach that properly handles both sync and async contexts.
Key changes:
- Added is_running() AND is_closed() checks before using run_coroutine_threadsafe()
- Removed separate else branch with asyncio.run() that didn't handle async context
- Now falls through to standard approach which:
- Detects if already in async context (get_running_loop)
- Uses separate thread with new event loop if in async context
- Uses asyncio.run() only when no event loop is running
Edge cases covered:
1. self._loop is None (sync context) -> uses asyncio.run()
2. self._loop is None (async context) -> uses thread with new loop
3. self._loop running normally -> uses run_coroutine_threadsafe()
4. self._loop stopped (sync context) -> falls through, uses asyncio.run()
5. self._loop stopped (async context) -> falls through, uses thread
6. self._loop closed (sync context) -> falls through, uses asyncio.run()
7. self._loop closed (async context) -> falls through, uses thread
Fixes#625
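A condensed sketch of the dispatch logic these edge cases describe (simplified; the real implementation also tracks a cleanup_attempted flag and additional timeouts):
```python
import asyncio
import threading


def _run_cleanup(self) -> None:
    loop = self._loop
    if loop is not None and loop.is_running() and not loop.is_closed():
        # Healthy background loop: schedule cleanup there and wait for it.
        asyncio.run_coroutine_threadsafe(self._cleanup_stdio_async(), loop).result(timeout=10)
        return
    # Fallback: loop is missing, stopped, or closed.
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # Sync context: safe to spin up a fresh loop.
        asyncio.run(self._cleanup_stdio_async())
    else:
        # Already inside an async context: run cleanup on a separate thread with its own loop.
        worker = threading.Thread(target=lambda: asyncio.run(self._cleanup_stdio_async()))
        worker.start()
        worker.join(timeout=10)
```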
Calculate available_slots from running execution count instead of
accessing the private _value attribute of asyncio.Semaphore.
Private attributes may change between Python versions and are not
part of the public API.
Fixes#609
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
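The replacement described above boils down to deriving capacity from state the executor already tracks (names are illustrative):
```python
def available_slots(max_concurrent: int, running_execution_count: int) -> int:
    # Avoid asyncio.Semaphore._value, a private attribute that may change between
    # Python versions; compute free capacity from the public running-execution count.
    return max(0, max_concurrent - running_execution_count)
```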
Replace direct dictionary access with .get() and explicit ValueError
to prevent KeyError when entry_point_id is not found in _streams dict.
Fixes#589
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
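A minimal sketch of the lookup change (variable names follow the description):
```python
def get_stream(streams: dict, entry_point_id: str):
    stream = streams.get(entry_point_id)
    if stream is None:
        # Explicit, descriptive error instead of a bare KeyError from streams[entry_point_id].
        raise ValueError(f"No stream registered for entry point {entry_point_id!r}")
    return stream
```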
Fixes#590
Previously, the code assumed `session_state["memory"]` was always a dict
when the key existed. If it was `None` or another non-dict type, this
would raise a TypeError during iteration.
Now we validate the type before iterating and log a warning if the
memory data is not a dict, preventing runtime crashes when resuming
from malformed session states.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
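The type guard amounts to roughly the following (sketch; names are illustrative):
```python
import logging

logger = logging.getLogger(__name__)


def restore_memory(session_state: dict) -> dict:
    memory = session_state.get("memory")
    if not isinstance(memory, dict):
        # Malformed session state (None or a non-dict value): warn and skip instead of crashing.
        logger.warning("session_state['memory'] is %s, expected dict; skipping restore",
                       type(memory).__name__)
        return {}
    return dict(memory)
```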
Fixes#599
The `callable` keyword in Python is a builtin function to check if something
is callable, NOT a type annotation. For type hints, we need `Callable` from
the typing module.
Changed:
- `tool_executor: callable` → `tool_executor: Callable[[ToolUse], ToolResult]`
Files updated:
- core/framework/llm/provider.py
- core/framework/llm/anthropic.py
- core/framework/llm/litellm.py
This fixes mypy/pyright type checking errors like:
"Variable annotation syntax is for types; callable is a function"
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add 7 test methods to TestWebScrapeToolLinkConversion class to validate
the new URL conversion feature:
- test_relative_links_converted_to_absolute: ../page and page.html -> absolute
- test_root_relative_links_converted: /about -> absolute
- test_absolute_links_unchanged: https://external.com remains unchanged
- test_links_after_redirects: Uses final URL, not requested URL
- test_fragment_links_preserved: #section1 anchors work correctly
- test_query_parameters_preserved: ?id=123&sort=date retained
- test_empty_href_skipped: Empty text links filtered out
All tests use unittest.mock for HTTP response mocking to avoid live network calls.
Tests comprehensively validate the urljoin() implementation that converts all
relative URLs to absolute URLs based on the final response URL.
- Add urljoin import from urllib.parse
- Convert all extracted links to absolute URLs based on page base_url
- Use response.url as base_url to handle redirects correctly
- Fixes issue where relative links like '../page' were unusable
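The conversion described above is essentially a urljoin pass over the extracted hrefs, using the final response URL as the base (sketch):
```python
from urllib.parse import urljoin


def to_absolute_links(hrefs: list[str], final_url: str) -> list[str]:
    # final_url should be the post-redirect response URL so relative paths like
    # '../page', '/about', and 'page.html' resolve correctly; absolute URLs pass through.
    return [urljoin(final_url, href) for href in hrefs if href]
```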
- Created MockLLMProvider class that generates placeholder JSON responses
- Updated AgentRunner._setup() to use MockLLMProvider when mock_mode=True
- Added MockLLMProvider to llm module exports
- Fixes issue where agents failed with 'LLM not available' in mock mode
The MockLLMProvider extracts expected output keys from system prompts
and generates mock JSON responses for structural validation without
making real LLM API calls. This enables:
- Testing agent structure without API keys
- Fast iteration on agent graphs
- CI/CD testing without credentials
- Zero-cost structural validation
Tested with simple agent - all nodes execute successfully in mock mode.
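A toy sketch of the idea (the real MockLLMProvider follows the framework's provider interface; this only illustrates key extraction and placeholder generation):
```python
import json
import re


class MockLLMProvider:
    """Returns placeholder JSON instead of calling a real LLM."""

    def complete(self, system_prompt: str, user_message: str) -> str:
        # Guess the expected output keys from quoted "key": patterns in the system prompt.
        keys = re.findall(r'"(\w+)"\s*:', system_prompt)
        payload = {key: f"mock_{key}" for key in keys} or {"result": "mock output"}
        return json.dumps(payload)
```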
Added else branch to handle edge case where loop exists but is not running. Uses asyncio.run() as fallback to ensure cleanup happens even if the loop was stopped externally or due to an error.
Added _cleanup_stdio_async() method to properly call __aexit__() on session and stdio_context before stopping the event loop.
This prevents resource leaks, zombie processes, and unclosed file handles.
Add tests for file viewing functionality with max_size truncation, negative max_size, custom encoding, and invalid encoding scenarios to ensure proper error handling and behavior.
Previously, the hallucination detection in SharedMemory.write() and
OutputValidator.validate_no_hallucination() only checked the first 500
characters for code indicators. This allowed hallucinated code to bypass
detection by prefixing with innocuous text.
Changes:
- Add _contains_code_indicators() method to SharedMemory and OutputValidator
- Check entire string for strings under 10KB
- Use strategic sampling (start, 25%, 50%, 75%, end) for longer strings
- Expand code indicators to include JavaScript, SQL, and HTML/script patterns
- Add comprehensive test suite with 19 test cases
Fixes#443
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
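The sampling strategy described above can be sketched like this (window size and helper name are illustrative):
```python
def _scan_windows(text: str, window: int = 2_000) -> list[str]:
    # Strings under 10KB are scanned in full; longer strings are sampled at the
    # start, 25%, 50%, 75%, and end so a long innocuous prefix can no longer
    # hide code indicators from detection.
    if len(text) <= 10_000:
        return [text]
    n = len(text)
    offsets = (0, n // 4, n // 2, 3 * n // 4, max(0, n - window))
    return [text[i:i + window] for i in offsets]
```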
- Remove hardcoded max_retries_per_node = 3
- Use node_spec.max_retries for all retry logic
- Add comprehensive test suite (6 test cases)
- Allows per-node retry configuration as intended
Fixes#363
Add support for custom file encoding and size limits when viewing files. The max_size parameter prevents loading excessively large files by truncating content and adding a warning message when the limit is exceeded. Also includes validation for negative max_size values and checks if path is a file.
The cmd_list function stored node count as 'nodes' but tried to
access it as 'steps', causing a KeyError when listing agents.
Changed agent['steps'] to agent['nodes'] to match the dict key.
Fixes#202
- Update docs/getting-started.md to explain exports/ is created by users
- Remove references to non-existent support_ticket_agent example
- Update DEVELOPER.md with correct agent creation instructions
Update AnthropicProvider.complete to accept response_format and forward it to LiteLLMProvider.
Added unit test in test_litellm_provider.py to verify parameter forwarding.
Adds the `.venv` directory to the `.gitignore` file to prevent accidental commits.
Also, enhances the `scripts/setup-python.sh` script to include error handling for the `pip install` command, providing a more informative message if the upgrade fails.
Replace AnthropicProvider with LiteLLMProvider in AgentOrchestrator to
enable support for multiple LLM providers (OpenAI, Anthropic, Gemini, etc.).
- LiteLLM auto-detects provider from model name
- LiteLLM auto-detects appropriate API key from environment
- Removes restrictive ANTHROPIC_API_KEY check
- Matches pattern used in AgentRunner
Closes#47
The Dockerfile.dev files lacked the 'production' stage alias that
docker-compose.yml expects, causing build failures. Added 'AS production'
to enable proper dev builds with hot reload.
Fixes#26
- Add jest.config.js with TypeScript support (ts-jest)
- Add tsconfig.test.json for test compilation
- Add supertest for HTTP endpoint testing
- Create test utilities for database mocking (PostgreSQL, MongoDB)
- Add example health endpoint test as template
Closes#13
```bash
gh issue close <number> --repo adenhq/hive --reason "not planned"
```
Report success with link to the issue.
## Important Guidelines
1. **Never close valid issues** - If there's any merit to the claim, don't close it
2. **Be respectful** - The reporter took time to file the issue
3. **Be technical** - Provide code references and evidence
4. **Be educational** - Help them understand, don't just dismiss
5. **Check twice** - Make sure you understand the code before declaring something invalid
6. **Consider edge cases** - Maybe their environment reveals a real issue
## Example Critiques
### Security Misunderstanding
> "The claim that secrets are exposed in plaintext misunderstands the encryption architecture. While `SecretStr` is used for logging protection, actual encryption is provided by Fernet (AES-128-CBC) at the storage layer. The code path is: serialize → encrypt → write. Only encrypted bytes touch disk."
### Impossible Request
> "The requested feature would require [X] which violates [fundamental constraint]. This is not a limitation of our implementation but a fundamental property of [technology/protocol]."
### Already Handled
> "This scenario is already handled by [code reference]. The reporter may be using an older version or misconfigured environment."
Use mcp__github__get_issue to get the full details of issue #${{ github.event.issue.number }}
### 2. Check for duplicates
Search for similar existing issues using mcp__github__search_issues with relevant keywords from the issue title and body.
Criteria for duplicates:
- Same bug or error being reported
- Same feature request (even if worded differently)
- Same question being asked
- Issues describing the same root problem
If you find a duplicate:
- Add a comment using EXACTLY this format (required for auto-close to work):
"Found a possible duplicate of #<issue_number>: <brief explanation of why it's a duplicate>"
- Do NOT apply the "duplicate" label yet (the auto-close script will add it after 12 hours if no objections)
- Suggest the user react with a thumbs-down if they disagree
### 3. Check for Low-Quality / AI Spam
Analyze the issue quality. We are receiving many low-effort, AI-generated spam issues.
Flag the issue as INVALID if it matches these criteria:
- **Vague/Generic**: Title is "Fix bug" or "Error" without specific context.
- **Hallucinated**: Refers to files or features that do not exist in this repo.
- **Template Filler**: Body contains "Insert description here" or unrelated gibberish.
- **Low Effort**: No reproduction steps, no logs, only 1-2 sentences.
If identified as spam/low-quality:
- Add the "invalid" label.
- Add a comment:
"This issue has been automatically flagged as low-quality or potentially AI-generated spam. It lacks specific details (logs, reproduction steps, file references) required for us to help. Please open a new issue following the template exactly if this is a legitimate request."
- Do NOT proceed to other steps.
### 4. Check for invalid issues (General)
If the issue is not spam but still lacks information:
- Add the "invalid" label
- Comment asking for clarification
### 5. Categorize with labels (if NOT a duplicate or spam)
Apply appropriate labels based on the issue content. Use ONLY these labels:
- bug: Something isn't working
- enhancement: New feature or request
- question: Further information is requested
- documentation: Improvements or additions to documentation
- good first issue: Good for newcomers (if issue is well-defined and small scope)
- help wanted: Extra attention is needed (if issue needs community input)
- backlog: Tracked for the future, but not currently planned or prioritized
### 6. Estimate size (if NOT a duplicate, spam, or invalid)
Apply exactly ONE size label to help contributors match their capacity to the task:
All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [Unreleased]
### Added
- Initial project structure
- React frontend (honeycomb) with Vite and TypeScript
- Node.js backend (hive) with Express and TypeScript
- Docker Compose configuration for local development
- Configuration system via `config.yaml`
- GitHub Actions CI/CD workflows
- Comprehensive documentation
### Changed
- N/A
### Deprecated
- N/A
### Removed
- N/A
### Fixed
- N/A
### Security
- N/A
## [0.1.0] - 2025-01-13
### Added
- Initial release
## v0.7.1
**Release Date:** March 13, 2026
**Tag:** v0.7.1
### Chrome-Native Browser Control
v0.7.1 replaces Playwright with direct Chrome DevTools Protocol (CDP) integration. The GCU now launches the user's system Chrome via `open -n` on macOS, connects over CDP, and manages browser lifecycle end-to-end -- no extra browser binary required.
---
### Highlights
#### System Chrome via CDP
The entire GCU browser stack has been rewritten:
- **Chrome finder & launcher** -- New `chrome_finder.py` discovers installed Chrome and `chrome_launcher.py` manages process lifecycle with `--remote-debugging-port`
- **Coexist with user's browser** -- `open -n` on macOS launches a separate Chrome instance so the user's tabs stay untouched
- **Dynamic viewport sizing** -- Viewport auto-sizes to the available display area, suppressing Chrome warning bars
- **Orphan cleanup** -- Chrome processes are killed on GCU server shutdown to prevent leaks
- **`--no-startup-window`** -- Chrome launches headlessly by default until a page is needed
#### Per-Subagent Browser Isolation
Each GCU subagent gets its own Chrome user-data directory, preventing cookie/session cross-contamination:
- Profiles cleaned up after top-level GCU node execution
- Tab origin and age metadata tracked per subagent
#### Dummy Agent Testing Framework
A comprehensive test suite for validating agent graph patterns without LLM calls:
- 8 test modules covering echo, pipeline, branch, parallel merge, retry, feedback loop, worker, and GCU subagent patterns
- Shared fixtures and a `run_all.py` runner for CI integration
- Subagent lifecycle tests
---
### What's New
#### GCU Browser
- **Switch from Playwright to system Chrome via CDP** -- Direct CDP connection replaces Playwright dependency. (@bryanadenhq)
- **Chrome finder and launcher modules** -- `chrome_finder.py` and `chrome_launcher.py` for cross-platform Chrome discovery and process management. (@bryanadenhq)
- **Dynamic viewport sizing** -- Auto-size viewport and suppress Chrome warning bar. (@bryanadenhq)
- **Per-subagent browser profile isolation** -- Unique user-data directories per subagent with cleanup. (@bryanadenhq)
- **Tab origin/age metadata** -- Track which subagent opened each tab and when. (@bryanadenhq)
- **`browser_close_all` tool** -- Bulk tab cleanup for agents managing many pages. (@bryanadenhq)
- **Auto-track popup pages** -- Popups are automatically captured and tracked. (@bryanadenhq)
The Playwright dependency is no longer required for GCU browser operations. Chrome must be installed on the host system.
---
## v0.7.0
**Release Date:** March 5, 2026
**Tag:** v0.7.0
Session management refactor release.
---
## v0.5.1
**Release Date:** February 18, 2026
**Tag:** v0.5.1
### The Hive Gets a Brain
v0.5.1 is our most ambitious release yet. Hive agents can now **build other agents** -- the new Hive Coder meta-agent writes, tests, and fixes agent packages from natural language. The runtime grows multi-graph support so one session can orchestrate multiple agents simultaneously. The TUI gets a complete overhaul with an in-app agent picker, live streaming, and seamless escalation to the Coder. And we're now provider-agnostic: Claude Code subscriptions, OpenAI-compatible endpoints, and any LiteLLM-supported model work out of the box.
---
### Highlights
#### Hive Coder -- The Agent That Builds Agents
A native meta-agent that lives inside the framework at `core/framework/agents/hive_coder/`. Give it a natural-language specification and it produces a complete agent package -- goal definition, node prompts, edge routing, MCP tool wiring, tests, and all boilerplate files.
```bash
# Launch the Coder directly
hive code
# Or escalate from any running agent (TUI)
Ctrl+E # or /coder in chat
```
The Coder ships with:
- **Reference documentation** -- anti-patterns, construction guide, and design patterns baked into its system prompt
- **Guardian watchdog** -- an event-driven monitor that catches agent failures and triggers automatic remediation
- **Test generation** -- structural tests for forever-alive agents that don't hang on `runner.run()`
#### Multi-Graph Agent Runtime
`AgentRuntime` now supports loading, managing, and switching between multiple agent graphs within a single session. Six new lifecycle tools give agents (and the TUI) full control:
The Hive Coder uses multi-graph internally -- when you escalate from a worker agent, the Coder loads as a separate graph while the worker stays alive in the background.
#### TUI Revamp
The Terminal UI gets a ground-up rebuild with five major additions:
- **Agent Picker** (Ctrl+A) -- tabbed modal screen for browsing Your Agents, Framework agents, and Examples with metadata badges (node count, tool count, session count, tags)
- **Runtime-optional startup** -- TUI launches without a pre-loaded agent, showing the picker on first open
- **Live streaming pane** -- dedicated RichLog widget shows LLM tokens as they arrive, replacing the old one-token-per-line display
- **PDF attachments** -- `/attach` and `/detach` commands with native OS file dialog (macOS, Linux, Windows)
- **Interactive credential setup** -- Guided `CredentialSetupSession` with health checks and encrypted storage, accessible via `hive setup-credentials` or automatic prompting on credential errors. (@RichardTang-Aden)
- **Pre-start confirmation prompt** -- Interactive prompt before agent execution allowing credential updates or abort. (@RichardTang-Aden)
- **Event bus multi-graph support** -- `graph_id` on events, `filter_graph` on subscriptions, `ESCALATION_REQUESTED` event type, `exclude_own_graph` filter. (@TimothyZhang7)
#### TUI Improvements
- **In-app agent picker** (Ctrl+A) -- Tabbed modal for browsing agents with metadata badges (nodes, tools, sessions, tags). (@TimothyZhang7)
- **Runtime-optional TUI startup** -- Launches without a pre-loaded agent, shows agent picker on startup. (@TimothyZhang7)
- **Hive Coder escalation** (Ctrl+E) -- Escalate to Hive Coder and return; also available via `/coder` and `/back` chat commands. (@TimothyZhang7)
- **PDF attachment support** -- `/attach` and `/detach` commands with native OS file dialog. (@TimothyZhang7)
- **Streaming output pane** -- Dedicated RichLog widget for live LLM token streaming. (@TimothyZhang7)
- **System prompt datetime injection** -- All system prompts now include current date/time for time-aware agent behavior. (@TimothyZhang7)
- **Utils module exports** -- Proper `__init__.py` exports for the utils module. (@Siddharth2624)
- **Increased default max_tokens** -- Opus 4.6 defaults to 32768, Sonnet 4.5 to 16384 (up from 8192). (@TimothyZhang7)
---
### Bug Fixes
- Flush WIP accumulator outputs on cancel/failure so edge conditions see correct values on resume
- Stall detection state preserved across resume (no more resets on checkpoint restore)
- Skip client-facing blocking for event-triggered executions (timer/webhook)
- Executor retry override scoped to actual EventLoopNode instances only
- Add `_awaiting_input` flag to EventLoopNode to prevent input injection race conditions
- Fix TUI streaming display (tokens no longer appear one-per-line)
- Fix `_return_from_escalation` crash when ChatRepl widgets not yet mounted
- Fix tools registration problems for Google Docs credentials (@RichardTang-Aden)
- Fix email agent version conflicts (@RichardTang-Aden)
- Fix coder tool timeouts (120s for tests, 300s cap for commands)
### Documentation
- Clarify installation and prevent root pip install misuse (@paarths-collab)
---
### Agent Updates
- **Email Inbox Management** -- Consolidate `gmail_inbox_guardian` and `inbox_management` into a single unified agent with updated prompts and config. (@RichardTang-Aden, @bryanadenhq)
- **Job Hunter** -- Updated node prompts, config, and agent metadata; added PDF resume selection. (@bryanadenhq)
- **Deep Research Agent** -- Revised node implementations with updated prompts and output handling.
```python
NodeSpec(node_type="event_loop", ...)  # or just omit node_type (it's the default now)
```
If your agents set `max_node_visits=1` explicitly, they'll still work. The only change is the _default_ -- new agents without an explicit value now get unlimited visits.
<p align="center"><em>The agent harness for production workloads — state management, failure recovery, observability, and human oversight so your agents actually run.</em></p>
## Overview
Hive is a runtime harness for AI agents in production. You describe your goal in natural language; a coding agent (the queen) generates the agent graph and connection code to achieve it. During execution, the harness manages state isolation, checkpoint-based crash recovery, cost enforcement, and real-time observability. When agents fail, the framework captures failure data, evolves the graph through the coding agent, and redeploys automatically. Built-in human-in-the-loop nodes, browser control, credential management, and parallel execution give you production reliability without sacrificing adaptability.
Visit [adenhq.com](https://adenhq.com) for complete documentation, examples, and guides.
Visit [HoneyComb](http://honeycomb.open-hive.com/) to see which jobs are being automated by AI. It’s a stock market for jobs, driven by our community’s AI agent progress: you can long and short jobs (using compute tokens rather than real money) based on how much you think a job will be replaced by AI.
Hive is the harness layer for teams moving AI agents from prototype to production. Models are getting better on their own — the bottleneck is the infrastructure around them: state management, failure recovery, cost control, and observability.
Hive is a good fit if you:
- Want AI agents that **execute real business processes**, not demos
- Need a **runtime that handles state, recovery, and parallel execution** at scale
- Need **self-healing and adaptive agents** that improve over time
- Require **human-in-the-loop control**, observability, and cost limits
- Plan to run agents in **production** where uptime, cost, and auditability matter
Hive may not be the best fit if you’re only experimenting with simple agent chains or one-off scripts.
## When Should You Use Hive?
Use Hive when the bottleneck is no longer the model but the harness around it:
- Long-running agents that need **state persistence and crash recovery**
- Production workloads requiring **cost enforcement, observability, and audit trails**
- Agents that **self-heal** through failure capture and graph evolution
- Multi-agent coordination with **session isolation and shared memory**
- A framework that **scales with model improvements** rather than fighting them
## Quick Links
- **[Documentation](https://docs.adenhq.com/)** - Complete guides and API reference
- **[Self-Hosting Guide](https://docs.adenhq.com/getting-started/quickstart)** - Deploy Hive on your infrastructure
- **[Changelog](https://github.com/aden-hive/hive/releases)** - Latest updates and releases
- **[Roadmap](docs/roadmap.md)** - Upcoming features and plans
- **[Report Issues](https://github.com/aden-hive/hive/issues)** - Bug reports and feature requests
- **[Contributing](CONTRIBUTING.md)** - How to contribute and submit PRs
- **ripgrep (optional, recommended on Windows):** The `search_files` tool uses ripgrep for faster file search. If not installed, a Python fallback is used. On Windows: `winget install BurntSushi.ripgrep` or `scoop install ripgrep`
> **Windows Users:** Native Windows is supported via `quickstart.ps1` and `hive.ps1`. Run these in PowerShell 5.1+. WSL is also an option but not required.
### Installation
> **Note**
> Hive uses a `uv` workspace layout and is not installed with `pip install`.
> Running `pip install -e .` from the repository root will create a placeholder package and Hive will not function correctly.
> Please use the quickstart script below to set up the environment.
```bash
# Clone the repository
git clone https://github.com/aden-hive/hive.git
cd hive

# Copy and configure
cp config.yaml.example config.yaml

# Run quickstart setup (macOS/Linux)
./quickstart.sh

# Windows (PowerShell)
.\quickstart.ps1
```
This sets up:
- **framework** - Core agent runtime and graph executor (in `core/.venv`)
- **aden_tools** - MCP tools for agent capabilities (in `tools/.venv`)
- **credential store** - Encrypted API key storage (`~/.hive/credentials`)
- **LLM provider** - Interactive default model configuration, including Hive LLM and OpenRouter
- All required Python dependencies with `uv`
- Finally, it will open the Hive interface in your browser

**Access the application:**
- Dashboard: http://localhost:3000
- API: http://localhost:4000
- Health: http://localhost:4000/health
> **Tip:** To reopen the dashboard later, run `hive open` from the project directory.
### Build Your First Agent
Describe the agent you want to build in the home input box. The queen will ask you questions and work out a solution with you.
Click "Try a sample agent" to browse the templates. You can run a template directly or build your own version on top of an existing template.
### Run Agents
Select an agent (either an existing agent or an example agent) and click the Run button in the top left, or ask the queen agent to run it for you.
- **[Goal-Driven Generation](docs/key_concepts/goals_outcome.md)** - Define objectives in natural language; the coding agent generates the agent graph and connection code to achieve them
- **[Adaptiveness](docs/key_concepts/evolution.md)** - Framework captures failures, calibrates according to the objectives, and evolves the agent graph
- **[Dynamic Node Connections](docs/key_concepts/graph.md)** - No predefined edges; connection code is generated by any capable LLM based on your goals
- **Browser-Use** - Control the browser on your computer to handle tasks that require real web interaction
- **Parallel Execution** - Execute the generated graph in parallel so multiple agents can complete jobs for you
- **SDK-Wrapped Nodes** - Every node gets shared memory, local RLM memory, monitoring, tools, and LLM access out of the box
- **[Human-in-the-Loop](docs/key_concepts/graph.md#human-in-the-loop)** - Intervention nodes that pause execution for human input with configurable timeouts and escalation
- **Real-time Observability** - WebSocket streaming for live monitoring of agent execution, decisions, and node-to-node communication
- **Cost & Budget Control** - Set spending limits, throttles, and automatic model degradation policies
- **Production-Ready** - Self-hostable, built for scale and reliability
## Integration
Hive is built to be model-agnostic and system-agnostic.
- **LLM flexibility** - Hive Framework supports Anthropic, OpenAI, OpenRouter, Hive LLM, and other hosted or local models through LiteLLM-compatible providers.
- **Business system connectivity** - Hive Framework is designed to connect to all kinds of business systems as tools, such as CRM, support, messaging, data, file, and internal APIs via MCP.
## Why Hive
As models improve, the upper bound of what agents can do rises — but their reliability and production value are determined by the harness. Hive focuses on generating agents that run real business processes rather than generic agents. Instead of requiring you to manually design workflows, define agent interactions, and handle failures reactively, Hive flips the paradigm: **you describe outcomes, and the system builds itself**—delivering an outcome-driven, adaptive experience with an easy-to-use set of tools and integrations.
5. **Self-Improve** → On failure, the system evolves the graph and redeploys automatically
## How Aden Compares
Aden takes a fundamentally different approach to agent development. While most frameworks require you to hardcode workflows or manually define agent graphs, Aden uses a **coding agent to generate your entire agent system** from natural language goals. When agents fail, the framework doesn't just log errors—it **automatically evolves the agent graph** and redeploys.
| Framework | Category | Focus | How Aden Differs |
|---|---|---|---|
| **PydanticAI, Mastra, Agno** | Type-Safe Frameworks | Structured outputs and validation for known workflows | Evolving workflows; structure emerges through iteration |
| **Agent Zero, Letta** | Personal AI Assistants | Memory and learning; OS-as-tool or stateful memory focus | Production multi-agent systems with self-healing |
| **CAMEL** | Research Framework | Emergent behavior in large-scale simulations (up to 1M agents) | Production-oriented with reliable execution and recovery |
| **TEN Framework, Genkit** | Infrastructure Frameworks | Real-time multimodal (TEN) or full-stack AI (Genkit) | Higher abstraction—generates and evolves agent logic |
| **GPT Engineer, Motia** | Code Generation | Code from specs (GPT Engineer) or "Step" primitive (Motia) | Self-adapting graphs with automatic failure recovery |
| **Trading Agents** | Domain-Specific | Hardcoded trading firm roles on LangGraph | Domain-agnostic; generates structures for any use case |
### When to Choose Aden
Choose Aden when you need:
- Agents that **self-improve from failures** without manual intervention
- **Goal-driven development** where you describe outcomes, not workflows
- **Production reliability** with automatic recovery and redeployment
- **Rapid iteration** on agent architectures without rewriting code
- **Full observability** with real-time monitoring and human oversight
Templates: Sales Agent, Marketing Agent, Analytics Agent, Training Agent, Smart Form Agent
```mermaid
flowchart TB
%% Main Entity
User([User])
%% =========================================
%% EXTERNAL EVENT SOURCES
%% =========================================
subgraph ExtEventSource [External Event Source]
E_Sch["Schedulers"]
E_WH["Webhook"]
E_SSE["SSE"]
end
%% =========================================
%% SYSTEM NODES
%% =========================================
subgraph WorkerBees [Worker Bees]
WB_C["Conversation"]
WB_SP["System prompt"]
subgraph Graph [Graph]
direction TB
N1["Node"] --> N2["Node"] --> N3["Node"]
N1 -.-> AN["Active Node"]
N2 -.-> AN
N3 -.-> AN
%% Nested Event Loop Node
subgraph EventLoopNode [Event Loop Node]
ELN_L["listener"]
ELN_SP["System Prompt<br/>(Task)"]
ELN_EL["Event loop"]
ELN_C["Conversation"]
end
end
end
subgraph JudgeNode [Judge]
J_C["Criteria"]
J_P["Principles"]
J_EL["Event loop"] <--> J_S["Scheduler"]
end
subgraph QueenBee [Queen Bee]
QB_SP["System prompt"]
QB_EL["Event loop"]
QB_C["Conversation"]
end
subgraph Infra [Infra]
SA["Sub Agent"]
TR["Tool Registry"]
WTM["Write through Conversation Memory<br/>(Logs/RAM/Harddrive)"]
SM["Shared Memory<br/>(State/Harddrive)"]
EB["Event Bus<br/>(RAM)"]
CS["Credential Store<br/>(Harddrive/Cloud)"]
end
subgraph PC [PC]
B["Browser"]
CB["Codebase<br/>v 0.0.x ... v n.n.n"]
end
%% =========================================
%% CONNECTIONS & DATA FLOW
%% =========================================
%% External Event Routing
E_Sch --> ELN_L
E_WH --> ELN_L
E_SSE --> ELN_L
ELN_L -->|"triggers"| ELN_EL
%% User Interactions
User -->|"Talk"| WB_C
User -->|"Talk"| QB_C
User -->|"Read/Write Access"| CS
%% Inter-System Logic
ELN_C <-->|"Mirror"| WB_C
WB_C -->|"Focus"| AN
WorkerBees -->|"Inquire"| JudgeNode
JudgeNode -->|"Approve"| WorkerBees
%% Judge Alignments
J_C <-.->|"aligns"| WB_SP
J_P <-.->|"aligns"| QB_SP
%% Escalate path
J_EL -->|"Report (Escalate)"| QB_EL
%% Pub/Sub Logic
AN -->|"publish"| EB
EB -->|"subscribe"| QB_C
%% Infra and Process Spawning
ELN_EL -->|"Spawn"| SA
SA -->|"Inform"| ELN_EL
SA -->|"Starts"| B
B -->|"Report"| ELN_EL
TR -->|"Assigned"| ELN_EL
CB -->|"Modify Worker Bee"| WB_C
%% =========================================
%% SHARED MEMORY & LOGS ACCESS
%% =========================================
%% Worker Bees Access (link to node inside Graph subgraph)
AN <-->|"Read/Write"| WTM
AN <-->|"Read/Write"| SM
%% Queen Bee Access
QB_C <-->|"Read/Write"| WTM
QB_EL <-->|"Read/Write"| SM
%% Credentials Access
CS -->|"Read Access"| QB_C
```
## Contributing
We welcome contributions from the community! We’re especially looking for help building tools, integrations, and example agents for the framework ([check #2805](https://github.com/aden-hive/hive/issues/2805)). If you’re interested in extending its functionality, this is the perfect place to start. Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
**Important:** Please get assigned to an issue before submitting a PR. Comment on an issue to claim it, and a maintainer will assign you. Issues with reproducible steps and proposals are prioritized. This helps prevent duplicate work.
1. Find or create an issue and get assigned
2. Fork the repository
3. Create your feature branch (`git checkout -b feature/amazing-feature`)
4. Commit your changes (`git commit -m 'Add amazing feature'`)
5. Push to the branch (`git push origin feature/amazing-feature`)
6. Open a Pull Request
## Community & Support
We use [Discord](https://discord.com/invite/MXE49hrKDk) for support, feature requests, and community discussions.
## Join Our Team
**We're hiring!** Join us in engineering, research, and go-to-market roles.
This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details.
## Frequently Asked Questions (FAQ)
**Q: Does Aden depend on LangChain or other agent frameworks?**

No. Aden is built from the ground up with no dependencies on LangChain, CrewAI, or other agent frameworks. The framework is designed to be lean and flexible, generating agent graphs dynamically rather than relying on predefined components.

**Q: What LLM providers does Hive support?**

Hive supports 100+ LLM providers through LiteLLM integration, including OpenAI (GPT-4, GPT-4o), Anthropic (Claude models), Google Gemini, DeepSeek, Mistral, Groq, OpenRouter, and Hive LLM. Simply set the appropriate API key environment variable and specify the model name. See [docs/configuration.md](docs/configuration.md) for provider-specific configuration examples.
**Q: Can I use Hive with local AI models like Ollama?**
Yes! Hive supports local models through LiteLLM. Simply use the model name format `ollama/model-name` (e.g., `ollama/llama3`, `ollama/mistral`) and ensure Ollama is running locally.
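Under the hood this routing goes through LiteLLM. As an illustration only, the equivalent direct LiteLLM call looks roughly like the sketch below (the model name and prompt are placeholders; `api_base` points at the local Ollama daemon):

```python
from litellm import completion

# Minimal sketch: send one chat completion to a locally running Ollama model.
# "ollama/llama3" is a placeholder; use any model shown by `ollama list`.
response = completion(
    model="ollama/llama3",
    messages=[{"role": "user", "content": "Summarize what an agent harness does."}],
    api_base="http://localhost:11434",  # local Ollama daemon
)
print(response.choices[0].message.content)
```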
**Q: What makes Hive different from other agent frameworks?**
Hive is an agent harness, not just an orchestration framework. It provides the production runtime layer — session isolation, checkpoint-based crash recovery, cost enforcement, real-time observability, and human-in-the-loop controls — that makes agents reliable enough to run real workloads. On top of that, Hive generates your entire agent system from natural language goals and automatically [evolves the graph](docs/key_concepts/evolution.md) when agents fail. The combination of a robust harness with self-improving generation is what sets Hive apart.
**Q: Is Hive open-source?**
Yes, Hive is fully open-source under the Apache License 2.0. We actively encourage community contributions and collaboration.
**Q: Does Hive support human-in-the-loop workflows?**
Yes, Hive fully supports [human-in-the-loop](docs/key_concepts/graph.md#human-in-the-loop) workflows through intervention nodes that pause execution for human input. These include configurable timeouts and escalation policies, allowing seamless collaboration between human experts and AI agents.
**Q: What programming languages does Hive support?**

The Hive framework is built in Python. A JavaScript/TypeScript SDK is on the roadmap.

**Q: Does Aden collect data from users?**

Aden collects telemetry data for monitoring and observability purposes, including token usage, latency metrics, and cost tracking. Content capture (prompts and responses) is configurable and stored with team-scoped data isolation. All data stays within your infrastructure when self-hosted.
**Q: What deployment options does Aden support?**
Aden supports Docker Compose deployment out of the box, with both production and development configurations. Self-hosted deployments work on any infrastructure supporting Docker. Cloud deployment options and Kubernetes-ready configurations are on the roadmap.
**Q: Can Aden handle complex, production-scale use cases?**
Yes. Aden is explicitly designed for production environments with features like automatic failure recovery, real-time observability, cost controls, and horizontal scaling support. The framework handles both simple automations and complex multi-agent workflows.
**Q: What monitoring and debugging tools does Aden provide?**
Aden includes comprehensive observability features: real-time WebSocket streaming for live agent execution monitoring, TimescaleDB-powered analytics for cost and performance metrics, health check endpoints for Kubernetes integration, and 19 MCP tools for budget management, agent status, and policy control.
**Q: Can Hive agents interact with external tools and APIs?**
Yes. Aden's SDK-wrapped nodes provide built-in tool access, and the framework supports flexible tool ecosystems. Agents can integrate with external APIs, databases, and services through the node architecture.
**Q: How does cost control work in Hive?**
Hive provides granular budget controls including spending limits, throttles, and automatic model degradation policies. You can set budgets at the team, agent, or workflow level, with real-time cost tracking and alerts.
**Q: Where can I find examples and documentation?**
Visit [docs.adenhq.com](https://docs.adenhq.com/) for complete guides, API reference, and getting started tutorials. The repository also includes documentation in the `docs/` folder and a comprehensive [developer guide](docs/developer-guide.md).
**Q: How can I contribute to Aden?**
Contributions are welcome! Fork the repository, create your feature branch, implement your changes, and submit a pull request. See [CONTRIBUTING.md](CONTRIBUTING.md) for detailed guidelines.
**Q: Does Aden offer enterprise support?**

For enterprise inquiries, contact the Aden team through [adenhq.com](https://adenhq.com) or join our [Discord community](https://discord.com/invite/MXE49hrKDk) for support and discussions.

## Star History
Tool library for the Aden agent framework. Provides a collection of tools that AI agents can use to interact with external systems, process data, and perform actions via the Model Context Protocol (MCP).
## Installation
```bash
pip install -e aden-tools
```
For development:
```bash
pip install -e "aden-tools[dev]"
```
## Quick Start
### As an MCP Server
```python
from fastmcp import FastMCP
from aden_tools.tools import register_all_tools

mcp = FastMCP("aden-tools")
register_all_tools(mcp)
mcp.run()
```
Or run directly:
```bash
python mcp_server.py
```
## Available Tools
| Tool | Description |
|------|-------------|
| `example_tool` | Template tool demonstrating the pattern |
| `file_read` | Read contents of local files |
| `file_write` | Write content to local files |
| `web_search` | Search the web using Brave Search API |
| `web_scrape` | Scrape and extract content from webpages |
| `pdf_read` | Read and extract text from PDF files |
## Project Structure
```
aden-tools/
├── src/aden_tools/
│ ├── __init__.py # Main exports
│ ├── utils/ # Utility functions
│ └── tools/ # Tool implementations
│ ├── example_tool/
│ ├── file_read_tool/
│ ├── file_write_tool/
│ ├── web_search_tool/
│ ├── web_scrape_tool/
│ └── pdf_read_tool/
├── tests/ # Test suite
├── mcp_server.py # MCP server entry point
├── README.md
├── BUILDING_TOOLS.md # Tool development guide
└── pyproject.toml
```
## Creating Custom Tools
Tools use FastMCP's native decorator pattern:
```python
from fastmcp import FastMCP


def register_tools(mcp: FastMCP) -> None:
    @mcp.tool()
    def my_tool(query: str, limit: int = 10) -> dict:
        """
        Search for items matching the query.

        Args:
            query: The search query
            limit: Max results to return

        Returns:
            Dict with results or error
        """
        try:
            results = do_search(query, limit)
            return {"results": results, "total": len(results)}
        except Exception as e:
            return {"error": str(e)}
```
See [BUILDING_TOOLS.md](BUILDING_TOOLS.md) for the full guide.
## Documentation
- [Building Tools Guide](BUILDING_TOOLS.md) - How to create new tools
- Individual tool READMEs in `src/aden_tools/tools/*/README.md`
## License
This project is licensed under the Apache License 2.0 - see the [LICENSE](../LICENSE) file for details.
Reads the content of a file within the secure session sandbox.
## Description
The `view_file` tool allows you to read and retrieve the complete content of files within a sandboxed session environment. It provides metadata about the file along with its content.
Writes content to a file within the secure session sandbox. Supports both overwriting and appending modes.
## Description
The `write_to_file` tool allows you to create new files or modify existing files within a sandboxed session environment. It automatically creates parent directories if they don't exist and provides flexible write modes.
Write content to local files with encoding support.
## Description
Can create new files or overwrite/append to existing ones. Use for saving data, creating configs, writing reports, or exporting results. Optionally creates parent directories if they don't exist.
description: Build goal-driven agents with nodes, edges, and validation. Use when asked to create an agent, design a workflow, or build automation that requires multiple steps with LLM reasoning.
---
# Building Agents
Build goal-driven agents that use LLM reasoning to accomplish tasks.
## Quick Start
1. Define the goal (what success looks like)
2. Add nodes (units of work)
3. Connect with edges (flow between nodes)
4. Validate and test
## Core Concepts
**Goal**: The source of truth. Defines success criteria and constraints.
**Node**: A unit of work. Types:
- `llm_generate` - Text generation, parsing
- `llm_tool_use` - Actions requiring tools
- `router` - Conditional branching
- `function` - Deterministic operations
**Edge**: Connection between nodes with conditions:
- `on_success` - Proceed if node succeeds
- `on_failure` - Handle errors
- `always` - Always proceed
- `conditional` - Based on expression
**Session Architecture**: Agents are stateful services that:
- Maintain execution state across invocations
- Pause at HITL nodes and resume with new input
- Accept inputs through multiple entry points
- Persist state until explicitly cleared
## Workflow (HITL Required)
**CRITICAL**: Each step requires human approval before proceeding.
**CRITICAL**: Run tests during approval so humans can see actual behavior.
**CRITICAL**: Use structured questions (AskUserQuestion) with fallback to text mode.
### Approval Strategy
**Always try structured questions first**, with graceful fallback:
1. **Attempt**: Call AskUserQuestion with clickable options
2. **Catch**: If tool fails/rejected, fall back to text prompt
3. **Parse**: Accept text input like "approve", "reject", "pause"
This ensures the workflow works in all environments (VSCode extension, CLI, web).
**Practical Example**:
```python
# 1. Call MCP tool to create goal
result = set_goal(
    goal_id="text-parser",
    name="Text Parser",
    description="Parse text into JSON",
    success_criteria='[...]',
    constraints='[...]'
)

# 2. Parse result
import json
data = json.loads(result)

# 3. MCP tool returns approval_required=True with approval_question
#    Claude sees this and calls AskUserQuestion

# 4. Present component
print(f"**GOAL: {data['goal']['name']}**")
print(f"Validation: ✅ PASS")

# 5. Call AskUserQuestion with the approval_question data
answer = AskUserQuestion(
    questions=[{
        "question": data["approval_question"]["question"],
        "header": data["approval_question"]["header"],
        "options": data["approval_question"]["options"],
        "multiSelect": False
    }]
)

# If widget supported → User sees clickable buttons:
# ┌─────────────────────────────────┐
# │ Do you approve this goal?       │
# │ ○ ✓ Approve (Recommended)       │
# │ ○ ✗ Reject & Modify             │
# │ ○ ⏸ Pause & Review              │
# └─────────────────────────────────┘

# If widget NOT supported → Falls back to text:
# → Do you approve this goal definition?
#   Options: approve | reject | pause
#   > approve   ← User types this
```
### Build Loop
```
For each component (goal, node, edge):
1. PROPOSE → Show the component to the human
2. VALIDATE → Run validation, show errors/warnings
3. TEST → Run the component with sample inputs to show behavior
4. ASK APPROVAL → Use AskUserQuestion with clickable options (NOT free text)
5. Only proceed after approval
```
**CRITICAL**: Step 4 MUST use AskUserQuestion tool with structured options. Never ask "Do you approve?" as free text.
### Checklist (ask approval at each ✓)
**NOTE**: Every "ASK APPROVAL" means use AskUserQuestion with clickable options.
result = await runner.run({"query": "latest AI breakthroughs"})
# The web_search tool from tools is automatically available!
```
## Benefits
## Common MCP Servers
### tools
Provides:
- `web_search` - Brave Search API integration
- `web_scrape` - Web page content extraction
- `file_read` / `file_write` - File operations
- `pdf_read` - PDF text extraction
### Custom MCP Servers
You can register any MCP server that follows the Model Context Protocol specification.
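As an illustration, registering a custom server reuses the `register_mcp_server` call demonstrated later in this document; the server name, command, and paths below are placeholders for your own MCP server entry point:

```python
from framework.runner.runner import AgentRunner

runner = AgentRunner.load("exports/task-planner")

# Placeholder values: point these at your own MCP server entry point.
runner.register_mcp_server(
    name="my-custom-tools",
    transport="stdio",
    command="python",
    args=["my_server.py", "--stdio"],
    cwd="../my-custom-tools",
)
```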
## Troubleshooting
- Verify you registered at least one MCP server
- Check `get_session_status` to see `mcp_servers_count > 0`
- Re-export the agent after registering servers
## Credential Validation
When adding nodes with tools that require API keys (like `web_search`), the agent builder automatically validates that the required credentials are available.
### How It Works
When you call `add_node` or `update_node` with a `tools` parameter, the agent builder:
1. Checks which tools require credentials (e.g., `web_search` requires `BRAVE_SEARCH_API_KEY`)
2. Validates those credentials are set in the environment or `.env` file
3. Returns an error if any credentials are missing (a rough sketch of this check follows below)
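For illustration only, the check is conceptually similar to this sketch; the `TOOL_CREDENTIALS` mapping and function are hypothetical, and the real registry lives inside the agent builder:

```python
import os

# Hypothetical mapping from tool name to required environment variables.
TOOL_CREDENTIALS = {"web_search": ["BRAVE_SEARCH_API_KEY"]}


def missing_credentials(tools: list[str]) -> list[str]:
    """Return the env vars required by `tools` that are not currently set."""
    missing = []
    for tool in tools:
        for env_var in TOOL_CREDENTIALS.get(tool, []):
            if not os.environ.get(env_var):
                missing.append(env_var)
    return missing


print(missing_credentials(["web_search"]))  # ['BRAVE_SEARCH_API_KEY'] if the key is unset
```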
### Missing Credentials Error
If credentials are missing, you'll receive a response like:
```json
{
  "valid": false,
  "errors": ["Missing credentials for tools: ['BRAVE_SEARCH_API_KEY']"],
  "missing_credentials": [
    {
      "credential": "brave_search",
      "env_var": "BRAVE_SEARCH_API_KEY",
      "tools_affected": ["web_search"],
      "help_url": "https://brave.com/search/api/",
      "description": "API key for Brave Search"
    }
  ],
  "action_required": "Add the credentials to your .env file and retry",
  "example": "Add to .env:\nBRAVE_SEARCH_API_KEY=your_key_here",
  "message": "Cannot add node: missing API credentials. Add them to .env and retry this command."
}
```
### Fixing Credential Errors
1. Get the required API key from the URL in `help_url`
This guide covers the MCP (Model Context Protocol) server for building goal-driven agents.
> **Note:** The standalone `agent-builder` MCP server (`framework.mcp.agent_builder_server`) has been replaced. Agent building is now done via the `coder-tools` server's `initialize_and_build_agent` tool, with underlying logic in `tools/coder_tools_server.py`.
This guide covers the MCP tools available for building goal-driven agents.
## Setup
### Quick Setup
```bash
# Run the quickstart script (recommended)
./quickstart.sh
```
### Manual Configuration
Add to your MCP client configuration (e.g., Claude Desktop):
The framework provides a runtime that captures **decisions**, not just actions.
## Installation
```bash
uv pip install -e .
```
## Agent Building
Agent scaffolding is handled by the `coder-tools` MCP server (in `tools/coder_tools_server.py`), which provides the `initialize_and_build_agent` tool and related utilities. The package generation logic lives directly in `tools/coder_tools_server.py`.
When you register an MCP server during agent building, the tools from that server become available to your agent, and an `mcp_servers.json` configuration file is automatically created on export.
See [MCP_SERVER_GUIDE.md](MCP_SERVER_GUIDE.md) for agent builder instructions and [MCP_BUILDER_TOOLS_GUIDE.md](MCP_BUILDER_TOOLS_GUIDE.md) for MCP integration tools.
## MCP Tool Integration
The framework also supports **connecting to MCP servers as tool providers**, allowing your agents to use tools from external MCP servers (like aden-tools). This enables you to extend your agents with powerful external capabilities.
### Quick Example
```python
from framework.runner.runner import AgentRunner

# Load an agent
runner = AgentRunner.load("exports/task-planner")

# Register an MCP server with tools
runner.register_mcp_server(
    name="aden-tools",
    transport="stdio",
    command="python",
    args=["mcp_server.py", "--stdio"],
    cwd="../aden-tools"
)

# Tools from the MCP server are now available to your agent
result = await runner.run({"query": "Search for AI news"})
```
### Auto-loading MCP Servers
Create `mcp_servers.json` in your agent folder:
```json
{
  "servers": [
    {
      "name": "aden-tools",
      "transport": "stdio",
      "command": "python",
      "args": ["mcp_server.py", "--stdio"],
      "cwd": "../aden-tools"
    }
  ]
}
```
MCP servers will be automatically loaded when you load the agent.
### Available Tools from aden-tools
When you register the aden-tools MCP server, these tools become available:
- `web_search` - Search the web using Brave Search API
- `web_scrape` - Extract content from web pages
- `file_read` - Read file contents
- `file_write` - Write content to files
- `pdf_read` - Extract text from PDF files
See [MCP_INTEGRATION_GUIDE.md](MCP_INTEGRATION_GUIDE.md) for detailed instructions on MCP tool integration.
See the [Getting Started Guide](../docs/getting-started.md) for building agents.
## Quick Start
### Running Agents
The framework comes with pre-built example agents in the `exports/` directory. For example, run an LLM-powered calculator:
```bash
# List available agents
python -m framework list exports/

# Show agent information
python -m framework info exports/task-planner

# Run an exported agent
uv run python -m framework run exports/calculator --input '{"expression": "2 + 3 * 4"}'

# Interactive shell session
uv run python -m framework shell exports/calculator

# Run an agent
python -m framework run exports/task-planner --input '{"objective": "Build a web scraper"}'
```
```python
    reasoning="Accuracy is more important for this task"
)

# Record the outcome
runtime.record_outcome(
    decision_id=decision_id,
    success=True,
)

# End the run
runtime.end_run(success=True, narrative="Successfully processed all data")
```
### Testing Agents
The framework includes a goal-based testing framework for validating agent behavior.
Tests are generated using MCP tools (`generate_constraint_tests`, `generate_success_tests`) which return guidelines. Claude writes tests directly using the Write tool based on these guidelines.
1. **Using tools that don't exist** — Always verify tools via `list_agent_tools()` before designing. Common hallucinations: `csv_read`, `csv_write`, `file_upload`, `database_query`, `bulk_fetch_emails`.
2. **Wrong mcp_servers.json format** — Flat dict (no `"mcpServers"` wrapper). `cwd` must be `"../../tools"`. `command` must be `"uv"` with args `["run", "python", ...]`.
3. **Missing module-level exports in `__init__.py`** — The runner reads `goal`, `nodes`, `edges`, `entry_node`, `entry_points`, `terminal_nodes`, `conversation_mode`, `identity_prompt`, `loop_config` via `getattr()`. ALL module-level variables from agent.py must be re-exported in `__init__.py` (see the sketch below).
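For illustration, a minimal `__init__.py` that satisfies this might look like the following; the `my_agent` package name and file layout are placeholders:

```python
# exports/my_agent/__init__.py (hypothetical layout) — re-export every module-level
# symbol from agent.py so the runner's getattr() lookups find them.
from .agent import (  # noqa: F401
    goal,
    nodes,
    edges,
    entry_node,
    entry_points,
    terminal_nodes,
    conversation_mode,
    identity_prompt,
    loop_config,
)
```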
## Value Errors
4. **Fabricating tools** — Always verify via `list_agent_tools()` before designing and `validate_agent_package()` after building.
## Design Errors
5. **Adding framework gating for LLM behavior** — Don't add output rollback or premature rejection. Fix with better prompts or custom judges.
6. **Calling set_output in same turn as tool calls** — Call set_output in a SEPARATE turn.
## File Template Errors
7. **Wrong import paths** — Use `from framework.graph import ...`, NOT `from core.framework.graph import ...`.
8. **Missing storage path** — Agent class must set `self._storage_path = Path.home() / ".hive" / "agents" / "agent_name"`.
9. **Missing mcp_servers.json** — Without this, the agent has no tools at runtime.
10. **Bare `python` command** — Use `"command": "uv"` with args `["run", "python", ...]`.
## Testing Errors
11. **Using `runner.run()` on forever-alive agents** — `runner.run()` hangs forever because forever-alive agents have no terminal node. Write structural tests instead: validate graph structure, verify node specs, test `AgentRunner.load()` succeeds (no API key needed); see the sketch after this list.
12. **Stale tests after restructuring** — When changing nodes/edges, update tests to match. Tests referencing old node names will fail.
13. **Running integration tests without API keys** — Use `pytest.skip()` when credentials are missing.
14. **Forgetting sys.path setup in conftest.py** — Tests need `exports/` and `core/` on sys.path.
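For example, a minimal structural test might look like this sketch; the paths, package name, and inline sys.path setup are placeholders, and only `AgentRunner.load()` comes from this document:

```python
# tests/test_structure.py — hypothetical structural test: no LLM call, no API key.
import sys
from pathlib import Path

# Make exports/ and core/ importable, per item 14 above (paths are illustrative;
# normally this lives in conftest.py).
ROOT = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(ROOT / "core"))
sys.path.insert(0, str(ROOT / "exports"))

from framework.runner.runner import AgentRunner


def test_agent_package_loads():
    # Loading validates the package without executing the (never-terminating) graph.
    runner = AgentRunner.load(str(ROOT / "exports" / "my_agent"))
    assert runner is not None
```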
## GCU Errors
15. **Manually wiring browser tools on event_loop nodes** — Use `node_type="gcu"` which auto-includes browser tools. Do NOT manually list browser tool names.
16. **Using GCU nodes as regular graph nodes** — GCU nodes are subagents only. They must ONLY appear in `sub_agents=["gcu-node-id"]` and be invoked via `delegate_to_sub_agent()`. Never connect via edges or use as entry/terminal nodes.
17. **Reusing the same GCU node ID for parallel tasks** — Each concurrent browser task needs a distinct GCU node ID (e.g. `gcu-site-a`, `gcu-site-b`). Two `delegate_to_sub_agent` calls with the same `agent_id` share a browser profile and will interfere with each other's pages.
18. **Passing `profile=` in GCU tool calls** — Profile isolation for parallel subagents is automatic. The framework injects a unique profile per subagent via an asyncio `ContextVar`. Hardcoding `profile="default"` in a GCU system prompt breaks this isolation.
## Worker Agent Errors
19. **Adding client-facing intake node to workers** — The queen owns intake. Workers should start with an autonomous processing node. Client-facing nodes in workers are for mid-execution review/approval only.
20. **Putting `escalate` or `set_output` in NodeSpec `tools=[]`** — These are synthetic framework tools, auto-injected at runtime. Only list MCP tools from `list_agent_tools()`.