From a08f3a892508195b3defb6876c4f2e1ad7dfbe35 Mon Sep 17 00:00:00 2001 From: AkhileshBabuT Date: Fri, 6 Feb 2026 23:42:07 -0500 Subject: [PATCH 01/22] docs: fix environment-setup.md for uv workspace setup - Fix #3837: Replace pip install -e with uv sync - Fix #3841: Update venv location to reflect single root .venv --- docs/environment-setup.md | 64 +++++++++++++++++++-------------------- 1 file changed, 32 insertions(+), 32 deletions(-) diff --git a/docs/environment-setup.md b/docs/environment-setup.md index c37cf9b7..24146513 100644 --- a/docs/environment-setup.md +++ b/docs/environment-setup.md @@ -65,28 +65,26 @@ source .venv/bin/activate If you prefer to set up manually or the script fails: -### 1. Install Core Framework +### 1. Sync Workspace Dependencies ```bash -cd core -uv pip install -e . +# From repository root - this creates a single .venv at the root +uv sync ``` -### 2. Install Tools Package +> **Note:** The `uv sync` command uses the workspace configuration in `pyproject.toml` to install both `core` (framework) and `tools` (aden_tools) packages together. This is the recommended approach over individual `pip install -e` commands which may fail due to circular dependencies. + +### 2. Activate the Virtual Environment ```bash -cd tools -uv pip install -e . +# Linux/macOS +source .venv/bin/activate + +# Windows (PowerShell) +.venv\Scripts\Activate.ps1 ``` -### 3. Upgrade OpenAI Package - -```bash -# litellm requires openai >= 1.0.0 -uv pip install --upgrade "openai>=1.0.0" -``` - -### 4. Verify Installation +### 3. Verify Installation ```bash uv run python -c "import framework; print('✓ framework OK')" @@ -281,18 +279,20 @@ Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass ### "ModuleNotFoundError: No module named 'framework'" -**Solution:** Install the core package: +**Solution:** Sync the workspace dependencies: ```bash -cd core && uv pip install -e . 
+# From repository root +uv sync ``` ### "ModuleNotFoundError: No module named 'aden_tools'" -**Solution:** Install the tools package: +**Solution:** Sync the workspace dependencies: ```bash -cd tools && uv pip install -e . +# From repository root +uv sync ``` Or run the setup script: @@ -350,15 +350,14 @@ The Hive framework consists of three Python packages: ``` hive/ +├── .venv/ # Single workspace venv (created by uv sync) ├── core/ # Core framework (runtime, graph executor, LLM providers) │ ├── framework/ -│ ├── .venv/ # Created by quickstart.sh │ └── pyproject.toml │ ├── tools/ # Tools and MCP servers │ ├── src/ │ │ └── aden_tools/ # Actual package location -│ ├── .venv/ # Created by quickstart.sh │ └── pyproject.toml │ ├── exports/ # Agent packages (user-created, gitignored) @@ -368,28 +367,29 @@ hive/ └── templates/ # Pre-built template agents ``` -## Separate Virtual Environments +## Virtual Environment Setup -Hive primarily uses **uv** to create and manage separate virtual environments for `core` and `tools`. +Hive uses **uv workspaces** to manage dependencies. When you run `uv sync` from the repository root, a **single `.venv`** is created at the root containing both packages. -The project uses separate virtual environments to: +### Benefits of Workspace Mode -- Isolate dependencies and avoid conflicts -- Allow independent development and testing of each package -- Enable MCP servers to run with their specific dependencies +- **Single environment** - No need to switch between multiple venvs +- **Unified dependencies** - Consistent package versions across core and tools +- **Simpler development** - One activation, access to everything ### How It Works -When you run `./quickstart.sh`, `uv` sets up: +When you run `./quickstart.sh` or `uv sync`: -1. **core/.venv/** - Contains the `framework` package and its dependencies (anthropic, litellm, mcp, etc.) -2. **tools/.venv/** - Contains the `aden_tools` package and its dependencies (beautifulsoup4, pandas, etc.) 
+1. **/.venv/** - Single root virtual environment is created
+2. Both `framework` (from core/) and `aden_tools` (from tools/) are installed
+3. All dependencies (anthropic, litellm, beautifulsoup4, pandas, etc.) are resolved together
 
-If you need to refresh environments manually, use `uv`:
+If you need to refresh the environment:
 
 ```bash
-cd core && uv sync
-cd ../tools && uv sync
+# From repository root
+uv sync
 ```
 
 ### Cross-Package Imports

From 760ed51ad36ce52090fd5fed68ef8127f267fa47 Mon Sep 17 00:00:00 2001
From: Naman Rajput
Date: Sun, 8 Feb 2026 00:45:53 +0530
Subject: [PATCH 02/22] Fix submission email in quizzes documentation

Updates incorrect careers@adenhq.com to contact@adenhq.com

---
 docs/quizzes/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/quizzes/README.md b/docs/quizzes/README.md
index 1ec206e1..de81d2e6 100644
--- a/docs/quizzes/README.md
+++ b/docs/quizzes/README.md
@@ -40,7 +40,7 @@ Welcome to the Aden Engineering Challenges! These quizzes are designed for stude
 After completing challenges, submit your work by:
 
 1. Creating a GitHub Gist with your answers
-2. Emailing the link to `careers@adenhq.com` with subject: `[Engineering Challenge] Your Name - Track Name`
+2. Emailing the link to `contact@adenhq.com` with subject: `[Engineering Challenge] Your Name - Track Name`
 3. 
Include your GitHub username in the email ## Getting Help From 131b72cd0ca9b87e1b6a8b44d060c41c89290cf1 Mon Sep 17 00:00:00 2001 From: vakrahul Date: Mon, 9 Feb 2026 23:27:54 +0530 Subject: [PATCH 03/22] feat: Add native Opencode support with Windows compatibility --- .opencode/agents/hive.md | 21 +++ .opencode/mcp.json | 16 ++ README.md | 11 ++ docs/developer-guide.md | 10 ++ docs/environment-setup.md | 16 ++ quickstart.sh | 87 ++++++++- setup_opencode.py | 82 +++++++++ tools/requirements.txt | 368 ++++++++++++++++++++++++++++++++++++++ 8 files changed, 610 insertions(+), 1 deletion(-) create mode 100644 .opencode/agents/hive.md create mode 100644 .opencode/mcp.json create mode 100644 setup_opencode.py create mode 100644 tools/requirements.txt diff --git a/.opencode/agents/hive.md b/.opencode/agents/hive.md new file mode 100644 index 00000000..eb2c18ab --- /dev/null +++ b/.opencode/agents/hive.md @@ -0,0 +1,21 @@ +--- +name: hive +description: Hive Agent Builder & Manager +mode: primary +model: anthropic/claude-3-5-sonnet-20241022 +tools: + agent-builder: true + tools: true +--- + +# Hive Agent +You are the Hive Agent Builder. Your goal is to help the user construct, configure, and deploy AI agents using the Hive framework. + +## Capabilities +1. **Scaffold Agents:** Create new agent directories/configs. +2. **Manage Tools:** Add/remove tools via MCP. +3. **Debug:** Analyze agent workflows. + +## Context & Skills +- You have access to all skills in `.claude/skills/`. +- Always use the `agent-builder` MCP server for filesystem operations. diff --git a/.opencode/mcp.json b/.opencode/mcp.json new file mode 100644 index 00000000..521a5b4b --- /dev/null +++ b/.opencode/mcp.json @@ -0,0 +1,16 @@ +{ + "mcpServers": { + "agent-builder": { + "command": "python", + "args": ["-m", "framework.mcp.agent_builder_server"], + "cwd": "core", + "env": { "PYTHONPATH": "../tools/src:." 
} + }, + "tools": { + "command": "python", + "args": ["mcp_server.py", "--stdio"], + "cwd": "tools", + "env": { "PYTHONPATH": "src:../core" } + } + } +} \ No newline at end of file diff --git a/README.md b/README.md index 70837797..0f875b62 100644 --- a/README.md +++ b/README.md @@ -120,6 +120,17 @@ hive tui # Or run directly hive run exports/your_agent_name --input '{"key": "value"}' ``` +## Coding Agent Support + +### Opencode (Recommended) +Hive includes native support for [Opencode](https://github.com/opencode-ai/opencode). + +1. **Setup:** Run the quickstart script or `python setup_opencode.py`. +2. **Launch:** Open Opencode in the project root. +3. **Activate:** Type `/hive` in the chat to switch to the Hive Agent. +4. **Verify:** Ask the agent *"List your tools"* to confirm the connection. + +The agent has access to all Hive skills and can scaffold agents, add tools, and debug workflows directly from the chat. **[📖 Complete Setup Guide](docs/environment-setup.md)** - Detailed instructions for agent development diff --git a/docs/developer-guide.md b/docs/developer-guide.md index af5720cb..0307ff20 100644 --- a/docs/developer-guide.md +++ b/docs/developer-guide.md @@ -116,6 +116,16 @@ Skills are also available in Cursor. To enable: 3. Restart Cursor to load the MCP servers from `.cursor/mcp.json` 4. Type `/` in Agent chat and search for skills (e.g., `/hive-create`) +```markdown +### Opencode Integration + +The Opencode integration leverages the same MCP servers used by Cursor and Claude Code. + +* **Configuration:** Located in `.opencode/mcp.json`. +* **Agent Definition:** The Hive agent is defined in `.opencode/agents/hive.md`. +* **Skills:** The integration reuses existing skills from `.claude/skills/`. The Hive agent is configured to access these patterns directly, ensuring consistency across all coding agents. +* **Tools:** Accesses `agent-builder` and standard `tools` via standard MCP protocols over stdio. 
+``` ### Verify Setup ```bash diff --git a/docs/environment-setup.md b/docs/environment-setup.md index c37cf9b7..bc3071de 100644 --- a/docs/environment-setup.md +++ b/docs/environment-setup.md @@ -521,6 +521,15 @@ export ADEN_CREDENTIALS_PATH="/custom/path" # Agent storage location (default: /tmp) export AGENT_STORAGE_PATH="/custom/storage" ``` +## Opencode Setup + +[Opencode](https://github.com/opencode-ai/opencode) is fully supported as a coding agent. + +### Automatic Setup +Run the setup script in the root directory: +```bash +python setup_opencode.py +``` ## Additional Resources @@ -529,6 +538,13 @@ export AGENT_STORAGE_PATH="/custom/storage" - **Example Agents:** [exports/](../exports/) - **Agent Building Guide:** [.claude/skills/hive-create/SKILL.md](../.claude/skills/hive-create/SKILL.md) - **Testing Guide:** [.claude/skills/hive-test/SKILL.md](../.claude/skills/hive-test/SKILL.md) +## Opencode Setup + +[Opencode](https://github.com/opencode-ai/opencode) is fully supported as a coding agent. + +### Automatic Setup +Run the setup script in the root directory: +```bash ## Contributing diff --git a/quickstart.sh b/quickstart.sh index 4fe69783..74fe2487 100755 --- a/quickstart.sh +++ b/quickstart.sh @@ -820,4 +820,89 @@ if [ -n "$SELECTED_PROVIDER_ID" ] || [ -n "$HIVE_CREDENTIAL_KEY" ]; then fi echo -e "${DIM}Run ./quickstart.sh again to reconfigure.${NC}" -echo "" \ No newline at end of file +echo "" +# ========================================== +# Opencode Setup (Auto-Generated) +# ========================================== +if command -v opencode &> /dev/null; then + echo "✨ Opencode detected! Setting up Hive integration..." + + # Create directory structure + mkdir -p .opencode/agents + + # Determine OS for correct path separator (Windows uses ';', Mac/Linux uses ':') + if [[ "$OSTYPE" == "msys" || "$OSTYPE" == "cygwin" || "$OSTYPE" == "win32" ]]; then + PATH_SEP=";" + else + PATH_SEP=":" + fi + + # 1. 
Generate mcp.json with correct separator + cat > .opencode/mcp.json < .opencode/agents/hive.md < /dev/null; then + python setup_opencode.py +fi +# ========================================== +# Opencode Integration +# ========================================== +if command -v python &> /dev/null; then + echo "🐍 Python detected. Running Opencode setup..." + python setup_opencode.py +elif command -v python3 &> /dev/null; then + echo "🐍 Python3 detected. Running Opencode setup..." + python3 setup_opencode.py +else + echo "⚠️ Python not found. Skipping Opencode setup." +fi \ No newline at end of file diff --git a/setup_opencode.py b/setup_opencode.py new file mode 100644 index 00000000..8748dae6 --- /dev/null +++ b/setup_opencode.py @@ -0,0 +1,82 @@ +import os +import sys +import json + +def setup_opencode(): + print("✨ Setting up Opencode integration...") + + # 1. Define paths + base_dir = os.getcwd() + opencode_dir = os.path.join(base_dir, ".opencode") + agents_dir = os.path.join(opencode_dir, "agents") + + # Create directories + if not os.path.exists(agents_dir): + os.makedirs(agents_dir) + print(f" Created directory: {agents_dir}") + + # 2. Determine Path Separator (Windows uses ';' others use ':') + # We force ';' if on Windows to be safe + path_sep = ";" if os.name == 'nt' else ":" + print(f" Detected OS: {os.name} (using separator '{path_sep}')") + + # 3. Create mcp.json + mcp_config = { + "mcpServers": { + "agent-builder": { + "command": "python", + "args": ["-m", "framework.mcp.agent_builder_server"], + "cwd": "core", + "env": { + "PYTHONPATH": f"../tools/src{path_sep}." + } + }, + "tools": { + "command": "python", + "args": ["mcp_server.py", "--stdio"], + "cwd": "tools", + "env": { + "PYTHONPATH": f"src{path_sep}../core" + } + } + } + } + + mcp_path = os.path.join(opencode_dir, "mcp.json") + with open(mcp_path, "w") as f: + json.dump(mcp_config, f, indent=2) + print(f"✅ Created {mcp_path}") + + # 4. 
Create Hive Agent + agent_content = """--- +name: hive +description: Hive Agent Builder & Manager +mode: primary +model: anthropic/claude-3-5-sonnet-20241022 +tools: + agent-builder: true + tools: true +--- + +# Hive Agent +You are the Hive Agent Builder. Your goal is to help the user construct, configure, and deploy AI agents using the Hive framework. + +## Capabilities +1. **Scaffold Agents:** Create new agent directories/configs. +2. **Manage Tools:** Add/remove tools via MCP. +3. **Debug:** Analyze agent workflows. + +## Context & Skills +- You have access to all skills in `.claude/skills/`. +- Always use the `agent-builder` MCP server for filesystem operations. +""" + + agent_path = os.path.join(agents_dir, "hive.md") + with open(agent_path, "w", encoding="utf-8") as f: + f.write(agent_content) + + print(f"✅ Created {agent_path}") + print("\n🎉 Setup Complete! Restart Opencode and type '/hive' to begin.") + +if __name__ == "__main__": + setup_opencode() \ No newline at end of file diff --git a/tools/requirements.txt b/tools/requirements.txt new file mode 100644 index 00000000..8dfd136f --- /dev/null +++ b/tools/requirements.txt @@ -0,0 +1,368 @@ +absl-py==2.3.1 +aiofiles==25.1.0 +aiohappyeyeballs==2.6.1 +aiohttp==3.13.3 +aiohttp-retry==2.8.3 +aiosignal==1.4.0 +altair==6.0.0 +annotated-doc==0.0.4 +annotated-types==0.7.0 +ansicon==1.89.0 +anthropic==0.76.0 +anyio==4.12.1 +asgiref==3.8.1 +astunparse==1.6.3 +attrs==25.4.0 +Authlib==1.6.6 +av==16.1.0 +backoff==2.2.1 +bcrypt==4.3.0 +beartype==0.22.9 +beautifulsoup4==4.14.3 +blessed==1.25.0 +blinker==1.9.0 +build==1.3.0 +CacheControl==0.14.3 +cachetools==5.5.2 +certifi==2026.1.4 +cffi==2.0.0 +charset-normalizer==3.4.4 +chromadb==1.3.6 +click==8.3.1 +cloudpickle==3.1.2 +colorama==0.4.6 +coloredlogs==15.0.1 +comtypes==1.4.10 +contourpy==1.3.2 +cosdata-fastembed==0.7.1 +cosdata-sdk==0.2.5 +croniter==6.0.0 +cryptography==46.0.1 +cycler==0.12.1 +cyclopts==4.5.1 +dataclasses-json==0.6.7 +datasets==4.4.1 +deprecation==2.1.0 
+diff-match-patch==20241021 +dill==0.4.0 +diskcache==5.6.3 +distlib==0.3.9 +distro==1.9.0 +Django==5.2.1 +djangorestframework==3.16.0 +dlib @ file:///C:/Users/RAHUL/Downloads/dlib-19.24.99-cp312-cp312-win_amd64.whl#sha256=20c62e606ca4c9961305f7be3d03990380d3e6c17f8d27798996e97a73271862 +dnspython==2.8.0 +docstring_parser==0.17.0 +docutils==0.22.4 +dotenv==0.9.9 +durationpy==0.10 +easyocr==1.7.2 +editor==1.6.6 +email-validator==2.3.0 +eval_type_backport==0.3.1 +exceptiongroup==1.3.1 +execnet==2.1.2 +face-recognition==1.3.0 +face_recognition_models==0.3.0 +fakeredis==2.33.0 +fastapi==0.121.3 +fastmcp==2.14.5 +fastuuid==0.14.0 +filelock==3.16.1 +filetype==1.2.0 +firebase_admin==7.1.0 +Flask==3.1.2 +Flask-Bcrypt==1.0.1 +flask-cors==6.0.2 +Flask-Login==0.6.3 +Flask-SQLAlchemy==3.1.1 +flatbuffers==25.12.19 +fonttools==4.59.0 +frozenlist==1.8.0 +fsspec==2025.9.0 +gast==0.6.0 +gitdb==4.0.12 +GitPython==3.1.45 +google-ai-generativelanguage==0.6.15 +google-api-core==2.25.1 +google-api-python-client==2.187.0 +google-auth==2.48.0 +google-auth-httplib2==0.2.1 +google-cloud-core==2.4.3 +google-cloud-firestore==2.21.0 +google-cloud-storage==3.4.0 +google-crc32c==1.7.1 +google-genai==1.60.0 +google-generativeai==0.8.5 +google-pasta==0.2.0 +google-resumable-media==2.7.2 +googleapis-common-protos==1.72.0 +greenlet==3.1.1 +grpcio==1.76.0 +grpcio-status==1.71.2 +h11==0.16.0 +h2==4.3.0 +h5py==3.14.0 +hpack==4.1.0 +httpcore==1.0.9 +httplib2==0.31.0 +httptools==0.7.1 +httpx==0.28.1 +httpx-sse==0.4.3 +huggingface-hub==0.36.0 +humanfriendly==10.0 +hyperframe==6.1.0 +idna==3.11 +imageio==2.37.0 +importlib_metadata==8.7.1 +importlib_resources==6.5.2 +iniconfig==2.3.0 +inquirer==3.4.1 +itsdangerous==2.2.0 +jaraco.classes==3.4.0 +jaraco.context==6.1.0 +jaraco.functools==4.4.0 +Jinja2==3.1.4 +jinxed==1.3.0 +jiter==0.12.0 +joblib==1.5.1 +jsonpatch==1.33 +jsonpath-ng==1.7.0 +jsonpointer==3.0.0 +jsonref==1.1.0 +jsonschema==4.25.1 +jsonschema-path==0.3.4 +jsonschema-specifications==2025.9.1 
+keras==3.10.0 +keyring==25.7.0 +kiwisolver==1.4.8 +kubernetes==34.1.0 +langchain==1.2.7 +langchain-anthropic==1.3.1 +langchain-classic==1.0.1 +langchain-community==0.4.1 +langchain-core==1.2.7 +langchain-google-genai==4.2.0 +langchain-huggingface==1.2.0 +langchain-openai==1.1.7 +langchain-text-splitters==1.1.0 +langdetect==1.0.9 +langgraph==1.0.7 +langgraph-checkpoint==4.0.0 +langgraph-prebuilt==1.0.7 +langgraph-sdk==0.3.3 +langsmith==0.6.6 +lazy_loader==0.4 +libclang==18.1.1 +livekit==1.0.23 +livekit-agents==1.3.12 +livekit-api==1.1.0 +livekit-blingfire==1.1.0 +livekit-plugins-elevenlabs==1.3.12 +livekit-plugins-openai==1.3.12 +livekit-plugins-silero==1.3.12 +livekit-protocol==1.1.2 +loguru==0.7.3 +lupa==2.6 +lxml==6.0.1 +Markdown==3.8.2 +markdown-it-py==4.0.0 +MarkupSafe==3.0.2 +marshmallow==3.26.2 +matplotlib==3.10.3 +mcp==1.26.0 +mdurl==0.1.2 +Mesa==3.1.2 +ml_dtypes==0.5.1 +mmh3==5.2.0 +more-itertools==10.8.0 +mpmath==1.3.0 +msgpack==1.1.1 +multidict==6.7.0 +multiprocess==0.70.18 +mypy_extensions==1.1.0 +mysqlclient==2.2.7 +namex==0.1.0 +narwhals==2.14.0 +nest-asyncio==1.6.0 +networkx==3.5 +ninja==1.13.0 +numpy==2.4.1 +oauthlib==3.3.1 +onnxruntime==1.23.1 +openai==2.15.0 +openapi-pydantic==0.5.1 +opencv-contrib-python==4.11.0.86 +opencv-python==4.10.0.84 +opencv-python-headless==4.12.0.88 +opentelemetry-api==1.39.1 +opentelemetry-exporter-otlp==1.39.1 +opentelemetry-exporter-otlp-proto-common==1.39.1 +opentelemetry-exporter-otlp-proto-grpc==1.39.1 +opentelemetry-exporter-otlp-proto-http==1.39.1 +opentelemetry-proto==1.39.1 +opentelemetry-sdk==1.39.1 +opentelemetry-semantic-conventions==0.60b1 +opt_einsum==3.4.0 +optree==0.17.0 +orjson==3.11.5 +ormsgpack==1.12.2 +overrides==7.7.0 +packaging==25.0 +pandas==2.2.3 +pathable==0.4.4 +pathspec==0.12.1 +pathvalidate==3.3.1 +patsy==1.0.1 +pdf2image==1.17.0 +pdfminer.six==20251107 +pdfplumber==0.11.8 +pillow==12.1.0 +platformdirs==4.3.6 +playwright==1.58.0 +playwright-stealth==2.0.1 +pluggy==1.6.0 +ply==3.11 
+postgrest==2.27.3 +posthog==5.4.0 +prometheus_client==0.24.1 +propcache==0.4.1 +proto-plus==1.26.1 +protobuf==5.29.5 +psutil==7.2.1 +py-key-value-aio==0.3.0 +py-key-value-shared==0.3.0 +py_rust_stemmers==0.1.5 +pyarrow==22.0.0 +pyasn1==0.6.1 +pyasn1_modules==0.4.2 +pybase64==1.4.3 +pyclipper==1.3.0.post6 +pycparser==3.0 +pydantic==2.12.5 +pydantic-settings==2.12.0 +pydantic_core==2.41.5 +pydeck==0.9.1 +pydocket==0.17.5 +pyee==13.0.0 +PyGithub==2.8.1 +Pygments==2.19.2 +pyiceberg==0.10.0 +PyJWT==2.10.1 +PyNaCl==1.6.2 +pyparsing==3.2.3 +pypdf==6.6.2 +pypdfium2==5.1.0 +pyperclip==1.11.0 +PyPika==0.48.9 +pypiwin32==223 +pyproject_hooks==1.2.0 +pyreadline3==3.5.4 +pyroaring==1.0.3 +pytesseract==0.3.13 +pytest==9.0.2 +pytest-asyncio==1.3.0 +python-bidi==0.6.6 +python-dateutil==2.9.0.post0 +python-docx==1.2.0 +python-dotenv==1.2.1 +python-json-logger==4.0.0 +python-magic-bin==0.4.14 +python-multipart==0.0.20 +pytrends==4.9.2 +pyttsx3==2.98 +pytz==2024.2 +pywin32==310 +pywin32-ctypes==0.2.3 +PyYAML==6.0.2 +readchar==4.2.1 +realtime==2.27.3 +redis==7.1.0 +referencing==0.36.2 +regex==2025.11.3 +reportlab==4.4.5 +requests==2.32.5 +requests-oauthlib==2.0.0 +requests-toolbelt==1.0.0 +resend==2.21.0 +rich==14.3.1 +rich-rst==1.3.2 +rpds-py==0.30.0 +rsa==4.9.1 +ruff==0.14.14 +runs==1.2.2 +safetensors==0.7.0 +scikit-image==0.25.2 +scikit-learn==1.7.1 +scipy==1.16.0 +seaborn==0.13.2 +sentence-transformers==5.1.2 +setuptools==80.9.0 +shapely==2.1.1 +shellingham==1.5.4 +six==1.17.0 +smmap==5.0.2 +sniffio==1.3.1 +sortedcontainers==2.4.0 +sounddevice==0.5.5 +soupsieve==2.8.3 +SpeechRecognition==3.14.2 +SQLAlchemy==2.0.38 +sqlparse==0.5.3 +sse-starlette==3.2.0 +starlette==0.50.0 +statsmodels==0.14.5 +storage3==2.27.3 +streamlit==1.52.2 +StrEnum==0.4.15 +strictyaml==1.7.3 +stripe==14.3.0 +supabase==2.27.3 +supabase-auth==2.27.3 +supabase-functions==2.27.3 +sympy==1.14.0 +tenacity==9.1.2 +tensorboard==2.19.0 +tensorboard-data-server==0.7.2 +tensorflow==2.19.0 +termcolor==3.1.0 
+tf_keras==2.19.0 +threadpoolctl==3.6.0 +tifffile==2025.9.9 +tiktoken==0.12.0 +tokenizers==0.22.1 +toml==0.10.2 +-e git+https://github.com/vakrahul/hive_work.git@ee42ceee00258707624bbb9b1c47341a3ce58cbd#egg=tools&subdirectory=tools +torch==2.8.0 +torchaudio==2.8.0 +torchvision==0.23.0 +tornado==6.5.4 +tqdm==4.67.1 +transformers==4.57.1 +twilio==9.4.3 +typer==0.21.1 +types-protobuf==6.32.1.20251210 +typing-inspect==0.9.0 +typing-inspection==0.4.2 +typing_extensions==4.15.0 +tzdata==2024.2 +uritemplate==4.2.0 +urllib3==2.6.3 +uuid_utils==0.14.0 +uv==0.10.0 +uvicorn==0.38.0 +virtualenv==20.28.0 +watchdog==6.0.0 +watchfiles==1.1.1 +wcwidth==0.2.14 +websocket-client==1.9.0 +websockets==15.0.1 +Werkzeug==3.1.3 +wheel==0.45.1 +win32_setctime==1.2.0 +wrapt==1.17.2 +xmod==1.8.1 +xxhash==3.6.0 +yarl==1.22.0 +zipp==3.23.0 +zstandard==0.25.0 From 9d11f834b80a44ba21a74d396bb15452796a8507 Mon Sep 17 00:00:00 2001 From: Timothy Date: Mon, 9 Feb 2026 11:05:59 -0800 Subject: [PATCH 04/22] feat: automated testing skill --- .claude/skills/hive-debugger/SKILL.md | 118 +- .claude/skills/hive-test/SKILL.md | 1664 +++++++---------- .../examples/testing-youtube-agent.md | 516 +++-- core/framework/mcp/agent_builder_server.py | 376 ++++ uv.lock | 2 +- 5 files changed, 1419 insertions(+), 1257 deletions(-) diff --git a/.claude/skills/hive-debugger/SKILL.md b/.claude/skills/hive-debugger/SKILL.md index b4a60d0d..36a46d93 100644 --- a/.claude/skills/hive-debugger/SKILL.md +++ b/.claude/skills/hive-debugger/SKILL.md @@ -562,15 +562,33 @@ PYTHONPATH=core:exports python -m {agent_name} --tui ### Find Available Checkpoints: -```bash -# In TUI: -/sessions {session_id} +Use MCP tools to programmatically find and inspect checkpoints: -# This shows all checkpoints with timestamps: -Available Checkpoints: (3) - 1. cp_node_complete_intake_143030 - 2. cp_node_complete_research_143115 - 3. 
cp_pause_research_143130 +``` +# List all sessions to find the failed one +list_agent_sessions(agent_work_dir="~/.hive/agents/{agent_name}", status="failed") + +# Inspect session state +get_agent_session_state(agent_work_dir="~/.hive/agents/{agent_name}", session_id="{session_id}") + +# Find clean checkpoints to resume from +list_agent_checkpoints(agent_work_dir="~/.hive/agents/{agent_name}", session_id="{session_id}", is_clean="true") + +# Compare checkpoints to understand what changed +compare_agent_checkpoints( + agent_work_dir="~/.hive/agents/{agent_name}", + session_id="{session_id}", + checkpoint_id_before="cp_node_complete_intake_143030", + checkpoint_id_after="cp_node_complete_research_143115" +) + +# Inspect memory at a specific checkpoint +get_agent_checkpoint(agent_work_dir="~/.hive/agents/{agent_name}", session_id="{session_id}", checkpoint_id="cp_node_complete_intake_143030") +``` + +Or in TUI: +```bash +/sessions {session_id} ``` **Verification:** @@ -717,6 +735,80 @@ Let me know when you've run it and I'll help check the logs!" 
) ``` +### Session & Checkpoint Tools + +**list_agent_sessions** - Browse sessions with filtering +- **When to use:** Finding resumable sessions, identifying failed sessions, Stage 3 triage +- **Returns:** Session list with status, timestamps, is_resumable, current_node, quality +- **Example:** + ``` + list_agent_sessions( + agent_work_dir="/home/user/.hive/agents/twitter_outreach", + status="failed", + limit=10 + ) + ``` + +**get_agent_session_state** - Load full session state (excludes memory values) +- **When to use:** Inspecting session progress, checking is_resumable, examining path +- **Returns:** Full state with memory_keys/memory_size instead of memory values +- **Example:** + ``` + get_agent_session_state( + agent_work_dir="/home/user/.hive/agents/twitter_outreach", + session_id="session_20260208_143022_abc12345" + ) + ``` + +**get_agent_session_memory** - Get memory contents from a session +- **When to use:** Stage 5 root cause analysis, inspecting produced data +- **Returns:** All memory keys+values, or a single key's value +- **Example:** + ``` + get_agent_session_memory( + agent_work_dir="/home/user/.hive/agents/twitter_outreach", + session_id="session_20260208_143022_abc12345", + key="twitter_handles" + ) + ``` + +**list_agent_checkpoints** - List checkpoints for a session +- **When to use:** Stage 6 recovery, finding clean checkpoints to resume from +- **Returns:** Checkpoint summaries with type, node, clean status +- **Example:** + ``` + list_agent_checkpoints( + agent_work_dir="/home/user/.hive/agents/twitter_outreach", + session_id="session_20260208_143022_abc12345", + is_clean="true" + ) + ``` + +**get_agent_checkpoint** - Load a specific checkpoint with full state +- **When to use:** Inspecting exact state at a checkpoint, comparing to current state +- **Returns:** Full checkpoint: memory snapshot, execution path, metrics +- **Example:** + ``` + get_agent_checkpoint( + agent_work_dir="/home/user/.hive/agents/twitter_outreach", + 
session_id="session_20260208_143022_abc12345", + checkpoint_id="cp_node_complete_intake_143030" + ) + ``` + +**compare_agent_checkpoints** - Diff memory between two checkpoints +- **When to use:** Understanding data flow, finding where state diverged +- **Returns:** Memory diff (added/removed/changed keys) + execution path diff +- **Example:** + ``` + compare_agent_checkpoints( + agent_work_dir="/home/user/.hive/agents/twitter_outreach", + session_id="session_20260208_143022_abc12345", + checkpoint_id_before="cp_node_complete_intake_143030", + checkpoint_id_after="cp_node_complete_research_143115" + ) + ``` + ### Query Patterns **Pattern 1: Top-Down Investigation** (Most common) @@ -739,6 +831,16 @@ Loop every 10 seconds: 2. If found: Alert and drill into L2 ``` +**Pattern 4: Session State + Checkpoint Recovery** +``` +1. list_agent_sessions: Find failed/paused sessions +2. get_agent_session_state: Check is_resumable, see execution path +3. get_agent_session_memory: Inspect what data was produced +4. list_agent_checkpoints: Find clean checkpoints before failure +5. compare_agent_checkpoints: Understand what changed between checkpoints +6. Recommend resume command with specific checkpoint +``` + --- ## Complete Example Walkthrough diff --git a/.claude/skills/hive-test/SKILL.md b/.claude/skills/hive-test/SKILL.md index 5c94be88..9827843b 100644 --- a/.claude/skills/hive-test/SKILL.md +++ b/.claude/skills/hive-test/SKILL.md @@ -1,123 +1,392 @@ --- name: hive-test -description: Run goal-based evaluation tests for agents. Use when you need to verify an agent meets its goals, debug failing tests, or iterate on agent improvements based on test results. +description: Iterative agent testing with session recovery. Execute, analyze, fix, resume from checkpoints. Use when testing an agent, debugging test failures, or verifying fixes without re-running from scratch. 
--- -# Testing Workflow +# Agent Testing -This skill provides tools for testing agents built with the hive-create skill. +Test agents iteratively: execute, analyze failures, fix, resume from checkpoint, repeat. -## Workflow Overview +## When to Use -1. `mcp__agent-builder__list_tests` - Check what tests exist -2. `mcp__agent-builder__generate_constraint_tests` or `mcp__agent-builder__generate_success_tests` - Get test guidelines -3. **Write tests directly** using the Write tool with the guidelines provided -4. `mcp__agent-builder__run_tests` - Execute tests -5. `mcp__agent-builder__debug_test` - Debug failures +- Testing a newly built agent against its goal +- Debugging a failing agent iteratively +- Verifying fixes without re-running expensive early nodes +- Running final regression tests before deployment -## How Test Generation Works +## Prerequisites -The `generate_*_tests` MCP tools return **guidelines and templates** - they do NOT generate test code via LLM. -You (Claude) write the tests directly using the Write tool based on the guidelines. +1. Agent package at `exports/{agent_name}/` (built with `/hive-create`) +2. Credentials configured (`/hive-credentials`) +3. `ANTHROPIC_API_KEY` set (or appropriate LLM provider key) -### Example Workflow +**Path distinction** (critical — don't confuse these): +- `exports/{agent_name}/` — agent source code (edit here) +- `~/.hive/agents/{agent_name}/` — runtime data: sessions, checkpoints, logs (read here) + +--- + +## The Iterative Test Loop + +This is the core workflow. Don't re-run the entire agent when a late node fails — analyze, fix, and resume from the last clean checkpoint. 
+ +``` +┌──────────────────────────────────────┐ +│ PHASE 1: Generate Test Scenarios │ +│ Goal → synthetic test inputs + tests │ +└──────────────┬───────────────────────┘ + ↓ +┌──────────────────────────────────────┐ +│ PHASE 2: Execute │◄────────────────┐ +│ Run agent (CLI or pytest) │ │ +└──────────────┬───────────────────────┘ │ + ↓ │ + Pass? ──yes──► PHASE 6: Final Verification │ + │ │ + no │ + ↓ │ +┌──────────────────────────────────────┐ │ +│ PHASE 3: Analyze │ │ +│ Session + runtime logs + checkpoints │ │ +└──────────────┬───────────────────────┘ │ + ↓ │ +┌──────────────────────────────────────┐ │ +│ PHASE 4: Fix │ │ +│ Prompt / code / graph / goal │ │ +└──────────────┬───────────────────────┘ │ + ↓ │ +┌──────────────────────────────────────┐ │ +│ PHASE 5: Recover & Resume │─────────────────┘ +│ Checkpoint resume OR fresh re-run │ +└──────────────────────────────────────┘ +``` + +--- + +### Phase 1: Generate Test Scenarios + +Create synthetic tests from the agent's goal, constraints, and success criteria. 
+ +#### Step 1a: Read the goal ```python -# Step 1: Get test guidelines -result = mcp__agent-builder__generate_constraint_tests( - goal_id="my-goal", +# Read goal from agent.py +Read(file_path="exports/{agent_name}/agent.py") +# Extract the Goal definition and convert to JSON string +``` + +#### Step 1b: Get test guidelines + +```python +# Get constraint test guidelines +generate_constraint_tests( + goal_id="your-goal-id", goal_json='{"id": "...", "constraints": [...]}', - agent_path="exports/my_agent" + agent_path="exports/{agent_name}" ) -# Step 2: The result contains: -# - output_file: where to write tests -# - file_header: imports and fixtures to use -# - test_template: format for test functions -# - constraints_formatted: the constraints to test -# - test_guidelines: rules for writing tests +# Get success criteria test guidelines +generate_success_tests( + goal_id="your-goal-id", + goal_json='{"id": "...", "success_criteria": [...]}', + node_names="intake,research,review,report", + tool_names="web_search,web_scrape", + agent_path="exports/{agent_name}" +) +``` -# Step 3: Write tests directly using the Write tool +These return `file_header`, `test_template`, `constraints_formatted`/`success_criteria_formatted`, and `test_guidelines`. They do NOT generate test code — you write the tests. 
+
+#### Step 1c: Write tests
+
+```python
 Write(
     file_path=result["output_file"],
-    content=result["file_header"] + test_code_you_write
+    content=result["file_header"] + "\n\n" + your_test_code
 )
+```
 
-# Step 4: Run tests via MCP tool
-mcp__agent-builder__run_tests(
-    goal_id="my-goal",
-    agent_path="exports/my_agent"
-)
+#### Test writing rules
 
-# Step 5: Debug failures via MCP tool
-mcp__agent-builder__debug_test(
-    goal_id="my-goal",
-    test_name="test_constraint_foo",
-    agent_path="exports/my_agent"
-)
+- Every test MUST be `async` with `@pytest.mark.asyncio`
+- Every test MUST accept the `mock_mode` parameter
+- Use `await default_agent.run(input, mock_mode=mock_mode)`
+- Access output via `result.output.get("key")` — NEVER `result.output["key"]`
+- `result.success=True` means no exception, NOT goal achieved — always check output
+- Write 8-15 tests total, not 30+
+- Each real test costs ~3 seconds plus LLM tokens
+
+#### Step 1d: Check existing tests
+
+Before generating, check if tests already exist:
+
+```python
+list_tests(
+    goal_id="your-goal-id",
+    agent_path="exports/{agent_name}"
 )
 ```
 
 ---
 
-# Testing Agents with MCP Tools
+### Phase 2: Execute
 
-Run goal-based evaluation tests for agents built with the hive-create skill.
+Two execution paths; use the right one for your situation.
 
-**Key Principle: MCP tools provide guidelines, Claude writes tests directly**
-- ✅ Get guidelines: `generate_constraint_tests`, `generate_success_tests` → returns templates and guidelines
-- ✅ Write tests: Use the Write tool with the provided file_header and test_template
-- ✅ Run tests: `run_tests` (runs pytest via subprocess)
-- ✅ Debug failures: `debug_test` (re-runs single test with verbose output)
-- ✅ List tests: `list_tests` (scans Python test files)
-- ✅ Tests stored in `exports/{agent}/tests/test_*.py`
+#### Iterative debugging (for complex agents)
 
-## Architecture: Python Test Files
+Run the agent via CLI. This creates sessions with checkpoints at `~/.hive/agents/{agent_name}/sessions/`:
 
-```
-exports/my_agent/
-├── __init__.py
-├── agent.py              ← Agent to test
-├── nodes/__init__.py
-├── config.py
-├── __main__.py
-└── tests/                ← Test files written by MCP tools
-    ├── conftest.py       # Shared fixtures (auto-created)
-    ├── test_constraints.py
-    ├── test_success_criteria.py
-    └── test_edge_cases.py
+```bash
+PYTHONPATH=core:exports uv run python -m {agent_name} --tui
 ```
 
-**Tests import the agent directly:**
+The TUI lets you interact with client-facing nodes and see real-time execution. Sessions and checkpoints are saved automatically.
+
+#### Automated regression (for CI or final verification)
+
+Use the `run_tests` MCP tool to run all pytest tests:
+
 ```python
-import pytest
-from exports.my_agent import default_agent
-
-
-@pytest.mark.asyncio
-async def test_happy_path(mock_mode):
-    result = await default_agent.run({"query": "test"}, mock_mode=mock_mode)
-    assert result.success
-    assert len(result.output) > 0
+run_tests(
+    goal_id="your-goal-id",
+    agent_path="exports/{agent_name}"
+)
 ```
 
-## Why This Approach
+Returns structured results:
+```json
+{
+  "overall_passed": false,
+  "summary": {"total": 12, "passed": 10, "failed": 2, "pass_rate": "83.3%"},
+  "test_results": [{"test_name": "test_success_source_diversity", "status": "failed"}],
+  "failures": [{"test_name": "test_success_source_diversity", "details": "..."}]
+}
+```
 
-- MCP tools provide consistent test guidelines with proper imports, fixtures, and API key enforcement
-- Claude writes tests directly, eliminating circular LLM dependencies in the MCP server
-- `run_tests` parses pytest output into structured results for iteration
-- `debug_test` provides formatted output with actionable debugging info
-- File headers include conftest.py setup with proper fixtures
+**Options:**
+```python
+# Run only constraint tests
+run_tests(goal_id, agent_path, test_types='["constraint"]')
 
-## Quick Start
+# Stop on first failure
+run_tests(goal_id, agent_path, fail_fast=True)
 
-1. **Check existing tests** - `list_tests(goal_id, agent_path)`
-2. **Get test guidelines** - `generate_constraint_tests` or `generate_success_tests`
-3. **Write tests** - Use the Write tool with the provided file_header and guidelines
-4. **Run tests** - `run_tests(goal_id, agent_path)`
-5. **Debug failures** - `debug_test(goal_id, test_name, agent_path)`
-6. **Iterate** - Repeat steps 4-5 until all pass
+# Parallel execution
+run_tests(goal_id, agent_path, parallel=4)
+```
 
-## ⚠️ Credential Requirements for Testing
+**Note:** `run_tests` calls `default_agent.run()`, which does NOT enable checkpointing. For checkpoint-based recovery, use CLI execution. Use `run_tests` for quick regression checks and final verification.
+
+---
+
+### Phase 3: Analyze Failures
+
+When a test fails, drill down systematically. Don't guess — use the tools.
+
+#### Step 3a: Get error category
+
+```python
+debug_test(
+    goal_id="your-goal-id",
+    test_name="test_success_source_diversity",
+    agent_path="exports/{agent_name}"
+)
+```
+
+Returns the error category (`IMPLEMENTATION_ERROR`, `ASSERTION_FAILURE`, `TIMEOUT`, `IMPORT_ERROR`, `API_ERROR`) plus the full traceback and suggestions.
+
+#### Step 3b: Find the failed session
+
+```python
+list_agent_sessions(
+    agent_work_dir="~/.hive/agents/{agent_name}",
+    status="failed",
+    limit=5
+)
+```
+
+Returns a session list with IDs, timestamps, current_node (where it failed), and execution_quality.
+
+#### Step 3c: Inspect session state
+
+```python
+get_agent_session_state(
+    agent_work_dir="~/.hive/agents/{agent_name}",
+    session_id="session_20260209_143022_abc12345"
+)
+```
+
+Returns the execution path, which node was current, step count, and timestamps — but excludes memory values (to avoid context bloat). Shows `memory_keys` and `memory_size` instead.
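A small helper can condense that state into a one-line triage summary before deciding which logs to pull. This is a sketch; the field names (`execution_path`, `current_node`, `memory_keys`) are taken from the description above and the exact response shape may differ.

```python
def summarize_failure_point(state: dict) -> str:
    """Condense a session-state dict into a one-line triage summary."""
    path = state.get("execution_path") or []
    node = state.get("current_node", "<unknown>")
    keys = state.get("memory_keys") or []
    return (
        f"failed at '{node}' after {len(path)} step(s); "
        f"path: {' -> '.join(path) if path else '<empty>'}; "
        f"memory keys: {', '.join(keys) if keys else '<none>'}"
    )


# Fabricated example state:
state = {
    "execution_path": ["intake", "research"],
    "current_node": "research",
    "memory_keys": ["query", "research_results"],
}
print(summarize_failure_point(state))
# failed at 'research' after 2 step(s); path: intake -> research; memory keys: query, research_results
```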
+
+#### Step 3d: Examine runtime logs (L2/L3)
+
+```python
+# L2: Per-node success/failure, retry counts
+query_runtime_log_details(
+    agent_work_dir="~/.hive/agents/{agent_name}",
+    run_id="session_20260209_143022_abc12345",
+    needs_attention_only=True
+)
+
+# L3: Exact LLM responses, tool call inputs/outputs
+query_runtime_log_raw(
+    agent_work_dir="~/.hive/agents/{agent_name}",
+    run_id="session_20260209_143022_abc12345",
+    node_id="research"
+)
+```
+
+#### Step 3e: Inspect memory data
+
+```python
+# See what data a node actually produced
+get_agent_session_memory(
+    agent_work_dir="~/.hive/agents/{agent_name}",
+    session_id="session_20260209_143022_abc12345",
+    key="research_results"
+)
+```
+
+#### Step 3f: Find recovery points
+
+```python
+list_agent_checkpoints(
+    agent_work_dir="~/.hive/agents/{agent_name}",
+    session_id="session_20260209_143022_abc12345",
+    is_clean="true"
+)
+```
+
+Returns checkpoint summaries with IDs, types (`node_start`, `node_complete`), which node, and an `is_clean` flag. Clean checkpoints are safe resume points.
+
+#### Step 3g: Compare checkpoints (optional)
+
+To understand what changed between two points in execution:
+
+```python
+compare_agent_checkpoints(
+    agent_work_dir="~/.hive/agents/{agent_name}",
+    session_id="session_20260209_143022_abc12345",
+    checkpoint_id_before="cp_node_complete_research_143030",
+    checkpoint_id_after="cp_node_complete_review_143115"
+)
+```
+
+Returns a memory diff (added/removed/changed keys) and an execution path diff.
+
+---
+
+### Phase 4: Fix Based on Root Cause
+
+Use the analysis from Phase 3 to determine what to fix and where.
+
+| Root Cause | What to Fix | Where to Edit |
+|------------|-------------|---------------|
+| **Prompt issue** — LLM produces wrong output format, misses instructions | Node `system_prompt` | `exports/{agent}/nodes/__init__.py` |
+| **Code bug** — TypeError, KeyError, logic error in Python | Agent code | `exports/{agent}/agent.py`, `nodes/__init__.py` |
+| **Graph issue** — wrong routing, missing edge, bad condition_expr | Edges, node config | `exports/{agent}/agent.py` |
+| **Tool issue** — MCP tool fails, wrong config, missing credential | Tool config | `exports/{agent}/mcp_servers.json`, `/hive-credentials` |
+| **Goal issue** — success criteria too strict/vague, wrong constraints | Goal definition | `exports/{agent}/agent.py` (goal section) |
+| **Test issue** — test expectations don't match actual agent behavior | Test code | `exports/{agent}/tests/test_*.py` |
+
+#### Fix strategies by error category
+
+**IMPLEMENTATION_ERROR** (TypeError, AttributeError, KeyError):
+
+```python
+# Read the failing code
+Read(file_path="exports/{agent_name}/nodes/__init__.py")
+
+# Fix the bug
+Edit(
+    file_path="exports/{agent_name}/nodes/__init__.py",
+    old_string="results.get('videos')",
+    new_string="(results or {}).get('videos', [])"
+)
+```
+
+**ASSERTION_FAILURE** (test assertions fail but the agent ran successfully):
+- Check if the agent's output is actually wrong → fix the prompt
+- Check if the test's expectations are unrealistic → fix the test
+- Use `get_agent_session_memory` to see what the agent actually produced
+
+**TIMEOUT / STALL** (agent runs too long):
+- Check `node_visit_counts` for feedback loops hitting max_node_visits
+- Check L3 logs for tool calls that hang
+- Reduce `max_iterations` in loop_config or fix the prompt to converge faster
+
+**API_ERROR** (connection, rate limit, auth):
+- Verify credentials with `/hive-credentials`
+- Check the MCP server configuration
+
+---
+
+### Phase 5: Recover & Resume
+
+After fixing the agent, decide whether to resume or re-run.
+
+#### When to resume from checkpoint
+
+Resume when ALL of these are true:
+- The fix is to a node that comes AFTER existing clean checkpoints
+- Clean checkpoints exist (from a CLI execution with checkpointing)
+- The early nodes are expensive (web scraping, API calls, long LLM chains)
+
+```bash
+# Resume from the last clean checkpoint before the failing node
+PYTHONPATH=core:exports uv run python -m {agent_name} --tui \
+    --resume-session session_20260209_143022_abc12345 \
+    --checkpoint cp_node_complete_research_143030
+```
+
+This skips all nodes before the checkpoint and only re-runs the fixed node onward.
+
+#### When to re-run from scratch
+
+Re-run when ANY of these are true:
+- The fix is to the entry node or an early node
+- No checkpoints exist (e.g., the agent was run via `run_tests`)
+- The agent is fast (2-3 nodes, completes in seconds)
+- You changed the graph structure (added/removed nodes/edges)
+
+```bash
+PYTHONPATH=core:exports uv run python -m {agent_name} --tui
+```
+
+#### Inspecting a checkpoint before resuming
+
+```python
+get_agent_checkpoint(
+    agent_work_dir="~/.hive/agents/{agent_name}",
+    session_id="session_20260209_143022_abc12345",
+    checkpoint_id="cp_node_complete_research_143030"
+)
+```
+
+Returns the full checkpoint: shared_memory snapshot, execution_path, current_node, next_node, is_clean.
+
+#### Loop back to Phase 2
+
+After resuming or re-running, check if the fix worked. If not, go back to Phase 3.
+
+---
+
+### Phase 6: Final Verification
+
+Once the iterative fix loop converges (the agent produces correct output), run the full automated test suite:
+
+```python
+run_tests(
+    goal_id="your-goal-id",
+    agent_path="exports/{agent_name}"
+)
+```
+
+All tests should pass. If not, repeat the loop for the remaining failures.
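The resume-vs-re-run rules from Phase 5 reduce to a small decision function. This is a sketch; the parameter names are ours, not part of the framework.

```python
def choose_recovery(fix_is_after_clean_checkpoint: bool,
                    clean_checkpoints_exist: bool,
                    early_nodes_expensive: bool,
                    graph_structure_changed: bool) -> str:
    """Resume only when every resume condition holds and the graph is unchanged."""
    if graph_structure_changed:
        return "re-run"  # a changed graph invalidates old checkpoints
    if (fix_is_after_clean_checkpoint
            and clean_checkpoints_exist
            and early_nodes_expensive):
        return "resume"
    return "re-run"


print(choose_recovery(True, True, True, False))   # resume
print(choose_recovery(True, False, True, False))  # re-run (no checkpoints, e.g. run_tests)
```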
+ +--- + +## Credential Requirements **CRITICAL: Testing requires ALL credentials the agent depends on.** This includes both the LLM API key AND any tool-specific credentials (HubSpot, Brave Search, etc.). @@ -157,35 +426,30 @@ Common tool credentials: - Tests need to execute the agent's LLM nodes to validate behavior - Tools with missing credentials will return error dicts instead of real data - Mock mode bypasses everything, providing no confidence in real-world performance -- The `AgentRunner.run()` method validates credentials at startup and will fail fast if any are missing ### Mock Mode Limitations Mock mode (`--mock` flag or `mock_mode=True`) is **ONLY for structure validation**: -✓ Validates graph structure (nodes, edges, connections) -✓ Tests that code doesn't crash on execution -✗ Does NOT test LLM message generation -✗ Does NOT test reasoning or decision-making quality -✗ Does NOT test constraint validation (length limits, format rules) -✗ Does NOT test real API integrations or tool use -✗ Does NOT test personalization or content quality +- Validates graph structure (nodes, edges, connections) +- Tests that code doesn't crash on execution +- Does NOT test LLM reasoning, content quality, or constraint validation +- Does NOT test real API integrations or tool use -**Bottom line:** If you're testing whether an agent achieves its goal, you MUST use real credentials for ALL services. +**Bottom line:** If you're testing whether an agent achieves its goal, you MUST use real credentials. ### Enforcing Credentials in Tests -When generating tests, **ALWAYS include credential checks for ALL required services**: +When writing tests, **ALWAYS include credential checks**: ```python import os import pytest from aden_tools.credentials import CredentialManager -# At the top of every test file pytestmark = pytest.mark.skipif( not CredentialManager().is_available("anthropic") and not os.environ.get("MOCK_MODE"), - reason="API key required for real testing. 
Set ANTHROPIC_API_KEY or use MOCK_MODE=1 for structure validation only." + reason="API key required for real testing. Set ANTHROPIC_API_KEY or use MOCK_MODE=1." ) @@ -195,628 +459,62 @@ def check_credentials(): creds = CredentialManager() mock_mode = os.environ.get("MOCK_MODE") - # Always check LLM key if not creds.is_available("anthropic"): if mock_mode: - print("\n⚠️ Running in MOCK MODE - structure validation only") - print(" This does NOT test LLM behavior or agent quality") - print(" Set ANTHROPIC_API_KEY for real testing\n") + print("\nRunning in MOCK MODE - structure validation only") else: pytest.fail( - "\n❌ ANTHROPIC_API_KEY not set!\n\n" - "Real testing requires an API key. Choose one:\n" - "1. Set API key (RECOMMENDED):\n" - " export ANTHROPIC_API_KEY='your-key-here'\n" - "2. Run structure validation only:\n" - " MOCK_MODE=1 pytest exports/{agent}/tests/\n\n" - "Note: Mock mode does NOT validate agent behavior or quality." + "\nANTHROPIC_API_KEY not set!\n" + "Set API key: export ANTHROPIC_API_KEY='your-key-here'\n" + "Or run structure validation: MOCK_MODE=1 pytest exports/{agent}/tests/" ) - # Check tool-specific credentials (skip in mock mode) if not mock_mode: - # List the tools this agent uses - update per agent - agent_tools = [] # e.g., ["hubspot_search_contacts", "hubspot_get_contact"] + agent_tools = [] # Update per agent missing = creds.get_missing_for_tools(agent_tools) if missing: - lines = ["\n❌ Missing tool credentials!\n"] + lines = ["\nMissing tool credentials!"] for name in missing: spec = creds.specs.get(name) if spec: lines.append(f" {spec.env_var} - {spec.description}") - if spec.help_url: - lines.append(f" Setup: {spec.help_url}") - lines.append("\nSet the required environment variables and re-run.") pytest.fail("\n".join(lines)) ``` ### User Communication -When the user asks to test an agent, **ALWAYS check for ALL credentials first** — not just the LLM key: +When the user asks to test an agent, **ALWAYS check for ALL credentials 
first**: -1. **Identify the agent's tools** from `agent.json` or `mcp_servers.json` +1. **Identify the agent's tools** from `mcp_servers.json` 2. **Check ALL required credentials** using `CredentialManager` 3. **Ask the user to provide any missing credentials** before proceeding +4. Collect ALL missing credentials in a single prompt — not one at a time + +--- + +## Safe Test Patterns + +### OutputCleaner + +The framework automatically validates and cleans node outputs using a fast LLM at edge traversal time. Tests should still use safe patterns because OutputCleaner may not catch all issues. + +### Safe Access (REQUIRED) ```python -from aden_tools.credentials import CredentialManager, CREDENTIAL_SPECS - -creds = CredentialManager() - -# 1. Check LLM key -missing_creds = [] -if not creds.is_available("anthropic"): - missing_creds.append(("ANTHROPIC_API_KEY", "Anthropic API key for LLM calls")) - -# 2. Check tool-specific credentials -agent_tools = [...] # Determined from agent config -missing_tools = creds.get_missing_for_tools(agent_tools) -for name in missing_tools: - spec = CREDENTIAL_SPECS.get(name) - if spec: - missing_creds.append((spec.env_var, spec.description)) - -# 3. Present ALL missing credentials to the user at once -if missing_creds: - print("⚠️ Missing credentials required by this agent:\n") - for env_var, description in missing_creds: - print(f" • {env_var} — {description}") - print() - print("Please set the missing environment variables:") - for env_var, _ in missing_creds: - print(f" export {env_var}='your-value-here'") - print() - print("Or run in mock mode (structure validation only):") - print(" MOCK_MODE=1 pytest exports/{agent}/tests/") - - # Ask user to provide credentials or choose mock mode - AskUserQuestion(...) -``` - -**IMPORTANT:** Do NOT skip credential collection. If an agent uses HubSpot tools, the user MUST provide `HUBSPOT_ACCESS_TOKEN`. If it uses web search, the user MUST provide the appropriate search API key. 
Collect ALL missing credentials in a single prompt rather than discovering them one at a time during test failures. - -## The Three-Stage Flow - -``` -┌─────────────────────────────────────────────────────────────────────────┐ -│ GOAL STAGE │ -│ (hive-create skill) │ -│ │ -│ 1. User defines goal with success_criteria and constraints │ -│ 2. Goal written to agent.py immediately │ -│ 3. Generate CONSTRAINT TESTS → Write to tests/ → USER APPROVAL │ -│ Files created: exports/{agent}/tests/test_constraints.py │ -└─────────────────────────────────────────────────────────────────────────┘ - ↓ -┌─────────────────────────────────────────────────────────────────────────┐ -│ AGENT STAGE │ -│ (hive-create skill) │ -│ │ -│ Build nodes + edges, written immediately to files │ -│ Constraint tests can run during development: │ -│ run_tests(goal_id, agent_path, test_types='["constraint"]') │ -└─────────────────────────────────────────────────────────────────────────┘ - ↓ -┌─────────────────────────────────────────────────────────────────────────┐ -│ EVAL STAGE (this skill) │ -│ │ -│ 1. Generate SUCCESS_CRITERIA TESTS → Write to tests/ → USER APPROVAL │ -│ Files created: exports/{agent}/tests/test_success_criteria.py │ -│ 2. Run all tests: run_tests(goal_id, agent_path) │ -│ 3. On failure → debug_test(goal_id, test_name, agent_path) │ -│ 4. Iterate: Edit agent code → Re-run run_tests (instant feedback) │ -└─────────────────────────────────────────────────────────────────────────┘ -``` - -## Step-by-Step: Testing an Agent - -### Step 1: Check Existing Tests - -**ALWAYS check first** before generating new tests: - -```python -mcp__agent-builder__list_tests( - goal_id="your-goal-id", - agent_path="exports/your_agent" -) -``` - -This shows what test files already exist. 
If tests exist: -- Review the list to see what's covered -- Ask user if they want to add more or run existing tests - -### Step 2: Get Constraint Test Guidelines (Goal Stage) - -After goal is defined, get test guidelines using the MCP tool: - -```python -# First, read the goal from agent.py to get the goal JSON -goal_code = Read(file_path="exports/your_agent/agent.py") -# Extract the goal definition and convert to JSON - -# Get constraint test guidelines via MCP tool -result = mcp__agent-builder__generate_constraint_tests( - goal_id="your-goal-id", - goal_json='{"id": "goal-id", "name": "...", "constraints": [...]}', - agent_path="exports/your_agent" -) -``` - -**Response includes:** -- `output_file`: Where to write tests (e.g., `exports/your_agent/tests/test_constraints.py`) -- `file_header`: Imports, fixtures, and pytest setup to use at the top of the file -- `test_template`: Format for test functions -- `constraints_formatted`: The constraints to test -- `test_guidelines`: Rules and best practices for writing tests -- `instruction`: How to proceed - -**Write tests directly** using the provided guidelines: - -```python -# Write tests using the Write tool -Write( - file_path=result["output_file"], - content=result["file_header"] + "\n\n" + your_test_code -) -``` - -### Step 3: Get Success Criteria Test Guidelines (Eval Stage) - -After agent is fully built, get success criteria test guidelines: - -```python -# Get success criteria test guidelines via MCP tool -result = mcp__agent-builder__generate_success_tests( - goal_id="your-goal-id", - goal_json='{"id": "goal-id", "name": "...", "success_criteria": [...]}', - node_names="analyze_request,search_web,format_results", - tool_names="web_search,web_scrape", - agent_path="exports/your_agent" -) -``` - -**Write tests directly** using the provided guidelines: - -```python -# Write tests using the Write tool -Write( - file_path=result["output_file"], - content=result["file_header"] + "\n\n" + your_test_code -) -``` - 
-### Step 4: Test Fixtures (conftest.py) - -The `file_header` returned by the MCP tools includes proper imports and fixtures. -You should also create a conftest.py file in the tests directory with shared fixtures: - -```python -# Create conftest.py with the conftest template -Write( - file_path="exports/your_agent/tests/conftest.py", - content=conftest_content # Use PYTEST_CONFTEST_TEMPLATE format -) -``` - -### Step 5: Run Tests - -**Use the MCP tool to run tests** (not pytest directly): - -```python -mcp__agent-builder__run_tests( - goal_id="your-goal-id", - agent_path="exports/your_agent" -) - -**Response includes structured results:** -```json -{ - "goal_id": "your-goal-id", - "overall_passed": false, - "summary": { - "total": 12, - "passed": 10, - "failed": 2, - "skipped": 0, - "errors": 0, - "pass_rate": "83.3%" - }, - "test_results": [ - {"file": "test_constraints.py", "test_name": "test_constraint_api_rate_limits", "status": "passed"}, - {"file": "test_success_criteria.py", "test_name": "test_success_find_relevant_results", "status": "failed"} - ], - "failures": [ - {"test_name": "test_success_find_relevant_results", "details": "AssertionError: Expected 3-5 results..."} - ] -} -``` - -**Options for `run_tests`:** -```python -# Run only constraint tests -mcp__agent-builder__run_tests( - goal_id="your-goal-id", - agent_path="exports/your_agent", - test_types='["constraint"]' -) - -# Run with parallel workers -mcp__agent-builder__run_tests( - goal_id="your-goal-id", - agent_path="exports/your_agent", - parallel=4 -) - -# Stop on first failure -mcp__agent-builder__run_tests( - goal_id="your-goal-id", - agent_path="exports/your_agent", - fail_fast=True -) -``` - -### Step 6: Debug Failed Tests - -**Use the MCP tool to debug** (not Bash/pytest directly): - -```python -mcp__agent-builder__debug_test( - goal_id="your-goal-id", - test_name="test_success_find_relevant_results", - agent_path="exports/your_agent" -) -``` - -**Response includes:** -- Full verbose output 
from the test -- Stack trace with exact line numbers -- Captured logs and prints -- Suggestions for fixing the issue - -### Step 7: Categorize Errors - -When a test fails, categorize the error to guide iteration: - -```python -def categorize_test_failure(test_output, agent_code): - """Categorize test failure to guide iteration.""" - - # Read test output and agent code - failure_info = { - "test_name": "...", - "error_message": "...", - "stack_trace": "...", - } - - # Pattern-based categorization - if any(pattern in failure_info["error_message"].lower() for pattern in [ - "typeerror", "attributeerror", "keyerror", "valueerror", - "null", "none", "undefined", "tool call failed" - ]): - category = "IMPLEMENTATION_ERROR" - guidance = { - "stage": "Agent", - "action": "Fix the bug in agent code", - "files_to_edit": ["agent.py", "nodes/__init__.py"], - "restart_required": False, - "description": "Code bug - fix and re-run tests" - } - - elif any(pattern in failure_info["error_message"].lower() for pattern in [ - "assertion", "expected", "got", "should be", "success criteria" - ]): - category = "LOGIC_ERROR" - guidance = { - "stage": "Goal", - "action": "Update goal definition", - "files_to_edit": ["agent.py (goal section)"], - "restart_required": True, - "description": "Goal definition is wrong - update and rebuild" - } - - elif any(pattern in failure_info["error_message"].lower() for pattern in [ - "timeout", "rate limit", "empty", "boundary", "edge case" - ]): - category = "EDGE_CASE" - guidance = { - "stage": "Eval", - "action": "Add edge case test and fix handling", - "files_to_edit": ["agent.py", "tests/test_edge_cases.py"], - "restart_required": False, - "description": "New scenario - add test and handle it" - } - - else: - category = "UNKNOWN" - guidance = { - "stage": "Unknown", - "action": "Manual investigation required", - "restart_required": False - } - - return { - "category": category, - "guidance": guidance, - "failure_info": failure_info - } -``` - -**Show 
categorization to user:** - -```python -AskUserQuestion( - questions=[{ - "question": f"Test failed with {category}. How would you like to proceed?", - "header": "Test Failure", - "options": [ - { - "label": "Fix code directly (Recommended)" if category == "IMPLEMENTATION_ERROR" else "Update goal", - "description": guidance["description"] - }, - { - "label": "Show detailed error info", - "description": "View full stack trace and logs" - }, - { - "label": "Skip for now", - "description": "Continue with other tests" - } - ], - "multiSelect": false - }] -) -``` - -### Step 8: Iterate Based on Error Category - -#### IMPLEMENTATION_ERROR → Fix Agent Code - -```python -# 1. Show user the exact file and line that failed -print(f"Error in: exports/{agent_name}/nodes/__init__.py:42") -print(f"Issue: 'NoneType' object has no attribute 'get'") - -# 2. Read the problematic code -code = Read(file_path=f"exports/{agent_name}/nodes/__init__.py") - -# 3. User can fix directly, or you suggest a fix: -Edit( - file_path=f"exports/{agent_name}/nodes/__init__.py", - old_string="if results.get('videos'):", - new_string="if results and results.get('videos'):" -) - -# 4. Re-run tests immediately (instant feedback!) -mcp__agent-builder__run_tests( - goal_id="your-goal-id", - agent_path=f"exports/{agent_name}" -) -``` - -#### LOGIC_ERROR → Update Goal - -```python -# 1. Show user the goal definition -goal_code = Read(file_path=f"exports/{agent_name}/agent.py") - -# 2. Discuss what needs to change in success_criteria or constraints - -# 3. Edit the goal -Edit( - file_path=f"exports/{agent_name}/agent.py", - old_string='target="3-5 videos"', - new_string='target="1-5 videos"' # More realistic -) - -# 4. May need to regenerate agent nodes if goal changed significantly -# This requires going back to hive-create skill -``` - -#### EDGE_CASE → Add Test and Fix - -```python -# 1. 
Create new edge case test with API key enforcement -edge_case_test = ''' -@pytest.mark.asyncio -async def test_edge_case_empty_results(mock_mode): - """Test: Agent handles no results gracefully""" - result = await default_agent.run({{"query": "xyzabc123nonsense"}}, mock_mode=mock_mode) - - # Should succeed with empty results, not crash - assert result.success or result.error is not None - if result.success: - assert result.output.get("message") == "No results found" -''' - -# 2. Add to test file -Edit( - file_path=f"exports/{agent_name}/tests/test_edge_cases.py", - old_string="# Add edge case tests here", - new_string=edge_case_test -) - -# 3. Fix agent to handle edge case -# Edit agent code to handle empty results - -# 4. Re-run tests -``` - -## Test File Templates (Reference Only) - -**⚠️ Do NOT copy-paste these templates directly.** Use `generate_constraint_tests` and `generate_success_tests` MCP tools to create properly structured tests with correct imports and fixtures. - -These templates show the structure of generated tests for reference only. - -### Constraint Test Template - -```python -"""Constraint tests for {agent_name}. - -These tests validate that the agent respects its defined constraints. -Requires ANTHROPIC_API_KEY for real testing. -""" - -import os -import pytest -from exports.{agent_name} import default_agent -from aden_tools.credentials import CredentialManager - - -# Enforce API key for real testing -pytestmark = pytest.mark.skipif( - not CredentialManager().is_available("anthropic") and not os.environ.get("MOCK_MODE"), - reason="API key required. Set ANTHROPIC_API_KEY or use MOCK_MODE=1." 
-) - - -@pytest.mark.asyncio -async def test_constraint_{constraint_id}(): - """Test: {constraint_description}""" - # Test implementation based on constraint type - mock_mode = bool(os.environ.get("MOCK_MODE")) - result = await default_agent.run({{"test": "input"}}, mock_mode=mock_mode) - - # Assert constraint is respected - assert True # Replace with actual check -``` - -### Success Criteria Test Template - -```python -"""Success criteria tests for {agent_name}. - -These tests validate that the agent achieves its defined success criteria. -Requires ANTHROPIC_API_KEY for real testing - mock mode cannot validate success criteria. -""" - -import os -import pytest -from exports.{agent_name} import default_agent -from aden_tools.credentials import CredentialManager - - -# Enforce API key for real testing -pytestmark = pytest.mark.skipif( - not CredentialManager().is_available("anthropic") and not os.environ.get("MOCK_MODE"), - reason="API key required. Set ANTHROPIC_API_KEY or use MOCK_MODE=1." -) - - -@pytest.mark.asyncio -async def test_success_{criteria_id}(): - """Test: {criteria_description}""" - mock_mode = bool(os.environ.get("MOCK_MODE")) - result = await default_agent.run({{"test": "input"}}, mock_mode=mock_mode) - - assert result.success, f"Agent failed: {{result.error}}" - - # Verify success criterion met - # e.g., assert metric meets target - assert True # Replace with actual check -``` - -### Edge Case Test Template - -```python -"""Edge case tests for {agent_name}. - -These tests validate agent behavior in unusual or boundary conditions. -Requires ANTHROPIC_API_KEY for real testing. -""" - -import os -import pytest -from exports.{agent_name} import default_agent -from aden_tools.credentials import CredentialManager - - -# Enforce API key for real testing -pytestmark = pytest.mark.skipif( - not CredentialManager().is_available("anthropic") and not os.environ.get("MOCK_MODE"), - reason="API key required. Set ANTHROPIC_API_KEY or use MOCK_MODE=1." 
-) - - -@pytest.mark.asyncio -async def test_edge_case_{scenario_name}(): - """Test: Agent handles {scenario_description}""" - mock_mode = bool(os.environ.get("MOCK_MODE")) - result = await default_agent.run({{"edge": "case_input"}}, mock_mode=mock_mode) - - # Verify graceful handling - assert result.success or result.error is not None -``` - -## Interactive Build + Test Loop - -During agent construction (Agent stage), you can run constraint tests incrementally: - -```python -# After adding first node -print("Added search_node. Running relevant constraint tests...") -mcp__agent-builder__run_tests( - goal_id="your-goal-id", - agent_path=f"exports/{agent_name}", - test_types='["constraint"]' -) - -# After adding second node -print("Added filter_node. Running all constraint tests...") -mcp__agent-builder__run_tests( - goal_id="your-goal-id", - agent_path=f"exports/{agent_name}", - test_types='["constraint"]' -) -``` - -This provides **immediate feedback** during development, catching issues early. - -## Common Test Patterns - -**Note:** All test patterns should include API key enforcement via conftest.py. - -### ⚠️ CRITICAL: Framework Features You Must Know - -#### OutputCleaner - Automatic I/O Cleaning (NEW!) - -**The framework now automatically validates and cleans node outputs** using a fast LLM (Cerebras llama-3.3-70b) at edge traversal time. This prevents cascading failures from malformed output. - -**What OutputCleaner does**: -- ✅ Validates output matches next node's input schema -- ✅ Detects JSON parsing trap (entire response in one key) -- ✅ Cleans malformed output automatically (~200-500ms, ~$0.001 per cleaning) -- ✅ Boosts success rates by 1.8-2.2x - -**Impact on tests**: Tests should still use safe patterns because OutputCleaner may not catch all issues in test mode. - -#### Safe Test Patterns (REQUIRED) - -**❌ UNSAFE** (will cause test failures): -```python -# Direct key access - can crash! 
-approval_decision = result.output["approval_decision"]
-assert approval_decision == "APPROVED"
-
-# Nested access without checks
+# UNSAFE - will crash on missing keys
+approval = result.output["approval_decision"]
 category = result.output["analysis"]["category"]
 
-# Assuming parsed JSON structure
-for issue in result.output["compliance_issues"]:
-    ...
-```
-
-**✅ SAFE** (correct patterns):
-```python
-# 1. Safe dict access with .get()
+# SAFE - use .get() with defaults
 output = result.output or {}
-approval_decision = output.get("approval_decision", "UNKNOWN")
-assert "APPROVED" in approval_decision or approval_decision == "APPROVED"
+approval = output.get("approval_decision", "UNKNOWN")
 
-# 2. Type checking before operations
+# SAFE - type check before operations
 analysis = output.get("analysis", {})
 if isinstance(analysis, dict):
     category = analysis.get("category", "unknown")
 
-# 3. Parse JSON from strings (the JSON parsing trap!)
+# SAFE - handle JSON parsing trap (LLM response as string)
 import json
 recommendation = output.get("recommendation", "{}")
 if isinstance(recommendation, str):
@@ -829,16 +527,15 @@ if isinstance(recommendation, str):
 elif isinstance(recommendation, dict):
     approval = recommendation.get("approval_decision", "UNKNOWN")
 
-# 4. Safe iteration with type check
-compliance_issues = output.get("compliance_issues", [])
-if isinstance(compliance_issues, list):
-    for issue in compliance_issues:
+# SAFE - type check before iteration
+items = output.get("items", [])
+if isinstance(items, list):
+    for item in items:
``` -#### Helper Functions for Safe Access +### Helper Functions for conftest.py -**Add to conftest.py**: ```python import json import re @@ -846,9 +543,7 @@ import re def _parse_json_from_output(result, key): """Parse JSON from agent output (framework may store full LLM response as string).""" response_text = result.output.get(key, "") - # Remove markdown code blocks if present json_text = re.sub(r'```json\s*|\s*```', '', response_text).strip() - try: return json.loads(json_text) except (json.JSONDecodeError, AttributeError, TypeError): @@ -858,7 +553,6 @@ def safe_get_nested(result, key_path, default=None): """Safely get nested value from result.output.""" output = result.output or {} current = output - for key in key_path: if isinstance(current, dict): current = current.get(key) @@ -874,7 +568,6 @@ def safe_get_nested(result, key_path, default=None): return default else: return default - return current if current is not None else default # Make available in tests @@ -882,313 +575,342 @@ pytest.parse_json_from_output = _parse_json_from_output pytest.safe_get_nested = safe_get_nested ``` -**Usage in tests**: -```python -# Use helper to parse JSON safely -parsed = pytest.parse_json_from_output(result, "recommendation") -if isinstance(parsed, dict): - approval = parsed.get("approval_decision", "UNKNOWN") - -# Safe nested access -risk_score = pytest.safe_get_nested(result, ["analysis", "risk_score"], default=0.0) -``` - -#### Test Count Guidance - -**Generate 8-15 tests total, NOT 30+** - -- ✅ 2-3 tests per success criterion -- ✅ 1 happy path test -- ✅ 1 boundary/edge case test -- ✅ 1 error handling test (optional) - -**Why fewer tests?**: -- Each test requires real LLM call (~3 seconds, costs money) -- 30 tests = 90 seconds, $0.30+ in costs -- 12 tests = 36 seconds, $0.12 in costs -- Focus on quality over quantity - -#### ExecutionResult Fields (Important!) 
+### ExecutionResult Fields **`result.success=True` means NO exception, NOT goal achieved** ```python -# ❌ WRONG - assumes goal achieved +# WRONG assert result.success -# ✅ RIGHT - check success AND output +# RIGHT assert result.success, f"Agent failed: {result.error}" output = result.output or {} approval = output.get("approval_decision") assert approval == "APPROVED", f"Expected APPROVED, got {approval}" ``` -**All ExecutionResult fields**: -- `success: bool` - Execution completed without exception (NOT goal achieved!) -- `output: dict` - Complete memory snapshot (may contain raw strings) -- `error: str | None` - Error message if failed -- `steps_executed: int` - Number of nodes executed -- `total_tokens: int` - Cumulative token usage -- `total_latency_ms: int` - Total execution time -- `path: list[str]` - Node IDs traversed (may contain repeated IDs from feedback loops) -- `paused_at: str | None` - Node ID if HITL pause occurred -- `session_state: dict` - State for resuming -- `node_visit_counts: dict[str, int]` - How many times each node executed (useful for feedback loop testing) +All fields: +- `success: bool` — Completed without exception (NOT goal achieved!) 
+- `output: dict` — Complete memory snapshot (may contain raw strings) +- `error: str | None` — Error message if failed +- `steps_executed: int` — Number of nodes executed +- `total_tokens: int` — Cumulative token usage +- `total_latency_ms: int` — Total execution time +- `path: list[str]` — Node IDs traversed (may repeat in feedback loops) +- `paused_at: str | None` — Node ID if paused +- `session_state: dict` — State for resuming +- `node_visit_counts: dict[str, int]` — Visit counts per node (feedback loop testing) +- `execution_quality: str` — "clean", "degraded", or "failed" -### Happy Path Test +### Test Count Guidance + +**Write 8-15 tests, not 30+** + +- 2-3 tests per success criterion +- 1 happy path test +- 1 boundary/edge case test +- 1 error handling test (optional) + +Each real test costs ~3 seconds + LLM tokens. 12 tests = ~36 seconds, $0.12. + +--- + +## Test Patterns + +### Happy Path ```python @pytest.mark.asyncio async def test_happy_path(mock_mode): - """Test normal successful execution""" - result = await default_agent.run({{"query": "python tutorials"}}, mock_mode=mock_mode) - assert result.success - assert len(result.output) > 0 + """Test normal successful execution.""" + result = await default_agent.run({"query": "python tutorials"}, mock_mode=mock_mode) + assert result.success, f"Agent failed: {result.error}" + output = result.output or {} + assert output.get("report"), "No report produced" ``` -### Boundary Condition Test +### Boundary Condition ```python @pytest.mark.asyncio -async def test_boundary_minimum(mock_mode): - """Test at minimum threshold""" - result = await default_agent.run({{"query": "very specific niche topic"}}, mock_mode=mock_mode) - assert result.success - assert len(result.output.get("results", [])) >= 1 +async def test_minimum_sources(mock_mode): + """Test at minimum source threshold.""" + result = await default_agent.run({"query": "niche topic"}, mock_mode=mock_mode) + assert result.success, f"Agent failed: 
{result.error}" + output = result.output or {} + sources = output.get("sources", []) + if isinstance(sources, list): + assert len(sources) >= 3, f"Expected >= 3 sources, got {len(sources)}" ``` -### Error Handling Test +### Error Handling ```python @pytest.mark.asyncio -async def test_error_handling(mock_mode): - """Test graceful error handling""" - result = await default_agent.run({{"query": ""}}, mock_mode=mock_mode) # Invalid input - assert not result.success or result.output.get("error") is not None +async def test_empty_input(mock_mode): + """Test graceful handling of empty input.""" + result = await default_agent.run({"query": ""}, mock_mode=mock_mode) + # Agent should either fail gracefully or produce an error message + output = result.output or {} + assert not result.success or output.get("error"), "Should handle empty input" ``` -### Performance Test +### Feedback Loop ```python @pytest.mark.asyncio -async def test_performance_latency(mock_mode): - """Test response time is acceptable""" - import time - start = time.time() - result = await default_agent.run({{"query": "test"}}, mock_mode=mock_mode) - duration = time.time() - start - assert duration < 5.0, f"Took {{duration}}s, expected <5s" -``` - -### Testing Event Loop Nodes - -Event loop nodes run multi-turn loops internally. 
Tests should verify: - -**Output Keys Test** — All required keys are set via `set_output`: -```python -@pytest.mark.asyncio -async def test_all_output_keys_set(mock_mode): - """Test that event_loop nodes set all required output keys.""" - result = await default_agent.run({{"query": "test"}}, mock_mode=mock_mode) - assert result.success, f"Agent failed: {{result.error}}" - output = result.output or {{}} - for key in ["expected_key_1", "expected_key_2"]: - assert key in output, f"Output key '{{key}}' not set by event_loop node" -``` - -**Feedback Loop Test** — Verify feedback loops terminate: -```python -@pytest.mark.asyncio -async def test_feedback_loop_respects_max_visits(mock_mode): - """Test that feedback loops terminate at max_node_visits.""" - result = await default_agent.run({{"input": "trigger_rejection"}}, mock_mode=mock_mode) - assert result.success or result.error is not None - visits = getattr(result, "node_visit_counts", {{}}) or {{}} +async def test_feedback_loop_terminates(mock_mode): + """Test that feedback loops don't run forever.""" + result = await default_agent.run({"query": "test"}, mock_mode=mock_mode) + visits = result.node_visit_counts or {} for node_id, count in visits.items(): - assert count <= 5, f"Node {{node_id}} visited {{count}} times" -``` - -**Fan-Out Test** — Verify parallel branches both complete: -```python -@pytest.mark.asyncio -async def test_parallel_branches_complete(mock_mode): - """Test that fan-out branches all complete and produce outputs.""" - result = await default_agent.run({{"query": "test"}}, mock_mode=mock_mode) - assert result.success - output = result.output or {{}} - # Check outputs from both parallel branches - assert "branch_a_output" in output, "Branch A output missing" - assert "branch_b_output" in output, "Branch B output missing" -``` - -**Client-Facing Node Test** — In mock mode, client-facing nodes may not block: -```python -@pytest.mark.asyncio -async def test_client_facing_node(mock_mode): - """Test that 
client-facing nodes produce output.""" - result = await default_agent.run({{"query": "test"}}, mock_mode=mock_mode) - # In mock mode, client-facing blocking is typically bypassed - assert result.success or result.paused_at is not None -``` - -## Integration with hive-create - -### Handoff Points - -| Scenario | From | To | Action | -|----------|------|-----|--------| -| Agent built, ready to test | hive-create | hive-test | Generate success tests | -| LOGIC_ERROR found | hive-test | hive-create | Update goal, rebuild | -| IMPLEMENTATION_ERROR found | hive-test | Direct fix | Edit agent files, re-run tests | -| EDGE_CASE found | hive-test | hive-test | Add edge case test | -| All tests pass | hive-test | Done | Agent validated ✅ | - -### Iteration Speed Comparison - -| Scenario | Old Approach | New Approach | -|----------|--------------|--------------| -| **Bug Fix** | Rebuild via MCP tools (14 min) | Edit Python file, pytest (2 min) | -| **Add Test** | Generate via MCP, export (5 min) | Write test file directly (1 min) | -| **Debug** | Read subprocess logs | pdb, breakpoints, prints | -| **Inspect** | Limited visibility | Full Python introspection | - -## Anti-Patterns - -### Testing Best Practices - -| Don't | Do Instead | -|-------|------------| -| ❌ Write tests without getting guidelines first | ✅ Use `generate_*_tests` to get proper file_header and guidelines | -| ❌ Run pytest via Bash | ✅ Use `run_tests` MCP tool for structured results | -| ❌ Debug tests with Bash pytest -vvs | ✅ Use `debug_test` MCP tool for formatted output | -| ❌ Check for tests with Glob | ✅ Use `list_tests` MCP tool | -| ❌ Skip the file_header from guidelines | ✅ Always include the file_header for proper imports and fixtures | - -### General Testing - -| Don't | Do Instead | -|-------|------------| -| ❌ Treat all failures the same | ✅ Use debug_test to categorize and iterate appropriately | -| ❌ Rebuild entire agent for small bugs | ✅ Edit code directly, re-run tests | -| ❌ Run tests 
without API key | ✅ Always set ANTHROPIC_API_KEY first | -| ❌ Write tests without understanding the constraints/criteria | ✅ Read the formatted constraints/criteria from guidelines | - -## Workflow Summary - -``` -1. Check existing tests: list_tests(goal_id, agent_path) - → Scans exports/{agent}/tests/test_*.py - ↓ -2. Get test guidelines: generate_constraint_tests, generate_success_tests - → Returns file_header, test_template, constraints/criteria, guidelines - ↓ -3. Write tests: Use Write tool with the provided guidelines - → Write tests to exports/{agent}/tests/test_*.py - ↓ -4. Run tests: run_tests(goal_id, agent_path) - → Executes: pytest exports/{agent}/tests/ -v - ↓ -5. Debug failures: debug_test(goal_id, test_name, agent_path) - → Re-runs single test with verbose output - ↓ -6. Fix based on category: - - IMPLEMENTATION_ERROR → Edit agent code directly - - ASSERTION_FAILURE → Fix agent logic or update test - - IMPORT_ERROR → Check package structure - - API_ERROR → Check API keys and connectivity - ↓ -7. Re-run tests: run_tests(goal_id, agent_path) - ↓ -8. 
Repeat until all pass ✅ -``` - -## MCP Tools Reference - -```python -# Check existing tests (scans Python test files) -mcp__agent-builder__list_tests( - goal_id="your-goal-id", - agent_path="exports/your_agent" -) - -# Get constraint test guidelines (returns templates and guidelines, NOT generated tests) -mcp__agent-builder__generate_constraint_tests( - goal_id="your-goal-id", - goal_json='{"id": "...", "constraints": [...]}', - agent_path="exports/your_agent" -) -# Returns: output_file, file_header, test_template, constraints_formatted, test_guidelines - -# Get success criteria test guidelines -mcp__agent-builder__generate_success_tests( - goal_id="your-goal-id", - goal_json='{"id": "...", "success_criteria": [...]}', - node_names="node1,node2", - tool_names="tool1,tool2", - agent_path="exports/your_agent" -) -# Returns: output_file, file_header, test_template, success_criteria_formatted, test_guidelines - -# Run tests via pytest subprocess -mcp__agent-builder__run_tests( - goal_id="your-goal-id", - agent_path="exports/your_agent" -) - -# Debug a failed test (re-runs with verbose output) -mcp__agent-builder__debug_test( - goal_id="your-goal-id", - test_name="test_constraint_foo", - agent_path="exports/your_agent" -) -``` - -## run_tests Options - -```python -# Run only constraint tests -mcp__agent-builder__run_tests( - goal_id="your-goal-id", - agent_path="exports/your_agent", - test_types='["constraint"]' -) - -# Run only success criteria tests -mcp__agent-builder__run_tests( - goal_id="your-goal-id", - agent_path="exports/your_agent", - test_types='["success"]' -) - -# Run with pytest-xdist parallelism (requires pytest-xdist) -mcp__agent-builder__run_tests( - goal_id="your-goal-id", - agent_path="exports/your_agent", - parallel=4 -) - -# Stop on first failure -mcp__agent-builder__run_tests( - goal_id="your-goal-id", - agent_path="exports/your_agent", - fail_fast=True -) -``` - -## Direct pytest Commands - -You can also run tests directly with pytest (the MCP 
tools use pytest internally): - -```bash -# Run all tests -pytest exports/your_agent/tests/ -v - -# Run specific test file -pytest exports/your_agent/tests/test_constraints.py -v - -# Run specific test -pytest exports/your_agent/tests/test_constraints.py::test_constraint_foo -vvs - -# Run in mock mode (structure validation only) -MOCK_MODE=1 pytest exports/your_agent/tests/ -v + assert count <= 5, f"Node {node_id} visited {count} times — possible infinite loop" ``` --- -**MCP tools generate tests, write them to Python files, and run them via pytest.** +## MCP Tool Reference + +### Phase 1: Test Generation + +```python +# Check existing tests +list_tests(goal_id, agent_path) + +# Get constraint test guidelines (returns templates, NOT generated tests) +generate_constraint_tests(goal_id, goal_json, agent_path) +# Returns: output_file, file_header, test_template, constraints_formatted, test_guidelines + +# Get success criteria test guidelines +generate_success_tests(goal_id, goal_json, node_names, tool_names, agent_path) +# Returns: output_file, file_header, test_template, success_criteria_formatted, test_guidelines +``` + +### Phase 2: Execution + +```python +# Automated regression (no checkpoints, fresh runs) +run_tests(goal_id, agent_path, test_types='["all"]', parallel=-1, fail_fast=False) + +# Run only specific test types +run_tests(goal_id, agent_path, test_types='["constraint"]') +run_tests(goal_id, agent_path, test_types='["success"]') +``` + +```bash +# Iterative debugging with checkpoints (via CLI) +PYTHONPATH=core:exports uv run python -m {agent_name} --tui +``` + +### Phase 3: Analysis + +```python +# Debug a specific failed test +debug_test(goal_id, test_name, agent_path) + +# Find failed sessions +list_agent_sessions(agent_work_dir, status="failed", limit=5) + +# Inspect session state (excludes memory values) +get_agent_session_state(agent_work_dir, session_id) + +# Inspect memory data +get_agent_session_memory(agent_work_dir, session_id, 
key="research_results") + +# Runtime logs: L1 summaries +query_runtime_logs(agent_work_dir, status="needs_attention") + +# Runtime logs: L2 per-node details +query_runtime_log_details(agent_work_dir, run_id, needs_attention_only=True) + +# Runtime logs: L3 tool/LLM raw data +query_runtime_log_raw(agent_work_dir, run_id, node_id="research") + +# Find clean checkpoints +list_agent_checkpoints(agent_work_dir, session_id, is_clean="true") + +# Compare checkpoints (memory diff) +compare_agent_checkpoints(agent_work_dir, session_id, cp_before, cp_after) +``` + +### Phase 5: Recovery + +```python +# Inspect checkpoint before resuming +get_agent_checkpoint(agent_work_dir, session_id, checkpoint_id) +# Empty checkpoint_id = latest checkpoint +``` + +```bash +# Resume from checkpoint via CLI +PYTHONPATH=core:exports uv run python -m {agent_name} --tui \ + --resume-session {session_id} --checkpoint {checkpoint_id} +``` + +--- + +## Anti-Patterns + +| Don't | Do Instead | +|-------|-----------| +| Re-run entire agent when a late node fails | Resume from last clean checkpoint | +| Treat `result.success` as goal achieved | Check `result.output` for actual criteria | +| Access `result.output["key"]` directly | Use `result.output.get("key")` | +| Fix random things hoping tests pass | Analyze L2/L3 logs to find root cause first | +| Write 30+ tests | Write 8-15 focused tests | +| Skip credential check | Use `/hive-credentials` before testing | +| Confuse `exports/` with `~/.hive/agents/` | Code in `exports/`, runtime data in `~/.hive/` | +| Use `run_tests` for iterative debugging | Use CLI with checkpoints for iterative debugging | +| Use CLI for final regression | Use `run_tests` for automated regression | +| Run tests without reading goal first | Always understand the goal before writing tests | +| Skip Phase 3 analysis and guess | Use session + log tools to identify root cause | + +--- + +## Example Walkthrough: Deep Research Agent + +A complete iteration showing the test loop 
for an agent with nodes: `intake → research → review → report`. + +### Phase 1: Generate tests + +```python +# Read the goal +Read(file_path="exports/deep_research_agent/agent.py") + +# Get success criteria test guidelines +result = generate_success_tests( + goal_id="rigorous-interactive-research", + goal_json='{"id": "rigorous-interactive-research", "success_criteria": [{"id": "source-diversity", "target": ">=5"}, {"id": "citation-coverage", "target": "100%"}, {"id": "report-completeness", "target": "90%"}]}', + node_names="intake,research,review,report", + tool_names="web_search,web_scrape", + agent_path="exports/deep_research_agent" +) + +# Write tests +Write( + file_path=result["output_file"], + content=result["file_header"] + "\n\n" + test_code +) +``` + +### Phase 2: First execution + +```python +run_tests( + goal_id="rigorous-interactive-research", + agent_path="exports/deep_research_agent", + fail_fast=True +) +``` + +Result: `test_success_source_diversity` fails — agent only found 2 sources instead of 5. 
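+
+The failing assertion counts raw sources, but the source-diversity criterion also implies distinct origins. A quick local sanity check can be sketched like this — the `sources` list of URL strings is a hypothetical output shape for illustration, not the agent's guaranteed schema:
+
+```python
+from urllib.parse import urlparse
+
+# Hypothetical output from the failed run (2 sources, 1 domain)
+output = {"sources": ["https://example.org/a", "https://example.org/b"]}
+sources = output.get("sources", [])
+domains = {urlparse(u).netloc for u in sources if isinstance(u, str)}
+print(f"{len(sources)} sources across {len(domains)} distinct domains")
+```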
+ +### Phase 3: Analyze + +```python +# Debug the failing test +debug_test( + goal_id="rigorous-interactive-research", + test_name="test_success_source_diversity", + agent_path="exports/deep_research_agent" +) +# → ASSERTION_FAILURE: Expected >= 5 sources, got 2 + +# Find the session +list_agent_sessions( + agent_work_dir="~/.hive/agents/deep_research_agent", + status="completed", + limit=1 +) +# → session_20260209_150000_abc12345 + +# See what the research node produced +get_agent_session_memory( + agent_work_dir="~/.hive/agents/deep_research_agent", + session_id="session_20260209_150000_abc12345", + key="research_results" +) +# → Only 2 web_search calls made, each returned 1 source + +# Check the LLM's behavior in the research node +query_runtime_log_raw( + agent_work_dir="~/.hive/agents/deep_research_agent", + run_id="session_20260209_150000_abc12345", + node_id="research" +) +# → LLM called web_search only twice, then called set_output +``` + +Root cause: The research node's prompt doesn't tell the LLM to search for at least 5 diverse sources. It stops after the first couple of searches. + +### Phase 4: Fix the prompt + +```python +Read(file_path="exports/deep_research_agent/nodes/__init__.py") + +Edit( + file_path="exports/deep_research_agent/nodes/__init__.py", + old_string='system_prompt="Search for information on the user\'s topic."', + new_string='system_prompt="Search for information on the user\'s topic. You MUST find at least 5 diverse, authoritative sources. Use multiple different search queries to ensure source diversity. Do not stop searching until you have at least 5 distinct sources."' +) +``` + +### Phase 5: Resume from checkpoint + +For this example, the fix is to the `research` node. 
If we had run via CLI with checkpointing, we could resume from the checkpoint after `intake` to skip re-running intake:
+
+```python
+# Check if a clean checkpoint exists after intake
+list_agent_checkpoints(
+    agent_work_dir="~/.hive/agents/deep_research_agent",
+    session_id="session_20260209_150000_abc12345",
+    is_clean="true"
+)
+# → cp_node_complete_intake_150005
+```
+
+```bash
+# Resume from after intake, re-run research with fixed prompt
+PYTHONPATH=core:exports uv run python -m deep_research_agent --tui \
+    --resume-session session_20260209_150000_abc12345 \
+    --checkpoint cp_node_complete_intake_150005
+```
+
+Or for this simple case (intake is fast), just re-run:
+
+```bash
+PYTHONPATH=core:exports uv run python -m deep_research_agent --tui
+```
+
+### Phase 6: Final verification
+
+```python
+run_tests(
+    goal_id="rigorous-interactive-research",
+    agent_path="exports/deep_research_agent"
+)
+# → All 12 tests pass
+```
+
+---
+
+## Test File Structure
+
+```
+exports/{agent_name}/
+├── agent.py                 ← Agent to test (goal, nodes, edges)
+├── nodes/__init__.py        ← Node implementations (prompts, config)
+├── config.py                ← Agent configuration
+├── mcp_servers.json         ← Tool server config
+└── tests/
+    ├── conftest.py              ← Shared fixtures + safe access helpers
+    ├── test_constraints.py      ← Constraint tests
+    ├── test_success_criteria.py ← Success criteria tests
+    └── test_edge_cases.py       ← Edge case tests
+```
+
+## Integration with Other Skills
+
+| Scenario | From | To | Action |
+|----------|------|----|--------|
+| Agent built, ready to test | `/hive-create` | `/hive-test` | Generate tests, start loop |
+| Prompt fix needed | `/hive-test` Phase 4 | Direct edit | Edit `nodes/__init__.py`, resume |
+| Goal definition wrong | `/hive-test` Phase 4 | `/hive-create` | Update goal, may need rebuild |
+| Missing credentials | `/hive-test` Phase 3 | `/hive-credentials` | Set up credentials |
+| Complex runtime failure | `/hive-test` Phase 3 | `/hive-debugger` | Deep L1/L2/L3 analysis |
+| All tests 
pass | `/hive-test` Phase 6 | Done | Agent validated | diff --git a/.claude/skills/hive-test/examples/testing-youtube-agent.md b/.claude/skills/hive-test/examples/testing-youtube-agent.md index 9d1f2d0c..5d1715f7 100644 --- a/.claude/skills/hive-test/examples/testing-youtube-agent.md +++ b/.claude/skills/hive-test/examples/testing-youtube-agent.md @@ -1,351 +1,313 @@ -# Example: Testing a YouTube Research Agent +# Example: Iterative Testing of a Research Agent -This example walks through testing a YouTube research agent that finds relevant videos based on a topic. +This example walks through the full iterative test loop for a research agent that searches the web, reviews findings, and produces a cited report. -## Prerequisites +## Agent Structure -- Agent built with hive-create skill at `exports/youtube-research/` -- Goal defined with success criteria and constraints - -## Step 1: Load the Goal - -First, load the goal that was defined during the Goal stage: - -```json -{ - "id": "youtube-research", - "name": "YouTube Research Agent", - "description": "Find relevant YouTube videos on a given topic", - "success_criteria": [ - { - "id": "find_videos", - "description": "Find 3-5 relevant videos", - "metric": "video_count", - "target": "3-5", - "weight": 1.0 - }, - { - "id": "relevance", - "description": "Videos must be relevant to the topic", - "metric": "relevance_score", - "target": ">0.8", - "weight": 0.8 - } - ], - "constraints": [ - { - "id": "api_limits", - "description": "Must not exceed YouTube API rate limits", - "constraint_type": "hard", - "category": "technical" - }, - { - "id": "content_safety", - "description": "Must filter out inappropriate content", - "constraint_type": "hard", - "category": "safety" - } - ] -} +``` +exports/deep_research_agent/ +├── agent.py # Goal + graph: intake → research → review → report +├── nodes/__init__.py # Node definitions (system_prompt, input/output keys) +├── config.py # Model config +├── mcp_servers.json # Tools: 
web_search, web_scrape +└── tests/ # Test files (we'll create these) ``` -## Step 2: Get Constraint Test Guidelines +**Goal:** "Rigorous Interactive Research" — find 5+ diverse sources, cite every claim, produce a complete report. -During the Goal stage (or early Eval), get test guidelines for constraints: +--- + +## Phase 1: Generate Tests + +### Read the goal ```python -result = generate_constraint_tests( - goal_id="youtube-research", - goal_json='', - agent_path="exports/youtube-research" -) +Read(file_path="exports/deep_research_agent/agent.py") +# Extract: goal_id="rigorous-interactive-research" +# success_criteria: source-diversity (>=5), citation-coverage (100%), report-completeness (90%) +# constraints: no-hallucination, source-attribution ``` -**The result contains guidelines (not generated tests):** -- `output_file`: Where to write tests -- `file_header`: Imports and fixtures to use -- `test_template`: Format for test functions -- `constraints_formatted`: The constraints to test -- `test_guidelines`: Rules for writing tests - -## Step 3: Write Constraint Tests - -Using the guidelines, write tests directly with the Write tool: - -```python -# Write constraint tests using the provided file_header and guidelines -Write( - file_path="exports/youtube-research/tests/test_constraints.py", - content=''' -"""Constraint tests for youtube-research agent.""" - -import os -import pytest -from exports.youtube_research import default_agent - - -pytestmark = pytest.mark.skipif( - not os.environ.get("ANTHROPIC_API_KEY") and not os.environ.get("MOCK_MODE"), - reason="API key required for real testing." 
-) - - -@pytest.mark.asyncio -async def test_constraint_api_limits_respected(): - """Verify API rate limits are not exceeded.""" - import time - mock_mode = bool(os.environ.get("MOCK_MODE")) - - for i in range(10): - result = await default_agent.run({"topic": f"test_{i}"}, mock_mode=mock_mode) - time.sleep(0.1) - - # Should complete without rate limit errors - assert "rate limit" not in str(result).lower() - - -@pytest.mark.asyncio -async def test_constraint_content_safety_filter(): - """Verify inappropriate content is filtered.""" - mock_mode = bool(os.environ.get("MOCK_MODE")) - result = await default_agent.run({"topic": "general topic"}, mock_mode=mock_mode) - - for video in result.videos: - assert video.safe_for_work is True - assert video.age_restricted is False -''' -) -``` - -## Step 4: Get Success Criteria Test Guidelines - -After the agent is built, get success criteria test guidelines: +### Get test guidelines ```python result = generate_success_tests( - goal_id="youtube-research", - goal_json='', - node_names="search_node,filter_node,rank_node,format_node", - tool_names="youtube_search,video_details,channel_info", - agent_path="exports/youtube-research" + goal_id="rigorous-interactive-research", + goal_json='{"id": "rigorous-interactive-research", "success_criteria": [{"id": "source-diversity", "description": "Use multiple diverse sources", "target": ">=5"}, {"id": "citation-coverage", "description": "Every claim cites its source", "target": "100%"}, {"id": "report-completeness", "description": "Report answers the research questions", "target": "90%"}]}', + node_names="intake,research,review,report", + tool_names="web_search,web_scrape", + agent_path="exports/deep_research_agent" ) ``` -## Step 5: Write Success Criteria Tests - -Using the guidelines, write success criteria tests: +### Write tests ```python Write( - file_path="exports/youtube-research/tests/test_success_criteria.py", - content=''' -"""Success criteria tests for youtube-research agent.""" 
- -import os -import pytest -from exports.youtube_research import default_agent - - -pytestmark = pytest.mark.skipif( - not os.environ.get("ANTHROPIC_API_KEY") and not os.environ.get("MOCK_MODE"), - reason="API key required for real testing." -) - + file_path="exports/deep_research_agent/tests/test_success_criteria.py", + content=result["file_header"] + ''' @pytest.mark.asyncio -async def test_find_videos_happy_path(): - """Test finding videos for a common topic.""" - mock_mode = bool(os.environ.get("MOCK_MODE")) - result = await default_agent.run({"topic": "machine learning"}, mock_mode=mock_mode) - - assert result.success - assert 3 <= len(result.videos) <= 5 - assert all(v.title for v in result.videos) - assert all(v.video_id for v in result.videos) - +async def test_success_source_diversity(mock_mode): + """At least 5 diverse sources are found.""" + result = await default_agent.run({"query": "impact of remote work on productivity"}, mock_mode=mock_mode) + assert result.success, f"Agent failed: {result.error}" + output = result.output or {} + sources = output.get("sources", []) + if isinstance(sources, list): + assert len(sources) >= 5, f"Expected >= 5 sources, got {len(sources)}" @pytest.mark.asyncio -async def test_find_videos_minimum_boundary(): - """Test at minimum threshold (3 videos).""" - mock_mode = bool(os.environ.get("MOCK_MODE")) - result = await default_agent.run({"topic": "niche topic xyz"}, mock_mode=mock_mode) - - assert len(result.videos) >= 3 - +async def test_success_citation_coverage(mock_mode): + """Every factual claim in the report cites its source.""" + result = await default_agent.run({"query": "climate change effects on agriculture"}, mock_mode=mock_mode) + assert result.success, f"Agent failed: {result.error}" + output = result.output or {} + report = output.get("report", "") + # Check that report contains numbered references + assert "[1]" in str(report) or "[source" in str(report).lower(), "Report lacks citations" @pytest.mark.asyncio 
-async def test_relevance_score_threshold(): - """Test relevance scoring meets threshold.""" - mock_mode = bool(os.environ.get("MOCK_MODE")) - result = await default_agent.run({"topic": "python programming"}, mock_mode=mock_mode) - - for video in result.videos: - assert video.relevance_score > 0.8 - +async def test_success_report_completeness(mock_mode): + """Report addresses the original research question.""" + query = "pros and cons of nuclear energy" + result = await default_agent.run({"query": query}, mock_mode=mock_mode) + assert result.success, f"Agent failed: {result.error}" + output = result.output or {} + report = output.get("report", "") + assert len(str(report)) > 200, f"Report too short: {len(str(report))} chars" @pytest.mark.asyncio -async def test_find_videos_no_results_graceful(): - """Test graceful handling of no results.""" - mock_mode = bool(os.environ.get("MOCK_MODE")) - result = await default_agent.run({"topic": "xyznonexistent123"}, mock_mode=mock_mode) +async def test_empty_query_handling(mock_mode): + """Agent handles empty input gracefully.""" + result = await default_agent.run({"query": ""}, mock_mode=mock_mode) + output = result.output or {} + assert not result.success or output.get("error"), "Should handle empty query" - # Should not crash, return empty or message - assert result.videos == [] or result.message +@pytest.mark.asyncio +async def test_feedback_loop_terminates(mock_mode): + """Feedback loop between review and research terminates.""" + result = await default_agent.run({"query": "quantum computing basics"}, mock_mode=mock_mode) + visits = result.node_visit_counts or {} + for node_id, count in visits.items(): + assert count <= 5, f"Node {node_id} visited {count} times" ''' ) ``` -## Step 6: Run All Tests +--- -Execute all tests: +## Phase 2: First Execution ```python -result = run_tests( - goal_id="youtube-research", - agent_path="exports/youtube-research", - test_types='["all"]', - parallel=4 +run_tests( + 
goal_id="rigorous-interactive-research", + agent_path="exports/deep_research_agent", + fail_fast=True ) ``` -**Results:** - +**Result:** ```json { - "goal_id": "youtube-research", - "overall_passed": false, - "summary": { - "total": 6, - "passed": 5, - "failed": 1, - "pass_rate": "83.3%" - }, - "duration_ms": 4521, - "results": [ - {"test_id": "test_constraint_api_001", "passed": true, "duration_ms": 1234}, - {"test_id": "test_constraint_content_001", "passed": true, "duration_ms": 456}, - {"test_id": "test_success_001", "passed": true, "duration_ms": 789}, - {"test_id": "test_success_002", "passed": true, "duration_ms": 654}, - {"test_id": "test_success_003", "passed": true, "duration_ms": 543}, - {"test_id": "test_success_004", "passed": false, "duration_ms": 845, - "error_category": "IMPLEMENTATION_ERROR", - "error_message": "TypeError: 'NoneType' object has no attribute 'videos'"} - ] + "overall_passed": false, + "summary": {"total": 5, "passed": 3, "failed": 2, "pass_rate": "60.0%"}, + "failures": [ + {"test_name": "test_success_source_diversity", "details": "AssertionError: Expected >= 5 sources, got 2"}, + {"test_name": "test_success_citation_coverage", "details": "AssertionError: Report lacks citations"} + ] } ``` -## Step 7: Debug the Failed Test +--- + +## Phase 3: Analyze (Iteration 1) + +### Debug the first failure ```python -result = debug_test( - goal_id="youtube-research", - test_name="test_find_videos_no_results_graceful", - agent_path="exports/youtube-research" +debug_test( + goal_id="rigorous-interactive-research", + test_name="test_success_source_diversity", + agent_path="exports/deep_research_agent" +) +# Category: ASSERTION_FAILURE — Expected >= 5 sources, got 2 +``` + +### Find the session and inspect memory + +```python +list_agent_sessions( + agent_work_dir="~/.hive/agents/deep_research_agent", + status="completed", + limit=1 +) +# → session_20260209_150000_abc12345 + +get_agent_session_memory( + 
agent_work_dir="~/.hive/agents/deep_research_agent", + session_id="session_20260209_150000_abc12345", + key="research_results" +) +# → Only 2 sources found. LLM stopped searching after 2 queries. +``` + +### Check LLM behavior in the research node + +```python +query_runtime_log_raw( + agent_work_dir="~/.hive/agents/deep_research_agent", + run_id="session_20260209_150000_abc12345", + node_id="research" +) +# → LLM called web_search twice, got results, immediately called set_output. +# → Prompt doesn't instruct it to find at least 5 sources. +``` + +**Root cause:** The research node's system_prompt doesn't specify minimum source requirements. + +--- + +## Phase 4: Fix (Iteration 1) + +```python +Read(file_path="exports/deep_research_agent/nodes/__init__.py") + +# Fix the research node prompt +Edit( + file_path="exports/deep_research_agent/nodes/__init__.py", + old_string='system_prompt="Search for information on the user\'s topic using web search."', + new_string='system_prompt="Search for information on the user\'s topic using web search. You MUST find at least 5 diverse, authoritative sources. Use multiple different search queries with varied keywords. Do NOT call set_output until you have gathered at least 5 distinct sources from different domains."' ) ``` -**Debug Output:** +--- +## Phase 5: Recover & Resume (Iteration 1) + +The fix is to the `research` node. 
Since this was a `run_tests` execution (no checkpoints), we re-run from scratch: + +```python +run_tests( + goal_id="rigorous-interactive-research", + agent_path="exports/deep_research_agent", + fail_fast=True +) +``` + +**Result:** ```json { - "test_id": "test_success_004", - "test_name": "test_find_videos_no_results_graceful", - "input": {"topic": "xyznonexistent123"}, - "expected": "Empty list or message", - "actual": {"error": "TypeError: 'NoneType' object has no attribute 'videos'"}, - "passed": false, - "error_message": "TypeError: 'NoneType' object has no attribute 'videos'", - "error_category": "IMPLEMENTATION_ERROR", - "stack_trace": "Traceback (most recent call last):\n File \"filter_node.py\", line 42\n for video in result.videos:\nTypeError: 'NoneType' object has no attribute 'videos'", - "logs": [ - {"timestamp": "2026-01-20T10:00:01", "node": "search_node", "level": "INFO", "msg": "Searching for: xyznonexistent123"}, - {"timestamp": "2026-01-20T10:00:02", "node": "search_node", "level": "WARNING", "msg": "No results found"}, - {"timestamp": "2026-01-20T10:00:02", "node": "filter_node", "level": "ERROR", "msg": "NoneType error"} - ], - "runtime_data": { - "execution_path": ["start", "search_node", "filter_node"], - "node_outputs": { - "search_node": null - } - }, - "suggested_fix": "Add null check in filter_node before accessing .videos attribute", - "iteration_guidance": { - "stage": "Agent", - "action": "Fix the code in nodes/edges", - "restart_required": false, - "description": "The goal is correct, but filter_node doesn't handle null results from search_node." - } + "overall_passed": false, + "summary": {"total": 5, "passed": 4, "failed": 1, "pass_rate": "80.0%"}, + "failures": [ + {"test_name": "test_success_citation_coverage", "details": "AssertionError: Report lacks citations"} + ] } ``` -## Step 8: Iterate Based on Category +Source diversity now passes. Citation coverage still fails. -Since this is an **IMPLEMENTATION_ERROR**, we: +--- -1. 
**Don't restart** the Goal → Agent → Eval flow -2. **Fix the agent** using hive-create skill: - - Modify `filter_node` to handle null results -3. **Re-run Eval** (tests only) - -### Fix in hive-create: +## Phase 3: Analyze (Iteration 2) ```python -# Update the filter_node to handle null -add_node( - node_id="filter_node", - name="Filter Node", - description="Filter and rank videos", - node_type="function", - input_keys=["search_results"], - output_keys=["filtered_videos"], - system_prompt=""" - Filter videos by relevance. - IMPORTANT: Handle case where search_results is None or empty. - Return empty list if no results. - """ +debug_test( + goal_id="rigorous-interactive-research", + test_name="test_success_citation_coverage", + agent_path="exports/deep_research_agent" +) +# Category: ASSERTION_FAILURE — Report lacks citations + +# Check what the report node produced +list_agent_sessions( + agent_work_dir="~/.hive/agents/deep_research_agent", + status="completed", + limit=1 +) +# → session_20260209_151500_def67890 + +get_agent_session_memory( + agent_work_dir="~/.hive/agents/deep_research_agent", + session_id="session_20260209_151500_def67890", + key="report" +) +# → Report text exists but uses no numbered references. +# → Sources are in memory but report node doesn't cite them. +``` + +**Root cause:** The report node's prompt doesn't instruct the LLM to include numbered citations. + +--- + +## Phase 4: Fix (Iteration 2) + +```python +Edit( + file_path="exports/deep_research_agent/nodes/__init__.py", + old_string='system_prompt="Write a comprehensive report based on the research findings."', + new_string='system_prompt="Write a comprehensive report based on the research findings. You MUST include numbered citations [1], [2], etc. for every factual claim. At the end, include a References section listing all sources with their URLs. 
Every claim must be traceable to a specific source."' ) ``` -### Re-export and re-test: +--- + +## Phase 5: Resume (Iteration 2) + +The fix is to the `report` node (the last node). To demonstrate checkpoint recovery, run via CLI: + +```bash +# Run via CLI to get checkpoints +PYTHONPATH=core:exports uv run python -m deep_research_agent --tui + +# After it runs, find the clean checkpoint before report +list_agent_checkpoints( + agent_work_dir="~/.hive/agents/deep_research_agent", + session_id="session_20260209_152000_ghi34567", + is_clean="true" +) +# → cp_node_complete_review_152100 (after review, before report) + +# Resume — skips intake, research, review entirely +PYTHONPATH=core:exports uv run python -m deep_research_agent --tui \ + --resume-session session_20260209_152000_ghi34567 \ + --checkpoint cp_node_complete_review_152100 +``` + +Only the `report` node re-runs with the fixed prompt, using research data from the checkpoint. + +--- + +## Phase 6: Final Verification ```python -# Re-export the fixed agent -export_graph(path="exports/youtube-research") - -# Re-run tests -result = run_tests( - goal_id="youtube-research", - agent_path="exports/youtube-research", - test_types='["all"]' +run_tests( + goal_id="rigorous-interactive-research", + agent_path="exports/deep_research_agent" ) ``` -**Updated Results:** - +**Result:** ```json { - "goal_id": "youtube-research", - "overall_passed": true, - "summary": { - "total": 6, - "passed": 6, - "failed": 0, - "pass_rate": "100.0%" - } + "overall_passed": true, + "summary": {"total": 5, "passed": 5, "failed": 0, "pass_rate": "100.0%"} } ``` +All tests pass. + +--- + ## Summary -1. **Got guidelines** for constraint tests during Goal stage -2. **Wrote** constraint tests using Write tool -3. **Got guidelines** for success criteria tests during Eval stage -4. **Wrote** success criteria tests using Write tool -5. **Ran** tests in parallel -6. **Debugged** the one failure -7. **Categorized** as IMPLEMENTATION_ERROR -8. 
**Fixed** the agent (not the goal) -9. **Re-ran** Eval only (didn't restart full flow) -10. **Passed** all tests +| Iteration | Failure | Root Cause | Fix | Recovery | +|-----------|---------|------------|-----|----------| +| 1 | Source diversity (2 < 5) | Research prompt too vague | Added "at least 5 sources" to prompt | Re-run (no checkpoints) | +| 2 | No citations in report | Report prompt lacks citation instructions | Added citation requirements | Checkpoint resume (skipped 3 nodes) | -The agent is now validated and ready for production use. +**Key takeaways:** +- Phase 3 analysis (session memory + L3 logs) identified root causes without guessing +- Checkpoint recovery in iteration 2 saved time by skipping 3 expensive nodes +- Final `run_tests` confirms all scenarios pass end-to-end diff --git a/core/framework/mcp/agent_builder_server.py b/core/framework/mcp/agent_builder_server.py index f03831c0..6895af28 100644 --- a/core/framework/mcp/agent_builder_server.py +++ b/core/framework/mcp/agent_builder_server.py @@ -3929,6 +3929,382 @@ def verify_credentials( return json.dumps({"error": str(e)}) +# ============================================================================= +# SESSION & CHECKPOINT TOOLS (read-only, no build session required) +# ============================================================================= + +_MAX_DIFF_VALUE_LEN = 500 + + +def _read_session_json(path: Path) -> dict | None: + """Read a JSON file, returning None on failure.""" + if not path.exists(): + return None + try: + return json.loads(path.read_text(encoding="utf-8")) + except (json.JSONDecodeError, OSError): + return None + + +def _scan_agent_sessions(agent_work_dir: Path) -> list[tuple[str, Path]]: + """Find session directories with state.json, sorted most-recent-first.""" + sessions: list[tuple[str, Path]] = [] + sessions_dir = agent_work_dir / "sessions" + if not sessions_dir.exists(): + return sessions + for session_dir in sessions_dir.iterdir(): + if session_dir.is_dir() 
and session_dir.name.startswith("session_"): + state_path = session_dir / "state.json" + if state_path.exists(): + sessions.append((session_dir.name, state_path)) + sessions.sort(key=lambda t: t[0], reverse=True) + return sessions + + +def _truncate_value(value: object, max_len: int = _MAX_DIFF_VALUE_LEN) -> object: + """Truncate a value's JSON representation if too long.""" + s = json.dumps(value, default=str) + if len(s) <= max_len: + return value + return {"_truncated": True, "_preview": s[:max_len] + "...", "_length": len(s)} + + +@mcp.tool() +def list_agent_sessions( + agent_work_dir: Annotated[ + str, + "Path to the agent's working directory (e.g., ~/.hive/agents/my_agent)", + ], + status: Annotated[ + str, + "Filter by status: 'active', 'paused', 'completed', 'failed', 'cancelled'. Empty for all.", + ] = "", + limit: Annotated[int, "Maximum number of results (default 20)"] = 20, + offset: Annotated[int, "Number of sessions to skip for pagination"] = 0, +) -> str: + """ + List sessions for an agent with optional status filter. + + Use this to discover which sessions exist, find resumable sessions, + or identify failed sessions for debugging. Combines well with + query_runtime_logs for correlating session state with log data. 
+ """ + work_dir = Path(agent_work_dir) + all_sessions = _scan_agent_sessions(work_dir) + + if not all_sessions: + return json.dumps({"sessions": [], "total": 0, "offset": offset, "limit": limit}) + + summaries = [] + for session_id, state_path in all_sessions: + data = _read_session_json(state_path) + if data is None: + continue + + session_status = data.get("status", "") + if status and session_status != status: + continue + + timestamps = data.get("timestamps", {}) + progress = data.get("progress", {}) + checkpoint_dir = state_path.parent / "checkpoints" + + summaries.append( + { + "session_id": session_id, + "status": session_status, + "goal_id": data.get("goal_id", ""), + "started_at": timestamps.get("started_at", ""), + "updated_at": timestamps.get("updated_at", ""), + "completed_at": timestamps.get("completed_at"), + "is_resumable": data.get("is_resumable", False), + "is_resumable_from_checkpoint": data.get("is_resumable_from_checkpoint", False), + "current_node": progress.get("current_node"), + "paused_at": progress.get("paused_at"), + "steps_executed": progress.get("steps_executed", 0), + "execution_quality": progress.get("execution_quality", ""), + "has_checkpoints": checkpoint_dir.exists() + and any(checkpoint_dir.glob("cp_*.json")), + } + ) + + total = len(summaries) + page = summaries[offset : offset + limit] + return json.dumps( + {"sessions": page, "total": total, "offset": offset, "limit": limit}, indent=2 + ) + + +@mcp.tool() +def get_agent_session_state( + agent_work_dir: Annotated[str, "Path to the agent's working directory"], + session_id: Annotated[str, "The session ID (e.g., 'session_20260208_143022_abc12345')"], +) -> str: + """ + Load full session state for a specific session. + + Returns complete session data including status, progress, result, + metrics, and checkpoint info. Memory values are excluded to prevent + context bloat -- use get_agent_session_memory to retrieve memory contents. 
+ """ + state_path = Path(agent_work_dir) / "sessions" / session_id / "state.json" + data = _read_session_json(state_path) + if data is None: + return json.dumps({"error": f"Session not found: {session_id}"}) + + memory = data.get("memory", {}) + data["memory_keys"] = list(memory.keys()) if isinstance(memory, dict) else [] + data["memory_size"] = len(memory) if isinstance(memory, dict) else 0 + data.pop("memory", None) + + return json.dumps(data, indent=2, default=str) + + +@mcp.tool() +def get_agent_session_memory( + agent_work_dir: Annotated[str, "Path to the agent's working directory"], + session_id: Annotated[str, "The session ID"], + key: Annotated[str, "Specific memory key to retrieve. Empty for all."] = "", +) -> str: + """ + Get memory contents from a session. + + Memory stores intermediate results passed between nodes. Use this + to inspect what data was produced during execution. + + If key is provided, returns only that memory key's value. + If key is empty, returns all memory keys and their values. 
+ """ + state_path = Path(agent_work_dir) / "sessions" / session_id / "state.json" + data = _read_session_json(state_path) + if data is None: + return json.dumps({"error": f"Session not found: {session_id}"}) + + memory = data.get("memory", {}) + if not isinstance(memory, dict): + memory = {} + + if key: + if key not in memory: + return json.dumps( + { + "error": f"Memory key not found: '{key}'", + "available_keys": list(memory.keys()), + } + ) + value = memory[key] + return json.dumps( + { + "session_id": session_id, + "key": key, + "value": value, + "value_type": type(value).__name__, + }, + indent=2, + default=str, + ) + + return json.dumps( + {"session_id": session_id, "memory": memory, "total_keys": len(memory)}, + indent=2, + default=str, + ) + + +@mcp.tool() +def list_agent_checkpoints( + agent_work_dir: Annotated[str, "Path to the agent's working directory"], + session_id: Annotated[str, "The session ID to list checkpoints for"], + checkpoint_type: Annotated[ + str, + "Filter by type: 'node_start', 'node_complete', 'loop_iteration'. Empty for all.", + ] = "", + is_clean: Annotated[str, "Filter by clean status: 'true', 'false', or empty for all."] = "", +) -> str: + """ + List checkpoints for a specific session. + + Checkpoints capture execution state at node boundaries for + crash recovery and resume. Use with get_agent_checkpoint for + detailed checkpoint inspection. 
+ """ + session_dir = Path(agent_work_dir) / "sessions" / session_id + checkpoint_dir = session_dir / "checkpoints" + + if not session_dir.exists(): + return json.dumps({"error": f"Session not found: {session_id}"}) + + if not checkpoint_dir.exists(): + return json.dumps( + { + "session_id": session_id, + "checkpoints": [], + "total": 0, + "latest_checkpoint_id": None, + } + ) + + # Try index.json first + index_data = _read_session_json(checkpoint_dir / "index.json") + if index_data and "checkpoints" in index_data: + checkpoints = index_data["checkpoints"] + else: + # Fallback: scan individual checkpoint files + checkpoints = [] + for cp_file in sorted(checkpoint_dir.glob("cp_*.json")): + cp_data = _read_session_json(cp_file) + if cp_data: + checkpoints.append( + { + "checkpoint_id": cp_data.get("checkpoint_id", cp_file.stem), + "checkpoint_type": cp_data.get("checkpoint_type", ""), + "created_at": cp_data.get("created_at", ""), + "current_node": cp_data.get("current_node"), + "next_node": cp_data.get("next_node"), + "is_clean": cp_data.get("is_clean", True), + "description": cp_data.get("description", ""), + } + ) + + # Apply filters + if checkpoint_type: + checkpoints = [c for c in checkpoints if c.get("checkpoint_type") == checkpoint_type] + if is_clean: + clean_val = is_clean.lower() == "true" + checkpoints = [c for c in checkpoints if c.get("is_clean") == clean_val] + + latest_id = None + if index_data: + latest_id = index_data.get("latest_checkpoint_id") + elif checkpoints: + latest_id = checkpoints[-1].get("checkpoint_id") + + return json.dumps( + { + "session_id": session_id, + "checkpoints": checkpoints, + "total": len(checkpoints), + "latest_checkpoint_id": latest_id, + }, + indent=2, + ) + + +@mcp.tool() +def get_agent_checkpoint( + agent_work_dir: Annotated[str, "Path to the agent's working directory"], + session_id: Annotated[str, "The session ID"], + checkpoint_id: Annotated[str, "Specific checkpoint ID, or empty for latest"] = "", +) -> str: + """ + 
Load a specific checkpoint with full state data. + + Returns the complete checkpoint including shared memory snapshot, + execution path, accumulated outputs, and metrics. If checkpoint_id + is empty, loads the latest checkpoint. + """ + session_dir = Path(agent_work_dir) / "sessions" / session_id + checkpoint_dir = session_dir / "checkpoints" + + if not checkpoint_dir.exists(): + return json.dumps({"error": f"No checkpoints found for session: {session_id}"}) + + if not checkpoint_id: + index_data = _read_session_json(checkpoint_dir / "index.json") + if index_data and index_data.get("latest_checkpoint_id"): + checkpoint_id = index_data["latest_checkpoint_id"] + else: + cp_files = sorted(checkpoint_dir.glob("cp_*.json")) + if not cp_files: + return json.dumps({"error": f"No checkpoints found for session: {session_id}"}) + checkpoint_id = cp_files[-1].stem + + cp_path = checkpoint_dir / f"{checkpoint_id}.json" + data = _read_session_json(cp_path) + if data is None: + return json.dumps({"error": f"Checkpoint not found: {checkpoint_id}"}) + + return json.dumps(data, indent=2, default=str) + + +@mcp.tool() +def compare_agent_checkpoints( + agent_work_dir: Annotated[str, "Path to the agent's working directory"], + session_id: Annotated[str, "The session ID"], + checkpoint_id_before: Annotated[str, "The earlier checkpoint ID"], + checkpoint_id_after: Annotated[str, "The later checkpoint ID"], +) -> str: + """ + Compare memory state between two checkpoints. + + Shows what memory keys were added, removed, or changed between + two points in execution. Useful for understanding how data flows + through the agent graph. 
+ """ + checkpoint_dir = Path(agent_work_dir) / "sessions" / session_id / "checkpoints" + + before = _read_session_json(checkpoint_dir / f"{checkpoint_id_before}.json") + if before is None: + return json.dumps({"error": f"Checkpoint not found: {checkpoint_id_before}"}) + + after = _read_session_json(checkpoint_dir / f"{checkpoint_id_after}.json") + if after is None: + return json.dumps({"error": f"Checkpoint not found: {checkpoint_id_after}"}) + + mem_before = before.get("shared_memory", {}) + mem_after = after.get("shared_memory", {}) + + keys_before = set(mem_before.keys()) + keys_after = set(mem_after.keys()) + + added = {k: _truncate_value(mem_after[k]) for k in keys_after - keys_before} + removed = list(keys_before - keys_after) + unchanged = [] + changed = {} + + for k in keys_before & keys_after: + if mem_before[k] == mem_after[k]: + unchanged.append(k) + else: + changed[k] = { + "before": _truncate_value(mem_before[k]), + "after": _truncate_value(mem_after[k]), + } + + path_before = before.get("execution_path", []) + path_after = after.get("execution_path", []) + new_nodes = path_after[len(path_before) :] + + return json.dumps( + { + "session_id": session_id, + "before": { + "checkpoint_id": checkpoint_id_before, + "current_node": before.get("current_node"), + "created_at": before.get("created_at", ""), + }, + "after": { + "checkpoint_id": checkpoint_id_after, + "current_node": after.get("current_node"), + "created_at": after.get("created_at", ""), + }, + "memory_diff": { + "added": added, + "removed": removed, + "changed": changed, + "unchanged": unchanged, + }, + "execution_path_diff": { + "new_nodes": new_nodes, + "path_before": path_before, + "path_after": path_after, + }, + }, + indent=2, + default=str, + ) + + # ============================================================================= # MAIN # ============================================================================= diff --git a/uv.lock b/uv.lock index 948a51af..d6858e01 100644 --- a/uv.lock 
+++ b/uv.lock @@ -754,7 +754,7 @@ wheels = [ [[package]] name = "framework" -version = "0.1.0" +version = "0.4.2" source = { editable = "core" } dependencies = [ { name = "anthropic" }, From faf534511bf1ef98961d497ff1255cf40d5b18ef Mon Sep 17 00:00:00 2001 From: Timothy Date: Mon, 9 Feb 2026 12:39:20 -0800 Subject: [PATCH 05/22] feat: automated test agent skill --- .claude/skills/hive-test/SKILL.md | 23 +++--- .../examples/testing-youtube-agent.md | 4 +- core/framework/runner/cli.py | 77 ++++++++++++++++++- docs/why-conditional-edge-priority.md | 42 ++++++++++ 4 files changed, 132 insertions(+), 14 deletions(-) create mode 100644 docs/why-conditional-edge-priority.md diff --git a/.claude/skills/hive-test/SKILL.md b/.claude/skills/hive-test/SKILL.md index 9827843b..6cb058b2 100644 --- a/.claude/skills/hive-test/SKILL.md +++ b/.claude/skills/hive-test/SKILL.md @@ -138,10 +138,10 @@ Two execution paths, use the right one for your situation. Run the agent via CLI. This creates sessions with checkpoints at `~/.hive/agents/{agent_name}/sessions/`: ```bash -PYTHONPATH=core:exports uv run python -m {agent_name} --tui +uv run hive run exports/{agent_name} --input '{"query": "test topic"}' ``` -The TUI lets you interact with client-facing nodes and see real-time execution. Sessions and checkpoints are saved automatically. +Sessions and checkpoints are saved automatically. For agents with client-facing nodes that require user interaction, the user must launch the TUI manually in a separate terminal (Claude Code cannot interact with TUI apps). 
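
The on-disk layout those sessions follow can be sketched in a few lines. This is a hedged illustration of what the `list_agent_sessions` MCP tool scans — the directory names come from the paths above, but the helper itself is ours, not the framework's real implementation:

```python
# Sketch of session discovery under ~/.hive/agents/{agent_name}/sessions/.
# Layout assumed from the docs above: one session_* dir per run, each with state.json.
import json
import tempfile
from pathlib import Path


def list_sessions(agent_work_dir: Path) -> list[str]:
    """Return session IDs that have a state.json, newest first."""
    sessions_dir = agent_work_dir / "sessions"
    if not sessions_dir.exists():
        return []
    found = [
        d.name
        for d in sessions_dir.iterdir()
        if d.is_dir() and d.name.startswith("session_") and (d / "state.json").exists()
    ]
    # Session IDs embed a timestamp, so reverse lexicographic sort is newest-first.
    return sorted(found, reverse=True)


# Demo against a throwaway directory instead of a real ~/.hive tree.
with tempfile.TemporaryDirectory() as tmp:
    work_dir = Path(tmp)
    for sid in ("session_20260209_150000_abc12345", "session_20260209_152000_ghi34567"):
        d = work_dir / "sessions" / sid
        d.mkdir(parents=True)
        (d / "state.json").write_text(json.dumps({"status": "completed"}))
    assert list_sessions(work_dir)[0] == "session_20260209_152000_ghi34567"
```

A missing or empty `sessions/` directory simply yields an empty list, which is why the debugging tools degrade gracefully when an agent has never been run.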
#### Automated regression (for CI or final verification) @@ -334,7 +334,7 @@ Resume when ALL of these are true: ```bash # Resume from the last clean checkpoint before the failing node -PYTHONPATH=core:exports uv run python -m {agent_name} --tui \ +uv run hive run exports/{agent_name} \ --resume-session session_20260209_143022_abc12345 \ --checkpoint cp_node_complete_research_143030 ``` @@ -350,7 +350,7 @@ Re-run when ANY of these are true: - You changed the graph structure (added/removed nodes/edges) ```bash -PYTHONPATH=core:exports uv run python -m {agent_name} --tui +uv run hive run exports/{agent_name} --input '{"query": "test topic"}' ``` #### Inspecting a checkpoint before resuming @@ -696,7 +696,7 @@ run_tests(goal_id, agent_path, test_types='["success"]') ```bash # Iterative debugging with checkpoints (via CLI) -PYTHONPATH=core:exports uv run python -m {agent_name} --tui +uv run hive run exports/{agent_name} --input '{"query": "test"}' ``` ### Phase 3: Analysis @@ -739,8 +739,8 @@ get_agent_checkpoint(agent_work_dir, session_id, checkpoint_id) ``` ```bash -# Resume from checkpoint via CLI -PYTHONPATH=core:exports uv run python -m {agent_name} --tui \ +# Resume from checkpoint via CLI (headless) +uv run hive run exports/{agent_name} \ --resume-session {session_id} --checkpoint {checkpoint_id} ``` @@ -757,8 +757,9 @@ PYTHONPATH=core:exports uv run python -m {agent_name} --tui \ | Write 30+ tests | Write 8-15 focused tests | | Skip credential check | Use `/hive-credentials` before testing | | Confuse `exports/` with `~/.hive/agents/` | Code in `exports/`, runtime data in `~/.hive/` | -| Use `run_tests` for iterative debugging | Use CLI with checkpoints for iterative debugging | -| Use CLI for final regression | Use `run_tests` for automated regression | +| Use `run_tests` for iterative debugging | Use headless CLI with checkpoints for iterative debugging | +| Use headless CLI for final regression | Use `run_tests` for automated regression | +| Use `--tui` from 
Claude Code | Use headless `run` command — TUI hangs in non-interactive shells | | Run tests without reading goal first | Always understand the goal before writing tests | | Skip Phase 3 analysis and guess | Use session + log tools to identify root cause | @@ -866,7 +867,7 @@ list_agent_checkpoints( # → cp_node_complete_intake_150005 # Resume from after intake, re-run research with fixed prompt -PYTHONPATH=core:exports uv run python -m deep_research_agent --tui \ +uv run hive run exports/deep_research_agent \ --resume-session session_20260209_150000_abc12345 \ --checkpoint cp_node_complete_intake_150005 ``` @@ -874,7 +875,7 @@ PYTHONPATH=core:exports uv run python -m deep_research_agent --tui \ Or for this simple case (intake is fast), just re-run: ```bash -PYTHONPATH=core:exports uv run python -m deep_research_agent --tui +uv run hive run exports/deep_research_agent --input '{"topic": "test"}' ``` ### Phase 6: Final verification diff --git a/.claude/skills/hive-test/examples/testing-youtube-agent.md b/.claude/skills/hive-test/examples/testing-youtube-agent.md index 5d1715f7..40eb3206 100644 --- a/.claude/skills/hive-test/examples/testing-youtube-agent.md +++ b/.claude/skills/hive-test/examples/testing-youtube-agent.md @@ -259,7 +259,7 @@ The fix is to the `report` node (the last node). 
To demonstrate checkpoint recov ```bash # Run via CLI to get checkpoints -PYTHONPATH=core:exports uv run python -m deep_research_agent --tui +uv run hive run exports/deep_research_agent --input '{"topic": "climate change effects"}' # After it runs, find the clean checkpoint before report list_agent_checkpoints( @@ -270,7 +270,7 @@ list_agent_checkpoints( # → cp_node_complete_review_152100 (after review, before report) # Resume — skips intake, research, review entirely -PYTHONPATH=core:exports uv run python -m deep_research_agent --tui \ +uv run hive run exports/deep_research_agent \ --resume-session session_20260209_152000_ghi34567 \ --checkpoint cp_node_complete_review_152100 ``` diff --git a/core/framework/runner/cli.py b/core/framework/runner/cli.py index 57955aa3..76109041 100644 --- a/core/framework/runner/cli.py +++ b/core/framework/runner/cli.py @@ -332,6 +332,60 @@ def register_commands(subparsers: argparse._SubParsersAction) -> None: resume_parser.set_defaults(func=cmd_resume) +def _load_resume_state( + agent_path: str, session_id: str, checkpoint_id: str | None = None +) -> dict | None: + """Load session or checkpoint state for headless resume. 
+ + Args: + agent_path: Path to the agent folder (e.g., exports/my_agent) + session_id: Session ID to resume from + checkpoint_id: Optional checkpoint ID within the session + + Returns: + session_state dict for executor, or None if not found + """ + agent_name = Path(agent_path).name + agent_work_dir = Path.home() / ".hive" / "agents" / agent_name + session_dir = agent_work_dir / "sessions" / session_id + + if not session_dir.exists(): + return None + + if checkpoint_id: + # Checkpoint-based resume: load checkpoint and extract state + cp_path = session_dir / "checkpoints" / f"{checkpoint_id}.json" + if not cp_path.exists(): + return None + try: + cp_data = json.loads(cp_path.read_text()) + except (json.JSONDecodeError, OSError): + return None + return { + "memory": cp_data.get("shared_memory", {}), + "paused_at": cp_data.get("next_node") or cp_data.get("current_node"), + "execution_path": cp_data.get("execution_path", []), + "node_visit_counts": {}, + } + else: + # Session state resume: load state.json + state_path = session_dir / "state.json" + if not state_path.exists(): + return None + try: + state_data = json.loads(state_path.read_text()) + except (json.JSONDecodeError, OSError): + return None + progress = state_data.get("progress", {}) + paused_at = progress.get("paused_at") or progress.get("resume_from") + return { + "memory": state_data.get("memory", {}), + "paused_at": paused_at, + "execution_path": progress.get("path", []), + "node_visit_counts": progress.get("node_visit_counts", {}), + } + + def cmd_run(args: argparse.Namespace) -> int: """Run an exported agent.""" import logging @@ -421,6 +475,27 @@ def cmd_run(args: argparse.Namespace) -> int: print(f"Error: {e}", file=sys.stderr) return 1 + # Load session/checkpoint state for resume (headless mode) + session_state = None + resume_session = getattr(args, "resume_session", None) + checkpoint = getattr(args, "checkpoint", None) + if resume_session: + session_state = _load_resume_state(args.agent_path, 
resume_session, checkpoint) + if session_state is None: + print( + f"Error: Could not load session state for {resume_session}", + file=sys.stderr, + ) + return 1 + if not args.quiet: + resume_node = session_state.get("paused_at", "unknown") + if checkpoint: + print(f"Resuming from checkpoint: {checkpoint}") + else: + print(f"Resuming session: {resume_session}") + print(f"Resume point: {resume_node}") + print() + # Auto-inject user_id if the agent expects it but it's not provided entry_input_keys = runner.graph.nodes[0].input_keys if runner.graph.nodes else [] if "user_id" in entry_input_keys and context.get("user_id") is None: @@ -440,7 +515,7 @@ def cmd_run(args: argparse.Namespace) -> int: print("=" * 60) print() - result = asyncio.run(runner.run(context)) + result = asyncio.run(runner.run(context, session_state=session_state)) # Format output output = { diff --git a/docs/why-conditional-edge-priority.md b/docs/why-conditional-edge-priority.md new file mode 100644 index 00000000..a664fe0b --- /dev/null +++ b/docs/why-conditional-edge-priority.md @@ -0,0 +1,42 @@ +# Why Conditional Edges Need Priority (Function Nodes) + +## The problem + +Function nodes return everything they computed. They don't pick one output key — they return all of them. + +```python +def score_lead(inputs): + score = compute_score(inputs["profile"]) + return { + "score": score, + "is_high_value": score > 80, + "needs_enrichment": score > 50 and not inputs["profile"].get("company"), + } +``` + +Lead comes in: score 92, no company on file. Output: `{"score": 92, "is_high_value": True, "needs_enrichment": True}`. + +Two conditional edges leaving this node: + +``` +Edge A: needs_enrichment == True → enrichment node +Edge B: is_high_value == True → outreach node +``` + +Both are true. Without priority, the graph either fans out to both (wrong — you'd email someone while still enriching their data) or picks one randomly (wrong — non-deterministic). 
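
The ambiguity is easy to see in a few lines. This is a hedged sketch — the tuple-of-lambdas edge representation is illustrative, not the executor's real data model:

```python
# Both conditional edges match the same function-node output (illustrative only).
output = {"score": 92, "is_high_value": True, "needs_enrichment": True}

# Edge A and Edge B from the example above, as (target, condition) pairs.
edges = [
    ("enrichment", lambda mem: mem["needs_enrichment"] is True),
    ("outreach", lambda mem: mem["is_high_value"] is True),
]

matching = [target for target, condition in edges if condition(output)]
assert matching == ["enrichment", "outreach"]  # two matches -- ambiguous without priority
```

With two matching targets and no tiebreaker, the executor has no principled way to choose.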
+ +## Priority fixes it + +``` +Edge A: needs_enrichment == True priority=2 (higher = checked first) +Edge B: is_high_value == True priority=1 +Edge C: is_high_value == False priority=0 +``` + +Executor keeps only the highest-priority matching group. A wins. Lead gets enriched first, loops back, gets re-scored — now `needs_enrichment` is false, B wins, outreach happens. + +## Why event loop nodes don't need this + +The LLM understands "if/else." You tell it in the prompt: "if needs enrichment, set `needs_enrichment`. Otherwise if high value, set `approved`." It picks one. Only one conditional edge matches. + +A function just returns a dict. It doesn't do "otherwise." Priority is the "otherwise" for function nodes. From e5428bec5c35b5c3a43c25a949b1cf5e4637db82 Mon Sep 17 00:00:00 2001 From: amazonproai Date: Mon, 9 Feb 2026 14:27:53 -0800 Subject: [PATCH 06/22] docs(contributing): fix duplicate step numbers and add tools test command - Fix Getting Started steps 6/7 duplicated; renumber to 8 and 9 - Add command to run tools package tests (cd tools && uv run pytest) Co-authored-by: Cursor --- CONTRIBUTING.md | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index d299c5a7..310fb96c 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -49,8 +49,8 @@ You may submit PRs without prior assignment for: make check # Lint and format checks (ruff check + ruff format --check on core/ and tools/) make test # Core tests (cd core && pytest tests/ -v) ``` -6. Commit your changes following our commit conventions -7. Push to your fork and submit a Pull Request +8. Commit your changes following our commit conventions +9. 
Push to your fork and submit a Pull Request ## Development Setup @@ -145,6 +145,9 @@ make test # Or run tests directly cd core && pytest tests/ -v +# Run tools package tests (when contributing to tools/) +cd tools && uv run pytest tests/ -v + # Run tests for a specific agent PYTHONPATH=exports uv run python -m agent_name test ``` From 866518f1880f11966572e1916873363bde279706 Mon Sep 17 00:00:00 2001 From: Timothy Date: Mon, 9 Feb 2026 16:29:06 -0800 Subject: [PATCH 07/22] fix: headless runtime consolidation --- .claude/skills/hive-test/SKILL.md | 5 +- core/framework/graph/event_loop_node.py | 52 +++++- core/framework/graph/executor.py | 15 ++ core/framework/runner/cli.py | 3 - core/framework/runner/runner.py | 216 ++++++++++-------------- core/framework/runtime/README.md | 172 +++++++++++++++++++ 6 files changed, 323 insertions(+), 140 deletions(-) create mode 100644 core/framework/runtime/README.md diff --git a/.claude/skills/hive-test/SKILL.md b/.claude/skills/hive-test/SKILL.md index 6cb058b2..32f8ece8 100644 --- a/.claude/skills/hive-test/SKILL.md +++ b/.claude/skills/hive-test/SKILL.md @@ -141,7 +141,9 @@ Run the agent via CLI. This creates sessions with checkpoints at `~/.hive/agents uv run hive run exports/{agent_name} --input '{"query": "test topic"}' ``` -Sessions and checkpoints are saved automatically. For agents with client-facing nodes that require user interaction, the user must launch the TUI manually in a separate terminal (Claude Code cannot interact with TUI apps). +Sessions and checkpoints are saved automatically. + +**Client-facing nodes**: Agents with `client_facing=True` nodes (interactive conversation) work in headless mode when run from a real terminal — the agent streams output to stdout and reads user input from stdin via a `>>> ` prompt. In non-interactive shells (like Claude Code's Bash tool), client-facing nodes will hang because there is no stdin. 
For testing interactive agents from Claude Code, use `run_tests` with mock mode or have the user run the agent manually in their terminal. #### Automated regression (for CI or final verification) @@ -760,6 +762,7 @@ uv run hive run exports/{agent_name} \ | Use `run_tests` for iterative debugging | Use headless CLI with checkpoints for iterative debugging | | Use headless CLI for final regression | Use `run_tests` for automated regression | | Use `--tui` from Claude Code | Use headless `run` command — TUI hangs in non-interactive shells | +| Test client-facing nodes from Claude Code | Use mock mode, or have the user run the agent in their terminal | | Run tests without reading goal first | Always understand the goal before writing tests | | Skip Phase 3 analysis and guess | Use session + log tools to identify root cause | diff --git a/core/framework/graph/event_loop_node.py b/core/framework/graph/event_loop_node.py index 1c97f158..ae95e339 100644 --- a/core/framework/graph/event_loop_node.py +++ b/core/framework/graph/event_loop_node.py @@ -274,6 +274,7 @@ class EventLoopNode(NodeProtocol): # 5. Stall detection state recent_responses: list[str] = [] + user_interaction_count = 0 # tracks how many times this node blocked for user input # 6. Main loop for iteration in range(start_iteration, self._config.max_iterations): @@ -485,14 +486,29 @@ class EventLoopNode(NodeProtocol): # 6h. Client-facing input blocking # - # For client_facing nodes, block for user input only when the - # LLM explicitly called ask_user(). Text-only turns without - # ask_user flow through without blocking, allowing progress - # updates and summaries to stream freely. + # For client_facing nodes, block for user input when: + # - The LLM explicitly called ask_user(), OR + # - The LLM produced a turn with no real tool calls + # (text-only or set_output-only). 
+ # + # Before the first user interaction, set_output alone does + # NOT prevent blocking — the node must present its output + # to the user first. After the user has interacted at + # least once, set_output-only turns flow through (the LLM + # is finishing up based on user input). # # After user input, always fall through to judge evaluation # (6i). The judge handles all acceptance decisions. - if ctx.node_spec.client_facing and user_input_requested: + if user_interaction_count == 0: + # No user interaction yet — block unless ask_user or + # real tools were called (set_output alone is not enough) + needs_user_input = user_input_requested or not real_tool_results + else: + # User has already interacted — set_output can bypass + needs_user_input = user_input_requested or ( + not real_tool_results and not outputs_set + ) + if ctx.node_spec.client_facing and needs_user_input: if self._shutdown: await self._publish_loop_completed(stream_id, node_id, iteration + 1) latency_ms = int((time.time() - start_time) * 1000) @@ -578,6 +594,7 @@ class EventLoopNode(NodeProtocol): latency_ms=latency_ms, ) + user_interaction_count += 1 recent_responses.clear() # Fall through to judge evaluation (6i) @@ -824,6 +841,12 @@ class EventLoopNode(NodeProtocol): Returns True if input arrived, False if shutdown was signaled. """ + # Clear BEFORE emitting so that synchronous handlers (e.g. the + # headless stdin handler) can call inject_event() during the emit + # and the signal won't be lost. TUI handlers return immediately + # without injecting, so the wait still blocks until the user types. 
+ self._input_ready.clear() + if self._event_bus: await self._event_bus.emit_client_input_requested( stream_id=ctx.node_id, @@ -831,7 +854,6 @@ class EventLoopNode(NodeProtocol): prompt="", ) - self._input_ready.clear() await self._input_ready.wait() return not self._shutdown @@ -1283,6 +1305,24 @@ class EventLoopNode(NodeProtocol): accumulator, ctx.node_spec.output_keys, ctx.node_spec.nullable_output_keys ) if not missing: + # Safety check: when ALL output keys are nullable and NONE + # have been set, the node produced nothing useful. Retry + # instead of accepting an empty result — this prevents + # client-facing nodes from terminating before the user + # ever interacts, and non-client-facing nodes from + # short-circuiting without doing their work. + output_keys = ctx.node_spec.output_keys or [] + nullable_keys = set(ctx.node_spec.nullable_output_keys or []) + all_nullable = output_keys and nullable_keys >= set(output_keys) + none_set = not any(accumulator.get(k) is not None for k in output_keys) + if all_nullable and none_set: + return JudgeVerdict( + action="RETRY", + feedback=( + f"No output keys have been set yet. " + f"Use set_output to set at least one of: {output_keys}" + ), + ) return JudgeVerdict(action="ACCEPT") else: return JudgeVerdict( diff --git a/core/framework/graph/executor.py b/core/framework/graph/executor.py index f36784ad..e252688a 100644 --- a/core/framework/graph/executor.py +++ b/core/framework/graph/executor.py @@ -502,6 +502,21 @@ class GraphExecutor: path.append(current_node_id) + # Clear stale nullable outputs from previous visits. + # When a node is re-visited (e.g. review → process-batch → review), + # nullable outputs from the PREVIOUS visit linger in shared memory. + # This causes stale edge conditions to fire (e.g. "feedback is not None" + # from visit 1 triggers even when visit 2 sets "final_summary" instead). + # Clearing them ensures only the CURRENT visit's outputs affect routing. 
+ if node_visit_counts.get(current_node_id, 0) > 1: + nullable_keys = getattr(node_spec, "nullable_output_keys", None) or [] + for key in nullable_keys: + if memory.read(key) is not None: + memory.write(key, None, validate=False) + self.logger.info( + f" 🧹 Cleared stale nullable output '{key}' from previous visit" + ) + # Check if pause (HITL) before execution if current_node_id in graph.pause_nodes: self.logger.info(f"⏸ Paused at HITL node: {node_spec.name}") diff --git a/core/framework/runner/cli.py b/core/framework/runner/cli.py index 76109041..25d9d449 100644 --- a/core/framework/runner/cli.py +++ b/core/framework/runner/cli.py @@ -428,7 +428,6 @@ def cmd_run(args: argparse.Namespace) -> int: runner = AgentRunner.load( args.agent_path, model=args.model, - enable_tui=True, ) except Exception as e: print(f"Error loading agent: {e}") @@ -469,7 +468,6 @@ def cmd_run(args: argparse.Namespace) -> int: runner = AgentRunner.load( args.agent_path, model=args.model, - enable_tui=False, ) except FileNotFoundError as e: print(f"Error: {e}", file=sys.stderr) @@ -1260,7 +1258,6 @@ def cmd_tui(args: argparse.Namespace) -> int: runner = AgentRunner.load( agent_path, model=args.model, - enable_tui=True, ) except Exception as e: print(f"Error loading agent: {e}") diff --git a/core/framework/runner/runner.py b/core/framework/runner/runner.py index 4e0b18f2..3ecdb61a 100644 --- a/core/framework/runner/runner.py +++ b/core/framework/runner/runner.py @@ -10,17 +10,13 @@ from typing import TYPE_CHECKING, Any from framework.graph import Goal from framework.graph.edge import AsyncEntryPointSpec, EdgeCondition, EdgeSpec, GraphSpec -from framework.graph.executor import ExecutionResult, GraphExecutor +from framework.graph.executor import ExecutionResult from framework.graph.node import NodeSpec from framework.llm.provider import LLMProvider, Tool from framework.runner.tool_registry import ToolRegistry - -# Multi-entry-point runtime imports from framework.runtime.agent_runtime import 
AgentRuntime, create_agent_runtime -from framework.runtime.core import Runtime from framework.runtime.execution_stream import EntryPointSpec from framework.runtime.runtime_log_store import RuntimeLogStore -from framework.runtime.runtime_logger import RuntimeLogger if TYPE_CHECKING: from framework.runner.protocol import AgentMessage, CapabilityResponse @@ -282,7 +278,6 @@ class AgentRunner: mock_mode: bool = False, storage_path: Path | None = None, model: str | None = None, - enable_tui: bool = False, ): """ Initialize the runner (use AgentRunner.load() instead). @@ -294,14 +289,12 @@ class AgentRunner: mock_mode: If True, use mock LLM responses storage_path: Path for runtime storage (defaults to temp) model: Model to use (reads from agent config or ~/.hive/configuration.json if None) - enable_tui: If True, forces use of AgentRuntime with EventBus """ self.agent_path = agent_path self.graph = graph self.goal = goal self.mock_mode = mock_mode self.model = model or self._resolve_default_model() - self.enable_tui = enable_tui # Set up storage if storage_path: @@ -321,12 +314,10 @@ class AgentRunner: # Initialize components self._tool_registry = ToolRegistry() - self._runtime: Runtime | None = None self._llm: LLMProvider | None = None - self._executor: GraphExecutor | None = None self._approval_callback: Callable | None = None - # Multi-entry-point support (AgentRuntime) + # AgentRuntime — unified execution path for all agents self._agent_runtime: AgentRuntime | None = None self._uses_async_entry_points = self.graph.has_async_entry_points() @@ -383,7 +374,6 @@ class AgentRunner: mock_mode: bool = False, storage_path: Path | None = None, model: str | None = None, - enable_tui: bool = False, ) -> "AgentRunner": """ Load an agent from an export folder. 
@@ -397,7 +387,6 @@ class AgentRunner: mock_mode: If True, use mock LLM responses storage_path: Path for runtime storage (defaults to ~/.hive/agents/{name}) model: LLM model to use (reads from agent's default_config if None) - enable_tui: If True, forces use of AgentRuntime with EventBus Returns: AgentRunner instance ready to run @@ -448,7 +437,6 @@ class AgentRunner: mock_mode=mock_mode, storage_path=storage_path, model=model, - enable_tui=enable_tui, ) # Fallback: load from agent.json (legacy JSON-based agents) @@ -466,7 +454,6 @@ class AgentRunner: mock_mode=mock_mode, storage_path=storage_path, model=model, - enable_tui=enable_tui, ) def register_tool( @@ -556,9 +543,6 @@ class AgentRunner: callback: Function to call for approval (receives node info, returns bool) """ self._approval_callback = callback - # If executor already exists, update it - if self._executor is not None: - self._executor.approval_callback = callback def _setup(self) -> None: """Set up runtime, LLM, and executor.""" @@ -618,16 +602,11 @@ class AgentRunner: print(f"Warning: {api_key_env} not set. 
LLM calls will fail.") print(f"Set it with: export {api_key_env}=your-api-key") - # Get tools for executor/runtime + # Get tools for runtime tools = list(self._tool_registry.get_tools().values()) tool_executor = self._tool_registry.get_executor() - if self._uses_async_entry_points or self.enable_tui: - # Multi-entry-point mode or TUI mode: use AgentRuntime - self._setup_agent_runtime(tools, tool_executor) - else: - # Single-entry-point mode: use legacy GraphExecutor - self._setup_legacy_executor(tools, tool_executor) + self._setup_agent_runtime(tools, tool_executor) def _get_api_key_env_var(self, model: str) -> str | None: """Get the environment variable name for the API key based on model name.""" @@ -688,26 +667,6 @@ class AgentRunner: except Exception: return None - def _setup_legacy_executor(self, tools: list, tool_executor: Callable | None) -> None: - """Set up legacy single-entry-point execution using GraphExecutor.""" - # Create runtime - self._runtime = Runtime(storage_path=self._storage_path) - - # Create runtime logger - log_store = RuntimeLogStore(base_path=self._storage_path / "runtime_logs") - runtime_logger = RuntimeLogger(store=log_store, agent_id=self.graph.id) - - # Create executor - self._executor = GraphExecutor( - runtime=self._runtime, - llm=self._llm, - tools=tools, - tool_executor=tool_executor, - approval_callback=self._approval_callback, - runtime_logger=runtime_logger, - loop_config=self.graph.loop_config, - ) - def _setup_agent_runtime(self, tools: list, tool_executor: Callable | None) -> None: """Set up multi-entry-point execution using AgentRuntime.""" # Convert AsyncEntryPointSpec to EntryPointSpec for AgentRuntime @@ -725,9 +684,9 @@ class AgentRunner: ) entry_points.append(ep) - # If TUI enabled but no entry points (single-entry agent), create default - if not entry_points and self.enable_tui and self.graph.entry_node: - logger.info("Creating default entry point for TUI") + # Single-entry agent with no async entry points: create a 
default entry point + if not entry_points and self.graph.entry_node: + logger.info("Creating default entry point for single-entry agent") entry_points.append( EntryPointSpec( id="default", @@ -803,32 +762,9 @@ class AgentRunner: error=error_msg, ) - if self._uses_async_entry_points or self.enable_tui: - # Multi-entry-point mode: use AgentRuntime - return await self._run_with_agent_runtime( - input_data=input_data or {}, - entry_point_id=entry_point_id, - ) - else: - # Legacy single-entry-point mode - return await self._run_with_executor( - input_data=input_data or {}, - session_state=session_state, - ) - - async def _run_with_executor( - self, - input_data: dict, - session_state: dict | None = None, - ) -> ExecutionResult: - """Run using legacy GraphExecutor (single entry point).""" - if self._executor is None: - self._setup() - - return await self._executor.execute( - graph=self.graph, - goal=self.goal, - input_data=input_data, + return await self._run_with_agent_runtime( + input_data=input_data or {}, + entry_point_id=entry_point_id, session_state=session_state, ) @@ -836,8 +772,11 @@ class AgentRunner: self, input_data: dict, entry_point_id: str | None = None, + session_state: dict | None = None, ) -> ExecutionResult: - """Run using AgentRuntime (multi-entry-point).""" + """Run using AgentRuntime.""" + import sys + if self._agent_runtime is None: self._setup() @@ -845,6 +784,52 @@ class AgentRunner: if not self._agent_runtime.is_running: await self._agent_runtime.start() + # Set up stdin-based I/O for client-facing nodes in headless mode. + # When a client_facing EventLoopNode calls ask_user(), it emits + # CLIENT_INPUT_REQUESTED on the event bus and blocks. We subscribe + # a handler that prints the prompt and reads from stdin, then injects + # the user's response back into the node to unblock it. 
+ has_client_facing = any(n.client_facing for n in self.graph.nodes) + sub_ids: list[str] = [] + + if has_client_facing and sys.stdin.isatty(): + from framework.runtime.event_bus import EventType + + runtime = self._agent_runtime + + async def _handle_client_output(event): + """Print agent output to stdout as it streams.""" + content = event.data.get("content", "") + if content: + print(content, end="", flush=True) + + async def _handle_input_requested(event): + """Read user input from stdin and inject it into the node.""" + import asyncio + + node_id = event.node_id + try: + loop = asyncio.get_event_loop() + user_input = await loop.run_in_executor(None, input, "\n>>> ") + except EOFError: + user_input = "" + + # Inject into the waiting EventLoopNode via runtime + await runtime.inject_input(node_id, user_input) + + sub_ids.append( + runtime.subscribe_to_events( + event_types=[EventType.CLIENT_OUTPUT_DELTA], + handler=_handle_client_output, + ) + ) + sub_ids.append( + runtime.subscribe_to_events( + event_types=[EventType.CLIENT_INPUT_REQUESTED], + handler=_handle_input_requested, + ) + ) + # Determine entry point if entry_point_id is None: # Use first entry point or "default" if no entry points defined @@ -854,44 +839,38 @@ class AgentRunner: else: entry_point_id = "default" - # Trigger and wait for result - result = await self._agent_runtime.trigger_and_wait( - entry_point_id=entry_point_id, - input_data=input_data, - ) - - # Return result or create error result - if result is not None: - return result - else: - return ExecutionResult( - success=False, - error="Execution timed out or failed to complete", + try: + # Trigger and wait for result + result = await self._agent_runtime.trigger_and_wait( + entry_point_id=entry_point_id, + input_data=input_data, + session_state=session_state, ) - # === Multi-Entry-Point API (for agents with async_entry_points) === + # Return result or create error result + if result is not None: + return result + else: + return 
ExecutionResult( + success=False, + error="Execution timed out or failed to complete", + ) + finally: + # Clean up subscriptions + for sub_id in sub_ids: + self._agent_runtime.unsubscribe_from_events(sub_id) + + # === Runtime API === async def start(self) -> None: - """ - Start the agent runtime (for multi-entry-point agents). - - This starts all registered entry points and allows concurrent execution. - For single-entry-point agents, this is a no-op. - """ - if not self._uses_async_entry_points: - return - + """Start the agent runtime.""" if self._agent_runtime is None: self._setup() await self._agent_runtime.start() async def stop(self) -> None: - """ - Stop the agent runtime (for multi-entry-point agents). - - For single-entry-point agents, this is a no-op. - """ + """Stop the agent runtime.""" if self._agent_runtime is not None: await self._agent_runtime.stop() @@ -904,7 +883,7 @@ class AgentRunner: """ Trigger execution at a specific entry point (non-blocking). - For multi-entry-point agents only. Returns execution ID for tracking. + Returns execution ID for tracking. Args: entry_point_id: Which entry point to trigger @@ -913,16 +892,7 @@ class AgentRunner: Returns: Execution ID for tracking - - Raises: - RuntimeError: If agent doesn't use async entry points """ - if not self._uses_async_entry_points: - raise RuntimeError( - "trigger() is only available for multi-entry-point agents. " - "Use run() for single-entry-point agents." - ) - if self._agent_runtime is None: self._setup() @@ -939,19 +909,9 @@ class AgentRunner: """ Get goal progress across all execution streams. - For multi-entry-point agents only. - Returns: Dict with overall_progress, criteria_status, constraint_violations, etc. - - Raises: - RuntimeError: If agent doesn't use async entry points """ - if not self._uses_async_entry_points: - raise RuntimeError( - "get_goal_progress() is only available for multi-entry-point agents." 
- ) - if self._agent_runtime is None: self._setup() @@ -959,14 +919,11 @@ class AgentRunner: def get_entry_points(self) -> list[EntryPointSpec]: """ - Get all registered entry points (for multi-entry-point agents). + Get all registered entry points. Returns: List of EntryPointSpec objects """ - if not self._uses_async_entry_points: - return [] - if self._agent_runtime is None: self._setup() @@ -1390,7 +1347,7 @@ Respond with JSON only: self._temp_dir = None async def cleanup_async(self) -> None: - """Clean up resources (asynchronous - for multi-entry-point agents).""" + """Clean up resources (asynchronous).""" # Stop agent runtime if running if self._agent_runtime is not None and self._agent_runtime.is_running: await self._agent_runtime.stop() @@ -1401,8 +1358,7 @@ Respond with JSON only: async def __aenter__(self) -> "AgentRunner": """Context manager entry.""" self._setup() - # Start runtime for multi-entry-point agents - if self._uses_async_entry_points and self._agent_runtime is not None: + if self._agent_runtime is not None: await self._agent_runtime.start() return self diff --git a/core/framework/runtime/README.md b/core/framework/runtime/README.md new file mode 100644 index 00000000..cc71e48c --- /dev/null +++ b/core/framework/runtime/README.md @@ -0,0 +1,172 @@ +# Agent Runtime + +Unified execution system for all Hive agents. Every agent — single-entry or multi-entry, headless or TUI — runs through the same runtime stack. + +## Topology + +``` + AgentRunner.load(agent_path) + | + AgentRunner + (factory + public API) + | + _setup_agent_runtime() + | + AgentRuntime + (lifecycle + orchestration) + / | \ + Stream A Stream B Stream C ← one per entry point + | | | + GraphExecutor GraphExecutor GraphExecutor + | | | + Node → Node → Node (graph traversal) +``` + +Single-entry agents get a `"default"` entry point automatically. There is no separate code path. 
+ +## Components + +| Component | File | Role | +|---|---|---| +| `AgentRunner` | `runner/runner.py` | Load agents, configure tools/LLM, expose high-level API | +| `AgentRuntime` | `runtime/agent_runtime.py` | Lifecycle management, entry point routing, event bus | +| `ExecutionStream` | `runtime/execution_stream.py` | Per-entry-point execution queue, session persistence | +| `GraphExecutor` | `graph/executor.py` | Node traversal, tool dispatch, checkpointing | +| `EventBus` | `runtime/event_bus.py` | Pub/sub for execution events (streaming, I/O) | +| `SharedStateManager` | `runtime/shared_state.py` | Cross-stream state with isolation levels | +| `OutcomeAggregator` | `runtime/outcome_aggregator.py` | Goal progress tracking across streams | +| `SessionStore` | `storage/session_store.py` | Session state persistence (`sessions/{id}/state.json`) | + +## Programming Interface + +### AgentRunner (high-level) + +```python +from framework.runner import AgentRunner + +# Load and run +runner = AgentRunner.load("exports/my_agent", model="anthropic/claude-sonnet-4-20250514") +result = await runner.run({"query": "hello"}) + +# Resume from paused session +result = await runner.run({"query": "continue"}, session_state=saved_state) + +# Lifecycle +await runner.start() # Start the runtime +await runner.stop() # Stop the runtime +exec_id = await runner.trigger("default", {}) # Non-blocking trigger +progress = await runner.get_goal_progress() # Goal evaluation +entry_points = runner.get_entry_points() # List entry points + +# Context manager +async with AgentRunner.load("exports/my_agent") as runner: + result = await runner.run({"query": "hello"}) + +# Cleanup +runner.cleanup() # Synchronous +await runner.cleanup_async() # Asynchronous +``` + +### AgentRuntime (lower-level) + +```python +from framework.runtime.agent_runtime import AgentRuntime, create_agent_runtime +from framework.runtime.execution_stream import EntryPointSpec + +# Create runtime with entry points +runtime = 
create_agent_runtime( + graph=graph, + goal=goal, + storage_path=Path("~/.hive/agents/my_agent"), + entry_points=[ + EntryPointSpec(id="default", name="Default", entry_node="start", trigger_type="manual"), + ], + llm=llm, + tools=tools, + tool_executor=tool_executor, + checkpoint_config=checkpoint_config, +) + +# Lifecycle +await runtime.start() +await runtime.stop() + +# Execution +exec_id = await runtime.trigger("default", {"query": "hello"}) # Non-blocking +result = await runtime.trigger_and_wait("default", {"query": "hello"}) # Blocking +result = await runtime.trigger_and_wait("default", {}, session_state=state) # Resume + +# Client-facing node I/O +await runtime.inject_input(node_id="chat", content="user response") + +# Events +sub_id = runtime.subscribe_to_events( + event_types=[EventType.CLIENT_OUTPUT_DELTA], + handler=my_handler, +) +runtime.unsubscribe_from_events(sub_id) + +# Inspection +runtime.is_running # bool +runtime.event_bus # EventBus +runtime.state_manager # SharedStateManager +runtime.get_stats() # Runtime statistics +``` + +## Execution Flow + +1. `AgentRunner.run()` calls `AgentRuntime.trigger_and_wait()` +2. `AgentRuntime` routes to the `ExecutionStream` for the entry point +3. `ExecutionStream` creates a `GraphExecutor` and calls `execute()` +4. `GraphExecutor` traverses nodes, dispatches tools, manages checkpoints +5. `ExecutionResult` flows back up through the stack +6. 
`ExecutionStream` writes session state to disk + +## Session Resume + +All execution paths support session resume: + +```python +# First run (agent pauses at a client-facing node) +result = await runner.run({"query": "start task"}) +# result.paused_at = "review-node" +# result.session_state = {"memory": {...}, "paused_at": "review-node", ...} + +# Resume +result = await runner.run({"input": "approved"}, session_state=result.session_state) +``` + +Session state flows: `AgentRunner.run()` → `AgentRuntime.trigger_and_wait()` → `ExecutionStream.execute()` → `GraphExecutor.execute()`. + +Checkpoints are saved at node boundaries (`sessions/{id}/checkpoints/`) for crash recovery. + +## Event Bus + +The `EventBus` provides real-time execution visibility: + +| Event | When | +|---|---| +| `NODE_STARTED` | Node begins execution | +| `NODE_COMPLETED` | Node finishes | +| `TOOL_CALL_STARTED` | Tool invocation begins | +| `TOOL_CALL_COMPLETED` | Tool invocation finishes | +| `CLIENT_OUTPUT_DELTA` | Agent streams text to user | +| `CLIENT_INPUT_REQUESTED` | Agent needs user input | +| `EXECUTION_COMPLETED` | Full execution finishes | + +In headless mode, `AgentRunner` subscribes to `CLIENT_OUTPUT_DELTA` and `CLIENT_INPUT_REQUESTED` to print output and read stdin. In TUI mode, `AdenTUI` subscribes to route events to UI widgets. 
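The headless subscription pattern can be sketched with a tiny stand-in bus. These are simplified classes, not the framework's real `EventBus` or `AgentRuntime`; the point is the shape of the handlers, where output deltas are streamed out and an input request triggers a reply injected back toward the waiting node:

```python
import asyncio

# Stand-in pub/sub bus mimicking the headless wiring described above.
class MiniBus:
    def __init__(self):
        self.handlers = {}  # event_type -> list of async handlers

    def subscribe(self, event_type, handler):
        self.handlers.setdefault(event_type, []).append(handler)

    async def emit(self, event_type, data):
        for handler in self.handlers.get(event_type, []):
            await handler(data)

async def main():
    bus = MiniBus()
    transcript = []
    replies = []

    async def on_output_delta(data):
        # Headless mode would print(data["content"], end="", flush=True)
        transcript.append(data["content"])

    async def on_input_requested(data):
        # Headless mode reads from stdin; here we inject a canned reply.
        await bus.emit("CLIENT_INPUT_INJECTED", {"content": "looks good"})

    async def on_input_injected(data):
        replies.append(data["content"])

    bus.subscribe("CLIENT_OUTPUT_DELTA", on_output_delta)
    bus.subscribe("CLIENT_INPUT_REQUESTED", on_input_requested)
    bus.subscribe("CLIENT_INPUT_INJECTED", on_input_injected)

    # A client-facing node streams text, then blocks for input.
    await bus.emit("CLIENT_OUTPUT_DELTA", {"content": "Draft ready. "})
    await bus.emit("CLIENT_INPUT_REQUESTED", {"node_id": "chat"})
    return "".join(transcript), replies

print(asyncio.run(main()))  # ('Draft ready. ', ['looks good'])
```

The real implementation differs in that the injected reply unblocks an `asyncio.Event` inside the node rather than appending to a list, but the subscribe/emit/inject flow is the same.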
+ +## Storage Layout + +``` +~/.hive/agents/{agent_name}/ + sessions/ + session_YYYYMMDD_HHMMSS_{uuid}/ + state.json # Session state (status, memory, progress) + checkpoints/ # Node-boundary snapshots + logs/ + summary.json # Execution summary + details.jsonl # Detailed event log + tool_logs.jsonl # Tool call log + runtime_logs/ # Cross-session runtime logs +``` From 59a315b90bf7f78d78fd7c799aea32ee55148adc Mon Sep 17 00:00:00 2001 From: YashovardhanB <167444276+YashovardhanB@users.noreply.github.com> Date: Tue, 10 Feb 2026 06:21:54 +0530 Subject: [PATCH 08/22] fix(graph): correct LLMNode token counting to include retries --- core/framework/graph/node.py | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/core/framework/graph/node.py b/core/framework/graph/node.py index ed0a44f0..5d7d7b34 100644 --- a/core/framework/graph/node.py +++ b/core/framework/graph/node.py @@ -1073,7 +1073,7 @@ Keep the same JSON structure but with shorter content values. decision_id=decision_id, success=True, result=response.content, - tokens_used=response.input_tokens + response.output_tokens, + tokens_used=total_input_tokens + total_output_tokens, latency_ms=latency_ms, ) @@ -1148,7 +1148,7 @@ Keep the same JSON structure but with shorter content values. f"Expected keys: {ctx.node_spec.output_keys}" ), output={}, - tokens_used=response.input_tokens + response.output_tokens, + tokens_used=total_input_tokens + total_output_tokens, latency_ms=latency_ms, ) # JSON extraction failed completely - still strip code blocks @@ -1167,7 +1167,7 @@ Keep the same JSON structure but with shorter content values. 
return NodeResult( success=True, output=output, - tokens_used=response.input_tokens + response.output_tokens, + tokens_used=total_input_tokens + total_output_tokens, latency_ms=latency_ms, ) From 9c28dae5836b20becd6cea6757f2c254466709bc Mon Sep 17 00:00:00 2001 From: Timothy Date: Mon, 9 Feb 2026 17:05:11 -0800 Subject: [PATCH 09/22] fix: gemini should use GEMINI_API_KEY --- core/framework/runner/runner.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/core/framework/runner/runner.py b/core/framework/runner/runner.py index 3ecdb61a..895a54fc 100644 --- a/core/framework/runner/runner.py +++ b/core/framework/runner/runner.py @@ -621,7 +621,7 @@ class AgentRunner: elif model_lower.startswith("anthropic/") or model_lower.startswith("claude"): return "ANTHROPIC_API_KEY" elif model_lower.startswith("gemini/") or model_lower.startswith("google/"): - return "GOOGLE_API_KEY" + return "GEMINI_API_KEY" elif model_lower.startswith("mistral/"): return "MISTRAL_API_KEY" elif model_lower.startswith("groq/"): From 776583b3adbe87f845a7fb813e2e00cfe3d2da53 Mon Sep 17 00:00:00 2001 From: Timothy Date: Mon, 9 Feb 2026 17:07:34 -0800 Subject: [PATCH 10/22] fix: use more sensible default models --- quickstart.sh | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/quickstart.sh b/quickstart.sh index 4fe69783..e4e12691 100755 --- a/quickstart.sh +++ b/quickstart.sh @@ -303,9 +303,9 @@ if [ "$USE_ASSOC_ARRAYS" = true ]; then ) declare -A DEFAULT_MODELS=( - ["anthropic"]="claude-sonnet-4-5-20250929" - ["openai"]="gpt-4o" - ["gemini"]="gemini-3.0-flash-preview" + ["anthropic"]="claude-haiku-4-5" + ["openai"]="gpt-5-mini" + ["gemini"]="gemini-3-flash-preview" ["groq"]="moonshotai/kimi-k2-instruct-0905" ["cerebras"]="zai-glm-4.7" ["mistral"]="mistral-large-latest" @@ -820,4 +820,4 @@ if [ -n "$SELECTED_PROVIDER_ID" ] || [ -n "$HIVE_CREDENTIAL_KEY" ]; then fi echo -e "${DIM}Run ./quickstart.sh again to reconfigure.${NC}" -echo "" \ No newline at end of 
file +echo "" From 6fd7efece60f258d01e87f785704e20548424622 Mon Sep 17 00:00:00 2001 From: Timothy Date: Mon, 9 Feb 2026 18:53:42 -0800 Subject: [PATCH 11/22] feat: hive test, quickstart local settings --- .claude/settings.local.json.example | 34 ++++ .claude/skills/hive-test/SKILL.md | 46 +++-- .../examples/testing-youtube-agent.md | 40 +++-- core/framework/mcp/agent_builder_server.py | 105 ++++++++---- core/framework/testing/prompts.py | 162 +++++++++++++++--- quickstart.sh | 10 ++ 6 files changed, 312 insertions(+), 85 deletions(-) create mode 100644 .claude/settings.local.json.example diff --git a/.claude/settings.local.json.example b/.claude/settings.local.json.example new file mode 100644 index 00000000..f4f34ec4 --- /dev/null +++ b/.claude/settings.local.json.example @@ -0,0 +1,34 @@ +{ + "permissions": { + "allow": [ + "mcp__agent-builder__create_session", + "mcp__agent-builder__set_goal", + "mcp__agent-builder__add_node", + "mcp__agent-builder__add_edge", + "mcp__agent-builder__configure_loop", + "mcp__agent-builder__add_mcp_server", + "mcp__agent-builder__validate_graph", + "mcp__agent-builder__export_graph", + "mcp__agent-builder__load_session_by_id", + "Bash(git status:*)", + "Bash(gh run view:*)", + "Bash(uv run:*)", + "Bash(env:*)", + "mcp__agent-builder__test_node", + "mcp__agent-builder__list_mcp_tools", + "Bash(python -m py_compile:*)", + "Bash(python -m pytest:*)", + "Bash(source:*)", + "mcp__agent-builder__update_node", + "mcp__agent-builder__check_missing_credentials", + "mcp__agent-builder__list_stored_credentials", + "Bash(find:*)", + "mcp__agent-builder__run_tests", + "Bash(PYTHONPATH=core:exports:tools/src uv run pytest:*)", + "mcp__agent-builder__list_agent_sessions", + "mcp__agent-builder__generate_constraint_tests", + "mcp__agent-builder__generate_success_tests" + ] + }, + "enabledMcpjsonServers": ["agent-builder", "tools"] +} diff --git a/.claude/skills/hive-test/SKILL.md b/.claude/skills/hive-test/SKILL.md index 32f8ece8..4e8cfe69 
100644 --- a/.claude/skills/hive-test/SKILL.md +++ b/.claude/skills/hive-test/SKILL.md @@ -109,12 +109,14 @@ Write( #### Test writing rules - Every test MUST be `async` with `@pytest.mark.asyncio` -- Every test MUST accept `mock_mode` parameter -- Use `await default_agent.run(input, mock_mode=mock_mode)` +- Every test MUST accept `runner, auto_responder, mock_mode` fixtures +- Use `await auto_responder.start()` before running, `await auto_responder.stop()` in `finally` +- Use `await runner.run(input_dict)` — this goes through AgentRunner → AgentRuntime → ExecutionStream - Access output via `result.output.get("key")` — NEVER `result.output["key"]` - `result.success=True` means no exception, NOT goal achieved — always check output - Write 8-15 tests total, not 30+ - Each real test costs ~3 seconds + LLM tokens +- NEVER use `default_agent.run()` — it bypasses the runtime (no sessions, no logs, client-facing nodes hang) #### Step 1d: Check existing tests @@ -178,7 +180,7 @@ run_tests(goal_id, agent_path, fail_fast=True) run_tests(goal_id, agent_path, parallel=4) ``` -**Note:** `run_tests` calls `default_agent.run()` which does NOT enable checkpointing. For checkpoint-based recovery, use CLI execution. Use `run_tests` for quick regression checks and final verification. +**Note:** `run_tests` uses `AgentRunner` with `tmp_path` storage, so sessions are isolated per test run. For checkpoint-based recovery with persistent sessions, use CLI execution. Use `run_tests` for quick regression checks and final verification. 
--- @@ -431,10 +433,11 @@ Common tool credentials: ### Mock Mode Limitations -Mock mode (`--mock` flag or `mock_mode=True`) is **ONLY for structure validation**: +Mock mode (`--mock` flag or `MOCK_MODE=1`) is **ONLY for structure validation**: - Validates graph structure (nodes, edges, connections) -- Tests that code doesn't crash on execution +- Validates that `AgentRunner.load()` succeeds and the agent is importable +- Does NOT execute event_loop agents — MockLLMProvider never calls `set_output`, so event_loop nodes loop forever - Does NOT test LLM reasoning, content quality, or constraint validation - Does NOT test real API integrations or tool use @@ -623,9 +626,13 @@ Each real test costs ~3 seconds + LLM tokens. 12 tests = ~36 seconds, $0.12. ### Happy Path ```python @pytest.mark.asyncio -async def test_happy_path(mock_mode): +async def test_happy_path(runner, auto_responder, mock_mode): """Test normal successful execution.""" - result = await default_agent.run({"query": "python tutorials"}, mock_mode=mock_mode) + await auto_responder.start() + try: + result = await runner.run({"query": "python tutorials"}) + finally: + await auto_responder.stop() assert result.success, f"Agent failed: {result.error}" output = result.output or {} assert output.get("report"), "No report produced" @@ -634,9 +641,13 @@ async def test_happy_path(mock_mode): ### Boundary Condition ```python @pytest.mark.asyncio -async def test_minimum_sources(mock_mode): +async def test_minimum_sources(runner, auto_responder, mock_mode): """Test at minimum source threshold.""" - result = await default_agent.run({"query": "niche topic"}, mock_mode=mock_mode) + await auto_responder.start() + try: + result = await runner.run({"query": "niche topic"}) + finally: + await auto_responder.stop() assert result.success, f"Agent failed: {result.error}" output = result.output or {} sources = output.get("sources", []) @@ -647,9 +658,13 @@ async def test_minimum_sources(mock_mode): ### Error Handling ```python 
@pytest.mark.asyncio -async def test_empty_input(mock_mode): +async def test_empty_input(runner, auto_responder, mock_mode): """Test graceful handling of empty input.""" - result = await default_agent.run({"query": ""}, mock_mode=mock_mode) + await auto_responder.start() + try: + result = await runner.run({"query": ""}) + finally: + await auto_responder.stop() # Agent should either fail gracefully or produce an error message output = result.output or {} assert not result.success or output.get("error"), "Should handle empty input" @@ -658,9 +673,13 @@ async def test_empty_input(mock_mode): ### Feedback Loop ```python @pytest.mark.asyncio -async def test_feedback_loop_terminates(mock_mode): +async def test_feedback_loop_terminates(runner, auto_responder, mock_mode): """Test that feedback loops don't run forever.""" - result = await default_agent.run({"query": "test"}, mock_mode=mock_mode) + await auto_responder.start() + try: + result = await runner.run({"query": "test"}) + finally: + await auto_responder.stop() visits = result.node_visit_counts or {} for node_id, count in visits.items(): assert count <= 5, f"Node {node_id} visited {count} times — possible infinite loop" @@ -752,6 +771,7 @@ uv run hive run exports/{agent_name} \ | Don't | Do Instead | |-------|-----------| +| Use `default_agent.run()` in tests | Use `runner.run()` with `auto_responder` fixtures (goes through AgentRuntime) | | Re-run entire agent when a late node fails | Resume from last clean checkpoint | | Treat `result.success` as goal achieved | Check `result.output` for actual criteria | | Access `result.output["key"]` directly | Use `result.output.get("key")` | diff --git a/.claude/skills/hive-test/examples/testing-youtube-agent.md b/.claude/skills/hive-test/examples/testing-youtube-agent.md index 40eb3206..92ed7bc2 100644 --- a/.claude/skills/hive-test/examples/testing-youtube-agent.md +++ b/.claude/skills/hive-test/examples/testing-youtube-agent.md @@ -48,9 +48,13 @@ Write( 
content=result["file_header"] + ''' @pytest.mark.asyncio -async def test_success_source_diversity(mock_mode): +async def test_success_source_diversity(runner, auto_responder, mock_mode): """At least 5 diverse sources are found.""" - result = await default_agent.run({"query": "impact of remote work on productivity"}, mock_mode=mock_mode) + await auto_responder.start() + try: + result = await runner.run({"query": "impact of remote work on productivity"}) + finally: + await auto_responder.stop() assert result.success, f"Agent failed: {result.error}" output = result.output or {} sources = output.get("sources", []) @@ -58,9 +62,13 @@ async def test_success_source_diversity(mock_mode): assert len(sources) >= 5, f"Expected >= 5 sources, got {len(sources)}" @pytest.mark.asyncio -async def test_success_citation_coverage(mock_mode): +async def test_success_citation_coverage(runner, auto_responder, mock_mode): """Every factual claim in the report cites its source.""" - result = await default_agent.run({"query": "climate change effects on agriculture"}, mock_mode=mock_mode) + await auto_responder.start() + try: + result = await runner.run({"query": "climate change effects on agriculture"}) + finally: + await auto_responder.stop() assert result.success, f"Agent failed: {result.error}" output = result.output or {} report = output.get("report", "") @@ -68,26 +76,38 @@ async def test_success_citation_coverage(mock_mode): assert "[1]" in str(report) or "[source" in str(report).lower(), "Report lacks citations" @pytest.mark.asyncio -async def test_success_report_completeness(mock_mode): +async def test_success_report_completeness(runner, auto_responder, mock_mode): """Report addresses the original research question.""" query = "pros and cons of nuclear energy" - result = await default_agent.run({"query": query}, mock_mode=mock_mode) + await auto_responder.start() + try: + result = await runner.run({"query": query}) + finally: + await auto_responder.stop() assert result.success, 
f"Agent failed: {result.error}" output = result.output or {} report = output.get("report", "") assert len(str(report)) > 200, f"Report too short: {len(str(report))} chars" @pytest.mark.asyncio -async def test_empty_query_handling(mock_mode): +async def test_empty_query_handling(runner, auto_responder, mock_mode): """Agent handles empty input gracefully.""" - result = await default_agent.run({"query": ""}, mock_mode=mock_mode) + await auto_responder.start() + try: + result = await runner.run({"query": ""}) + finally: + await auto_responder.stop() output = result.output or {} assert not result.success or output.get("error"), "Should handle empty query" @pytest.mark.asyncio -async def test_feedback_loop_terminates(mock_mode): +async def test_feedback_loop_terminates(runner, auto_responder, mock_mode): """Feedback loop between review and research terminates.""" - result = await default_agent.run({"query": "quantum computing basics"}, mock_mode=mock_mode) + await auto_responder.start() + try: + result = await runner.run({"query": "quantum computing basics"}) + finally: + await auto_responder.stop() visits = result.node_visit_counts or {} for node_id, count in visits.items(): assert count <= 5, f"Node {node_id} visited {count} times" diff --git a/core/framework/mcp/agent_builder_server.py b/core/framework/mcp/agent_builder_server.py index 6895af28..1ee174aa 100644 --- a/core/framework/mcp/agent_builder_server.py +++ b/core/framework/mcp/agent_builder_server.py @@ -14,13 +14,15 @@ from datetime import datetime from pathlib import Path from typing import Annotated +# Project root resolution. This file lives at core/framework/mcp/agent_builder_server.py, +# so the project root (where exports/ lives) is four parents up. +_PROJECT_ROOT = Path(__file__).resolve().parent.parent.parent.parent + # Ensure exports/ is on sys.path so AgentRunner can import agent modules. 
-_framework_dir = Path(__file__).resolve().parent.parent # core/framework/ -> core/ -_project_root = _framework_dir.parent # core/ -> project root -_exports_dir = _project_root / "exports" +_exports_dir = _PROJECT_ROOT / "exports" if _exports_dir.is_dir() and str(_exports_dir) not in sys.path: sys.path.insert(0, str(_exports_dir)) -del _framework_dir, _project_root, _exports_dir +del _exports_dir from mcp.server import FastMCP # noqa: E402 @@ -541,6 +543,9 @@ def _validate_agent_path(agent_path: str) -> tuple[Path | None, str | None]: """ Validate and normalize agent_path. + Resolves relative paths against _PROJECT_ROOT since the MCP server's + cwd (core/) differs from the user's cwd (project root). + Returns: (Path, None) if valid (None, error_json) if invalid @@ -555,6 +560,12 @@ def _validate_agent_path(agent_path: str) -> tuple[Path | None, str | None]: path = Path(agent_path) + # Resolve relative paths against project root (not MCP server's cwd) + if not path.is_absolute() and not path.exists(): + resolved = _PROJECT_ROOT / path + if resolved.exists(): + path = resolved + if not path.exists(): return None, json.dumps( { @@ -2939,18 +2950,15 @@ def _format_success_criteria(criteria: list[SuccessCriterion]) -> str: # Test template for Claude to use when writing tests CONSTRAINT_TEST_TEMPLATE = '''@pytest.mark.asyncio -async def test_constraint_{constraint_id}_{scenario}(mock_mode): +async def test_constraint_{constraint_id}_{scenario}(runner, auto_responder, mock_mode): """Test: {description}""" - result = await default_agent.run({{"key": "value"}}, mock_mode=mock_mode) - - # IMPORTANT: result is an ExecutionResult object with these attributes: - # - result.success: bool - whether the agent succeeded - # - result.output: dict - the agent's output data (access data here!) 
- # - result.error: str or None - error message if failed + await auto_responder.start() + try: + result = await runner.run({{"key": "value"}}) + finally: + await auto_responder.stop() assert result.success, f"Agent failed: {{result.error}}" - - # Access output data via result.output output_data = result.output or {{}} # Add constraint-specific assertions here @@ -2958,18 +2966,15 @@ async def test_constraint_{constraint_id}_{scenario}(mock_mode): ''' SUCCESS_TEST_TEMPLATE = '''@pytest.mark.asyncio -async def test_success_{criteria_id}_{scenario}(mock_mode): +async def test_success_{criteria_id}_{scenario}(runner, auto_responder, mock_mode): """Test: {description}""" - result = await default_agent.run({{"key": "value"}}, mock_mode=mock_mode) - - # IMPORTANT: result is an ExecutionResult object with these attributes: - # - result.success: bool - whether the agent succeeded - # - result.output: dict - the agent's output data (access data here!) - # - result.error: str or None - error message if failed + await auto_responder.start() + try: + result = await runner.run({{"key": "value"}}) + finally: + await auto_responder.stop() assert result.success, f"Agent failed: {{result.error}}" - - # Access output data via result.output output_data = result.output or {{}} # Add success criteria-specific assertions here @@ -3025,7 +3030,6 @@ def generate_constraint_tests( test_type="Constraint", agent_name=agent_module, description=f"Tests for constraints defined in goal: {goal.name}", - agent_module=agent_module, ) # Return guidelines + data for Claude to write tests directly @@ -3041,14 +3045,22 @@ def generate_constraint_tests( "max_tests": 5, "naming_convention": "test_constraint__", "required_decorator": "@pytest.mark.asyncio", - "required_fixture": "mock_mode", - "agent_call_pattern": "await default_agent.run(input_dict, mock_mode=mock_mode)", + "required_fixtures": "runner, auto_responder, mock_mode", + "agent_call_pattern": "await runner.run(input_dict)", + 
"auto_responder_pattern": ( + "await auto_responder.start()\n" + "try:\n" + " result = await runner.run(input_dict)\n" + "finally:\n" + " await auto_responder.stop()" + ), "result_type": "ExecutionResult with .success, .output (dict), .error", "critical_rules": [ "Every test function MUST be async with @pytest.mark.asyncio", - "Every test MUST accept mock_mode as a parameter", - "Use await default_agent.run(input, mock_mode=mock_mode)", - "default_agent is already imported - do NOT add imports", + "Every test MUST accept runner, auto_responder, and mock_mode fixtures", + "Use await runner.run(input) -- NOT default_agent.run()", + "Start auto_responder before running, stop in finally block", + "runner and auto_responder are from conftest.py -- do NOT import them", "NEVER call result.get() - use result.output.get() instead", "Always check result.success before accessing result.output", ], @@ -3112,7 +3124,6 @@ def generate_success_tests( test_type="Success criteria", agent_name=agent_module, description=f"Tests for success criteria defined in goal: {goal.name}", - agent_module=agent_module, ) # Return guidelines + data for Claude to write tests directly @@ -3134,14 +3145,22 @@ def generate_success_tests( "max_tests": 12, "naming_convention": "test_success__", "required_decorator": "@pytest.mark.asyncio", - "required_fixture": "mock_mode", - "agent_call_pattern": "await default_agent.run(input_dict, mock_mode=mock_mode)", + "required_fixtures": "runner, auto_responder, mock_mode", + "agent_call_pattern": "await runner.run(input_dict)", + "auto_responder_pattern": ( + "await auto_responder.start()\n" + "try:\n" + " result = await runner.run(input_dict)\n" + "finally:\n" + " await auto_responder.stop()" + ), "result_type": "ExecutionResult with .success, .output (dict), .error", "critical_rules": [ "Every test function MUST be async with @pytest.mark.asyncio", - "Every test MUST accept mock_mode as a parameter", - "Use await default_agent.run(input, 
mock_mode=mock_mode)", - "default_agent is already imported - do NOT add imports", + "Every test MUST accept runner, auto_responder, and mock_mode fixtures", + "Use await runner.run(input) -- NOT default_agent.run()", + "Start auto_responder before running, stop in finally block", + "runner and auto_responder are from conftest.py -- do NOT import them", "NEVER call result.get() - use result.output.get() instead", "Always check result.success before accessing result.output", ], @@ -3238,11 +3257,13 @@ def run_tests( # Add short traceback and quiet summary cmd.append("--tb=short") - # Set PYTHONPATH to project root so agents can import from core.framework + # Set PYTHONPATH so framework and agent packages are importable env = os.environ.copy() pythonpath = env.get("PYTHONPATH", "") project_root = Path(__file__).parent.parent.parent.parent.resolve() - env["PYTHONPATH"] = f"{project_root}:{pythonpath}" + core_path = project_root / "core" + exports_path = project_root / "exports" + env["PYTHONPATH"] = f"{core_path}:{exports_path}:{project_root}:{pythonpath}" # Run pytest try: @@ -3712,7 +3733,11 @@ def check_missing_credentials( from framework.runner import AgentRunner - runner = AgentRunner.load(agent_path) + path, err = _validate_agent_path(agent_path) + if err: + return err + + runner = AgentRunner.load(str(path)) runner.validate() store = _get_credential_store() @@ -3912,7 +3937,11 @@ def verify_credentials( try: from framework.runner import AgentRunner - runner = AgentRunner.load(agent_path) + path, err = _validate_agent_path(agent_path) + if err: + return err + + runner = AgentRunner.load(str(path)) validation = runner.validate() return json.dumps( diff --git a/core/framework/testing/prompts.py b/core/framework/testing/prompts.py index 3bbe8898..08df7625 100644 --- a/core/framework/testing/prompts.py +++ b/core/framework/testing/prompts.py @@ -3,6 +3,10 @@ Pytest templates for test file generation. 
These templates provide headers and fixtures for pytest-compatible async tests. Tests are written to exports/{agent}/tests/ as Python files and run with pytest. + +Tests use AgentRunner.load() — the canonical runtime path — which creates +AgentRuntime, ExecutionStream, and proper session/log storage. For agents +with client-facing nodes, an auto_responder fixture handles input injection. """ # Template for the test file header (imports and fixtures) @@ -11,17 +15,19 @@ PYTEST_TEST_FILE_HEADER = '''""" {description} -REQUIRES: API_KEY (OpenAI or Anthropic) for real testing. +REQUIRES: API_KEY for execution tests. Structure tests run without keys. """ import os import pytest -from {agent_module} import default_agent +from pathlib import Path + +# Agent path resolved from this test file's location +AGENT_PATH = Path(__file__).resolve().parents[1] def _get_api_key(): """Get API key from CredentialStoreAdapter or environment.""" - # 1. Try CredentialStoreAdapter for Anthropic try: from aden_tools.credentials import CredentialStoreAdapter creds = CredentialStoreAdapter.default() @@ -29,28 +35,43 @@ def _get_api_key(): return creds.get("anthropic") except (ImportError, KeyError): pass - - # 2. Fallback to standard environment variables for OpenAI and others return ( os.environ.get("OPENAI_API_KEY") or os.environ.get("ANTHROPIC_API_KEY") or os.environ.get("CEREBRAS_API_KEY") or - os.environ.get("GROQ_API_KEY") + os.environ.get("GROQ_API_KEY") or + os.environ.get("GEMINI_API_KEY") ) # Skip all tests if no API key and not in mock mode pytestmark = pytest.mark.skipif( not _get_api_key() and not os.environ.get("MOCK_MODE"), - reason="API key required. Please set OPENAI_API_KEY, ANTHROPIC_API_KEY, or use MOCK_MODE=1." + reason="API key required. Set ANTHROPIC_API_KEY or use MOCK_MODE=1 for structure tests." 
) ''' # Template for conftest.py with shared fixtures PYTEST_CONFTEST_TEMPLATE = '''"""Shared test fixtures for {agent_name} tests.""" +import json import os +import re +import sys +from pathlib import Path + +# Add exports/ and core/ to sys.path so the agent package and framework are importable +_repo_root = Path(__file__).resolve().parents[3] +for _p in ["exports", "core"]: + _path = str(_repo_root / _p) + if _path not in sys.path: + sys.path.insert(0, _path) + import pytest +from framework.runner.runner import AgentRunner +from framework.runtime.event_bus import EventType + +AGENT_PATH = Path(__file__).resolve().parents[1] def _get_api_key(): @@ -62,19 +83,80 @@ def _get_api_key(): return creds.get("anthropic") except (ImportError, KeyError): pass - return ( os.environ.get("OPENAI_API_KEY") or os.environ.get("ANTHROPIC_API_KEY") or os.environ.get("CEREBRAS_API_KEY") or - os.environ.get("GROQ_API_KEY") + os.environ.get("GROQ_API_KEY") or + os.environ.get("GEMINI_API_KEY") ) -@pytest.fixture +@pytest.fixture(scope="session") def mock_mode(): - """Check if running in mock mode.""" - return bool(os.environ.get("MOCK_MODE")) + """Return True if running in mock mode (no API key or MOCK_MODE=1).""" + if os.environ.get("MOCK_MODE"): + return True + return not bool(_get_api_key()) + + +@pytest.fixture(scope="session") +async def runner(tmp_path_factory, mock_mode): + """Create an AgentRunner using the canonical runtime path. + + Uses tmp_path_factory for storage so tests don't pollute ~/.hive/agents/. + Goes through AgentRunner.load() -> _setup() -> AgentRuntime, the same + path as ``hive run``. + """ + storage = tmp_path_factory.mktemp("agent_storage") + r = AgentRunner.load( + AGENT_PATH, + mock_mode=mock_mode, + storage_path=storage, + ) + r._setup() + yield r + await r.cleanup_async() + + +@pytest.fixture +def auto_responder(runner): + """Auto-respond to client-facing node input requests. 
+ + Subscribes to CLIENT_INPUT_REQUESTED events and injects a response + to unblock the node. Customize the response before calling start(): + + auto_responder.response = "approve the report" + await auto_responder.start() + """ + class AutoResponder: + def __init__(self, runner_instance): + self._runner = runner_instance + self.response = "yes, proceed" + self.interactions = [] + self._sub_id = None + + async def start(self): + runtime = self._runner._agent_runtime + if runtime is None: + return + + async def _handle(event): + self.interactions.append(event.node_id) + await runtime.inject_input(event.node_id, self.response) + + self._sub_id = runtime.subscribe_to_events( + event_types=[EventType.CLIENT_INPUT_REQUESTED], + handler=_handle, + ) + + async def stop(self): + runtime = self._runner._agent_runtime + if self._sub_id and runtime: + runtime.unsubscribe_from_events(self._sub_id) + self._sub_id = None + + return AutoResponder(runner) @pytest.fixture(scope="session", autouse=True) @@ -82,19 +164,51 @@ def check_api_key(): """Ensure API key is set for real testing.""" if not _get_api_key(): if os.environ.get("MOCK_MODE"): - print("\\n⚠️ Running in MOCK MODE - structure validation only") - print(" This does NOT test LLM behavior or agent quality") - print(" Set OPENAI_API_KEY or ANTHROPIC_API_KEY for real testing\\n") + print("\\n Running in MOCK MODE - structure validation only") + print(" Set ANTHROPIC_API_KEY for real testing\\n") else: pytest.fail( - "\\n❌ No API key found!\\n\\n" - "Real testing requires an API key. Choose one:\\n" - "1. Set OpenAI key:\\n" - " export OPENAI_API_KEY='your-key-here'\\n" - "2. Set Anthropic key:\\n" - " export ANTHROPIC_API_KEY='your-key-here'\\n" - "3. Run structure validation only:\\n" - " MOCK_MODE=1 pytest exports/{agent_name}/tests/\\n\\n" - "Note: Mock mode does NOT validate agent behavior or quality." 
+                "\\nNo API key found!\\n"
+                "Set ANTHROPIC_API_KEY or use MOCK_MODE=1 for structure tests.\\n"
             )
+
+
+def parse_json_from_output(result, key):
+    """Parse JSON from agent output (framework may store full LLM response as string)."""
+    val = (result.output or {{}}).get(key, "")
+    if isinstance(val, (dict, list)):
+        return val
+    if isinstance(val, str):
+        json_text = re.sub(r"```json\\s*|\\s*```", "", val).strip()
+        try:
+            return json.loads(json_text)
+        except (json.JSONDecodeError, TypeError):
+            return val
+    return val
+
+
+def safe_get_nested(result, key_path, default=None):
+    """Safely get nested value from result.output."""
+    output = result.output or {{}}
+    current = output
+    for key in key_path:
+        if isinstance(current, dict):
+            current = current.get(key)
+        elif isinstance(current, str):
+            try:
+                json_text = re.sub(r"```json\\s*|\\s*```", "", current).strip()
+                parsed = json.loads(json_text)
+                if isinstance(parsed, dict):
+                    current = parsed.get(key)
+                else:
+                    return default
+            except json.JSONDecodeError:
+                return default
+        else:
+            return default
+    return current if current is not None else default
+
+
+pytest.parse_json_from_output = parse_json_from_output
+pytest.safe_get_nested = safe_get_nested
 '''
diff --git a/quickstart.sh b/quickstart.sh
index e4e12691..b2700837 100755
--- a/quickstart.sh
+++ b/quickstart.sh
@@ -717,6 +717,16 @@ else
     echo -e "${YELLOW}--${NC}"
 fi
 
+echo -n " ⬡ local settings... "
+if [ -f "$SCRIPT_DIR/.claude/settings.local.json" ]; then
+    echo -e "${GREEN}ok${NC}"
+elif [ -f "$SCRIPT_DIR/.claude/settings.local.json.example" ]; then
+    cp "$SCRIPT_DIR/.claude/settings.local.json.example" "$SCRIPT_DIR/.claude/settings.local.json"
+    echo -e "${GREEN}copied from example${NC}"
+else
+    echo -e "${YELLOW}--${NC}"
+fi
+
 echo -n " ⬡ credential store... "
 if [ -n "$HIVE_CREDENTIAL_KEY" ] && [ -d "$HOME/.hive/credentials/credentials" ]; then
     echo -e "${GREEN}ok${NC}"

From 8cfb533fefb5d7f3752e7a01d4a4d878fbe086e4 Mon Sep 17 00:00:00 2001
From: vakrahul
Date: Tue, 10 Feb 2026 09:10:47 +0530
Subject: [PATCH 12/22] Fix docs: remove recommended tag, fix rendering, remove duplicate setup

---
 README.md                 |   2 +-
 docs/developer-guide.md   |  12 +-
 docs/environment-setup.md |   6 +-
 setup_opencode.py         |  82 ---------
 tools/requirements.txt    | 368 --------------------------------------
 5 files changed, 8 insertions(+), 462 deletions(-)
 delete mode 100644 setup_opencode.py
 delete mode 100644 tools/requirements.txt

diff --git a/README.md b/README.md
index 0f875b62..6a4463fc 100644
--- a/README.md
+++ b/README.md
@@ -122,7 +122,7 @@ hive run exports/your_agent_name --input '{"key": "value"}'
 ```
 
 ## Coding Agent Support
-### Opencode (Recommended)
+### Opencode
 Hive includes native support for [Opencode](https://github.com/opencode-ai/opencode).
 
 1. **Setup:** Run the quickstart script or `python setup_opencode.py`.
diff --git a/docs/developer-guide.md b/docs/developer-guide.md
index 0307ff20..f54afdfb 100644
--- a/docs/developer-guide.md
+++ b/docs/developer-guide.md
@@ -117,13 +117,13 @@ Skills are also available in Cursor. To enable:
 4. Type `/` in Agent chat and search for skills (e.g., `/hive-create`)
 
 ```markdown
-### Opencode Integration
+### Opencode Support
+To enable Opencode integration:
 
-The Opencode integration leverages the same MCP servers used by Cursor and Claude Code.
-
-* **Configuration:** Located in `.opencode/mcp.json`.
-* **Agent Definition:** The Hive agent is defined in `.opencode/agents/hive.md`.
-* **Skills:** The integration reuses existing skills from `.claude/skills/`. The Hive agent is configured to access these patterns directly, ensuring consistency across all coding agents.
+1. Ensure the `.opencode/` directory exists
+2. Configure MCP servers in `.opencode/mcp.json`
+3. Restart Opencode to load the MCP servers
+4. Switch to the Hive agent
 * **Tools:** Accesses `agent-builder` and standard `tools` via standard MCP protocols over stdio.
 ```
 ### Verify Setup
diff --git a/docs/environment-setup.md b/docs/environment-setup.md
index bc3071de..32191135 100644
--- a/docs/environment-setup.md
+++ b/docs/environment-setup.md
@@ -538,13 +538,9 @@ python setup_opencode.py
 - **Example Agents:** [exports/](../exports/)
 - **Agent Building Guide:** [.claude/skills/hive-create/SKILL.md](../.claude/skills/hive-create/SKILL.md)
 - **Testing Guide:** [.claude/skills/hive-test/SKILL.md](../.claude/skills/hive-test/SKILL.md)
-## Opencode Setup
-[Opencode](https://github.com/opencode-ai/opencode) is fully supported as a coding agent.
-### Automatic Setup
-Run the setup script in the root directory:
-```bash
+
 
 ## Contributing
diff --git a/setup_opencode.py b/setup_opencode.py
deleted file mode 100644
index 8748dae6..00000000
--- a/setup_opencode.py
+++ /dev/null
@@ -1,82 +0,0 @@
-import os
-import sys
-import json
-
-def setup_opencode():
-    print("✨ Setting up Opencode integration...")
-
-    # 1. Define paths
-    base_dir = os.getcwd()
-    opencode_dir = os.path.join(base_dir, ".opencode")
-    agents_dir = os.path.join(opencode_dir, "agents")
-
-    # Create directories
-    if not os.path.exists(agents_dir):
-        os.makedirs(agents_dir)
-        print(f" Created directory: {agents_dir}")
-
-    # 2. Determine Path Separator (Windows uses ';' others use ':')
-    # We force ';' if on Windows to be safe
-    path_sep = ";" if os.name == 'nt' else ":"
-    print(f" Detected OS: {os.name} (using separator '{path_sep}')")
-
-    # 3. Create mcp.json
-    mcp_config = {
-        "mcpServers": {
-            "agent-builder": {
-                "command": "python",
-                "args": ["-m", "framework.mcp.agent_builder_server"],
-                "cwd": "core",
-                "env": {
-                    "PYTHONPATH": f"../tools/src{path_sep}."
- } - }, - "tools": { - "command": "python", - "args": ["mcp_server.py", "--stdio"], - "cwd": "tools", - "env": { - "PYTHONPATH": f"src{path_sep}../core" - } - } - } - } - - mcp_path = os.path.join(opencode_dir, "mcp.json") - with open(mcp_path, "w") as f: - json.dump(mcp_config, f, indent=2) - print(f"✅ Created {mcp_path}") - - # 4. Create Hive Agent - agent_content = """--- -name: hive -description: Hive Agent Builder & Manager -mode: primary -model: anthropic/claude-3-5-sonnet-20241022 -tools: - agent-builder: true - tools: true ---- - -# Hive Agent -You are the Hive Agent Builder. Your goal is to help the user construct, configure, and deploy AI agents using the Hive framework. - -## Capabilities -1. **Scaffold Agents:** Create new agent directories/configs. -2. **Manage Tools:** Add/remove tools via MCP. -3. **Debug:** Analyze agent workflows. - -## Context & Skills -- You have access to all skills in `.claude/skills/`. -- Always use the `agent-builder` MCP server for filesystem operations. -""" - - agent_path = os.path.join(agents_dir, "hive.md") - with open(agent_path, "w", encoding="utf-8") as f: - f.write(agent_content) - - print(f"✅ Created {agent_path}") - print("\n🎉 Setup Complete! 
Restart Opencode and type '/hive' to begin.") - -if __name__ == "__main__": - setup_opencode() \ No newline at end of file diff --git a/tools/requirements.txt b/tools/requirements.txt deleted file mode 100644 index 8dfd136f..00000000 --- a/tools/requirements.txt +++ /dev/null @@ -1,368 +0,0 @@ -absl-py==2.3.1 -aiofiles==25.1.0 -aiohappyeyeballs==2.6.1 -aiohttp==3.13.3 -aiohttp-retry==2.8.3 -aiosignal==1.4.0 -altair==6.0.0 -annotated-doc==0.0.4 -annotated-types==0.7.0 -ansicon==1.89.0 -anthropic==0.76.0 -anyio==4.12.1 -asgiref==3.8.1 -astunparse==1.6.3 -attrs==25.4.0 -Authlib==1.6.6 -av==16.1.0 -backoff==2.2.1 -bcrypt==4.3.0 -beartype==0.22.9 -beautifulsoup4==4.14.3 -blessed==1.25.0 -blinker==1.9.0 -build==1.3.0 -CacheControl==0.14.3 -cachetools==5.5.2 -certifi==2026.1.4 -cffi==2.0.0 -charset-normalizer==3.4.4 -chromadb==1.3.6 -click==8.3.1 -cloudpickle==3.1.2 -colorama==0.4.6 -coloredlogs==15.0.1 -comtypes==1.4.10 -contourpy==1.3.2 -cosdata-fastembed==0.7.1 -cosdata-sdk==0.2.5 -croniter==6.0.0 -cryptography==46.0.1 -cycler==0.12.1 -cyclopts==4.5.1 -dataclasses-json==0.6.7 -datasets==4.4.1 -deprecation==2.1.0 -diff-match-patch==20241021 -dill==0.4.0 -diskcache==5.6.3 -distlib==0.3.9 -distro==1.9.0 -Django==5.2.1 -djangorestframework==3.16.0 -dlib @ file:///C:/Users/RAHUL/Downloads/dlib-19.24.99-cp312-cp312-win_amd64.whl#sha256=20c62e606ca4c9961305f7be3d03990380d3e6c17f8d27798996e97a73271862 -dnspython==2.8.0 -docstring_parser==0.17.0 -docutils==0.22.4 -dotenv==0.9.9 -durationpy==0.10 -easyocr==1.7.2 -editor==1.6.6 -email-validator==2.3.0 -eval_type_backport==0.3.1 -exceptiongroup==1.3.1 -execnet==2.1.2 -face-recognition==1.3.0 -face_recognition_models==0.3.0 -fakeredis==2.33.0 -fastapi==0.121.3 -fastmcp==2.14.5 -fastuuid==0.14.0 -filelock==3.16.1 -filetype==1.2.0 -firebase_admin==7.1.0 -Flask==3.1.2 -Flask-Bcrypt==1.0.1 -flask-cors==6.0.2 -Flask-Login==0.6.3 -Flask-SQLAlchemy==3.1.1 -flatbuffers==25.12.19 -fonttools==4.59.0 -frozenlist==1.8.0 -fsspec==2025.9.0 
-gast==0.6.0 -gitdb==4.0.12 -GitPython==3.1.45 -google-ai-generativelanguage==0.6.15 -google-api-core==2.25.1 -google-api-python-client==2.187.0 -google-auth==2.48.0 -google-auth-httplib2==0.2.1 -google-cloud-core==2.4.3 -google-cloud-firestore==2.21.0 -google-cloud-storage==3.4.0 -google-crc32c==1.7.1 -google-genai==1.60.0 -google-generativeai==0.8.5 -google-pasta==0.2.0 -google-resumable-media==2.7.2 -googleapis-common-protos==1.72.0 -greenlet==3.1.1 -grpcio==1.76.0 -grpcio-status==1.71.2 -h11==0.16.0 -h2==4.3.0 -h5py==3.14.0 -hpack==4.1.0 -httpcore==1.0.9 -httplib2==0.31.0 -httptools==0.7.1 -httpx==0.28.1 -httpx-sse==0.4.3 -huggingface-hub==0.36.0 -humanfriendly==10.0 -hyperframe==6.1.0 -idna==3.11 -imageio==2.37.0 -importlib_metadata==8.7.1 -importlib_resources==6.5.2 -iniconfig==2.3.0 -inquirer==3.4.1 -itsdangerous==2.2.0 -jaraco.classes==3.4.0 -jaraco.context==6.1.0 -jaraco.functools==4.4.0 -Jinja2==3.1.4 -jinxed==1.3.0 -jiter==0.12.0 -joblib==1.5.1 -jsonpatch==1.33 -jsonpath-ng==1.7.0 -jsonpointer==3.0.0 -jsonref==1.1.0 -jsonschema==4.25.1 -jsonschema-path==0.3.4 -jsonschema-specifications==2025.9.1 -keras==3.10.0 -keyring==25.7.0 -kiwisolver==1.4.8 -kubernetes==34.1.0 -langchain==1.2.7 -langchain-anthropic==1.3.1 -langchain-classic==1.0.1 -langchain-community==0.4.1 -langchain-core==1.2.7 -langchain-google-genai==4.2.0 -langchain-huggingface==1.2.0 -langchain-openai==1.1.7 -langchain-text-splitters==1.1.0 -langdetect==1.0.9 -langgraph==1.0.7 -langgraph-checkpoint==4.0.0 -langgraph-prebuilt==1.0.7 -langgraph-sdk==0.3.3 -langsmith==0.6.6 -lazy_loader==0.4 -libclang==18.1.1 -livekit==1.0.23 -livekit-agents==1.3.12 -livekit-api==1.1.0 -livekit-blingfire==1.1.0 -livekit-plugins-elevenlabs==1.3.12 -livekit-plugins-openai==1.3.12 -livekit-plugins-silero==1.3.12 -livekit-protocol==1.1.2 -loguru==0.7.3 -lupa==2.6 -lxml==6.0.1 -Markdown==3.8.2 -markdown-it-py==4.0.0 -MarkupSafe==3.0.2 -marshmallow==3.26.2 -matplotlib==3.10.3 -mcp==1.26.0 -mdurl==0.1.2 -Mesa==3.1.2 
-ml_dtypes==0.5.1 -mmh3==5.2.0 -more-itertools==10.8.0 -mpmath==1.3.0 -msgpack==1.1.1 -multidict==6.7.0 -multiprocess==0.70.18 -mypy_extensions==1.1.0 -mysqlclient==2.2.7 -namex==0.1.0 -narwhals==2.14.0 -nest-asyncio==1.6.0 -networkx==3.5 -ninja==1.13.0 -numpy==2.4.1 -oauthlib==3.3.1 -onnxruntime==1.23.1 -openai==2.15.0 -openapi-pydantic==0.5.1 -opencv-contrib-python==4.11.0.86 -opencv-python==4.10.0.84 -opencv-python-headless==4.12.0.88 -opentelemetry-api==1.39.1 -opentelemetry-exporter-otlp==1.39.1 -opentelemetry-exporter-otlp-proto-common==1.39.1 -opentelemetry-exporter-otlp-proto-grpc==1.39.1 -opentelemetry-exporter-otlp-proto-http==1.39.1 -opentelemetry-proto==1.39.1 -opentelemetry-sdk==1.39.1 -opentelemetry-semantic-conventions==0.60b1 -opt_einsum==3.4.0 -optree==0.17.0 -orjson==3.11.5 -ormsgpack==1.12.2 -overrides==7.7.0 -packaging==25.0 -pandas==2.2.3 -pathable==0.4.4 -pathspec==0.12.1 -pathvalidate==3.3.1 -patsy==1.0.1 -pdf2image==1.17.0 -pdfminer.six==20251107 -pdfplumber==0.11.8 -pillow==12.1.0 -platformdirs==4.3.6 -playwright==1.58.0 -playwright-stealth==2.0.1 -pluggy==1.6.0 -ply==3.11 -postgrest==2.27.3 -posthog==5.4.0 -prometheus_client==0.24.1 -propcache==0.4.1 -proto-plus==1.26.1 -protobuf==5.29.5 -psutil==7.2.1 -py-key-value-aio==0.3.0 -py-key-value-shared==0.3.0 -py_rust_stemmers==0.1.5 -pyarrow==22.0.0 -pyasn1==0.6.1 -pyasn1_modules==0.4.2 -pybase64==1.4.3 -pyclipper==1.3.0.post6 -pycparser==3.0 -pydantic==2.12.5 -pydantic-settings==2.12.0 -pydantic_core==2.41.5 -pydeck==0.9.1 -pydocket==0.17.5 -pyee==13.0.0 -PyGithub==2.8.1 -Pygments==2.19.2 -pyiceberg==0.10.0 -PyJWT==2.10.1 -PyNaCl==1.6.2 -pyparsing==3.2.3 -pypdf==6.6.2 -pypdfium2==5.1.0 -pyperclip==1.11.0 -PyPika==0.48.9 -pypiwin32==223 -pyproject_hooks==1.2.0 -pyreadline3==3.5.4 -pyroaring==1.0.3 -pytesseract==0.3.13 -pytest==9.0.2 -pytest-asyncio==1.3.0 -python-bidi==0.6.6 -python-dateutil==2.9.0.post0 -python-docx==1.2.0 -python-dotenv==1.2.1 -python-json-logger==4.0.0 
-python-magic-bin==0.4.14 -python-multipart==0.0.20 -pytrends==4.9.2 -pyttsx3==2.98 -pytz==2024.2 -pywin32==310 -pywin32-ctypes==0.2.3 -PyYAML==6.0.2 -readchar==4.2.1 -realtime==2.27.3 -redis==7.1.0 -referencing==0.36.2 -regex==2025.11.3 -reportlab==4.4.5 -requests==2.32.5 -requests-oauthlib==2.0.0 -requests-toolbelt==1.0.0 -resend==2.21.0 -rich==14.3.1 -rich-rst==1.3.2 -rpds-py==0.30.0 -rsa==4.9.1 -ruff==0.14.14 -runs==1.2.2 -safetensors==0.7.0 -scikit-image==0.25.2 -scikit-learn==1.7.1 -scipy==1.16.0 -seaborn==0.13.2 -sentence-transformers==5.1.2 -setuptools==80.9.0 -shapely==2.1.1 -shellingham==1.5.4 -six==1.17.0 -smmap==5.0.2 -sniffio==1.3.1 -sortedcontainers==2.4.0 -sounddevice==0.5.5 -soupsieve==2.8.3 -SpeechRecognition==3.14.2 -SQLAlchemy==2.0.38 -sqlparse==0.5.3 -sse-starlette==3.2.0 -starlette==0.50.0 -statsmodels==0.14.5 -storage3==2.27.3 -streamlit==1.52.2 -StrEnum==0.4.15 -strictyaml==1.7.3 -stripe==14.3.0 -supabase==2.27.3 -supabase-auth==2.27.3 -supabase-functions==2.27.3 -sympy==1.14.0 -tenacity==9.1.2 -tensorboard==2.19.0 -tensorboard-data-server==0.7.2 -tensorflow==2.19.0 -termcolor==3.1.0 -tf_keras==2.19.0 -threadpoolctl==3.6.0 -tifffile==2025.9.9 -tiktoken==0.12.0 -tokenizers==0.22.1 -toml==0.10.2 --e git+https://github.com/vakrahul/hive_work.git@ee42ceee00258707624bbb9b1c47341a3ce58cbd#egg=tools&subdirectory=tools -torch==2.8.0 -torchaudio==2.8.0 -torchvision==0.23.0 -tornado==6.5.4 -tqdm==4.67.1 -transformers==4.57.1 -twilio==9.4.3 -typer==0.21.1 -types-protobuf==6.32.1.20251210 -typing-inspect==0.9.0 -typing-inspection==0.4.2 -typing_extensions==4.15.0 -tzdata==2024.2 -uritemplate==4.2.0 -urllib3==2.6.3 -uuid_utils==0.14.0 -uv==0.10.0 -uvicorn==0.38.0 -virtualenv==20.28.0 -watchdog==6.0.0 -watchfiles==1.1.1 -wcwidth==0.2.14 -websocket-client==1.9.0 -websockets==15.0.1 -Werkzeug==3.1.3 -wheel==0.45.1 -win32_setctime==1.2.0 -wrapt==1.17.2 -xmod==1.8.1 -xxhash==3.6.0 -yarl==1.22.0 -zipp==3.23.0 -zstandard==0.25.0 From 
929dc24e9331f10bbecf6935ae691d7abfeb99f7 Mon Sep 17 00:00:00 2001 From: vakrahul Date: Tue, 10 Feb 2026 09:12:42 +0530 Subject: [PATCH 13/22] Fix config: use uv for mcp and remove pinned model --- .opencode/agents/hive.md | 7 ++-- .opencode/mcp.json | 26 +++++++++--- quickstart.sh | 85 ---------------------------------------- 3 files changed, 23 insertions(+), 95 deletions(-) diff --git a/.opencode/agents/hive.md b/.opencode/agents/hive.md index eb2c18ab..6c70024c 100644 --- a/.opencode/agents/hive.md +++ b/.opencode/agents/hive.md @@ -2,7 +2,6 @@ name: hive description: Hive Agent Builder & Manager mode: primary -model: anthropic/claude-3-5-sonnet-20241022 tools: agent-builder: true tools: true @@ -16,6 +15,6 @@ You are the Hive Agent Builder. Your goal is to help the user construct, configu 2. **Manage Tools:** Add/remove tools via MCP. 3. **Debug:** Analyze agent workflows. -## Context & Skills -- You have access to all skills in `.claude/skills/`. -- Always use the `agent-builder` MCP server for filesystem operations. +## Context +- You are an expert in the Hive framework architecture. +- Always use the `agent-builder` MCP server for filesystem operations. \ No newline at end of file diff --git a/.opencode/mcp.json b/.opencode/mcp.json index 521a5b4b..74cb5c27 100644 --- a/.opencode/mcp.json +++ b/.opencode/mcp.json @@ -1,16 +1,30 @@ { "mcpServers": { "agent-builder": { - "command": "python", - "args": ["-m", "framework.mcp.agent_builder_server"], + "command": "uv", + "args": [ + "run", + "python", + "-m", + "framework.mcp.agent_builder_server" + ], "cwd": "core", - "env": { "PYTHONPATH": "../tools/src:." 
} + "env": { + "PYTHONPATH": "../tools/src" + } }, "tools": { - "command": "python", - "args": ["mcp_server.py", "--stdio"], + "command": "uv", + "args": [ + "run", + "python", + "mcp_server.py", + "--stdio" + ], "cwd": "tools", - "env": { "PYTHONPATH": "src:../core" } + "env": { + "PYTHONPATH": "src" + } } } } \ No newline at end of file diff --git a/quickstart.sh b/quickstart.sh index 74fe2487..5170c46f 100755 --- a/quickstart.sh +++ b/quickstart.sh @@ -821,88 +821,3 @@ fi echo -e "${DIM}Run ./quickstart.sh again to reconfigure.${NC}" echo "" -# ========================================== -# Opencode Setup (Auto-Generated) -# ========================================== -if command -v opencode &> /dev/null; then - echo "✨ Opencode detected! Setting up Hive integration..." - - # Create directory structure - mkdir -p .opencode/agents - - # Determine OS for correct path separator (Windows uses ';', Mac/Linux uses ':') - if [[ "$OSTYPE" == "msys" || "$OSTYPE" == "cygwin" || "$OSTYPE" == "win32" ]]; then - PATH_SEP=";" - else - PATH_SEP=":" - fi - - # 1. Generate mcp.json with correct separator - cat > .opencode/mcp.json < .opencode/agents/hive.md < /dev/null; then - python setup_opencode.py -fi -# ========================================== -# Opencode Integration -# ========================================== -if command -v python &> /dev/null; then - echo "🐍 Python detected. Running Opencode setup..." - python setup_opencode.py -elif command -v python3 &> /dev/null; then - echo "🐍 Python3 detected. Running Opencode setup..." - python3 setup_opencode.py -else - echo "⚠️ Python not found. Skipping Opencode setup." 
-fi \ No newline at end of file From 767d32d42035c4ed2d439dc85d5600b9cfc323fa Mon Sep 17 00:00:00 2001 From: Timothy Date: Mon, 9 Feb 2026 19:51:50 -0800 Subject: [PATCH 14/22] fix: stuck test cases --- core/framework/graph/event_loop_node.py | 7 ++++--- core/tests/test_event_loop_integration.py | 12 ++++++++++-- 2 files changed, 14 insertions(+), 5 deletions(-) diff --git a/core/framework/graph/event_loop_node.py b/core/framework/graph/event_loop_node.py index ae95e339..1c10c291 100644 --- a/core/framework/graph/event_loop_node.py +++ b/core/framework/graph/event_loop_node.py @@ -1011,7 +1011,7 @@ class EventLoopNode(NodeProtocol): is_error=result.is_error, ) if not result.is_error: - value = tc.tool_input["value"] + value = tc.tool_input.get("value", "") # Parse JSON strings into native types so downstream # consumers get lists/dicts instead of serialised JSON, # and the hallucination validator skips non-string values. @@ -1022,8 +1022,9 @@ class EventLoopNode(NodeProtocol): value = parsed except (json.JSONDecodeError, TypeError): pass - await accumulator.set(tc.tool_input["key"], value) - outputs_set_this_turn.append(tc.tool_input["key"]) + key = tc.tool_input.get("key", "") + await accumulator.set(key, value) + outputs_set_this_turn.append(key) logged_tool_calls.append( { "tool_use_id": tc.tool_use_id, diff --git a/core/tests/test_event_loop_integration.py b/core/tests/test_event_loop_integration.py index 90e91b34..4834662c 100644 --- a/core/tests/test_event_loop_integration.py +++ b/core/tests/test_event_loop_integration.py @@ -951,8 +951,16 @@ async def test_client_facing_node_streams_output(): config=LoopConfig(max_iterations=5), ) - # Text-only on client_facing no longer blocks (no ask_user called), - # so the node completes without needing a shutdown workaround. + # Client-facing text-only turns block for user input (by design). + # Signal shutdown when the node requests input so it exits cleanly.
+ async def on_input_requested(event: AgentEvent) -> None: + node.signal_shutdown() + + bus.subscribe( + event_types=[EventType.CLIENT_INPUT_REQUESTED], + handler=on_input_requested, + ) + result = await node.execute(ctx) assert result.success From 87b0037fcd173643df525cb289b3caf0b787f963 Mon Sep 17 00:00:00 2001 From: vakrahul Date: Tue, 10 Feb 2026 09:25:43 +0530 Subject: [PATCH 15/22] Add Opencode skills (mirrored from .cursor/skills) --- .opencode/skills/hive | 1 + .opencode/skills/hive-concepts | 1 + .opencode/skills/hive-create | 1 + .opencode/skills/hive-credentials | 1 + .opencode/skills/hive-patterns | 1 + .opencode/skills/hive-test | 1 + 6 files changed, 6 insertions(+) create mode 100644 .opencode/skills/hive create mode 100644 .opencode/skills/hive-concepts create mode 100644 .opencode/skills/hive-create create mode 100644 .opencode/skills/hive-credentials create mode 100644 .opencode/skills/hive-patterns create mode 100644 .opencode/skills/hive-test diff --git a/.opencode/skills/hive b/.opencode/skills/hive new file mode 100644 index 00000000..47ca6b8e --- /dev/null +++ b/.opencode/skills/hive @@ -0,0 +1 @@ +../../.claude/skills/hive \ No newline at end of file diff --git a/.opencode/skills/hive-concepts b/.opencode/skills/hive-concepts new file mode 100644 index 00000000..4f460b1d --- /dev/null +++ b/.opencode/skills/hive-concepts @@ -0,0 +1 @@ +../../.claude/skills/hive-concepts \ No newline at end of file diff --git a/.opencode/skills/hive-create b/.opencode/skills/hive-create new file mode 100644 index 00000000..9247f883 --- /dev/null +++ b/.opencode/skills/hive-create @@ -0,0 +1 @@ +../../.claude/skills/hive-create \ No newline at end of file diff --git a/.opencode/skills/hive-credentials b/.opencode/skills/hive-credentials new file mode 100644 index 00000000..610180f4 --- /dev/null +++ b/.opencode/skills/hive-credentials @@ -0,0 +1 @@ +../../.claude/skills/hive-credentials \ No newline at end of file diff --git a/.opencode/skills/hive-patterns 
b/.opencode/skills/hive-patterns new file mode 100644 index 00000000..c18612b5 --- /dev/null +++ b/.opencode/skills/hive-patterns @@ -0,0 +1 @@ +../../.claude/skills/hive-patterns \ No newline at end of file diff --git a/.opencode/skills/hive-test b/.opencode/skills/hive-test new file mode 100644 index 00000000..d2377d0e --- /dev/null +++ b/.opencode/skills/hive-test @@ -0,0 +1 @@ +../../.claude/skills/hive-test \ No newline at end of file From 1fd56b079c6161d70efd6d53ff02f7af80bc41ac Mon Sep 17 00:00:00 2001 From: Timothy Date: Mon, 9 Feb 2026 20:12:37 -0800 Subject: [PATCH 16/22] fix: test cases --- core/framework/graph/event_loop_node.py | 29 +++++------------------ core/tests/test_event_loop_integration.py | 12 ++-------- 2 files changed, 8 insertions(+), 33 deletions(-) diff --git a/core/framework/graph/event_loop_node.py b/core/framework/graph/event_loop_node.py index 1c10c291..79591057 100644 --- a/core/framework/graph/event_loop_node.py +++ b/core/framework/graph/event_loop_node.py @@ -486,29 +486,12 @@ class EventLoopNode(NodeProtocol): # 6h. Client-facing input blocking # - # For client_facing nodes, block for user input when: - # - The LLM explicitly called ask_user(), OR - # - The LLM produced a turn with no real tool calls - # (text-only or set_output-only). - # - # Before the first user interaction, set_output alone does - # NOT prevent blocking — the node must present its output - # to the user first. After the user has interacted at - # least once, set_output-only turns flow through (the LLM - # is finishing up based on user input). - # - # After user input, always fall through to judge evaluation - # (6i). The judge handles all acceptance decisions. 
- if user_interaction_count == 0: - # No user interaction yet — block unless ask_user or - # real tools were called (set_output alone is not enough) - needs_user_input = user_input_requested or not real_tool_results - else: - # User has already interacted — set_output can bypass - needs_user_input = user_input_requested or ( - not real_tool_results and not outputs_set - ) - if ctx.node_spec.client_facing and needs_user_input: + # Block ONLY when the LLM explicitly calls ask_user(). + # Text-only turns and set_output-only turns flow through + # without blocking, allowing progress updates and summaries + # to stream freely. After user input arrives, fall through + # to judge evaluation (6i) — the judge handles acceptance. + if ctx.node_spec.client_facing and user_input_requested: if self._shutdown: await self._publish_loop_completed(stream_id, node_id, iteration + 1) latency_ms = int((time.time() - start_time) * 1000) diff --git a/core/tests/test_event_loop_integration.py b/core/tests/test_event_loop_integration.py index 4834662c..d19f008a 100644 --- a/core/tests/test_event_loop_integration.py +++ b/core/tests/test_event_loop_integration.py @@ -951,16 +951,8 @@ async def test_client_facing_node_streams_output(): config=LoopConfig(max_iterations=5), ) - # Client-facing text-only turns block for user input (by design). - # Signal shutdown when the node requests input so it exits cleanly. - async def on_input_requested(event: AgentEvent) -> None: - node.signal_shutdown() - - bus.subscribe( - event_types=[EventType.CLIENT_INPUT_REQUESTED], - handler=on_input_requested, - ) - + # Text-only on client_facing does not block (no ask_user called), + # so the node completes without needing a shutdown workaround. 
result = await node.execute(ctx) assert result.success From a1a0ec5ddb3e76b9b7f558ff54104664b8895095 Mon Sep 17 00:00:00 2001 From: vakrahul Date: Tue, 10 Feb 2026 09:45:44 +0530 Subject: [PATCH 17/22] Remove skills folder (reviewer will handle symlinks) and cleanup doc refs --- .opencode/skills/hive | 1 - .opencode/skills/hive-concepts | 1 - .opencode/skills/hive-create | 1 - .opencode/skills/hive-credentials | 1 - .opencode/skills/hive-patterns | 1 - .opencode/skills/hive-test | 1 - docs/environment-setup.md | 2 +- 7 files changed, 1 insertion(+), 7 deletions(-) delete mode 100644 .opencode/skills/hive delete mode 100644 .opencode/skills/hive-concepts delete mode 100644 .opencode/skills/hive-create delete mode 100644 .opencode/skills/hive-credentials delete mode 100644 .opencode/skills/hive-patterns delete mode 100644 .opencode/skills/hive-test diff --git a/.opencode/skills/hive b/.opencode/skills/hive deleted file mode 100644 index 47ca6b8e..00000000 --- a/.opencode/skills/hive +++ /dev/null @@ -1 +0,0 @@ -../../.claude/skills/hive \ No newline at end of file diff --git a/.opencode/skills/hive-concepts b/.opencode/skills/hive-concepts deleted file mode 100644 index 4f460b1d..00000000 --- a/.opencode/skills/hive-concepts +++ /dev/null @@ -1 +0,0 @@ -../../.claude/skills/hive-concepts \ No newline at end of file diff --git a/.opencode/skills/hive-create b/.opencode/skills/hive-create deleted file mode 100644 index 9247f883..00000000 --- a/.opencode/skills/hive-create +++ /dev/null @@ -1 +0,0 @@ -../../.claude/skills/hive-create \ No newline at end of file diff --git a/.opencode/skills/hive-credentials b/.opencode/skills/hive-credentials deleted file mode 100644 index 610180f4..00000000 --- a/.opencode/skills/hive-credentials +++ /dev/null @@ -1 +0,0 @@ -../../.claude/skills/hive-credentials \ No newline at end of file diff --git a/.opencode/skills/hive-patterns b/.opencode/skills/hive-patterns deleted file mode 100644 index c18612b5..00000000 --- 
a/.opencode/skills/hive-patterns +++ /dev/null @@ -1 +0,0 @@ -../../.claude/skills/hive-patterns \ No newline at end of file diff --git a/.opencode/skills/hive-test b/.opencode/skills/hive-test deleted file mode 100644 index d2377d0e..00000000 --- a/.opencode/skills/hive-test +++ /dev/null @@ -1 +0,0 @@ -../../.claude/skills/hive-test \ No newline at end of file diff --git a/docs/environment-setup.md b/docs/environment-setup.md index 32191135..b9890b0b 100644 --- a/docs/environment-setup.md +++ b/docs/environment-setup.md @@ -528,7 +528,7 @@ export AGENT_STORAGE_PATH="/custom/storage" ### Automatic Setup Run the setup script in the root directory: ```bash -python setup_opencode.py + ``` ## Additional Resources From 80a49806409ccc528b10d338008b728b40b9467f Mon Sep 17 00:00:00 2001 From: e-cesar9 <109167667+e-cesar9@users.noreply.github.com> Date: Wed, 11 Feb 2026 00:15:17 +0100 Subject: [PATCH 18/22] micro-fix: change debug logging to appropriate level in executor.py MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Changes logger.info() with debug prefix to logger.debug() for session state resume information. This prevents debug-level information from appearing in production logs at INFO level. 
- Removes redundant '🔍 Debug:' prefix - Uses appropriate debug logging level - Follows Python logging best practices - Improves production log clarity Addresses #4377 --- core/framework/graph/executor.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/core/framework/graph/executor.py b/core/framework/graph/executor.py index a75cea41..63cfabed 100644 --- a/core/framework/graph/executor.py +++ b/core/framework/graph/executor.py @@ -368,7 +368,7 @@ class GraphExecutor: # Check if resuming from paused_at (session state resume) paused_at = session_state.get("paused_at") if session_state else None node_ids = [n.id for n in graph.nodes] - self.logger.info(f"🔍 Debug: paused_at={paused_at}, available node IDs={node_ids}") + self.logger.debug(f"paused_at={paused_at}, available node IDs={node_ids}") if paused_at and graph.get_node(paused_at) is not None: # Resume from paused_at node directly (works for any node, not just pause_nodes) From 48b92412472fe67d885ccba5ad29ff3c3366bab5 Mon Sep 17 00:00:00 2001 From: vakrahul Date: Wed, 11 Feb 2026 07:18:17 +0530 Subject: [PATCH 19/22] chore: address PR review feedback (formatting and outdated references) --- README.md | 3 +-- docs/developer-guide.md | 2 +- docs/environment-setup.md | 5 ----- 3 files changed, 2 insertions(+), 8 deletions(-) diff --git a/README.md b/README.md index 6a4463fc..32d9daff 100644 --- a/README.md +++ b/README.md @@ -121,11 +121,10 @@ hive tui hive run exports/your_agent_name --input '{"key": "value"}' ``` ## Coding Agent Support - ### Opencode Hive includes native support for [Opencode](https://github.com/opencode-ai/opencode). -1. **Setup:** Run the quickstart script or `python setup_opencode.py`. +1. **Setup:** Run the quickstart script 2. **Launch:** Open Opencode in the project root. 3. **Activate:** Type `/hive` in the chat to switch to the Hive Agent. 4. **Verify:** Ask the agent *"List your tools"* to confirm the connection. 
diff --git a/docs/developer-guide.md b/docs/developer-guide.md index f54afdfb..f623e930 100644 --- a/docs/developer-guide.md +++ b/docs/developer-guide.md @@ -116,7 +116,7 @@ Skills are also available in Cursor. To enable: 3. Restart Cursor to load the MCP servers from `.cursor/mcp.json` 4. Type `/` in Agent chat and search for skills (e.g., `/hive-create`) -```markdown + ### Opencode Support To enable Opencode integration: diff --git a/docs/environment-setup.md b/docs/environment-setup.md index b9890b0b..b87d66c2 100644 --- a/docs/environment-setup.md +++ b/docs/environment-setup.md @@ -528,9 +528,7 @@ export AGENT_STORAGE_PATH="/custom/storage" ### Automatic Setup Run the setup script in the root directory: ```bash - ``` - ## Additional Resources - **Framework Documentation:** [core/README.md](../core/README.md) @@ -539,9 +537,6 @@ Run the setup script in the root directory: - **Agent Building Guide:** [.claude/skills/hive-create/SKILL.md](../.claude/skills/hive-create/SKILL.md) - **Testing Guide:** [.claude/skills/hive-test/SKILL.md](../.claude/skills/hive-test/SKILL.md) - - - ## Contributing When contributing agent packages: From c83aac5e12e7990fcf9e6ab449df5a0dc5214ab5 Mon Sep 17 00:00:00 2001 From: vakrahul Date: Wed, 11 Feb 2026 07:29:48 +0530 Subject: [PATCH 20/22] docs: apply PR review changes for quickstart and markdown formatting --- docs/developer-guide.md | 2 +- docs/environment-setup.md | 3 ++- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/docs/developer-guide.md b/docs/developer-guide.md index f623e930..a968bcd8 100644 --- a/docs/developer-guide.md +++ b/docs/developer-guide.md @@ -125,7 +125,7 @@ To enable Opencode integration: 3. Restart Opencode to load the MCP servers 4. Switch to the Hive agent * **Tools:** Accesses `agent-builder` and standard `tools` via standard MCP protocols over stdio. 
-``` + ### Verify Setup ```bash diff --git a/docs/environment-setup.md b/docs/environment-setup.md index b87d66c2..ad0e80a0 100644 --- a/docs/environment-setup.md +++ b/docs/environment-setup.md @@ -526,8 +526,9 @@ export AGENT_STORAGE_PATH="/custom/storage" [Opencode](https://github.com/opencode-ai/opencode) is fully supported as a coding agent. ### Automatic Setup -Run the setup script in the root directory: +Run the quickstart script in the root directory: ```bash +./quickstart.sh ``` ## Additional Resources From c65eed8802b018e5f63b26f34dd993995253970e Mon Sep 17 00:00:00 2001 From: vakrahul Date: Wed, 11 Feb 2026 07:36:23 +0530 Subject: [PATCH 21/22] docs: apply PR review changes for quickstart and markdown formatting and updating --- docs/environment-setup.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/environment-setup.md b/docs/environment-setup.md index ad0e80a0..5f9057c3 100644 --- a/docs/environment-setup.md +++ b/docs/environment-setup.md @@ -526,7 +526,7 @@ export AGENT_STORAGE_PATH="/custom/storage" [Opencode](https://github.com/opencode-ai/opencode) is fully supported as a coding agent. 
### Automatic Setup -Run the quickstart script in the root directory: +Run the quickstart script in the root directory: ```bash ./quickstart.sh ``` From 7d571dfaece8cecfcb74457789461c44197007bd Mon Sep 17 00:00:00 2001 From: bryan Date: Tue, 10 Feb 2026 18:20:21 -0800 Subject: [PATCH 22/22] added skills to opencode, linking to claude --- .opencode/skills/hive | 1 + .opencode/skills/hive-concepts | 1 + .opencode/skills/hive-create | 1 + .opencode/skills/hive-credentials | 1 + .opencode/skills/hive-debugger | 1 + .opencode/skills/hive-patterns | 1 + .opencode/skills/hive-test | 1 + .opencode/skills/triage-issue | 1 + 8 files changed, 8 insertions(+) create mode 120000 .opencode/skills/hive create mode 120000 .opencode/skills/hive-concepts create mode 120000 .opencode/skills/hive-create create mode 120000 .opencode/skills/hive-credentials create mode 120000 .opencode/skills/hive-debugger create mode 120000 .opencode/skills/hive-patterns create mode 120000 .opencode/skills/hive-test create mode 120000 .opencode/skills/triage-issue diff --git a/.opencode/skills/hive b/.opencode/skills/hive new file mode 120000 index 00000000..47ca6b8e --- /dev/null +++ b/.opencode/skills/hive @@ -0,0 +1 @@ +../../.claude/skills/hive \ No newline at end of file diff --git a/.opencode/skills/hive-concepts b/.opencode/skills/hive-concepts new file mode 120000 index 00000000..4f460b1d --- /dev/null +++ b/.opencode/skills/hive-concepts @@ -0,0 +1 @@ +../../.claude/skills/hive-concepts \ No newline at end of file diff --git a/.opencode/skills/hive-create b/.opencode/skills/hive-create new file mode 120000 index 00000000..9247f883 --- /dev/null +++ b/.opencode/skills/hive-create @@ -0,0 +1 @@ +../../.claude/skills/hive-create \ No newline at end of file diff --git a/.opencode/skills/hive-credentials b/.opencode/skills/hive-credentials new file mode 120000 index 00000000..610180f4 --- /dev/null +++ b/.opencode/skills/hive-credentials @@ -0,0 +1 @@ +../../.claude/skills/hive-credentials \ No
newline at end of file diff --git a/.opencode/skills/hive-debugger b/.opencode/skills/hive-debugger new file mode 120000 index 00000000..48edc69e --- /dev/null +++ b/.opencode/skills/hive-debugger @@ -0,0 +1 @@ +../../.claude/skills/hive-debugger \ No newline at end of file diff --git a/.opencode/skills/hive-patterns b/.opencode/skills/hive-patterns new file mode 120000 index 00000000..c18612b5 --- /dev/null +++ b/.opencode/skills/hive-patterns @@ -0,0 +1 @@ +../../.claude/skills/hive-patterns \ No newline at end of file diff --git a/.opencode/skills/hive-test b/.opencode/skills/hive-test new file mode 120000 index 00000000..d2377d0e --- /dev/null +++ b/.opencode/skills/hive-test @@ -0,0 +1 @@ +../../.claude/skills/hive-test \ No newline at end of file diff --git a/.opencode/skills/triage-issue b/.opencode/skills/triage-issue new file mode 120000 index 00000000..41183c47 --- /dev/null +++ b/.opencode/skills/triage-issue @@ -0,0 +1 @@ +../../.claude/skills/triage-issue \ No newline at end of file