The previous code did `text[:max_length] + "..."`, which made the
returned content always 3 chars longer than the requested max_length.
Reserve room for the ellipsis inside the limit so the contract holds.
Fixes #2098
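A minimal sketch of the corrected contract, assuming a helper of roughly this shape (name and signature are illustrative):

    def truncate(text: str, max_length: int) -> str:
        if len(text) <= max_length:
            return text
        # reserve 3 chars for the ellipsis so the result never exceeds max_length
        return text[: max_length - 3] + "..."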
* test(event_bus): add comprehensive unit tests for EventBus
- Add 38 tests covering all EventBus functionality
- Test subscription management (subscribe/unsubscribe)
- Test event publishing and delivery to subscribers
- Test filtering by stream, node, and execution
- Test concurrency (handler errors, semaphore limits)
- Test history operations (get_history, get_stats)
- Test wait_for async waiting with timeout
- Test convenience publishers (emit_* methods)
- Test AgentEvent dataclass and EventType enum
Fixes #4782
* test(event_bus): add comprehensive unit tests for EventBus
- Add 41 tests covering all EventBus functionality
- Test subscription management (subscribe/unsubscribe)
- Test event publishing and delivery to subscribers
- Test filtering by stream, node, execution, and graph_id
- Test concurrency (handler errors, semaphore limits)
- Test history operations (get_history, get_stats)
- Test wait_for async waiting with timeout
- Test convenience publishers including new emit_tool_doom_loop
and emit_escalation_requested methods
- Test AgentEvent dataclass with graph_id field
- Test EventType enum including NODE_TOOL_DOOM_LOOP and ESCALATION_REQUESTED
Fixes #4782
* test(event_bus): add tests for new worker monitoring events
Add tests for newly added emit methods:
- emit_worker_escalation_ticket (judge → queen escalation)
- emit_queen_intervention_requested (queen → operator escalation)
- emit_llm_turn_complete (LLM turn metadata)
- emit_node_action_plan (node planning)
Update test_key_event_types_exist to include:
- WORKER_ESCALATION_TICKET, QUEEN_INTERVENTION_REQUESTED
- LLM_TURN_COMPLETE, NODE_ACTION_PLAN
- WORKER_LOADED, CREDENTIALS_REQUIRED
Total: 45 tests (up from 41)
* test(event_bus): add tests for run_id, subagent_report, and new event types
- Add tests for run_id field in AgentEvent.to_dict()
- Add test for emit_subagent_report convenience method
- Update test_key_event_types_exist with new event types
- Total: 48 tests
* fix(test): remove tests for deleted EventBus methods and fix enum names
Remove test_emit_worker_escalation_ticket and
test_emit_queen_intervention_requested (methods were removed in recent
refactors). Fix WORKER_LOADED -> WORKER_GRAPH_LOADED in enum assertions.
* style: add future annotations import and fix empty assertion in test_emit_node_action_plan
* feat(tools): add Weights & Biases experiment tracking and model monitoring integration
* style: fix ruff formatting in wandb_tool.py
* feat(tools): add Weights & Biases ML experiment tracking integration
* fix(tools): address CodeRabbit review comments on wandb_tool
* fix(tools): rewrite wandb_tool to use official Python SDK instead of undocumented REST endpoints
* fix(tools): address Hundao review — remove .coverage, switch to GraphQL/httpx, fix wandb_host, add README
* fix(tools): wire filters to GraphQL, validate empty metric_keys, fix line lengths
* fix(tools): check credentials before input validation in wandb_get_run_metrics
Move _get_creds() call before run_id/metric_keys checks so the
framework credential test receives the expected {error, help} response
instead of a bare input-validation error.
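A minimal sketch of the reordering, with the credential-error shape assumed from the description (_get_creds is this tool's helper; the error text is illustrative):

    def wandb_get_run_metrics(run_id: str, metric_keys: list[str]) -> dict:
        creds = _get_creds()  # credentials first: missing creds must win
        if isinstance(creds, dict) and "error" in creds:
            return creds  # the {error, help} response the framework test expects
        # input validation only runs once credentials are known to be present
        if not run_id:
            return {"error": "run_id is required"}
        if not metric_keys:
            return {"error": "metric_keys must be non-empty"}
        ...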
* fix(test): update queen memory test mocks to match litellm ModelResponse format
PR #6976 refactored reflection_agent to extract tool calls from litellm
ModelResponse objects (choices[0].message.tool_calls) instead of plain
dicts. The test mocks were not updated, causing tool calls to silently
fail and two tests to break.
Fixes #6990
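A minimal sketch of the updated mock shape (SimpleNamespace stands in for litellm's ModelResponse; field names follow the description, the rest is illustrative):

    from types import SimpleNamespace

    def make_mock_response(tool_name: str, arguments_json: str):
        tool_call = SimpleNamespace(
            id="call_1",
            function=SimpleNamespace(name=tool_name, arguments=arguments_json),
        )
        # reflection_agent now reads choices[0].message.tool_calls, not a plain dict
        message = SimpleNamespace(content=None, tool_calls=[tool_call])
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])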
* style: ruff format
* chore: update package-lock.json after npm install
* fix: export validate_agent_path from server module
* fix: remove circular import in server module
* docs: Add Frontend Dev Workflow subsection to CONTRIBUTING.md
* chore: revert accidental package-lock.json changes
* docs: clarify frontend dev requires both backend and dev server
---------
Co-authored-by: hundao <alchemy_wimp@hotmail.com>
The error_middleware was returning str(e) and type(e).__name__ directly
in JSON responses, which could expose file paths, database connection
strings, API key names, and internal class names to untrusted clients.
Changes:
- Return generic 'Internal server error' message instead of raw exception
- Improve server-side log to include request method and path
- Add unit tests verifying no internal details are leaked
The full exception traceback remains available via logger.exception()
for server-side debugging.
Co-authored-by: Aashutosh Pandey <aashutoshpandey@Aashutoshs-MacBook-Air.local>
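A minimal sketch of the hardened handler, assuming a Starlette-style HTTP middleware (names illustrative):

    import logging

    from starlette.responses import JSONResponse

    logger = logging.getLogger(__name__)

    async def error_middleware(request, call_next):
        try:
            return await call_next(request)
        except Exception:
            # full traceback stays server-side via logger.exception()
            logger.exception("Unhandled error on %s %s",
                             request.method, request.url.path)
            # clients get a generic message: no str(e), no type(e).__name__
            return JSONResponse({"error": "Internal server error"}, status_code=500)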
* docs(tools): add README for 10 tools (batch 3)
Adds README.md for: supabase_tool, zoom_tool, twitter_tool,
twilio_tool, shopify_tool, snowflake_tool, zendesk_tool,
yahoo_finance_tool, youtube_transcript_tool, docker_hub_tool
Partial fix for #6486
* docs(shopify): fix fulfillment_status value and body_html param name
- fulfillment_status example: "unfulfilled" is not a valid Shopify API
value, changed to "unshipped"
- tool table: "description" is not the actual param name, it's body_html
---------
Co-authored-by: hundao <alchemy_wimp@hotmail.com>
- PowerShell supports direct command invocation without the call operator "&"
- The script used "&" to invoke `npm install` and `npm run build`, which caused incorrect command parsing in PowerShell when combined with output redirection "2>&1"
- This produced unexpected errors such as "Unknown command: pm", even though the commands worked when run manually
- Removed the unnecessary call operator and invoked the npm commands directly
- npm commands now execute correctly within the script, matching standard PowerShell behavior and eliminating the parsing issue
* docs(tools): add README for huggingface, jira, pinecone, langfuse, linear, mongodb, redis, vercel, confluence
* docs(tools): fix review comments in confluence, mongodb and vercel READMEs
* docs(mongodb): add MONGODB_DATA_SOURCE to setup section
The code uses os.getenv("MONGODB_DATA_SOURCE") in every API request
body as the dataSource field. Without it, requests send an empty
dataSource and fail. Add it back to the setup section.
---------
Co-authored-by: hundao <alchemy_wimp@hotmail.com>
* fix(tools): move playwright back to main dependencies
playwright was moved to the browser extra in c7e85aa9 as part of the GCU
refactor to use a browser extension. But web_scrape_tool still imports
playwright at module level and requires it unconditionally, so CI's
Test Tools job breaks with ModuleNotFoundError.
web_scrape_tool has no fallback without playwright — it's a hard
dependency, not optional. Put it back in main deps.
Fixes CI failure on Test Tools (ubuntu-latest).
* chore: remove dead test_highlights.py script
tools/test_highlights.py is orphaned from the GCU refactor in c7e85aa9:
- imports highlight_coordinate and highlight_element from gcu.browser.highlight,
but highlight.py was deleted in that refactor
- calls BrowserSession.start(), open_tab(), get_active_page(), stop() — none
of these methods exist on the current BrowserSession class
The script can't run at all, and it trips ruff's I001 import-order check
(failing the Lint CI job after cache invalidation).
* test: fix browser/refs tests broken by GCU refactor
Tests were still testing the old Playwright-based API after c7e85aa9
moved GCU to an extension-bridge architecture.
test_refs.py (6 tests):
Refs system now produces CSS selectors like
[role="button"][aria-label="Submit"]:nth-of-type(1) for the bridge's
DOM matcher, instead of Playwright's role=button[name="Submit"] >> nth=0.
Updated expected values to match. Renamed test_escapes_quotes_in_name to
test_quoted_name_passes_through and added a comment noting that inner
quotes aren't currently escaped (follow-up concern).
test_browser_tools_comprehensive.py (4 tests):
- test_screenshot_full_page: browser_screenshot passes selector=None
when no selector is provided; update assertion.
- test_file_upload: browser_upload validates file paths exist on disk.
Create real tmp files and mock the CDP calls it makes.
- test_evaluate_with_bare_return: renamed to
test_evaluate_passes_script_through_to_bridge. IIFE wrapping lives
in bridge.evaluate, not in the browser_evaluate tool — mocking the
bridge bypasses the wrapping logic, so the tool just passes the
script through.
- test_evaluate_complex_script: browser_evaluate returns bridge's raw
result (no 'ok' wrapper); check for 'result' key instead.
test_browser_advanced_tools.py (deleted):
The whole file patched get_session and page.wait_for_function (the old
Playwright-based API). The bug it guarded against (user text interpolated
into a JS source string) is architecturally impossible in the new
bridge-based tools, which send text via structured RPC. Coverage for
browser_wait exists in test_browser_tools_comprehensive.py.
* test(core): fix event_loop tests broken by hive-v1 refactor
Several framework tests were left failing or hanging after the hive-v1
refactor landed. This un-breaks CI without touching production code.
- Worker auto-escalation: 8 tests were hanging because EventLoopNode
with event_bus treats non-queen/non-subagent nodes as workers and
auto-escalates to queen, then blocks on _await_user_input forever
(no queen in standalone tests). Opt out via is_subagent_mode=True.
- MockConversationStore: added clear() to match the production store
(storage/conversation_store.py), which event_loop_node.py:425 calls.
- Executor output semantics: result.output now only contains terminal-
node outputs; two handoff tests now read intermediate outputs from
result.session_state["data_buffer"].
- Restore filter: test_restore_from_checkpoint needs set_current_phase
so restore()'s phase_id filter matches.
- Removed two _build_context tests whose target method no longer exists
(replaced by standalone build_node_context()). Remaining execution_id
coverage is adequate in TestExecutionId + integration tests.
* style: ruff format + drop em dash in comment
* test(core): fix remaining framework tests broken by hive-v1 refactor
Rounds out the fix started in the previous commit. Full framework
suite now passes (1589 passed, 0 failed).
- conftest.py: force-bind framework.runner submodules (mcp_registry,
mcp_client, mcp_connection_manager) as attributes on the parent
package. Without this, pytest monkeypatch.setattr with dotted-string
paths fails because the attribute walker can't resolve the submodule
even though __init__.py imports from it. Affects ~25 MCP tests. (See the
sketch after this list.)
- test_queen_memory: _execute_tool() grew a required caller kwarg for
worker type-restrictions. Pass caller="queen" so path-traversal
checks run without caller restrictions interfering.
- test_session_manager_worker_handoff: _subscribe_worker_digest was
removed in the refactor, dropped the dead monkeypatches.
- test_skill_context_protection: NodeConversation now reads _run_id
in add_tool_result(), so the __new__-based test helper has to
initialise it.
- test_node_conversation: restore() now filters parts by run_id for
crash recovery. Renamed the stale test and flipped the assertion
to match the new filtering semantics.
- test_tool_registry: CONTEXT_PARAMS was updated (workspace_id out,
profile in). Switched the test's example stripped params.
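A minimal sketch of the conftest.py force-binding mentioned above (submodule names from the description; the exact package layout is assumed):

    import importlib

    import framework.runner

    # Bind each submodule as a real attribute on the parent package so that
    # monkeypatch.setattr("framework.runner.mcp_registry.X", ...) can resolve
    # the dotted path with pytest's attribute walker.
    for _name in ("mcp_registry", "mcp_client", "mcp_connection_manager"):
        setattr(framework.runner, _name,
                importlib.import_module(f"framework.runner.{_name}"))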
* docs: drop circular PR reference in test_refs comment
Addresses CodeRabbit nitpick. The comment referenced the PR that was
adding the comment, which becomes a self-reference after merge.
Added methods to control tabs via the Chrome extension:
- create_tab(groupId, url) - create and navigate tabs in user's Chrome
- close_tab(tabId) - close tabs
- list_tabs(groupId?) - list tabs
- cdp_attach(tabId) - attach CDP for automation
- cdp_send(tabId, method, params) - send CDP commands
These enable browser automation through the extension when Playwright
can't connect directly to the user's Chrome.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When Playwright connects to the user's Chrome via CDP (bridge connected),
we now copy cookies/storage from an existing browser context into the
new agent context. This preserves login sessions (LinkedIn, etc.).
Before: New context created fresh → no cookies → login wall
After: New context inherits storage state → cookies preserved → logged in
Requires Chrome to be started with --remote-debugging-port=9222 or
HIVE_BROWSER_CDP_URL to be set for this to work.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two fixes for browser session persistence:
1. Use stable profile name (agent_id only, not agent_id-subagent_instance)
- Before: "honeycomb_linkedin_outreach-gcu-scan-profiles-1" (unique each call)
- After: "gcu-scan-profiles" (stable across calls)
2. Remove browser_stop() call in finally block
- Keeping browser alive allows cookies/auth to persist
- Browser cleaned up when parent agent stops or explicitly requested
This fixes the issue where LinkedIn auth was lost between subagent runs
because each run created a fresh browser profile with no cookies.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Changed all browser tool `profile` parameters from defaulting to "default"
to defaulting to None. This allows `get_session()` to use the context
variable set by `set_active_profile()` in the subagent executor.
Before: Subagent calls browser_navigate() → profile="default" → tab group named "default"
After: Subagent calls browser_navigate() → profile=None → get_session() uses contextvar → tab group named "{agent_id}-{subagent_id}"
Fixes tab groups being named "default" instead of the subagent's name.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: propagate contextvars to tool executor threads
run_in_executor does not propagate contextvars to worker threads,
causing execution context params like data_dir to be lost when MCP
tools are called. This made save_data, serve_file_to_user, and other
tools that depend on auto-injected data_dir fail with "Missing
required argument: data_dir".
Fix: use contextvars.copy_context().run() to carry the current context
into the thread pool worker.
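A minimal sketch of the pattern (the real executor wiring is assumed):

    import asyncio
    import contextvars
    import functools

    async def run_tool_in_thread(func, /, *args, **kwargs):
        loop = asyncio.get_running_loop()
        ctx = contextvars.copy_context()
        # ctx.run carries the caller's contextvars (data_dir, etc.) into the
        # thread pool worker, which run_in_executor does not do by itself
        call = functools.partial(ctx.run, func, *args, **kwargs)
        return await loop.run_in_executor(None, call)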
* test: regression test for contextvars propagation in tool executor
Verifies that execution context (data_dir, etc.) set via
set_execution_context is visible inside tool executors that run
in thread pool workers via run_in_executor.
Windows has no POSIX executable bits, so the stat-based check always
fails. Skip the check on Windows in the validator, and mark the two
related tests as POSIX-only. Unix CI still catches non-executable
scripts from Windows contributors.
Fixes #6893
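A minimal sketch of the platform guard (validator name assumed):

    import os
    import stat

    def script_is_executable(path: str) -> bool:
        if os.name == "nt":
            return True  # Windows has no POSIX executable bits; skip the check
        return bool(os.stat(path).st_mode & stat.S_IXUSR)

The two related tests can then be marked POSIX-only with `@pytest.mark.skipif(os.name == "nt", ...)`.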
* feat(freshdesk): add Freshdesk tool integration with credentials and API functionality
- Introduced Freshdesk tool for managing tickets, contacts, agents, and groups via Freshdesk API v2.
- Added Freshdesk credentials handling in `credentials/freshdesk.py`.
- Registered Freshdesk tools in `tools/freshdesk_tool/__init__.py` and `tools/freshdesk_tool/freshdesk_tool.py`.
- Updated `__init__.py` files to include Freshdesk in the exports.
- Created comprehensive README for Freshdesk tool usage and setup.
- Implemented unit tests for Freshdesk tool functionality.
All tests pass, and code adheres to ruff linting and formatting standards.
* refactor(freshdesk_tool): simplify _get_domain logic
- remove unnecessary try/except around credentials.get("freshdesk_domain")
- directly return stripped credential value if present
- fallback to FRESHDESK_DOMAIN env variable when missing
- eliminate unreachable code while preserving behavior
* refactor(freshdesk_tool): replace dynamic httpx dispatch in _request
- replace getattr(httpx, method) with explicit handling for get, post, and put
- raise ValueError for unsupported HTTP methods
- preserve existing status handling and response parsing logic
* docs(freshdesk): improve credential and error handling documentation
- add docstrings for error handling helpers in freshdesk_tool
- document purpose and usage of freshdesk credential specs
- improve clarity around error response structure and handling
Validate that the resolved session path stays within the sessions directory
using Path.is_relative_to(). Prevents session_id values like
"../../something" from escaping the sandbox.
Also guard the caller in _write_run_event where get_session_path is
called outside the existing OSError try/except block.
Fixes #1000
Co-authored-by: Sidhartha kumar <Alearner12@users.noreply.github.com>
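A minimal sketch of the guard (names assumed):

    from pathlib import Path

    def get_session_path(sessions_dir: Path, session_id: str) -> Path:
        resolved = (sessions_dir / session_id).resolve()
        # rejects values like "../../something" that escape the sandbox
        if not resolved.is_relative_to(sessions_dir.resolve()):
            raise ValueError(f"session_id escapes sessions directory: {session_id!r}")
        return resolved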
* refactor: remove deprecated storage/backend.py (267 lines)
Delete the fully deprecated FileStorage class and inline its 5 still-active
methods (_validate_key, _load_run_sync, _load_summary_sync, _delete_run_sync,
_list_all_runs_sync) directly into ConcurrentStorage.
Changes:
- Delete core/framework/storage/backend.py (267 lines of no-op/deprecated code)
- Inline active read methods into ConcurrentStorage (no new FileStorage dep)
- Remove deprecated index operations (get_runs_by_goal, get_runs_by_status,
get_runs_by_node, list_all_goals) and their associated locking
- Update __init__.py to export ConcurrentStorage instead of FileStorage
- Update runtime/core.py to use ConcurrentStorage directly
- Fix Runtime.end_run() to call save_run_sync() (sync wrapper) instead of
the async save_run(), which was silently dropping the coroutine
- Update test_path_traversal_fix.py to test ConcurrentStorage._validate_key()
- Clean up test_storage.py — remove all FileStorage test classes, un-skip
ConcurrentStorage tests now that it's self-contained
- Remove stale FileStorage references from testing/test_storage.py docstring,
testing/debug_tool.py docstring, and test_runtime.py skip reasons
All 44 tests pass, ruff check and ruff format clean.
Fixes #6797
* fix(core): address CodeRabbitAI PR review feedback
- Fix critical no-op in ConcurrentStorage._save_run_sync by implementing atomic
persistence to runs/{run_id}.json.
- Update test_path_traversal_fix.py to test ConcurrentStorage directly and use real file paths for end-to-end validation.
- Unskip test_run_saved_on_end and assert actual run file persistence.
- Fix debug_tool.py to use load_run_sync() instead of the async load_run().
* fix(core): address round 2 of CodeRabbitAI reviews
- Add _validate_key to _save_run_sync and _load_summary_sync to enforce path traversal protections on the lowest level APIs.
- Invalidate summary cache and refresh run cache in save_run_sync() to match the async save_run() cache coherence behavior.
- Add tests for load_summary and save_run_sync path traversal rejection.
* feat(scripts): add support for more LLM providers in check_llm_key.py
* fix(scripts): correct perplexity endpoint to /v1/models and simplify lambda kwargs to **_
* feat(quickstart): add Local (Ollama) LLM provider option
- Detect Ollama via 'ollama list' in quickstart.sh and quickstart.ps1
- Add 'Local (Ollama)' menu option with interactive model picker
- Save provider=ollama, model=<selected> to ~/.hive/configuration.json
- Omit api_key_env_var for Ollama (no API key required)
Refs #5154, #5231
* feat: add local Ollama support and resolve native tool calling
This integrates Ollama as a first-class local provider choice during quickstart and fixes several configuration barriers that prevented local models from reliably executing the framework's agent graphs.
* **Quickstart Integration**: Added `Local (Ollama)` to the provider menu in both quickstart.sh and quickstart.ps1. When selected, it automatically queries `ollama list` and allows the user to pick an installed model without prompting for an API key.
* **Routing & Configuration**: Automatically sets `"api_base": "http://localhost:11434"` so LiteLLM routes to the local daemon, and raises the default max_tokens allocation in config.py to `32768`.
* **Native Tool Calling**: Normalized Ollama models to strictly use the ollama_chat provider prefix inside litellm.py and registered them as `supports_function_calling: True`. This forces native structured function calling and fixes the infinite loop caused by JSON-mode text fallbacks.
* **Context Truncation Fix**: Updated config.py to explicitly pass `"num_ctx": 16384` to Ollama. This prevents the local daemon from silently truncating the Queen agent's ~9,500 token system prompt (Ollama defaults to 2048 `num_ctx`).
* **UX Warnings**: Added terminal notices warning users to select high-parameter models (e.g., `qwen2.5:72b+`) to ensure sufficient contextual reasoning abilities.
Resolves #6027, resolves #6028
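Illustrative only, the resulting provider configuration looks roughly like this (keys and values per the description above; the model name is an example):

    OLLAMA_CONFIG = {
        "model": "ollama_chat/qwen2.5:72b",    # normalised ollama_chat/ prefix
        "api_base": "http://localhost:11434",  # route LiteLLM to the local daemon
        "max_tokens": 32768,
        # Ollama defaults num_ctx to 2048, which silently truncates the queen's
        # ~9,500-token system prompt
        "num_ctx": 16384,
    }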
* test: add unit tests for Ollama helper functions
Cover _is_ollama_model(), _ensure_ollama_chat_prefix(), and num_ctx
injection in get_llm_extra_kwargs() as requested in PR review.
Fix existing test_init_ollama_no_key_needed assertion to expect the
normalised ollama_chat/ prefix.
Made-with: Cursor
* chore: fix merge conflict
* fix(ollama): address PR review comments and normalize provider config
* fix(ollama): align quickstart defaults and add tool_choice comment
* fix(ollama): enforce OLLAMA_DETECTED logic and resolve quickstart script syntax errors
* fix(ollama): align quickstart logic and cleanup test imports
* feat(mcp-cli): add hive mcp CLI management commands (#6350)
Implement the hive mcp subcommand group with shared helpers and all
P0/P1 management commands: install, add, remove, enable, disable,
list, info, config, search, health, update.
Includes update bridge (remove+reinstall with rollback on failure),
first-use security notice, credential prompting, secret masking,
and agent usage detection via load_agent_selection().
* test(mcp-cli): add CLI integration and handler tests (#6350)
58 tests covering all commands end-to-end:
- Real framework.cli.main() entrypoint dispatch (list, install, update)
- Real registry-on-disk integration (install, list, config, info, remove)
- All 11 command handlers (install, add, remove, enable, disable, list,
info, config, search, health, update)
- Security notice shown only once
- Credential prompting stores overrides, skips when env set, handles cancel
- Secret masking in human output, JSON output, and config display
- Index refresh semantics (stale cache fallback vs no-cache hard fail)
- Update rollback on reinstall failure preserves original entry
- Update rejects local servers and pinned servers with correct remediation
- Bulk update skips local and pinned servers
- Argparse registration validates all 11 subcommands present
- _find_agents_using_server resolves via real load_agent_selection
- _parse_key_value_pairs validates KEY=VAL format
* fix(mcp-cli): mask list --json secrets, preserve enabled state on update, defer security sentinel (#6350)
- list --json now masks override values as <set> before emitting
- update preserves enabled=False state across reinstall
- security notice sentinel only written after successful install
* refactor(mcp-cli): fix docstring, share registry instance in update, extract _mask_overrides helper (#6350)
- Fix module docstring to reflect update's full behavior
- Pass registry instance to _cmd_mcp_update_server to avoid redundant disk I/O
- Extract _mask_overrides() used by list --json, info --json, info human, and config display
- Add comment about _find_agents_using_server path arithmetic limitation
* docs: add Windows quickstart.ps1 command in Quick Start section
* fix: restore closing code fence and comment out Windows command
---------
Co-authored-by: hundao <alchemy_wimp@hotmail.com>
Python 3.9+ no longer wraps subscript slices in ast.Index, and
Python 3.12 removed ast.Index entirely. The project requires
Python >=3.11, so this is dead code.
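For reference, the standard-library behavior that makes the branch dead:

    import ast

    node = ast.parse("x[1]", mode="eval").body
    # Python 3.9+: the subscript slice is the value itself, not ast.Index(value=...)
    assert isinstance(node.slice, ast.Constant)
    # Python 3.12 removed ast.Index entirely, so an isinstance(..., ast.Index)
    # unwrap can never fire under the project's >=3.11 floor.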
litellm>=1.82.7 contains a malicious .pth file that auto-executes at
Python startup and exfiltrates env vars, SSH keys, cloud credentials,
and CI/CD secrets to an attacker-controlled domain.
Pin to last known-safe version (currently installed). Unpin once a
verified-clean upstream release is available.
Closes #6783
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- The file `tools/test_schema_discovery.py` was being incorrectly collected by pytest as a test module
- Since the file is actually a standalone script, this caused import errors during test collection
- Rename the file to remove the `test_` prefix so pytest no longer treats it as a test file
- Pytest test discovery no longer includes the script, eliminating the import error and restoring a clean test run
- multiple test files shared the same module name "test_structure.py"
- this caused pytest import mismatches during collection
- renamed the test files to "test_email_reply_agent" and "test_meeting_scheduler"
- eliminated the module name collisions and fixed test discovery
- the test suite still referenced _PushoverClient, which no longer exists
- this caused import errors and failing pytest runs
- removed all tests related to _PushoverClient
- fixed pytest execution errors
- removed dead test code
- ensured test coverage reflects the current implementation
Add Mattermost as a new messaging tool following the existing Discord/Telegram
pattern. Supports self-hosted and cloud instances via personal access tokens.
Tools: list_teams, list_channels, get_channel, send_message, get_posts,
create_reaction, delete_post. Includes rate limit retry logic, credential
store + env var fallback, and comprehensive tests (41 unit + 50 conformance).
Closes #6746
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add 30s transition timeouts to prevent deadlocks on stuck connections.
Split SSE from HTTP in health_check: SSE uses client.list_tools() instead
of hitting /health (SSE servers use event-stream protocol, not REST).
Add has_connection() for MCPRegistry health check integration. Handle
disconnect failures in release, reconnect, and cleanup_all. Guard
reconnect against refcount dropping to zero mid-reconnect.
Covers install/add_local/remove/enable/disable, resolve_for_agent selection
precedence, health checks with pooled connections, cache fallback (defect 1),
SSE health check (defect 2), tomllib version parsing (defect 3), JSON type
validation for mcp_registry.json fields, malformed JSON error handling,
structured log emission, and retry-on-zero-tools behavior.
Local state management for installed MCP servers in ~/.hive/mcp_registry/.
Supports install from registry index, add_local for running servers,
resolve_for_agent with include/tags/exclude/profile/max_tools/versions
selection, health checks via MCPConnectionManager, and JSON type
validation at the mcp_registry.json boundary.
Integration points: AgentRunner, queen orchestrator, credential tester
all load mcp_registry.json with error handling. ToolRegistry gains
load_registry_servers() with retry and structured DX-4 logging.
Fixes #6484
- Replace 8 raw print() calls with logger.warning() in runner.py
- Uses lazy % formatting instead of f-strings
- Warnings about missing tokens/API keys now go through logging framework
- Visible in log files when agents run headlessly
* fix: improve tool_registry error handling with stack traces and context
When tool execution fails, errors now include:
- Stack traces for debugging
- Tool name, tool_use_id, and inputs in error logs
- Same behavior for both sync and async tools
Fixes #2447
* fix: use exc_info=True and truncate inputs in tool error logs
- Replace traceback.format_exc() with exc_info=True (codebase convention)
- Truncate tool inputs to 500 chars to prevent log flooding
- Add test for input truncation
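A minimal sketch of the logging convention (helper name illustrative):

    import logging

    logger = logging.getLogger(__name__)
    MAX_INPUT_CHARS = 500  # truncation cap from the commit above

    def log_tool_failure(tool_name: str, tool_use_id: str, inputs: dict) -> None:
        rendered = repr(inputs)
        if len(rendered) > MAX_INPUT_CHARS:
            rendered = rendered[:MAX_INPUT_CHARS] + "...[truncated]"
        # exc_info=True attaches the active exception's stack trace;
        # lazy % formatting per the codebase convention
        logger.error("Tool %s (id=%s) failed; inputs=%s",
                     tool_name, tool_use_id, rendered, exc_info=True)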
* feat(tools): add URL support to pdf_read tool
Enable pdf_read to accept both local file paths and HTTP/HTTPS URLs.
Downloads PDF content to temporary file when URL is provided, validates
content-type, and cleans up automatically after extraction.
- Detect URL inputs (http:// or https://)
- Download PDF with httpx (60s timeout)
- Validate Content-Type is application/pdf
- Use temporary file for URL-based PDFs
- Automatic cleanup in finally block
- Maintains backward compatibility with local paths
Completes the workflow: web_scrape error on PDF → pdf_read from URL
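A minimal sketch of the URL branch (helper name assumed; the real tool deletes the temp file in a finally block after extraction):

    import os
    import tempfile

    import httpx

    def _download_pdf(url: str) -> str:
        resp = httpx.get(url, timeout=60.0)
        resp.raise_for_status()
        if "application/pdf" not in resp.headers.get("content-type", ""):
            raise ValueError(f"URL did not return a PDF: {url}")
        fd, path = tempfile.mkstemp(suffix=".pdf")
        with os.fdopen(fd, "wb") as f:
            f.write(resp.content)
        return path  # caller extracts text, then unlinks the file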
* test(tools): Add test coverage for new features in web_scrape and pdf_read tools
* style: fix lint issues in pdf_read URL support
---------
Co-authored-by: Anurag <anuragkr-codes@users.noreply.github.com>
Co-authored-by: hundao <alchemy_wimp@hotmail.com>
Move _get_client() before JSON deserialization so missing-credentials
errors aren't masked by input validation. Wrap json.loads in try/except
for non-JSON string inputs.
Apply Ruff formatting to the extracted event loop modules, the EventLoopNode wrappers, and the OpenRouter key check script so the lint CI format check passes cleanly.
Extract EventLoopNode helper logic into focused event_loop modules while keeping the node responsible for orchestration.
Preserve the existing behavior and compatibility for compaction, event publishing, cursor persistence, synthetic tools, judge evaluation, stall detection, tool result handling, and subagent escalation wiring.
Anthropic tightened OAuth validation on 2026-03-17, requiring a
specific User-Agent header and a billing integrity system block for
subscription-authenticated requests. Without these, all OAuth calls
return HTTP 400 with a generic "Error" message.
Changes:
- Add billing integrity system block (SHA-256 hash derived from first
user message content) prepended to system messages on OAuth requests
- Set User-Agent to claude-code/<version> for OAuth sessions
- Fix OAuth header patch to detect tokens in x-api-key (not just
Authorization) and add required beta/browser-access headers
- Set litellm.drop_params=True to prevent unsupported params like
stream_options from leaking to Anthropic (causes 400)
- Skip stream_options entirely for Anthropic models
- Honour LITELLM_LOG env var for debug logging instead of hardcoding
LiteLLM logger to WARNING
Integrate SkillsManager refactor from base branch. Trust gating (AS-13)
is now wired into SkillsManager._do_load() instead of inline in runner.py,
with the interactive flag passed through SkillsManagerConfig.
Restore .claude/settings.json and revert .gitignore change
that were accidentally included in the sdr-agent refactor commit.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Replace duplicated setup code in tui command with agent.start(mock_mode=mock)
- Fix mock mode to use MockLLMProvider instead of llm=None
- Add demo_contacts.json sample data for template testing
- Untrack .claude/settings.json and add to .gitignore
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Replace self._executor with self._agent_runtime (AgentRuntime | None)
- Import AgentRuntime for proper type annotation
- Add missing await self._agent_runtime.start() in start() — runtime
was created but never started, causing silent failures at runtime
- Add self._agent_runtime = None reset in stop() for clean restart
- Remove redundant self._graph is None guard in trigger_and_wait()
- Update mcp_servers.json with hive-tools server config
- Add credential file patterns to .gitignore
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* test: add comprehensive test suite for safe_eval sandboxed evaluator
Adds 113 tests across 14 test classes covering the full surface area of
the safe_eval expression evaluator used by edge conditions:
- Literals, data structures, arithmetic, unary/binary/boolean operators
- Short-circuit semantics for `and`/`or` (including guard patterns)
- Ternary expressions, variable lookup, subscript/attribute access
- Whitelisted function and method calls
- Security boundaries (private attrs, disallowed AST nodes, blocked builtins)
- Real-world EdgeSpec.condition_expr patterns from graph executor usage
* style: fix import sort order
---------
Co-authored-by: mma2027 <mma2027@users.noreply.github.com>
Co-authored-by: hundao <alchemy_wimp@hotmail.com>
- Add name and entry_node to all trigger SSE events (TRIGGER_AVAILABLE,
TRIGGER_ACTIVATED, TRIGGER_DEACTIVATED) so frontend gets correct data
immediately instead of guessing
- Use ep.entry_node from backend in polling instead of guessing first
non-trigger node
- Compute cronToLabel from trigger config during polling so pill labels
show human-readable schedule
- Fix AsyncMock for event_bus.publish in tests
- Handle trigger_updated SSE event to update graph node label and
config in real time when cron or task is saved
- Use cronToLabel for human-readable schedule display in detail panel
- Add "Saved" button feedback for Save Cron and Save Task (2s toast)
- Update trigger pill label to reflect new schedule on cron save
- Extend PATCH /triggers/{id} to accept trigger_config with cron
validation via croniter and active timer restart
- Add TRIGGER_UPDATED SSE event so frontend updates in real time
- Update frontend API client to use updateTrigger with config support
- Add tests for task update, cron restart, and invalid cron rejection
Trigger nodes (scheduler, webhook, etc.) stopped appearing after the
v0.7.0 refactor because DraftGraph had no trigger awareness.
- Extract shared utilities (cssVar, truncateLabel, trigger colors/icons,
useTriggerColors, cronToLabel) into lib/graphUtils.ts
- Render trigger pills above the draft flowchart with pill shape, icons,
countdown timers, active/inactive status, and click handling
- Draw dashed edges from trigger pills to the correct draft node using
flowchartMap lookup
- Name all trigger layout constants, fix countdown text color bug
- Include trigger pill extent in SVG viewBox width
Closes #6344
Only extend read_keys/write_keys with skill memory keys when the
list was already non-empty (restricted). An empty list means "allow
all" — adding _-prefixed skill keys to an empty list accidentally
activated the permission check and blocked legitimate reads.
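A minimal sketch of the guard (names assumed):

    def add_skill_memory_keys(read_keys: list[str], skill_keys: list[str]) -> list[str]:
        # an empty list means "allow all"; extending it would silently switch
        # the permission check on and block legitimate reads
        if not read_keys:
            return read_keys
        return [*read_keys, *skill_keys]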
- add OpenRouter chat completion validation to key checks for quickstart flows
- improve OpenRouter compat parsing to convert plain textual tool calls into real tool events
- prevent tool-call text from leaking into assistant responses
- add regression tests for OpenRouter key checks and LiteLLM tool compat parsing
quickstart.ps1 and hive.ps1 provide full native Windows support.
Update README, CONTRIBUTING, and environment-setup docs to stop
recommending WSL as the primary path. Also add Windows alternatives
for make check/test commands in CONTRIBUTING.md.
Fixes #3835, fixes #3839
* feat(tools): add command sanitizer module with blocklists for shell injection prevention
* fix(tools): validate commands in execute_command_tool before execution
* fix(tools): validate commands in coder_tools_server run_command before execution
* test(tools): add 109 tests for command sanitizer covering safe, blocked, and edge cases
* fix(tools): normalize executable sanitizer matching
Replace the suffix-strip usage with explicit .exe suffix normalization in sanitizer paths to satisfy Ruff B005 while preserving blocking behavior for executable names.
Also apply the same normalization in the coder_tools_server fallback sanitizer and clean up a test-file formatting lint issue.
* fix(tools): harden command sanitizer handling
Normalize executable path matching, tighten python -c detection, and remove the duplicated coder_tools_server fallback by importing the shared sanitizer reliably.
Document the shell=True limitation in the command runners and add regression tests for absolute executable paths plus quoted python -c forms.
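A minimal sketch of the normalization (names assumed). Ruff B005 flags strip calls with multi-character arguments because they strip characters, not a literal suffix:

    def normalize_executable(name: str) -> str:
        name = name.strip().lower()
        # explicit suffix removal instead of .rstrip(".exe"), which would also
        # mangle names like "example" (rstrip removes chars from the set {., e, x})
        return name.removesuffix(".exe")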
ParallelExecutionConfig.branch_timeout_seconds and memory_conflict_strategy
were declared but never read by any code. This caused branches to run
indefinitely and memory conflicts to go undetected.
Changes:
- Wrap parallel branch tasks with asyncio.wait_for() using configured timeout
- Switch asyncio.gather to return_exceptions=True so one timeout doesn't cancel siblings
- Handle asyncio.TimeoutError in result processing loop
- Implement last_wins/first_wins/error memory conflict strategies
- Track which branch wrote which key during fan-out for conflict detection
- Add 6 new tests covering timeout and conflict scenarios
Closes #5706
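A minimal sketch of the timeout wiring (names assumed from the description):

    import asyncio

    async def run_branches(branch_coros, timeout_s: float) -> list:
        # each branch gets the configured budget...
        tasks = [asyncio.wait_for(coro, timeout=timeout_s) for coro in branch_coros]
        # ...and return_exceptions=True keeps one timeout from cancelling siblings;
        # TimeoutError entries are handled in the result-processing loop
        return await asyncio.gather(*tasks, return_exceptions=True)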
croniter is used for cron-based timer entry points but was never
declared in pyproject.toml. A fresh install would silently skip
all cron triggers. Add croniter>=1.4.0 to dependencies and raise
RuntimeError instead of silently continuing on ImportError.
Fixes #5353
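A minimal sketch of the hard failure (exact message assumed):

    try:
        from croniter import croniter
    except ImportError as exc:
        # fail loudly instead of silently skipping all cron triggers
        raise RuntimeError(
            "croniter is required for cron-based timer entry points; "
            "reinstall the package to pick up the new dependency"
        ) from exc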
- Add Windows (PowerShell) section alongside Linux/macOS
- Reference .\quickstart.ps1 for native Windows users
- Add Set-ExecutionPolicy note for script execution
- Link to environment-setup.md for WSL alternatives
Closes #5753
_patch_litellm_anthropic_oauth and _patch_litellm_metadata_nonetype
silently return when litellm internal modules change. This adds
logger.warning() calls so operators are alerted when patches cannot be
applied, instead of encountering cryptic 401 or TypeError at runtime.
Co-authored-by: GowthamT-1610 <gowthamt@umd.edu>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Add MAX_FAILED_REQUEST_DUMPS = 50 cap and _prune_failed_request_dumps()
helper. After each _dump_failed_request() call the oldest files beyond
the cap are deleted so the directory never grows without bound.
Fixes #5696
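A minimal sketch of the helper (directory layout and file pattern assumed):

    from pathlib import Path

    MAX_FAILED_REQUEST_DUMPS = 50

    def _prune_failed_request_dumps(dump_dir: Path) -> None:
        dumps = sorted(dump_dir.glob("*.json"), key=lambda p: p.stat().st_mtime)
        # drop the oldest files beyond the cap so the directory stays bounded
        for stale in dumps[:-MAX_FAILED_REQUEST_DUMPS]:
            stale.unlink(missing_ok=True)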
* fix: preserve custom session ids in runtime logs
Treat any execution stored under sessions/<id> as a session-backed run so custom IDs stay visible in worker-session browsing and unified log APIs. Add regression coverage for custom IDs across executor path selection, log directory creation, and API listing.
Made-with: Cursor
* fix: ignore stray session directories in listing
Keep the session_ prefix as the fast path for worker session discovery, but allow custom IDs when a backing state.json exists. This avoids ghost directories in the UI while preserving the custom session ID support from the original fix.
Made-with: Cursor
* fix(windows): verify uv is runnable before launch
* fix(windows): use validated uv path for kimi health check
* fix(windows): dedupe uv discovery and keep quickstart scoped
* chore: refresh uv lockfile
Use atomic_write for GraphExecutor._write_progress and log persistence failures instead of silently swallowing exceptions. Add regression tests for atomic write usage and warning logs on write failure.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
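A minimal sketch of the pattern behind atomic_write (the real helper's signature is assumed):

    import os
    import tempfile

    def atomic_write(path: str, data: str) -> None:
        directory = os.path.dirname(path) or "."
        fd, tmp = tempfile.mkstemp(dir=directory)  # same filesystem as the target
        try:
            with os.fdopen(fd, "w") as f:
                f.write(data)
            os.replace(tmp, path)  # atomic rename; readers never see a partial file
        except BaseException:
            os.unlink(tmp)
            raise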
Align POST /api/sessions behavior across queen-only and one-step worker creation so callers can rely on deterministic session IDs. Add a regression test covering the forwarded session_id contract.
Made-with: Cursor
Pass browser_wait text through Playwright's function argument channel so quoted and multiline strings do not break the generated wait expression. Add a regression test covering text that previously would have been interpolated unsafely.
Made-with: Cursor
Changed HOW_TO_CONTRIBUTE.md back to CONTRIBUTING.md to comply with
GitHub's standard for contributing guidelines files.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Renamed CONTRIBUTING.md to HOW_TO_CONTRIBUTE.md and significantly expanded
the documentation with detailed sections on development setup, OS support,
tooling requirements, performance metrics, and contribution workflows.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Validate that agent names passed to --agents do not contain path
separators. Previously, passing 'exports/my_agent' would result in
the doubled path 'exports/exports/my_agent' with a confusing error.
Now a clear error message is shown suggesting the correct usage.
Fixes #6208
Co-authored-by: nightcityblade <nightcityblade@gmail.com>
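A minimal sketch of the check (error text illustrative):

    import os

    def validate_agent_name(name: str) -> str:
        if os.sep in name or (os.altsep and os.altsep in name):
            raise ValueError(
                f"--agents takes bare agent names, not paths; "
                f"got {name!r}, expected something like 'my_agent'"
            )
        return name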
Replace individual recipe READMEs with a comprehensive collection of 100 real-world agent prompt examples across marketing, sales, operations, engineering, and finance. This provides users with a broader range of use case inspiration in a single, organized reference document.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Replace bare except Exception: clauses with specific exception handling:
- delete_aden_api_key(): Catch FileNotFoundError, PermissionError at debug
level; log unexpected errors at WARNING with exc_info=True
- _read_credential_key_file(): Catch FileNotFoundError, PermissionError at
debug level; log unexpected errors at WARNING with exc_info=True
- _read_aden_from_encrypted_store(): Catch FileNotFoundError, PermissionError,
KeyError at debug level; log unexpected errors at WARNING with exc_info=True
This makes credential issues easier to diagnose by:
- Logging unexpected errors at WARNING level (visible in production)
- Including full stack traces with exc_info=True
- Keeping expected failures (file not found, permissions) at debug level
Fixes #5931
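A minimal sketch of the pattern applied to each helper (body illustrative):

    import logging

    logger = logging.getLogger(__name__)

    def _read_credential_key_file(path: str) -> str | None:
        try:
            with open(path) as f:
                return f.read().strip()
        except (FileNotFoundError, PermissionError) as exc:
            logger.debug("credential key file unavailable: %s", exc)  # expected
        except Exception:
            logger.warning("unexpected error reading credential key file",
                           exc_info=True)
        return None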
Tests were asserting the old CLIENT_OUTPUT_DELTA + CLIENT_INPUT_REQUESTED
pattern; the fix in 89ccd66f routes escalations through the queen via
ESCALATION_REQUESTED instead.
- Added the Notion tool registration to the _register_verified function.
- Removed the Notion tool registration from the _register_unverified function to ensure proper handling.
- Introduced a comprehensive README.md for the Notion tool.
- Included setup instructions for the Notion API token and credential store configuration.
- Documented available tools and their functionalities.
- Provided usage examples for searching, creating, updating, and managing pages and databases.
- Implemented tests for HTTP error codes, timeouts, and generic exceptions in _request.
- Added tests to verify the use of credential store when provided.
- Enhanced tests for notion_search to include filter types and page size clamping.
- Updated test assertions for successful responses from notion_get_page.
- Added BlockType enum for various Notion block types.
- Updated notion_create_page to allow specifying parent_page_id and title_property.
- Enhanced notion_query_database to support sorting and pagination.
- Introduced notion_create_database for creating databases under a parent page.
- Improved error handling for required parameters in page and database creation.
Track the errlog file handle opened on non-Windows systems and
properly close it during cleanup to prevent file descriptor leaks.
Changes:
- Add _errlog_handle instance variable to track the file handle
- Store handle reference when opening os.devnull
- Close handle in _cleanup_stdio_async() after other cleanup
- Clear reference in disconnect() for safety
Fixes #6002
The handle_pause endpoint referenced session.mode_state (lines 360-361),
which does not exist on the Session dataclass. This caused an
AttributeError every time the pause endpoint reached the phase transition
step, preventing the queen phase from transitioning to staging and
returning a 500 error to the frontend.
Changed to session.phase_state, consistent with handle_stop (line 412),
handle_run (line 75), and the Session dataclass definition
(session_manager.py line 44).
Turns that call report_to_parent were incorrectly treated as "truly
empty" because the flag was not propagated. Thread it through
_run_single_turn and include it in the empty-turn guard.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add pg_get_table_stats for row counts and size info,
pg_list_indexes for index details, and pg_get_foreign_keys
for relationship discovery with both outgoing and incoming FKs.
Add lusha_bulk_enrich_persons for batch enrichment,
lusha_get_technologies for company tech stack lookup, and
lusha_search_decision_makers for senior contact discovery.
Add s3_copy_object for copying within/between buckets,
s3_get_object_metadata for HEAD-based metadata retrieval, and
s3_generate_presigned_url for temporary access URL generation.
Add pushover_cancel_receipt for stopping emergency retries,
pushover_send_glance for widget data updates, and
pushover_get_limits for checking message usage.
Add news_latest for breaking news without query, news_by_source
for source-filtered articles, and news_by_topic for topic-based
discovery with automatic date ranges.
Add scholar_cited_by for finding papers citing a given paper,
scholar_search_profiles for author profile discovery, and
serpapi_google_search for structured Google web results.
Add exa_search_news, exa_search_papers, and exa_search_companies
convenience wrappers with pre-configured category filters and
automatic date/domain filtering.
- greenhouse_list_offers: GET /offers or /applications/{id}/offers
- greenhouse_add_candidate_note: POST /candidates/{id}/activity_feed/notes
- greenhouse_list_scorecards: GET /applications/{id}/scorecards
- Add _post helper for POST requests
- brevo_list_contacts: GET /contacts with pagination and modified_since filter
- brevo_delete_contact: DELETE /contacts/{email} to remove contacts
- brevo_list_email_campaigns: GET /emailCampaigns with status filter and stats
- quickbooks_list_invoices: query invoices with status/customer filters
- quickbooks_get_customer: GET /customer/{id} with address and contact info
- quickbooks_create_payment: POST /payment with optional invoice linking
- cloudinary_get_usage: GET /usage for storage, bandwidth, transformation limits
- cloudinary_rename_resource: POST /rename to change public_id
- cloudinary_add_tag: POST /tags to add tags to resources
- twitter_get_user_followers: GET /users/{id}/followers with profile details
- twitter_get_tweet_replies: search recent replies via conversation_id
- twitter_get_list_tweets: GET /lists/{id}/tweets with author expansion
- apollo_get_person_activities: GET /activities for contact activity history
- apollo_list_email_accounts: GET /email_accounts for connected sending accounts
- apollo_bulk_enrich_people: POST /people/bulk_match for batch enrichment (up to 10)
- calendly_cancel_event: POST /scheduled_events/{id}/cancellation
- calendly_list_webhooks: GET /webhook_subscriptions for org/user scope
- calendly_get_event_type: GET /event_types/{id} for meeting template details
- Add _post helper for POST requests
- pagerduty_list_oncalls: GET /oncalls with schedule/policy filters
- pagerduty_add_incident_note: POST /incidents/{id}/notes to add notes
- pagerduty_list_escalation_policies: GET /escalation_policies with search
- airtable_delete_records: DELETE records by comma-separated IDs (up to 10)
- airtable_search_records: search records using FIND formula for partial matching
- airtable_list_collaborators: list base collaborators via meta API
- Add _delete helper for DELETE requests
- reddit_get_subreddit_info: GET /r/{name}/about for subscriber count, description
- reddit_get_post_detail: GET /by_id/t3_{id} for full post details with flair, ratios
- reddit_get_user_posts: GET /user/{name}/submitted for user's post history
Add confluence_update_page, confluence_delete_page, and
confluence_get_page_children tools using Confluence REST API v2.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add intercom_close_conversation, intercom_create_contact, and
intercom_list_conversations to Intercom credential spec.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add close_conversation, create_contact, and list_conversations client
methods plus intercom_close_conversation, intercom_create_contact, and
intercom_list_conversations MCP tools using Intercom API v2.11.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add gitlab_update_issue, gitlab_get_merge_request, and
gitlab_create_merge_request_note to GitLab credential spec.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add _put helper and gitlab_update_issue, gitlab_get_merge_request,
and gitlab_create_merge_request_note tools using GitLab REST API v4.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add slack_get_channel_info, slack_list_files, and slack_get_file_info
to Slack credential spec.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add get_channel_info, list_files, and get_file_info client methods
plus slack_get_channel_info, slack_list_files, and slack_get_file_info
MCP tools using Slack Web API.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add stripe_list_disputes, stripe_list_events, and
stripe_create_checkout_session to Stripe credential spec.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add list_disputes, list_events, and create_checkout_session client
methods plus stripe_list_disputes, stripe_list_events, and
stripe_create_checkout_session MCP tools using Stripe API.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add linear_cycles_list, linear_issue_comments_list, and
linear_issue_relation_create to Linear credential spec.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add list_cycles, list_issue_comments, and create_issue_relation client
methods plus linear_cycles_list, linear_issue_comments_list, and
linear_issue_relation_create MCP tools using Linear GraphQL API.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add zoom_update_meeting, zoom_list_meeting_participants, and
zoom_list_meeting_registrants to Zoom credential spec.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add zoom_update_meeting (PATCH), zoom_list_meeting_participants
(past meeting attendees), and zoom_list_meeting_registrants
using Zoom REST API v2.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add twilio_list_phone_numbers, twilio_list_calls, and
twilio_delete_message to both Twilio credential specs.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add twilio_list_phone_numbers, twilio_list_calls, and
twilio_delete_message tools using Twilio REST API.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add shopify_update_product, shopify_get_customer, and
shopify_create_draft_order to both Shopify credential specs.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add shopify_update_product, shopify_get_customer, and
shopify_create_draft_order tools using Shopify Admin REST API.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add zendesk_get_ticket_comments, zendesk_add_ticket_comment, and
zendesk_list_users to all three Zendesk credential specs.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add zendesk_get_ticket_comments, zendesk_add_ticket_comment, and
zendesk_list_users tools using Zendesk Support API v2.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Register asana_update_task, asana_add_comment, and
asana_create_subtask in the Asana credential spec.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add _put helper and three new Asana MCP tools:
- asana_update_task: modify name, notes, completion, due date, assignee
- asana_add_comment: post comment stories on tasks
- asana_create_subtask: create subtasks under existing tasks
API ref: https://developers.asana.com/docs
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add three new Trello MCP tools:
- trello_get_card: retrieve full card details with members/checklists/attachments
- trello_create_list: create new lists on boards
- trello_search_cards: full-text search across cards with board scoping
Update credential spec to include the new tool names.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add three new GitHub MCP tools:
- github_list_commits: query commits with author/date/branch filters
- github_create_release: create tagged releases with notes and draft support
- github_list_workflow_runs: monitor CI/CD pipeline runs with status filters
Update credential spec to include the new tool names.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add _GitHubClient methods for:
- list_commits: GET /repos/{owner}/{repo}/commits with sha/author/date filters
- create_release: POST /repos/{owner}/{repo}/releases with tag, notes, draft
- list_workflow_runs: GET /repos/{owner}/{repo}/actions/runs with filters
API ref: https://docs.github.com/en/rest
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add three new Telegram MCP tools:
- telegram_get_chat_member_count: retrieve group/channel membership size
- telegram_send_video: send video files via URL or file_id
- telegram_set_chat_description: update group/channel descriptions
Update credential spec to include the new tool names.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add _TelegramClient methods for:
- get_chat_member_count: getChatMemberCount API endpoint
- send_video: sendVideo with caption, parse_mode, duration support
- set_chat_description: setChatDescription for groups/channels
API ref: https://core.telegram.org/bots/api
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Register salesforce_delete_record, salesforce_search_records, and
salesforce_get_record_count in both Salesforce credential specs.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add three new Salesforce MCP tools:
- salesforce_delete_record: DELETE /sobjects/{type}/{id}
- salesforce_search_records: SOSL full-text search via /search/
- salesforce_get_record_count: efficient COUNT() query for any SObject
API ref: https://developer.salesforce.com/docs/atlas.en-us.api_rest.meta
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add three new Discord MCP tools:
- discord_get_channel: retrieve channel metadata (name, topic, type)
- discord_create_reaction: add emoji reactions to messages
- discord_delete_message: remove messages from channels
Update credential spec to include the new tool names.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add _DiscordClient methods for:
- get_channel: retrieve channel metadata via GET /channels/{id}
- create_reaction: add emoji reaction via PUT reactions endpoint
- delete_message: remove a message via DELETE messages endpoint
API ref: https://discord.com/developers/docs/resources
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Register notion_update_page, notion_archive_page, and
notion_append_blocks in the Notion credential spec.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add three new Notion MCP tools:
- notion_update_page: modify page properties via PATCH /pages/{id}
- notion_archive_page: archive or restore pages
- notion_append_blocks: add paragraphs, headings, lists, todos, etc.
API ref: https://developers.notion.com/reference
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Register jira_update_issue, jira_list_transitions, and
jira_transition_issue in all three Jira credential specs
(domain, email, token).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add three new Jira MCP tools:
- jira_update_issue: modify summary, description, priority, labels, assignee
- jira_list_transitions: discover available status transitions for an issue
- jira_transition_issue: move an issue to a new status with optional comment
API ref: https://developer.atlassian.com/cloud/jira/platform/rest/v3/
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add three new MCP tools:
- hubspot_delete_object: archive contacts, companies, or deals
- hubspot_list_associations: query links between CRM objects (v4 API)
- hubspot_create_association: link two CRM records together
Update credential spec to include the new tool names.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add _HubSpotClient methods for:
- delete_object: archive a CRM object via DELETE /crm/v3/objects
- list_associations: query associations via GET /crm/v4/objects associations endpoint
- create_association: link two CRM objects via PUT /crm/v4/objects associations endpoint
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The google_sheets.py file defined GOOGLE_SHEETS_CREDENTIALS (an API-key
based credential for reading public sheets via GOOGLE_SHEETS_API_KEY) but
was never wired into the package:
- Never imported in credentials/__init__.py
- Never merged into CREDENTIAL_SPECS
- Never listed in __all__
- Tool never calls credentials.get('google_sheets_key') — uses 'google' (OAuth2)
- Tool names in the spec were stale and did not match actual function names
The 'google' credential in email.py already correctly covers all Google
Sheets tools via OAuth2. This file was dead code with no referencing
imports anywhere in the repository.
Closes #6077
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Resolves inconsistency between CLAUDE.md/AGENTS.md (TUI deprecated) and
docs/roadmap.md (TUI listed as completed feature).
- Strike through TUI items in 3 roadmap sections
- Add deprecation note to TUI-to-GUI upgrade section
- Reference AGENTS.md and hive open as replacement
Fixes #5941
Signed-off-by: Robert Hallers <robert@terplabs.ai>
- Change get_chat client method from httpx.get+params to httpx.post+json
to avoid URL-encoding issues with @username chat IDs
- Remove {"success": True} normalization from delete_message,
send_chat_action, pin_message, and unpin_message MCP tools;
return raw Telegram API response consistently
- Update corresponding test mocks and assertions to match
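A sketch of why POST+JSON sidesteps the encoding problem (bot token and chat ID hypothetical; Telegram methods accept JSON POST bodies):
```python
import httpx

token = "123456:ABC-example"  # hypothetical bot token

# A JSON body carries "@username" verbatim; as a URL query param it would
# need percent-encoding and is easier to mangle in param handling
resp = httpx.post(
    f"https://api.telegram.org/bot{token}/getChat",
    json={"chat_id": "@example_channel"},
)
```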
Add 8 new operations to the Telegram Bot tool, bringing it from 2 to 10
operations. This covers message lifecycle (edit, delete, forward), media
(send photo), chat info (get chat), UX (typing indicators), and pin
management — making the tool practical for agent workflows beyond
fire-and-forget messaging.
New operations:
- telegram_edit_message: edit previously sent messages
- telegram_delete_message: delete messages
- telegram_forward_message: forward between chats
- telegram_send_photo: send photos via URL or file_id
- telegram_send_chat_action: show typing/uploading indicators
- telegram_get_chat: retrieve chat metadata
- telegram_pin_message: pin important messages
- telegram_unpin_message: unpin stale messages
Also includes input validation for chat actions, credential spec updates,
central registry wiring, and 31 new tests (52 total).
Closes#4808
Use is_file() instead of exists() to reject directories, and check
for empty content before passing to json parser. Prevents raw
tracebacks on invalid agent.json inputs.
Fixes#5787
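A sketch of the guard pattern (function name and error shape hypothetical):
```python
import json
from pathlib import Path

def load_agent_json(path: str) -> dict:
    p = Path(path)
    if not p.is_file():  # rejects directories, unlike exists()
        return {"error": f"not a file: {path}"}
    content = p.read_text()
    if not content.strip():  # empty file would raise JSONDecodeError
        return {"error": f"empty file: {path}"}
    return json.loads(content)
```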
- Replace joined.replace("\n", "\r\n") with re.sub(r"(?<!\r)\n", "\r\n", joined) to prevent \r\n in replace op new_content from becoming \r\r\n (fixed in both hashline_edit.py and file_ops.py)
- Track and report skipped large files in grep_search instead of silently skipping them
- Extract HASHLINE_MAX_FILE_BYTES constant to hashline.py as single source of truth, imported by view_file, grep_search, hashline_edit, and file_ops
- Add tests for CRLF replace op (both copies) and large file skip reporting
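The lookbehind trick in isolation (a minimal, self-contained sketch):
```python
import re

def to_crlf(text: str) -> str:
    # Plain text.replace("\n", "\r\n") corrupts existing CRLFs: "\r\n" -> "\r\r\n".
    # The negative lookbehind converts only bare newlines.
    return re.sub(r"(?<!\r)\n", "\r\n", text)

assert to_crlf("a\nb\r\nc") == "a\r\nb\r\nc"
```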
register_all_tools() now only loads verified (stable) tools by default.
Pass include_unverified=True to also load new/community integrations.
This prevents unverified tools from being loaded in production.
Also fixes duplicate register_brevo and register_pushover calls.
- Remove s3_tool (duplicate of aws_s3_tool), power_bi_tool (duplicate of
powerbi_tool), x_tool (duplicate of twitter_tool)
- Remove integrations/plaid (duplicate of plaid_tool), integrations/sap_s4hana
(duplicate of sap_tool), stray tools/mssql.py
- Add help key to credential error responses across 14 tool modules
- Fix health checker registry keys (calendly -> calendly_pat, lusha -> lusha_api_key)
- Add health_check_endpoint to calendly and lusha credential specs
- Fix Trello env var (TRELLO_TOKEN -> TRELLO_API_TOKEN) and remove duplicate
Trello specs from hubspot.py
- Add credential_group="aws" to AWS S3 and Redshift specs sharing env vars
- Update conftest UNREGISTERED_COMMUNITY_MODULES to only contain mssql_tool
Implements 5 tools via Tines REST API:
- tines_list_stories: List workflow stories with search/filter
- tines_get_story: Get story details including entry/exit agents
- tines_list_actions: List actions (agents) in stories
- tines_get_action: Get action details with sources/receivers
- tines_get_action_logs: Get action execution logs by level
Uses Bearer token auth with tenant domain.
Implements 4 tools via X API v2:
- twitter_search_tweets: Search recent tweets with query operators
- twitter_get_user: Get user profile by username
- twitter_get_user_tweets: Get user timeline
- twitter_get_tweet: Get tweet details by ID
Uses Bearer token auth (app-only, read access).
Implements 5 tools via QuickBooks Online API v3:
- quickbooks_query: Query entities with SQL-like syntax
- quickbooks_get_entity: Get entity by type and ID
- quickbooks_create_customer: Create customers
- quickbooks_create_invoice: Create invoices with line items
- quickbooks_get_company_info: Get company details
Uses OAuth 2.0 Bearer token auth. Supports sandbox mode.
Implements 5 tools via AWS S3 REST API:
- s3_list_buckets: List all buckets in the account
- s3_list_objects: List objects with prefix/delimiter filtering
- s3_get_object: Get object content and metadata
- s3_put_object: Upload text objects
- s3_delete_object: Delete objects
Uses AWS Signature V4 signing (no boto3 dependency).
Implements 5 tools via Calendly API v2:
- calendly_get_current_user: Get user URI and profile info
- calendly_list_event_types: List meeting templates
- calendly_list_scheduled_events: List booked meetings with date filters
- calendly_get_scheduled_event: Get event details by URI
- calendly_list_invitees: List invitees for an event
Uses Bearer token auth (Personal Access Token).
Implements 5 tools via PagerDuty REST API v2:
- pagerduty_list_incidents: List incidents with status/urgency/date filters
- pagerduty_get_incident: Get incident details by ID
- pagerduty_create_incident: Create incidents on a service
- pagerduty_update_incident: Acknowledge or resolve incidents
- pagerduty_list_services: List services with name search
Uses Token auth header, From header for write operations.
Implements 6 tools via Airtable Web API:
- airtable_list_records: List records with filters, sort, field selection
- airtable_get_record: Get a single record by ID
- airtable_create_records: Create up to 10 records per request
- airtable_update_records: Partial update up to 10 records per request
- airtable_list_bases: List accessible bases
- airtable_get_base_schema: Get table and field schema for a base
Uses Bearer token auth (Personal Access Token).
Implements 6 tools via MongoDB Atlas Data API:
- mongodb_find: Find documents with filters, projection, sort, limit
- mongodb_find_one: Find a single document
- mongodb_insert_one: Insert a document
- mongodb_update_one: Update a document with MongoDB operators
- mongodb_delete_one: Delete a document
- mongodb_aggregate: Run aggregation pipelines
Uses API key auth header. All endpoints are POST.
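The call shape, sketched with httpx (app ID, cluster, and names hypothetical):
```python
import httpx

# Atlas Data API: every operation is a POST with an api-key header
resp = httpx.post(
    "https://data.mongodb-api.com/app/<app-id>/endpoint/data/v1/action/findOne",
    headers={"api-key": "<api-key>"},
    json={
        "dataSource": "Cluster0",
        "database": "app",
        "collection": "users",
        "filter": {"email": "user@example.com"},
    },
)
print(resp.json().get("document"))
```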
Implements 5 tools via Zendesk Support API v2:
- zendesk_list_tickets: List tickets with status/sort filters
- zendesk_get_ticket: Get ticket details by ID
- zendesk_create_ticket: Create tickets with priority/type/tags
- zendesk_update_ticket: Update ticket fields and add comments
- zendesk_search_tickets: Search tickets with Zendesk query syntax
Uses Basic auth (email/token:api_token).
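The auth convention, sketched with httpx (subdomain and values hypothetical):
```python
import httpx

# Zendesk API-token auth: username is "<email>/token", password is the API token
resp = httpx.get(
    "https://example.zendesk.com/api/v2/tickets.json",
    auth=("agent@example.com/token", "<api_token>"),
)
```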
6 tools: jira_search_issues, jira_get_issue, jira_create_issue,
jira_list_projects, jira_get_project, jira_add_comment. Uses Basic auth
with email + API token and Atlassian Document Format for text fields.
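For reference, a minimal ADF document for a plain-text field looks like this (text content hypothetical):
```python
# Minimal Atlassian Document Format body for a plain-text description
adf_description = {
    "type": "doc",
    "version": 1,
    "content": [
        {
            "type": "paragraph",
            "content": [{"type": "text", "text": "Created via the Jira tool"}],
        }
    ],
}
```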
5 tools: cloudinary_upload, cloudinary_list_resources, cloudinary_get_resource,
cloudinary_delete_resource, cloudinary_search. Uses Basic auth with
API key/secret and supports image, video, and raw resource types.
- Implement 6 YouTube API tools (search videos, get video/channel details, list channel videos, get playlist items, search channels)
- Add YOUTUBE_API_KEY credential spec with help_url and description
- Register YouTube tool in tools/__init__.py
- Add comprehensive test coverage (18 tests) with mocking
- Add detailed README with setup instructions and examples
- Use httpx for HTTP requests to YouTube Data API v3
- Verified with real API integration testing
Implements #5603
The grace period logic for client-facing auto-blocks was placed after
_await_user_input(), which blocks forever since no inject_event is
scheduled for text-only turns. This caused test_text_after_user_input
_goes_to_judge to hang indefinitely, blocking CI framework tests.
Move the grace period check before the blocking call so that within
the grace window, auto-blocks with missing outputs skip blocking
entirely and continue to the next LLM turn for judge RETRY pressure.
Also adds an _auto_missing check: nodes with no missing outputs
(e.g. queen monitoring with output_keys=[]) should still block
as their text-only output is legitimate conversation.
Fixes#5633
* fix: enforce 0600 permissions on OAuth token files
Credential files were written with default umask permissions.
Use os.open with explicit 0o600 mode to ensure token files
are always owner-read/write only, regardless of umask.
Fixes#5530
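A sketch of the technique (function name hypothetical):
```python
import os

def write_token_file(path: str, data: bytes) -> None:
    # Passing 0o600 to os.open caps permissions at owner read/write:
    # the umask can only clear bits, never add group/other access.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
```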
* style: fix line too long in checkpoint_store.py
- Expose page parameter on search_people and search_companies
(client + MCP tool) enabling access beyond the first 50 results
- Add guard requiring at least one filter on both search endpoints
to prevent broad requests that burn API credits
- Add unit tests for pagination and empty filter validation
Add complete SAP S/4HANA integration with:
- Connector for OData API access
- Credential management following Hive patterns
- Unit tests with mocked responses
- Documentation and usage examples
Refs #3182
- Add S3Storage class with upload, download, list, delete operations
- Support IAM roles, environment variables, and credential store
- Implement retry logic with adaptive backoff
- Add MCP tools: s3_upload, s3_download, s3_list, s3_delete, s3_check_credentials
- Include comprehensive tests with moto mocking
- Add documentation for setup and IAM permissions
Closes#3012
- Replace non-existent CLI commands (calculate, interactive, analyze)
with actual commands (run, shell, info) in core/README.md
- Fix test-list argument from <goal_id> to <agent_path> in core/README.md
- Fix misleading docstring on MockProvider.complete_with_tools()
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: hundao <alchemy_wimp@hotmail.com>
* micro-fix: fix wrong credential path and env var in docs
Both docs/configuration.md and docs/environment-setup.md reference a
non-existent ADEN_CREDENTIALS_PATH env var and wrong default path
(~/.aden/credentials). The actual env var is HIVE_CREDENTIAL_KEY and
the default path is ~/.hive/credentials (see storage.py:119,125).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* micro-fix: clarify HIVE_CREDENTIAL_KEY comment wording
Reword comment to avoid implying the env var controls the path.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
- Use Credentials.from_service_account_file() instead of mutating os.environ
- Remove unused dimensions param from _format_report_response
- Remove unused metrics param from _format_realtime_response
- Extract duplicated property_id/limit validation into _validate_inputs helper
- Add credential_group="google_cloud" to GA and BigQuery specs
- Update tests to mock Credentials class
Add read-only GA4 Data API v1 tools: ga_run_report, ga_get_realtime,
ga_get_top_pages, and ga_get_traffic_sources. Includes credential spec,
unit tests, and README.
Add IntercomHealthChecker (subclass of OAuthBearerHealthChecker) and
register it in HEALTH_CHECKERS so the credential registry completeness
test passes in CI.
- Pass assignee_type from intercom_assign_conversation tool function
through to _IntercomClient.assign_conversation() and into the API payload
- Add tests for assignee_type="team" passthrough at client and tool levels
- Add tool README with setup, usage examples, and error handling
Addresses PR #5171 review feedback from @bryanadenhq
- Add brevo_tool with 6 MCP tools: brevo_send_email, brevo_send_sms,
brevo_create_contact, brevo_get_contact, brevo_update_contact,
brevo_get_email_stats
- Add CredentialSpec for BREVO_API_KEY in credentials/brevo.py
- Register brevo_tool in tools/__init__.py and credentials/__init__.py
- Add README with setup instructions and usage examples
- Add 34 unit tests covering all tools, validation and error handling
Closes#5127
- search_people: replaced freetext searchText concatenation with proper
structured Lusha API filters (jobTitles, seniority as list[int],
departments, locations as dict, company_names, industry_ids, search_text)
- search_companies: added locations, company_names, search_text params;
made all params optional for flexible queries
- Pagination: exposed limit param (clamped 10-50 per Lusha API constraints)
on both search tools, replacing hardcoded size=25
- get_signals: changed ids from list[str] to list[int], removed internal
str-to-int conversion as Lusha IDs are always numeric
- seniority type corrected to list[int] (API rejects string-encoded values
despite OpenAPI spec suggesting strings — verified via live integration)
- Unit tests updated for all changes (19/19 pass)
Verified against live Lusha API: all 6 tools return correct responses.
- search_companies: replace names filter with mainIndustriesIds (numeric
industry IDs) per Lusha API schema. Parameter changed from
industry: str to industry_ids: list[int] | None.
- _get_api_key: return None instead of raising TypeError on unexpected
credential type. Lets _get_client handle it with the standard error dict
pattern used across all tools.
- Updated unit tests for new industry_ids parameter and added test for
non-string credential handling.
Lusha API rejects filters.companies.include.searchText (HTTP 400).
Replaced with valid 'names' field in search_companies and removed
redundant company searchText from search_people. Updated unit tests.
- Fix export endpoint: /Export -> /ExportTo
- Add 202 Accepted response handling
- Add notifyOption to refresh_dataset API call
- Rename format parameter to export_format (avoid shadowing builtin)
- Add PNG support to export formats
- All critical API issues from review addressed
Implements comprehensive Apify integration for web scraping and automation:
- Added 4 new tools: apify_run_actor, apify_get_dataset, apify_get_run, apify_search_actors
- Credential management for APIFY_API_TOKEN with health check
- Support for synchronous (wait=True) and asynchronous (wait=False) actor execution
- Actor ID validation and comprehensive error handling
- Full test coverage (26 tests passing)
- README with usage examples and documentation
Addresses #4510
Add Zoho CRM MCP integration for lead/contact/account/deal workflows with notes support. Implements 5 MCP tools:
- zoho_crm_search: Search Leads/Contacts/Accounts/Deals by criteria or word with pagination
- zoho_crm_get_record: Fetch a single record by module and ID
- zoho_crm_create_record: Create records with pass-through field payloads
- zoho_crm_update_record: Update records by ID with partial field payloads
- zoho_crm_add_note: Create notes linked to CRM records via Parent_Id mapping
Features:
- Zoho OAuth2 provider added in core credentials (refresh-token flow)
- Zoho auth format: Authorization: Zoho-oauthtoken <token>
- Region/DC-aware routing using accounts domain/region + api_domain usage
- Persisted DC metadata on refresh (api_domain/accounts_domain/location)
- Credential spec and health check registration for zoho_crm
- Tool registration and allowed-tool list updates
- Normalized tool responses with retriable 429 handling
- README with setup, auth modes, usage, and testing instructions
- Comprehensive unit/integration coverage updates for tool, provider, and health checks
Validation:
- Scoped ruff lint/format checks passed
- Targeted test suite passed: 563 passed, 18 skipped
Closes#4418
- Keep both APOLLO_CREDENTIALS and AIRTABLE_CREDENTIALS
- Keep both apollo_tool and airtable_tool imports (alphabetical)
Co-authored-by: Cursor <cursoragent@cursor.com>
- Keep both APOLLO_CREDENTIALS and CALENDLY_CREDENTIALS
- Keep both apollo_tool and calendly_tool imports (alphabetical)
Co-authored-by: Cursor <cursoragent@cursor.com>
- Add 429 handling with retry_after from Retry-After header
- Add _request_with_retry (2 retries) for all API calls
- Update tests to use httpx.request
Co-authored-by: Cursor <cursoragent@cursor.com>
- Add 429 handling with retry_after from Retry-After header
- Add _request_with_retry (2 retries) for all API calls
- Validate get_availability date range <= 7 days
- Update tests to use httpx.request
Co-authored-by: Cursor <cursoragent@cursor.com>
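The retry shape described above, as a standalone sketch (names and defaults assumed):
```python
import time
import httpx

def _request_with_retry(method: str, url: str, retries: int = 2, **kwargs) -> httpx.Response:
    response = httpx.request(method, url, **kwargs)
    for _ in range(retries):
        if response.status_code != 429:
            break
        # Honor the server's Retry-After hint, defaulting to 1 second
        time.sleep(float(response.headers.get("Retry-After", "1")))
        response = httpx.request(method, url, **kwargs)
    return response
```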
Implements Reddit API integration for community management and content monitoring.
Features:
- Search & Monitoring: search posts/comments, get subreddit feeds (new/hot), get posts/comments (6 tools)
- Content Creation: submit posts, reply, edit, delete comments (5 tools)
- User Engagement: get profiles, upvote, downvote, save posts (4 tools)
- Moderation: remove/approve posts, ban users (3 tools)
Implementation:
- OAuth 2.0 authentication via REDDIT_CREDENTIALS
- PRAW library for Reddit API integration
- Comprehensive error handling and validation
- Full test coverage (25 tests passing)
Resolves#3595
Add test_x_credentials_share_credential_group to verify all X credentials
share the 'x' credential group. Update test_credential_group_default_empty
to account for X credentials alongside existing Google exceptions.
- Added `x_send_dm` tool using v2 endpoint (`POST /dm_conversations/with/:id/messages`) for reliable 1:1 messaging.
- Fixed 403 Forbidden payload validation errors by simplifying DM payload structure.
- Enhanced `_handle_response` to verify `x_tool.py` returns raw API error details for 403/400 responses, aiding in permission debugging.
- Updated `demo_x_tools.py` to support standard `.env` variable names (e.g., `X_API_KEY`) and added user lookup for DM testing.
- Added unit tests covering new DM functionality and payload verification in `test_x_tool.py`.
- Audited credential handling: Read-only tools (Search/Mentions) correctly use Bearer Token, while Write tools (Post/Reply/Delete/DM) enforce OAuth 1.0a User Context.
Verified with live API tests (see PR description for logs).
- Implement 5 core functions for data warehouse querying
- Add boto3 integration with Redshift Data API
- Security: Read-only SELECT queries by default
- Full credential store support
- 26/26 tests passing (100% coverage)
- Complete documentation with examples
description: SOP for debugging browser automation failures on complex websites. Use when browser tools fail on specific sites like LinkedIn, Twitter/X, SPAs, or sites with Shadow DOM.
license: MIT
---
# Browser Tool Edge Cases
Standard Operating Procedure for debugging and fixing browser automation failures on complex websites.
## When to Use This Skill
- `browser_scroll` succeeds but page doesn't move
- `browser_click` succeeds but no action triggered
- `browser_type` text disappears or doesn't work
- `browser_snapshot` hangs or returns stale content
- `browser_navigate` loads wrong content
## SOP: Debugging Browser Tool Failures
### Phase 1: Reproduce & Isolate
```
1. Create minimal test case demonstrating failure
2. Test against simple site (example.com) to verify tool works
```
### #16: Stale Browser Context (Group ID Mismatch)
| Attribute | Value |
|-----------|-------|
| **Site** | Any |
| **Symptom** | `browser_open()` returns `"No group with id: XXXXXXX"` even though `browser_status` shows `running: true` |
| **Root Cause** | In-memory `_contexts` dict has a stale `groupId` from a Chrome tab group that was closed outside the tool (e.g. user closed the tab group) |
| **Detection** | `browser_status` returns `running: true` but `browser_open` fails with "No group with id" |
| **Fix** | Call `browser_stop()` to clear stale context from `_contexts`, then `browser_start()` again |
| **Code** | `tools/lifecycle.py:144-160` - `already_running` check uses cached dict without validating against Chrome |
| **Verified** | 2026-04-03 ✓ |
---
## How to Add New Edge Cases
1. **Reproduce** the issue with minimal test case
2. **Document** using the template below
3. **Implement** fix with multi-layer fallback
4. **Verify** against both problematic and simple sites
5. **Submit** by appending to this file
### Template
```markdown
### #N: [Short Title]
| Attribute | Value |
|-----------|-------|
| **Site** | [URL or site type] |
| **Symptom** | [What the user observes] |
| **Root Cause** | [Technical explanation] |
| **Detection** | [JavaScript to detect this case] |
| **Fix** | [Solution approach] |
| **Code** | [File:line reference if implemented] |
| **Verified** | [Date verified + ✓] |
```
description: Core concepts for goal-driven agents - architecture, node types (event_loop, function), tool discovery, and workflow overview. Use when starting agent development or need to understand agent fundamentals.
license: Apache-2.0
metadata:
author: hive
version: "2.0"
type: foundational
part_of: hive
---
# Building Agents - Core Concepts
Foundational knowledge for building goal-driven agents as Python packages.
- `event_loop` — Multi-turn streaming loop with tool execution and judge-based evaluation. Works with or without tools.
- `function` — Deterministic Python operations. No LLM involved.
```python
search_node = NodeSpec(
id="search-web",
name="Search Web",
description="Search for information and extract results",
node_type="event_loop",
input_keys=["query"],
output_keys=["search_results"],
system_prompt="Search the web for: {query}. Use the web_search tool to find results, then call set_output to store them.",
tools=["web_search"],
)
```
**NodeSpec Fields for Event Loop Nodes:**
| Field | Default | Description |
|-------|---------|-------------|
| `client_facing` | `False` | If True, streams output to user and blocks for input between turns |
| `nullable_output_keys` | `[]` | Output keys that may remain unset (for mutually exclusive outputs) |
| `max_node_visits` | `1` | Max times this node executes per run. Set >1 for feedback loop targets |
### Edge
Connection between nodes (written to agent.py)
**Edge Conditions:**
- `on_success` — Proceed if node succeeds (most common)
- `on_failure` — Handle errors
- `always` — Always proceed
- `conditional` — Based on expression evaluating node output
**Edge Priority:**
Priority controls evaluation order when multiple edges leave the same node. Higher priority edges are evaluated first. Use negative priority for feedback edges (edges that loop back to earlier nodes).
```python
# Forward edge (evaluated first)
EdgeSpec(
id="review-to-campaign",
source="review",
target="campaign-builder",
condition=EdgeCondition.CONDITIONAL,
condition_expr="output.get('approved_contacts') is not None",
priority=1,
)
# Feedback edge (evaluated after forward edges)
EdgeSpec(
id="review-feedback",
source="review",
target="extractor",
condition=EdgeCondition.CONDITIONAL,
condition_expr="output.get('redo_extraction') is not None",
priority=-1,
)
```
### Client-Facing Nodes
For multi-turn conversations with the user, set `client_facing=True` on a node. The node will:
- Stream its LLM output directly to the end user
- Block for user input between conversational turns
- Resume when new input is injected via `inject_event()`
```python
intake_node = NodeSpec(
id="intake",
name="Intake",
description="Gather requirements from the user",
node_type="event_loop",
client_facing=True,
input_keys=[],
output_keys=["repo_url","project_url"],
system_prompt="You are the intake agent. Ask the user for the repo URL and project URL.",
)
```
> **Legacy Note:** The old `pause_nodes` / `entry_points` pattern still works but `client_facing=True` is preferred for new agents.
**STEP 1 / STEP 2 Prompt Pattern:** For client-facing nodes, structure the system prompt with two explicit phases:
```python
system_prompt="""\
**STEP 1 — Respond to the user (text only, NO tool calls):**
[Present information, ask questions, etc.]
**STEP 2 — After the user responds, call set_output:**
[Call set_output with the structured outputs]
"""
```
This prevents the LLM from calling `set_output` prematurely before the user has had a chance to respond.
### Node Design: Fewer, Richer Nodes
Prefer fewer nodes that do more work over many thin single-purpose nodes:
Why: Each node boundary requires serializing outputs and passing context. Fewer nodes means the LLM retains full context of its work within the node. A research node that searches, fetches, and analyzes keeps all the source material in its conversation history.
### nullable_output_keys for Cross-Edge Inputs
When a node receives inputs that only arrive on certain edges (e.g., `feedback` only comes from a review → research feedback loop, not from intake → research), mark those keys as `nullable_output_keys`:
```python
research_node = NodeSpec(
    id="research",
    input_keys=["research_brief", "feedback"],
    nullable_output_keys=["feedback"],  # Not present on first visit
max_node_visits=3,
...
)
```
## Event Loop Architecture Concepts
### How EventLoopNode Works
An event loop node runs a multi-turn loop:
1. LLM receives system prompt + conversation history
2. LLM responds (text and/or tool calls)
3. Tool calls are executed, results added to conversation
4. Judge evaluates the turn (ACCEPT or RETRY)
5. Repeat until judge ACCEPTs or max_iterations reached
### EventLoopNode Runtime
EventLoopNodes are **auto-created** by `GraphExecutor` at runtime. You do NOT need to manually register them. Both `GraphExecutor` (direct) and `AgentRuntime` / `create_agent_runtime()` handle event_loop nodes automatically.
Nodes produce structured outputs by calling `set_output(key, value)` — a synthetic tool injected by the framework. When the LLM calls `set_output`, the value is stored in the output accumulator and made available to downstream nodes via shared memory.
`set_output` is NOT a real tool — it is excluded from `real_tool_results`. For client-facing nodes, this means a turn where the LLM only calls `set_output` (no other tools) is treated as a conversational boundary and will block for user input.
### JudgeProtocol
**The judge is the SOLE mechanism for acceptance decisions.** Do not add ad-hoc framework gating, output rollback, or premature rejection logic. If the LLM calls `set_output` too early, fix it with better prompts or a custom judge — not framework-level guards.
The judge controls when a node's loop exits:
- **Implicit judge** (default, no judge configured): ACCEPTs when the LLM finishes with no tool calls and all required output keys are set
- **SchemaJudge**: Validates outputs against a Pydantic model
When tool results exceed the context window, the framework automatically saves them to a spillover directory and truncates with a hint. Nodes that produce or consume large data should include the data tools:
- `save_data(filename, data)` — Write data to a file in the data directory
- `load_data(filename, offset=0, limit=50)` — Read data with line-based pagination
- `list_data_files()` — List available data files
- `serve_file_to_user(filename, label="")` — Get a clickable file:// URI for the user
Note: `data_dir` is a framework-injected context parameter — the LLM never sees or passes it. `GraphExecutor.execute()` sets it per-execution via `contextvars`, so data tools and spillover always share the same session-scoped directory.
These are real MCP tools (not synthetic). Add them to nodes that handle large tool results:
Multiple ON_SUCCESS edges from the same source create parallel execution. All branches run concurrently via `asyncio.gather()`. Parallel event_loop nodes must have disjoint `output_keys`.
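For example, a fan-out might be declared like this (ids and targets hypothetical; assumes `EdgeCondition.ON_SUCCESS` parallels the `CONDITIONAL` usage shown earlier):
```python
# Two ON_SUCCESS edges from the same source run their targets in parallel;
# "summarize" and "score" must set disjoint output_keys
EdgeSpec(
    id="research-to-summarize",
    source="research",
    target="summarize",
    condition=EdgeCondition.ON_SUCCESS,
)
EdgeSpec(
    id="research-to-score",
    source="research",
    target="score",
    condition=EdgeCondition.ON_SUCCESS,
)
```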
### max_node_visits
Controls how many times a node can execute in one graph run. Default is 1. Set higher for nodes that are targets of feedback edges (review-reject loops). Set 0 for unlimited (guarded by max_steps).
## Tool Discovery & Validation
**CRITICAL:** Before adding a node with tools, you MUST verify the tools exist.
Tools are provided by MCP servers. Never assume a tool exists - always discover dynamically.
### Step 1: Register MCP Server (if not already done)
description: Set up and install credentials for an agent. Detects missing credentials from agent config, collects them from the user, and stores them securely in the local encrypted store at ~/.hive/credentials.
license: Apache-2.0
metadata:
author: hive
version: "2.3"
type: utility
---
# Setup Credentials
Interactive credential setup for agents with multiple authentication options. Detects what's missing, offers auth method choices, validates with health checks, and stores credentials securely.
## When to Use
- Before running or testing an agent for the first time
- When `AgentRunner.run()` fails with "missing required credentials"
- When a user asks to configure credentials for an agent
- After building a new agent that uses tools requiring API keys
## Workflow
### Step 1: Identify the Agent
Determine which agent needs credentials. The user will either:
- Name the agent directly (e.g., "set up credentials for hubspot-agent")
- Have an agent directory open (check `exports/` for agent dirs)
- Be working on an agent in the current session
Locate the agent's directory under `exports/{agent_name}/`.
### Step 2: Detect Missing Credentials
Use the `check_missing_credentials` MCP tool to detect what the agent needs and what's already configured. This tool loads the agent, inspects its required tools and node types, maps them to credentials via `CREDENTIAL_SPECS`, and checks both the encrypted store and environment variables.
"description":"Brave Search API key for web search",
"help_url":"https://brave.com/search/api/",
"tools":["web_search"]
}
],
"available":[
{
"credential_name":"anthropic",
"env_var":"ANTHROPIC_API_KEY",
"source":"encrypted_store"
}
],
"total_missing":1,
"ready":false
}
```
**If `ready` is true (nothing missing):** Report all credentials as configured and skip Steps 3-5. Example:
```
All required credentials are already configured:
✓ anthropic (ANTHROPIC_API_KEY)
✓ brave_search (BRAVE_SEARCH_API_KEY)
Your agent is ready to run!
```
**If credentials are missing:** Continue to Step 3 with the `missing` list.
### Step 3: Present Auth Options for Each Missing Credential
For each missing credential, check what authentication methods are available:
```python
from aden_tools.credentials import CREDENTIAL_SPECS

spec = CREDENTIAL_SPECS.get("hubspot")
if spec:
    # Determine available auth options
    auth_options = []
    if spec.aden_supported:
        auth_options.append("aden")
    if spec.direct_api_key_supported:
        auth_options.append("direct")
    auth_options.append("custom")  # Always available

    # Get setup info
    setup_info = {
        "env_var": spec.env_var,
        "description": spec.description,
        "help_url": spec.help_url,
        "api_key_instructions": spec.api_key_instructions,
    }
```
Present the available options using AskUserQuestion:
```
Choose how to configure HUBSPOT_ACCESS_TOKEN:
1) Aden Platform (OAuth) (Recommended)
Secure OAuth2 flow via hive.adenhq.com
- Quick setup with automatic token refresh
- No need to manage API keys manually
2) Direct API Key
Enter your own API key manually
- Requires creating a HubSpot Private App
- Full control over scopes and permissions
3) Local Credential Setup (Advanced)
Programmatic configuration for CI/CD
- For automated deployments
- Requires manual API calls
```
### Step 4: Execute Auth Flow Based on User Choice
#### Prerequisite: Ensure HIVE_CREDENTIAL_KEY Is Available
Before storing any credentials, verify `HIVE_CREDENTIAL_KEY` is set (needed to encrypt/decrypt the local store). Check both the current session and shell config:
```bash
# Check current session
printenv HIVE_CREDENTIAL_KEY > /dev/null 2>&1 && echo "session: set" || echo "session: not set"
# Check shell config files
for f in ~/.zshrc ~/.bashrc ~/.profile; do [ -f "$f" ] && grep -q 'HIVE_CREDENTIAL_KEY' "$f" && echo "$f"; done
```
- **In current session** — proceed to store credentials
- **In shell config but NOT in current session** — run `source ~/.zshrc` (or `~/.bashrc`) first, then proceed
- **Not set anywhere** — `EncryptedFileStorage` will auto-generate one. After storing, tell the user to persist it: `export HIVE_CREDENTIAL_KEY="{generated_key}"` in their shell profile
> **⚠️ IMPORTANT: After adding `HIVE_CREDENTIAL_KEY` to the user's shell config, always display:**
> ```
> ⚠️ Environment variables were added to your shell config.
> Open a NEW TERMINAL for them to take effect outside this session.
> ```
#### Option 1: Aden Platform (OAuth)
This is the recommended flow for supported integrations (HubSpot, etc.).
**How Aden OAuth Works:**
The ADEN_API_KEY represents a user who has already completed OAuth authorization on Aden's platform. When users sign up and connect integrations on Aden, those OAuth tokens are stored server-side. Having an ADEN_API_KEY means:
1. User has an Aden account
2. User has already authorized integrations (HubSpot, etc.) via OAuth on Aden
3. We just need to sync those credentials down to the local credential store
**4.1a. Check for ADEN_API_KEY**
```python
import os

aden_key = os.environ.get("ADEN_API_KEY")
```
If not set, guide user to get one from Aden (this is where they do OAuth):
The local encrypted store requires `HIVE_CREDENTIAL_KEY` to encrypt/decrypt credentials.
- If the user doesn't have one, `EncryptedFileStorage` will auto-generate one and log it
- The user MUST persist this key (e.g., in `~/.bashrc`/`~/.zshrc` or a secrets manager)
- Without this key, stored credentials cannot be decrypted
**Shell config rule:** Only TWO keys belong in shell config (`~/.zshrc`/`~/.bashrc`):
- `HIVE_CREDENTIAL_KEY` — encryption key for the credential store
- `ADEN_API_KEY` — Aden platform auth key (needed before the store can sync)
All other API keys (Brave, Google, HubSpot, etc.) must go in the encrypted store only. **Never offer to add them to shell config.**
If `HIVE_CREDENTIAL_KEY` is not set:
1. Let the store generate one
2. Tell the user to save it: `export HIVE_CREDENTIAL_KEY="{generated_key}"`
3. Recommend adding it to `~/.bashrc` or their shell profile
## Security Rules
- **NEVER** log, print, or echo credential values in tool output
- **NEVER** store credentials in plaintext files, git-tracked files, or agent configs
- **NEVER** hardcode credentials in source code
- **NEVER** offer to save API keys to shell config (`~/.zshrc`/`~/.bashrc`) — the **only** keys that belong in shell config are `HIVE_CREDENTIAL_KEY` and `ADEN_API_KEY`. All other credentials (Brave, Google, HubSpot, GitHub, Resend, etc.) go in the encrypted store only.
- **ALWAYS** use `SecretStr` from Pydantic when handling credential values in Python (see the example after this list)
- **ALWAYS** use the local encrypted store (`~/.hive/credentials`) for persistence
- **ALWAYS** run health checks before storing credentials (when possible)
- **ALWAYS** verify credentials were stored by re-running validation, not by reading them back
- When modifying `~/.bashrc` or `~/.zshrc`, confirm with the user first
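For the `SecretStr` rule, the payoff is that accidental printing stays masked (minimal example):
```python
from pydantic import SecretStr

token = SecretStr("sk-very-secret")
print(token)                           # ********** (safe in logs and tracebacks)
real_value = token.get_secret_value()  # explicit, auditable access
```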
## Credential Sources Reference
All credential specs are defined in `tools/src/aden_tools/credentials/`:
description: Best practices, patterns, and examples for building goal-driven agents. Includes client-facing interaction, feedback edges, judge patterns, fan-out/fan-in, context management, and anti-patterns.
license: Apache-2.0
metadata:
author: hive
version: "2.0"
type: reference
part_of: hive
---
# Building Agents - Patterns & Best Practices
Design patterns, examples, and best practices for building robust goal-driven agents.
**Prerequisites:** Complete agent structure using `hive-create`.
## Practical Example: Hybrid Workflow
How to build a node using both direct file writes and optional MCP validation:
```python
# 1. WRITE TO FILE FIRST (Primary - makes it visible)
node_code='''
search_node = NodeSpec(
id="search-web",
node_type="event_loop",
input_keys=["query"],
output_keys=["search_results"],
system_prompt="Search the web for: {query}. Use web_search, then call set_output to store results.",
- Immediately sees node in their editor (from step 1)
- Gets validation feedback (from step 2)
- Can edit the file directly if needed
## Multi-Turn Interaction Patterns
For agents needing multi-turn conversations with users, use `client_facing=True` on event_loop nodes.
### Client-Facing Nodes
A client-facing node streams LLM output to the user and blocks for user input between conversational turns. This replaces the old pause/resume pattern.
```python
# Client-facing node with STEP 1/STEP 2 prompt pattern
intake_node = NodeSpec(
id="intake",
name="Intake",
description="Gather requirements from the user",
node_type="event_loop",
client_facing=True,
input_keys=["topic"],
output_keys=["research_brief"],
system_prompt="""\
You are an intake specialist.
**STEP 1 — Read and respond (text only, NO tool calls):**
1. Read the topic provided
2. If it's vague, ask 1-2 clarifying questions
3. If it's clear, confirm your understanding
**STEP 2 — After the user confirms, call set_output:**
- set_output("research_brief", "Clear description of what to research")
""",
)
# Internal node runs without user interaction
research_node = NodeSpec(
    id="research",
    name="Research",
    description="Search and analyze sources",
    node_type="event_loop",
    input_keys=["research_brief"],
    output_keys=["findings", "sources"],
    system_prompt="Research the topic using web_search and web_scrape...",
)
```
- Client-facing nodes stream LLM text to the user and block for input after each response
- User input is injected via `node.inject_event(text)`
- When the LLM calls `set_output` to produce structured outputs, the judge evaluates and ACCEPTs
- Internal nodes (non-client-facing) run their entire loop without blocking
- `set_output` is a synthetic tool — a turn with only `set_output` calls (no real tools) triggers user input blocking
**STEP 1/STEP 2 pattern:** Always structure client-facing prompts with explicit phases. STEP 1 is text-only conversation. STEP 2 calls `set_output` after user confirmation. This prevents the LLM from calling `set_output` prematurely before the user responds.
| Scenario | Client-facing? | Why |
|----------|----------------|-----|
| Gathering user requirements | Yes | Need user input |
| Human review/approval checkpoint | Yes | Need human decision |
| Data processing (scanning, scoring) | No | Runs autonomously |
| Report generation | No | No user input needed |
| Final confirmation before action | Yes | Need explicit approval |
> **Legacy Note:** The `pause_nodes` / `entry_points` pattern still works for backward compatibility but `client_facing=True` is preferred for new agents.
## Edge-Based Routing and Feedback Loops
### Conditional Edge Routing
Multiple conditional edges from the same source replace the old `router` node type. Each edge checks a condition on the node's output.
system_prompt="Present the contact list to the operator. If they approve, call set_output('approved_contacts', ...). If they want changes, call set_output('redo_extraction', 'true').",
| Need | Old pattern | New pattern |
|------|-------------|-------------|
| Loop-back on rejection | Manual entry_points | Feedback edge with `priority=-1` |
| Multi-way routing | Router with routes dict | Multiple conditional edges with priorities |
## Judge Patterns
**Core Principle: The judge is the SOLE mechanism for acceptance decisions.** Never add ad-hoc framework gating to compensate for LLM behavior. If the LLM calls `set_output` prematurely, fix the system prompt or use a custom judge. Anti-patterns to avoid:
- Output rollback logic
- `_user_has_responded` flags
- Premature set_output rejection
- Interaction protocol injection into system prompts
Judges control when an event_loop node's loop exits. Choose based on validation needs.
### Implicit Judge (Default)
When no judge is configured, the implicit judge ACCEPTs when:
- The LLM finishes its response with no tool calls
- All required output keys have been set via `set_output`
Best for simple nodes where "all outputs set" is sufficient validation.
### SchemaJudge
Validates outputs against a Pydantic model. Use when you need structural validation.
```python
from pydantic import BaseModel

class ScannerOutput(BaseModel):
    github_users: list[dict]  # Must be a list of user objects
```
3. **Aggressive compaction** — Keeps only recent messages + summary
4. **Emergency** — Hard reset with tool history preservation
### Spillover Pattern
The framework automatically truncates large tool results and saves full content to a spillover directory. The LLM receives a truncation message with instructions to use `load_data` to read the full result.
For explicit data management, use the data tools (real MCP tools, not synthetic):
```python
# save_data, load_data, list_data_files, serve_file_to_user are real MCP tools
# data_dir is auto-injected by the framework — the LLM never sees it
```
`data_dir` is a framework context parameter — auto-injected at call time. `GraphExecutor.execute()` sets it per-execution via `ToolRegistry.set_execution_context(data_dir=...)` (using `contextvars` for concurrency safety), ensuring it matches the session-scoped spillover directory.
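A generic sketch of that pattern, with hypothetical names (this is not the framework's actual code):
```python
import contextvars

# One ContextVar per execution-scoped parameter; each asyncio task sees its own value
_data_dir: contextvars.ContextVar[str] = contextvars.ContextVar("data_dir")

def set_execution_context(data_dir: str) -> None:
    # Called once at the start of an execution
    _data_dir.set(data_dir)

def save_data(filename: str, data: str) -> str:
    # Tools read data_dir from the ambient context; callers never pass it explicitly
    path = f"{_data_dir.get()}/{filename}"
    with open(path, "w", encoding="utf-8") as f:
        f.write(data)
    return path
```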
## Anti-Patterns
### What NOT to Do
- **Don't rely on `export_graph`** — Write files immediately, not at end
- **Don't hide code in session** — Write to files as components are approved
- **Don't wait to write files** — Agent visible from first step
- **Don't batch everything** — Write incrementally, one component at a time
- **Don't create too many thin nodes** — Prefer fewer, richer nodes (see below)
- **Don't add framework gating for LLM behavior** — Fix prompts or use judges instead
### Fewer, Richer Nodes
A common mistake is splitting work into too many small single-purpose nodes. Each node boundary requires serializing outputs, losing in-context information, and adding edge complexity.
**Remember: Agent is actively constructed, visible the whole time. No hidden state. No surprise exports. Just transparent, incremental file building.**
description: Iterative agent testing with session recovery. Execute, analyze, fix, resume from checkpoints. Use when testing an agent, debugging test failures, or verifying fixes without re-running from scratch.
---
# Agent Testing
Test agents iteratively: execute, analyze failures, fix, resume from checkpoint, repeat.
## When to Use
- Testing a newly built agent against its goal
- Debugging a failing agent iteratively
- Verifying fixes without re-running expensive early nodes
- Running final regression tests before deployment
## Prerequisites
1. Agent package at `exports/{agent_name}/` (built with `/hive-create`)
2. Credentials configured (`/hive-credentials`)
3. `ANTHROPIC_API_KEY` set (or appropriate LLM provider key)
These return `file_header`, `test_template`, `constraints_formatted`/`success_criteria_formatted`, and `test_guidelines`. They do NOT generate test code — you write the tests.
- Every test MUST be `async` with `@pytest.mark.asyncio`
- Every test MUST accept `runner, auto_responder, mock_mode` fixtures
- Use `await auto_responder.start()` before running, `await auto_responder.stop()` in `finally`
- Use `await runner.run(input_dict)` — this goes through AgentRunner → AgentRuntime → ExecutionStream
- Access output via `result.output.get("key")` — NEVER `result.output["key"]`
- `result.success` being `True` means no exception, NOT goal achieved — always check output
- Write 8-15 tests total, not 30+
- Each real test costs ~3 seconds + LLM tokens
- NEVER use `default_agent.run()` — it bypasses the runtime (no sessions, no logs, client-facing nodes hang)
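A minimal test following the rules above (test name, input, and output key are hypothetical):
```python
import pytest

@pytest.mark.asyncio
async def test_research_sets_results(runner, auto_responder, mock_mode):
    await auto_responder.start()
    try:
        result = await runner.run({"query": "test topic"})
        # success only means "no exception raised", so verify the output too
        assert result.success
        assert result.output.get("search_results") is not None
    finally:
        await auto_responder.stop()
```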
#### Step 1d: Check existing tests
Before generating, check if tests already exist:
```python
list_tests(
goal_id="your-goal-id",
agent_path="exports/{agent_name}"
)
```
---
### Phase 2: Execute
Two execution paths, use the right one for your situation.
#### Iterative debugging (for complex agents)
Run the agent via CLI. This creates sessions with checkpoints at `~/.hive/agents/{agent_name}/sessions/`:
```bash
uv run hive run exports/{agent_name} --input '{"query": "test topic"}'
```
Sessions and checkpoints are saved automatically.
**Client-facing nodes**: Agents with `client_facing=True` nodes (interactive conversation) work in headless mode when run from a real terminal — the agent streams output to stdout and reads user input from stdin via a `>>> ` prompt. In non-interactive shells (like Claude Code's Bash tool), client-facing nodes will hang because there is no stdin. For testing interactive agents from Claude Code, use `run_tests` with mock mode or have the user run the agent manually in their terminal.
#### Automated regression (for CI or final verification)
Use the `run_tests` MCP tool to run all pytest tests:
**Note:** `run_tests` uses `AgentRunner` with `tmp_path` storage, so sessions are isolated per test run. For checkpoint-based recovery with persistent sessions, use CLI execution. Use `run_tests` for quick regression checks and final verification.
---
### Phase 3: Analyze Failures
When a test fails, drill down systematically. Don't guess — use the tools.
#### Step 3a: Get error category
```python
debug_test(
goal_id="your-goal-id",
test_name="test_success_source_diversity",
agent_path="exports/{agent_name}"
)
```
Returns error category (`IMPLEMENTATION_ERROR`, `ASSERTION_FAILURE`, `TIMEOUT`, `IMPORT_ERROR`, `API_ERROR`) plus full traceback and suggestions.
#### Step 3b: Find the failed session
```python
list_agent_sessions(
agent_work_dir="~/.hive/agents/{agent_name}",
status="failed",
limit=5
)
```
Returns session list with IDs, timestamps, current_node (where it failed), execution_quality.
#### Step 3c: Inspect session state
```python
get_agent_session_state(
agent_work_dir="~/.hive/agents/{agent_name}",
session_id="session_20260209_143022_abc12345"
)
```
Returns execution path, which node was current, step count, timestamps — but excludes memory values (to avoid context bloat). Shows `memory_keys` and `memory_size` instead.
Returns checkpoint summaries with IDs, types (`node_start`, `node_complete`), which node, and `is_clean` flag. Clean checkpoints are safe resume points.
#### Step 3g: Compare checkpoints (optional)
To understand what changed between two points in execution:
This skips all nodes before the checkpoint and only re-runs the fixed node onward.
#### When to re-run from scratch
Re-run when ANY of these are true:
- The fix is to the entry node or an early node
- No checkpoints exist (e.g., agent was run via `run_tests`)
- The agent is fast (2-3 nodes, completes in seconds)
- You changed the graph structure (added/removed nodes/edges)
```bash
uv run hive run exports/{agent_name} --input '{"query": "test topic"}'
```
#### Inspecting a checkpoint before resuming
```python
get_agent_checkpoint(
agent_work_dir="~/.hive/agents/{agent_name}",
session_id="session_20260209_143022_abc12345",
checkpoint_id="cp_node_complete_research_143030"
)
```
Returns the full checkpoint: shared_memory snapshot, execution_path, current_node, next_node, is_clean.
#### Loop back to Phase 2
After resuming or re-running, check if the fix worked. If not, go back to Phase 3.
---
### Phase 6: Final Verification
Once the iterative fix loop converges (the agent produces correct output), run the full automated test suite:
```python
run_tests(
goal_id="your-goal-id",
agent_path="exports/{agent_name}"
)
```
All tests should pass. If not, repeat the loop for remaining failures.
---
## Credential Requirements
**CRITICAL: Testing requires ALL credentials the agent depends on.** This includes both the LLM API key AND any tool-specific credentials (HubSpot, Brave Search, etc.).
### Prerequisites
Before running agent tests, you MUST collect ALL required credentials from the user.
**Step 1: LLM API Key (always required)**
```bash
export ANTHROPIC_API_KEY="your-key-here"
```
**Step 2: Tool-specific credentials (depends on agent's tools)**
Inspect the agent's `mcp_servers.json` and tool configuration to determine which tools the agent uses, then check for all required credentials:
When the user asks to test an agent, **ALWAYS check for ALL credentials first**:
1. **Identify the agent's tools** from `mcp_servers.json`
2. **Check ALL required credentials** using `CredentialManager`
3. **Ask the user to provide any missing credentials** before proceeding
4. **Collect ALL missing credentials in a single prompt** — not one at a time
---
## Safe Test Patterns
### OutputCleaner
The framework automatically validates and cleans node outputs using a fast LLM at edge traversal time. Tests should still use safe patterns because OutputCleaner may not catch all issues.
old_string='system_prompt="Search for information on the user\'s topic."',
new_string='system_prompt="Search for information on the user\'s topic. You MUST find at least 5 diverse, authoritative sources. Use multiple different search queries to ensure source diversity. Do not stop searching until you have at least 5 distinct sources."'
)
```
### Phase 5: Resume from checkpoint
For this example, the fix is to the `research` node. If we had run via CLI with checkpointing, we could resume from the checkpoint after `intake` to skip re-running intake:
old_string='system_prompt="Search for information on the user\'s topic using web search."',
new_string='system_prompt="Search for information on the user\'s topic using web search. You MUST find at least 5 diverse, authoritative sources. Use multiple different search queries with varied keywords. Do NOT call set_output until you have gathered at least 5 distinct sources from different domains."'
)
```
---
## Phase 5: Recover & Resume (Iteration 1)
The fix is to the `research` node. Since this was a `run_tests` execution (no checkpoints), we re-run from scratch:
old_string='system_prompt="Write a comprehensive report based on the research findings."',
new_string='system_prompt="Write a comprehensive report based on the research findings. You MUST include numbered citations [1], [2], etc. for every factual claim. At the end, include a References section listing all sources with their URLs. Every claim must be traceable to a specific source."'
)
```
---
## Phase 5: Resume (Iteration 2)
The fix is to the `report` node (the last node). To demonstrate checkpoint recovery, run via CLI:
```bash
# Run via CLI to get checkpoints
uv run hive run exports/deep_research_agent --input '{"topic": "climate change effects"}'
# After it runs, find the clean checkpoint before report
```
description: Complete workflow for building, implementing, and testing goal-driven agents. Orchestrates hive-* skills. Use when starting a new agent project, unsure which skill to use, or need end-to-end guidance.
license: Apache-2.0
metadata:
author: hive
version: "2.0"
type: workflow-orchestrator
orchestrates:
- hive-concepts
- hive-create
- hive-patterns
- hive-test
- hive-credentials
- hive-debugger
---
# Agent Development Workflow
**THIS IS AN EXECUTABLE WORKFLOW. DO NOT explore the codebase or read source files. ROUTE to the correct skill IMMEDIATELY.**
When this skill is loaded, **ALWAYS use the AskUserQuestion tool** to present options:
```
Use AskUserQuestion with these options:
- "Build a new agent" → Then invoke /hive-create
- "Test an existing agent" → Then invoke /hive-test
- "Learn agent concepts" → Then invoke /hive-concepts
- "Optimize agent design" → Then invoke /hive-patterns
- "Set up credentials" → Then invoke /hive-credentials
- "Debug a failing agent" → Then invoke /hive-debugger
- "Other" (please describe what you want to achieve)
```
**DO NOT:** Read source files, explore the codebase, search for code, or do any investigation before routing. The sub-skills handle all of that.
---
Complete Standard Operating Procedure (SOP) for building production-ready goal-driven agents.
## Overview
This workflow orchestrates specialized skills to take you from initial concept to production-ready agent:
4. **Preserve working states** - Tag successful iterations
5. **Learn from failures** - Failed tests reveal design issues
## Exit Criteria
You're done with the workflow when:
✅ Agent structure validates
✅ All tests pass
✅ Success criteria met
✅ Constraints verified
✅ Documentation complete
✅ Agent ready for deployment
## Additional Resources
- **hive-concepts**: See `.claude/skills/hive-concepts/SKILL.md`
- **hive-create**: See `.claude/skills/hive-create/SKILL.md`
- **hive-patterns**: See `.claude/skills/hive-patterns/SKILL.md`
- **hive-test**: See `.claude/skills/hive-test/SKILL.md`
- **Agent framework docs**: See `core/README.md`
- **Example agents**: See `exports/` directory
## Summary
This workflow provides a proven path from concept to production-ready agent:
1. **Learn** with `/hive-concepts` → Understand fundamentals (optional)
2. **Build** with `/hive-create` → Get validated structure
3. **Optimize** with `/hive-patterns` → Apply best practices (optional)
4. **Configure** with `/hive-credentials` → Set up API keys (if needed)
5. **Test** with `/hive-test` → Get verified functionality
6. **Debug** with `/hive-debugger` → Fix runtime issues (if needed)
The workflow is **flexible** - skip phases as needed, iterate freely, and adapt to your specific requirements. The goal is **production-ready agents** built with **consistent, repeatable processes**.
## Skill Selection Guide
**Choose hive-concepts when:**
- First time building agents
- Need to understand event loop architecture
- Validating tool availability
- Learning about node types, edges, and judges
**Choose hive-create when:**
- Actually building an agent
- Have clear requirements
- Ready to write code
- Want step-by-step guidance
- Want to start from an existing template and customize it
## v0.7.1

v0.7.1 replaces Playwright with direct Chrome DevTools Protocol (CDP) integration. The GCU now launches the user's system Chrome via `open -n` on macOS, connects over CDP, and manages browser lifecycle end-to-end -- no extra browser binary required.
---
### Highlights
#### System Chrome via CDP
The entire GCU browser stack has been rewritten:
- **Chrome finder & launcher** -- New `chrome_finder.py` discovers installed Chrome and `chrome_launcher.py` manages process lifecycle with `--remote-debugging-port`
- **Coexist with user's browser** -- `open -n` on macOS launches a separate Chrome instance so the user's tabs stay untouched
- **Dynamic viewport sizing** -- Viewport auto-sizes to the available display area, suppressing Chrome warning bars
- **Orphan cleanup** -- Chrome processes are killed on GCU server shutdown to prevent leaks
- **`--no-startup-window`** -- Chrome launches headlessly by default until a page is needed
#### Per-Subagent Browser Isolation
Each GCU subagent gets its own Chrome user-data directory, preventing cookie/session cross-contamination:
- Unique browser profiles injected per subagent
- Profiles cleaned up after top-level GCU node execution
- Tab origin and age metadata tracked per subagent
#### Dummy Agent Testing Framework
A comprehensive test suite for validating agent graph patterns without LLM calls:
- 8 test modules covering echo, pipeline, branch, parallel merge, retry, feedback loop, worker, and GCU subagent patterns
- Shared fixtures and a `run_all.py` runner for CI integration
- Subagent lifecycle tests
---
### What's New
#### GCU Browser
- **Switch from Playwright to system Chrome via CDP** -- Direct CDP connection replaces Playwright dependency. (@bryanadenhq)
- **Chrome finder and launcher modules** -- `chrome_finder.py` and `chrome_launcher.py` for cross-platform Chrome discovery and process management. (@bryanadenhq)
- **Dynamic viewport sizing** -- Auto-size viewport and suppress Chrome warning bar. (@bryanadenhq)
- **Per-subagent browser profile isolation** -- Unique user-data directories per subagent with cleanup. (@bryanadenhq)
- **Tab origin/age metadata** -- Track which subagent opened each tab and when. (@bryanadenhq)
- **`browser_close_all` tool** -- Bulk tab cleanup for agents managing many pages. (@bryanadenhq)
- **Auto-track popup pages** -- Popups are automatically captured and tracked. (@bryanadenhq)
The Playwright dependency is no longer required for GCU browser operations. Chrome must be installed on the host system.
---
## v0.7.0
**Release Date:** March 5, 2026
**Tag:** v0.7.0
Session management refactor release.
---
## v0.5.1
**Release Date:** February 18, 2026
**Tag:** v0.5.1
### The Hive Gets a Brain
v0.5.1 is our most ambitious release yet. Hive agents can now **build other agents** -- the new Hive Coder meta-agent writes, tests, and fixes agent packages from natural language. The runtime grows multi-graph support so one session can orchestrate multiple agents simultaneously. The TUI gets a complete overhaul with an in-app agent picker, live streaming, and seamless escalation to the Coder. And we're now provider-agnostic: Claude Code subscriptions, OpenAI-compatible endpoints, and any LiteLLM-supported model work out of the box.
---
### Highlights
#### Hive Coder -- The Agent That Builds Agents
A native meta-agent that lives inside the framework at `core/framework/agents/hive_coder/`. Give it a natural-language specification and it produces a complete agent package -- goal definition, node prompts, edge routing, MCP tool wiring, tests, and all boilerplate files.
- **Test generation** -- structural tests for forever-alive agents that don't hang on `runner.run()`
#### Multi-Graph Agent Runtime
`AgentRuntime` now supports loading, managing, and switching between multiple agent graphs within a single session. Six new lifecycle tools give agents (and the TUI) full control:
The Hive Coder uses multi-graph internally -- when you escalate from a worker agent, the Coder loads as a separate graph while the worker stays alive in the background.
#### TUI Revamp
The Terminal UI gets a ground-up rebuild with five major additions:
- **PDF attachments** -- `/attach` and `/detach` commands with native OS file dialog (macOS, Linux, Windows)
| Tool | Description | Contributor |
| --- | --- | --- |
| **Google Docs** | Document creation, reading, and editing with OAuth credential support | @haliaeetusvocifer |
| **Gmail enhancements** | Expanded mail operations for inbox management | @bryanadenhq |
#### Infrastructure
- **Default node type → `event_loop`** -- `NodeSpec.node_type` defaults to `"event_loop"` instead of `"llm_tool_use"`. (@TimothyZhang7)
- **Default `max_node_visits` → 0 (unlimited)** -- Nodes default to unlimited visits, reducing friction for feedback loops and forever-alive agents. (@TimothyZhang7)
---
### Bug Fixes
- Flush WIP accumulator outputs on cancel/failure so edge conditions see correct values on resume
- Stall detection state preserved across resume (no more resets on checkpoint restore)
- Fix email agent version conflicts (@RichardTang-Aden)
- Fix coder tool timeouts (120s for tests, 300s cap for commands)
### Documentation
- Clarify installation and prevent root pip install misuse (@paarths-collab)
---
### Agent Updates
- **Email Inbox Management** -- Consolidate `gmail_inbox_guardian` and `inbox_management` into a single unified agent with updated prompts and config. (@RichardTang-Aden, @bryanadenhq)
- **Job Hunter** -- Updated node prompts, config, and agent metadata; added PDF resume selection. (@bryanadenhq)
---
### Breaking Changes
- **Deprecated node types raise `RuntimeError`** -- `llm_tool_use`, `llm_generate`, `function`, `router`, `human_input` now fail instead of warning. Migrate to `event_loop`.
- **`NodeSpec.node_type` defaults to `"event_loop"`** (was `"llm_tool_use"`)
---
### Community Contributors
A huge thank you to everyone who contributed to this release:
---
### Upgrading
```bash
git pull origin main
uv sync
```
#### Migration Guide
If your agents use deprecated node types, update them:
```bash
# Launch the Hive Coder to update them
hive code

# Or from TUI -- press Ctrl+E to escalate
hive tui
```
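In code, the migration is a one-line change per node. A sketch assuming `NodeSpec` is importable from the framework's graph module (exact import path assumed):

```python
from framework.graph import NodeSpec  # import path assumed

# Before -- now raises RuntimeError at load time:
# NodeSpec(name="summarize", node_type="llm_tool_use", max_node_visits=3)

# After -- "event_loop" is also the new default, so node_type can be omitted:
node = NodeSpec(name="summarize", node_type="event_loop")

# max_node_visits now defaults to 0 (unlimited); set it only to cap revisits:
capped = NodeSpec(name="review", node_type="event_loop", max_node_visits=3)
```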
---
### What's Next
- **Agent-to-agent communication** -- one agent's output triggers another agent's entry point
- **Cost visibility** -- detailed runtime log of LLM costs per node and per session
- **Persistent webhook subscriptions** -- survive agent restarts without re-registering
- **Remote agent deployment** -- run agents as long-lived services with HTTP APIs
<p align="center"><em>The agent harness for production workloads — state management, failure recovery, observability, and human oversight so your agents actually run.</em></p>
## Overview
Build autonomous, reliable, self-improving AI agents without hardcoding workflows. Define your goal through conversation with a coding agent, and the framework generates a node graph with dynamically created connection code. When things break, the framework captures failure data, evolves the agent through the coding agent, and redeploys. Built-in human-in-the-loop nodes, credential management, and real-time monitoring give you control without sacrificing adaptability.
OpenHive is a zero-setup, model-agnostic execution harness that dynamically generates multi-agent topologies to tackle complex, long-running business workflows without requiring any orchestration boilerplate. By simply defining your objective, the runtime compiles a strict, graph-based execution DAG that safely coordinates specialized agents to execute concurrent tasks in parallel. Backed by persistent, role-based memory that intelligently evolves with your project's context, OpenHive ensures deterministic fault tolerance, deep state observability, and seamless asynchronous execution across whichever underlying LLMs you choose to plug in.
## Features
- ✅ Multi-Agent Coordination for parallel task execution
- ✅ Graph-based execution for recurring and complex processes
- ✅ Role-based memory that evolves with your projects
- ✅ Zero Setup - No technical configuration required
- ✅ General Compute Use and Browser Use with Native Extension
- ✅ Custom Model Support
Visit [adenhq.com](https://adenhq.com) for complete documentation, examples, and guides.
Visit [HoneyComb](http://honeycomb.open-hive.com/) to see what jobs are being automated by AI. It's a stock market for jobs, driven by our community's AI agent progress. You can long and short jobs (with no real money, just compute tokens) based on how much you think a job will be replaced by AI.
Hive is designed for developers and teams who want to build **production-grade AI agents** without manually wiring complex workflows.
Hive is the multi-agent harness layer for teams moving AI agents from prototype to production. Single agents like Openclaw and Cowork handle personal tasks well but lack the rigor to run business processes.
Hive is a good fit if you:
- Want AI agents that **execute real business processes**, not demos
- Prefer **goal-driven development** over hardcoded workflows
- Need a **runtime that handles state, recovery, and parallel execution** at scale
- Need **self-healing and adaptive agents** that improve over time
- Require **human-in-the-loop control**, observability, and cost limits
- Plan to run agents in **production** where uptime, cost, and auditability matter
Hive may not be the best fit if you’re only experimenting with simple agent chains or one-off scripts.
## When Should You Use Hive?
Use Hive when the bottleneck is no longer the model but the harness around it:
- Long-running agents that need **state persistence and crash recovery**
- Production workloads requiring **cost enforcement, observability, and audit trails**
- Agents that **self-heal** through failure capture and graph evolution
- Multi-agent coordination with **session isolation and shared buffers**
- A framework that **scales with model improvements** rather than fighting them
## Quick Links
- **[Self-Hosting Guide](https://docs.adenhq.com/getting-started/quickstart)** - Deploy Hive on your infrastructure
- **[Changelog](https://github.com/aden-hive/hive/releases)** - Latest updates and releases
- **[Roadmap](docs/roadmap.md)** - Upcoming features and plans
- **[Report Issues](https://github.com/aden-hive/hive/issues)** - Bug reports and feature requests
- **[Contributing](CONTRIBUTING.md)** - How to contribute and submit PRs
## Quick Start
### Prerequisites
- Python 3.11+ for agent development
- Claude Code, Codex CLI, or Cursor for utilizing agent skills
- An LLM provider that powers the agents
- **ripgrep (optional, recommended on Windows):** The `search_files` tool uses ripgrep for faster file search. If not installed, a Python fallback is used. On Windows: `winget install BurntSushi.ripgrep` or `scoop install ripgrep`
> **Windows Users:** Native Windows is supported via `quickstart.ps1` and `hive.ps1`. Run these in PowerShell 5.1+. WSL is also an option but not required.
### Installation
```bash
git clone https://github.com/aden-hive/hive.git
cd hive
# Run quickstart setup (macOS/Linux)
./quickstart.sh
# Windows (PowerShell)
.\quickstart.ps1
```
This sets up:
- **framework** - Core agent runtime and graph executor (in `core/.venv`)
- **aden_tools** - MCP tools for agent capabilities (in `tools/.venv`)
- **credential store** - Encrypted API key storage (`~/.hive/credentials`)
- **LLM provider** - Interactive default model configuration, including Hive LLM and OpenRouter
- All required Python dependencies with `uv`
- Finally, it will open the Hive interface in your browser
> **Tip:** To reopen the dashboard later, run `hive open` from the project directory.
### Build Your First Agent
```bash
# Build an agent using Claude Code
claude> /hive
```

Type the agent you want to build in the home input box. The queen is going to ask you questions and work out a solution with you.

```bash
# (at a separate terminal) Launch the interactive dashboard
hive tui

# Or run directly
hive run exports/your_agent_name --input '{"key": "value"}'
```

### Use Template Agents

Click "Try a sample agent" and check the templates. You can run a template directly or choose to build your version on top of the existing template.
### Run Agents

Now you can run an agent by selecting it (either an existing agent or an example agent). Click the Run button at the top left, or talk to the queen agent and it can run the agent for you.

## Coding Agent Support

### Codex CLI

Hive includes native support for [OpenAI Codex CLI](https://github.com/openai/codex) (v0.101.0+).
1. **Config:** `.codex/config.toml` with `agent-builder` MCP server (tracked in git)
2. **Skills:** `.agents/skills/` symlinks to Hive skills (tracked in git)
3. **Launch:** Run `codex` in the repo root, then type `use hive`
Example:
```
codex> use hive
```
### Opencode
Hive includes native support for [Opencode](https://github.com/opencode-ai/opencode).
1. **Setup:** Run the quickstart script
2. **Launch:** Open Opencode in the project root.
3. **Activate:** Type `/hive` in the chat to switch to the Hive Agent.
4. **Verify:** Ask the agent _"List your tools"_ to confirm the connection.
The agent has access to all Hive skills and can scaffold agents, add tools, and debug workflows directly from the chat.
**[📖 Complete Setup Guide](docs/environment-setup.md)** - Detailed instructions for agent development
### Antigravity IDE Support
Skills and MCP servers are also available in [Antigravity IDE](https://antigravity.google/) (Google's AI-powered IDE). **Easiest:** open a terminal in the hive repo folder and run (use `./` — the script is inside the repo):
```bash
./scripts/setup-antigravity-mcp.sh
```
**Important:** Always restart/refresh Antigravity IDE after running the setup script—MCP servers only load on startup. After restart, **agent-builder** and **tools** MCP servers should connect. Skills are under `.agent/skills/` (symlinks to `.claude/skills/`). See [docs/antigravity-setup.md](docs/antigravity-setup.md) for manual setup and troubleshooting.
## Features
- **[Goal-Driven Development](docs/key_concepts/goals_outcome.md)** - Define objectives in natural language; the coding agent generates the agent graph and connection code to achieve them
- **[Adaptiveness](docs/key_concepts/evolution.md)** - Framework captures failures, calibrates according to the objectives, and evolves the agent graph
- **[Dynamic Node Connections](docs/key_concepts/graph.md)** - No predefined edges; connection code is generated by any capable LLM based on your goals
- **SDK-Wrapped Nodes** - Every node gets shared memory, local RLM memory, monitoring, tools, and LLM access out of the box
- **[Human-in-the-Loop](docs/key_concepts/graph.md#human-in-the-loop)** - Intervention nodes that pause execution for human input with configurable timeouts and escalation
- **Real-time Observability** - WebSocket streaming for live monitoring of agent execution, decisions, and node-to-node communication
- **Interactive TUI Dashboard** - Terminal-based dashboard with live graph view, event log, and chat interface for agent interaction
- **Cost & Budget Control** - Set spending limits, throttles, and automatic model degradation policies
- **Production-Ready** - Self-hostable, built for scale and reliability
Hive is built to be model-agnostic and system-agnostic.
- **LLM flexibility** - Hive Framework supports Anthropic, OpenAI, OpenRouter, Hive LLM, and other hosted or local models through LiteLLM-compatible providers (see the sketch after this list).
- **Business system connectivity** - Hive Framework is designed to connect to all kinds of business systems as tools, such as CRM, support, messaging, data, file, and internal APIs via MCP.
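To make "LiteLLM-compatible" concrete, here is the underlying LiteLLM convention Hive builds on; Hive's own configuration surface is documented in docs/configuration.md, and the model names below are only examples:

```python
import litellm

# The call shape stays the same across providers; only the model string
# and the API key environment variable change.
for model in (
    "anthropic/claude-3-5-sonnet-20241022",  # needs ANTHROPIC_API_KEY
    "openai/gpt-4o",                         # needs OPENAI_API_KEY
    "ollama/llama3",                         # local model, no key required
):
    response = litellm.completion(
        model=model,
        messages=[{"role": "user", "content": "Say hi in five words."}],
    )
    print(model, "->", response.choices[0].message.content)
```

Because the provider is encoded in the model string, switching providers should be a configuration change rather than a code change.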
## Why Hive
As models improve, the upper bound of what agents can do rises — but their reliability and production value are determined by the harness. Hive focuses on generating agents that run real business processes rather than generic agents. Instead of requiring you to manually design workflows, define agent interactions, and handle failures reactively, Hive flips the paradigm: **you describe outcomes, and the system builds itself**—delivering an outcome-driven, adaptive experience with an easy-to-use set of tools and integrations.
- [Configuration Guide](docs/configuration.md) - All configuration options
- [Architecture Overview](docs/architecture/README.md) - System design and structure
## Roadmap
Aden Hive Agent Framework aims to help developers build outcome-oriented, self-adaptive agents. See [roadmap.md](docs/roadmap.md) for details.
```mermaid
flowchart TB
%% Main Entity
User([User])
%% =========================================
%% EXTERNAL EVENT SOURCES
%% =========================================
subgraph ExtEventSource [External Event Source]
E_Sch["Schedulers"]
E_WH["Webhook"]
E_SSE["SSE"]
end
%% =========================================
%% SYSTEM NODES
%% =========================================
subgraph WorkerBees [Worker Bees]
WB_C["Conversation"]
WB_SP["System prompt"]
subgraph Graph [Graph]
direction TB
N1["Node"] --> N2["Node"] --> N3["Node"]
N1 -.-> AN["Active Node"]
N2 -.-> AN
N3 -.-> AN
%% Nested Event Loop Node
subgraph EventLoopNode [Event Loop Node]
ELN_L["listener"]
ELN_SP["System Prompt<br/>(Task)"]
ELN_EL["Event loop"]
ELN_C["Conversation"]
end
end
end
subgraph JudgeNode [Judge]
J_C["Criteria"]
J_P["Principles"]
J_EL["Event loop"] <--> J_S["Scheduler"]
end
subgraph QueenBee [Queen Bee]
QB_SP["System prompt"]
QB_EL["Event loop"]
QB_C["Conversation"]
end
subgraph Infra [Infra]
SA["Sub Agent"]
TR["Tool Registry"]
WTM["Write through Conversation Memory<br/>(Logs/RAM/Harddrive)"]
SM["Shared Memory<br/>(State/Harddrive)"]
EB["Event Bus<br/>(RAM)"]
CS["Credential Store<br/>(Harddrive/Cloud)"]
end
subgraph PC [PC]
B["Browser"]
CB["Codebase<br/>v 0.0.x ... v n.n.n"]
end
%% =========================================
%% CONNECTIONS & DATA FLOW
%% =========================================
%% External Event Routing
E_Sch --> ELN_L
E_WH --> ELN_L
E_SSE --> ELN_L
ELN_L -->|"triggers"| ELN_EL
%% User Interactions
User -->|"Talk"| WB_C
User -->|"Talk"| QB_C
User -->|"Read/Write Access"| CS
%% Inter-System Logic
ELN_C <-->|"Mirror"| WB_C
WB_C -->|"Focus"| AN
WorkerBees -->|"Inquire"| JudgeNode
JudgeNode -->|"Approve"| WorkerBees
%% Judge Alignments
J_C <-.->|"aligns"| WB_SP
J_P <-.->|"aligns"| QB_SP
%% Escalate path
J_EL -->|"Report (Escalate)"| QB_EL
%% Pub/Sub Logic
AN -->|"publish"| EB
EB -->|"subscribe"| QB_C
%% Infra and Process Spawning
ELN_EL -->|"Spawn"| SA
SA -->|"Inform"| ELN_EL
SA -->|"Starts"| B
B -->|"Report"| ELN_EL
TR -->|"Assigned"| ELN_EL
CB -->|"Modify Worker Bee"| WB_C
%% =========================================
%% SHARED MEMORY & LOGS ACCESS
%% =========================================
%% Worker Bees Access (link to node inside Graph subgraph)
AN <-->|"Read/Write"| WTM
AN <-->|"Read/Write"| SM
%% Queen Bee Access
QB_C <-->|"Read/Write"| WTM
QB_EL <-->|"Read/Write"| SM
%% Credentials Access
CS -->|"Read Access"| QB_C
```
## Contributing
We welcome contributions from the community! We’re especially looking for help building tools, integrations, and example agents for the framework ([check #2805](https://github.com/aden-hive/hive/issues/2805)). If you’re interested in extending its functionality, this is the perfect place to start. Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details.
**Q: What LLM providers does Hive support?**
Hive supports 100+ LLM providers through LiteLLM integration, including OpenAI (GPT-4, GPT-4o), Anthropic (Claude models), Google Gemini, DeepSeek, Mistral, Groq, OpenRouter, and Hive LLM. Simply set the appropriate API key environment variable and specify the model name. See [docs/configuration.md](docs/configuration.md) for provider-specific configuration examples.
**Q: Can I use Hive with local AI models like Ollama?**
Yes! Hive supports local models through LiteLLM. Simply use the model name format LiteLLM expects for your provider (e.g., `ollama/llama3` for Ollama).
**Q: What makes Hive different from other agent frameworks?**
Hive is an agent harness, not just an orchestration framework. It provides the production runtime layer — session isolation, checkpoint-based crash recovery, cost enforcement, real-time observability, and human-in-the-loop controls — that makes agents reliable enough to run real workloads. On top of that, Hive generates your entire agent system from natural language goals and automatically [evolves the graph](docs/key_concepts/evolution.md) when agents fail. The combination of a robust harness with self-improving generation is what sets Hive apart.
**Q: Is Hive open-source?**
Yes, Hive is fully open-source under the Apache License 2.0. We actively encourage community contributions and collaboration.
**Q: Can Hive handle complex, production-scale use cases?**
Yes. Hive is explicitly designed for production environments with features like automatic failure recovery, real-time observability, cost controls, and horizontal scaling support. The framework handles both simple automations and complex multi-agent workflows.
**Q: Does Hive support human-in-the-loop workflows?**
Yes, Hive fully supports [human-in-the-loop](docs/key_concepts/graph.md#human-in-the-loop) workflows through intervention nodes that pause execution for human input. These include configurable timeouts and escalation policies, allowing seamless collaboration between human experts and AI agents.
Visit [docs.adenhq.com](https://docs.adenhq.com/) for complete guides, API references, and examples.
Contributions are welcome! Fork the repository, create your feature branch, implement your changes, and submit a pull request. See [CONTRIBUTING.md](CONTRIBUTING.md) for detailed guidelines.
**Q: When will my team start seeing results from Aden's adaptive agents?**

Aden's adaptation loop begins working from the first execution. When an agent fails, the framework captures the failure data, helping developers evolve the agent graph through the coding agent. How quickly this translates to measurable results depends on the complexity of your use case, the quality of your goal definitions, and the volume of executions generating feedback.

**Q: How does Hive compare to other agent frameworks?**

Hive focuses on generating agents that run real business processes, rather than generic agents. This vision emphasizes outcome-driven design, adaptability, and an easy-to-use set of tools and integrations.

## Star History
> **Note:** The standalone `agent-builder` MCP server (`framework.mcp.agent_builder_server`) has been replaced. Agent building is now done via the `coder-tools` server's `initialize_and_build_agent` tool, with underlying logic in `tools/coder_tools_server.py`.
This guide covers the MCP tools available for building goal-driven agents.
## Setup
### Quick Setup
```bash
# Run the quickstart script (recommended)
./quickstart.sh
```
### Manual Configuration
Add the MCP servers to your MCP client configuration (e.g., Claude Desktop).

The framework provides a runtime that captures **decisions**, not just actions. Install it in editable mode:

```bash
uv pip install -e .
```
## Agent Building
Agent scaffolding is handled by the `coder-tools` MCP server (in `tools/coder_tools_server.py`), which provides the `initialize_and_build_agent` tool and related utilities. The package generation logic lives directly in `tools/coder_tools_server.py`.
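A hedged sketch of calling that tool directly with the MCP Python SDK; the launch command and the argument name are assumptions (mirror whatever your `.mcp.json` registers for coder-tools), and in normal use your coding agent invokes the tool for you:

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def build_agent() -> None:
    # Launch command assumed; match the coder-tools entry in .mcp.json.
    server = StdioServerParameters(
        command="uv", args=["run", "python", "tools/coder_tools_server.py"]
    )
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(
                "initialize_and_build_agent",
                arguments={"goal": "Triage my inbox and draft replies"},  # argument name assumed
            )
            print(result.content)

asyncio.run(build_agent())
```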