Compare commits

...

1054 Commits

Author SHA1 Message Date
Timothy 3963855d1d fix: isolate session loading 2026-02-24 11:02:58 -08:00
bryan 28a71b70a8 readme for http apis 2026-02-24 09:22:56 -08:00
bryan 33d3a13fde Merge branch 'feature/concurrent-judge-runtime' into feat/open-hive 2026-02-24 09:11:42 -08:00
bryan 5ea278a08d integrated queen, worker, judge 2026-02-24 09:09:28 -08:00
Timothy fd95f8da28 feat: active streams and waiting nodes 2026-02-24 09:03:21 -08:00
bryan c1d5952ad9 Merge branch 'feature/concurrent-judge-runtime' into feat/open-hive 2026-02-24 08:07:31 -08:00
bryan 72673e12fb remove mock data 2026-02-24 08:02:08 -08:00
Timothy 3867d3926b Merge branch 'main' into feature/concurrent-judge-runtime 2026-02-24 07:43:22 -08:00
Timothy 0b2b7a2622 feat: event bus logging 2026-02-24 07:43:05 -08:00
bryan 3951ee1a7d Merge branch 'main' into feat/open-hive 2026-02-24 07:28:42 -08:00
bryan 1afde51c7b additional graph update 2026-02-24 07:28:11 -08:00
bryan cbeef18f0a wip graph 2026-02-24 07:27:48 -08:00
Uttkarsh Joshi 1947d8c3ca Fix asyncio.run crash in GraphBuilder and enhance ToolRegistry type inference (fixes: #2680) (#2895)
* Enhance ToolRegistry type inference for function parameters

- Add _infer_schema() helper to handle Union types (Union[T, U] and T | U)
- Support Optional[T] and Union[T, None] with correct optional flag
- Infer generic types: list[T] -> array with items schema, dict[K, V] -> object with additionalProperties
- Detect Pydantic BaseModel parameters and use model_json_schema()
- Correctly mark parameters as required/optional based on type annotations
- Add comprehensive test suite covering all type inference scenarios
- Maintain backward compatibility for unannotated parameters

* Fix asyncio.run crash in GraphBuilder.run_test

* Revert "Enhance ToolRegistry type inference for function parameters"

This reverts commit dacd0fa8b926e01d3f29e7c9b2ff5101b4a52c3b.
2026-02-24 17:06:11 +08:00
austin931114 55c63736ef Merge pull request #5315 from sabasiddique1/fix/roadmap-mermaid-diagram-render
docs: fix Roadmap Mermaid diagram not rendering on GitHub
2026-02-24 10:05:48 +01:00
austin931114 a2b68d893f Merge pull request #5317 from kart1ka/fix/retired-haiku-3.5-model
micro-fix: replace retired claude-3-5-haiku-20241022 with claude-haiku-4-5
2026-02-24 09:50:17 +01:00
Kartik Saini fd06e43d9c Merge branch 'main' into fix/retired-haiku-3.5-model 2026-02-24 11:36:18 +05:30
Kartik Saini b550f6efa0 fix(llm): replace retired claude-3-5-haiku-20241022 with claude-haiku-4-5-20251001 2026-02-24 11:22:40 +05:30
Saba Siddique 47adf88773 docs: fix Roadmap Mermaid diagram for GitHub rendering 2026-02-24 10:36:35 +05:00
RichardTang-Aden 8748da38cf Merge pull request #5310 from RichardTang-Aden/fix/llm-token-source
Feat: tui workflow improvement and the fix for quickstart  problem for GLM
2026-02-23 19:23:59 -08:00
Timothy f697dc99fb feat: queen primitives 2026-02-23 19:15:55 -08:00
bryan ecb038c955 chat now creates multiple chats msgs 2026-02-23 19:07:54 -08:00
Richard Tang 77ff31cec6 feat: add back the quickstart prompt to restart terminal 2026-02-23 18:33:33 -08:00
Richard Tang 5ea8677a5d feat: tui get started menu 2026-02-23 18:06:59 -08:00
Richard Tang 97f5b3423f feat: source the llm token after quickstart 2026-02-23 17:50:53 -08:00
Timothy @aden 4968207eef Merge pull request #5276 from TimothyZhang7/fix/identity-persistence
Fix/identity persistence
2026-02-23 17:47:53 -08:00
Timothy @aden f859e2203a Merge branch 'main' into fix/identity-persistence 2026-02-23 17:45:23 -08:00
Bryan @ Aden fb3dad4354 Merge pull request #5231 from vakrahul/fix/local-llm-keyless-crash
fix(core): support local LLMs (Ollama, vLLM, LM Studio, Llama.cpp) in AgentRunner #3994
2026-02-24 01:42:53 +00:00
Timothy adc82c6a65 fix: lint issue 2026-02-23 17:42:36 -08:00
bryan 96084fea16 wip chat 2026-02-23 17:41:12 -08:00
Timothy @aden 6f52026c84 Merge branch 'main' into fix/identity-persistence 2026-02-23 17:35:44 -08:00
Timothy @aden 3576218ea9 Merge pull request #5270 from aden-hive/feature/local-credential-namespace
feat: local credential testing
2026-02-23 17:32:34 -08:00
vakrahul 4c662db530 fix: add missing accounts_prompt to add_graph in AgentRuntime 2026-02-24 07:01:48 +05:30
Timothy da1ce4e5a7 fix: lint 2026-02-23 17:30:27 -08:00
vakrahul c4944c5662 fix: pass accounts_prompt to ExecutionStream in add_graph and GraphExecutor 2026-02-24 06:31:56 +05:30
Bryan @ Aden d892f87651 Merge pull request #1814 from nafiyad/feat/wikipedia-search-tool
Feat/wikipedia search tool
2026-02-24 00:34:54 +00:00
Nafiyad Adane 447f23d157 style: run ruff check and format on tools/ 2026-02-23 17:17:58 -07:00
Nafiyad Adane aa12f0d295 Merge main into feat/wikipedia-search-tool 2026-02-23 17:12:31 -07:00
bryan de9226aae0 credentials 2026-02-23 14:11:16 -08:00
Timothy 16e1ab1a87 feat: concurrent judge session 2026-02-23 13:56:59 -08:00
Bryan @ Aden 54287e06ad Merge pull request #4519 from Rudra2637/clarify-criterion-evaluation
Clarify supported criterion evaluation and progress semantics
2026-02-23 20:39:48 +00:00
Rudra2637 b33de5f0e1 Fix lint and formatting issues 2026-02-24 01:49:45 +05:30
Rudra2637 2d5ef20d4d Restore comment explaining 0.8 threshold 2026-02-24 01:13:52 +05:30
Rudra2637 177346b159 Fix docstring indentation 2026-02-24 01:09:40 +05:30
bryan 08819b1609 Merge branch 'main' into feat/open-hive 2026-02-23 11:13:32 -08:00
Rudra2637 35b1332551 Add type field to SuccessCriterion and restore evaluation guard 2026-02-24 00:29:31 +05:30
Bryan @ Aden 52586a024b Merge pull request #5273 from aden-hive/chore/add-community-cred
(micro-fix): add community credit for competitive intelligence agent
2026-02-23 18:53:45 +00:00
bryan 05a314b121 add community credit for competitive intelligence agent 2026-02-23 10:42:13 -08:00
Bryan @ Aden 8e262e2270 Merge pull request #5179 from nafiyad/feature/competitive-intelligence-agent-4153
Add competitive intelligence agent template
2026-02-23 18:20:02 +00:00
Timothy 733bb4d2dd fix: get all account info including local apis 2026-02-23 10:09:22 -08:00
vakrahul ba31c760a6 fix: restore accounts_prompt propagation chain to ExecutionStream 2026-02-23 23:31:08 +05:30
Timothy a388bc6837 feat: local credential testing 2026-02-23 09:55:38 -08:00
Timothy 3f5bbbf1e3 feat: implementation of concurrent judge 2026-02-23 09:52:11 -08:00
Emmanuel Nwanguma 002da15375 docs(tools): add README for security tools (#5164)
* docs(tools): add README + comprehensive tests for security tools

READMEs added for 7 security scanning tools:
- port_scanner: TCP connect scans, banner grabbing, risky port detection
- ssl_tls_scanner: TLS version, cipher, certificate analysis
- http_headers_scanner: OWASP security headers validation
- dns_security_scanner: SPF, DMARC, DKIM, DNSSEC, zone transfer
- subdomain_enumerator: Passive CT log subdomain discovery
- tech_stack_detector: Web technology fingerprinting
- risk_scorer: Weighted letter-grade risk scoring

Comprehensive unit tests (92 total):
- Port scanner: constants, port categories, _check_port async tests
- SSL/TLS scanner: weak ciphers, TLS versions, cert parsing helpers
- HTTP headers scanner: security headers, leaky headers validation
- DNS security scanner: SPF/DMARC/DKIM/DNSSEC checks
- Subdomain enumerator: keyword detection, severity levels
- Tech stack detector: cookies, CDN, CMS, framework detection
- Risk scorer: grading logic, category scoring, JSON parsing

Fixes #5094

* revert: remove test changes per review feedback
2026-02-23 21:46:51 +08:00
Shubham Yadav 005609da3a Feat/PostgreSQL (Read-Only MCP) (#4160)
* feat(tools): add read-only PostgreSQL MCP tool

* test(tools): add postgres tool tests

* docs(tools): add postgres tool README

* feat(tools): update PostgreSQL MCP tool with refactored code structure and adding postgres credentials

* feat(tools): implement thread-safe connection pooling for PostgreSQL MCP tool

* fix(postgres): correct psycopg2 dependency and README setup instructions

---------

Co-authored-by: hundao <alchemy_wimp@hotmail.com>
2026-02-23 21:17:10 +08:00
Youssef Mohammed Abdelal Mohammed 182d9ca6f9 feat(tools): add arXiv search and download tools (#5222)
* feat(arxiv): implement search_papers and initial download_paper tools

* feat(arxiv): improve PDF download handling with temp files and validation (WIP)

Switch to NamedTemporaryFile for safer temp file handling

Force export.arxiv.org domain for PDF downloads

Add custom User-Agent header

Validate Content-Type to ensure PDF response

Improve error handling and cleanup logic

Add timeout to requests

Work in progress – download_paper still under refinement.

* feat(arxiv): replace NamedTemporaryFile with module-level TemporaryDirectory

Switch from NamedTemporaryFile(delete=False) to a shared _TEMP_DIR for
the lifetime of the server process. Scopes file lifetime to the session,
guarantees cleanup via atexit, and removes the need for manual file
handle management.

Expand README with full args/returns/error reference and implementation
notes explaining the temp storage design decision.

* test(arxiv): add comprehensive tests for search_papers and download_paper

fix(arxiv): return structured error instead of raising on invalid PDF content type

- Add full test coverage for search_papers (validation, success, id_list, errors)
- Add full test coverage for download_paper (success, network errors, invalid content, cleanup)
- Mock arxiv client and requests to isolate behavior
- Ensure partial files are cleaned up on failure
- Align download_paper behavior with tool contract (no exceptions, structured responses)

* style(tools): apply ruff formatting to arxiv tool and update lockfile
2026-02-23 20:57:04 +08:00
vakrahul a6b43f8016 fix: address PR review feedback (accounts_prompt, tests, and remove markdowns) 2026-02-23 17:38:22 +05:30
vakrahul 31700fa8da fix: address PR review feedback (accounts_prompt, tests, and remove markdown) 2026-02-23 17:38:04 +05:30
Rudra2637 6b475ec1cf Removed invalid type guard and clean up comments 2026-02-23 11:06:10 +05:30
Timothy 1b27844c52 feat: local credential testing 2026-02-22 20:58:42 -08:00
RichardTang-Aden 3a0b91f7ab Merge pull request #5251 from vincentjiang777/main
docs: roadmap updates for architecure v3
2026-02-22 20:48:43 -08:00
RichardTang-Aden 82108e32fa Merge branch 'main' into main 2026-02-22 20:48:10 -08:00
Timothy 28f4fecfb3 feat: handle account identity systematically 2026-02-22 20:45:36 -08:00
Vincent Jiang ff1bb08217 docs: roadmap updates for architecure v3 2026-02-22 20:41:26 -08:00
Nafiyad Adane 10617fee0d chore(templates): export agent.json configuration
Generated the agent.json fallback configuration using the agent-builder MCP server export functionality as requested by the reviewer.
2026-02-22 21:12:08 -07:00
Bryan @ Aden 866103ddf4 Merge pull request #5212 from JamieJiHeonKim/docs/fix-readme-formatting-and-links
Docs/fix readme formatting and links
2026-02-23 03:51:32 +00:00
bryan fcfaca6bd0 Merge branch 'main' into feat/open-hive 2026-02-22 19:50:39 -08:00
bryan 4c7d9ab0fb added click cursor and rename dashboard to workspace 2026-02-22 19:21:37 -08:00
bryan 061aec4b3d my agents configured 2026-02-22 19:04:48 -08:00
Bryan @ Aden f12ab10725 Merge pull request #4930 from Ttian18/fix/tina/shift-enter-newline-4565
fix(tui): add Ctrl+J as newline fallback in chat input
2026-02-23 02:31:47 +00:00
Bryan @ Aden 0882fa6ce5 Merge pull request #5165 from ishaannk/main
feat: add stop/cancel execution control for agents
2026-02-23 02:24:46 +00:00
RichardTang-Aden 0b87e4c45d Merge pull request #5245 from TimothyZhang7/main
Release / Create Release (push) Waiting to run
doc(architecture): update documents
2026-02-22 18:04:51 -08:00
Timothy 9c7e846828 chore: put event loop node zoom inside worker bee graph 2026-02-22 18:03:31 -08:00
bryan 30bd0e483a home page and mock chatroom 2026-02-22 18:03:02 -08:00
Timothy 13cc93c334 chore: architecture 2026-02-22 17:54:12 -08:00
Timothy 564b1bb752 chore: roadmap diagram 2026-02-22 17:45:56 -08:00
bryan 2f31a92d31 Merge branch 'main' into feat/open-hive 2026-02-22 16:06:44 -08:00
ishaannk fd89c7f56f Fix: Add trailing newlines for ruff format compliance 2026-02-23 04:41:38 +05:30
bryan 35738c8279 react structure 2026-02-22 14:52:15 -08:00
vakrahul a0d14b8a25 fix(core): add zero-config local LLM support and fix AgentRunner crash (#3994) and adding docs 2026-02-22 22:59:11 +05:30
Timothy @aden 9c781ed78e Merge pull request #5224 from TimothyZhang7/feature/credential-v2
fix(micro-fix): tui select account
2026-02-21 23:13:18 -08:00
Timothy 460a24e34a fix: tui select account 2026-02-21 23:01:41 -08:00
JamieJiHeonKim 8ae030e16e docs: add link to linting and formatting setup in CONTRIBUTING.md 2026-02-21 13:00:46 -05:00
JamieJiHeonKim 3c6467c814 docs: fix unclosed code block in deep_research_agent README 2026-02-21 12:54:45 -05:00
ishank 2f11f0c911 Merge branch 'aden-hive:main' into main 2026-02-21 17:29:51 +05:30
ishaannk c3ae67fb1d address review comments: rename Stop to Pause and UI toggle change 2026-02-21 17:08:42 +05:30
Timothy @aden 8c750c7edd Merge pull request #5194 from Antiarin/fix/escalate-to-coder-execution-id
fix[bug](graph): add execution_id to base Runtime and restore ctx.execution_id in escalation handler
2026-02-21 03:05:14 -08:00
Antiarin 571838a289 fix(graph): add execution_id to NodeContext for escalate_to_coder 2026-02-21 16:20:04 +05:30
RichardTang-Aden dafaaae792 Merge pull request #5182 from TimothyZhang7/feature/credential-v2
Feature/credential v2
2026-02-20 19:52:00 -08:00
Timothy b45e14efb4 fix: zai api key setup 2026-02-20 19:40:07 -08:00
RichardTang-Aden e70cbf26e2 Merge pull request #5095 from NSkogstad-AUS/docs/update-tools-readme
Docs/update tools readme
2026-02-20 19:08:23 -08:00
RichardTang-Aden daafdc3704 Merge pull request #5103 from alhousseynou-ndiaye/feature/document-agent
docs: add document processing recipe
2026-02-20 19:06:59 -08:00
bryan 6661934fed harden server apis and agent loading 2026-02-20 18:28:52 -08:00
Nafiyad Adane f568728de1 Add competitive intelligence agent template
- Adds a new autonomous agent template that monitors competitor websites, news, and GitHub
- Implements a 7-node graph workflow to collect, aggregate, and analyze competitive data
- Generates a weekly structured HTML digest with key highlights and 30-day trends
- Utilizes existing web_scrape, web_search, and github MCP tools
- Addresses issue #4153

Closes #4153
2026-02-20 19:13:47 -07:00
bryan 263d35bbd6 Merge branch 'main' into feat/open-hive 2026-02-20 18:09:01 -08:00
Bryan @ Aden bece21d217 Merge pull request #5169 from Schlaflied/docs/sync-zh-CN-readme
docs(i18n): sync zh-CN.md with latest README and fix broken links
2026-02-21 02:06:48 +00:00
bryan d4788e147a backend apis for open hive 2026-02-20 18:01:51 -08:00
Timothy f4594ecf37 fix: gmail batch tool schema coercion 2026-02-20 17:53:35 -08:00
Bryan @ Aden 8f1462cb79 Merge pull request #5113 from vakrahul/features/stripe-tools
feat(tools): add Stripe payment processing integration
2026-02-21 01:44:23 +00:00
Timothy 76d4d0de69 feat: credential v2 with provider loading and test agent 2026-02-20 17:43:00 -08:00
vakrahul 6ab4e1d641 fix: address maintainer PR reviews feedback for Stripe 2026-02-21 07:06:20 +05:30
vakrahul c5d87c99fd fix: address maintainer PR review feedback for Stripe 2026-02-21 06:30:31 +05:30
Schlaflied f53f403022 docs(i18n): sync zh-CN.md with latest README and fix broken links 2026-02-20 19:29:45 -05:00
Timothy b887b2951e wip: credential v2 2026-02-20 13:55:06 -08:00
ishaannk 842b69b155 feat: add stop/cancel execution control for agents 2026-02-21 02:03:29 +05:30
Nicolas Suescun d6c34106fc docs: fix CLI arguments mismatch for test-debug and test-list (#4113)
* docs: fix CLI usage args for test-debug/test-list to match implementation

* docs: restore 'uv run' prefix to test commands

Reverts unintentional removal of 'uv run' in usage examples as requested in code review.

* chore: changes to .gitignore
2026-02-20 17:58:17 +08:00
Nihal 67cbd31280 fix(graph): harden JSON parsing for async safety and large LLM outputs (#4869)
* perf(json): add json.loads fast path + asyncio.to_thread for extract_json

Addresses maintainer feedback:
- json.loads candidate fast path in find_json_object (300x speedup source)
- asyncio.to_thread wrappers for both _extract_json call sites (unblocks event loop)
- Remove ~480 lines of over-engineered incremental parsing logic

Total: ~16 lines, zero duplication, zero API surface change

* fix: simplify async JSON handling per maintainer feedback and align tests

* fix(test): replace tautology assertion in test_mismatched_then_valid

The original assertion `assert result is not None or result is None`
is always true. Replace with a meaningful type check.

---------

Co-authored-by: hundao <alchemy_wimp@hotmail.com>
2026-02-20 17:23:25 +08:00
Timothy @aden cf877f2b49 Merge pull request #5121 from TimothyZhang7/fix/credential-error-types
Fix(micro-fix)/credential error types
2026-02-19 16:23:31 -08:00
Timothy 6f34cb2c8a fix: credential error types 2026-02-19 14:52:29 -08:00
Timothy b88aa2b53c Merge branch 'feature/tui-credential-setup' 2026-02-19 11:12:28 -08:00
Timothy 356cab19eb Merge branch 'fix/google-tool-healthcheck' into feature/tui-credential-setup 2026-02-19 11:12:15 -08:00
vakrahul 7c6d5fa446 test_credentials changess 2026-02-19 19:46:43 +05:30
vakrahul 2dae3e47fd test_credentials changes 2026-02-19 19:43:45 +05:30
vakrahul 6fce789607 feat: add Stripe tool integration and testss 2026-02-19 17:14:57 +05:30
vakrahul 9bbb5b38e6 feat: add Stripe tool integration and tests 2026-02-19 16:55:22 +05:30
vakrahul ac73aa93bf feat: add Stripe tool integration and tests 2026-02-19 16:51:08 +05:30
Timothy 52a56e4a10 fix: google tools need healthcheck 2026-02-18 23:07:12 -08:00
alhousseynou-ndiaye a1cede510d docs: add document processing recipe 2026-02-19 07:47:13 +01:00
Timothy @aden 682c10e873 Merge pull request #5099 from TimothyZhang7/main
release(docs): v0.5.1
2026-02-18 22:11:45 -08:00
Timothy 5605e24a0d fix: streaming output leakage 2026-02-18 22:10:02 -08:00
Timothy f7268a44d9 fix: worker credential setup 2026-02-18 21:50:18 -08:00
Timothy af7a4ff4e8 release: v0.5.1
- Bump framework version 0.5.0 → 0.5.1
- Add CHANGELOG.md with full release notes

Highlights: Hive Coder meta-agent, multi-graph runtime, TUI revamp,
subscription model support, 5 new tool integrations, deprecated node
type removal.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 21:15:20 -08:00
Timothy 60b9c0d763 release: v0.5.1
- Bump framework version 0.5.0 → 0.5.1
- Add CHANGELOG.md with full release notes

Highlights: Hive Coder meta-agent, multi-graph runtime, TUI revamp,
subscription model support, 5 new tool integrations, deprecated node
type removal.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 21:00:41 -08:00
Timothy @aden 5c550270c6 Merge pull request #5071 from TimothyZhang7/feature/queen-bee
Release / Create Release (push) Waiting to run
Feature/queen bee
2026-02-18 20:59:04 -08:00
Timothy e03fd48e48 fix: lint 2026-02-18 20:51:40 -08:00
Timothy 6420c74c24 fix: ci tests 2026-02-18 20:47:07 -08:00
Timothy ad74351530 fix: agent switch return 2026-02-18 20:39:25 -08:00
Timothy @aden 1b5f656429 Merge branch 'main' into feature/queen-bee 2026-02-18 20:34:19 -08:00
Timothy @aden 132d84c529 Merge pull request #5058 from adenhq/fix/deprecation
fix(arch): remove all deprecated concepts and deadcodes
2026-02-18 20:32:24 -08:00
Timothy @aden a03b378e9b Merge branch 'main' into fix/deprecation 2026-02-18 20:29:39 -08:00
Timothy 74635e1d7d feat: subscription model support, tui revamp 2026-02-18 20:28:11 -08:00
Bryan @ Aden 893053ede7 Merge pull request #5098 from adenhq/update/inbox-agent-fixes
(micro-fix): Update/inbox agent fixes
2026-02-19 03:35:25 +00:00
bryan 596ec6fec5 fixed credentials 2026-02-18 19:26:59 -08:00
bryan 5863b83172 Merge branch 'main' into update/inbox-agent-fixes 2026-02-18 19:12:01 -08:00
bryan 20c92b197a fixes to inbox agent 2026-02-18 19:08:55 -08:00
RichardTang-Aden ec9c6b4666 Merge pull request #5097 from RichardTang-Aden/feat/credential-setup-cli
Feat/credential setup cli
2026-02-18 17:22:53 -08:00
Richard Tang 8a73e5c119 chore: ruff lint 2026-02-18 17:21:45 -08:00
Richard Tang 717f0eee9a Merge branch 'main' into feat/credential-setup-cli 2026-02-18 17:20:40 -08:00
Richard Tang 09fb47f089 chore: ruff format 2026-02-18 17:14:26 -08:00
Richard Tang b46d943e71 chore: lint issues 2026-02-18 17:13:01 -08:00
NSkogstad-AUS b980d6f6ab docs(tools): fixed small inaccuracy with gmail description 2026-02-19 11:33:50 +11:00
NSkogstad-AUS 61f27369ef docs(tools): update Available Tools table with additional search functionalities 2026-02-19 11:28:18 +11:00
NSkogstad-AUS 204b0b4744 docs(tools): expand Available Tools table with all tools by category
Previously the table listed ~20 of ~50 available tools. This expands
it to cover all tools, grouped into categories: File System, Data Files,
Web & Search, Communication, Productivity & CRM, Cloud & APIs,
Security, and Utilities.

All tool names verified against registered @mcp.tool() functions in source.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-02-19 11:20:49 +11:00
Timothy 1b6ebb1e42 fix: put guardian back to hive coder 2026-02-18 15:06:25 -08:00
Timothy 7dfc75b3e6 feat: muti graph agent session 2026-02-18 12:46:59 -08:00
Richard Tang 2920b5ab01 chore: lint issues 2026-02-17 20:05:05 -08:00
Richard Tang 81ad0467b0 Merge branch 'main' into fix/deprecation 2026-02-17 20:02:47 -08:00
Richard Tang 115ca55ea0 fix: broken ci tests 2026-02-17 20:00:47 -08:00
Richard Tang f2814a26e6 chore: lint issue 2026-02-17 19:57:31 -08:00
Richard Tang 4d309950b0 fix: unused code and ci 2026-02-17 19:55:54 -08:00
RichardTang-Aden 39216a4c12 Merge pull request #5016 from adenhq/feat/pdf-ingestion
Feat/pdf ingestion
2026-02-17 19:29:35 -08:00
Aden HQ c7fa621aeb Merge branch 'main' into feat/pdf-ingestion 2026-02-17 19:28:54 -08:00
Timothy 5914d28cbe feat(queen): hive queen bee implementation v1 2026-02-17 19:19:09 -08:00
Richard Tang 8c3ad3d70a fix: email agent version 2026-02-17 19:17:07 -08:00
Richard Tang 9eb3fc6285 fix: fix email agent version 2026-02-17 19:09:17 -08:00
Richard Tang e95f7e7339 fix: make the email inbox management agent identical to main 2026-02-17 19:02:08 -08:00
RichardTang-Aden d949551399 Merge pull request #3332 from haliaeetusvocifer/feat/google-docs-integration
added feature google docs integration
2026-02-17 18:25:03 -08:00
Richard Tang a7dbd85ed4 fix: google docs credentials 2026-02-17 18:24:31 -08:00
Richard Tang 1f288dab1c fix: tools registration problems 2026-02-17 18:15:41 -08:00
Richard Tang 021754d941 Merge branch 'main' into feat/google-docs-integration 2026-02-17 18:05:07 -08:00
bryan 7412904fbf update to job hunter and website vulnerability 2026-02-17 16:58:12 -08:00
Timothy cd1976e2b9 feat: support openai compatible endpoints 2026-02-17 16:27:36 -08:00
haliaeetusvocifer 5f3e9379a3 completed task 2026-02-17 23:59:47 +00:00
Richard Tang 0e565d6cea feat: add the agent start confirmation and credential update option 2026-02-17 13:17:03 -08:00
Richard Tang 67b249dcd5 feat: add the credential setup step after credential validation 2026-02-17 12:59:20 -08:00
Timothy bbf1c8c790 fix(arch): remove all deprecated concepts and deadcodes 2026-02-17 10:59:15 -08:00
bryan 44a8b453b5 Merge branch 'main' into feat/pdf-ingestion 2026-02-16 18:40:46 -08:00
bryan 26511fe962 added pdf select and updated job hunter 2026-02-16 18:38:13 -08:00
RichardTang-Aden ce5893216a Merge pull request #4871 from paarths-collab/docs/root-install-warning
docs(readme): clarify uv workspace setup and prevent root pip install misuse
2026-02-16 18:36:57 -08:00
RichardTang-Aden 4e821e4dbf Merge pull request #5011 from RichardTang-Aden/main
micro-fix: chore: update the intro message of the agent
2026-02-16 18:05:18 -08:00
Richard Tang d11e97de59 chore: update the intro message of the agent 2026-02-16 18:03:58 -08:00
RichardTang-Aden 4b10d3e360 Merge pull request #5010 from RichardTang-Aden/main
feat: merge the sample agent so we have one email inbox management agent
2026-02-16 18:00:31 -08:00
Richard Tang e04479930f chore: update descriptions 2026-02-16 17:56:17 -08:00
Richard Tang 8a8c4cc3f5 chore: rename the email 2026-02-16 17:53:05 -08:00
Richard Tang 1e06ff611e refactor: merge the sample agent so we have one email inbox management agent 2026-02-16 17:43:18 -08:00
Pravin Mishra 1edc7bb9c7 feat(tools): add Discord integration (#2913) (#4247)
* feat(tools): add Discord integration (#2913)

- discord_list_guilds: list servers the bot is in
- discord_list_channels: list channels for a guild
- discord_send_message: send message to channel
- discord_get_messages: get recent messages

Auth: DISCORD_BOT_TOKEN, credential spec, health checker.
Uses Discord API v10 (Bot token).

Co-authored-by: Cursor <cursoragent@cursor.com>

* style: apply ruff format to discord tool files

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(discord): add rate limit handling, message validation, channel filter

- Rate limit (429): return clear error with retry_after from API
- Message length: validate before send, max 2000 chars per Discord limit
- Channel filter: text_only param (default True) for list_channels
- Add 6 new tests for rate limit, validation, filtering

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(discord): add retry on 429 rate limit

- Retry up to 2 times using Discord's retry_after
- Cap wait at 60s, fallback to exponential backoff if no retry_after
- Add _request_with_retry helper for all API calls
- Add 3 tests: retry then success, retry exhausted, tool-level retry

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(discord): remove unused DISCORD_API_BASE import

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: mishrapravin114 <mishrapravin114@users.noreply.github.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-17 09:29:56 +08:00
Siddharth Varshney 7b1e0af155 feat(utils): add proper __init__.py exports for utils module (#3979) 2026-02-16 20:20:09 +08:00
Jeet Karia 7b15616e29 feat(tools): add Exa Search API integration with 4 MCP tools (#4941)
Implements AI-powered web search, content extraction, and research tools
via the Exa API for agent workflows.

Tools: exa_search, exa_find_similar, exa_get_contents, exa_answer

Follows existing tool pattern (web_search_tool, hubspot_tool, slack_tool):
- register_tools(mcp, credentials) with @mcp.tool() decorators
- Credential fallback: CredentialStoreAdapter -> EXA_API_KEY env var
- Error handling: always returns dicts, never raises
- Retry with exponential backoff on HTTP 429

Includes:
- Neural/keyword search with domain, date, and category filters
- Similar page discovery via neural embeddings
- Content extraction from up to 10 URLs per request
- Citation-backed answer generation
- CredentialSpec in credentials/search.py
- Comprehensive unit tests (21 tests)
- 500/500 integration CI tests passing

Fixes #4177
2026-02-16 19:28:32 +08:00
Zhang bd7d2277d8 fix(tui): add Ctrl+J as newline fallback in chat input
Terminals without extended key reporting (VS Code, Cursor) send
identical events for Enter and Shift+Enter, making it impossible
to insert newlines. Ctrl+J produces a distinct key event in all
terminals.
2026-02-15 20:37:52 -08:00
Shivam Shahi– oss/acc 99ed00fd02 feat(tools): add Razorpay payment processing integration (#4467)
* feat(tools): add Razorpay payment processing integration

Add Razorpay MCP tool integration for payment processing, invoicing,
and refund management. Implements 6 MCP tools:

- razorpay_list_payments: List recent payments with filters (pagination, date range)
- razorpay_get_payment: Fetch detailed payment information by ID
- razorpay_create_payment_link: Create one-time payment links with shareable URLs
- razorpay_list_invoices: List invoices with status and type filtering
- razorpay_get_invoice: Fetch invoice details including line items
- razorpay_create_refund: Create full or partial refunds for payments

Features:
- Authentication via HTTP Basic Auth (RAZORPAY_API_KEY + RAZORPAY_API_SECRET)
- Credential spec in dedicated razorpay.py (follows repo pattern)
- Comprehensive error handling (401, 403, 404, 400, 429, 500, timeouts)
- Input validation (payment IDs, invoice IDs, amounts, currencies)
- Full test coverage (42 unit tests, 26 integration tests)

Closes #4404

* style: fix ruff I001 import order and W291 in tools

* fix: improve Razorpay credential tracking and validation

- Add razorpay_secret CredentialSpec with credential_group
- Fix amount=0 bug by using 'is not None' checks
- Add regex validation for payment/invoice IDs

* fix: use graceful credential handling instead of raising TypeError

Match codebase convention (calcom, lusha) - return None for non-string
credentials instead of raising TypeError, so the tool returns an error
dict instead of crashing.

---------

Co-authored-by: hundao <alchemy_wimp@hotmail.com>
2026-02-16 12:02:16 +08:00
Timothy @aden f7af5f9ee8 Merge pull request #4926 from TimothyZhang7/example/gmail-inbox-guardian-agent
Release / Create Release (push) Waiting to run
chore(micro-fix): change the timer to 5 minutes
2026-02-15 19:30:27 -08:00
RichardTang-Aden e5bcc8005f Merge pull request #4922 from adenhq/feat/vulnerability_agent
Feat/vulnerability agent
2026-02-15 19:21:00 -08:00
Timothy 352d285212 chore: change the timer to 5 minutes 2026-02-15 18:47:22 -08:00
Timothy @aden 3ef60f9d14 Merge pull request #4925 from TimothyZhang7/example/gmail-inbox-guardian-agent
Example(micro-fix)/gmail inbox guardian agent
2026-02-15 18:44:19 -08:00
Timothy a103312127 feature: display timer status interactively 2026-02-15 18:41:45 -08:00
Timothy 3d0bba4167 example(agents): ready-to-use gmail automation agent 2026-02-15 18:34:14 -08:00
Timothy @aden 3df718cc14 Merge pull request #4920 from TimothyZhang7/fix/issue-4905
Fix/issue 4905
2026-02-15 18:08:28 -08:00
RichardTang-Aden c7497a180e Merge pull request #4918 from TimothyZhang7/feature/multi-entry-event-driven-agents
Fix/multi entry event driven agents
2026-02-15 18:05:43 -08:00
Bryan @ Aden 3f39039a21 Merge pull request #4800 from LukeM94/doc/typo-fixes-in-roadmap.md
doc: Fix typos in docs/roadmap.md
2026-02-16 01:57:35 +00:00
Bryan @ Aden 88fbd90fcc Merge pull request #4799 from zhanglinqian/fix-recipes-readme-links
docs: fix incorrect directory names in recipes README
2026-02-16 01:55:51 +00:00
bryan e0bf09dd78 lint fixes 2026-02-15 17:45:56 -08:00
bryan 3e158b07af Merge branch 'main' into feat/vulnerability_agent 2026-02-15 17:35:59 -08:00
Timothy 5319ed7ee1 chore: remove unsolicited docs 2026-02-15 17:32:36 -08:00
Timothy 978904d2a4 fix(executor): async operations on non-streaming llm complete for healthy event loop 2026-02-15 17:31:18 -08:00
bryan 4d876ecc54 vulnerability check to sample agents 2026-02-15 17:27:09 -08:00
RichardTang-Aden ba327d0b9e Merge pull request #4919 from adenhq/feat/sample-agent/job_hunter
Sample agent, micro-fix: remove dependency of brave search
2026-02-15 16:58:06 -08:00
Richard Tang b69cf3523c feat: remove dependency of brave search 2026-02-15 16:47:53 -08:00
Timothy 4d8c8e9308 feat(arch): architecture patches to support multi-entry agents consuming external events 2026-02-15 16:19:58 -08:00
Amit Kumar b70885934c [Integration]: Cal.com - Open Source Scheduling Infrastructure #3188 (#3255)
* feat(tools): add Cal.com scheduling integration with 8 MCP tools

Adds Cal.com API integration for booking and scheduling management:
- calcom_list_bookings, calcom_get_booking, calcom_create_booking, calcom_cancel_booking
- calcom_get_availability, calcom_update_schedule
- calcom_list_event_types, calcom_get_event_type

Includes dedicated credential spec, 20 unit tests, and full integration conformance.
Resolves #3188

* fix(calcom): address PR review + add missing list_schedules MCP tool

- Add isinstance(api_key, str) guard in _get_api_key, return None on
  non-string values for consistent error-dict handling
- Remove duplicate metadata from responses object in create_booking;
  metadata stays at top-level per Cal.com v1 API spec
- Expose availability parameter in calcom_update_schedule MCP tool
- Add calcom_list_schedules tool wrapping existing client method —
  needed to discover schedule IDs before calling update_schedule
- Register calcom_list_schedules in credential spec for CI conformance
- Add tests for credential handling, schedule availability, and
  list_schedules (24 tests total, 9 tools)

* docs: update README to include calcom_list_schedules (9 tools)

---------

Co-authored-by: hundao <alchemy_wimp@hotmail.com>
2026-02-15 20:14:00 +08:00
paarths-collab 722b087fc0 docs(readme): clarify installation and prevent root pip install misuse 2026-02-15 17:39:41 +05:30
Aaryann Chandola 0c7ea272db [integration] feat(tools): add Google Calendar integration (#3171)
* feat(calendar): add Google Calendar integration with event management tools and health checks

* fix(calendar): align google_calendar_oauth credential spec with codebase pattern
2026-02-15 08:25:53 +08:00
Aaryann Chandola 5e4f322fc0 add new time tool for current date/time retrieval (#3425) 2026-02-14 21:57:07 +08:00
zhanglinqian c02e45f1aa docs: fix incorrect directory names in recipes README 2026-02-14 21:19:47 +08:00
LukeM94 a7217f138c Fix typos in docs/roadmap.md
Correct hyphenation and spelling in the product roadmap: change 'outcome oriented' to 'outcome-oriented' and fix 'Workder' to 'Worker' in the Deployment section.
2026-02-14 13:18:03 +00:00
Emmanuel Nwanguma 3502f25048 [Integration] feat(tools): add BigQuery MCP tool for SQL querying and data analysis (#3350)
* feat(tools): add BigQuery MCP tool for SQL querying and data analysis

- Add run_bigquery_query tool for executing read-only SQL queries
- Add describe_dataset tool for exploring dataset schemas
- Implement safety features: read-only enforcement, row limits (max 10k)
- Add comprehensive unit tests (27 tests passing)
- Follow CredentialStoreAdapter pattern from email tool
- Support ADC and service account authentication

Fixes #3067

* fix(bigquery): address PR review feedback

- Add credential_id and api_key_instructions to CredentialSpec
- Fix credential key name from 'bigquery_credentials' to 'bigquery'
- Pass credential path to BigQuery client via environment variable
- Fix ADC error message detection for both error variants
- Move google-cloud-bigquery to optional dependencies
- Update tests to use correct credential key names

All 27 tests passing

* fix(bigquery): return {error, help} when dependency missing

* fix(bigquery): return full ImportError message for missing dependency

* fix(bigquery): include 'help' key when dependency missing
2026-02-14 19:48:03 +08:00
RichardTang-Aden 93c026fe31 Merge pull request #4759 from adenhq/feat/sample-agent/job_hunter
[Feature][Sample Agent]: Job Hunting Agent
2026-02-13 20:24:35 -08:00
bryan e515977b96 Merge branch 'main' into feat/vulnerability_agent 2026-02-13 20:16:48 -08:00
bryan 045490a097 testing agent run 1-5 2026-02-13 20:16:12 -08:00
Richard Tang b25903fb7f feat: init job hunter 2026-02-13 20:08:55 -08:00
bryan acf4bd5152 tools for sample agent 2026-02-13 19:14:59 -08:00
Timothy 1f5711e1a1 Merge branch 'fix/transient-error-handlings' into feat/inbox-management 2026-02-13 18:53:33 -08:00
RichardTang-Aden ca2dd90313 Merge pull request #4610 from adenhq/feat/inbox-management
Feat/inbox management
2026-02-13 18:51:26 -08:00
Timothy 21e07f3b65 Merge branch 'feature/load-by-bytes' into feat/inbox-management 2026-02-13 18:43:45 -08:00
Timothy e8a06ddd34 feat: load by bytes instead of rows 2026-02-13 18:41:16 -08:00
bryan 34cc09904f fix pytest 2026-02-13 18:27:22 -08:00
bryan f6bba8b62f Merge branch 'main' into feat/inbox-management 2026-02-13 18:17:57 -08:00
bryan d241ad60f8 updated tui to two panel 2026-02-13 18:13:12 -08:00
Timothy 5a3fcf9a8a Merge branch 'main' into feat/inbox-management 2026-02-13 16:39:07 -08:00
Timothy 1f8a47203f fix: common transient errors and loop detection 2026-02-13 16:14:43 -08:00
RichardTang-Aden 7240090274 Merge pull request #4525 from e-cesar9/fix/4428-windows-defender-exclusions
perf(windows): add Windows Defender exclusions for 40% faster uv sync
2026-02-13 15:49:58 -08:00
Richard Tang 2e6a47c2df fix(windows): replace unicode bullets with ASCII dashes
The bullet character (•) cannot be displayed properly in PowerShell
on some Windows systems. Use ASCII dash (-) instead for compatibility.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-13 15:48:48 -08:00
Richard Tang 7f5ecd7913 fix: uv local storage 2026-02-13 15:47:57 -08:00
RichardTang-Aden 105b98b113 Merge pull request #4646 from jaathavan18/docs/issue-4645-fix-exports-link
docs: fix broken exports/ link in environment-setup.md
2026-02-13 15:20:07 -08:00
RichardTang-Aden 114e65ab41 Merge pull request #4649 from jaathavan18/docs/issue-4648-fix-setup-python-ref
docs: fix references to nonexistent setup-python.sh
2026-02-13 15:19:50 -08:00
RichardTang-Aden 0fc13a5cc3 Merge pull request #4660 from Hima-de/fix/antigravity-setup-docs
docs: fix nonexistent setup-python.sh reference in antigravity-setup.md
2026-02-13 15:19:28 -08:00
bryan e651799e9e Merge branch 'main' into feat/inbox-management 2026-02-13 10:34:33 -08:00
Timothy @aden fcd3e514de Merge pull request #4722 from TimothyZhang7/main
chore(micro-fix): fix removed message
2026-02-13 09:43:05 -08:00
Timothy 7ab41de3a2 chore: fix removed message 2026-02-13 09:40:37 -08:00
Timothy @aden 58e023f277 Merge pull request #4720 from TimothyZhang7/feature/event-source
Feature/event source
2026-02-13 09:37:13 -08:00
Timothy @aden a98f2d5b86 Merge pull request #4647 from TimothyZhang7/feature/memory-inheritance
feat:phased compaction and event bus integration
2026-02-13 09:25:44 -08:00
Amit Kumar eca43231c0 [Feature]: Add Excel Tool for reading/writing .xlsx/.xlsm files #2675 (#3004)
* feat(tools): add Excel tool for reading/writing .xlsx/.xlsm files

Add excel_tool module with 7 MCP tools:
- excel_read: Read data from Excel files with pagination
- excel_write: Create new Excel files
- excel_append: Append rows to existing Excel files
- excel_info: Get file metadata (sheets, columns, rows)
- excel_sheet_list: List sheet names in a file
- excel_sql: Query Excel with SQL (DuckDB), multi-sheet support
- excel_search: Search values across sheets with match options

Includes 56 tests, openpyxl optional dependency, and documentation.

Fixes #2675

* fix(tools): address excel review feedback and stabilize tests
2026-02-13 19:32:07 +08:00
Amit Kumar 6763077887 feat(tools): add Google Maps Platform integration with 6 MCP tools (#3796)
Implements geocoding, routing, and location intelligence via Google Maps
Platform Web Services APIs for logistics, delivery, and location-based
agent workflows.

Tools: maps_geocode, maps_reverse_geocode, maps_directions,
maps_distance_matrix, maps_place_details, maps_place_search

Includes:
- page_token support for paginated place search results
- GoogleMapsHealthChecker for credential validation
- Comprehensive unit tests (42 tool tests, 30 health check tests)

Closes #3179
2026-02-13 19:01:28 +08:00
Siddharth Varshney f85ff8a2f8 feat(tools): add Telegram Bot integration (#3550)
* feat(tools): add Telegram Bot integration

- Add telegram_send_message and telegram_send_document tools
- Add credential spec for TELEGRAM_BOT_TOKEN
- Add comprehensive tests (18 test cases)
- Add README documentation with setup instructions

* fix(telegram_tool): catch network errors instead of letting them raise

---------

Co-authored-by: hundao <alchemy_wimp@hotmail.com>
2026-02-13 17:10:40 +08:00
Amit Kumar 1a5c3480e6 feat(tools): add NewsData + Finlight MCP tools for market intelligence (#4473)
* feat(tools): add news MCP tools

* fix(tools): adjust news providers and fallback

* fix(tools): add rate limiting + sentiment normalization to news tools

Address PR feedback: exponential backoff (3 retries, 2^attempt delays)
on HTTP 429 for both NewsData and Finlight with seamless provider
fallback, and normalize sentiment scores to [-1.0, +1.0] range.

* fix(news_tool): make provider fallback lazy and handle network errors

---------

Co-authored-by: hundao <alchemy_wimp@hotmail.com>
2026-02-13 16:25:56 +08:00
Hima-de 69a7fe7b92 docs: fix nonexistent setup-python.sh reference in antigravity-setup.md
Replaces references to ./scripts/setup-python.sh with ./quickstart.sh,
which is the actual setup script in the repository.

Fixes #4648
2026-02-13 06:30:36 +00:00
Emmanuel Nwanguma a5418d760f fix(runtime): validate and create storage path on init (#4466)
- Add path validation in Runtime.__init__()
- Log warning when creating nonexistent storage directory
- Auto-create directory with parents to prevent silent failures later
- Implements Option 3 from issue discussion (explicit, safe, informative)

Fixes #1870
2026-02-13 14:24:14 +08:00
T.Trinath Reddy 0deeb87c63 feat(vision): add GCP Vision API integration (#4231)
* feat(vision): add GCP Vision API integration

* refactor(vision): move GCP Vision credentials to dedicated folder

* fix: clean up credentials imports and updated gitignore

* followed ruff alphabetic order for credentials
2026-02-13 14:00:15 +08:00
Timothy d1d5f49c5a fix: add more events to event bus 2026-02-12 20:42:20 -08:00
bryan 917e23ccc8 Merge branch 'main' into feat/inbox-management 2026-02-12 20:30:34 -08:00
RichardTang-Aden 988922304f Merge pull request #4451 from Ttian18/fix/tina-tui-copy-newline-4423
Release / Create Release (push) Waiting to run
fix(tui): add multiline input and cross-platform clipboard support
2026-02-12 20:27:32 -08:00
bryan ab2bd726c3 Merge branch 'main' into feat/inbox-management 2026-02-12 20:24:46 -08:00
bryan 713fefb163 inbox-manager is now continuous 2026-02-12 20:22:40 -08:00
Timothy 83140a1398 feat: event source in runtime 2026-02-12 19:52:15 -08:00
bryan cafa6dd930 update to inbox management agent 2026-02-12 19:17:54 -08:00
jaathavan18 82e1af1a7a docs: fix references to nonexistent setup-python.sh (#4648) 2026-02-12 22:08:35 -05:00
jaathavan18 30c3dc9205 docs: fix broken exports/ link in environment-setup.md (#4645) 2026-02-12 22:04:25 -05:00
Timothy 9a3c6703e1 feat:phased compaction and event bus integration 2026-02-12 18:41:32 -08:00
RichardTang-Aden e26468aa19 Merge pull request #4587 from Antiarin/feat/codex-integration
feat(codex): add project setup, quickstart bootstrap, and docs for Codex agent support
2026-02-12 17:55:47 -08:00
Richard Tang fe14992696 docs: update documentation 2026-02-12 17:54:08 -08:00
Richard Tang d0775b95c6 fix: remove unused agents.md and quickstart clause 2026-02-12 17:48:31 -08:00
Richard Tang 96121b5757 fix: correct codex setups 2026-02-12 17:35:15 -08:00
Timothy @aden 11c003c48d Merge pull request #4636 from TimothyZhang7/feature/memory-inheritance
Feature: Conversation Memory & Continuous Agent Session
2026-02-12 13:52:19 -08:00
Timothy fbe72c58ae chore: fix ci tests 2026-02-12 13:49:34 -08:00
bryan 816156e87f merge: PR #4636 feature/memory-inheritance into feat/inbox-management
Brings in append_data tool, continuous conversation mode, conversation
judge, phase compaction, and prompt composer from the memory-inheritance
feature branch.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-12 13:23:25 -08:00
bryan 7bceab3cea wip: inbox management agent setup and gmail tool updates
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-12 13:22:19 -08:00
Timothy 83d7f56728 chore: update agent skills for new design philosophy 2026-02-12 12:10:37 -08:00
Timothy 76deba2a6a feat: consistent memory system 2026-02-12 11:41:22 -08:00
Antiarin d9d048b9e3 docs: update quickstart script for symlink handling and add Codex CLI documentation 2026-02-12 23:34:17 +05:30
bryan 930f417729 Merge remote-tracking branch 'origin/main' into feat/inbox-management 2026-02-12 08:44:13 -08:00
bryan 8e214d06c1 moved inbox management to sample agents 2026-02-12 08:33:43 -08:00
e-cesar9 63e0348963 fix(ux): improve failure messaging with success rate
Shows clear success rate when some exclusions fail (e.g., "Only 2/3
exclusions added (67%)"). Helps users understand that performance
benefit may be reduced when not all paths are excluded.

- Calculates success rate as percentage
- Shows "X/Y added (Z%)" format for clarity
- Warns that performance benefit may be reduced
- Better visibility of partial failures

Improves user awareness of partial installation issues.
2026-02-12 17:30:17 +01:00
e-cesar9 b46a5f0247 fix(robustness): recheck Defender status before adding exclusions
Adds Test-IsDefenderEnabled helper and rechecks Defender status and
paths before adding exclusions. Prevents adding ineffective exclusions
if Defender was disabled during user prompt (race condition).

- New helper: Test-IsDefenderEnabled() for quick boolean check
- Recheck Defender status immediately before Add operation
- Recheck paths to detect if added by another process
- Clear messages if status changed or already added

Fixes race condition where user could disable Defender or another
process could add exclusions during the prompt delay.
2026-02-12 17:29:44 +01:00
e-cesar9 79dfd90068 fix(security): add path validation for Defender exclusions
Validates that paths are within safe boundaries (project directory or
user AppData) before excluding them from Defender. Prevents accidental
or malicious exclusion of system directories.

- Adds safePrefixes validation (project dir + user AppData only)
- Checks each path against allowed prefixes
- Normalizes paths for consistent comparison
- Warns but processes non-existent paths (they may be created later)

Fixes potential security issue where modified script could exclude
system directories like C:\Windows or C:\Program Files.
2026-02-12 17:29:00 +01:00
Antiarin f9d5c7c751 fix(codex): patch prompt injection, checkout version, and style in codex-issue-triage workflow 2026-02-12 20:25:19 +05:30
Antiarin 8958fb2d88 eat(codex): add project setup, quickstart bootstrap, and docs for Codex agent support 2026-02-12 18:32:58 +05:30
Zhang 3c51f2ac36 fix(tui): address code review feedback on TextArea migration
- Subclass TextArea as ChatTextArea to intercept Enter key before
  the base class swallows it (fixes submission not triggering)
- Remove event.shift access that raises AttributeError on Key events
- Make action_show_sessions directly call _submit_input instead of
  just placing text in the widget
2026-02-11 23:45:30 -08:00
RichardTang-Aden 170a0918f7 Merge pull request #4538 from jaathavan18/docs/issue-4493-remove-changelog-references-v2
docs: remove references to nonexistent CHANGELOG.md
2026-02-11 19:15:00 -08:00
jaathavan18 e3da3b619c docs: remove references to nonexistent CHANGELOG.md (#4493) 2026-02-11 21:48:28 -05:00
RichardTang-Aden 6e32513b79 Merge pull request #4535 from RichardTang-Aden/main
docs: add the automonous agent guide
2026-02-11 17:11:02 -08:00
Richard Tang 520e1963ee docs: add the automonous agent guide 2026-02-11 17:10:21 -08:00
RichardTang-Aden 843b9b55e2 Merge pull request #2862 from himanshu748/feat/antigravity-ide-2571
feat: Antigravity IDE support for MCP servers and skills
2026-02-11 16:59:15 -08:00
Richard Tang ccd305ff96 fix: remove incorrect rules folder 2026-02-11 16:58:02 -08:00
Richard Tang 3bd0d1e48c feat: add Antigravity workflows for all hive skills 2026-02-11 16:56:15 -08:00
Richard Tang d9bfa8e675 docs: rename .antigravity to .agent for Antigravity IDE compatibility 2026-02-11 16:42:26 -08:00
Richard Tang 27746147e2 fix: fix the antigravity config folder 2026-02-11 16:41:21 -08:00
Richard Tang 3a0b642980 fix: update the antigravity config 2026-02-11 16:39:28 -08:00
Richard Tang 8c0241f087 fix: use --directory instead of cwd for Antigravity MCP compatibility 2026-02-11 16:36:12 -08:00
Richard Tang 958d016174 fix: use uv run for MCP servers, script reads from template 2026-02-11 16:03:32 -08:00
Richard Tang 913d318ada docs: emphasize restarting Antigravity after MCP setup, fix config path and skill names 2026-02-11 15:57:11 -08:00
Richard Tang 8212920cb7 fix: correct .antigravity/skills symlinks to point to hive-* directories 2026-02-11 15:52:27 -08:00
Richard Tang 6414be7bd4 fix: change the wrong antigravity mcp path 2026-02-11 15:47:54 -08:00
himanshu748 ac62a82d08 docs: clarify Antigravity setup for everyone
- Define repo root at top; lead with quick start (3 steps)
- Add 'What you get' and prerequisites in one place
- Full setup step-by-step; troubleshooting: problem → fix
- Manual MCP config as single section; verification optional
- Plain language, scannable structure, no duplicate sections
2026-02-11 15:45:56 -08:00
himanshu748 a670548a57 Antigravity: one-command setup script and clearer docs
- Add scripts/setup-antigravity-mcp.sh to auto-detect repo root and write
  ~/.gemini/mcp.json with absolute paths (no manual path editing)
- Lead docs with Quick start (3 steps) and note ./ vs / for the script
- README: point to one-command setup; clarify script runs from repo folder
2026-02-11 15:45:56 -08:00
himanshu748 c4a7463f9d docs(antigravity): global config, absolute cwd, cwd warning note
- Add step to run core/setup_mcp.sh first
- Document that IDE often loads ~/.claude/mcp.json, not project config
- Add Option A: copy to ~/.claude/mcp.json with absolute cwd paths
- Note that cwd schema warning in IDE is a false positive
- Renumber setup steps (1–5)
2026-02-11 15:45:30 -08:00
himanshu748 edf0ac5270 docs: add how-to-verify section for Antigravity setup 2026-02-11 15:45:30 -08:00
himanshu748 8ff6b76f37 feat: Antigravity IDE support for MCP servers and skills (#2571)
- Add .antigravity/mcp_config.json with agent-builder and tools MCP servers
- Add .antigravity/skills/ with symlinks to .claude/skills/ (5 skills)
- Add docs/antigravity-setup.md with setup and troubleshooting
- Update README.md with Antigravity IDE support section
- Update DEVELOPER.md and docs/contributing-lint-setup.md with Antigravity refs

Mirrors Cursor integration for consistent multi-IDE support.
2026-02-11 15:45:30 -08:00
Bryan @ Aden c9f9eb365c Merge pull request #4441 from saurabh007007/feature/docs/spelling
docs:fix typo in docs for the directory in environment steup
2026-02-11 23:37:44 +00:00
e-cesar9 7a17c115d3 perf(windows): add Windows Defender exclusions for 40% faster uv sync
Fixes #4428

- Add Step 2.5 to quickstart.ps1 for optional Defender exclusions
- Requires admin + explicit user consent (default=no)
- Handles non-admin gracefully (shows manual commands)
- Improves uv sync by ~40% and cold-start by ~30% on Windows 11
- Never fails installation (graceful degradation)

Implementation:
- 3 helper functions: Test-IsAdmin, Test-DefenderExclusions, Add-DefenderExclusions
- Excludes: project dir, .venv, %APPDATA%\uv
- Security: Default=no, admin-only, clear trade-offs explained
- Robustness: Handles Defender disabled, existing exclusions, partial failures
- UX: Follows existing Write-Step/Ok/Warn patterns, clipboard fallback
- Testing: Idempotent, can run multiple times safely
2026-02-11 23:57:55 +01:00
Bryan @ Aden 9a2a11055f Merge pull request #4520 from jaathavan18/docs/issue-4495-remove-config-yaml-reference
docs: remove reference to nonexistent config.yaml
2026-02-11 22:36:16 +00:00
bryan f21aecd91c Add Gmail inbox management tools (list, get, trash, modify labels) 2026-02-11 14:33:01 -08:00
jaathavan18 4aef73c1d7 docs: remove reference to nonexistent config.yaml (#4495) 2026-02-11 15:07:04 -05:00
Rudra2637 906480a6e8 Guard unsupported criterion types in _evaluate_criterion 2026-02-12 01:12:52 +05:30
bryan 9df147b450 removed send_budget_alert_email, require provider param 2026-02-11 11:26:23 -08:00
RichardTang-Aden b71b4b0fc2 Merge pull request #4500 from RichardTang-Aden/main
fix: remove unused tools from root .mcp.json
2026-02-11 10:23:45 -08:00
Richard Tang 1bd2510c52 fix: remove unused tools from root .mcp.json 2026-02-11 10:20:20 -08:00
RichardTang-Aden 28b81092f9 Merge pull request #4498 from jaathavan18/docs/issue-4494-fix-env-example-reference
docs: fix .env.example references in tools README
2026-02-11 10:11:53 -08:00
RichardTang-Aden 4b9a3abba6 micro-fix: python lint error (#4499)
* micro-fix: python lint error

* micro-fix: python lint format
2026-02-11 10:09:17 -08:00
Richard Tang 0c76b6dcb1 micro-fix: python lint format 2026-02-11 10:08:40 -08:00
Richard Tang 090a85b41b micro-fix: python lint error 2026-02-11 10:04:25 -08:00
jaathavan18 992d573573 docs: fix .env.example references in tools README (#4494) 2026-02-11 12:58:51 -05:00
e-cesar9 9e768e660b micro-fix: move inline imports to module level in edge.py (#4480)
Fixes #4445

Moved repeated inline imports (logging, json, re) to module-level:
- Eliminates import overhead on every method call
- Follows PEP 8 conventions
- Added module-level logger instance
- re is used at line 259 (re.search)

Changes:
- 4 lines added (imports + logger)
- 13 lines removed (inline imports)
- No functional changes
2026-02-11 09:55:14 -08:00
RichardTang-Aden 26b9ed362e Merge pull request #4479 from e-cesar9/microfix/4444-typo-stirct
micro-fix: fix typo STIRCT → STRICT in safe_eval.py
2026-02-11 09:48:55 -08:00
Timothy @aden 976ae75fde Merge pull request #4487 from adenhq/feature/windows-quickstart
chore(micro-fix): windows quickstart
2026-02-11 08:18:53 -08:00
Timothy @aden 9da91b5319 Merge branch 'main' into feature/windows-quickstart 2026-02-11 08:18:07 -08:00
Timothy @aden 2493beaf5a Merge pull request #4437 from GovindhKishore/fix/powershell-syntax-error
micro-fix: resolve NativeCommandError in uv sync and ParserError in dynamic env var assignment
2026-02-11 07:51:01 -08:00
e-cesar9 d63dd021ab micro-fix: fix typo STIRCT → STRICT in safe_eval.py
Fixes #4444
2026-02-11 14:40:25 +01:00
Zhang 697ba89314 fix(tui): add multiline input and cross-platform clipboard support (#4423)
Replace single-line Input widget with TextArea in chat REPL so
Shift+Enter inserts newlines and multiline paste works correctly.
Add Windows clipboard support (clip.exe) and xsel fallback for Linux.
2026-02-11 00:01:09 -08:00
Arshad Uzzama Shaik b6c65ab5d5 fix(security): handle unix-style absolute paths correctly on Windows (#1204)
* fix(security): normalize unix-style paths on windows to prevent sandbox escape (fixes #499)

* fix(security): handle unix-style absolute paths correctly on Windows

* fix: address review comments on path sanitization

- Strip leading whitespace to prevent startswith bypass
- Replace lstrip with single-char strip to preserve UNC paths
- Expand ValueError comment for clarity

---------

Co-authored-by: Arshad Shaik <arshad.shaik@violetis.ai>
Co-authored-by: hundao <alchemy_wimp@hotmail.com>
2026-02-11 15:40:38 +08:00
Arshad Uzzama Shaik 162f9a55ad fix(mcp): log errors in _load_active_session instead of silencing them (fixes #682) (#684)
Co-authored-by: Arshad Shaik <arshad.shaik@violetis.ai>
2026-02-11 15:09:10 +08:00
Govindh Kishore e484fdfa51 fix: Replaced multi-argument Join-Path with [System.IO.Path]::Combine for PS 5.1 compatibility. 2026-02-11 12:30:25 +05:30
Amit Kumar 77d9ccf2e4 feat(tools): add SerpAPI tools for Google Scholar & Patents search (#3986)
Implements 5 tools as proposed in #3224:
- scholar_search: Search Google Scholar for academic papers
- scholar_get_citations: Get citation formats (MLA, APA, Chicago, etc.)
- scholar_get_author: Author profiles with h-index, i10-index, metrics
- patents_search: Search Google Patents with filters
- patents_get_details: Detailed patent information by publication number

Follows existing tool pattern (web_search_tool, hubspot_tool, slack_tool):
- register_tools(mcp, credentials) with @mcp.tool() decorators
- _SerpAPIClient internal class for HTTP calls via httpx
- Credential fallback: CredentialStoreAdapter -> SERPAPI_API_KEY env var
- Error handling: always returns dicts, never raises

25 unit tests + live integration tests verified.
697/697 full test suite passing.

Fixes #3224
2026-02-11 14:46:59 +08:00
Govindh Kishore 94e39ee09e fix: resolve NativeCommandError by stabilizing uv sync output capture 2026-02-11 11:59:15 +05:30
saurabh007007 373ad77008 docs:fix typo in docs 2026-02-11 11:46:26 +05:30
Govindh Kishore 661b0c0038 fix: resolve PowerShell ParserError in quickstart.ps1 2026-02-11 11:02:37 +05:30
Govindh Kishore 8ed38bf0e2 fix: resolve PowerShell ParserError in quickstart.ps1 2026-02-11 11:01:33 +05:30
RichardTang-Aden 4d675dfff7 Merge pull request #4431 from adenhq/developer-success-documentation
docs: Developer success documentation
2026-02-10 20:12:11 -08:00
Richard Tang b42a3293f1 docs: change docs tune 2026-02-10 19:37:14 -08:00
Timothy @aden 87e9bf853d Merge pull request #4425 from TimothyZhang7/feature/windows-quickstart
feat(quickstart): windows script and cli
2026-02-10 19:26:23 -08:00
Timothy @aden c56f78422a Merge pull request #4408 from adenhq/fix/first-success
(micro-fix): Fix/first success
2026-02-10 18:42:12 -08:00
Timothy ac311e10ba Merge remote-tracking branch 'origin/main' into fix/first-success 2026-02-10 18:40:27 -08:00
Timothy @aden 0297520263 Merge pull request #4309 from TimothyZhang7/feature/automated-testing-skill
Feature/automated testing skill
2026-02-10 18:36:57 -08:00
Bryan @ Aden 4803552a7a Merge pull request #4429 from adenhq/feat/opencode-skills
(micro-fix): added skills to opencode, linking to claude
2026-02-11 02:23:37 +00:00
Bryan @ Aden b8d85ff723 Merge pull request #4021 from NamanRajput-git/main
docs: fix incorrect submission email in quizzes documentation
2026-02-11 02:20:33 +00:00
bryan 7d571dfaec added skills to opencode, linking to claude 2026-02-10 18:20:21 -08:00
Richard Tang ba02e53bdd docs: update the use cases 2026-02-10 18:15:40 -08:00
Bryan @ Aden 153e6142ff Merge pull request #4243 from vakrahul/feat/opencode-integration
feat: Add Opencode integration as a Coding Agent
2026-02-11 02:12:10 +00:00
Bryan @ Aden 228449c9d8 Merge pull request #3894 from AkhileshBabuT/docs/fix-environment-setup-3837-3841
docs: fix environment-setup.md for uv workspace setup
2026-02-11 02:09:33 +00:00
vakrahul c65eed8802 docs: apply PR review changes for quickstart and markdown formatting and updating 2026-02-11 07:36:23 +05:30
Richard Tang 40d32f2e01 docs: deployment strategies 2026-02-10 18:02:08 -08:00
vakrahul c83aac5e12 docs: apply PR review changes for quickstart and markdown formatting 2026-02-11 07:29:48 +05:30
vakrahul 48b9241247 chore: address PR review feedback (formatting and outdated references) 2026-02-11 07:18:17 +05:30
Richard Tang 7779bc5336 docs: use cases for first success 2026-02-10 17:42:05 -08:00
Bryan @ Aden beec549f74 Merge pull request #4417 from e-cesar9/micro-fix/debug-logging-level
micro-fix: change debug logging to appropriate level in executor.py
2026-02-11 01:30:20 +00:00
Bryan @ Aden 310698ecc0 Merge pull request #4287 from nhockcuncon77/docs/contributing-numbering-and-tools-tests
docs(contributing): fix duplicate step numbers and add tools test command
2026-02-11 01:26:00 +00:00
Bryan @ Aden 4f719c4778 Merge pull request #4299 from YashovardhanB28/fix/llm-token-counting
docs(fix): correct LLMNode token counting bug
2026-02-11 01:25:51 +00:00
bryan 4cc00f3bdc removed dead tests, updated some drifting behavior 2026-02-10 16:56:36 -08:00
bryan 1f9c47fef1 removed twitter agent, create new terminal, resume command, update quickstart 2026-02-10 16:41:08 -08:00
e-cesar9 80a4980640 micro-fix: change debug logging to appropriate level in executor.py
Changes logger.info() with debug prefix to logger.debug() for
session state resume information. This prevents debug-level
information from appearing in production logs at INFO level.

- Removes redundant '🔍 Debug:' prefix
- Uses appropriate debug logging level
- Follows Python logging best practices
- Improves production log clarity

Addresses #4377
2026-02-11 00:15:17 +01:00
Timothy Zhang 8dbe424f5a fix: environment loader compatibility tweak for windows 2026-02-10 15:10:32 -08:00
Timothy Zhang ec9bf033e6 feat: windows quickstart and hive cli 2026-02-10 15:09:38 -08:00
Richard Tang a2d21ec7bc docs: update the developer profiles 2026-02-10 13:10:59 -08:00
Richard Tang 06ccc853ee docs: explain developer success as our principle 2026-02-10 13:05:54 -08:00
Richard Tang 4847332161 add placeholders 2026-02-10 13:01:22 -08:00
Richard Tang 8c1ee54725 docs: publish our developer success roadmap 2026-02-10 12:56:54 -08:00
Timothy @aden 5e537d9d55 Merge pull request #4410 from TimothyZhang7/feature/windows-quickstart
feat(quickstart): windows script
2026-02-10 12:52:13 -08:00
Timothy d6b95067a1 feat(quickstart): windows script 2026-02-10 12:49:25 -08:00
bryan 32cae75ef5 intro msg 2026-02-10 11:28:34 -08:00
bryan 21e7554cdb cred update in quickstart, sample agent check before agent run, agent has welcome msg 2026-02-10 11:27:17 -08:00
Timothy 374442e900 Merge branch 'main' into feature/automated-testing-skill 2026-02-09 20:16:11 -08:00
vakrahul a1a0ec5ddb Remove skills folder (reviewer will handle symlinks) and cleanup doc refs 2026-02-10 09:45:44 +05:30
Timothy 1fd56b079c fix: test cases 2026-02-09 20:12:37 -08:00
Timothy @aden a12163d63f Merge pull request #4304 from adenhq/fix/init-config
Release / Create Release (push) Waiting to run
model selection + max_tokens in quickstart
2026-02-09 20:11:55 -08:00
RichardTang-Aden 0cd6f21980 Merge pull request #4270 from TimothyZhang7/feature/hard-goal-negotiation
Feature/hard goal negotiation
2026-02-09 20:04:20 -08:00
Richard Tang a88fc1d75c fix: remove the unnecessary summary before checking capabilities and gaps 2026-02-09 19:59:49 -08:00
vakrahul 87b0037fcd Add Opencode skills (mirrored from .cursor/skills) 2026-02-10 09:25:43 +05:30
Timothy 767d32d420 fix: stucking test cases 2026-02-09 19:51:50 -08:00
Richard Tang e9bde26611 fix: fixed minor issues introduced by the merge 2026-02-09 19:45:55 -08:00
Richard Tang c02f40622c Merge remote-tracking branch 'upstream/main' into feature/hard-goal-negotiation 2026-02-09 19:42:55 -08:00
vakrahul 929dc24e93 Fix config: use uv for mcp and remove pinned model 2026-02-10 09:12:42 +05:30
vakrahul 8cfb533fef Fix docs: remove recommended tag, fix rendering, remove duplicate setup 2026-02-10 09:10:47 +05:30
Timothy @aden 3328a388b3 Merge pull request #3877 from adenhq/fix/oauth-refresh
(micro-fix): update oauth to refresh token
2026-02-09 19:30:49 -08:00
Richard Tang 8f632eb005 feat: add communication style guideline 2026-02-09 19:28:48 -08:00
Richard Tang c8ee961436 fix: update the step label to avoid confusion 2026-02-09 19:04:05 -08:00
Timothy 6fd7efece6 feat: hive test, quickstart local settings 2026-02-09 18:53:42 -08:00
Richard Tang bc9f6b0af8 feat: update goal negotiation for a more conversational negotiation 2026-02-09 18:52:07 -08:00
bryan 7d48f17867 model selection + max_tokens in quickstart 2026-02-09 18:07:57 -08:00
Timothy 776583b3ad fix: use more sensible default models 2026-02-09 17:07:34 -08:00
Timothy 9c28dae583 fix: gemini should use GEMINI_API_KEY 2026-02-09 17:05:11 -08:00
YashovardhanB 59a315b90b fix(graph): correct LLMNode token counting to include retries 2026-02-10 06:21:54 +05:30
Timothy 866518f188 fix: headless runtime consolidation 2026-02-09 16:29:06 -08:00
RichardTang-Aden 736ae65a1d Merge pull request #4262 from adenhq/feat/build-from-sample
Build from Sample Agent
2026-02-09 16:05:42 -08:00
Bryan @ Aden 76c9f7c9a9 Merge pull request #1834 from fermano/feat/observability-trace-context
feat(observability): structured logging for trace context
2026-02-09 15:25:51 -08:00
Fernando Mano 32ad225d7f feat(observability): Adding OTel-compliant logging to L3 tool logs as introduced by #3715. -- remove redundant text from readme.md 2026-02-09 19:56:17 -03:00
amazonproai e5428bec5c docs(contributing): fix duplicate step numbers and add tools test command
- Fix Getting Started steps 6/7 duplicated; renumber to 8 and 9
- Add command to run tools package tests (cd tools && uv run pytest)

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-09 14:27:53 -08:00
bryan 7ae6f67470 updates to skills, renaming, suggested agents, remove changelog 2026-02-09 13:49:36 -08:00
Timothy faf534511b feat: automated test agent skill 2026-02-09 12:39:20 -08:00
Timothy @aden 594bceb8f5 Merge branch 'adenhq:main' into feature/hard-goal-negotiation 2026-02-09 12:28:19 -08:00
bryan 9dc0f48ec9 implemented building from sample agent template and updated deep research agent 2026-02-09 12:13:41 -08:00
Timothy 9d11f834b8 feat: automated testing skill 2026-02-09 11:05:59 -08:00
vakrahul 131b72cd0c feat: Add native Opencode support with Windows compatibility 2026-02-09 23:27:54 +05:30
Fernando Mano ce5a2d4a81 feat(observability): Adding OTel-compliant logging to L3 tool logs as introduced by #3715. -- remove line that would cause third-party loggers to log twice 2026-02-09 09:36:25 -03:00
Fernando Mano 7f489cee46 Merge branch 'main' into feat/observability-trace-context 2026-02-09 09:25:51 -03:00
Anjali Yadav 3c2d669a2f fix(credentials): correctly resolve integration_id in AdenCredentialResponse.from_dict (#3965)
* fix(credentials): respect integration_id in AdenCredentialResponse.from_dict

* style: fix forward reference annotation for Ruff
2026-02-09 17:52:55 +08:00
Timothy @aden ec36e96499 Merge pull request #4146 from TimothyZhang7/main
docs(release): release v0.4.2 - resumable sessions
2026-02-08 20:49:59 -08:00
Timothy 9ecd4980e4 chore: release v0.4.2 - resumable sessions
Release / Create Release (push) Waiting to run
- Add comprehensive resumable session functionality
- Immediate pause with Ctrl+Z and /pause command
- Auto-save state on quit
- Session management with /resume and /sessions commands
- Full memory and conversation history restoration
- See CHANGELOG.md for complete list of changes
2026-02-08 20:44:36 -08:00
Timothy @aden 64446ff9b6 Merge pull request #4141 from TimothyZhang7/feature/resumable-sessions
Feature/resumable sessions

Release candidate for v0.4.2
2026-02-08 20:40:33 -08:00
Timothy e3d2262292 fix: quit timeout, and tui interactions 2026-02-08 20:30:30 -08:00
Timothy 891cfa387a Merge branch 'main' into feature/resumable-sessions 2026-02-08 19:46:30 -08:00
Timothy f0243fddf2 feat: session resumable states and checkpoint system 2026-02-08 19:42:02 -08:00
Bryan @ Aden 85ff8e364b Merge pull request #3828 from Sandeepa-git/docs/fix-contributing-typo
docs(contributing): fix formatting typo in issue link
2026-02-08 19:07:48 -08:00
Bryan @ Aden 75f1afe8e3 Merge pull request #3857 from Manudeserti/docs/add-deep-research-readme
docs: add missing README for Deep Research Agent
2026-02-08 19:07:40 -08:00
Bryan @ Aden 7b660311e5 Merge pull request #4025 from hamzanajam7/docs/fix-getting-started-project-structure
docs(getting-started): fix project structure tree for tools and mcp_server location
2026-02-08 18:44:24 -08:00
Bryan @ Aden 98a493296d Merge pull request #4026 from hamzanajam7/docs/add-contributing-link-readme
docs(readme): add Contributing link to Quick Links section
2026-02-08 18:43:23 -08:00
RichardTang-Aden bc2a42aed2 Merge pull request #3901 from Templar121/docs/clarify-hive-test-generation
docs: clarify test generation responsibility in hive skill
2026-02-08 14:22:31 -08:00
Gaurav kapur 8b501d9091 fix: write node outputs to memory before edge evaluation (#3599) (#3694)
* fix: write node outputs to memory before edge evaluation (#3599)

* test: add regression tests for conditional edge direct key access
2026-02-08 23:23:37 +08:00
Nafiyad Adane cddae0ed18 Refactor Wikipedia tool and improve test structure
Reorganized imports in __init__.py for clarity and consistency. Cleaned up formatting and comments in wikipedia_tool.py. Enhanced test_wikipedia_tool.py by improving patching targets, clarifying comments, and refining test structure for better maintainability.
2026-02-07 18:49:49 -07:00
Nafiyad Adane 9dca42be27 Fix Wikipedia tool import order and test patching
Reorders imports in tools/__init__.py for clarity and groups web and PDF tools together. Updates Wikipedia tool tests to patch httpx.get using the correct import path, ensuring mocks work as intended. Removes unnecessary print statement in Wikipedia tool error handling.
2026-02-07 18:49:44 -07:00
Nafiyad Adane a1f3fe4d55 Add Wikipedia search tool and tests
Introduces a new 'search_wikipedia' tool for searching Wikipedia and retrieving article summaries using the public Wikipedia REST API. Updates documentation and tool registration, and adds unit tests for the new tool.
2026-02-07 18:49:38 -07:00
Fernando Mano 0304b392b2 feat(observability): Adding OTel-compliant logging to L3 tool logs as introduced by #3715. 2026-02-07 19:52:03 -03:00
hamzanajam7 ae9b4e82fe docs(readme): add Contributing link to Quick Links section
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-07 14:52:50 -05:00
hamzanajam7 4bac5e4c46 docs(getting-started): fix project structure tree for tools and mcp_server location
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-07 14:49:04 -05:00
Fernando Mano c4d3400ec4 Merge main into feat/observability-trace-context; resolve execution_stream conflicts 2026-02-07 16:49:04 -03:00
Naman Rajput 1da9bb0c0f Merge pull request #1 from NamanRajput-git/fix-submission-email
Fix submission email in quizzes documentation
2026-02-08 00:53:27 +05:30
Naman Rajput 760ed51ad3 Fix submission email in quizzes documentation
Updates incorrect careers@aden.com to contact@adenhq.com
2026-02-08 00:45:53 +05:30
Amit Kumar 6d0a3b952a feat(tools): add Apollo.io contact and company data enrichment integration (#3167)
Add Apollo.io MCP tool integration for B2B contact and company data
enrichment. Implements 4 MCP tools:
- apollo_enrich_person: Enrich contact by email, LinkedIn URL, or name+domain
- apollo_enrich_company: Enrich company by domain
- apollo_search_people: Search contacts with filters (titles, seniorities, etc.)
- apollo_search_companies: Search companies with filters (industries, size, etc.)

Features:
- Authentication via X-Api-Key header (APOLLO_API_KEY env var)
- Credential spec in dedicated apollo.py (follows repo pattern)
- Comprehensive error handling (401, 403, 404, 422, 429)
- Full test coverage (36 tests)

Closes #3061
2026-02-07 21:57:13 +08:00
Subhayan Mukherjee 873fcd5822 docs: clarify test generation responsibility in hive skill 2026-02-07 11:39:52 +05:30
AkhileshBabuT a08f3a8925 docs: fix environment-setup.md for uv workspace setup
- Fix #3837: Replace pip install -e with uv sync
- Fix #3841: Update venv location to reflect single root .venv
2026-02-06 23:42:07 -05:00
RichardTang-Aden 2a98d3a489 Merge pull request #3890 from RichardTang-Aden/update-readme-gifs
docs(readme): quick fix for the doc links
2026-02-06 20:34:34 -08:00
Richard Tang b681ba03b1 chore: quick fix for the doc links 2026-02-06 20:32:20 -08:00
RichardTang-Aden fe775a36c0 Merge pull request #3887 from RichardTang-Aden/update-readme-gifs
Release / Create Release (push) Waiting to run
feat: add video in the README
2026-02-06 20:21:48 -08:00
Timothy @aden 2df9adcb43 Merge pull request #3886 from TimothyZhang7/fix/quickstart-secret-key
fix(micro-fix): quickstart secret key setup
2026-02-06 20:21:06 -08:00
Richard Tang c756cbf6d5 feat: add video in the README 2026-02-06 20:20:53 -08:00
Timothy d0ac67c9d3 fix: quickstart secret key setup 2026-02-06 20:18:12 -08:00
Timothy 47cd55052f feat: hive-create needs to do some hard negotiation 2026-02-06 19:56:05 -08:00
bryan fb203b5bdf update oauth to refresh token 2026-02-06 19:43:30 -08:00
RichardTang-Aden 6ee47e243d Merge pull request #3876 from RichardTang-Aden/update-readme
Docs Update readme
2026-02-06 19:39:16 -08:00
Richard Tang c1844b7a9d docs: improve readme 2026-02-06 19:30:16 -08:00
Richard Tang 99a29e79e5 fix: fix the documentation python run to uv run 2026-02-06 19:22:16 -08:00
Richard Tang 589a66ef26 docs: remove unused docs 2026-02-06 19:19:49 -08:00
RichardTang-Aden 3f960763cb Merge pull request #3875 from RichardTang-Aden/update-readme
Update readme images
2026-02-06 19:08:46 -08:00
Richard Tang 15f8f3783c chore: update images 2026-02-06 19:07:47 -08:00
Richard Tang a2b045c7e3 chore: remove unnecessary links 2026-02-06 18:18:50 -08:00
Richard Tang 055cef2fdc feat: improve quickstart.sh messages 2026-02-06 18:15:13 -08:00
Timothy @aden 6c6c69cbc3 Merge pull request #3872 from TimothyZhang7/refactor/consolidate-multi-level-log-for-tui
docs(path): Align Agent Storage Path to .hive/agents/{agent_name}/
2026-02-06 17:40:39 -08:00
Timothy 6fe0062e6e refactor(path): consolidate tui runner log path 2026-02-06 17:33:32 -08:00
Richard Tang 26b8b2f448 chore: move unused docs 2026-02-06 17:11:13 -08:00
Timothy @aden 7e40d6950a Merge pull request #3871 from TimothyZhang7/main
fix(micro-fix): uv paths in templates
2026-02-06 17:07:19 -08:00
Timothy 590bfa92cb chore: fix mcp server default config 2026-02-06 17:04:03 -08:00
Timothy f0e89a1720 fix: mcp server config with uv 2026-02-06 17:01:42 -08:00
Timothy @aden 575563b1e8 Merge pull request #3870 from adenhq/feat/multi-level-logging
fix: hardening hive cli setup
2026-02-06 16:37:37 -08:00
Timothy 82ea0e47ce fix: hardening hive cli setup 2026-02-06 16:31:31 -08:00
RichardTang-Aden 2f57ca10f7 Merge pull request #3862 from adenhq/feat/hive-tui
(micro-fix): documentation update
2026-02-06 16:19:46 -08:00
RichardTang-Aden 75c2d541c4 Merge branch 'main' into feat/hive-tui 2026-02-06 16:19:30 -08:00
Richard Tang b666f8b50b docs: minor doc update 2026-02-06 16:16:56 -08:00
RichardTang-Aden 09f9322676 Merge pull request #3863 from RichardTang-Aden/fix-remove-old-mock-mode
Fix remove old mock mode
2026-02-06 16:02:01 -08:00
Richard Tang f9a864ef93 fix: remove mock mode in the template 2026-02-06 15:59:48 -08:00
Richard Tang 27f28afe9c fix: remove --mock in the codebase + documentation 2026-02-06 15:59:22 -08:00
Timothy @aden 8f85722fef Merge pull request #3715 from adenhq/feat/multi-level-logging
Feat/multi level logging
2026-02-06 15:59:16 -08:00
bryan 5588445a01 documentation update 2026-02-06 15:59:01 -08:00
Timothy 40529b5722 fix: debugger to instruct on hive tui 2026-02-06 15:56:13 -08:00
Timothy @aden cee632f50c Merge pull request #3855 from adenhq/feat/hive-tui
update tui to support menu, highlight/copy, update quickstart
2026-02-06 15:24:10 -08:00
bryan 3453e3aa05 Merge branch 'feat/hive-tui' into feat/multi-level-logging 2026-02-06 15:21:52 -08:00
Timothy 8de637c421 fix: deprecated tests 2026-02-06 14:00:31 -08:00
Timothy 6c75de862c fix: skip outdated tests 2026-02-06 13:46:12 -08:00
Timothy 2971134882 docs: runtime logging structure 2026-02-06 13:26:53 -08:00
Timothy 6e79860b43 feat: hive debugger skill 2026-02-06 13:22:25 -08:00
Manudeserti 3f6bdda2a0 docs: add missing README for deep_research_agent 2026-02-06 18:11:00 -03:00
bryan 74d0287ec5 update tui to support menu, highlight/copy, update quickstart to include hive tui 2026-02-06 13:10:04 -08:00
RichardTang-Aden 51e81d80fc Merge pull request #3853 from adenhq/docs-key-concepts
Docs key concepts
2026-02-06 12:45:16 -08:00
Richard Tang cd014e41e4 docs: update links in the README.md 2026-02-06 12:44:34 -08:00
Richard Tang 830f11c47d docs: add key concept section 2026-02-06 12:41:22 -08:00
Timothy a73239dd98 feat: runtime log tools 2026-02-06 12:37:18 -08:00
Timothy d68783a612 refactor: unify storage layer for agent runtime 2026-02-06 12:20:46 -08:00
Timothy a28ea40a7d fix: execution log details in error trace 2026-02-06 11:03:19 -08:00
Sandeepa f2492bd4d4 docs(contributing): fix formatting typo in issue link 2026-02-07 00:22:48 +05:30
Timothy @aden b22be7a6cb Merge pull request #3818 from TimothyZhang7/main
(micro-fix)(skills): cursor skill symlinks to claude skill
2026-02-06 09:32:23 -08:00
bryan 5b00445c05 Merge branch 'main' into feat/multi-level-logging 2026-02-05 19:09:18 -08:00
Timothy @aden 5179677e8f Merge pull request #3744 from adenhq/chore/update-hive-credential
(micro-fix): update hive-credentials
2026-02-05 18:55:19 -08:00
bryan 2c25b2eae7 Merge branch 'main' into chore/update-hive-credential 2026-02-05 18:45:11 -08:00
RichardTang-Aden f6705fe2d3 Merge pull request #3746 from RichardTang-Aden/integration-ci
(micro-fix)(chore): fix format
2026-02-05 18:36:32 -08:00
Richard Tang c2771fed20 chore: fix format 2026-02-05 18:30:50 -08:00
RichardTang-Aden fc781eccd9 Merge pull request #3745 from RichardTang-Aden/integration-ci
(micro-fix)(chore): fix lint
2026-02-05 18:15:38 -08:00
bryan d5a25ae081 update hive-credentials 2026-02-05 18:13:25 -08:00
Richard Tang 23b6fb6391 chore: fix lint 2026-02-05 18:12:47 -08:00
Timothy 433967f0cf fix: cursor skill symlinks to claude skill 2026-02-05 18:11:24 -08:00
RichardTang-Aden 2a876c2a10 Merge pull request #3743 from RichardTang-Aden/integration-ci
feat(ci): add integration credential specs and CI validation
2026-02-05 18:06:22 -08:00
Richard Tang ff0adeaba7 docs: update outdated skill references 2026-02-05 18:00:06 -08:00
Richard Tang 846edbf256 docs: update documentation structure 2026-02-05 18:00:04 -08:00
Richard Tang c68dd48f6d feat: add slack credential spec and contribution doc 2026-02-05 17:39:44 -08:00
bryan 8b828dd139 Merge branch 'main' into feat/multi-level-logging 2026-02-05 17:19:17 -08:00
Richard Tang 50c0a5da9e feat: integration credentials implementation check 2026-02-05 17:06:34 -08:00
Timothy @aden 2f0e5c42f1 Merge pull request #3724 from TimothyZhang7/main
docs(hive): hive commands rebrand
2026-02-05 15:06:25 -08:00
Timothy @aden 903288468a Merge pull request #3725 from adenhq/chore/gmail-to-google
(micro-fix): changing gmail to google
2026-02-05 14:54:18 -08:00
bryan 9e3bba6f59 updated tests 2026-02-05 14:52:19 -08:00
bryan bc16f0752f changing gmail to google 2026-02-05 14:46:38 -08:00
Timothy 86badd70fa docs(hive): hive commands rebrand 2026-02-05 14:35:50 -08:00
Timothy @aden ce5379516c Merge pull request #3722 from TimothyZhang7/main
docs(templates): put example templates in there
2026-02-05 14:31:50 -08:00
Timothy a50078bbf2 chore: moves the templates 2026-02-05 14:25:49 -08:00
Timothy 2cef168442 fix: aden hive url 2026-02-05 14:08:18 -08:00
Timothy @aden 0a1a9e3545 Merge pull request #3720 from TimothyZhang7/feature/example-agent-registry
docs(skills): Rename skills to hive-* namespace and improve create workflow
2026-02-05 13:59:45 -08:00
Timothy 3c8682d80c fix: mention of skill in readme 2026-02-05 13:59:02 -08:00
Timothy ecc5a1608f fix: make sure of the skill ordering 2026-02-05 13:54:20 -08:00
RichardTang-Aden bc81b55600 Merge pull request #3713 from adenhq/update/gmail-send-tool
(micro-fix): created gmail send tool
2026-02-05 13:15:08 -08:00
Timothy 28b628c1b4 fix: update skill names and examples 2026-02-05 13:13:19 -08:00
Timothy 148264ac73 fix: skill problems 2026-02-05 13:11:18 -08:00
bryan 4046e4e379 created gmail send tool 2026-02-05 13:10:47 -08:00
Timothy 28298d9af2 fix: streamline the executor configuration and data tool usage 2026-02-05 12:50:00 -08:00
Fernando Mano 9d156325e0 Merge branch 'main' into feat/observability-trace-context 2026-02-05 17:06:07 -03:00
bryan 221712128d bug fix for crashing agent 2026-02-05 11:59:57 -08:00
bryan e9fc36f2d3 Merge branch 'main' into feat/multi-level-logging 2026-02-05 09:10:56 -08:00
bryan 305b880b1d including missing tool log inputs 2026-02-05 09:08:42 -08:00
Anshumaan Saraf 34782a6b85 docs(CONTRIBUTING): add upstream sync steps (#3477)
Fixes #2692

Added steps to configure the upstream remote and sync the main branch
before creating a feature branch. This helps contributors avoid starting
from stale code and reduces merge conflicts.
2026-02-05 16:28:07 +08:00
Patrick d25d94e71b docs(aden-credential-sync): typo (#3601) 2026-02-05 16:11:13 +08:00
Timothy @aden 51f1b449cd Merge pull request #3584 from TimothyZhang7/main
fix: gap between lint and format
2026-02-04 21:05:22 -08:00
Timothy 804e47dde4 fix: gap between lint and format 2026-02-04 21:02:50 -08:00
Timothy @aden 582c810d15 Merge pull request #3583 from TimothyZhang7/main
fix: test case
2026-02-04 20:59:58 -08:00
Timothy cede629718 fix: test case 2026-02-04 20:53:53 -08:00
Timothy @aden 10941dc7fc Merge pull request #3579 from TimothyZhang7/fix/do-not-mention-deprecated-nodes
Release / Create Release (push) Waiting to run
fix: mentions of deprecated nodes in agent builder
2026-02-04 20:46:10 -08:00
Timothy c1c16878e4 fix: mentions of deprecated nodes in agent builder 2026-02-04 20:42:16 -08:00
Timothy @aden 80a41b434b Merge pull request #3240 from levxn/main
Integration: Advanced Slack MCP tools Integration (~45+ tools), compatibility and working checked with all other existing tools
2026-02-04 20:34:24 -08:00
Timothy 9a8e117f1d chore: fix lint and tests 2026-02-04 20:30:47 -08:00
Bryan @ Aden 878603033a Merge pull request #3573 from TimothyZhang7/feature/quickstart-credential-store
feat: credential store init, add textual dep, standardize uv commands
2026-02-04 20:20:52 -08:00
Timothy 1c6f17e8db Merge remote-tracking branch 'origin/main' into feature/quickstart-credential-store 2026-02-04 20:03:44 -08:00
Timothy 8f32ef8064 chore: uv for all 2026-02-04 19:57:41 -08:00
bryan 7519c73f2a Merge branch 'main' into feat/multi-level-logging 2026-02-04 19:34:01 -08:00
Bryan @ Aden e12bc96e21 Merge pull request #3557 from TimothyZhang7/feature/tui-dashboard
feat: TUI dashboard, EventLoopNode refinements, auto-creation, data tools, and runner overhaul
2026-02-04 19:04:41 -08:00
bryan bf402aaa18 initial multi-level logging 2026-02-04 17:26:58 -08:00
RichardTang-Aden 2355d3d729 Merge pull request #3525 from Acid-OP/fix/on-failure-edge-routing
fix: follow ON_FAILURE edges when node fails after max retries
2026-02-04 17:00:35 -08:00
Richard Tang a093a59cb0 test: add tests for ON_FAILURE edge routing after max retries
Covers:
- ON_FAILURE edge followed when node fails after max retries
- Original termination behavior preserved when no ON_FAILURE edge exists
- ON_FAILURE edge not followed on success (only ON_SUCCESS fires)
- ON_FAILURE routing with max_retries=0 (no retries)
- Failure handler appears in execution path and node_visit_counts
2026-02-04 16:58:30 -08:00
Gaurav Kapur d7917988c3 fix: follow ON_FAILURE edges when node fails after max retries 2026-02-04 16:51:46 -08:00
Timothy @aden ae566a2027 Merge pull request #2652 from mubarakar95/feature/tui-dashboard
feat: Implement production-ready TUI with interactive agent execution, real-time monitoring, and screenshot support
2026-02-04 15:43:54 -08:00
Timothy b15473d3f3 fix: graph validation 2026-02-04 15:32:53 -08:00
RichardTang-Aden 265bf885ec Merge pull request #3556 from RichardTang-Aden/remove-unused-scripts
chore(scripts): remove deprecated setup scripts
2026-02-04 15:07:12 -08:00
Richard Tang e318281989 refactor: clean up the use of setup-python 2026-02-04 14:49:39 -08:00
Timothy 3e2a11d60d feat: integrate agent builder with tui 2026-02-04 14:47:28 -08:00
Timothy @aden 4b9f73310e Merge branch 'main' into feature/tui-dashboard 2026-02-04 10:53:43 -08:00
Levin b17c26116d Merge branch 'adenhq:main' into main 2026-02-04 23:46:35 +05:30
Timothy @aden 3114af75e4 Merge pull request #3536 from TimothyZhang7/feature/coding-agent-reskill
feat: update skills and agent builder tools, bump pinned ruff version
2026-02-04 10:11:58 -08:00
Levin 7a6d10639b Merge branch 'main' into main 2026-02-04 23:39:17 +05:30
Timothy 6ff29ea6aa fix: max retry go back to 0 for event loop node 2026-02-04 10:08:40 -08:00
Timothy a23f01973a feat: update skills and agent builder tools, bump pinned ruff version 2026-02-04 09:52:28 -08:00
bryan 0aaa3a3eca uv.lock update 2026-02-04 07:57:22 -08:00
bryan 82f05d1102 Merge branch 'main' into feature/tui-dashboard 2026-02-04 07:49:08 -08:00
Bryan @ Aden 8ff6d9c8bd Merge pull request #3423 from adenhq/event-loop-arch
Event Loop Architecture: Streaming Multi-Turn Agent Nodes
2026-02-04 07:43:56 -08:00
bryan a2e102fe15 windows lint fix 2026-02-03 20:00:11 -08:00
Timothy 119280da1a Merge remote-tracking branch 'upstream/main' into event-loop-arch
Resolve conflict in tools/mcp_server.py: take main's
CredentialStoreAdapter.default() which encapsulates the same
CompositeStorage logic our branch had inline.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 19:42:47 -08:00
RichardTang-Aden 4d49f74d5a Merge pull request #3372 from ranjithkumar9343/ranjithkumar9343-patch-1
docs: add Windows compatibility warning
2026-02-03 19:36:50 -08:00
Timothy 6a42b9c66b fix: resolve CI failures in lint and tests
- Fix max_node_visits blocking executor retries: the visit count was
  incremented on every loop iteration including retries, causing nodes
  with max_node_visits=1 (default) to be skipped on retry. Added
  _is_retry flag to distinguish retries from new visits via edge
  traversal.

- Fix 20 UP042 lint errors: replace (str, Enum) with StrEnum across
  14 files. Python 3.11+ StrEnum is preferred and enforced by ruff.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 19:36:35 -08:00
RichardTang-Aden fc4a39480a Merge pull request #3455 from adenhq/feat/non-oauth-setup
update credentials store to work with non-oauth keys
2026-02-03 19:31:46 -08:00
Timothy b98afb01c8 chore: lint 2026-02-03 18:01:39 -08:00
Timothy ccd6bb7656 chore: lint 2026-02-03 17:57:02 -08:00
bryan ea30e5c631 consolidate workspace to uv monorepo 2026-02-03 17:47:57 -08:00
Timothy d16a3c3b22 chore: give comand line to demo 2026-02-03 17:38:21 -08:00
Timothy a03bd78c2e fix: formulate tool call results 2026-02-03 17:25:44 -08:00
bryan 3cca41aab1 updating tests 2026-02-03 15:17:19 -08:00
bryan d19aaed946 fixing linter issues 2026-02-03 14:55:51 -08:00
RichardTang-Aden 9a7db8cf94 Merge pull request #3450 from adenhq/adenhq-patch-1
(micro-fix): Update issue templates
2026-02-03 14:53:09 -08:00
bryan f50630c551 update credentials store to work with non-oauth keys 2026-02-03 14:49:12 -08:00
Timothy 0ef2e64733 fix: use mcp tools properly 2026-02-03 14:36:09 -08:00
Aden HQ 3a8e121d43 Update issue templates 2026-02-03 14:24:13 -08:00
bryan 23e249144d Merge remote-tracking branch 'origin/main' into feature/tui-dashboard
# Conflicts:
#	.github/workflows/ci.yml
2026-02-03 12:26:51 -08:00
bryan 25014bfa89 update ci to use uv, updated linting 2026-02-03 12:14:13 -08:00
bryan 78ea585779 tui upgrades 2026-02-03 11:55:22 -08:00
bryan ac13c11f89 Merge remote-tracking branch 'origin/event-loop-arch' into feature/tui-dashboard 2026-02-03 11:06:44 -08:00
Timothy 51d341b88c fix: tool pruning logic 2026-02-03 10:29:35 -08:00
Timothy 7dd70b8e31 feat: tool truncation 2026-02-03 08:50:14 -08:00
ranjithkumar9343 84b332d989 docs: add Windows compatibility warning
Added a note for Windows users to use WSL/Git Bash to prevent setup errors.
2026-02-03 17:43:36 +05:30
oluwasegun.haziz.omd 7fae57f311 more fixes 2026-02-03 11:53:45 +00:00
Levin fd1826a267 Merge branch 'adenhq:main' into main 2026-02-03 16:11:05 +05:30
Anjali Yadav bcc6848275 fix(mcp): handle missing exports directory in test generation tools (#3066)
* fix(mcp): handle missing exports path in test generation tools

* fix(mcp): centralize agent path validation across test tools

* fix: remove duplicate if blocks and improve error hint message

---------

Co-authored-by: hundao <alchemy_wimp@hotmail.com>
2026-02-03 18:15:34 +08:00
Hundao 75dd053a40 fix(ci): migrate remaining CI jobs from pip to uv (#3366)
Closes #3363
2026-02-03 18:11:32 +08:00
Yogesh Sakharam Diwate 20f2aa09f2 docs(readme): update architecture diagram src path (#3348) 2026-02-03 14:58:29 +08:00
RichardTang-Aden fb8c810b3d Merge pull request #3293 from Antiarin/feat/llm-datetime-context
feat: inject runtime datetime into LLM system prompts
2026-02-02 20:07:00 -08:00
levxn b99b6c5cd3 latest upstream 2026-02-03 09:35:38 +05:30
haliaeetusvocifer 1f653969a9 added feature google docs integration 2026-02-03 03:51:39 +00:00
RichardTang-Aden ad21cf4243 Merge pull request #3327 from adenhq/fix-for-quickstart
fix: replace the use of PYTHON_CMD in quickstart.sh
2026-02-02 19:35:19 -08:00
Richard Tang 1e45cfff67 feat: Check for litellm import in both CORE_PYTHON and TOOLS_PYTHON environments. 2026-02-02 19:27:32 -08:00
Richard Tang 0280600a47 fix: fix error when test import installation 2026-02-02 19:24:13 -08:00
RichardTang-Aden 571ad518dc Merge pull request #3064 from kuldeepgaur02/fix/quickstart-provider-selection
fix: silent exit when selecting non-Anthropic LLM provider
2026-02-02 19:03:16 -08:00
RichardTang-Aden fe37a25cf1 Merge pull request #3322 from RichardTang-Aden/main
fix: Resolve quickstart.sh compatibility issues and migrate from pip to uv
2026-02-02 18:55:00 -08:00
Timothy e06138628c chore: remove local claude settings 2026-02-02 18:48:18 -08:00
Timothy 1ed0edd158 Merge remote-tracking branch 'upstream/main' into event-loop-arch
# Conflicts:
#	.claude/settings.local.json
2026-02-02 18:46:07 -08:00
Richard Tang 49dbc46082 feat: migrate from pip to uv 2026-02-02 18:45:09 -08:00
Richard Tang a16a4adc09 feat: add message when LLM key is not available 2026-02-02 18:41:20 -08:00
Richard Tang b4ab1cbd56 fix: fix quickstart competibility 2026-02-02 18:34:09 -08:00
Timothy 6faa63f0d0 fix: loop prevention in feedback edges 2026-02-02 18:26:45 -08:00
bryan f4737dcfe7 Merge remote-tracking branch 'origin/event-loop-arch' into feature/tui-dashboard 2026-02-02 18:25:46 -08:00
Richard Tang 2b44af427f fix: quickstart.sh competibility fix 2026-02-02 18:21:40 -08:00
RichardTang-Aden 11f7401bc2 Merge branch 'adenhq:main' into main 2026-02-02 18:00:40 -08:00
RichardTang-Aden db7b5180dd Merge pull request #3270 from adenhq/bot-detecting-issue-size
feat: Edit bot prompt to be able to decide on the technical size of issue
2026-02-02 17:49:52 -08:00
Timothy @aden 5b4e56252c Merge pull request #2820 from lakshitaa-chellaramani/feature/github-tool
Feature/GitHub tool
2026-02-02 17:49:34 -08:00
Timothy e3c71f77de chore: fix ruff format 2026-02-02 17:37:37 -08:00
Timothy b09824faec chore: fix lint 2026-02-02 17:36:02 -08:00
RichardTang-Aden c69bc24598 Merge pull request #3301 from adenhq/add-example-structure
docs: sample agent folder, remove docker file in Readme
2026-02-02 16:47:04 -08:00
Richard Tang 0cf17e1c63 feat: sample agent folder, remove docker file in Readme 2026-02-02 14:15:58 -08:00
Timothy @aden feac803491 Merge pull request #3256 from adenhq/feat/integration-tests
Feat/integration tests
2026-02-02 13:23:42 -08:00
Timothy 4aacec30d8 fix: text delta granularity, tool limit problem 2026-02-02 13:21:50 -08:00
RichardTang-Aden b459a2f7a9 Merge pull request #918 from Siddharth2624/fix-malformed-json-tool-args
Handle malformed JSON tool arguments in LiteLLMProvider
2026-02-02 13:04:13 -08:00
bryan ca7f6d3514 fixes to linting 2026-02-02 12:52:11 -08:00
Antiarin ca8ede65f0 feat: inject runtime datetime into LLM system prompts 2026-02-03 02:10:14 +05:30
bryan b033c56ae5 Merge remote-tracking branch 'origin/main' into feature/tui-dashboard 2026-02-02 12:29:27 -08:00
Richard Tang 9a177c46e1 feat: edit bot prompt to be able to decide on the technical size of issue 2026-02-02 11:39:44 -08:00
bryan d49e858d32 lint update 2026-02-02 11:12:09 -08:00
Timothy @aden 20bea9cd7f Merge pull request #2273 from krish341360/fix/concurrent-storage-race-condition
Release / Create Release (push) Waiting to run
fix: race condition in ConcurrentStorage and cache invalidation bug
2026-02-02 10:57:45 -08:00
bryan d7afa5dcf2 wp-12 2026-02-02 10:41:12 -08:00
Timothy 22e816bf86 chore: update gitignore 2026-02-02 10:30:03 -08:00
krish341360 a7709d489c style: apply ruff formatting to test file 2026-02-02 23:53:27 +05:30
Timothy @aden 3240616808 Merge pull request #3250 from adenhq/feat/validation-client-facing
(micro-fix): added graph validation for client-facing nodes [WP-10]
2026-02-02 10:02:38 -08:00
krish341360 18dfc997b8 fix: resolve lint errors in test file 2026-02-02 23:24:21 +05:30
Timothy @aden 92d0b6addf Merge pull request #3050 from Rockysahu704/rocky-first-contribution
docs: clearify who Hive is for and when to use it
2026-02-02 09:51:05 -08:00
Timothy @aden b9f83d4d61 Merge pull request #3244 from TimothyZhang7/feature/aden-sync-by-provider
Feature/aden sync by provider
2026-02-02 09:39:00 -08:00
levxn 694feaffd2 phase 3 tools implemented, totals now to 45+ tools for Slack for multipurpose integration including CRM support 2026-02-02 22:44:10 +05:30
Timothy @aden 9c16826ad3 Merge pull request #3137 from adenhq/feat/clientIO-gateway
implemented clientIO gateway [WP-9]
2026-02-02 07:29:03 -08:00
levxn eb68e2143b slack tools add ons and latest upstream commit 2026-02-02 19:07:05 +05:30
Harsh Kishorani f305745295 feat(setup): add native PowerShell setup script for Windows (#746)
* feat(setup): add PowerShell setup script with venv for Windows

* docs: restore PEP 668 troubleshooting section

* docs: restore Alpine Linux setup section

---------

Co-authored-by: hundao <alchemy_wimp@hotmail.com>
2026-02-02 17:02:19 +08:00
Timothy df4d0ad3fd feat: aden provider credential store by provider 2026-02-01 20:34:21 -08:00
bryan 9034d1dc71 lint fix 2026-02-01 20:26:36 -08:00
bryan 537172d8ce implemented clientIO gateway [WP-9] 2026-02-01 20:23:26 -08:00
Timothy 20b2e4b3dd fix: robust compaction logic 2026-02-01 19:59:27 -08:00
RichardTang-Aden fc22586752 Merge pull request #3128 from adenhq/fix/tests
(micro-fix): fixed pytests and warnings
2026-02-01 19:53:07 -08:00
Richard Tang 646440eba3 chore: update developer doc 2026-02-01 19:49:35 -08:00
Richard Tang 53e5579326 fix: remove requirements.txt 2026-02-01 19:45:32 -08:00
Richard Tang 29a1630d0f feat: add tool tests in CI 2026-02-01 19:38:33 -08:00
bryan 171f4ab2ae fixed pytests and warnings 2026-02-01 19:11:44 -08:00
Timothy @aden a86043a2ec Merge pull request #3127 from TimothyZhang7/feature/event-loop-wp8
Feature/event loop wp8
2026-02-01 19:07:33 -08:00
Timothy 3947da2cf1 Merge upstream/event-loop-arch into feature/event-loop-wp8
Brings in upstream changes: email tool, csv/pdf fixes, docs updates,
agent builder export atomicity fix, JSON extraction validation bugfix.
No conflicts.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 19:06:30 -08:00
Timothy 17caab6563 feature: remove hard failure on schema mismatch for context hand off 2026-02-01 18:55:41 -08:00
Timothy @aden a5ae071a03 Merge pull request #723 from trinh31201/bugfix/json-extraction-validation
micro-fix(graph): validate LLM JSON extraction to prevent empty/fabricated data
2026-02-01 18:51:51 -08:00
bryan 9c33da7b8d added graph validation for client-facing nodes [WP-10] 2026-02-01 18:45:35 -08:00
Timothy 94d31743b0 fix: sync with wp7 2026-02-01 18:14:04 -08:00
Timothy 70db618c6e feat: event loop node implementation 2026-02-01 17:16:18 -08:00
levxn 960a4549ef latest upstream v1 2026-02-01 19:03:39 +05:30
Kuldeep Raj Gour 363a650dfa fix: silent exit when selecting non-Anthropic LLM provider
prompt_choice used return codes to pass selections. Combined with set -e,
non-zero returns (options 2-6) caused immediate script exit.

Fix: Use global variable PROMPT_CHOICE instead of return codes.
2026-02-01 15:51:46 +05:45
Rocky Sahu b6e2634537 docs: clearify who Hive is for and when to use it 2026-02-01 14:38:02 +05:30
Anshumaan Saraf 23146c8dae docs: remove duplicate entry in Edge Protocol docstring (#2994)
Fixes #2717

The Edge Types list in edge.py had 'always' listed twice.
Removed the duplicate line.
2026-02-01 15:19:11 +08:00
Siddharth Varshney 9f424f2fc0 Remove unused Fake* classes and unrelated note block
- Remove unused FakeFunction, FakeToolCall, FakeMessage, FakeChoice, FakeResponse classes from test_litellm_provider.py
- Remove unrelated note block from building-production-ai-agents.md
- Fix lint issues (trailing whitespace)
2026-01-31 20:56:52 +00:00
levxn 25989d9f90 slack add ons v3, now ~30 tools 2026-02-01 01:29:32 +05:30
Lakshitaa Chellaramani 0715fc5498 Merge branch 'main' into feature/github-tool 2026-01-31 23:31:22 +05:30
lakshitaa f9fddd6663 fix(github-tool): Address PR feedback - security and integration fixes
Addresses all blockers and suggestions from code review:

**Blockers fixed:**
1. Register tools in tools/__init__.py - Added import, registration call,
   and all 13 tool names to return list
2. Add credential spec - Created GitHub entry in credentials/integrations.py
   with env_var, tools list, help URL, and health check config
3. Move tests to correct location - Relocated from
   tools/src/.../github_tool/tests/ to tools/tests/tools/test_github_tool.py
4. Removed .claude/settings.local.json from PR

**Security improvements:**
1. URL parameter sanitization - Added _sanitize_path_param() to reject
   path traversal attempts (/ or ..) in owner, repo, branch, username params
2. Error message sanitization - Added _sanitize_error_message() to prevent
   token leaks from httpx.RequestError exceptions

All 38 tests passing.
2026-01-31 23:26:33 +05:30
levxn 684da96a83 slack add ons v2, now 15 tools in total 2026-01-31 20:39:38 +05:30
levxn abae7979cb excluded a file not needed 2026-01-31 18:37:35 +05:30
levxn 49bce57fcf slack bot integration v1 2026-01-31 18:35:27 +05:30
Muzzaiyyan Hussain 58b60b84fd fix: make agent builder exports atomic (#2605)
* fix: make agent builder exports atomic
2026-01-31 17:59:31 +08:00
Hundao 86aef3319f fix(ci): apply ruff format to csv_tool.py (#2910) 2026-01-31 17:50:15 +08:00
Richard Tang 63d017fc21 fix: bash version support 2026-01-30 20:32:56 -08:00
RichardTang-Aden 0015b3d43d Merge pull request #1873 from DhruvPokhriyal/bugfix/csv_read-negative-offset
fix: validate non-negative limit and offset in csv_read function
2026-01-30 20:12:39 -08:00
RichardTang-Aden 9c4d44c057 Merge pull request #2205 from mishrapravin114/fix/1390-pdf-read-max-pages
pdf_read: surface truncation when exceeding max_pages
2026-01-30 20:12:01 -08:00
RichardTang-Aden 800c7fbe11 Merge pull request #2316 from NicklausFW/1843-csv-write-fail-without-parent-dir
fix(csv): handle csv_write with no parent directory
2026-01-30 20:11:47 -08:00
Timothy @aden 291ba24229 Merge pull request #2832 from adenhq/feat/node-conversation-class-WP-6-
nodeConversation Class
2026-01-30 19:01:30 -08:00
Timothy c52ce6bb49 Merge branch 'feature/event-loop-framework' into test/wp1-wp2-wp6-combined 2026-01-30 16:34:12 -08:00
RichardTang-Aden ffa4096390 Merge pull request #2601 from Hundao/feat/email-tool
[Integration] feat(tools): add email service tool with Resend provider
2026-01-30 16:32:16 -08:00
Timothy bcddd4ce77 Merge branch 'feature/credential-manager-aden-provider' into test/wp1-wp2-wp6-combined 2026-01-30 16:30:54 -08:00
Timothy 017872f71b feat: emit bus events 2026-01-30 16:27:39 -08:00
bryan f2b6fc6948 linter updates 2026-01-30 16:18:48 -08:00
bryan acff8a0ece nodeConversation Class 2026-01-30 16:16:34 -08:00
Richard Tang 347c222f78 fix: quickstart compatibility 2026-01-30 16:07:05 -08:00
lakshitaa bfb660275e feat(tools): Add GitHub tool for repository and issue management
Implements comprehensive GitHub REST API v3 integration with 15 MCP tools
for managing repositories, issues, pull requests, code search, and branches.

Features:
- Repository management (list, get, search repos)
- Issue operations (create, update, close, list issues)
- Pull request management (create, list, get PRs)
- Code search across GitHub
- Branch operations (list, get branch info)

Technical details:
- 15 MCP tools organized in 5 categories
- 38 comprehensive tests with mocking (all passing)
- Full credential store support (env var + CredentialStoreAdapter)
- Proper error handling (timeout, network, API errors)
- Follows HubSpot/Slack tool patterns exactly

Files:
- tools/src/aden_tools/tools/github_tool/github_tool.py (757 lines)
- tools/src/aden_tools/tools/github_tool/tests/test_github_tool.py (628 lines)
- tools/src/aden_tools/tools/github_tool/README.md (646 lines)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-31 04:32:46 +05:30
Timothy f58619e378 Merge branch 'main' into feat/email-tool 2026-01-30 14:00:54 -08:00
RichardTang-Aden 472cfe1437 Merge pull request #2815 from RichardTang-Aden/main
Docs: improving Q&A and hive features
2026-01-30 13:57:46 -08:00
Richard Tang 8b7efe27c1 docs: updated hive descriptions 2026-01-30 13:57:16 -08:00
Timothy eb00c10d9b Merge remote-tracking branch 'origin/main' into feat/email-tool 2026-01-30 13:56:15 -08:00
Richard Tang 71249f4f88 docs: updated q&a and why aden 2026-01-30 13:54:37 -08:00
Timothy 0beeda3eec fix: email tool to credential store 2026-01-30 13:54:01 -08:00
lakshitaa d6ae48bc58 Merge upstream/main 2026-01-31 03:19:12 +05:30
Timothy @aden dc4a40468b Merge pull request #2808 from TimothyZhang7/feature/credential-manager-aden-provider
Feature/credential manager aden provider
2026-01-30 13:32:45 -08:00
Timothy 7fa2295d30 fix: ruff format issue 2026-01-30 13:27:29 -08:00
Timothy 756f013ecd fix: mcp test case 2026-01-30 13:24:23 -08:00
Richard Tang a963d49306 docs: remove duplicated run agent command 2026-01-30 13:23:02 -08:00
Timothy 4b00852bdf Merge remote-tracking branch 'origin/main' into feature/credential-manager-aden-provider 2026-01-30 13:18:11 -08:00
RichardTang-Aden b9b1731dc1 Merge pull request #2807 from RichardTang-Aden/main
Docs: Update instruction for tools/integration contribution
2026-01-30 13:06:13 -08:00
Richard Tang 34791e6bbd docs: update issue links 2026-01-30 13:04:54 -08:00
Richard Tang d1ebdfc92f docs: tools contribution guide 2026-01-30 12:59:56 -08:00
austin931114 33040b7978 Merge pull request #1316 from Shivraj12/fix/tool-registry-invalid-json
fix(tool_registry): handle invalid JSON returned by tools
2026-01-30 21:43:32 +01:00
austin931114 3b6b6c48a5 Merge pull request #919 from Siddharth2624/chore/validation-error-message
docs: clarify illustrative output sanitization example
2026-01-30 21:32:39 +01:00
Timothy c3fddd3c8c fix: deprecate credential manager 2026-01-30 12:28:27 -08:00
Richard Tang 41e5558715 docs: update readme 2026-01-30 12:24:16 -08:00
austin931114 58969085bf Merge pull request #1816 from NicklausFW/1277-execution-quality-tracking
fix(executor): add execution quality tracking to expose retry metrics
2026-01-30 21:15:23 +01:00
austin931114 f45ad2d543 Merge pull request #1656 from hrshmakwana/fix/setup-creates-exports
fix(micro-fix): setup script now creates missing exports directory (#1645)
2026-01-30 21:01:08 +01:00
Timothy 7e670ce0a8 feat: event loop WP1-4 2026-01-30 11:43:19 -08:00
Fernando Mano 4310852ee6 chore: Merge branch 'main' into feat/observability-trace-context 2026-01-30 15:09:54 -03:00
mubarakar95 d32308b6d2 Add TUI enhancements: screenshot feature, header polish, and keybinding updates
- Implement SVG screenshot functionality (Ctrl+S)
- Remove header icon and disable expansion
- Hide borders in screenshots for clean output
- Change command palette to Ctrl+O
- Make screenshot shortcut work globally (priority binding)
2026-01-30 16:56:34 +05:30
hundao 0030d6b499 feat(tools): add cc/bcc support to email tool
Add optional cc and bcc parameters to send_email and
send_budget_alert_email. Empty strings and whitespace-only values are
filtered out via _normalize_recipients to prevent invalid payloads.
2026-01-30 18:35:42 +08:00
hundao 5f019f44ca feat(tools): add email service tool with Resend provider
Integrate a mail service to enable email notifications for budget alerts.
Closes #7.

New tools:
- send_email: general-purpose email sending with multi-provider support
- send_budget_alert_email: formatted budget alert notifications with
  severity levels (INFO/WARNING/CRITICAL/EXCEEDED)

Architecture:
- Multi-provider pattern (matching web_search_tool), Resend as primary
- from_email resolved via explicit param or EMAIL_FROM env var
- Credential integration via CredentialManager with env var fallback

Also fixes: web_scrape_tool test mock missing content-type header
2026-01-30 18:35:42 +08:00
Hundao 0d602f92a3 fix(ci): add missing content-type header in scrape test mock and format mcp_client (#2612) 2026-01-30 18:33:37 +08:00
mubarakar95 604d16e353 Enhance TUI: Fix rendering, polish layout, and clean up header 2026-01-30 15:40:15 +05:30
mubarakar95 db577785d6 feat: Implement fully functional TUI dashboard
- Fix ScreenStackError crash by moving runtime init inside async context
- Implement selectable logging with TextArea widget
- Create interactive ChatREPL for agent execution
- Optimize 3-pane layout: logs/graph on left (60%), chat on right (40%)
- Add thread-safe event handling with call_from_thread
- Add TUI selection guide documentation
All features tested and working.
2026-01-30 11:49:04 +05:30
RichardTang-Aden b10d617166 Merge pull request #638 from Sourabsb/fix/run-async-is-running-check
fix: add is_running() and is_closed() checks to _run_async() to prevent deadlock
2026-01-29 21:17:58 -08:00
RichardTang-Aden 348c646bab Merge pull request #534 from Sourabsb/fix/mcp-client-resource-leak
fix: properly close MCP session and STDIO context managers in disconnect()
2026-01-29 21:06:30 -08:00
RichardTang-Aden a8243e6746 Merge pull request #274 from dithzz/fix/cmd-list-keyerror-steps
fix(cli): fix KeyError 'steps' in cmd_list function
2026-01-29 20:45:20 -08:00
RichardTang-Aden 9368828f94 Merge pull request #189 from RussellLuo/fix-mcp-servers-parsing
fix(skills): load MCP servers correctly
2026-01-29 18:27:24 -08:00
RichardTang-Aden 51e9a3ecdf Merge branch 'main' into fix-mcp-servers-parsing 2026-01-29 18:26:49 -08:00
Timothy 2f03605980 fix: change to production api endpoint 2026-01-29 17:53:47 -08:00
RichardTang-Aden 74e754b4e1 Merge pull request #2496 from RichardTang-Aden/main
Docs: Update Roadmap and Mermaid chart
2026-01-29 17:49:35 -08:00
RichardTang-Aden f332e40000 Merge pull request #2486 from adenhq/chore-docs-update
Docs: updating documentation
2026-01-29 17:49:22 -08:00
Richard Tang d6064147e4 chore: update mermaid chart type 2026-01-29 17:44:24 -08:00
bryan 1fb5005bf5 removing .env.example from tools 2026-01-29 17:43:24 -08:00
bryan 57fbb0479b remove env example 2026-01-29 17:37:32 -08:00
RichardTang-Aden 26154cc648 Merge pull request #2212 from MuzzaiyyanHussain/docs/i18n-hindi
docs(i18n): add Hindi (हिंदी) README translation
2026-01-29 17:23:34 -08:00
Richard Tang e207cee4ff feat: update mermaid chart 2026-01-29 17:07:54 -08:00
Richard Tang e7a2d957f5 chore: update roadmap to reflect recent direction calibration 2026-01-29 16:48:31 -08:00
bryan 7e5f02eebe updating documentation 2026-01-29 16:12:16 -08:00
Timothy 248716c093 feat: credential store auto sync 2026-01-29 14:37:20 -08:00
Muzzaiyyan Hussain 37a3fce27d Translated the last line to hindi 2026-01-30 01:41:13 +05:30
mubarakar95 c9ae3a0541 feat: finalize TUI with minimal stable mode
The TUI feature is fully functional with a minimal, stable interface:
- Header with title
- Central display area
- Footer with keybindings

This provides a foundation for future enhancements. Custom widgets
(LogPane, GraphOverview, ChatRepl) are available in the codebase
and can be integrated once Textual rendering issues on Windows are
resolved.

The --tui flag successfully launches the TUI dashboard for any agent:
  hive run <agent_path> --tui

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-01-30 01:31:59 +05:30
mubarakar95 ed95dab9f3 fix: implement lazy widget loading and store references
Defer widget creation from __init__ to compose() and store references
as instance variables to prevent garbage collection and initialization
order issues. This resolves ScreenStackError during TUI startup.

Changes:
- Move LogPane, GraphOverview, ChatRepl creation to compose()
- Store widgets as instance variables (self.log_pane, etc)
- Restore Horizontal/Vertical container layout

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-01-30 01:27:56 +05:30
mubarakar95 a6536cef94 fix: restore Horizontal/Vertical layout containers in TUI compose
The compose() method was missing the proper container structure
for the layout, which caused initialization to fail. Restored the
Horizontal/Vertical container layout with proper pane structure.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-01-30 01:22:41 +05:30
mubarakar95 3ccc81e81c feat: add interactive TUI dashboard for agent execution
Implement a Textual-based terminal UI for the hive framework that displays:
- Agent execution status and progress
- Log output in real-time
- Graph visualization of agent execution flow
- Interactive REPL for user input/feedback

Changes:
- Add core/framework/tui/ module with AdenTUI app and custom widgets
- Add LogPane widget for streaming log output
- Add GraphOverview widget for execution graph visualization
- Add ChatRepl widget for interactive user input
- Add TUILogHandler for capturing framework logs to TUI
- Update cli.py to support --tui flag for launching dashboard
- Update runner.py to enable AgentRuntime when TUI is active
- Fix missing Textual container imports (Horizontal, Vertical, Container)
- Fix race conditions in async TUI initialization
- Fix threading issues in app event handling

The TUI is launched via: hive run <agent_path> --tui

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-01-30 01:19:28 +05:30
Muzzaiyyan Hussain 7976c1dac7 linked the translated hindi version hi.md in to the main readme 2026-01-30 00:30:26 +05:30
Aden HQ da2bac1b48 Merge pull request #2414 from RichardTang-Aden/main
fix: litellm missing from tools dependencies; quickstart.sh only vali…
2026-01-29 10:30:23 -08:00
Richard Tang 4096eba564 fix: litellm missing from tools dependencies; quickstart.sh only validates tools venv 2026-01-29 10:20:19 -08:00
RichardTang-Aden 3f3a23e4b2 Merge pull request #2314 from Sourabsb/fix/remove-debug-print-statements-v2
micro-fix: remove debug print statements that leak API key
2026-01-29 07:56:45 -08:00
RichardTang-Aden 934e3145b8 Merge pull request #2348 from JVSCHANDRADITHYA/main
(Micro-Fix) CLI crash when exports/ directory is missing
2026-01-29 07:56:18 -08:00
RichardTang-Aden 6155ccbf4d Merge pull request #2037 from shivamhwp/conductorchicago
Feat(quick-start): Address PR #716 review feedback: MCP config, Python version, venv docs
2026-01-29 07:45:36 -08:00
Chandradithya Janaswami 6cadc81be8 Merge branch 'adenhq:main' into main 2026-01-29 21:12:00 +05:30
RichardTang-Aden 412521edb0 Merge branch 'main' into conductorchicago 2026-01-29 07:40:53 -08:00
Nicklaus Wibowo ec3be40ddd fix(csv): handle csv_write with no parent directory
Guard against empty parent_dir when path has no directory component (e.g., 'data.csv'). Prevents FileNotFoundError from os.makedirs(''). Adds test coverage for root-level file writes.
2026-01-29 20:44:27 +07:00
Sourabsb fd00471189 fix: remove debug print statements that leak API key to stdout 2026-01-29 18:57:33 +05:30
krish341360 94197cbcb9 fix: race condition in ConcurrentStorage and cache invalidation bug
- Fix race condition: cache now updates only after successful write
- Fix cache invalidation: summary cache invalidated on save_run()
- Add 4 tests to verify the fixes
2026-01-29 16:35:38 +05:30
Muzzaiyyan Hussain 65c3fcf76d docs(i18n): add Hindi (हिंदी) README translation 2026-01-29 14:23:40 +05:30
mishrapravin114 83f77af2ab pdf_read: surface truncation when exceeding max_pages 2026-01-29 13:38:16 +05:30
suhanijindal 2fe83187d6 fix: add Python 3.13 classifier to tools/pyproject.toml (#1780)
Co-authored-by: United IT Services <uniteditservices@Uniteds-MacBook-Air.local>
2026-01-29 15:57:19 +08:00
Ayush Pandey e65052c237 fix(core): explicitly set utf-8 encoding for storage and testing backends (#641) 2026-01-29 14:07:37 +08:00
Mrunal Nirajkumar Shah 38bc7c12ae fix(setup): Fixes python and pip version detection and mismatch (#1190)
* Fixed Python and pip version mismatch with robust code #476

- Ensured python version is found across all available python interpreters including python3, python, and py -3, and made it robust for easy interpreter add-on.
- Ensured that pip is found for the respective python interpreter.
- Generalized some variables like PYTHON_VERSION for flexiblity.
- Added a split to PYTHON_VERSION into Major and Minor version to create a robust code.
- Added clear documentation throughout the code .

Related to issue #476

* fix(setup): Code fixes raised during review by @Hundao

- PYTHON_CMD initialized to no value (blank). Fixes the bug
- PYTHON_VERSION used to generalize is changed to REQUIRED_PYTHON_VERSION due to name collision
- quotes added to "${POSSIBLE_PYTHONS[@]}" so py -3 can work.

Pending:
eval related issues pending.

* fix(setup): Code fixes raised during review by @Hundao

- eval removed altogether.
- py -3 is replaced with py in POSSIBLE_PYTHONS, and will be replaced to py -3 after the interpreter selection.

* fix(setup): Code fixes raised during review by @bryanadenhq

- Implemented Array and refactored entire code. PYTHON_CMD is changed at all places in the entire code.
- Redundant code is removed, design changed a bit for user understanding. (See Screenshots)
- Using 2>&1 as standard. Fix the mis-match in standard code writing.
2026-01-29 14:06:54 +08:00
Sourabsb 758c5157b8 Merge upstream/main and resolve conflicts 2026-01-29 10:17:44 +05:30
Sourabsb ce6b47c0d4 fix: resolve all lint issues in mcp_client.py 2026-01-29 10:10:11 +05:30
Shivam Sharma 22c95b62ce quickstart: auto-install uv and pick Python >=3.11 2026-01-29 08:49:02 +05:30
Shivam Sharma 9684311176 Merge upstream/main 2026-01-29 08:48:43 +05:30
Timothy aa0fff8ac5 fix: use credential store by default 2026-01-28 18:51:20 -08:00
RichardTang-Aden a1229d8e98 Merge pull request #1945 from Anshu-bhatt/feature/add-python-version-file
micro-fix: add .python-version for automatic Python version detection
2026-01-28 18:08:51 -08:00
RichardTang-Aden ad1b10db63 Merge pull request #1602 from magiawala/fix/workflowbuilder-import-error
[micro-fix] Correct WorkflowBuilder import to GraphBuilder in MCP example
2026-01-28 18:04:09 -08:00
RichardTang-Aden 96308637d6 Merge pull request #1745 from Aman030304/fix/runner-logging
Refactor: Replace print with logging in AgentRunner
2026-01-28 18:00:28 -08:00
Timothy e8a4cc908c Merge branch 'feature/hubspot-integration' into feature/credential-manager-aden-provider 2026-01-28 17:51:37 -08:00
Timothy 3c8ac436bd fix: onboarding experience 2026-01-28 17:49:13 -08:00
Shivam Sharma 4d341611a4 Merge upstream/main and resolve setup-python.sh conflict
Resolved conflict in scripts/setup-python.sh by keeping upstream's
improved formatting with color codes and ${PYTHON_CMD} variable.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 06:35:48 +05:30
Bryan @ Aden ef94bfe1fb Merge pull request #2042 from adenhq/fix/ruff-tests
(micro-fix): fix lint
2026-01-28 16:44:11 -08:00
bryan a58b52f420 fix lint 2026-01-28 16:39:40 -08:00
Bryan @ Aden 7852990073 Merge pull request #519 from vakrahul/perf/heuristic-json-repair
style: fix line length violation in output_cleaner.py
2026-01-28 16:30:01 -08:00
Bryan @ Aden 14c9478080 Merge pull request #2040 from adenhq/fix/ruff-tests
(micro-fix): ruff fix
2026-01-28 16:21:58 -08:00
bryan c5ebd91651 ruff fix 2026-01-28 16:19:49 -08:00
Bryan @ Aden 088f3cc817 Merge pull request #1444 from tjsasakifln/feat/1334-root-cli-entry-point
feat(cli): add root hive CLI entry point to eliminate PYTHONPATH
2026-01-28 16:18:14 -08:00
Bryan @ Aden 50087bb24c Merge pull request #1366 from tjsasakifln/fix/1332-rewrite-configuration-docs
docs(configuration): rewrite configuration.md to reflect actual Python framework architecture
2026-01-28 16:17:14 -08:00
Timothy @aden ca06465305 Merge branch 'main' into feature/hubspot-integration 2026-01-28 16:06:33 -08:00
Timothy ea719d5441 Merge branch 'main' into feature/credential-manager-aden-provider 2026-01-28 16:03:25 -08:00
Timothy 2627b6e69c fix: aden client 2026-01-28 16:03:10 -08:00
RichardTang-Aden c869e1955a Merge pull request #1934 from mishrapravin114/fix/auto-close-circular-duplicate
Fix/auto close circular duplicate
2026-01-28 15:44:45 -08:00
RichardTang-Aden 8293f75152 Merge pull request #295 from Invens/fix/setup-python-detect-311
Fix/setup python detect 311
2026-01-28 15:35:08 -08:00
Shivam Sharma 3ccf4bc383 Address PR review feedback
- Restore MCP server configurations in .mcp.json with updated paths
  for separate virtual environments (core/.venv and tools/.venv)
- Align Python version: change .python-version from 3.13 to 3.11
  to match pyproject.toml target version
- Remove AGENTS.md as suggested (quickstart is sufficient)
- Document cross-package imports and separate venv architecture
  in ENVIRONMENT_SETUP.md

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 04:26:12 +05:30
Shivam Sharma e71d850b79 Merge remote-tracking branch 'origin/main' into conductorchicago
Resolved conflict in tools/pyproject.toml by keeping the expanded format
with sql dependency from main.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 04:19:16 +05:30
Timothy @aden 774911b46c Merge pull request #2012 from TimothyZhang7/feature/credential-manager-aden-provider
chore: aden provider factory method
2026-01-28 14:22:45 -08:00
Timothy 480ade22ce chore: aden provider factory method 2026-01-28 14:04:08 -08:00
RichardTang-Aden bd31323876 Merge pull request #1999 from Jai-Harrish/docs/remove-stray-orchestration-text
docs: remove stray 'orchestration' text from project structure
2026-01-28 13:54:33 -08:00
RichardTang-Aden 2f3b8b27b8 Merge pull request #1973 from ryanbijoy/fix/docs-link
Docs: Fixed the .claude/ url to open the right file
2026-01-28 13:54:08 -08:00
RichardTang-Aden d39abf4312 Merge pull request #1909 from ayushigithub12/fix/csv-read-total-row1850
Fix/csv read total row1850
2026-01-28 13:52:07 -08:00
RichardTang-Aden ec7058414f Merge pull request #1957 from ryanbijoy/fix/readme-links
Docs: Fixed the to the correct URL - docs/architecture/README.md
2026-01-28 13:50:28 -08:00
RichardTang-Aden 8dc63771ca Merge pull request #2003 from adenhq/chore-add-micro-fix-requirements
chore: add-micro-fix-requirements
2026-01-28 13:39:02 -08:00
Richard Tang 434f1d7298 chore: add-micro-fix-requirements 2026-01-28 13:33:29 -08:00
ryanbijoy ee0ae20d06 Merge branch 'main' into fix/readme-links 2026-01-29 02:49:22 +05:30
RichardTang-Aden a7e16c84a5 Merge pull request #1839 from SH-Nihil-Mukkesh-25/micro-fix/roadmap-header
micro-fix(docs): add markdown header to ROADMAP
2026-01-28 13:16:39 -08:00
Jai Harrish A eaa54d9d4a docs: remove stray 'orchestration' text from project structure 2026-01-28 21:14:15 +00:00
RichardTang-Aden 2c4d034536 Merge pull request #1896 from aarav-shukla07/chore/remove-honeycomb-references
chore: remove references to archived honeycomb frontend
2026-01-28 12:43:13 -08:00
RichardTang-Aden a43b7c9403 Merge pull request #1981 from adenhq/docs-i18n-readme
chore: re-organize readmes
2026-01-28 12:32:07 -08:00
Richard Tang 752979da01 chore: re-organize readmes 2026-01-28 12:29:27 -08:00
Timothy @aden c4be938b7f Merge pull request #1960 from TimothyZhang7/feature/credential-manager-aden-provider
feature: aden sync provider for credential store
2026-01-28 12:27:51 -08:00
Timothy 3a308ba67e fix: load aden provider api key from default env var 2026-01-28 12:23:35 -08:00
ryanbijoy cadf401f23 micro-fix: Fixed the .claude/ url to open the right file 2026-01-29 01:40:50 +05:30
ryanbijoy 24dd41410a micro-fix: Fixed the to the correct URL - docs/architecture/README.md 2026-01-29 01:25:37 +05:30
Timothy 2abf43ed21 feature: aden sync provider for credential store 2026-01-28 11:54:14 -08:00
Fernando Mano 853f1e9873 chore: Merge remote-tracking branch 'refs/remotes/origin/feat/observability-trace-context' into feat/observability-trace-context 2026-01-28 16:52:38 -03:00
Anshu-bhatt 2e5ed77909 micro-fix: touch .python-version to reopen PR 2026-01-29 00:47:20 +05:30
Anshu-bhatt 0ae0bfda83 chore(dx): add .python-version for automatic Python version detection 2026-01-29 00:36:09 +05:30
mishrapravin114 22007e7aa9 chore: remove cross-verify doc from PR 2026-01-29 00:26:39 +05:30
mishrapravin114 05dde7414f fix(workflow): prevent circular duplicate closure in auto-close script
- Skip closing an issue as duplicate of another that is already closed
  (avoids circular closure when bot and human close in opposite order).
- Skip when duplicate target is self (same issue number).
- Extract testable helpers: isDupeComment, isDupeCommentOldEnough,
  authorDisagreedWithDupe, getLastDupeComment, decideAutoClose.
- Add 23 unit tests (Bun) and run them in CI before auto-close step.
- Add scripts/AUTO_CLOSE_DUPLICATES_CROSS_VERIFY.md for impact summary.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-01-29 00:22:23 +05:30
ayushi sharma 721cfb1ac8 Fixed the csv_read total_rows is incorrect for CSV files 2026-01-29 00:03:39 +05:30
ayushi sharma 5973168a8c Fixed the csv_read total_rows is incorrect for CSV files 2026-01-28 23:58:26 +05:30
Tiago Sasaki 56ed24a092 feat(cli): add root hive CLI entry point to eliminate PYTHONPATH requirement
Fixes #1334
2026-01-28 15:22:25 -03:00
Tiago Sasaki ca031f3ee1 docs(configuration): rewrite to reflect actual Python framework architecture
Fixes #1332
2026-01-28 15:17:59 -03:00
trinh31201 3ee6d98905 fix(graph): validate LLM JSON extraction to prevent empty/fabricated data 2026-01-29 01:04:08 +07:00
Fernando Mano ae5fe84fb2 feat(observability): Structured logging with automatic trace context propagation -- fix ruff formatting errors 2026-01-28 15:04:06 -03:00
Fernando Mano 92b538d5ae Merge branch 'adenhq:main' into feat/observability-trace-context 2026-01-28 14:52:37 -03:00
Fernando Mano 5351703949 feat(observability): Structured logging with automatic trace context propagation -- fix lint error 2026-01-28 14:52:02 -03:00
Bryan @ Aden c9f3de1af6 Merge pull request #1752 from ebrahimzaher/docs/add-alpine-support
docs: add setup instructions for Alpine Linux users
2026-01-28 09:29:12 -08:00
Bryan @ Aden d8d4b9399e Merge pull request #1871 from adenhq/fix/ruff-tests
(micro-fix): fixing linter
2026-01-28 09:28:44 -08:00
bryan 30bf1da424 fixing linter 2026-01-28 09:26:27 -08:00
Bryan @ Aden 6712fa9a8a Merge pull request #1855 from lachhmansingh16/fix/hive-list-crash
fix(cli): micro-fix to prevent hive list crash
2026-01-28 09:21:15 -08:00
Bryan @ Aden 2306b13fdc Merge pull request #1817 from Vikasverma9515/fix/tools-dependency-tests
fix(tools): handle optional duckdb dependency and update credential tests
2026-01-28 09:21:02 -08:00
Timothy @aden 14907a7c6e Merge pull request #1836 from kunnaaalll/chore/harden-triage-bot
chore(ci): harden triage bot against low-quality AI spam
2026-01-28 09:17:20 -08:00
DhruvPokhriyal 967cbf814b fix: validate non-negative limit and offset in csv_read function 2026-01-28 22:46:56 +05:30
Vikas Verma 0dfec38b4b fix: remove duplicate import in test_csv_tool.py 2026-01-28 22:39:18 +05:30
Lachhman Singh 9ad4702c08 style: fix indentation alignment 2026-01-28 22:07:01 +05:00
Kunal Parmar ec89bf3622 chore(ci): harden triage bot against low-quality AI spam 2026-01-28 22:32:40 +05:30
Lachhman Singh ab7c924b9a style: fix indentation and improve status message 2026-01-28 21:56:46 +05:00
Timothy @aden 0c2a2f31f6 Merge pull request #1858 from TimothyZhang7/main
fix: auto closing bot
2026-01-28 08:53:06 -08:00
RichardTang-Aden 2b52ed6397 Merge pull request #1807 from mgaldon17/feature/plan-failed-dependency-resolution
feat(plan): Implemented a resolution for the failed dependency
2026-01-28 08:51:59 -08:00
Timothy 1b2befaae9 fix: auto closing bot 2026-01-28 08:50:57 -08:00
Lachhman Singh bca56f8ff6 fix(cli): prevent crash when exports dir missing 2026-01-28 21:35:40 +05:00
Timothy @aden e9f7f75c34 Merge pull request #1844 from TimothyZhang7/feature/cursor-support
fix: usage guide
2026-01-28 08:18:48 -08:00
Timothy 69cd9ab9f5 fix: usage guide
Release / Create Release (push) Waiting to run
2026-01-28 08:16:05 -08:00
Timothy @aden fa1bba3320 Merge pull request #1837 from TimothyZhang7/feature/cursor-support
feature: cursor-aligned agent skils
2026-01-28 08:02:06 -08:00
Nihil 9b23668136 micro-fix(docs): add markdown header to ROADMAP 2026-01-28 21:27:35 +05:30
Timothy bf347d5e78 feature: cursor-aligned agent skils 2026-01-28 07:56:11 -08:00
Fernando Mano 7ba8169444 feat(observability): Structured logging with automatic trace context propagation -- remove colored logs for some cases when in prod mode 2026-01-28 12:46:54 -03:00
RichardTang-Aden 3cfc88c4d6 Merge pull request #1720 from ryanbijoy/fix/gitingore-issue
micro-fix/gitingore issue
2026-01-28 07:36:57 -08:00
Bryan @ Aden 031b20574c Merge pull request #1509 from brilliantkid87/test/storage-module-coverage
Test/storage module coverage
2026-01-28 07:36:39 -08:00
RichardTang-Aden f37448e602 Merge pull request #584 from AbdulTaufeeq01/fix/web-scrape-relative-urls
fix(web-scrape): convert relative URLs to absolute URLs using urljoin
2026-01-28 07:36:16 -08:00
Fernando Mano d090c954ae feat(observability): Structured logging with automatic trace context propagation -- adjust all logs to print full uuids when in prod mode and include documentation 2026-01-28 12:31:11 -03:00
Vikas Verma ab5b1a254f Merge branch 'main' into fix/tools-dependency-tests 2026-01-28 20:53:33 +05:30
Fernando Mano 9bee1666f1 chore: Merge branch 'main' into feat/observability-trace-context 2026-01-28 11:35:13 -03:00
Fernando Mano fb94637339 feat(observability): Structured logging with automatic trace context propagation 2026-01-28 11:27:24 -03:00
Nicklaus Wibowo 5d8996fe54 fix(executor): add execution quality tracking to ExecutionResult
Track retries, failed nodes, and execution quality (clean/degraded/failed) to expose retry metrics in ExecutionResult. This allows dashboards and monitoring to distinguish between clean success and degraded success with retries.
2026-01-28 21:26:45 +07:00
Manuel Galdon 6b30c2e8e7 feat(plan): Implemented a resolution for the failed dependency 2026-01-28 15:17:15 +01:00
brilliantkid87 1298d4b379 Merge branch 'test/storage-module-coverage' of https://github.com/brilliantkid87/hive into test/storage-module-coverage 2026-01-28 21:13:24 +07:00
Brilliantkid ac3aaa9348 fix: linter error 2026-01-28 21:13:17 +07:00
austin931114 bedc0eadf3 Merge pull request #1773 from dekalouis/fix/llm-decide-edge-condition-loader
fix(runner): add missing llm_decide edge condition mapping
2026-01-28 14:46:39 +01:00
dekalouis fe352ea54e fix(runner): add missing llm_decide edge condition mapping
The condition_map in _load_from_dict was missing the llm_decide
mapping, causing goal-aware routing to break for exported agents.

When agents with LLM_DECIDE edges were exported and re-imported,
the edge conditions were silently defaulted to ON_SUCCESS instead
of preserving the LLM_DECIDE routing logic.

This fix ensures that agents exported with goal-aware routing edges
maintain their correct behavior after re-import.
2026-01-28 20:42:42 +07:00
austin931114 7c990dd90a Merge pull request #1679 from MuzzaiyyanHussain/test-graph-executor-coverage
test: add pytest coverage for core graph executor success and failure paths
2026-01-28 14:24:03 +01:00
austin931114 f93111c319 Merge pull request #1766 from sashankthapa/docs/fix-claude-agent-skills-and-example
docs: fix Claude agent skills structure and workflow examples
2026-01-28 13:06:06 +01:00
kozuedoingregression d4b2c82d54 docs: fix Claude agent skills structure and workflow examples 2026-01-28 17:15:56 +05:30
Aman 169827636f Refactor: Replace print with logging in AgentRunner 2026-01-28 16:22:01 +05:30
Siddharth Varshney a96cd546c8 Merge branch 'main' into fix-malformed-json-tool-args 2026-01-28 15:35:33 +05:30
Siddharth Varshney eb33d4f1c2 Remove duplicate malformed JSON tool-call test 2026-01-28 09:57:27 +00:00
ryanbijoy b6ef35fe55 micro-fix: Removed the NULL boxes and Renamed it to 2026-01-28 15:19:52 +05:30
Siddharth Varshney 4253956326 Handle malformed JSON tool arguments safely 2026-01-28 09:49:17 +00:00
ryanbijoy 6fb84b6889 gitignore changes 2026-01-28 15:06:19 +05:30
Muzzaiyyan Hussain 6e94402a8d Merge branch 'main' into test-graph-executor-coverage 2026-01-28 13:45:32 +05:30
Muzzaiyyan Hussain d68b822687 chore: apply automated lint fixes 2026-01-28 13:03:51 +05:30
Harsh Makwana 64299e959a fix: setup script now creates missing exports directory 2026-01-28 12:41:20 +05:30
Aarav Shukla d14d23b010 chore: remove references to archived honeycomb frontend 2026-01-28 12:26:45 +05:30
JVSCHANDRADITHYA 30f1c700ce Changes made to _select_agent function to lazy create exports directory 2026-01-28 06:27:50 +00:00
Abdul Taufeeq M ccae478347 Merge branch 'main' into fix/web-scrape-relative-urls 2026-01-28 11:06:27 +05:30
Abdul Taufeeq M 3a2639f565 fix: web scrape tool improvements with content-type validation and max_length simplification
- Add Content-Type validation to skip non-HTML content
- Simplify max_length validation using max() and min()
- Improve title extraction with cleaner code
2026-01-28 10:52:10 +05:30
Patrick e241ec3341 Merge pull request #179 from PatrickChen928/main
feat: add nullable_output_keys, fix: #178
2026-01-28 13:16:39 +08:00
Timothy bc6f70933b feat: hubspot integration and advanced scraper 2026-01-27 20:50:17 -08:00
Devanshu Magiawala bc070c3e39 fix: correct WorkflowBuilder import to GraphBuilder in MCP example
The MCP integration example referenced WorkflowBuilder which doesn't exist.
Changed to GraphBuilder which is the correct class name.
Fixes import error when running: python core/examples/mcp_integration_example.py
2026-01-27 20:46:00 -08:00
Bryan @ Aden f30f42a4d3 Merge pull request #1577 from adenhq/fix/ruff-tests1
fixed ruff format --check
2026-01-27 20:12:03 -08:00
bryan e4c95c7a91 fixed ruff format --check 2026-01-27 20:09:31 -08:00
Bryan @ Aden bfb1a81b7a Merge pull request #944 from adionit7/docs/remove-docker-compose-references
docs: remove outdated Docker Compose references
2026-01-27 19:45:52 -08:00
Brilliantkid 257e36615a fix: linter error 2026-01-28 10:39:37 +07:00
Bryan @ Aden 2a049df099 Merge pull request #1540 from jaffarkeikei/fix/list-dir-isdir-check
fix(list_dir): add isdir check before listing
2026-01-27 19:17:25 -08:00
vakrahul 2194301260 Merge branch 'main' into perf/heuristic-json-repair 2026-01-28 08:00:38 +05:30
Bryan @ Aden 095dd05b17 Merge pull request #1173 from JohnnyWalker010/fix/json-validation-error-handling
Fix/json validation error handling
2026-01-27 18:15:04 -08:00
Aden HQ 6d03934452 Merge pull request #1535 from RichardTang-Aden/main
docs: chore for calling claude skills
2026-01-27 17:22:22 -08:00
RichardTang-Aden 5051f44543 Merge branch 'adenhq:main' into main 2026-01-27 17:14:10 -08:00
Richard Tang 9d98f9f678 docs: update the claude code skill instruction 2026-01-27 17:13:38 -08:00
Timothy @aden 9e0c24cd3a Merge pull request #1532 from TimothyZhang7/main
chore: fix lint issues
2026-01-27 17:01:05 -08:00
Timothy b66eec1e66 chore: fix lint issues 2026-01-27 16:58:06 -08:00
Timothy @aden aca66d60ed Merge pull request #1530 from adenhq/staging
Staging
2026-01-27 16:52:10 -08:00
RichardTang-Aden 8316e7c0e9 Merge pull request #1523 from adenhq/chore--ruff-fix
micro-fix: fix ruff and excluding Docs
2026-01-27 16:41:44 -08:00
Emmanuel Nwanguma 3bbecad044 config: add .gitattributes for cross-platform line ending consistency (#951)
* config: add .gitattributes for cross-platform line ending consistency

- Add comprehensive .gitattributes to normalize line endings
- Ensure shell scripts always use LF (required for Unix execution)
- Mark binary files explicitly to prevent corruption
- Eliminate CRLF warnings for Windows contributors
- Follow cross-platform best practices

This fixes persistent 'LF will be replaced by CRLF' warnings that
confuse Windows contributors during normal git operations.

Fixes #950

* fix: add trailing newline at end of file

Per review feedback from @Hundao
2026-01-28 08:41:11 +08:00
RichardTang-Aden a8eb7127aa Merge branch 'main' into chore--ruff-fix 2026-01-27 16:39:53 -08:00
Richard Tang ba2889faf8 chore: allow excluding doc PRs 2026-01-27 16:26:21 -08:00
Richard Tang 1e6c5b8e11 fix: CI issues 2026-01-27 16:26:21 -08:00
Richard Tang 1199c02bfd feat: allow micro fixes be passed as a PR 2026-01-27 16:26:21 -08:00
Bryan @ Aden 688451b2a9 Merge pull request #1521 from adenhq/feat--allow-Micro-fixes-to-excluded
feat: allow micro fixes be passed as a PR
2026-01-27 16:13:33 -08:00
Richard Tang 9ef3628209 feat: allow micro fixes be passed as a PR 2026-01-27 16:08:42 -08:00
Richard Tang 8695f3fea0 chore: fix ruff 2026-01-27 16:01:52 -08:00
brilliantkid87 88b094b5de Merge branch 'test/storage-module-coverage' of https://github.com/brilliantkid87/hive into test/storage-module-coverage 2026-01-28 04:34:54 +07:00
brilliantkid87 8b3b0c51f5 test(core): add test coverage for storage module Fixes #902 2026-01-28 04:34:33 +07:00
brilliantkid87 322ff7c470 git commit -m "test(core): add test coverage for storage module
Fixes #902"
2026-01-28 04:30:44 +07:00
Timothy @aden ad968a0b54 Merge pull request #1458 from TimothyZhang7/release/v_0_3_0
DX Improvements: Linting, Formatting & Pre-Commit Hooks
2026-01-27 11:04:48 -08:00
Timothy 5d79a7078c fix: precommit hooks for different pyproject 2026-01-27 10:50:11 -08:00
Timothy e4f451e3f5 fix: lint issues with new enforcement 2026-01-27 10:45:49 -08:00
Timothy d8496c47f0 fix: linter 2026-01-27 10:19:23 -08:00
Timothy @aden 9c28284331 Merge pull request #1428 from TimothyZhang7/feature/parallel-fanout
Release / Create Release (push) Waiting to run
feat: parallel execution framework
2026-01-27 10:17:07 -08:00
Timothy 075e9179c1 fix: retry logic broken by merge conflict 2026-01-27 10:11:54 -08:00
Timothy e61bdfc417 test(arch): fanout/fanin 2026-01-27 10:07:58 -08:00
Timothy @aden f6c5c5cadb Merge branch 'main' into feature/parallel-fanout 2026-01-27 10:04:54 -08:00
jaffar 8923011304 fix(list_dir): add isdir check before listing 2026-01-27 12:00:56 -05:00
Aman e6900647f8 ci: add windows runner to test workflow 2026-01-27 22:06:59 +05:30
Timothy @aden c441494c2f Merge pull request #1368 from adenhq/main
sync main to staging
2026-01-27 08:34:48 -08:00
Timothy @aden e1bea18357 Merge pull request #1113 from TanujaNair03/refactor/llm-judge-agnostic
refactor: provider-agnostic LLMJudge with auto-detection for OpenAI (#1103)
2026-01-27 08:31:50 -08:00
Timothy @aden 197f4f984a Merge pull request #1353 from Tahir-yamin/fix/concurrent-storage-file-locks-leak
fix(memory): patch ConcurrentStorage leak (WeakValueDictionary)
2026-01-27 08:23:05 -08:00
Tahir yamin 0381a5c87b Merge branch 'adenhq:main' into fix/concurrent-storage-file-locks-leak 2026-01-27 20:36:19 +05:00
Tahir Yamin 112b1baf2e fix(memory): patch ConcurrentStorage leak with WeakValueDictionary (Isolated Logic) 2026-01-27 20:28:22 +05:00
Shivraj12 c61c958964 fix(tool_registry): handle invalid JSON returned by tools 2026-01-27 20:23:36 +05:30
vrijmetse a59d6ac6db refactor(tools): add multi-provider support to web_search tool (#795)
* feat(tools): add Google Custom Search as alternative to Brave Search

Adds google_search tool using Google Custom Search API as an alternative
to the existing web_search tool (Brave Search).

Changes:
- Add google_search_tool with full implementation
- Register Google credentials (GOOGLE_API_KEY, GOOGLE_CSE_ID)
- Register tool in tools/__init__.py
- Add README with setup instructions

Closes #793

* test(tools): add unit tests for google_search tool

Adds 7 tests mirroring web_search_tool test patterns:
- Missing API key error handling
- Missing CSE ID error handling
- Empty query validation
- Long query validation
- num_results clamping
- Default parameters
- Custom language/country parameters

All tests pass.

* refactor(tools): add multi-provider support to web_search tool

BREAKING CHANGE: None - backward compatible. Brave remains default.

- Add Google Custom Search as alternative provider in web_search
- Add 'provider' parameter: 'auto' (default), 'google', 'brave'
- Auto mode tries Brave first for backward compatibility
- Remove separate google_search_tool (consolidated into web_search)
- Update tests to cover multi-provider functionality (13 tests)
- Update README documentation

Users with BRAVE_SEARCH_API_KEY: No changes needed
Users with GOOGLE_API_KEY + GOOGLE_CSE_ID: Can use provider='google'
Users with both: Brave preferred by default, use provider='google' to force

Closes #793

* feat(tools): fixed readme

---------

Co-authored-by: Mustafa Abdat <abdamus@hilti.com>
2026-01-27 22:46:41 +08:00
Vikas Verma 37b9be3ff6 fix(tools): handle optional duckdb dependency and update credential tests 2026-01-27 20:00:45 +05:30
Hundao 9d39c09e27 Merge pull request #973 from AryanyAI/refactor/logging-mcp-scripts
refactor(mcp): replace print() with logging in setup scripts
2026-01-27 20:56:40 +08:00
root ff38962ff2 fix: remove duplicate content 2026-01-27 12:30:38 +00:00
root 121f33687a docs: add setup instructions for Alpine Linux users 2026-01-27 12:11:59 +00:00
Tanuu 598cc8b078 refactor: provider-agnostic LLMJudge with ruff styling fixes (#1103) 2026-01-27 14:24:57 +05:30
Tanuu 3605f3705b refactor: make LLMJudge provider-agnostic with OpenAI support (#1103) 2026-01-27 14:16:34 +05:30
AryanyAI 407816ddbf style: fix ruff quote style violations (Q000)
- Change single quotes to double quotes in logging formatters
- Fixes: setup_mcp.py, verify_mcp.py formatter strings
- Addresses Q000 linter errors from PR review
2026-01-27 13:54:20 +05:30
Hundao 6acdb65c1c Merge pull request #948 from TanujaNair03/refactor/provider-agnostic-prompts
Refactor/provider agnostic prompts
2026-01-27 14:18:59 +08:00
Hundao a4b0c66564 Merge pull request #558 from Hundao/feature/csv-tools
feat(tools): add CSV tools with DuckDB SQL support
2026-01-27 14:02:06 +08:00
Timothy @aden d1e6101a0f Merge pull request #1007 from TimothyZhang7/feature/credential-manager-stor
Feature/credential manager store
2026-01-26 21:30:58 -08:00
Timothy 330fbb19ac feature(credentials): credential store arch 2026-01-26 20:16:43 -08:00
Abdul Taufeeq M 8cc431ee52 fix: correct link validation to use absolute_href instead of href 2026-01-27 09:29:58 +05:30
Timothy 39831cf4b1 feat: parallel execution framework 2026-01-26 19:25:25 -08:00
Bryan @ Aden bc8cdfd6da Merge pull request #941 from vakrahul/fix/graph-retry-backoff
Fix/graph retry backoff
2026-01-26 19:20:35 -08:00
Tanuu 500876d65e style: add required trailing newline to prompts.py 2026-01-27 07:35:54 +05:30
Tanuu e59bb2d83f style: fix linting issues (whitespace and newline) 2026-01-27 07:29:48 +05:30
vakrahul 03910d531f Merge branch 'main' into fix/graph-retry-backoff 2026-01-27 07:28:22 +05:30
vakrahul a122345f9c fix(graph): restore node.max_retries and fix type check per review 2026-01-27 07:26:40 +05:30
Bryan @ Aden 6d025c808a Merge pull request #946 from not-anas-ali/fix/callable-type-annotations
fix(types): correct type annotation from lowercase 'callable' to 'Callable'
2026-01-26 17:52:00 -08:00
Bryan @ Aden 8525aec49c Merge pull request #934 from adionit7/fix/validate-exports-skip-when-empty
ci: make Validate Agent Exports skip clearly when exports/ is missing or empty
2026-01-26 17:48:44 -08:00
Tanuja Nair b0435a188f Merge branch 'adenhq:main' into refactor/provider-agnostic-prompts 2026-01-27 07:07:01 +05:30
Bryan @ Aden 3eb964eff2 Merge pull request #933 from adionit7/docs/fix-execute-command-tool-name-readme
docs(tools): fix tool name in README table (execute_command → execute_command_tool)
2026-01-26 17:36:24 -08:00
Bryan @ Aden ed88129b00 Merge pull request #927 from saboor2632/fix/worker-node-json-logging
fix(graph): add logging for JSON parsing failures in worker_node
2026-01-26 17:36:13 -08:00
vakrahul e1d8624483 Merge branch 'main' into perf/heuristic-json-repair 2026-01-27 07:03:07 +05:30
vakrahul 68264b54d9 style: fix linting issues in output_cleaner.py 2026-01-27 07:02:43 +05:30
adionit7 fc36a5e607 docs: remove outdated Docker Compose references
The repository does not include docker-compose files, but multiple docs
claimed "Docker Compose deployment out of the box." This was left over
from a previous release.

Changes:
- README.md: Update FAQ to describe Python package deployment
- README.ko.md: Same update for Korean translation
- docs/configuration.md: Remove "Docker Compose Integration" section
  and docker compose commands
- docs/quizzes: Update tasks that referenced docker-compose.yml
- .github/CODEOWNERS: Remove docker-compose*.yml entry
- scripts/setup.sh: Remove docker-compose.override.yml copy step

Fixes #923
2026-01-27 06:58:18 +05:30
vakrahul 1631d01dd2 merge: resolve conflicts in executor.pyx 2026-01-27 06:52:07 +05:30
Tanuu e846ad6ea7 refactor: implement provider-agnostic logic for test templates
Centralized _get_api_key in prompts.py to support OpenAI, Cerebras, and Groq via environment variables while maintaining Anthropic support through CredentialManager.
2026-01-27 06:39:55 +05:30
adionit7 e57cad7159 ci: make Validate Agent Exports skip clearly when exports/ is missing or empty
Previously, when exports/ was missing or empty, the bash glob
`exports/*/` would not match anything and the loop would silently
do nothing. The job would pass without actually validating anything,
which was misleading.

Now the job:
- Explicitly checks if exports/ directory exists
- Uses nullglob to handle empty directories properly
- Logs clear messages when skipping validation
- Reports the number of agents validated when successful

Fixes #887
2026-01-27 05:59:43 +05:30
adionit7 0cf9e39f6f docs(tools): fix tool name in README table (execute_command → execute_command_tool)
The "Available Tools" table listed `execute_command` but the actual
registered name is `execute_command_tool`. This aligns the docs with
the runtime name in __init__.py and the tool's own README.

Fixes #901
2026-01-27 05:58:59 +05:30
saboor2632 852332483a fix(graph): add logging for JSON parsing failures in worker_node 2026-01-27 05:10:34 +05:00
not-anas-ali 2b8604610c fix(types): correct type annotation from lowercase 'callable' to 'Callable'
Fixes #922
2026-01-27 05:06:27 +05:00
Siddharth Varshney d6b05bf337 Handle malformed JSON tool arguments in LiteLLMProvider 2026-01-26 23:27:32 +00:00
Yevhen Omelianenko b07aff1be3 Merge branch 'adenhq:main' into fix/json-validation-error-handling 2026-01-27 01:25:12 +02:00
yevhen_omelianenko f3df70e8fe fix: add consistent JSON validation error handling in agent_builder_server.py
Wrap json.loads() calls in try-catch blocks for add_node() and update_node()
  functions to match the error handling pattern used elsewhere in the file.

  Fixes #907
2026-01-27 01:13:42 +02:00
Bryan @ Aden 9230ac6c20 Merge pull request #871 from pradyten/feat/llm-judge-configurable-provider
feat(testing): add configurable LLM provider to LLMJudge
2026-01-26 14:53:12 -08:00
Bryan @ Aden 5cf25c6f10 Merge pull request #906 from adenhq/fix/ruff-tests
fixed linter
2026-01-26 14:49:45 -08:00
bryan d064c98998 fixed linter 2026-01-26 14:47:56 -08:00
Bryan @ Aden 25fabd8068 Merge pull request #576 from savankansagara1/fix/mock-mode-llm-provider
Fix: Add MockLLMProvider to enable mock mode execution
2026-01-26 14:41:13 -08:00
Bryan @ Aden 396e5c35a6 Merge pull request #528 from gaurav-code098/fix/web-scrape-content-type
fix(tools): validate Content-Type in web_scrape tool (Closes #487)
2026-01-26 14:34:37 -08:00
RichardTang-Aden 0a8c30c3da Merge pull request #788 from SoulSniper-V2/feat/add-deepseek-docs
docs(llm): add DeepSeek models support documentation and examples
2026-01-26 14:33:51 -08:00
Aden HQ 798f3cfd36 Merge pull request #349 from Himanshu-ABES/feat/pydantic-llm-validation
feat(validation): add Pydantic model validation for LLM outputs
2026-01-26 14:14:12 -08:00
pradyumn tendulkar 69ad0be5ff Merge branch 'main' into feat/llm-judge-configurable-provider 2026-01-26 17:06:30 -05:00
Himanshu Chauhan 60f2e674ec feat(validation): add Pydantic model validation for LLM outputs
- Add output_model field to NodeSpec for specifying Pydantic model
- Add max_validation_retries field (default: 2) for retry configuration
- Add validation_errors field to NodeResult for error tracking
- Implement validate_with_pydantic() in OutputValidator
- Implement format_validation_feedback() for LLM retry prompts
- Auto-generate JSON schema from Pydantic model for response_format
- Add retry loop that feeds validation errors back to LLM
- Add 28 comprehensive tests covering all new functionality
2026-01-26 14:06:29 -08:00
Siddharth Varshney 6bb256e277 docs: clarify illustrative examples in validation section 2026-01-26 21:47:00 +00:00
Bryan @ Aden 81ad85db5e Merge pull request #876 from adenhq/fix/ruff-tests
Fix/ruff tests
2026-01-26 13:41:15 -08:00
Timothy @aden ed25ef7562 Merge pull request #762 from vishalharkal15/fix/concurrent-storage-race-condition
Fix race condition in ConcurrentStorage.stop() causing data loss
2026-01-26 13:38:17 -08:00
bryan d9c696aa22 fixed all linter errors 2026-01-26 13:37:25 -08:00
bryan 22358a2d83 Merge branch 'main' into fix/ruff-tests 2026-01-26 13:37:12 -08:00
Timothy @aden 39a2a34380 Merge pull request #874 from TimothyZhang7/main
fix: git actions
2026-01-26 13:36:50 -08:00
Timothy @aden 07077dbb52 Merge branch 'adenhq:main' into main 2026-01-26 13:35:33 -08:00
Timothy e1346ae557 fix: include actual status check in pr requirements 2026-01-26 13:34:54 -08:00
Timothy 4f3d34d01e fix: consolidate dedupe and triage 2026-01-26 13:29:33 -08:00
pradyten 8516eba7c5 feat(testing): add configurable LLM provider to LLMJudge
Allow LLMJudge to accept any LLMProvider instance instead of being
hardcoded to use Anthropic. This aligns with the framework's pluggable
LLM design and enables users to:

- Use the same LLM provider across their agent and tests
- Run tests with cheaper or local models
- Avoid requiring an Anthropic API key for testing

Backward compatible: existing code using LLMJudge() without arguments
continues to work by falling back to Anthropic.

Closes #477

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-26 16:27:08 -05:00
Timothy @aden 63010d45b2 Merge pull request #861 from TimothyZhang7/main
fix: explain the PR requirements
2026-01-26 12:43:30 -08:00
Timothy @aden 59db8f99d7 Merge branch 'adenhq:main' into main 2026-01-26 12:42:22 -08:00
Timothy 236e8e8638 fix: explain the pr requirement 2026-01-26 12:41:16 -08:00
Timothy @aden 3279686342 Merge pull request #858 from TimothyZhang7/main
fix: PR requirements enforcement
2026-01-26 12:32:11 -08:00
Timothy @aden b6a77ffd7e Merge branch 'adenhq:main' into main 2026-01-26 12:31:25 -08:00
Timothy e0544a57f9 fix: pr requirements 2026-01-26 12:30:12 -08:00
AryanRevolutionizingWorld 82c32e8d9f refactor(mcp): replace print() with logging in setup scripts
Replace direct print() statements with Python's logging module in MCP
setup and verification scripts for better configurability and
production readiness.

Changes:
- setup_mcp.py: Convert 30+ print() calls to structured logging
- verify_mcp.py: Convert 40+ print() calls to structured logging
- mcp_server.py: Convert 4 print() calls to structured logging
- Preserve colored CLI output using logging formatters
- Maintain all functional behavior (refactor only)

Benefits:
- Configurable log levels (debug/info/warning/error)
- Better observability in production environments
- Cleaner programmatic usage (no stdout pollution)
- Professional logging practices

Fixes #833
2026-01-27 01:57:16 +05:30
Bryan @ Aden a180d78d0c Merge pull request #782 from ayush123-bit/docs/windows-environment-clarification
docs: clarify Windows environment expectations in setup guides
2026-01-26 12:22:43 -08:00
Bryan @ Aden 9be036aa37 Merge pull request #602 from Kira714/fix/callable-type-annotation-599
fix(llm): correct type annotation from lowercase `callable` to `Callable`
2026-01-26 12:22:34 -08:00
Bryan @ Aden 8c39dad22d Merge pull request #605 from Kira714/fix/session-state-dict-validation-590
fix(executor): add type validation for session state memory
2026-01-26 12:22:21 -08:00
Bryan @ Aden 0a7aa62c45 Merge pull request #608 from Kira714/fix/agent-runtime-keyerror
fix(runtime): use safe dictionary access in trigger_and_wait()
2026-01-26 12:22:12 -08:00
Bryan @ Aden cbd34db278 Merge pull request #665 from Ranxin2023/main
Make MCP tool registration idempotent to avoid conflicts with agent-generated tools
2026-01-26 12:22:02 -08:00
Bryan @ Aden 414d86f2f0 Merge pull request #672 from subham-panja/docs/fix-architecture-link
docs(readme): fix broken architecture documentation link
2026-01-26 12:21:34 -08:00
Timothy @aden 4852d7f63b Merge pull request #842 from TimothyZhang7/main
fix: PR requirements backfill
2026-01-26 11:55:55 -08:00
Timothy @aden 1165858a58 Merge branch 'adenhq:main' into main 2026-01-26 11:54:51 -08:00
Timothy 4575540d69 fix: pr requirements 2026-01-26 11:54:10 -08:00
Timothy @aden 051aa4f065 Merge pull request #840 from TimothyZhang7/main
fix: backfill pr requirements
2026-01-26 11:49:06 -08:00
Timothy 6834dcfcb7 fix: backfill pr requirements 2026-01-26 11:47:04 -08:00
Timothy @aden 95c481ae52 Merge pull request #824 from TimothyZhang7/main
Fix: PR requirements
2026-01-26 11:17:09 -08:00
Timothy @aden 5c266d6920 Merge branch 'adenhq:main' into main 2026-01-26 11:16:04 -08:00
Timothy 7fe21d91f2 fix: pr requirements 2026-01-26 11:15:29 -08:00
Timothy @aden 751715bffe Merge pull request #822 from TimothyZhang7/main
PR Requirements Workflow
2026-01-26 11:12:31 -08:00
Timothy @aden a6bda9628c Merge branch 'adenhq:main' into main 2026-01-26 10:57:47 -08:00
Timothy ac646603c9 chore: enforce pr requirement 2026-01-26 10:54:50 -08:00
Timothy @aden 551e648be7 Merge pull request #810 from TimothyZhang7/main
GitHub Actions to auto dedupe issues
2026-01-26 10:42:13 -08:00
Timothy @aden 2f852a7eba Merge branch 'adenhq:main' into main 2026-01-26 10:41:16 -08:00
Timothy 7d462ff976 feat(actions): auto dedupe workflow 2026-01-26 10:38:01 -08:00
Timothy d1cfef5d8a fix: issue dedupe action 2026-01-26 10:06:37 -08:00
Bryan @ Aden f3c9c591bf Merge pull request #610 from Kira714/fix/semaphore-private-access
fix(stream): avoid private Semaphore._value attribute access
2026-01-26 10:05:32 -08:00
Bryan @ Aden 0bbe2d5889 Merge pull request #444 from fermano/fix/executionstream-oom
fix(runtime): execution stream memory leak
2026-01-26 10:02:23 -08:00
Timothy @aden aa341317f5 Merge pull request #791 from TimothyZhang7/main
chore(actions): automated bot
2026-01-26 09:45:45 -08:00
Timothy 6ae38b66ba chore(actions): automated bot 2026-01-26 09:43:25 -08:00
Arush Wadhawan 40e39d29f8 docs(llm): add DeepSeek models support documentation and examples
Signed-off-by: Arush Wadhawan <warush23+github@gmail.com>
2026-01-26 12:24:51 -05:00
ayush123-bit 6d7d472792 docs: clarify Windows environment expectations and WSL recommendation 2026-01-26 22:31:20 +05:30
Vishal dae63214d5 Fix race condition in ConcurrentStorage.stop() causing data loss
Fixes #755

Problem:
The stop() method had a critical race condition where _flush_pending() and
_batch_task competed for queue items, causing:
- Data loss during shutdown
- Queue items processed twice or lost
- Batch writer cancelled mid-write

Root Cause:
The method called _flush_pending() while _batch_task was still running.
Both operations drained the same queue simultaneously, leading to conflicts.

Solution:
Reordered shutdown sequence to:
1. Cancel batch task first
2. Wait for task completion (handles CancelledError with final flush)
3. Then flush any remaining items

This eliminates queue competition because:
- _batch_writer() flushes its current batch when cancelled
- After cancellation completes, _flush_pending() safely processes remaining items
- No race condition, no data loss

Changes:
- Moved batch task cancellation before _flush_pending()
- Ensures clean shutdown sequence
- Prevents queue drain conflicts

Testing:
- All 209 tests pass
- No duplicate flushes
- Clean shutdown guaranteed

Impact:
- Prevents data loss during graceful shutdown
- Eliminates race condition between flush operations
- Ensures all writes complete before stop returns
2026-01-26 21:38:59 +05:30
bryan 46bdedcabb ruff check fix 2026-01-26 07:32:03 -08:00
Patrick 5fbaae5d8d Merge branch 'adenhq:main' into main 2026-01-26 20:04:51 +08:00
Abdul Taufeeq M c9bc2b287e security: prevent path traversal attacks in FileStorage
Add comprehensive input validation to _validate_key() method that blocks:
- Empty keys
- Path separators (/ and \)
- Parent directory references (..)
- Absolute paths
- Null bytes
- Dangerous shell characters

Apply validation to all index operations: _get_index(), _add_to_index(), _remove_from_index()
Add 21 comprehensive test cases covering valid keys and all attack scenarios

Fixes: CWE-22 Path Traversal vulnerability (CVSS 7.5-9.1 Critical)

Tests: 21/21 passing
2026-01-26 17:31:43 +05:30
subhampanja28 5b46132c81 docs(readme): fix broken architecture documentation link 2026-01-26 17:24:22 +05:30
RanxinLi 7e65ab0b36 Revert local Claude settings 2026-01-26 03:28:03 -08:00
RanxinLi 8a86787b64 Merge branch 'main' of https://github.com/Ranxin2023/hive 2026-01-26 03:24:01 -08:00
RanxinLi b2acfb5447 change the file tool register bug 2026-01-26 03:23:49 -08:00
Sourabsb 10ea23be34 fix: improve cleanup race handling, thread join warning, and CancelledError strategy
- Treat run_coroutine_threadsafe race (RuntimeError) as expected: mark cleanup_attempted and log debug
- Mark cleanup_attempted on timeout/errors to avoid misleading fallback
- Add warning when loop thread fails to terminate within join timeout
- Make CancelledError best-effort (log, no re-raise) for session and stdio cleanup
2026-01-26 16:38:58 +05:30
Sourabsb 37a0324c05 fix: increase thread join timeout and clarify redundant None assignments
Changes based on Copilot AI review (2 issues):

1. Thread join timeout was shorter than cleanup timeout (Issue #1):
   - Changed _THREAD_JOIN_TIMEOUT from 5 to 12 seconds
   - Must be >= cleanup timeout (10s) plus buffer for loop.stop()
   - Prevents thread abandonment during active cleanup

2. Added detailed comment for redundant None assignments (Issue #2):
   - Explained why we set _session/_stdio_context to None even if
     _cleanup_stdio_async() already did it
   - Documents the safety cases: timeout, failure, skip, cancellation
   - Makes code intent clear for future maintainers
2026-01-26 16:31:22 +05:30
Sourabsb 837ef2da59 fix: address Copilot AI review - timeouts and CancelledError handling
Changes based on Copilot AI review (3 issues):

1. Increased thread join timeout (Issue #1):
   - Changed from 2 to 5 seconds
   - Made proportional to cleanup timeout
   - Defined as class constant _THREAD_JOIN_TIMEOUT

2. Handle asyncio.CancelledError explicitly (Issue #2):
   - Added separate except clause for CancelledError
   - Logs specific warning for cancelled cleanup
   - Re-raises CancelledError as per asyncio best practices
   - Added for both session and stdio_context cleanup

3. Increased cleanup timeout to match connection timeout (Issue #3):
   - Changed from 5 to 10 seconds (matches _connect_stdio timeout)
   - Defined as class constant _CLEANUP_TIMEOUT
   - Prevents incomplete cleanup with slow MCP servers
2026-01-26 16:23:44 +05:30
Muzzaiyyan Hussain e0bc265bb2 test: add pytest coverage for core graph executor success and failure 2026-01-26 16:18:32 +05:30
Sourabsb a39afbea23 fix: separate TimeoutError handling for better error reporting
Per Copilot AI review: distinguish timeout scenarios from actual
cleanup failures by catching TimeoutError separately. This helps
with debugging by providing clearer error messages.
2026-01-26 16:15:57 +05:30
Sourabsb 7375b26925 fix: address all Copilot AI review comments
Changes based on Copilot AI review (5 issues):

1. Simplified _cleanup_stdio_async():
   - Used try/finally pattern for cleaner reference clearing
   - References cleared in finally block (always executed)

2. Removed deprecated asyncio.get_event_loop():
   - Removed complex temp loop pattern entirely
   - Simplified fallback to just log warning and clear refs

3. Simplified fallback path (Issue #4):
   - When loop exists but not running, resources are in undefined state
   - Complex event loop manipulation removed
   - Just log warning and proceed with reference clearing
   - OS will reclaim resources on process exit

4. Handled race condition (Issue #5):
   - Added comment documenting the inherent race condition
   - Added try/except around loop.call_soon_threadsafe()
   - Track cleanup_attempted flag for proper fallback handling

5. Added explanatory comments:
   - Documented why redundant None assignments exist (safety)
   - Explained race condition handling approach

Note: Test coverage suggestion (#3) acknowledged but deferred
to separate PR to keep this fix focused.
2026-01-26 16:09:36 +05:30
Sourabsb 3626051b1a fix: address Copilot AI review suggestions for disconnect cleanup
Changes based on Copilot AI review:

1. Fixed fallback path using temp event loop pattern:
   - asyncio.run() may fail if there's already an event loop in current thread
   - Now uses new_event_loop() + set_event_loop() + run_until_complete() pattern
   - Preserves and restores original loop if one existed

2. Set references to None immediately after __aexit__:
   - self._session = None after closing session
   - self._stdio_context = None after closing context
   - Prevents window where closed objects are still referenced
   - Also clears on error to prevent reuse of broken objects

3. Added documentation for critical cleanup order:
   - Session must close BEFORE stdio_context
   - Session depends on streams provided by stdio_context
   - Mirrors initialization order in _connect_stdio()
   - Added warning comment to prevent future breakage
2026-01-26 15:59:59 +05:30
Sourabsb fbcdaf7c6d fix: add is_running() and is_closed() checks to _run_async() to prevent deadlock
When self._loop exists but is not running or is closed (e.g., crashed,
stopped externally, or closed), the code now falls through to the
standard approach that properly handles both sync and async contexts.

Key changes:
- Added is_running() AND is_closed() checks before using run_coroutine_threadsafe()
- Removed separate else branch with asyncio.run() that didn't handle async context
- Now falls through to standard approach which:
  - Detects if already in async context (get_running_loop)
  - Uses separate thread with new event loop if in async context
  - Uses asyncio.run() only when no event loop is running

Edge cases covered:
1. self._loop is None (sync context) -> uses asyncio.run()
2. self._loop is None (async context) -> uses thread with new loop
3. self._loop running normally -> uses run_coroutine_threadsafe()
4. self._loop stopped (sync context) -> falls through, uses asyncio.run()
5. self._loop stopped (async context) -> falls through, uses thread
6. self._loop closed (sync context) -> falls through, uses asyncio.run()
7. self._loop closed (async context) -> falls through, uses thread

Fixes #625
2026-01-26 15:35:53 +05:30
Kira714 6934b331d4 fix(stream): avoid private Semaphore._value attribute access
Calculate available_slots from running execution count instead of
accessing the private _value attribute of asyncio.Semaphore.

Private attributes may change between Python versions and are not
part of the public API.

Fixes #609

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-26 16:37:49 +08:00
Kira714 734fe1e4d7 fix(runtime): use safe dictionary access in trigger_and_wait()
Replace direct dictionary access with .get() and explicit ValueError
to prevent KeyError when entry_point_id is not found in _streams dict.

Fixes #589

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-26 16:36:55 +08:00
Kira714 d900f38f64 fix(executor): add type validation for session state memory
Fixes #590

Previously, the code assumed `session_state["memory"]` was always a dict
when the key existed. If it was `None` or another non-dict type, this
would raise a TypeError during iteration.

Now we validate the type before iterating and log a warning if the
memory data is not a dict, preventing runtime crashes when resuming
from malformed session states.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-26 16:28:56 +08:00
Kira714 1c78174aaf fix(llm): correct type annotation from lowercase callable to Callable
Fixes #599

The `callable` keyword in Python is a builtin function to check if something
is callable, NOT a type annotation. For type hints, we need `Callable` from
the typing module.

Changed:
- `tool_executor: callable` → `tool_executor: Callable[[ToolUse], ToolResult]`

Files updated:
- core/framework/llm/provider.py
- core/framework/llm/anthropic.py
- core/framework/llm/litellm.py

This fixes mypy/pyright type checking errors like:
"Variable annotation syntax is for types; callable is a function"

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-26 16:27:54 +08:00
Abdul Taufeeq M b897e5bdf2 test(web-scrape): add comprehensive tests for URL link conversion
Add 7 test methods to TestWebScrapeToolLinkConversion class to validate
the new URL conversion feature:

- test_relative_links_converted_to_absolute: ../page and page.html -> absolute
- test_root_relative_links_converted: /about -> absolute
- test_absolute_links_unchanged: https://external.com remains unchanged
- test_links_after_redirects: Uses final URL, not requested URL
- test_fragment_links_preserved: #section1 anchors work correctly
- test_query_parameters_preserved: ?id=123&sort=date retained
- test_empty_href_skipped: Empty text links filtered out

All tests use unittest.mock for HTTP response mocking to avoid live network calls.
Tests comprehensively validate the urljoin() implementation that converts all
relative URLs to absolute URLs based on the final response URL.
2026-01-26 13:28:12 +05:30
Abdul Taufeeq M 09dd990273 fix(web-scrape): convert relative URLs to absolute URLs using urljoin
- Add urljoin import from urllib.parse
- Convert all extracted links to absolute URLs based on page base_url
- Use response.url as base_url to handle redirects correctly
- Fixes issue where relative links like '../page' were unusable
2026-01-26 13:09:34 +05:30
savan patel af3b8b1b80 Fix: Add MockLLMProvider to enable mock mode execution
- Created MockLLMProvider class that generates placeholder JSON responses
- Updated AgentRunner._setup() to use MockLLMProvider when mock_mode=True
- Added MockLLMProvider to llm module exports
- Fixes issue where agents failed with 'LLM not available' in mock mode

The MockLLMProvider extracts expected output keys from system prompts
and generates mock JSON responses for structural validation without
making real LLM API calls. This enables:
- Testing agent structure without API keys
- Fast iteration on agent graphs
- CI/CD testing without credentials
- Zero-cost structural validation

Tested with simple agent - all nodes execute successfully in mock mode.
2026-01-26 12:34:14 +05:30
Sourabsb fc539a5d7b fix: add fallback cleanup when event loop is not running
Added else branch to handle edge case where loop exists but is not running. Uses asyncio.run() as fallback to ensure cleanup happens even if the loop was stopped externally or due to an error.
2026-01-26 12:04:43 +05:30
hundao d558bf4f60 feat(tools): add CSV tools with DuckDB SQL support
Add comprehensive CSV manipulation tools:
- csv_read: Read CSV with pagination (limit/offset)
- csv_write: Create new CSV files
- csv_append: Append rows to existing CSV
- csv_info: Get CSV metadata (columns, row count, file size)
- csv_sql: Query CSV using SQL (powered by DuckDB)

Features:
- Session sandbox security (workspace_id, agent_id, session_id)
- DuckDB as optional dependency for SQL queries
- Security: Only SELECT queries allowed, dangerous keywords blocked
- Full Unicode support
- 45 tests covering all tools

Install SQL support: pip install tools[sql]
2026-01-26 14:23:18 +08:00
Sourabsb 99efbe03bb fix: properly close MCP session and STDIO context managers in disconnect()
Added _cleanup_stdio_async() method to properly call __aexit__() on session and stdio_context before stopping the event loop.

This prevents resource leaks, zombie processes, and unclosed file handles.
2026-01-26 11:35:28 +05:30
gaurav 5168ed3cd4 fix(tools): validate Content-Type in web_scrape tool (Closes #487) 2026-01-26 11:15:52 +05:30
vakrahul f614ee7f15 style: fix line length violation in output_cleaner.py 2026-01-26 10:50:42 +05:30
Timothy @aden 02330653ee Merge pull request #489 from TimothyZhang7/main
docs: architecture readme
2026-01-25 19:37:36 -08:00
Timothy ae37d9816e docs: architecture readme 2026-01-25 19:36:03 -08:00
RichardTang-Aden 7351675795 Merge pull request #222 from Chrishabh2002/feat/manual-agent-codefirst
Add minimal code-first agent example and isolate core dependencies
2026-01-25 19:21:22 -08:00
RichardTang-Aden fa5d5057f4 Merge pull request #447 from pradyten/fix/hallucination-detection-full-string-check
fix(graph): check entire string for code indicators in hallucination detection
2026-01-25 19:08:19 -08:00
Bryan @ Aden 854a867597 Merge pull request #293 from yumosx/graph
feat(file_system_toolkits): add encoding and max_size params to view_file
2026-01-25 18:37:42 -08:00
RichardTang-Aden 35ef467dbe Merge pull request #361 from Koushith/fix/docs-hardcoded-path-and-venv
fix(docs): remove hardcoded path and add venv troubleshooting
2026-01-25 18:19:02 -08:00
yumosx 89dbc638e1 test(file_system): add tests for file viewing edge cases
Add tests for file viewing functionality with max_size truncation, negative max_size, custom encoding, and invalid encoding scenarios to ensure proper error handling and behavior.
2026-01-26 10:14:18 +08:00
Timothy @aden 4eac1b9e97 Merge pull request #475 from aiSynergy37/fix/validate-api-key-warning
Fix validate() fallback to warn on model-specific API keys
2026-01-25 18:09:17 -08:00
mithileshk 80f938a7af Fix validate warning for model-specific API keys 2026-01-26 07:34:13 +05:30
Richard T 2f7cf3bc57 chore: remove the outdated architecture documentation 2026-01-25 17:56:18 -08:00
vakrahul 1a7ed9c962 style: fix F821 undefined name and E501 line length errors 2026-01-26 07:25:23 +05:30
Bryan @ Aden 7004fffc08 Merge pull request #402 from Shamanth-8/fix/rce-safe-eval
Unsanitized expression evaluation in EdgeSpec (RCE Vulnerability)
2026-01-25 17:53:15 -08:00
vakrahul 06535192e6 verifying 2026-01-26 07:22:04 +05:30
vakrahul 5923147a71 chore(graph): fix lint issues in retry backoff loggings 2026-01-26 07:19:59 +05:30
Timothy @aden acaa89f584 Merge pull request #434 from nihalmorshed/documentation/fix-tool-name-references
docs(README): update tool names and descriptions in README inside "tools"
2026-01-25 17:47:36 -08:00
Timothy @aden e6af1f64ac Merge pull request #427 from guillermop2002/fix/remove-hardcoded-anthropic-provider
fix(llm): use LiteLLMProvider instead of hardcoded AnthropicProvider
2026-01-25 17:47:05 -08:00
Richard T 53aebd5cea docs: add issue assignment for contributors 2026-01-25 17:38:23 -08:00
Shivam Sharma d64020e024 removing the custom rule. 2026-01-26 07:05:43 +05:30
Shivam Sharma 975a002796 1. fixing quickstart.sh to use uv.
2. giving core and tools separate venv.

yea thats all.
2026-01-26 06:53:52 +05:30
Shivam Sharma 6e6b83848f uv sync 2026-01-26 06:53:52 +05:30
Shivam Sharma 3fb255c906 added agents.md 2026-01-26 06:53:52 +05:30
Shivam Sharma cd51d663fb make it fast using uv(package manager), ruff(linter) and ty(type
checker).

1. added an agents.md file for better ai assistance.
2. repalced pip with uv and added ty type checker.
2026-01-26 06:53:52 +05:30
Bryan @ Aden 28b0b6206b Merge pull request #470 from adenhq/chore-fix-python-tests
chore: fixed python tests
2026-01-25 17:19:57 -08:00
bryan 9859dc65e0 chore: fixed python tests 2026-01-25 17:19:21 -08:00
Bryan @ Aden 5c2288fbf5 Merge pull request #397 from Tahir-yamin/fix/respect-node-max-retries
fix(graph): Respect node_spec.max_retries configuration
2026-01-25 17:07:51 -08:00
yumosx 1b47d1cad4 Merge remote-tracking branch 'upstream/main' into graph 2026-01-26 09:06:20 +08:00
Bryan @ Aden 126bbf17c3 Merge pull request #228 from himanshu748/fix/remove-duplicate-web-search-registration
fix: remove duplicate web_search tool registration
2026-01-25 17:04:07 -08:00
Timothy 995ab8faaf fix: allow triage for all issues 2026-01-25 16:58:30 -08:00
RichardTang-Aden 9d1b1ab9d4 Merge pull request #187 from RussellLuo/improve-runtime-config
feat(skills): add support for setting `api_key` and `api_base` in RuntimeConfig
2026-01-25 16:45:21 -08:00
Bryan @ Aden 7e630b9416 Merge pull request #259 from charan2456/fix/docs-exports-clarification
docs: clarify that exports/ is user-generated, not included in repo
2026-01-25 16:27:15 -08:00
Timothy 14faca3933 fix: remove oidc token permission check 2026-01-25 16:22:19 -08:00
Timothy e8c9cc65dc chore: use GITHUB_TOKEN in action 2026-01-25 16:19:16 -08:00
RichardTang-Aden f0deedb1f8 Merge pull request #174 from AysunItai/fix/anthropic-provider-response-format
fix: align AnthropicProvider.complete with LLMProvider (response_format)
2026-01-25 16:01:16 -08:00
Bryan @ Aden 70693f4824 Merge pull request #231 from himanshu748/docs/update-skills-directory-structure
docs: update skills directory structure to match actual output
2026-01-25 15:57:27 -08:00
RichardTang-Aden 3ee380d98f Merge pull request #166 from LunaStev/translate/korean
Translate Korean
2026-01-25 15:54:14 -08:00
Timothy @aden b9b0c2c844 Merge pull request #451 from adenhq/add-claude-github-actions-1769383167894
Add claude GitHub actions 1769383167894
2026-01-25 15:41:07 -08:00
bryan c53acfdf77 set model 2026-01-25 15:39:31 -08:00
bryan 08beffea33 added claude issue triage workflow 2026-01-25 15:31:10 -08:00
Bryan @ Aden 7ed5006a70 "Claude Code Review workflow" 2026-01-25 15:19:29 -08:00
Bryan @ Aden e009de1c9a "Claude PR Assistant workflow" 2026-01-25 15:19:28 -08:00
Pradyumn Tendulkar df7b950e6f fix(graph): check entire string for code indicators in hallucination detection
Previously, the hallucination detection in SharedMemory.write() and
OutputValidator.validate_no_hallucination() only checked the first 500
characters for code indicators. This allowed hallucinated code to bypass
detection by prefixing with innocuous text.

Changes:
- Add _contains_code_indicators() method to SharedMemory and OutputValidator
- Check entire string for strings under 10KB
- Use strategic sampling (start, 25%, 50%, 75%, end) for longer strings
- Expand code indicators to include JavaScript, SQL, and HTML/script patterns
- Add comprehensive test suite with 19 test cases

Fixes #443

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-25 18:06:09 -05:00
Fernando Mano 7f3bc811b0 fix(runtime): execution stream memory leak -- adjust gitignore 2026-01-25 19:42:47 -03:00
guillermop2002 f0c9d4e87f fix(llm): use LiteLLMProvider instead of hardcoded AnthropicProvider
Fixes #213
2026-01-25 22:19:29 +01:00
Nihal Morshed 57781c520e docs(README): update tool names and descriptions in README inside "tools" 2026-01-26 03:17:28 +06:00
Nihal Morshed 05b18fb312 fix(tools): remove duplicate registration of web search tool 2026-01-26 03:06:50 +06:00
Fernando Mano 829783749c fix(runtime): execution stream memory leak 2026-01-25 17:21:05 -03:00
Shamanth-8 48b38e5d95 Fix: Unsanitized expression evaluation needs fix to use the safe evaluator 2026-01-25 23:56:01 +05:30
Tahir Yamin 1527a05336 fix(graph): Respect node_spec.max_retries configuration
- Remove hardcoded max_retries_per_node = 3
- Use node_spec.max_retries for all retry logic
- Add comprehensive test suite (6 test cases)
- Allows per-node retry configuration as intended

Fixes #363
2026-01-25 23:06:26 +05:00
vakrahul 491e6585a4 fix(graph): implement exponential backoff for node retries 2026-01-25 23:09:09 +05:30
koushith 8333ba6ec2 fix(docs): remove hardcoded path and add venv troubleshooting
- Replace hardcoded /home/timothy/oss/hive/ with generic instruction
- Add troubleshooting section for PEP 668 externally-managed-environment error
- Document virtual environment setup for Python 3.12+ on macOS/WSL/Linux

Fixes #322
Fixes #355
2026-01-25 22:22:45 +05:30
yumosx a5fcb89991 feat(file_system_toolkits): add encoding and max_size params to view_file
Add support for custom file encoding and size limits when viewing files. The max_size parameter prevents loading excessively large files by truncating content and adding a warning message when the limit is exceeded. Also includes validation for negative max_size values and checks if path is a file.
2026-01-25 21:53:51 +08:00
kali 3fd8f9f97a fix: enhance Python detection and error handling in setup script 2026-01-25 19:11:52 +05:30
kali 2180a60c21 Fix setup-python.sh to prefer python3.12/3.11 and support PYTHON override 2026-01-25 18:56:27 +05:30
Adith L S f64820a13e fix(cli): fix KeyError 'steps' in cmd_list function
The cmd_list function stored node count as 'nodes' but tried to
access it as 'steps', causing a KeyError when listing agents.

Changed agent['steps'] to agent['nodes'] to match the dict key.
2026-01-25 18:25:03 +05:30
Kotapati Venkata Sai Charan 073be1f870 docs: clarify that exports/ is user-generated, not included in repo
Fixes #202

- Update docs/getting-started.md to explain exports/ is created by users

- Remove references to non-existent support_ticket_agent example

- Update DEVELOPER.md with correct agent creation instructions
2026-01-25 18:10:06 +05:30
himanshu748 86686fc8f9 docs: update skills directory structure to match actual output
- Update .claude/skills/ structure in getting-started.md
- Reflect actual skills generated by quickstart.sh:
  - agent-workflow/
  - building-agents-construction/
  - building-agents-core/
  - building-agents-patterns/
  - testing-agent/

Fixes #177
2026-01-25 07:10:46 -05:00
himanshu748 8fe51a8aa9 fix: remove duplicate web_search tool registration
- Remove redundant register_web_search(mcp) call on line 54
- Keep single registration with credentials parameter
- Tool implementation handles both credential sources internally
- Added clarifying comment explaining the credential handling

Fixes #172
2026-01-25 07:05:13 -05:00
Chrishabh2002 715df547bb chore: remove generated agent logs and ignore them 2026-01-25 17:23:50 +05:30
Chrishabh2002 c454870ac8 add code-first agent example and isolate core dependencies 2026-01-25 17:21:58 +05:30
RussellLuo 68766fd131 fix(skills): load MCP servers correctly
Closes #188.
2026-01-25 17:34:34 +08:00
RussellLuo ce39cb7dde feat(skills): add support for setting api_key and api_base
Closes #186.
2026-01-25 16:05:13 +08:00
patrick e1663793c7 feat: add nullable_output_keys 2026-01-24 22:43:42 +08:00
Aysun Itai e2f387965e fix: align AnthropicProvider.complete with LLMProvider (response_format)
Update AnthropicProvider.complete to accept response_format and forward it to LiteLLMProvider.
Added unit test in test_litellm_provider.py to verify parameter forwarding.
2026-01-24 11:59:53 +02:00
LunaStev e75253f16a add missed 2026-01-24 15:05:26 +09:00
LunaStev 7d416f5421 translate korean 2026-01-24 15:00:38 +09:00
Timothy @aden cdbcac68b8 Merge pull request #165 from adenhq/staging
staging to main
2026-01-23 18:40:12 -08:00
Timothy @aden d52b6e8e56 Merge pull request #164 from TimothyZhang7/feature/multi-entrypoint-arch
Feature/multi entrypoint arch
2026-01-23 18:39:37 -08:00
Timothy 510975619d fix: register mcp tools properly, load parent env 2026-01-23 18:32:04 -08:00
Timothy 49724b6da0 Merge branch 'staging' into feature/multi-entrypoint-arch 2026-01-23 17:05:33 -08:00
Timothy @aden 31b252c018 Merge pull request #159 from bryanadenhq/fix-json-output
Chore:Small bug fixes with json output
2026-01-23 16:59:41 -08:00
Timothy dd2254989f fix: adjust tool credential check 2026-01-23 16:56:44 -08:00
Timothy 7aa56b905c feat: framework guardrails 2026-01-23 16:31:46 -08:00
Timothy 9f4948edbe fix: agent building skills 2026-01-23 15:28:51 -08:00
Timothy 2765c9fe93 feat: concurrent framework entrypoints 2026-01-23 15:02:55 -08:00
bryan 8f223ee564 Merge branch 'staging' into fix-json-output 2026-01-23 14:48:42 -08:00
bryan b0e870d1db updated output to clean json, update set goal, changed llm to llm_generate 2026-01-23 14:27:45 -08:00
Timothy @aden 17bfbf9732 Merge pull request #152 from TimothyZhang7/feature/spec
Feature/spec
2026-01-23 12:09:10 -08:00
Timothy da0c0acdcf Merge branch 'staging' into feature/spec 2026-01-23 12:04:09 -08:00
Timothy @aden ea4c56108b Merge pull request #149 from bryanadenhq/fix-remove-llm-from-mcp
Fix remove llm from mcp
2026-01-23 12:00:45 -08:00
Timothy @aden 73ba72ee52 Merge pull request #151 from adenhq/main
Align staging branch
2026-01-23 11:51:32 -08:00
bryan f67e0cc4ae cli and documentation updates 2026-01-23 11:31:10 -08:00
bryan 8d4f107f63 removed all llm dependencies from mcp server 2026-01-23 11:15:24 -08:00
Timothy 7c6c3a8cc2 feat: node I/O cleaner 2026-01-22 19:59:29 -08:00
Timothy 5930a3c95d chore: llm provider note 2026-01-22 16:15:52 -08:00
694 changed files with 171233 additions and 33827 deletions
+9
View File
@@ -0,0 +1,9 @@
{
"mcpServers": {
"agent-builder": {
"command": "uv",
"args": ["run", "--directory", "core", "-m", "framework.mcp.agent_builder_server"],
"disabled": false
}
}
}
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-concepts
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-create
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-credentials
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-patterns
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-test
+5
View File
@@ -0,0 +1,5 @@
---
description: hive-concepts
---
use hive-concepts skill
+5
View File
@@ -0,0 +1,5 @@
---
description: hive-create
---
use hive-create skill
+5
View File
@@ -0,0 +1,5 @@
---
description: hive-credentials
---
use hive-credentials skill
+5
View File
@@ -0,0 +1,5 @@
---
description: hive-patterns
---
use hive-patterns skill
+5
View File
@@ -0,0 +1,5 @@
---
description: hive-test
---
use hive-test skill
+5
View File
@@ -0,0 +1,5 @@
---
description: hive
---
use hive skill
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-concepts
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-create
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-credentials
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-patterns
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-test
+15
View File
@@ -0,0 +1,15 @@
{
"hooks": {
"PostToolUse": [
{
"matcher": "Edit|Write|NotebookEdit",
"hooks": [
{
"type": "command",
"command": "ruff check --fix \"$CLAUDE_FILE_PATH\" 2>/dev/null; ruff format \"$CLAUDE_FILE_PATH\" 2>/dev/null; true"
}
]
}
]
}
}
-23
View File
@@ -1,23 +0,0 @@
{
"permissions": {
"allow": [
"Bash(npm install:*)",
"Bash(npm test:*)",
"Skill(building-agents-construction)",
"Skill(building-agents-construction:*)",
"Bash(PYTHONPATH=core:exports pytest:*)",
"mcp__agent-builder__create_session",
"mcp__agent-builder__get_session_status",
"mcp__agent-builder__set_goal",
"mcp__agent-builder__list_mcp_servers",
"mcp__agent-builder__test_node",
"mcp__agent-builder__add_node",
"mcp__agent-builder__add_edge",
"mcp__agent-builder__validate_graph",
"Bash(ruff check:*)",
"Bash(PYTHONPATH=core:exports python:*)",
"mcp__agent-builder__list_tests",
"mcp__agent-builder__generate_constraint_tests"
]
}
}
+34
View File
@@ -0,0 +1,34 @@
{
"permissions": {
"allow": [
"mcp__agent-builder__create_session",
"mcp__agent-builder__set_goal",
"mcp__agent-builder__add_node",
"mcp__agent-builder__add_edge",
"mcp__agent-builder__configure_loop",
"mcp__agent-builder__add_mcp_server",
"mcp__agent-builder__validate_graph",
"mcp__agent-builder__export_graph",
"mcp__agent-builder__load_session_by_id",
"Bash(git status:*)",
"Bash(gh run view:*)",
"Bash(uv run:*)",
"Bash(env:*)",
"mcp__agent-builder__test_node",
"mcp__agent-builder__list_mcp_tools",
"Bash(python -m py_compile:*)",
"Bash(python -m pytest:*)",
"Bash(source:*)",
"mcp__agent-builder__update_node",
"mcp__agent-builder__check_missing_credentials",
"mcp__agent-builder__list_stored_credentials",
"Bash(find:*)",
"mcp__agent-builder__run_tests",
"Bash(PYTHONPATH=core:exports:tools/src uv run pytest:*)",
"mcp__agent-builder__list_agent_sessions",
"mcp__agent-builder__generate_constraint_tests",
"mcp__agent-builder__generate_success_tests"
]
},
"enabledMcpjsonServers": ["agent-builder", "tools"]
}
@@ -1,953 +0,0 @@
---
name: building-agents-construction
description: Step-by-step guide for building goal-driven agents. Creates package structure, defines goals, adds nodes, connects edges, and finalizes agent class. Use when actively building an agent.
license: Apache-2.0
metadata:
author: hive
version: "1.0"
type: procedural
part_of: building-agents
requires: building-agents-core
---
# Building Agents - Construction Process
Step-by-step guide for building goal-driven agent packages.
**Prerequisites:** Read `building-agents-core` for fundamental concepts.
## CRITICAL: entry_points Format Reference
**⚠️ Common Mistake Prevention:**
The `entry_points` parameter in GraphSpec has a specific format that is easy to get wrong. This section exists because this mistake has caused production bugs.
### Correct Format
```python
entry_points = {"start": "first-node-id"}
```
**Examples from working agents:**
```python
# From exports/outbound_sales_agent/agent.py
entry_node = "lead-qualification"
entry_points = {"start": "lead-qualification"}
# From exports/support_ticket_agent/agent.py (FIXED)
entry_node = "parse-ticket"
entry_points = {"start": "parse-ticket"}
```
### WRONG Formats (DO NOT USE)
```python
# ❌ WRONG: Using node ID as key with input keys as value
entry_points = {
"parse-ticket": ["ticket_content", "customer_id", "ticket_id"]
}
# Error: ValidationError: Input should be a valid string, got list
# ❌ WRONG: Using set instead of dict
entry_points = {"parse-ticket"}
# Error: ValidationError: Input should be a valid dictionary, got set
# ❌ WRONG: Missing "start" key
entry_points = {"entry": "parse-ticket"}
# Error: Graph execution fails, cannot find entry point
```
### Validation Check
After writing graph configuration, ALWAYS validate:
```python
# Check 1: Must be a dict
assert isinstance(entry_points, dict), f"entry_points must be dict, got {type(entry_points)}"
# Check 2: Must have "start" key
assert "start" in entry_points, f"entry_points must have 'start' key, got keys: {entry_points.keys()}"
# Check 3: "start" value must match entry_node
assert entry_points["start"] == entry_node, f"entry_points['start']={entry_points['start']} must match entry_node={entry_node}"
# Check 4: Value must be a string (node ID)
assert isinstance(entry_points["start"], str), f"entry_points['start'] must be string, got {type(entry_points['start'])}"
```
**Why this matters:** GraphSpec uses Pydantic validation. The wrong format causes ValidationError at runtime, which blocks all agent execution and tests. This bug is not caught until you try to run the agent.
## Building Session Management with MCP
**MANDATORY**: Use the agent-builder MCP server's BuildSession system for automatic bookkeeping and persistence.
### Available MCP Session Tools
```python
# Create new session (call FIRST before building)
mcp__agent-builder__create_session(name="Support Ticket Agent")
# Returns: session_id, automatically sets as active session
# Get current session status (use for progress tracking)
status = mcp__agent-builder__get_session_status()
# Returns: {
# "session_id": "build_20250122_...",
# "name": "Support Ticket Agent",
# "has_goal": true,
# "node_count": 5,
# "edge_count": 7,
# "nodes": ["parse-ticket", "categorize", ...],
# "edges": [("parse-ticket", "categorize"), ...]
# }
# List all saved sessions
mcp__agent-builder__list_sessions()
# Load previous session
mcp__agent-builder__load_session_by_id(session_id="build_...")
# Delete session
mcp__agent-builder__delete_session(session_id="build_...")
```
### How MCP Session Works
The BuildSession class (in `core/framework/mcp/agent_builder_server.py`) automatically:
- **Persists to disk** after every operation (`_save_session()` called automatically)
- **Tracks all components**: goal, nodes, edges, mcp_servers
- **Maintains timestamps**: created_at, last_modified
- **Stores to**: `~/.claude-code-agent-builder/sessions/`
When you call MCP tools like:
- `mcp__agent-builder__set_goal(...)` - Automatically added to session.goal and saved
- `mcp__agent-builder__add_node(...)` - Automatically added to session.nodes and saved
- `mcp__agent-builder__add_edge(...)` - Automatically added to session.edges and saved
**No manual bookkeeping needed** - the MCP server handles it all!
### Show Progress to User
```python
# Get session status to show progress
status = json.loads(mcp__agent-builder__get_session_status())
print(f"\n📊 Building Progress:")
print(f" Session: {status['name']}")
print(f" Goal defined: {status['has_goal']}")
print(f" Nodes: {status['node_count']}")
print(f" Edges: {status['edge_count']}")
print(f" Nodes added: {', '.join(status['nodes'])}")
```
**Benefits:**
- Automatic persistence - survive crashes/restarts
- Clear audit trail - all operations logged
- Session resume - continue from where you left off
- Progress tracking built-in
- No manual state management needed
## Step-by-Step Guide
### Step 1: Create Building Session & Package Structure
When user requests an agent, **immediately create MCP session and package**:
```python
# 0. FIRST: Create MCP building session
agent_name = "technical_research_agent" # snake_case
session_result = mcp__agent-builder__create_session(name=agent_name.replace('_', ' ').title())
session_id = json.loads(session_result)["session_id"]
print(f"✅ Created building session: {session_id}")
# 1. Create directory
package_path = f"exports/{agent_name}"
Bash(f"mkdir -p {package_path}/nodes")
# 2. Write skeleton files
Write(
file_path=f"{package_path}/__init__.py",
content='''"""
Agent package - will be populated as build progresses.
"""
'''
)
Write(
file_path=f"{package_path}/nodes/__init__.py",
content='''"""Node definitions."""
from framework.graph import NodeSpec
# Nodes will be added here as they are approved
__all__ = []
'''
)
Write(
file_path=f"{package_path}/agent.py",
content='''"""Agent graph construction."""
from framework.graph import EdgeSpec, EdgeCondition, Goal, SuccessCriterion, Constraint
from framework.graph.edge import GraphSpec
from framework.graph.executor import GraphExecutor
from framework.runtime import Runtime
from framework.llm.anthropic import AnthropicProvider
from framework.runner.tool_registry import ToolRegistry
from aden_tools.credentials import CredentialManager
# Goal will be added when defined
# Nodes will be imported from .nodes
# Edges will be added when approved
# Agent class will be created when graph is complete
'''
)
Write(
file_path=f"{package_path}/config.py",
content='''"""Runtime configuration."""
from dataclasses import dataclass
@dataclass
class RuntimeConfig:
model: str = "claude-haiku-4-5-20251001"
temperature: float = 0.7
max_tokens: int = 4096
default_config = RuntimeConfig()
# Metadata will be added when goal is set
'''
)
Write(
file_path=f"{package_path}/__main__.py",
content=CLI_TEMPLATE # Full CLI template (see below)
)
```
**Show user:**
```
✅ Package created: exports/technical_research_agent/
📁 Files created:
- __init__.py (skeleton)
- __main__.py (CLI ready)
- agent.py (skeleton)
- nodes/__init__.py (empty)
- config.py (skeleton)
You can open these files now and watch them grow as we build!
```
### Step 2: Define Goal
Propose goal, get approval, **write immediately**:
```python
# After user approves goal...
goal_code = f'''
goal = Goal(
id="{goal_id}",
name="{name}",
description="{description}",
success_criteria=[
SuccessCriterion(
id="{sc.id}",
description="{sc.description}",
metric="{sc.metric}",
target="{sc.target}",
weight={sc.weight},
),
# 3-5 success criteria total
],
constraints=[
Constraint(
id="{c.id}",
description="{c.description}",
constraint_type="{c.constraint_type}",
category="{c.category}",
),
# 1-5 constraints total
],
)
'''
# Append to agent.py
Read(f"{package_path}/agent.py") # Must read first
Edit(
file_path=f"{package_path}/agent.py",
old_string="# Goal will be added when defined",
new_string=f"# Goal definition\n{goal_code}"
)
# Write metadata to config.py
metadata_code = f'''
@dataclass
class AgentMetadata:
name: str = "{name}"
version: str = "1.0.0"
description: str = "{description}"
metadata = AgentMetadata()
'''
Read(f"{package_path}/config.py")
Edit(
file_path=f"{package_path}/config.py",
old_string="# Metadata will be added when goal is set",
new_string=f"# Agent metadata\n{metadata_code}"
)
```
**Show user:**
```
✅ Goal written to agent.py
✅ Metadata written to config.py
Open exports/technical_research_agent/agent.py to see the goal!
```
**Note:** Goal is automatically tracked in MCP session. Use `mcp__agent-builder__get_session_status()` to check progress.
### Step 3: Add Nodes (Incremental)
**⚠️ CRITICAL VALIDATION REQUIREMENTS:**
Before adding any node with tools:
1. Call `mcp__agent-builder__list_mcp_tools()` to discover available tools
2. Verify each tool exists in the response
3. If a tool doesn't exist, inform the user and ask how to proceed
After writing each node:
4. **MANDATORY**: Validate with `mcp__agent-builder__test_node()` before proceeding
5. **MANDATORY**: Check MCP session status to track progress
6. Only proceed to next node after validation passes
For each node, **write immediately after approval**:
```python
# After user approves node...
node_code = f'''
{node_id.replace('-', '_')}_node = NodeSpec(
id="{node_id}",
name="{name}",
description="{description}",
node_type="{node_type}",
input_keys={input_keys},
output_keys={output_keys},
system_prompt="""\\
{system_prompt}
""",
tools={tools},
max_retries={max_retries},
)
'''
# Append to nodes/__init__.py
Read(f"{package_path}/nodes/__init__.py")
Edit(
file_path=f"{package_path}/nodes/__init__.py",
old_string="__all__ = []",
new_string=f"{node_code}\n__all__ = []"
)
# Update __all__ exports
all_node_names = [n.replace('-', '_') + '_node' for n in approved_nodes]
all_exports = f"__all__ = {all_node_names}"
Edit(
file_path=f"{package_path}/nodes/__init__.py",
old_string="__all__ = []",
new_string=all_exports
)
```
**Show user after each node:**
```
✅ Added analyze_request_node to nodes/__init__.py
📊 Progress: 1/6 nodes added
Open exports/technical_research_agent/nodes/__init__.py to see it!
```
**Repeat for each node.** User watches the file grow.
#### MANDATORY: Validate Each Node with MCP Tools
After writing EVERY node, you MUST validate before proceeding:
```python
# Node is already written to file. Now VALIDATE IT (REQUIRED):
validation_result = json.loads(mcp__agent-builder__test_node(
node_id="analyze-request",
test_input='{"query": "test query"}',
mock_llm_response='{"analysis": "mock output"}'
))
# Check validation result
if validation_result["valid"]:
# Show user validation passed
print(f"✅ Node validation passed: analyze-request")
# Show session progress
status = json.loads(mcp__agent-builder__get_session_status())
print(f"📊 Session progress: {status['node_count']} nodes added")
else:
# STOP - Do not proceed until fixed
print(f"❌ Node validation FAILED:")
for error in validation_result["errors"]:
print(f" - {error}")
print("⚠️ Must fix node before proceeding to next component")
# Ask user how to proceed
```
**CRITICAL:** Do NOT proceed to the next node until validation passes. Bugs caught here prevent wasted work later.
### Step 4: Connect Edges
After all nodes approved, add edges:
```python
# Generate edges code
edges_code = "edges = [\n"
for edge in approved_edges:
edges_code += f''' EdgeSpec(
id="{edge.id}",
source="{edge.source}",
target="{edge.target}",
condition=EdgeCondition.{edge.condition.upper()},
'''
if edge.condition_expr:
edges_code += f' condition_expr="{edge.condition_expr}",\n'
edges_code += f' priority={edge.priority},\n'
edges_code += ' ),\n'
edges_code += "]\n"
# Write to agent.py
Read(f"{package_path}/agent.py")
Edit(
file_path=f"{package_path}/agent.py",
old_string="# Edges will be added when approved",
new_string=f"# Edge definitions\n{edges_code}"
)
# Write entry points and terminal nodes
# ⚠️ CRITICAL: entry_points format must be {"start": "node_id"}
# Common mistake: {"node_id": ["input_keys"]} is WRONG
# Correct format: {"start": "first-node-id"}
# Reference: See exports/outbound_sales_agent/agent.py for example
graph_config = f'''
# Graph configuration
entry_node = "{entry_node_id}"
entry_points = {{"start": "{entry_node_id}"}} # CRITICAL: Must be {{"start": "node-id"}}
pause_nodes = {pause_nodes}
terminal_nodes = {terminal_nodes}
# Collect all nodes
nodes = [
{', '.join(node_names)},
]
'''
Edit(
file_path=f"{package_path}/agent.py",
old_string="# Agent class will be created when graph is complete",
new_string=graph_config
)
```
**Show user:**
```
✅ Edges written to agent.py
✅ Graph configuration added
5 edges connecting 6 nodes
```
#### MANDATORY: Validate Graph Structure
After writing edges, you MUST validate before proceeding to finalization:
```python
# Edges already written to agent.py. Now VALIDATE STRUCTURE (REQUIRED):
graph_validation = json.loads(mcp__agent-builder__validate_graph())
# Check for structural issues
if graph_validation["valid"]:
print("✅ Graph structure validated successfully")
# Show session summary
status = json.loads(mcp__agent-builder__get_session_status())
print(f" - Nodes: {status['node_count']}")
print(f" - Edges: {status['edge_count']}")
print(f" - Entry point: {entry_node_id}")
else:
print("❌ Graph validation FAILED:")
for error in graph_validation["errors"]:
print(f" ERROR: {error}")
print("\n⚠️ Must fix graph structure before finalizing agent")
# Ask user how to proceed
# Additional validation: Check entry_points format
if not isinstance(entry_points, dict):
print("❌ CRITICAL ERROR: entry_points must be a dict")
print(f" Current value: {entry_points} (type: {type(entry_points)})")
print(" Correct format: {'start': 'node-id'}")
# STOP - This is the mistake that caused the support_ticket_agent bug
if entry_points.get("start") != entry_node_id:
print("❌ CRITICAL ERROR: entry_points['start'] must match entry_node")
print(f" entry_points: {entry_points}")
print(f" entry_node: {entry_node_id}")
print(" They must be consistent!")
```
**CRITICAL:** Do NOT proceed to Step 5 (finalization) until graph validation passes. This checkpoint prevents structural bugs from reaching production.
### Step 5: Finalize Agent Class
**Pre-flight checks before finalization:**
```python
# MANDATORY: Verify all validations passed before finalizing
print("\n🔍 Pre-finalization Checklist:")
# Get current session status
status = json.loads(mcp__agent-builder__get_session_status())
checks_passed = True
# Check 1: Goal defined
if not status["has_goal"]:
print("❌ No goal defined")
checks_passed = False
else:
print(f"✅ Goal defined: {status['goal_name']}")
# Check 2: Nodes added
if status["node_count"] == 0:
print("❌ No nodes added")
checks_passed = False
else:
print(f"{status['node_count']} nodes added: {', '.join(status['nodes'])}")
# Check 3: Edges added
if status["edge_count"] == 0:
print("❌ No edges added")
checks_passed = False
else:
print(f"{status['edge_count']} edges added")
# Check 4: Entry points format correct
if not isinstance(entry_points, dict) or "start" not in entry_points:
print("❌ CRITICAL: entry_points format incorrect")
print(f" Current: {entry_points}")
print(" Required: {'start': 'node-id'}")
checks_passed = False
else:
print(f"✅ Entry points valid: {entry_points}")
if not checks_passed:
print("\n⚠️ CANNOT PROCEED to finalization until all checks pass")
print(" Fix the issues above first")
# Ask user how to proceed or stop here
return
print("\n✅ All pre-flight checks passed - proceeding to finalization\n")
```
Write the agent class:
````python
agent_class_code = f'''
class {agent_class_name}:
"""
{agent_description}
"""
def __init__(self, config=None):
self.config = config or default_config
self.goal = goal
self.nodes = nodes
self.edges = edges
self.entry_node = entry_node
self.entry_points = entry_points
self.pause_nodes = pause_nodes
self.terminal_nodes = terminal_nodes
self.executor = None
def _create_executor(self, mock_mode=False):
"""Create executor instance."""
import tempfile
from pathlib import Path
storage_path = Path(tempfile.gettempdir()) / "{agent_name}"
storage_path.mkdir(parents=True, exist_ok=True)
runtime = Runtime(storage_path=storage_path)
tool_registry = ToolRegistry()
llm = None
if not mock_mode:
creds = CredentialManager()
if creds.is_available("anthropic"):
api_key = creds.get("anthropic")
llm = AnthropicProvider(api_key=api_key, model=self.config.model)
graph = GraphSpec(
id="{agent_name}-graph",
goal_id=self.goal.id,
version="1.0.0",
entry_node=self.entry_node,
entry_points=self.entry_points,
terminal_nodes=self.terminal_nodes,
pause_nodes=self.pause_nodes,
nodes=self.nodes,
edges=self.edges,
default_model=self.config.model,
max_tokens=self.config.max_tokens,
)
self.executor = GraphExecutor(
runtime=runtime,
llm=llm,
tools=list(tool_registry.get_tools().values()),
tool_executor=tool_registry.get_executor(),
)
self.graph = graph
return self.executor
async def run(self, context: dict, mock_mode=False, session_state=None):
"""Run the agent."""
executor = self._create_executor(mock_mode=mock_mode)
result = await executor.execute(
graph=self.graph,
goal=self.goal,
input_data=context,
session_state=session_state,
)
return result
def info(self):
"""Get agent information."""
return {{
"name": metadata.name,
"version": metadata.version,
"description": metadata.description,
"goal": {{
"name": self.goal.name,
"description": self.goal.description,
}},
"nodes": [n.id for n in self.nodes],
"edges": [e.id for e in self.edges],
"entry_node": self.entry_node,
"pause_nodes": self.pause_nodes,
"terminal_nodes": self.terminal_nodes,
}}
def validate(self):
"""Validate agent structure."""
errors = []
warnings = []
node_ids = {{node.id for node in self.nodes}}
for edge in self.edges:
if edge.source not in node_ids:
errors.append(f"Edge {{edge.id}}: source '{{edge.source}}' not found")
if edge.target not in node_ids:
errors.append(f"Edge {{edge.id}}: target '{{edge.target}}' not found")
if self.entry_node not in node_ids:
errors.append(f"Entry node '{{self.entry_node}}' not found")
return {{
"valid": len(errors) == 0,
"errors": errors,
"warnings": warnings,
}}
# Create default instance
default_agent = {agent_class_name}()
'''
# Append agent class
Read(f"{package_path}/agent.py")
Edit(
file_path=f"{package_path}/agent.py",
old_string="nodes = [",
new_string=f"nodes = [\n{agent_class_code}"
)
# Finalize __init__.py exports
init_content = f'''"""
{agent_description}
"""
from .agent import {agent_class_name}, default_agent, goal, nodes, edges
from .config import RuntimeConfig, AgentMetadata, default_config, metadata
__version__ = "1.0.0"
__all__ = [
"{agent_class_name}",
"default_agent",
"goal",
"nodes",
"edges",
"RuntimeConfig",
"AgentMetadata",
"default_config",
"metadata",
]
'''
Read(f"{package_path}/__init__.py")
Edit(
file_path=f"{package_path}/__init__.py",
old_string='"""',
new_string=init_content,
replace_all=True
)
# Write README
readme_content = f'''# {agent_name.replace('_', ' ').title()}
{agent_description}
## Usage
```bash
# Show agent info
python -m {agent_name} info
# Validate structure
python -m {agent_name} validate
# Run agent
python -m {agent_name} run --input '{{"key": "value"}}'
# Interactive shell
python -m {agent_name} shell
````
## As Python Module
```python
from {agent_name} import default_agent
result = await default_agent.run({{"key": "value"}})
```
## Structure
- `agent.py` - Goal, edges, graph construction
- `nodes/__init__.py` - Node definitions
- `config.py` - Runtime configuration
- `__main__.py` - CLI interface
'''
Write(
file_path=f"{package_path}/README.md",
content=readme_content
)
```
**Show user:**
```
✅ Agent class written to agent.py
✅ Package exports finalized in __init__.py
✅ README.md generated
🎉 Agent complete: exports/technical_research_agent/
Commands:
python -m technical_research_agent info
python -m technical_research_agent validate
python -m technical_research_agent run --input '{"topic": "..."}'
```
**Final session summary:**
```python
# Show final MCP session status
status = json.loads(mcp__agent-builder__get_session_status())
print("\n📊 Build Session Summary:")
print(f" Session ID: {status['session_id']}")
print(f" Agent: {status['name']}")
print(f" Goal: {status['goal_name']}")
print(f" Nodes: {status['node_count']}")
print(f" Edges: {status['edge_count']}")
print(f" MCP Servers: {status['mcp_servers_count']}")
print("\n✅ Agent construction complete with full validation")
print(f"\nSession saved to: ~/.claude-code-agent-builder/sessions/{status['session_id']}.json")
````
## CLI Template
```python
CLI_TEMPLATE = '''"""
CLI entry point for agent.
"""
import asyncio
import json
import sys
import click
from .agent import default_agent
@click.group()
@click.version_option(version="1.0.0")
def cli():
"""Agent CLI."""
pass
@cli.command()
@click.option("--input", "-i", "input_json", type=str, required=True)
@click.option("--mock", is_flag=True, help="Run in mock mode")
@click.option("--quiet", "-q", is_flag=True, help="Only output result JSON")
def run(input_json, mock, quiet):
"""Execute the agent."""
try:
context = json.loads(input_json)
except json.JSONDecodeError as e:
click.echo(f"Error parsing input JSON: {e}", err=True)
sys.exit(1)
if not quiet:
click.echo(f"Running agent with input: {json.dumps(context)}")
result = asyncio.run(default_agent.run(context, mock_mode=mock))
output_data = {
"success": result.success,
"steps_executed": result.steps_executed,
"output": result.output,
}
if result.error:
output_data["error"] = result.error
if result.paused_at:
output_data["paused_at"] = result.paused_at
click.echo(json.dumps(output_data, indent=2, default=str))
sys.exit(0 if result.success else 1)
@cli.command()
@click.option("--json", "output_json", is_flag=True)
def info(output_json):
"""Show agent information."""
info_data = default_agent.info()
if output_json:
click.echo(json.dumps(info_data, indent=2))
else:
click.echo(f"Agent: {info_data['name']}")
click.echo(f"Description: {info_data['description']}")
click.echo(f"Nodes: {len(info_data['nodes'])}")
click.echo(f"Edges: {len(info_data['edges'])}")
@cli.command()
def validate():
"""Validate agent structure."""
validation = default_agent.validate()
if validation["valid"]:
click.echo("✓ Agent is valid")
else:
click.echo("✗ Agent has errors:")
for error in validation["errors"]:
click.echo(f" ERROR: {error}")
sys.exit(0 if validation["valid"] else 1)
@cli.command()
def shell():
"""Interactive agent session."""
click.echo("Interactive mode - enter JSON input:")
# ... implementation
if __name__ == "__main__":
cli()
'''
````
## Testing During Build
After nodes are added:
```python
# Test individual node
python -c "
from exports.my_agent.nodes import analyze_request_node
print(analyze_request_node.id)
print(analyze_request_node.input_keys)
"
# Validate current state
PYTHONPATH=core:exports python -m my_agent validate
# Show info
PYTHONPATH=core:exports python -m my_agent info
```
## Approval Pattern
Use AskUserQuestion for all approvals:
```python
response = AskUserQuestion(
questions=[{
"question": "Do you approve this [component]?",
"header": "Approve",
"options": [
{
"label": "✓ Approve (Recommended)",
"description": "Component looks good, proceed"
},
{
"label": "✗ Reject & Modify",
"description": "Need to make changes"
},
{
"label": "⏸ Pause & Review",
"description": "Need more time to review"
}
],
"multiSelect": false
}]
)
```
## Next Steps
After completing construction:
**If agent structure complete:**
- Validate: `python -m agent_name validate`
- Test basic execution: `python -m agent_name info`
- Proceed to testing-agent skill for comprehensive tests
**If implementation needed:**
- Check for STATUS.md or IMPLEMENTATION_GUIDE.md in agent directory
- May need Python functions or MCP tool integration
## Related Skills
- **building-agents-core** - Fundamental concepts
- **building-agents-patterns** - Best practices and examples
- **testing-agent** - Test and validate completed agents
- **agent-workflow** - Complete workflow orchestrator
@@ -1,303 +0,0 @@
---
name: building-agents-core
description: Core concepts for goal-driven agents - architecture, node types, tool discovery, and workflow overview. Use when starting agent development or need to understand agent fundamentals.
license: Apache-2.0
metadata:
author: hive
version: "1.0"
type: foundational
part_of: building-agents
---
# Building Agents - Core Concepts
Foundational knowledge for building goal-driven agents as Python packages.
## Architecture: Python Services (Not JSON Configs)
Agents are built as Python packages:
```
exports/my_agent/
├── __init__.py # Package exports
├── __main__.py # CLI (run, info, validate, shell)
├── agent.py # Graph construction (goal, edges, agent class)
├── nodes/__init__.py # Node definitions (NodeSpec)
├── config.py # Runtime config
└── README.md # Documentation
```
**Key Principle: Agent is visible and editable during build**
- ✅ Files created immediately as components are approved
- ✅ User can watch files grow in their editor
- ✅ No session state - just direct file writes
- ✅ No "export" step - agent is ready when build completes
## Core Concepts
### Goal
Success criteria and constraints (written to agent.py)
```python
goal = Goal(
id="research-goal",
name="Technical Research Agent",
description="Research technical topics thoroughly",
success_criteria=[
SuccessCriterion(
id="completeness",
description="Cover all aspects of topic",
metric="coverage_score",
target=">=0.9",
weight=0.4,
),
# 3-5 success criteria total
],
constraints=[
Constraint(
id="accuracy",
description="All information must be verified",
constraint_type="hard",
category="quality",
),
# 1-5 constraints total
],
)
```
### Node
Unit of work (written to nodes/__init__.py)
**Node Types:**
- `llm_generate` - Text generation, parsing
- `llm_tool_use` - Actions requiring tools
- `router` - Conditional branching
- `function` - Deterministic operations
```python
search_node = NodeSpec(
id="search-web",
name="Search Web",
description="Search for information online",
node_type="llm_tool_use",
input_keys=["query"],
output_keys=["search_results"],
system_prompt="Search the web for: {query}",
tools=["web_search"],
max_retries=3,
)
```
### Edge
Connection between nodes (written to agent.py)
**Edge Conditions:**
- `on_success` - Proceed if node succeeds
- `on_failure` - Handle errors
- `always` - Always proceed
- `conditional` - Based on expression
```python
EdgeSpec(
id="search-to-analyze",
source="search-web",
target="analyze-results",
condition=EdgeCondition.ON_SUCCESS,
priority=1,
)
```
### Pause/Resume
Multi-turn conversations
- **Pause nodes** - Stop execution, wait for user input
- **Resume entry points** - Continue from pause with user's response
```python
# Example pause/resume configuration
pause_nodes = ["request-clarification"]
entry_points = {
"start": "analyze-request",
"request-clarification_resume": "process-clarification"
}
```
## Tool Discovery & Validation
**CRITICAL:** Before adding a node with tools, you MUST verify the tools exist.
Tools are provided by MCP servers. Never assume a tool exists - always discover dynamically.
### Step 1: Register MCP Server (if not already done)
```python
mcp__agent-builder__add_mcp_server(
name="tools",
transport="stdio",
command="python",
args='["mcp_server.py", "--stdio"]',
cwd="../tools"
)
```
### Step 2: Discover Available Tools
```python
# List all tools from all registered servers
mcp__agent-builder__list_mcp_tools()
# Or list tools from a specific server
mcp__agent-builder__list_mcp_tools(server_name="tools")
```
This returns available tools with their descriptions and parameters:
```json
{
"success": true,
"tools_by_server": {
"tools": [
{
"name": "web_search",
"description": "Search the web...",
"parameters": ["query"]
},
{
"name": "web_scrape",
"description": "Scrape a URL...",
"parameters": ["url"]
}
]
},
"total_tools": 14
}
```
### Step 3: Validate Before Adding Nodes
Before writing a node with `tools=[...]`:
1. Call `list_mcp_tools()` to get available tools
2. Check each tool in your node exists in the response
3. If a tool doesn't exist:
- **DO NOT proceed** with the node
- Inform the user: "The tool 'X' is not available. Available tools are: ..."
- Ask if they want to use an alternative or proceed without the tool
### Tool Validation Anti-Patterns
**Never assume a tool exists** - always call `list_mcp_tools()` first
**Never write a node with unverified tools** - validate before writing
**Never silently drop tools** - if a tool doesn't exist, inform the user
**Never guess tool names** - use exact names from discovery response
### Example Validation Flow
```python
# 1. User requests: "Add a node that searches the web"
# 2. Discover available tools
tools_response = mcp__agent-builder__list_mcp_tools()
# 3. Check if web_search exists
available = [t["name"] for tools in tools_response["tools_by_server"].values() for t in tools]
if "web_search" not in available:
# Inform user and ask how to proceed
print("'web_search' not available. Available tools:", available)
else:
# Proceed with node creation
# ...
```
## Workflow Overview: Incremental File Construction
```
1. CREATE PACKAGE → mkdir + write skeletons
2. DEFINE GOAL → Write to agent.py + config.py
3. FOR EACH NODE:
- Propose design
- User approves
- Write to nodes/__init__.py IMMEDIATELY ← FILE WRITTEN
- (Optional) Validate with test_node ← MCP VALIDATION
- User can open file and see it
4. CONNECT EDGES → Update agent.py ← FILE WRITTEN
- (Optional) Validate with validate_graph ← MCP VALIDATION
5. FINALIZE → Write agent class to agent.py ← FILE WRITTEN
6. DONE - Agent ready at exports/my_agent/
```
**Files written immediately. MCP tools optional for validation/testing bookkeeping.**
### The Key Difference
**OLD (Bad):**
```
MCP add_node → Session State → MCP add_node → Session State → ...
MCP export_graph
Files appear
```
**NEW (Good):**
```
Write node to file → (Optional: MCP test_node) → Write node to file → ...
↓ ↓
File visible File visible
immediately immediately
```
**Bottom line:** Use Write/Edit for construction, MCP for validation if needed.
## When to Use This Skill
Use building-agents-core when:
- Starting a new agent project and need to understand fundamentals
- Need to understand agent architecture before building
- Want to validate tool availability before proceeding
- Learning about node types, edges, and graph execution
**Next Steps:**
- Ready to build? → Use `building-agents-construction` skill
- Need patterns and examples? → Use `building-agents-patterns` skill
## MCP Tools for Validation
After writing files, optionally use MCP tools for validation:
**test_node** - Validate node configuration with mock inputs
```python
mcp__agent-builder__test_node(
node_id="search-web",
test_input='{"query": "test query"}',
mock_llm_response='{"results": "mock output"}'
)
```
**validate_graph** - Check graph structure
```python
mcp__agent-builder__validate_graph()
# Returns: unreachable nodes, missing connections, etc.
```
**create_session** - Track session state for bookkeeping
```python
mcp__agent-builder__create_session(session_name="my-build")
```
**Key Point:** Files are written FIRST. MCP tools are for validation only.
## Related Skills
- **building-agents-construction** - Step-by-step building process
- **building-agents-patterns** - Best practices and examples
- **agent-workflow** - Complete workflow orchestrator
- **testing-agent** - Test and validate completed agents
@@ -1,497 +0,0 @@
---
name: building-agents-patterns
description: Best practices, patterns, and examples for building goal-driven agents. Includes pause/resume architecture, hybrid workflows, anti-patterns, and handoff to testing. Use when optimizing agent design.
license: Apache-2.0
metadata:
author: hive
version: "1.0"
type: reference
part_of: building-agents
---
# Building Agents - Patterns & Best Practices
Design patterns, examples, and best practices for building robust goal-driven agents.
**Prerequisites:** Complete agent structure using `building-agents-construction`.
## Practical Example: Hybrid Workflow
How to build a node using both direct file writes and optional MCP validation:
```python
# 1. WRITE TO FILE FIRST (Primary - makes it visible)
node_code = '''
search_node = NodeSpec(
id="search-web",
node_type="llm_tool_use",
input_keys=["query"],
output_keys=["search_results"],
system_prompt="Search the web for: {query}",
tools=["web_search"],
)
'''
Edit(
file_path="exports/research_agent/nodes/__init__.py",
old_string="# Nodes will be added here",
new_string=node_code
)
print("✅ Added search_node to nodes/__init__.py")
print("📁 Open exports/research_agent/nodes/__init__.py to see it!")
# 2. OPTIONALLY VALIDATE WITH MCP (Secondary - bookkeeping)
validation = mcp__agent-builder__test_node(
node_id="search-web",
test_input='{"query": "python tutorials"}',
mock_llm_response='{"search_results": [...mock results...]}'
)
print(f"✓ Validation: {validation['success']}")
```
**User experience:**
- Immediately sees node in their editor (from step 1)
- Gets validation feedback (from step 2)
- Can edit the file directly if needed
This combines visibility (files) with validation (MCP tools).
## Pause/Resume Architecture
For agents needing multi-turn conversations with user interaction:
### Basic Pause/Resume Flow
```python
# Define pause nodes - execution stops at these nodes
pause_nodes = ["request-clarification", "await-approval"]
# Define entry points - where to resume from each pause
entry_points = {
"start": "analyze-request", # Initial entry
"request-clarification_resume": "process-clarification", # Resume from clarification
"await-approval_resume": "execute-action", # Resume from approval
}
```
### Example: Multi-Turn Research Agent
```python
# Nodes
nodes = [
NodeSpec(id="analyze-request", ...),
NodeSpec(id="request-clarification", ...), # PAUSE NODE
NodeSpec(id="process-clarification", ...),
NodeSpec(id="generate-results", ...),
NodeSpec(id="await-approval", ...), # PAUSE NODE
NodeSpec(id="execute-action", ...),
]
# Edges with resume flows
edges = [
EdgeSpec(
id="analyze-to-clarify",
source="analyze-request",
target="request-clarification",
condition=EdgeCondition.CONDITIONAL,
condition_expr="needs_clarification == true",
),
# When resumed, goes to process-clarification
EdgeSpec(
id="clarify-to-process",
source="request-clarification",
target="process-clarification",
condition=EdgeCondition.ALWAYS,
),
EdgeSpec(
id="results-to-approval",
source="generate-results",
target="await-approval",
condition=EdgeCondition.ALWAYS,
),
# When resumed, goes to execute-action
EdgeSpec(
id="approval-to-execute",
source="await-approval",
target="execute-action",
condition=EdgeCondition.ALWAYS,
),
]
# Configuration
pause_nodes = ["request-clarification", "await-approval"]
entry_points = {
"start": "analyze-request",
"request-clarification_resume": "process-clarification",
"await-approval_resume": "execute-action",
}
```
### Running Pause/Resume Agents
```python
# Initial run - will pause at first pause node
result1 = await agent.run(
context={"query": "research topic"},
session_state=None
)
# Check if paused
if result1.paused_at:
print(f"Paused at: {result1.paused_at}")
# Resume with user input
result2 = await agent.run(
context={"user_response": "clarification details"},
session_state=result1.session_state # Pass previous state
)
```
## Anti-Patterns
### What NOT to Do
**Don't rely on `export_graph`** - Write files immediately, not at end
```python
# BAD: Building in session state, exporting at end
mcp__agent-builder__add_node(...)
mcp__agent-builder__add_node(...)
mcp__agent-builder__export_graph() # Files appear only now
# GOOD: Writing files immediately
Write(file_path="...", content=node_code) # File visible now
Write(file_path="...", content=node_code) # File visible now
```
**Don't hide code in session** - Write to files as components approved
```python
# BAD: Accumulating changes invisibly
session.add_component(component1)
session.add_component(component2)
# User can't see anything yet
# GOOD: Incremental visibility
Edit(file_path="...", ...) # User sees change 1
Edit(file_path="...", ...) # User sees change 2
```
**Don't wait to write files** - Agent visible from first step
```python
# BAD: Building everything before writing
design_all_nodes()
design_all_edges()
write_everything_at_once()
# GOOD: Write as you go
write_package_structure() # Visible
write_goal() # Visible
write_node_1() # Visible
write_node_2() # Visible
```
**Don't batch everything** - Write incrementally
```python
# BAD: Batching all nodes
nodes = [design_node_1(), design_node_2(), ...]
write_all_nodes(nodes)
# GOOD: One at a time with user feedback
write_node_1() # User approves
write_node_2() # User approves
write_node_3() # User approves
```
### MCP Tools - Correct Usage
**MCP tools OK for:**
`test_node` - Validate node configuration with mock inputs
`validate_graph` - Check graph structure
`create_session` - Track session state for bookkeeping
✅ Other validation tools
**Just don't:** Use MCP as the primary construction method or rely on export_graph
## Best Practices
### 1. Show Progress After Each Write
```python
# After writing a node
print("✅ Added analyze_request_node to nodes/__init__.py")
print("📊 Progress: 1/6 nodes added")
print("📁 Open exports/my_agent/nodes/__init__.py to see it!")
```
### 2. Let User Open Files During Build
```python
# Encourage file inspection
print("✅ Goal written to agent.py")
print("")
print("💡 Tip: Open exports/my_agent/agent.py in your editor to see the goal!")
```
### 3. Write Incrementally - One Component at a Time
```python
# Good flow
write_package_structure()
show_user("Package created")
write_goal()
show_user("Goal written")
for node in nodes:
get_approval(node)
write_node(node)
show_user(f"Node {node.id} written")
```
### 4. Test As You Build
```python
# After adding several nodes
print("💡 You can test current state with:")
print(" PYTHONPATH=core:exports python -m my_agent validate")
print(" PYTHONPATH=core:exports python -m my_agent info")
```
### 5. Keep User Informed
```python
# Clear status updates
print("🔨 Creating package structure...")
print("✅ Package created: exports/my_agent/")
print("")
print("📝 Next: Define agent goal")
```
## Continuous Monitoring Agents
For agents that run continuously without terminal nodes:
```python
# No terminal nodes - loops forever
terminal_nodes = []
# Workflow loops back to start
edges = [
EdgeSpec(id="monitor-to-check", source="monitor", target="check-condition"),
EdgeSpec(id="check-to-wait", source="check-condition", target="wait"),
EdgeSpec(id="wait-to-monitor", source="wait", target="monitor"), # Loop
]
# Entry node only
entry_node = "monitor"
entry_points = {"start": "monitor"}
pause_nodes = []
```
**Example: File Monitor**
```python
nodes = [
NodeSpec(id="list-files", ...),
NodeSpec(id="check-new-files", node_type="router", ...),
NodeSpec(id="process-files", ...),
NodeSpec(id="wait-interval", node_type="function", ...),
]
edges = [
EdgeSpec(id="list-to-check", source="list-files", target="check-new-files"),
EdgeSpec(
id="check-to-process",
source="check-new-files",
target="process-files",
condition=EdgeCondition.CONDITIONAL,
condition_expr="new_files_count > 0",
),
EdgeSpec(
id="check-to-wait",
source="check-new-files",
target="wait-interval",
condition=EdgeCondition.CONDITIONAL,
condition_expr="new_files_count == 0",
),
EdgeSpec(id="process-to-wait", source="process-files", target="wait-interval"),
EdgeSpec(id="wait-to-list", source="wait-interval", target="list-files"), # Loop back
]
terminal_nodes = [] # No terminal - runs forever
```
## Complex Routing Patterns
### Multi-Condition Router
```python
router_node = NodeSpec(
id="decision-router",
node_type="router",
input_keys=["analysis_result"],
output_keys=["decision"],
system_prompt="""
Based on the analysis result, decide the next action:
- If confidence > 0.9: route to "execute"
- If 0.5 <= confidence <= 0.9: route to "review"
- If confidence < 0.5: route to "clarify"
Return: {"decision": "execute|review|clarify"}
""",
)
# Edges for each route
edges = [
EdgeSpec(
id="router-to-execute",
source="decision-router",
target="execute-action",
condition=EdgeCondition.CONDITIONAL,
condition_expr="decision == 'execute'",
priority=1,
),
EdgeSpec(
id="router-to-review",
source="decision-router",
target="human-review",
condition=EdgeCondition.CONDITIONAL,
condition_expr="decision == 'review'",
priority=2,
),
EdgeSpec(
id="router-to-clarify",
source="decision-router",
target="request-clarification",
condition=EdgeCondition.CONDITIONAL,
condition_expr="decision == 'clarify'",
priority=3,
),
]
```
## Error Handling Patterns
### Graceful Failure with Fallback
```python
# Primary node with error handling
nodes = [
NodeSpec(id="api-call", max_retries=3, ...),
NodeSpec(id="fallback-cache", ...),
NodeSpec(id="report-error", ...),
]
edges = [
# Success path
EdgeSpec(
id="api-success",
source="api-call",
target="process-results",
condition=EdgeCondition.ON_SUCCESS,
),
# Fallback on failure
EdgeSpec(
id="api-to-fallback",
source="api-call",
target="fallback-cache",
condition=EdgeCondition.ON_FAILURE,
priority=1,
),
# Report if fallback also fails
EdgeSpec(
id="fallback-to-error",
source="fallback-cache",
target="report-error",
condition=EdgeCondition.ON_FAILURE,
priority=1,
),
]
```
## Performance Optimization
### Parallel Node Execution
```python
# Use multiple edges from same source for parallel execution
edges = [
EdgeSpec(
id="start-to-search1",
source="start",
target="search-source-1",
condition=EdgeCondition.ALWAYS,
),
EdgeSpec(
id="start-to-search2",
source="start",
target="search-source-2",
condition=EdgeCondition.ALWAYS,
),
EdgeSpec(
id="start-to-search3",
source="start",
target="search-source-3",
condition=EdgeCondition.ALWAYS,
),
# Converge results
EdgeSpec(
id="search1-to-merge",
source="search-source-1",
target="merge-results",
),
EdgeSpec(
id="search2-to-merge",
source="search-source-2",
target="merge-results",
),
EdgeSpec(
id="search3-to-merge",
source="search-source-3",
target="merge-results",
),
]
```
## Handoff to Testing
When agent is complete, transition to testing phase:
```python
print("""
✅ Agent complete: exports/my_agent/
Next steps:
1. Switch to testing-agent skill
2. Generate and approve tests
3. Run evaluation
4. Debug any failures
Command: "Test the agent at exports/my_agent/"
""")
```
### Pre-Testing Checklist
Before handing off to testing-agent:
- [ ] Agent structure validates: `python -m agent_name validate`
- [ ] All nodes defined in nodes/__init__.py
- [ ] All edges connect valid nodes
- [ ] Entry node specified
- [ ] Agent can be imported: `from exports.agent_name import default_agent`
- [ ] README.md with usage instructions
- [ ] CLI commands work (info, validate)
## Related Skills
- **building-agents-core** - Fundamental concepts
- **building-agents-construction** - Step-by-step building
- **testing-agent** - Test and validate agents
- **agent-workflow** - Complete workflow orchestrator
---
**Remember: Agent is actively constructed, visible the whole time. No hidden state. No surprise exports. Just transparent, incremental file building.**
+399
View File
@@ -0,0 +1,399 @@
---
name: hive-concepts
description: Core concepts for goal-driven agents - architecture, node types (event_loop, function), tool discovery, and workflow overview. Use when starting agent development or need to understand agent fundamentals.
license: Apache-2.0
metadata:
author: hive
version: "2.0"
type: foundational
part_of: hive
---
# Building Agents - Core Concepts
Foundational knowledge for building goal-driven agents as Python packages.
## Architecture: Python Services (Not JSON Configs)
Agents are built as Python packages:
```
exports/my_agent/
├── __init__.py # Package exports
├── __main__.py # CLI (run, info, validate, shell)
├── agent.py # Graph construction (goal, edges, agent class)
├── nodes/__init__.py # Node definitions (NodeSpec)
├── config.py # Runtime config
└── README.md # Documentation
```
**Key Principle: Agent is visible and editable during build**
- Files created immediately as components are approved
- User can watch files grow in their editor
- No session state - just direct file writes
- No "export" step - agent is ready when build completes
## Core Concepts
### Goal
Success criteria and constraints (written to agent.py)
```python
goal = Goal(
id="research-goal",
name="Technical Research Agent",
description="Research technical topics thoroughly",
success_criteria=[
SuccessCriterion(
id="completeness",
description="Cover all aspects of topic",
metric="coverage_score",
target=">=0.9",
weight=0.4,
),
# 3-5 success criteria total
],
constraints=[
Constraint(
id="accuracy",
description="All information must be verified",
constraint_type="hard",
category="quality",
),
# 1-5 constraints total
],
)
```
### Node
Unit of work (written to nodes/__init__.py)
**Node Types:**
- `event_loop` — Multi-turn streaming loop with tool execution and judge-based evaluation. Works with or without tools.
- `function` — Deterministic Python operations. No LLM involved.
```python
search_node = NodeSpec(
id="search-web",
name="Search Web",
description="Search for information and extract results",
node_type="event_loop",
input_keys=["query"],
output_keys=["search_results"],
system_prompt="Search the web for: {query}. Use the web_search tool to find results, then call set_output to store them.",
tools=["web_search"],
)
```
**NodeSpec Fields for Event Loop Nodes:**
| Field | Default | Description |
|-------|---------|-------------|
| `client_facing` | `False` | If True, streams output to user and blocks for input between turns |
| `nullable_output_keys` | `[]` | Output keys that may remain unset (for mutually exclusive outputs) |
| `max_node_visits` | `1` | Max times this node executes per run. Set >1 for feedback loop targets |
### Edge
Connection between nodes (written to agent.py)
**Edge Conditions:**
- `on_success` — Proceed if node succeeds (most common)
- `on_failure` — Handle errors
- `always` — Always proceed
- `conditional` — Based on expression evaluating node output
**Edge Priority:**
Priority controls evaluation order when multiple edges leave the same node. Higher priority edges are evaluated first. Use negative priority for feedback edges (edges that loop back to earlier nodes).
```python
# Forward edge (evaluated first)
EdgeSpec(
id="review-to-campaign",
source="review",
target="campaign-builder",
condition=EdgeCondition.CONDITIONAL,
condition_expr="output.get('approved_contacts') is not None",
priority=1,
)
# Feedback edge (evaluated after forward edges)
EdgeSpec(
id="review-feedback",
source="review",
target="extractor",
condition=EdgeCondition.CONDITIONAL,
condition_expr="output.get('redo_extraction') is not None",
priority=-1,
)
```
### Client-Facing Nodes
For multi-turn conversations with the user, set `client_facing=True` on a node. The node will:
- Stream its LLM output directly to the end user
- Block for user input between conversational turns
- Resume when new input is injected via `inject_event()`
```python
intake_node = NodeSpec(
id="intake",
name="Intake",
description="Gather requirements from the user",
node_type="event_loop",
client_facing=True,
input_keys=[],
output_keys=["repo_url", "project_url"],
system_prompt="You are the intake agent. Ask the user for the repo URL and project URL.",
)
```
> **Legacy Note:** The old `pause_nodes` / `entry_points` pattern still works but `client_facing=True` is preferred for new agents.
**STEP 1 / STEP 2 Prompt Pattern:** For client-facing nodes, structure the system prompt with two explicit phases:
```python
system_prompt="""\
**STEP 1 — Respond to the user (text only, NO tool calls):**
[Present information, ask questions, etc.]
**STEP 2 — After the user responds, call set_output:**
[Call set_output with the structured outputs]
"""
```
This prevents the LLM from calling `set_output` prematurely before the user has had a chance to respond.
### Node Design: Fewer, Richer Nodes
Prefer fewer nodes that do more work over many thin single-purpose nodes:
- **Bad**: 8 thin nodes (parse query → search → fetch → evaluate → synthesize → write → check → save)
- **Good**: 4 rich nodes (intake → research → review → report)
Why: Each node boundary requires serializing outputs and passing context. Fewer nodes means the LLM retains full context of its work within the node. A research node that searches, fetches, and analyzes keeps all the source material in its conversation history.
### nullable_output_keys for Cross-Edge Inputs
When a node receives inputs that only arrive on certain edges (e.g., `feedback` only comes from a review → research feedback loop, not from intake → research), mark those keys as `nullable_output_keys`:
```python
research_node = NodeSpec(
id="research",
input_keys=["research_brief", "feedback"],
nullable_output_keys=["feedback"], # Not present on first visit
max_node_visits=3,
...
)
```
## Event Loop Architecture Concepts
### How EventLoopNode Works
An event loop node runs a multi-turn loop:
1. LLM receives system prompt + conversation history
2. LLM responds (text and/or tool calls)
3. Tool calls are executed, results added to conversation
4. Judge evaluates: ACCEPT (exit loop), RETRY (loop again), or ESCALATE
5. Repeat until judge ACCEPTs or max_iterations reached
### EventLoopNode Runtime
EventLoopNodes are **auto-created** by `GraphExecutor` at runtime. You do NOT need to manually register them. Both `GraphExecutor` (direct) and `AgentRuntime` / `create_agent_runtime()` handle event_loop nodes automatically.
```python
# Direct execution — executor auto-creates EventLoopNodes
from framework.graph.executor import GraphExecutor
from framework.runtime.core import Runtime
runtime = Runtime(storage_path)
executor = GraphExecutor(
runtime=runtime,
llm=llm,
tools=tools,
tool_executor=tool_executor,
storage_path=storage_path,
)
result = await executor.execute(graph=graph, goal=goal, input_data=input_data)
# TUI execution — AgentRuntime also works
from framework.runtime.agent_runtime import create_agent_runtime
runtime = create_agent_runtime(
graph=graph, goal=goal, storage_path=storage_path,
entry_points=[...], llm=llm, tools=tools, tool_executor=tool_executor,
)
```
### set_output
Nodes produce structured outputs by calling `set_output(key, value)` — a synthetic tool injected by the framework. When the LLM calls `set_output`, the value is stored in the output accumulator and made available to downstream nodes via shared memory.
`set_output` is NOT a real tool — it is excluded from `real_tool_results`. For client-facing nodes, this means a turn where the LLM only calls `set_output` (no other tools) is treated as a conversational boundary and will block for user input.
### JudgeProtocol
**The judge is the SOLE mechanism for acceptance decisions.** Do not add ad-hoc framework gating, output rollback, or premature rejection logic. If the LLM calls `set_output` too early, fix it with better prompts or a custom judge — not framework-level guards.
The judge controls when a node's loop exits:
- **Implicit judge** (default, no judge configured): ACCEPTs when the LLM finishes with no tool calls and all required output keys are set
- **SchemaJudge**: Validates outputs against a Pydantic model
- **Custom judges**: Implement `evaluate(context) -> JudgeVerdict`
### LoopConfig
Controls loop behavior:
- `max_iterations` (default 50) — prevents infinite loops
- `max_tool_calls_per_turn` (default 10) — limits tool calls per LLM response
- `tool_call_overflow_margin` (default 0.5) — wiggle room before discarding extra tool calls (50% means hard cutoff at 150% of limit)
- `stall_detection_threshold` (default 3) — detects repeated identical responses
- `max_history_tokens` (default 32000) — triggers conversation compaction
### Data Tools (Spillover Management)
When tool results exceed the context window, the framework automatically saves them to a spillover directory and truncates with a hint. Nodes that produce or consume large data should include the data tools:
- `save_data(filename, data)` — Write data to a file in the data directory
- `load_data(filename, offset=0, limit=50)` — Read data with line-based pagination
- `list_data_files()` — List available data files
- `serve_file_to_user(filename, label="")` — Get a clickable file:// URI for the user
Note: `data_dir` is a framework-injected context parameter — the LLM never sees or passes it. `GraphExecutor.execute()` sets it per-execution via `contextvars`, so data tools and spillover always share the same session-scoped directory.
These are real MCP tools (not synthetic). Add them to nodes that handle large tool results:
```python
research_node = NodeSpec(
...
tools=["web_search", "web_scrape", "load_data", "save_data", "list_data_files"],
)
```
### Fan-Out / Fan-In
Multiple ON_SUCCESS edges from the same source create parallel execution. All branches run concurrently via `asyncio.gather()`. Parallel event_loop nodes must have disjoint `output_keys`.
### max_node_visits
Controls how many times a node can execute in one graph run. Default is 1. Set higher for nodes that are targets of feedback edges (review-reject loops). Set 0 for unlimited (guarded by max_steps).
## Tool Discovery & Validation
**CRITICAL:** Before adding a node with tools, you MUST verify the tools exist.
Tools are provided by MCP servers. Never assume a tool exists - always discover dynamically.
### Step 1: Register MCP Server (if not already done)
```python
mcp__agent-builder__add_mcp_server(
name="tools",
transport="stdio",
command="python",
args='["mcp_server.py", "--stdio"]',
cwd="../tools"
)
```
### Step 2: Discover Available Tools
```python
# List all tools from all registered servers
mcp__agent-builder__list_mcp_tools()
# Or list tools from a specific server
mcp__agent-builder__list_mcp_tools(server_name="tools")
```
### Step 3: Validate Before Adding Nodes
Before writing a node with `tools=[...]`:
1. Call `list_mcp_tools()` to get available tools
2. Check each tool in your node exists in the response
3. If a tool doesn't exist:
- **DO NOT proceed** with the node
- Inform the user: "The tool 'X' is not available. Available tools are: ..."
- Ask if they want to use an alternative or proceed without the tool
### Tool Validation Anti-Patterns
- **Never assume a tool exists** - always call `list_mcp_tools()` first
- **Never write a node with unverified tools** - validate before writing
- **Never silently drop tools** - if a tool doesn't exist, inform the user
- **Never guess tool names** - use exact names from discovery response
## Workflow Overview: Incremental File Construction
```
1. CREATE PACKAGE → mkdir + write skeletons
2. DEFINE GOAL → Write to agent.py + config.py
3. FOR EACH NODE:
- Propose design (event_loop for LLM work, function for deterministic)
- User approves
- Write to nodes/__init__.py IMMEDIATELY
- (Optional) Validate with test_node
4. CONNECT EDGES → Update agent.py
- Use priority for feedback edges (negative priority)
- (Optional) Validate with validate_graph
5. FINALIZE → Write agent class to agent.py
6. DONE - Agent ready at exports/my_agent/
```
**Files written immediately. MCP tools optional for validation/testing bookkeeping.**
## When to Use This Skill
Use hive-concepts when:
- Starting a new agent project and need to understand fundamentals
- Need to understand agent architecture before building
- Want to validate tool availability before proceeding
- Learning about node types, edges, and graph execution
**Next Steps:**
- Ready to build? → Use `hive-create` skill
- Need patterns and examples? → Use `hive-patterns` skill
## MCP Tools for Validation
After writing files, optionally use MCP tools for validation:
**test_node** - Validate node configuration with mock inputs
```python
mcp__agent-builder__test_node(
node_id="search-web",
test_input='{"query": "test query"}',
mock_llm_response='{"results": "mock output"}'
)
```
**validate_graph** - Check graph structure
```python
mcp__agent-builder__validate_graph()
# Returns: unreachable nodes, missing connections, event_loop validation, etc.
```
**configure_loop** - Set event loop parameters
```python
mcp__agent-builder__configure_loop(
max_iterations=50,
max_tool_calls_per_turn=10,
stall_detection_threshold=3,
max_history_tokens=32000
)
```
**Key Point:** Files are written FIRST. MCP tools are for validation only.
## Related Skills
- **hive-create** - Step-by-step building process
- **hive-patterns** - Best practices: judges, feedback edges, fan-out, context management
- **hive** - Complete workflow orchestrator
- **hive-test** - Test and validate completed agents
File diff suppressed because it is too large Load Diff
@@ -0,0 +1,24 @@
"""
Deep Research Agent - Interactive, rigorous research with TUI conversation.
Research any topic through multi-source web search, quality evaluation,
and synthesis. Features client-facing TUI interaction at key checkpoints
for user guidance and iterative deepening.
"""
from .agent import DeepResearchAgent, default_agent, goal, nodes, edges
from .config import RuntimeConfig, AgentMetadata, default_config, metadata
__version__ = "1.0.0"
__all__ = [
"DeepResearchAgent",
"default_agent",
"goal",
"nodes",
"edges",
"RuntimeConfig",
"AgentMetadata",
"default_config",
"metadata",
]
@@ -0,0 +1,241 @@
"""
CLI entry point for Deep Research Agent.
Uses AgentRuntime for multi-entrypoint support with HITL pause/resume.
"""
import asyncio
import json
import logging
import sys
import click
from .agent import default_agent, DeepResearchAgent
def setup_logging(verbose=False, debug=False):
"""Configure logging for execution visibility."""
if debug:
level, fmt = logging.DEBUG, "%(asctime)s %(name)s: %(message)s"
elif verbose:
level, fmt = logging.INFO, "%(message)s"
else:
level, fmt = logging.WARNING, "%(levelname)s: %(message)s"
logging.basicConfig(level=level, format=fmt, stream=sys.stderr)
logging.getLogger("framework").setLevel(level)
@click.group()
@click.version_option(version="1.0.0")
def cli():
"""Deep Research Agent - Interactive, rigorous research with TUI conversation."""
pass
@cli.command()
@click.option("--topic", "-t", type=str, required=True, help="Research topic")
@click.option("--mock", is_flag=True, help="Run in mock mode")
@click.option("--quiet", "-q", is_flag=True, help="Only output result JSON")
@click.option("--verbose", "-v", is_flag=True, help="Show execution details")
@click.option("--debug", is_flag=True, help="Show debug logging")
def run(topic, mock, quiet, verbose, debug):
"""Execute research on a topic."""
if not quiet:
setup_logging(verbose=verbose, debug=debug)
context = {"topic": topic}
result = asyncio.run(default_agent.run(context, mock_mode=mock))
output_data = {
"success": result.success,
"steps_executed": result.steps_executed,
"output": result.output,
}
if result.error:
output_data["error"] = result.error
click.echo(json.dumps(output_data, indent=2, default=str))
sys.exit(0 if result.success else 1)
@cli.command()
@click.option("--mock", is_flag=True, help="Run in mock mode")
@click.option("--verbose", "-v", is_flag=True, help="Show execution details")
@click.option("--debug", is_flag=True, help="Show debug logging")
def tui(mock, verbose, debug):
"""Launch the TUI dashboard for interactive research."""
setup_logging(verbose=verbose, debug=debug)
try:
from framework.tui.app import AdenTUI
except ImportError:
click.echo(
"TUI requires the 'textual' package. Install with: pip install textual"
)
sys.exit(1)
from pathlib import Path
from framework.llm import LiteLLMProvider
from framework.runner.tool_registry import ToolRegistry
from framework.runtime.agent_runtime import create_agent_runtime
from framework.runtime.event_bus import EventBus
from framework.runtime.execution_stream import EntryPointSpec
async def run_with_tui():
agent = DeepResearchAgent()
# Build graph and tools
agent._event_bus = EventBus()
agent._tool_registry = ToolRegistry()
storage_path = Path.home() / ".hive" / "agents" / "deep_research_agent"
storage_path.mkdir(parents=True, exist_ok=True)
mcp_config_path = Path(__file__).parent / "mcp_servers.json"
if mcp_config_path.exists():
agent._tool_registry.load_mcp_config(mcp_config_path)
llm = None
if not mock:
llm = LiteLLMProvider(
model=agent.config.model,
api_key=agent.config.api_key,
api_base=agent.config.api_base,
)
tools = list(agent._tool_registry.get_tools().values())
tool_executor = agent._tool_registry.get_executor()
graph = agent._build_graph()
runtime = create_agent_runtime(
graph=graph,
goal=agent.goal,
storage_path=storage_path,
entry_points=[
EntryPointSpec(
id="start",
name="Start Research",
entry_node="intake",
trigger_type="manual",
isolation_level="isolated",
),
],
llm=llm,
tools=tools,
tool_executor=tool_executor,
)
await runtime.start()
try:
app = AdenTUI(runtime)
await app.run_async()
finally:
await runtime.stop()
asyncio.run(run_with_tui())
@cli.command()
@click.option("--json", "output_json", is_flag=True)
def info(output_json):
"""Show agent information."""
info_data = default_agent.info()
if output_json:
click.echo(json.dumps(info_data, indent=2))
else:
click.echo(f"Agent: {info_data['name']}")
click.echo(f"Version: {info_data['version']}")
click.echo(f"Description: {info_data['description']}")
click.echo(f"\nNodes: {', '.join(info_data['nodes'])}")
click.echo(f"Client-facing: {', '.join(info_data['client_facing_nodes'])}")
click.echo(f"Entry: {info_data['entry_node']}")
click.echo(f"Terminal: {', '.join(info_data['terminal_nodes'])}")
@cli.command()
def validate():
"""Validate agent structure."""
validation = default_agent.validate()
if validation["valid"]:
click.echo("Agent is valid")
if validation["warnings"]:
for warning in validation["warnings"]:
click.echo(f" WARNING: {warning}")
else:
click.echo("Agent has errors:")
for error in validation["errors"]:
click.echo(f" ERROR: {error}")
sys.exit(0 if validation["valid"] else 1)
@cli.command()
@click.option("--verbose", "-v", is_flag=True)
def shell(verbose):
"""Interactive research session (CLI, no TUI)."""
asyncio.run(_interactive_shell(verbose))
async def _interactive_shell(verbose=False):
"""Async interactive shell."""
setup_logging(verbose=verbose)
click.echo("=== Deep Research Agent ===")
click.echo("Enter a topic to research (or 'quit' to exit):\n")
agent = DeepResearchAgent()
await agent.start()
try:
while True:
try:
topic = await asyncio.get_event_loop().run_in_executor(
None, input, "Topic> "
)
if topic.lower() in ["quit", "exit", "q"]:
click.echo("Goodbye!")
break
if not topic.strip():
continue
click.echo("\nResearching...\n")
result = await agent.trigger_and_wait("start", {"topic": topic})
if result is None:
click.echo("\n[Execution timed out]\n")
continue
if result.success:
output = result.output
if "report_content" in output:
click.echo("\n--- Report ---\n")
click.echo(output["report_content"])
click.echo("\n")
if "references" in output:
click.echo("--- References ---\n")
for ref in output.get("references", []):
click.echo(
f" [{ref.get('number', '?')}] {ref.get('title', '')} - {ref.get('url', '')}"
)
click.echo("\n")
else:
click.echo(f"\nResearch failed: {result.error}\n")
except KeyboardInterrupt:
click.echo("\nGoodbye!")
break
except Exception as e:
click.echo(f"Error: {e}", err=True)
import traceback
traceback.print_exc()
finally:
await agent.stop()
if __name__ == "__main__":
cli()
@@ -0,0 +1,358 @@
"""Agent graph construction for Deep Research Agent."""
from pathlib import Path
from framework.graph import EdgeSpec, EdgeCondition, Goal, SuccessCriterion, Constraint
from framework.graph.edge import GraphSpec
from framework.graph.executor import ExecutionResult
from framework.graph.checkpoint_config import CheckpointConfig
from framework.llm import LiteLLMProvider
from framework.runner.tool_registry import ToolRegistry
from framework.runtime.agent_runtime import AgentRuntime, create_agent_runtime
from framework.runtime.execution_stream import EntryPointSpec
from .config import default_config, metadata
from .nodes import (
intake_node,
research_node,
review_node,
report_node,
)
# Goal definition
goal = Goal(
id="rigorous-interactive-research",
name="Rigorous Interactive Research",
description=(
"Research any topic by searching diverse sources, analyzing findings, "
"and producing a cited report — with user checkpoints to guide direction."
),
success_criteria=[
SuccessCriterion(
id="source-diversity",
description="Use multiple diverse, authoritative sources",
metric="source_count",
target=">=5",
weight=0.25,
),
SuccessCriterion(
id="citation-coverage",
description="Every factual claim in the report cites its source",
metric="citation_coverage",
target="100%",
weight=0.25,
),
SuccessCriterion(
id="user-satisfaction",
description="User reviews findings before report generation",
metric="user_approval",
target="true",
weight=0.25,
),
SuccessCriterion(
id="report-completeness",
description="Final report answers the original research questions",
metric="question_coverage",
target="90%",
weight=0.25,
),
],
constraints=[
Constraint(
id="no-hallucination",
description="Only include information found in fetched sources",
constraint_type="quality",
category="accuracy",
),
Constraint(
id="source-attribution",
description="Every claim must cite its source with a numbered reference",
constraint_type="quality",
category="accuracy",
),
Constraint(
id="user-checkpoint",
description="Present findings to the user before writing the final report",
constraint_type="functional",
category="interaction",
),
],
)
# Node list
nodes = [
intake_node,
research_node,
review_node,
report_node,
]
# Edge definitions
edges = [
# intake -> research
EdgeSpec(
id="intake-to-research",
source="intake",
target="research",
condition=EdgeCondition.ON_SUCCESS,
priority=1,
),
# research -> review
EdgeSpec(
id="research-to-review",
source="research",
target="review",
condition=EdgeCondition.ON_SUCCESS,
priority=1,
),
# review -> research (feedback loop)
EdgeSpec(
id="review-to-research-feedback",
source="review",
target="research",
condition=EdgeCondition.CONDITIONAL,
condition_expr="needs_more_research == True",
priority=1,
),
# review -> report (user satisfied)
EdgeSpec(
id="review-to-report",
source="review",
target="report",
condition=EdgeCondition.CONDITIONAL,
condition_expr="needs_more_research == False",
priority=2,
),
# report -> research (user wants deeper research on current topic)
EdgeSpec(
id="report-to-research",
source="report",
target="research",
condition=EdgeCondition.CONDITIONAL,
condition_expr="str(next_action).lower() == 'more_research'",
priority=2,
),
# report -> intake (user wants a new topic — default when not more_research)
EdgeSpec(
id="report-to-intake",
source="report",
target="intake",
condition=EdgeCondition.CONDITIONAL,
condition_expr="str(next_action).lower() != 'more_research'",
priority=1,
),
]
# Graph configuration
entry_node = "intake"
entry_points = {"start": "intake"}
pause_nodes = []
terminal_nodes = []
class DeepResearchAgent:
"""
Deep Research Agent 4-node pipeline with user checkpoints.
Flow: intake -> research -> review -> report
^ |
+-- feedback loop (if user wants more)
Uses AgentRuntime for proper session management:
- Session-scoped storage (sessions/{session_id}/)
- Checkpointing for resume capability
- Runtime logging
- Data folder for save_data/load_data
"""
def __init__(self, config=None):
self.config = config or default_config
self.goal = goal
self.nodes = nodes
self.edges = edges
self.entry_node = entry_node
self.entry_points = entry_points
self.pause_nodes = pause_nodes
self.terminal_nodes = terminal_nodes
self._graph: GraphSpec | None = None
self._agent_runtime: AgentRuntime | None = None
self._tool_registry: ToolRegistry | None = None
self._storage_path: Path | None = None
def _build_graph(self) -> GraphSpec:
"""Build the GraphSpec."""
return GraphSpec(
id="deep-research-agent-graph",
goal_id=self.goal.id,
version="1.0.0",
entry_node=self.entry_node,
entry_points=self.entry_points,
terminal_nodes=self.terminal_nodes,
pause_nodes=self.pause_nodes,
nodes=self.nodes,
edges=self.edges,
default_model=self.config.model,
max_tokens=self.config.max_tokens,
loop_config={
"max_iterations": 100,
"max_tool_calls_per_turn": 20,
"max_history_tokens": 32000,
},
conversation_mode="continuous",
identity_prompt=(
"You are a rigorous research agent. You search for information "
"from diverse, authoritative sources, analyze findings critically, "
"and produce well-cited reports. You never fabricate information — "
"every claim must trace back to a source you actually retrieved."
),
)
def _setup(self, mock_mode=False) -> None:
"""Set up the agent runtime with sessions, checkpoints, and logging."""
self._storage_path = Path.home() / ".hive" / "agents" / "deep_research_agent"
self._storage_path.mkdir(parents=True, exist_ok=True)
self._tool_registry = ToolRegistry()
mcp_config_path = Path(__file__).parent / "mcp_servers.json"
if mcp_config_path.exists():
self._tool_registry.load_mcp_config(mcp_config_path)
llm = None
if not mock_mode:
llm = LiteLLMProvider(
model=self.config.model,
api_key=self.config.api_key,
api_base=self.config.api_base,
)
tool_executor = self._tool_registry.get_executor()
tools = list(self._tool_registry.get_tools().values())
self._graph = self._build_graph()
checkpoint_config = CheckpointConfig(
enabled=True,
checkpoint_on_node_start=False,
checkpoint_on_node_complete=True,
checkpoint_max_age_days=7,
async_checkpoint=True,
)
entry_point_specs = [
EntryPointSpec(
id="default",
name="Default",
entry_node=self.entry_node,
trigger_type="manual",
isolation_level="shared",
)
]
self._agent_runtime = create_agent_runtime(
graph=self._graph,
goal=self.goal,
storage_path=self._storage_path,
entry_points=entry_point_specs,
llm=llm,
tools=tools,
tool_executor=tool_executor,
checkpoint_config=checkpoint_config,
)
async def start(self, mock_mode=False) -> None:
"""Set up and start the agent runtime."""
if self._agent_runtime is None:
self._setup(mock_mode=mock_mode)
if not self._agent_runtime.is_running:
await self._agent_runtime.start()
async def stop(self) -> None:
"""Stop the agent runtime and clean up."""
if self._agent_runtime and self._agent_runtime.is_running:
await self._agent_runtime.stop()
self._agent_runtime = None
async def trigger_and_wait(
self,
entry_point: str = "default",
input_data: dict | None = None,
timeout: float | None = None,
session_state: dict | None = None,
) -> ExecutionResult | None:
"""Execute the graph and wait for completion."""
if self._agent_runtime is None:
raise RuntimeError("Agent not started. Call start() first.")
return await self._agent_runtime.trigger_and_wait(
entry_point_id=entry_point,
input_data=input_data or {},
session_state=session_state,
)
async def run(
self, context: dict, mock_mode=False, session_state=None
) -> ExecutionResult:
"""Run the agent (convenience method for single execution)."""
await self.start(mock_mode=mock_mode)
try:
result = await self.trigger_and_wait(
"default", context, session_state=session_state
)
return result or ExecutionResult(success=False, error="Execution timeout")
finally:
await self.stop()
def info(self):
"""Get agent information."""
return {
"name": metadata.name,
"version": metadata.version,
"description": metadata.description,
"goal": {
"name": self.goal.name,
"description": self.goal.description,
},
"nodes": [n.id for n in self.nodes],
"edges": [e.id for e in self.edges],
"entry_node": self.entry_node,
"entry_points": self.entry_points,
"pause_nodes": self.pause_nodes,
"terminal_nodes": self.terminal_nodes,
"client_facing_nodes": [n.id for n in self.nodes if n.client_facing],
}
def validate(self):
"""Validate agent structure."""
errors = []
warnings = []
node_ids = {node.id for node in self.nodes}
for edge in self.edges:
if edge.source not in node_ids:
errors.append(f"Edge {edge.id}: source '{edge.source}' not found")
if edge.target not in node_ids:
errors.append(f"Edge {edge.id}: target '{edge.target}' not found")
if self.entry_node not in node_ids:
errors.append(f"Entry node '{self.entry_node}' not found")
for terminal in self.terminal_nodes:
if terminal not in node_ids:
errors.append(f"Terminal node '{terminal}' not found")
for ep_id, node_id in self.entry_points.items():
if node_id not in node_ids:
errors.append(
f"Entry point '{ep_id}' references unknown node '{node_id}'"
)
return {
"valid": len(errors) == 0,
"errors": errors,
"warnings": warnings,
}
# Create default instance
default_agent = DeepResearchAgent()
@@ -0,0 +1,26 @@
"""Runtime configuration."""
from dataclasses import dataclass
from framework.config import RuntimeConfig
default_config = RuntimeConfig()
@dataclass
class AgentMetadata:
name: str = "Deep Research Agent"
version: str = "1.0.0"
description: str = (
"Interactive research agent that rigorously investigates topics through "
"multi-source search, quality evaluation, and synthesis - with TUI conversation "
"at key checkpoints for user guidance and feedback."
)
intro_message: str = (
"Hi! I'm your deep research assistant. Tell me a topic and I'll investigate it "
"thoroughly — searching multiple sources, evaluating quality, and synthesizing "
"a comprehensive report. What would you like me to research?"
)
metadata = AgentMetadata()
@@ -0,0 +1,9 @@
{
"hive-tools": {
"transport": "stdio",
"command": "uv",
"args": ["run", "python", "mcp_server.py", "--stdio"],
"cwd": "../../tools",
"description": "Hive tools MCP server providing web_search, web_scrape, and write_to_file"
}
}
@@ -0,0 +1,204 @@
"""Node definitions for Deep Research Agent."""
from framework.graph import NodeSpec
# Node 1: Intake (client-facing)
# Brief conversation to clarify what the user wants researched.
intake_node = NodeSpec(
id="intake",
name="Research Intake",
description="Discuss the research topic with the user, clarify scope, and confirm direction",
node_type="event_loop",
client_facing=True,
max_node_visits=0,
input_keys=["topic"],
output_keys=["research_brief"],
success_criteria=(
"The research brief is specific and actionable: it states the topic, "
"the key questions to answer, the desired scope, and depth."
),
system_prompt="""\
You are a research intake specialist. The user wants to research a topic.
Have a brief conversation to clarify what they need.
**STEP 1 Read and respond (text only, NO tool calls):**
1. Read the topic provided
2. If it's vague, ask 1-2 clarifying questions (scope, angle, depth)
3. If it's already clear, confirm your understanding and ask the user to confirm
Keep it short. Don't over-ask.
**STEP 2 After the user confirms, call set_output:**
- set_output("research_brief", "A clear paragraph describing exactly what to research, \
what questions to answer, what scope to cover, and how deep to go.")
""",
tools=[],
)
# Node 2: Research
# The workhorse — searches the web, fetches content, analyzes sources.
# One node with both tools avoids the context-passing overhead of 5 separate nodes.
research_node = NodeSpec(
id="research",
name="Research",
description="Search the web, fetch source content, and compile findings",
node_type="event_loop",
max_node_visits=0,
input_keys=["research_brief", "feedback"],
output_keys=["findings", "sources", "gaps"],
nullable_output_keys=["feedback"],
success_criteria=(
"Findings reference at least 3 distinct sources with URLs. "
"Key claims are substantiated by fetched content, not generated."
),
system_prompt="""\
You are a research agent. Given a research brief, find and analyze sources.
If feedback is provided, this is a follow-up round focus on the gaps identified.
Work in phases:
1. **Search**: Use web_search with 3-5 diverse queries covering different angles.
Prioritize authoritative sources (.edu, .gov, established publications).
2. **Fetch**: Use web_scrape on the most promising URLs (aim for 5-8 sources).
Skip URLs that fail. Extract the substantive content.
3. **Analyze**: Review what you've collected. Identify key findings, themes,
and any contradictions between sources.
Important:
- Work in batches of 3-4 tool calls at a time never more than 10 per turn
- After each batch, assess whether you have enough material
- Prefer quality over quantity 5 good sources beat 15 thin ones
- Track which URL each finding comes from (you'll need citations later)
- Call set_output for each key in a SEPARATE turn (not in the same turn as other tool calls)
When done, use set_output (one key at a time, separate turns):
- set_output("findings", "Structured summary: key findings with source URLs for each claim. \
Include themes, contradictions, and confidence levels.")
- set_output("sources", [{"url": "...", "title": "...", "summary": "..."}])
- set_output("gaps", "What aspects of the research brief are NOT well-covered yet, if any.")
""",
tools=[
"web_search",
"web_scrape",
"load_data",
"save_data",
"append_data",
"list_data_files",
],
)
# Node 3: Review (client-facing)
# Shows the user what was found and asks whether to dig deeper or proceed.
review_node = NodeSpec(
id="review",
name="Review Findings",
description="Present findings to user and decide whether to research more or write the report",
node_type="event_loop",
client_facing=True,
max_node_visits=0,
input_keys=["findings", "sources", "gaps", "research_brief"],
output_keys=["needs_more_research", "feedback"],
success_criteria=(
"The user has been presented with findings and has explicitly indicated "
"whether they want more research or are ready for the report."
),
system_prompt="""\
Present the research findings to the user clearly and concisely.
**STEP 1 Present (your first message, text only, NO tool calls):**
1. **Summary** (2-3 sentences of what was found)
2. **Key Findings** (bulleted, with confidence levels)
3. **Sources Used** (count and quality assessment)
4. **Gaps** (what's still unclear or under-covered)
End by asking: Are they satisfied, or do they want deeper research? \
Should we proceed to writing the final report?
**STEP 2 After the user responds, call set_output:**
- set_output("needs_more_research", "true") if they want more
- set_output("needs_more_research", "false") if they're satisfied
- set_output("feedback", "What the user wants explored further, or empty string")
""",
tools=[],
)
# Node 4: Report (client-facing)
# Writes an HTML report, serves the link to the user, and answers follow-ups.
report_node = NodeSpec(
id="report",
name="Write & Deliver Report",
description="Write a cited HTML report from the findings and present it to the user",
node_type="event_loop",
client_facing=True,
max_node_visits=0,
input_keys=["findings", "sources", "research_brief"],
output_keys=["delivery_status", "next_action"],
success_criteria=(
"An HTML report has been saved, the file link has been presented to the user, "
"and the user has indicated what they want to do next."
),
system_prompt="""\
Write a research report as an HTML file and present it to the user.
IMPORTANT: save_data requires TWO separate arguments: filename and data.
Call it like: save_data(filename="report.html", data="<html>...</html>")
Do NOT use _raw, do NOT nest arguments inside a JSON string.
**STEP 1 Write and save the HTML report (tool calls, NO text to user yet):**
Build a clean HTML document. Keep the HTML concise aim for clarity over length.
Use minimal embedded CSS (a few lines of style, not a full framework).
Report structure:
- Title & date
- Executive Summary (2-3 paragraphs)
- Key Findings (organized by theme, with [n] citation links)
- Analysis (synthesis, implications)
- Conclusion (key takeaways)
- References (numbered list with clickable URLs)
Requirements:
- Every factual claim must cite its source with [n] notation
- Be objective present multiple viewpoints where sources disagree
- Answer the original research questions from the brief
Save the HTML:
save_data(filename="report.html", data="<html>...</html>")
Then get the clickable link:
serve_file_to_user(filename="report.html", label="Research Report")
If save_data fails, simplify and shorten the HTML, then retry.
**STEP 2 Present the link to the user (text only, NO tool calls):**
Tell the user the report is ready and include the file:// URI from
serve_file_to_user so they can click it to open. Give a brief summary
of what the report covers. Ask if they have questions or want to continue.
**STEP 3 After the user responds:**
- Answer any follow-up questions from the research material
- When the user is ready to move on, ask what they'd like to do next:
- Research a new topic?
- Dig deeper into the current topic?
- Then call set_output:
- set_output("delivery_status", "completed")
- set_output("next_action", "new_topic") if they want a new topic
- set_output("next_action", "more_research") if they want deeper research
""",
tools=[
"save_data",
"append_data",
"edit_data",
"serve_file_to_user",
"load_data",
"list_data_files",
],
)
__all__ = [
"intake_node",
"research_node",
"review_node",
"report_node",
]
+640
View File
@@ -0,0 +1,640 @@
---
name: hive-credentials
description: Set up and install credentials for an agent. Detects missing credentials from agent config, collects them from the user, and stores them securely in the local encrypted store at ~/.hive/credentials.
license: Apache-2.0
metadata:
author: hive
version: "2.3"
type: utility
---
# Setup Credentials
Interactive credential setup for agents with multiple authentication options. Detects what's missing, offers auth method choices, validates with health checks, and stores credentials securely.
## When to Use
- Before running or testing an agent for the first time
- When `AgentRunner.run()` fails with "missing required credentials"
- When a user asks to configure credentials for an agent
- After building a new agent that uses tools requiring API keys
## Workflow
### Step 1: Identify the Agent
Determine which agent needs credentials. The user will either:
- Name the agent directly (e.g., "set up credentials for hubspot-agent")
- Have an agent directory open (check `exports/` for agent dirs)
- Be working on an agent in the current session
Locate the agent's directory under `exports/{agent_name}/`.
### Step 2: Detect Missing Credentials
Use the `check_missing_credentials` MCP tool to detect what the agent needs and what's already configured. This tool loads the agent, inspects its required tools and node types, maps them to credentials via `CREDENTIAL_SPECS`, and checks both the encrypted store and environment variables.
```
check_missing_credentials(agent_path="exports/{agent_name}")
```
The tool returns a JSON response:
```json
{
"agent": "exports/{agent_name}",
"missing": [
{
"credential_name": "brave_search",
"env_var": "BRAVE_SEARCH_API_KEY",
"description": "Brave Search API key for web search",
"help_url": "https://brave.com/search/api/",
"tools": ["web_search"]
}
],
"available": [
{
"credential_name": "anthropic",
"env_var": "ANTHROPIC_API_KEY",
"source": "encrypted_store"
}
],
"total_missing": 1,
"ready": false
}
```
**If `ready` is true (nothing missing):** Report all credentials as configured and skip Steps 3-5. Example:
```
All required credentials are already configured:
✓ anthropic (ANTHROPIC_API_KEY)
✓ brave_search (BRAVE_SEARCH_API_KEY)
Your agent is ready to run!
```
**If credentials are missing:** Continue to Step 3 with the `missing` list.
### Step 3: Present Auth Options for Each Missing Credential
For each missing credential, check what authentication methods are available:
```python
from aden_tools.credentials import CREDENTIAL_SPECS
spec = CREDENTIAL_SPECS.get("hubspot")
if spec:
# Determine available auth options
auth_options = []
if spec.aden_supported:
auth_options.append("aden")
if spec.direct_api_key_supported:
auth_options.append("direct")
auth_options.append("custom") # Always available
# Get setup info
setup_info = {
"env_var": spec.env_var,
"description": spec.description,
"help_url": spec.help_url,
"api_key_instructions": spec.api_key_instructions,
}
```
Present the available options using AskUserQuestion:
```
Choose how to configure HUBSPOT_ACCESS_TOKEN:
1) Aden Platform (OAuth) (Recommended)
Secure OAuth2 flow via hive.adenhq.com
- Quick setup with automatic token refresh
- No need to manage API keys manually
2) Direct API Key
Enter your own API key manually
- Requires creating a HubSpot Private App
- Full control over scopes and permissions
3) Local Credential Setup (Advanced)
Programmatic configuration for CI/CD
- For automated deployments
- Requires manual API calls
```
### Step 4: Execute Auth Flow Based on User Choice
#### Prerequisite: Ensure HIVE_CREDENTIAL_KEY Is Available
Before storing any credentials, verify `HIVE_CREDENTIAL_KEY` is set (needed to encrypt/decrypt the local store). Check both the current session and shell config:
```bash
# Check current session
printenv HIVE_CREDENTIAL_KEY > /dev/null 2>&1 && echo "session: set" || echo "session: not set"
# Check shell config files
for f in ~/.zshrc ~/.bashrc ~/.profile; do [ -f "$f" ] && grep -q 'HIVE_CREDENTIAL_KEY' "$f" && echo "$f"; done
```
- **In current session** — proceed to store credentials
- **In shell config but NOT in current session** — run `source ~/.zshrc` (or `~/.bashrc`) first, then proceed
- **Not set anywhere** — `EncryptedFileStorage` will auto-generate one. After storing, tell the user to persist it: `export HIVE_CREDENTIAL_KEY="{generated_key}"` in their shell profile
> **⚠️ IMPORTANT: After adding `HIVE_CREDENTIAL_KEY` to the user's shell config, always display:**
> ```
> ⚠️ Environment variables were added to your shell config.
> Open a NEW TERMINAL for them to take effect outside this session.
> ```
#### Option 1: Aden Platform (OAuth)
This is the recommended flow for supported integrations (HubSpot, etc.).
**How Aden OAuth Works:**
The ADEN_API_KEY represents a user who has already completed OAuth authorization on Aden's platform. When users sign up and connect integrations on Aden, those OAuth tokens are stored server-side. Having an ADEN_API_KEY means:
1. User has an Aden account
2. User has already authorized integrations (HubSpot, etc.) via OAuth on Aden
3. We just need to sync those credentials down to the local credential store
**4.1a. Check for ADEN_API_KEY**
```python
import os
aden_key = os.environ.get("ADEN_API_KEY")
```
If not set, guide user to get one from Aden (this is where they do OAuth):
```python
from aden_tools.credentials import open_browser, get_aden_setup_url
# Open browser to Aden - user will sign up and connect integrations there
url = get_aden_setup_url() # https://hive.adenhq.com
success, msg = open_browser(url)
print("Please sign in to Aden and connect your integrations (HubSpot, etc.).")
print("Once done, copy your API key and return here.")
```
Ask user to provide the ADEN_API_KEY they received.
**4.1b. Save ADEN_API_KEY to Shell Config**
With user approval, persist ADEN_API_KEY to their shell config:
```python
from aden_tools.credentials import (
detect_shell,
add_env_var_to_shell_config,
get_shell_source_command,
)
shell_type = detect_shell() # 'bash', 'zsh', or 'unknown'
# Ask user for approval before modifying shell config
# If approved:
success, config_path = add_env_var_to_shell_config(
"ADEN_API_KEY",
user_provided_key,
comment="Aden Platform (OAuth) API key"
)
if success:
source_cmd = get_shell_source_command()
print(f"Saved to {config_path}")
print(f"Run: {source_cmd}")
```
> **⚠️ IMPORTANT: After adding `ADEN_API_KEY` to the user's shell config, always display:**
> ```
> ⚠️ Environment variables were added to your shell config.
> Open a NEW TERMINAL for them to take effect outside this session.
> ```
Also save to `~/.hive/configuration.json` for the framework:
```python
import json
from pathlib import Path
config_path = Path.home() / ".hive" / "configuration.json"
config = json.loads(config_path.read_text()) if config_path.exists() else {}
config["aden"] = {
"api_key_configured": True,
"api_url": "https://api.adenhq.com"
}
config_path.parent.mkdir(parents=True, exist_ok=True)
config_path.write_text(json.dumps(config, indent=2))
```
**4.1c. Sync Credentials from Aden Server**
Since the user has already authorized integrations on Aden, use the one-liner factory method:
```python
from core.framework.credentials import CredentialStore
# This single call handles everything:
# - Creates encrypted local storage at ~/.hive/credentials
# - Configures Aden client from ADEN_API_KEY env var
# - Syncs all credentials from Aden server automatically
store = CredentialStore.with_aden_sync(
base_url="https://api.adenhq.com",
auto_sync=True, # Syncs on creation
)
# Check what was synced
synced = store.list_credentials()
print(f"Synced credentials: {synced}")
# If the required credential wasn't synced, the user hasn't authorized it on Aden yet
if "hubspot" not in synced:
print("HubSpot not found in your Aden account.")
print("Please visit https://hive.adenhq.com to connect HubSpot, then try again.")
```
For more control over the sync process:
```python
from core.framework.credentials import CredentialStore
from core.framework.credentials.aden import (
AdenCredentialClient,
AdenClientConfig,
AdenSyncProvider,
)
# Create client (API key loaded from ADEN_API_KEY env var)
client = AdenCredentialClient(AdenClientConfig(
base_url="https://api.adenhq.com",
))
# Create provider and store
provider = AdenSyncProvider(client=client)
store = CredentialStore.with_encrypted_storage()
# Manual sync
synced_count = provider.sync_all(store)
print(f"Synced {synced_count} credentials from Aden")
```
**4.1d. Run Health Check**
```python
from aden_tools.credentials import check_credential_health
# Get the token from the store
cred = store.get_credential("hubspot")
token = cred.keys["access_token"].value.get_secret_value()
result = check_credential_health("hubspot", token)
if result.valid:
print("HubSpot credentials validated successfully!")
else:
print(f"Validation failed: {result.message}")
# Offer to retry the OAuth flow
```
#### Option 2: Direct API Key
For users who prefer manual API key management.
**4.2a. Show Setup Instructions**
```python
from aden_tools.credentials import CREDENTIAL_SPECS
spec = CREDENTIAL_SPECS.get("hubspot")
if spec and spec.api_key_instructions:
print(spec.api_key_instructions)
# Output:
# To get a HubSpot Private App token:
# 1. Go to HubSpot Settings > Integrations > Private Apps
# 2. Click "Create a private app"
# 3. Name your app (e.g., "Hive Agent")
# ...
if spec and spec.help_url:
print(f"More info: {spec.help_url}")
```
**4.2b. Collect API Key from User**
Use AskUserQuestion to securely collect the API key:
```
Please provide your HubSpot access token:
(This will be stored securely in ~/.hive/credentials)
```
**4.2c. Run Health Check Before Storing**
```python
from aden_tools.credentials import check_credential_health
result = check_credential_health("hubspot", user_provided_token)
if not result.valid:
print(f"Warning: {result.message}")
# Ask user if they want to:
# 1. Try a different token
# 2. Continue anyway (not recommended)
```
**4.2d. Store in Local Encrypted Store**
```python
from core.framework.credentials import CredentialStore, CredentialObject, CredentialKey
from pydantic import SecretStr
store = CredentialStore.with_encrypted_storage()
cred = CredentialObject(
id="hubspot",
name="HubSpot Access Token",
keys={
"access_token": CredentialKey(
name="access_token",
value=SecretStr(user_provided_token),
)
},
)
store.save_credential(cred)
```
**4.2e. Export to Current Session**
```bash
export HUBSPOT_ACCESS_TOKEN="the-value"
```
#### Option 3: Local Credential Setup (Advanced)
For programmatic/CI/CD setups.
**4.3a. Show Documentation**
```
For advanced credential management, you can use the CredentialStore API directly:
from core.framework.credentials import CredentialStore, CredentialObject, CredentialKey
from pydantic import SecretStr
store = CredentialStore.with_encrypted_storage()
cred = CredentialObject(
id="hubspot",
name="HubSpot Access Token",
keys={"access_token": CredentialKey(name="access_token", value=SecretStr("..."))}
)
store.save_credential(cred)
For CI/CD environments:
- Set HIVE_CREDENTIAL_KEY for encryption
- Pre-populate ~/.hive/credentials programmatically
- Or use environment variables directly (HUBSPOT_ACCESS_TOKEN)
Documentation: See core/framework/credentials/README.md
```
### Step 5: Record Configuration Method
Track which auth method was used for each credential in `~/.hive/configuration.json`:
```python
import json
from pathlib import Path
from datetime import datetime
config_path = Path.home() / ".hive" / "configuration.json"
config = json.loads(config_path.read_text()) if config_path.exists() else {}
if "credential_methods" not in config:
config["credential_methods"] = {}
config["credential_methods"]["hubspot"] = {
"method": "aden", # or "direct" or "custom"
"configured_at": datetime.now().isoformat(),
}
config_path.write_text(json.dumps(config, indent=2))
```
### Step 6: Verify All Credentials
Use the `verify_credentials` MCP tool to confirm everything is properly configured:
```
verify_credentials(agent_path="exports/{agent_name}")
```
The tool returns:
```json
{
"agent": "exports/{agent_name}",
"ready": true,
"missing_credentials": [],
"warnings": [],
"errors": []
}
```
If `ready` is true, report success. If `missing_credentials` is non-empty, identify what failed and loop back to Step 3 for the remaining credentials.
## Health Check Reference
Health checks validate credentials by making lightweight API calls:
| Credential | Endpoint | What It Checks |
| --------------- | --------------------------------------- | --------------------------------- |
| `anthropic` | `POST /v1/messages` | API key validity |
| `brave_search` | `GET /res/v1/web/search?q=test&count=1` | API key validity |
| `google_search` | `GET /customsearch/v1?q=test&num=1` | API key + CSE ID validity |
| `github` | `GET /user` | Token validity, user identity |
| `hubspot` | `GET /crm/v3/objects/contacts?limit=1` | Bearer token validity, CRM scopes |
| `resend` | `GET /domains` | API key validity |
```python
from aden_tools.credentials import check_credential_health, HealthCheckResult
result: HealthCheckResult = check_credential_health("hubspot", token_value)
# result.valid: bool
# result.message: str
# result.details: dict (status_code, rate_limited, etc.)
```
## Encryption Key (HIVE_CREDENTIAL_KEY)
The local encrypted store requires `HIVE_CREDENTIAL_KEY` to encrypt/decrypt credentials.
- If the user doesn't have one, `EncryptedFileStorage` will auto-generate one and log it
- The user MUST persist this key (e.g., in `~/.bashrc`/`~/.zshrc` or a secrets manager)
- Without this key, stored credentials cannot be decrypted
**Shell config rule:** Only TWO keys belong in shell config (`~/.zshrc`/`~/.bashrc`):
- `HIVE_CREDENTIAL_KEY` — encryption key for the credential store
- `ADEN_API_KEY` — Aden platform auth key (needed before the store can sync)
All other API keys (Brave, Google, HubSpot, etc.) must go in the encrypted store only. **Never offer to add them to shell config.**
If `HIVE_CREDENTIAL_KEY` is not set:
1. Let the store generate one
2. Tell the user to save it: `export HIVE_CREDENTIAL_KEY="{generated_key}"`
3. Recommend adding it to `~/.bashrc` or their shell profile
## Security Rules
- **NEVER** log, print, or echo credential values in tool output
- **NEVER** store credentials in plaintext files, git-tracked files, or agent configs
- **NEVER** hardcode credentials in source code
- **NEVER** offer to save API keys to shell config (`~/.zshrc`/`~/.bashrc`) — the **only** keys that belong in shell config are `HIVE_CREDENTIAL_KEY` and `ADEN_API_KEY`. All other credentials (Brave, Google, HubSpot, GitHub, Resend, etc.) go in the encrypted store only.
- **ALWAYS** use `SecretStr` from Pydantic when handling credential values in Python
- **ALWAYS** use the local encrypted store (`~/.hive/credentials`) for persistence
- **ALWAYS** run health checks before storing credentials (when possible)
- **ALWAYS** verify credentials were stored by re-running validation, not by reading them back
- When modifying `~/.bashrc` or `~/.zshrc`, confirm with the user first
## Credential Sources Reference
All credential specs are defined in `tools/src/aden_tools/credentials/`:
| File | Category | Credentials | Aden Supported |
| ----------------- | ------------- | --------------------------------------------- | -------------- |
| `llm.py` | LLM Providers | `anthropic` | No |
| `search.py` | Search Tools | `brave_search`, `google_search`, `google_cse` | No |
| `email.py` | Email | `resend` | No |
| `integrations.py` | Integrations | `github`, `hubspot`, `google_calendar_oauth` | No / Yes |
**Note:** Additional LLM providers (Cerebras, Groq, OpenAI) are handled by LiteLLM via environment
variables (`CEREBRAS_API_KEY`, `GROQ_API_KEY`, `OPENAI_API_KEY`) but are not yet in CREDENTIAL_SPECS.
Add them to `llm.py` as needed.
To check what's registered:
```python
from aden_tools.credentials import CREDENTIAL_SPECS
for name, spec in CREDENTIAL_SPECS.items():
print(f"{name}: aden={spec.aden_supported}, direct={spec.direct_api_key_supported}")
```
## Migration: CredentialManager → CredentialStore
**CredentialManager is deprecated.** Use CredentialStore instead.
| Old (Deprecated) | New (Recommended) |
| ----------------------------------------- | -------------------------------------------------------------------- |
| `CredentialManager()` | `CredentialStore.with_encrypted_storage()` |
| `creds.get("hubspot")` | `store.get("hubspot")` or `store.get_key("hubspot", "access_token")` |
| `creds.validate_for_tools(tools)` | Use `store.is_available(cred_id)` per credential |
| `creds.get_auth_options("hubspot")` | Check `CREDENTIAL_SPECS["hubspot"].aden_supported` |
| `creds.get_setup_instructions("hubspot")` | Access `CREDENTIAL_SPECS["hubspot"]` directly |
**Why migrate?**
- **CredentialStore** supports encrypted storage, multi-key credentials, template resolution, and automatic token refresh
- **CredentialManager** only reads from environment variables and .env files (no encryption, no refresh)
- **CredentialStoreAdapter** exists for backward compatibility during migration
```python
# Old way (deprecated)
from aden_tools.credentials import CredentialManager
creds = CredentialManager()
token = creds.get("hubspot")
# New way (recommended)
from core.framework.credentials import CredentialStore
store = CredentialStore.with_encrypted_storage()
token = store.get("hubspot")
# With Aden sync (recommended for OAuth integrations)
store = CredentialStore.with_aden_sync()
token = store.get_key("hubspot", "access_token")
```
## Example Session
```
User: /hive-credentials for my research-agent
Agent: Let me check what credentials your research-agent needs.
[Calls check_missing_credentials(agent_path="exports/research-agent")]
→ Returns:
available: anthropic (encrypted_store), brave_search (encrypted_store)
missing: google_search (GOOGLE_API_KEY), google_cse (GOOGLE_CSE_ID)
ready: false
Agent: 2 of 4 required credentials are already configured. Only Google Custom
Search needs setup (2 values).
--- Setting up Google Custom Search (google_search + google_cse) ---
This requires two values that work together.
[Checks HIVE_CREDENTIAL_KEY before storing]
$ printenv HIVE_CREDENTIAL_KEY > /dev/null 2>&1 && echo "set" || echo "not set"
set
First, the Google API Key:
1. Go to https://console.cloud.google.com/apis/credentials
2. Create a new project (or select an existing one)
3. Enable the "Custom Search API" from the API Library
4. Go to Credentials > Create Credentials > API Key
5. Copy the generated API key
[AskUserQuestion: "Please provide your Google API key:"]
[User provides key]
Now, the Custom Search Engine ID:
1. Go to https://programmablesearchengine.google.com/controlpanel/all
2. Click "Add" to create a new search engine
3. Under "What to search", select "Search the entire web"
4. Give your search engine a name
5. Click "Create"
6. Copy the Search Engine ID (cx value)
[AskUserQuestion: "Please provide your Google CSE ID:"]
[User provides ID]
[Runs health check with both values - GET /customsearch/v1?q=test&num=1 → 200 OK]
[Stores both in local encrypted store, exports to env]
✓ Google Custom Search credentials valid
[Calls verify_credentials(agent_path="exports/research-agent")]
→ Returns: ready: true, missing_credentials: []
All credentials are now configured:
✓ anthropic (ANTHROPIC_API_KEY) — already in encrypted store
✓ brave_search (BRAVE_SEARCH_API_KEY) — already in encrypted store
✓ google_search (GOOGLE_API_KEY) — stored in encrypted store
✓ google_cse (GOOGLE_CSE_ID) — stored in encrypted store
┌─────────────────────────────────────────────────────────────────────────────┐
│ ✅ CREDENTIALS CONFIGURED │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ OPEN A NEW TERMINAL before running commands below. │
│ Environment variables were saved to your shell config but │
│ only take effect in new terminal sessions. │
│ │
│ NEXT STEPS: │
│ │
│ 1. RUN YOUR AGENT: │
│ │
│ hive tui │
│ │
│ 2. IF YOU ENCOUNTER ISSUES, USE THE DEBUGGER: │
│ │
│ /hive-debugger │
│ │
│ The debugger analyzes runtime logs, identifies retry loops, tool │
│ failures, stalled execution, and provides actionable fix suggestions. │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```
File diff suppressed because it is too large Load Diff
+385
View File
@@ -0,0 +1,385 @@
---
name: hive-patterns
description: Best practices, patterns, and examples for building goal-driven agents. Includes client-facing interaction, feedback edges, judge patterns, fan-out/fan-in, context management, and anti-patterns.
license: Apache-2.0
metadata:
author: hive
version: "2.0"
type: reference
part_of: hive
---
# Building Agents - Patterns & Best Practices
Design patterns, examples, and best practices for building robust goal-driven agents.
**Prerequisites:** Complete agent structure using `hive-create`.
## Practical Example: Hybrid Workflow
How to build a node using both direct file writes and optional MCP validation:
```python
# 1. WRITE TO FILE FIRST (Primary - makes it visible)
node_code = '''
search_node = NodeSpec(
id="search-web",
node_type="event_loop",
input_keys=["query"],
output_keys=["search_results"],
system_prompt="Search the web for: {query}. Use web_search, then call set_output to store results.",
tools=["web_search"],
)
'''
Edit(
file_path="exports/research_agent/nodes/__init__.py",
old_string="# Nodes will be added here",
new_string=node_code
)
# 2. OPTIONALLY VALIDATE WITH MCP (Secondary - bookkeeping)
validation = mcp__agent-builder__test_node(
node_id="search-web",
test_input='{"query": "python tutorials"}',
mock_llm_response='{"search_results": [...mock results...]}'
)
```
**User experience:**
- Immediately sees node in their editor (from step 1)
- Gets validation feedback (from step 2)
- Can edit the file directly if needed
## Multi-Turn Interaction Patterns
For agents needing multi-turn conversations with users, use `client_facing=True` on event_loop nodes.
### Client-Facing Nodes
A client-facing node streams LLM output to the user and blocks for user input between conversational turns. This replaces the old pause/resume pattern.
```python
# Client-facing node with STEP 1/STEP 2 prompt pattern
intake_node = NodeSpec(
id="intake",
name="Intake",
description="Gather requirements from the user",
node_type="event_loop",
client_facing=True,
input_keys=["topic"],
output_keys=["research_brief"],
system_prompt="""\
You are an intake specialist.
**STEP 1 — Read and respond (text only, NO tool calls):**
1. Read the topic provided
2. If it's vague, ask 1-2 clarifying questions
3. If it's clear, confirm your understanding
**STEP 2 — After the user confirms, call set_output:**
- set_output("research_brief", "Clear description of what to research")
""",
)
# Internal node runs without user interaction
research_node = NodeSpec(
id="research",
name="Research",
description="Search and analyze sources",
node_type="event_loop",
input_keys=["research_brief"],
output_keys=["findings", "sources"],
system_prompt="Research the topic using web_search and web_scrape...",
tools=["web_search", "web_scrape", "load_data", "save_data"],
)
```
**How it works:**
- Client-facing nodes stream LLM text to the user and block for input after each response
- User input is injected via `node.inject_event(text)`
- When the LLM calls `set_output` to produce structured outputs, the judge evaluates and ACCEPTs
- Internal nodes (non-client-facing) run their entire loop without blocking
- `set_output` is a synthetic tool — a turn with only `set_output` calls (no real tools) triggers user input blocking
**STEP 1/STEP 2 pattern:** Always structure client-facing prompts with explicit phases. STEP 1 is text-only conversation. STEP 2 calls `set_output` after user confirmation. This prevents the LLM from calling `set_output` prematurely before the user responds.
### When to Use client_facing
| Scenario | client_facing | Why |
| ----------------------------------- | :-----------: | ---------------------- |
| Gathering user requirements | Yes | Need user input |
| Human review/approval checkpoint | Yes | Need human decision |
| Data processing (scanning, scoring) | No | Runs autonomously |
| Report generation | No | No user input needed |
| Final confirmation before action | Yes | Need explicit approval |
> **Legacy Note:** The `pause_nodes` / `entry_points` pattern still works for backward compatibility but `client_facing=True` is preferred for new agents.
## Edge-Based Routing and Feedback Loops
### Conditional Edge Routing
Multiple conditional edges from the same source replace the old `router` node type. Each edge checks a condition on the node's output.
```python
# Node with mutually exclusive outputs
review_node = NodeSpec(
id="review",
name="Review",
node_type="event_loop",
client_facing=True,
output_keys=["approved_contacts", "redo_extraction"],
nullable_output_keys=["approved_contacts", "redo_extraction"],
max_node_visits=3,
system_prompt="Present the contact list to the operator. If they approve, call set_output('approved_contacts', ...). If they want changes, call set_output('redo_extraction', 'true').",
)
# Forward edge (positive priority, evaluated first)
EdgeSpec(
id="review-to-campaign",
source="review",
target="campaign-builder",
condition=EdgeCondition.CONDITIONAL,
condition_expr="output.get('approved_contacts') is not None",
priority=1,
)
# Feedback edge (negative priority, evaluated after forward edges)
EdgeSpec(
id="review-feedback",
source="review",
target="extractor",
condition=EdgeCondition.CONDITIONAL,
condition_expr="output.get('redo_extraction') is not None",
priority=-1,
)
```
**Key concepts:**
- `nullable_output_keys`: Lists output keys that may remain unset. The node sets exactly one of the mutually exclusive keys per execution.
- `max_node_visits`: Must be >1 on the feedback target (extractor) so it can re-execute. Default is 1.
- `priority`: Positive = forward edge (evaluated first). Negative = feedback edge. The executor tries forward edges first; if none match, falls back to feedback edges.
### Routing Decision Table
| Pattern | Old Approach | New Approach |
| ---------------------- | ----------------------- | --------------------------------------------- |
| Conditional branching | `router` node | Conditional edges with `condition_expr` |
| Binary approve/reject | `pause_nodes` + resume | `client_facing=True` + `nullable_output_keys` |
| Loop-back on rejection | Manual entry_points | Feedback edge with `priority=-1` |
| Multi-way routing | Router with routes dict | Multiple conditional edges with priorities |
## Judge Patterns
**Core Principle: The judge is the SOLE mechanism for acceptance decisions.** Never add ad-hoc framework gating to compensate for LLM behavior. If the LLM calls `set_output` prematurely, fix the system prompt or use a custom judge. Anti-patterns to avoid:
- Output rollback logic
- `_user_has_responded` flags
- Premature set_output rejection
- Interaction protocol injection into system prompts
Judges control when an event_loop node's loop exits. Choose based on validation needs.
### Implicit Judge (Default)
When no judge is configured, the implicit judge ACCEPTs when:
- The LLM finishes its response with no tool calls
- All required output keys have been set via `set_output`
Best for simple nodes where "all outputs set" is sufficient validation.
### SchemaJudge
Validates outputs against a Pydantic model. Use when you need structural validation.
```python
from pydantic import BaseModel
class ScannerOutput(BaseModel):
github_users: list[dict] # Must be a list of user objects
class SchemaJudge:
def __init__(self, output_model: type[BaseModel]):
self._model = output_model
async def evaluate(self, context: dict) -> JudgeVerdict:
missing = context.get("missing_keys", [])
if missing:
return JudgeVerdict(
action="RETRY",
feedback=f"Missing output keys: {missing}. Use set_output to provide them.",
)
try:
self._model.model_validate(context["output_accumulator"])
return JudgeVerdict(action="ACCEPT")
except ValidationError as e:
return JudgeVerdict(action="RETRY", feedback=str(e))
```
### When to Use Which Judge
| Judge | Use When | Example |
| --------------- | ------------------------------------- | ---------------------- |
| Implicit (None) | Output keys are sufficient validation | Simple data extraction |
| SchemaJudge | Need structural validation of outputs | API response parsing |
| Custom | Domain-specific validation logic | Score must be 0.0-1.0 |
## Fan-Out / Fan-In (Parallel Execution)
Multiple ON_SUCCESS edges from the same source trigger parallel execution. All branches run concurrently via `asyncio.gather()`.
```python
# Scanner fans out to Profiler and Scorer in parallel
EdgeSpec(id="scanner-to-profiler", source="scanner", target="profiler",
condition=EdgeCondition.ON_SUCCESS)
EdgeSpec(id="scanner-to-scorer", source="scanner", target="scorer",
condition=EdgeCondition.ON_SUCCESS)
# Both fan in to Extractor
EdgeSpec(id="profiler-to-extractor", source="profiler", target="extractor",
condition=EdgeCondition.ON_SUCCESS)
EdgeSpec(id="scorer-to-extractor", source="scorer", target="extractor",
condition=EdgeCondition.ON_SUCCESS)
```
**Requirements:**
- Parallel event_loop nodes must have **disjoint output_keys** (no key written by both)
- Only one parallel branch may contain a `client_facing` node
- Fan-in node receives outputs from all completed branches in shared memory
## Context Management Patterns
### Tiered Compaction
EventLoopNode automatically manages context window usage with tiered compaction:
1. **Pruning** — Old tool results replaced with compact placeholders (zero-cost, no LLM call)
2. **Normal compaction** — LLM summarizes older messages
3. **Aggressive compaction** — Keeps only recent messages + summary
4. **Emergency** — Hard reset with tool history preservation
### Spillover Pattern
The framework automatically truncates large tool results and saves full content to a spillover directory. The LLM receives a truncation message with instructions to use `load_data` to read the full result.
For explicit data management, use the data tools (real MCP tools, not synthetic):
```python
# save_data, load_data, list_data_files, serve_file_to_user are real MCP tools
# data_dir is auto-injected by the framework — the LLM never sees it
# Saving large results
save_data(filename="sources.json", data=large_json_string)
# Reading with pagination (line-based offset/limit)
load_data(filename="sources.json", offset=0, limit=50)
# Listing available files
list_data_files()
# Serving a file to the user as a clickable link
serve_file_to_user(filename="report.html", label="Research Report")
```
Add data tools to nodes that handle large tool results:
```python
research_node = NodeSpec(
...
tools=["web_search", "web_scrape", "load_data", "save_data", "list_data_files"],
)
```
`data_dir` is a framework context parameter — auto-injected at call time. `GraphExecutor.execute()` sets it per-execution via `ToolRegistry.set_execution_context(data_dir=...)` (using `contextvars` for concurrency safety), ensuring it matches the session-scoped spillover directory.
## Anti-Patterns
### What NOT to Do
- **Don't rely on `export_graph`** — Write files immediately, not at end
- **Don't hide code in session** — Write to files as components are approved
- **Don't wait to write files** — Agent visible from first step
- **Don't batch everything** — Write incrementally, one component at a time
- **Don't create too many thin nodes** — Prefer fewer, richer nodes (see below)
- **Don't add framework gating for LLM behavior** — Fix prompts or use judges instead
### Fewer, Richer Nodes
A common mistake is splitting work into too many small single-purpose nodes. Each node boundary requires serializing outputs, losing in-context information, and adding edge complexity.
| Bad (8 thin nodes) | Good (4 rich nodes) |
| ------------------- | ----------------------------------- |
| parse-query | intake (client-facing) |
| search-sources | research (search + fetch + analyze) |
| fetch-content | review (client-facing) |
| evaluate-sources | report (write + deliver) |
| synthesize-findings | |
| write-report | |
| quality-check | |
| save-report | |
**Why fewer nodes are better:**
- The LLM retains full context of its work within a single node
- A research node that searches, fetches, and analyzes keeps all source material in its conversation history
- Fewer edges means simpler graph and fewer failure points
- Data tools (`save_data`/`load_data`) handle context window limits within a single node
### MCP Tools - Correct Usage
**MCP tools OK for:**
- `test_node` — Validate node configuration with mock inputs
- `validate_graph` — Check graph structure
- `configure_loop` — Set event loop parameters
- `create_session` — Track session state for bookkeeping
**Just don't:** Use MCP as the primary construction method or rely on export_graph
## Error Handling Patterns
### Graceful Failure with Fallback
```python
edges = [
# Success path
EdgeSpec(id="api-success", source="api-call", target="process-results",
condition=EdgeCondition.ON_SUCCESS),
# Fallback on failure
EdgeSpec(id="api-to-fallback", source="api-call", target="fallback-cache",
condition=EdgeCondition.ON_FAILURE, priority=1),
# Report if fallback also fails
EdgeSpec(id="fallback-to-error", source="fallback-cache", target="report-error",
condition=EdgeCondition.ON_FAILURE, priority=1),
]
```
## Handoff to Testing
When agent is complete, transition to testing phase:
### Pre-Testing Checklist
- [ ] Agent structure validates: `uv run python -m agent_name validate`
- [ ] All nodes defined in nodes/**init**.py
- [ ] All edges connect valid nodes with correct priorities
- [ ] Feedback edge targets have `max_node_visits > 1`
- [ ] Client-facing nodes have meaningful system prompts
- [ ] Agent can be imported: `from exports.agent_name import default_agent`
## Related Skills
- **hive-concepts** — Fundamental concepts (node types, edges, event loop architecture)
- **hive-create** — Step-by-step building process
- **hive-test** — Test and validate agents
- **hive** — Complete workflow orchestrator
---
**Remember: Agent is actively constructed, visible the whole time. No hidden state. No surprise exports. Just transparent, incremental file building.**
+940
View File
@@ -0,0 +1,940 @@
---
name: hive-test
description: Iterative agent testing with session recovery. Execute, analyze, fix, resume from checkpoints. Use when testing an agent, debugging test failures, or verifying fixes without re-running from scratch.
---
# Agent Testing
Test agents iteratively: execute, analyze failures, fix, resume from checkpoint, repeat.
## When to Use
- Testing a newly built agent against its goal
- Debugging a failing agent iteratively
- Verifying fixes without re-running expensive early nodes
- Running final regression tests before deployment
## Prerequisites
1. Agent package at `exports/{agent_name}/` (built with `/hive-create`)
2. Credentials configured (`/hive-credentials`)
3. `ANTHROPIC_API_KEY` set (or appropriate LLM provider key)
**Path distinction** (critical — don't confuse these):
- `exports/{agent_name}/` — agent source code (edit here)
- `~/.hive/agents/{agent_name}/` — runtime data: sessions, checkpoints, logs (read here)
---
## The Iterative Test Loop
This is the core workflow. Don't re-run the entire agent when a late node fails — analyze, fix, and resume from the last clean checkpoint.
```
┌──────────────────────────────────────┐
│ PHASE 1: Generate Test Scenarios │
│ Goal → synthetic test inputs + tests │
└──────────────┬───────────────────────┘
┌──────────────────────────────────────┐
│ PHASE 2: Execute │◄────────────────┐
│ Run agent (CLI or pytest) │ │
└──────────────┬───────────────────────┘ │
↓ │
Pass? ──yes──► PHASE 6: Final Verification │
│ │
no │
↓ │
┌──────────────────────────────────────┐ │
│ PHASE 3: Analyze │ │
│ Session + runtime logs + checkpoints │ │
└──────────────┬───────────────────────┘ │
↓ │
┌──────────────────────────────────────┐ │
│ PHASE 4: Fix │ │
│ Prompt / code / graph / goal │ │
└──────────────┬───────────────────────┘ │
↓ │
┌──────────────────────────────────────┐ │
│ PHASE 5: Recover & Resume │─────────────────┘
│ Checkpoint resume OR fresh re-run │
└──────────────────────────────────────┘
```
---
### Phase 1: Generate Test Scenarios
Create synthetic tests from the agent's goal, constraints, and success criteria.
#### Step 1a: Read the goal
```python
# Read goal from agent.py
Read(file_path="exports/{agent_name}/agent.py")
# Extract the Goal definition and convert to JSON string
```
#### Step 1b: Get test guidelines
```python
# Get constraint test guidelines
generate_constraint_tests(
goal_id="your-goal-id",
goal_json='{"id": "...", "constraints": [...]}',
agent_path="exports/{agent_name}"
)
# Get success criteria test guidelines
generate_success_tests(
goal_id="your-goal-id",
goal_json='{"id": "...", "success_criteria": [...]}',
node_names="intake,research,review,report",
tool_names="web_search,web_scrape",
agent_path="exports/{agent_name}"
)
```
These return `file_header`, `test_template`, `constraints_formatted`/`success_criteria_formatted`, and `test_guidelines`. They do NOT generate test code — you write the tests.
#### Step 1c: Write tests
```python
Write(
file_path=result["output_file"],
content=result["file_header"] + "\n\n" + your_test_code
)
```
#### Test writing rules
- Every test MUST be `async` with `@pytest.mark.asyncio`
- Every test MUST accept `runner, auto_responder, mock_mode` fixtures
- Use `await auto_responder.start()` before running, `await auto_responder.stop()` in `finally`
- Use `await runner.run(input_dict)` — this goes through AgentRunner → AgentRuntime → ExecutionStream
- Access output via `result.output.get("key")` — NEVER `result.output["key"]`
- `result.success=True` means no exception, NOT goal achieved — always check output
- Write 8-15 tests total, not 30+
- Each real test costs ~3 seconds + LLM tokens
- NEVER use `default_agent.run()` — it bypasses the runtime (no sessions, no logs, client-facing nodes hang)
#### Step 1d: Check existing tests
Before generating, check if tests already exist:
```python
list_tests(
goal_id="your-goal-id",
agent_path="exports/{agent_name}"
)
```
---
### Phase 2: Execute
Two execution paths, use the right one for your situation.
#### Iterative debugging (for complex agents)
Run the agent via CLI. This creates sessions with checkpoints at `~/.hive/agents/{agent_name}/sessions/`:
```bash
uv run hive run exports/{agent_name} --input '{"query": "test topic"}'
```
Sessions and checkpoints are saved automatically.
**Client-facing nodes**: Agents with `client_facing=True` nodes (interactive conversation) work in headless mode when run from a real terminal — the agent streams output to stdout and reads user input from stdin via a `>>> ` prompt. In non-interactive shells (like Claude Code's Bash tool), client-facing nodes will hang because there is no stdin. For testing interactive agents from Claude Code, use `run_tests` with mock mode or have the user run the agent manually in their terminal.
#### Automated regression (for CI or final verification)
Use the `run_tests` MCP tool to run all pytest tests:
```python
run_tests(
goal_id="your-goal-id",
agent_path="exports/{agent_name}"
)
```
Returns structured results:
```json
{
"overall_passed": false,
"summary": {"total": 12, "passed": 10, "failed": 2, "pass_rate": "83.3%"},
"test_results": [{"test_name": "test_success_source_diversity", "status": "failed"}],
"failures": [{"test_name": "test_success_source_diversity", "details": "..."}]
}
```
**Options:**
```python
# Run only constraint tests
run_tests(goal_id, agent_path, test_types='["constraint"]')
# Stop on first failure
run_tests(goal_id, agent_path, fail_fast=True)
# Parallel execution
run_tests(goal_id, agent_path, parallel=4)
```
**Note:** `run_tests` uses `AgentRunner` with `tmp_path` storage, so sessions are isolated per test run. For checkpoint-based recovery with persistent sessions, use CLI execution. Use `run_tests` for quick regression checks and final verification.
---
### Phase 3: Analyze Failures
When a test fails, drill down systematically. Don't guess — use the tools.
#### Step 3a: Get error category
```python
debug_test(
goal_id="your-goal-id",
test_name="test_success_source_diversity",
agent_path="exports/{agent_name}"
)
```
Returns error category (`IMPLEMENTATION_ERROR`, `ASSERTION_FAILURE`, `TIMEOUT`, `IMPORT_ERROR`, `API_ERROR`) plus full traceback and suggestions.
#### Step 3b: Find the failed session
```python
list_agent_sessions(
agent_work_dir="~/.hive/agents/{agent_name}",
status="failed",
limit=5
)
```
Returns session list with IDs, timestamps, current_node (where it failed), execution_quality.
#### Step 3c: Inspect session state
```python
get_agent_session_state(
agent_work_dir="~/.hive/agents/{agent_name}",
session_id="session_20260209_143022_abc12345"
)
```
Returns execution path, which node was current, step count, timestamps — but excludes memory values (to avoid context bloat). Shows `memory_keys` and `memory_size` instead.
#### Step 3d: Examine runtime logs (L2/L3)
```python
# L2: Per-node success/failure, retry counts
query_runtime_log_details(
agent_work_dir="~/.hive/agents/{agent_name}",
run_id="session_20260209_143022_abc12345",
needs_attention_only=True
)
# L3: Exact LLM responses, tool call inputs/outputs
query_runtime_log_raw(
agent_work_dir="~/.hive/agents/{agent_name}",
run_id="session_20260209_143022_abc12345",
node_id="research"
)
```
#### Step 3e: Inspect memory data
```python
# See what data a node actually produced
get_agent_session_memory(
agent_work_dir="~/.hive/agents/{agent_name}",
session_id="session_20260209_143022_abc12345",
key="research_results"
)
```
#### Step 3f: Find recovery points
```python
list_agent_checkpoints(
agent_work_dir="~/.hive/agents/{agent_name}",
session_id="session_20260209_143022_abc12345",
is_clean="true"
)
```
Returns checkpoint summaries with IDs, types (`node_start`, `node_complete`), which node, and `is_clean` flag. Clean checkpoints are safe resume points.
#### Step 3g: Compare checkpoints (optional)
To understand what changed between two points in execution:
```python
compare_agent_checkpoints(
agent_work_dir="~/.hive/agents/{agent_name}",
session_id="session_20260209_143022_abc12345",
checkpoint_id_before="cp_node_complete_research_143030",
checkpoint_id_after="cp_node_complete_review_143115"
)
```
Returns memory diff (added/removed/changed keys) and execution path diff.
---
### Phase 4: Fix Based on Root Cause
Use the analysis from Phase 3 to determine what to fix and where.
| Root Cause | What to Fix | Where to Edit |
|------------|------------|---------------|
| **Prompt issue** — LLM produces wrong output format, misses instructions | Node `system_prompt` | `exports/{agent}/nodes/__init__.py` |
| **Code bug** — TypeError, KeyError, logic error in Python | Agent code | `exports/{agent}/agent.py`, `nodes/__init__.py` |
| **Graph issue** — wrong routing, missing edge, bad condition_expr | Edges, node config | `exports/{agent}/agent.py` |
| **Tool issue** — MCP tool fails, wrong config, missing credential | Tool config | `exports/{agent}/mcp_servers.json`, `/hive-credentials` |
| **Goal issue** — success criteria too strict/vague, wrong constraints | Goal definition | `exports/{agent}/agent.py` (goal section) |
| **Test issue** — test expectations don't match actual agent behavior | Test code | `exports/{agent}/tests/test_*.py` |
#### Fix strategies by error category
**IMPLEMENTATION_ERROR** (TypeError, AttributeError, KeyError):
```python
# Read the failing code
Read(file_path="exports/{agent_name}/nodes/__init__.py")
# Fix the bug
Edit(
file_path="exports/{agent_name}/nodes/__init__.py",
old_string="results.get('videos')",
new_string="(results or {}).get('videos', [])"
)
```
**ASSERTION_FAILURE** (test assertions fail but agent ran successfully):
- Check if the agent's output is actually wrong → fix the prompt
- Check if the test's expectations are unrealistic → fix the test
- Use `get_agent_session_memory` to see what the agent actually produced
**TIMEOUT / STALL** (agent runs too long):
- Check `node_visit_counts` for feedback loops hitting max_node_visits
- Check L3 logs for tool calls that hang
- Reduce `max_iterations` in loop_config or fix the prompt to converge faster
**API_ERROR** (connection, rate limit, auth):
- Verify credentials with `/hive-credentials`
- Check MCP server configuration
---
### Phase 5: Recover & Resume
After fixing the agent, decide whether to resume or re-run.
#### When to resume from checkpoint
Resume when ALL of these are true:
- The fix is to a node that comes AFTER existing clean checkpoints
- Clean checkpoints exist (from a CLI execution with checkpointing)
- The early nodes are expensive (web scraping, API calls, long LLM chains)
```bash
# Resume from the last clean checkpoint before the failing node
uv run hive run exports/{agent_name} \
--resume-session session_20260209_143022_abc12345 \
--checkpoint cp_node_complete_research_143030
```
This skips all nodes before the checkpoint and only re-runs the fixed node onward.
#### When to re-run from scratch
Re-run when ANY of these are true:
- The fix is to the entry node or an early node
- No checkpoints exist (e.g., agent was run via `run_tests`)
- The agent is fast (2-3 nodes, completes in seconds)
- You changed the graph structure (added/removed nodes/edges)
```bash
uv run hive run exports/{agent_name} --input '{"query": "test topic"}'
```
#### Inspecting a checkpoint before resuming
```python
get_agent_checkpoint(
agent_work_dir="~/.hive/agents/{agent_name}",
session_id="session_20260209_143022_abc12345",
checkpoint_id="cp_node_complete_research_143030"
)
```
Returns the full checkpoint: shared_memory snapshot, execution_path, current_node, next_node, is_clean.
#### Loop back to Phase 2
After resuming or re-running, check if the fix worked. If not, go back to Phase 3.
---
### Phase 6: Final Verification
Once the iterative fix loop converges (the agent produces correct output), run the full automated test suite:
```python
run_tests(
goal_id="your-goal-id",
agent_path="exports/{agent_name}"
)
```
All tests should pass. If not, repeat the loop for remaining failures.
---
## Credential Requirements
**CRITICAL: Testing requires ALL credentials the agent depends on.** This includes both the LLM API key AND any tool-specific credentials (HubSpot, Brave Search, etc.).
### Prerequisites
Before running agent tests, you MUST collect ALL required credentials from the user.
**Step 1: LLM API Key (always required)**
```bash
export ANTHROPIC_API_KEY="your-key-here"
```
**Step 2: Tool-specific credentials (depends on agent's tools)**
Inspect the agent's `mcp_servers.json` and tool configuration to determine which tools the agent uses, then check for all required credentials:
```python
from aden_tools.credentials import CredentialManager, CREDENTIAL_SPECS
creds = CredentialManager()
# Determine which tools the agent uses (from agent.json or mcp_servers.json)
agent_tools = [...] # e.g., ["hubspot_search_contacts", "web_search", ...]
# Find all missing credentials for those tools
missing = creds.get_missing_for_tools(agent_tools)
```
Common tool credentials:
| Tool | Env Var | Help URL |
|------|---------|----------|
| HubSpot CRM | `HUBSPOT_ACCESS_TOKEN` | https://developers.hubspot.com/docs/api/private-apps |
| Brave Search | `BRAVE_SEARCH_API_KEY` | https://brave.com/search/api/ |
| Google Search | `GOOGLE_SEARCH_API_KEY` + `GOOGLE_SEARCH_CX` | https://developers.google.com/custom-search |
**Why ALL credentials are required:**
- Tests need to execute the agent's LLM nodes to validate behavior
- Tools with missing credentials will return error dicts instead of real data
- Mock mode bypasses everything, providing no confidence in real-world performance
### Mock Mode Limitations
Mock mode (`--mock` flag or `MOCK_MODE=1`) is **ONLY for structure validation**:
- Validates graph structure (nodes, edges, connections)
- Validates that `AgentRunner.load()` succeeds and the agent is importable
- Does NOT execute event_loop agents — MockLLMProvider never calls `set_output`, so event_loop nodes loop forever
- Does NOT test LLM reasoning, content quality, or constraint validation
- Does NOT test real API integrations or tool use
**Bottom line:** If you're testing whether an agent achieves its goal, you MUST use real credentials.
### Enforcing Credentials in Tests
When writing tests, **ALWAYS include credential checks**:
```python
import os
import pytest
from aden_tools.credentials import CredentialManager
pytestmark = pytest.mark.skipif(
not CredentialManager().is_available("anthropic") and not os.environ.get("MOCK_MODE"),
reason="API key required for real testing. Set ANTHROPIC_API_KEY or use MOCK_MODE=1."
)
@pytest.fixture(scope="session", autouse=True)
def check_credentials():
"""Ensure ALL required credentials are set for real testing."""
creds = CredentialManager()
mock_mode = os.environ.get("MOCK_MODE")
if not creds.is_available("anthropic"):
if mock_mode:
print("\nRunning in MOCK MODE - structure validation only")
else:
pytest.fail(
"\nANTHROPIC_API_KEY not set!\n"
"Set API key: export ANTHROPIC_API_KEY='your-key-here'\n"
"Or run structure validation: MOCK_MODE=1 pytest exports/{agent}/tests/"
)
if not mock_mode:
agent_tools = [] # Update per agent
missing = creds.get_missing_for_tools(agent_tools)
if missing:
lines = ["\nMissing tool credentials!"]
for name in missing:
spec = creds.specs.get(name)
if spec:
lines.append(f" {spec.env_var} - {spec.description}")
pytest.fail("\n".join(lines))
```
### User Communication
When the user asks to test an agent, **ALWAYS check for ALL credentials first**:
1. **Identify the agent's tools** from `mcp_servers.json`
2. **Check ALL required credentials** using `CredentialManager`
3. **Ask the user to provide any missing credentials** before proceeding
4. Collect ALL missing credentials in a single prompt — not one at a time
---
## Safe Test Patterns
### OutputCleaner
The framework automatically validates and cleans node outputs using a fast LLM at edge traversal time. Tests should still use safe patterns because OutputCleaner may not catch all issues.
### Safe Access (REQUIRED)
```python
# UNSAFE - will crash on missing keys
approval = result.output["approval_decision"]
category = result.output["analysis"]["category"]
# SAFE - use .get() with defaults
output = result.output or {}
approval = output.get("approval_decision", "UNKNOWN")
# SAFE - type check before operations
analysis = output.get("analysis", {})
if isinstance(analysis, dict):
category = analysis.get("category", "unknown")
# SAFE - handle JSON parsing trap (LLM response as string)
import json
recommendation = output.get("recommendation", "{}")
if isinstance(recommendation, str):
try:
parsed = json.loads(recommendation)
if isinstance(parsed, dict):
approval = parsed.get("approval_decision", "UNKNOWN")
except json.JSONDecodeError:
approval = "UNKNOWN"
elif isinstance(recommendation, dict):
approval = recommendation.get("approval_decision", "UNKNOWN")
# SAFE - type check before iteration
items = output.get("items", [])
if isinstance(items, list):
for item in items:
...
```
### Helper Functions for conftest.py
```python
import json
import re
def _parse_json_from_output(result, key):
"""Parse JSON from agent output (framework may store full LLM response as string)."""
response_text = result.output.get(key, "")
json_text = re.sub(r'```json\s*|\s*```', '', response_text).strip()
try:
return json.loads(json_text)
except (json.JSONDecodeError, AttributeError, TypeError):
return result.output.get(key)
def safe_get_nested(result, key_path, default=None):
"""Safely get nested value from result.output."""
output = result.output or {}
current = output
for key in key_path:
if isinstance(current, dict):
current = current.get(key)
elif isinstance(current, str):
try:
json_text = re.sub(r'```json\s*|\s*```', '', current).strip()
parsed = json.loads(json_text)
if isinstance(parsed, dict):
current = parsed.get(key)
else:
return default
except json.JSONDecodeError:
return default
else:
return default
return current if current is not None else default
# Make available in tests
pytest.parse_json_from_output = _parse_json_from_output
pytest.safe_get_nested = safe_get_nested
```
### ExecutionResult Fields
**`result.success=True` means NO exception, NOT goal achieved**
```python
# WRONG
assert result.success
# RIGHT
assert result.success, f"Agent failed: {result.error}"
output = result.output or {}
approval = output.get("approval_decision")
assert approval == "APPROVED", f"Expected APPROVED, got {approval}"
```
All fields:
- `success: bool` — Completed without exception (NOT goal achieved!)
- `output: dict` — Complete memory snapshot (may contain raw strings)
- `error: str | None` — Error message if failed
- `steps_executed: int` — Number of nodes executed
- `total_tokens: int` — Cumulative token usage
- `total_latency_ms: int` — Total execution time
- `path: list[str]` — Node IDs traversed (may repeat in feedback loops)
- `paused_at: str | None` — Node ID if paused
- `session_state: dict` — State for resuming
- `node_visit_counts: dict[str, int]` — Visit counts per node (feedback loop testing)
- `execution_quality: str` — "clean", "degraded", or "failed"
### Test Count Guidance
**Write 8-15 tests, not 30+**
- 2-3 tests per success criterion
- 1 happy path test
- 1 boundary/edge case test
- 1 error handling test (optional)
Each real test costs ~3 seconds + LLM tokens. 12 tests = ~36 seconds, $0.12.
---
## Test Patterns
### Happy Path
```python
@pytest.mark.asyncio
async def test_happy_path(runner, auto_responder, mock_mode):
"""Test normal successful execution."""
await auto_responder.start()
try:
result = await runner.run({"query": "python tutorials"})
finally:
await auto_responder.stop()
assert result.success, f"Agent failed: {result.error}"
output = result.output or {}
assert output.get("report"), "No report produced"
```
### Boundary Condition
```python
@pytest.mark.asyncio
async def test_minimum_sources(runner, auto_responder, mock_mode):
"""Test at minimum source threshold."""
await auto_responder.start()
try:
result = await runner.run({"query": "niche topic"})
finally:
await auto_responder.stop()
assert result.success, f"Agent failed: {result.error}"
output = result.output or {}
sources = output.get("sources", [])
if isinstance(sources, list):
assert len(sources) >= 3, f"Expected >= 3 sources, got {len(sources)}"
```
### Error Handling
```python
@pytest.mark.asyncio
async def test_empty_input(runner, auto_responder, mock_mode):
"""Test graceful handling of empty input."""
await auto_responder.start()
try:
result = await runner.run({"query": ""})
finally:
await auto_responder.stop()
# Agent should either fail gracefully or produce an error message
output = result.output or {}
assert not result.success or output.get("error"), "Should handle empty input"
```
### Feedback Loop
```python
@pytest.mark.asyncio
async def test_feedback_loop_terminates(runner, auto_responder, mock_mode):
"""Test that feedback loops don't run forever."""
await auto_responder.start()
try:
result = await runner.run({"query": "test"})
finally:
await auto_responder.stop()
visits = result.node_visit_counts or {}
for node_id, count in visits.items():
assert count <= 5, f"Node {node_id} visited {count} times — possible infinite loop"
```
---
## MCP Tool Reference
### Phase 1: Test Generation
```python
# Check existing tests
list_tests(goal_id, agent_path)
# Get constraint test guidelines (returns templates, NOT generated tests)
generate_constraint_tests(goal_id, goal_json, agent_path)
# Returns: output_file, file_header, test_template, constraints_formatted, test_guidelines
# Get success criteria test guidelines
generate_success_tests(goal_id, goal_json, node_names, tool_names, agent_path)
# Returns: output_file, file_header, test_template, success_criteria_formatted, test_guidelines
```
### Phase 2: Execution
```python
# Automated regression (no checkpoints, fresh runs)
run_tests(goal_id, agent_path, test_types='["all"]', parallel=-1, fail_fast=False)
# Run only specific test types
run_tests(goal_id, agent_path, test_types='["constraint"]')
run_tests(goal_id, agent_path, test_types='["success"]')
```
```bash
# Iterative debugging with checkpoints (via CLI)
uv run hive run exports/{agent_name} --input '{"query": "test"}'
```
### Phase 3: Analysis
```python
# Debug a specific failed test
debug_test(goal_id, test_name, agent_path)
# Find failed sessions
list_agent_sessions(agent_work_dir, status="failed", limit=5)
# Inspect session state (excludes memory values)
get_agent_session_state(agent_work_dir, session_id)
# Inspect memory data
get_agent_session_memory(agent_work_dir, session_id, key="research_results")
# Runtime logs: L1 summaries
query_runtime_logs(agent_work_dir, status="needs_attention")
# Runtime logs: L2 per-node details
query_runtime_log_details(agent_work_dir, run_id, needs_attention_only=True)
# Runtime logs: L3 tool/LLM raw data
query_runtime_log_raw(agent_work_dir, run_id, node_id="research")
# Find clean checkpoints
list_agent_checkpoints(agent_work_dir, session_id, is_clean="true")
# Compare checkpoints (memory diff)
compare_agent_checkpoints(agent_work_dir, session_id, cp_before, cp_after)
```
### Phase 5: Recovery
```python
# Inspect checkpoint before resuming
get_agent_checkpoint(agent_work_dir, session_id, checkpoint_id)
# Empty checkpoint_id = latest checkpoint
```
```bash
# Resume from checkpoint via CLI (headless)
uv run hive run exports/{agent_name} \
--resume-session {session_id} --checkpoint {checkpoint_id}
```
---
## Anti-Patterns
| Don't | Do Instead |
|-------|-----------|
| Use `default_agent.run()` in tests | Use `runner.run()` with `auto_responder` fixtures (goes through AgentRuntime) |
| Re-run entire agent when a late node fails | Resume from last clean checkpoint |
| Treat `result.success` as goal achieved | Check `result.output` for actual criteria |
| Access `result.output["key"]` directly | Use `result.output.get("key")` |
| Fix random things hoping tests pass | Analyze L2/L3 logs to find root cause first |
| Write 30+ tests | Write 8-15 focused tests |
| Skip credential check | Use `/hive-credentials` before testing |
| Confuse `exports/` with `~/.hive/agents/` | Code in `exports/`, runtime data in `~/.hive/` |
| Use `run_tests` for iterative debugging | Use headless CLI with checkpoints for iterative debugging |
| Use headless CLI for final regression | Use `run_tests` for automated regression |
| Use `--tui` from Claude Code | Use headless `run` command — TUI hangs in non-interactive shells |
| Test client-facing nodes from Claude Code | Use mock mode, or have the user run the agent in their terminal |
| Run tests without reading goal first | Always understand the goal before writing tests |
| Skip Phase 3 analysis and guess | Use session + log tools to identify root cause |
---
## Example Walkthrough: Deep Research Agent
A complete iteration showing the test loop for an agent with nodes: `intake → research → review → report`.
### Phase 1: Generate tests
```python
# Read the goal
Read(file_path="exports/deep_research_agent/agent.py")
# Get success criteria test guidelines
result = generate_success_tests(
goal_id="rigorous-interactive-research",
goal_json='{"id": "rigorous-interactive-research", "success_criteria": [{"id": "source-diversity", "target": ">=5"}, {"id": "citation-coverage", "target": "100%"}, {"id": "report-completeness", "target": "90%"}]}',
node_names="intake,research,review,report",
tool_names="web_search,web_scrape",
agent_path="exports/deep_research_agent"
)
# Write tests
Write(
file_path=result["output_file"],
content=result["file_header"] + "\n\n" + test_code
)
```
### Phase 2: First execution
```python
run_tests(
goal_id="rigorous-interactive-research",
agent_path="exports/deep_research_agent",
fail_fast=True
)
```
Result: `test_success_source_diversity` fails — agent only found 2 sources instead of 5.
### Phase 3: Analyze
```python
# Debug the failing test
debug_test(
goal_id="rigorous-interactive-research",
test_name="test_success_source_diversity",
agent_path="exports/deep_research_agent"
)
# → ASSERTION_FAILURE: Expected >= 5 sources, got 2
# Find the session
list_agent_sessions(
agent_work_dir="~/.hive/agents/deep_research_agent",
status="completed",
limit=1
)
# → session_20260209_150000_abc12345
# See what the research node produced
get_agent_session_memory(
agent_work_dir="~/.hive/agents/deep_research_agent",
session_id="session_20260209_150000_abc12345",
key="research_results"
)
# → Only 2 web_search calls made, each returned 1 source
# Check the LLM's behavior in the research node
query_runtime_log_raw(
agent_work_dir="~/.hive/agents/deep_research_agent",
run_id="session_20260209_150000_abc12345",
node_id="research"
)
# → LLM called web_search only twice, then called set_output
```
Root cause: The research node's prompt doesn't tell the LLM to search for at least 5 diverse sources. It stops after the first couple of searches.
### Phase 4: Fix the prompt
```python
Read(file_path="exports/deep_research_agent/nodes/__init__.py")
Edit(
file_path="exports/deep_research_agent/nodes/__init__.py",
old_string='system_prompt="Search for information on the user\'s topic."',
new_string='system_prompt="Search for information on the user\'s topic. You MUST find at least 5 diverse, authoritative sources. Use multiple different search queries to ensure source diversity. Do not stop searching until you have at least 5 distinct sources."'
)
```
### Phase 5: Resume from checkpoint
For this example, the fix is to the `research` node. If we had run via CLI with checkpointing, we could resume from the checkpoint after `intake` to skip re-running intake:
```bash
# Check if clean checkpoint exists after intake
list_agent_checkpoints(
agent_work_dir="~/.hive/agents/deep_research_agent",
session_id="session_20260209_150000_abc12345",
is_clean="true"
)
# → cp_node_complete_intake_150005
# Resume from after intake, re-run research with fixed prompt
uv run hive run exports/deep_research_agent \
--resume-session session_20260209_150000_abc12345 \
--checkpoint cp_node_complete_intake_150005
```
Or for this simple case (intake is fast), just re-run:
```bash
uv run hive run exports/deep_research_agent --input '{"topic": "test"}'
```
### Phase 6: Final verification
```python
run_tests(
goal_id="rigorous-interactive-research",
agent_path="exports/deep_research_agent"
)
# → All 12 tests pass
```
---
## Test File Structure
```
exports/{agent_name}/
├── agent.py ← Agent to test (goal, nodes, edges)
├── nodes/__init__.py ← Node implementations (prompts, config)
├── config.py ← Agent configuration
├── mcp_servers.json ← Tool server config
└── tests/
├── conftest.py ← Shared fixtures + safe access helpers
├── test_constraints.py ← Constraint tests
├── test_success_criteria.py ← Success criteria tests
└── test_edge_cases.py ← Edge case tests
```
## Integration with Other Skills
| Scenario | From | To | Action |
|----------|------|----|--------|
| Agent built, ready to test | `/hive-create` | `/hive-test` | Generate tests, start loop |
| Prompt fix needed | `/hive-test` Phase 4 | Direct edit | Edit `nodes/__init__.py`, resume |
| Goal definition wrong | `/hive-test` Phase 4 | `/hive-create` | Update goal, may need rebuild |
| Missing credentials | `/hive-test` Phase 3 | `/hive-credentials` | Set up credentials |
| Complex runtime failure | `/hive-test` Phase 3 | `/hive-debugger` | Deep L1/L2/L3 analysis |
| All tests pass | `/hive-test` Phase 6 | Done | Agent validated |
@@ -0,0 +1,333 @@
# Example: Iterative Testing of a Research Agent
This example walks through the full iterative test loop for a research agent that searches the web, reviews findings, and produces a cited report.
## Agent Structure
```
exports/deep_research_agent/
├── agent.py # Goal + graph: intake → research → review → report
├── nodes/__init__.py # Node definitions (system_prompt, input/output keys)
├── config.py # Model config
├── mcp_servers.json # Tools: web_search, web_scrape
└── tests/ # Test files (we'll create these)
```
**Goal:** "Rigorous Interactive Research" — find 5+ diverse sources, cite every claim, produce a complete report.
---
## Phase 1: Generate Tests
### Read the goal
```python
Read(file_path="exports/deep_research_agent/agent.py")
# Extract: goal_id="rigorous-interactive-research"
# success_criteria: source-diversity (>=5), citation-coverage (100%), report-completeness (90%)
# constraints: no-hallucination, source-attribution
```
### Get test guidelines
```python
result = generate_success_tests(
goal_id="rigorous-interactive-research",
goal_json='{"id": "rigorous-interactive-research", "success_criteria": [{"id": "source-diversity", "description": "Use multiple diverse sources", "target": ">=5"}, {"id": "citation-coverage", "description": "Every claim cites its source", "target": "100%"}, {"id": "report-completeness", "description": "Report answers the research questions", "target": "90%"}]}',
node_names="intake,research,review,report",
tool_names="web_search,web_scrape",
agent_path="exports/deep_research_agent"
)
```
### Write tests
```python
Write(
file_path="exports/deep_research_agent/tests/test_success_criteria.py",
content=result["file_header"] + '''
@pytest.mark.asyncio
async def test_success_source_diversity(runner, auto_responder, mock_mode):
"""At least 5 diverse sources are found."""
await auto_responder.start()
try:
result = await runner.run({"query": "impact of remote work on productivity"})
finally:
await auto_responder.stop()
assert result.success, f"Agent failed: {result.error}"
output = result.output or {}
sources = output.get("sources", [])
if isinstance(sources, list):
assert len(sources) >= 5, f"Expected >= 5 sources, got {len(sources)}"
@pytest.mark.asyncio
async def test_success_citation_coverage(runner, auto_responder, mock_mode):
"""Every factual claim in the report cites its source."""
await auto_responder.start()
try:
result = await runner.run({"query": "climate change effects on agriculture"})
finally:
await auto_responder.stop()
assert result.success, f"Agent failed: {result.error}"
output = result.output or {}
report = output.get("report", "")
# Check that report contains numbered references
assert "[1]" in str(report) or "[source" in str(report).lower(), "Report lacks citations"
@pytest.mark.asyncio
async def test_success_report_completeness(runner, auto_responder, mock_mode):
"""Report addresses the original research question."""
query = "pros and cons of nuclear energy"
await auto_responder.start()
try:
result = await runner.run({"query": query})
finally:
await auto_responder.stop()
assert result.success, f"Agent failed: {result.error}"
output = result.output or {}
report = output.get("report", "")
assert len(str(report)) > 200, f"Report too short: {len(str(report))} chars"
@pytest.mark.asyncio
async def test_empty_query_handling(runner, auto_responder, mock_mode):
"""Agent handles empty input gracefully."""
await auto_responder.start()
try:
result = await runner.run({"query": ""})
finally:
await auto_responder.stop()
output = result.output or {}
assert not result.success or output.get("error"), "Should handle empty query"
@pytest.mark.asyncio
async def test_feedback_loop_terminates(runner, auto_responder, mock_mode):
"""Feedback loop between review and research terminates."""
await auto_responder.start()
try:
result = await runner.run({"query": "quantum computing basics"})
finally:
await auto_responder.stop()
visits = result.node_visit_counts or {}
for node_id, count in visits.items():
assert count <= 5, f"Node {node_id} visited {count} times"
'''
)
```
---
## Phase 2: First Execution
```python
run_tests(
goal_id="rigorous-interactive-research",
agent_path="exports/deep_research_agent",
fail_fast=True
)
```
**Result:**
```json
{
"overall_passed": false,
"summary": {"total": 5, "passed": 3, "failed": 2, "pass_rate": "60.0%"},
"failures": [
{"test_name": "test_success_source_diversity", "details": "AssertionError: Expected >= 5 sources, got 2"},
{"test_name": "test_success_citation_coverage", "details": "AssertionError: Report lacks citations"}
]
}
```
---
## Phase 3: Analyze (Iteration 1)
### Debug the first failure
```python
debug_test(
goal_id="rigorous-interactive-research",
test_name="test_success_source_diversity",
agent_path="exports/deep_research_agent"
)
# Category: ASSERTION_FAILURE — Expected >= 5 sources, got 2
```
### Find the session and inspect memory
```python
list_agent_sessions(
agent_work_dir="~/.hive/agents/deep_research_agent",
status="completed",
limit=1
)
# → session_20260209_150000_abc12345
get_agent_session_memory(
agent_work_dir="~/.hive/agents/deep_research_agent",
session_id="session_20260209_150000_abc12345",
key="research_results"
)
# → Only 2 sources found. LLM stopped searching after 2 queries.
```
### Check LLM behavior in the research node
```python
query_runtime_log_raw(
agent_work_dir="~/.hive/agents/deep_research_agent",
run_id="session_20260209_150000_abc12345",
node_id="research"
)
# → LLM called web_search twice, got results, immediately called set_output.
# → Prompt doesn't instruct it to find at least 5 sources.
```
**Root cause:** The research node's system_prompt doesn't specify minimum source requirements.
---
## Phase 4: Fix (Iteration 1)
```python
Read(file_path="exports/deep_research_agent/nodes/__init__.py")
# Fix the research node prompt
Edit(
file_path="exports/deep_research_agent/nodes/__init__.py",
old_string='system_prompt="Search for information on the user\'s topic using web search."',
new_string='system_prompt="Search for information on the user\'s topic using web search. You MUST find at least 5 diverse, authoritative sources. Use multiple different search queries with varied keywords. Do NOT call set_output until you have gathered at least 5 distinct sources from different domains."'
)
```
---
## Phase 5: Recover & Resume (Iteration 1)
The fix is to the `research` node. Since this was a `run_tests` execution (no checkpoints), we re-run from scratch:
```python
run_tests(
goal_id="rigorous-interactive-research",
agent_path="exports/deep_research_agent",
fail_fast=True
)
```
**Result:**
```json
{
"overall_passed": false,
"summary": {"total": 5, "passed": 4, "failed": 1, "pass_rate": "80.0%"},
"failures": [
{"test_name": "test_success_citation_coverage", "details": "AssertionError: Report lacks citations"}
]
}
```
Source diversity now passes. Citation coverage still fails.
---
## Phase 3: Analyze (Iteration 2)
```python
debug_test(
goal_id="rigorous-interactive-research",
test_name="test_success_citation_coverage",
agent_path="exports/deep_research_agent"
)
# Category: ASSERTION_FAILURE — Report lacks citations
# Check what the report node produced
list_agent_sessions(
agent_work_dir="~/.hive/agents/deep_research_agent",
status="completed",
limit=1
)
# → session_20260209_151500_def67890
get_agent_session_memory(
agent_work_dir="~/.hive/agents/deep_research_agent",
session_id="session_20260209_151500_def67890",
key="report"
)
# → Report text exists but uses no numbered references.
# → Sources are in memory but report node doesn't cite them.
```
**Root cause:** The report node's prompt doesn't instruct the LLM to include numbered citations.
---
## Phase 4: Fix (Iteration 2)
```python
Edit(
file_path="exports/deep_research_agent/nodes/__init__.py",
old_string='system_prompt="Write a comprehensive report based on the research findings."',
new_string='system_prompt="Write a comprehensive report based on the research findings. You MUST include numbered citations [1], [2], etc. for every factual claim. At the end, include a References section listing all sources with their URLs. Every claim must be traceable to a specific source."'
)
```
---
## Phase 5: Resume (Iteration 2)
The fix is to the `report` node (the last node). To demonstrate checkpoint recovery, run via CLI:
```bash
# Run via CLI to get checkpoints
uv run hive run exports/deep_research_agent --input '{"topic": "climate change effects"}'
# After it runs, find the clean checkpoint before report
list_agent_checkpoints(
agent_work_dir="~/.hive/agents/deep_research_agent",
session_id="session_20260209_152000_ghi34567",
is_clean="true"
)
# → cp_node_complete_review_152100 (after review, before report)
# Resume — skips intake, research, review entirely
uv run hive run exports/deep_research_agent \
--resume-session session_20260209_152000_ghi34567 \
--checkpoint cp_node_complete_review_152100
```
Only the `report` node re-runs with the fixed prompt, using research data from the checkpoint.
---
## Phase 6: Final Verification
```python
run_tests(
goal_id="rigorous-interactive-research",
agent_path="exports/deep_research_agent"
)
```
**Result:**
```json
{
"overall_passed": true,
"summary": {"total": 5, "passed": 5, "failed": 0, "pass_rate": "100.0%"}
}
```
All tests pass.
---
## Summary
| Iteration | Failure | Root Cause | Fix | Recovery |
|-----------|---------|------------|-----|----------|
| 1 | Source diversity (2 < 5) | Research prompt too vague | Added "at least 5 sources" to prompt | Re-run (no checkpoints) |
| 2 | No citations in report | Report prompt lacks citation instructions | Added citation requirements | Checkpoint resume (skipped 3 nodes) |
**Key takeaways:**
- Phase 3 analysis (session memory + L3 logs) identified root causes without guessing
- Checkpoint recovery in iteration 2 saved time by skipping 3 expensive nodes
- Final `run_tests` confirms all scenarios pass end-to-end
@@ -1,30 +1,53 @@
---
name: agent-workflow
description: Complete workflow for building, implementing, and testing goal-driven agents. Orchestrates building-agents-* and testing-agent skills. Use when starting a new agent project, unsure which skill to use, or need end-to-end guidance.
name: hive
description: Complete workflow for building, implementing, and testing goal-driven agents. Orchestrates hive-* skills. Use when starting a new agent project, unsure which skill to use, or need end-to-end guidance.
license: Apache-2.0
metadata:
author: hive
version: "2.0"
type: workflow-orchestrator
orchestrates:
- building-agents-core
- building-agents-construction
- building-agents-patterns
- testing-agent
- hive-concepts
- hive-create
- hive-patterns
- hive-test
- hive-credentials
- hive-debugger
---
# Agent Development Workflow
**THIS IS AN EXECUTABLE WORKFLOW. DO NOT explore the codebase or read source files. ROUTE to the correct skill IMMEDIATELY.**
When this skill is loaded, **ALWAYS use the AskUserQuestion tool** to present options:
```
Use AskUserQuestion with these options:
- "Build a new agent" → Then invoke /hive-create
- "Test an existing agent" → Then invoke /hive-test
- "Learn agent concepts" → Then invoke /hive-concepts
- "Optimize agent design" → Then invoke /hive-patterns
- "Set up credentials" → Then invoke /hive-credentials
- "Debug a failing agent" → Then invoke /hive-debugger
- "Other" (please describe what you want to achieve)
```
**DO NOT:** Read source files, explore the codebase, search for code, or do any investigation before routing. The sub-skills handle all of that.
---
Complete Standard Operating Procedure (SOP) for building production-ready goal-driven agents.
## Overview
This workflow orchestrates specialized skills to take you from initial concept to production-ready agent:
1. **Understand Concepts** (5-10 min) → `/building-agents-core` (optional)
2. **Build Structure** (15-30 min) → `/building-agents-construction`
3. **Optimize Design** (10-15 min) → `/building-agents-patterns` (optional)
4. **Test & Validate** (20-40 min) → `/testing-agent`
1. **Understand Concepts** `/hive-concepts` (optional)
2. **Build Structure** `/hive-create`
3. **Optimize Design** `/hive-patterns` (optional)
4. **Setup Credentials**`/hive-credentials` (if agent uses tools requiring API keys)
5. **Test & Validate**`/hive-test`
6. **Debug Issues**`/hive-debugger` (if agent fails at runtime)
## When to Use This Workflow
@@ -35,24 +58,26 @@ Use this meta-skill when:
- Want consistent, repeatable agent builds
**Skip this workflow** if:
- You only need to test an existing agent → use `/testing-agent` directly
- You only need to test an existing agent → use `/hive-test` directly
- You know exactly which phase you're in → use specific skill directly
## Quick Decision Tree
```
"Need to understand agent concepts" → building-agents-core
"Build a new agent" → building-agents-construction
"Optimize my agent design" → building-agents-patterns
"Test my agent" → testing-agent
"Need to understand agent concepts" → hive-concepts
"Build a new agent" → hive-create
"Optimize my agent design" → hive-patterns
"Need client-facing nodes or feedback loops" → hive-patterns
"Set up API keys for my agent" → hive-credentials
"Test my agent" → hive-test
"My agent is failing/stuck/has errors" → hive-debugger
"Not sure what I need" → Read phases below, then decide
"Agent has structure but needs implementation" → See agent directory STATUS.md
```
## Phase 0: Understand Concepts (Optional)
**Duration**: 5-10 minutes
**Skill**: `/building-agents-core`
**Skill**: `/hive-concepts`
**Input**: Questions about agent architecture
### When to Use
@@ -60,12 +85,12 @@ Use this meta-skill when:
- First time building an agent
- Need to understand node types, edges, goals
- Want to validate tool availability
- Learning about pause/resume architecture
- Learning about event loop architecture and client-facing nodes
### What This Phase Provides
- Architecture overview (Python packages, not JSON)
- Core concepts (Goal, Node, Edge, Pause/Resume)
- Core concepts (Goal, Node, Edge, Event Loop, Judges)
- Tool discovery and validation procedures
- Workflow overview
@@ -73,9 +98,8 @@ Use this meta-skill when:
## Phase 1: Build Agent Structure
**Duration**: 15-30 minutes
**Skill**: `/building-agents-construction`
**Input**: User requirements ("Build an agent that...")
**Skill**: `/hive-create`
**Input**: User requirements ("Build an agent that...") or a template to start from
### What This Phase Does
@@ -103,7 +127,7 @@ Creates the complete agent architecture:
- ✅ 1-5 constraints defined
- ✅ 5-10 nodes specified in nodes/__init__.py
- ✅ 8-15 edges connecting workflow
- ✅ Validated structure (passes `python -m agent_name validate`)
- ✅ Validated structure (passes `uv run python -m agent_name validate`)
- ✅ README.md with usage instructions
- ✅ CLI commands (info, validate, run, shell)
@@ -117,7 +141,7 @@ You're ready for Phase 2 when:
### Common Outputs
The building-agents-construction skill produces:
The hive-create skill produces:
```
exports/agent_name/
├── __init__.py (package exports)
@@ -137,53 +161,52 @@ exports/agent_name/
→ You may need to add Python functions or MCP tools (not covered by current skills)
**If want to optimize design:**
→ Proceed to Phase 1.5 (building-agents-patterns)
→ Proceed to Phase 1.5 (hive-patterns)
**If ready to test:**
→ Proceed to Phase 2
## Phase 1.5: Optimize Design (Optional)
**Duration**: 10-15 minutes
**Skill**: `/building-agents-patterns`
**Skill**: `/hive-patterns`
**Input**: Completed agent structure
### When to Use
- Want to add pause/resume functionality
- Want to add client-facing blocking or feedback edges
- Need judge patterns for output validation
- Want fan-out/fan-in (parallel execution)
- Need error handling patterns
- Want to optimize performance
- Need examples of complex routing
- Want best practices guidance
### What This Phase Provides
- Practical examples and patterns
- Pause/resume architecture
- Error handling strategies
- Client-facing interaction patterns
- Feedback edge routing with nullable output keys
- Judge patterns (implicit, SchemaJudge)
- Fan-out/fan-in parallel execution
- Context management and spillover patterns
- Anti-patterns to avoid
- Performance optimization techniques
**Skip this phase** if your agent design is straightforward.
## Phase 2: Test & Validate
**Duration**: 20-40 minutes
**Skill**: `/testing-agent`
**Skill**: `/hive-test`
**Input**: Working agent from Phase 1
### What This Phase Does
Creates comprehensive test suite:
- Constraint tests (verify hard requirements)
- Success criteria tests (measure goal achievement)
- Edge case tests (handle failures gracefully)
- Integration tests (end-to-end workflows)
Guides the creation and execution of a comprehensive test suite:
- Constraint tests
- Success criteria tests
- Edge case tests
- Integration tests
### Process
1. **Analyze agent** - Read goal, constraints, success criteria
2. **Generate tests** - Create pytest files in `exports/agent_name/tests/`
2. **Generate tests** - The calling agent writes pytest files in `exports/agent_name/tests/` using hive-test guidelines and templates
3. **User approval** - Review and approve each test
4. **Run evaluation** - Execute tests and collect results
5. **Debug failures** - Identify and fix issues
@@ -246,9 +269,9 @@ You're done when:
```
User: "Build an agent that monitors files"
→ Use /building-agents-construction
→ Use /hive-create
→ Agent structure created
→ Use /testing-agent
→ Use /hive-test
→ Tests created and passing
→ Done: Production-ready agent
```
@@ -257,19 +280,32 @@ User: "Build an agent that monitors files"
```
User: "Build an agent (first time)"
→ Use /building-agents-core (understand concepts)
→ Use /building-agents-construction (build structure)
→ Use /building-agents-patterns (optimize design)
→ Use /testing-agent (validate)
→ Use /hive-concepts (understand concepts)
→ Use /hive-create (build structure)
→ Use /hive-patterns (optimize design)
→ Use /hive-test (validate)
→ Done: Production-ready agent
```
### Pattern 1c: Build from Template
```
User: "Build an agent based on the deep research template"
→ Use /hive-create
→ Select "From a template" path
→ Pick template, name new agent
→ Review/modify goal, nodes, graph
→ Agent exported with customizations
→ Use /hive-test
→ Done: Customized agent
```
### Pattern 2: Test Existing Agent
```
User: "Test my agent at exports/my_agent"
→ Skip Phase 1
→ Use /testing-agent directly
→ Use /hive-test directly
→ Tests created
→ Done: Validated agent
```
@@ -278,58 +314,71 @@ User: "Test my agent at exports/my_agent"
```
User: "Build an agent"
→ Use /building-agents-construction (Phase 1)
→ Use /hive-create (Phase 1)
→ Implementation needed (see STATUS.md)
→ [User implements functions]
→ Use /testing-agent (Phase 2)
→ Use /hive-test (Phase 2)
→ Tests reveal bugs
→ [Fix bugs manually]
→ Re-run tests
→ Done: Working agent
```
### Pattern 4: Complex Agent with Patterns
### Pattern 4: Agent with Review Loops and HITL Checkpoints
```
User: "Build an agent with multi-turn conversations"
→ Use /building-agents-core (learn pause/resume)
→ Use /building-agents-construction (build structure)
→ Use /building-agents-patterns (implement pause/resume pattern)
→ Use /testing-agent (validate conversation flows)
→ Done: Complex conversational agent
User: "Build an agent with human review and feedback loops"
→ Use /hive-concepts (learn event loop, client-facing nodes)
→ Use /hive-create (build structure with feedback edges)
→ Use /hive-patterns (implement client-facing + feedback patterns)
→ Use /hive-test (validate review flows and edge routing)
→ Done: Agent with HITL checkpoints and review loops
```
## Skill Dependencies
```
agent-workflow (meta-skill)
hive (meta-skill)
├── building-agents-core (foundational)
│ ├── Architecture concepts
│ ├── Node/Edge/Goal definitions
├── hive-concepts (foundational)
│ ├── Architecture concepts (event loop, judges)
│ ├── Node types (event_loop, function)
│ ├── Edge routing and priority
│ ├── Tool discovery procedures
│ └── Workflow overview
├── building-agents-construction (procedural)
├── hive-create (procedural)
│ ├── Creates package structure
│ ├── Defines goal
│ ├── Adds nodes incrementally
│ ├── Connects edges
│ ├── Adds nodes (event_loop, function)
│ ├── Connects edges with priority routing
│ ├── Finalizes agent class
│ └── Requires: building-agents-core
│ └── Requires: hive-concepts
├── building-agents-patterns (reference)
│ ├── Best practices
│ ├── Pause/resume patterns
│ ├── Error handling
│ ├── Anti-patterns
│ └── Performance optimization
├── hive-patterns (reference)
│ ├── Client-facing interaction patterns
│ ├── Feedback edges and review loops
│ ├── Judge patterns (implicit, SchemaJudge)
│ ├── Fan-out/fan-in parallel execution
│ └── Context management and anti-patterns
── testing-agent
├── Reads agent goal
├── Generates tests
├── Runs evaluation
└── Reports results
── hive-credentials (utility)
├── Detects missing credentials
├── Offers auth method choices (Aden OAuth, direct API key)
├── Stores securely in ~/.hive/credentials
└── Validates with health checks
├── hive-test (validation)
│ ├── Reads agent goal
│ ├── Generates tests
│ ├── Runs evaluation
│ └── Reports results
└── hive-debugger (troubleshooting)
├── Monitors runtime logs (L1/L2/L3)
├── Identifies retry loops, tool failures
├── Categorizes issues (10 categories)
└── Provides fix recommendations
```
## Troubleshooting
@@ -339,13 +388,13 @@ agent-workflow (meta-skill)
- Check node IDs match between nodes/__init__.py and agent.py
- Verify all edges reference valid node IDs
- Ensure entry_node exists in nodes list
- Run: `PYTHONPATH=core:exports python -m agent_name validate`
- Run: `PYTHONPATH=exports uv run python -m agent_name validate`
### "Agent has structure but won't run"
- Check for STATUS.md or IMPLEMENTATION_GUIDE.md in agent directory
- Implementation may be needed (Python functions or MCP tools)
- This is expected - building-agents-construction creates structure, not implementation
- This is expected - hive-create creates structure, not implementation
- See implementation guide for completion options
### "Tests are failing"
@@ -353,9 +402,16 @@ agent-workflow (meta-skill)
- Review test output for specific failures
- Check agent goal and success criteria
- Verify constraints are met
- Use `/testing-agent` to debug and iterate
- Use `/hive-test` to debug and iterate
- Fix agent code and re-run tests
### "Agent is failing at runtime"
- Use `/hive-debugger` to analyze runtime logs
- The debugger identifies retry loops, tool failures, and stalled execution
- Get actionable fix recommendations with code changes
- Monitor the agent in real-time during TUI sessions
### "Not sure which phase I'm in"
Run these checks:
@@ -365,7 +421,7 @@ Run these checks:
ls exports/my_agent/agent.py
# Check if it validates
PYTHONPATH=core:exports python -m my_agent validate
PYTHONPATH=exports uv run python -m my_agent validate
# Check if tests exist
ls exports/my_agent/tests/
@@ -414,10 +470,10 @@ You're done with the workflow when:
## Additional Resources
- **building-agents-core**: See `.claude/skills/building-agents-core/SKILL.md`
- **building-agents-construction**: See `.claude/skills/building-agents-construction/SKILL.md`
- **building-agents-patterns**: See `.claude/skills/building-agents-patterns/SKILL.md`
- **testing-agent**: See `.claude/skills/testing-agent/SKILL.md`
- **hive-concepts**: See `.claude/skills/hive-concepts/SKILL.md`
- **hive-create**: See `.claude/skills/hive-create/SKILL.md`
- **hive-patterns**: See `.claude/skills/hive-patterns/SKILL.md`
- **hive-test**: See `.claude/skills/hive-test/SKILL.md`
- **Agent framework docs**: See `core/README.md`
- **Example agents**: See `exports/` directory
@@ -425,36 +481,46 @@ You're done with the workflow when:
This workflow provides a proven path from concept to production-ready agent:
1. **Learn** with `/building-agents-core` → Understand fundamentals (optional)
2. **Build** with `/building-agents-construction` → Get validated structure
3. **Optimize** with `/building-agents-patterns` → Apply best practices (optional)
4. **Test** with `/testing-agent`Get verified functionality
1. **Learn** with `/hive-concepts` → Understand fundamentals (optional)
2. **Build** with `/hive-create` → Get validated structure
3. **Optimize** with `/hive-patterns` → Apply best practices (optional)
4. **Configure** with `/hive-credentials`Set up API keys (if needed)
5. **Test** with `/hive-test` → Get verified functionality
6. **Debug** with `/hive-debugger` → Fix runtime issues (if needed)
The workflow is **flexible** - skip phases as needed, iterate freely, and adapt to your specific requirements. The goal is **production-ready agents** built with **consistent, repeatable processes**.
## Skill Selection Guide
**Choose building-agents-core when:**
**Choose hive-concepts when:**
- First time building agents
- Need to understand architecture
- Need to understand event loop architecture
- Validating tool availability
- Learning about node types and edges
- Learning about node types, edges, and judges
**Choose building-agents-construction when:**
**Choose hive-create when:**
- Actually building an agent
- Have clear requirements
- Ready to write code
- Want step-by-step guidance
- Want to start from an existing template and customize it
**Choose building-agents-patterns when:**
**Choose hive-patterns when:**
- Agent structure complete
- Need advanced patterns
- Implementing pause/resume
- Optimizing performance
- Need client-facing nodes or feedback edges
- Implementing review loops or fan-out/fan-in
- Want judge patterns or context management
- Want best practices
**Choose testing-agent when:**
**Choose hive-test when:**
- Agent structure complete
- Ready to validate functionality
- Need comprehensive test coverage
- Debugging agent behavior
- Testing feedback loops, output keys, or fan-out
**Choose hive-debugger when:**
- Agent is failing or stuck at runtime
- Seeing retry loops or escalations
- Tool calls are failing
- Need to understand why a node isn't completing
- Want real-time monitoring of agent execution
@@ -1,6 +1,6 @@
# Example: File Monitor Agent
This example shows the complete agent-workflow in action for building a file monitoring agent.
This example shows the complete /hive workflow in action for building a file monitoring agent.
## Initial Request
@@ -12,7 +12,7 @@ User: "Build an agent that monitors ~/Downloads and copies new files to ~/Docume
### Step 1: Create Structure
Agent invokes `/building-agents` skill and:
Agent invokes `/hive-create` skill and:
1. Creates `exports/file_monitor_agent/` package
2. Writes skeleton files (__init__.py, __main__.py, agent.py, etc.)
@@ -75,10 +75,10 @@ initialize → list → identify → check
### Step 5: Finalize
```bash
$ PYTHONPATH=core:exports python -m file_monitor_agent validate
$ PYTHONPATH=exports uv run python -m file_monitor_agent validate
✓ Agent is valid
$ PYTHONPATH=core:exports python -m file_monitor_agent info
$ PYTHONPATH=exports uv run python -m file_monitor_agent info
Agent: File Monitor & Copy Agent
Nodes: 7
Edges: 8
@@ -107,7 +107,7 @@ exports/file_monitor_agent/
### Step 1: Analyze Agent
Agent invokes `/testing-agent` skill and:
Agent invokes `/hive-test` skill and:
1. Reads goal from `exports/file_monitor_agent/agent.py`
2. Identifies 4 success criteria to test
@@ -131,7 +131,7 @@ Tests approved incrementally by user.
### Step 3: Run Tests
```bash
$ PYTHONPATH=core:exports pytest exports/file_monitor_agent/tests/
$ PYTHONPATH=exports uv run pytest exports/file_monitor_agent/tests/
test_constraints.py::test_preserves_originals PASSED
test_constraints.py::test_handles_errors PASSED
@@ -162,7 +162,7 @@ test_edge_cases.py::test_large_files PASSED
./RUN_AGENT.sh
# Or manually
PYTHONPATH=core:exports:tools/src python -m file_monitor_agent run
PYTHONPATH=exports uv run python -m file_monitor_agent run
```
**Capabilities:**
-955
View File
@@ -1,955 +0,0 @@
---
name: testing-agent
description: Run goal-based evaluation tests for agents. Use when you need to verify an agent meets its goals, debug failing tests, or iterate on agent improvements based on test results.
---
# ⛔ MANDATORY: USE MCP TOOLS ONLY
**STOP. Read this before doing anything else.**
You MUST use MCP tools for ALL testing operations. Never write test files directly.
## Required MCP Workflow
1. `mcp__agent-builder__list_tests` - Check what tests exist
2. `mcp__agent-builder__generate_constraint_tests` or `mcp__agent-builder__generate_success_tests` - Generate tests
3. `mcp__agent-builder__get_pending_tests` - Review pending tests
4. `mcp__agent-builder__approve_tests` - Approve tests (this writes the files)
5. `mcp__agent-builder__run_tests` - Execute tests
6. `mcp__agent-builder__debug_test` - Debug failures
## ❌ WRONG - Never Do This
```python
# WRONG: Writing test file directly with Write tool
Write(file_path="exports/agent/tests/test_foo.py", content="def test_...")
```
```python
# WRONG: Running pytest directly via Bash
Bash(command="pytest exports/agent/tests/ -v")
```
```python
# WRONG: Creating test code manually
test_code = """
def test_something():
assert True
"""
```
## ✅ CORRECT - Always Do This
```python
# CORRECT: Generate tests via MCP tool
mcp__agent-builder__generate_constraint_tests(
goal_id="my-goal",
goal_json='{"id": "...", "constraints": [...]}',
agent_path="exports/my_agent"
)
# CORRECT: Approve tests via MCP tool (this writes files)
mcp__agent-builder__approve_tests(
goal_id="my-goal",
approvals='[{"test_id": "test-1", "action": "approve"}]'
)
# CORRECT: Run tests via MCP tool
mcp__agent-builder__run_tests(
goal_id="my-goal",
agent_path="exports/my_agent"
)
# CORRECT: Debug failures via MCP tool
mcp__agent-builder__debug_test(
goal_id="my-goal",
test_name="test_constraint_foo",
agent_path="exports/my_agent"
)
```
## Self-Check Before Every Action
Before you take any testing action, ask yourself:
- Am I about to write `def test_...`? → **STOP, use `generate_*_tests` instead**
- Am I about to use `Write` for a test file? → **STOP, use `approve_tests` instead**
- Am I about to run `pytest` via Bash? → **STOP, use `run_tests` instead**
---
# Testing Agents with MCP Tools
Run goal-based evaluation tests for agents built with the building-agents skill.
**Key Principle: Tests are generated via MCP tools and written as Python files**
- ✅ Generate tests: `generate_constraint_tests`, `generate_success_tests`
- ✅ Review and approve: `get_pending_tests`, `approve_tests` → writes to Python files
- ✅ Run tests: `run_tests` (runs pytest via subprocess)
- ✅ Debug failures: `debug_test` (re-runs single test with verbose output)
- ✅ List tests: `list_tests` (scans Python test files)
- ✅ Tests stored in `exports/{agent}/tests/test_*.py`
## Architecture: Python Test Files
```
exports/my_agent/
├── __init__.py
├── agent.py ← Agent to test
├── nodes/__init__.py
├── config.py
├── __main__.py
└── tests/ ← Test files written by MCP tools
├── conftest.py # Shared fixtures (auto-created)
├── test_constraints.py
├── test_success_criteria.py
└── test_edge_cases.py
```
**Tests import the agent directly:**
```python
import pytest
from exports.my_agent import default_agent
@pytest.mark.asyncio
async def test_happy_path(mock_mode):
result = await default_agent.run({"query": "test"}, mock_mode=mock_mode)
assert result.success
assert len(result.output) > 0
```
## Why MCP Tools Are Required
- Tests are generated with proper imports, fixtures, and API key enforcement
- Approval workflow ensures user review before file creation
- `run_tests` parses pytest output into structured results for iteration
- `debug_test` provides formatted output with actionable debugging info
- `conftest.py` is auto-created with proper fixtures
## Quick Start
1. **Check existing tests** - `list_tests(goal_id, agent_path)`
2. **Generate test files** - `generate_constraint_tests` or `generate_success_tests`
3. **User reviews and approves** - `get_pending_tests``approve_tests`
4. **Run tests** - `run_tests(goal_id, agent_path)`
5. **Debug failures** - `debug_test(goal_id, test_name, agent_path)`
6. **Iterate** - Repeat steps 4-5 until all pass
## ⚠️ API Key Requirement for Real Testing
**CRITICAL: Real LLM testing requires an API key.** Mock mode only validates structure and does NOT test actual agent behavior.
### Prerequisites
Before running agent tests, you MUST set your API key:
```bash
export ANTHROPIC_API_KEY="your-key-here"
```
**Why API keys are required:**
- Tests need to execute the agent's LLM nodes to validate behavior
- Mock mode bypasses LLM calls, providing no confidence in real-world performance
- Success criteria (personalization, reasoning quality, constraint adherence) can only be tested with real LLM calls
### Mock Mode Limitations
Mock mode (`--mock` flag or `mock_mode=True`) is **ONLY for structure validation**:
✓ Validates graph structure (nodes, edges, connections)
✓ Tests that code doesn't crash on execution
✗ Does NOT test LLM message generation
✗ Does NOT test reasoning or decision-making quality
✗ Does NOT test constraint validation (length limits, format rules)
✗ Does NOT test real API integrations or tool use
✗ Does NOT test personalization or content quality
**Bottom line:** If you're testing whether an agent achieves its goal, you MUST use a real API key.
### Enforcing API Key in Tests
When generating tests, **ALWAYS include API key checks**:
```python
import os
import pytest
from aden_tools.credentials import CredentialManager
# At the top of every test file
pytestmark = pytest.mark.skipif(
not CredentialManager().is_available("anthropic") and not os.environ.get("MOCK_MODE"),
reason="API key required for real testing. Set ANTHROPIC_API_KEY or use MOCK_MODE=1 for structure validation only."
)
@pytest.fixture(scope="session", autouse=True)
def check_api_key():
"""Ensure API key is set for real testing."""
creds = CredentialManager()
if not creds.is_available("anthropic"):
if os.environ.get("MOCK_MODE"):
print("\n⚠️ Running in MOCK MODE - structure validation only")
print(" This does NOT test LLM behavior or agent quality")
print(" Set ANTHROPIC_API_KEY for real testing\n")
else:
pytest.fail(
"\n❌ ANTHROPIC_API_KEY not set!\n\n"
"Real testing requires an API key. Choose one:\n"
"1. Set API key (RECOMMENDED):\n"
" export ANTHROPIC_API_KEY='your-key-here'\n"
"2. Run structure validation only:\n"
" MOCK_MODE=1 pytest exports/{agent}/tests/\n\n"
"Note: Mock mode does NOT validate agent behavior or quality."
)
```
### User Communication
When the user asks to test an agent, **ALWAYS check for the API key first**:
```python
from aden_tools.credentials import CredentialManager
# Before running any tests
creds = CredentialManager()
if not creds.is_available("anthropic"):
print("⚠️ No ANTHROPIC_API_KEY found!")
print()
print("Testing requires a real API key to validate agent behavior.")
print()
print("Options:")
print("1. Set your API key (RECOMMENDED):")
print(" export ANTHROPIC_API_KEY='your-key-here'")
print()
print("2. Run in mock mode (structure validation only):")
print(" MOCK_MODE=1 pytest exports/{agent}/tests/")
print()
print("Mock mode does NOT test:")
print(" - LLM message generation")
print(" - Reasoning or decision quality")
print(" - Constraint validation")
print(" - Real API integrations")
# Ask user what to do
AskUserQuestion(...)
```
## The Three-Stage Flow
```
┌─────────────────────────────────────────────────────────────────────────┐
│ GOAL STAGE │
│ (building-agents skill) │
│ │
│ 1. User defines goal with success_criteria and constraints │
│ 2. Goal written to agent.py immediately │
│ 3. Generate CONSTRAINT TESTS → Write to tests/ → USER APPROVAL │
│ Files created: exports/{agent}/tests/test_constraints.py │
└─────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────┐
│ AGENT STAGE │
│ (building-agents skill) │
│ │
│ Build nodes + edges, written immediately to files │
│ Constraint tests can run during development: │
│ run_tests(goal_id, agent_path, test_types='["constraint"]') │
└─────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────┐
│ EVAL STAGE (this skill) │
│ │
│ 1. Generate SUCCESS_CRITERIA TESTS → Write to tests/ → USER APPROVAL │
│ Files created: exports/{agent}/tests/test_success_criteria.py │
│ 2. Run all tests: run_tests(goal_id, agent_path) │
│ 3. On failure → debug_test(goal_id, test_name, agent_path) │
│ 4. Iterate: Edit agent code → Re-run run_tests (instant feedback) │
└─────────────────────────────────────────────────────────────────────────┘
```
## Step-by-Step: Testing an Agent
### Step 1: Check Existing Tests
**ALWAYS check first** before generating new tests:
```python
mcp__agent-builder__list_tests(
goal_id="your-goal-id",
agent_path="exports/your_agent"
)
```
This shows what test files already exist. If tests exist:
- Review the list to see what's covered
- Ask user if they want to add more or run existing tests
### Step 2: Generate Constraint Tests (Goal Stage)
After goal is defined, generate constraint tests using the MCP tool:
```python
# First, read the goal from agent.py to get the goal JSON
goal_code = Read(file_path="exports/your_agent/agent.py")
# Extract the goal definition and convert to JSON
# Generate constraint tests via MCP tool
mcp__agent-builder__generate_constraint_tests(
goal_id="your-goal-id",
goal_json='{"id": "goal-id", "name": "...", "constraints": [...]}',
agent_path="exports/your_agent"
)
```
**Response includes:**
- `generated_count`: Number of tests generated
- `tests`: List with id, test_name, description, confidence, test_code_preview
- `next_step`: "Call approve_tests to approve, modify, or reject each test"
- `output_file`: Where tests will be written when approved
**USER APPROVAL REQUIRED**: Review generated tests and approve:
```python
# Review pending tests
mcp__agent-builder__get_pending_tests(goal_id="your-goal-id")
# Approve tests (this writes them to files)
mcp__agent-builder__approve_tests(
goal_id="your-goal-id",
approvals='[{"test_id": "test-1", "action": "approve"}, {"test_id": "test-2", "action": "approve"}]'
)
```
**Approval actions:**
- `approve` - Accept test as-is, write to file
- `modify` - Accept with changes: `{"test_id": "...", "action": "modify", "modified_code": "..."}`
- `reject` - Reject with reason: `{"test_id": "...", "action": "reject", "reason": "..."}`
- `skip` - Skip for now
### Step 3: Generate Success Criteria Tests (Eval Stage)
After agent is fully built, generate success criteria tests:
```python
# Generate success criteria tests via MCP tool
mcp__agent-builder__generate_success_tests(
goal_id="your-goal-id",
goal_json='{"id": "goal-id", "name": "...", "success_criteria": [...]}',
node_names="analyze_request,search_web,format_results",
tool_names="web_search,web_scrape",
agent_path="exports/your_agent"
)
```
**USER APPROVAL REQUIRED**: Same approval flow as constraint tests:
```python
# Review and approve
mcp__agent-builder__get_pending_tests(goal_id="your-goal-id")
mcp__agent-builder__approve_tests(
goal_id="your-goal-id",
approvals='[{"test_id": "...", "action": "approve"}]'
)
```
### Step 4: Test Fixtures (conftest.py)
**conftest.py is auto-created** when you approve tests via `approve_tests`. It includes:
- API key enforcement fixtures
- `mock_mode` fixture
- `credentials` fixture
- `sample_inputs` fixture
You do NOT need to create conftest.py manually - the MCP tool handles this.
### Step 5: Run Tests
**Use the MCP tool to run tests** (not pytest directly):
```python
mcp__agent-builder__run_tests(
goal_id="your-goal-id",
agent_path="exports/your_agent"
)
**Response includes structured results:**
```json
{
"goal_id": "your-goal-id",
"overall_passed": false,
"summary": {
"total": 12,
"passed": 10,
"failed": 2,
"skipped": 0,
"errors": 0,
"pass_rate": "83.3%"
},
"test_results": [
{"file": "test_constraints.py", "test_name": "test_constraint_api_rate_limits", "status": "passed"},
{"file": "test_success_criteria.py", "test_name": "test_success_find_relevant_results", "status": "failed"}
],
"failures": [
{"test_name": "test_success_find_relevant_results", "details": "AssertionError: Expected 3-5 results..."}
]
}
```
**Options for `run_tests`:**
```python
# Run only constraint tests
mcp__agent-builder__run_tests(
goal_id="your-goal-id",
agent_path="exports/your_agent",
test_types='["constraint"]'
)
# Run with parallel workers
mcp__agent-builder__run_tests(
goal_id="your-goal-id",
agent_path="exports/your_agent",
parallel=4
)
# Stop on first failure
mcp__agent-builder__run_tests(
goal_id="your-goal-id",
agent_path="exports/your_agent",
fail_fast=True
)
```
### Step 6: Debug Failed Tests
**Use the MCP tool to debug** (not Bash/pytest directly):
```python
mcp__agent-builder__debug_test(
goal_id="your-goal-id",
test_name="test_success_find_relevant_results",
agent_path="exports/your_agent"
)
```
**Response includes:**
- Full verbose output from the test
- Stack trace with exact line numbers
- Captured logs and prints
- Suggestions for fixing the issue
### Step 7: Categorize Errors
When a test fails, categorize the error to guide iteration:
```python
def categorize_test_failure(test_output, agent_code):
"""Categorize test failure to guide iteration."""
# Read test output and agent code
failure_info = {
"test_name": "...",
"error_message": "...",
"stack_trace": "...",
}
# Pattern-based categorization
if any(pattern in failure_info["error_message"].lower() for pattern in [
"typeerror", "attributeerror", "keyerror", "valueerror",
"null", "none", "undefined", "tool call failed"
]):
category = "IMPLEMENTATION_ERROR"
guidance = {
"stage": "Agent",
"action": "Fix the bug in agent code",
"files_to_edit": ["agent.py", "nodes/__init__.py"],
"restart_required": False,
"description": "Code bug - fix and re-run tests"
}
elif any(pattern in failure_info["error_message"].lower() for pattern in [
"assertion", "expected", "got", "should be", "success criteria"
]):
category = "LOGIC_ERROR"
guidance = {
"stage": "Goal",
"action": "Update goal definition",
"files_to_edit": ["agent.py (goal section)"],
"restart_required": True,
"description": "Goal definition is wrong - update and rebuild"
}
elif any(pattern in failure_info["error_message"].lower() for pattern in [
"timeout", "rate limit", "empty", "boundary", "edge case"
]):
category = "EDGE_CASE"
guidance = {
"stage": "Eval",
"action": "Add edge case test and fix handling",
"files_to_edit": ["agent.py", "tests/test_edge_cases.py"],
"restart_required": False,
"description": "New scenario - add test and handle it"
}
else:
category = "UNKNOWN"
guidance = {
"stage": "Unknown",
"action": "Manual investigation required",
"restart_required": False
}
return {
"category": category,
"guidance": guidance,
"failure_info": failure_info
}
```
**Show categorization to user:**
```python
AskUserQuestion(
questions=[{
"question": f"Test failed with {category}. How would you like to proceed?",
"header": "Test Failure",
"options": [
{
"label": "Fix code directly (Recommended)" if category == "IMPLEMENTATION_ERROR" else "Update goal",
"description": guidance["description"]
},
{
"label": "Show detailed error info",
"description": "View full stack trace and logs"
},
{
"label": "Skip for now",
"description": "Continue with other tests"
}
],
"multiSelect": false
}]
)
```
### Step 8: Iterate Based on Error Category
#### IMPLEMENTATION_ERROR → Fix Agent Code
```python
# 1. Show user the exact file and line that failed
print(f"Error in: exports/{agent_name}/nodes/__init__.py:42")
print(f"Issue: 'NoneType' object has no attribute 'get'")
# 2. Read the problematic code
code = Read(file_path=f"exports/{agent_name}/nodes/__init__.py")
# 3. User can fix directly, or you suggest a fix:
Edit(
file_path=f"exports/{agent_name}/nodes/__init__.py",
old_string="if results.get('videos'):",
new_string="if results and results.get('videos'):"
)
# 4. Re-run tests immediately (instant feedback!)
mcp__agent-builder__run_tests(
goal_id="your-goal-id",
agent_path=f"exports/{agent_name}"
)
```
#### LOGIC_ERROR → Update Goal
```python
# 1. Show user the goal definition
goal_code = Read(file_path=f"exports/{agent_name}/agent.py")
# 2. Discuss what needs to change in success_criteria or constraints
# 3. Edit the goal
Edit(
file_path=f"exports/{agent_name}/agent.py",
old_string='target="3-5 videos"',
new_string='target="1-5 videos"' # More realistic
)
# 4. May need to regenerate agent nodes if goal changed significantly
# This requires going back to building-agents skill
```
#### EDGE_CASE → Add Test and Fix
```python
# 1. Create new edge case test with API key enforcement
edge_case_test = '''
@pytest.mark.asyncio
async def test_edge_case_empty_results(mock_mode):
"""Test: Agent handles no results gracefully"""
result = await default_agent.run({{"query": "xyzabc123nonsense"}}, mock_mode=mock_mode)
# Should succeed with empty results, not crash
assert result.success or result.error is not None
if result.success:
assert result.output.get("message") == "No results found"
'''
# 2. Add to test file
Edit(
file_path=f"exports/{agent_name}/tests/test_edge_cases.py",
old_string="# Add edge case tests here",
new_string=edge_case_test
)
# 3. Fix agent to handle edge case
# Edit agent code to handle empty results
# 4. Re-run tests
```
## Test File Templates (Reference Only)
**⚠️ Do NOT copy-paste these templates directly.** Use `generate_constraint_tests` and `generate_success_tests` MCP tools to create properly structured tests with correct imports and fixtures.
These templates show the structure of generated tests for reference only.
### Constraint Test Template
```python
"""Constraint tests for {agent_name}.
These tests validate that the agent respects its defined constraints.
Requires ANTHROPIC_API_KEY for real testing.
"""
import os
import pytest
from exports.{agent_name} import default_agent
from aden_tools.credentials import CredentialManager
# Enforce API key for real testing
pytestmark = pytest.mark.skipif(
not CredentialManager().is_available("anthropic") and not os.environ.get("MOCK_MODE"),
reason="API key required. Set ANTHROPIC_API_KEY or use MOCK_MODE=1."
)
@pytest.mark.asyncio
async def test_constraint_{constraint_id}():
"""Test: {constraint_description}"""
# Test implementation based on constraint type
mock_mode = bool(os.environ.get("MOCK_MODE"))
result = await default_agent.run({{"test": "input"}}, mock_mode=mock_mode)
# Assert constraint is respected
assert True # Replace with actual check
```
### Success Criteria Test Template
```python
"""Success criteria tests for {agent_name}.
These tests validate that the agent achieves its defined success criteria.
Requires ANTHROPIC_API_KEY for real testing - mock mode cannot validate success criteria.
"""
import os
import pytest
from exports.{agent_name} import default_agent
from aden_tools.credentials import CredentialManager
# Enforce API key for real testing
pytestmark = pytest.mark.skipif(
not CredentialManager().is_available("anthropic") and not os.environ.get("MOCK_MODE"),
reason="API key required. Set ANTHROPIC_API_KEY or use MOCK_MODE=1."
)
@pytest.mark.asyncio
async def test_success_{criteria_id}():
"""Test: {criteria_description}"""
mock_mode = bool(os.environ.get("MOCK_MODE"))
result = await default_agent.run({{"test": "input"}}, mock_mode=mock_mode)
assert result.success, f"Agent failed: {{result.error}}"
# Verify success criterion met
# e.g., assert metric meets target
assert True # Replace with actual check
```
### Edge Case Test Template
```python
"""Edge case tests for {agent_name}.
These tests validate agent behavior in unusual or boundary conditions.
Requires ANTHROPIC_API_KEY for real testing.
"""
import os
import pytest
from exports.{agent_name} import default_agent
from aden_tools.credentials import CredentialManager
# Enforce API key for real testing
pytestmark = pytest.mark.skipif(
not CredentialManager().is_available("anthropic") and not os.environ.get("MOCK_MODE"),
reason="API key required. Set ANTHROPIC_API_KEY or use MOCK_MODE=1."
)
@pytest.mark.asyncio
async def test_edge_case_{scenario_name}():
"""Test: Agent handles {scenario_description}"""
mock_mode = bool(os.environ.get("MOCK_MODE"))
result = await default_agent.run({{"edge": "case_input"}}, mock_mode=mock_mode)
# Verify graceful handling
assert result.success or result.error is not None
```
## Interactive Build + Test Loop
During agent construction (Agent stage), you can run constraint tests incrementally:
```python
# After adding first node
print("Added search_node. Running relevant constraint tests...")
mcp__agent-builder__run_tests(
goal_id="your-goal-id",
agent_path=f"exports/{agent_name}",
test_types='["constraint"]'
)
# After adding second node
print("Added filter_node. Running all constraint tests...")
mcp__agent-builder__run_tests(
goal_id="your-goal-id",
agent_path=f"exports/{agent_name}",
test_types='["constraint"]'
)
```
This provides **immediate feedback** during development, catching issues early.
## Common Test Patterns
**Note:** All test patterns should include API key enforcement via conftest.py.
### Happy Path Test
```python
@pytest.mark.asyncio
async def test_happy_path(mock_mode):
"""Test normal successful execution"""
result = await default_agent.run({{"query": "python tutorials"}}, mock_mode=mock_mode)
assert result.success
assert len(result.output) > 0
```
### Boundary Condition Test
```python
@pytest.mark.asyncio
async def test_boundary_minimum(mock_mode):
"""Test at minimum threshold"""
result = await default_agent.run({{"query": "very specific niche topic"}}, mock_mode=mock_mode)
assert result.success
assert len(result.output.get("results", [])) >= 1
```
### Error Handling Test
```python
@pytest.mark.asyncio
async def test_error_handling(mock_mode):
"""Test graceful error handling"""
result = await default_agent.run({{"query": ""}}, mock_mode=mock_mode) # Invalid input
assert not result.success or result.output.get("error") is not None
```
### Performance Test
```python
@pytest.mark.asyncio
async def test_performance_latency(mock_mode):
"""Test response time is acceptable"""
import time
start = time.time()
result = await default_agent.run({{"query": "test"}}, mock_mode=mock_mode)
duration = time.time() - start
assert duration < 5.0, f"Took {{duration}}s, expected <5s"
```
## Integration with building-agents
### Handoff Points
| Scenario | From | To | Action |
|----------|------|-----|--------|
| Agent built, ready to test | building-agents | testing-agent | Generate success tests |
| LOGIC_ERROR found | testing-agent | building-agents | Update goal, rebuild |
| IMPLEMENTATION_ERROR found | testing-agent | Direct fix | Edit agent files, re-run tests |
| EDGE_CASE found | testing-agent | testing-agent | Add edge case test |
| All tests pass | testing-agent | Done | Agent validated ✅ |
### Iteration Speed Comparison
| Scenario | Old Approach | New Approach |
|----------|--------------|--------------|
| **Bug Fix** | Rebuild via MCP tools (14 min) | Edit Python file, pytest (2 min) |
| **Add Test** | Generate via MCP, export (5 min) | Write test file directly (1 min) |
| **Debug** | Read subprocess logs | pdb, breakpoints, prints |
| **Inspect** | Limited visibility | Full Python introspection |
## Anti-Patterns
### MCP Tool Enforcement
| Don't | Do Instead |
|-------|------------|
| ❌ Write test files with Write tool | ✅ Use `generate_*_tests` + `approve_tests` |
| ❌ Run pytest via Bash | ✅ Use `run_tests` MCP tool |
| ❌ Debug tests with Bash pytest -vvs | ✅ Use `debug_test` MCP tool |
| ❌ Edit test files directly | ✅ Use `approve_tests` with `action: "modify"` |
| ❌ Check for tests with Glob | ✅ Use `list_tests` MCP tool |
### General Testing
| Don't | Do Instead |
|-------|------------|
| ❌ Auto-approve generated tests | ✅ Always require user approval via approve_tests |
| ❌ Treat all failures the same | ✅ Use debug_test to categorize and iterate appropriately |
| ❌ Rebuild entire agent for small bugs | ✅ Edit code directly, re-run tests |
| ❌ Run tests without API key | ✅ Always set ANTHROPIC_API_KEY first |
| ❌ Skip user review of generated tests | ✅ Show test code to user before approving |
## Workflow Summary
```
1. Check existing tests: list_tests(goal_id, agent_path)
→ Scans exports/{agent}/tests/test_*.py
2. Generate tests: generate_constraint_tests, generate_success_tests
→ Returns pending tests (stored in memory)
3. Review and approve: get_pending_tests → approve_tests → USER APPROVAL
→ Writes approved tests to exports/{agent}/tests/test_*.py
4. Run tests: run_tests(goal_id, agent_path)
→ Executes: pytest exports/{agent}/tests/ -v
5. Debug failures: debug_test(goal_id, test_name, agent_path)
→ Re-runs single test with verbose output
6. Fix based on category:
- IMPLEMENTATION_ERROR → Edit agent code directly
- ASSERTION_FAILURE → Fix agent logic or update test
- IMPORT_ERROR → Check package structure
- API_ERROR → Check API keys and connectivity
7. Re-run tests: run_tests(goal_id, agent_path)
8. Repeat until all pass ✅
```
## MCP Tools Reference
```python
# Check existing tests (scans Python test files)
mcp__agent-builder__list_tests(
goal_id="your-goal-id",
agent_path="exports/your_agent"
)
# Generate constraint tests (returns pending tests for approval)
mcp__agent-builder__generate_constraint_tests(
goal_id="your-goal-id",
goal_json='{"id": "...", "constraints": [...]}',
agent_path="exports/your_agent"
)
# Generate success criteria tests
mcp__agent-builder__generate_success_tests(
goal_id="your-goal-id",
goal_json='{"id": "...", "success_criteria": [...]}',
node_names="node1,node2",
tool_names="tool1,tool2",
agent_path="exports/your_agent"
)
# Review pending tests
mcp__agent-builder__get_pending_tests(goal_id="your-goal-id")
# Approve tests → writes to Python files at exports/{agent}/tests/
mcp__agent-builder__approve_tests(
goal_id="your-goal-id",
approvals='[{"test_id": "...", "action": "approve"}]'
)
# Run tests via pytest subprocess
mcp__agent-builder__run_tests(
goal_id="your-goal-id",
agent_path="exports/your_agent"
)
# Debug a failed test (re-runs with verbose output)
mcp__agent-builder__debug_test(
goal_id="your-goal-id",
test_name="test_constraint_foo",
agent_path="exports/your_agent"
)
```
## run_tests Options
```python
# Run only constraint tests
mcp__agent-builder__run_tests(
goal_id="your-goal-id",
agent_path="exports/your_agent",
test_types='["constraint"]'
)
# Run only success criteria tests
mcp__agent-builder__run_tests(
goal_id="your-goal-id",
agent_path="exports/your_agent",
test_types='["success"]'
)
# Run with pytest-xdist parallelism (requires pytest-xdist)
mcp__agent-builder__run_tests(
goal_id="your-goal-id",
agent_path="exports/your_agent",
parallel=4
)
# Stop on first failure
mcp__agent-builder__run_tests(
goal_id="your-goal-id",
agent_path="exports/your_agent",
fail_fast=True
)
```
## Direct pytest Commands
You can also run tests directly with pytest (the MCP tools use pytest internally):
```bash
# Run all tests
pytest exports/your_agent/tests/ -v
# Run specific test file
pytest exports/your_agent/tests/test_constraints.py -v
# Run specific test
pytest exports/your_agent/tests/test_constraints.py::test_constraint_foo -vvs
# Run in mock mode (structure validation only)
MOCK_MODE=1 pytest exports/your_agent/tests/ -v
```
---
**MCP tools generate tests, write them to Python files, and run them via pytest.**
@@ -1,348 +0,0 @@
# Example: Testing a YouTube Research Agent
This example walks through testing a YouTube research agent that finds relevant videos based on a topic.
## Prerequisites
- Agent built with building-agents skill at `exports/youtube-research/`
- Goal defined with success criteria and constraints
## Step 1: Load the Goal
First, load the goal that was defined during the Goal stage:
```json
{
"id": "youtube-research",
"name": "YouTube Research Agent",
"description": "Find relevant YouTube videos on a given topic",
"success_criteria": [
{
"id": "find_videos",
"description": "Find 3-5 relevant videos",
"metric": "video_count",
"target": "3-5",
"weight": 1.0
},
{
"id": "relevance",
"description": "Videos must be relevant to the topic",
"metric": "relevance_score",
"target": ">0.8",
"weight": 0.8
}
],
"constraints": [
{
"id": "api_limits",
"description": "Must not exceed YouTube API rate limits",
"constraint_type": "hard",
"category": "technical"
},
{
"id": "content_safety",
"description": "Must filter out inappropriate content",
"constraint_type": "hard",
"category": "safety"
}
]
}
```
## Step 2: Generate Constraint Tests
During the Goal stage (or early Eval), generate tests for constraints:
```python
result = generate_constraint_tests(
goal_id="youtube-research",
goal_json='<goal JSON above>'
)
```
**Generated tests (awaiting approval):**
```
┌─────────────────────────────────────────────────────────────────┐
│ Generated Constraint Tests (2 tests) │
├─────────────────────────────────────────────────────────────────┤
│ [1/2] test_constraint_api_limits_respected │
│ Constraint: api_limits │
│ Confidence: 88% │
│ │
│ def test_constraint_api_limits_respected(agent): │
│ """Verify API rate limits are not exceeded.""" │
│ import time │
│ for i in range(10): │
│ result = agent.run({"topic": f"test_{i}"}) │
│ time.sleep(0.1) │
│ # Should complete without rate limit errors │
│ assert "rate limit" not in str(result).lower() │
│ │
│ [a]pprove [r]eject [e]dit [s]kip │
├─────────────────────────────────────────────────────────────────┤
│ [2/2] test_constraint_content_safety_filter │
│ Constraint: content_safety │
│ Confidence: 91% │
│ │
│ def test_constraint_content_safety_filter(agent): │
│ """Verify inappropriate content is filtered.""" │
│ result = agent.run({"topic": "general topic"}) │
│ for video in result.videos: │
│ assert video.safe_for_work is True │
│ assert video.age_restricted is False │
│ │
│ [a]pprove [r]eject [e]dit [s]kip │
└─────────────────────────────────────────────────────────────────┘
```
## Step 3: Approve Constraint Tests
Review and approve each test:
```python
result = approve_tests(
goal_id="youtube-research",
approvals='[
{"test_id": "test_constraint_api_001", "action": "approve"},
{"test_id": "test_constraint_content_001", "action": "approve"}
]'
)
```
## Step 4: Generate Success Criteria Tests
After the agent is built, generate success criteria tests:
```python
result = generate_success_tests(
goal_id="youtube-research",
goal_json='<goal JSON>',
node_names="search_node,filter_node,rank_node,format_node",
tool_names="youtube_search,video_details,channel_info"
)
```
**Generated tests (awaiting approval):**
```
┌─────────────────────────────────────────────────────────────────┐
│ Generated Success Criteria Tests (4 tests) │
├─────────────────────────────────────────────────────────────────┤
│ [1/4] test_find_videos_happy_path │
│ Criteria: find_videos │
│ Confidence: 95% │
│ │
│ def test_find_videos_happy_path(agent): │
│ """Test finding videos for a common topic.""" │
│ result = agent.run({"topic": "machine learning"}) │
│ assert result.success │
│ assert 3 <= len(result.videos) <= 5 │
│ assert all(v.title for v in result.videos) │
│ assert all(v.video_id for v in result.videos) │
│ │
│ [a]pprove [r]eject [e]dit [s]kip │
├─────────────────────────────────────────────────────────────────┤
│ [2/4] test_find_videos_minimum_boundary │
│ Criteria: find_videos │
│ Confidence: 87% │
│ │
│ def test_find_videos_minimum_boundary(agent): │
│ """Test at minimum threshold (3 videos).""" │
│ result = agent.run({"topic": "niche topic xyz"}) │
│ assert len(result.videos) >= 3 │
│ │
│ [a]pprove [r]eject [e]dit [s]kip │
├─────────────────────────────────────────────────────────────────┤
│ [3/4] test_relevance_score_threshold │
│ Criteria: relevance │
│ Confidence: 92% │
│ │
│ def test_relevance_score_threshold(agent): │
│ """Test relevance scoring meets threshold.""" │
│ result = agent.run({"topic": "python programming"}) │
│ for video in result.videos: │
│ assert video.relevance_score > 0.8 │
│ │
│ [a]pprove [r]eject [e]dit [s]kip │
├─────────────────────────────────────────────────────────────────┤
│ [4/4] test_find_videos_no_results_graceful │
│ Criteria: find_videos │
│ Confidence: 84% │
│ │
│ def test_find_videos_no_results_graceful(agent): │
│ """Test graceful handling of no results.""" │
│ result = agent.run({"topic": "xyznonexistent123"}) │
│ # Should not crash, return empty or message │
│ assert result.videos == [] or result.message │
│ │
│ [a]pprove [r]eject [e]dit [s]kip │
└─────────────────────────────────────────────────────────────────┘
```
## Step 5: Approve Success Criteria Tests
```python
result = approve_tests(
goal_id="youtube-research",
approvals='[
{"test_id": "test_success_001", "action": "approve"},
{"test_id": "test_success_002", "action": "approve"},
{"test_id": "test_success_003", "action": "approve"},
{"test_id": "test_success_004", "action": "approve"}
]'
)
```
## Step 6: Run All Tests
Execute all approved tests:
```python
result = run_tests(
goal_id="youtube-research",
agent_path="exports/youtube-research",
test_types='["all"]',
parallel=4
)
```
**Results:**
```json
{
"goal_id": "youtube-research",
"overall_passed": false,
"summary": {
"total": 6,
"passed": 5,
"failed": 1,
"pass_rate": "83.3%"
},
"duration_ms": 4521,
"results": [
{"test_id": "test_constraint_api_001", "passed": true, "duration_ms": 1234},
{"test_id": "test_constraint_content_001", "passed": true, "duration_ms": 456},
{"test_id": "test_success_001", "passed": true, "duration_ms": 789},
{"test_id": "test_success_002", "passed": true, "duration_ms": 654},
{"test_id": "test_success_003", "passed": true, "duration_ms": 543},
{"test_id": "test_success_004", "passed": false, "duration_ms": 845,
"error_category": "IMPLEMENTATION_ERROR",
"error_message": "TypeError: 'NoneType' object has no attribute 'videos'"}
]
}
```
## Step 7: Debug the Failed Test
```python
result = debug_test(
goal_id="youtube-research",
test_id="test_success_004"
)
```
**Debug Output:**
```json
{
"test_id": "test_success_004",
"test_name": "test_find_videos_no_results_graceful",
"input": {"topic": "xyznonexistent123"},
"expected": "Empty list or message",
"actual": {"error": "TypeError: 'NoneType' object has no attribute 'videos'"},
"passed": false,
"error_message": "TypeError: 'NoneType' object has no attribute 'videos'",
"error_category": "IMPLEMENTATION_ERROR",
"stack_trace": "Traceback (most recent call last):\n File \"filter_node.py\", line 42\n for video in result.videos:\nTypeError: 'NoneType' object has no attribute 'videos'",
"logs": [
{"timestamp": "2026-01-20T10:00:01", "node": "search_node", "level": "INFO", "msg": "Searching for: xyznonexistent123"},
{"timestamp": "2026-01-20T10:00:02", "node": "search_node", "level": "WARNING", "msg": "No results found"},
{"timestamp": "2026-01-20T10:00:02", "node": "filter_node", "level": "ERROR", "msg": "NoneType error"}
],
"runtime_data": {
"execution_path": ["start", "search_node", "filter_node"],
"node_outputs": {
"search_node": null
}
},
"suggested_fix": "Add null check in filter_node before accessing .videos attribute",
"iteration_guidance": {
"stage": "Agent",
"action": "Fix the code in nodes/edges",
"restart_required": false,
"description": "The goal is correct, but filter_node doesn't handle null results from search_node."
}
}
```
## Step 8: Iterate Based on Category
Since this is an **IMPLEMENTATION_ERROR**, we:
1. **Don't restart** the Goal → Agent → Eval flow
2. **Fix the agent** using building-agents skill:
- Modify `filter_node` to handle null results
3. **Re-run Eval** (tests only)
### Fix in building-agents:
```python
# Update the filter_node to handle null
add_node(
node_id="filter_node",
name="Filter Node",
description="Filter and rank videos",
node_type="function",
input_keys=["search_results"],
output_keys=["filtered_videos"],
system_prompt="""
Filter videos by relevance.
IMPORTANT: Handle case where search_results is None or empty.
Return empty list if no results.
"""
)
```
### Re-export and re-test:
```python
# Re-export the fixed agent
export_graph(path="exports/youtube-research")
# Re-run tests
result = run_tests(
goal_id="youtube-research",
agent_path="exports/youtube-research",
test_types='["all"]'
)
```
**Updated Results:**
```json
{
"goal_id": "youtube-research",
"overall_passed": true,
"summary": {
"total": 6,
"passed": 6,
"failed": 0,
"pass_rate": "100.0%"
}
}
```
## Summary
1. **Generated** constraint tests during Goal stage
2. **Generated** success criteria tests during Eval stage
3. **Approved** all tests with user review
4. **Ran** tests in parallel
5. **Debugged** the one failure
6. **Categorized** as IMPLEMENTATION_ERROR
7. **Fixed** the agent (not the goal)
8. **Re-ran** Eval only (didn't restart full flow)
9. **Passed** all tests
The agent is now validated and ready for production use.
+145
View File
@@ -0,0 +1,145 @@
# Triage Issue Skill
Analyze a GitHub issue, verify claims against the codebase, and close invalid issues with a technical response.
## Trigger
User provides a GitHub issue URL or number, e.g.:
- `/triage-issue 1970`
- `/triage-issue https://github.com/adenhq/hive/issues/1970`
## Workflow
### Step 1: Fetch Issue Details
```bash
gh issue view <number> --repo adenhq/hive --json title,body,state,labels,author
```
Extract:
- Title
- Body (the claim/bug report)
- Current state
- Labels
- Author
If issue is already closed, inform user and stop.
### Step 2: Analyze the Claim
Read the issue body and identify:
1. **The core claim** - What is the user asserting?
2. **Technical specifics** - File paths, function names, code snippets mentioned
3. **Expected behavior** - What do they think should happen?
4. **Severity claimed** - Security issue? Bug? Feature request?
### Step 3: Investigate the Codebase
For each technical claim:
1. Find the referenced code using Grep/Glob/Read
2. Understand the actual implementation
3. Check if the claim accurately describes the behavior
4. Look for related tests, documentation, or design decisions
### Step 4: Evaluate Validity
Categorize the issue as one of:
| Category | Action |
|----------|--------|
| **Valid Bug** | Do NOT close. Inform user this is a real issue. |
| **Valid Feature Request** | Do NOT close. Suggest labeling appropriately. |
| **Misunderstanding** | Prepare technical explanation for why behavior is correct. |
| **Fundamentally Flawed** | Prepare critique explaining the technical impossibility or design rationale. |
| **Duplicate** | Find the original issue and prepare duplicate notice. |
| **Incomplete** | Prepare request for more information. |
### Step 5: Draft Response
For issues to be closed, draft a response that:
1. **Acknowledges the concern** - Don't be dismissive
2. **Explains the actual behavior** - With code references
3. **Provides technical rationale** - Why it works this way
4. **References industry standards** - If applicable
5. **Offers alternatives** - If there's a better approach for the user
Use this template:
```markdown
## Analysis
[Brief summary of what was investigated]
## Technical Details
[Explanation with code references]
## Why This Is Working As Designed
[Rationale]
## Recommendation
[What the user should do instead, if applicable]
---
*This issue was reviewed and closed by the maintainers.*
```
### Step 6: User Review
Present the draft to the user with:
```
## Issue #<number>: <title>
**Claim:** <summary of claim>
**Finding:** <valid/invalid/misunderstanding/etc>
**Draft Response:**
<the markdown response>
---
Do you want me to post this comment and close the issue?
```
Use AskUserQuestion with options:
- "Post and close" - Post comment, close issue
- "Edit response" - Let user modify the response
- "Skip" - Don't take action
### Step 7: Execute Action
If user approves:
```bash
# Post comment
gh issue comment <number> --repo adenhq/hive --body "<response>"
# Close issue
gh issue close <number> --repo adenhq/hive --reason "not planned"
```
Report success with link to the issue.
## Important Guidelines
1. **Never close valid issues** - If there's any merit to the claim, don't close it
2. **Be respectful** - The reporter took time to file the issue
3. **Be technical** - Provide code references and evidence
4. **Be educational** - Help them understand, don't just dismiss
5. **Check twice** - Make sure you understand the code before declaring something invalid
6. **Consider edge cases** - Maybe their environment reveals a real issue
## Example Critiques
### Security Misunderstanding
> "The claim that secrets are exposed in plaintext misunderstands the encryption architecture. While `SecretStr` is used for logging protection, actual encryption is provided by Fernet (AES-128-CBC) at the storage layer. The code path is: serialize → encrypt → write. Only encrypted bytes touch disk."
### Impossible Request
> "The requested feature would require [X] which violates [fundamental constraint]. This is not a limitation of our implementation but a fundamental property of [technology/protocol]."
### Already Handled
> "This scenario is already handled by [code reference]. The reporter may be using an older version or misconfigured environment."
+7
View File
@@ -0,0 +1,7 @@
# Project-level Codex config for Hive.
# Keep this file minimal: MCP connectivity + skill discovery.
[mcp_servers.agent-builder]
command = "uv"
args = ["run", "--directory", "core", "-m", "framework.mcp.agent_builder_server"]
cwd = "."
+20
View File
@@ -0,0 +1,20 @@
{
"mcpServers": {
"agent-builder": {
"command": "python",
"args": ["-m", "framework.mcp.agent_builder_server"],
"cwd": "core",
"env": {
"PYTHONPATH": "../tools/src"
}
},
"tools": {
"command": "python",
"args": ["mcp_server.py", "--stdio"],
"cwd": "tools",
"env": {
"PYTHONPATH": "src"
}
}
}
}
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-concepts
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-create
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-credentials
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-patterns
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-test
+18
View File
@@ -0,0 +1,18 @@
This project uses ruff for Python linting and formatting.
Rules:
- Line length: 100 characters
- Python target: 3.11+
- Use double quotes for strings
- Sort imports with isort (ruff I rules): stdlib, third-party, first-party (framework), local
- Combine as-imports
- Use type hints on all function signatures
- Use `from __future__ import annotations` for modern type syntax
- Raise exceptions with `from` in except blocks (B904)
- No unused imports (F401), no unused variables (F841)
- Prefer list/dict/set comprehensions over map/filter (C4)
Run `make lint` to auto-fix, `make check` to verify without modifying files.
Run `make format` to apply ruff formatting.
The ruff config lives in core/pyproject.toml under [tool.ruff].
+3
View File
@@ -11,6 +11,9 @@ indent_size = 2
insert_final_newline = true
trim_trailing_whitespace = true
[*.py]
indent_size = 4
[*.md]
trim_trailing_whitespace = false
+124
View File
@@ -0,0 +1,124 @@
# Normalize line endings for all text files
* text=auto
# Source code
*.py text diff=python
*.js text
*.ts text
*.jsx text
*.tsx text
*.json text
*.yaml text
*.yml text
*.toml text
*.ini text
*.cfg text
# Shell scripts (must use LF)
*.sh text eol=lf
quickstart.sh text eol=lf
# PowerShell scripts (Windows-friendly)
*.ps1 text eol=lf
*.psm1 text eol=lf
# Windows batch files (must use CRLF)
*.bat text eol=crlf
*.cmd text eol=crlf
# Documentation
*.md text
*.txt text
*.rst text
*.tex text
# Configuration files
.gitignore text
.gitattributes text
.editorconfig text
Dockerfile text
docker-compose.yml text
requirements*.txt text
pyproject.toml text
setup.py text
setup.cfg text
MANIFEST.in text
LICENSE text
README* text
CHANGELOG* text
CONTRIBUTING* text
CODE_OF_CONDUCT* text
# Web files
*.html text
*.css text
*.scss text
*.sass text
# Data files
*.xml text
*.csv text
*.sql text
# Graphics (binary)
*.png binary
*.jpg binary
*.jpeg binary
*.gif binary
*.ico binary
*.svg binary
*.eps binary
*.bmp binary
*.tif binary
*.tiff binary
# Archives (binary)
*.zip binary
*.tar binary
*.gz binary
*.bz2 binary
*.7z binary
*.rar binary
# Python compiled (binary)
*.pyc binary
*.pyo binary
*.pyd binary
*.whl binary
*.egg binary
# System libraries (binary)
*.so binary
*.dll binary
*.dylib binary
*.lib binary
*.a binary
# Documents (binary)
*.pdf binary
*.doc binary
*.docx binary
*.ppt binary
*.pptx binary
*.xls binary
*.xlsx binary
# Fonts (binary)
*.ttf binary
*.otf binary
*.woff binary
*.woff2 binary
*.eot binary
# Audio/Video (binary)
*.mp3 binary
*.mp4 binary
*.wav binary
*.avi binary
*.mov binary
*.flv binary
# Database files (binary)
*.db binary
*.sqlite binary
*.sqlite3 binary
-1
View File
@@ -8,7 +8,6 @@
/hive/ @adenhq/maintainers
# Infrastructure
/docker-compose*.yml @adenhq/maintainers
/.github/ @adenhq/maintainers
# Documentation
+3 -2
View File
@@ -1,9 +1,10 @@
---
name: Bug Report
about: Report a bug to help us improve
title: '[Bug]: '
labels: bug
title: "[Bug]: "
labels: bug, enhancement
assignees: ''
---
## Describe the Bug
+2 -1
View File
@@ -1,9 +1,10 @@
---
name: Feature Request
about: Suggest a new feature or enhancement
title: '[Feature]: '
title: "[Feature]: "
labels: enhancement
assignees: ''
---
## Problem Statement
@@ -0,0 +1,71 @@
---
name: Integration Request
about: Suggest a new integration
title: "[Integration]:"
labels: ''
assignees: ''
---
## Service
Name and brief description of the service and what it enables agents to do.
**Description:** [e.g., "API key for Slack Bot" — short one-liner for the credential spec]
## Credential Identity
- **credential_id:** [e.g., `slack`]
- **env_var:** [e.g., `SLACK_BOT_TOKEN`]
- **credential_key:** [e.g., `access_token`, `api_key`, `bot_token`]
## Tools
Tool function names that require this credential:
- [e.g., `slack_send_message`]
- [e.g., `slack_list_channels`]
## Auth Methods
- **Direct API key supported:** Yes / No
- **Aden OAuth supported:** Yes / No
If Aden OAuth is supported, describe the OAuth scopes/permissions required.
## How to Get the Credential
Link where users obtain the key/token:
[e.g., https://api.slack.com/apps]
Step-by-step instructions:
1. Go to ...
2. Create a ...
3. Select scopes/permissions: ...
4. Copy the key/token
## Health Check
A lightweight API call to validate the credential (no writes, no charges).
- **Endpoint:** [e.g., `https://slack.com/api/auth.test`]
- **Method:** [e.g., `GET` or `POST`]
- **Auth header:** [e.g., `Authorization: Bearer {token}` or `X-Api-Key: {key}`]
- **Parameters (if any):** [e.g., `?limit=1`]
- **200 means:** [e.g., key is valid]
- **401 means:** [e.g., invalid or expired]
- **429 means:** [e.g., rate limited but key is valid]
## Credential Group
Does this require multiple credentials configured together? (e.g., Google Custom Search needs
both an API key and a CSE ID)
- [ ] No, single credential
- [ ] Yes — list the other credential IDs in the group:
## Additional Context
Links to API docs, rate limits, free tier availability, or anything else relevant.
@@ -0,0 +1,34 @@
name: Auto-close duplicate issues
description: Auto-closes issues that are duplicates of existing issues
on:
schedule:
- cron: "0 */6 * * *"
workflow_dispatch:
jobs:
auto-close-duplicates:
runs-on: ubuntu-latest
timeout-minutes: 10
permissions:
contents: read
issues: write
steps:
- name: Checkout repository
uses: actions/checkout@v4
- name: Setup Bun
uses: oven-sh/setup-bun@v2
with:
bun-version: latest
- name: Run auto-close-duplicates tests
run: bun test scripts/auto-close-duplicates
- name: Auto-close duplicate issues
run: bun run scripts/auto-close-duplicates.ts
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
GITHUB_REPOSITORY_OWNER: ${{ github.repository_owner }}
GITHUB_REPOSITORY_NAME: ${{ github.event.repository.name }}
STATSIG_API_KEY: ${{ secrets.STATSIG_API_KEY }}
+70 -23
View File
@@ -21,21 +21,48 @@ jobs:
uses: actions/setup-python@v5
with:
python-version: '3.11'
cache: 'pip'
- name: Install uv
uses: astral-sh/setup-uv@v4
- name: Install dependencies
run: |
cd core
pip install -e .
pip install -r requirements-dev.txt
run: uv sync --project core --group dev
- name: Run ruff
- name: Ruff lint
run: |
cd core
ruff check .
uv run --project core ruff check core/
uv run --project core ruff check tools/
- name: Ruff format
run: |
uv run --project core ruff format --check core/
uv run --project core ruff format --check tools/
test:
name: Test Python Framework
runs-on: ${{ matrix.os }}
strategy:
matrix:
os: [ubuntu-latest, windows-latest]
steps:
- uses: actions/checkout@v4
- name: Setup Python
uses: actions/setup-python@v5
with:
python-version: '3.11'
- name: Install uv
uses: astral-sh/setup-uv@v4
- name: Install dependencies and run tests
run: |
cd core
uv sync
uv run pytest tests/ -v
test-tools:
name: Test Tools
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
@@ -44,23 +71,20 @@ jobs:
uses: actions/setup-python@v5
with:
python-version: '3.11'
cache: 'pip'
- name: Install dependencies
run: |
cd core
pip install -e .
pip install -r requirements-dev.txt
- name: Install uv
uses: astral-sh/setup-uv@v4
- name: Run tests
- name: Install dependencies and run tests
run: |
cd core
pytest tests/ -v
cd tools
uv sync --extra dev
uv run pytest tests/ -v
validate:
name: Validate Agent Exports
runs-on: ubuntu-latest
needs: [lint, test]
needs: [lint, test, test-tools]
steps:
- uses: actions/checkout@v4
@@ -68,20 +92,43 @@ jobs:
uses: actions/setup-python@v5
with:
python-version: '3.11'
cache: 'pip'
- name: Install uv
uses: astral-sh/setup-uv@v4
- name: Install dependencies
run: |
cd core
pip install -e .
pip install -r requirements-dev.txt
uv sync
- name: Validate exported agents
run: |
# Check that agent exports have valid structure
for agent_dir in exports/*/; do
if [ ! -d "exports" ]; then
echo "No exports/ directory found, skipping validation"
exit 0
fi
shopt -s nullglob
agent_dirs=(exports/*/)
shopt -u nullglob
if [ ${#agent_dirs[@]} -eq 0 ]; then
echo "No agent directories in exports/, skipping validation"
exit 0
fi
validated=0
for agent_dir in "${agent_dirs[@]}"; do
if [ -f "$agent_dir/agent.json" ]; then
echo "Validating $agent_dir"
python -c "import json; json.load(open('$agent_dir/agent.json'))"
uv run python -c "import json; json.load(open('$agent_dir/agent.json'))"
validated=$((validated + 1))
fi
done
if [ "$validated" -eq 0 ]; then
echo "No agent.json files found in exports/, skipping validation"
else
echo "Validated $validated agent(s)"
fi
+103
View File
@@ -0,0 +1,103 @@
name: Issue Triage
on:
issues:
types: [opened]
jobs:
triage:
runs-on: ubuntu-latest
timeout-minutes: 10
permissions:
contents: read
issues: write
id-token: write
steps:
- name: Checkout repository
uses: actions/checkout@v4
with:
fetch-depth: 1
- name: Triage and check for duplicates
uses: anthropics/claude-code-action@v1
with:
anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
github_token: ${{ secrets.GITHUB_TOKEN }}
allowed_non_write_users: "*"
prompt: |
Analyze this new issue and perform triage tasks.
Issue: #${{ github.event.issue.number }}
Repository: ${{ github.repository }}
## Your Tasks:
### 1. Get issue details
Use mcp__github__get_issue to get the full details of issue #${{ github.event.issue.number }}
### 2. Check for duplicates
Search for similar existing issues using mcp__github__search_issues with relevant keywords from the issue title and body.
Criteria for duplicates:
- Same bug or error being reported
- Same feature request (even if worded differently)
- Same question being asked
- Issues describing the same root problem
If you find a duplicate:
- Add a comment using EXACTLY this format (required for auto-close to work):
"Found a possible duplicate of #<issue_number>: <brief explanation of why it's a duplicate>"
- Do NOT apply the "duplicate" label yet (the auto-close script will add it after 12 hours if no objections)
- Suggest the user react with a thumbs-down if they disagree
### 3. Check for Low-Quality / AI Spam
Analyze the issue quality. We are receiving many low-effort, AI-generated spam issues.
Flag the issue as INVALID if it matches these criteria:
- **Vague/Generic**: Title is "Fix bug" or "Error" without specific context.
- **Hallucinated**: Refers to files or features that do not exist in this repo.
- **Template Filler**: Body contains "Insert description here" or unrelated gibberish.
- **Low Effort**: No reproduction steps, no logs, only 1-2 sentences.
If identified as spam/low-quality:
- Add the "invalid" label.
- Add a comment:
"This issue has been automatically flagged as low-quality or potentially AI-generated spam. It lacks specific details (logs, reproduction steps, file references) required for us to help. Please open a new issue following the template exactly if this is a legitimate request."
- Do NOT proceed to other steps.
### 4. Check for invalid issues (General)
If the issue is not spam but still lacks information:
- Add the "invalid" label
- Comment asking for clarification
### 5. Categorize with labels (if NOT a duplicate or spam)
Apply appropriate labels based on the issue content. Use ONLY these labels:
- bug: Something isn't working
- enhancement: New feature or request
- question: Further information is requested
- documentation: Improvements or additions to documentation
- good first issue: Good for newcomers (if issue is well-defined and small scope)
- help wanted: Extra attention is needed (if issue needs community input)
- backlog: Tracked for the future, but not currently planned or prioritized
### 6. Estimate size (if NOT a duplicate, spam, or invalid)
Apply exactly ONE size label to help contributors match their capacity to the task:
- "size: small": Docs, typos, single-file fixes, config changes
- "size: medium": Bug fixes with tests, adding a single tool, changes within one package
- "size: large": Cross-package changes (core + tools), new modules, complex logic, architectural refactors
You may apply multiple labels if appropriate (e.g., "bug", "size: small", and "good first issue").
## Tools Available:
- mcp__github__get_issue: Get issue details
- mcp__github__search_issues: Search for similar issues
- mcp__github__list_issues: List recent issues if needed
- mcp__github__add_issue_comment: Add a comment
- mcp__github__update_issue: Add labels
- mcp__github__get_issue_comments: Get existing comments
Be thorough but efficient. Focus on accurate categorization and finding true duplicates.
claude_args: |
--model claude-haiku-4-5-20251001
--allowedTools "mcp__github__get_issue,mcp__github__search_issues,mcp__github__list_issues,mcp__github__add_issue_comment,mcp__github__update_issue,mcp__github__get_issue_comments"
+204
View File
@@ -0,0 +1,204 @@
name: PR Check Command
on:
issue_comment:
types: [created]
jobs:
check-pr:
# Only run on PR comments that start with /check
if: github.event.issue.pull_request && startsWith(github.event.comment.body, '/check')
runs-on: ubuntu-latest
permissions:
pull-requests: write
issues: write
checks: write
statuses: write
steps:
- name: Check PR requirements
uses: actions/github-script@v7
with:
script: |
const prNumber = context.payload.issue.number;
console.log(`Triggered by /check comment on PR #${prNumber}`);
// Fetch PR data
const { data: pr } = await github.rest.pulls.get({
owner: context.repo.owner,
repo: context.repo.repo,
pull_number: prNumber,
});
const prBody = pr.body || '';
const prTitle = pr.title || '';
const prAuthor = pr.user.login;
const headSha = pr.head.sha;
// Create a check run in progress
const { data: checkRun } = await github.rest.checks.create({
owner: context.repo.owner,
repo: context.repo.repo,
name: 'check-requirements',
head_sha: headSha,
status: 'in_progress',
started_at: new Date().toISOString(),
});
// Extract issue numbers
const issuePattern = /(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)?\s*#(\d+)/gi;
const allText = `${prTitle} ${prBody}`;
const matches = [...allText.matchAll(issuePattern)];
const issueNumbers = [...new Set(matches.map(m => parseInt(m[1], 10)))];
console.log(`PR #${prNumber}:`);
console.log(` Author: ${prAuthor}`);
console.log(` Found issue references: ${issueNumbers.length > 0 ? issueNumbers.join(', ') : 'none'}`);
if (issueNumbers.length === 0) {
const message = `## PR Closed - Requirements Not Met
This PR has been automatically closed because it doesn't meet the requirements.
**Missing:** No linked issue found.
**To fix:**
1. Create or find an existing issue for this work
2. Assign yourself to the issue
3. Re-open this PR and add \`Fixes #123\` in the description
**Why is this required?** See #472 for details.`;
await github.rest.issues.createComment({
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: prNumber,
body: message,
});
await github.rest.pulls.update({
owner: context.repo.owner,
repo: context.repo.repo,
pull_number: prNumber,
state: 'closed',
});
// Update check run to failure
await github.rest.checks.update({
owner: context.repo.owner,
repo: context.repo.repo,
check_run_id: checkRun.id,
status: 'completed',
conclusion: 'failure',
completed_at: new Date().toISOString(),
output: {
title: 'Missing linked issue',
summary: 'PR must reference an issue (e.g., `Fixes #123`)',
},
});
core.setFailed('PR must reference an issue');
return;
}
// Check if PR author is assigned to any linked issue
let issueWithAuthorAssigned = null;
let issuesWithoutAuthor = [];
for (const issueNum of issueNumbers) {
try {
const { data: issue } = await github.rest.issues.get({
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: issueNum,
});
const assigneeLogins = (issue.assignees || []).map(a => a.login);
if (assigneeLogins.includes(prAuthor)) {
issueWithAuthorAssigned = issueNum;
console.log(` Issue #${issueNum} has PR author ${prAuthor} as assignee`);
break;
} else {
issuesWithoutAuthor.push({
number: issueNum,
assignees: assigneeLogins
});
console.log(` Issue #${issueNum} assignees: ${assigneeLogins.length > 0 ? assigneeLogins.join(', ') : 'none'}`);
}
} catch (error) {
console.log(` Issue #${issueNum} not found`);
}
}
if (!issueWithAuthorAssigned) {
const issueList = issuesWithoutAuthor.map(i =>
`#${i.number} (assignees: ${i.assignees.length > 0 ? i.assignees.join(', ') : 'none'})`
).join(', ');
const message = `## PR Closed - Requirements Not Met
This PR has been automatically closed because it doesn't meet the requirements.
**PR Author:** @${prAuthor}
**Found issues:** ${issueList}
**Problem:** The PR author must be assigned to the linked issue.
**To fix:**
1. Assign yourself (@${prAuthor}) to one of the linked issues
2. Re-open this PR
**Why is this required?** See #472 for details.`;
await github.rest.issues.createComment({
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: prNumber,
body: message,
});
await github.rest.pulls.update({
owner: context.repo.owner,
repo: context.repo.repo,
pull_number: prNumber,
state: 'closed',
});
// Update check run to failure
await github.rest.checks.update({
owner: context.repo.owner,
repo: context.repo.repo,
check_run_id: checkRun.id,
status: 'completed',
conclusion: 'failure',
completed_at: new Date().toISOString(),
output: {
title: 'PR author not assigned to issue',
summary: `PR author @${prAuthor} must be assigned to one of the linked issues: ${issueList}`,
},
});
core.setFailed('PR author must be assigned to the linked issue');
} else {
await github.rest.issues.createComment({
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: prNumber,
body: `✅ PR requirements met! Issue #${issueWithAuthorAssigned} has @${prAuthor} as assignee.`,
});
// Update check run to success
await github.rest.checks.update({
owner: context.repo.owner,
repo: context.repo.repo,
check_run_id: checkRun.id,
status: 'completed',
conclusion: 'success',
completed_at: new Date().toISOString(),
output: {
title: 'Requirements met',
summary: `Issue #${issueWithAuthorAssigned} has @${prAuthor} as assignee.`,
},
});
console.log(`PR requirements met!`);
}
@@ -0,0 +1,138 @@
name: PR Requirements Backfill
on:
workflow_dispatch:
jobs:
check-all-open-prs:
runs-on: ubuntu-latest
permissions:
pull-requests: write
issues: write
steps:
- name: Check all open PRs
uses: actions/github-script@v7
with:
script: |
const { data: pullRequests } = await github.rest.pulls.list({
owner: context.repo.owner,
repo: context.repo.repo,
state: 'open',
per_page: 100,
});
console.log(`Found ${pullRequests.length} open PRs`);
for (const pr of pullRequests) {
const prNumber = pr.number;
const prBody = pr.body || '';
const prTitle = pr.title || '';
const prAuthor = pr.user.login;
console.log(`\nChecking PR #${prNumber}: ${prTitle}`);
// Extract issue numbers from body and title
const issuePattern = /(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)?\s*#(\d+)/gi;
const allText = `${prTitle} ${prBody}`;
const matches = [...allText.matchAll(issuePattern)];
const issueNumbers = [...new Set(matches.map(m => parseInt(m[1], 10)))];
console.log(` Found issue references: ${issueNumbers.length > 0 ? issueNumbers.join(', ') : 'none'}`);
if (issueNumbers.length === 0) {
console.log(` ❌ No linked issue - closing PR`);
const message = `## PR Closed - Requirements Not Met
This PR has been automatically closed because it doesn't meet the requirements.
**Missing:** No linked issue found.
**To fix:**
1. Create or find an existing issue for this work
2. Assign yourself to the issue
3. Re-open this PR and add \`Fixes #123\` in the description`;
await github.rest.issues.createComment({
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: prNumber,
body: message,
});
await github.rest.pulls.update({
owner: context.repo.owner,
repo: context.repo.repo,
pull_number: prNumber,
state: 'closed',
});
continue;
}
// Check if any linked issue has the PR author as assignee
let issueWithAuthorAssigned = null;
let issuesWithoutAuthor = [];
for (const issueNum of issueNumbers) {
try {
const { data: issue } = await github.rest.issues.get({
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: issueNum,
});
const assigneeLogins = (issue.assignees || []).map(a => a.login);
if (assigneeLogins.includes(prAuthor)) {
issueWithAuthorAssigned = issueNum;
break;
} else {
issuesWithoutAuthor.push({
number: issueNum,
assignees: assigneeLogins
});
}
} catch (error) {
console.log(` Issue #${issueNum} not found or inaccessible`);
}
}
if (!issueWithAuthorAssigned) {
const issueList = issuesWithoutAuthor.map(i =>
`#${i.number} (assignees: ${i.assignees.length > 0 ? i.assignees.join(', ') : 'none'})`
).join(', ');
console.log(` ❌ PR author not assigned to any linked issue - closing PR`);
const message = `## PR Closed - Requirements Not Met
This PR has been automatically closed because it doesn't meet the requirements.
**PR Author:** @${prAuthor}
**Found issues:** ${issueList}
**Problem:** The PR author must be assigned to the linked issue.
**To fix:**
1. Assign yourself (@${prAuthor}) to one of the linked issues
2. Re-open this PR`;
await github.rest.issues.createComment({
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: prNumber,
body: message,
});
await github.rest.pulls.update({
owner: context.repo.owner,
repo: context.repo.repo,
pull_number: prNumber,
state: 'closed',
});
} else {
console.log(` ✅ PR requirements met! Issue #${issueWithAuthorAssigned} has ${prAuthor} as assignee.`);
}
}
console.log('\nBackfill complete!');
+189
View File
@@ -0,0 +1,189 @@
name: PR Requirements Check
on:
pull_request_target:
types: [opened, reopened, edited, synchronize]
jobs:
check-requirements:
runs-on: ubuntu-latest
permissions:
pull-requests: write
issues: write
steps:
- name: Check PR has linked issue with assignee
uses: actions/github-script@v7
with:
script: |
const pr = context.payload.pull_request;
const prNumber = pr.number;
const prBody = pr.body || '';
const prTitle = pr.title || '';
const prLabels = (pr.labels || []).map(l => l.name);
// Allow micro-fix and documentation PRs without a linked issue
const isMicroFix = prLabels.includes('micro-fix') || /micro-fix/i.test(prTitle);
const isDocumentation = prLabels.includes('documentation') || /\bdocs?\b/i.test(prTitle);
if (isMicroFix || isDocumentation) {
const reason = isMicroFix ? 'micro-fix' : 'documentation';
console.log(`PR #${prNumber} is a ${reason}, skipping issue requirement.`);
return;
}
// Extract issue numbers from body and title
// Matches: fixes #123, closes #123, resolves #123, or plain #123
const issuePattern = /(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)?\s*#(\d+)/gi;
const allText = `${prTitle} ${prBody}`;
const matches = [...allText.matchAll(issuePattern)];
const issueNumbers = [...new Set(matches.map(m => parseInt(m[1], 10)))];
console.log(`PR #${prNumber}:`);
console.log(` Found issue references: ${issueNumbers.length > 0 ? issueNumbers.join(', ') : 'none'}`);
if (issueNumbers.length === 0) {
const message = `## PR Closed - Requirements Not Met
This PR has been automatically closed because it doesn't meet the requirements.
**Missing:** No linked issue found.
**To fix:**
1. Create or find an existing issue for this work
2. Assign yourself to the issue
3. Re-open this PR and add \`Fixes #123\` in the description
**Exception:** To bypass this requirement, you can:
- Add the \`micro-fix\` label or include \`micro-fix\` in your PR title for trivial fixes
- Add the \`documentation\` label or include \`doc\`/\`docs\` in your PR title for documentation changes
**Micro-fix requirements** (must meet ALL):
| Qualifies | Disqualifies |
|-----------|--------------|
| < 20 lines changed | Any functional bug fix |
| Typos & Documentation & Linting | Refactoring for "clean code" |
| No logic/API/DB changes | New features (even tiny ones) |
**Why is this required?** See #472 for details.`;
const comments = await github.rest.issues.listComments({
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: prNumber,
});
const botComment = comments.data.find(
(c) => c.user.type === 'Bot' && c.body.includes('PR Closed - Requirements Not Met')
);
if (!botComment) {
await github.rest.issues.createComment({
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: prNumber,
body: message,
});
}
await github.rest.pulls.update({
owner: context.repo.owner,
repo: context.repo.repo,
pull_number: prNumber,
state: 'closed',
});
core.setFailed('PR must reference an issue');
return;
}
// Check if any linked issue has the PR author as assignee
const prAuthor = pr.user.login;
let issueWithAuthorAssigned = null;
let issuesWithoutAuthor = [];
for (const issueNum of issueNumbers) {
try {
const { data: issue } = await github.rest.issues.get({
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: issueNum,
});
const assigneeLogins = (issue.assignees || []).map(a => a.login);
if (assigneeLogins.includes(prAuthor)) {
issueWithAuthorAssigned = issueNum;
console.log(` Issue #${issueNum} has PR author ${prAuthor} as assignee`);
break;
} else {
issuesWithoutAuthor.push({
number: issueNum,
assignees: assigneeLogins
});
console.log(` Issue #${issueNum} assignees: ${assigneeLogins.length > 0 ? assigneeLogins.join(', ') : 'none'} (PR author: ${prAuthor})`);
}
} catch (error) {
console.log(` Issue #${issueNum} not found or inaccessible`);
}
}
if (!issueWithAuthorAssigned) {
const issueList = issuesWithoutAuthor.map(i =>
`#${i.number} (assignees: ${i.assignees.length > 0 ? i.assignees.join(', ') : 'none'})`
).join(', ');
const message = `## PR Closed - Requirements Not Met
This PR has been automatically closed because it doesn't meet the requirements.
**PR Author:** @${prAuthor}
**Found issues:** ${issueList}
**Problem:** The PR author must be assigned to the linked issue.
**To fix:**
1. Assign yourself (@${prAuthor}) to one of the linked issues
2. Re-open this PR
**Exception:** To bypass this requirement, you can:
- Add the \`micro-fix\` label or include \`micro-fix\` in your PR title for trivial fixes
- Add the \`documentation\` label or include \`doc\`/\`docs\` in your PR title for documentation changes
**Micro-fix requirements** (must meet ALL):
| Qualifies | Disqualifies |
|-----------|--------------|
| < 20 lines changed | Any functional bug fix |
| Typos & Documentation & Linting | Refactoring for "clean code" |
| No logic/API/DB changes | New features (even tiny ones) |
**Why is this required?** See #472 for details.`;
const comments = await github.rest.issues.listComments({
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: prNumber,
});
const botComment = comments.data.find(
(c) => c.user.type === 'Bot' && c.body.includes('PR Closed - Requirements Not Met')
);
if (!botComment) {
await github.rest.issues.createComment({
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: prNumber,
body: message,
});
}
await github.rest.pulls.update({
owner: context.repo.owner,
repo: context.repo.repo,
pull_number: prNumber,
state: 'closed',
});
core.setFailed('PR author must be assigned to the linked issue');
} else {
console.log(`PR requirements met! Issue #${issueWithAuthorAssigned} has ${prAuthor} as assignee.`);
}
+5 -4
View File
@@ -21,18 +21,19 @@ jobs:
uses: actions/setup-python@v5
with:
python-version: '3.11'
cache: 'pip'
- name: Install uv
uses: astral-sh/setup-uv@v4
- name: Install dependencies
run: |
cd core
pip install -e .
pip install -r requirements-dev.txt
uv sync
- name: Run tests
run: |
cd core
pytest tests/ -v
uv run pytest tests/ -v
- name: Generate changelog
id: changelog
+9 -1
View File
@@ -46,6 +46,7 @@ coverage/
# TypeScript
*.tsbuildinfo
vite.config.d.ts
# Python
__pycache__/
@@ -54,7 +55,6 @@ __pycache__/
*.egg-info/
.eggs/
*.egg
uv.lock
# Generated runtime data
core/data/
@@ -69,4 +69,12 @@ exports/*
.agent-builder-sessions/*
.claude/settings.local.json
.venv
docs/github-issues/*
core/tests/*dumps/*
screenshots/*
+3 -14
View File
@@ -1,20 +1,9 @@
{
"mcpServers": {
"agent-builder": {
"command": "python",
"args": ["-m", "framework.mcp.agent_builder_server"],
"cwd": "core",
"env": {
"PYTHONPATH": "../tools/src"
}
},
"tools": {
"command": "python",
"args": ["mcp_server.py", "--stdio"],
"cwd": "tools",
"env": {
"PYTHONPATH": "src"
}
"command": "uv",
"args": ["run", "-m", "framework.mcp.agent_builder_server"],
"cwd": "core"
}
}
}
+30
View File
@@ -0,0 +1,30 @@
{
"mcpServers": {
"agent-builder": {
"command": "uv",
"args": [
"run",
"python",
"-m",
"framework.mcp.agent_builder_server"
],
"cwd": "core",
"env": {
"PYTHONPATH": "../tools/src"
}
},
"tools": {
"command": "uv",
"args": [
"run",
"python",
"mcp_server.py",
"--stdio"
],
"cwd": "tools",
"env": {
"PYTHONPATH": "src"
}
}
}
}
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-concepts
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-create
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-credentials
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-debugger
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-patterns
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-test
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/triage-issue
+18
View File
@@ -0,0 +1,18 @@
repos:
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.15.0
hooks:
- id: ruff
name: ruff lint (core)
args: [--fix]
files: ^core/
- id: ruff
name: ruff lint (tools)
args: [--fix]
files: ^tools/
- id: ruff-format
name: ruff format (core)
files: ^core/
- id: ruff-format
name: ruff format (tools)
files: ^tools/
+1
View File
@@ -0,0 +1 @@
3.11
+7
View File
@@ -0,0 +1,7 @@
{
"recommendations": [
"charliermarsh.ruff",
"editorconfig.editorconfig",
"ms-python.python"
]
}
+195 -28
View File
@@ -1,40 +1,207 @@
# Changelog
# Release Notes
All notable changes to this project will be documented in this file.
**Release Date:** February 18, 2026
**Tag:** v0.5.1
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## The Hive Gets a Brain
## [Unreleased]
v0.5.1 is our most ambitious release yet. Hive agents can now **build other agents** -- the new Hive Coder meta-agent writes, tests, and fixes agent packages from natural language. The runtime grows multi-graph support so one session can orchestrate multiple agents simultaneously. The TUI gets a complete overhaul with an in-app agent picker, live streaming, and seamless escalation to the Coder. And we're now provider-agnostic: Claude Code subscriptions, OpenAI-compatible endpoints, and any LiteLLM-supported model work out of the box.
### Added
- Initial project structure
- React frontend (honeycomb) with Vite and TypeScript
- Node.js backend (hive) with Express and TypeScript
- Docker Compose configuration for local development
- Configuration system via `config.yaml`
- GitHub Actions CI/CD workflows
- Comprehensive documentation
---
### Changed
- N/A
## Highlights
### Deprecated
- N/A
### Hive Coder -- The Agent That Builds Agents
### Removed
- N/A
A native meta-agent that lives inside the framework at `core/framework/agents/hive_coder/`. Give it a natural-language specification and it produces a complete agent package -- goal definition, node prompts, edge routing, MCP tool wiring, tests, and all boilerplate files.
### Fixed
- N/A
```bash
# Launch the Coder directly
hive code
### Security
- N/A
# Or escalate from any running agent (TUI)
Ctrl+E # or /coder in chat
```
## [0.1.0] - 2025-01-13
The Coder ships with:
### Added
- Initial release
- **Reference documentation** -- anti-patterns, construction guide, and design patterns baked into its system prompt
- **Guardian watchdog** -- an event-driven monitor that catches agent failures and triggers automatic remediation
- **Coder Tools MCP server** -- file I/O, fuzzy-match editing, git snapshots, and sandboxed shell execution (`tools/coder_tools_server.py`)
- **Test generation** -- structural tests for forever-alive agents that don't hang on `runner.run()`
[Unreleased]: https://github.com/adenhq/hive/compare/v0.1.0...HEAD
[0.1.0]: https://github.com/adenhq/hive/releases/tag/v0.1.0
### Multi-Graph Agent Runtime
`AgentRuntime` now supports loading, managing, and switching between multiple agent graphs within a single session. Six new lifecycle tools give agents (and the TUI) full control:
```python
# Load a second agent into the runtime
await runtime.add_graph("exports/deep_research_agent")
# Tools available to agents:
# load_agent, unload_agent, start_agent, restart_agent, list_agents, get_user_presence
```
The Hive Coder uses multi-graph internally -- when you escalate from a worker agent, the Coder loads as a separate graph while the worker stays alive in the background.
### TUI Revamp
The Terminal UI gets a ground-up rebuild with five major additions:
- **Agent Picker** (Ctrl+A) -- tabbed modal screen for browsing Your Agents, Framework agents, and Examples with metadata badges (node count, tool count, session count, tags)
- **Runtime-optional startup** -- TUI launches without a pre-loaded agent, showing the picker on first open
- **Live streaming pane** -- dedicated RichLog widget shows LLM tokens as they arrive, replacing the old one-token-per-line display
- **PDF attachments** -- `/attach` and `/detach` commands with native OS file dialog (macOS, Linux, Windows)
- **Multi-graph commands** -- `/graphs`, `/graph <id>`, `/load <path>`, `/unload <id>` for managing agent graphs in-session
### Provider-Agnostic LLM Support
Hive is no longer Anthropic-only. v0.5.1 adds first-class support for:
- **Claude Code subscriptions** -- `use_claude_code_subscription: true` in `~/.hive/configuration.json` reads OAuth tokens from `~/.claude/.credentials.json` with automatic refresh
- **OpenAI-compatible endpoints** -- `api_base` config routes traffic through any compatible API (Azure OpenAI, vLLM, Ollama, etc.)
- **Any LiteLLM model** -- `RuntimeConfig` now passes `api_key`, `api_base`, and `extra_kwargs` through to LiteLLM
The quickstart script auto-detects Claude Code subscriptions and ZAI Code installations.
---
## What's New
### Architecture & Runtime
- **Hive Coder meta-agent** -- Natural-language agent builder with reference docs, guardian watchdog, and `hive code` CLI command. (@TimothyZhang7)
- **Multi-graph agent sessions** -- `add_graph`/`remove_graph` on AgentRuntime with 6 lifecycle tools (`load_agent`, `unload_agent`, `start_agent`, `restart_agent`, `list_agents`, `get_user_presence`). (@TimothyZhang7)
- **Claude Code subscription support** -- OAuth token refresh via `use_claude_code_subscription` config, auto-detection in quickstart, LiteLLM header patching. (@TimothyZhang7)
- **OpenAI-compatible endpoint support** -- `api_base` and `extra_kwargs` in `RuntimeConfig` for any OpenAI-compatible API. (@TimothyZhang7)
- **Remove deprecated node types** -- Delete `FlexibleGraphExecutor`, `WorkerNode`, `HybridJudge`, `CodeSandbox`, `Plan`, `FunctionNode`, `LLMNode`, `RouterNode`. Deprecated types (`llm_tool_use`, `llm_generate`, `function`, `router`, `human_input`) now raise `RuntimeError` with migration guidance. (@TimothyZhang7)
- **Interactive credential setup** -- Guided `CredentialSetupSession` with health checks and encrypted storage, accessible via `hive setup-credentials` or automatic prompting on credential errors. (@RichardTang-Aden)
- **Pre-start confirmation prompt** -- Interactive prompt before agent execution allowing credential updates or abort. (@RichardTang-Aden)
- **Event bus multi-graph support** -- `graph_id` on events, `filter_graph` on subscriptions, `ESCALATION_REQUESTED` event type, `exclude_own_graph` filter. (@TimothyZhang7)
### TUI Improvements
- **In-app agent picker** (Ctrl+A) -- Tabbed modal for browsing agents with metadata badges (nodes, tools, sessions, tags). (@TimothyZhang7)
- **Runtime-optional TUI startup** -- Launches without a pre-loaded agent, shows agent picker on startup. (@TimothyZhang7)
- **Hive Coder escalation** (Ctrl+E) -- Escalate to Hive Coder and return; also available via `/coder` and `/back` chat commands. (@TimothyZhang7)
- **PDF attachment support** -- `/attach` and `/detach` commands with native OS file dialog. (@TimothyZhang7)
- **Streaming output pane** -- Dedicated RichLog widget for live LLM token streaming. (@TimothyZhang7)
- **Multi-graph TUI commands** -- `/graphs`, `/graph <id>`, `/load <path>`, `/unload <id>`. (@TimothyZhang7)
- **Agent Guardian watchdog** -- Event-driven monitor that catches secondary agent failures and triggers automatic remediation, with `--no-guardian` CLI flag. (@TimothyZhang7)
### New Tool Integrations
| Tool | Description | Contributor |
| ---------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------ |
| **Discord** | 4 MCP tools (`discord_list_guilds`, `discord_list_channels`, `discord_send_message`, `discord_get_messages`) with rate-limit retry and channel filtering | @mishrapravin114 |
| **Exa Search API** | 4 AI-powered search tools (`exa_search`, `exa_find_similar`, `exa_get_contents`, `exa_answer`) with neural/keyword search, domain filters, and citation-backed answers | @JeetKaria06 |
| **Razorpay** | 6 payment processing tools for payments, invoices, payment links, and refunds with HTTP Basic Auth | @shivamshahi07 |
| **Google Docs** | Document creation, reading, and editing with OAuth credential support | @haliaeetusvocifer |
| **Gmail enhancements** | Expanded mail operations for inbox management | @bryanadenhq |
### Infrastructure
- **Default node type → `event_loop`** -- `NodeSpec.node_type` defaults to `"event_loop"` instead of `"llm_tool_use"`. (@TimothyZhang7)
- **Default `max_node_visits` → 0 (unlimited)** -- Nodes default to unlimited visits, reducing friction for feedback loops and forever-alive agents. (@TimothyZhang7)
- **Remove `function` field from NodeSpec** -- Follows deprecation of `FunctionNode`. (@TimothyZhang7)
- **LiteLLM OAuth patch** -- Correct header construction for OAuth tokens (remove `x-api-key` when Bearer token is present). (@TimothyZhang7)
- **Orchestrator config centralization** -- Reads `api_key`, `api_base`, `extra_kwargs` from centralized `~/.hive/configuration.json`. (@TimothyZhang7)
- **System prompt datetime injection** -- All system prompts now include current date/time for time-aware agent behavior. (@TimothyZhang7)
- **Utils module exports** -- Proper `__init__.py` exports for the utils module. (@Siddharth2624)
- **Increased default max_tokens** -- Opus 4.6 defaults to 32768, Sonnet 4.5 to 16384 (up from 8192). (@TimothyZhang7)
---
## Bug Fixes
- Flush WIP accumulator outputs on cancel/failure so edge conditions see correct values on resume
- Stall detection state preserved across resume (no more resets on checkpoint restore)
- Skip client-facing blocking for event-triggered executions (timer/webhook)
- Executor retry override scoped to actual EventLoopNode instances only
- Add `_awaiting_input` flag to EventLoopNode to prevent input injection race conditions
- Fix TUI streaming display (tokens no longer appear one-per-line)
- Fix `_return_from_escalation` crash when ChatRepl widgets not yet mounted
- Fix tools registration problems for Google Docs credentials (@RichardTang-Aden)
- Fix email agent version conflicts (@RichardTang-Aden)
- Fix coder tool timeouts (120s for tests, 300s cap for commands)
## Documentation
- Clarify installation and prevent root pip install misuse (@paarths-collab)
---
## Agent Updates
- **Email Inbox Management** -- Consolidate `gmail_inbox_guardian` and `inbox_management` into a single unified agent with updated prompts and config. (@RichardTang-Aden, @bryanadenhq)
- **Job Hunter** -- Updated node prompts, config, and agent metadata; added PDF resume selection. (@bryanadenhq)
- **Deep Research Agent** -- Revised node implementations with updated prompts and output handling.
- **Tech News Reporter** -- Revised node prompts for improved output quality.
- **Vulnerability Assessment** -- Expanded prompts with more detailed assessment instructions. (@bryanadenhq)
---
## Breaking Changes
- **Deprecated node types raise `RuntimeError`** -- `llm_tool_use`, `llm_generate`, `function`, `router`, `human_input` now fail instead of warning. Migrate to `event_loop`.
- **`NodeSpec.node_type` defaults to `"event_loop"`** (was `"llm_tool_use"`)
- **`NodeSpec.max_node_visits` defaults to `0` / unlimited** (was `1`)
- **`NodeSpec.function` field removed** -- `FunctionNode` is deleted; use event_loop nodes with tools instead.
---
## Community Contributors
A huge thank you to everyone who contributed to this release:
- **Richard Tang** (@RichardTang-Aden) -- Interactive credential setup, pre-start confirmation, email agent consolidation, tool registration fixes, lint and formatting
- **Pravin Mishra** (@mishrapravin114) -- Discord integration with 4 MCP tools
- **Jeet Karia** (@JeetKaria06) -- Exa Search API integration with 4 AI-powered search tools
- **Shivam Shahi** (@shivamshahi07) -- Razorpay payment processing integration
- **Siddharth Varshney** (@Siddharth2624) -- Utils module exports
- **@haliaeetusvocifer** -- Google Docs integration with OAuth support
- **Bryan** (@bryanadenhq) -- PDF selection, inbox agent fixes, Job Hunter and Vulnerability Assessment updates
- **@paarths-collab** -- Documentation improvements
---
## Upgrading
```bash
git pull origin main
uv sync
```
### Migration Guide
If your agents use deprecated node types, update them:
```python
# Before (v0.5.0) -- these now raise RuntimeError
NodeSpec(node_type="llm_tool_use", ...)
NodeSpec(node_type="function", function=my_func, ...)
# After (v0.5.1) -- use event_loop for everything
NodeSpec(node_type="event_loop", ...) # or just omit node_type (it's the default now)
```
If your agents set `max_node_visits=1` explicitly, they'll still work. The only change is the _default_ -- new agents without an explicit value now get unlimited visits.
To try the new Hive Coder:
```bash
# Launch Coder directly
hive code
# Or from TUI -- press Ctrl+E to escalate
hive tui
```
---
## What's Next
- **Agent-to-agent communication** -- one agent's output triggers another agent's entry point
- **Cost visibility** -- detailed runtime log of LLM costs per node and per session
- **Persistent webhook subscriptions** -- survive agent restarts without re-registering
- **Remote agent deployment** -- run agents as long-lived services with HTTP APIs
+81 -25
View File
@@ -1,34 +1,70 @@
# Contributing to Aden Agent Framework
Thank you for your interest in contributing to the Aden Agent Framework! This document provides guidelines and information for contributors.
Thank you for your interest in contributing to the Aden Agent Framework! This document provides guidelines and information for contributors. Were especially looking for help building tools, integrations ([check #2805](https://github.com/adenhq/hive/issues/2805)), and example agents for the framework. If youre interested in extending its functionality, this is the perfect place to start.
## Code of Conduct
By participating in this project, you agree to abide by our [Code of Conduct](CODE_OF_CONDUCT.md).
By participating in this project, you agree to abide by our [Code of Conduct](docs/CODE_OF_CONDUCT.md).
## Issue Assignment Policy
To prevent duplicate work and respect contributors' time, we require issue assignment before submitting PRs.
### How to Claim an Issue
1. **Find an Issue:** Browse existing issues or create a new one
2. **Claim It:** Leave a comment (e.g., *"I'd like to work on this!"*)
3. **Wait for Assignment:** A maintainer will assign you within 24 hours. Issues with reproducible steps or proposals are prioritized.
4. **Submit Your PR:** Once assigned, you're ready to contribute
> **Note:** PRs for unassigned issues may be delayed or closed if someone else was already assigned.
### Exceptions (No Assignment Needed)
You may submit PRs without prior assignment for:
- **Documentation:** Fixing typos or clarifying instructions — add the `documentation` label or include `doc`/`docs` in your PR title to bypass the linked issue requirement
- **Micro-fixes:** Add the `micro-fix` label or include `micro-fix` in your PR title to bypass the linked issue requirement. Micro-fixes must meet **all** qualification criteria:
| Qualifies | Disqualifies |
|-----------|--------------|
| < 20 lines changed | Any functional bug fix |
| Typos & Documentation & Linting | Refactoring for "clean code" |
| No logic/API/DB changes | New features (even tiny ones) |
## Getting Started
1. Fork the repository
2. Clone your fork: `git clone https://github.com/YOUR_USERNAME/hive.git`
3. Create a feature branch: `git checkout -b feature/your-feature-name`
4. Make your changes
5. Run tests: `PYTHONPATH=core:exports python -m pytest`
6. Commit your changes following our commit conventions
7. Push to your fork and submit a Pull Request
3. Add the upstream repository: `git remote add upstream https://github.com/adenhq/hive.git`
4. Sync with upstream to ensure you're starting from the latest code:
```bash
git fetch upstream
git checkout main
git merge upstream/main
```
5. Create a feature branch: `git checkout -b feature/your-feature-name`
6. Make your changes
7. Run checks and tests:
```bash
make check # Lint and format checks (ruff check + ruff format --check on core/ and tools/)
make test # Core tests (cd core && pytest tests/ -v)
```
8. Commit your changes following our commit conventions
9. Push to your fork and submit a Pull Request
## Development Setup
```bash
# Install Python packages
./scripts/setup-python.sh
# Verify installation
python -c "import framework; import aden_tools; print('✓ Setup complete')"
# Install Claude Code skills (optional)
# Install Python packages and verify setup
./quickstart.sh
```
> **Windows Users:**
> If you are on native Windows, it is recommended to use **WSL (Windows Subsystem for Linux)**.
> Alternatively, make sure to run PowerShell or Git Bash with Python 3.11+ installed, and disable "App Execution Aliases" in Windows settings.
> **Tip:** Installing Claude Code skills is optional for running existing agents, but required if you plan to **build new agents**.
## Commit Convention
We follow [Conventional Commits](https://www.conventionalcommits.org/):
@@ -59,10 +95,10 @@ docs(readme): update installation instructions
## Pull Request Process
1. Update documentation if needed
2. Add tests for new functionality
3. Ensure all tests pass
4. Update the CHANGELOG.md if applicable
1. **Get assigned to the issue first** (see [Issue Assignment Policy](#issue-assignment-policy))
2. Update documentation if needed
3. Add tests for new functionality
4. Ensure `make check` and `make test` pass
5. Request review from maintainers
### PR Title Format
@@ -75,7 +111,7 @@ feat(component): add new feature description
## Project Structure
- `core/` - Core framework (agent runtime, graph executor, protocols)
- `tools/` - MCP Tools Package (19 tools for agent capabilities)
- `tools/` - MCP Tools Package (tools for agent capabilities)
- `exports/` - Agent packages and examples
- `docs/` - Documentation
- `scripts/` - Build and utility scripts
@@ -90,19 +126,39 @@ feat(component): add new feature description
- Use meaningful variable and function names
- Keep functions focused and small
For linting and formatting (Ruff, pre-commit hooks), see [Linting & Formatting Setup](docs/contributing-lint-setup.md).
## Testing
```bash
# Run all tests for the framework
cd core && python -m pytest
> **Note:** When testing agents in `exports/`, always set PYTHONPATH:
>
> ```bash
> PYTHONPATH=exports uv run python -m agent_name test
> ```
# Run all tests for tools
cd tools && python -m pytest
```bash
# Run lint and format checks (mirrors CI lint job)
make check
# Run core framework tests (mirrors CI test job)
make test
# Or run tests directly
cd core && pytest tests/ -v
# Run tools package tests (when contributing to tools/)
cd tools && uv run pytest tests/ -v
# Run tests for a specific agent
PYTHONPATH=core:exports python -m agent_name test
PYTHONPATH=exports uv run python -m agent_name test
```
> **CI also validates** that all exported agent JSON files (`exports/*/agent.json`) are well-formed JSON. Ensure your agent exports are valid before submitting.
## Contributor License Agreement
By submitting a Pull Request, you agree that your contributions will be licensed under the Aden Agent Framework license.
## Questions?
Feel free to open an issue for questions or join our [Discord community](https://discord.com/invite/MXE49hrKDk).
-347
View File
@@ -1,347 +0,0 @@
# Agent Development Environment Setup
Complete setup guide for building and running goal-driven agents with the Aden Agent Framework.
## Quick Setup
```bash
# Run the automated setup script
./scripts/setup-python.sh
```
This will:
- Check Python version (requires 3.11+)
- Install the core framework package (`framework`)
- Install the tools package (`aden_tools`)
- Fix package compatibility issues (openai + litellm)
- Verify all installations
## Manual Setup (Alternative)
If you prefer to set up manually or the script fails:
### 1. Install Core Framework
```bash
cd core
pip install -e .
```
### 2. Install Tools Package
```bash
cd tools
pip install -e .
```
### 3. Upgrade OpenAI Package
```bash
# litellm requires openai >= 1.0.0
pip install --upgrade "openai>=1.0.0"
```
### 4. Verify Installation
```bash
python -c "import framework; print('✓ framework OK')"
python -c "import aden_tools; print('✓ aden_tools OK')"
python -c "import litellm; print('✓ litellm OK')"
```
## Requirements
### Python Version
- **Minimum:** Python 3.11
- **Recommended:** Python 3.11 or 3.12
- **Tested on:** Python 3.11, 3.12, 3.13
### System Requirements
- pip (latest version)
- 2GB+ RAM
- Internet connection (for LLM API calls)
### API Keys (Optional)
For running agents with real LLMs:
```bash
export ANTHROPIC_API_KEY="your-key-here"
```
## Running Agents
All agent commands must be run from the project root with `PYTHONPATH` set:
```bash
# From /home/timothy/oss/hive/ directory
PYTHONPATH=core:exports python -m agent_name COMMAND
```
### Example: Support Ticket Agent
```bash
# Validate agent structure
PYTHONPATH=core:exports python -m support_ticket_agent validate
# Show agent information
PYTHONPATH=core:exports python -m support_ticket_agent info
# Run agent with input
PYTHONPATH=core:exports python -m support_ticket_agent run --input '{
"ticket_content": "My login is broken. Error 401.",
"customer_id": "CUST-123",
"ticket_id": "TKT-456"
}'
# Run in mock mode (no LLM calls)
PYTHONPATH=core:exports python -m support_ticket_agent run --mock --input '{...}'
```
### Example: Other Agents
```bash
# Market Research Agent
PYTHONPATH=core:exports python -m market_research_agent info
# Outbound Sales Agent
PYTHONPATH=core:exports python -m outbound_sales_agent validate
# Personal Assistant Agent
PYTHONPATH=core:exports python -m personal_assistant_agent run --input '{...}'
```
## Building New Agents
Use Claude Code CLI with the agent building skills:
### 1. Install Skills (One-time)
```bash
./quickstart.sh
```
This installs:
- `/building-agents` - Build new agents
- `/testing-agent` - Test agents
### 2. Build an Agent
```
claude> /building-agents
```
Follow the prompts to:
1. Define your agent's goal
2. Design the workflow nodes
3. Connect edges
4. Generate the agent package
### 3. Test Your Agent
```
claude> /testing-agent
```
Creates comprehensive test suites for your agent.
## Troubleshooting
### "ModuleNotFoundError: No module named 'framework'"
**Solution:** Install the core package:
```bash
cd core && pip install -e .
```
### "ModuleNotFoundError: No module named 'aden_tools'"
**Solution:** Install the tools package:
```bash
cd tools && pip install -e .
```
Or run the setup script:
```bash
./scripts/setup-python.sh
```
### "ModuleNotFoundError: No module named 'openai.\_models'"
**Cause:** Outdated `openai` package (0.27.x) incompatible with `litellm`
**Solution:** Upgrade openai:
```bash
pip install --upgrade "openai>=1.0.0"
```
### "No module named 'support_ticket_agent'"
**Cause:** Not running from project root or missing PYTHONPATH
**Solution:** Ensure you're in `/home/timothy/oss/hive/` and use:
```bash
PYTHONPATH=core:exports python -m support_ticket_agent validate
```
### Agent imports fail with "broken installation"
**Symptom:** `pip list` shows packages pointing to non-existent directories
**Solution:** Reinstall packages properly:
```bash
# Remove broken installations
pip uninstall -y framework tools
# Reinstall correctly
cd /home/timothy/oss/hive
./scripts/setup-python.sh
```
## Package Structure
The Hive framework consists of three Python packages:
```
hive/
├── core/ # Core framework (runtime, graph executor, LLM providers)
│ ├── framework/
│ ├── pyproject.toml
│ └── requirements.txt
├── tools/ # Tools and MCP servers
│ ├── src/
│ │ └── aden_tools/ # Actual package location
│ ├── pyproject.toml
│ └── README.md
└── exports/ # Agent packages (your agents go here)
├── support_ticket_agent/
├── market_research_agent/
├── outbound_sales_agent/
└── personal_assistant_agent/
```
### Why PYTHONPATH is Required
The packages are installed in **editable mode** (`pip install -e`), which means:
- `framework` and `aden_tools` are globally importable (no PYTHONPATH needed)
- `exports` is NOT installed as a package (PYTHONPATH required)
This design allows agents in `exports/` to be:
- Developed independently
- Version controlled separately
- Deployed as standalone packages
## Development Workflow
### 1. Setup (Once)
```bash
./scripts/setup-python.sh
```
### 2. Build Agent (Claude Code)
```
claude> /building-agents
Enter goal: "Build an agent that processes customer support tickets"
```
### 3. Validate Agent
```bash
PYTHONPATH=core:exports python -m support_ticket_agent validate
```
### 4. Test Agent
```
claude> /testing-agent
```
### 5. Run Agent
```bash
PYTHONPATH=core:exports python -m support_ticket_agent run --input '{...}'
```
## IDE Setup
### VSCode
Add to `.vscode/settings.json`:
```json
{
"python.analysis.extraPaths": [
"${workspaceFolder}/core",
"${workspaceFolder}/exports"
],
"python.autoComplete.extraPaths": [
"${workspaceFolder}/core",
"${workspaceFolder}/exports"
]
}
```
### PyCharm
1. Open Project Settings → Project Structure
2. Mark `core` as Sources Root
3. Mark `exports` as Sources Root
## Environment Variables
### Required for LLM Operations
```bash
export ANTHROPIC_API_KEY="sk-ant-..."
```
### Optional Configuration
```bash
# Credentials storage location (default: ~/.aden/credentials)
export ADEN_CREDENTIALS_PATH="/custom/path"
# Agent storage location (default: /tmp)
export AGENT_STORAGE_PATH="/custom/storage"
```
## Additional Resources
- **Framework Documentation:** [core/README.md](core/README.md)
- **Tools Documentation:** [tools/README.md](tools/README.md)
- **Example Agents:** [exports/](exports/)
- **Agent Building Guide:** [.claude/skills/building-agents-construction/SKILL.md](.claude/skills/building-agents-construction/SKILL.md)
- **Testing Guide:** [.claude/skills/testing-agent/SKILL.md](.claude/skills/testing-agent/SKILL.md)
## Contributing
When contributing agent packages:
1. Place agents in `exports/agent_name/`
2. Follow the standard agent structure (see existing agents)
3. Include README.md with usage instructions
4. Add tests if using `/testing-agent`
5. Document required environment variables
## Support
- **Issues:** https://github.com/adenhq/hive/issues
- **Discord:** https://discord.com/invite/MXE49hrKDk
- **Documentation:** https://docs.adenhq.com/
+34
View File
@@ -0,0 +1,34 @@
.PHONY: lint format check test install-hooks help frontend-dev frontend-build
help: ## Show this help
@grep -E '^[a-zA-Z_-]+:.*?## .*$$' $(MAKEFILE_LIST) | \
awk 'BEGIN {FS = ":.*?## "}; {printf " \033[36m%-15s\033[0m %s\n", $$1, $$2}'
lint: ## Run ruff linter and formatter (with auto-fix)
cd core && ruff check --fix .
cd tools && ruff check --fix .
cd core && ruff format .
cd tools && ruff format .
format: ## Run ruff formatter
cd core && ruff format .
cd tools && ruff format .
check: ## Run all checks without modifying files (CI-safe)
cd core && ruff check .
cd tools && ruff check .
cd core && ruff format --check .
cd tools && ruff format --check .
test: ## Run all tests
cd core && uv run python -m pytest tests/ -v
install-hooks: ## Install pre-commit hooks
uv pip install pre-commit
pre-commit install
frontend-dev: ## Start frontend dev server
cd core/frontend && npm run dev
frontend-build: ## Build frontend for production
cd core/frontend && npm run build
+327 -226
View File
@@ -1,27 +1,31 @@
<p align="center">
<img width="100%" alt="Hive Banner" src="https://storage.googleapis.com/aden-prod-assets/website/aden-title-card.png" />
<img width="100%" alt="Hive Banner" src="https://github.com/user-attachments/assets/a027429b-5d3c-4d34-88e4-0feaeaabbab3" />
</p>
<p align="center">
<a href="README.md">English</a> |
<a href="README.zh-CN.md">简体中文</a> |
<a href="README.es.md">Español</a> |
<a href="README.pt.md">Português</a> |
<a href="README.ja.md">日本語</a> |
<a href="README.ru.md">Русский</a>
<a href="docs/i18n/zh-CN.md">简体中文</a> |
<a href="docs/i18n/es.md">Español</a> |
<a href="docs/i18n/hi.md">हिन्दी</a> |
<a href="docs/i18n/pt.md">Português</a> |
<a href="docs/i18n/ja.md">日本語</a> |
<a href="docs/i18n/ru.md">Русский</a> |
<a href="docs/i18n/ko.md">한국어</a>
</p>
[![Apache 2.0 License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://github.com/adenhq/hive/blob/main/LICENSE)
[![Y Combinator](https://img.shields.io/badge/Y%20Combinator-Aden-orange)](https://www.ycombinator.com/companies/aden)
[![Docker Pulls](https://img.shields.io/docker/pulls/adenhq/hive?logo=Docker&labelColor=%23528bff)](https://hub.docker.com/u/adenhq)
[![Discord](https://img.shields.io/discord/1172610340073242735?logo=discord&labelColor=%235462eb&logoColor=%23f5f5f5&color=%235462eb)](https://discord.com/invite/MXE49hrKDk)
[![Twitter Follow](https://img.shields.io/twitter/follow/teamaden?logo=X&color=%23f5f5f5)](https://x.com/aden_hq)
[![LinkedIn](https://custom-icon-badges.demolab.com/badge/LinkedIn-0A66C2?logo=linkedin-white&logoColor=fff)](https://www.linkedin.com/company/teamaden/)
<p align="center">
<a href="https://github.com/adenhq/hive/blob/main/LICENSE"><img src="https://img.shields.io/badge/License-Apache%202.0-blue.svg" alt="Apache 2.0 License" /></a>
<a href="https://www.ycombinator.com/companies/aden"><img src="https://img.shields.io/badge/Y%20Combinator-Aden-orange" alt="Y Combinator" /></a>
<a href="https://discord.com/invite/MXE49hrKDk"><img src="https://img.shields.io/discord/1172610340073242735?logo=discord&labelColor=%235462eb&logoColor=%23f5f5f5&color=%235462eb" alt="Discord" /></a>
<a href="https://x.com/aden_hq"><img src="https://img.shields.io/twitter/follow/teamaden?logo=X&color=%23f5f5f5" alt="Twitter Follow" /></a>
<a href="https://www.linkedin.com/company/teamaden/"><img src="https://custom-icon-badges.demolab.com/badge/LinkedIn-0A66C2?logo=linkedin-white&logoColor=fff" alt="LinkedIn" /></a>
<img src="https://img.shields.io/badge/MCP-102_Tools-00ADD8?style=flat-square" alt="MCP" />
</p>
<p align="center">
<img src="https://img.shields.io/badge/AI_Agents-Self--Improving-brightgreen?style=flat-square" alt="AI Agents" />
<img src="https://img.shields.io/badge/Multi--Agent-Systems-blue?style=flat-square" alt="Multi-Agent" />
<img src="https://img.shields.io/badge/Goal--Driven-Development-purple?style=flat-square" alt="Goal-Driven" />
<img src="https://img.shields.io/badge/Headless-Development-purple?style=flat-square" alt="Headless" />
<img src="https://img.shields.io/badge/Human--in--the--Loop-orange?style=flat-square" alt="HITL" />
<img src="https://img.shields.io/badge/Production--Ready-red?style=flat-square" alt="Production" />
</p>
@@ -29,264 +33,383 @@
<img src="https://img.shields.io/badge/OpenAI-supported-412991?style=flat-square&logo=openai" alt="OpenAI" />
<img src="https://img.shields.io/badge/Anthropic-supported-d4a574?style=flat-square" alt="Anthropic" />
<img src="https://img.shields.io/badge/Google_Gemini-supported-4285F4?style=flat-square&logo=google" alt="Gemini" />
<img src="https://img.shields.io/badge/MCP-19_Tools-00ADD8?style=flat-square" alt="MCP" />
</p>
## Overview
Build reliable, self-improving AI agents without hardcoding workflows. Define your goal through conversation with a coding agent, and the framework generates a node graph with dynamically created connection code. When things break, the framework captures failure data, evolves the agent through the coding agent, and redeploys. Built-in human-in-the-loop nodes, credential management, and real-time monitoring give you control without sacrificing adaptability.
Build autonomous, reliable, self-improving AI agents without hardcoding workflows. Define your goal through conversation with a coding agent, and the framework generates a node graph with dynamically created connection code. When things break, the framework captures failure data, evolves the agent through the coding agent, and redeploys. Built-in human-in-the-loop nodes, credential management, and real-time monitoring give you control without sacrificing adaptability.
Visit [adenhq.com](https://adenhq.com) for complete documentation, examples, and guides.
## What is Aden
https://github.com/user-attachments/assets/846c0cc7-ffd6-47fa-b4b7-495494857a55
<p align="center">
<img width="100%" alt="Aden Architecture" src="docs/assets/aden-architecture-diagram.jpg" />
</p>
## Who Is Hive For?
Aden is a platform for building, deploying, operating, and adapting AI agents:
Hive is designed for developers and teams who want to build **production-grade AI agents** without manually wiring complex workflows.
- **Build** - A Coding Agent generates specialized Worker Agents (Sales, Marketing, Ops) from natural language goals
- **Deploy** - Headless deployment with CI/CD integration and full API lifecycle management
- **Operate** - Real-time monitoring, observability, and runtime guardrails keep agents reliable
- **Adapt** - Continuous evaluation, supervision, and adaptation ensure agents improve over time
- **Infra** - Shared memory, LLM integrations, tools, and skills power every agent
Hive is a good fit if you:
- Want AI agents that **execute real business processes**, not demos
- Prefer **goal-driven development** over hardcoded workflows
- Need **self-healing and adaptive agents** that improve over time
- Require **human-in-the-loop control**, observability, and cost limits
- Plan to run agents in **production environments**
Hive may not be the best fit if youre only experimenting with simple agent chains or one-off scripts.
## When Should You Use Hive?
Use Hive when you need:
- Long-running, autonomous agents
- Strong guardrails, process, and controls
- Continuous improvement based on failures
- Multi-agent coordination
- A framework that evolves with your goals
## Quick Links
- **[Documentation](https://docs.adenhq.com/)** - Complete guides and API reference
- **[Self-Hosting Guide](https://docs.adenhq.com/getting-started/quickstart)** - Deploy Hive on your infrastructure
- **[Changelog](https://github.com/adenhq/hive/releases)** - Latest updates and releases
<!-- - **[Roadmap](https://adenhq.com/roadmap)** - Upcoming features and plans -->
- **[Roadmap](docs/roadmap.md)** - Upcoming features and plans
- **[Report Issues](https://github.com/adenhq/hive/issues)** - Bug reports and feature requests
- **[Contributing](CONTRIBUTING.md)** - How to contribute and submit PRs
## Quick Start
### Prerequisites
- [Python 3.11+](https://www.python.org/downloads/) for agent development
- [Docker](https://docs.docker.com/get-docker/) (v20.10+) - Optional, for containerized tools
- Python 3.11+ for agent development
- Claude Code, Codex CLI, or Cursor for utilizing agent skills
> **Note for Windows Users:** It is strongly recommended to use **WSL (Windows Subsystem for Linux)** or **Git Bash** to run this framework. Some core automation scripts may not execute correctly in standard Command Prompt or PowerShell.
### Installation
> **Note**
> Hive uses a `uv` workspace layout and is not installed with `pip install`.
> Running `pip install -e .` from the repository root will create a placeholder package and Hive will not function correctly.
> Please use the quickstart script below to set up the environment.
```bash
# Clone the repository
git clone https://github.com/adenhq/hive.git
cd hive
# Run Python environment setup
./scripts/setup-python.sh
# Run quickstart setup
./quickstart.sh
```
This installs:
- **framework** - Core agent runtime and graph executor
- **aden_tools** - 19 MCP tools for agent capabilities
- All required dependencies
This sets up:
- **framework** - Core agent runtime and graph executor (in `core/.venv`)
- **aden_tools** - MCP tools for agent capabilities (in `tools/.venv`)
- **credential store** - Encrypted API key storage (`~/.hive/credentials`)
- **LLM provider** - Interactive default model configuration
- All required Python dependencies with `uv`
### Build Your First Agent
```bash
# Install Claude Code skills (one-time)
./quickstart.sh
# Build an agent using Claude Code
claude> /building-agents
claude> /hive
# Test your agent
claude> /testing-agent
claude> /hive-debugger
# Run your agent
PYTHONPATH=core:exports python -m your_agent_name run --input '{...}'
# (at separate terminal) Launch the interactive dashboard
hive tui
# Or run directly
hive run exports/your_agent_name --input '{"key": "value"}'
```
**[📖 Complete Setup Guide](ENVIRONMENT_SETUP.md)** - Detailed instructions for agent development
## Coding Agent Support
### Codex CLI
Hive includes native support for [OpenAI Codex CLI](https://github.com/openai/codex) (v0.101.0+).
1. **Config:** `.codex/config.toml` with `agent-builder` MCP server (tracked in git)
2. **Skills:** `.agents/skills/` symlinks to Hive skills (tracked in git)
3. **Launch:** Run `codex` in the repo root, then type `use hive`
Example:
```
codex> use hive
```
### Opencode
Hive includes native support for [Opencode](https://github.com/opencode-ai/opencode).
1. **Setup:** Run the quickstart script
2. **Launch:** Open Opencode in the project root.
3. **Activate:** Type `/hive` in the chat to switch to the Hive Agent.
4. **Verify:** Ask the agent _"List your tools"_ to confirm the connection.
The agent has access to all Hive skills and can scaffold agents, add tools, and debug workflows directly from the chat.
**[📖 Complete Setup Guide](docs/environment-setup.md)** - Detailed instructions for agent development
### Antigravity IDE Support
Skills and MCP servers are also available in [Antigravity IDE](https://antigravity.google/) (Google's AI-powered IDE). **Easiest:** open a terminal in the hive repo folder and run (use `./` — the script is inside the repo):
```bash
./scripts/setup-antigravity-mcp.sh
```
**Important:** Always restart/refresh Antigravity IDE after running the setup script—MCP servers only load on startup. After restart, **agent-builder** and **tools** MCP servers should connect. Skills are under `.agent/skills/` (symlinks to `.claude/skills/`). See [docs/antigravity-setup.md](docs/antigravity-setup.md) for manual setup and troubleshooting.
## Features
- **Goal-Driven Development** - Define objectives in natural language; the coding agent generates the agent graph and connection code to achieve them
- **Self-Adapting Agents** - Framework captures failures, updates objectives and updates the agent graph
- **Dynamic Node Connections** - No predefined edges; connection code is generated by any capable LLM based on your goals
- **[Goal-Driven Development](docs/key_concepts/goals_outcome.md)** - Define objectives in natural language; the coding agent generates the agent graph and connection code to achieve them
- **[Adaptiveness](docs/key_concepts/evolution.md)** - Framework captures failures, calibrates according to the objectives, and evolves the agent graph
- **[Dynamic Node Connections](docs/key_concepts/graph.md)** - No predefined edges; connection code is generated by any capable LLM based on your goals
- **SDK-Wrapped Nodes** - Every node gets shared memory, local RLM memory, monitoring, tools, and LLM access out of the box
- **Human-in-the-Loop** - Intervention nodes that pause execution for human input with configurable timeouts and escalation
- **[Human-in-the-Loop](docs/key_concepts/graph.md#human-in-the-loop)** - Intervention nodes that pause execution for human input with configurable timeouts and escalation
- **Real-time Observability** - WebSocket streaming for live monitoring of agent execution, decisions, and node-to-node communication
- **Interactive TUI Dashboard** - Terminal-based dashboard with live graph view, event log, and chat interface for agent interaction
- **Cost & Budget Control** - Set spending limits, throttles, and automatic model degradation policies
- **Production-Ready** - Self-hostable, built for scale and reliability
## Integration
<a href="https://github.com/adenhq/hive/tree/main/tools/src/aden_tools/tools"><img width="100%" alt="Integration" src="https://github.com/user-attachments/assets/a1573f93-cf02-4bb8-b3d5-b305b05b1e51" /></a>
Hive is built to be model-agnostic and system-agnostic.
- **LLM flexibility** - Hive Framework is designed to support various types of LLMs, including hosted and local models through LiteLLM-compatible providers.
- **Business system connectivity** - Hive Framework is designed to connect to all kinds of business systems as tools, such as CRM, support, messaging, data, file, and internal APIs via MCP.
## Why Aden
Traditional agent frameworks require you to manually design workflows, define agent interactions, and handle failures reactively. Aden flips this paradigm**you describe outcomes, and the system builds itself**.
Hive focuses on generating agents that run real business processes rather than generic agents. Instead of requiring you to manually design workflows, define agent interactions, and handle failures reactively, Hive flips the paradigm: **you describe outcomes, and the system builds itself**—delivering an outcome-driven, adaptive experience with an easy-to-use set of tools and integrations.
```mermaid
flowchart LR
subgraph BUILD["🏗️ BUILD"]
GOAL["Define Goal<br/>+ Success Criteria"] --> NODES["Add Nodes<br/>LLM/Router/Function"]
NODES --> EDGES["Connect Edges<br/>on_success/failure/conditional"]
EDGES --> TEST["Test & Validate"] --> APPROVE["Approve & Export"]
end
GOAL["Define Goal"] --> GEN["Auto-Generate Graph"]
GEN --> EXEC["Execute Agents"]
EXEC --> MON["Monitor & Observe"]
MON --> CHECK{{"Pass?"}}
CHECK -- "Yes" --> DONE["Deliver Result"]
CHECK -- "No" --> EVOLVE["Evolve Graph"]
EVOLVE --> EXEC
subgraph EXPORT["📦 EXPORT"]
direction TB
JSON["agent.json<br/>(GraphSpec)"]
TOOLS["tools.py<br/>(Functions)"]
MCP["mcp_servers.json<br/>(Integrations)"]
end
GOAL -.- V1["Natural Language"]
GEN -.- V2["Instant Architecture"]
EXEC -.- V3["Easy Integrations"]
MON -.- V4["Full visibility"]
EVOLVE -.- V5["Adaptability"]
DONE -.- V6["Reliable outcomes"]
subgraph RUN["🚀 RUNTIME"]
LOAD["AgentRunner<br/>Load + Parse"] --> SETUP["Setup Runtime<br/>+ ToolRegistry"]
SETUP --> EXEC["GraphExecutor<br/>Execute Nodes"]
subgraph DECISION["Decision Recording"]
DEC1["runtime.decide()<br/>intent → options → choice"]
DEC2["runtime.record_outcome()<br/>success, result, metrics"]
end
end
subgraph INFRA["⚙️ INFRASTRUCTURE"]
CTX["NodeContext<br/>memory • llm • tools"]
STORE[("FileStorage<br/>Runs & Decisions")]
end
APPROVE --> EXPORT
EXPORT --> LOAD
EXEC --> DECISION
EXEC --> CTX
DECISION --> STORE
STORE -.->|"Analyze & Improve"| NODES
style BUILD fill:#ffbe42,stroke:#cc5d00,stroke-width:3px,color:#333
style EXPORT fill:#fff59d,stroke:#ed8c00,stroke-width:2px,color:#333
style RUN fill:#ffb100,stroke:#cc5d00,stroke-width:3px,color:#333
style DECISION fill:#ffcc80,stroke:#ed8c00,stroke-width:2px,color:#333
style INFRA fill:#e8763d,stroke:#cc5d00,stroke-width:3px,color:#fff
style STORE fill:#ed8c00,stroke:#cc5d00,stroke-width:2px,color:#fff
style GOAL fill:#ffbe42,stroke:#cc5d00,stroke-width:2px,color:#333
style GEN fill:#ffb100,stroke:#cc5d00,stroke-width:2px,color:#333
style EXEC fill:#ff9800,stroke:#cc5d00,stroke-width:2px,color:#fff
style MON fill:#ff9800,stroke:#cc5d00,stroke-width:2px,color:#fff
style CHECK fill:#fff59d,stroke:#ed8c00,stroke-width:2px,color:#333
style DONE fill:#4caf50,stroke:#2e7d32,stroke-width:2px,color:#fff
style EVOLVE fill:#e8763d,stroke:#cc5d00,stroke-width:2px,color:#fff
style V1 fill:#fff,stroke:#ed8c00,stroke-width:1px,color:#cc5d00
style V2 fill:#fff,stroke:#ed8c00,stroke-width:1px,color:#cc5d00
style V3 fill:#fff,stroke:#ed8c00,stroke-width:1px,color:#cc5d00
style V4 fill:#fff,stroke:#ed8c00,stroke-width:1px,color:#cc5d00
style V5 fill:#fff,stroke:#ed8c00,stroke-width:1px,color:#cc5d00
style V6 fill:#fff,stroke:#ed8c00,stroke-width:1px,color:#cc5d00
```
### The Aden Advantage
### The Hive Advantage
| Traditional Frameworks | Aden |
| Traditional Frameworks | Hive |
| -------------------------- | -------------------------------------- |
| Hardcode agent workflows | Describe goals in natural language |
| Manual graph definition | Auto-generated agent graphs |
| Reactive error handling | Proactive self-evolution |
| Reactive error handling | Outcome-evaluation and adaptiveness |
| Static tool configurations | Dynamic SDK-wrapped nodes |
| Separate monitoring setup | Built-in real-time observability |
| DIY budget management | Integrated cost controls & degradation |
### How It Works
1. **Define Your Goal** → Describe what you want to achieve in plain English
2. **Coding Agent Generates** → Creates the agent graph, connection code, and test cases
3. **Workers Execute** → SDK-wrapped nodes run with full observability and tool access
1. **[Define Your Goal](docs/key_concepts/goals_outcome.md)** → Describe what you want to achieve in plain English
2. **Coding Agent Generates** → Creates the [agent graph](docs/key_concepts/graph.md), connection code, and test cases
3. **[Workers Execute](docs/key_concepts/worker_agent.md)** → SDK-wrapped nodes run with full observability and tool access
4. **Control Plane Monitors** → Real-time metrics, budget enforcement, policy management
5. **Self-Improve** → On failure, the system evolves the graph and redeploys automatically
5. **[Adaptiveness](docs/key_concepts/evolution.md)** → On failure, the system evolves the graph and redeploys automatically
## How Aden Compares
## Run Agents
Aden takes a fundamentally different approach to agent development. While most frameworks require you to hardcode workflows or manually define agent graphs, Aden uses a **coding agent to generate your entire agent system** from natural language goals. When agents fail, the framework doesn't just log errors—it **automatically evolves the agent graph** and redeploys.
### Comparison Table
| Framework | Category | Approach | Aden Difference |
| ----------------------------------- | ------------------------- | --------------------------------------------------------------- | --------------------------------------------------------- |
| **LangChain, LlamaIndex, Haystack** | Component Libraries | Predefined components for RAG/LLM apps; manual connection logic | Generates entire graph and connection code upfront |
| **CrewAI, AutoGen, Swarm** | Multi-Agent Orchestration | Role-based agents with predefined collaboration patterns | Dynamically creates agents/connections; adapts on failure |
| **PydanticAI, Mastra, Agno** | Type-Safe Frameworks | Structured outputs and validation for known workflows | Evolving workflows; structure emerges through iteration |
| **Agent Zero, Letta** | Personal AI Assistants | Memory and learning; OS-as-tool or stateful memory focus | Production multi-agent systems with self-healing |
| **CAMEL** | Research Framework | Emergent behavior in large-scale simulations (up to 1M agents) | Production-oriented with reliable execution and recovery |
| **TEN Framework, Genkit** | Infrastructure Frameworks | Real-time multimodal (TEN) or full-stack AI (Genkit) | Higher abstraction—generates and evolves agent logic |
| **GPT Engineer, Motia** | Code Generation | Code from specs (GPT Engineer) or "Step" primitive (Motia) | Self-adapting graphs with automatic failure recovery |
| **Trading Agents** | Domain-Specific | Hardcoded trading firm roles on LangGraph | Domain-agnostic; generates structures for any use case |
### When to Choose Aden
Choose Aden when you need:
- Agents that **self-improve from failures** without manual intervention
- **Goal-driven development** where you describe outcomes, not workflows
- **Production reliability** with automatic recovery and redeployment
- **Rapid iteration** on agent architectures without rewriting code
- **Full observability** with real-time monitoring and human oversight
Choose other frameworks when you need:
- **Type-safe, predictable workflows** (PydanticAI, Mastra)
- **RAG and document processing** (LlamaIndex, Haystack)
- **Research on agent emergence** (CAMEL)
- **Real-time voice/multimodal** (TEN Framework)
- **Simple component chaining** (LangChain, Swarm)
## Project Structure
```
hive/
├── core/ # Core framework - Agent runtime, graph executor, protocols
├── tools/ # MCP Tools Package - 19 tools for agent capabilities
├── exports/ # Agent packages - Pre-built agents and examples
├── docs/ # Documentation and guides
├── scripts/ # Build and utility scripts
├── .claude/ # Claude Code skills for building agents
├── ENVIRONMENT_SETUP.md # Python setup guide for agent development
├── DEVELOPER.md # Developer guide
├── CONTRIBUTING.md # Contribution guidelines
└── ROADMAP.md # Product roadmap
```
## Development
### Python Agent Development
For building and running goal-driven agents with the framework:
The `hive` CLI is the primary interface for running agents.
```bash
# One-time setup
./scripts/setup-python.sh
# Browse and run agents interactively (Recommended)
hive tui
# This installs:
# - framework package (core runtime)
# - aden_tools package (19 MCP tools)
# - All dependencies
# Run a specific agent directly
hive run exports/my_agent --input '{"task": "Your input here"}'
# Build new agents using Claude Code skills
claude> /building-agents
# Run a specific agent with the TUI dashboard
hive run exports/my_agent --tui
# Test agents
claude> /testing-agent
# Run agents
PYTHONPATH=core:exports python -m agent_name run --input '{...}'
# Interactive REPL
hive shell
```
See [ENVIRONMENT_SETUP.md](ENVIRONMENT_SETUP.md) for complete setup instructions.
The TUI scans both `exports/` and `examples/templates/` for available agents.
> **Using Python directly (alternative):** You can also run agents with `PYTHONPATH=exports uv run python -m agent_name run --input '{...}'`
See [environment-setup.md](docs/environment-setup.md) for complete setup instructions.
## Documentation
- **[Developer Guide](DEVELOPER.md)** - Comprehensive guide for developers
- **[Developer Guide](docs/developer-guide.md)** - Comprehensive guide for developers
- [Getting Started](docs/getting-started.md) - Quick setup instructions
- [TUI Guide](docs/tui-selection-guide.md) - Interactive dashboard usage
- [Configuration Guide](docs/configuration.md) - All configuration options
- [Architecture Overview](docs/architecture.md) - System design and structure
- [Architecture Overview](docs/architecture/README.md) - System design and structure
## Roadmap
Aden Agent Framework aims to help developers build outcome oriented, self-adaptive agents. Please find our roadmap here
[ROADMAP.md](ROADMAP.md)
Aden Hive Agent Framework aims to help developers build outcome-oriented, self-adaptive agents. See [roadmap.md](docs/roadmap.md) for details.
```mermaid
timeline
title Aden Agent Framework Roadmap
section Foundation
Architecture : Node-Based Architecture : Python SDK : LLM Integration (OpenAI, Anthropic, Google) : Communication Protocol
Coding Agent : Goal Creation Session : Worker Agent Creation : MCP Tools Integration
Worker Agent : Human-in-the-Loop : Callback Handlers : Intervention Points : Streaming Interface
Tools : File Use : Memory (STM/LTM) : Web Search : Web Scraper : Audit Trail
Core : Eval System : Pydantic Validation : Docker Deployment : Documentation : Sample Agents
section Expansion
Intelligence : Guardrails : Streaming Mode : Semantic Search
Platform : JavaScript SDK : Custom Tool Integrator : Credential Store
Deployment : Self-Hosted : Cloud Services : CI/CD Pipeline
Templates : Sales Agent : Marketing Agent : Analytics Agent : Training Agent : Smart Form Agent
flowchart TB
%% Main Entity
User([User])
%% =========================================
%% EXTERNAL EVENT SOURCES
%% =========================================
subgraph ExtEventSource [External Event Source]
E_Sch["Schedulers"]
E_WH["Webhook"]
E_SSE["SSE"]
end
%% =========================================
%% SYSTEM NODES
%% =========================================
subgraph WorkerBees [Worker Bees]
WB_C["Conversation"]
WB_SP["System prompt"]
subgraph Graph [Graph]
direction TB
N1["Node"] --> N2["Node"] --> N3["Node"]
N1 -.-> AN["Active Node"]
N2 -.-> AN
N3 -.-> AN
%% Nested Event Loop Node
subgraph EventLoopNode [Event Loop Node]
ELN_L["listener"]
ELN_SP["System Prompt<br/>(Task)"]
ELN_EL["Event loop"]
ELN_C["Conversation"]
end
end
end
subgraph JudgeNode [Judge]
J_C["Criteria"]
J_P["Principles"]
J_EL["Event loop"] <--> J_S["Scheduler"]
end
subgraph QueenBee [Queen Bee]
QB_SP["System prompt"]
QB_EL["Event loop"]
QB_C["Conversation"]
end
subgraph Infra [Infra]
SA["Sub Agent"]
TR["Tool Registry"]
WTM["Write through Conversation Memory<br/>(Logs/RAM/Harddrive)"]
SM["Shared Memory<br/>(State/Harddrive)"]
EB["Event Bus<br/>(RAM)"]
CS["Credential Store<br/>(Harddrive/Cloud)"]
end
subgraph PC [PC]
B["Browser"]
CB["Codebase<br/>v 0.0.x ... v n.n.n"]
end
%% =========================================
%% CONNECTIONS & DATA FLOW
%% =========================================
%% External Event Routing
E_Sch --> ELN_L
E_WH --> ELN_L
E_SSE --> ELN_L
ELN_L -->|"triggers"| ELN_EL
%% User Interactions
User -->|"Talk"| WB_C
User -->|"Talk"| QB_C
User -->|"Read/Write Access"| CS
%% Inter-System Logic
ELN_C <-->|"Mirror"| WB_C
WB_C -->|"Focus"| AN
WorkerBees -->|"Inquire"| JudgeNode
JudgeNode -->|"Approve"| WorkerBees
%% Judge Alignments
J_C <-.->|"aligns"| WB_SP
J_P <-.->|"aligns"| QB_SP
%% Escalate path
J_EL -->|"Report (Escalate)"| QB_EL
%% Pub/Sub Logic
AN -->|"publish"| EB
EB -->|"subscribe"| QB_C
%% Infra and Process Spawning
ELN_EL -->|"Spawn"| SA
SA -->|"Inform"| ELN_EL
SA -->|"Starts"| B
B -->|"Report"| ELN_EL
TR -->|"Assigned"| ELN_EL
CB -->|"Modify Worker Bee"| WB_C
%% =========================================
%% SHARED MEMORY & LOGS ACCESS
%% =========================================
%% Worker Bees Access (link to node inside Graph subgraph)
AN <-->|"Read/Write"| WTM
AN <-->|"Read/Write"| SM
%% Queen Bee Access
QB_C <-->|"Read/Write"| WTM
QB_EL <-->|"Read/Write"| SM
%% Credentials Access
CS -->|"Read Access"| QB_C
```
## Contributing
We welcome contributions from the community! Were especially looking for help building tools, integrations, and example agents for the framework ([check #2805](https://github.com/adenhq/hive/issues/2805)). If youre interested in extending its functionality, this is the perfect place to start. Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
**Important:** Please get assigned to an issue before submitting a PR. Comment on an issue to claim it, and a maintainer will assign you. Issues with reproducible steps and proposals are prioritized. This helps prevent duplicate work.
1. Find or create an issue and get assigned
2. Fork the repository
3. Create your feature branch (`git checkout -b feature/amazing-feature`)
4. Commit your changes (`git commit -m 'Add amazing feature'`)
5. Push to the branch (`git push origin feature/amazing-feature`)
6. Open a Pull Request
## Community & Support
We use [Discord](https://discord.com/invite/MXE49hrKDk) for support, feature requests, and community discussions.
@@ -295,16 +418,6 @@ We use [Discord](https://discord.com/invite/MXE49hrKDk) for support, feature req
- Twitter/X - [@adenhq](https://x.com/aden_hq)
- LinkedIn - [Company Page](https://www.linkedin.com/company/teamaden/)
## Contributing
We welcome contributions! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
## Join Our Team
**We're hiring!** Join us in engineering, research, and go-to-market roles.
@@ -321,69 +434,57 @@ This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENS
## Frequently Asked Questions (FAQ)
**Q: Does Aden depend on LangChain or other agent frameworks?**
**Q: What LLM providers does Hive support?**
No. Aden is built from the ground up with no dependencies on LangChain, CrewAI, or other agent frameworks. The framework is designed to be lean and flexible, generating agent graphs dynamically rather than relying on predefined components.
Hive supports 100+ LLM providers through LiteLLM integration, including OpenAI (GPT-4, GPT-4o), Anthropic (Claude models), Google Gemini, DeepSeek, Mistral, Groq, and many more. Simply set the appropriate API key environment variable and specify the model name.
**Q: What LLM providers does Aden support?**
**Q: Can I use Hive with local AI models like Ollama?**
Aden supports 100+ LLM providers through LiteLLM integration, including OpenAI (GPT-4, GPT-4o), Anthropic (Claude models), Google Gemini, Mistral, Groq, and many more. Simply set the appropriate API key environment variable and specify the model name.
Yes! Hive supports local models through LiteLLM. Simply use the model name format `ollama/model-name` (e.g., `ollama/llama3`, `ollama/mistral`) and ensure Ollama is running locally.
**Q: Can I use Aden with local AI models like Ollama?**
**Q: What makes Hive different from other agent frameworks?**
Yes! Aden supports local models through LiteLLM. Simply use the model name format `ollama/model-name` (e.g., `ollama/llama3`, `ollama/mistral`) and ensure Ollama is running locally.
Hive generates your entire agent system from natural language goals using a coding agent—you don't hardcode workflows or manually define graphs. When agents fail, the framework automatically captures failure data, [evolves the agent graph](docs/key_concepts/evolution.md), and redeploys. This self-improving loop is unique to Aden.
**Q: What makes Aden different from other agent frameworks?**
**Q: Is Hive open-source?**
Aden generates your entire agent system from natural language goals using a coding agent—you don't hardcode workflows or manually define graphs. When agents fail, the framework automatically captures failure data, evolves the agent graph, and redeploys. This self-improving loop is unique to Aden.
Yes, Hive is fully open-source under the Apache License 2.0. We actively encourage community contributions and collaboration.
**Q: Is Aden open-source?**
**Q: Can Hive handle complex, production-scale use cases?**
Yes, Aden is fully open-source under the Apache License 2.0. We actively encourage community contributions and collaboration.
Yes. Hive is explicitly designed for production environments with features like automatic failure recovery, real-time observability, cost controls, and horizontal scaling support. The framework handles both simple automations and complex multi-agent workflows.
**Q: Does Aden collect data from users?**
**Q: Does Hive support human-in-the-loop workflows?**
Aden collects telemetry data for monitoring and observability purposes, including token usage, latency metrics, and cost tracking. Content capture (prompts and responses) is configurable and stored with team-scoped data isolation. All data stays within your infrastructure when self-hosted.
Yes, Hive fully supports [human-in-the-loop](docs/key_concepts/graph.md#human-in-the-loop) workflows through intervention nodes that pause execution for human input. These include configurable timeouts and escalation policies, allowing seamless collaboration between human experts and AI agents.
**Q: What deployment options does Aden support?**
**Q: What programming languages does Hive support?**
Aden supports Docker Compose deployment out of the box, with both production and development configurations. Self-hosted deployments work on any infrastructure supporting Docker. Cloud deployment options and Kubernetes-ready configurations are on the roadmap.
The Hive framework is built in Python. A JavaScript/TypeScript SDK is on the roadmap.
**Q: Can Aden handle complex, production-scale use cases?**
Yes. Aden is explicitly designed for production environments with features like automatic failure recovery, real-time observability, cost controls, and horizontal scaling support. The framework handles both simple automations and complex multi-agent workflows.
**Q: Does Aden support human-in-the-loop workflows?**
Yes, Aden fully supports human-in-the-loop workflows through intervention nodes that pause execution for human input. These include configurable timeouts and escalation policies, allowing seamless collaboration between human experts and AI agents.
**Q: What monitoring and debugging tools does Aden provide?**
Aden includes comprehensive observability features: real-time WebSocket streaming for live agent execution monitoring, TimescaleDB-powered analytics for cost and performance metrics, health check endpoints for Kubernetes integration, and 19 MCP tools for budget management, agent status, and policy control.
**Q: What programming languages does Aden support?**
Aden provides SDKs for both Python and JavaScript/TypeScript. The Python SDK includes integration templates for LangGraph, LangFlow, and LiveKit. The backend is Node.js/TypeScript, and the frontend is React/TypeScript.
**Q: Can Aden agents interact with external tools and APIs?**
**Q: Can Hive agents interact with external tools and APIs?**
Yes. Aden's SDK-wrapped nodes provide built-in tool access, and the framework supports flexible tool ecosystems. Agents can integrate with external APIs, databases, and services through the node architecture.
**Q: How does cost control work in Aden?**
**Q: How does cost control work in Hive?**
Aden provides granular budget controls including spending limits, throttles, and automatic model degradation policies. You can set budgets at the team, agent, or workflow level, with real-time cost tracking and alerts.
Hive provides granular budget controls including spending limits, throttles, and automatic model degradation policies. You can set budgets at the team, agent, or workflow level, with real-time cost tracking and alerts.
**Q: Where can I find examples and documentation?**
Visit [docs.adenhq.com](https://docs.adenhq.com/) for complete guides, API reference, and getting started tutorials. The repository also includes documentation in the `docs/` folder and a comprehensive [DEVELOPER.md](DEVELOPER.md) guide.
Visit [docs.adenhq.com](https://docs.adenhq.com/) for complete guides, API reference, and getting started tutorials. The repository also includes documentation in the `docs/` folder and a comprehensive [developer guide](docs/developer-guide.md).
**Q: How can I contribute to Aden?**
Contributions are welcome! Fork the repository, create your feature branch, implement your changes, and submit a pull request. See [CONTRIBUTING.md](CONTRIBUTING.md) for detailed guidelines.
**Q: Does Aden offer enterprise support?**
**Q: When will my team start seeing results from Aden's adaptive agents?**
For enterprise inquiries, contact the Aden team through [adenhq.com](https://adenhq.com) or join our [Discord community](https://discord.com/invite/MXE49hrKDk) for support and discussions.
Aden's adaptation loop begins working from the first execution. When an agent fails, the framework captures the failure data, helping developers evolve the agent graph through the coding agent. How quickly this translates to measurable results depends on the complexity of your use case, the quality of your goal definitions, and the volume of executions generating feedback.
**Q: How does Hive compare to other agent frameworks?**
Hive focuses on generating agents that run real business processes, rather than generic agents. This vision emphasizes outcome-driven design, adaptability, and an easy-to-use set of tools and integrations.
---
-339
View File
@@ -1,339 +0,0 @@
<p align="center">
<img width="100%" alt="Hive Banner" src="https://storage.googleapis.com/aden-prod-assets/website/aden-title-card.png" />
</p>
<p align="center">
<a href="README.md">English</a> |
<a href="README.zh-CN.md">简体中文</a> |
<a href="README.es.md">Español</a> |
<a href="README.pt.md">Português</a> |
<a href="README.ja.md">日本語</a> |
<a href="README.ru.md">Русский</a>
</p>
[![Apache 2.0 License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://github.com/adenhq/hive/blob/main/LICENSE)
[![Y Combinator](https://img.shields.io/badge/Y%20Combinator-Aden-orange)](https://www.ycombinator.com/companies/aden)
[![Docker Pulls](https://img.shields.io/docker/pulls/adenhq/hive?logo=Docker&labelColor=%23528bff)](https://hub.docker.com/u/adenhq)
[![Discord](https://img.shields.io/discord/1172610340073242735?logo=discord&labelColor=%235462eb&logoColor=%23f5f5f5&color=%235462eb)](https://discord.com/invite/MXE49hrKDk)
[![Twitter Follow](https://img.shields.io/twitter/follow/teamaden?logo=X&color=%23f5f5f5)](https://x.com/aden_hq)
[![LinkedIn](https://custom-icon-badges.demolab.com/badge/LinkedIn-0A66C2?logo=linkedin-white&logoColor=fff)](https://www.linkedin.com/company/teamaden/)
<p align="center">
<img src="https://img.shields.io/badge/AI_Agents-Self--Improving-brightgreen?style=flat-square" alt="AI Agents" />
<img src="https://img.shields.io/badge/Multi--Agent-Systems-blue?style=flat-square" alt="Multi-Agent" />
<img src="https://img.shields.io/badge/Goal--Driven-Development-purple?style=flat-square" alt="Goal-Driven" />
<img src="https://img.shields.io/badge/Human--in--the--Loop-orange?style=flat-square" alt="HITL" />
<img src="https://img.shields.io/badge/Production--Ready-red?style=flat-square" alt="Production" />
</p>
<p align="center">
<img src="https://img.shields.io/badge/OpenAI-supported-412991?style=flat-square&logo=openai" alt="OpenAI" />
<img src="https://img.shields.io/badge/Anthropic-supported-d4a574?style=flat-square" alt="Anthropic" />
<img src="https://img.shields.io/badge/Google_Gemini-supported-4285F4?style=flat-square&logo=google" alt="Gemini" />
<img src="https://img.shields.io/badge/MCP-19_Tools-00ADD8?style=flat-square" alt="MCP" />
</p>
## 概述
构建可靠的、自我改进的 AI 智能体,无需硬编码工作流。通过与编码智能体对话来定义目标,框架会生成带有动态创建连接代码的节点图。当出现问题时,框架会捕获故障数据,通过编码智能体进化智能体,并重新部署。内置的人机协作节点、凭证管理和实时监控让您在保持适应性的同时拥有完全控制权。
访问 [adenhq.com](https://adenhq.com) 获取完整文档、示例和指南。
## 什么是 Aden
<p align="center">
<img width="100%" alt="Aden Architecture" src="docs/assets/aden-architecture-diagram.jpg" />
</p>
Aden 是一个用于构建、部署、运营和适应 AI 智能体的平台:
- **构建** - 编码智能体根据自然语言目标生成专业的工作智能体(销售、营销、运营)
- **部署** - 无头部署,支持 CI/CD 集成和完整的 API 生命周期管理
- **运营** - 实时监控、可观测性和运行时护栏确保智能体可靠运行
- **适应** - 持续评估、监督和适应确保智能体随时间改进
- **基础设施** - 共享内存、LLM 集成、工具和技能为每个智能体提供支持
## 快速链接
- **[文档](https://docs.adenhq.com/)** - 完整指南和 API 参考
- **[自托管指南](https://docs.adenhq.com/getting-started/quickstart)** - 在您的基础设施上部署 Hive
- **[更新日志](https://github.com/adenhq/hive/releases)** - 最新更新和版本
<!-- - **[路线图](https://adenhq.com/roadmap)** - 即将推出的功能和计划 -->
- **[报告问题](https://github.com/adenhq/hive/issues)** - Bug 报告和功能请求
## 快速开始
### 前置要求
- [Python 3.11+](https://www.python.org/downloads/) - 用于智能体开发
- [Docker](https://docs.docker.com/get-docker/) (v20.10+) - 可选,用于容器化工具
### 安装
```bash
# 克隆仓库
git clone https://github.com/adenhq/hive.git
cd hive
# 运行 Python 环境设置
./scripts/setup-python.sh
```
这将安装:
- **framework** - 核心智能体运行时和图执行器
- **aden_tools** - 19 个 MCP 工具提供智能体能力
- 所有必需的依赖项
### 构建您的第一个智能体
```bash
# 安装 Claude Code 技能(一次性)
./quickstart.sh
# 使用 Claude Code 构建智能体
claude> /building-agents
# 测试您的智能体
claude> /testing-agent
# 运行您的智能体
PYTHONPATH=core:exports python -m your_agent_name run --input '{...}'
```
**[📖 完整设置指南](ENVIRONMENT_SETUP.md)** - 智能体开发的详细说明
## 功能特性
- **目标驱动开发** - 用自然语言定义目标;编码智能体生成智能体图和连接代码来实现它们
- **自适应智能体** - 框架捕获故障,更新目标并更新智能体图
- **动态节点连接** - 没有预定义边;连接代码由任何有能力的 LLM 根据您的目标生成
- **SDK 封装节点** - 每个节点开箱即用地获得共享内存、本地 RLM 内存、监控、工具和 LLM 访问
- **人机协作** - 干预节点暂停执行以等待人工输入,支持可配置的超时和升级
- **实时可观测性** - WebSocket 流式传输用于实时监控智能体执行、决策和节点间通信
- **成本与预算控制** - 设置支出限制、节流和自动模型降级策略
- **生产就绪** - 可自托管,为规模和可靠性而构建
## 为什么选择 Aden
传统智能体框架要求您手动设计工作流、定义智能体交互并被动处理故障。Aden 颠覆了这一范式——**您描述结果,系统自动构建自己**。
```mermaid
flowchart LR
subgraph BUILD["🏗️ BUILD"]
GOAL["Define Goal<br/>+ Success Criteria"] --> NODES["Add Nodes<br/>LLM/Router/Function"]
NODES --> EDGES["Connect Edges<br/>on_success/failure/conditional"]
EDGES --> TEST["Test & Validate"] --> APPROVE["Approve & Export"]
end
subgraph EXPORT["📦 EXPORT"]
direction TB
JSON["agent.json<br/>(GraphSpec)"]
TOOLS["tools.py<br/>(Functions)"]
MCP["mcp_servers.json<br/>(Integrations)"]
end
subgraph RUN["🚀 RUNTIME"]
LOAD["AgentRunner<br/>Load + Parse"] --> SETUP["Setup Runtime<br/>+ ToolRegistry"]
SETUP --> EXEC["GraphExecutor<br/>Execute Nodes"]
subgraph DECISION["Decision Recording"]
DEC1["runtime.decide()<br/>intent → options → choice"]
DEC2["runtime.record_outcome()<br/>success, result, metrics"]
end
end
subgraph INFRA["⚙️ INFRASTRUCTURE"]
CTX["NodeContext<br/>memory • llm • tools"]
STORE[("FileStorage<br/>Runs & Decisions")]
end
APPROVE --> EXPORT
EXPORT --> LOAD
EXEC --> DECISION
EXEC --> CTX
DECISION --> STORE
STORE -.->|"Analyze & Improve"| NODES
style BUILD fill:#ffbe42,stroke:#cc5d00,stroke-width:3px,color:#333
style EXPORT fill:#fff59d,stroke:#ed8c00,stroke-width:2px,color:#333
style RUN fill:#ffb100,stroke:#cc5d00,stroke-width:3px,color:#333
style DECISION fill:#ffcc80,stroke:#ed8c00,stroke-width:2px,color:#333
style INFRA fill:#e8763d,stroke:#cc5d00,stroke-width:3px,color:#fff
style STORE fill:#ed8c00,stroke:#cc5d00,stroke-width:2px,color:#fff
```
### Aden 的优势
| 传统框架 | Aden |
|----------|------|
| 硬编码智能体工作流 | 用自然语言描述目标 |
| 手动图定义 | 自动生成智能体图 |
| 被动错误处理 | 主动自我进化 |
| 静态工具配置 | 动态 SDK 封装节点 |
| 单独设置监控 | 内置实时可观测性 |
| DIY 预算管理 | 集成成本控制和降级 |
### 工作原理
1. **定义目标** → 用简单英语描述您想要实现的目标
2. **编码智能体生成** → 创建智能体图、连接代码和测试用例
3. **工作节点执行** → SDK 封装节点以完全可观测性和工具访问运行
4. **控制平面监控** → 实时指标、预算执行、策略管理
5. **自我改进** → 失败时,系统进化图并自动重新部署
## Aden 与其他框架的比较
Aden 在智能体开发方面采取了根本不同的方法。虽然大多数框架要求您硬编码工作流或手动定义智能体图,但 Aden 使用**编码智能体从自然语言目标生成整个智能体系统**。当智能体失败时,框架不仅记录错误——它会**自动进化智能体图**并重新部署。
> **注意:** 详细的框架比较表和常见问题解答,请参阅英文版 [README.md](README.md)。
### 何时选择 Aden
选择 Aden 当您需要:
- 智能体从失败中**自我改进**而无需人工干预
- **目标驱动的开发**,您描述结果而非工作流
- 具有自动恢复和重新部署的**生产可靠性**
- 无需重写代码即可**快速迭代**智能体架构
- 具有实时监控和人工监督的**完整可观测性**
选择其他框架当您需要:
- **类型安全、可预测的工作流**PydanticAI、Mastra
- **RAG 和文档处理**LlamaIndex、Haystack
- **智能体涌现的研究**(CAMEL)
- **实时语音/多模态**TEN Framework
- **简单的组件链接**LangChain、Swarm
## 项目结构
```
hive/
├── core/ # 核心框架 - 智能体运行时、图执行器、协议
├── tools/ # MCP 工具包 - 19 个工具提供智能体能力
├── exports/ # 智能体包 - 预构建的智能体和示例
├── docs/ # 文档和指南
├── scripts/ # 构建和实用脚本
├── .claude/ # Claude Code 技能用于构建智能体
├── ENVIRONMENT_SETUP.md # 智能体开发的 Python 设置指南
├── DEVELOPER.md # 开发者指南
├── CONTRIBUTING.md # 贡献指南
└── ROADMAP.md # 产品路线图
```
## 开发
### Python 智能体开发
使用框架构建和运行目标驱动的智能体:
```bash
# 一次性设置
./scripts/setup-python.sh
# 这将安装:
# - framework 包(核心运行时)
# - aden_tools 包(19 个 MCP 工具)
# - 所有依赖项
# 使用 Claude Code 技能构建新智能体
claude> /building-agents
# 测试智能体
claude> /testing-agent
# 运行智能体
PYTHONPATH=core:exports python -m agent_name run --input '{...}'
```
完整设置说明请参阅 [ENVIRONMENT_SETUP.md](ENVIRONMENT_SETUP.md)。
## 文档
- **[开发者指南](DEVELOPER.md)** - 开发者综合指南
- [入门指南](docs/getting-started.md) - 快速设置说明
- [配置指南](docs/configuration.md) - 所有配置选项
- [架构概述](docs/architecture.md) - 系统设计和结构
## 路线图
Aden 智能体框架旨在帮助开发者构建面向结果的、自适应的智能体。请在此查看我们的路线图
[ROADMAP.md](ROADMAP.md)
```mermaid
timeline
title Aden Agent Framework Roadmap
section Foundation
Architecture : Node-Based Architecture : Python SDK : LLM Integration (OpenAI, Anthropic, Google) : Communication Protocol
Coding Agent : Goal Creation Session : Worker Agent Creation : MCP Tools Integration
Worker Agent : Human-in-the-Loop : Callback Handlers : Intervention Points : Streaming Interface
Tools : File Use : Memory (STM/LTM) : Web Search : Web Scraper : Audit Trail
Core : Eval System : Pydantic Validation : Docker Deployment : Documentation : Sample Agents
section Expansion
Intelligence : Guardrails : Streaming Mode : Semantic Search
Platform : JavaScript SDK : Custom Tool Integrator : Credential Store
Deployment : Self-Hosted : Cloud Services : CI/CD Pipeline
Templates : Sales Agent : Marketing Agent : Analytics Agent : Training Agent : Smart Form Agent
```
## 社区与支持
我们使用 [Discord](https://discord.com/invite/MXE49hrKDk) 进行支持、功能请求和社区讨论。
- Discord - [加入我们的社区](https://discord.com/invite/MXE49hrKDk)
- Twitter/X - [@adenhq](https://x.com/aden_hq)
- LinkedIn - [公司主页](https://www.linkedin.com/company/teamaden/)
## 贡献
我们欢迎贡献!请参阅 [CONTRIBUTING.md](CONTRIBUTING.md) 了解指南。
1. Fork 仓库
2. 创建功能分支 (`git checkout -b feature/amazing-feature`)
3. 提交更改 (`git commit -m 'Add amazing feature'`)
4. 推送到分支 (`git push origin feature/amazing-feature`)
5. 创建 Pull Request
## 加入我们的团队
**我们正在招聘!** 加入我们的工程、研究和市场推广团队。
[查看开放职位](https://jobs.adenhq.com/a8cec478-cdbc-473c-bbd4-f4b7027ec193/applicant)
## 安全
有关安全问题,请参阅 [SECURITY.md](SECURITY.md)。
## 许可证
本项目采用 Apache License 2.0 许可证 - 详情请参阅 [LICENSE](LICENSE) 文件。
## 常见问题 (FAQ)
> **注意:** 完整的常见问题解答,请参阅英文版 [README.md](README.md)。
**问:Aden 是否依赖 LangChain 或其他智能体框架?**
不。Aden 从头开始构建,不依赖 LangChain、CrewAI 或其他智能体框架。该框架设计精简灵活,动态生成智能体图而非依赖预定义组件。
**问:Aden 支持哪些 LLM 提供商?**
Aden 通过 LiteLLM 集成支持 100 多个 LLM 提供商,包括 OpenAIGPT-4、GPT-4o)、AnthropicClaude 模型)、Google Gemini、Mistral、Groq 等。只需设置适当的 API 密钥环境变量并指定模型名称即可。
**问:Aden 是开源的吗?**
是的,Aden 在 Apache License 2.0 下完全开源。我们积极鼓励社区贡献和协作。
**问:Aden 与其他智能体框架有何不同?**
Aden 使用编码智能体从自然语言目标生成整个智能体系统——您无需硬编码工作流或手动定义图。当智能体失败时,框架会自动捕获故障数据、进化智能体图并重新部署。这种自我改进循环是 Aden 独有的。
**问:Aden 支持人机协作工作流吗?**
是的,Aden 通过干预节点完全支持人机协作工作流,这些节点会暂停执行以等待人工输入。包括可配置的超时和升级策略,实现人类专家与 AI 智能体的无缝协作。
---
<p align="center">
用 🔥 热情打造于旧金山
</p>
-150
View File
@@ -1,150 +0,0 @@
Product Roadmap
Aden Agent Framework aims to help developers build outcome oriented, self-adaptive agents. Please find our roadmap here
```mermaid
timeline
title Aden Agent Framework Roadmap
section Foundation
Architecture : Node-Based Architecture : Python SDK : LLM Integration (OpenAI, Anthropic, Google) : Communication Protocol
Coding Agent : Goal Creation Session : Worker Agent Creation : MCP Tools Integration
Worker Agent : Human-in-the-Loop : Callback Handlers : Intervention Points : Streaming Interface
Tools : File Use : Memory (STM/LTM) : Web Search : Web Scraper : Audit Trail
Core : Eval System : Pydantic Validation : Docker Deployment : Documentation : Sample Agents
section Expansion
Intelligence : Guardrails : Streaming Mode : Semantic Search
Platform : JavaScript SDK : Custom Tool Integrator : Credential Store
Deployment : Self-Hosted : Cloud Services : CI/CD Pipeline
Templates : Sales Agent : Marketing Agent : Analytics Agent : Training Agent : Smart Form Agent
```
---
## Phase 1: Foundation
### Backbone Architecture
- [ ] **Node-Based Architecture (Agent as a node)**
- [x] Object schema definition
- [x] Node wrapper SDK
- [ ] Shared memory access
- [ ] Default monitoring hooks
- [ ] Tool access layer
- [x] LLM integration layer (Natively supports all mainstream LLMs through LiteLLM)
- [x] Anthropic
- [x] OpenAI
- [x] Google
- [ ] **Communication protocol between nodes**
- [ ] **[Coding Agent] Goal Creation Session** (separate from coding session)
- [ ] Instruction back and forth
- [x] Goal Object schema definition
- [ ] Being able to generate the test cases
- [ ] Test case validation for worker agent (Outcome driven)
- [ ] **[Coding Agent] Worker Agent Creation**
- [x] Coding Agent tools
- [ ] Use Template Agent as a start
- [x] Use our MCP tools
- [ ] **[Worker Agent] Human-in-the-Loop**
- [x] Worker Agents request with questions and options
- [x] Callback Handler System to receive events throughout execution
- [ ] Tool-Based Intervention Points (tool to pause execution and request human input)
- [x] Multiple entrypoint for different event source (e.g. Human input, webhook)
- [ ] Streaming Interface for Real-time Monitoring
- [ ] Request State Management
### Essential Tools
- [x] **File Use Tool Kit**
- [ ] **Memory Tools**
- [x] STM Layer Tool (state-based short-term memory)
- [x] LTM Layer Tool (RLM - long-term memory)
- [ ] **Infrastructure Tools**
- [x] Runtime Log Tool (logs for coding agent)
- [ ] Audit Trail Tool (decision timeline generation)
- [ ] Web Search
- [ ] Web Scraper
- [ ] Recipe for "Add your own tools"
### Memory & File System
- [x] DB for long-term persistent memory (Filesystem as durable scratchpad pattern)
- [x] Session Local memory isolation
### Eval System (Basic)
- [x] Test Driven - Run test case for all agent iteration
- [ ] Failure recording mechanism
- [ ] SDK for defining failure conditions
- [ ] Basic observability hooks
- [ ] User-driven log analysis (OSS approach)
### Data Validation
- [ ] Natively Support data validation of LLMs output with Pydantic
### Developer Experience
- [ ] **Debugging mode**
- [ ] **Documentation**
- [ ] Quick start guide
- [ ] Goal creation guide
- [ ] Agent creation guide
- [ ] GitHub Page setup
- [ ] README with examples
- [ ] Contributing guidelines
- [ ] **Distribution**
- [ ] PyPI package
- [ ] Docker image on Docker Hub
### Sample Agents
- [ ] Knowledge Agent
- [ ] Blog Writer Agent
- [ ] SDR Agent
---
## Phase 2: Expansion
### Basic Guardrails
- [ ] Support Basic Monitoring from Agent node SDK
- [ ] SDK guardrail implementation (in node)
- [ ] Guardrail type support (Determined Condition as Guardrails)
### Agent Capability
- [ ] Streaming mode support
### Cross-Platform
- [ ] JavaScript / TypeScript Version SDK
### File System Enhancement
- [ ] Semantic Search integration
- [ ] Interactive File System in product (frontend integration)
### More Worker Tools
- [ ] Custom Tool Integrator
- [ ] Integration as a tool (Credential Store & Support)
- [ ] **Core Agent Tools**
- [ ] Node Discovery Tool (find other agents in the graph)
- [ ] HITL Tool (pause execution for human approval)
- [ ] Wake-up Tool (resume agent tasks)
### Deployment (Self-Hosted)
- [ ] Docker container standardization
- [ ] Headless backend execution
- [ ] Exposed API for frontend attachment
- [ ] Local monitoring & observability
- [ ] Basic lifecycle APIs (Start, Stop, Pause, Resume)
### Deployment (Cloud)
- [ ] Cloud Service Options
- [ ] Support deployment to 3rd-party platforms
- [ ] Self-deploy + orchestrator connection
- [ ] **CI/CD Pipeline**
- [ ] Automated test execution
- [ ] Agent version control
- [ ] All tests must pass for deployment
### Developer Experience Enhancement
- [ ] Tool usage documentation
- [ ] Discord Support Channel
### More Agent Templates
- [ ] GTM Sales Agent (workflow)
- [ ] GTM Marketing Agent (workflow)
- [ ] Analytics Agent
- [ ] Training Agent
- [ ] Smart Entry / Form Agent (self-evolution emphasis)
+1
View File
@@ -1,4 +1,5 @@
exports/
docs/
.agent-builder-sessions/
.pytest_cache/
**/__pycache__/
+2 -2
View File
@@ -3,12 +3,12 @@
"agent-builder": {
"command": "python",
"args": ["-m", "framework.mcp.agent_builder_server"],
"cwd": "/home/timothy/oss/hive/core"
"cwd": "core"
},
"tools": {
"command": "python",
"args": ["-m", "aden_tools.mcp_server", "--stdio"],
"cwd": "/home/timothy/oss/hive/tools"
"cwd": "tools"
}
}
}
+3 -3
View File
@@ -82,7 +82,7 @@ Register an MCP server as a tool source for your agent.
"example_tool"
],
"total_mcp_servers": 1,
"note": "MCP server 'tools' registered with 6 tools. These tools can now be used in llm_tool_use nodes."
"note": "MCP server 'tools' registered with 6 tools. These tools can now be used in event_loop nodes."
}
```
@@ -149,7 +149,7 @@ List tools available from registered MCP servers.
]
},
"total_tools": 6,
"note": "Use these tool names in the 'tools' parameter when adding llm_tool_use nodes"
"note": "Use these tool names in the 'tools' parameter when adding event_loop nodes"
}
```
@@ -246,7 +246,7 @@ Here's a complete workflow for building an agent with MCP tools:
"node_id": "web-searcher",
"name": "Web Search",
"description": "Search the web for information",
"node_type": "llm_tool_use",
"node_type": "event_loop",
"input_keys": "[\"query\"]",
"output_keys": "[\"search_results\"]",
"system_prompt": "Search for {query} using the web_search tool",
+2 -2
View File
@@ -119,7 +119,7 @@ builder = WorkflowBuilder()
builder.add_node(
node_id="researcher",
name="Web Researcher",
node_type="llm_tool_use",
node_type="event_loop",
system_prompt="Research the topic using web_search",
tools=["web_search"], # Tool from tools MCP server
input_keys=["topic"],
@@ -137,7 +137,7 @@ Tools from MCP servers can be referenced in your agent.json just like built-in t
{
"id": "searcher",
"name": "Web Searcher",
"node_type": "llm_tool_use",
"node_type": "event_loop",
"system_prompt": "Search for information about {topic}",
"tools": ["web_search", "web_scrape"],
"input_keys": ["topic"],
+17 -70
View File
@@ -103,31 +103,20 @@ Add a processing node to the agent graph.
- `node_id` (string, required): Unique node identifier
- `name` (string, required): Human-readable name
- `description` (string, required): What this node does
- `node_type` (string, required): One of: `llm_generate`, `llm_tool_use`, `router`, `function`
- `node_type` (string, required): Must be `event_loop` (the only valid type)
- `input_keys` (string, required): JSON array of input variable names
- `output_keys` (string, required): JSON array of output variable names
- `system_prompt` (string, optional): System prompt for LLM nodes
- `tools` (string, optional): JSON array of tool names for tool_use nodes
- `routes` (string, optional): JSON object of route mappings for router nodes
- `system_prompt` (string, optional): System prompt for the LLM
- `tools` (string, optional): JSON array of tool names
- `client_facing` (boolean, optional): Set to true for human-in-the-loop interaction
**Node Types:**
**Node Type:**
1. **llm_generate**: Uses LLM to generate output from inputs
- Requires: `system_prompt`
- Tools: Not used
2. **llm_tool_use**: Uses LLM with tools to accomplish tasks
- Requires: `system_prompt`, `tools`
- Tools: Array of tool names (e.g., `["web_search", "web_fetch"]`)
3. **router**: LLM-powered routing to different paths
- Requires: `system_prompt`, `routes`
- Routes: Object mapping route names to target node IDs
- Example: `{"pass": "success_node", "fail": "retry_node"}`
4. **function**: Executes a pre-defined function
- System prompt describes the function behavior
- No LLM calls, pure computation
**event_loop**: LLM-powered node with self-correction loop
- Requires: `system_prompt`
- Optional: `tools` (array of tool names, e.g., `["web_search", "web_fetch"]`)
- Optional: `client_facing` (set to true for HITL / user interaction)
- Supports: iterative refinement, judge-based evaluation, tool use, streaming
**Example:**
```json
@@ -135,7 +124,7 @@ Add a processing node to the agent graph.
"node_id": "search_sources",
"name": "Search Sources",
"description": "Searches for relevant sources on the topic",
"node_type": "llm_tool_use",
"node_type": "event_loop",
"input_keys": "[\"topic\", \"search_queries\"]",
"output_keys": "[\"sources\", \"source_count\"]",
"system_prompt": "Search for sources using the provided queries...",
@@ -198,7 +187,7 @@ Export the validated graph as an agent specification.
**What it does:**
1. Validates the graph
2. Auto-generates missing edges from router routes
2. Validates edge connectivity
3. Writes files to disk:
- `exports/{agent-name}/agent.json` - Full agent specification
- `exports/{agent-name}/README.md` - Auto-generated documentation
@@ -252,47 +241,6 @@ Test the complete agent graph with sample inputs.
---
### Evaluation Rules
#### `add_evaluation_rule`
Add a rule for the HybridJudge to evaluate node outputs.
**Parameters:**
- `rule_id` (string, required): Unique rule identifier
- `description` (string, required): What this rule checks
- `condition` (string, required): Python expression to evaluate
- `action` (string, required): Action to take: `accept`, `retry`, `escalate`
- `priority` (integer, optional): Rule priority (default: 0)
- `feedback_template` (string, optional): Feedback message template
**Condition Examples:**
- `'result.get("success") == True'` - Check for success flag
- `'result.get("error_type") == "timeout"'` - Check error type
- `'len(result.get("data", [])) > 0'` - Check for non-empty data
**Example:**
```json
{
"rule_id": "timeout_retry",
"description": "Retry on timeout errors",
"condition": "result.get('error_type') == 'timeout'",
"action": "retry",
"priority": 10,
"feedback_template": "Timeout occurred, retrying..."
}
```
#### `list_evaluation_rules`
List all configured evaluation rules.
#### `remove_evaluation_rule`
Remove an evaluation rule.
**Parameters:**
- `rule_id` (string, required): Rule to remove
---
## Example Workflow
Here's a complete workflow for building a research agent:
@@ -320,7 +268,7 @@ add_node(
node_id="planner",
name="Research Planner",
description="Creates research strategy",
node_type="llm_generate",
node_type="event_loop",
input_keys='["topic"]',
output_keys='["strategy", "queries"]',
system_prompt="Analyze topic and create research plan..."
@@ -330,7 +278,7 @@ add_node(
node_id="searcher",
name="Search Sources",
description="Find relevant sources",
node_type="llm_tool_use",
node_type="event_loop",
input_keys='["queries"]',
output_keys='["sources"]',
system_prompt="Search for sources...",
@@ -359,10 +307,9 @@ The exported agent will be saved to `exports/research-agent/`.
1. **Start with the goal**: Define clear success criteria before building nodes
2. **Test nodes individually**: Use `test_node` to verify each node works
3. **Use router nodes for branching**: Don't create edges manually for routers - define routes and they'll be auto-generated
4. **Add evaluation rules**: Help the judge evaluate outputs deterministically
5. **Validate early, validate often**: Run `validate_graph` after adding nodes/edges
6. **Check exports**: Review the generated README.md to verify your agent structure
3. **Use conditional edges for branching**: Define condition_expr on edges for decision points
4. **Validate early, validate often**: Run `validate_graph` after adding nodes/edges
5. **Check exports**: Review the generated README.md to verify your agent structure
---
+15 -19
View File
@@ -14,7 +14,7 @@ Framework provides a runtime framework that captures **decisions**, not just act
## Installation
```bash
pip install -e .
uv pip install -e .
```
## MCP Server Setup
@@ -45,13 +45,13 @@ If you prefer manual setup:
```bash
# Install framework
pip install -e .
uv pip install -e .
# Install MCP dependencies
pip install mcp fastmcp
uv pip install mcp fastmcp
# Test the server
python -m framework.mcp.agent_builder_server
uv run python -m framework.mcp.agent_builder_server
```
### Using with MCP Clients
@@ -73,7 +73,7 @@ To use the agent builder with Claude Desktop or other MCP clients, add this to y
The MCP server provides tools for:
- Creating agent building sessions
- Defining goals with success criteria
- Adding nodes (llm_generate, llm_tool_use, router, function)
- Adding nodes (event_loop only)
- Connecting nodes with edges
- Validating and exporting agent graphs
- Testing nodes and full agent graphs
@@ -86,13 +86,13 @@ Run an LLM-powered calculator:
```bash
# Single calculation
python -m framework calculate "2 + 3 * 4"
uv run python -m framework calculate "2 + 3 * 4"
# Interactive mode
python -m framework interactive
uv run python -m framework interactive
# Analyze runs with Builder
python -m framework analyze calculator
uv run python -m framework analyze calculator
```
### Using the Runtime
@@ -132,24 +132,20 @@ runtime.end_run(success=True, narrative="Successfully processed all data")
The framework includes a goal-based testing framework for validating agent behavior.
Tests are generated using MCP tools (`generate_constraint_tests`, `generate_success_tests`) which return guidelines. Claude writes tests directly using the Write tool based on these guidelines.
```bash
# Generate tests from a goal definition
python -m framework test-generate goal.json
# Interactively approve generated tests
python -m framework test-approve <goal_id>
# Run tests against an agent
python -m framework test-run <agent_path> --parallel 4
uv run python -m framework test-run <agent_path> --goal <goal_id> --parallel 4
# Debug failed tests
python -m framework test-debug <goal_id> <test_id>
uv run python -m framework test-debug <agent_path> <test_name>
# List tests by status
python -m framework test-list <goal_id>
# List tests for a goal
uv run python -m framework test-list <goal_id>
```
For detailed testing workflows, see the [testing-agent skill](.claude/skills/testing-agent/SKILL.md).
For detailed testing workflows, see the [hive-test skill](../.claude/skills/hive-test/SKILL.md).
### Analyzing Agent Behavior with Builder
+740
View File
@@ -0,0 +1,740 @@
#!/usr/bin/env python3
"""
EventLoopNode WebSocket Demo
Real LLM, real FileConversationStore, real EventBus.
Streams EventLoopNode execution to a browser via WebSocket.
Usage:
cd /home/timothy/oss/hive/core
python demos/event_loop_wss_demo.py
Then open http://localhost:8765 in your browser.
"""
import asyncio
import json
import logging
import sys
import tempfile
from http import HTTPStatus
from pathlib import Path
import httpx
import websockets
from bs4 import BeautifulSoup
from websockets.http11 import Request, Response
# Add core, tools, and hive root to path
_CORE_DIR = Path(__file__).resolve().parent.parent
_HIVE_DIR = _CORE_DIR.parent
sys.path.insert(0, str(_CORE_DIR)) # framework.*
sys.path.insert(0, str(_HIVE_DIR / "tools" / "src")) # aden_tools.*
sys.path.insert(0, str(_HIVE_DIR)) # core.framework.* (for aden_tools imports)
import os # noqa: E402
from aden_tools.credentials import CREDENTIAL_SPECS, CredentialStoreAdapter # noqa: E402
from core.framework.credentials import CredentialStore # noqa: E402
from framework.credentials.storage import ( # noqa: E402
CompositeStorage,
EncryptedFileStorage,
EnvVarStorage,
)
from framework.graph.event_loop_node import EventLoopNode, LoopConfig # noqa: E402
from framework.graph.node import NodeContext, NodeSpec, SharedMemory # noqa: E402
from framework.llm.litellm import LiteLLMProvider # noqa: E402
from framework.llm.provider import Tool # noqa: E402
from framework.runner.tool_registry import ToolRegistry # noqa: E402
from framework.runtime.core import Runtime # noqa: E402
from framework.runtime.event_bus import EventBus, EventType # noqa: E402
from framework.storage.conversation_store import FileConversationStore # noqa: E402
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(name)s %(message)s")
logger = logging.getLogger("demo")
# -------------------------------------------------------------------------
# Persistent state (shared across WebSocket connections)
# -------------------------------------------------------------------------
STORE_DIR = Path(tempfile.mkdtemp(prefix="hive_demo_"))
STORE = FileConversationStore(STORE_DIR / "conversation")
RUNTIME = Runtime(STORE_DIR / "runtime")
LLM = LiteLLMProvider(model="claude-sonnet-4-5-20250929")
# -------------------------------------------------------------------------
# Tool Registry — real tools via ToolRegistry (same pattern as GraphExecutor)
# -------------------------------------------------------------------------
TOOL_REGISTRY = ToolRegistry()
# Credential store: Aden sync (OAuth2 tokens) + encrypted files + env var fallback
_env_mapping = {name: spec.env_var for name, spec in CREDENTIAL_SPECS.items()}
_local_storage = CompositeStorage(
primary=EncryptedFileStorage(),
fallbacks=[EnvVarStorage(env_mapping=_env_mapping)],
)
if os.environ.get("ADEN_API_KEY"):
try:
from framework.credentials.aden import ( # noqa: E402
AdenCachedStorage,
AdenClientConfig,
AdenCredentialClient,
AdenSyncProvider,
)
_client = AdenCredentialClient(AdenClientConfig(base_url="https://api.adenhq.com"))
_provider = AdenSyncProvider(client=_client)
_storage = AdenCachedStorage(
local_storage=_local_storage,
aden_provider=_provider,
)
_cred_store = CredentialStore(storage=_storage, providers=[_provider], auto_refresh=True)
_synced = _provider.sync_all(_cred_store)
logger.info("Synced %d credentials from Aden", _synced)
except Exception as e:
logger.warning("Aden sync unavailable: %s", e)
_cred_store = CredentialStore(storage=_local_storage)
else:
logger.info("ADEN_API_KEY not set, using local credential storage")
_cred_store = CredentialStore(storage=_local_storage)
CREDENTIALS = CredentialStoreAdapter(_cred_store)
# Debug: log which credentials resolved
for _name in ["brave_search", "hubspot", "anthropic"]:
_val = CREDENTIALS.get(_name)
if _val:
logger.debug("credential %s: OK (len=%d)", _name, len(_val))
else:
logger.debug("credential %s: not found", _name)
# --- web_search (Brave Search API) ---
TOOL_REGISTRY.register(
name="web_search",
tool=Tool(
name="web_search",
description=(
"Search the web for current information. "
"Returns titles, URLs, and snippets from search results."
),
parameters={
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "The search query (1-500 characters)",
},
"num_results": {
"type": "integer",
"description": "Number of results to return (1-20, default 10)",
},
},
"required": ["query"],
},
),
executor=lambda inputs: _exec_web_search(inputs),
)
def _exec_web_search(inputs: dict) -> dict:
api_key = CREDENTIALS.get("brave_search")
if not api_key:
return {"error": "brave_search credential not configured"}
query = inputs.get("query", "")
num_results = min(inputs.get("num_results", 10), 20)
resp = httpx.get(
"https://api.search.brave.com/res/v1/web/search",
params={"q": query, "count": num_results},
headers={"X-Subscription-Token": api_key, "Accept": "application/json"},
timeout=30.0,
)
if resp.status_code != 200:
return {"error": f"Brave API HTTP {resp.status_code}"}
data = resp.json()
results = [
{
"title": item.get("title", ""),
"url": item.get("url", ""),
"snippet": item.get("description", ""),
}
for item in data.get("web", {}).get("results", [])[:num_results]
]
return {"query": query, "results": results, "total": len(results)}
# --- web_scrape (httpx + BeautifulSoup, no playwright for sync compat) ---
TOOL_REGISTRY.register(
name="web_scrape",
tool=Tool(
name="web_scrape",
description=(
"Scrape and extract text content from a webpage URL. "
"Returns the page title and main text content."
),
parameters={
"type": "object",
"properties": {
"url": {
"type": "string",
"description": "URL of the webpage to scrape",
},
"max_length": {
"type": "integer",
"description": "Maximum text length (default 50000)",
},
},
"required": ["url"],
},
),
executor=lambda inputs: _exec_web_scrape(inputs),
)
_SCRAPE_HEADERS = {
"User-Agent": (
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
"AppleWebKit/537.36 (KHTML, like Gecko) "
"Chrome/131.0.0.0 Safari/537.36"
),
"Accept": "text/html,application/xhtml+xml",
}
def _exec_web_scrape(inputs: dict) -> dict:
url = inputs.get("url", "")
max_length = max(1000, min(inputs.get("max_length", 50000), 500000))
if not url.startswith(("http://", "https://")):
url = "https://" + url
try:
resp = httpx.get(url, timeout=30.0, follow_redirects=True, headers=_SCRAPE_HEADERS)
if resp.status_code != 200:
return {"error": f"HTTP {resp.status_code}"}
soup = BeautifulSoup(resp.text, "html.parser")
for tag in soup(["script", "style", "nav", "footer", "header", "aside", "noscript"]):
tag.decompose()
title = soup.title.get_text(strip=True) if soup.title else ""
main = (
soup.find("article")
or soup.find("main")
or soup.find(attrs={"role": "main"})
or soup.find("body")
)
text = main.get_text(separator=" ", strip=True) if main else ""
text = " ".join(text.split())
if len(text) > max_length:
text = text[:max_length] + "..."
return {"url": url, "title": title, "content": text, "length": len(text)}
except httpx.TimeoutException:
return {"error": "Request timed out"}
except Exception as e:
return {"error": f"Scrape failed: {e}"}
# --- HubSpot CRM tools (optional, requires HUBSPOT_ACCESS_TOKEN) ---
_HUBSPOT_API = "https://api.hubapi.com"
def _hubspot_headers() -> dict | None:
token = CREDENTIALS.get("hubspot")
if token:
logger.debug("HubSpot token: %s...%s (len=%d)", token[:8], token[-4:], len(token))
else:
logger.debug("HubSpot token: not found")
if not token:
return None
return {
"Authorization": f"Bearer {token}",
"Content-Type": "application/json",
"Accept": "application/json",
}
def _exec_hubspot_search(inputs: dict) -> dict:
headers = _hubspot_headers()
if not headers:
return {"error": "HUBSPOT_ACCESS_TOKEN not set"}
object_type = inputs.get("object_type", "contacts")
query = inputs.get("query", "")
limit = min(inputs.get("limit", 10), 100)
body: dict = {"limit": limit}
if query:
body["query"] = query
try:
resp = httpx.post(
f"{_HUBSPOT_API}/crm/v3/objects/{object_type}/search",
headers=headers,
json=body,
timeout=30.0,
)
if resp.status_code != 200:
return {"error": f"HubSpot API HTTP {resp.status_code}: {resp.text[:200]}"}
return resp.json()
except httpx.TimeoutException:
return {"error": "Request timed out"}
except Exception as e:
return {"error": f"HubSpot error: {e}"}
TOOL_REGISTRY.register(
name="hubspot_search",
tool=Tool(
name="hubspot_search",
description=(
"Search HubSpot CRM objects (contacts, companies, or deals). "
"Returns matching records with their properties."
),
parameters={
"type": "object",
"properties": {
"object_type": {
"type": "string",
"description": "CRM object type: 'contacts', 'companies', or 'deals'",
},
"query": {
"type": "string",
"description": "Search query (name, email, domain, etc.)",
},
"limit": {
"type": "integer",
"description": "Max results (1-100, default 10)",
},
},
"required": ["object_type"],
},
),
executor=lambda inputs: _exec_hubspot_search(inputs),
)
logger.info(
"ToolRegistry loaded: %s",
", ".join(TOOL_REGISTRY.get_registered_names()),
)
# -------------------------------------------------------------------------
# HTML page (embedded)
# -------------------------------------------------------------------------
HTML_PAGE = ( # noqa: E501
"""<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>EventLoopNode Live Demo</title>
<style>
* { box-sizing: border-box; margin: 0; padding: 0; }
body {
font-family: 'SF Mono', 'Fira Code', monospace;
background: #0d1117; color: #c9d1d9;
height: 100vh; display: flex; flex-direction: column;
}
header {
background: #161b22; padding: 12px 20px;
border-bottom: 1px solid #30363d;
display: flex; align-items: center; gap: 16px;
}
header h1 { font-size: 16px; color: #58a6ff; font-weight: 600; }
.status {
font-size: 12px; padding: 3px 10px; border-radius: 12px;
background: #21262d; color: #8b949e;
}
.status.running { background: #1a4b2e; color: #3fb950; }
.status.done { background: #1a3a5c; color: #58a6ff; }
.status.error { background: #4b1a1a; color: #f85149; }
.chat { flex: 1; overflow-y: auto; padding: 16px; }
.msg {
margin: 8px 0; padding: 10px 14px; border-radius: 8px;
line-height: 1.6; white-space: pre-wrap; word-wrap: break-word;
}
.msg.user { background: #1a3a5c; color: #58a6ff; }
.msg.assistant { background: #161b22; color: #c9d1d9; }
.msg.event {
background: transparent; color: #8b949e; font-size: 11px;
padding: 4px 14px; border-left: 3px solid #30363d;
}
.msg.event.loop { border-left-color: #58a6ff; }
.msg.event.tool { border-left-color: #d29922; }
.msg.event.stall { border-left-color: #f85149; }
.input-bar {
padding: 12px 16px; background: #161b22;
border-top: 1px solid #30363d; display: flex; gap: 8px;
}
.input-bar input {
flex: 1; background: #0d1117; border: 1px solid #30363d;
color: #c9d1d9; padding: 8px 12px; border-radius: 6px;
font-family: inherit; font-size: 14px; outline: none;
}
.input-bar input:focus { border-color: #58a6ff; }
.input-bar button {
background: #238636; color: #fff; border: none;
padding: 8px 20px; border-radius: 6px; cursor: pointer;
font-family: inherit; font-weight: 600;
}
.input-bar button:hover { background: #2ea043; }
.input-bar button:disabled {
background: #21262d; color: #484f58; cursor: not-allowed;
}
.input-bar button.clear { background: #da3633; }
.input-bar button.clear:hover { background: #f85149; }
</style>
</head>
<body>
<header>
<h1>EventLoopNode Live</h1>
<span id="status" class="status">Idle</span>
<span id="iter" class="status" style="display:none">Step 0</span>
</header>
<div id="chat" class="chat"></div>
<div class="input-bar">
<input id="input" type="text"
placeholder="Ask anything..." autofocus />
<button id="go" onclick="run()">Send</button>
<button class="clear"
onclick="clearConversation()">Clear</button>
</div>
<script>
let ws = null;
let currentAssistantEl = null;
let iterCount = 0;
const chat = document.getElementById('chat');
const status = document.getElementById('status');
const iterEl = document.getElementById('iter');
const goBtn = document.getElementById('go');
const inputEl = document.getElementById('input');
inputEl.addEventListener('keydown', e => {
if (e.key === 'Enter') run();
});
function setStatus(text, cls) {
status.textContent = text;
status.className = 'status ' + cls;
}
function addMsg(text, cls) {
const el = document.createElement('div');
el.className = 'msg ' + cls;
el.textContent = text;
chat.appendChild(el);
chat.scrollTop = chat.scrollHeight;
return el;
}
function connect() {
ws = new WebSocket('ws://' + location.host + '/ws');
ws.onopen = () => {
setStatus('Ready', 'done');
goBtn.disabled = false;
};
ws.onmessage = handleEvent;
ws.onerror = () => { setStatus('Error', 'error'); };
ws.onclose = () => {
setStatus('Reconnecting...', '');
goBtn.disabled = true;
setTimeout(connect, 2000);
};
}
function handleEvent(msg) {
const evt = JSON.parse(msg.data);
if (evt.type === 'llm_text_delta') {
if (currentAssistantEl) {
currentAssistantEl.textContent += evt.content;
chat.scrollTop = chat.scrollHeight;
}
}
else if (evt.type === 'ready') {
setStatus('Ready', 'done');
if (currentAssistantEl && !currentAssistantEl.textContent)
currentAssistantEl.remove();
goBtn.disabled = false;
}
else if (evt.type === 'node_loop_iteration') {
iterCount = evt.iteration || (iterCount + 1);
iterEl.textContent = 'Step ' + iterCount;
iterEl.style.display = '';
}
else if (evt.type === 'tool_call_started') {
var info = evt.tool_name + '('
+ JSON.stringify(evt.tool_input).slice(0, 120) + ')';
addMsg('TOOL ' + info, 'event tool');
}
else if (evt.type === 'tool_call_completed') {
var preview = (evt.result || '').slice(0, 200);
var cls = evt.is_error ? 'stall' : 'tool';
addMsg('RESULT ' + evt.tool_name + ': ' + preview,
'event ' + cls);
currentAssistantEl = addMsg('', 'assistant');
}
else if (evt.type === 'result') {
setStatus('Session ended', evt.success ? 'done' : 'error');
if (evt.error) addMsg('ERROR ' + evt.error, 'event stall');
if (currentAssistantEl && !currentAssistantEl.textContent)
currentAssistantEl.remove();
goBtn.disabled = false;
}
else if (evt.type === 'node_stalled') {
addMsg('STALLED ' + evt.reason, 'event stall');
}
else if (evt.type === 'cleared') {
chat.innerHTML = '';
iterCount = 0;
iterEl.textContent = 'Step 0';
iterEl.style.display = 'none';
setStatus('Ready', 'done');
goBtn.disabled = false;
}
}
function run() {
const text = inputEl.value.trim();
if (!text || !ws || ws.readyState !== 1) return;
addMsg(text, 'user');
currentAssistantEl = addMsg('', 'assistant');
inputEl.value = '';
setStatus('Running', 'running');
goBtn.disabled = true;
ws.send(JSON.stringify({ topic: text }));
}
function clearConversation() {
if (ws && ws.readyState === 1) {
ws.send(JSON.stringify({ command: 'clear' }));
}
}
connect();
</script>
</body>
</html>"""
)
# -------------------------------------------------------------------------
# WebSocket handler
# -------------------------------------------------------------------------
async def handle_ws(websocket):
"""Persistent WebSocket: long-lived EventLoopNode with client_facing blocking."""
global STORE
# -- Event forwarding (WebSocket ← EventBus) ----------------------------
bus = EventBus()
async def forward_event(event):
try:
payload = {"type": event.type.value, **event.data}
if event.node_id:
payload["node_id"] = event.node_id
await websocket.send(json.dumps(payload))
except Exception:
pass
bus.subscribe(
event_types=[
EventType.NODE_LOOP_STARTED,
EventType.NODE_LOOP_ITERATION,
EventType.NODE_LOOP_COMPLETED,
EventType.LLM_TEXT_DELTA,
EventType.TOOL_CALL_STARTED,
EventType.TOOL_CALL_COMPLETED,
EventType.NODE_STALLED,
],
handler=forward_event,
)
# -- Per-connection state -----------------------------------------------
node = None
loop_task = None
tools = list(TOOL_REGISTRY.get_tools().values())
tool_executor = TOOL_REGISTRY.get_executor()
node_spec = NodeSpec(
id="assistant",
name="Chat Assistant",
description="A conversational assistant that remembers context across messages",
node_type="event_loop",
client_facing=True,
system_prompt=(
"You are a helpful assistant with access to tools. "
"You can search the web, scrape webpages, and query HubSpot CRM. "
"Use tools when the user asks for current information or external data. "
"You have full conversation history, so you can reference previous messages."
),
)
# -- Ready callback: subscribe to CLIENT_INPUT_REQUESTED on the bus ---
async def on_input_requested(event):
try:
await websocket.send(json.dumps({"type": "ready"}))
except Exception:
pass
bus.subscribe(
event_types=[EventType.CLIENT_INPUT_REQUESTED],
handler=on_input_requested,
)
async def start_loop(first_message: str):
"""Create an EventLoopNode and run it as a background task."""
nonlocal node, loop_task
memory = SharedMemory()
ctx = NodeContext(
runtime=RUNTIME,
node_id="assistant",
node_spec=node_spec,
memory=memory,
input_data={},
llm=LLM,
available_tools=tools,
)
node = EventLoopNode(
event_bus=bus,
config=LoopConfig(max_iterations=10_000, max_history_tokens=32_000),
conversation_store=STORE,
tool_executor=tool_executor,
)
await node.inject_event(first_message)
async def _run():
try:
result = await node.execute(ctx)
try:
await websocket.send(
json.dumps(
{
"type": "result",
"success": result.success,
"output": result.output,
"error": result.error,
"tokens": result.tokens_used,
}
)
)
except Exception:
pass
logger.info(f"Loop ended: success={result.success}, tokens={result.tokens_used}")
except websockets.exceptions.ConnectionClosed:
logger.info("Loop stopped: WebSocket closed")
except Exception as e:
logger.exception("Loop error")
try:
await websocket.send(
json.dumps(
{
"type": "result",
"success": False,
"error": str(e),
"output": {},
}
)
)
except Exception:
pass
loop_task = asyncio.create_task(_run())
async def stop_loop():
"""Signal the node and wait for the loop task to finish."""
nonlocal node, loop_task
if loop_task and not loop_task.done():
if node:
node.signal_shutdown()
try:
await asyncio.wait_for(loop_task, timeout=5.0)
except (TimeoutError, asyncio.CancelledError):
loop_task.cancel()
node = None
loop_task = None
# -- Message loop (runs for the lifetime of this WebSocket) -------------
try:
async for raw in websocket:
try:
msg = json.loads(raw)
except Exception:
continue
# Clear command
if msg.get("command") == "clear":
import shutil
await stop_loop()
await STORE.close()
conv_dir = STORE_DIR / "conversation"
if conv_dir.exists():
shutil.rmtree(conv_dir)
STORE = FileConversationStore(conv_dir)
await websocket.send(json.dumps({"type": "cleared"}))
logger.info("Conversation cleared")
continue
topic = msg.get("topic", "")
if not topic:
continue
if node is None:
# First message — spin up the loop
logger.info(f"Starting persistent loop: {topic}")
await start_loop(topic)
else:
# Subsequent message — inject into the running loop
logger.info(f"Injecting message: {topic}")
await node.inject_event(topic)
except websockets.exceptions.ConnectionClosed:
pass
finally:
await stop_loop()
logger.info("WebSocket closed, loop stopped")
# -------------------------------------------------------------------------
# HTTP handler for serving the HTML page
# -------------------------------------------------------------------------
async def process_request(connection, request: Request):
"""Serve HTML on GET /, upgrade to WebSocket on /ws."""
if request.path == "/ws":
return None # let websockets handle the upgrade
# Serve the HTML page for any other path
return Response(
HTTPStatus.OK,
"OK",
websockets.Headers({"Content-Type": "text/html; charset=utf-8"}),
HTML_PAGE.encode(),
)
# -------------------------------------------------------------------------
# Main
# -------------------------------------------------------------------------
async def main():
port = 8765
async with websockets.serve(
handle_ws,
"0.0.0.0",
port,
process_request=process_request,
):
logger.info(f"Demo running at http://localhost:{port}")
logger.info("Open in your browser and enter a topic to research.")
await asyncio.Future() # run forever
if __name__ == "__main__":
asyncio.run(main())
File diff suppressed because it is too large Load Diff
+930
View File
@@ -0,0 +1,930 @@
#!/usr/bin/env python3
"""
Two-Node ContextHandoff Demo
Demonstrates ContextHandoff between two EventLoopNode instances:
Node A (Researcher) ContextHandoff Node B (Analyst)
Real LLM, real FileConversationStore, real EventBus.
Streams both nodes to a browser via WebSocket.
Usage:
cd /home/timothy/oss/hive/core
python demos/handoff_demo.py
Then open http://localhost:8766 in your browser.
"""
import asyncio
import json
import logging
import sys
import tempfile
from http import HTTPStatus
from pathlib import Path
import httpx
import websockets
from bs4 import BeautifulSoup
from websockets.http11 import Request, Response
# Add core, tools, and hive root to path
_CORE_DIR = Path(__file__).resolve().parent.parent
_HIVE_DIR = _CORE_DIR.parent
sys.path.insert(0, str(_CORE_DIR)) # framework.*
sys.path.insert(0, str(_HIVE_DIR / "tools" / "src")) # aden_tools.*
sys.path.insert(0, str(_HIVE_DIR)) # core.framework.* (for aden_tools imports)
from aden_tools.credentials import CREDENTIAL_SPECS, CredentialStoreAdapter # noqa: E402
from core.framework.credentials import CredentialStore # noqa: E402
from framework.credentials.storage import ( # noqa: E402
CompositeStorage,
EncryptedFileStorage,
EnvVarStorage,
)
from framework.graph.context_handoff import ContextHandoff # noqa: E402
from framework.graph.conversation import NodeConversation # noqa: E402
from framework.graph.event_loop_node import EventLoopNode, LoopConfig # noqa: E402
from framework.graph.node import NodeContext, NodeSpec, SharedMemory # noqa: E402
from framework.llm.litellm import LiteLLMProvider # noqa: E402
from framework.llm.provider import Tool # noqa: E402
from framework.runner.tool_registry import ToolRegistry # noqa: E402
from framework.runtime.core import Runtime # noqa: E402
from framework.runtime.event_bus import EventBus, EventType # noqa: E402
from framework.storage.conversation_store import FileConversationStore # noqa: E402
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(name)s %(message)s")
logger = logging.getLogger("handoff_demo")
# -------------------------------------------------------------------------
# Persistent state
# -------------------------------------------------------------------------
STORE_DIR = Path(tempfile.mkdtemp(prefix="hive_handoff_"))
RUNTIME = Runtime(STORE_DIR / "runtime")
LLM = LiteLLMProvider(model="claude-sonnet-4-5-20250929")
# -------------------------------------------------------------------------
# Credentials
# -------------------------------------------------------------------------
# Composite credential store: encrypted files (primary) + env vars (fallback)
_env_mapping = {name: spec.env_var for name, spec in CREDENTIAL_SPECS.items()}
_composite = CompositeStorage(
primary=EncryptedFileStorage(),
fallbacks=[EnvVarStorage(env_mapping=_env_mapping)],
)
CREDENTIALS = CredentialStoreAdapter(CredentialStore(storage=_composite))
for _name in ["brave_search", "hubspot"]:
_val = CREDENTIALS.get(_name)
if _val:
logger.debug("credential %s: OK (len=%d)", _name, len(_val))
else:
logger.debug("credential %s: not found", _name)
# -------------------------------------------------------------------------
# Tool Registry — web_search + web_scrape for Node A (Researcher)
# -------------------------------------------------------------------------
TOOL_REGISTRY = ToolRegistry()
def _exec_web_search(inputs: dict) -> dict:
api_key = CREDENTIALS.get("brave_search")
if not api_key:
return {"error": "brave_search credential not configured"}
query = inputs.get("query", "")
num_results = min(inputs.get("num_results", 10), 20)
resp = httpx.get(
"https://api.search.brave.com/res/v1/web/search",
params={"q": query, "count": num_results},
headers={
"X-Subscription-Token": api_key,
"Accept": "application/json",
},
timeout=30.0,
)
if resp.status_code != 200:
return {"error": f"Brave API HTTP {resp.status_code}"}
data = resp.json()
results = [
{
"title": item.get("title", ""),
"url": item.get("url", ""),
"snippet": item.get("description", ""),
}
for item in data.get("web", {}).get("results", [])[:num_results]
]
return {"query": query, "results": results, "total": len(results)}
TOOL_REGISTRY.register(
name="web_search",
tool=Tool(
name="web_search",
description=(
"Search the web for current information. "
"Returns titles, URLs, and snippets from search results."
),
parameters={
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "The search query (1-500 characters)",
},
"num_results": {
"type": "integer",
"description": "Number of results (1-20, default 10)",
},
},
"required": ["query"],
},
),
executor=lambda inputs: _exec_web_search(inputs),
)
_SCRAPE_HEADERS = {
"User-Agent": (
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
"AppleWebKit/537.36 (KHTML, like Gecko) "
"Chrome/131.0.0.0 Safari/537.36"
),
"Accept": "text/html,application/xhtml+xml",
}
def _exec_web_scrape(inputs: dict) -> dict:
url = inputs.get("url", "")
max_length = max(1000, min(inputs.get("max_length", 50000), 500000))
if not url.startswith(("http://", "https://")):
url = "https://" + url
try:
resp = httpx.get(
url,
timeout=30.0,
follow_redirects=True,
headers=_SCRAPE_HEADERS,
)
if resp.status_code != 200:
return {"error": f"HTTP {resp.status_code}"}
soup = BeautifulSoup(resp.text, "html.parser")
for tag in soup(["script", "style", "nav", "footer", "header", "aside", "noscript"]):
tag.decompose()
title = soup.title.get_text(strip=True) if soup.title else ""
main = (
soup.find("article")
or soup.find("main")
or soup.find(attrs={"role": "main"})
or soup.find("body")
)
text = main.get_text(separator=" ", strip=True) if main else ""
text = " ".join(text.split())
if len(text) > max_length:
text = text[:max_length] + "..."
return {
"url": url,
"title": title,
"content": text,
"length": len(text),
}
except httpx.TimeoutException:
return {"error": "Request timed out"}
except Exception as e:
return {"error": f"Scrape failed: {e}"}
TOOL_REGISTRY.register(
name="web_scrape",
tool=Tool(
name="web_scrape",
description=(
"Scrape and extract text content from a webpage URL. "
"Returns the page title and main text content."
),
parameters={
"type": "object",
"properties": {
"url": {
"type": "string",
"description": "URL of the webpage to scrape",
},
"max_length": {
"type": "integer",
"description": "Maximum text length (default 50000)",
},
},
"required": ["url"],
},
),
executor=lambda inputs: _exec_web_scrape(inputs),
)
logger.info(
"ToolRegistry loaded: %s",
", ".join(TOOL_REGISTRY.get_registered_names()),
)
# -------------------------------------------------------------------------
# Node Specs
# -------------------------------------------------------------------------
RESEARCHER_SPEC = NodeSpec(
id="researcher",
name="Researcher",
description="Researches a topic using web search and scraping tools",
node_type="event_loop",
input_keys=["topic"],
output_keys=["research_summary"],
system_prompt=(
"You are a thorough research assistant. Your job is to research "
"the given topic using the web_search and web_scrape tools.\n\n"
"1. Search for relevant information on the topic\n"
"2. Scrape 1-2 of the most promising URLs for details\n"
"3. Synthesize your findings into a comprehensive summary\n"
"4. Use set_output with key='research_summary' to save your "
"findings\n\n"
"Be thorough but efficient. Aim for 2-4 search/scrape calls, "
"then summarize and set_output."
),
)
ANALYST_SPEC = NodeSpec(
id="analyst",
name="Analyst",
description="Analyzes research findings and provides insights",
node_type="event_loop",
input_keys=["context"],
output_keys=["analysis"],
system_prompt=(
"You are a strategic analyst. You receive research findings from "
"a previous researcher and must:\n\n"
"1. Identify key themes and patterns\n"
"2. Assess the reliability and significance of the findings\n"
"3. Provide actionable insights and recommendations\n"
"4. Use set_output with key='analysis' to save your analysis\n\n"
"Be concise but insightful. Focus on what matters most."
),
)
# -------------------------------------------------------------------------
# HTML page
# -------------------------------------------------------------------------
HTML_PAGE = ( # noqa: E501
"""<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>ContextHandoff Demo</title>
<style>
* {
box-sizing: border-box;
margin: 0;
padding: 0;
}
body {
font-family: 'SF Mono', 'Fira Code', monospace;
background: #0d1117;
color: #c9d1d9;
height: 100vh;
display: flex;
flex-direction: column;
}
header {
background: #161b22;
padding: 12px 20px;
border-bottom: 1px solid #30363d;
display: flex;
align-items: center;
gap: 16px;
}
header h1 {
font-size: 16px;
color: #58a6ff;
font-weight: 600;
}
.badge {
font-size: 12px;
padding: 3px 10px;
border-radius: 12px;
background: #21262d;
color: #8b949e;
}
.badge.researcher {
background: #1a3a5c;
color: #58a6ff;
}
.badge.analyst {
background: #1a4b2e;
color: #3fb950;
}
.badge.handoff {
background: #3d1f00;
color: #d29922;
}
.badge.done {
background: #21262d;
color: #8b949e;
}
.badge.error {
background: #4b1a1a;
color: #f85149;
}
.chat {
flex: 1;
overflow-y: auto;
padding: 16px;
}
.msg {
margin: 8px 0;
padding: 10px 14px;
border-radius: 8px;
line-height: 1.6;
white-space: pre-wrap;
word-wrap: break-word;
}
.msg.user {
background: #1a3a5c;
color: #58a6ff;
}
.msg.assistant {
background: #161b22;
color: #c9d1d9;
}
.msg.assistant.analyst-msg {
border-left: 3px solid #3fb950;
}
.msg.event {
background: transparent;
color: #8b949e;
font-size: 11px;
padding: 4px 14px;
border-left: 3px solid #30363d;
}
.msg.event.loop {
border-left-color: #58a6ff;
}
.msg.event.tool {
border-left-color: #d29922;
}
.msg.event.stall {
border-left-color: #f85149;
}
.handoff-banner {
margin: 16px 0;
padding: 16px;
background: #1c1200;
border: 1px solid #d29922;
border-radius: 8px;
text-align: center;
}
.handoff-banner h3 {
color: #d29922;
font-size: 14px;
margin-bottom: 8px;
}
.handoff-banner p, .result-banner p {
color: #8b949e;
font-size: 12px;
line-height: 1.5;
max-height: 200px;
overflow-y: auto;
white-space: pre-wrap;
text-align: left;
}
.result-banner {
margin: 16px 0;
padding: 16px;
background: #0a2614;
border: 1px solid #3fb950;
border-radius: 8px;
}
.result-banner h3 {
color: #3fb950;
font-size: 14px;
margin-bottom: 8px;
text-align: center;
}
.result-banner .label {
color: #58a6ff;
font-size: 11px;
font-weight: 600;
margin-top: 10px;
margin-bottom: 2px;
}
.result-banner .tokens {
color: #484f58;
font-size: 11px;
text-align: center;
margin-top: 10px;
}
.input-bar {
padding: 12px 16px;
background: #161b22;
border-top: 1px solid #30363d;
display: flex;
gap: 8px;
}
.input-bar input {
flex: 1;
background: #0d1117;
border: 1px solid #30363d;
color: #c9d1d9;
padding: 8px 12px;
border-radius: 6px;
font-family: inherit;
font-size: 14px;
outline: none;
}
.input-bar input:focus {
border-color: #58a6ff;
}
.input-bar button {
background: #238636;
color: #fff;
border: none;
padding: 8px 20px;
border-radius: 6px;
cursor: pointer;
font-family: inherit;
font-weight: 600;
}
.input-bar button:hover {
background: #2ea043;
}
.input-bar button:disabled {
background: #21262d;
color: #484f58;
cursor: not-allowed;
}
</style>
</head>
<body>
<header>
<h1>ContextHandoff Demo</h1>
<span id="phase" class="badge">Idle</span>
<span id="iter" class="badge" style="display:none">Step 0</span>
</header>
<div id="chat" class="chat"></div>
<div class="input-bar">
<input id="input" type="text"
placeholder="Enter a research topic..." autofocus />
<button id="go" onclick="run()">Research</button>
</div>
<script>
let ws = null;
let currentAssistantEl = null;
let iterCount = 0;
let currentPhase = 'idle';
const chat = document.getElementById('chat');
const phase = document.getElementById('phase');
const iterEl = document.getElementById('iter');
const goBtn = document.getElementById('go');
const inputEl = document.getElementById('input');
inputEl.addEventListener('keydown', e => {
if (e.key === 'Enter') run();
});
function setPhase(text, cls) {
phase.textContent = text;
phase.className = 'badge ' + cls;
currentPhase = cls;
}
function addMsg(text, cls) {
const el = document.createElement('div');
el.className = 'msg ' + cls;
el.textContent = text;
chat.appendChild(el);
chat.scrollTop = chat.scrollHeight;
return el;
}
function addHandoffBanner(summary) {
const banner = document.createElement('div');
banner.className = 'handoff-banner';
const h3 = document.createElement('h3');
h3.textContent = 'Context Handoff: Researcher -> Analyst';
const p = document.createElement('p');
p.textContent = summary || 'Passing research context...';
banner.appendChild(h3);
banner.appendChild(p);
chat.appendChild(banner);
chat.scrollTop = chat.scrollHeight;
}
function addResultBanner(researcher, analyst, tokens) {
const banner = document.createElement('div');
banner.className = 'result-banner';
const h3 = document.createElement('h3');
h3.textContent = 'Pipeline Complete';
banner.appendChild(h3);
if (researcher && researcher.research_summary) {
const lbl = document.createElement('div');
lbl.className = 'label';
lbl.textContent = 'RESEARCH SUMMARY';
banner.appendChild(lbl);
const p = document.createElement('p');
p.textContent = researcher.research_summary;
banner.appendChild(p);
}
if (analyst && analyst.analysis) {
const lbl = document.createElement('div');
lbl.className = 'label';
lbl.textContent = 'ANALYSIS';
lbl.style.color = '#3fb950';
banner.appendChild(lbl);
const p = document.createElement('p');
p.textContent = analyst.analysis;
banner.appendChild(p);
}
if (tokens) {
const t = document.createElement('div');
t.className = 'tokens';
t.textContent = 'Total tokens: ' + tokens.toLocaleString();
banner.appendChild(t);
}
chat.appendChild(banner);
chat.scrollTop = chat.scrollHeight;
}
function connect() {
ws = new WebSocket('ws://' + location.host + '/ws');
ws.onopen = () => {
setPhase('Ready', 'done');
goBtn.disabled = false;
};
ws.onmessage = handleEvent;
ws.onerror = () => { setPhase('Error', 'error'); };
ws.onclose = () => {
setPhase('Reconnecting...', '');
goBtn.disabled = true;
setTimeout(connect, 2000);
};
}
function handleEvent(msg) {
const evt = JSON.parse(msg.data);
if (evt.type === 'phase') {
if (evt.phase === 'researcher') {
setPhase('Researcher', 'researcher');
} else if (evt.phase === 'handoff') {
setPhase('Handoff', 'handoff');
} else if (evt.phase === 'analyst') {
setPhase('Analyst', 'analyst');
}
iterCount = 0;
iterEl.style.display = 'none';
}
else if (evt.type === 'llm_text_delta') {
if (currentAssistantEl) {
currentAssistantEl.textContent += evt.content;
chat.scrollTop = chat.scrollHeight;
}
}
else if (evt.type === 'node_loop_iteration') {
iterCount = evt.iteration || (iterCount + 1);
iterEl.textContent = 'Step ' + iterCount;
iterEl.style.display = '';
}
else if (evt.type === 'tool_call_started') {
var info = evt.tool_name + '('
+ JSON.stringify(evt.tool_input).slice(0, 120) + ')';
addMsg('TOOL ' + info, 'event tool');
}
else if (evt.type === 'tool_call_completed') {
var preview = (evt.result || '').slice(0, 200);
var cls = evt.is_error ? 'stall' : 'tool';
addMsg(
'RESULT ' + evt.tool_name + ': ' + preview,
'event ' + cls
);
var assistCls = currentPhase === 'analyst'
? 'assistant analyst-msg' : 'assistant';
currentAssistantEl = addMsg('', assistCls);
}
else if (evt.type === 'handoff_context') {
addHandoffBanner(evt.summary);
var assistCls = 'assistant analyst-msg';
currentAssistantEl = addMsg('', assistCls);
}
else if (evt.type === 'node_result') {
if (evt.node_id === 'researcher') {
if (currentAssistantEl
&& !currentAssistantEl.textContent) {
currentAssistantEl.remove();
}
}
}
else if (evt.type === 'done') {
setPhase('Done', 'done');
iterEl.style.display = 'none';
if (currentAssistantEl
&& !currentAssistantEl.textContent) {
currentAssistantEl.remove();
}
currentAssistantEl = null;
addResultBanner(
evt.researcher, evt.analyst, evt.total_tokens
);
goBtn.disabled = false;
inputEl.placeholder = 'Enter another topic...';
}
else if (evt.type === 'error') {
setPhase('Error', 'error');
addMsg('ERROR ' + evt.message, 'event stall');
goBtn.disabled = false;
}
else if (evt.type === 'node_stalled') {
addMsg('STALLED ' + evt.reason, 'event stall');
}
}
function run() {
const text = inputEl.value.trim();
if (!text || !ws || ws.readyState !== 1) return;
chat.innerHTML = '';
addMsg(text, 'user');
currentAssistantEl = addMsg('', 'assistant');
inputEl.value = '';
goBtn.disabled = true;
ws.send(JSON.stringify({ topic: text }));
}
connect();
</script>
</body>
</html>"""
)
# -------------------------------------------------------------------------
# WebSocket handler — sequential Node A → Handoff → Node B
# -------------------------------------------------------------------------
async def handle_ws(websocket):
"""Run the two-node handoff pipeline per user message."""
try:
async for raw in websocket:
try:
msg = json.loads(raw)
except Exception:
continue
topic = msg.get("topic", "")
if not topic:
continue
logger.info(f"Starting handoff pipeline for: {topic}")
try:
await _run_pipeline(websocket, topic)
except websockets.exceptions.ConnectionClosed:
logger.info("WebSocket closed during pipeline")
return
except Exception as e:
logger.exception("Pipeline error")
try:
await websocket.send(json.dumps({"type": "error", "message": str(e)}))
except Exception:
pass
except websockets.exceptions.ConnectionClosed:
pass
async def _run_pipeline(websocket, topic: str):
"""Execute: Node A (research) → ContextHandoff → Node B (analysis)."""
import shutil
# Fresh stores for each run
run_dir = Path(tempfile.mkdtemp(prefix="hive_run_", dir=STORE_DIR))
store_a = FileConversationStore(run_dir / "node_a")
store_b = FileConversationStore(run_dir / "node_b")
# Shared event bus
bus = EventBus()
async def forward_event(event):
try:
payload = {"type": event.type.value, **event.data}
if event.node_id:
payload["node_id"] = event.node_id
await websocket.send(json.dumps(payload))
except Exception:
pass
bus.subscribe(
event_types=[
EventType.NODE_LOOP_STARTED,
EventType.NODE_LOOP_ITERATION,
EventType.NODE_LOOP_COMPLETED,
EventType.LLM_TEXT_DELTA,
EventType.TOOL_CALL_STARTED,
EventType.TOOL_CALL_COMPLETED,
EventType.NODE_STALLED,
],
handler=forward_event,
)
tools = list(TOOL_REGISTRY.get_tools().values())
tool_executor = TOOL_REGISTRY.get_executor()
# ---- Phase 1: Researcher ------------------------------------------------
await websocket.send(json.dumps({"type": "phase", "phase": "researcher"}))
node_a = EventLoopNode(
event_bus=bus,
judge=None, # implicit judge: accept when output_keys filled
config=LoopConfig(
max_iterations=20,
max_tool_calls_per_turn=10,
max_history_tokens=32_000,
),
conversation_store=store_a,
tool_executor=tool_executor,
)
ctx_a = NodeContext(
runtime=RUNTIME,
node_id="researcher",
node_spec=RESEARCHER_SPEC,
memory=SharedMemory(),
input_data={"topic": topic},
llm=LLM,
available_tools=tools,
)
result_a = await node_a.execute(ctx_a)
logger.info(
"Researcher done: success=%s, tokens=%s",
result_a.success,
result_a.tokens_used,
)
await websocket.send(
json.dumps(
{
"type": "node_result",
"node_id": "researcher",
"success": result_a.success,
"output": result_a.output,
}
)
)
if not result_a.success:
await websocket.send(
json.dumps(
{
"type": "error",
"message": f"Researcher failed: {result_a.error}",
}
)
)
return
# ---- Phase 2: Context Handoff -------------------------------------------
await websocket.send(json.dumps({"type": "phase", "phase": "handoff"}))
# Restore the researcher's conversation from store
conversation_a = await NodeConversation.restore(store_a)
if conversation_a is None:
await websocket.send(
json.dumps(
{
"type": "error",
"message": "Failed to restore researcher conversation",
}
)
)
return
handoff_engine = ContextHandoff(llm=LLM)
handoff_context = handoff_engine.summarize_conversation(
conversation=conversation_a,
node_id="researcher",
output_keys=["research_summary"],
)
formatted_handoff = ContextHandoff.format_as_input(handoff_context)
logger.info(
"Handoff: %d turns, ~%d tokens, keys=%s",
handoff_context.turn_count,
handoff_context.total_tokens_used,
list(handoff_context.key_outputs.keys()),
)
# Send handoff context to browser
await websocket.send(
json.dumps(
{
"type": "handoff_context",
"summary": handoff_context.summary[:500],
"turn_count": handoff_context.turn_count,
"tokens": handoff_context.total_tokens_used,
"key_outputs": handoff_context.key_outputs,
}
)
)
# ---- Phase 3: Analyst ---------------------------------------------------
await websocket.send(json.dumps({"type": "phase", "phase": "analyst"}))
node_b = EventLoopNode(
event_bus=bus,
judge=None, # implicit judge
config=LoopConfig(
max_iterations=10,
max_tool_calls_per_turn=5,
max_history_tokens=32_000,
),
conversation_store=store_b,
)
ctx_b = NodeContext(
runtime=RUNTIME,
node_id="analyst",
node_spec=ANALYST_SPEC,
memory=SharedMemory(),
input_data={"context": formatted_handoff},
llm=LLM,
available_tools=[],
)
result_b = await node_b.execute(ctx_b)
logger.info(
"Analyst done: success=%s, tokens=%s",
result_b.success,
result_b.tokens_used,
)
# ---- Done ---------------------------------------------------------------
await websocket.send(
json.dumps(
{
"type": "done",
"researcher": result_a.output,
"analyst": result_b.output,
"total_tokens": ((result_a.tokens_used or 0) + (result_b.tokens_used or 0)),
}
)
)
# Clean up temp stores
try:
shutil.rmtree(run_dir)
except Exception:
pass
# -------------------------------------------------------------------------
# HTTP handler
# -------------------------------------------------------------------------
async def process_request(connection, request: Request):
"""Serve HTML on GET /, upgrade to WebSocket on /ws."""
if request.path == "/ws":
return None
return Response(
HTTPStatus.OK,
"OK",
websockets.Headers({"Content-Type": "text/html; charset=utf-8"}),
HTML_PAGE.encode(),
)
# -------------------------------------------------------------------------
# Main
# -------------------------------------------------------------------------
async def main():
port = 8766
async with websockets.serve(
handle_ws,
"0.0.0.0",
port,
process_request=process_request,
):
logger.info(f"Handoff demo at http://localhost:{port}")
logger.info("Enter a research topic to start the pipeline.")
await asyncio.Future()
if __name__ == "__main__":
asyncio.run(main())
File diff suppressed because it is too large Load Diff
+132
View File
@@ -0,0 +1,132 @@
"""
Minimal Manual Agent Example
----------------------------
This example demonstrates how to build and run an agent programmatically
without using the Claude Code CLI or external LLM APIs.
It uses custom NodeProtocol implementations to define logic in pure Python,
making it perfect for understanding the core runtime loop:
Setup -> Graph definition -> Execution -> Result
Run with:
uv run python core/examples/manual_agent.py
"""
import asyncio
from framework.graph import EdgeCondition, EdgeSpec, Goal, GraphSpec, NodeSpec
from framework.graph.executor import GraphExecutor
from framework.graph.node import NodeContext, NodeProtocol, NodeResult
from framework.runtime.core import Runtime
# 1. Define Node Logic (Custom NodeProtocol implementations)
class GreeterNode(NodeProtocol):
"""Generate a simple greeting."""
async def execute(self, ctx: NodeContext) -> NodeResult:
name = ctx.input_data.get("name", "World")
greeting = f"Hello, {name}!"
ctx.memory.write("greeting", greeting)
return NodeResult(success=True, output={"greeting": greeting})
class UppercaserNode(NodeProtocol):
"""Convert text to uppercase."""
async def execute(self, ctx: NodeContext) -> NodeResult:
greeting = ctx.input_data.get("greeting") or ctx.memory.read("greeting") or ""
result = greeting.upper()
ctx.memory.write("final_greeting", result)
return NodeResult(success=True, output={"final_greeting": result})
async def main():
print("Setting up Manual Agent...")
# 2. Define the Goal
# Every agent needs a goal with success criteria
goal = Goal(
id="greet-user",
name="Greet User",
description="Generate a friendly uppercase greeting",
success_criteria=[
{
"id": "greeting_generated",
"description": "Greeting produced",
"metric": "custom",
"target": "any",
}
],
)
# 3. Define Nodes
# Nodes describe steps in the process
node1 = NodeSpec(
id="greeter",
name="Greeter",
description="Generates a simple greeting",
node_type="event_loop",
input_keys=["name"],
output_keys=["greeting"],
)
node2 = NodeSpec(
id="uppercaser",
name="Uppercaser",
description="Converts greeting to uppercase",
node_type="event_loop",
input_keys=["greeting"],
output_keys=["final_greeting"],
)
# 4. Define Edges
# Edges define the flow between nodes
edge1 = EdgeSpec(
id="greet-to-upper",
source="greeter",
target="uppercaser",
condition=EdgeCondition.ON_SUCCESS,
)
# 5. Create Graph
# The graph works like a blueprint connecting nodes and edges
graph = GraphSpec(
id="greeting-agent",
goal_id="greet-user",
entry_node="greeter",
terminal_nodes=["uppercaser"],
nodes=[node1, node2],
edges=[edge1],
)
# 6. Initialize Runtime & Executor
# Runtime handles state/memory; Executor runs the graph
from pathlib import Path
runtime = Runtime(storage_path=Path("./agent_logs"))
executor = GraphExecutor(runtime=runtime)
# 7. Register Node Implementations
# Connect node IDs in the graph to actual Python implementations
executor.register_node("greeter", GreeterNode())
executor.register_node("uppercaser", UppercaserNode())
# 8. Execute Agent
print("Executing agent with input: name='Alice'...")
result = await executor.execute(graph=graph, goal=goal, input_data={"name": "Alice"})
# 9. Verify Results
if result.success:
print("\nSuccess!")
print(f"Path taken: {' -> '.join(result.path)}")
print(f"Final output: {result.output.get('final_greeting')}")
else:
print(f"\nFailed: {result.error}")
if __name__ == "__main__":
# Optional: Enable logging to see internal decision flow
# logging.basicConfig(level=logging.INFO)
asyncio.run(main())
+13 -18
View File
@@ -37,9 +37,9 @@ async def example_1_programmatic_registration():
print(f"\nAvailable tools: {list(tools.keys())}")
# Run the agent with MCP tools available
result = await runner.run({
"objective": "Search for 'Claude AI' and summarize the top 3 results"
})
result = await runner.run(
{"objective": "Search for 'Claude AI' and summarize the top 3 results"}
)
print(f"\nAgent result: {result}")
@@ -78,10 +78,8 @@ async def example_3_config_file():
# Copy example config (in practice, you'd place this in your agent folder)
import shutil
shutil.copy(
"examples/mcp_servers.json",
test_agent_path / "mcp_servers.json"
)
shutil.copy("examples/mcp_servers.json", test_agent_path / "mcp_servers.json")
# Load agent - MCP servers will be auto-discovered
runner = AgentRunner.load(test_agent_path)
@@ -101,34 +99,30 @@ async def example_4_custom_agent_with_mcp_tools():
"""Example 4: Build custom agent that uses MCP tools"""
print("\n=== Example 4: Custom Agent with MCP Tools ===\n")
from framework.builder.workflow import WorkflowBuilder
from framework.builder.workflow import GraphBuilder
# Create a workflow builder
builder = WorkflowBuilder()
builder = GraphBuilder()
# Define goal
builder.set_goal(
goal_id="web-researcher",
name="Web Research Agent",
description="Search the web and summarize findings"
description="Search the web and summarize findings",
)
# Add success criteria
builder.add_success_criterion(
"search-results",
"Successfully retrieve at least 3 web search results"
)
builder.add_success_criterion(
"summary",
"Provide a clear, concise summary of the findings"
"search-results", "Successfully retrieve at least 3 web search results"
)
builder.add_success_criterion("summary", "Provide a clear, concise summary of the findings")
# Add nodes that will use MCP tools
builder.add_node(
node_id="web-searcher",
name="Web Search",
description="Search the web for information",
node_type="llm_tool_use",
node_type="event_loop",
system_prompt="Search for {query} and return the top results. Use the web_search tool.",
tools=["web_search"], # This tool comes from tools MCP server
input_keys=["query"],
@@ -139,7 +133,7 @@ async def example_4_custom_agent_with_mcp_tools():
node_id="summarizer",
name="Summarize Results",
description="Summarize the search results",
node_type="llm_generate",
node_type="event_loop",
system_prompt="Summarize the following search results in 2-3 sentences: {search_results}",
input_keys=["search_results"],
output_keys=["summary"],
@@ -192,6 +186,7 @@ async def main():
except Exception as e:
print(f"\nError running example: {e}")
import traceback
traceback.print_exc()
+2 -2
View File
@@ -4,8 +4,8 @@
"name": "tools",
"description": "Aden tools including web search, file operations, and PDF reading",
"transport": "stdio",
"command": "python",
"args": ["mcp_server.py", "--stdio"],
"command": "uv",
"args": ["run", "python", "mcp_server.py", "--stdio"],
"cwd": "../tools",
"env": {
"BRAVE_SEARCH_API_KEY": "${BRAVE_SEARCH_API_KEY}"
+9 -13
View File
@@ -22,24 +22,22 @@ The framework includes a Goal-Based Testing system (Goal → Agent → Eval):
See `framework.testing` for details.
"""
from framework.schemas.decision import Decision, Option, Outcome, DecisionEvaluation
from framework.schemas.run import Run, RunSummary, Problem
from framework.runtime.core import Runtime
from framework.builder.query import BuilderQuery
from framework.llm import LLMProvider, AnthropicProvider
from framework.runner import AgentRunner, AgentOrchestrator
from framework.llm import AnthropicProvider, LLMProvider
from framework.runner import AgentOrchestrator, AgentRunner
from framework.runtime.core import Runtime
from framework.schemas.decision import Decision, DecisionEvaluation, Option, Outcome
from framework.schemas.run import Problem, Run, RunSummary
# Testing framework
from framework.testing import (
ApprovalStatus,
DebugTool,
ErrorCategory,
Test,
TestResult,
TestSuiteResult,
TestStorage,
ApprovalStatus,
ErrorCategory,
ConstraintTestGenerator,
SuccessCriteriaTestGenerator,
DebugTool,
TestSuiteResult,
)
__all__ = [
@@ -68,7 +66,5 @@ __all__ = [
"TestStorage",
"ApprovalStatus",
"ErrorCategory",
"ConstraintTestGenerator",
"SuccessCriteriaTestGenerator",
"DebugTool",
]

Some files were not shown because too many files have changed in this diff Show More