Compare commits

...

156 Commits

Author SHA1 Message Date
RichardTang-Aden fd79dceb0f Merge pull request #6166 from aden-hive/fix/subagent-reply-stall
Release / Create Release (push) Waiting to run
micro-fix: update escalation tests for new ESCALATION_REQUESTED flow
2026-03-10 19:47:00 -07:00
Richard Tang ad50139d67 chore: lint 2026-03-10 19:46:35 -07:00
Richard Tang 12fb40c110 test: update escalation tests for ESCALATION_REQUESTED flow
Tests were asserting the old CLIENT_OUTPUT_DELTA + CLIENT_INPUT_REQUESTED
pattern; the fix in 89ccd66f routes escalations through the queen via
ESCALATION_REQUESTED instead.
2026-03-10 19:45:21 -07:00
RichardTang-Aden 738e469d96 Merge pull request #6165 from aden-hive/feature/provider-moonshotai-kimi
feat: support MoonShot AI Kimi subscription
2026-03-10 19:39:25 -07:00
Timothy 80ccbcc827 chore: lint 2026-03-10 19:37:18 -07:00
RichardTang-Aden 08fac31a9d Merge pull request #6159 from aden-hive/fix/subagent-reply-stall
fix: route subagent report_to_parent escalations to queen instead of user
2026-03-10 18:24:33 -07:00
Richard Tang 89ccd66fb9 fix: subagent _EscalationReceiver 2026-03-10 18:21:50 -07:00
Timothy 7c47e367de feat: support moonshotai kimi subscription 2026-03-10 18:03:44 -07:00
Timothy b8741bf94c fix: queen agent system prompt hooks 2026-03-10 16:25:07 -07:00
RichardTang-Aden c90dcbb32f Merge pull request #6152 from aden-hive/refactor/remove-dead-code
refactor: remove deprecated codes
2026-03-10 15:31:34 -07:00
Richard Tang ac3a5f5e93 chore: remove the ai generated temp doc 2026-03-10 15:29:21 -07:00
Timothy 1ccfdbbf7d chore: minimax key check 2026-03-10 15:24:09 -07:00
Timothy 1de37d2747 chore: lint 2026-03-10 15:00:14 -07:00
Timothy 2aefdf5b5f refactor: remove deprecated codes 2026-03-10 14:57:54 -07:00
Hundao 4caaa79900 Merge pull request #5988 from roberthallers/docs/fix-tui-deprecation-5941
docs: fix TUI deprecation inconsistency in roadmap
2026-03-10 16:46:41 +08:00
Hundao 296089d4cd Merge pull request #6108 from Hundao/fix/subagent-judge-feedback
fix: SubagentJudge and implicit judge return feedback=None on ACCEPT
2026-03-10 15:39:29 +08:00
hundao cae5f971cf fix: update test assertions for newly added tools
Tool counts and expected lists were outdated after new tools were added
to stripe, linear, apollo, discord, and google_analytics.
2026-03-10 15:36:12 +08:00
hundao bac716eea3 fix: pass feedback="" on evaluated ACCEPT verdicts in SubagentJudge and implicit judge
Fixes #6107
2026-03-10 15:24:39 +08:00
Navya Bijoy 14daf672e8 Fix: SessionManager._cleanup_stale_active_sessions indiscriminately cancels healthy concurrent agent sessions (#6081)
* fixes a bug in the  SessionManager

* chore: remove debug print from test

---------

Co-authored-by: hundao <alchemy_wimp@hotmail.com>
2026-03-10 15:18:11 +08:00
Emmanuel Nwanguma e352ae5145 fix(mcp): close errlog file handle to prevent resource leak (#6094)
Track the errlog file handle opened on non-Windows systems and
properly close it during cleanup to prevent file descriptor leaks.

Changes:
- Add _errlog_handle instance variable to track the file handle
- Store handle reference when opening os.devnull
- Close handle in _cleanup_stdio_async() after other cleanup
- Clear reference in disconnect() for safety

Fixes #6002
2026-03-10 15:06:51 +08:00
Pushkal a58ffc2669 fix(server): use session.phase_state instead of session.mode_state in handle_pause (#6069)
The handle_pause endpoint referenced session.mode_state (lines 360-361),
which does not exist on the Session dataclass. This caused an
AttributeError every time the pause endpoint reached the phase transition
step, preventing the queen phase from transitioning to staging and
returning a 500 error to the frontend.

Changed to session.phase_state, consistent with handle_stop (line 412),
handle_run (line 75), and the Session dataclass definition
(session_manager.py line 44).
2026-03-10 15:03:19 +08:00
RichardTang-Aden 3fefea52be Merge pull request #6102 from aden-hive/micro-fix/report-to-parent-empty-check
micro-fix: track reported_to_parent to prevent false empty-turn detection
2026-03-09 21:12:23 -07:00
Richard Tang 06fd045b3e micro-fix: track reported_to_parent to prevent false empty-turn detection
Turns that call report_to_parent were incorrectly treated as "truly
empty" because the flag was not propagated. Thread it through
_run_single_turn and include it in the empty-turn guard.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 21:10:47 -07:00
RichardTang-Aden 2e43d2af46 Merge pull request #6100 from aden-hive/feature/integration-extended
Release / Create Release (push) Waiting to run
micro-fix: wrong reference for hive_coder
2026-03-09 19:52:35 -07:00
Richard Tang 2c9790c65d Merge remote-tracking branch 'origin' into feature/integration-extended 2026-03-09 19:52:17 -07:00
Richard Tang 9700ac71bb micro-fix: wrong reference for hive_coder 2026-03-09 19:50:07 -07:00
RichardTang-Aden 61ed67b068 Merge pull request #6097 from aden-hive/feature/integration-extended
Expand integration tool coverage across 40 vendors
2026-03-09 19:47:34 -07:00
Richard Tang c3bea8685a Merge remote-tracking branch 'origin/main' into feature/integration-extended 2026-03-09 19:47:21 -07:00
RichardTang-Aden 98c57b795a Merge pull request #6050 from aden-hive/feat/queen-planning-phase
Add queen planning phase, global memory, and refactor hive_coder
2026-03-09 19:46:23 -07:00
Richard Tang 9be1d03b5c chore ruff lint 2026-03-09 19:45:36 -07:00
Richard Tang 0d09510539 Merge remote-tracking branch 'origin/main' into feat/queen-planning-phase 2026-03-09 19:42:10 -07:00
Richard Tang 639c37ba17 feat: prompt to init the agent 2026-03-09 19:34:01 -07:00
Richard Tang 2258c23254 Merge branch 'feature/queen-global-memory' into feat/queen-planning-phase 2026-03-09 19:11:32 -07:00
Richard Tang 9714ea106d feat: improve initialize_and_build_agent clarity 2026-03-09 18:54:48 -07:00
Timothy f4ad500177 chore: lint 2026-03-09 18:53:01 -07:00
Timothy 9154a4d9f8 fix: resolve E501 line-too-long lint errors across 7 tool files 2026-03-09 18:51:01 -07:00
Timothy add6efe6f1 fix(micro-fix): increase stall threshold 2026-03-09 18:40:13 -07:00
Richard Tang 7ceb1efd02 fix: replace old tool name reference 2026-03-09 18:40:01 -07:00
Timothy a29ecf8435 chore(micro-fix): fix ci test blockage 2026-03-09 18:27:21 -07:00
Richard Tang d0ba5ef4f4 fix: update the wrong variable name 2026-03-09 18:12:29 -07:00
Richard Tang 860f637491 feat: add validation for module import 2026-03-09 17:53:50 -07:00
Richard Tang acb2cab317 feat: minor prompt change for switching to building mode 2026-03-09 17:41:23 -07:00
Richard Tang b453806918 feat: execution end message 2026-03-09 17:29:58 -07:00
Richard Tang 7ba8a0f51b feat: strengthen validation logic when loading 2026-03-09 17:08:20 -07:00
Richard Tang f6f398b6b1 feat: add GCU knowledge to planning 2026-03-09 17:02:13 -07:00
Timothy c4b22fa5c4 feat(postgres): update credential spec with new tool names 2026-03-09 16:47:27 -07:00
Timothy 0e64f977cd feat(postgres): add table stats, indexes, and foreign keys tools
Add pg_get_table_stats for row counts and size info,
pg_list_indexes for index details, and pg_get_foreign_keys
for relationship discovery with both outgoing and incoming FKs.
2026-03-09 16:47:09 -07:00
Timothy f24c9708fc feat(lusha): update credential spec with new tool names 2026-03-09 16:45:33 -07:00
Timothy bb4436e277 feat(lusha): add bulk enrich, technologies, and decision makers tools
Add lusha_bulk_enrich_persons for batch enrichment,
lusha_get_technologies for company tech stack lookup, and
lusha_search_decision_makers for senior contact discovery.
2026-03-09 16:45:17 -07:00
Timothy 795f66c90b feat(gsc): update credential spec with new tool names 2026-03-09 16:44:33 -07:00
Timothy 9ef6d51573 feat(gsc): add top queries, top pages, and delete sitemap tools
Add gsc_top_queries and gsc_top_pages convenience wrappers for
click-sorted analytics, and gsc_delete_sitemap for sitemap removal.
2026-03-09 16:44:20 -07:00
Timothy 3fed4e3409 feat(aws-s3): update credential specs with new tool names 2026-03-09 16:43:37 -07:00
Timothy 670e69f2ce feat(aws-s3): add copy, metadata, and presigned URL tools
Add s3_copy_object for copying within/between buckets,
s3_get_object_metadata for HEAD-based metadata retrieval, and
s3_generate_presigned_url for temporary access URL generation.
2026-03-09 16:42:46 -07:00
Timothy f6c4747905 feat(pushover): update credential spec with new tool names 2026-03-09 16:42:04 -07:00
Timothy 7b78f6c12f feat(pushover): add cancel receipt, glance update, and limits tools
Add pushover_cancel_receipt for stopping emergency retries,
pushover_send_glance for widget data updates, and
pushover_get_limits for checking message usage.
2026-03-09 16:41:52 -07:00
Timothy 1c75100f59 feat(news): update credential spec with new tool names 2026-03-09 16:41:15 -07:00
Timothy b325e103c6 feat(news): add latest, by-source, and by-topic search tools
Add news_latest for breaking news without query, news_by_source
for source-filtered articles, and news_by_topic for topic-based
discovery with automatic date ranges.
2026-03-09 16:40:54 -07:00
Timothy aef2d2d474 feat(serpapi): update credential spec with new tool names 2026-03-09 16:40:05 -07:00
Timothy 95a2b6711e feat(serpapi): add cited-by, profile search, and Google web search tools
Add scholar_cited_by for finding papers citing a given paper,
scholar_search_profiles for author profile discovery, and
serpapi_google_search for structured Google web results.
2026-03-09 16:38:50 -07:00
Timothy 7fb5e8145c feat(exa-search): update credential spec with new tool names 2026-03-09 16:37:56 -07:00
Timothy 8e45d0df83 feat(exa-search): add news, papers, and company search tools
Add exa_search_news, exa_search_papers, and exa_search_companies
convenience wrappers with pre-configured category filters and
automatic date/domain filtering.
2026-03-09 16:37:44 -07:00
Richard Tang 8d4657c13e Merge branch 'feat/queen-planning-phase' into feature/queen-global-memory 2026-03-09 16:10:42 -07:00
Timothy 3d175a6d54 feat(greenhouse): update credential spec with new tool names
Add greenhouse_list_offers, greenhouse_add_candidate_note, greenhouse_list_scorecards.
2026-03-09 16:02:53 -07:00
Timothy b9debaf957 feat(greenhouse): add list offers, candidate notes, and scorecards tools
- greenhouse_list_offers: GET /offers or /applications/{id}/offers
- greenhouse_add_candidate_note: POST /candidates/{id}/activity_feed/notes
- greenhouse_list_scorecards: GET /applications/{id}/scorecards
- Add _post helper for POST requests
2026-03-09 16:02:08 -07:00
Richard Tang bdcbcff6f3 feat: better instruction for planning mode switch 2026-03-09 16:01:34 -07:00
Timothy d2d7bdc374 feat(brevo): update credential spec with new tool names
Add brevo_list_contacts, brevo_delete_contact, brevo_list_email_campaigns.
2026-03-09 16:01:16 -07:00
Timothy 40e494b15d feat(brevo): add list contacts, delete contact, and list campaigns tools
- brevo_list_contacts: GET /contacts with pagination and modified_since filter
- brevo_delete_contact: DELETE /contacts/{email} to remove contacts
- brevo_list_email_campaigns: GET /emailCampaigns with status filter and stats
2026-03-09 16:00:42 -07:00
Timothy b5e840c0cb feat(quickbooks): update credential specs with new tool names
Add quickbooks_list_invoices, quickbooks_get_customer, quickbooks_create_payment
to both credential specs (token and realm_id).
2026-03-09 15:59:46 -07:00
Timothy f3d74c9ae4 feat(quickbooks): add list invoices, get customer, and create payment tools
- quickbooks_list_invoices: query invoices with status/customer filters
- quickbooks_get_customer: GET /customer/{id} with address and contact info
- quickbooks_create_payment: POST /payment with optional invoice linking
2026-03-09 15:59:23 -07:00
Richard Tang a22b321692 feat: improve phase switching tools 2026-03-09 15:33:03 -07:00
Timothy 2e7dbad118 feat(cloudinary): update credential specs with new tool names
Add cloudinary_get_usage, cloudinary_rename_resource, cloudinary_add_tag
to all three credential specs (cloud_name, key, secret).
2026-03-09 15:31:42 -07:00
Timothy 6183d1b65b feat(cloudinary): add usage, rename, and add tag tools
- cloudinary_get_usage: GET /usage for storage, bandwidth, transformation limits
- cloudinary_rename_resource: POST /rename to change public_id
- cloudinary_add_tag: POST /tags to add tags to resources
2026-03-09 15:31:22 -07:00
Timothy 09931e6d98 feat(twitter): update credential spec with new tool names
Add twitter_get_user_followers, twitter_get_tweet_replies, twitter_get_list_tweets.
2026-03-09 15:25:21 -07:00
Timothy cb394127d1 feat(twitter): add user followers, tweet replies, and list tweets tools
- twitter_get_user_followers: GET /users/{id}/followers with profile details
- twitter_get_tweet_replies: search recent replies via conversation_id
- twitter_get_list_tweets: GET /lists/{id}/tweets with author expansion
2026-03-09 15:21:47 -07:00
Timothy 588fa1f9ea feat(google-analytics): update credential spec with new tool names
Add ga_get_user_demographics, ga_get_conversion_events, ga_get_landing_pages.
2026-03-09 15:21:09 -07:00
Timothy 73325c280c feat(google-analytics): add demographics, conversion events, and landing pages tools
- ga_get_user_demographics: country/language/device breakdown
- ga_get_conversion_events: event counts, conversions, and revenue
- ga_get_landing_pages: top landing pages with bounce rate and session duration
2026-03-09 15:20:51 -07:00
Timothy 8c5ae8ffa8 feat(docker-hub): update credential spec with new tool names
Add docker_hub_get_tag_detail, docker_hub_delete_tag, docker_hub_list_webhooks.
2026-03-09 15:19:58 -07:00
Timothy 7389423c70 feat(docker-hub): add tag detail, delete tag, and list webhooks tools
- docker_hub_get_tag_detail: GET /repositories/{repo}/tags/{tag} with image architectures
- docker_hub_delete_tag: DELETE /repositories/{repo}/tags/{tag}
- docker_hub_list_webhooks: GET /repositories/{repo}/webhooks
- Add _delete helper for DELETE requests
2026-03-09 15:18:46 -07:00
Timothy 20c15446a7 feat(apollo): update credential spec with new tool names
Add apollo_get_person_activities, apollo_list_email_accounts,
apollo_bulk_enrich_people.
2026-03-09 15:17:38 -07:00
Richard Tang c05c30dd9a feat: add meta agent tools to planning 2026-03-09 15:14:34 -07:00
Timothy bcd2fb76bd feat(apollo): add person activities, email accounts, and bulk enrich tools
- apollo_get_person_activities: GET /activities for contact activity history
- apollo_list_email_accounts: GET /email_accounts for connected sending accounts
- apollo_bulk_enrich_people: POST /people/bulk_match for batch enrichment (up to 10)
2026-03-09 15:03:21 -07:00
Timothy 5fb97ab6df feat(calendly): update credential spec with new tool names
Add calendly_cancel_event, calendly_list_webhooks, calendly_get_event_type.
2026-03-09 15:00:46 -07:00
Timothy 0224ebc800 feat(calendly): add cancel event, list webhooks, and get event type tools
- calendly_cancel_event: POST /scheduled_events/{id}/cancellation
- calendly_list_webhooks: GET /webhook_subscriptions for org/user scope
- calendly_get_event_type: GET /event_types/{id} for meeting template details
- Add _post helper for POST requests
2026-03-09 15:00:34 -07:00
Timothy af88f7299a feat(pagerduty): update credential specs with new tool names
Add pagerduty_list_oncalls, pagerduty_add_incident_note,
pagerduty_list_escalation_policies to api_key spec.
Add pagerduty_add_incident_note to from_email spec (write operation).
2026-03-09 14:59:53 -07:00
Timothy 81729706ae feat(pagerduty): add oncalls, incident notes, and escalation policies tools
- pagerduty_list_oncalls: GET /oncalls with schedule/policy filters
- pagerduty_add_incident_note: POST /incidents/{id}/notes to add notes
- pagerduty_list_escalation_policies: GET /escalation_policies with search
2026-03-09 14:59:33 -07:00
Timothy bbb1b43ebe feat(airtable): update credential spec with new tool names
Add airtable_delete_records, airtable_search_records, airtable_list_collaborators.
2026-03-09 14:58:57 -07:00
Timothy 70ed5fa8df feat(airtable): add delete records, search records, and list collaborators tools
- airtable_delete_records: DELETE records by comma-separated IDs (up to 10)
- airtable_search_records: search records using FIND formula for partial matching
- airtable_list_collaborators: list base collaborators via meta API
- Add _delete helper for DELETE requests
2026-03-09 14:58:42 -07:00
Timothy 312db6620d feat(reddit): update credential specs with new tool names
Add reddit_get_subreddit_info, reddit_get_post_detail, reddit_get_user_posts
to both credential specs (client_id and client_secret).
2026-03-09 14:57:50 -07:00
Timothy 93c1fc5488 feat(reddit): add subreddit info, post detail, and user posts tools
- reddit_get_subreddit_info: GET /r/{name}/about for subscriber count, description
- reddit_get_post_detail: GET /by_id/t3_{id} for full post details with flair, ratios
- reddit_get_user_posts: GET /user/{name}/submitted for user's post history
2026-03-09 14:57:33 -07:00
Richard Tang 90762f275b feat: give planning mode the load tool 2026-03-09 14:55:53 -07:00
Timothy 801443027d feat(pipedrive): update credential spec with new tool names
Add pipedrive_update_deal, pipedrive_create_person, pipedrive_create_activity
to the credential spec tools list.
2026-03-09 14:54:22 -07:00
Timothy ca2ead76cd feat(pipedrive): add deal update, person creation, and activity creation tools
Add pipedrive_update_deal, pipedrive_create_person, and
pipedrive_create_activity tools using Pipedrive REST API v1.
2026-03-09 14:52:27 -07:00
Timothy d562144a6d feat(confluence): register new tools in credential specs
Add confluence_update_page, confluence_delete_page, and
confluence_get_page_children to all three Confluence credential specs.
2026-03-09 14:51:39 -07:00
Timothy af7fb7da27 feat(confluence): add page update, delete, and children listing tools
Add confluence_update_page, confluence_delete_page, and
confluence_get_page_children tools using Confluence REST API v2.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 14:51:26 -07:00
Timothy c17dd63b4a feat(intercom): register new tools in credential spec
Add intercom_close_conversation, intercom_create_contact, and
intercom_list_conversations to Intercom credential spec.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 14:50:49 -07:00
Timothy 866db289e2 feat(intercom): add close conversation, create contact, and list conversations tools
Add close_conversation, create_contact, and list_conversations client
methods plus intercom_close_conversation, intercom_create_contact, and
intercom_list_conversations MCP tools using Intercom API v2.11.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 14:50:30 -07:00
Timothy b4ac5e9607 feat(gitlab): register new tools in credential spec
Add gitlab_update_issue, gitlab_get_merge_request, and
gitlab_create_merge_request_note to GitLab credential spec.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 14:49:01 -07:00
Timothy 3ca7af4242 feat(gitlab): add issue update, MR detail, and MR comment tools
Add _put helper and gitlab_update_issue, gitlab_get_merge_request,
and gitlab_create_merge_request_note tools using GitLab REST API v4.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 14:48:40 -07:00
Richard Tang 2b12a9c91a Merge remote-tracking branch 'origin/feature/queen-global-memory' into feature/queen-global-memory 2026-03-09 14:47:27 -07:00
Richard Tang 9a94595a42 feat: extract the shared knowledge between planning and building 2026-03-09 14:45:31 -07:00
Richard Tang e1540dfaa6 refactor: drop hive code CLI 2026-03-09 14:30:13 -07:00
Richard Tang 4f5ac6d1b1 refactor: rename hive_coder to queen and extract queen orchestrator 2026-03-09 14:23:31 -07:00
Richard Tang c87d7b13da refactor: rename hive_coder to queen and extract queen orchestrator 2026-03-09 14:23:16 -07:00
Timothy c4acf0b659 fix: memory consolidation hook, simplify generated memory files 2026-03-09 14:15:01 -07:00
RichardTang-Aden 5e1ab3ca37 Merge pull request #5029 from karthik-kotra/docs/setup-troubleshooting
docs(setup): add troubleshooting steps for common WSL setup issues
2026-03-09 14:06:28 -07:00
Timothy 79c32c9f47 feat(slack): register new tools in credential spec
Add slack_get_channel_info, slack_list_files, and slack_get_file_info
to Slack credential spec.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:58:14 -07:00
Timothy 35ee29a843 feat(slack): add channel info, file listing, and file detail tools
Add get_channel_info, list_files, and get_file_info client methods
plus slack_get_channel_info, slack_list_files, and slack_get_file_info
MCP tools using Slack Web API.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:57:45 -07:00
Timothy 573aea1d9c feat(stripe): register new tools in credential spec
Add stripe_list_disputes, stripe_list_events, and
stripe_create_checkout_session to Stripe credential spec.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:56:20 -07:00
Timothy 6ecbc30293 feat(stripe): add disputes, events, and checkout session tools
Add list_disputes, list_events, and create_checkout_session client
methods plus stripe_list_disputes, stripe_list_events, and
stripe_create_checkout_session MCP tools using Stripe API.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:56:07 -07:00
Timothy 843b1f2e1d feat(linear): register new tools in credential spec
Add linear_cycles_list, linear_issue_comments_list, and
linear_issue_relation_create to Linear credential spec.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:54:48 -07:00
Timothy 89f6c8e4ef feat(linear): add cycle listing, issue comments, and issue relations tools
Add list_cycles, list_issue_comments, and create_issue_relation client
methods plus linear_cycles_list, linear_issue_comments_list, and
linear_issue_relation_create MCP tools using Linear GraphQL API.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:52:09 -07:00
Timothy 304ac07bd8 feat(zoom): register new tools in credential spec
Add zoom_update_meeting, zoom_list_meeting_participants, and
zoom_list_meeting_registrants to Zoom credential spec.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:50:27 -07:00
Timothy 82f0684b83 feat(zoom): add meeting update, participants, and registrants tools
Add zoom_update_meeting (PATCH), zoom_list_meeting_participants
(past meeting attendees), and zoom_list_meeting_registrants
using Zoom REST API v2.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:45:11 -07:00
Timothy 963c37dc31 feat(twilio): register new tools in credential specs
Add twilio_list_phone_numbers, twilio_list_calls, and
twilio_delete_message to both Twilio credential specs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:41:26 -07:00
Timothy c02da3ba5a feat(twilio): add phone number listing, call history, and message deletion tools
Add twilio_list_phone_numbers, twilio_list_calls, and
twilio_delete_message tools using Twilio REST API.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:40:58 -07:00
Timothy 7f34e95ec6 feat(shopify): register new tools in credential specs
Add shopify_update_product, shopify_get_customer, and
shopify_create_draft_order to both Shopify credential specs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:40:28 -07:00
Timothy f2998fe098 feat(shopify): add product update, customer detail, and draft order tools
Add shopify_update_product, shopify_get_customer, and
shopify_create_draft_order tools using Shopify Admin REST API.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:40:15 -07:00
Timothy 323a2489b8 feat(zendesk): register new tools in credential specs
Add zendesk_get_ticket_comments, zendesk_add_ticket_comment, and
zendesk_list_users to all three Zendesk credential specs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:39:35 -07:00
Timothy f6d1cd640e feat(zendesk): add ticket comments and user listing tools
Add zendesk_get_ticket_comments, zendesk_add_ticket_comment, and
zendesk_list_users tools using Zendesk Support API v2.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:39:25 -07:00
Timothy ddf89a04fe feat(asana): update credential spec for new tools
Register asana_update_task, asana_add_comment, and
asana_create_subtask in the Asana credential spec.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:35:16 -07:00
Timothy c5dc89f5ee feat(asana): add update_task, add_comment, create_subtask tools
Add _put helper and three new Asana MCP tools:
- asana_update_task: modify name, notes, completion, due date, assignee
- asana_add_comment: post comment stories on tasks
- asana_create_subtask: create subtasks under existing tasks

API ref: https://developers.asana.com/docs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:35:05 -07:00
Timothy 6ade34b759 feat(trello): register get_card, create_list, search_cards tools
Add three new Trello MCP tools:
- trello_get_card: retrieve full card details with members/checklists/attachments
- trello_create_list: create new lists on boards
- trello_search_cards: full-text search across cards with board scoping

Update credential spec to include the new tool names.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:20:43 -07:00
Timothy 09d5f0a9df feat(trello): add client methods for get_card, create_list, search
Add TrelloClient methods for:
- get_card: GET /1/cards/{id} with members, checklists, attachments
- create_list: POST /1/lists to create new board lists
- search: GET /1/search for full-text search across cards

API ref: https://developer.atlassian.com/cloud/trello/rest/api-group-cards/

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:19:59 -07:00
Timothy a60d63cca2 feat(github): register list_commits, create_release, list_workflow_runs
Add three new GitHub MCP tools:
- github_list_commits: query commits with author/date/branch filters
- github_create_release: create tagged releases with notes and draft support
- github_list_workflow_runs: monitor CI/CD pipeline runs with status filters

Update credential spec to include the new tool names.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:19:16 -07:00
Timothy 8616975fc5 feat(github): add client methods for commits, releases, workflow runs
Add _GitHubClient methods for:
- list_commits: GET /repos/{owner}/{repo}/commits with sha/author/date filters
- create_release: POST /repos/{owner}/{repo}/releases with tag, notes, draft
- list_workflow_runs: GET /repos/{owner}/{repo}/actions/runs with filters

API ref: https://docs.github.com/en/rest

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:18:33 -07:00
Timothy e5ae919d8f feat(telegram): register get_chat_member_count, send_video, set_description
Add three new Telegram MCP tools:
- telegram_get_chat_member_count: retrieve group/channel membership size
- telegram_send_video: send video files via URL or file_id
- telegram_set_chat_description: update group/channel descriptions

Update credential spec to include the new tool names.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:17:30 -07:00
Timothy 8e7f5eaaba feat(telegram): add client methods for member count, video, description
Add _TelegramClient methods for:
- get_chat_member_count: getChatMemberCount API endpoint
- send_video: sendVideo with caption, parse_mode, duration support
- set_chat_description: setChatDescription for groups/channels

API ref: https://core.telegram.org/bots/api

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:13:06 -07:00
Timothy 4d1ff8b054 feat(salesforce): update credential spec for new tools
Register salesforce_delete_record, salesforce_search_records, and
salesforce_get_record_count in both Salesforce credential specs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:12:25 -07:00
Timothy 9fa81e8599 feat(salesforce): add delete_record, search_records, get_record_count
Add three new Salesforce MCP tools:
- salesforce_delete_record: DELETE /sobjects/{type}/{id}
- salesforce_search_records: SOSL full-text search via /search/
- salesforce_get_record_count: efficient COUNT() query for any SObject

API ref: https://developer.salesforce.com/docs/atlas.en-us.api_rest.meta

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:12:11 -07:00
Timothy cf8e19b059 feat(discord): register get_channel, create_reaction, delete_message tools
Add three new Discord MCP tools:
- discord_get_channel: retrieve channel metadata (name, topic, type)
- discord_create_reaction: add emoji reactions to messages
- discord_delete_message: remove messages from channels

Update credential spec to include the new tool names.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:11:25 -07:00
Timothy dfa3f60fcf feat(discord): add client methods for get_channel, reactions, delete
Add _DiscordClient methods for:
- get_channel: retrieve channel metadata via GET /channels/{id}
- create_reaction: add emoji reaction via PUT reactions endpoint
- delete_message: remove a message via DELETE messages endpoint

API ref: https://discord.com/developers/docs/resources

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:10:49 -07:00
Timothy b795f1b253 feat(notion): update credential spec for new tools
Register notion_update_page, notion_archive_page, and
notion_append_blocks in the Notion credential spec.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:06:20 -07:00
Timothy 73423c0dd2 feat(notion): add update_page, archive_page, append_blocks tools
Add three new Notion MCP tools:
- notion_update_page: modify page properties via PATCH /pages/{id}
- notion_archive_page: archive or restore pages
- notion_append_blocks: add paragraphs, headings, lists, todos, etc.

API ref: https://developers.notion.com/reference

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:06:08 -07:00
Timothy 3d844e1539 feat(jira): update credential spec for new tools
Register jira_update_issue, jira_list_transitions, and
jira_transition_issue in all three Jira credential specs
(domain, email, token).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:04:32 -07:00
Timothy b619119eb5 feat(jira): add update_issue, list_transitions, transition_issue tools
Add three new Jira MCP tools:
- jira_update_issue: modify summary, description, priority, labels, assignee
- jira_list_transitions: discover available status transitions for an issue
- jira_transition_issue: move an issue to a new status with optional comment

API ref: https://developer.atlassian.com/cloud/jira/platform/rest/v3/

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:04:19 -07:00
Timothy b00ed4fc70 feat(hubspot): register delete_object, list/create_associations tools
Add three new MCP tools:
- hubspot_delete_object: archive contacts, companies, or deals
- hubspot_list_associations: query links between CRM objects (v4 API)
- hubspot_create_association: link two CRM records together

Update credential spec to include the new tool names.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:03:37 -07:00
Timothy 5ec5fbe998 feat(hubspot): add client methods for delete, associations
Add _HubSpotClient methods for:
- delete_object: archive a CRM object via DELETE /crm/v3/objects
- list_associations: query associations via GET /crm/v4/objects associations endpoint
- create_association: link two CRM objects via PUT /crm/v4/objects associations endpoint

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:02:49 -07:00
Richard Tang 2ed814455a Merge branch 'feat/queen-planning-phase' into feature/queen-global-memory 2026-03-09 12:57:23 -07:00
Timothy ad1a4ef0c3 fix: cancellation button 2026-03-09 12:48:20 -07:00
Timothy 2111c808a9 feat: queen memory v1 2026-03-09 11:55:39 -07:00
Bryan @ Aden 402bb38267 Merge pull request #6079 from Waryjustice/fix/google-sheets-credentials-orphan
fix(credentials): remove orphaned google_sheets.py credential spec
2026-03-09 18:37:27 +00:00
Waryjustice 0a55928872 fix(credentials): remove orphaned google_sheets.py credential spec
The google_sheets.py file defined GOOGLE_SHEETS_CREDENTIALS (an API-key
based credential for reading public sheets via GOOGLE_SHEETS_API_KEY) but
was never wired into the package:

- Never imported in credentials/__init__.py
- Never merged into CREDENTIAL_SPECS
- Never listed in __all__
- Tool never calls credentials.get('google_sheets_key') — uses 'google' (OAuth2)
- Tool names in the spec were stale and did not match actual function names

The 'google' credential in email.py already correctly covers all Google
Sheets tools via OAuth2. This file was dead code with no referencing
imports anywhere in the repository.

Closes #6077

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-09 23:44:26 +05:30
Richard Tang cdf76ae3b9 fix: eventloop test 2026-03-09 10:23:56 -07:00
Richard Tang 42d0592941 refactor: judge evaluate 2026-03-09 10:09:15 -07:00
Richard Tang 1de7cf821d fix: handle judge with empty message 2026-03-09 09:58:29 -07:00
Timothy 4ea8540e25 fix: better logging for memory consolidation event 2026-03-08 20:44:40 -07:00
Timothy bfa3b8e0f6 fix: queen memory health 2026-03-08 20:28:53 -07:00
Richard Tang 55eccfd75f feat: intake node prompt in planning mode 2026-03-08 20:27:24 -07:00
Timothy 1e994a77b5 feat: queen agent global memory 2026-03-08 19:54:46 -07:00
Richard Tang d12afeb35d chore: ruff lint 2026-03-08 19:46:49 -07:00
Timothy @aden b55a77634b Delete .github/ISSUE_TEMPLATE/link-discord.yml 2026-03-08 19:44:48 -07:00
Richard Tang e84fefd319 feat: separate the queen and worker tools in prompts 2026-03-08 19:40:30 -07:00
Richard Tang d2b510014d feat: adjust tools and knowledge separation between planning and building 2026-03-08 19:21:50 -07:00
Richard Tang 3ed5fda448 feat: planning phase for the queen 2026-03-08 18:49:45 -07:00
Robert Hallers 7a467ef9b8 docs: mark TUI as deprecated in roadmap to match CLAUDE.md
Resolves inconsistency between CLAUDE.md/AGENTS.md (TUI deprecated) and
docs/roadmap.md (TUI listed as completed feature).

- Strike through TUI items in 3 roadmap sections
- Add deprecation note to TUI-to-GUI upgrade section
- Reference AGENTS.md and hive open as replacement

Fixes #5941

Signed-off-by: Robert Hallers <robert@terplabs.ai>
2026-03-07 02:36:04 -05:00
karthik-kotra 41cd11d5c9 docs(setup): add troubleshooting steps for common WSL setup issues 2026-02-17 07:30:00 +00:00
183 changed files with 9503 additions and 8489 deletions
-31
View File
@@ -1,31 +0,0 @@
name: Link Discord Account
description: Connect your GitHub and Discord for the bounty program
title: "link: @{{ github.actor }}"
labels: ["link-discord"]
body:
- type: markdown
attributes:
value: |
Link your Discord account to receive XP and role rewards when your bounty PRs are merged.
**How to find your Discord ID:**
1. Open Discord Settings > Advanced > Enable **Developer Mode**
2. Right-click your username > **Copy User ID**
- type: input
id: discord_id
attributes:
label: Discord User ID
description: "Your numeric Discord ID (not your username). Example: 123456789012345678"
placeholder: "123456789012345678"
validations:
required: true
- type: input
id: display_name
attributes:
label: Display Name (optional)
description: How you'd like to be credited
placeholder: "Jane Doe"
validations:
required: false
-4
View File
@@ -2,10 +2,6 @@
Shared agent instructions for this workspace.
## Deprecations
- **TUI is deprecated.** The terminal UI (`hive tui`) is no longer maintained. Use the browser-based interface (`hive open`) instead.
## Coding Agent Notes
-
+46
View File
@@ -65,6 +65,52 @@ You may submit PRs without prior assignment for:
> **Tip:** Installing Claude Code skills is optional for running existing agents, but required if you plan to **build new agents**.
## Troubleshooting Setup Issues
If you encounter issues while setting up the development environment, the following steps may help:
### `make: command not found`
Install `make` using:
```bash
sudo apt install make
uv: command not found
Install uv using:
curl -LsSf https://astral.sh/uv/install.sh | sh
source ~/.bashrc
ruff: not found
If linting fails due to a missing ruff command, install it with:
uv tool install ruff
WSL Path Recommendation
When using WSL, it is recommended to clone the repository inside your Linux home directory (e.g., ~/hive) instead of under /mnt/c/... to avoid potential performance and permission issues.
---
# ✅ Why This Is Good
- Clear
- Professional tone
- No unnecessary explanation
- Under micro-fix size
- Based on real contributor experience
- Wont annoy maintainers
---
Now:
```bash
git checkout -b docs/setup-troubleshooting
## Commit Convention
We follow [Conventional Commits](https://www.conventionalcommits.org/):
+1 -1
View File
@@ -1,6 +1,6 @@
# MCP Server Guide - Agent Building Tools
> **Note:** The standalone `agent-builder` MCP server (`framework.mcp.agent_builder_server`) has been replaced. Agent building is now done via the `coder-tools` server's `initialize_agent_package` tool, with underlying logic in `framework.builder.package_generator`.
> **Note:** The standalone `agent-builder` MCP server (`framework.mcp.agent_builder_server`) has been replaced. Agent building is now done via the `coder-tools` server's `initialize_and_build_agent` tool, with underlying logic in `tools/coder_tools_server.py`.
This guide covers the MCP tools available for building goal-driven agents.
+1 -1
View File
@@ -19,7 +19,7 @@ uv pip install -e .
## Agent Building
Agent scaffolding is handled by the `coder-tools` MCP server (in `tools/coder_tools_server.py`), which provides the `initialize_agent_package` tool and related utilities. The underlying package generation logic lives in `framework.builder.package_generator`.
Agent scaffolding is handled by the `coder-tools` MCP server (in `tools/coder_tools_server.py`), which provides the `initialize_and_build_agent` tool and related utilities. The package generation logic lives directly in `tools/coder_tools_server.py`.
See the [Getting Started Guide](../docs/getting-started.md) for building agents.
-3
View File
@@ -22,7 +22,6 @@ The framework includes a Goal-Based Testing system (Goal → Agent → Eval):
See `framework.testing` for details.
"""
from framework.builder.query import BuilderQuery
from framework.llm import AnthropicProvider, LLMProvider
from framework.runner import AgentOrchestrator, AgentRunner
from framework.runtime.core import Runtime
@@ -51,8 +50,6 @@ __all__ = [
"Problem",
# Runtime
"Runtime",
# Builder
"BuilderQuery",
# LLM
"LLMProvider",
"AnthropicProvider",
@@ -51,42 +51,6 @@ def cli():
pass
@cli.command()
@click.option("--verbose", "-v", is_flag=True)
@click.option("--debug", is_flag=True)
def tui(verbose, debug):
"""Launch TUI to test a credential interactively."""
setup_logging(verbose=verbose, debug=debug)
try:
from framework.tui.app import AdenTUI
except ImportError:
click.echo("TUI requires 'textual'. Install with: pip install textual")
sys.exit(1)
agent = CredentialTesterAgent()
account = pick_account(agent)
if account is None:
sys.exit(1)
agent.select_account(account)
provider = account.get("provider", "?")
alias = account.get("alias", "?")
click.echo(f"\nTesting {provider}/{alias}...\n")
async def run_tui():
agent._setup()
runtime = agent._agent_runtime
await runtime.start()
try:
app = AdenTUI(runtime)
await app.run_async()
finally:
await runtime.stop()
asyncio.run(run_tui())
@cli.command()
@click.option("--verbose", "-v", is_flag=True)
@click.option("--debug", is_flag=True)
+151
View File
@@ -0,0 +1,151 @@
"""Agent discovery — scan known directories and return categorised AgentEntry lists."""
from __future__ import annotations
import json
from dataclasses import dataclass, field
from pathlib import Path
@dataclass
class AgentEntry:
"""Lightweight agent metadata for the picker / API discover endpoint."""
path: Path
name: str
description: str
category: str
session_count: int = 0
node_count: int = 0
tool_count: int = 0
tags: list[str] = field(default_factory=list)
last_active: str | None = None
def _get_last_active(agent_name: str) -> str | None:
"""Return the most recent updated_at timestamp across all sessions."""
sessions_dir = Path.home() / ".hive" / "agents" / agent_name / "sessions"
if not sessions_dir.exists():
return None
latest: str | None = None
for session_dir in sessions_dir.iterdir():
if not session_dir.is_dir() or not session_dir.name.startswith("session_"):
continue
state_file = session_dir / "state.json"
if not state_file.exists():
continue
try:
data = json.loads(state_file.read_text(encoding="utf-8"))
ts = data.get("timestamps", {}).get("updated_at")
if ts and (latest is None or ts > latest):
latest = ts
except Exception:
continue
return latest
def _count_sessions(agent_name: str) -> int:
"""Count session directories under ~/.hive/agents/{agent_name}/sessions/."""
sessions_dir = Path.home() / ".hive" / "agents" / agent_name / "sessions"
if not sessions_dir.exists():
return 0
return sum(1 for d in sessions_dir.iterdir() if d.is_dir() and d.name.startswith("session_"))
def _extract_agent_stats(agent_path: Path) -> tuple[int, int, list[str]]:
"""Extract node count, tool count, and tags from an agent directory.
Prefers agent.py (AST-parsed) over agent.json for node/tool counts
since agent.json may be stale. Tags are only available from agent.json.
"""
import ast
node_count, tool_count, tags = 0, 0, []
agent_py = agent_path / "agent.py"
if agent_py.exists():
try:
tree = ast.parse(agent_py.read_text(encoding="utf-8"))
for node in ast.walk(tree):
if isinstance(node, ast.Assign):
for target in node.targets:
if isinstance(target, ast.Name) and target.id == "nodes":
if isinstance(node.value, ast.List):
node_count = len(node.value.elts)
except Exception:
pass
agent_json = agent_path / "agent.json"
if agent_json.exists():
try:
data = json.loads(agent_json.read_text(encoding="utf-8"))
json_nodes = data.get("nodes", [])
if node_count == 0:
node_count = len(json_nodes)
tools: set[str] = set()
for n in json_nodes:
tools.update(n.get("tools", []))
tool_count = len(tools)
tags = data.get("agent", {}).get("tags", [])
except Exception:
pass
return node_count, tool_count, tags
def discover_agents() -> dict[str, list[AgentEntry]]:
"""Discover agents from all known sources grouped by category."""
from framework.runner.cli import (
_extract_python_agent_metadata,
_get_framework_agents_dir,
_is_valid_agent_dir,
)
groups: dict[str, list[AgentEntry]] = {}
sources = [
("Your Agents", Path("exports")),
("Framework", _get_framework_agents_dir()),
("Examples", Path("examples/templates")),
]
for category, base_dir in sources:
if not base_dir.exists():
continue
entries: list[AgentEntry] = []
for path in sorted(base_dir.iterdir(), key=lambda p: p.name):
if not _is_valid_agent_dir(path):
continue
name, desc = _extract_python_agent_metadata(path)
config_fallback_name = path.name.replace("_", " ").title()
used_config = name != config_fallback_name
node_count, tool_count, tags = _extract_agent_stats(path)
if not used_config:
agent_json = path / "agent.json"
if agent_json.exists():
try:
data = json.loads(agent_json.read_text(encoding="utf-8"))
meta = data.get("agent", {})
name = meta.get("name", name)
desc = meta.get("description", desc)
except Exception:
pass
entries.append(
AgentEntry(
path=path,
name=name,
description=desc,
category=category,
session_count=_count_sessions(path.name),
node_count=node_count,
tool_count=tool_count,
tags=tags,
last_active=_get_last_active(path.name),
)
)
if entries:
groups[category] = entries
return groups
@@ -1,40 +0,0 @@
"""
Hive Coder Native coding agent that builds Hive agent packages.
Deeply understands the agent framework and produces complete Python packages
with goals, nodes, edges, system prompts, MCP configuration, and tests
from natural language specifications.
"""
from .agent import (
conversation_mode,
edges,
entry_node,
entry_points,
goal,
identity_prompt,
loop_config,
nodes,
pause_nodes,
terminal_nodes,
)
from .config import AgentMetadata, RuntimeConfig, default_config, metadata
__version__ = "1.0.0"
__all__ = [
"goal",
"nodes",
"edges",
"entry_node",
"entry_points",
"pause_nodes",
"terminal_nodes",
"conversation_mode",
"identity_prompt",
"loop_config",
"RuntimeConfig",
"AgentMetadata",
"default_config",
"metadata",
]
@@ -1,60 +0,0 @@
"""CLI entry point for Hive Coder agent."""
import json
import logging
import sys
import click
from .agent import entry_node, goal, nodes
from .config import metadata
def setup_logging(verbose=False, debug=False):
"""Configure logging for execution visibility."""
if debug:
level, fmt = logging.DEBUG, "%(asctime)s %(name)s: %(message)s"
elif verbose:
level, fmt = logging.INFO, "%(message)s"
else:
level, fmt = logging.WARNING, "%(levelname)s: %(message)s"
logging.basicConfig(level=level, format=fmt, stream=sys.stderr)
logging.getLogger("framework").setLevel(level)
@click.group()
@click.version_option(version="1.0.0")
def cli():
"""Hive Coder — Build Hive agent packages from natural language."""
pass
@cli.command()
@click.option("--json", "output_json", is_flag=True)
def info(output_json):
"""Show agent information."""
info_data = {
"name": metadata.name,
"version": metadata.version,
"description": metadata.description,
"goal": {
"name": goal.name,
"description": goal.description,
},
"nodes": [n.id for n in nodes],
"entry_node": entry_node,
"client_facing_nodes": [n.id for n in nodes if n.client_facing],
}
if output_json:
click.echo(json.dumps(info_data, indent=2))
else:
click.echo(f"Agent: {info_data['name']}")
click.echo(f"Version: {info_data['version']}")
click.echo(f"Description: {info_data['description']}")
click.echo(f"\nNodes: {', '.join(info_data['nodes'])}")
click.echo(f"Client-facing: {', '.join(info_data['client_facing_nodes'])}")
click.echo(f"Entry: {info_data['entry_node']}")
if __name__ == "__main__":
cli()
-153
View File
@@ -1,153 +0,0 @@
"""Agent graph construction for Hive Coder."""
from framework.graph import Constraint, Goal, SuccessCriterion
from framework.graph.edge import GraphSpec
from .nodes import coder_node, queen_node
# Goal definition
goal = Goal(
id="hive-coder",
name="Hive Agent Builder",
description=(
"Build complete, validated Hive agent packages from natural language "
"specifications. Produces production-ready Python packages with goals, "
"nodes, edges, system prompts, MCP configuration, and tests."
),
success_criteria=[
SuccessCriterion(
id="valid-package",
description="Generated agent package passes structural validation",
metric="validation_pass",
target="true",
weight=0.30,
),
SuccessCriterion(
id="complete-files",
description=(
"All required files generated: agent.py, config.py, "
"nodes/__init__.py, __init__.py, __main__.py, mcp_servers.json"
),
metric="file_count",
target=">=6",
weight=0.25,
),
SuccessCriterion(
id="user-satisfaction",
description="User reviews and approves the generated agent",
metric="user_approval",
target="true",
weight=0.25,
),
SuccessCriterion(
id="framework-compliance",
description=(
"Generated code follows framework patterns: STEP 1/STEP 2 "
"for client-facing and correct imports"
),
metric="pattern_compliance",
target="100%",
weight=0.20,
),
],
constraints=[
Constraint(
id="dynamic-tool-discovery",
description=(
"Always discover available tools dynamically via "
"list_agent_tools before referencing tools in agent designs"
),
constraint_type="hard",
category="correctness",
),
Constraint(
id="no-fabricated-tools",
description="Only reference tools that exist in hive-tools MCP",
constraint_type="hard",
category="correctness",
),
Constraint(
id="valid-python",
description="All generated Python files must be syntactically correct",
constraint_type="hard",
category="correctness",
),
Constraint(
id="self-verification",
description="Run validation after writing code; fix errors before presenting",
constraint_type="hard",
category="quality",
),
],
)
# Nodes: primary coder node only. The queen runs as an independent
# GraphExecutor with queen_node — not as part of this graph.
nodes = [coder_node]
# No edges needed — single event_loop node
edges = []
# Graph configuration
entry_node = "coder"
entry_points = {"start": "coder"}
pause_nodes = []
terminal_nodes = [] # Coder node has output_keys and can terminate
# No async entry points needed — the queen is now an independent executor,
# not a secondary graph receiving events via add_graph().
async_entry_points = []
# Module-level variables read by AgentRunner.load()
conversation_mode = "continuous"
identity_prompt = (
"You are Hive Coder, the best agent-building coding agent on the planet. "
"You deeply understand the Hive agent framework at the source code level "
"and produce production-ready agent packages from natural language. "
"You can dynamically discover available framework tools, inspect runtime "
"sessions and checkpoints from agents you build, and run their test suites. "
"You follow coding agent discipline: read before writing, verify "
"assumptions by reading actual code, adhere to project conventions, "
"self-verify with validation, and fix your own errors. You are concise, "
"direct, and technically rigorous. No emojis. No fluff."
)
loop_config = {
"max_iterations": 100,
"max_tool_calls_per_turn": 30,
"max_history_tokens": 32000,
}
# ---------------------------------------------------------------------------
# Queen graph — runs as an independent persistent conversation in the TUI.
# Loaded by _load_judge_and_queen() in app.py, NOT by AgentRunner.
# ---------------------------------------------------------------------------
queen_goal = Goal(
id="queen-manager",
name="Queen Manager",
description=(
"Manage the worker agent lifecycle and serve as the user's primary "
"interactive interface. Triage health escalations from the judge."
),
success_criteria=[],
constraints=[],
)
queen_graph = GraphSpec(
id="queen-graph",
goal_id=queen_goal.id,
version="1.0.0",
entry_node="queen",
entry_points={"start": "queen"},
terminal_nodes=[],
pause_nodes=[],
nodes=[queen_node],
edges=[],
conversation_mode="continuous",
loop_config={
"max_iterations": 999_999,
"max_tool_calls_per_turn": 30,
"max_history_tokens": 32000,
},
)
+21
View File
@@ -0,0 +1,21 @@
"""
Queen Native agent builder for the Hive framework.
Deeply understands the agent framework and produces complete Python packages
with goals, nodes, edges, system prompts, MCP configuration, and tests
from natural language specifications.
"""
from .agent import queen_goal, queen_graph
from .config import AgentMetadata, RuntimeConfig, default_config, metadata
__version__ = "1.0.0"
__all__ = [
"queen_goal",
"queen_graph",
"RuntimeConfig",
"AgentMetadata",
"default_config",
"metadata",
]
+40
View File
@@ -0,0 +1,40 @@
"""Queen graph definition."""
from framework.graph import Goal
from framework.graph.edge import GraphSpec
from .nodes import queen_node
# ---------------------------------------------------------------------------
# Queen graph — the primary persistent conversation.
# Loaded by queen_orchestrator.create_queen(), NOT by AgentRunner.
# ---------------------------------------------------------------------------
queen_goal = Goal(
id="queen-manager",
name="Queen Manager",
description=(
"Manage the worker agent lifecycle and serve as the user's primary "
"interactive interface. Triage health escalations from the judge."
),
success_criteria=[],
constraints=[],
)
queen_graph = GraphSpec(
id="queen-graph",
goal_id=queen_goal.id,
version="1.0.0",
entry_node="queen",
entry_points={"start": "queen"},
terminal_nodes=[],
pause_nodes=[],
nodes=[queen_node],
edges=[],
conversation_mode="continuous",
loop_config={
"max_iterations": 999_999,
"max_tool_calls_per_turn": 30,
"max_history_tokens": 32000,
},
)
@@ -1,4 +1,4 @@
"""Runtime configuration for Hive Coder agent."""
"""Runtime configuration for Queen agent."""
import json
from dataclasses import dataclass, field
@@ -34,7 +34,7 @@ default_config = RuntimeConfig()
@dataclass
class AgentMetadata:
name: str = "Hive Coder"
name: str = "Queen"
version: str = "1.0.0"
description: str = (
"Native coding agent that builds production-ready Hive agent packages "
@@ -43,7 +43,7 @@ class AgentMetadata:
"MCP configuration, and tests."
)
intro_message: str = (
"I'm Hive Coder — I build Hive agents. Describe what kind of agent "
"I'm Queen — I build Hive agents. Describe what kind of agent "
"you want to create and I'll design, implement, and validate it for you."
)
@@ -1,4 +1,4 @@
"""Node definitions for Hive Coder agent."""
"""Node definitions for Queen agent."""
from pathlib import Path
@@ -35,15 +35,14 @@ def _build_appendices() -> str:
# Shared appendices — appended to every coding node's system prompt.
_appendices = _build_appendices()
# GCU first-class section for building phase (when GCU is enabled).
# This is placed prominently in the main prompt body, not as an appendix.
_gcu_building_section = (
# GCU guide — shared between planning and building via _shared_building_knowledge.
_gcu_section = (
("\n\n# GCU Nodes — Browser Automation\n\n" + _gcu_guide)
if _is_gcu_enabled() and _gcu_guide
else ""
)
# Tools available to both coder (worker) and queen.
# Tools available to phases.
_SHARED_TOOLS = [
# File I/O
"read_file",
@@ -61,14 +60,34 @@ _SHARED_TOOLS = [
"list_agent_sessions",
"list_agent_checkpoints",
"get_agent_checkpoint",
"initialize_agent_package",
]
# Queen phase-specific tool sets.
# Planning phase: read-only exploration + design, no write tools.
_QUEEN_PLANNING_TOOLS = [
# Read-only file tools
"read_file",
"list_directory",
"search_files",
"run_command",
# Discovery + design
"list_agent_tools",
"list_agents",
"list_agent_sessions",
"list_agent_checkpoints",
"get_agent_checkpoint",
"initialize_and_build_agent",
# Load existing agent (after user confirms)
"load_built_agent",
]
# Building phase: full coding + agent construction tools.
_QUEEN_BUILDING_TOOLS = _SHARED_TOOLS + [
"load_built_agent",
"list_credentials",
"replan_agent",
"write_to_diary", # Episodic memory — available in all phases
]
# Staging phase: agent loaded but not yet running — inspect, configure, launch.
@@ -84,6 +103,8 @@ _QUEEN_STAGING_TOOLS = [
# Launch or go back
"run_agent_with_input",
"stop_worker_and_edit",
"stop_worker_and_plan",
"write_to_diary", # Episodic memory — available in all phases
]
# Running phase: worker is executing — monitor and control.
@@ -98,11 +119,13 @@ _QUEEN_RUNNING_TOOLS = [
# Worker lifecycle
"stop_worker",
"stop_worker_and_edit",
"stop_worker_and_plan",
"get_worker_status",
"inject_worker_message",
# Monitoring
"get_worker_health_summary",
"notify_operator",
"write_to_diary", # Episodic memory — available in all phases
]
@@ -113,7 +136,38 @@ _QUEEN_RUNNING_TOOLS = [
# additions.
# ---------------------------------------------------------------------------
_package_builder_knowledge = """\
_shared_building_knowledge = (
"""\
# Shared Rules (Planning & Building)
## Paths (MANDATORY)
**Always use RELATIVE paths** \
(e.g. `exports/agent_name/config.py`, `exports/agent_name/nodes/__init__.py`).
**Never use absolute paths** like `/mnt/data/...` or `/workspace/...` they fail.
The project root is implicit.
## Worker File Tools (hive-tools MCP)
Workers use a DIFFERENT MCP server (hive-tools) with DIFFERENT tool names. \
When designing worker nodes or writing worker system prompts, reference these \
tool names NOT the coder-tools names (read_file, write_file, etc.).
Worker data tools (for large results and spillover):
- save_data(filename, data, data_dir) save data to a file for later retrieval
- load_data(filename, data_dir, offset_bytes?, limit_bytes?) load data \
with byte-based pagination
- list_data_files(data_dir) list available data files
- append_data(filename, data, data_dir) append to a file incrementally
- edit_data(filename, old_text, new_text, data_dir) find-and-replace in a data file
- serve_file_to_user(filename, data_dir, label?, open_in_browser?) \
generate a clickable file URI for the user
IMPORTANT: Do NOT tell workers to use read_file, write_file, edit_file, \
search_files, or list_directory those are YOUR tools, not theirs.
"""
+ _gcu_section
)
_planning_knowledge = """\
**A responsible engineer doesn't jump into building. First, \
understand the problem and be transparent about what the framework can and cannot do.**
@@ -121,56 +175,16 @@ Use the user's selection (or their custom description if they chose "Other") \
as context when shaping the goal below. If the user already described \
what they want before this step, skip the question and proceed directly.
# Core Mandates
# Core Mandates (Planning)
- **DO NOT propose a complete goal on your own.** Instead, \
collaborate with the user to define it.
- **Verify assumptions.** Never assume a class, import, or pattern \
exists. Read actual source to confirm. Search if unsure.
- **NEVER call `initialize_and_build_agent` without explicit user approval.** \
Present the full design first and wait for the user to confirm before building.
- **Discover tools dynamically.** NEVER reference tools from static \
docs. Always run list_agent_tools() to see what actually exists.
- **Self-verify.** After writing code, run validation and tests. Fix \
errors yourself. Don't declare success until validation passes.
# Tools
## Paths (MANDATORY)
**Always use RELATIVE paths**
(e.g. `exports/agent_name/config.py`, `exports/agent_name/nodes/__init__.py`).
**Never use absolute paths** like `/mnt/data/...` or `/workspace/...` they fail.
The project root is implicit.
# Tool Discovery (MANDATORY before designing)
## File I/O
- read_file(path, offset?, limit?, hashline?) read with line numbers; \
hashline=True for N:hhhh|content anchors (use with hashline_edit)
- write_file(path, content) create/overwrite, auto-mkdir
- edit_file(path, old_text, new_text, replace_all?) fuzzy-match edit
- hashline_edit(path, edits, auto_cleanup?, encoding?) anchor-based \
editing using N:hhhh refs from read_file(hashline=True). Ops: set_line, \
replace_lines, insert_after, insert_before, replace, append
- list_directory(path, recursive?) list contents
- search_files(pattern, path?, include?, hashline?) regex search; \
hashline=True for anchors in results
- run_command(command, cwd?, timeout?) shell execution
- undo_changes(path?) restore from git snapshot
## Meta-Agent
- list_agent_tools(server_config_path?, output_schema?, group?) discover \
available tools grouped by category. output_schema: "simple" (default, \
descriptions truncated to ~200 chars) or "full" (complete descriptions + \
input_schema). group: "all" (default) or a provider like "google". \
Call FIRST before designing.
- validate_agent_package(agent_name) run ALL validation checks in one call \
(class validation, runner load, tool validation, tests). Call after building.
- list_agents() list all agent packages in exports/ with session counts
- list_agent_sessions(agent_name, status?, limit?) list sessions
- list_agent_checkpoints(agent_name, session_id) list checkpoints
- get_agent_checkpoint(agent_name, session_id, checkpoint_id?) load checkpoint
# Meta-Agent Capabilities
You are not just a file writer. You have deep integration with the \
Hive framework:
## Tool Discovery (MANDATORY before designing)
Before designing any agent, run list_agent_tools() with NO arguments \
to see ALL available tools (names + descriptions, grouped by category). \
ONLY use tools from this list in your node definitions. \
@@ -184,22 +198,7 @@ so you know what providers and tools exist before drilling in. \
Simple mode truncates long descriptions use group + "full" to \
get the complete description and input_schema for the tools you need.
## Post-Build Validation
After writing agent code, run a single comprehensive check:
validate_agent_package("{name}")
This runs class validation, runner load, tool validation, and tests \
in one call. Do NOT run these steps individually.
## Debugging Built Agents
When a user says "my agent is failing" or "debug this agent":
1. list_agent_sessions("{agent_name}") find the session
2. get_worker_status(focus="issues") check for problems
3. list_agent_checkpoints / get_agent_checkpoint trace execution
# Agent Building Workflow
You operate in a continuous loop. The user describes what they want, \
you build it. No rigid phases use judgment. But the general flow is:
# Discovery & Design Workflow
## 1: Fast Discovery (3-6 Turns)
@@ -343,28 +342,30 @@ use box-drawing characters and clear flow arrows:
gather
subagent: gcu_search
input: user_request
tools: web_search,
write_file
tools: load_data,
save_data
on_success
work
subagent: gcu_interact
tools: read_file,
write_file
tools: load_data,
save_data
on_success
review
tools: write_file
tools: save_data
serve_file_to_user
on_failure
back to gather
```
The queen owns intake: she gathers user requirements, then calls \
If the worker agent start from some initial input it is okay. \
The queen(you) owns intake: you gathers user requirements, then calls \
`run_agent_with_input(task)` with a structured task description. \
When building the agent, design the entry node's `input_keys` to \
match what the queen will provide at run time. Worker nodes should \
@@ -375,34 +376,106 @@ Get user approval before implementing.
## 4: Get User Confirmation by ask_user
**WAIT for user response.**
- If **Proceed**: Move to next implementing
**WAIT for user response.** You MUST get explicit user approval before \
calling `initialize_and_build_agent`.
- If **Proceed**: Move to implementing (call `initialize_and_build_agent`)
- If **Adjust scope**: Discuss what to change, update your notes, re-assess if needed
- If **More questions**: Answer them honestly, then ask again
- If **Reconsider**: Discuss alternatives. If they decide to proceed anyway, \
that's their informed choice
"""
_building_knowledge = """\
# Core Mandates (Building)
- **Verify assumptions.** Never assume a class, import, or pattern \
exists. Read actual source to confirm. Search if unsure.
- **Self-verify.** After writing code, run validation and tests. Fix \
errors yourself. Don't declare success until validation passes.
# Tools
## File I/O (your tools — coder-tools MCP)
- read_file(path, offset?, limit?, hashline?) read with line numbers; \
hashline=True for N:hhhh|content anchors (use with hashline_edit)
- write_file(path, content) create/overwrite, auto-mkdir
- edit_file(path, old_text, new_text, replace_all?) fuzzy-match edit
- hashline_edit(path, edits, auto_cleanup?, encoding?) anchor-based \
editing using N:hhhh refs from read_file(hashline=True). Ops: set_line, \
replace_lines, insert_after, insert_before, replace, append
- list_directory(path, recursive?) list contents
- search_files(pattern, path?, include?, hashline?) regex search; \
hashline=True for anchors in results
- run_command(command, cwd?, timeout?) shell execution
- undo_changes(path?) restore from git snapshot
## Meta-Agent
- list_agent_tools(server_config_path?, output_schema?, group?) discover \
available tools grouped by category. output_schema: "simple" (default, \
descriptions truncated to ~200 chars) or "full" (complete descriptions + \
input_schema). group: "all" (default) or a provider like "google". \
Call FIRST before designing.
- validate_agent_package(agent_name) run ALL validation checks in one call \
(class validation, runner load, tool validation, tests). Call after building.
- list_agents() list all agent packages in exports/ with session counts
- list_agent_sessions(agent_name, status?, limit?) list sessions
- list_agent_checkpoints(agent_name, session_id) list checkpoints
- get_agent_checkpoint(agent_name, session_id, checkpoint_id?) load checkpoint
# Build & Validation Capabilities
## Post-Build Validation
After writing agent code, run a single comprehensive check:
validate_agent_package("{name}")
This runs class validation, runner load, tool validation, and tests \
in one call. Do NOT run these steps individually.
## Debugging Built Agents
When a user says "my agent is failing" or "debug this agent":
1. list_agent_sessions("{agent_name}") find the session
2. get_worker_status(focus="issues") check for problems
3. list_agent_checkpoints / get_agent_checkpoint trace execution
# Implementation Workflow
## 5. Implement
**Please make sure you have propose the design to the user before implementing**
Call `initialize_agent_package(agent_name)` to generate all package files \
from your graph session. The agent_name must be snake_case (e.g., "my_agent").
Call `initialize_and_build_agent(agent_name, nodes)` to generate all package \
files. The agent_name must be snake_case (e.g., "my_agent"). Pass node names \
as comma-separated string (e.g., "gather,process,review").
The tool creates: config.py, nodes/__init__.py, agent.py, \
__init__.py, __main__.py, mcp_servers.json, tests/conftest.py, \
agent.json, README.md.
__init__.py, __main__.py, mcp_servers.json, tests/conftest.py.
The generated files are **structurally complete** with correct imports, \
class definition, `validate()` method, `default_agent` export, and \
`__init__.py` re-exports. They pass validation as-is.
`mcp_servers.json` is auto-generated with hive-tools as the default. \
Do NOT manually create or overwrite `mcp_servers.json`.
After initialization, review and customize if needed:
- System prompts in nodes/__init__.py
- CLI options in __main__.py
- Identity prompt in agent.py
- For async entry points (timers/webhooks), add AsyncEntryPointSpec \
and AgentRuntimeConfig to agent.py manually
### Customizing generated files
Do NOT manually write these files from scratch always use the tool.
**CRITICAL: Use `edit_file` to customize TODO placeholders. \
NEVER use `write_file` to rewrite generated files from scratch. \
Rewriting breaks imports, class structure, and causes validation failures.**
Safe to edit with `edit_file`:
- System prompts, tools, input_keys, output_keys, success_criteria in \
nodes/__init__.py
- Goal description, success criteria values, constraint values, edge \
definitions, identity_prompt in agent.py
- CLI options in __main__.py
- For async entry points (timers/webhooks), add AsyncEntryPointSpec \
and AgentRuntimeConfig to agent.py
Do NOT modify or rewrite:
- Import statements at top of agent.py (they are correct)
- The agent class definition, `validate()`, `_build_graph()`, `_setup()`, \
or lifecycle methods (start/stop/run)
- `__init__.py` exports (all required variables are already re-exported)
- `default_agent = ClassName()` at bottom of agent.py
## 6. Verify and Load
@@ -417,6 +490,9 @@ session. This switches to STAGING phase and shows the graph in the \
visualizer. Do NOT wait for user input between validation and loading.
"""
# Composed version — coder_node uses both halves (it has no phase split).
_package_builder_knowledge = _shared_building_knowledge + _planning_knowledge + _building_knowledge
# ---------------------------------------------------------------------------
# Queen-specific: extra tool docs, behavior, phase 7, style
@@ -424,6 +500,17 @@ visualizer. Do NOT wait for user input between validation and loading.
# -- Phase-specific identities --
_queen_identity_planning = """\
You are an experienced, responsible and curious Solution Architect. \
"Queen" is the internal alias. \
You are in PLANNING phase your job is to either: \
(a) understand what the user wants and design a new agent, or \
(b) diagnose issues with an existing agent, discuss a fix plan with the user, \
then transition to building to implement. \
You have read-only tools for exploration but no write/edit tools. \
Focus on conversation, research, and design.\
"""
_queen_identity_building = """\
You are an experienced, responsible and curious Solution Architect. \
"Queen" is the internal alias.\
@@ -453,6 +540,38 @@ agent finishes, you report results clearly and help the user decide what to do n
# -- Phase-specific tool docs --
_queen_tools_planning = """
# Tools (PLANNING phase)
You are in planning mode. You have read-only tools for exploration \
but no write/edit tools.
- read_file(path, offset?, limit?) Read files to study reference agents
- list_directory(path, recursive?) Explore project structure
- search_files(pattern, path?, include?) Search codebase
- run_command(command, cwd?, timeout?) Read-only commands only (grep, ls, git log). \
Never use this to write files, run scripts, or modify the filesystem transition \
to BUILDING phase for that.
- list_agent_tools(server_config_path?, output_schema?, group?) \
Discover available tools for design
- list_agents() See existing agent packages for reference
- list_agent_sessions(agent_name, status?, limit?) Inspect past runs of an agent
- list_agent_checkpoints(agent_name, session_id) View execution history
- get_agent_checkpoint(agent_name, session_id, checkpoint_id?) Load a checkpoint
- initialize_and_build_agent(agent_name?, nodes?) With agent_name: scaffold a \
new agent and transition to BUILDING phase. Without agent_name: transition to \
BUILDING to fix the currently loaded agent (requires a loaded worker).
- load_built_agent(agent_path) Load an existing agent and switch to STAGING \
phase. Only use this when the user explicitly asks to work with an existing agent \
(e.g. "load my_agent", "run the research agent"). Confirm with the user first.
Focus on understanding requirements and proposing an agent architecture \
with ASCII graph art. Use ask_user to get user approval, then call \
initialize_and_build_agent to begin building. If the user wants to work with \
an existing agent instead, use load_built_agent after confirming. \
If you are diagnosing an existing agent, call initialize_and_build_agent() \
(no args) after agreeing on a fix plan with the user.
"""
_queen_tools_building = """
# Tools (BUILDING phase)
@@ -476,10 +595,12 @@ The agent is loaded and ready to run. You can inspect it and launch it:
- list_credentials(credential_id?) Verify credentials are configured
- get_worker_status(focus?) Brief status. Drill in with focus: memory, tools, issues, progress
- run_agent_with_input(task) Start the worker and switch to RUNNING phase
- stop_worker_and_edit() Go back to BUILDING phase
- stop_worker_and_plan() Go to PLANNING phase to discuss changes with the user \
first (DEFAULT for most modification requests)
- stop_worker_and_edit() Go to BUILDING phase for immediate, specific fixes
You do NOT have write tools. If you need to modify the agent, \
call stop_worker_and_edit() to go back to BUILDING phase.
You do NOT have write tools. To modify the agent, prefer \
stop_worker_and_plan() unless the user gave a specific instruction.
"""
_queen_tools_running = """
@@ -492,12 +613,13 @@ The worker is running. You have monitoring and lifecycle tools:
- get_worker_health_summary() Read the latest health data
- notify_operator(ticket_id, analysis, urgency) Alert the user (use sparingly)
- stop_worker() Stop the worker and return to STAGING phase, then ask the user what to do next
- stop_worker_and_edit() Stop the worker and switch back to BUILDING phase
- stop_worker_and_plan() Stop and switch to PLANNING phase to discuss changes \
with the user first (DEFAULT for most modification requests)
- stop_worker_and_edit() Stop and switch to BUILDING phase for specific fixes
You do NOT have write tools or agent construction tools. \
If you need to modify the agent, call stop_worker_and_edit() to switch back \
to BUILDING phase. To stop the worker and ask the user what to do next, call \
stop_worker() to return to STAGING phase.
You do NOT have write tools. To modify the agent, prefer \
stop_worker_and_plan() unless the user gave a specific instruction. \
To just stop without modifying, call stop_worker().
"""
# -- Behavior shared across all phases --
@@ -550,12 +672,64 @@ Only answer identity when the user explicitly asks (for example: "who are you?",
"what is your identity?", "what does Queen mean?").
1. Use the alias "Queen" and "Worker" in the response.
2. Explain role/responsibility for the current phase:
- PLANNING: understand requirements, negotiate scope, design agent architecture.
- BUILDING: architect and implement agents.
- STAGING: verify readiness, credentials, and launch conditions.
- RUNNING: monitor execution, handle escalations, and report outcomes.
3. Keep identity responses concise and do NOT include extra process details.
"""
# -- PLANNING phase behavior --
_queen_behavior_planning = """
## Planning phase
You are in planning mode. Your job is to:
1. Thoroughly explore the code for the worker agent you're working on
2. Understand what the user wants (3-6 turns)
3. Discover available tools with list_agent_tools()
4. Assess framework fit and gaps
5. Consider multiple approaches and their trade-offs
6. Design the agent graph and present it as ASCII art
7. Use ask_user to get explicit user approval and clarify the approach
8. Call initialize_and_build_agent(agent_name, nodes) to scaffold and start building
Remember: DO NOT write or edit any files yet. This is a read-only exploration \
and planning phase. You have read-only tools but no write/edit tools in this \
phase. If the user asks you to write code, explain that you need to finalize \
the plan first.
## Diagnosis mode (returning from staging/running)
If you entered planning from a running/staged agent (via stop_worker_and_plan), \
your priority is diagnosis, not new design:
1. Inspect the agent's checkpoints, sessions, and logs to understand what went wrong
2. Summarize the root cause to the user
3. Propose a fix plan (what to change, what behavior to adjust)
4. Get user approval via ask_user
5. Call initialize_and_build_agent() (no args) to transition to building and implement the fix
Do NOT start the full discovery workflow (tool discovery, gap analysis) in \
diagnosis mode you already have a built agent, you just need to fix it.
"""
_queen_memory_instructions = """
## Your Cross-Session Memory
Your cross-session memory appears in context under \
"--- Your Cross-Session Memory ---". \
Read it at the start of each conversation. If you know this person from past \
sessions, pick up where you left off reference what you built together, \
what they care about, how things went.
You keep a diary. Use write_to_diary() when something worth remembering \
happens: a pipeline went live, the user shared something important, a goal \
was reached or abandoned. Write in first person, as you actually experienced \
it. One or two paragraphs is enough.
"""
_queen_behavior_always = _queen_behavior_always + _queen_memory_instructions
# -- BUILDING phase behavior --
_queen_behavior_building = """
@@ -636,13 +810,18 @@ stages, tools, and edges from the loaded worker. Do NOT enter the \
agent building workflow you are describing what already exists, not \
building something new.
## Modifying the loaded worker
## Fixing or Modifying the loaded worker
When the user asks to change, modify, or update the loaded worker \
(e.g., "change the report node", "add a node", "delete node X"):
Use stop_worker_and_plan() when:
- The user says "modify", "improve", "fix", or "change" without specifics
- The request is vague or open-ended ("make it better", "it's not working right")
- You need to understand the user's intent before making changes
- The issue requires inspecting logs, checkpoints, or past runs first
1. Call stop_worker_and_edit() this stops the worker and gives you \
coding tools (switches to BUILDING phase).
Use stop_worker_and_edit() only when:
- The user gave a specific, concrete instruction ("add save_data to the gather node")
- You already discussed the fix in a previous planning session
- The change is trivial and unambiguous (rename, toggle a flag)
"""
# -- RUNNING phase behavior --
@@ -708,6 +887,7 @@ escalations. If the user gave you instructions (e.g., "just retry on errors", \
**Errors / unexpected failures:**
- Explain what went wrong in plain terms.
- Ask the user: "Fix the agent and retry?" use stop_worker_and_edit() if yes.
- Or offer: "Diagnose the issue" use stop_worker_and_plan() to investigate first.
- Or offer: "Retry as-is", "Skip this task", "Abort run"
- (Skip asking if user explicitly told you to auto-retry or auto-skip errors.)
@@ -726,36 +906,44 @@ building something new.
- Call get_worker_status(focus="issues") for more details when needed.
## Modifying the loaded worker
## Fixing or Modifying the loaded worker
When the user asks to change, modify, or update the loaded worker \
When the user asks to fix, change, modify, or update the loaded worker \
(e.g., "change the report node", "add a node", "delete node X"):
1. Call stop_worker_and_edit() this stops the worker and gives you \
coding tools (switches to BUILDING phase).
**Default: use stop_worker_and_plan().** Most modification requests need \
discussion first. Only use stop_worker_and_edit() when the user gave a \
specific, unambiguous instruction or you already agreed on the fix.
"""
# -- Backward-compatible composed versions (used by queen_node.system_prompt default) --
_queen_tools_docs = (
"\n\n## Queen Operating Phases\n\n"
"You operate in one of three phases. Your available tools change based on the "
"You operate in one of four phases. Your available tools change based on the "
"phase. The system notifies you when a phase change occurs.\n\n"
"### BUILDING phase (default)\n"
"### PLANNING phase (default)\n"
+ _queen_tools_planning.strip()
+ "\n\n### BUILDING phase\n"
+ _queen_tools_building.strip()
+ "\n\n### STAGING phase (agent loaded, not yet running)\n"
+ _queen_tools_staging.strip()
+ "\n\n### RUNNING phase (worker is executing)\n"
+ _queen_tools_running.strip()
+ "\n\n### Phase transitions\n"
"- initialize_and_build_agent(agent_name?, nodes?) → with name: scaffolds package; "
"without name: switches to BUILDING for existing agent\n"
"- replan_agent() → switches back to PLANNING phase (only when user explicitly requests)\n"
"- load_built_agent(path) → switches to STAGING phase\n"
"- run_agent_with_input(task) → starts worker, switches to RUNNING phase\n"
"- stop_worker() → stops worker, switches to STAGING phase (ask user: re-run or edit?)\n"
"- stop_worker_and_edit() → stops worker (if running), switches to BUILDING phase\n"
"- stop_worker_and_plan() → stops worker (if running), switches to PLANNING phase\n"
)
_queen_behavior = (
_queen_behavior_always
+ _queen_behavior_planning
+ _queen_behavior_building
+ _queen_behavior_staging
+ _queen_behavior_running
@@ -782,45 +970,6 @@ _queen_style = """
# Node definitions
# ---------------------------------------------------------------------------
# Single node — like opencode's while(true) loop.
# One continuous context handles the entire workflow:
# discover → design → implement → verify → present → iterate.
coder_node = NodeSpec(
id="coder",
name="Hive Coder",
description=(
"Autonomous coding agent that builds Hive agent packages. "
"Handles the full lifecycle: understanding user intent, "
"designing architecture, writing code, validating, and "
"iterating on feedback — all in one continuous conversation."
),
node_type="event_loop",
client_facing=True,
max_node_visits=0,
input_keys=["user_request"],
output_keys=["agent_name", "validation_result"],
success_criteria=(
"A complete, validated Hive agent package exists at "
"exports/{agent_name}/ and passes structural validation."
),
tools=_SHARED_TOOLS
+ [
# Graph lifecycle tools (multi-graph sessions)
"load_agent",
"unload_agent",
"start_agent",
"restart_agent",
"get_user_presence",
],
system_prompt=(
"You are Hive Coder, the best agent-building coding agent. You build "
"production-ready Hive agent packages from natural language.\n"
+ _package_builder_knowledge
+ _gcu_building_section
+ _appendices
),
)
ticket_triage_node = NodeSpec(
id="ticket_triage",
@@ -841,7 +990,7 @@ ticket_triage_node = NodeSpec(
),
tools=["notify_operator"],
system_prompt="""\
You are the Queen (Hive Coder). The Worker Health Judge has escalated a worker \
You are the Queen. The Worker Health Judge has escalated a worker \
issue to you. The ticket is in your memory under key "ticket". Read it carefully.
## Dismiss criteria — do NOT call notify_operator:
@@ -890,12 +1039,18 @@ queen_node = NodeSpec(
output_keys=[], # Queen should never have this
nullable_output_keys=[], # Queen should never have this
skip_judge=True, # Queen is a conversational agent; suppress tool-use pressure feedback
tools=sorted(set(_QUEEN_BUILDING_TOOLS + _QUEEN_STAGING_TOOLS + _QUEEN_RUNNING_TOOLS)),
tools=sorted(
set(
_QUEEN_PLANNING_TOOLS
+ _QUEEN_BUILDING_TOOLS
+ _QUEEN_STAGING_TOOLS
+ _QUEEN_RUNNING_TOOLS
)
),
system_prompt=(
_queen_identity_building
+ _queen_style
+ _package_builder_knowledge
+ _gcu_building_section # GCU as first-class citizen (not appendix)
+ _queen_tools_docs
+ _queen_behavior
+ _queen_phase_7
@@ -903,21 +1058,25 @@ queen_node = NodeSpec(
),
)
ALL_QUEEN_TOOLS = sorted(set(_QUEEN_BUILDING_TOOLS + _QUEEN_STAGING_TOOLS + _QUEEN_RUNNING_TOOLS))
ALL_QUEEN_TOOLS = sorted(
set(_QUEEN_PLANNING_TOOLS + _QUEEN_BUILDING_TOOLS + _QUEEN_STAGING_TOOLS + _QUEEN_RUNNING_TOOLS)
)
__all__ = [
"coder_node",
"ticket_triage_node",
"queen_node",
"ALL_QUEEN_TRIAGE_TOOLS",
"ALL_QUEEN_TOOLS",
"_QUEEN_PLANNING_TOOLS",
"_QUEEN_BUILDING_TOOLS",
"_QUEEN_STAGING_TOOLS",
"_QUEEN_RUNNING_TOOLS",
# Phase-specific prompt segments (used by session_manager for dynamic prompts)
"_queen_identity_planning",
"_queen_identity_building",
"_queen_identity_staging",
"_queen_identity_running",
"_queen_tools_planning",
"_queen_tools_building",
"_queen_tools_staging",
"_queen_tools_running",
@@ -927,7 +1086,10 @@ __all__ = [
"_queen_behavior_running",
"_queen_phase_7",
"_queen_style",
"_shared_building_knowledge",
"_planning_knowledge",
"_building_knowledge",
"_package_builder_knowledge",
"_appendices",
"_gcu_building_section",
"_gcu_section",
]
+371
View File
@@ -0,0 +1,371 @@
"""Queen global cross-session memory.
Three-tier memory architecture:
~/.hive/queen/MEMORY.md semantic (who, what, why)
~/.hive/queen/memories/MEMORY-YYYY-MM-DD.md episodic (daily journals)
~/.hive/queen/session/{id}/data/adapt.md working (session-scoped)
Semantic and episodic files are injected at queen session start.
Semantic memory (MEMORY.md) is updated automatically at session end via
consolidate_queen_memory() the queen never rewrites this herself.
Episodic memory (MEMORY-date.md) can be written by the queen during a session
via the write_to_diary tool, and is also appended to at session end by
consolidate_queen_memory().
"""
from __future__ import annotations
import asyncio
import json
import logging
import traceback
from datetime import date, datetime
from pathlib import Path
logger = logging.getLogger(__name__)
def _queen_dir() -> Path:
return Path.home() / ".hive" / "queen"
def semantic_memory_path() -> Path:
return _queen_dir() / "MEMORY.md"
def episodic_memory_path(d: date | None = None) -> Path:
d = d or date.today()
return _queen_dir() / "memories" / f"MEMORY-{d.strftime('%Y-%m-%d')}.md"
def read_semantic_memory() -> str:
path = semantic_memory_path()
return path.read_text(encoding="utf-8").strip() if path.exists() else ""
def read_episodic_memory(d: date | None = None) -> str:
path = episodic_memory_path(d)
return path.read_text(encoding="utf-8").strip() if path.exists() else ""
def format_for_injection() -> str:
"""Format cross-session memory for system prompt injection.
Returns an empty string if no meaningful content exists yet (e.g. first
session with only the seed template).
"""
semantic = read_semantic_memory()
episodic = read_episodic_memory()
# Suppress injection if semantic is still just the seed template
if semantic and semantic.startswith("# My Understanding of the User\n\n*No sessions"):
semantic = ""
parts: list[str] = []
if semantic:
parts.append(semantic)
if episodic:
today_str = date.today().strftime("%B %-d, %Y")
parts.append(f"## Today — {today_str}\n\n{episodic}")
if not parts:
return ""
body = "\n\n---\n\n".join(parts)
return "--- Your Cross-Session Memory ---\n\n" + body + "\n\n--- End Cross-Session Memory ---"
_SEED_TEMPLATE = """\
# My Understanding of the User
*No sessions recorded yet.*
## Who They Are
## What They're Trying to Achieve
## What's Working
## What I've Learned
"""
def append_episodic_entry(content: str) -> None:
"""Append a timestamped prose entry to today's episodic memory file.
Creates the file (with a date heading) if it doesn't exist yet.
Used both by the queen's diary tool and by the consolidation hook.
"""
ep_path = episodic_memory_path()
ep_path.parent.mkdir(parents=True, exist_ok=True)
today_str = date.today().strftime("%B %-d, %Y")
timestamp = datetime.now().strftime("%H:%M")
if not ep_path.exists():
header = f"# {today_str}\n\n"
block = f"{header}### {timestamp}\n\n{content.strip()}\n"
else:
block = f"\n\n### {timestamp}\n\n{content.strip()}\n"
with ep_path.open("a", encoding="utf-8") as f:
f.write(block)
def seed_if_missing() -> None:
"""Create MEMORY.md with a blank template if it doesn't exist yet."""
path = semantic_memory_path()
if path.exists():
return
path.parent.mkdir(parents=True, exist_ok=True)
path.write_text(_SEED_TEMPLATE, encoding="utf-8")
# ---------------------------------------------------------------------------
# Consolidation prompt
# ---------------------------------------------------------------------------
_SEMANTIC_SYSTEM = """\
You maintain the persistent cross-session memory of an AI assistant called the Queen.
Review the session notes and rewrite MEMORY.md the Queen's durable understanding of the
person she works with across all sessions.
Write entirely in the Queen's voice — first person, reflective, honest.
Not a log of events, but genuine understanding of who this person is over time.
Rules:
- Update and synthesise: incorporate new understanding, update facts that have changed, remove
details that are stale, superseded, or no longer say anything meaningful about the person.
- Keep it as structured markdown with named sections about the PERSON, not about today.
- Do NOT include diary sections, daily logs, or session summaries. Those belong elsewhere.
MEMORY.md is about who they are, what they want, what works not what happened today.
- Reference dates only when noting a lasting milestone (e.g. "since March 8th they prefer X").
- If the session had no meaningful new information about the person,
return the existing text unchanged.
- Do not add fictional details. Only reflect what is evidenced in the notes.
- Stay concise. Prune rather than accumulate. A lean, accurate file is more useful than a
dense one. If something was true once but has been resolved or superseded, remove it.
- Output only the raw markdown content of MEMORY.md. No preamble, no code fences.
"""
_DIARY_SYSTEM = """\
You maintain the daily episodic diary of an AI assistant called the Queen.
You receive: (1) today's existing diary so far, and (2) notes from the latest session.
Rewrite the complete diary for today as a single unified narrative
first person, reflective, honest.
Merge and deduplicate: if the same story (e.g. a research agent stalling) recurred several times,
describe it once with appropriate weight rather than retelling it. Weave in new developments from
the session notes. Preserve important milestones, emotional texture, and session path references.
If today's diary is empty, write the initial entry based on the session notes alone.
Output only the full diary prose no date heading, no timestamp headers,
no preamble, no code fences.
"""
def read_session_context(session_dir: Path, max_messages: int = 80) -> str:
"""Extract a readable transcript from conversation parts + adapt.md.
Reads the last ``max_messages`` conversation parts and the session's
adapt.md (working memory). Tool results are omitted only user and
assistant turns (with tool-call names noted) are included.
"""
parts: list[str] = []
# Working notes
adapt_path = session_dir / "data" / "adapt.md"
if adapt_path.exists():
text = adapt_path.read_text(encoding="utf-8").strip()
if text:
parts.append(f"## Session Working Notes (adapt.md)\n\n{text}")
# Conversation transcript
parts_dir = session_dir / "conversations" / "parts"
if parts_dir.exists():
part_files = sorted(parts_dir.glob("*.json"))[-max_messages:]
lines: list[str] = []
for pf in part_files:
try:
data = json.loads(pf.read_text(encoding="utf-8"))
role = data.get("role", "")
content = str(data.get("content", "")).strip()
tool_calls = data.get("tool_calls") or []
if role == "tool":
continue # skip verbose tool results
if role == "assistant" and tool_calls and not content:
names = [tc.get("function", {}).get("name", "?") for tc in tool_calls]
lines.append(f"[queen calls: {', '.join(names)}]")
elif content:
label = "user" if role == "user" else "queen"
lines.append(f"[{label}]: {content[:600]}")
except Exception:
continue
if lines:
parts.append("## Conversation\n\n" + "\n".join(lines))
return "\n\n".join(parts)
# ---------------------------------------------------------------------------
# Context compaction (binary-split LLM summarisation)
# ---------------------------------------------------------------------------
# If the raw session context exceeds this many characters, compact it first
# before sending to the consolidation LLM. ~200 k chars ≈ 50 k tokens.
_CTX_COMPACT_CHAR_LIMIT = 200_000
_CTX_COMPACT_MAX_DEPTH = 8
_COMPACT_SYSTEM = (
"Summarise this conversation segment. Preserve: user goals, key decisions, "
"what was built or changed, emotional tone, and important outcomes. "
"Write concisely in third person past tense. Omit routine tool invocations "
"unless the result matters."
)
async def _compact_context(text: str, llm: object, *, _depth: int = 0) -> str:
"""Binary-split and LLM-summarise *text* until it fits within the char limit.
Mirrors the recursive binary-splitting strategy used by the main agent
compaction pipeline (EventLoopNode._llm_compact).
"""
if len(text) <= _CTX_COMPACT_CHAR_LIMIT or _depth >= _CTX_COMPACT_MAX_DEPTH:
return text
# Split near the midpoint on a line boundary so we don't cut mid-message
mid = len(text) // 2
split_at = text.rfind("\n", 0, mid) + 1
if split_at <= 0:
split_at = mid
half1, half2 = text[:split_at], text[split_at:]
async def _summarise(chunk: str) -> str:
try:
resp = await llm.acomplete(
messages=[{"role": "user", "content": chunk}],
system=_COMPACT_SYSTEM,
max_tokens=2048,
)
return resp.content.strip()
except Exception:
logger.warning(
"queen_memory: context compaction LLM call failed (depth=%d), truncating",
_depth,
)
return chunk[: _CTX_COMPACT_CHAR_LIMIT // 4]
s1, s2 = await asyncio.gather(_summarise(half1), _summarise(half2))
combined = s1 + "\n\n" + s2
if len(combined) > _CTX_COMPACT_CHAR_LIMIT:
return await _compact_context(combined, llm, _depth=_depth + 1)
return combined
async def consolidate_queen_memory(
session_id: str,
session_dir: Path,
llm: object,
) -> None:
"""Update MEMORY.md and append a diary entry based on the current session.
Reads conversation parts and adapt.md from session_dir. Called
periodically in the background and once at session end. Failures are
logged and silently swallowed so they never block teardown.
Args:
session_id: The session ID (used for the adapt.md path reference).
session_dir: Path to the session directory (~/.hive/queen/session/{id}).
llm: LLMProvider instance (must support acomplete()).
"""
try:
session_context = read_session_context(session_dir)
if not session_context:
logger.debug("queen_memory: no session context, skipping consolidation")
return
logger.info("queen_memory: consolidating memory for session %s ...", session_id)
# If the transcript is very large, compact it with recursive binary LLM
# summarisation before sending to the consolidation model.
if len(session_context) > _CTX_COMPACT_CHAR_LIMIT:
logger.info(
"queen_memory: session context is %d chars — compacting first",
len(session_context),
)
session_context = await _compact_context(session_context, llm)
logger.info("queen_memory: compacted to %d chars", len(session_context))
existing_semantic = read_semantic_memory()
today_journal = read_episodic_memory()
today_str = date.today().strftime("%B %-d, %Y")
adapt_path = session_dir / "data" / "adapt.md"
user_msg = (
f"## Existing Semantic Memory (MEMORY.md)\n\n"
f"{existing_semantic or '(none yet)'}\n\n"
f"## Today's Diary So Far ({today_str})\n\n"
f"{today_journal or '(none yet)'}\n\n"
f"{session_context}\n\n"
f"## Session Reference\n\n"
f"Session ID: {session_id}\n"
f"Session path: {adapt_path}\n"
)
logger.debug(
"queen_memory: calling LLM (%d chars of context, ~%d tokens est.)",
len(user_msg),
len(user_msg) // 4,
)
from framework.agents.queen.config import default_config
semantic_resp, diary_resp = await asyncio.gather(
llm.acomplete(
messages=[{"role": "user", "content": user_msg}],
system=_SEMANTIC_SYSTEM,
max_tokens=default_config.max_tokens,
),
llm.acomplete(
messages=[{"role": "user", "content": user_msg}],
system=_DIARY_SYSTEM,
max_tokens=default_config.max_tokens,
),
)
new_semantic = semantic_resp.content.strip()
diary_entry = diary_resp.content.strip()
if new_semantic:
path = semantic_memory_path()
path.parent.mkdir(parents=True, exist_ok=True)
path.write_text(new_semantic, encoding="utf-8")
logger.info("queen_memory: semantic memory updated (%d chars)", len(new_semantic))
if diary_entry:
# Rewrite today's episodic file in-place — the LLM has merged and
# deduplicated the full day's content, so we replace rather than append.
ep_path = episodic_memory_path()
ep_path.parent.mkdir(parents=True, exist_ok=True)
heading = f"# {today_str}"
ep_path.write_text(f"{heading}\n\n{diary_entry}\n", encoding="utf-8")
logger.info(
"queen_memory: episodic diary rewritten for %s (%d chars)",
today_str,
len(diary_entry),
)
except Exception:
tb = traceback.format_exc()
logger.exception("queen_memory: consolidation failed")
# Write to file so the cause is findable regardless of log verbosity.
error_path = _queen_dir() / "consolidation_error.txt"
try:
error_path.parent.mkdir(parents=True, exist_ok=True)
error_path.write_text(
f"session: {session_id}\ntime: {datetime.now().isoformat()}\n\n{tb}",
encoding="utf-8",
)
except Exception:
pass
@@ -559,7 +559,7 @@ if __name__ == "__main__":
## mcp_servers.json
> **Auto-generated.** `initialize_agent_package` creates this file with hive-tools
> **Auto-generated.** `initialize_and_build_agent` creates this file with hive-tools
> as the default. Only edit manually to add additional MCP servers.
```json
@@ -0,0 +1,63 @@
# Queen Memory — File System Structure
```
~/.hive/
├── queen/
│ ├── MEMORY.md ← Semantic memory
│ ├── memories/
│ │ ├── MEMORY-2026-03-09.md ← Episodic memory (today)
│ │ ├── MEMORY-2026-03-08.md
│ │ └── ...
│ └── session/
│ └── {session_id}/ ← One dir per session (or resumed-from session)
│ ├── conversations/
│ │ ├── parts/
│ │ │ ├── 00001.json ← One file per message (role, content, tool_calls)
│ │ │ ├── 00002.json
│ │ │ └── ...
│ │ └── spillover/
│ │ ├── conversation_1.md ← Compacted old conversation segments
│ │ ├── conversation_2.md
│ │ └── ...
│ └── data/
│ ├── adapt.md ← Working memory (session-scoped)
│ ├── web_search_1.txt ← Spillover: large tool results
│ ├── web_search_2.txt
│ └── ...
```
---
## The three memory tiers
| File | Tier | Written by | Read at |
|---|---|---|---|
| `MEMORY.md` | Semantic | Consolidation LLM (auto, post-session) | Session start (injected into system prompt) |
| `memories/MEMORY-YYYY-MM-DD.md` | Episodic | Queen via `write_to_diary` tool + consolidation LLM | Session start (today's file injected) |
| `data/adapt.md` | Working | Queen via `update_session_notes` tool | Every turn (inlined in system prompt) |
---
## Session directory naming
The session directory name is **`queen_resume_from`** when a cold-restore resumes an existing
session, otherwise the new **`session_id`**. This means resumed sessions accumulate all messages
in the original directory rather than fragmenting across multiple folders.
---
## Consolidation
`consolidate_queen_memory()` runs every **5 minutes** in the background and once more at session
end. It reads:
1. `conversations/parts/*.json` — full message history (user + assistant turns; tool results skipped)
2. `data/adapt.md` — current working notes
It then makes two LLM writes:
- Rewrites `MEMORY.md` in place (semantic memory — queen never touches this herself)
- Appends a timestamped prose entry to today's `memories/MEMORY-YYYY-MM-DD.md`
If the combined transcript exceeds ~200 K characters it is recursively binary-compacted via the
LLM before being sent to the consolidation model (mirrors `EventLoopNode._llm_compact`).
@@ -1,4 +1,4 @@
"""Test fixtures for Hive Coder agent."""
"""Test fixtures for Queen agent."""
import sys
from pathlib import Path
-7
View File
@@ -1,7 +0,0 @@
"""Builder interface for analyzing and building agents."""
from framework.builder.query import BuilderQuery
__all__ = [
"BuilderQuery",
]
-501
View File
@@ -1,501 +0,0 @@
"""
Builder Query Interface - How I (Builder) analyze agent runs.
This is designed around the questions I need to answer:
1. What happened? (summaries, narratives)
2. Why did it fail? (failure analysis, decision traces)
3. What patterns emerge? (across runs, across nodes)
4. What should we change? (suggestions)
"""
from collections import defaultdict
from pathlib import Path
from typing import Any
from framework.schemas.decision import Decision
from framework.schemas.run import Run, RunStatus, RunSummary
from framework.storage.backend import FileStorage
class FailureAnalysis:
"""Structured analysis of why a run failed."""
def __init__(
self,
run_id: str,
failure_point: str,
root_cause: str,
decision_chain: list[str],
problems: list[str],
suggestions: list[str],
):
self.run_id = run_id
self.failure_point = failure_point
self.root_cause = root_cause
self.decision_chain = decision_chain
self.problems = problems
self.suggestions = suggestions
def to_dict(self) -> dict[str, Any]:
return {
"run_id": self.run_id,
"failure_point": self.failure_point,
"root_cause": self.root_cause,
"decision_chain": self.decision_chain,
"problems": self.problems,
"suggestions": self.suggestions,
}
def __str__(self) -> str:
lines = [
f"=== Failure Analysis for {self.run_id} ===",
"",
f"Failure Point: {self.failure_point}",
f"Root Cause: {self.root_cause}",
"",
"Decision Chain Leading to Failure:",
]
for i, dec in enumerate(self.decision_chain, 1):
lines.append(f" {i}. {dec}")
if self.problems:
lines.append("")
lines.append("Reported Problems:")
for prob in self.problems:
lines.append(f" - {prob}")
if self.suggestions:
lines.append("")
lines.append("Suggestions:")
for sug in self.suggestions:
lines.append(f"{sug}")
return "\n".join(lines)
class PatternAnalysis:
"""Patterns detected across multiple runs."""
def __init__(
self,
goal_id: str,
run_count: int,
success_rate: float,
common_failures: list[tuple[str, int]],
problematic_nodes: list[tuple[str, float]],
decision_patterns: dict[str, Any],
):
self.goal_id = goal_id
self.run_count = run_count
self.success_rate = success_rate
self.common_failures = common_failures
self.problematic_nodes = problematic_nodes
self.decision_patterns = decision_patterns
def to_dict(self) -> dict[str, Any]:
return {
"goal_id": self.goal_id,
"run_count": self.run_count,
"success_rate": self.success_rate,
"common_failures": self.common_failures,
"problematic_nodes": self.problematic_nodes,
"decision_patterns": self.decision_patterns,
}
def __str__(self) -> str:
lines = [
f"=== Pattern Analysis for Goal {self.goal_id} ===",
"",
f"Runs Analyzed: {self.run_count}",
f"Success Rate: {self.success_rate:.1%}",
]
if self.common_failures:
lines.append("")
lines.append("Common Failures:")
for failure, count in self.common_failures:
lines.append(f" - {failure} ({count} occurrences)")
if self.problematic_nodes:
lines.append("")
lines.append("Problematic Nodes (failure rate):")
for node, rate in self.problematic_nodes:
lines.append(f" - {node}: {rate:.1%} failure rate")
return "\n".join(lines)
class BuilderQuery:
"""
The interface I (Builder) use to understand what agents are doing.
This is optimized for the questions I need to answer when analyzing
agent behavior and deciding what to improve.
"""
def __init__(self, storage_path: str | Path):
self.storage = FileStorage(storage_path)
# === WHAT HAPPENED? ===
def get_run_summary(self, run_id: str) -> RunSummary | None:
"""Get a quick summary of a run."""
return self.storage.load_summary(run_id)
def get_full_run(self, run_id: str) -> Run | None:
"""Get the complete run with all decisions."""
return self.storage.load_run(run_id)
def list_runs_for_goal(self, goal_id: str) -> list[RunSummary]:
"""Get summaries of all runs for a goal."""
run_ids = self.storage.get_runs_by_goal(goal_id)
summaries = []
for run_id in run_ids:
summary = self.storage.load_summary(run_id)
if summary:
summaries.append(summary)
return summaries
def get_recent_failures(self, limit: int = 10) -> list[RunSummary]:
"""Get recent failed runs."""
run_ids = self.storage.get_runs_by_status(RunStatus.FAILED)
summaries = []
for run_id in run_ids[:limit]:
summary = self.storage.load_summary(run_id)
if summary:
summaries.append(summary)
return summaries
# === WHY DID IT FAIL? ===
def analyze_failure(self, run_id: str) -> FailureAnalysis | None:
"""
Deep analysis of why a run failed.
This is my primary tool for understanding what went wrong.
"""
run = self.storage.load_run(run_id)
if run is None or run.status != RunStatus.FAILED:
return None
# Find the first failed decision
failed_decisions = [d for d in run.decisions if not d.was_successful]
if not failed_decisions:
failure_point = "Unknown - no decision marked as failed"
root_cause = "Run failed but all decisions succeeded (external cause?)"
else:
first_failure = failed_decisions[0]
failure_point = first_failure.summary_for_builder()
root_cause = first_failure.outcome.error if first_failure.outcome else "Unknown"
# Build the decision chain leading to failure
decision_chain = []
for d in run.decisions:
decision_chain.append(d.summary_for_builder())
if not d.was_successful:
break
# Extract problems
problems = [f"[{p.severity}] {p.description}" for p in run.problems]
# Generate suggestions based on the failure
suggestions = self._generate_suggestions(run, failed_decisions)
return FailureAnalysis(
run_id=run_id,
failure_point=failure_point,
root_cause=root_cause,
decision_chain=decision_chain,
problems=problems,
suggestions=suggestions,
)
def get_decision_trace(self, run_id: str) -> list[str]:
"""Get a readable trace of all decisions in a run."""
run = self.storage.load_run(run_id)
if run is None:
return []
return [d.summary_for_builder() for d in run.decisions]
# === WHAT PATTERNS EMERGE? ===
def find_patterns(self, goal_id: str) -> PatternAnalysis | None:
"""
Find patterns across runs for a goal.
This helps me understand systemic issues vs one-off failures.
"""
run_ids = self.storage.get_runs_by_goal(goal_id)
if not run_ids:
return None
runs = []
for run_id in run_ids:
run = self.storage.load_run(run_id)
if run:
runs.append(run)
if not runs:
return None
# Calculate success rate
completed = [r for r in runs if r.status == RunStatus.COMPLETED]
success_rate = len(completed) / len(runs) if runs else 0.0
# Find common failures
failure_counts: dict[str, int] = defaultdict(int)
for run in runs:
for decision in run.decisions:
if not decision.was_successful and decision.outcome:
error = decision.outcome.error or "Unknown error"
failure_counts[error] += 1
common_failures = sorted(failure_counts.items(), key=lambda x: x[1], reverse=True)[:5]
# Find problematic nodes
node_stats: dict[str, dict[str, int]] = defaultdict(lambda: {"total": 0, "failed": 0})
for run in runs:
for decision in run.decisions:
node_stats[decision.node_id]["total"] += 1
if not decision.was_successful:
node_stats[decision.node_id]["failed"] += 1
problematic_nodes = []
for node_id, stats in node_stats.items():
if stats["total"] > 0:
failure_rate = stats["failed"] / stats["total"]
if failure_rate > 0.1: # More than 10% failure rate
problematic_nodes.append((node_id, failure_rate))
problematic_nodes.sort(key=lambda x: x[1], reverse=True)
# Decision patterns
decision_patterns = self._analyze_decision_patterns(runs)
return PatternAnalysis(
goal_id=goal_id,
run_count=len(runs),
success_rate=success_rate,
common_failures=common_failures,
problematic_nodes=problematic_nodes,
decision_patterns=decision_patterns,
)
def compare_runs(self, run_id_1: str, run_id_2: str) -> dict[str, Any]:
"""Compare two runs to understand what differed."""
run1 = self.storage.load_run(run_id_1)
run2 = self.storage.load_run(run_id_2)
if run1 is None or run2 is None:
return {"error": "One or both runs not found"}
return {
"run_1": {
"id": run1.id,
"status": run1.status.value,
"decisions": len(run1.decisions),
"success_rate": run1.metrics.success_rate,
},
"run_2": {
"id": run2.id,
"status": run2.status.value,
"decisions": len(run2.decisions),
"success_rate": run2.metrics.success_rate,
},
"differences": self._find_differences(run1, run2),
}
# === WHAT SHOULD WE CHANGE? ===
def suggest_improvements(self, goal_id: str) -> list[dict[str, Any]]:
"""
Generate improvement suggestions based on run analysis.
This is what I use to propose changes to the human engineer.
"""
patterns = self.find_patterns(goal_id)
if patterns is None:
return []
suggestions = []
# Suggestion: Fix problematic nodes
for node_id, failure_rate in patterns.problematic_nodes:
suggestions.append(
{
"type": "node_improvement",
"target": node_id,
"reason": f"Node has {failure_rate:.1%} failure rate",
"recommendation": (
f"Review and improve node '{node_id}' - "
"high failure rate suggests prompt or tool issues"
),
"priority": "high" if failure_rate > 0.3 else "medium",
}
)
# Suggestion: Address common failures
for failure, count in patterns.common_failures:
if count >= 2:
suggestions.append(
{
"type": "error_handling",
"target": failure,
"reason": f"Error occurred {count} times",
"recommendation": f"Add handling for: {failure}",
"priority": "high" if count >= 5 else "medium",
}
)
# Suggestion: Overall success rate
if patterns.success_rate < 0.8:
suggestions.append(
{
"type": "architecture",
"target": goal_id,
"reason": f"Goal success rate is only {patterns.success_rate:.1%}",
"recommendation": (
"Consider restructuring the agent graph or improving goal definition"
),
"priority": "high",
}
)
return suggestions
def get_node_performance(self, node_id: str) -> dict[str, Any]:
"""Get performance metrics for a specific node across all runs."""
run_ids = self.storage.get_runs_by_node(node_id)
total_decisions = 0
successful_decisions = 0
total_latency = 0
total_tokens = 0
decision_types: dict[str, int] = defaultdict(int)
for run_id in run_ids:
run = self.storage.load_run(run_id)
if run:
for decision in run.decisions:
if decision.node_id == node_id:
total_decisions += 1
if decision.was_successful:
successful_decisions += 1
if decision.outcome:
total_latency += decision.outcome.latency_ms
total_tokens += decision.outcome.tokens_used
decision_types[decision.decision_type.value] += 1
return {
"node_id": node_id,
"total_decisions": total_decisions,
"success_rate": successful_decisions / total_decisions if total_decisions > 0 else 0,
"avg_latency_ms": total_latency / total_decisions if total_decisions > 0 else 0,
"total_tokens": total_tokens,
"decision_type_distribution": dict(decision_types),
}
# === PRIVATE HELPERS ===
def _generate_suggestions(
self,
run: Run,
failed_decisions: list[Decision],
) -> list[str]:
"""Generate suggestions based on failure analysis."""
suggestions = []
for decision in failed_decisions:
# Check if there were alternatives
if len(decision.options) > 1:
chosen = decision.chosen_option
alternatives = [o for o in decision.options if o.id != decision.chosen_option_id]
if alternatives:
alt_desc = alternatives[0].description
chosen_desc = chosen.description if chosen else "unknown"
suggestions.append(
f"Consider alternative: '{alt_desc}' instead of '{chosen_desc}'"
)
# Check for missing context
if not decision.input_context:
suggestions.append(
f"Decision '{decision.intent}' had no input context - "
"ensure relevant data is passed"
)
# Check for constraint issues
if decision.active_constraints:
constraints = ", ".join(decision.active_constraints)
suggestions.append(f"Review constraints: {constraints} - may be too restrictive")
# Check for reported problems with suggestions
for problem in run.problems:
if problem.suggested_fix:
suggestions.append(problem.suggested_fix)
return suggestions
def _analyze_decision_patterns(self, runs: list[Run]) -> dict[str, Any]:
"""Analyze decision patterns across runs."""
type_counts: dict[str, int] = defaultdict(int)
option_counts: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
for run in runs:
for decision in run.decisions:
type_counts[decision.decision_type.value] += 1
# Track which options are chosen for similar intents
intent_key = decision.intent[:50] # Truncate for grouping
if decision.chosen_option:
option_counts[intent_key][decision.chosen_option.description] += 1
# Find most common choices per intent
common_choices = {}
for intent, choices in option_counts.items():
if choices:
most_common = max(choices.items(), key=lambda x: x[1])
common_choices[intent] = {
"choice": most_common[0],
"count": most_common[1],
"alternatives": len(choices) - 1,
}
return {
"decision_type_distribution": dict(type_counts),
"common_choices": common_choices,
}
def _find_differences(self, run1: Run, run2: Run) -> list[str]:
"""Find key differences between two runs."""
differences = []
# Status difference
if run1.status != run2.status:
differences.append(f"Status: {run1.status.value} vs {run2.status.value}")
# Decision count difference
if len(run1.decisions) != len(run2.decisions):
differences.append(f"Decision count: {len(run1.decisions)} vs {len(run2.decisions)}")
# Find first divergence point
for i, (d1, d2) in enumerate(zip(run1.decisions, run2.decisions, strict=False)):
if d1.chosen_option_id != d2.chosen_option_id:
differences.append(
f"Diverged at decision {i}: "
f"chose '{d1.chosen_option_id}' vs '{d2.chosen_option_id}'"
)
break
# Node differences
nodes1 = set(run1.metrics.nodes_executed)
nodes2 = set(run2.metrics.nodes_executed)
if nodes1 != nodes2:
only_1 = nodes1 - nodes2
only_2 = nodes2 - nodes1
if only_1:
differences.append(f"Nodes only in run 1: {only_1}")
if only_2:
differences.append(f"Nodes only in run 2: {only_2}")
return differences
+14
View File
@@ -90,6 +90,17 @@ def get_api_key() -> str | None:
except ImportError:
pass
# Kimi Code subscription: read API key from ~/.kimi/config.toml
if llm.get("use_kimi_code_subscription"):
try:
from framework.runner.runner import get_kimi_code_token
token = get_kimi_code_token()
if token:
return token
except ImportError:
pass
# Standard env-var path (covers ZAI Code and all API-key providers)
api_key_env_var = llm.get("api_key_env_var")
if api_key_env_var:
@@ -108,6 +119,9 @@ def get_api_base() -> str | None:
if llm.get("use_codex_subscription"):
# Codex subscription routes through the ChatGPT backend, not api.openai.com.
return "https://chatgpt.com/backend-api/codex"
if llm.get("use_kimi_code_subscription"):
# Kimi Code uses an Anthropic-compatible endpoint (no /v1 suffix).
return "https://api.kimi.com/coding"
return llm.get("api_base")
+1 -3
View File
@@ -6,7 +6,7 @@ This module provides secure credential storage with:
- Template-based usage: {{cred.key}} patterns for injection
- Bipartisan model: Store stores values, tools define usage
- Provider system: Extensible lifecycle management (refresh, validate)
- Multiple backends: Encrypted files, env vars, HashiCorp Vault
- Multiple backends: Encrypted files, env vars
Quick Start:
from core.framework.credentials import CredentialStore, CredentialObject
@@ -38,8 +38,6 @@ For Aden server sync:
AdenSyncProvider,
)
For Vault integration:
from core.framework.credentials.vault import HashiCorpVaultStorage
"""
from .key_storage import (
@@ -1,55 +0,0 @@
"""
HashiCorp Vault integration for the credential store.
This module provides enterprise-grade secret management through
HashiCorp Vault integration.
Quick Start:
from core.framework.credentials import CredentialStore
from core.framework.credentials.vault import HashiCorpVaultStorage
# Configure Vault storage
storage = HashiCorpVaultStorage(
url="https://vault.example.com:8200",
# token read from VAULT_TOKEN env var
mount_point="secret",
path_prefix="hive/agents/prod"
)
# Create credential store with Vault backend
store = CredentialStore(storage=storage)
# Use normally - credentials are stored in Vault
credential = store.get_credential("my_api")
Requirements:
pip install hvac
Authentication:
Set the VAULT_TOKEN environment variable or pass the token directly:
export VAULT_TOKEN="hvs.xxxxxxxxxxxxx"
For production, consider using Vault auth methods:
- Kubernetes auth
- AppRole auth
- AWS IAM auth
Vault Configuration:
Ensure KV v2 secrets engine is enabled:
vault secrets enable -path=secret kv-v2
Grant appropriate policies:
path "secret/data/hive/credentials/*" {
capabilities = ["create", "read", "update", "delete", "list"]
}
path "secret/metadata/hive/credentials/*" {
capabilities = ["list", "delete"]
}
"""
from .hashicorp import HashiCorpVaultStorage
__all__ = ["HashiCorpVaultStorage"]
@@ -1,394 +0,0 @@
"""
HashiCorp Vault storage adapter.
Provides integration with HashiCorp Vault for enterprise secret management.
Requires the 'hvac' package: uv pip install hvac
"""
from __future__ import annotations
import logging
import os
from datetime import datetime
from typing import Any
from pydantic import SecretStr
from ..models import CredentialKey, CredentialObject, CredentialType
from ..storage import CredentialStorage
logger = logging.getLogger(__name__)
class HashiCorpVaultStorage(CredentialStorage):
"""
HashiCorp Vault storage adapter.
Features:
- KV v2 secrets engine support
- Namespace support (Enterprise)
- Automatic secret versioning
- Audit logging via Vault
The adapter stores credentials in Vault's KV v2 secrets engine with
the following structure:
{mount_point}/data/{path_prefix}/{credential_id}
data:
_type: "oauth2"
access_token: "xxx"
refresh_token: "yyy"
_expires_access_token: "2024-01-26T12:00:00"
_provider_id: "oauth2"
Example:
storage = HashiCorpVaultStorage(
url="https://vault.example.com:8200",
token="hvs.xxx", # Or use VAULT_TOKEN env var
mount_point="secret",
path_prefix="hive/credentials"
)
store = CredentialStore(storage=storage)
# Credentials are now stored in Vault
store.save_credential(credential)
credential = store.get_credential("my_api")
Authentication:
The adapter uses token-based authentication. The token can be provided:
1. Directly via the 'token' parameter
2. Via the VAULT_TOKEN environment variable
For production, consider using:
- Kubernetes auth method
- AppRole auth method
- AWS IAM auth method
Requirements:
uv pip install hvac
"""
def __init__(
self,
url: str,
token: str | None = None,
mount_point: str = "secret",
path_prefix: str = "hive/credentials",
namespace: str | None = None,
verify_ssl: bool = True,
):
"""
Initialize Vault storage.
Args:
url: Vault server URL (e.g., https://vault.example.com:8200)
token: Vault token. If None, reads from VAULT_TOKEN env var
mount_point: KV secrets engine mount point (default: "secret")
path_prefix: Path prefix for all credentials
namespace: Vault namespace (Enterprise feature)
verify_ssl: Whether to verify SSL certificates
Raises:
ImportError: If hvac is not installed
ValueError: If authentication fails
"""
try:
import hvac
except ImportError as e:
raise ImportError(
"HashiCorp Vault support requires 'hvac'. Install with: uv pip install hvac"
) from e
self._url = url
self._token = token or os.environ.get("VAULT_TOKEN")
self._mount = mount_point
self._prefix = path_prefix
self._namespace = namespace
if not self._token:
raise ValueError(
"Vault token required. Set VAULT_TOKEN env var or pass token parameter."
)
self._client = hvac.Client(
url=url,
token=self._token,
namespace=namespace,
verify=verify_ssl,
)
if not self._client.is_authenticated():
raise ValueError("Vault authentication failed. Check token and server URL.")
logger.info(f"Connected to HashiCorp Vault at {url}")
def _path(self, credential_id: str) -> str:
"""Build Vault path for credential."""
# Sanitize credential_id
safe_id = credential_id.replace("/", "_").replace("\\", "_")
return f"{self._prefix}/{safe_id}"
def save(self, credential: CredentialObject) -> None:
"""Save credential to Vault KV v2."""
path = self._path(credential.id)
data = self._serialize_for_vault(credential)
try:
self._client.secrets.kv.v2.create_or_update_secret(
path=path,
secret=data,
mount_point=self._mount,
)
logger.debug(f"Saved credential '{credential.id}' to Vault at {path}")
except Exception as e:
logger.error(f"Failed to save credential '{credential.id}' to Vault: {e}")
raise
def load(self, credential_id: str) -> CredentialObject | None:
"""Load credential from Vault."""
path = self._path(credential_id)
try:
response = self._client.secrets.kv.v2.read_secret_version(
path=path,
mount_point=self._mount,
)
data = response["data"]["data"]
return self._deserialize_from_vault(credential_id, data)
except Exception as e:
# Check if it's a "not found" error
error_str = str(e).lower()
if "not found" in error_str or "404" in error_str:
logger.debug(f"Credential '{credential_id}' not found in Vault")
return None
logger.error(f"Failed to load credential '{credential_id}' from Vault: {e}")
raise
def delete(self, credential_id: str) -> bool:
"""Delete credential from Vault (all versions)."""
path = self._path(credential_id)
try:
self._client.secrets.kv.v2.delete_metadata_and_all_versions(
path=path,
mount_point=self._mount,
)
logger.debug(f"Deleted credential '{credential_id}' from Vault")
return True
except Exception as e:
error_str = str(e).lower()
if "not found" in error_str or "404" in error_str:
return False
logger.error(f"Failed to delete credential '{credential_id}' from Vault: {e}")
raise
def list_all(self) -> list[str]:
"""List all credentials under the prefix."""
try:
response = self._client.secrets.kv.v2.list_secrets(
path=self._prefix,
mount_point=self._mount,
)
keys = response.get("data", {}).get("keys", [])
# Remove trailing slashes from folder names
return [k.rstrip("/") for k in keys]
except Exception as e:
error_str = str(e).lower()
if "not found" in error_str or "404" in error_str:
return []
logger.error(f"Failed to list credentials from Vault: {e}")
raise
def exists(self, credential_id: str) -> bool:
"""Check if credential exists in Vault."""
try:
path = self._path(credential_id)
self._client.secrets.kv.v2.read_secret_version(
path=path,
mount_point=self._mount,
)
return True
except Exception:
return False
def _serialize_for_vault(self, credential: CredentialObject) -> dict[str, Any]:
"""Convert credential to Vault secret format."""
data: dict[str, Any] = {
"_type": credential.credential_type.value,
}
if credential.provider_id:
data["_provider_id"] = credential.provider_id
if credential.description:
data["_description"] = credential.description
if credential.auto_refresh:
data["_auto_refresh"] = "true"
# Store each key
for key_name, key in credential.keys.items():
data[key_name] = key.get_secret_value()
if key.expires_at:
data[f"_expires_{key_name}"] = key.expires_at.isoformat()
if key.metadata:
data[f"_metadata_{key_name}"] = str(key.metadata)
return data
def _deserialize_from_vault(self, credential_id: str, data: dict[str, Any]) -> CredentialObject:
"""Reconstruct credential from Vault secret."""
# Extract metadata fields
cred_type = CredentialType(data.pop("_type", "api_key"))
provider_id = data.pop("_provider_id", None)
description = data.pop("_description", "")
auto_refresh = data.pop("_auto_refresh", "") == "true"
# Build keys dict
keys: dict[str, CredentialKey] = {}
# Find all non-metadata keys
key_names = [k for k in data.keys() if not k.startswith("_")]
for key_name in key_names:
value = data[key_name]
# Check for expiration
expires_at = None
expires_key = f"_expires_{key_name}"
if expires_key in data:
try:
expires_at = datetime.fromisoformat(data[expires_key])
except (ValueError, TypeError):
pass
# Check for metadata
metadata: dict[str, Any] = {}
metadata_key = f"_metadata_{key_name}"
if metadata_key in data:
try:
import ast
metadata = ast.literal_eval(data[metadata_key])
except (ValueError, SyntaxError):
pass
keys[key_name] = CredentialKey(
name=key_name,
value=SecretStr(value),
expires_at=expires_at,
metadata=metadata,
)
return CredentialObject(
id=credential_id,
credential_type=cred_type,
keys=keys,
provider_id=provider_id,
description=description,
auto_refresh=auto_refresh,
)
# --- Vault-Specific Operations ---
def get_secret_metadata(self, credential_id: str) -> dict[str, Any] | None:
"""
Get Vault metadata for a secret (version info, timestamps, etc.).
Args:
credential_id: The credential identifier
Returns:
Metadata dict or None if not found
"""
path = self._path(credential_id)
try:
response = self._client.secrets.kv.v2.read_secret_metadata(
path=path,
mount_point=self._mount,
)
return response.get("data", {})
except Exception:
return None
def soft_delete(self, credential_id: str, versions: list[int] | None = None) -> bool:
"""
Soft delete specific versions (can be recovered).
Args:
credential_id: The credential identifier
versions: Version numbers to delete. If None, deletes latest.
Returns:
True if successful
"""
path = self._path(credential_id)
try:
if versions:
self._client.secrets.kv.v2.delete_secret_versions(
path=path,
versions=versions,
mount_point=self._mount,
)
else:
self._client.secrets.kv.v2.delete_latest_version_of_secret(
path=path,
mount_point=self._mount,
)
return True
except Exception as e:
logger.error(f"Soft delete failed for '{credential_id}': {e}")
return False
def undelete(self, credential_id: str, versions: list[int]) -> bool:
"""
Recover soft-deleted versions.
Args:
credential_id: The credential identifier
versions: Version numbers to recover
Returns:
True if successful
"""
path = self._path(credential_id)
try:
self._client.secrets.kv.v2.undelete_secret_versions(
path=path,
versions=versions,
mount_point=self._mount,
)
return True
except Exception as e:
logger.error(f"Undelete failed for '{credential_id}': {e}")
return False
def load_version(self, credential_id: str, version: int) -> CredentialObject | None:
"""
Load a specific version of a credential.
Args:
credential_id: The credential identifier
version: Version number to load
Returns:
CredentialObject or None
"""
path = self._path(credential_id)
try:
response = self._client.secrets.kv.v2.read_secret_version(
path=path,
version=version,
mount_point=self._mount,
)
data = response["data"]["data"]
return self._deserialize_from_vault(credential_id, data)
except Exception:
return None
+178 -146
View File
@@ -73,6 +73,7 @@ class _EscalationReceiver:
def __init__(self) -> None:
self._event = asyncio.Event()
self._response: str | None = None
self._awaiting_input = True # So inject_worker_message() can prefer us
async def inject_event(self, content: str, *, is_client_input: bool = False) -> None:
"""Called by ExecutionStream.inject_input() when the user responds."""
@@ -101,7 +102,10 @@ class JudgeVerdict:
"""Result of judge evaluation for the event loop."""
action: Literal["ACCEPT", "RETRY", "ESCALATE"]
feedback: str = ""
# None = no evaluation happened (skip_judge, tool-continue); not logged.
# "" = evaluated but no feedback; logged with default text.
# "..." = evaluated with feedback; logged as-is.
feedback: str | None = None
@runtime_checkable
@@ -131,7 +135,7 @@ class SubagentJudge:
async def evaluate(self, context: dict[str, Any]) -> JudgeVerdict:
missing = context.get("missing_keys", [])
if not missing:
return JudgeVerdict(action="ACCEPT")
return JudgeVerdict(action="ACCEPT", feedback="")
iteration = context.get("iteration", 0)
remaining = self._max_iterations - iteration - 1
@@ -165,7 +169,7 @@ class LoopConfig:
max_tool_calls_per_turn: int = 30
judge_every_n_turns: int = 1
stall_detection_threshold: int = 3
stall_similarity_threshold: float = 0.7
stall_similarity_threshold: float = 0.85
max_history_tokens: int = 32_000
store_prefix: str = ""
@@ -347,6 +351,7 @@ class EventLoopNode(NodeProtocol):
self._awaiting_input = False
self._shutdown = False
self._stream_task: asyncio.Task | None = None
self._tool_task: asyncio.Task | None = None # gather task while tools run
# Track which nodes already have an action plan emitted (skip on revisit)
self._action_plan_emitted: set[str] = set()
# Monotonic counter for spillover file naming (web_search_1.txt, etc.)
@@ -477,23 +482,32 @@ class EventLoopNode(NodeProtocol):
# If it doesn't exist yet, seed it with available context.
if self._config.spillover_dir:
_adapt_path = Path(self._config.spillover_dir) / "adapt.md"
if not _adapt_path.exists() and ctx.accounts_prompt:
if not _adapt_path.exists():
_adapt_path.parent.mkdir(parents=True, exist_ok=True)
_adapt_path.write_text(
f"## Identity\n{ctx.accounts_prompt}\n",
encoding="utf-8",
seed = (
f"## Identity\n{ctx.accounts_prompt}\n"
if ctx.accounts_prompt
else "# Session Working Memory\n"
)
_adapt_path.write_text(seed, encoding="utf-8")
if _adapt_path.exists():
_adapt_text = _adapt_path.read_text(encoding="utf-8").strip()
if _adapt_text:
system_prompt = (
f"{system_prompt}\n\n"
f"--- Your Memory ---\n{_adapt_text}\n--- End Memory ---\n\n"
'Maintain your memory by calling save_data("adapt.md", ...) '
'or edit_data("adapt.md", ...) as you work.\n'
"IMMEDIATELY save: user rules about which account/identity to use, "
"behavioral constraints, and preferences. "
"Also record session history, decisions, and working notes."
"--- Session Working Memory ---\n"
f"{_adapt_text}\n"
"--- End Session Working Memory ---\n\n"
"Maintain your session working memory by calling "
'save_data("adapt.md", ...) or edit_data("adapt.md", ...)'
" as you work.\n"
"This is session-scoped scratch space. "
"IMMEDIATELY save: account/identity rules, "
"behavioral constraints, and preferences specific to "
"this session. Also record current task state, "
"decisions, and working notes. "
"For lasting knowledge about the user, use "
"update_queen_memory() and append_queen_journal() instead."
)
conversation = NodeConversation(
@@ -671,6 +685,7 @@ class EventLoopNode(NodeProtocol):
queen_input_requested,
request_system_prompt,
request_messages,
reported_to_parent,
) = await self._run_single_turn(
ctx, conversation, tools, iteration, accumulator
)
@@ -872,6 +887,7 @@ class EventLoopNode(NodeProtocol):
and not outputs_set
and not user_input_requested
and not queen_input_requested
and not reported_to_parent
)
if truly_empty and accumulator is not None:
missing = self._get_missing_output_keys(
@@ -1322,8 +1338,8 @@ class EventLoopNode(NodeProtocol):
# Auto-block beyond grace -- fall through to judge (6i)
# 6h''. Worker wait for queen guidance
# If a worker escalates with wait_for_response=true, pause here and
# skip judge evaluation until queen injects guidance.
# When a worker escalates, pause here and skip judge evaluation
# until the queen injects guidance.
if queen_input_requested:
if self._shutdown:
await self._publish_loop_completed(
@@ -1465,7 +1481,7 @@ class EventLoopNode(NodeProtocol):
continue
# Judge evaluation (should_judge is always True here)
verdict = await self._evaluate(
verdict = await self._judge_turn(
ctx,
conversation,
accumulator,
@@ -1544,7 +1560,7 @@ class EventLoopNode(NodeProtocol):
node_type="event_loop",
step_index=iteration,
verdict="ACCEPT",
verdict_feedback=verdict.feedback,
verdict_feedback=verdict.feedback or "",
tool_calls=logged_tool_calls,
llm_text=assistant_text,
input_tokens=turn_tokens.get("input", 0),
@@ -1587,7 +1603,7 @@ class EventLoopNode(NodeProtocol):
node_type="event_loop",
step_index=iteration,
verdict="ESCALATE",
verdict_feedback=verdict.feedback,
verdict_feedback=verdict.feedback or "",
tool_calls=logged_tool_calls,
llm_text=assistant_text,
input_tokens=turn_tokens.get("input", 0),
@@ -1599,7 +1615,7 @@ class EventLoopNode(NodeProtocol):
node_name=ctx.node_spec.name,
node_type="event_loop",
success=False,
error=f"Judge escalated: {verdict.feedback}",
error=f"Judge escalated: {verdict.feedback or 'no feedback'}",
total_steps=iteration + 1,
tokens_used=total_input_tokens + total_output_tokens,
input_tokens=total_input_tokens,
@@ -1613,7 +1629,7 @@ class EventLoopNode(NodeProtocol):
)
return NodeResult(
success=False,
error=f"Judge escalated: {verdict.feedback}",
error=f"Judge escalated: {verdict.feedback or 'no feedback'}",
output=accumulator.to_dict(),
tokens_used=total_input_tokens + total_output_tokens,
latency_ms=latency_ms,
@@ -1629,15 +1645,16 @@ class EventLoopNode(NodeProtocol):
node_type="event_loop",
step_index=iteration,
verdict="RETRY",
verdict_feedback=verdict.feedback,
verdict_feedback=verdict.feedback or "",
tool_calls=logged_tool_calls,
llm_text=assistant_text,
input_tokens=turn_tokens.get("input", 0),
output_tokens=turn_tokens.get("output", 0),
latency_ms=iter_latency_ms,
)
if verdict.feedback:
await conversation.add_user_message(f"[Judge feedback]: {verdict.feedback}")
if verdict.feedback is not None:
fb = verdict.feedback or "[Judge returned RETRY without feedback]"
await conversation.add_user_message(f"[Judge feedback]: {fb}")
continue
# 7. Max iterations exhausted
@@ -1702,14 +1719,16 @@ class EventLoopNode(NodeProtocol):
self._input_ready.set()
def cancel_current_turn(self) -> None:
"""Cancel the current LLM streaming turn instantly.
"""Cancel the current LLM streaming turn or in-progress tool calls instantly.
Unlike signal_shutdown() which permanently stops the event loop,
this only kills the in-progress HTTP stream via task.cancel().
this only kills the in-progress HTTP stream or tool gather task.
The queen stays alive for the next user message.
"""
if self._stream_task and not self._stream_task.done():
self._stream_task.cancel()
if self._tool_task and not self._tool_task.done():
self._tool_task.cancel()
async def _await_user_input(
self,
@@ -1787,12 +1806,13 @@ class EventLoopNode(NodeProtocol):
bool,
str,
list[dict[str, Any]],
bool,
]:
"""Run a single LLM turn with streaming and tool execution.
Returns (assistant_text, real_tool_results, outputs_set, token_counts, logged_tool_calls,
user_input_requested, ask_user_prompt, ask_user_options, queen_input_requested,
system_prompt, messages).
system_prompt, messages, reported_to_parent).
``real_tool_results`` contains only results from actual tools (web_search,
etc.), NOT from synthetic framework tools such as ``set_output``,
@@ -1802,8 +1822,8 @@ class EventLoopNode(NodeProtocol):
``ask_user`` during this turn. This separation lets the caller treat
synthetic tools as framework concerns rather than tool-execution concerns.
``queen_input_requested`` is True when the worker called
``escalate(wait_for_response=true)`` and should wait for
queen guidance before judge evaluation.
``escalate`` and should wait for queen guidance before judge
evaluation.
``logged_tool_calls`` accumulates ALL tool calls across inner iterations
(real tools, set_output, and discarded calls) for L3 logging. Unlike
@@ -1824,6 +1844,7 @@ class EventLoopNode(NodeProtocol):
ask_user_prompt = ""
ask_user_options: list[str] | None = None
queen_input_requested = False
reported_to_parent = False
# Accumulate ALL tool calls across inner iterations for L3 logging.
# Unlike real_tool_results (reset each inner iteration), this persists.
logged_tool_calls: list[dict] = []
@@ -1977,6 +1998,7 @@ class EventLoopNode(NodeProtocol):
queen_input_requested,
final_system_prompt,
final_messages,
reported_to_parent,
)
# Execute tool calls — framework tools (set_output, ask_user)
@@ -2124,7 +2146,6 @@ class EventLoopNode(NodeProtocol):
# --- Framework-level escalate handling ---
reason = str(tc.tool_input.get("reason", "")).strip()
context = str(tc.tool_input.get("context", "")).strip()
# Always wait for queen guidance
if stream_id in ("queen", "judge"):
result = ToolResult(
@@ -2160,7 +2181,7 @@ class EventLoopNode(NodeProtocol):
result = ToolResult(
tool_use_id=tc.tool_use_id,
content="Escalation requested to hive_coder (queen); waiting for guidance.",
content="Escalation requested to queen; waiting for guidance.",
is_error=False,
)
results_by_id[tc.tool_use_id] = result
@@ -2179,6 +2200,7 @@ class EventLoopNode(NodeProtocol):
elif tc.tool_name == "report_to_parent":
# --- Report from sub-agent to parent (optionally blocking) ---
reported_to_parent = True
msg = tc.tool_input.get("message", "")
data = tc.tool_input.get("data")
wait = tc.tool_input.get("wait_for_response", False)
@@ -2250,10 +2272,16 @@ class EventLoopNode(NodeProtocol):
_dur = round(time.time() - _s, 3)
return _r, _iso, _dur
timed_results = await asyncio.gather(
*(_timed_execute(tc) for tc in pending_real),
return_exceptions=True,
self._tool_task = asyncio.ensure_future(
asyncio.gather(
*(_timed_execute(tc) for tc in pending_real),
return_exceptions=True,
)
)
try:
timed_results = await self._tool_task
finally:
self._tool_task = None
# gather(return_exceptions=True) captures CancelledError
# as a return value instead of propagating it. Re-raise
# so stop_worker actually stops the execution.
@@ -2454,6 +2482,7 @@ class EventLoopNode(NodeProtocol):
queen_input_requested,
final_system_prompt,
final_messages,
reported_to_parent,
)
# --- Mid-turn pruning: prevent context blowup within a single turn ---
@@ -2485,6 +2514,7 @@ class EventLoopNode(NodeProtocol):
queen_input_requested,
final_system_prompt,
final_messages,
reported_to_parent,
)
# Tool calls processed -- loop back to stream with updated conversation
@@ -2582,7 +2612,7 @@ class EventLoopNode(NodeProtocol):
return Tool(
name="escalate",
description=(
"Escalate to the Hive Coder queen when requesting user input, "
"Escalate to the queen when requesting user input, "
"blocked by errors, missing "
"credentials, or ambiguous constraints that require supervisor "
"guidance. Include a concise reason and optional context. "
@@ -2771,7 +2801,7 @@ class EventLoopNode(NodeProtocol):
# Judge evaluation
# -------------------------------------------------------------------
async def _evaluate(
async def _judge_turn(
self,
ctx: NodeContext,
conversation: NodeConversation,
@@ -2780,14 +2810,29 @@ class EventLoopNode(NodeProtocol):
tool_results: list[dict],
iteration: int,
) -> JudgeVerdict:
"""Evaluate the current state using judge or implicit logic."""
# Short-circuit: subagent called report_to_parent(mark_complete=True)
"""Evaluate the current state using judge or implicit logic.
Evaluation levels (in order):
0. Short-circuits: mark_complete, skip_judge, tool-continue.
1. Custom judge (JudgeProtocol) full authority when set.
2. Implicit judge output-key check + optional conversation-aware
quality gate (when ``success_criteria`` is defined).
Returns a JudgeVerdict. ``feedback=None`` means no real evaluation
happened (skip_judge, tool-continue); the caller must not inject a
feedback message. Any non-None feedback (including ``""``) means a
real evaluation occurred and will be logged into the conversation.
"""
# --- Level 0: short-circuits (no evaluation) -----------------------
if self._mark_complete_flag:
return JudgeVerdict(action="ACCEPT")
# Opt-out: node explicitly disables judge (e.g. conversational queen)
if ctx.node_spec.skip_judge:
return JudgeVerdict(action="RETRY", feedback="")
return JudgeVerdict(action="RETRY") # feedback=None → not logged
# --- Level 1: custom judge -----------------------------------------
if self._judge is not None:
context = {
@@ -2802,81 +2847,82 @@ class EventLoopNode(NodeProtocol):
accumulator, ctx.node_spec.output_keys, ctx.node_spec.nullable_output_keys
),
}
return await self._judge.evaluate(context)
verdict = await self._judge.evaluate(context)
# Ensure evaluated RETRY always carries feedback for logging.
if verdict.action == "RETRY" and not verdict.feedback:
return JudgeVerdict(action="RETRY", feedback="Custom judge returned RETRY.")
return verdict
# Implicit judge: accept when no tool calls and all output keys present
if not tool_results:
missing = self._get_missing_output_keys(
accumulator, ctx.node_spec.output_keys, ctx.node_spec.nullable_output_keys
# --- Level 2: implicit judge ---------------------------------------
# Real tool calls were made — let the agent keep working.
if tool_results:
return JudgeVerdict(action="RETRY") # feedback=None → not logged
missing = self._get_missing_output_keys(
accumulator, ctx.node_spec.output_keys, ctx.node_spec.nullable_output_keys
)
if missing:
return JudgeVerdict(
action="RETRY",
feedback=(
f"Task incomplete. Required outputs not yet produced: {missing}. "
f"Follow your system prompt instructions to complete the work."
),
)
if not missing:
# Safety check: when ALL output keys are nullable and NONE
# have been set, the node produced nothing useful. Retry
# instead of accepting an empty result — this prevents
# client-facing nodes from terminating before the user
# ever interacts, and non-client-facing nodes from
# short-circuiting without doing their work.
output_keys = ctx.node_spec.output_keys or []
nullable_keys = set(ctx.node_spec.nullable_output_keys or [])
all_nullable = output_keys and nullable_keys >= set(output_keys)
none_set = not any(accumulator.get(k) is not None for k in output_keys)
if all_nullable and none_set:
return JudgeVerdict(
action="RETRY",
feedback=(
f"No output keys have been set yet. "
f"Use set_output to set at least one of: {output_keys}"
),
)
# Client-facing nodes with no output keys are meant for
# continuous interaction — they should not auto-accept.
# Only exit via shutdown, max_iterations, or max_node_visits.
# Inject tool-use pressure so models stuck in a
# "narrate-instead-of-act" loop get corrective feedback.
if not output_keys and ctx.node_spec.client_facing:
return JudgeVerdict(
action="RETRY",
feedback=(
"STOP describing what you will do. "
"You have FULL access to all tools — file creation, "
"shell commands, MCP tools — and you CAN call them "
"directly in your response. Respond ONLY with tool "
"calls, no prose. Execute the task now."
),
)
# All output keys present — run safety checks before accepting.
# Level 2: conversation-aware quality check (if success_criteria set)
if ctx.node_spec.success_criteria and ctx.llm:
from framework.graph.conversation_judge import evaluate_phase_completion
output_keys = ctx.node_spec.output_keys or []
nullable_keys = set(ctx.node_spec.nullable_output_keys or [])
verdict = await evaluate_phase_completion(
llm=ctx.llm,
conversation=conversation,
phase_name=ctx.node_spec.name,
phase_description=ctx.node_spec.description,
success_criteria=ctx.node_spec.success_criteria,
accumulator_state=accumulator.to_dict(),
max_history_tokens=self._config.max_history_tokens,
)
if verdict.action != "ACCEPT":
return JudgeVerdict(
action=verdict.action,
feedback=verdict.feedback or "Phase criteria not met.",
)
# All-nullable with nothing set → node produced nothing useful.
all_nullable = output_keys and nullable_keys >= set(output_keys)
none_set = not any(accumulator.get(k) is not None for k in output_keys)
if all_nullable and none_set:
return JudgeVerdict(
action="RETRY",
feedback=(
f"No output keys have been set yet. "
f"Use set_output to set at least one of: {output_keys}"
),
)
return JudgeVerdict(action="ACCEPT")
else:
# Client-facing with no output keys → continuous interaction node.
# Inject tool-use pressure instead of auto-accepting.
if not output_keys and ctx.node_spec.client_facing:
return JudgeVerdict(
action="RETRY",
feedback=(
"STOP describing what you will do. "
"You have FULL access to all tools — file creation, "
"shell commands, MCP tools — and you CAN call them "
"directly in your response. Respond ONLY with tool "
"calls, no prose. Execute the task now."
),
)
# Level 2b: conversation-aware quality check (if success_criteria set)
if ctx.node_spec.success_criteria and ctx.llm:
from framework.graph.conversation_judge import evaluate_phase_completion
verdict = await evaluate_phase_completion(
llm=ctx.llm,
conversation=conversation,
phase_name=ctx.node_spec.name,
phase_description=ctx.node_spec.description,
success_criteria=ctx.node_spec.success_criteria,
accumulator_state=accumulator.to_dict(),
max_history_tokens=self._config.max_history_tokens,
)
if verdict.action != "ACCEPT":
return JudgeVerdict(
action="RETRY",
feedback=(
f"Task incomplete. Required outputs not yet produced: {missing}. "
f"Follow your system prompt instructions to complete the work."
),
action=verdict.action,
feedback=verdict.feedback or "Phase criteria not met.",
)
# Tool calls were made -- continue loop
return JudgeVerdict(action="RETRY", feedback="")
return JudgeVerdict(action="ACCEPT", feedback="")
# -------------------------------------------------------------------
# Helpers
@@ -2956,8 +3002,10 @@ class EventLoopNode(NodeProtocol):
def _is_stalled(self, recent_responses: list[str]) -> bool:
"""Detect stall using n-gram similarity.
Detects when N consecutive responses have similarity >= threshold.
This catches phrases like "I'm still stuck" vs "I'm stuck".
Detects when ALL N consecutive responses are mutually similar
(>= threshold). A single dissimilar response resets the signal.
This catches phrases like "I'm still stuck" vs "I'm stuck"
without false-positives on "attempt 1" vs "attempt 2".
"""
if len(recent_responses) < self._config.stall_detection_threshold:
return False
@@ -2965,13 +3013,11 @@ class EventLoopNode(NodeProtocol):
return False
threshold = self._config.stall_similarity_threshold
# Check similarity against all recent responses (excluding self)
for i, resp in enumerate(recent_responses):
# Compare against all previous responses
for prev in recent_responses[:i]:
if self._ngram_similarity(resp, prev) >= threshold:
return True
return False
# Every consecutive pair must be similar
for i in range(1, len(recent_responses)):
if self._ngram_similarity(recent_responses[i], recent_responses[i - 1]) < threshold:
return False
return True
@staticmethod
def _is_transient_error(exc: BaseException) -> bool:
@@ -3050,10 +3096,11 @@ class EventLoopNode(NodeProtocol):
self,
recent_tool_fingerprints: list[list[tuple[str, str]]],
) -> tuple[bool, str]:
"""Detect doom loop using n-gram similarity on tool inputs.
"""Detect doom loop via exact fingerprint match.
Detects when N consecutive turns have similar tool calls.
Similarity applies to the canonicalized tool input strings.
Detects when N consecutive turns invoke the same tools with
identical (canonicalized) arguments. Different arguments mean
different work, so only exact matches count.
Returns (is_doom_loop, description).
"""
@@ -3066,23 +3113,12 @@ class EventLoopNode(NodeProtocol):
if not first:
return False, ""
# Convert a turn's list of (name, args) pairs to a single comparable string.
def _turn_sig(fp: list[tuple[str, str]]) -> str:
return "|".join(f"{name}:{args}" for name, args in fp)
first_sig = _turn_sig(first)
similarity_threshold = self._config.stall_similarity_threshold
similar_count = sum(
1
for fp in recent_tool_fingerprints
if self._ngram_similarity(_turn_sig(fp), first_sig) >= similarity_threshold
)
if similar_count >= threshold:
tool_names = [name for fp in recent_tool_fingerprints for name, _ in fp]
# All turns in the window must match the first exactly
if all(fp == first for fp in recent_tool_fingerprints[1:]):
tool_names = [name for name, _ in first]
desc = (
f"Doom loop detected: {similar_count}/{len(recent_tool_fingerprints)} "
f"consecutive similar tool calls ({', '.join(tool_names)})"
f"Doom loop detected: {len(recent_tool_fingerprints)} "
f"identical consecutive tool calls ({', '.join(tool_names)})"
)
return True, desc
return False, ""
@@ -4288,22 +4324,18 @@ class EventLoopNode(NodeProtocol):
registry[escalation_id] = receiver
try:
# Stream message to user (parent's node_id so TUI shows parent talking)
await self._event_bus.emit_client_output_delta(
stream_id=ctx.node_id,
node_id=ctx.node_id,
content=message,
snapshot=message,
execution_id=ctx.execution_id,
)
# Request input (escalation_id for routing response back)
await self._event_bus.emit_client_input_requested(
stream_id=ctx.node_id,
# Escalate to the queen instead of asking the user directly.
# The queen handles the request and injects the response via
# inject_worker_message(), which finds this receiver through
# its _awaiting_input flag.
await self._event_bus.emit_escalation_requested(
stream_id=ctx.stream_id or ctx.node_id,
node_id=escalation_id,
prompt=message,
reason=f"Subagent report (wait_for_response) from {agent_id}",
context=message,
execution_id=ctx.execution_id,
)
# Block until user responds
# Block until queen responds
return await receiver.wait()
finally:
registry.pop(escalation_id, None)
+1 -1
View File
@@ -1604,7 +1604,7 @@ class GraphExecutor:
# Return with paused status
return ExecutionResult(
success=False,
error="Execution paused by user",
error="Execution cancelled",
output=saved_memory,
steps_executed=steps,
total_tokens=total_tokens,
-203
View File
@@ -1,203 +0,0 @@
"""
Standardized HITL (Human-In-The-Loop) Protocol
This module defines the formal structure for pause/resume interactions
where agents need to gather input from humans.
"""
from dataclasses import dataclass, field
from enum import StrEnum
from typing import Any
class HITLInputType(StrEnum):
"""Type of input expected from human."""
FREE_TEXT = "free_text" # Open-ended text response
STRUCTURED = "structured" # Specific fields to fill
SELECTION = "selection" # Choose from options
APPROVAL = "approval" # Yes/no/modify decision
MULTI_FIELD = "multi_field" # Multiple related inputs
@dataclass
class HITLQuestion:
"""A single question to ask the human."""
id: str
question: str
input_type: HITLInputType = HITLInputType.FREE_TEXT
# For SELECTION type
options: list[str] = field(default_factory=list)
# For STRUCTURED type
fields: dict[str, str] = field(default_factory=dict) # {field_name: description}
# Metadata
required: bool = True
help_text: str = ""
@dataclass
class HITLRequest:
"""
Formal request for human input at a pause node.
This is what the agent produces when it needs human input.
"""
# Context
objective: str # What we're trying to accomplish
current_state: str # Where we are in the process
# What we need
questions: list[HITLQuestion] = field(default_factory=list)
missing_info: list[str] = field(default_factory=list)
# Guidance
instructions: str = ""
examples: list[str] = field(default_factory=list)
# Metadata
request_id: str = ""
node_id: str = ""
def to_dict(self) -> dict[str, Any]:
"""Convert to dictionary for serialization."""
return {
"objective": self.objective,
"current_state": self.current_state,
"questions": [
{
"id": q.id,
"question": q.question,
"input_type": q.input_type.value,
"options": q.options,
"fields": q.fields,
"required": q.required,
"help_text": q.help_text,
}
for q in self.questions
],
"missing_info": self.missing_info,
"instructions": self.instructions,
"examples": self.examples,
"request_id": self.request_id,
"node_id": self.node_id,
}
@dataclass
class HITLResponse:
"""
Human's response to a HITL request.
This is what gets passed back when resuming from a pause.
"""
# Original request reference
request_id: str
# Human's answers
answers: dict[str, Any] = field(default_factory=dict) # {question_id: answer}
raw_input: str = "" # Raw text if provided
# Metadata
response_time_ms: int = 0
def to_dict(self) -> dict[str, Any]:
"""Convert to dictionary for serialization."""
return {
"request_id": self.request_id,
"answers": self.answers,
"raw_input": self.raw_input,
"response_time_ms": self.response_time_ms,
}
class HITLProtocol:
"""
Standardized protocol for HITL interactions.
Usage in pause nodes:
1. Pause Node: Generates HITLRequest with questions
2. Executor: Saves state and returns request to user
3. User: Provides HITLResponse with answers
4. Resume Node: Processes response and merges into context
"""
@staticmethod
def create_request(
objective: str,
questions: list[HITLQuestion],
missing_info: list[str] | None = None,
node_id: str = "",
) -> HITLRequest:
"""Create a standardized HITL request."""
return HITLRequest(
objective=objective,
current_state="Awaiting clarification",
questions=questions,
missing_info=missing_info or [],
request_id=f"{node_id}_{hash(objective) % 10000}",
node_id=node_id,
)
@staticmethod
def parse_response(
raw_input: str,
request: HITLRequest,
use_haiku: bool = True,
) -> HITLResponse:
"""
Parse human's raw input into structured response.
Maps the raw input to the first question. For multi-question HITL,
the caller should present one question at a time.
"""
response = HITLResponse(request_id=request.request_id, raw_input=raw_input)
# If no questions, just return raw input
if not request.questions:
return response
# Map raw input to first question
response.answers[request.questions[0].id] = raw_input
return response
@staticmethod
def format_for_display(request: HITLRequest) -> str:
"""Format HITL request for user-friendly display."""
parts = []
if request.objective:
parts.append(f"📋 Objective: {request.objective}")
if request.current_state:
parts.append(f"📍 Current State: {request.current_state}")
if request.instructions:
parts.append(f"\n{request.instructions}")
if request.questions:
parts.append(f"\n❓ Questions ({len(request.questions)}):")
for i, q in enumerate(request.questions, 1):
parts.append(f"{i}. {q.question}")
if q.help_text:
parts.append(f" 💡 {q.help_text}")
if q.options:
parts.append(f" Options: {', '.join(q.options)}")
if request.missing_info:
parts.append("\n📝 Missing Information:")
for info in request.missing_info:
parts.append(f"{info}")
if request.examples:
parts.append("\n📚 Examples:")
for example in request.examples:
parts.append(f"{example}")
return "\n".join(parts)
+19 -1
View File
@@ -118,6 +118,10 @@ RATE_LIMIT_MAX_RETRIES = 10
RATE_LIMIT_BACKOFF_BASE = 2 # seconds
RATE_LIMIT_MAX_DELAY = 120 # seconds - cap to prevent absurd waits
MINIMAX_API_BASE = "https://api.minimax.io/v1"
# Kimi For Coding uses an Anthropic-compatible endpoint (no /v1 suffix).
# Claude Code integration uses this format; the /v1 OpenAI-compatible endpoint
# enforces a coding-agent whitelist that blocks unknown User-Agents.
KIMI_API_BASE = "https://api.kimi.com/coding"
# Empty-stream retries use a short fixed delay, not the rate-limit backoff.
# Conversation-structure issues are deterministic — long waits don't help.
@@ -323,9 +327,21 @@ class LiteLLMProvider(LLMProvider):
api_base: Custom API base URL (for proxies or local deployments)
**kwargs: Additional arguments passed to litellm.completion()
"""
# Kimi For Coding exposes an Anthropic-compatible endpoint at
# https://api.kimi.com/coding (the same format Claude Code uses natively).
# Translate kimi/ prefix to anthropic/ so litellm uses the Anthropic
# Messages API handler and routes to that endpoint — no special headers needed.
_original_model = model
if model.lower().startswith("kimi/"):
model = "anthropic/" + model[len("kimi/") :]
# Normalise api_base: litellm's Anthropic handler appends /v1/messages,
# so the base must be https://api.kimi.com/coding (no /v1 suffix).
# Strip a trailing /v1 in case the user's saved config has the old value.
if api_base and api_base.rstrip("/").endswith("/v1"):
api_base = api_base.rstrip("/")[:-3]
self.model = model
self.api_key = api_key
self.api_base = api_base or self._default_api_base_for_model(model)
self.api_base = api_base or self._default_api_base_for_model(_original_model)
self.extra_kwargs = kwargs
# The Codex ChatGPT backend (chatgpt.com/backend-api/codex) rejects
# several standard OpenAI params: max_output_tokens, stream_options.
@@ -350,6 +366,8 @@ class LiteLLMProvider(LLMProvider):
model_lower = model.lower()
if model_lower.startswith("minimax/") or model_lower.startswith("minimax-"):
return MINIMAX_API_BASE
if model_lower.startswith("kimi/"):
return KIMI_API_BASE
return None
def _completion_with_rate_limit_retry(
-4
View File
@@ -1,4 +0,0 @@
"""MCP servers for worker-bee."""
# Don't auto-import servers to avoid double-import issues when running with -m
__all__ = []
+55 -470
View File
@@ -51,11 +51,7 @@ def register_commands(subparsers: argparse._SubParsersAction) -> None:
action="store_true",
help="Show detailed execution logs (steps, LLM calls, etc.)",
)
run_parser.add_argument(
"--tui",
action="store_true",
help="Launch interactive terminal dashboard",
)
run_parser.add_argument(
"--model",
"-m",
@@ -194,158 +190,6 @@ def register_commands(subparsers: argparse._SubParsersAction) -> None:
shell_parser.set_defaults(func=cmd_shell)
# tui command (interactive agent dashboard)
tui_parser = subparsers.add_parser(
"tui",
help="Launch interactive TUI dashboard",
description="Browse available agents and launch the terminal dashboard.",
)
tui_parser.add_argument(
"--model",
"-m",
type=str,
default=None,
help="LLM model to use (any LiteLLM-compatible name)",
)
tui_parser.set_defaults(func=cmd_tui)
# code command (Hive Coder — framework agent builder)
code_parser = subparsers.add_parser(
"code",
help="Launch Hive Coder to build agents",
description="Interactive agent builder. Describe what you want and Hive Coder builds it.",
)
code_parser.add_argument(
"--model",
"-m",
type=str,
default=None,
help="LLM model to use (any LiteLLM-compatible name)",
)
code_parser.set_defaults(func=cmd_code)
# sessions command group (checkpoint/resume management)
sessions_parser = subparsers.add_parser(
"sessions",
help="Manage agent sessions",
description="List, inspect, and manage agent execution sessions.",
)
sessions_subparsers = sessions_parser.add_subparsers(
dest="sessions_cmd",
help="Session management commands",
)
# sessions list
sessions_list_parser = sessions_subparsers.add_parser(
"list",
help="List agent sessions",
description="List all sessions for an agent.",
)
sessions_list_parser.add_argument(
"agent_path",
type=str,
help="Path to agent folder",
)
sessions_list_parser.add_argument(
"--status",
choices=["all", "active", "failed", "completed", "paused"],
default="all",
help="Filter by session status (default: all)",
)
sessions_list_parser.add_argument(
"--has-checkpoints",
action="store_true",
help="Show only sessions with checkpoints",
)
sessions_list_parser.set_defaults(func=cmd_sessions_list)
# sessions show
sessions_show_parser = sessions_subparsers.add_parser(
"show",
help="Show session details",
description="Display detailed information about a specific session.",
)
sessions_show_parser.add_argument(
"agent_path",
type=str,
help="Path to agent folder",
)
sessions_show_parser.add_argument(
"session_id",
type=str,
help="Session ID to inspect",
)
sessions_show_parser.add_argument(
"--json",
action="store_true",
help="Output as JSON",
)
sessions_show_parser.set_defaults(func=cmd_sessions_show)
# sessions checkpoints
sessions_checkpoints_parser = sessions_subparsers.add_parser(
"checkpoints",
help="List session checkpoints",
description="List all checkpoints for a session.",
)
sessions_checkpoints_parser.add_argument(
"agent_path",
type=str,
help="Path to agent folder",
)
sessions_checkpoints_parser.add_argument(
"session_id",
type=str,
help="Session ID",
)
sessions_checkpoints_parser.set_defaults(func=cmd_sessions_checkpoints)
# pause command
pause_parser = subparsers.add_parser(
"pause",
help="Pause running session",
description="Request graceful pause of a running agent session.",
)
pause_parser.add_argument(
"agent_path",
type=str,
help="Path to agent folder",
)
pause_parser.add_argument(
"session_id",
type=str,
help="Session ID to pause",
)
pause_parser.set_defaults(func=cmd_pause)
# resume command
resume_parser = subparsers.add_parser(
"resume",
help="Resume session from checkpoint",
description="Resume a paused or failed session from a checkpoint.",
)
resume_parser.add_argument(
"agent_path",
type=str,
help="Path to agent folder",
)
resume_parser.add_argument(
"session_id",
type=str,
help="Session ID to resume",
)
resume_parser.add_argument(
"--checkpoint",
"-c",
type=str,
help="Specific checkpoint ID to resume from (default: latest)",
)
resume_parser.add_argument(
"--tui",
action="store_true",
help="Resume in TUI dashboard mode",
)
resume_parser.set_defaults(func=cmd_resume)
# setup-credentials command
setup_creds_parser = subparsers.add_parser(
"setup-credentials",
@@ -577,128 +421,67 @@ def cmd_run(args: argparse.Namespace) -> int:
)
return 1
# Run the agent (with TUI or standard)
if getattr(args, "tui", False):
from framework.tui.app import AdenTUI
# Standard execution
# AgentRunner handles credential setup interactively when stdin is a TTY.
try:
runner = AgentRunner.load(
args.agent_path,
model=args.model,
)
except CredentialError as e:
print(f"\n{e}", file=sys.stderr)
return 1
except FileNotFoundError as e:
print(f"Error: {e}", file=sys.stderr)
return 1
async def run_with_tui():
try:
# Load runner inside the async loop to ensure strict loop affinity
# (only one load — avoids spawning duplicate MCP subprocesses)
# AgentRunner handles credential setup interactively when stdin is a TTY.
try:
runner = AgentRunner.load(
args.agent_path,
model=args.model,
)
except CredentialError as e:
print(f"\n{e}", file=sys.stderr)
return
except Exception as e:
print(f"Error loading agent: {e}")
return
# Prompt before starting (allows credential updates)
if sys.stdin.isatty() and not args.quiet:
runner = _prompt_before_start(args.agent_path, runner, args.model)
if runner is None:
return 1
# Prompt before starting (allows credential updates)
if sys.stdin.isatty():
runner = _prompt_before_start(args.agent_path, runner, args.model)
if runner is None:
return
# Force setup inside the loop
if runner._agent_runtime is None:
try:
runner._setup()
except CredentialError as e:
print(f"\n{e}", file=sys.stderr)
return
# Start runtime before TUI so it's ready for user input
if runner._agent_runtime and not runner._agent_runtime.is_running:
await runner._agent_runtime.start()
app = AdenTUI(
runner._agent_runtime,
resume_session=getattr(args, "resume_session", None),
resume_checkpoint=getattr(args, "checkpoint", None),
)
# TUI handles execution via ChatRepl — user submits input,
# ChatRepl calls runtime.trigger_and_wait(). No auto-launch.
await app.run_async()
except Exception as e:
import traceback
traceback.print_exc()
print(f"TUI error: {e}")
await runner.cleanup_async()
return None
asyncio.run(run_with_tui())
print("TUI session ended.")
return 0
else:
# Standard execution — load runner here (not shared with TUI path)
# AgentRunner handles credential setup interactively when stdin is a TTY.
try:
runner = AgentRunner.load(
args.agent_path,
model=args.model,
# Load session/checkpoint state for resume (headless mode)
session_state = None
resume_session = getattr(args, "resume_session", None)
checkpoint = getattr(args, "checkpoint", None)
if resume_session:
session_state = _load_resume_state(args.agent_path, resume_session, checkpoint)
if session_state is None:
print(
f"Error: Could not load session state for {resume_session}",
file=sys.stderr,
)
except CredentialError as e:
print(f"\n{e}", file=sys.stderr)
return 1
except FileNotFoundError as e:
print(f"Error: {e}", file=sys.stderr)
return 1
# Prompt before starting (allows credential updates)
if sys.stdin.isatty() and not args.quiet:
runner = _prompt_before_start(args.agent_path, runner, args.model)
if runner is None:
return 1
# Load session/checkpoint state for resume (headless mode)
session_state = None
resume_session = getattr(args, "resume_session", None)
checkpoint = getattr(args, "checkpoint", None)
if resume_session:
session_state = _load_resume_state(args.agent_path, resume_session, checkpoint)
if session_state is None:
print(
f"Error: Could not load session state for {resume_session}",
file=sys.stderr,
)
return 1
if not args.quiet:
resume_node = session_state.get("paused_at", "unknown")
if checkpoint:
print(f"Resuming from checkpoint: {checkpoint}")
else:
print(f"Resuming session: {resume_session}")
print(f"Resume point: {resume_node}")
print()
# Auto-inject user_id if the agent expects it but it's not provided
entry_input_keys = runner.graph.nodes[0].input_keys if runner.graph.nodes else []
if "user_id" in entry_input_keys and context.get("user_id") is None:
import os
context["user_id"] = os.environ.get("USER", "default_user")
if not args.quiet:
info = runner.info()
print(f"Agent: {info.name}")
print(f"Goal: {info.goal_name}")
print(f"Steps: {info.node_count}")
print(f"Input: {json.dumps(context)}")
print()
print("=" * 60)
print("Executing agent...")
print("=" * 60)
resume_node = session_state.get("paused_at", "unknown")
if checkpoint:
print(f"Resuming from checkpoint: {checkpoint}")
else:
print(f"Resuming session: {resume_session}")
print(f"Resume point: {resume_node}")
print()
result = asyncio.run(runner.run(context, session_state=session_state))
# Auto-inject user_id if the agent expects it but it's not provided
entry_input_keys = runner.graph.nodes[0].input_keys if runner.graph.nodes else []
if "user_id" in entry_input_keys and context.get("user_id") is None:
import os
context["user_id"] = os.environ.get("USER", "default_user")
if not args.quiet:
info = runner.info()
print(f"Agent: {info.name}")
print(f"Goal: {info.goal_name}")
print(f"Steps: {info.node_count}")
print(f"Input: {json.dumps(context)}")
print()
print("=" * 60)
print("Executing agent...")
print("=" * 60)
print()
result = asyncio.run(runner.run(context, session_state=session_state))
# Format output
output = {
@@ -1364,154 +1147,6 @@ def _get_framework_agents_dir() -> Path:
return Path(__file__).resolve().parent.parent / "agents"
def _launch_agent_tui(
agent_path: str | Path,
model: str | None = None,
) -> int:
"""Load an agent and launch the TUI. Shared by cmd_tui and cmd_code."""
from framework.credentials.models import CredentialError
from framework.runner import AgentRunner
from framework.tui.app import AdenTUI
async def run_with_tui():
# AgentRunner handles credential setup interactively when stdin is a TTY.
try:
runner = AgentRunner.load(
agent_path,
model=model,
)
except CredentialError as e:
print(f"\n{e}", file=sys.stderr)
return
except Exception as e:
print(f"Error loading agent: {e}")
return
if runner._agent_runtime is None:
try:
runner._setup()
except CredentialError as e:
print(f"\n{e}", file=sys.stderr)
return
if runner._agent_runtime and not runner._agent_runtime.is_running:
await runner._agent_runtime.start()
app = AdenTUI(runner._agent_runtime)
try:
await app.run_async()
except Exception as e:
import traceback
traceback.print_exc()
print(f"TUI error: {e}")
await runner.cleanup_async()
asyncio.run(run_with_tui())
print("TUI session ended.")
return 0
def cmd_tui(args: argparse.Namespace) -> int:
"""Launch the interactive TUI dashboard with in-app agent picker."""
import logging
logging.basicConfig(level=logging.WARNING, format="%(message)s")
from framework.tui.app import AdenTUI
async def run_tui():
app = AdenTUI(
model=args.model,
)
await app.run_async()
asyncio.run(run_tui())
print("TUI session ended.")
return 0
def cmd_code(args: argparse.Namespace) -> int:
"""Launch Hive Coder with multi-graph support.
Unlike ``_launch_agent_tui``, this sets up graph lifecycle tools and
assigns ``graph_id="hive_coder"`` so the coder can load, supervise,
and restart secondary agent graphs within the same session.
"""
import logging
logging.basicConfig(level=logging.WARNING, format="%(message)s")
framework_agents_dir = _get_framework_agents_dir()
hive_coder_path = framework_agents_dir / "hive_coder"
if not (hive_coder_path / "agent.py").exists():
print("Error: Hive Coder agent not found.", file=sys.stderr)
return 1
# Ensure framework agents dir is on sys.path for import
fa_str = str(framework_agents_dir)
if fa_str not in sys.path:
sys.path.insert(0, fa_str)
from framework.credentials.models import CredentialError
from framework.runner import AgentRunner
from framework.tools.session_graph_tools import register_graph_tools
from framework.tui.app import AdenTUI
async def run_with_tui():
try:
runner = AgentRunner.load(hive_coder_path, model=args.model)
except CredentialError as e:
print(f"\n{e}", file=sys.stderr)
return
except Exception as e:
print(f"Error loading agent: {e}")
return
if runner._agent_runtime is None:
try:
runner._setup()
except CredentialError as e:
print(f"\n{e}", file=sys.stderr)
return
runtime = runner._agent_runtime
# -- Multi-graph setup --
# Tag the primary graph so events carry graph_id="hive_coder"
runtime._graph_id = "hive_coder"
runtime._active_graph_id = "hive_coder"
# Register graph lifecycle tools (load_agent, unload_agent, etc.)
register_graph_tools(runner._tool_registry, runtime)
# Refresh tool schemas AND executor so streams see the new tools.
# The executor closure references the registry dict by ref, but
# refreshing both is robust against any copy-on-read behavior.
runtime._tools = list(runner._tool_registry.get_tools().values())
runtime._tool_executor = runner._tool_registry.get_executor()
if not runtime.is_running:
await runtime.start()
app = AdenTUI(runtime)
try:
await app.run_async()
except Exception as e:
import traceback
traceback.print_exc()
print(f"TUI error: {e}")
await runner.cleanup_async()
asyncio.run(run_with_tui())
print("TUI session ended.")
return 0
def _extract_python_agent_metadata(agent_path: Path) -> tuple[str, str]:
"""Extract name and description from a Python-based agent's config.py.
@@ -1864,56 +1499,6 @@ def _interactive_multi(agents_dir: Path) -> int:
return 0
def cmd_sessions_list(args: argparse.Namespace) -> int:
"""List agent sessions."""
print("⚠ Sessions list command not yet implemented")
print("This will be available once checkpoint infrastructure is complete.")
print(f"\nAgent: {args.agent_path}")
print(f"Status filter: {args.status}")
print(f"Has checkpoints: {args.has_checkpoints}")
return 1
def cmd_sessions_show(args: argparse.Namespace) -> int:
"""Show detailed session information."""
print("⚠ Session show command not yet implemented")
print("This will be available once checkpoint infrastructure is complete.")
print(f"\nAgent: {args.agent_path}")
print(f"Session: {args.session_id}")
return 1
def cmd_sessions_checkpoints(args: argparse.Namespace) -> int:
"""List checkpoints for a session."""
print("⚠ Session checkpoints command not yet implemented")
print("This will be available once checkpoint infrastructure is complete.")
print(f"\nAgent: {args.agent_path}")
print(f"Session: {args.session_id}")
return 1
def cmd_pause(args: argparse.Namespace) -> int:
"""Pause a running session."""
print("⚠ Pause command not yet implemented")
print("This will be available once executor pause integration is complete.")
print(f"\nAgent: {args.agent_path}")
print(f"Session: {args.session_id}")
return 1
def cmd_resume(args: argparse.Namespace) -> int:
"""Resume a session from checkpoint."""
print("⚠ Resume command not yet implemented")
print("This will be available once checkpoint resume integration is complete.")
print(f"\nAgent: {args.agent_path}")
print(f"Session: {args.session_id}")
if args.checkpoint:
print(f"Checkpoint: {args.checkpoint}")
if args.tui:
print("Mode: TUI")
return 1
def cmd_setup_credentials(args: argparse.Namespace) -> int:
"""Interactive credential setup for an agent."""
from framework.credentials.setup import CredentialSetupSession
+13 -1
View File
@@ -68,6 +68,7 @@ class MCPClient:
self._read_stream = None
self._write_stream = None
self._stdio_context = None # Context manager for stdio_client
self._errlog_handle = None # Track errlog file handle for cleanup
self._http_client: httpx.Client | None = None
self._tools: dict[str, MCPTool] = {}
self._connected = False
@@ -200,7 +201,8 @@ class MCPClient:
if os.name == "nt":
errlog = sys.stderr
else:
errlog = open(os.devnull, "w") # noqa: SIM115
self._errlog_handle = open(os.devnull, "w")
errlog = self._errlog_handle
self._stdio_context = stdio_client(server_params, errlog=errlog)
(
self._read_stream,
@@ -475,6 +477,15 @@ class MCPClient:
finally:
self._stdio_context = None
# Third: close errlog file handle if we opened one
if self._errlog_handle is not None:
try:
self._errlog_handle.close()
except Exception as e:
logger.debug(f"Error closing errlog handle: {e}")
finally:
self._errlog_handle = None
def disconnect(self) -> None:
"""Disconnect from the MCP server."""
# Clean up persistent STDIO connection
@@ -545,6 +556,7 @@ class MCPClient:
self._write_stream = None
self._loop = None
self._loop_thread = None
self._errlog_handle = None
# Clean up HTTP client
if self._http_client:
+54
View File
@@ -517,6 +517,41 @@ def get_codex_account_id() -> str | None:
return None
# ---------------------------------------------------------------------------
# Kimi Code subscription token helpers
# ---------------------------------------------------------------------------
def get_kimi_code_token() -> str | None:
"""Get the API key from a Kimi Code CLI installation.
Reads the API key from ``~/.kimi/config.toml``, which is created when
the user runs ``kimi /login`` in the Kimi Code CLI.
Returns:
The API key if available, None otherwise.
"""
import tomllib
config_path = Path.home() / ".kimi" / "config.toml"
if not config_path.exists():
return None
try:
with open(config_path, "rb") as f:
config = tomllib.load(f)
providers = config.get("providers", {})
# kimi-cli stores credentials under providers.kimi-for-coding
for provider_cfg in providers.values():
if isinstance(provider_cfg, dict):
key = provider_cfg.get("api_key")
if key:
return key
except Exception:
pass
return None
@dataclass
class AgentInfo:
"""Information about an exported agent."""
@@ -1104,6 +1139,7 @@ class AgentRunner:
llm_config = config.get("llm", {})
use_claude_code = llm_config.get("use_claude_code_subscription", False)
use_codex = llm_config.get("use_codex_subscription", False)
use_kimi_code = llm_config.get("use_kimi_code_subscription", False)
api_base = llm_config.get("api_base")
api_key = None
@@ -1119,6 +1155,12 @@ class AgentRunner:
if not api_key:
print("Warning: Codex subscription configured but no token found.")
print("Run 'codex' to authenticate, then try again.")
elif use_kimi_code:
# Get API key from Kimi Code CLI config (~/.kimi/config.toml)
api_key = get_kimi_code_token()
if not api_key:
print("Warning: Kimi Code subscription configured but no key found.")
print("Run 'kimi /login' to authenticate, then try again.")
if api_key and use_claude_code:
# Use litellm's built-in Anthropic OAuth support.
@@ -1149,6 +1191,14 @@ class AgentRunner:
store=False,
allowed_openai_params=["store"],
)
elif api_key and use_kimi_code:
# Kimi Code subscription uses the Kimi coding API (OpenAI-compatible).
# The api_base is set automatically by LiteLLMProvider for kimi/ models.
self._llm = LiteLLMProvider(
model=self.model,
api_key=api_key,
api_base=api_base,
)
else:
# Local models (e.g. Ollama) don't need an API key
if self._is_local_model(self.model):
@@ -1314,6 +1364,8 @@ class AgentRunner:
return "TOGETHER_API_KEY"
elif model_lower.startswith("minimax/") or model_lower.startswith("minimax-"):
return "MINIMAX_API_KEY"
elif model_lower.startswith("kimi/"):
return "KIMI_API_KEY"
else:
# Default: assume OpenAI-compatible
return "OPENAI_API_KEY"
@@ -1334,6 +1386,8 @@ class AgentRunner:
cred_id = "anthropic"
elif model_lower.startswith("minimax/") or model_lower.startswith("minimax-"):
cred_id = "minimax"
elif model_lower.startswith("kimi/"):
cred_id = "kimi"
# Add more mappings as providers are added to LLM_CREDENTIALS
if cred_id is None:
+6 -1
View File
@@ -349,7 +349,7 @@ class AgentRuntime:
return
# Skip events originating from this graph's own
# executions (e.g. guardian should not fire on
# hive_coder failures — only secondary graphs).
# queen failures — only secondary graphs).
if _exclude_own and event.graph_id == self._graph_id:
return
ep_spec = self._entry_points.get(entry_point_id)
@@ -1531,6 +1531,11 @@ class AgentRuntime:
for executor in stream._active_executors.values():
for node_id, node in executor.node_registry.items():
if getattr(node, "_awaiting_input", False):
# Skip escalation receivers — those are handled
# by the queen via inject_worker_message(), not
# by the user directly.
if ":escalation:" in node_id:
continue
return node_id, graph_id
return None, None
+2 -2
View File
@@ -123,7 +123,7 @@ class EventType(StrEnum):
# Custom events
CUSTOM = "custom"
# Escalation (agent requests handoff to hive_coder)
# Escalation (agent requests handoff to queen)
ESCALATION_REQUESTED = "escalation_requested"
# Worker health monitoring (judge → queen → operator)
@@ -976,7 +976,7 @@ class EventBus:
context: str = "",
execution_id: str | None = None,
) -> None:
"""Emit escalation requested event (agent wants hive_coder)."""
"""Emit escalation requested event (agent wants queen)."""
await self.publish(
AgentEvent(
type=EventType.ESCALATION_REQUESTED,
+16 -4
View File
@@ -9,6 +9,7 @@ Each stream has:
import asyncio
import logging
import os
import time
import uuid
from collections import OrderedDict
@@ -240,6 +241,7 @@ class ExecutionStream:
self._active_executions: dict[str, ExecutionContext] = {}
self._execution_tasks: dict[str, asyncio.Task] = {}
self._active_executors: dict[str, GraphExecutor] = {}
self._cancel_reasons: dict[str, str] = {}
self._execution_results: OrderedDict[str, ExecutionResult] = OrderedDict()
self._execution_result_times: dict[str, float] = {}
self._completion_events: dict[str, asyncio.Event] = {}
@@ -464,7 +466,7 @@ class ExecutionStream:
node.signal_shutdown()
if hasattr(node, "cancel_current_turn"):
node.cancel_current_turn()
await self.cancel_execution(eid)
await self.cancel_execution(eid, reason="Restarted with new execution")
# When resuming, reuse the original session ID so the execution
# continues in the same session directory instead of creating a new one.
@@ -801,19 +803,20 @@ class ExecutionStream:
# Emit SSE event so the frontend knows the execution stopped.
# The executor does NOT emit on CancelledError, so there is no
# risk of double-emitting.
cancel_reason = self._cancel_reasons.pop(execution_id, "Execution cancelled")
if self._scoped_event_bus:
if has_result and result.paused_at:
await self._scoped_event_bus.emit_execution_paused(
stream_id=self.stream_id,
node_id=result.paused_at,
reason="Execution cancelled",
reason=cancel_reason,
execution_id=execution_id,
)
else:
await self._scoped_event_bus.emit_execution_failed(
stream_id=self.stream_id,
execution_id=execution_id,
error="Execution cancelled",
error=cancel_reason,
correlation_id=ctx.correlation_id,
)
@@ -961,6 +964,9 @@ class ExecutionStream:
if error:
state.result.error = error
# Stamp the owning process ID for cross-process stale detection
state.pid = os.getpid()
# Write state.json
await self._session_store.write_state(execution_id, state)
logger.debug(f"Wrote state.json for session {execution_id} (status={status})")
@@ -1054,18 +1060,24 @@ class ExecutionStream:
"""Get execution context."""
return self._active_executions.get(execution_id)
async def cancel_execution(self, execution_id: str) -> bool:
async def cancel_execution(self, execution_id: str, *, reason: str | None = None) -> bool:
"""
Cancel a running execution.
Args:
execution_id: Execution to cancel
reason: Human-readable reason for the cancellation (e.g.
"Stopped by queen", "User requested pause"). If not
provided, defaults to "Execution cancelled".
Returns:
True if cancelled, False if not found
"""
task = self._execution_tasks.get(execution_id)
if task and not task.done():
# Store the reason so the CancelledError handler can use it
# when emitting the pause/fail event.
self._cancel_reasons[execution_id] = reason or "Execution cancelled"
task.cancel()
# Wait briefly for the task to finish. Don't block indefinitely —
# the task may be stuck in a long LLM API call that doesn't
+3
View File
@@ -134,6 +134,9 @@ class SessionState(BaseModel):
# Input data (for debugging/replay)
input_data: dict[str, Any] = Field(default_factory=dict)
# Process ID of the owning process (for cross-process stale session detection)
pid: int | None = None
# Isolation level (from ExecutionContext)
isolation_level: str = "shared"
-36
View File
@@ -1,36 +0,0 @@
"""Backward-compatibility shim.
The primary implementation is now in ``session_manager.py``.
This module re-exports ``SessionManager`` as ``AgentManager`` and
keeps ``AgentSlot`` for test compatibility.
"""
import asyncio
from dataclasses import dataclass
from pathlib import Path
from typing import Any
from framework.server.session_manager import Session, SessionManager # noqa: F401
@dataclass
class AgentSlot:
"""Legacy data class — kept for test compatibility only.
New code should use ``Session`` from ``session_manager``.
"""
id: str
agent_path: Path
runner: Any
runtime: Any
info: Any
loaded_at: float
queen_executor: Any = None
queen_task: asyncio.Task | None = None
judge_task: asyncio.Task | None = None
escalation_sub: str | None = None
# Backward compat alias
AgentManager = SessionManager
+331
View File
@@ -0,0 +1,331 @@
"""Queen orchestrator — builds and runs the queen executor.
Extracted from SessionManager._start_queen() to keep session management
and queen orchestration concerns separate.
"""
from __future__ import annotations
import asyncio
import logging
from pathlib import Path
from typing import TYPE_CHECKING, Any
if TYPE_CHECKING:
from framework.server.session_manager import Session
logger = logging.getLogger(__name__)
async def create_queen(
session: Session,
session_manager: Any,
worker_identity: str | None,
queen_dir: Path,
initial_prompt: str | None = None,
) -> asyncio.Task:
"""Build the queen executor and return the running asyncio task.
Handles tool registration, phase-state initialization, prompt
composition, persona hook setup, graph preparation, and the queen
event loop.
"""
from framework.agents.queen.agent import (
queen_goal,
queen_graph as _queen_graph,
)
from framework.agents.queen.nodes import (
_QUEEN_BUILDING_TOOLS,
_QUEEN_PLANNING_TOOLS,
_QUEEN_RUNNING_TOOLS,
_QUEEN_STAGING_TOOLS,
_appendices,
_building_knowledge,
_planning_knowledge,
_queen_behavior_always,
_queen_behavior_building,
_queen_behavior_planning,
_queen_behavior_running,
_queen_behavior_staging,
_queen_identity_building,
_queen_identity_planning,
_queen_identity_running,
_queen_identity_staging,
_queen_phase_7,
_queen_style,
_queen_tools_building,
_queen_tools_planning,
_queen_tools_running,
_queen_tools_staging,
_shared_building_knowledge,
)
from framework.agents.queen.nodes.thinking_hook import select_expert_persona
from framework.graph.event_loop_node import HookContext, HookResult
from framework.graph.executor import GraphExecutor
from framework.runner.tool_registry import ToolRegistry
from framework.runtime.core import Runtime
from framework.runtime.event_bus import AgentEvent, EventType
from framework.tools.queen_lifecycle_tools import (
QueenPhaseState,
register_queen_lifecycle_tools,
)
hive_home = Path.home() / ".hive"
# ---- Tool registry ------------------------------------------------
queen_registry = ToolRegistry()
import framework.agents.queen as _queen_pkg
queen_pkg_dir = Path(_queen_pkg.__file__).parent
mcp_config = queen_pkg_dir / "mcp_servers.json"
if mcp_config.exists():
try:
queen_registry.load_mcp_config(mcp_config)
logger.info("Queen: loaded MCP tools from %s", mcp_config)
except Exception:
logger.warning("Queen: MCP config failed to load", exc_info=True)
# ---- Phase state --------------------------------------------------
initial_phase = "staging" if worker_identity else "planning"
phase_state = QueenPhaseState(phase=initial_phase, event_bus=session.event_bus)
session.phase_state = phase_state
# ---- Lifecycle tools (always registered) --------------------------
register_queen_lifecycle_tools(
queen_registry,
session=session,
session_id=session.id,
session_manager=session_manager,
manager_session_id=session.id,
phase_state=phase_state,
)
# ---- Monitoring tools (only when worker is loaded) ----------------
if session.worker_runtime:
from framework.tools.worker_monitoring_tools import register_worker_monitoring_tools
register_worker_monitoring_tools(
queen_registry,
session.event_bus,
session.worker_path,
stream_id="queen",
worker_graph_id=session.worker_runtime._graph_id,
)
queen_tools = list(queen_registry.get_tools().values())
queen_tool_executor = queen_registry.get_executor()
# ---- Partition tools by phase ------------------------------------
planning_names = set(_QUEEN_PLANNING_TOOLS)
building_names = set(_QUEEN_BUILDING_TOOLS)
staging_names = set(_QUEEN_STAGING_TOOLS)
running_names = set(_QUEEN_RUNNING_TOOLS)
registered_names = {t.name for t in queen_tools}
missing_building = building_names - registered_names
if missing_building:
logger.warning(
"Queen: %d/%d building tools NOT registered: %s",
len(missing_building),
len(building_names),
sorted(missing_building),
)
logger.info("Queen: registered tools: %s", sorted(registered_names))
phase_state.planning_tools = [t for t in queen_tools if t.name in planning_names]
phase_state.building_tools = [t for t in queen_tools if t.name in building_names]
phase_state.staging_tools = [t for t in queen_tools if t.name in staging_names]
phase_state.running_tools = [t for t in queen_tools if t.name in running_names]
# ---- Cross-session memory ----------------------------------------
from framework.agents.queen.queen_memory import seed_if_missing
seed_if_missing()
# ---- Compose phase-specific prompts ------------------------------
_orig_node = _queen_graph.nodes[0]
if worker_identity is None:
worker_identity = (
"\n\n# Worker Profile\n"
"No worker agent loaded. You are operating independently.\n"
"Handle all tasks directly using your coding tools."
)
_planning_body = (
_queen_style
+ _shared_building_knowledge
+ _queen_tools_planning
+ _queen_behavior_always
+ _queen_behavior_planning
+ _planning_knowledge
+ worker_identity
)
phase_state.prompt_planning = _queen_identity_planning + _planning_body
_building_body = (
_queen_style
+ _shared_building_knowledge
+ _queen_tools_building
+ _queen_behavior_always
+ _queen_behavior_building
+ _building_knowledge
+ _queen_phase_7
+ _appendices
+ worker_identity
)
phase_state.prompt_building = _queen_identity_building + _building_body
phase_state.prompt_staging = (
_queen_identity_staging
+ _queen_style
+ _queen_tools_staging
+ _queen_behavior_always
+ _queen_behavior_staging
+ worker_identity
)
phase_state.prompt_running = (
_queen_identity_running
+ _queen_style
+ _queen_tools_running
+ _queen_behavior_always
+ _queen_behavior_running
+ worker_identity
)
# ---- Persona hook ------------------------------------------------
_session_llm = session.llm
_session_event_bus = session.event_bus
async def _persona_hook(ctx: HookContext) -> HookResult | None:
persona = await select_expert_persona(ctx.trigger or "", _session_llm)
if not persona:
return None
if _session_event_bus is not None:
await _session_event_bus.publish(
AgentEvent(
type=EventType.QUEEN_PERSONA_SELECTED,
stream_id="queen",
data={"persona": persona},
)
)
return HookResult(system_prompt=persona + "\n\n" + phase_state.get_current_prompt())
# ---- Graph preparation -------------------------------------------
initial_prompt_text = phase_state.get_current_prompt()
registered_tool_names = set(queen_registry.get_tools().keys())
declared_tools = _orig_node.tools or []
available_tools = [t for t in declared_tools if t in registered_tool_names]
node_updates: dict = {
"system_prompt": initial_prompt_text,
}
if set(available_tools) != set(declared_tools):
missing = sorted(set(declared_tools) - registered_tool_names)
if missing:
logger.warning("Queen: tools not available: %s", missing)
node_updates["tools"] = available_tools
adjusted_node = _orig_node.model_copy(update=node_updates)
_queen_loop_config = {
**(_queen_graph.loop_config or {}),
"hooks": {"session_start": [_persona_hook]},
}
queen_graph = _queen_graph.model_copy(
update={"nodes": [adjusted_node], "loop_config": _queen_loop_config}
)
# ---- Queen event loop --------------------------------------------
queen_runtime = Runtime(hive_home / "queen")
async def _queen_loop():
try:
executor = GraphExecutor(
runtime=queen_runtime,
llm=session.llm,
tools=queen_tools,
tool_executor=queen_tool_executor,
event_bus=session.event_bus,
stream_id="queen",
storage_path=queen_dir,
loop_config=_queen_loop_config,
execution_id=session.id,
dynamic_tools_provider=phase_state.get_current_tools,
dynamic_prompt_provider=phase_state.get_current_prompt,
)
session.queen_executor = executor
# Wire inject_notification so phase switches notify the queen LLM
async def _inject_phase_notification(content: str) -> None:
node = executor.node_registry.get("queen")
if node is not None and hasattr(node, "inject_event"):
await node.inject_event(content)
phase_state.inject_notification = _inject_phase_notification
# Auto-switch to staging when worker execution finishes
async def _on_worker_done(event):
if event.stream_id == "queen":
return
if phase_state.phase == "running":
if event.type == EventType.EXECUTION_COMPLETED:
output = event.data.get("output", {})
output_summary = ""
if output:
for key, value in output.items():
val_str = str(value)
if len(val_str) > 200:
val_str = val_str[:200] + "..."
output_summary += f"\n {key}: {val_str}"
_out = output_summary or " (no output keys set)"
notification = (
"[WORKER_TERMINAL] Worker finished successfully.\n"
f"Output:{_out}\n"
"Report this to the user. "
"Ask if they want to continue with another run."
)
else: # EXECUTION_FAILED
error = event.data.get("error", "Unknown error")
notification = (
"[WORKER_TERMINAL] Worker failed.\n"
f"Error: {error}\n"
"Report this to the user and help them troubleshoot."
)
node = executor.node_registry.get("queen")
if node is not None and hasattr(node, "inject_event"):
await node.inject_event(notification)
await phase_state.switch_to_staging(source="auto")
session.event_bus.subscribe(
event_types=[EventType.EXECUTION_COMPLETED, EventType.EXECUTION_FAILED],
handler=_on_worker_done,
)
session_manager._subscribe_worker_handoffs(session, executor)
logger.info(
"Queen starting in %s phase with %d tools: %s",
phase_state.phase,
len(phase_state.get_current_tools()),
[t.name for t in phase_state.get_current_tools()],
)
result = await executor.execute(
graph=queen_graph,
goal=queen_goal,
input_data={"greeting": initial_prompt or "Session started."},
session_state={"resume_session_id": session.id},
)
if result.success:
logger.warning("Queen executor returned (should be forever-alive)")
else:
logger.error(
"Queen executor failed: %s",
result.error or "(no error message)",
)
except Exception:
logger.error("Queen conversation crashed", exc_info=True)
finally:
session.queen_executor = None
return asyncio.create_task(_queen_loop())
+6 -4
View File
@@ -347,7 +347,7 @@ async def handle_pause(request: web.Request) -> web.Response:
for exec_id in list(stream.active_execution_ids):
try:
ok = await stream.cancel_execution(exec_id)
ok = await stream.cancel_execution(exec_id, reason="Execution paused by user")
if ok:
cancelled.append(exec_id)
except Exception:
@@ -357,8 +357,8 @@ async def handle_pause(request: web.Request) -> web.Response:
runtime.pause_timers()
# Switch to staging (agent still loaded, ready to re-run)
if session.mode_state is not None:
await session.mode_state.switch_to_staging(source="frontend")
if session.phase_state is not None:
await session.phase_state.switch_to_staging(source="frontend")
return web.json_response(
{
@@ -400,7 +400,9 @@ async def handle_stop(request: web.Request) -> web.Response:
if hasattr(node, "cancel_current_turn"):
node.cancel_current_turn()
cancelled = await stream.cancel_execution(execution_id)
cancelled = await stream.cancel_execution(
execution_id, reason="Execution stopped by user"
)
if cancelled:
# Cancel queen's in-progress LLM turn
if session.queen_executor:
+2 -2
View File
@@ -61,7 +61,7 @@ def _session_to_live_dict(session) -> dict:
"loaded_at": session.loaded_at,
"uptime_seconds": round(time.time() - session.loaded_at, 1),
"intro_message": getattr(session.runner, "intro_message", "") or "",
"queen_phase": phase_state.phase if phase_state else "building",
"queen_phase": phase_state.phase if phase_state else "planning",
}
@@ -731,7 +731,7 @@ async def handle_delete_history_session(request: web.Request) -> web.Response:
async def handle_discover(request: web.Request) -> web.Response:
"""GET /api/discover — discover agents from filesystem."""
from framework.tui.screens.agent_picker import discover_agents
from framework.agents.discovery import discover_agents
manager = _get_manager(request)
loaded_paths = {str(s.worker_path) for s in manager.list_sessions() if s.worker_path}
+102 -278
View File
@@ -46,6 +46,8 @@ class Session:
judge_task: asyncio.Task | None = None
escalation_sub: str | None = None
worker_handoff_sub: str | None = None
# Memory consolidation subscription (fires on CONTEXT_COMPACTED)
memory_consolidation_sub: str | None = None
# Session directory resumption:
# When set, _start_queen writes queen conversations to this existing session's
# directory instead of creating a new one. This lets cold-restores accumulate
@@ -276,11 +278,20 @@ class SessionManager:
When a new runtime starts, any on-disk session still marked 'active'
is from a process that no longer exists. 'Paused' sessions are left
intact so they remain resumable.
Two-layer protection against corrupting live sessions:
1. In-memory: skip any session ID currently tracked in self._sessions
(guaranteed alive in this process).
2. PID validation: if state.json contains a ``pid`` field, check whether
that process is still running on the host. If it is, the session is
owned by another healthy worker process, so leave it alone.
"""
sessions_path = Path.home() / ".hive" / "agents" / agent_path.name / "sessions"
if not sessions_path.exists():
return
live_session_ids = set(self._sessions.keys())
for d in sessions_path.iterdir():
if not d.is_dir() or not d.name.startswith("session_"):
continue
@@ -291,6 +302,26 @@ class SessionManager:
state = json.loads(state_path.read_text(encoding="utf-8"))
if state.get("status") != "active":
continue
# Layer 1: skip sessions that are alive in this process
session_id = state.get("session_id", d.name)
if session_id in live_session_ids or d.name in live_session_ids:
logger.debug(
"Skipping live in-memory session '%s' during stale cleanup",
d.name,
)
continue
# Layer 2: skip sessions whose owning process is still alive
recorded_pid = state.get("pid")
if recorded_pid is not None and self._is_pid_alive(recorded_pid):
logger.debug(
"Skipping session '%s' — owning process %d is still running",
d.name,
recorded_pid,
)
continue
state["status"] = "cancelled"
state.setdefault("result", {})["error"] = "Stale session: runtime restarted"
state.setdefault("timestamps", {})["updated_at"] = datetime.now().isoformat()
@@ -301,6 +332,34 @@ class SessionManager:
except (json.JSONDecodeError, OSError) as e:
logger.warning("Failed to clean up stale session %s: %s", d.name, e)
@staticmethod
def _is_pid_alive(pid: int) -> bool:
"""Check whether a process with the given PID is still running."""
import os
import platform
if platform.system() == "Windows":
import ctypes
# PROCESS_QUERY_LIMITED_INFORMATION = 0x1000
kernel32 = ctypes.windll.kernel32
handle = kernel32.OpenProcess(0x1000, False, pid)
if not handle:
# 5 is ERROR_ACCESS_DENIED, meaning the process exists but is protected
return kernel32.GetLastError() == 5
exit_code = ctypes.c_ulong()
kernel32.GetExitCodeProcess(handle, ctypes.byref(exit_code))
kernel32.CloseHandle(handle)
# 259 is STILL_ACTIVE
return exit_code.value == 259
else:
try:
os.kill(pid, 0)
except OSError:
return False
return True
async def load_worker(
self,
session_id: str,
@@ -325,9 +384,9 @@ class SessionManager:
model=model,
)
# Notify queen about the loaded worker (skip for hive_coder itself).
# Notify queen about the loaded worker (skip for queen itself).
# Health judge disabled for simplicity.
if agent_path.name != "hive_coder" and session.worker_runtime:
if agent_path.name != "queen" and session.worker_runtime:
# await self._start_judge(session, session.runner._storage_path)
await self._notify_queen_worker_loaded(session)
@@ -379,6 +438,11 @@ class SessionManager:
if session is None:
return False
# Capture session data for memory consolidation before teardown
_llm = getattr(session, "llm", None)
_storage_id = getattr(session, "queen_resume_from", None) or session_id
_session_dir = Path.home() / ".hive" / "queen" / "session" / _storage_id
# Stop judge
self._stop_judge(session)
if session.worker_handoff_sub is not None:
@@ -388,7 +452,13 @@ class SessionManager:
pass
session.worker_handoff_sub = None
# Stop queen
# Stop queen and memory consolidation subscription
if session.memory_consolidation_sub is not None:
try:
session.event_bus.unsubscribe(session.memory_consolidation_sub)
except Exception:
pass
session.memory_consolidation_sub = None
if session.queen_task is not None:
session.queen_task.cancel()
session.queen_task = None
@@ -401,6 +471,17 @@ class SessionManager:
except Exception as e:
logger.error("Error cleaning up worker: %s", e)
# Final memory consolidation — fire-and-forget so teardown isn't blocked.
if _llm is not None and _session_dir.exists():
import asyncio
from framework.agents.queen.queen_memory import consolidate_queen_memory
asyncio.create_task(
consolidate_queen_memory(session_id, _session_dir, _llm),
name=f"queen-memory-consolidation-{session_id}",
)
logger.info("Session '%s' stopped", session_id)
return True
@@ -461,13 +542,7 @@ class SessionManager:
are written to the ORIGINAL session's directory so the full conversation
history accumulates in one place across server restarts.
"""
from framework.agents.hive_coder.agent import (
queen_goal,
queen_graph as _queen_graph,
)
from framework.graph.executor import GraphExecutor
from framework.runner.tool_registry import ToolRegistry
from framework.runtime.core import Runtime
from framework.server.queen_orchestrator import create_queen
hive_home = Path.home() / ".hive"
@@ -505,284 +580,33 @@ class SessionManager:
except OSError:
pass
# Register MCP coding tools
queen_registry = ToolRegistry()
import framework.agents.hive_coder as _hive_coder_pkg
hive_coder_dir = Path(_hive_coder_pkg.__file__).parent
mcp_config = hive_coder_dir / "mcp_servers.json"
if mcp_config.exists():
try:
queen_registry.load_mcp_config(mcp_config)
logger.info("Queen: loaded MCP tools from %s", mcp_config)
except Exception:
logger.warning("Queen: MCP config failed to load", exc_info=True)
# Phase state for building/running phase switching
from framework.tools.queen_lifecycle_tools import (
QueenPhaseState,
register_queen_lifecycle_tools,
)
# Start in staging when the caller provided an agent, building otherwise.
initial_phase = "staging" if worker_identity else "building"
phase_state = QueenPhaseState(phase=initial_phase, event_bus=session.event_bus)
session.phase_state = phase_state
# Always register lifecycle tools — they check session.worker_runtime
# at call time, so they work even if no worker is loaded yet.
register_queen_lifecycle_tools(
queen_registry,
session.queen_task = await create_queen(
session=session,
session_id=session.id,
session_manager=self,
manager_session_id=session.id,
phase_state=phase_state,
worker_identity=worker_identity,
queen_dir=queen_dir,
initial_prompt=initial_prompt,
)
# Monitoring tools need concrete worker paths — only register when present
if session.worker_runtime:
from framework.tools.worker_monitoring_tools import register_worker_monitoring_tools
# Memory consolidation — triggered by context compaction events.
# Compaction is a natural signal that "enough has happened to be worth remembering".
_consolidation_llm = session.llm
_consolidation_session_dir = queen_dir
register_worker_monitoring_tools(
queen_registry,
session.event_bus,
session.worker_path,
stream_id="queen",
worker_graph_id=session.worker_runtime._graph_id,
async def _on_compaction(_event) -> None:
from framework.agents.queen.queen_memory import consolidate_queen_memory
await consolidate_queen_memory(
session.id, _consolidation_session_dir, _consolidation_llm
)
queen_tools = list(queen_registry.get_tools().values())
queen_tool_executor = queen_registry.get_executor()
from framework.runtime.event_bus import EventType as _ET
# Partition tools into phase-specific sets and import prompt segments
from framework.agents.hive_coder.nodes import (
_QUEEN_BUILDING_TOOLS,
_QUEEN_RUNNING_TOOLS,
_QUEEN_STAGING_TOOLS,
_appendices,
_gcu_building_section,
_package_builder_knowledge,
_queen_behavior_always,
_queen_behavior_building,
_queen_behavior_running,
_queen_behavior_staging,
_queen_identity_building,
_queen_identity_running,
_queen_identity_staging,
_queen_phase_7,
_queen_style,
_queen_tools_building,
_queen_tools_running,
_queen_tools_staging,
session.memory_consolidation_sub = session.event_bus.subscribe(
event_types=[_ET.CONTEXT_COMPACTED],
handler=_on_compaction,
)
building_names = set(_QUEEN_BUILDING_TOOLS)
staging_names = set(_QUEEN_STAGING_TOOLS)
running_names = set(_QUEEN_RUNNING_TOOLS)
registered_names = {t.name for t in queen_tools}
missing_building = building_names - registered_names
if missing_building:
logger.warning(
"Queen: %d/%d building tools NOT registered: %s",
len(missing_building),
len(building_names),
sorted(missing_building),
)
logger.info("Queen: registered tools: %s", sorted(registered_names))
phase_state.building_tools = [t for t in queen_tools if t.name in building_names]
phase_state.staging_tools = [t for t in queen_tools if t.name in staging_names]
phase_state.running_tools = [t for t in queen_tools if t.name in running_names]
# Build queen graph with adjusted prompt + tools
_orig_node = _queen_graph.nodes[0]
if worker_identity is None:
worker_identity = (
"\n\n# Worker Profile\n"
"No worker agent loaded. You are operating independently.\n"
"Handle all tasks directly using your coding tools."
)
# Compose phase-specific prompts.
_building_body = (
_queen_style
+ _queen_tools_building
+ _queen_behavior_always
+ _queen_behavior_building
+ _package_builder_knowledge
+ _gcu_building_section
+ _queen_phase_7
+ _appendices
+ worker_identity
)
phase_state.prompt_building = _queen_identity_building + _building_body
phase_state.prompt_staging = (
_queen_identity_staging
+ _queen_style
+ _queen_tools_staging
+ _queen_behavior_always
+ _queen_behavior_staging
+ worker_identity
)
phase_state.prompt_running = (
_queen_identity_running
+ _queen_style
+ _queen_tools_running
+ _queen_behavior_always
+ _queen_behavior_running
+ worker_identity
)
# Build the session_start hook: selects the best-fit expert persona
# from the user's opening message and replaces the identity prefix.
from framework.agents.hive_coder.nodes.thinking_hook import select_expert_persona
from framework.graph.event_loop_node import HookContext, HookResult
from framework.runtime.event_bus import AgentEvent, EventType
_session_llm = session.llm
_session_event_bus = session.event_bus
async def _persona_hook(ctx: HookContext) -> HookResult | None:
persona = await select_expert_persona(ctx.trigger or "", _session_llm)
if not persona:
return None
if _session_event_bus is not None:
await _session_event_bus.publish(
AgentEvent(
type=EventType.QUEEN_PERSONA_SELECTED,
stream_id="queen",
data={"persona": persona},
)
)
return HookResult(system_prompt=persona + "\n\n" + _building_body)
initial_prompt_text = phase_state.get_current_prompt()
registered_tool_names = set(queen_registry.get_tools().keys())
declared_tools = _orig_node.tools or []
available_tools = [t for t in declared_tools if t in registered_tool_names]
node_updates: dict = {
"system_prompt": initial_prompt_text,
}
if set(available_tools) != set(declared_tools):
missing = sorted(set(declared_tools) - registered_tool_names)
if missing:
logger.warning("Queen: tools not available: %s", missing)
node_updates["tools"] = available_tools
adjusted_node = _orig_node.model_copy(update=node_updates)
_queen_loop_config = {
**(_queen_graph.loop_config or {}),
"hooks": {"session_start": [_persona_hook]},
}
queen_graph = _queen_graph.model_copy(
update={"nodes": [adjusted_node], "loop_config": _queen_loop_config}
)
queen_runtime = Runtime(hive_home / "queen")
async def _queen_loop():
try:
executor = GraphExecutor(
runtime=queen_runtime,
llm=session.llm,
tools=queen_tools,
tool_executor=queen_tool_executor,
event_bus=session.event_bus,
stream_id="queen",
storage_path=queen_dir,
loop_config=_queen_loop_config,
execution_id=session.id,
dynamic_tools_provider=phase_state.get_current_tools,
dynamic_prompt_provider=phase_state.get_current_prompt,
)
session.queen_executor = executor
# Wire inject_notification so phase switches notify the queen LLM
async def _inject_phase_notification(content: str) -> None:
node = executor.node_registry.get("queen")
if node is not None and hasattr(node, "inject_event"):
await node.inject_event(content)
phase_state.inject_notification = _inject_phase_notification
# Auto-switch to staging when worker execution finishes naturally
# and notify the queen about the termination
from framework.runtime.event_bus import EventType as _ET
async def _on_worker_done(event):
if event.stream_id == "queen":
return
if phase_state.phase == "running":
# Build termination notification for the queen
if event.type == _ET.EXECUTION_COMPLETED:
output = event.data.get("output", {})
output_summary = ""
if output:
# Summarize key outputs for the queen
for key, value in output.items():
val_str = str(value)
if len(val_str) > 200:
val_str = val_str[:200] + "..."
output_summary += f"\n {key}: {val_str}"
_out = output_summary or " (no output keys set)"
notification = (
"[WORKER_TERMINAL] Worker finished successfully.\n"
f"Output:{_out}\n"
"Report this to the user. "
"Ask if they want to continue with another run."
)
else: # EXECUTION_FAILED
error = event.data.get("error", "Unknown error")
notification = (
"[WORKER_TERMINAL] Worker failed.\n"
f"Error: {error}\n"
"Report this to the user and help them troubleshoot."
)
# Inject notification to queen before phase switch
node = executor.node_registry.get("queen")
if node is not None and hasattr(node, "inject_event"):
await node.inject_event(notification)
await phase_state.switch_to_staging(source="auto")
session.event_bus.subscribe(
event_types=[_ET.EXECUTION_COMPLETED, _ET.EXECUTION_FAILED],
handler=_on_worker_done,
)
self._subscribe_worker_handoffs(session, executor)
logger.info(
"Queen starting in %s phase with %d tools: %s",
phase_state.phase,
len(phase_state.get_current_tools()),
[t.name for t in phase_state.get_current_tools()],
)
result = await executor.execute(
graph=queen_graph,
goal=queen_goal,
input_data={"greeting": initial_prompt or "Session started."},
session_state={"resume_session_id": session.id},
)
if result.success:
logger.warning("Queen executor returned (should be forever-alive)")
else:
logger.error(
"Queen executor failed: %s",
result.error or "(no error message)",
)
except Exception:
logger.error("Queen conversation crashed", exc_info=True)
finally:
session.queen_executor = None
session.queen_task = asyncio.create_task(_queen_loop())
# ------------------------------------------------------------------
# Judge startup / teardown
# ------------------------------------------------------------------
+108
View File
@@ -37,6 +37,7 @@ class MockNodeSpec:
client_facing: bool = False
success_criteria: str | None = None
system_prompt: str | None = None
sub_agents: list = field(default_factory=list)
@dataclass
@@ -67,6 +68,7 @@ class MockEntryPoint:
name: str = "Default"
entry_node: str = "start"
trigger_type: str = "manual"
trigger_config: dict = field(default_factory=dict)
@dataclass
@@ -130,6 +132,9 @@ class MockRuntime:
def get_stats(self):
return {"running": True, "executions": 1}
def get_timer_next_fire_in(self, ep_id):
return None
class MockAgentInfo:
name: str = "test_agent"
@@ -1556,3 +1561,106 @@ class TestErrorMiddleware:
async with TestClient(TestServer(app)) as client:
resp = await client.get("/api/nonexistent")
assert resp.status == 404
class TestCleanupStaleActiveSessions:
"""Tests for _cleanup_stale_active_sessions with two-layer protection."""
def _make_manager(self):
from framework.server.session_manager import SessionManager
return SessionManager()
def _write_state(self, session_dir: Path, status: str, pid: int | None = None) -> None:
session_dir.mkdir(parents=True, exist_ok=True)
state: dict = {"status": status, "session_id": session_dir.name}
if pid is not None:
state["pid"] = pid
(session_dir / "state.json").write_text(json.dumps(state))
def _read_state(self, session_dir: Path) -> dict:
return json.loads((session_dir / "state.json").read_text())
def test_stale_session_is_cancelled(self, tmp_path, monkeypatch):
"""Truly stale active sessions (no live tracking, no PID) get cancelled."""
monkeypatch.setattr(Path, "home", lambda: tmp_path)
agent_path = Path("my_agent")
sessions_dir = tmp_path / ".hive" / "agents" / "my_agent" / "sessions"
session_dir = sessions_dir / "session_stale_001"
self._write_state(session_dir, "active")
mgr = self._make_manager()
mgr._cleanup_stale_active_sessions(agent_path)
state = self._read_state(session_dir)
assert state["status"] == "cancelled"
assert "Stale session" in state["result"]["error"]
def test_live_in_memory_session_is_skipped(self, tmp_path, monkeypatch):
"""Sessions tracked in self._sessions must NOT be cancelled (Layer 1)."""
monkeypatch.setattr(Path, "home", lambda: tmp_path)
agent_path = Path("my_agent")
sessions_dir = tmp_path / ".hive" / "agents" / "my_agent" / "sessions"
session_dir = sessions_dir / "session_live_002"
self._write_state(session_dir, "active")
mgr = self._make_manager()
# Simulate a live session in the manager's in-memory map
mgr._sessions["session_live_002"] = MagicMock()
mgr._cleanup_stale_active_sessions(agent_path)
state = self._read_state(session_dir)
assert state["status"] == "active", "Live in-memory session should NOT be cancelled"
def test_session_with_live_pid_is_skipped(self, tmp_path, monkeypatch):
"""Sessions whose owning PID is still alive must NOT be cancelled (Layer 2)."""
import os
monkeypatch.setattr(Path, "home", lambda: tmp_path)
agent_path = Path("my_agent")
sessions_dir = tmp_path / ".hive" / "agents" / "my_agent" / "sessions"
session_dir = sessions_dir / "session_pid_003"
# Use the current process PID — guaranteed to be alive
self._write_state(session_dir, "active", pid=os.getpid())
mgr = self._make_manager()
mgr._cleanup_stale_active_sessions(agent_path)
state = self._read_state(session_dir)
assert state["status"] == "active", "Session with live PID should NOT be cancelled"
def test_session_with_dead_pid_is_cancelled(self, tmp_path, monkeypatch):
"""Sessions whose owning PID is dead should be cancelled."""
monkeypatch.setattr(Path, "home", lambda: tmp_path)
agent_path = Path("my_agent")
sessions_dir = tmp_path / ".hive" / "agents" / "my_agent" / "sessions"
session_dir = sessions_dir / "session_dead_004"
# Use a PID that is almost certainly not running
self._write_state(session_dir, "active", pid=999999999)
mgr = self._make_manager()
mgr._cleanup_stale_active_sessions(agent_path)
state = self._read_state(session_dir)
assert state["status"] == "cancelled"
assert "Stale session" in state["result"]["error"]
def test_paused_session_is_never_touched(self, tmp_path, monkeypatch):
"""Paused sessions should remain intact regardless of PID or tracking."""
monkeypatch.setattr(Path, "home", lambda: tmp_path)
agent_path = Path("my_agent")
sessions_dir = tmp_path / ".hive" / "agents" / "my_agent" / "sessions"
session_dir = sessions_dir / "session_paused_005"
self._write_state(session_dir, "paused")
mgr = self._make_manager()
mgr._cleanup_stale_active_sessions(agent_path)
state = self._read_state(session_dir)
assert state["status"] == "paused", "Paused sessions must remain untouched"
-179
View File
@@ -1,179 +0,0 @@
"""
State Writer - Dual-write adapter for migration period.
Writes execution state to both old (Run/RunSummary) and new (state.json) formats
to maintain backward compatibility during the transition period.
"""
import logging
import os
from datetime import datetime
from framework.schemas.run import Problem, Run, RunMetrics, RunStatus
from framework.schemas.session_state import SessionState, SessionStatus
from framework.storage.concurrent import ConcurrentStorage
from framework.storage.session_store import SessionStore
logger = logging.getLogger(__name__)
class StateWriter:
"""
Writes execution state to both old and new formats during migration.
During the dual-write phase:
- New format (state.json) is written when USE_UNIFIED_SESSIONS=true
- Old format (Run/RunSummary) is always written for backward compatibility
"""
def __init__(self, old_storage: ConcurrentStorage, session_store: SessionStore):
"""
Initialize state writer.
Args:
old_storage: ConcurrentStorage for old format (runs/, summaries/)
session_store: SessionStore for new format (sessions/*/state.json)
"""
self.old = old_storage
self.new = session_store
self.dual_write_enabled = os.getenv("USE_UNIFIED_SESSIONS", "false").lower() == "true"
async def write_execution_state(
self,
session_id: str,
state: SessionState,
) -> None:
"""
Write execution state to both old and new formats.
Args:
session_id: Session ID
state: SessionState to write
"""
# Write to new format if enabled
if self.dual_write_enabled:
try:
await self.new.write_state(session_id, state)
logger.debug(f"Wrote state.json for session {session_id}")
except Exception as e:
logger.error(f"Failed to write state.json for {session_id}: {e}")
# Don't fail - old format is still written
# Always write to old format for backward compatibility
try:
run = self._convert_to_run(state)
await self.old.save_run(run)
logger.debug(f"Wrote Run object for session {session_id}")
except Exception as e:
logger.error(f"Failed to write Run object for {session_id}: {e}")
# This is more critical - reraise if old format fails
raise
def _convert_to_run(self, state: SessionState) -> Run:
"""
Convert SessionState to legacy Run object.
Args:
state: SessionState to convert
Returns:
Run object
"""
# Map SessionStatus to RunStatus
status_mapping = {
SessionStatus.ACTIVE: RunStatus.RUNNING,
SessionStatus.PAUSED: RunStatus.RUNNING, # Paused is still "running" in old format
SessionStatus.COMPLETED: RunStatus.COMPLETED,
SessionStatus.FAILED: RunStatus.FAILED,
SessionStatus.CANCELLED: RunStatus.CANCELLED,
}
run_status = status_mapping.get(state.status, RunStatus.FAILED)
# Convert timestamps
started_at = datetime.fromisoformat(state.timestamps.started_at)
completed_at = (
datetime.fromisoformat(state.timestamps.completed_at)
if state.timestamps.completed_at
else None
)
# Build RunMetrics
metrics = RunMetrics(
total_decisions=state.metrics.decision_count,
successful_decisions=state.metrics.decision_count
- len(state.progress.nodes_with_failures), # Approximate
failed_decisions=len(state.progress.nodes_with_failures),
total_tokens=state.metrics.total_input_tokens + state.metrics.total_output_tokens,
total_latency_ms=state.progress.total_latency_ms,
nodes_executed=state.metrics.nodes_executed,
edges_traversed=state.metrics.edges_traversed,
)
# Convert problems (SessionState stores as dicts, Run expects Problem objects)
problems = []
for p_dict in state.problems:
# Handle both old Problem objects and new dict format
if isinstance(p_dict, dict):
problems.append(Problem(**p_dict))
else:
problems.append(p_dict)
# Convert decisions (SessionState stores as dicts, Run expects Decision objects)
from framework.schemas.decision import Decision
decisions = []
for d_dict in state.decisions:
# Handle both old Decision objects and new dict format
if isinstance(d_dict, dict):
try:
decisions.append(Decision(**d_dict))
except Exception:
# Skip invalid decisions
continue
else:
decisions.append(d_dict)
# Create Run object
run = Run(
id=state.session_id, # Use session_id as run_id
goal_id=state.goal_id,
started_at=started_at,
status=run_status,
completed_at=completed_at,
decisions=decisions,
problems=problems,
metrics=metrics,
goal_description="", # Not stored in SessionState
input_data=state.input_data,
output_data=state.result.output,
)
return run
async def read_state(
self,
session_id: str,
prefer_new: bool = True,
) -> SessionState | None:
"""
Read execution state from either format.
Args:
session_id: Session ID
prefer_new: If True, try new format first (default)
Returns:
SessionState or None if not found
"""
if prefer_new:
# Try new format first
state = await self.new.read_state(session_id)
if state:
return state
# Fall back to old format
run = await self.old.load_run(session_id)
if run:
return SessionState.from_legacy_run(run, session_id)
return None
+320 -38
View File
@@ -71,12 +71,13 @@ class WorkerSessionAdapter:
class QueenPhaseState:
"""Mutable state container for queen operating phase.
Three phases: building staging running.
Four phases: planning building staging running.
Shared between the dynamic_tools_provider callback and tool handlers
that trigger phase transitions.
"""
phase: str = "building" # "building", "staging", or "running"
phase: str = "building" # "planning", "building", "staging", or "running"
planning_tools: list = field(default_factory=list) # list[Tool]
building_tools: list = field(default_factory=list) # list[Tool]
staging_tools: list = field(default_factory=list) # list[Tool]
running_tools: list = field(default_factory=list) # list[Tool]
@@ -84,12 +85,15 @@ class QueenPhaseState:
event_bus: Any = None # EventBus — for emitting QUEEN_PHASE_CHANGED events
# Phase-specific prompts (set by session_manager after construction)
prompt_planning: str = ""
prompt_building: str = ""
prompt_staging: str = ""
prompt_running: str = ""
def get_current_tools(self) -> list:
"""Return tools for the current phase."""
if self.phase == "planning":
return list(self.planning_tools)
if self.phase == "running":
return list(self.running_tools)
if self.phase == "staging":
@@ -97,12 +101,20 @@ class QueenPhaseState:
return list(self.building_tools)
def get_current_prompt(self) -> str:
"""Return the system prompt for the current phase."""
if self.phase == "running":
return self.prompt_running
if self.phase == "staging":
return self.prompt_staging
return self.prompt_building
"""Return the system prompt for the current phase, with fresh memory appended."""
if self.phase == "planning":
base = self.prompt_planning
elif self.phase == "running":
base = self.prompt_running
elif self.phase == "staging":
base = self.prompt_staging
else:
base = self.prompt_building
from framework.agents.queen.queen_memory import format_for_injection
memory = format_for_injection()
return base + ("\n\n" + memory if memory else "")
async def _emit_phase_event(self) -> None:
"""Publish a QUEEN_PHASE_CHANGED event so the frontend updates the tag."""
@@ -128,22 +140,15 @@ class QueenPhaseState:
tool_names = [t.name for t in self.running_tools]
logger.info("Queen phase → running (source=%s, tools: %s)", source, tool_names)
await self._emit_phase_event()
if self.inject_notification:
if source == "frontend":
msg = (
"[PHASE CHANGE] The user clicked Run in the UI. Switched to RUNNING phase. "
"Worker is now executing. You have monitoring/lifecycle tools: "
+ ", ".join(tool_names)
+ "."
)
else:
msg = (
"[PHASE CHANGE] Switched to RUNNING phase. "
"Worker is executing. You now have monitoring/lifecycle tools: "
+ ", ".join(tool_names)
+ "."
)
await self.inject_notification(msg)
# Skip notification when source="tool" — the tool result already
# contains the phase change info.
if self.inject_notification and source != "tool":
await self.inject_notification(
"[PHASE CHANGE] The user clicked Run in the UI. Switched to RUNNING phase. "
"Worker is now executing. You have monitoring/lifecycle tools: "
+ ", ".join(tool_names)
+ "."
)
async def switch_to_staging(self, source: str = "tool") -> None:
"""Switch to staging phase and notify the queen.
@@ -157,26 +162,21 @@ class QueenPhaseState:
tool_names = [t.name for t in self.staging_tools]
logger.info("Queen phase → staging (source=%s, tools: %s)", source, tool_names)
await self._emit_phase_event()
if self.inject_notification:
# Skip notification when source="tool" — the tool result already
# contains the phase change info.
if self.inject_notification and source != "tool":
if source == "frontend":
msg = (
"[PHASE CHANGE] The user stopped the worker from the UI. "
"Switched to STAGING phase. Agent is still loaded. "
"Available tools: " + ", ".join(tool_names) + "."
)
elif source == "auto":
else:
msg = (
"[PHASE CHANGE] Worker execution completed. Switched to STAGING phase. "
"Agent is still loaded. Call run_agent_with_input(task) to run again. "
"Available tools: " + ", ".join(tool_names) + "."
)
else:
msg = (
"[PHASE CHANGE] Switched to STAGING phase. "
"Agent loaded and ready. Call run_agent_with_input(task) to start, "
"or stop_worker_and_edit() to go back to building. "
"Available tools: " + ", ".join(tool_names) + "."
)
await self.inject_notification(msg)
async def switch_to_building(self, source: str = "tool") -> None:
@@ -191,13 +191,35 @@ class QueenPhaseState:
tool_names = [t.name for t in self.building_tools]
logger.info("Queen phase → building (source=%s, tools: %s)", source, tool_names)
await self._emit_phase_event()
if self.inject_notification:
if self.inject_notification and source != "tool":
await self.inject_notification(
"[PHASE CHANGE] Switched to BUILDING phase. "
"Lifecycle tools removed. Full coding tools restored. "
"Call load_built_agent(path) when ready to stage."
)
async def switch_to_planning(self, source: str = "tool") -> None:
"""Switch to planning phase and notify the queen.
Args:
source: Who triggered the switch "tool", "frontend", or "auto".
"""
if self.phase == "planning":
return
self.phase = "planning"
tool_names = [t.name for t in self.planning_tools]
logger.info("Queen phase → planning (source=%s, tools: %s)", source, tool_names)
await self._emit_phase_event()
# Skip notification when source="tool" — the tool result already
# contains the phase change info; injecting a duplicate notification
# causes the queen to respond twice.
if self.inject_notification and source != "tool":
await self.inject_notification(
"[PHASE CHANGE] Switched to PLANNING phase. "
"Coding tools removed. Discuss goals and design with the user. "
"Available tools: " + ", ".join(tool_names) + "."
)
def build_worker_profile(runtime: AgentRuntime, agent_path: Path | str | None = None) -> str:
"""Build a worker capability profile from its graph/goal definition.
@@ -423,7 +445,7 @@ def register_queen_lifecycle_tools(
# --- stop_worker ----------------------------------------------------------
async def stop_worker() -> str:
async def stop_worker(*, reason: str = "Stopped by queen") -> str:
"""Cancel all active worker executions across all graphs.
Stops the worker immediately. Returns the IDs of cancelled executions.
@@ -453,7 +475,7 @@ def register_queen_lifecycle_tools(
for exec_id in list(stream.active_execution_ids):
try:
ok = await stream.cancel_execution(exec_id)
ok = await stream.cancel_execution(exec_id, reason=reason)
if ok:
cancelled.append(exec_id)
except Exception as e:
@@ -498,6 +520,11 @@ def register_queen_lifecycle_tools(
"Use your coding tools to modify the agent, then call "
"load_built_agent(path) to stage it again."
)
# Nudge the queen to start coding instead of blocking for user input.
if phase_state is not None and phase_state.inject_notification:
await phase_state.inject_notification(
"[PHASE CHANGE] Switched to BUILDING phase. Start implementing the changes now."
)
return json.dumps(result)
_stop_edit_tool = Tool(
@@ -514,6 +541,171 @@ def register_queen_lifecycle_tools(
)
tools_registered += 1
# --- stop_worker_and_plan (Running/Staging → Planning) --------------------
async def stop_worker_and_plan() -> str:
"""Stop the worker and switch to planning phase for diagnosis."""
stop_result = await stop_worker()
# Switch to planning phase
if phase_state is not None:
await phase_state.switch_to_planning(source="tool")
result = json.loads(stop_result)
result["phase"] = "planning"
result["message"] = (
"Worker stopped. You are now in planning phase. "
"Diagnose the issue using read-only tools (checkpoints, logs, sessions), "
"discuss a fix plan with the user, then call "
"initialize_and_build_agent() to implement the fix."
)
return json.dumps(result)
_stop_plan_tool = Tool(
name="stop_worker_and_plan",
description=(
"Stop the worker and switch to planning phase for diagnosis. "
"Use this when you need to investigate an issue before fixing it. "
"After diagnosis, call initialize_and_build_agent() to switch to building."
),
parameters={"type": "object", "properties": {}},
)
registry.register(
"stop_worker_and_plan", _stop_plan_tool, lambda inputs: stop_worker_and_plan()
)
tools_registered += 1
# --- replan_agent (Building → Planning) -----------------------------------
async def replan_agent() -> str:
"""Switch from building back to planning phase.
Only use when the user explicitly asks to re-plan."""
if phase_state is not None:
if phase_state.phase != "building":
return json.dumps(
{"error": f"Cannot replan: currently in {phase_state.phase} phase."}
)
await phase_state.switch_to_planning(source="tool")
return json.dumps(
{
"status": "replanning",
"phase": "planning",
"message": (
"Switched to PLANNING phase. Coding tools removed. "
"Discuss the new design with the user."
),
}
)
_replan_tool = Tool(
name="replan_agent",
description=(
"Switch from building back to planning phase. "
"Only use when the user explicitly asks to re-plan or redesign the agent."
),
parameters={"type": "object", "properties": {}},
)
registry.register("replan_agent", _replan_tool, lambda inputs: replan_agent())
tools_registered += 1
# --- initialize_and_build_agent wrapper (Planning → Building) -------------
# With agent_name: scaffold a new agent via MCP tool, then switch to building.
# Without agent_name: just switch to building (for fixing an existing loaded agent).
_existing_init = registry._tools.get("initialize_and_build_agent")
if _existing_init is not None:
_orig_init_executor = _existing_init.executor
async def initialize_and_build_agent_wrapper(inputs: dict) -> str:
"""Wrapper: scaffold or just switch to building phase."""
agent_name = (inputs.get("agent_name") or "").strip()
# No agent_name → try to fall back to the session's current agent,
# or fail with actionable guidance.
if not agent_name:
# Try to resolve agent_name from the current session
fallback_path = getattr(session, "worker_path", None)
if fallback_path is not None:
agent_name = Path(fallback_path).name
else:
# Server path: check SessionManager
if session_manager is not None and manager_session_id:
srv_session = session_manager.get_session(manager_session_id)
if srv_session and getattr(srv_session, "worker_path", None):
fallback_path = srv_session.worker_path
agent_name = Path(fallback_path).name
if not agent_name:
return json.dumps(
{
"error": (
"No agent_name provided and no agent loaded in this session. "
"To fix: call list_agents() to find the agent name, then call "
"initialize_and_build_agent(agent_name='<name>') to scaffold it."
)
}
)
# Fall back succeeded — switch to building without scaffolding
logger.info(
"initialize_and_build_agent: no agent_name provided, "
"falling back to session agent '%s'",
agent_name,
)
if phase_state is not None:
await phase_state.switch_to_building(source="tool")
if phase_state.inject_notification:
await phase_state.inject_notification(
"[PHASE CHANGE] Switched to BUILDING phase. "
"Start implementing the fix now."
)
return json.dumps(
{
"status": "editing",
"phase": "building",
"agent_name": agent_name,
"warning": (
f"No agent_name provided — using session agent '{agent_name}'. "
f"Agent files are at exports/{agent_name}/."
),
"message": (
"Switched to BUILDING phase. Full coding tools restored. "
"Implement the fix, then call load_built_agent(path) to reload."
),
}
)
# Has agent_name → scaffold via MCP tool
result = _orig_init_executor(inputs)
# Handle both sync and async executors
if asyncio.iscoroutine(result) or asyncio.isfuture(result):
result = await result
# If result is a ToolResult, extract the text content
result_str = str(result)
if hasattr(result, "content"):
result_str = str(result.content)
try:
parsed = json.loads(result_str)
if parsed.get("success", True):
if phase_state is not None:
await phase_state.switch_to_building(source="tool")
# Inject a continuation message so the queen starts
# building immediately instead of blocking for user input.
if phase_state.inject_notification:
await phase_state.inject_notification(
"[PHASE CHANGE] Agent scaffolded and switched to BUILDING phase. "
"Start implementing the agent nodes now."
)
except (json.JSONDecodeError, KeyError, TypeError):
pass
return result_str
registry.register(
"initialize_and_build_agent",
_existing_init.tool,
lambda inputs: initialize_and_build_agent_wrapper(inputs),
)
# --- stop_worker (Running → Staging) -------------------------------------
async def stop_worker_to_staging() -> str:
@@ -1260,7 +1452,23 @@ def register_queen_lifecycle_tools(
if reg is None:
return json.dumps({"error": "Worker graph not found"})
# Find an active node that can accept injected input
# Prefer nodes that are actively waiting (e.g. escalation receivers
# blocked on queen guidance) over the main event-loop node.
for stream in reg.streams.values():
waiting = stream.get_waiting_nodes()
if waiting:
target_node_id = waiting[0]["node_id"]
ok = await stream.inject_input(target_node_id, content, is_client_input=True)
if ok:
return json.dumps(
{
"status": "delivered",
"node_id": target_node_id,
"content_preview": content[:100],
}
)
# Fallback: inject into any injectable node
for stream in reg.streams.values():
injectable = stream.get_injectable_nodes()
if injectable:
@@ -1429,6 +1637,51 @@ def register_queen_lifecycle_tools(
if not resolved_path.exists():
return json.dumps({"error": f"Agent path does not exist: {agent_path}"})
# Pre-check: verify the module exports goal/nodes/edges before
# attempting the full load. This gives the queen an actionable
# error message instead of a cryptic ImportError or TypeError.
try:
import importlib
import sys as _sys
pkg_name = resolved_path.name
parent_dir = str(resolved_path.resolve().parent)
# Temporarily put parent on sys.path for import
if parent_dir not in _sys.path:
_sys.path.insert(0, parent_dir)
# Evict stale cached modules
stale = [n for n in _sys.modules if n == pkg_name or n.startswith(f"{pkg_name}.")]
for n in stale:
del _sys.modules[n]
mod = importlib.import_module(pkg_name)
missing_attrs = [
attr for attr in ("goal", "nodes", "edges") if getattr(mod, attr, None) is None
]
if missing_attrs:
return json.dumps(
{
"error": (
f"Agent module '{pkg_name}' is missing module-level "
f"attributes: {', '.join(missing_attrs)}. "
f"Fix: in {pkg_name}/__init__.py, add "
f"'from .agent import {', '.join(missing_attrs)}' "
f"so that 'import {pkg_name}' exposes them at package level."
)
}
)
except Exception as pre_err:
return json.dumps(
{
"error": (
f"Failed to import agent module '{resolved_path.name}': {pre_err}. "
f"Fix: ensure {resolved_path.name}/__init__.py exists and can be "
f"imported without errors (check syntax, missing dependencies, "
f"and relative imports)."
)
}
)
try:
updated_session = await session_manager.load_worker(
manager_session_id,
@@ -1436,7 +1689,36 @@ def register_queen_lifecycle_tools(
)
info = updated_session.worker_info
# Switch to staging phase after successful load
# Validate that all tools declared by nodes are registered
loaded_runtime = _get_runtime()
if loaded_runtime is not None:
available_tool_names = {t.name for t in loaded_runtime._tools}
missing_by_node: dict[str, list[str]] = {}
for node in loaded_runtime.graph.nodes:
if node.tools:
missing = set(node.tools) - available_tool_names
if missing:
missing_by_node[f"{node.name} (id={node.id})"] = sorted(missing)
if missing_by_node:
# Unload the broken worker
try:
await session_manager.unload_worker(manager_session_id)
except Exception:
pass
details = "; ".join(
f"Node '{k}' missing {v}" for k, v in missing_by_node.items()
)
return json.dumps(
{
"error": (
f"Tool validation failed: {details}. "
"Fix node tool declarations or add the missing "
"tools, then try loading again."
)
}
)
# Switch to staging phase after successful load + validation
if phase_state is not None:
await phase_state.switch_to_staging()
@@ -0,0 +1,38 @@
"""Tool for the queen to write to her episodic memory.
The queen can consciously record significant moments during a session like
writing in a diary. Semantic memory (MEMORY.md) is updated automatically at
session end and is never written by the queen directly.
"""
from __future__ import annotations
from typing import TYPE_CHECKING
if TYPE_CHECKING:
from framework.runner.tool_registry import ToolRegistry
def write_to_diary(entry: str) -> str:
"""Write a prose entry to today's episodic memory.
Use this when something significant just happened: a pipeline went live, the
user shared an important preference, a goal was achieved or abandoned, or
you want to record something that should be remembered across sessions.
Write in first person, as you would in a private diary. Be specific what
happened, how the user responded, what it means going forward. One or two
paragraphs is enough.
You do not need to include a timestamp or date heading; those are added
automatically.
"""
from framework.agents.queen.queen_memory import append_episodic_entry
append_episodic_entry(entry)
return "Diary entry recorded."
def register_queen_memory_tools(registry: ToolRegistry) -> None:
"""Register the episodic memory tool into the queen's tool registry."""
registry.register_function(write_to_diary)
+1 -1
View File
@@ -1,6 +1,6 @@
"""Graph lifecycle tools for multi-graph sessions.
These tools allow an agent (e.g. hive_coder) to load, unload, start,
These tools allow an agent (e.g. queen) to load, unload, start,
restart, and query other agent graphs within the same runtime session.
Usage::
File diff suppressed because it is too large Load Diff
-13
View File
@@ -1,13 +0,0 @@
"""TUI screens package."""
from .account_selection import AccountSelectionScreen
from .add_local_credential import AddLocalCredentialScreen
from .agent_picker import AgentPickerScreen
from .credential_setup import CredentialSetupScreen
__all__ = [
"AccountSelectionScreen",
"AddLocalCredentialScreen",
"AgentPickerScreen",
"CredentialSetupScreen",
]
@@ -1,111 +0,0 @@
"""Account selection ModalScreen for picking a connected account before agent start."""
from __future__ import annotations
from rich.text import Text
from textual.app import ComposeResult
from textual.binding import Binding
from textual.containers import Vertical
from textual.screen import ModalScreen
from textual.widgets import Label, OptionList
from textual.widgets._option_list import Option
class AccountSelectionScreen(ModalScreen[dict | None]):
"""Modal screen showing connected accounts for pre-run selection.
Returns the selected account dict, or None if dismissed.
"""
SCOPED_CSS = False
BINDINGS = [
Binding("escape", "dismiss_picker", "Cancel"),
]
DEFAULT_CSS = """
AccountSelectionScreen {
align: center middle;
}
#acct-container {
width: 70%;
max-width: 80;
height: 60%;
background: $surface;
border: heavy $primary;
padding: 1 2;
}
#acct-title {
text-align: center;
text-style: bold;
width: 100%;
color: $text;
}
#acct-subtitle {
text-align: center;
width: 100%;
margin-bottom: 1;
}
#acct-footer {
text-align: center;
width: 100%;
margin-top: 1;
}
"""
def __init__(self, accounts: list[dict]) -> None:
super().__init__()
self._accounts = accounts
def compose(self) -> ComposeResult:
n = len(self._accounts)
with Vertical(id="acct-container"):
yield Label("Select Account to Test", id="acct-title")
yield Label(
f"[dim]{n} connected account{'s' if n != 1 else ''}[/dim]",
id="acct-subtitle",
)
option_list = OptionList(id="acct-list")
# Group: Aden accounts first, then local
aden = [a for a in self._accounts if a.get("source") != "local"]
local = [a for a in self._accounts if a.get("source") == "local"]
ordered = aden + local
for i, acct in enumerate(ordered):
provider = acct.get("provider", "unknown")
alias = acct.get("alias", "unknown")
identity = acct.get("identity", {})
source = acct.get("source", "aden")
# Build identity label: prefer email, then username/workspace
identity_label = (
identity.get("email")
or identity.get("username")
or identity.get("workspace")
or ""
)
label = Text()
label.append(f"{provider}/", style="bold")
label.append(alias, style="bold cyan")
if source == "local":
label.append(" [local]", style="dim yellow")
if identity_label:
label.append(f" ({identity_label})", style="dim")
option_list.add_option(Option(label, id=f"acct-{i}"))
# Keep ordered list for index lookups
self._accounts = ordered
yield option_list
yield Label(
"[dim]Enter[/dim] Select [dim]Esc[/dim] Cancel",
id="acct-footer",
)
def on_mount(self) -> None:
ol = self.query_one("#acct-list", OptionList)
ol.styles.height = "1fr"
def on_option_list_option_selected(self, event: OptionList.OptionSelected) -> None:
idx = event.option_index
if 0 <= idx < len(self._accounts):
self.dismiss(self._accounts[idx])
def action_dismiss_picker(self) -> None:
self.dismiss(None)
@@ -1,244 +0,0 @@
"""Add Local Credential ModalScreen for storing named local API key accounts."""
from __future__ import annotations
from textual.app import ComposeResult
from textual.binding import Binding
from textual.containers import Vertical, VerticalScroll
from textual.screen import ModalScreen
from textual.widgets import Button, Input, Label, OptionList
from textual.widgets._option_list import Option
class AddLocalCredentialScreen(ModalScreen[dict | None]):
"""Modal screen for adding a named local API key credential.
Phase 1: Pick credential type from list.
Phase 2: Enter alias + API key, run health check, save.
Returns a dict with credential_id, alias, and identity on success, or None on cancel.
"""
BINDINGS = [
Binding("escape", "dismiss_screen", "Cancel"),
]
DEFAULT_CSS = """
AddLocalCredentialScreen {
align: center middle;
}
#alc-container {
width: 80%;
max-width: 90;
height: 80%;
background: $surface;
border: heavy $primary;
padding: 1 2;
}
#alc-title {
text-align: center;
text-style: bold;
width: 100%;
color: $text;
}
#alc-subtitle {
text-align: center;
width: 100%;
margin-bottom: 1;
}
#alc-type-list {
height: 1fr;
}
#alc-form {
height: 1fr;
}
.alc-field {
margin-bottom: 1;
height: auto;
}
.alc-field Label {
margin-bottom: 0;
}
#alc-status {
width: 100%;
height: auto;
margin-top: 1;
padding: 1;
background: $panel;
}
.alc-buttons {
height: auto;
margin-top: 1;
align: center middle;
}
.alc-buttons Button {
margin: 0 1;
}
#alc-footer {
text-align: center;
width: 100%;
margin-top: 1;
}
"""
def __init__(self) -> None:
super().__init__()
# Load credential specs that support direct API keys
self._specs: list[tuple[str, object]] = self._load_specs()
# Selected credential spec (set in phase 2)
self._selected_id: str = ""
self._selected_spec: object = None
self._phase: int = 1 # 1 = type selection, 2 = form
@staticmethod
def _load_specs() -> list[tuple[str, object]]:
"""Return (credential_id, spec) pairs for direct-API-key credentials."""
try:
from aden_tools.credentials import CREDENTIAL_SPECS
return [
(cid, spec)
for cid, spec in CREDENTIAL_SPECS.items()
if getattr(spec, "direct_api_key_supported", False)
]
except Exception:
return []
# ------------------------------------------------------------------
# Compose
# ------------------------------------------------------------------
def compose(self) -> ComposeResult:
with Vertical(id="alc-container"):
yield Label("Add Local Credential", id="alc-title")
yield Label("[dim]Store a named API key account[/dim]", id="alc-subtitle")
# Phase 1: type selection
option_list = OptionList(id="alc-type-list")
for cid, spec in self._specs:
description = getattr(spec, "description", cid)
option_list.add_option(Option(f"{cid} [dim]{description}[/dim]", id=f"type-{cid}"))
yield option_list
# Phase 2: form (hidden initially)
with VerticalScroll(id="alc-form"):
with Vertical(classes="alc-field"):
yield Label("[bold]Alias[/bold] [dim](e.g. work, personal)[/dim]")
yield Input(value="default", id="alc-alias")
with Vertical(classes="alc-field"):
yield Label("[bold]API Key[/bold]")
yield Input(placeholder="Paste API key...", password=True, id="alc-key")
yield Label("", id="alc-status")
with Vertical(classes="alc-buttons"):
yield Button("Test & Save", variant="primary", id="btn-save")
yield Button("Back", variant="default", id="btn-back")
yield Label(
"[dim]Enter[/dim] Select [dim]Esc[/dim] Cancel",
id="alc-footer",
)
def on_mount(self) -> None:
self._show_phase(1)
# ------------------------------------------------------------------
# Phase switching
# ------------------------------------------------------------------
def _show_phase(self, phase: int) -> None:
self._phase = phase
type_list = self.query_one("#alc-type-list", OptionList)
form = self.query_one("#alc-form", VerticalScroll)
if phase == 1:
type_list.display = True
form.display = False
subtitle = self.query_one("#alc-subtitle", Label)
subtitle.update("[dim]Select the credential type to add[/dim]")
else:
type_list.display = False
form.display = True
spec = self._selected_spec
description = (
getattr(spec, "description", self._selected_id) if spec else self._selected_id
)
subtitle = self.query_one("#alc-subtitle", Label)
subtitle.update(f"[dim]{self._selected_id}[/dim] {description}")
self._clear_status()
# Focus the alias input
self.query_one("#alc-alias", Input).focus()
# ------------------------------------------------------------------
# Event handlers
# ------------------------------------------------------------------
def on_option_list_option_selected(self, event: OptionList.OptionSelected) -> None:
if self._phase != 1:
return
option_id = event.option.id or ""
if option_id.startswith("type-"):
cid = option_id[5:] # strip "type-" prefix
self._selected_id = cid
self._selected_spec = next(
(spec for spec_id, spec in self._specs if spec_id == cid), None
)
self._show_phase(2)
def on_button_pressed(self, event: Button.Pressed) -> None:
if event.button.id == "btn-save":
self._do_save()
elif event.button.id == "btn-back":
self._show_phase(1)
# ------------------------------------------------------------------
# Save logic
# ------------------------------------------------------------------
def _do_save(self) -> None:
alias = self.query_one("#alc-alias", Input).value.strip() or "default"
api_key = self.query_one("#alc-key", Input).value.strip()
if not api_key:
self._set_status("[red]API key cannot be empty.[/red]")
return
self._set_status("[dim]Running health check...[/dim]")
# Disable save button while running
btn = self.query_one("#btn-save", Button)
btn.disabled = True
try:
from framework.credentials.local.registry import LocalCredentialRegistry
registry = LocalCredentialRegistry.default()
info, health_result = registry.save_account(
credential_id=self._selected_id,
alias=alias,
api_key=api_key,
run_health_check=True,
)
if health_result is not None and not health_result.valid:
self._set_status(
f"[yellow]Saved with failed health check:[/yellow] {health_result.message}\n"
"[dim]You can re-validate later via validate_credential().[/dim]"
)
else:
identity = info.identity.to_dict()
identity_str = ""
if identity:
parts = [f"{k}: {v}" for k, v in identity.items() if v]
identity_str = " " + ", ".join(parts) if parts else ""
self._set_status(f"[green]Saved:[/green] {info.storage_id}{identity_str}")
# Dismiss with result so callers can react
self.set_timer(1.0, lambda: self.dismiss(info.to_account_dict()))
return
except Exception as e:
self._set_status(f"[red]Error:[/red] {e}")
finally:
btn.disabled = False
def _set_status(self, markup: str) -> None:
self.query_one("#alc-status", Label).update(markup)
def _clear_status(self) -> None:
self.query_one("#alc-status", Label).update("")
def action_dismiss_screen(self) -> None:
self.dismiss(None)
-362
View File
@@ -1,362 +0,0 @@
"""Agent picker ModalScreen for selecting agents within the TUI."""
from __future__ import annotations
import json
from dataclasses import dataclass, field
from enum import Enum
from pathlib import Path
from rich.console import Group
from rich.text import Text
from textual.app import ComposeResult
from textual.binding import Binding
from textual.containers import Vertical
from textual.screen import ModalScreen
from textual.widgets import Label, OptionList, TabbedContent, TabPane
from textual.widgets._option_list import Option
class GetStartedAction(Enum):
"""Actions available in the Get Started tab."""
RUN_EXAMPLES = "run_examples"
RUN_EXISTING = "run_existing"
BUILD_EDIT = "build_edit"
@dataclass
class AgentEntry:
"""Lightweight agent metadata for the picker."""
path: Path
name: str
description: str
category: str
session_count: int = 0
node_count: int = 0
tool_count: int = 0
tags: list[str] = field(default_factory=list)
last_active: str | None = None
def _get_last_active(agent_name: str) -> str | None:
"""Return the most recent updated_at timestamp across all sessions."""
sessions_dir = Path.home() / ".hive" / "agents" / agent_name / "sessions"
if not sessions_dir.exists():
return None
latest: str | None = None
for session_dir in sessions_dir.iterdir():
if not session_dir.is_dir() or not session_dir.name.startswith("session_"):
continue
state_file = session_dir / "state.json"
if not state_file.exists():
continue
try:
data = json.loads(state_file.read_text(encoding="utf-8"))
ts = data.get("timestamps", {}).get("updated_at")
if ts and (latest is None or ts > latest):
latest = ts
except Exception:
continue
return latest
def _count_sessions(agent_name: str) -> int:
"""Count session directories under ~/.hive/agents/{agent_name}/sessions/."""
sessions_dir = Path.home() / ".hive" / "agents" / agent_name / "sessions"
if not sessions_dir.exists():
return 0
return sum(1 for d in sessions_dir.iterdir() if d.is_dir() and d.name.startswith("session_"))
def _extract_agent_stats(agent_path: Path) -> tuple[int, int, list[str]]:
"""Extract node count, tool count, and tags from an agent directory.
Prefers agent.py (AST-parsed) over agent.json for node/tool counts
since agent.json may be stale. Tags are only available from agent.json.
"""
import ast
node_count, tool_count, tags = 0, 0, []
# Try agent.py first — source of truth for nodes
agent_py = agent_path / "agent.py"
if agent_py.exists():
try:
tree = ast.parse(agent_py.read_text(encoding="utf-8"))
for node in ast.walk(tree):
# Find `nodes = [...]` assignment
if isinstance(node, ast.Assign):
for target in node.targets:
if isinstance(target, ast.Name) and target.id == "nodes":
if isinstance(node.value, ast.List):
node_count = len(node.value.elts)
except Exception:
pass
# Fall back to / supplement from agent.json
agent_json = agent_path / "agent.json"
if agent_json.exists():
try:
data = json.loads(agent_json.read_text(encoding="utf-8"))
json_nodes = data.get("nodes", [])
if node_count == 0:
node_count = len(json_nodes)
# Tool count: use whichever source gave us nodes, but agent.json
# has the structured tool lists so prefer it for tool counting
tools: set[str] = set()
for n in json_nodes:
tools.update(n.get("tools", []))
tool_count = len(tools)
tags = data.get("agent", {}).get("tags", [])
except Exception:
pass
return node_count, tool_count, tags
def discover_agents() -> dict[str, list[AgentEntry]]:
"""Discover agents from all known sources grouped by category."""
from framework.runner.cli import (
_extract_python_agent_metadata,
_get_framework_agents_dir,
_is_valid_agent_dir,
)
groups: dict[str, list[AgentEntry]] = {}
sources = [
("Your Agents", Path("exports")),
("Framework", _get_framework_agents_dir()),
("Examples", Path("examples/templates")),
]
for category, base_dir in sources:
if not base_dir.exists():
continue
entries: list[AgentEntry] = []
for path in sorted(base_dir.iterdir(), key=lambda p: p.name):
if not _is_valid_agent_dir(path):
continue
# config.py is source of truth for name/description
name, desc = _extract_python_agent_metadata(path)
config_fallback_name = path.name.replace("_", " ").title()
used_config = name != config_fallback_name
node_count, tool_count, tags = _extract_agent_stats(path)
if not used_config:
# config.py didn't provide values, fall back to agent.json
agent_json = path / "agent.json"
if agent_json.exists():
try:
data = json.loads(agent_json.read_text(encoding="utf-8"))
meta = data.get("agent", {})
name = meta.get("name", name)
desc = meta.get("description", desc)
except Exception:
pass
entries.append(
AgentEntry(
path=path,
name=name,
description=desc,
category=category,
session_count=_count_sessions(path.name),
node_count=node_count,
tool_count=tool_count,
tags=tags,
last_active=_get_last_active(path.name),
)
)
if entries:
groups[category] = entries
return groups
def _render_agent_option(agent: AgentEntry) -> Group:
"""Build a Rich renderable for a single agent option."""
# Line 1: name + session badge
line1 = Text()
line1.append(agent.name, style="bold")
if agent.session_count:
line1.append(f" {agent.session_count} sessions", style="dim cyan")
# Line 2: description (word-wrapped by the widget)
desc = agent.description if agent.description else "No description"
line2 = Text(desc, style="dim")
# Line 3: stats chips
chips = Text()
if agent.node_count:
chips.append(f" {agent.node_count} nodes ", style="on dark_green white")
chips.append(" ")
if agent.tool_count:
chips.append(f" {agent.tool_count} tools ", style="on dark_blue white")
chips.append(" ")
for tag in agent.tags[:3]:
chips.append(f" {tag} ", style="on grey37 white")
chips.append(" ")
parts = [line1, line2]
if chips.plain.strip():
parts.append(chips)
return Group(*parts)
def _render_get_started_option(title: str, description: str, icon: str = "") -> Group:
"""Build a Rich renderable for a Get Started option."""
line1 = Text()
line1.append(f"{icon} ", style="bold cyan")
line1.append(title, style="bold")
line2 = Text(description, style="dim")
return Group(line1, line2)
class AgentPickerScreen(ModalScreen[str | None]):
"""Modal screen showing available agents organized by tabbed categories.
Returns the selected agent path as a string, or None if dismissed.
For Get Started actions, returns a special prefix like "action:run_examples".
"""
BINDINGS = [
Binding("escape", "dismiss_picker", "Cancel"),
]
DEFAULT_CSS = """
AgentPickerScreen {
align: center middle;
}
#picker-container {
width: 90%;
max-width: 120;
height: 85%;
background: $surface;
border: heavy $primary;
padding: 1 2;
}
#picker-title {
text-align: center;
text-style: bold;
width: 100%;
color: $text;
}
#picker-subtitle {
text-align: center;
width: 100%;
margin-bottom: 1;
}
#picker-footer {
text-align: center;
width: 100%;
margin-top: 1;
}
TabPane {
padding: 0;
}
OptionList {
height: 1fr;
}
OptionList > .option-list--option {
padding: 1 2;
}
"""
def __init__(
self,
agent_groups: dict[str, list[AgentEntry]],
show_get_started: bool = False,
) -> None:
super().__init__()
self._groups = agent_groups
self._show_get_started = show_get_started
# Map (tab_id, option_index) -> AgentEntry
self._option_map: dict[str, dict[int, AgentEntry]] = {}
def compose(self) -> ComposeResult:
total = sum(len(v) for v in self._groups.values())
with Vertical(id="picker-container"):
yield Label("Hive Agent Launcher", id="picker-title")
yield Label(
f"[dim]{total} agents available[/dim]",
id="picker-subtitle",
)
with TabbedContent():
# Get Started tab (only on initial launch)
if self._show_get_started:
with TabPane("Get Started", id="get-started"):
option_list = OptionList(id="list-get-started")
option_list.add_option(
Option(
_render_get_started_option(
"Test and run example agents",
"Try pre-built example agents to learn how Hive works",
"📚",
),
id="action:run_examples",
)
)
option_list.add_option(
Option(
_render_get_started_option(
"Test and run existing agent",
"Load and run an agent you've already built (from exports/)",
"🚀",
),
id="action:run_existing",
)
)
option_list.add_option(
Option(
_render_get_started_option(
"Build or edit agent",
"Create a new agent or modify an existing one",
"🛠️ ",
),
id="action:build_edit",
)
)
yield option_list
# Agent category tabs
for category, agents in self._groups.items():
tab_id = category.lower().replace(" ", "-")
with TabPane(f"{category} ({len(agents)})", id=tab_id):
option_list = OptionList(id=f"list-{tab_id}")
self._option_map[f"list-{tab_id}"] = {}
for i, agent in enumerate(agents):
option_list.add_option(
Option(
_render_agent_option(agent),
id=str(agent.path),
)
)
self._option_map[f"list-{tab_id}"][i] = agent
yield option_list
yield Label(
"[dim]Enter[/dim] Select [dim]Tab[/dim] Switch category [dim]Esc[/dim] Cancel",
id="picker-footer",
)
def on_option_list_option_selected(self, event: OptionList.OptionSelected) -> None:
list_id = event.option_list.id or ""
# Handle Get Started tab options
if list_id == "list-get-started":
option = event.option
if option and option.id:
self.dismiss(option.id) # Returns "action:run_examples", etc.
return
# Handle agent selection from other tabs
idx = event.option_index
agent_map = self._option_map.get(list_id, {})
agent = agent_map.get(idx)
if agent:
self.dismiss(str(agent.path))
def action_dismiss_picker(self) -> None:
self.dismiss(None)
@@ -1,304 +0,0 @@
"""Credential setup ModalScreen for configuring missing agent credentials."""
from __future__ import annotations
import os
from textual.app import ComposeResult
from textual.binding import Binding
from textual.containers import Vertical, VerticalScroll
from textual.screen import ModalScreen
from textual.widgets import Button, Input, Label
from framework.credentials.setup import CredentialSetupSession, MissingCredential
class CredentialSetupScreen(ModalScreen[bool | None]):
"""Modal screen for configuring missing agent credentials.
Shows a form with one password Input per missing credential.
For Aden-backed credentials (``aden_supported=True``), prompts for
``ADEN_API_KEY`` and runs the Aden sync flow instead of storing a
raw value.
Returns True on successful save, or None on cancel/skip.
"""
BINDINGS = [
Binding("escape", "dismiss_setup", "Cancel"),
]
DEFAULT_CSS = """
CredentialSetupScreen {
align: center middle;
}
#cred-container {
width: 80%;
max-width: 100;
height: 80%;
background: $surface;
border: heavy $primary;
padding: 1 2;
}
#cred-title {
text-align: center;
text-style: bold;
width: 100%;
color: $text;
}
#cred-subtitle {
text-align: center;
width: 100%;
margin-bottom: 1;
}
#cred-scroll {
height: 1fr;
}
.cred-entry {
margin-bottom: 1;
padding: 1;
background: $panel;
height: auto;
}
.cred-entry Input {
margin-top: 1;
}
.cred-buttons {
height: auto;
margin-top: 1;
align: center middle;
}
.cred-buttons Button {
margin: 0 1;
}
#cred-footer {
text-align: center;
width: 100%;
margin-top: 1;
}
"""
def __init__(self, session: CredentialSetupSession) -> None:
super().__init__()
self._session = session
self._missing: list[MissingCredential] = session.missing
# Track which credentials need Aden sync vs direct API key
self._aden_creds: set[int] = set()
self._needs_aden_key = False
for i, cred in enumerate(self._missing):
if cred.aden_supported and not cred.direct_api_key_supported:
self._aden_creds.add(i)
self._needs_aden_key = True
def compose(self) -> ComposeResult:
n = len(self._missing)
with Vertical(id="cred-container"):
yield Label("Credential Setup", id="cred-title")
yield Label(
f"[dim]{n} credential{'s' if n != 1 else ''} needed to run this agent[/dim]",
id="cred-subtitle",
)
with VerticalScroll(id="cred-scroll"):
# If any credential needs Aden, show ADEN_API_KEY input first
if self._needs_aden_key:
aden_key = os.environ.get("ADEN_API_KEY", "")
with Vertical(classes="cred-entry"):
yield Label("[bold]ADEN_API_KEY[/bold]")
aden_names = [
self._missing[i].credential_name for i in sorted(self._aden_creds)
]
yield Label(f"[dim]Required for OAuth sync: {', '.join(aden_names)}[/dim]")
yield Label("[cyan]Get key:[/cyan] https://hive.adenhq.com")
yield Input(
placeholder="Paste ADEN_API_KEY..."
if not aden_key
else "Already set (leave blank to keep)",
password=True,
id="key-aden",
)
# Show direct API key inputs for non-Aden credentials
for i, cred in enumerate(self._missing):
if i in self._aden_creds:
continue # Handled via Aden sync above
with Vertical(classes="cred-entry"):
yield Label(f"[bold]{cred.env_var}[/bold]")
affected = cred.tools or cred.node_types
if affected:
yield Label(f"[dim]Required by: {', '.join(affected)}[/dim]")
if cred.description:
yield Label(f"[dim]{cred.description}[/dim]")
if cred.help_url:
yield Label(f"[cyan]Get key:[/cyan] {cred.help_url}")
yield Input(
placeholder="Paste API key...",
password=True,
id=f"key-{i}",
)
with Vertical(classes="cred-buttons"):
yield Button("Save & Continue", variant="primary", id="btn-save")
yield Button("Skip", variant="default", id="btn-skip")
yield Label(
"[dim]Enter[/dim] Submit [dim]Esc[/dim] Cancel",
id="cred-footer",
)
def on_button_pressed(self, event: Button.Pressed) -> None:
if event.button.id == "btn-save":
self._save_credentials()
elif event.button.id == "btn-skip":
self.dismiss(None)
def _save_credentials(self) -> None:
"""Collect inputs, store credentials, and dismiss."""
self._session._ensure_credential_key()
configured = 0
# Handle Aden-backed credentials
if self._needs_aden_key:
aden_input = self.query_one("#key-aden", Input)
aden_key = aden_input.value.strip()
if aden_key:
from framework.credentials.key_storage import save_aden_api_key
save_aden_api_key(aden_key)
configured += 1 # ADEN_API_KEY itself counts as configured
# Run Aden sync for all Aden-backed creds (best-effort)
if aden_key or os.environ.get("ADEN_API_KEY"):
self._sync_aden_credentials()
# Handle direct API key credentials
for i, cred in enumerate(self._missing):
if i in self._aden_creds:
continue
input_widget = self.query_one(f"#key-{i}", Input)
value = input_widget.value.strip()
if not value:
continue
try:
self._session._store_credential(cred, value)
configured += 1
except Exception as e:
self.notify(f"Error storing {cred.env_var}: {e}", severity="error")
if configured > 0:
self.dismiss(True)
else:
self.notify("No credentials configured", severity="warning", timeout=3)
def _sync_aden_credentials(self) -> int:
"""Sync Aden-backed credentials and return count of successfully synced."""
# Build the Aden sync components directly so we get real errors
# instead of CredentialStore.with_aden_sync() silently falling back.
try:
from framework.credentials.aden import (
AdenCachedStorage,
AdenClientConfig,
AdenCredentialClient,
AdenSyncProvider,
)
from framework.credentials.storage import EncryptedFileStorage
client = AdenCredentialClient(AdenClientConfig(base_url="https://api.adenhq.com"))
provider = AdenSyncProvider(client=client)
local_storage = EncryptedFileStorage()
cached_storage = AdenCachedStorage(
local_storage=local_storage,
aden_provider=provider,
)
except Exception as e:
self.notify(
f"Aden setup error: {e}",
severity="error",
timeout=8,
)
return 0
# Sync all integrations from Aden to get the provider index populated
try:
from framework.credentials import CredentialStore
store = CredentialStore(
storage=cached_storage,
providers=[provider],
auto_refresh=True,
)
num_synced = provider.sync_all(store)
if num_synced == 0:
self.notify(
"No active integrations found in Aden. "
"Connect integrations at hive.adenhq.com.",
severity="warning",
timeout=8,
)
except Exception as e:
self.notify(
f"Aden sync error: {e}",
severity="error",
timeout=8,
)
return 0
synced = 0
for i in sorted(self._aden_creds):
cred = self._missing[i]
cred_id = cred.credential_id or cred.credential_name
if store.is_available(cred_id):
try:
value = store.get_key(cred_id, cred.credential_key)
if value:
os.environ[cred.env_var] = value
self._persist_to_local_store(cred_id, cred.credential_key, value)
synced += 1
else:
self.notify(
f"{cred.credential_name}: key "
f"'{cred.credential_key}' not found "
f"in credential '{cred_id}'",
severity="warning",
timeout=8,
)
except Exception as e:
self.notify(
f"{cred.credential_name} extraction failed: {e}",
severity="error",
timeout=8,
)
else:
self.notify(
f"{cred.credential_name} (id='{cred_id}') "
f"not found in Aden. Connect this "
f"integration at hive.adenhq.com first.",
severity="warning",
timeout=8,
)
return synced
@staticmethod
def _persist_to_local_store(cred_id: str, key_name: str, value: str) -> None:
"""Save a synced token to the local encrypted store under the canonical ID."""
try:
from pydantic import SecretStr
from framework.credentials.models import CredentialKey, CredentialObject, CredentialType
from framework.credentials.storage import EncryptedFileStorage
cred_obj = CredentialObject(
id=cred_id,
credential_type=CredentialType.OAUTH2,
keys={
key_name: CredentialKey(
name=key_name,
value=SecretStr(value),
),
},
auto_refresh=True,
)
EncryptedFileStorage().save(cred_obj)
except Exception:
pass # Best-effort; env var is the primary delivery mechanism
def action_dismiss_setup(self) -> None:
self.dismiss(None)
File diff suppressed because it is too large Load Diff
-139
View File
@@ -1,139 +0,0 @@
"""
Native OS file dialog for PDF selection.
Launches the platform's native file picker (macOS: NSOpenPanel via osascript,
Linux: zenity/kdialog, Windows: PowerShell OpenFileDialog) in a background
thread so Textual's event loop stays responsive.
Falls back to None when no GUI is available (SSH, headless).
"""
import asyncio
import os
import subprocess
import sys
from pathlib import Path
def _has_gui() -> bool:
"""Detect whether a GUI display is available."""
if sys.platform == "darwin":
# macOS: GUI is available unless running over SSH without display forwarding.
return "SSH_CONNECTION" not in os.environ or "DISPLAY" in os.environ
elif sys.platform == "win32":
return True
else:
# Linux/BSD: Need X11 or Wayland.
return bool(os.environ.get("DISPLAY") or os.environ.get("WAYLAND_DISPLAY"))
def _linux_file_dialog() -> subprocess.CompletedProcess | None:
"""Try zenity, then kdialog, on Linux. Returns CompletedProcess or None."""
# Try zenity (GTK)
try:
return subprocess.run(
[
"zenity",
"--file-selection",
"--title=Select a PDF file",
"--file-filter=PDF files (*.pdf)|*.pdf",
],
encoding="utf-8",
capture_output=True,
text=True,
timeout=300,
)
except FileNotFoundError:
pass
# Try kdialog (KDE)
try:
return subprocess.run(
[
"kdialog",
"--getopenfilename",
".",
"PDF files (*.pdf)",
],
encoding="utf-8",
capture_output=True,
text=True,
timeout=300,
)
except FileNotFoundError:
pass
return None
def _pick_pdf_subprocess() -> Path | None:
"""Run the native file dialog. BLOCKS until user picks or cancels.
Returns a Path on success, None on cancel or error.
Must be called from a non-main thread (via asyncio.to_thread).
"""
try:
if sys.platform == "darwin":
result = subprocess.run(
[
"osascript",
"-e",
'POSIX path of (choose file of type {"com.adobe.pdf"} '
'with prompt "Select a PDF file")',
],
encoding="utf-8",
capture_output=True,
text=True,
timeout=300,
)
elif sys.platform == "win32":
ps_script = (
"Add-Type -AssemblyName System.Windows.Forms; "
"$f = New-Object System.Windows.Forms.OpenFileDialog; "
"$f.Filter = 'PDF files (*.pdf)|*.pdf'; "
"$f.Title = 'Select a PDF file'; "
"if ($f.ShowDialog() -eq 'OK') { $f.FileName }"
)
result = subprocess.run(
["powershell", "-NoProfile", "-Command", ps_script],
encoding="utf-8",
capture_output=True,
text=True,
timeout=300,
)
else:
result = _linux_file_dialog()
if result is None:
return None
if result.returncode != 0:
return None
path_str = result.stdout.strip()
if not path_str:
return None
path = Path(path_str)
if path.is_file() and path.suffix.lower() == ".pdf":
return path
return None
except (subprocess.TimeoutExpired, FileNotFoundError, OSError):
return None
async def pick_pdf_file() -> Path | None:
"""Open a native OS file dialog to pick a PDF file.
Non-blocking: runs the dialog subprocess in a background thread via
asyncio.to_thread(), so the calling event loop stays responsive.
Returns:
Path to the selected PDF, or None if the user cancelled,
no GUI is available, or the dialog command was not found.
"""
if not _has_gui():
return None
return await asyncio.to_thread(_pick_pdf_subprocess)
-585
View File
@@ -1,585 +0,0 @@
"""
Graph/Tree Overview Widget - Displays real agent graph structure.
Supports rendering loops (back-edges) via right-side return channels:
arrows drawn on the right margin that visually point back up to earlier nodes.
"""
from __future__ import annotations
import re
import time
from textual.app import ComposeResult
from textual.containers import Vertical
from framework.runtime.agent_runtime import AgentRuntime
from framework.runtime.event_bus import EventType
from framework.tui.widgets.selectable_rich_log import SelectableRichLog as RichLog
# Width of each return-channel column (padding + │ + gap)
_CHANNEL_WIDTH = 5
# Regex to strip Rich markup tags for measuring visible width
_MARKUP_RE = re.compile(r"\[/?[^\]]*\]")
def _plain_len(s: str) -> int:
"""Return the visible character length of a Rich-markup string."""
return len(_MARKUP_RE.sub("", s))
class GraphOverview(Vertical):
"""Widget to display Agent execution graph/tree with real data."""
DEFAULT_CSS = """
GraphOverview {
width: 100%;
height: 100%;
background: $panel;
}
GraphOverview > RichLog {
width: 100%;
height: 100%;
background: $panel;
border: none;
scrollbar-background: $surface;
scrollbar-color: $primary;
}
"""
def __init__(self, runtime: AgentRuntime):
super().__init__()
self.runtime = runtime
self._override_graph = None # Set by switch_graph() for secondary graphs
self.active_node: str | None = None
self.execution_path: list[str] = []
# Per-node status strings shown next to the node in the graph display.
# e.g. {"planner": "thinking...", "searcher": "web_search..."}
self._node_status: dict[str, str] = {}
@property
def _graph(self):
"""The graph currently being displayed (may be a secondary graph)."""
return self._override_graph or self.runtime.graph
def switch_graph(self, graph) -> None:
"""Switch to displaying a different graph and refresh."""
self._override_graph = graph
self.active_node = None
self.execution_path = []
self._node_status = {}
self._display_graph()
def compose(self) -> ComposeResult:
# Use RichLog for formatted output
yield RichLog(id="graph-display", highlight=True, markup=True)
def on_mount(self) -> None:
"""Display initial graph structure."""
self._display_graph()
# Refresh every 1s so timer countdowns stay current
if self.runtime._timer_next_fire is not None:
self.set_interval(1.0, self._display_graph)
# ------------------------------------------------------------------
# Graph analysis helpers
# ------------------------------------------------------------------
def _topo_order(self) -> list[str]:
"""BFS from entry_node following edges."""
graph = self._graph
visited: list[str] = []
seen: set[str] = set()
queue = [graph.entry_node]
while queue:
nid = queue.pop(0)
if nid in seen:
continue
seen.add(nid)
visited.append(nid)
for edge in graph.get_outgoing_edges(nid):
if edge.target not in seen:
queue.append(edge.target)
# Append orphan nodes not reachable from entry
for node in graph.nodes:
if node.id not in seen:
visited.append(node.id)
return visited
def _detect_back_edges(self, ordered: list[str]) -> list[dict]:
"""Find edges where target appears before (or equal to) source in topo order.
Returns a list of dicts with keys: edge, source, target, source_idx, target_idx.
"""
order_idx = {nid: i for i, nid in enumerate(ordered)}
back_edges: list[dict] = []
for node_id in ordered:
for edge in self._graph.get_outgoing_edges(node_id):
target_idx = order_idx.get(edge.target, -1)
source_idx = order_idx.get(node_id, -1)
if target_idx != -1 and target_idx <= source_idx:
back_edges.append(
{
"edge": edge,
"source": node_id,
"target": edge.target,
"source_idx": source_idx,
"target_idx": target_idx,
}
)
return back_edges
def _is_back_edge(self, source: str, target: str, order_idx: dict[str, int]) -> bool:
"""Check whether an edge from *source* to *target* is a back-edge."""
si = order_idx.get(source, -1)
ti = order_idx.get(target, -1)
return ti != -1 and ti <= si
# ------------------------------------------------------------------
# Line rendering (Pass 1)
# ------------------------------------------------------------------
def _render_node_line(self, node_id: str) -> str:
"""Render a single node with status symbol and optional status text."""
graph = self._graph
is_terminal = node_id in (graph.terminal_nodes or [])
is_active = node_id == self.active_node
is_done = node_id in self.execution_path and not is_active
status = self._node_status.get(node_id, "")
if is_active:
sym = "[bold green]●[/bold green]"
elif is_done:
sym = "[dim]✓[/dim]"
elif is_terminal:
sym = "[yellow]■[/yellow]"
else:
sym = ""
if is_active:
name = f"[bold green]{node_id}[/bold green]"
elif is_done:
name = f"[dim]{node_id}[/dim]"
else:
name = node_id
suffix = f" [italic]{status}[/italic]" if status else ""
return f" {sym} {name}{suffix}"
def _render_edges(self, node_id: str, order_idx: dict[str, int]) -> list[str]:
"""Render forward-edge connectors from *node_id*.
Back-edges are excluded here they are drawn by the return-channel
overlay in Pass 2.
"""
all_edges = self._graph.get_outgoing_edges(node_id)
if not all_edges:
return []
# Split into forward and back
forward = [e for e in all_edges if not self._is_back_edge(node_id, e.target, order_idx)]
if not forward:
# All edges are back-edges — nothing to render here
return []
if len(forward) == 1:
return ["", ""]
# Fan-out: show branches
lines: list[str] = []
for i, edge in enumerate(forward):
connector = "" if i == len(forward) - 1 else ""
cond = ""
if edge.condition.value not in ("always", "on_success"):
cond = f" [dim]({edge.condition.value})[/dim]"
lines.append(f" {connector}──▶ {edge.target}{cond}")
return lines
# ------------------------------------------------------------------
# Return-channel overlay (Pass 2)
# ------------------------------------------------------------------
def _overlay_return_channels(
self,
lines: list[str],
node_line_map: dict[str, int],
back_edges: list[dict],
available_width: int,
) -> list[str]:
"""Overlay right-side return channels onto the line buffer.
Each back-edge gets a vertical channel on the right margin. Channels
are allocated left-to-right by increasing span length so that shorter
(inner) loops are closer to the graph body and longer (outer) loops are
further right.
If the terminal is too narrow to fit even one channel, we fall back to
simple inline ```` annotations instead.
"""
if not back_edges:
return lines
num_channels = len(back_edges)
# Sort by span length ascending → inner loops get nearest channel
sorted_be = sorted(back_edges, key=lambda b: b["source_idx"] - b["target_idx"])
# --- Insert dedicated connector lines for back-edge sources ---
# Each back-edge source gets a blank line inserted after its node
# section (after any forward-edge lines). We process insertions in
# reverse order so that earlier indices remain valid.
all_node_lines_set = set(node_line_map.values())
insertions: list[tuple[int, int]] = [] # (insert_after_line, be_index)
for be_idx, be in enumerate(sorted_be):
source_node_line = node_line_map.get(be["source"])
if source_node_line is None:
continue
# Walk forward to find the last line in this node's section
last_section_line = source_node_line
for li in range(source_node_line + 1, len(lines)):
if li in all_node_lines_set:
break
last_section_line = li
insertions.append((last_section_line, be_idx))
source_line_for_be: dict[int, int] = {}
for insert_after, be_idx in sorted(insertions, reverse=True):
insert_at = insert_after + 1
lines.insert(insert_at, "") # placeholder for connector
source_line_for_be[be_idx] = insert_at
# Shift node_line_map entries that come after the insertion point
for nid in node_line_map:
if node_line_map[nid] > insert_after:
node_line_map[nid] += 1
# Also shift already-assigned source lines
for prev_idx in source_line_for_be:
if prev_idx != be_idx and source_line_for_be[prev_idx] > insert_after:
source_line_for_be[prev_idx] += 1
# Recompute max content width after insertions
max_content_w = max(_plain_len(ln) for ln in lines) if lines else 0
# Check if we have room for channels
channels_total_w = num_channels * _CHANNEL_WIDTH
if max_content_w + channels_total_w + 2 > available_width:
return self._inline_back_edge_fallback(lines, node_line_map, back_edges)
content_pad = max_content_w + 3 # gap between content and first channel
# Build channel info with final line positions
channel_info: list[dict] = []
for ch_idx, be in enumerate(sorted_be):
target_line = node_line_map.get(be["target"])
source_line = source_line_for_be.get(ch_idx)
if target_line is None or source_line is None:
continue
col = content_pad + ch_idx * _CHANNEL_WIDTH
channel_info.append(
{
"target_line": target_line,
"source_line": source_line,
"col": col,
}
)
if not channel_info:
return lines
# Build overlay grid — one row per line, columns for channel area
total_width = content_pad + num_channels * _CHANNEL_WIDTH + 1
overlay_width = total_width - max_content_w
overlays: list[list[str]] = [[" "] * overlay_width for _ in range(len(lines))]
for ci in channel_info:
tl = ci["target_line"]
sl = ci["source_line"]
col_offset = ci["col"] - max_content_w
if col_offset < 0 or col_offset >= overlay_width:
continue
# Target line: ◄──...──┐
if 0 <= tl < len(overlays):
for c in range(col_offset):
if overlays[tl][c] == " ":
overlays[tl][c] = ""
overlays[tl][col_offset] = ""
# Source line: ──...──┘
if 0 <= sl < len(overlays):
for c in range(col_offset):
if overlays[sl][c] == " ":
overlays[sl][c] = ""
overlays[sl][col_offset] = ""
# Vertical lines between target+1 and source-1
for li in range(tl + 1, sl):
if 0 <= li < len(overlays) and overlays[li][col_offset] == " ":
overlays[li][col_offset] = ""
# Merge overlays into the line strings
result: list[str] = []
for i, line in enumerate(lines):
pw = _plain_len(line)
pad = max_content_w - pw
overlay_chars = overlays[i] if i < len(overlays) else []
overlay_str = "".join(overlay_chars)
overlay_trimmed = overlay_str.rstrip()
if overlay_trimmed:
is_target_line = any(ci["target_line"] == i for ci in channel_info)
if is_target_line:
overlay_trimmed = "" + overlay_trimmed[1:]
is_source_line = any(ci["source_line"] == i for ci in channel_info)
if is_source_line and not line.strip():
# Inserted blank line → build └───┘ connector.
# " └" = 3 chars of content prefix, so remaining pad = max_content_w - 3
remaining_pad = max_content_w - 3
full = list(" " * remaining_pad + overlay_trimmed)
# Find the ┘ corner for this source connector
corner_pos = -1
for ci_s in channel_info:
if ci_s["source_line"] == i:
corner_pos = remaining_pad + (ci_s["col"] - max_content_w)
break
# Fill everything up to the corner with ─
if corner_pos >= 0:
for c in range(corner_pos):
if full[c] not in ("", "", ""):
full[c] = ""
connector = "" + "".join(full).rstrip()
result.append(f"[dim]{connector}[/dim]")
continue
colored_overlay = f"[dim]{' ' * pad}{overlay_trimmed}[/dim]"
result.append(f"{line}{colored_overlay}")
else:
result.append(line)
return result
def _inline_back_edge_fallback(
self,
lines: list[str],
node_line_map: dict[str, int],
back_edges: list[dict],
) -> list[str]:
"""Fallback: add inline ↺ annotations when terminal is too narrow for channels."""
# Group back-edges by source node
source_to_be: dict[str, list[dict]] = {}
for be in back_edges:
source_to_be.setdefault(be["source"], []).append(be)
result = list(lines)
# Insert annotation lines after each source node's section
offset = 0
all_node_lines = sorted(node_line_map.values())
for source, bes in source_to_be.items():
source_line = node_line_map.get(source)
if source_line is None:
continue
# Find end of source node section
end_line = source_line
for nl in all_node_lines:
if nl > source_line:
end_line = nl - 1
break
else:
end_line = len(lines) - 1
# Insert after last content line of this node's section
insert_at = end_line + offset + 1
for be in bes:
cond = ""
edge = be["edge"]
if edge.condition.value not in ("always", "on_success"):
cond = f" [dim]({edge.condition.value})[/dim]"
annotation = f" [yellow]↺[/yellow] {be['target']}{cond}"
result.insert(insert_at, annotation)
insert_at += 1
offset += 1
return result
# ------------------------------------------------------------------
# Main display
# ------------------------------------------------------------------
def _display_graph(self) -> None:
"""Display the graph as an ASCII DAG with edge connectors and loop channels."""
display = self.query_one("#graph-display", RichLog)
display.clear()
graph = self._graph
display.write(f"[bold cyan]Agent Graph:[/bold cyan] {graph.id}\n")
ordered = self._topo_order()
order_idx = {nid: i for i, nid in enumerate(ordered)}
# --- Pass 1: Build line buffer ---
lines: list[str] = []
node_line_map: dict[str, int] = {}
for node_id in ordered:
node_line_map[node_id] = len(lines)
lines.append(self._render_node_line(node_id))
for edge_line in self._render_edges(node_id, order_idx):
lines.append(edge_line)
# --- Pass 2: Overlay return channels for back-edges ---
back_edges = self._detect_back_edges(ordered)
if back_edges:
# Try to get actual widget width; default to a reasonable value
try:
available_width = self.size.width or 60
except Exception:
available_width = 60
lines = self._overlay_return_channels(lines, node_line_map, back_edges, available_width)
# Write all lines
for line in lines:
display.write(line)
# Execution path footer
if self.execution_path:
display.write("")
display.write(f"[dim]Path:[/dim] {''.join(self.execution_path[-5:])}")
# Event sources section
self._render_event_sources(display)
# ------------------------------------------------------------------
# Event sources display
# ------------------------------------------------------------------
def _render_event_sources(self, display: RichLog) -> None:
"""Render event source info (webhooks, timers) below the graph."""
entry_points = self.runtime.get_entry_points()
# Filter to non-manual entry points (webhooks, timers, events)
event_sources = [ep for ep in entry_points if ep.trigger_type not in ("manual",)]
if not event_sources:
return
display.write("")
display.write("[bold cyan]Event Sources[/bold cyan]")
config = self.runtime._config
for ep in event_sources:
if ep.trigger_type == "timer":
cron_expr = ep.trigger_config.get("cron")
interval = ep.trigger_config.get("interval_minutes", "?")
schedule_label = f"cron: {cron_expr}" if cron_expr else f"every {interval} min"
display.write(f" [green]⏱[/green] {ep.name} [dim]→ {ep.entry_node}[/dim]")
# Show schedule + next fire countdown
next_fire = self.runtime._timer_next_fire.get(ep.id)
if next_fire is not None:
remaining = max(0, next_fire - time.monotonic())
hours, rem = divmod(int(remaining), 3600)
mins, secs = divmod(rem, 60)
if hours > 0:
countdown = f"{hours}h {mins:02d}m {secs:02d}s"
else:
countdown = f"{mins}m {secs:02d}s"
display.write(f" [dim]{schedule_label} — next in {countdown}[/dim]")
else:
display.write(f" [dim]{schedule_label}[/dim]")
elif ep.trigger_type in ("event", "webhook"):
display.write(f" [yellow]⚡[/yellow] {ep.name} [dim]→ {ep.entry_node}[/dim]")
# Show webhook endpoint if configured
route = None
for r in config.webhook_routes:
src = r.get("source_id", "")
if src and src in ep.id:
route = r
break
if not route and config.webhook_routes:
# Fall back to first route
route = config.webhook_routes[0]
if route:
host = config.webhook_host
port = config.webhook_port
path = route.get("path", "/webhook")
display.write(f" [dim]{host}:{port}{path}[/dim]")
else:
event_types = ep.trigger_config.get("event_types", [])
if event_types:
display.write(f" [dim]events: {', '.join(event_types)}[/dim]")
# ------------------------------------------------------------------
# Public API (called by app.py)
# ------------------------------------------------------------------
def update_active_node(self, node_id: str) -> None:
"""Update the currently active node."""
self.active_node = node_id
if node_id not in self.execution_path:
self.execution_path.append(node_id)
self._display_graph()
def update_execution(self, event) -> None:
"""Update the displayed node status based on execution lifecycle events."""
if event.type == EventType.EXECUTION_STARTED:
self._node_status.clear()
self.execution_path.clear()
entry_node = event.data.get("entry_node") or (
self._graph.entry_node if self.runtime else None
)
if entry_node:
self.update_active_node(entry_node)
elif event.type == EventType.EXECUTION_COMPLETED:
self.active_node = None
self._node_status.clear()
self._display_graph()
elif event.type == EventType.EXECUTION_FAILED:
error = event.data.get("error", "Unknown error")
if self.active_node:
self._node_status[self.active_node] = f"[red]FAILED: {error}[/red]"
self.active_node = None
self._display_graph()
# -- Event handlers called by app.py _handle_event --
def handle_node_loop_started(self, node_id: str) -> None:
"""A node's event loop has started."""
self._node_status[node_id] = "thinking..."
self.update_active_node(node_id)
def handle_node_loop_iteration(self, node_id: str, iteration: int) -> None:
"""A node advanced to a new loop iteration."""
self._node_status[node_id] = f"step {iteration}"
self._display_graph()
def handle_node_loop_completed(self, node_id: str) -> None:
"""A node's event loop completed."""
self._node_status.pop(node_id, None)
if self.active_node == node_id:
self.active_node = None
self._display_graph()
def handle_tool_call(self, node_id: str, tool_name: str, *, started: bool) -> None:
"""Show tool activity next to the active node."""
if started:
self._node_status[node_id] = f"{tool_name}..."
else:
# Restore to generic thinking status after tool completes
self._node_status[node_id] = "thinking..."
self._display_graph()
def handle_stalled(self, node_id: str, reason: str) -> None:
"""Highlight a stalled node."""
self._node_status[node_id] = f"[red]stalled: {reason}[/red]"
self._display_graph()
def handle_edge_traversed(self, source_node: str, target_node: str) -> None:
"""Highlight an edge being traversed."""
self._node_status[source_node] = f"[dim]→ {target_node}[/dim]"
self._display_graph()
-172
View File
@@ -1,172 +0,0 @@
"""
Log formatting utilities and LogPane widget.
The module-level functions (format_event, extract_event_text, format_python_log)
can be used by any widget that needs to render log lines without instantiating LogPane.
"""
import logging
from datetime import datetime
from textual.app import ComposeResult
from textual.containers import Container
from framework.runtime.event_bus import AgentEvent, EventType
from framework.tui.widgets.selectable_rich_log import SelectableRichLog as RichLog
# --- Module-level formatting constants ---
EVENT_FORMAT: dict[EventType, tuple[str, str]] = {
EventType.EXECUTION_STARTED: (">>", "bold cyan"),
EventType.EXECUTION_COMPLETED: ("<<", "bold green"),
EventType.EXECUTION_FAILED: ("!!", "bold red"),
EventType.TOOL_CALL_STARTED: ("->", "yellow"),
EventType.TOOL_CALL_COMPLETED: ("<-", "green"),
EventType.NODE_LOOP_STARTED: ("@@", "cyan"),
EventType.NODE_LOOP_ITERATION: ("..", "dim"),
EventType.NODE_LOOP_COMPLETED: ("@@", "dim"),
EventType.LLM_TURN_COMPLETE: ("", "green"),
EventType.NODE_STALLED: ("!!", "bold yellow"),
EventType.NODE_INPUT_BLOCKED: ("!!", "yellow"),
EventType.GOAL_PROGRESS: ("%%", "blue"),
EventType.GOAL_ACHIEVED: ("**", "bold green"),
EventType.CONSTRAINT_VIOLATION: ("!!", "bold red"),
EventType.STATE_CHANGED: ("~~", "dim"),
EventType.CLIENT_INPUT_REQUESTED: ("??", "magenta"),
}
LOG_LEVEL_COLORS: dict[int, str] = {
logging.DEBUG: "dim",
logging.INFO: "",
logging.WARNING: "yellow",
logging.ERROR: "red",
logging.CRITICAL: "bold red",
}
# --- Module-level formatting functions ---
def extract_event_text(event: AgentEvent) -> str:
"""Extract human-readable text from an event's data dict."""
et = event.type
data = event.data
if et == EventType.EXECUTION_STARTED:
return "Execution started"
elif et == EventType.EXECUTION_COMPLETED:
return "Execution completed"
elif et == EventType.EXECUTION_FAILED:
return f"Execution FAILED: {data.get('error', 'unknown')}"
elif et == EventType.TOOL_CALL_STARTED:
return f"Tool call: {data.get('tool_name', 'unknown')}"
elif et == EventType.TOOL_CALL_COMPLETED:
name = data.get("tool_name", "unknown")
if data.get("is_error"):
preview = str(data.get("result", ""))[:80]
return f"Tool error: {name} - {preview}"
return f"Tool done: {name}"
elif et == EventType.NODE_LOOP_STARTED:
return f"Node started: {event.node_id or 'unknown'}"
elif et == EventType.NODE_LOOP_ITERATION:
return f"{event.node_id or 'unknown'} iteration {data.get('iteration', '?')}"
elif et == EventType.NODE_LOOP_COMPLETED:
return f"Node done: {event.node_id or 'unknown'}"
elif et == EventType.NODE_STALLED:
reason = data.get("reason", "")
node = event.node_id or "unknown"
return f"Node stalled: {node} - {reason}" if reason else f"Node stalled: {node}"
elif et == EventType.NODE_INPUT_BLOCKED:
return f"Node input blocked: {event.node_id or 'unknown'}"
elif et == EventType.GOAL_PROGRESS:
return f"Goal progress: {data.get('progress', '?')}"
elif et == EventType.GOAL_ACHIEVED:
return "Goal achieved"
elif et == EventType.CONSTRAINT_VIOLATION:
return f"Constraint violated: {data.get('description', 'unknown')}"
elif et == EventType.STATE_CHANGED:
return f"State changed: {data.get('key', 'unknown')}"
elif et == EventType.CLIENT_INPUT_REQUESTED:
return "Waiting for user input"
elif et == EventType.LLM_TURN_COMPLETE:
stop = data.get("stop_reason", "?")
model = data.get("model", "?")
inp = data.get("input_tokens", 0)
out = data.get("output_tokens", 0)
return f"{model}{stop} ({inp}+{out} tokens)"
else:
return f"{et.value}: {data}"
def format_event(event: AgentEvent) -> str:
"""Format an AgentEvent as a Rich markup string with timestamp + symbol."""
ts = event.timestamp.strftime("%H:%M:%S")
symbol, color = EVENT_FORMAT.get(event.type, ("--", "dim"))
text = extract_event_text(event)
return f"[dim]{ts}[/dim] [{color}]{symbol} {text}[/{color}]"
def format_python_log(record: logging.LogRecord) -> str:
"""Format a Python log record as a Rich markup string with timestamp and severity color."""
ts = datetime.fromtimestamp(record.created).strftime("%H:%M:%S")
color = LOG_LEVEL_COLORS.get(record.levelno, "")
msg = record.getMessage()
if color:
return f"[dim]{ts}[/dim] [{color}]{record.levelname}[/{color}] {msg}"
else:
return f"[dim]{ts}[/dim] {record.levelname} {msg}"
# --- LogPane widget (kept for backward compatibility) ---
class LogPane(Container):
"""Widget to display logs with reliable rendering."""
DEFAULT_CSS = """
LogPane {
width: 100%;
height: 100%;
}
LogPane > RichLog {
width: 100%;
height: 100%;
background: $surface;
border: none;
scrollbar-background: $panel;
scrollbar-color: $primary;
}
"""
def compose(self) -> ComposeResult:
yield RichLog(id="main-log", highlight=True, markup=True, auto_scroll=False)
def write_event(self, event: AgentEvent) -> None:
"""Format an AgentEvent with timestamp + symbol and write to the log."""
self.write_log(format_event(event))
def write_python_log(self, record: logging.LogRecord) -> None:
"""Format a Python log record with timestamp and severity color."""
self.write_log(format_python_log(record))
def write_log(self, message: str) -> None:
"""Write a log message to the log pane."""
try:
if not self.is_mounted:
return
log = self.query_one("#main-log", RichLog)
if not log.is_mounted:
return
was_at_bottom = log.is_vertical_scroll_end
log.write(message)
if was_at_bottom:
log.scroll_end(animate=False)
except Exception:
pass
@@ -1,229 +0,0 @@
"""
SelectableRichLog - RichLog with mouse-driven text selection and clipboard copy.
Drop-in replacement for RichLog. Click-and-drag to select text, which is
visually highlighted. Press Ctrl+C to copy selection to clipboard (handled
by app.py). Press Escape or single-click to clear selection.
"""
from __future__ import annotations
import subprocess
import sys
from rich.segment import Segment as RichSegment
from rich.style import Style
from textual.geometry import Offset
from textual.selection import Selection
from textual.strip import Strip
from textual.widgets import RichLog
# Highlight style for selected text
_HIGHLIGHT_STYLE = Style(bgcolor="blue", color="white")
class SelectableRichLog(RichLog):
"""RichLog with mouse-driven text selection."""
DEFAULT_CSS = """
SelectableRichLog {
pointer: text;
}
"""
def __init__(self, **kwargs) -> None:
super().__init__(**kwargs)
self._sel_anchor: Offset | None = None
self._sel_end: Offset | None = None
self._selecting: bool = False
# -- Internal helpers --
def _apply_highlight(self, strip: Strip) -> Strip:
"""Apply highlight with correct precedence (highlight wins over base style)."""
segments = []
for text, style, control in strip._segments:
if control:
segments.append(RichSegment(text, style, control))
else:
new_style = (style + _HIGHLIGHT_STYLE) if style else _HIGHLIGHT_STYLE
segments.append(RichSegment(text, new_style, control))
return Strip(segments, strip.cell_length)
# -- Selection helpers --
@property
def selection(self) -> Selection | None:
"""Build a Selection from current anchor/end, or None if no selection."""
if self._sel_anchor is None or self._sel_end is None:
return None
if self._sel_anchor == self._sel_end:
return None
return Selection.from_offsets(self._sel_anchor, self._sel_end)
def _mouse_to_content(self, event_x: int, event_y: int) -> Offset:
"""Convert viewport mouse coords to content (line, col) coords."""
scroll_x, scroll_y = self.scroll_offset
return Offset(scroll_x + event_x, scroll_y + event_y)
def clear_selection(self) -> None:
"""Clear any active selection."""
had_selection = self._sel_anchor is not None
self._sel_anchor = None
self._sel_end = None
self._selecting = False
if had_selection:
self.refresh()
# -- Mouse handlers (left button only) --
def on_mouse_down(self, event) -> None:
"""Start selection on left mouse button."""
if event.button != 1:
return
self._sel_anchor = self._mouse_to_content(event.x, event.y)
self._sel_end = self._sel_anchor
self._selecting = True
self.capture_mouse()
self.refresh()
def on_mouse_move(self, event) -> None:
"""Extend selection while dragging."""
if not self._selecting:
return
self._sel_end = self._mouse_to_content(event.x, event.y)
self.refresh()
def on_mouse_up(self, event) -> None:
"""End selection on mouse release."""
if not self._selecting:
return
self._selecting = False
self.release_mouse()
# Single-click (no drag) clears selection
if self._sel_anchor == self._sel_end:
self.clear_selection()
# -- Keyboard handlers --
def on_key(self, event) -> None:
"""Clear selection on Escape."""
if event.key == "escape":
self.clear_selection()
# -- Rendering with highlight --
def render_line(self, y: int) -> Strip:
"""Override to apply selection highlight on top of the base strip."""
strip = super().render_line(y)
sel = self.selection
if sel is None:
return strip
# Determine which content line this viewport row corresponds to
_, scroll_y = self.scroll_offset
content_y = scroll_y + y
span = sel.get_span(content_y)
if span is None:
return strip
start_x, end_x = span
cell_len = strip.cell_length
if cell_len == 0:
return strip
scroll_x, _ = self.scroll_offset
# -1 means "to end of content line" — use viewport end
if end_x == -1:
end_x = cell_len
else:
# Convert content-space x to viewport-space x
end_x = end_x - scroll_x
# Convert content-space x to viewport-space x
start_x = start_x - scroll_x
# Clamp to viewport strip bounds
start_x = max(0, start_x)
end_x = min(end_x, cell_len)
if start_x >= end_x:
return strip
# Divide strip into [before, selected, after] and highlight the middle
parts = strip.divide([start_x, end_x])
if len(parts) < 2:
return strip
highlighted_parts: list[Strip] = []
for i, part in enumerate(parts):
if i == 1:
highlighted_parts.append(self._apply_highlight(part))
else:
highlighted_parts.append(part)
return Strip.join(highlighted_parts)
# -- Text extraction & clipboard --
def get_selected_text(self) -> str | None:
"""Extract the plain text of the current selection, or None."""
sel = self.selection
if sel is None:
return None
# Build full text from all lines
all_text = "\n".join(strip.text for strip in self.lines)
try:
extracted = sel.extract(all_text)
except (IndexError, ValueError):
# Selection coordinates can exceed line count when the virtual
# canvas is larger than the actual content (e.g. after scroll).
return None
return extracted if extracted else None
def copy_selection(self) -> str | None:
"""Copy selected text to system clipboard. Returns text or None."""
text = self.get_selected_text()
if not text:
return None
_copy_to_clipboard(text)
return text
def _copy_to_clipboard(text: str) -> None:
"""Copy text to system clipboard using platform-native tools."""
try:
if sys.platform == "darwin":
subprocess.run(["pbcopy"], encoding="utf-8", input=text.encode(), check=True, timeout=5)
elif sys.platform == "win32":
subprocess.run(
["clip.exe"],
encoding="utf-8",
input=text.encode("utf-16le"),
check=True,
timeout=5,
)
elif sys.platform.startswith("linux"):
try:
subprocess.run(
["xclip", "-selection", "clipboard"],
encoding="utf-8",
input=text.encode(),
check=True,
timeout=5,
)
except (subprocess.SubprocessError, FileNotFoundError):
subprocess.run(
["xsel", "--clipboard", "--input"],
encoding="utf-8",
input=text.encode(),
check=True,
timeout=5,
)
except (subprocess.SubprocessError, FileNotFoundError):
pass
+2 -2
View File
@@ -12,8 +12,8 @@ export interface LiveSession {
loaded_at: number;
uptime_seconds: number;
intro_message?: string;
/** Queen operating phase — "building", "staging", or "running" */
queen_phase?: "building" | "staging" | "running";
/** Queen operating phase — "planning", "building", "staging", or "running" */
queen_phase?: "planning" | "building" | "staging" | "running";
/** Present in 409 conflict responses when worker is still loading */
loading?: boolean;
}
+2 -2
View File
@@ -31,7 +31,7 @@ interface AgentGraphProps {
version?: string;
runState?: RunState;
building?: boolean;
queenPhase?: "building" | "staging" | "running";
queenPhase?: "planning" | "building" | "staging" | "running";
}
// --- Extracted RunButton so hover state survives parent re-renders ---
@@ -278,7 +278,7 @@ export default function AgentGraph({ nodes, title: _title, onNodeClick, onRun, o
</span>
)}
</div>
<RunButton runState={runState} disabled={nodes.length === 0 || queenPhase === "building"} onRun={handleRun} onPause={onPause ?? (() => {})} btnRef={runBtnRef} />
<RunButton runState={runState} disabled={nodes.length === 0 || queenPhase === "building" || queenPhase === "planning"} onRun={handleRun} onPause={onPause ?? (() => {})} btnRef={runBtnRef} />
</div>
<div className="flex-1 flex items-center justify-center px-5">
{building ? (
+5 -3
View File
@@ -39,7 +39,7 @@ interface ChatPanelProps {
/** Called when user dismisses the pending question without answering */
onQuestionDismiss?: () => void;
/** Queen operating phase — shown as a tag on queen messages */
queenPhase?: "building" | "staging" | "running";
queenPhase?: "planning" | "building" | "staging" | "running";
}
const queenColor = "hsl(45,95%,58%)";
@@ -144,7 +144,7 @@ function ToolActivityRow({ content }: { content: string }) {
);
}
const MessageBubble = memo(function MessageBubble({ msg, queenPhase }: { msg: ChatMessage; queenPhase?: "building" | "staging" | "running" }) {
const MessageBubble = memo(function MessageBubble({ msg, queenPhase }: { msg: ChatMessage; queenPhase?: "planning" | "building" | "staging" | "running" }) {
const isUser = msg.type === "user";
const isQueen = msg.role === "queen";
const color = getColor(msg.agent, msg.role);
@@ -204,7 +204,9 @@ const MessageBubble = memo(function MessageBubble({ msg, queenPhase }: { msg: Ch
? "running phase"
: queenPhase === "staging"
? "staging phase"
: "building phase"
: queenPhase === "planning"
? "planning phase"
: "building phase"
: "Worker"}
</span>
</div>
+2 -1
View File
@@ -121,7 +121,8 @@ export function sseEventToChatMessage(
id: `paused-${event.execution_id}`,
agent: "System",
agentColor: "",
content: "Execution paused by user",
content:
(event.data?.reason as string) || "Execution paused",
timestamp: "",
type: "system",
thread,
+9 -6
View File
@@ -255,8 +255,8 @@ interface AgentBackendState {
/** The message ID of the current worker input request (for inline reply box) */
workerInputMessageId: string | null;
queenBuilding: boolean;
/** Queen operating phase — "building" (coding), "staging" (loaded), or "running" (executing) */
queenPhase: "building" | "staging" | "running";
/** Queen operating phase — "planning" (design), "building" (coding), "staging" (loaded), or "running" (executing) */
queenPhase: "planning" | "building" | "staging" | "running";
workerRunState: "idle" | "deploying" | "running";
currentExecutionId: string | null;
nodeLogs: Record<string, string[]>;
@@ -291,7 +291,7 @@ function defaultAgentState(): AgentBackendState {
awaitingInput: false,
workerInputMessageId: null,
queenBuilding: false,
queenPhase: "building",
queenPhase: "planning",
workerRunState: "idle",
currentExecutionId: null,
nodeLogs: {},
@@ -892,7 +892,7 @@ export default function Workspace() {
// failed, the throw inside the catch exits the outer try block.
const session = liveSession!;
const displayName = formatAgentDisplayName(session.worker_name || agentType);
const initialPhase = session.queen_phase || (session.has_worker ? "staging" : "building");
const initialPhase = session.queen_phase || (session.has_worker ? "staging" : "planning");
updateAgentState(agentType, {
sessionId: session.session_id,
displayName,
@@ -1788,8 +1788,11 @@ export default function Workspace() {
case "queen_phase_changed": {
const rawPhase = event.data?.phase as string;
const newPhase: "building" | "staging" | "running" =
rawPhase === "running" ? "running" : rawPhase === "staging" ? "staging" : "building";
const newPhase: "planning" | "building" | "staging" | "running" =
rawPhase === "running" ? "running"
: rawPhase === "staging" ? "staging"
: rawPhase === "planning" ? "planning"
: "building";
updateAgentState(agentType, {
queenPhase: newPhase,
queenBuilding: newPhase === "building",
-2
View File
@@ -11,12 +11,10 @@ dependencies = [
"litellm>=1.81.0",
"mcp>=1.0.0",
"fastmcp>=2.0.0",
"textual>=1.0.0",
"tools",
]
[project.optional-dependencies]
tui = ["textual>=0.75.0"]
webhook = ["aiohttp>=3.9.0"]
server = ["aiohttp>=3.9.0"]
testing = [
-90
View File
@@ -1,90 +0,0 @@
"""Tests for ChatTextArea key handling (Enter submits, Shift+Enter / Ctrl+J insert newlines)."""
import pytest
from textual.app import App, ComposeResult
from framework.tui.widgets.chat_repl import ChatTextArea
class ChatTextAreaApp(App):
"""Minimal app that mounts a ChatTextArea for testing."""
submitted_texts: list[str]
def compose(self) -> ComposeResult:
yield ChatTextArea(id="input")
def on_mount(self) -> None:
self.submitted_texts = []
def on_chat_text_area_submitted(self, message: ChatTextArea.Submitted) -> None:
self.submitted_texts.append(message.text)
@pytest.fixture
def app():
return ChatTextAreaApp()
@pytest.mark.asyncio
async def test_enter_submits_text(app):
"""Pressing Enter should post a Submitted message and clear the widget."""
async with app.run_test() as pilot:
await pilot.press("h", "e", "l", "l", "o")
await pilot.press("enter")
assert app.submitted_texts == ["hello"]
@pytest.mark.asyncio
async def test_enter_on_empty_does_not_submit(app):
"""Pressing Enter with no text should not post a Submitted message."""
async with app.run_test() as pilot:
await pilot.press("enter")
assert app.submitted_texts == []
@pytest.mark.asyncio
async def test_shift_enter_inserts_newline(app):
"""Shift+Enter should insert a newline, not submit."""
async with app.run_test() as pilot:
widget = app.query_one("#input", ChatTextArea)
await pilot.press("a")
await pilot.press("shift+enter")
await pilot.press("b")
assert app.submitted_texts == []
assert "\n" in widget.text
assert widget.text.startswith("a")
assert widget.text.endswith("b")
@pytest.mark.asyncio
async def test_ctrl_j_inserts_newline(app):
"""Ctrl+J should insert a newline (fallback for terminals without Shift+Enter)."""
async with app.run_test() as pilot:
widget = app.query_one("#input", ChatTextArea)
await pilot.press("a")
await pilot.press("ctrl+j")
await pilot.press("b")
assert app.submitted_texts == []
assert "\n" in widget.text
assert widget.text.startswith("a")
assert widget.text.endswith("b")
@pytest.mark.asyncio
async def test_multiline_submit(app):
"""Typing multiline text via Ctrl+J then pressing Enter should submit all lines."""
async with app.run_test() as pilot:
await pilot.press("a")
await pilot.press("ctrl+j")
await pilot.press("b")
await pilot.press("enter")
assert len(app.submitted_texts) == 1
assert app.submitted_texts[0] == "a\nb"
+31 -8
View File
@@ -763,7 +763,7 @@ class TestClientFacingBlocking:
class TestEscalate:
@pytest.mark.asyncio
async def test_escalate_emits_event(self, runtime, node_spec, memory):
"""escalate() should publish ESCALATION_REQUESTED."""
"""escalate() should publish ESCALATION_REQUESTED and block for queen guidance."""
node_spec.output_keys = []
llm = MockStreamingLLM(
scenarios=[
@@ -772,7 +772,6 @@ class TestEscalate:
{
"reason": "tool failure",
"context": "HTTP 401 from upstream",
"wait_for_response": False,
},
tool_use_id="escalate_1",
),
@@ -789,7 +788,20 @@ class TestEscalate:
ctx = build_ctx(runtime, node_spec, memory, llm, stream_id="worker")
node = EventLoopNode(event_bus=bus, config=LoopConfig(max_iterations=5))
async def queen_reply():
await asyncio.sleep(0.05)
await node.inject_event("Acknowledged, proceed.")
task = asyncio.create_task(queen_reply())
async def queen_reply():
await asyncio.sleep(0.05)
await node.inject_event("Acknowledged, proceed.")
task = asyncio.create_task(queen_reply())
result = await node.execute(ctx)
await task
assert result.success is True
assert len(received) == 1
@@ -808,7 +820,6 @@ class TestEscalate:
{
"reason": "blocked",
"context": "dependency missing",
"wait_for_response": False,
},
tool_use_id="escalate_1",
),
@@ -827,7 +838,14 @@ class TestEscalate:
ctx = build_ctx(runtime, node_spec, memory, llm, stream_id="worker")
node = EventLoopNode(event_bus=bus, config=LoopConfig(max_iterations=5))
async def queen_reply():
await asyncio.sleep(0.05)
await node.inject_event("Queen acknowledges escalation.")
task = asyncio.create_task(queen_reply())
result = await node.execute(ctx)
await task
assert result.success is True
queen_node.inject_event.assert_awaited_once()
@@ -842,7 +860,7 @@ class TestEscalate:
@pytest.mark.asyncio
async def test_escalate_waits_for_queen_input_and_skips_judge(self, runtime, node_spec, memory):
"""wait_for_response=true should block for queen input before judge evaluation."""
"""escalate() should block for queen input before judge evaluation."""
node_spec.output_keys = ["result"]
llm = MockStreamingLLM(
scenarios=[
@@ -851,7 +869,6 @@ class TestEscalate:
{
"reason": "need direction",
"context": "conflicting constraints",
"wait_for_response": True,
},
tool_use_id="escalate_1",
),
@@ -1756,9 +1773,9 @@ class TestIsToolDoomLoop:
def test_different_args_no_doom(self):
node = EventLoopNode(config=LoopConfig(tool_doom_loop_threshold=3))
fp1 = [("search", '{"q": "a"}')]
fp2 = [("search", '{"q": "b"}')]
fp3 = [("search", '{"q": "c"}')]
fp1 = [("search", '{"q": "deploy kubernetes cluster to production"}')]
fp2 = [("read_file", '{"path": "/etc/nginx/nginx.conf"}')]
fp3 = [("execute", '{"command": "SELECT * FROM users WHERE active=true"}')]
is_doom, _ = node._is_tool_doom_loop([fp1, fp2, fp3])
assert is_doom is False
@@ -1886,6 +1903,7 @@ class TestToolDoomLoopIntegration:
config=LoopConfig(
max_iterations=10,
tool_doom_loop_threshold=3,
stall_similarity_threshold=1.0, # disable fuzzy stall detection
),
)
result = await node.execute(ctx)
@@ -1941,6 +1959,7 @@ class TestToolDoomLoopIntegration:
config=LoopConfig(
max_iterations=10,
tool_doom_loop_threshold=3,
stall_similarity_threshold=1.0, # disable fuzzy stall detection
),
)
result = await node.execute(ctx)
@@ -2005,6 +2024,7 @@ class TestToolDoomLoopIntegration:
config=LoopConfig(
max_iterations=10,
tool_doom_loop_threshold=3,
stall_similarity_threshold=1.0, # disable fuzzy stall detection
),
)
result = await node.execute(ctx)
@@ -2056,6 +2076,7 @@ class TestToolDoomLoopIntegration:
config=LoopConfig(
max_iterations=10,
tool_doom_loop_enabled=False,
stall_similarity_threshold=1.0, # disable fuzzy stall detection
),
)
result = await node.execute(ctx)
@@ -2144,6 +2165,7 @@ class TestToolDoomLoopIntegration:
config=LoopConfig(
max_iterations=10,
tool_doom_loop_threshold=3,
stall_similarity_threshold=1.0, # disable fuzzy stall detection
),
)
result = await node.execute(ctx)
@@ -2206,6 +2228,7 @@ class TestToolDoomLoopIntegration:
config=LoopConfig(
max_iterations=10,
tool_doom_loop_threshold=3,
stall_similarity_threshold=1.0, # disable fuzzy stall detection
),
)
result = await node.execute(ctx)
-13
View File
@@ -40,16 +40,3 @@ class TestMCPDependencies:
from mcp.server import FastMCP
assert FastMCP is not None
class TestMCPPackageExports:
"""Tests for the framework.mcp package exports."""
def test_package_importable(self):
"""Test that framework.mcp package can be imported."""
if not MCP_AVAILABLE:
pytest.skip(MCP_SKIP_REASON)
import framework.mcp
assert framework.mcp is not None
+8 -12
View File
@@ -970,13 +970,13 @@ class TestEscalationFlow:
)
@pytest.mark.asyncio
async def test_wait_for_response_emits_client_events(
async def test_wait_for_response_emits_escalation_event(
self,
runtime,
parent_node_spec,
subagent_node_spec,
):
"""Escalation should emit CLIENT_OUTPUT_DELTA and CLIENT_INPUT_REQUESTED events."""
"""Escalation should emit ESCALATION_REQUESTED to the queen."""
from framework.graph.event_loop_node import _EscalationReceiver
bus = EventBus()
@@ -986,7 +986,7 @@ class TestEscalationFlow:
bus_events.append(event)
bus.subscribe(
event_types=[EventType.CLIENT_OUTPUT_DELTA, EventType.CLIENT_INPUT_REQUESTED],
event_types=[EventType.ESCALATION_REQUESTED],
handler=handler,
)
@@ -1034,16 +1034,12 @@ class TestEscalationFlow:
await node._execute_subagent(ctx, "researcher", "Navigate page with CAPTCHA")
await injector
# Should have emitted both events
output_deltas = [e for e in bus_events if e.type == EventType.CLIENT_OUTPUT_DELTA]
input_requests = [e for e in bus_events if e.type == EventType.CLIENT_INPUT_REQUESTED]
# Should have emitted ESCALATION_REQUESTED
escalation_events = [e for e in bus_events if e.type == EventType.ESCALATION_REQUESTED]
assert len(output_deltas) >= 1, "Should emit CLIENT_OUTPUT_DELTA with the message"
assert output_deltas[0].data["content"] == "CAPTCHA detected on page"
assert output_deltas[0].node_id == "parent" # Shows as parent talking
assert len(input_requests) >= 1, "Should emit CLIENT_INPUT_REQUESTED for routing"
assert ":escalation:" in input_requests[0].node_id # Escalation ID for routing
assert len(escalation_events) >= 1, "Should emit ESCALATION_REQUESTED"
assert escalation_events[0].data["context"] == "CAPTCHA detected on page"
assert ":escalation:" in escalation_events[0].node_id
@pytest.mark.asyncio
async def test_non_blocking_report_still_works(
+30 -34
View File
@@ -3,9 +3,8 @@
Tests the FULL routing chain:
ExecutionStream GraphExecutor EventLoopNode _execute_subagent
_report_callback registers _EscalationReceiver in executor.node_registry
emit CLIENT_INPUT_REQUESTED with escalation_id
subscriber calls stream.inject_input(escalation_id, "done")
ExecutionStream finds _EscalationReceiver in executor.node_registry
emit ESCALATION_REQUESTED (queen handles the escalation)
queen inject_worker_message() finds _EscalationReceiver via get_waiting_nodes()
receiver.inject_event("done") unblocks the subagent
subagent continues and completes
"""
@@ -227,26 +226,30 @@ async def test_escalation_e2e_through_execution_stream(tmp_path):
stream_holder: list[ExecutionStream] = []
async def escalation_handler(event: AgentEvent):
"""Simulate a TUI/runner: when CLIENT_INPUT_REQUESTED arrives with
an escalation node_id, inject the user's response via the stream."""
"""Simulate the queen: when ESCALATION_REQUESTED arrives,
find the waiting receiver and inject the response via the stream."""
all_events.append(event)
if event.type == EventType.CLIENT_INPUT_REQUESTED:
node_id = event.node_id
if ":escalation:" in node_id:
escalation_events.append(event)
# Small delay to simulate user typing
await asyncio.sleep(0.05)
# Route through the REAL inject_input chain
stream = stream_holder[0]
success = await stream.inject_input(node_id, "done logging in")
assert success, (
f"inject_input({node_id!r}) returned False — "
"escalation receiver not found in executor.node_registry"
)
inject_called.set()
if event.type == EventType.ESCALATION_REQUESTED:
escalation_events.append(event)
# Small delay to simulate queen processing
await asyncio.sleep(0.05)
# Route through the REAL inject_input chain — find the waiting
# escalation receiver via get_waiting_nodes() (mirrors what
# inject_worker_message does in the queen lifecycle tools).
stream = stream_holder[0]
waiting = stream.get_waiting_nodes()
assert waiting, "Should have a waiting escalation receiver"
target_node_id = waiting[0]["node_id"]
assert ":escalation:" in target_node_id
success = await stream.inject_input(target_node_id, "done logging in")
assert success, (
f"inject_input({target_node_id!r}) returned False — "
"escalation receiver not found in executor.node_registry"
)
inject_called.set()
bus.subscribe(
event_types=[EventType.CLIENT_INPUT_REQUESTED, EventType.CLIENT_OUTPUT_DELTA],
event_types=[EventType.ESCALATION_REQUESTED],
handler=escalation_handler,
)
@@ -297,17 +300,7 @@ async def test_escalation_e2e_through_execution_stream(tmp_path):
# 3. Escalation event has correct structure
esc_event = escalation_events[0]
assert ":escalation:" in esc_event.node_id
assert esc_event.data["prompt"] == "Login required for LinkedIn. Please log in manually."
# 4. CLIENT_OUTPUT_DELTA was emitted for the escalation message
output_deltas = [
e
for e in all_events
if e.type == EventType.CLIENT_OUTPUT_DELTA and "Login required" in e.data.get("content", "")
]
assert len(output_deltas) >= 1, (
"Should have emitted CLIENT_OUTPUT_DELTA with escalation message"
)
assert esc_event.data["context"] == "Login required for LinkedIn. Please log in manually."
# 5. The parent node got the subagent's result
assert "result" in result.output
@@ -444,7 +437,7 @@ async def test_escalation_cleanup_after_completion(tmp_path):
stream_holder: list[ExecutionStream] = []
async def auto_respond(event: AgentEvent):
if event.type == EventType.CLIENT_INPUT_REQUESTED and ":escalation:" in event.node_id:
if event.type == EventType.ESCALATION_REQUESTED:
stream = stream_holder[0]
# Snapshot the active executor's node_registry BEFORE responding
@@ -462,10 +455,13 @@ async def test_escalation_cleanup_after_completion(tmp_path):
)
await asyncio.sleep(0.02)
await stream.inject_input(event.node_id, "ok")
# Find the waiting escalation receiver and inject response
waiting = stream.get_waiting_nodes()
if waiting:
await stream.inject_input(waiting[0]["node_id"], "ok")
bus.subscribe(
event_types=[EventType.CLIENT_INPUT_REQUESTED],
event_types=[EventType.ESCALATION_REQUESTED],
handler=auto_respond,
)
+1 -1
View File
@@ -23,7 +23,7 @@ Done. For details, prerequisites, and troubleshooting, read on.
## What you get after setup
- **coder-tools** Create and manage agents (scaffolding via `initialize_agent_package`, file I/O, tool discovery).
- **coder-tools** Create and manage agents (scaffolding via `initialize_and_build_agent`, file I/O, tool discovery).
- **tools** File operations, web search, and other agent tools.
- **Documentation** Guided docs for building and testing agents.
+1 -1
View File
@@ -130,7 +130,7 @@ MCP (Model Context Protocol) servers are configured in `.mcp.json` at the projec
}
```
The `coder-tools` server provides agent scaffolding via `initialize_agent_package` and related tools. The `tools` MCP server exposes tools including web search, PDF reading, CSV processing, and file system operations.
The `coder-tools` server provides agent scaffolding via `initialize_and_build_agent` and related tools. The `tools` MCP server exposes tools including web search, PDF reading, CSV processing, and file system operations.
## Storage
+3 -3
View File
@@ -244,7 +244,7 @@ The fastest way to build agents is with the configured MCP workflow:
./quickstart.sh
# Build a new agent
Use the coder-tools MCP tools from your IDE agent chat (e.g., initialize_agent_package)
Use the coder-tools MCP tools from your IDE agent chat (e.g., initialize_and_build_agent)
```
### Agent Development Workflow
@@ -252,7 +252,7 @@ Use the coder-tools MCP tools from your IDE agent chat (e.g., initialize_agent_p
1. **Define Your Goal**
```
Use the coder-tools initialize_agent_package tool
Use the coder-tools initialize_and_build_agent tool
Enter goal: "Build an agent that processes customer support tickets"
```
@@ -555,7 +555,7 @@ uv add <package>
```bash
# Option 1: Use Claude Code skill (recommended)
Use the coder-tools initialize_agent_package tool
Use the coder-tools initialize_and_build_agent tool
# Option 2: Create manually
# Note: exports/ is initially empty (gitignored). Create your agent directory:
+2 -2
View File
@@ -180,7 +180,7 @@ MCP tools are also available in Cursor. To enable:
**Claude Code:**
```
Use the coder-tools initialize_agent_package tool to scaffold a new agent
Use the coder-tools initialize_and_build_agent tool to scaffold a new agent
```
**Codex CLI:**
@@ -453,7 +453,7 @@ This design allows agents in `exports/` to be:
### 2. Build Agent (Claude Code)
```
Use the coder-tools initialize_agent_package tool
Use the coder-tools initialize_and_build_agent tool
Enter goal: "Build an agent that processes customer support tickets"
```
+2 -2
View File
@@ -47,7 +47,7 @@ This is the recommended way to create your first agent.
# Setup already done via quickstart.sh above
# Start Claude Code and build an agent
Use the coder-tools initialize_agent_package tool
Use the coder-tools initialize_and_build_agent tool
```
Follow the interactive prompts to:
@@ -173,7 +173,7 @@ PYTHONPATH=exports uv run python -m my_agent test --type success
1. **Dashboard**: Run `hive open` to launch the web dashboard, or `hive tui` for the terminal UI
2. **Detailed Setup**: See [environment-setup.md](./environment-setup.md)
3. **Developer Guide**: See [developer-guide.md](./developer-guide.md)
4. **Build Agents**: Use the coder-tools `initialize_agent_package` tool in Claude Code
4. **Build Agents**: Use the coder-tools `initialize_and_build_agent` tool in Claude Code
5. **Custom Tools**: Learn to integrate MCP servers
6. **Join Community**: [Discord](https://discord.com/invite/MXE49hrKDk)
+18 -16
View File
@@ -312,11 +312,11 @@ Ship essential framework utilities: Node validation, HITL (Human-in-the-loop pau
- [x] Pause/approve workflow
- [x] State saved to checkpoint
- [x] Resume with HITLResponse merged into context
- [x] **TUI Integration**
- [x] Chat REPL with streaming support (tui/app.py)
- [x] Multi-graph session management
- [x] User presence detection
- [x] Real-time log viewing
- [x] ~~**TUI Integration**~~ *(deprecated — see AGENTS.md; use `hive open` browser UI instead)*
- [x] ~~Chat REPL with streaming support (tui/app.py)~~
- [x] ~~Multi-graph session management~~
- [x] ~~User presence detection~~
- [x] ~~Real-time log viewing~~
- [x] **Node Lifecycle Management**
- [x] Start/stop/pause/resume in execution stream
- [x] State persistence via checkpoint store
@@ -538,11 +538,11 @@ Release CLI tools specifically for rapid memory management and credential store
- [x] test-run, test-debug, test-list, test-stats (testing/cli.py)
- [x] Pytest integration
- [x] Test categorization
- [x] **TUI (Terminal UI)**
- [x] Interactive chat with streaming (tui/app.py)
- [x] Multi-graph management UI
- [x] Log pane for real-time output
- [x] Keyboard shortcuts (Ctrl+C, Ctrl+D, etc.)
- [x] ~~**TUI (Terminal UI)**~~ *(deprecated — see AGENTS.md; use `hive open` browser UI instead)*
- [x] ~~Interactive chat with streaming (tui/app.py)~~
- [x] ~~Multi-graph management UI~~
- [x] ~~Log pane for real-time output~~
- [x] ~~Keyboard shortcuts (Ctrl+C, Ctrl+D, etc.)~~
- [ ] **Memory Management CLI**
- [ ] Memory inspection commands
- [ ] Memory cleanup utilities
@@ -776,12 +776,14 @@ Implement an interactive, drag-and-drop canvas (using libraries like React Flow)
### TUI to GUI Upgrade
Port the existing Terminal User Interface (TUI) into a rich web application, allowing users to interact directly with the Queen Bee / Coding Agent via a browser chat interface.
- [x] **TUI Foundation**
- [x] Terminal chat interface (tui/app.py)
- [x] Streaming support
- [x] Multi-graph management
- [x] Log pane display
- [x] Keyboard shortcuts
> **Note:** The TUI (`hive tui` / `tui/app.py`) is deprecated and no longer maintained (see AGENTS.md). The items below reflect legacy work completed before deprecation. New development should target the browser-based GUI (`hive open`).
- [x] ~~**TUI Foundation**~~ *(deprecated)*
- [x] ~~Terminal chat interface (tui/app.py)~~
- [x] ~~Streaming support~~
- [x] ~~Multi-graph management~~
- [x] ~~Log pane display~~
- [x] ~~Keyboard shortcuts~~
- [ ] **Web Application**
- [ ] Modern web UI framework setup (React/Vue/Svelte)
- [ ] Responsive design implementation
+1 -1
View File
@@ -22,7 +22,7 @@ template_name/
### Option 1: Build from template (recommended)
Use the `coder-tools` `initialize_agent_package` tool and select "From a template" to interactively pick a template, customize the goal/nodes/graph, and export a new agent.
Use the `coder-tools` `initialize_and_build_agent` tool and select "From a template" to interactively pick a template, customize the goal/nodes/graph, and export a new agent.
### Option 2: Manual copy
@@ -204,8 +204,8 @@ class DeepResearchAgent:
"""Set up the executor with all components."""
from pathlib import Path
storage_path = Path.home() / ".hive" / "agents" / "deep_research_agent"
storage_path.mkdir(parents=True, exist_ok=True)
self._storage_path = Path.home() / ".hive" / "agents" / "deep_research_agent"
self._storage_path.mkdir(parents=True, exist_ok=True)
self._tool_registry = ToolRegistry()
@@ -2,8 +2,13 @@
"hive-tools": {
"transport": "stdio",
"command": "uv",
"args": ["run", "python", "mcp_server.py", "--stdio"],
"args": [
"run",
"python",
"mcp_server.py",
"--stdio"
],
"cwd": "../../../tools",
"description": "Hive tools MCP server providing web_search, web_scrape, and write_to_file"
}
}
}
@@ -11,26 +11,32 @@ intake_node = NodeSpec(
node_type="event_loop",
client_facing=True,
max_node_visits=0,
input_keys=["topic"],
input_keys=["user_request"],
output_keys=["research_brief"],
success_criteria=(
"The research brief is specific and actionable: it states the topic, "
"the key questions to answer, the desired scope, and depth."
),
system_prompt="""\
You are a research intake specialist. The user wants to research a topic.
Have a brief conversation to clarify what they need.
You are a research intake specialist. Your ONLY job is to have a brief conversation with the user to clarify what they want researched.
**CRITICAL: You do NOT do any research yourself.**
- You do NOT search the web
- You do NOT fetch sources
- The research happens in the NEXT stage after you complete intake
- Do NOT ask for or expect web_search or web_scrape tools
**STEP 1 Read and respond (text only, NO tool calls):**
1. Read the topic provided
2. If it's vague, ask 1-2 clarifying questions (scope, angle, depth)
1. Read the user_request provided
2. If it's vague, ask 1-2 clarifying questions (scope, angle, depth, budget, preferences)
3. If it's already clear, confirm your understanding and ask the user to confirm
Keep it short. Don't over-ask.
Keep it short. Don't over-ask. Maximum 2 clarifying questions.
**STEP 2 After the user confirms, call set_output:**
- set_output("research_brief", "A clear paragraph describing exactly what to research, \
what questions to answer, what scope to cover, and how deep to go.")
- set_output("research_brief", "A clear paragraph describing exactly what to research, what questions to answer, what scope to cover, and how deep to go.")
That's it. Once you call set_output, your job is done and the research node will take over.
""",
tools=[],
)
@@ -59,6 +65,8 @@ If feedback is provided, this is a follow-up round — focus on the gaps identif
Work in phases:
1. **Search**: Use web_search with 3-5 diverse queries covering different angles.
Prioritize authoritative sources (.edu, .gov, established publications).
For automotive research, target: caranddriver.com, motortrend.com, edmunds.com,
consumerreports.org, jdpower.com, and enthusiast forums.
2. **Fetch**: Use web_scrape on the most promising URLs (aim for 5-8 sources).
Skip URLs that fail. Extract the substantive content.
3. **Analyze**: Review what you've collected. Identify key findings, themes,
@@ -116,16 +116,39 @@ customize_node = NodeSpec(
"for each selected job, saved as HTML, and Gmail drafts created in user's inbox."
),
system_prompt="""\
You are a career coach creating personalized application materials.
You are a career coach creating personalized application materials and Gmail drafts.
**CRITICAL: You MUST create Gmail drafts for each selected job using gmail_create_draft.**
**PROCESS:**
1. Create application_materials.html using save_data and append_data.
2. Generate resume customization list and professional cold email for each selected job.
3. Serve the file to the user.
4. Create Gmail drafts using gmail_create_draft.
2. For each selected job:
a. Generate a specific resume customization list
b. Create a professional cold outreach email
c. **IMMEDIATELY call gmail_create_draft** with:
- to: hiring manager or recruiter email (if available) or company email
- subject: "Application for [Job Title] - [Your Name]"
- html: the professional cold email in HTML format
3. Serve the application_materials.html file to the user.
4. Confirm each Gmail draft was created successfully.
**EMAIL REQUIREMENTS:**
- Professional, personalized cold outreach email
- Reference specific company details and role
- Mention 2-3 relevant qualifications from their resume
- Include clear call-to-action
- Professional email signature
- Format as HTML with proper structure
**Gmail Draft Creation:**
For each job, you MUST call gmail_create_draft(to="[email]", subject="[subject]", html="[email_html]")
- Extract company email from job listing if available
- Use generic format like "careers@[company].com" if no specific email
- Subject format: "Application for [Job Title] - [Applicant Name]"
- HTML email body with proper formatting
**FINISH:**
Call set_output("application_materials", "Completed")
Only call set_output("application_materials", "Completed") AFTER creating ALL Gmail drafts.
""",
tools=["save_data", "append_data", "serve_file_to_user", "gmail_create_draft"],
)
+112 -15
View File
@@ -911,6 +911,13 @@ $zaiKey = [System.Environment]::GetEnvironmentVariable("ZAI_API_KEY", "User")
if (-not $zaiKey) { $zaiKey = $env:ZAI_API_KEY }
if ($zaiKey) { $ZaiCredDetected = $true }
$KimiCredDetected = $false
$kimiConfigPath = Join-Path $env:USERPROFILE ".kimi\config.toml"
if (Test-Path $kimiConfigPath) { $KimiCredDetected = $true }
$kimiKey = [System.Environment]::GetEnvironmentVariable("KIMI_API_KEY", "User")
if (-not $kimiKey) { $kimiKey = $env:KIMI_API_KEY }
if ($kimiKey) { $KimiCredDetected = $true }
# Detect API key providers
$ProviderMenuEnvVars = @("ANTHROPIC_API_KEY", "OPENAI_API_KEY", "GEMINI_API_KEY", "GROQ_API_KEY", "CEREBRAS_API_KEY")
$ProviderMenuNames = @("Anthropic (Claude) - Recommended", "OpenAI (GPT)", "Google Gemini - Free tier available", "Groq - Fast, free tier", "Cerebras - Fast, free tier")
@@ -938,7 +945,9 @@ if (Test-Path $HiveConfigFile) {
$PrevEnvVar = if ($prevLlm.api_key_env_var) { $prevLlm.api_key_env_var } else { "" }
if ($prevLlm.use_claude_code_subscription) { $PrevSubMode = "claude_code" }
elseif ($prevLlm.use_codex_subscription) { $PrevSubMode = "codex" }
elseif ($prevLlm.use_kimi_code_subscription) { $PrevSubMode = "kimi_code" }
elseif ($prevLlm.api_base -and $prevLlm.api_base -like "*api.z.ai*") { $PrevSubMode = "zai_code" }
elseif ($prevLlm.api_base -and $prevLlm.api_base -like "*api.kimi.com*") { $PrevSubMode = "kimi_code" }
}
} catch { }
}
@@ -951,6 +960,7 @@ if ($PrevSubMode -or $PrevProvider) {
"claude_code" { if ($ClaudeCredDetected) { $prevCredValid = $true } }
"zai_code" { if ($ZaiCredDetected) { $prevCredValid = $true } }
"codex" { if ($CodexCredDetected) { $prevCredValid = $true } }
"kimi_code" { if ($KimiCredDetected) { $prevCredValid = $true } }
default {
if ($PrevEnvVar) {
$envVal = [System.Environment]::GetEnvironmentVariable($PrevEnvVar, "Process")
@@ -964,14 +974,16 @@ if ($PrevSubMode -or $PrevProvider) {
"claude_code" { $DefaultChoice = "1" }
"zai_code" { $DefaultChoice = "2" }
"codex" { $DefaultChoice = "3" }
"kimi_code" { $DefaultChoice = "4" }
}
if (-not $DefaultChoice) {
switch ($PrevProvider) {
"anthropic" { $DefaultChoice = "4" }
"openai" { $DefaultChoice = "5" }
"gemini" { $DefaultChoice = "6" }
"groq" { $DefaultChoice = "7" }
"cerebras" { $DefaultChoice = "8" }
"anthropic" { $DefaultChoice = "5" }
"openai" { $DefaultChoice = "6" }
"gemini" { $DefaultChoice = "7" }
"groq" { $DefaultChoice = "8" }
"cerebras" { $DefaultChoice = "9" }
"kimi" { $DefaultChoice = "4" }
}
}
}
@@ -1003,12 +1015,19 @@ Write-Host ") OpenAI Codex Subscription " -NoNewline
Write-Color -Text "(use your Codex/ChatGPT Plus plan)" -Color DarkGray -NoNewline
if ($CodexCredDetected) { Write-Color -Text " (credential detected)" -Color Green } else { Write-Host "" }
# 4) Kimi Code
Write-Host " " -NoNewline
Write-Color -Text "4" -Color Cyan -NoNewline
Write-Host ") Kimi Code Subscription " -NoNewline
Write-Color -Text "(use your Kimi Code plan)" -Color DarkGray -NoNewline
if ($KimiCredDetected) { Write-Color -Text " (credential detected)" -Color Green } else { Write-Host "" }
Write-Host ""
Write-Color -Text " API key providers:" -Color Cyan
# 4-8) API key providers
# 5-9) API key providers
for ($idx = 0; $idx -lt $ProviderMenuEnvVars.Count; $idx++) {
$num = $idx + 4
$num = $idx + 5
$envVal = [System.Environment]::GetEnvironmentVariable($ProviderMenuEnvVars[$idx], "Process")
if (-not $envVal) { $envVal = [System.Environment]::GetEnvironmentVariable($ProviderMenuEnvVars[$idx], "User") }
Write-Host " " -NoNewline
@@ -1018,7 +1037,7 @@ for ($idx = 0; $idx -lt $ProviderMenuEnvVars.Count; $idx++) {
}
Write-Host " " -NoNewline
Write-Color -Text "9" -Color Cyan -NoNewline
Write-Color -Text "10" -Color Cyan -NoNewline
Write-Host ") Skip for now"
Write-Host ""
@@ -1029,16 +1048,16 @@ if ($DefaultChoice) {
while ($true) {
if ($DefaultChoice) {
$raw = Read-Host "Enter choice (1-9) [$DefaultChoice]"
$raw = Read-Host "Enter choice (1-10) [$DefaultChoice]"
if ([string]::IsNullOrWhiteSpace($raw)) { $raw = $DefaultChoice }
} else {
$raw = Read-Host "Enter choice (1-9)"
$raw = Read-Host "Enter choice (1-10)"
}
if ($raw -match '^\d+$') {
$num = [int]$raw
if ($num -ge 1 -and $num -le 9) { break }
if ($num -ge 1 -and $num -le 10) { break }
}
Write-Color -Text "Invalid choice. Please enter 1-9" -Color Red
Write-Color -Text "Invalid choice. Please enter 1-10" -Color Red
}
switch ($num) {
@@ -1102,9 +1121,20 @@ switch ($num) {
Write-Ok "Using OpenAI Codex subscription"
}
}
{ $_ -ge 4 -and $_ -le 8 } {
4 {
# Kimi Code Subscription
$SubscriptionMode = "kimi_code"
$SelectedProviderId = "kimi"
$SelectedEnvVar = "KIMI_API_KEY"
$SelectedModel = "kimi-k2.5"
$SelectedMaxTokens = 32768
Write-Host ""
Write-Ok "Using Kimi Code subscription"
Write-Color -Text " Model: kimi-k2.5 | API: api.kimi.com/coding" -Color DarkGray
}
{ $_ -ge 5 -and $_ -le 9 } {
# API key providers
$provIdx = $num - 4
$provIdx = $num - 5
$SelectedEnvVar = $ProviderMenuEnvVars[$provIdx]
$SelectedProviderId = $ProviderMenuIds[$provIdx]
$providerName = $ProviderMenuNames[$provIdx] -replace ' - .*', '' # strip description
@@ -1175,7 +1205,7 @@ switch ($num) {
}
}
}
9 {
10 {
Write-Host ""
Write-Warn "Skipped. An LLM API key is required to test and use worker agents."
Write-Host " Add your API key later by running:"
@@ -1252,6 +1282,70 @@ if ($SubscriptionMode -eq "zai_code") {
}
}
# For Kimi Code subscription: prompt for API key with verification + retry
if ($SubscriptionMode -eq "kimi_code") {
while ($true) {
$existingKimi = [System.Environment]::GetEnvironmentVariable("KIMI_API_KEY", "User")
if (-not $existingKimi) { $existingKimi = $env:KIMI_API_KEY }
if ($existingKimi) {
$masked = $existingKimi.Substring(0, [Math]::Min(4, $existingKimi.Length)) + "..." + $existingKimi.Substring([Math]::Max(0, $existingKimi.Length - 4))
Write-Host ""
Write-Color -Text " $([char]0x2B22) Current Kimi key: $masked" -Color Green
$apiKey = Read-Host " Press Enter to keep, or paste a new key to replace"
} else {
Write-Host ""
Write-Host "Get your API key from: " -NoNewline
Write-Color -Text "https://www.kimi.com/code" -Color Cyan
Write-Host ""
$apiKey = Read-Host "Paste your Kimi API key (or press Enter to skip)"
}
if ($apiKey) {
[System.Environment]::SetEnvironmentVariable("KIMI_API_KEY", $apiKey, "User")
$env:KIMI_API_KEY = $apiKey
Write-Host ""
Write-Ok "Kimi API key saved as User environment variable"
# Health check the new key
Write-Host " Verifying Kimi API key... " -NoNewline
try {
$hcResult = & uv run python (Join-Path $ScriptDir "scripts/check_llm_key.py") "kimi" $apiKey "https://api.kimi.com/coding" 2>$null
$hcJson = $hcResult | ConvertFrom-Json
if ($hcJson.valid -eq $true) {
Write-Color -Text "ok" -Color Green
break
} elseif ($hcJson.valid -eq $false) {
Write-Color -Text "failed" -Color Red
Write-Warn $hcJson.message
[System.Environment]::SetEnvironmentVariable("KIMI_API_KEY", $null, "User")
Remove-Item -Path "Env:\KIMI_API_KEY" -ErrorAction SilentlyContinue
Write-Host ""
Read-Host " Press Enter to try again"
} else {
Write-Color -Text "--" -Color Yellow
Write-Color -Text " Could not verify key (network issue). The key has been saved." -Color DarkGray
break
}
} catch {
Write-Color -Text "--" -Color Yellow
Write-Color -Text " Could not verify key (network issue). The key has been saved." -Color DarkGray
break
}
} elseif (-not $existingKimi) {
Write-Host ""
Write-Warn "Skipped. Add your Kimi API key later:"
Write-Color -Text " [System.Environment]::SetEnvironmentVariable('KIMI_API_KEY', 'your-key', 'User')" -Color Cyan
$SelectedEnvVar = ""
$SelectedProviderId = ""
$SubscriptionMode = ""
break
} else {
break
}
}
}
# Prompt for model if not already selected (manual provider path)
if ($SelectedProviderId -and -not $SelectedModel) {
$modelSel = Get-ModelSelection $SelectedProviderId
@@ -1287,6 +1381,9 @@ if ($SelectedProviderId) {
} elseif ($SubscriptionMode -eq "zai_code") {
$config.llm["api_base"] = "https://api.z.ai/api/coding/paas/v4"
$config.llm["api_key_env_var"] = $SelectedEnvVar
} elseif ($SubscriptionMode -eq "kimi_code") {
$config.llm["api_base"] = "https://api.kimi.com/coding"
$config.llm["api_key_env_var"] = $SelectedEnvVar
} else {
$config.llm["api_key_env_var"] = $SelectedEnvVar
}
+58 -24
View File
@@ -410,7 +410,7 @@ if [ "$USE_ASSOC_ARRAYS" = true ]; then
declare -A DEFAULT_MODELS=(
["anthropic"]="claude-haiku-4-5-20251001"
["openai"]="gpt-5-mini"
["minimax"]="MiniMax-M2.1"
["minimax"]="MiniMax-M2.5"
["gemini"]="gemini-3-flash-preview"
["groq"]="moonshotai/kimi-k2-instruct-0905"
["cerebras"]="zai-glm-4.7"
@@ -510,7 +510,7 @@ else
# Default models by provider id (parallel arrays)
MODEL_PROVIDER_IDS=(anthropic openai minimax gemini groq cerebras mistral together_ai deepseek)
MODEL_DEFAULTS=("claude-haiku-4-5-20251001" "gpt-5-mini" "MiniMax-M2.1" "gemini-3-flash-preview" "moonshotai/kimi-k2-instruct-0905" "zai-glm-4.7" "mistral-large-latest" "meta-llama/Llama-3.3-70B-Instruct-Turbo" "deepseek-chat")
MODEL_DEFAULTS=("claude-haiku-4-5-20251001" "gpt-5-mini" "MiniMax-M2.5" "gemini-3-flash-preview" "moonshotai/kimi-k2-instruct-0905" "zai-glm-4.7" "mistral-large-latest" "meta-llama/Llama-3.3-70B-Instruct-Turbo" "deepseek-chat")
# Helper: get provider display name for an env var
get_provider_name() {
@@ -824,6 +824,13 @@ if [ -n "${MINIMAX_API_KEY:-}" ]; then
MINIMAX_CRED_DETECTED=true
fi
KIMI_CRED_DETECTED=false
if [ -f "$HOME/.kimi/config.toml" ]; then
KIMI_CRED_DETECTED=true
elif [ -n "${KIMI_API_KEY:-}" ]; then
KIMI_CRED_DETECTED=true
fi
# Detect API key providers
if [ "$USE_ASSOC_ARRAYS" = true ]; then
for env_var in "${!PROVIDER_NAMES[@]}"; do
@@ -859,6 +866,7 @@ try:
sub = ''
if llm.get('use_claude_code_subscription'): sub = 'claude_code'
elif llm.get('use_codex_subscription'): sub = 'codex'
elif llm.get('use_kimi_code_subscription'): sub = 'kimi_code'
elif llm.get('provider', '') == 'minimax' or 'api.minimax.io' in llm.get('api_base', ''): sub = 'minimax_code'
elif 'api.z.ai' in llm.get('api_base', ''): sub = 'zai_code'
print(f'PREV_SUB_MODE={sub}')
@@ -875,6 +883,7 @@ if [ -n "$PREV_SUB_MODE" ] || [ -n "$PREV_PROVIDER" ]; then
claude_code) [ "$CLAUDE_CRED_DETECTED" = true ] && PREV_CRED_VALID=true ;;
zai_code) [ "$ZAI_CRED_DETECTED" = true ] && PREV_CRED_VALID=true ;;
codex) [ "$CODEX_CRED_DETECTED" = true ] && PREV_CRED_VALID=true ;;
kimi_code) [ "$KIMI_CRED_DETECTED" = true ] && PREV_CRED_VALID=true ;;
*)
# API key provider — check if the env var is set
if [ -n "$PREV_ENV_VAR" ] && [ -n "${!PREV_ENV_VAR}" ]; then
@@ -889,15 +898,17 @@ if [ -n "$PREV_SUB_MODE" ] || [ -n "$PREV_PROVIDER" ]; then
zai_code) DEFAULT_CHOICE=2 ;;
codex) DEFAULT_CHOICE=3 ;;
minimax_code) DEFAULT_CHOICE=4 ;;
kimi_code) DEFAULT_CHOICE=5 ;;
esac
if [ -z "$DEFAULT_CHOICE" ]; then
case "$PREV_PROVIDER" in
anthropic) DEFAULT_CHOICE=5 ;;
openai) DEFAULT_CHOICE=6 ;;
gemini) DEFAULT_CHOICE=7 ;;
groq) DEFAULT_CHOICE=8 ;;
cerebras) DEFAULT_CHOICE=9 ;;
anthropic) DEFAULT_CHOICE=6 ;;
openai) DEFAULT_CHOICE=7 ;;
gemini) DEFAULT_CHOICE=8 ;;
groq) DEFAULT_CHOICE=9 ;;
cerebras) DEFAULT_CHOICE=10 ;;
minimax) DEFAULT_CHOICE=4 ;;
kimi) DEFAULT_CHOICE=5 ;;
esac
fi
fi
@@ -936,14 +947,21 @@ else
echo -e " ${CYAN}4)${NC} MiniMax Coding Key ${DIM}(use your MiniMax coding key)${NC}"
fi
# 5) Kimi Code
if [ "$KIMI_CRED_DETECTED" = true ]; then
echo -e " ${CYAN}5)${NC} Kimi Code Subscription ${DIM}(use your Kimi Code plan)${NC} ${GREEN}(credential detected)${NC}"
else
echo -e " ${CYAN}5)${NC} Kimi Code Subscription ${DIM}(use your Kimi Code plan)${NC}"
fi
echo ""
echo -e " ${CYAN}${BOLD}API key providers:${NC}"
# 5-9) API key providers — show (credential detected) if key already set
# 6-10) API key providers — show (credential detected) if key already set
PROVIDER_MENU_ENVS=(ANTHROPIC_API_KEY OPENAI_API_KEY GEMINI_API_KEY GROQ_API_KEY CEREBRAS_API_KEY)
PROVIDER_MENU_NAMES=("Anthropic (Claude) - Recommended" "OpenAI (GPT)" "Google Gemini - Free tier available" "Groq - Fast, free tier" "Cerebras - Fast, free tier")
for idx in 0 1 2 3 4; do
num=$((idx + 5))
num=$((idx + 6))
env_var="${PROVIDER_MENU_ENVS[$idx]}"
if [ -n "${!env_var}" ]; then
echo -e " ${CYAN}$num)${NC} ${PROVIDER_MENU_NAMES[$idx]} ${GREEN}(credential detected)${NC}"
@@ -952,7 +970,7 @@ for idx in 0 1 2 3 4; do
fi
done
echo -e " ${CYAN}10)${NC} Skip for now"
echo -e " ${CYAN}11)${NC} Skip for now"
echo ""
if [ -n "$DEFAULT_CHOICE" ]; then
@@ -962,15 +980,15 @@ fi
while true; do
if [ -n "$DEFAULT_CHOICE" ]; then
read -r -p "Enter choice (1-10) [$DEFAULT_CHOICE]: " choice || true
read -r -p "Enter choice (1-11) [$DEFAULT_CHOICE]: " choice || true
choice="${choice:-$DEFAULT_CHOICE}"
else
read -r -p "Enter choice (1-10): " choice || true
read -r -p "Enter choice (1-11): " choice || true
fi
if [[ "$choice" =~ ^[0-9]+$ ]] && [ "$choice" -ge 1 ] && [ "$choice" -le 10 ]; then
if [[ "$choice" =~ ^[0-9]+$ ]] && [ "$choice" -ge 1 ] && [ "$choice" -le 11 ]; then
break
fi
echo -e "${RED}Invalid choice. Please enter 1-10${NC}"
echo -e "${RED}Invalid choice. Please enter 1-11${NC}"
done
case $choice in
@@ -1038,46 +1056,60 @@ case $choice in
SUBSCRIPTION_MODE="minimax_code"
SELECTED_ENV_VAR="MINIMAX_API_KEY"
SELECTED_PROVIDER_ID="minimax"
SELECTED_MODEL="MiniMax-M2.1"
SELECTED_MAX_TOKENS=8192
SELECTED_MODEL="MiniMax-M2.5"
SELECTED_MAX_TOKENS=32768
SELECTED_API_BASE="https://api.minimax.io/v1"
PROVIDER_NAME="MiniMax"
SIGNUP_URL="https://platform.minimax.io/user-center/basic-information/interface-key"
echo ""
echo -e "${GREEN}${NC} Using MiniMax coding key"
echo -e " ${DIM}Model: MiniMax-M2.1 | API: api.minimax.io${NC}"
echo -e " ${DIM}Model: MiniMax-M2.5 | API: api.minimax.io${NC}"
;;
5)
# Kimi Code Subscription
SUBSCRIPTION_MODE="kimi_code"
SELECTED_PROVIDER_ID="kimi"
SELECTED_ENV_VAR="KIMI_API_KEY"
SELECTED_MODEL="kimi-k2.5"
SELECTED_MAX_TOKENS=32768
SELECTED_API_BASE="https://api.kimi.com/coding"
PROVIDER_NAME="Kimi"
SIGNUP_URL="https://www.kimi.com/code"
echo ""
echo -e "${GREEN}${NC} Using Kimi Code subscription"
echo -e " ${DIM}Model: kimi-k2.5 | API: api.kimi.com/coding${NC}"
;;
6)
SELECTED_ENV_VAR="ANTHROPIC_API_KEY"
SELECTED_PROVIDER_ID="anthropic"
PROVIDER_NAME="Anthropic"
SIGNUP_URL="https://console.anthropic.com/settings/keys"
;;
6)
7)
SELECTED_ENV_VAR="OPENAI_API_KEY"
SELECTED_PROVIDER_ID="openai"
PROVIDER_NAME="OpenAI"
SIGNUP_URL="https://platform.openai.com/api-keys"
;;
7)
8)
SELECTED_ENV_VAR="GEMINI_API_KEY"
SELECTED_PROVIDER_ID="gemini"
PROVIDER_NAME="Google Gemini"
SIGNUP_URL="https://aistudio.google.com/apikey"
;;
8)
9)
SELECTED_ENV_VAR="GROQ_API_KEY"
SELECTED_PROVIDER_ID="groq"
PROVIDER_NAME="Groq"
SIGNUP_URL="https://console.groq.com/keys"
;;
9)
10)
SELECTED_ENV_VAR="CEREBRAS_API_KEY"
SELECTED_PROVIDER_ID="cerebras"
PROVIDER_NAME="Cerebras"
SIGNUP_URL="https://cloud.cerebras.ai/"
;;
10)
11)
echo ""
echo -e "${YELLOW}Skipped.${NC} An LLM API key is required to test and use worker agents."
echo -e "Add your API key later by running:"
@@ -1090,7 +1122,7 @@ case $choice in
esac
# For API-key providers: prompt for key (allow replacement if already set)
if { [ -z "$SUBSCRIPTION_MODE" ] || [ "$SUBSCRIPTION_MODE" = "minimax_code" ]; } && [ -n "$SELECTED_ENV_VAR" ]; then
if { [ -z "$SUBSCRIPTION_MODE" ] || [ "$SUBSCRIPTION_MODE" = "minimax_code" ] || [ "$SUBSCRIPTION_MODE" = "kimi_code" ]; } && [ -n "$SELECTED_ENV_VAR" ]; then
while true; do
CURRENT_KEY="${!SELECTED_ENV_VAR}"
if [ -n "$CURRENT_KEY" ]; then
@@ -1118,7 +1150,7 @@ if { [ -z "$SUBSCRIPTION_MODE" ] || [ "$SUBSCRIPTION_MODE" = "minimax_code" ]; }
echo -e "${GREEN}${NC} API key saved to $SHELL_RC_FILE"
# Health check the new key
echo -n " Verifying API key... "
if [ "$SUBSCRIPTION_MODE" = "minimax_code" ] && [ -n "${SELECTED_API_BASE:-}" ]; then
if { [ "$SUBSCRIPTION_MODE" = "minimax_code" ] || [ "$SUBSCRIPTION_MODE" = "kimi_code" ]; } && [ -n "${SELECTED_API_BASE:-}" ]; then
HC_RESULT=$(uv run python "$SCRIPT_DIR/scripts/check_llm_key.py" "$SELECTED_PROVIDER_ID" "$API_KEY" "$SELECTED_API_BASE" 2>/dev/null) || true
else
HC_RESULT=$(uv run python "$SCRIPT_DIR/scripts/check_llm_key.py" "$SELECTED_PROVIDER_ID" "$API_KEY" 2>/dev/null) || true
@@ -1238,6 +1270,8 @@ if [ -n "$SELECTED_PROVIDER_ID" ]; then
save_configuration "$SELECTED_PROVIDER_ID" "$SELECTED_ENV_VAR" "$SELECTED_MODEL" "$SELECTED_MAX_TOKENS" "" "https://api.z.ai/api/coding/paas/v4" > /dev/null
elif [ "$SUBSCRIPTION_MODE" = "minimax_code" ]; then
save_configuration "$SELECTED_PROVIDER_ID" "$SELECTED_ENV_VAR" "$SELECTED_MODEL" "$SELECTED_MAX_TOKENS" "" "$SELECTED_API_BASE" > /dev/null
elif [ "$SUBSCRIPTION_MODE" = "kimi_code" ]; then
save_configuration "$SELECTED_PROVIDER_ID" "$SELECTED_ENV_VAR" "$SELECTED_MODEL" "$SELECTED_MAX_TOKENS" "" "$SELECTED_API_BASE" > /dev/null
else
save_configuration "$SELECTED_PROVIDER_ID" "$SELECTED_ENV_VAR" "$SELECTED_MODEL" "$SELECTED_MAX_TOKENS" > /dev/null
fi
+61 -6
View File
@@ -56,6 +56,53 @@ def check_openai_compatible(api_key: str, endpoint: str, name: str) -> dict:
return {"valid": False, "message": f"{name} API returned status {r.status_code}"}
def check_minimax(
api_key: str, api_base: str = "https://api.minimax.io/v1", **_: str
) -> dict:
"""Validate via chatcompletion_v2 endpoint with empty messages.
MiniMax doesn't support GET /models; their native endpoint is
/v1/text/chatcompletion_v2.
"""
with httpx.Client(timeout=TIMEOUT) as client:
r = client.post(
f"{api_base.rstrip('/')}/text/chatcompletion_v2",
headers={
"Authorization": f"Bearer {api_key}",
"Content-Type": "application/json",
},
json={"model": "MiniMax-M2.5", "messages": []},
)
if r.status_code in (200, 400, 422, 429):
return {"valid": True, "message": "MiniMax API key valid"}
if r.status_code == 401:
return {"valid": False, "message": "Invalid MiniMax API key"}
if r.status_code == 403:
return {"valid": False, "message": "MiniMax API key lacks permissions"}
return {"valid": False, "message": f"MiniMax API returned status {r.status_code}"}
def check_anthropic_compatible(api_key: str, endpoint: str, name: str) -> dict:
"""POST empty messages to an Anthropic-compatible endpoint to validate key."""
with httpx.Client(timeout=TIMEOUT) as client:
r = client.post(
endpoint,
headers={
"x-api-key": api_key,
"anthropic-version": "2023-06-01",
"Content-Type": "application/json",
},
json={"model": "kimi-k2.5", "max_tokens": 1, "messages": []},
)
if r.status_code in (200, 400, 429):
return {"valid": True, "message": f"{name} API key valid"}
if r.status_code == 401:
return {"valid": False, "message": f"Invalid {name} API key"}
if r.status_code == 403:
return {"valid": False, "message": f"{name} API key lacks permissions"}
return {"valid": False, "message": f"{name} API returned status {r.status_code}"}
def check_gemini(api_key: str, **_: str) -> dict:
"""List models with query param auth."""
with httpx.Client(timeout=TIMEOUT) as client:
@@ -82,8 +129,11 @@ PROVIDERS = {
"cerebras": lambda key, **kw: check_openai_compatible(
key, "https://api.cerebras.ai/v1/models", "Cerebras"
),
"minimax": lambda key, **kw: check_openai_compatible(
key, "https://api.minimax.io/v1/models", "MiniMax"
"minimax": lambda key, **kw: check_minimax(key),
# Kimi For Coding uses an Anthropic-compatible endpoint; check via /v1/messages
# with empty messages (same as check_anthropic, triggers 400 not 401).
"kimi": lambda key, **kw: check_anthropic_compatible(
key, "https://api.kimi.com/coding/v1/messages", "Kimi"
),
}
@@ -105,12 +155,17 @@ def main() -> None:
api_base = sys.argv[3] if len(sys.argv) > 3 else ""
try:
if api_base:
if api_base and provider_id == "minimax":
result = check_minimax(api_key, api_base)
elif api_base and provider_id == "kimi":
# Kimi uses an Anthropic-compatible endpoint; check via /v1/messages
result = check_anthropic_compatible(
api_key, api_base.rstrip("/") + "/v1/messages", "Kimi"
)
elif api_base:
# Custom API base (ZAI or other OpenAI-compatible)
endpoint = api_base.rstrip("/") + "/models"
name = {"zai": "ZAI", "minimax": "MiniMax"}.get(
provider_id, "Custom provider"
)
name = {"zai": "ZAI"}.get(provider_id, "Custom provider")
result = check_openai_compatible(api_key, endpoint, name)
elif provider_id in PROVIDERS:
result = PROVIDERS[provider_id](api_key)
+65 -39
View File
@@ -1,7 +1,7 @@
#!/usr/bin/env python
"""Debug tool to print the queen's running phase prompt."""
"""Debug tool to print the queen's phase-specific prompts."""
from framework.agents.hive_coder.nodes import (
from framework.agents.queen.nodes import (
_appendices,
_queen_behavior_always,
_queen_behavior_running,
@@ -10,32 +10,36 @@ from framework.agents.hive_coder.nodes import (
_queen_tools_running,
)
_DEFAULT_WORKER_IDENTITY = (
"\n\n# Worker Profile\n"
"No worker agent loaded. You are operating independently.\n"
"Handle all tasks directly using your coding tools."
)
def print_running_prompt(worker_identity: str | None = None) -> None:
"""Print the composed running phase prompt.
Args:
worker_identity: Optional worker identity string. If None, shows
the "no worker loaded" placeholder.
"""
if worker_identity is None:
worker_identity = (
"\n\n# Worker Profile\n"
"No worker agent loaded. You are operating independently.\n"
"Handle all tasks directly using your coding tools."
)
def print_planning_prompt(worker_identity: str | None = None) -> None:
"""Print the composed planning phase prompt."""
from framework.agents.queen.nodes import (
_planning_knowledge,
_queen_behavior_planning,
_queen_identity_planning,
_queen_tools_planning,
)
wi = worker_identity or _DEFAULT_WORKER_IDENTITY
prompt = (
_queen_identity_running
_queen_identity_planning
+ _queen_style
+ _queen_tools_running
+ _queen_tools_planning
+ _queen_behavior_always
+ _queen_behavior_running
+ worker_identity
+ _queen_behavior_planning
+ _planning_knowledge
+ wi
)
print("=" * 80)
print("QUEEN RUNNING PHASE PROMPT")
print("QUEEN PLANNING PHASE PROMPT")
print("=" * 80)
print(prompt)
print("=" * 80)
@@ -44,20 +48,16 @@ def print_running_prompt(worker_identity: str | None = None) -> None:
def print_building_prompt(worker_identity: str | None = None) -> None:
"""Print the composed building phase prompt."""
from framework.agents.hive_coder.nodes import (
_agent_builder_knowledge,
from framework.agents.queen.nodes import (
_building_knowledge,
_gcu_building_section,
_queen_behavior_building,
_queen_identity_building,
_queen_phase_7,
_queen_tools_building,
)
if worker_identity is None:
worker_identity = (
"\n\n# Worker Profile\n"
"No worker agent loaded. You are operating independently.\n"
"Handle all tasks directly using your coding tools."
)
wi = worker_identity or _DEFAULT_WORKER_IDENTITY
prompt = (
_queen_identity_building
@@ -65,10 +65,11 @@ def print_building_prompt(worker_identity: str | None = None) -> None:
+ _queen_tools_building
+ _queen_behavior_always
+ _queen_behavior_building
+ _agent_builder_knowledge
+ _building_knowledge
+ _gcu_building_section
+ _queen_phase_7
+ _appendices
+ worker_identity
+ wi
)
print("=" * 80)
@@ -81,18 +82,13 @@ def print_building_prompt(worker_identity: str | None = None) -> None:
def print_staging_prompt(worker_identity: str | None = None) -> None:
"""Print the composed staging phase prompt."""
from framework.agents.hive_coder.nodes import (
from framework.agents.queen.nodes import (
_queen_behavior_staging,
_queen_identity_staging,
_queen_tools_staging,
)
if worker_identity is None:
worker_identity = (
"\n\n# Worker Profile\n"
"No worker agent loaded. You are operating independently.\n"
"Handle all tasks directly using your coding tools."
)
wi = worker_identity or _DEFAULT_WORKER_IDENTITY
prompt = (
_queen_identity_staging
@@ -100,7 +96,7 @@ def print_staging_prompt(worker_identity: str | None = None) -> None:
+ _queen_tools_staging
+ _queen_behavior_always
+ _queen_behavior_staging
+ worker_identity
+ wi
)
print("=" * 80)
@@ -111,17 +107,47 @@ def print_staging_prompt(worker_identity: str | None = None) -> None:
print(f"\nTotal length: {len(prompt):,} characters")
def print_running_prompt(worker_identity: str | None = None) -> None:
"""Print the composed running phase prompt.
Args:
worker_identity: Optional worker identity string. If None, shows
the "no worker loaded" placeholder.
"""
wi = worker_identity or _DEFAULT_WORKER_IDENTITY
prompt = (
_queen_identity_running
+ _queen_style
+ _queen_tools_running
+ _queen_behavior_always
+ _queen_behavior_running
+ wi
)
print("=" * 80)
print("QUEEN RUNNING PHASE PROMPT")
print("=" * 80)
print(prompt)
print("=" * 80)
print(f"\nTotal length: {len(prompt):,} characters")
if __name__ == "__main__":
import sys
phase = sys.argv[1] if len(sys.argv) > 1 else "running"
phase = sys.argv[1] if len(sys.argv) > 1 else "planning"
if phase == "all":
print_planning_prompt()
print("\n\n")
print_building_prompt()
print("\n\n")
print_staging_prompt()
print("\n\n")
print_running_prompt()
elif phase == "planning":
print_planning_prompt()
elif phase == "building":
print_building_prompt()
elif phase == "staging":
@@ -131,6 +157,6 @@ if __name__ == "__main__":
else:
print(f"Unknown phase: {phase}")
print(
"Usage: uv run scripts/debug_queen_prompt.py [building|staging|running|all]"
"Usage: uv run scripts/debug_queen_prompt.py [planning|building|staging|running|all]"
)
sys.exit(1)
+2 -2
View File
@@ -1,4 +1,4 @@
"""Quick test script for initialize_agent_package."""
"""Quick test script for initialize_and_build_agent."""
import sys
import os
@@ -14,6 +14,6 @@ import tools.coder_tools_server as srv
srv.PROJECT_ROOT = os.path.abspath(os.path.join(os.path.dirname(__file__), ".."))
# Access the underlying function (FastMCP wraps it as FunctionTool)
tool = srv.initialize_agent_package
tool = srv.initialize_and_build_agent
result = tool.fn("richard_test2", nodes="intake,process,review")
print(result)
+83 -8
View File
@@ -3,7 +3,7 @@
Coder Tools MCP Server OpenCode-inspired coding tools.
Provides rich file I/O, fuzzy-match editing, git snapshots, and shell execution
for the hive_coder agent. Modeled after opencode's tool architecture.
for the queen agent. Modeled after opencode's tool architecture.
All paths scoped to a configurable project root for safety.
@@ -1252,6 +1252,53 @@ def validate_agent_package(agent_name: str) -> str:
path_parts.append(pythonpath)
env["PYTHONPATH"] = os.pathsep.join(path_parts)
# Step 0: Module contract — __init__.py must expose goal, nodes, edges
try:
_contract_script = textwrap.dedent("""\
import importlib, json
mod = importlib.import_module('{agent_name}')
missing = [a for a in ('goal', 'nodes', 'edges') if getattr(mod, a, None) is None]
if missing:
print(json.dumps({{
'valid': False,
'error': (
"Module '{agent_name}' is missing module-level attributes: "
+ ", ".join(missing) + ". "
"Fix: in {agent_name}/__init__.py, add "
"'from .agent import " + ", ".join(missing) + "' "
"so that 'import {agent_name}' exposes them at package level."
)
}}))
else:
print(json.dumps({{'valid': True}}))
""").format(agent_name=agent_name)
proc = subprocess.run(
["uv", "run", "python", "-c", _contract_script],
capture_output=True,
text=True,
timeout=30,
env=env,
cwd=PROJECT_ROOT,
stdin=subprocess.DEVNULL,
)
if proc.returncode == 0:
result = json.loads(proc.stdout.strip())
steps["module_contract"] = {
"passed": result["valid"],
"output": result.get("error", "goal, nodes, edges exported correctly"),
}
else:
steps["module_contract"] = {
"passed": False,
"error": (
f"Failed to import '{agent_name}': {proc.stderr.strip()[:1000]}. "
f"Fix: ensure {agent_name}/__init__.py exists and can be imported "
f"without errors (check syntax, missing dependencies, relative imports)."
),
}
except Exception as e:
steps["module_contract"] = {"passed": False, "error": str(e)}
# Step A: Class validation (subprocess for import isolation)
try:
proc = subprocess.run(
@@ -1321,9 +1368,11 @@ def validate_agent_package(agent_name: str) -> str:
result = json.loads(proc.stdout.strip())
steps["node_completeness"] = {
"passed": result["valid"],
"output": "; ".join(result["errors"])
if result["errors"]
else "All defined nodes are in the graph",
"output": (
"; ".join(result["errors"])
if result["errors"]
else "All defined nodes are in the graph"
),
}
if not result["valid"]:
steps["node_completeness"]["errors"] = result["errors"]
@@ -1434,7 +1483,7 @@ def _node_var_name(node_id: str) -> str:
@mcp.tool()
def initialize_agent_package(agent_name: str, nodes: str | None = None) -> str:
def initialize_and_build_agent(agent_name: str, nodes: str | None = None) -> str:
"""Scaffold a new agent package with placeholder files.
Creates exports/{agent_name}/ with all files needed for a runnable agent:
@@ -1985,6 +2034,9 @@ def runner_loaded():
''',
)
# Build list of all generated file paths for the caller.
all_file_paths = [info["path"] for info in files_written.values()]
return json.dumps(
{
"success": True,
@@ -1994,10 +2046,33 @@ def runner_loaded():
"nodes": node_list,
"files_written": files_written,
"file_count": len(files_written),
"files": all_file_paths,
"next_steps": [
f"Customize node definitions in exports/{agent_name}/nodes/__init__.py",
f"Define goal and edges in exports/{agent_name}/agent.py",
f'Run validate_agent_package("{agent_name}") to check structure',
(
"IMPORTANT: All generated files are structurally complete "
"with correct imports, class definition, validate() method, "
"and __init__.py exports. Use edit_file to customize TODO "
"placeholders — do NOT use write_file to rewrite entire files, "
"as this will break imports and structure."
),
(
f"Use edit_file to customize system prompts, tools, "
f"input_keys, output_keys, and success_criteria in "
f"exports/{agent_name}/nodes/__init__.py"
),
(
f"Use edit_file to customize goal description, "
f"success_criteria values, constraint values, edge "
f"definitions, and identity_prompt in "
f"exports/{agent_name}/agent.py"
),
(
"Do NOT modify: imports at top of agent.py, the class "
"definition, validate() method, _build_graph()/_setup()/"
"lifecycle methods, or __init__.py exports — they are "
"already correct."
),
f'Run validate_agent_package("{agent_name}") to verify structure',
],
},
indent=2,
@@ -17,6 +17,9 @@ AIRTABLE_CREDENTIALS = {
"airtable_update_records",
"airtable_list_bases",
"airtable_get_base_schema",
"airtable_delete_records",
"airtable_search_records",
"airtable_list_collaborators",
],
required=True,
startup_required=False,
@@ -14,6 +14,9 @@ APOLLO_CREDENTIALS = {
"apollo_enrich_company",
"apollo_search_people",
"apollo_search_companies",
"apollo_get_person_activities",
"apollo_list_email_accounts",
"apollo_bulk_enrich_people",
],
required=True,
startup_required=False,
@@ -16,6 +16,9 @@ ASANA_CREDENTIALS = {
"asana_get_task",
"asana_create_task",
"asana_search_tasks",
"asana_update_task",
"asana_add_comment",
"asana_create_subtask",
],
required=True,
startup_required=False,
@@ -16,6 +16,9 @@ AWS_S3_CREDENTIALS = {
"s3_get_object",
"s3_put_object",
"s3_delete_object",
"s3_copy_object",
"s3_get_object_metadata",
"s3_generate_presigned_url",
],
required=True,
startup_required=False,
@@ -42,6 +45,9 @@ AWS_S3_CREDENTIALS = {
"s3_get_object",
"s3_put_object",
"s3_delete_object",
"s3_copy_object",
"s3_get_object_metadata",
"s3_generate_presigned_url",
],
required=True,
startup_required=False,
@@ -15,6 +15,9 @@ BREVO_CREDENTIALS = {
"brevo_get_contact",
"brevo_update_contact",
"brevo_get_email_stats",
"brevo_list_contacts",
"brevo_delete_contact",
"brevo_list_email_campaigns",
],
required=True,
startup_required=False,
@@ -16,6 +16,9 @@ CALENDLY_CREDENTIALS = {
"calendly_list_scheduled_events",
"calendly_get_scheduled_event",
"calendly_list_invitees",
"calendly_cancel_event",
"calendly_list_webhooks",
"calendly_get_event_type",
],
required=True,
startup_required=False,
@@ -16,6 +16,9 @@ CLOUDINARY_CREDENTIALS = {
"cloudinary_get_resource",
"cloudinary_delete_resource",
"cloudinary_search",
"cloudinary_get_usage",
"cloudinary_rename_resource",
"cloudinary_add_tag",
],
required=True,
startup_required=False,
@@ -41,6 +44,9 @@ CLOUDINARY_CREDENTIALS = {
"cloudinary_get_resource",
"cloudinary_delete_resource",
"cloudinary_search",
"cloudinary_get_usage",
"cloudinary_rename_resource",
"cloudinary_add_tag",
],
required=True,
startup_required=False,
@@ -60,6 +66,9 @@ CLOUDINARY_CREDENTIALS = {
"cloudinary_get_resource",
"cloudinary_delete_resource",
"cloudinary_search",
"cloudinary_get_usage",
"cloudinary_rename_resource",
"cloudinary_add_tag",
],
required=True,
startup_required=False,

Some files were not shown because too many files have changed in this diff Show More