Files
hive/docs/architecture/agent-usage-and-status-tracking.md
T
2026-05-04 17:55:44 -07:00

332 lines
28 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Agent Usage & Status Tracking — Capability Document
**Audience:** Lead software architect (paired with frontend + business requirement docs)
**Scope:** Queen agent first (default local-runtime entry point), then downstream colonies/workers
**Status:** Capability inventory + proposal — no implementation commitments
**Date:** 2026-05-04
---
## 1. Why this document exists
We have a business need to track **agent usage** (what was consumed: tokens, cost, runtime, calls) and **agent status** (what state agents are in: alive, phase, progress, blocked) starting from the Queen agent, and surface this **on the cloud** for product and business consumers. This document inventories the capabilities the runtime can expose **today** vs. what is **net-new**, so architecture can pick a scope before frontend and product write against it.
The Queen is the right anchor: every local-runtime session today starts with a Queen, and every colony/worker is forked from a Queen call — so a tracking surface rooted at the Queen automatically covers the whole agent tree.
> **Headline constraint for the architect:** the runtime is **local-by-default**. Every byte described in §3 — events, runtime logs, progress DBs, session state, LLM cost numbers — is written to the user's machine under `~/.hive/` (or the platform-specific Electron `userData` directory). **Nothing is shipped to the cloud today.** The business ask therefore implies a new local→cloud transport boundary, with the data-residency, privacy, and identity decisions that come with it. §4.5 makes the gap explicit per-surface; §8 lists the risks; §9 frames the cloud cut-over as the gating decision for any "Slice 2+" work.
---
## 2. Vocabulary — what we actually mean
| Term | Definition in this codebase |
|---|---|
| **Session** | One Queen runtime instance. ID: `session_{YYYYMMDD_HHMMSS}_{uuid8}`. Persisted at `~/.hive/sessions/{session_id}/`. |
| **Queen** | Long-lived conversational agent. One per session. Single event-loop node. Phases: `independent → incubating → working → reviewing`. See [queen/nodes/__init__.py:494-518](../../core/framework/agents/queen/nodes/__init__.py#L494-L518). |
| **Colony** | Persistent stateless container forked by `create_colony`. Has its own SQLite progress DB at `~/.hive/colonies/{colony_name}/data/progress.db`. |
| **Worker** | Ephemeral agent running inside a colony to execute a task. |
| **Run / execution** | One trigger-to-completion invocation inside a node. Carries `run_id`, `execution_id`, `trace_id` (OTel-aligned). |
| **Usage** | Quantitative consumption: input/output/cached tokens, USD cost, wall-clock latency, tool-call counts. |
| **Status** | Qualitative state: phase, alive/stalled, current task, blocked-on, queue depth, last-heartbeat. |
---
## 3. What exists today (capabilities, not commitments)
The runtime is already heavily instrumented. Most of what business wants is **already emitted** — the gap is persistence, aggregation, and a stable API surface.
### 3.1 Event Bus — the spine
[core/framework/host/event_bus.py:61-177](../../core/framework/host/event_bus.py#L61-L177) defines an in-process async pub/sub with **40+ event types** scoped by `stream_id`, `session_id`, `colony_id`, `execution_id`, `run_id`, `correlation_id`, `timestamp`.
Relevant for usage/status:
- **Lifecycle:** `EXECUTION_STARTED/COMPLETED/FAILED/PAUSED/RESUMED/RESURRECTED`
- **Queen:** `QUEEN_PHASE_CHANGED`, `QUEEN_IDENTITY_SELECTED`
- **Colony/Worker:** `COLONY_CREATED`, `WORKER_COLONY_LOADED`, `WORKER_COMPLETED`, `WORKER_FAILED`, `SUBAGENT_REPORT`
- **LLM:** `LLM_TURN_COMPLETE`, `LLM_TEXT_DELTA`, `LLM_REASONING_DELTA`, `CONTEXT_USAGE_UPDATED`
- **Tools:** `TOOL_CALL_STARTED`, `TOOL_CALL_COMPLETED`, `TOOL_CALL_REPLAY_DETECTED`
- **Health:** `NODE_STALLED`, `NODE_TOOL_DOOM_LOOP`, `STREAM_TTFT_EXCEEDED`, `STREAM_INACTIVE`, `STREAM_NUDGE_SENT`
- **Tasks (right-rail panel):** `TASK_CREATED`, `TASK_UPDATED`, `TASK_DELETED`, `TASK_LIST_RESET`
- **Triggers:** `TRIGGER_AVAILABLE/ACTIVATED/DEACTIVATED/FIRED/REMOVED/UPDATED`
> Persistence today: in-memory only, **plus** optional JSONL export when `HIVE_DEBUG_EVENTS=1` ([event_bus.py:33-54](../../core/framework/host/event_bus.py#L33-L54)). There is no SQL events table.
### 3.2 Three-level runtime logs (per session)
[core/framework/tracker/runtime_log_schemas.py](../../core/framework/tracker/runtime_log_schemas.py) defines:
| Level | Schema | File | Granularity |
|---|---|---|---|
| L1 | `RunSummaryLog` | `summary.json` | Per graph run — totals + execution_quality + trace_id |
| L2 | `NodeDetail` | `details.jsonl` | Per node — exit_status, input/output tokens, latency_ms, retry/accept/escalate/continue counts |
| L3 | `NodeStepLog` | `tool_logs.jsonl` | Per LLM step — tool calls, verdicts, error traces, latency_ms |
Storage: `~/.hive/sessions/{session_id}/logs/` ([runtime_log_store.py](../../core/framework/tracker/runtime_log_store.py)). Schemas already carry OTel fields (`trace_id`, `span_id`, `parent_span_id`) — wire-ready, not yet exported.
### 3.3 LLM call accounting
[core/framework/llm/provider.py:11-32](../../core/framework/llm/provider.py#L11-L32) — `LLMResponse` carries: `model`, `input_tokens`, `output_tokens`, `cached_tokens`, `cache_creation_tokens`, `cost_usd`, `stop_reason`. Cost is computed from [model_catalog.py](../../core/framework/llm/model_catalog.py) when the model is priced; otherwise `0.0`.
> Gap: cost lives in the response object and is rolled into L2/L3 logs, but is **not** in the event bus stream and **not** in any aggregate query surface.
### 3.4 Colony Progress DB
[core/framework/host/progress_db.py:44-110](../../core/framework/host/progress_db.py#L44-L110) — per-colony SQLite (WAL mode):
- `tasks` (id, seq, priority, goal, status: pending|claimed|started|completed|failed, worker_id, claimed_at, started_at, completed_at, retry_count, last_error)
- `steps`, `sop_checklist`, `colony_meta`
This is the closest thing we have to a status SQL store today, but it is **per-colony** and **task-shaped** — not session-shaped or usage-shaped.
### 3.5 Queen task system (right-rail panel)
The mechanism the IDE-selection prompt describes is real: each `task_update` emits `TASK_UPDATED` on the bus, which a future SSE/WS surface can stream. State transitions: `pending → in_progress → completed`. Task body carries `subject`, `active_form`, `blocks`, `blocked_by`, `metadata`. Source: [tasks/events.py:52-159](../../core/framework/tasks/events.py#L52-L159).
### 3.6 HTTP surface (already shipping)
[core/framework/server/routes_sessions.py](../../core/framework/server/routes_sessions.py):
- `POST /api/sessions` — create
- `GET /api/sessions/{session_id}` — current state including `queen_phase`, `queen_model`, `colony_id`, `uptime_seconds`
- `GET /api/sessions/{session_id}/stats` — runtime statistics (extension point)
- `GET /api/sessions/{session_id}/events/history` — replay persisted events
SSE primitive exists at [server/sse.py](../../core/framework/server/sse.py) but is **not yet wired to a global event-stream route**. This is the natural attach point for a real-time status feed.
### 3.7 Worker health snapshot
`get_worker_health_summary()` ([worker_monitoring_tools.py:71-99](../../core/framework/tools/worker_monitoring_tools.py#L71-L99)) returns: `session_id`, `session_status`, `total_steps`, `recent_verdicts`, `stall_minutes`, `evidence_snippet`. Used today by Queen during the WORKING phase; can be exposed via API.
---
## 3.8 Where every byte lives today (data residency map)
Every storage location below is **on the end-user's machine**. There is no cloud sink, no telemetry endpoint, no managed database, no analytics service. The HTTP server in [core/framework/server/](../../core/framework/server/) binds to localhost for the desktop UI; it is not a cloud API.
`HIVE_HOME` defaults to `~/.hive/` and is overridden by the desktop shell to the platform `userData` dir (e.g. `~/Library/Application Support/Hive/` on macOS, `%APPDATA%\Hive\` on Windows). Source: [config.py:20-44](../../core/framework/config.py#L20-L44).
| Data | On-disk location (per machine) | Format | Lifetime | Currently shipped off-device? |
|---|---|---|---|---|
| Event bus stream | in-process memory only | Python objects | Process lifetime | No |
| Event debug log (opt-in) | `HIVE_HOME/event_logs/<ts>.jsonl` when `HIVE_DEBUG_EVENTS=1` | JSONL | Until user deletes | No |
| Session state | `HIVE_HOME/sessions/{session_id}/state.json` | JSON | Until user deletes | No |
| Conversations | `HIVE_HOME/sessions/{session_id}/conversations/` | JSON | Until user deletes | No |
| Artifacts | `HIVE_HOME/sessions/{session_id}/artifacts/` | mixed | Until user deletes | No |
| L1 run summary (tokens, cost, quality) | `HIVE_HOME/sessions/{session_id}/logs/summary.json` | JSON | Until user deletes | No |
| L2 node details | `HIVE_HOME/sessions/{session_id}/logs/details.jsonl` | JSONL | Until user deletes | No |
| L3 step / tool logs | `HIVE_HOME/sessions/{session_id}/logs/tool_logs.jsonl` | JSONL | Until user deletes | No |
| Colony task / step / SOP state | `HIVE_HOME/colonies/{colony_name}/data/progress.db` | SQLite (WAL) | Until user deletes | No |
| Queen / colony / skill / memory configs | `HIVE_HOME/{queens,colonies,skills,memories}/` | files | Until user deletes | No |
| LLM `cost_usd` numbers | computed in-process from [model_catalog.py](../../core/framework/llm/model_catalog.py), then written into L1/L2/L3 logs above | — | Same as logs | No |
**What this means for the cloud requirement:** the question for the architect is not "where do we get the data" — the data is fully captured. The question is **"how does it leave the machine, in what shape, with whose consent, and where does it land."** That decision is upstream of every endpoint in §6 and every storage option in §5.
Three architectural shapes worth considering (architect to choose):
- **Shape A — On-device only, queried over LAN/tunnel.** Cloud product reaches into the runtime via an authenticated tunnel; no data is replicated. Strongest privacy. Hardest for cross-device rollups.
- **Shape B — Outbox push.** Runtime keeps the local store as source of truth and asynchronously pushes a redacted, billing-grade subset (no prompts, no tool args by default) to a cloud aggregate. Best fit for the typical "agent status dashboard + usage rollup" product.
- **Shape C — Cloud-first runtime.** Runtime writes events directly to a cloud bus and treats local files as a cache. Largest rewrite; not recommended for a desktop-first product.
Shape B is the lowest-friction path to the stated business outcome. The rest of this document is written with Shape B as the default and calls out where Shape A or C would change things.
---
## 4. Capability matrix — what we can offer
Each row is a candidate frontend/business surface, scored by feasibility from current state.
| # | Capability | Status | Backed by |
|---|---|---|---|
| **Status** | | | |
| S1 | Queen phase indicator (independent/incubating/working/reviewing) | **Ready** | `QUEEN_PHASE_CHANGED` event + session detail field |
| S2 | Per-task progress (right-rail) | **Ready** | `TASK_*` events |
| S3 | Live LLM streaming indicator (typing, thinking, tool-calling) | **Ready** | `LLM_TEXT_DELTA`, `LLM_REASONING_DELTA`, `TOOL_CALL_STARTED/COMPLETED` |
| S4 | Stall / stuck-agent detection | **Ready** | `NODE_STALLED`, `STREAM_INACTIVE`, `NODE_TOOL_DOOM_LOOP` |
| S5 | Colony tree (Queen → colonies → workers) snapshot | **Partial** — data exists in session/colony stores; need a join query |
| S6 | Worker health roll-up across colonies | **Partial** — per-worker tool exists; needs aggregation route |
| S7 | Liveness heartbeat ("agent X last seen Y ago") | **Net-new** — must derive from event timestamps or add a periodic ping |
| S8 | Trigger schedule (when will Queen wake next) | **Ready** | `TRIGGER_*` events |
| **Usage** | | | |
| U1 | Tokens per session (input/output/cached) | **Partial** — captured per-step in L3, summed in L1; no API |
| U2 | USD cost per session / colony / model | **Partial**`cost_usd` per LLM call in logs; no rollup |
| U3 | Tool-call counts and types | **Partial** — events exist; no aggregate |
| U4 | Wall-clock runtime and active-time per agent | **Partial** — derivable from `EXECUTION_STARTED/COMPLETED` |
| U5 | Cost attribution per Queen-spawned colony | **Partial**`colony_id` is on every event; needs a query |
| U6 | Per-user / per-tenant aggregation | **Net-new** — there is no user/tenant identity in events today |
| U7 | Daily / monthly usage rollups for billing | **Net-new** — requires persistent event store |
| U8 | Quota / cap enforcement (block when over budget) | **Net-new** — requires real-time meter + policy hook |
**Read of the matrix:** ~70% of "status" surfaces are **shipping-grade today** behind a thin local API. ~70% of "usage" surfaces need a **persistence + aggregation layer**. The events themselves are not the bottleneck.
**Local vs. cloud read of the same matrix.** Every "Ready" / "Partial" cell above is *ready in-process on the local machine*. Making each row visible to a **cloud** consumer adds an additional step:
| Capability class | Local (today / near-term) | Cloud (business ask) |
|---|---|---|
| Live status (S1S4, S8) | Stream from in-process event bus over local SSE | Push events through outbox → cloud relay → cloud SSE/WS to product UI. |
| Tree / health (S5, S6) | Join local session + colony stores | Same join, but on cloud-side replica of session/colony index. |
| Liveness (S7) | Derive from local event timestamps | Requires the runtime to *post* a heartbeat; cloud cannot infer aliveness from absence. |
| Per-session usage (U1U5) | Read L1/L2/L3 logs on disk | Outbox sends durable rows (no deltas) to cloud usage table. |
| Tenant rollups (U6U7) | Not possible — no identity in events | Cloud-side aggregation keyed on session→user join, identity attached at outbox time. |
| Quotas (U8) | Local meter feasible, but pointless without cloud truth | Cloud is the meter of record; runtime calls home to check. |
---
## 5. Proposed data model (architect to validate)
Three new persisted entities, plus reuse of existing event types:
```
AgentSession UsageEvent StatusSnapshot
----------- ----------- ---------------
session_id (PK) id (PK) session_id (FK)
queen_id session_id (FK) taken_at
queen_model colony_id phase
started_at worker_id active_run_id
ended_at agent_role (queen|worker) active_node
status (active|done|failed) event_type (LLM|TOOL|...) open_task_count
user_id (when multi-tenant) model in_flight_workers
tenant_id (when multi-tenant) input_tokens last_event_at
total_input_tokens output_tokens stall_score
total_output_tokens cached_tokens
total_cached_tokens cost_usd
total_cost_usd latency_ms
total_tool_calls tool_name (nullable)
last_event_at occurred_at
trace_id
execution_id
```
Storage choice (architect call). **All three options today are local; only Option C reaches the cloud business surface.**
- **Option A — local SQLite outbox** at `HIVE_HOME/runtime.db`. Pros: zero infra, fits desktop, makes local queries cheap. Cons: per-host; no cross-device aggregation; **does not satisfy the cloud requirement on its own.**
- **Option B — DuckDB on the existing JSONL event logs.** Pros: zero ingestion code; analyst-friendly. Cons: cold-start latency on big histories; **also local-only.**
- **Option C — push events to a managed cloud store** (Postgres, ClickHouse, BigQuery) via an outbox pattern. Pros: cross-host rollups, billing-grade, the only option that actually delivers the cloud-visible status/metrics product. Cons: introduces a new transport, identity, and privacy/redaction story; needs explicit user opt-in for desktop builds.
The realistic shape is the hybrid called out in §3.8 Shape B: **A locally** as the durable buffer and source of truth, **C in the cloud** as the business-facing aggregate, with a one-way outbox that moves a *redacted, durable-event-only* subset over the wire. This document recommends that hybrid; everything in §6 and §7 is written against it.
---
## 6. Surface API — what frontend would consume
All routes assume the event-bus → SSE bridge exists (the one missing wire — see §3.6). Frontend sees this from day one.
> **Locality note.** The `/api/...` routes below are served by the **local runtime HTTP server** today. For the cloud product, the same shapes need a cloud-side counterpart fed by the outbox. Two practical patterns: (1) cloud product calls cloud-hosted versions of these routes (against the aggregate), or (2) cloud product proxies authenticated requests back to the user's runtime. §3.8 Shape A vs. Shape B picks between them.
### Real-time channel
```
GET /api/sessions/{session_id}/events/stream (SSE)
↳ filter=phase,task,llm_stream,tool,worker,trigger,health
GET /api/agents/queen/stream (SSE) — global queen events
```
### Status reads
```
GET /api/sessions/{session_id} — already shipping
GET /api/sessions/{session_id}/tree — Queen → colonies → workers
GET /api/sessions/{session_id}/health — stall_score, last_event_at, in_flight
GET /api/colonies/{colony_id}/workers — health roll-up
```
### Usage reads
```
GET /api/sessions/{session_id}/usage — tokens, cost, latency, tool-calls
GET /api/sessions/{session_id}/usage/by-model — split by model
GET /api/colonies/{colony_id}/usage — same shape, colony scope
GET /api/agents/queen/usage?range=...&group_by=... — rollup view (billing)
```
### Admin / business
```
GET /api/usage/rollup?range=...&group_by=user|tenant|model|colony
POST /api/quotas/{tenant} — set caps (if quota work in scope)
```
---
## 7. Net-new work — sized in shirt-size, not days
| Workstream | Local / Cloud | Size | Depends on | Notes |
|---|---|---|---|---|
| Event-bus → local SSE bridge ([sse.py](../../core/framework/server/sse.py) exists, route does not) | Local | **S** | — | Unlocks all real-time status surfaces in the desktop UI. Highest leverage piece. |
| Persisted local event store (SQLite outbox) | Local | **M** | Decision §5 | One writer, append-only; reuse existing JSONL writer. Source of truth for cloud push. |
| Local aggregation queries + `/usage` endpoints | Local | **M** | Persisted store | Per-session usage on disk. |
| **Outbox transport (local → cloud)** | **Boundary** | **ML** | Local store + auth | New work: durable queue, retry, redaction policy, opt-in switch, schema versioning. This is the bridge to the cloud product. |
| **Cloud event ingest + aggregate store** | **Cloud** | **L** | Outbox transport | New cloud infra (Postgres/ClickHouse/BigQuery). Hosting, ops, retention policy, access controls. |
| **Cloud-side status/usage API + dashboards** | **Cloud** | **M** | Cloud aggregate | Mirrors §6 endpoints against the cloud store; this is what business users actually see. |
| Identity layer (user_id / tenant_id on events) | Boundary | **M** | Auth model | Currently no user identity in events. Identity attaches at outbox time, not at emit time. |
| OpenTelemetry exporter (schema is ready) | Boundary | **SM** | — | `trace_id`/`span_id` already populated; an OTel collector can be the cloud sink instead of a custom outbox. |
| Quota / policy hooks | Cloud-authoritative | **L** | Cloud store + identity | Cloud holds the meter; runtime calls home synchronously on a critical path. |
| Liveness/heartbeat (S7) | Local emit, cloud consume | **S** | Outbox | Runtime must actively post; cloud cannot infer liveness from absence. |
| Cost attribution UI rollups | Cloud | **S** | `/usage` cloud endpoints | Shared with frontend doc. |
**Critical path for first frontend release (local desktop UI):** SSE bridge → status endpoints (S1S5) → per-session usage endpoint (U1, U2). Everything else is incremental.
**Critical path for first cloud release (business ask):** local event store → outbox transport with redaction + opt-in → cloud ingest → cloud `/usage` and `/status` endpoints. The local UI work above is *not* a prerequisite for the cloud cut, but most of the local-side primitives (event store, durable-event filtering) are shared, so doing them in order minimizes rework.
---
## 8. Risks and tradeoffs the architect should weigh
1. **Event volume.** `LLM_TEXT_DELTA` fires per token. A persisted store must filter — don't write deltas, write `LLM_TURN_COMPLETE`. This is the #1 way the table blows up.
2. **Privacy / desktop posture — the central architectural constraint.** The runtime is local by default ([config.py:20-44](../../core/framework/config.py#L20-L44)). The data inventory in §3.8 confirms that **no data leaves the user's machine today**, including the data the business ask needs in the cloud. Closing that gap is not "add a metrics push" — it is a new system boundary with: (a) explicit user opt-in (defaults must be safe for OSS / self-hosted users), (b) a documented redaction list (no prompts, no tool args, no file paths in the default payload), (c) schema versioning so cloud aggregates do not break on runtime upgrades, (d) a clear answer for self-hosted / air-gapped deployments where the cloud sink is unreachable, (e) regional data-residency rules if the product is sold internationally. This is the single largest design decision in the document.
3. **Cost-table accuracy.** `cost_usd` is computed from a static catalog. Using it for billing means committing to keeping the catalog current (or pulling from provider invoices). For *display*, the current approach is fine; for *charging*, it is not.
4. **Identity coupling.** Events are session-scoped today. Adding `user_id`/`tenant_id` everywhere is invasive. Recommend pinning identity at the *session* boundary and joining on session at query time, rather than threading identity through every event payload.
5. **Status vs. heartbeat semantics.** "Idle" is not "dead." A Queen sitting in `independent` waiting for a user message is healthy and should not page anyone. The stall-score in §5 must distinguish idle-by-design from stalled-by-bug — the existing `STREAM_INACTIVE` / `NODE_STALLED` events already make this distinction; preserve it.
6. **Backpressure from observability.** If usage tracking sits in the LLM call path (for quotas), it must not add latency. Recommend: meter is async/eventual for display; only quota checks are synchronous, and only when the customer has a quota.
7. **Worker-side gap.** Worker LLM calls are accounted in their own session's L1L3 logs but are *not* automatically rolled into the parent Queen session. Cost attribution from Queen → spawned colony requires either (a) a parent_session_id field on the colony's session row, or (b) walking the `COLONY_CREATED` event graph at query time. (a) is cleaner.
---
## 9. Recommendation
Ship in four thin slices. The first two are local-only and unblock the desktop UI; the last two are what actually deliver the business ask of cloud-visible status and metrics.
1. **Slice 1 — Live local status (1 sprint, fully local).**
SSE bridge + `/sessions/{id}/events/stream` + `/sessions/{id}/health` + `/sessions/{id}/tree`. Frontend (local UI) gets the right-rail and the agent-tree. No persistence work, no cloud. (S1S5, S8.)
2. **Slice 2 — Per-session local usage store (12 sprints, fully local).**
Persisted event store (SQLite outbox at `HIVE_HOME/runtime.db`), filtered to durable event types only. `/sessions/{id}/usage` + `/colonies/{id}/usage`. No identity, no rollups, **no cloud transport yet**. This is the foundation the cloud slice rides on. (U1U5.)
3. **Slice 3 — Local→cloud outbox + cloud ingest (the cloud cut, scope-defining).**
Durable outbox queue, redaction policy, opt-in toggle, identity attachment, schema versioning, retry/backoff. Cloud-side ingest service + aggregate store. **This is where the local-only world becomes a cloud product.** Architect must decide §3.8 Shape, §5 storage, redaction defaults, and identity model before this slice can start.
4. **Slice 4 — Cloud rollups, dashboards, quotas (scope TBD with product).**
Tenant aggregation, daily/monthly rollups, quota enforcement, OTel export, business dashboards. (U6U8.) Defer until business confirms billing model — the answer (per-seat vs. per-token vs. per-colony) changes the data model.
Slices 1 and 2 are mostly **wiring** — the events exist, the schemas exist, the storage paths exist. Slice 3 is the **first slice that introduces a new architectural boundary** (local→cloud transport + identity + privacy contract); everything novel about the business ask lives there. Slice 4 is **business design**, not engineering scope.
---
## 10. Open questions for the architect
The first four are direct consequences of the local-first / cloud-required gap surfaced in §3.8 and §8.2.
1. **Cloud transport shape — Shape A, B, or C from §3.8?** This decision is upstream of the entire data model. Recommend Shape B (outbox push) absent a strong privacy argument for Shape A.
2. **Redaction default for the cloud payload.** What goes (model, token counts, latency, tool names, status) vs. what stays local (prompts, tool arguments, tool results, file paths, conversation content)? Need a written allowlist before Slice 3 starts.
3. **Self-hosted / air-gapped users.** If the cloud sink is unreachable or disabled, what does the runtime do — buffer indefinitely, drop oldest, or refuse to start? Defaults differ for OSS vs. SaaS distributions.
4. **Identity binding point.** Do we attach `user_id` / `tenant_id` at event-emit time (invasive, threads identity through every node), at session-create time (clean, requires session-level auth), or at outbox-flush time (simplest, but loses per-event provenance)? Recommend session-create.
5. Do we need quota *enforcement*, or only quota *visibility* in v1?
6. Frontend doc: are status and usage rendered in the same panel or different surfaces? This affects whether we ship one merged endpoint or two.
7. Are we willing to pay the cost-table maintenance burden, or should "cost" stay labeled as estimated and not be used for invoicing?
---
## Appendix — Pointers
- Queen lifecycle: [core/framework/agents/queen/nodes/__init__.py](../../core/framework/agents/queen/nodes/__init__.py)
- Event bus + types: [core/framework/host/event_bus.py](../../core/framework/host/event_bus.py)
- Runtime log schemas: [core/framework/tracker/runtime_log_schemas.py](../../core/framework/tracker/runtime_log_schemas.py)
- Runtime log store: [core/framework/tracker/runtime_log_store.py](../../core/framework/tracker/runtime_log_store.py)
- LLM accounting: [core/framework/llm/provider.py](../../core/framework/llm/provider.py), [model_catalog.py](../../core/framework/llm/model_catalog.py)
- Colony progress DB: [core/framework/host/progress_db.py](../../core/framework/host/progress_db.py)
- Task events: [core/framework/tasks/events.py](../../core/framework/tasks/events.py)
- Session HTTP: [core/framework/server/routes_sessions.py](../../core/framework/server/routes_sessions.py)
- SSE primitive: [core/framework/server/sse.py](../../core/framework/server/sse.py)
- Worker health: [core/framework/tools/worker_monitoring_tools.py](../../core/framework/tools/worker_monitoring_tools.py)
- Config / env vars: [core/framework/config.py](../../core/framework/config.py)