Some LLM APIs (vLLM, Xinference, Chinese LLM providers) reject multiple
system messages with \”System message must be at the beginning.\” The
subagent executor was sending separate SystemMessages for the configured
system_prompt and each loaded skill, which caused failures when calling
task tool with sub-agents.
Merge system_prompt and all skill content into one SystemMessage in the
initial state, and pass system_prompt=None to create_agent() so the
factory doesn't prepend a second one.
Fixes#2693
* feat(community): add Serper web search provider
Add a new community search provider backed by the Serper Google Search
API (https://serper.dev). Serper returns real-time Google results via a
simple JSON API and requires only an API key — no extra Python package.
Changes:
- backend/packages/harness/deerflow/community/serper/__init__.py
- backend/packages/harness/deerflow/community/serper/tools.py
Implements web_search_tool using httpx (already a project dependency).
API key is read from config.yaml `api_key` field or SERPER_API_KEY env var.
Follows the same interface / output shape as the existing ddg_search provider.
Exposes max_results parameter (default 5) with config override logic.
- backend/tests/test_serper_tools.py
Unit tests covering API key resolution, config overrides, HTTP errors,
empty results, and parameter passing.
- config.example.yaml: add commented-out Serper example alongside other providers
- .env.example: add SERPER_API_KEY placeholder
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Fix the lint error
* Fix the lint error
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix(gateway): return ISO 8601 timestamps from threads endpoints (#2594)
ThreadResponse documents created_at / updated_at as ISO timestamps,
matching the LangGraph Platform schema (langgraph_sdk.schema.Thread
exposes them as datetime, JSON-encoded as ISO 8601). The gateway
threads router was instead emitting str(time.time()) — unix-second
floats — breaking frontend new Date() parsing and producing a mixed
ISO/unix wire format that also corrupted the search sort order.
Centralize timestamp generation in deerflow.utils.time:
- now_iso() — datetime.now(UTC).isoformat()
- coerce_iso(x) — heals legacy unix-timestamp strings on read so the
store converges to ISO without a one-shot migration
threads.py: replace 6 time.time() call sites with now_iso(); wrap all
read paths and Phase-2 checkpoint metadata with coerce_iso(); _store_upsert
opportunistically heals legacy created_at on update; drop unused time import.
thread_runs.py: reuse now_iso() instead of a private duplicate _now_iso(),
preventing future drift between the two timestamp call sites.
Tests: 9 unit tests for the helper; 5 integration tests pinning the ISO
contract for create/get/patch/search and the legacy-healing path on the
internal store upsert. Full suite: 2144 passed, 15 skipped, 0 failed.
Closes#2594
* fix(gateway): coerce checkpoint metadata timestamps to ISO on read
After the merge with main, three additional read paths in ``threads.py``
were still emitting raw ``str(metadata.get("created_at", ""))`` —
``get_thread_state``, ``update_thread_state``, and ``get_thread_history``.
Same root cause as #2594: when the checkpoint metadata's ``created_at``
is a unix-second float (legacy data, or a checkpoint written by an older
Gateway version), ``str(float)`` produces ``"1777252410.411327"`` and the
frontend's ``new Date(...)`` returns ``Invalid Date``. The fix on the
``/threads/{id}`` GET path was already in place; these three sibling
endpoints needed the same treatment.
All four call sites now flow through ``coerce_iso``, so:
- legacy float metadata heals to ISO on the way out,
- ISO metadata passes through unchanged,
- ``datetime`` instances (which the new ``coerce_iso`` branch handles
explicitly) emit with the ``T`` separator instead of falling through
to the space-separated ``str(datetime)`` form.
Coverage added for the two endpoints not already pinned by the merge:
- ``test_get_thread_state_returns_iso_for_legacy_checkpoint_metadata``
- ``test_get_thread_history_returns_iso_for_legacy_checkpoint_metadata``
Both pre-seed a checkpoint whose metadata carries the literal float
from the issue body and assert the wire format is ISO.
* refactor: thread app config through lead prompt
* fix: honor explicit app config across runtime paths
* style: format subagent executor tests
* fix: thread resolved app config and guard subagents-only fallback
Address two PR review findings:
1. _create_summarization_middleware passed the original (possibly None)
app_config into create_chat_model, forcing the model factory back to
ambient get_app_config() and risking config drift between the
middleware's resolved view and the model's view. Pass the resolved
AppConfig instance through end-to-end.
2. get_available_subagent_names accepted Any-typed config and forwarded
it to is_host_bash_allowed, which reads ``.sandbox``. A
SubagentsAppConfig (also accepted upstream as a sum-type input) has
no ``.sandbox`` attribute and would be silently treated as "no
sandbox configured", incorrectly disabling the bash subagent. Guard
on hasattr and fall back to ambient lookup otherwise.
Adds regression tests for both paths.
* chore: simplify hasattr guard and tighten regression tests
- Collapse if/else into ternary in get_available_subagent_names; hasattr(None, ...) is False so the explicit None check was redundant.
- Drop comments that narrate the change rather than explain non-obvious WHY (test names already convey intent).
- Replace stringly-typed sentinel "no-arg" in regression test with direct args tuple comparison.
---------
Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com>
* fix(sandbox): pass no_change_timeout to exec_command to prevent 120s premature termination
The agent_sandbox library's shell API defaults no_change_timeout to 120
seconds. When AioSandbox.execute_command() called exec_command() without
this parameter, commands producing no output for 120s would return with
NO_CHANGE_TIMEOUT status even though the script was still running.
Pass no_change_timeout=600 to all exec_command calls (matching the
client-level HTTP timeout) so long-running commands are not cut short.
Fixes#2668
* test(sandbox): add assertions for no_change_timeout in execute_command and list_dir
Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/2f37bc72-0826-4443-a6ba-e5b78c22fb5a
Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com>
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
* fix(subagents): use model override for tools and middleware
* fix(config): resolve effective subagent model
* fix(subagents): defer app config loading
* fix(subagents): fully defer config.yaml load in executor __init__
The previous attempt only relocated the explicit get_app_config() call,
but left resolve_subagent_model_name(...) running eagerly in __init__.
That helper has its own internal get_app_config() fallback, which still
fired when both app_config and parent_model were None and
config.model == "inherit" — exactly the path unit tests hit, breaking
21 tests in CI with FileNotFoundError: config.yaml.
Skip the eager resolve in __init__ when it would require loading the
config file, and defer to _create_agent (which already has the
app_config or get_app_config() fallback).
* fix(harness): resolve runtime paths from project root
* docs(config): update
* fix(config): address runtime path review feedback
* test(config): fix skills path e2e root
* test(config): cover legacy config fallback when project root lacks config files
Verifies that when DEER_FLOW_PROJECT_ROOT is unset and cwd has no
config.yaml/extensions_config.json, AppConfig and ExtensionsConfig fall back
to the legacy backend/repo-root candidates — the backward-compat path
requested in PR #2642 review.
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix(agents): propagate agent_name into ToolRuntime.context for setup_agent (#2677)
When creating a custom agent via the web UI, SOUL.md was always written
to the global base_dir/SOUL.md instead of agents/<name>/SOUL.md.
Root cause: the bootstrap flow sends agent_name via body.context, but
two layers were broken:
1. services.py only forwarded body.context keys into config["configurable"];
config["context"] was never populated.
2. worker.py constructed the parent Runtime with a hard-coded
{thread_id, run_id} context, ignoring config["context"] entirely.
After the langgraph >= 1.1.9 bump (#98a5b34f), ToolRuntime.context no
longer falls back to configurable, so setup_agent's
runtime.context.get("agent_name") returned None and the tool's silent
agent_name=None -> base_dir fallback kicked in, overwriting the global
SOUL.md.
Fix:
- services.py: extract merge_run_context_overrides() and write the
whitelisted context keys into both configurable (legacy readers) and
context (langgraph 1.1+ ToolRuntime consumers).
- worker.py: extract _build_runtime_context() and merge config["context"]
into the Runtime's context (without letting callers override
thread_id/run_id).
The base_dir fallback in setup_agent_tool.py is left in place because
the IM /bootstrap channel command depends on it. That code path can
be tightened in a follow-up.
Adds regression tests covering both helpers.
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Centralize log level parsing in `logging_level_from_config()` and
application in `apply_logging_level()` within `deerflow.config.app_config`.
- Gateway lifespan applies configured log level on startup
- `debug.py` uses shared helpers instead of local duplicates
- `apply_logging_level()` targets only `deerflow`/`app` logger hierarchies
so third-party library verbosity is not affected; root handler levels
are only lowered (never raised) to allow configured loggers through
without suppressing third-party output; root logger level is not modified
- Config field description updated to clarify scope
- Tests save/restore global logging state to avoid test pollution
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
* fix(memory): replace short-lived asyncio.run() with persistent event loop to prevent zombie httpx connections
The memory updater used asyncio.run() inside daemon threads, creating
and destroying short-lived event loops on every update. Langchain
providers (e.g. langchain-anthropic) cache httpx AsyncClient instances
globally via @lru_cache, so SSL connections created on a loop that is
subsequently destroyed become zombie connections in the shared pool.
When the main agent's lead run later reuses one of these connections,
httpx/anyio triggers RuntimeError: Event loop is closed during
connection cleanup.
Replace the ThreadPoolExecutor + asyncio.run() pattern with a
_MemoryLoopRunner that maintains a single persistent event loop in a
daemon thread for the process lifetime. Since the loop never closes,
connections bound to it never become invalid. The _run_async_update_sync
function now submits coroutines to this persistent loop via
run_coroutine_threadsafe instead of creating throwaway loops.
* update the code to address the review comments
* Fix the review comments of 2615
P1 — user_id forwarded through sync path: Added user_id parameter to _prepare_update_prompt, _finalize_update, and _do_update_memory_sync, and forwarded it to get_memory_data(agent_name, user_id=user_id) and
save(..., user_id=user_id). The update_memory() entry point now passes user_id through both the executor.submit path and the direct call path. Added TestUserIdForwarding with two regression tests (sync + async)
verifying get_memory_data and save receive the correct user_id.
P2 — aupdate_memory() delegates to sync: Replaced the model.ainvoke() call with asyncio.to_thread(self._do_update_memory_sync, ...). This eliminates the unsafe async provider client path entirely — all memory
updater entry points now use the isolated sync model.invoke() path. Updated the test from asserting ainvoke is awaited to asserting invoke is called and ainvoke is not.
Nit — duplicate comment removed: Removed the duplicated # Matches sentences... comment on line 230.
* Chore(test): update the code of test_memory_updater
---------
Co-authored-by: rayhpeng <rayhpeng@gmail.com>
The recent refactor (7bf618de) removed the langgraph upstream and added
individual /api/* prefix locations for models, memory, mcp, skills,
agents, threads, and sandboxes. However, /api/v1/auth/* routes (login,
register, setup-status, etc.) were not covered by any explicit location
block, causing them to fall through to the frontend catch-all.
In Docker production builds, Next.js rewrites are baked at build time
with http://127.0.0.1:8001 (the gateway is unreachable from the
frontend container's localhost), resulting in ECONNREFUSED errors for
all auth operations.
Adding a catch-all `location /api/` block after the more specific
prefix/regex locations ensures all gateway API routes are properly
proxied. nginx's location matching priority means the more specific
locations above still take precedence for /api/langgraph/, /api/models,
/api/memory, /api/mcp, /api/skills, /api/agents, /api/threads/*, and
/api/sandboxes.
Co-authored-by: Chincherry93 <Chincherry93@users.noreply.github.com>
* refactor: thread app_config through middleware factories
Continues the incremental config-refactor sequence (#2611 root, #2612 lead
path) one layer deeper into the middleware factories. Two ambient lookups
inside _build_runtime_middlewares are eliminated and the LLMErrorHandling
band-aid removed:
- _build_runtime_middlewares / build_lead_runtime_middlewares /
build_subagent_runtime_middlewares now require app_config: AppConfig.
- get_guardrails_config() inside the factory is replaced with
app_config.guardrails (semantically identical — same default-factory
GuardrailsConfig — verified by direct equality check).
- LLMErrorHandlingMiddleware.__init__ now requires app_config and reads
circuit_breaker fields directly. The class-level
circuit_failure_threshold / circuit_recovery_timeout_sec defaults are
removed along with the try/except (FileNotFoundError, RuntimeError):
pass band-aid — the let-it-crash invariant the rest of the refactor
enforces.
Caller chain (already-resolved app_config sources):
- _build_middlewares in lead_agent/agent.py: reorder so
resolved_app_config = app_config or get_app_config() is computed BEFORE
build_lead_runtime_middlewares is called, then passed as kwarg.
- SubagentExecutor: optional app_config parameter (mirrors the lead-agent
pattern); _create_agent does the same `or get_app_config()` fallback at
agent-build time, so task_tool callers don't need to plumb app_config
through yet (typed-context plumbing for tool runtimes is a separate
refactor).
Tests:
- test_llm_error_handling_middleware: _make_app_config helper using
AppConfig(sandbox=SandboxConfig(use="test")) — same minimal-config
pattern conftest already uses. Three direct LLMErrorHandlingMiddleware()
calls each followed by post-construction circuit_breaker mutation fold
cleanly into _build_middleware(circuit_failure_threshold=...,
circuit_recovery_timeout_sec=...).
Verification:
- tests/test_llm_error_handling_middleware.py — 14 passed
- tests/test_subagent_executor.py — 28 passed
- tests/test_tool_error_handling_middleware.py — 6 passed
- tests/test_task_tool_core_logic.py — 18 passed (verifies task_tool
unchanged behavior)
- Full suite: 2697 passed, 3 skipped. The single intermittent failure in
tests/test_client_e2e.py::test_tool_call_produces_events is pre-existing
LLM flakiness (the test asserts the model decided to call a tool;
reproduces 1/3 on unchanged main as well).
* fix: address middleware app config review comments
* fix: satisfy app config annotation lint
* test: cover explicit app config middleware wiring
---------
Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com>
In convertToSteps(), the extractReasoningContentFromMessage function was
called twice for the same message - once to check if reasoning exists and
again to assign it to the step object. Reuse the already-extracted value
from the local variable instead.
* feat(channels): add DingTalk channel integration
Add a new DingTalk messaging channel using the dingtalk-stream SDK
with Stream Push (WebSocket), requiring no public IP. Supports both
plain sampleMarkdown replies and optional AI Card streaming for a
typewriter effect when card_template_id is configured.
- Add DingTalkChannel implementation with token management, message
routing, allowed_users filtering, and markdown adaptation
- Register dingtalk in channel service registry and capability map
- Propagate inbound metadata to outbound messages in ChannelManager
for DingTalk sender context (sender_staff_id, conversation_type)
- Add dingtalk-stream dependency to pyproject.toml
- Add configuration examples in config.example.yaml and .env.example
- Update all README translations with setup instructions
- Add comprehensive test suite (test_dingtalk_channel.py) and
metadata propagation test in test_channels.py
- Update backend CLAUDE.md to document DingTalk channel
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(channels): address PR review feedback for DingTalk integration
- Replace runtime mutation of CHANNEL_CAPABILITIES with a
`supports_streaming` property on the Channel base class, overridden
by DingTalkChannel, FeishuChannel, and WeComChannel
- Store stream client reference and attempt graceful disconnect in
stop(); guard _on_chatbot_message with _running check to prevent
post-stop message processing
- Use msg.chat_id as the primary routing key in send/send_file via
a shared _resolve_routing helper, with metadata as fallback
- Fix process() return type annotation from tuple[str, str] to
tuple[int, str] to match AckMessage.STATUS_OK
- Protect _incoming_messages with threading.Lock for cross-thread
safety between the Stream Push thread and the asyncio loop
- Re-add Docker Compose URL guidance removed during DingTalk setup
docs addition in README.md
- Fix incomplete sentence in README_zh.md (missing verb "启用")
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(docs): restore plain paragraph format for Docker Compose note
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(channels): fix isinstance TypeError and add file size guard in DingTalk channel
Use tuple syntax for isinstance() type check to avoid runtime TypeError
with PEP 604 union types. Add upload size limit (20MB) before reading
files into memory. Narrow exception handlers to specific types.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(channels): propagate markdown fallback errors and validate access token response
- Re-raise exceptions in _send_markdown_fallback to prevent partial
deliveries (files sent without accompanying text)
- Validate _get_access_token response: reject non-dict bodies, empty
tokens, and coerce invalid expireIn to a safe default
- Add tests for both fixes
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(channels): validate upload response and broaden send_file exception handling
- Validate _upload_media JSON response: handle JSONDecodeError and
non-dict payloads gracefully by returning None
- Broaden send_file exception tuple to include TypeError and
AttributeError for unexpected JSON shapes
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(channels): fix streaming race on channel registration and slim outbound metadata
- Register channel in service before calling start() to avoid race
where background receiver publishes inbound before registration,
causing manager to fall back to static CHANNEL_CAPABILITIES
- Strip known-large metadata keys (raw_message, ref_msg) from outbound
messages to prevent memory bloat from propagated inbound payloads
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Update service.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update CLAUDE.md
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fix(security): allow disabling API docs in production via GATEWAY_ENABLE_DOCS
Expose /docs, /redoc, and /openapi.json only when GATEWAY_ENABLE_DOCS=true
(default). Setting GATEWAY_ENABLE_DOCS=false disables all three endpoints,
preventing unauthorized API surface discovery in production deployments.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* test(security): add unit tests and docs for GATEWAY_ENABLE_DOCS
Add 7 tests covering default behavior, env var parsing (case-insensitive,
fail-closed), endpoint visibility, and health endpoint independence.
Update CONFIGURATION.md and CLAUDE.md with the new toggle.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* style(security): apply ruff formatting to gateway app.py
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
The new-agent page pre-generates a thread UUID and passed it directly
to useThreadStream, which made the LangGraph SDK POST to
/threads/{uuid}/runs/stream against a thread the backend had never
created. After PR #2566 introduced multi-tenant owner checks on the
runs endpoints, that request now 404s with "Thread not found".
Pass threadId: undefined to useThreadStream so the SDK takes the
create-then-run path. The pre-generated UUID is still forwarded via
SubmitOptions.threadId in sendMessage, so the new thread is created
with that exact id and onCreated rebinds the hook to it.
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
introduces a complete authentication/authorization system, SQL persistence layer, run event history, user data isolation, and extensive documentation in both English and Chinese. The core additions are:
Auth system: JWT-based auth with local email/password provider, CSRF protection, rate limiting
Persistence layer: SQLAlchemy 2.0 async ORM with SQLite backend (users, threads, runs, events, feedback)
User isolation: Per-user data scoping via contextvars sentinel pattern
Frontend: Login/setup pages, AuthProvider, CSRF-aware fetcher
Documentation: Comprehensive EN/ZH docs for harness, application, tutorials
* feat(models): 适配 MindIE引擎的模型
* test: add unit tests for MindIEChatModel adapter and fix PR review comments
* chore: update uv.lock with pytest-asyncio
* build: add pytest-asyncio to test dependencies
* fix: address PR review comments (lazy import, cache clients, safe newline escape, strict xml regex)
* fix(mindie): preserve string args without JSON quotes in XML tool call serialization
* fix(mindie): preserve string args without JSON quotes in XML tool call serialization
* test_mindie_provider:format
* Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* fix(mindie): prevent nested tool_call params from leaking into outer args
* fixed by escaping XML entities in _fix_messages and unescaping during parse, with regression tests added.
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* fix(sandbox): block host bash traversal escapes
Fixes#2535
* fix(sandbox): harden local bash path guards
* fix(sandbox): avoid bash cd argument false positives
* Fix the lint error
Add function to resolve and validate user data path.
* Fix the lint error
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix(security): harden auth system and fix run journal logic bug
- Fix inverted condition in RunJournal.on_chat_model_start that prevented
first human message capture (not messages → messages)
- Pre-hash passwords with SHA-256 before bcrypt to avoid silent 72-byte
truncation vulnerability
- Move load_dotenv() from module scope into get_auth_config() to prevent
import-time os.environ mutation breaking test isolation
- Return generic ‘Invalid token’ instead of exposing specific error
variants (expired, malformed, invalid_signature) to clients
- Make @require_auth independently enforce 401 instead of silently
passing through when AuthMiddleware is absent
- Rate-limit /setup-status endpoint with per-IP cooldown to mitigate
initialization-state information leak
- Document in-process rate limiter limitation for multi-worker deployments
* fix(security): return 429+Retry-After on setup-status rate limit, bound cooldown dict
Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/070d0be8-99a5-46c8-85bb-6b81b5284021
Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com>
* fix(security): add versioned password hashes with auto-migration on login
The SHA-256 pre-hash change silently broke verification for any existing
bcrypt-only password hashes. Introduce a <N>$ prefix scheme so hashes
are self-describing:
- v2 (current): bcrypt(b64(sha256(password))) with $ prefix
- v1 (legacy): plain bcrypt, prefixed $ or bare (no prefix)
verify_password auto-detects the version and falls back to v1 for older
hashes. LocalAuthProvider.authenticate() now rehashes legacy hashes to v2
on successful login via needs_rehash(), so existing users upgrade
transparently without a dedicated migration step.
* fix(auth): harden verify_password, best-effort rehash, update require_auth docstring, downgrade journal logging
- password.py: wrap bcrypt.checkpw in try/except → return False for malformed/corrupt hashes instead of crashing
- local_provider.py: wrap auto-rehash update_user() in try/except so transient DB errors don't fail valid logins
- authz.py: update require_auth docstring to reflect independent 401 enforcement
- journal.py: downgrade on_chat_model_start from INFO to DEBUG, log only metadata (batch_count, message_counts) instead of full serialized/messages content
Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/48c5cf31-a4ab-418a-982a-6343c37bb299
Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com>
* fix(auth): address code review - narrow ValueError catch, add rehash warning log, rename num_batches
Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/48c5cf31-a4ab-418a-982a-6343c37bb299
Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com>
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
- Updated documentation and comments to reflect the transition from LangGraph Server to Gateway.
- Changed default URLs in ChannelManager and tests to point to Gateway.
- Removed references to LangGraph Server in deployment scripts and configurations.
- Updated Nginx configuration to route API traffic to Gateway.
- Adjusted frontend configurations to utilize Gateway's API.
- Removed LangGraph service from Docker Compose files, consolidating services under Gateway.
- Added regression tests to ensure Gateway integration works as expected.
Co-authored-by: Copilot <copilot@github.com>
* Refactor API fetch calls to use a unified fetch function; enhance chat history loading with new hooks and UI components
- Replaced `fetchWithAuth` with a generic `fetch` function across various API modules for consistency.
- Updated `useThreadStream` and `useThreadHistory` hooks to manage chat history loading, including loading states and pagination.
- Introduced `LoadMoreHistoryIndicator` component for better user experience when loading more chat history.
- Enhanced message handling in `MessageList` to accommodate new loading states and history management.
- Added support for run messages in the thread context, improving the overall message handling logic.
- Updated translations for loading indicators in English and Chinese.
* Fix test assertions for run ordering in RunManager tests
- Updated assertions in `test_list_by_thread` to reflect correct ordering of runs.
- Modified `test_list_by_thread_is_stable_when_timestamps_tie` to ensure stable ordering when timestamps are tied.
* feat(persistence): add unified persistence layer with event store, token tracking, and feedback (#1930)
* feat(persistence): add SQLAlchemy 2.0 async ORM scaffold
Introduce a unified database configuration (DatabaseConfig) that
controls both the LangGraph checkpointer and the DeerFlow application
persistence layer from a single `database:` config section.
New modules:
- deerflow.config.database_config — Pydantic config with memory/sqlite/postgres backends
- deerflow.persistence — async engine lifecycle, DeclarativeBase with to_dict mixin, Alembic skeleton
- deerflow.runtime.runs.store — RunStore ABC + MemoryRunStore implementation
Gateway integration initializes/tears down the persistence engine in
the existing langgraph_runtime() context manager. Legacy checkpointer
config is preserved for backward compatibility.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(persistence): add RunEventStore ABC + MemoryRunEventStore
Phase 2-A prerequisite for event storage: adds the unified run event
stream interface (RunEventStore) with an in-memory implementation,
RunEventsConfig, gateway integration, and comprehensive tests (27 cases).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(persistence): add ORM models, repositories, DB/JSONL event stores, RunJournal, and API endpoints
Phase 2-B: run persistence + event storage + token tracking.
- ORM models: RunRow (with token fields), ThreadMetaRow, RunEventRow
- RunRepository implements RunStore ABC via SQLAlchemy ORM
- ThreadMetaRepository with owner access control
- DbRunEventStore with trace content truncation and cursor pagination
- JsonlRunEventStore with per-run files and seq recovery from disk
- RunJournal (BaseCallbackHandler) captures LLM/tool/lifecycle events,
accumulates token usage by caller type, buffers and flushes to store
- RunManager now accepts optional RunStore for persistent backing
- Worker creates RunJournal, writes human_message, injects callbacks
- Gateway deps use factory functions (RunRepository when DB available)
- New endpoints: messages, run messages, run events, token-usage
- ThreadCreateRequest gains assistant_id field
- 92 tests pass (33 new), zero regressions
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(persistence): add user feedback + follow-up run association
Phase 2-C: feedback and follow-up tracking.
- FeedbackRow ORM model (rating +1/-1, optional message_id, comment)
- FeedbackRepository with CRUD, list_by_run/thread, aggregate stats
- Feedback API endpoints: create, list, stats, delete
- follow_up_to_run_id in RunCreateRequest (explicit or auto-detected
from latest successful run on the thread)
- Worker writes follow_up_to_run_id into human_message event metadata
- Gateway deps: feedback_repo factory + getter
- 17 new tests (14 FeedbackRepository + 3 follow-up association)
- 109 total tests pass, zero regressions
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test+config: comprehensive Phase 2 test coverage + deprecate checkpointer config
- config.example.yaml: deprecate standalone checkpointer section, activate
unified database:sqlite as default (drives both checkpointer + app data)
- New: test_thread_meta_repo.py (14 tests) — full ThreadMetaRepository coverage
including check_access owner logic, list_by_owner pagination
- Extended test_run_repository.py (+4 tests) — completion preserves fields,
list ordering desc, limit, owner_none returns all
- Extended test_run_journal.py (+8 tests) — on_chain_error, track_tokens=false,
middleware no ai_message, unknown caller tokens, convenience fields,
tool_error, non-summarization custom event
- Extended test_run_event_store.py (+7 tests) — DB batch seq continuity,
make_run_event_store factory (memory/db/jsonl/fallback/unknown)
- Extended test_phase2b_integration.py (+4 tests) — create_or_reject persists,
follow-up metadata, summarization in history, full DB-backed lifecycle
- Fixed DB integration test to use proper fake objects (not MagicMock)
for JSON-serializable metadata
- 157 total Phase 2 tests pass, zero regressions
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* config: move default sqlite_dir to .deer-flow/data
Keep SQLite databases alongside other DeerFlow-managed data
(threads, memory) under the .deer-flow/ directory instead of a
top-level ./data folder.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(persistence): remove UTFJSON, use engine-level json_serializer + datetime.now()
- Replace custom UTFJSON type with standard sqlalchemy.JSON in all ORM
models. Add json_serializer=json.dumps(ensure_ascii=False) to all
create_async_engine calls so non-ASCII text (Chinese etc.) is stored
as-is in both SQLite and Postgres.
- Change ORM datetime defaults from datetime.now(UTC) to datetime.now(),
remove UTC imports.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(gateway): simplify deps.py with getter factory + inline repos
- Replace 6 identical getter functions with _require() factory.
- Inline 3 _make_*_repo() factories into langgraph_runtime(), call
get_session_factory() once instead of 3 times.
- Add thread_meta upsert in start_run (services.py).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(docker): add UV_EXTRAS build arg for optional dependencies
Support installing optional dependency groups (e.g. postgres) at
Docker build time via UV_EXTRAS build arg:
UV_EXTRAS=postgres docker compose build
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(journal): fix flush, token tracking, and consolidate tests
RunJournal fixes:
- _flush_sync: retain events in buffer when no event loop instead of
dropping them; worker's finally block flushes via async flush().
- on_llm_end: add tool_calls filter and caller=="lead_agent" guard for
ai_message events; mark message IDs for dedup with record_llm_usage.
- worker.py: persist completion data (tokens, message count) to RunStore
in finally block.
Model factory:
- Auto-inject stream_usage=True for BaseChatOpenAI subclasses with
custom api_base, so usage_metadata is populated in streaming responses.
Test consolidation:
- Delete test_phase2b_integration.py (redundant with existing tests).
- Move DB-backed lifecycle test into test_run_journal.py.
- Add tests for stream_usage injection in test_model_factory.py.
- Clean up executor/task_tool dead journal references.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(events): widen content type to str|dict in all store backends
Allow event content to be a dict (for structured OpenAI-format messages)
in addition to plain strings. Dict values are JSON-serialized for the DB
backend and deserialized on read; memory and JSONL backends handle dicts
natively. Trace truncation now serializes dicts to JSON before measuring.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(events): use metadata flag instead of heuristic for dict content detection
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(converters): add LangChain-to-OpenAI message format converters
Pure functions langchain_to_openai_message, langchain_to_openai_completion,
langchain_messages_to_openai, and _infer_finish_reason for converting
LangChain BaseMessage objects to OpenAI Chat Completions format, used by
RunJournal for event storage. 15 unit tests added.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(converters): handle empty list content as null, clean up test
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(events): human_message content uses OpenAI user message format
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(events): ai_message uses OpenAI format, add ai_tool_call message event
- ai_message content now uses {"role": "assistant", "content": "..."} format
- New ai_tool_call message event emitted when lead_agent LLM responds with tool_calls
- ai_tool_call uses langchain_to_openai_message converter for consistent format
- Both events include finish_reason in metadata ("stop" or "tool_calls")
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(events): add tool_result message event with OpenAI tool message format
Cache tool_call_id from on_tool_start keyed by run_id as fallback for on_tool_end,
then emit a tool_result message event (role=tool, tool_call_id, content) after each
successful tool completion.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(events): summary content uses OpenAI system message format
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(events): replace llm_start/llm_end with llm_request/llm_response in OpenAI format
Add on_chat_model_start to capture structured prompt messages as llm_request events.
Replace llm_end trace events with llm_response using OpenAI Chat Completions format.
Track llm_call_index to pair request/response events.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(events): add record_middleware method for middleware trace events
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test(events): add full run sequence integration test for OpenAI content format
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(events): align message events with checkpoint format and add middleware tag injection
- Message events (ai_message, ai_tool_call, tool_result, human_message) now use
BaseMessage.model_dump() format, matching LangGraph checkpoint values.messages
- on_tool_end extracts tool_call_id/name/status from ToolMessage objects
- on_tool_error now emits tool_result message events with error status
- record_middleware uses middleware:{tag} event_type and middleware category
- Summarization custom events use middleware:summarize category
- TitleMiddleware injects middleware:title tag via get_config() inheritance
- SummarizationMiddleware model bound with middleware:summarize tag
- Worker writes human_message using HumanMessage.model_dump()
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(threads): switch search endpoint to threads_meta table and sync title
- POST /api/threads/search now queries threads_meta table directly,
removing the two-phase Store + Checkpointer scan approach
- Add ThreadMetaRepository.search() with metadata/status filters
- Add ThreadMetaRepository.update_display_name() for title sync
- Worker syncs checkpoint title to threads_meta.display_name on run completion
- Map display_name to values.title in search response for API compatibility
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(threads): history endpoint reads messages from event store
- POST /api/threads/{thread_id}/history now combines two data sources:
checkpointer for checkpoint_id, metadata, title, thread_data;
event store for messages (complete history, not truncated by summarization)
- Strip internal LangGraph metadata keys from response
- Remove full channel_values serialization in favor of selective fields
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: remove duplicate optional-dependencies header in pyproject.toml
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(middleware): pass tagged config to TitleMiddleware ainvoke call
Without the config, the middleware:title tag was not injected,
causing the LLM response to be recorded as a lead_agent ai_message
in run_events.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: resolve merge conflict in .env.example
Keep both DATABASE_URL (from persistence-scaffold) and WECOM
credentials (from main) after the merge.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(persistence): address review feedback on PR #1851
- Fix naive datetime.now() → datetime.now(UTC) in all ORM models
- Fix seq race condition in DbRunEventStore.put() with FOR UPDATE
and UNIQUE(thread_id, seq) constraint
- Encapsulate _store access in RunManager.update_run_completion()
- Deduplicate _store.put() logic in RunManager via _persist_to_store()
- Add update_run_completion to RunStore ABC + MemoryRunStore
- Wire follow_up_to_run_id through the full create path
- Add error recovery to RunJournal._flush_sync() lost-event scenario
- Add migration note for search_threads breaking change
- Fix test_checkpointer_none_fix mock to set database=None
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: update uv.lock
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(persistence): address 22 review comments from CodeQL, Copilot, and Code Quality
Bug fixes:
- Sanitize log params to prevent log injection (CodeQL)
- Reset threads_meta.status to idle/error when run completes
- Attach messages only to latest checkpoint in /history response
- Write threads_meta on POST /threads so new threads appear in search
Lint fixes:
- Remove unused imports (journal.py, migrations/env.py, test_converters.py)
- Convert lambda to named function (engine.py, Ruff E731)
- Remove unused logger definitions in repos (Ruff F841)
- Add logging to JSONL decode errors and empty except blocks
- Separate assert side-effects in tests (CodeQL)
- Remove unused local variables in tests (Ruff F841)
- Fix max_trace_content truncation to use byte length, not char length
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* style: apply ruff format to persistence and runtime files
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Potential fix for pull request finding 'Statement has no effect'
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
* refactor(runtime): introduce RunContext to reduce run_agent parameter bloat
Extract checkpointer, store, event_store, run_events_config, thread_meta_repo,
and follow_up_to_run_id into a frozen RunContext dataclass. Add get_run_context()
in deps.py to build the base context from app.state singletons. start_run() uses
dataclasses.replace() to enrich per-run fields before passing ctx to run_agent.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(gateway): move sanitize_log_param to app/gateway/utils.py
Extract the log-injection sanitizer from routers/threads.py into a shared
utils module and rename to sanitize_log_param (public API). Eliminates the
reverse service → router import in services.py.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* perf: use SQL aggregation for feedback stats and thread token usage
Replace Python-side counting in FeedbackRepository.aggregate_by_run with
a single SELECT COUNT/SUM query. Add RunStore.aggregate_tokens_by_thread
abstract method with SQL GROUP BY implementation in RunRepository and
Python fallback in MemoryRunStore. Simplify the thread_token_usage
endpoint to delegate to the new method, eliminating the limit=10000
truncation risk.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: annotate DbRunEventStore.put() as low-frequency path
Add docstring clarifying that put() opens a per-call transaction with
FOR UPDATE and should only be used for infrequent writes (currently
just the initial human_message event). High-throughput callers should
use put_batch() instead.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(threads): fall back to Store search when ThreadMetaRepository is unavailable
When database.backend=memory (default) or no SQL session factory is
configured, search_threads now queries the LangGraph Store instead of
returning 503. Returns empty list if neither Store nor repo is available.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(persistence): introduce ThreadMetaStore ABC for backend-agnostic thread metadata
Add ThreadMetaStore abstract base class with create/get/search/update/delete
interface. ThreadMetaRepository (SQL) now inherits from it. New
MemoryThreadMetaStore wraps LangGraph BaseStore for memory-mode deployments.
deps.py now always provides a non-None thread_meta_repo, eliminating all
`if thread_meta_repo is not None` guards in services.py, worker.py, and
routers/threads.py. search_threads no longer needs a Store fallback branch.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(history): read messages from checkpointer instead of RunEventStore
The /history endpoint now reads messages directly from the
checkpointer's channel_values (the authoritative source) instead of
querying RunEventStore.list_messages(). The RunEventStore API is
preserved for other consumers.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(persistence): address new Copilot review comments
- feedback.py: validate thread_id/run_id before deleting feedback
- jsonl.py: add path traversal protection with ID validation
- run_repo.py: parse `before` to datetime for PostgreSQL compat
- thread_meta_repo.py: fix pagination when metadata filter is active
- database_config.py: use resolve_path for sqlite_dir consistency
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Implement skill self-evolution and skill_manage flow (#1874)
* chore: ignore .worktrees directory
* Add skill_manage self-evolution flow
* Fix CI regressions for skill_manage
* Address PR review feedback for skill evolution
* fix(skill-evolution): preserve history on delete
* fix(skill-evolution): tighten scanner fallbacks
* docs: add skill_manage e2e evidence screenshot
* fix(skill-manage): avoid blocking fs ops in session runtime
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix(config): resolve sqlite_dir relative to CWD, not Paths.base_dir
resolve_path() resolves relative to Paths.base_dir (.deer-flow),
which double-nested the path to .deer-flow/.deer-flow/data/app.db.
Use Path.resolve() (CWD-relative) instead.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Feature/feishu receive file (#1608)
* feat(feishu): add channel file materialization hook for inbound messages
- Introduce Channel.receive_file(msg, thread_id) as a base method for file materialization; default is no-op.
- Implement FeishuChannel.receive_file to download files/images from Feishu messages, save to sandbox, and inject virtual paths into msg.text.
- Update ChannelManager to call receive_file for any channel if msg.files is present, enabling downstream model access to user-uploaded files.
- No impact on Slack/Telegram or other channels (they inherit the default no-op).
* style(backend): format code with ruff for lint compliance
- Auto-formatted packages/harness/deerflow/agents/factory.py and tests/test_create_deerflow_agent.py using `ruff format`
- Ensured both files conform to project linting standards
- Fixes CI lint check failures caused by code style issues
* fix(feishu): handle file write operation asynchronously to prevent blocking
* fix(feishu): rename GetMessageResourceRequest to _GetMessageResourceRequest and remove redundant code
* test(feishu): add tests for receive_file method and placeholder replacement
* fix(manager): remove unnecessary type casting for channel retrieval
* fix(feishu): update logging messages to reflect resource handling instead of image
* fix(feishu): sanitize filename by replacing invalid characters in file uploads
* fix(feishu): improve filename sanitization and reorder image key handling in message processing
* fix(feishu): add thread lock to prevent filename conflicts during file downloads
* fix(test): correct bad merge in test_feishu_parser.py
* chore: run ruff and apply formatting cleanup
fix(feishu): preserve rich-text attachment order and improve fallback filename handling
* fix(docker): restore gateway env vars and fix langgraph empty arg issue (#1915)
Two production docker-compose.yaml bugs prevent `make up` from working:
1. Gateway missing DEER_FLOW_CONFIG_PATH and DEER_FLOW_EXTENSIONS_CONFIG_PATH
environment overrides. Added in fb2d99f (#1836) but accidentally reverted
by ca2fb95 (#1847). Without them, gateway reads host paths from .env via
env_file, causing FileNotFoundError inside the container.
2. Langgraph command fails when LANGGRAPH_ALLOW_BLOCKING is unset (default).
Empty $${allow_blocking} inserts a bare space between flags, causing
' --no-reload' to be parsed as unexpected extra argument. Fix by building
args string first and conditionally appending --allow-blocking.
Co-authored-by: cooper <cooperfu@tencent.com>
* fix(frontend): resolve invalid HTML nesting and tabnabbing vulnerabilities (#1904)
* fix(frontend): resolve invalid HTML nesting and tabnabbing vulnerabilities
Fix `<button>` inside `<a>` invalid HTML in artifact components and add
missing `noopener,noreferrer` to `window.open` calls to prevent reverse
tabnabbing.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(frontend): address Copilot review on tabnabbing and double-tab-open
Remove redundant parent onClick on web_fetch ChainOfThoughtStep to
prevent opening two tabs on link click, and explicitly null out
window.opener after window.open() for defensive tabnabbing hardening.
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* refactor(persistence): organize entities into per-entity directories
Restructure the persistence layer from horizontal "models/ + repositories/"
split into vertical entity-aligned directories. Each entity (thread_meta,
run, feedback) now owns its ORM model, abstract interface (where applicable),
and concrete implementations under a single directory with an aggregating
__init__.py for one-line imports.
Layout:
persistence/thread_meta/{base,model,sql,memory}.py
persistence/run/{model,sql}.py
persistence/feedback/{model,sql}.py
models/__init__.py is kept as a facade so Alembic autogenerate continues to
discover all ORM tables via Base.metadata. RunEventRow remains under
models/run_event.py because its storage implementation lives in
runtime/events/store/db.py and has no matching repository directory.
The repositories/ directory is removed entirely. All call sites in
gateway/deps.py and tests are updated to import from the new entity
packages, e.g.:
from deerflow.persistence.thread_meta import ThreadMetaRepository
from deerflow.persistence.run import RunRepository
from deerflow.persistence.feedback import FeedbackRepository
Full test suite passes (1690 passed, 14 skipped).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(gateway): sync thread rename and delete through ThreadMetaStore
The POST /threads/{id}/state endpoint previously synced title changes
only to the LangGraph Store via _store_upsert. In sqlite mode the search
endpoint reads from the ThreadMetaRepository SQL table, so renames never
appeared in /threads/search until the next agent run completed (worker.py
syncs title from checkpoint to thread_meta in its finally block).
Likewise the DELETE /threads/{id} endpoint cleaned up the filesystem,
Store, and checkpointer but left the threads_meta row orphaned in sqlite,
so deleted threads kept appearing in /threads/search.
Fix both endpoints by routing through the ThreadMetaStore abstraction
which already has the correct sqlite/memory implementations wired up by
deps.py. The rename path now calls update_display_name() and the delete
path calls delete() — both work uniformly across backends.
Verified end-to-end with curl in gateway mode against sqlite backend.
Existing test suite (1690 passed) and focused router/repo tests pass.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(gateway): route all thread metadata access through ThreadMetaStore
Following the rename/delete bug fix in PR1, migrate the remaining direct
LangGraph Store reads/writes in the threads router and services to the
ThreadMetaStore abstraction so that the sqlite and memory backends behave
identically and the legacy dual-write paths can be removed.
Migrated endpoints (threads.py):
- create_thread: idempotency check + write now use thread_meta_repo.get/create
instead of dual-writing the LangGraph Store and the SQL row.
- get_thread: reads from thread_meta_repo.get; the checkpoint-only fallback
for legacy threads is preserved.
- patch_thread: replaced _store_get/_store_put with thread_meta_repo.update_metadata.
- delete_thread_data: dropped the legacy store.adelete; thread_meta_repo.delete
already covers it.
Removed dead code (services.py):
- _upsert_thread_in_store — redundant with the immediately following
thread_meta_repo.create() call.
- _sync_thread_title_after_run — worker.py's finally block already syncs
the title via thread_meta_repo.update_display_name() after each run.
Removed dead code (threads.py):
- _store_get / _store_put / _store_upsert helpers (no remaining callers).
- THREADS_NS constant.
- get_store import (router no longer touches the LangGraph Store directly).
New abstract method:
- ThreadMetaStore.update_metadata(thread_id, metadata) merges metadata into
the thread's metadata field. Implemented in both ThreadMetaRepository (SQL,
read-modify-write inside one session) and MemoryThreadMetaStore. Three new
unit tests cover merge / empty / nonexistent behaviour.
Net change: -134 lines. Full test suite: 1693 passed, 14 skipped.
Verified end-to-end with curl in gateway mode against sqlite backend
(create / patch / get / rename / search / delete).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
Co-authored-by: DanielWalnut <45447813+hetaoBackend@users.noreply.github.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: JilongSun <965640067@qq.com>
Co-authored-by: jie <49781832+stan-fu@users.noreply.github.com>
Co-authored-by: cooper <cooperfu@tencent.com>
Co-authored-by: yangzheli <43645580+yangzheli@users.noreply.github.com>
* feat(auth): release-validation pass for 2.0-rc — 12 blockers + simplify follow-ups (#2008)
* feat(auth): introduce backend auth module
Port RFC-001 authentication core from PR #1728:
- JWT token handling (create_access_token, decode_token, TokenPayload)
- Password hashing (bcrypt) with verify_password
- SQLite UserRepository with base interface
- Provider Factory pattern (LocalAuthProvider)
- CLI reset_admin tool
- Auth-specific errors (AuthErrorCode, TokenError, AuthErrorResponse)
Deps:
- bcrypt>=4.0.0
- pyjwt>=2.9.0
- email-validator>=2.0.0
- backend/uv.toml pins public PyPI index
Tests: 12 pure unit tests (test_auth_config.py, test_auth_errors.py).
Scope note: authz.py, test_auth.py, and test_auth_type_system.py are
deferred to commit 2 because they depend on middleware and deps wiring
that is not yet in place. Commit 1 stays "pure new files only" as the
spec mandates.
* feat(auth): wire auth end-to-end (middleware + frontend replacement)
Backend:
- Port auth_middleware, csrf_middleware, langgraph_auth, routers/auth
- Port authz decorator (owner_filter_key defaults to 'owner_id')
- Merge app.py: register AuthMiddleware + CSRFMiddleware + CORS, add
_ensure_admin_user lifespan hook, _migrate_orphaned_threads helper,
register auth router
- Merge deps.py: add get_local_provider, get_current_user_from_request,
get_optional_user_from_request; keep get_current_user as thin str|None
adapter for feedback router
- langgraph.json: add auth path pointing to langgraph_auth.py:auth
- Rename metadata['user_id'] -> metadata['owner_id'] in langgraph_auth
(both metadata write and LangGraph filter dict) + test fixtures
Frontend:
- Delete better-auth library and api catch-all route
- Remove better-auth npm dependency and env vars (BETTER_AUTH_SECRET,
BETTER_AUTH_GITHUB_*) from env.js
- Port frontend/src/core/auth/* (AuthProvider, gateway-config,
proxy-policy, server-side getServerSideUser, types)
- Port frontend/src/core/api/fetcher.ts
- Port (auth)/layout, (auth)/login, (auth)/setup pages
- Rewrite workspace/layout.tsx as server component that calls
getServerSideUser and wraps in AuthProvider
- Port workspace/workspace-content.tsx for the client-side sidebar logic
Tests:
- Port 5 auth test files (test_auth, test_auth_middleware,
test_auth_type_system, test_ensure_admin, test_langgraph_auth)
- 176 auth tests PASS
After this commit: login/logout/registration flow works, but persistence
layer does not yet filter by owner_id. Commit 4 closes that gap.
* feat(auth): account settings page + i18n
- Port account-settings-page.tsx (change password, change email, logout)
- Wire into settings-dialog.tsx as new "account" section with UserIcon,
rendered first in the section list
- Add i18n keys:
- en-US/zh-CN: settings.sections.account ("Account" / "账号")
- en-US/zh-CN: button.logout ("Log out" / "退出登录")
- types.ts: matching type declarations
* feat(auth): enforce owner_id across 2.0-rc persistence layer
Add request-scoped contextvar-based owner filtering to threads_meta,
runs, run_events, and feedback repositories. Router code is unchanged
— isolation is enforced at the storage layer so that any caller that
forgets to pass owner_id still gets filtered results, and new routes
cannot accidentally leak data.
Core infrastructure
-------------------
- deerflow/runtime/user_context.py (new):
- ContextVar[CurrentUser | None] with default None
- runtime_checkable CurrentUser Protocol (structural subtype with .id)
- set/reset/get/require helpers
- AUTO sentinel + resolve_owner_id(value, method_name) for sentinel
three-state resolution: AUTO reads contextvar, explicit str
overrides, explicit None bypasses the filter (for migration/CLI)
Repository changes
------------------
- ThreadMetaRepository: create/get/search/update_*/delete gain
owner_id=AUTO kwarg; read paths filter by owner, writes stamp it,
mutations check ownership before applying
- RunRepository: put/get/list_by_thread/delete gain owner_id=AUTO kwarg
- FeedbackRepository: create/get/list_by_run/list_by_thread/delete
gain owner_id=AUTO kwarg
- DbRunEventStore: list_messages/list_events/list_messages_by_run/
count_messages/delete_by_thread/delete_by_run gain owner_id=AUTO
kwarg. Write paths (put/put_batch) read contextvar softly: when a
request-scoped user is available, owner_id is stamped; background
worker writes without a user context pass None which is valid
(orphan row to be bound by migration)
Schema
------
- persistence/models/run_event.py: RunEventRow.owner_id = Mapped[
str | None] = mapped_column(String(64), nullable=True, index=True)
- No alembic migration needed: 2.0 ships fresh, Base.metadata.create_all
picks up the new column automatically
Middleware
----------
- auth_middleware.py: after cookie check, call get_optional_user_from_
request to load the real User, stamp it into request.state.user AND
the contextvar via set_current_user, reset in a try/finally. Public
paths and unauthenticated requests continue without contextvar, and
@require_auth handles the strict 401 path
Test infrastructure
-------------------
- tests/conftest.py: @pytest.fixture(autouse=True) _auto_user_context
sets a default SimpleNamespace(id="test-user-autouse") on every test
unless marked @pytest.mark.no_auto_user. Keeps existing 20+
persistence tests passing without modification
- pyproject.toml [tool.pytest.ini_options]: register no_auto_user
marker so pytest does not emit warnings for opt-out tests
- tests/test_user_context.py: 6 tests covering three-state semantics,
Protocol duck typing, and require/optional APIs
- tests/test_thread_meta_repo.py: one test updated to pass owner_id=
None explicitly where it was previously relying on the old default
Test results
------------
- test_user_context.py: 6 passed
- test_auth*.py + test_langgraph_auth.py + test_ensure_admin.py: 127
- test_run_event_store / test_run_repository / test_thread_meta_repo
/ test_feedback: 92 passed
- Full backend suite: 1905 passed, 2 failed (both @requires_llm flaky
integration tests unrelated to auth), 1 skipped
* feat(auth): extend orphan migration to 2.0-rc persistence tables
_ensure_admin_user now runs a three-step pipeline on every boot:
Step 1 (fatal): admin user exists / is created / password is reset
Step 2 (non-fatal): LangGraph store orphan threads → admin
Step 3 (non-fatal): SQL persistence tables → admin
- threads_meta
- runs
- run_events
- feedback
Each step is idempotent. The fatal/non-fatal split mirrors PR #1728's
original philosophy: admin creation failure blocks startup (the system
is unusable without an admin), whereas migration failures log a warning
and let the service proceed (a partial migration is recoverable; a
missing admin is not).
Key helpers
-----------
- _iter_store_items(store, namespace, *, page_size=500):
async generator that cursor-paginates across LangGraph store pages.
Fixes PR #1728's hardcoded limit=1000 bug that would silently lose
orphans beyond the first page.
- _migrate_orphaned_threads(store, admin_user_id):
Rewritten to use _iter_store_items. Returns the migrated count so the
caller can log it; raises only on unhandled exceptions.
- _migrate_orphan_sql_tables(admin_user_id):
Imports the 4 ORM models lazily, grabs the shared session factory,
runs one UPDATE per table in a single transaction, commits once.
No-op when no persistence backend is configured (in-memory dev).
Tests: test_ensure_admin.py (8 passed)
* test(auth): port AUTH test plan docs + lint/format pass
- Port backend/docs/AUTH_TEST_PLAN.md and AUTH_UPGRADE.md from PR #1728
- Rename metadata.user_id → metadata.owner_id in AUTH_TEST_PLAN.md
(4 occurrences from the original PR doc)
- ruff auto-fix UP037 in sentinel type annotations: drop quotes around
"str | None | _AutoSentinel" now that from __future__ import
annotations makes them implicit string forms
- ruff format: 2 files (app/gateway/app.py, runtime/user_context.py)
Note on test coverage additions:
- conftest.py autouse fixture was already added in commit 4 (had to
be co-located with the repository changes to keep pre-existing
persistence tests passing)
- cross-user isolation E2E tests (test_owner_isolation.py) deferred
— enforcement is already proven by the 98-test repository suite
via the autouse fixture + explicit _AUTO sentinel exercises
- New test cases (TC-API-17..20, TC-ATK-13, TC-MIG-01..07) listed
in AUTH_TEST_PLAN.md are deferred to a follow-up PR — they are
manual-QA test cases rather than pytest code, and the spec-level
coverage is already met by test_user_context.py + the 98-test
repository suite.
Final test results:
- Auth suite (test_auth*, test_langgraph_auth, test_ensure_admin,
test_user_context): 186 passed
- Persistence suite (test_run_event_store, test_run_repository,
test_thread_meta_repo, test_feedback): 98 passed
- Lint: ruff check + ruff format both clean
* test(auth): add cross-user isolation test suite
10 tests exercising the storage-layer owner filter by manually
switching the user_context contextvar between two users. Verifies
the safety invariant:
After a repository write with owner_id=A, a subsequent read with
owner_id=B must not return the row, and vice versa.
Covers all 4 tables that own user-scoped data:
TC-API-17 threads_meta — read, search, update, delete cross-user
TC-API-18 runs — get, list_by_thread, delete cross-user
TC-API-19 run_events — list_messages, list_events, count_messages,
delete_by_thread (CRITICAL: raw conversation
content leak vector)
TC-API-20 feedback — get, list_by_run, delete cross-user
Plus two meta-tests verifying the sentinel pattern itself:
- AUTO + unset contextvar raises RuntimeError
- explicit owner_id=None bypasses the filter (migration escape hatch)
Architecture note
-----------------
These tests bypass the HTTP layer by design. The full chain
(cookie → middleware → contextvar → repository) is covered piecewise:
- test_auth_middleware.py: middleware sets contextvar from cookies
- test_owner_isolation.py: repositories enforce isolation when
contextvar is set to different users
Together they prove the end-to-end safety property without the
ceremony of spinning up a full TestClient + in-memory DB for every
router endpoint.
Tests pass: 231 (full auth + persistence + isolation suite)
Lint: clean
* refactor(auth): migrate user repository to SQLAlchemy ORM
Move the users table into the shared persistence engine so auth
matches the pattern of threads_meta, runs, run_events, and feedback —
one engine, one session factory, one schema init codepath.
New files
---------
- persistence/user/__init__.py, persistence/user/model.py: UserRow
ORM class with partial unique index on (oauth_provider, oauth_id)
- Registered in persistence/models/__init__.py so
Base.metadata.create_all() picks it up
Modified
--------
- auth/repositories/sqlite.py: rewritten as async SQLAlchemy,
identical constructor pattern to the other four repositories
(def __init__(self, session_factory) + self._sf = session_factory)
- auth/config.py: drop users_db_path field — storage is configured
through config.database like every other table
- deps.py/get_local_provider: construct SQLiteUserRepository with
the shared session factory, fail fast if engine is not initialised
- tests/test_auth.py: rewrite test_sqlite_round_trip_new_fields to
use the shared engine (init_engine + close_engine in a tempdir)
- tests/test_auth_type_system.py: add per-test autouse fixture that
spins up a scratch engine and resets deps._cached_* singletons
* refactor(auth): remove SQL orphan migration (unused in supported scenarios)
The _migrate_orphan_sql_tables helper existed to bind NULL owner_id
rows in threads_meta, runs, run_events, and feedback to the admin on
first boot. But in every supported upgrade path, it's a no-op:
1. Fresh install: create_all builds fresh tables, no legacy rows
2. No-auth → with-auth (no existing persistence DB): persistence
tables are created fresh by create_all, no legacy rows
3. No-auth → with-auth (has existing persistence DB from #1930):
NOT a supported upgrade path — "有 DB 到有 DB" schema evolution
is out of scope; users wipe DB or run manual ALTER
So the SQL orphan migration never has anything to do in the
supported matrix. Delete the function, simplify _ensure_admin_user
from a 3-step pipeline to a 2-step one (admin creation + LangGraph
store orphan migration only).
LangGraph store orphan migration stays: it serves the real
"no-auth → with-auth" upgrade path where a user's existing LangGraph
thread metadata has no owner_id field and needs to be stamped with
the newly-created admin's id.
Tests: 284 passed (auth + persistence + isolation)
Lint: clean
* security(auth): write initial admin password to 0600 file instead of logs
CodeQL py/clear-text-logging-sensitive-data flagged 3 call sites that
logged the auto-generated admin password to stdout via logger.info().
Production log aggregators (ELK/Splunk/etc) would have captured those
cleartext secrets. Replace with a shared helper that writes to
.deer-flow/admin_initial_credentials.txt with mode 0600, and log only
the path.
New file
--------
- app/gateway/auth/credential_file.py: write_initial_credentials()
helper. Takes email, password, and a "initial"/"reset" label.
Creates .deer-flow/ if missing, writes a header comment plus the
email+password, chmods 0o600, returns the absolute Path.
Modified
--------
- app/gateway/app.py: both _ensure_admin_user paths (fresh creation
+ needs_setup password reset) now write to file and log the path
- app/gateway/auth/reset_admin.py: rewritten to use the shared ORM
repo (SQLiteUserRepository with session_factory) and the
credential_file helper. The previous implementation was broken
after the earlier ORM refactor — it still imported _get_users_conn
and constructed SQLiteUserRepository() without a session factory.
No tests changed — the three password-log sites are all exercised
via existing test_ensure_admin.py which checks that startup
succeeds, not that a specific string appears in logs.
CodeQL alerts 272, 283, 284: all resolved.
* security(auth): strict JWT validation in middleware (fix junk cookie bypass)
AUTH_TEST_PLAN test 7.5.8 expects junk cookies to be rejected with
401. The previous middleware behaviour was "presence-only": check
that some access_token cookie exists, then pass through. In
combination with my Task-12 decision to skip @require_auth
decorators on routes, this created a gap where a request with any
cookie-shaped string (e.g. access_token=not-a-jwt) would bypass
authentication on routes that do not touch the repository
(/api/models, /api/mcp/config, /api/memory, /api/skills, …).
Fix: middleware now calls get_current_user_from_request() strictly
and catches the resulting HTTPException to render a 401 with the
proper fine-grained error code (token_invalid, token_expired,
user_not_found, …). On success it stamps request.state.user and
the contextvar so repository-layer owner filters work downstream.
The 4 old "_with_cookie_passes" tests in test_auth_middleware.py
were written for the presence-only behaviour; they asserted that
a junk cookie would make the handler return 200. They are renamed
to "_with_junk_cookie_rejected" and their assertions flipped to
401. The negative path (no cookie → 401 not_authenticated)
is unchanged.
Verified:
no cookie → 401 not_authenticated
junk cookie → 401 token_invalid (the fixed bug)
expired cookie → 401 token_expired
Tests: 284 passed (auth + persistence + isolation)
Lint: clean
* security(auth): wire @require_permission(owner_check=True) on isolation routes
Apply the require_permission decorator to all 28 routes that take a
{thread_id} path parameter. Combined with the strict middleware
(previous commit), this gives the double-layer protection that
AUTH_TEST_PLAN test 7.5.9 documents:
Layer 1 (AuthMiddleware): cookie + JWT validation, rejects junk
cookies and stamps request.state.user
Layer 2 (@require_permission with owner_check=True): per-resource
ownership verification via
ThreadMetaStore.check_access — returns
404 if a different user owns the thread
The decorator's owner_check branch is rewritten to use the SQL
thread_meta_repo (the 2.0-rc persistence layer) instead of the
LangGraph store path that PR #1728 used (_store_get / get_store
in routers/threads.py). The inject_record convenience is dropped
— no caller in 2.0 needs the LangGraph blob, and the SQL repo has
a different shape.
Routes decorated (28 total):
- threads.py: delete, patch, get, get-state, post-state, post-history
- thread_runs.py: post-runs, post-runs-stream, post-runs-wait,
list_runs, get_run, cancel_run, join_run, stream_existing_run,
list_thread_messages, list_run_messages, list_run_events,
thread_token_usage
- feedback.py: create, list, stats, delete
- uploads.py: upload (added Request param), list, delete
- artifacts.py: get_artifact
- suggestions.py: generate (renamed body parameter to avoid
conflict with FastAPI Request)
Test fixes:
- test_suggestions_router.py: bypass the decorator via __wrapped__
(the unit tests cover parsing logic, not auth — no point spinning
up a thread_meta_repo just to test JSON unwrapping)
- test_auth_middleware.py 4 fake-cookie tests: already updated in
the previous commit (745bf432)
Tests: 293 passed (auth + persistence + isolation + suggestions)
Lint: clean
* security(auth): defense-in-depth fixes from release validation pass
Eight findings caught while running the AUTH_TEST_PLAN end-to-end against
the deployed sg_dev stack. Each is a pre-condition for shipping
release/2.0-rc that the previous PRs missed.
Backend hardening
- routers/auth.py: rate limiter X-Real-IP now requires AUTH_TRUSTED_PROXIES
whitelist (CIDR/IP allowlist). Without nginx in front, the previous code
honored arbitrary X-Real-IP, letting an attacker rotate the header to
fully bypass the per-IP login lockout.
- routers/auth.py: 36-entry common-password blocklist via Pydantic
field_validator on RegisterRequest + ChangePasswordRequest. The shared
_validate_strong_password helper keeps the constraint in one place.
- routers/threads.py: ThreadCreateRequest + ThreadPatchRequest strip
server-reserved metadata keys (owner_id, user_id) via Pydantic
field_validator so a forged value can never round-trip back to other
clients reading the same thread. The actual ownership invariant stays
on the threads_meta row; this closes the metadata-blob echo gap.
- authz.py + thread_meta/sql.py: require_permission gains a require_existing
flag plumbed through check_access(require_existing=True). Destructive
routes (DELETE/PATCH/state-update/runs/feedback) now treat a missing
thread_meta row as 404 instead of "untracked legacy thread, allow",
closing the cross-user delete-idempotence gap where any user could
successfully DELETE another user's deleted thread.
- repositories/sqlite.py + base.py: update_user raises UserNotFoundError
on a vanished row instead of silently returning the input. Concurrent
delete during password reset can no longer look like a successful update.
- runtime/user_context.py: resolve_owner_id() coerces User.id (UUID) to
str at the contextvar boundary so SQLAlchemy String(64) columns can
bind it. The whole 2.0-rc isolation pipeline was previously broken
end-to-end (POST /api/threads → 500 "type 'UUID' is not supported").
- persistence/engine.py: SQLAlchemy listener enables PRAGMA journal_mode=WAL,
synchronous=NORMAL, foreign_keys=ON on every new SQLite connection.
TC-UPG-06 in the test plan expects WAL; previous code shipped with the
default 'delete' journal.
- auth_middleware.py: stamp request.state.auth = AuthContext(...) so
@require_permission's short-circuit fires; previously every isolation
request did a duplicate JWT decode + users SELECT. Also unifies the
401 payload through AuthErrorResponse(...).model_dump().
- app.py: _ensure_admin_user restructure removes the noqa F821 scoping
bug where 'password' was referenced outside the branch that defined it.
New _announce_credentials helper absorbs the duplicate log block in
the fresh-admin and reset-admin branches.
* fix(frontend+nginx): rollout CSRF on every state-changing client path
The frontend was 100% broken in gateway-pro mode for any user trying to
open a specific chat thread. Three cumulative bugs each silently
masked the next.
LangGraph SDK CSRF gap (api-client.ts)
- The Client constructor took only apiUrl, no defaultHeaders, no fetch
interceptor. The SDK's internal fetch never sent X-CSRF-Token, so
every state-changing /api/langgraph-compat/* call (runs/stream,
threads/search, threads/{tid}/history, ...) hit CSRFMiddleware and
got 403 before reaching the auth check. UI symptom: empty thread page
with no error message; the SPA's hooks swallowed the rejection.
- Fix: pass an onRequest hook that injects X-CSRF-Token from the
csrf_token cookie per request. Reading the cookie per call (not at
construction time) handles login / logout / password-change cookie
rotation transparently. The SDK's prepareFetchOptions calls
onRequest for both regular requests AND streaming/SSE/reconnect, so
the same hook covers runs.stream and runs.joinStream.
Raw fetch CSRF gap (7 files)
- Audit: 11 frontend fetch sites, only 2 included CSRF (login/setup +
account-settings change-password). The other 7 routed through raw
fetch() with no header — suggestions, memory, agents, mcp, skills,
uploads, and the local thread cleanup hook all 403'd silently.
- Fix: enhance fetcher.ts:fetchWithAuth to auto-inject X-CSRF-Token on
POST/PUT/DELETE/PATCH from a single shared readCsrfCookie() helper.
Convert all 7 raw fetch() callers to fetchWithAuth so the contract
is centrally enforced. api-client.ts and fetcher.ts share
readCsrfCookie + STATE_CHANGING_METHODS to avoid drift.
nginx routing + buffering (nginx.local.conf)
- The auth feature shipped without updating the nginx config: per-API
explicit location blocks but no /api/v1/auth/, /api/feedback, /api/runs.
The frontend's client-side fetches to /api/v1/auth/login/local 404'd
from the Next.js side because nginx routed /api/* to the frontend.
- Fix: add catch-all `location /api/` that proxies to the gateway.
nginx longest-prefix matching keeps the explicit blocks (/api/models,
/api/threads regex, /api/langgraph/, ...) winning for their paths.
- Fix: disable proxy_buffering + proxy_request_buffering for the
frontend `location /` block. Without it, nginx tries to spool large
Next.js chunks into /var/lib/nginx/proxy (root-owned) and fails with
Permission denied → ERR_INCOMPLETE_CHUNKED_ENCODING → ChunkLoadError.
* test(auth): release-validation test infra and new coverage
Test fixtures and unit tests added during the validation pass.
Router test helpers (NEW: tests/_router_auth_helpers.py)
- make_authed_test_app(): builds a FastAPI test app with a stub
middleware that stamps request.state.user + request.state.auth and a
permissive thread_meta_repo mock. TestClient-based router tests
(test_artifacts_router, test_threads_router) use it instead of bare
FastAPI() so the new @require_permission(owner_check=True) decorators
short-circuit cleanly.
- call_unwrapped(): walks the __wrapped__ chain to invoke the underlying
handler without going through the authz wrappers. Direct-call tests
(test_uploads_router) use it. Typed with ParamSpec so the wrapped
signature flows through.
Backend test additions
- test_auth.py: 7 tests for the new _get_client_ip trust model (no
proxy / trusted proxy / untrusted peer / XFF rejection / invalid
CIDR / no client). 5 tests for the password blocklist (literal,
case-insensitive, strong password accepted, change-password binding,
short-password length-check still fires before blocklist).
test_update_user_raises_when_row_concurrently_deleted: closes a
shipped-without-coverage gap on the new UserNotFoundError contract.
- test_thread_meta_repo.py: 4 tests for check_access(require_existing=True)
— strict missing-row denial, strict owner match, strict owner mismatch,
strict null-owner still allowed (shared rows survive the tightening).
- test_ensure_admin.py: 3 tests for _migrate_orphaned_threads /
_iter_store_items pagination, covering the TC-UPG-02 upgrade story
end-to-end via mock store. Closes the gap where the cursor pagination
was untested even though the previous PR rewrote it.
- test_threads_router.py: 5 tests for _strip_reserved_metadata
(owner_id removal, user_id removal, safe-keys passthrough, empty
input, both-stripped).
- test_auth_type_system.py: replace "password123" fixtures with
Tr0ub4dor3a / AnotherStr0ngPwd! so the new password blocklist
doesn't reject the test data.
* docs(auth): refresh TC-DOCKER-05 + document Docker validation gap
- AUTH_TEST_PLAN.md TC-DOCKER-05: the previous expectation
("admin password visible in docker logs") was stale after the simplify
pass that moved credentials to a 0600 file. The grep "Password:" check
would have silently failed and given a false sense of coverage. New
expectation matches the actual file-based path: 0600 file in
DEER_FLOW_HOME, log shows the path (not the secret), reverse-grep
asserts no leaked password in container logs.
- NEW: docs/AUTH_TEST_DOCKER_GAP.md documents the only un-executed
block in the test plan (TC-DOCKER-01..06). Reason: sg_dev validation
host has no Docker daemon installed. The doc maps each Docker case
to an already-validated bare-metal equivalent (TC-1.1, TC-REENT-01,
TC-API-02 etc.) so the gap is auditable, and includes pre-flight
reproduction steps for whoever has Docker available.
---------
Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com>
* refactor(persistence): unify SQLite to single deerflow.db and move checkpointer to runtime
Merge checkpoints.db and app.db into a single deerflow.db file (WAL mode
handles concurrent access safely). Move checkpointer module from
agents/checkpointer to runtime/checkpointer to better reflect its role
as a runtime infrastructure concern.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(persistence): rename owner_id to user_id and thread_meta_repo to thread_store
Rename owner_id to user_id across all persistence models, repositories,
stores, routers, and tests for clearer semantics. Rename thread_meta_repo
to thread_store for consistency with run_store/run_event_store naming.
Add ThreadMetaStore return type annotation to get_thread_store().
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(persistence): unify ThreadMetaStore interface with user isolation and factory
Add user_id parameter to all ThreadMetaStore abstract methods. Implement
owner isolation in MemoryThreadMetaStore with _get_owned_record helper.
Add check_access to base class and memory implementation. Add
make_thread_store factory to simplify deps.py initialization. Add
memory-backend isolation tests.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(feedback): add UNIQUE(thread_id, run_id, user_id) constraint
Add UNIQUE constraint to FeedbackRow to enforce one feedback per user per run,
enabling upsert behavior in Task 2. Update tests to use distinct user_ids for
multiple feedback records per run, and pass user_id=None to list_by_run for
admin-style queries that bypass user isolation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(feedback): add upsert() method with UNIQUE enforcement
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(feedback): add delete_by_run() and list_by_thread_grouped()
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(feedback): add PUT upsert and DELETE-by-run endpoints
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(feedback): enrich messages endpoint with per-run feedback data
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(feedback): add frontend feedback API client
Adds upsertFeedback and deleteFeedback API functions backed by
fetchWithAuth, targeting the /api/threads/{id}/runs/{id}/feedback
endpoint.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(feedback): wire feedback data into message rendering for history echo
Adds useThreadFeedback hook that fetches run-level feedback from the
messages API and builds a runId->FeedbackData map. MessageList now calls
this hook and passes feedback and runId to each MessageListItem so
previously-submitted thumbs are pre-filled when revisiting a thread.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(feedback): correct run_id mapping for feedback echo
The feedbackMap was keyed by run_id but looked up by LangGraph message ID.
Fixed by tracking AI message ordinal index to correlate event store
run_ids with LangGraph SDK messages.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(feedback): use real threadId and refresh after stream
- Pass threadId prop to MessageListItem instead of reading "new" from URL params
- Invalidate thread-feedback query on stream finish so buttons appear immediately
- Show feedback buttons always visible, copy button on hover only
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* style(feedback): group copy and feedback buttons together on the left
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* style(feedback): always show toolbar buttons without hover
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(persistence): stream hang when run_events.backend=db
DbRunEventStore._user_id_from_context() returned user.id without
coercing it to str. User.id is a Pydantic UUID, and aiosqlite cannot
bind a raw UUID object to a VARCHAR column, so the INSERT for the
initial human_message event silently rolled back and raised out of
the worker task. Because that put() sat outside the worker's try
block, the finally-clause that publishes end-of-stream never ran
and the SSE stream hung forever.
jsonl mode was unaffected because json.dumps(default=str) coerces
UUID objects transparently.
Fixes:
- db.py: coerce user.id to str at the context-read boundary (matches
what resolve_user_id already does for the other repositories)
- worker.py: move RunJournal init + human_message put inside the try
block so any failure flows through the finally/publish_end path
instead of hanging the subscriber
Defense-in-depth:
- engine.py: add PRAGMA busy_timeout=5000 so checkpointer and event
store wait for each other on the shared deerflow.db file instead
of failing immediately under write-lock contention
- journal.py: skip fire-and-forget _flush_sync when a previous flush
task is still in flight, to avoid piling up concurrent put_batch
writes on the same SQLAlchemy engine during streaming; flush() now
waits for pending tasks before draining the buffer
- database_config.py: doc-only update clarifying WAL + busy_timeout
keep the unified deerflow.db safe for both workloads
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore(persistence): drop redundant busy_timeout PRAGMA
Python's sqlite3 driver defaults to a 5-second busy timeout via the
``timeout`` kwarg of ``sqlite3.connect``, and aiosqlite + SQLAlchemy's
aiosqlite dialect inherit that default. Setting ``PRAGMA busy_timeout=5000``
explicitly was a no-op — verified by reading back the PRAGMA on a fresh
connection (it already reports 5000ms without our PRAGMA).
Concurrent stress test (50 checkpoint writes + 20 event batches + 50
thread_meta updates on the same deerflow.db) still completes with zero
errors and 200/200 rows after removing the explicit PRAGMA.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(journal): unwrap Command tool results in on_tool_end
Tools that update graph state (e.g. ``present_files``) return
``Command(update={'messages': [ToolMessage(...)], 'artifacts': [...]})``.
LangGraph later unwraps the inner ``ToolMessage`` into checkpoint state,
but ``RunJournal.on_tool_end`` was receiving the ``Command`` object
directly via the LangChain callback chain and storing
``str(Command(update={...}))`` as the tool_result content.
This produced a visible divergence between the event-store and the
checkpoint for any thread that used a Command-returning tool, blocking
the event-store-backed history fix in the follow-up commit. Concrete
example from thread ``6d30913e-dcd4-41c8-8941-f66c716cf359`` (seq=48):
checkpoint had ``'Successfully presented files'`` while event_store
stored the full Command repr.
The fix detects ``Command`` in ``on_tool_end``, extracts the first
``ToolMessage`` from ``update['messages']``, and lets the existing
ToolMessage branch handle the ``model_dump()`` path. Legacy rows still
containing the Command repr are separately cleaned up by the history
helper in the follow-up commit.
Tests:
- ``test_tool_end_unwraps_command_with_inner_tool_message`` — unit test
of the unwrap branch with a constructed Command
- ``test_tool_invoke_end_to_end_unwraps_command`` — end-to-end via
``CallbackManager`` + ``tool.invoke`` to exercise the real LangChain
dispatch path that production uses, matching the repro shape from
``present_files``
- Counter-proof: temporarily reverted the patch, both tests failed with
the exact ``Command(update={...})`` repr that was stored in the
production SQLite row at seq=48, confirming LangChain does pass the
``Command`` through callbacks (the unwrap is load-bearing)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(threads): load history messages from event store, immune to summarize
``get_thread_history`` and ``get_thread_state`` in Gateway mode read
messages from ``checkpoint.channel_values["messages"]``. After
SummarizationMiddleware runs mid-run, that list is rewritten in-place:
pre-summarize messages are dropped and a synthetic summary-as-human
message takes position 0. The frontend then renders a chat history that
starts with ``"Here is a summary of the conversation to date:..."``
instead of the user's original query, and all earlier turns are gone.
The event store (``RunEventStore``) is append-only and never rewritten,
so it retains the full transcript. This commit adds a helper
``_get_event_store_messages`` that loads the event store's message
stream and overrides ``values["messages"]`` in both endpoints; the
checkpoint fallback kicks in only when the event store is unavailable.
Behavior contract of the helper:
- **Full pagination.** ``list_messages`` returns the newest ``limit``
records when no cursor is given, so a fixed limit silently drops
older messages on long threads. The helper sizes the read from
``count_messages()`` and pages forward with ``after_seq`` cursors.
- **Copy-on-read.** Each content dict is copied before ``id`` is
patched so the live store object (``MemoryRunEventStore`` returns
references) is never mutated.
- **Stable ids.** Messages with ``id=None`` (human + tool_result,
which don't receive an id until checkpoint persistence) get a
deterministic ``uuid5(NAMESPACE_URL, f"{thread_id}:{seq}")`` so
React keys stay stable across requests. AI messages keep their
LLM-assigned ``lc_run--*`` ids.
- **Legacy ``Command`` repr sanitization.** Rows captured before the
``journal.py`` ``on_tool_end`` fix (previous commit) stored
``str(Command(update={'messages': [ToolMessage(content='X', ...)]}))``
as the tool_result content. ``_sanitize_legacy_command_repr``
regex-extracts the inner text so old threads render cleanly.
- **Inline feedback.** When loading the stream, the helper also pulls
``feedback_repo.list_by_thread_grouped`` and attaches ``run_id`` to
every message plus ``feedback`` to the final ``ai_message`` of each
run. This removes the frontend's need to fetch a second endpoint
and positional-index-map its way back to the right run. When the
feedback subsystem is unavailable, the ``feedback`` field is left
absent entirely so the frontend hides the button rather than
rendering it over a broken write path.
- **User context.** ``DbRunEventStore`` is user-scoped by default via
``resolve_user_id(AUTO)``. The helper relies on the ``@require_permission``
decorator having populated the user contextvar on both callers; the
docstring documents this dependency explicitly so nobody wires it
into a CLI or migration script without passing ``user_id=None``.
Real data verification against thread
``6d30913e-dcd4-41c8-8941-f66c716cf359``: checkpoint showed 12 messages
(summarize-corrupted), event store had 16. The original human message
``"最新伊美局势"`` was preserved as seq=1 in the event store and
correctly restored to position 0 in the helper output. Helper output
for AI messages was byte-identical to checkpoint for every overlapping
message; only tool_result ids differed (patched to uuid5) and the
legacy Command repr at seq=48 was sanitized.
Tests:
- ``test_thread_state_event_store.py`` — 18 tests covering
``_sanitize_legacy_command_repr`` (passthrough, single/double-quote
extraction, unparseable fallback), helper happy path (all message
types, stable uuid5, store non-mutation), multi-page pagination,
summarize regression (recovers pre-summarize messages), feedback
attachment (per-run, multi-run threads, repo failure graceful),
and dependency failure fallback to ``None``.
Docs:
- ``docs/superpowers/plans/2026-04-10-event-store-history.md`` — the
implementation plan this commit realizes, with Task 1 revised after
the evaluation findings (pagination, copy-on-read, Command wrap
already landed in journal.py, frontend feedback pagination in the
follow-up commit, Standard-mode follow-up noted).
- ``docs/superpowers/specs/2026-04-11-runjournal-history-evaluation.md``
— the Claude + second-opinion evaluation document that drove the
plan revisions (pagination bug, dict-mutation bug, feedback hidden
bug, Command bug).
- ``docs/superpowers/specs/2026-04-11-summarize-marker-design.md`` —
design for a follow-up PR that visually marks summarize events in
history, based on a verified ``adispatch_custom_event`` experiment
(``trace=False`` middleware nodes can still forward the Pregel task
config via explicit signature injection).
Scope: Gateway mode only (``make dev-pro``). Standard mode
(``make dev``) hits LangGraph Server directly and bypasses these
endpoints; the summarize symptom is still present there and is
tracked as a separate follow-up in the plan.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(feedback): inline feedback on history and drop positional mapping
The old ``useThreadFeedback`` hook loaded ``GET /api/threads/{id}/messages?limit=200``
and built two parallel lookup tables: ``runIdByAiIndex`` (an ordinal array of
run_ids for every ``ai_message``-typed event) and ``feedbackByRunId``. The render
loop in ``message-list.tsx`` walked the AI messages in order, incrementing
``aiMessageIndex`` on each non-human message, and used that ordinal to look up
the run_id and feedback.
This shape had three latent bugs we could observe on real threads:
1. **Fetch was capped at 200 messages.** Long or tool-heavy threads silently
dropped earlier entries from the map, so feedback buttons could be missing
on messages they should own.
2. **Ordinal mismatch.** The render loop counted every non-human message
(including each intermediate ``ai_tool_call``), but ``runIdByAiIndex`` only
pushed entries for ``event_type == "ai_message"``. A run with 3 tool_calls
+ 1 final AI message would push 1 entry while the render consumed 4
positions, so buttons mapped to the wrong positions across multi-run
threads.
3. **Two parallel data paths.** The ``/history`` render path and the
``/messages`` feedback-lookup path could drift in-between an
``invalidateQueries`` call and the next refetch, producing transient
mismaps.
The previous commit moved the authoritative message source for history to
the event store and added ``run_id`` + ``feedback`` inline on each message
dict returned by ``_get_event_store_messages``. This commit aligns the
frontend with that contract:
- **Delete** ``useThreadFeedback``, ``ThreadFeedbackData``,
``runIdByAiIndex``, ``feedbackByRunId``, and ``fetchAllThreadMessages``.
- **Introduce** ``useThreadMessageEnrichment`` that fetches
``POST /history?limit=1`` once, indexes the returned messages by
``message.id`` into a ``Map<id, {run_id, feedback?}>``, and invalidates
on stream completion (``onFinish`` in ``useThreadStream``). Keying by
``message.id`` is stable across runs, tool_call chains, and summarize.
- **Simplify** ``message-list.tsx`` to drop the ``aiMessageIndex``
counter and read ``enrichment?.get(msg.id)`` at each render step.
- **Rewire** ``message-list-item.tsx`` so the feedback button renders
when ``feedback !== undefined`` rather than when the message happens
to be non-human. ``feedback`` is ``undefined`` for non-eligible
messages (humans, non-final AI, tools), ``null`` for the final
ai_message of an unrated run, and a ``FeedbackData`` object once
rated — cleanly distinguishing "not eligible" from "eligible but
unrated".
``/api/threads/{id}/messages`` is kept as a debug/export surface; no
frontend code calls it anymore but the backend router is untouched.
Validation:
- ``pnpm check`` clean (0 errors, 1 pre-existing unrelated warning)
- Live test on thread ``3d5dea4a`` after gateway restart confirmed the
original user query is restored to position 0 and the feedback
button behaves correctly on the final AI message.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(rebase): remove duplicate definitions and update stale module paths
Rebase left duplicate function blocks in worker.py (triple human_message
write causing 3x user messages in /history), deps.py, and prompt.py.
Also update checkpointer imports from the old deerflow.agents.checkpointer
path to deerflow.runtime.checkpointer, and clean up orphaned feedback
props in the frontend message components.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(rebase): restore FeedbackButtons component and enrichment lost during rebase
The FeedbackButtons component (defined inline in message-list-item.tsx)
was introduced in commit 95df8d13 but lost during rebase. The previous
rebase cleanup commit incorrectly removed the feedback/runId props and
enrichment hook as "orphaned code" instead of restoring the missing
component. This commit restores:
- FeedbackButtons component with thumbs up/down toggle and optimistic state
- FeedbackData/upsertFeedback/deleteFeedback imports
- feedback and runId props on MessageListItem
- useThreadMessageEnrichment hook and entry lookup in message-list.tsx
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(user-context): add DEFAULT_USER_ID and get_effective_user_id helper
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(paths): add user-aware path methods with optional user_id parameter
Add _validate_user_id(), user_dir(), user_memory_file(), user_agent_memory_file()
and optional keyword-only user_id parameter to all thread-related path methods.
When user_id is provided, paths resolve under users/{user_id}/threads/{thread_id}/;
when omitted, legacy layout is preserved for backward compatibility.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(memory): add user_id to MemoryStorage interface for per-user isolation
Thread user_id through MemoryStorage.load/reload/save abstract methods and
FileMemoryStorage, re-keying the in-memory cache from bare agent_name to a
(user_id, agent_name) tuple to prevent cross-user cache collisions.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(memory): thread user_id through memory updater layer
Add `user_id` keyword-only parameter to all public updater functions
(_save_memory_to_file, get_memory_data, reload_memory_data, import_memory_data,
clear_memory_data, create/delete/update_memory_fact) and regular keyword param
to MemoryUpdater.update_memory + update_memory_from_conversation, propagating
it to every storage load/save/reload call.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(memory): capture user_id at enqueue time for async-safe thread isolation
Add user_id field to ConversationContext and MemoryUpdateQueue.add() so the
user identity is stored explicitly at request time, before threading.Timer
fires on a different thread where ContextVar values do not propagate.
MemoryMiddleware.after_agent() now calls get_effective_user_id() at enqueue
time and passes the value through to updater.update_memory().
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(isolation): wire user_id through all Paths and memory callsites
Pass user_id=get_effective_user_id() at every callsite that invokes
Paths methods or memory functions, enabling per-user filesystem isolation
throughout the harness and app layers.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(migration): add idempotent script for per-user data migration
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: update CLAUDE.md and config docs for per-user isolation
* feat(events): add pagination to list_messages_by_run on all store backends
Replicates the existing before_seq/after_seq/limit cursor-pagination pattern
from list_messages onto list_messages_by_run across the abstract interface,
MemoryRunEventStore, JsonlRunEventStore, and DbRunEventStore.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(api): add GET /api/runs/{run_id}/messages with cursor pagination
New endpoint resolves thread_id from the run record and delegates to
RunEventStore.list_messages_by_run for cursor-based pagination.
Ownership is enforced implicitly via RunStore.get() user filtering.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(api): add GET /api/runs/{run_id}/feedback
Delegates to FeedbackRepository.list_by_run via the existing _resolve_run
helper; includes tests for success, 404, empty list, and 503 (no DB).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(api): retrofit cursor pagination onto GET /threads/{tid}/runs/{rid}/messages
Replace bare list[dict] response with {data: [...], has_more: bool} envelope,
forwarding limit/before_seq/after_seq query params to the event store.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: add run-level API endpoints to CLAUDE.md routers table
* refactor(threads): remove event-store message loader and feedback from state/history endpoints
State and history endpoints now return messages purely from the
checkpointer's channel_values. The _get_event_store_messages helper
(which loaded the full event-store transcript with feedback attached)
is removed along with its tests. Frontend will use the dedicated
GET /api/runs/{run_id}/messages and /feedback endpoints instead.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(persistence): add unified persistence layer with event store, token tracking, and feedback (#1930)
* feat(persistence): add SQLAlchemy 2.0 async ORM scaffold
Introduce a unified database configuration (DatabaseConfig) that
controls both the LangGraph checkpointer and the DeerFlow application
persistence layer from a single `database:` config section.
New modules:
- deerflow.config.database_config — Pydantic config with memory/sqlite/postgres backends
- deerflow.persistence — async engine lifecycle, DeclarativeBase with to_dict mixin, Alembic skeleton
- deerflow.runtime.runs.store — RunStore ABC + MemoryRunStore implementation
Gateway integration initializes/tears down the persistence engine in
the existing langgraph_runtime() context manager. Legacy checkpointer
config is preserved for backward compatibility.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(persistence): add RunEventStore ABC + MemoryRunEventStore
Phase 2-A prerequisite for event storage: adds the unified run event
stream interface (RunEventStore) with an in-memory implementation,
RunEventsConfig, gateway integration, and comprehensive tests (27 cases).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(persistence): add ORM models, repositories, DB/JSONL event stores, RunJournal, and API endpoints
Phase 2-B: run persistence + event storage + token tracking.
- ORM models: RunRow (with token fields), ThreadMetaRow, RunEventRow
- RunRepository implements RunStore ABC via SQLAlchemy ORM
- ThreadMetaRepository with owner access control
- DbRunEventStore with trace content truncation and cursor pagination
- JsonlRunEventStore with per-run files and seq recovery from disk
- RunJournal (BaseCallbackHandler) captures LLM/tool/lifecycle events,
accumulates token usage by caller type, buffers and flushes to store
- RunManager now accepts optional RunStore for persistent backing
- Worker creates RunJournal, writes human_message, injects callbacks
- Gateway deps use factory functions (RunRepository when DB available)
- New endpoints: messages, run messages, run events, token-usage
- ThreadCreateRequest gains assistant_id field
- 92 tests pass (33 new), zero regressions
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(persistence): add user feedback + follow-up run association
Phase 2-C: feedback and follow-up tracking.
- FeedbackRow ORM model (rating +1/-1, optional message_id, comment)
- FeedbackRepository with CRUD, list_by_run/thread, aggregate stats
- Feedback API endpoints: create, list, stats, delete
- follow_up_to_run_id in RunCreateRequest (explicit or auto-detected
from latest successful run on the thread)
- Worker writes follow_up_to_run_id into human_message event metadata
- Gateway deps: feedback_repo factory + getter
- 17 new tests (14 FeedbackRepository + 3 follow-up association)
- 109 total tests pass, zero regressions
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test+config: comprehensive Phase 2 test coverage + deprecate checkpointer config
- config.example.yaml: deprecate standalone checkpointer section, activate
unified database:sqlite as default (drives both checkpointer + app data)
- New: test_thread_meta_repo.py (14 tests) — full ThreadMetaRepository coverage
including check_access owner logic, list_by_owner pagination
- Extended test_run_repository.py (+4 tests) — completion preserves fields,
list ordering desc, limit, owner_none returns all
- Extended test_run_journal.py (+8 tests) — on_chain_error, track_tokens=false,
middleware no ai_message, unknown caller tokens, convenience fields,
tool_error, non-summarization custom event
- Extended test_run_event_store.py (+7 tests) — DB batch seq continuity,
make_run_event_store factory (memory/db/jsonl/fallback/unknown)
- Extended test_phase2b_integration.py (+4 tests) — create_or_reject persists,
follow-up metadata, summarization in history, full DB-backed lifecycle
- Fixed DB integration test to use proper fake objects (not MagicMock)
for JSON-serializable metadata
- 157 total Phase 2 tests pass, zero regressions
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* config: move default sqlite_dir to .deer-flow/data
Keep SQLite databases alongside other DeerFlow-managed data
(threads, memory) under the .deer-flow/ directory instead of a
top-level ./data folder.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(persistence): remove UTFJSON, use engine-level json_serializer + datetime.now()
- Replace custom UTFJSON type with standard sqlalchemy.JSON in all ORM
models. Add json_serializer=json.dumps(ensure_ascii=False) to all
create_async_engine calls so non-ASCII text (Chinese etc.) is stored
as-is in both SQLite and Postgres.
- Change ORM datetime defaults from datetime.now(UTC) to datetime.now(),
remove UTC imports.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(gateway): simplify deps.py with getter factory + inline repos
- Replace 6 identical getter functions with _require() factory.
- Inline 3 _make_*_repo() factories into langgraph_runtime(), call
get_session_factory() once instead of 3 times.
- Add thread_meta upsert in start_run (services.py).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(docker): add UV_EXTRAS build arg for optional dependencies
Support installing optional dependency groups (e.g. postgres) at
Docker build time via UV_EXTRAS build arg:
UV_EXTRAS=postgres docker compose build
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(journal): fix flush, token tracking, and consolidate tests
RunJournal fixes:
- _flush_sync: retain events in buffer when no event loop instead of
dropping them; worker's finally block flushes via async flush().
- on_llm_end: add tool_calls filter and caller=="lead_agent" guard for
ai_message events; mark message IDs for dedup with record_llm_usage.
- worker.py: persist completion data (tokens, message count) to RunStore
in finally block.
Model factory:
- Auto-inject stream_usage=True for BaseChatOpenAI subclasses with
custom api_base, so usage_metadata is populated in streaming responses.
Test consolidation:
- Delete test_phase2b_integration.py (redundant with existing tests).
- Move DB-backed lifecycle test into test_run_journal.py.
- Add tests for stream_usage injection in test_model_factory.py.
- Clean up executor/task_tool dead journal references.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(events): widen content type to str|dict in all store backends
Allow event content to be a dict (for structured OpenAI-format messages)
in addition to plain strings. Dict values are JSON-serialized for the DB
backend and deserialized on read; memory and JSONL backends handle dicts
natively. Trace truncation now serializes dicts to JSON before measuring.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(events): use metadata flag instead of heuristic for dict content detection
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(converters): add LangChain-to-OpenAI message format converters
Pure functions langchain_to_openai_message, langchain_to_openai_completion,
langchain_messages_to_openai, and _infer_finish_reason for converting
LangChain BaseMessage objects to OpenAI Chat Completions format, used by
RunJournal for event storage. 15 unit tests added.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(converters): handle empty list content as null, clean up test
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(events): human_message content uses OpenAI user message format
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(events): ai_message uses OpenAI format, add ai_tool_call message event
- ai_message content now uses {"role": "assistant", "content": "..."} format
- New ai_tool_call message event emitted when lead_agent LLM responds with tool_calls
- ai_tool_call uses langchain_to_openai_message converter for consistent format
- Both events include finish_reason in metadata ("stop" or "tool_calls")
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(events): add tool_result message event with OpenAI tool message format
Cache tool_call_id from on_tool_start keyed by run_id as fallback for on_tool_end,
then emit a tool_result message event (role=tool, tool_call_id, content) after each
successful tool completion.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(events): summary content uses OpenAI system message format
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(events): replace llm_start/llm_end with llm_request/llm_response in OpenAI format
Add on_chat_model_start to capture structured prompt messages as llm_request events.
Replace llm_end trace events with llm_response using OpenAI Chat Completions format.
Track llm_call_index to pair request/response events.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(events): add record_middleware method for middleware trace events
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test(events): add full run sequence integration test for OpenAI content format
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(events): align message events with checkpoint format and add middleware tag injection
- Message events (ai_message, ai_tool_call, tool_result, human_message) now use
BaseMessage.model_dump() format, matching LangGraph checkpoint values.messages
- on_tool_end extracts tool_call_id/name/status from ToolMessage objects
- on_tool_error now emits tool_result message events with error status
- record_middleware uses middleware:{tag} event_type and middleware category
- Summarization custom events use middleware:summarize category
- TitleMiddleware injects middleware:title tag via get_config() inheritance
- SummarizationMiddleware model bound with middleware:summarize tag
- Worker writes human_message using HumanMessage.model_dump()
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(threads): switch search endpoint to threads_meta table and sync title
- POST /api/threads/search now queries threads_meta table directly,
removing the two-phase Store + Checkpointer scan approach
- Add ThreadMetaRepository.search() with metadata/status filters
- Add ThreadMetaRepository.update_display_name() for title sync
- Worker syncs checkpoint title to threads_meta.display_name on run completion
- Map display_name to values.title in search response for API compatibility
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(threads): history endpoint reads messages from event store
- POST /api/threads/{thread_id}/history now combines two data sources:
checkpointer for checkpoint_id, metadata, title, thread_data;
event store for messages (complete history, not truncated by summarization)
- Strip internal LangGraph metadata keys from response
- Remove full channel_values serialization in favor of selective fields
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: remove duplicate optional-dependencies header in pyproject.toml
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(middleware): pass tagged config to TitleMiddleware ainvoke call
Without the config, the middleware:title tag was not injected,
causing the LLM response to be recorded as a lead_agent ai_message
in run_events.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: resolve merge conflict in .env.example
Keep both DATABASE_URL (from persistence-scaffold) and WECOM
credentials (from main) after the merge.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(persistence): address review feedback on PR #1851
- Fix naive datetime.now() → datetime.now(UTC) in all ORM models
- Fix seq race condition in DbRunEventStore.put() with FOR UPDATE
and UNIQUE(thread_id, seq) constraint
- Encapsulate _store access in RunManager.update_run_completion()
- Deduplicate _store.put() logic in RunManager via _persist_to_store()
- Add update_run_completion to RunStore ABC + MemoryRunStore
- Wire follow_up_to_run_id through the full create path
- Add error recovery to RunJournal._flush_sync() lost-event scenario
- Add migration note for search_threads breaking change
- Fix test_checkpointer_none_fix mock to set database=None
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: update uv.lock
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(persistence): address 22 review comments from CodeQL, Copilot, and Code Quality
Bug fixes:
- Sanitize log params to prevent log injection (CodeQL)
- Reset threads_meta.status to idle/error when run completes
- Attach messages only to latest checkpoint in /history response
- Write threads_meta on POST /threads so new threads appear in search
Lint fixes:
- Remove unused imports (journal.py, migrations/env.py, test_converters.py)
- Convert lambda to named function (engine.py, Ruff E731)
- Remove unused logger definitions in repos (Ruff F841)
- Add logging to JSONL decode errors and empty except blocks
- Separate assert side-effects in tests (CodeQL)
- Remove unused local variables in tests (Ruff F841)
- Fix max_trace_content truncation to use byte length, not char length
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* style: apply ruff format to persistence and runtime files
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Potential fix for pull request finding 'Statement has no effect'
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
* refactor(runtime): introduce RunContext to reduce run_agent parameter bloat
Extract checkpointer, store, event_store, run_events_config, thread_meta_repo,
and follow_up_to_run_id into a frozen RunContext dataclass. Add get_run_context()
in deps.py to build the base context from app.state singletons. start_run() uses
dataclasses.replace() to enrich per-run fields before passing ctx to run_agent.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(gateway): move sanitize_log_param to app/gateway/utils.py
Extract the log-injection sanitizer from routers/threads.py into a shared
utils module and rename to sanitize_log_param (public API). Eliminates the
reverse service → router import in services.py.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* perf: use SQL aggregation for feedback stats and thread token usage
Replace Python-side counting in FeedbackRepository.aggregate_by_run with
a single SELECT COUNT/SUM query. Add RunStore.aggregate_tokens_by_thread
abstract method with SQL GROUP BY implementation in RunRepository and
Python fallback in MemoryRunStore. Simplify the thread_token_usage
endpoint to delegate to the new method, eliminating the limit=10000
truncation risk.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: annotate DbRunEventStore.put() as low-frequency path
Add docstring clarifying that put() opens a per-call transaction with
FOR UPDATE and should only be used for infrequent writes (currently
just the initial human_message event). High-throughput callers should
use put_batch() instead.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(threads): fall back to Store search when ThreadMetaRepository is unavailable
When database.backend=memory (default) or no SQL session factory is
configured, search_threads now queries the LangGraph Store instead of
returning 503. Returns empty list if neither Store nor repo is available.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(persistence): introduce ThreadMetaStore ABC for backend-agnostic thread metadata
Add ThreadMetaStore abstract base class with create/get/search/update/delete
interface. ThreadMetaRepository (SQL) now inherits from it. New
MemoryThreadMetaStore wraps LangGraph BaseStore for memory-mode deployments.
deps.py now always provides a non-None thread_meta_repo, eliminating all
`if thread_meta_repo is not None` guards in services.py, worker.py, and
routers/threads.py. search_threads no longer needs a Store fallback branch.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(history): read messages from checkpointer instead of RunEventStore
The /history endpoint now reads messages directly from the
checkpointer's channel_values (the authoritative source) instead of
querying RunEventStore.list_messages(). The RunEventStore API is
preserved for other consumers.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(persistence): address new Copilot review comments
- feedback.py: validate thread_id/run_id before deleting feedback
- jsonl.py: add path traversal protection with ID validation
- run_repo.py: parse `before` to datetime for PostgreSQL compat
- thread_meta_repo.py: fix pagination when metadata filter is active
- database_config.py: use resolve_path for sqlite_dir consistency
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Implement skill self-evolution and skill_manage flow (#1874)
* chore: ignore .worktrees directory
* Add skill_manage self-evolution flow
* Fix CI regressions for skill_manage
* Address PR review feedback for skill evolution
* fix(skill-evolution): preserve history on delete
* fix(skill-evolution): tighten scanner fallbacks
* docs: add skill_manage e2e evidence screenshot
* fix(skill-manage): avoid blocking fs ops in session runtime
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix(config): resolve sqlite_dir relative to CWD, not Paths.base_dir
resolve_path() resolves relative to Paths.base_dir (.deer-flow),
which double-nested the path to .deer-flow/.deer-flow/data/app.db.
Use Path.resolve() (CWD-relative) instead.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Feature/feishu receive file (#1608)
* feat(feishu): add channel file materialization hook for inbound messages
- Introduce Channel.receive_file(msg, thread_id) as a base method for file materialization; default is no-op.
- Implement FeishuChannel.receive_file to download files/images from Feishu messages, save to sandbox, and inject virtual paths into msg.text.
- Update ChannelManager to call receive_file for any channel if msg.files is present, enabling downstream model access to user-uploaded files.
- No impact on Slack/Telegram or other channels (they inherit the default no-op).
* style(backend): format code with ruff for lint compliance
- Auto-formatted packages/harness/deerflow/agents/factory.py and tests/test_create_deerflow_agent.py using `ruff format`
- Ensured both files conform to project linting standards
- Fixes CI lint check failures caused by code style issues
* fix(feishu): handle file write operation asynchronously to prevent blocking
* fix(feishu): rename GetMessageResourceRequest to _GetMessageResourceRequest and remove redundant code
* test(feishu): add tests for receive_file method and placeholder replacement
* fix(manager): remove unnecessary type casting for channel retrieval
* fix(feishu): update logging messages to reflect resource handling instead of image
* fix(feishu): sanitize filename by replacing invalid characters in file uploads
* fix(feishu): improve filename sanitization and reorder image key handling in message processing
* fix(feishu): add thread lock to prevent filename conflicts during file downloads
* fix(test): correct bad merge in test_feishu_parser.py
* chore: run ruff and apply formatting cleanup
fix(feishu): preserve rich-text attachment order and improve fallback filename handling
* fix(docker): restore gateway env vars and fix langgraph empty arg issue (#1915)
Two production docker-compose.yaml bugs prevent `make up` from working:
1. Gateway missing DEER_FLOW_CONFIG_PATH and DEER_FLOW_EXTENSIONS_CONFIG_PATH
environment overrides. Added in fb2d99f (#1836) but accidentally reverted
by ca2fb95 (#1847). Without them, gateway reads host paths from .env via
env_file, causing FileNotFoundError inside the container.
2. Langgraph command fails when LANGGRAPH_ALLOW_BLOCKING is unset (default).
Empty $${allow_blocking} inserts a bare space between flags, causing
' --no-reload' to be parsed as unexpected extra argument. Fix by building
args string first and conditionally appending --allow-blocking.
Co-authored-by: cooper <cooperfu@tencent.com>
* fix(frontend): resolve invalid HTML nesting and tabnabbing vulnerabilities (#1904)
* fix(frontend): resolve invalid HTML nesting and tabnabbing vulnerabilities
Fix `<button>` inside `<a>` invalid HTML in artifact components and add
missing `noopener,noreferrer` to `window.open` calls to prevent reverse
tabnabbing.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(frontend): address Copilot review on tabnabbing and double-tab-open
Remove redundant parent onClick on web_fetch ChainOfThoughtStep to
prevent opening two tabs on link click, and explicitly null out
window.opener after window.open() for defensive tabnabbing hardening.
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* refactor(persistence): organize entities into per-entity directories
Restructure the persistence layer from horizontal "models/ + repositories/"
split into vertical entity-aligned directories. Each entity (thread_meta,
run, feedback) now owns its ORM model, abstract interface (where applicable),
and concrete implementations under a single directory with an aggregating
__init__.py for one-line imports.
Layout:
persistence/thread_meta/{base,model,sql,memory}.py
persistence/run/{model,sql}.py
persistence/feedback/{model,sql}.py
models/__init__.py is kept as a facade so Alembic autogenerate continues to
discover all ORM tables via Base.metadata. RunEventRow remains under
models/run_event.py because its storage implementation lives in
runtime/events/store/db.py and has no matching repository directory.
The repositories/ directory is removed entirely. All call sites in
gateway/deps.py and tests are updated to import from the new entity
packages, e.g.:
from deerflow.persistence.thread_meta import ThreadMetaRepository
from deerflow.persistence.run import RunRepository
from deerflow.persistence.feedback import FeedbackRepository
Full test suite passes (1690 passed, 14 skipped).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(gateway): sync thread rename and delete through ThreadMetaStore
The POST /threads/{id}/state endpoint previously synced title changes
only to the LangGraph Store via _store_upsert. In sqlite mode the search
endpoint reads from the ThreadMetaRepository SQL table, so renames never
appeared in /threads/search until the next agent run completed (worker.py
syncs title from checkpoint to thread_meta in its finally block).
Likewise the DELETE /threads/{id} endpoint cleaned up the filesystem,
Store, and checkpointer but left the threads_meta row orphaned in sqlite,
so deleted threads kept appearing in /threads/search.
Fix both endpoints by routing through the ThreadMetaStore abstraction
which already has the correct sqlite/memory implementations wired up by
deps.py. The rename path now calls update_display_name() and the delete
path calls delete() — both work uniformly across backends.
Verified end-to-end with curl in gateway mode against sqlite backend.
Existing test suite (1690 passed) and focused router/repo tests pass.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(gateway): route all thread metadata access through ThreadMetaStore
Following the rename/delete bug fix in PR1, migrate the remaining direct
LangGraph Store reads/writes in the threads router and services to the
ThreadMetaStore abstraction so that the sqlite and memory backends behave
identically and the legacy dual-write paths can be removed.
Migrated endpoints (threads.py):
- create_thread: idempotency check + write now use thread_meta_repo.get/create
instead of dual-writing the LangGraph Store and the SQL row.
- get_thread: reads from thread_meta_repo.get; the checkpoint-only fallback
for legacy threads is preserved.
- patch_thread: replaced _store_get/_store_put with thread_meta_repo.update_metadata.
- delete_thread_data: dropped the legacy store.adelete; thread_meta_repo.delete
already covers it.
Removed dead code (services.py):
- _upsert_thread_in_store — redundant with the immediately following
thread_meta_repo.create() call.
- _sync_thread_title_after_run — worker.py's finally block already syncs
the title via thread_meta_repo.update_display_name() after each run.
Removed dead code (threads.py):
- _store_get / _store_put / _store_upsert helpers (no remaining callers).
- THREADS_NS constant.
- get_store import (router no longer touches the LangGraph Store directly).
New abstract method:
- ThreadMetaStore.update_metadata(thread_id, metadata) merges metadata into
the thread's metadata field. Implemented in both ThreadMetaRepository (SQL,
read-modify-write inside one session) and MemoryThreadMetaStore. Three new
unit tests cover merge / empty / nonexistent behaviour.
Net change: -134 lines. Full test suite: 1693 passed, 14 skipped.
Verified end-to-end with curl in gateway mode against sqlite backend
(create / patch / get / rename / search / delete).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
Co-authored-by: DanielWalnut <45447813+hetaoBackend@users.noreply.github.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: JilongSun <965640067@qq.com>
Co-authored-by: jie <49781832+stan-fu@users.noreply.github.com>
Co-authored-by: cooper <cooperfu@tencent.com>
Co-authored-by: yangzheli <43645580+yangzheli@users.noreply.github.com>
* feat(auth): release-validation pass for 2.0-rc — 12 blockers + simplify follow-ups (#2008)
* feat(auth): introduce backend auth module
Port RFC-001 authentication core from PR #1728:
- JWT token handling (create_access_token, decode_token, TokenPayload)
- Password hashing (bcrypt) with verify_password
- SQLite UserRepository with base interface
- Provider Factory pattern (LocalAuthProvider)
- CLI reset_admin tool
- Auth-specific errors (AuthErrorCode, TokenError, AuthErrorResponse)
Deps:
- bcrypt>=4.0.0
- pyjwt>=2.9.0
- email-validator>=2.0.0
- backend/uv.toml pins public PyPI index
Tests: 12 pure unit tests (test_auth_config.py, test_auth_errors.py).
Scope note: authz.py, test_auth.py, and test_auth_type_system.py are
deferred to commit 2 because they depend on middleware and deps wiring
that is not yet in place. Commit 1 stays "pure new files only" as the
spec mandates.
* feat(auth): wire auth end-to-end (middleware + frontend replacement)
Backend:
- Port auth_middleware, csrf_middleware, langgraph_auth, routers/auth
- Port authz decorator (owner_filter_key defaults to 'owner_id')
- Merge app.py: register AuthMiddleware + CSRFMiddleware + CORS, add
_ensure_admin_user lifespan hook, _migrate_orphaned_threads helper,
register auth router
- Merge deps.py: add get_local_provider, get_current_user_from_request,
get_optional_user_from_request; keep get_current_user as thin str|None
adapter for feedback router
- langgraph.json: add auth path pointing to langgraph_auth.py:auth
- Rename metadata['user_id'] -> metadata['owner_id'] in langgraph_auth
(both metadata write and LangGraph filter dict) + test fixtures
Frontend:
- Delete better-auth library and api catch-all route
- Remove better-auth npm dependency and env vars (BETTER_AUTH_SECRET,
BETTER_AUTH_GITHUB_*) from env.js
- Port frontend/src/core/auth/* (AuthProvider, gateway-config,
proxy-policy, server-side getServerSideUser, types)
- Port frontend/src/core/api/fetcher.ts
- Port (auth)/layout, (auth)/login, (auth)/setup pages
- Rewrite workspace/layout.tsx as server component that calls
getServerSideUser and wraps in AuthProvider
- Port workspace/workspace-content.tsx for the client-side sidebar logic
Tests:
- Port 5 auth test files (test_auth, test_auth_middleware,
test_auth_type_system, test_ensure_admin, test_langgraph_auth)
- 176 auth tests PASS
After this commit: login/logout/registration flow works, but persistence
layer does not yet filter by owner_id. Commit 4 closes that gap.
* feat(auth): account settings page + i18n
- Port account-settings-page.tsx (change password, change email, logout)
- Wire into settings-dialog.tsx as new "account" section with UserIcon,
rendered first in the section list
- Add i18n keys:
- en-US/zh-CN: settings.sections.account ("Account" / "账号")
- en-US/zh-CN: button.logout ("Log out" / "退出登录")
- types.ts: matching type declarations
* feat(auth): enforce owner_id across 2.0-rc persistence layer
Add request-scoped contextvar-based owner filtering to threads_meta,
runs, run_events, and feedback repositories. Router code is unchanged
— isolation is enforced at the storage layer so that any caller that
forgets to pass owner_id still gets filtered results, and new routes
cannot accidentally leak data.
Core infrastructure
-------------------
- deerflow/runtime/user_context.py (new):
- ContextVar[CurrentUser | None] with default None
- runtime_checkable CurrentUser Protocol (structural subtype with .id)
- set/reset/get/require helpers
- AUTO sentinel + resolve_owner_id(value, method_name) for sentinel
three-state resolution: AUTO reads contextvar, explicit str
overrides, explicit None bypasses the filter (for migration/CLI)
Repository changes
------------------
- ThreadMetaRepository: create/get/search/update_*/delete gain
owner_id=AUTO kwarg; read paths filter by owner, writes stamp it,
mutations check ownership before applying
- RunRepository: put/get/list_by_thread/delete gain owner_id=AUTO kwarg
- FeedbackRepository: create/get/list_by_run/list_by_thread/delete
gain owner_id=AUTO kwarg
- DbRunEventStore: list_messages/list_events/list_messages_by_run/
count_messages/delete_by_thread/delete_by_run gain owner_id=AUTO
kwarg. Write paths (put/put_batch) read contextvar softly: when a
request-scoped user is available, owner_id is stamped; background
worker writes without a user context pass None which is valid
(orphan row to be bound by migration)
Schema
------
- persistence/models/run_event.py: RunEventRow.owner_id = Mapped[
str | None] = mapped_column(String(64), nullable=True, index=True)
- No alembic migration needed: 2.0 ships fresh, Base.metadata.create_all
picks up the new column automatically
Middleware
----------
- auth_middleware.py: after cookie check, call get_optional_user_from_
request to load the real User, stamp it into request.state.user AND
the contextvar via set_current_user, reset in a try/finally. Public
paths and unauthenticated requests continue without contextvar, and
@require_auth handles the strict 401 path
Test infrastructure
-------------------
- tests/conftest.py: @pytest.fixture(autouse=True) _auto_user_context
sets a default SimpleNamespace(id="test-user-autouse") on every test
unless marked @pytest.mark.no_auto_user. Keeps existing 20+
persistence tests passing without modification
- pyproject.toml [tool.pytest.ini_options]: register no_auto_user
marker so pytest does not emit warnings for opt-out tests
- tests/test_user_context.py: 6 tests covering three-state semantics,
Protocol duck typing, and require/optional APIs
- tests/test_thread_meta_repo.py: one test updated to pass owner_id=
None explicitly where it was previously relying on the old default
Test results
------------
- test_user_context.py: 6 passed
- test_auth*.py + test_langgraph_auth.py + test_ensure_admin.py: 127
- test_run_event_store / test_run_repository / test_thread_meta_repo
/ test_feedback: 92 passed
- Full backend suite: 1905 passed, 2 failed (both @requires_llm flaky
integration tests unrelated to auth), 1 skipped
* feat(auth): extend orphan migration to 2.0-rc persistence tables
_ensure_admin_user now runs a three-step pipeline on every boot:
Step 1 (fatal): admin user exists / is created / password is reset
Step 2 (non-fatal): LangGraph store orphan threads → admin
Step 3 (non-fatal): SQL persistence tables → admin
- threads_meta
- runs
- run_events
- feedback
Each step is idempotent. The fatal/non-fatal split mirrors PR #1728's
original philosophy: admin creation failure blocks startup (the system
is unusable without an admin), whereas migration failures log a warning
and let the service proceed (a partial migration is recoverable; a
missing admin is not).
Key helpers
-----------
- _iter_store_items(store, namespace, *, page_size=500):
async generator that cursor-paginates across LangGraph store pages.
Fixes PR #1728's hardcoded limit=1000 bug that would silently lose
orphans beyond the first page.
- _migrate_orphaned_threads(store, admin_user_id):
Rewritten to use _iter_store_items. Returns the migrated count so the
caller can log it; raises only on unhandled exceptions.
- _migrate_orphan_sql_tables(admin_user_id):
Imports the 4 ORM models lazily, grabs the shared session factory,
runs one UPDATE per table in a single transaction, commits once.
No-op when no persistence backend is configured (in-memory dev).
Tests: test_ensure_admin.py (8 passed)
* test(auth): port AUTH test plan docs + lint/format pass
- Port backend/docs/AUTH_TEST_PLAN.md and AUTH_UPGRADE.md from PR #1728
- Rename metadata.user_id → metadata.owner_id in AUTH_TEST_PLAN.md
(4 occurrences from the original PR doc)
- ruff auto-fix UP037 in sentinel type annotations: drop quotes around
"str | None | _AutoSentinel" now that from __future__ import
annotations makes them implicit string forms
- ruff format: 2 files (app/gateway/app.py, runtime/user_context.py)
Note on test coverage additions:
- conftest.py autouse fixture was already added in commit 4 (had to
be co-located with the repository changes to keep pre-existing
persistence tests passing)
- cross-user isolation E2E tests (test_owner_isolation.py) deferred
— enforcement is already proven by the 98-test repository suite
via the autouse fixture + explicit _AUTO sentinel exercises
- New test cases (TC-API-17..20, TC-ATK-13, TC-MIG-01..07) listed
in AUTH_TEST_PLAN.md are deferred to a follow-up PR — they are
manual-QA test cases rather than pytest code, and the spec-level
coverage is already met by test_user_context.py + the 98-test
repository suite.
Final test results:
- Auth suite (test_auth*, test_langgraph_auth, test_ensure_admin,
test_user_context): 186 passed
- Persistence suite (test_run_event_store, test_run_repository,
test_thread_meta_repo, test_feedback): 98 passed
- Lint: ruff check + ruff format both clean
* test(auth): add cross-user isolation test suite
10 tests exercising the storage-layer owner filter by manually
switching the user_context contextvar between two users. Verifies
the safety invariant:
After a repository write with owner_id=A, a subsequent read with
owner_id=B must not return the row, and vice versa.
Covers all 4 tables that own user-scoped data:
TC-API-17 threads_meta — read, search, update, delete cross-user
TC-API-18 runs — get, list_by_thread, delete cross-user
TC-API-19 run_events — list_messages, list_events, count_messages,
delete_by_thread (CRITICAL: raw conversation
content leak vector)
TC-API-20 feedback — get, list_by_run, delete cross-user
Plus two meta-tests verifying the sentinel pattern itself:
- AUTO + unset contextvar raises RuntimeError
- explicit owner_id=None bypasses the filter (migration escape hatch)
Architecture note
-----------------
These tests bypass the HTTP layer by design. The full chain
(cookie → middleware → contextvar → repository) is covered piecewise:
- test_auth_middleware.py: middleware sets contextvar from cookies
- test_owner_isolation.py: repositories enforce isolation when
contextvar is set to different users
Together they prove the end-to-end safety property without the
ceremony of spinning up a full TestClient + in-memory DB for every
router endpoint.
Tests pass: 231 (full auth + persistence + isolation suite)
Lint: clean
* refactor(auth): migrate user repository to SQLAlchemy ORM
Move the users table into the shared persistence engine so auth
matches the pattern of threads_meta, runs, run_events, and feedback —
one engine, one session factory, one schema init codepath.
New files
---------
- persistence/user/__init__.py, persistence/user/model.py: UserRow
ORM class with partial unique index on (oauth_provider, oauth_id)
- Registered in persistence/models/__init__.py so
Base.metadata.create_all() picks it up
Modified
--------
- auth/repositories/sqlite.py: rewritten as async SQLAlchemy,
identical constructor pattern to the other four repositories
(def __init__(self, session_factory) + self._sf = session_factory)
- auth/config.py: drop users_db_path field — storage is configured
through config.database like every other table
- deps.py/get_local_provider: construct SQLiteUserRepository with
the shared session factory, fail fast if engine is not initialised
- tests/test_auth.py: rewrite test_sqlite_round_trip_new_fields to
use the shared engine (init_engine + close_engine in a tempdir)
- tests/test_auth_type_system.py: add per-test autouse fixture that
spins up a scratch engine and resets deps._cached_* singletons
* refactor(auth): remove SQL orphan migration (unused in supported scenarios)
The _migrate_orphan_sql_tables helper existed to bind NULL owner_id
rows in threads_meta, runs, run_events, and feedback to the admin on
first boot. But in every supported upgrade path, it's a no-op:
1. Fresh install: create_all builds fresh tables, no legacy rows
2. No-auth → with-auth (no existing persistence DB): persistence
tables are created fresh by create_all, no legacy rows
3. No-auth → with-auth (has existing persistence DB from #1930):
NOT a supported upgrade path — "有 DB 到有 DB" schema evolution
is out of scope; users wipe DB or run manual ALTER
So the SQL orphan migration never has anything to do in the
supported matrix. Delete the function, simplify _ensure_admin_user
from a 3-step pipeline to a 2-step one (admin creation + LangGraph
store orphan migration only).
LangGraph store orphan migration stays: it serves the real
"no-auth → with-auth" upgrade path where a user's existing LangGraph
thread metadata has no owner_id field and needs to be stamped with
the newly-created admin's id.
Tests: 284 passed (auth + persistence + isolation)
Lint: clean
* security(auth): write initial admin password to 0600 file instead of logs
CodeQL py/clear-text-logging-sensitive-data flagged 3 call sites that
logged the auto-generated admin password to stdout via logger.info().
Production log aggregators (ELK/Splunk/etc) would have captured those
cleartext secrets. Replace with a shared helper that writes to
.deer-flow/admin_initial_credentials.txt with mode 0600, and log only
the path.
New file
--------
- app/gateway/auth/credential_file.py: write_initial_credentials()
helper. Takes email, password, and a "initial"/"reset" label.
Creates .deer-flow/ if missing, writes a header comment plus the
email+password, chmods 0o600, returns the absolute Path.
Modified
--------
- app/gateway/app.py: both _ensure_admin_user paths (fresh creation
+ needs_setup password reset) now write to file and log the path
- app/gateway/auth/reset_admin.py: rewritten to use the shared ORM
repo (SQLiteUserRepository with session_factory) and the
credential_file helper. The previous implementation was broken
after the earlier ORM refactor — it still imported _get_users_conn
and constructed SQLiteUserRepository() without a session factory.
No tests changed — the three password-log sites are all exercised
via existing test_ensure_admin.py which checks that startup
succeeds, not that a specific string appears in logs.
CodeQL alerts 272, 283, 284: all resolved.
* security(auth): strict JWT validation in middleware (fix junk cookie bypass)
AUTH_TEST_PLAN test 7.5.8 expects junk cookies to be rejected with
401. The previous middleware behaviour was "presence-only": check
that some access_token cookie exists, then pass through. In
combination with my Task-12 decision to skip @require_auth
decorators on routes, this created a gap where a request with any
cookie-shaped string (e.g. access_token=not-a-jwt) would bypass
authentication on routes that do not touch the repository
(/api/models, /api/mcp/config, /api/memory, /api/skills, …).
Fix: middleware now calls get_current_user_from_request() strictly
and catches the resulting HTTPException to render a 401 with the
proper fine-grained error code (token_invalid, token_expired,
user_not_found, …). On success it stamps request.state.user and
the contextvar so repository-layer owner filters work downstream.
The 4 old "_with_cookie_passes" tests in test_auth_middleware.py
were written for the presence-only behaviour; they asserted that
a junk cookie would make the handler return 200. They are renamed
to "_with_junk_cookie_rejected" and their assertions flipped to
401. The negative path (no cookie → 401 not_authenticated)
is unchanged.
Verified:
no cookie → 401 not_authenticated
junk cookie → 401 token_invalid (the fixed bug)
expired cookie → 401 token_expired
Tests: 284 passed (auth + persistence + isolation)
Lint: clean
* security(auth): wire @require_permission(owner_check=True) on isolation routes
Apply the require_permission decorator to all 28 routes that take a
{thread_id} path parameter. Combined with the strict middleware
(previous commit), this gives the double-layer protection that
AUTH_TEST_PLAN test 7.5.9 documents:
Layer 1 (AuthMiddleware): cookie + JWT validation, rejects junk
cookies and stamps request.state.user
Layer 2 (@require_permission with owner_check=True): per-resource
ownership verification via
ThreadMetaStore.check_access — returns
404 if a different user owns the thread
The decorator's owner_check branch is rewritten to use the SQL
thread_meta_repo (the 2.0-rc persistence layer) instead of the
LangGraph store path that PR #1728 used (_store_get / get_store
in routers/threads.py). The inject_record convenience is dropped
— no caller in 2.0 needs the LangGraph blob, and the SQL repo has
a different shape.
Routes decorated (28 total):
- threads.py: delete, patch, get, get-state, post-state, post-history
- thread_runs.py: post-runs, post-runs-stream, post-runs-wait,
list_runs, get_run, cancel_run, join_run, stream_existing_run,
list_thread_messages, list_run_messages, list_run_events,
thread_token_usage
- feedback.py: create, list, stats, delete
- uploads.py: upload (added Request param), list, delete
- artifacts.py: get_artifact
- suggestions.py: generate (renamed body parameter to avoid
conflict with FastAPI Request)
Test fixes:
- test_suggestions_router.py: bypass the decorator via __wrapped__
(the unit tests cover parsing logic, not auth — no point spinning
up a thread_meta_repo just to test JSON unwrapping)
- test_auth_middleware.py 4 fake-cookie tests: already updated in
the previous commit (745bf432)
Tests: 293 passed (auth + persistence + isolation + suggestions)
Lint: clean
* security(auth): defense-in-depth fixes from release validation pass
Eight findings caught while running the AUTH_TEST_PLAN end-to-end against
the deployed sg_dev stack. Each is a pre-condition for shipping
release/2.0-rc that the previous PRs missed.
Backend hardening
- routers/auth.py: rate limiter X-Real-IP now requires AUTH_TRUSTED_PROXIES
whitelist (CIDR/IP allowlist). Without nginx in front, the previous code
honored arbitrary X-Real-IP, letting an attacker rotate the header to
fully bypass the per-IP login lockout.
- routers/auth.py: 36-entry common-password blocklist via Pydantic
field_validator on RegisterRequest + ChangePasswordRequest. The shared
_validate_strong_password helper keeps the constraint in one place.
- routers/threads.py: ThreadCreateRequest + ThreadPatchRequest strip
server-reserved metadata keys (owner_id, user_id) via Pydantic
field_validator so a forged value can never round-trip back to other
clients reading the same thread. The actual ownership invariant stays
on the threads_meta row; this closes the metadata-blob echo gap.
- authz.py + thread_meta/sql.py: require_permission gains a require_existing
flag plumbed through check_access(require_existing=True). Destructive
routes (DELETE/PATCH/state-update/runs/feedback) now treat a missing
thread_meta row as 404 instead of "untracked legacy thread, allow",
closing the cross-user delete-idempotence gap where any user could
successfully DELETE another user's deleted thread.
- repositories/sqlite.py + base.py: update_user raises UserNotFoundError
on a vanished row instead of silently returning the input. Concurrent
delete during password reset can no longer look like a successful update.
- runtime/user_context.py: resolve_owner_id() coerces User.id (UUID) to
str at the contextvar boundary so SQLAlchemy String(64) columns can
bind it. The whole 2.0-rc isolation pipeline was previously broken
end-to-end (POST /api/threads → 500 "type 'UUID' is not supported").
- persistence/engine.py: SQLAlchemy listener enables PRAGMA journal_mode=WAL,
synchronous=NORMAL, foreign_keys=ON on every new SQLite connection.
TC-UPG-06 in the test plan expects WAL; previous code shipped with the
default 'delete' journal.
- auth_middleware.py: stamp request.state.auth = AuthContext(...) so
@require_permission's short-circuit fires; previously every isolation
request did a duplicate JWT decode + users SELECT. Also unifies the
401 payload through AuthErrorResponse(...).model_dump().
- app.py: _ensure_admin_user restructure removes the noqa F821 scoping
bug where 'password' was referenced outside the branch that defined it.
New _announce_credentials helper absorbs the duplicate log block in
the fresh-admin and reset-admin branches.
* fix(frontend+nginx): rollout CSRF on every state-changing client path
The frontend was 100% broken in gateway-pro mode for any user trying to
open a specific chat thread. Three cumulative bugs each silently
masked the next.
LangGraph SDK CSRF gap (api-client.ts)
- The Client constructor took only apiUrl, no defaultHeaders, no fetch
interceptor. The SDK's internal fetch never sent X-CSRF-Token, so
every state-changing /api/langgraph-compat/* call (runs/stream,
threads/search, threads/{tid}/history, ...) hit CSRFMiddleware and
got 403 before reaching the auth check. UI symptom: empty thread page
with no error message; the SPA's hooks swallowed the rejection.
- Fix: pass an onRequest hook that injects X-CSRF-Token from the
csrf_token cookie per request. Reading the cookie per call (not at
construction time) handles login / logout / password-change cookie
rotation transparently. The SDK's prepareFetchOptions calls
onRequest for both regular requests AND streaming/SSE/reconnect, so
the same hook covers runs.stream and runs.joinStream.
Raw fetch CSRF gap (7 files)
- Audit: 11 frontend fetch sites, only 2 included CSRF (login/setup +
account-settings change-password). The other 7 routed through raw
fetch() with no header — suggestions, memory, agents, mcp, skills,
uploads, and the local thread cleanup hook all 403'd silently.
- Fix: enhance fetcher.ts:fetchWithAuth to auto-inject X-CSRF-Token on
POST/PUT/DELETE/PATCH from a single shared readCsrfCookie() helper.
Convert all 7 raw fetch() callers to fetchWithAuth so the contract
is centrally enforced. api-client.ts and fetcher.ts share
readCsrfCookie + STATE_CHANGING_METHODS to avoid drift.
nginx routing + buffering (nginx.local.conf)
- The auth feature shipped without updating the nginx config: per-API
explicit location blocks but no /api/v1/auth/, /api/feedback, /api/runs.
The frontend's client-side fetches to /api/v1/auth/login/local 404'd
from the Next.js side because nginx routed /api/* to the frontend.
- Fix: add catch-all `location /api/` that proxies to the gateway.
nginx longest-prefix matching keeps the explicit blocks (/api/models,
/api/threads regex, /api/langgraph/, ...) winning for their paths.
- Fix: disable proxy_buffering + proxy_request_buffering for the
frontend `location /` block. Without it, nginx tries to spool large
Next.js chunks into /var/lib/nginx/proxy (root-owned) and fails with
Permission denied → ERR_INCOMPLETE_CHUNKED_ENCODING → ChunkLoadError.
* test(auth): release-validation test infra and new coverage
Test fixtures and unit tests added during the validation pass.
Router test helpers (NEW: tests/_router_auth_helpers.py)
- make_authed_test_app(): builds a FastAPI test app with a stub
middleware that stamps request.state.user + request.state.auth and a
permissive thread_meta_repo mock. TestClient-based router tests
(test_artifacts_router, test_threads_router) use it instead of bare
FastAPI() so the new @require_permission(owner_check=True) decorators
short-circuit cleanly.
- call_unwrapped(): walks the __wrapped__ chain to invoke the underlying
handler without going through the authz wrappers. Direct-call tests
(test_uploads_router) use it. Typed with ParamSpec so the wrapped
signature flows through.
Backend test additions
- test_auth.py: 7 tests for the new _get_client_ip trust model (no
proxy / trusted proxy / untrusted peer / XFF rejection / invalid
CIDR / no client). 5 tests for the password blocklist (literal,
case-insensitive, strong password accepted, change-password binding,
short-password length-check still fires before blocklist).
test_update_user_raises_when_row_concurrently_deleted: closes a
shipped-without-coverage gap on the new UserNotFoundError contract.
- test_thread_meta_repo.py: 4 tests for check_access(require_existing=True)
— strict missing-row denial, strict owner match, strict owner mismatch,
strict null-owner still allowed (shared rows survive the tightening).
- test_ensure_admin.py: 3 tests for _migrate_orphaned_threads /
_iter_store_items pagination, covering the TC-UPG-02 upgrade story
end-to-end via mock store. Closes the gap where the cursor pagination
was untested even though the previous PR rewrote it.
- test_threads_router.py: 5 tests for _strip_reserved_metadata
(owner_id removal, user_id removal, safe-keys passthrough, empty
input, both-stripped).
- test_auth_type_system.py: replace "password123" fixtures with
Tr0ub4dor3a / AnotherStr0ngPwd! so the new password blocklist
doesn't reject the test data.
* docs(auth): refresh TC-DOCKER-05 + document Docker validation gap
- AUTH_TEST_PLAN.md TC-DOCKER-05: the previous expectation
("admin password visible in docker logs") was stale after the simplify
pass that moved credentials to a 0600 file. The grep "Password:" check
would have silently failed and given a false sense of coverage. New
expectation matches the actual file-based path: 0600 file in
DEER_FLOW_HOME, log shows the path (not the secret), reverse-grep
asserts no leaked password in container logs.
- NEW: docs/AUTH_TEST_DOCKER_GAP.md documents the only un-executed
block in the test plan (TC-DOCKER-01..06). Reason: sg_dev validation
host has no Docker daemon installed. The doc maps each Docker case
to an already-validated bare-metal equivalent (TC-1.1, TC-REENT-01,
TC-API-02 etc.) so the gap is auditable, and includes pre-flight
reproduction steps for whoever has Docker available.
---------
Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com>
* fix(persistence): stream hang when run_events.backend=db
DbRunEventStore._user_id_from_context() returned user.id without
coercing it to str. User.id is a Pydantic UUID, and aiosqlite cannot
bind a raw UUID object to a VARCHAR column, so the INSERT for the
initial human_message event silently rolled back and raised out of
the worker task. Because that put() sat outside the worker's try
block, the finally-clause that publishes end-of-stream never ran
and the SSE stream hung forever.
jsonl mode was unaffected because json.dumps(default=str) coerces
UUID objects transparently.
Fixes:
- db.py: coerce user.id to str at the context-read boundary (matches
what resolve_user_id already does for the other repositories)
- worker.py: move RunJournal init + human_message put inside the try
block so any failure flows through the finally/publish_end path
instead of hanging the subscriber
Defense-in-depth:
- engine.py: add PRAGMA busy_timeout=5000 so checkpointer and event
store wait for each other on the shared deerflow.db file instead
of failing immediately under write-lock contention
- journal.py: skip fire-and-forget _flush_sync when a previous flush
task is still in flight, to avoid piling up concurrent put_batch
writes on the same SQLAlchemy engine during streaming; flush() now
waits for pending tasks before draining the buffer
- database_config.py: doc-only update clarifying WAL + busy_timeout
keep the unified deerflow.db safe for both workloads
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore(persistence): drop redundant busy_timeout PRAGMA
Python's sqlite3 driver defaults to a 5-second busy timeout via the
``timeout`` kwarg of ``sqlite3.connect``, and aiosqlite + SQLAlchemy's
aiosqlite dialect inherit that default. Setting ``PRAGMA busy_timeout=5000``
explicitly was a no-op — verified by reading back the PRAGMA on a fresh
connection (it already reports 5000ms without our PRAGMA).
Concurrent stress test (50 checkpoint writes + 20 event batches + 50
thread_meta updates on the same deerflow.db) still completes with zero
errors and 200/200 rows after removing the explicit PRAGMA.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(rebase): remove duplicate definitions and update stale module paths
Rebase left duplicate function blocks in worker.py (triple human_message
write causing 3x user messages in /history), deps.py, and prompt.py.
Also update checkpointer imports from the old deerflow.agents.checkpointer
path to deerflow.runtime.checkpointer, and clean up orphaned feedback
props in the frontend message components.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(user-context): add DEFAULT_USER_ID and get_effective_user_id helper
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(paths): add user-aware path methods with optional user_id parameter
Add _validate_user_id(), user_dir(), user_memory_file(), user_agent_memory_file()
and optional keyword-only user_id parameter to all thread-related path methods.
When user_id is provided, paths resolve under users/{user_id}/threads/{thread_id}/;
when omitted, legacy layout is preserved for backward compatibility.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(memory): add user_id to MemoryStorage interface for per-user isolation
Thread user_id through MemoryStorage.load/reload/save abstract methods and
FileMemoryStorage, re-keying the in-memory cache from bare agent_name to a
(user_id, agent_name) tuple to prevent cross-user cache collisions.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(memory): thread user_id through memory updater layer
Add `user_id` keyword-only parameter to all public updater functions
(_save_memory_to_file, get_memory_data, reload_memory_data, import_memory_data,
clear_memory_data, create/delete/update_memory_fact) and regular keyword param
to MemoryUpdater.update_memory + update_memory_from_conversation, propagating
it to every storage load/save/reload call.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(memory): capture user_id at enqueue time for async-safe thread isolation
Add user_id field to ConversationContext and MemoryUpdateQueue.add() so the
user identity is stored explicitly at request time, before threading.Timer
fires on a different thread where ContextVar values do not propagate.
MemoryMiddleware.after_agent() now calls get_effective_user_id() at enqueue
time and passes the value through to updater.update_memory().
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(isolation): wire user_id through all Paths and memory callsites
Pass user_id=get_effective_user_id() at every callsite that invokes
Paths methods or memory functions, enabling per-user filesystem isolation
throughout the harness and app layers.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(migration): add idempotent script for per-user data migration
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: update CLAUDE.md and config docs for per-user isolation
* feat(events): add pagination to list_messages_by_run on all store backends
Replicates the existing before_seq/after_seq/limit cursor-pagination pattern
from list_messages onto list_messages_by_run across the abstract interface,
MemoryRunEventStore, JsonlRunEventStore, and DbRunEventStore.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(api): add GET /api/runs/{run_id}/messages with cursor pagination
New endpoint resolves thread_id from the run record and delegates to
RunEventStore.list_messages_by_run for cursor-based pagination.
Ownership is enforced implicitly via RunStore.get() user filtering.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(api): add GET /api/runs/{run_id}/feedback
Delegates to FeedbackRepository.list_by_run via the existing _resolve_run
helper; includes tests for success, 404, empty list, and 503 (no DB).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(api): retrofit cursor pagination onto GET /threads/{tid}/runs/{rid}/messages
Replace bare list[dict] response with {data: [...], has_more: bool} envelope,
forwarding limit/before_seq/after_seq query params to the event store.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: add run-level API endpoints to CLAUDE.md routers table
* refactor(threads): remove event-store message loader and feedback from state/history endpoints
State and history endpoints now return messages purely from the
checkpointer's channel_values. The _get_event_store_messages helper
(which loaded the full event-store transcript with feedback attached)
is removed along with its tests. Frontend will use the dedicated
GET /api/runs/{run_id}/messages and /feedback endpoints instead.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
Co-authored-by: DanielWalnut <45447813+hetaoBackend@users.noreply.github.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: JilongSun <965640067@qq.com>
Co-authored-by: jie <49781832+stan-fu@users.noreply.github.com>
Co-authored-by: cooper <cooperfu@tencent.com>
Co-authored-by: yangzheli <43645580+yangzheli@users.noreply.github.com>
Co-authored-by: greatmengqi <chenmengqi.0376@gmail.com>
Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com>
- Added titles and descriptions to workspace usage, configuration, customization, design principles, installation, integration guide, lead agent, MCP integration, memory system, middleware, quick start, sandbox, skills, subagents, and tools documentation.
- Removed outdated API/Gateway reference and concepts glossary pages.
- Updated configuration reference to reflect current structure and removed unnecessary sections.
- Introduced new model provider documentation for Ark and updated the index page for model providers.
- Enhanced tutorials with titles and descriptions for better clarity and navigation.
* feat(persistence): add unified persistence layer with event store, token tracking, and feedback (#1930)
* feat(persistence): add SQLAlchemy 2.0 async ORM scaffold
Introduce a unified database configuration (DatabaseConfig) that
controls both the LangGraph checkpointer and the DeerFlow application
persistence layer from a single `database:` config section.
New modules:
- deerflow.config.database_config — Pydantic config with memory/sqlite/postgres backends
- deerflow.persistence — async engine lifecycle, DeclarativeBase with to_dict mixin, Alembic skeleton
- deerflow.runtime.runs.store — RunStore ABC + MemoryRunStore implementation
Gateway integration initializes/tears down the persistence engine in
the existing langgraph_runtime() context manager. Legacy checkpointer
config is preserved for backward compatibility.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(persistence): add RunEventStore ABC + MemoryRunEventStore
Phase 2-A prerequisite for event storage: adds the unified run event
stream interface (RunEventStore) with an in-memory implementation,
RunEventsConfig, gateway integration, and comprehensive tests (27 cases).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(persistence): add ORM models, repositories, DB/JSONL event stores, RunJournal, and API endpoints
Phase 2-B: run persistence + event storage + token tracking.
- ORM models: RunRow (with token fields), ThreadMetaRow, RunEventRow
- RunRepository implements RunStore ABC via SQLAlchemy ORM
- ThreadMetaRepository with owner access control
- DbRunEventStore with trace content truncation and cursor pagination
- JsonlRunEventStore with per-run files and seq recovery from disk
- RunJournal (BaseCallbackHandler) captures LLM/tool/lifecycle events,
accumulates token usage by caller type, buffers and flushes to store
- RunManager now accepts optional RunStore for persistent backing
- Worker creates RunJournal, writes human_message, injects callbacks
- Gateway deps use factory functions (RunRepository when DB available)
- New endpoints: messages, run messages, run events, token-usage
- ThreadCreateRequest gains assistant_id field
- 92 tests pass (33 new), zero regressions
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(persistence): add user feedback + follow-up run association
Phase 2-C: feedback and follow-up tracking.
- FeedbackRow ORM model (rating +1/-1, optional message_id, comment)
- FeedbackRepository with CRUD, list_by_run/thread, aggregate stats
- Feedback API endpoints: create, list, stats, delete
- follow_up_to_run_id in RunCreateRequest (explicit or auto-detected
from latest successful run on the thread)
- Worker writes follow_up_to_run_id into human_message event metadata
- Gateway deps: feedback_repo factory + getter
- 17 new tests (14 FeedbackRepository + 3 follow-up association)
- 109 total tests pass, zero regressions
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test+config: comprehensive Phase 2 test coverage + deprecate checkpointer config
- config.example.yaml: deprecate standalone checkpointer section, activate
unified database:sqlite as default (drives both checkpointer + app data)
- New: test_thread_meta_repo.py (14 tests) — full ThreadMetaRepository coverage
including check_access owner logic, list_by_owner pagination
- Extended test_run_repository.py (+4 tests) — completion preserves fields,
list ordering desc, limit, owner_none returns all
- Extended test_run_journal.py (+8 tests) — on_chain_error, track_tokens=false,
middleware no ai_message, unknown caller tokens, convenience fields,
tool_error, non-summarization custom event
- Extended test_run_event_store.py (+7 tests) — DB batch seq continuity,
make_run_event_store factory (memory/db/jsonl/fallback/unknown)
- Extended test_phase2b_integration.py (+4 tests) — create_or_reject persists,
follow-up metadata, summarization in history, full DB-backed lifecycle
- Fixed DB integration test to use proper fake objects (not MagicMock)
for JSON-serializable metadata
- 157 total Phase 2 tests pass, zero regressions
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* config: move default sqlite_dir to .deer-flow/data
Keep SQLite databases alongside other DeerFlow-managed data
(threads, memory) under the .deer-flow/ directory instead of a
top-level ./data folder.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(persistence): remove UTFJSON, use engine-level json_serializer + datetime.now()
- Replace custom UTFJSON type with standard sqlalchemy.JSON in all ORM
models. Add json_serializer=json.dumps(ensure_ascii=False) to all
create_async_engine calls so non-ASCII text (Chinese etc.) is stored
as-is in both SQLite and Postgres.
- Change ORM datetime defaults from datetime.now(UTC) to datetime.now(),
remove UTC imports.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(gateway): simplify deps.py with getter factory + inline repos
- Replace 6 identical getter functions with _require() factory.
- Inline 3 _make_*_repo() factories into langgraph_runtime(), call
get_session_factory() once instead of 3 times.
- Add thread_meta upsert in start_run (services.py).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(docker): add UV_EXTRAS build arg for optional dependencies
Support installing optional dependency groups (e.g. postgres) at
Docker build time via UV_EXTRAS build arg:
UV_EXTRAS=postgres docker compose build
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(journal): fix flush, token tracking, and consolidate tests
RunJournal fixes:
- _flush_sync: retain events in buffer when no event loop instead of
dropping them; worker's finally block flushes via async flush().
- on_llm_end: add tool_calls filter and caller=="lead_agent" guard for
ai_message events; mark message IDs for dedup with record_llm_usage.
- worker.py: persist completion data (tokens, message count) to RunStore
in finally block.
Model factory:
- Auto-inject stream_usage=True for BaseChatOpenAI subclasses with
custom api_base, so usage_metadata is populated in streaming responses.
Test consolidation:
- Delete test_phase2b_integration.py (redundant with existing tests).
- Move DB-backed lifecycle test into test_run_journal.py.
- Add tests for stream_usage injection in test_model_factory.py.
- Clean up executor/task_tool dead journal references.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(events): widen content type to str|dict in all store backends
Allow event content to be a dict (for structured OpenAI-format messages)
in addition to plain strings. Dict values are JSON-serialized for the DB
backend and deserialized on read; memory and JSONL backends handle dicts
natively. Trace truncation now serializes dicts to JSON before measuring.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(events): use metadata flag instead of heuristic for dict content detection
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(converters): add LangChain-to-OpenAI message format converters
Pure functions langchain_to_openai_message, langchain_to_openai_completion,
langchain_messages_to_openai, and _infer_finish_reason for converting
LangChain BaseMessage objects to OpenAI Chat Completions format, used by
RunJournal for event storage. 15 unit tests added.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(converters): handle empty list content as null, clean up test
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(events): human_message content uses OpenAI user message format
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(events): ai_message uses OpenAI format, add ai_tool_call message event
- ai_message content now uses {"role": "assistant", "content": "..."} format
- New ai_tool_call message event emitted when lead_agent LLM responds with tool_calls
- ai_tool_call uses langchain_to_openai_message converter for consistent format
- Both events include finish_reason in metadata ("stop" or "tool_calls")
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(events): add tool_result message event with OpenAI tool message format
Cache tool_call_id from on_tool_start keyed by run_id as fallback for on_tool_end,
then emit a tool_result message event (role=tool, tool_call_id, content) after each
successful tool completion.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(events): summary content uses OpenAI system message format
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(events): replace llm_start/llm_end with llm_request/llm_response in OpenAI format
Add on_chat_model_start to capture structured prompt messages as llm_request events.
Replace llm_end trace events with llm_response using OpenAI Chat Completions format.
Track llm_call_index to pair request/response events.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(events): add record_middleware method for middleware trace events
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test(events): add full run sequence integration test for OpenAI content format
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(events): align message events with checkpoint format and add middleware tag injection
- Message events (ai_message, ai_tool_call, tool_result, human_message) now use
BaseMessage.model_dump() format, matching LangGraph checkpoint values.messages
- on_tool_end extracts tool_call_id/name/status from ToolMessage objects
- on_tool_error now emits tool_result message events with error status
- record_middleware uses middleware:{tag} event_type and middleware category
- Summarization custom events use middleware:summarize category
- TitleMiddleware injects middleware:title tag via get_config() inheritance
- SummarizationMiddleware model bound with middleware:summarize tag
- Worker writes human_message using HumanMessage.model_dump()
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(threads): switch search endpoint to threads_meta table and sync title
- POST /api/threads/search now queries threads_meta table directly,
removing the two-phase Store + Checkpointer scan approach
- Add ThreadMetaRepository.search() with metadata/status filters
- Add ThreadMetaRepository.update_display_name() for title sync
- Worker syncs checkpoint title to threads_meta.display_name on run completion
- Map display_name to values.title in search response for API compatibility
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(threads): history endpoint reads messages from event store
- POST /api/threads/{thread_id}/history now combines two data sources:
checkpointer for checkpoint_id, metadata, title, thread_data;
event store for messages (complete history, not truncated by summarization)
- Strip internal LangGraph metadata keys from response
- Remove full channel_values serialization in favor of selective fields
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: remove duplicate optional-dependencies header in pyproject.toml
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(middleware): pass tagged config to TitleMiddleware ainvoke call
Without the config, the middleware:title tag was not injected,
causing the LLM response to be recorded as a lead_agent ai_message
in run_events.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: resolve merge conflict in .env.example
Keep both DATABASE_URL (from persistence-scaffold) and WECOM
credentials (from main) after the merge.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(persistence): address review feedback on PR #1851
- Fix naive datetime.now() → datetime.now(UTC) in all ORM models
- Fix seq race condition in DbRunEventStore.put() with FOR UPDATE
and UNIQUE(thread_id, seq) constraint
- Encapsulate _store access in RunManager.update_run_completion()
- Deduplicate _store.put() logic in RunManager via _persist_to_store()
- Add update_run_completion to RunStore ABC + MemoryRunStore
- Wire follow_up_to_run_id through the full create path
- Add error recovery to RunJournal._flush_sync() lost-event scenario
- Add migration note for search_threads breaking change
- Fix test_checkpointer_none_fix mock to set database=None
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: update uv.lock
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(persistence): address 22 review comments from CodeQL, Copilot, and Code Quality
Bug fixes:
- Sanitize log params to prevent log injection (CodeQL)
- Reset threads_meta.status to idle/error when run completes
- Attach messages only to latest checkpoint in /history response
- Write threads_meta on POST /threads so new threads appear in search
Lint fixes:
- Remove unused imports (journal.py, migrations/env.py, test_converters.py)
- Convert lambda to named function (engine.py, Ruff E731)
- Remove unused logger definitions in repos (Ruff F841)
- Add logging to JSONL decode errors and empty except blocks
- Separate assert side-effects in tests (CodeQL)
- Remove unused local variables in tests (Ruff F841)
- Fix max_trace_content truncation to use byte length, not char length
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* style: apply ruff format to persistence and runtime files
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Potential fix for pull request finding 'Statement has no effect'
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
* refactor(runtime): introduce RunContext to reduce run_agent parameter bloat
Extract checkpointer, store, event_store, run_events_config, thread_meta_repo,
and follow_up_to_run_id into a frozen RunContext dataclass. Add get_run_context()
in deps.py to build the base context from app.state singletons. start_run() uses
dataclasses.replace() to enrich per-run fields before passing ctx to run_agent.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(gateway): move sanitize_log_param to app/gateway/utils.py
Extract the log-injection sanitizer from routers/threads.py into a shared
utils module and rename to sanitize_log_param (public API). Eliminates the
reverse service → router import in services.py.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* perf: use SQL aggregation for feedback stats and thread token usage
Replace Python-side counting in FeedbackRepository.aggregate_by_run with
a single SELECT COUNT/SUM query. Add RunStore.aggregate_tokens_by_thread
abstract method with SQL GROUP BY implementation in RunRepository and
Python fallback in MemoryRunStore. Simplify the thread_token_usage
endpoint to delegate to the new method, eliminating the limit=10000
truncation risk.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: annotate DbRunEventStore.put() as low-frequency path
Add docstring clarifying that put() opens a per-call transaction with
FOR UPDATE and should only be used for infrequent writes (currently
just the initial human_message event). High-throughput callers should
use put_batch() instead.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(threads): fall back to Store search when ThreadMetaRepository is unavailable
When database.backend=memory (default) or no SQL session factory is
configured, search_threads now queries the LangGraph Store instead of
returning 503. Returns empty list if neither Store nor repo is available.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(persistence): introduce ThreadMetaStore ABC for backend-agnostic thread metadata
Add ThreadMetaStore abstract base class with create/get/search/update/delete
interface. ThreadMetaRepository (SQL) now inherits from it. New
MemoryThreadMetaStore wraps LangGraph BaseStore for memory-mode deployments.
deps.py now always provides a non-None thread_meta_repo, eliminating all
`if thread_meta_repo is not None` guards in services.py, worker.py, and
routers/threads.py. search_threads no longer needs a Store fallback branch.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(history): read messages from checkpointer instead of RunEventStore
The /history endpoint now reads messages directly from the
checkpointer's channel_values (the authoritative source) instead of
querying RunEventStore.list_messages(). The RunEventStore API is
preserved for other consumers.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(persistence): address new Copilot review comments
- feedback.py: validate thread_id/run_id before deleting feedback
- jsonl.py: add path traversal protection with ID validation
- run_repo.py: parse `before` to datetime for PostgreSQL compat
- thread_meta_repo.py: fix pagination when metadata filter is active
- database_config.py: use resolve_path for sqlite_dir consistency
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Implement skill self-evolution and skill_manage flow (#1874)
* chore: ignore .worktrees directory
* Add skill_manage self-evolution flow
* Fix CI regressions for skill_manage
* Address PR review feedback for skill evolution
* fix(skill-evolution): preserve history on delete
* fix(skill-evolution): tighten scanner fallbacks
* docs: add skill_manage e2e evidence screenshot
* fix(skill-manage): avoid blocking fs ops in session runtime
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix(config): resolve sqlite_dir relative to CWD, not Paths.base_dir
resolve_path() resolves relative to Paths.base_dir (.deer-flow),
which double-nested the path to .deer-flow/.deer-flow/data/app.db.
Use Path.resolve() (CWD-relative) instead.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Feature/feishu receive file (#1608)
* feat(feishu): add channel file materialization hook for inbound messages
- Introduce Channel.receive_file(msg, thread_id) as a base method for file materialization; default is no-op.
- Implement FeishuChannel.receive_file to download files/images from Feishu messages, save to sandbox, and inject virtual paths into msg.text.
- Update ChannelManager to call receive_file for any channel if msg.files is present, enabling downstream model access to user-uploaded files.
- No impact on Slack/Telegram or other channels (they inherit the default no-op).
* style(backend): format code with ruff for lint compliance
- Auto-formatted packages/harness/deerflow/agents/factory.py and tests/test_create_deerflow_agent.py using `ruff format`
- Ensured both files conform to project linting standards
- Fixes CI lint check failures caused by code style issues
* fix(feishu): handle file write operation asynchronously to prevent blocking
* fix(feishu): rename GetMessageResourceRequest to _GetMessageResourceRequest and remove redundant code
* test(feishu): add tests for receive_file method and placeholder replacement
* fix(manager): remove unnecessary type casting for channel retrieval
* fix(feishu): update logging messages to reflect resource handling instead of image
* fix(feishu): sanitize filename by replacing invalid characters in file uploads
* fix(feishu): improve filename sanitization and reorder image key handling in message processing
* fix(feishu): add thread lock to prevent filename conflicts during file downloads
* fix(test): correct bad merge in test_feishu_parser.py
* chore: run ruff and apply formatting cleanup
fix(feishu): preserve rich-text attachment order and improve fallback filename handling
* fix(docker): restore gateway env vars and fix langgraph empty arg issue (#1915)
Two production docker-compose.yaml bugs prevent `make up` from working:
1. Gateway missing DEER_FLOW_CONFIG_PATH and DEER_FLOW_EXTENSIONS_CONFIG_PATH
environment overrides. Added in fb2d99f (#1836) but accidentally reverted
by ca2fb95 (#1847). Without them, gateway reads host paths from .env via
env_file, causing FileNotFoundError inside the container.
2. Langgraph command fails when LANGGRAPH_ALLOW_BLOCKING is unset (default).
Empty $${allow_blocking} inserts a bare space between flags, causing
' --no-reload' to be parsed as unexpected extra argument. Fix by building
args string first and conditionally appending --allow-blocking.
Co-authored-by: cooper <cooperfu@tencent.com>
* fix(frontend): resolve invalid HTML nesting and tabnabbing vulnerabilities (#1904)
* fix(frontend): resolve invalid HTML nesting and tabnabbing vulnerabilities
Fix `<button>` inside `<a>` invalid HTML in artifact components and add
missing `noopener,noreferrer` to `window.open` calls to prevent reverse
tabnabbing.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(frontend): address Copilot review on tabnabbing and double-tab-open
Remove redundant parent onClick on web_fetch ChainOfThoughtStep to
prevent opening two tabs on link click, and explicitly null out
window.opener after window.open() for defensive tabnabbing hardening.
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* refactor(persistence): organize entities into per-entity directories
Restructure the persistence layer from horizontal "models/ + repositories/"
split into vertical entity-aligned directories. Each entity (thread_meta,
run, feedback) now owns its ORM model, abstract interface (where applicable),
and concrete implementations under a single directory with an aggregating
__init__.py for one-line imports.
Layout:
persistence/thread_meta/{base,model,sql,memory}.py
persistence/run/{model,sql}.py
persistence/feedback/{model,sql}.py
models/__init__.py is kept as a facade so Alembic autogenerate continues to
discover all ORM tables via Base.metadata. RunEventRow remains under
models/run_event.py because its storage implementation lives in
runtime/events/store/db.py and has no matching repository directory.
The repositories/ directory is removed entirely. All call sites in
gateway/deps.py and tests are updated to import from the new entity
packages, e.g.:
from deerflow.persistence.thread_meta import ThreadMetaRepository
from deerflow.persistence.run import RunRepository
from deerflow.persistence.feedback import FeedbackRepository
Full test suite passes (1690 passed, 14 skipped).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(gateway): sync thread rename and delete through ThreadMetaStore
The POST /threads/{id}/state endpoint previously synced title changes
only to the LangGraph Store via _store_upsert. In sqlite mode the search
endpoint reads from the ThreadMetaRepository SQL table, so renames never
appeared in /threads/search until the next agent run completed (worker.py
syncs title from checkpoint to thread_meta in its finally block).
Likewise the DELETE /threads/{id} endpoint cleaned up the filesystem,
Store, and checkpointer but left the threads_meta row orphaned in sqlite,
so deleted threads kept appearing in /threads/search.
Fix both endpoints by routing through the ThreadMetaStore abstraction
which already has the correct sqlite/memory implementations wired up by
deps.py. The rename path now calls update_display_name() and the delete
path calls delete() — both work uniformly across backends.
Verified end-to-end with curl in gateway mode against sqlite backend.
Existing test suite (1690 passed) and focused router/repo tests pass.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(gateway): route all thread metadata access through ThreadMetaStore
Following the rename/delete bug fix in PR1, migrate the remaining direct
LangGraph Store reads/writes in the threads router and services to the
ThreadMetaStore abstraction so that the sqlite and memory backends behave
identically and the legacy dual-write paths can be removed.
Migrated endpoints (threads.py):
- create_thread: idempotency check + write now use thread_meta_repo.get/create
instead of dual-writing the LangGraph Store and the SQL row.
- get_thread: reads from thread_meta_repo.get; the checkpoint-only fallback
for legacy threads is preserved.
- patch_thread: replaced _store_get/_store_put with thread_meta_repo.update_metadata.
- delete_thread_data: dropped the legacy store.adelete; thread_meta_repo.delete
already covers it.
Removed dead code (services.py):
- _upsert_thread_in_store — redundant with the immediately following
thread_meta_repo.create() call.
- _sync_thread_title_after_run — worker.py's finally block already syncs
the title via thread_meta_repo.update_display_name() after each run.
Removed dead code (threads.py):
- _store_get / _store_put / _store_upsert helpers (no remaining callers).
- THREADS_NS constant.
- get_store import (router no longer touches the LangGraph Store directly).
New abstract method:
- ThreadMetaStore.update_metadata(thread_id, metadata) merges metadata into
the thread's metadata field. Implemented in both ThreadMetaRepository (SQL,
read-modify-write inside one session) and MemoryThreadMetaStore. Three new
unit tests cover merge / empty / nonexistent behaviour.
Net change: -134 lines. Full test suite: 1693 passed, 14 skipped.
Verified end-to-end with curl in gateway mode against sqlite backend
(create / patch / get / rename / search / delete).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
Co-authored-by: DanielWalnut <45447813+hetaoBackend@users.noreply.github.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: JilongSun <965640067@qq.com>
Co-authored-by: jie <49781832+stan-fu@users.noreply.github.com>
Co-authored-by: cooper <cooperfu@tencent.com>
Co-authored-by: yangzheli <43645580+yangzheli@users.noreply.github.com>
* feat(auth): release-validation pass for 2.0-rc — 12 blockers + simplify follow-ups (#2008)
* feat(auth): introduce backend auth module
Port RFC-001 authentication core from PR #1728:
- JWT token handling (create_access_token, decode_token, TokenPayload)
- Password hashing (bcrypt) with verify_password
- SQLite UserRepository with base interface
- Provider Factory pattern (LocalAuthProvider)
- CLI reset_admin tool
- Auth-specific errors (AuthErrorCode, TokenError, AuthErrorResponse)
Deps:
- bcrypt>=4.0.0
- pyjwt>=2.9.0
- email-validator>=2.0.0
- backend/uv.toml pins public PyPI index
Tests: 12 pure unit tests (test_auth_config.py, test_auth_errors.py).
Scope note: authz.py, test_auth.py, and test_auth_type_system.py are
deferred to commit 2 because they depend on middleware and deps wiring
that is not yet in place. Commit 1 stays "pure new files only" as the
spec mandates.
* feat(auth): wire auth end-to-end (middleware + frontend replacement)
Backend:
- Port auth_middleware, csrf_middleware, langgraph_auth, routers/auth
- Port authz decorator (owner_filter_key defaults to 'owner_id')
- Merge app.py: register AuthMiddleware + CSRFMiddleware + CORS, add
_ensure_admin_user lifespan hook, _migrate_orphaned_threads helper,
register auth router
- Merge deps.py: add get_local_provider, get_current_user_from_request,
get_optional_user_from_request; keep get_current_user as thin str|None
adapter for feedback router
- langgraph.json: add auth path pointing to langgraph_auth.py:auth
- Rename metadata['user_id'] -> metadata['owner_id'] in langgraph_auth
(both metadata write and LangGraph filter dict) + test fixtures
Frontend:
- Delete better-auth library and api catch-all route
- Remove better-auth npm dependency and env vars (BETTER_AUTH_SECRET,
BETTER_AUTH_GITHUB_*) from env.js
- Port frontend/src/core/auth/* (AuthProvider, gateway-config,
proxy-policy, server-side getServerSideUser, types)
- Port frontend/src/core/api/fetcher.ts
- Port (auth)/layout, (auth)/login, (auth)/setup pages
- Rewrite workspace/layout.tsx as server component that calls
getServerSideUser and wraps in AuthProvider
- Port workspace/workspace-content.tsx for the client-side sidebar logic
Tests:
- Port 5 auth test files (test_auth, test_auth_middleware,
test_auth_type_system, test_ensure_admin, test_langgraph_auth)
- 176 auth tests PASS
After this commit: login/logout/registration flow works, but persistence
layer does not yet filter by owner_id. Commit 4 closes that gap.
* feat(auth): account settings page + i18n
- Port account-settings-page.tsx (change password, change email, logout)
- Wire into settings-dialog.tsx as new "account" section with UserIcon,
rendered first in the section list
- Add i18n keys:
- en-US/zh-CN: settings.sections.account ("Account" / "账号")
- en-US/zh-CN: button.logout ("Log out" / "退出登录")
- types.ts: matching type declarations
* feat(auth): enforce owner_id across 2.0-rc persistence layer
Add request-scoped contextvar-based owner filtering to threads_meta,
runs, run_events, and feedback repositories. Router code is unchanged
— isolation is enforced at the storage layer so that any caller that
forgets to pass owner_id still gets filtered results, and new routes
cannot accidentally leak data.
Core infrastructure
-------------------
- deerflow/runtime/user_context.py (new):
- ContextVar[CurrentUser | None] with default None
- runtime_checkable CurrentUser Protocol (structural subtype with .id)
- set/reset/get/require helpers
- AUTO sentinel + resolve_owner_id(value, method_name) for sentinel
three-state resolution: AUTO reads contextvar, explicit str
overrides, explicit None bypasses the filter (for migration/CLI)
Repository changes
------------------
- ThreadMetaRepository: create/get/search/update_*/delete gain
owner_id=AUTO kwarg; read paths filter by owner, writes stamp it,
mutations check ownership before applying
- RunRepository: put/get/list_by_thread/delete gain owner_id=AUTO kwarg
- FeedbackRepository: create/get/list_by_run/list_by_thread/delete
gain owner_id=AUTO kwarg
- DbRunEventStore: list_messages/list_events/list_messages_by_run/
count_messages/delete_by_thread/delete_by_run gain owner_id=AUTO
kwarg. Write paths (put/put_batch) read contextvar softly: when a
request-scoped user is available, owner_id is stamped; background
worker writes without a user context pass None which is valid
(orphan row to be bound by migration)
Schema
------
- persistence/models/run_event.py: RunEventRow.owner_id = Mapped[
str | None] = mapped_column(String(64), nullable=True, index=True)
- No alembic migration needed: 2.0 ships fresh, Base.metadata.create_all
picks up the new column automatically
Middleware
----------
- auth_middleware.py: after cookie check, call get_optional_user_from_
request to load the real User, stamp it into request.state.user AND
the contextvar via set_current_user, reset in a try/finally. Public
paths and unauthenticated requests continue without contextvar, and
@require_auth handles the strict 401 path
Test infrastructure
-------------------
- tests/conftest.py: @pytest.fixture(autouse=True) _auto_user_context
sets a default SimpleNamespace(id="test-user-autouse") on every test
unless marked @pytest.mark.no_auto_user. Keeps existing 20+
persistence tests passing without modification
- pyproject.toml [tool.pytest.ini_options]: register no_auto_user
marker so pytest does not emit warnings for opt-out tests
- tests/test_user_context.py: 6 tests covering three-state semantics,
Protocol duck typing, and require/optional APIs
- tests/test_thread_meta_repo.py: one test updated to pass owner_id=
None explicitly where it was previously relying on the old default
Test results
------------
- test_user_context.py: 6 passed
- test_auth*.py + test_langgraph_auth.py + test_ensure_admin.py: 127
- test_run_event_store / test_run_repository / test_thread_meta_repo
/ test_feedback: 92 passed
- Full backend suite: 1905 passed, 2 failed (both @requires_llm flaky
integration tests unrelated to auth), 1 skipped
* feat(auth): extend orphan migration to 2.0-rc persistence tables
_ensure_admin_user now runs a three-step pipeline on every boot:
Step 1 (fatal): admin user exists / is created / password is reset
Step 2 (non-fatal): LangGraph store orphan threads → admin
Step 3 (non-fatal): SQL persistence tables → admin
- threads_meta
- runs
- run_events
- feedback
Each step is idempotent. The fatal/non-fatal split mirrors PR #1728's
original philosophy: admin creation failure blocks startup (the system
is unusable without an admin), whereas migration failures log a warning
and let the service proceed (a partial migration is recoverable; a
missing admin is not).
Key helpers
-----------
- _iter_store_items(store, namespace, *, page_size=500):
async generator that cursor-paginates across LangGraph store pages.
Fixes PR #1728's hardcoded limit=1000 bug that would silently lose
orphans beyond the first page.
- _migrate_orphaned_threads(store, admin_user_id):
Rewritten to use _iter_store_items. Returns the migrated count so the
caller can log it; raises only on unhandled exceptions.
- _migrate_orphan_sql_tables(admin_user_id):
Imports the 4 ORM models lazily, grabs the shared session factory,
runs one UPDATE per table in a single transaction, commits once.
No-op when no persistence backend is configured (in-memory dev).
Tests: test_ensure_admin.py (8 passed)
* test(auth): port AUTH test plan docs + lint/format pass
- Port backend/docs/AUTH_TEST_PLAN.md and AUTH_UPGRADE.md from PR #1728
- Rename metadata.user_id → metadata.owner_id in AUTH_TEST_PLAN.md
(4 occurrences from the original PR doc)
- ruff auto-fix UP037 in sentinel type annotations: drop quotes around
"str | None | _AutoSentinel" now that from __future__ import
annotations makes them implicit string forms
- ruff format: 2 files (app/gateway/app.py, runtime/user_context.py)
Note on test coverage additions:
- conftest.py autouse fixture was already added in commit 4 (had to
be co-located with the repository changes to keep pre-existing
persistence tests passing)
- cross-user isolation E2E tests (test_owner_isolation.py) deferred
— enforcement is already proven by the 98-test repository suite
via the autouse fixture + explicit _AUTO sentinel exercises
- New test cases (TC-API-17..20, TC-ATK-13, TC-MIG-01..07) listed
in AUTH_TEST_PLAN.md are deferred to a follow-up PR — they are
manual-QA test cases rather than pytest code, and the spec-level
coverage is already met by test_user_context.py + the 98-test
repository suite.
Final test results:
- Auth suite (test_auth*, test_langgraph_auth, test_ensure_admin,
test_user_context): 186 passed
- Persistence suite (test_run_event_store, test_run_repository,
test_thread_meta_repo, test_feedback): 98 passed
- Lint: ruff check + ruff format both clean
* test(auth): add cross-user isolation test suite
10 tests exercising the storage-layer owner filter by manually
switching the user_context contextvar between two users. Verifies
the safety invariant:
After a repository write with owner_id=A, a subsequent read with
owner_id=B must not return the row, and vice versa.
Covers all 4 tables that own user-scoped data:
TC-API-17 threads_meta — read, search, update, delete cross-user
TC-API-18 runs — get, list_by_thread, delete cross-user
TC-API-19 run_events — list_messages, list_events, count_messages,
delete_by_thread (CRITICAL: raw conversation
content leak vector)
TC-API-20 feedback — get, list_by_run, delete cross-user
Plus two meta-tests verifying the sentinel pattern itself:
- AUTO + unset contextvar raises RuntimeError
- explicit owner_id=None bypasses the filter (migration escape hatch)
Architecture note
-----------------
These tests bypass the HTTP layer by design. The full chain
(cookie → middleware → contextvar → repository) is covered piecewise:
- test_auth_middleware.py: middleware sets contextvar from cookies
- test_owner_isolation.py: repositories enforce isolation when
contextvar is set to different users
Together they prove the end-to-end safety property without the
ceremony of spinning up a full TestClient + in-memory DB for every
router endpoint.
Tests pass: 231 (full auth + persistence + isolation suite)
Lint: clean
* refactor(auth): migrate user repository to SQLAlchemy ORM
Move the users table into the shared persistence engine so auth
matches the pattern of threads_meta, runs, run_events, and feedback —
one engine, one session factory, one schema init codepath.
New files
---------
- persistence/user/__init__.py, persistence/user/model.py: UserRow
ORM class with partial unique index on (oauth_provider, oauth_id)
- Registered in persistence/models/__init__.py so
Base.metadata.create_all() picks it up
Modified
--------
- auth/repositories/sqlite.py: rewritten as async SQLAlchemy,
identical constructor pattern to the other four repositories
(def __init__(self, session_factory) + self._sf = session_factory)
- auth/config.py: drop users_db_path field — storage is configured
through config.database like every other table
- deps.py/get_local_provider: construct SQLiteUserRepository with
the shared session factory, fail fast if engine is not initialised
- tests/test_auth.py: rewrite test_sqlite_round_trip_new_fields to
use the shared engine (init_engine + close_engine in a tempdir)
- tests/test_auth_type_system.py: add per-test autouse fixture that
spins up a scratch engine and resets deps._cached_* singletons
* refactor(auth): remove SQL orphan migration (unused in supported scenarios)
The _migrate_orphan_sql_tables helper existed to bind NULL owner_id
rows in threads_meta, runs, run_events, and feedback to the admin on
first boot. But in every supported upgrade path, it's a no-op:
1. Fresh install: create_all builds fresh tables, no legacy rows
2. No-auth → with-auth (no existing persistence DB): persistence
tables are created fresh by create_all, no legacy rows
3. No-auth → with-auth (has existing persistence DB from #1930):
NOT a supported upgrade path — "有 DB 到有 DB" schema evolution
is out of scope; users wipe DB or run manual ALTER
So the SQL orphan migration never has anything to do in the
supported matrix. Delete the function, simplify _ensure_admin_user
from a 3-step pipeline to a 2-step one (admin creation + LangGraph
store orphan migration only).
LangGraph store orphan migration stays: it serves the real
"no-auth → with-auth" upgrade path where a user's existing LangGraph
thread metadata has no owner_id field and needs to be stamped with
the newly-created admin's id.
Tests: 284 passed (auth + persistence + isolation)
Lint: clean
* security(auth): write initial admin password to 0600 file instead of logs
CodeQL py/clear-text-logging-sensitive-data flagged 3 call sites that
logged the auto-generated admin password to stdout via logger.info().
Production log aggregators (ELK/Splunk/etc) would have captured those
cleartext secrets. Replace with a shared helper that writes to
.deer-flow/admin_initial_credentials.txt with mode 0600, and log only
the path.
New file
--------
- app/gateway/auth/credential_file.py: write_initial_credentials()
helper. Takes email, password, and a "initial"/"reset" label.
Creates .deer-flow/ if missing, writes a header comment plus the
email+password, chmods 0o600, returns the absolute Path.
Modified
--------
- app/gateway/app.py: both _ensure_admin_user paths (fresh creation
+ needs_setup password reset) now write to file and log the path
- app/gateway/auth/reset_admin.py: rewritten to use the shared ORM
repo (SQLiteUserRepository with session_factory) and the
credential_file helper. The previous implementation was broken
after the earlier ORM refactor — it still imported _get_users_conn
and constructed SQLiteUserRepository() without a session factory.
No tests changed — the three password-log sites are all exercised
via existing test_ensure_admin.py which checks that startup
succeeds, not that a specific string appears in logs.
CodeQL alerts 272, 283, 284: all resolved.
* security(auth): strict JWT validation in middleware (fix junk cookie bypass)
AUTH_TEST_PLAN test 7.5.8 expects junk cookies to be rejected with
401. The previous middleware behaviour was "presence-only": check
that some access_token cookie exists, then pass through. In
combination with my Task-12 decision to skip @require_auth
decorators on routes, this created a gap where a request with any
cookie-shaped string (e.g. access_token=not-a-jwt) would bypass
authentication on routes that do not touch the repository
(/api/models, /api/mcp/config, /api/memory, /api/skills, …).
Fix: middleware now calls get_current_user_from_request() strictly
and catches the resulting HTTPException to render a 401 with the
proper fine-grained error code (token_invalid, token_expired,
user_not_found, …). On success it stamps request.state.user and
the contextvar so repository-layer owner filters work downstream.
The 4 old "_with_cookie_passes" tests in test_auth_middleware.py
were written for the presence-only behaviour; they asserted that
a junk cookie would make the handler return 200. They are renamed
to "_with_junk_cookie_rejected" and their assertions flipped to
401. The negative path (no cookie → 401 not_authenticated)
is unchanged.
Verified:
no cookie → 401 not_authenticated
junk cookie → 401 token_invalid (the fixed bug)
expired cookie → 401 token_expired
Tests: 284 passed (auth + persistence + isolation)
Lint: clean
* security(auth): wire @require_permission(owner_check=True) on isolation routes
Apply the require_permission decorator to all 28 routes that take a
{thread_id} path parameter. Combined with the strict middleware
(previous commit), this gives the double-layer protection that
AUTH_TEST_PLAN test 7.5.9 documents:
Layer 1 (AuthMiddleware): cookie + JWT validation, rejects junk
cookies and stamps request.state.user
Layer 2 (@require_permission with owner_check=True): per-resource
ownership verification via
ThreadMetaStore.check_access — returns
404 if a different user owns the thread
The decorator's owner_check branch is rewritten to use the SQL
thread_meta_repo (the 2.0-rc persistence layer) instead of the
LangGraph store path that PR #1728 used (_store_get / get_store
in routers/threads.py). The inject_record convenience is dropped
— no caller in 2.0 needs the LangGraph blob, and the SQL repo has
a different shape.
Routes decorated (28 total):
- threads.py: delete, patch, get, get-state, post-state, post-history
- thread_runs.py: post-runs, post-runs-stream, post-runs-wait,
list_runs, get_run, cancel_run, join_run, stream_existing_run,
list_thread_messages, list_run_messages, list_run_events,
thread_token_usage
- feedback.py: create, list, stats, delete
- uploads.py: upload (added Request param), list, delete
- artifacts.py: get_artifact
- suggestions.py: generate (renamed body parameter to avoid
conflict with FastAPI Request)
Test fixes:
- test_suggestions_router.py: bypass the decorator via __wrapped__
(the unit tests cover parsing logic, not auth — no point spinning
up a thread_meta_repo just to test JSON unwrapping)
- test_auth_middleware.py 4 fake-cookie tests: already updated in
the previous commit (745bf432)
Tests: 293 passed (auth + persistence + isolation + suggestions)
Lint: clean
* security(auth): defense-in-depth fixes from release validation pass
Eight findings caught while running the AUTH_TEST_PLAN end-to-end against
the deployed sg_dev stack. Each is a pre-condition for shipping
release/2.0-rc that the previous PRs missed.
Backend hardening
- routers/auth.py: rate limiter X-Real-IP now requires AUTH_TRUSTED_PROXIES
whitelist (CIDR/IP allowlist). Without nginx in front, the previous code
honored arbitrary X-Real-IP, letting an attacker rotate the header to
fully bypass the per-IP login lockout.
- routers/auth.py: 36-entry common-password blocklist via Pydantic
field_validator on RegisterRequest + ChangePasswordRequest. The shared
_validate_strong_password helper keeps the constraint in one place.
- routers/threads.py: ThreadCreateRequest + ThreadPatchRequest strip
server-reserved metadata keys (owner_id, user_id) via Pydantic
field_validator so a forged value can never round-trip back to other
clients reading the same thread. The actual ownership invariant stays
on the threads_meta row; this closes the metadata-blob echo gap.
- authz.py + thread_meta/sql.py: require_permission gains a require_existing
flag plumbed through check_access(require_existing=True). Destructive
routes (DELETE/PATCH/state-update/runs/feedback) now treat a missing
thread_meta row as 404 instead of "untracked legacy thread, allow",
closing the cross-user delete-idempotence gap where any user could
successfully DELETE another user's deleted thread.
- repositories/sqlite.py + base.py: update_user raises UserNotFoundError
on a vanished row instead of silently returning the input. Concurrent
delete during password reset can no longer look like a successful update.
- runtime/user_context.py: resolve_owner_id() coerces User.id (UUID) to
str at the contextvar boundary so SQLAlchemy String(64) columns can
bind it. The whole 2.0-rc isolation pipeline was previously broken
end-to-end (POST /api/threads → 500 "type 'UUID' is not supported").
- persistence/engine.py: SQLAlchemy listener enables PRAGMA journal_mode=WAL,
synchronous=NORMAL, foreign_keys=ON on every new SQLite connection.
TC-UPG-06 in the test plan expects WAL; previous code shipped with the
default 'delete' journal.
- auth_middleware.py: stamp request.state.auth = AuthContext(...) so
@require_permission's short-circuit fires; previously every isolation
request did a duplicate JWT decode + users SELECT. Also unifies the
401 payload through AuthErrorResponse(...).model_dump().
- app.py: _ensure_admin_user restructure removes the noqa F821 scoping
bug where 'password' was referenced outside the branch that defined it.
New _announce_credentials helper absorbs the duplicate log block in
the fresh-admin and reset-admin branches.
* fix(frontend+nginx): rollout CSRF on every state-changing client path
The frontend was 100% broken in gateway-pro mode for any user trying to
open a specific chat thread. Three cumulative bugs each silently
masked the next.
LangGraph SDK CSRF gap (api-client.ts)
- The Client constructor took only apiUrl, no defaultHeaders, no fetch
interceptor. The SDK's internal fetch never sent X-CSRF-Token, so
every state-changing /api/langgraph-compat/* call (runs/stream,
threads/search, threads/{tid}/history, ...) hit CSRFMiddleware and
got 403 before reaching the auth check. UI symptom: empty thread page
with no error message; the SPA's hooks swallowed the rejection.
- Fix: pass an onRequest hook that injects X-CSRF-Token from the
csrf_token cookie per request. Reading the cookie per call (not at
construction time) handles login / logout / password-change cookie
rotation transparently. The SDK's prepareFetchOptions calls
onRequest for both regular requests AND streaming/SSE/reconnect, so
the same hook covers runs.stream and runs.joinStream.
Raw fetch CSRF gap (7 files)
- Audit: 11 frontend fetch sites, only 2 included CSRF (login/setup +
account-settings change-password). The other 7 routed through raw
fetch() with no header — suggestions, memory, agents, mcp, skills,
uploads, and the local thread cleanup hook all 403'd silently.
- Fix: enhance fetcher.ts:fetchWithAuth to auto-inject X-CSRF-Token on
POST/PUT/DELETE/PATCH from a single shared readCsrfCookie() helper.
Convert all 7 raw fetch() callers to fetchWithAuth so the contract
is centrally enforced. api-client.ts and fetcher.ts share
readCsrfCookie + STATE_CHANGING_METHODS to avoid drift.
nginx routing + buffering (nginx.local.conf)
- The auth feature shipped without updating the nginx config: per-API
explicit location blocks but no /api/v1/auth/, /api/feedback, /api/runs.
The frontend's client-side fetches to /api/v1/auth/login/local 404'd
from the Next.js side because nginx routed /api/* to the frontend.
- Fix: add catch-all `location /api/` that proxies to the gateway.
nginx longest-prefix matching keeps the explicit blocks (/api/models,
/api/threads regex, /api/langgraph/, ...) winning for their paths.
- Fix: disable proxy_buffering + proxy_request_buffering for the
frontend `location /` block. Without it, nginx tries to spool large
Next.js chunks into /var/lib/nginx/proxy (root-owned) and fails with
Permission denied → ERR_INCOMPLETE_CHUNKED_ENCODING → ChunkLoadError.
* test(auth): release-validation test infra and new coverage
Test fixtures and unit tests added during the validation pass.
Router test helpers (NEW: tests/_router_auth_helpers.py)
- make_authed_test_app(): builds a FastAPI test app with a stub
middleware that stamps request.state.user + request.state.auth and a
permissive thread_meta_repo mock. TestClient-based router tests
(test_artifacts_router, test_threads_router) use it instead of bare
FastAPI() so the new @require_permission(owner_check=True) decorators
short-circuit cleanly.
- call_unwrapped(): walks the __wrapped__ chain to invoke the underlying
handler without going through the authz wrappers. Direct-call tests
(test_uploads_router) use it. Typed with ParamSpec so the wrapped
signature flows through.
Backend test additions
- test_auth.py: 7 tests for the new _get_client_ip trust model (no
proxy / trusted proxy / untrusted peer / XFF rejection / invalid
CIDR / no client). 5 tests for the password blocklist (literal,
case-insensitive, strong password accepted, change-password binding,
short-password length-check still fires before blocklist).
test_update_user_raises_when_row_concurrently_deleted: closes a
shipped-without-coverage gap on the new UserNotFoundError contract.
- test_thread_meta_repo.py: 4 tests for check_access(require_existing=True)
— strict missing-row denial, strict owner match, strict owner mismatch,
strict null-owner still allowed (shared rows survive the tightening).
- test_ensure_admin.py: 3 tests for _migrate_orphaned_threads /
_iter_store_items pagination, covering the TC-UPG-02 upgrade story
end-to-end via mock store. Closes the gap where the cursor pagination
was untested even though the previous PR rewrote it.
- test_threads_router.py: 5 tests for _strip_reserved_metadata
(owner_id removal, user_id removal, safe-keys passthrough, empty
input, both-stripped).
- test_auth_type_system.py: replace "password123" fixtures with
Tr0ub4dor3a / AnotherStr0ngPwd! so the new password blocklist
doesn't reject the test data.
* docs(auth): refresh TC-DOCKER-05 + document Docker validation gap
- AUTH_TEST_PLAN.md TC-DOCKER-05: the previous expectation
("admin password visible in docker logs") was stale after the simplify
pass that moved credentials to a 0600 file. The grep "Password:" check
would have silently failed and given a false sense of coverage. New
expectation matches the actual file-based path: 0600 file in
DEER_FLOW_HOME, log shows the path (not the secret), reverse-grep
asserts no leaked password in container logs.
- NEW: docs/AUTH_TEST_DOCKER_GAP.md documents the only un-executed
block in the test plan (TC-DOCKER-01..06). Reason: sg_dev validation
host has no Docker daemon installed. The doc maps each Docker case
to an already-validated bare-metal equivalent (TC-1.1, TC-REENT-01,
TC-API-02 etc.) so the gap is auditable, and includes pre-flight
reproduction steps for whoever has Docker available.
---------
Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com>
* refactor(persistence): unify SQLite to single deerflow.db and move checkpointer to runtime
Merge checkpoints.db and app.db into a single deerflow.db file (WAL mode
handles concurrent access safely). Move checkpointer module from
agents/checkpointer to runtime/checkpointer to better reflect its role
as a runtime infrastructure concern.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(persistence): rename owner_id to user_id and thread_meta_repo to thread_store
Rename owner_id to user_id across all persistence models, repositories,
stores, routers, and tests for clearer semantics. Rename thread_meta_repo
to thread_store for consistency with run_store/run_event_store naming.
Add ThreadMetaStore return type annotation to get_thread_store().
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(persistence): unify ThreadMetaStore interface with user isolation and factory
Add user_id parameter to all ThreadMetaStore abstract methods. Implement
owner isolation in MemoryThreadMetaStore with _get_owned_record helper.
Add check_access to base class and memory implementation. Add
make_thread_store factory to simplify deps.py initialization. Add
memory-backend isolation tests.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(feedback): add UNIQUE(thread_id, run_id, user_id) constraint
Add UNIQUE constraint to FeedbackRow to enforce one feedback per user per run,
enabling upsert behavior in Task 2. Update tests to use distinct user_ids for
multiple feedback records per run, and pass user_id=None to list_by_run for
admin-style queries that bypass user isolation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(feedback): add upsert() method with UNIQUE enforcement
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(feedback): add delete_by_run() and list_by_thread_grouped()
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(feedback): add PUT upsert and DELETE-by-run endpoints
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(feedback): enrich messages endpoint with per-run feedback data
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(feedback): add frontend feedback API client
Adds upsertFeedback and deleteFeedback API functions backed by
fetchWithAuth, targeting the /api/threads/{id}/runs/{id}/feedback
endpoint.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(feedback): wire feedback data into message rendering for history echo
Adds useThreadFeedback hook that fetches run-level feedback from the
messages API and builds a runId->FeedbackData map. MessageList now calls
this hook and passes feedback and runId to each MessageListItem so
previously-submitted thumbs are pre-filled when revisiting a thread.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(feedback): correct run_id mapping for feedback echo
The feedbackMap was keyed by run_id but looked up by LangGraph message ID.
Fixed by tracking AI message ordinal index to correlate event store
run_ids with LangGraph SDK messages.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(feedback): use real threadId and refresh after stream
- Pass threadId prop to MessageListItem instead of reading "new" from URL params
- Invalidate thread-feedback query on stream finish so buttons appear immediately
- Show feedback buttons always visible, copy button on hover only
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* style(feedback): group copy and feedback buttons together on the left
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* style(feedback): always show toolbar buttons without hover
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(persistence): stream hang when run_events.backend=db
DbRunEventStore._user_id_from_context() returned user.id without
coercing it to str. User.id is a Pydantic UUID, and aiosqlite cannot
bind a raw UUID object to a VARCHAR column, so the INSERT for the
initial human_message event silently rolled back and raised out of
the worker task. Because that put() sat outside the worker's try
block, the finally-clause that publishes end-of-stream never ran
and the SSE stream hung forever.
jsonl mode was unaffected because json.dumps(default=str) coerces
UUID objects transparently.
Fixes:
- db.py: coerce user.id to str at the context-read boundary (matches
what resolve_user_id already does for the other repositories)
- worker.py: move RunJournal init + human_message put inside the try
block so any failure flows through the finally/publish_end path
instead of hanging the subscriber
Defense-in-depth:
- engine.py: add PRAGMA busy_timeout=5000 so checkpointer and event
store wait for each other on the shared deerflow.db file instead
of failing immediately under write-lock contention
- journal.py: skip fire-and-forget _flush_sync when a previous flush
task is still in flight, to avoid piling up concurrent put_batch
writes on the same SQLAlchemy engine during streaming; flush() now
waits for pending tasks before draining the buffer
- database_config.py: doc-only update clarifying WAL + busy_timeout
keep the unified deerflow.db safe for both workloads
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore(persistence): drop redundant busy_timeout PRAGMA
Python's sqlite3 driver defaults to a 5-second busy timeout via the
``timeout`` kwarg of ``sqlite3.connect``, and aiosqlite + SQLAlchemy's
aiosqlite dialect inherit that default. Setting ``PRAGMA busy_timeout=5000``
explicitly was a no-op — verified by reading back the PRAGMA on a fresh
connection (it already reports 5000ms without our PRAGMA).
Concurrent stress test (50 checkpoint writes + 20 event batches + 50
thread_meta updates on the same deerflow.db) still completes with zero
errors and 200/200 rows after removing the explicit PRAGMA.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(journal): unwrap Command tool results in on_tool_end
Tools that update graph state (e.g. ``present_files``) return
``Command(update={'messages': [ToolMessage(...)], 'artifacts': [...]})``.
LangGraph later unwraps the inner ``ToolMessage`` into checkpoint state,
but ``RunJournal.on_tool_end`` was receiving the ``Command`` object
directly via the LangChain callback chain and storing
``str(Command(update={...}))`` as the tool_result content.
This produced a visible divergence between the event-store and the
checkpoint for any thread that used a Command-returning tool, blocking
the event-store-backed history fix in the follow-up commit. Concrete
example from thread ``6d30913e-dcd4-41c8-8941-f66c716cf359`` (seq=48):
checkpoint had ``'Successfully presented files'`` while event_store
stored the full Command repr.
The fix detects ``Command`` in ``on_tool_end``, extracts the first
``ToolMessage`` from ``update['messages']``, and lets the existing
ToolMessage branch handle the ``model_dump()`` path. Legacy rows still
containing the Command repr are separately cleaned up by the history
helper in the follow-up commit.
Tests:
- ``test_tool_end_unwraps_command_with_inner_tool_message`` — unit test
of the unwrap branch with a constructed Command
- ``test_tool_invoke_end_to_end_unwraps_command`` — end-to-end via
``CallbackManager`` + ``tool.invoke`` to exercise the real LangChain
dispatch path that production uses, matching the repro shape from
``present_files``
- Counter-proof: temporarily reverted the patch, both tests failed with
the exact ``Command(update={...})`` repr that was stored in the
production SQLite row at seq=48, confirming LangChain does pass the
``Command`` through callbacks (the unwrap is load-bearing)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(threads): load history messages from event store, immune to summarize
``get_thread_history`` and ``get_thread_state`` in Gateway mode read
messages from ``checkpoint.channel_values["messages"]``. After
SummarizationMiddleware runs mid-run, that list is rewritten in-place:
pre-summarize messages are dropped and a synthetic summary-as-human
message takes position 0. The frontend then renders a chat history that
starts with ``"Here is a summary of the conversation to date:..."``
instead of the user's original query, and all earlier turns are gone.
The event store (``RunEventStore``) is append-only and never rewritten,
so it retains the full transcript. This commit adds a helper
``_get_event_store_messages`` that loads the event store's message
stream and overrides ``values["messages"]`` in both endpoints; the
checkpoint fallback kicks in only when the event store is unavailable.
Behavior contract of the helper:
- **Full pagination.** ``list_messages`` returns the newest ``limit``
records when no cursor is given, so a fixed limit silently drops
older messages on long threads. The helper sizes the read from
``count_messages()`` and pages forward with ``after_seq`` cursors.
- **Copy-on-read.** Each content dict is copied before ``id`` is
patched so the live store object (``MemoryRunEventStore`` returns
references) is never mutated.
- **Stable ids.** Messages with ``id=None`` (human + tool_result,
which don't receive an id until checkpoint persistence) get a
deterministic ``uuid5(NAMESPACE_URL, f"{thread_id}:{seq}")`` so
React keys stay stable across requests. AI messages keep their
LLM-assigned ``lc_run--*`` ids.
- **Legacy ``Command`` repr sanitization.** Rows captured before the
``journal.py`` ``on_tool_end`` fix (previous commit) stored
``str(Command(update={'messages': [ToolMessage(content='X', ...)]}))``
as the tool_result content. ``_sanitize_legacy_command_repr``
regex-extracts the inner text so old threads render cleanly.
- **Inline feedback.** When loading the stream, the helper also pulls
``feedback_repo.list_by_thread_grouped`` and attaches ``run_id`` to
every message plus ``feedback`` to the final ``ai_message`` of each
run. This removes the frontend's need to fetch a second endpoint
and positional-index-map its way back to the right run. When the
feedback subsystem is unavailable, the ``feedback`` field is left
absent entirely so the frontend hides the button rather than
rendering it over a broken write path.
- **User context.** ``DbRunEventStore`` is user-scoped by default via
``resolve_user_id(AUTO)``. The helper relies on the ``@require_permission``
decorator having populated the user contextvar on both callers; the
docstring documents this dependency explicitly so nobody wires it
into a CLI or migration script without passing ``user_id=None``.
Real data verification against thread
``6d30913e-dcd4-41c8-8941-f66c716cf359``: checkpoint showed 12 messages
(summarize-corrupted), event store had 16. The original human message
``"最新伊美局势"`` was preserved as seq=1 in the event store and
correctly restored to position 0 in the helper output. Helper output
for AI messages was byte-identical to checkpoint for every overlapping
message; only tool_result ids differed (patched to uuid5) and the
legacy Command repr at seq=48 was sanitized.
Tests:
- ``test_thread_state_event_store.py`` — 18 tests covering
``_sanitize_legacy_command_repr`` (passthrough, single/double-quote
extraction, unparseable fallback), helper happy path (all message
types, stable uuid5, store non-mutation), multi-page pagination,
summarize regression (recovers pre-summarize messages), feedback
attachment (per-run, multi-run threads, repo failure graceful),
and dependency failure fallback to ``None``.
Docs:
- ``docs/superpowers/plans/2026-04-10-event-store-history.md`` — the
implementation plan this commit realizes, with Task 1 revised after
the evaluation findings (pagination, copy-on-read, Command wrap
already landed in journal.py, frontend feedback pagination in the
follow-up commit, Standard-mode follow-up noted).
- ``docs/superpowers/specs/2026-04-11-runjournal-history-evaluation.md``
— the Claude + second-opinion evaluation document that drove the
plan revisions (pagination bug, dict-mutation bug, feedback hidden
bug, Command bug).
- ``docs/superpowers/specs/2026-04-11-summarize-marker-design.md`` —
design for a follow-up PR that visually marks summarize events in
history, based on a verified ``adispatch_custom_event`` experiment
(``trace=False`` middleware nodes can still forward the Pregel task
config via explicit signature injection).
Scope: Gateway mode only (``make dev-pro``). Standard mode
(``make dev``) hits LangGraph Server directly and bypasses these
endpoints; the summarize symptom is still present there and is
tracked as a separate follow-up in the plan.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(feedback): inline feedback on history and drop positional mapping
The old ``useThreadFeedback`` hook loaded ``GET /api/threads/{id}/messages?limit=200``
and built two parallel lookup tables: ``runIdByAiIndex`` (an ordinal array of
run_ids for every ``ai_message``-typed event) and ``feedbackByRunId``. The render
loop in ``message-list.tsx`` walked the AI messages in order, incrementing
``aiMessageIndex`` on each non-human message, and used that ordinal to look up
the run_id and feedback.
This shape had three latent bugs we could observe on real threads:
1. **Fetch was capped at 200 messages.** Long or tool-heavy threads silently
dropped earlier entries from the map, so feedback buttons could be missing
on messages they should own.
2. **Ordinal mismatch.** The render loop counted every non-human message
(including each intermediate ``ai_tool_call``), but ``runIdByAiIndex`` only
pushed entries for ``event_type == "ai_message"``. A run with 3 tool_calls
+ 1 final AI message would push 1 entry while the render consumed 4
positions, so buttons mapped to the wrong positions across multi-run
threads.
3. **Two parallel data paths.** The ``/history`` render path and the
``/messages`` feedback-lookup path could drift in-between an
``invalidateQueries`` call and the next refetch, producing transient
mismaps.
The previous commit moved the authoritative message source for history to
the event store and added ``run_id`` + ``feedback`` inline on each message
dict returned by ``_get_event_store_messages``. This commit aligns the
frontend with that contract:
- **Delete** ``useThreadFeedback``, ``ThreadFeedbackData``,
``runIdByAiIndex``, ``feedbackByRunId``, and ``fetchAllThreadMessages``.
- **Introduce** ``useThreadMessageEnrichment`` that fetches
``POST /history?limit=1`` once, indexes the returned messages by
``message.id`` into a ``Map<id, {run_id, feedback?}>``, and invalidates
on stream completion (``onFinish`` in ``useThreadStream``). Keying by
``message.id`` is stable across runs, tool_call chains, and summarize.
- **Simplify** ``message-list.tsx`` to drop the ``aiMessageIndex``
counter and read ``enrichment?.get(msg.id)`` at each render step.
- **Rewire** ``message-list-item.tsx`` so the feedback button renders
when ``feedback !== undefined`` rather than when the message happens
to be non-human. ``feedback`` is ``undefined`` for non-eligible
messages (humans, non-final AI, tools), ``null`` for the final
ai_message of an unrated run, and a ``FeedbackData`` object once
rated — cleanly distinguishing "not eligible" from "eligible but
unrated".
``/api/threads/{id}/messages`` is kept as a debug/export surface; no
frontend code calls it anymore but the backend router is untouched.
Validation:
- ``pnpm check`` clean (0 errors, 1 pre-existing unrelated warning)
- Live test on thread ``3d5dea4a`` after gateway restart confirmed the
original user query is restored to position 0 and the feedback
button behaves correctly on the final AI message.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(rebase): remove duplicate definitions and update stale module paths
Rebase left duplicate function blocks in worker.py (triple human_message
write causing 3x user messages in /history), deps.py, and prompt.py.
Also update checkpointer imports from the old deerflow.agents.checkpointer
path to deerflow.runtime.checkpointer, and clean up orphaned feedback
props in the frontend message components.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(rebase): restore FeedbackButtons component and enrichment lost during rebase
The FeedbackButtons component (defined inline in message-list-item.tsx)
was introduced in commit 95df8d13 but lost during rebase. The previous
rebase cleanup commit incorrectly removed the feedback/runId props and
enrichment hook as "orphaned code" instead of restoring the missing
component. This commit restores:
- FeedbackButtons component with thumbs up/down toggle and optimistic state
- FeedbackData/upsertFeedback/deleteFeedback imports
- feedback and runId props on MessageListItem
- useThreadMessageEnrichment hook and entry lookup in message-list.tsx
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
Co-authored-by: DanielWalnut <45447813+hetaoBackend@users.noreply.github.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: JilongSun <965640067@qq.com>
Co-authored-by: jie <49781832+stan-fu@users.noreply.github.com>
Co-authored-by: cooper <cooperfu@tencent.com>
Co-authored-by: yangzheli <43645580+yangzheli@users.noreply.github.com>
Co-authored-by: greatmengqi <chenmengqi.0376@gmail.com>
Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com>
* feat(persistence): add unified persistence layer with event store, token tracking, and feedback (#1930)
* feat(persistence): add SQLAlchemy 2.0 async ORM scaffold
Introduce a unified database configuration (DatabaseConfig) that
controls both the LangGraph checkpointer and the DeerFlow application
persistence layer from a single `database:` config section.
New modules:
- deerflow.config.database_config — Pydantic config with memory/sqlite/postgres backends
- deerflow.persistence — async engine lifecycle, DeclarativeBase with to_dict mixin, Alembic skeleton
- deerflow.runtime.runs.store — RunStore ABC + MemoryRunStore implementation
Gateway integration initializes/tears down the persistence engine in
the existing langgraph_runtime() context manager. Legacy checkpointer
config is preserved for backward compatibility.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(persistence): add RunEventStore ABC + MemoryRunEventStore
Phase 2-A prerequisite for event storage: adds the unified run event
stream interface (RunEventStore) with an in-memory implementation,
RunEventsConfig, gateway integration, and comprehensive tests (27 cases).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(persistence): add ORM models, repositories, DB/JSONL event stores, RunJournal, and API endpoints
Phase 2-B: run persistence + event storage + token tracking.
- ORM models: RunRow (with token fields), ThreadMetaRow, RunEventRow
- RunRepository implements RunStore ABC via SQLAlchemy ORM
- ThreadMetaRepository with owner access control
- DbRunEventStore with trace content truncation and cursor pagination
- JsonlRunEventStore with per-run files and seq recovery from disk
- RunJournal (BaseCallbackHandler) captures LLM/tool/lifecycle events,
accumulates token usage by caller type, buffers and flushes to store
- RunManager now accepts optional RunStore for persistent backing
- Worker creates RunJournal, writes human_message, injects callbacks
- Gateway deps use factory functions (RunRepository when DB available)
- New endpoints: messages, run messages, run events, token-usage
- ThreadCreateRequest gains assistant_id field
- 92 tests pass (33 new), zero regressions
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(persistence): add user feedback + follow-up run association
Phase 2-C: feedback and follow-up tracking.
- FeedbackRow ORM model (rating +1/-1, optional message_id, comment)
- FeedbackRepository with CRUD, list_by_run/thread, aggregate stats
- Feedback API endpoints: create, list, stats, delete
- follow_up_to_run_id in RunCreateRequest (explicit or auto-detected
from latest successful run on the thread)
- Worker writes follow_up_to_run_id into human_message event metadata
- Gateway deps: feedback_repo factory + getter
- 17 new tests (14 FeedbackRepository + 3 follow-up association)
- 109 total tests pass, zero regressions
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test+config: comprehensive Phase 2 test coverage + deprecate checkpointer config
- config.example.yaml: deprecate standalone checkpointer section, activate
unified database:sqlite as default (drives both checkpointer + app data)
- New: test_thread_meta_repo.py (14 tests) — full ThreadMetaRepository coverage
including check_access owner logic, list_by_owner pagination
- Extended test_run_repository.py (+4 tests) — completion preserves fields,
list ordering desc, limit, owner_none returns all
- Extended test_run_journal.py (+8 tests) — on_chain_error, track_tokens=false,
middleware no ai_message, unknown caller tokens, convenience fields,
tool_error, non-summarization custom event
- Extended test_run_event_store.py (+7 tests) — DB batch seq continuity,
make_run_event_store factory (memory/db/jsonl/fallback/unknown)
- Extended test_phase2b_integration.py (+4 tests) — create_or_reject persists,
follow-up metadata, summarization in history, full DB-backed lifecycle
- Fixed DB integration test to use proper fake objects (not MagicMock)
for JSON-serializable metadata
- 157 total Phase 2 tests pass, zero regressions
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* config: move default sqlite_dir to .deer-flow/data
Keep SQLite databases alongside other DeerFlow-managed data
(threads, memory) under the .deer-flow/ directory instead of a
top-level ./data folder.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(persistence): remove UTFJSON, use engine-level json_serializer + datetime.now()
- Replace custom UTFJSON type with standard sqlalchemy.JSON in all ORM
models. Add json_serializer=json.dumps(ensure_ascii=False) to all
create_async_engine calls so non-ASCII text (Chinese etc.) is stored
as-is in both SQLite and Postgres.
- Change ORM datetime defaults from datetime.now(UTC) to datetime.now(),
remove UTC imports.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(gateway): simplify deps.py with getter factory + inline repos
- Replace 6 identical getter functions with _require() factory.
- Inline 3 _make_*_repo() factories into langgraph_runtime(), call
get_session_factory() once instead of 3 times.
- Add thread_meta upsert in start_run (services.py).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(docker): add UV_EXTRAS build arg for optional dependencies
Support installing optional dependency groups (e.g. postgres) at
Docker build time via UV_EXTRAS build arg:
UV_EXTRAS=postgres docker compose build
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(journal): fix flush, token tracking, and consolidate tests
RunJournal fixes:
- _flush_sync: retain events in buffer when no event loop instead of
dropping them; worker's finally block flushes via async flush().
- on_llm_end: add tool_calls filter and caller=="lead_agent" guard for
ai_message events; mark message IDs for dedup with record_llm_usage.
- worker.py: persist completion data (tokens, message count) to RunStore
in finally block.
Model factory:
- Auto-inject stream_usage=True for BaseChatOpenAI subclasses with
custom api_base, so usage_metadata is populated in streaming responses.
Test consolidation:
- Delete test_phase2b_integration.py (redundant with existing tests).
- Move DB-backed lifecycle test into test_run_journal.py.
- Add tests for stream_usage injection in test_model_factory.py.
- Clean up executor/task_tool dead journal references.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(events): widen content type to str|dict in all store backends
Allow event content to be a dict (for structured OpenAI-format messages)
in addition to plain strings. Dict values are JSON-serialized for the DB
backend and deserialized on read; memory and JSONL backends handle dicts
natively. Trace truncation now serializes dicts to JSON before measuring.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(events): use metadata flag instead of heuristic for dict content detection
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(converters): add LangChain-to-OpenAI message format converters
Pure functions langchain_to_openai_message, langchain_to_openai_completion,
langchain_messages_to_openai, and _infer_finish_reason for converting
LangChain BaseMessage objects to OpenAI Chat Completions format, used by
RunJournal for event storage. 15 unit tests added.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(converters): handle empty list content as null, clean up test
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(events): human_message content uses OpenAI user message format
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(events): ai_message uses OpenAI format, add ai_tool_call message event
- ai_message content now uses {"role": "assistant", "content": "..."} format
- New ai_tool_call message event emitted when lead_agent LLM responds with tool_calls
- ai_tool_call uses langchain_to_openai_message converter for consistent format
- Both events include finish_reason in metadata ("stop" or "tool_calls")
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(events): add tool_result message event with OpenAI tool message format
Cache tool_call_id from on_tool_start keyed by run_id as fallback for on_tool_end,
then emit a tool_result message event (role=tool, tool_call_id, content) after each
successful tool completion.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(events): summary content uses OpenAI system message format
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(events): replace llm_start/llm_end with llm_request/llm_response in OpenAI format
Add on_chat_model_start to capture structured prompt messages as llm_request events.
Replace llm_end trace events with llm_response using OpenAI Chat Completions format.
Track llm_call_index to pair request/response events.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(events): add record_middleware method for middleware trace events
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test(events): add full run sequence integration test for OpenAI content format
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(events): align message events with checkpoint format and add middleware tag injection
- Message events (ai_message, ai_tool_call, tool_result, human_message) now use
BaseMessage.model_dump() format, matching LangGraph checkpoint values.messages
- on_tool_end extracts tool_call_id/name/status from ToolMessage objects
- on_tool_error now emits tool_result message events with error status
- record_middleware uses middleware:{tag} event_type and middleware category
- Summarization custom events use middleware:summarize category
- TitleMiddleware injects middleware:title tag via get_config() inheritance
- SummarizationMiddleware model bound with middleware:summarize tag
- Worker writes human_message using HumanMessage.model_dump()
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(threads): switch search endpoint to threads_meta table and sync title
- POST /api/threads/search now queries threads_meta table directly,
removing the two-phase Store + Checkpointer scan approach
- Add ThreadMetaRepository.search() with metadata/status filters
- Add ThreadMetaRepository.update_display_name() for title sync
- Worker syncs checkpoint title to threads_meta.display_name on run completion
- Map display_name to values.title in search response for API compatibility
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(threads): history endpoint reads messages from event store
- POST /api/threads/{thread_id}/history now combines two data sources:
checkpointer for checkpoint_id, metadata, title, thread_data;
event store for messages (complete history, not truncated by summarization)
- Strip internal LangGraph metadata keys from response
- Remove full channel_values serialization in favor of selective fields
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: remove duplicate optional-dependencies header in pyproject.toml
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(middleware): pass tagged config to TitleMiddleware ainvoke call
Without the config, the middleware:title tag was not injected,
causing the LLM response to be recorded as a lead_agent ai_message
in run_events.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: resolve merge conflict in .env.example
Keep both DATABASE_URL (from persistence-scaffold) and WECOM
credentials (from main) after the merge.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(persistence): address review feedback on PR #1851
- Fix naive datetime.now() → datetime.now(UTC) in all ORM models
- Fix seq race condition in DbRunEventStore.put() with FOR UPDATE
and UNIQUE(thread_id, seq) constraint
- Encapsulate _store access in RunManager.update_run_completion()
- Deduplicate _store.put() logic in RunManager via _persist_to_store()
- Add update_run_completion to RunStore ABC + MemoryRunStore
- Wire follow_up_to_run_id through the full create path
- Add error recovery to RunJournal._flush_sync() lost-event scenario
- Add migration note for search_threads breaking change
- Fix test_checkpointer_none_fix mock to set database=None
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: update uv.lock
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(persistence): address 22 review comments from CodeQL, Copilot, and Code Quality
Bug fixes:
- Sanitize log params to prevent log injection (CodeQL)
- Reset threads_meta.status to idle/error when run completes
- Attach messages only to latest checkpoint in /history response
- Write threads_meta on POST /threads so new threads appear in search
Lint fixes:
- Remove unused imports (journal.py, migrations/env.py, test_converters.py)
- Convert lambda to named function (engine.py, Ruff E731)
- Remove unused logger definitions in repos (Ruff F841)
- Add logging to JSONL decode errors and empty except blocks
- Separate assert side-effects in tests (CodeQL)
- Remove unused local variables in tests (Ruff F841)
- Fix max_trace_content truncation to use byte length, not char length
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* style: apply ruff format to persistence and runtime files
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Potential fix for pull request finding 'Statement has no effect'
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
* refactor(runtime): introduce RunContext to reduce run_agent parameter bloat
Extract checkpointer, store, event_store, run_events_config, thread_meta_repo,
and follow_up_to_run_id into a frozen RunContext dataclass. Add get_run_context()
in deps.py to build the base context from app.state singletons. start_run() uses
dataclasses.replace() to enrich per-run fields before passing ctx to run_agent.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(gateway): move sanitize_log_param to app/gateway/utils.py
Extract the log-injection sanitizer from routers/threads.py into a shared
utils module and rename to sanitize_log_param (public API). Eliminates the
reverse service → router import in services.py.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* perf: use SQL aggregation for feedback stats and thread token usage
Replace Python-side counting in FeedbackRepository.aggregate_by_run with
a single SELECT COUNT/SUM query. Add RunStore.aggregate_tokens_by_thread
abstract method with SQL GROUP BY implementation in RunRepository and
Python fallback in MemoryRunStore. Simplify the thread_token_usage
endpoint to delegate to the new method, eliminating the limit=10000
truncation risk.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: annotate DbRunEventStore.put() as low-frequency path
Add docstring clarifying that put() opens a per-call transaction with
FOR UPDATE and should only be used for infrequent writes (currently
just the initial human_message event). High-throughput callers should
use put_batch() instead.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(threads): fall back to Store search when ThreadMetaRepository is unavailable
When database.backend=memory (default) or no SQL session factory is
configured, search_threads now queries the LangGraph Store instead of
returning 503. Returns empty list if neither Store nor repo is available.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(persistence): introduce ThreadMetaStore ABC for backend-agnostic thread metadata
Add ThreadMetaStore abstract base class with create/get/search/update/delete
interface. ThreadMetaRepository (SQL) now inherits from it. New
MemoryThreadMetaStore wraps LangGraph BaseStore for memory-mode deployments.
deps.py now always provides a non-None thread_meta_repo, eliminating all
`if thread_meta_repo is not None` guards in services.py, worker.py, and
routers/threads.py. search_threads no longer needs a Store fallback branch.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(history): read messages from checkpointer instead of RunEventStore
The /history endpoint now reads messages directly from the
checkpointer's channel_values (the authoritative source) instead of
querying RunEventStore.list_messages(). The RunEventStore API is
preserved for other consumers.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(persistence): address new Copilot review comments
- feedback.py: validate thread_id/run_id before deleting feedback
- jsonl.py: add path traversal protection with ID validation
- run_repo.py: parse `before` to datetime for PostgreSQL compat
- thread_meta_repo.py: fix pagination when metadata filter is active
- database_config.py: use resolve_path for sqlite_dir consistency
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Implement skill self-evolution and skill_manage flow (#1874)
* chore: ignore .worktrees directory
* Add skill_manage self-evolution flow
* Fix CI regressions for skill_manage
* Address PR review feedback for skill evolution
* fix(skill-evolution): preserve history on delete
* fix(skill-evolution): tighten scanner fallbacks
* docs: add skill_manage e2e evidence screenshot
* fix(skill-manage): avoid blocking fs ops in session runtime
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix(config): resolve sqlite_dir relative to CWD, not Paths.base_dir
resolve_path() resolves relative to Paths.base_dir (.deer-flow),
which double-nested the path to .deer-flow/.deer-flow/data/app.db.
Use Path.resolve() (CWD-relative) instead.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Feature/feishu receive file (#1608)
* feat(feishu): add channel file materialization hook for inbound messages
- Introduce Channel.receive_file(msg, thread_id) as a base method for file materialization; default is no-op.
- Implement FeishuChannel.receive_file to download files/images from Feishu messages, save to sandbox, and inject virtual paths into msg.text.
- Update ChannelManager to call receive_file for any channel if msg.files is present, enabling downstream model access to user-uploaded files.
- No impact on Slack/Telegram or other channels (they inherit the default no-op).
* style(backend): format code with ruff for lint compliance
- Auto-formatted packages/harness/deerflow/agents/factory.py and tests/test_create_deerflow_agent.py using `ruff format`
- Ensured both files conform to project linting standards
- Fixes CI lint check failures caused by code style issues
* fix(feishu): handle file write operation asynchronously to prevent blocking
* fix(feishu): rename GetMessageResourceRequest to _GetMessageResourceRequest and remove redundant code
* test(feishu): add tests for receive_file method and placeholder replacement
* fix(manager): remove unnecessary type casting for channel retrieval
* fix(feishu): update logging messages to reflect resource handling instead of image
* fix(feishu): sanitize filename by replacing invalid characters in file uploads
* fix(feishu): improve filename sanitization and reorder image key handling in message processing
* fix(feishu): add thread lock to prevent filename conflicts during file downloads
* fix(test): correct bad merge in test_feishu_parser.py
* chore: run ruff and apply formatting cleanup
fix(feishu): preserve rich-text attachment order and improve fallback filename handling
* fix(docker): restore gateway env vars and fix langgraph empty arg issue (#1915)
Two production docker-compose.yaml bugs prevent `make up` from working:
1. Gateway missing DEER_FLOW_CONFIG_PATH and DEER_FLOW_EXTENSIONS_CONFIG_PATH
environment overrides. Added in fb2d99f (#1836) but accidentally reverted
by ca2fb95 (#1847). Without them, gateway reads host paths from .env via
env_file, causing FileNotFoundError inside the container.
2. Langgraph command fails when LANGGRAPH_ALLOW_BLOCKING is unset (default).
Empty $${allow_blocking} inserts a bare space between flags, causing
' --no-reload' to be parsed as unexpected extra argument. Fix by building
args string first and conditionally appending --allow-blocking.
Co-authored-by: cooper <cooperfu@tencent.com>
* fix(frontend): resolve invalid HTML nesting and tabnabbing vulnerabilities (#1904)
* fix(frontend): resolve invalid HTML nesting and tabnabbing vulnerabilities
Fix `<button>` inside `<a>` invalid HTML in artifact components and add
missing `noopener,noreferrer` to `window.open` calls to prevent reverse
tabnabbing.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(frontend): address Copilot review on tabnabbing and double-tab-open
Remove redundant parent onClick on web_fetch ChainOfThoughtStep to
prevent opening two tabs on link click, and explicitly null out
window.opener after window.open() for defensive tabnabbing hardening.
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* refactor(persistence): organize entities into per-entity directories
Restructure the persistence layer from horizontal "models/ + repositories/"
split into vertical entity-aligned directories. Each entity (thread_meta,
run, feedback) now owns its ORM model, abstract interface (where applicable),
and concrete implementations under a single directory with an aggregating
__init__.py for one-line imports.
Layout:
persistence/thread_meta/{base,model,sql,memory}.py
persistence/run/{model,sql}.py
persistence/feedback/{model,sql}.py
models/__init__.py is kept as a facade so Alembic autogenerate continues to
discover all ORM tables via Base.metadata. RunEventRow remains under
models/run_event.py because its storage implementation lives in
runtime/events/store/db.py and has no matching repository directory.
The repositories/ directory is removed entirely. All call sites in
gateway/deps.py and tests are updated to import from the new entity
packages, e.g.:
from deerflow.persistence.thread_meta import ThreadMetaRepository
from deerflow.persistence.run import RunRepository
from deerflow.persistence.feedback import FeedbackRepository
Full test suite passes (1690 passed, 14 skipped).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(gateway): sync thread rename and delete through ThreadMetaStore
The POST /threads/{id}/state endpoint previously synced title changes
only to the LangGraph Store via _store_upsert. In sqlite mode the search
endpoint reads from the ThreadMetaRepository SQL table, so renames never
appeared in /threads/search until the next agent run completed (worker.py
syncs title from checkpoint to thread_meta in its finally block).
Likewise the DELETE /threads/{id} endpoint cleaned up the filesystem,
Store, and checkpointer but left the threads_meta row orphaned in sqlite,
so deleted threads kept appearing in /threads/search.
Fix both endpoints by routing through the ThreadMetaStore abstraction
which already has the correct sqlite/memory implementations wired up by
deps.py. The rename path now calls update_display_name() and the delete
path calls delete() — both work uniformly across backends.
Verified end-to-end with curl in gateway mode against sqlite backend.
Existing test suite (1690 passed) and focused router/repo tests pass.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(gateway): route all thread metadata access through ThreadMetaStore
Following the rename/delete bug fix in PR1, migrate the remaining direct
LangGraph Store reads/writes in the threads router and services to the
ThreadMetaStore abstraction so that the sqlite and memory backends behave
identically and the legacy dual-write paths can be removed.
Migrated endpoints (threads.py):
- create_thread: idempotency check + write now use thread_meta_repo.get/create
instead of dual-writing the LangGraph Store and the SQL row.
- get_thread: reads from thread_meta_repo.get; the checkpoint-only fallback
for legacy threads is preserved.
- patch_thread: replaced _store_get/_store_put with thread_meta_repo.update_metadata.
- delete_thread_data: dropped the legacy store.adelete; thread_meta_repo.delete
already covers it.
Removed dead code (services.py):
- _upsert_thread_in_store — redundant with the immediately following
thread_meta_repo.create() call.
- _sync_thread_title_after_run — worker.py's finally block already syncs
the title via thread_meta_repo.update_display_name() after each run.
Removed dead code (threads.py):
- _store_get / _store_put / _store_upsert helpers (no remaining callers).
- THREADS_NS constant.
- get_store import (router no longer touches the LangGraph Store directly).
New abstract method:
- ThreadMetaStore.update_metadata(thread_id, metadata) merges metadata into
the thread's metadata field. Implemented in both ThreadMetaRepository (SQL,
read-modify-write inside one session) and MemoryThreadMetaStore. Three new
unit tests cover merge / empty / nonexistent behaviour.
Net change: -134 lines. Full test suite: 1693 passed, 14 skipped.
Verified end-to-end with curl in gateway mode against sqlite backend
(create / patch / get / rename / search / delete).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
Co-authored-by: DanielWalnut <45447813+hetaoBackend@users.noreply.github.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: JilongSun <965640067@qq.com>
Co-authored-by: jie <49781832+stan-fu@users.noreply.github.com>
Co-authored-by: cooper <cooperfu@tencent.com>
Co-authored-by: yangzheli <43645580+yangzheli@users.noreply.github.com>
* feat(auth): release-validation pass for 2.0-rc — 12 blockers + simplify follow-ups (#2008)
* feat(auth): introduce backend auth module
Port RFC-001 authentication core from PR #1728:
- JWT token handling (create_access_token, decode_token, TokenPayload)
- Password hashing (bcrypt) with verify_password
- SQLite UserRepository with base interface
- Provider Factory pattern (LocalAuthProvider)
- CLI reset_admin tool
- Auth-specific errors (AuthErrorCode, TokenError, AuthErrorResponse)
Deps:
- bcrypt>=4.0.0
- pyjwt>=2.9.0
- email-validator>=2.0.0
- backend/uv.toml pins public PyPI index
Tests: 12 pure unit tests (test_auth_config.py, test_auth_errors.py).
Scope note: authz.py, test_auth.py, and test_auth_type_system.py are
deferred to commit 2 because they depend on middleware and deps wiring
that is not yet in place. Commit 1 stays "pure new files only" as the
spec mandates.
* feat(auth): wire auth end-to-end (middleware + frontend replacement)
Backend:
- Port auth_middleware, csrf_middleware, langgraph_auth, routers/auth
- Port authz decorator (owner_filter_key defaults to 'owner_id')
- Merge app.py: register AuthMiddleware + CSRFMiddleware + CORS, add
_ensure_admin_user lifespan hook, _migrate_orphaned_threads helper,
register auth router
- Merge deps.py: add get_local_provider, get_current_user_from_request,
get_optional_user_from_request; keep get_current_user as thin str|None
adapter for feedback router
- langgraph.json: add auth path pointing to langgraph_auth.py:auth
- Rename metadata['user_id'] -> metadata['owner_id'] in langgraph_auth
(both metadata write and LangGraph filter dict) + test fixtures
Frontend:
- Delete better-auth library and api catch-all route
- Remove better-auth npm dependency and env vars (BETTER_AUTH_SECRET,
BETTER_AUTH_GITHUB_*) from env.js
- Port frontend/src/core/auth/* (AuthProvider, gateway-config,
proxy-policy, server-side getServerSideUser, types)
- Port frontend/src/core/api/fetcher.ts
- Port (auth)/layout, (auth)/login, (auth)/setup pages
- Rewrite workspace/layout.tsx as server component that calls
getServerSideUser and wraps in AuthProvider
- Port workspace/workspace-content.tsx for the client-side sidebar logic
Tests:
- Port 5 auth test files (test_auth, test_auth_middleware,
test_auth_type_system, test_ensure_admin, test_langgraph_auth)
- 176 auth tests PASS
After this commit: login/logout/registration flow works, but persistence
layer does not yet filter by owner_id. Commit 4 closes that gap.
* feat(auth): account settings page + i18n
- Port account-settings-page.tsx (change password, change email, logout)
- Wire into settings-dialog.tsx as new "account" section with UserIcon,
rendered first in the section list
- Add i18n keys:
- en-US/zh-CN: settings.sections.account ("Account" / "账号")
- en-US/zh-CN: button.logout ("Log out" / "退出登录")
- types.ts: matching type declarations
* feat(auth): enforce owner_id across 2.0-rc persistence layer
Add request-scoped contextvar-based owner filtering to threads_meta,
runs, run_events, and feedback repositories. Router code is unchanged
— isolation is enforced at the storage layer so that any caller that
forgets to pass owner_id still gets filtered results, and new routes
cannot accidentally leak data.
Core infrastructure
-------------------
- deerflow/runtime/user_context.py (new):
- ContextVar[CurrentUser | None] with default None
- runtime_checkable CurrentUser Protocol (structural subtype with .id)
- set/reset/get/require helpers
- AUTO sentinel + resolve_owner_id(value, method_name) for sentinel
three-state resolution: AUTO reads contextvar, explicit str
overrides, explicit None bypasses the filter (for migration/CLI)
Repository changes
------------------
- ThreadMetaRepository: create/get/search/update_*/delete gain
owner_id=AUTO kwarg; read paths filter by owner, writes stamp it,
mutations check ownership before applying
- RunRepository: put/get/list_by_thread/delete gain owner_id=AUTO kwarg
- FeedbackRepository: create/get/list_by_run/list_by_thread/delete
gain owner_id=AUTO kwarg
- DbRunEventStore: list_messages/list_events/list_messages_by_run/
count_messages/delete_by_thread/delete_by_run gain owner_id=AUTO
kwarg. Write paths (put/put_batch) read contextvar softly: when a
request-scoped user is available, owner_id is stamped; background
worker writes without a user context pass None which is valid
(orphan row to be bound by migration)
Schema
------
- persistence/models/run_event.py: RunEventRow.owner_id = Mapped[
str | None] = mapped_column(String(64), nullable=True, index=True)
- No alembic migration needed: 2.0 ships fresh, Base.metadata.create_all
picks up the new column automatically
Middleware
----------
- auth_middleware.py: after cookie check, call get_optional_user_from_
request to load the real User, stamp it into request.state.user AND
the contextvar via set_current_user, reset in a try/finally. Public
paths and unauthenticated requests continue without contextvar, and
@require_auth handles the strict 401 path
Test infrastructure
-------------------
- tests/conftest.py: @pytest.fixture(autouse=True) _auto_user_context
sets a default SimpleNamespace(id="test-user-autouse") on every test
unless marked @pytest.mark.no_auto_user. Keeps existing 20+
persistence tests passing without modification
- pyproject.toml [tool.pytest.ini_options]: register no_auto_user
marker so pytest does not emit warnings for opt-out tests
- tests/test_user_context.py: 6 tests covering three-state semantics,
Protocol duck typing, and require/optional APIs
- tests/test_thread_meta_repo.py: one test updated to pass owner_id=
None explicitly where it was previously relying on the old default
Test results
------------
- test_user_context.py: 6 passed
- test_auth*.py + test_langgraph_auth.py + test_ensure_admin.py: 127
- test_run_event_store / test_run_repository / test_thread_meta_repo
/ test_feedback: 92 passed
- Full backend suite: 1905 passed, 2 failed (both @requires_llm flaky
integration tests unrelated to auth), 1 skipped
* feat(auth): extend orphan migration to 2.0-rc persistence tables
_ensure_admin_user now runs a three-step pipeline on every boot:
Step 1 (fatal): admin user exists / is created / password is reset
Step 2 (non-fatal): LangGraph store orphan threads → admin
Step 3 (non-fatal): SQL persistence tables → admin
- threads_meta
- runs
- run_events
- feedback
Each step is idempotent. The fatal/non-fatal split mirrors PR #1728's
original philosophy: admin creation failure blocks startup (the system
is unusable without an admin), whereas migration failures log a warning
and let the service proceed (a partial migration is recoverable; a
missing admin is not).
Key helpers
-----------
- _iter_store_items(store, namespace, *, page_size=500):
async generator that cursor-paginates across LangGraph store pages.
Fixes PR #1728's hardcoded limit=1000 bug that would silently lose
orphans beyond the first page.
- _migrate_orphaned_threads(store, admin_user_id):
Rewritten to use _iter_store_items. Returns the migrated count so the
caller can log it; raises only on unhandled exceptions.
- _migrate_orphan_sql_tables(admin_user_id):
Imports the 4 ORM models lazily, grabs the shared session factory,
runs one UPDATE per table in a single transaction, commits once.
No-op when no persistence backend is configured (in-memory dev).
Tests: test_ensure_admin.py (8 passed)
* test(auth): port AUTH test plan docs + lint/format pass
- Port backend/docs/AUTH_TEST_PLAN.md and AUTH_UPGRADE.md from PR #1728
- Rename metadata.user_id → metadata.owner_id in AUTH_TEST_PLAN.md
(4 occurrences from the original PR doc)
- ruff auto-fix UP037 in sentinel type annotations: drop quotes around
"str | None | _AutoSentinel" now that from __future__ import
annotations makes them implicit string forms
- ruff format: 2 files (app/gateway/app.py, runtime/user_context.py)
Note on test coverage additions:
- conftest.py autouse fixture was already added in commit 4 (had to
be co-located with the repository changes to keep pre-existing
persistence tests passing)
- cross-user isolation E2E tests (test_owner_isolation.py) deferred
— enforcement is already proven by the 98-test repository suite
via the autouse fixture + explicit _AUTO sentinel exercises
- New test cases (TC-API-17..20, TC-ATK-13, TC-MIG-01..07) listed
in AUTH_TEST_PLAN.md are deferred to a follow-up PR — they are
manual-QA test cases rather than pytest code, and the spec-level
coverage is already met by test_user_context.py + the 98-test
repository suite.
Final test results:
- Auth suite (test_auth*, test_langgraph_auth, test_ensure_admin,
test_user_context): 186 passed
- Persistence suite (test_run_event_store, test_run_repository,
test_thread_meta_repo, test_feedback): 98 passed
- Lint: ruff check + ruff format both clean
* test(auth): add cross-user isolation test suite
10 tests exercising the storage-layer owner filter by manually
switching the user_context contextvar between two users. Verifies
the safety invariant:
After a repository write with owner_id=A, a subsequent read with
owner_id=B must not return the row, and vice versa.
Covers all 4 tables that own user-scoped data:
TC-API-17 threads_meta — read, search, update, delete cross-user
TC-API-18 runs — get, list_by_thread, delete cross-user
TC-API-19 run_events — list_messages, list_events, count_messages,
delete_by_thread (CRITICAL: raw conversation
content leak vector)
TC-API-20 feedback — get, list_by_run, delete cross-user
Plus two meta-tests verifying the sentinel pattern itself:
- AUTO + unset contextvar raises RuntimeError
- explicit owner_id=None bypasses the filter (migration escape hatch)
Architecture note
-----------------
These tests bypass the HTTP layer by design. The full chain
(cookie → middleware → contextvar → repository) is covered piecewise:
- test_auth_middleware.py: middleware sets contextvar from cookies
- test_owner_isolation.py: repositories enforce isolation when
contextvar is set to different users
Together they prove the end-to-end safety property without the
ceremony of spinning up a full TestClient + in-memory DB for every
router endpoint.
Tests pass: 231 (full auth + persistence + isolation suite)
Lint: clean
* refactor(auth): migrate user repository to SQLAlchemy ORM
Move the users table into the shared persistence engine so auth
matches the pattern of threads_meta, runs, run_events, and feedback —
one engine, one session factory, one schema init codepath.
New files
---------
- persistence/user/__init__.py, persistence/user/model.py: UserRow
ORM class with partial unique index on (oauth_provider, oauth_id)
- Registered in persistence/models/__init__.py so
Base.metadata.create_all() picks it up
Modified
--------
- auth/repositories/sqlite.py: rewritten as async SQLAlchemy,
identical constructor pattern to the other four repositories
(def __init__(self, session_factory) + self._sf = session_factory)
- auth/config.py: drop users_db_path field — storage is configured
through config.database like every other table
- deps.py/get_local_provider: construct SQLiteUserRepository with
the shared session factory, fail fast if engine is not initialised
- tests/test_auth.py: rewrite test_sqlite_round_trip_new_fields to
use the shared engine (init_engine + close_engine in a tempdir)
- tests/test_auth_type_system.py: add per-test autouse fixture that
spins up a scratch engine and resets deps._cached_* singletons
* refactor(auth): remove SQL orphan migration (unused in supported scenarios)
The _migrate_orphan_sql_tables helper existed to bind NULL owner_id
rows in threads_meta, runs, run_events, and feedback to the admin on
first boot. But in every supported upgrade path, it's a no-op:
1. Fresh install: create_all builds fresh tables, no legacy rows
2. No-auth → with-auth (no existing persistence DB): persistence
tables are created fresh by create_all, no legacy rows
3. No-auth → with-auth (has existing persistence DB from #1930):
NOT a supported upgrade path — "有 DB 到有 DB" schema evolution
is out of scope; users wipe DB or run manual ALTER
So the SQL orphan migration never has anything to do in the
supported matrix. Delete the function, simplify _ensure_admin_user
from a 3-step pipeline to a 2-step one (admin creation + LangGraph
store orphan migration only).
LangGraph store orphan migration stays: it serves the real
"no-auth → with-auth" upgrade path where a user's existing LangGraph
thread metadata has no owner_id field and needs to be stamped with
the newly-created admin's id.
Tests: 284 passed (auth + persistence + isolation)
Lint: clean
* security(auth): write initial admin password to 0600 file instead of logs
CodeQL py/clear-text-logging-sensitive-data flagged 3 call sites that
logged the auto-generated admin password to stdout via logger.info().
Production log aggregators (ELK/Splunk/etc) would have captured those
cleartext secrets. Replace with a shared helper that writes to
.deer-flow/admin_initial_credentials.txt with mode 0600, and log only
the path.
New file
--------
- app/gateway/auth/credential_file.py: write_initial_credentials()
helper. Takes email, password, and a "initial"/"reset" label.
Creates .deer-flow/ if missing, writes a header comment plus the
email+password, chmods 0o600, returns the absolute Path.
Modified
--------
- app/gateway/app.py: both _ensure_admin_user paths (fresh creation
+ needs_setup password reset) now write to file and log the path
- app/gateway/auth/reset_admin.py: rewritten to use the shared ORM
repo (SQLiteUserRepository with session_factory) and the
credential_file helper. The previous implementation was broken
after the earlier ORM refactor — it still imported _get_users_conn
and constructed SQLiteUserRepository() without a session factory.
No tests changed — the three password-log sites are all exercised
via existing test_ensure_admin.py which checks that startup
succeeds, not that a specific string appears in logs.
CodeQL alerts 272, 283, 284: all resolved.
* security(auth): strict JWT validation in middleware (fix junk cookie bypass)
AUTH_TEST_PLAN test 7.5.8 expects junk cookies to be rejected with
401. The previous middleware behaviour was "presence-only": check
that some access_token cookie exists, then pass through. In
combination with my Task-12 decision to skip @require_auth
decorators on routes, this created a gap where a request with any
cookie-shaped string (e.g. access_token=not-a-jwt) would bypass
authentication on routes that do not touch the repository
(/api/models, /api/mcp/config, /api/memory, /api/skills, …).
Fix: middleware now calls get_current_user_from_request() strictly
and catches the resulting HTTPException to render a 401 with the
proper fine-grained error code (token_invalid, token_expired,
user_not_found, …). On success it stamps request.state.user and
the contextvar so repository-layer owner filters work downstream.
The 4 old "_with_cookie_passes" tests in test_auth_middleware.py
were written for the presence-only behaviour; they asserted that
a junk cookie would make the handler return 200. They are renamed
to "_with_junk_cookie_rejected" and their assertions flipped to
401. The negative path (no cookie → 401 not_authenticated)
is unchanged.
Verified:
no cookie → 401 not_authenticated
junk cookie → 401 token_invalid (the fixed bug)
expired cookie → 401 token_expired
Tests: 284 passed (auth + persistence + isolation)
Lint: clean
* security(auth): wire @require_permission(owner_check=True) on isolation routes
Apply the require_permission decorator to all 28 routes that take a
{thread_id} path parameter. Combined with the strict middleware
(previous commit), this gives the double-layer protection that
AUTH_TEST_PLAN test 7.5.9 documents:
Layer 1 (AuthMiddleware): cookie + JWT validation, rejects junk
cookies and stamps request.state.user
Layer 2 (@require_permission with owner_check=True): per-resource
ownership verification via
ThreadMetaStore.check_access — returns
404 if a different user owns the thread
The decorator's owner_check branch is rewritten to use the SQL
thread_meta_repo (the 2.0-rc persistence layer) instead of the
LangGraph store path that PR #1728 used (_store_get / get_store
in routers/threads.py). The inject_record convenience is dropped
— no caller in 2.0 needs the LangGraph blob, and the SQL repo has
a different shape.
Routes decorated (28 total):
- threads.py: delete, patch, get, get-state, post-state, post-history
- thread_runs.py: post-runs, post-runs-stream, post-runs-wait,
list_runs, get_run, cancel_run, join_run, stream_existing_run,
list_thread_messages, list_run_messages, list_run_events,
thread_token_usage
- feedback.py: create, list, stats, delete
- uploads.py: upload (added Request param), list, delete
- artifacts.py: get_artifact
- suggestions.py: generate (renamed body parameter to avoid
conflict with FastAPI Request)
Test fixes:
- test_suggestions_router.py: bypass the decorator via __wrapped__
(the unit tests cover parsing logic, not auth — no point spinning
up a thread_meta_repo just to test JSON unwrapping)
- test_auth_middleware.py 4 fake-cookie tests: already updated in
the previous commit (745bf432)
Tests: 293 passed (auth + persistence + isolation + suggestions)
Lint: clean
* security(auth): defense-in-depth fixes from release validation pass
Eight findings caught while running the AUTH_TEST_PLAN end-to-end against
the deployed sg_dev stack. Each is a pre-condition for shipping
release/2.0-rc that the previous PRs missed.
Backend hardening
- routers/auth.py: rate limiter X-Real-IP now requires AUTH_TRUSTED_PROXIES
whitelist (CIDR/IP allowlist). Without nginx in front, the previous code
honored arbitrary X-Real-IP, letting an attacker rotate the header to
fully bypass the per-IP login lockout.
- routers/auth.py: 36-entry common-password blocklist via Pydantic
field_validator on RegisterRequest + ChangePasswordRequest. The shared
_validate_strong_password helper keeps the constraint in one place.
- routers/threads.py: ThreadCreateRequest + ThreadPatchRequest strip
server-reserved metadata keys (owner_id, user_id) via Pydantic
field_validator so a forged value can never round-trip back to other
clients reading the same thread. The actual ownership invariant stays
on the threads_meta row; this closes the metadata-blob echo gap.
- authz.py + thread_meta/sql.py: require_permission gains a require_existing
flag plumbed through check_access(require_existing=True). Destructive
routes (DELETE/PATCH/state-update/runs/feedback) now treat a missing
thread_meta row as 404 instead of "untracked legacy thread, allow",
closing the cross-user delete-idempotence gap where any user could
successfully DELETE another user's deleted thread.
- repositories/sqlite.py + base.py: update_user raises UserNotFoundError
on a vanished row instead of silently returning the input. Concurrent
delete during password reset can no longer look like a successful update.
- runtime/user_context.py: resolve_owner_id() coerces User.id (UUID) to
str at the contextvar boundary so SQLAlchemy String(64) columns can
bind it. The whole 2.0-rc isolation pipeline was previously broken
end-to-end (POST /api/threads → 500 "type 'UUID' is not supported").
- persistence/engine.py: SQLAlchemy listener enables PRAGMA journal_mode=WAL,
synchronous=NORMAL, foreign_keys=ON on every new SQLite connection.
TC-UPG-06 in the test plan expects WAL; previous code shipped with the
default 'delete' journal.
- auth_middleware.py: stamp request.state.auth = AuthContext(...) so
@require_permission's short-circuit fires; previously every isolation
request did a duplicate JWT decode + users SELECT. Also unifies the
401 payload through AuthErrorResponse(...).model_dump().
- app.py: _ensure_admin_user restructure removes the noqa F821 scoping
bug where 'password' was referenced outside the branch that defined it.
New _announce_credentials helper absorbs the duplicate log block in
the fresh-admin and reset-admin branches.
* fix(frontend+nginx): rollout CSRF on every state-changing client path
The frontend was 100% broken in gateway-pro mode for any user trying to
open a specific chat thread. Three cumulative bugs each silently
masked the next.
LangGraph SDK CSRF gap (api-client.ts)
- The Client constructor took only apiUrl, no defaultHeaders, no fetch
interceptor. The SDK's internal fetch never sent X-CSRF-Token, so
every state-changing /api/langgraph-compat/* call (runs/stream,
threads/search, threads/{tid}/history, ...) hit CSRFMiddleware and
got 403 before reaching the auth check. UI symptom: empty thread page
with no error message; the SPA's hooks swallowed the rejection.
- Fix: pass an onRequest hook that injects X-CSRF-Token from the
csrf_token cookie per request. Reading the cookie per call (not at
construction time) handles login / logout / password-change cookie
rotation transparently. The SDK's prepareFetchOptions calls
onRequest for both regular requests AND streaming/SSE/reconnect, so
the same hook covers runs.stream and runs.joinStream.
Raw fetch CSRF gap (7 files)
- Audit: 11 frontend fetch sites, only 2 included CSRF (login/setup +
account-settings change-password). The other 7 routed through raw
fetch() with no header — suggestions, memory, agents, mcp, skills,
uploads, and the local thread cleanup hook all 403'd silently.
- Fix: enhance fetcher.ts:fetchWithAuth to auto-inject X-CSRF-Token on
POST/PUT/DELETE/PATCH from a single shared readCsrfCookie() helper.
Convert all 7 raw fetch() callers to fetchWithAuth so the contract
is centrally enforced. api-client.ts and fetcher.ts share
readCsrfCookie + STATE_CHANGING_METHODS to avoid drift.
nginx routing + buffering (nginx.local.conf)
- The auth feature shipped without updating the nginx config: per-API
explicit location blocks but no /api/v1/auth/, /api/feedback, /api/runs.
The frontend's client-side fetches to /api/v1/auth/login/local 404'd
from the Next.js side because nginx routed /api/* to the frontend.
- Fix: add catch-all `location /api/` that proxies to the gateway.
nginx longest-prefix matching keeps the explicit blocks (/api/models,
/api/threads regex, /api/langgraph/, ...) winning for their paths.
- Fix: disable proxy_buffering + proxy_request_buffering for the
frontend `location /` block. Without it, nginx tries to spool large
Next.js chunks into /var/lib/nginx/proxy (root-owned) and fails with
Permission denied → ERR_INCOMPLETE_CHUNKED_ENCODING → ChunkLoadError.
* test(auth): release-validation test infra and new coverage
Test fixtures and unit tests added during the validation pass.
Router test helpers (NEW: tests/_router_auth_helpers.py)
- make_authed_test_app(): builds a FastAPI test app with a stub
middleware that stamps request.state.user + request.state.auth and a
permissive thread_meta_repo mock. TestClient-based router tests
(test_artifacts_router, test_threads_router) use it instead of bare
FastAPI() so the new @require_permission(owner_check=True) decorators
short-circuit cleanly.
- call_unwrapped(): walks the __wrapped__ chain to invoke the underlying
handler without going through the authz wrappers. Direct-call tests
(test_uploads_router) use it. Typed with ParamSpec so the wrapped
signature flows through.
Backend test additions
- test_auth.py: 7 tests for the new _get_client_ip trust model (no
proxy / trusted proxy / untrusted peer / XFF rejection / invalid
CIDR / no client). 5 tests for the password blocklist (literal,
case-insensitive, strong password accepted, change-password binding,
short-password length-check still fires before blocklist).
test_update_user_raises_when_row_concurrently_deleted: closes a
shipped-without-coverage gap on the new UserNotFoundError contract.
- test_thread_meta_repo.py: 4 tests for check_access(require_existing=True)
— strict missing-row denial, strict owner match, strict owner mismatch,
strict null-owner still allowed (shared rows survive the tightening).
- test_ensure_admin.py: 3 tests for _migrate_orphaned_threads /
_iter_store_items pagination, covering the TC-UPG-02 upgrade story
end-to-end via mock store. Closes the gap where the cursor pagination
was untested even though the previous PR rewrote it.
- test_threads_router.py: 5 tests for _strip_reserved_metadata
(owner_id removal, user_id removal, safe-keys passthrough, empty
input, both-stripped).
- test_auth_type_system.py: replace "password123" fixtures with
Tr0ub4dor3a / AnotherStr0ngPwd! so the new password blocklist
doesn't reject the test data.
* docs(auth): refresh TC-DOCKER-05 + document Docker validation gap
- AUTH_TEST_PLAN.md TC-DOCKER-05: the previous expectation
("admin password visible in docker logs") was stale after the simplify
pass that moved credentials to a 0600 file. The grep "Password:" check
would have silently failed and given a false sense of coverage. New
expectation matches the actual file-based path: 0600 file in
DEER_FLOW_HOME, log shows the path (not the secret), reverse-grep
asserts no leaked password in container logs.
- NEW: docs/AUTH_TEST_DOCKER_GAP.md documents the only un-executed
block in the test plan (TC-DOCKER-01..06). Reason: sg_dev validation
host has no Docker daemon installed. The doc maps each Docker case
to an already-validated bare-metal equivalent (TC-1.1, TC-REENT-01,
TC-API-02 etc.) so the gap is auditable, and includes pre-flight
reproduction steps for whoever has Docker available.
---------
Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com>
* feat: replace auto-admin creation with interactive setup flow
On first boot, instead of auto-creating admin@deerflow.dev with a
random password written to a credential file, DeerFlow now redirects
to /setup where the user creates the admin account interactively.
Backend:
- Remove auto admin creation from _ensure_admin_user (now only runs
orphan thread migration when an admin already exists)
- Add POST /api/v1/auth/initialize endpoint (public, only callable
when 0 users exist; auto-logs in after creation)
- Add /api/v1/auth/initialize to public paths in auth_middleware.py
and CSRF exempt paths in csrf_middleware.py
- Update test_ensure_admin.py to match new behavior
- Add test_initialize_admin.py with 8 tests for the new endpoint
Frontend:
- Add system_setup_required to AuthResult type
- getServerSideUser() checks setup-status when unauthenticated
- Auth layout allows system_setup_required (renders children)
- Workspace layout redirects system_setup_required → /setup
- Login page redirects to /setup when system not initialized
- Setup page detects mode via isAuthenticated: unauth = create-admin
form (calls /initialize), auth = change-password form (existing)
Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/9c2471c5-d6e9-4ada-9192-61b56007b8d7
Co-authored-by: foreleven <4785594+foreleven@users.noreply.github.com>
* fix: add cleanup flags to useEffect async fetches in setup/login pages
Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/9c2471c5-d6e9-4ada-9192-61b56007b8d7
Co-authored-by: foreleven <4785594+foreleven@users.noreply.github.com>
* fix: address reviewer feedback on /initialize endpoint security and robustness
1. Concurrency/register-blocking: switch setup-status and /initialize to
check admin_count (via new count_admin_users()) instead of total
user_count — /register can no longer block admin initialization
2. Dedicated error code: add SYSTEM_ALREADY_INITIALIZED to AuthErrorCode
and use it in /initialize 409 responses; add to frontend types
3. Init token security: generate a one-time token at startup (logged to
stdout) and require it in the /initialize request body — prevents
an attacker from claiming admin on an exposed first-boot instance
4. Setup-status fetch timeout: apply SSR_AUTH_TIMEOUT_MS abort-controller
pattern to the setup-status fetch in server.ts (same as /auth/me)
Backend repo/provider: add count_admin_users() to base, SQLite, and
LocalAuthProvider. Tests updated + new token-validation/register-blocking
test cases added.
Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/b9f531fc-8ed3-41db-b416-237f243b45fd
Co-authored-by: foreleven <4785594+foreleven@users.noreply.github.com>
* fix: address code review nits — move secrets import, add INVALID_INIT_TOKEN error code, fix test assertions
Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/b9f531fc-8ed3-41db-b416-237f243b45fd
Co-authored-by: foreleven <4785594+foreleven@users.noreply.github.com>
* refactor: remove init_token generation and validation from admin setup flow
* fix: re-apply init_token security for /initialize endpoint
Re-adds the one-time init_token requirement to the /initialize endpoint,
building on the human's UI improvements in 5eeeb09. This addresses the
two remaining unresolved review threads:
1. Dedicated error code (SYSTEM_ALREADY_INITIALIZED + INVALID_INIT_TOKEN)
2. Init token security gate — requires the token logged at startup
Changes:
- errors.py: re-add INVALID_INIT_TOKEN error code
- routers/auth.py: re-add `import secrets`, `init_token` field,
token validation with secrets.compare_digest, and token consumption
- app.py: re-add token generation/logging and app.state.init_token = None
- setup/page.tsx: re-add initToken state + input field (human's UI kept)
- types.ts: re-add invalid_init_token error code
- test_initialize_admin.py: restore full token test coverage
- test_ensure_admin.py: restore init_token assertions
Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/646fb5c0-ec09-41aa-9fe9-e6f7c32364e8
Co-authored-by: foreleven <4785594+foreleven@users.noreply.github.com>
* fix: make init_token optional (403 not 422 on missing), don't consume token on error paths
Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/646fb5c0-ec09-41aa-9fe9-e6f7c32364e8
Co-authored-by: foreleven <4785594+foreleven@users.noreply.github.com>
* refactor: remove redundant skill-related functions and documentation
---------
Co-authored-by: rayhpeng <rayhpeng@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
Co-authored-by: DanielWalnut <45447813+hetaoBackend@users.noreply.github.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: JilongSun <965640067@qq.com>
Co-authored-by: jie <49781832+stan-fu@users.noreply.github.com>
Co-authored-by: cooper <cooperfu@tencent.com>
Co-authored-by: yangzheli <43645580+yangzheli@users.noreply.github.com>
Co-authored-by: greatmengqi <chenmengqi.0376@gmail.com>
Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: foreleven <4785594+foreleven@users.noreply.github.com>
Co-authored-by: jiangfeng.11 <jiangfeng.11@bytedance.com>
* feat(auth): introduce backend auth module
Port RFC-001 authentication core from PR #1728:
- JWT token handling (create_access_token, decode_token, TokenPayload)
- Password hashing (bcrypt) with verify_password
- SQLite UserRepository with base interface
- Provider Factory pattern (LocalAuthProvider)
- CLI reset_admin tool
- Auth-specific errors (AuthErrorCode, TokenError, AuthErrorResponse)
Deps:
- bcrypt>=4.0.0
- pyjwt>=2.9.0
- email-validator>=2.0.0
- backend/uv.toml pins public PyPI index
Tests: 12 pure unit tests (test_auth_config.py, test_auth_errors.py).
Scope note: authz.py, test_auth.py, and test_auth_type_system.py are
deferred to commit 2 because they depend on middleware and deps wiring
that is not yet in place. Commit 1 stays "pure new files only" as the
spec mandates.
* feat(auth): wire auth end-to-end (middleware + frontend replacement)
Backend:
- Port auth_middleware, csrf_middleware, langgraph_auth, routers/auth
- Port authz decorator (owner_filter_key defaults to 'owner_id')
- Merge app.py: register AuthMiddleware + CSRFMiddleware + CORS, add
_ensure_admin_user lifespan hook, _migrate_orphaned_threads helper,
register auth router
- Merge deps.py: add get_local_provider, get_current_user_from_request,
get_optional_user_from_request; keep get_current_user as thin str|None
adapter for feedback router
- langgraph.json: add auth path pointing to langgraph_auth.py:auth
- Rename metadata['user_id'] -> metadata['owner_id'] in langgraph_auth
(both metadata write and LangGraph filter dict) + test fixtures
Frontend:
- Delete better-auth library and api catch-all route
- Remove better-auth npm dependency and env vars (BETTER_AUTH_SECRET,
BETTER_AUTH_GITHUB_*) from env.js
- Port frontend/src/core/auth/* (AuthProvider, gateway-config,
proxy-policy, server-side getServerSideUser, types)
- Port frontend/src/core/api/fetcher.ts
- Port (auth)/layout, (auth)/login, (auth)/setup pages
- Rewrite workspace/layout.tsx as server component that calls
getServerSideUser and wraps in AuthProvider
- Port workspace/workspace-content.tsx for the client-side sidebar logic
Tests:
- Port 5 auth test files (test_auth, test_auth_middleware,
test_auth_type_system, test_ensure_admin, test_langgraph_auth)
- 176 auth tests PASS
After this commit: login/logout/registration flow works, but persistence
layer does not yet filter by owner_id. Commit 4 closes that gap.
* feat(auth): account settings page + i18n
- Port account-settings-page.tsx (change password, change email, logout)
- Wire into settings-dialog.tsx as new "account" section with UserIcon,
rendered first in the section list
- Add i18n keys:
- en-US/zh-CN: settings.sections.account ("Account" / "账号")
- en-US/zh-CN: button.logout ("Log out" / "退出登录")
- types.ts: matching type declarations
* feat(auth): enforce owner_id across 2.0-rc persistence layer
Add request-scoped contextvar-based owner filtering to threads_meta,
runs, run_events, and feedback repositories. Router code is unchanged
— isolation is enforced at the storage layer so that any caller that
forgets to pass owner_id still gets filtered results, and new routes
cannot accidentally leak data.
Core infrastructure
-------------------
- deerflow/runtime/user_context.py (new):
- ContextVar[CurrentUser | None] with default None
- runtime_checkable CurrentUser Protocol (structural subtype with .id)
- set/reset/get/require helpers
- AUTO sentinel + resolve_owner_id(value, method_name) for sentinel
three-state resolution: AUTO reads contextvar, explicit str
overrides, explicit None bypasses the filter (for migration/CLI)
Repository changes
------------------
- ThreadMetaRepository: create/get/search/update_*/delete gain
owner_id=AUTO kwarg; read paths filter by owner, writes stamp it,
mutations check ownership before applying
- RunRepository: put/get/list_by_thread/delete gain owner_id=AUTO kwarg
- FeedbackRepository: create/get/list_by_run/list_by_thread/delete
gain owner_id=AUTO kwarg
- DbRunEventStore: list_messages/list_events/list_messages_by_run/
count_messages/delete_by_thread/delete_by_run gain owner_id=AUTO
kwarg. Write paths (put/put_batch) read contextvar softly: when a
request-scoped user is available, owner_id is stamped; background
worker writes without a user context pass None which is valid
(orphan row to be bound by migration)
Schema
------
- persistence/models/run_event.py: RunEventRow.owner_id = Mapped[
str | None] = mapped_column(String(64), nullable=True, index=True)
- No alembic migration needed: 2.0 ships fresh, Base.metadata.create_all
picks up the new column automatically
Middleware
----------
- auth_middleware.py: after cookie check, call get_optional_user_from_
request to load the real User, stamp it into request.state.user AND
the contextvar via set_current_user, reset in a try/finally. Public
paths and unauthenticated requests continue without contextvar, and
@require_auth handles the strict 401 path
Test infrastructure
-------------------
- tests/conftest.py: @pytest.fixture(autouse=True) _auto_user_context
sets a default SimpleNamespace(id="test-user-autouse") on every test
unless marked @pytest.mark.no_auto_user. Keeps existing 20+
persistence tests passing without modification
- pyproject.toml [tool.pytest.ini_options]: register no_auto_user
marker so pytest does not emit warnings for opt-out tests
- tests/test_user_context.py: 6 tests covering three-state semantics,
Protocol duck typing, and require/optional APIs
- tests/test_thread_meta_repo.py: one test updated to pass owner_id=
None explicitly where it was previously relying on the old default
Test results
------------
- test_user_context.py: 6 passed
- test_auth*.py + test_langgraph_auth.py + test_ensure_admin.py: 127
- test_run_event_store / test_run_repository / test_thread_meta_repo
/ test_feedback: 92 passed
- Full backend suite: 1905 passed, 2 failed (both @requires_llm flaky
integration tests unrelated to auth), 1 skipped
* feat(auth): extend orphan migration to 2.0-rc persistence tables
_ensure_admin_user now runs a three-step pipeline on every boot:
Step 1 (fatal): admin user exists / is created / password is reset
Step 2 (non-fatal): LangGraph store orphan threads → admin
Step 3 (non-fatal): SQL persistence tables → admin
- threads_meta
- runs
- run_events
- feedback
Each step is idempotent. The fatal/non-fatal split mirrors PR #1728's
original philosophy: admin creation failure blocks startup (the system
is unusable without an admin), whereas migration failures log a warning
and let the service proceed (a partial migration is recoverable; a
missing admin is not).
Key helpers
-----------
- _iter_store_items(store, namespace, *, page_size=500):
async generator that cursor-paginates across LangGraph store pages.
Fixes PR #1728's hardcoded limit=1000 bug that would silently lose
orphans beyond the first page.
- _migrate_orphaned_threads(store, admin_user_id):
Rewritten to use _iter_store_items. Returns the migrated count so the
caller can log it; raises only on unhandled exceptions.
- _migrate_orphan_sql_tables(admin_user_id):
Imports the 4 ORM models lazily, grabs the shared session factory,
runs one UPDATE per table in a single transaction, commits once.
No-op when no persistence backend is configured (in-memory dev).
Tests: test_ensure_admin.py (8 passed)
* test(auth): port AUTH test plan docs + lint/format pass
- Port backend/docs/AUTH_TEST_PLAN.md and AUTH_UPGRADE.md from PR #1728
- Rename metadata.user_id → metadata.owner_id in AUTH_TEST_PLAN.md
(4 occurrences from the original PR doc)
- ruff auto-fix UP037 in sentinel type annotations: drop quotes around
"str | None | _AutoSentinel" now that from __future__ import
annotations makes them implicit string forms
- ruff format: 2 files (app/gateway/app.py, runtime/user_context.py)
Note on test coverage additions:
- conftest.py autouse fixture was already added in commit 4 (had to
be co-located with the repository changes to keep pre-existing
persistence tests passing)
- cross-user isolation E2E tests (test_owner_isolation.py) deferred
— enforcement is already proven by the 98-test repository suite
via the autouse fixture + explicit _AUTO sentinel exercises
- New test cases (TC-API-17..20, TC-ATK-13, TC-MIG-01..07) listed
in AUTH_TEST_PLAN.md are deferred to a follow-up PR — they are
manual-QA test cases rather than pytest code, and the spec-level
coverage is already met by test_user_context.py + the 98-test
repository suite.
Final test results:
- Auth suite (test_auth*, test_langgraph_auth, test_ensure_admin,
test_user_context): 186 passed
- Persistence suite (test_run_event_store, test_run_repository,
test_thread_meta_repo, test_feedback): 98 passed
- Lint: ruff check + ruff format both clean
* test(auth): add cross-user isolation test suite
10 tests exercising the storage-layer owner filter by manually
switching the user_context contextvar between two users. Verifies
the safety invariant:
After a repository write with owner_id=A, a subsequent read with
owner_id=B must not return the row, and vice versa.
Covers all 4 tables that own user-scoped data:
TC-API-17 threads_meta — read, search, update, delete cross-user
TC-API-18 runs — get, list_by_thread, delete cross-user
TC-API-19 run_events — list_messages, list_events, count_messages,
delete_by_thread (CRITICAL: raw conversation
content leak vector)
TC-API-20 feedback — get, list_by_run, delete cross-user
Plus two meta-tests verifying the sentinel pattern itself:
- AUTO + unset contextvar raises RuntimeError
- explicit owner_id=None bypasses the filter (migration escape hatch)
Architecture note
-----------------
These tests bypass the HTTP layer by design. The full chain
(cookie → middleware → contextvar → repository) is covered piecewise:
- test_auth_middleware.py: middleware sets contextvar from cookies
- test_owner_isolation.py: repositories enforce isolation when
contextvar is set to different users
Together they prove the end-to-end safety property without the
ceremony of spinning up a full TestClient + in-memory DB for every
router endpoint.
Tests pass: 231 (full auth + persistence + isolation suite)
Lint: clean
* refactor(auth): migrate user repository to SQLAlchemy ORM
Move the users table into the shared persistence engine so auth
matches the pattern of threads_meta, runs, run_events, and feedback —
one engine, one session factory, one schema init codepath.
New files
---------
- persistence/user/__init__.py, persistence/user/model.py: UserRow
ORM class with partial unique index on (oauth_provider, oauth_id)
- Registered in persistence/models/__init__.py so
Base.metadata.create_all() picks it up
Modified
--------
- auth/repositories/sqlite.py: rewritten as async SQLAlchemy,
identical constructor pattern to the other four repositories
(def __init__(self, session_factory) + self._sf = session_factory)
- auth/config.py: drop users_db_path field — storage is configured
through config.database like every other table
- deps.py/get_local_provider: construct SQLiteUserRepository with
the shared session factory, fail fast if engine is not initialised
- tests/test_auth.py: rewrite test_sqlite_round_trip_new_fields to
use the shared engine (init_engine + close_engine in a tempdir)
- tests/test_auth_type_system.py: add per-test autouse fixture that
spins up a scratch engine and resets deps._cached_* singletons
* refactor(auth): remove SQL orphan migration (unused in supported scenarios)
The _migrate_orphan_sql_tables helper existed to bind NULL owner_id
rows in threads_meta, runs, run_events, and feedback to the admin on
first boot. But in every supported upgrade path, it's a no-op:
1. Fresh install: create_all builds fresh tables, no legacy rows
2. No-auth → with-auth (no existing persistence DB): persistence
tables are created fresh by create_all, no legacy rows
3. No-auth → with-auth (has existing persistence DB from #1930):
NOT a supported upgrade path — "有 DB 到有 DB" schema evolution
is out of scope; users wipe DB or run manual ALTER
So the SQL orphan migration never has anything to do in the
supported matrix. Delete the function, simplify _ensure_admin_user
from a 3-step pipeline to a 2-step one (admin creation + LangGraph
store orphan migration only).
LangGraph store orphan migration stays: it serves the real
"no-auth → with-auth" upgrade path where a user's existing LangGraph
thread metadata has no owner_id field and needs to be stamped with
the newly-created admin's id.
Tests: 284 passed (auth + persistence + isolation)
Lint: clean
* security(auth): write initial admin password to 0600 file instead of logs
CodeQL py/clear-text-logging-sensitive-data flagged 3 call sites that
logged the auto-generated admin password to stdout via logger.info().
Production log aggregators (ELK/Splunk/etc) would have captured those
cleartext secrets. Replace with a shared helper that writes to
.deer-flow/admin_initial_credentials.txt with mode 0600, and log only
the path.
New file
--------
- app/gateway/auth/credential_file.py: write_initial_credentials()
helper. Takes email, password, and a "initial"/"reset" label.
Creates .deer-flow/ if missing, writes a header comment plus the
email+password, chmods 0o600, returns the absolute Path.
Modified
--------
- app/gateway/app.py: both _ensure_admin_user paths (fresh creation
+ needs_setup password reset) now write to file and log the path
- app/gateway/auth/reset_admin.py: rewritten to use the shared ORM
repo (SQLiteUserRepository with session_factory) and the
credential_file helper. The previous implementation was broken
after the earlier ORM refactor — it still imported _get_users_conn
and constructed SQLiteUserRepository() without a session factory.
No tests changed — the three password-log sites are all exercised
via existing test_ensure_admin.py which checks that startup
succeeds, not that a specific string appears in logs.
CodeQL alerts 272, 283, 284: all resolved.
* security(auth): strict JWT validation in middleware (fix junk cookie bypass)
AUTH_TEST_PLAN test 7.5.8 expects junk cookies to be rejected with
401. The previous middleware behaviour was "presence-only": check
that some access_token cookie exists, then pass through. In
combination with my Task-12 decision to skip @require_auth
decorators on routes, this created a gap where a request with any
cookie-shaped string (e.g. access_token=not-a-jwt) would bypass
authentication on routes that do not touch the repository
(/api/models, /api/mcp/config, /api/memory, /api/skills, …).
Fix: middleware now calls get_current_user_from_request() strictly
and catches the resulting HTTPException to render a 401 with the
proper fine-grained error code (token_invalid, token_expired,
user_not_found, …). On success it stamps request.state.user and
the contextvar so repository-layer owner filters work downstream.
The 4 old "_with_cookie_passes" tests in test_auth_middleware.py
were written for the presence-only behaviour; they asserted that
a junk cookie would make the handler return 200. They are renamed
to "_with_junk_cookie_rejected" and their assertions flipped to
401. The negative path (no cookie → 401 not_authenticated)
is unchanged.
Verified:
no cookie → 401 not_authenticated
junk cookie → 401 token_invalid (the fixed bug)
expired cookie → 401 token_expired
Tests: 284 passed (auth + persistence + isolation)
Lint: clean
* security(auth): wire @require_permission(owner_check=True) on isolation routes
Apply the require_permission decorator to all 28 routes that take a
{thread_id} path parameter. Combined with the strict middleware
(previous commit), this gives the double-layer protection that
AUTH_TEST_PLAN test 7.5.9 documents:
Layer 1 (AuthMiddleware): cookie + JWT validation, rejects junk
cookies and stamps request.state.user
Layer 2 (@require_permission with owner_check=True): per-resource
ownership verification via
ThreadMetaStore.check_access — returns
404 if a different user owns the thread
The decorator's owner_check branch is rewritten to use the SQL
thread_meta_repo (the 2.0-rc persistence layer) instead of the
LangGraph store path that PR #1728 used (_store_get / get_store
in routers/threads.py). The inject_record convenience is dropped
— no caller in 2.0 needs the LangGraph blob, and the SQL repo has
a different shape.
Routes decorated (28 total):
- threads.py: delete, patch, get, get-state, post-state, post-history
- thread_runs.py: post-runs, post-runs-stream, post-runs-wait,
list_runs, get_run, cancel_run, join_run, stream_existing_run,
list_thread_messages, list_run_messages, list_run_events,
thread_token_usage
- feedback.py: create, list, stats, delete
- uploads.py: upload (added Request param), list, delete
- artifacts.py: get_artifact
- suggestions.py: generate (renamed body parameter to avoid
conflict with FastAPI Request)
Test fixes:
- test_suggestions_router.py: bypass the decorator via __wrapped__
(the unit tests cover parsing logic, not auth — no point spinning
up a thread_meta_repo just to test JSON unwrapping)
- test_auth_middleware.py 4 fake-cookie tests: already updated in
the previous commit (745bf432)
Tests: 293 passed (auth + persistence + isolation + suggestions)
Lint: clean
* security(auth): defense-in-depth fixes from release validation pass
Eight findings caught while running the AUTH_TEST_PLAN end-to-end against
the deployed sg_dev stack. Each is a pre-condition for shipping
release/2.0-rc that the previous PRs missed.
Backend hardening
- routers/auth.py: rate limiter X-Real-IP now requires AUTH_TRUSTED_PROXIES
whitelist (CIDR/IP allowlist). Without nginx in front, the previous code
honored arbitrary X-Real-IP, letting an attacker rotate the header to
fully bypass the per-IP login lockout.
- routers/auth.py: 36-entry common-password blocklist via Pydantic
field_validator on RegisterRequest + ChangePasswordRequest. The shared
_validate_strong_password helper keeps the constraint in one place.
- routers/threads.py: ThreadCreateRequest + ThreadPatchRequest strip
server-reserved metadata keys (owner_id, user_id) via Pydantic
field_validator so a forged value can never round-trip back to other
clients reading the same thread. The actual ownership invariant stays
on the threads_meta row; this closes the metadata-blob echo gap.
- authz.py + thread_meta/sql.py: require_permission gains a require_existing
flag plumbed through check_access(require_existing=True). Destructive
routes (DELETE/PATCH/state-update/runs/feedback) now treat a missing
thread_meta row as 404 instead of "untracked legacy thread, allow",
closing the cross-user delete-idempotence gap where any user could
successfully DELETE another user's deleted thread.
- repositories/sqlite.py + base.py: update_user raises UserNotFoundError
on a vanished row instead of silently returning the input. Concurrent
delete during password reset can no longer look like a successful update.
- runtime/user_context.py: resolve_owner_id() coerces User.id (UUID) to
str at the contextvar boundary so SQLAlchemy String(64) columns can
bind it. The whole 2.0-rc isolation pipeline was previously broken
end-to-end (POST /api/threads → 500 "type 'UUID' is not supported").
- persistence/engine.py: SQLAlchemy listener enables PRAGMA journal_mode=WAL,
synchronous=NORMAL, foreign_keys=ON on every new SQLite connection.
TC-UPG-06 in the test plan expects WAL; previous code shipped with the
default 'delete' journal.
- auth_middleware.py: stamp request.state.auth = AuthContext(...) so
@require_permission's short-circuit fires; previously every isolation
request did a duplicate JWT decode + users SELECT. Also unifies the
401 payload through AuthErrorResponse(...).model_dump().
- app.py: _ensure_admin_user restructure removes the noqa F821 scoping
bug where 'password' was referenced outside the branch that defined it.
New _announce_credentials helper absorbs the duplicate log block in
the fresh-admin and reset-admin branches.
* fix(frontend+nginx): rollout CSRF on every state-changing client path
The frontend was 100% broken in gateway-pro mode for any user trying to
open a specific chat thread. Three cumulative bugs each silently
masked the next.
LangGraph SDK CSRF gap (api-client.ts)
- The Client constructor took only apiUrl, no defaultHeaders, no fetch
interceptor. The SDK's internal fetch never sent X-CSRF-Token, so
every state-changing /api/langgraph-compat/* call (runs/stream,
threads/search, threads/{tid}/history, ...) hit CSRFMiddleware and
got 403 before reaching the auth check. UI symptom: empty thread page
with no error message; the SPA's hooks swallowed the rejection.
- Fix: pass an onRequest hook that injects X-CSRF-Token from the
csrf_token cookie per request. Reading the cookie per call (not at
construction time) handles login / logout / password-change cookie
rotation transparently. The SDK's prepareFetchOptions calls
onRequest for both regular requests AND streaming/SSE/reconnect, so
the same hook covers runs.stream and runs.joinStream.
Raw fetch CSRF gap (7 files)
- Audit: 11 frontend fetch sites, only 2 included CSRF (login/setup +
account-settings change-password). The other 7 routed through raw
fetch() with no header — suggestions, memory, agents, mcp, skills,
uploads, and the local thread cleanup hook all 403'd silently.
- Fix: enhance fetcher.ts:fetchWithAuth to auto-inject X-CSRF-Token on
POST/PUT/DELETE/PATCH from a single shared readCsrfCookie() helper.
Convert all 7 raw fetch() callers to fetchWithAuth so the contract
is centrally enforced. api-client.ts and fetcher.ts share
readCsrfCookie + STATE_CHANGING_METHODS to avoid drift.
nginx routing + buffering (nginx.local.conf)
- The auth feature shipped without updating the nginx config: per-API
explicit location blocks but no /api/v1/auth/, /api/feedback, /api/runs.
The frontend's client-side fetches to /api/v1/auth/login/local 404'd
from the Next.js side because nginx routed /api/* to the frontend.
- Fix: add catch-all `location /api/` that proxies to the gateway.
nginx longest-prefix matching keeps the explicit blocks (/api/models,
/api/threads regex, /api/langgraph/, ...) winning for their paths.
- Fix: disable proxy_buffering + proxy_request_buffering for the
frontend `location /` block. Without it, nginx tries to spool large
Next.js chunks into /var/lib/nginx/proxy (root-owned) and fails with
Permission denied → ERR_INCOMPLETE_CHUNKED_ENCODING → ChunkLoadError.
* test(auth): release-validation test infra and new coverage
Test fixtures and unit tests added during the validation pass.
Router test helpers (NEW: tests/_router_auth_helpers.py)
- make_authed_test_app(): builds a FastAPI test app with a stub
middleware that stamps request.state.user + request.state.auth and a
permissive thread_meta_repo mock. TestClient-based router tests
(test_artifacts_router, test_threads_router) use it instead of bare
FastAPI() so the new @require_permission(owner_check=True) decorators
short-circuit cleanly.
- call_unwrapped(): walks the __wrapped__ chain to invoke the underlying
handler without going through the authz wrappers. Direct-call tests
(test_uploads_router) use it. Typed with ParamSpec so the wrapped
signature flows through.
Backend test additions
- test_auth.py: 7 tests for the new _get_client_ip trust model (no
proxy / trusted proxy / untrusted peer / XFF rejection / invalid
CIDR / no client). 5 tests for the password blocklist (literal,
case-insensitive, strong password accepted, change-password binding,
short-password length-check still fires before blocklist).
test_update_user_raises_when_row_concurrently_deleted: closes a
shipped-without-coverage gap on the new UserNotFoundError contract.
- test_thread_meta_repo.py: 4 tests for check_access(require_existing=True)
— strict missing-row denial, strict owner match, strict owner mismatch,
strict null-owner still allowed (shared rows survive the tightening).
- test_ensure_admin.py: 3 tests for _migrate_orphaned_threads /
_iter_store_items pagination, covering the TC-UPG-02 upgrade story
end-to-end via mock store. Closes the gap where the cursor pagination
was untested even though the previous PR rewrote it.
- test_threads_router.py: 5 tests for _strip_reserved_metadata
(owner_id removal, user_id removal, safe-keys passthrough, empty
input, both-stripped).
- test_auth_type_system.py: replace "password123" fixtures with
Tr0ub4dor3a / AnotherStr0ngPwd! so the new password blocklist
doesn't reject the test data.
* docs(auth): refresh TC-DOCKER-05 + document Docker validation gap
- AUTH_TEST_PLAN.md TC-DOCKER-05: the previous expectation
("admin password visible in docker logs") was stale after the simplify
pass that moved credentials to a 0600 file. The grep "Password:" check
would have silently failed and given a false sense of coverage. New
expectation matches the actual file-based path: 0600 file in
DEER_FLOW_HOME, log shows the path (not the secret), reverse-grep
asserts no leaked password in container logs.
- NEW: docs/AUTH_TEST_DOCKER_GAP.md documents the only un-executed
block in the test plan (TC-DOCKER-01..06). Reason: sg_dev validation
host has no Docker daemon installed. The doc maps each Docker case
to an already-validated bare-metal equivalent (TC-1.1, TC-REENT-01,
TC-API-02 etc.) so the gap is auditable, and includes pre-flight
reproduction steps for whoever has Docker available.
---------
Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com>
* feat(persistence): add SQLAlchemy 2.0 async ORM scaffold
Introduce a unified database configuration (DatabaseConfig) that
controls both the LangGraph checkpointer and the DeerFlow application
persistence layer from a single `database:` config section.
New modules:
- deerflow.config.database_config — Pydantic config with memory/sqlite/postgres backends
- deerflow.persistence — async engine lifecycle, DeclarativeBase with to_dict mixin, Alembic skeleton
- deerflow.runtime.runs.store — RunStore ABC + MemoryRunStore implementation
Gateway integration initializes/tears down the persistence engine in
the existing langgraph_runtime() context manager. Legacy checkpointer
config is preserved for backward compatibility.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(persistence): add RunEventStore ABC + MemoryRunEventStore
Phase 2-A prerequisite for event storage: adds the unified run event
stream interface (RunEventStore) with an in-memory implementation,
RunEventsConfig, gateway integration, and comprehensive tests (27 cases).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(persistence): add ORM models, repositories, DB/JSONL event stores, RunJournal, and API endpoints
Phase 2-B: run persistence + event storage + token tracking.
- ORM models: RunRow (with token fields), ThreadMetaRow, RunEventRow
- RunRepository implements RunStore ABC via SQLAlchemy ORM
- ThreadMetaRepository with owner access control
- DbRunEventStore with trace content truncation and cursor pagination
- JsonlRunEventStore with per-run files and seq recovery from disk
- RunJournal (BaseCallbackHandler) captures LLM/tool/lifecycle events,
accumulates token usage by caller type, buffers and flushes to store
- RunManager now accepts optional RunStore for persistent backing
- Worker creates RunJournal, writes human_message, injects callbacks
- Gateway deps use factory functions (RunRepository when DB available)
- New endpoints: messages, run messages, run events, token-usage
- ThreadCreateRequest gains assistant_id field
- 92 tests pass (33 new), zero regressions
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(persistence): add user feedback + follow-up run association
Phase 2-C: feedback and follow-up tracking.
- FeedbackRow ORM model (rating +1/-1, optional message_id, comment)
- FeedbackRepository with CRUD, list_by_run/thread, aggregate stats
- Feedback API endpoints: create, list, stats, delete
- follow_up_to_run_id in RunCreateRequest (explicit or auto-detected
from latest successful run on the thread)
- Worker writes follow_up_to_run_id into human_message event metadata
- Gateway deps: feedback_repo factory + getter
- 17 new tests (14 FeedbackRepository + 3 follow-up association)
- 109 total tests pass, zero regressions
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test+config: comprehensive Phase 2 test coverage + deprecate checkpointer config
- config.example.yaml: deprecate standalone checkpointer section, activate
unified database:sqlite as default (drives both checkpointer + app data)
- New: test_thread_meta_repo.py (14 tests) — full ThreadMetaRepository coverage
including check_access owner logic, list_by_owner pagination
- Extended test_run_repository.py (+4 tests) — completion preserves fields,
list ordering desc, limit, owner_none returns all
- Extended test_run_journal.py (+8 tests) — on_chain_error, track_tokens=false,
middleware no ai_message, unknown caller tokens, convenience fields,
tool_error, non-summarization custom event
- Extended test_run_event_store.py (+7 tests) — DB batch seq continuity,
make_run_event_store factory (memory/db/jsonl/fallback/unknown)
- Extended test_phase2b_integration.py (+4 tests) — create_or_reject persists,
follow-up metadata, summarization in history, full DB-backed lifecycle
- Fixed DB integration test to use proper fake objects (not MagicMock)
for JSON-serializable metadata
- 157 total Phase 2 tests pass, zero regressions
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* config: move default sqlite_dir to .deer-flow/data
Keep SQLite databases alongside other DeerFlow-managed data
(threads, memory) under the .deer-flow/ directory instead of a
top-level ./data folder.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(persistence): remove UTFJSON, use engine-level json_serializer + datetime.now()
- Replace custom UTFJSON type with standard sqlalchemy.JSON in all ORM
models. Add json_serializer=json.dumps(ensure_ascii=False) to all
create_async_engine calls so non-ASCII text (Chinese etc.) is stored
as-is in both SQLite and Postgres.
- Change ORM datetime defaults from datetime.now(UTC) to datetime.now(),
remove UTC imports.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(gateway): simplify deps.py with getter factory + inline repos
- Replace 6 identical getter functions with _require() factory.
- Inline 3 _make_*_repo() factories into langgraph_runtime(), call
get_session_factory() once instead of 3 times.
- Add thread_meta upsert in start_run (services.py).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(docker): add UV_EXTRAS build arg for optional dependencies
Support installing optional dependency groups (e.g. postgres) at
Docker build time via UV_EXTRAS build arg:
UV_EXTRAS=postgres docker compose build
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(journal): fix flush, token tracking, and consolidate tests
RunJournal fixes:
- _flush_sync: retain events in buffer when no event loop instead of
dropping them; worker's finally block flushes via async flush().
- on_llm_end: add tool_calls filter and caller=="lead_agent" guard for
ai_message events; mark message IDs for dedup with record_llm_usage.
- worker.py: persist completion data (tokens, message count) to RunStore
in finally block.
Model factory:
- Auto-inject stream_usage=True for BaseChatOpenAI subclasses with
custom api_base, so usage_metadata is populated in streaming responses.
Test consolidation:
- Delete test_phase2b_integration.py (redundant with existing tests).
- Move DB-backed lifecycle test into test_run_journal.py.
- Add tests for stream_usage injection in test_model_factory.py.
- Clean up executor/task_tool dead journal references.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(events): widen content type to str|dict in all store backends
Allow event content to be a dict (for structured OpenAI-format messages)
in addition to plain strings. Dict values are JSON-serialized for the DB
backend and deserialized on read; memory and JSONL backends handle dicts
natively. Trace truncation now serializes dicts to JSON before measuring.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(events): use metadata flag instead of heuristic for dict content detection
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(converters): add LangChain-to-OpenAI message format converters
Pure functions langchain_to_openai_message, langchain_to_openai_completion,
langchain_messages_to_openai, and _infer_finish_reason for converting
LangChain BaseMessage objects to OpenAI Chat Completions format, used by
RunJournal for event storage. 15 unit tests added.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(converters): handle empty list content as null, clean up test
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(events): human_message content uses OpenAI user message format
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(events): ai_message uses OpenAI format, add ai_tool_call message event
- ai_message content now uses {"role": "assistant", "content": "..."} format
- New ai_tool_call message event emitted when lead_agent LLM responds with tool_calls
- ai_tool_call uses langchain_to_openai_message converter for consistent format
- Both events include finish_reason in metadata ("stop" or "tool_calls")
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(events): add tool_result message event with OpenAI tool message format
Cache tool_call_id from on_tool_start keyed by run_id as fallback for on_tool_end,
then emit a tool_result message event (role=tool, tool_call_id, content) after each
successful tool completion.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(events): summary content uses OpenAI system message format
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(events): replace llm_start/llm_end with llm_request/llm_response in OpenAI format
Add on_chat_model_start to capture structured prompt messages as llm_request events.
Replace llm_end trace events with llm_response using OpenAI Chat Completions format.
Track llm_call_index to pair request/response events.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(events): add record_middleware method for middleware trace events
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test(events): add full run sequence integration test for OpenAI content format
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(events): align message events with checkpoint format and add middleware tag injection
- Message events (ai_message, ai_tool_call, tool_result, human_message) now use
BaseMessage.model_dump() format, matching LangGraph checkpoint values.messages
- on_tool_end extracts tool_call_id/name/status from ToolMessage objects
- on_tool_error now emits tool_result message events with error status
- record_middleware uses middleware:{tag} event_type and middleware category
- Summarization custom events use middleware:summarize category
- TitleMiddleware injects middleware:title tag via get_config() inheritance
- SummarizationMiddleware model bound with middleware:summarize tag
- Worker writes human_message using HumanMessage.model_dump()
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(threads): switch search endpoint to threads_meta table and sync title
- POST /api/threads/search now queries threads_meta table directly,
removing the two-phase Store + Checkpointer scan approach
- Add ThreadMetaRepository.search() with metadata/status filters
- Add ThreadMetaRepository.update_display_name() for title sync
- Worker syncs checkpoint title to threads_meta.display_name on run completion
- Map display_name to values.title in search response for API compatibility
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(threads): history endpoint reads messages from event store
- POST /api/threads/{thread_id}/history now combines two data sources:
checkpointer for checkpoint_id, metadata, title, thread_data;
event store for messages (complete history, not truncated by summarization)
- Strip internal LangGraph metadata keys from response
- Remove full channel_values serialization in favor of selective fields
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: remove duplicate optional-dependencies header in pyproject.toml
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(middleware): pass tagged config to TitleMiddleware ainvoke call
Without the config, the middleware:title tag was not injected,
causing the LLM response to be recorded as a lead_agent ai_message
in run_events.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: resolve merge conflict in .env.example
Keep both DATABASE_URL (from persistence-scaffold) and WECOM
credentials (from main) after the merge.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(persistence): address review feedback on PR #1851
- Fix naive datetime.now() → datetime.now(UTC) in all ORM models
- Fix seq race condition in DbRunEventStore.put() with FOR UPDATE
and UNIQUE(thread_id, seq) constraint
- Encapsulate _store access in RunManager.update_run_completion()
- Deduplicate _store.put() logic in RunManager via _persist_to_store()
- Add update_run_completion to RunStore ABC + MemoryRunStore
- Wire follow_up_to_run_id through the full create path
- Add error recovery to RunJournal._flush_sync() lost-event scenario
- Add migration note for search_threads breaking change
- Fix test_checkpointer_none_fix mock to set database=None
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: update uv.lock
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(persistence): address 22 review comments from CodeQL, Copilot, and Code Quality
Bug fixes:
- Sanitize log params to prevent log injection (CodeQL)
- Reset threads_meta.status to idle/error when run completes
- Attach messages only to latest checkpoint in /history response
- Write threads_meta on POST /threads so new threads appear in search
Lint fixes:
- Remove unused imports (journal.py, migrations/env.py, test_converters.py)
- Convert lambda to named function (engine.py, Ruff E731)
- Remove unused logger definitions in repos (Ruff F841)
- Add logging to JSONL decode errors and empty except blocks
- Separate assert side-effects in tests (CodeQL)
- Remove unused local variables in tests (Ruff F841)
- Fix max_trace_content truncation to use byte length, not char length
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* style: apply ruff format to persistence and runtime files
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Potential fix for pull request finding 'Statement has no effect'
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
* refactor(runtime): introduce RunContext to reduce run_agent parameter bloat
Extract checkpointer, store, event_store, run_events_config, thread_meta_repo,
and follow_up_to_run_id into a frozen RunContext dataclass. Add get_run_context()
in deps.py to build the base context from app.state singletons. start_run() uses
dataclasses.replace() to enrich per-run fields before passing ctx to run_agent.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(gateway): move sanitize_log_param to app/gateway/utils.py
Extract the log-injection sanitizer from routers/threads.py into a shared
utils module and rename to sanitize_log_param (public API). Eliminates the
reverse service → router import in services.py.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* perf: use SQL aggregation for feedback stats and thread token usage
Replace Python-side counting in FeedbackRepository.aggregate_by_run with
a single SELECT COUNT/SUM query. Add RunStore.aggregate_tokens_by_thread
abstract method with SQL GROUP BY implementation in RunRepository and
Python fallback in MemoryRunStore. Simplify the thread_token_usage
endpoint to delegate to the new method, eliminating the limit=10000
truncation risk.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: annotate DbRunEventStore.put() as low-frequency path
Add docstring clarifying that put() opens a per-call transaction with
FOR UPDATE and should only be used for infrequent writes (currently
just the initial human_message event). High-throughput callers should
use put_batch() instead.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(threads): fall back to Store search when ThreadMetaRepository is unavailable
When database.backend=memory (default) or no SQL session factory is
configured, search_threads now queries the LangGraph Store instead of
returning 503. Returns empty list if neither Store nor repo is available.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(persistence): introduce ThreadMetaStore ABC for backend-agnostic thread metadata
Add ThreadMetaStore abstract base class with create/get/search/update/delete
interface. ThreadMetaRepository (SQL) now inherits from it. New
MemoryThreadMetaStore wraps LangGraph BaseStore for memory-mode deployments.
deps.py now always provides a non-None thread_meta_repo, eliminating all
`if thread_meta_repo is not None` guards in services.py, worker.py, and
routers/threads.py. search_threads no longer needs a Store fallback branch.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(history): read messages from checkpointer instead of RunEventStore
The /history endpoint now reads messages directly from the
checkpointer's channel_values (the authoritative source) instead of
querying RunEventStore.list_messages(). The RunEventStore API is
preserved for other consumers.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(persistence): address new Copilot review comments
- feedback.py: validate thread_id/run_id before deleting feedback
- jsonl.py: add path traversal protection with ID validation
- run_repo.py: parse `before` to datetime for PostgreSQL compat
- thread_meta_repo.py: fix pagination when metadata filter is active
- database_config.py: use resolve_path for sqlite_dir consistency
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Implement skill self-evolution and skill_manage flow (#1874)
* chore: ignore .worktrees directory
* Add skill_manage self-evolution flow
* Fix CI regressions for skill_manage
* Address PR review feedback for skill evolution
* fix(skill-evolution): preserve history on delete
* fix(skill-evolution): tighten scanner fallbacks
* docs: add skill_manage e2e evidence screenshot
* fix(skill-manage): avoid blocking fs ops in session runtime
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix(config): resolve sqlite_dir relative to CWD, not Paths.base_dir
resolve_path() resolves relative to Paths.base_dir (.deer-flow),
which double-nested the path to .deer-flow/.deer-flow/data/app.db.
Use Path.resolve() (CWD-relative) instead.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Feature/feishu receive file (#1608)
* feat(feishu): add channel file materialization hook for inbound messages
- Introduce Channel.receive_file(msg, thread_id) as a base method for file materialization; default is no-op.
- Implement FeishuChannel.receive_file to download files/images from Feishu messages, save to sandbox, and inject virtual paths into msg.text.
- Update ChannelManager to call receive_file for any channel if msg.files is present, enabling downstream model access to user-uploaded files.
- No impact on Slack/Telegram or other channels (they inherit the default no-op).
* style(backend): format code with ruff for lint compliance
- Auto-formatted packages/harness/deerflow/agents/factory.py and tests/test_create_deerflow_agent.py using `ruff format`
- Ensured both files conform to project linting standards
- Fixes CI lint check failures caused by code style issues
* fix(feishu): handle file write operation asynchronously to prevent blocking
* fix(feishu): rename GetMessageResourceRequest to _GetMessageResourceRequest and remove redundant code
* test(feishu): add tests for receive_file method and placeholder replacement
* fix(manager): remove unnecessary type casting for channel retrieval
* fix(feishu): update logging messages to reflect resource handling instead of image
* fix(feishu): sanitize filename by replacing invalid characters in file uploads
* fix(feishu): improve filename sanitization and reorder image key handling in message processing
* fix(feishu): add thread lock to prevent filename conflicts during file downloads
* fix(test): correct bad merge in test_feishu_parser.py
* chore: run ruff and apply formatting cleanup
fix(feishu): preserve rich-text attachment order and improve fallback filename handling
* fix(docker): restore gateway env vars and fix langgraph empty arg issue (#1915)
Two production docker-compose.yaml bugs prevent `make up` from working:
1. Gateway missing DEER_FLOW_CONFIG_PATH and DEER_FLOW_EXTENSIONS_CONFIG_PATH
environment overrides. Added in fb2d99f (#1836) but accidentally reverted
by ca2fb95 (#1847). Without them, gateway reads host paths from .env via
env_file, causing FileNotFoundError inside the container.
2. Langgraph command fails when LANGGRAPH_ALLOW_BLOCKING is unset (default).
Empty $${allow_blocking} inserts a bare space between flags, causing
' --no-reload' to be parsed as unexpected extra argument. Fix by building
args string first and conditionally appending --allow-blocking.
Co-authored-by: cooper <cooperfu@tencent.com>
* fix(frontend): resolve invalid HTML nesting and tabnabbing vulnerabilities (#1904)
* fix(frontend): resolve invalid HTML nesting and tabnabbing vulnerabilities
Fix `<button>` inside `<a>` invalid HTML in artifact components and add
missing `noopener,noreferrer` to `window.open` calls to prevent reverse
tabnabbing.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(frontend): address Copilot review on tabnabbing and double-tab-open
Remove redundant parent onClick on web_fetch ChainOfThoughtStep to
prevent opening two tabs on link click, and explicitly null out
window.opener after window.open() for defensive tabnabbing hardening.
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* refactor(persistence): organize entities into per-entity directories
Restructure the persistence layer from horizontal "models/ + repositories/"
split into vertical entity-aligned directories. Each entity (thread_meta,
run, feedback) now owns its ORM model, abstract interface (where applicable),
and concrete implementations under a single directory with an aggregating
__init__.py for one-line imports.
Layout:
persistence/thread_meta/{base,model,sql,memory}.py
persistence/run/{model,sql}.py
persistence/feedback/{model,sql}.py
models/__init__.py is kept as a facade so Alembic autogenerate continues to
discover all ORM tables via Base.metadata. RunEventRow remains under
models/run_event.py because its storage implementation lives in
runtime/events/store/db.py and has no matching repository directory.
The repositories/ directory is removed entirely. All call sites in
gateway/deps.py and tests are updated to import from the new entity
packages, e.g.:
from deerflow.persistence.thread_meta import ThreadMetaRepository
from deerflow.persistence.run import RunRepository
from deerflow.persistence.feedback import FeedbackRepository
Full test suite passes (1690 passed, 14 skipped).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(gateway): sync thread rename and delete through ThreadMetaStore
The POST /threads/{id}/state endpoint previously synced title changes
only to the LangGraph Store via _store_upsert. In sqlite mode the search
endpoint reads from the ThreadMetaRepository SQL table, so renames never
appeared in /threads/search until the next agent run completed (worker.py
syncs title from checkpoint to thread_meta in its finally block).
Likewise the DELETE /threads/{id} endpoint cleaned up the filesystem,
Store, and checkpointer but left the threads_meta row orphaned in sqlite,
so deleted threads kept appearing in /threads/search.
Fix both endpoints by routing through the ThreadMetaStore abstraction
which already has the correct sqlite/memory implementations wired up by
deps.py. The rename path now calls update_display_name() and the delete
path calls delete() — both work uniformly across backends.
Verified end-to-end with curl in gateway mode against sqlite backend.
Existing test suite (1690 passed) and focused router/repo tests pass.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(gateway): route all thread metadata access through ThreadMetaStore
Following the rename/delete bug fix in PR1, migrate the remaining direct
LangGraph Store reads/writes in the threads router and services to the
ThreadMetaStore abstraction so that the sqlite and memory backends behave
identically and the legacy dual-write paths can be removed.
Migrated endpoints (threads.py):
- create_thread: idempotency check + write now use thread_meta_repo.get/create
instead of dual-writing the LangGraph Store and the SQL row.
- get_thread: reads from thread_meta_repo.get; the checkpoint-only fallback
for legacy threads is preserved.
- patch_thread: replaced _store_get/_store_put with thread_meta_repo.update_metadata.
- delete_thread_data: dropped the legacy store.adelete; thread_meta_repo.delete
already covers it.
Removed dead code (services.py):
- _upsert_thread_in_store — redundant with the immediately following
thread_meta_repo.create() call.
- _sync_thread_title_after_run — worker.py's finally block already syncs
the title via thread_meta_repo.update_display_name() after each run.
Removed dead code (threads.py):
- _store_get / _store_put / _store_upsert helpers (no remaining callers).
- THREADS_NS constant.
- get_store import (router no longer touches the LangGraph Store directly).
New abstract method:
- ThreadMetaStore.update_metadata(thread_id, metadata) merges metadata into
the thread's metadata field. Implemented in both ThreadMetaRepository (SQL,
read-modify-write inside one session) and MemoryThreadMetaStore. Three new
unit tests cover merge / empty / nonexistent behaviour.
Net change: -134 lines. Full test suite: 1693 passed, 14 skipped.
Verified end-to-end with curl in gateway mode against sqlite backend
(create / patch / get / rename / search / delete).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
Co-authored-by: DanielWalnut <45447813+hetaoBackend@users.noreply.github.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: JilongSun <965640067@qq.com>
Co-authored-by: jie <49781832+stan-fu@users.noreply.github.com>
Co-authored-by: cooper <cooperfu@tencent.com>
Co-authored-by: yangzheli <43645580+yangzheli@users.noreply.github.com>
* fix: cap prompt caching breakpoints at 4 to prevent API 400 errors (fixes#2448)
The previous _apply_prompt_caching() attached cache_control to every text
block in the system prompt, every content block in the last N messages, and
the last tool definition. In multi-turn conversations with structured content
blocks this easily exceeded the 4-breakpoint hard limit enforced by both the
Anthropic API and AWS Bedrock, producing a 400 Bad Request (or a silent
"No generations found in stream" when streaming).
Fix: collect all candidate blocks in document order, then apply cache_control
only to the last MAX_CACHE_BREAKPOINTS (4) of them. Later breakpoints cover a
larger prefix and therefore yield better cache hit rates, making this the
optimal placement strategy as well as the safe one.
Adds 13 unit tests covering the budget cap, edge cases, and correct
last-candidate placement.
* docs: clarify _apply_prompt_caching docstring includes tool definitions
Per Copilot review: the implementation also caches the last tool definition
(see the candidates list at lines 202-205), so the docstring summary should
explicitly mention tools alongside system and recent messages.
* Fix the lint error
* style: fix ruff format check for test_claude_provider_prompt_caching.py
Add the missing blank line before the 'Edge cases' section comment so
that ruff format --check passes in CI.
---------
Co-authored-by: octo-patch <octo-patch@github.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* feat(mcp): support custom tool interceptors via extensions_config.json
Add a generic extension point for registering custom MCP tool
interceptors through `extensions_config.json`. This allows downstream
projects to inject per-request header manipulation, auth context
propagation, or other cross-cutting concerns without modifying
DeerFlow source code.
Interceptors are declared as Python callable paths in a new
`mcpInterceptors` array field and loaded via the existing
`resolve_variable` reflection mechanism:
```json
{
"mcpInterceptors": [
"my_package.mcp.auth:build_auth_interceptor"
]
}
```
Each entry must resolve to a no-arg builder function that returns an
async interceptor compatible with `MultiServerMCPClient`'s
`tool_interceptors` interface.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test(mcp): add unit tests for custom tool interceptors
Cover all branches of the mcpInterceptors loading logic:
- valid interceptor loaded and appended to tool_interceptors
- multiple interceptors loaded in declaration order
- builder returning None is skipped
- resolve_variable ImportError logged and skipped
- builder raising exception logged and skipped
- absent mcpInterceptors field is safe (no-op)
- custom interceptors coexist with OAuth interceptor
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* fix(mcp): validate mcpInterceptors type and fix lint warnings
Address review feedback:
1. Validate mcpInterceptors config value before iterating:
- Accept a single string and normalize to [string]
- Ignore None silently
- Log warning and skip for non-list/non-string types
2. Fix ruff F841 lint errors in tests:
- Rename _make_mock_env to _make_patches, embed mock_client
- Remove unused `as mock_cls` bindings where not needed
- Extract _get_interceptors() helper to reduce repetition
3. Add two new test cases for type validation:
- test_mcp_interceptors_single_string_is_normalized
- test_mcp_interceptors_invalid_type_logs_warning
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(mcp): validate interceptor return type and fix import mock path
Address review feedback:
1. Validate builder return type with callable() check:
- callable interceptor → append to tool_interceptors
- None → silently skip (builder opted out)
- non-callable → log warning with type name and skip
2. Fix test mock path: resolve_variable is a top-level import in
tools.py, so mock deerflow.mcp.tools.resolve_variable instead of
deerflow.reflection.resolve_variable to correctly intercept calls.
3. Add test_custom_interceptor_non_callable_return_logs_warning to
cover the new non-callable validation branch.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs(mcp): add mcpInterceptors example and documentation
- Add mcpInterceptors field to extensions_config.example.json
- Add "Custom Tool Interceptors" section to MCP_SERVER.md with
configuration format, example interceptor code, and edge case
behavior notes
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: IECspace <IECspace@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* fix: use subprocess instead of os.system in local_backend.py
The sandbox backend and skill evaluation scripts use subprocess
* fixing the failing test
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix(debug): keep terminal clean by redirecting all logs to file
- Redirect all logs to debug.log file to prevent background task logs
from interfering with interactive terminal prompts
- Honor AppConfig.log_level setting instead of hard-coding to INFO
- Make logging setup idempotent by clearing pre-existing handlers
- Defer deerflow imports until after logging is configured to ensure
import-time side effects are captured in debug.log
- Display active log level in startup banner
- Add prompt_toolkit installation tip for enhanced readline support
Made-with: Cursor
* attaching the file handler before importing/calling get_app_config()
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* feat(trace): Add `run_name` to the trace info for suggestions and memory.
before(in langsmith):
CodexChatModel
CodexChatModel
lead_agent
after:
suggest_agent
memory_agent
lead_agent
feat(trace): Add `run_name` to the trace info for suggestions and memory.
before(in langsmith):
CodexChatModel
CodexChatModel
lead_agent
after:
suggest_agent
memory_agent
lead_agent
* feat(trace): Add `run_name` to the trace info for system agents.
before(in langsmith):
CodexChatModel
CodexChatModel
CodexChatModel
CodexChatModel
lead_agent
after:
suggest_agent
title_agent
security_agent
memory_agent
lead_agent
* chore(code format):code format
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
The exception handler in JinaClient.crawl used logger.exception, which
emits an ERROR-level record with the full httpx/httpcore/anyio traceback
for every transient network failure (timeout, connection refused). Other
search/crawl providers in the project log the same class of recoverable
failures as a single line. One offline/slow-network session could produce
dozens of multi-frame ERROR stack traces, drowning out real problems.
Switch to logger.warning with a concise message that includes the
exception type and its str, matching the style used elsewhere for
recoverable transient failures (aio_sandbox, ddg, etc.). The exception
type now also surfaces into the returned "Error: ..." string so callers
retain diagnostic signal.
Adds a regression test that asserts the log record is WARNING, carries
no exc_info, and includes the exception class name.
Co-authored-by: voidborne-d <voidborne-d@users.noreply.github.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* feat(subagents): support per-subagent skill loading and custom subagent types (#2230)
Add per-subagent skill configuration and custom subagent type registration,
aligned with Codex's role-based config layering and per-session skill injection.
Backend:
- SubagentConfig gains `skills` field (None=all, []=none, list=whitelist)
- New CustomSubagentConfig for user-defined subagent types in config.yaml
- SubagentsAppConfig gains `custom_agents` section and `get_skills_for()`
- Registry resolves custom agents with three-layer config precedence
- SubagentExecutor loads skills per-session as conversation items (Codex pattern)
- task_tool no longer appends skills to system_prompt
- Lead agent system prompt dynamically lists all registered subagent types
- setup_agent tool accepts optional skills parameter
- Gateway agents API transparently passes skills in CRUD operations
Frontend:
- Agent/CreateAgentRequest/UpdateAgentRequest types include skills field
- Agent card displays skills as badges alongside tool_groups
Config:
- config.example.yaml documents custom_agents and per-agent skills override
Tests:
- 40 new tests covering all skill config, custom agents, and registry logic
- Existing tests updated for new get_skills_prompt_section signature
Closes#2230
* fix: address review feedback on skills PR
- Remove stale get_skills_prompt_section monkeypatches from test_task_tool_core_logic.py
(task_tool no longer imports this function after skill injection moved to executor)
- Add key prefixes (tg:/sk:) to agent-card badges to prevent React key collisions
between tool_groups and skills
* fix(ci): resolve lint and test failures
- Format agent-card.tsx with prettier (lint-frontend)
- Remove stale "Skills Appendix" system_prompt assertion — skills are now
loaded per-session by SubagentExecutor, not appended to system_prompt
* fix(ci): sort imports in test_subagent_skills_config.py (ruff I001)
* fix(ci): use nullish coalescing in agent-card badge condition (eslint)
* fix: address review feedback on skills PR
- Use model_fields_set in AgentUpdateRequest to distinguish "field omitted"
from "explicitly set to null" — fixes skills=None ambiguity where None
means "inherit all" but was treated as "don't change"
- Move lazy import of get_subagent_config outside loop in
_build_available_subagents_description to avoid repeated import overhead
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix(gateway): bound lifespan shutdown hooks to prevent worker hang
Gateway worker can hang indefinitely in `uvicorn --reload` mode with
the listening socket still bound — all /api/* requests return 504,
and SIGKILL is the only recovery.
Root cause (py-spy dump from a reproduction showed 16+ stacked frames
of signal_handler -> Event.set -> threading.Lock.__enter__ on the
main thread): CPython's `threading.Event` uses `Condition(Lock())`
where the inner Lock is non-reentrant. uvicorn's BaseReload signal
handler calls `should_exit.set()` directly from signal context; if a
second signal (SIGTERM/SIGHUP from the reload supervisor, or
watchfiles-triggered reload) arrives while the first handler holds
the Lock, the reentrant call deadlocks on itself.
The reload supervisor keeps sending those signals only when the
worker fails to exit promptly. DeerFlow's lifespan currently awaits
`stop_channel_service()` with no timeout; if a channel's `stop()`
stalls (e.g. Feishu/Slack WebSocket waiting for an ack), the worker
can't exit, the supervisor keeps signaling, and the deadlock becomes
reachable.
This is a defense-in-depth fix — it does not repair the upstream
uvicorn/CPython issue, but it ensures DeerFlow's lifespan exits
within a bounded window so the supervisor has no reason to keep
firing signals. No behavior change on the happy path.
Wraps the shutdown hook in `asyncio.wait_for(timeout=5.0)` and logs
a warning on timeout before proceeding to worker exit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Update backend/app/gateway/app.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* style: apply make format (ruff) to test assertions
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* feat: add optional prompt-toolkit support to debug.py
Use PromptSession.prompt_async() for arrow-key navigation and input
history when prompt-toolkit is available, falling back to plain input()
with a helpful install tip otherwise.
Made-with: Cursor
* fix: handle EOFError gracefully in debug.py
Catch EOFError alongside KeyboardInterrupt so that Ctrl-D exits
cleanly instead of printing a traceback.
Made-with: Cursor
* fix(skills): validate bundled SKILL.md front-matter in CI (fixes#2443)
Adds a parametrized backend test that runs `_validate_skill_frontmatter`
against every bundled SKILL.md under `skills/public/`, so a broken
front-matter fails CI with a per-skill error message instead of
surfacing as a runtime gateway-load warning.
The new test caught two pre-existing breakages on `main` and fixes them:
* `bootstrap/SKILL.md`: the unquoted description had a second `:` mid-line
("Also trigger for updates: ..."), which YAML parses as a nested mapping
("mapping values are not allowed here"). Rewrites the description as a
folded scalar (`>-`), which preserves the original wording (including the
embedded colon, double quotes, and apostrophes) without further escaping.
This complements PR #2436 (single-file colon→hyphen patch) with a more
general convention that survives future edits.
* `chart-visualization/SKILL.md`: used `dependency:` which is not in
`ALLOWED_FRONTMATTER_PROPERTIES`. Renamed to `compatibility:`, the
documented field for "Required tools, dependencies" per skill-creator.
No code reads `dependency` (verified by grep across backend/).
* Apply suggestions from code review
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* Fix the lint error
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* fix: remove mismatched context param in debug.py to suppress Pydantic warning
The ainvoke call passed context={"thread_id": ...} but the agent graph
has no context_schema (ContextT defaults to None), causing a
PydanticSerializationUnexpectedValue warning on every invocation.
Align with the production run_agent path by injecting context via
Runtime into configurable["__pregel_runtime"] instead.
Closes#2445
Made-with: Cursor
* refactor: derive runtime thread_id from config to avoid duplication
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Made-with: Cursor
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
The tool is registered as `present_files` (plural) in present_file_tool.py,
but four references in documentation and prompt strings incorrectly used the
singular form `present_file`. This could cause confusion and potentially
lead to incorrect tool invocations.
Changed files:
- backend/docs/GUARDRAILS.md
- backend/docs/ARCHITECTURE.md
- backend/packages/harness/deerflow/agents/lead_agent/prompt.py (2 occurrences)
- Remove f-string prefix on 7 strings with no placeholders (F541)
in analyze.py, aggregate_benchmark.py, run_loop.py, generate_review.py
- Remove unused `os` import in quick_validate.py (F401)
Found by ruff via HUMMBL Arbiter (https://hummbl.io/audit).
* Refactor tests for SKILL.md parser
Updated tests for SKILL.md parser to handle quoted names and descriptions correctly. Added new tests for parsing plain and single-quoted names, and ensured multi-line descriptions are processed properly.
* Implement tool name validation and deduplication
Add tool name mismatch warning and deduplication logic
* Refactor skill file parsing and error handling
* Add tests for tool name deduplication
Added tests for tool name deduplication in get_available_tools(). Ensured that duplicates are not returned, the first occurrence is kept, and warnings are logged for skipped duplicates.
* Apply suggestions from code review
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* Update minimal config to include tools list
* Update test for nonexistent skill file
Ensure the test for nonexistent files checks for None.
* Refactor tool loading and add skill management support
Refactor tool loading logic to include skill management tools based on configuration and clean up comments.
* Enhance code comments for tool loading logic
Added comments to clarify the purpose of various code sections related to tool loading and configuration.
* Fix assertion for duplicate tool name warning
* Fix indentation issues in tools.py
* Fix the lint error of test_tool_deduplication
* Fix the lint error of tools.py
* Fix the lint error
* Fix the lint error
* make format
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* fix(setup-agent): prevent data loss when setup fails on existing agent directory
Record whether the agent directory pre-existed before mkdir, and only
run shutil.rmtree cleanup when the directory was newly created during
this call. Previously, any failure would delete the entire directory
including pre-existing SOUL.md and config.yaml.
* fix: address PR review — init variables before try, remove unused result
* style: fix ruff I001 import block formatting in test file
* style: add missing blank lines between top-level definitions in test file
* fix(subagent): inherit parent agent's tool_groups in task_tool
When a custom agent defines tool_groups (e.g. [file:read, file:write, bash]),
the restriction is correctly applied to the lead agent. However, when the lead
agent delegates work to a subagent via the task tool, get_available_tools() is
called without the groups parameter, causing the subagent to receive ALL tools
(including web_search, web_fetch, image_search, etc.) regardless of the parent
agent's configuration.
This fix propagates tool_groups through run metadata so that task_tool passes
the same group filter when building the subagent's tool set.
Changes:
- agent.py: include tool_groups in run metadata
- task_tool.py: read tool_groups from metadata and pass to get_available_tools()
* fix: initialize metadata before conditional block and update tests for tool_groups propagation
- Initialize metadata = {} before the 'if runtime is not None' block to
avoid Ruff F821 (possibly-undefined variable) and simplify the
parent_tool_groups expression.
- Update existing test assertion to expect groups=None in
get_available_tools call signature.
- Add 3 new test cases:
- test_task_tool_propagates_tool_groups_to_subagent
- test_task_tool_no_tool_groups_passes_none
- test_task_tool_runtime_none_passes_groups_none
* fix(mcp): prevent RuntimeError from escaping except block in get_cached_mcp_tools
When `asyncio.get_event_loop()` raises RuntimeError and the fallback
`asyncio.run()` also fails, the exception escapes unhandled because
Python does not route exceptions raised inside an `except` block to
sibling `except` clauses. Wrap the fallback call in its own try/except
so failures are logged and the function returns [] as intended.
* fix: use logger.exception to preserve stack traces on MCP init failure
When NEXT_PUBLIC_BACKEND_BASE_URL is unset, the frontend proxies API
requests to the gateway. Only /api/agents and /api/skills had rewrite
rules, causing 404s for /api/models, /api/threads, /api/memory,
/api/mcp, /api/suggestions, /api/runs, etc.
Add a catch-all /api/:path* rewrite that proxies all remaining gateway
API routes. The existing /api/langgraph rewrite takes priority because
it is pushed to the array first (Next.js checks rewrites in order).
Fixes#2327
Co-authored-by: JasonOA888 <JasonOA888@users.noreply.github.com>
ls_tool was the only file-system tool that did not call
mask_local_paths_in_output() before returning its result, causing host
absolute paths (e.g. /Users/.../backend/.deer-flow/knowledge-base/...)
to leak to the LLM instead of the expected virtual paths
(/mnt/knowledge-base/...).
This patch:
- Adds the mask_local_paths_in_output() call to ls_tool, consistent
with bash_tool, glob_tool and grep_tool.
- Initialises thread_data = None before the is_local_sandbox branch
(same pattern as glob_tool) so the variable is always in scope.
- Adds three new tests covering user-data path masking, skills path
masking and the empty-directory edge case.
* fix(memory): cache corruption, thread-safety, and caller mutation bugs
Bug 1 (updater.py): deep-copy current_memory before passing to
_apply_updates() so a subsequent save() failure cannot leave a
partially-mutated object in the storage cache.
Bug 3 (storage.py): add _cache_lock (threading.Lock) to
FileMemoryStorage and acquire it around every read/write of
_memory_cache, fixing concurrent-access races between the background
timer thread and HTTP reload calls.
Bug 4 (storage.py): replace in-place mutation
memory_data["lastUpdated"] = ...
with a shallow copy
memory_data = {**memory_data, "lastUpdated": ...}
so save() no longer silently modifies the caller's dict.
Regression tests added for all three bugs in test_memory_storage.py
and test_memory_updater.py.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* style: format test_memory_updater.py with ruff
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* style: remove stale bug-number labels from code comments and docstrings
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(checkpointer): create parent directory before opening SQLite in sync provider
The sync checkpointer factory (_sync_checkpointer_cm) opens a SQLite
connection without first ensuring the parent directory exists. The async
provider and both store providers already call ensure_sqlite_parent_dir(),
but this call was missing from the sync path.
When the deer-flow harness package is used from an external virtualenv
(where the .deer-flow directory is not pre-created), the missing parent
directory causes:
sqlite3.OperationalError: unable to open database file
Add the missing ensure_sqlite_parent_dir() call in the sync SQLite
branch, consistent with the async provider, and add a regression test.
Closes#2259
* style: fix ruff format + add call-order assertion for ensure_parent_dir
- Fix formatting in test_checkpointer.py (ruff format)
- Add test_sqlite_ensure_parent_dir_before_connect to verify
ensure_sqlite_parent_dir is called before from_conn_string
(addresses Copilot review suggestion)
---------
Co-authored-by: voidborne-d <voidborne-d@users.noreply.github.com>
* fix(frontend): make Suggestion button opaque in dark mode
The outline Button variant applies dark:bg-input/30, leaving Suggestion
pills ~70% transparent in dark mode. Scrolled chat content bled through
the buttons, making suggestion text unreadable. Override with
dark:bg-background so it matches the opaque light-mode appearance.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix the lint error of commit
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
After a page refresh, the artifact panel's autoOpen/autoSelect state is
reset to true. Submitting a new question flips thread.isLoading to true,
which message-list passes to every MessageGroup — including historical
ones. The previous response's last write_file step then satisfies the
auto-open condition and re-pops the stale artifact.
Gate the auto-open on the tool call having no result yet, so only a
write_file that is still streaming in the current response can trigger
it; rehydrated tool calls always carry a result and are now skipped.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* test: add unit tests for ViewImageMiddleware
- Add 33 test cases covering all 7 internal methods plus sync/async
before_model hooks
- Cover normal path, edge cases (missing keys, empty base64, stale
ToolMessages before assistant turn), and deduplication logic
- Related to Q2 Roadmap #1669
* test: add unit tests for ViewImageMiddleware
Add 35 test cases covering all internal methods, before_model hooks,
and edge cases (missing attrs, list-content dedup, stale ToolMessages).
Related to #1669
Fixes#2203
When NEXT_PUBLIC_BACKEND_BASE_URL is not set, the frontend uses Next.js
rewrites to proxy API calls to the gateway. Skills API routes were missing
from the rewrite config, causing /api/skills to return the SPA HTML instead
of JSON, which produced 'Unexpected token <' errors in the skill settings page.
Co-authored-by: JasonOA888 <JasonOA888@users.noreply.github.com>
* fix(gateway): forward agent_name and is_bootstrap from context to configurable
The frontend sends agent_name and is_bootstrap via the context field
in run requests, but services.py only forwards a hardcoded whitelist
of keys (_CONTEXT_CONFIGURABLE_KEYS) into the agent's configurable
dict. Since agent_name was missing, custom agents never received
their name — make_lead_agent always fell back to the default lead
agent, skipping SOUL.md, per-agent config and skill filtering.
Similarly, is_bootstrap was dropped, so the bootstrap creation flow
could never activate the setup_agent tool path.
Add both keys to the whitelist so they reach make_lead_agent.
Fixes#2222
* fix(frontend): resolve /mnt/ links in markdown to artifact API URLs
AI agent messages contain links like /mnt/user-data/outputs/file.pdf
which were rendered as-is in the browser, resulting in 404 errors.
Images already got the correct treatment via MessageImage and
resolveArtifactURL, but anchor tags (<a>) were passed through
unchanged.
Add an 'a' component override in MessageContent_ that rewrites
/mnt/-prefixed hrefs to the artifact API endpoint, matching the
existing image handling pattern.
Fixes#2232
---------
Co-authored-by: JasonOA888 <JasonOA888@users.noreply.github.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
The frontend sends agent_name and is_bootstrap via the context field
in run requests, but services.py only forwards a hardcoded whitelist
of keys (_CONTEXT_CONFIGURABLE_KEYS) into the agent's configurable
dict. Since agent_name was missing, custom agents never received
their name — make_lead_agent always fell back to the default lead
agent, skipping SOUL.md, per-agent config and skill filtering.
Similarly, is_bootstrap was dropped, so the bootstrap creation flow
could never activate the setup_agent tool path.
Add both keys to the whitelist so they reach make_lead_agent.
Fixes#2222
Co-authored-by: JasonOA888 <JasonOA888@users.noreply.github.com>
* fix(memory): use asyncio.to_thread for blocking file I/O in aupdate_memory
`_finalize_update` performs synchronous blocking operations (os.mkdir,
file open/write/rename/stat) that were called directly from the async
`aupdate_memory` method, causing `BlockingError` from blockbuster when
running under an ASGI server. Wrap the call with `asyncio.to_thread` to
offload all blocking I/O to a thread pool.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(memory): use unique temp filename to prevent concurrent write collision
`file_path.with_suffix(".tmp")` produces a fixed path — concurrent saves
for the same agent (now possible after wrapping _finalize_update in
asyncio.to_thread) would clobber the same temp file. Use a UUID-suffixed
temp file so each write is isolated.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(memory): also offload _prepare_update_prompt to thread pool
FileMemoryStorage.load() inside _prepare_update_prompt performs
synchronous stat() and file read, blocking the event loop just like
_finalize_update did. Wrap _prepare_update_prompt in asyncio.to_thread
for the same reason.
The async path now has no blocking file I/O on the event loop:
to_thread(_prepare_update_prompt) → await model.ainvoke() → to_thread(_finalize_update)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(todo-middleware): prevent premature agent exit with incomplete todos
When plan mode is active (is_plan_mode=True), the agent occasionally
exits the loop and outputs a final response while todo items are still
incomplete. This happens because the routing edge only checks for
tool_calls, not todo completion state.
Fixes#2112
Add an after_model override to TodoMiddleware with
@hook_config(can_jump_to=["model"]). When the model produces a
response with no tool calls but there are still incomplete todos, the
middleware injects a todo_completion_reminder HumanMessage and returns
jump_to=model to force another model turn. A cap of 2 reminders
prevents infinite loops when the agent cannot make further progress.
Also adds _completion_reminder_count() helper and 14 new unit tests
covering all edge cases of the new after_model / aafter_model logic.
* Remove unnecessary blank line in test file
* Fix runtime argument annotation in before_model
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: octo-patch <octo-patch@github.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* docs: mark memory updater async migration as completed
- Update TODO.md to mark the replacement of sync model.invoke()
with async model.ainvoke() in title_middleware and memory updater
as completed using [x] format
Addresses #2131
* feat: switch memory updater to async LLM calls
- Add async aupdate_memory() method using await model.ainvoke()
- Convert sync update_memory() to use async wrapper
- Add _run_async_update_sync() for nested loop context handling
- Maintain backward compatibility with existing sync API
- Add ThreadPoolExecutor for async execution from sync contexts
Addresses #2131
* test: add tests for async memory updater
- Add test_async_update_memory_uses_ainvoke() to verify async path
- Convert existing tests to use AsyncMock and ainvoke assertions
- Add test_sync_update_memory_wrapper_works_in_running_loop()
- Update all model mocks to use async await patterns
Addresses #2131
* fix: apply ruff formatting to memory updater
- Format multi-line expressions to single line
- Ensure code style consistency with project standards
- Fix lint issues caught by GitHub Actions
* test: add comprehensive tests for async memory updater
- Add test_async_update_memory_uses_ainvoke() to verify async path
- Convert existing tests to use AsyncMock and ainvoke assertions
- Add test_sync_update_memory_wrapper_works_in_running_loop()
- Update all model mocks to use async await patterns
- Ensure backward compatibility with sync API
* fix: satisfy ruff formatting in memory updater test
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix: resolve Windows pnpm detection in check script
* style: format check script regression test
* Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* fix: resolve corepack fallback on windows
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* fix(title): strip <think> tags from title model responses and assistant context
Reasoning models (e.g. minimax M2.7, DeepSeek-R1) emit <think>...</think>
blocks before their actual output. When such a model is used as the title
model (or as the main agent), the raw thinking content leaked into the thread
title stored in state, so the chat list showed the internal monologue instead
of a meaningful title.
Fixes#1884
- Add `_strip_think_tags()` helper using a regex to remove all <think>...</think> blocks
- Apply it in `_parse_title()` so the title model response is always clean
- Apply it to the assistant message in `_build_title_prompt()` so thinking
content from the first AI turn is not fed back to the title model
- Add four new unit tests covering: stripping in parse, think-only response,
assistant prompt stripping, and end-to-end async flow with think tags
* Fix the lint error
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix: disable custom-agent management API by default
* style: format agents API hardening files
* fix: address review feedback for agents API hardening
* fix: add missing disabled API coverage
* fix: wrap blocking readability call with asyncio.to_thread in web_fetch
The readability extractor internally spawns a Node.js subprocess via
readabilipy, which blocks the async event loop and causes a
BlockingError when web_fetch is invoked inside LangGraph's async
runtime.
Wrap the synchronous extract_article call with asyncio.to_thread to
offload it to a thread pool, unblocking the event loop.
Note: community/infoquest/tools.py has the same latent issue and
should be addressed in a follow-up PR.
Closes#2152
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* test: verify web_fetch offloads extraction via asyncio.to_thread
Add a regression test that monkeypatches asyncio.to_thread to confirm
readability extraction is offloaded to a worker thread, preventing
future refactors from reintroducing the blocking call.
Addresses Copilot review feedback on #2157.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* feat: set up Vitest frontend testing infrastructure with CI workflow
Migrate existing 4 frontend test files from Node.js native test runner
(node:test + node:assert/strict) to Vitest, reorganize test directory
structure under tests/unit/ mirroring src/ layout, and add a dedicated
CI workflow for frontend unit tests.
- Add vitest as devDependency, remove tsx
- Create vitest.config.ts with @/ path alias
- Migrate tests to Vitest API (test/expect/vi)
- Rename .mjs test files to .ts
- Move tests from src/ to tests/unit/ (mirrors src/ layout)
- Add frontend/Makefile `test` target
- Add .github/workflows/frontend-unit-tests.yml (parallel to backend)
- Update CONTRIBUTING.md, README.md, AGENTS.md, CLAUDE.md
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* style: fix the lint error
* style: fix the lint error
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
- Move time.sleep() -> asyncio.sleep() from Planned to Completed Features
- Clean up duplicate entries in TODO.md
Ensures completed async optimizations are properly tracked.
* feat(subagents): allow model override per subagent in config.yaml
Wire the existing SubagentConfig.model field to config.yaml so users
can assign different models to different subagent types.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test(subagents): cover model override in SubagentsAppConfig + registry
Addresses review feedback on #2064:
- registry.py: update stale inline comment — the block now applies
timeout, max_turns AND model overrides, not just timeout.
- test_subagent_timeout_config.py: add coverage for model override
resolution across SubagentOverrideConfig, SubagentsAppConfig
(get_model_for + load), and registry.get_subagent_config:
- per-agent model override is applied to registry-returned config
- omitted `model` keeps the builtin value
- explicit `model: null` in config.yaml is equivalent to omission
- model override on one agent does not affect other agents
- model override preserves all other fields (name, description,
timeout_seconds, max_turns)
- model override does not mutate BUILTIN_SUBAGENTS
Copilot's suggestion (3) "setting model to 'inherit' forces inheritance"
is skipped intentionally: there is no 'inherit' sentinel in the current
implementation — model is `str | None`, and None already means
"inherit from parent". Adding a sentinel would be a new feature, not
test coverage for this PR.
Tests run locally: 51 passed (37 existing + 14 new / expanded).
* test(subagents): reject empty-string model at config load time
Addresses WillemJiang's review comment on #2064 (empty-string edge case):
- subagents_config.py: add `min_length=1` to the `model` field on
SubagentOverrideConfig. `model: ""` in config.yaml would otherwise
bypass the `is not None` check and reach create_chat_model(name="")
as a confusing runtime error. This is symmetric with the existing
`ge=1` guards on timeout_seconds / max_turns, so the validation style
stays consistent across all three override fields.
- test_subagent_timeout_config.py: add test_rejects_empty_model
mirroring the existing test_rejects_zero / test_rejects_negative
cases; update the docstring on test_model_accepts_any_string (now
test_model_accepts_any_non_empty_string) to reflect the new guard.
Not addressing the first comment (validating `model` against the
`models:` section at load time) in this PR. `SubagentsAppConfig` is
scoped to the `subagents:` block and cannot see the sibling `models:`
section, so proper cross-section validation needs a second pass or a
structural change that is out of scope here — and the current behavior
is consistent with how timeout_seconds / max_turns work today. Happy to
track this as a follow-up issue covering cross-section validation
uniformly for all three fields.
Tests run locally: 52 passed in this file; 1847 passed, 18 skipped
across the full backend suite. Ruff check + format clean.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(channels): add Discord channel integration
Add a Discord bot channel following the existing Telegram/Slack pattern.
The bot listens for messages, creates conversation threads, and relays
responses back to Discord with 2000-char message splitting.
- DiscordChannel extends Channel base class
- Lazy imports discord.py with install hint
- Thread-based conversations (each Discord thread maps to a DeerFlow thread)
- Allowed guilds filter for access control
- File attachment support via discord.File
- Registered in service.py and manager.py
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(channels): address Copilot review suggestions for Discord integration
- Disable @everyone/@here mentions via AllowedMentions.none()
- Add 10s timeout to client close to prevent shutdown hangs
- Log publish_inbound errors via future callback instead of silently dropping
- Open file handle on caller thread to avoid cross-thread ownership issues
- Notify user in channel when thread creation fails
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(discord): resolve lint errors in Discord channel
- Replace asyncio.TimeoutError with builtin TimeoutError (UP041)
- Remove extraneous f-string prefix (F541)
- Apply ruff format
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(tests): remove fake langgraph_sdk shim from test_discord_channel
The module-level sys.modules.setdefault shim installed a fake
langgraph_sdk.errors.ConflictError during pytest collection. Because
pytest imports all test modules before running them, test_channels.py
then imported the fake ConflictError instead of the real one.
In test_handle_feishu_stream_conflict_sends_busy_message, the test
constructs ConflictError(message, response=..., body=...). The fake
only subclasses Exception (which takes no kwargs), so the construction
raised TypeError. The manager's _is_thread_busy_error check then saw a
TypeError instead of a ConflictError and fell through to the generic
'An error occurred' message.
langgraph_sdk is a real dependency, so the shim is unnecessary.
Removing it makes both test files import the same real ConflictError
and the full suite pass (1773 passed, 15 skipped).
---------
Co-authored-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix(sandbox): resolve paths in read_file/write_file content for LocalSandbox
In LocalSandbox mode, read_file and write_file now transform
container paths in file content, matching the path handling
behavior of bash tool.
- write_file: resolves virtual paths in content to system paths
before writing, so scripts with /mnt/user-data paths work
when executed
- read_file: reverse-resolves system paths back to virtual
paths in returned content for consistency
This fixes scenarios where agents write Python scripts with
virtual paths, then execute them via bash tool expecting the
paths to work.
Fixes#1778
* fix(sandbox): address Copilot review — dedicated content resolver + forward-slash safety + tests
- Extract _resolve_paths_in_content() separate from _resolve_paths_in_command()
to decouple file-content path resolution from shell-command parsing
- Normalize resolved paths to forward slashes to avoid Windows backslash
escape issues in source files (e.g. \U in Python string literals)
- Add 4 focused tests: write resolves content, forward-slash guarantee,
read reverse-resolves content, and write→read roundtrip
* style: fix ruff lint — remove extraneous f-string prefix
* fix(sandbox): only reverse-resolve paths in agent-written files
read_file previously applied _reverse_resolve_paths_in_output to ALL
file content, which could silently rewrite paths in user uploads and
external tool output (Willem Jiang review on #1935).
Now tracks files written through write_file in _agent_written_paths.
Only those files get reverse-resolved on read. Non-agent files are
returned as-is.
---------
Co-authored-by: JasonOA888 <JasonOA888@users.noreply.github.com>
* fix(middleware): add per-tool-type frequency detection to LoopDetectionMiddleware
The existing hash-based loop detection only catches identical tool call
sets. When the agent calls the same tool type (e.g. read_file) on many
different files, each call produces a unique hash and bypasses detection.
This causes the agent to exhaust recursion_limit, consuming 150K-225K
tokens per failed run.
Add a second detection layer that tracks cumulative call counts per tool
type per thread. Warns at 30 calls (configurable) and forces stop at 50.
The hard stop message now uses the actual returned message instead of a
hardcoded constant, so both hash-based and frequency-based stops produce
accurate diagnostics.
Also fix _apply() to use the warning message returned by
_track_and_check() for hard stops, instead of always using _HARD_STOP_MSG.
Closes#1987
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fix(lint): remove unused imports and fix line length
- Remove unused _TOOL_FREQ_HARD_STOP_MSG and _TOOL_FREQ_WARNING_MSG
imports from test file (F401)
- Break long _TOOL_FREQ_WARNING_MSG string to fit within 240 char limit (E501)
* style: apply ruff format
* test: add LRU eviction and per-thread reset coverage for frequency state
Address review feedback from @WillemJiang:
- Verify _tool_freq and _tool_freq_warned are cleaned on LRU eviction
- Add test for reset(thread_id=...) clearing only the target thread's
frequency state while leaving others intact
* fix(makefile): route Windows shell-script targets through Git Bash (#2060)
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Asish Kumar <87874775+officialasishkumar@users.noreply.github.com>
* fix: improve sandbox security and preserve multimodal content
* Add unit test modifications for test_injects_uploaded_files_tag_into_list_content
* format updated_content
* Add regression tests for multimodal upload content and host bash default safety
* feat(provisioner): add optional PVC support for sandbox volumes (#1978)
Add SKILLS_PVC_NAME and USERDATA_PVC_NAME env vars to allow sandbox
Pods to use PersistentVolumeClaims instead of hostPath volumes. This
prevents data loss in production when pods are rescheduled across nodes.
When USERDATA_PVC_NAME is set, a subPath of threads/{thread_id}/user-data
is used so a single PVC can serve multiple threads. Falls back to hostPath
when the new env vars are not set, preserving backward compatibility.
* add unit test for provisioner pvc volumes
* refactor: extract shared provisioner_module fixture to conftest.py
Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/e7ccf708-c6ba-40e4-844a-b526bdb249dd
Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com>
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: JeffJiang <for-eleven@hotmail.com>
* feat(blog): implement blog structure with post listing and tagging functionality
* feat(blog): enhance blog layout and post metadata display with new components
* fix(blog): address PR #1962 review feedback and fix lint issues (#14)
* fix: format
---------
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
* fix(backend): stream DeerFlowClient AI text as token deltas (#1969)
DeerFlowClient.stream() subscribed to LangGraph stream_mode=["values",
"custom"] which only delivers full-state snapshots at graph-node
boundaries, so AI replies were dumped as a single messages-tuple event
per node instead of streaming token-by-token. `client.stream("hello")`
looked identical to `client.chat("hello")` — the bug reported in #1969.
Subscribe to "messages" mode as well, forward AIMessageChunk deltas as
messages-tuple events with delta semantics (consumers accumulate by id),
and dedup the values-snapshot path so it does not re-synthesize AI
text that was already streamed. Introduce a per-id usage_metadata
counter so the final AIMessage in the values snapshot and the final
"messages" chunk — which carry the same cumulative usage — are not
double-counted.
chat() now accumulates per-id deltas and returns the last message's
full accumulated text. Non-streaming mock sources (single event per id)
are a degenerate case of the same logic, keeping existing callers and
tests backward compatible.
Verified end-to-end against a real LLM: a 15-number count emits 35
messages-tuple events with BPE subword boundaries clearly visible
("eleven" -> "ele" / "ven", "twelve" -> "tw" / "elve"), 476ms across
the window, end-event usage matches the values-snapshot usage exactly
(not doubled). tests/test_client_live.py::TestLiveStreaming passes.
New unit tests:
- test_messages_mode_emits_token_deltas: 3 AIMessageChunks produce 3
delta events with correct content/id/usage, values-snapshot does not
duplicate, usage counted once.
- test_chat_accumulates_streamed_deltas: chat() rebuilds full text
from deltas.
- test_messages_mode_tool_message: ToolMessage delivered via messages
mode is not duplicated by the values-snapshot synthesis path.
The stream() docstring now documents why this client does not reuse
Gateway's run_agent() / StreamBridge pipeline (sync vs async, raw
LangChain objects vs serialized dicts, single caller vs HTTP fan-out).
Fixes#1969
* refactor(backend): simplify DeerFlowClient streaming helpers (#1969)
Post-review cleanup for the token-level streaming fix. No behavior
change for correct inputs; one efficiency regression fixed.
Fix: chat() O(n²) accumulator
-----------------------------
`chat()` accumulated per-id text via `buffers[id] = buffers.get(id,"") + delta`,
which is O(n) per concat → O(n²) total over a streamed response. At
~2 KB cumulative text this becomes user-visible; at 50 KB / 5000 chunks
it costs roughly 100-300 ms of pure copying. Switched to
`dict[str, list[str]]` + `"".join()` once at return.
Cleanup
-------
- Extract `_serialize_tool_calls`, `_ai_text_event`, `_ai_tool_calls_event`,
and `_tool_message_event` static helpers. The messages-mode and
values-mode branches previously repeated four inline dict literals each;
they now call the same builders.
- `StreamEvent.type` is now typed as `Literal["values", "messages-tuple",
"custom", "end"]` via a `StreamEventType` alias. Makes the closed set
explicit and catches typos at type-check time.
- Direct attribute access on `AIMessage`/`AIMessageChunk`: `.usage_metadata`,
`.tool_calls`, `.id` all have default values on the base class, so the
`getattr(..., None)` fallbacks were dead code. Removed from the hot
path.
- `_account_usage` parameter type loosened to `Any` so that LangChain's
`UsageMetadata` TypedDict is accepted under strict type checking.
- Trimmed narrating comments on `seen_ids` / `streamed_ids` / the
values-synthesis skip block; kept the non-obvious ones that document
the cross-mode dedup invariant.
Net diff: -15 lines. All 132 unit tests + harness boundary test still
pass; ruff check and ruff format pass.
* docs(backend): add STREAMING.md design note (#1969)
Dedicated design document for the token-level streaming architecture,
prompted by the bug investigation in #1969.
Contents:
- Why two parallel streaming paths exist (Gateway HTTP/async vs
DeerFlowClient sync/in-process) and why they cannot be merged.
- LangGraph's three-layer mode naming (Graph "messages" vs Platform
SDK "messages-tuple" vs HTTP SSE) and why a shared string constant
would be harmful.
- Gateway path: run_agent + StreamBridge + sse_consumer with a
sequence diagram.
- DeerFlowClient path: sync generator + direct yield, delta semantics,
chat() accumulator.
- Why the three id sets (seen_ids / streamed_ids / counted_usage_ids)
each carry an independent invariant and cannot be collapsed.
- End-to-end sequence for a real conversation turn.
- Lessons from #1969: why mock-based tests missed the bug, why
BPE subword boundaries in live output are the strongest
correctness signal, and the regression test that locks it in.
- Source code location index.
Also:
- Link from backend/CLAUDE.md Embedded Client section.
- Link from backend/docs/README.md under Feature Documentation.
* test(backend): add refactor regression guards for stream() (#1969)
Three new tests in TestStream that lock the contract introduced by
PR #1974 so any future refactor (sync->async migration, sharing a
core with Gateway's run_agent, dedup strategy change) cannot
silently change behavior.
- test_dedup_requires_messages_before_values_invariant: canary that
documents the order-dependence of cross-mode dedup. streamed_ids
is populated only by the messages branch, so values-before-messages
for the same id produces duplicate AI text events. Real LangGraph
never inverts this order, but a refactor that does (or that makes
dedup idempotent) must update this test deliberately.
- test_messages_mode_golden_event_sequence: locks the *exact* event
sequence (4 events: 2 messages-tuple deltas, 1 values snapshot, 1
end) for a canonical streaming turn. List equality gives a clear
diff on any drift in order, type, or payload shape.
- test_chat_accumulates_in_linear_time: perf canary for the O(n^2)
fix in commit 1f11ba10. 10,000 single-char chunks must accumulate
in under 1s; the threshold is wide enough to pass on slow CI but
tight enough to fail if buffer = buffer + delta is restored.
All three tests pass alongside the existing 12 TestStream tests
(15/15). ruff check + ruff format clean.
* docs(backend): clarify stream() docstring on JSON serialization (#1969)
Replace the misleading "raw LangChain objects (AIMessage,
usage_metadata as dataclasses), not dicts" claim in the
"Why not reuse Gateway's run_agent?" section. The implementation
already yields plain Python dicts (StreamEvent.data is dict, and
usage_metadata is a TypedDict), so the original wording suggested
a richer return type than the API actually delivers.
The corrected wording focuses on what is actually true and
relevant: this client skips the JSON/SSE serialization layer that
Gateway adds for HTTP wire transmission, and yields stream event
payloads directly as Python data structures.
Addresses Copilot review feedback on PR #1974.
* test(backend): document none-id messages dedup limitation (#1969)
Add test_none_id_chunks_produce_duplicates_known_limitation to
TestStream that explicitly documents and asserts the current
behavior when an LLM provider emits AIMessageChunk with id=None
(vLLM, certain custom backends).
The cross-mode dedup machinery cannot record a None id in
streamed_ids (guarded by ``if msg_id:``), so the values snapshot's
reassembled AIMessage with a real id falls through and synthesizes
a duplicate AI text event. The test asserts len == 2 and locks
this as a known limitation rather than silently letting future
contributors hit it without context.
Why this is documented rather than fixed:
* Falling back to ``metadata.get("id")`` does not help — LangGraph's
messages-mode metadata never carries the message id.
* Synthesizing ``f"_synth_{id(msg_chunk)}"`` only helps if the
values snapshot uses the same fallback, which it does not.
* A real fix requires provider cooperation (always emit chunk ids)
or content-based dedup (false-positive risk), neither of which
belongs in this PR.
If a real fix lands, replace this test with a positive assertion
that dedup works for None-id chunks.
Addresses Copilot review feedback on PR #1974 (client.py:515).
* fix(frontend): UI polish - fix CSS typo, dark mode border, and hardcoded colors (#1942)
- Fix `font-norma` typo to `font-normal` in message-list subtask count
- Fix dark mode `--border` using reddish hue (22.216) instead of neutral
- Replace hardcoded `rgb(184,184,192)` in hero with `text-muted-foreground`
- Replace hardcoded `bg-[#a3a1a1]` in streaming indicator with `bg-muted-foreground`
- Add missing `font-sans` to welcome description `<pre>` for consistency
- Make case-study-section padding responsive (`px-4 md:px-20`)
Closes#1940
* docs: clarify deployment sizing guidance (#1963)
* fix(frontend): prevent stale 'new' thread ID from triggering 422 history requests (#1960)
After history.replaceState updates the URL from /chats/new to
/chats/{UUID}, Next.js useParams does not update because replaceState
bypasses the router. The useEffect in useThreadChat would then set
threadIdFromPath ('new') as the threadId, causing the LangGraph SDK
to call POST /threads/new/history which returns HTTP 422 (Invalid
thread ID: must be a UUID).
This fix adds a guard to skip the threadId update when
threadIdFromPath is the literal string 'new', preserving the
already-correct UUID that was set when the thread was created.
* fix(frontend): avoid using route new as thread id (#1967)
Co-authored-by: luoxiao6645 <luoxiao6645@gmail.com>
* Fix(subagent): Event loop conflict in SubagentExecutor.execute() (#1965)
* Fix event loop conflict in SubagentExecutor.execute()
When SubagentExecutor.execute() is called from within an already-running
event loop (e.g., when the parent agent uses async/await), calling
asyncio.run() creates a new event loop that conflicts with asyncio
primitives (like httpx.AsyncClient) that were created in and bound to
the parent loop.
This fix detects if we're already in a running event loop, and if so,
runs the subagent in a separate thread with its own isolated event loop
to avoid conflicts.
Fixes: sub-task cards not appearing in Ultra mode when using async parent agents
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(subagent): harden isolated event loop execution
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* refactor(backend): remove dead getattr in _tool_message_event
---------
Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com>
Co-authored-by: Xinmin Zeng <135568692+fancyboi999@users.noreply.github.com>
Co-authored-by: 13ernkastel <LennonCMJ@live.com>
Co-authored-by: siwuai <458372151@qq.com>
Co-authored-by: 肖 <168966994+luoxiao6645@users.noreply.github.com>
Co-authored-by: luoxiao6645 <luoxiao6645@gmail.com>
Co-authored-by: Saber <11769524+hawkli-1994@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* test(skills): add trigger eval set for systematic-literature-review skill
20 eval queries (10 should-trigger, 10 should-not-trigger) for use with
skill-creator's run_eval.py. Includes real-world SLR queries contributed
by @VANDRANKI (issue #1862 author) and edge cases for routing
disambiguation with academic-paper-review.
* test(skills): add grader expectations for SLR skill evaluation
5 eval cases with 39 expectations covering:
- Standard SLR flow (APA/BibTeX/IEEE format selection)
- Keyword extraction and search behavior
- Subagent dispatch for metadata extraction
- Report structure (themes, convergences, gaps, per-paper annotations)
- Negative case: single-paper routing to academic-paper-review
- Edge case: implicit SLR without explicit keywords
* refactor(skills): shorten SLR description for better trigger rate
Reduce description from 833 to 344 chars. Key changes:
- Lead with "systematic literature review" as primary trigger phrase
- Strengthen single-paper exclusion: "Not for single-paper tasks"
- Remove verbose example patterns that didn't improve routing
Tested with run_eval.py (10 runs/query):
- False positive "best paper on RL": 67% → 20% (improved)
- True positive explicit SLR query: ~30% (unchanged)
Low recall is a routing-layer limitation, not a description issue —
see PR description for full analysis.
* Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
The /api/langgraph/* endpoints proxy straight to the LangGraph server,
so clients inherit LangGraph's native recursion_limit default of 25
instead of the 100 that build_run_config sets for the Gateway and IM
channel paths. 25 is too low for plan-mode or subagent runs and
reliably triggers GraphRecursionError on the lead agent's final
synthesis step after subagents return.
Set recursion_limit: 100 in the Create Run example and the cURL
snippet, and add a short note explaining the discrepancy so users
following the docs don't hit the 25-step ceiling as a surprise.
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(skills): add systematic-literature-review skill for multi-paper SLR workflows
Adds a new skill that produces a structured systematic literature review (SLR)
across multiple academic papers on a topic. Addresses #1862 with a pure skill
approach: no new tools, no architectural changes, no new dependencies.
Skill layout:
- SKILL.md — 4+1 phase workflow (plan, search, extract, synthesize, present)
- scripts/arxiv_search.py — arXiv API client, stdlib only, with a
requests->urllib fallback shim modeled after github-deep-research's
github_api.py
- templates/{apa,ieee,bibtex}.md — citation format templates selected
dynamically in Phase 4, mirroring podcast-generation's templates/ pattern
Design notes:
- Multi-paper synthesis uses the existing `task` tool to dispatch extraction
subagents in parallel. SKILL.md's Phase 3 includes a fixed decision table
for batch splitting to respect the runtime's MAX_CONCURRENT_SUBAGENTS = 3
cap, and explicitly tells the agent to strip the "Task Succeeded. Result: "
prefix before parsing subagent JSON output.
- arXiv only, by design. Semantic Scholar and PubMed adapters would push the
scope toward a standalone MCP server (see #933) and are intentionally out
of scope for this skill.
- Coexists with the existing `academic-paper-review` skill: this skill does
breadth-first synthesis across many papers, academic-paper-review does
single-paper peer review. The two are routed via distinct triggers and
can compose (SLR on many + deep review on 1-2 important ones).
- Hard upper bound of 50 papers, tied to the Phase 3 concurrency strategy.
Larger surveys degrade in synthesis quality and are better split by
sub-topic.
BibTeX template explicitly uses @misc for arXiv preprints (not @article),
which is the most common mistake when generating BibTeX for arXiv papers.
arxiv_search.py was smoke-tested end-to-end against the live arXiv API with
two query shapes (relevance sort, submittedDate sort with category filter);
all returned JSON fields parse correctly (id normalization, Atom namespace
handling, URL encoding for multi-word queries).
* fix(skills): prevent LLM from saving intermediate search results to file
Adds an explicit "do not save" instruction at the end of Phase 2.
Observed during Test 1 with DeepSeek: the model saved search results
to a markdown file before proceeding to Phase 3, wasting 2-3 tool call
rounds and increasing the risk of hitting the graph recursion limit.
The search JSON should stay in context for Phase 3, not be persisted.
* fix(skills): use relevance+start-date instead of submittedDate sorting
Test 2 revealed that arXiv's submittedDate sorting returns the most
recently submitted papers in the category regardless of query relevance.
Searching "diffusion models" with sortBy=submittedDate in cs.CV returned
papers on spatial memory, Navier-Stokes, and photon-counting CT — none
about diffusion models. The LLM then retried with 4 different queries,
wasting tool calls and approaching the recursion limit.
Fix: always sort by relevance; when the user wants "recent" papers,
combine relevance sorting with --start-date to constrain the time window.
Also add an explicit "run the search exactly once" instruction to prevent
the retry loop.
* fix(skills): wrap multi-word arXiv queries in double quotes for phrase matching
Without quotes, `all:diffusion model` is parsed by arXiv's Lucene as
`all:diffusion OR model`, pulling in unrelated papers from physics
(thermal diffusion) and other fields. Wrapping in double quotes forces
phrase matching: `all:"diffusion model"`.
Also fixes date filtering: the previous bug caused 2011 papers to appear
in results despite --start-date 2024-04-09, because the unquoted query
words were OR'd with the date constraint.
Verified: "diffusion models" --category cs.CV --start-date 2024-04-09
now returns only relevant diffusion model papers published after April
2024.
* fix(skills): add query phrasing guide and enforce subagent delegation
Two fixes from Test 2 observations with DeepSeek:
1. Query phrasing: add a table showing good vs bad query examples.
The script wraps multi-word queries in double quotes for phrase
matching, so long queries like "diffusion models in computer vision"
return 0 results. Guide the LLM to use 2-3 core keywords + --category
instead.
2. Subagent enforcement: DeepSeek was extracting metadata inline via
python -c scripts instead of using the task tool. Strengthen Phase 3
to explicitly name the task tool, say "do not extract metadata
yourself", and explain why (token budget, isolation). This is more
direct than the previous natural-language-only approach while still
providing the reasoning behind the constraint.
* fix(skills): strengthen search keyword guidance and subagent enforcement
Address two issues found during end-to-end testing with DeepSeek:
1. Search retry: LLM passed full topic descriptions as queries (e.g.
"diffusion models in computer vision"), which returned 0 results due
to exact phrase matching and triggered retries. Added explicit
instruction to extract 2-3 core keywords before searching.
2. Subagent bypass: LLM used python -c to extract metadata instead of
dispatching via task tool. Added explicit prohibition list (python -c,
bash scripts, inline extraction) with ❌ markers for clarity.
* fix(skills): address Copilot review feedback on SLR skill
- Fix legacy arXiv ID parsing: preserve archive prefix for pre-2007
papers (e.g. hep-th/9901001 instead of just 9901001)
- Fix phase count: "four phases" -> "five phases"
- Add subagent_enabled prerequisite note to SKILL.md Notes section
- Remove PR-specific references ("PR 1") from ieee.md and bibtex.md
templates, replace with workflow-scoped wording
- Fix script header: "stdlib only" -> "no additional dependencies
required", fix relative path to github_api.py reference
- Remove reference to non-existent docs/enhancement/ path in header
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Add langchain-ollama as an optional dependency and provide ChatOllama
config examples, enabling proper thinking/reasoning content preservation
for local Ollama models.
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(frontend): replace invalid "context" select field with "metadata" in threads.search
The LangGraph API server does not support "context" as a select field for
threads/search, causing a 422 Unprocessable Entity error introduced by
commit 60e0abf (#1771).
- Replace "context" with "metadata" in the default select list
- Persist agent_name into thread metadata on creation so search results
carry the agent identity
- Update pathOfThread() to fall back to metadata.agent_name when
context is unavailable from search results
- Add regression tests for metadata-based agent routing
Fixes#2037
Made-with: Cursor
* fix: apply Copilot suggestions
* style: fix the lint error
* feat(config): add when_thinking_disabled support for model configs
Allow users to explicitly configure what parameters are sent to the
model when thinking is disabled, via a new `when_thinking_disabled`
field in model config. This mirrors the existing `when_thinking_enabled`
pattern and takes full precedence over the hardcoded disable behavior
when set. Backwards compatible — existing configs work unchanged.
Closes#1675
* fix(config): address copilot review — gate when_thinking_disabled independently
- Switch truthiness check to `is not None` so empty dict overrides work
- Restructure disable path so when_thinking_disabled is gated independently
of has_thinking_settings, allowing it to work without when_thinking_enabled
- Update test to reflect new behavior
* feat: implement full checkpoint rollback on user cancellation
- Capture pre-run checkpoint snapshot including checkpoint state, metadata, and pending_writes
- Add _rollback_to_pre_run_checkpoint() function to restore thread state
- Implement _call_checkpointer_method() helper to support both async and sync checkpointer methods
- Rollback now properly restores checkpoint, metadata, channel_versions, and pending_writes
- Remove obsolete TODO comment (Phase 2) as rollback is now complete
This resolves the TODO(Phase 2) comment and enables full thread state
restoration when a run is cancelled by the user.
* fix: address rollback review feedback
* fix: strengthen checkpoint rollback validation and error handling
- Validate restored_config structure and checkpoint_id before use
- Raise RuntimeError on malformed pending_writes instead of silent skip
- Normalize None checkpoint_ns to empty string instead of "None"
- Move delete_thread to only execute when pre_run_snapshot is None
- Add docstring noting non-atomic rollback as known limitation
This addresses review feedback on PR #1867 regarding data integrity
in the checkpoint rollback implementation.
* test: add comprehensive coverage for checkpoint rollback edge cases
- test_rollback_restores_snapshot_without_deleting_thread
- test_rollback_deletes_thread_when_no_snapshot_exists
- test_rollback_raises_when_restore_config_has_no_checkpoint_id
- test_rollback_normalizes_none_checkpoint_ns_to_root_namespace
- test_rollback_raises_on_malformed_pending_write_not_a_tuple
- test_rollback_raises_on_malformed_pending_write_non_string_channel
- test_rollback_propagates_aput_writes_failure
Covers all scenarios from PR #1867 review feedback.
* test: format rollback worker tests
* fix(sandbox): add startup reconciliation to prevent orphaned container leaks
Sandbox containers were never cleaned up when the managing process restarted,
because all lifecycle tracking lived in in-memory dictionaries. This adds
startup reconciliation that enumerates running containers via `docker ps` and
either destroys orphans (age > idle_timeout) or adopts them into the warm pool.
Closes#1972
* fix(sandbox): address Copilot review — adopt-all strategy, improved error handling
- Reconciliation now adopts all containers into warm pool unconditionally,
letting the idle checker decide cleanup. Avoids destroying containers
that another concurrent process may still be using.
- list_running() logs stderr on docker ps failure and catches
FileNotFoundError/OSError.
- Signal handler test restores SIGTERM/SIGINT in addition to SIGHUP.
- E2E test docstring corrected to match actual coverage scope.
* fix(sandbox): address maintainer review — batch inspect, lock tightening, import hygiene
- _reconcile_orphans(): merge check-and-insert into a single lock acquisition
per container to eliminate the TOCTOU window.
- list_running(): batch the per-container docker inspect into a single call.
Total subprocess calls drop from 2N+1 to 2 (one ps + one batch inspect).
Parse port and created_at from the inspect JSON payload.
- Extract _parse_docker_timestamp() and _extract_host_port() as module-level
pure helpers and test them directly.
- Move datetime/json imports to module top level.
- _make_provider_for_reconciliation(): document the __new__ bypass and the
lockstep coupling to AioSandboxProvider.__init__.
- Add assertion that list_running() makes exactly ONE inspect call.
* Fix HTML artifact preview rendering
* Add after screenshot for HTML preview fix
* Add before screenshot for HTML preview fix
* Update before screenshot for HTML preview fix
* Update after screenshot for HTML preview fix
* Update before screenshot to Tsinghua homepage repro
* Update after screenshot to Tsinghua homepage preview
* Address PR review on HTML artifact preview
* Harden HTML artifact preview isolation
Streamdown's streaming safeguard appends closing markers (e.g. `*`) to
text with unmatched markdown syntax. This causes user messages containing
literal `*` (such as `99 * 87`) to display with a spurious trailing
asterisk. Human messages are always complete, so the incomplete-markdown
pre-processing is unnecessary.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix(docker): nginx fails to start on hosts without IPv6
- Detect IPv6 support at runtime and remove `listen [::]` directive
when unavailable, preventing nginx startup failure on non-IPv6 hosts
- Use `exec` to replace shell with nginx as PID 1 for proper signal
handling (graceful shutdown on SIGTERM)
- Reformat command from YAML folded scalar to block scalar (no
functional change)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(docker): harden nginx startup script (Copilot review feedback)
Add `set -e` so envsubst failures exit immediately instead of starting
nginx with an incomplete config. Narrow the sed pattern to match only
the `listen [::]:2026;` directive to avoid accidentally removing future
lines containing [::].
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
When a model config includes `reasoning_effort` as an extra YAML field
(ModelConfig uses `extra="allow"`), and the thinking-disabled code path
also injects `reasoning_effort="minimal"` into kwargs, the previous
`model_class(**kwargs, **model_settings_from_config)` call raises:
TypeError: got multiple values for keyword argument 'reasoning_effort'
Fix by merging the two dicts before instantiation, giving runtime kwargs
precedence over config values: `{**model_settings_from_config, **kwargs}`.
Fixes#1977
Co-authored-by: octo-patch <octo-patch@github.com>
* fix(middleware): handle string-serialized options in ClarificationMiddleware (#1995)
Some models (e.g. Qwen3-Max) serialize array tool parameters as JSON
strings instead of native arrays. Add defensive type checking in
_format_clarification_message() to deserialize string options before
iteration, preventing per-character rendering.
* fix(middleware): normalize options after JSON deserialization
Address Copilot review feedback:
- Add post-deserialization normalization so options is always a list
(handles json.loads returning a scalar string, dict, or None)
- Add test for JSON-encoded scalar string ("development")
- Fix test_json_string_with_mixed_types to use actual mixed types
* feat(community): add Exa search as community tool provider
Add Exa (exa.ai) as a new community search provider alongside Tavily,
Firecrawl, InfoQuest, and Jina AI. Exa is an AI-native search engine
with neural, keyword, and auto search types.
New files:
- community/exa/tools.py: web_search_tool and web_fetch_tool
- tests/test_exa_tools.py: 10 unit tests with mocked Exa client
Changes:
- pyproject.toml: add exa-py dependency
- config.example.yaml: add commented-out Exa configuration examples
Usage: set `use: deerflow.community.exa.tools:web_search_tool` in
config.yaml and provide EXA_API_KEY.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(community): address PR review comments for Exa tools
- Make _get_exa_client() accept tool_name param so web_fetch reads its own config
- Remove __init__.py to match namespace package pattern of other providers
- Add duplicate tool name warning in config.example.yaml
- Add regression tests for web_fetch config resolution
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Update revision in uv.lock to 3
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix(backend): use timezone-aware UTC in memory modules
Replace datetime.utcnow() with datetime.now(timezone.utc) and a shared
utc_now_iso_z() helper so persisted ISO timestamps keep the trailing Z
suffix without triggering Python 3.12+ deprecation warnings.
Made-with: Cursor
* refactor(backend): use removesuffix for utc_now_iso_z suffix
Makes the +00:00 -> Z transform explicit for the trailing offset only
(Copilot review on PR #1992).
Made-with: Cursor
* style(backend): satisfy ruff UP017 with datetime.UTC in memory queue
Made-with: Cursor
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* Fix event loop conflict in SubagentExecutor.execute()
When SubagentExecutor.execute() is called from within an already-running
event loop (e.g., when the parent agent uses async/await), calling
asyncio.run() creates a new event loop that conflicts with asyncio
primitives (like httpx.AsyncClient) that were created in and bound to
the parent loop.
This fix detects if we're already in a running event loop, and if so,
runs the subagent in a separate thread with its own isolated event loop
to avoid conflicts.
Fixes: sub-task cards not appearing in Ultra mode when using async parent agents
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(subagent): harden isolated event loop execution
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
After history.replaceState updates the URL from /chats/new to
/chats/{UUID}, Next.js useParams does not update because replaceState
bypasses the router. The useEffect in useThreadChat would then set
threadIdFromPath ('new') as the threadId, causing the LangGraph SDK
to call POST /threads/new/history which returns HTTP 422 (Invalid
thread ID: must be a UUID).
This fix adds a guard to skip the threadId update when
threadIdFromPath is the literal string 'new', preserving the
already-correct UUID that was set when the thread was created.
- Fix `font-norma` typo to `font-normal` in message-list subtask count
- Fix dark mode `--border` using reddish hue (22.216) instead of neutral
- Replace hardcoded `rgb(184,184,192)` in hero with `text-muted-foreground`
- Replace hardcoded `bg-[#a3a1a1]` in streaming indicator with `bg-muted-foreground`
- Add missing `font-sans` to welcome description `<pre>` for consistency
- Make case-study-section padding responsive (`px-4 md:px-20`)
Closes#1940
* fix(backend): make loop detection hash tool calls by stable keys
The loop detection middleware previously hashed full tool call arguments,
which made repeated calls look different when only non-essential argument
details changed. In particular, `read_file` calls with nearby line ranges
could bypass repetition detection even when the agent was effectively
reading the same file region again and again.
- Hash tool calls using stable keys instead of the full raw args payload
- Bucket `read_file` line ranges so nearby reads map to the same region key
- Prefer stable identifiers such as `path`, `url`, `query`, or `command`
before falling back to JSON serialization of args
- Keep hashing order-independent so the same tool call set produces the
same hash regardless of call order
Fixes#1905
* fix(backend): harden loop detection hash normalization
- Normalize and parse stringified tool args defensively
- Expand stable key derivation to include pattern, glob, and cmd
- Normalize reversed read_file ranges before bucketing
Fixes#1905
* fix(backend): harden loop detection tool format
* exclude write_file and str_replace from the stable-key path — writing different content to the same file shouldn't be flagged.
---------
Co-authored-by: JeffJiang <for-eleven@hotmail.com>
* fix(frontend): resolve layout flickering by migrating workspace sidebar state to cookie
* fix(frontend): unify local settings runtime state to fix state drift
* fix(frontend): only persist thread model on explicit context model updates
* fix(subagents): add cooperative cancellation for subagent threads
Subagent tasks run inside ThreadPoolExecutor threads with their own
event loop (asyncio.run). When a user clicks stop, RunManager cancels
the parent asyncio.Task, but Future.cancel() cannot terminate a running
thread and asyncio.Event does not propagate across event loops. This
causes subagent threads to keep executing (writing files, calling LLMs)
even after the user explicitly stops the run.
Fix: add a threading.Event (cancel_event) to SubagentResult and check
it cooperatively in _aexecute()'s astream iteration loop. On cancel,
request_cancel_background_task() sets the event, and the thread exits
at the next iteration boundary.
Changes:
- executor.py: Add cancel_event field to SubagentResult, check it in
_aexecute loop, set it on timeout, add request_cancel_background_task
- task_tool.py: Call request_cancel_background_task on CancelledError
* fix(subagents): guard cancel status and add pre-check before astream
- Only overwrite status to FAILED when still RUNNING, preserving
TIMED_OUT set by the scheduler thread.
- Add cancel_event pre-check before entering the astream loop so
cancellation is detected immediately when already signalled.
* fix(subagents): guard status updates with lock to prevent race condition
Wrap the check-and-set on result.status in _aexecute with
_background_tasks_lock so the timeout handler in execute_async
cannot interleave between the read and write.
* fix(subagents): add dedicated CANCELLED status for user cancellation
Introduce SubagentStatus.CANCELLED to distinguish user-initiated
cancellation from actual execution failures. Update _aexecute,
task_tool polling, cleanup terminal-status sets, and test fixtures.
* test(subagents): add cancellation tests and fix timeout regression test
- Add dedicated TestCooperativeCancellation test class with 6 tests:
- Pre-set cancel_event prevents astream from starting
- Mid-stream cancel_event returns CANCELLED immediately
- request_cancel_background_task() sets cancel_event correctly
- request_cancel on nonexistent task is a no-op
- Real execute_async timeout does not overwrite CANCELLED (deterministic
threading.Event sync, no wall-clock sleeps)
- cleanup_background_task removes CANCELLED tasks
- Add task_tool cancellation coverage:
- test_cancellation_calls_request_cancel: assert CancelledError path
calls request_cancel_background_task(task_id)
- test_task_tool_returns_cancelled_message: assert CANCELLED polling
branch emits task_cancelled event and returns expected message
- Fix pre-existing test infrastructure issue: add deerflow.sandbox.security
to _MOCKED_MODULE_NAMES (fixes ModuleNotFoundError for all executor tests)
- Add RUNNING guard to timeout handler in executor.py to prevent
TIMED_OUT from overwriting CANCELLED status
- Add cooperative cancellation granularity comment documenting that
cancellation is only detected at astream iteration boundaries
---------
Co-authored-by: lulusiyuyu <lulusiyuyu@users.noreply.github.com>
* fix(frontend): resolve invalid HTML nesting and tabnabbing vulnerabilities
Fix `<button>` inside `<a>` invalid HTML in artifact components and add
missing `noopener,noreferrer` to `window.open` calls to prevent reverse
tabnabbing.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(frontend): address Copilot review on tabnabbing and double-tab-open
Remove redundant parent onClick on web_fetch ChainOfThoughtStep to
prevent opening two tabs on link click, and explicitly null out
window.opener after window.open() for defensive tabnabbing hardening.
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Two production docker-compose.yaml bugs prevent `make up` from working:
1. Gateway missing DEER_FLOW_CONFIG_PATH and DEER_FLOW_EXTENSIONS_CONFIG_PATH
environment overrides. Added in fb2d99f (#1836) but accidentally reverted
by ca2fb95 (#1847). Without them, gateway reads host paths from .env via
env_file, causing FileNotFoundError inside the container.
2. Langgraph command fails when LANGGRAPH_ALLOW_BLOCKING is unset (default).
Empty $${allow_blocking} inserts a bare space between flags, causing
' --no-reload' to be parsed as unexpected extra argument. Fix by building
args string first and conditionally appending --allow-blocking.
Co-authored-by: cooper <cooperfu@tencent.com>
* feat(feishu): add channel file materialization hook for inbound messages
- Introduce Channel.receive_file(msg, thread_id) as a base method for file materialization; default is no-op.
- Implement FeishuChannel.receive_file to download files/images from Feishu messages, save to sandbox, and inject virtual paths into msg.text.
- Update ChannelManager to call receive_file for any channel if msg.files is present, enabling downstream model access to user-uploaded files.
- No impact on Slack/Telegram or other channels (they inherit the default no-op).
* style(backend): format code with ruff for lint compliance
- Auto-formatted packages/harness/deerflow/agents/factory.py and tests/test_create_deerflow_agent.py using `ruff format`
- Ensured both files conform to project linting standards
- Fixes CI lint check failures caused by code style issues
* fix(feishu): handle file write operation asynchronously to prevent blocking
* fix(feishu): rename GetMessageResourceRequest to _GetMessageResourceRequest and remove redundant code
* test(feishu): add tests for receive_file method and placeholder replacement
* fix(manager): remove unnecessary type casting for channel retrieval
* fix(feishu): update logging messages to reflect resource handling instead of image
* fix(feishu): sanitize filename by replacing invalid characters in file uploads
* fix(feishu): improve filename sanitization and reorder image key handling in message processing
* fix(feishu): add thread lock to prevent filename conflicts during file downloads
* fix(test): correct bad merge in test_feishu_parser.py
* chore: run ruff and apply formatting cleanup
fix(feishu): preserve rich-text attachment order and improve fallback filename handling
* fix(sandbox): add L2 input sanitisation to SandboxAuditMiddleware
Add _validate_input() to reject malformed bash commands before regex
classification: empty commands, oversized commands (>10 000 chars), and
null bytes that could cause detection/execution layer inconsistency.
* fix(sandbox): address Copilot review — type guard, log truncation, reject reason
- Coerce None/non-string command to str before validation
- Truncate oversized commands in audit logs to prevent log amplification
- Propagate reject_reason through _pre_process() to block message
- Remove L2 label from comments and test class names
* fix(sandbox): isinstance type guard + async input sanitisation tests
Address review comments:
- Replace str() coercion with isinstance(raw_command, str) guard so
non-string truthy values (0, [], False) fall back to empty string
instead of passing validation as "0"/"[]"/"False".
- Add TestInputSanitisationBlocksInAwrapToolCall with 4 async tests
covering empty, null-byte, oversized, and None command via
awrap_tool_call path.
support for vLLM 0.19.0 OpenAI-compatible chat endpoints and fixes the Qwen reasoning toggle so flash mode can actually disable thinking.
Co-authored-by: NmanQAQ <normangyao@qq.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
ls_tool was the only sandbox tool without output size limits, allowing
multi-MB results from large directories to blow up the model context
window. Add head-truncation (configurable via ls_output_max_chars,
default 20000) consistent with existing bash and read_file truncation.
Closes#1887
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Escape shell variables to prevent Docker Compose from attempting
substitution at parse time. Rename allow_blocking_flag to allow_blocking
for consistency with dev version.
Fixes the 'allow_blocking_flag not set' warning and enables --allow-blocking
flag to work correctly.
* fix(memory): case-insensitive fact deduplication and positive reinforcement detection
Two fixes to the memory system:
1. _fact_content_key() now lowercases content before comparison, preventing
semantically duplicate facts like "User prefers Python" and "user prefers
python" from being stored separately.
2. Adds detect_reinforcement() to MemoryMiddleware (closes#1719), mirroring
detect_correction(). When users signal approval ("yes exactly", "perfect",
"完全正确", etc.), the memory updater now receives reinforcement_detected=True
and injects a hint prompting the LLM to record confirmed preferences and
behaviors with high confidence.
Changes across the full signal path:
- memory_middleware.py: _REINFORCEMENT_PATTERNS + detect_reinforcement()
- queue.py: reinforcement_detected field in ConversationContext and add()
- updater.py: reinforcement_detected param in update_memory() and
update_memory_from_conversation(); builds reinforcement_hint alongside
the existing correction_hint
Tests: 11 new tests covering deduplication, hint injection, and signal
detection (Chinese + English patterns, window boundary, conflict with correction).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(memory): address Copilot review comments on reinforcement detection
- Tighten _REINFORCEMENT_PATTERNS: remove 很好, require punctuation/end-of-string boundaries on remaining patterns, split this-is-good into stricter variants
- Suppress reinforcement_detected when correction_detected is true to avoid mixed-signal noise
- Use casefold() instead of lower() for Unicode-aware fact deduplication
- Add missing test coverage for reinforcement_detected OR merge and forwarding in queue
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* Rename BACKEND_TODO.md to TODO.md in documentation
* Update MCP Setup Guide link in CONTRIBUTING.md
* Update reference to config.yaml path in documentation
* Fix config file path in TITLE_GENERATION_IMPLEMENTATION.md
Updated the path to the example config file in the documentation.
* fix(docker): use multi-stage build to remove build-essential from runtime image
The build-essential toolchain (~200 MB) was only needed for compiling
native Python extensions during `uv sync` but remained in the final
image, increasing size and attack surface. Split the Dockerfile into
a builder stage (with build-essential) and a clean runtime stage that
copies only the compiled artifacts, Node.js, Docker CLI, and uv.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(docker): add dev stage and pin docker:cli per review feedback
Address Copilot review comments:
- Add a `dev` build stage (FROM builder) that retains build-essential
so startup-time `uv sync` in dev containers can compile from source
- Update docker-compose-dev.yaml to use `target: dev` for gateway and
langgraph services
- Keep the clean runtime stage (no build-essential) as the default
final stage for production builds
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
sandbox_from_runtime() and ensure_sandbox_initialized() write
sandbox_id into runtime.context after acquiring a sandbox. When
lazy_init=True and no context is supplied to the graph run,
runtime.context is None (the LangGraph default), causing a TypeError
on the assignment.
Add `if runtime.context is not None` guards at all three write sites.
Reads already had equivalent guards (e.g. `runtime.context.get(...) if
runtime.context else None`); this brings writes into line.
Previously, the list endpoint always returned soul=null because
_agent_config_to_response() was called without include_soul=True.
This caused confusion since PUT /api/agents/{name} and GET /api/agents/{name}
both returned the soul content, but the list endpoint silently omitted it.
Co-authored-by: octo-patch <octo-patch@users.noreply.github.com>
Add three new public skills to enhance DeerFlow's content creation capabilities:
- **academic-paper-review**: Structured peer-review-quality analysis of
research papers following top-venue review standards (NeurIPS, ICML, ACL).
Covers methodology assessment, contribution evaluation, literature
positioning, and constructive feedback with a 3-phase workflow.
- **code-documentation**: Professional documentation generation for software
projects, including README generation, API reference docs, architecture
documentation with Mermaid diagrams, and inline code documentation
supporting Python, TypeScript, Go, Rust, and Java conventions.
- **newsletter-generation**: Curated newsletter creation with research
workflow, supporting daily digest, weekly roundup, deep-dive, and industry
briefing formats. Includes audience-specific tone adaptation and
multi-source content curation.
All skills:
- Follow the existing SKILL.md frontmatter convention (name + description)
- Pass the official _validate_skill_frontmatter() validation
- Use hyphen-case naming consistent with existing skills
- Contain only allowed frontmatter properties
- Include comprehensive examples, quality checklists, and output templates
* feat(uploads): guide agent to use grep/glob/read_file for uploaded documents
Add workflow guidance to the <uploaded_files> context block so the agent
knows to use grep and glob (added in #1784) alongside read_file when
working with uploaded documents, rather than falling back to web search.
This is the final piece of the three-PR PDF agentic search pipeline:
- PR1 (#1727): pymupdf4llm converter produces structured Markdown with headings
- PR2 (#1738): document outline injected into agent context with line numbers
- PR3 (this): agent guided to use outline + grep + read_file workflow
* feat(uploads): add file-first priority and fallback guidance to uploaded_files context
* fix(uploads): handle split-bold headings and ** ** artefacts in extract_outline
- Add _clean_bold_title() to merge adjacent bold spans (** **) produced
by pymupdf4llm when bold text crosses span boundaries
- Add _SPLIT_BOLD_HEADING_RE (Style 3) to recognise **<num>** **<title>**
headings common in academic papers; excludes pure-number table headers
and rows with more than 4 bold blocks
- When outline is empty, read first 5 non-empty lines of the .md as a
content preview and surface a grep hint in the agent context
- Update _format_file_entry to render the preview + grep hint instead of
silently omitting the outline section
- Add 3 new extract_outline tests and 2 new middleware tests (65 total)
* fix(uploads): address Copilot review comments on extract_outline regex
- Replace ASCII [A-Za-z] guard with negative lookahead to support non-ASCII
titles (e.g. **1** **概述**); pure-numeric/punctuation blocks still excluded
- Replace .+ with [^*]+ and cap repetition at {0,2} (four blocks total) to
keep _SPLIT_BOLD_HEADING_RE linear and avoid ReDoS on malformed input
- Remove now-redundant len(blocks) <= 4 code-level check (enforced by regex)
- Log debug message with exc_info when preview extraction fails
Server-rendered data-variant={undefined} didn't match client hydration.
Now only render data-variant and data-size when explicitly set.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: JeffJiang <for-eleven@hotmail.com>
* feat(uploads): guide agent to use grep/glob/read_file for uploaded documents
Add workflow guidance to the <uploaded_files> context block so the agent
knows to use grep and glob (added in #1784) alongside read_file when
working with uploaded documents, rather than falling back to web search.
This is the final piece of the three-PR PDF agentic search pipeline:
- PR1 (#1727): pymupdf4llm converter produces structured Markdown with headings
- PR2 (#1738): document outline injected into agent context with line numbers
- PR3 (this): agent guided to use outline + grep + read_file workflow
* feat(uploads): add file-first priority and fallback guidance to uploaded_files context
* fix: add missing DEER_FLOW_CONFIG_PATH and DEER_FLOW_EXTENSIONS_CONFIG_PATH env vars to gateway service (fixes#1829)
The gateway service was missing these two environment variables that tell
it where to find the config files inside the container. Without them,
the gateway reads DEER_FLOW_CONFIG_PATH from the host's .env file (set
to a host filesystem path), which is not accessible inside the container,
causing FileNotFoundError on startup. The langgraph service already had
these variables set correctly.
* fix: remove nginx Plus-only zone/resolve directives from nginx.conf (fixes#1744)
The `zone` and `resolve` parameters in upstream server directives are
nginx Plus features not available in the standard `nginx:alpine` image.
This caused nginx to fail at startup with:
[emerg] invalid parameter "resolve" in /etc/nginx/nginx.conf:25
Remove these directives so the config is compatible with open-source nginx.
Docker's internal DNS (127.0.0.11, already configured via `resolver`) handles
service name resolution. The `resolver` directive is kept for the provisioner
location which uses variable-based proxy_pass for optional-service support.
The gateway service was missing these two environment variables that tell
it where to find the config files inside the container. Without them,
the gateway reads DEER_FLOW_CONFIG_PATH from the host's .env file (set
to a host filesystem path), which is not accessible inside the container,
causing FileNotFoundError on startup. The langgraph service already had
these variables set correctly.
* fix: inject longTermBackground into memory prompt
The format_memory_for_injection function only processed recentMonths and
earlierContext from the history section, silently dropping longTermBackground.
The LLM writes longTermBackground correctly and it persists to memory.json,
but it was never injected into the system prompt — making the user's
long-term background invisible to the AI.
Add the missing field handling and a regression test.
* fix(middleware): handle list-type AIMessage.content in LoopDetectionMiddleware
LangChain AIMessage.content can be str | list. When using providers that
return structured content blocks (e.g. Anthropic thinking mode, certain
OpenAI-compatible gateways), content is a list of dicts like
[{"type": "text", "text": "..."}].
The hard_limit branch in _apply() concatenated content with a string via
(last_msg.content or "") + f"\n\n{_HARD_STOP_MSG}", which raises
TypeError when content is a non-empty list (list + str is invalid).
Add _append_text() static method that:
- Returns the text directly when content is None
- Appends a {"type": "text"} block when content is a list
- Falls back to string concatenation when content is a str
This is consistent with how other modules in the project already handle
list content (client.py._extract_text, memory_middleware, executor.py).
* test(middleware): add unit tests for _append_text and list content hard stop
Add regression tests to verify LoopDetectionMiddleware handles list-type
AIMessage.content correctly during hard stop:
- TestAppendText: unit tests for the new _append_text() static method
covering None, str, list (including empty list) content types
- TestHardStopWithListContent: integration tests verifying hard stop
works correctly with list content (Anthropic thinking mode), None
content, and str content
Requested by reviewer in PR #1823.
* fix(middleware): improve _append_text robustness and test isolation
- Add explicit isinstance(content, str) check with fallback for
unexpected types (coerce to str) to prevent TypeError on edge cases
- Deep-copy list content in _make_state() test helper to prevent
shared mutable references across test iterations
- Add test_unexpected_type_coerced_to_str: verify fallback for
non-str/list/None content types
- Add test_list_content_not_mutated_in_place: verify _append_text
does not modify the original list
* style: fix ruff format whitespace in test file
---------
Co-authored-by: ppyt <14163465+ppyt@users.noreply.github.com>
* feat(uploads): add pymupdf4llm PDF converter with auto-fallback and async offload
- Introduce pymupdf4llm as an optional PDF converter with better heading
detection and table preservation than MarkItDown
- Auto mode: prefer pymupdf4llm when installed; fall back to MarkItDown
when output is suspiciously sparse (image-based / scanned PDFs)
- Sparsity check uses chars-per-page (< 50 chars/page) rather than an
absolute threshold, correctly handling both short and long documents
- Large files (> 1 MB) are offloaded to asyncio.to_thread() to avoid
blocking the event loop (related: #1569)
- Add UploadsConfig with pdf_converter field (auto/pymupdf4llm/markitdown)
- Add pymupdf4llm as optional dependency: pip install deerflow-harness[pymupdf]
- Add 14 unit tests covering sparsity heuristic, routing logic, and async path
* fix(uploads): address Copilot review comments on PDF converter
- Fix docstring: MIN_CHARS_PYMUPDF -> _MIN_CHARS_PER_PAGE (typo)
- Fix file handle leak: wrap pymupdf.open in try/finally to ensure doc.close()
- Fix silent fallback gap: _convert_pdf_with_pymupdf4llm now catches all
conversion exceptions (not just ImportError), so encrypted/corrupt PDFs
fall back to MarkItDown instead of propagating
- Tighten type: pdf_converter field changed from str to Literal[auto|pymupdf4llm|markitdown]
- Normalize config value: _get_pdf_converter() strips and lowercases the raw
config string, warns and falls back to 'auto' on unknown values
* feat(uploads): inject document outline into agent context for converted files
Extract headings from converted .md files and inject them into the
<uploaded_files> context block so the agent can navigate large documents
by line number before reading.
- Add `extract_outline()` to `file_conversion.py`: recognises standard
Markdown headings (#/##/###) and SEC-style bold structural headings
(**ITEM N. BUSINESS**, **PART II**); caps at 50 entries; excludes
cover-page boilerplate (WASHINGTON DC, CURRENT REPORT, SIGNATURES)
- Add `_extract_outline_for_file()` helper in `uploads_middleware.py`:
looks for a sibling `.md` file produced by the conversion pipeline
- Update `UploadsMiddleware._create_files_message()` to render the outline
under each file entry with `L{line}: {title}` format and a `read_file`
prompt for range-based reading
- Tests: 10 new tests for `extract_outline()`, 4 new tests for outline
injection in `UploadsMiddleware`; existing test updated for new `outline`
field in `uploaded_files` state
Partially addresses #1647 (agent ignores uploaded files).
* fix(uploads): stream outline file reads and strip inline bold from heading titles
- Switch extract_outline() from read_text().splitlines() to open()+line iteration
so large converted documents are not loaded into memory on every agent turn;
exits as soon as MAX_OUTLINE_ENTRIES is reached (Copilot suggestion)
- Strip **...** wrapper from standard Markdown heading titles before appending
to outline so agent context stays clean (e.g. "## **Overview**" → "Overview")
(Copilot suggestion)
- Remove unused pathlib.Path import and fix import sort order in test_file_conversion.py
to satisfy ruff CI lint
* fix(uploads): show truncation hint when outline exceeds MAX_OUTLINE_ENTRIES
When extract_outline() hits the cap it now appends a sentinel entry
{"truncated": True} instead of silently dropping the rest of the headings.
UploadsMiddleware reads the sentinel and renders a hint line:
... (showing first 50 headings; use `read_file` to explore further)
Without this the agent had no way to know the outline was incomplete and
would treat the first 50 headings as the full document structure.
* fix(uploads): fall back to configurable.thread_id when runtime.context lacks thread_id
runtime.context does not always carry thread_id (depends on LangGraph
invocation path). ThreadDataMiddleware already falls back to
get_config().configurable.thread_id — apply the same pattern so
UploadsMiddleware can resolve the uploads directory and attach outlines
in all invocation paths.
* style: apply ruff format
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix(uploads): fall back to configurable.thread_id when runtime.context lacks thread_id
runtime.context does not always carry thread_id depending on the
LangGraph invocation path. When absent, uploads_dir resolved to None
and the entire outline/historical-files attachment was silently skipped.
Apply the same fallback pattern already used by ThreadDataMiddleware:
try get_config().configurable.thread_id, with a RuntimeError guard for
test environments where get_config() is called outside a runnable context.
Discovered via live integration testing (curl against local LangGraph).
Unit tests inject uploads_dir directly and would not catch this.
* style: apply ruff format to uploads_middleware.py
When MemoryStreamBridge queue reaches capacity, publish_end() previously
used the same 30s timeout + drop strategy as regular events. If the END
sentinel was dropped, subscribe() would loop forever waiting for it,
causing the SSE connection to hang indefinitely and leaking _queues and
_counters resources for that run_id.
Changes:
- publish_end() now evicts oldest regular events when queue is full to
guarantee END sentinel delivery — the sentinel is the only signal that
allows subscribers to terminate
- Added per-run drop counters (_dropped_counts) with dropped_count() and
dropped_total properties for observability
- cleanup() and close() now clear drop counters
- publish() logs total dropped count per run for easier debugging
Tests:
- test_end_sentinel_delivered_when_queue_full: verifies END arrives even
with a completely full queue
- test_end_sentinel_evicts_oldest_events: verifies eviction behavior
- test_end_sentinel_no_eviction_when_space_available: no side effects
when queue has room
- test_concurrent_tasks_end_sentinel: 4 concurrent producer/consumer
pairs all terminate properly
- test_dropped_count_tracking, test_dropped_total,
test_cleanup_clears_dropped_counts, test_close_clears_dropped_counts:
drop counter coverage
Closes#1689
Co-authored-by: voidborne-d <voidborne-d@users.noreply.github.com>
* fix: use SystemMessage+HumanMessage for follow-up question generation (fixes#1697)
Some models (e.g. MiniMax-M2.7) require the system prompt and user
content to be passed as separate message objects rather than a single
combined string. Invoking with a plain string sends everything as a
HumanMessage, which causes these models to ignore the generation
instructions and fail to produce valid follow-up questions.
* test: verify model is invoked with SystemMessage and HumanMessage
* Add explicit save action for agent creation
* Hide internal save prompts and retry agent reads
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
The format_memory_for_injection function only processed recentMonths and
earlierContext from the history section, silently dropping longTermBackground.
The LLM writes longTermBackground correctly and it persists to memory.json,
but it was never injected into the system prompt — making the user's
long-term background invisible to the AI.
Add the missing field handling and a regression test.
Co-authored-by: ppyt <14163465+ppyt@users.noreply.github.com>
* feat: add docs site
- Implemented dynamic routing for MDX documentation pages with language support.
- Created layout components for documentation with a header and footer.
- Added metadata for various documentation sections in English and Chinese.
- Developed initial content for the DeerFlow App and Harness documentation.
- Introduced i18n hooks and translations for English and Chinese languages.
- Enhanced header component to include navigation links for documentation and blog.
- Established a structure for tutorials and reference materials.
- Created a new translations file to manage locale-specific strings.
* feat: enhance documentation structure and content for application and harness sections
* feat: update .gitignore to include .playwright-mcp and remove obsolete Playwright YAML file
* fix(docs): correct punctuation and formatting in documentation files
* feat(docs): remove outdated index.mdx file from documentation
* fix(docs): update documentation links and improve Chinese description in index.mdx
* fix(docs): update title in Chinese for meta information in _meta.ts
* fix(frontend): add missing rel="noopener noreferrer" to target="_blank" links
Prevent tabnabbing attacks and referrer leakage by ensuring all
external links with target="_blank" include both noopener and
noreferrer in the rel attribute.
Made-with: Cursor
* style: fix code formatting
* fix(sandbox): URL路径被误判为不安全绝对路径 (#1385)
在本地沙箱模式下,bash工具对命令做绝对路径安全校验时,会把curl命令中的
HTTPS URL(如 https://example.com/api/v1/check)误识别为本地绝对路径并拦截。
根因:_ABSOLUTE_PATH_PATTERN 正则的负向后行断言 (?<![:\w]) 只排除了冒号和
单词字符,但 :// 中第二个斜杠前面是第一个斜杠(/),不在排除列表中,导致
//example.com/api/... 被匹配为绝对路径 /example.com/api/...。
修复:在负向后行断言中增加斜杠字符,改为 (?<![:\w/]),使得 :// 中的连续
斜杠不会触发绝对路径匹配。同时补充了URL相关的单元测试用例。
Signed-off-by: moose-lab <moose-lab@users.noreply.github.com>
* fix(sandbox): refine absolute path regex to preserve file:// defense-in-depth
Change lookbehind from (?<![:\w/]) to (?<![:\w])(?<!:/) so only the
second slash in :// sequences is excluded. This keeps URL paths from
false-positiving while still letting the regex detect /etc/passwd in
file:///etc/passwd. Also add explicit file:// URL blocking and tests.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Signed-off-by: moose-lab <moose-lab@users.noreply.github.com>
Co-authored-by: moose-lab <moose-lab@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix: prevent concurrent subagent file write conflicts
Serialize same-path str_replace operations in sandbox tools
Guard AioSandbox write_file/update_file with the existing sandbox lock
Add regression tests for concurrent str_replace and append races
Verify with backend full tests and ruff lint checks
* fix(sandbox): Fix the concurrency issue of file operations on the same path in isolated sandboxes.
Ensure that different sandbox instances use independent locks for file operations on the same virtual path to avoid concurrency conflicts. Change the lock key from a single path to a composite key of (sandbox.id, path), and add tests to verify the concurrent safety of isolated sandboxes.
* feat(sandbox): Extract file operation lock logic to standalone module and fix concurrency issues
Extract file operation lock related logic from tools.py into a separate file_operation_lock.py module.
Fix data race issues during concurrent str_replace and write_file operations.
* feat(agent): 为AgentConfig添加skills字段并更新lead_agent系统提示
在AgentConfig中添加skills字段以支持配置agent可用技能
更新lead_agent的系统提示模板以包含可用技能信息
* fix: resolve agent skill configuration edge cases and add tests
* Update backend/packages/harness/deerflow/agents/lead_agent/prompt.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* refactor(agent): address PR review comments for skills configuration
- Add detailed docstring to `skills` field in `AgentConfig` to clarify the semantics of `None` vs `[]`.
- Add unit tests in `test_custom_agent.py` to verify `load_agent_config()` correctly parses omitted skills and explicit empty lists.
- Fix `test_make_lead_agent_empty_skills_passed_correctly` to include `agent_name` in the runtime config, ensuring it exercises the real code path.
* docs: 添加关于按代理过滤技能的配置说明
在配置示例文件和文档中添加说明,解释如何通过代理的config.yaml文件限制加载的技能
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* feat(sandbox): truncate oversized bash and read_file tool outputs
Long tool outputs (large directory listings, multi-MB source files) can
overflow the model's context window. Two new configurable limits:
- bash_output_max_chars (default 20000): middle-truncates bash output,
preserving both head and tail so stderr at the end is not lost
- read_file_output_max_chars (default 50000): head-truncates file output
with a hint to use start_line/end_line for targeted reads
Both limits are enforced at the tool layer (sandbox/tools.py) rather
than middleware, so truncation is guaranteed regardless of call path.
Setting either limit to 0 disables truncation entirely.
Measured: read_file on a 250KB source file drops from 63,698 tokens to
19,927 tokens (69% reduction) with the default limit.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(tests): remove unused pytest import and fix import sort order
* style: apply ruff format to sandbox/tools.py
* refactor(sandbox): address Copilot review feedback on truncation feature
- strict hard cap: while-loop ensures result (including marker) ≤ max_chars
- max_chars=0 now returns "" instead of original output
- get_app_config() wrapped in try/except with fallback to defaults
- sandbox_config.py: add ge=0 validation on truncation limit fields
- config.example.yaml: bump config_version 4→5
- tests: add len(result) <= max_chars assertions, edge-case (max=0, small
max, various sizes) tests; fix skipped-count test for strict hard cap
* refactor(sandbox): replace while-loop truncation with fixed marker budget
Use a pre-allocated constant (_MARKER_MAX_LEN) instead of a convergence
loop to ensure result <= max_chars. Simpler, safer, and skipped-char
count in the marker is now an exact predictable value.
* refactor(sandbox): compute marker budget dynamically instead of hardcoding
* fix(sandbox): make max_chars=0 disable truncation instead of returning empty string
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: JeffJiang <for-eleven@hotmail.com>
Feishu channel classified any slash-prefixed text (including absolute
paths such as /mnt/user-data/...) as a COMMAND, causing them to be
misrouted through the command pipeline instead of the chat pipeline.
Fix by introducing a shared KNOWN_CHANNEL_COMMANDS frozenset in
app/channels/commands.py — the single authoritative source for the set
of supported slash commands. Both the Feishu inbound parser and the
ChannelManager's unknown-command reply now derive from it, so adding
or removing a command requires only one edit.
Changes:
- app/channels/commands.py (new): defines KNOWN_CHANNEL_COMMANDS
- app/channels/feishu.py: replace local KNOWN_FEISHU_COMMANDS with the
shared constant; _is_feishu_command() now gates on it
- app/channels/manager.py: import KNOWN_CHANNEL_COMMANDS and use it in
the unknown-command fallback reply so the displayed list stays in sync
- tests/test_feishu_parser.py: parametrize over every entry in
KNOWN_CHANNEL_COMMANDS (each must yield msg_type=command) and add
parametrized chat cases for /unknown, absolute paths, etc.
Made with Cursor
Made-with: Cursor
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix(gateway): prevent 400 error when client sends context with configurable
Fixes#1290
LangGraph >= 0.6.0 rejects requests that include both 'configurable' and
'context' in the run config. If the client (e.g. useStream hook) sends
a 'context' key, we now honour it and skip creating our own
'configurable' dict to avoid the conflict.
When no 'context' is provided, we fall back to the existing
'configurable' behaviour with thread_id.
* fix(gateway): address review feedback — warn on dual keys, fix runtime injection, add tests
- Log a warning when client sends both 'context' and 'configurable' so
it's no longer silently dropped (reviewer feedback)
- Ensure thread_id is available in config['context'] when present so
middlewares can find it there too
- Add test coverage for the context path, the both-keys-present case,
passthrough of other keys, and the no-config fallback
* style: ruff format services.py
---------
Co-authored-by: JasonOA888 <JasonOA888@users.noreply.github.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* windows check and dev fixes
* fix windows startup scripts
* fix windows startup scripts
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
The langgraph-compat layer dropped the DeerFlow-specific `context` field
from run requests, causing agent config (subagent_enabled, is_plan_mode,
thinking_enabled, etc.) to fall back to defaults. Add `context` to
RunCreateRequest and merge allowlisted keys into config.configurable in
start_run, with existing configurable values taking precedence.
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor: replace sync requests with async httpx in Jina AI client
Replace synchronous `requests.post()` with `httpx.AsyncClient` in
JinaClient.crawl() and make web_fetch_tool async. This is part of the
planned async concurrency optimization for the agent hot path
(see docs/TODO.md).
* fix: address Copilot review feedback on async Jina client
- Short-circuit error strings in web_fetch_tool before passing to
ReadabilityExtractor, preventing misleading extraction results
- Log missing JINA_API_KEY warning only once per process to reduce
noise under concurrent async fetching
- Use logger.exception instead of logger.error in crawl exception
handler to preserve stack traces for debugging
- Add async web_fetch_tool tests and warn-once coverage
* fix: mock get_app_config in web_fetch_tool tests for CI
The web_fetch_tool tests failed in CI because get_app_config requires
a config.yaml file that isn't present in the test environment. Mock
the config loader to remove the filesystem dependency.
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
#1623 added this flag to both Docker Compose files but missed the
backend Makefile used by `make dev`. Without it `langgraph dev`
defaults to n_jobs_per_worker=1, so all conversation runs are
serialised and concurrent requests block.
This mirrors the Docker configuration.
Docker's -v host:container syntax is ambiguous for Windows drive-letter
paths (e.g. D:/...) because ':' is both the drive separator and the
volume separator, causing mount failures on Windows hosts.
Introduce _format_container_mount() which uses '--mount type=bind,...'
for Docker (unambiguous on all platforms) and keeps '-v' for Apple
Container runtime which does not support the --mount flag yet.
Adds unit tests covering Windows paths, read-only mounts, and Apple
Container pass-through.
Made-with: Cursor
* fix(gateway): forward assistant_id as agent_name in build_run_config
Fixes#1644
When the LangGraph Platform-compatible /runs endpoint receives a custom
assistant_id (e.g. 'finalis'), the Gateway's build_run_config() silently
ignored it — configurable['agent_name'] was never set, so make_lead_agent
fell through to the default lead agent and SOUL.md was never loaded.
Root cause (introduced in #1403):
resolve_agent_factory() correctly falls back to make_lead_agent for all
assistant_id values, but build_run_config() had no assistant_id parameter
and never injected configurable['agent_name']. The full call chain:
POST /runs (assistant_id='finalis')
→ resolve_agent_factory('finalis') # returns make_lead_agent ✓
→ build_run_config(thread_id, ...) # no agent_name injected ✗
→ make_lead_agent(config)
→ cfg.get('agent_name') → None
→ load_agent_soul(None) → base SOUL.md (doesn't exist) → None
Fix:
- Add keyword-only parameter to build_run_config().
- When assistant_id is set and differs from 'lead_agent', inject it as
configurable['agent_name'] (matching the channel manager's existing
_resolve_run_params() logic for IM channels).
- Honour an explicit configurable['agent_name'] in the request body;
assistant_id mapping only fills the gap when it is absent.
- Remove stale log-only branch from resolve_agent_factory(); update
docstring to explain the factory/configurable split.
Tests added (test_gateway_services.py):
- Custom assistant_id injects configurable['agent_name']
- 'lead_agent' assistant_id does NOT inject agent_name
- None assistant_id does NOT inject agent_name
- Explicit configurable['agent_name'] in request is not overwritten
- resolve_agent_factory returns make_lead_agent for all inputs
* style: format with ruff
* fix: validate and normalize assistant_id to prevent path traversal
Addresses Copilot review: strip/lowercase/replace underscores and
reject names that don't match [a-z0-9-]+, consistent with
ChannelManager._normalize_custom_agent_name().
---------
Co-authored-by: voidborne-d <voidborne-d@users.noreply.github.com>
* fix(sandbox): serialize concurrent exec_command calls in AioSandbox
The AIO sandbox container maintains a single persistent shell session
that corrupts when multiple exec_command requests arrive concurrently
(e.g. when ToolNode issues parallel tool_calls). The corrupted session
returns 'ErrorObservation' strings as output, cascading into subsequent
commands.
Add a threading.Lock to AioSandbox to serialize shell commands. As a
secondary defense, detect ErrorObservation in output and retry with a
fresh session ID.
Fixes#1433
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(sandbox): address Copilot review findings
- Fix shell injection in list_dir: use shlex.quote(path) to escape
user-provided paths in the find command
- Narrow ErrorObservation retry condition from broad substring match
to the specific corruption signature to prevent false retries
- Improve test_lock_prevents_concurrent_execution: use threading.Barrier
to ensure all workers contend for the lock simultaneously
- Improve test_list_dir_uses_lock: assert lock.locked() is True during
exec_command to verify lock acquisition
* style: auto-format with ruff
---------
Co-authored-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
view_image_tool.py had a top-level import of deerflow.sandbox.tools, which
created a circular dependency chain:
sandbox.tools
-> deerflow.agents.thread_state (triggers agents/__init__.py)
-> agents/factory.py
-> tools/builtins/__init__.py
-> view_image_tool.py
-> deerflow.sandbox.tools <-- circular!
This caused ImportError when any test directly imported sandbox.tools,
making test_sandbox_tools_security.py fail to collect since #1522.
Fix: move the sandbox.tools import inside the view_image_tool function body.
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(frontend): distinguish CORS errors from generic name check failures
* fix(frontend): improve network error message for agent name check
* Fix network error message in zh-CN locale
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
The `environment` section in docker-compose.yaml set
`LANGSMITH_TRACING=${LANGSMITH_TRACING:-false}`, which always resolves
to `false` because Docker Compose evaluates `${}` substitutions from
the host shell environment, not from `env_file`.
Since `environment` entries take precedence over `env_file`, setting
`LANGSMITH_TRACING=true` in `.env` had no effect — tracing stayed
disabled despite following the documented instructions.
Remove the explicit `LANGSMITH_TRACING` from `environment` so the
value from `.env` (loaded via `env_file`) is used as intended.
The dev Docker Compose uses named volumes (langgraph-venv, gateway-venv)
to persist .venv across container restarts. Docker only populates named
volumes from the image on first creation — subsequent rebuilds do NOT
refresh existing volume contents.
When new dependencies are added to packages/harness/pyproject.toml
(e.g. langchain-anthropic), the stale named volume still contains
the old .venv missing the new packages, causing ModuleNotFoundError
at runtime.
Add `uv sync` before launching both gateway and langgraph services.
When deps are already satisfied this is a no-op (~1s), but when the
volume is stale it installs missing packages before the service starts.
Fixes#1624
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
`langgraph dev` defaults `n_jobs_per_worker` to 1 when the flag is not
explicitly passed (see langgraph_api/cli.py), even though the
`N_JOBS_PER_WORKER` env-var default is 10.
This causes the LangGraph server to run with a single background worker,
meaning all conversation runs are processed serially. When one run is
busy (e.g. summarization or long tool-calling chains), all other threads
are blocked until it finishes.
Add `--n-jobs-per-worker 10` to both production and dev Docker Compose
files to match the intended default concurrency.
* feat(gateway): implement LangGraph Platform API in Gateway, replace langgraph-cli
Implement all core LangGraph Platform API endpoints in the Gateway,
allowing it to fully replace the langgraph-cli dev server for local
development. This eliminates a heavyweight dependency and simplifies
the development stack.
Changes:
- Add runs lifecycle endpoints (create, stream, wait, cancel, join)
- Add threads CRUD and search endpoints
- Add assistants compatibility endpoints (search, get, graph, schemas)
- Add StreamBridge (in-memory pub/sub for SSE) and async provider
- Add RunManager with atomic create_or_reject (eliminates TOCTOU race)
- Add worker with interrupt/rollback cancel actions and runtime context injection
- Route /api/langgraph/* to Gateway in nginx config
- Skip langgraph-cli startup by default (SKIP_LANGGRAPH_SERVER=0 to restore)
- Add unit tests for RunManager, SSE format, and StreamBridge
* fix: drain bridge queue on client disconnect to prevent backpressure
When on_disconnect=continue, keep consuming events from the bridge
without yielding, so the worker is not blocked by a full queue.
Only on_disconnect=cancel breaks out immediately.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fix: remove pytest import
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fix: Fix default stream_mode to ["values", "messages-tuple"]
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fix: Remove unused if_exists field from ThreadCreateRequest
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fix: address review comments on gateway LangGraph API
- Mount runs.py router in app.py (missing include_router)
- Normalize interrupt_before/after "*" to node list before run_agent()
- Use entry.id for SSE event ID instead of counter
- Drain bridge queue on disconnect when on_disconnect=continue
- Reuse serialization helper in wait_run() for consistent wire format
- Reject unsupported multitask_strategy with 400
- Remove SKIP_LANGGRAPH_SERVER fallback, always use Gateway
* feat: extract app.state access into deps.py
Encapsulate read/write operations for singleton objects (RunManager,
StreamBridge, checkpointer) held in app.state into a shared utility,
reducing repeated access patterns across router modules.
* feat: extract deerflow.runtime.serialization module with tests
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor: replace duplicated serialization with deerflow.runtime.serialization
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: extract app/gateway/services.py with run lifecycle logic
Create a service layer that centralizes SSE formatting, input/config
normalization, and run lifecycle management. Router modules will delegate
to these functions instead of using private cross-imported helpers.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor: wire routers to use services layer, remove cross-module private imports
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* style: apply ruff formatting to refactored files
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(runtime): support LangGraph dev server and add compat route
- Enable official LangGraph dev server for local development workflow
- Decouple runtime components from agents package for better separation
- Provide gateway-backed fallback route when dev server is skipped
- Simplify lifecycle management using context manager in gateway
* feat(runtime): add Store providers with auto-backend selection
- Add async_provider.py and provider.py under deerflow/runtime/store/
- Support memory, sqlite, postgres backends matching checkpointer config
- Integrate into FastAPI lifespan via AsyncExitStack in deps.py
- Replace hardcoded InMemoryStore with config-driven factory
* refactor(gateway): migrate thread management from checkpointer to Store and resolve multiple endpoint failures
- Add Store-backed CRUD helpers (_store_get, _store_put, _store_upsert)
- Replace checkpoint-scanning search with two-phase strategy:
phase 1 reads Store (O(threads)), phase 2 backfills from checkpointer
for legacy/LangGraph Server threads with lazy migration
- Extend Store record schema with values field for title persistence
- Sync thread title from checkpoint to Store after run completion
- Fix /threads/{id}/runs/{run_id}/stream 405 by accepting both
GET and POST methods; POST handles interrupt/rollback actions
- Fix /threads/{id}/state 500 by separating read_config and
write_config, adding checkpoint_ns to configurable, and
shallow-copying checkpoint/metadata before mutation
- Sync title to Store on state update for immediate search reflection
- Move _upsert_thread_in_store into services.py, remove duplicate logic
- Add _sync_thread_title_after_run: await run task, read final
checkpoint title, write back to Store record
- Spawn title sync as background task from start_run when Store exists
* refactor(runtime): deduplicate store and checkpointer provider logic
Extract _ensure_sqlite_parent_dir() helper into checkpointer/provider.py
and use it in all three places that previously inlined the same mkdir logic.
Consolidate duplicate error constants in store/async_provider.py by importing
from store/provider.py instead of redefining them.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* refactor(runtime): move SQLite helpers to runtime/store, checkpointer imports from store
_resolve_sqlite_conn_str and _ensure_sqlite_parent_dir now live in
runtime/store/provider.py. agents/checkpointer/provider and
agents/checkpointer/async_provider import from there, reversing the
previous dependency direction (store → checkpointer becomes
checkpointer → store).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* refactor(runtime): extract SQLite helpers into runtime/store/_sqlite_utils.py
Move resolve_sqlite_conn_str and ensure_sqlite_parent_dir out of
checkpointer/provider.py into a dedicated _sqlite_utils module.
Functions are now public (no underscore prefix), making cross-module
imports semantically correct. All four provider files import from
the single shared location.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(gateway): use adelete_thread to fully remove thread checkpoints on delete
AsyncSqliteSaver has no adelete method — the previous hasattr check
always evaluated to False, silently leaving all checkpoint rows in the
database. Switch to adelete_thread(thread_id) which deletes every
checkpoint and pending-write row for the thread across all namespaces
(including sub-graph checkpoints).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(gateway): remove dead bridge_cm/ckpt_cm code and fix StrEnum lint
app.py had unreachable code after the async-with lifespan refactor:
bridge_cm and ckpt_cm were referenced but never defined (F821), and
the channel service startup/shutdown was outside the langgraph_runtime
block so it never ran. Move channel service lifecycle inside the
async-with block where it belongs.
Replace str+Enum inheritance in RunStatus and DisconnectMode with
StrEnum as suggested by UP042.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* style: format with ruff
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: JeffJiang <for-eleven@hotmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
The lark-oapi SDK defaults to open.feishu.cn (China), but apps on the
international Lark platform (open.larksuite.com) fail to connect with
error 1000040351 'Incorrect domain name'.
Changes:
- Add 'domain' config option to feishu channel (default: open.feishu.cn)
- Pass domain to both API client and WebSocket client
- Update config.example.yaml and all README files
* fix: promote matched tools from deferred registry after tool_search returns schema
After tool_search returns a tool's full schema, the tool is promoted
(removed from the deferred registry) so DeferredToolFilterMiddleware
stops filtering it from bind_tools on subsequent LLM calls.
Without this, deferred tools are permanently filtered — the LLM gets
the schema from tool_search but can never invoke the tool because
the middleware keeps stripping it.
Fixes#1554
* test: add promote() and tool_search promotion tests
Tests cover:
- promote removes tools from registry
- promote nonexistent/empty is no-op
- search returns nothing after promote
- middleware passes promoted tools through
- tool_search auto-promotes matched tools (select + keyword)
* fix: address review — lint blank line + empty registry guard
- Add missing blank line between FakeRequest methods (E301)
- Use 'if not registry' to handle empty registries consistently
---------
Co-authored-by: d 🔹 <258577966+voidborne-d@users.noreply.github.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix(config): correct MiniMax M2.7 highspeed model name and add thinking support
- Rename minimax-m2.5-highspeed to minimax-m2.7-highspeed for CN region
- Add supports_thinking: true for both M2.7 and M2.7-highspeed models
* Add supports_thinking option to config examples
Added support_thinking configuration option in examples.
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix(dev): exclude sandbox dirs from gateway hot-reload watcher
The dev-mode gateway uses --reload which watches for file changes.
Sandbox containers mount the repo and write .pyc/__pycache__ during
execution, triggering spurious gateway restarts mid-request.
Add --reload-exclude for .pyc, __pycache__, and sandbox/ paths so
only actual source changes trigger a reload.
Fixes#1513
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: JasonOA888 <JasonOA888@users.noreply.github.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* feat(sandbox): add SandboxAuditMiddleware for bash command security auditing
Addresses the LocalSandbox escape vector reported in #1224 where bash tool
calls can execute destructive commands against the host filesystem.
- Add SandboxAuditMiddleware with three-tier command classification:
- High-risk (block): rm -rf /, curl|bash, dd if=, mkfs, /etc/shadow access
- Medium-risk (warn): pip install, apt install, chmod 777
- Safe (pass): normal workspace operations
- Register middleware after GuardrailMiddleware in _build_runtime_middlewares,
applied to both lead agent and subagents
- Structured audit log via standard logger (visible in langgraph.log)
- Medium-risk commands execute but append a warning to the tool result,
allowing the LLM to self-correct without blocking legitimate workflows
- High-risk commands return an error ToolMessage without calling the handler,
so the agent loop continues gracefully
* fix(lint): sort imports in test_sandbox_audit_middleware
* refactor(sandbox-audit): address Copilot review feedback (3/5/6)
- Fix class docstring to match implementation: medium-risk commands are
executed with a warning appended (not rejected), and cwd anchoring note
removed (handled in a separate PR)
- Remove capsys.disabled() from benchmark test to avoid CI log noise;
keep assertions for recall/precision targets
- Remove misleading 'cwd fix' from test module docstring
* test(sandbox-audit): add async tests for awrap_tool_call
* fix(sandbox-audit): address Copilot review feedback (1/2)
- Narrow rm high-risk regex to only block truly destructive targets
(/, /*, ~, ~/*, /home, /root); legitimate workspace paths like
/mnt/user-data/ are no longer false-positived
- Handle list-typed ToolMessage content in _append_warn_to_result;
append a text block instead of str()-ing the list to avoid breaking
structured content normalization
* style: apply ruff format to sandbox_audit_middleware files
* fix(sandbox-audit): update benchmark comment to match assert-based implementation
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix(task_tool): fallback to configurable thread_id when context is missing
task_tool only read thread_id from runtime.context, but when invoked
via LangGraph Server, thread_id lives in config.configurable instead.
Add the same fallback that ThreadDataMiddleware uses (PR #1237).
Fixes subagent execution failure: 'Thread ID is required in runtime
context or config.configurable'
* remove debug logging from task_tool
* fix(sandbox): anchor relative paths to thread workspace in local mode
In local sandbox mode, bash commands using relative paths were resolved
against the langgraph server process cwd (backend/) instead of the
per-thread workspace directory. This allowed relative-path writes to
escape the thread isolation boundary.
Root cause: validate_local_bash_command_paths and
replace_virtual_paths_in_command only process absolute paths (scanning
for '/' prefix). Relative paths pass through untouched and inherit the
process cwd at subprocess.run time.
Fix: after virtual path translation, prepend `cd {workspace} &&` to
anchor the shell's cwd to the thread-isolated workspace directory before
execution. shlex.quote() ensures paths with spaces or special characters
are handled safely.
This mirrors the approach used by OpenHands (fixed cwd at execution
layer) and is the correct fix for local mode where each subprocess.run
is an independent process with no persistent shell session.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* refactor(sandbox): extract _apply_cwd_prefix and add unit tests
Extract the workspace cd-prefix logic from bash_tool into a dedicated
_apply_cwd_prefix() helper so it can be unit-tested in isolation.
Add four tests covering: normal prefix, no thread_data, missing
workspace_path, and paths with spaces (shlex.quote).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* revert: remove unrelated configurable thread_id fallback from sandbox/tools.py
This change belongs in a separate PR.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* style: remove trailing whitespace in test_sandbox_tools_security
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* Fix path for TitleMiddleware implementation
* Fix link to Provisioner Setup Guide in CONFIGURATION.md
* Update file path for TitleMiddleware implementation
* Update image paths in Leica photography article
* fix: add Windows shell fallback for local sandbox
* fix: handle PowerShell execution on Windows
* fix: handle Windows local shell execution
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix(security): disable host bash by default in local sandbox
* fix(security): address review feedback for local bash hardening
* fix(ci): sort live test imports for lint
* style: apply backend formatter
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Added explicit timeout and retry configurations to `config.example.yaml` to help users properly configure their model connections.
Since different LangChain provider classes use different parameter names, this update maps the correct arguments for each:
- ChatOpenAI (OpenAI, MiniMax, Novita, OpenRouter): added `request_timeout` and `max_retries`
- ChatAnthropic (Claude): added `default_request_timeout` and `max_retries`
- ChatGoogleGenerativeAI (Gemini): added `timeout` and `max_retries`
- PatchedChatDeepSeek (Doubao, DeepSeek, Kimi): added `timeout` and `max_retries`
Default example values are set to 600.0 seconds for timeouts and 2 for max retries.
* fix(sandbox): fall back to config.configurable for thread_id in lazy sandbox init
LangGraph Server injects thread_id via config["configurable"]["thread_id"],
not always via context["thread_id"]. Without the fallback, lazy sandbox
acquisition fails when context is empty.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(sandbox): align configurable fallback style with task_tool.py
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(sandbox): guard runtime.config None check for thread_id fallback
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(client): support custom middleware injection
Add support for custom middleware, allowing custom middleware list to be passed when initializing DeerFlowClient. These middleware will be injected after the default middleware when creating the agent, extending the agent's functionality.
* feat: inject custom middlewares before ClarificationMiddleware to preserve ordering
- Add `custom_middlewares` param to `_build_middlewares`
- Inject custom middlewares right before `ClarificationMiddleware` to keep it as the last in the chain
- Remove unsafe `.extend()` in `client.py`
- Update tests in `test_client.py` and `test_lead_agent_model_resolution.py` to assert correct injection ordering
* fix(task_tool): fallback to configurable thread_id when context is missing
task_tool only read thread_id from runtime.context, but when invoked
via LangGraph Server, thread_id lives in config.configurable instead.
Add the same fallback that ThreadDataMiddleware uses (PR #1237).
Fixes subagent execution failure: 'Thread ID is required in runtime
context or config.configurable'
* remove debug logging from task_tool
* fix(oauth): inject billing header for non-Haiku model access
The Anthropic Messages API requires a billing identification block
in the system prompt when using Claude Code OAuth tokens (sk-ant-oat*)
to access non-Haiku models (Opus, Sonnet). Without it, the API returns
a generic 400 "Error" with no actionable detail.
This was discovered by intercepting Claude Code CLI requests — the CLI
injects an `x-anthropic-billing-header` text block as the first system
prompt entry on every request. Third-party consumers of the same OAuth
tokens must do the same.
Changes:
- Add `_apply_oauth_billing()` to `ClaudeChatModel` that prepends the
billing header block to the system prompt when `_is_oauth` is True
- Add `metadata.user_id` with device/session identifiers (required by
the API alongside the billing header)
- Called from `_get_request_payload()` before prompt caching runs
Verified with Claude Max OAuth tokens against all three model tiers:
- claude-opus-4-6: 200 OK
- claude-sonnet-4-6: 200 OK
- claude-haiku-4-5-20251001: 200 OK (was already working)
Closes#1245
* fix(oauth): address review feedback on billing header injection
- Make OAUTH_BILLING_HEADER configurable via ANTHROPIC_BILLING_HEADER env var
- Normalize billing block to always be first in system list (strip + reinsert)
- Guard metadata with isinstance check for non-dict values
- Replace os.uname() with socket.gethostname() for Windows compat
- Fix docstrings to say "all OAuth requests" instead of "non-Haiku"
- Move inline imports to module level (fixes ruff I001)
- Add 9 unit tests for _apply_oauth_billing
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Replace all bare print() calls with proper logging using Python's
standard logging module across the deerflow harness package.
Changes across 8 files (16 print statements replaced):
- agents/middlewares/clarification_middleware.py: use logger.info/debug
- agents/middlewares/memory_middleware.py: use logger.debug
- agents/middlewares/thread_data_middleware.py: use logger.debug
- agents/middlewares/view_image_middleware.py: use logger.debug
- agents/memory/queue.py: use logger.info/debug/warning/error
- agents/lead_agent/prompt.py: use logger.error
- skills/loader.py: use logger.warning
- skills/parser.py: use logger.error
Each file follows the established codebase convention:
import logging
logger = logging.getLogger(__name__)
Log levels chosen based on message semantics:
- debug: routine operational details (directory creation, timer resets)
- info: significant state changes (memory queued, updates processed)
- warning: recoverable issues (config load failures, skipped updates)
- error: unexpected failures (parsing errors, memory update errors)
Note: client.py is intentionally excluded as it uses print() for
CLI output, which is the correct behavior for a command-line client.
Co-authored-by: moose-lab <moose-lab@users.noreply.github.com>
* test: add unit tests for skill frontmatter validation
Cover _validate_skill_frontmatter logic:
- Valid minimal and full-field skills
- Missing SKILL.md, missing frontmatter, invalid YAML
- Required field validation (name, description)
- Unexpected key rejection
- Name format: hyphen-case, no leading/trailing/consecutive hyphens
- Name and description length limits
- Angle bracket rejection in description
* test: fix unused variables flagged by ruff F841
Replace unused tuple elements with _ and add assertions on
msg/name return values in success-path tests.
* test: address review feedback on unused variables
* test: consolidate validation tests into single module
Move the UTF-8/windows-locale test from test_skills_router.py into
test_skills_validation.py and remove test_skills_router.py to eliminate
duplicated assertions and future maintenance drift.
* fix: match assertion strings to actual validation messages
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Allow per-agent environment variables to be declared in config.yaml under
acp_agents.<name>.env. Values prefixed with $ are resolved from the host
environment at invocation time, consistent with other config fields.
Passes None to spawn_agent_process when env is empty so the subprocess
inherits the parent environment unchanged.
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
- Fix OpenAI BadRequestError: 'No images have been viewed.' was returned as
a plain string array instead of a properly formatted content block
- The OpenAI API expects message content to be either a string or an array
of objects with 'type' field, not an array of plain strings
- Changed return from ['No images have been viewed.'] to
[{'type': 'text', 'text': 'No images have been viewed.'}]
Fixes#1441
Co-authored-by: JasonOA888 <noreply@github.com>
Add LangSmith tracing setup instructions across the project:
- .env.example: add LANGSMITH_* env vars (commented out)
- README.md + translations (zh/ja/fr/ru): add LangSmith Tracing section
under Advanced with setup steps and env var reference
- backend/README.md: add detailed LangSmith Tracing section with setup,
env var table, how-it-works explanation, and Docker notes
- docker-compose.yaml: update LANGCHAIN_TRACING_V2 to LANGSMITH_TRACING
for naming consistency with the rest of the project
Made-with: Cursor
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
The channels config section uses localhost URLs by default, which don't
work inside Docker containers. Add inline comments showing the Docker
service names (langgraph, gateway) that match the docker-compose service
definitions.
Fixes#1421
Co-authored-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix: add build-arg support for proxies and mirrors in Docker builds (#1260)
Pin Debian images to bookworm, make UV source image configurable,
and pass APT_MIRROR/NPM_REGISTRY/UV_IMAGE through docker-compose.
* fix: ensure build args use consistent defaults across compose and Dockerfiles
UV_IMAGE: ${UV_IMAGE:-} resolved to empty when unset, overriding the
Dockerfile ARG default and breaking `FROM ${UV_IMAGE}`. Also configure
COREPACK_NPM_REGISTRY before pnpm download and propagate NPM_REGISTRY
into the prod stage.
* fix: dearmor NodeSource GPG key to resolve signing error
Pipe the downloaded key through gpg --dearmor so apt can verify
the repository signature (fixes NO_PUBKEY 2F59B5F99B1BE0B4).
---------
Co-authored-by: JeffJiang <for-eleven@hotmail.com>
* feat: Add github PAT configs, allowing larger github API rates.
* Update comment to English for better clarity
* fix: Remove unused config lines in config.example.yaml and unreferenced declarations in app_config. Fix lint issues and update documentation.
* fix: Remove unused imports, and passed the ruff check.
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix: use create_chat_model for summarization alias
* fix: remove unused radix Icon import from suggestion
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* Fix Windows backend test compatibility
* Preserve ACP path style on Windows
* Fix installer import ordering
* Address review comments for Windows fixes
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix(LLM): fixing Gemini thinking + tool calls via OpenAI gateway (#1180)
When using Gemini with thinking enabled through an OpenAI-compatible gateway,
the API requires that fields on thinking content blocks are
preserved and echoed back verbatim in subsequent requests. Standard
silently drops these signatures when serializing
messages, causing HTTP 400 errors:
Changes:
- Add PatchedChatOpenAI adapter that re-injects signed thinking blocks into
request payloads, preserving the signature chain across multi-turn
conversations with tool calls.
- Support two LangChain storage patterns: additional_kwargs.thinking_blocks
and content list.
- Add 11 unit tests covering signed/unsigned blocks, storage patterns, edge
cases, and precedence rules.
- Update config.example.yaml with Gemini + thinking gateway example.
- Update CONFIGURATION.md with detailed guidance and error explanation.
Fixes: #1180
* Updated the patched_openai.py with thought_signature of function call
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* docs: fix inaccurate thought_signature description in CONFIGURATION.md (#1220)
* Initial plan
* docs: fix CONFIGURATION.md wording for thought_signature - tool-call objects, not thinking blocks
Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com>
Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/360f5226-4631-48a7-a050-189094af8ffe
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com>
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
* refactor: extract shared utils to break harness→app cross-layer imports
Move _validate_skill_frontmatter to src/skills/validation.py and
CONVERTIBLE_EXTENSIONS + convert_file_to_markdown to src/utils/file_conversion.py.
This eliminates the two reverse dependencies from client.py (harness layer)
into gateway/routers/ (app layer), preparing for the harness/app package split.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor: split backend/src into harness (deerflow.*) and app (app.*)
Physically split the monolithic backend/src/ package into two layers:
- **Harness** (`packages/harness/deerflow/`): publishable agent framework
package with import prefix `deerflow.*`. Contains agents, sandbox, tools,
models, MCP, skills, config, and all core infrastructure.
- **App** (`app/`): unpublished application code with import prefix `app.*`.
Contains gateway (FastAPI REST API) and channels (IM integrations).
Key changes:
- Move 13 harness modules to packages/harness/deerflow/ via git mv
- Move gateway + channels to app/ via git mv
- Rename all imports: src.* → deerflow.* (harness) / app.* (app layer)
- Set up uv workspace with deerflow-harness as workspace member
- Update langgraph.json, config.example.yaml, all scripts, Docker files
- Add build-system (hatchling) to harness pyproject.toml
- Add PYTHONPATH=. to gateway startup commands for app.* resolution
- Update ruff.toml with known-first-party for import sorting
- Update all documentation to reflect new directory structure
Boundary rule enforced: harness code never imports from app.
All 429 tests pass. Lint clean.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: add harness→app boundary check test and update docs
Add test_harness_boundary.py that scans all Python files in
packages/harness/deerflow/ and fails if any `from app.*` or
`import app.*` statement is found. This enforces the architectural
rule that the harness layer never depends on the app layer.
Update CLAUDE.md to document the harness/app split architecture,
import conventions, and the boundary enforcement test.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add config versioning with auto-upgrade on startup
When config.example.yaml schema changes, developers' local config.yaml
files can silently become outdated. This adds a config_version field and
auto-upgrade mechanism so breaking changes (like src.* → deerflow.*
renames) are applied automatically before services start.
- Add config_version: 1 to config.example.yaml
- Add startup version check warning in AppConfig.from_file()
- Add scripts/config-upgrade.sh with migration registry for value replacements
- Add `make config-upgrade` target
- Auto-run config-upgrade in serve.sh and start-daemon.sh before starting services
- Add config error hints in service failure messages
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix comments
* fix: update src.* import in test_sandbox_tools_security to deerflow.*
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: handle empty config and search parent dirs for config.example.yaml
Address Copilot review comments on PR #1131:
- Guard against yaml.safe_load() returning None for empty config files
- Search parent directories for config.example.yaml instead of only
looking next to config.yaml, fixing detection in common setups
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: correct skills root path depth and config_version type coercion
- loader.py: fix get_skills_root_path() to use 5 parent levels (was 3)
after harness split, file lives at packages/harness/deerflow/skills/
so parent×3 resolved to backend/packages/harness/ instead of backend/
- app_config.py: coerce config_version to int() before comparison in
_check_config_version() to prevent TypeError when YAML stores value
as string (e.g. config_version: "1")
- tests: add regression tests for both fixes
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: update test imports from src.* to deerflow.*/app.* after harness refactor
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(harness): add tool-first ACP agent invocation (#37)
* feat(harness): add tool-first ACP agent invocation
* build(harness): make ACP dependency required
* fix(harness): address ACP review feedback
* feat(harness): decouple ACP agent workspace from thread data
ACP agents (codex, claude-code) previously used per-thread workspace
directories, causing path resolution complexity and coupling task
execution to DeerFlow's internal thread data layout. This change:
- Replace _resolve_cwd() with a fixed _get_work_dir() that always uses
{base_dir}/acp-workspace/, eliminating virtual path translation and
thread_id lookups
- Introduce /mnt/acp-workspace virtual path for lead agent read-only
access to ACP agent output files (same pattern as /mnt/skills)
- Add security guards: read-only validation, path traversal prevention,
command path allowlisting, and output masking for acp-workspace
- Update system prompt and tool description to guide LLM: send
self-contained tasks to ACP agents, copy results via /mnt/acp-workspace
- Add 11 new security tests for ACP workspace path handling
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor(prompt): inject ACP section only when ACP agents are configured
The ACP agent guidance in the system prompt is now conditionally built
by _build_acp_section(), which checks get_acp_agents() and returns an
empty string when no ACP agents are configured. This avoids polluting
the prompt with irrelevant instructions for users who don't use ACP.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix lint
* fix(harness): address Copilot review comments on sandbox path handling and ACP tool
- local_sandbox: fix path-segment boundary bug in _resolve_path (== or startswith +"/")
and add lookahead in _resolve_paths_in_command regex to prevent /mnt/skills matching
inside /mnt/skills-extra
- local_sandbox_provider: replace print() with logger.warning(..., exc_info=True)
- invoke_acp_agent_tool: guard getattr(option, "optionId") with None default + continue;
move full prompt from INFO to DEBUG level (truncated to 200 chars)
- sandbox/tools: fix _get_acp_workspace_host_path docstring to match implementation;
remove misleading "read-only" language from validate_local_bash_command_paths
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(acp): thread-isolated workspaces, permission guardrail, and ContextVar registry
P1.1 – ACP workspace thread isolation
- Add `Paths.acp_workspace_dir(thread_id)` for per-thread paths
- `_get_work_dir(thread_id)` in invoke_acp_agent_tool now uses
`{base_dir}/threads/{thread_id}/acp-workspace/`; falls back to
global workspace when thread_id is absent or invalid
- `_invoke` extracts thread_id from `RunnableConfig` via
`Annotated[RunnableConfig, InjectedToolArg]`
- `sandbox/tools.py`: `_get_acp_workspace_host_path(thread_id)`,
`_resolve_acp_workspace_path(path, thread_id)`, and all callers
(`replace_virtual_paths_in_command`, `mask_local_paths_in_output`,
`ls_tool`, `read_file_tool`) now resolve ACP paths per-thread
P1.2 – ACP permission guardrail
- New `auto_approve_permissions: bool = False` field in `ACPAgentConfig`
- `_build_permission_response(options, *, auto_approve: bool)` now
defaults to deny; only approves when `auto_approve=True`
- Document field in `config.example.yaml`
P2 – Deferred tool registry race condition
- Replace module-level `_registry` global with `contextvars.ContextVar`
- Each asyncio request context gets its own registry; worker threads
inherit the context automatically via `loop.run_in_executor`
- Expose `get_deferred_registry` / `set_deferred_registry` /
`reset_deferred_registry` helpers
Tests: 831 pass (57 for affected modules, 3 new tests)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(sandbox): mount /mnt/acp-workspace in docker sandbox container
The AioSandboxProvider was not mounting the ACP workspace into the
sandbox container, so /mnt/acp-workspace was inaccessible when the lead
agent tried to read ACP results in docker mode.
Changes:
- `ensure_thread_dirs`: also create `acp-workspace/` (chmod 0o777) so
the directory exists before the sandbox container starts — required
for Docker volume mounts
- `_get_thread_mounts`: add read-only `/mnt/acp-workspace` mount using
the per-thread host path (`host_paths.acp_workspace_dir(thread_id)`)
- Update stale CLAUDE.md description (was "fixed global workspace")
Tests: `test_aio_sandbox_provider.py` (4 new tests)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(lint): remove unused imports in test_aio_sandbox_provider
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix config
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* test: add unit tests for TodoMiddleware
Cover context-loss detection logic:
- _todos_in_messages and _reminder_in_messages helpers
- _format_todos formatting
- Reminder injection when write_todos truncated
- No-op when todos visible or reminder already present
- abefore_model async delegation
* test: fix event loop error in todo middleware async test
Use asyncio.run() instead of get_event_loop().run_until_complete()
to avoid RuntimeError on Python 3.12 where no default event loop
exists in the main thread.
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* test: add unit tests for DanglingToolCallMiddleware
Cover message patching logic for dangling tool calls:
- No-op when all tool calls have responses
- Synthetic ToolMessage insertion at correct positions
- Mixed responded/dangling scenarios
- wrap_model_call and awrap_model_call integration
* test: fix async tests and strengthen override assertions
- Use @pytest.mark.anyio + async def instead of deprecated
asyncio.get_event_loop().run_until_complete() (fixes Py3.12 CI failure)
- Assert that override() receives the correct patched messages kwarg
in both wrap_model_call and awrap_model_call tests
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
- Add null checks for runtime.context in uploads_middleware.py and
sandbox/middleware.py to prevent NPE when langgraph runtime context is None
- Tighten langgraph version constraint from >=1.0.6 to >=1.0.6,<1.0.10
to avoid context=None incompatibility with langgraph-api 0.7.x
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* refactor: extract shared skill installer and upload manager to harness
Move duplicated business logic from Gateway routers and Client into
shared harness modules, eliminating code duplication.
New shared modules:
- deerflow.skills.installer: 6 functions (zip security, extraction, install)
- deerflow.uploads.manager: 7 functions (normalize, deduplicate, validate,
list, delete, get_uploads_dir, ensure_uploads_dir)
Key improvements:
- SkillAlreadyExistsError replaces stringly-typed 409 status routing
- normalize_filename rejects backslash-containing filenames
- Read paths (list/delete) no longer mkdir via get_uploads_dir
- Write paths use ensure_uploads_dir for explicit directory creation
- list_files_in_dir does stat inside scandir context (no re-stat)
- install_skill_from_archive uses single is_file() check (one syscall)
- Fix agent config key not reset on update_mcp_config/update_skill
Tests: 42 new (22 installer + 20 upload manager) + client hardening
* refactor: centralize upload URL construction and clean up installer
- Extract upload_virtual_path(), upload_artifact_url(), enrich_file_listing()
into shared manager.py, eliminating 6 duplicated URL constructions across
Gateway router and Client
- Derive all upload URLs from VIRTUAL_PATH_PREFIX constant instead of
hardcoded "mnt/user-data/uploads" strings
- Eliminate TOCTOU pre-checks and double file read in installer — single
ZipFile() open with exception handling replaces is_file() + is_zipfile()
+ ZipFile() sequence
- Add missing re-exports: ensure_uploads_dir in uploads/__init__.py,
SkillAlreadyExistsError in skills/__init__.py
- Remove redundant .lower() on already-lowercase CONVERTIBLE_EXTENSIONS
- Hoist sandbox_uploads_dir(thread_id) before loop in uploads router
* fix: add input validation for thread_id and filename length
- Reject thread_id containing unsafe filesystem characters (only allow
alphanumeric, hyphens, underscores, dots) — prevents 500 on inputs
like <script> or shell metacharacters
- Reject filenames longer than 255 bytes (OS limit) in normalize_filename
- Gateway upload router maps ValueError to 400 for invalid thread_id
* fix: address PR review — symlink safety, input validation coverage, error ordering
- list_files_in_dir: use follow_symlinks=False to prevent symlink metadata
leakage; check is_dir() instead of exists() for non-directory paths
- install_skill_from_archive: restore is_file() pre-check before extension
validation so error messages match the documented exception contract
- validate_thread_id: move from ensure_uploads_dir to get_uploads_dir so
all entry points (upload/list/delete) are protected
- delete_uploaded_file: catch ValueError from thread_id validation (was 500)
- requires_llm marker: also skip when OPENAI_API_KEY is unset
- e2e fixture: update TitleMiddleware exclusion comment (kept filtering —
middleware triggers extra LLM calls that add non-determinism to tests)
* chore: revert uv.lock to main — no dependency changes in this PR
* fix: use monkeypatch for global config in e2e fixture to prevent test pollution
The e2e_env fixture was calling set_title_config() and
set_summarization_config() directly, which mutated global singletons
without automatic cleanup. When pytest ran test_client_e2e.py before
test_title_middleware_core_logic.py, the leaked enabled=False caused
5 title tests to fail in CI.
Switched to monkeypatch.setattr on the module-level private variables
so pytest restores the originals after each test.
* fix: address code review — URL encoding, API consistency, test isolation
- upload_artifact_url: percent-encode filename to handle spaces/#/?
- deduplicate_filename: mutate seen set in place (caller no longer
needs manual .add() — less error-prone API)
- list_files_in_dir: document that size is int, enrich stringifies
- e2e fixture: monkeypatch _app_config instead of set_app_config()
to prevent global singleton pollution (same pattern as title/summarization fix)
- _make_e2e_config: read LLM connection details from env vars so
external contributors can override defaults
- Update tests to match new deduplicate_filename contract
* docs: rewrite RFC in English and add alternatives/breaking changes sections
* fix: address code review feedback on PR #1202
- Rename deduplicate_filename to claim_unique_filename to make
the in-place set mutation explicit in the function name
- Replace PermissionError with PathTraversalError(ValueError) for
path traversal detection — malformed input is 400, not 403
* fix: set _app_config_is_custom in e2e test fixture to prevent config.yaml lookup in CI
---------
Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: DanielWalnut <45447813+hetaoBackend@users.noreply.github.com>
* feat: add configurable log level and token usage tracking
- Add `log_level` config to control deerflow module log level, synced
to LangGraph Server via serve.sh `--server-log-level`
- Add `token_usage.enabled` config with TokenUsageMiddleware that logs
input/output/total tokens per LLM call from usage_metadata
- Add .omc/ to .gitignore
* fix: use info level for token usage logs since feature has its own toggle
* fix: sort imports to pass lint check
---------
Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
LoopDetectionMiddleware injected SystemMessage mid-conversation to warn
about repetitive tool calls. This crashes Anthropic models because
langchain_anthropic's _format_messages() requires system messages to
appear only at the start of the conversation — interleaved system
messages raise 'Received multiple non-consecutive system messages'.
Switch the warning injection from SystemMessage to HumanMessage, which
works with all providers (Anthropic, OpenAI, Google, etc.).
Fixes#1299
Co-authored-by: voidborne-d <voidborne-d@users.noreply.github.com>
* fix: add Windows compatibility for make dev/start commands
On Windows with MinGW/Git Bash, the Makefile's direct shell script
execution fails with 'CreateProcess(NULL, env bash ...)' error.
This fix:
- Detects Windows via $(OS) == Windows_NT
- Uses explicit bash invocation on Windows
- Falls back to direct execution on Unix
Users need Git Bash installed (comes with Git for Windows).
Fixes#1288
Related #1278
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fix(mcp): implement sync invocation wrapper for async MCP tools
Since DeerFlowClient streams synchronously, invoking async-only MCP tools
(loaded via langchain-mcp-adapters) resulted in a NotImplementedError.
This commit bridges the sync/async gap by dynamically injecting a `func`
wrapper into `StructuredTool` instances that only have a `coroutine`.
Key changes:
- Added `sync_wrapper` in `get_mcp_tools` to execute async tool calls.
- Handled nested event loops by delegating to a global `ThreadPoolExecutor`
when an event loop is already running, avoiding `RuntimeError`.
- Added detailed error logging within the wrapper for better transparency.
- Added comprehensive test coverage in `test_mcp_sync_wrapper.py` verifying
tool patching, event loop behavior, and exception propagation.
* refactor(mcp): extract sync wrapper to module level and fix test mocks
Addressed PR review comments:
- Extracted _make_sync_tool_wrapper to module level to avoid nested func definitions.
- Refactored tests to use the actual production helper instead of duplicating logic.
- Fixed AsyncMock patching for awaited dependencies in tests.
- Added atexit hook for graceful thread pool shutdown.
- Fixed PEP8 blank line formatting in tests.
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
os.walk() does not follow symbolic links by default. This means
custom skills installed as symlinks in skills/custom/ are discovered
as directories but never descended into, so their SKILL.md files
are never found and the skills silently fail to load.
Adding followlinks=True fixes this for users who symlink skill
directories from external projects into the custom skills folder.
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Only tool calls with name === "task" should be rendered as SubtaskCard.
Previously all tool_calls were mapped to IDs, causing SubtaskCard to
render for non-task tool calls whose IDs were never registered in the
subtask context, resulting in a TypeError on task.status.
Signed-off-by: Gao Mingfei <g199209@gmail.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Surface the usage_metadata that PR #1218 added to the streaming API.
A compact indicator in the chat header shows cumulative tokens consumed
per thread, with a tooltip breakdown of input/output/total counts.
Co-authored-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix(threads): clean up local thread data after thread deletion
Delete DeerFlow-managed thread directories after the web UI removes a LangGraph thread.
This keeps local thread data in sync with conversation deletion and adds regression coverage for the cleanup flow.
* fix(threads): address thread cleanup review feedback
Encode thread cleanup URLs in the web client, keep cache updates explicit when no thread search data is cached, and return a generic 500 response from the cleanup endpoint while documenting the sanitized error behavior.
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix: add error handling for podcast generation failures
When TTS processing fails, the system was generating 0-second audio files
without any error indication. This fix adds:
1. Track failed TTS lines and log warning with indices
2. Raise ValueError when all TTS generation fails with helpful message
3. Check for empty audio output in mix_audio and raise error
4. Log success/failure ratio for debugging
Fixes#30
* fix: address Copilot review feedback
- Use `not audio` to catch both None and empty bytes
- Log failed lines with 1-based indices for user-friendly output
- Handle empty script case with clear error message
- Validate env vars before ThreadPoolExecutor for fast-fail on config errors
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* feat(frontend): add Cmd+K command palette and keyboard shortcuts
Wire up the existing shadcn/ui Command component as a global command
palette. Adds a useGlobalShortcuts hook for Cmd+K (palette), Cmd+Shift+N
(new chat), Cmd+, (settings), and Cmd+/ (shortcuts help overlay).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(frontend): address Copilot review feedback on command palette
- Normalize event.key with toLowerCase() for reliable Shift+key matching
- Replace dead deerflow:open-settings event with router.push navigation
- Use platform-appropriate Shift label (Shift+ on Windows/Linux, glyph on Mac)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Add GuardrailMiddleware that evaluates every tool call before execution.
Three provider options: built-in AllowlistProvider (zero deps), OAP passport
providers (open standard), or custom providers loaded by class path.
- GuardrailProvider protocol with GuardrailRequest/Decision dataclasses
- GuardrailMiddleware (AgentMiddleware, position 5 in chain)
- AllowlistProvider for simple deny/allow by tool name
- GuardrailsConfig (Pydantic singleton, loaded from config.yaml)
- 25 tests covering allow/deny, fail-closed/open, async, GraphBubbleUp
- Comprehensive docs at backend/docs/GUARDRAILS.md
Closes#1213
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* feat(web): add conversation export as Markdown and JSON (#976)
Add the ability to export conversations in Markdown and JSON formats,
accessible from both the chat header and the sidebar context menu.
- Add export utility (formatThreadAsMarkdown, formatThreadAsJSON) with
support for user/assistant messages, thinking blocks, and tool calls
- Add ExportTrigger component in chat header (appears when messages exist)
- Add Export submenu to sidebar dropdown (fetches full thread state on demand)
- Add i18n translations for en-US and zh-CN
Closes#976
Made-with: Cursor
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update thread creation timestamp to updated_at
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
The dev compose file was missing CLI auth directory mounts that exist in
the production compose file. This caused CodexChatModel to fail with
'Codex CLI credential not found' error in dev mode.
Fixes#1246
* feat: add Claude Code OAuth and Codex CLI providers
Port of bytedance/deer-flow#1136 from @solanian's feat/cli-oauth-providers branch.\n\nCarries the feature forward on top of current main without the original CLA-blocked commit metadata, while preserving attribution in the commit message for review.
* fix: harden CLI credential loading
Align Codex auth loading with the current ~/.codex/auth.json shape, make Docker credential mounts directory-based to avoid broken file binds on hosts without exported credential files, and add focused loader tests.
* refactor: tighten codex auth typing
Replace the temporary Any return type in CodexChatModel._load_codex_auth with the concrete CodexCliCredential type after the credential loader was stabilized.
* fix: load Claude Code OAuth from Keychain
Match Claude Code's macOS storage strategy more closely by checking the Keychain-backed credentials store before falling back to ~/.claude/.credentials.json. Keep explicit file overrides and add focused tests for the Keychain path.
* fix: require explicit Claude OAuth handoff
* style: format thread hooks reasoning request
* docs: document CLI-backed auth providers
* fix: address provider review feedback
* fix: harden provider edge cases
* Fix deferred tools, Codex message normalization, and local sandbox paths
* chore: narrow PR scope to OAuth providers
* chore: remove unrelated frontend changes
* chore: reapply OAuth branch frontend scope cleanup
* fix: preserve upload guards with reasoning effort wiring
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix: normalize ToolMessage structured content in serialization
When models return ToolMessage content as a list of content blocks
(e.g. [{"type": "text", "text": "..."}]), the UI previously displayed
the raw Python repr string instead of the extracted text.
Replace str(msg.content) with the existing _extract_text() helper in
both _serialize_message() and stream() to properly normalize
list-of-blocks content to plain text.
Fixes#1149
Also fixes the same root cause as #1188 (characters displayed one per
line when tool response content is returned as structured blocks).
Added 11 regression tests covering string, list-of-blocks, mixed,
empty, and fallback content types.
* fix(memory): extract text from structured LLM responses in memory updater
When LLMs return response content as list of content blocks
(e.g. [{"type": "text", "text": "..."}]) instead of plain strings,
str() produces Python repr which breaks JSON parsing in the memory
updater. This caused memory updates to silently fail.
Changes:
- Add _extract_text() helper in updater.py for safe content normalization
- Use _extract_text() instead of str(response.content) in update_memory()
- Fix format_conversation_for_update() to handle plain strings in list content
- Fix subagent executor fallback path to extract text from list content
- Replace print() with structured logging (logger.info/warning/error)
- Add 13 regression tests covering _extract_text, format_conversation,
and update_memory with structured LLM responses
* fix: address Copilot review - defensive text extraction + logger.exception
- client.py _extract_text: use block.get('text') + isinstance check (prevent KeyError/TypeError)
- prompt.py format_conversation_for_update: same defensive check for dict text blocks
- executor.py: type-safe text extraction in both code paths, fallback to placeholder instead of str(raw_content)
- updater.py: use logger.exception() instead of logger.error() for traceback preservation
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fix: preserve chunked structured content without spurious newlines
* fix: restore backend unit test compatibility
---------
Co-authored-by: Exploreunive <Exploreunive@users.noreply.github.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* feat: track token usage per conversation turn
Add token usage tracking to the streaming API so consumers can monitor
cost per turn without additional API calls.
Changes:
1. _serialize_message now includes usage_metadata for AI messages in
values events, exposing input_tokens/output_tokens/total_tokens
from LangChain's native metadata.
2. stream() accumulates token usage across all AI messages in a turn
and emits the cumulative totals in the end event:
{usage: {input_tokens: N, output_tokens: N, total_tokens: N}}
3. Each messages-tuple AI event with text content now includes a
per-message usage_metadata field for granular tracking.
This enables the frontend to display token consumption per turn,
support cost-aware UX, and let users monitor API spending.
10 tests added covering serialization passthrough and cumulative
aggregation logic.
Co-Authored-By: OpenClaw <noreply@openclaw.ai>
* fix: address Copilot review - use Mapping access for usage_metadata
- Replace getattr(usage, 'input_tokens', 0) with usage.get('input_tokens', 0)
since LangChain usage_metadata is a dict, not an object
- Remove unused 'import pytest' (fixes Ruff F401)
- Add proper stream() integration tests for cumulative usage in end event
and per-message usage_metadata in messages-tuple events
---------
Co-authored-by: Exploreunive <Exploreunive@users.noreply.github.com>
Co-authored-by: OpenClaw <noreply@openclaw.ai>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
This PR improves MiniMax Code Plan integration in DeerFlow by fixing three issues in the current flow: stream errors were not clearly surfaced in the UI, the frontend could not display the actual provider model ID, and MiniMax reasoning output could leak into final assistant content as inline <think>...</think>. The change adds a MiniMax-specific adapter, exposes real model IDs end-to-end, and adds a frontend fallback for historical messages.
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix(feishu): support @bot message in topic groups
* Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* fix(feishu): preserve rich-text formatting and add parser unit tests
* chore(test): remove unused import to fix ruff lint error
* style: auto-format imports to satisfy ruff
---------
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* feat(manager): add bootstrap command to initialize soul.md in correct place
* feat(channels): add /bootstrap command to IM channels
Add a `/bootstrap` command that routes to the chat handler with
`is_bootstrap: True` in the run context, allowing the agent to invoke
its setup/initialization flow (e.g. `setup_agent`).
- The text after `/bootstrap` is forwarded as the chat message; when
omitted a default "Initialize workspace" message is used.
- Feishu channels use the streaming path as with normal chat.
- No changes to ChannelStore — bootstrap is stateless and triggered
purely by the command.
- Update /help output to include /bootstrap.
- Add 5 tests covering: text/no-text variants, Feishu streaming path,
thread creation, and help text.
* Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* fix: accept copilot suggestion
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* fix(scripts): handle docker-init failures gracefully for private registry
The make docker-init command was failing on Linux environments when users
could not access the private Volces container registry. This commonly
occurs in corporate intranet environments with proxies or for users
without registry credentials.
Changes:
- Detect sandbox mode from config.yaml before attempting image pull
- Skip image pull entirely for local sandbox mode (default)
- Gracefully handle pull failures with informative messages
- Update setup-sandbox Makefile target with same error handling
Fixes#1168
* Apply suggestions from code review
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: BillionClaw <billionclaw@users.noreply.github.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* fix(frontend): block duplicate sends during uploads
Expose pre-submit upload work as a busy state so the chat input does not allow a second send while the first attachment is still uploading.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
* docs(frontend): document upload and stream ownership
Record that thread hooks own upload-before-submit state while the chat page owns composer busy wiring, so future changes do not reintroduce duplicate socket or upload state handling.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
* fix(frontend): separate upload busy state from streaming
Keep uploads from reusing the streaming stop state so duplicate submits are blocked without turning the composer into a stop button during file uploads.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
---------
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix(harness): allow agent read access to /mnt/skills in local sandbox
Skill files under /mnt/skills/ were blocked by the path validator,
preventing agents from reading skill definitions. This change:
- Refactors `resolve_local_tool_path` into `validate_local_tool_path`,
a pure security gate that no longer resolves paths (left to the sandbox)
- Permits read-only access to the skills container path (/mnt/skills by
default, configurable via config.skills.container_path)
- Blocks write access to skills paths (PermissionError)
- Allows /mnt/skills in bash command path validation
- Adds `LocalSandbox.update_path_mappings` and injects per-thread
user-data mappings into the sandbox so all virtual-path resolution
is handled uniformly by the sandbox layer
- Covers all new behaviour with tests
Fixes#1177
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* refactor(sandbox): unify all virtual path resolution in tools.py
Move skills path resolution from LocalSandbox into tools.py so that all
virtual-to-host path translation (user-data and skills) lives in one
layer. LocalSandbox becomes a pure execution layer that receives only
real host paths — no more path_mappings, _resolve_path, or reverse
resolve logic.
This addresses architecture feedback that path resolution was split
across two layers (tools.py for user-data, LocalSandbox for skills),
making the flow hard to follow.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(sandbox): address Copilot review — cache-on-success and error path masking
- Replace @lru_cache with manual cache-on-success for _get_skills_container_path
and _get_skills_host_path so transient failures at startup don't permanently
disable skills access.
- Add _sanitize_error() helper that masks host filesystem paths in error
messages via mask_local_paths_in_output before returning them to the agent.
- Apply _sanitize_error() to all catch-all (Exception/OSError) handlers in
sandbox tool functions to prevent host path leakage in error output.
- Remove unused lru_cache import.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(tools): add tool_search for deferred MCP tool loading
When multiple MCP servers are enabled, total tool count can exceed 30-50,
causing context bloat and degraded tool selection accuracy. This adds a
deferred tool loading mechanism controlled by `tool_search.enabled` config.
- Add ToolSearchConfig with single `enabled` field
- Add DeferredToolRegistry with regex search (select:, +keyword, keyword)
- Add tool_search tool returning OpenAI-compatible function JSON
- Add DeferredToolFilterMiddleware to hide deferred schemas from bind_tools
- Add <available-deferred-tools> section to system prompt
- Enable MCP tool_name_prefix to prevent cross-server name collisions
- Add 34 unit tests covering registry, tool, prompt, and middleware
* fix: reset stale deferred registry and bump config_version
- Reset deferred registry upfront in get_available_tools() to prevent
stale tool entries when MCP servers are disabled between calls
- Bump config_version to 2 for new tool_search config field
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(tests): mock get_app_config in prompt section tests for CI
CI has no config.yaml, causing TestDeferredToolsPromptSection to fail
with FileNotFoundError. Add autouse fixture to mock get_app_config.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
The cleanup() trap kills "next dev" and "next start" but not
"next-server". Since "next start" forks a "next-server" child
process, killing the parent may leave the child running as a
zombie, holding port 3000. The startup teardown block (line 35)
already handles this, but the Ctrl+C / SIGTERM trap did not.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* feat: add citation/reference support to deep research reports (#1141)
- Enhance lead agent system prompt with mandatory citation requirements
after web_search/web_fetch tool usage
- Add citation examples and best practices to GitHub Deep Research skill
- Add citation hints to report template (Executive Summary, Key Analysis)
- Style regular markdown links in frontend for visual distinction
(color, underline, hover effect)
- Fix TitleMiddleware being registered when title generation is disabled
* fix: address PR review comments
- Revert TitleMiddleware conditional registration (agent.py) to avoid
sync/async incompatibility with DeerFlowClient
- Fix markdown link rendering: merge classNames instead of overwriting,
only set target=_blank for external http(s) URLs
- Remove unrelated package.json/pnpm-lock.yaml changes
* fix: use plain markdown links in Sources section for cleaner rendering
Inline citations in report body use [citation:Title](URL) for pill/badge style.
Sources section uses plain [Title](URL) for simple underlined link style.
* fix(frontend): render plain links as underlined text in artifact markdown
Only links with citation: prefix render as Badge pills.
Regular links in Sources section now render as underlined text links.
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
The help text described docker-init as "Build the custom k3s image"
but the actual implementation (scripts/docker.sh init) only pulls
the sandbox image. Updated to match the real behavior.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Wrap the OGL Renderer instantiation in a try-catch so the app does not
crash when WebGL is unavailable (e.g. hardware acceleration disabled).
The Galaxy background simply does not render instead of taking down the
entire page.
Fixes#1144
Co-authored-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
docker.sh restart() tells users to run `make docker-dev-logs`, but
this target does not exist. The correct target is `make docker-logs`.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* refactor: extract shared utils to break harness→app cross-layer imports
Move _validate_skill_frontmatter to src/skills/validation.py and
CONVERTIBLE_EXTENSIONS + convert_file_to_markdown to src/utils/file_conversion.py.
This eliminates the two reverse dependencies from client.py (harness layer)
into gateway/routers/ (app layer), preparing for the harness/app package split.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor: split backend/src into harness (deerflow.*) and app (app.*)
Physically split the monolithic backend/src/ package into two layers:
- **Harness** (`packages/harness/deerflow/`): publishable agent framework
package with import prefix `deerflow.*`. Contains agents, sandbox, tools,
models, MCP, skills, config, and all core infrastructure.
- **App** (`app/`): unpublished application code with import prefix `app.*`.
Contains gateway (FastAPI REST API) and channels (IM integrations).
Key changes:
- Move 13 harness modules to packages/harness/deerflow/ via git mv
- Move gateway + channels to app/ via git mv
- Rename all imports: src.* → deerflow.* (harness) / app.* (app layer)
- Set up uv workspace with deerflow-harness as workspace member
- Update langgraph.json, config.example.yaml, all scripts, Docker files
- Add build-system (hatchling) to harness pyproject.toml
- Add PYTHONPATH=. to gateway startup commands for app.* resolution
- Update ruff.toml with known-first-party for import sorting
- Update all documentation to reflect new directory structure
Boundary rule enforced: harness code never imports from app.
All 429 tests pass. Lint clean.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: add harness→app boundary check test and update docs
Add test_harness_boundary.py that scans all Python files in
packages/harness/deerflow/ and fails if any `from app.*` or
`import app.*` statement is found. This enforces the architectural
rule that the harness layer never depends on the app layer.
Update CLAUDE.md to document the harness/app split architecture,
import conventions, and the boundary enforcement test.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add config versioning with auto-upgrade on startup
When config.example.yaml schema changes, developers' local config.yaml
files can silently become outdated. This adds a config_version field and
auto-upgrade mechanism so breaking changes (like src.* → deerflow.*
renames) are applied automatically before services start.
- Add config_version: 1 to config.example.yaml
- Add startup version check warning in AppConfig.from_file()
- Add scripts/config-upgrade.sh with migration registry for value replacements
- Add `make config-upgrade` target
- Auto-run config-upgrade in serve.sh and start-daemon.sh before starting services
- Add config error hints in service failure messages
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix comments
* fix: update src.* import in test_sandbox_tools_security to deerflow.*
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: handle empty config and search parent dirs for config.example.yaml
Address Copilot review comments on PR #1131:
- Guard against yaml.safe_load() returning None for empty config files
- Search parent directories for config.example.yaml instead of only
looking next to config.yaml, fixing detection in common setups
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: correct skills root path depth and config_version type coercion
- loader.py: fix get_skills_root_path() to use 5 parent levels (was 3)
after harness split, file lives at packages/harness/deerflow/skills/
so parent×3 resolved to backend/packages/harness/ instead of backend/
- app_config.py: coerce config_version to int() before comparison in
_check_config_version() to prevent TypeError when YAML stores value
as string (e.g. config_version: "1")
- tests: add regression tests for both fixes
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: update test imports from src.* to deerflow.*/app.* after harness refactor
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* feat(feishu): stream updates on a single card
* fix(feishu): ensure final message on stream error and warn on missing card ID
- Wrap streaming loop in try/except/finally so a is_final=True outbound
message is always published, even when the LangGraph stream breaks
mid-way. This prevents _running_card_ids memory leaks and ensures the
Feishu card shows a DONE reaction instead of hanging on "Working on it".
- Log a warning when _ensure_running_card gets no message_id back from
the Feishu reply API, making silent fallback to new-card behavior
visible in logs.
- Add test_handle_feishu_stream_error_still_sends_final to cover the
error path.
- Reformat service.py dict comprehension (ruff format, no logic change).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Avoid blocking inbound on Feishu card creation
---------
Co-authored-by: songyaolun <songyaolun@bytedance.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* feat: add LoopDetectionMiddleware to break repetitive tool call loops
Adds a new AgentMiddleware that detects when the agent is stuck calling
the same tools with the same arguments repeatedly, which currently runs
until the recursion limit kills the run.
Detection: per-thread sliding window of tool call hashes (name + args).
- Warn threshold (default 3): injects a "wrap up" system message
- Hard limit (default 5): strips tool_calls, forcing final text output
Includes 13 unit tests covering hashing, thresholds, window sliding,
reset, and edge cases.
Closes#1055
* fix: address PR #1056 review feedback for LoopDetectionMiddleware
- Remove unused imports (Awaitable, Callable, ModelCallResult,
ModelRequest, ModelResponse, AIMessage) from loop_detection_middleware
- Remove unused pytest import from test file
- Fix _hash_tool_calls sort key: sort by (name, serialized args) for
deterministic hashing when multiple calls share the same tool name
- Revert subagent_enabled default to False in agent.py to match
DeerFlowClient and channel defaults
- Remove unrelated SearxNG tools and Next.js rewrite changes from PR
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: address 2nd round review feedback on PR #1056
- Inject loop warning only once per thread (prevents context bloat)
- Add threading.Lock for thread-safe history mutations
- Use runtime.context thread_id instead of workspace_path
- Add LRU eviction for per-thread history (max 100 threads)
- Add 5 new tests covering warn-once, LRU eviction, thread isolation,
fallback thread_id, and lock presence
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: resolve lint errors in loop detection middleware tests
Sort imports (I001) and remove unused _WARNING_MSG import (F401)
to fix ruff lint failures in CI.
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Add MiniMax as an OpenAI-compatible model provider
MiniMax offers high-performance LLMs (M2.5, M2.5-highspeed) with
204K context windows. This commit adds MiniMax as a selectable
provider in the configuration system.
Changes:
- Add MiniMax to SUPPORTED_MODELS with model definitions
- Add MiniMax provider configuration in conf/config.yaml
- Update documentation with MiniMax setup instructions
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Update README to remove MiniMax API details
Removed mention of MiniMax API usage and configuration examples.
---------
Co-authored-by: octo-patch <octo-patch@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix: preserve conversation context in Telegram private chats
In private (1-on-1) chats, set topic_id=None so all messages map to a
single DeerFlow thread per chat instead of creating a new thread for
every message. Also fix _cmd_generic to use topic_id=None in private
chats so /new correctly targets the default thread.
Group chat behavior is unchanged (reply_to or msg_id as topic_id).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: preserve conversation context in Telegram private chats
Fixes#1101
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: mirror _on_text reply logic in _cmd_generic for group chats
_cmd_generic now prefers reply_to_message.message_id over msg_id in
group/supergroup chats, consistent with _on_text. This ensures commands
like /new and /status target the correct conversation thread when sent
as a reply in group chats.
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: JeffJiang <for-eleven@hotmail.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* feat(sandbox): harden local file access and mask host paths
- enforce local sandbox file tools to only accept /mnt/user-data paths
- add path traversal checks against thread workspace/uploads/outputs roots
- preserve requested virtual paths in tool error messages (no host path leaks)
- mask local absolute paths in bash output back to virtual sandbox paths
- update bash tool guidance to prefer thread-local venv + python -m pip
- add regression tests for path mapping, masking, and access restrictions
Fixes#968
* feat(sandbox): restrict risky absolute paths in local bash commands
- validate absolute path usage in local-mode bash commands
- allow only /mnt/user-data virtual paths for user data access
- keep a small allowlist for system executable/device paths
- return clear permission errors for unsafe command paths
- add regression tests for bash path validation rules
* test(sandbox): add success path test for resolve_local_tool_path (#992)
* Initial plan
* test(sandbox): add success path test for resolve_local_tool_path
Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com>
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com>
* fix(sandbox): reject bare virtual root early with clear error in resolve_local_tool_path (#991)
* Initial plan
* fix(sandbox): reject bare virtual root early with clear error in resolve_local_tool_path
Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com>
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
---------
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
* feat: add dev-daemon target for background development mode
Add a new make dev-daemon target that allows running DeerFlow services
in background mode without keeping the terminal connection.
Following the pattern of PR #1042, the implementation uses a dedicated
shell script (scripts/start-daemon.sh) for better maintainability.
- Create scripts/start-daemon.sh for daemon mode startup
- Add dev-daemon target to Makefile
- Each service writes logs to separate files (langgraph, gateway, frontend, nginx)
- Services can be stopped with make stop
- Use nohup for proper daemon process detachment
- Add cleanup on failure when services fail to start
- Use more specific pkill pattern to avoid killing unrelated nginx processes
* refactor: use wait-for-port.sh instead of hardcoded sleep in daemon script
* refactor: use specific nginx process pattern to avoid killing unrelated processes
* Revert "refactor: use specific nginx process pattern to avoid killing unrelated processes"
This reverts commit 4c369155bfc91ccce347876a8982f955fa039da8.
* refactor: use consistent nginx kill pattern across all scripts
* chore(daemon): add trap for cleanup on interrupt signals
* fix(daemon): pass repo root as positional argument to nginx command
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
- Use router.replace() instead of history.replaceState() so Next.js
router's internal state is updated on chat start. This ensures
subsequent "New Chat" clicks are treated as a real cross-route
navigation (actual-id → "new") rather than a no-op same-path
navigation, which was causing stale content to persist.
- In ChatLayout, increment the SubtasksProvider key only when
navigating TO "new" from a non-"new" route. This forces a full
remount for a fresh new-chat state without remounting when the URL
transitions from "new" → actual-id (which would interrupt streaming).
Made-with: Cursor
Co-authored-by: DanielWalnut <45447813+hetaoBackend@users.noreply.github.com>
* fix(tracing): support LANGCHAIN_* env fallback for LangSmith config
- add backward-compatible env parsing in tracing_config.py
- support fallback keys:
LANGCHAIN_TRACING_V2 / LANGCHAIN_TRACING
LANGCHAIN_API_KEY
LANGCHAIN_PROJECT
LANGCHAIN_ENDPOINT
- keep LANGSMITH_* as preferred source when both are present
- add regression tests in test_tracing_config.py
* fix(tracing): correct LANGSMITH_* precedence over LANGCHAIN_* for enabled flag (#1067)
* Initial plan
* fix(tracing): use first-present-wins logic for enabled flag, add precedence docs and test
Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com>
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com>
---------
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
* Refactor sandbox state management and improve Docker integration
- Removed FileSandboxStateStore and SandboxStateStore classes for a cleaner architecture.
- Enhanced LocalContainerBackend to handle port allocation retries and introduced environment variable support for sandbox host configuration.
- Updated Paths class to include host_base_dir for Docker volume mounts and ensured proper permissions for sandbox directories.
- Modified ExtensionsConfig to improve error handling when loading configuration files and adjusted environment variable resolution.
- Updated sandbox configuration to include a replicas option for managing concurrent sandbox containers.
- Improved logging and context management in SandboxMiddleware for better sandbox lifecycle handling.
- Enhanced network port allocation logic to bind to 0.0.0.0 for compatibility with Docker.
- Updated Docker Compose files to ensure proper volume management and environment variable configuration.
- Created scripts to ensure necessary configuration files are present before starting services.
- Cleaned up unused MCP server configurations in extensions_config.example.json.
* Address Copilot review suggestions from PR #1068 (#9)
---------
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
* fix(subagents): cleanup background tasks after completion to prevent memory leak
Added cleanup_background_task() function to remove completed subagent results
from the global _background_tasks dict. Found a small issue: completed tasks
were never removed, causing memory to grow indefinitely with each subagent
execution.
Alternative approaches considered:
- Future + SubagentHandle pattern: Not chosen due to requiring refactoring
Chose the simple cleanup approach for minimal code changes while effectively
resolving the memory leak.
Changes:
- Add cleanup_background_task() in executor.py
- Call cleanup in all task_tool return paths (completed, failed, timed out)
* fix(subagents): prevent race condition in background task cleanup
Address Copilot review feedback on memory leak fix:
- Add terminal state check in cleanup_background_task() to only remove
tasks that are COMPLETED/FAILED/TIMED_OUT or have completed_at set
- Remove cleanup call from polling safety-timeout branch in task_tool
since the task may still be running
- Add comprehensive tests for cleanup behavior including:
- Verification that cleanup is called on terminal states
- Verification that cleanup is NOT called on polling timeout
- Tests for terminal state check logic in executor
This prevents KeyError when the background executor tries to update
a task that was prematurely removed from _background_tasks.
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix(checkpointer): return InMemorySaver instead of None when not configured (#1016)
* fix(checkpointer): also fix get_checkpointer() to return InMemorySaver
Make all three checkpointer functions consistent:
- make_checkpointer() (async) → InMemorySaver
- checkpointer_context() (sync) → InMemorySaver
- get_checkpointer() (sync singleton) → InMemorySaver
This ensures DeerFlowClient always has a valid checkpointer.
* fix: address CI failure and Copilot review feedback
- Fix import order in test_checkpointer_none_fix.py (I001 ruff error)
- Fix type annotation: _checkpointer should be Checkpointer | None
- Update docstring: change "None if not configured" to "InMemorySaver if not configured"
- Ensure app config is loaded before checking checkpointer config to prevent incorrect InMemorySaver fallback
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* feat: add claude-to-deerflow skill for DeerFlow API integration
Add a new skill that enables Claude Code to interact with the DeerFlow
AI agent platform via its HTTP API, including chat streaming and status
checking capabilities.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: fix telegram channel
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Partially addresses #1011
The cache_from options reference /tmp/docker-cache-* directories
that don't exist by default, causing WARN messages on startup:
WARN local cache import at /tmp/docker-cache-gateway not found
WARN local cache import at /tmp/docker-cache-langgraph not found
Fix: Comment out cache_from with setup instructions.
To re-enable caching, create the directories:
mkdir -p /tmp/docker-cache-gateway /tmp/docker-cache-langgraph
Note: This PR only fixes the cache warnings. The main NoneType error
in #1011 requires further investigation.
* feat: add IM channels system for Feishu, Slack, and Telegram integration
Bridge external messaging platforms to DeerFlow via LangGraph Server with
async message bus, thread management, and per-channel configuration.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: address review comments on IM channels system
Fix topic_id handling in store remove/list_entries and manager commands,
correct Telegram reply threading, remove unused imports/variables, update
docstrings and docs to match implementation, and prevent config mutation.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* update skill creator
* fix im reply text
* fix comments
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* Add checkpointer configuration to config.example.yaml
- Introduced a new section for checkpointer configuration to enable state persistence for the embedded DeerFlowClient.
- Documented supported types: memory, sqlite, and postgres, along with examples for each.
- Clarified that the LangGraph Server manages its own state persistence separately.
* refactor(checkpointer): streamline checkpointer initialization and logging
* fix(uv.lock): update revision and add new wheel URLs for brotlicffi package
* feat: add langchain-anthropic dependency and update related configurations
* Fix checkpointer lifecycle, docstring, and path resolution bugs from PR #1005 review (#4)
* Initial plan
* Address all review suggestions from PR #1005
Co-authored-by: foreleven <4785594+foreleven@users.noreply.github.com>
* Fix resolve_path to always return real Path; move SQLite special-string handling to callers
Co-authored-by: foreleven <4785594+foreleven@users.noreply.github.com>
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: foreleven <4785594+foreleven@users.noreply.github.com>
---------
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: foreleven <4785594+foreleven@users.noreply.github.com>
* feat: u may ask
* chore: adjust code according to CR
* chore: adjust code according to CR
* ut: test for suggestions.py
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix(subagent): support async MCP tools in subagent executor
SubagentExecutor.execute() was synchronous and could not handle async-only tools like MCP tools. This caused failures when trying to use MCP tools within subagents.
Changes:
- Add _aexecute() async method using agent.astream() for async execution
- Refactor execute() to use asyncio.run() wrapping _aexecute()
- This allows subagents to use async tools (MCP) within ThreadPoolExecutor
* test(subagent): add unit tests for executor async/sync paths
Add comprehensive tests covering:
- Async _aexecute() with success/error cases
- Sync execute() wrapper using asyncio.run()
- Async tool (MCP) support verification
- Thread pool execution safety
* fix(subagent): subagent-test-circular-depend
- Use session-scoped fixture with delayed import to handle circular dependencies
without affecting other test modules
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
- replace with explicit runtime deps:
- regenerate after dependency changes
- make deterministic by patching
to avoid leaked global affecting expected paths
* feat(upload): implement optimistic UI for file uploads and enhance message handling
* feat(middleware): enhance file handling by collecting historical uploads from directory
* feat(thread-title): update page title handling for new threads and improve loading state
* feat(uploads-middleware): enhance file extraction by verifying file existence in uploads directory
* feat(thread-stream): update file path reference to use virtual_path for uploads
* feat(tests): add core behaviour tests for UploadsMiddleware
* feat(tests): remove unused pytest import from test_uploads_middleware_core_logic.py
* feat: enhance file upload handling and localization support
- Update UploadsMiddleware to validate filenames more robustly.
- Modify MessageListItem to parse uploaded files from raw content for backward compatibility.
- Add localization for uploading messages in English and Chinese.
- Introduce parseUploadedFiles utility to extract uploaded files from message content.
* fix(memory): prevent file upload events from persisting in long-term memory
Uploaded files are session-scoped and unavailable in future sessions.
Previously, upload interactions were recorded in memory, causing the
agent to search for non-existent files in subsequent conversations.
Changes:
- memory_middleware: skip human messages containing <uploaded_files>
and their paired AI responses from the memory queue
- updater: post-process generated memory to strip upload mentions
before saving to file
- prompt: instruct the memory LLM to ignore file upload events
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(memory): address Copilot review feedback on upload filtering
- memory_middleware: strip <uploaded_files> block from human messages
instead of dropping the entire turn; only skip the turn (and paired
AI response) when nothing remains after stripping
- updater: narrow the upload-scrubbing regex to explicit upload events
(avoids false-positive removal of "User works with CSV files" etc.);
also filter upload-event facts from the facts array
- prompt: move `import re` to module scope; skip upload-only human
messages (empty after stripping) rather than appending "User: "
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(memory): allow optional words between 'upload' and 'file' in scrub regex
The previous pattern required 'uploading file' with no intervening words,
so 'uploading a test file' was not matched and leaked into long-term memory.
Allow up to 3 modifier words between the verb and noun (e.g. 'uploading a
test file', 'uploaded the attachment').
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* test(memory): add unit tests for upload filtering in memory pipeline
Covers _filter_messages_for_memory and _strip_upload_mentions_from_memory
per Copilot review suggestion. 15 test cases verify:
- Upload-only turns (and paired AI responses) are excluded from memory queue
- User's real question is preserved when combined with an upload block
- Upload file paths are never present in filtered message content
- Intermediate tool messages are always excluded
- Multi-turn conversations: only the upload turn is dropped
- Multimodal (list-content) human messages are handled
- Upload-event sentences are removed from summaries and facts
- Legitimate file-related facts (CSV preferences, PDF exports) are preserved
- "uploading a test file" (words between verb and noun) is caught by regex
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Fixes issue #47: CORS error when frontend port isn't 3000
Users running the frontend on a port other than 3000 need to set
CORS_ORIGINS to allow cross-origin requests. This addition to
.env.example makes this configuration option visible.
Co-authored-by: GitHub Agent <agent@example.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* feat: add agent management functionality with creation, editing, and deletion
* feat: enhance agent creation and chat experience
- Added AgentWelcome component to display agent description on new thread creation.
- Improved agent name validation with availability check during agent creation.
- Updated NewAgentPage to handle agent creation flow more effectively, including enhanced error handling and user feedback.
- Refactored chat components to streamline message handling and improve user experience.
- Introduced new bootstrap skill for personalized onboarding conversations, including detailed conversation phases and a structured SOUL.md template.
- Updated localization files to reflect new features and error messages.
- General code cleanup and optimizations across various components and hooks.
* Refactor workspace layout and agent management components
- Updated WorkspaceLayout to use useLayoutEffect for sidebar state initialization.
- Removed unused AgentFormDialog and related edit functionality from AgentCard.
- Introduced ArtifactTrigger component to manage artifact visibility.
- Enhanced ChatBox to handle artifact selection and display.
- Improved message list rendering logic to avoid loading states.
- Updated localization files to remove deprecated keys and add new translations.
- Refined hooks for local settings and thread management to improve performance and clarity.
- Added temporal awareness guidelines to deep research skill documentation.
* feat: refactor chat components and introduce thread management hooks
* feat: improve artifact file detail preview logic and clean up console logs
* feat: refactor lead agent creation logic and improve logging details
* feat: validate agent name format and enhance error handling in agent setup
* feat: simplify thread search query by removing unnecessary metadata
* feat: update query key in useDeleteThread and useRenameThread for consistency
* feat: add isMock parameter to thread and artifact handling for improved testing
* fix: reorder import of setup_agent for consistency in builtins module
* feat: append mock parameter to thread links in CaseStudySection for testing purposes
* fix: update load_agent_soul calls to use cfg.name for improved clarity
* fix: update date format in apply_prompt_template for consistency
* feat: integrate isMock parameter into artifact content loading for enhanced testing
* docs: add license section to SKILL.md for clarity and attribution
* feat(agent): enhance model resolution and agent configuration handling
* chore: remove unused import of _resolve_model_name from agents
* feat(agent): remove unused field
* fix(agent): set default value for requested_model_name in _resolve_model_name function
* feat(agent): update get_available_tools call to handle optional agent_config and improve middleware function signature
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* feat: Add reasoning effort configuration support
* Add `reasoning_effort` parameter to model config and agent initialization
* Support reasoning effort levels (minimal/low/medium/high) for Doubao/GPT-5 models
* Add UI controls in input box for reasoning effort selection
* Update doubao-seed-1.8 example config with reasoning effort support
Fixes & Cleanup:
* Ensure UTF-8 encoding for file operations
* Remove unused imports
* fix: set reasoning_effort to None for unsupported models
* fix: unit test error
* Update frontend/src/components/workspace/input-box.tsx
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
add oauth schema to MCP server config (extensions_config.json)
support client_credentials and refresh_token grants
implement token manager with caching and pre-expiry refresh
inject OAuth Authorization header for MCP tool discovery and tool calls
extend MCP gateway config models to read/write OAuth settings
update docs and examples for OAuth configuration
add unit tests for token fetch/cache and header injection
* fix: use shell fallback instead of hardcoded /bin/zsh in LocalSandbox
Replace hardcoded /bin/zsh executable with dynamic shell detection
that falls back through /bin/zsh → /bin/bash → /bin/sh. This fixes
skill execution failures in Docker containers (python:3.12-slim)
where zsh is not available.
Closes#935
* Update backend/src/sandbox/local/local_sandbox.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: atian8179 <atian8179@users.noreply.github.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Validate that all dict-returning client methods conform to Gateway
Pydantic response models (ModelsListResponse, ModelResponse,
SkillsListResponse, SkillResponse, SkillInstallResponse,
McpConfigResponse, UploadResponse, MemoryConfigResponse,
MemoryStatusResponse). Pydantic ValidationError in CI catches
schema drift between client and Gateway with zero production coupling.
Also includes prior review fixes: enhanced client methods, expanded
unit tests (67→77), live integration test improvements, and updated
documentation.
Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com>
Add `DeerFlowClient` class that provides direct in-process access to
DeerFlow's agent and Gateway capabilities without requiring LangGraph
Server or Gateway API processes. This enables users to import and use
DeerFlow as a Python library.
Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com>
* feat: add Novita AI as optional LLM provider
Adds Novita AI (https://novita.ai) as an optional, OpenAI-compatible
LLM provider.
Changes:
- Added Novita model configuration example in config.example.yaml
- Added NOVITA_API_KEY to .env.example
Usage: Set NOVITA_API_KEY in your environment and use novita-gpt-4
as the model name.
* update correct model info
* Update README.md
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix: recover from stale model context after config model changes
* fix: fail fast on missing model config and expand model resolution tests
* fix: remove duplicate get_app_config imports
* fix: align model resolution tests with runtime imports
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fix: remove duplicate model resolution test case
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Previously used before_model which returned {"messages": patches}, causing
LangGraph's add_messages reducer to append patches at the end of the message
list. This resulted in invalid ordering (ToolMessage after a HumanMessage)
that LLMs reject with tool call ID mismatch errors.
Switch to wrap_model_call/awrap_model_call to insert synthetic ToolMessages
immediately after each dangling AIMessage before the request reaches the LLM,
without persisting the patches to state.
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* Enforces config env var checks and improves startup handling
Ensures critical environment variables are validated during config resolution,
raising clear errors if missing. Improves server startup reliability by
verifying that backend services are listening and by terminating on
misconfiguration at launch. Adds more robust feedback to developers when
API startup fails, reducing silent misconfigurations and speeding up
troubleshooting.
* Initial plan
* Implement suggestions from PR #892: fix env var checks and improve error logging
Co-authored-by: foreleven <4785594+foreleven@users.noreply.github.com>
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: foreleven <4785594+foreleven@users.noreply.github.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* feat(subagents): make subagent timeout configurable via config.yaml
- Add SubagentsAppConfig supporting global and per-agent timeout_seconds
- Load subagents config section in AppConfig.from_file()
- Registry now applies config.yaml overrides without mutating builtin defaults
- Polling safety-net in task_tool is now dynamic (execution timeout + 60s buffer)
- Document subagents section in config.example.yaml
- Add make test command and enforce TDD policy in CLAUDE.md
- Add 38 unit tests covering config validation, timeout resolution, registry
override behavior, and polling timeout formula
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(subagents): add logging for subagent timeout config and execution
- Log loaded timeout config (global default + per-agent overrides) on startup
- Log debug message in registry when config.yaml overrides a builtin timeout
- Include timeout in executor's async execution start log
- Log effective timeout and polling limit when a task is dispatched
- Fix UnboundLocalError: move max_poll_count assignment before logger.info
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* ci(backend): add lint step and run all unit tests via Makefile
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix lint
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
The condition guarding ArtifactFilePreview only allowed markdown files
through, which prevented HTML files from reaching the preview component.
Added `language === "html"` to the condition so HTML artifacts render
correctly in preview mode.
Fixes#873
Co-authored-by: Claude <noreply@anthropic.com>
Set pid directive to /tmp/nginx.pid in nginx.conf and nginx.local.conf
to prevent permission denied errors when running nginx as a non-root user.
Co-authored-by: Claude <noreply@anthropic.com>
* fix: move Key Citations to early position in reporter prompt to reduce URL hallucination
Move the Key Citations section from position 6 (end of report) to position 2
(immediately after title) in the reporter prompt. When citations are placed at
the end of a long report, LLMs tend to forget real URLs from source material
and fabricate plausible-looking but non-existent URLs.
Changes to src/prompts/reporter.md:
- Move Key Citations from section 6 to section 2 (right after Title)
- Add explicit anti-hallucination instructions: only use URLs from provided
source material, never fabricate or guess URLs
- Keep a repeated citation list at the end (section 7) for completeness
- Renumber all subsequent sections accordingly
- Update Notes section to reflect new structure
Tested with real DeerFlow backend + DuckDuckGo search:
- Before: multiple hallucinated URLs in report citations
- After: hallucinated URLs reduced significantly
Closes#825
* fix: move citations after observations in reporter_node to reduce URL hallucination
Previously, the citation message was appended BEFORE observation messages,
meaning it got buried under potentially thousands of chars of research data.
By the time the LLM reached the end of the context to generate the report,
it had 'forgotten' the real URLs and fabricated plausible-looking ones.
Now citations are appended AFTER compressed observations, placing them
closest to the LLM's generation point for maximum recall accuracy.
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
- Expand Docker Compose section in README.md and README_zh.md with:
- Explicit note that only root .env is used (not web/.env)
- Instructions to update NEXT_PUBLIC_API_URL for remote/LAN deployment
- Explanation that NEXT_PUBLIC_API_URL is a build-time variable
- Improve NEXT_PUBLIC_API_URL comments in root .env.example
Closes#527
Unifies market analysis, data analysis, and consulting reporting into a comprehensive consulting-analysis skill, enabling a two-phase workflow from analysis framework design to professional report generation. Introduces a DuckDB-based data analysis utility for Excel/CSV files and a chart-visualization skill with a flexible JS interface and extensive chart type documentation. Removes the legacy market analysis skill to streamline report generation and improve extensibility for consulting and data-driven workflows.
* Adds Kubernetes sandbox provisioner support
* Improves Docker dev setup by standardizing host paths
Replaces hardcoded host paths with a configurable root directory,
making the development environment more portable and easier to use
across different machines. Automatically sets the root path if not
already defined, reducing manual setup steps.
Explicitly prohibit parallel image generation to ensure each slide
can use the previous slide as a reference image for visual consistency.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Support configuring max_concurrent_subagents (2-4, default 3) through
config.configurable, with automatic clamping and dynamic prompt generation.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Extract get_skills_prompt_section() from apply_prompt_template() so
subagents can also receive the available skills list in their system
prompt. This allows subagents to discover and load skills via read_file,
just like the lead agent.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- 添加 rehype-raw 依赖以支持在 markdown 中渲染 HTML
Add rehype-raw dependency to support HTML rendering in markdown
- 重构 memory-settings-page,提取 formatMemorySection 函数减少重复代码
Refactor memory-settings-page by extracting formatMemorySection function to reduce code duplication
- 改进空状态显示,使用 HTML span 标签替代 markdown 斜体,提供更好的样式控制
Improve empty state display by using HTML span tags instead of markdown italics for better style control
- 为 skill-settings-page 添加完整的国际化支持,替换硬编码的英文文本
Add complete i18n support for skill-settings-page, replacing hardcoded English text
- 更新国际化文件,添加技能设置页面的空状态文本(中英文)
Update i18n files with empty state text for skill settings page (both Chinese and English)
- 在 streamdown 插件配置中添加 rehypeRaw 以支持 HTML 渲染
Add rehypeRaw to streamdown plugins configuration to support HTML rendering
Co-authored-by: Cursor <cursoragent@cursor.com>
* feat: adds docker-based dev environment
* docs: updates Docker command help
* fix local dev
* feat(sandbox): add Kubernetes-based sandbox provider for multi-instance support
* fix: skills path in k8s
* feat: add example config for k8s sandbox
* fix: docker config
* fix: load skills on docker dev
* feat: support sandbox execution to Kubernetes Deployment model
* chore: rename web service name
* feat: adds docker-based dev environment
* docs: updates Docker command help
* fix local dev
* feat(sandbox): add Kubernetes-based sandbox provider for multi-instance support
* fix: skills path in k8s
* feat: add example config for k8s sandbox
* fix: docker config
* fix: load skills on docker dev
* feat: support sandbox execution to Kubernetes Deployment model
* chore: rename web service name
This commit fixes a potential AttributeError in ChatStreamManager.__init__().
Problem:
- When checkpoint_saver=True and db_uri=None, the code attempts to call
self.db_uri.startswith() on line 56, which raises AttributeError
- Line 56: if self.db_uri.startswith("mongodb://"):
This fails with "AttributeError: 'NoneType' object has no attribute 'startswith'"
Root Cause:
- The __init__ method accepts db_uri: Optional[str] = None
- If None is passed, self.db_uri is set to None
- The code doesn't check if db_uri is None before calling .startswith()
Solution:
- Add explicit None check before string prefix checks
- Provide clear warning message when db_uri is None but checkpoint_saver is enabled
- This prevents AttributeError and helps users understand the configuration issue
Changes:
```python
# Before:
if self.checkpoint_saver:
if self.db_uri.startswith("mongodb://"):
# After:
if self.checkpoint_saver:
if self.db_uri is None:
self.logger.warning(
"Checkpoint saver is enabled but db_uri is None. "
"Please provide a valid database URI or disable checkpoint saver."
)
elif self.db_uri.startswith("mongodb://"):
```
This makes the error handling more robust and provides better user feedback.
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
- Backend: add citation format to lead_agent and general_purpose prompts
- Add CitationLink component (Badge + HoverCard) for citation cards
- MarkdownContent: detect citation: prefix in link text, render CitationLink
- Message/artifact/subtask: use MarkdownContent or Streamdown with CitationLink
- message-list-item: pass img via components prop (remove isHuman/img)
- message-group, subtask-card: drop unused imports; fix import order (lint)
Co-authored-by: Cursor <cursoragent@cursor.com>
- Keep upstream subagent HARD LIMIT (max 3 task calls, batching) in subagent_reminder
- Keep our removal of Citations: do not add back 'Citations when synthesizing'
Co-authored-by: Cursor <cursoragent@cursor.com>
- Keep upstream subagent HARD LIMIT (max 3 task calls, batching) in subagent_reminder
- Keep our removal of Citations: do not add back 'Citations when synthesizing'
Co-authored-by: Cursor <cursoragent@cursor.com>
Citations:
- In shouldShowCitationLoading, treat any unreplaced [cite-N] in cleanContent
as show-loading (no body). Fixes Ultra and other modes when refs arrive
before the <citations> block in the stream.
- Single rule: hasUnreplacedCitationRefs(cleanContent) => true forces loading;
then isLoading && hasCitationsBlock(rawContent) for streaming indicator.
SSE end state:
- When stream finishes, SDK may set isLoading=false before client state has
the final message content, so UI stayed wrong until refresh.
- Store onFinish(state) as finalState in chat page; clear when stream starts.
- Pass messagesOverride={finalState.messages} to MessageList when not loading
so the list uses server-complete messages right after SSE ends (no refresh).
Chore:
- Stop tracking .githooks/pre-commit; add .githooks/ to .gitignore (local only).
Co-authored-by: Cursor <cursoragent@cursor.com>
---
fix(前端): 杜绝半成品引用,SSE 结束时展示正确状态
引用:
- shouldShowCitationLoading 中只要 cleanContent 仍含未替换的 [cite-N] 就
只显示加载、不渲染正文,解决流式时引用块未到就出现 [cite-1] 的问题。
- 规则:hasUnreplacedCitationRefs(cleanContent) 为真则一律显示加载;
此外 isLoading && hasCitationsBlock 用于流式时显示「正在整理引用」。
SSE 结束状态:
- 流结束时 SDK 可能先置 isLoading=false,客户端 messages 尚未包含
最终内容,导致需刷新才显示正确。
- 在对话页保存 onFinish(state) 为 finalState,流开始时清空。
- 非加载时向 MessageList 传入 messagesOverride={finalState.messages},
列表在 SSE 结束后立即用服务端完整消息渲染,无需刷新。
杂项:
- 取消跟踪 .githooks/pre-commit,.gitignore 增加 .githooks/(仅本地)。
Citations:
- In shouldShowCitationLoading, treat any unreplaced [cite-N] in cleanContent
as show-loading (no body). Fixes Ultra and other modes when refs arrive
before the <citations> block in the stream.
- Single rule: hasUnreplacedCitationRefs(cleanContent) => true forces loading;
then isLoading && hasCitationsBlock(rawContent) for streaming indicator.
SSE end state:
- When stream finishes, SDK may set isLoading=false before client state has
the final message content, so UI stayed wrong until refresh.
- Store onFinish(state) as finalState in chat page; clear when stream starts.
- Pass messagesOverride={finalState.messages} to MessageList when not loading
so the list uses server-complete messages right after SSE ends (no refresh).
Chore:
- Stop tracking .githooks/pre-commit; add .githooks/ to .gitignore (local only).
Co-authored-by: Cursor <cursoragent@cursor.com>
---
fix(前端): 杜绝半成品引用,SSE 结束时展示正确状态
引用:
- shouldShowCitationLoading 中只要 cleanContent 仍含未替换的 [cite-N] 就
只显示加载、不渲染正文,解决流式时引用块未到就出现 [cite-1] 的问题。
- 规则:hasUnreplacedCitationRefs(cleanContent) 为真则一律显示加载;
此外 isLoading && hasCitationsBlock 用于流式时显示「正在整理引用」。
SSE 结束状态:
- 流结束时 SDK 可能先置 isLoading=false,客户端 messages 尚未包含
最终内容,导致需刷新才显示正确。
- 在对话页保存 onFinish(state) 为 finalState,流开始时清空。
- 非加载时向 MessageList 传入 messagesOverride={finalState.messages},
列表在 SSE 结束后立即用服务端完整消息渲染,无需刷新。
杂项:
- 取消跟踪 .githooks/pre-commit,.gitignore 增加 .githooks/(仅本地)。
Strengthen the SUBAGENT_SECTION prompt to prevent the model from launching
more than 3 subagents in a single response. When >3 sub-tasks are needed,
the model is now explicitly instructed to plan and execute in sequential
batches of ≤3. Reinforced at three prompt injection points: thinking style,
main subagent instructions, and critical reminders.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Strengthen the SUBAGENT_SECTION prompt to prevent the model from launching
more than 3 subagents in a single response. When >3 sub-tasks are needed,
the model is now explicitly instructed to plan and execute in sequential
batches of ≤3. Reinforced at three prompt injection points: thinking style,
main subagent instructions, and critical reminders.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add two new middlewares to improve robustness of the agent pipeline:
- DanglingToolCallMiddleware injects placeholder ToolMessages for
interrupted tool calls, preventing LLM errors from malformed history
- SubagentLimitMiddleware truncates excess parallel task tool calls at
the model response level, replacing the runtime check in task_tool
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add two new middlewares to improve robustness of the agent pipeline:
- DanglingToolCallMiddleware injects placeholder ToolMessages for
interrupted tool calls, preventing LLM errors from malformed history
- SubagentLimitMiddleware truncates excess parallel task tool calls at
the model response level, replacing the runtime check in task_tool
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Extract removeCitationsBlocks in utils, reuse in parseCitations and removeAllCitations
- Add hasCitationsBlock; isCitationsBlockIncomplete now uses it
- Add useParsedCitations hook (parseCitations + buildCitationMap) for message/artifact
- Add CitationAwareLink to unify link rendering (message-list-item + artifact-file-detail)
- Add getCleanContent helper; message-group uses it and useParsedCitations
- ArtifactFileDetail: single useParsedCitations, pass cleanContent/citationMap to Preview
- Stop exporting buildCitationMap and removeCitationsBlocks from citations index
- Remove duplicate MessageLink and inline link logic in artifact preview
Co-authored-by: Cursor <cursoragent@cursor.com>
- Extract removeCitationsBlocks in utils, reuse in parseCitations and removeAllCitations
- Add hasCitationsBlock; isCitationsBlockIncomplete now uses it
- Add useParsedCitations hook (parseCitations + buildCitationMap) for message/artifact
- Add CitationAwareLink to unify link rendering (message-list-item + artifact-file-detail)
- Add getCleanContent helper; message-group uses it and useParsedCitations
- ArtifactFileDetail: single useParsedCitations, pass cleanContent/citationMap to Preview
- Stop exporting buildCitationMap and removeCitationsBlocks from citations index
- Remove duplicate MessageLink and inline link logic in artifact preview
Co-authored-by: Cursor <cursoragent@cursor.com>
- About: use aboutMarkdown from about-content.ts instead of raw-loader for
about.md (fixes Turbopack 'Cannot find module raw-loader')
- Web search: remove Tooltip from web_search and web_fetch result links
- Citations: remove HoverCard from CitationLink so no hover popup on badges
Co-authored-by: Cursor <cursoragent@cursor.com>
- About: use aboutMarkdown from about-content.ts instead of raw-loader for
about.md (fixes Turbopack 'Cannot find module raw-loader')
- Web search: remove Tooltip from web_search and web_fetch result links
- Citations: remove HoverCard from CitationLink so no hover popup on badges
Co-authored-by: Cursor <cursoragent@cursor.com>
Add "present_files" to disallowed_tools for bash and general-purpose
subagents to prevent them from presenting files directly. Also add the
new market-analysis skill for generating consulting-grade reports.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add "present_files" to disallowed_tools for bash and general-purpose
subagents to prevent them from presenting files directly. Also add the
new market-analysis skill for generating consulting-grade reports.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add "present_files" to disallowed_tools for bash and general-purpose
subagents to prevent them from presenting files directly. Also add the
new market-analysis skill for generating consulting-grade reports.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Updated documentation to accurately cover all backend subsystems including
subagents, memory, middleware chain, sandbox, MCP, skills, and gateway API.
Fixed broken MCP_SETUP.md link in root README.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Updated documentation to accurately cover all backend subsystems including
subagents, memory, middleware chain, sandbox, MCP, skills, and gateway API.
Fixed broken MCP_SETUP.md link in root README.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Updated documentation to accurately cover all backend subsystems including
subagents, memory, middleware chain, sandbox, MCP, skills, and gateway API.
Fixed broken MCP_SETUP.md link in root README.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Prevent resource exhaustion by capping the number of parallel subagents.
Adds runtime enforcement in task_tool and updates prompts/examples accordingly.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Prevent resource exhaustion by capping the number of parallel subagents.
Adds runtime enforcement in task_tool and updates prompts/examples accordingly.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Prevent resource exhaustion by capping the number of parallel subagents.
Adds runtime enforcement in task_tool and updates prompts/examples accordingly.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Enable task tool to capture and stream AI messages as they are generated by subagents. This replaces simple polling status updates with detailed message-level progress updates.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Enable task tool to capture and stream AI messages as they are generated by subagents. This replaces simple polling status updates with detailed message-level progress updates.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Enable task tool to capture and stream AI messages as they are generated by subagents. This replaces simple polling status updates with detailed message-level progress updates.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Reorder task tool parameters to prioritize description first for better usability
- Add tool_call_id injection for better task traceability
- Use tool_call_id as task_id in executor for consistent tracking
- Simplify event messages by removing redundant task_type field
- Update task examples in prompt to reflect new parameter order
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Reorder task tool parameters to prioritize description first for better usability
- Add tool_call_id injection for better task traceability
- Use tool_call_id as task_id in executor for consistent tracking
- Simplify event messages by removing redundant task_type field
- Update task examples in prompt to reflect new parameter order
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Reorder task tool parameters to prioritize description first for better usability
- Add tool_call_id injection for better task traceability
- Use tool_call_id as task_id in executor for consistent tracking
- Simplify event messages by removing redundant task_type field
- Update task examples in prompt to reflect new parameter order
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
The reasoning content in message-group.tsx was not being processed
through parseCitations, causing raw <citations> blocks to be visible.
Now reasoning content is parsed to remove citations blocks.
Co-authored-by: Cursor <cursoragent@cursor.com>
The reasoning content in message-group.tsx was not being processed
through parseCitations, causing raw <citations> blocks to be visible.
Now reasoning content is parsed to remove citations blocks.
Co-authored-by: Cursor <cursoragent@cursor.com>
The reasoning content in message-group.tsx was not being processed
through parseCitations, causing raw <citations> blocks to be visible.
Now reasoning content is parsed to remove citations blocks.
Co-authored-by: Cursor <cursoragent@cursor.com>
Revert streaming logic - only links that are actually in citationMap
should render as badges. This prevents project URLs and other regular
links from being incorrectly rendered as citation badges.
During streaming, links may initially appear as plain links until the
citations block is fully parsed, then they will update to badge style.
Co-authored-by: Cursor <cursoragent@cursor.com>
Revert streaming logic - only links that are actually in citationMap
should render as badges. This prevents project URLs and other regular
links from being incorrectly rendered as citation badges.
During streaming, links may initially appear as plain links until the
citations block is fully parsed, then they will update to badge style.
Co-authored-by: Cursor <cursoragent@cursor.com>
Revert streaming logic - only links that are actually in citationMap
should render as badges. This prevents project URLs and other regular
links from being incorrectly rendered as citation badges.
During streaming, links may initially appear as plain links until the
citations block is fully parsed, then they will update to badge style.
Co-authored-by: Cursor <cursoragent@cursor.com>
During streaming when citations are still loading (isLoadingCitations=true),
all external links should be rendered as badges since we don't know yet
which links are citations. After streaming completes, only links in
citationMap are rendered as badges.
Co-authored-by: Cursor <cursoragent@cursor.com>
During streaming when citations are still loading (isLoadingCitations=true),
all external links should be rendered as badges since we don't know yet
which links are citations. After streaming completes, only links in
citationMap are rendered as badges.
Co-authored-by: Cursor <cursoragent@cursor.com>
During streaming when citations are still loading (isLoadingCitations=true),
all external links should be rendered as badges since we don't know yet
which links are citations. After streaming completes, only links in
citationMap are rendered as badges.
Co-authored-by: Cursor <cursoragent@cursor.com>
When only reasoning content exists (no main content), the citations
block was not being parsed and removed. Now reasoning content also
goes through parseCitations to hide the raw citations block.
Co-authored-by: Cursor <cursoragent@cursor.com>
When only reasoning content exists (no main content), the citations
block was not being parsed and removed. Now reasoning content also
goes through parseCitations to hide the raw citations block.
Co-authored-by: Cursor <cursoragent@cursor.com>
When only reasoning content exists (no main content), the citations
block was not being parsed and removed. Now reasoning content also
goes through parseCitations to hide the raw citations block.
Co-authored-by: Cursor <cursoragent@cursor.com>
Same fix as message-list-item: project URLs and regular links in
artifact file preview should be rendered as plain links, not badges.
Only actual citations (in citationMap) should be rendered as badges.
Co-authored-by: Cursor <cursoragent@cursor.com>
Same fix as message-list-item: project URLs and regular links in
artifact file preview should be rendered as plain links, not badges.
Only actual citations (in citationMap) should be rendered as badges.
Co-authored-by: Cursor <cursoragent@cursor.com>
Same fix as message-list-item: project URLs and regular links in
artifact file preview should be rendered as plain links, not badges.
Only actual citations (in citationMap) should be rendered as badges.
Co-authored-by: Cursor <cursoragent@cursor.com>
Project URLs and regular links should be rendered as plain underlined
links, not as citation badges. Only links that are actual citations
(present in citationMap) should be rendered as badges.
Co-authored-by: Cursor <cursoragent@cursor.com>
Project URLs and regular links should be rendered as plain underlined
links, not as citation badges. Only links that are actual citations
(present in citationMap) should be rendered as badges.
Co-authored-by: Cursor <cursoragent@cursor.com>
Project URLs and regular links should be rendered as plain underlined
links, not as citation badges. Only links that are actual citations
(present in citationMap) should be rendered as badges.
Co-authored-by: Cursor <cursoragent@cursor.com>
When citation data is not available, use the markdown link text
(children) as display text instead of just the domain. This ensures
that links like [OpenJudge](github.com/...) show 'OpenJudge' instead
of just 'github.com'.
Co-authored-by: Cursor <cursoragent@cursor.com>
When citation data is not available, use the markdown link text
(children) as display text instead of just the domain. This ensures
that links like [OpenJudge](github.com/...) show 'OpenJudge' instead
of just 'github.com'.
Co-authored-by: Cursor <cursoragent@cursor.com>
When citation data is not available, use the markdown link text
(children) as display text instead of just the domain. This ensures
that links like [OpenJudge](github.com/...) show 'OpenJudge' instead
of just 'github.com'.
Co-authored-by: Cursor <cursoragent@cursor.com>
AI was outputting bare brackets like [arXiv:xxx] without URLs,
which do not render as links. Updated prompt to explicitly show
correct vs wrong formats and require complete markdown links.
Co-authored-by: Cursor <cursoragent@cursor.com>
AI was outputting bare brackets like [arXiv:xxx] without URLs,
which do not render as links. Updated prompt to explicitly show
correct vs wrong formats and require complete markdown links.
Co-authored-by: Cursor <cursoragent@cursor.com>
AI was outputting bare brackets like [arXiv:xxx] without URLs,
which do not render as links. Updated prompt to explicitly show
correct vs wrong formats and require complete markdown links.
Co-authored-by: Cursor <cursoragent@cursor.com>
For human messages, disable remark-gfm autolink feature to prevent
URLs from incorrectly including adjacent text (especially Chinese
characters) as part of the link. This ensures that when users input
"https://example.com 帮我分析", only the URL becomes a link.
Co-authored-by: Cursor <cursoragent@cursor.com>
For human messages, disable remark-gfm autolink feature to prevent
URLs from incorrectly including adjacent text (especially Chinese
characters) as part of the link. This ensures that when users input
"https://example.com 帮我分析", only the URL becomes a link.
Co-authored-by: Cursor <cursoragent@cursor.com>
For human messages, disable remark-gfm autolink feature to prevent
URLs from incorrectly including adjacent text (especially Chinese
characters) as part of the link. This ensures that when users input
"https://example.com 帮我分析", only the URL becomes a link.
Co-authored-by: Cursor <cursoragent@cursor.com>
Human messages should display links as plain underlined text,
not as citation badges. This preserves the original user input
appearance when users paste URLs in their messages.
Co-authored-by: Cursor <cursoragent@cursor.com>
Human messages should display links as plain underlined text,
not as citation badges. This preserves the original user input
appearance when users paste URLs in their messages.
Co-authored-by: Cursor <cursoragent@cursor.com>
Human messages should display links as plain underlined text,
not as citation badges. This preserves the original user input
appearance when users paste URLs in their messages.
Co-authored-by: Cursor <cursoragent@cursor.com>
Add subagents.enabled flag in config.yaml to control subagent feature:
- When disabled, task/task_status tools are not loaded
- When disabled, system prompt excludes subagent documentation
- Default is enabled for backward compatibility
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add subagents.enabled flag in config.yaml to control subagent feature:
- When disabled, task/task_status tools are not loaded
- When disabled, system prompt excludes subagent documentation
- Default is enabled for backward compatibility
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add subagents.enabled flag in config.yaml to control subagent feature:
- When disabled, task/task_status tools are not loaded
- When disabled, system prompt excludes subagent documentation
- Default is enabled for backward compatibility
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add accurate token counting using tiktoken library and significantly enhance
memory update prompts with detailed section guidelines, multilingual support,
and improved fact extraction. Update deep-research skill to be more proactive
for research queries.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add accurate token counting using tiktoken library and significantly enhance
memory update prompts with detailed section guidelines, multilingual support,
and improved fact extraction. Update deep-research skill to be more proactive
for research queries.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add accurate token counting using tiktoken library and significantly enhance
memory update prompts with detailed section guidelines, multilingual support,
and improved fact extraction. Update deep-research skill to be more proactive
for research queries.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Use citation.title for display text in CitationLink to ensure correct
titles show during streaming (instead of generic "Source" text)
- Render all external links as CitationLink badges for consistent styling
during streaming output
- Add removeAllCitations when copying message content to clipboard
- Simplify citations_format prompt for cleaner AI output
Co-authored-by: Cursor <cursoragent@cursor.com>
- Use citation.title for display text in CitationLink to ensure correct
titles show during streaming (instead of generic "Source" text)
- Render all external links as CitationLink badges for consistent styling
during streaming output
- Add removeAllCitations when copying message content to clipboard
- Simplify citations_format prompt for cleaner AI output
Co-authored-by: Cursor <cursoragent@cursor.com>
- Use citation.title for display text in CitationLink to ensure correct
titles show during streaming (instead of generic "Source" text)
- Render all external links as CitationLink badges for consistent styling
during streaming output
- Add removeAllCitations when copying message content to clipboard
- Simplify citations_format prompt for cleaner AI output
Co-authored-by: Cursor <cursoragent@cursor.com>
Add native Apple Container support for better performance on macOS while
maintaining full Docker compatibility. Enhance documentation with memory system
details, development guidelines, and sandbox setup instructions. Improve dev
experience with container image pre-pulling and unified cleanup tools.
Key changes:
- Auto-detect and prefer Apple Container on macOS with Docker fallback
- Add APPLE_CONTAINER.md with complete usage and troubleshooting guide
- Document memory system architecture in CLAUDE.md
- Add make setup-sandbox for pre-pulling container images
- Create cleanup-containers.sh for cross-runtime container cleanup
- Update all related documentation (README, SETUP, config examples)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add native Apple Container support for better performance on macOS while
maintaining full Docker compatibility. Enhance documentation with memory system
details, development guidelines, and sandbox setup instructions. Improve dev
experience with container image pre-pulling and unified cleanup tools.
Key changes:
- Auto-detect and prefer Apple Container on macOS with Docker fallback
- Add APPLE_CONTAINER.md with complete usage and troubleshooting guide
- Document memory system architecture in CLAUDE.md
- Add make setup-sandbox for pre-pulling container images
- Create cleanup-containers.sh for cross-runtime container cleanup
- Update all related documentation (README, SETUP, config examples)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add native Apple Container support for better performance on macOS while
maintaining full Docker compatibility. Enhance documentation with memory system
details, development guidelines, and sandbox setup instructions. Improve dev
experience with container image pre-pulling and unified cleanup tools.
Key changes:
- Auto-detect and prefer Apple Container on macOS with Docker fallback
- Add APPLE_CONTAINER.md with complete usage and troubleshooting guide
- Document memory system architecture in CLAUDE.md
- Add make setup-sandbox for pre-pulling container images
- Create cleanup-containers.sh for cross-runtime container cleanup
- Update all related documentation (README, SETUP, config examples)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Implement automatic cache invalidation based on file modification time to ensure memory data consistency across Gateway API and agent prompts. The cache now automatically reloads when the memory file is updated externally.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Implement automatic cache invalidation based on file modification time to ensure memory data consistency across Gateway API and agent prompts. The cache now automatically reloads when the memory file is updated externally.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Implement automatic cache invalidation based on file modification time to ensure memory data consistency across Gateway API and agent prompts. The cache now automatically reloads when the memory file is updated externally.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Add memory API endpoints for retrieving memory data:
- GET /api/memory - get current memory data
- POST /api/memory/reload - reload from file
- GET /api/memory/config - get memory configuration
- GET /api/memory/status - get config and data together
- Optimize MemoryMiddleware to only use user inputs and final
assistant responses, filtering out intermediate tool calls
- Add memory configuration example to config.example.yaml
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add memory API endpoints for retrieving memory data:
- GET /api/memory - get current memory data
- POST /api/memory/reload - reload from file
- GET /api/memory/config - get memory configuration
- GET /api/memory/status - get config and data together
- Optimize MemoryMiddleware to only use user inputs and final
assistant responses, filtering out intermediate tool calls
- Add memory configuration example to config.example.yaml
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add memory API endpoints for retrieving memory data:
- GET /api/memory - get current memory data
- POST /api/memory/reload - reload from file
- GET /api/memory/config - get memory configuration
- GET /api/memory/status - get config and data together
- Optimize MemoryMiddleware to only use user inputs and final
assistant responses, filtering out intermediate tool calls
- Add memory configuration example to config.example.yaml
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implement a memory system that stores user context and conversation history
in memory.json, uses LLM to summarize conversations, and injects relevant
context into system prompts for personalized responses.
Key components:
- MemoryConfig for configuration management
- MemoryUpdateQueue with debounce for batch processing
- MemoryUpdater for LLM-based memory extraction
- MemoryMiddleware to queue conversations after agent execution
- Memory injection into lead agent system prompt
Note: Add memory section to config.yaml to enable (see config.example.yaml)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implement a memory system that stores user context and conversation history
in memory.json, uses LLM to summarize conversations, and injects relevant
context into system prompts for personalized responses.
Key components:
- MemoryConfig for configuration management
- MemoryUpdateQueue with debounce for batch processing
- MemoryUpdater for LLM-based memory extraction
- MemoryMiddleware to queue conversations after agent execution
- Memory injection into lead agent system prompt
Note: Add memory section to config.yaml to enable (see config.example.yaml)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implement a memory system that stores user context and conversation history
in memory.json, uses LLM to summarize conversations, and injects relevant
context into system prompts for personalized responses.
Key components:
- MemoryConfig for configuration management
- MemoryUpdateQueue with debounce for batch processing
- MemoryUpdater for LLM-based memory extraction
- MemoryMiddleware to queue conversations after agent execution
- Memory injection into lead agent system prompt
Note: Add memory section to config.yaml to enable (see config.example.yaml)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add README.md with project overview, quick start, and API reference
- Add CONTRIBUTING.md with development setup and contribution guidelines
- Add docs/ARCHITECTURE.md with detailed system architecture diagrams
- Add docs/API.md with complete API reference for LangGraph and Gateway
- Add docs/README.md as documentation index
- Update CLAUDE.md with improved structure and new features
- Update docs/TODO.md to reflect current status
- Update pyproject.toml description
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add README.md with project overview, quick start, and API reference
- Add CONTRIBUTING.md with development setup and contribution guidelines
- Add docs/ARCHITECTURE.md with detailed system architecture diagrams
- Add docs/API.md with complete API reference for LangGraph and Gateway
- Add docs/README.md as documentation index
- Update CLAUDE.md with improved structure and new features
- Update docs/TODO.md to reflect current status
- Update pyproject.toml description
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add README.md with project overview, quick start, and API reference
- Add CONTRIBUTING.md with development setup and contribution guidelines
- Add docs/ARCHITECTURE.md with detailed system architecture diagrams
- Add docs/API.md with complete API reference for LangGraph and Gateway
- Add docs/README.md as documentation index
- Update CLAUDE.md with improved structure and new features
- Update docs/TODO.md to reflect current status
- Update pyproject.toml description
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Enable previewing .skill files (ZIP archives) by extracting and displaying
their SKILL.md content. Add caching to avoid repeated ZIP extraction.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Enable previewing .skill files (ZIP archives) by extracting and displaying
their SKILL.md content. Add caching to avoid repeated ZIP extraction.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Enable previewing .skill files (ZIP archives) by extracting and displaying
their SKILL.md content. Add caching to avoid repeated ZIP extraction.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add POST /api/skills/install endpoint to install .skill files from
thread's user-data directory. The endpoint extracts the ZIP archive,
validates SKILL.md frontmatter, and installs to skills/custom/.
Frontend Install buttons now call the API instead of downloading.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add POST /api/skills/install endpoint to install .skill files from
thread's user-data directory. The endpoint extracts the ZIP archive,
validates SKILL.md frontmatter, and installs to skills/custom/.
Frontend Install buttons now call the API instead of downloading.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add POST /api/skills/install endpoint to install .skill files from
thread's user-data directory. The endpoint extracts the ZIP archive,
validates SKILL.md frontmatter, and installs to skills/custom/.
Frontend Install buttons now call the API instead of downloading.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Change skills rendering from attribute-based format to nested element format
with <available_skills>, <skill>, <name>, <description>, and <location> tags
for better readability and structure.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Change skills rendering from attribute-based format to nested element format
with <available_skills>, <skill>, <name>, <description>, and <location> tags
for better readability and structure.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Change skills rendering from attribute-based format to nested element format
with <available_skills>, <skill>, <name>, <description>, and <location> tags
for better readability and structure.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add model-aware vision tool loading based on supports_vision flag
- Move view_image_tool from config to builtin tools for dynamic inclusion
- Add timeout to image search to prevent hanging requests
- Optimize image search results format using thumbnails
- Add image validation for reference images in generation
- Improve error handling with detailed messages
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Add model-aware vision tool loading based on supports_vision flag
- Move view_image_tool from config to builtin tools for dynamic inclusion
- Add timeout to image search to prevent hanging requests
- Optimize image search results format using thumbnails
- Add image validation for reference images in generation
- Improve error handling with detailed messages
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Add model-aware vision tool loading based on supports_vision flag
- Move view_image_tool from config to builtin tools for dynamic inclusion
- Add timeout to image search to prevent hanging requests
- Optimize image search results format using thumbnails
- Add image validation for reference images in generation
- Improve error handling with detailed messages
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add image viewing capability for vision-enabled models with ViewImageMiddleware and view_image_tool. Limit web_fetch tool output to 4096 characters to prevent excessive content. Update model config to support vision capability flag.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add image viewing capability for vision-enabled models with ViewImageMiddleware and view_image_tool. Limit web_fetch tool output to 4096 characters to prevent excessive content. Update model config to support vision capability flag.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add image viewing capability for vision-enabled models with ViewImageMiddleware and view_image_tool. Limit web_fetch tool output to 4096 characters to prevent excessive content. Update model config to support vision capability flag.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Backend:
- Handle both string and list format for message content in uploads middleware
- Extract text content from structured message blocks
- Add logging for debugging file upload flow
Frontend:
- Separate file display from message bubble for human messages
- Show uploaded files outside the message bubble for cleaner layout
- Improve file card border styling with subtle border color
- Add debug logging for message submission with files
Co-authored-by: Cursor <cursoragent@cursor.com>
Backend:
- Handle both string and list format for message content in uploads middleware
- Extract text content from structured message blocks
- Add logging for debugging file upload flow
Frontend:
- Separate file display from message bubble for human messages
- Show uploaded files outside the message bubble for cleaner layout
- Improve file card border styling with subtle border color
- Add debug logging for message submission with files
Co-authored-by: Cursor <cursoragent@cursor.com>
Backend:
- Handle both string and list format for message content in uploads middleware
- Extract text content from structured message blocks
- Add logging for debugging file upload flow
Frontend:
- Separate file display from message bubble for human messages
- Show uploaded files outside the message bubble for cleaner layout
- Improve file card border styling with subtle border color
- Add debug logging for message submission with files
Co-authored-by: Cursor <cursoragent@cursor.com>
Improve UX by hiding citations block while it's being streamed:
- Remove complete citations blocks (existing logic)
- Also remove incomplete citations blocks during streaming
- Prevents flickering of raw citations XML in the UI
Co-authored-by: Cursor <cursoragent@cursor.com>
Improve UX by hiding citations block while it's being streamed:
- Remove complete citations blocks (existing logic)
- Also remove incomplete citations blocks during streaming
- Prevents flickering of raw citations XML in the UI
Co-authored-by: Cursor <cursoragent@cursor.com>
Improve UX by hiding citations block while it's being streamed:
- Remove complete citations blocks (existing logic)
- Also remove incomplete citations blocks during streaming
- Prevents flickering of raw citations XML in the UI
Co-authored-by: Cursor <cursoragent@cursor.com>
Backend:
- Simplify citations prompt format and rules
- Add clear distinction between chat responses and file content
- Enforce full URL usage in markdown links, prohibit [cite-1] format
- Require content-first approach: write full content, then add citations at end
Frontend:
- Hide <citations> block in both chat messages and markdown preview
- Remove top-level Citations/Sources list for cleaner UI
- Auto-remove <citations> block in code editor view for markdown files
- Keep inline citation hover cards for reference details
This ensures citations are presented like Claude: clean content with inline reference badges.
Co-authored-by: Cursor <cursoragent@cursor.com>
Backend:
- Simplify citations prompt format and rules
- Add clear distinction between chat responses and file content
- Enforce full URL usage in markdown links, prohibit [cite-1] format
- Require content-first approach: write full content, then add citations at end
Frontend:
- Hide <citations> block in both chat messages and markdown preview
- Remove top-level Citations/Sources list for cleaner UI
- Auto-remove <citations> block in code editor view for markdown files
- Keep inline citation hover cards for reference details
This ensures citations are presented like Claude: clean content with inline reference badges.
Co-authored-by: Cursor <cursoragent@cursor.com>
Backend:
- Simplify citations prompt format and rules
- Add clear distinction between chat responses and file content
- Enforce full URL usage in markdown links, prohibit [cite-1] format
- Require content-first approach: write full content, then add citations at end
Frontend:
- Hide <citations> block in both chat messages and markdown preview
- Remove top-level Citations/Sources list for cleaner UI
- Auto-remove <citations> block in code editor view for markdown files
- Keep inline citation hover cards for reference details
This ensures citations are presented like Claude: clean content with inline reference badges.
Co-authored-by: Cursor <cursoragent@cursor.com>
Update presentation generation with contemporary design styles
(glassmorphism, dark-premium, neo-brutalist, etc.) and add a new
deep-research skill to guide thorough web research before content
generation tasks.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Update presentation generation with contemporary design styles
(glassmorphism, dark-premium, neo-brutalist, etc.) and add a new
deep-research skill to guide thorough web research before content
generation tasks.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Simplify clarification message handling by having the frontend detect and
display ask_clarification tool messages directly, instead of relying on
backend to add an extra AIMessage.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Simplify clarification message handling by having the frontend detect and
display ask_clarification tool messages directly, instead of relying on
backend to add an extra AIMessage.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Citations:
- Add citations parsing utilities for extracting source references from AI responses
- Render inline citations as hover card badges in message content
- Display citation cards with title, URL, and description on hover
- Add citation badge rendering in artifact markdown preview
- Update prompt to guide AI to output citations in correct format
Thread Management:
- Add rename functionality for chat threads with dialog UI
- Add share functionality to copy thread link to clipboard
- Share links use Vercel URL for production accessibility
- Add useRenameThread hook for thread title updates
i18n:
- Add translations for rename, share, cancel, save, and linkCopied
Co-authored-by: Cursor <cursoragent@cursor.com>
Citations:
- Add citations parsing utilities for extracting source references from AI responses
- Render inline citations as hover card badges in message content
- Display citation cards with title, URL, and description on hover
- Add citation badge rendering in artifact markdown preview
- Update prompt to guide AI to output citations in correct format
Thread Management:
- Add rename functionality for chat threads with dialog UI
- Add share functionality to copy thread link to clipboard
- Share links use Vercel URL for production accessibility
- Add useRenameThread hook for thread title updates
i18n:
- Add translations for rename, share, cancel, save, and linkCopied
Co-authored-by: Cursor <cursoragent@cursor.com>
Citations:
- Add citations parsing utilities for extracting source references from AI responses
- Render inline citations as hover card badges in message content
- Display citation cards with title, URL, and description on hover
- Add citation badge rendering in artifact markdown preview
- Update prompt to guide AI to output citations in correct format
Thread Management:
- Add rename functionality for chat threads with dialog UI
- Add share functionality to copy thread link to clipboard
- Share links use Vercel URL for production accessibility
- Add useRenameThread hook for thread title updates
i18n:
- Add translations for rename, share, cancel, save, and linkCopied
Co-authored-by: Cursor <cursoragent@cursor.com>
When using thinking-enabled models (like Kimi K2.5, DeepSeek), the API
expects reasoning_content on all assistant messages. The original
ChatDeepSeek stores reasoning_content in additional_kwargs but doesn't
include it when making subsequent API calls, causing "reasoning_content
is missing" errors.
This adds PatchedChatDeepSeek which overrides _get_request_payload to
restore reasoning_content from additional_kwargs into the payload.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When using thinking-enabled models (like Kimi K2.5, DeepSeek), the API
expects reasoning_content on all assistant messages. The original
ChatDeepSeek stores reasoning_content in additional_kwargs but doesn't
include it when making subsequent API calls, causing "reasoning_content
is missing" errors.
This adds PatchedChatDeepSeek which overrides _get_request_payload to
restore reasoning_content from additional_kwargs into the payload.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add web_search_tool and web_fetch_tool implementations using the official
firecrawl-py SDK as an alternative to Tavily/Jina AI integrations.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add web_search_tool and web_fetch_tool implementations using the official
firecrawl-py SDK as an alternative to Tavily/Jina AI integrations.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Creates presentations by generating AI images for each slide and composing
them into PPTX files. Features include:
- Multiple presentation styles (business, academic, minimal, keynote, creative)
- Visual consistency through reference image chaining (each slide uses the
previous slide as reference)
- Speaker notes from presentation plan
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Creates presentations by generating AI images for each slide and composing
them into PPTX files. Features include:
- Multiple presentation styles (business, academic, minimal, keynote, creative)
- Visual consistency through reference image chaining (each slide uses the
previous slide as reference)
- Speaker notes from presentation plan
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Use ThreadPoolExecutor to generate audio for multiple script lines
concurrently, significantly speeding up podcast generation.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Use ThreadPoolExecutor to generate audio for multiple script lines
concurrently, significantly speeding up podcast generation.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove LLM script generation from Python script, model now generates
JSON script directly (similar to image-generation skill)
- Add --transcript-file option to generate markdown transcript
- Add optional "title" field in JSON for transcript heading
- Remove dependency on OPENAI_API_KEY for podcast generation
- Update SKILL.md with new workflow and JSON format documentation
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove LLM script generation from Python script, model now generates
JSON script directly (similar to image-generation skill)
- Add --transcript-file option to generate markdown transcript
- Add optional "title" field in JSON for transcript heading
- Remove dependency on OPENAI_API_KEY for podcast generation
- Update SKILL.md with new workflow and JSON format documentation
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add podcast-generation skill for creating tech explainer podcasts
- Include generate.py script with TTS synthesis capabilities
- Add tech-explainer template for structured podcast content
- Increase sandbox command timeout from 30s to 600s to support
longer-running skill scripts
- Add podcast-generation skill for creating tech explainer podcasts
- Include generate.py script with TTS synthesis capabilities
- Add tech-explainer template for structured podcast content
- Increase sandbox command timeout from 30s to 600s to support
longer-running skill scripts
- Use ExtensionsConfig.from_file() instead of cached config to always
read latest configuration from disk in LangGraph Server
- Add mtime-based cache invalidation for MCP tools to detect config
file changes made through Gateway API
- Call reload_extensions_config() in Gateway API after updates to
refresh the global cache
- Remove unnecessary MCP initialization from Gateway startup since
MCP tools are only used by LangGraph Server
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Use ExtensionsConfig.from_file() instead of cached config to always
read latest configuration from disk in LangGraph Server
- Add mtime-based cache invalidation for MCP tools to detect config
file changes made through Gateway API
- Call reload_extensions_config() in Gateway API after updates to
refresh the global cache
- Remove unnecessary MCP initialization from Gateway startup since
MCP tools are only used by LangGraph Server
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add environment field to sandbox config for injecting env vars into container
- Support $VAR syntax to resolve values from host environment variables
- Refactor frontend API modules to use centralized getBackendBaseURL()
- Improve Doraemon skill with explicit input/output path arguments
- Add .env.example file
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add environment field to sandbox config for injecting env vars into container
- Support $VAR syntax to resolve values from host environment variables
- Refactor frontend API modules to use centralized getBackendBaseURL()
- Improve Doraemon skill with explicit input/output path arguments
- Add .env.example file
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* feat: add citation support in research report block and markdown
- Enhanced ResearchReportBlock to fetch citations based on researchId and pass them to the Markdown component.
- Introduced CitationLink component to display citation metadata on hover for links in markdown.
- Implemented CitationCard and CitationList components for displaying citation details and lists.
- Updated Markdown component to handle citation links and inline citations.
- Created HoverCard component for displaying citation information in a tooltip-like manner.
- Modified store to manage citations, including setting and retrieving citations for ongoing research.
- Added CitationsEvent type to handle citations in chat events and updated Message type to include citations.
* fix(log): Enable the logging level when enabling the DEBUG environment variable (#793)
* fix(frontend): render all tool calls in the frontend #796 (#797)
* build(deps): bump jspdf from 3.0.4 to 4.0.0 in /web (#798)
Bumps [jspdf](https://github.com/parallax/jsPDF) from 3.0.4 to 4.0.0.
- [Release notes](https://github.com/parallax/jsPDF/releases)
- [Changelog](https://github.com/parallax/jsPDF/blob/master/RELEASE.md)
- [Commits](https://github.com/parallax/jsPDF/compare/v3.0.4...v4.0.0)
---
updated-dependencies:
- dependency-name: jspdf
dependency-version: 4.0.0
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* fix(frontend):added the display of the 'analyst' message #800 (#801)
* fix: migrate from deprecated create_react_agent to langchain.agents.create_agent (#802)
* fix: migrate from deprecated create_react_agent to langchain.agents.create_agent
Fixes#799
- Replace deprecated langgraph.prebuilt.create_react_agent with
langchain.agents.create_agent (LangGraph 1.0 migration)
- Add DynamicPromptMiddleware to handle dynamic prompt templates
(replaces the 'prompt' callable parameter)
- Add PreModelHookMiddleware to handle pre-model hooks
(replaces the 'pre_model_hook' parameter)
- Update AgentState import from langchain.agents in template.py
- Update tests to use the new API
* fix:update the code with review comments
* fix: Add runtime parameter to compress_messages method(#803)
* fix: Add runtime parameter to compress_messages method(#803)
The compress_messages method was being called by PreModelHookMiddleware
with both state and runtime parameters, but only accepted state parameter.
This caused a TypeError when the middleware executed the pre_model_hook.
Added optional runtime parameter to compress_messages signature to match
the expected interface while maintaining backward compatibility.
* Update the code with the review comments
* fix: Refactor citation handling and add comprehensive tests for citation features
* refactor: Clean up imports and formatting across citation modules
* fix: Add monkeypatch to clear AGENT_RECURSION_LIMIT in recursion limit tests
* feat: Enhance citation link handling in Markdown component
* fix: Exclude citations from finish reason handling in mergeMessage function
* fix(nodes): update message handling
* fix(citations): improve citation extraction and handling in event processing
* feat(citations): enhance citation extraction and handling with improved merging and normalization
* fix(reporter): update citation formatting instructions for clarity and consistency
* fix(reporter): prioritize using Markdown tables for data presentation and comparison
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: LoftyComet <1277173875@qq。>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
This PR addresses token limit issues when web_search is enabled with include_raw_content by implementing a two-pronged approach: changing the default behavior to exclude raw content and adding compression logic for when raw content is included.
Add a root-level Makefile to manage frontend, backend, and nginx services:
- `make check` validates required dependencies (Node.js 22+, pnpm, uv, nginx)
- `make install` installs all project dependencies
- `make dev` starts all services with unified port 2026
- `make stop` and `make clean` for cleanup
Update nginx configuration:
- Change port from 8000 to 2026
- Add frontend upstream and routing (port 3000)
- Add /api/langgraph/* routing with path rewriting to LangGraph server
- Keep other /api/* routes to Gateway API
- Route non-API requests to frontend
Update frontend configuration:
- Use relative URLs through nginx proxy by default
- Support environment variables for direct backend access
- Construct full URL for LangGraph SDK compatibility
Clean up backend Makefile by removing nginx and serve targets.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add a root-level Makefile to manage frontend, backend, and nginx services:
- `make check` validates required dependencies (Node.js 22+, pnpm, uv, nginx)
- `make install` installs all project dependencies
- `make dev` starts all services with unified port 2026
- `make stop` and `make clean` for cleanup
Update nginx configuration:
- Change port from 8000 to 2026
- Add frontend upstream and routing (port 3000)
- Add /api/langgraph/* routing with path rewriting to LangGraph server
- Keep other /api/* routes to Gateway API
- Route non-API requests to frontend
Update frontend configuration:
- Use relative URLs through nginx proxy by default
- Support environment variables for direct backend access
- Construct full URL for LangGraph SDK compatibility
Clean up backend Makefile by removing nginx and serve targets.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* feat: Implement DeerFlow API server with chat streaming, Langgraph orchestration, and various content generation capabilities.
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* - Use MongoDB `$push` with `$each` to append new messages to existing threads
- Use PostgreSQL jsonb concatenation operator to merge messages instead of overwriting
- Update comments to reflect append behavior in both database implementations
* fix: updated the unit tests with the recent changes
---------
Co-authored-by: Bink <992359580@qq.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: YikB <54528024+Bin1783@users.noreply.github.com>
* feat: Implement DeerFlow API server with chat streaming, Langgraph orchestration, and various content generation capabilities.
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* - Use MongoDB `$push` with `$each` to append new messages to existing threads
- Use PostgreSQL jsonb concatenation operator to merge messages instead of overwriting
- Update comments to reflect append behavior in both database implementations
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
- Add type, url, and headers fields to MCP server config
- Update MCP client to handle stdio, sse, and http transports
- Add todos field to ThreadState
- Add Deerflow branding requirement to frontend-design skill
- Update extensions_config.example.json with SSE/HTTP examples
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add type, url, and headers fields to MCP server config
- Update MCP client to handle stdio, sse, and http transports
- Add todos field to ThreadState
- Add Deerflow branding requirement to frontend-design skill
- Update extensions_config.example.json with SSE/HTTP examples
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Fixes a critical bug in the from_runnable_config() method where falsy values (like False, 0, and empty strings) were being incorrectly filtered out, causing configuration fields to revert to their default values. The fix changes the filter condition from if v to if v is not None, ensuring only None values are skipped.
Add new MCP configuration management endpoint and enhance API documentation
with detailed descriptions, examples, and OpenAPI support for better
developer experience.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add new MCP configuration management endpoint and enhance API documentation
with detailed descriptions, examples, and OpenAPI support for better
developer experience.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add comprehensive MCP integration using langchain-mcp-adapters to enable
pluggable external tools from MCP servers.
Features:
- MCP server configuration via mcp_config.json
- Automatic lazy initialization for seamless use in both FastAPI and LangGraph Studio
- Support for multiple MCP servers (filesystem, postgres, github, brave-search, etc.)
- Environment variable resolution in configuration
- Tool caching mechanism for optimal performance
- Complete documentation and setup guide
Implementation:
- Add src/mcp module with client, tools, and cache components
- Integrate MCP config loading in AppConfig
- Update tool system to include MCP tools automatically
- Add eager initialization in FastAPI lifespan handler
- Add lazy initialization fallback for LangGraph Studio
Dependencies:
- Add langchain-mcp-adapters>=0.1.0
Documentation:
- Add MCP_SETUP.md with comprehensive setup guide
- Update CLAUDE.md with MCP system architecture
- Update config.example.yaml with MCP configuration notes
- Update README.md with MCP setup instructions
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add comprehensive MCP integration using langchain-mcp-adapters to enable
pluggable external tools from MCP servers.
Features:
- MCP server configuration via mcp_config.json
- Automatic lazy initialization for seamless use in both FastAPI and LangGraph Studio
- Support for multiple MCP servers (filesystem, postgres, github, brave-search, etc.)
- Environment variable resolution in configuration
- Tool caching mechanism for optimal performance
- Complete documentation and setup guide
Implementation:
- Add src/mcp module with client, tools, and cache components
- Integrate MCP config loading in AppConfig
- Update tool system to include MCP tools automatically
- Add eager initialization in FastAPI lifespan handler
- Add lazy initialization fallback for LangGraph Studio
Dependencies:
- Add langchain-mcp-adapters>=0.1.0
Documentation:
- Add MCP_SETUP.md with comprehensive setup guide
- Update CLAUDE.md with MCP system architecture
- Update config.example.yaml with MCP configuration notes
- Update README.md with MCP setup instructions
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
The proxy was creating a temporary httpx.AsyncClient within an async context manager.
When returning StreamingResponse for SSE endpoints, the client was being closed before
the streaming generator could use it, causing "client has been closed" errors.
This change introduces a shared httpx.AsyncClient that persists for the application
lifecycle, properly cleaned up during shutdown. This also improves performance by
reusing TCP connections across requests.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
The proxy was creating a temporary httpx.AsyncClient within an async context manager.
When returning StreamingResponse for SSE endpoints, the client was being closed before
the streaming generator could use it, causing "client has been closed" errors.
This change introduces a shared httpx.AsyncClient that persists for the application
lifecycle, properly cleaned up during shutdown. This also improves performance by
reusing TCP connections across requests.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Remove .claude/settings.local.json from git tracking and add it to .gitignore to prevent future accidental commits of local settings.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Remove .claude/settings.local.json from git tracking and add it to .gitignore to prevent future accidental commits of local settings.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Defer sandbox acquisition and thread directory creation until first use to improve performance and reduce resource usage.
Changes:
- Add lazy_init parameter to SandboxMiddleware (default: true)
- Add ensure_sandbox_initialized() helper for lazy sandbox acquisition
- Update all sandbox tools to use lazy initialization
- Add lazy_init parameter to ThreadDataMiddleware (default: true)
- Create thread directories on-demand in AioSandboxProvider
- LocalSandbox already creates directories on write (no changes needed)
Benefits:
- Saves 1-2s Docker container startup for conversations without tools
- Reduces unnecessary directory creation and file system operations
- Backward compatible with lazy_init=false option
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Defer sandbox acquisition and thread directory creation until first use to improve performance and reduce resource usage.
Changes:
- Add lazy_init parameter to SandboxMiddleware (default: true)
- Add ensure_sandbox_initialized() helper for lazy sandbox acquisition
- Update all sandbox tools to use lazy initialization
- Add lazy_init parameter to ThreadDataMiddleware (default: true)
- Create thread directories on-demand in AioSandboxProvider
- LocalSandbox already creates directories on write (no changes needed)
Benefits:
- Saves 1-2s Docker container startup for conversations without tools
- Reduces unnecessary directory creation and file system operations
- Backward compatible with lazy_init=false option
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Add thread-safe port allocation and proper cleanup on process exit to
prevent port conflicts in concurrent environments and ensure containers
are stopped when the application terminates.
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Add thread-safe port allocation and proper cleanup on process exit to
prevent port conflicts in concurrent environments and ensure containers
are stopped when the application terminates.
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Implement a skills framework that enables specialized workflows for
specific tasks (e.g., PDF processing, web page generation). Skills are
discovered from the skills/ directory and automatically mounted in
sandboxes with path mapping support.
- Add SkillsConfig for configuring skills path and container mount point
- Implement dynamic skill loading from SKILL.md files with YAML frontmatter
- Add path mapping in LocalSandbox to translate container paths to local paths
- Mount skills directory in AIO Docker sandbox containers
- Update lead agent prompt to dynamically inject available skills
- Add setup documentation and expand config.example.yaml
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Implement a skills framework that enables specialized workflows for
specific tasks (e.g., PDF processing, web page generation). Skills are
discovered from the skills/ directory and automatically mounted in
sandboxes with path mapping support.
- Add SkillsConfig for configuring skills path and container mount point
- Implement dynamic skill loading from SKILL.md files with YAML frontmatter
- Add path mapping in LocalSandbox to translate container paths to local paths
- Mount skills directory in AIO Docker sandbox containers
- Update lead agent prompt to dynamically inject available skills
- Add setup documentation and expand config.example.yaml
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
* chore: add .claude/ to .gitignore
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* feat: add gateway module with FastAPI server
- Add new gateway module with FastAPI app for API routing
- Add gateway and serve commands to Makefile
- Add fastapi, httpx, uvicorn, sse-starlette dependencies
- Fix model config retrieval in lead_agent (support both model_name and model)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
* chore: add .claude/ to .gitignore
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* feat: add gateway module with FastAPI server
- Add new gateway module with FastAPI app for API routing
- Add gateway and serve commands to Makefile
- Add fastapi, httpx, uvicorn, sse-starlette dependencies
- Fix model config retrieval in lead_agent (support both model_name and model)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
- Add AioSandboxProvider for Docker-based sandbox execution with
configurable container lifecycle, volume mounts, and port management
- Add TitleMiddleware to auto-generate thread titles after first
user-assistant exchange using LLM
- Add Claude Code documentation (CLAUDE.md, AGENTS.md)
- Extend SandboxConfig with Docker-specific options (image, port, mounts)
- Fix hardcoded mount path to use expanduser
- Add agent-sandbox and dotenv dependencies
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
- Add AioSandboxProvider for Docker-based sandbox execution with
configurable container lifecycle, volume mounts, and port management
- Add TitleMiddleware to auto-generate thread titles after first
user-assistant exchange using LLM
- Add Claude Code documentation (CLAUDE.md, AGENTS.md)
- Extend SandboxConfig with Docker-specific options (image, port, mounts)
- Fix hardcoded mount path to use expanduser
- Add agent-sandbox and dotenv dependencies
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
* fix(config): Add support for MCP server configuration parameters
* refact: rename the sse_readtimeout to sse_read_timeout
* update the code with review comments
* update the MCP document for the latest change
This pull request adds support for custom HTTP headers to the MCP server configuration and ensures that these headers are properly validated and included when adding new MCP servers. The changes are primarily focused on extending the schema and data handling for MCP server metadata.
* fix: Add runtime parameter to compress_messages method(#803)
The compress_messages method was being called by PreModelHookMiddleware
with both state and runtime parameters, but only accepted state parameter.
This caused a TypeError when the middleware executed the pre_model_hook.
Added optional runtime parameter to compress_messages signature to match
the expected interface while maintaining backward compatibility.
* Update the code with the review comments
* fix: migrate from deprecated create_react_agent to langchain.agents.create_agent
Fixes#799
- Replace deprecated langgraph.prebuilt.create_react_agent with
langchain.agents.create_agent (LangGraph 1.0 migration)
- Add DynamicPromptMiddleware to handle dynamic prompt templates
(replaces the 'prompt' callable parameter)
- Add PreModelHookMiddleware to handle pre-model hooks
(replaces the 'pre_model_hook' parameter)
- Update AgentState import from langchain.agents in template.py
- Update tests to use the new API
* fix:update the code with review comments
* fix(podcast): add fallback for models without json_object support (#747)
Models like Kimi K2 don't support response_format.type: json_object.
Add try-except to fall back to regular prompting with JSON parsing
when BadRequestError mentions json_object not supported.
- Add fallback to prompting + repair_json_output parsing
- Re-raise other BadRequestError types
- Add unit tests for script_writer_node with 100% coverage
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fixes: the unit test error of test_script_writer_node.py
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
When NEXT_PUBLIC_API_URL is not explicitly configured, the frontend now
automatically detects the API URL based on the current page's hostname.
This allows accessing DeerFlow from other machines without rebuilding
the frontend.
- Add getBaseURL() helper with runtime window.location detection
- Use same protocol and hostname with default port 8000
- Preserve explicit NEXT_PUBLIC_API_URL configuration when set
- Fallback to localhost:8000 for SSR scenarios
* feat(eval): add report quality evaluation module
Addresses issue #773 - How to evaluate generated report quality objectively.
This module provides two evaluation approaches:
1. Automated metrics (no LLM required):
- Citation count and source diversity
- Word count compliance per report style
- Section structure validation
- Image inclusion tracking
2. LLM-as-Judge evaluation:
- Factual accuracy scoring
- Completeness assessment
- Coherence evaluation
- Relevance and citation quality checks
The combined evaluator provides a final score (1-10) and letter grade (A+ to F).
Files added:
- src/eval/__init__.py
- src/eval/metrics.py
- src/eval/llm_judge.py
- src/eval/evaluator.py
- tests/unit/eval/test_metrics.py
- tests/unit/eval/test_evaluator.py
* feat(eval): integrate report evaluation with web UI
This commit adds the web UI integration for the evaluation module:
Backend:
- Add EvaluateReportRequest/Response models in src/server/eval_request.py
- Add /api/report/evaluate endpoint to src/server/app.py
Frontend:
- Add evaluateReport API function in web/src/core/api/evaluate.ts
- Create EvaluationDialog component with grade badge, metrics display,
and optional LLM deep evaluation
- Add evaluation button (graduation cap icon) to research-block.tsx toolbar
- Add i18n translations for English and Chinese
The evaluation UI allows users to:
1. View quick metrics-only evaluation (instant)
2. Optionally run deep LLM-based evaluation for detailed analysis
3. See grade (A+ to F), score (1-10), and metric breakdown
* feat(eval): improve evaluation reliability and add LLM judge tests
- Extract MAX_REPORT_LENGTH constant in llm_judge.py for maintainability
- Add comprehensive unit tests for LLMJudge class (parse_response,
calculate_weighted_score, evaluate with mocked LLM)
- Pass reportStyle prop to EvaluationDialog for accurate evaluation criteria
- Add researchQueries store map to reliably associate queries with research
- Add getResearchQuery helper to retrieve query by researchId
- Remove unused imports in test_metrics.py
* fix(eval): use resolveServiceURL for evaluate API endpoint
The evaluateReport function was using a relative URL '/api/report/evaluate'
which sent requests to the Next.js server instead of the FastAPI backend.
Changed to use resolveServiceURL() consistent with other API functions.
* fix: improve type accuracy and React hooks in evaluation components
- Fix get_word_count_target return type from Optional[Dict] to Dict since it always returns a value via default fallback
- Fix useEffect dependency issue in EvaluationDialog using useRef to prevent unwanted re-evaluations
- Add aria-label to GradeBadge for screen reader accessibility
* test: add unit tests for global connection pool (Issue #778)
- Add TestLifespanFunction class with 9 tests for lifespan management:
- PostgreSQL/MongoDB pool initialization success/failure
- Cleanup on shutdown
- Skip initialization when not configured
- Add TestGlobalConnectionPoolUsage class with 4 tests:
- Using global pools when available
- Fallback to per-request connections
- Fix missing dict_row import in app.py (bug from PR #757)
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Extract message content from direct_response tool call args
and display it as the message content when tool call completes.
Note: This is a workaround. The message is not streamed because
direct_response uses tool calling mechanism where args are JSON,
not natural language text that can be streamed directly.
* feat(web): add multi-format report export (Markdown, HTML, PDF, Word, Image)
* fix: correct import order for docx (lint error)
* fix(web): address Copilot review comments for multi-format export
- Add i18n support for dropdown menu items (en/zh)
- Add DOMPurify for HTML sanitization (XSS protection)
- Fix async handling for canvas.toBlob with Promise wrapper
- Add toast notifications for export errors
- Fix Tooltip + DropdownMenuTrigger nesting (accessibility)
- Ensure container cleanup in finally block
* fix(web): enhance markdown parsing for PDF and Word export
- Add list support (bullet and numbered) for PDF export
- Add parseInlineMarkdown helper for Word export to handle bold, italic, code, links
- Add list support for Word export (bullet and numbered)
- Address Copilot review comments from PR #756
* fix(web): address PR review feedback for multi-format export
- Extract PDF formatting magic numbers into PDF_CONSTANTS
- Add Tooltip wrapper for download dropdown button
- Reduce triggerDownload cleanup timeout from 1000ms to 100ms
- Use marked.Lexer.lexInline for robust markdown parsing
- Add console.warn for image export cleanup errors
- Add numbering config for Word document ordered lists
- Fix CSS class typo: px-5pb-20 -> px-5 pb-20
- Remove unreachable dead code in parseInlineMarkdown
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* feat: add Serper search engine support
* docs: update configuration guide and env example for Serper
* test: add test case for Serper with missing API key
* feat: add enable_web_search config to disable web search (#681)
* fix: skip enforce_researcher_search validation when web search is disabled
- Return json.dumps([]) instead of empty string for consistency in background_investigation_node
- Add enable_web_search check to skip validation warning when user intentionally disabled web search
- Add warning log when researcher has no tools available
- Update tests to include new enable_web_search parameter
* fix: address Copilot review feedback
- Coordinate enforce_web_search with enable_web_search in validate_and_fix_plan
- Fix misleading comment in background_investigation_node
* docs: add warning about local RAG setup when disabling web search
* docs: add web search toggle section to configuration guide
* fix: handle greetings without triggering research workflow (#733)
* test: update tests for direct_response tool behavior
* fix: address Copilot review comments for coordinator_node - Extract locale from direct_response tool_args - Fix import sorting (ruff I001)
* fix: remove locale extraction from tool_args in direct_response
Use locale from state instead of tool_args to avoid potential side effects. The locale is already properly passed from frontend via state.
* fix: only fallback to planner when clarification is enabled
In legacy mode (BRANCH 1), no tool calls should end the workflow gracefully instead of falling back to planner. This fixes the test_coordinator_node_no_tool_calls integration test.
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* Update uv.lock to sync with pyproject.toml
* fix: update Interrupt object attribute access for LangGraph 1.0+ (#730)
The Interrupt class in LangGraph 1.0 no longer has the 'ns' attribute.
This change updates _create_interrupt_event() to use the new 'id'
attribute instead, with a fallback to thread_id for compatibility.
Changes:
- Replace event_data["__interrupt__"][0].ns[0] with interrupt.id
- Use getattr() with fallback for backward compatibility
- Update debug log message from 'ns=' to 'id='
- Add unit tests for _create_interrupt_event function
* fix the unit test error and address review comment
---------
Co-authored-by: Willem Jiang <143703838+willem-bd@users.noreply.github.com>
* support infoquest
* support html checker
* support html checker
* change line break format
* change line break format
* change line break format
* change line break format
* change line break format
* change line break format
* change line break format
* change line break format
* Fix several critical issues in the codebase
- Resolve crawler panic by improving error handling
- Fix plan validation to prevent invalid configurations
- Correct InfoQuest crawler JSON conversion logic
* add test for infoquest
* add test for infoquest
* Add InfoQuest introduction to the README
* add test for infoquest
* fix readme for infoquest
* fix readme for infoquest
* resolve the conflict
* resolve the conflict
* resolve the conflict
* Fix formatting of INFOQUEST in SearchEngine enum
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: Willem Jiang <143703838+willem-bd@users.noreply.github.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fix(web): handle incomplete JSON in MCP tool call arguments (#528)
When using stream_mode=["messages", "updates"] with MCP tools, tool call
arguments arrive in chunks that may be incomplete JSON (missing closing
braces). This caused JSON.parse() to throw errors in the frontend.
Changes:
- Add safeParseToolArgs() function using best-effort-json-parser to
gracefully handle incomplete JSON from streaming
- Replace direct JSON.parse() with safe parser in mergeMessage()
- Add comprehensive tests for tool call argument parsing scenarios
* Address the code review comments
* fix(llm): filter unexpected config keys to prevent LangChain warnings (#411)
Add allowlist validation for LLM configuration keys to prevent unexpected
parameters like SEARCH_ENGINE from being passed to LLM constructors.
Changes:
- Add ALLOWED_LLM_CONFIG_KEYS set with valid LLM configuration parameters
- Filter out unexpected keys before creating LLM instances
- Log clear warning messages when unexpected keys are removed
- Add unit test for configuration key filtering
This fixes the confusing LangChain warning "WARNING! SEARCH_ENGINE is not
default parameter. SEARCH_ENGINE was transferred to model_kwargs" that
occurred when users accidentally placed configuration keys in wrong sections
of conf.yaml.
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Add a new "analysis" step type to handle reasoning and synthesis tasks
that don't require code execution, addressing the concern that routing
all non-search tasks to the coder agent was inappropriate.
Changes:
- Add ANALYSIS enum value to StepType in planner_model.py
- Create analyst_node for pure LLM reasoning without tools
- Update graph routing to route analysis steps to analyst agent
- Add analyst agent to AGENT_LLM_MAP configuration
- Create analyst prompts (English and Chinese)
- Update planner prompts with guidance on choosing between
analysis (reasoning/synthesis) and processing (code execution)
- Change default step_type inference from "processing" to "analysis"
when need_search=false
Co-authored-by: Willem Jiang <143703838+willem-bd@users.noreply.github.com>
* fix: revert the part of patch of issue-710 to extract the content from the plan
* Upgrade the ddgs for the new compatible version
* Upgraded langchain to 1.1.0
updated langchain related package to the new compatable version
* Update pyproject.toml
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fix: apply context compression to prevent token overflow (Issue #721)
- Add token_limit configuration to conf.yaml.example for BASIC_MODEL and REASONING_MODEL
- Implement context compression in _execute_agent_step() before agent invocation
- Preserve first 3 messages (system prompt + context) during compression
- Enhance ContextManager logging with better token count reporting
- Prevent 400 Input tokens exceeded errors by automatically compressing message history
* feat: add model-based token limit inference for Issue #721
- Add smart default token limits based on common LLM models
- Support model name inference when token_limit not explicitly configured
- Models include: OpenAI (GPT-4o, GPT-4, etc.), Claude, Gemini, Doubao, DeepSeek, etc.
- Conservative defaults prevent token overflow even without explicit configuration
- Priority: explicit config > model inference > safe default (100,000 tokens)
- Ensures Issue #721 protection for all users, not just those with token_limit set
* fix: Missing Required Fields in Plan Validation
* fix: the exception of plan validation
* Fixed the test errors
* Addressed the comments of the PR reviews
* fix: multiple web_search ToolMessages only showing last result
* fix: Missing Required Fields in Plan Validation
* fix: the exception of plan validation
* Fixed the test errors
* Addressed the comments of the PR reviews
* fix: the crawling error when encountering PDF URLs
* Added the unit test for the new feature of crawl tool
* fix: address the code review problems
* fix: address the code review problems
* fix: the validation Error with qwen-max-latest Model
- Added comprehensive unit tests in tests/unit/graph/test_nodes.py for the new extract_plan_content function
- Tests cover various input types: string, AIMessage, dictionary, other types
- Includes a specific test case for issue #703 with the qwen-max-latest model
- All tests pass successfully, confirming the function handles different input types correctly
* feat: address the code review concerns
* fix: ensure researcher agent uses web search tool instead of generating URLs (#702)
- Add enforce_researcher_search configuration option (default: True) to control web search requirement
- Strengthen researcher prompts in both English and Chinese with explicit instructions to use web_search tool
- Implement validate_web_search_usage function to detect if web search tool was used during research
- Add validation logic that warns when researcher doesn't use web search tool
- Enhance logging for web search tools with special markers for easy tracking
- Skip validation during unit tests to avoid test failures
- Update _execute_agent_step to accept config parameter for proper configuration access
This addresses issue #702 where the researcher agent was generating URLs on its own instead of using the web search tool.
* fix: addressed the code review comment
* fix the unit test error and update the code
* feat: enable ppt_composer.zh_CN.md with request.locale
* fix: GeneratePPTRequest miss locale field
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* feat: 兼容使用的模型不支持json结构化输出的情况
* fix: add explicit validation that the response content is valid JSON before proceeding to parse it
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* refactor: Welcome layout and conditional rendering
Improves flex layout and spacing in ConversationStarter, and updates MessagesBlock to conditionally render ConversationStarter or MessageListView based on chat state. This streamlines the UI and removes redundant rendering logic.
* fix: replay mode
* fix: Remove unnecessary inset-0
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
This commit addresses issue #682 by providing clear documentation on how to view complete model output and debug DeerFlow workflows.
Changes:
- Add new DEBUGGING.md guide with detailed instructions for:
- Viewing complete model output
- Enabling debug logging
- Configuring LangChain verbose logging
- Setting up LangSmith tracing
- Docker Compose debugging tips
- Common troubleshooting scenarios
- Update .env.example with:
- Clearer comments for DEBUG setting
- Documentation for LANGCHAIN_VERBOSE and LANGCHAIN_DEBUG options
- Improved LangSmith configuration guidance
- Enhance docs/FAQ.md with:
- How to view complete model output
- How to enable debug logging
- How to troubleshoot common issues
- Links to the new debugging guide
These documentation improvements make it easier for users to:
- Debug workflow issues
- View LLM prompts and responses
- Troubleshoot deployment problems
- Monitor performance with LangSmith
Fixes#682
* feat: add edit and refresh functionality for MCP servers in settings tab
* feat: fix lint error and enhance MCP server dialog with validation and error handling
* fix: add missing newline at the end of en.json file
* feat: only refreshing specific servers
* feat: add validation messages for MCP server configuration and improve server update logic
Add defensive checks before removeChild to prevent 'Failed to execute removeChild' error when the element has already been removed from DOM. Wrap URL.revokeObjectURL in finally block to ensure proper resource cleanup.
* fix: presever the local setting between frontend and backend
* Added unit test for the state preservation
* fix: passing the locale to the agent call
* fix: apply the fix after code review
* security: add log injection attack prevention with input sanitization
- Created src/utils/log_sanitizer.py to sanitize user-controlled input before logging
- Prevents log injection attacks using newlines, tabs, carriage returns, etc.
- Escapes dangerous characters: \n, \r, \t, \0, \x1b
- Provides specialized functions for different input types:
- sanitize_log_input: general purpose sanitization
- sanitize_thread_id: for user-provided thread IDs
- sanitize_user_content: for user messages (more aggressive truncation)
- sanitize_agent_name: for agent identifiers
- sanitize_tool_name: for tool names
- sanitize_feedback: for user interrupt feedback
- create_safe_log_message: template-based safe message creation
- Updated src/server/app.py to sanitize all user input in logging:
- Thread IDs from request parameter
- Message content from user
- Agent names and node information
- Tool names and feedback
- Updated src/agents/tool_interceptor.py to sanitize:
- Tool names during execution
- User feedback during interrupt handling
- Tool input data
- Added 29 comprehensive unit tests covering:
- Classic newline injection attacks
- Carriage return injection
- Tab and null character injection
- HTML/ANSI escape sequence injection
- Combined multi-character attacks
- Truncation and length limits
Fixes potential log forgery vulnerability where malicious users could inject
fake log entries via unsanitized input containing control characters.
* fix: make SSE buffer size configurable to prevent overflow during multi-round searches (Issue #664)
- Add NEXT_PUBLIC_MAX_STREAM_BUFFER_SIZE environment variable for frontend SSE stream buffer
- Default to 1MB for backward compatibility, users can increase to 5-10MB for large searches
- Enhance error message with actual buffer sizes and guidance on configuration
- Add validation schema in env.js with positive integer requirement
- Document configuration in .env.example with clear examples and use cases
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
- Add unescape for \{ and \} characters in unescapeMarkdownSpecialChars()
- These are commonly used in LaTeX commands like \mathcal{F}
- Add test cases for Fourier transform notation and mixed escape scenarios
- All 118 tests pass including 4 new edge case tests for issue #608
* feat: implement tool-specific interrupts for create_react_agent (#572)
Add selective tool interrupt capability allowing interrupts before specific tools
rather than all tools. Users can now configure which tools trigger interrupts via
the interrupt_before_tools parameter.
Changes:
- Create ToolInterceptor class to handle tool-specific interrupt logic
- Add interrupt_before_tools parameter to create_agent() function
- Extend Configuration with interrupt_before_tools field
- Add interrupt_before_tools to ChatRequest API
- Update nodes.py to pass interrupt configuration to agents
- Update app.py workflow to support tool interrupt configuration
- Add comprehensive unit tests for tool interceptor
Features:
- Selective tool interrupts: interrupt only specific tools by name
- Approval keywords: recognize user approval (approved, proceed, accept, etc.)
- Backward compatible: optional parameter, existing code unaffected
- Flexible: works with default tools and MCP-powered tools
- Works with existing resume mechanism for seamless workflow
Example usage:
request = ChatRequest(
messages=[...],
interrupt_before_tools=['db_tool', 'sensitive_api']
)
* test: add comprehensive integration tests for tool-specific interrupts (#572)
Add 24 integration tests covering all aspects of the tool interceptor feature:
Test Coverage:
- Agent creation with tool interrupts
- Configuration support (with/without interrupts)
- ChatRequest API integration
- Multiple tools with selective interrupts
- User approval/rejection flows
- Tool wrapping and functionality preservation
- Error handling and edge cases
- Approval keyword recognition
- Complex tool inputs
- Logging and monitoring
All tests pass with 100% coverage of tool interceptor functionality.
Tests verify:
✓ Selective tool interrupts work correctly
✓ Only specified tools trigger interrupts
✓ Non-matching tools execute normally
✓ User feedback is properly parsed
✓ Tool functionality is preserved after wrapping
✓ Error handling works as expected
✓ Configuration options are properly respected
✓ Logging provides useful debugging info
* fix: mock get_llm_by_type in agent creation test
Fix test_agent_creation_with_tool_interrupts which was failing because
get_llm_by_type() was being called before create_react_agent was mocked.
Changes:
- Add mock for get_llm_by_type in test
- Use context manager composition for multiple patches
- Test now passes and validates tool wrapping correctly
All 24 integration tests now pass successfully.
* refactor: use mock assertion methods for consistent and clearer error messages
Update integration tests to use mock assertion methods instead of direct
attribute checking for consistency and clearer error messages:
Changes:
- Replace 'assert mock_interrupt.called' with 'mock_interrupt.assert_called()'
- Replace 'assert not mock_interrupt.called' with 'mock_interrupt.assert_not_called()'
Benefits:
- Consistent with pytest-mock and unittest.mock best practices
- Clearer error messages when assertions fail
- Better IDE autocompletion support
- More professional test code
All 42 tests pass with improved assertion patterns.
* refactor: use default_factory for interrupt_before_tools consistency
Improve consistency between ChatRequest and Configuration implementations:
Changes:
- ChatRequest.interrupt_before_tools: Use Field(default_factory=list) instead of Optional[None]
- Remove unnecessary 'or []' conversion in app.py line 505
- Aligns with Configuration.interrupt_before_tools implementation pattern
- No functional changes - all tests still pass
Benefits:
- Consistent field definition across codebase
- Simpler and cleaner code
- Reduced chance of None/empty list bugs
- Better alignment with Pydantic best practices
All 42 tests passing.
* refactor: improve tool input formatting in interrupt messages
Enhance tool input representation for better readability in interrupt messages:
Changes:
- Add json import for better formatting
- Create _format_tool_input() static method with JSON serialization
- Use JSON formatting for dicts, lists, tuples with indent=2
- Fall back to str() for non-serializable types
- Handle None input specially (returns 'No input')
- Improve interrupt message formatting with better spacing
Benefits:
- Complex tool inputs now display as readable JSON
- Nested structures are properly indented and visible
- Better user experience when reviewing tool inputs before approval
- Handles edge cases gracefully with fallbacks
- Improved logging output for debugging
Example improvements:
Before: {'query': 'SELECT...', 'limit': 10, 'nested': {'key': 'value'}}
After:
{
"query": "SELECT...",
"limit": 10,
"nested": {
"key": "value"
}
}
All 42 tests still passing.
* test: add comprehensive unit tests for tool input formatting
* fix: improve config loading resilience for non-localhost access (#510)
- Add DEFAULT_CONFIG fallback to always return valid config even if fetch fails
- Implement retry logic with exponential backoff (max 2 retries) to handle transient failures
- Add 5-second fetch timeout to prevent hanging on unreachable backends
- Improve error logging with clear messages about config fetch status
- Always return DeerFlowConfig (never null) to prevent UI rendering issues
- Add safety checks in input-box component to verify reasoning models before access
- Improve type safety: verify array length before accessing array indices
- Add comprehensive documentation in .env.example with examples for different deployment scenarios
- Document NEXT_PUBLIC_API_URL variable behavior and fallback mechanism
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fix: add nullish coalescing to prevent TypeScript error in input-box
- Add ?? operator to handle potential undefined value when accessing reasoning[0]
- Fixes TS2322 error: Type 'string | undefined' is not assignable to type 'string | number | Date'
---------
Co-authored-by: Willem Jiang <143703838+willem-bd@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fix: handle [ACCEPTED] feedback gracefully without TypeError in plan review (#607)
- Add explicit None/empty feedback check to prevent processing None values
- Normalize feedback string once using strip().upper() instead of repeated calls
- Replace TypeError exception with graceful fallback to planner node
- Handle invalid feedback formats by logging warning and returning to planner
- Maintain backward compatibility for '[ACCEPTED]' and '[EDIT_PLAN]' formats
- Add test cases for None feedback, empty string feedback, and invalid formats
- Update existing test to verify graceful handling instead of exception raising
* Update src/graph/nodes.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fix: resolve issue #588 - react key warnings from duplicate message IDs + establish jest testing framework
* Update the makefile and workflow with the js test
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
- Implement index-based grouping of tool call chunks in _process_tool_call_chunks()
- Add _validate_tool_call_chunks() for debug logging and validation
- Enhance _process_message_chunk() with tool call ID validation and boundary detection
- Add comprehensive unit tests (17 tests) for tool call chunk processing
- Fix issue where tool names were incorrectly concatenated (e.g., 'web_searchweb_search')
- Ensure chunks from different tool calls (different indices) remain properly separated
- Add detailed logging for debugging tool call streaming issues
* update the code with suggestions of reviewing
* fix: resolve issue #650 - repair missing step_type fields in Plan validation
- Add step_type repair logic to validate_and_fix_plan() to auto-infer missing step_type
- Infer as 'research' when need_search=true, 'processing' when need_search=false
- Add explicit CRITICAL REQUIREMENT section to planner.md emphasizing step_type mandatory for every step
- Include validation checklist and examples showing both research and processing steps
- Add 23 comprehensive unit tests for validate_and_fix_plan() covering all scenarios
- Add 4 integration tests specifically for Issue #650 with actual Plan validation
- Prevents Pydantic ValidationError: 'Field required' for missing step_type
* Update tests/unit/graph/test_plan_validation.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update tests/unit/graph/test_plan_validation.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* update the planner.zh_CN.md with recent changes of planner.md
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* feat: Add comprehensive Chinese localization support for issue #412
- Add locale parameter to ChatRequest model to capture user's language preference
- Implement language-aware template loading in template.py with fallback to English
- Update all apply_prompt_template calls to pass locale through the workflow
- Create Chinese translations for 14 core prompt files:
* Main agents: coordinator, planner, researcher, reporter, coder
* Subprocess agents: podcast_script_writer, ppt_composer, prompt_enhancer
* Writing assistant: all 6 prose prompts
- Update app.py to extract and propagate locale through workflow state
- Support both zh-CN and en-US locales with automatic fallback
- Ensure locale flows through all agent nodes and template rendering
* address the review suggestions
* fix: resolve issue #467 - message content validation and Tavily search error handling
This commit implements a comprehensive fix for issue #467 where the application
crashed with 'Field required: input.messages.3.content' error when generating reports.
## Root Cause Analysis
The issue had multiple interconnected causes:
1. Tavily tool returned mixed types (lists/error strings) instead of consistent JSON
2. background_investigation_node didn't handle error cases properly, returning None
3. Missing message content validation before LLM calls
4. Insufficient error diagnostics for content-related errors
## Changes Made
### Part 1: Fix Tavily Search Tool (tavily_search_results_with_images.py)
- Modified _run() and _arun() methods to return JSON strings instead of mixed types
- Error responses now return JSON: {"error": repr(e)}
- Successful responses return JSON string: json.dumps(cleaned_results)
- Ensures tool results always have valid string content for ToolMessages
### Part 2: Fix background_investigation_node Error Handling (graph/nodes.py)
- Initialize background_investigation_results to empty list instead of None
- Added proper JSON parsing for string responses from Tavily tool
- Handle error responses with explicit error logging
- Always return valid JSON (empty list if error) instead of None
### Part 3: Add Message Content Validation (utils/context_manager.py)
- New validate_message_content() function validates all messages before LLM calls
- Ensures all messages have content attribute and valid string content
- Converts complex types (lists, dicts) to JSON strings
- Provides graceful fallback for messages with issues
### Part 4: Enhanced Error Diagnostics (_execute_agent_step in graph/nodes.py)
- Call message validation before agent invocation
- Add detailed logging for content-related errors
- Log message types, content types, and lengths when validation fails
- Helps with future debugging of similar issues
## Testing
- All unit tests pass (395 tests)
- Python syntax verified for all modified files
- No breaking changes to existing functionality
* test: update tests for issue #467 fixes
Update test expectations to match the new implementation:
- Tavily search tool now returns JSON strings instead of mixed types
- background_investigation_node returns empty list [] for errors instead of None
- All tests updated to verify the new behavior
- All 391 tests pass successfully
* Update src/graph/nodes.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fix: support additional Tavily search parameters via configuration to fix#548
- Add include_answer, search_depth, include_raw_content, include_images, include_image_descriptions to SEARCH_ENGINE config
- Update get_web_search_tool() to load these parameters from configuration with sensible defaults
- Parameters are now properly passed to TavilySearchWithImages during initialization
- This fixes 'got an unexpected keyword argument' errors when using web_search tool
- Update tests to verify new parameters are correctly set
* test: add comprehensive unit tests for web search configuration loading
- Add test for custom configuration values (include_answer, search_depth, etc.)
- Add test for empty configuration (all defaults)
- Add test for image_descriptions logic when include_images is false
- Add test for partial configuration
- Add test for missing config file
- Add test for multiple domains in include/exclude lists
All 7 new tests pass and provide comprehensive coverage of configuration loading
and parameter handling for Tavily search tool initialization.
* test: verify all Tavily configuration parameters are optional
Add 8 comprehensive tests to verify that all Tavily engine configuration
parameters are truly optional:
- test_tavily_with_no_search_engine_section: SEARCH_ENGINE section missing
- test_tavily_with_completely_empty_config: Entire config missing
- test_tavily_with_only_include_answer_param: Single param, rest default
- test_tavily_with_only_search_depth_param: Single param, rest default
- test_tavily_with_only_include_domains_param: Domain param, rest default
- test_tavily_with_explicit_false_boolean_values: False values work correctly
- test_tavily_with_empty_domain_lists: Empty lists handled correctly
- test_tavily_all_parameters_optional_mix: Multiple missing params work
These tests verify:
- Tool creation never fails regardless of missing configuration
- All parameters have sensible defaults
- Boolean parameters can be explicitly set to False
- Any combination of optional parameters works
- Domain lists can be empty or omitted
All 15 Tavily configuration tests pass successfully.
* fix: support local models by making thought field optional in Plan model
- Make thought field optional in Plan model to fix Pydantic validation errors with local models
- Add Ollama configuration example to conf.yaml.example
- Update documentation to include local model support
- Improve planner prompt with better JSON format requirements
Fixes local model integration issues where models like qwen3:14b would fail
due to missing thought field in JSON output.
* feat: Add intelligent clarification feature for research queries
- Add multi-turn clarification process to refine vague research questions
- Implement three-dimension clarification standard (Tech/App, Focus, Scope)
- Add clarification state management in coordinator node
- Update coordinator prompt with detailed clarification guidelines
- Add UI settings to enable/disable clarification feature (disabled by default)
- Update workflow to handle clarification rounds recursively
- Add comprehensive test coverage for clarification functionality
- Update documentation with clarification feature usage guide
Key components:
- src/graph/nodes.py: Core clarification logic and state management
- src/prompts/coordinator.md: Detailed clarification guidelines
- src/workflow.py: Recursive clarification handling
- web/: UI settings integration
- tests/: Comprehensive test coverage
- docs/: Updated configuration guide
* fix: Improve clarification conversation continuity
- Add comprehensive conversation history to clarification context
- Include previous exchanges summary in system messages
- Add explicit guidelines for continuing rounds in coordinator prompt
- Prevent LLM from starting new topics during clarification
- Ensure topic continuity across clarification rounds
Fixes issue where LLM would restart clarification instead of building upon previous exchanges.
* fix: Add conversation history to clarification context
* fix: resolve clarification feature message to planer, prompt, test issues
- Optimize coordinator.md prompt template for better clarification flow
- Simplify final message sent to planner after clarification
- Fix API key assertion issues in test_search.py
* fix: Add configurable max_clarification_rounds and comprehensive tests
- Add max_clarification_rounds parameter for external configuration
- Add comprehensive test cases for clarification feature in test_app.py
- Fixes issues found during interactive mode testing where:
- Recursive call failed due to missing initial_state parameter
- Clarification exited prematurely at max rounds
- Incorrect logging of max rounds reached
* Move clarification tests to test_nodes.py and add max_clarification_rounds to zh.json
* fix: add max_clarification_rounds parameter passing from frontend to backend
- Add max_clarification_rounds parameter in store.ts sendMessage function
- Add max_clarification_rounds type definition in chat.ts
- Ensure frontend settings page clarification rounds are correctly passed to backend
* fix: refine clarification workflow state handling and coverage
- Add clarification history reconstruction
- Fix clarified topic accumulation
- Add clarified_research_topic state field
- Preserve clarification state in recursive calls
- Add comprehensive test coverage
* refactor: optimize coordinator logic and type annotations
- Simplify handoff topic logic in coordinator_node
- Update type annotations from Tuple to tuple
- Improve code readability and maintainability
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix: ensure web search is performed for research plans to fix#535
When using certain models (DeepSeek-V3, Qwen3, or local deployments), the
agent framework failed to trigger web search tools, resulting in hallucinated
data. This fix implements multiple safeguards:
1. Add enforce_web_search configuration flag:
- New config option to mandate web search in research plans
- Defaults to False for backward compatibility
2. Add plan validation function validate_and_fix_plan():
- Validates that plans include at least one research step with web search
- Enforces web search requirement when enabled
- Adds default research step if plan has no steps
3. Enhance coordinator_node fallback logic:
- When model fails to call tools, fallback to planner instead of __end__
- Ensures workflow continues even when tool calling fails
- Logs detailed diagnostic info for debugging
4. Update prompts for stricter requirements:
- planner.md: Add MANDATORY web search requirement and clear warnings
- coordinator.md: Add CRITICAL tool calling requirement
- Emphasize consequences of missing web search (hallucinated data)
5. Update tests to reflect new behavior:
- test_coordinator_node_no_tool_calls: Expect planner instead of __end__
- test_coordinator_empty_llm_response_corner_case: Same expectation
Fixes#535 by ensuring:
- Web search is always performed for research tasks
- Workflow doesn't terminate on tool calling failures
- Models with poor tool calling support can still proceed
- No hallucinated data without real information gathering
* Update src/graph/nodes.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update src/graph/nodes.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* accept the review suggestion of getting configuration
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
When editing reports, tiptap-markdown escapes special characters (*, _, [, ])
which corrupts LaTeX formulas. This fix:
1. Adds unescapeLatexInMath() function to reverse markdown escaping within
math delimiters ($...$ and 94410...94410)
2. Applies the unescape function in the editor's onChange callback to clean
the markdown before storing it
3. Adds comprehensive tests covering edge cases and round-trip scenarios
The fix ensures formulas like $(f * g)[n]$ remain unescaped when editing,
preventing display errors after save/reload.
Keep fixing #631
This pull request updates the crawl_tool function to return its results as a JSON string instead of a dictionary, and adjusts the unit tests accordingly to handle the new return type. The changes ensure consistent serialization of output and proper validation in tests.
- Change image result type from 'image' to 'image_url' to match OpenAI API expectations
- Wrap image URL in dict structure: {"url": "..."} instead of plain string
- Update SearchResultPostProcessor to handle dict-based image_url during duplicate removal
- Update tests to validate new image format
This fixes the 400 error: Invalid value: 'image'. Supported values are: 'text', 'image_url'...
Co-authored-by: Willem Jiang <143703838+willem-bd@users.noreply.github.com>
- Backend: Convert non-string content (lists, dicts) to JSON strings in _create_event_stream_message to ensure frontend always receives string content
- Frontend: Add type guard before calling startsWith() on toolCall.result for defensive programming
This fixes the TypeError: toolCall.result.startsWith is not a function when tools return complex objects.
Fixes#570 where browser freezes when research plan has 8+ steps.
Performance optimizations:
- Add animation throttling: only animate first 10 activity items
- Reduce animation durations (0.4s → 0.3s for activities, 0.2s → 0.15s for results)
- Remove scale animations (GPU-intensive) from search results
- Limit displayed results (20 pages, 10 images max)
- Add conditional animations based on item index
- Cap animation delays to prevent excessive staggering
- Add React.memo to ActivityMessage and ActivityListItem components
These changes significantly improve performance when rendering multiple
research steps while maintaining visual appeal for smaller lists.
* fix: add missing RunnableConfig parameter to human_feedback_node
This fixes issue #569 where interrupt() was being called outside of a runnable context.
The human_feedback_node was missing the config: RunnableConfig parameter that all other
node functions have, which caused RuntimeError when interrupt() tried to access the config.
- Add config: RunnableConfig parameter to function signature
- Add State type annotation to state parameter for consistency
- Maintains LangGraph execution context required by interrupt()
* test: update human_feedback_node tests to pass RunnableConfig parameter
Update all test functions that call human_feedback_node to include the new
required config parameter. These tests were failing because they were not
providing the RunnableConfig argument after the fix to add proper LangGraph
execution context.
Tests updated:
- test_human_feedback_node_auto_accepted
- test_human_feedback_node_edit_plan
- test_human_feedback_node_accepted
- test_human_feedback_node_invalid_interrupt
- test_human_feedback_node_json_decode_error_first_iteration
- test_human_feedback_node_json_decode_error_second_iteration
- test_human_feedback_node_not_enough_context
All tests now pass the mock_config fixture to human_feedback_node.
- Wrap agent.ainvoke() calls in try-except blocks
- Log full exception tracebacks for better debugging
- Return detailed error messages to users instead of generic 'internal error'
- Include step title and agent name in error context
- Allow workflow to continue gracefully when agent execution fails
- Store error details in observations for audit trail
* fix: prevent repeated content animation during thinking streaming (#614)
- Implement chunked rendering using reasoningContentChunks
- Static content (previous chunks) renders without animation
- Only current streaming chunk animates
- Disable animation on plan content (title, thought, steps) during streaming
- Animation applies after content finishes streaming (when complete)
- Prevents visual duplication of repeated sentences in thinking process
- Changed key from question text to combination of index and question text
- Ensures unique keys even if translation has duplicate questions
- Resolves React warning: 'Each child in a list should have a unique key prop'
- Set asyncio.WindowsSelectorEventLoopPolicy() on Windows at app module level
- Ensures psycopg can run in async mode on Windows regardless of entry point
- Fixes 'ProactorEventLoop' error when using PostgreSQL checkpointer
- Works with all entry points: server.py, uvicorn, langgraph dev, etc.
Added 'node --test tests/*.test.ts' to the lint-frontend target to ensure
frontend unit tests are run as part of the CI/quality checks workflow.
This ensures:
- Math formula normalization tests run before build
- Tests are validated alongside linting and type checking
- All 19 frontend tests pass before deployment
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Test files use .ts extensions in imports for Node's native test runner
compatibility, which conflicts with TypeScript's default behavior.
Excluding test files from tsconfig allows tests to run with Node while
maintaining strict type checking for the main codebase.
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
This fix addresses the issue where math formulas become corrupted or
incorrectly displayed after editing the generated report in the editor.
**Root Cause:**
The issue occurred due to incompatibility between markdown processing
in the display component and the Tiptap editor:
1. Display component used \[\] and \(\) LaTeX delimiters
2. Tiptap Mathematics extension expects $ and 70868 delimiters
3. tiptap-markdown didn't have built-in math node serialization
4. Math syntax was lost/corrupted during editor save operations
**Solution Implemented:**
1. Created MathematicsWithMarkdown extension that adds markdown
serialization support to Tiptap's Mathematics nodes
2. Added math delimiter normalization functions:
- normalizeMathForEditor(): Converts LaTeX delimiters to $/70868
- normalizeMathForDisplay(): Standardizes all delimiters to 70868
3. Updated Markdown component to use new normalization
4. Updated ReportEditor to normalize content before loading
**Changes:**
- web/src/components/editor/math-serializer.ts (new)
- web/src/components/editor/extensions.tsx
- web/src/components/editor/index.tsx
- web/src/components/deer-flow/markdown.tsx
- web/src/core/utils/markdown.ts
- web/tests/markdown-math-editor.test.ts (new)
- web/tests/markdown-katex.test.ts
**Testing:**
- Added 15 comprehensive tests for math normalization round-trip
- All tests passing (math editor + existing katex tests)
- Verified TypeScript compilation and linting
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Bug Fix
This PR fixes the issue where max_clarification_rounds parameter was not being passed from the frontend to the backend, causing a TypeError: '<' not supported between instances of 'int' and 'NoneType' error.
Technical Details
The issue was that the frontend was not passing the max_clarification_rounds parameter to the backend API, causing the backend to receive None values and fail during comparison operations. This fix ensures the parameter is properly typed and passed through the entire request chain.
* fix: support local models by making thought field optional in Plan model
- Make thought field optional in Plan model to fix Pydantic validation errors with local models
- Add Ollama configuration example to conf.yaml.example
- Update documentation to include local model support
- Improve planner prompt with better JSON format requirements
Fixes local model integration issues where models like qwen3:14b would fail
due to missing thought field in JSON output.
* feat: Add intelligent clarification feature for research queries
- Add multi-turn clarification process to refine vague research questions
- Implement three-dimension clarification standard (Tech/App, Focus, Scope)
- Add clarification state management in coordinator node
- Update coordinator prompt with detailed clarification guidelines
- Add UI settings to enable/disable clarification feature (disabled by default)
- Update workflow to handle clarification rounds recursively
- Add comprehensive test coverage for clarification functionality
- Update documentation with clarification feature usage guide
Key components:
- src/graph/nodes.py: Core clarification logic and state management
- src/prompts/coordinator.md: Detailed clarification guidelines
- src/workflow.py: Recursive clarification handling
- web/: UI settings integration
- tests/: Comprehensive test coverage
- docs/: Updated configuration guide
* fix: Improve clarification conversation continuity
- Add comprehensive conversation history to clarification context
- Include previous exchanges summary in system messages
- Add explicit guidelines for continuing rounds in coordinator prompt
- Prevent LLM from starting new topics during clarification
- Ensure topic continuity across clarification rounds
Fixes issue where LLM would restart clarification instead of building upon previous exchanges.
* fix: Add conversation history to clarification context
* fix: resolve clarification feature message to planer, prompt, test issues
- Optimize coordinator.md prompt template for better clarification flow
- Simplify final message sent to planner after clarification
- Fix API key assertion issues in test_search.py
* fix: Add configurable max_clarification_rounds and comprehensive tests
- Add max_clarification_rounds parameter for external configuration
- Add comprehensive test cases for clarification feature in test_app.py
- Fixes issues found during interactive mode testing where:
- Recursive call failed due to missing initial_state parameter
- Clarification exited prematurely at max rounds
- Incorrect logging of max rounds reached
* Move clarification tests to test_nodes.py and add max_clarification_rounds to zh.json
- Make thought field optional in Plan model to fix Pydantic validation errors with local models
- Add Ollama configuration example to conf.yaml.example
- Update documentation to include local model support
- Improve planner prompt with better JSON format requirements
Fixes local model integration issues where models like qwen3:14b would fail
due to missing thought field in JSON output.
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* feat:Add context compress
* feat: Add unit test
* feat: add unit test for context manager
* feat: add postprocessor param && code format
* feat: add configuration guide
* fix: fix the configuration_guide
* fix: fix the unit test
* fix: fix the default value
* feat: add test and log for context_manager
* add searx/searxng support
* nit
* Fix indentation in search.py for readability
* Clean up imports in search.py
Removed unused imports from search.py
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* feat:support config tavily search results
* feat: support config tavily search results
* feat: update the default value of include_images
* fix: fix the test
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* Debug deerflow server, web with vscode
Signed-off-by: shjy <asdf_0403@qq.com>
* removed the duplicated setting of .vscode
---------
Signed-off-by: shjy <asdf_0403@qq.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* feat: add Google AI Studio API support with platform-based detection
* chore: update configuration_guide.md
* fix the uv.lock formate error
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* feat: Implement MilvusRetriever with embedding model and resource management
* chore: Update configuration and loader files for consistency
* chore: Clean up test_milvus.py for improved readability and organization
* feat: Add tests for DashscopeEmbeddings query and document embedding methods
* feat: Add tests for embedding model initialization and example file loading in MilvusProvider
* chore: Remove unused imports and clean up test_milvus.py for better readability
* chore: Clean up test_milvus.py for improved readability and organization
* chore: Clean up test_milvus.py for improved readability and organization
* fix: replace print statements with logging in recursion limit function
* Implement feature X to enhance user experience and optimize performance
* refactor: clean up unused imports and comments in AboutTab component
* Implement feature X to enhance user experience and fix bug Y in module Z
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* docs: add deployment note for Linux servers
- allow external connections by changing the host to 0.0.0.0.
- security warning to remind users to secure their environment before exposing the service.
* docs: add deployment note for Linux servers
- allow external connections by changing the host to 0.0.0.0.
- security warning to remind users to secure their environment before exposing the service.
* fix: using mongomock for the checkpoint test
* Add postgres mock setting to the unit test
* Added utils file of postgres_mock_utils
* fixed the runtime loading error of deerflow server
* fix: correct typo in MongoDB connection string within .env.example
- Changes 'ongodb' to 'mongodb' in LANGGRAPH_CHECKPOINT_DB_URL example.
- This ensures the example configuration is valid and can be used directly.
* Update LANGGRAPH_CHECKPOINT_DB_URL in .env.example
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix: using commit hash as the action version
* fix: using commit hash as the action version
---------
Co-authored-by: Willem Jiang <143703838+willem-bd@users.noreply.github.com>
* feat: Enhance chat streaming and tool call processing
- Added support for MongoDB checkpointer in the chat streaming workflow.
- Introduced functions to process tool call chunks and sanitize arguments.
- Improved event message creation with additional metadata.
- Enhanced error handling for JSON serialization in event messages.
- Updated the frontend to convert escaped characters in tool call arguments.
- Refactored the workflow input preparation and initial message processing.
- Added new dependencies for MongoDB integration and tool argument sanitization.
* fix: Update MongoDB checkpointer configuration to use LANGGRAPH_CHECKPOINT_DB_URL
* feat: Add support for Postgres checkpointing and update README with database recommendations
* feat: Implement checkpoint saver functionality and update MongoDB connection handling
* refactor: Improve code formatting and readability in app.py and json_utils.py
* refactor: Clean up commented code and improve formatting in server.py
* refactor: Remove unused imports and improve code organization in app.py
* refactor: Improve code organization and remove unnecessary comments in app.py
* chore: use langgraph-checkpoint-postgres==2.0.21 to avoid the JSON convert issue in the latest version, implement chat stream persistant with Postgres
* feat: add MongoDB and PostgreSQL support for LangGraph checkpointing, enhance environment variable handling
* fix: update comments for clarity on Windows event loop policy
* chore: remove empty code changes in MongoDB and PostgreSQL checkpoint tests
* chore: clean up unused imports and code in checkpoint-related files
* chore: remove empty code changes in test_checkpoint.py
* chore: remove empty code changes in test_checkpoint.py
* chore: remove empty code changes in test_checkpoint.py
* test: update status code assertions in MCP endpoint tests to allow for 403 responses
* test: update MCP endpoint tests to assert specific status codes and enable MCP server configuration
* chore: remove unnecessary environment variables from unittest workflow
* fix: invert condition for MCP server configuration check to raise 403 when disabled
* chore: remove pymongo from test dependencies in uv.lock
* chore: optimize the _get_agent_name method
* test: enhance ChatStreamManager tests for PostgreSQL and MongoDB initialization
* test: add persistence tests for ChatStreamManager with PostgreSQL and MongoDB
* test: add unit tests for ChatStreamManager initialization with PostgreSQL and MongoDB
* test: enhance persistence tests for ChatStreamManager with PostgreSQL and MongoDB to verify message aggregation
* test: add unit tests for ChatStreamManager with PostgreSQL and MongoDB
* test: add unit tests for ChatStreamManager initialization with PostgreSQL and MongoDB
* test: add unit tests for ChatStreamManager initialization with PostgreSQL and MongoDB
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix: update README and configuration guide for new model support and reasoning capabilities
* fix: format code for consistency in agent and node files
* fix: update test cases for environment variable handling in llm configuration
* fix: refactor message chunk conversion functions for improved clarity and maintainability
* refactor: remove enable_thinking parameter from LLM configuration functions
* chore: update agent-LLM mapping for consistency
* chore: update LLM configuration handling for improved clarity
* test: add unit tests for Dashscope message chunk conversion and LLM configuration
* test: add unit tests for message chunk conversion in Dashscope
* test: add unit tests for message chunk conversion in Dashscope
* chore: remove unused imports from test_dashscope.py
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix:env AGENT_RECURSION_LIMIT not work
* fix:add test
* black tests/unit/config/test_configuration.py
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* feat: add support for 'unknown' message agent in MessageListItem and Message type
* fix: update default agent name from 'unknown' to 'planner' in workflow generator
* fix: remove handling for 'unknown' agent in MessageListItem
* fix: remove 'unknown' agent from Message interface
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* feat: disable the MCP server configuation by default
* Fixed the lint and test errors
* fix the lint error
* feat:update the mcp config documents and tests
* fixed the lint errors
* fix: Error: MISSING_MESSAGE: Could not resolve chat.page` in messages for locale 'en'
Fixed a `MISSING_MESSAGE` error that was occurring on the chat page due to missing translation keys for `chat.page` in the internationalization messages.
* Update en.json
* feat(llm): Add retry mechanism for LLM API calls
* feat: configure max_retries for LLM calls via conf.yaml
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* improve: add abort btn to abort the mcp add request.
* refactor: simplify style mapping by using upper case only
* format: execute uv run black --preview . to format python files.
* Add support for self-signed certs from model providers
* cleanup
---------
Co-authored-by: tonydoesathing <tmastromarino@cpacket.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* test: add unit tests in server
* test: add unit tests of app.py in server
* test: reformat the codes
* test: add more tests to cover the exception part
* test: add more tests on the server app part
* fix: don't show the detail exception to the client
* test: try to fix the CI test
* fix: keep the TTS API call without exposure information
* Fixed the unit test errors
* Fixed the lint error
* test: added unit test of builder
* test: Add unit tests for nodes.py
* test: add more unit tests in test_nodes
* test: try to fix the unit test error on GitHub
* test: reformate the code of test_nodes.py
* Fix the test error of reset the local argument
* Fixed the test error by setup args
* reformat the code
* test: add more test on test_tts.py
* test: add unit test of search and retriever in tools
* test: remove the main code of search.py
* test: add the travily_search unit test
* reformate the codes
* test: add unit tests of tools
* Added the pytest-asyncio dependency
* added the license header of test_tavily_search_api_wrapper.py
description: End-to-end smoke test skill for DeerFlow. Guides through: 1) Pulling latest code, 2) Docker OR Local installation and deployment (user preference, default to Local if Docker network issues), 3) Service availability verification, 4) Health check, 5) Final test report. Use when the user says "run smoke test", "smoke test deployment", "verify installation", "test service availability", "end-to-end test", or similar.
---
# DeerFlow Smoke Test Skill
This skill guides the Agent through DeerFlow's full end-to-end smoke test workflow, including code updates, deployment (supporting both Docker and local installation modes), service availability verification, and health checks.
## Deployment Mode Selection
This skill supports two deployment modes:
- **Local installation mode** (recommended, especially when network issues occur) - Run all services directly on the local machine
- **Docker mode** - Run all services inside Docker containers
**Selection strategy**:
- If the user explicitly asks for Docker mode, use Docker
- If network issues occur (such as slow image pulls), automatically switch to local mode
- Default to local mode whenever possible
## Structure
```
smoke-test/
├── SKILL.md ← You are here - core workflow and logic
├── scripts/
│ ├── check_docker.sh ← Check the Docker environment
│ ├── check_local_env.sh ← Check local environment dependencies
├── report.local.template.md ← Local mode smoke test report template
└── report.docker.template.md ← Docker mode smoke test report template
```
## Standard Operating Procedure (SOP)
### Phase 1: Code Update Check
1.**Confirm current directory** - Verify that the current working directory is the DeerFlow project root
2.**Check Git status** - See whether there are uncommitted changes
3.**Pull the latest code** - Use `git pull origin main` to get the latest updates
4.**Confirm code update** - Verify that the latest code was pulled successfully
### Phase 2: Deployment Mode Selection and Environment Check
**Choose deployment mode**:
- Ask for user preference, or choose automatically based on network conditions
- Default to local installation mode
**Local mode environment check**:
1.**Check Node.js version** - Requires 22+
2.**Check pnpm** - Package manager
3.**Check uv** - Python package manager
4.**Check nginx** - Reverse proxy
5.**Check required ports** - Confirm that ports 2026, 3000, 8001, and 2024 are not occupied
**Docker mode environment check** (if Docker is selected):
1.**Check whether Docker is installed** - Run `docker --version`
2.**Check Docker daemon status** - Run `docker info`
3.**Check Docker Compose availability** - Run `docker compose version`
4.**Check required ports** - Confirm that port 2026 is not occupied
### Phase 3: Configuration Preparation
1.**Check whether config.yaml exists**
- If it does not exist, run `make config` to generate it
- If it already exists, check whether it needs an upgrade with `make config-upgrade`
2.**Check the .env file**
- Verify that required environment variables are configured
- Especially model API keys such as `OPENAI_API_KEY`
### Phase 4: Deployment Execution
**Local mode deployment**:
1.**Check dependencies** - Run `make check`
2.**Install dependencies** - Run `make install`
3.**(Optional) Pre-pull the sandbox image** - If needed, run `make setup-sandbox`
4.**Start services** - Run `make dev-daemon` (background mode, recommended) or `make dev` (foreground mode)
5.**Wait for startup** - Give all services enough time to start completely (90-120 seconds recommended)
**Docker mode deployment** (if Docker is selected):
1.**Initialize Docker environment** - Run `make docker-init`
2.**Start Docker services** - Run `make docker-start`
3.**Wait for startup** - Give all containers enough time to start completely (60 seconds recommended)
### Phase 5: Service Health Check
**Local mode health check**:
1.**Check process status** - Confirm that LangGraph, Gateway, Frontend, and Nginx processes are all running
2.**Check frontend service** - Visit `http://localhost:2026` and verify that the page loads
3.**Check API Gateway** - Verify the `http://localhost:2026/health` endpoint
4.**Check LangGraph service** - Verify the availability of relevant endpoints
5.**Frontend route smoke check** - Run `bash .agent/skills/smoke-test/scripts/frontend_check.sh` to verify key routes under `/workspace`
**Docker mode health check** (when using Docker):
1.**Check container status** - Run `docker ps` and confirm that all containers are running
2.**Check frontend service** - Visit `http://localhost:2026` and verify that the page loads
3.**Check API Gateway** - Verify the `http://localhost:2026/health` endpoint
4.**Check LangGraph service** - Verify the availability of relevant endpoints
5.**Frontend route smoke check** - Run `bash .agent/skills/smoke-test/scripts/frontend_check.sh` to verify key routes under `/workspace`
### Optional Functional Verification
1.**List available models** - Verify that model configuration loads correctly
2.**List available skills** - Verify that the skill directory is mounted correctly
3.**Simple chat test** - Send a simple message to verify the end-to-end flow
### Phase 6: Generate Test Report
1.**Collect all test results** - Summarize execution status for each phase
2.**Record encountered issues** - If anything fails, record the error details
3.**Generate the final report** - Use the template that matches the selected deployment mode to create the complete test report, including overall conclusion, detailed key test cases, and explicit frontend page / route results
4.**Provide follow-up recommendations** - Offer suggestions based on the test results
## Execution Rules
- **Follow the sequence** - Execute strictly in the order described above
- **Idempotency** - Every step should be safe to repeat
- **Error handling** - If a step fails, stop and report the issue, then provide troubleshooting suggestions
- **Detailed logging** - Record the execution result and status of each step
- **User confirmation** - Ask for confirmation before potentially risky operations such as overwriting config
- **Mode preference** - Prefer local mode to avoid network-related issues
- **Template requirement** - The final report must use the matching template under `templates/`; do not output a free-form summary instead of the template-based report
- **Report clarity** - The execution summary must include the overall pass/fail conclusion plus per-case result explanations, and frontend smoke check results must be listed explicitly in the report
- **Optional phase handling** - If functional verification is not executed, do not present it as a separate skipped phase in the final report
## Known Acceptable Warnings
The following warnings can appear during smoke testing and do not block a successful result:
- Feishu/Lark SSL errors in Gateway logs (certificate verification failure) can be ignored if that channel is not enabled
- Warnings in LangGraph logs about missing methods in the custom checkpointer, such as `adelete_for_runs` or `aprune`, do not affect the core functionality
## Key Tools
Use the following tools during execution:
1.**bash** - Run shell commands
2.**present_file** - Show generated reports and important files
3.**task_tool** - Organize complex steps with subtasks when needed
Use this file as the default operating guide for this repository. Follow it first, and only search the codebase when this file is incomplete or incorrect.
These were executed and validated in this repository.
### A. Bootstrap and install
1. Check prerequisites:
```bash
make check
```
Observed: passes when required tools are installed.
2. Install dependencies (recommended order: backend then frontend, as implemented by `make install`):
```bash
make install
```
### B. Backend CI-equivalent validation
Run from `backend/`:
```bash
make lint
make test
```
Validated results:
-`make lint`: pass (`ruff check .`)
-`make test`: pass (`277 passed, 15 warnings in ~76.6s`)
CI parity:
-`.github/workflows/backend-unit-tests.yml` runs on pull requests.
- CI executes `uv sync --group dev`, then `make lint`, then `make test` in `backend/`.
### C. Frontend validation
Run from `frontend/`.
Recommended reliable sequence:
```bash
pnpm lint
pnpm typecheck
BETTER_AUTH_SECRET=local-dev-secret pnpm build
```
Observed failure modes and workarounds:
-`pnpm build` fails without `BETTER_AUTH_SECRET` in production-mode env validation.
- Workaround: set `BETTER_AUTH_SECRET` (best) or set `SKIP_ENV_VALIDATION=1`.
- Even with `SKIP_ENV_VALIDATION=1`, Better Auth can still warn/error in logs about default secret; prefer setting a real non-default secret.
-`pnpm check` currently fails (`next lint` invocation is incompatible here and resolves to an invalid directory). Do not rely on `pnpm check`; run `pnpm lint` and `pnpm typecheck` explicitly.
Thank you for your interest in contributing to DeerFlow! This guide will help you set up your development environment and understand our development workflow.
## Development Environment Setup
We offer two development environments. **Docker is recommended** for the most consistent and hassle-free experience.
### Option 1: Docker Development (Recommended)
Docker provides a consistent, isolated environment with all dependencies pre-configured. No need to install Node.js, Python, or nginx on your local machine.
#### Prerequisites
- Docker Desktop or Docker Engine
- pnpm (for caching optimization)
#### Setup Steps
1.**Configure the application**:
```bash
# Copy example configuration
cp config.example.yaml config.yaml
# Set your API keys
export OPENAI_API_KEY="your-key-here"
# or edit config.yaml directly
```
2. **Initialize Docker environment** (first time only):
```bash
make docker-init
```
This will:
- Build Docker images
- Install frontend dependencies (pnpm)
- Install backend dependencies (uv)
- Share pnpm cache with host for faster builds
3. **Start development services**:
```bash
make docker-start
```
`make docker-start` reads `config.yaml` and starts `provisioner` only for provisioner/Kubernetes sandbox mode.
If Docker builds are slow in your network, you can override the default package registries before running `make docker-init` or `make docker-start`:
```bash
export UV_INDEX_URL=https://pypi.org/simple
export NPM_REGISTRY=https://registry.npmjs.org
```
#### Recommended host resources
Use these as practical starting points for development and review environments:
| Scenario | Starting point | Recommended | Notes |
|---------|-----------|------------|-------|
| `make dev` on one machine | 4 vCPU, 8 GB RAM | 8 vCPU, 16 GB RAM | Best when DeerFlow uses hosted model APIs. |
| `make docker-start` review environment | 4 vCPU, 8 GB RAM | 8 vCPU, 16 GB RAM | Docker image builds and sandbox containers need extra headroom. |
| Shared Linux test server | 8 vCPU, 16 GB RAM | 16 vCPU, 32 GB RAM | Prefer this for heavier multi-agent runs or multiple reviewers. |
`2 vCPU / 4 GB` environments often fail to start reliably or become unresponsive under normal DeerFlow workloads.
#### Linux: Docker daemon permission denied
If `make docker-init`, `make docker-start`, or `make docker-stop` fails on Linux with an error like below, your current user likely does not have permission to access the Docker daemon socket:
```text
unable to get image 'deer-flow-dev-langgraph': permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock
```
Recommended fix: add your current user to the `docker` group so Docker commands work without `sudo`.
1. Confirm the `docker` group exists:
```bash
getent group docker
```
2. Add your current user to the `docker` group:
```bash
sudo usermod -aG docker $USER
```
3. Apply the new group membership. The most reliable option is to log out completely and then log back in. If you want to refresh the current shell session instead, run:
```bash
newgrp docker
```
4. Verify Docker access:
```bash
docker ps
```
5. Retry the DeerFlow command:
```bash
make docker-stop
make docker-start
```
If `docker ps` still reports a permission error after `usermod`, fully log out and log back in before retrying.
#### Docker Architecture
```
Host Machine
↓
Docker Compose (deer-flow-dev)
├→ nginx (port 2026) ← Reverse proxy
├→ web (port 3000) ← Frontend with hot-reload
├→ api (port 8001) ← Gateway API with hot-reload
├→ langgraph (port 2024) ← LangGraph server with hot-reload
└→ provisioner (optional, port 8002) ← Started only in provisioner/K8s sandbox mode
```
**Benefits of Docker Development**:
- ✅ Consistent environment across different machines
- ✅ No need to install Node.js, Python, or nginx locally
- ✅ Isolated dependencies and services
- ✅ Easy cleanup and reset
- ✅ Hot-reload for all services
- ✅ Production-like environment
### Option 2: Local Development
If you prefer to run services directly on your machine:
#### Prerequisites
Check that you have all required tools installed:
```bash
make check
```
Required tools:
- Node.js 22+
- pnpm
- uv (Python package manager)
- nginx
#### Setup Steps
1. **Configure the application** (same as Docker setup above)
2. **Install dependencies** (this also sets up pre-commit hooks):
```bash
make install
```
3. **Run development server** (starts all services with nginx):
```bash
make dev
```
4. **Access the application**:
- Web Interface: http://localhost:2026
- All API requests are automatically proxied through nginx
#### Manual Service Control
If you need to start services individually:
1. **Start backend services**:
```bash
# Terminal 1: Start LangGraph Server (port 2024)
cd backend
make dev
# Terminal 2: Start Gateway API (port 8001)
cd backend
make gateway
# Terminal 3: Start Frontend (port 3000)
cd frontend
pnpm dev
```
2. **Start nginx**:
```bash
make nginx
# or directly: nginx -c $(pwd)/docker/nginx/nginx.local.conf -g 'daemon off;'
```
3. **Access the application**:
- Web Interface: http://localhost:2026
#### Nginx Configuration
The nginx configuration provides:
- Unified entry point on port 2026
- Routes `/api/langgraph/*` to LangGraph Server (2024)
- Routes other `/api/*` endpoints to Gateway API (8001)
- Routes non-API requests to Frontend (3000)
- Centralized CORS handling
- SSE/streaming support for real-time agent responses
- Optimized timeouts for long-running operations
## Project Structure
```
deer-flow/
├── config.example.yaml # Configuration template
├── extensions_config.example.json # MCP and Skills configuration template
This file is for coding agents. If the DeerFlow repository is not already cloned and open, clone `https://github.com/bytedance/deer-flow.git` first, then continue from the repository root.
## Goal
Bootstrap a DeerFlow local development workspace on the user's machine with the least risky path available.
Default preference:
1. Docker development environment
2. Local development environment
Do not assume API keys or model credentials exist. Set up everything that can be prepared safely, then stop with a concise summary of what the user still needs to provide.
## Operating Rules
- Be idempotent. Re-running this document should not damage an existing setup.
- Prefer existing repo commands over ad hoc shell commands.
- Do not use `sudo` or install system packages without explicit user approval.
- Do not overwrite existing user config values unless the user asks.
- If a step fails, stop, explain the blocker, and provide the smallest next action.
- If multiple setup paths are possible, prefer Docker when Docker is already available.
## Success Criteria
Consider the setup successful when all of the following are true:
- The DeerFlow repository is cloned and the current working directory is the repo root.
-`config.yaml` exists.
- For Docker setup, `make docker-init` completed successfully and Docker prerequisites are prepared, but services are not assumed to be running yet.
- For local setup, `make check` passed or reported no missing prerequisites, and `make install` completed successfully.
- The user receives the exact next command to launch DeerFlow.
- The user also receives any missing model configuration or referenced environment variable names from `config.yaml`, without inspecting secret-bearing files for actual values.
## Steps
- If the current directory is not the DeerFlow repository root, clone `https://github.com/bytedance/deer-flow.git` if needed, then change into the repository root.
- Confirm the current directory is the DeerFlow repository root by checking that `Makefile`, `backend/`, `frontend/`, and `config.example.yaml` exist.
- Detect whether `config.yaml` already exists.
- If `config.yaml` does not exist, run `make config`.
- Detect whether Docker is available and the daemon is reachable with `docker info`.
- If Docker is available:
- Run `make docker-init`.
- Treat this as Docker prerequisite preparation only. Do not claim that app services, compose validation, or image builds have already succeeded.
- Do not start long-running services unless the user explicitly asks or this setup request clearly includes launch verification.
- Tell the user the recommended next command is `make docker-start`.
- If Docker is not available:
- Run `make check`.
- If `make check` reports missing system dependencies such as `node`, `pnpm`, `uv`, or `nginx`, stop and report the missing tools instead of attempting privileged installs.
- If prerequisites are satisfied, run `make install`.
- Tell the user the recommended next command is `make dev`.
- Inspect `config.yaml` only for missing model entries or referenced environment variable placeholders. Do not read `.env`, `frontend/.env`, or other secret-bearing files.
- If no model is configured, tell the user they must add at least one entry under `models` in `config.yaml`.
- If `config.yaml` references variables such as `$OPENAI_API_KEY`, tell the user which variable names still need real values, but do not verify them by opening secret-bearing files.
- If the repository already appears configured, avoid repeating expensive work unless it is necessary to verify the environment.
## Verification
Use the lightest verification that matches the chosen setup path.
> Aus Open Source entstanden, an Open Source zurückgeben.
**DeerFlow** (**D**eep **E**xploration and **E**fficient **R**esearch **Flow**) ist ein Community-getriebenes Framework für tiefgehende Recherche, das auf der großartigen Arbeit der Open-Source-Community aufbaut. Unser Ziel ist es, Sprachmodelle mit spezialisierten Werkzeugen für Aufgaben wie Websuche, Crawling und Python-Code-Ausführung zu kombinieren und gleichzeitig der Community, die dies möglich gemacht hat, etwas zurückzugeben.
Besuchen Sie [unsere offizielle Website](https://deerflow.tech/) für weitere Details.
DeerFlow ist in Python entwickelt und kommt mit einer in Node.js geschriebenen Web-UI. Um einen reibungslosen Einrichtungsprozess zu gewährleisten, empfehlen wir die Verwendung der folgenden Tools:
Vereinfacht die Verwaltung von Python-Umgebungen und Abhängigkeiten. `uv` erstellt automatisch eine virtuelle Umgebung im Stammverzeichnis und installiert alle erforderlichen Pakete für Sie—keine manuelle Installation von Python-Umgebungen notwendig.
- **[`nvm`](https://github.com/nvm-sh/nvm):**
Verwalten Sie mühelos mehrere Versionen der Node.js-Laufzeit.
- **[`pnpm`](https://pnpm.io/installation):**
Installieren und verwalten Sie Abhängigkeiten des Node.js-Projekts.
### Umgebungsanforderungen
Stellen Sie sicher, dass Ihr System die folgenden Mindestanforderungen erfüllt:
- **[Python](https://www.python.org/downloads/):** Version `3.12+`
- **[Node.js](https://nodejs.org/en/download/):** Version `22+`
Optional können Sie Web-UI-Abhängigkeiten über [pnpm](https://pnpm.io/installation) installieren:
```bash
cd deer-flow/web
pnpm install
```
### Konfigurationen
Weitere Informationen finden Sie im [Konfigurationsleitfaden](docs/configuration_guide.md).
> [!HINWEIS]
> Lesen Sie den Leitfaden sorgfältig, bevor Sie das Projekt starten, und aktualisieren Sie die Konfigurationen entsprechend Ihren spezifischen Einstellungen und Anforderungen.
### Konsolen-UI
Der schnellste Weg, um das Projekt auszuführen, ist die Verwendung der Konsolen-UI.
```bash
# Führen Sie das Projekt in einer bash-ähnlichen Shell aus
uv run main.py
```
### Web-UI
Dieses Projekt enthält auch eine Web-UI, die ein dynamischeres und ansprechenderes interaktives Erlebnis bietet.
> [!HINWEIS]
> Sie müssen zuerst die Abhängigkeiten der Web-UI installieren.
```bash
# Führen Sie sowohl den Backend- als auch den Frontend-Server im Entwicklungsmodus aus
# Unter macOS/Linux
./bootstrap.sh -d
# Unter Windows
bootstrap.bat -d
```
Öffnen Sie Ihren Browser und besuchen Sie [`http://localhost:3000`](http://localhost:3000), um die Web-UI zu erkunden.
Weitere Details finden Sie im Verzeichnis [`web`](./web/).
## Unterstützte Suchmaschinen
DeerFlow unterstützt mehrere Suchmaschinen, die in Ihrer `.env`-Datei über die Variable `SEARCH_API` konfiguriert werden können:
- **Tavily** (Standard): Eine spezialisierte Such-API für KI-Anwendungen
- Erfordert `TAVILY_API_KEY` in Ihrer `.env`-Datei
- Registrieren Sie sich unter: https://app.tavily.com/home
- Anpassbare Vorlagen für maßgeschneiderte Inhalte
## Architektur
DeerFlow implementiert eine modulare Multi-Agenten-Systemarchitektur, die für automatisierte Forschung und Codeanalyse konzipiert ist. Das System basiert auf LangGraph und ermöglicht einen flexiblen zustandsbasierten Workflow, bei dem Komponenten über ein klar definiertes Nachrichtenübermittlungssystem kommunizieren.

> Sehen Sie es live auf [deerflow.tech](https://deerflow.tech/#multi-agent-architecture)
Das System verwendet einen optimierten Workflow mit den folgenden Komponenten:
1.**Koordinator**: Der Einstiegspunkt, der den Workflow-Lebenszyklus verwaltet
- Initiiert den Forschungsprozess basierend auf Benutzereingaben
- Delegiert Aufgaben bei Bedarf an den Planer
- Fungiert als primäre Schnittstelle zwischen dem Benutzer und dem System
2.**Planer**: Strategische Komponente für Aufgabenzerlegung und -planung
- Analysiert Forschungsziele und erstellt strukturierte Ausführungspläne
- Bestimmt, ob ausreichend Kontext verfügbar ist oder ob weitere Forschung benötigt wird
- Verwaltet den Forschungsablauf und entscheidet, wann der endgültige Bericht erstellt wird
3.**Forschungsteam**: Eine Sammlung spezialisierter Agenten, die den Plan ausführen:
- **Forscher**: Führt Websuchen und Informationssammlung mit Tools wie Websuchmaschinen, Crawling und sogar MCP-Diensten durch.
- **Codierer**: Behandelt Codeanalyse, -ausführung und technische Aufgaben mit dem Python REPL Tool.
Jeder Agent hat Zugriff auf spezifische Tools, die für seine Rolle optimiert sind, und operiert innerhalb des LangGraph-Frameworks
4.**Reporter**: Endphasenprozessor für Forschungsergebnisse
- Aggregiert Erkenntnisse vom Forschungsteam
- Verarbeitet und strukturiert die gesammelten Informationen
- Erstellt umfassende Forschungsberichte
## Text-zu-Sprache-Integration
DeerFlow enthält jetzt eine Text-zu-Sprache (TTS)-Funktion, mit der Sie Forschungsberichte in Sprache umwandeln können. Diese Funktion verwendet die volcengine TTS API, um hochwertige Audios aus Text zu generieren. Funktionen wie Geschwindigkeit, Lautstärke und Tonhöhe können ebenfalls angepasst werden.
### Verwendung der TTS API
Sie können auf die TTS-Funktionalität über den Endpunkt `/api/tts` zugreifen:
```bash
# Beispiel API-Aufruf mit curl
curl --location 'http://localhost:8000/api/tts'\
--header 'Content-Type: application/json'\
--data '{
"text": "Dies ist ein Test der Text-zu-Sprache-Funktionalität.",
"speed_ratio": 1.0,
"volume_ratio": 1.0,
"pitch_ratio": 1.0
}'\
--output speech.mp3
```
## Entwicklung
### Testen
Führen Sie die Testsuite aus:
```bash
# Alle Tests ausführen
make test
# Spezifische Testdatei ausführen
pytest tests/integration/test_workflow.py
# Mit Abdeckung ausführen
make coverage
```
### Codequalität
```bash
# Lint ausführen
make lint
# Code formatieren
make format
```
### Debugging mit LangGraph Studio
DeerFlow verwendet LangGraph für seine Workflow-Architektur. Sie können LangGraph Studio verwenden, um den Workflow in Echtzeit zu debuggen und zu visualisieren.
#### LangGraph Studio lokal ausführen
DeerFlow enthält eine `langgraph.json`-Konfigurationsdatei, die die Graphstruktur und Abhängigkeiten für das LangGraph Studio definiert. Diese Datei verweist auf die im Projekt definierten Workflow-Graphen und lädt automatisch Umgebungsvariablen aus der `.env`-Datei.
##### Mac
```bash
# Installieren Sie den uv-Paketmanager, wenn Sie ihn noch nicht haben
curl -LsSf https://astral.sh/uv/install.sh | sh
# Installieren Sie Abhängigkeiten und starten Sie den LangGraph-Server
Um diese Beispiele auszuführen oder Ihre eigenen Forschungsberichte zu erstellen, können Sie die folgenden Befehle verwenden:
```bash
# Mit einer spezifischen Anfrage ausführen
uv run main.py "Welche Faktoren beeinflussen die KI-Adoption im Gesundheitswesen?"
# Mit benutzerdefinierten Planungsparametern ausführen
uv run main.py --max_plan_iterations 3 "Wie wirkt sich Quantencomputing auf die Kryptographie aus?"
# Im interaktiven Modus mit eingebauten Fragen ausführen
uv run main.py --interactive
# Oder mit grundlegendem interaktiven Prompt ausführen
uv run main.py
# Alle verfügbaren Optionen anzeigen
uv run main.py --help
```
### Interaktiver Modus
Die Anwendung unterstützt jetzt einen interaktiven Modus mit eingebauten Fragen in Englisch und Chinesisch:
1. Starten Sie den interaktiven Modus:
```bash
uv run main.py --interactive
```
2. Wählen Sie Ihre bevorzugte Sprache (English oder 中文)
3. Wählen Sie aus einer Liste von eingebauten Fragen oder wählen Sie die Option, Ihre eigene Frage zu stellen
4. Das System wird Ihre Frage verarbeiten und einen umfassenden Forschungsbericht generieren
### Mensch-in-der-Schleife
DeerFlow enthält einen Mensch-in-der-Schleife-Mechanismus, der es Ihnen ermöglicht, Forschungspläne vor ihrer Ausführung zu überprüfen, zu bearbeiten und zu genehmigen:
1. **Planüberprüfung**: Wenn Mensch-in-der-Schleife aktiviert ist, präsentiert das System den generierten Forschungsplan zur Überprüfung vor der Ausführung
2. **Feedback geben**: Sie können:
- Den Plan akzeptieren, indem Sie mit `[ACCEPTED]` antworten
- Den Plan bearbeiten, indem Sie Feedback geben (z.B., `[EDIT PLAN] Fügen Sie mehr Schritte zur technischen Implementierung hinzu`)
- Das System wird Ihr Feedback einarbeiten und einen überarbeiteten Plan generieren
3. **Automatische Akzeptanz**: Sie können die automatische Akzeptanz aktivieren, um den Überprüfungsprozess zu überspringen:
- Über API: Setzen Sie `auto_accepted_plan: true` in Ihrer Anfrage
4. **API-Integration**: Bei Verwendung der API können Sie Feedback über den Parameter `feedback` geben:
```json
{
"messages": [{"role": "user", "content": "Was ist Quantencomputing?"}],
"thread_id": "my_thread_id",
"auto_accepted_plan": false,
"feedback": "[EDIT PLAN] Mehr über Quantenalgorithmen aufnehmen"
}
```
### Kommandozeilenargumente
Die Anwendung unterstützt mehrere Kommandozeilenargumente, um ihr Verhalten anzupassen:
- **query**: Die zu verarbeitende Forschungsanfrage (kann mehrere Wörter umfassen)
- **--interactive**: Im interaktiven Modus mit eingebauten Fragen ausführen
- **--max_plan_iterations**: Maximale Anzahl von Planungszyklen (Standard: 1)
- **--max_step_num**: Maximale Anzahl von Schritten in einem Forschungsplan (Standard: 3)
Weitere Informationen finden Sie in der [FAQ.md](docs/FAQ.md).
## Lizenz
Dieses Projekt ist Open Source und unter der [MIT-Lizenz](./LICENSE) verfügbar.
## Danksagungen
DeerFlow baut auf der unglaublichen Arbeit der Open-Source-Community auf. Wir sind allen Projekten und Mitwirkenden zutiefst dankbar, deren Bemühungen DeerFlow möglich gemacht haben. Wahrhaftig stehen wir auf den Schultern von Riesen.
Wir möchten unsere aufrichtige Wertschätzung den folgenden Projekten für ihre unschätzbaren Beiträge aussprechen:
- **[LangChain](https://github.com/langchain-ai/langchain)**: Ihr außergewöhnliches Framework unterstützt unsere LLM-Interaktionen und -Ketten und ermöglicht nahtlose Integration und Funktionalität.
- **[LangGraph](https://github.com/langchain-ai/langgraph)**: Ihr innovativer Ansatz zur Multi-Agenten-Orchestrierung war maßgeblich für die Ermöglichung der ausgeklügelten Workflows von DeerFlow.
Diese Projekte veranschaulichen die transformative Kraft der Open-Source-Zusammenarbeit, und wir sind stolz darauf, auf ihren Grundlagen aufzubauen.
### Hauptmitwirkende
Ein herzliches Dankeschön geht an die Hauptautoren von `DeerFlow`, deren Vision, Leidenschaft und Engagement dieses Projekt zum Leben erweckt haben:
Ihr unerschütterliches Engagement und Fachwissen waren die treibende Kraft hinter dem Erfolg von DeerFlow. Wir fühlen uns geehrt, Sie an der Spitze dieser Reise zu haben.
## Star-Verlauf
[](https://star-history.com/#bytedance/deer-flow&Date)
> Originado del código abierto, retribuido al código abierto.
**DeerFlow** (**D**eep **E**xploration and **E**fficient **R**esearch **Flow**) es un marco de Investigación Profunda impulsado por la comunidad que se basa en el increíble trabajo de la comunidad de código abierto. Nuestro objetivo es combinar modelos de lenguaje con herramientas especializadas para tareas como búsqueda web, rastreo y ejecución de código Python, mientras devolvemos a la comunidad que hizo esto posible.
Por favor, visita [nuestra página web oficial](https://deerflow.tech/) para más detalles.
En esta demostración, mostramos cómo usar DeerFlow para:
- Integrar perfectamente con servicios MCP
- Realizar el proceso de Investigación Profunda y producir un informe completo con imágenes
- Crear audio de podcast basado en el informe generado
### Repeticiones
- [¿Qué altura tiene la Torre Eiffel comparada con el edificio más alto?](https://deerflow.tech/chat?replay=eiffel-tower-vs-tallest-building)
- [¿Cuáles son los repositorios más populares en GitHub?](https://deerflow.tech/chat?replay=github-top-trending-repo)
- [Escribir un artículo sobre los platos tradicionales de Nanjing](https://deerflow.tech/chat?replay=nanjing-traditional-dishes)
- [¿Cómo decorar un apartamento de alquiler?](https://deerflow.tech/chat?replay=rental-apartment-decoration)
- [Visita nuestra página web oficial para explorar más repeticiones.](https://deerflow.tech/#case-studies)
---
## 📑 Tabla de Contenidos
- [🚀 Inicio Rápido](#inicio-rápido)
- [🌟 Características](#características)
- [🏗️ Arquitectura](#arquitectura)
- [🛠️ Desarrollo](#desarrollo)
- [🐳 Docker](#docker)
- [🗣️ Integración de Texto a Voz](#integración-de-texto-a-voz)
- [📚 Ejemplos](#ejemplos)
- [❓ Preguntas Frecuentes](#preguntas-frecuentes)
- [📜 Licencia](#licencia)
- [💖 Agradecimientos](#agradecimientos)
- [⭐ Historial de Estrellas](#historial-de-estrellas)
## Inicio Rápido
DeerFlow está desarrollado en Python y viene con una interfaz web escrita en Node.js. Para garantizar un proceso de configuración sin problemas, recomendamos utilizar las siguientes herramientas:
Simplifica la gestión del entorno Python y las dependencias. `uv` crea automáticamente un entorno virtual en el directorio raíz e instala todos los paquetes necesarios por ti—sin necesidad de instalar entornos Python manualmente.
- **[`nvm`](https://github.com/nvm-sh/nvm):**
Gestiona múltiples versiones del entorno de ejecución Node.js sin esfuerzo.
- **[`pnpm`](https://pnpm.io/installation):**
Instala y gestiona dependencias del proyecto Node.js.
### Requisitos del Entorno
Asegúrate de que tu sistema cumple con los siguientes requisitos mínimos:
- **[Python](https://www.python.org/downloads/):** Versión `3.12+`
- **[Node.js](https://nodejs.org/en/download/):** Versión `22+`
Opcionalmente, instala las dependencias de la interfaz web vía [pnpm](https://pnpm.io/installation):
```bash
cd deer-flow/web
pnpm install
```
### Configuraciones
Por favor, consulta la [Guía de Configuración](docs/configuration_guide.md) para más detalles.
> [!NOTA]
> Antes de iniciar el proyecto, lee la guía cuidadosamente y actualiza las configuraciones para que coincidan con tus ajustes y requisitos específicos.
### Interfaz de Consola
La forma más rápida de ejecutar el proyecto es utilizar la interfaz de consola.
```bash
# Ejecutar el proyecto en un shell tipo bash
uv run main.py
```
### Interfaz Web
Este proyecto también incluye una Interfaz Web, que ofrece una experiencia interactiva más dinámica y atractiva.
> [!NOTA]
> Necesitas instalar primero las dependencias de la interfaz web.
```bash
# Ejecutar tanto el servidor backend como el frontend en modo desarrollo
# En macOS/Linux
./bootstrap.sh -d
# En Windows
bootstrap.bat -d
```
Abre tu navegador y visita [`http://localhost:3000`](http://localhost:3000) para explorar la interfaz web.
Explora más detalles en el directorio [`web`](./web/).
## Motores de Búsqueda Compatibles
DeerFlow soporta múltiples motores de búsqueda que pueden configurarse en tu archivo `.env` usando la variable `SEARCH_API`:
- **Tavily** (predeterminado): Una API de búsqueda especializada para aplicaciones de IA
- Requiere `TAVILY_API_KEY` en tu archivo `.env`
- Regístrate en: https://app.tavily.com/home
- **DuckDuckGo**: Motor de búsqueda centrado en la privacidad
- No requiere clave API
- **Brave Search**: Motor de búsqueda centrado en la privacidad con características avanzadas
- Requiere `BRAVE_SEARCH_API_KEY` en tu archivo `.env`
- Regístrate en: https://brave.com/search/api/
- **Arxiv**: Búsqueda de artículos científicos para investigación académica
- No requiere clave API
- Especializado en artículos científicos y académicos
Para configurar tu motor de búsqueda preferido, establece la variable `SEARCH_API` en tu archivo `.env`:
- Soporta la integración de la mayoría de los modelos a través de [litellm](https://docs.litellm.ai/docs/providers).
- Soporte para modelos de código abierto como Qwen
- Interfaz API compatible con OpenAI
- Sistema LLM de múltiples niveles para diferentes complejidades de tareas
### Herramientas e Integraciones MCP
- 🔍 **Búsqueda y Recuperación**
- Búsqueda web a través de Tavily, Brave Search y más
- Rastreo con Jina
- Extracción avanzada de contenido
- 🔗 **Integración Perfecta con MCP**
- Amplía capacidades para acceso a dominio privado, gráfico de conocimiento, navegación web y más
- Facilita la integración de diversas herramientas y metodologías de investigación
### Colaboración Humana
- 🧠 **Humano en el Bucle**
- Soporta modificación interactiva de planes de investigación usando lenguaje natural
- Soporta aceptación automática de planes de investigación
- 📝 **Post-Edición de Informes**
- Soporta edición de bloques tipo Notion
- Permite refinamientos por IA, incluyendo pulido asistido por IA, acortamiento y expansión de oraciones
- Impulsado por [tiptap](https://tiptap.dev/)
### Creación de Contenido
- 🎙️ **Generación de Podcasts y Presentaciones**
- Generación de guiones de podcast y síntesis de audio impulsadas por IA
- Creación automatizada de presentaciones PowerPoint simples
- Plantillas personalizables para contenido a medida
## Arquitectura
DeerFlow implementa una arquitectura modular de sistema multi-agente diseñada para investigación automatizada y análisis de código. El sistema está construido sobre LangGraph, permitiendo un flujo de trabajo flexible basado en estados donde los componentes se comunican a través de un sistema de paso de mensajes bien definido.

> Vélo en vivo en [deerflow.tech](https://deerflow.tech/#multi-agent-architecture)
El sistema emplea un flujo de trabajo racionalizado con los siguientes componentes:
1.**Coordinador**: El punto de entrada que gestiona el ciclo de vida del flujo de trabajo
- Inicia el proceso de investigación basado en la entrada del usuario
- Delega tareas al planificador cuando corresponde
- Actúa como la interfaz principal entre el usuario y el sistema
2.**Planificador**: Componente estratégico para descomposición y planificación de tareas
- Analiza objetivos de investigación y crea planes de ejecución estructurados
- Determina si hay suficiente contexto disponible o si se necesita más investigación
- Gestiona el flujo de investigación y decide cuándo generar el informe final
3.**Equipo de Investigación**: Una colección de agentes especializados que ejecutan el plan:
- **Investigador**: Realiza búsquedas web y recopilación de información utilizando herramientas como motores de búsqueda web, rastreo e incluso servicios MCP.
- **Programador**: Maneja análisis de código, ejecución y tareas técnicas utilizando la herramienta Python REPL.
Cada agente tiene acceso a herramientas específicas optimizadas para su rol y opera dentro del marco LangGraph
4.**Reportero**: Procesador de etapa final para los resultados de la investigación
- Agrega hallazgos del equipo de investigación
- Procesa y estructura la información recopilada
- Genera informes de investigación completos
## Integración de Texto a Voz
DeerFlow ahora incluye una función de Texto a Voz (TTS) que te permite convertir informes de investigación a voz. Esta función utiliza la API TTS de volcengine para generar audio de alta calidad a partir de texto. Características como velocidad, volumen y tono también son personalizables.
### Usando la API TTS
Puedes acceder a la funcionalidad TTS a través del punto final `/api/tts`:
```bash
# Ejemplo de llamada API usando curl
curl --location 'http://localhost:8000/api/tts'\
--header 'Content-Type: application/json'\
--data '{
"text": "Esto es una prueba de la funcionalidad de texto a voz.",
"speed_ratio": 1.0,
"volume_ratio": 1.0,
"pitch_ratio": 1.0
}'\
--output speech.mp3
```
## Desarrollo
### Pruebas
Ejecuta el conjunto de pruebas:
```bash
# Ejecutar todas las pruebas
make test
# Ejecutar archivo de prueba específico
pytest tests/integration/test_workflow.py
# Ejecutar con cobertura
make coverage
```
### Calidad del Código
```bash
# Ejecutar linting
make lint
# Formatear código
make format
```
### Depuración con LangGraph Studio
DeerFlow utiliza LangGraph para su arquitectura de flujo de trabajo. Puedes usar LangGraph Studio para depurar y visualizar el flujo de trabajo en tiempo real.
#### Ejecutando LangGraph Studio Localmente
DeerFlow incluye un archivo de configuración `langgraph.json` que define la estructura del grafo y las dependencias para LangGraph Studio. Este archivo apunta a los grafos de flujo de trabajo definidos en el proyecto y carga automáticamente variables de entorno desde el archivo `.env`.
##### Mac
```bash
# Instala el gestor de paquetes uv si no lo tienes
curl -LsSf https://astral.sh/uv/install.sh | sh
# Instala dependencias e inicia el servidor LangGraph
2. Inicia el rastreo y visualiza el grafo localmente con LangSmith ejecutando:
```bash
langgraph dev
```
Esto habilitará la visualización de rastros en LangGraph Studio y enviará tus rastros a LangSmith para monitoreo y análisis.
## Docker
También puedes ejecutar este proyecto con Docker.
Primero, necesitas leer la [configuración](docs/configuration_guide.md) a continuación. Asegúrate de que los archivos `.env` y `.conf.yaml` estén listos.
Segundo, para construir una imagen Docker de tu propio servidor web:
```bash
docker build -t deer-flow-api .
```
Finalmente, inicia un contenedor Docker que ejecute el servidor web:
```bash
# Reemplaza deer-flow-api-app con tu nombre de contenedor preferido
Para ejecutar estos ejemplos o crear tus propios informes de investigación, puedes usar los siguientes comandos:
```bash
# Ejecutar con una consulta específica
uv run main.py "¿Qué factores están influyendo en la adopción de IA en salud?"
# Ejecutar con parámetros de planificación personalizados
uv run main.py --max_plan_iterations 3 "¿Cómo impacta la computación cuántica en la criptografía?"
# Ejecutar en modo interactivo con preguntas integradas
uv run main.py --interactive
# O ejecutar con prompt interactivo básico
uv run main.py
# Ver todas las opciones disponibles
uv run main.py --help
```
### Modo Interactivo
La aplicación ahora soporta un modo interactivo con preguntas integradas tanto en inglés como en chino:
1. Lanza el modo interactivo:
```bash
uv run main.py --interactive
```
2. Selecciona tu idioma preferido (English o 中文)
3. Elige de una lista de preguntas integradas o selecciona la opción para hacer tu propia pregunta
4. El sistema procesará tu pregunta y generará un informe de investigación completo
### Humano en el Bucle
DeerFlow incluye un mecanismo de humano en el bucle que te permite revisar, editar y aprobar planes de investigación antes de que sean ejecutados:
1. **Revisión del Plan**: Cuando el humano en el bucle está habilitado, el sistema presentará el plan de investigación generado para tu revisión antes de la ejecución
2. **Proporcionando Retroalimentación**: Puedes:
- Aceptar el plan respondiendo con `[ACCEPTED]`
- Editar el plan proporcionando retroalimentación (p.ej., `[EDIT PLAN] Añadir más pasos sobre implementación técnica`)
- El sistema incorporará tu retroalimentación y generará un plan revisado
3. **Auto-aceptación**: Puedes habilitar la auto-aceptación para omitir el proceso de revisión:
- Vía API: Establece `auto_accepted_plan: true` en tu solicitud
4. **Integración API**: Cuando uses la API, puedes proporcionar retroalimentación a través del parámetro `feedback`:
```json
{
"messages": [{ "role": "user", "content": "¿Qué es la computación cuántica?" }],
"thread_id": "my_thread_id",
"auto_accepted_plan": false,
"feedback": "[EDIT PLAN] Incluir más sobre algoritmos cuánticos"
}
```
### Argumentos de Línea de Comandos
La aplicación soporta varios argumentos de línea de comandos para personalizar su comportamiento:
- **query**: La consulta de investigación a procesar (puede ser múltiples palabras)
- **--interactive**: Ejecutar en modo interactivo con preguntas integradas
- **--max_plan_iterations**: Número máximo de ciclos de planificación (predeterminado: 1)
- **--max_step_num**: Número máximo de pasos en un plan de investigación (predeterminado: 3)
- **--debug**: Habilitar registro detallado de depuración
## Preguntas Frecuentes
Por favor, consulta [FAQ.md](docs/FAQ.md) para más detalles.
## Licencia
Este proyecto es de código abierto y está disponible bajo la [Licencia MIT](./LICENSE).
## Agradecimientos
DeerFlow está construido sobre el increíble trabajo de la comunidad de código abierto. Estamos profundamente agradecidos a todos los proyectos y contribuyentes cuyos esfuerzos han hecho posible DeerFlow. Verdaderamente, nos apoyamos en hombros de gigantes.
Nos gustaría extender nuestro sincero agradecimiento a los siguientes proyectos por sus invaluables contribuciones:
- **[LangChain](https://github.com/langchain-ai/langchain)**: Su excepcional marco impulsa nuestras interacciones y cadenas LLM, permitiendo integración y funcionalidad sin problemas.
- **[LangGraph](https://github.com/langchain-ai/langgraph)**: Su enfoque innovador para la orquestación multi-agente ha sido instrumental en permitir los sofisticados flujos de trabajo de DeerFlow.
Estos proyectos ejemplifican el poder transformador de la colaboración de código abierto, y estamos orgullosos de construir sobre sus cimientos.
### Contribuyentes Clave
Un sentido agradecimiento va para los autores principales de `DeerFlow`, cuya visión, pasión y dedicación han dado vida a este proyecto:
Su compromiso inquebrantable y experiencia han sido la fuerza impulsora detrás del éxito de DeerFlow. Nos sentimos honrados de tenerlos al timón de este viaje.
## Historial de Estrellas
[](https://star-history.com/#bytedance/deer-flow&Date)
> Le 28 février 2026, DeerFlow a décroché la 🏆 1re place sur GitHub Trending suite au lancement de la version 2. Un immense merci à notre incroyable communauté — c'est grâce à vous ! 💪🔥
DeerFlow (**D**eep **E**xploration and **E**fficient **R**esearch **Flow**) est un **super agent harness** open source qui orchestre des **sub-agents**, de la **mémoire** et des **sandboxes** pour accomplir pratiquement n'importe quelle tâche — le tout propulsé par des **skills extensibles**.
> **DeerFlow 2.0 est une réécriture complète.** Il ne partage aucun code avec la v1. Si vous cherchez le framework Deep Research original, il est maintenu sur la [branche `1.x`](https://github.com/bytedance/deer-flow/tree/main-1.x) — les contributions y sont toujours les bienvenues. Le développement actif a migré vers la 2.0.
- [Développeurs en Chine continentale, cliquez ici](https://www.volcengine.com/activity/codingplan?utm_campaign=deer_flow&utm_content=deer_flow&utm_medium=devrel&utm_source=OWO&utm_term=deer_flow)
## InfoQuest
DeerFlow intègre désormais le toolkit de recherche et de crawling intelligent développé par BytePlus — [InfoQuest (essai gratuit en ligne)](https://docs.byteplus.com/en/docs/InfoQuest/What_is_Info_Quest)
## Installation en une phrase pour un coding agent
Si vous utilisez Claude Code, Codex, Cursor, Windsurf ou un autre coding agent, vous pouvez simplement lui envoyer cette phrase :
```text
Aide-moi à cloner DeerFlow si nécessaire, puis à initialiser son environnement de développement local en suivant https://raw.githubusercontent.com/bytedance/deer-flow/main/Install.md
```
Ce prompt est destiné aux coding agents. Il leur demande de cloner le dépôt si nécessaire, de privilégier Docker quand il est disponible, puis de s'arrêter avec la commande exacte pour lancer DeerFlow et la liste des configurations encore manquantes.
2. **Générer les fichiers de configuration locaux**
Depuis le répertoire racine du projet (`deer-flow/`), exécutez :
```bash
make config
```
Cette commande crée les fichiers de configuration locaux à partir des templates fournis.
3. **Configurer le(s) modèle(s) de votre choix**
Éditez `config.yaml` et définissez au moins un modèle :
```yaml
models:
- name: gpt-4 # Internal identifier
display_name: GPT-4 # Human-readable name
use: langchain_openai:ChatOpenAI # LangChain class path
model: gpt-4 # Model identifier for API
api_key: $OPENAI_API_KEY # API key (recommended: use env var)
max_tokens: 4096 # Maximum tokens per request
temperature: 0.7 # Sampling temperature
- name: openrouter-gemini-2.5-flash
display_name: Gemini 2.5 Flash (OpenRouter)
use: langchain_openai:ChatOpenAI
model: google/gemini-2.5-flash-preview
api_key: $OPENAI_API_KEY # OpenRouter still uses the OpenAI-compatible field name here
base_url: https://openrouter.ai/api/v1
- name: gpt-5-responses
display_name: GPT-5 (Responses API)
use: langchain_openai:ChatOpenAI
model: gpt-5
api_key: $OPENAI_API_KEY
use_responses_api: true
output_version: responses/v1
```
OpenRouter et les passerelles compatibles OpenAI similaires doivent être configurés avec `langchain_openai:ChatOpenAI` et `base_url`. Si vous préférez utiliser un nom de variable d'environnement propre au fournisseur, pointez `api_key` vers cette variable explicitement (par exemple `api_key: $OPENROUTER_API_KEY`).
Pour router les modèles OpenAI via `/v1/responses`, continuez d'utiliser `langchain_openai:ChatOpenAI` et définissez `use_responses_api: true` avec `output_version: responses/v1`.
- L'endpoint Responses de Codex rejette actuellement `max_tokens` et `max_output_tokens`, donc `CodexChatModel` n'expose pas de limite de tokens par requête
- Claude Code accepte `CLAUDE_CODE_OAUTH_TOKEN`, `ANTHROPIC_AUTH_TOKEN`, `CLAUDE_CODE_OAUTH_TOKEN_FILE_DESCRIPTOR`, `CLAUDE_CODE_CREDENTIALS_PATH`, ou en clair `~/.claude/.credentials.json`
- Sur macOS, DeerFlow ne sonde pas le Keychain automatiquement. Exportez l'auth Claude Code explicitement si nécessaire :
**Développement** (hot-reload, montage des sources) :
```bash
make docker-init # Pull sandbox image (only once or when image updates)
make docker-start # Start services (auto-detects sandbox mode from config.yaml)
```
`make docker-start` ne lance `provisioner` que si `config.yaml` utilise le mode provisioner (`sandbox.use: deerflow.community.aio_sandbox:AioSandboxProvider` avec `provisioner_url`).
Les processus backend récupèrent automatiquement les changements dans `config.yaml` au prochain accès à la configuration, donc les mises à jour de métadonnées des modèles ne nécessitent pas de redémarrage manuel en développement.
> [!TIP]
> Sous Linux, si les commandes Docker échouent avec `permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock`, ajoutez votre utilisateur au groupe `docker` et reconnectez-vous avant de réessayer. Voir [CONTRIBUTING.md](CONTRIBUTING.md#linux-docker-daemon-permission-denied) pour la solution complète.
**Production** (build des images en local, montage de la config et des données) :
```bash
make up # Build images and start all production services
make down # Stop and remove containers
```
> [!NOTE]
> Le serveur d'agents LangGraph fonctionne actuellement via `langgraph dev` (le serveur CLI open source).
Accès : http://localhost:2026
Voir [CONTRIBUTING.md](CONTRIBUTING.md) pour le guide complet de développement avec Docker.
#### Option 2 : Développement local
Si vous préférez lancer les services en local :
Prérequis : complétez d'abord les étapes de « Configuration » ci-dessus (`make config` et clés API des modèles). `make dev` nécessite un fichier de configuration valide (par défaut `config.yaml` à la racine du projet ; modifiable via `DEER_FLOW_CONFIG_PATH`).
1. **Vérifier les prérequis** :
```bash
make check # Verifies Node.js 22+, pnpm, uv, nginx
```
2. **Installer les dépendances** :
```bash
make install # Install backend + frontend dependencies
# Recommended if using Docker/Container-based sandbox
make setup-sandbox
```
4. **Démarrer les services** :
```bash
make dev
```
5. **Accès** : http://localhost:2026
### Avancé
#### Mode Sandbox
DeerFlow supporte plusieurs modes d'exécution sandbox :
- **Exécution locale** (exécute le code sandbox directement sur la machine hôte)
- **Exécution Docker** (exécute le code sandbox dans des conteneurs Docker isolés)
- **Exécution Docker avec Kubernetes** (exécute le code sandbox dans des pods Kubernetes via le service provisioner)
En développement Docker, le démarrage des services suit le mode sandbox défini dans `config.yaml`. En mode Local/Docker, `provisioner` n'est pas démarré.
Voir le [Guide de configuration Sandbox](backend/docs/CONFIGURATION.md#sandbox) pour configurer le mode de votre choix.
#### Serveur MCP
DeerFlow supporte des serveurs MCP et des skills configurables pour étendre ses capacités.
Pour les serveurs MCP HTTP/SSE, les flux de tokens OAuth sont supportés (`client_credentials`, `refresh_token`).
Voir le [Guide MCP Server](backend/docs/MCP_SERVER.md) pour les instructions détaillées.
#### Canaux de messagerie
DeerFlow peut recevoir des tâches depuis des applications de messagerie. Les canaux démarrent automatiquement une fois configurés — aucune IP publique n'est requise.
| Canal | Transport | Difficulté |
|---------|-----------|------------|
| Telegram | Bot API (long-polling) | Facile |
| Slack | Socket Mode | Modérée |
| Feishu / Lark | WebSocket | Modérée |
| DingTalk | Stream Push (WebSocket) | Modérée |
**Configuration dans `config.yaml` :**
```yaml
channels:
# LangGraph Server URL (default: http://localhost:2024)
langgraph_url: http://localhost:2024
# Gateway API URL (default: http://localhost:8001)
gateway_url: http://localhost:8001
# Optional: global session defaults for all mobile channels
session:
assistant_id: lead_agent
config:
recursion_limit: 100
context:
thinking_enabled: true
is_plan_mode: false
subagent_enabled: false
feishu:
enabled: true
app_id: $FEISHU_APP_ID
app_secret: $FEISHU_APP_SECRET
# domain: https://open.feishu.cn # China (default)
# domain: https://open.larksuite.com # International
1. Ouvrez une conversation avec [@BotFather](https://t.me/BotFather), envoyez `/newbot`, et copiez le token HTTP API.
2. Définissez `TELEGRAM_BOT_TOKEN` dans `.env` et activez le canal dans `config.yaml`.
**Configuration Slack**
1. Créez une Slack App sur [api.slack.com/apps](https://api.slack.com/apps) → Create New App → From scratch.
2. Dans **OAuth & Permissions**, ajoutez les Bot Token Scopes : `app_mentions:read`, `chat:write`, `im:history`, `im:read`, `im:write`, `files:write`.
3. Activez le **Socket Mode** → générez un App-Level Token (`xapp-…`) avec le scope `connections:write`.
4. Dans **Event Subscriptions**, abonnez-vous aux bot events : `app_mention`, `message.im`.
5. Définissez `SLACK_BOT_TOKEN` et `SLACK_APP_TOKEN` dans `.env` et activez le canal dans `config.yaml`.
**Configuration Feishu / Lark**
1. Créez une application sur [Feishu Open Platform](https://open.feishu.cn/) → activez la capacité **Bot**.
2. Ajoutez les permissions : `im:message`, `im:message.p2p_msg:readonly`, `im:resource`.
3. Dans **Events**, abonnez-vous à `im.message.receive_v1` et sélectionnez le mode **Long Connection**.
4. Copiez l'App ID et l'App Secret. Définissez `FEISHU_APP_ID` et `FEISHU_APP_SECRET` dans `.env` et activez le canal dans `config.yaml`.
**Configuration DingTalk**
1. Créez une application sur [DingTalk Open Platform](https://open.dingtalk.com/) et activez la capacité **Robot**.
2. Dans la page de configuration du robot, définissez le mode de réception des messages sur **Stream**.
3. Copiez le `Client ID` et le `Client Secret`. Définissez `DINGTALK_CLIENT_ID` et `DINGTALK_CLIENT_SECRET` dans `.env` et activez le canal dans `config.yaml`.
4. *(Optionnel)* Pour activer les réponses en streaming AI Card (effet machine à écrire), créez un modèle **AI Card** sur la [plateforme de cartes DingTalk](https://open.dingtalk.com/document/dingstart/typewriter-effect-streaming-ai-card), puis définissez `card_template_id` dans `config.yaml` avec l'ID du modèle. Vous devez également demander les permissions `Card.Streaming.Write` et `Card.Instance.Write`.
**Commandes**
Une fois un canal connecté, vous pouvez interagir avec DeerFlow directement depuis le chat :
| Commande | Description |
|---------|-------------|
| `/new` | Démarrer une nouvelle conversation |
| `/status` | Afficher les infos du thread en cours |
| `/models` | Lister les modèles disponibles |
| `/memory` | Consulter la mémoire |
| `/help` | Afficher l'aide |
> Les messages sans préfixe de commande sont traités comme du chat classique — DeerFlow crée un thread et répond de manière conversationnelle.
#### Traçage LangSmith
DeerFlow intègre nativement [LangSmith](https://smith.langchain.com) pour l'observabilité. Une fois activé, tous les appels LLM, les exécutions d'agents et les exécutions d'outils sont tracés et visibles dans le tableau de bord LangSmith.
Ajoutez les lignes suivantes à votre fichier `.env` :
Pour les déploiements Docker, le traçage est désactivé par défaut. Définissez `LANGSMITH_TRACING=true` et `LANGSMITH_API_KEY` dans votre `.env` pour l'activer.
## Du Deep Research au Super Agent Harness
DeerFlow a démarré comme un framework de Deep Research — et la communauté s'en est emparée. Depuis le lancement, les développeurs l'ont poussé bien au-delà de la recherche : construction de pipelines de données, génération de présentations, mise en place de dashboards, automatisation de workflows de contenu. Des usages qu'on n'avait jamais anticipés.
Ça nous a révélé quelque chose d'important : DeerFlow n'était pas qu'un simple outil de recherche. C'était un **harness** — un runtime qui donne aux agents l'infrastructure nécessaire pour vraiment accomplir du travail.
On l'a donc reconstruit de zéro.
DeerFlow 2.0 n'est plus un framework à assembler soi-même. C'est un super agent harness — clé en main et entièrement extensible. Construit sur LangGraph et LangChain, il embarque tout ce dont un agent a besoin out of the box : un système de fichiers, de la mémoire, des skills, une exécution sandboxée, et la capacité de planifier et de lancer des sub-agents pour les tâches complexes et multi-étapes.
Utilisez-le tel quel. Ou démontez-le et faites-en le vôtre.
## Fonctionnalités principales
### Skills et outils
Les skills sont ce qui permet à DeerFlow de faire *pratiquement n'importe quoi*.
Un Agent Skill standard est un module de capacité structuré — un fichier Markdown qui définit un workflow, des bonnes pratiques et des références vers des ressources associées. DeerFlow est livré avec des skills intégrés pour la recherche, la génération de rapports, la création de présentations, les pages web, la génération d'images et de vidéos, et bien plus. Mais la vraie force réside dans l'extensibilité : ajoutez vos propres skills, remplacez ceux fournis, ou combinez-les en workflows composites.
Les skills sont chargés progressivement — uniquement quand la tâche le nécessite, pas tous en même temps. Ça permet de garder la fenêtre de contexte légère et de bien fonctionner même avec des modèles sensibles au nombre de tokens.
Quand vous installez des archives `.skill` via le Gateway, DeerFlow accepte les métadonnées frontmatter optionnelles standard comme `version`, `author` et `compatibility`, plutôt que de rejeter des skills externes par ailleurs valides.
Les outils suivent la même philosophie. DeerFlow est livré avec un ensemble d'outils de base — recherche web, fetch de pages web, opérations sur les fichiers, exécution bash — et supporte les outils custom via des serveurs MCP et des fonctions Python. Remplacez n'importe quoi. Ajoutez n'importe quoi.
Les suggestions de suivi générées par le Gateway normalisent désormais aussi bien la sortie texte brut du modèle que le contenu riche au format bloc/liste avant de parser la réponse en tableau JSON, de sorte que les wrappers de contenu propres à chaque provider ne suppriment plus silencieusement les suggestions.
```
# Paths inside the sandbox container
/mnt/skills/public
├── research/SKILL.md
├── report-generation/SKILL.md
├── slide-creation/SKILL.md
├── web-page/SKILL.md
└── image-generation/SKILL.md
/mnt/skills/custom
└── your-custom-skill/SKILL.md ← yours
```
#### Intégration Claude Code
Le skill `claude-to-deerflow` vous permet d'interagir avec une instance DeerFlow en cours d'exécution directement depuis [Claude Code](https://docs.anthropic.com/en/docs/claude-code). Envoyez des tâches de recherche, vérifiez le statut, gérez les threads — le tout sans quitter le terminal.
Assurez-vous ensuite que DeerFlow tourne (par défaut sur `http://localhost:2026`) et utilisez la commande `/claude-to-deerflow` dans Claude Code.
**Ce que vous pouvez faire** :
- Envoyer des messages à DeerFlow et recevoir des réponses en streaming
- Choisir le mode d'exécution : flash (rapide), standard, pro (planification), ultra (sub-agents)
- Vérifier la santé de DeerFlow, lister les modèles/skills/agents
- Gérer les threads et l'historique des conversations
- Upload des fichiers pour analyse
**Variables d'environnement** (optionnel, pour des endpoints custom) :
```bash
DEERFLOW_URL=http://localhost:2026 # Unified proxy base URL
DEERFLOW_GATEWAY_URL=http://localhost:2026 # Gateway API
DEERFLOW_LANGGRAPH_URL=http://localhost:2026/api/langgraph # LangGraph API
```
Voir [`skills/public/claude-to-deerflow/SKILL.md`](skills/public/claude-to-deerflow/SKILL.md) pour la référence API complète.
### Sub-Agents
Les tâches complexes tiennent rarement en une seule passe. DeerFlow les décompose.
L'agent principal peut lancer des sub-agents à la volée — chacun avec son propre contexte délimité, ses outils et ses conditions d'arrêt. Les sub-agents s'exécutent en parallèle quand c'est possible, remontent des résultats structurés, et l'agent principal synthétise le tout en une sortie cohérente.
C'est comme ça que DeerFlow gère les tâches qui prennent de quelques minutes à plusieurs heures : une tâche de recherche peut se déployer en une dizaine de sub-agents, chacun explorant un angle différent, puis converger vers un seul rapport — ou un site web — ou un jeu de slides avec des visuels générés. Un seul harness, de nombreuses mains.
### Sandbox et système de fichiers
DeerFlow ne se contente pas de *parler* de faire les choses. Il dispose de son propre ordinateur.
Chaque tâche s'exécute dans un conteneur Docker isolé avec un système de fichiers complet — skills, workspace, uploads, outputs. L'agent lit, écrit et édite des fichiers. Il exécute des commandes bash et du code. Il visualise des images. Le tout sandboxé, le tout auditable, zéro contamination entre les sessions.
C'est la différence entre un chatbot avec accès à des outils et un agent doté d'un véritable environnement d'exécution.
```
# Paths inside the sandbox container
/mnt/user-data/
├── uploads/ ← your files
├── workspace/ ← agents' working directory
└── outputs/ ← final deliverables
```
### Context Engineering
**Contexte isolé des Sub-Agents** : chaque sub-agent s'exécute dans son propre contexte isolé. Il ne peut voir ni le contexte de l'agent principal, ni celui des autres sub-agents. L'objectif est de garantir que chaque sub-agent reste concentré sur sa tâche sans être parasité par des informations non pertinentes.
**Résumé** : au sein d'une session, DeerFlow gère le contexte de manière agressive — en résumant les sous-tâches terminées, en déchargeant les résultats intermédiaires vers le système de fichiers, en compressant ce qui n'est plus immédiatement pertinent. Ça lui permet de rester efficace sur des tâches longues et multi-étapes sans faire exploser la fenêtre de contexte.
### Mémoire à long terme
La plupart des agents oublient tout dès qu'une conversation se termine. DeerFlow, lui, se souvient.
D'une session à l'autre, DeerFlow construit une mémoire persistante de votre profil, de vos préférences et de vos connaissances accumulées. Plus vous l'utilisez, mieux il vous connaît — votre style d'écriture, votre stack technique, vos workflows récurrents. La mémoire est stockée localement et reste sous votre contrôle.
Les mises à jour de la mémoire ignorent désormais les entrées de faits en double au moment de l'application, de sorte que les préférences et le contexte répétés ne s'accumulent plus indéfiniment entre les sessions.
## Modèles recommandés
DeerFlow est agnostique en termes de modèle — il fonctionne avec n'importe quel LLM implémentant l'API compatible OpenAI. Cela dit, il offre de meilleures performances avec des modèles qui supportent :
- **De longues fenêtres de contexte** (100k+ tokens) pour la recherche approfondie et les tâches multi-étapes
- **Des capacités de raisonnement** pour la planification adaptative et la décomposition de tâches complexes
- **Des entrées multimodales** pour la compréhension d'images et de vidéos
- **Un usage fiable des outils (tool use)** pour des appels de fonctions et des sorties structurées fiables
## Client Python intégré
DeerFlow peut être utilisé comme bibliothèque Python intégrée sans lancer l'ensemble des services HTTP. Le `DeerFlowClient` fournit un accès direct in-process à toutes les capacités d'agent et de Gateway, en retournant les mêmes schémas de réponse que l'API HTTP Gateway. Le HTTP Gateway expose également `DELETE /api/threads/{thread_id}` pour supprimer les données de thread locales gérées par DeerFlow après la suppression du thread LangGraph :
```python
from deerflow.client import DeerFlowClient
client = DeerFlowClient()
# Chat
response = client.chat("Analyze this paper for me", thread_id="my-thread")
Toutes les méthodes retournant des dicts sont validées en CI contre les modèles de réponse Pydantic du Gateway (`TestGatewayConformance`), garantissant que le client intégré reste synchronisé avec les schémas de l'API HTTP. Voir `backend/packages/harness/deerflow/client.py` pour la documentation API complète.
## Documentation
- [Guide de contribution](CONTRIBUTING.md) - Mise en place de l'environnement de développement et workflow
- [Guide de configuration](backend/docs/CONFIGURATION.md) - Instructions d'installation et de configuration
- [Vue d'ensemble de l'architecture](backend/CLAUDE.md) - Détails de l'architecture technique
- [Architecture backend](backend/README.md) - Architecture backend et référence API
## ⚠️ Avertissement de sécurité
### Un déploiement inapproprié peut introduire des risques de sécurité
DeerFlow dispose de capacités clés à hauts privilèges, notamment **l'exécution de commandes système, les opérations sur les ressources et l'invocation de logique métier**. Il est conçu par défaut pour être **déployé dans un environnement local de confiance (accessible uniquement via l'interface de loopback 127.0.0.1)**. Si vous déployez l'agent dans des environnements non fiables — tels que des réseaux LAN, des serveurs cloud publics ou d'autres environnements accessibles depuis plusieurs terminaux — sans mesures de sécurité strictes, cela peut introduire des risques, notamment :
- **Invocation non autorisée** : les fonctionnalités de l'agent pourraient être découvertes par des tiers non autorisés ou des scanners malveillants, déclenchant des requêtes non autorisées en masse qui exécutent des opérations à haut risque (commandes système, lecture/écriture de fichiers), pouvant causer de graves conséquences.
- **Risques juridiques et de conformité** : si l'agent est utilisé illégalement pour mener des cyberattaques, du vol de données ou d'autres activités illicites, cela peut entraîner des responsabilités juridiques et des risques de conformité.
### Recommandations de sécurité
**Note : nous recommandons fortement de déployer DeerFlow dans un environnement réseau local de confiance.** Si vous avez besoin d'un déploiement multi-appareils ou multi-réseaux, vous devez mettre en place des mesures de sécurité strictes, par exemple :
- **Liste blanche d'IP** : utilisez `iptables`, ou déployez des pare-feux matériels / commutateurs avec ACL, pour **configurer des règles de liste blanche d'IP** et refuser l'accès à toutes les autres adresses IP.
- **Passerelle d'authentification** : configurez un proxy inverse (ex. nginx) et **activez une authentification forte en amont**, bloquant tout accès non authentifié.
- **Isolation réseau** : si possible, placez l'agent et les appareils de confiance dans le **même VLAN dédié**, isolé des autres équipements réseau.
- **Restez informé** : continuez à suivre les mises à jour de sécurité du projet DeerFlow.
## Contribuer
Les contributions sont les bienvenues ! Consultez [CONTRIBUTING.md](CONTRIBUTING.md) pour la mise en place de l'environnement de développement, le workflow et les conventions.
La couverture de tests de régression inclut la détection du mode sandbox Docker et les tests de gestion du kubeconfig-path du provisioner dans `backend/tests/`.
## Licence
Ce projet est open source et disponible sous la [Licence MIT](./LICENSE).
## Remerciements
DeerFlow est construit sur le travail remarquable de la communauté open source. Nous sommes profondément reconnaissants envers tous les projets et contributeurs dont les efforts ont rendu DeerFlow possible. Nous nous tenons véritablement sur les épaules de géants.
Nous tenons à exprimer notre sincère gratitude aux projets suivants pour leurs contributions inestimables :
- **[LangChain](https://github.com/langchain-ai/langchain)** : leur excellent framework propulse nos interactions LLM et nos chaînes, permettant une intégration et des fonctionnalités fluides.
- **[LangGraph](https://github.com/langchain-ai/langgraph)** : leur approche innovante de l'orchestration multi-agents a été déterminante pour les workflows sophistiqués de DeerFlow.
Ces projets illustrent le pouvoir transformateur de la collaboration open source, et nous sommes fiers de bâtir sur leurs fondations.
### Contributeurs principaux
Un grand merci aux auteurs principaux de `DeerFlow`, dont la vision, la passion et le dévouement ont donné vie à ce projet :
> Originado do Open Source, de volta ao Open Source
**DeerFlow** (**D**eep **E**xploration and **E**fficient **R**esearch **Flow**) é um framework de Pesquisa Profunda orientado-a-comunidade que baseia-se em um íncrivel trabalho da comunidade open source. Nosso objetivo é combinar modelos de linguagem com ferramentas especializadas para tarefas como busca na web, crawling, e execução de código Python, enquanto retribui com a comunidade que o tornou possível.
Por favor, visite [Nosso Site Oficial](https://deerflow.tech/) para maiores detalhes.
DeerFlow é desenvolvido em Python, e vem com uma IU web escrita em Node.js. Para garantir um processo de configuração fácil, nós recomendamos o uso das seguintes ferramentas:
Simplifica o gerenciamento de dependência de ambientes Python. `uv` automaticamente cria um ambiente virtual no diretório raiz e instala todos os pacotes necessários para não haver a necessidade de instalar ambientes Python manualmente
- **[`nvm`](https://github.com/nvm-sh/nvm):**
Gerencia múltiplas versões do ambiente de execução do Node.js sem esforço.
- **[`pnpm`](https://pnpm.io/installation):**
Instala e gerencia dependências do projeto Node.js.
### Requisitos de Ambiente
Certifique-se de que seu sistema atenda os seguintes requisitos mínimos:
- **[Python](https://www.python.org/downloads/):** Versão `3.12+`
- **[Node.js](https://nodejs.org/en/download/):** Versão `22+`
- Suporta a integração da maioria dos modelos através de [litellm](https://docs.litellm.ai/docs/providers).
- Suporte a modelos open source como Qwen
- Interface API compatível com a OpenAI
- Sistema LLM multicamadas para diferentes complexidades de tarefa
### Ferramentas e Integrações MCP
- 🔍 **Busca e Recuperação**
- Busca web com Tavily, Brave Search e mais
- Crawling com Jina
- Extração de Conteúdo avançada
- 🔗 **Integração MCP perfeita**
- Expansão de capacidades de acesso para acesso a domínios privados, grafo de conhecimento, navegação web e mais
- Integração facilitdade de diversas ferramentas de pesquisa e metodologias
### Colaboração Humana
- 🧠 **Humano-no-processo**
- Suporta modificação interativa de planos de pesquisa usando linguagem natural
- Suporta auto-aceite de planos de pesquisa
- 📝 **Relatório Pós-Edição**
- Suporta edição de edição de blocos estilo Notion
- Permite refinamentos de IA, incluindo polimento de IA assistida, encurtamento de frase, e expansão
- Distribuído por [tiptap](https://tiptap.dev/)
### Criação de Conteúdo
- 🎙️ **Geração de Podcast e apresentação**
- Script de geração de podcast e síntese de áudio movido por IA
- Criação automatizada de apresentações PowerPoint simples
- Templates customizáveis para conteúdo personalizado
## Arquitetura
DeerFlow implementa uma arquitetura de sistema multi-agente modular designada para pesquisa e análise de código automatizada. O sistema é construído em LangGraph, possibilitando um fluxo de trabalho flexível baseado-em-estado onde os componentes se comunicam através de um sistema de transmissão de mensagens bem-definido.

> Veja ao vivo em [deerflow.tech](https://deerflow.tech/#multi-agent-architecture)
O sistema emprega um fluxo de trabalho simplificado com os seguintes componentes:
1.**Coordenador**: O ponto de entrada que gerencia o ciclo de vida do fluxo de trabalho
- Inicia o processo de pesquisa baseado na entrada do usuário
- Delega tarefas so planejador quando apropriado
- Atua como a interface primária entre o usuário e o sistema
2.**Planejador**: Componente estratégico para a decomposição e planejamento
- Analisa objetivos de pesquisa e cria planos de execução estruturados
- Determina se há contexto suficiente disponível ou se mais pesquisa é necessária
- Gerencia o fluxo de pesquisa e decide quando gerar o relatório final
3.**Time de Pesquisa**: Uma coleção de agentes especializados que executam o plano:
- **Pesquisador**: Conduz buscas web e coleta informações utilizando ferramentas como mecanismos de busca web, crawling e mesmo serviços MCP.
- **Programador**: Lida com a análise de código, execução e tarefas técnicas como usar a ferramenta Python REPL.
Cada agente tem acesso à ferramentas específicas otimizadas para seu papel e opera dentro do fluxo de trabalho LangGraph.
4.**Repórter**: Estágio final do processador de estágio para saídas de pesquisa
- Resultados agregados do time de pesquisa
- Processa e estrutura as informações coletadas
- Gera relatórios abrangentes de pesquisas
## Texto-para-Fala Integração
DeerFlow agora inclui uma funcionalidade Texto-para-Fala (TTS) que permite que você converta relatórios de busca para voz. Essa funcionalidade usa o mecanismo de voz da API TTS para gerar áudio de alta qualidade a partir do texto. Funcionalidades como velocidade, volume e tom também são customizáveis.
### Usando a API TTS
Você pode acessar a funcionalidade TTS através do endpoint `/api/tts`:
```bash
# Exemplo de chamada da API usando curl
curl --location 'http://localhost:8000/api/tts'\
--header 'Content-Type: application/json'\
--data '{
"text": "This is a test of the text-to-speech functionality.",
"speed_ratio": 1.0,
"volume_ratio": 1.0,
"pitch_ratio": 1.0
}'\
--output speech.mp3
```
## Desenvolvimento
### Testando
Rode o conjunto de testes:
```bash
# Roda todos os testes
make test
# Roda um arquivo de teste específico
pytest tests/integration/test_workflow.py
# Roda com coverage
make coverage
```
### Qualidade de Código
```bash
# Roda o linting
make lint
# Formata de código
make format
```
### Debugando com o LangGraph Studio
DeerFlow usa LangGraph para sua arquitetura de fluxo de trabalho. Nós podemos usar o LangGraph Studio para debugar e visualizar o fluxo de trabalho em tempo real.
#### Rodando o LangGraph Studio Localmente
DeerFlow inclui um arquivo de configuração `langgraph.json` que define a estrutura do grafo e dependências para o LangGraph Studio. Esse arquivo aponta para o grafo do fluxo de trabalho definido no projeto e automaticamente carrega as variáveis de ambiente do arquivo `.env`.
##### Mac
```bash
# Instala o gerenciador de pacote uv caso você não o possua
curl -LsSf https://astral.sh/uv/install.sh | sh
# Instala as dependências e inicia o servidor LangGraph
### Docker Compose (inclui ambos backend e frontend)
DeerFlow fornece uma estrutura docker-compose para facilmente executar ambos o backend e frontend juntos:
```bash
# building docker image
docker compose build
# start the server
docker compose up
```
## Exemplos:
Os seguintes exemplos demonstram as capacidades do DeerFlow:
### Relatórios de Pesquisa
1.**Relatório OpenAI Sora** - Análise da ferramenta Sora da OpenAI
- Discute funcionalidades, acesso, engenharia de prompt, limitações e considerações éticas
- [Veja o relatório completo](examples/openai_sora_report.md)
2.**Relatório Protocolo Agent-to-Agent do Google** - Visão geral do protocolo Agent-to-Agent (A2A) do Google
- Discute o seu papel na comunicação de Agente de IA e seu relacionamento com o Protocolo de Contexto de Modelo ( MCP ) da Anthropic
- [Veja o relatório completo](examples/what_is_agent_to_agent_protocol.md)
3.**O que é MCP?** - Uma análise abrangente to termo "MCP" através de múltiplos contextos
- Explora o Protocolo de Contexto de Modelo em IA, Fosfato Monocálcio em Química, e placa de microcanal em eletrônica
- [Veja o relatório completo](examples/what_is_mcp.md)
4.**Bitcoin Price Fluctuations** - Análise das recentes movimentações de preço do Bitcoin
- Examina tendências de mercado, influências regulatórias, e indicadores técnicos
- Fornece recomendações baseadas nos dados históricos
- [Veja o relatório completo](examples/bitcoin_price_fluctuation.md)
5.**O que é LLM?** - Uma exploração em profundidade de Large Language Models
- Discute arquitetura, treinamento, aplicações, e considerações éticas
- [Veja o relatório completo](examples/what_is_llm.md)
6.**Como usar Claude para Pesquisa Aprofundada?** - Melhores práticas e fluxos de trabalho para usar Claude em pesquisa aprofundada
- Cobre engenharia de prompt, análise de dados, e integração com outras ferramentas
- [Veja o relatório completo](examples/how_to_use_claude_deep_research.md)
7.**Adoção de IA na Área da Saúde: Fatores de Influência** - Análise dos fatores que levam à adoção de IA na área da saúde
- Discute tecnologias de IA, qualidade de dados, considerações éticas, avaliações econômicas, prontidão organizacional, e infraestrutura digital
- [Veja o relatório completo](examples/AI_adoption_in_healthcare.md)
8.**Impacto da Computação Quântica em Criptografia** - Análise dos impactos da computação quântica em criptografia
- Discture vulnerabilidades da criptografia clássica, criptografia pós-quântica, e soluções criptográficas de resistência-quântica
- [Veja o relatório completo](examples/Quantum_Computing_Impact_on_Cryptography.md)
9.**Destaques da Performance do Cristiano Ronaldo** - Análise dos destaques da performance do Cristiano Ronaldo
- Discute as suas conquistas de carreira, objetivos internacionais, e performance em diversas partidas
- [Veja o relatório completo](examples/Cristiano_Ronaldo's_Performance_Highlights.md)
Para executar esses exemplos ou criar seus próprios relatórios de pesquisa, você deve utilizar os seguintes comandos:
```bash
# Executa com uma consulta específica
uv run main.py "Quais fatores estão influenciando a adoção de IA na área da saúde?"
# Executa com parâmetros de planejamento customizados
uv run main.py --max_plan_iterations 3"Como a computação quântica impacta na criptografia?"
# Executa em modo interativo com questões embutidas
uv run main.py --interactive
# Ou executa com um prompt interativo básico
uv run main.py
# Vê todas as opções disponíveis
uv run main.py --help
```
### Modo Interativo
A aplicação agora suporta um modo interativo com questões embutidas tanto em Inglês quanto Chinês:
1. Inicie o modo interativo:
```bash
uv run main.py --interactive
```
2. Selecione sua linguagem de preferência (English or 中文)
3. Escolha uma das questões embutidas da lista ou selecione a opção para perguntar sua própria questão
4. O sistema irá processar sua questão e gerar um relatório abrangente de pesquisa
### Humano no processo
DeerFlow inclue um mecanismo de humano no processo que permite a você revisar, editar e aprovar planos de pesquisa antes que estes sejam executados:
1. **Revisão de Plano**: Quando o humano no processo está habilitado, o sistema irá apresentar o plano de pesquisa gerado para sua revisão antes da execução
2. **Fornecimento de Feedback**: Você pode:
- Aceitar o plano respondendo com `[ACCEPTED]`
- Edite o plano fornecendo feedback (e.g., `[EDIT PLAN] Adicione mais passos sobre a implementação técnica`)
- O sistema irá incorporar seu feedback e gerar um plano revisado
3. **Auto-aceite**: Você pode habilitar o auto-aceite ou pular o processo de revisão:
- Via API: Defina `auto_accepted_plan: true` na sua requisição
4. **Integração de API**: Quanto usar a API, você pode fornecer um feedback através do parâmetro `feedback`:
```json
{
"messages": [{ "role": "user", "content": "O que é computação quântica?" }],
"thread_id": "my_thread_id",
"auto_accepted_plan": false,
"feedback": "[EDIT PLAN] Inclua mais sobre algoritmos quânticos"
}
```
### Argumentos via Linha de Comando
A aplicação suporta diversos argumentos via linha de comando para customizar o seu comportamento:
- **consulta**: A consulta de pesquisa a ser processada (podem ser múltiplas palavras)
- **--interativo**: Roda no modo interativo com questões embutidas
- **--max_plan_iterations**: Número máximo de ciclos de planejamento (padrão: 1)
- **--max_step_num**: Número máximo de passos em um plano de pesquisa (padrão: 3)
- **--debug**: Habilita Enable um log de depuração detalhado
## FAQ
Por favor consulte a [FAQ.md](docs/FAQ.md) para maiores detalhes.
## Licença
Esse projeto é open source e disponível sob a [MIT License](./LICENSE).
## Agradecimentos
DeerFlow é construído através do incrível trabalho da comunidade open-source. Nós somos profundamente gratos a todos os projetos e contribuidores cujos esforços tornaram o DeerFlow possível. Realmente, nós estamos apoiados nos ombros de gigantes.
Nós gostaríamos de extender nossos sinceros agradecimentos aos seguintes projetos por suas invaloráveis contribuições:
- **[LangChain](https://github.com/langchain-ai/langchain)**: O framework excepcional deles empodera nossas interações via LLM e correntes, permitindo uma integração perfeita e funcional.
- **[LangGraph](https://github.com/langchain-ai/langgraph)**: A abordagem inovativa para orquestração multi-agente deles tem sido foi fundamental em permitir o acesso dos fluxos de trabalho sofisticados do DeerFlow.
Esses projetos exemplificam o poder transformador da colaboração open-source, e nós temos orgulho de construir baseado em suas fundações.
### Contribuidores-Chave
Um sincero muito obrigado vai para os principais autores do `DeerFlow`, cuja visão, paixão, e dedicação trouxe esse projeto à vida:
O seu compromisso inabalável e experiência tem sido a força por trás do sucesso do DeerFlow. Nós estamos honrados em tê-los no comando dessa trajetória.
## Histórico-Estrelas
[](https://star-history.com/#bytedance/deer-flow&Date)
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
DeerFlow is a LangGraph-based AI super agent system with a full-stack architecture. The backend provides a "super agent" with sandbox execution, persistent memory, subagent delegation, and extensible tool integration - all operating in per-thread isolated environments.
**Architecture**:
- **Gateway API** (port 8001): REST API plus embedded LangGraph-compatible agent runtime
- **Frontend** (port 3000): Next.js web interface
- **Nginx** (port 2026): Unified reverse proxy entry point
- **Provisioner** (port 8002, optional in Docker dev): Started only when sandbox is configured for provisioner/Kubernetes mode
**Runtime**:
-`make dev`, Docker dev, and production all run the agent runtime in Gateway via `RunManager` + `run_agent()` + `StreamBridge` (`packages/harness/deerflow/runtime/`). Nginx exposes that runtime at `/api/langgraph/*` and rewrites it to Gateway's native `/api/*` routers.
-`tests/test_harness_boundary.py` — ensures `packages/harness/deerflow/` never imports from `app.*`
CI runs these regression tests for every pull request via [.github/workflows/backend-unit-tests.yml](../.github/workflows/backend-unit-tests.yml).
## Architecture
### Harness / App Split
The backend is split into two layers with a strict dependency direction:
- **Harness** (`packages/harness/deerflow/`): Publishable agent framework package (`deerflow-harness`). Import prefix: `deerflow.*`. Contains agent orchestration, tools, sandbox, models, MCP, skills, config — everything needed to build and run agents.
- **App** (`app/`): Unpublished application code. Import prefix: `app.*`. Contains the FastAPI Gateway API and IM channel integrations (Feishu, Slack, Telegram, DingTalk).
**Dependency rule**: App imports deerflow, but deerflow never imports app. This boundary is enforced by `tests/test_harness_boundary.py` which runs in CI.
Lead-agent middlewares are assembled in strict append order across `packages/harness/deerflow/agents/middlewares/tool_error_handling_middleware.py` (`build_lead_runtime_middlewares`) and `packages/harness/deerflow/agents/lead_agent/agent.py` (`_build_middlewares`):
1.**ThreadDataMiddleware** - Creates per-thread directories under the user's isolation scope (`backend/.deer-flow/users/{user_id}/threads/{thread_id}/user-data/{workspace,uploads,outputs}`); resolves `user_id` via `get_effective_user_id()` (falls back to `"default"` in no-auth mode); Web UI thread deletion now follows LangGraph thread removal with Gateway cleanup of the local thread directory
2.**UploadsMiddleware** - Tracks and injects newly uploaded files into conversation
3.**SandboxMiddleware** - Acquires sandbox, stores `sandbox_id` in state
4.**DanglingToolCallMiddleware** - Injects placeholder ToolMessages for AIMessage tool_calls that lack responses (e.g., due to user interruption), including raw provider tool-call payloads preserved only in `additional_kwargs["tool_calls"]`
5.**LLMErrorHandlingMiddleware** - Normalizes provider/model invocation failures into recoverable assistant-facing errors before later middleware/tool stages run
6.**GuardrailMiddleware** - Pre-tool-call authorization via pluggable `GuardrailProvider` protocol (optional, if `guardrails.enabled` in config). Evaluates each tool call and returns error ToolMessage on deny. Three provider options: built-in `AllowlistProvider` (zero deps), OAP policy providers (e.g. `aport-agent-guardrails`), or custom providers. See [docs/GUARDRAILS.md](docs/GUARDRAILS.md) for setup, usage, and how to implement a provider.
7.**SandboxAuditMiddleware** - Audits sandboxed shell/file operations for security logging before tool execution continues
8.**ToolErrorHandlingMiddleware** - Converts tool exceptions into error `ToolMessage`s so the run can continue instead of aborting
9.**SummarizationMiddleware** - Context reduction when approaching token limits (optional, if enabled)
10.**TodoListMiddleware** - Task tracking with `write_todos` tool (optional, if plan_mode)
11.**TokenUsageMiddleware** - Records token usage metrics when token tracking is enabled (optional)
12.**TitleMiddleware** - Auto-generates thread title after first complete exchange and normalizes structured message content before prompting the title model
13.**MemoryMiddleware** - Queues conversations for async memory update (filters to user + final AI responses)
14.**ViewImageMiddleware** - Injects base64 image data before LLM call (conditional on vision support)
15.**DeferredToolFilterMiddleware** - Hides deferred tool schemas from the bound model until tool search is enabled (optional)
16.**SubagentLimitMiddleware** - Truncates excess `task` tool calls from model response to enforce `MAX_CONCURRENT_SUBAGENTS` limit (optional, if `subagent_enabled`)
17.**LoopDetectionMiddleware** - Detects repeated tool-call loops; hard-stop responses clear both structured `tool_calls` and raw provider tool-call metadata before forcing a final text answer
18.**ClarificationMiddleware** - Intercepts `ask_clarification` tool calls, interrupts via `Command(goto=END)` (must be last)
### Configuration System
**Main Configuration** (`config.yaml`):
Setup: Copy `config.example.yaml` to `config.yaml` in the **project root** directory.
**Config Versioning**: `config.example.yaml` has a `config_version` field. On startup, `AppConfig.from_file()` compares user version vs example version and emits a warning if outdated. Missing `config_version` = version 0. Run `make config-upgrade` to auto-merge missing fields. When changing the config schema, bump `config_version` in `config.example.yaml`.
**Config Caching**: `get_app_config()` caches the parsed config, but automatically reloads it when the resolved config path changes or the file's mtime increases. This keeps Gateway and LangGraph reads aligned with `config.yaml` edits without requiring a manual process restart.
Configuration priority:
1. Explicit `config_path` argument
2.`DEER_FLOW_CONFIG_PATH` environment variable
3.`config.yaml` in current directory (backend/)
4.`config.yaml` in parent directory (project root - **recommended location**)
Config values starting with `$` are resolved as environment variables (e.g., `$OPENAI_API_KEY`).
`ModelConfig` also declares `use_responses_api` and `output_version` so OpenAI `/v1/responses` can be enabled explicitly while still using `langchain_openai:ChatOpenAI`.
3.`extensions_config.json` in current directory (backend/)
4.`extensions_config.json` in parent directory (project root - **recommended location**)
### Gateway API (`app/gateway/`)
FastAPI application on port 8001 with health check at `GET /health`. Set `GATEWAY_ENABLE_DOCS=false` to disable `/docs`, `/redoc`, and `/openapi.json` in production (default: enabled).
**Routers**:
| Router | Endpoints |
|--------|-----------|
| **Models** (`/api/models`) | `GET /` - list models; `GET /{name}` - model details |
| **MCP** (`/api/mcp`) | `GET /config` - get config; `PUT /config` - update config (saves to extensions_config.json) |
| **Skills** (`/api/skills`) | `GET /` - list skills; `GET /{name}` - details; `PUT /{name}` - update enabled; `POST /install` - install from .skill archive (accepts standard optional frontmatter like `version`, `author`, `compatibility`) |
| **Threads** (`/api/threads/{id}`) | `DELETE /` - remove DeerFlow-managed local thread data after LangGraph thread deletion; unexpected failures are logged server-side and return a generic 500 detail |
| **Artifacts** (`/api/threads/{id}/artifacts`) | `GET /{path}` - serve artifacts; active content types (`text/html`, `application/xhtml+xml`, `image/svg+xml`) are always forced as download attachments to reduce XSS risk; `?download=true` still forces download for other file types |
| **Suggestions** (`/api/threads/{id}/suggestions`) | `POST /` - generate follow-up questions; rich list/block model content is normalized before JSON parsing |
**Sandbox Tools** (in `packages/harness/deerflow/sandbox/tools.py`):
-`bash` - Execute commands with path translation and error handling
-`ls` - Directory listing (tree format, max 2 levels)
-`read_file` - Read file contents with optional line range
-`write_file` - Write/append to files, creates directories
-`str_replace` - Substring replacement (single or all occurrences); same-path serialization is scoped to `(sandbox.id, path)` so isolated sandboxes do not contend on identical virtual paths inside one process
### Subagent System (`packages/harness/deerflow/subagents/`)
-`tavily/` - Web search (5 results default) and web fetch (4KB limit)
-`jina_ai/` - Web fetch via Jina reader API with readability extraction
-`firecrawl/` - Web scraping via Firecrawl API
**ACP agent tools**:
-`invoke_acp_agent` - Invokes external ACP-compatible agents from `config.yaml`
- ACP launchers must be real ACP adapters. The standard `codex` CLI is not ACP-compatible by itself; configure a wrapper such as `npx -y @zed-industries/codex-acp` or an installed `codex-acp` binary
- Missing ACP executables now return an actionable error message instead of a raw `[Errno 2]`
- Each ACP agent uses a per-thread workspace at `{base_dir}/users/{user_id}/threads/{thread_id}/acp-workspace/`. The workspace is accessible to the lead agent via the virtual path `/mnt/acp-workspace/` (read-only). In docker sandbox mode, the directory is volume-mounted into the container at `/mnt/acp-workspace` (read-only); in local sandbox mode, path translation is handled by `tools.py`
-`image_search/` - Image search via DuckDuckGo
### MCP System (`packages/harness/deerflow/mcp/`)
- Uses `langchain-mcp-adapters``MultiServerMCPClient` for multi-server management
- **Lazy initialization**: Tools loaded on first use via `get_cached_mcp_tools()`
- **Cache invalidation**: Detects config file changes via mtime comparison
- **Loading**: `load_skills()` recursively scans `skills/{public,custom}` for `SKILL.md`, parses metadata, and reads enabled state from extensions_config.json
- **Injection**: Enabled skills listed in agent system prompt with container paths
- **Installation**: `POST /api/skills/install` extracts .skill ZIP archive to custom/ directory
### Model Factory (`packages/harness/deerflow/models/factory.py`)
-`create_chat_model(name, thinking_enabled)` instantiates LLM from config via reflection
- Supports `thinking_enabled` flag with per-model `when_thinking_enabled` overrides
- Supports vLLM-style thinking toggles via `when_thinking_enabled.extra_body.chat_template_kwargs.enable_thinking` for Qwen reasoning models, while normalizing legacy `thinking` configs for backward compatibility
- Supports `supports_vision` flag for image understanding models
- Config values starting with `$` resolved as environment variables
- Missing provider modules surface actionable install hints from reflection resolvers (for example `uv add langchain-google-genai`)
-`VllmChatModel` subclasses `langchain_openai:ChatOpenAI` for vLLM 0.19.0 OpenAI-compatible endpoints
- Preserves vLLM's non-standard assistant `reasoning` field on full responses, streaming deltas, and follow-up tool-call turns
- Designed for configs that enable thinking through `extra_body.chat_template_kwargs.enable_thinking` on vLLM 0.19.0 Qwen reasoning models, while accepting the older `thinking` alias
### IM Channels System (`app/channels/`)
Bridges external messaging platforms (Feishu, Slack, Telegram, DingTalk) to the DeerFlow agent via the LangGraph Server.
**Architecture**: Channels communicate with Gateway through the `langgraph-sdk` HTTP client (same as the frontend), ensuring threads are created and managed server-side. The internal SDK client injects process-local internal auth plus a matching CSRF cookie/header pair so Gateway accepts state-changing thread/run requests from channel workers without relying on browser session cookies.
-`store.py` - JSON-file persistence mapping `channel_name:chat_id[:topic_id]` → `thread_id` (keys are `channel:chat` for root conversations and `channel:chat:topic` for threaded conversations)
-`manager.py` - Core dispatcher: creates threads via `client.threads.create()`, routes commands, keeps Slack/Telegram on `client.runs.wait()`, and uses `client.runs.stream(["messages-tuple", "values"])` for Feishu incremental outbound updates
-`base.py` - Abstract `Channel` base class (start/stop/send lifecycle)
-`service.py` - Manages lifecycle of all configured channels from `config.yaml`
-`slack.py` / `feishu.py` / `telegram.py` / `dingtalk.py` - Platform-specific implementations (`feishu.py` tracks the running card `message_id` in memory and patches the same card in place; `dingtalk.py` optionally uses AI Card streaming for in-place updates when `card_template_id` is configured)
6. Feishu channel sends one running reply card up front, then patches the same card for each outbound update (card JSON sets `config.update_multi=true` for Feishu's patch API requirement)
7. DingTalk AI Card mode (when `card_template_id` configured): `runs.stream()` → create card with initial text → stream updates via `PUT /v1.0/card/streaming` → finalize on `is_final=True`. Falls back to `sampleMarkdown` if card creation or streaming fails
8. For commands (`/new`, `/status`, `/models`, `/memory`, `/help`): handle locally or query Gateway API
9. Outbound → channel callbacks → platform reply
**Configuration** (`config.yaml` -> `channels`):
-`langgraph_url` - LangGraph-compatible Gateway API base URL (default: `http://localhost:8001/api`)
-`gateway_url` - Gateway API URL for auxiliary commands (default: `http://localhost:8001`)
- In Docker Compose, IM channels run inside the `gateway` container, so `localhost` points back to that container. Use `http://gateway:8001/api` for `langgraph_url` and `http://gateway:8001` for `gateway_url`, or set `DEER_FLOW_CHANNELS_LANGGRAPH_URL` / `DEER_FLOW_CHANNELS_GATEWAY_URL`.
- Per-channel configs: `feishu` (app_id, app_secret), `slack` (bot_token, app_token), `telegram` (bot_token), `dingtalk` (client_id, client_secret, optional `card_template_id` for AI Card streaming)
### Memory System (`packages/harness/deerflow/agents/memory/`)
**Components**:
-`updater.py` - LLM-based memory updates with fact extraction, whitespace-normalized fact deduplication (trims leading/trailing whitespace before comparing), and atomic file I/O
-`queue.py` - Debounced update queue (per-thread deduplication, configurable wait time); captures `user_id` at enqueue time so it survives the `threading.Timer` boundary
-`prompt.py` - Prompt templates for memory updates
-`storage.py` - File-based storage with per-user isolation; cache keyed by `(user_id, agent_name)` tuple
**Per-User Isolation**:
- Memory is stored per-user at `{base_dir}/users/{user_id}/memory.json`
- Per-agent per-user memory at `{base_dir}/users/{user_id}/agents/{agent_name}/memory.json`
-`user_id` is resolved via `get_effective_user_id()` from `deerflow.runtime.user_context`
- In no-auth mode, `user_id` defaults to `"default"` (constant `DEFAULT_USER_ID`)
- Absolute `storage_path` in config opts out of per-user isolation
- **Migration**: Run `PYTHONPATH=. python scripts/migrate_user_isolation.py` to move legacy `memory.json` and `threads/` into per-user layout; supports `--dry-run`
**Data Structure** (stored in `{base_dir}/users/{user_id}/memory.json`):
1.`MemoryMiddleware` filters messages (user inputs + final AI responses), captures `user_id` via `get_effective_user_id()`, and queues conversation with the captured `user_id`
3. Background thread invokes LLM to extract context updates and facts, using the stored `user_id` (not the contextvar, which is unavailable on timer threads)
4. Applies updates atomically (temp file + rename) with cache invalidation, skipping duplicate fact content before append
5. Next interaction injects top 15 facts + context into `<memory>` tags in system prompt
Focused regression coverage for the updater lives in `backend/tests/test_memory_updater.py`.
-`max_injection_tokens` - Token limit for prompt injection (2000)
### Reflection System (`packages/harness/deerflow/reflection/`)
-`resolve_variable(path)` - Import module and return variable (e.g., `module.path:variable_name`)
-`resolve_class(path, base_class)` - Import and validate class against base class
### Config Schema
**`config.yaml`** key sections:
-`models[]` - LLM configs with `use` class path, `supports_thinking`, `supports_vision`, provider-specific fields
- vLLM reasoning models should use `deerflow.models.vllm_provider:VllmChatModel`; for Qwen-style parsers prefer `when_thinking_enabled.extra_body.chat_template_kwargs.enable_thinking`, and DeerFlow will also normalize the older `thinking` alias
-`tools[]` - Tool configs with `use` variable path and `group`
-`tool_groups[]` - Logical groupings for tools
-`sandbox.use` - Sandbox provider class path
-`skills.path` / `skills.container_path` - Host and container paths to skills directory
`DeerFlowClient` provides direct in-process access to all DeerFlow capabilities without HTTP services. All return types align with the Gateway API response schemas, so consumer code works identically in HTTP and embedded modes.
**Architecture**: Imports the same `deerflow` modules that Gateway API uses. Shares the same config files and data directories. No FastAPI dependency.
**Agent Conversation**:
-`chat(message, thread_id)` — synchronous, accumulates streaming deltas per message-id and returns the final AI text
-`stream(message, thread_id)` — subscribes to LangGraph `stream_mode=["values", "messages", "custom"]` and yields `StreamEvent`:
-`"values"` — full state snapshot (title, messages, artifacts); AI text already delivered via `messages` mode is **not** re-synthesized here to avoid duplicate deliveries
-`"messages-tuple"` — per-chunk update: for AI text this is a **delta** (concat per `id` to rebuild the full message); tool calls and tool results are emitted once each
-`"custom"` — forwarded from `StreamWriter`
-`"end"` — stream finished (carries cumulative `usage` counted once per message id)
- Agent created lazily via `create_agent()` + `_build_middlewares()`, same as `make_lead_agent`
- Supports `checkpointer` parameter for state persistence across turns
-`reset_agent()` forces agent recreation (e.g. after memory or skill changes)
- See [docs/STREAMING.md](docs/STREAMING.md) for the full design: why Gateway and DeerFlowClient are parallel paths, LangGraph's `stream_mode` semantics, the per-id dedup invariants, and regression testing strategy
**Key difference from Gateway**: Upload accepts local `Path` objects instead of HTTP `UploadFile`, rejects directory paths before copying, and reuses a single worker when document conversion must run inside an active event loop. Artifact returns `(bytes, mime_type)` instead of HTTP Response. The new Gateway-only thread cleanup route deletes `.deer-flow/threads/{thread_id}` after LangGraph thread deletion; there is no matching `DeerFlowClient` method yet. `update_mcp_config()` and `update_skill()` automatically invalidate the cached agent.
**Tests**: `tests/test_client.py` (77 unit tests including `TestGatewayConformance`), `tests/test_client_live.py` (live integration tests, requires config.yaml)
**Gateway Conformance Tests** (`TestGatewayConformance`): Validate that every dict-returning client method conforms to the corresponding Gateway Pydantic response model. Each test parses the client output through the Gateway model — if Gateway adds a required field that the client doesn't provide, Pydantic raises `ValidationError` and CI catches the drift. Covers: `ModelsListResponse`, `ModelResponse`, `SkillsListResponse`, `SkillResponse`, `SkillInstallResponse`, `McpConfigResponse`, `UploadResponse`, `MemoryConfigResponse`, `MemoryStatusResponse`.
## Development Workflow
### Test-Driven Development (TDD) — MANDATORY
**Every new feature or bug fix MUST be accompanied by unit tests. No exceptions.**
- Write tests in `backend/tests/` following the existing naming convention `test_<feature>.py`
- Run the full suite before and after your change: `make test`
- Tests must pass before a feature is considered complete
- For lightweight config/utility modules, prefer pure unit tests with no external dependencies
- If a module causes circular import issues in tests, add a `sys.modules` mock in `tests/conftest.py` (see existing example for `deerflow.subagents.executor`)
```bash
# Run all tests
make test
# Run a specific test file
PYTHONPATH=. uv run pytest tests/test_<feature>.py -v
```
### Running the Full Application
From the **project root** directory:
```bash
make dev
```
This starts all services and makes the application available at `http://localhost:2026`.
DeerFlow is a LangGraph-based AI super agent with sandbox execution, persistent memory, and extensible tool integration. The backend enables AI agents to execute code, browse the web, manage files, delegate tasks to subagents, and retain context across conversations - all in isolated, per-thread environments.
---
## Architecture
```
┌──────────────────────────────────────┐
│ Nginx (Port 2026) │
│ Unified reverse proxy │
└───────┬──────────────────┬───────────┘
│ │
/api/langgraph/* │ │ /api/* (other)
▼ ▼
┌────────────────────┐ ┌────────────────────────┐
│ LangGraph Server │ │ Gateway API (8001) │
│ (Port 2024) │ │ FastAPI REST │
│ │ │ │
│ ┌────────────────┐ │ │ Models, MCP, Skills, │
│ │ Lead Agent │ │ │ Memory, Uploads, │
│ │ ┌──────────┐ │ │ │ Artifacts │
│ │ │Middleware│ │ │ └────────────────────────┘
│ │ │ Chain │ │ │
│ │ └──────────┘ │ │
│ │ ┌──────────┐ │ │
│ │ │ Tools │ │ │
│ │ └──────────┘ │ │
│ │ ┌──────────┐ │ │
│ │ │Subagents │ │ │
│ │ └──────────┘ │ │
│ └────────────────┘ │
└────────────────────┘
```
**Request Routing** (via Nginx):
-`/api/langgraph/*` → LangGraph Server - agent interactions, threads, streaming
- **Skills loading**: Recursively discovers nested `SKILL.md` files under `skills/{public,custom}` and preserves nested container paths
- **File-write safety**: `str_replace` serializes read-modify-write per `(sandbox.id, path)` so isolated sandboxes keep concurrency even when virtual paths match
- **Tools**: `bash`, `ls`, `read_file`, `write_file`, `str_replace` (`bash` is disabled by default when using `LocalSandboxProvider`; use `AioSandboxProvider` for isolated shell access)
### Subagent System
Async task delegation with concurrent execution:
- **Built-in agents**: `general-purpose` (full toolset) and `bash` (command specialist, exposed only when shell access is available)
- **Concurrency**: Max 3 subagents per turn, 15-minute timeout
- **Execution**: Background thread pools with status tracking and SSE events
- **Flow**: Agent calls `task()` tool → executor runs subagent in background → polls for completion → returns result
### Memory System
LLM-powered persistent context retention across conversations:
- **Automatic extraction**: Analyzes conversations for user context, facts, and preferences
- **Structured storage**: User context (work, personal, top-of-mind), history, and confidence-scored facts
| `GET /api/threads/{id}/uploads/list` | List uploaded files |
| `DELETE /api/threads/{id}` | Delete DeerFlow-managed local thread data after LangGraph thread deletion; unexpected failures are logged server-side and return a generic 500 detail |
The IM bridge supports Feishu, Slack, and Telegram. Slack and Telegram still use the final `runs.wait()` response path, while Feishu now streams through `runs.stream(["messages-tuple", "values"])` and updates a single in-thread card in place.
For Feishu card updates, DeerFlow stores the running card's `message_id` per inbound message and patches that same card until the run finishes, preserving the existing `OK` / `DONE` reaction flow.
- Model API keys: `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `DEEPSEEK_API_KEY`, etc.
- Tool API keys: `TAVILY_API_KEY`, `GITHUB_TOKEN`, etc.
### LangSmith Tracing
DeerFlow has built-in [LangSmith](https://smith.langchain.com) integration for observability. When enabled, all LLM calls, agent runs, tool executions, and middleware processing are traced and visible in the LangSmith dashboard.
**Setup:**
1. Sign up at [smith.langchain.com](https://smith.langchain.com) and create a project.
2. Add the following to your `.env` file in the project root:
**Legacy variables:** The `LANGCHAIN_TRACING_V2`, `LANGCHAIN_API_KEY`, `LANGCHAIN_PROJECT`, and `LANGCHAIN_ENDPOINT` variables are also supported for backward compatibility. `LANGSMITH_*` variables take precedence when both are set.
### Langfuse Tracing
DeerFlow also supports [Langfuse](https://langfuse.com) observability for LangChain-compatible runs.
Add the following to your `.env` file:
```bash
LANGFUSE_TRACING=true
LANGFUSE_PUBLIC_KEY=pk-lf-xxxxxxxxxxxxxxxx
LANGFUSE_SECRET_KEY=sk-lf-xxxxxxxxxxxxxxxx
LANGFUSE_BASE_URL=https://cloud.langfuse.com
```
If you are using a self-hosted Langfuse deployment, set `LANGFUSE_BASE_URL` to your Langfuse host.
### Dual Provider Behavior
If both LangSmith and Langfuse are enabled, DeerFlow initializes and attaches both callbacks so the same run data is reported to both systems.
If a provider is explicitly enabled but required credentials are missing, or the provider callback cannot be initialized, DeerFlow raises an error when tracing is initialized during model creation instead of silently disabling tracing.
**Docker:** In `docker-compose.yaml`, tracing is disabled by default (`LANGSMITH_TRACING=false`). Set `LANGSMITH_TRACING=true` and/or `LANGFUSE_TRACING=true` in your `.env`, together with the required credentials, to enable tracing in containerized deployments.
---
## Development
### Commands
```bash
make install # Install dependencies
make dev # Run LangGraph server (port 2024)
make gateway # Run Gateway API (port 8001)
make lint # Run linter (ruff)
make format # Format code (ruff)
```
### Code Style
- **Linter/Formatter**: `ruff`
- **Line length**: 240 characters
- **Python**: 3.12+ with type hints
- **Quotes**: Double quotes
- **Indentation**: 4 spaces
### Testing
```bash
uv run pytest
```
---
## Technology Stack
- **LangGraph** (1.0.6+) - Agent framework and multi-agent orchestration
- **LangChain** (1.2.3+) - LLM abstractions and tool system
- **FastAPI** (0.115.0+) - Gateway REST API
- **langchain-mcp-adapters** - Model Context Protocol support
logger.warning("[Feishu] running card creation returned no message_id for source=%s, subsequent updates will fall back to new replies",source_message_id)
returnrunning_card_id
def_ensure_running_card_started(self,source_message_id:str,text:str="Working on it...")->asyncio.Task|None:
"""Start running-card creation once per source message."""
raiseRuntimeError("Installed wecom-aibot-python-sdk does not expose the WebSocket media upload API expected by DeerFlow. Use wecom-aibot-python-sdk==0.1.6 or update the adapter.")
# Validate: wildcard origin with credentials is a security misconfiguration
fororiginincors_origins:
iforigin=="*":
logger.error("GATEWAY_CORS_ORIGINS contains wildcard '*' with allow_credentials=True. This is a security misconfiguration — browsers will reject the response. Use explicit scheme://host:port origins instead.")
cors_origins=[oforoincors_originsifo!="*"]
break
ifcors_origins:
app.add_middleware(
CORSMiddleware,
allow_origins=cors_origins,
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
# Include routers
# Models API is mounted at /api/models
app.include_router(models.router)
# MCP API is mounted at /api/mcp
app.include_router(mcp.router)
# Memory API is mounted at /api/memory
app.include_router(memory.router)
# Skills API is mounted at /api/skills
app.include_router(skills.router)
# Artifacts API is mounted at /api/threads/{thread_id}/artifacts
app.include_router(artifacts.router)
# Uploads API is mounted at /api/threads/{thread_id}/uploads
app.include_router(uploads.router)
# Thread cleanup API is mounted at /api/threads/{thread_id}
app.include_router(threads.router)
# Agents API is mounted at /api/agents
app.include_router(agents.router)
# Suggestions API is mounted at /api/threads/{thread_id}/suggestions
app.include_router(suggestions.router)
# Channels API is mounted at /api/channels
app.include_router(channels.router)
# Assistants compatibility API (LangGraph Platform stub)
app.include_router(assistants_compat.router)
# Auth API is mounted at /api/v1/auth
app.include_router(auth.router)
# Feedback API is mounted at /api/threads/{thread_id}/runs/{run_id}/feedback
app.include_router(feedback.router)
# Thread Runs API (LangGraph Platform-compatible runs lifecycle)
app.include_router(thread_runs.router)
# Stateless Runs API (stream/wait without a pre-existing thread)
"""Write the admin email + password to ``{base_dir}/admin_initial_credentials.txt``.
The file is created **atomically** with mode 0600 via ``os.open``
so the password is never world-readable, even for the single syscall
window between ``write_text`` and ``chmod``.
``label`` distinguishes "initial" (fresh creation) from "reset"
(password reset) in the file header so an operator picking up the
file after a restart can tell which event produced it.
Returns the absolute :class:`Path` to the file.
"""
target=get_paths().base_dir/_CREDENTIAL_FILENAME
target.parent.mkdir(parents=True,exist_ok=True)
content=(
f"# DeerFlow admin {label} credentials\n# This file is generated on first boot or password reset.\n# Change the password after login via Settings -> Account,\n# then delete this file.\n#\nemail: {email}\npassword: {password}\n"
)
# Atomic 0600 create-or-truncate. O_TRUNC (not O_EXCL) so the
# reset-password path can rewrite an existing file without a
description="Retrieve an artifact file generated by the AI agent. Text and binary files can be viewed inline, while active web content is always downloaded.",
raiseHTTPException(status_code=status.HTTP_400_BAD_REQUEST,detail=AuthErrorResponse(code=AuthErrorCode.INVALID_CREDENTIALS,message="Current password is incorrect").model_dump())
raiseHTTPException(status_code=status.HTTP_400_BAD_REQUEST,detail=AuthErrorResponse(code=AuthErrorCode.EMAIL_ALREADY_EXISTS,message="Email already in use").model_dump())
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.