Compare commits

...

12 Commits

Author SHA1 Message Date
yangzheli 59c4a3f0a4 feat(agent): add custom-agent self-updates with user isolation (#2713)
* feat(agent): add update_agent tool for in-chat custom-agent self-updates (#2616)

Custom agents had no built-in way to persist updates to their own SOUL.md /
config.yaml from a normal chat — `setup_agent` was only bound during the
bootstrap flow, so when the user asked the agent to refine its description
or personality, the agent would shell out via bash/write_file and the edits
landed in a temporary sandbox/tool workspace instead of
`{base_dir}/agents/{agent_name}/`.

Changes:
- New `update_agent` builtin tool with partial-update semantics (only the
  fields you pass are written) and atomic temp-file + os.replace writes so
  a failed update never corrupts existing SOUL.md / config.yaml.
- Lead agent now binds `update_agent` in the non-bootstrap path whenever
  `agent_name` is set in the runtime context. Default agent (no
  agent_name) and bootstrap flow are unchanged.
- New `<self_update>` system-prompt section is injected for custom agents,
  instructing them to use `update_agent` — and explicitly NOT bash /
  write_file — to persist self-updates.
- Tests: 11 new cases in `tests/test_update_agent_tool.py` covering
  validation (missing/invalid agent_name, unknown agent, no fields),
  partial updates (soul-only, description-only, skills=[] vs omitted),
  no-op detection, atomic-write safety, and AgentConfig round-tripping;
  plus 2 new cases in `tests/test_lead_agent_prompt.py` covering the
  self-update prompt section.
- Docs: updated backend/CLAUDE.md builtin tools list and tools.mdx
  (en/zh) with the new tool description.
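The atomic-write guarantee described above can be sketched as follows. This is a hypothetical helper, not the actual `update_agent` implementation: write the new content to a temp file in the same directory, then `os.replace()` it over the target, so a crash mid-write leaves the old SOUL.md / config.yaml intact.

```python
# Sketch of temp-file + os.replace atomic writes (hypothetical helper,
# simplified from the behavior the commit message describes).
import os
import tempfile
from pathlib import Path

def atomic_write_text(target: Path, text: str) -> None:
    # Temp file must live in the same directory so os.replace stays on
    # one filesystem (rename is only atomic within a filesystem).
    fd, tmp_path = tempfile.mkstemp(dir=target.parent, prefix=target.name)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(text)
        os.replace(tmp_path, target)  # readers see old file or new file, never half
    except BaseException:
        os.unlink(tmp_path)  # failed update leaves the existing file untouched
        raise
```

A failed write raises before the replace, so the previous file contents survive.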

* feat(agent): isolate custom agents per user

Store custom agent definitions under the effective user, keep legacy agents readable until migration, and cover API/tool/migration behavior with tests.

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat: consistent write/delete targets & add --user-id to migration

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-05 23:17:42 +08:00
Nan Gao e8675f266d fix(loop-detection): keep tool-call pairing on warn injection (#2724) (#2725)
* fix(loop-detection): keep tool-call pairing on warn injection (#2724)

* make format

* fix(loop-detection): avoid IMMessage leak to downstream consumer

* fix(channels): filter loop warning text from IM replies
2026-05-05 18:53:49 +08:00
Xun 680187ddc2 fix: Supplement list_running in RemoteSandboxBackend (#2716)
* fix: Supplement list_running in RemoteSandboxBackend

* fix

* except requests.RequestException as exc:

* fix
2026-05-05 18:53:10 +08:00
Xinmin Zeng aded753de3 fix(frontend): restore localhost fallback for getGatewayConfig in prod mode (#2705) (#2718)
* fix(frontend): unify gateway-config localhost fallback for prod (#2705)

`getGatewayConfig()` only fell back to localhost defaults when
`NODE_ENV === "development"`, while `next.config.js` always falls back
to `127.0.0.1:8001`. Running `make start` (which sets NODE_ENV=production
via `next start`) without `DEER_FLOW_INTERNAL_GATEWAY_BASE_URL` /
`DEER_FLOW_TRUSTED_ORIGINS` therefore caused zod to throw inside SSR
layouts and surfaced as a 500.

Drop the NODE_ENV gating and use localhost defaults everywhere — the
"force explicit config in prod" intent should be enforced by deployment
templates (docker-compose already sets both vars), not by request-time
crashes. Document the two vars in both .env.example files and add unit
coverage for the dev/prod env-unset paths.

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* Update internalGatewayUrl in gateway config tests

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-05-05 16:27:29 +08:00
Willem Jiang 028493bfd8 fix(docker): force nginx to resolve upstream names at request time (#2717)
* fix(docker): force nginx to resolve upstream names at request time

* fix(docker): set resolver valid=0s to eliminate DNS cache window for request-time re-resolution

Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/07bdb872-022f-4fd2-9fa8-d800a4ce34a7

Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com>

* Update DNS resolver valid time and add upstreams

* fix the unit test error

* Remove upstream server configurations from nginx.conf

Removed upstream server configurations for gateway and frontend.

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
2026-05-05 14:35:55 +08:00
Willem Jiang 8e48b7e85c fix(channels): preserve clarification conversation history across follow-up turns (#2444)
* fix(channels): preserve clarification conversation history across follow-up turns

Pin channel-triggered runs to the root checkpoint namespace and ensure thread_id is always present in configurable run config so follow-up replies resume the same conversation state.

Add regression coverage to channel tests:

- assert checkpoint_ns/thread_id are passed in wait and stream paths
- add an integration-style clarification flow test that verifies the second user reply continues prior context instead of starting a new session

This addresses history loss after ask_clarification interruptions (issue #2425).

* Apply suggestions from code review

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fix(channels): copy configurable dict before injecting run-scoped fields

  When configurable was already a plain dict, _resolve_run_params mutated
  it in place, leaking checkpoint_ns and thread_id back into the shared
  session config. Always copy via dict() before mutating to prevent
  cross-user or cross-channel config pollution.
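The copy-before-mutate fix can be sketched as a small helper (names simplified; the real code lives in `_resolve_run_params`): take a shallow copy of the shared configurable mapping before injecting run-scoped fields, so `checkpoint_ns`/`thread_id` never leak back into the session-level config.

```python
# Sketch of the copy-before-mutate pattern from this fix (simplified
# from the diff; the helper name is illustrative).
from collections.abc import Mapping
from typing import Any

def with_run_scope(configurable: Any, thread_id: str) -> dict:
    # dict() copies even when configurable is already a plain dict,
    # so the caller's shared session config is never mutated.
    scoped = dict(configurable) if isinstance(configurable, Mapping) else {}
    scoped["checkpoint_ns"] = ""  # pin to the root graph namespace
    scoped["thread_id"] = thread_id
    return scoped
```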

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-05-04 16:14:07 +08:00
Willem Jiang af6e48ccaa fix(i18n): add Chinese translations for account settings page (#2712)
The account settings page had all user-facing strings (profile labels, password form placeholders, validation messages, button text) hardcoded in English. Replace them with i18n translation keys so the page renders correctly when the locale is set to Chinese.

Fixed #2710
2026-05-04 11:15:16 +08:00
Willem Jiang b10eb7bafc feat(github): Added container push workflow (#2709)
* feat(github): Added container push workflow

* Apply suggestions from code review

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-05-04 11:14:34 +08:00
YuJitang d02f762ab0 feat: refine token usage display modes (#2329)
* feat: refine token usage display modes

* docs: clarify token usage accounting semantics

* fix: avoid duplicate subtask debug keys

* style: format token usage tests

* chore: address token attribution review feedback

* Update test_token_usage_middleware.py

* Update test_token_usage_middleware.py

* chore: simplify token attribution fallback

* fix token usage metadata follow-up handling

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-05-04 09:56:16 +08:00
Willem Jiang 82e7936d36 fix(docker): set UTF-8 locale to prevent ASCII encoding errors in minimal containers (#2707)
* fix(docker): set UTF-8 locale to prevent ASCII encoding errors in minimal containers
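The failure mode this commit prevents can be reproduced with a small sketch (not from the PR): when stdio is forced to an ASCII encoding, as in minimal containers with no locale configured, printing non-ASCII content raises UnicodeEncodeError, while `PYTHONIOENCODING=utf-8` (what the Dockerfile now sets) makes it succeed.

```python
# Repro sketch for the ASCII-encoding crash the UTF-8 locale fix avoids.
import subprocess
import sys

# Simulate a minimal container whose default stdio encoding is ASCII.
res = subprocess.run(
    [sys.executable, "-c", "print('\\u4f60\\u597d')"],  # prints Chinese text
    env={"PYTHONIOENCODING": "ascii"},
    capture_output=True, text=True,
)
assert res.returncode != 0 and "UnicodeEncodeError" in res.stderr

# With UTF-8 forced (as LANG/LC_ALL/PYTHONIOENCODING now do), it works.
res_ok = subprocess.run(
    [sys.executable, "-c", "print('\\u4f60\\u597d')"],
    env={"PYTHONIOENCODING": "utf-8"},
    capture_output=True, text=True,
)
assert res_ok.returncode == 0
```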

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-05-04 09:41:10 +08:00
Nan Gao 222a7773cb fix(frontend): avoid misleading error message when agent api is disabled (#2697) (#2698) 2026-05-04 09:38:05 +08:00
Nan Gao f80ac961ec fix(harness): restore legacy skills path fallback (#2694) (#2696)
* fix(harness): restore legacy skills path fallback (#2694)

* fix(format): make format

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-05-03 23:40:59 +08:00
58 changed files with 4311 additions and 349 deletions
+11
@@ -48,3 +48,14 @@ INFOQUEST_API_KEY=your-infoquest-api-key
# Set to "false" to disable Swagger UI, ReDoc, and OpenAPI schema in production
# GATEWAY_ENABLE_DOCS=false
# ── Frontend SSR → Gateway wiring ─────────────────────────────────────────────
# The Next.js server uses these to reach the Gateway during SSR (auth checks,
# /api/* rewrites). They default to localhost values that match `make dev` and
# `make start`, so most local users do not need to set them.
#
# Override only when the Gateway is not on localhost:8001 (e.g. when the
# frontend and gateway run on different hosts, in containers with a service
# alias, or behind a different port). docker-compose already sets these.
# DEER_FLOW_INTERNAL_GATEWAY_BASE_URL=http://localhost:8001
# DEER_FLOW_TRUSTED_ORIGINS=http://localhost:3000,http://localhost:2026
+101
@@ -0,0 +1,101 @@
name: Publish Containers
on:
push:
tags:
- "v*"
jobs:
backend-container:
runs-on: ubuntu-latest
permissions:
contents: read
packages: write
attestations: write
id-token: write
env:
REGISTRY: ghcr.io
IMAGE_NAME: ${{ github.repository }}-backend
steps:
- name: Checkout repository
uses: actions/checkout@v6
- name: Log in to the Container registry
uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 #v3.4.0
with:
registry: ${{ env.REGISTRY }}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Extract metadata (tags, labels) for Docker
id: meta
uses: docker/metadata-action@902fa8ec7d6ecbf8d84d538b9b233a880e428804 #v5.7.0
with:
images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
tags: |
type=ref,event=tag
type=ref,event=branch
type=sha
type=raw,value=latest,enable={{is_default_branch}}
- name: Build and push Docker image
id: push
uses: docker/build-push-action@263435318d21b8e681c14492fe198d362a7d2c83 #v6.18.0
with:
context: .
file: backend/Dockerfile
push: true
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
- name: Generate artifact attestation
uses: actions/attest-build-provenance@v2
with:
subject-name: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME}}
subject-digest: ${{ steps.push.outputs.digest }}
push-to-registry: true
frontend-container:
runs-on: ubuntu-latest
permissions:
contents: read
packages: write
attestations: write
id-token: write
env:
REGISTRY: ghcr.io
IMAGE_NAME: ${{ github.repository }}-frontend
steps:
- name: Checkout repository
uses: actions/checkout@v6
- name: Log in to the Container registry
uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 #v3.4.0
with:
registry: ${{ env.REGISTRY }}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Extract metadata (tags, labels) for Docker
id: meta
uses: docker/metadata-action@902fa8ec7d6ecbf8d84d538b9b233a880e428804 #v5.7.0
with:
images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
tags: |
type=ref,event=tag
type=ref,event=branch
type=sha
type=raw,value=latest,enable={{is_default_branch}}
- name: Build and push Docker image
id: push
uses: docker/build-push-action@263435318d21b8e681c14492fe198d362a7d2c83 #v6.18.0
with:
context: .
file: frontend/Dockerfile
push: true
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
- name: Generate artifact attestation
uses: actions/attest-build-provenance@v2
with:
subject-name: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME}}
subject-digest: ${{ steps.push.outputs.digest }}
push-to-registry: true
+4 -1
@@ -263,6 +263,8 @@ Proxied through nginx: `/api/langgraph/*` → LangGraph, all other `/api/*` →
- `present_files` - Make output files visible to user (only `/mnt/user-data/outputs`)
- `ask_clarification` - Request clarification (intercepted by ClarificationMiddleware → interrupts)
- `view_image` - Read image as base64 (added only if model supports vision)
- `setup_agent` - Bootstrap-only: persist a brand-new custom agent's `SOUL.md` and `config.yaml`. Bound only when `is_bootstrap=True`.
- `update_agent` - Custom-agent-only: persist self-updates to the current agent's `SOUL.md` / `config.yaml` from inside a normal chat (partial update + atomic write). Bound when `agent_name` is set and `is_bootstrap=False`.
4. **Subagent tool** (if enabled):
- `task` - Delegate to subagent (description, prompt, subagent_type, max_turns)
@@ -354,10 +356,11 @@ Bridges external messaging platforms (Feishu, Slack, Telegram, DingTalk) to the
**Per-User Isolation**:
- Memory is stored per-user at `{base_dir}/users/{user_id}/memory.json`
- Per-agent per-user memory at `{base_dir}/users/{user_id}/agents/{agent_name}/memory.json`
- Custom agent definitions (`SOUL.md` + `config.yaml`) are also per-user at `{base_dir}/users/{user_id}/agents/{agent_name}/`. The legacy shared layout `{base_dir}/agents/{agent_name}/` remains read-only fallback for unmigrated installations
- `user_id` is resolved via `get_effective_user_id()` from `deerflow.runtime.user_context`
- In no-auth mode, `user_id` defaults to `"default"` (constant `DEFAULT_USER_ID`)
- Absolute `storage_path` in config opts out of per-user isolation
- **Migration**: Run `PYTHONPATH=. python scripts/migrate_user_isolation.py` to move legacy `memory.json` and `threads/` into per-user layout; supports `--dry-run`
- **Migration**: Run `PYTHONPATH=. python scripts/migrate_user_isolation.py` to move legacy `memory.json`, `threads/`, and `agents/` into per-user layout. Supports `--dry-run` (preview changes) and `--user-id USER_ID` (assign unowned legacy data to a user, defaults to `default`).
**Data Structure** (stored in `{base_dir}/users/{user_id}/memory.json`):
- **User Context**: `workContext`, `personalContext`, `topOfMind` (1-3 sentence summaries)
+10
@@ -50,6 +50,12 @@ COPY backend ./backend
RUN --mount=type=cache,target=/root/.cache/uv \
sh -c "cd backend && UV_INDEX_URL=${UV_INDEX_URL:-https://pypi.org/simple} uv sync ${UV_EXTRAS:+--extra $UV_EXTRAS}"
# UTF-8 locale prevents UnicodeEncodeError on Chinese/emoji content in minimal
# containers where locale configuration may be missing and the default encoding is not UTF-8.
ENV LANG=C.UTF-8
ENV LC_ALL=C.UTF-8
ENV PYTHONIOENCODING=utf-8
# ── Stage 2: Dev ──────────────────────────────────────────────────────────────
# Retains compiler toolchain from builder so startup-time `uv sync` can build
# source distributions in development containers.
@@ -66,6 +72,10 @@ CMD ["sh", "-c", "cd backend && PYTHONPATH=. uv run uvicorn app.gateway.app:app
# Clean image without build-essential — reduces size (~200 MB) and attack surface.
FROM python:3.12-slim-bookworm
ENV LANG=C.UTF-8
ENV LC_ALL=C.UTF-8
ENV PYTHONIOENCODING=utf-8
# Copy Node.js runtime from builder (provides npx for MCP servers)
COPY --from=builder /usr/bin/node /usr/bin/node
COPY --from=builder /usr/lib/node_modules /usr/lib/node_modules
+26 -1
@@ -146,6 +146,13 @@ def _normalize_custom_agent_name(raw_value: str) -> str:
return normalized
def _strip_loop_warning_text(text: str) -> str:
"""Remove middleware-authored loop warning lines from display text."""
if "[LOOP DETECTED]" not in text:
return text
return "\n".join(line for line in text.splitlines() if "[LOOP DETECTED]" not in line).strip()
def _extract_response_text(result: dict | list) -> str:
"""Extract the last AI message text from a LangGraph runs.wait result.
@@ -155,7 +162,7 @@ def _extract_response_text(result: dict | list) -> str:
Handles special cases:
- Regular AI text responses
- Clarification interrupts (``ask_clarification`` tool messages)
- AI messages with tool_calls but no text content
- Strips loop-detection warnings attached to tool-call AI messages
"""
if isinstance(result, list):
messages = result
@@ -185,7 +192,12 @@ def _extract_response_text(result: dict | list) -> str:
# Regular AI message with text content
if msg_type == "ai":
content = msg.get("content", "")
has_tool_calls = bool(msg.get("tool_calls"))
if isinstance(content, str) and content:
if has_tool_calls:
content = _strip_loop_warning_text(content)
if not content:
continue
return content
# content can be a list of content blocks
if isinstance(content, list):
@@ -196,6 +208,8 @@ def _extract_response_text(result: dict | list) -> str:
elif isinstance(block, str):
parts.append(block)
text = "".join(parts)
if has_tool_calls:
text = _strip_loop_warning_text(text)
if text:
return text
return ""
@@ -589,6 +603,17 @@ class ChannelManager:
user_layer.get("config"),
)
configurable = run_config.get("configurable")
if isinstance(configurable, Mapping):
configurable = dict(configurable)
else:
configurable = {}
run_config["configurable"] = configurable
# Pin channel-triggered runs to the root graph namespace so follow-up
# turns continue from the same conversation checkpoint.
configurable["checkpoint_ns"] = ""
configurable["thread_id"] = thread_id
run_context = _merge_dicts(
DEFAULT_RUN_CONTEXT,
self._default_session.get("context"),
+43 -18
@@ -11,6 +11,7 @@ from pydantic import BaseModel, Field
from deerflow.config.agents_api_config import get_agents_api_config
from deerflow.config.agents_config import AgentConfig, list_custom_agents, load_agent_config, load_agent_soul
from deerflow.config.paths import get_paths
from deerflow.runtime.user_context import get_effective_user_id
logger = logging.getLogger(__name__)
router = APIRouter(prefix="/api", tags=["agents"])
@@ -86,11 +87,11 @@ def _require_agents_api_enabled() -> None:
)
def _agent_config_to_response(agent_cfg: AgentConfig, include_soul: bool = False) -> AgentResponse:
def _agent_config_to_response(agent_cfg: AgentConfig, include_soul: bool = False, *, user_id: str | None = None) -> AgentResponse:
"""Convert AgentConfig to AgentResponse."""
soul: str | None = None
if include_soul:
soul = load_agent_soul(agent_cfg.name) or ""
soul = load_agent_soul(agent_cfg.name, user_id=user_id) or ""
return AgentResponse(
name=agent_cfg.name,
@@ -116,9 +117,10 @@ async def list_agents() -> AgentsListResponse:
"""
_require_agents_api_enabled()
user_id = get_effective_user_id()
try:
agents = list_custom_agents()
return AgentsListResponse(agents=[_agent_config_to_response(a, include_soul=True) for a in agents])
agents = list_custom_agents(user_id=user_id)
return AgentsListResponse(agents=[_agent_config_to_response(a, include_soul=True, user_id=user_id) for a in agents])
except Exception as e:
logger.error(f"Failed to list agents: {e}", exc_info=True)
raise HTTPException(status_code=500, detail=f"Failed to list agents: {str(e)}")
@@ -144,7 +146,12 @@ async def check_agent_name(name: str) -> dict:
_require_agents_api_enabled()
_validate_agent_name(name)
normalized = _normalize_agent_name(name)
available = not get_paths().agent_dir(normalized).exists()
user_id = get_effective_user_id()
paths = get_paths()
# Treat the name as taken if either the per-user path or the legacy shared
# path holds an agent — picking a name that collides with an unmigrated
# legacy agent would shadow the legacy entry once migration runs.
available = not paths.user_agent_dir(user_id, normalized).exists() and not paths.agent_dir(normalized).exists()
return {"available": available, "name": normalized}
@@ -169,10 +176,11 @@ async def get_agent(name: str) -> AgentResponse:
_require_agents_api_enabled()
_validate_agent_name(name)
name = _normalize_agent_name(name)
user_id = get_effective_user_id()
try:
agent_cfg = load_agent_config(name)
return _agent_config_to_response(agent_cfg, include_soul=True)
agent_cfg = load_agent_config(name, user_id=user_id)
return _agent_config_to_response(agent_cfg, include_soul=True, user_id=user_id)
except FileNotFoundError:
raise HTTPException(status_code=404, detail=f"Agent '{name}' not found")
except Exception as e:
@@ -202,10 +210,13 @@ async def create_agent_endpoint(request: AgentCreateRequest) -> AgentResponse:
_require_agents_api_enabled()
_validate_agent_name(request.name)
normalized_name = _normalize_agent_name(request.name)
user_id = get_effective_user_id()
paths = get_paths()
agent_dir = get_paths().agent_dir(normalized_name)
agent_dir = paths.user_agent_dir(user_id, normalized_name)
legacy_dir = paths.agent_dir(normalized_name)
if agent_dir.exists():
if agent_dir.exists() or legacy_dir.exists():
raise HTTPException(status_code=409, detail=f"Agent '{normalized_name}' already exists")
try:
@@ -232,8 +243,8 @@ async def create_agent_endpoint(request: AgentCreateRequest) -> AgentResponse:
logger.info(f"Created agent '{normalized_name}' at {agent_dir}")
agent_cfg = load_agent_config(normalized_name)
return _agent_config_to_response(agent_cfg, include_soul=True)
agent_cfg = load_agent_config(normalized_name, user_id=user_id)
return _agent_config_to_response(agent_cfg, include_soul=True, user_id=user_id)
except HTTPException:
raise
@@ -267,13 +278,20 @@ async def update_agent(name: str, request: AgentUpdateRequest) -> AgentResponse:
_require_agents_api_enabled()
_validate_agent_name(name)
name = _normalize_agent_name(name)
user_id = get_effective_user_id()
try:
agent_cfg = load_agent_config(name)
agent_cfg = load_agent_config(name, user_id=user_id)
except FileNotFoundError:
raise HTTPException(status_code=404, detail=f"Agent '{name}' not found")
agent_dir = get_paths().agent_dir(name)
paths = get_paths()
agent_dir = paths.user_agent_dir(user_id, name)
if not agent_dir.exists() and paths.agent_dir(name).exists():
raise HTTPException(
status_code=409,
detail=(f"Agent '{name}' only exists in the legacy shared layout and is not scoped to a user. Run scripts/migrate_user_isolation.py to move legacy agents into the per-user layout before updating."),
)
try:
# Update config if any config fields changed
@@ -314,8 +332,8 @@ async def update_agent(name: str, request: AgentUpdateRequest) -> AgentResponse:
logger.info(f"Updated agent '{name}'")
refreshed_cfg = load_agent_config(name)
return _agent_config_to_response(refreshed_cfg, include_soul=True)
refreshed_cfg = load_agent_config(name, user_id=user_id)
return _agent_config_to_response(refreshed_cfg, include_soul=True, user_id=user_id)
except HTTPException:
raise
@@ -402,15 +420,22 @@ async def delete_agent(name: str) -> None:
name: The agent name.
Raises:
HTTPException: 404 if agent not found.
HTTPException: 404 if no per-user copy exists; 409 if only a legacy
shared copy exists (suggesting the migration script).
"""
_require_agents_api_enabled()
_validate_agent_name(name)
name = _normalize_agent_name(name)
agent_dir = get_paths().agent_dir(name)
user_id = get_effective_user_id()
paths = get_paths()
agent_dir = paths.user_agent_dir(user_id, name)
if not agent_dir.exists():
if paths.agent_dir(name).exists():
raise HTTPException(
status_code=409,
detail=(f"Agent '{name}' only exists in the legacy shared layout and is not scoped to a user. Run scripts/migrate_user_isolation.py to move legacy agents into the per-user layout before deleting."),
)
raise HTTPException(status_code=404, detail=f"Agent '{name}' not found")
try:
@@ -318,7 +318,7 @@ def make_lead_agent(config: RunnableConfig):
def _make_lead_agent(config: RunnableConfig, *, app_config: AppConfig):
# Lazy import to avoid circular dependency
from deerflow.tools import get_available_tools
from deerflow.tools.builtins import setup_agent
from deerflow.tools.builtins import setup_agent, update_agent
cfg = _get_runtime_config(config)
resolved_app_config = app_config
@@ -390,6 +390,9 @@ def _make_lead_agent(config: RunnableConfig, *, app_config: AppConfig):
state_schema=ThreadState,
)
# Custom agents can update their own SOUL.md / config via update_agent.
# The default agent (no agent_name) does not see this tool.
extra_tools = [update_agent] if agent_name else []
# Default lead agent (unchanged behavior)
return create_agent(
model=create_chat_model(name=model_name, thinking_enabled=thinking_enabled, reasoning_effort=reasoning_effort, app_config=resolved_app_config),
@@ -398,7 +401,8 @@ def _make_lead_agent(config: RunnableConfig, *, app_config: AppConfig):
groups=agent_config.tool_groups if agent_config else None,
subagent_enabled=subagent_enabled,
app_config=resolved_app_config,
),
)
+ extra_tools,
middleware=_build_middlewares(config, model_name=model_name, agent_name=agent_name, app_config=resolved_app_config),
system_prompt=apply_prompt_template(
subagent_enabled=subagent_enabled,
@@ -344,6 +344,7 @@ You are {agent_name}, an open-source super agent.
</role>
{soul}
{self_update_section}
{memory_context}
<thinking_style>
@@ -643,6 +644,26 @@ def get_agent_soul(agent_name: str | None) -> str:
return ""
def _build_self_update_section(agent_name: str | None) -> str:
"""Prompt block that teaches the custom agent to persist self-updates via update_agent."""
if not agent_name:
return ""
return f"""<self_update>
You are running as the custom agent **{agent_name}** with a persisted SOUL.md and config.yaml.
When the user asks you to update your own description, personality, behaviour, skill set, tool groups, or default model,
you MUST persist the change with the `update_agent` tool. Do NOT use `bash`, `write_file`, or any sandbox tool to edit
SOUL.md or config.yaml — those write into a temporary sandbox/tool workspace and the changes will be lost on the next turn.
Rules:
- Always pass the FULL replacement text for `soul` (no patch semantics). Start from your current SOUL above and apply the user's edits.
- Only pass the fields that should change. Omit the others to preserve them.
- Pass `skills=[]` to disable all skills, or omit `skills` to keep the existing whitelist.
- After `update_agent` returns successfully, tell the user the change is persisted and will take effect on the next turn.
</self_update>
"""
def get_deferred_tools_prompt_section(*, app_config: AppConfig | None = None) -> str:
"""Generate <available-deferred-tools> block for the system prompt.
@@ -772,6 +793,7 @@ def apply_prompt_template(
prompt = SYSTEM_PROMPT_TEMPLATE.format(
agent_name=agent_name or "DeerFlow 2.0",
soul=get_agent_soul(agent_name),
self_update_section=_build_self_update_section(agent_name),
skills_section=skills_section,
deferred_tools_section=deferred_tools_section,
memory_context=memory_context,
@@ -22,7 +22,6 @@ from typing import override
from langchain.agents import AgentState
from langchain.agents.middleware import AgentMiddleware
from langchain_core.messages import HumanMessage
from langgraph.runtime import Runtime
logger = logging.getLogger(__name__)
@@ -356,13 +355,30 @@ class LoopDetectionMiddleware(AgentMiddleware[AgentState]):
return {"messages": [stripped_msg]}
if warning:
# Inject as HumanMessage instead of SystemMessage to avoid
# Anthropic's "multiple non-consecutive system messages" error.
# Anthropic models require system messages only at the start of
# the conversation; injecting one mid-conversation crashes
# langchain_anthropic's _format_messages(). HumanMessage works
# with all providers. See #1299.
return {"messages": [HumanMessage(content=warning, name="loop_warning")]}
# WORKAROUND for v2.0-m1 — see #2724.
#
# Append the warning to the AIMessage content instead of
# injecting a separate HumanMessage. Inserting any non-tool
# message between an AIMessage(tool_calls=...) and its
# ToolMessage responses breaks OpenAI/Moonshot strict pairing
# validation ("tool_call_ids did not have response messages")
# because the tools node has not run yet at after_model time.
# tool_calls are preserved so the tools node still executes.
#
# This is a temporary mitigation: mutating an existing
# AIMessage to carry framework-authored text leaks loop-warning
# text into downstream consumers (MemoryMiddleware fact
# extraction, TitleMiddleware, telemetry, model replay) as if
# the model said it. The proper fix is to defer warning
# injection from after_model to wrap_model_call so every prior
# ToolMessage is already in the request — see RFC #2517 (which
# lists "loop intervention does not leave invalid
# tool-call/tool-message state" as acceptance criteria) and
# the prototype on `fix/loop-detection-tool-call-pairing`.
messages = state.get("messages", [])
last_msg = messages[-1]
patched_msg = last_msg.model_copy(update={"content": self._append_text(last_msg.content, warning)})
return {"messages": [patched_msg]}
return None
@@ -1,31 +1,270 @@
"""Middleware for logging LLM token usage."""
"""Middleware for logging token usage and annotating step attribution."""
from __future__ import annotations
import logging
from typing import override
from collections import defaultdict
from typing import Any, override
from langchain.agents import AgentState
from langchain.agents.middleware import AgentMiddleware
from langchain.agents.middleware.todo import Todo
from langchain_core.messages import AIMessage
from langgraph.runtime import Runtime
logger = logging.getLogger(__name__)
TOKEN_USAGE_ATTRIBUTION_KEY = "token_usage_attribution"
def _string_arg(value: Any) -> str | None:
if isinstance(value, str):
normalized = value.strip()
return normalized or None
return None
def _normalize_todos(value: Any) -> list[Todo]:
if not isinstance(value, list):
return []
normalized: list[Todo] = []
for item in value:
if not isinstance(item, dict):
continue
todo: Todo = {}
content = _string_arg(item.get("content"))
status = item.get("status")
if content is not None:
todo["content"] = content
if status in {"pending", "in_progress", "completed"}:
todo["status"] = status
normalized.append(todo)
return normalized
def _todo_action_kind(previous: Todo | None, current: Todo) -> str:
status = current.get("status")
previous_content = previous.get("content") if previous else None
current_content = current.get("content")
if previous is None:
if status == "completed":
return "todo_complete"
if status == "in_progress":
return "todo_start"
return "todo_update"
if previous_content != current_content:
return "todo_update"
if status == "completed":
return "todo_complete"
if status == "in_progress":
return "todo_start"
return "todo_update"
def _build_todo_actions(previous_todos: list[Todo], next_todos: list[Todo]) -> list[dict[str, Any]]:
# This is the single source of truth for precise write_todos token
# attribution. The frontend intentionally falls back to a generic
# "Update to-do list" label when this metadata is missing or malformed.
previous_by_content: dict[str, list[tuple[int, Todo]]] = defaultdict(list)
matched_previous_indices: set[int] = set()
for index, todo in enumerate(previous_todos):
content = todo.get("content")
if isinstance(content, str) and content:
previous_by_content[content].append((index, todo))
actions: list[dict[str, Any]] = []
for index, todo in enumerate(next_todos):
content = todo.get("content")
if not isinstance(content, str) or not content:
continue
previous_match: Todo | None = None
content_matches = previous_by_content.get(content)
if content_matches:
while content_matches and content_matches[0][0] in matched_previous_indices:
content_matches.pop(0)
if content_matches:
previous_index, previous_match = content_matches.pop(0)
matched_previous_indices.add(previous_index)
if previous_match is None and index < len(previous_todos) and index not in matched_previous_indices:
previous_match = previous_todos[index]
matched_previous_indices.add(index)
if previous_match is not None:
previous_content = previous_match.get("content")
previous_status = previous_match.get("status")
if previous_content == content and previous_status == todo.get("status"):
continue
actions.append(
{
"kind": _todo_action_kind(previous_match, todo),
"content": content,
}
)
for index, todo in enumerate(previous_todos):
if index in matched_previous_indices:
continue
content = todo.get("content")
if not isinstance(content, str) or not content:
continue
actions.append(
{
"kind": "todo_remove",
"content": content,
}
)
return actions
def _describe_tool_call(tool_call: dict[str, Any], todos: list[Todo]) -> list[dict[str, Any]]:
name = _string_arg(tool_call.get("name")) or "unknown"
args = tool_call.get("args") if isinstance(tool_call.get("args"), dict) else {}
tool_call_id = _string_arg(tool_call.get("id"))
if name == "write_todos":
next_todos = _normalize_todos(args.get("todos"))
actions = _build_todo_actions(todos, next_todos)
if not actions:
return [
{
"kind": "tool",
"tool_name": name,
"tool_call_id": tool_call_id,
}
]
return [
{
**action,
"tool_call_id": tool_call_id,
}
for action in actions
]
if name == "task":
return [
{
"kind": "subagent",
"description": _string_arg(args.get("description")),
"subagent_type": _string_arg(args.get("subagent_type")),
"tool_call_id": tool_call_id,
}
]
if name in {"web_search", "image_search"}:
query = _string_arg(args.get("query"))
return [
{
"kind": "search",
"tool_name": name,
"query": query,
"tool_call_id": tool_call_id,
}
]
if name == "present_files":
return [
{
"kind": "present_files",
"tool_call_id": tool_call_id,
}
]
if name == "ask_clarification":
return [
{
"kind": "clarification",
"tool_call_id": tool_call_id,
}
]
return [
{
"kind": "tool",
"tool_name": name,
"description": _string_arg(args.get("description")),
"tool_call_id": tool_call_id,
}
]
def _infer_step_kind(message: AIMessage, actions: list[dict[str, Any]]) -> str:
if actions:
first_kind = actions[0].get("kind")
if len(actions) == 1 and first_kind in {"todo_start", "todo_complete", "todo_update", "todo_remove"}:
return "todo_update"
if len(actions) == 1 and first_kind == "subagent":
return "subagent_dispatch"
return "tool_batch"
if message.content:
return "final_answer"
return "thinking"
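The step-kind inference above is small enough to sketch standalone. This version (hypothetical `infer_step_kind`) takes the message text directly instead of an `AIMessage`:

```python
TODO_KINDS = {"todo_start", "todo_complete", "todo_update", "todo_remove"}

def infer_step_kind(content, actions):
    """Mirror of _infer_step_kind, with the message text passed directly."""
    if actions:
        first_kind = actions[0].get("kind")
        # A single todo action collapses to the generic todo_update step kind.
        if len(actions) == 1 and first_kind in TODO_KINDS:
            return "todo_update"
        if len(actions) == 1 and first_kind == "subagent":
            return "subagent_dispatch"
        # Multiple actions (or a single non-todo, non-subagent one): tool batch.
        return "tool_batch"
    if content:
        return "final_answer"
    return "thinking"
```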
def _build_attribution(message: AIMessage, todos: list[Todo]) -> dict[str, Any]:
tool_calls = getattr(message, "tool_calls", None) or []
actions: list[dict[str, Any]] = []
current_todos = list(todos)
for raw_tool_call in tool_calls:
if not isinstance(raw_tool_call, dict):
continue
described_actions = _describe_tool_call(raw_tool_call, current_todos)
actions.extend(described_actions)
if raw_tool_call.get("name") == "write_todos":
args = raw_tool_call.get("args") if isinstance(raw_tool_call.get("args"), dict) else {}
current_todos = _normalize_todos(args.get("todos"))
tool_call_ids: list[str] = []
for tool_call in tool_calls:
if not isinstance(tool_call, dict):
continue
tool_call_id = _string_arg(tool_call.get("id"))
if tool_call_id is not None:
tool_call_ids.append(tool_call_id)
return {
# Schema changes should remain additive where possible so older
# frontends can ignore unknown fields and fall back safely.
"version": 1,
"kind": _infer_step_kind(message, actions),
"shared_attribution": len(actions) > 1,
"tool_call_ids": tool_call_ids,
"actions": actions,
}
class TokenUsageMiddleware(AgentMiddleware):
- """Logs token usage from model response usage_metadata."""
"""Logs token usage from model responses and annotates the AI step."""
- @override
- def after_model(self, state: AgentState, runtime: Runtime) -> dict | None:
- return self._log_usage(state)
- @override
- async def aafter_model(self, state: AgentState, runtime: Runtime) -> dict | None:
- return self._log_usage(state)
- def _log_usage(self, state: AgentState) -> None:
def _apply(self, state: AgentState) -> dict | None:
messages = state.get("messages", [])
if not messages:
return None
last = messages[-1]
if not isinstance(last, AIMessage):
return None
usage = getattr(last, "usage_metadata", None)
if usage:
logger.info(
@@ -34,4 +273,22 @@ class TokenUsageMiddleware(AgentMiddleware):
usage.get("output_tokens", "?"),
usage.get("total_tokens", "?"),
)
- return None
todos = state.get("todos") or []
attribution = _build_attribution(last, todos if isinstance(todos, list) else [])
additional_kwargs = dict(getattr(last, "additional_kwargs", {}) or {})
if additional_kwargs.get(TOKEN_USAGE_ATTRIBUTION_KEY) == attribution:
return None
additional_kwargs[TOKEN_USAGE_ATTRIBUTION_KEY] = attribution
updated_msg = last.model_copy(update={"additional_kwargs": additional_kwargs})
return {"messages": [updated_msg]}
@override
def after_model(self, state: AgentState, runtime: Runtime) -> dict | None:
return self._apply(state)
@override
async def aafter_model(self, state: AgentState, runtime: Runtime) -> dict | None:
return self._apply(state)
@@ -264,25 +264,35 @@ class DeerFlowClient:
return [{"name": tc["name"], "args": tc["args"], "id": tc.get("id")} for tc in tool_calls]
@staticmethod
- def _ai_text_event(msg_id: str | None, text: str, usage: dict | None) -> "StreamEvent":
- """Build a ``messages-tuple`` AI text event, attaching usage when present."""
def _serialize_additional_kwargs(msg) -> dict[str, Any] | None:
"""Copy message additional_kwargs when present."""
additional_kwargs = getattr(msg, "additional_kwargs", None)
if isinstance(additional_kwargs, dict) and additional_kwargs:
return dict(additional_kwargs)
return None
@staticmethod
def _ai_text_event(msg_id: str | None, text: str, usage: dict | None, additional_kwargs: dict[str, Any] | None = None) -> "StreamEvent":
"""Build a ``messages-tuple`` AI text event."""
data: dict[str, Any] = {"type": "ai", "content": text, "id": msg_id}
if usage:
data["usage_metadata"] = usage
if additional_kwargs:
data["additional_kwargs"] = additional_kwargs
return StreamEvent(type="messages-tuple", data=data)
@staticmethod
- def _ai_tool_calls_event(msg_id: str | None, tool_calls) -> "StreamEvent":
def _ai_tool_calls_event(msg_id: str | None, tool_calls, additional_kwargs: dict[str, Any] | None = None) -> "StreamEvent":
"""Build a ``messages-tuple`` AI tool-calls event."""
- return StreamEvent(
- type="messages-tuple",
- data={
- "type": "ai",
- "content": "",
- "id": msg_id,
- "tool_calls": DeerFlowClient._serialize_tool_calls(tool_calls),
- },
- )
data: dict[str, Any] = {
"type": "ai",
"content": "",
"id": msg_id,
"tool_calls": DeerFlowClient._serialize_tool_calls(tool_calls),
}
if additional_kwargs:
data["additional_kwargs"] = additional_kwargs
return StreamEvent(type="messages-tuple", data=data)
@staticmethod
def _tool_message_event(msg: ToolMessage) -> "StreamEvent":
@@ -307,19 +317,30 @@ class DeerFlowClient:
d["tool_calls"] = DeerFlowClient._serialize_tool_calls(msg.tool_calls)
if getattr(msg, "usage_metadata", None):
d["usage_metadata"] = msg.usage_metadata
if additional_kwargs := DeerFlowClient._serialize_additional_kwargs(msg):
d["additional_kwargs"] = additional_kwargs
return d
if isinstance(msg, ToolMessage):
- return {
d = {
"type": "tool",
"content": DeerFlowClient._extract_text(msg.content),
"name": getattr(msg, "name", None),
"tool_call_id": getattr(msg, "tool_call_id", None),
"id": getattr(msg, "id", None),
}
if additional_kwargs := DeerFlowClient._serialize_additional_kwargs(msg):
d["additional_kwargs"] = additional_kwargs
return d
if isinstance(msg, HumanMessage):
- return {"type": "human", "content": msg.content, "id": getattr(msg, "id", None)}
d = {"type": "human", "content": msg.content, "id": getattr(msg, "id", None)}
if additional_kwargs := DeerFlowClient._serialize_additional_kwargs(msg):
d["additional_kwargs"] = additional_kwargs
return d
if isinstance(msg, SystemMessage):
- return {"type": "system", "content": msg.content, "id": getattr(msg, "id", None)}
d = {"type": "system", "content": msg.content, "id": getattr(msg, "id", None)}
if additional_kwargs := DeerFlowClient._serialize_additional_kwargs(msg):
d["additional_kwargs"] = additional_kwargs
return d
return {"type": "unknown", "content": str(msg), "id": getattr(msg, "id", None)}
@staticmethod
@@ -542,6 +563,7 @@ class DeerFlowClient:
- type="messages-tuple" data={"type": "ai", "content": <delta>, "id": str}
- type="messages-tuple" data={"type": "ai", "content": <delta>, "id": str, "usage_metadata": {...}}
- type="messages-tuple" data={"type": "ai", "content": "", "id": str, "tool_calls": [...]}
- type="messages-tuple" data={"type": "ai", "content": "", "id": str, "additional_kwargs": {...}}
- type="messages-tuple" data={"type": "tool", "content": str, "name": str, "tool_call_id": str, "id": str}
- type="end" data={"usage": {"input_tokens": int, "output_tokens": int, "total_tokens": int}}
"""
@@ -564,6 +586,7 @@ class DeerFlowClient:
# in both the final ``messages`` chunk and the values snapshot —
# count it only on whichever arrives first.
counted_usage_ids: set[str] = set()
sent_additional_kwargs_by_id: dict[str, dict[str, Any]] = {}
cumulative_usage: dict[str, int] = {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0}
def _account_usage(msg_id: str | None, usage: Any) -> dict | None:
@@ -593,6 +616,20 @@ class DeerFlowClient:
"total_tokens": total_tokens,
}
def _unsent_additional_kwargs(msg_id: str | None, additional_kwargs: dict[str, Any] | None) -> dict[str, Any] | None:
if not additional_kwargs:
return None
if not msg_id:
return additional_kwargs
sent = sent_additional_kwargs_by_id.setdefault(msg_id, {})
delta = {key: value for key, value in additional_kwargs.items() if sent.get(key) != value}
if not delta:
return None
sent.update(delta)
return delta
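The per-message dedup above can be sketched outside the streaming loop. Assuming the same dict-of-sent-keys bookkeeping, a standalone `unsent_delta` (hypothetical name) behaves like this:

```python
sent_by_id = {}

def unsent_delta(msg_id, additional_kwargs):
    """Return only the keys whose values have not yet been sent for msg_id."""
    if not additional_kwargs:
        return None
    if not msg_id:
        # No id to dedupe on: always forward the full payload.
        return additional_kwargs
    sent = sent_by_id.setdefault(msg_id, {})
    delta = {k: v for k, v in additional_kwargs.items() if sent.get(k) != v}
    if not delta:
        return None
    sent.update(delta)
    return delta
```

Because the delta compares values (not just key presence), re-annotating the same message id with a changed attribution payload re-emits only the changed keys.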
for item in self._agent.stream(
state,
config=config,
@@ -620,17 +657,31 @@ class DeerFlowClient:
if isinstance(msg_chunk, AIMessage):
text = self._extract_text(msg_chunk.content)
additional_kwargs = self._serialize_additional_kwargs(msg_chunk)
counted_usage = _account_usage(msg_id, msg_chunk.usage_metadata)
sent_additional_kwargs = False
if text:
if msg_id:
streamed_ids.add(msg_id)
- yield self._ai_text_event(msg_id, text, counted_usage)
additional_kwargs_delta = _unsent_additional_kwargs(msg_id, additional_kwargs)
yield self._ai_text_event(
msg_id,
text,
counted_usage,
additional_kwargs_delta,
)
sent_additional_kwargs = bool(additional_kwargs_delta)
if msg_chunk.tool_calls:
if msg_id:
streamed_ids.add(msg_id)
- yield self._ai_tool_calls_event(msg_id, msg_chunk.tool_calls)
additional_kwargs_delta = None if sent_additional_kwargs else _unsent_additional_kwargs(msg_id, additional_kwargs)
yield self._ai_tool_calls_event(
msg_id,
msg_chunk.tool_calls,
additional_kwargs_delta,
)
elif isinstance(msg_chunk, ToolMessage):
if msg_id:
@@ -653,17 +704,45 @@ class DeerFlowClient:
if msg_id and msg_id in streamed_ids:
if isinstance(msg, AIMessage):
_account_usage(msg_id, getattr(msg, "usage_metadata", None))
additional_kwargs = self._serialize_additional_kwargs(msg)
additional_kwargs_delta = _unsent_additional_kwargs(msg_id, additional_kwargs)
if additional_kwargs_delta:
# Metadata-only follow-up: ``messages-tuple`` has no
# dedicated attribution event, so clients should
# merge this empty-content AI event by message id
# and ignore it for text rendering.
yield self._ai_text_event(msg_id, "", None, additional_kwargs_delta)
continue
if isinstance(msg, AIMessage):
counted_usage = _account_usage(msg_id, msg.usage_metadata)
additional_kwargs = self._serialize_additional_kwargs(msg)
sent_additional_kwargs = False
if msg.tool_calls:
- yield self._ai_tool_calls_event(msg_id, msg.tool_calls)
additional_kwargs_delta = _unsent_additional_kwargs(msg_id, additional_kwargs)
yield self._ai_tool_calls_event(
msg_id,
msg.tool_calls,
additional_kwargs_delta,
)
sent_additional_kwargs = bool(additional_kwargs_delta)
text = self._extract_text(msg.content)
if text:
- yield self._ai_text_event(msg_id, text, counted_usage)
additional_kwargs_delta = None if sent_additional_kwargs else _unsent_additional_kwargs(msg_id, additional_kwargs)
yield self._ai_text_event(
msg_id,
text,
counted_usage,
additional_kwargs_delta,
)
elif msg_id:
additional_kwargs_delta = None if sent_additional_kwargs else _unsent_additional_kwargs(msg_id, additional_kwargs)
if not additional_kwargs_delta:
continue
# See the metadata-only follow-up convention above.
yield self._ai_text_event(msg_id, "", None, additional_kwargs_delta)
elif isinstance(msg, ToolMessage):
yield self._tool_message_event(msg)
@@ -84,8 +84,52 @@ class RemoteSandboxBackend(SandboxBackend):
"""
return self._provisioner_discover(sandbox_id)
def list_running(self) -> list[SandboxInfo]:
"""Return all sandboxes currently managed by the provisioner.
Calls ``GET /api/sandboxes`` so that ``AioSandboxProvider._reconcile_orphans()``
can adopt pods that were created by a previous process and were never
explicitly destroyed.
Without this, a process restart silently orphans all existing k8s Pods —
they stay running forever because the idle checker only
tracks in-process state.
"""
return self._provisioner_list()
# ── Provisioner API calls ─────────────────────────────────────────────
def _provisioner_list(self) -> list[SandboxInfo]:
"""GET /api/sandboxes → list all running sandboxes."""
try:
resp = requests.get(f"{self._provisioner_url}/api/sandboxes", timeout=10)
resp.raise_for_status()
data = resp.json()
if not isinstance(data, dict):
logger.warning("Provisioner list_running returned non-dict payload: %r", type(data))
return []
sandboxes = data.get("sandboxes", [])
if not isinstance(sandboxes, list):
logger.warning("Provisioner list_running returned non-list sandboxes: %r", type(sandboxes))
return []
infos: list[SandboxInfo] = []
for sandbox in sandboxes:
if not isinstance(sandbox, dict):
logger.warning("Provisioner list_running entry is not a dict: %r", type(sandbox))
continue
sandbox_id = sandbox.get("sandbox_id")
sandbox_url = sandbox.get("sandbox_url")
if isinstance(sandbox_id, str) and sandbox_id and isinstance(sandbox_url, str) and sandbox_url:
infos.append(SandboxInfo(sandbox_id=sandbox_id, sandbox_url=sandbox_url))
logger.info("Provisioner list_running: %d sandbox(es) found", len(infos))
return infos
except requests.RequestException as exc:
logger.warning("Provisioner list_running failed: %s", exc)
return []
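The defensive decoding in `_provisioner_list` boils down to: reject non-dict payloads, reject a non-list `sandboxes` field, and skip entries missing a non-empty string id or url. A minimal standalone sketch, with tuples in place of `SandboxInfo`:

```python
def parse_sandboxes(data):
    """Extract (sandbox_id, sandbox_url) pairs, skipping malformed entries."""
    if not isinstance(data, dict):
        return []
    sandboxes = data.get("sandboxes", [])
    if not isinstance(sandboxes, list):
        return []
    infos = []
    for sandbox in sandboxes:
        if not isinstance(sandbox, dict):
            # Non-dict entries are logged and skipped in the real code.
            continue
        sid = sandbox.get("sandbox_id")
        url = sandbox.get("sandbox_url")
        if isinstance(sid, str) and sid and isinstance(url, str) and url:
            infos.append((sid, url))
    return infos
```

Failing open with `[]` (rather than raising) matters here: `list_running` feeds orphan reconciliation at startup, and a malformed provisioner response should degrade to "adopt nothing", not crash the process.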
def _provisioner_create(self, thread_id: str, sandbox_id: str, extra_mounts: list[tuple[str, str, bool]] | None = None) -> SandboxInfo:
"""POST /api/sandboxes → create Pod + Service."""
try:
@@ -1,13 +1,22 @@
- """Configuration and loaders for custom agents."""
"""Configuration and loaders for custom agents.
Custom agents are stored per-user under ``{base_dir}/users/{user_id}/agents/{name}/``.
A legacy shared layout at ``{base_dir}/agents/{name}/`` is still readable so that
installations that pre-date user isolation continue to work until they run the
``scripts/migrate_user_isolation.py`` migration. New writes always target the
per-user layout.
"""
import logging
import re
from pathlib import Path
from typing import Any
import yaml
from pydantic import BaseModel
from deerflow.config.paths import get_paths
from deerflow.runtime.user_context import get_effective_user_id
logger = logging.getLogger(__name__)
@@ -40,14 +49,47 @@ class AgentConfig(BaseModel):
skills: list[str] | None = None
- def load_agent_config(name: str | None) -> AgentConfig | None:
def resolve_agent_dir(name: str, *, user_id: str | None = None) -> Path:
"""Return the on-disk directory for an agent, preferring the per-user layout.
Resolution order:
1. ``{base_dir}/users/{user_id}/agents/{name}/`` (per-user, current layout).
2. ``{base_dir}/agents/{name}/`` (legacy shared layout — read-only fallback).
If neither exists, the per-user path is returned so callers that intend to
create the agent write into the new layout.
Args:
name: Validated agent name.
user_id: Owner of the agent. Defaults to the effective user from the
request context (or ``"default"`` in no-auth mode).
"""
paths = get_paths()
effective_user = user_id or get_effective_user_id()
user_path = paths.user_agent_dir(effective_user, name)
if user_path.exists():
return user_path
legacy_path = paths.agent_dir(name)
if legacy_path.exists():
return legacy_path
return user_path
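The resolution order can be demonstrated with a throwaway directory tree. This sketch (hypothetical `resolve_dir`, without the name normalization the real paths helpers apply) shows the per-user layout shadowing the legacy one:

```python
import tempfile
from pathlib import Path

def resolve_dir(base, user_id, name):
    """Prefer users/{user_id}/agents/{name}; fall back to legacy agents/{name}."""
    user_path = base / "users" / user_id / "agents" / name
    if user_path.exists():
        return user_path
    legacy_path = base / "agents" / name
    if legacy_path.exists():
        return legacy_path
    # Neither exists: return the per-user path so new writes use the new layout.
    return user_path

base = Path(tempfile.mkdtemp())
legacy_only = base / "agents" / "helper"       # pre-migration agent
legacy_only.mkdir(parents=True)
per_user = base / "users" / "alice" / "agents" / "writer"  # migrated agent
per_user.mkdir(parents=True)
```

The "return the per-user path when nothing exists" branch is what keeps creation flows on the new layout: a caller that resolves, then `mkdir`s, never writes into `{base_dir}/agents/` again.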
def load_agent_config(name: str | None, *, user_id: str | None = None) -> AgentConfig | None:
"""Load the custom or default agent's config from its directory.
Reads from the per-user layout first; falls back to the legacy shared layout
for installations that have not yet been migrated.
Args:
name: The agent name.
user_id: Owner of the agent. Defaults to the effective user from the
current request context.
Returns:
- AgentConfig instance.
AgentConfig instance, or ``None`` if ``name`` is ``None``.
Raises:
FileNotFoundError: If the agent directory or config.yaml does not exist.
@@ -58,7 +100,7 @@ def load_agent_config(name: str | None) -> AgentConfig | None:
return None
name = validate_agent_name(name)
- agent_dir = get_paths().agent_dir(name)
agent_dir = resolve_agent_dir(name, user_id=user_id)
config_file = agent_dir / "config.yaml"
if not agent_dir.exists():
@@ -84,7 +126,7 @@ def load_agent_config(name: str | None) -> AgentConfig | None:
return AgentConfig(**data)
- def load_agent_soul(agent_name: str | None) -> str | None:
def load_agent_soul(agent_name: str | None, *, user_id: str | None = None) -> str | None:
"""Read the SOUL.md file for a custom agent, if it exists.
SOUL.md defines the agent's personality, values, and behavioral guardrails.
@@ -92,11 +134,16 @@ def load_agent_soul(agent_name: str | None) -> str | None:
Args:
agent_name: The name of the agent or None for the default agent.
user_id: Owner of the agent. Defaults to the effective user from the
current request context.
Returns:
The SOUL.md content as a string, or None if the file does not exist.
"""
- agent_dir = get_paths().agent_dir(agent_name) if agent_name else get_paths().base_dir
if agent_name:
agent_dir = resolve_agent_dir(agent_name, user_id=user_id)
else:
agent_dir = get_paths().base_dir
soul_path = agent_dir / SOUL_FILENAME
if not soul_path.exists():
return None
@@ -104,32 +151,50 @@ def load_agent_soul(agent_name: str | None) -> str | None:
return content or None
- def list_custom_agents() -> list[AgentConfig]:
def list_custom_agents(*, user_id: str | None = None) -> list[AgentConfig]:
"""Scan the agents directory and return all valid custom agents.
Returns the union of agents in the per-user layout and the legacy shared
layout, so that pre-migration installations remain visible until they are
migrated. Per-user entries shadow legacy entries with the same name.
Args:
user_id: Owner whose agents to list. Defaults to the effective user
from the current request context.
Returns:
List of AgentConfig for each valid agent directory found.
"""
- agents_dir = get_paths().agents_dir
- if not agents_dir.exists():
- return []
paths = get_paths()
effective_user = user_id or get_effective_user_id()
seen: set[str] = set()
agents: list[AgentConfig] = []
- for entry in sorted(agents_dir.iterdir()):
- if not entry.is_dir():
user_root = paths.user_agents_dir(effective_user)
legacy_root = paths.agents_dir
for root in (user_root, legacy_root):
if not root.exists():
continue
for entry in sorted(root.iterdir()):
if not entry.is_dir():
continue
if entry.name in seen:
continue
- config_file = entry / "config.yaml"
- if not config_file.exists():
- logger.debug(f"Skipping {entry.name}: no config.yaml")
- continue
config_file = entry / "config.yaml"
if not config_file.exists():
logger.debug(f"Skipping {entry.name}: no config.yaml")
continue
- try:
- agent_cfg = load_agent_config(entry.name)
- agents.append(agent_cfg)
- except Exception as e:
- logger.warning(f"Skipping agent '{entry.name}': {e}")
try:
agent_cfg = load_agent_config(entry.name, user_id=effective_user)
if agent_cfg is None:
continue
agents.append(agent_cfg)
seen.add(entry.name)
except Exception as e:
logger.warning(f"Skipping agent '{entry.name}': {e}")
agents.sort(key=lambda a: a.name)
return agents
@@ -132,15 +132,20 @@ class Paths:
@property
def agents_dir(self) -> Path:
- """Root directory for all custom agents: `{base_dir}/agents/`."""
"""Legacy root for shared (pre user-isolation) custom agents: `{base_dir}/agents/`.
New code should use :meth:`user_agents_dir` instead. This property remains
only as a read-side fallback for installations that have not yet run the
``migrate_user_isolation.py`` script.
"""
return self.base_dir / "agents"
def agent_dir(self, name: str) -> Path:
- """Directory for a specific agent: `{base_dir}/agents/{name}/`."""
"""Legacy per-agent directory (no user isolation): `{base_dir}/agents/{name}/`."""
return self.agents_dir / name.lower()
def agent_memory_file(self, name: str) -> Path:
- """Per-agent memory file: `{base_dir}/agents/{name}/memory.json`."""
"""Legacy per-agent memory file: `{base_dir}/agents/{name}/memory.json`."""
return self.agent_dir(name) / "memory.json"
def user_dir(self, user_id: str) -> Path:
@@ -151,9 +156,17 @@ class Paths:
"""Per-user memory file: `{base_dir}/users/{user_id}/memory.json`."""
return self.user_dir(user_id) / "memory.json"
def user_agents_dir(self, user_id: str) -> Path:
"""Per-user root for that user's custom agents: `{base_dir}/users/{user_id}/agents/`."""
return self.user_dir(user_id) / "agents"
def user_agent_dir(self, user_id: str, agent_name: str) -> Path:
"""Per-user per-agent directory: `{base_dir}/users/{user_id}/agents/{name}/`."""
return self.user_agents_dir(user_id) / agent_name.lower()
def user_agent_memory_file(self, user_id: str, agent_name: str) -> Path:
"""Per-user per-agent memory: `{base_dir}/users/{user_id}/agents/{name}/memory.json`."""
- return self.user_dir(user_id) / "agents" / agent_name.lower() / "memory.json"
return self.user_agent_dir(user_id, agent_name) / "memory.json"
def thread_dir(self, thread_id: str, *, user_id: str | None = None) -> Path:
"""
@@ -6,6 +6,13 @@ from pydantic import BaseModel, Field
from deerflow.config.runtime_paths import project_root, resolve_path
def _legacy_skills_candidates() -> tuple[Path, ...]:
"""Return source-tree skills locations for monorepo compatibility."""
backend_dir = Path(__file__).resolve().parents[4]
repo_root = backend_dir.parent
return (repo_root / "skills",)
class SkillsConfig(BaseModel):
"""Configuration for skills system"""
@@ -15,7 +22,7 @@ class SkillsConfig(BaseModel):
)
path: str | None = Field(
default=None,
- description="Path to skills directory. If not specified, defaults to skills under the caller project root.",
description=("Path to skills directory. If not specified, defaults to `skills` under the caller project root, falling back to the legacy repo-root location for monorepo compatibility."),
)
container_path: str = Field(
default="/mnt/skills",
@@ -26,15 +33,30 @@ class SkillsConfig(BaseModel):
"""
Get the resolved skills directory path.
- Returns:
- Path to the skills directory
Resolution order:
1. Explicit ``path`` field
2. ``DEER_FLOW_SKILLS_PATH`` environment variable
3. ``skills`` under the caller project root (``project_root()``)
4. Legacy repo-root candidates for monorepo compatibility (``_legacy_skills_candidates``)
When neither (3) nor (4) exists on disk, the project-root default is returned so callers
can still surface a stable "no skills" location without raising.
"""
if self.path:
# Use configured path (can be absolute or relative to project root)
return resolve_path(self.path)
if env_path := os.getenv("DEER_FLOW_SKILLS_PATH"):
return resolve_path(env_path)
- return project_root() / "skills"
project_default = project_root() / "skills"
if project_default.is_dir():
return project_default
for candidate in _legacy_skills_candidates():
if candidate.is_dir():
return candidate
return project_default
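The four-step resolution above can be condensed into a standalone sketch (hypothetical `resolve_skills_dir`, with the env lookup passed in as a parameter rather than read from `os.environ`):

```python
import tempfile
from pathlib import Path

def resolve_skills_dir(explicit, env_value, project_root, legacy_candidates):
    """Resolution order: explicit path, env var, project-root skills/, legacy candidates."""
    if explicit:
        return Path(explicit)
    if env_value:
        return Path(env_value)
    project_default = project_root / "skills"
    if project_default.is_dir():
        return project_default
    for candidate in legacy_candidates:
        if candidate.is_dir():
            return candidate
    # Nothing exists on disk: still return the stable default instead of raising.
    return project_default

root = Path(tempfile.mkdtemp())
legacy = root / "repo" / "skills"   # monorepo-root skills, pre-split layout
legacy.mkdir(parents=True)
proj = root / "backend"             # caller project root, no skills/ yet
proj.mkdir()
```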
def get_skill_container_path(self, skill_name: str, category: str = "public") -> str:
"""
@@ -2,10 +2,12 @@ from .clarification_tool import ask_clarification_tool
from .present_file_tool import present_file_tool
from .setup_agent_tool import setup_agent
from .task_tool import task_tool
from .update_agent_tool import update_agent
from .view_image_tool import view_image_tool
__all__ = [
"setup_agent",
"update_agent",
"present_file_tool",
"ask_clarification_tool",
"view_image_tool",
@@ -8,6 +8,7 @@ from langgraph.types import Command
from deerflow.config.agents_config import validate_agent_name
from deerflow.config.paths import get_paths
from deerflow.runtime.user_context import get_effective_user_id
logger = logging.getLogger(__name__)
@@ -34,7 +35,14 @@ def setup_agent(
try:
agent_name = validate_agent_name(agent_name)
paths = get_paths()
- agent_dir = paths.agent_dir(agent_name) if agent_name else paths.base_dir
if agent_name:
# Custom agents are persisted under the current user's bucket so
# different users do not see each other's agents.
user_id = get_effective_user_id()
agent_dir = paths.user_agent_dir(user_id, agent_name)
else:
# Default agent (no agent_name): SOUL.md lives at the global base dir.
agent_dir = paths.base_dir
is_new_dir = not agent_dir.exists()
agent_dir.mkdir(parents=True, exist_ok=True)
@@ -0,0 +1,241 @@
"""update_agent tool — let a custom agent persist updates to its own SOUL.md / config.
Bound to the lead agent only when ``runtime.context['agent_name']`` is set
(i.e. inside an existing custom agent's chat). The default agent does not see
this tool, and the bootstrap flow continues to use ``setup_agent`` for the
initial creation handshake.
The tool writes back to ``{base_dir}/users/{user_id}/agents/{agent_name}/{config.yaml,SOUL.md}``
so an agent created by one user is never visible to (or mutable by) another.
Writes are staged into temp files first; both files are renamed into place only
after both temp files are successfully written, so a partial failure cannot leave
config.yaml updated while SOUL.md still holds stale content.
"""
from __future__ import annotations
import logging
import tempfile
from pathlib import Path
from typing import Any
import yaml
from langchain_core.messages import ToolMessage
from langchain_core.tools import tool
from langgraph.prebuilt import ToolRuntime
from langgraph.types import Command
from deerflow.config.agents_config import load_agent_config, validate_agent_name
from deerflow.config.app_config import get_app_config
from deerflow.config.paths import get_paths
from deerflow.runtime.user_context import get_effective_user_id
logger = logging.getLogger(__name__)
def _stage_temp(path: Path, text: str) -> Path:
"""Write ``text`` into a sibling temp file and return its path.
The caller is responsible for ``Path.replace``-ing the temp into the target
once every staged file is ready, or for unlinking it on failure.
"""
path.parent.mkdir(parents=True, exist_ok=True)
fd = tempfile.NamedTemporaryFile(
mode="w",
dir=path.parent,
suffix=".tmp",
delete=False,
encoding="utf-8",
)
try:
fd.write(text)
fd.flush()
fd.close()
return Path(fd.name)
except BaseException:
fd.close()
Path(fd.name).unlink(missing_ok=True)
raise
def _cleanup_temps(temps: list[Path]) -> None:
"""Best-effort removal of staged temp files."""
for tmp in temps:
try:
tmp.unlink(missing_ok=True)
except OSError:
logger.debug("Failed to clean up temp file %s", tmp, exc_info=True)
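The stage-then-commit pattern these helpers implement can be sketched independently. Assuming the same temp-sibling plus `os.replace` approach, a hypothetical `write_all_or_stage_nothing` helper:

```python
import os
import tempfile
from pathlib import Path

def write_all_or_stage_nothing(files):
    """Stage every (path, text) into a temp sibling, then rename all into place.

    A failure while staging leaves every target untouched; only the commit
    phase (the os.replace loop) can partially apply, and each replace is
    atomic per file.
    """
    staged = []
    try:
        for target, text in files:
            target.parent.mkdir(parents=True, exist_ok=True)
            fd = tempfile.NamedTemporaryFile(
                mode="w", dir=target.parent, suffix=".tmp",
                delete=False, encoding="utf-8",
            )
            with fd:
                fd.write(text)
            staged.append((Path(fd.name), target))
    except BaseException:
        # Staging failed: remove every temp file; no target was touched.
        for tmp, _ in staged:
            tmp.unlink(missing_ok=True)
        raise
    for tmp, target in staged:
        os.replace(tmp, target)

root = Path(tempfile.mkdtemp())
write_all_or_stage_nothing([
    (root / "config.yaml", "name: demo\n"),
    (root / "SOUL.md", "# Soul\n"),
])
```

The temp files live in the target's own directory (`dir=target.parent`) so the final rename never crosses a filesystem boundary, which is what keeps `os.replace` atomic.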
@tool
def update_agent(
runtime: ToolRuntime,
soul: str | None = None,
description: str | None = None,
skills: list[str] | None = None,
tool_groups: list[str] | None = None,
model: str | None = None,
) -> Command:
"""Persist updates to the current custom agent's SOUL.md and config.yaml.
Use this when the user asks to refine the agent's identity, description,
skill whitelist, tool-group whitelist, or default model. Only the fields
you explicitly pass are updated; omitted fields keep their existing values.
Pass ``soul`` as the FULL replacement SOUL.md content — there are no patch
semantics, so always start from the current SOUL and apply your edits.
Pass ``skills=[]`` to disable all skills for this agent. Omit ``skills``
entirely to keep the existing whitelist.
Args:
soul: Optional full replacement SOUL.md content.
description: Optional new one-line description.
skills: Optional skill whitelist. ``[]`` = no skills, omit = unchanged.
tool_groups: Optional tool-group whitelist. ``[]`` = empty, omit = unchanged.
model: Optional model override (must match a configured model name).
Returns:
Command with a ToolMessage describing the result. Changes take effect
on the next user turn (when the lead agent is rebuilt with the fresh
SOUL.md and config.yaml).
"""
tool_call_id = runtime.tool_call_id
agent_name_raw: str | None = runtime.context.get("agent_name") if runtime.context else None
def _err(message: str) -> Command:
return Command(update={"messages": [ToolMessage(content=f"Error: {message}", tool_call_id=tool_call_id)]})
if soul is None and description is None and skills is None and tool_groups is None and model is None:
return _err("No fields provided. Pass at least one of: soul, description, skills, tool_groups, model.")
try:
agent_name = validate_agent_name(agent_name_raw)
except ValueError as e:
return _err(str(e))
if not agent_name:
return _err("update_agent is only available inside a custom agent's chat. There is no agent_name in the current runtime context, so there is nothing to update. If you are inside the bootstrap flow, use setup_agent instead.")
# Resolve the active user so that updates only affect this user's agent.
# ``get_effective_user_id`` returns DEFAULT_USER_ID when no auth context
# is set (matching how memory and thread storage behave).
user_id = get_effective_user_id()
# Reject an unknown ``model`` *before* touching the filesystem. Otherwise
# ``_resolve_model_name`` silently falls back to the default at runtime
# and the user sees confusing repeated warnings on every later turn.
if model is not None and get_app_config().get_model_config(model) is None:
return _err(f"Unknown model '{model}'. Pass a model name that exists in config.yaml's models section.")
paths = get_paths()
agent_dir = paths.user_agent_dir(user_id, agent_name)
if not agent_dir.exists() and paths.agent_dir(agent_name).exists():
return _err(f"Agent '{agent_name}' only exists in the legacy shared layout and is not scoped to a user. Run scripts/migrate_user_isolation.py to move legacy agents into the per-user layout before updating.")
try:
existing_cfg = load_agent_config(agent_name, user_id=user_id)
except FileNotFoundError:
return _err(f"Agent '{agent_name}' does not exist for the current user. Use setup_agent to create a new agent first.")
except ValueError as e:
return _err(f"Agent '{agent_name}' has an unreadable config: {e}")
if existing_cfg is None:
return _err(f"Agent '{agent_name}' could not be loaded.")
updated_fields: list[str] = []
# Force the on-disk ``name`` to match the directory we are writing into,
# even if ``existing_cfg.name`` had drifted (e.g. from manual yaml edits).
config_data: dict[str, Any] = {"name": agent_name}
new_description = description if description is not None else existing_cfg.description
config_data["description"] = new_description
if description is not None and description != existing_cfg.description:
updated_fields.append("description")
new_model = model if model is not None else existing_cfg.model
if new_model is not None:
config_data["model"] = new_model
if model is not None and model != existing_cfg.model:
updated_fields.append("model")
new_tool_groups = tool_groups if tool_groups is not None else existing_cfg.tool_groups
if new_tool_groups is not None:
config_data["tool_groups"] = new_tool_groups
if tool_groups is not None and tool_groups != existing_cfg.tool_groups:
updated_fields.append("tool_groups")
new_skills = skills if skills is not None else existing_cfg.skills
if new_skills is not None:
config_data["skills"] = new_skills
if skills is not None and skills != existing_cfg.skills:
updated_fields.append("skills")
config_changed = bool({"description", "model", "tool_groups", "skills"} & set(updated_fields))
# Stage every file we intend to rewrite into a temp sibling. Only after
# *all* temp files exist do we rename them into place — so a failure on
# SOUL.md cannot leave config.yaml already replaced.
pending: list[tuple[Path, Path]] = []
staged_temps: list[Path] = []
try:
agent_dir.mkdir(parents=True, exist_ok=True)
if config_changed:
yaml_text = yaml.dump(config_data, default_flow_style=False, allow_unicode=True, sort_keys=False)
config_target = agent_dir / "config.yaml"
config_tmp = _stage_temp(config_target, yaml_text)
staged_temps.append(config_tmp)
pending.append((config_tmp, config_target))
if soul is not None:
soul_target = agent_dir / "SOUL.md"
soul_tmp = _stage_temp(soul_target, soul)
staged_temps.append(soul_tmp)
pending.append((soul_tmp, soul_target))
updated_fields.append("soul")
# Commit phase. ``Path.replace`` is atomic per file on POSIX/NTFS and
# the staging step above means any earlier failure has already been
# reported. The remaining failure mode is a crash *between* two
# ``replace`` calls, which is reported via the partial-write error
# branch below so the caller knows which files are now on disk.
committed: list[Path] = []
try:
for tmp, target in pending:
tmp.replace(target)
committed.append(target)
except Exception as e:
_cleanup_temps([t for t, _ in pending if t not in committed])
if committed:
logger.error(
"[update_agent] Partial write for agent '%s' (user=%s): committed=%s, failed during rename: %s",
agent_name,
user_id,
[p.name for p in committed],
e,
exc_info=True,
)
return _err(f"Partial update for agent '{agent_name}': {[p.name for p in committed]} were updated, but the rest failed ({e}). Re-run update_agent to retry the remaining fields.")
raise
except Exception as e:
_cleanup_temps(staged_temps)
logger.error("[update_agent] Failed to update agent '%s' (user=%s): %s", agent_name, user_id, e, exc_info=True)
return _err(f"Failed to update agent '{agent_name}': {e}")
if not updated_fields:
return Command(update={"messages": [ToolMessage(content=f"No changes applied to agent '{agent_name}'. The provided values matched the existing config.", tool_call_id=tool_call_id)]})
logger.info("[update_agent] Updated agent '%s' (user=%s) fields: %s", agent_name, user_id, updated_fields)
return Command(
update={
"messages": [
ToolMessage(
content=(f"Agent '{agent_name}' updated successfully. Changed: {', '.join(updated_fields)}. The new configuration takes effect on the next user turn."),
tool_call_id=tool_call_id,
)
]
}
)
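The staged-write pattern used above (write every temp sibling first, only then rename into place) can be sketched on its own. `stage_temp` and `commit_all` are illustrative names for this sketch, not the private helpers this module actually defines:

```python
import os
import tempfile
from pathlib import Path


def stage_temp(target: Path, text: str) -> Path:
    # Write into a temp sibling in the same directory so the later
    # Path.replace is a same-filesystem rename, which is atomic per file.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, prefix=target.name + ".")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(text)
    return Path(tmp_name)


def commit_all(pending: list[tuple[Path, Path]]) -> None:
    # Rename only after every temp file was staged successfully, so a
    # failure while writing one file leaves all targets untouched.
    for tmp, target in pending:
        tmp.replace(target)
```

With this split, a write error during staging never touches `config.yaml` or `SOUL.md`; the only remaining window is a crash between two `replace` calls, which the tool reports as a partial update.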
@@ -1,7 +1,7 @@
"""One-time migration: move legacy thread dirs and memory into per-user layout.
Usage:
PYTHONPATH=. python scripts/migrate_user_isolation.py [--dry-run]
PYTHONPATH=. python scripts/migrate_user_isolation.py [--dry-run] [--user-id USER_ID]
The script is idempotent — re-running it after a successful migration is a no-op.
"""
@@ -69,6 +69,67 @@ def migrate_thread_dirs(
return report
def migrate_agents(
paths: Paths,
user_id: str = "default",
*,
dry_run: bool = False,
) -> list[dict]:
"""Move legacy custom-agent directories into per-user layout.
Legacy layout: ``{base_dir}/agents/{name}/``
Per-user layout: ``{base_dir}/users/{user_id}/agents/{name}/``
Pre-existing per-user agents take precedence: if a destination already
exists for an agent name, the legacy copy is moved to
``{base_dir}/migration-conflicts/agents/{name}/`` for manual review.
Args:
paths: Paths instance.
user_id: Target user to receive the legacy agents (defaults to
``"default"``, matching ``DEFAULT_USER_ID`` for no-auth setups).
dry_run: If True, only log what would happen.
Returns:
List of migration report entries, one per legacy agent directory found.
"""
report: list[dict] = []
legacy_agents = paths.agents_dir
if not legacy_agents.exists():
logger.info("No legacy agents directory found — nothing to migrate.")
return report
for agent_dir in sorted(legacy_agents.iterdir()):
if not agent_dir.is_dir():
continue
agent_name = agent_dir.name
dest = paths.user_agent_dir(user_id, agent_name)
entry = {"agent": agent_name, "user_id": user_id, "action": ""}
if dest.exists():
conflicts_dir = paths.base_dir / "migration-conflicts" / "agents" / agent_name
entry["action"] = f"conflict -> {conflicts_dir}"
if not dry_run:
conflicts_dir.parent.mkdir(parents=True, exist_ok=True)
shutil.move(str(agent_dir), str(conflicts_dir))
logger.warning("Conflict for agent %s: moved legacy copy to %s", agent_name, conflicts_dir)
else:
entry["action"] = f"moved -> {dest}"
if not dry_run:
dest.parent.mkdir(parents=True, exist_ok=True)
shutil.move(str(agent_dir), str(dest))
logger.info("Migrated agent %s -> user %s", agent_name, user_id)
report.append(entry)
# Clean up empty legacy agents dir
if not dry_run and legacy_agents.exists() and not any(legacy_agents.iterdir()):
legacy_agents.rmdir()
return report
def migrate_memory(
paths: Paths,
user_id: str = "default",
@@ -127,6 +188,12 @@ def _build_owner_map_from_db(paths: Paths) -> dict[str, str]:
def main() -> None:
parser = argparse.ArgumentParser(description="Migrate DeerFlow data to per-user layout")
parser.add_argument("--dry-run", action="store_true", help="Log actions without making changes")
parser.add_argument(
"--user-id",
default="default",
metavar="USER_ID",
help=("User ID to claim un-owned legacy data (global memory.json and legacy custom agents). Defaults to 'default'. In multi-user installs, set this to the operator account that should inherit those legacy artifacts."),
)
args = parser.parse_args()
logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")
@@ -134,26 +201,42 @@ def main() -> None:
paths = get_paths()
logger.info("Base directory: %s", paths.base_dir)
logger.info("Dry run: %s", args.dry_run)
logger.info("Claiming un-owned legacy data for user_id=%s", args.user_id)
owner_map = _build_owner_map_from_db(paths)
logger.info("Found %d thread ownership records in DB", len(owner_map))
report = migrate_thread_dirs(paths, owner_map, dry_run=args.dry_run)
migrate_memory(paths, user_id="default", dry_run=args.dry_run)
migrate_memory(paths, user_id=args.user_id, dry_run=args.dry_run)
agent_report = migrate_agents(paths, user_id=args.user_id, dry_run=args.dry_run)
if report:
logger.info("Migration report:")
logger.info("Thread migration report:")
for entry in report:
logger.info(" thread=%s user=%s action=%s", entry["thread_id"], entry["user_id"], entry["action"])
else:
logger.info("No threads to migrate.")
if agent_report:
logger.info("Agent migration report:")
for entry in agent_report:
logger.info(" agent=%s user=%s action=%s", entry["agent"], entry["user_id"], entry["action"])
else:
logger.info("No agents to migrate.")
unowned = [e for e in report if e["user_id"] == "default"]
if unowned:
logger.warning("%d thread(s) had no owner and were assigned to 'default':", len(unowned))
for e in unowned:
logger.warning(" %s", e["thread_id"])
if agent_report:
logger.warning(
"%d legacy agent(s) were assigned to '%s'. If those agents belonged to other users, move them manually under {base_dir}/users/<user_id>/agents/.",
len(agent_report),
args.user_id,
)
if __name__ == "__main__":
main()
@@ -372,6 +372,37 @@ class TestExtractResponseText:
# Should return "" (no text in current turn), NOT "Hi there!" from previous turn
assert _extract_response_text(result) == ""
def test_does_not_publish_loop_warning_on_tool_calling_ai_message(self):
"""Loop-detection warning text on a tool-calling AI message is middleware-authored."""
from app.channels.manager import _extract_response_text
result = {
"messages": [
{"type": "human", "content": "search the repo"},
{
"type": "ai",
"content": "[LOOP DETECTED] You are repeating the same tool calls.",
"tool_calls": [{"name": "grep", "args": {"pattern": "TODO"}, "id": "call_1"}],
},
]
}
assert _extract_response_text(result) == ""
def test_preserves_visible_text_when_stripping_loop_warning(self):
from app.channels.manager import _extract_response_text
result = {
"messages": [
{"type": "human", "content": "prepare the report"},
{
"type": "ai",
"content": "Here is the report.\n\n[LOOP DETECTED] You are repeating the same tool calls.",
"tool_calls": [{"name": "present_files", "args": {"filepaths": ["/mnt/user-data/outputs/report.md"]}, "id": "call_1"}],
},
]
}
assert _extract_response_text(result) == "Here is the report."
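The stripping behaviour these two tests pin down amounts to: publish only the text before the middleware-authored marker. A minimal sketch, assuming `strip_loop_warning` and `LOOP_MARKER` are illustrative names rather than the helpers `_extract_response_text` actually uses internally:

```python
LOOP_MARKER = "[LOOP DETECTED]"


def strip_loop_warning(text: str) -> str:
    # The warning is middleware-authored, so only the visible text
    # before the marker should ever be published back to the user.
    idx = text.find(LOOP_MARKER)
    return text if idx < 0 else text[:idx].rstrip()
```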
# ---------------------------------------------------------------------------
# ChannelManager tests
@@ -530,6 +561,8 @@ class TestChannelManager:
assert call_args[0][0] == "test-thread-123" # thread_id
assert call_args[0][1] == "lead_agent" # assistant_id
assert call_args[1]["input"]["messages"][0]["content"] == "hi"
assert call_args[1]["config"]["configurable"]["checkpoint_ns"] == ""
assert call_args[1]["config"]["configurable"]["thread_id"] == "test-thread-123"
assert len(outbound_received) == 1
assert outbound_received[0].text == "Hello from agent!"
@@ -661,12 +694,135 @@ class TestChannelManager:
call_args = mock_client.runs.wait.call_args
assert call_args[0][1] == "lead_agent"
assert call_args[1]["config"]["recursion_limit"] == 55
assert call_args[1]["config"]["configurable"]["checkpoint_ns"] == ""
assert call_args[1]["config"]["configurable"]["thread_id"] == "test-thread-123"
assert call_args[1]["context"]["thinking_enabled"] is False
assert call_args[1]["context"]["subagent_enabled"] is True
assert call_args[1]["context"]["agent_name"] == "mobile-agent"
_run(go())
def test_clarification_follow_up_preserves_history(self):
"""Conversation should continue after ask_clarification instead of resetting history."""
from app.channels.manager import ChannelManager
async def go():
bus = MessageBus()
store = ChannelStore(path=Path(tempfile.mkdtemp()) / "store.json")
manager = ChannelManager(bus=bus, store=store)
outbound_received = []
async def capture_outbound(msg):
outbound_received.append(msg)
bus.subscribe_outbound(capture_outbound)
history_by_checkpoint: dict[tuple[str, str], list[str]] = {}
async def _runs_wait(thread_id, assistant_id, *, input, config, context):
del assistant_id, context # unused in this test, kept for signature parity
checkpoint_ns = config.get("configurable", {}).get("checkpoint_ns")
key = (thread_id, str(checkpoint_ns))
history = history_by_checkpoint.setdefault(key, [])
human_text = input["messages"][0]["content"]
history.append(human_text)
if len(history) == 1:
return {
"messages": [
{"type": "human", "content": history[0]},
{
"type": "ai",
"content": "",
"tool_calls": [
{
"name": "ask_clarification",
"args": {"question": "Which environment should I use?"},
}
],
},
{
"type": "tool",
"name": "ask_clarification",
"content": "Which environment should I use?",
},
]
}
if len(history) == 2 and history[0] == "Deploy my app" and history[1] == "prod":
return {
"messages": [
{"type": "human", "content": history[0]},
{
"type": "ai",
"content": "",
"tool_calls": [
{
"name": "ask_clarification",
"args": {"question": "Which environment should I use?"},
}
],
},
{
"type": "tool",
"name": "ask_clarification",
"content": "Which environment should I use?",
},
{"type": "human", "content": history[1]},
{"type": "ai", "content": "Got it. I will deploy to prod."},
]
}
return {
"messages": [
{"type": "human", "content": history[-1]},
{"type": "ai", "content": "History missing; clarification repeated."},
]
}
mock_client = MagicMock()
mock_client.threads.create = AsyncMock(return_value={"thread_id": "clarify-thread-1"})
mock_client.threads.get = AsyncMock(return_value={"thread_id": "clarify-thread-1"})
mock_client.runs.wait = AsyncMock(side_effect=_runs_wait)
manager._client = mock_client
await manager.start()
await bus.publish_inbound(
InboundMessage(
channel_name="test",
chat_id="chat1",
user_id="user1",
text="Deploy my app",
)
)
await _wait_for(lambda: len(outbound_received) >= 1)
await bus.publish_inbound(
InboundMessage(
channel_name="test",
chat_id="chat1",
user_id="user1",
text="prod",
)
)
await _wait_for(lambda: len(outbound_received) >= 2)
await manager.stop()
assert outbound_received[0].text == "Which environment should I use?"
assert outbound_received[1].text == "Got it. I will deploy to prod."
assert mock_client.runs.wait.call_count == 2
first_call = mock_client.runs.wait.call_args_list[0]
second_call = mock_client.runs.wait.call_args_list[1]
assert first_call.kwargs["config"]["configurable"]["checkpoint_ns"] == ""
assert second_call.kwargs["config"]["configurable"]["checkpoint_ns"] == ""
_run(go())
def test_handle_chat_uses_user_session_overrides(self):
from app.channels.manager import ChannelManager
@@ -1343,6 +1499,8 @@ class TestChannelManager:
call_args = mock_client.runs.stream.call_args
assert call_args[1]["input"]["messages"][0]["content"] == "hello"
assert call_args[1]["config"]["configurable"]["checkpoint_ns"] == ""
assert call_args[1]["config"]["configurable"]["thread_id"] == "test-thread-123"
assert call_args[1]["context"]["is_bootstrap"] is True
# Final message should be published
@@ -437,6 +437,85 @@ class TestStream:
call_kwargs = agent.stream.call_args.kwargs
assert "messages" in call_kwargs["stream_mode"]
def test_stream_emits_additional_kwargs_updates_for_streamed_ai_messages(self, client):
"""stream() emits a follow-up AI event when attribution metadata arrives via values."""
assembled = AIMessage(
content="Hello!",
id="ai-1",
additional_kwargs={
"token_usage_attribution": {
"version": 1,
"kind": "final_answer",
"shared_attribution": False,
"actions": [],
}
},
)
agent = MagicMock()
agent.stream.return_value = iter(
[
("messages", (AIMessageChunk(content="Hello!", id="ai-1"), {})),
("values", {"messages": [HumanMessage(content="hi", id="h-1"), assembled]}),
]
)
with (
patch.object(client, "_ensure_agent"),
patch.object(client, "_agent", agent),
):
events = list(client.stream("hi", thread_id="t-stream-kwargs"))
ai_events = [event for event in events if event.type == "messages-tuple" and event.data.get("type") == "ai" and event.data.get("id") == "ai-1"]
assert any(event.data.get("content") == "Hello!" for event in ai_events)
assert any(event.data.get("additional_kwargs", {}).get("token_usage_attribution", {}).get("kind") == "final_answer" for event in ai_events)
def test_stream_emits_new_additional_kwargs_after_prior_metadata(self, client):
"""stream() emits later attribution metadata even after earlier kwargs for the same id."""
attribution = {
"version": 1,
"kind": "final_answer",
"shared_attribution": False,
"actions": [],
}
assembled = AIMessage(
content="Hello!",
id="ai-1",
additional_kwargs={
"reasoning_content": "Thinking first.",
"token_usage_attribution": attribution,
},
)
agent = MagicMock()
agent.stream.return_value = iter(
[
(
"messages",
(
AIMessageChunk(
content="Hello!",
id="ai-1",
additional_kwargs={"reasoning_content": "Thinking first."},
),
{},
),
),
("values", {"messages": [HumanMessage(content="hi", id="h-1"), assembled]}),
]
)
with (
patch.object(client, "_ensure_agent"),
patch.object(client, "_agent", agent),
):
events = list(client.stream("hi", thread_id="t-stream-kwargs-delta"))
ai_events = [event for event in events if event.type == "messages-tuple" and event.data.get("type") == "ai" and event.data.get("id") == "ai-1"]
metadata_events = [event for event in ai_events if event.data.get("additional_kwargs")]
assert metadata_events[0].data["additional_kwargs"] == {"reasoning_content": "Thinking first."}
assert metadata_events[1].data["content"] == ""
assert metadata_events[1].data["additional_kwargs"] == {"token_usage_attribution": attribution}
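The second test above asserts that later attribution metadata is emitted as its own event without re-sending earlier kwargs. The core of that behaviour is a per-id delta over `additional_kwargs`; `kwargs_delta` below is a hypothetical sketch of it, not the client's actual implementation:

```python
def kwargs_delta(prev: dict, current: dict) -> dict:
    # Re-emit only keys that are new or changed since the last event
    # for this message id, so earlier metadata is not duplicated.
    return {k: v for k, v in current.items() if k not in prev or prev[k] != v}
```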
def test_chat_accumulates_streamed_deltas(self, client):
"""chat() concatenates per-id deltas from messages mode."""
agent = MagicMock()
@@ -0,0 +1,53 @@
"""Tests for DeerFlowClient message serialization helpers."""
from langchain_core.messages import AIMessage, HumanMessage
from deerflow.client import DeerFlowClient
def test_serialize_ai_message_preserves_additional_kwargs():
message = AIMessage(
content="done",
additional_kwargs={
"token_usage_attribution": {
"version": 1,
"kind": "final_answer",
"shared_attribution": False,
"actions": [],
}
},
usage_metadata={"input_tokens": 12, "output_tokens": 3, "total_tokens": 15},
)
serialized = DeerFlowClient._serialize_message(message)
assert serialized["type"] == "ai"
assert serialized["usage_metadata"] == {
"input_tokens": 12,
"output_tokens": 3,
"total_tokens": 15,
}
assert serialized["additional_kwargs"] == {
"token_usage_attribution": {
"version": 1,
"kind": "final_answer",
"shared_attribution": False,
"actions": [],
}
}
def test_serialize_human_message_preserves_additional_kwargs():
message = HumanMessage(
content="hello",
additional_kwargs={"files": [{"name": "diagram.png"}]},
)
serialized = DeerFlowClient._serialize_message(message)
assert serialized == {
"type": "human",
"content": "hello",
"id": None,
"additional_kwargs": {"files": [{"name": "diagram.png"}]},
}
@@ -537,7 +537,10 @@ class TestAgentsAPI:
def test_create_persists_files_on_disk(self, agent_client, tmp_path):
agent_client.post("/api/agents", json={"name": "disk-check", "soul": "disk soul"})
agent_dir = tmp_path / "agents" / "disk-check"
# tests/conftest.py installs an autouse fixture that sets the
# contextvar to "test-user-autouse", so the agent is persisted under
# users/test-user-autouse/agents/ rather than the legacy shared dir.
agent_dir = tmp_path / "users" / "test-user-autouse" / "agents" / "disk-check"
assert agent_dir.exists()
assert (agent_dir / "config.yaml").exists()
assert (agent_dir / "SOUL.md").exists()
@@ -545,12 +548,23 @@ class TestAgentsAPI:
def test_delete_removes_files_from_disk(self, agent_client, tmp_path):
agent_client.post("/api/agents", json={"name": "remove-me", "soul": "bye"})
agent_dir = tmp_path / "agents" / "remove-me"
agent_dir = tmp_path / "users" / "test-user-autouse" / "agents" / "remove-me"
assert agent_dir.exists()
agent_client.delete("/api/agents/remove-me")
assert not agent_dir.exists()
def test_create_rejects_legacy_name_collision(self, agent_client, tmp_path):
"""An unmigrated legacy agent must still block name collision so that
running the migration script later won't shadow the legacy entry."""
legacy_dir = tmp_path / "agents" / "legacy-agent"
legacy_dir.mkdir(parents=True)
(legacy_dir / "config.yaml").write_text("name: legacy-agent\n", encoding="utf-8")
(legacy_dir / "SOUL.md").write_text("legacy soul", encoding="utf-8")
response = agent_client.post("/api/agents", json={"name": "legacy-agent", "soul": "x"})
assert response.status_code == 409
# ===========================================================================
# 9. Gateway API User Profile endpoints
@@ -50,7 +50,7 @@ def test_nginx_routes_official_langgraph_prefix_to_gateway_api():
assert "/api/langgraph-compat" not in content
assert "proxy_pass http://langgraph" not in content
assert "rewrite ^/api/langgraph/(.*) /api/$1 break;" in content
assert "proxy_pass http://gateway" in content
assert "proxy_pass http://gateway" in content or "proxy_pass http://$gateway_upstream" in content
def test_frontend_rewrites_langgraph_prefix_to_gateway():
@@ -17,6 +17,18 @@ def _set_skills_cache_state(*, skills=None, active=False, version=0):
prompt_module._enabled_skills_refresh_event.clear()
def test_build_self_update_section_empty_for_default_agent():
assert prompt_module._build_self_update_section(None) == ""
def test_build_self_update_section_present_for_custom_agent():
section = prompt_module._build_self_update_section("my-agent")
assert "<self_update>" in section
assert "my-agent" in section
assert "update_agent" in section
def test_build_custom_mounts_section_returns_empty_when_no_mounts(monkeypatch):
config = SimpleNamespace(sandbox=SimpleNamespace(mounts=[]))
monkeypatch.setattr("deerflow.config.get_app_config", lambda: config)
@@ -3,7 +3,7 @@
import copy
from unittest.mock import MagicMock
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage
from langchain_core.messages import AIMessage, SystemMessage
from deerflow.agents.middlewares.loop_detection_middleware import (
_HARD_STOP_MSG,
@@ -146,14 +146,42 @@ class TestLoopDetection:
for _ in range(2):
mw._apply(_make_state(tool_calls=call), runtime)
# Third identical call triggers warning
# Third identical call triggers warning. The warning is appended to
# the AIMessage content (tool_calls preserved) — never inserted as a
# separate HumanMessage between the AIMessage(tool_calls) and its
# ToolMessage responses, which would break OpenAI/Moonshot strict
# tool-call pairing validation.
result = mw._apply(_make_state(tool_calls=call), runtime)
assert result is not None
msgs = result["messages"]
assert len(msgs) == 1
assert isinstance(msgs[0], HumanMessage)
assert isinstance(msgs[0], AIMessage)
assert len(msgs[0].tool_calls) == len(call)
assert msgs[0].tool_calls[0]["id"] == call[0]["id"]
assert "LOOP DETECTED" in msgs[0].content
def test_warn_does_not_break_tool_call_pairing(self):
"""Regression: the warn branch must NOT inject a non-tool message
after an AIMessage(tool_calls=...). Moonshot/OpenAI reject the next
request with 'tool_call_ids did not have response messages' if any
non-tool message is wedged between the AIMessage and its ToolMessage
responses. See #2029.
"""
mw = LoopDetectionMiddleware(warn_threshold=3, hard_limit=10)
runtime = _make_runtime()
call = [_bash_call("ls")]
for _ in range(2):
mw._apply(_make_state(tool_calls=call), runtime)
result = mw._apply(_make_state(tool_calls=call), runtime)
assert result is not None
msgs = result["messages"]
assert len(msgs) == 1
assert isinstance(msgs[0], AIMessage)
assert len(msgs[0].tool_calls) == len(call)
assert msgs[0].tool_calls[0]["id"] == call[0]["id"]
def test_warn_only_injected_once(self):
"""Warning for the same hash should only be injected once per thread."""
mw = LoopDetectionMiddleware(warn_threshold=3, hard_limit=10)
@@ -483,7 +511,11 @@ class TestToolFrequencyDetection:
result = mw._apply(_make_state(tool_calls=[self._read_call("/file_4.py")]), runtime)
assert result is not None
msg = result["messages"][0]
assert isinstance(msg, HumanMessage)
# Warning is appended to the AIMessage content; tool_calls preserved
# so the tools node still runs and Moonshot/OpenAI tool-call pairing
# validation does not break.
assert isinstance(msg, AIMessage)
assert msg.tool_calls
assert "read_file" in msg.content
assert "LOOP DETECTED" in msg.content
@@ -125,3 +125,68 @@ class TestMigrateMemory:
from scripts.migrate_user_isolation import migrate_memory
migrate_memory(paths, user_id="default") # should not raise
class TestMigrateAgents:
@staticmethod
def _seed_legacy_agent(paths: Paths, name: str, *, soul: str = "soul", description: str = "d") -> Path:
legacy_dir = paths.agents_dir / name
legacy_dir.mkdir(parents=True, exist_ok=True)
(legacy_dir / "config.yaml").write_text(f"name: {name}\ndescription: {description}\n", encoding="utf-8")
(legacy_dir / "SOUL.md").write_text(soul, encoding="utf-8")
return legacy_dir
def test_moves_legacy_into_user_layout(self, base_dir: Path, paths: Paths):
self._seed_legacy_agent(paths, "agent-a", soul="soul-a")
self._seed_legacy_agent(paths, "agent-b", soul="soul-b")
from scripts.migrate_user_isolation import migrate_agents
report = migrate_agents(paths, user_id="default")
assert {entry["agent"] for entry in report} == {"agent-a", "agent-b"}
for entry in report:
assert entry["user_id"] == "default"
assert "moved -> " in entry["action"]
for name, soul in [("agent-a", "soul-a"), ("agent-b", "soul-b")]:
dest = paths.user_agent_dir("default", name)
assert dest.exists(), f"{name} should have moved into the per-user layout"
assert (dest / "SOUL.md").read_text() == soul
# Legacy agents/ root is cleaned up once empty.
assert not paths.agents_dir.exists()
def test_dry_run_does_not_move(self, base_dir: Path, paths: Paths):
legacy_dir = self._seed_legacy_agent(paths, "agent-a")
from scripts.migrate_user_isolation import migrate_agents
report = migrate_agents(paths, user_id="default", dry_run=True)
assert len(report) == 1
assert legacy_dir.exists(), "dry-run must not touch the filesystem"
assert not paths.user_agent_dir("default", "agent-a").exists()
def test_existing_destination_is_treated_as_conflict(self, base_dir: Path, paths: Paths):
self._seed_legacy_agent(paths, "agent-a", soul="legacy soul")
dest = paths.user_agent_dir("default", "agent-a")
dest.mkdir(parents=True)
(dest / "SOUL.md").write_text("preexisting", encoding="utf-8")
from scripts.migrate_user_isolation import migrate_agents
report = migrate_agents(paths, user_id="default")
assert report[0]["action"].startswith("conflict -> ")
# Per-user destination must be left untouched.
assert (dest / "SOUL.md").read_text() == "preexisting"
# Legacy copy lands under migration-conflicts/agents/.
conflicts_dir = paths.base_dir / "migration-conflicts" / "agents" / "agent-a"
assert (conflicts_dir / "SOUL.md").read_text() == "legacy soul"
def test_no_legacy_dir_is_noop(self, base_dir: Path, paths: Paths):
from scripts.migrate_user_isolation import migrate_agents
report = migrate_agents(paths, user_id="default")
assert report == []
@@ -50,6 +50,21 @@ class TestUserAgentMemoryFile:
assert paths.user_agent_memory_file("bob", "MyAgent") == expected
class TestUserAgentDir:
def test_user_agents_dir(self, paths: Paths):
assert paths.user_agents_dir("alice") == paths.base_dir / "users" / "alice" / "agents"
def test_user_agent_dir(self, paths: Paths):
assert paths.user_agent_dir("alice", "code-reviewer") == paths.base_dir / "users" / "alice" / "agents" / "code-reviewer"
def test_user_agent_dir_lowercases_name(self, paths: Paths):
assert paths.user_agent_dir("alice", "CodeReviewer") == paths.base_dir / "users" / "alice" / "agents" / "codereviewer"
def test_user_agent_dir_validates_user_id(self, paths: Paths):
with pytest.raises(ValueError, match="Invalid user_id"):
paths.user_agent_dir("../escape", "myagent")
class TestUserThreadDir:
def test_user_thread_dir(self, paths: Paths):
expected = paths.base_dir / "users" / "u1" / "threads" / "t1"
@@ -0,0 +1,293 @@
from __future__ import annotations
import pytest
import requests
from deerflow.community.aio_sandbox.remote_backend import RemoteSandboxBackend
from deerflow.community.aio_sandbox.sandbox_info import SandboxInfo
class _StubResponse:
def __init__(
self,
*,
status_code: int = 200,
payload: object | None = None,
json_exc: Exception | None = None,
):
self.status_code = status_code
self._payload = {} if payload is None else payload
self._json_exc = json_exc
self.ok = 200 <= status_code < 400
self.text = ""
def raise_for_status(self) -> None:
if self.status_code >= 400:
raise requests.HTTPError(f"HTTP {self.status_code}")
def json(self) -> object:
if self._json_exc is not None:
raise self._json_exc
return self._payload
def test_list_running_delegates_to_provisioner_list(monkeypatch):
backend = RemoteSandboxBackend("http://provisioner:8002")
sandbox_info = SandboxInfo(sandbox_id="test-id", sandbox_url="http://localhost:8080")
def mock_list():
return [sandbox_info]
monkeypatch.setattr(backend, "_provisioner_list", mock_list)
assert backend.list_running() == [sandbox_info]
def test_provisioner_list_returns_sandbox_infos_and_filters_invalid_entries(monkeypatch):
backend = RemoteSandboxBackend("http://provisioner:8002")
def mock_get(url: str, timeout: int):
assert url == "http://provisioner:8002/api/sandboxes"
assert timeout == 10
return _StubResponse(
payload={
"sandboxes": [
{"sandbox_id": "abc123", "sandbox_url": "http://k3s:31001"},
{"sandbox_id": "missing-url"},
{"sandbox_url": "http://k3s:31002"},
]
}
)
monkeypatch.setattr(requests, "get", mock_get)
infos = backend._provisioner_list()
assert len(infos) == 1
assert infos[0].sandbox_id == "abc123"
assert infos[0].sandbox_url == "http://k3s:31001"
def test_provisioner_list_returns_empty_on_request_exception(monkeypatch):
backend = RemoteSandboxBackend("http://provisioner:8002")
def mock_get(url: str, timeout: int):
raise requests.RequestException("network down")
monkeypatch.setattr(requests, "get", mock_get)
assert backend._provisioner_list() == []
def test_provisioner_list_returns_empty_when_payload_is_not_dict(monkeypatch):
backend = RemoteSandboxBackend("http://provisioner:8002")
def mock_get(url: str, timeout: int):
return _StubResponse(payload=[{"sandbox_id": "abc", "sandbox_url": "http://k3s:31001"}])
monkeypatch.setattr(requests, "get", mock_get)
assert backend._provisioner_list() == []
def test_provisioner_list_returns_empty_when_sandboxes_is_not_list(monkeypatch):
backend = RemoteSandboxBackend("http://provisioner:8002")
def mock_get(url: str, timeout: int):
return _StubResponse(payload={"sandboxes": {"sandbox_id": "abc"}})
monkeypatch.setattr(requests, "get", mock_get)
assert backend._provisioner_list() == []
def test_provisioner_list_skips_non_dict_sandbox_entries(monkeypatch):
backend = RemoteSandboxBackend("http://provisioner:8002")
def mock_get(url: str, timeout: int):
return _StubResponse(
payload={
"sandboxes": [
{"sandbox_id": "abc123", "sandbox_url": "http://k3s:31001"},
"bad-entry",
123,
None,
]
}
)
monkeypatch.setattr(requests, "get", mock_get)
infos = backend._provisioner_list()
assert len(infos) == 1
assert infos[0].sandbox_id == "abc123"
assert infos[0].sandbox_url == "http://k3s:31001"
def test_create_delegates_to_provisioner_create(monkeypatch):
backend = RemoteSandboxBackend("http://provisioner:8002")
expected = SandboxInfo(sandbox_id="abc123", sandbox_url="http://k3s:31001")
def mock_create(thread_id: str, sandbox_id: str, extra_mounts=None):
assert thread_id == "thread-1"
assert sandbox_id == "abc123"
assert extra_mounts == [("/host", "/container", False)]
return expected
monkeypatch.setattr(backend, "_provisioner_create", mock_create)
result = backend.create("thread-1", "abc123", extra_mounts=[("/host", "/container", False)])
assert result == expected
def test_provisioner_create_returns_sandbox_info(monkeypatch):
backend = RemoteSandboxBackend("http://provisioner:8002")
def mock_post(url: str, json: dict, timeout: int):
assert url == "http://provisioner:8002/api/sandboxes"
assert json == {"sandbox_id": "abc123", "thread_id": "thread-1"}
assert timeout == 30
return _StubResponse(payload={"sandbox_id": "abc123", "sandbox_url": "http://k3s:31001"})
monkeypatch.setattr(requests, "post", mock_post)
info = backend._provisioner_create("thread-1", "abc123")
assert info.sandbox_id == "abc123"
assert info.sandbox_url == "http://k3s:31001"
def test_provisioner_create_raises_runtime_error_on_request_exception(monkeypatch):
backend = RemoteSandboxBackend("http://provisioner:8002")
def mock_post(url: str, json: dict, timeout: int):
raise requests.RequestException("boom")
monkeypatch.setattr(requests, "post", mock_post)
with pytest.raises(RuntimeError, match="Provisioner create failed"):
backend._provisioner_create("thread-1", "abc123")
def test_destroy_delegates_to_provisioner_destroy(monkeypatch):
backend = RemoteSandboxBackend("http://provisioner:8002")
called: list[str] = []
def mock_destroy(sandbox_id: str):
called.append(sandbox_id)
monkeypatch.setattr(backend, "_provisioner_destroy", mock_destroy)
backend.destroy(SandboxInfo(sandbox_id="abc123", sandbox_url="http://k3s:31001"))
assert called == ["abc123"]
def test_provisioner_destroy_calls_delete(monkeypatch):
backend = RemoteSandboxBackend("http://provisioner:8002")
def mock_delete(url: str, timeout: int):
assert url == "http://provisioner:8002/api/sandboxes/abc123"
assert timeout == 15
return _StubResponse(status_code=200)
monkeypatch.setattr(requests, "delete", mock_delete)
backend._provisioner_destroy("abc123")
def test_provisioner_destroy_swallows_request_exception(monkeypatch):
backend = RemoteSandboxBackend("http://provisioner:8002")
def mock_delete(url: str, timeout: int):
raise requests.RequestException("network down")
monkeypatch.setattr(requests, "delete", mock_delete)
backend._provisioner_destroy("abc123")
def test_is_alive_delegates_to_provisioner_is_alive(monkeypatch):
backend = RemoteSandboxBackend("http://provisioner:8002")
def mock_is_alive(sandbox_id: str):
assert sandbox_id == "abc123"
return True
monkeypatch.setattr(backend, "_provisioner_is_alive", mock_is_alive)
alive = backend.is_alive(SandboxInfo(sandbox_id="abc123", sandbox_url="http://k3s:31001"))
assert alive is True
def test_provisioner_is_alive_true_only_when_status_running(monkeypatch):
backend = RemoteSandboxBackend("http://provisioner:8002")
def mock_get_running(url: str, timeout: int):
return _StubResponse(payload={"status": "Running"})
monkeypatch.setattr(requests, "get", mock_get_running)
assert backend._provisioner_is_alive("abc123") is True
def mock_get_pending(url: str, timeout: int):
return _StubResponse(payload={"status": "Pending"})
monkeypatch.setattr(requests, "get", mock_get_pending)
assert backend._provisioner_is_alive("abc123") is False
def test_provisioner_is_alive_returns_false_on_request_exception(monkeypatch):
backend = RemoteSandboxBackend("http://provisioner:8002")
def mock_get(url: str, timeout: int):
raise requests.RequestException("boom")
monkeypatch.setattr(requests, "get", mock_get)
assert backend._provisioner_is_alive("abc123") is False
def test_discover_delegates_to_provisioner_discover(monkeypatch):
backend = RemoteSandboxBackend("http://provisioner:8002")
expected = SandboxInfo(sandbox_id="abc123", sandbox_url="http://k3s:31001")
def mock_discover(sandbox_id: str):
assert sandbox_id == "abc123"
return expected
monkeypatch.setattr(backend, "_provisioner_discover", mock_discover)
result = backend.discover("abc123")
assert result == expected
def test_provisioner_discover_returns_none_on_404(monkeypatch):
backend = RemoteSandboxBackend("http://provisioner:8002")
def mock_get(url: str, timeout: int):
return _StubResponse(status_code=404)
monkeypatch.setattr(requests, "get", mock_get)
assert backend._provisioner_discover("abc123") is None
def test_provisioner_discover_returns_info_on_success(monkeypatch):
backend = RemoteSandboxBackend("http://provisioner:8002")
def mock_get(url: str, timeout: int):
return _StubResponse(payload={"sandbox_id": "abc123", "sandbox_url": "http://k3s:31001"})
monkeypatch.setattr(requests, "get", mock_get)
info = backend._provisioner_discover("abc123")
assert info is not None
assert info.sandbox_id == "abc123"
assert info.sandbox_url == "http://k3s:31001"
def test_provisioner_discover_returns_none_on_request_exception(monkeypatch):
backend = RemoteSandboxBackend("http://provisioner:8002")
def mock_get(url: str, timeout: int):
raise requests.RequestException("boom")
monkeypatch.setattr(requests, "get", mock_get)
assert backend._provisioner_discover("abc123") is None
+36
@@ -7,6 +7,7 @@ import yaml
from deerflow.config import app_config as app_config_module
from deerflow.config import extensions_config as extensions_config_module
from deerflow.config import skills_config as skills_config_module
from deerflow.config.app_config import AppConfig
from deerflow.config.extensions_config import ExtensionsConfig
from deerflow.config.paths import Paths
@@ -35,6 +36,7 @@ def test_default_runtime_paths_resolve_from_current_project(tmp_path: Path, monk
encoding="utf-8",
)
(tmp_path / "extensions_config.json").write_text('{"mcpServers": {}, "skills": {}}', encoding="utf-8")
(tmp_path / "skills").mkdir()
assert AppConfig.resolve_config_path() == tmp_path / "config.yaml"
assert ExtensionsConfig.resolve_config_path() == tmp_path / "extensions_config.json"
@@ -121,6 +123,40 @@ def test_app_config_falls_back_to_legacy_when_project_root_lacks_config(tmp_path
assert AppConfig.resolve_config_path() == legacy_backend_config
def test_skills_config_falls_back_to_legacy_when_project_root_lacks_skills(tmp_path: Path, monkeypatch):
"""When DEER_FLOW_PROJECT_ROOT is unset and cwd has no `skills/`, the legacy
repo-root candidate must be used so monorepo runs (cwd=backend/) keep finding
`<repo>/skills` instead of `<repo>/backend/skills` (regression test for #2694)."""
_clear_path_env(monkeypatch)
cwd = tmp_path / "cwd"
cwd.mkdir()
monkeypatch.chdir(cwd)
legacy_skills = tmp_path / "legacy-repo" / "skills"
legacy_skills.mkdir(parents=True)
monkeypatch.setattr(
skills_config_module,
"_legacy_skills_candidates",
lambda: (legacy_skills,),
)
assert SkillsConfig().get_skills_path() == legacy_skills
def test_skills_config_returns_project_default_when_neither_exists(tmp_path: Path, monkeypatch):
"""When nothing exists, fall back to the project-root default path so callers
surface a stable empty location instead of silently picking a stale legacy dir."""
_clear_path_env(monkeypatch)
cwd = tmp_path / "cwd"
cwd.mkdir()
monkeypatch.chdir(cwd)
monkeypatch.setattr(skills_config_module, "_legacy_skills_candidates", lambda: ())
assert SkillsConfig().get_skills_path() == cwd / "skills"
def test_extensions_config_falls_back_to_legacy_when_project_root_lacks_file(tmp_path: Path, monkeypatch):
"""ExtensionsConfig should hit the legacy backend/repo-root locations when
the caller project root has no extensions_config.json/mcp_config.json."""
+7 -6
@@ -27,6 +27,7 @@ def _make_paths_mock(tmp_path: Path):
paths = MagicMock()
paths.base_dir = tmp_path
paths.agent_dir = lambda name: tmp_path / "agents" / name
paths.user_agent_dir = lambda user_id, name: tmp_path / "users" / user_id / "agents" / name
return paths
@@ -54,7 +55,7 @@ def test_setup_agent_rejects_invalid_agent_name_before_writing(tmp_path, monkeyp
messages = result.update["messages"]
assert len(messages) == 1
assert "Invalid agent name" in messages[0].content
assert not (tmp_path / "agents").exists()
assert not (tmp_path / "users" / "test-user-autouse" / "agents").exists()
assert not (outside_dir / "evil" / "SOUL.md").exists()
@@ -68,7 +69,7 @@ def test_setup_agent_rejects_absolute_agent_name_before_writing(tmp_path, monkey
messages = result.update["messages"]
assert len(messages) == 1
assert "Invalid agent name" in messages[0].content
assert not (tmp_path / "agents").exists()
assert not (tmp_path / "users" / "test-user-autouse" / "agents").exists()
assert not (Path(absolute_agent) / "SOUL.md").exists()
@@ -81,10 +82,10 @@ class TestSetupAgentNoDataLoss:
def test_existing_agent_dir_preserved_on_failure(self, tmp_path: Path):
"""If the agent directory already exists and setup fails,
the directory and its contents must NOT be deleted."""
agent_dir = tmp_path / "agents" / "test-agent"
agent_dir = tmp_path / "users" / "test-user-autouse" / "agents" / "test-agent"
agent_dir.mkdir(parents=True)
old_soul = agent_dir / "SOUL.md"
old_soul.write_text("original soul content")
old_soul.write_text("original soul content", encoding="utf-8")
with patch("deerflow.tools.builtins.setup_agent_tool.get_paths", return_value=_make_paths_mock(tmp_path)):
# Force soul_file.write_text to raise after directory already exists
@@ -103,7 +104,7 @@ class TestSetupAgentNoDataLoss:
def test_new_agent_dir_cleaned_up_on_failure(self, tmp_path: Path):
"""If the agent directory is newly created and setup fails,
the directory should be cleaned up."""
agent_dir = tmp_path / "agents" / "test-agent"
agent_dir = tmp_path / "users" / "test-user-autouse" / "agents" / "test-agent"
assert not agent_dir.exists()
with patch("deerflow.tools.builtins.setup_agent_tool.get_paths", return_value=_make_paths_mock(tmp_path)):
@@ -121,7 +122,7 @@ class TestSetupAgentNoDataLoss:
"""Happy path: setup_agent creates config.yaml and SOUL.md."""
_call_setup_agent(tmp_path, soul="# My Agent", description="A test agent")
agent_dir = tmp_path / "agents" / "test-agent"
agent_dir = tmp_path / "users" / "test-user-autouse" / "agents" / "test-agent"
assert agent_dir.exists()
assert (agent_dir / "SOUL.md").read_text() == "# My Agent"
assert (agent_dir / "config.yaml").exists()
+1
@@ -19,6 +19,7 @@ def test_get_skills_root_path_points_to_current_project_skills(tmp_path: Path, m
monkeypatch.delenv("DEER_FLOW_SKILLS_PATH", raising=False)
monkeypatch.delenv("DEER_FLOW_PROJECT_ROOT", raising=False)
monkeypatch.chdir(tmp_path)
(tmp_path / "skills").mkdir()
app_config = SimpleNamespace(skills=SkillsConfig())
path = get_or_new_skill_storage(app_config=app_config).get_skills_root_path()
+149 -24
@@ -1,32 +1,157 @@
from unittest.mock import MagicMock, patch
"""Tests for TokenUsageMiddleware attribution annotations."""
from unittest.mock import MagicMock
from langchain_core.messages import AIMessage
from deerflow.agents.middlewares.token_usage_middleware import TokenUsageMiddleware
from deerflow.agents.middlewares.token_usage_middleware import (
TOKEN_USAGE_ATTRIBUTION_KEY,
TokenUsageMiddleware,
)
def test_after_model_logs_usage_metadata_counts():
middleware = TokenUsageMiddleware()
state = {
"messages": [
AIMessage(
content="done",
usage_metadata={
"input_tokens": 10,
"output_tokens": 5,
"total_tokens": 15,
},
)
def _make_runtime():
runtime = MagicMock()
runtime.context = {"thread_id": "test-thread"}
return runtime
class TestTokenUsageMiddleware:
def test_annotates_todo_updates_with_structured_actions(self):
middleware = TokenUsageMiddleware()
message = AIMessage(
content="",
tool_calls=[
{
"id": "write_todos:1",
"name": "write_todos",
"args": {
"todos": [
{"content": "Inspect streaming path", "status": "completed"},
{"content": "Design token attribution schema", "status": "in_progress"},
]
},
}
],
usage_metadata={"input_tokens": 100, "output_tokens": 20, "total_tokens": 120},
)
state = {
"messages": [message],
"todos": [
{"content": "Inspect streaming path", "status": "in_progress"},
{"content": "Design token attribution schema", "status": "pending"},
],
}
result = middleware.after_model(state, _make_runtime())
assert result is not None
updated_message = result["messages"][0]
attribution = updated_message.additional_kwargs[TOKEN_USAGE_ATTRIBUTION_KEY]
assert attribution["kind"] == "tool_batch"
assert attribution["shared_attribution"] is True
assert attribution["tool_call_ids"] == ["write_todos:1"]
assert attribution["actions"] == [
{
"kind": "todo_complete",
"content": "Inspect streaming path",
"tool_call_id": "write_todos:1",
},
{
"kind": "todo_start",
"content": "Design token attribution schema",
"tool_call_id": "write_todos:1",
},
]
}
with patch("deerflow.agents.middlewares.token_usage_middleware.logger.info") as info_mock:
result = middleware.after_model(state=state, runtime=MagicMock())
def test_annotates_subagent_and_search_steps(self):
middleware = TokenUsageMiddleware()
message = AIMessage(
content="",
tool_calls=[
{
"id": "task:1",
"name": "task",
"args": {
"description": "spec-coder patch message grouping",
"subagent_type": "general-purpose",
},
},
{
"id": "web_search:1",
"name": "web_search",
"args": {"query": "LangGraph useStream messages tuple"},
},
],
)
assert result is None
info_mock.assert_called_once_with(
"LLM token usage: input=%s output=%s total=%s",
10,
5,
15,
)
result = middleware.after_model({"messages": [message]}, _make_runtime())
assert result is not None
attribution = result["messages"][0].additional_kwargs[TOKEN_USAGE_ATTRIBUTION_KEY]
assert attribution["kind"] == "tool_batch"
assert attribution["shared_attribution"] is True
assert attribution["actions"] == [
{
"kind": "subagent",
"description": "spec-coder patch message grouping",
"subagent_type": "general-purpose",
"tool_call_id": "task:1",
},
{
"kind": "search",
"tool_name": "web_search",
"query": "LangGraph useStream messages tuple",
"tool_call_id": "web_search:1",
},
]
def test_marks_final_answer_when_no_tools(self):
middleware = TokenUsageMiddleware()
message = AIMessage(content="Here is the final answer.")
result = middleware.after_model({"messages": [message]}, _make_runtime())
assert result is not None
attribution = result["messages"][0].additional_kwargs[TOKEN_USAGE_ATTRIBUTION_KEY]
assert attribution["kind"] == "final_answer"
assert attribution["shared_attribution"] is False
assert attribution["actions"] == []
def test_annotates_removed_todos(self):
middleware = TokenUsageMiddleware()
message = AIMessage(
content="",
tool_calls=[
{
"id": "write_todos:remove",
"name": "write_todos",
"args": {
"todos": [],
},
}
],
)
result = middleware.after_model(
{
"messages": [message],
"todos": [
{"content": "Archive obsolete plan", "status": "pending"},
],
},
_make_runtime(),
)
assert result is not None
attribution = result["messages"][0].additional_kwargs[TOKEN_USAGE_ATTRIBUTION_KEY]
assert attribution["kind"] == "todo_update"
assert attribution["shared_attribution"] is False
assert attribution["actions"] == [
{
"kind": "todo_remove",
"content": "Archive obsolete plan",
"tool_call_id": "write_todos:remove",
}
]
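The action records asserted above follow from diffing the previous todo list against the one passed to `write_todos`. A sketch of that diff (a hypothetical helper, not the middleware's actual code):

```python
def diff_todo_actions(old: list[dict], new: list[dict], tool_call_id: str) -> list[dict]:
    """Hypothetical diff producing the action records the tests assert on:
    todo_complete / todo_start for status transitions, todo_remove for
    entries that disappear from the list."""
    old_status = {t["content"]: t["status"] for t in old}
    new_contents = {t["content"] for t in new}
    actions: list[dict] = []
    for todo in new:
        prev = old_status.get(todo["content"])
        if todo["status"] == "completed" and prev != "completed":
            actions.append({"kind": "todo_complete", "content": todo["content"], "tool_call_id": tool_call_id})
        elif todo["status"] == "in_progress" and prev != "in_progress":
            actions.append({"kind": "todo_start", "content": todo["content"], "tool_call_id": tool_call_id})
    for content in old_status:
        if content not in new_contents:
            actions.append({"kind": "todo_remove", "content": content, "tool_call_id": tool_call_id})
    return actions
```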
+310
@@ -0,0 +1,310 @@
"""Tests for update_agent tool — partial updates, atomic writes, and validation.
Resolves issue #2616: a custom agent must be able to persist updates to its
own SOUL.md / config.yaml from inside a normal chat (not only from bootstrap).
The tool writes per-user (``{base_dir}/users/{user_id}/agents/{name}/``) so
that one user's update cannot mutate another user's agent.
"""
from __future__ import annotations
from pathlib import Path
from types import SimpleNamespace
from unittest.mock import MagicMock, patch
import pytest
import yaml
from deerflow.config.agents_config import AgentConfig
from deerflow.tools.builtins.update_agent_tool import update_agent
DEFAULT_USER = "test-user-autouse" # matches the autouse fixture in tests/conftest.py
class _DummyRuntime(SimpleNamespace):
context: dict
tool_call_id: str
def _runtime(agent_name: str | None = "test-agent", tool_call_id: str = "call_1") -> _DummyRuntime:
return _DummyRuntime(context={"agent_name": agent_name} if agent_name is not None else {}, tool_call_id=tool_call_id)
def _make_paths_mock(tmp_path: Path) -> MagicMock:
paths = MagicMock()
paths.base_dir = tmp_path
paths.agent_dir = lambda name: tmp_path / "agents" / name
paths.agents_dir = tmp_path / "agents"
paths.user_agent_dir = lambda user_id, name: tmp_path / "users" / user_id / "agents" / name
paths.user_agents_dir = lambda user_id: tmp_path / "users" / user_id / "agents"
return paths
def _user_agent_dir(tmp_path: Path, name: str = "test-agent", user_id: str = DEFAULT_USER) -> Path:
return tmp_path / "users" / user_id / "agents" / name
def _seed_agent(
tmp_path: Path,
name: str = "test-agent",
*,
description: str = "old desc",
soul: str = "old soul",
skills: list[str] | None = None,
user_id: str = DEFAULT_USER,
) -> Path:
"""Create a baseline agent dir with config.yaml and SOUL.md for tests to mutate."""
agent_dir = _user_agent_dir(tmp_path, name, user_id=user_id)
agent_dir.mkdir(parents=True, exist_ok=True)
cfg: dict = {"name": name, "description": description}
if skills is not None:
cfg["skills"] = skills
(agent_dir / "config.yaml").write_text(yaml.safe_dump(cfg, sort_keys=False), encoding="utf-8")
(agent_dir / "SOUL.md").write_text(soul, encoding="utf-8")
return agent_dir
@pytest.fixture()
def patched_paths(tmp_path: Path):
paths_mock = _make_paths_mock(tmp_path)
with patch("deerflow.tools.builtins.update_agent_tool.get_paths", return_value=paths_mock):
# load_agent_config also calls get_paths(); patch the same target it uses.
with patch("deerflow.config.agents_config.get_paths", return_value=paths_mock):
yield paths_mock
@pytest.fixture()
def stub_app_config():
"""Stub get_app_config so model validation accepts only known names."""
fake = MagicMock()
fake.get_model_config.side_effect = lambda name: object() if name in {"gpt-known", "m1"} else None
with patch("deerflow.tools.builtins.update_agent_tool.get_app_config", return_value=fake):
yield fake
# --- Validation tests ---
def test_update_agent_rejects_missing_agent_name(patched_paths):
result = update_agent.func(runtime=_runtime(agent_name=None), soul="new soul")
msg = result.update["messages"][0]
assert "only available inside a custom agent's chat" in msg.content
def test_update_agent_rejects_invalid_agent_name(patched_paths):
result = update_agent.func(runtime=_runtime(agent_name="../../etc/passwd"), soul="x")
msg = result.update["messages"][0]
assert "Invalid agent name" in msg.content
def test_update_agent_rejects_unknown_agent(tmp_path, patched_paths):
result = update_agent.func(runtime=_runtime(agent_name="ghost"), soul="x")
msg = result.update["messages"][0]
assert "does not exist" in msg.content
assert not _user_agent_dir(tmp_path, "ghost").exists()
def test_update_agent_requires_at_least_one_field(tmp_path, patched_paths):
_seed_agent(tmp_path)
result = update_agent.func(runtime=_runtime())
msg = result.update["messages"][0]
assert "No fields provided" in msg.content
def test_update_agent_rejects_unknown_model(tmp_path, patched_paths, stub_app_config):
"""Copilot review: model must be validated against configured models before
being persisted; otherwise _resolve_model_name silently falls back to the
default and the user gets repeated warnings on every later turn."""
_seed_agent(tmp_path)
result = update_agent.func(runtime=_runtime(), model="not-in-config")
msg = result.update["messages"][0]
assert "Unknown model" in msg.content
cfg = yaml.safe_load((_user_agent_dir(tmp_path) / "config.yaml").read_text())
assert "model" not in cfg, "Invalid model must not have been written to config.yaml"
def test_update_agent_accepts_known_model(tmp_path, patched_paths, stub_app_config):
_seed_agent(tmp_path)
result = update_agent.func(runtime=_runtime(), model="gpt-known")
cfg = yaml.safe_load((_user_agent_dir(tmp_path) / "config.yaml").read_text())
assert cfg["model"] == "gpt-known"
assert "model" in result.update["messages"][0].content
# --- Partial update tests ---
def test_update_agent_updates_soul_only(tmp_path, patched_paths):
agent_dir = _seed_agent(tmp_path, description="keep me", soul="old soul")
result = update_agent.func(runtime=_runtime(), soul="brand new soul")
assert (agent_dir / "SOUL.md").read_text() == "brand new soul"
cfg = yaml.safe_load((agent_dir / "config.yaml").read_text())
assert cfg["description"] == "keep me", "description must be preserved"
assert "soul" in result.update["messages"][0].content
def test_update_agent_updates_description_only(tmp_path, patched_paths):
agent_dir = _seed_agent(tmp_path, description="old desc", soul="keep this soul")
result = update_agent.func(runtime=_runtime(), description="new desc")
cfg = yaml.safe_load((agent_dir / "config.yaml").read_text())
assert cfg["description"] == "new desc"
assert (agent_dir / "SOUL.md").read_text() == "keep this soul", "SOUL.md must be preserved"
assert "description" in result.update["messages"][0].content
def test_update_agent_skills_empty_list_disables_all(tmp_path, patched_paths):
agent_dir = _seed_agent(tmp_path, skills=["a", "b"])
result = update_agent.func(runtime=_runtime(), skills=[])
cfg = yaml.safe_load((agent_dir / "config.yaml").read_text())
assert cfg["skills"] == [], "empty list must persist as empty list (not be omitted)"
assert "skills" in result.update["messages"][0].content
def test_update_agent_skills_omitted_keeps_existing(tmp_path, patched_paths):
agent_dir = _seed_agent(tmp_path, skills=["alpha", "beta"])
update_agent.func(runtime=_runtime(), description="bumped")
cfg = yaml.safe_load((agent_dir / "config.yaml").read_text())
assert cfg["skills"] == ["alpha", "beta"], "omitting skills must preserve the existing whitelist"
def test_update_agent_no_op_when_values_match_existing(tmp_path, patched_paths):
_seed_agent(tmp_path, description="same")
result = update_agent.func(runtime=_runtime(), description="same")
assert "No changes applied" in result.update["messages"][0].content
def test_update_agent_forces_name_to_directory(tmp_path, patched_paths):
"""Copilot review: if the existing config.yaml has a drifted ``name`` field,
update_agent must rewrite it to match the directory name so on-disk state
stays consistent with the runtime context."""
agent_dir = _user_agent_dir(tmp_path)
agent_dir.mkdir(parents=True)
(agent_dir / "config.yaml").write_text(yaml.safe_dump({"name": "drifted-name", "description": "old"}, sort_keys=False), encoding="utf-8")
(agent_dir / "SOUL.md").write_text("soul", encoding="utf-8")
update_agent.func(runtime=_runtime(), description="bumped")
cfg = yaml.safe_load((agent_dir / "config.yaml").read_text())
assert cfg["name"] == "test-agent", "config.yaml name must follow the directory name, not legacy yaml content"
# --- Atomicity tests ---
def test_update_agent_failure_preserves_existing_files(tmp_path, patched_paths):
agent_dir = _seed_agent(tmp_path, soul="original soul")
real_replace = Path.replace
def _explode(self, target):
if str(target).endswith("SOUL.md"):
raise OSError("disk full")
return real_replace(self, target)
with patch.object(Path, "replace", _explode):
result = update_agent.func(runtime=_runtime(), soul="poisoned content")
assert (agent_dir / "SOUL.md").read_text() == "original soul", "atomic write must not corrupt existing SOUL.md"
assert "Error" in result.update["messages"][0].content
leftover_tmps = list(agent_dir.glob("*.tmp"))
assert leftover_tmps == [], "temp files must be cleaned up on failure"
def test_update_agent_soul_failure_does_not_replace_config(tmp_path, patched_paths):
"""Copilot review: if both config.yaml and SOUL.md are scheduled to be
written and SOUL.md staging fails *before* any rename, config.yaml must
NOT be replaced. The fix stages every temp file first and only renames
after all temps exist on disk."""
agent_dir = _seed_agent(tmp_path, description="original-desc", soul="original soul")
real_named_temp_file = __import__("tempfile").NamedTemporaryFile
call_count = {"n": 0}
def _explode_on_soul(*args, **kwargs):
# The SOUL.md temp file is the second one staged, so fail on the second call.
call_count["n"] += 1
if call_count["n"] >= 2:
raise OSError("disk full while staging SOUL.md")
return real_named_temp_file(*args, **kwargs)
with patch("deerflow.tools.builtins.update_agent_tool.tempfile.NamedTemporaryFile", side_effect=_explode_on_soul):
result = update_agent.func(runtime=_runtime(), description="new-desc", soul="new soul")
cfg = yaml.safe_load((agent_dir / "config.yaml").read_text())
assert cfg["description"] == "original-desc", "config.yaml must not be replaced when SOUL.md staging fails"
assert (agent_dir / "SOUL.md").read_text() == "original soul"
assert "Error" in result.update["messages"][0].content
assert list(agent_dir.glob("*.tmp")) == [], "staged config.yaml temp must be cleaned up on SOUL.md failure"
# --- Per-user isolation ---
def test_update_agent_only_writes_under_current_user(tmp_path, patched_paths):
"""An update from user 'alice' must never touch user 'bob's agent files."""
from deerflow.runtime.user_context import reset_current_user, set_current_user
# Seed an agent for both users with the same name.
alice_dir = _seed_agent(tmp_path, name="shared", description="alice-desc", soul="alice soul", user_id="alice")
bob_dir = _seed_agent(tmp_path, name="shared", description="bob-desc", soul="bob soul", user_id="bob")
# Override the autouse contextvar so update_agent runs as Alice.
token = set_current_user(SimpleNamespace(id="alice"))
try:
update_agent.func(runtime=_runtime(agent_name="shared"), description="alice-bumped")
finally:
reset_current_user(token)
alice_cfg = yaml.safe_load((alice_dir / "config.yaml").read_text())
bob_cfg = yaml.safe_load((bob_dir / "config.yaml").read_text())
assert alice_cfg["description"] == "alice-bumped"
assert bob_cfg["description"] == "bob-desc", "bob's config.yaml must not have been touched"
assert (bob_dir / "SOUL.md").read_text() == "bob soul"
# --- Loader passthrough sanity check ---
def test_update_agent_round_trips_known_fields(tmp_path, patched_paths):
"""update_agent reads through load_agent_config so all fields the loader
knows about (name, description, model, tool_groups, skills) round-trip
on a partial update.
Note: ``load_agent_config`` strips unknown fields before constructing
AgentConfig, so legacy/extra YAML keys are NOT preserved across
updates by design.
"""
_seed_agent(tmp_path, description="legacy")
fake_cfg = AgentConfig(name="test-agent", description="legacy", skills=["s1"], tool_groups=["g1"], model="m1")
fake_app_config = MagicMock()
fake_app_config.get_model_config.return_value = object()
with patch("deerflow.tools.builtins.update_agent_tool.load_agent_config", return_value=fake_cfg):
with patch("deerflow.tools.builtins.update_agent_tool.get_app_config", return_value=fake_app_config):
update_agent.func(runtime=_runtime(), description="bumped")
cfg = yaml.safe_load((_user_agent_dir(tmp_path) / "config.yaml").read_text())
assert cfg["description"] == "bumped"
assert cfg["skills"] == ["s1"]
assert cfg["tool_groups"] == ["g1"]
assert cfg["model"] == "m1"
+19 -27
@@ -17,25 +17,17 @@ http {
# Docker internal DNS (for resolving k3s hostname)
resolver 127.0.0.11 valid=10s ipv6=off;
# Upstream servers (using Docker service names)
# NOTE: `zone` and `resolve` are nginx Plus-only features and are not
# available in the standard nginx:alpine image. Docker's internal DNS
# (127.0.0.11) handles service discovery; upstreams are resolved at
# nginx startup and remain valid for the lifetime of the deployment.
upstream gateway {
server gateway:8001;
}
upstream frontend {
server frontend:3000;
}
# ── Main server (path-based routing) ─────────────────────────────────
server {
listen 2026 default_server;
listen [::]:2026 default_server;
server_name _;
# Resolve Docker service names at request time to avoid stale upstream
# IPs when containers restart and receive new addresses.
set $gateway_upstream gateway:8001;
set $frontend_upstream frontend:3000;
# Hide CORS headers from upstream to prevent duplicates
proxy_hide_header 'Access-Control-Allow-Origin';
proxy_hide_header 'Access-Control-Allow-Methods';
@@ -56,7 +48,7 @@ http {
# Rewrites /api/langgraph/* to /api/* before proxying to Gateway.
location /api/langgraph/ {
rewrite ^/api/langgraph/(.*) /api/$1 break;
proxy_pass http://gateway;
proxy_pass http://$gateway_upstream;
proxy_http_version 1.1;
# Headers
@@ -82,7 +74,7 @@ http {
# Custom API: Models endpoint
location /api/models {
proxy_pass http://gateway;
proxy_pass http://$gateway_upstream;
proxy_http_version 1.1;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
@@ -92,7 +84,7 @@ http {
# Custom API: Memory endpoint
location /api/memory {
proxy_pass http://gateway;
proxy_pass http://$gateway_upstream;
proxy_http_version 1.1;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
@@ -102,7 +94,7 @@ http {
# Custom API: MCP configuration endpoint
location /api/mcp {
proxy_pass http://gateway;
proxy_pass http://$gateway_upstream;
proxy_http_version 1.1;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
@@ -112,7 +104,7 @@ http {
# Custom API: Skills configuration endpoint
location /api/skills {
proxy_pass http://gateway;
proxy_pass http://$gateway_upstream;
proxy_http_version 1.1;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
@@ -122,7 +114,7 @@ http {
# Custom API: Agents endpoint
location /api/agents {
proxy_pass http://gateway;
proxy_pass http://$gateway_upstream;
proxy_http_version 1.1;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
@@ -132,7 +124,7 @@ http {
# Custom API: Uploads endpoint
location ~ ^/api/threads/[^/]+/uploads {
proxy_pass http://gateway;
proxy_pass http://$gateway_upstream;
proxy_http_version 1.1;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
@@ -146,7 +138,7 @@ http {
# Custom API: Other endpoints under /api/threads
location ~ ^/api/threads {
proxy_pass http://gateway;
proxy_pass http://$gateway_upstream;
proxy_http_version 1.1;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
@@ -156,7 +148,7 @@ http {
# API Documentation: Swagger UI
location /docs {
proxy_pass http://gateway;
proxy_pass http://$gateway_upstream;
proxy_http_version 1.1;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
@@ -166,7 +158,7 @@ http {
# API Documentation: ReDoc
location /redoc {
proxy_pass http://gateway;
proxy_pass http://$gateway_upstream;
proxy_http_version 1.1;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
@@ -176,7 +168,7 @@ http {
# API Documentation: OpenAPI Schema
location /openapi.json {
proxy_pass http://gateway;
proxy_pass http://$gateway_upstream;
proxy_http_version 1.1;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
@@ -186,7 +178,7 @@ http {
# Health check endpoint (gateway)
location /health {
proxy_pass http://gateway;
proxy_pass http://$gateway_upstream;
proxy_http_version 1.1;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
@@ -210,7 +202,7 @@ http {
# Catch-all for /api/ routes not covered above (e.g. /api/v1/auth/*).
# More specific prefix and regex locations above still take precedence.
location /api/ {
proxy_pass http://gateway;
proxy_pass http://$gateway_upstream;
proxy_http_version 1.1;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
@@ -220,7 +212,7 @@ http {
# All other requests go to frontend
location / {
proxy_pass http://frontend;
proxy_pass http://$frontend_upstream;
proxy_http_version 1.1;
# Headers
+5
@@ -14,3 +14,8 @@
# Only set these if you need to connect to backend services directly
# NEXT_PUBLIC_BACKEND_BASE_URL="http://localhost:8001"
# NEXT_PUBLIC_LANGGRAPH_BASE_URL="http://localhost:2024"
# Server-only Gateway wiring used by SSR (auth checks, /api/* rewrites).
# Defaults to localhost — only override for non-local deployments.
# DEER_FLOW_INTERNAL_GATEWAY_BASE_URL="http://localhost:8001"
# DEER_FLOW_TRUSTED_ORIGINS="http://localhost:3000,http://localhost:2026"
@@ -25,7 +25,7 @@ import { useAgent } from "@/core/agents";
import { useI18n } from "@/core/i18n/hooks";
import { useModels } from "@/core/models/hooks";
import { useNotification } from "@/core/notification/hooks";
import { useThreadSettings } from "@/core/settings";
import { useLocalSettings, useThreadSettings } from "@/core/settings";
import { useThreadStream } from "@/core/threads/hooks";
import { textOfMessage } from "@/core/threads/utils";
import { env } from "@/env";
@@ -45,6 +45,7 @@ export default function AgentChatPage() {
const { threadId, setThreadId, isNewThread, setIsNewThread } =
useThreadChat();
const [settings, setSettings] = useThreadSettings(threadId);
const [localSettings, setLocalSettings] = useLocalSettings();
const { tokenUsageEnabled } = useModels();
const { showNotification } = useNotification();
@@ -100,6 +101,9 @@ export default function AgentChatPage() {
? MESSAGE_LIST_DEFAULT_PADDING_BOTTOM +
MESSAGE_LIST_FOLLOWUPS_EXTRA_PADDING_BOTTOM
: undefined;
const tokenUsageInlineMode = tokenUsageEnabled
? localSettings.tokenUsage.inlineMode
: "off";
return (
<ThreadContext.Provider value={{ thread }}>
@@ -139,6 +143,10 @@ export default function AgentChatPage() {
<TokenUsageIndicator
enabled={tokenUsageEnabled}
messages={thread.messages}
preferences={localSettings.tokenUsage}
onPreferencesChange={(preferences) =>
setLocalSettings("tokenUsage", preferences)
}
/>
<ExportTrigger threadId={threadId} />
<ArtifactTrigger />
@@ -152,10 +160,10 @@ export default function AgentChatPage() {
threadId={threadId}
thread={thread}
paddingBottom={messageListPaddingBottom}
tokenUsageEnabled={tokenUsageEnabled}
hasMoreHistory={hasMoreHistory}
loadMoreHistory={loadMoreHistory}
isHistoryLoading={isHistoryLoading}
tokenUsageInlineMode={tokenUsageInlineMode}
/>
</div>
@@ -33,6 +33,7 @@ import { ThreadContext } from "@/components/workspace/messages/context";
import type { Agent } from "@/core/agents";
import {
AgentNameCheckError,
AgentsApiDisabledError,
checkAgentName,
createAgent,
getAgent,
@@ -154,7 +155,9 @@ export default function NewAgentPage() {
return;
}
} catch (err) {
if (
if (err instanceof AgentsApiDisabledError) {
setNameError(t.agents.nameStepApiDisabledError);
} else if (
err instanceof AgentNameCheckError &&
err.reason === "backend_unreachable"
) {
@@ -175,6 +178,10 @@ export default function NewAgentPage() {
soul: "",
});
} catch (err) {
if (err instanceof AgentsApiDisabledError) {
setNameError(t.agents.nameStepApiDisabledError);
return;
}
setNameError(
getCreateAgentErrorMessage(
err,
@@ -197,6 +204,7 @@ export default function NewAgentPage() {
nameInput,
sendMessage,
t.agents.nameStepAlreadyExistsError,
t.agents.nameStepApiDisabledError,
t.agents.nameStepNetworkError,
t.agents.nameStepBootstrapMessage,
t.agents.nameStepCheckError,
@@ -24,7 +24,7 @@ import { Welcome } from "@/components/workspace/welcome";
import { useI18n } from "@/core/i18n/hooks";
import { useModels } from "@/core/models/hooks";
import { useNotification } from "@/core/notification/hooks";
import { useThreadSettings } from "@/core/settings";
import { useLocalSettings, useThreadSettings } from "@/core/settings";
import { useThreadStream } from "@/core/threads/hooks";
import { textOfMessage } from "@/core/threads/utils";
import { env } from "@/env";
@@ -36,6 +36,7 @@ export default function ChatPage() {
const { threadId, setThreadId, isNewThread, setIsNewThread, isMock } =
useThreadChat();
const [settings, setSettings] = useThreadSettings(threadId);
const [localSettings, setLocalSettings] = useLocalSettings();
const { tokenUsageEnabled } = useModels();
const mountedRef = useRef(false);
useSpecificChatMode();
@@ -99,6 +100,9 @@ export default function ChatPage() {
? MESSAGE_LIST_DEFAULT_PADDING_BOTTOM +
MESSAGE_LIST_FOLLOWUPS_EXTRA_PADDING_BOTTOM
: undefined;
const tokenUsageInlineMode = tokenUsageEnabled
? localSettings.tokenUsage.inlineMode
: "off";
return (
<ThreadContext.Provider value={{ thread, isMock }}>
@@ -119,6 +123,10 @@ export default function ChatPage() {
<TokenUsageIndicator
enabled={tokenUsageEnabled}
messages={thread.messages}
preferences={localSettings.tokenUsage}
onPreferencesChange={(preferences) =>
setLocalSettings("tokenUsage", preferences)
}
/>
<ExportTrigger threadId={threadId} />
<ArtifactTrigger />
@@ -131,10 +139,10 @@ export default function ChatPage() {
threadId={threadId}
thread={thread}
paddingBottom={messageListPaddingBottom}
tokenUsageEnabled={tokenUsageEnabled}
hasMoreHistory={hasMoreHistory}
loadMoreHistory={loadMoreHistory}
isHistoryLoading={isHistoryLoading}
tokenUsageInlineMode={tokenUsageInlineMode}
/>
</div>
<div className="absolute right-0 bottom-0 left-0 z-30 flex justify-center px-4">
@@ -2,6 +2,7 @@ import type { Message } from "@langchain/langgraph-sdk";
import {
BookOpenTextIcon,
ChevronUp,
CoinsIcon,
FolderOpenIcon,
GlobeIcon,
LightbulbIcon,
@@ -24,6 +25,8 @@ import {
import { CodeBlock } from "@/components/ai-elements/code-block";
import { Button } from "@/components/ui/button";
import { useI18n } from "@/core/i18n/hooks";
import { formatTokenCount } from "@/core/messages/usage";
import type { TokenDebugStep } from "@/core/messages/usage-model";
import {
extractReasoningContentFromMessage,
findToolCallResult,
@@ -43,10 +46,14 @@ export function MessageGroup({
className,
messages,
isLoading = false,
tokenDebugSteps = [],
showTokenDebugSummaries = false,
}: {
className?: string;
messages: Message[];
isLoading?: boolean;
tokenDebugSteps?: TokenDebugStep[];
showTokenDebugSummaries?: boolean;
}) {
const { t } = useI18n();
const [showAbove, setShowAbove] = useState(
@@ -56,6 +63,28 @@ export function MessageGroup({
env.NEXT_PUBLIC_STATIC_WEBSITE_ONLY === "true",
);
const steps = useMemo(() => convertToSteps(messages), [messages]);
const debugStepByMessageId = useMemo(
() =>
new Map(
tokenDebugSteps.map(
(step) => [step.messageId || step.id, step] as const,
),
),
[tokenDebugSteps],
);
const toolCallCountByMessageId = useMemo(() => {
const counts = new Map<string, number>();
for (const step of steps) {
if (step.type !== "toolCall" || !step.messageId) {
continue;
}
counts.set(step.messageId, (counts.get(step.messageId) ?? 0) + 1);
}
return counts;
}, [steps]);
const lastToolCallStep = useMemo(() => {
const filteredSteps = steps.filter((step) => step.type === "toolCall");
return filteredSteps[filteredSteps.length - 1];
@@ -77,6 +106,125 @@ export function MessageGroup({
}
}, [lastToolCallStep, steps]);
const rehypePlugins = useRehypeSplitWordsIntoSpans(isLoading);
const firstEligibleDebugSummaryStepIndexByMessageId = useMemo(() => {
const firstIndices = new Map<string, number>();
if (!showTokenDebugSummaries) {
return firstIndices;
}
for (const [index, step] of steps.entries()) {
const messageId = step.messageId;
if (!messageId || firstIndices.has(messageId)) {
continue;
}
const debugStep = debugStepByMessageId.get(messageId);
if (!debugStep) {
continue;
}
const toolCallCount = toolCallCountByMessageId.get(messageId) ?? 0;
if (!debugStep.sharedAttribution && toolCallCount > 0) {
continue;
}
if (
!debugStep.sharedAttribution &&
toolCallCount === 0 &&
debugStep.label === t.common.thinking &&
debugStep.secondaryLabels.length === 0
) {
continue;
}
firstIndices.set(messageId, index);
}
return firstIndices;
}, [
debugStepByMessageId,
showTokenDebugSummaries,
steps,
t.common.thinking,
toolCallCountByMessageId,
]);
const renderDebugSummary = (
messageId: string | undefined,
stepIndex: number,
) => {
if (!showTokenDebugSummaries || !messageId) {
return null;
}
const debugStep = debugStepByMessageId.get(messageId);
if (!debugStep) {
return null;
}
if (
firstEligibleDebugSummaryStepIndexByMessageId.get(messageId) !== stepIndex
) {
return null;
}
return (
<ChainOfThoughtStep
key={`token-debug-${messageId}`}
icon={CoinsIcon}
label={
<DebugStepLabel
label={debugStep.label}
token={formatDebugToken(debugStep, t)}
/>
}
description={
debugStep.sharedAttribution
? t.tokenUsage.sharedAttribution
: undefined
}
>
{debugStep.secondaryLabels.length > 0 && (
<ChainOfThoughtSearchResults>
{debugStep.secondaryLabels.map((label, index) => (
<ChainOfThoughtSearchResult
key={`${debugStep.id}-${index}-${label}`}
>
{label}
</ChainOfThoughtSearchResult>
))}
</ChainOfThoughtSearchResults>
)}
</ChainOfThoughtStep>
);
};
const renderToolCall = (
step: CoTToolCallStep,
options?: { isLast?: boolean },
) => {
const debugStep =
showTokenDebugSummaries && step.messageId
? debugStepByMessageId.get(step.messageId)
: undefined;
return (
<ToolCall
key={step.id}
{...step}
isLast={options?.isLast}
isLoading={isLoading}
tokenDebugStep={
debugStep && !debugStep.sharedAttribution ? debugStep : undefined
}
/>
);
};
const lastReasoningDebugStep =
showTokenDebugSummaries && lastReasoningStep?.messageId
? debugStepByMessageId.get(lastReasoningStep.messageId)
: undefined;
return (
<ChainOfThought
className={cn("w-full gap-2 rounded-lg border p-0.5", className)}
@@ -111,36 +259,46 @@ export function MessageGroup({
{lastToolCallStep && (
<ChainOfThoughtContent className="px-4 pb-2">
{showAbove &&
aboveLastToolCallSteps.map((step) =>
step.type === "reasoning" ? (
<ChainOfThoughtStep
key={step.id}
label={
<MarkdownContent
content={step.reasoning ?? ""}
isLoading={isLoading}
rehypePlugins={rehypePlugins}
/>
}
></ChainOfThoughtStep>
) : (
<ToolCall key={step.id} {...step} isLoading={isLoading} />
),
)}
aboveLastToolCallSteps.flatMap((step) => {
const stepIndex = steps.indexOf(step);
if (step.type === "reasoning") {
return [
renderDebugSummary(step.messageId, stepIndex),
<ChainOfThoughtStep
key={step.id}
label={
<MarkdownContent
content={step.reasoning ?? ""}
isLoading={isLoading}
rehypePlugins={rehypePlugins}
/>
}
></ChainOfThoughtStep>,
];
}
return [
renderDebugSummary(step.messageId, stepIndex),
renderToolCall(step),
];
})}
{renderDebugSummary(
lastToolCallStep.messageId,
steps.indexOf(lastToolCallStep),
)}
{lastToolCallStep && (
<FlipDisplay uniqueKey={lastToolCallStep.id ?? ""}>
<ToolCall
key={lastToolCallStep.id}
{...lastToolCallStep}
isLast={true}
isLoading={isLoading}
/>
{renderToolCall(lastToolCallStep, { isLast: true })}
</FlipDisplay>
)}
</ChainOfThoughtContent>
)}
{lastReasoningStep && (
<>
{renderDebugSummary(
lastReasoningStep.messageId,
steps.indexOf(lastReasoningStep),
)}
<Button
key={lastReasoningStep.id}
className="w-full items-start justify-start text-left"
@@ -150,7 +308,22 @@ export function MessageGroup({
<div className="flex w-full items-center justify-between">
<ChainOfThoughtStep
className="font-normal"
label={t.common.thinking}
label={
<DebugStepLabel
label={t.common.thinking}
token={shouldInlineThinkingToken({
debugStep: lastReasoningDebugStep,
toolCallCount: lastReasoningStep.messageId
? (toolCallCountByMessageId.get(
lastReasoningStep.messageId,
) ?? 0)
: 0,
enabled: showTokenDebugSummaries,
thinkingLabel: t.common.thinking,
t,
})}
/>
}
icon={LightbulbIcon}
></ChainOfThoughtStep>
<div>
@@ -183,6 +356,60 @@ export function MessageGroup({
);
}
function formatDebugToken(
debugStep: TokenDebugStep,
t: ReturnType<typeof useI18n>["t"],
) {
return debugStep.usage
? `${formatTokenCount(debugStep.usage.totalTokens)} ${t.tokenUsage.label}`
: t.tokenUsage.unavailableShort;
}
function shouldInlineThinkingToken({
debugStep,
toolCallCount,
enabled,
thinkingLabel,
t,
}: {
debugStep?: TokenDebugStep;
toolCallCount: number;
enabled: boolean;
thinkingLabel: string;
t: ReturnType<typeof useI18n>["t"];
}) {
if (
!enabled ||
!debugStep ||
debugStep.sharedAttribution ||
toolCallCount > 0 ||
debugStep.label !== thinkingLabel
) {
return null;
}
return formatDebugToken(debugStep, t);
}
function DebugStepLabel({
label,
token,
}: {
label: React.ReactNode;
token?: string | null;
}) {
return (
<div className="flex items-center justify-between gap-3">
<div className="min-w-0 flex-1">{label}</div>
{token ? (
<div className="text-muted-foreground shrink-0 font-mono text-[11px]">
{token}
</div>
) : null}
</div>
);
}
function ToolCall({
id,
messageId,
@@ -191,6 +418,7 @@ function ToolCall({
result,
isLast = false,
isLoading = false,
tokenDebugStep,
}: {
id?: string;
messageId?: string;
@@ -199,10 +427,20 @@ function ToolCall({
result?: string | Record<string, unknown>;
isLast?: boolean;
isLoading?: boolean;
tokenDebugStep?: TokenDebugStep;
}) {
const { t } = useI18n();
const { setOpen, autoOpen, autoSelect, selectedArtifact, select } =
useArtifacts();
const tokenLabel = tokenDebugStep
? formatDebugToken(tokenDebugStep, t)
: null;
const resolveLabel = (fallback: React.ReactNode) =>
tokenDebugStep ? (
<DebugStepLabel label={tokenDebugStep.label} token={tokenLabel} />
) : (
fallback
);
if (name === "web_search") {
let label: React.ReactNode = t.toolCalls.searchForRelatedInfo;
@@ -210,7 +448,11 @@ function ToolCall({
label = t.toolCalls.searchOnWebFor(args.query);
}
return (
<ChainOfThoughtStep key={id} label={label} icon={SearchIcon}>
<ChainOfThoughtStep
key={id}
label={resolveLabel(label)}
icon={SearchIcon}
>
{Array.isArray(result) && (
<ChainOfThoughtSearchResults>
{result.map((item) => (
@@ -240,7 +482,11 @@ function ToolCall({
}
)?.results;
return (
<ChainOfThoughtStep key={id} label={label} icon={SearchIcon}>
<ChainOfThoughtStep
key={id}
label={resolveLabel(label)}
icon={SearchIcon}
>
{Array.isArray(results) && (
<ChainOfThoughtSearchResults>
{Array.isArray(results) &&
@@ -280,7 +526,7 @@ function ToolCall({
return (
<ChainOfThoughtStep
key={id}
label={t.toolCalls.viewWebPage}
label={resolveLabel(t.toolCalls.viewWebPage)}
icon={GlobeIcon}
>
<ChainOfThoughtSearchResult>
@@ -305,7 +551,11 @@ function ToolCall({
}
const path: string | undefined = (args as { path: string })?.path;
return (
<ChainOfThoughtStep key={id} label={description} icon={FolderOpenIcon}>
<ChainOfThoughtStep
key={id}
label={resolveLabel(description)}
icon={FolderOpenIcon}
>
{path && (
<ChainOfThoughtSearchResult className="cursor-pointer">
{path}
@@ -321,7 +571,11 @@ function ToolCall({
}
const { path } = args as { path: string; content: string };
return (
<ChainOfThoughtStep key={id} label={description} icon={BookOpenTextIcon}>
<ChainOfThoughtStep
key={id}
label={resolveLabel(description)}
icon={BookOpenTextIcon}
>
{path && (
<ChainOfThoughtSearchResult className="cursor-pointer">
{path}
@@ -353,7 +607,7 @@ function ToolCall({
<ChainOfThoughtStep
key={id}
className="cursor-pointer"
label={description}
label={resolveLabel(description)}
icon={NotebookPenIcon}
onClick={() => {
select(
@@ -375,13 +629,19 @@ function ToolCall({
const description: string | undefined = (args as { description: string })
?.description;
if (!description) {
return t.toolCalls.executeCommand;
return (
<ChainOfThoughtStep
key={id}
label={resolveLabel(t.toolCalls.executeCommand)}
icon={SquareTerminalIcon}
/>
);
}
const command: string | undefined = (args as { command: string })?.command;
return (
<ChainOfThoughtStep
key={id}
label={description}
label={resolveLabel(description)}
icon={SquareTerminalIcon}
>
{command && (
@@ -398,7 +658,7 @@ function ToolCall({
return (
<ChainOfThoughtStep
key={id}
label={t.toolCalls.needYourHelp}
label={resolveLabel(t.toolCalls.needYourHelp)}
icon={MessageCircleQuestionMarkIcon}
></ChainOfThoughtStep>
);
@@ -406,7 +666,7 @@ function ToolCall({
return (
<ChainOfThoughtStep
key={id}
label={t.toolCalls.writeTodos}
label={resolveLabel(t.toolCalls.writeTodos)}
icon={ListTodoIcon}
></ChainOfThoughtStep>
);
@@ -416,7 +676,7 @@ function ToolCall({
return (
<ChainOfThoughtStep
key={id}
label={description ?? t.toolCalls.useTool(name)}
label={resolveLabel(description ?? t.toolCalls.useTool(name))}
icon={WrenchIcon}
></ChainOfThoughtStep>
);
@@ -50,7 +50,6 @@ import { cn } from "@/lib/utils";
import { CopyButton } from "../copy-button";
import { MarkdownContent } from "./markdown-content";
import { MessageTokenUsage } from "./message-token-usage";
function FeedbackButtons({
threadId,
@@ -121,20 +120,20 @@ function FeedbackButtons({
export function MessageListItem({
className,
threadId,
message,
isLoading,
tokenUsageEnabled = false,
feedback,
runId,
threadId,
showCopyButton = true,
}: {
className?: string;
message: Message;
isLoading?: boolean;
threadId: string;
tokenUsageEnabled?: boolean;
feedback?: FeedbackData | null;
runId?: string;
showCopyButton?: boolean;
}) {
const isHuman = message.type === "human";
return (
@@ -147,16 +146,17 @@ export function MessageListItem({
message={message}
isLoading={isLoading}
threadId={threadId}
tokenUsageEnabled={tokenUsageEnabled}
/>
{!isLoading && (
{!isLoading && showCopyButton && (
<MessageToolbar
className={cn(
isHuman ? "-bottom-9 justify-end" : "-bottom-8",
"absolute right-0 left-0 z-20",
isHuman
? "absolute right-0 -bottom-9 left-0 justify-end"
: "absolute right-0 bottom-0 left-0",
"z-20 opacity-0 transition-opacity delay-200 duration-300 group-hover/conversation-message:opacity-100",
)}
>
<div className="flex gap-1">
<div className="pointer-events-auto flex gap-1">
<CopyButton
clipboardData={
extractContentFromMessage(message) ??
@@ -213,13 +213,11 @@ function MessageContent_({
message,
isLoading = false,
threadId,
tokenUsageEnabled = false,
}: {
className?: string;
message: Message;
isLoading?: boolean;
threadId: string;
tokenUsageEnabled?: boolean;
}) {
const rehypePlugins = useRehypeSplitWordsIntoSpans(isLoading);
const isHuman = message.type === "human";
@@ -297,11 +295,6 @@ function MessageContent_({
<ReasoningTrigger />
<ReasoningContent>{reasoningContent}</ReasoningContent>
</Reasoning>
<MessageTokenUsage
enabled={tokenUsageEnabled}
isLoading={isLoading}
message={message}
/>
</AIElementMessageContent>
);
}
@@ -339,11 +332,6 @@ function MessageContent_({
className="my-3"
components={components}
/>
<MessageTokenUsage
enabled={tokenUsageEnabled}
isLoading={isLoading}
message={message}
/>
</AIElementMessageContent>
);
}
@@ -1,6 +1,7 @@
import type { Message } from "@langchain/langgraph-sdk";
import type { BaseStream } from "@langchain/langgraph-sdk/react";
import { ChevronUpIcon, Loader2Icon } from "lucide-react";
import { useCallback, useEffect, useRef } from "react";
import { useCallback, useEffect, useMemo, useRef } from "react";
import {
Conversation,
@@ -8,15 +9,20 @@ import {
} from "@/components/ai-elements/conversation";
import { Button } from "@/components/ui/button";
import { useI18n } from "@/core/i18n/hooks";
import {
buildTokenDebugSteps,
type TokenUsageInlineMode,
} from "@/core/messages/usage-model";
import {
extractContentFromMessage,
extractPresentFilesFromMessage,
extractReasoningContentFromMessage,
extractTextFromMessage,
groupMessages,
getAssistantTurnUsageMessages,
getMessageGroups,
hasContent,
hasPresentFiles,
hasReasoning,
hasToolCalls,
} from "@/core/messages/utils";
import { useRehypeSplitWordsIntoSpans } from "@/core/rehype";
import type { Subtask } from "@/core/tasks";
@@ -25,12 +31,16 @@ import type { AgentThreadState } from "@/core/threads";
import { cn } from "@/lib/utils";
import { ArtifactFileList } from "../artifacts/artifact-file-list";
import { CopyButton } from "../copy-button";
import { StreamingIndicator } from "../streaming-indicator";
import { MarkdownContent } from "./markdown-content";
import { MessageGroup } from "./message-group";
import { MessageListItem } from "./message-list-item";
import { MessageTokenUsageList } from "./message-token-usage";
import {
MessageTokenUsageDebugList,
MessageTokenUsageList,
} from "./message-token-usage";
import { MessageListSkeleton } from "./skeleton";
import { SubtaskCard } from "./subtask-card";
@@ -149,7 +159,7 @@ export function MessageList({
threadId,
thread,
paddingBottom = MESSAGE_LIST_DEFAULT_PADDING_BOTTOM,
tokenUsageEnabled = false,
tokenUsageInlineMode = "off",
hasMoreHistory,
loadMoreHistory,
isHistoryLoading,
@@ -158,7 +168,7 @@ export function MessageList({
threadId: string;
thread: BaseStream<AgentThreadState>;
paddingBottom?: number;
tokenUsageEnabled?: boolean;
tokenUsageInlineMode?: TokenUsageInlineMode;
hasMoreHistory?: boolean;
loadMoreHistory?: () => void;
isHistoryLoading?: boolean;
@@ -167,10 +177,85 @@ export function MessageList({
const rehypePlugins = useRehypeSplitWordsIntoSpans(thread.isLoading);
const updateSubtask = useUpdateSubtask();
const messages = thread.messages;
const groupedMessages = getMessageGroups(messages);
const turnUsageMessagesByGroupIndex =
getAssistantTurnUsageMessages(groupedMessages);
const tokenDebugSteps = useMemo(
() => buildTokenDebugSteps(messages, t),
[messages, t],
);
const renderAssistantCopyButton = useCallback((messages: Message[]) => {
const clipboardData = [...messages]
.reverse()
.filter((message) => message.type === "ai")
.map((message) => {
const content = extractContentFromMessage(message);
return content ?? extractReasoningContentFromMessage(message) ?? "";
})
.find((content) => content.length > 0);
if (!clipboardData) {
return null;
}
return (
<div className="mt-2 flex justify-start opacity-0 transition-opacity delay-200 duration-300 group-hover/assistant-turn:opacity-100">
<CopyButton clipboardData={clipboardData} />
</div>
);
}, []);
const renderTokenUsage = useCallback(
({
messages,
turnUsageMessages,
inlineDebug = true,
debugMessageIds,
}: {
messages: Message[];
turnUsageMessages?: Message[] | null;
inlineDebug?: boolean;
debugMessageIds?: string[];
}) => {
if (tokenUsageInlineMode === "per_turn") {
return (
<MessageTokenUsageList
enabled={true}
isLoading={thread.isLoading}
messages={turnUsageMessages ?? []}
/>
);
}
if (tokenUsageInlineMode === "step_debug" && inlineDebug) {
const messageIds = new Set(
debugMessageIds ??
messages
.filter((message) => message.type === "ai")
.map((message) => message.id)
.filter((id): id is string => typeof id === "string"),
);
return (
<MessageTokenUsageDebugList
enabled={true}
isLoading={thread.isLoading}
steps={tokenDebugSteps.filter((step) =>
messageIds.has(step.messageId),
)}
/>
);
}
return null;
},
[thread.isLoading, tokenDebugSteps, tokenUsageInlineMode],
);
if (thread.isThreadLoading && messages.length === 0) {
return <MessageListSkeleton />;
}
return (
<Conversation
className={cn("flex size-full flex-col justify-center", className)}
@@ -181,19 +266,37 @@ export function MessageList({
hasMore={hasMoreHistory}
loadMore={loadMoreHistory}
/>
{groupMessages(messages, (group) => {
{groupedMessages.map((group, groupIndex) => {
const turnUsageMessages = turnUsageMessagesByGroupIndex[groupIndex];
if (group.type === "human" || group.type === "assistant") {
return group.messages.map((msg) => {
return (
<MessageListItem
key={`${group.id}/${msg.id}`}
threadId={threadId}
message={msg}
isLoading={thread.isLoading}
tokenUsageEnabled={tokenUsageEnabled}
/>
);
});
return (
<div
key={group.id}
className={cn(
"w-full",
group.type === "assistant" && "group/assistant-turn",
)}
>
{group.messages.map((msg) => {
return (
<MessageListItem
key={`${group.id}/${msg.id}`}
message={msg}
isLoading={thread.isLoading}
threadId={threadId}
showCopyButton={group.type !== "assistant"}
/>
);
})}
{renderTokenUsage({
messages: group.messages,
turnUsageMessages,
})}
{group.type === "assistant" &&
renderAssistantCopyButton(group.messages)}
</div>
);
} else if (group.type === "assistant:clarification") {
const message = group.messages[0];
if (message && hasContent(message)) {
@@ -204,11 +307,10 @@ export function MessageList({
isLoading={thread.isLoading}
rehypePlugins={rehypePlugins}
/>
<MessageTokenUsageList
enabled={tokenUsageEnabled}
isLoading={thread.isLoading}
messages={group.messages}
/>
{renderTokenUsage({
messages: group.messages,
turnUsageMessages,
})}
</div>
);
}
@@ -232,11 +334,10 @@ export function MessageList({
/>
)}
<ArtifactFileList files={files} threadId={threadId} />
<MessageTokenUsageList
enabled={tokenUsageEnabled}
isLoading={thread.isLoading}
messages={group.messages}
/>
{renderTokenUsage({
messages: group.messages,
turnUsageMessages,
})}
</div>
);
} else if (group.type === "assistant:subagent") {
@@ -289,7 +390,19 @@ export function MessageList({
}
}
}
const results: React.ReactNode[] = [];
const subagentDebugMessageIds: string[] = [];
if (tasks.size > 0) {
results.push(
<div
key="subtask-count"
className="text-muted-foreground pt-2 text-sm font-normal"
>
{t.subtasks.executing(tasks.size)}
</div>,
);
}
for (const message of group.messages.filter(
(message) => message.type === "ai",
)) {
@@ -299,17 +412,17 @@ export function MessageList({
key={"thinking-group-" + message.id}
messages={[message]}
isLoading={thread.isLoading}
tokenDebugSteps={tokenDebugSteps.filter(
(step) => step.messageId === message.id,
)}
showTokenDebugSummaries={
tokenUsageInlineMode === "step_debug"
}
/>,
);
} else if (message.id) {
subagentDebugMessageIds.push(message.id);
}
results.push(
<div
key="subtask-count"
className="text-muted-foreground font-norma pt-2 text-sm"
>
{t.subtasks.executing(tasks.size)}
</div>,
);
const taskIds = message.tool_calls
?.filter((toolCall) => toolCall.name === "task")
.map((toolCall) => toolCall.id);
@@ -329,30 +442,31 @@ export function MessageList({
className="relative z-1 flex flex-col gap-2"
>
{results}
<MessageTokenUsageList
enabled={tokenUsageEnabled}
isLoading={thread.isLoading}
messages={group.messages}
/>
{renderTokenUsage({
messages: group.messages,
turnUsageMessages,
debugMessageIds: subagentDebugMessageIds,
})}
</div>
);
}
const tokenUsageMessages = group.messages.filter(
(message) =>
message.type === "ai" &&
(hasToolCalls(message) ? true : !hasContent(message)),
);
return (
<div key={"group-" + group.id} className="w-full">
<MessageGroup
messages={group.messages}
isLoading={thread.isLoading}
tokenDebugSteps={tokenDebugSteps.filter((step) =>
group.messages.some(
(message) => message.id === step.messageId,
),
)}
showTokenDebugSummaries={tokenUsageInlineMode === "step_debug"}
/>
<MessageTokenUsageList
enabled={tokenUsageEnabled}
isLoading={thread.isLoading}
messages={tokenUsageMessages}
/>
{renderTokenUsage({
messages: group.messages,
turnUsageMessages,
inlineDebug: false,
})}
</div>
);
})}
@@ -1,29 +1,27 @@
import type { Message } from "@langchain/langgraph-sdk";
import { CoinsIcon } from "lucide-react";
import { Badge } from "@/components/ui/badge";
import { useI18n } from "@/core/i18n/hooks";
import { formatTokenCount, getUsageMetadata } from "@/core/messages/usage";
import { accumulateUsage, formatTokenCount } from "@/core/messages/usage";
import type { TokenDebugStep } from "@/core/messages/usage-model";
import { cn } from "@/lib/utils";
export function MessageTokenUsage({
function TokenUsageSummary({
className,
enabled = false,
isLoading = false,
message,
inputTokens,
outputTokens,
totalTokens,
unavailable = false,
}: {
className?: string;
enabled?: boolean;
isLoading?: boolean;
message: Message;
inputTokens?: number;
outputTokens?: number;
totalTokens?: number;
unavailable?: boolean;
}) {
const { t } = useI18n();
if (!enabled || isLoading || message.type !== "ai") {
return null;
}
const usage = getUsageMetadata(message);
return (
<div
className={cn(
@@ -35,16 +33,16 @@ export function MessageTokenUsage({
<CoinsIcon className="size-3" />
{t.tokenUsage.label}
</span>
{usage ? (
{!unavailable ? (
<>
<span>
{t.tokenUsage.input}: {formatTokenCount(usage.inputTokens)}
{t.tokenUsage.input}: {formatTokenCount(inputTokens ?? 0)}
</span>
<span>
{t.tokenUsage.output}: {formatTokenCount(usage.outputTokens)}
{t.tokenUsage.output}: {formatTokenCount(outputTokens ?? 0)}
</span>
<span className="font-medium">
{t.tokenUsage.total}: {formatTokenCount(usage.totalTokens)}
{t.tokenUsage.total}: {formatTokenCount(totalTokens ?? 0)}
</span>
</>
) : (
@@ -75,17 +73,93 @@ export function MessageTokenUsageList({
return null;
}
const usage = accumulateUsage(aiMessages);
return (
<>
{aiMessages.map((message, index) => (
<MessageTokenUsage
className={className}
enabled={enabled}
isLoading={isLoading}
key={message.id ?? index}
message={message}
/>
))}
</>
<TokenUsageSummary
className={className}
inputTokens={usage?.inputTokens}
outputTokens={usage?.outputTokens}
totalTokens={usage?.totalTokens}
unavailable={!usage}
/>
);
}
export function MessageTokenUsageDebugList({
className,
enabled = false,
isLoading = false,
steps,
}: {
className?: string;
enabled?: boolean;
isLoading?: boolean;
steps: TokenDebugStep[];
}) {
const { t } = useI18n();
if (!enabled || isLoading) {
return null;
}
if (steps.length === 0) {
return null;
}
return (
<div className={cn("border-border/60 mt-1 border-t pt-2", className)}>
<div className="space-y-2">
{steps.map((step) => (
<div
key={step.id}
className="bg-muted/30 border-border/50 flex items-start justify-between gap-3 rounded-md border px-3 py-2"
>
<div className="min-w-0 flex-1 space-y-1">
<div className="text-foreground flex items-center gap-2 text-xs font-medium">
<CoinsIcon className="text-muted-foreground size-3" />
<span className="truncate">{step.label}</span>
</div>
{step.secondaryLabels.length > 0 && (
<div className="flex flex-wrap gap-1.5">
{step.secondaryLabels.map((label, index) => (
<Badge
key={`${step.id}-${index}-${label}`}
className="px-1.5 py-0 text-[10px] font-normal"
variant="secondary"
>
{label}
</Badge>
))}
</div>
)}
{step.sharedAttribution && (
<div className="text-muted-foreground text-[11px]">
{t.tokenUsage.sharedAttribution}
</div>
)}
<div className="text-muted-foreground text-[11px]">
{step.usage ? (
<>
{t.tokenUsage.input}:{" "}
{formatTokenCount(step.usage.inputTokens)}
{" · "}
{t.tokenUsage.output}:{" "}
{formatTokenCount(step.usage.outputTokens)}
</>
) : (
t.tokenUsage.unavailableShort
)}
</div>
</div>
<Badge className="shrink-0 font-mono" variant="outline">
{step.usage
? `${formatTokenCount(step.usage.totalTokens)} ${t.tokenUsage.label}`
: t.tokenUsage.unavailableShort}
</Badge>
</div>
))}
</div>
</div>
);
}
@@ -8,11 +8,13 @@ import { Input } from "@/components/ui/input";
import { fetch, getCsrfHeaders } from "@/core/api/fetcher";
import { useAuth } from "@/core/auth/AuthProvider";
import { parseAuthError } from "@/core/auth/types";
import { useI18n } from "@/core/i18n/hooks";
import { SettingsSection } from "./settings-section";
export function AccountSettingsPage() {
const { user, logout } = useAuth();
const { t } = useI18n();
const [currentPassword, setCurrentPassword] = useState("");
const [newPassword, setNewPassword] = useState("");
const [confirmPassword, setConfirmPassword] = useState("");
@@ -26,11 +28,11 @@ export function AccountSettingsPage() {
setMessage("");
if (newPassword !== confirmPassword) {
setError("New passwords do not match");
setError(t.settings.account.passwordMismatch);
return;
}
if (newPassword.length < 8) {
setError("Password must be at least 8 characters");
setError(t.settings.account.passwordTooShort);
return;
}
@@ -55,12 +57,12 @@ export function AccountSettingsPage() {
return;
}
setMessage("Password changed successfully");
setMessage(t.settings.account.passwordChangedSuccess);
setCurrentPassword("");
setNewPassword("");
setConfirmPassword("");
} catch {
setError("Network error. Please try again.");
setError(t.settings.account.networkError);
} finally {
setLoading(false);
}
@@ -68,12 +70,16 @@ export function AccountSettingsPage() {
return (
<div className="space-y-8">
<SettingsSection title="Profile">
<SettingsSection title={t.settings.account.profileTitle}>
<div className="space-y-2">
<div className="grid grid-cols-[max-content_max-content] items-center gap-4">
<span className="text-muted-foreground text-sm">Email</span>
<span className="text-muted-foreground text-sm">
{t.settings.account.email}
</span>
<span className="text-sm font-medium">{user?.email ?? "—"}</span>
<span className="text-muted-foreground text-sm">Role</span>
<span className="text-muted-foreground text-sm">
{t.settings.account.role}
</span>
<span className="text-sm font-medium capitalize">
{user?.system_role ?? "—"}
</span>
@@ -82,20 +88,20 @@ export function AccountSettingsPage() {
</SettingsSection>
<SettingsSection
title="Change Password"
description="Update your account password."
title={t.settings.account.changePasswordTitle}
description={t.settings.account.changePasswordDescription}
>
<form onSubmit={handleChangePassword} className="max-w-sm space-y-3">
<Input
type="password"
placeholder="Current password"
placeholder={t.settings.account.currentPassword}
value={currentPassword}
onChange={(e) => setCurrentPassword(e.target.value)}
required
/>
<Input
type="password"
placeholder="New password"
placeholder={t.settings.account.newPassword}
value={newPassword}
onChange={(e) => setNewPassword(e.target.value)}
required
@@ -103,7 +109,7 @@ export function AccountSettingsPage() {
/>
<Input
type="password"
placeholder="Confirm new password"
placeholder={t.settings.account.confirmNewPassword}
value={confirmPassword}
onChange={(e) => setConfirmPassword(e.target.value)}
required
@@ -112,7 +118,9 @@ export function AccountSettingsPage() {
{error && <p className="text-sm text-red-500">{error}</p>}
{message && <p className="text-sm text-green-500">{message}</p>}
<Button type="submit" variant="outline" size="sm" disabled={loading}>
{loading ? "Updating..." : "Update Password"}
{loading
? t.settings.account.updating
: t.settings.account.updatePassword}
</Button>
</form>
</SettingsSection>
@@ -125,7 +133,7 @@ export function AccountSettingsPage() {
className="gap-2"
>
<LogOutIcon className="size-4" />
Sign Out
{t.settings.account.signOut}
</Button>
</SettingsSection>
</div>
@@ -1,60 +1,81 @@
"use client";
import type { Message } from "@langchain/langgraph-sdk";
import { CoinsIcon } from "lucide-react";
import { ChevronDownIcon, CoinsIcon } from "lucide-react";
import { useMemo } from "react";
import { Button } from "@/components/ui/button";
import {
Tooltip,
TooltipContent,
TooltipTrigger,
} from "@/components/ui/tooltip";
DropdownMenu,
DropdownMenuContent,
DropdownMenuLabel,
DropdownMenuRadioGroup,
DropdownMenuRadioItem,
DropdownMenuSeparator,
DropdownMenuTrigger,
} from "@/components/ui/dropdown-menu";
import { useI18n } from "@/core/i18n/hooks";
import { accumulateUsage, formatTokenCount } from "@/core/messages/usage";
import {
getTokenUsageViewPreset,
tokenUsagePreferencesFromPreset,
type TokenUsagePreferences,
type TokenUsageViewPreset,
} from "@/core/messages/usage-model";
import { cn } from "@/lib/utils";
interface TokenUsageIndicatorProps {
messages: Message[];
enabled?: boolean;
preferences: TokenUsagePreferences;
onPreferencesChange: (preferences: TokenUsagePreferences) => void;
className?: string;
}
export function TokenUsageIndicator({
messages,
enabled = false,
preferences,
onPreferencesChange,
className,
}: TokenUsageIndicatorProps) {
const { t } = useI18n();
const usage = useMemo(() => accumulateUsage(messages), [messages]);
const preset = getTokenUsageViewPreset(preferences);
if (!enabled) {
return null;
}
return (
<Tooltip delayDuration={200}>
<TooltipTrigger asChild>
<button
<DropdownMenu>
<DropdownMenuTrigger asChild>
<Button
type="button"
variant="ghost"
className={cn(
"text-muted-foreground bg-background/70 flex cursor-default items-center gap-1.5 rounded-full border px-2 py-1 text-xs",
!usage && "opacity-60",
"text-muted-foreground bg-background/70 hover:bg-background/90 flex h-auto items-center gap-1.5 rounded-full border px-2 py-1 text-xs font-normal",
className,
)}
>
<CoinsIcon size={14} />
<span>{t.tokenUsage.label}</span>
<span className="font-mono">
{usage ? formatTokenCount(usage.totalTokens) : "-"}
{preferences.headerTotal
? usage
? formatTokenCount(usage.totalTokens)
: "-"
: t.tokenUsage.presets[presetKeyToTranslationKey(preset)]}
</span>
<ChevronDownIcon className="size-3" />
</Button>
</DropdownMenuTrigger>
<DropdownMenuContent side="bottom" align="end" className="w-80">
<DropdownMenuLabel>{t.tokenUsage.title}</DropdownMenuLabel>
<div className="px-2 py-1 text-xs">
{usage ? (
<>
<div className="space-y-1">
<div className="flex justify-between gap-4">
<span>{t.tokenUsage.input}</span>
<span className="font-mono">
@@ -75,14 +96,53 @@ export function TokenUsageIndicator({
</span>
</div>
</div>
</>
</div>
) : (
<div className="text-muted-foreground">
{t.tokenUsage.unavailable}
</div>
)}
</div>
<DropdownMenuSeparator />
<DropdownMenuLabel>{t.tokenUsage.view}</DropdownMenuLabel>
<DropdownMenuRadioGroup
value={preset}
onValueChange={(value) =>
onPreferencesChange(
tokenUsagePreferencesFromPreset(value as TokenUsageViewPreset),
)
}
>
{(
["off", "summary", "per_turn", "debug"] as TokenUsageViewPreset[]
).map((value) => {
const translationKey = presetKeyToTranslationKey(value);
return (
<DropdownMenuRadioItem key={value} value={value}>
<div className="grid gap-0.5">
<span>{t.tokenUsage.presets[translationKey]}</span>
<span className="text-muted-foreground text-xs">
{t.tokenUsage.presetDescriptions[translationKey]}
</span>
</div>
</DropdownMenuRadioItem>
);
})}
</DropdownMenuRadioGroup>
<DropdownMenuSeparator />
<div className="text-muted-foreground px-2 py-2 text-xs leading-relaxed">
{t.tokenUsage.note}
</div>
</DropdownMenuContent>
</DropdownMenu>
);
}
function presetKeyToTranslationKey(preset: TokenUsageViewPreset) {
switch (preset) {
case "per_turn":
return "perTurn" as const;
default:
return preset;
}
}
@@ -64,6 +64,12 @@ Dynamically configures the current agent session. Used during the bootstrap flow
---
### update_agent
Persists updates to the current custom agent's `SOUL.md` and `config.yaml`. Bound to the lead agent only when a custom agent is active (`agent_name` is set in the runtime context). Use this when the user asks the agent to refine its own description, personality, skill whitelist, tool-group whitelist, or default model — it writes directly into the per-user layout `{base_dir}/users/{user_id}/agents/{agent_name}/`, so the change is picked up automatically on the next user turn. Only the fields you explicitly pass are updated; omit a field to preserve its existing value. Pass `skills=[]` to disable all skills, or omit `skills` to keep the existing whitelist.
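A minimal illustrative call, following the call style used elsewhere on this page (the parameter names here are assumptions inferred from the description above, not a confirmed signature):

```
update_agent(description="A research-focused assistant", skills=[])
```

This updates only `description` and clears the skill whitelist; every omitted field keeps its current value.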
---
### invoke_acp_agent
Invokes an external agent using the [Agent Connect Protocol (ACP)](https://agentconnectprotocol.org/). Requires `acp_agents:` configuration in `config.yaml`. See the [Subagents](/docs/harness/subagents) page for ACP configuration.
@@ -61,6 +61,12 @@ task(agent="general-purpose", task="...", context="...")
---
### update_agent
将更新持久化到当前自定义 Agent 的 `SOUL.md` 和 `config.yaml`。仅当激活了自定义 Agent(运行时上下文中存在 `agent_name`)时,才会绑定到 lead agent。当用户在 Agent 内开启 chat 并要求该 Agent 调整自身的描述、人格、技能白名单、工具组白名单或默认模型时使用——它会直接写入按用户隔离的 `{base_dir}/users/{user_id}/agents/{agent_name}/` 下的真实配置文件,下一轮对话即可生效。仅显式传入的字段会被更新;省略某个字段以保留其现有值。传入 `skills=[]` 可禁用全部技能,省略 `skills` 则保留现有白名单。
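An illustrative call, following the call style used elsewhere on this page (parameter names are assumptions inferred from the description above, not a confirmed signature):

```
update_agent(description="A research-focused assistant", skills=[])
```

This updates only `description` and clears the skill whitelist; every omitted field keeps its current value.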
---
### invoke_acp_agent
使用 [Agent Connect Protocol (ACP)](https://agentconnectprotocol.org/) 调用外部 Agent。需要在 `config.yaml` 中配置 `acp_agents:`。参见[子 Agent](/docs/harness/subagents)页面了解 ACP 配置。
@@ -15,6 +15,17 @@ export class AgentNameCheckError extends Error {
}
}
export class AgentsApiDisabledError extends Error {
constructor(message: string) {
super(message);
this.name = "AgentsApiDisabledError";
}
}
function isAgentsApiDisabledDetail(detail: string | undefined): boolean {
return typeof detail === "string" && detail.includes("agents_api.enabled");
}
export async function listAgents(): Promise<Agent[]> {
const res = await fetch(`${getBackendBaseURL()}/api/agents`);
if (!res.ok) throw new Error(`Failed to load agents: ${res.statusText}`);
@@ -36,6 +47,9 @@ export async function createAgent(request: CreateAgentRequest): Promise<Agent> {
});
if (!res.ok) {
const err = (await res.json().catch(() => ({}))) as { detail?: string };
if (isAgentsApiDisabledDetail(err.detail)) {
throw new AgentsApiDisabledError(err.detail!);
}
throw new Error(err.detail ?? `Failed to create agent: ${res.statusText}`);
}
return res.json() as Promise<Agent>;
@@ -81,6 +95,9 @@ export async function checkAgentName(
if (!res.ok) {
const err = (await res.json().catch(() => ({}))) as { detail?: string };
if (isAgentsApiDisabledDetail(err.detail)) {
throw new AgentsApiDisabledError(err.detail!);
}
if (BACKEND_UNAVAILABLE_STATUSES.has(res.status)) {
throw new AgentNameCheckError(
"Could not reach the DeerFlow backend.",
@@ -12,12 +12,11 @@ let _cached: GatewayConfig | null = null;
export function getGatewayConfig(): GatewayConfig {
if (_cached) return _cached;
const rawUrl = process.env.DEER_FLOW_INTERNAL_GATEWAY_BASE_URL?.trim();
const internalGatewayUrl =
rawUrl && rawUrl.length > 0
? rawUrl.replace(/\/+$/, "")
: "http://127.0.0.1:8001";
const rawOrigins = process.env.DEER_FLOW_TRUSTED_ORIGINS?.trim();
const trustedOrigins = rawOrigins
@@ -25,9 +24,7 @@ export function getGatewayConfig(): GatewayConfig {
.split(",")
.map((s) => s.trim())
.filter(Boolean)
: ["http://localhost:3000"];
_cached = gatewayConfigSchema.parse({ internalGatewayUrl, trustedOrigins });
return _cached;
@@ -204,6 +204,8 @@ export const enUS: Translations = {
nameStepNetworkError:
"Network request failed — check your network or backend connection",
nameStepCheckError: "Could not verify name availability — please try again",
nameStepApiDisabledError:
"Custom agent management is not enabled on this server. Please contact your administrator.",
nameStepBootstrapMessage:
"The new custom agent name is {name}. Let's bootstrap its **SOUL**.",
save: "Save agent",
@@ -304,9 +306,32 @@ export const enUS: Translations = {
input: "Input",
output: "Output",
total: "Total",
view: "Display",
unavailable:
"No token usage yet. Usage appears only after a successful model response when the provider returns usage_metadata.",
unavailableShort: "No usage returned",
note: "Shown from provider-returned usage_metadata. Totals are best-effort conversation totals and may differ from provider billing pages.",
presets: {
off: "Off",
summary: "Summary",
perTurn: "Per turn",
debug: "Debug",
},
presetDescriptions: {
off: "Hide token usage in the header and conversation.",
summary: "Show only the current conversation total in the header.",
perTurn:
"Show the header total and one token summary per assistant turn.",
debug: "Show the header total and step-level token debugging details.",
},
finalAnswer: "Final answer",
stepTotal: "Step total",
sharedAttribution: "Shared across multiple actions in this step",
subagent: (description: string) => `Subagent: ${description}`,
startTodo: (content: string) => `Start To-do: ${content}`,
completeTodo: (content: string) => `Complete To-do: ${content}`,
updateTodo: (content: string) => `Update To-do: ${content}`,
removeTodo: (content: string) => `Remove To-do: ${content}`,
},
// Shortcuts
@@ -453,6 +478,23 @@ export const enUS: Translations = {
notSupported: "Your browser does not support notifications.",
disableNotification: "Disable notification",
},
account: {
profileTitle: "Profile",
email: "Email",
role: "Role",
changePasswordTitle: "Change Password",
changePasswordDescription: "Update your account password.",
currentPassword: "Current password",
newPassword: "New password",
confirmNewPassword: "Confirm new password",
passwordMismatch: "New passwords do not match",
passwordTooShort: "Password must be at least 8 characters",
passwordChangedSuccess: "Password changed successfully",
networkError: "Network error. Please try again.",
updating: "Updating...",
updatePassword: "Update Password",
signOut: "Sign Out",
},
acknowledge: {
emptyTitle: "Acknowledgements",
emptyDescription: "Credits and acknowledgements will show here.",
@@ -141,6 +141,7 @@ export interface Translations {
nameStepAlreadyExistsError: string;
nameStepNetworkError: string;
nameStepCheckError: string;
nameStepApiDisabledError: string;
nameStepBootstrapMessage: string;
save: string;
saving: string;
@@ -235,8 +236,30 @@ export interface Translations {
input: string;
output: string;
total: string;
view: string;
unavailable: string;
unavailableShort: string;
note: string;
presets: {
off: string;
summary: string;
perTurn: string;
debug: string;
};
presetDescriptions: {
off: string;
summary: string;
perTurn: string;
debug: string;
};
finalAnswer: string;
stepTotal: string;
sharedAttribution: string;
subagent: (description: string) => string;
startTodo: (content: string) => string;
completeTodo: (content: string) => string;
updateTodo: (content: string) => string;
removeTodo: (content: string) => string;
};
// Shortcuts
@@ -371,6 +394,23 @@ export interface Translations {
notSupported: string;
disableNotification: string;
};
account: {
profileTitle: string;
email: string;
role: string;
changePasswordTitle: string;
changePasswordDescription: string;
currentPassword: string;
newPassword: string;
confirmNewPassword: string;
passwordMismatch: string;
passwordTooShort: string;
passwordChangedSuccess: string;
networkError: string;
updating: string;
updatePassword: string;
signOut: string;
};
acknowledge: {
emptyTitle: string;
emptyDescription: string;
@@ -192,6 +192,8 @@ export const zhCN: Translations = {
nameStepAlreadyExistsError: "已存在同名智能体",
nameStepNetworkError: "网络请求失败,请检查网络或后端连接",
nameStepCheckError: "无法验证名称可用性,请稍后重试",
nameStepApiDisabledError:
"服务器未开启自定义智能体管理功能,请联系管理员。",
nameStepBootstrapMessage:
"新智能体的名称是 {name},现在开始为它生成 **SOUL**。",
save: "保存智能体",
@@ -290,9 +292,31 @@ export const zhCN: Translations = {
input: "输入",
output: "输出",
total: "总计",
view: "显示方式",
unavailable:
"暂无 Token 用量。只有模型成功返回且供应商提供 usage_metadata 时才会显示。",
unavailableShort: "未返回用量",
note: "基于供应商返回的 usage_metadata 展示。当前总量是 best-effort 的会话参考值,可能与平台账单页不完全一致。",
presets: {
off: "关闭",
summary: "总览",
perTurn: "每轮",
debug: "调试",
},
presetDescriptions: {
off: "隐藏顶部和会话内的 token 展示。",
summary: "只在顶部显示当前对话累计 token。",
perTurn: "显示顶部累计,并为每轮 assistant 回复显示一条汇总 token。",
debug: "显示顶部累计,并展示按步骤归类的 token 调试信息。",
},
finalAnswer: "最终回复",
stepTotal: "步骤总计",
sharedAttribution: "该 token 由此步骤中的多个动作共同消耗",
subagent: (description: string) => `子任务:${description}`,
startTodo: (content: string) => `开始 To-do：${content}`,
completeTodo: (content: string) => `完成 To-do：${content}`,
updateTodo: (content: string) => `更新 To-do：${content}`,
removeTodo: (content: string) => `移除 To-do：${content}`,
},
// Shortcuts
@@ -434,6 +458,23 @@ export const zhCN: Translations = {
notSupported: "当前浏览器不支持通知功能。",
disableNotification: "关闭通知",
},
account: {
profileTitle: "个人信息",
email: "邮箱",
role: "角色",
changePasswordTitle: "修改密码",
changePasswordDescription: "更新你的账号密码。",
currentPassword: "当前密码",
newPassword: "新密码",
confirmNewPassword: "确认新密码",
passwordMismatch: "两次输入的新密码不一致",
passwordTooShort: "密码长度至少为 8 个字符",
passwordChangedSuccess: "密码修改成功",
networkError: "网络错误,请重试。",
updating: "更新中...",
updatePassword: "修改密码",
signOut: "退出登录",
},
acknowledge: {
emptyTitle: "致谢",
emptyDescription: "相关的致谢信息会展示在这里。",
@@ -0,0 +1,440 @@
import type { Message } from "@langchain/langgraph-sdk";
import type { Translations } from "@/core/i18n/locales/types";
import { getUsageMetadata, type TokenUsage } from "./usage";
import { hasContent } from "./utils";
export type TokenUsageInlineMode = "off" | "per_turn" | "step_debug";
export interface TokenUsagePreferences {
headerTotal: boolean;
inlineMode: TokenUsageInlineMode;
}
export type TokenUsageViewPreset = "off" | "summary" | "per_turn" | "debug";
export interface TokenDebugStep {
id: string;
messageId: string;
label: string;
secondaryLabels: string[];
usage: TokenUsage | null;
sharedAttribution: boolean;
}
type TokenUsageAttributionAction =
| {
kind: "todo_start" | "todo_complete" | "todo_update" | "todo_remove";
content?: string;
tool_call_id?: string;
}
| {
kind: "subagent";
description?: string | null;
subagent_type?: string | null;
tool_call_id?: string;
}
| {
kind: "search";
query?: string | null;
tool_name?: string | null;
tool_call_id?: string;
}
| {
kind: "present_files" | "clarification";
tool_call_id?: string;
}
| {
kind: "tool";
tool_name?: string | null;
description?: string | null;
tool_call_id?: string;
};
interface TokenUsageAttribution {
version?: number;
kind?:
| "thinking"
| "final_answer"
| "tool_batch"
| "todo_update"
| "subagent_dispatch";
shared_attribution?: boolean;
tool_call_ids?: string[];
actions?: TokenUsageAttributionAction[];
}
// Precise write_todos labels come from the backend attribution payload.
// The frontend fallback intentionally stays generic so we do not duplicate
// backend/packages/harness/deerflow/agents/middlewares/token_usage_middleware.py
// ::_build_todo_actions and risk the two diffing algorithms drifting apart.
export function getTokenUsageViewPreset(
preferences: TokenUsagePreferences,
): TokenUsageViewPreset {
if (!preferences.headerTotal && preferences.inlineMode === "off") {
return "off";
}
if (preferences.headerTotal && preferences.inlineMode === "off") {
return "summary";
}
if (preferences.inlineMode === "step_debug") {
return "debug";
}
return "per_turn";
}
export function tokenUsagePreferencesFromPreset(
preset: TokenUsageViewPreset,
): TokenUsagePreferences {
switch (preset) {
case "off":
return { headerTotal: false, inlineMode: "off" };
case "summary":
return { headerTotal: true, inlineMode: "off" };
case "debug":
return { headerTotal: true, inlineMode: "step_debug" };
case "per_turn":
default:
return { headerTotal: true, inlineMode: "per_turn" };
}
}
export function buildTokenDebugSteps(
messages: Message[],
t: Translations,
): TokenDebugStep[] {
const steps: TokenDebugStep[] = [];
for (const [index, message] of messages.entries()) {
if (message.type !== "ai") {
continue;
}
const usage = getUsageMetadata(message);
const attribution = getTokenUsageAttribution(message);
const actionLabels: string[] = [];
if (attribution) {
actionLabels.push(...buildActionLabelsFromAttribution(attribution, t));
if (actionLabels.length === 0) {
if (attribution.kind === "final_answer") {
actionLabels.push(t.tokenUsage.finalAnswer);
} else if (attribution.kind === "thinking") {
actionLabels.push(t.common.thinking);
}
}
if (actionLabels.length > 0) {
const sharedAttribution =
attribution.shared_attribution ?? actionLabels.length > 1;
steps.push({
id: message.id ?? `token-step-${index}`,
messageId: message.id ?? `token-step-${index}`,
label:
sharedAttribution && actionLabels.length > 1
? t.tokenUsage.stepTotal
: actionLabels[0]!,
secondaryLabels:
sharedAttribution && actionLabels.length > 1 ? actionLabels : [],
usage,
sharedAttribution,
});
continue;
}
}
for (const toolCall of message.tool_calls ?? []) {
const toolArgs = (toolCall.args ?? {}) as Record<string, unknown>;
if (toolCall.name === "write_todos") {
actionLabels.push(t.toolCalls.writeTodos);
continue;
}
actionLabels.push(
describeToolCall(
{
name: toolCall.name,
args: toolArgs,
},
t,
),
);
}
if (actionLabels.length === 0) {
if (hasContent(message)) {
actionLabels.push(t.tokenUsage.finalAnswer);
} else {
actionLabels.push(t.common.thinking);
}
}
steps.push({
id: message.id ?? `token-step-${index}`,
messageId: message.id ?? `token-step-${index}`,
label:
actionLabels.length === 1 ? actionLabels[0]! : t.tokenUsage.stepTotal,
secondaryLabels: actionLabels.length > 1 ? actionLabels : [],
usage,
sharedAttribution: actionLabels.length > 1,
});
}
return steps;
}
function getTokenUsageAttribution(
message: Message,
): TokenUsageAttribution | null {
if (message.type !== "ai") {
return null;
}
const additionalKwargs = message.additional_kwargs;
if (!additionalKwargs || typeof additionalKwargs !== "object") {
return null;
}
const attribution = (additionalKwargs as Record<string, unknown>)
.token_usage_attribution;
const normalized = normalizeTokenUsageAttribution(attribution);
if (!normalized) {
return null;
}
return normalized;
}
function buildActionLabelsFromAttribution(
attribution: TokenUsageAttribution,
t: Translations,
): string[] {
return (attribution.actions ?? [])
.map((action) => describeAttributionAction(action, t))
.filter((label): label is string => !!label);
}
function describeAttributionAction(
action: TokenUsageAttributionAction,
t: Translations,
): string | null {
switch (action.kind) {
case "todo_start":
return action.content
? t.tokenUsage.startTodo(action.content)
: t.toolCalls.writeTodos;
case "todo_complete":
return action.content
? t.tokenUsage.completeTodo(action.content)
: t.toolCalls.writeTodos;
case "todo_update":
return action.content
? t.tokenUsage.updateTodo(action.content)
: t.toolCalls.writeTodos;
case "todo_remove":
return action.content
? t.tokenUsage.removeTodo(action.content)
: t.toolCalls.writeTodos;
case "subagent":
return t.tokenUsage.subagent(action.description ?? t.subtasks.subtask);
case "search":
if (action.query) {
return t.toolCalls.searchFor(action.query);
}
return t.toolCalls.useTool(action.tool_name ?? "search");
case "present_files":
return t.toolCalls.presentFiles;
case "clarification":
return t.toolCalls.needYourHelp;
case "tool":
return describeToolCall(
{
name: action.tool_name ?? "tool",
args: action.description ? { description: action.description } : {},
},
t,
);
default:
return null;
}
}
function describeToolCall(
toolCall: {
name: string;
args: Record<string, unknown>;
},
t: Translations,
): string {
if (toolCall.name === "task") {
const description =
typeof toolCall.args.description === "string"
? toolCall.args.description
: t.subtasks.subtask;
return t.tokenUsage.subagent(description);
}
if (
(toolCall.name === "web_search" || toolCall.name === "image_search") &&
typeof toolCall.args.query === "string"
) {
return t.toolCalls.searchFor(toolCall.args.query);
}
if (toolCall.name === "web_fetch") {
return t.toolCalls.viewWebPage;
}
if (toolCall.name === "present_files") {
return t.toolCalls.presentFiles;
}
if (toolCall.name === "ask_clarification") {
return t.toolCalls.needYourHelp;
}
if (typeof toolCall.args.description === "string") {
return toolCall.args.description;
}
return t.toolCalls.useTool(toolCall.name);
}
function normalizeTokenUsageAttribution(
value: unknown,
): TokenUsageAttribution | null {
const record = asRecord(value);
if (!record) {
return null;
}
const rawActions = record.actions;
if (rawActions !== undefined && !Array.isArray(rawActions)) {
return null;
}
return {
// Versioning is additive for now: the frontend should ignore unknown
// fields and fall back when required fields become incompatible.
version: typeof record.version === "number" ? record.version : undefined,
kind: isTokenUsageAttributionKind(record.kind) ? record.kind : undefined,
shared_attribution:
typeof record.shared_attribution === "boolean"
? record.shared_attribution
: undefined,
tool_call_ids: Array.isArray(record.tool_call_ids)
? record.tool_call_ids.filter(
(toolCallId): toolCallId is string =>
typeof toolCallId === "string" && toolCallId.trim().length > 0,
)
: undefined,
actions: Array.isArray(rawActions)
? rawActions
.map((action) => normalizeTokenUsageAttributionAction(action))
.filter(
(action): action is TokenUsageAttributionAction => action !== null,
)
: undefined,
};
}
function normalizeTokenUsageAttributionAction(
value: unknown,
): TokenUsageAttributionAction | null {
const record = asRecord(value);
if (!record) {
return null;
}
const kind = record.kind;
if (
kind !== "todo_start" &&
kind !== "todo_complete" &&
kind !== "todo_update" &&
kind !== "todo_remove" &&
kind !== "subagent" &&
kind !== "search" &&
kind !== "present_files" &&
kind !== "clarification" &&
kind !== "tool"
) {
return null;
}
const content = readString(record.content);
const toolCallId = readString(record.tool_call_id);
switch (kind) {
case "todo_start":
case "todo_complete":
case "todo_update":
case "todo_remove":
return {
kind,
content,
tool_call_id: toolCallId,
};
case "subagent":
return {
kind,
description: readString(record.description),
subagent_type: readString(record.subagent_type),
tool_call_id: toolCallId,
};
case "search":
return {
kind,
query: readString(record.query),
tool_name: readString(record.tool_name),
tool_call_id: toolCallId,
};
case "present_files":
case "clarification":
return {
kind,
tool_call_id: toolCallId,
};
case "tool":
return {
kind,
tool_name: readString(record.tool_name),
description: readString(record.description),
tool_call_id: toolCallId,
};
default:
return null;
}
}
function asRecord(value: unknown): Record<string, unknown> | null {
if (!value || typeof value !== "object" || Array.isArray(value)) {
return null;
}
return value as Record<string, unknown>;
}
function readString(value: unknown): string | undefined {
if (typeof value !== "string") {
return undefined;
}
const normalized = value.trim();
return normalized.length > 0 ? normalized : undefined;
}
function isTokenUsageAttributionKind(
value: unknown,
): value is NonNullable<TokenUsageAttribution["kind"]> {
return (
value === "thinking" ||
value === "final_answer" ||
value === "tool_batch" ||
value === "todo_update" ||
value === "subagent_dispatch"
);
}
@@ -18,7 +18,7 @@ interface AssistantClarificationGroup extends GenericMessageGroup<"assistant:cla
interface AssistantSubagentGroup extends GenericMessageGroup<"assistant:subagent"> {}
export type MessageGroup =
| HumanMessageGroup
| AssistantProcessingGroup
| AssistantMessageGroup
@@ -26,10 +26,7 @@ type MessageGroup =
| AssistantClarificationGroup
| AssistantSubagentGroup;
export function getMessageGroups(messages: Message[]): MessageGroup[] {
if (messages.length === 0) {
return [];
}
@@ -124,11 +121,52 @@ export function groupMessages<T>(
}
}
return groups;
}
export function groupMessages<T>(
messages: Message[],
mapper: (group: MessageGroup) => T,
): T[] {
return getMessageGroups(messages)
.map(mapper)
.filter((result) => result !== undefined && result !== null) as T[];
}
export function getAssistantTurnUsageMessages(groups: MessageGroup[]) {
const usageMessagesByGroupIndex: Array<Message[] | null> = Array.from(
{ length: groups.length },
() => null,
);
let turnStartIndex: number | null = null;
for (const [index, group] of groups.entries()) {
if (group.type === "human") {
turnStartIndex = null;
continue;
}
turnStartIndex ??= index;
const nextGroup = groups[index + 1];
const isTurnEnd = !nextGroup || nextGroup.type === "human";
if (!isTurnEnd) {
continue;
}
usageMessagesByGroupIndex[index] = groups
.slice(turnStartIndex, index + 1)
.flatMap((currentGroup) => currentGroup.messages)
.filter((message) => message.type === "ai");
turnStartIndex = null;
}
return usageMessagesByGroupIndex;
}
export function extractTextFromMessage(message: Message) {
if (typeof message.content === "string") {
return (
@@ -1,9 +1,14 @@
import type { TokenUsageInlineMode } from "../messages/usage-model";
import type { AgentThreadContext } from "../threads";
export const DEFAULT_LOCAL_SETTINGS: LocalSettings = {
notification: {
enabled: true,
},
tokenUsage: {
headerTotal: true,
inlineMode: "per_turn",
},
context: {
model_name: undefined,
mode: undefined,
@@ -22,6 +27,10 @@ export interface LocalSettings {
notification: {
enabled: boolean;
};
tokenUsage: {
headerTotal: boolean;
inlineMode: TokenUsageInlineMode;
};
context: Omit<
AgentThreadContext,
| "thread_id"
@@ -44,6 +53,10 @@ function mergeLocalSettings(settings?: Partial<LocalSettings>): LocalSettings {
...DEFAULT_LOCAL_SETTINGS.context,
...settings?.context,
},
tokenUsage: {
...DEFAULT_LOCAL_SETTINGS.tokenUsage,
...settings?.tokenUsage,
},
notification: {
...DEFAULT_LOCAL_SETTINGS.notification,
...settings?.notification,
@@ -0,0 +1,111 @@
import { afterEach, beforeEach, describe, expect, test, vi } from "vitest";
const ENV_KEYS = [
"NODE_ENV",
"DEER_FLOW_INTERNAL_GATEWAY_BASE_URL",
"DEER_FLOW_TRUSTED_ORIGINS",
] as const;
type EnvSnapshot = Partial<
Record<(typeof ENV_KEYS)[number], string | undefined>
>;
function snapshotEnv(): EnvSnapshot {
const snapshot: EnvSnapshot = {};
for (const key of ENV_KEYS) {
snapshot[key] = process.env[key];
}
return snapshot;
}
function setEnv(key: (typeof ENV_KEYS)[number], value: string | undefined) {
// NODE_ENV is typed as a readonly literal union, so we go through the
// index signature to keep the test compiler-friendly across cases.
const env = process.env as Record<string, string | undefined>;
if (value === undefined) {
delete env[key];
} else {
env[key] = value;
}
}
function restoreEnv(snapshot: EnvSnapshot) {
for (const key of ENV_KEYS) {
setEnv(key, snapshot[key]);
}
}
async function loadFreshConfig() {
vi.resetModules();
return await import("@/core/auth/gateway-config");
}
describe("getGatewayConfig", () => {
let saved: EnvSnapshot;
beforeEach(() => {
saved = snapshotEnv();
setEnv("DEER_FLOW_INTERNAL_GATEWAY_BASE_URL", undefined);
setEnv("DEER_FLOW_TRUSTED_ORIGINS", undefined);
});
afterEach(() => {
restoreEnv(saved);
});
test("returns localhost defaults when env is unset in development", async () => {
setEnv("NODE_ENV", "development");
const { getGatewayConfig } = await loadFreshConfig();
const cfg = getGatewayConfig();
expect(cfg.internalGatewayUrl).toBe("http://127.0.0.1:8001");
expect(cfg.trustedOrigins).toEqual(["http://localhost:3000"]);
});
test("returns localhost defaults when env is unset in production (regression: issue #2705)", async () => {
setEnv("NODE_ENV", "production");
const { getGatewayConfig } = await loadFreshConfig();
expect(() => getGatewayConfig()).not.toThrow();
const cfg = getGatewayConfig();
expect(cfg.internalGatewayUrl).toBe("http://127.0.0.1:8001");
expect(cfg.trustedOrigins).toEqual(["http://localhost:3000"]);
});
test("uses env values verbatim when set, regardless of NODE_ENV", async () => {
setEnv("NODE_ENV", "production");
setEnv("DEER_FLOW_INTERNAL_GATEWAY_BASE_URL", "https://gw.example.com/");
setEnv(
"DEER_FLOW_TRUSTED_ORIGINS",
"https://app.example.com, https://admin.example.com",
);
const { getGatewayConfig } = await loadFreshConfig();
const cfg = getGatewayConfig();
expect(cfg.internalGatewayUrl).toBe("https://gw.example.com");
expect(cfg.trustedOrigins).toEqual([
"https://app.example.com",
"https://admin.example.com",
]);
});
test("trims and filters empty entries in trustedOrigins", async () => {
setEnv("NODE_ENV", "production");
setEnv("DEER_FLOW_INTERNAL_GATEWAY_BASE_URL", "https://gw.example.com");
setEnv(
"DEER_FLOW_TRUSTED_ORIGINS",
" https://a.example , ,https://b.example ",
);
const { getGatewayConfig } = await loadFreshConfig();
const cfg = getGatewayConfig();
expect(cfg.trustedOrigins).toEqual([
"https://a.example",
"https://b.example",
]);
});
});
@@ -0,0 +1,396 @@
import type { Message } from "@langchain/langgraph-sdk";
import { expect, test } from "vitest";
import { enUS } from "@/core/i18n";
import {
buildTokenDebugSteps,
getTokenUsageViewPreset,
tokenUsagePreferencesFromPreset,
} from "@/core/messages/usage-model";
test("maps token usage presets to persisted preferences", () => {
expect(tokenUsagePreferencesFromPreset("off")).toEqual({
headerTotal: false,
inlineMode: "off",
});
expect(tokenUsagePreferencesFromPreset("summary")).toEqual({
headerTotal: true,
inlineMode: "off",
});
expect(tokenUsagePreferencesFromPreset("per_turn")).toEqual({
headerTotal: true,
inlineMode: "per_turn",
});
expect(tokenUsagePreferencesFromPreset("debug")).toEqual({
headerTotal: true,
inlineMode: "step_debug",
});
});
test("derives the active preset from persisted preferences", () => {
expect(
getTokenUsageViewPreset({
headerTotal: false,
inlineMode: "off",
}),
).toBe("off");
expect(
getTokenUsageViewPreset({
headerTotal: true,
inlineMode: "off",
}),
).toBe("summary");
expect(
getTokenUsageViewPreset({
headerTotal: true,
inlineMode: "per_turn",
}),
).toBe("per_turn");
expect(
getTokenUsageViewPreset({
headerTotal: true,
inlineMode: "step_debug",
}),
).toBe("debug");
});
test("uses generic todo labels when backend attribution is absent", () => {
const messages = [
{
id: "ai-1",
type: "ai",
content: "",
tool_calls: [
{
id: "write_todos:1",
name: "write_todos",
args: {
todos: [{ content: "Draft the plan", status: "in_progress" }],
},
},
],
usage_metadata: {
input_tokens: 100,
output_tokens: 20,
total_tokens: 120,
},
},
{
id: "tool-1",
type: "tool",
name: "write_todos",
tool_call_id: "write_todos:1",
content: "ok",
},
{
id: "ai-2",
type: "ai",
content: "",
tool_calls: [
{
id: "write_todos:2",
name: "write_todos",
args: {
todos: [{ content: "Draft the plan", status: "completed" }],
},
},
],
usage_metadata: { input_tokens: 50, output_tokens: 10, total_tokens: 60 },
},
{
id: "ai-3",
type: "ai",
content: "Here is the result",
usage_metadata: { input_tokens: 40, output_tokens: 15, total_tokens: 55 },
},
] as Message[];
expect(buildTokenDebugSteps(messages, enUS)).toEqual([
expect.objectContaining({
messageId: "ai-1",
label: "Update to-do list",
sharedAttribution: false,
}),
expect.objectContaining({
messageId: "ai-2",
label: "Update to-do list",
sharedAttribution: false,
}),
expect.objectContaining({
messageId: "ai-3",
label: "Final answer",
sharedAttribution: false,
}),
]);
});
test("marks multi-action AI steps as shared attribution", () => {
const messages = [
{
id: "ai-1",
type: "ai",
content: "",
tool_calls: [
{
id: "web_search:1",
name: "web_search",
args: { query: "LangGraph stream mode" },
},
{
id: "write_todos:1",
name: "write_todos",
args: {
todos: [
{
content: "Inspect stream mode handling",
status: "in_progress",
},
],
},
},
],
usage_metadata: {
input_tokens: 120,
output_tokens: 30,
total_tokens: 150,
},
},
] as Message[];
expect(buildTokenDebugSteps(messages, enUS)).toEqual([
expect.objectContaining({
messageId: "ai-1",
label: "Step total",
sharedAttribution: true,
secondaryLabels: [
'Search for "LangGraph stream mode"',
"Update to-do list",
],
}),
]);
});
test("prefers backend attribution metadata when available", () => {
const messages = [
{
id: "ai-1",
type: "ai",
content: "",
tool_calls: [
{
id: "write_todos:1",
name: "write_todos",
args: {
todos: [
{
content: "Fallback label should not win",
status: "in_progress",
},
],
},
},
],
additional_kwargs: {
token_usage_attribution: {
version: 1,
kind: "todo_update",
shared_attribution: false,
actions: [{ kind: "todo_start", content: "Use backend attribution" }],
},
},
usage_metadata: { input_tokens: 25, output_tokens: 5, total_tokens: 30 },
},
] as Message[];
expect(buildTokenDebugSteps(messages, enUS)).toEqual([
expect.objectContaining({
messageId: "ai-1",
label: "Start To-do: Use backend attribution",
sharedAttribution: false,
}),
]);
});
test("falls back safely when attribution payload is malformed", () => {
const messages = [
{
id: "ai-1",
type: "ai",
content: "",
tool_calls: [
{
id: "web_search:1",
name: "web_search",
args: { query: "LangGraph stream mode" },
},
],
additional_kwargs: {
token_usage_attribution: {
version: 1,
kind: "tool_batch",
actions: { broken: true },
},
},
usage_metadata: { input_tokens: 10, output_tokens: 5, total_tokens: 15 },
},
] as Message[];
expect(buildTokenDebugSteps(messages, enUS)).toEqual([
expect.objectContaining({
messageId: "ai-1",
label: 'Search for "LangGraph stream mode"',
sharedAttribution: false,
}),
]);
});
test("ignores attribution actions that are not objects", () => {
const messages = [
{
id: "ai-1",
type: "ai",
content: "",
tool_calls: [],
additional_kwargs: {
token_usage_attribution: {
version: 1,
kind: "tool_batch",
shared_attribution: true,
actions: [
null,
"bad-action",
{ kind: "search", query: "valid search", ignored: "extra-field" },
],
},
},
usage_metadata: { input_tokens: 10, output_tokens: 5, total_tokens: 15 },
},
] as Message[];
expect(buildTokenDebugSteps(messages, enUS)).toEqual([
expect.objectContaining({
messageId: "ai-1",
label: 'Search for "valid search"',
}),
]);
});
test("ignores malformed attribution fields and falls back to message content", () => {
const messages = [
{
id: "ai-1",
type: "ai",
content: "Real final answer",
tool_calls: [],
additional_kwargs: {
token_usage_attribution: {
version: 1,
kind: null,
shared_attribution: null,
tool_call_ids: [null, "tool-1", 123],
actions: [{ query: "missing kind" }],
},
},
usage_metadata: { input_tokens: 9, output_tokens: 3, total_tokens: 12 },
},
] as Message[];
expect(buildTokenDebugSteps(messages, enUS)).toEqual([
expect.objectContaining({
messageId: "ai-1",
label: "Final answer",
sharedAttribution: false,
}),
]);
});
test("ignores unknown top-level attribution fields", () => {
const messages = [
{
id: "ai-1",
type: "ai",
content: "",
tool_calls: [],
additional_kwargs: {
token_usage_attribution: {
version: 1,
kind: "tool_batch",
shared_attribution: false,
unknown_field: "ignored",
actions: [{ kind: "subagent", description: "Inspect the fix" }],
},
},
usage_metadata: { input_tokens: 12, output_tokens: 4, total_tokens: 16 },
},
] as Message[];
expect(buildTokenDebugSteps(messages, enUS)).toEqual([
expect.objectContaining({
messageId: "ai-1",
label: "Subagent: Inspect the fix",
sharedAttribution: false,
}),
]);
});
test("falls back to generic todo labels when backend attribution has no actions", () => {
const messages = [
{
id: "ai-1",
type: "ai",
content: "",
tool_calls: [
{
id: "write_todos:1",
name: "write_todos",
args: {
todos: [{ content: "Clean up stale tasks", status: "in_progress" }],
},
},
],
usage_metadata: {
input_tokens: 100,
output_tokens: 20,
total_tokens: 120,
},
},
{
id: "ai-2",
type: "ai",
content: "",
tool_calls: [
{
id: "write_todos:2",
name: "write_todos",
args: {
todos: [],
},
},
],
additional_kwargs: {
token_usage_attribution: {
version: 1,
kind: "todo_update",
shared_attribution: false,
actions: [],
},
},
usage_metadata: { input_tokens: 30, output_tokens: 8, total_tokens: 38 },
},
] as Message[];
expect(buildTokenDebugSteps(messages, enUS)).toEqual([
expect.objectContaining({
messageId: "ai-1",
label: "Update to-do list",
}),
expect.objectContaining({
messageId: "ai-2",
label: "Update to-do list",
sharedAttribution: false,
}),
]);
});
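The malformed-payload tests above all rely on the parser treating `token_usage_attribution` defensively: a broken `actions` field or a non-object entry is dropped rather than thrown on, so the UI can fall back to tool-call-derived labels. A minimal sketch of that pattern (hypothetical helper; the real logic lives inside `buildTokenDebugSteps` and may differ):

```typescript
// Hypothetical shape for a recognized attribution action; extra fields
// beyond `kind` are carried through untouched.
type AttributionAction = { kind: string; [key: string]: unknown };

// Read `actions` out of an untrusted attribution payload, keeping only
// entries that are objects with a string `kind`. Anything malformed
// yields an empty list, signalling the caller to use fallback labels.
function readAttributionActions(payload: unknown): AttributionAction[] {
  if (typeof payload !== "object" || payload === null) return [];
  const actions = (payload as { actions?: unknown }).actions;
  // e.g. `actions: { broken: true }` is not an array, so nothing survives.
  if (!Array.isArray(actions)) return [];
  // e.g. `[null, "bad-action", { kind: "search", ... }]` keeps one entry.
  return actions.filter(
    (action): action is AttributionAction =>
      typeof action === "object" &&
      action !== null &&
      typeof (action as { kind?: unknown }).kind === "string",
  );
}
```

Because the filter is a type predicate, callers get a properly narrowed `AttributionAction[]` without extra casts, and unknown top-level fields on the payload are ignored for free since only `actions` is ever read.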
@@ -0,0 +1,65 @@
import type { Message } from "@langchain/langgraph-sdk";
import { expect, test } from "vitest";
import {
getAssistantTurnUsageMessages,
getMessageGroups,
} from "@/core/messages/utils";
test("aggregates token usage messages once per assistant turn", () => {
const messages = [
{
id: "human-1",
type: "human",
content: "Plan a trip",
},
{
id: "ai-1",
type: "ai",
content: "",
tool_calls: [{ id: "tool-1", name: "web_search", args: {} }],
usage_metadata: { input_tokens: 10, output_tokens: 5, total_tokens: 15 },
},
{
id: "tool-1-result",
type: "tool",
name: "web_search",
tool_call_id: "tool-1",
content: "[]",
},
{
id: "ai-2",
type: "ai",
content: "Here is the itinerary",
usage_metadata: { input_tokens: 2, output_tokens: 8, total_tokens: 10 },
},
{
id: "human-2",
type: "human",
content: "Make it shorter",
},
{
id: "ai-3",
type: "ai",
content: "Short version",
usage_metadata: { input_tokens: 1, output_tokens: 1, total_tokens: 2 },
},
] as Message[];
const groups = getMessageGroups(messages);
const usageMessagesByGroupIndex = getAssistantTurnUsageMessages(groups);
expect(groups.map((group) => group.type)).toEqual([
"human",
"assistant:processing",
"assistant",
"human",
"assistant",
]);
expect(
usageMessagesByGroupIndex.map(
(groupMessages) => groupMessages?.map((message) => message.id) ?? null,
),
).toEqual([null, null, ["ai-1", "ai-2"], null, ["ai-3"]]);
});
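The grouping this test asserts can be sketched as a simple reduction: each `human` message opens a new assistant turn, and AI messages carrying `usage_metadata` within that turn are collected together (a standalone illustration under assumed message shapes, not the actual `getMessageGroups` / `getAssistantTurnUsageMessages` implementation):

```typescript
// Simplified message shape for the sketch; the real Message type comes
// from @langchain/langgraph-sdk.
type UsageMetadata = {
  input_tokens: number;
  output_tokens: number;
  total_tokens: number;
};
type SketchMessage = {
  id: string;
  type: "human" | "ai" | "tool";
  usage_metadata?: UsageMetadata;
};

// Collect, per assistant turn, the ids of AI messages that carry usage
// metadata. A turn is the run of non-human messages after each human one.
function usageIdsPerTurn(messages: SketchMessage[]): string[][] {
  const turns: string[][] = [];
  let current: string[] | null = null;
  for (const message of messages) {
    if (message.type === "human") {
      if (current) turns.push(current);
      current = []; // a human message starts a fresh assistant turn
    } else if (current && message.type === "ai" && message.usage_metadata) {
      current.push(message.id);
    }
  }
  if (current) turns.push(current);
  return turns;
}
```

On the six-message fixture above, this yields `[["ai-1", "ai-2"], ["ai-3"]]`: the intermediate tool result contributes nothing, and both AI messages in the first turn are aggregated once, matching the non-null entries the test expects.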