feat: perita manus

@@ -1,3 +1,3 @@
 {
-    "include": ["gcu-tools", "hive_tools"]
+    "include": ["gcu-tools", "hive_tools", "shell-tools"]
 }
@@ -51,6 +51,10 @@ _DEFAULT_LOCAL_SERVERS: dict[str, dict[str, Any]] = {
         "description": "File I/O: read, write, edit, search, list, run commands",
         "args": ["run", "python", "files_server.py", "--stdio"],
     },
+    "shell-tools": {
+        "description": "Terminal/shell capabilities: process exec, background jobs, PTY shells, fs search. Bash-only on POSIX.",
+        "args": ["run", "python", "shell_tools_server.py", "--stdio"],
+    },
 }

 # Aliases that earlier versions of ensure_defaults wrote under the wrong name.
@@ -0,0 +1,132 @@
---
name: hive.shell-tools-foundations
description: Required reading whenever any shell_* tool is available. Teaches the foreground/background dichotomy (shell_exec auto-promotes past 30s, returns a job_id you poll with shell_job_logs), the standard envelope shape (exit_code, stdout, stdout_truncated_bytes, output_handle, semantic_status, warning, auto_backgrounded, job_id), output handle pagination via shell_output_get, when to read semantic_status instead of raw exit_code (grep/rg/find/diff/test exit 1 is NOT an error), the destructive-warning surface (rm -rf, git push --force, DROP TABLE), tool preference (use files-tools / gcu-tools / hive_tools before raw shell), and the bash-only-on-macOS policy. Skipping this leads to "tool returned no output" surprises, orphaned jobs, and panic over benign grep exit codes.
metadata:
  author: hive
  type: preset-skill
  version: "1.0"
---

# shell-tools — foundations

These tools give you a real terminal: foreground exec with smart envelopes, background jobs with offset-based log streaming, persistent PTY shells, and filesystem search. Bash-only on POSIX.

## Tool preference (read first)

Before reaching for shell-tools, check whether a higher-level tool already covers the task. Shell is for system operations the other servers don't reach.

- **Reading files** → `files-tools.read_file` (handles size, paging, line-numbered output) — NOT `shell_exec("cat ...")`
- **Editing files** → `files-tools.edit_file` (atomic patch with diff verification) — NOT `shell_exec("sed -i ...")`
- **Writing files** → `files-tools.write_file` — NOT `shell_exec("echo > ...")`
- **In-project search** → `files-tools.search_files` (project-scoped, code-aware) — use `shell_rg` only for raw paths outside the project (`/var/log`, `/etc`)
- **Browser / web pages** → `gcu-tools.browser_*` for rendered pages — NOT `shell_exec("curl ...")`
- **Web search** → `hive_tools.web_search` — NOT scraping
- **System operations** (process exec, jobs, PTYs, raw fs search) → shell-tools. This is its territory.

## The standard envelope

Every spawn-style call (`shell_exec`, and the job state it auto-promotes into) returns this shape:

```jsonc
{
  "exit_code": 0,                // null when auto-backgrounded or pre-spawn error
  "stdout": "...",               // decoded, truncated to max_output_kb (default 256 KB)
  "stderr": "...",
  "stdout_truncated_bytes": 0,   // > 0 means more is in output_handle
  "stderr_truncated_bytes": 0,
  "runtime_ms": 42,
  "pid": 12345,
  "output_handle": null,         // "out_<hex>" when truncated — paginate with shell_output_get
  "timed_out": false,
  "semantic_status": "ok",       // "ok" | "signal" | "error" — read THIS, not just exit_code
  "semantic_message": null,      // e.g. "No matches found" for grep exit 1
  "warning": null,               // e.g. "may force-remove files" for rm -rf
  "auto_backgrounded": false,
  "job_id": null                 // set when auto_backgrounded=true
}
```

## Auto-promotion (the core mental model)

`shell_exec` runs commands in the foreground until the **auto-background budget** (default 30s) elapses. Past that point, the process is silently transferred to a background job and the call returns immediately with:

```jsonc
{ "auto_backgrounded": true, "exit_code": null, "job_id": "job_<hex>", ... }
```

When you see `auto_backgrounded: true`, **pivot to polling**. The job is still running:

```
shell_job_logs(job_id, since_offset=0, wait_until_exit=true, wait_timeout_sec=60)
→ blocks server-side until the job exits or the timeout, returns logs + status
```

You're not failing — you're freed up to do other work while the long task runs.

To force pure-foreground (kill on `timeout_sec`), pass `auto_background_after_sec=0`. Use this when you genuinely don't want a background job (small commands where promotion would surprise you).
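A minimal pivot-to-polling sketch (`call_tool` is a hypothetical stand-in for however your runtime dispatches MCP calls):

```
# Assumes call_tool() is your MCP dispatch helper (hypothetical).
res = call_tool("shell_exec", command="make -j8 test")
if res["auto_backgrounded"]:
    offset = 0
    while True:
        chunk = call_tool("shell_job_logs", job_id=res["job_id"],
                          since_offset=offset, wait_until_exit=True,
                          wait_timeout_sec=60)
        offset = chunk["next_offset"]   # carry forward; never reset to 0
        if chunk["status"] == "exited":
            break
```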
## Semantic exit codes — read `semantic_status`, not raw `exit_code`

Several common commands use exit 1 for legitimate non-error states:

| Command | exit 0 | exit 1 |
|---|---|---|
| `grep` / `rg` | matches found | **no matches** (not an error) |
| `find` | success | **some dirs unreadable** (informational) |
| `diff` | identical | **files differ** (informational) |
| `test` / `[` | true | **false** (informational) |

For these, `semantic_status` will be `"ok"` even when `exit_code == 1`, with `semantic_message` describing why ("No matches found"). For everything else, `semantic_status` defaults to `"ok"` on 0 and `"error"` on nonzero.

**Rule**: always check `semantic_status` first. Only fall back to `exit_code` when you need the exact number (e.g. distinguishing `make` errors).
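The same rule in code form (a sketch over the envelope dict shown above; `handle_failure` is a hypothetical handler):

```
status = env["semantic_status"]
if status == "error":
    handle_failure(env.get("semantic_message") or env["stderr"])
elif env["exit_code"] == 1:
    # Benign exit 1 (grep/diff/test family); the explanation lives here:
    note = env.get("semantic_message")   # e.g. "No matches found"
```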
## Destructive warnings — re-read your command

The envelope's `warning` field is set when the command matches a known destructive pattern (`rm -rf`, `git push --force`, `git reset --hard`, `DROP TABLE`, `kubectl delete`, `terraform destroy`, etc.). The command **still ran** — the warning is informational. Use it as a "did I mean to do that?" prompt before trusting subsequent steps that depend on the side effect.

If a `warning` appears unexpectedly, stop and verify: was the destructive action intended, or did a path/glob slip in?
## Output handles — never lose output

When `stdout_truncated_bytes > 0` or `stderr_truncated_bytes > 0`, the inline output was capped at `max_output_kb` (default 256 KB). The full bytes are stashed under `output_handle` for **5 minutes**. Paginate with:

```
shell_output_get(output_handle, since_offset=0, max_kb=64)
→ { data, offset, next_offset, eof, expired }
```

Track `next_offset` across calls. If `expired: true`, re-run the command (the handle's TTL has lapsed).

The store has a 64 MB cap with LRU eviction. For huge outputs, prefer `shell_job_start` + `shell_job_logs` polling (4 MB ring buffer per stream, infinite total throughput).
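A minimal drain loop (sketch; assumes `env` is an envelope with a non-null `output_handle`):

```
parts, offset = [], 0
while True:
    page = shell_output_get(env["output_handle"], since_offset=offset, max_kb=64)
    if page["expired"]:
        break                      # TTL lapsed; re-run the command instead
    parts.append(page["data"])
    offset = page["next_offset"]   # track across calls
    if page["eof"]:
        break
full_output = "".join(parts)
```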
## Bash, not zsh — even on macOS

`shell_exec` and `shell_pty_open` always invoke `/bin/bash`. The user's `$SHELL` is ignored. Explicit `shell="/bin/zsh"` is **rejected** with a clear error. This is a deliberate security stance, not aesthetic — zsh has command/builtin classes (`zmodload`, `=cmd` expansion, `zpty`, `ztcp`, `zf_*`) that bypass bash-shaped checks. The `shell-tools-pty-sessions` skill explains the implications for PTY sessions specifically.

`ZDOTDIR` and `ZSH_*` env vars are stripped before exec so zsh dotfiles can't leak in. Bash dotfiles would otherwise apply in interactive shells, which is why PTY sessions invoke `bash --norc --noprofile` to keep startup predictable.

## Pipelines and complex commands

For pipes, redirects, globs, and bash builtins, set `shell=True`:

```
shell_exec("find . -type f -name '*.log' | xargs grep -l ERROR | head", shell=True)
```

For simple `argv` commands, `shell=False` (the default) is faster and avoids quoting hazards. Naive whitespace splitting is fine for the common case; use `shell=True` when arguments contain spaces or quotes.
## When to use what (cheat sheet)

| Need | Tool |
|---|---|
| One-shot command, ≤30s | `shell_exec` |
| One-shot command, might be longer | `shell_exec` (auto-promotes) |
| Long-running job from the start | `shell_job_start` |
| State across calls (cd, env, REPL) | `shell_pty_open` + `shell_pty_run` |
| Search file contents (raw paths) | `shell_rg` |
| Find files by predicate | `shell_find` |
| Retrieve truncated output | `shell_output_get` |
| Tree / stat / du | `shell_exec("ls -la" / "stat foo" / "du -sh path")` |
| HTTP / DNS / ping / archives | `shell_exec("curl ..." / "dig ..." / "tar xzf ...")` |

See `references/exit_codes.md` for the full POSIX + signal-induced + semantic catalog.
@@ -0,0 +1,50 @@
# Exit code reference

## POSIX conventions

| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | General error / catchall |
| 2 | Misuse of shell builtins, syntax error |
| 126 | Command found but not executable |
| 127 | Command not found |
| 128 | Invalid argument to `exit` |
| 128 + N | Killed by signal N |
| 130 | Killed by SIGINT (Ctrl-C) |
| 137 | Killed by SIGKILL |
| 143 | Killed by SIGTERM |
| 255 | Exit status out of range |

When `exit_code < 0` in the envelope, the process was killed by a signal: `abs(exit_code)` is the signal number (subprocess uses negative codes for signaled exits, separate from the `128 + N` shell convention).

## Semantic exits — when exit 1 is NOT an error

shell-tools encodes these in `semantic_status`. The agent should read `semantic_status` first.

| Command | Code 0 | Code 1 | Code ≥2 |
|---|---|---|---|
| `grep` / `rg` / `ripgrep` | matches found | **no matches** (ok) | error |
| `find` | success | **some dirs unreadable** (ok) | error |
| `diff` | files identical | **files differ** (ok) | error |
| `test` / `[` | condition true | **condition false** (ok) | error |

For any command not in this table, the default convention applies (0 = ok, nonzero = error).

## When `exit_code` is `null`

- `auto_backgrounded: true` — the process is still running under a `job_id`. Poll with `shell_job_logs`.
- Pre-spawn error (command not found, exec failed) — see `error` field in the envelope.
- `timed_out: true` and the process refused to die — extremely rare; the kernel has the answer.

## Common signal-induced exits

| Signal | Number | Subprocess exit | Shell exit | Meaning |
|---|---|---|---|---|
| SIGHUP | 1 | -1 | 129 | Terminal hangup |
| SIGINT | 2 | -2 | 130 | Interrupt (Ctrl-C) |
| SIGQUIT | 3 | -3 | 131 | Quit (Ctrl-\\) |
| SIGABRT | 6 | -6 | 134 | Abort (assertion failed, etc.) |
| SIGKILL | 9 | -9 | 137 | Forced kill (uncatchable) |
| SIGSEGV | 11 | -11 | 139 | Segmentation fault |
| SIGTERM | 15 | -15 | 143 | Polite termination |
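A tiny helper for rendering these conventions (illustrative; not part of the server):

```
import signal

def describe_exit(exit_code: int) -> str:
    # Negative: subprocess convention (-N == killed by signal N).
    if exit_code < 0:
        return f"killed by {signal.Signals(-exit_code).name}"
    # 129..159: shell convention (128 + N).
    if 128 < exit_code < 160:
        return f"shell-reported signal exit ({signal.Signals(exit_code - 128).name})"
    return f"exited with {exit_code}"
```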
@@ -0,0 +1,96 @@
---
name: hive.shell-tools-fs-search
description: Use shell_rg / shell_find when you need raw filesystem search outside the project tree — system configs, /var/log, /etc, archive contents — or when files-tools.search_files is too project-scoped. Teaches the rg vs find vs shell_exec("ls/du/tree") split, common rg flag combos for code/logs/configs, find predicates for mtime/size/type queries, and the rule that for tree views or single-file stat info you should just use shell_exec instead of inventing a tool. Read before reaching for raw shell to grep or find anything.
metadata:
  author: hive
  type: preset-skill
  version: "1.0"
---

# Filesystem search

shell-tools provides two structured search tools: `shell_rg` (ripgrep for content) and `shell_find` (find for predicates). Everything else (tree, stat, du) is just `shell_exec`.

## When to use what

| Task | Tool |
|---|---|
| Find code/text matching a pattern in your **project** | `files-tools.search_files` (project-aware, ranks by relevance) |
| Find code/text matching a pattern in `/var/log`, `/etc`, archives, system dirs | `shell_rg` |
| Find files matching name/glob/predicate | `shell_find` |
| List a directory | `shell_exec("ls -la /path")` |
| Tree view | `shell_exec("tree -L 2 /path")` |
| Single-path stat | `shell_exec("stat /path")` |
| Disk usage | `shell_exec("du -sh /path")` or `shell_exec("du -h --max-depth=2 /")` |
| Count matches across files | `shell_rg(pattern, extra_args=["-c"])` |

## `shell_rg` — content search

ripgrep is fast, gitignore-aware, and has a deep flag surface. The structured wrapper exposes the most useful flags directly; `extra_args` covers the rest.

### Common patterns

```
# All Python files containing "TODO"
shell_rg(pattern="TODO", path=".", type_filter="py")

# Case-insensitive, with context
shell_rg(pattern="error", path="/var/log", ignore_case=True, context=2)

# Search hidden files (rg ignores them by default)
shell_rg(pattern="api_key", path="~", hidden=True)

# Don't respect .gitignore (find files git would ignore)
shell_rg(pattern="generated", path=".", no_ignore=True)

# Multi-line pattern (e.g., function definitions spanning lines)
shell_rg(pattern=r"def\s+\w+\(.*\n.*\n", path="src", extra_args=["--multiline"])

# Specific filename glob
shell_rg(pattern="version", path=".", glob="*.toml")
```

### rg flag idioms

| Flag | Effect |
|---|---|
| `-tpy` (`type_filter="py"`) | Only Python files |
| `-uu` | Don't respect any ignores (incl. `.git/`) |
| `--multiline` (`extra_args`) | Allow regex spanning lines |
| `--max-count` (`max_count`) | Stop after N matches per file |
| `--max-depth` (`max_depth`) | Limit recursion |
| `-w` (`extra_args`) | Whole word match |
| `-F` (`extra_args`) | Fixed string (no regex) |

See `references/ripgrep_cheatsheet.md` for the long form.

## `shell_find` — predicate search

`find` excels at "files matching N criteria". The wrapper surfaces the most common predicates; combine via the structured arguments.

```
# All .log files modified in the last 7 days, larger than 1MB
shell_find(path="/var/log", iname="*.log", mtime_days=7, size_kb_min=1024)

# All directories named ".git" (find Git repos under a tree)
shell_find(path="~/projects", name=".git", type_filter="d")

# Only the top three levels
shell_find(path="/etc", max_depth=3, type_filter="f")

# Symlinks
shell_find(path=".", type_filter="l")
```

See `references/find_predicates.md` for combinations not directly exposed.

## Output truncation

Both tools return `truncated: true` when their output exceeded the inline cap. For `shell_rg`, this means matches were dropped (refine the pattern or narrow the path); for `shell_find`, results past `max_results` (default 1000) are dropped. Tighten predicates rather than raising the cap.

## Anti-patterns

- **Don't `shell_rg` your project tree** — `files-tools.search_files` is project-aware and ranks results.
- **Don't reach for `shell_find` to list one directory** — `shell_exec("ls -la /path")` is shorter.
- **Don't use `shell_exec("grep ...")`** when `shell_rg` exists — rg is faster, gitignore-aware, and returns structured matches.
- **Don't use `shell_exec("find ...")`** to invent your own predicate combinations — use `shell_find` and report missing capabilities.
@@ -0,0 +1,78 @@
# find predicate reference

The `shell_find` wrapper exposes name/iname, type, mtime_days, size bounds, max_depth, max_results. For combinations beyond that, drop to `shell_exec("find ...")`.

## Time predicates

| Need | find predicate |
|---|---|
| Modified within N days | `-mtime -N` (wrapper: `mtime_days=N`) |
| Modified more than N days ago | `-mtime +N` |
| Modified exactly N days ago | `-mtime N` |
| Accessed within N days | `-atime -N` |
| Inode changed within N days | `-ctime -N` |
| Modified in last N minutes | `-mmin -N` |
| Newer than reference file | `-newer ref` |

## Size predicates

| Need | find predicate |
|---|---|
| Bigger than N kilobytes | `-size +Nk` (wrapper: `size_kb_min`) |
| Smaller than N kilobytes | `-size -Nk` (wrapper: `size_kb_max`) |
| Exactly N kilobytes | `-size Nk` |
| Bigger than N megabytes | `-size +NM` |
| Empty files | `-empty` |

## Type predicates

| Need | find predicate |
|---|---|
| Regular file | `-type f` (wrapper: `type_filter="f"`) |
| Directory | `-type d` (wrapper: `type_filter="d"`) |
| Symlink | `-type l` (wrapper: `type_filter="l"`) |
| Block device | `-type b` |
| Character device | `-type c` |
| FIFO | `-type p` |
| Socket | `-type s` |

## Permission predicates

| Need | find predicate |
|---|---|
| Owned by user | `-user alice` |
| Owned by group | `-group dev` |
| Permission bits exact | `-perm 644` |
| Has any of these bits | `-perm /u+x` |
| Has all of these bits | `-perm -u+x` |
| Readable by current user | `-readable` |
| Writable | `-writable` |
| Executable | `-executable` |
## Composing

`find` evaluates predicates left-to-right with an implicit AND. For OR, group with escaped parentheses and `-o`: `\( expr1 -o expr2 \)`.

```
# .log OR .txt (drop to shell_exec for OR)
shell_exec(r"find /path \( -name '*.log' -o -name '*.txt' \) -type f", shell=True)

# NOT in a directory called node_modules
shell_exec("find . -path '*/node_modules' -prune -o -name '*.js' -print", shell=True)
```
## Actions

| Need | predicate |
|---|---|
| Print path (default) | (implicit `-print`) |
| Print null-separated | `-print0` (for piping to xargs -0) |
| Delete | `-delete` (DANGEROUS — use shell_exec with explicit confirmation) |
| Run command per match | `-exec cmd {} \;` (drop to shell_exec) |
| Run command, batched | `-exec cmd {} +` |

## When NOT to use find

- **One directory listing**: `shell_exec("ls -la /path")`
- **Recursive grep**: `shell_rg`
- **Count files**: `shell_exec("find /path -type f | wc -l")`
@@ -0,0 +1,70 @@
# ripgrep cheatsheet

For when the structured `shell_rg` flags don't cover the case. Pass via `extra_args=[...]`.

## Filtering

| Need | Flag |
|---|---|
| Whole word | `-w` |
| Fixed string (no regex) | `-F` |
| Match files only (paths, not lines) | `-l` |
| Count matches per file | `-c` |
| Print only filenames with no matches | `--files-without-match` |
| Exclude binary files | (default) |
| Include binaries | `--binary` |
| Search archives transparently | (rg doesn't — extract first) |

## Output shape

| Need | Flag |
|---|---|
| Show only matched part | `-o` |
| Show byte offset of match | `-b` |
| No filename prefix | `-I` (`--no-filename`) |
| No line numbers | `-N` (`--no-line-number`) |
| Color always (for piping into a colorizer) | `--color=always` |
| JSON output | (the wrapper already uses `--json` internally) |

## Boundaries

| Need | Flag |
|---|---|
| Line-by-line (default) | (default) |
| Multi-line regex | `--multiline` (or `-U`) |
| Multi-line dotall (`.` matches `\n`) | `--multiline-dotall` |
| CRLF line endings | `--crlf` |
## Path control

| Need | Flag |
|---|---|
| Follow symlinks | `-L` |
| Don't follow | (default) |
| Search hidden | `-.` (also expressed as `hidden=True`) |
| Don't respect any ignores | `-uuu` |
| Glob include | `-g 'pattern'` (also `glob="..."`) |
| Glob exclude | `-g '!pattern'` |

## Performance

| Need | Flag |
|---|---|
| One thread | `-j 1` |
| Force memory-map I/O | `--mmap` (the default heuristic is usually fine) |
| Per-file match cap | `-m N` (also `max_count=N`) |

## Common composed queries

```
# Bare top-level imports in Python
shell_rg(pattern=r"^import\s+\w+$", path="src", type_filter="py")

# All TODO/FIXME/XXX with file:line
shell_rg(pattern=r"\b(TODO|FIXME|XXX)\b", path=".", extra_args=["-n"])

# Functions defined at module top-level
shell_rg(pattern=r"^def\s+\w+", path=".", type_filter="py")

# Lines that DON'T match a pattern
shell_rg(pattern=r"pattern", path=".", extra_args=["--invert-match"])
```
@@ -0,0 +1,110 @@
---
name: hive.shell-tools-job-control
description: Use when launching anything that runs longer than a minute, anything that streams logs, anything you want to keep running while doing other work — or when shell_exec auto-backgrounded on you and returned a job_id. Teaches the start→poll→wait pattern with shell_job_logs offset bookkeeping, the `wait_until_exit=True` blocking-poll idiom, the truncated_bytes_dropped resumption signal, the merge_stderr decision, the SIGINT→SIGTERM→SIGKILL escalation ladder via shell_job_manage, and the hard rule that jobs die when the shell-tools server restarts. Read before calling shell_job_start, or right after shell_exec auto-backgrounded.
metadata:
  author: hive
  type: preset-skill
  version: "1.0"
---

# Background job control

Background jobs are how you do things that take time without blocking your conversation. Three tools cover the surface: `shell_job_start`, `shell_job_logs`, `shell_job_manage`.

## When to use a job

- Builds, deploys, long tests
- Processes you want to monitor (streaming a log file, a dev server)
- Anything that auto-backgrounded from `shell_exec` (you have a `job_id`; pivot to this skill's idioms)

For one-shot work expected to finish quickly, `shell_exec` is simpler. The auto-promotion mechanic in `shell_exec` is your safety net — start with `shell_exec`, take over with this skill if needed.

## Lifecycle

```
shell_job_start(command, ...)
→ { job_id, pid, started_at }

shell_job_logs(job_id, since_offset=0, max_bytes=64000)
→ { data, offset, next_offset, status: "running"|"exited", exit_code, ... }

# Repeat with since_offset = previous next_offset until status == "exited"
# Or block once with wait_until_exit=True:
shell_job_logs(job_id, since_offset=N, wait_until_exit=True, wait_timeout_sec=60)
→ blocks server-side until exit or timeout
```

After exit, the job is retained for inspection (`shell_job_manage(action="list")`) until evicted by FIFO (50 most recent exits kept).
## Offset bookkeeping — the only rule that matters

The job's output lives in a 4 MB ring buffer per stream. Each call to `shell_job_logs` returns:

- `data` — bytes between `since_offset` and `next_offset`
- `next_offset` — pass this as `since_offset` on your next call
- `truncated_bytes_dropped` — non-zero when your `since_offset` was older than the ring's floor (you fell behind)

**Always carry `next_offset` forward.** Don't replay from 0 — that's an offset reset; you'll see the same data twice and miss the part that fell off.

When `truncated_bytes_dropped > 0`, the buffer evicted N bytes between your last call and now. Treat it as a signal that the job is producing output faster than you're consuming. Either poll more often or accept the gap and read from `next_offset` going forward.
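The rule as a loop (a sketch; `log_gap` and `consume` are hypothetical consumers):

```
offset = 0
while True:
    r = shell_job_logs(job_id, since_offset=offset, max_bytes=64000)
    if r["truncated_bytes_dropped"] > 0:
        # Fell behind the ring's floor; accept the gap, keep moving forward.
        log_gap(r["truncated_bytes_dropped"])
    consume(r["data"])
    offset = r["next_offset"]   # never reset to 0
    if r["status"] == "exited":
        break
```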
## merge_stderr — interleaved or separate

```
merge_stderr=False → two streams, request "stdout" or "stderr" by name
merge_stderr=True  → one stream ("merged"), order preserved
```

Pick `merge_stderr=True` when:
- The job's logs are designed to be read together (most servers, build tools)
- You don't need to distinguish "this was stderr"

Pick `merge_stderr=False` when:
- stderr is genuinely error-only and stdout is data
- You'll process them differently

## Signal escalation

```
shell_job_manage(action="signal_int", job_id=...)   # graceful (Ctrl-C-equivalent)
shell_job_manage(action="signal_term", job_id=...)  # polite kill (SIGTERM)
shell_job_manage(action="signal_kill", job_id=...)  # forced kill (SIGKILL, uncatchable)
```

The idiom: `signal_int` → wait 2-5s → `signal_term` → wait 2-5s → `signal_kill`. Most well-behaved processes handle SIGINT (graceful) and SIGTERM (cleanup, then exit). SIGKILL bypasses cleanup — use only when the process is truly unresponsive.

After signaling, check exit with `shell_job_logs(job_id, wait_until_exit=True, wait_timeout_sec=2)`.
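The ladder as a helper (sketch; the 2-5s waits ride on `wait_timeout_sec`):

```
def stop_job(job_id):
    # SIGINT → SIGTERM → SIGKILL, giving handlers time to run between rungs.
    for action in ("signal_int", "signal_term", "signal_kill"):
        shell_job_manage(action=action, job_id=job_id)
        r = shell_job_logs(job_id, wait_until_exit=True, wait_timeout_sec=5)
        if r["status"] == "exited":
            return r["exit_code"]
    return None   # still alive after SIGKILL; shouldn't happen
```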
## Stdin

```
shell_job_manage(action="stdin", job_id=..., data="some input\n")
shell_job_manage(action="close_stdin", job_id=...)
```

For tools that read stdin to EOF, `close_stdin` after writing flushes them. For interactive tools that read line-by-line, just write each line.

## Take-over: when shell_exec auto-backgrounds

When `shell_exec` returned `auto_backgrounded: true, job_id: <X>`, the process is **already** in the JobManager with its output flowing into the ring buffer. Your transition is seamless:

```
# Already saw the start of output in shell_exec's stdout/stderr.
# Pick up reading where the envelope left off — use the byte count of the
# initial stdout as your since_offset, OR just request tail output:
shell_job_logs(job_id="job_xxx", tail=True, max_bytes=64000)
```

Or block until exit and grab everything:

```
shell_job_logs(job_id="job_xxx", since_offset=0, wait_until_exit=True, wait_timeout_sec=120)
```

## Hard rules

- **Jobs die when the server restarts.** The desktop runtime restarts shell-tools when Hive restarts. There's no re-attach. If you need durability, use `nohup` + `shell_exec` to detach into the system's process tree and track the PID yourself.
- **Server-wide hard cap on concurrent jobs** (`SHELL_TOOLS_MAX_JOBS`, default 32). Past the cap, `shell_job_start` returns an error. Wait for jobs to exit or kill old ones.
- **No cross-restart output.** Output handles and ring buffers are in-memory only.

See `references/signals.md` for the full signal catalog.
@@ -0,0 +1,41 @@
# Signal reference

shell_job_manage exposes six signals via the action name.

| Action | Signal | Number | Purpose | Catchable? |
|---|---|---|---|---|
| `signal_int` | SIGINT | 2 | Interrupt — Ctrl-C equivalent. Most CLIs treat as "stop gracefully". | Yes |
| `signal_term` | SIGTERM | 15 | Polite termination request. Default for `kill`. | Yes |
| `signal_kill` | SIGKILL | 9 | Forced kill. Process can't catch, clean up, or finalize. Use sparingly. | **No** |
| `signal_hup` | SIGHUP | 1 | Hangup. Many daemons reload config on this. | Yes |
| `signal_usr1` | SIGUSR1 | 10 | User-defined #1. Common: dump state, rotate logs (nginx, etc). | Yes |
| `signal_usr2` | SIGUSR2 | 12 | User-defined #2. Common: graceful binary upgrade (unicorn, etc). | Yes |

(The SIGUSR numbers above are the Linux values; on macOS, SIGUSR1/SIGUSR2 are 30/31.)

## Escalation idiom

```
1. signal_int (Ctrl-C — graceful)
2. wait 2-5s, check status with shell_job_logs(wait_until_exit=True, wait_timeout_sec=3)
3. if still running: signal_term (cleanup-then-exit)
4. wait 2-5s
5. if still running: signal_kill (forced)
```

The waits matter: SIGTERM handlers do real work (flush logs, close DBs, release locks) and need time. Skipping straight to SIGKILL leaks resources.

## When to use SIGUSR1 / SIGUSR2

These are application-defined. Read the target's docs first. Common:
- **nginx**: SIGUSR1 → reopen log files (for log rotation)
- **unicorn / puma**: SIGUSR2 → fork a new master with the latest binary (graceful restart)
- **rsync**: SIGUSR1 → print stats so far

## Reading exit codes after a signal

When a job exits via signal, `shell_job_logs` returns `exit_code: -N` (subprocess convention) where `abs(N)` is the signal number. The shell convention `128 + N` doesn't apply to the JobManager — that's for shell-spawned children.

| exit_code | Means |
|---|---|
| -2 | Killed by SIGINT |
| -9 | Killed by SIGKILL |
| -15 | Killed by SIGTERM |
@@ -0,0 +1,127 @@
---
name: hive.shell-tools-pty-sessions
description: Use when you need state across calls — building env vars, navigating with cd, driving REPLs (python -i, mysql, psql, node), or responding to interactive prompts (sudo password, ssh host-key confirmation, mysql connection). Teaches the prompt-sentinel exec pattern (default mode), raw I/O for REPLs (raw_send=True then read_only=True), the one-in-flight-per-session rule, and the close-or-leak-against-the-cap discipline. Bash on macOS — never zsh; explicit shell=/bin/zsh is rejected. Read before calling shell_pty_open.
metadata:
  author: hive
  type: preset-skill
  version: "1.0"
---

# Persistent PTY sessions

PTY sessions are how you talk to interactive programs — programs that detect a terminal (`isatty()`) and behave differently when they don't see one. Use a session when:

- You need state to persist across calls (`cd`, env vars, sourced scripts)
- You're driving a REPL (`python -i`, `mysql`, `psql`, `node`, `irb`)
- A program demands an interactive prompt (`sudo`, `ssh`, `npm login`, `gh auth login`)

For everything else, `shell_exec` is simpler. Sessions cost more (per-session bash process, ring buffer, idle-reaping bookkeeping) and have a hard cap (`SHELL_TOOLS_MAX_PTY`, default 8).

## Why PTY (and not subprocess pipes)

Subprocess pipes break on every interactive program. The moment a program calls `isatty()` and sees False, it disables prompts, color, line-editing, password masking, progress bars — sometimes it refuses to start at all. A PTY makes us look like a real terminal, so these programs work the same as in your shell.

The cost: PTY output includes terminal escape codes (cursor moves, color codes). The session captures them as-is; if you need clean text, strip ANSI escapes in your processing layer.
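One way to do that stripping (a regex sketch; extend the escape classes if your output uses OSC titles and the like):

```
import re

# CSI sequences (colors, cursor movement) plus bare two-byte ESC codes.
ANSI_RE = re.compile(r"\x1b\[[0-9;?]*[ -/]*[@-~]|\x1b[@-Z\\-_]")

def strip_ansi(text: str) -> str:
    return ANSI_RE.sub("", text)
```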
## Bash on macOS — by deliberate policy

`shell_pty_open` always invokes `/bin/bash`, regardless of the user's `$SHELL`. macOS users: yes, even when zsh is your interactive default. This is the **shell-tools-foundations** policy applied to PTYs.

Reasons:
- zsh has command/builtin classes (`zmodload`, `=cmd` expansion, `zpty`, `ztcp`) that bypass bash-shaped security checks
- One shell behavior across platforms eliminates "works on Linux, breaks on macOS" surprises
- Bash is universal: any shell you've used will accept the bash subset

The bash invocation uses `--norc --noprofile` so user dotfiles don't leak in. PS1 is set to a unique sentinel for prompt detection. PS2 is empty. PROMPT_COMMAND is empty.
## Three modes of `shell_pty_run`

### 1. Default: send command, wait for prompt sentinel

```
shell_pty_run(session_id, command="ls -la")
→ { output, prompt_after: True, ... }
```

The session writes `ls -la\n`, waits for the sentinel that its custom PS1 emits, and returns the slice between submission and prompt. **One in-flight call per session** — a concurrent call returns a `"session busy"` error.

### 2. raw_send: send raw input, no waiting

```
shell_pty_run(session_id, command="print('hi')\n", raw_send=True)
→ { bytes_sent: 12 }
```

For REPLs, vim keystrokes, password prompts. The session writes the bytes and returns immediately — it doesn't wait for a prompt (REPLs don't print bash's prompt; they print their own).

After a `raw_send`, you typically follow with:

### 3. read_only: drain currently-buffered output

```
shell_pty_run(session_id, read_only=True, timeout_sec=2)
→ { output: "hi\n", more: False, ... }
```

Reads whatever the session has accumulated since the last drain, with a brief settle window. Use after raw_send to capture the REPL's response.

## Custom prompt detection (`expect`)

When the command launches a program with its own prompt (Python REPL's `>>> `, mysql's `mysql> `, sudo's password prompt), the bash sentinel won't appear until the program exits. Override:

```
shell_pty_run(session_id, command="python3", expect=r">>>\s*$", timeout_sec=10)
→ output up to and including ">>>", then control returns
```

For sudo:

```
shell_pty_run(session_id, command="sudo -k && sudo whoami", expect=r"[Pp]assword:")
shell_pty_run(session_id, command="<password>\n", raw_send=True)
shell_pty_run(session_id, read_only=True, timeout_sec=5)
```

(Treat passwords carefully — they end up in the ring buffer.)

## Always close

```
shell_pty_close(session_id)
```

Leaked sessions count against `SHELL_TOOLS_MAX_PTY` (default 8). Idle reaping happens lazily on every `_open` call (sessions inactive longer than `idle_timeout_sec`, default 1800s, are dropped) — but don't rely on it. Close when you're done.

For unresponsive sessions, `force=True` skips the graceful "exit" attempt and goes straight to SIGTERM/SIGKILL.
## Common patterns

### Stateful navigation

```
sid = shell_pty_open(cwd="/")
shell_pty_run(sid, command="cd /var/log")
shell_pty_run(sid, command="ls -la *.log | head")
shell_pty_close(sid)
```

### Python REPL

```
sid = shell_pty_open()
shell_pty_run(sid, command="python3", expect=r">>>\s*$")
shell_pty_run(sid, command="x = 42\n", raw_send=True)
shell_pty_run(sid, command="print(x*x)\n", raw_send=True)
result = shell_pty_run(sid, read_only=True)  # → "1764\n>>> "
shell_pty_run(sid, command="exit()\n", raw_send=True)
shell_pty_close(sid)
```

### ssh with host-key prompt

```
sid = shell_pty_open()
shell_pty_run(sid, command="ssh user@new-host", expect=r"\(yes/no.*\)\?")
shell_pty_run(sid, command="yes\n", raw_send=True)
shell_pty_run(sid, read_only=True, timeout_sec=10)  # password prompt or login
```
@@ -0,0 +1,92 @@
---
name: hive.shell-tools-troubleshooting
description: Read when a shell-tools call returned something surprising — empty stdout despite no error, exit_code is null, output_handle came back expired, "too many jobs" / "session busy" / "too many PTYs", warning was set unexpectedly, semantic_status disagrees with exit_code. Diagnostic recipes only — load on demand. Don't preload; the foundational skill covers the happy path.
metadata:
  author: hive
  type: preset-skill
  version: "1.0"
---

# Troubleshooting shell-tools

Recipes for surprising results. Match the symptom to the section.

## Empty `stdout` despite the command "should have" produced output

Possible causes:
1. Output went to **stderr** instead. Check `stderr` in the envelope (or use `merge_stderr=True` for jobs).
2. Output was **fully truncated** because `max_output_kb` is too small. Check `stdout_truncated_bytes > 0`. Bump `max_output_kb` or paginate via `output_handle`.
3. Command produced no output (correct, just unexpected — `silent` flags, no matches).
4. Pipeline issue: the last stage of a pipe ran but stdout went elsewhere (`> /dev/null`, redirected via `2>&1`).
5. Process is buffering its output and didn't flush before exit. Add `stdbuf -oL` (line-buffered) or `unbuffer` to the command.

## `exit_code: null`

| Cause | Other field |
|---|---|
| Auto-backgrounded | `auto_backgrounded: true, job_id: <X>` |
| Hard timeout, process killed | `timed_out: true` |
| Pre-spawn failure (command not found) | `error: ...` set, `pid: null` |
| Still running (in `shell_job_logs`) | `status: "running"` |

## `output_handle` returned `expired: true`

5-minute TTL. Either (a) you waited too long, or (b) the store evicted it under memory pressure (64 MB total cap, LRU eviction). Re-run the command.

To reduce risk: paginate the handle as soon as you receive it, or use `shell_job_*` for huge outputs (4 MB ring buffer with offsets — no expiry).

## "too many jobs" / `JobLimitExceeded`

`SHELL_TOOLS_MAX_JOBS` (default 32) hit. Either:
- Wait for jobs to exit (poll with `shell_job_logs(wait_until_exit=True)`)
- Kill old jobs: `shell_job_manage(action="list")` to see what's running, then `signal_term` the abandoned ones
- Raise the cap via env (rare)

## "session busy"

A `shell_pty_run` was issued while another `_run` is in flight on the same session. PTY sessions are single-threaded conversations. Wait for the prior call to return, or open a second session.

## "PTY cap reached"

`SHELL_TOOLS_MAX_PTY` (default 8) hit. Close idle sessions (`shell_pty_close`). Idle reaping is lazy and only runs during `_open`, which errors out once the cap is hit, so don't count on it; close sessions manually.

## `warning` is set, the command worked

Informational only. The pattern matched (e.g. `rm -rf` literally appears, or `git push --force` was used). The command ran. The warning is your "did I mean to do that?" prompt — verify the side effect was intended before continuing.

## `semantic_status: "ok"` but `exit_code: 1`

Working as designed. Some commands use exit 1 for legitimate non-error states:
- `grep` / `rg` exit 1 when **no matches** found
- `find` exit 1 when **some directories were unreadable** (typical on `/proc`, etc.)
- `diff` exit 1 when **files differ**
- `test` / `[` exit 1 when **condition is false**

The `semantic_message` field explains. Trust `semantic_status`, not raw `exit_code`.

## `semantic_status: "error"` but `exit_code: 0`

Shouldn't happen. If it does, file a bug.

## `truncated_bytes_dropped > 0` in `shell_job_logs`

Your `since_offset` was older than the ring buffer's floor — bytes evicted before you could read them. Either:
- Poll faster (lower latency between calls)
- Use `merge_stderr=True` (single 4 MB ring instead of 4 MB × 2)
- Accept the gap and move forward from `next_offset`

## `shell_pty_open` succeeds but the first `_run` times out

The session may not have produced its first prompt sentinel within the 2-second startup window. Try:
- A `shell_pty_run(sid, read_only=True, timeout_sec=2)` to drain whatever's accumulated
- A noop command (`shell_pty_run(sid, command="true")`) to force a prompt cycle

Could also indicate the bash process died at startup — `shell_pty_run(sid, ...)` would then return `"session has exited"`.

## `shell="/bin/zsh"` returned an error

By design. shell-tools is bash-only on POSIX. Use `shell=True` (default `/bin/bash`) or omit `shell=` to exec directly.

## A command in `shell=True` is interpreted differently than expected

Bash, not zsh, semantics. `**/*` doesn't recurse without `shopt -s globstar`; `=cmd` expansion doesn't work; array subscripts need the full `${arr[idx]}` form (zsh also accepts `$arr[idx]`, and indexes from 1 by default). When in doubt, the foundational skill's "bash, not zsh" section is the canonical statement.
@@ -33,6 +33,7 @@ _BUNDLED_DIRS: tuple[Path, ...] = (
 # (tool-name prefix, skill directory name, display name)
 _TOOL_GATED_SKILLS: list[tuple[str, str, str]] = [
     ("browser_", "browser-automation", "hive.browser-automation"),
+    ("shell_", "shell-tools-foundations", "hive.shell-tools-foundations"),
 ]

 _BODY_CACHE: dict[str, str] = {}
@@ -0,0 +1,19 @@
#!/usr/bin/env python3
"""shell-tools MCP server entry point.

Wired into _DEFAULT_LOCAL_SERVERS in core/framework/loader/mcp_registry.py
so that running ``uv run python shell_tools_server.py --stdio`` from this
directory starts the server. The cwd of ``tools/`` puts ``src/shell_tools``
on the import path via uv's workspace setup.

Usage:
    uv run python shell_tools_server.py --stdio      # for agent integration
    uv run python shell_tools_server.py --port 4004  # HTTP for inspection
"""

from __future__ import annotations

from shell_tools.server import main

if __name__ == "__main__":
    main()
@@ -0,0 +1,43 @@
"""shell-tools — Terminal/shell capabilities MCP server.

Exposes ten tools (prefix ``shell_*``) covering:
- Foreground exec with auto-promotion to background (``shell_exec``)
- Background job lifecycle (``shell_job_*``)
- Persistent PTY-backed bash sessions (``shell_pty_*``)
- Filesystem search (``shell_rg``, ``shell_find``)
- Truncation handle retrieval (``shell_output_get``)

Bash-only on POSIX. zsh is rejected at the shell-resolver level. See
``common/limits.py:_resolve_shell`` for the single enforcement point.
"""

from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from fastmcp import FastMCP


def register_shell_tools(mcp: FastMCP) -> list[str]:
    """Register all ten shell-tools with the FastMCP server.

    Returns the list of registered tool names so the caller can log /
    smoke-test how many landed.
    """
    from shell_tools.exec import register_exec_tools
    from shell_tools.jobs.tools import register_job_tools
    from shell_tools.output import register_output_tools
    from shell_tools.pty.tools import register_pty_tools
    from shell_tools.search.tools import register_search_tools

    register_exec_tools(mcp)
    register_job_tools(mcp)
    register_pty_tools(mcp)
    register_search_tools(mcp)
    register_output_tools(mcp)

    return [name for name in mcp._tool_manager._tools.keys() if name.startswith("shell_")]


__all__ = ["register_shell_tools"]
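For orientation, a minimal driver in the shape the entry point implies (a sketch; the real `shell_tools.server.main` may differ):

```
from fastmcp import FastMCP

from shell_tools import register_shell_tools


def main() -> None:
    mcp = FastMCP("shell-tools")
    names = register_shell_tools(mcp)
    print(f"registered {len(names)} shell_* tools")
    mcp.run()  # stdio transport by default


if __name__ == "__main__":
    main()
```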
@@ -0,0 +1,72 @@
"""Detect potentially destructive commands and surface a warning string.

Informational only — the warning is included in the exec envelope, not
used to block execution. Lets the agent re-read its command before
trusting the result of an irreversible action. Catalog ported from
claudecode's BashTool/destructiveCommandWarning.ts.
"""

from __future__ import annotations

import re
from collections.abc import Sequence

_PATTERNS: tuple[tuple[re.Pattern[str], str], ...] = (
    # Git — data loss / hard to reverse
    (re.compile(r"\bgit\s+reset\s+--hard\b"), "may discard uncommitted changes"),
    (
        re.compile(r"\bgit\s+push\b[^;&|\n]*[ \t](--force|--force-with-lease|-f)\b"),
        "may overwrite remote history",
    ),
    (
        re.compile(r"\bgit\s+clean\b(?![^;&|\n]*(?:-[a-zA-Z]*n|--dry-run))[^;&|\n]*-[a-zA-Z]*f"),
        "may permanently delete untracked files",
    ),
    (re.compile(r"\bgit\s+checkout\s+(--\s+)?\.[ \t]*($|[;&|\n])"), "may discard all working tree changes"),
    (re.compile(r"\bgit\s+restore\s+(--\s+)?\.[ \t]*($|[;&|\n])"), "may discard all working tree changes"),
    (re.compile(r"\bgit\s+stash[ \t]+(drop|clear)\b"), "may permanently remove stashed changes"),
    (
        re.compile(r"\bgit\s+branch\s+(-D[ \t]|--delete\s+--force|--force\s+--delete)\b"),
        "may force-delete a branch",
    ),
    # Git — safety bypass
    (re.compile(r"\bgit\s+(commit|push|merge)\b[^;&|\n]*--no-verify\b"), "may skip safety hooks"),
    (re.compile(r"\bgit\s+commit\b[^;&|\n]*--amend\b"), "may rewrite the last commit"),
    # File deletion — most specific patterns first so the warning is descriptive
    (
        re.compile(r"(^|[;&|\n]\s*)rm\s+-[a-zA-Z]*[rR][a-zA-Z]*f|(^|[;&|\n]\s*)rm\s+-[a-zA-Z]*f[a-zA-Z]*[rR]"),
        "may recursively force-remove files",
    ),
    (re.compile(r"(^|[;&|\n]\s*)rm\s+-[a-zA-Z]*[rR]"), "may recursively remove files"),
    (re.compile(r"(^|[;&|\n]\s*)rm\s+-[a-zA-Z]*f"), "may force-remove files"),
    # Database
    (
        re.compile(r"\b(DROP|TRUNCATE)\s+(TABLE|DATABASE|SCHEMA)\b", re.IGNORECASE),
        "may drop or truncate database objects",
    ),
    (re.compile(r"\bDELETE\s+FROM\s+\w+[ \t]*(;|\"|'|\n|$)", re.IGNORECASE), "may delete rows from a database table"),
    # Infrastructure
    (re.compile(r"\bkubectl\s+delete\b"), "may delete Kubernetes resources"),
    (re.compile(r"\bterraform\s+destroy\b"), "may destroy Terraform infrastructure"),
)


def get_warning(command: str | Sequence[str]) -> str | None:
    """Return a warning string if the command matches a destructive pattern.

    For argv-style invocations (``command=["rm", "-rf", "/tmp/x"]``), we
    join with spaces so the same regex catalog applies. Returns None
    when nothing matches.
    """
    if isinstance(command, (list, tuple)):
        text = " ".join(str(c) for c in command)
    else:
        text = command

    for pattern, message in _PATTERNS:
        if pattern.search(text):
            return message
    return None


__all__ = ["get_warning"]
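Quick sanity checks of the catalog (illustrative):

```
assert get_warning("rm -rf /tmp/x") == "may recursively force-remove files"
assert get_warning(["git", "push", "--force"]) == "may overwrite remote history"
assert get_warning("ls -la") is None
```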
@@ -0,0 +1,153 @@
"""Shell resolution + resource limits.

The single place that decides which shell binary we invoke and how to
strip zsh-specific environment leakage. Per the shell-tools security
stance (see ``destructive_warning.py`` neighbours), zsh constructs
(``zmodload``, ``=cmd``, ``zpty``, ``ztcp``) bypass bash-shaped
checks — refusing zsh isn't aesthetic, it's a deliberate hardening
choice.
"""

from __future__ import annotations

import os
import resource
from collections.abc import Callable
from typing import Any

# Env vars that influence zsh startup. Strip these before exec so a
# user with zsh dotfiles can't accidentally jam zsh behaviour into
# the bash subprocess.
_ZSH_ENV_PREFIXES: tuple[str, ...] = ("ZDOTDIR", "ZSH_")


class ZshRefused(ValueError):
    """Raised when an explicit zsh shell is requested."""


def _resolve_shell(shell: bool | str) -> str | None:
    """Return the shell executable to use, or None for direct exec.

    - ``shell=False`` → None (caller should exec command directly)
    - ``shell=True`` → ``/bin/bash`` always (ignores ``$SHELL``)
    - ``shell="/bin/bash"`` or any path containing ``bash`` → that path
    - ``shell="/bin/zsh"`` or any zsh-containing path → raises ZshRefused

    Caller is expected to invoke as ``[shell_path, "-c", command]``.
    """
    if shell is False or shell is None:
        return None

    if shell is True:
        return "/bin/bash"

    if not isinstance(shell, str):
        raise TypeError(f"shell must be bool or str, got {type(shell).__name__}")

    lower = shell.lower()
    if "zsh" in lower:
        raise ZshRefused(
            f"shell={shell!r} rejected: shell-tools is bash-only on POSIX. "
            "Use shell=True (bash) or omit the shell parameter to exec directly. "
            "This is a deliberate security stance — zsh has command/builtin "
            "classes (zmodload, =cmd, zpty, ztcp) that bypass bash-shaped checks."
        )

    return shell


def sanitized_env(extra: dict[str, str] | None = None) -> dict[str, str]:
    """Return os.environ with zsh-related vars stripped, plus optional overrides.

    Stripping ``ZDOTDIR`` and ``ZSH_*`` ensures zsh dotfiles don't leak
    into the bash subprocess's startup. Bash dotfiles still apply when
    the shell is invoked interactively.
    """
    env = {k: v for k, v in os.environ.items() if not k.startswith(_ZSH_ENV_PREFIXES)}
    if extra:
        env.update(extra)
    return env


# ── Resource limits ───────────────────────────────────────────────────


# Maps the public limit name to its (resource constant, multiplier)
# tuple. Multipliers convert the agent-friendly unit (seconds, MB) to
# the kernel unit (seconds, bytes).
_LIMIT_MAP: dict[str, tuple[int, int]] = {
    "cpu_sec": (resource.RLIMIT_CPU, 1),
    "rss_mb": (resource.RLIMIT_AS, 1024 * 1024),
    "fsize_mb": (resource.RLIMIT_FSIZE, 1024 * 1024),
    "nofile": (resource.RLIMIT_NOFILE, 1),
}


def make_preexec_fn(limits: dict[str, int] | None) -> Callable[[], None] | None:
    """Build a preexec_fn that applies setrlimit before exec.

    Returns None if no limits are configured (so subprocess.Popen can
    skip the fork hook entirely). Unknown keys are ignored — agents
    pass arbitrary dicts and we don't want a typo to crash exec.
    """
    if not limits:
        return None

    def _apply() -> None:
        for key, value in limits.items():
            spec = _LIMIT_MAP.get(key)
            if spec is None or value is None:
                continue
            rlimit_const, multiplier = spec
            limit = int(value) * multiplier
            try:
                resource.setrlimit(rlimit_const, (limit, limit))
            except (OSError, ValueError):
                # Hard limit may exceed the current ceiling. Best-effort:
                # set just the soft limit to whatever we can.
                try:
                    soft, hard = resource.getrlimit(rlimit_const)
                    resource.setrlimit(rlimit_const, (min(limit, hard), hard))
                except Exception:
                    pass

    return _apply


def coerce_limits(limits: Any) -> dict[str, int] | None:
    """Validate and normalize a user-supplied limits dict.

    Accepts the four supported keys (``cpu_sec``, ``rss_mb``,
    ``fsize_mb``, ``nofile``); silently drops unknown keys; returns
    None when the result is empty. Negative or non-int values are
    dropped too — invalid limits are better as no-ops than as errors,
    since the agent didn't ask for enforcement of a *specific*
    failure mode.
    """
    if not limits:
        return None
    if not isinstance(limits, dict):
        return None

    out: dict[str, int] = {}
    for key in _LIMIT_MAP:
        value = limits.get(key)
        if value is None:
            continue
        try:
            ivalue = int(value)
        except (TypeError, ValueError):
            continue
        if ivalue <= 0:
            continue
        out[key] = ivalue
    return out or None


__all__ = [
    "ZshRefused",
    "_resolve_shell",
    "coerce_limits",
    "make_preexec_fn",
    "sanitized_env",
]
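How the pieces compose at a call site (a sketch):

```
import subprocess

limits = coerce_limits({"cpu_sec": 5, "rss_mb": 512, "not_a_key": 1})  # typo key dropped
proc = subprocess.Popen(
    ["/bin/bash", "-c", "echo hi"],
    env=sanitized_env(),
    preexec_fn=make_preexec_fn(limits),  # None when no valid limits → no fork hook
)
proc.wait()
```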
@@ -0,0 +1,121 @@
"""TTL-bounded output handle store.

When an exec produces more output than the inline cap (default 256 KB),
the surplus is kept here under a short-lived handle. The agent passes
the handle to ``shell_output_get`` to paginate the rest. Handles
expire after 5 minutes; total store size is capped at 64 MB with LRU
eviction so the server can't be DoS'd by a chatty subprocess.

Thread-safe — exec/job code paths populate; the MCP request thread
drains.
"""

from __future__ import annotations

import secrets
import threading
import time
from dataclasses import dataclass, field

_DEFAULT_TTL_SEC = 300
_DEFAULT_TOTAL_CAP_BYTES = 64 * 1024 * 1024


@dataclass(slots=True)
class _Entry:
    data: bytes
    created_at: float
    last_accessed_at: float = field(default_factory=time.monotonic)


class OutputStore:
    """LRU-with-TTL byte store keyed by opaque handle."""

    def __init__(
        self,
        ttl_sec: int = _DEFAULT_TTL_SEC,
        total_cap_bytes: int = _DEFAULT_TOTAL_CAP_BYTES,
    ):
        self._ttl = ttl_sec
        self._cap = total_cap_bytes
        self._entries: dict[str, _Entry] = {}
        self._total_bytes = 0
        self._lock = threading.Lock()

    def put(self, data: bytes) -> str:
        """Store ``data``, return a fresh handle. Evicts older entries
        if the total cap would be exceeded."""
        if not data:
            # Empty payloads don't need a handle.
            return ""
        handle = "out_" + secrets.token_hex(8)
        now = time.monotonic()
        with self._lock:
            self._evict_locked(now)
            # Reserve room for new entry; evict LRU until it fits.
            while self._total_bytes + len(data) > self._cap and self._entries:
                self._pop_lru_locked()
            self._entries[handle] = _Entry(data=data, created_at=now, last_accessed_at=now)
            self._total_bytes += len(data)
        return handle

    def get(self, handle: str, since_offset: int = 0, max_bytes: int = 64 * 1024) -> dict:
        """Retrieve a slice of stored data.

        Returns ``{data, offset, next_offset, eof, expired}`` so the
        agent can paginate without separate calls. ``expired=True``
        when the handle is unknown or the TTL has lapsed.
        """
        now = time.monotonic()
        with self._lock:
            self._evict_locked(now)
            entry = self._entries.get(handle)
            if entry is None:
                return {
                    "data": "",
                    "offset": int(since_offset),
                    "next_offset": int(since_offset),
                    "eof": True,
                    "expired": True,
                }
            entry.last_accessed_at = now
            buf = entry.data

        since = max(0, int(since_offset))
        end = min(len(buf), since + max(0, int(max_bytes)))
        data_slice = buf[since:end]
        return {
            "data": data_slice.decode("utf-8", errors="replace"),
            "offset": since,
            "next_offset": end,
            "eof": end >= len(buf),
            "expired": False,
        }

    # ── Eviction ──────────────────────────────────────────────────

    def _evict_locked(self, now: float) -> None:
        # TTL eviction — anything past TTL goes.
        stale = [h for h, e in self._entries.items() if now - e.created_at > self._ttl]
        for h in stale:
            entry = self._entries.pop(h, None)
            if entry is not None:
                self._total_bytes -= len(entry.data)

    def _pop_lru_locked(self) -> None:
        if not self._entries:
            return
        oldest_handle = min(self._entries, key=lambda h: self._entries[h].last_accessed_at)
        entry = self._entries.pop(oldest_handle)
        self._total_bytes -= len(entry.data)


# Module-level singleton; the server has one instance per process.
_STORE = OutputStore()


def get_store() -> OutputStore:
    return _STORE


__all__ = ["OutputStore", "get_store"]
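
# Usage sketch (illustrative only; not part of the module): paginate a stored
# blob the same way ``shell_output_get`` does. ``consume`` is a hypothetical
# stand-in for whatever the caller does with each page.
#
#     store = OutputStore(ttl_sec=60, total_cap_bytes=1024 * 1024)
#     handle = store.put(b"x" * 200_000)
#     offset = 0
#     while True:
#         page = store.get(handle, since_offset=offset, max_bytes=64 * 1024)
#         consume(page["data"])
#         if page["eof"] or page["expired"]:
#             break
#         offset = page["next_offset"]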
@@ -0,0 +1,155 @@
"""Bounded byte ring buffer with absolute monotonic offsets.

The streaming primitive shared by jobs and PTY sessions. Writers push
bytes; readers ask for ``[since_offset, since_offset + N)`` and the
buffer either returns the data (if still in window) or signals how
many bytes were dropped from the floor. This lets the agent resume
after a missed poll without silent loss.

Thread-safe via a single lock — readers and writers can come from
different threads (a pump thread fills it, the MCP request thread
drains it).
"""

from __future__ import annotations

import threading
from collections import deque
from dataclasses import dataclass


@dataclass(slots=True)
class ReadResult:
    data: bytes
    offset: int
    next_offset: int
    truncated_bytes_dropped: int  # bytes lost between since_offset and the buffer floor


class RingBuffer:
    """Capacity-bounded byte ring with absolute offsets.

    The total written count never resets; each call sees absolute
    offsets growing monotonically. The in-memory window slides forward
    once total_written exceeds capacity_bytes.
    """

    def __init__(self, capacity_bytes: int = 4 * 1024 * 1024):
        if capacity_bytes <= 0:
            raise ValueError("capacity_bytes must be positive")
        self._capacity = capacity_bytes
        self._chunks: deque[bytes] = deque()
        self._buffered_bytes = 0
        self._floor = 0  # absolute offset of the oldest byte still in buffer
        self._total_written = 0
        self._eof = False
        self._lock = threading.Lock()

    # ── Writer side ───────────────────────────────────────────────

    def write(self, data: bytes) -> None:
        if not data:
            return
        with self._lock:
            self._chunks.append(data)
            self._buffered_bytes += len(data)
            self._total_written += len(data)
            self._evict_locked()

    def close(self) -> None:
        """Mark the stream as ended. Subsequent reads will see eof=True
        once they catch up to total_written."""
        with self._lock:
            self._eof = True

    def _evict_locked(self) -> None:
        while self._buffered_bytes > self._capacity and self._chunks:
            head = self._chunks[0]
            overshoot = self._buffered_bytes - self._capacity
            if len(head) <= overshoot:
                self._chunks.popleft()
                self._buffered_bytes -= len(head)
                self._floor += len(head)
            else:
                self._chunks[0] = head[overshoot:]
                self._buffered_bytes -= overshoot
                self._floor += overshoot

    # ── Reader side ───────────────────────────────────────────────

    @property
    def total_written(self) -> int:
        with self._lock:
            return self._total_written

    @property
    def floor(self) -> int:
        with self._lock:
            return self._floor

    @property
    def eof(self) -> bool:
        with self._lock:
            return self._eof

    def read(self, since_offset: int, max_bytes: int) -> ReadResult:
        """Read up to ``max_bytes`` starting at ``since_offset``.

        - If ``since_offset`` is past total_written, returns empty data
          (and ``next_offset == since_offset``, signaling caller to wait).
        - If ``since_offset`` is below the buffer floor, the missed
          bytes are reported as ``truncated_bytes_dropped`` and reading
          starts from the floor.
        """
        max_bytes = max(0, int(max_bytes))
        with self._lock:
            since = max(0, int(since_offset))
            dropped = 0
            if since < self._floor:
                dropped = self._floor - since
                since = self._floor

            available = self._total_written - since
            if available <= 0 or max_bytes == 0:
                return ReadResult(
                    data=b"",
                    offset=since,
                    next_offset=since,
                    truncated_bytes_dropped=dropped,
                )

            to_take = min(available, max_bytes)
            # Walk chunks to assemble [since, since+to_take)
            cursor = self._floor
            collected: list[bytes] = []
            remaining = to_take
            for chunk in self._chunks:
                chunk_end = cursor + len(chunk)
                if chunk_end <= since:
                    cursor = chunk_end
                    continue
                start_in_chunk = max(0, since - cursor)
                end_in_chunk = min(len(chunk), start_in_chunk + remaining)
                slice_ = chunk[start_in_chunk:end_in_chunk]
                collected.append(slice_)
                remaining -= len(slice_)
                cursor = chunk_end
                if remaining <= 0:
                    break

            data = b"".join(collected)
            return ReadResult(
                data=data,
                offset=since,
                next_offset=since + len(data),
                truncated_bytes_dropped=dropped,
            )

    def tail(self, max_bytes: int) -> ReadResult:
        """Read the last ``max_bytes`` (or as much as is buffered)."""
        # Compute the start offset under the lock, then delegate to
        # ``read`` *after* releasing it — ``threading.Lock`` is not
        # reentrant, so calling read() while still holding the lock
        # would deadlock.
        with self._lock:
            start = max(self._floor, self._total_written - max(0, int(max_bytes)))
        return self.read(start, max_bytes)


__all__ = ["RingBuffer", "ReadResult"]
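
# Offset-resume sketch (illustrative): a reader that missed polls while the
# window slid forward learns exactly how many bytes it lost.
#
#     ring = RingBuffer(capacity_bytes=8)
#     ring.write(b"abcdefgh")
#     ring.write(b"ij")                      # floor slides to 2; "ab" evicted
#     r = ring.read(since_offset=0, max_bytes=64)
#     assert r.truncated_bytes_dropped == 2  # "ab" is gone
#     assert r.data == b"cdefghij"
#     assert r.next_offset == 10             # resume from here next poll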
@@ -0,0 +1,103 @@
"""Per-command exit-code semantics.

Many commands use exit codes to convey information other than just
success/failure. ``grep`` returns 1 when no matches are found, which
is not an error. Encoding this lookup means the agent reads
``semantic_status`` instead of having to memorize per-command quirks.

Catalog ported from claudecode's BashTool/commandSemantics.ts. We
inspect only the *final* command in a piped chain (its exit code is
what the shell propagates). With ``shell=False`` we know the argv
exactly; for ``shell=True`` strings we fall back to a best-effort
split on pipe/chain separators — parsing a bash command string for
"the last command" is fragile, and the upstream tool already
documents the issue.
"""

from __future__ import annotations

from collections.abc import Sequence

SemanticStatus = str  # "ok" | "signal" | "error"


# Maps base command name → (exit_code → semantic). Returning
# (status, message) — message may be None for the success cases.
_SEMANTICS: dict[str, dict[int, tuple[SemanticStatus, str | None]]] = {
    # grep: 0=matches, 1=no matches (not an error), 2+=error
    "grep": {0: ("ok", None), 1: ("ok", "No matches found")},
    "rg": {0: ("ok", None), 1: ("ok", "No matches found")},
    "ripgrep": {0: ("ok", None), 1: ("ok", "No matches found")},
    # find: 0=success, 1=partial (some dirs unreadable), 2+=error
    "find": {0: ("ok", None), 1: ("ok", "Some directories were inaccessible")},
    # diff: 0=identical, 1=differ (informational), 2+=error
    "diff": {0: ("ok", None), 1: ("ok", "Files differ")},
    # test / [: 0=true, 1=false, 2+=error
    "test": {0: ("ok", None), 1: ("ok", "Condition is false")},
    "[": {0: ("ok", None), 1: ("ok", "Condition is false")},
}


def _base_command(command: str | Sequence[str]) -> str:
    """Extract the base command (first word) from argv or a string.

    For shell=True strings, picks the *last* command in a pipeline since
    that determines the propagated exit code. Heuristic and intentionally
    not security-critical — only used to label the exit-code semantics.
    """
    if isinstance(command, (list, tuple)):
        # Strip a leading path so argv like ["/usr/bin/grep", ...] still
        # matches the catalog, same as the string path below.
        return command[0].rsplit("/", 1)[-1] if command else ""

    if not isinstance(command, str):
        return ""

    # Take the segment after the last unquoted pipe/&&/||/; — best-effort.
    text = command
    for sep in ("||", "&&", "|", ";"):
        # Crude split — fine for the heuristic.
        if sep in text:
            text = text.split(sep)[-1]

    text = text.strip()
    if not text:
        return ""
    first = text.split()[0]
    # Strip a leading path: /usr/bin/grep → grep
    return first.rsplit("/", 1)[-1]


def classify(
    command: str | Sequence[str],
    exit_code: int | None,
    *,
    timed_out: bool = False,
    signaled: bool = False,
) -> tuple[SemanticStatus, str | None]:
    """Classify an exit code with command-specific semantics.

    Returns (status, message) where status is one of "ok"/"signal"/"error"
    and message is a short explanation when the status would otherwise
    surprise the agent (e.g. ``grep`` exiting 1).
    """
    if timed_out:
        return ("error", "Command timed out")
    if signaled:
        return ("signal", f"Killed by signal (exit {exit_code})")
    if exit_code is None:
        return ("ok", "Still running")  # auto-backgrounded case

    base = _base_command(command)
    table = _SEMANTICS.get(base)
    if table is not None:
        if exit_code in table:
            return table[exit_code]
        # Beyond the catalog's known codes for this command, treat as error.
        return ("error", f"Command failed with exit code {exit_code}")

    # Default: zero is success, nonzero is error.
    if exit_code == 0:
        return ("ok", None)
    return ("error", f"Command failed with exit code {exit_code}")


__all__ = ["classify"]
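
# Spot-checks of the catalog semantics (illustrative, derived from the
# table and classify() above):
#
#     classify(["rg", "TODO", "src"], 1)   == ("ok", "No matches found")
#     classify(["grep", "-r", "x"], 2)     == ("error", "Command failed with exit code 2")
#     classify("cat f | grep x", 1)        == ("ok", "No matches found")  # last cmd in pipe
#     classify(["make"], None)             == ("ok", "Still running")     # auto-backgrounded
#     classify(["sleep", "9"], -15, signaled=True) == ("signal", "Killed by signal (exit -15)")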
@@ -0,0 +1,105 @@
"""Helpers to build the standard exec/job envelope with truncation.

The envelope shape is documented in the foundational skill — keep
this module's output stable so skill updates don't have to chase
field renames. Callers pass raw bytes; we decode and trim.
"""

from __future__ import annotations

from collections.abc import Sequence

from shell_tools.common.destructive_warning import get_warning
from shell_tools.common.output_store import get_store
from shell_tools.common.semantic_exit import classify


def _truncate_bytes(buf: bytes, max_bytes: int) -> tuple[str, int]:
    """Trim ``buf`` to ``max_bytes`` (decoded). Returns
    ``(decoded_text, dropped_bytes)``. The handle store keeps the
    *original* bytes, so the agent gets exactly what the process
    emitted even when the truncation point split a multi-byte char.
    """
    if max_bytes < 0:
        max_bytes = 0
    if len(buf) <= max_bytes:
        return buf.decode("utf-8", errors="replace"), 0

    head = buf[:max_bytes]
    return head.decode("utf-8", errors="replace"), len(buf) - max_bytes


def build_exec_envelope(
    *,
    command: str | Sequence[str],
    exit_code: int | None,
    stdout_bytes: bytes,
    stderr_bytes: bytes,
    runtime_ms: int,
    pid: int | None,
    timed_out: bool,
    signaled: bool = False,
    max_output_kb: int = 256,
    auto_backgrounded: bool = False,
    job_id: str | None = None,
) -> dict:
    """Construct the standard exec envelope.

    See ``shell-tools-foundations`` SKILL for the field semantics. The
    inline ``stdout``/``stderr`` are decoded and trimmed; if either
    overflows ``max_output_kb`` the *full* bytes are stashed in the
    output store under ``output_handle`` for retrieval via
    ``shell_output_get``. Both streams go into a single handle,
    concatenated with ``--- stdout ---`` / ``--- stderr ---``
    separators, so the agent can fetch either stream in full.
    """
    max_bytes = max(1024, max_output_kb * 1024)

    stdout_text, stdout_dropped = _truncate_bytes(stdout_bytes, max_bytes)
    stderr_text, stderr_dropped = _truncate_bytes(stderr_bytes, max_bytes)

    output_handle: str | None = None
    if stdout_dropped > 0 or stderr_dropped > 0:
        store = get_store()
        # Stash both streams when either overflows (joined with the
        # separators the foundational skill documents) so the agent
        # can fetch the other stream in full too if it wants.
        combined = (
            b"--- stdout ---\n"
            + stdout_bytes
            + b"\n--- stderr ---\n"
            + stderr_bytes
        )
        output_handle = store.put(combined)

    semantic_status, semantic_message = classify(
        command, exit_code, timed_out=timed_out, signaled=signaled
    )

    warning = get_warning(command)

    return {
        "exit_code": exit_code,
        "stdout": stdout_text,
        "stderr": stderr_text,
        "stdout_truncated_bytes": stdout_dropped,
        "stderr_truncated_bytes": stderr_dropped,
        "runtime_ms": int(runtime_ms),
        "pid": int(pid) if pid is not None else None,
        "output_handle": output_handle,
        "timed_out": bool(timed_out),
        "semantic_status": semantic_status,
        "semantic_message": semantic_message,
        "warning": warning,
        "auto_backgrounded": bool(auto_backgrounded),
        "job_id": job_id,
    }


__all__ = ["build_exec_envelope"]
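
# Envelope shape on overflow (illustrative; ``big`` is a hypothetical
# oversized payload, values are made up):
#
#     env = build_exec_envelope(
#         command=["seq", "1", "1000000"],
#         exit_code=0,
#         stdout_bytes=big,                  # > max_output_kb * 1024
#         stderr_bytes=b"",
#         runtime_ms=1234,
#         pid=4242,
#         timed_out=False,
#     )
#     env["stdout_truncated_bytes"] > 0      # surplus went to the store
#     env["output_handle"]                   # "out_<hex>"; feed to shell_output_get
#     env["semantic_status"] == "ok"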
@@ -0,0 +1,269 @@
"""``shell_exec`` — foreground exec with auto-promotion to background.

The flagship tool. Most agent terminal interactions go through here:
fast commands (<30s) return inline with the standard envelope; longer
commands silently transition into the JobManager and surface a
``job_id`` so the agent can poll. The "should I background this?"
decision is removed — the answer is always yes-if-needed.

Implementation notes:
- We spawn the process the same way JobManager does, then wait with
  ``proc.wait(timeout=auto_background_after_sec)``. The inline path
  drains pipes via pump threads into ring buffers to avoid pipe-fill
  deadlocks.
- Auto-promotion: when the budget fires while the process is still
  running, we already have its stdin/stdout/stderr file objects.
  We hand the already-pumping ring buffers to JobManager, which keeps
  filling them from that point on. The agent sees an envelope with
  ``auto_backgrounded=True, exit_code=None, job_id=<…>`` and
  transitions to ``shell_job_logs``. **There's no early-output loss**
  because the pumps start before we return from the tool call.
- For pure-foreground use (``auto_background_after_sec=0``), the wait
  budget collapses to ``timeout_sec`` and a timeout means the simpler
  "kill on overall timeout" semantics instead of promotion.
"""

from __future__ import annotations

import subprocess
import threading
import time
from typing import TYPE_CHECKING

from shell_tools.common.limits import (
    ZshRefused,
    _resolve_shell,
    coerce_limits,
    make_preexec_fn,
    sanitized_env,
)
from shell_tools.common.ring_buffer import RingBuffer
from shell_tools.common.truncation import build_exec_envelope
from shell_tools.jobs.manager import JobLimitExceeded, get_manager

if TYPE_CHECKING:
    from fastmcp import FastMCP


def register_exec_tools(mcp: FastMCP) -> None:
    @mcp.tool()
    def shell_exec(
        command: str,
        cwd: str | None = None,
        env: dict[str, str] | None = None,
        timeout_sec: int = 60,
        auto_background_after_sec: int = 30,
        shell: bool = False,
        stdin: str | None = None,
        limits: dict[str, int] | None = None,
        max_output_kb: int = 256,
    ) -> dict:
        """Run a shell command and capture its output.

        Past auto_background_after_sec, the call auto-promotes to a background
        job and returns immediately with `auto_backgrounded=True, job_id=...`
        — poll with shell_job_logs(job_id, since_offset=...) to read the rest.
        Set auto_background_after_sec=0 to force pure foreground (kill on
        timeout_sec).

        Bash-only on POSIX. Any request for zsh raises ZshRefused — this is
        a deliberate security stance.

        Args:
            command: The command. With shell=False we naively split on
                whitespace; for pipes / quoting / globs use shell=True.
            cwd: Working directory.
            env: Environment override (merged into a sanitized base — zsh
                dotfile vars are stripped).
            timeout_sec: Hard kill deadline. Past this, the process is
                terminated and `timed_out=True` is returned. Should be ≥
                auto_background_after_sec for the auto-promote path to work.
            auto_background_after_sec: Inline budget. Past this, promote to
                a background job and return. 0 disables auto-promotion.
            shell: True for `/bin/bash -c <command>`. zsh refused.
            stdin: Optional stdin payload (string).
            limits: Optional setrlimit caps. Keys: cpu_sec, rss_mb,
                fsize_mb, nofile.
            max_output_kb: Inline output cap. Overflow stashes to an
                output_handle for retrieval via shell_output_get.

        Returns the standard envelope: see `shell-tools-foundations` skill.
        """
        try:
            argv: list[str] | str
            if shell:
                argv = command
            else:
                argv = command.split()
                if not argv:
                    return _err_envelope(command, "command was empty")

            full_env = sanitized_env(env) if env is not None else None
            preexec = make_preexec_fn(coerce_limits(limits))
        except ZshRefused as e:
            return _err_envelope(command, str(e))

        # Resolve shell here so the same logic the JobManager uses applies
        # in both the inline + promoted paths.
        try:
            resolved_shell = _resolve_shell(shell)
        except ZshRefused as e:
            return _err_envelope(command, str(e))

        if resolved_shell is not None:
            spawn_argv: list[str] = [resolved_shell, "-c", command]
        else:
            spawn_argv = list(argv) if isinstance(argv, list) else [str(argv)]

        start = time.monotonic()
        try:
            proc = subprocess.Popen(
                spawn_argv,
                cwd=cwd,
                env=full_env,
                stdin=subprocess.PIPE if stdin is not None else None,
                stdout=subprocess.PIPE,
                stderr=subprocess.PIPE,
                preexec_fn=preexec,
                close_fds=True,
                bufsize=0,
            )
        except FileNotFoundError as e:
            return _err_envelope(command, f"command not found: {e}")
        except OSError as e:
            return _err_envelope(command, f"spawn failed: {e}")

        # Push stdin without blocking on the process draining it. For
        # large stdin payloads this would deadlock; for typical agent
        # use (small payloads or None) it's fine.
        if stdin is not None and proc.stdin is not None:
            try:
                proc.stdin.write(stdin.encode("utf-8"))
                proc.stdin.close()
            except (BrokenPipeError, OSError):
                pass

        # Pump stdout/stderr into ring buffers so we don't deadlock on
        # full pipes during the wait. These same buffers become the
        # job's buffers if we auto-promote.
        stdout_buf = RingBuffer()
        stderr_buf = RingBuffer()
        pumps: list[threading.Thread] = []

        def _pump(stream, ring: RingBuffer) -> None:
            try:
                while True:
                    chunk = stream.read(4096)
                    if not chunk:
                        break
                    ring.write(chunk)
            except (OSError, ValueError):
                pass
            finally:
                try:
                    stream.close()
                except Exception:
                    pass
                ring.close()

        if proc.stdout is not None:
            t = threading.Thread(target=_pump, args=(proc.stdout, stdout_buf), daemon=True)
            t.start()
            pumps.append(t)
        if proc.stderr is not None:
            t = threading.Thread(target=_pump, args=(proc.stderr, stderr_buf), daemon=True)
            t.start()
            pumps.append(t)

        # Wait for either: auto-bg budget, hard timeout, or natural exit.
        timed_out = False
        budget = auto_background_after_sec if auto_background_after_sec > 0 else timeout_sec
        budget = min(budget, timeout_sec) if timeout_sec > 0 else budget

        try:
            proc.wait(timeout=budget if budget > 0 else None)
        except subprocess.TimeoutExpired:
            if auto_background_after_sec > 0:
                # Promote: the process keeps running, we hand its
                # already-pumping buffers to the JobManager.
                try:
                    record = get_manager().adopt_running(
                        proc,
                        spawn_argv if resolved_shell is None else command,
                        merged=False,
                        existing_stdout_buf=stdout_buf,
                        existing_stderr_buf=stderr_buf,
                        existing_pumps=pumps,
                    )
                    return build_exec_envelope(
                        command=command,
                        exit_code=None,
                        stdout_bytes=stdout_buf.tail(64 * 1024).data,
                        stderr_bytes=stderr_buf.tail(64 * 1024).data,
                        runtime_ms=int((time.monotonic() - start) * 1000),
                        pid=proc.pid,
                        timed_out=False,
                        max_output_kb=max_output_kb,
                        auto_backgrounded=True,
                        job_id=record.job_id,
                    )
                except JobLimitExceeded:
                    # Cap reached; treat as a hard timeout rather than spin.
                    pass
            # Fall through to hard-kill path.
            try:
                proc.terminate()
                proc.wait(timeout=2.0)
            except subprocess.TimeoutExpired:
                proc.kill()
                proc.wait()
            timed_out = True

        # Inline path: drain pump threads.
        for t in pumps:
            t.join(timeout=2.0)

        runtime_ms = int((time.monotonic() - start) * 1000)
        # The promoted path returned above, so reaching here means the
        # process has exited (or was killed) and returncode is set.
        exit_code = proc.returncode

        # The whole stream is in the ring; read from offset 0 to grab everything.
        stdout_full = stdout_buf.read(0, stdout_buf.total_written).data
        stderr_full = stderr_buf.read(0, stderr_buf.total_written).data

        return build_exec_envelope(
            command=command,
            exit_code=exit_code,
            stdout_bytes=stdout_full,
            stderr_bytes=stderr_full,
            runtime_ms=runtime_ms,
            pid=proc.pid,
            timed_out=timed_out,
            signaled=(exit_code is not None and exit_code < 0),
            max_output_kb=max_output_kb,
        )


def _err_envelope(command: str, message: str) -> dict:
    """Construct an envelope-shaped error reply for pre-spawn failures."""
    return {
        "exit_code": None,
        "stdout": "",
        "stderr": message,
        "stdout_truncated_bytes": 0,
        "stderr_truncated_bytes": 0,
        "runtime_ms": 0,
        "pid": None,
        "output_handle": None,
        "timed_out": False,
        "semantic_status": "error",
        "semantic_message": message,
        "warning": None,
        "auto_backgrounded": False,
        "job_id": None,
        "error": message,
    }


__all__ = ["register_exec_tools"]
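
# Agent-side flow sketch (illustrative pseudo-calls in the agent's tool-call
# notation; handles both outcomes of shell_exec):
#
#     env = shell_exec("make -j8 all", timeout_sec=600, auto_background_after_sec=30)
#     if env["auto_backgrounded"]:
#         offset = 0
#         while True:
#             page = shell_job_logs(env["job_id"], since_offset=offset, wait_until_exit=True)
#             offset = page["next_offset"]
#             if page["status"] == "exited" and page["eof"]:
#                 break
#     elif env["semantic_status"] != "ok":
#         ...  # inspect env["stderr"] / env["semantic_message"]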
@@ -0,0 +1,6 @@
"""Background job management for shell-tools."""

from shell_tools.jobs.manager import JobManager, JobRecord, get_manager
from shell_tools.jobs.tools import register_job_tools

__all__ = ["JobManager", "JobRecord", "get_manager", "register_job_tools"]
@@ -0,0 +1,424 @@
"""Background job manager.

Owns the long-lived ``Popen`` instances backing ``shell_job_*`` and
``shell_exec`` auto-promotion. Each job has up to two ring buffers
(stdout / stderr, or one merged) fed by background pump threads.

Design notes:
- We don't use asyncio here. FastMCP's tool handlers run in a worker
  thread; subprocess + threads compose more naturally with that
  model than asyncio Subprocess (which would need its own loop).
- ``shell_exec`` "promotes" by adopting an already-running Popen
  into the manager — it doesn't re-spawn. The pump threads were
  already filling buffers in the exec path.
- Hard concurrency cap (env: ``SHELL_TOOLS_MAX_JOBS``, default 32).
  The cap is the only non-bypassable safety pin per the soft-
  guardrails design.
- On server shutdown the lifespan hook calls ``shutdown_all()``
  which TERMs every child, waits 2s, then KILLs. Eliminates
  orphans.
"""

from __future__ import annotations

import os
import secrets
import signal
import subprocess
import threading
import time
from collections.abc import Sequence
from dataclasses import dataclass, field
from typing import Any

from shell_tools.common.ring_buffer import RingBuffer

_MAX_JOBS_DEFAULT = 32
_DEFAULT_RING_BYTES = 4 * 1024 * 1024
_RECENT_EXIT_KEEP = 50  # exited jobs we still surface to ``shell_job_manage(action="list")``


@dataclass(slots=True)
class JobRecord:
    job_id: str
    pid: int
    name: str
    command: str | list[str]
    started_at: float
    proc: subprocess.Popen[bytes]
    stdout_buf: RingBuffer | None
    stderr_buf: RingBuffer | None
    merged: bool
    pumps: list[threading.Thread] = field(default_factory=list)
    exited_at: float | None = None
    exit_code: int | None = None
    signaled: bool = False
    # Adopted=True when the job started life as a foreground shell_exec
    # and was promoted past the auto-background budget.
    adopted: bool = False

    @property
    def status(self) -> str:
        return "exited" if self.exited_at is not None else "running"

    def runtime_ms(self) -> int:
        end = self.exited_at if self.exited_at is not None else time.monotonic()
        return int((end - self.started_at) * 1000)

    def to_summary(self) -> dict[str, Any]:
        return {
            "job_id": self.job_id,
            "pid": self.pid,
            "name": self.name,
            "command": self.command,
            "started_at": self.started_at,
            "status": self.status,
            "exit_code": self.exit_code,
            "runtime_ms": self.runtime_ms(),
            "merged": self.merged,
            "stdout_bytes": (self.stdout_buf.total_written if self.stdout_buf else 0),
            "stderr_bytes": (self.stderr_buf.total_written if self.stderr_buf else 0),
            "adopted": self.adopted,
        }


class JobLimitExceeded(RuntimeError):
    """Raised when the per-server concurrent-job cap would be exceeded."""


class JobManager:
    def __init__(self, max_jobs: int | None = None, ring_bytes: int = _DEFAULT_RING_BYTES):
        self._max_jobs = max_jobs or int(os.getenv("SHELL_TOOLS_MAX_JOBS", str(_MAX_JOBS_DEFAULT)))
        self._ring_bytes = ring_bytes
        self._jobs: dict[str, JobRecord] = {}
        # FIFO of recently-exited job_ids so list/inspect can still
        # find them for a while after exit.
        self._exited_order: list[str] = []
        self._lock = threading.Lock()

    # ── Public API ────────────────────────────────────────────────

    def active_count(self) -> int:
        with self._lock:
            return sum(1 for j in self._jobs.values() if j.exited_at is None)

    def start(
        self,
        command: str | Sequence[str],
        *,
        cwd: str | None = None,
        env: dict[str, str] | None = None,
        shell: bool | str = False,
        merge_stderr: bool = False,
        name: str | None = None,
        preexec_fn=None,
    ) -> JobRecord:
        """Spawn a process and start pumping its output into ring buffers."""
        if self.active_count() >= self._max_jobs:
            raise JobLimitExceeded(
                f"shell-tools job cap reached ({self._max_jobs}). "
                "Wait for a job to finish or raise SHELL_TOOLS_MAX_JOBS."
            )

        proc = self._spawn(
            command, cwd=cwd, env=env, shell=shell, merge_stderr=merge_stderr, preexec_fn=preexec_fn
        )
        return self._adopt(proc, command, name=name, merged=merge_stderr)

    def adopt_running(
        self,
        proc: subprocess.Popen[bytes],
        command: str | Sequence[str],
        *,
        name: str | None = None,
        merged: bool = False,
        existing_stdout_buf: RingBuffer | None = None,
        existing_stderr_buf: RingBuffer | None = None,
        existing_pumps: list[threading.Thread] | None = None,
    ) -> JobRecord:
        """Adopt a Popen that's already running with pumps in flight.

        Used by ``shell_exec`` for auto-promotion: the foreground path
        had already started pump threads filling its own ring buffers.
        We hand the buffers + pumps over to the manager so the agent
        can resume reading via ``shell_job_logs``.
        """
        if self.active_count() >= self._max_jobs:
            # Mid-call cap exceeded — kill and report.
            try:
                proc.terminate()
            except Exception:
                pass
            raise JobLimitExceeded(
                f"shell-tools job cap reached ({self._max_jobs}); "
                "foreground exec was killed during auto-promotion."
            )
        record = self._wrap(
            proc,
            command,
            name=name,
            merged=merged,
            stdout_buf=existing_stdout_buf,
            stderr_buf=existing_stderr_buf,
            pumps=existing_pumps,
            adopted=True,
        )
        with self._lock:
            self._jobs[record.job_id] = record
        # Watcher only — pumps already running.
        threading.Thread(target=self._watch_for_exit, args=(record,), daemon=True).start()
        return record

    def get(self, job_id: str) -> JobRecord | None:
        with self._lock:
            return self._jobs.get(job_id)

    def list(self) -> list[dict]:
        with self._lock:
            jobs = list(self._jobs.values())
        # Recent first — running, then exited by exit time descending.
        jobs.sort(
            key=lambda j: (j.exited_at is not None, -(j.exited_at or j.started_at)),
        )
        return [j.to_summary() for j in jobs]

    def signal(self, job_id: str, signum: int) -> bool:
        record = self.get(job_id)
        if record is None or record.exited_at is not None:
            return False
        try:
            record.proc.send_signal(signum)
            return True
        except (ProcessLookupError, OSError):
            return False

    def write_stdin(self, job_id: str, data: bytes, *, close_after: bool = False) -> int:
        record = self.get(job_id)
        if record is None or record.proc.stdin is None or record.exited_at is not None:
            return 0
        try:
            n = record.proc.stdin.write(data)
            record.proc.stdin.flush()
            if close_after:
                record.proc.stdin.close()
            return int(n or len(data))
        except (BrokenPipeError, OSError):
            return 0

    def close_stdin(self, job_id: str) -> bool:
        record = self.get(job_id)
        if record is None or record.proc.stdin is None:
            return False
        try:
            record.proc.stdin.close()
            return True
        except OSError:
            return False

    def wait(self, job_id: str, timeout_sec: float | None = None) -> JobRecord | None:
        """Block until the job exits or ``timeout_sec`` elapses. Returns
        the (possibly still-running) record so callers can read final state."""
        record = self.get(job_id)
        if record is None:
            return None
        try:
            record.proc.wait(timeout=timeout_sec)
        except subprocess.TimeoutExpired:
            pass
        return record

    def shutdown_all(self, grace_sec: float = 2.0) -> None:
        """SIGTERM every running job, wait ``grace_sec``, then SIGKILL.
        Called from the FastMCP lifespan hook. Idempotent."""
        with self._lock:
            running = [j for j in self._jobs.values() if j.exited_at is None]
        for record in running:
            try:
                record.proc.terminate()
            except Exception:
                pass
        deadline = time.monotonic() + grace_sec
        while time.monotonic() < deadline and any(j.proc.poll() is None for j in running):
            time.sleep(0.05)
        for record in running:
            if record.proc.poll() is None:
                try:
                    record.proc.kill()
                except Exception:
                    pass

    # ── Internals ─────────────────────────────────────────────────

    def _spawn(
        self,
        command: str | Sequence[str],
        *,
        cwd: str | None,
        env: dict[str, str] | None,
        shell: bool | str,
        merge_stderr: bool,
        preexec_fn,
    ) -> subprocess.Popen[bytes]:
        # Resolve shell: a string shell is coerced to ``[<shell>, "-c", command]``,
        # bool=True means /bin/bash with the same shape.
        from shell_tools.common.limits import _resolve_shell

        resolved = _resolve_shell(shell)
        if resolved is not None:
            if isinstance(command, (list, tuple)):
                command_str = " ".join(str(c) for c in command)
            else:
                command_str = str(command)
            argv: list[str] = [resolved, "-c", command_str]
        elif isinstance(command, (list, tuple)):
            argv = list(command)
        else:
            # Plain string with no shell requested: naive whitespace split,
            # mirroring the tools layer. (Passing the raw string with
            # shell=False would make Popen treat the whole string as the
            # program name.)
            argv = str(command).split()

        return subprocess.Popen(
            argv,
            cwd=cwd,
            env=env,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=(subprocess.STDOUT if merge_stderr else subprocess.PIPE),
            shell=False,  # shell requests were rewritten to [<shell>, "-c", ...] above
            preexec_fn=preexec_fn,
            close_fds=True,
            bufsize=0,
        )

    def _adopt(
        self,
        proc: subprocess.Popen[bytes],
        command: str | Sequence[str],
        *,
        name: str | None,
        merged: bool,
    ) -> JobRecord:
        stdout_buf = RingBuffer(self._ring_bytes)
        stderr_buf = None if merged else RingBuffer(self._ring_bytes)

        record = self._wrap(
            proc, command, name=name, merged=merged, stdout_buf=stdout_buf, stderr_buf=stderr_buf
        )
        with self._lock:
            self._jobs[record.job_id] = record

        # Start pumps + watcher
        if proc.stdout is not None:
            t = threading.Thread(
                target=_pump_stream,
                args=(proc.stdout, stdout_buf),
                daemon=True,
                name=f"shell-job-stdout-{record.job_id}",
            )
            t.start()
            record.pumps.append(t)
        if not merged and proc.stderr is not None and stderr_buf is not None:
            t = threading.Thread(
                target=_pump_stream,
                args=(proc.stderr, stderr_buf),
                daemon=True,
                name=f"shell-job-stderr-{record.job_id}",
            )
            t.start()
            record.pumps.append(t)
        threading.Thread(target=self._watch_for_exit, args=(record,), daemon=True).start()
        return record

    def _wrap(
        self,
        proc: subprocess.Popen[bytes],
        command: str | Sequence[str],
        *,
        name: str | None,
        merged: bool,
        stdout_buf: RingBuffer | None = None,
        stderr_buf: RingBuffer | None = None,
        pumps: list[threading.Thread] | None = None,
        adopted: bool = False,
    ) -> JobRecord:
        return JobRecord(
            job_id="job_" + secrets.token_hex(6),
            pid=proc.pid,
            name=name or _default_name(command),
            command=list(command) if isinstance(command, (list, tuple)) else str(command),
            started_at=time.monotonic(),
            proc=proc,
            stdout_buf=stdout_buf,
            stderr_buf=stderr_buf,
            merged=merged,
            pumps=pumps or [],
            adopted=adopted,
        )

    def _watch_for_exit(self, record: JobRecord) -> None:
        rc = record.proc.wait()
        # Drain any final bytes — pump threads exit on EOF, so this is
        # mostly a join; we don't need to actively pull.
        for pump in record.pumps:
            pump.join(timeout=2.0)
        if record.stdout_buf is not None:
            record.stdout_buf.close()
        if record.stderr_buf is not None:
            record.stderr_buf.close()
        with self._lock:
            record.exited_at = time.monotonic()
            record.exit_code = rc
            # Popen reports signal deaths as negative returncodes; a
            # ``bash -c`` child reports them as 128+N. Plain exit codes
            # like 1 or 2 must not be classified as signals.
            record.signaled = rc < 0 or (rc > 128 and (rc - 128) in _SIGNAL_NUMBERS)
            self._exited_order.append(record.job_id)
            self._evict_old_exits_locked()

    def _evict_old_exits_locked(self) -> None:
        while len(self._exited_order) > _RECENT_EXIT_KEEP:
            old_id = self._exited_order.pop(0)
            self._jobs.pop(old_id, None)


def _pump_stream(stream, ring: RingBuffer) -> None:
    """Read bytes from ``stream`` until EOF; push into ``ring``."""
    try:
        while True:
            chunk = stream.read(4096)
            if not chunk:
                break
            ring.write(chunk)
    except (OSError, ValueError):
        pass
    finally:
        try:
            stream.close()
        except Exception:
            pass
        ring.close()


def _default_name(command: str | Sequence[str]) -> str:
    if isinstance(command, (list, tuple)):
        return command[0] if command else "job"
    text = str(command).strip().split()
    return text[0] if text else "job"


_SIGNAL_NUMBERS = {
    signal.SIGINT,
    signal.SIGTERM,
    signal.SIGKILL,
    signal.SIGHUP,
    signal.SIGUSR1,
    signal.SIGUSR2,
}


# Module-level singleton.
_MANAGER: JobManager | None = None
_MANAGER_LOCK = threading.Lock()


def get_manager() -> JobManager:
    global _MANAGER
    if _MANAGER is None:
        with _MANAGER_LOCK:
            if _MANAGER is None:
                _MANAGER = JobManager()
    return _MANAGER


__all__ = ["JobManager", "JobRecord", "JobLimitExceeded", "get_manager"]
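
# Lifecycle sketch (illustrative; "train.py" is a made-up workload, and real
# callers go through the MCP tools rather than the manager directly):
#
#     mgr = get_manager()
#     rec = mgr.start(["python", "-u", "train.py"], name="train")
#     page = rec.stdout_buf.read(0, 4096)        # poll from offset 0
#     mgr.signal(rec.job_id, signal.SIGINT)      # graceful stop
#     mgr.wait(rec.job_id, timeout_sec=5)
#     mgr.shutdown_all()                         # lifespan hook: TERM, grace, KILL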
@@ -0,0 +1,221 @@
"""Job-control MCP tools: ``shell_job_start``, ``shell_job_logs``,
``shell_job_manage``.

Three tools, not seven: ``_logs`` rolls in status + wait, ``_manage``
covers list + signals + stdin so the agent has fewer tool names to
remember. The tradeoff is that a multi-action ``_manage`` is slightly
less self-documenting; the foundational skill compensates.
"""

from __future__ import annotations

import signal
from typing import TYPE_CHECKING, Any

from shell_tools.common.limits import coerce_limits, make_preexec_fn, sanitized_env
from shell_tools.jobs.manager import JobLimitExceeded, get_manager

if TYPE_CHECKING:
    from fastmcp import FastMCP


_SIGNAL_ALIASES = {
    "signal_term": signal.SIGTERM,
    "signal_kill": signal.SIGKILL,
    "signal_int": signal.SIGINT,
    "signal_hup": signal.SIGHUP,
    "signal_usr1": signal.SIGUSR1,
    "signal_usr2": signal.SIGUSR2,
}


def register_job_tools(mcp: FastMCP) -> None:
    manager = get_manager()

    @mcp.tool()
    def shell_job_start(
        command: str,
        cwd: str | None = None,
        env: dict[str, str] | None = None,
        merge_stderr: bool = False,
        shell: bool = False,
        name: str | None = None,
        limits: dict[str, int] | None = None,
    ) -> dict:
        """Spawn a background process. Returns a job_id you poll with shell_job_logs.

        Use this when work might run >1 minute, when you want to keep doing
        other things while it runs, or when you need to stream logs as they
        arrive. Jobs die when the shell-tools server restarts — they are NOT
        persistent across reboots.

        Args:
            command: Shell command to run. With shell=False, pass argv via the
                command string and we'll split on whitespace; for complex
                quoting use shell=True.
            cwd: Working directory. Default: server's cwd.
            env: Environment override. Merged into a sanitized base env (with
                zsh dotfile vars stripped).
            merge_stderr: When True, stderr is interleaved into stdout in a
                single ring buffer. Convenient for log-shaped output where
                ordering matters.
            shell: True to invoke /bin/bash -c. Refuses zsh.
            name: Optional human label surfaced in shell_job_manage(action="list").
            limits: Optional resource caps applied via setrlimit before exec.
                Keys: cpu_sec, rss_mb, fsize_mb, nofile.

        Returns: {job_id, pid, started_at}
        """
        try:
            # Build argv: for shell=False, naive split is fine for the common case;
            # the foundational skill steers complex commands toward shell=True.
            argv: list[str] | str
            if shell:
                argv = command
            else:
                argv = command.split()
                if not argv:
                    return {"error": "command was empty"}

            full_env = sanitized_env(env) if env is not None else None
            preexec = make_preexec_fn(coerce_limits(limits))
            record = manager.start(
                argv,
                cwd=cwd,
                env=full_env,
                shell=shell,
                merge_stderr=merge_stderr,
                name=name,
                preexec_fn=preexec,
            )
            return {
                "job_id": record.job_id,
                "pid": record.pid,
                "started_at": record.started_at,
                "name": record.name,
                "merged": merge_stderr,
            }
        except JobLimitExceeded as e:
            return {"error": str(e)}
        except Exception as e:
            return {"error": f"{type(e).__name__}: {e}"}

    @mcp.tool()
    def shell_job_logs(
        job_id: str,
        stream: str = "stdout",
        since_offset: int = 0,
        max_bytes: int = 64000,
        wait_until_exit: bool = False,
        wait_timeout_sec: float = 30.0,
        tail: bool = False,
    ) -> dict:
        """Read job output at an offset. Combined read + status + wait primitive.

        Track next_offset across calls to avoid replaying data. When
        wait_until_exit=True, blocks server-side until the job exits or
        wait_timeout_sec elapses, then returns logs and final status.

        Args:
            job_id: From shell_job_start (or auto-promoted from shell_exec).
            stream: "stdout" | "stderr" | "merged". Use "merged" only when the
                job was started with merge_stderr=True.
            since_offset: Absolute byte offset to start reading from. Pass 0
                on first call; pass next_offset on subsequent calls.
            max_bytes: Max bytes of decoded output to return inline.
            wait_until_exit: When True, blocks until the job exits before reading.
            wait_timeout_sec: Cap on the wait. Returns whatever's accumulated.
            tail: When True, ignores since_offset and returns the last max_bytes.

        Returns: {data, offset, next_offset, status, exit_code, eof, truncated_bytes_dropped}
        """
        record = manager.get(job_id)
        if record is None:
            return {"error": f"unknown job_id: {job_id}"}

        if wait_until_exit:
            manager.wait(job_id, timeout_sec=wait_timeout_sec)
            record = manager.get(job_id) or record

        if stream == "merged":
            # Merged jobs always read from stdout_buf (which received both).
            buf = record.stdout_buf
        elif stream == "stderr":
            buf = record.stderr_buf
        else:
            buf = record.stdout_buf

        if buf is None:
            return {
                "error": f"stream={stream!r} not available (merge_stderr={record.merged})",
            }

        result = buf.tail(max_bytes) if tail else buf.read(since_offset, max_bytes)
        return {
            "data": result.data.decode("utf-8", errors="replace"),
            "offset": result.offset,
            "next_offset": result.next_offset,
            "truncated_bytes_dropped": result.truncated_bytes_dropped,
            "eof": buf.eof and result.next_offset >= buf.total_written,
            "status": record.status,
            "exit_code": record.exit_code,
            "runtime_ms": record.runtime_ms(),
        }

    @mcp.tool()
    def shell_job_manage(
        action: str,
        job_id: str | None = None,
        data: str | None = None,
    ) -> dict:
        """List jobs, send signals, or write to job stdin.

        Single tool covering job-control side effects. The action argument
        picks the operation:

        - "list": list active + recently-exited jobs. job_id ignored.
        - "signal_term" | "signal_kill" | "signal_int" | "signal_hup"
          | "signal_usr1" | "signal_usr2": send the named signal. Requires job_id.
        - "stdin": write `data` to the job's stdin. Requires job_id and data.
        - "close_stdin": close the job's stdin pipe (e.g. to flush a tool that
          reads until EOF). Requires job_id.

        Signal escalation idiom (foundational skill teaches this): try
        signal_int first (graceful), then signal_term after a few seconds, then
        signal_kill as a last resort. The OS may take a moment to deliver.

        Returns vary by action. List → {jobs: [...]}. Signals → {ok, signal}.
        Stdin → {bytes_written}.
        """
        if action == "list":
            return {"jobs": manager.list()}

        if not job_id:
            return {"error": f"action={action!r} requires job_id"}

        if action in _SIGNAL_ALIASES:
            ok = manager.signal(job_id, _SIGNAL_ALIASES[action])
            return {"ok": ok, "signal": action.removeprefix("signal_").upper()}

        if action == "stdin":
            if data is None:
                return {"error": "action=stdin requires data"}
            n = manager.write_stdin(job_id, data.encode("utf-8"))
            return {"bytes_written": n}

        if action == "close_stdin":
            return {"ok": manager.close_stdin(job_id)}

        return {"error": f"unknown action: {action!r}"}

    # Expose a non-tool reference so the lifespan hook can shutdown_all().
    register_job_tools.manager = manager  # type: ignore[attr-defined]


def get_registered_manager() -> Any:
    """Return the JobManager registered for the most recent FastMCP setup.
    Used by the server lifespan to reap on shutdown."""
    return get_manager()


__all__ = ["register_job_tools", "get_registered_manager"]
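
# Escalation idiom from the docstring, as agent-side pseudo-calls
# (illustrative; the 3-second pauses are arbitrary):
#
#     shell_job_manage("signal_int", job_id)        # graceful Ctrl-C
#     time.sleep(3)
#     if shell_job_logs(job_id)["status"] == "running":
#         shell_job_manage("signal_term", job_id)
#         time.sleep(3)
#     if shell_job_logs(job_id)["status"] == "running":
#         shell_job_manage("signal_kill", job_id)   # last resort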
@@ -0,0 +1,41 @@
"""``shell_output_get`` — retrieve truncated output via handle."""

from __future__ import annotations

from typing import TYPE_CHECKING

from shell_tools.common.output_store import get_store

if TYPE_CHECKING:
    from fastmcp import FastMCP


def register_output_tools(mcp: FastMCP) -> None:
    @mcp.tool()
    def shell_output_get(
        output_handle: str,
        since_offset: int = 0,
        max_kb: int = 64,
    ) -> dict:
        """Retrieve a slice of truncated output by handle.

        When shell_exec or shell_job_logs returns more output than fits inline,
        you'll see `output_handle: "out_<hex>"`. Pass it here with successive
        offsets to paginate. The full output is preserved (combined stdout+stderr
        with `--- stdout ---` / `--- stderr ---` separators) for 5 minutes.

        Args:
            output_handle: From a prior tool's envelope.
            since_offset: Pass 0 first, then next_offset from the previous call.
            max_kb: Max KB to return per call.

        Returns: {data, offset, next_offset, eof, expired}
        """
        return get_store().get(
            output_handle,
            since_offset=since_offset,
            max_bytes=max_kb * 1024,
        )


__all__ = ["register_output_tools"]
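
# Pagination sketch (illustrative; the handle value is made up):
#
#     handle = "out_9f3ab12c4d5e6f70"
#     page = shell_output_get(handle, since_offset=0, max_kb=64)
#     while not page["eof"] and not page["expired"]:
#         page = shell_output_get(handle, since_offset=page["next_offset"])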
@@ -0,0 +1,5 @@
"""Persistent PTY-backed shell sessions."""

from shell_tools.pty.tools import register_pty_tools

__all__ = ["register_pty_tools"]
@@ -0,0 +1,367 @@
"""Persistent PTY-backed bash sessions.

Built on stdlib ``pty.fork()`` (which pairs openpty with fork). A
reader thread fills a ring buffer; the public API exposes three modes:

- ``run(command, timeout_sec)``: write the command, wait for the
  unique prompt sentinel (or an ``expect=`` regex override), return
  everything in between.
- ``send_raw(data)``: write bytes, no waiting. For REPLs / vim /
  sudo-prompt-style flows.
- ``drain(timeout_sec)``: read whatever's currently buffered (after
  a raw send).

A unique ``PS1`` sentinel is set at session start so ``run()`` can
unambiguously detect command completion. Per-session concurrency is
serialized: a busy session refuses concurrent ``run()`` calls.

POSIX-only: imports stdlib ``pty`` which doesn't exist on Windows.
"""

from __future__ import annotations

import errno
import fcntl
import os
import pty
import re
import select
import signal
import struct
import termios
import threading
import time
import uuid

from shell_tools.common.limits import _resolve_shell, sanitized_env
from shell_tools.common.ring_buffer import RingBuffer

_BUF_BYTES = 2 * 1024 * 1024


class SessionBusy(RuntimeError):
    """Raised when a concurrent run() attempts to use a session that's already executing."""


class PtySession:
    """One persistent bash session bound to a PTY.

    Thread-safe for the disjoint-mode operations: ``run`` serializes via
    ``_busy_lock``, ``send_raw`` and ``drain`` use the ring's own lock.
    """

    def __init__(
        self,
        *,
        cwd: str | None = None,
        env: dict[str, str] | None = None,
        shell: bool | str = True,
        cols: int = 120,
        rows: int = 40,
        idle_timeout_sec: int = 1800,
    ):
        self.session_id = "pty_" + uuid.uuid4().hex[:10]
        self.shell_path = _resolve_shell(shell) or "/bin/bash"
        self._sentinel_token = uuid.uuid4().hex
        self._sentinel = f"__SHELLTOOLS_PROMPT_{self._sentinel_token}__"
        self._sentinel_re = re.compile(re.escape(self._sentinel))

        # Build env: zsh leakage stripped, prompt set to our sentinel.
        merged_env = sanitized_env(env)
        merged_env["PS1"] = f"{self._sentinel}\n$ "
        merged_env["PS2"] = ""
        merged_env["PROMPT_COMMAND"] = ""  # don't let user dotfiles override PS1
        merged_env["TERM"] = merged_env.get("TERM", "xterm-256color")

        self._created_at = time.monotonic()
        self._last_activity = self._created_at
        self.idle_timeout_sec = idle_timeout_sec

        self._pid, self._fd = pty.fork()
        if self._pid == 0:
            # Child — exec bash. --norc --noprofile keeps things
            # predictable; the foundational skill teaches that the
            # session runs vanilla bash, not the user's interactive
            # shell.
            try:
                if cwd:
                    os.chdir(cwd)
                argv = [self.shell_path, "--norc", "--noprofile", "-i"]
                os.execve(self.shell_path, argv, merged_env)
            except Exception as e:  # pragma: no cover — child exec
                os.write(2, f"shell-tools pty: exec failed: {e}\n".encode())
                os._exit(127)

        # Parent
        _set_pty_size(self._fd, rows, cols)
        _set_nonblocking(self._fd)

        self._buf = RingBuffer(_BUF_BYTES)
        self._busy_lock = threading.Lock()
        self._closed = threading.Event()

        self._reader = threading.Thread(
            target=self._read_loop, daemon=True, name=f"pty-reader-{self.session_id}"
        )
        self._reader.start()

        # Wait for the first prompt so the session is "ready" before we return.
        # If bash --norc somehow doesn't print one, give up after 2 seconds —
        # the session is still usable, it just won't have a prompt-aligned
        # initial offset.
        self._wait_for_sentinel(timeout_sec=2.0, since_offset=0)

    # ── Public API ────────────────────────────────────────────────

    @property
    def pid(self) -> int:
        return self._pid

    def is_alive(self) -> bool:
        if self._closed.is_set():
            return False
        try:
            # Note: a successful WNOHANG waitpid here reaps the child, so
            # a later waitpid (e.g. in close()) sees ChildProcessError.
            pid, _ = os.waitpid(self._pid, os.WNOHANG)
            return pid == 0
        except ChildProcessError:
            return False

    def run(self, command: str, *, expect: str | None = None, timeout_sec: float = 60.0) -> dict:
        """Send ``command`` + newline, wait for the prompt sentinel
        (or ``expect`` regex override), return the slice in between."""
        if not self._busy_lock.acquire(blocking=False):
            raise SessionBusy(f"session {self.session_id} is busy")
        try:
            start_offset = self._buf.total_written
            self._write(command.encode("utf-8") + b"\n")
            self._last_activity = time.monotonic()
            return self._wait_for_sentinel(
                timeout_sec=timeout_sec,
                since_offset=start_offset,
                expect_pattern=expect,
            )
        finally:
            self._busy_lock.release()

    def send_raw(self, data: str, *, add_newline: bool = False) -> int:
        """Write bytes without waiting for prompt. For REPLs/vim/sudo prompts."""
        payload = data.encode("utf-8")
        if add_newline:
            payload += b"\n"
        n = self._write(payload)
        self._last_activity = time.monotonic()
        return n

    def drain(self, *, timeout_sec: float = 2.0, max_bytes: int = 64000) -> dict:
        """Read whatever's currently buffered. Used after send_raw to capture
        REPL / interactive-program output."""
        deadline = time.monotonic() + timeout_sec
        last_total = self._buf.total_written
        # Wait for activity to settle for a brief window — gives the
        # process a chance to finish its current line.
        while time.monotonic() < deadline:
            time.sleep(0.05)
            current = self._buf.total_written
            if current == last_total:
                break
            last_total = current

        result = self._buf.tail(max_bytes)
        return {
            "output": result.data.decode("utf-8", errors="replace"),
            "more": result.next_offset < self._buf.total_written,
            "offset": result.offset,
            "next_offset": result.next_offset,
            "timed_out": False,
        }

    def close(self, *, force: bool = False, grace_sec: float = 1.0) -> dict:
        """Terminate the session. Returns final output."""
        if self._closed.is_set():
            return {"exit_code": None, "final_output": "", "already_closed": True}

        # Flush an exit if not forcing.
        if not force:
            try:
                self._write(b"exit\n")
            except OSError:
                pass

        deadline = time.monotonic() + grace_sec
        while time.monotonic() < deadline:
            try:
                pid, status = os.waitpid(self._pid, os.WNOHANG)
                if pid != 0:
                    break
            except ChildProcessError:
                break
            time.sleep(0.05)

        try:
            os.kill(self._pid, signal.SIGTERM)
        except (ProcessLookupError, PermissionError):
            pass
        try:
            os.waitpid(self._pid, os.WNOHANG)
        except ChildProcessError:
            pass

        if self.is_alive():
            try:
                os.kill(self._pid, signal.SIGKILL)
            except (ProcessLookupError, PermissionError):
                pass

        self._closed.set()
        try:
            os.close(self._fd)
        except OSError:
            pass

        # Final output = whatever's still in the ring.
        result = self._buf.tail(64 * 1024)
        try:
            _pid, status = os.waitpid(self._pid, os.WNOHANG)
            exit_code = os.WEXITSTATUS(status) if os.WIFEXITED(status) else None
        except ChildProcessError:
            exit_code = None
        return {
            "exit_code": exit_code,
            "final_output": result.data.decode("utf-8", errors="replace"),
            "already_closed": False,
        }

    def to_summary(self) -> dict:
        return {
            "session_id": self.session_id,
            "pid": self._pid,
            "shell": self.shell_path,
            "alive": self.is_alive(),
            "idle_sec": int(time.monotonic() - self._last_activity),
            "created_at": self._created_at,
        }
|
||||
|
||||
# ── Internals ─────────────────────────────────────────────────
|
||||
|
||||
def _write(self, data: bytes) -> int:
|
||||
if self._closed.is_set():
|
||||
raise OSError("session is closed")
|
||||
try:
|
||||
return os.write(self._fd, data)
|
||||
except OSError as e:
|
||||
if e.errno == errno.EAGAIN:
|
||||
# PTY is full — retry briefly.
|
||||
deadline = time.monotonic() + 1.0
|
||||
while time.monotonic() < deadline:
|
||||
time.sleep(0.01)
|
||||
try:
|
||||
return os.write(self._fd, data)
|
||||
except OSError:
|
||||
continue
|
||||
raise
|
||||
|
||||
def _read_loop(self) -> None:
|
||||
while not self._closed.is_set():
|
||||
try:
|
||||
ready, _, _ = select.select([self._fd], [], [], 0.5)
|
||||
except (OSError, ValueError):
|
||||
break
|
||||
if not ready:
|
||||
# Periodically check for child death even when no data.
|
||||
try:
|
||||
pid, _ = os.waitpid(self._pid, os.WNOHANG)
|
||||
if pid != 0:
|
||||
break
|
||||
except ChildProcessError:
|
||||
break
|
||||
continue
|
||||
try:
|
||||
chunk = os.read(self._fd, 4096)
|
||||
except OSError:
|
||||
break
|
||||
if not chunk:
|
||||
break
|
||||
self._buf.write(chunk)
|
||||
self._buf.close()
|
||||
self._closed.set()
|
||||
|
||||
def _wait_for_sentinel(
|
||||
self,
|
||||
*,
|
||||
timeout_sec: float,
|
||||
since_offset: int,
|
||||
expect_pattern: str | None = None,
|
||||
) -> dict:
|
||||
"""Poll the buffer until we see the sentinel (or expect pattern)."""
|
||||
deadline = time.monotonic() + timeout_sec
|
||||
pattern: re.Pattern[str] | None = None
|
||||
if expect_pattern is not None:
|
||||
pattern = re.compile(expect_pattern)
|
||||
|
||||
prompt_offset = since_offset
|
||||
while time.monotonic() < deadline:
|
||||
slice_ = self._buf.read(since_offset, self._buf.total_written - since_offset)
|
||||
text = slice_.data.decode("utf-8", errors="replace")
|
||||
if pattern is not None:
|
||||
m = pattern.search(text)
|
||||
if m is not None:
|
||||
output = text[: m.start()]
|
||||
prompt_offset = since_offset + len(text[: m.end()].encode("utf-8", errors="replace"))
|
||||
return {
|
||||
"output": output,
|
||||
"prompt_after": True,
|
||||
"matched_expect": True,
|
||||
"next_offset": prompt_offset,
|
||||
"timed_out": False,
|
||||
}
|
||||
else:
|
||||
m = self._sentinel_re.search(text)
|
||||
if m is not None:
|
||||
output = text[: m.start()]
|
||||
# Strip the trailing echoed command/newline above the sentinel
|
||||
output = _strip_command_echo(output)
|
||||
return {
|
||||
"output": output,
|
||||
"prompt_after": True,
|
||||
"matched_expect": False,
|
||||
"next_offset": since_offset + len(text[: m.end()].encode("utf-8", errors="replace")),
|
||||
"timed_out": False,
|
||||
}
|
||||
time.sleep(0.05)
|
||||
if self._closed.is_set():
|
||||
break
|
||||
|
||||
# Timed out — return whatever we have.
|
||||
slice_ = self._buf.read(since_offset, self._buf.total_written - since_offset)
|
||||
return {
|
||||
"output": slice_.data.decode("utf-8", errors="replace"),
|
||||
"prompt_after": False,
|
||||
"matched_expect": False,
|
||||
"next_offset": slice_.next_offset,
|
||||
"timed_out": True,
|
||||
}
|
||||
|
||||
|
||||
def _set_pty_size(fd: int, rows: int, cols: int) -> None:
|
||||
try:
|
||||
fcntl.ioctl(fd, termios.TIOCSWINSZ, struct.pack("HHHH", rows, cols, 0, 0))
|
||||
except OSError:
|
||||
pass
|
||||
|
||||
|
||||
def _set_nonblocking(fd: int) -> None:
|
||||
flags = fcntl.fcntl(fd, fcntl.F_GETFL)
|
||||
fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)
|
||||
|
||||
|
||||
def _strip_command_echo(text: str) -> str:
|
||||
"""Drop the first line if it looks like the echoed command. PTYs in
|
||||
canonical mode echo the user's input back; we want only the program's
|
||||
output. Best-effort heuristic — leaves the text alone if uncertain."""
|
||||
if "\n" in text:
|
||||
first, rest = text.split("\n", 1)
|
||||
# Keep only the rest if the first line is short (likely the echo).
|
||||
if len(first) < 4096:
|
||||
return rest
|
||||
return text
|
||||
|
||||
|
||||
__all__ = ["PtySession", "SessionBusy"]
|
||||
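The sentinel trick that `run` and `_wait_for_sentinel` rely on is easiest to see in miniature. A standalone sketch of the mechanism (the marker format and PS1 wiring below are illustrative; the real values live in the PtySession constructor, which is outside this hunk):

```python
# Minimal sketch: a unique PS1 makes command completion unambiguous
# in the raw PTY byte stream.
import re
import uuid

sentinel = f"__HIVE_PTY_{uuid.uuid4().hex}__"   # hypothetical marker format
ps1 = sentinel + "$ "                           # exported as PS1 in the child bash
sentinel_re = re.compile(re.escape(sentinel))

buffered = "hello-pty\r\n" + ps1                # what the read loop might accumulate
m = sentinel_re.search(buffered)
assert m is not None
output = buffered[: m.start()]                  # everything before the next prompt
print(output.strip())                           # -> hello-pty
```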
@@ -0,0 +1,243 @@
"""Three PTY tools: ``shell_pty_open``, ``shell_pty_run``, ``shell_pty_close``.

A per-server hard cap on concurrent sessions (env: ``SHELL_TOOLS_MAX_PTY``,
default 8) prevents PTY exhaustion. Idle sessions older than
``idle_timeout_sec`` are reaped lazily on every ``shell_pty_open`` call, so an
abandoned session can't leak a bash forever.
"""

from __future__ import annotations

import os
import sys
import threading
import time
from typing import TYPE_CHECKING

from shell_tools.common.limits import ZshRefused

if TYPE_CHECKING:
    from fastmcp import FastMCP


_MAX_PTY_DEFAULT = 8


class _PtyRegistry:
    def __init__(self):
        self._sessions: dict[str, PtySession] = {}  # noqa: F821
        self._lock = threading.Lock()
        self._max = int(os.getenv("SHELL_TOOLS_MAX_PTY", str(_MAX_PTY_DEFAULT)))

    def reap_idle(self) -> None:
        """Drop sessions that have died or whose idle time exceeded idle_timeout_sec."""
        with self._lock:
            now = time.monotonic()
            stale = [
                sid
                for sid, sess in self._sessions.items()
                if not sess.is_alive() or (now - sess._last_activity) > sess.idle_timeout_sec
            ]
            for sid in stale:
                sess = self._sessions.pop(sid, None)
                if sess is not None:
                    try:
                        sess.close(force=True, grace_sec=0.5)
                    except Exception:
                        pass

    def count(self) -> int:
        with self._lock:
            return len(self._sessions)

    def add(self, sess) -> None:
        with self._lock:
            if len(self._sessions) >= self._max:
                # Caller should have reaped first; treat as hitting the cap.
                raise RuntimeError(
                    f"shell-tools PTY cap reached ({self._max}). "
                    "Close idle sessions or raise SHELL_TOOLS_MAX_PTY."
                )
            self._sessions[sess.session_id] = sess

    def get(self, sid: str):
        with self._lock:
            return self._sessions.get(sid)

    def remove(self, sid: str) -> None:
        with self._lock:
            self._sessions.pop(sid, None)

    def list(self) -> list[dict]:
        with self._lock:
            return [s.to_summary() for s in self._sessions.values()]

    def shutdown_all(self) -> None:
        with self._lock:
            sessions = list(self._sessions.values())
            self._sessions.clear()
        for sess in sessions:
            try:
                sess.close(force=True, grace_sec=0.5)
            except Exception:
                pass


_REGISTRY = _PtyRegistry()


def get_registry() -> _PtyRegistry:
    return _REGISTRY


def register_pty_tools(mcp: FastMCP) -> None:
    if sys.platform == "win32":
        # Register stub tools that report unsupported; keeps the tool
        # surface uniform across platforms even when PTY is unavailable.
        @mcp.tool()
        def shell_pty_open(*args, **kwargs) -> dict:
            """Persistent PTY-backed bash session. POSIX-only.

            Windows is not supported in v1 — use shell_exec / shell_job_*
            for non-interactive work. The PTY tools require the stdlib pty
            module, which exists only on Linux and macOS.
            """
            return {"error": "shell_pty_* tools are POSIX-only; not supported on Windows"}

        @mcp.tool()
        def shell_pty_run(*args, **kwargs) -> dict:  # noqa: D401
            """Persistent PTY-backed bash session. POSIX-only."""
            return {"error": "shell_pty_* tools are POSIX-only; not supported on Windows"}

        @mcp.tool()
        def shell_pty_close(*args, **kwargs) -> dict:  # noqa: D401
            """Persistent PTY-backed bash session. POSIX-only."""
            return {"error": "shell_pty_* tools are POSIX-only; not supported on Windows"}

        return

    from shell_tools.pty.session import PtySession, SessionBusy

    @mcp.tool()
    def shell_pty_open(
        cwd: str | None = None,
        env: dict[str, str] | None = None,
        cols: int = 120,
        rows: int = 40,
        idle_timeout_sec: int = 1800,
    ) -> dict:
        """Open a persistent /bin/bash session in a PTY.

        Use a session when you need state across calls — building env vars,
        navigating with cd, driving REPLs, or responding to interactive
        prompts (sudo, ssh, mysql). For one-shot work, use shell_exec
        instead.

        The session runs vanilla bash (--norc --noprofile) so dotfiles
        don't surprise you. A unique PS1 sentinel is set so shell_pty_run
        can unambiguously detect command completion. macOS users: this
        is /bin/bash, not zsh, by deliberate policy — explicit
        shell="/bin/zsh" overrides are rejected.

        Args:
            cwd: Initial working directory.
            env: Environment override (zsh dotfile vars are stripped).
            cols, rows: Terminal size.
            idle_timeout_sec: Drop the session after this many seconds idle.

        Returns: {session_id, pid, shell}
        """
        _REGISTRY.reap_idle()
        try:
            sess = PtySession(cwd=cwd, env=env, cols=cols, rows=rows, idle_timeout_sec=idle_timeout_sec)
        except ZshRefused as e:
            return {"error": str(e)}
        except Exception as e:
            return {"error": f"failed to open session: {type(e).__name__}: {e}"}
        try:
            _REGISTRY.add(sess)
        except RuntimeError as e:
            sess.close(force=True, grace_sec=0.2)
            return {"error": str(e)}
        return {
            "session_id": sess.session_id,
            "pid": sess.pid,
            "shell": sess.shell_path,
        }

    @mcp.tool()
    def shell_pty_run(
        session_id: str,
        command: str | None = None,
        expect: str | None = None,
        raw_send: bool = False,
        read_only: bool = False,
        timeout_sec: float = 60.0,
    ) -> dict:
        """Run a command in a session, send raw input, or drain output.

        Three modes:
        - Default: pass a command. The session sends it, waits for the
          unique prompt sentinel (or the `expect` regex if provided), and
          returns the output between submission and prompt.
        - raw_send=True: pass a command. The text is written without
          waiting for a prompt. Use for REPL input ("print('hi')\\n"),
          for password prompts (sudo), or for vim keystrokes.
        - read_only=True: drains whatever's currently buffered.
          Typically follows raw_send.

        Args:
            session_id: From shell_pty_open.
            command: The text to send. None when read_only=True.
            expect: Regex to wait for INSTEAD of the default prompt sentinel.
                Useful when the command launches a REPL with its own prompt.
            raw_send: Don't wait for a prompt; just write.
            read_only: Don't send anything; drain the buffer.
            timeout_sec: Max wait. On timeout, returns whatever's buffered
                with timed_out=True (the command may still be running —
                check with another _run call).

        Returns: {output, prompt_after, timed_out, ...}
        """
        sess = _REGISTRY.get(session_id)
        if sess is None:
            return {"error": f"unknown session_id: {session_id}"}
        if not sess.is_alive():
            _REGISTRY.remove(session_id)
            return {"error": f"session {session_id} has exited"}

        if read_only:
            return sess.drain(timeout_sec=timeout_sec)

        if command is None:
            return {"error": "command is required unless read_only=True"}

        if raw_send:
            n = sess.send_raw(command, add_newline=False)
            return {"bytes_sent": n}

        try:
            return sess.run(command, expect=expect, timeout_sec=timeout_sec)
        except SessionBusy as e:
            return {"error": str(e)}

    @mcp.tool()
    def shell_pty_close(session_id: str, force: bool = False) -> dict:
        """Terminate a PTY session. Always do this when you're done — leaked
        sessions count against the per-server PTY cap.

        Args:
            session_id: From shell_pty_open.
            force: Skip the graceful "exit\\n" attempt and go straight to
                SIGTERM/SIGKILL.

        Returns: {exit_code, final_output, already_closed}
        """
        sess = _REGISTRY.get(session_id)
        if sess is None:
            return {"error": f"unknown session_id: {session_id}"}
        result = sess.close(force=force)
        _REGISTRY.remove(session_id)
        return result


__all__ = ["register_pty_tools", "get_registry"]
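A typical lifecycle through the three tools, calling the unwrapped functions the way the test suite below does (the `_tool_manager` access is the tests' private shortcut; over MCP the tool names and arguments are identical):

```python
# Sketch of the open -> run -> close flow, exercising all three run modes.
from fastmcp import FastMCP
from shell_tools.pty.tools import register_pty_tools

mcp = FastMCP("demo")
register_pty_tools(mcp)
tools = mcp._tool_manager._tools
pty_open = tools["shell_pty_open"].fn
pty_run = tools["shell_pty_run"].fn
pty_close = tools["shell_pty_close"].fn

sid = pty_open(cwd="/tmp")["session_id"]
try:
    pty_run(session_id=sid, command="export GREETING=hello")          # mode 1: wait for sentinel
    pty_run(session_id=sid, command="python3 -q", expect=r">>>\s*$")  # mode 1 with expect
    pty_run(session_id=sid, command="print(6 * 7)\n", raw_send=True)  # mode 2: raw keystrokes
    drained = pty_run(session_id=sid, read_only=True, timeout_sec=2)  # mode 3: drain buffer
    assert "42" in drained["output"]
finally:
    pty_close(session_id=sid, force=True)
```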
@@ -0,0 +1,5 @@
"""Filesystem search tools (rg + find)."""

from shell_tools.search.tools import register_search_tools

__all__ = ["register_search_tools"]
@@ -0,0 +1,204 @@
"""``shell_rg`` and ``shell_find`` — structured wrappers over ripgrep / find.

Distinct from ``files-tools.search_files`` (project-relative,
code-editor-tuned) — these accept arbitrary paths and surface the
underlying tool's full feature set. The foundational skill steers
agents to ``files-tools`` for in-project work and to these tools for
``/var/log``, ``/etc``, archive contents, etc.
"""

from __future__ import annotations

import json
import shutil
import subprocess
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from fastmcp import FastMCP


_DEFAULT_TIMEOUT_SEC = 30
_MAX_OUTPUT_BYTES = 256 * 1024


def register_search_tools(mcp: FastMCP) -> None:
    @mcp.tool()
    def shell_rg(
        pattern: str,
        path: str = ".",
        glob: str | None = None,
        type_filter: str | None = None,
        ignore_case: bool = False,
        context: int = 0,
        max_count: int | None = None,
        max_depth: int | None = None,
        hidden: bool = False,
        no_ignore: bool = False,
        extra_args: list[str] | None = None,
    ) -> dict:
        """Run ripgrep on `path` for `pattern`.

        For project-scoped code search use files-tools.search_files instead;
        this tool is for raw paths (system configs, /var/log, archive contents)
        and exposes the full rg flag surface.

        Args:
            pattern: Regex pattern.
            path: Directory or file to search. Default: current dir.
            glob: Filename glob (e.g. "*.py").
            type_filter: rg filetype shortcut (e.g. "py", "rust", "md").
            ignore_case: Case-insensitive search.
            context: Lines of context above and below each match.
            max_count: Stop after N matches per file.
            max_depth: Limit directory recursion depth.
            hidden: Include hidden files (rg ignores them by default).
            no_ignore: Don't respect .gitignore.
            extra_args: Raw flags to append (use sparingly — most needs are covered above).

        Returns: {matches: [...], total, truncated, exit_code, stderr, command}
        """
        if not shutil.which("rg"):
            return {"error": "ripgrep (rg) is not installed on this host"}

        argv = ["rg", "--json", "--no-heading"]
        if ignore_case:
            argv.append("-i")
        if context > 0:
            argv.extend(["-C", str(context)])
        if max_count is not None:
            argv.extend(["-m", str(max_count)])
        if max_depth is not None:
            argv.extend(["--max-depth", str(max_depth)])
        if hidden:
            argv.append("--hidden")
        if no_ignore:
            argv.append("--no-ignore")
        if type_filter:
            argv.extend(["-t", type_filter])
        if glob:
            argv.extend(["-g", glob])
        if extra_args:
            argv.extend(str(a) for a in extra_args)
        argv.extend(["--", pattern, path])

        try:
            proc = subprocess.run(
                argv,
                capture_output=True,
                timeout=_DEFAULT_TIMEOUT_SEC,
                check=False,
            )
        except subprocess.TimeoutExpired:
            return {"error": "ripgrep timed out", "command": argv}
        except FileNotFoundError:
            return {"error": "ripgrep (rg) is not installed on this host"}

        # Parse the JSON-lines output: only "match" events are interesting
        # for the default surface. Errors land in stderr.
        matches: list[dict] = []
        truncated = False
        bytes_seen = 0
        for line in proc.stdout.splitlines():
            if not line:
                continue
            bytes_seen += len(line)
            if bytes_seen > _MAX_OUTPUT_BYTES:
                truncated = True
                break
            try:
                evt = json.loads(line)
            except json.JSONDecodeError:
                continue
            if evt.get("type") != "match":
                continue
            data = evt.get("data", {})
            path_data = (data.get("path") or {}).get("text") or ""
            line_no = data.get("line_number")
            text = (data.get("lines") or {}).get("text") or ""
            matches.append({"path": path_data, "line": line_no, "text": text.rstrip("\n")})

        return {
            "matches": matches,
            "total": len(matches),
            "truncated": truncated,
            "exit_code": proc.returncode,
            "stderr": proc.stderr.decode("utf-8", errors="replace")[-2000:] if proc.stderr else "",
            "command": argv,
        }

    @mcp.tool()
    def shell_find(
        path: str,
        name: str | None = None,
        iname: str | None = None,
        type_filter: str | None = None,
        mtime_days: int | None = None,
        size_kb_min: int | None = None,
        size_kb_max: int | None = None,
        max_depth: int | None = None,
        max_results: int = 1000,
    ) -> dict:
        """Run `find` with structured predicates.

        For tree views or stat-like info on a single path, use shell_exec
        ("ls -la", "tree -L 2", "stat foo"). This tool is for predicate-driven
        searches ("find the .log files modified in the last 7 days that are
        bigger than 1 MB").

        Args:
            path: Directory to search under.
            name: Glob match (case-sensitive), e.g. "*.log".
            iname: Glob match (case-insensitive).
            type_filter: "f" file, "d" dir, "l" symlink.
            mtime_days: Modified within the last N days (passed to find
                as ``-mtime -N``).
            size_kb_min, size_kb_max: Size bounds in KB.
            max_depth: Limit directory recursion.
            max_results: Cap on returned paths.

        Returns: {paths: [...], count, truncated, total_seen, exit_code, stderr, command}
        """
        if not shutil.which("find"):
            return {"error": "find is not installed on this host"}

        argv = ["find", path]
        if max_depth is not None:
            argv.extend(["-maxdepth", str(max_depth)])
        if type_filter in {"f", "d", "l"}:
            argv.extend(["-type", type_filter])
        if name:
            argv.extend(["-name", name])
        if iname:
            argv.extend(["-iname", iname])
        if mtime_days is not None:
            argv.extend(["-mtime", f"-{abs(mtime_days)}"])
        if size_kb_min is not None:
            argv.extend(["-size", f"+{int(size_kb_min)}k"])
        if size_kb_max is not None:
            argv.extend(["-size", f"-{int(size_kb_max)}k"])

        try:
            proc = subprocess.run(
                argv,
                capture_output=True,
                timeout=_DEFAULT_TIMEOUT_SEC,
                check=False,
            )
        except subprocess.TimeoutExpired:
            return {"error": "find timed out", "command": argv}

        all_paths = proc.stdout.decode("utf-8", errors="replace").splitlines()
        truncated = len(all_paths) > max_results
        paths = all_paths[:max_results]
        return {
            "paths": paths,
            "count": len(paths),
            "truncated": truncated,
            "total_seen": len(all_paths),
            "exit_code": proc.returncode,
            "stderr": proc.stderr.decode("utf-8", errors="replace")[-2000:] if proc.stderr else "",
            "command": argv,
        }


__all__ = ["register_search_tools"]
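For reference, the event shape shell_rg consumes: `rg --json` emits one JSON object per line, and only `"type": "match"` events carry hits. The extraction step in isolation:

```python
# One abbreviated ripgrep --json "match" event, trimmed to the fields
# shell_rg actually reads.
import json

event_line = (
    '{"type":"match","data":{"path":{"text":"a.txt"},'
    '"lines":{"text":"world\\n"},"line_number":2}}'
)
evt = json.loads(event_line)
if evt.get("type") == "match":
    data = evt["data"]
    match = {
        "path": data["path"]["text"],
        "line": data["line_number"],
        "text": data["lines"]["text"].rstrip("\n"),
    }
    print(match)  # {'path': 'a.txt', 'line': 2, 'text': 'world'}
```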
@@ -0,0 +1,145 @@
"""shell-tools FastMCP server — entry module.

Run via:
    uv run python -m shell_tools.server --stdio
    uv run python shell_tools_server.py --stdio   (preferred, see _DEFAULT_LOCAL_SERVERS)
"""

from __future__ import annotations

import argparse
import asyncio
import atexit
import logging
import os
import sys
from collections.abc import AsyncIterator
from contextlib import asynccontextmanager

logger = logging.getLogger(__name__)


def setup_logger() -> None:
    if not logger.handlers:
        stream = sys.stderr if "--stdio" in sys.argv else sys.stdout
        handler = logging.StreamHandler(stream)
        handler.setFormatter(logging.Formatter("[shell-tools] %(message)s"))
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)


setup_logger()

# Suppress the FastMCP banner in STDIO mode (mirrors gcu/server.py).
if "--stdio" in sys.argv:
    import rich.console

    _orig_console_init = rich.console.Console.__init__

    def _patched_console_init(self, *args, **kwargs):
        kwargs["file"] = sys.stderr
        _orig_console_init(self, *args, **kwargs)

    rich.console.Console.__init__ = _patched_console_init


from fastmcp import FastMCP  # noqa: E402

from shell_tools import register_shell_tools  # noqa: E402
from shell_tools.jobs.manager import get_manager  # noqa: E402
from shell_tools.pty.tools import get_registry as get_pty_registry  # noqa: E402


@asynccontextmanager
async def _lifespan(_server: FastMCP) -> AsyncIterator[dict]:
    """Reap children on shutdown so we don't orphan jobs/PTYs.

    Mirrors the gcu-tools lifespan pattern. Runs in the FastMCP event
    loop on graceful shutdown; the atexit hook below catches abrupt
    exits (SIGTERM, etc.) where lifespan teardown may not complete.
    """
    parent_pid_env = os.getenv("HIVE_DESKTOP_PARENT_PID")
    if parent_pid_env:
        try:
            parent_pid = int(parent_pid_env)
            asyncio.create_task(_parent_watchdog(parent_pid))
            logger.info("Parent watchdog armed for PID %d", parent_pid)
        except ValueError:
            logger.warning("Invalid HIVE_DESKTOP_PARENT_PID=%r", parent_pid_env)

    yield {}

    logger.info("Shutting down — reaping jobs and PTY sessions...")
    try:
        get_manager().shutdown_all(grace_sec=2.0)
    except Exception as e:
        logger.warning("JobManager shutdown error: %s", e)
    try:
        get_pty_registry().shutdown_all()
    except Exception as e:
        logger.warning("PTY registry shutdown error: %s", e)


def _is_alive(pid: int) -> bool:
    try:
        os.kill(pid, 0)
        return True
    except (ProcessLookupError, PermissionError):
        return False


async def _parent_watchdog(parent_pid: int) -> None:
    """Self-destruct when the desktop parent dies."""
    while True:
        await asyncio.sleep(2.0)
        if not _is_alive(parent_pid):
            logger.warning("Parent PID %d gone — shell-tools exiting", parent_pid)
            try:
                get_manager().shutdown_all(grace_sec=1.0)
            except Exception:
                pass
            try:
                get_pty_registry().shutdown_all()
            except Exception:
                pass
            os._exit(0)


def _atexit_reap() -> None:
    """Last-ditch reaping if the lifespan didn't run."""
    try:
        get_manager().shutdown_all(grace_sec=1.0)
    except Exception:
        pass
    try:
        get_pty_registry().shutdown_all()
    except Exception:
        pass


atexit.register(_atexit_reap)

mcp = FastMCP("shell-tools", lifespan=_lifespan)


def main() -> None:
    parser = argparse.ArgumentParser(description="shell-tools MCP server")
    parser.add_argument("--port", type=int, default=int(os.getenv("SHELL_TOOLS_PORT", "4004")))
    parser.add_argument("--host", default="0.0.0.0")
    parser.add_argument("--stdio", action="store_true")
    args = parser.parse_args()

    tools = register_shell_tools(mcp)

    if not args.stdio:
        logger.info("Registered %d shell-tools: %s", len(tools), tools)

    if args.stdio:
        mcp.run(transport="stdio")
    else:
        logger.info("Starting shell-tools on %s:%d", args.host, args.port)
        asyncio.run(mcp.run_async(transport="http", host=args.host, port=args.port))


if __name__ == "__main__":
    main()
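The watchdog contract from the launcher's side, as a self-contained sketch (the launcher code below is hypothetical; only the env var name comes from the module above):

```python
# Hypothetical launcher: arm the parent watchdog by exporting our own PID
# before spawning the server over stdio.
import os
import subprocess

env = dict(os.environ, HIVE_DESKTOP_PARENT_PID=str(os.getpid()))
server = subprocess.Popen(
    ["uv", "run", "python", "shell_tools_server.py", "--stdio"],
    env=env,
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
)
# If this launcher dies, _parent_watchdog notices within ~2s
# (os.kill(pid, 0) starts failing), reaps jobs and PTY sessions,
# and calls os._exit(0).
```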
@@ -0,0 +1,162 @@
"""shell_exec — envelope shape, semantic exits, warnings, auto-promotion."""

from __future__ import annotations

import time

import pytest


@pytest.fixture
def exec_tool(mcp):
    from shell_tools.exec import register_exec_tools

    register_exec_tools(mcp)
    return mcp._tool_manager._tools["shell_exec"].fn


def test_envelope_shape_simple_echo(exec_tool):
    result = exec_tool(command="echo hello world")
    assert result["exit_code"] == 0
    assert result["stdout"].strip() == "hello world"
    assert result["stderr"] == ""
    assert result["semantic_status"] == "ok"
    assert result["timed_out"] is False
    assert result["auto_backgrounded"] is False
    assert result["job_id"] is None
    assert result["warning"] is None
    assert result["pid"] is not None


def test_grep_no_matches_is_ok_not_error(exec_tool, tmp_path):
    f = tmp_path / "haystack.txt"
    f.write_text("apples\nbananas\n")
    result = exec_tool(command=f"grep zzz {f}")
    assert result["exit_code"] == 1
    assert result["semantic_status"] == "ok"
    assert "No matches found" in (result["semantic_message"] or "")


def test_diff_files_differ_is_ok_not_error(exec_tool, tmp_path):
    a = tmp_path / "a.txt"
    a.write_text("hi\n")
    b = tmp_path / "b.txt"
    b.write_text("bye\n")
    result = exec_tool(command=f"diff {a} {b}")
    assert result["exit_code"] == 1
    assert result["semantic_status"] == "ok"
    assert "differ" in (result["semantic_message"] or "")


def test_destructive_warning_for_rm_rf(exec_tool, tmp_path):
    # Don't actually delete anything — point at a missing path so the
    # command exits non-zero but the warning still fires from the regex.
    target = tmp_path / "definitely_missing_dir"
    result = exec_tool(command=f"rm -rf {target}")
    assert result["warning"] is not None
    assert "force-remove" in result["warning"] or "recursively" in result["warning"]


def test_destructive_warning_drop_table(exec_tool):
    # Only echo the destructive text — nothing touches a real database.
    # The warning regex matches on the command string itself, so the
    # echoed SQL is enough to trigger it.
    result = exec_tool(command="echo 'DROP TABLE users;'", shell=True)
    assert result["warning"] is not None
    assert "drop" in result["warning"].lower() or "truncate" in result["warning"].lower()


def test_command_not_found(exec_tool):
    result = exec_tool(command="this_command_does_not_exist_xyzzy")
    assert result["exit_code"] is None or result["exit_code"] != 0
    # Either a pre-spawn FileNotFoundError or shell exit 127 — both are fine
    # as long as semantic_status reflects an error or the error field is set.
    assert (
        result["semantic_status"] == "error"
        or result.get("error")
        or "not found" in (result["semantic_message"] or "").lower()
    )


def test_shell_true_resolves_to_bash(exec_tool):
    result = exec_tool(command="echo hi", shell=True)
    # shell=True (the bool) → /bin/bash → succeeds
    assert result["exit_code"] == 0


def test_zsh_string_refused():
    """Passing a zsh path to _resolve_shell raises ZshRefused."""
    from shell_tools.common.limits import ZshRefused, _resolve_shell

    with pytest.raises(ZshRefused):
        _resolve_shell("/bin/zsh")
    with pytest.raises(ZshRefused):
        _resolve_shell("/usr/local/bin/zsh")


def test_truncation_via_handle(exec_tool):
    """Output past max_output_kb is truncated and an output_handle is returned."""
    # Generate ~300 KB of output against a 128 KB cap to force truncation.
    result = exec_tool(
        command="python3 -c 'import sys; sys.stdout.write(\"x\" * 300_000)'",
        shell=True,
        max_output_kb=128,
    )
    assert result["exit_code"] == 0
    assert result["stdout_truncated_bytes"] > 0
    assert result["output_handle"] is not None
    assert result["output_handle"].startswith("out_")


def test_output_handle_round_trip(exec_tool, mcp):
    from shell_tools.output import register_output_tools

    register_output_tools(mcp)
    output_get = mcp._tool_manager._tools["shell_output_get"].fn

    result = exec_tool(
        command="python3 -c 'import sys; sys.stdout.write(\"x\" * 300_000)'",
        shell=True,
        max_output_kb=64,
    )
    handle = result["output_handle"]
    assert handle is not None

    # First page
    page = output_get(output_handle=handle, since_offset=0, max_kb=64)
    assert page["expired"] is False
    assert len(page["data"]) > 0
    assert page["next_offset"] > 0

    # Bogus handle
    bogus = output_get(output_handle="out_doesnotexist", since_offset=0, max_kb=64)
    assert bogus["expired"] is True


def test_timed_out_marker(exec_tool):
    result = exec_tool(command="sleep 5", timeout_sec=1, auto_background_after_sec=0)
    assert result["timed_out"] is True


def test_auto_promotion(exec_tool, mcp):
    """Past auto_background_after_sec, the call returns auto_backgrounded=True."""
    from shell_tools.jobs.tools import register_job_tools

    register_job_tools(mcp)
    # Use a 1s budget so the test runs quickly.
    start = time.monotonic()
    result = exec_tool(
        command="sleep 5",
        auto_background_after_sec=1,
        timeout_sec=10,
    )
    elapsed = time.monotonic() - start
    assert result["auto_backgrounded"] is True, result
    assert result["job_id"] is not None
    assert result["exit_code"] is None
    assert elapsed < 3, "auto-promotion should return quickly past the budget"

    # Take over via shell_job_logs
    job_logs = mcp._tool_manager._tools["shell_job_logs"].fn
    log_result = job_logs(job_id=result["job_id"], wait_until_exit=True, wait_timeout_sec=10)
    assert log_result["status"] == "exited"
    assert log_result["exit_code"] == 0
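The round-trip test above implies the standard drain idiom for truncated output; spelled out as a loop against the same `shell_output_get` signature the test uses (a sketch, not part of the suite):

```python
# Drain a truncated output handle completely, 64 KB at a time.
def read_full_output(output_get, handle: str) -> str:
    chunks: list[str] = []
    offset = 0
    while True:
        page = output_get(output_handle=handle, since_offset=offset, max_kb=64)
        if page["expired"]:
            break  # handle was GC'd — whatever we collected is all we get
        chunks.append(page["data"])
        if page["next_offset"] == offset:
            break  # no forward progress: fully drained
        offset = page["next_offset"]
    return "".join(chunks)
```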
@@ -0,0 +1,97 @@
"""Job lifecycle: ring buffer offsets, signals, stdin."""

from __future__ import annotations

import time

import pytest


@pytest.fixture
def job_tools(mcp):
    from shell_tools.jobs.tools import register_job_tools

    register_job_tools(mcp)
    return {
        "start": mcp._tool_manager._tools["shell_job_start"].fn,
        "logs": mcp._tool_manager._tools["shell_job_logs"].fn,
        "manage": mcp._tool_manager._tools["shell_job_manage"].fn,
    }


def test_start_logs_wait_basic(job_tools):
    started = job_tools["start"](command="echo first; echo second; echo third", shell=True)
    assert "job_id" in started
    job_id = started["job_id"]

    # Wait for completion via logs
    result = job_tools["logs"](job_id=job_id, wait_until_exit=True, wait_timeout_sec=5)
    assert result["status"] == "exited"
    assert result["exit_code"] == 0
    assert "first" in result["data"] and "third" in result["data"]


def test_offset_bookkeeping(job_tools):
    started = job_tools["start"](
        command="for i in 1 2 3 4 5; do echo line$i; sleep 0.1; done",
        shell=True,
    )
    job_id = started["job_id"]

    # Read a few times with offset bookkeeping
    seen = ""
    offset = 0
    for _ in range(20):
        result = job_tools["logs"](job_id=job_id, since_offset=offset, max_bytes=4096)
        seen += result["data"]
        offset = result["next_offset"]
        if result["status"] == "exited":
            # Drain anything left
            tail = job_tools["logs"](job_id=job_id, since_offset=offset, max_bytes=4096)
            seen += tail["data"]
            break
        time.sleep(0.1)

    for n in range(1, 6):
        assert f"line{n}" in seen, f"missing line{n} from {seen!r}"


def test_merge_stderr(job_tools):
    started = job_tools["start"](
        command="echo stdout1; echo stderr1 1>&2; echo stdout2",
        shell=True,
        merge_stderr=True,
    )
    job_id = started["job_id"]
    result = job_tools["logs"](
        job_id=job_id, stream="merged", wait_until_exit=True, wait_timeout_sec=5
    )
    assert "stdout1" in result["data"]
    assert "stderr1" in result["data"]


def test_signal_term(job_tools):
    started = job_tools["start"](command="sleep 30")
    job_id = started["job_id"]

    # Give it a moment to actually start
    time.sleep(0.2)

    result = job_tools["manage"](action="signal_term", job_id=job_id)
    assert result["ok"] is True

    final = job_tools["logs"](job_id=job_id, wait_until_exit=True, wait_timeout_sec=3)
    assert final["status"] == "exited"
    # On SIGTERM, exit_code is -15 (subprocess convention)
    assert final["exit_code"] == -15


def test_list_action(job_tools):
    started = job_tools["start"](command="sleep 1")
    listing = job_tools["manage"](action="list")
    assert any(j["job_id"] == started["job_id"] for j in listing["jobs"])


def test_unknown_job_id(job_tools):
    result = job_tools["logs"](job_id="job_doesnotexist", wait_until_exit=False)
    assert "error" in result
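The `-15` asserted in test_signal_term is the stdlib convention surfacing through the job envelope: `subprocess` reports a signal-killed child as the negated signal number. A self-contained, POSIX-only demo:

```python
# Demo of the negative-returncode convention the test above pins down.
import signal
import subprocess
import time

p = subprocess.Popen(["sleep", "30"])
time.sleep(0.2)          # let it actually start
p.terminate()            # sends SIGTERM
p.wait()
assert p.returncode == -signal.SIGTERM  # == -15
```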
@@ -0,0 +1,109 @@
"""PTY sessions: bash-on-macOS, prompt sentinel, raw I/O, zsh refusal."""

from __future__ import annotations

import sys
import time

import pytest

pytestmark = pytest.mark.skipif(sys.platform == "win32", reason="PTY is POSIX-only")


@pytest.fixture
def pty_tools(mcp):
    from shell_tools.pty.tools import register_pty_tools

    register_pty_tools(mcp)
    return {
        "open": mcp._tool_manager._tools["shell_pty_open"].fn,
        "run": mcp._tool_manager._tools["shell_pty_run"].fn,
        "close": mcp._tool_manager._tools["shell_pty_close"].fn,
    }


def test_open_close_basic(pty_tools):
    opened = pty_tools["open"]()
    assert "session_id" in opened
    assert opened["shell"] == "/bin/bash", "shell-tools must default to bash, not zsh"
    closed = pty_tools["close"](session_id=opened["session_id"])
    assert closed.get("already_closed") in (False, None)


def test_bash_on_darwin():
    """Even on macOS, the resolved shell is /bin/bash, not /bin/zsh."""
    from shell_tools.common.limits import _resolve_shell

    assert _resolve_shell(True) == "/bin/bash"


def test_pty_run_command(pty_tools):
    opened = pty_tools["open"]()
    sid = opened["session_id"]
    try:
        result = pty_tools["run"](session_id=sid, command="echo hello-pty", timeout_sec=5)
        assert result.get("timed_out") is False
        assert "hello-pty" in result["output"]
        assert result["prompt_after"] is True
    finally:
        pty_tools["close"](session_id=sid)


def test_pty_state_persists(pty_tools):
    opened = pty_tools["open"]()
    sid = opened["session_id"]
    try:
        pty_tools["run"](session_id=sid, command="MY_VAR=42")
        result = pty_tools["run"](session_id=sid, command="echo $MY_VAR", timeout_sec=3)
        assert "42" in result["output"]
    finally:
        pty_tools["close"](session_id=sid)


def test_raw_send_then_read_only(pty_tools):
    """Drive the python REPL via raw_send + read_only."""
    opened = pty_tools["open"]()
    sid = opened["session_id"]
    try:
        # Launch python with our own prompt regex
        pty_tools["run"](
            session_id=sid,
            command="python3 -q",
            expect=r">>>\s*$",
            timeout_sec=10,
        )
        pty_tools["run"](session_id=sid, command="x = 7\n", raw_send=True)
        pty_tools["run"](session_id=sid, command="print(x*x)\n", raw_send=True)
        time.sleep(0.5)
        drained = pty_tools["run"](session_id=sid, read_only=True, timeout_sec=2)
        assert "49" in drained["output"]
    finally:
        pty_tools["close"](session_id=sid, force=True)


def test_session_busy(pty_tools):
    """Concurrent run() calls on the same session return 'session busy'."""
    import threading

    opened = pty_tools["open"]()
    sid = opened["session_id"]
    try:
        results = []

        def run_long():
            results.append(pty_tools["run"](session_id=sid, command="sleep 2", timeout_sec=5))

        t = threading.Thread(target=run_long)
        t.start()
        time.sleep(0.2)
        # The concurrent call should fail
        result = pty_tools["run"](session_id=sid, command="echo nope", timeout_sec=1)
        assert "error" in result and "busy" in result["error"].lower()
        t.join(timeout=10)
    finally:
        pty_tools["close"](session_id=sid, force=True)


def test_unknown_session(pty_tools):
    result = pty_tools["run"](session_id="pty_doesnotexist", command="ls")
    assert "error" in result
@@ -0,0 +1,58 @@
"""shell_rg + shell_find — basic functionality, structured output."""

from __future__ import annotations

import shutil

import pytest


@pytest.fixture
def search_tools(mcp):
    from shell_tools.search.tools import register_search_tools

    register_search_tools(mcp)
    return {
        "rg": mcp._tool_manager._tools["shell_rg"].fn,
        "find": mcp._tool_manager._tools["shell_find"].fn,
    }


@pytest.mark.skipif(not shutil.which("rg"), reason="ripgrep not installed")
def test_rg_finds_pattern(search_tools, tmp_path):
    (tmp_path / "a.txt").write_text("hello\nworld\nfoo\n")
    (tmp_path / "b.txt").write_text("bar\nworld\n")

    result = search_tools["rg"](pattern="world", path=str(tmp_path))
    assert result["total"] >= 2
    paths = {m["path"] for m in result["matches"]}
    assert any("a.txt" in p for p in paths)


@pytest.mark.skipif(not shutil.which("rg"), reason="ripgrep not installed")
def test_rg_no_matches(search_tools, tmp_path):
    (tmp_path / "a.txt").write_text("hello\n")
    result = search_tools["rg"](pattern="zzz_no_match_zzz", path=str(tmp_path))
    assert result["total"] == 0
    assert result["matches"] == []


def test_find_by_name(search_tools, tmp_path):
    (tmp_path / "alpha.log").write_text("a")
    (tmp_path / "beta.log").write_text("b")
    (tmp_path / "ignore.txt").write_text("c")

    result = search_tools["find"](path=str(tmp_path), name="*.log")
    assert result["count"] == 2
    assert all(p.endswith(".log") for p in result["paths"])


def test_find_by_type_dir(search_tools, tmp_path):
    (tmp_path / "sub").mkdir()
    (tmp_path / "file.txt").write_text("x")

    result = search_tools["find"](path=str(tmp_path), type_filter="d")
    paths = result["paths"]
    # Expect tmp_path itself plus "sub"
    assert any(p.endswith("sub") for p in paths)
    assert not any(p.endswith("file.txt") for p in paths)
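To connect these tests back to the wrapper: the shell_find docstring's example query maps onto argv like this (the /var/log path is illustrative):

```python
# "Find the .log files modified in the last 7 days bigger than 1 MB":
result = search_tools["find"](
    path="/var/log",
    name="*.log",
    mtime_days=7,
    size_kb_min=1024,
)
# argv built internally:
#   ["find", "/var/log", "-name", "*.log", "-mtime", "-7", "-size", "+1024k"]
```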
@@ -0,0 +1,102 @@
"""Security/policy tests: zsh refusal, env stripping, destructive catalog."""

from __future__ import annotations

import pytest


def test_resolve_shell_rejects_zsh():
    from shell_tools.common.limits import ZshRefused, _resolve_shell

    for path in ("/bin/zsh", "/usr/bin/zsh", "/usr/local/bin/zsh", "ZSH"):
        with pytest.raises(ZshRefused):
            _resolve_shell(path)


def test_resolve_shell_accepts_bash():
    from shell_tools.common.limits import _resolve_shell

    assert _resolve_shell(True) == "/bin/bash"
    assert _resolve_shell("/bin/bash") == "/bin/bash"
    assert _resolve_shell(False) is None


def test_sanitized_env_strips_zsh_vars(monkeypatch):
    from shell_tools.common.limits import sanitized_env

    monkeypatch.setenv("ZDOTDIR", "/some/path")
    monkeypatch.setenv("ZSH_VERSION", "5.9")
    monkeypatch.setenv("ZSH_NAME", "zsh")
    monkeypatch.setenv("PATH", "/usr/bin:/bin")

    env = sanitized_env()
    assert "ZDOTDIR" not in env
    assert "ZSH_VERSION" not in env
    assert "ZSH_NAME" not in env
    # Non-zsh vars survive
    assert env["PATH"] == "/usr/bin:/bin"


def test_destructive_warning_catalog():
    from shell_tools.common.destructive_warning import get_warning

    cases = [
        ("rm -rf /tmp/foo", "force-remove"),
        ("rm -r /tmp/foo", "recursively remove"),
        ("git reset --hard HEAD~1", "discard"),
        ("git push --force origin main", "remote history"),
        ("git push -f origin main", "remote history"),
        ("git commit --amend -m 'x'", "rewrite"),
        ("DROP TABLE users;", "drop or truncate"),
        ("DELETE FROM users;", "delete rows"),
        ("kubectl delete pod foo", "Kubernetes"),
        ("terraform destroy", "Terraform"),
    ]
    for cmd, expected in cases:
        warning = get_warning(cmd)
        assert warning is not None, f"expected warning for {cmd!r}"
        assert expected in warning, f"warning {warning!r} should mention {expected!r}"


def test_destructive_warning_clean_commands():
    from shell_tools.common.destructive_warning import get_warning

    for cmd in ["ls -la", "echo hi", "git status", "git commit -m 'x'"]:
        assert get_warning(cmd) is None, f"unexpected warning for {cmd!r}"


def test_semantic_exit_grep():
    from shell_tools.common.semantic_exit import classify

    status, msg = classify("grep foo /tmp/x", 0)
    assert status == "ok"
    status, msg = classify("grep foo /tmp/x", 1)
    assert status == "ok"
    assert "No matches" in msg
    status, msg = classify("grep foo /tmp/x", 2)
    assert status == "error"


def test_semantic_exit_default():
    from shell_tools.common.semantic_exit import classify

    status, msg = classify("ls", 0)
    assert status == "ok"
    assert msg is None
    status, msg = classify("ls", 1)
    assert status == "error"


def test_semantic_exit_signaled():
    from shell_tools.common.semantic_exit import classify

    status, msg = classify("sleep 999", -15, signaled=True)
    assert status == "signal"


def test_semantic_exit_timed_out():
    from shell_tools.common.semantic_exit import classify

    status, msg = classify("sleep 999", None, timed_out=True)
    assert status == "error"
    assert "timed out" in msg.lower()
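The catalog these tests pin down is, at heart, a regex-to-message table. A trimmed sketch of one plausible shape for `get_warning` (the real module is not in this diff; the patterns and messages below are assumptions covering a subset of the cases above):

```python
# Hypothetical sketch of destructive_warning.get_warning: a regex catalog,
# first match wins, None for clean commands.
import re

_CATALOG: list[tuple[re.Pattern[str], str]] = [
    (re.compile(r"\brm\s+-\w*r\w*f|\brm\s+-\w*f\w*r"),
     "This will force-remove files without prompting."),
    (re.compile(r"\bgit\s+push\s+(?:--force|-f)\b"),
     "This rewrites remote history."),
    (re.compile(r"\bdrop\s+table\b|\btruncate\b", re.IGNORECASE),
     "This will drop or truncate a table."),
    (re.compile(r"\bdelete\s+from\b", re.IGNORECASE),
     "This will delete rows."),
    (re.compile(r"\bterraform\s+destroy\b"),
     "Terraform destroy tears down live infrastructure."),
]


def get_warning(command: str) -> str | None:
    for pattern, message in _CATALOG:
        if pattern.search(command):
            return message
    return None


assert get_warning("git status") is None
assert "force-remove" in get_warning("rm -rf /tmp/foo")
```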
@@ -0,0 +1,33 @@
"""Smoke test: load the server module, register tools, assert all 10 land."""

from __future__ import annotations

EXPECTED_TOOLS = {
    "shell_exec",
    "shell_job_start",
    "shell_job_logs",
    "shell_job_manage",
    "shell_pty_open",
    "shell_pty_run",
    "shell_pty_close",
    "shell_rg",
    "shell_find",
    "shell_output_get",
}


def test_register_shell_tools_lands_all_ten(mcp):
    from shell_tools import register_shell_tools

    names = register_shell_tools(mcp)
    assert set(names) == EXPECTED_TOOLS, (
        f"missing: {EXPECTED_TOOLS - set(names)}, extra: {set(names) - EXPECTED_TOOLS}"
    )


def test_all_tools_have_shell_prefix(mcp):
    from shell_tools import register_shell_tools

    names = register_shell_tools(mcp)
    for n in names:
        assert n.startswith("shell_"), f"tool {n!r} missing shell_ prefix"
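All of the suites above lean on a shared `mcp` fixture that this commit's test hunks don't show; a minimal conftest.py sketch that would satisfy them (assumed, not part of the diff):

```python
# conftest.py (hypothetical): a fresh FastMCP instance per test so tool
# registration in one suite never collides with another.
import pytest
from fastmcp import FastMCP


@pytest.fixture
def mcp() -> FastMCP:
    return FastMCP("shell-tools-test")
```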