diff --git a/CHANGELOG.md b/CHANGELOG.md index 8a1265e..83a722e 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -10,6 +10,246 @@ CalVer image tags (`stable-YYYY.MM.N`, `dev-YYYY.MM.N`) are produced for every C ## [Unreleased] +### Fixed (PR #242 follow-ups) + +- **`/agnes-private` legacy-scan gap closed (David #8 from PR review).** + `agnes push --legacy-scan` now consults the private list using the + jsonl file stem as the session id (Claude Code names them + `.jsonl`). Previously legacy-scan entries carried an + empty session_id, so `--legacy-scan` would upload every transcript + on disk regardless of whether the user later marked it private. +- **`statusline`/`is_private` no longer mkdir-pollutes arbitrary + workdirs (S2.7 from PR review).** Read paths now use a side-effect- + free helper that returns the `.claude/` path WITHOUT creating it; + only `add_private` materializes the dir. Adds a process-local + mtime-keyed cache around `read_all_private` so in-process callers + (push doing one stat per upload candidate, `agnes diagnose` + scanning workspaces) don't re-parse the file every time. +- **`agnes capture-session` writes an operability breadcrumb log + (David #11 from PR review).** Every invocation appends one TSV + line to `/.claude/agnes-capture-session.log` with the + outcome (`ok`, `private_skip`, `bad_json`, `no_transcript_path`, + …). Gives operators a signal to detect "hook fires but queue stays + empty" — without it, an upstream Claude Code stdin-contract change + is invisible because the hook always exits 0. Log rolls at 256 KiB. + Best-effort: a breadcrumb-write failure is swallowed so the hook + contract stays "exit 0 always". Skipped in non-Agnes workdirs (no + `.claude/`) so opening Claude Code in `~/` doesn't pollute it. + +### Added + +- **Session capture queue + new `agnes capture-session` SessionStart + subcommand.** Replaces the previous encoding-based scan of + `~/.claude/projects/` for session jsonls (which depended on Claude + Code's cwd-to-folder encoding — a moving target across versions). + The hook reads Claude Code's documented stdin JSON + (`transcript_path`) and appends `\t` to + `/.claude/agnes-sessions.txt`. `agnes push` then atomically + renames that queue to a snapshot, processes it, and re-queues failed + uploads. Recovery snapshots from a crashed push are picked up on the + next run. Concurrent SessionStart hooks (multiple Claude Code windows + opening at once) are serialized by a short-lived `agnes-queue.lock` + so the queue is race-free on every OS. + +- **`/agnes-private` slash command + `agnes mark-private` subcommand.** + Mark the current Claude Code session as private — its transcript is + skipped by `agnes push` and audit-logged to + `/.claude/agnes-sessions-private-skipped.txt` instead. + The slash command runs deterministically via `!`-prefix bash (no AI + in the loop). State lives in + `/.claude/agnes-sessions-private.txt` (one session_id per + line) and is the authoritative source — both `capture-session` and + `push` consult it, so the slash-command-before-capture and + capture-before-slash-command races both resolve safely without an + ordering dependency. Requires the `CLAUDE_CODE_SESSION_ID` + environment variable that Claude Code sets in every bash subprocess + it spawns; `agnes mark-private` exits 1 if missing (defends against + accidental invocations from a regular terminal). + +- **`agnes statusline` subcommand + statusLine wiring.** Renders + `🔒 agnes-private` in the Claude Code status bar when the current + session is marked private; empty string otherwise. `agnes init` wires + it to Claude Code's `statusLine` setting. Polite to existing + customizations — if the workspace `settings.json` already has a + `statusLine`, the install preserves it untouched and emits a + one-line stderr warning instructing the operator how to compose + `agnes statusline` into their own command. + +- **`agnes push --legacy-scan` opt-in fallback** scans + `~/.claude/projects/` via the pre-queue encoding-based path. Use + for one-off backfill of session jsonls that pre-date the queue + mechanism on workspaces upgrading from < v0.49. Note: legacy-scanned + entries have empty `session_id`, so the `/agnes-private` list filter + never matches — backfill uploads bypass the private list. Document + this gap before running a backfill on a workspace that has + previously-marked-private sessions in the encoding-based location. + +- **Single-instance lock for `agnes push`** (cross-platform via + `filelock`: `fcntl.flock` on POSIX, `msvcrt.locking` on Windows). + When the user closes several Claude Code sessions simultaneously, + every SessionEnd hook fires its own `agnes push` — exactly one + acquires `/.claude/agnes-push.lock` and runs, the rest + silent-exit. Prevents concurrent uploads from each other's queues + and matches the existing `bash -c "( nohup ... & ) ; true"` + SessionEnd wrapping (push must survive Claude Code's ~1s SIGTERM + in `-p` headless mode). + +- **New `filelock>=3.13,<4` runtime dependency.** Backs both the + push single-instance lock and the queue-write serialization above. + +### Changed + +- **BREAKING: SessionStart / SessionEnd hook wire format.** + `agnes init` (and the new `agnes self-upgrade` auto-refresh path) + write a different hook layout than v0.48: + - SessionStart gains `agnes capture-session` as the very first + entry — feeds the new session-capture queue that powers + `agnes push`. Must run before any other SessionStart hook so the + `transcript_path` is captured even if a later hook fails. + - SessionStart's previous `agnes push` self-heal entry is removed + — the queue persists across runs so orphan jsonls from headless / + crashed sessions ship out on the next SessionEnd push naturally. + Workspaces upgrading from < v0.49 with sessions that pre-date the + queue mechanism need a one-off `agnes push --legacy-scan` to + backfill them; see `--legacy-scan` entry above. + - SessionEnd `agnes push` is wrapped in a `nohup` subshell so the + upload survives Claude Code's `-p` headless SIGTERM (~1s after + hook fires) and completes the full upload cycle. The synchronous + form would lose 5-30s of uploads to the kill. + - All entries are wrapped in `bash -c "..."` for Windows + compatibility — Claude Code on Windows runs hook commands directly + without a shell, so any `;` chain / `2>/dev/null` redirection / + `|| true` short-circuit silently no-op'd previously. + + Existing workspaces auto-migrate to the new layout on the next + session-start via `maybe_refresh_claude_hooks` invoked from + `agnes self-upgrade` (see separate Changed entry). No operator + action required. + +- **`agnes self-upgrade` now auto-refreshes the workspace Claude Code + hooks** so an existing Agnes workspace picks up the new SessionStart / + SessionEnd layout the moment its CLI is upgraded — no need to re-run + `agnes init` after a release. Without this, an existing v0.48 + workspace would auto-upgrade the CLI via its own SessionStart + self-upgrade entry, but the new `agnes capture-session` hook (added + in this release) would never get installed, the queue would stay + empty, and `agnes push` would silently stop uploading sessions. The + refresh fires on both the "info is None" fast path (CLI already + current — handles the second SessionStart after a prior upgrade) and + after a successful install. Guarded by + `cli.lib.hooks.workspace_has_agnes_hooks` so it never writes + `.claude/settings.json` into directories that aren't Agnes workspaces + (e.g. `agnes self-upgrade` from `~/`). Failures are best-effort — + they're surfaced on stderr but never flip the upgrade exit code. + +### Added + +- **Onboarding docs for the `/agnes-private` privacy feature.** + `config/claude_md_template.txt` gains a short "Private sessions" + subsection (next to "Data Sync") covering the slash command, + statusbar indicator, and audit-log location. The web-served setup + prompt (`app/web/setup_instructions.py`) gets a one-line mention so + analysts learn the feature exists at onboarding instead of by + accident. + +### Changed + +- **`_install_statusline` distinguishes explicit `null` / empty-string + `statusLine` from absent key.** Previously the `if existing:` truthy + check silently took the same path for all three cases. The new + `existing is None or existing == ""` branch documents and tests the + behavior (install ours — treated as "not configured" rather than + "explicit user opt-out"). Two new tests pin both edge cases. + +### Fixed + +- **`agnes push --legacy-scan` help text documents the private-list + gap.** Legacy-scan entries carry an empty `session_id`, so the + `/agnes-private` filter is not consulted. The practical impact is + bounded — pre-queue sessions cannot have been marked private (the + private list is a queue-era feature) — but the help text now spells + out the gap so an operator running a backfill is not surprised. + +- **`agnes push` no longer crashes on filesystem errors when acquiring + the single-instance lock.** `acquire_or_skip` in + `cli/lib/push_lock.py` now treats `OSError` (read-only filesystem, + permission denied on `.claude/`, disk full, hardware I/O failure) the + same as `filelock.Timeout` — yields `None`, push exits cleanly. + Previously the `OSError` propagated as an unhandled traceback; + invisible in the SessionEnd hook context (the `|| true` wrapper + swallowed it), but ugly in a manual `agnes push` invocation. + +- **`agnes push` no longer infinite-loops on permanent 4xx failures.** + Previously any non-200 response except the literal `file not found + on disk` was re-queued, so 401 (token expired), 403 (RBAC denial), + 413 (payload too large), 400 (server-side validation error) cycled + through every push run forever — the queue grew without bound and + each run re-bombarded the server with the same failing upload. + 4xx (except 408 Request Timeout + 429 Too Many Requests, which the + HTTP spec marks as transient) is now dropped + audit-logged to + `/.claude/agnes-sessions-failed.txt` instead (TSV: + `\t\t\t`). 5xx and + network errors continue to re-queue (genuinely transient — server + or transport state can change between runs). `agnes push --json` + surfaces a new `dropped_permanent` counter; non-quiet stdout + mentions the audit-log path so operators tailing the output have a + pointer to the forensic trail. + +- **Session capture queue: concurrent SessionStart hooks no longer + corrupt the queue file on Windows.** `append_to_queue`, + `requeue_failed`, and `snapshot_queue` in `cli/lib/session_queue.py` + now hold a short-lived `agnes-queue.lock` (filelock) while writing. + Previously the code assumed Python's `open(path, "a")` is atomic on + NTFS for small writes; it isn't — the Windows CRT does not pass + `FILE_APPEND_DATA` to `CreateFile`, so concurrent appenders (e.g. + user opens several Claude Code windows simultaneously) could + interleave bytes mid-line and the parser would silently drop the + malformed entries. The lock is separate from `agnes-push.lock` — + capture-session hooks don't block on the push command. +- **Session capture queue: snapshot filenames now include a uuid8 tail + so a recycled OS PID cannot silently overwrite a recovery snapshot + left behind by a crashed push.** `snapshot_queue` previously named + files `agnes-sessions.snapshot..txt`; after a crash + PID reuse + (Linux default `kernel.pid_max=32768`), `os.rename` atomically + replaces the recovery file with the new snapshot, losing every entry + in it. New format: `agnes-sessions.snapshot...txt`; + `find_recovery_snapshots` already uses a glob so the change is + backward-compatible with snapshots written by older CLI versions. + +### Changed + +- **Setup prompt + CLAUDE.md template: marketplace copy now reflects the + actual three-source served stack composition + `--check`-only + SessionStart hook.** Previous text (shipped in 0.48.0 / PR #240) said + the SessionStart hook keeps the marketplace clone in sync via + `agnes refresh-marketplace --quiet` on every session, and that admin + grants land automatically without re-running setup — both false since + PR #237 (0.47.x) moved the install/update path out of the hook into + the `/update-agnes-plugins` slash command. The hook is `--check`-only: + it detects server-side changes and prompts the user to run the slash + command, which does the full reconcile interactively with output + visible in the transcript. Updated copy spells out the real + composition of the served stack — `(admin RBAC ∩ /marketplace + subscriptions) ∪ system-mandatory plugins ∪ Flea market installs` — + rather than the admin-grants-only framing the previous copy implied. + Affects: `app/web/setup_instructions.py:_marketplace_block` (both + trailer variants) and `config/claude_md_template.txt` (Agnes + Marketplace section). + +### Removed + +- **Setup prompt's interactive Skills step deleted.** The final step + before Confirm used to ask the user verbatim whether to bulk-copy + every `agnes skills` markdown file into `~/.claude/skills/agnes/` or + pull them on-demand via `agnes skills show `. The named-opinion + question with no obvious right answer was confusing for new users at + the tail end of a wall of technical steps. On-demand lookup via + `agnes skills show ` is the one-size-fits-all default — the + CLI knowledge base remains discoverable through `agnes skills list` + and the CLAUDE.md template references specific skills (e.g. + `agnes-data-querying`) inline where they're relevant. Layout: Confirm + shifts from step 9 to step 8 across all variants. + ## [0.48.0] — 2026-05-10 ### Fixed diff --git a/app/web/setup_instructions.py b/app/web/setup_instructions.py index 4fa6b20..978d703 100644 --- a/app/web/setup_instructions.py +++ b/app/web/setup_instructions.py @@ -91,26 +91,22 @@ The numbered steps are arranged so that: CLAUDE.md fetch, and Claude Code SessionStart/End hooks into one non-interactive call. Replaces the old `agnes auth import-token` + `agnes auth whoami` pair. - - The interactive question (skills copy vs on-demand) is the LAST step - before Confirm — by that point everything else is done, the user only - needs to decide one thing, and the assistant blocks on their answer. - `agnes diagnose` runs late so it doubles as a final smoke test after - plugins are in place, instead of gating them. + plugins are in place, instead of gating them. It is also the last + step before Confirm — the whole prompt is non-interactive, no + decision questions for the user. -Layout (with marketplace plugins to install): +Layout: 0 TLS trust block (only when ca_pem is supplied) 1 Install CLI 2 agnes init (auth + workspace bootstrap) 3 agnes catalog (smoke verify) 4 Pre-flight: git + claude - 5 Marketplace + plugins - 6 Diagnose - 7 Skills (interactive — assistant waits for user) + 5 Marketplace (always, even with empty served stack) + 6 MCP servers (Atlassian Remote MCP) + 7 Diagnose 8 Confirm -Layout (no plugins): steps 4-5 collapse out, diagnose/skills/confirm -renumber to 4-5-6. - The combined-bundle source uses a fallback chain so the prompt still works on machines without the system Python `certifi`: we try (a) `python3 -c 'import certifi'`, (b) the platform's curl/openssl bundle path, (c) @@ -358,23 +354,31 @@ def _init_lines(server_url_placeholder: str = "{server_url}") -> list[str]: "", " This should list the tables your account has grants for. Empty list", " means your admin hasn't granted you access yet — contact them.", + "", + " Tip: type `/agnes-private` inside any Claude Code session to mark it", + " private — its transcript is skipped by `agnes push` (audit-logged to", + " `.claude/agnes-sessions-private-skipped.txt`). The statusbar shows", + " `🔒 agnes-private` while you're in a private session.", ] -def _diagnose_skills_lines(*, diagnose_num: str, skills_num: str) -> list[str]: - """Diagnose + skills steps — moved AFTER the marketplace block. +def _diagnose_lines(*, diagnose_num: str) -> list[str]: + """Diagnose step — runs AFTER the marketplace + MCP blocks. - Putting these last (instead of right after `whoami`) means: by the time - we ask the user the skills question, all installation work is finished — - the only thing the prompt is still waiting on is one human-loop answer. - `agnes diagnose` then doubles as a server-health smoke test that runs after - plugins are in place, not as a gate before them. With the new ordering - skills is the LAST step before Confirm, so the assistant must wait for - the user's answer before finalizing — there's no "run other steps in - parallel" affordance any more (and it isn't needed). + Putting it last (instead of right after `whoami`) means it doubles as + a server-health smoke test that runs once everything else is in place, + not as a gate before them. It is the last step before Confirm — the + whole prompt is non-interactive. - Step numbers are filled in by the caller because they shift between - the no-marketplace layout (4, 5) and the marketplace layout (6, 7). + The bundled `agnes skills` knowledge base (markdown documents listable + via `agnes skills list` / readable via `agnes skills show `) is + no longer surfaced from this prompt: discovery happens organically + when CLAUDE.md or another skill references a specific entry (see the + `agnes skills show agnes-data-querying` mention in the CLAUDE.md + template's BigQuery section). Bulk-copying every skill into + `~/.claude/skills/agnes/` at setup time was an interactive opinion + question with no obvious right answer; on-demand lookup is the + one-size-fits-all default. """ return [ "", @@ -387,25 +391,6 @@ def _diagnose_skills_lines(*, diagnose_num: str, skills_num: str) -> list[str]: " - non-admin roles (e.g. `analyst`) that don't have grants to read", " the system schema even on populated instances.", " Only flag actual yellow/red checks (api / duckdb_state / users).", - "", - f"{skills_num}) Skills (ask the user — this is the last interactive step before Confirm):", - " The CLI ships with reusable markdown skills (setup, connectors,", - " corporate-memory, deploy, notifications, security, troubleshoot),", - " listable via `agnes skills list` and readable via `agnes skills show `.", - "", - " Ask the user verbatim: \"Do you want me to copy the Agnes skills into", - " ~/.claude/skills/agnes/ so they are always loaded in Claude Code,", - " or should I pull them on-demand via `agnes skills show ` when", - " needed?\"", - "", - " Wait for the user's answer before moving to Confirm.", - "", - " If they say copy:", - " mkdir -p ~/.claude/skills/agnes", - " for s in $(agnes skills list | awk '{print $1}'); do", - " agnes skills show \"$s\" > ~/.claude/skills/agnes/\"$s\".md", - " done", - " echo \"Copied skills to ~/.claude/skills/agnes/\"", ] @@ -417,16 +402,16 @@ def _finale_lines(*, confirm_step_num: str, has_ca: bool) -> list[str]: the trust block ran (`has_ca`). The marketplace clone bullet is unconditional now — preflight + marketplace are always emitted (Fix B in the 2026-05-10 init-report response). Init + catalog + diagnose + - skills + version always render, so their bullets are unconditional.""" + version always render, so their bullets are unconditional.""" bullets = [ " - `agnes --version` output", " - First few lines of `agnes catalog` (tables you can see)", " - Confirmation that `./CLAUDE.md` and `./AGNES_WORKSPACE.md` exist", " - Confirmation that `./.claude/settings.json` contains SessionStart/End hooks", " - The `agnes diagnose` overall status", - " - Whether skills were copied or left on-demand", " - Confirmation that `~/.agnes/marketplace/.git/` exists " - "(the marketplace clone) and that any granted plugins installed", + "(the marketplace clone) and that any plugins currently in the " + "served stack installed cleanly", " - Reminder to scroll to the connector cards on /home and connect " "Asana / Google Workspace / Atlassian (those run separately from this script)", ] @@ -493,13 +478,26 @@ def _marketplace_block( ) -> list[str]: """Build the marketplace + plugin-install block. - `plugin_install_names` may be empty: registering the per-user - marketplace clone with Claude Code is useful even when the operator - has zero plugin grants, because it pre-wires the SessionStart hook - and the grant flow — admin grants land on the next Claude Code - session without re-running setup. The block copy adapts for the - empty case so the comment-bullet doesn't promise plugin installs - that won't happen. + `plugin_install_names` is the user's current *served stack* as + computed by `src/marketplace_filter.py:resolve_user_marketplace` — + i.e.:: + + (admin_RBAC ∩ /marketplace subscriptions) + ∪ system-mandatory plugins (admin-pinned, auto-applied) + ∪ Flea market installs (skills/agents bundled, plugins standalone) + + May be empty: the served stack is curated by the user on the + `/marketplace` page (admin grants are eligibility only — the user + opts in via "Add to stack") plus whatever the admin pinned as + system-mandatory plus the user's own Flea market picks. A brand-new + account with no system plugins and no curation has an empty stack + until something lands in any of those three buckets. + + Registering the marketplace clone is unconditional regardless — + Claude Code learns about the `agnes` marketplace at bootstrap, and + the moment the served stack becomes non-empty, the user's next + `/update-agnes-plugins` run installs the diff. No need to re-run + setup when the stack changes server-side. `step_num` is parameterized because step ordering shifted between layouts (this block now runs before diagnose/skills, so it's step 5 @@ -555,29 +553,41 @@ def _marketplace_block( """ has_plugins = bool(plugin_install_names) header = ( - "Register the Agnes Claude Code marketplace and install plugins:" + "Register the Agnes Claude Code marketplace and install your current stack:" if has_plugins - else "Register the Agnes Claude Code marketplace (no plugins granted yet):" + else "Register the Agnes Claude Code marketplace (your stack is empty for now):" ) bullet_5 = ( - " # 5. install every plugin listed in the served manifest" + " # 5. install every plugin currently in your served stack" if has_plugins - else " # 5. (no plugins to install — your account has zero grants)" + else " # 5. (your served stack is empty right now — nothing to install yet)" ) if has_plugins: trailer = [ " These run non-interactively. After they finish, tell the user to /exit", - " and run `claude` again so the new plugins load. From then on, the", - " SessionStart hook keeps the marketplace clone in sync via", - " `agnes refresh-marketplace --quiet` on every Claude Code session.", + " and run `claude` again so the new plugins load.", + "", + " Stack curation lives on the server — visit /marketplace to add or", + " remove items (admin-granted opt-ins, system plugins your org pinned,", + " and uploads from the Flea market tab). The SessionStart hook checks", + " for server-side changes on every Claude Code session and, when it", + " detects a diff, prompts you to run `/update-agnes-plugins` inside", + " Claude Code to apply it. No silent auto-install at session start —", + " the slash command runs full reconcile with output visible in the", + " transcript, under your control.", ] else: trailer = [ - " Your account has no plugin grants right now, but registering the", - " marketplace anyway pre-wires the SessionStart hook. When an admin", - " grants you a plugin later, `agnes refresh-marketplace` (run by the", - " hook on every Claude Code session) will install it automatically —", - " no need to re-run this setup script.", + " Your served stack is empty right now — nothing to install yet.", + " Registering the marketplace clone anyway pre-wires Claude Code so", + " future picks land cleanly: visit /marketplace to add plugins to", + " your stack (admin-granted opt-ins, uploads from the Flea market", + " tab), or wait for your admin to pin something as system-mandatory.", + "", + " When your stack becomes non-empty, the SessionStart hook detects", + " the change on the next Claude Code session and prompts you to run", + " `/update-agnes-plugins` inside Claude Code to install the new", + " items. No need to re-run this setup script.", ] return [ "", @@ -670,29 +680,24 @@ def _preamble_lines(*, has_ca: bool) -> list[str]: return lines -def _step_numbers(*, has_skills: bool = True) -> dict[str, str]: +def _step_numbers() -> dict[str, str]: """Compute the step numbers for the unified layout. Returns a dict keyed by logical step name; values are stringified 1-based step numbers (preserving the existing string-based helper API so call sites stay diff-minimal). - Mandatory steps (always emitted): install (1), init (2), catalog (3), - preflight (4), marketplace (5), mcp_servers (6), diagnose (7), skills - (8), confirm (9). Preflight + marketplace + mcp_servers are all - always-on: - - Marketplace registration is useful even when the operator has - zero plugin grants (SessionStart hook reconciles future grants - automatically). + Steps (always emitted): install (1), init (2), catalog (3), + preflight (4), marketplace (5), mcp_servers (6), diagnose (7), + confirm (8). Preflight + marketplace + mcp_servers are all always-on: + - Marketplace registration is useful even with an empty served + stack (future admin grants / system pins / Flea installs land + cleanly without re-running setup). - Atlassian MCP registration is unattended-safe (hosted Remote MCP with Claude Code-managed OAuth) and applies to every analyst whose work touches Jira/Confluence — high enough hit rate to justify default-on. - `has_skills` is kept as a parameter for future flexibility; default - True (the Resolved-Question section in the original plan settled on - always-on). - Step-0 (TLS trust block) sits outside this numbering — it is gated by has_ca and has its own "0)" header rendered inside the trust block helper. @@ -702,16 +707,12 @@ def _step_numbers(*, has_skills: bool = True) -> dict[str, str]: marketplace = str(n); n += 1 mcp_servers = str(n); n += 1 diagnose = str(n); n += 1 - skills = str(n) if has_skills else "" - if has_skills: - n += 1 confirm = str(n) return { "preflight": preflight, "marketplace": marketplace, "mcp_servers": mcp_servers, "diagnose": diagnose, - "skills": skills, "confirm": confirm, } @@ -729,10 +730,12 @@ def resolve_lines( `{server_url}` and `{token}` as placeholders for click-time JS substitution (or for `render_setup_instructions()` below). - When `plugin_install_names` is empty/None, the output matches the - six-step no-marketplace layout (Confirm = step 6). When non-empty, a - step-4 pre-flight + step-5 marketplace block are inserted and Confirm - becomes step 8. + The layout is the same regardless of `plugin_install_names`: install + (1), init (2), catalog (3), preflight (4), marketplace (5), + mcp_servers (6), diagnose (7), confirm (8). The marketplace block's + copy adapts to an empty served stack but the step is always emitted + so future stack changes (admin grants, system pins, Flea installs) + land cleanly without re-running setup. `ca_pem` (PEM-encoded fullchain of the Agnes server's TLS cert) gates the cross-platform step-0 trust-bootstrap block AND switches step 1 to @@ -749,14 +752,10 @@ def resolve_lines( names = list(plugin_install_names or []) has_ca = bool(ca_pem and ca_pem.strip()) - # Step layout. Preflight + marketplace go BEFORE diagnose/skills so the - # human-loop skills question is the last step before Confirm. Both are - # always emitted: the marketplace registration is useful even with zero - # plugin grants (the SessionStart hook reconciles future admin grants - # automatically without re-running setup). `_step_numbers` returns the - # renumbered step labels in one place — no branch on every helper — so - # the layout is unambiguous and trivially extendable. - steps = _step_numbers(has_skills=True) + # Step layout — single fixed shape; `_step_numbers` returns the + # renumbered step labels in one place so the layout is unambiguous + # and trivially extendable when a future step is added. + steps = _step_numbers() lines: list[str] = [] if has_ca: @@ -767,10 +766,9 @@ def resolve_lines( lines.extend(_preflight_block(steps["preflight"])) # 4 lines.extend(_marketplace_block(names, step_num=steps["marketplace"])) # 5 lines.extend(_mcp_servers_block(steps["mcp_servers"])) # 6 - # Diagnose + skills come AFTER marketplace + MCP wiring. - lines.extend(_diagnose_skills_lines( - diagnose_num=steps["diagnose"], skills_num=steps["skills"], - )) + # Diagnose runs AFTER marketplace + MCP wiring so it doubles as a + # final smoke test, not a pre-install gate. + lines.extend(_diagnose_lines(diagnose_num=steps["diagnose"])) # 7 lines.append("") lines.extend(_finale_lines( confirm_step_num=steps["confirm"], diff --git a/cli/commands/capture_session.py b/cli/commands/capture_session.py new file mode 100644 index 0000000..b9ed9cb --- /dev/null +++ b/cli/commands/capture_session.py @@ -0,0 +1,174 @@ +"""`agnes capture-session` — SessionStart hook helper. + +Reads the Claude Code hook payload from stdin (a JSON object containing +``transcript_path``), extracts the absolute path to the current session's +``.jsonl`` transcript, and appends it to ``/.claude/agnes-sessions.txt``. + +The queue file feeds ``agnes push``: rather than reverse-engineer Claude +Code's cwd-to-folder encoding (an internal implementation detail), we use +the ``transcript_path`` field of the hook stdin JSON, which is part of the +documented hook contract. + +Failure modes — silent exit code 0 in all cases, since this command runs +inside a SessionStart hook chain and a noisy failure would clutter Claude +Code's startup output: +- stdin not JSON → no-op +- JSON missing ``transcript_path`` → no-op +- ``transcript_path`` empty → no-op +- Workspace ``.claude/`` not writable → no-op (best-effort, hook continues) + +Diagnostic stderr output only when ``--verbose`` is set, for debugging +hook misconfiguration. The hook command in ``cli/lib/hooks.py`` does NOT +pass ``--verbose`` in production. + +Operability breadcrumb: every invocation (success OR silent-failure path) +appends one line to ``/.claude/agnes-capture-session.log`` +with the timestamp and outcome. ``agnes diagnose`` (and ad-hoc ops +inspection) can read the tail of this log to flag "hook is firing but +queue stays empty" — without it, an upstream Claude Code contract change +(e.g. stdin payload schema shifts) is invisible to operators because the +hook always exits 0 (David's #11 from the PR review). +""" + +from __future__ import annotations + +import json +import os +import sys +from datetime import datetime, timezone +from pathlib import Path + +import typer + +from cli.lib.private_list import is_private +from cli.lib.session_queue import append_to_queue + + +_BREADCRUMB_FILENAME = "agnes-capture-session.log" +_BREADCRUMB_MAX_BYTES = 256 * 1024 # 256 KiB rolling cap; ~3000 lines + + +def _record_breadcrumb(workspace: Path, outcome: str, detail: str = "") -> None: + """Append one TSV line to the capture-session breadcrumb log. + + Format: ``\\t\\t``. Outcomes: + ``ok`` (queued), ``private_skip`` (matched private list), + ``empty_stdin``, ``bad_json``, ``not_object``, ``no_transcript_path``, + ``stdin_read_error``, ``write_error``. + + Best-effort: a failure here MUST NOT escape — this is the hook's + silent-failure observability layer; raising would defeat the point. + Rolls the log file when it crosses 256 KiB so it doesn't grow + unboundedly on a long-lived workspace. + """ + try: + claude_dir = workspace / ".claude" + if not claude_dir.exists(): + # Don't materialize .claude/ in non-Agnes directories — same + # rationale as private_list._claude_dir_readonly. If the dir + # doesn't exist, capture-session is a no-op anyway. + return + path = claude_dir / _BREADCRUMB_FILENAME + try: + if path.stat().st_size > _BREADCRUMB_MAX_BYTES: + path.unlink() + except OSError: + pass + line = "\t".join([ + datetime.now(timezone.utc).isoformat(timespec="seconds"), + outcome, + detail.replace("\t", " ").replace("\n", " ")[:200], + ]) + "\n" + with open(path, "a", encoding="utf-8") as f: + f.write(line) + except Exception: + # Truly best-effort: any breadcrumb failure is swallowed so the + # capture-session contract stays "exit 0 always". + pass + + +capture_session_app = typer.Typer( + help="Capture the current Claude Code session's transcript path into the upload queue.", +) + + +@capture_session_app.callback(invoke_without_command=True) +def capture_session( + verbose: bool = typer.Option( + False, "--verbose", help="Log diagnostic info to stderr (off by default)." + ), +) -> None: + """Read SessionStart hook stdin JSON and append (session_id, transcript_path) to queue. + + Honors the private list: if the session_id is already marked private + (e.g. user ran `/agnes-private` before this hook chain reached + capture-session), the queue write is skipped so the session never + enters the upload pipeline. + """ + workspace = Path(os.environ.get("AGNES_LOCAL_DIR", ".")).resolve() + + try: + raw = sys.stdin.read() + except Exception as exc: + if verbose: + typer.echo(f"capture-session: stdin read failed: {exc}", err=True) + _record_breadcrumb(workspace, "stdin_read_error", str(exc)) + return + + if not raw.strip(): + if verbose: + typer.echo("capture-session: empty stdin, nothing to capture.", err=True) + _record_breadcrumb(workspace, "empty_stdin") + return + + try: + payload = json.loads(raw) + except json.JSONDecodeError as exc: + if verbose: + typer.echo(f"capture-session: stdin not valid JSON: {exc}", err=True) + _record_breadcrumb(workspace, "bad_json", str(exc)) + return + + if not isinstance(payload, dict): + if verbose: + typer.echo("capture-session: payload is not a JSON object.", err=True) + _record_breadcrumb(workspace, "not_object", type(payload).__name__) + return + + transcript_path = payload.get("transcript_path") + if not transcript_path or not isinstance(transcript_path, str): + if verbose: + typer.echo("capture-session: payload missing transcript_path.", err=True) + _record_breadcrumb(workspace, "no_transcript_path") + return + + session_id = payload.get("session_id") or "" + if not isinstance(session_id, str): + session_id = "" + + # Race protection: user may have run /agnes-private BEFORE this hook + # got a chance to write. Skip the queue append in that case — the + # private list is the authoritative source for "do not upload". + if session_id and is_private(workspace, session_id): + if verbose: + typer.echo( + f"capture-session: session {session_id} is private; skipping queue.", + err=True, + ) + _record_breadcrumb(workspace, "private_skip", session_id) + return + + try: + append_to_queue(workspace, session_id, transcript_path) + except OSError as exc: + if verbose: + typer.echo( + f"capture-session: append to queue failed ({workspace}): {exc}", + err=True, + ) + _record_breadcrumb(workspace, "write_error", str(exc)) + return + + _record_breadcrumb(workspace, "ok", session_id or "(no-sid)") + if verbose: + typer.echo(f"capture-session: queued {transcript_path}", err=True) diff --git a/cli/commands/mark_private.py b/cli/commands/mark_private.py new file mode 100644 index 0000000..4b8990f --- /dev/null +++ b/cli/commands/mark_private.py @@ -0,0 +1,55 @@ +"""`agnes mark-private` — mark the current Claude Code session as private. + +Invoked by the `/agnes-private` slash command (deterministic ``!``-prefix +direct bash, no AI in the loop). Reads ``CLAUDE_CODE_SESSION_ID`` from the +environment — Claude Code sets this variable in every Bash/PowerShell +subprocess it spawns (documented stable API). + +Adds the session_id to ``/.claude/agnes-sessions-private.txt``. +That file is the authoritative source for "do not upload" — both +``capture-session`` and ``push`` consult it. Adding the ID here is enough +to keep the session out of the upload pipeline regardless of whether +``capture-session`` already ran (race-safe by design — see +``cli/lib/private_list.py`` docstring). + +Refuses to run outside a Claude Code session (no ``CLAUDE_CODE_SESSION_ID``) +to make accidental CLI invocations from a regular terminal obvious. +""" + +from __future__ import annotations + +import os +from pathlib import Path + +import typer + +from cli.lib.private_list import add_private + + +mark_private_app = typer.Typer( + help="Mark the current Claude Code session as private — exclude it from `agnes push`.", +) + + +@mark_private_app.callback(invoke_without_command=True) +def mark_private() -> None: + """Add CLAUDE_CODE_SESSION_ID to the workspace private list.""" + session_id = os.environ.get("CLAUDE_CODE_SESSION_ID", "").strip() + if not session_id: + typer.echo( + "Error: CLAUDE_CODE_SESSION_ID is not set. " + "Run this inside a Claude Code session (via /agnes-private).", + err=True, + ) + raise typer.Exit(1) + + workspace = Path(os.environ.get("AGNES_LOCAL_DIR", ".")).resolve() + newly_added = add_private(workspace, session_id) + + if newly_added: + typer.echo( + f"Session {session_id} marked as private. " + "Its transcript will not be uploaded by `agnes push`." + ) + else: + typer.echo(f"Session {session_id} is already marked as private. No change.") diff --git a/cli/commands/push.py b/cli/commands/push.py index 525fc9e..7fbb242 100644 --- a/cli/commands/push.py +++ b/cli/commands/push.py @@ -1,28 +1,39 @@ -"""`agnes push` - upload session jsonl + CLAUDE.local.md to the server. +"""`agnes push` — upload session jsonls + CLAUDE.local.md to the server. -Thin Typer wrapper extracted from the legacy `agnes sync --upload-only` -path in `cli/commands/sync.py`. Used by: -- Manual invocation: analyst types `agnes push` to force an upload. -- SessionEnd hook: `agnes push --quiet 2>/dev/null || true` runs at the - end of every Claude Code session in this workspace. +The push command consumes a workspace-local queue file +(``/.claude/agnes-sessions.txt``) populated by the +``agnes capture-session`` SessionStart hook. Each line is a TSV pair: +``\\t``. The session_id lets push consult +the private list (``cli/lib/private_list.py``) and skip uploads that +the user explicitly marked via ``/agnes-private``. -Lazy on-disk contract: when there are no `user/sessions/*.jsonl` files -and no `.claude/CLAUDE.local.md`, this command must NOT create -`user/sessions/` (or any other directory). Tests pin the lazy mkdir -contract so the empty-workspace case stays a true no-op on disk. +Concurrency: a single-instance lock (``filelock`` via ``cli/lib/push_lock.py``) +ensures only one push runs at a time. When the user closes several Claude +Code sessions simultaneously, every SessionEnd hook fires its own +``agnes push``; exactly one runs, the rest exit silently. -Errors render via `cli/error_render.py:render_error()` for typed-error -shape consistency with `agnes pull`. +Race protection: the queue file is atomically renamed to a snapshot before +processing. SessionStart hooks that fire during the push window write to +a freshly-created queue, so their entries aren't lost. -Task 18 will register `push_app` on the root Typer app and delete the -legacy `agnes sync --upload-only` flag. Until then this module is -callable only via direct import (which is exactly what the test does). +Recovery: if a previous push crashed mid-run, its snapshot file persists. +The next push picks it up before processing the current queue. + +Private filter: even if a marked-private session_id slipped into the +queue before ``/agnes-private`` was run, push re-checks the private list +per entry. Skipped entries are audit-logged to +``/.claude/agnes-sessions-private-skipped.txt``. + +Legacy fallback: the encoding-based ``list_session_files`` path remains +available behind ``--legacy-scan`` for one-off backfills of sessions that +predate the queue mechanism. """ from __future__ import annotations import json import os +from datetime import datetime, timezone from pathlib import Path import typer @@ -30,23 +41,131 @@ import typer from cli.client import api_post from cli.config import get_server_url, get_token from cli.error_render import render_error +from cli.lib.private_list import read_all_private +from cli.lib.push_lock import acquire_or_skip +from cli.lib.session_queue import ( + discard_snapshot, + failed_log_path, + find_recovery_snapshots, + mark_failed_permanent, + mark_private_skipped, + mark_uploaded, + queue_path, + read_entries_from_snapshot, + requeue_failed, + snapshot_queue, + uploaded_log_path, +) push_app = typer.Typer(help="Upload sessions and CLAUDE.local.md to the server") +def _collect_snapshots(workspace: Path) -> list[Path]: + """Recovery snapshots first (oldest first), then a fresh snapshot of the + current queue. Either may be absent. Returns the list of snapshot paths + to process in order. + """ + snapshots = find_recovery_snapshots(workspace) + fresh = snapshot_queue(workspace) + if fresh is not None: + snapshots.append(fresh) + return snapshots + + +def _gather_entries_for_dry_run(workspace: Path) -> list[tuple[str, Path]]: + """Read (session_id, path) entries that *would* be uploaded without + consuming the queue. Combines existing recovery snapshots + the + current live queue. Does NOT rename the live queue (dry-run is + read-only). + """ + out: list[tuple[str, Path]] = [] + seen: set[tuple[str, str]] = set() + + def _add(entries: list[tuple[str, Path]]) -> None: + for sid, p in entries: + key = (sid, str(p)) + if key in seen: + continue + seen.add(key) + out.append((sid, p)) + + for snap in find_recovery_snapshots(workspace): + _add(read_entries_from_snapshot(snap)) + + live = queue_path(workspace) + if live.exists(): + _add(read_entries_from_snapshot(live)) + + return out + + +def _is_permanent_failure(info: dict) -> bool: + """True iff the server's response indicates a deterministic failure + that retrying won't help. We treat 4xx (except 408 / 429) as + permanent — 401 (token expired), 403 (RBAC denial), 413 (payload + too large), 400 (validation error) all have the same property: + re-uploading the same file produces the same answer, so a + requeue-loop only wastes bytes and grows the queue forever. 5xx + and network exceptions stay transient — those reflect server or + transport state that can change between push runs. + + 408 Request Timeout and 429 Too Many Requests are flagged transient + by the HTTP spec (RFC 7231 / RFC 6585); the server is telling us + to back off and try again later, not that the request is invalid. + """ + status = info.get("status") + if not isinstance(status, int): + return False # network error / exception — transient + if status in (408, 429): + return False + return 400 <= status < 500 + + +def _upload_one(transcript: Path) -> tuple[bool, dict]: + """Upload a single session jsonl. Returns (success, error_or_meta).""" + if not transcript.exists(): + return False, {"file": transcript.name, "error": "file not found on disk"} + try: + with open(transcript, "rb") as fh: + resp = api_post("/api/upload/sessions", files={"file": (transcript.name, fh)}) + except Exception as exc: + return False, {"file": transcript.name, "error": str(exc)} + if resp.status_code == 200: + return True, {"file": transcript.name} + return False, {"file": transcript.name, "status": resp.status_code} + + @push_app.callback(invoke_without_command=True) def push( - quiet: bool = typer.Option(False, "--quiet", help="Suppress success stdout (errors still surface on stderr)"), - as_json: bool = typer.Option(False, "--json", help="Emit a single JSON object summarizing the upload"), - dry_run: bool = typer.Option(False, "--dry-run", help="List what would be uploaded without sending anything"), + quiet: bool = typer.Option( + False, + "--quiet", + help="Suppress success stdout (errors still surface on stderr).", + ), + as_json: bool = typer.Option( + False, "--json", help="Emit a single JSON object summarizing the upload." + ), + dry_run: bool = typer.Option( + False, + "--dry-run", + help="List what would be uploaded without sending anything.", + ), + legacy_scan: bool = typer.Option( + False, + "--legacy-scan", + help=( + "Fallback: also include sessions found by the encoding-based scan " + "of ~/.claude/projects/. Use for one-off backfill of sessions " + "predating the queue mechanism. The /agnes-private list IS " + "consulted — Claude Code names jsonls ``.jsonl`` so " + "the file stem provides the session id even for legacy entries." + ), + ), ): - """Upload session jsonl + CLAUDE.local.md from ./user/sessions and ./.claude.""" + """Upload queued session jsonls + CLAUDE.local.md from this workspace.""" server_url = get_server_url() if not server_url: - # `get_server_url()` falls back to a localhost default today, so this - # branch is mostly a defensive guard for a future config change that - # might return an empty string. typer.echo( render_error(0, {"detail": { "kind": "server_unreachable", @@ -69,27 +188,49 @@ def push( workspace = Path(os.environ.get("AGNES_LOCAL_DIR", ".")).resolve() local_md = workspace / ".claude" / "CLAUDE.local.md" - - # Claude Code writes session jsonls to ~/.claude/projects// - # — the encoding varies by Claude Code version (older: `/` -> `-`, - # newer: all non-alphanumeric -> `-`). The helper tries both encodings - # and also falls back to the legacy /user/sessions/ for - # setups that mirror sessions there explicitly. See - # cli/lib/claude_sessions.py for details. - from cli.lib.claude_sessions import list_session_files - session_files = list_session_files(workspace) has_local_md = local_md.exists() + # ---- DRY RUN ---------------------------------------------------------- if dry_run: + candidates = _gather_entries_for_dry_run(workspace) + private_ids = read_all_private(workspace) + non_private = [(sid, p) for sid, p in candidates if not (sid and sid in private_ids)] + private_skipped = [(sid, p) for sid, p in candidates if sid and sid in private_ids] + + if legacy_scan: + from cli.lib.claude_sessions import list_session_files + seen = {str(p) for _sid, p in non_private} + for p in list_session_files(workspace): + if str(p) in seen: + continue + # Apply the private filter to legacy-scan candidates too. + # Claude Code names jsonls ``.jsonl``, so the + # file stem IS the session id and we can apply the same + # filter the queue path uses. Closes the gap David #8 + # raised: legacy-scan would otherwise upload everything + # on disk, including sessions the user later marked + # private. + sid_from_path = p.stem + if sid_from_path and sid_from_path in private_ids: + private_skipped.append((sid_from_path, p)) + else: + non_private.append((sid_from_path, p)) + seen.add(str(p)) + plan = { "dry_run": True, "would_upload": { - "sessions": [str(f) for f in session_files], + "sessions": [str(p) for _sid, p in non_private], "local_md": str(local_md) if has_local_md else None, }, + "would_skip_private": [ + {"session_id": sid, "path": str(p)} for sid, p in private_skipped + ], "summary": { - "sessions_count": len(session_files), + "sessions_count": len(non_private), + "private_skipped_count": len(private_skipped), "local_md_present": has_local_md, + "uploaded_log": str(uploaded_log_path(workspace)), }, } if as_json: @@ -97,60 +238,154 @@ def push( return if quiet: return - typer.echo(f"Dry run - would upload {len(session_files)} session file(s)") - for f in session_files: - typer.echo(f" {f}") + typer.echo(f"Dry run - would upload {len(non_private)} session file(s)") + for _sid, p in non_private: + typer.echo(f" {p}") + if private_skipped: + typer.echo(f"Would skip {len(private_skipped)} private session(s):") + for sid, p in private_skipped: + typer.echo(f" [{sid}] {p}") if has_local_md: typer.echo(f"Would upload CLAUDE.local.md ({local_md})") else: typer.echo("No CLAUDE.local.md to upload") return - results = {"sessions": 0, "local_md": False, "errors": []} + # ---- REAL RUN --------------------------------------------------------- + # Acquire single-instance lock. Silent exit if another push is already + # running — typical when several SessionEnd hooks fire at once. + with acquire_or_skip(workspace) as lock: + if lock is None: + return # another push has the lock; this one no-ops - # Upload sessions. Per-file failures are recorded into `errors` and the - # loop continues - one corrupt jsonl mustn't block the rest, and a - # transient 5xx on one file shouldn't poison the whole upload. - for f in session_files: - try: - with open(f, "rb") as fh: - resp = api_post("/api/upload/sessions", files={"file": (f.name, fh)}) - if resp.status_code == 200: - results["sessions"] += 1 - else: - results["errors"].append( - {"file": f.name, "status": resp.status_code} - ) - except Exception as exc: - results["errors"].append({"file": f.name, "error": str(exc)}) + results = { + "sessions": 0, + "local_md": False, + "errors": [], + "skipped": 0, + "private_skipped": 0, + "dropped_permanent": 0, + } - # Upload CLAUDE.local.md - if has_local_md: - try: - content = local_md.read_text(encoding="utf-8") - resp = api_post("/api/upload/local-md", json={"content": content}) - if resp.status_code == 200: - results["local_md"] = True - else: - results["errors"].append( - {"file": "CLAUDE.local.md", "status": resp.status_code} - ) - except Exception as exc: - results["errors"].append({"file": "CLAUDE.local.md", "error": str(exc)}) + # Snapshot the private list once at the start of the run. Adding + # a new private ID between snapshot and the per-entry check is + # benign (worst case: one more upload of a session the user just + # marked, which next push will skip). + private_ids = read_all_private(workspace) + # Process snapshots: recovery (from prior crash) first, then fresh. + snapshots = _collect_snapshots(workspace) + all_failed_entries: list[tuple[str, Path]] = [] + + for snapshot in snapshots: + entries = read_entries_from_snapshot(snapshot) + failed_in_snapshot: list[tuple[str, Path]] = [] + now = datetime.now(timezone.utc) + for session_id, transcript in entries: + if session_id and session_id in private_ids: + # Skip private; audit-log and move on. Do not requeue — + # this is the user's explicit "do not upload" intent. + mark_private_skipped(workspace, session_id, transcript, now) + results["private_skipped"] += 1 + continue + ok, info = _upload_one(transcript) + if ok: + results["sessions"] += 1 + mark_uploaded(workspace, transcript, now) + else: + if info.get("error") == "file not found on disk": + # Stale queue entry (Claude Code auto-cleanup deleted + # the jsonl). Skip without re-queuing — retry would + # loop forever. + results["skipped"] += 1 + results["errors"].append(info) + elif _is_permanent_failure(info): + # 4xx (except 408 / 429): server says this request + # will never succeed. Drop + audit-log instead of + # requeueing forever. Closes the prior loop bug + # where 401 (token expired), 413 (file too large), + # 400 (validation), etc. cycled through every + # push run, growing the queue without bound. + mark_failed_permanent( + workspace, session_id, transcript, + info["status"], now, + ) + results["dropped_permanent"] += 1 + results["errors"].append(info) + else: + # 5xx, 408, 429, network errors — genuinely + # transient. Requeue for the next push. + results["errors"].append(info) + failed_in_snapshot.append((session_id, transcript)) + # Failed entries from this snapshot get re-queued on the live file. + all_failed_entries.extend(failed_in_snapshot) + discard_snapshot(snapshot) + + if all_failed_entries: + requeue_failed(workspace, all_failed_entries) + + # Optional: legacy scan to backfill sessions outside the queue. + # Honors the private list — Claude Code names jsonls + # ``.jsonl``, so the file stem IS the session id and + # we can apply the same filter the queue path uses. Without this + # filter, an operator running ``--legacy-scan`` to backfill old + # sessions would silently upload every transcript on disk, + # including ones the user later marked private (David's #8 from + # the PR review). + if legacy_scan: + from cli.lib.claude_sessions import list_session_files + private_ids = read_all_private(workspace) + for transcript in list_session_files(workspace): + sid_from_path = transcript.stem + if sid_from_path and sid_from_path in private_ids: + mark_private_skipped(workspace, sid_from_path, str(transcript)) + results["private_skipped"] = results.get("private_skipped", 0) + 1 + continue + ok, info = _upload_one(transcript) + if ok: + results["sessions"] += 1 + mark_uploaded(workspace, transcript, datetime.now(timezone.utc)) + else: + results["errors"].append(info) + + # Upload CLAUDE.local.md. + if has_local_md: + try: + content = local_md.read_text(encoding="utf-8") + resp = api_post("/api/upload/local-md", json={"content": content}) + if resp.status_code == 200: + results["local_md"] = True + else: + results["errors"].append( + {"file": "CLAUDE.local.md", "status": resp.status_code} + ) + except Exception as exc: + results["errors"].append({"file": "CLAUDE.local.md", "error": str(exc)}) + + # Render output. if as_json: typer.echo(json.dumps(results)) return if quiet: - # Quiet mode is for the SessionEnd hook - silent on success so - # Claude Code's stdout stays clean. Errors still flow to stderr. if results["errors"]: for e in results["errors"]: typer.echo(f"warn: {e}", err=True) return typer.echo(f"Uploaded {results['sessions']} sessions") + if results["private_skipped"]: + typer.echo( + f"Skipped {results['private_skipped']} private session(s) " + f"(see .claude/agnes-sessions-private-skipped.txt)" + ) + if results["skipped"]: + typer.echo(f"Skipped {results['skipped']} stale queue entries (file missing)") + if results["dropped_permanent"]: + typer.echo( + f"Dropped {results['dropped_permanent']} session(s) with permanent failure " + f"(see .claude/agnes-sessions-failed.txt)" + ) if results["local_md"]: typer.echo("Uploaded CLAUDE.local.md") if results["errors"]: diff --git a/cli/commands/self_upgrade.py b/cli/commands/self_upgrade.py index 51b1144..2dbed9d 100644 --- a/cli/commands/self_upgrade.py +++ b/cli/commands/self_upgrade.py @@ -15,6 +15,7 @@ from typing import Optional, Union import typer from cli.config import _config_dir, get_server_url +from cli.lib.hooks import maybe_refresh_claude_hooks from cli.update_check import UpdateInfo, check, format_outdated_notice self_upgrade_app = typer.Typer( @@ -169,6 +170,28 @@ def _smoke_test_new_binary(install_method: str, expected_version: str) -> tuple[ return False, f"{type(e).__name__}: {e}" +def _try_refresh_hooks(*, quiet: bool) -> None: + """Best-effort idempotent refresh of the workspace's Claude Code hooks. + + Resolves the workspace via ``AGNES_LOCAL_DIR`` (set by Claude Code's + hook subprocess to the workspace root) or the current working directory. + Delegates the actual decision to :func:`maybe_refresh_claude_hooks`, + which guards against writing into non-Agnes directories. + + Swallows any exception — a partially-broken settings.json or a + permissions issue must not flip the exit code of a successful + upgrade. When ``quiet`` is False the failure is surfaced to stderr + so an operator running ``agnes self-upgrade`` interactively still sees + it; under ``--quiet`` (the SessionStart case) it stays silent. + """ + workspace = Path(os.environ.get("AGNES_LOCAL_DIR", ".")).resolve() + try: + maybe_refresh_claude_hooks(workspace) + except Exception as exc: # pragma: no cover — defensive + if not quiet: + sys.stderr.write(f"agnes self-upgrade: hook refresh failed: {exc}\n") + + def _resolve_info(force: bool) -> Union[UpdateInfo, _Unreachable, None]: """Returns: UpdateInfo — install this wheel @@ -282,9 +305,24 @@ def self_upgrade( raise typer.Exit(1) if info is None: - raise typer.Exit(0) # nothing to do, silent + # CLI already current — still attempt hook refresh in case the + # workspace was initialized on an older CLI whose hook layout + # has since changed (e.g. v0.48 → v0.49 introduced the + # capture-session SessionStart entry). The refresh is a no-op + # for directories that don't look like Agnes workspaces, so + # an `agnes self-upgrade` invoked from ~/ won't write there. + _try_refresh_hooks(quiet=quiet) + raise typer.Exit(0) rc = _do_install_with_smoke_and_rollback(info, quiet=quiet) + if rc == 0: + # After a successful install of the new wheel, refresh the + # workspace hooks so any wire-format change in the new release + # lands on the next session-start without re-running + # `agnes init`. Failure here must not turn a successful + # upgrade into a non-zero exit — the rollback path has already + # finished. Errors are surfaced on stderr only. + _try_refresh_hooks(quiet=quiet) raise typer.Exit(rc) finally: if prior_sentinel is None: diff --git a/cli/commands/statusline.py b/cli/commands/statusline.py new file mode 100644 index 0000000..64d0ba3 --- /dev/null +++ b/cli/commands/statusline.py @@ -0,0 +1,62 @@ +"""`agnes statusline` — Claude Code statusLine helper. + +Claude Code's ``statusLine`` setting in ``settings.json`` invokes a shell +command on every status-bar render and displays the first line of its +stdout. The session payload (a JSON object containing ``session_id``, +``transcript_path``, ``workspace``, ``model``, etc.) arrives on stdin. + +This command: +1. Reads the JSON payload from stdin. +2. Extracts ``session_id``. +3. Checks the workspace private list. +4. Prints ``🔒 agnes-private`` if the session is private; otherwise prints + nothing (empty status bar segment — lets other tools paint the rest). + +Performance: this is invoked on EVERY status-bar refresh, which can be +once per second or more. The implementation is intentionally minimal — +stdlib-only, single file read, no API calls, no heavy imports. + +All failure modes (malformed JSON, missing session_id, unreadable list) +end with exit 0 + empty stdout. Polluting Claude Code's status bar with +errors would be worse than a missing indicator. +""" + +from __future__ import annotations + +import json +import os +import sys +from pathlib import Path + +import typer + +from cli.lib.private_list import is_private + + +statusline_app = typer.Typer( + help="Status-line helper for Claude Code — prints '🔒 agnes-private' when the current session is private.", +) + + +@statusline_app.callback(invoke_without_command=True) +def statusline() -> None: + """Read stdin session JSON; emit private indicator if the session is private.""" + try: + raw = sys.stdin.read() + payload = json.loads(raw) if raw.strip() else {} + except (json.JSONDecodeError, OSError): + return # silent — never poison the status bar + + if not isinstance(payload, dict): + return + + session_id = payload.get("session_id") + if not isinstance(session_id, str) or not session_id: + return + + workspace = Path(os.environ.get("AGNES_LOCAL_DIR", ".")).resolve() + try: + if is_private(workspace, session_id): + typer.echo("🔒 agnes-private") + except OSError: + return # filesystem hiccup — empty output is better than a crash diff --git a/cli/lib/commands.py b/cli/lib/commands.py index 16bceb9..a8d4d3d 100644 --- a/cli/lib/commands.py +++ b/cli/lib/commands.py @@ -33,20 +33,30 @@ from pathlib import Path # `/` slug exposed to Claude Code). _MANAGED_COMMANDS: tuple[tuple[str, str], ...] = ( ("update-agnes-plugins.md", "update-agnes-plugins.md"), + ("agnes-private.md", "agnes-private.md"), ) -# Defensive fallback used when the bundled template is missing on disk -# (broken install, stripped-down test environment). Mirrors the -# `agnes_workspace_template.txt` fallback in `cli/commands/init.py` — -# better to write a usable stub than to crash `agnes init`. -_FALLBACK_BODY = ( - "---\n" - "description: Update Agnes marketplace plugins to latest versions\n" - "---\n" - "\n" - "Run `agnes refresh-marketplace` and report the output.\n" -) +# Defensive fallbacks used when the bundled template is missing on disk +# (broken install, stripped-down test environment). Keyed by source +# template filename so a missing `agnes-private.md` doesn't get +# clobbered with `update-agnes-plugins` content. +_FALLBACK_BODIES: dict[str, str] = { + "update-agnes-plugins.md": ( + "---\n" + "description: Update Agnes marketplace plugins to latest versions\n" + "---\n" + "\n" + "Run `agnes refresh-marketplace` and report the output.\n" + ), + "agnes-private.md": ( + "---\n" + "description: Mark the current Claude Code session as private\n" + "---\n" + "\n" + "!`agnes mark-private`\n" + ), +} def _templates_dir() -> Path: @@ -78,5 +88,5 @@ def install_claude_commands(workspace: Path) -> None: f"{source_path} missing; writing defensive fallback.", file=sys.stderr, ) - body = _FALLBACK_BODY + body = _FALLBACK_BODIES.get(source_name, "") (commands_dir / dest_name).write_text(body, encoding="utf-8") diff --git a/cli/lib/hooks.py b/cli/lib/hooks.py index cc7351c..f03deef 100644 --- a/cli/lib/hooks.py +++ b/cli/lib/hooks.py @@ -14,11 +14,17 @@ Design notes: - Uses `|| true` in the hook command so the hook never blocks a session on a transient sync error. - SessionStart gets three entries: - 1. Chained `agnes self-upgrade; agnes pull` — self-upgrade runs first + 1. `agnes capture-session` — reads the SessionStart stdin JSON payload + (`transcript_path`) and appends the absolute path to the queue file + `/.claude/agnes-sessions.txt`. This feeds `agnes push` + without reverse-engineering Claude Code's cwd-to-folder encoding. + Runs first so the path is captured before any subsequent hook can + fail and prevent later hooks from firing. + 2. Chained `agnes self-upgrade; agnes pull` — self-upgrade runs first so any wire-protocol bump lands before pull tries to use the new CLI version. Both `|| true`-guarded so an upgrade failure doesn't block the pull. - 2. `agnes refresh-marketplace --check` — independent entry. Detector- + 3. `agnes refresh-marketplace --check` — independent entry. Detector- only (since the slash-command split): runs `git fetch` against the marketplace clone and emits a Claude Code hook JSON message hinting the user at `/update-agnes-plugins` when remote content @@ -26,11 +32,13 @@ Design notes: command does that interactively, with full output visible in the Claude Code transcript and under user control. Failure (no clone, no token) silently no-ops via the surrounding `|| true`. - 3. `agnes push` — uploads any session JSONLs that haven't reached the - server yet (orphans from `claude -p` headless mode where Claude Code - does NOT fire SessionEnd, or from abnormal session exits). Symmetric - with `agnes pull` so the workspace heals on the next interactive - session start. + + The previous SessionStart `agnes push` self-heal entry was removed once + the capture-queue mechanism made it redundant: orphan session JSONLs from + headless / crashed sessions stay in the queue and get uploaded by the + next SessionEnd push (queue file persists across runs). Workspaces with + the old entry are migrated cleanly — `_replace_or_add` strips any + matching `agnes push` from SessionStart on the next `agnes init`. - SessionEnd gets one entry: `agnes push --quiet`, wrapped to detach into the background. Claude Code in `-p` (headless) mode terminates SessionEnd @@ -61,6 +69,9 @@ _OUR_COMMAND_MARKERS = ( "agnes pull", "agnes push", "agnes refresh-marketplace", + "agnes capture-session", + "agnes mark-private", + "agnes statusline", "da sync", ) @@ -108,13 +119,15 @@ def install_claude_hooks(workspace: Path) -> None: for cmd in commands: existing.append({"hooks": [{"type": "command", "command": cmd}]}) - # `refresh-marketplace` is wrapped in `bash -c` because Claude Code on - # Windows runs hook commands directly (no shell), so the `2>/dev/null - # || true` redirection + short-circuit syntax never gets interpreted. - # The self-upgrade+pull chained entry pre-dates the Windows fix and - # isn't churned for parity (the same redirection fluff applies but - # changing the existing wire would force every workspace to re-write - # its settings.json on the next `agnes init` for no behaviour gain). + # All entries are wrapped in `bash -c "..."` for Windows compatibility: + # Claude Code on Windows runs hook commands directly (no shell), so + # the `; ` chain operator, `2>/dev/null` redirection, and `|| true` + # short-circuit never get interpreted unless we explicitly invoke + # bash. (Git Bash on PATH or WSL satisfies this.) The self-upgrade + + # pull chain previously shipped unwrapped — that pre-dates the + # Windows fix; ship it wrapped now so every entry uses the same + # contract. Workspaces still on the older unwrapped form auto-upgrade + # via `maybe_refresh_claude_hooks` on the next `agnes self-upgrade`. # # `--check` makes the marketplace entry a detector only: the actual # plugin install/update happens in the `/update-agnes-plugins` slash @@ -122,11 +135,23 @@ def install_claude_hooks(workspace: Path) -> None: # Workspaces still on the older `--quiet` form auto-upgrade here # because `_OUR_COMMAND_MARKERS` matches by substring on the # `agnes refresh-marketplace` prefix. + # `agnes capture-session` reads the SessionStart hook stdin (a JSON + # payload with `transcript_path`) and appends the path to + # `.claude/agnes-sessions.txt`. That queue file feeds `agnes push`, + # avoiding any reverse-engineering of Claude Code's cwd-to-folder + # encoding. + # + # The previous SessionStart `agnes push` self-heal entry was dropped + # once the queue mechanism made it redundant — orphans from headless + # / crashed sessions remain in the queue and ship out with the next + # SessionEnd push. The marker substring "agnes push" stays in + # _OUR_COMMAND_MARKERS so the old entry is cleanly removed from any + # pre-existing settings.json on the next init. _replace_or_add("SessionStart", [ - "agnes self-upgrade --quiet 2>/dev/null || true; " - "agnes pull --quiet 2>/dev/null || true", + 'bash -c "agnes capture-session 2>/dev/null || true"', + 'bash -c "agnes self-upgrade --quiet 2>/dev/null || true; ' + 'agnes pull --quiet 2>/dev/null || true"', 'bash -c "agnes refresh-marketplace --check 2>/dev/null || true"', - 'bash -c "agnes push --quiet 2>/dev/null || true"', ]) # SessionEnd push must run detached. Claude Code in `-p` (headless) mode # SIGTERMs hook subprocesses after ~1 second regardless of work in @@ -142,4 +167,107 @@ def install_claude_hooks(workspace: Path) -> None: 'bash -c "( nohup agnes push --quiet /dev/null 2>&1 & ) ; true"', ]) + _install_statusline(cfg) + settings_path.write_text(json.dumps(cfg, indent=2) + "\n", encoding="utf-8") + + +# Claude Code's `statusLine` setting tells the editor to invoke a command +# on every status-bar refresh and display the first line of stdout. We +# wire it to `agnes statusline`, which surfaces the `🔒 agnes-private` +# indicator when the current session is marked private. +# +# Politeness: if the user (or another tool) has already set a `statusLine` +# in the workspace `settings.json`, we leave it untouched and emit a +# one-line stderr warning. Customizing the status bar is a personal +# preference and Agnes shouldn't clobber it. Operators who want both +# their custom output and the private indicator can compose `agnes +# statusline` into their own status-line command manually. +_STATUSLINE_MARKER = "agnes statusline" + + +def _install_statusline(cfg: dict) -> None: + existing = cfg.get("statusLine") + # Distinguish "key absent" / "key=null" / "key=empty string" from any + # real value. A `None` or `""` value is legal JSON but conveys "no + # statusLine wanted" rather than "default" — overwriting it would + # silently undo the user's explicit opt-out. + if existing is None or existing == "": + cfg["statusLine"] = {"type": "command", "command": "agnes statusline"} + return + if isinstance(existing, dict) and _STATUSLINE_MARKER in str(existing.get("command", "")): + return # already ours — idempotent re-init + print( + "Warning: existing statusLine in .claude/settings.json preserved. " + "To show the agnes-private indicator alongside your custom status, " + "add `agnes statusline` to your command.", + file=sys.stderr, + ) + + +def workspace_has_agnes_hooks(workspace: Path) -> bool: + """True iff ``workspace/.claude/settings.json`` already shows signs of a + prior ``agnes init``: at least one Agnes-managed hook entry, or our + statusLine command. + + Used as a guard by :func:`maybe_refresh_claude_hooks` so that + ``agnes self-upgrade`` (which fires from a SessionStart hook in every + Agnes workspace) does not accidentally install hooks into a directory + that is not an Agnes workspace — e.g. the user's home dir, if they + invoke ``agnes self-upgrade`` manually from there. + + Returns False on missing / malformed settings.json — the caller treats + that as "not an Agnes workspace", so the refresh skips. + """ + settings_path = workspace / ".claude" / "settings.json" + if not settings_path.exists(): + return False + try: + cfg = json.loads(settings_path.read_text(encoding="utf-8")) + except (json.JSONDecodeError, OSError): + return False + if not isinstance(cfg, dict): + return False + sl = cfg.get("statusLine") + if isinstance(sl, dict) and _STATUSLINE_MARKER in str(sl.get("command", "")): + return True + hooks = cfg.get("hooks", {}) + if not isinstance(hooks, dict): + return False + for entries in hooks.values(): + if not isinstance(entries, list): + continue + for entry in entries: + if not isinstance(entry, dict): + continue + for h in entry.get("hooks", []) or []: + cmd = h.get("command", "") if isinstance(h, dict) else "" + if any(marker in cmd for marker in _OUR_COMMAND_MARKERS): + return True + return False + + +def maybe_refresh_claude_hooks(workspace: Path) -> bool: + """Idempotently re-install Agnes hooks if ``workspace`` already looks like + an Agnes workspace, otherwise no-op. + + Called from ``agnes self-upgrade`` so that operators on workspaces + initialized with an older CLI version pick up new hook layout (e.g. + the v0.49 SessionStart ``agnes capture-session`` entry) on the next + session-start, without needing to re-run ``agnes init`` manually. + Without this, an existing v0.48 workspace would auto-upgrade the CLI + via its own SessionStart self-upgrade entry, but the new + capture-session hook would never get installed — the queue would stay + empty and ``agnes push`` would silently stop uploading sessions. + + The guard (``workspace_has_agnes_hooks``) makes this safe to call from + any working directory: ``agnes self-upgrade`` invoked from ``~/`` will + not create ``~/.claude/`` or write hooks there. + + Returns True if hooks were refreshed; False if the workspace looked + non-Agnes and we skipped. + """ + if not workspace_has_agnes_hooks(workspace): + return False + install_claude_hooks(workspace) + return True diff --git a/cli/lib/private_list.py b/cli/lib/private_list.py new file mode 100644 index 0000000..0106167 --- /dev/null +++ b/cli/lib/private_list.py @@ -0,0 +1,160 @@ +"""Private session list — authoritative state for "do not upload". + +A session is "private" iff its session_id appears (as a line) in +``/.claude/agnes-sessions-private.txt``. The list is the single +source of truth — both ``agnes capture-session`` (queue writer) and +``agnes push`` (queue reader) consult it and skip private session IDs. + +By making the list authoritative rather than treating queue removal as +authoritative, the slash-command / SessionStart-hook race condition is +impossible by construction: + +- If ``mark-private`` writes the ID before ``capture-session`` runs, + ``capture-session`` sees it on the list and skips the queue write. +- If ``capture-session`` runs first, ``mark-private`` adds the ID to the + list. ``push`` re-checks the list before each upload — the list itself + is the source of truth, queue removal is incidental. + +File format: one session_id per line, UTF-8, LF terminators. The file +is never rewritten — only appended to (with dedup on add). The growth +rate is one line per privately-marked session, so manual cleanup (if +ever needed) is fine. +""" + +from __future__ import annotations + +import os +from pathlib import Path +from threading import Lock + + +_PRIVATE_LIST_FILENAME = "agnes-sessions-private.txt" + + +def _claude_dir_writable(workspace: Path) -> Path: + """Return the workspace ``.claude/`` dir, creating it if needed. + + Use ONLY from write paths (``add_private``). Read paths must use + ``_claude_dir_readonly`` so ``agnes statusline`` doesn't materialize + ``.claude/`` directories in arbitrary working dirs (it gets called on + every prompt redraw, and the user may be in `~/` or another non-Agnes + directory at the time). + """ + d = workspace / ".claude" + d.mkdir(parents=True, exist_ok=True) + return d + + +def _claude_dir_readonly(workspace: Path) -> Path: + """Return the workspace ``.claude/`` path WITHOUT mkdir. + + Read paths can compose this with ``Path.exists()`` to short-circuit + cleanly when the directory doesn't exist — e.g. the user opened + Claude Code in a non-Agnes directory; the statusline still calls + ``is_private`` once per redraw and creating a ``.claude/`` shadow + in every directory the user opens is hostile. + """ + return workspace / ".claude" + + +def private_list_path(workspace: Path, *, writable: bool = False) -> Path: + """Path to the private-session list. + + ``writable=True`` ensures the parent ``.claude/`` exists (use from + ``add_private``); the read default keeps the function side-effect- + free so ``statusline`` and other hot-path readers don't churn the + filesystem. + """ + parent = _claude_dir_writable(workspace) if writable else _claude_dir_readonly(workspace) + return parent / _PRIVATE_LIST_FILENAME + + +# --------------------------------------------------------------------------- +# mtime-keyed read cache. ``agnes statusline`` is documented to run once per +# Claude Code prompt redraw (sub-second on an active session). The private +# list is append-only, so mtime is sufficient cache-keying: if the file's +# mtime hasn't moved, the cached set is still accurate. Cache keyed per +# absolute file path so multiple workspaces in the same process don't +# share state. +# +# The cache is process-local. statusline is invoked as a fresh subprocess +# per redraw (Claude Code's ``statusLine`` setting spawns the configured +# command), so the cache doesn't help the per-redraw path on its own — +# subprocess spawn cost dominates. What it DOES help is in-process callers +# (push doing one mtime stat per upload candidate, ``agnes diagnose`` +# scanning workspaces) and prevents the previous behaviour of calling +# ``mkdir`` on every read (S2.7 from the PR review), which polluted +# arbitrary directories with stray ``.claude/`` shadows. +# --------------------------------------------------------------------------- + +_CACHE: dict[str, tuple[float, set[str]]] = {} +_CACHE_LOCK = Lock() + + +def _read_cached(path: Path) -> set[str]: + key = os.fspath(path) + try: + mtime = path.stat().st_mtime + except OSError: + # File doesn't exist (or unreadable) — drop any cached entry and + # return empty. Don't raise; statusline must never paint errors. + with _CACHE_LOCK: + _CACHE.pop(key, None) + return set() + with _CACHE_LOCK: + cached = _CACHE.get(key) + if cached and cached[0] == mtime: + return cached[1] + # Cache miss / stale — read outside the lock so concurrent readers + # don't serialize on disk I/O. Last-writer-wins on the cache write + # is fine: both readers will store the same (mtime, set) pair. + out: set[str] = set() + try: + for raw in path.read_text(encoding="utf-8").splitlines(): + s = raw.strip() + if s: + out.add(s) + except OSError: + return set() + with _CACHE_LOCK: + _CACHE[key] = (mtime, out) + return out + + +def read_all_private(workspace: Path) -> set[str]: + """Return all private session IDs as a set for O(1) membership checks. + + Returns an empty set if the file doesn't exist. Whitespace-only lines + are skipped. Cached by file mtime — repeated calls within the same + process return the cached set when the file hasn't changed. + """ + path = private_list_path(workspace) + return _read_cached(path) + + +def is_private(workspace: Path, session_id: str) -> bool: + """True iff ``session_id`` is on the private list. Empty session_id → False.""" + if not session_id: + return False + return session_id in read_all_private(workspace) + + +def add_private(workspace: Path, session_id: str) -> bool: + """Append ``session_id`` to the private list if not already present. + + Returns True if the entry was added, False if it was already present + (idempotent re-mark). Empty session_id → no-op, returns False. + """ + if not session_id: + return False + if is_private(workspace, session_id): + return False + line = session_id.rstrip("\n") + "\n" + with open(private_list_path(workspace, writable=True), "a", encoding="utf-8") as f: + f.write(line) + # Explicit cache eviction prevents a sub-second window where add+ + # is_private from the same process disagrees (mtime granularity on + # some filesystems is 1s). + with _CACHE_LOCK: + _CACHE.pop(os.fspath(private_list_path(workspace)), None) + return True diff --git a/cli/lib/push_lock.py b/cli/lib/push_lock.py new file mode 100644 index 0000000..2d1e28b --- /dev/null +++ b/cli/lib/push_lock.py @@ -0,0 +1,64 @@ +"""Cross-platform single-instance lock for `agnes push`. + +Wraps :class:`filelock.FileLock` (which delegates to ``fcntl.flock`` on POSIX +and ``msvcrt.locking`` on Windows) into a context manager that returns +``None`` when another push is already running. Callers use it via: + +.. code-block:: python + + with acquire_or_skip(workspace) as lock: + if lock is None: + return # silent exit — another push holds the lock + do_push() + +The OS releases the lock automatically when the holding process exits +(including crashes), so we do NOT track PIDs or stale-lock ages. The lock +file persists between runs but stays empty — :class:`filelock` only uses +it for the kernel-level lock handle. +""" + +from __future__ import annotations + +from contextlib import contextmanager +from pathlib import Path +from typing import Iterator + +from filelock import FileLock, Timeout + + +_LOCK_FILENAME = "agnes-push.lock" + + +def lock_path(workspace: Path) -> Path: + """Resolve ``/.claude/agnes-push.lock``.""" + claude_dir = workspace / ".claude" + claude_dir.mkdir(parents=True, exist_ok=True) + return claude_dir / _LOCK_FILENAME + + +@contextmanager +def acquire_or_skip(workspace: Path) -> Iterator[FileLock | None]: + """Yield the held lock, or ``None`` if the lock can't be acquired. + + Non-blocking (``timeout=0``). Two ways acquisition can fail, both + treated the same way (yield ``None`` so the caller can ``return`` / + ``sys.exit(0)`` quietly): + + - ``filelock.Timeout`` — another push is already running. Expected + when multiple SessionEnd hooks fire simultaneously after the user + closes several Claude Code sessions at once: exactly one acquires + the lock and runs, the rest no-op. + - ``OSError`` — the lock file can't be created or opened (read-only + filesystem, ``.claude/`` not writable, disk full, hardware I/O + error). Rare; when it happens the operator's environment has + bigger problems than missing session uploads. We swallow it so + ``agnes push`` exits cleanly instead of dumping an opaque + traceback to stderr or, in the SessionEnd hook context, crashing + silently under ``|| true``. + """ + lock = FileLock(str(lock_path(workspace))) + try: + with lock.acquire(timeout=0): + yield lock + except (Timeout, OSError): + yield None diff --git a/cli/lib/session_queue.py b/cli/lib/session_queue.py new file mode 100644 index 0000000..446f323 --- /dev/null +++ b/cli/lib/session_queue.py @@ -0,0 +1,289 @@ +"""Session queue and uploaded-log management for `agnes push`. + +The push command operates on a queue file +(``/.claude/agnes-sessions.txt``) populated by the +``agnes capture-session`` SessionStart hook. Each line is a TSV pair: +``\\t``. session_id is needed so the +push and slash-command machinery can filter against the private +list (``cli/lib/private_list.py``). + +Backward compatibility: legacy lines without a tab (just an absolute +path) are accepted and treated as having an empty session_id. They +still upload via push but cannot be marked private retroactively — +which is fine, since by definition they pre-date the feature. + +Race protection: push atomically renames the queue to a snapshot file +before processing. New SessionStart hooks write to a freshly-created +queue without their entries being clobbered by the eventual rewrite. +A short-lived ``agnes-queue.lock`` (filelock) serializes the rename +against in-flight appends so the queue file is never written to and +renamed concurrently — required on Windows, where ``os.rename`` fails +if another handle has the file open, and where ``open(path, "a")`` is +not atomic across writers. + +Recovery: if push crashes mid-snapshot, the snapshot file persists. The +next push picks it up via :func:`find_recovery_snapshots` and processes +it before touching the live queue. +""" + +from __future__ import annotations + +import os +import uuid +from datetime import datetime, timezone +from pathlib import Path + +from filelock import FileLock + + +_QUEUE_FILENAME = "agnes-sessions.txt" +_UPLOADED_FILENAME = "agnes-sessions-uploaded.txt" +_PRIVATE_SKIPPED_FILENAME = "agnes-sessions-private-skipped.txt" +_FAILED_FILENAME = "agnes-sessions-failed.txt" +_SNAPSHOT_PREFIX = "agnes-sessions.snapshot." +_SNAPSHOT_SUFFIX = ".txt" +_QUEUE_LOCK_FILENAME = "agnes-queue.lock" + + +def _claude_dir(workspace: Path) -> Path: + """Return ``/.claude``, creating it if missing.""" + d = workspace / ".claude" + d.mkdir(parents=True, exist_ok=True) + return d + + +def queue_path(workspace: Path) -> Path: + return _claude_dir(workspace) / _QUEUE_FILENAME + + +def uploaded_log_path(workspace: Path) -> Path: + return _claude_dir(workspace) / _UPLOADED_FILENAME + + +def private_skipped_log_path(workspace: Path) -> Path: + return _claude_dir(workspace) / _PRIVATE_SKIPPED_FILENAME + + +def failed_log_path(workspace: Path) -> Path: + return _claude_dir(workspace) / _FAILED_FILENAME + + +def _queue_lock_path(workspace: Path) -> Path: + """Lock file serializing concurrent writers to the queue file. + + Separate from ``agnes-push.lock`` — that one serializes the push + command end-to-end; this one is short-lived (held only for the + duration of a single append or rename). + """ + return _claude_dir(workspace) / _QUEUE_LOCK_FILENAME + + +def append_to_queue(workspace: Path, session_id: str, transcript_path: str) -> None: + """Append a ``\\t`` line to the queue. + + Held under ``agnes-queue.lock`` to serialize concurrent SessionStart + hooks. Python's ``open(path, "a")`` is NOT atomic on Windows — the + CRT does not pass ``FILE_APPEND_DATA`` to ``CreateFile``, so it's a + plain seek-to-end + write that can interleave bytes mid-line under + concurrent writers (e.g. user opens several Claude Code windows + simultaneously). The lock makes the append safe on every platform. + + No deduplication here: duplicates may legitimately appear (resume + scenario re-writes the same path). Dedup happens at read time. + """ + sid = (session_id or "").rstrip("\n").rstrip("\t") + tp = transcript_path.rstrip("\n") + line = f"{sid}\t{tp}\n" + with FileLock(str(_queue_lock_path(workspace))): + with open(queue_path(workspace), "a", encoding="utf-8") as f: + f.write(line) + + +def snapshot_queue(workspace: Path) -> Path | None: + """Atomically rename the live queue to a snapshot for processing. + + Returns the snapshot path, or None if the queue doesn't exist (no work + to do). The snapshot filename embeds the current PID *and* a random + uuid8 hex tail: PID alone is not unique after the OS recycles it + (Linux wraps at ~32768 by default), so a crashed push leaving a + snapshot on disk could be silently overwritten by a future push with + the same PID — ``os.rename`` atomically replaces the destination on + POSIX and Windows alike, so data loss would be silent. The uuid tail + makes every snapshot filename unique regardless of PID reuse. + + Held under ``agnes-queue.lock`` to serialize against in-flight + ``append_to_queue`` calls: on Windows, ``os.rename`` would fail with + ``PermissionError`` if another handle has the queue open for write, + so the lock prevents that race. The lock is short-lived (single + rename), so it doesn't meaningfully delay concurrent capture-session + hooks. + """ + queue = queue_path(workspace) + if not queue.exists(): + return None + unique = uuid.uuid4().hex[:8] + snapshot = ( + _claude_dir(workspace) + / f"{_SNAPSHOT_PREFIX}{os.getpid()}.{unique}{_SNAPSHOT_SUFFIX}" + ) + with FileLock(str(_queue_lock_path(workspace))): + try: + os.rename(queue, snapshot) + except FileNotFoundError: + return None # race: queue removed between exists() and rename() + return snapshot + + +def _parse_queue_line(raw: str) -> tuple[str, Path] | None: + """Parse one queue line into (session_id, path), or None if blank/invalid.""" + s = raw.strip() + if not s: + return None + if "\t" in s: + sid, _, p = s.partition("\t") + sid = sid.strip() + p = p.strip() + else: + # Legacy format: bare path, no session_id known. + sid = "" + p = s + if not p: + return None + return sid, Path(p) + + +def read_entries_from_snapshot(snapshot: Path) -> list[tuple[str, Path]]: + """Read (session_id, path) entries from a snapshot, deduplicated. + + Deduplication is by the (session_id, path) pair — preserves first-seen + order. Blank lines and lines without a path are skipped. Mixed legacy + (1-column) and new (2-column) lines coexist. + + Repeats from the resume scenario collapse into a single entry: the + server-side overwrite makes a second upload of the same path redundant + within one push run. + """ + if not snapshot.exists(): + return [] + seen: set[tuple[str, str]] = set() + out: list[tuple[str, Path]] = [] + for raw in snapshot.read_text(encoding="utf-8").splitlines(): + parsed = _parse_queue_line(raw) + if parsed is None: + continue + sid, path = parsed + key = (sid, str(path)) + if key in seen: + continue + seen.add(key) + out.append(parsed) + return out + + +# Backward-compatible alias for code that only needs paths. Returns just +# the paths (preserving the old ``list[Path]`` shape) for callers that +# don't care about session_id. Internally used by the dry-run preview +# path which only displays files. +def read_paths_from_snapshot(snapshot: Path) -> list[Path]: + return [path for _sid, path in read_entries_from_snapshot(snapshot)] + + +def find_recovery_snapshots(workspace: Path) -> list[Path]: + """Return any pre-existing snapshot files left behind by a crashed push.""" + return sorted(_claude_dir(workspace).glob(f"{_SNAPSHOT_PREFIX}*{_SNAPSHOT_SUFFIX}")) + + +def discard_snapshot(snapshot: Path) -> None: + """Delete a fully-processed snapshot file. Idempotent.""" + try: + snapshot.unlink() + except FileNotFoundError: + pass + + +def mark_uploaded( + workspace: Path, + transcript_path: Path, + when: datetime | None = None, +) -> None: + """Append `\\t\\n` to the uploaded log.""" + if when is None: + when = datetime.now(timezone.utc) + ts = when.strftime("%Y-%m-%dT%H:%M:%SZ") + line = f"{ts}\t{transcript_path}\n" + with open(uploaded_log_path(workspace), "a", encoding="utf-8") as f: + f.write(line) + + +def mark_private_skipped( + workspace: Path, + session_id: str, + transcript_path: Path, + when: datetime | None = None, +) -> None: + """Append `\\t\\t` to the private-skipped audit log. + + Called by push when it filters out an entry whose session_id is on + the private list. The audit log is append-only — its purpose is to + surface (during incident review or user support) which sessions were + intentionally NOT uploaded. + """ + if when is None: + when = datetime.now(timezone.utc) + ts = when.strftime("%Y-%m-%dT%H:%M:%SZ") + line = f"{ts}\t{session_id}\t{transcript_path}\n" + with open(private_skipped_log_path(workspace), "a", encoding="utf-8") as f: + f.write(line) + + +def mark_failed_permanent( + workspace: Path, + session_id: str, + transcript_path: Path, + status_code: int, + when: datetime | None = None, +) -> None: + """Append `\\t\\t\\t` to the + permanent-failure audit log. + + Called by push when the server returns a 4xx (other than 408 / 429) + — deterministic failures where retrying never succeeds (401 token + expired, 403 RBAC denial, 413 payload too large, 400 server + validation, etc.). The transcript path is logged here instead of + silently dropped so operators have a forensic trail; the entry is + NOT re-queued, breaking the prior infinite-loop bug where every + push run would re-bombard the server with the same failing upload. + + No separate lock: piggybacks on `agnes-push.lock` (the + single-instance push lock), same as `mark_uploaded` and + `mark_private_skipped`. Push is the only writer to this file. + """ + if when is None: + when = datetime.now(timezone.utc) + ts = when.strftime("%Y-%m-%dT%H:%M:%SZ") + line = f"{ts}\t{session_id}\t{status_code}\t{transcript_path}\n" + with open(failed_log_path(workspace), "a", encoding="utf-8") as f: + f.write(line) + + +def requeue_failed( + workspace: Path, + entries: list[tuple[str, Path]], +) -> None: + """Append failed (session_id, path) entries back to the live queue. + + Failed entries land at the end of the queue alongside any fresh + appends that hooks wrote during this push run. Relative ordering + vs. those fresh entries is best-effort — order doesn't affect + correctness. + + Held under ``agnes-queue.lock`` because concurrent ``capture-session`` + hooks (which don't hold the push lock) may be appending at the same + time — same Windows non-atomicity concern as ``append_to_queue``. + """ + if not entries: + return + with FileLock(str(_queue_lock_path(workspace))): + with open(queue_path(workspace), "a", encoding="utf-8") as f: + for sid, p in entries: + f.write(f"{sid}\t{p}\n") diff --git a/cli/main.py b/cli/main.py index 1d92f15..89aa66b 100644 --- a/cli/main.py +++ b/cli/main.py @@ -25,11 +25,14 @@ if sys.platform == "win32": pass from cli.commands.auth import auth_app +from cli.commands.capture_session import capture_session_app from cli.commands.init import init_app +from cli.commands.mark_private import mark_private_app from cli.commands.onboarded import onboarded_app from cli.commands.pull import pull_app from cli.commands.push import push_app from cli.commands.refresh_marketplace import refresh_marketplace_app +from cli.commands.statusline import statusline_app from cli.commands.query import query_command from cli.commands.status import status_app from cli.commands.admin import admin_app @@ -114,6 +117,9 @@ app.add_typer(init_app, name="init") app.add_typer(onboarded_app, name="onboarded") app.add_typer(pull_app, name="pull") app.add_typer(push_app, name="push") +app.add_typer(capture_session_app, name="capture-session") +app.add_typer(mark_private_app, name="mark-private") +app.add_typer(statusline_app, name="statusline") app.add_typer(refresh_marketplace_app, name="refresh-marketplace") app.command("query")(query_command) app.add_typer(status_app, name="status") diff --git a/cli/templates/commands/agnes-private.md b/cli/templates/commands/agnes-private.md new file mode 100644 index 0000000..6f868fc --- /dev/null +++ b/cli/templates/commands/agnes-private.md @@ -0,0 +1,8 @@ +--- +description: Mark the current Claude Code session as private — exclude it from Agnes upload +--- + +!`agnes mark-private` + +The session above has been marked as private. Its transcript will be skipped +the next time `agnes push` runs. diff --git a/config/claude_md_template.txt b/config/claude_md_template.txt index dfe1362..1ed1565 100644 --- a/config/claude_md_template.txt +++ b/config/claude_md_template.txt @@ -37,6 +37,24 @@ This workspace is connected to {{ server.url }}. - `agnes push` — upload sessions and local notes to server - Data on the server refreshes every {{ sync_interval }} +## Private sessions + +If a Claude Code session covers something you do **not** want uploaded +to the Agnes server (sensitive PII walk-through, exploratory work that +shouldn't shape org-wide context, etc.), mark it private: + +- Run `/agnes-private` from inside the session — the deterministic + `!`-prefix bash adds the current `CLAUDE_CODE_SESSION_ID` to + `/.claude/agnes-sessions-private.txt`. Subsequent + `agnes push` runs skip its transcript and audit-log the skip to + `/.claude/agnes-sessions-private-skipped.txt`. +- The Claude Code statusbar shows `🔒 agnes-private` while you are + in a private session, so the marking is visible at a glance. +- Marking before `capture-session` fires (the SessionStart hook that + queues the transcript) means the session never enters the upload + queue at all; marking after it fires is also safe — the push-side + re-check on the private list is the authoritative filter. + ## Discovering tables — never enumerate from memory Tables, columns, sizes, descriptions, and `query_mode` change as admins @@ -67,15 +85,28 @@ and use `agnes snapshot create --estimate` to size-check before fetching. {% if marketplaces -%} ## Agnes Marketplace — plugins available to you -These plugins reach Claude Code through the per-user **`agnes`** marketplace -served by this server (an aggregated, RBAC-filtered view of the upstream -marketplaces below). When you install or invoke one of these plugins inside -Claude Code, address it as `@agnes` regardless of which upstream it -came from — e.g. `claude plugin install @agnes`. The -`agnes refresh-marketplace` command (run by the SessionStart hook every -session) keeps the local clone in sync. +The Agnes server publishes a per-user **`agnes`** Claude Code marketplace, +aggregated from upstream marketplaces and RBAC-filtered. The list below is +your **eligibility set** — plugins your account is allowed to add to its +stack on the `/marketplace` page. Eligibility ≠ installed. -Upstream marketplaces folded into your `agnes` view: +What actually reaches Claude Code is your **served stack**, which is: +- plugins you opted in to on `/marketplace` (from the eligibility set above), +- plus any plugin your admin pinned as system-mandatory (auto-applied to every user), +- plus skills / agents / plugins you installed from the **Flea market** tab on `/marketplace`. + +When you install or invoke a stack plugin inside Claude Code, address it +as `@agnes` regardless of which upstream marketplace it came from +— e.g. `claude plugin install @agnes`. + +Updates: the SessionStart hook runs `agnes refresh-marketplace --check` +on every Claude Code session — it only **detects** server-side changes +(new admin grants, new system plugins, your own `/marketplace` picks) and +prompts you to run `/update-agnes-plugins` inside Claude Code to install +the diff. The slash command does the full reconcile with output visible +in the transcript; no silent auto-install at session start. + +Upstream marketplaces folded into your `agnes` view (eligibility): {% for mp in marketplaces -%} - **{{ mp.name }}** ({{ mp.slug }}): {{ mp.plugins | map(attribute="name") | join(", ") }} {% endfor %} diff --git a/pyproject.toml b/pyproject.toml index 1bc21f9..8f7c296 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -88,6 +88,11 @@ dependencies = [ # src/welcome_template.py was vulnerable to. Pre-built wheels published # for all supported (mac/linux/windows × arm64/x86_64) targets. "nh3>=0.2", + # Cross-platform advisory file locking for the `agnes push` single-instance + # guard. Wraps fcntl.flock on POSIX and msvcrt.locking on Windows behind + # a uniform API; OS releases the lock automatically on process exit (no + # stale-lock detection required). Used by cli/lib/push_lock.py. + "filelock>=3.13,<4", ] [project.optional-dependencies] diff --git a/tests/test_capture_session.py b/tests/test_capture_session.py new file mode 100644 index 0000000..81cf1c2 --- /dev/null +++ b/tests/test_capture_session.py @@ -0,0 +1,197 @@ +"""Tests for `agnes capture-session` — SessionStart hook helper.""" + +import json + +from typer.testing import CliRunner + +from cli.commands.capture_session import capture_session_app +from cli.lib.private_list import add_private +from cli.lib.session_queue import queue_path + +runner = CliRunner() + + +def test_capture_appends_session_id_and_transcript_path(tmp_path, monkeypatch): + monkeypatch.setenv("AGNES_LOCAL_DIR", str(tmp_path)) + payload = json.dumps({ + "session_id": "abc-123", + "transcript_path": "/Users/foo/.claude/projects/.../abc.jsonl", + "cwd": "/Users/foo/work", + "hook_event_name": "SessionStart", + }) + result = runner.invoke(capture_session_app, [], input=payload) + assert result.exit_code == 0 + assert queue_path(tmp_path).read_text(encoding="utf-8") == \ + "abc-123\t/Users/foo/.claude/projects/.../abc.jsonl\n" + + +def test_capture_writes_empty_session_id_when_field_missing(tmp_path, monkeypatch): + """Payload missing session_id: still write to queue (legacy behavior). + The empty session_id means the entry cannot be marked private later + via /agnes-private (which keys on session_id), but it'll still upload.""" + monkeypatch.setenv("AGNES_LOCAL_DIR", str(tmp_path)) + payload = json.dumps({"transcript_path": "/abc.jsonl"}) + result = runner.invoke(capture_session_app, [], input=payload) + assert result.exit_code == 0 + assert queue_path(tmp_path).read_text(encoding="utf-8") == "\t/abc.jsonl\n" + + +def test_capture_silently_skips_when_field_missing(tmp_path, monkeypatch): + """transcript_path is required; absence is a no-op.""" + monkeypatch.setenv("AGNES_LOCAL_DIR", str(tmp_path)) + result = runner.invoke(capture_session_app, [], input='{"session_id": "abc"}') + assert result.exit_code == 0 + assert not queue_path(tmp_path).exists() + + +def test_capture_silently_skips_invalid_json(tmp_path, monkeypatch): + monkeypatch.setenv("AGNES_LOCAL_DIR", str(tmp_path)) + result = runner.invoke(capture_session_app, [], input="not json {") + assert result.exit_code == 0 + assert not queue_path(tmp_path).exists() + + +def test_capture_silently_skips_empty_stdin(tmp_path, monkeypatch): + monkeypatch.setenv("AGNES_LOCAL_DIR", str(tmp_path)) + result = runner.invoke(capture_session_app, [], input="") + assert result.exit_code == 0 + assert not queue_path(tmp_path).exists() + + +def test_capture_silently_skips_non_string_transcript_path(tmp_path, monkeypatch): + monkeypatch.setenv("AGNES_LOCAL_DIR", str(tmp_path)) + payload = json.dumps({"transcript_path": 123}) + result = runner.invoke(capture_session_app, [], input=payload) + assert result.exit_code == 0 + assert not queue_path(tmp_path).exists() + + +def test_capture_silently_skips_when_payload_not_object(tmp_path, monkeypatch): + monkeypatch.setenv("AGNES_LOCAL_DIR", str(tmp_path)) + result = runner.invoke(capture_session_app, [], input='["array", "not object"]') + assert result.exit_code == 0 + assert not queue_path(tmp_path).exists() + + +def test_capture_appends_multiple_calls(tmp_path, monkeypatch): + """Resume scenario: SessionStart fires again — same path appended twice.""" + monkeypatch.setenv("AGNES_LOCAL_DIR", str(tmp_path)) + p1 = json.dumps({"session_id": "abc", "transcript_path": "/abc.jsonl"}) + p2 = json.dumps({"session_id": "abc", "transcript_path": "/abc.jsonl"}) + runner.invoke(capture_session_app, [], input=p1) + runner.invoke(capture_session_app, [], input=p2) + assert queue_path(tmp_path).read_text(encoding="utf-8") == ( + "abc\t/abc.jsonl\n" + "abc\t/abc.jsonl\n" + ) + + +def test_capture_verbose_emits_diagnostic(tmp_path, monkeypatch): + monkeypatch.setenv("AGNES_LOCAL_DIR", str(tmp_path)) + payload = json.dumps({"session_id": "abc", "transcript_path": "/x.jsonl"}) + result = runner.invoke(capture_session_app, ["--verbose"], input=payload) + assert result.exit_code == 0 + assert "/x.jsonl" in result.output + + +def test_capture_skips_queue_when_session_is_already_private(tmp_path, monkeypatch): + """Race scenario: user ran /agnes-private BEFORE the SessionStart hook + chain reached capture-session. The private list write happened first; + capture-session sees it and skips the queue append.""" + monkeypatch.setenv("AGNES_LOCAL_DIR", str(tmp_path)) + add_private(tmp_path, "abc-123") + payload = json.dumps({ + "session_id": "abc-123", + "transcript_path": "/abc.jsonl", + }) + result = runner.invoke(capture_session_app, [], input=payload) + assert result.exit_code == 0 + # Queue was NOT written. + assert not queue_path(tmp_path).exists() + + +def test_capture_writes_when_unrelated_session_is_private(tmp_path, monkeypatch): + """Marking session X private must not block capture of unrelated session Y.""" + monkeypatch.setenv("AGNES_LOCAL_DIR", str(tmp_path)) + add_private(tmp_path, "other-session") + payload = json.dumps({ + "session_id": "abc-123", + "transcript_path": "/abc.jsonl", + }) + result = runner.invoke(capture_session_app, [], input=payload) + assert result.exit_code == 0 + assert queue_path(tmp_path).read_text(encoding="utf-8") == "abc-123\t/abc.jsonl\n" + + +# ---------- Breadcrumb tests (David #11 from PR review) --------------------- +# +# capture-session is a SessionStart hook that always exits 0 (it must NOT +# fail loudly inside Claude Code's startup chain). Without an external +# observability signal, an upstream contract change (Claude Code's stdin +# JSON shape shifts; the queue mysteriously stays empty) is invisible to +# operators. The breadcrumb log gives `agnes diagnose` something to +# inspect. + +from cli.commands.capture_session import _BREADCRUMB_FILENAME + + +def _breadcrumb_lines(workspace) -> list[str]: + """Read all breadcrumb lines, dropping trailing newline.""" + path = workspace / ".claude" / _BREADCRUMB_FILENAME + if not path.exists(): + return [] + return [ln for ln in path.read_text(encoding="utf-8").splitlines() if ln] + + +def test_capture_writes_ok_breadcrumb_on_success(tmp_path, monkeypatch): + monkeypatch.setenv("AGNES_LOCAL_DIR", str(tmp_path)) + payload = json.dumps({"session_id": "abc-123", "transcript_path": "/abc.jsonl"}) + runner.invoke(capture_session_app, [], input=payload) + lines = _breadcrumb_lines(tmp_path) + assert len(lines) == 1 + parts = lines[0].split("\t") + assert parts[1] == "ok" + assert parts[2] == "abc-123" + + +def test_capture_writes_bad_json_breadcrumb_on_invalid_input(tmp_path, monkeypatch): + monkeypatch.setenv("AGNES_LOCAL_DIR", str(tmp_path)) + # Pre-create .claude/ so breadcrumb has somewhere to land (the breadcrumb + # is read-only re: dir creation — see _record_breadcrumb). + (tmp_path / ".claude").mkdir() + runner.invoke(capture_session_app, [], input="not json at all") + lines = _breadcrumb_lines(tmp_path) + assert len(lines) == 1 + assert lines[0].split("\t")[1] == "bad_json" + + +def test_capture_writes_no_transcript_breadcrumb_when_field_missing(tmp_path, monkeypatch): + monkeypatch.setenv("AGNES_LOCAL_DIR", str(tmp_path)) + (tmp_path / ".claude").mkdir() + runner.invoke(capture_session_app, [], input=json.dumps({"session_id": "x"})) + lines = _breadcrumb_lines(tmp_path) + assert len(lines) == 1 + assert lines[0].split("\t")[1] == "no_transcript_path" + + +def test_capture_writes_private_skip_breadcrumb_on_marked_session(tmp_path, monkeypatch): + monkeypatch.setenv("AGNES_LOCAL_DIR", str(tmp_path)) + add_private(tmp_path, "private-sid") + runner.invoke( + capture_session_app, [], + input=json.dumps({"session_id": "private-sid", "transcript_path": "/x"}), + ) + lines = _breadcrumb_lines(tmp_path) + assert lines[-1].split("\t")[1] == "private_skip" + assert lines[-1].split("\t")[2] == "private-sid" + + +def test_breadcrumb_does_not_create_claude_dir_in_arbitrary_workspaces(tmp_path, monkeypatch): + """If the workspace has no .claude/ directory, capture-session + never materializes one — same rationale as the read-only path in + private_list.py. Hooks fire in directories the user just opened + Claude Code in; the breadcrumb log shouldn't pollute those.""" + monkeypatch.setenv("AGNES_LOCAL_DIR", str(tmp_path)) + runner.invoke(capture_session_app, [], input="malformed") + # No .claude/ created, no breadcrumb written. + assert not (tmp_path / ".claude").exists() diff --git a/tests/test_cli_push.py b/tests/test_cli_push.py index 029c846..09f0e2c 100644 --- a/tests/test_cli_push.py +++ b/tests/test_cli_push.py @@ -1,30 +1,70 @@ """Tests for agnes push command (SessionEnd uploader).""" +import json +import re +from contextlib import contextmanager + from typer.testing import CliRunner +from cli.commands.push import push_app +from cli.lib.private_list import add_private +from cli.lib.session_queue import ( + append_to_queue, + failed_log_path, + private_skipped_log_path, + queue_path, + uploaded_log_path, +) + # CI-safety: Typer/rich emits ANSI escapes in --help output. Strip before asserts. -_ANSI_RE = __import__("re").compile(r"\x1b\[[0-9;]*m") +_ANSI_RE = re.compile(r"\x1b\[[0-9;]*m") + + def _clean(s: str) -> str: return _ANSI_RE.sub("", s) -from cli.commands.push import push_app runner = CliRunner() +class _FakeResp: + def __init__(self, status_code: int = 200) -> None: + self.status_code = status_code + + +def _stub_config(monkeypatch) -> None: + monkeypatch.setattr("cli.commands.push.get_server_url", lambda: "http://x") + monkeypatch.setattr("cli.commands.push.get_token", lambda: "test-pat") + + +def _record_uploads(monkeypatch) -> list[tuple[str, dict]]: + """Patch api_post to record calls and return success. Returns the recorder list.""" + calls: list[tuple[str, dict]] = [] + + def _fake(endpoint, **kwargs): + calls.append((endpoint, kwargs)) + return _FakeResp(200) + + monkeypatch.setattr("cli.commands.push.api_post", _fake) + return calls + + +# ---------- Smoke + dry-run -------------------------------------------------- + + def test_push_help(): result = runner.invoke(push_app, ["--help"]) assert result.exit_code == 0 assert "--quiet" in _clean(result.output) assert "--json" in _clean(result.output) assert "--dry-run" in _clean(result.output) + assert "--legacy-scan" in _clean(result.output) def test_push_no_sessions_no_mkdir(tmp_path, monkeypatch): """Empty workspace -> push exits 0, doesn't create user/sessions/.""" monkeypatch.setenv("AGNES_LOCAL_DIR", str(tmp_path)) - monkeypatch.setattr("cli.commands.push.get_server_url", lambda: "http://x") - monkeypatch.setattr("cli.commands.push.get_token", lambda: "test-pat") + _stub_config(monkeypatch) result = runner.invoke(push_app, ["--quiet"]) assert result.exit_code == 0 assert not (tmp_path / "user" / "sessions").exists(), \ @@ -34,17 +74,526 @@ def test_push_no_sessions_no_mkdir(tmp_path, monkeypatch): def test_push_dry_run_no_writes(tmp_path, monkeypatch): """--dry-run lists what would upload but sends nothing.""" monkeypatch.setenv("AGNES_LOCAL_DIR", str(tmp_path)) - monkeypatch.setattr("cli.commands.push.get_server_url", lambda: "http://x") - monkeypatch.setattr("cli.commands.push.get_token", lambda: "test-pat") - # Create a session file - sessions_dir = tmp_path / "user" / "sessions" - sessions_dir.mkdir(parents=True) - (sessions_dir / "abc.jsonl").write_text('{"event":"test"}\n') + _stub_config(monkeypatch) + transcript = tmp_path / "abc.jsonl" + transcript.write_text('{"event":"test"}\n') + append_to_queue(tmp_path, "sid-1", str(transcript)) - # No api_post should be called - patch it to fail loudly if invoked def _raise(*a, **kw): raise AssertionError("api_post was called during --dry-run") + monkeypatch.setattr("cli.commands.push.api_post", _raise) result = runner.invoke(push_app, ["--dry-run"]) assert result.exit_code == 0 + assert queue_path(tmp_path).exists() # not consumed + + +# ---------- Queue happy path + dedup + lock + recovery ---------------------- + + +def test_push_uploads_queued_session_and_clears_queue(tmp_path, monkeypatch): + """Happy path: queue has one session, push uploads it, queue cleared, log written.""" + monkeypatch.setenv("AGNES_LOCAL_DIR", str(tmp_path)) + _stub_config(monkeypatch) + calls = _record_uploads(monkeypatch) + + transcript = tmp_path / "abc.jsonl" + transcript.write_text('{"event":"test"}\n') + append_to_queue(tmp_path, "sid-1", str(transcript)) + + result = runner.invoke(push_app, ["--quiet"]) + assert result.exit_code == 0 + + sessions_calls = [c for c in calls if c[0] == "/api/upload/sessions"] + assert len(sessions_calls) == 1 + assert not queue_path(tmp_path).exists() + log = uploaded_log_path(tmp_path).read_text(encoding="utf-8") + assert str(transcript) in log + assert "\t" in log + + +def test_push_dedups_duplicate_paths_in_queue(tmp_path, monkeypatch): + """Resume scenario: same (session_id, path) queued twice — push uploads once.""" + monkeypatch.setenv("AGNES_LOCAL_DIR", str(tmp_path)) + _stub_config(monkeypatch) + calls = _record_uploads(monkeypatch) + + transcript = tmp_path / "abc.jsonl" + transcript.write_text('{"event":"test"}\n') + append_to_queue(tmp_path, "sid-1", str(transcript)) + append_to_queue(tmp_path, "sid-1", str(transcript)) + + result = runner.invoke(push_app, ["--quiet"]) + assert result.exit_code == 0 + + sessions_calls = [c for c in calls if c[0] == "/api/upload/sessions"] + assert len(sessions_calls) == 1 + + +def test_push_silent_exit_when_lock_held(tmp_path, monkeypatch): + """Concurrent SessionEnd hooks: only one push runs, others silent-exit.""" + monkeypatch.setenv("AGNES_LOCAL_DIR", str(tmp_path)) + _stub_config(monkeypatch) + + @contextmanager + def _yield_none(workspace): + yield None + + monkeypatch.setattr("cli.commands.push.acquire_or_skip", _yield_none) + + def _raise(*a, **kw): + raise AssertionError("api_post called when lock unavailable") + + monkeypatch.setattr("cli.commands.push.api_post", _raise) + + transcript = tmp_path / "x.jsonl" + transcript.write_text("{}\n") + append_to_queue(tmp_path, "sid-1", str(transcript)) + + result = runner.invoke(push_app, ["--quiet"]) + assert result.exit_code == 0 + assert result.output == "" + assert queue_path(tmp_path).read_text(encoding="utf-8") == f"sid-1\t{transcript}\n" + + +def test_push_silent_exit_when_filelock_raises_oserror(tmp_path, monkeypatch): + """OSError from filelock (read-only FS, permission denied, disk full) + must not crash push with an unhandled traceback. Exercises the real + acquire_or_skip by replacing it with a context manager that raises + OSError on entry — simulates what filelock.FileLock.acquire raises + on a read-only mount.""" + monkeypatch.setenv("AGNES_LOCAL_DIR", str(tmp_path)) + _stub_config(monkeypatch) + + # Queue setup must happen BEFORE we install the failing lock — the + # `append_to_queue` path holds its own `agnes-queue.lock` and a + # blanket `FileLock.acquire` patch would break that one too. + transcript = tmp_path / "x.jsonl" + transcript.write_text("{}\n") + append_to_queue(tmp_path, "sid-1", str(transcript)) + + # Wrap the real acquire_or_skip with one that raises OSError before + # yielding. We can't just patch `cli.commands.push.acquire_or_skip` + # because the new behaviour lives inside `acquire_or_skip` itself — + # we have to exercise its except handler. So we patch `FileLock` + # used inside push_lock: subclass with overridden `acquire` that + # raises OSError. + from cli.lib import push_lock as pl + + class _BrokenLock: + def __init__(self, path: str) -> None: + self._path = path + + def acquire(self, timeout: float = -1): + raise OSError("read-only filesystem") + + monkeypatch.setattr(pl, "FileLock", _BrokenLock) + + def _api_should_not_be_called(*a, **kw): + raise AssertionError("api_post called when lock acquisition raised OSError") + + monkeypatch.setattr("cli.commands.push.api_post", _api_should_not_be_called) + + result = runner.invoke(push_app, ["--quiet"]) + assert result.exit_code == 0, f"push must exit 0 on OSError, got: {result.output}" + # No traceback in output + assert "Traceback" not in result.output + # Queue preserved for next push attempt + assert queue_path(tmp_path).read_text(encoding="utf-8") == f"sid-1\t{transcript}\n" + + +def test_push_processes_recovery_snapshot_first(tmp_path, monkeypatch): + """Pre-existing snapshot from a crashed push gets picked up.""" + monkeypatch.setenv("AGNES_LOCAL_DIR", str(tmp_path)) + _stub_config(monkeypatch) + calls = _record_uploads(monkeypatch) + + claude = tmp_path / ".claude" + claude.mkdir(parents=True, exist_ok=True) + recovery = claude / "agnes-sessions.snapshot.99999.txt" + crashed_jsonl = tmp_path / "crashed.jsonl" + crashed_jsonl.write_text("{}\n") + recovery.write_text(f"sid-old\t{crashed_jsonl}\n", encoding="utf-8") + + fresh_jsonl = tmp_path / "fresh.jsonl" + fresh_jsonl.write_text("{}\n") + append_to_queue(tmp_path, "sid-new", str(fresh_jsonl)) + + result = runner.invoke(push_app, ["--quiet"]) + assert result.exit_code == 0 + + sessions_calls = [c for c in calls if c[0] == "/api/upload/sessions"] + assert len(sessions_calls) == 2 + assert not recovery.exists() + assert not queue_path(tmp_path).exists() + + +def test_push_skips_stale_queue_entry(tmp_path, monkeypatch): + """Queue entry pointing to a deleted file: skipped, not retried forever.""" + monkeypatch.setenv("AGNES_LOCAL_DIR", str(tmp_path)) + _stub_config(monkeypatch) + + def _raise(*a, **kw): + raise AssertionError("api_post should not be called for missing file") + + monkeypatch.setattr("cli.commands.push.api_post", _raise) + + append_to_queue(tmp_path, "sid-1", str(tmp_path / "ghost.jsonl")) + + result = runner.invoke(push_app, []) + assert result.exit_code == 0 + assert not queue_path(tmp_path).exists() + assert not uploaded_log_path(tmp_path).exists() + + +def test_push_requeues_failed_uploads(tmp_path, monkeypatch): + """Server returns 500 → path stays in queue for next push.""" + monkeypatch.setenv("AGNES_LOCAL_DIR", str(tmp_path)) + _stub_config(monkeypatch) + + def _fail(*a, **kw): + return _FakeResp(500) + + monkeypatch.setattr("cli.commands.push.api_post", _fail) + + transcript = tmp_path / "x.jsonl" + transcript.write_text("{}\n") + append_to_queue(tmp_path, "sid-1", str(transcript)) + + result = runner.invoke(push_app, []) + assert result.exit_code == 0 + assert queue_path(tmp_path).read_text(encoding="utf-8") == f"sid-1\t{transcript}\n" + assert not uploaded_log_path(tmp_path).exists() + + +def test_push_uploads_local_md(tmp_path, monkeypatch): + """CLAUDE.local.md uploaded when present.""" + monkeypatch.setenv("AGNES_LOCAL_DIR", str(tmp_path)) + _stub_config(monkeypatch) + calls = _record_uploads(monkeypatch) + + claude = tmp_path / ".claude" + claude.mkdir() + (claude / "CLAUDE.local.md").write_text("notes") + + result = runner.invoke(push_app, ["--quiet"]) + assert result.exit_code == 0 + + md_calls = [c for c in calls if c[0] == "/api/upload/local-md"] + assert len(md_calls) == 1 + + +def test_push_json_output(tmp_path, monkeypatch): + """--json emits a single JSON object with results.""" + monkeypatch.setenv("AGNES_LOCAL_DIR", str(tmp_path)) + _stub_config(monkeypatch) + _record_uploads(monkeypatch) + + transcript = tmp_path / "x.jsonl" + transcript.write_text("{}\n") + append_to_queue(tmp_path, "sid-1", str(transcript)) + + result = runner.invoke(push_app, ["--json"]) + assert result.exit_code == 0 + data = json.loads(result.output.strip()) + assert data["sessions"] == 1 + assert data["errors"] == [] + assert data["private_skipped"] == 0 + + +# ---------- Private filter tests -------------------------------------------- + + +def test_push_skips_private_session_and_audit_logs(tmp_path, monkeypatch): + """Queue contains a private session_id → no upload, audit log appended.""" + monkeypatch.setenv("AGNES_LOCAL_DIR", str(tmp_path)) + _stub_config(monkeypatch) + calls = _record_uploads(monkeypatch) + + transcript = tmp_path / "secret.jsonl" + transcript.write_text("{}\n") + add_private(tmp_path, "sid-private") + append_to_queue(tmp_path, "sid-private", str(transcript)) + + result = runner.invoke(push_app, ["--quiet"]) + assert result.exit_code == 0 + + sessions_calls = [c for c in calls if c[0] == "/api/upload/sessions"] + assert sessions_calls == [], "private session must NOT be uploaded" + + # Audit log entry written + audit = private_skipped_log_path(tmp_path).read_text(encoding="utf-8") + assert "sid-private" in audit + assert str(transcript) in audit + + # Queue consumed (snapshot processed and discarded — private entry not requeued) + assert not queue_path(tmp_path).exists() + + +def test_push_mixes_private_and_public_correctly(tmp_path, monkeypatch): + """A push run with one private + one public session uploads only the public one.""" + monkeypatch.setenv("AGNES_LOCAL_DIR", str(tmp_path)) + _stub_config(monkeypatch) + calls = _record_uploads(monkeypatch) + + secret = tmp_path / "secret.jsonl" + secret.write_text("{}\n") + public = tmp_path / "public.jsonl" + public.write_text("{}\n") + + add_private(tmp_path, "sid-secret") + append_to_queue(tmp_path, "sid-secret", str(secret)) + append_to_queue(tmp_path, "sid-public", str(public)) + + result = runner.invoke(push_app, ["--quiet"]) + assert result.exit_code == 0 + + sessions_calls = [c for c in calls if c[0] == "/api/upload/sessions"] + assert len(sessions_calls) == 1 + + audit = private_skipped_log_path(tmp_path).read_text(encoding="utf-8") + assert "sid-secret" in audit + assert "sid-public" not in audit + + +def test_push_dry_run_shows_private_skip(tmp_path, monkeypatch): + """--dry-run preview reports private-skipped count separately.""" + monkeypatch.setenv("AGNES_LOCAL_DIR", str(tmp_path)) + _stub_config(monkeypatch) + + def _raise(*a, **kw): + raise AssertionError("api_post was called during --dry-run") + + monkeypatch.setattr("cli.commands.push.api_post", _raise) + + transcript = tmp_path / "secret.jsonl" + transcript.write_text("{}\n") + add_private(tmp_path, "sid-priv") + append_to_queue(tmp_path, "sid-priv", str(transcript)) + + result = runner.invoke(push_app, ["--dry-run"]) + assert result.exit_code == 0 + assert "1 private session" in result.output + assert "sid-priv" in result.output + + +# ---------- 4xx permanent-failure handling ----------------------------------- + + +def _stub_api_post_status(monkeypatch, status: int) -> None: + """Patch api_post to always return the given status code.""" + def _fixed(*a, **kw): + return _FakeResp(status) + monkeypatch.setattr("cli.commands.push.api_post", _fixed) + + +def test_push_drops_4xx_to_audit_log_not_requeue(tmp_path, monkeypatch): + """4xx (here: 401 token expired) → drop + audit, no requeue. + Closes the prior infinite-loop bug where every non-200 except + `file not found on disk` was requeued forever.""" + monkeypatch.setenv("AGNES_LOCAL_DIR", str(tmp_path)) + _stub_config(monkeypatch) + _stub_api_post_status(monkeypatch, 401) + + transcript = tmp_path / "x.jsonl" + transcript.write_text("{}\n") + append_to_queue(tmp_path, "sid-1", str(transcript)) + + result = runner.invoke(push_app, []) + assert result.exit_code == 0 + # Entry must NOT be in the live queue any more. + assert not queue_path(tmp_path).exists() or \ + queue_path(tmp_path).read_text(encoding="utf-8") == "" + # Audit log must record the drop with status + session_id + path. + log = failed_log_path(tmp_path).read_text(encoding="utf-8") + assert "\t401\t" in log + assert "sid-1" in log + assert str(transcript) in log + + +def test_push_drops_each_4xx_status(tmp_path, monkeypatch): + """403, 413, 400 → all drop (not just 401).""" + for status in (400, 403, 413): + ws = tmp_path / f"ws-{status}" + ws.mkdir() + monkeypatch.setenv("AGNES_LOCAL_DIR", str(ws)) + _stub_config(monkeypatch) + _stub_api_post_status(monkeypatch, status) + transcript = ws / "x.jsonl" + transcript.write_text("{}\n") + append_to_queue(ws, f"sid-{status}", str(transcript)) + + result = runner.invoke(push_app, []) + assert result.exit_code == 0, (status, result.output) + log = failed_log_path(ws).read_text(encoding="utf-8") + assert f"\t{status}\t" in log, (status, log) + + +def test_push_requeues_408_and_429(tmp_path, monkeypatch): + """408 Request Timeout + 429 Too Many Requests are transient per + HTTP spec — server is asking us to retry, not telling us the + request is invalid. Must requeue, not drop.""" + for status in (408, 429): + ws = tmp_path / f"ws-{status}" + ws.mkdir() + monkeypatch.setenv("AGNES_LOCAL_DIR", str(ws)) + _stub_config(monkeypatch) + _stub_api_post_status(monkeypatch, status) + transcript = ws / "x.jsonl" + transcript.write_text("{}\n") + append_to_queue(ws, f"sid-{status}", str(transcript)) + + result = runner.invoke(push_app, []) + assert result.exit_code == 0 + # Requeued → entry back in live queue. + live = queue_path(ws).read_text(encoding="utf-8") + assert f"sid-{status}\t{transcript}\n" == live + # NOT in the failed audit log. + assert not failed_log_path(ws).exists() + + +def test_push_requeues_5xx(tmp_path, monkeypatch): + """5xx is genuine server-side failure: request was valid but server + couldn't honor it right now. Requeue for the next push.""" + for status in (500, 502, 503): + ws = tmp_path / f"ws-{status}" + ws.mkdir() + monkeypatch.setenv("AGNES_LOCAL_DIR", str(ws)) + _stub_config(monkeypatch) + _stub_api_post_status(monkeypatch, status) + transcript = ws / "x.jsonl" + transcript.write_text("{}\n") + append_to_queue(ws, f"sid-{status}", str(transcript)) + + result = runner.invoke(push_app, []) + assert result.exit_code == 0 + live = queue_path(ws).read_text(encoding="utf-8") + assert f"sid-{status}\t{transcript}\n" == live + assert not failed_log_path(ws).exists() + + +def test_push_requeues_network_exception(tmp_path, monkeypatch): + """Connection error / DNS / timeout — no status code from server. + Treat as transient: requeue rather than drop.""" + monkeypatch.setenv("AGNES_LOCAL_DIR", str(tmp_path)) + _stub_config(monkeypatch) + + def _raise(*a, **kw): + raise ConnectionError("server unreachable") + monkeypatch.setattr("cli.commands.push.api_post", _raise) + + transcript = tmp_path / "x.jsonl" + transcript.write_text("{}\n") + append_to_queue(tmp_path, "sid-net", str(transcript)) + + result = runner.invoke(push_app, []) + assert result.exit_code == 0 + live = queue_path(tmp_path).read_text(encoding="utf-8") + assert f"sid-net\t{transcript}\n" == live + assert not failed_log_path(tmp_path).exists() + + +def test_push_4xx_drop_count_in_json_output(tmp_path, monkeypatch): + """--json surfaces the new `dropped_permanent` counter so operators + can pipe it into monitoring / scripts.""" + monkeypatch.setenv("AGNES_LOCAL_DIR", str(tmp_path)) + _stub_config(monkeypatch) + _stub_api_post_status(monkeypatch, 401) + + transcript = tmp_path / "x.jsonl" + transcript.write_text("{}\n") + append_to_queue(tmp_path, "sid-1", str(transcript)) + + result = runner.invoke(push_app, ["--json"]) + assert result.exit_code == 0 + payload = json.loads(result.output) + assert payload["dropped_permanent"] == 1 + assert payload["sessions"] == 0 + + +def test_push_4xx_drop_visible_in_quiet_stdout(tmp_path, monkeypatch): + """Non-quiet stdout mentions the audit-log path so operators tailing + `agnes push` output get a pointer to the forensic trail.""" + monkeypatch.setenv("AGNES_LOCAL_DIR", str(tmp_path)) + _stub_config(monkeypatch) + _stub_api_post_status(monkeypatch, 413) + + transcript = tmp_path / "huge.jsonl" + transcript.write_text("{}\n") + append_to_queue(tmp_path, "sid-big", str(transcript)) + + result = runner.invoke(push_app, []) + assert result.exit_code == 0 + assert "agnes-sessions-failed.txt" in result.output + assert "permanent failure" in result.output + + +# ---------- David #8: legacy-scan honors the private list ------------------- +# +# `--legacy-scan` walks ~/.claude/projects//*.jsonl. Claude Code +# names jsonls `.jsonl`, so the file stem IS the session id — +# the same private filter that protects queue uploads must apply. Without +# this, an operator running `agnes push --legacy-scan` to backfill old +# sessions would silently upload everything on disk. + + +def test_push_legacy_scan_skips_private_session(tmp_path, monkeypatch): + """Legacy-scan picks up `.jsonl` from the projects dir; if the + sid is on the private list, it must be skipped + audit-logged.""" + monkeypatch.setenv("AGNES_LOCAL_DIR", str(tmp_path)) + _stub_config(monkeypatch) + calls = _record_uploads(monkeypatch) + + projects_dir = tmp_path / "projects-fake" + projects_dir.mkdir() + pub = projects_dir / "sid-public.jsonl" + pub.write_text("{}\n") + priv = projects_dir / "sid-private.jsonl" + priv.write_text("{}\n") + + monkeypatch.setattr( + "cli.lib.claude_sessions.list_session_files", + lambda _w: [pub, priv], + ) + add_private(tmp_path, "sid-private") + + result = runner.invoke(push_app, ["--legacy-scan", "--quiet"]) + assert result.exit_code == 0 + + sessions_calls = [c for c in calls if c[0] == "/api/upload/sessions"] + assert len(sessions_calls) == 1 + uploaded_path = sessions_calls[0][1]["files"]["file"][0] + assert uploaded_path == "sid-public.jsonl" + + audit = private_skipped_log_path(tmp_path).read_text(encoding="utf-8") + assert "sid-private" in audit + assert str(priv) in audit + + +def test_push_legacy_scan_dry_run_segregates_private(tmp_path, monkeypatch): + """Dry-run JSON shape: legacy-scan candidates surface in + would_upload OR would_skip_private depending on private membership.""" + monkeypatch.setenv("AGNES_LOCAL_DIR", str(tmp_path)) + _stub_config(monkeypatch) + + projects_dir = tmp_path / "projects-fake" + projects_dir.mkdir() + public_jsonl = projects_dir / "sid-pub.jsonl" + public_jsonl.write_text("{}\n") + private_jsonl = projects_dir / "sid-priv.jsonl" + private_jsonl.write_text("{}\n") + + monkeypatch.setattr( + "cli.lib.claude_sessions.list_session_files", + lambda _w: [public_jsonl, private_jsonl], + ) + add_private(tmp_path, "sid-priv") + + result = runner.invoke(push_app, ["--legacy-scan", "--dry-run", "--json"]) + assert result.exit_code == 0 + payload = json.loads(result.output) + assert str(public_jsonl) in payload["would_upload"]["sessions"] + assert str(private_jsonl) not in payload["would_upload"]["sessions"] + skipped_paths = [e["path"] for e in payload["would_skip_private"]] + assert str(private_jsonl) in skipped_paths diff --git a/tests/test_e2e_privacy.py b/tests/test_e2e_privacy.py new file mode 100644 index 0000000..9c722d4 --- /dev/null +++ b/tests/test_e2e_privacy.py @@ -0,0 +1,208 @@ +"""End-to-end test pinning the user-visible privacy promise: +**a session marked /agnes-private never reaches the server.** + +Each unit test (capture_session, mark_private, push, private_list) covers +its own component edge cases. This file wires them together against a +recording fake `api_post` to verify the cross-component contract — closes +ZdenekSrotyr S2.9 in PR #242. + +The test exercises both race orderings: +- mark BEFORE capture (capture-session sees the private flag and skips + the queue write — entry never enters the upload pipeline) +- capture BEFORE mark (entry sits in the queue; push's per-entry + re-check against the private list filters it out and audit-logs) +""" + +from __future__ import annotations + +import json + +from typer.testing import CliRunner + +from cli.commands.capture_session import capture_session_app +from cli.commands.mark_private import mark_private_app +from cli.commands.push import push_app +from cli.lib.session_queue import ( + failed_log_path, + private_skipped_log_path, + queue_path, + uploaded_log_path, +) + + +runner = CliRunner() + + +class _FakeResp: + def __init__(self, status_code: int = 200) -> None: + self.status_code = status_code + + +def _record_uploads(monkeypatch) -> list[tuple[str, dict]]: + """Patch api_post to record every call. Sessions endpoint succeeds with + 200; local-md endpoint also succeeds. Returns the recorder list.""" + calls: list[tuple[str, dict]] = [] + + def _fake(endpoint, **kwargs): + calls.append((endpoint, kwargs)) + return _FakeResp(200) + + monkeypatch.setattr("cli.commands.push.api_post", _fake) + return calls + + +def _stub_push_config(monkeypatch) -> None: + monkeypatch.setattr("cli.commands.push.get_server_url", lambda: "http://x") + monkeypatch.setattr("cli.commands.push.get_token", lambda: "test-pat") + + +def _capture(workspace, session_id, transcript_path, monkeypatch) -> None: + """Simulate a SessionStart hook firing `agnes capture-session` with the + real CC stdin payload shape.""" + monkeypatch.setenv("AGNES_LOCAL_DIR", str(workspace)) + payload = json.dumps({ + "session_id": session_id, + "transcript_path": str(transcript_path), + "cwd": str(workspace), + "hook_event_name": "SessionStart", + }) + result = runner.invoke(capture_session_app, [], input=payload) + assert result.exit_code == 0, result.output + + +def _mark_private(workspace, session_id, monkeypatch) -> None: + """Simulate the user typing `/agnes-private` inside Claude Code — + `!`-prefix bash spawns `agnes mark-private` with CLAUDE_CODE_SESSION_ID + set to the active session.""" + monkeypatch.setenv("AGNES_LOCAL_DIR", str(workspace)) + monkeypatch.setenv("CLAUDE_CODE_SESSION_ID", session_id) + result = runner.invoke(mark_private_app, []) + assert result.exit_code == 0, result.output + + +def _uploaded_session_names(calls: list[tuple[str, dict]]) -> list[str]: + """Extract the `name` field from every recorded /api/upload/sessions + multipart upload. Returns the list of jsonl filenames that reached + the (fake) server.""" + out: list[str] = [] + for endpoint, kwargs in calls: + if endpoint != "/api/upload/sessions": + continue + files = kwargs.get("files", {}) + file_field = files.get("file") + if file_field is None: + continue + # `files={"file": (name, fh)}` — `_upload_one` in push.py + name = file_field[0] if isinstance(file_field, tuple) else None + if name is not None: + out.append(name) + return out + + +def test_e2e_mark_before_capture_session_never_uploads(tmp_path, monkeypatch): + """User marks /agnes-private *before* the SessionStart hook runs + capture-session. The capture-session call sees the session_id is on + the private list and skips the queue write entirely — the upload + pipeline never even sees it.""" + _stub_push_config(monkeypatch) + calls = _record_uploads(monkeypatch) + + private_id = "private-session" + private_transcript = tmp_path / "private.jsonl" + private_transcript.write_text("{}\n") + + # Step 1: user runs /agnes-private (e.g. typed it before any + # SessionStart hook reached capture-session). + _mark_private(tmp_path, private_id, monkeypatch) + + # Step 2: SessionStart hook fires capture-session. It must see the + # private flag and skip the queue write. + _capture(tmp_path, private_id, private_transcript, monkeypatch) + + # Queue must be empty / absent — capture-session bailed before write. + assert not queue_path(tmp_path).exists() or \ + queue_path(tmp_path).read_text(encoding="utf-8") == "" + + # Step 3: push runs (e.g. SessionEnd fires it). + result = runner.invoke(push_app, ["--quiet"]) + assert result.exit_code == 0, result.output + + # Assert: nothing reached the (fake) server. + assert _uploaded_session_names(calls) == [], ( + f"Private session must not upload; got: {_uploaded_session_names(calls)}" + ) + # And no uploaded-log entry was written. + assert not uploaded_log_path(tmp_path).exists() + + +def test_e2e_capture_before_mark_filters_at_push(tmp_path, monkeypatch): + """User marks /agnes-private *after* capture-session has already + queued the transcript (typical scenario — user types the slash + command mid-session). The entry sits in the queue; push's per-entry + re-check against the private list filters it out at the last moment + and audit-logs the skip.""" + _stub_push_config(monkeypatch) + calls = _record_uploads(monkeypatch) + + private_id = "private-session" + private_transcript = tmp_path / "private.jsonl" + private_transcript.write_text("{}\n") + public_id = "public-session" + public_transcript = tmp_path / "public.jsonl" + public_transcript.write_text("{}\n") + + # Step 1: SessionStart hooks fire capture-session for BOTH sessions. + # Neither is private yet, so both land in the queue. + _capture(tmp_path, private_id, private_transcript, monkeypatch) + _capture(tmp_path, public_id, public_transcript, monkeypatch) + # Sanity: queue has both entries. + queue_lines = queue_path(tmp_path).read_text(encoding="utf-8").splitlines() + assert len(queue_lines) == 2 + + # Step 2: user marks one of them /agnes-private mid-session. + _mark_private(tmp_path, private_id, monkeypatch) + + # Step 3: push runs. + result = runner.invoke(push_app, ["--quiet"]) + assert result.exit_code == 0, result.output + + # Assert: ONLY the public session reached the (fake) server. + uploaded = _uploaded_session_names(calls) + assert uploaded == ["public.jsonl"], ( + f"Only the non-private session should upload; got: {uploaded}" + ) + + # Audit log records the skipped private entry — the forensic trail. + skipped_log = private_skipped_log_path(tmp_path).read_text(encoding="utf-8") + assert private_id in skipped_log + assert "private.jsonl" in skipped_log + + # Failed log MUST be empty — private skip is not a failure, just + # an intentional exclusion. + assert not failed_log_path(tmp_path).exists() + + +def test_e2e_unmarked_sessions_upload_normally(tmp_path, monkeypatch): + """Control case: with no /agnes-private call, both sessions upload. + Proves the e2e tests above are exercising the privacy filter, not + some unrelated reason the upload didn't fire.""" + _stub_push_config(monkeypatch) + calls = _record_uploads(monkeypatch) + + t1 = tmp_path / "a.jsonl" + t1.write_text("{}\n") + t2 = tmp_path / "b.jsonl" + t2.write_text("{}\n") + + _capture(tmp_path, "sid-a", t1, monkeypatch) + _capture(tmp_path, "sid-b", t2, monkeypatch) + + result = runner.invoke(push_app, ["--quiet"]) + assert result.exit_code == 0, result.output + + uploaded = sorted(_uploaded_session_names(calls)) + assert uploaded == ["a.jsonl", "b.jsonl"], ( + f"Both unmarked sessions should upload; got: {uploaded}" + ) + # No private-skipped audit entries either — nothing was marked private. + assert not private_skipped_log_path(tmp_path).exists() diff --git a/tests/test_lib_commands.py b/tests/test_lib_commands.py index 9c9b2a0..eda20dd 100644 --- a/tests/test_lib_commands.py +++ b/tests/test_lib_commands.py @@ -78,6 +78,20 @@ def test_install_idempotent(tmp_path): install_claude_commands(tmp_path) second = _read_managed_command(tmp_path) assert first == second - # And no extra files appeared. + # Exactly the managed files, no strays. cmd_dir_files = sorted(p.name for p in (tmp_path / ".claude" / "commands").iterdir()) - assert cmd_dir_files == ["update-agnes-plugins.md"] + assert cmd_dir_files == ["agnes-private.md", "update-agnes-plugins.md"] + + +def test_install_writes_agnes_private_slash_command(tmp_path): + """The /agnes-private slash command is shipped alongside update-agnes-plugins + and triggers `agnes mark-private` deterministically (no AI in the loop). + Pin the marker so a template rewrite that drops the `!`-prefix line is caught.""" + install_claude_commands(tmp_path) + body = (tmp_path / ".claude" / "commands" / "agnes-private.md").read_text( + encoding="utf-8" + ) + assert body.startswith("---"), body[:120] + assert "description:" in body.split("---", 2)[1] + # `!`-prefix runs as bash directly — deterministic, no AI tokens. + assert "agnes mark-private" in body diff --git a/tests/test_lib_hooks.py b/tests/test_lib_hooks.py index 78d5127..745a6cd 100644 --- a/tests/test_lib_hooks.py +++ b/tests/test_lib_hooks.py @@ -4,7 +4,11 @@ import json from pathlib import Path -from cli.lib.hooks import install_claude_hooks +from cli.lib.hooks import ( + install_claude_hooks, + maybe_refresh_claude_hooks, + workspace_has_agnes_hooks, +) def _read_settings(workspace: Path) -> dict: @@ -27,14 +31,26 @@ def test_install_creates_settings_file(tmp_path): install_claude_hooks(tmp_path) cfg = _read_settings(tmp_path) starts = _commands_for(cfg, "SessionStart") - # SessionStart has three entries: (1) chained self-upgrade ; pull — - # self-upgrade runs first so a wire-protocol bump lands before pull - # tries to use the new CLI; (2) refresh-marketplace as a separate - # entry so a failure (e.g. fresh workspace with no clone) doesn't - # suppress the data pull above; (3) push as a self-heal for orphan - # session JSONLs from `claude -p` headless mode (where Claude Code - # does NOT fire SessionEnd) or abnormal exits. + # SessionStart has three entries: (1) capture-session as the very first + # so the hook stdin (transcript_path) is appended to the queue before + # any other hook runs; (2) chained self-upgrade ; pull — self-upgrade + # runs first so a wire-protocol bump lands before pull tries to use + # the new CLI; (3) refresh-marketplace as a separate entry so a + # failure (e.g. fresh workspace with no clone) doesn't suppress the + # data pull above. + # + # `agnes push` is NOT in SessionStart — the queue mechanism handles + # orphans on the next SessionEnd, so the old self-heal entry was + # redundant + would re-upload the just-starting (empty) session. assert len(starts) == 3 + capture = next((c for c in starts if "agnes capture-session" in c), None) + assert capture is not None, "Expected SessionStart capture-session entry" + assert capture.startswith("bash -c "), ( + f"capture-session hook must be wrapped in bash -c for Windows; got: {capture!r}" + ) + assert not any("agnes push" in c for c in starts), ( + f"agnes push must NOT be in SessionStart; got: {starts!r}" + ) chain = next( (c for c in starts if "agnes self-upgrade" in c and "agnes pull" in c), None, @@ -62,25 +78,36 @@ def test_install_creates_settings_file(tmp_path): assert "--quiet" not in refresh, ( f"refresh-marketplace hook must NOT use --quiet (removed flag); got: {refresh!r}" ) - # The push self-heal entry is also bash-c-wrapped for Windows parity. - push_start = next((c for c in starts if "agnes push" in c), None) - assert push_start is not None, ( - "Expected SessionStart self-heal `agnes push` entry for orphan JSONLs" - ) - assert push_start.startswith("bash -c "), ( - f"push self-heal hook must be wrapped in bash -c for Windows; got: {push_start!r}" - ) ends = _commands_for(cfg, "SessionEnd") assert len(ends) == 1 assert "agnes push --quiet" in ends[0] +def test_all_installed_hooks_are_bash_wrapped(tmp_path): + """Pin the invariant: every Agnes-managed SessionStart / SessionEnd + entry must be wrapped in `bash -c "..."`. Claude Code on Windows + runs hook commands directly (no shell), so any unwrapped entry that + relies on shell syntax (`;` chains, `2>/dev/null` redirection, + `|| true` short-circuit) fails silently on Windows. The previous + self-upgrade+pull chain shipped unwrapped — this test catches a + regression to that state.""" + install_claude_hooks(tmp_path) + cfg = _read_settings(tmp_path) + starts = _commands_for(cfg, "SessionStart") + ends = _commands_for(cfg, "SessionEnd") + for cmd in starts + ends: + assert cmd.startswith("bash -c "), ( + f"Hook must be wrapped in bash -c for Windows; got: {cmd!r}" + ) + + def test_install_idempotent(tmp_path): install_claude_hooks(tmp_path) install_claude_hooks(tmp_path) cfg = _read_settings(tmp_path) - # Three SessionStart entries (pull + refresh-marketplace + push self-heal), - # one SessionEnd entry (push). Re-install must NOT duplicate them. + # Three SessionStart entries (capture-session + chained self-upgrade/pull + # + refresh-marketplace), one SessionEnd entry (push). Re-install must + # NOT duplicate them. assert len(cfg["hooks"]["SessionStart"]) == 3 assert len(cfg["hooks"]["SessionEnd"]) == 1 @@ -100,9 +127,11 @@ def test_install_replaces_old_da_sync_entries(tmp_path): cfg = _read_settings(tmp_path) starts = _commands_for(cfg, "SessionStart") assert len(starts) == 3 + assert any("agnes capture-session" in c for c in starts) assert any("agnes pull" in c for c in starts) assert any("agnes refresh-marketplace" in c for c in starts) - assert any("agnes push" in c for c in starts) + # `agnes push` lives only in SessionEnd now. + assert not any("agnes push" in c for c in starts) # Legacy command must be gone from BOTH starts. assert not any("da sync" in c for c in starts) @@ -111,7 +140,7 @@ def test_install_replaces_prior_single_pull_entry(tmp_path): """Workspaces bootstrapped by a CLI version that only installed a single SessionStart entry (`agnes pull`, no refresh-marketplace) must upgrade to the three-entry layout on the next install — not end up - with four entries (one old + three new).""" + stacking the new entries on top of the old one.""" settings_path = tmp_path / ".claude" / "settings.json" settings_path.parent.mkdir(parents=True) settings_path.write_text(json.dumps({ @@ -125,9 +154,10 @@ def test_install_replaces_prior_single_pull_entry(tmp_path): cfg = _read_settings(tmp_path) starts = _commands_for(cfg, "SessionStart") assert len(starts) == 3 + assert any("agnes capture-session" in c for c in starts) assert any("agnes pull" in c for c in starts) assert any("agnes refresh-marketplace" in c for c in starts) - assert any("agnes push" in c for c in starts) + assert not any("agnes push" in c for c in starts) def test_install_replaces_v0_43_chained_self_upgrade_pull_entry(tmp_path): @@ -163,8 +193,9 @@ def test_install_replaces_v0_43_chained_self_upgrade_pull_entry(tmp_path): None, ) assert chain is not None + assert any("agnes capture-session" in c for c in starts) assert any("agnes refresh-marketplace" in c for c in starts) - assert any("agnes push" in c for c in starts) + assert not any("agnes push" in c for c in starts) # SessionEnd untouched (single push entry). ends = _commands_for(cfg, "SessionEnd") assert len(ends) == 1 @@ -233,9 +264,10 @@ def test_install_preserves_third_party_hooks(tmp_path): # Third-party entry stays + all three agnes entries get added. assert len(starts) == 4 assert any("echo hi from another tool" in c for c in starts) + assert any("agnes capture-session" in c for c in starts) assert any("agnes pull" in c for c in starts) assert any("agnes refresh-marketplace" in c for c in starts) - assert any("agnes push" in c for c in starts) + assert not any("agnes push" in c for c in starts) # Other event types untouched. assert cfg["hooks"]["PreToolUse"][0]["hooks"][0]["command"] == "echo pre" @@ -279,9 +311,8 @@ def test_install_idempotent_chained_entry(tmp_path): install_claude_hooks(tmp_path) install_claude_hooks(tmp_path) cfg = _read_settings(tmp_path) - # Three SessionStart entries (chained self-upgrade+pull, refresh- - # marketplace, and the push self-heal) — re-install must not - # duplicate any of them. + # Three SessionStart entries (capture-session, chained self-upgrade+pull, + # refresh-marketplace) — re-install must not duplicate any of them. assert len(cfg["hooks"]["SessionStart"]) == 3 assert len(cfg["hooks"]["SessionEnd"]) == 1 @@ -328,6 +359,80 @@ def test_session_end_push_is_detached(tmp_path): ) +def test_install_writes_statusline_when_absent(tmp_path): + """Greenfield install: no prior statusLine → we write ours.""" + install_claude_hooks(tmp_path) + cfg = _read_settings(tmp_path) + assert "statusLine" in cfg + assert cfg["statusLine"]["type"] == "command" + assert "agnes statusline" in cfg["statusLine"]["command"] + + +def test_install_preserves_existing_user_statusline(tmp_path, capsys): + """User has their own statusLine — we leave it alone and warn on stderr. + Customizing the status bar is a personal preference; agnes shouldn't + clobber it. Operators who want the private indicator alongside their + own content can compose `agnes statusline` into their command.""" + settings_path = tmp_path / ".claude" / "settings.json" + settings_path.parent.mkdir(parents=True) + user_statusline = {"type": "command", "command": "my-custom-status"} + settings_path.write_text(json.dumps({"statusLine": user_statusline})) + + install_claude_hooks(tmp_path) + cfg = _read_settings(tmp_path) + # User's statusLine intact. + assert cfg["statusLine"] == user_statusline + # Warning surfaced. + captured = capsys.readouterr() + assert "statusLine" in captured.err + + +def test_install_idempotent_when_statusline_already_ours(tmp_path): + """Re-running install when our statusLine is already in place is a no-op, + NOT a warning (idempotent re-init shouldn't spam the user).""" + install_claude_hooks(tmp_path) + install_claude_hooks(tmp_path) + cfg = _read_settings(tmp_path) + assert "agnes statusline" in cfg["statusLine"]["command"] + + +def test_install_treats_explicit_null_statusline_as_unconfigured(tmp_path, capsys): + """`"statusLine": null` (legal JSON, unusual) is treated as + "no statusLine configured" rather than "user explicitly opted out". + The previous truthy check (`if existing:`) silently took this path + without distinguishing it from absent-key; the new check makes + the behavior explicit and tested. No warning — the user didn't + configure anything actionable.""" + settings_path = tmp_path / ".claude" / "settings.json" + settings_path.parent.mkdir(parents=True) + settings_path.write_text(json.dumps({"statusLine": None})) + + install_claude_hooks(tmp_path) + + cfg = _read_settings(tmp_path) + assert isinstance(cfg["statusLine"], dict) + assert "agnes statusline" in cfg["statusLine"]["command"] + captured = capsys.readouterr() + assert "preserved" not in captured.err # no spurious warning + + +def test_install_treats_empty_statusline_as_unconfigured(tmp_path, capsys): + """Empty string is the falsy sibling of None — same treatment. + Documents the boundary so future changes can't accidentally treat + `""` as a non-empty truthy command.""" + settings_path = tmp_path / ".claude" / "settings.json" + settings_path.parent.mkdir(parents=True) + settings_path.write_text(json.dumps({"statusLine": ""})) + + install_claude_hooks(tmp_path) + + cfg = _read_settings(tmp_path) + assert isinstance(cfg["statusLine"], dict) + assert "agnes statusline" in cfg["statusLine"]["command"] + captured = capsys.readouterr() + assert "preserved" not in captured.err + + def test_install_replaces_old_synchronous_session_end_push(tmp_path): """A workspace bootstrapped before the detachment fix has the old synchronous `agnes push --quiet 2>/dev/null || true` SessionEnd entry. @@ -350,3 +455,135 @@ def test_install_replaces_old_synchronous_session_end_push(tmp_path): assert "nohup" in ends[0], ( f"Old synchronous push entry must have been replaced with the detached form; got: {ends!r}" ) + + +# --------------------------------------------------------------------------- +# workspace_has_agnes_hooks / maybe_refresh_claude_hooks +# --------------------------------------------------------------------------- + + +def test_workspace_has_agnes_hooks_false_for_missing_settings(tmp_path): + """Fresh dir without `.claude/settings.json` is not an Agnes workspace.""" + assert workspace_has_agnes_hooks(tmp_path) is False + + +def test_workspace_has_agnes_hooks_false_for_empty_settings(tmp_path): + """`.claude/settings.json` exists but is empty `{}` — not an Agnes workspace.""" + sp = tmp_path / ".claude" / "settings.json" + sp.parent.mkdir(parents=True) + sp.write_text(json.dumps({}), encoding="utf-8") + assert workspace_has_agnes_hooks(tmp_path) is False + + +def test_workspace_has_agnes_hooks_false_for_invalid_json(tmp_path): + sp = tmp_path / ".claude" / "settings.json" + sp.parent.mkdir(parents=True) + sp.write_text("not json", encoding="utf-8") + assert workspace_has_agnes_hooks(tmp_path) is False + + +def test_workspace_has_agnes_hooks_false_for_third_party_only_hook(tmp_path): + """A settings.json with only third-party hooks (no agnes marker) is + NOT an Agnes workspace — important so `agnes self-upgrade` from such + a dir doesn't auto-install Agnes hooks behind the user's back.""" + sp = tmp_path / ".claude" / "settings.json" + sp.parent.mkdir(parents=True) + sp.write_text(json.dumps({ + "hooks": { + "SessionStart": [ + {"hooks": [{"type": "command", "command": "echo hello"}]}, + ], + } + }), encoding="utf-8") + assert workspace_has_agnes_hooks(tmp_path) is False + + +def test_workspace_has_agnes_hooks_true_for_agnes_hook(tmp_path): + install_claude_hooks(tmp_path) + assert workspace_has_agnes_hooks(tmp_path) is True + + +def test_workspace_has_agnes_hooks_true_for_just_statusline(tmp_path): + """statusLine alone (no hook entries) still signals an Agnes workspace.""" + sp = tmp_path / ".claude" / "settings.json" + sp.parent.mkdir(parents=True) + sp.write_text(json.dumps({ + "statusLine": {"type": "command", "command": "agnes statusline"}, + }), encoding="utf-8") + assert workspace_has_agnes_hooks(tmp_path) is True + + +def test_maybe_refresh_noop_in_non_agnes_directory(tmp_path): + """Critical safety: `agnes self-upgrade` invoked from a non-Agnes dir + (e.g. ~/) must NOT create `.claude/settings.json` there.""" + refreshed = maybe_refresh_claude_hooks(tmp_path) + assert refreshed is False + assert not (tmp_path / ".claude").exists() + + +def test_maybe_refresh_writes_new_layout_into_v048_workspace(tmp_path): + """Simulate a v0.48 workspace (pre-PR-242 hook layout — chained + self-upgrade+pull and old refresh-marketplace --quiet, no + capture-session, no statusLine) and assert that + maybe_refresh_claude_hooks brings it up to the current layout.""" + sp = tmp_path / ".claude" / "settings.json" + sp.parent.mkdir(parents=True) + sp.write_text(json.dumps({ + "hooks": { + "SessionStart": [ + {"hooks": [{"type": "command", + "command": "agnes self-upgrade --quiet 2>/dev/null || true; " + "agnes pull --quiet 2>/dev/null || true"}]}, + {"hooks": [{"type": "command", + "command": 'bash -c "agnes refresh-marketplace --quiet 2>/dev/null || true"'}]}, + ], + "SessionEnd": [ + {"hooks": [{"type": "command", + "command": "agnes push --quiet 2>/dev/null || true"}]}, + ], + } + }), encoding="utf-8") + + refreshed = maybe_refresh_claude_hooks(tmp_path) + assert refreshed is True + + cfg = _read_settings(tmp_path) + starts = _commands_for(cfg, "SessionStart") + # New layout: capture-session is now the very first SessionStart entry. + assert any("agnes capture-session" in c for c in starts), ( + f"v0.48→v0.49 migration must install capture-session; got: {starts!r}" + ) + # And refresh-marketplace must be on --check, not --quiet. + refresh = next((c for c in starts if "agnes refresh-marketplace" in c), None) + assert refresh is not None and "--check" in refresh, refresh + + ends = _commands_for(cfg, "SessionEnd") + # SessionEnd push is now detached via nohup (replaces the old + # synchronous form). + assert any("nohup" in c for c in ends), ( + f"v0.48→v0.49 migration must detach SessionEnd push; got: {ends!r}" + ) + # statusLine for the agnes-private indicator must also be installed. + assert cfg.get("statusLine", {}).get("command", "").startswith("agnes statusline") + + +def test_maybe_refresh_preserves_third_party_hooks(tmp_path): + """Refresh must keep third-party hooks intact (same guarantee as + install_claude_hooks — the migration path uses the same machinery).""" + sp = tmp_path / ".claude" / "settings.json" + sp.parent.mkdir(parents=True) + sp.write_text(json.dumps({ + "hooks": { + "SessionStart": [ + {"hooks": [{"type": "command", "command": "agnes self-upgrade --quiet || true"}]}, + {"hooks": [{"type": "command", "command": "echo hi from another tool"}]}, + ], + } + }), encoding="utf-8") + refreshed = maybe_refresh_claude_hooks(tmp_path) + assert refreshed is True + cfg = _read_settings(tmp_path) + starts = _commands_for(cfg, "SessionStart") + assert "echo hi from another tool" in starts, ( + f"Third-party hook must survive refresh; got: {starts!r}" + ) diff --git a/tests/test_mark_private.py b/tests/test_mark_private.py new file mode 100644 index 0000000..dbeb27c --- /dev/null +++ b/tests/test_mark_private.py @@ -0,0 +1,47 @@ +"""Tests for `agnes mark-private` — slash-command-driven private flag.""" + +from typer.testing import CliRunner + +from cli.commands.mark_private import mark_private_app +from cli.lib.private_list import is_private + +runner = CliRunner() + + +def test_mark_private_requires_session_id_env(tmp_path, monkeypatch): + """Without CLAUDE_CODE_SESSION_ID set (= ran outside a Claude session), + the command must error out with exit 1, NOT silently no-op.""" + monkeypatch.setenv("AGNES_LOCAL_DIR", str(tmp_path)) + monkeypatch.delenv("CLAUDE_CODE_SESSION_ID", raising=False) + + result = runner.invoke(mark_private_app, []) + assert result.exit_code == 1 + assert "CLAUDE_CODE_SESSION_ID" in result.output + + +def test_mark_private_writes_to_private_list(tmp_path, monkeypatch): + monkeypatch.setenv("AGNES_LOCAL_DIR", str(tmp_path)) + monkeypatch.setenv("CLAUDE_CODE_SESSION_ID", "abc-123") + + result = runner.invoke(mark_private_app, []) + assert result.exit_code == 0 + assert is_private(tmp_path, "abc-123") + assert "abc-123" in result.output + + +def test_mark_private_is_idempotent(tmp_path, monkeypatch): + monkeypatch.setenv("AGNES_LOCAL_DIR", str(tmp_path)) + monkeypatch.setenv("CLAUDE_CODE_SESSION_ID", "abc-123") + + runner.invoke(mark_private_app, []) + result = runner.invoke(mark_private_app, []) + assert result.exit_code == 0 + assert "already marked" in result.output.lower() + + +def test_mark_private_blank_session_id_treated_as_missing(tmp_path, monkeypatch): + """Whitespace-only env var is the same as unset.""" + monkeypatch.setenv("AGNES_LOCAL_DIR", str(tmp_path)) + monkeypatch.setenv("CLAUDE_CODE_SESSION_ID", " ") + result = runner.invoke(mark_private_app, []) + assert result.exit_code == 1 diff --git a/tests/test_private_list.py b/tests/test_private_list.py new file mode 100644 index 0000000..12e6a72 --- /dev/null +++ b/tests/test_private_list.py @@ -0,0 +1,86 @@ +"""Tests for cli.lib.private_list — authoritative "do not upload" state.""" + +from cli.lib.private_list import ( + add_private, + is_private, + private_list_path, + read_all_private, +) + + +def test_is_private_false_when_file_missing(tmp_path): + assert is_private(tmp_path, "abc") is False + + +def test_is_private_false_for_empty_session_id(tmp_path): + add_private(tmp_path, "abc") + assert is_private(tmp_path, "") is False + + +def test_add_private_creates_claude_dir_and_writes(tmp_path): + assert add_private(tmp_path, "abc") is True + assert private_list_path(tmp_path).exists() + assert is_private(tmp_path, "abc") is True + + +def test_add_private_is_idempotent(tmp_path): + assert add_private(tmp_path, "abc") is True + assert add_private(tmp_path, "abc") is False # second call: already present + # File has exactly one line + contents = private_list_path(tmp_path).read_text(encoding="utf-8") + assert contents == "abc\n" + + +def test_add_private_empty_id_noop(tmp_path): + assert add_private(tmp_path, "") is False + assert not private_list_path(tmp_path).exists() + + +def test_read_all_private_returns_set(tmp_path): + add_private(tmp_path, "abc") + add_private(tmp_path, "def") + assert read_all_private(tmp_path) == {"abc", "def"} + + +def test_read_all_private_skips_blank_lines(tmp_path): + # writable=True so the parent .claude/ exists for the manual write below. + path = private_list_path(tmp_path, writable=True) + path.write_text("abc\n\n \ndef\n", encoding="utf-8") + assert read_all_private(tmp_path) == {"abc", "def"} + + +def test_read_all_private_empty_when_file_missing(tmp_path): + assert read_all_private(tmp_path) == set() + + +def test_read_only_path_does_not_create_claude_dir(tmp_path): + """Hot-path reader (statusline → is_private → private_list_path) + must NOT materialize ``.claude/`` in arbitrary directories. S2.7 from + the PR review.""" + _ = private_list_path(tmp_path) # default writable=False + assert not (tmp_path / ".claude").exists() + # Read-side helpers also stay side-effect-free. + _ = read_all_private(tmp_path) + assert is_private(tmp_path, "missing") is False + assert not (tmp_path / ".claude").exists() + + +def test_read_cache_returns_same_set_across_calls(tmp_path): + """mtime-keyed cache: repeated reads of an unchanged file return the + same set (same identity is fine; same value is the contract).""" + add_private(tmp_path, "abc") + a = read_all_private(tmp_path) + b = read_all_private(tmp_path) + assert a == b == {"abc"} + + +def test_add_private_evicts_cache_so_subsequent_read_sees_new_id(tmp_path): + """Within a sub-second window after add, is_private must reflect + the new entry — even when filesystem mtime granularity is 1s and + a naive mtime check would still treat the cache as fresh.""" + add_private(tmp_path, "first") + assert read_all_private(tmp_path) == {"first"} + # Second add within the same second; cache eviction makes the next + # read see both entries even though mtime may be identical. + add_private(tmp_path, "second") + assert read_all_private(tmp_path) == {"first", "second"} diff --git a/tests/test_self_upgrade.py b/tests/test_self_upgrade.py index 8984a95..01be539 100644 --- a/tests/test_self_upgrade.py +++ b/tests/test_self_upgrade.py @@ -369,3 +369,83 @@ def test_smoke_test_passes_with_pep440_local_version(): ok, detail = su._smoke_test_new_binary("uv", expected_version="0.40.0") assert ok is False assert "version mismatch" in detail + + +# --------------------------------------------------------------------------- +# Workspace hook auto-refresh (PR #242 — ZdenekSrotyr #2 silent-stop fix) +# --------------------------------------------------------------------------- + + +def test_hook_refresh_fires_when_cli_already_current(monkeypatch): + """The info-is-None fast path must still refresh hooks. Covers the + v0.48→v0.49 migration moment when the operator already self-upgraded + the CLI (so the second self-upgrade call from a SessionStart hook + finds nothing to install), but their workspace settings.json was + written by the older CLI version and lacks the new capture-session + hook entry.""" + monkeypatch.setenv("AGNES_LOCAL_DIR", "/fake/workspace") + with patch("cli.commands.self_upgrade.check", return_value=_current_info()), \ + patch("cli.commands.self_upgrade.maybe_refresh_claude_hooks") as mock_refresh: + result = runner.invoke(app, ["self-upgrade"]) + assert result.exit_code == 0 + mock_refresh.assert_called_once() + + +def test_hook_refresh_fires_after_successful_install(monkeypatch): + """The install-success path must refresh hooks AFTER the new wheel is + in place — so any wire-format change in the new release lands on the + next session-start without re-running `agnes init`.""" + monkeypatch.setenv("AGNES_LOCAL_DIR", "/fake/workspace") + with patch("cli.commands.self_upgrade.check", return_value=_outdated_info()), \ + patch("cli.commands.self_upgrade.shutil.which", return_value="/usr/local/bin/uv"), \ + patch("cli.commands.self_upgrade.subprocess.run") as mock_run, \ + patch("cli.commands.self_upgrade._smoke_test_new_binary", return_value=_smoke_pass()), \ + patch("cli.commands.self_upgrade._read_last_known_good", return_value=None), \ + patch("cli.commands.self_upgrade._record_last_known_good"), \ + patch("cli.commands.self_upgrade._invalidate_update_cache"), \ + patch("cli.commands.self_upgrade.maybe_refresh_claude_hooks") as mock_refresh: + mock_run.return_value = MagicMock(returncode=0) + result = runner.invoke(app, ["self-upgrade"]) + assert result.exit_code == 0 + mock_refresh.assert_called_once() + + +def test_hook_refresh_skipped_on_install_failure(monkeypatch): + """Failed install: do NOT refresh hooks — the rollback has already + run and the workspace is in a known-prior state; rewriting hooks now + could pin a layout that doesn't match the rolled-back binary.""" + monkeypatch.setenv("AGNES_LOCAL_DIR", "/fake/workspace") + with patch("cli.commands.self_upgrade.check", return_value=_outdated_info()), \ + patch("cli.commands.self_upgrade.shutil.which", return_value="/usr/local/bin/uv"), \ + patch("cli.commands.self_upgrade.subprocess.run") as mock_run, \ + patch("cli.commands.self_upgrade._smoke_test_new_binary", return_value=_smoke_fail()), \ + patch("cli.commands.self_upgrade._read_last_known_good", return_value=_PRIOR_URL), \ + patch("cli.commands.self_upgrade._record_last_known_good"), \ + patch("cli.commands.self_upgrade._invalidate_update_cache"), \ + patch("cli.commands.self_upgrade.maybe_refresh_claude_hooks") as mock_refresh: + mock_run.return_value = MagicMock(returncode=0) # install rc=0 but smoke failed + result = runner.invoke(app, ["self-upgrade"]) + assert result.exit_code == 1 + mock_refresh.assert_not_called() + + +def test_hook_refresh_skipped_when_check_only(monkeypatch): + """--check-only is read-only intent; never touch the workspace.""" + monkeypatch.setenv("AGNES_LOCAL_DIR", "/fake/workspace") + with patch("cli.commands.self_upgrade.check", return_value=_outdated_info()), \ + patch("cli.commands.self_upgrade.maybe_refresh_claude_hooks") as mock_refresh: + result = runner.invoke(app, ["self-upgrade", "--check-only"]) + # exit 1 because outdated — see test_check_only_when_outdated_exits_1 + assert result.exit_code == 1 + mock_refresh.assert_not_called() + + +def test_hook_refresh_failure_does_not_flip_exit_code(monkeypatch): + """An exception inside maybe_refresh_claude_hooks must NOT turn a + successful upgrade into rc=1. The refresh is best-effort.""" + monkeypatch.setenv("AGNES_LOCAL_DIR", "/fake/workspace") + with patch("cli.commands.self_upgrade.check", return_value=_current_info()), \ + patch("cli.commands.self_upgrade.maybe_refresh_claude_hooks", + side_effect=PermissionError("settings.json read-only")): + result = runner.invoke(app, ["self-upgrade"]) + assert result.exit_code == 0 diff --git a/tests/test_session_queue.py b/tests/test_session_queue.py new file mode 100644 index 0000000..1cd2674 --- /dev/null +++ b/tests/test_session_queue.py @@ -0,0 +1,258 @@ +"""Tests for cli.lib.session_queue — queue and uploaded-log helpers.""" + +from datetime import datetime, timezone +from pathlib import Path + +from cli.lib.session_queue import ( + append_to_queue, + discard_snapshot, + failed_log_path, + find_recovery_snapshots, + mark_failed_permanent, + mark_private_skipped, + mark_uploaded, + private_skipped_log_path, + queue_path, + read_entries_from_snapshot, + read_paths_from_snapshot, + requeue_failed, + snapshot_queue, + uploaded_log_path, +) + + +def test_append_creates_claude_dir_and_appends(tmp_path): + append_to_queue(tmp_path, "sid-1", "/abc/123.jsonl") + queue = queue_path(tmp_path) + assert queue.exists() + assert queue.read_text(encoding="utf-8") == "sid-1\t/abc/123.jsonl\n" + + append_to_queue(tmp_path, "sid-2", "/def/456.jsonl") + assert queue.read_text(encoding="utf-8") == ( + "sid-1\t/abc/123.jsonl\n" + "sid-2\t/def/456.jsonl\n" + ) + + +def test_append_strips_trailing_newline_from_input(tmp_path): + """Defensive: appender adds exactly one '\\n' even if caller passes one.""" + append_to_queue(tmp_path, "sid-1", "/abc.jsonl\n") + assert queue_path(tmp_path).read_text(encoding="utf-8") == "sid-1\t/abc.jsonl\n" + + +def test_append_empty_session_id_writes_leading_tab(tmp_path): + """Empty session_id is legal — line starts with the tab separator. + Used by code paths where session_id isn't yet known (none today, but + keeps the door open for backfill tooling).""" + append_to_queue(tmp_path, "", "/abc.jsonl") + assert queue_path(tmp_path).read_text(encoding="utf-8") == "\t/abc.jsonl\n" + + +def test_snapshot_returns_none_when_no_queue(tmp_path): + assert snapshot_queue(tmp_path) is None + + +def test_snapshot_renames_queue_atomically(tmp_path): + append_to_queue(tmp_path, "sid-1", "/abc.jsonl") + snap = snapshot_queue(tmp_path) + assert snap is not None + assert snap.exists() + assert not queue_path(tmp_path).exists() + assert snap.read_text(encoding="utf-8") == "sid-1\t/abc.jsonl\n" + + +def test_snapshot_filename_carries_pid(tmp_path): + import os + append_to_queue(tmp_path, "sid", "/x.jsonl") + snap = snapshot_queue(tmp_path) + assert snap is not None + assert str(os.getpid()) in snap.name + + +def test_snapshot_filename_is_unique_per_call(tmp_path): + """Two consecutive snapshots under the same PID must not collide. + Guards against the data-loss scenario where a crashed push left a + snapshot on disk and the OS later reused its PID for a new push: + without the uuid suffix, os.rename would atomically overwrite the + recovery snapshot, silently losing its entries.""" + append_to_queue(tmp_path, "sid-1", "/a.jsonl") + snap1 = snapshot_queue(tmp_path) + append_to_queue(tmp_path, "sid-2", "/b.jsonl") + snap2 = snapshot_queue(tmp_path) + assert snap1 is not None and snap2 is not None + assert snap1 != snap2 + assert snap1.exists() and snap2.exists() + + +def test_read_entries_dedups_and_preserves_order(tmp_path): + append_to_queue(tmp_path, "sid-1", "/abc.jsonl") + append_to_queue(tmp_path, "sid-2", "/def.jsonl") + append_to_queue(tmp_path, "sid-1", "/abc.jsonl") # exact duplicate + snap = snapshot_queue(tmp_path) + assert snap is not None + entries = read_entries_from_snapshot(snap) + assert entries == [("sid-1", Path("/abc.jsonl")), ("sid-2", Path("/def.jsonl"))] + + +def test_read_entries_different_session_id_same_path_kept(tmp_path): + """Same path with different session IDs are distinct entries — they + represent two sessions that wrote to the same transcript file (rare + but possible if Claude Code renames sessions).""" + append_to_queue(tmp_path, "sid-1", "/abc.jsonl") + append_to_queue(tmp_path, "sid-2", "/abc.jsonl") + snap = snapshot_queue(tmp_path) + entries = read_entries_from_snapshot(snap) + assert len(entries) == 2 + + +def test_read_entries_accepts_legacy_one_column_lines(tmp_path): + """Forward compat: pre-feature workspaces had bare-path lines. Those + still upload, just with empty session_id (can't be marked private + retroactively, which is fine — they pre-date the feature).""" + queue = queue_path(tmp_path) + queue.write_text("/legacy.jsonl\nsid-1\t/new.jsonl\n", encoding="utf-8") + entries = read_entries_from_snapshot(queue) + assert entries == [("", Path("/legacy.jsonl")), ("sid-1", Path("/new.jsonl"))] + + +def test_read_entries_skips_blank_lines(tmp_path): + queue = queue_path(tmp_path) + queue.write_text("sid-1\t/abc.jsonl\n\n \nsid-2\t/def.jsonl\n", encoding="utf-8") + entries = read_entries_from_snapshot(queue) + assert entries == [("sid-1", Path("/abc.jsonl")), ("sid-2", Path("/def.jsonl"))] + + +def test_read_entries_returns_empty_for_missing_file(tmp_path): + assert read_entries_from_snapshot(tmp_path / "does-not-exist.txt") == [] + + +def test_read_paths_compat_wrapper(tmp_path): + """read_paths_from_snapshot returns list[Path] only — used by code + paths (dry-run preview) that don't need session_id.""" + append_to_queue(tmp_path, "sid-1", "/abc.jsonl") + append_to_queue(tmp_path, "sid-2", "/def.jsonl") + snap = snapshot_queue(tmp_path) + paths = read_paths_from_snapshot(snap) + assert paths == [Path("/abc.jsonl"), Path("/def.jsonl")] + + +def test_mark_uploaded_appends_tsv(tmp_path): + when = datetime(2026, 5, 10, 14, 32, 18, tzinfo=timezone.utc) + p1 = tmp_path / "abc.jsonl" + mark_uploaded(tmp_path, p1, when=when) + log = uploaded_log_path(tmp_path) + assert log.read_text(encoding="utf-8") == f"2026-05-10T14:32:18Z\t{p1}\n" + + when2 = datetime(2026, 5, 10, 14, 32, 19, tzinfo=timezone.utc) + p2 = tmp_path / "def.jsonl" + mark_uploaded(tmp_path, p2, when=when2) + assert log.read_text(encoding="utf-8") == ( + f"2026-05-10T14:32:18Z\t{p1}\n" + f"2026-05-10T14:32:19Z\t{p2}\n" + ) + + +def test_mark_uploaded_default_timestamp_is_utc(tmp_path): + p = tmp_path / "x.jsonl" + mark_uploaded(tmp_path, p) + line = uploaded_log_path(tmp_path).read_text(encoding="utf-8") + assert line.endswith(f"\t{p}\n") + assert line.startswith("20") + assert line.split("\t")[0].endswith("Z") + + +def test_mark_private_skipped_appends_audit_log(tmp_path): + when = datetime(2026, 5, 10, 14, 32, 18, tzinfo=timezone.utc) + p = tmp_path / "abc.jsonl" + mark_private_skipped(tmp_path, "sid-1", p, when=when) + log = private_skipped_log_path(tmp_path) + assert log.read_text(encoding="utf-8") == f"2026-05-10T14:32:18Z\tsid-1\t{p}\n" + + +def test_mark_failed_permanent_appends_tsv(tmp_path): + when = datetime(2026, 5, 10, 14, 32, 18, tzinfo=timezone.utc) + p = tmp_path / "abc.jsonl" + mark_failed_permanent(tmp_path, "sid-1", p, 401, when=when) + log = failed_log_path(tmp_path) + assert log.read_text(encoding="utf-8") == f"2026-05-10T14:32:18Z\tsid-1\t401\t{p}\n" + + when2 = datetime(2026, 5, 10, 14, 32, 19, tzinfo=timezone.utc) + p2 = tmp_path / "def.jsonl" + mark_failed_permanent(tmp_path, "sid-2", p2, 413, when=when2) + assert log.read_text(encoding="utf-8") == ( + f"2026-05-10T14:32:18Z\tsid-1\t401\t{p}\n" + f"2026-05-10T14:32:19Z\tsid-2\t413\t{p2}\n" + ) + + +def test_requeue_failed_appends_to_live_queue(tmp_path): + fresh = tmp_path / "fresh.jsonl" + failed = tmp_path / "failed.jsonl" + append_to_queue(tmp_path, "sid-fresh", str(fresh)) + requeue_failed(tmp_path, [("sid-failed", failed)]) + assert queue_path(tmp_path).read_text(encoding="utf-8") == ( + f"sid-fresh\t{fresh}\n" + f"sid-failed\t{failed}\n" + ) + + +def test_requeue_empty_is_noop(tmp_path): + requeue_failed(tmp_path, []) + assert not queue_path(tmp_path).exists() + + +def test_find_recovery_snapshots_returns_existing(tmp_path): + claude = tmp_path / ".claude" + claude.mkdir() + (claude / "agnes-sessions.snapshot.111.txt").write_text("sid-a\t/a.jsonl\n") + (claude / "agnes-sessions.snapshot.222.txt").write_text("sid-b\t/b.jsonl\n") + snaps = find_recovery_snapshots(tmp_path) + assert len(snaps) == 2 + assert all(s.name.startswith("agnes-sessions.snapshot.") for s in snaps) + + +def test_find_recovery_snapshots_empty_when_none(tmp_path): + assert find_recovery_snapshots(tmp_path) == [] + + +def test_discard_snapshot_idempotent(tmp_path): + append_to_queue(tmp_path, "sid", "/a.jsonl") + snap = snapshot_queue(tmp_path) + assert snap is not None + discard_snapshot(snap) + assert not snap.exists() + discard_snapshot(snap) # second call must not raise + + +def test_append_concurrent_threads_no_corruption(tmp_path): + """Concurrent appends from multiple threads must not interleave bytes + mid-line. Guards against the Windows non-atomic open(path, 'a') + regression that the queue lock was added to prevent.""" + import threading + + per_worker = 50 + n_workers = 4 + + def worker(start: int) -> None: + for i in range(per_worker): + append_to_queue(tmp_path, f"sid-{start}-{i}", f"/p/{start}-{i}.jsonl") + + threads = [threading.Thread(target=worker, args=(s,)) for s in range(n_workers)] + for t in threads: + t.start() + for t in threads: + t.join() + + text = queue_path(tmp_path).read_text(encoding="utf-8") + lines = text.splitlines() + assert len(lines) == n_workers * per_worker + # Every line must be well-formed: exactly one tab separator, non-empty + # sid + non-empty path. Interleaved bytes would produce malformed lines. + seen: set[str] = set() + for line in lines: + assert line.count("\t") == 1, f"corrupted line: {line!r}" + sid, _, path = line.partition("\t") + assert sid.startswith("sid-") + assert path.startswith("/p/") + seen.add(line) + assert len(seen) == n_workers * per_worker # no dropped writes diff --git a/tests/test_setup_instructions.py b/tests/test_setup_instructions.py index ff2fb38..670f041 100644 --- a/tests/test_setup_instructions.py +++ b/tests/test_setup_instructions.py @@ -48,13 +48,15 @@ def test_render_setup_instructions_wires_all_placeholders(): def test_resolve_lines_no_plugins_unified_layout(): - """Unified always-on layout (Fix B + Fix C in 2026-05-10 init-report - response): 1 install, 2 init, 3 catalog, 4 preflight, 5 marketplace, - 6 mcp_servers, 7 diagnose, 8 skills, 9 confirm. Preflight + - marketplace + MCP block are emitted even when the operator has zero - plugin grants — registering the per-user marketplace clone pre-wires - the SessionStart hook, and the Atlassian Remote MCP applies to every - analyst whose work touches Jira/Confluence.""" + """Unified always-on layout: 1 install, 2 init, 3 catalog, 4 preflight, + 5 marketplace, 6 mcp_servers, 7 diagnose, 8 confirm. Preflight + + marketplace + MCP block are emitted even when the operator's served + stack is empty — registering the per-user marketplace clone pre-wires + Claude Code for future stack changes (admin grants, system pins, + Flea installs), and the Atlassian Remote MCP applies to every analyst + whose work touches Jira/Confluence. Skills step deleted — the + interactive copy-or-on-demand question was confusing and the + on-demand path is the one-size-fits-all default.""" from app.web.setup_instructions import resolve_lines joined = "\n".join(resolve_lines("agnes.whl")) @@ -66,14 +68,19 @@ def test_resolve_lines_no_plugins_unified_layout(): assert "5) Register the Agnes Claude Code marketplace" in joined assert "6) Register the Atlassian MCP server" in joined assert "7) Run diagnostics:" in joined - assert "8) Skills" in joined - assert "9) Confirm:" in joined + assert "8) Confirm:" in joined # No stray Confirms at other positions. + assert "9) Confirm:" not in joined assert "10) Confirm:" not in joined assert "6) Confirm:" not in joined - # The marketplace step header adapts to "no plugins granted yet" copy + # No skills step in any form. + assert "Skills (ask the user" not in joined + assert "Skills" not in joined or "agnes skills" in joined # comment refs still OK + assert "8) Skills" not in joined + assert "~/.claude/skills/agnes/" not in joined + # The marketplace step header adapts to the empty-stack copy # rather than the plugin-installing variant. - assert "no plugins granted yet" in joined + assert "your stack is empty for now" in joined assert "agnes refresh-marketplace --bootstrap" in joined # MCP step uses SSE transport for Atlassian's hosted Remote MCP. assert "claude mcp add --transport sse atlassian https://mcp.atlassian.com/v1/sse" in joined @@ -250,9 +257,10 @@ def test_trust_block_step_0c_does_not_reference_stale_step_number(): def test_resolve_lines_with_plugins_uses_install_first_diagnose_last_layout(): """Marketplace layout puts install/init/catalog/preflight/marketplace - BEFORE diagnose and skills, so the human-loop skills question is the - final blocking step before Confirm. Step numbers: 4 preflight, - 5 marketplace, 6 diagnose, 7 skills, 8 confirm.""" + BEFORE diagnose, so diagnose is the final smoke test before Confirm. + Step numbers: 4 preflight, 5 marketplace, 6 mcp, 7 diagnose, + 8 confirm. No skills step — interactive copy-or-on-demand question + was confusing; on-demand `agnes skills show` is the default.""" from app.web.setup_instructions import resolve_lines lines = resolve_lines( @@ -268,13 +276,13 @@ def test_resolve_lines_with_plugins_uses_install_first_diagnose_last_layout(): assert "brew install git" in joined assert "winget install --id Git.Git -e --source winget --silent" in joined assert "sudo apt-get install git" in joined or "sudo dnf install git" in joined - # Step 5 — marketplace + plugins. Collapsed to a single CLI call: + # Step 5 — marketplace + stack install. Collapsed to a single CLI call: # `agnes refresh-marketplace --bootstrap` does clone + PAT-strip + # chmod + register-with-Claude + auto-install-from-manifest internally. # Pulling that out of the inline shell script avoided Claude Code's # agent-driven `rm -rf` permission gate that the old multi-line # sequence tripped on. - assert "5) Register the Agnes Claude Code marketplace and install plugins" in joined + assert "5) Register the Agnes Claude Code marketplace and install your current stack" in joined assert "agnes refresh-marketplace --bootstrap" in joined # The destructive prep + per-plugin install commands are now inside # the CLI; the prompt must not emit the inline shell forms in @@ -287,15 +295,17 @@ def test_resolve_lines_with_plugins_uses_install_first_diagnose_last_layout(): assert "claude plugin marketplace add" not in executable assert "claude plugin install foo@agnes" not in executable assert "claude plugin install bar@agnes" not in executable - # Step 6 — Atlassian MCP registration (Fix C in 2026-05-10 init-report response). + # Step 6 — Atlassian MCP registration. assert "6) Register the Atlassian MCP server" in joined # Step 7 — diagnose now AFTER marketplace + MCP wiring. assert "7) Run diagnostics:" in joined - # Step 8 — skills, the last interactive step before Confirm. - assert "8) Skills" in joined - # Step 9 — Confirm renumbered (no stray Confirms at other positions). - assert "9) Confirm:" in joined - for stray in ("4) Confirm:", "5) Confirm:", "6) Confirm:", "7) Confirm:", "8) Confirm:"): + # Step 8 — Confirm. + assert "8) Confirm:" in joined + # No skills step in any form. + assert "Skills (ask the user" not in joined + assert "8) Skills" not in joined + assert "~/.claude/skills/agnes/" not in joined + for stray in ("4) Confirm:", "5) Confirm:", "6) Confirm:", "7) Confirm:", "9) Confirm:"): assert stray not in joined # Crucial ordering invariants for the new layout. install_idx = joined.index("1) Install the CLI") @@ -305,9 +315,8 @@ def test_resolve_lines_with_plugins_uses_install_first_diagnose_last_layout(): market_idx = joined.index("5) Register the Agnes Claude Code marketplace") mcp_idx = joined.index("6) Register the Atlassian MCP server") diag_idx = joined.index("7) Run diagnostics:") - skills_idx = joined.index("8) Skills") - confirm_idx = joined.index("9) Confirm:") - assert install_idx < init_idx < catalog_idx < git_idx < market_idx < mcp_idx < diag_idx < skills_idx < confirm_idx + confirm_idx = joined.index("8) Confirm:") + assert install_idx < init_idx < catalog_idx < git_idx < market_idx < mcp_idx < diag_idx < confirm_idx # Legacy `git config sslVerify=false` downgrade is gone — see CHANGELOG. assert "git config --global" not in joined # server_host is server-side substituted; the placeholder must be gone. @@ -631,14 +640,13 @@ def test_resolve_lines_ca_pem_empty_string_is_treated_as_absent(): def test_resolve_lines_ca_pem_works_without_plugins(): """Trust block is independent of the marketplace + MCP blocks — emit - step 0 even when plugin list is empty. Confirm step is at 9 in the - always-on layout (Fix B + Fix C in 2026-05-10 init-report response). - Step 0 is preamble, not numbered.""" + step 0 even when plugin list is empty. Confirm step is at 8 in the + post-skills-removal layout. Step 0 is preamble, not numbered.""" from app.web.setup_instructions import resolve_lines joined = "\n".join(resolve_lines("agnes.whl", ca_pem=_FAKE_CA_PEM)) assert "0) Trust the Agnes TLS certificate" in joined - assert "9) Confirm:" in joined + assert "8) Confirm:" in joined # Marketplace block is now emitted unconditionally; the bootstrap # one-liner does the `claude plugin marketplace add` internally so # the literal string isn't in the prompt text — the user-facing @@ -683,51 +691,43 @@ def test_diagnose_step_documents_normal_states(): assert "NORMAL" in joined or "normal" in joined -def test_skills_step_is_last_blocking_step_before_confirm(): - """In the new layout, skills is the LAST interactive step before Confirm - (it used to come right after diagnose and before git/marketplace, which - invited the assistant to "do the rest in parallel"). We've moved the - install work earlier, so the skills question is now a single clear gate - — there's nothing left to do in parallel and the assistant must wait - for the user's answer. +def test_no_skills_step_emitted(): + """Skills step was removed: the interactive copy-or-on-demand question + was confusing for new users (named opinion call with no obvious right + answer after a wall of technical steps). On-demand lookup via + `agnes skills show ` is the one-size-fits-all default; CLAUDE.md + references specific skills (e.g. agnes-data-querying) when relevant. - Assert two things: - (a) The prompt explicitly tells the assistant to wait for the answer. - (b) The skills step appears AFTER the marketplace step in the rendered - line order — i.e., the legacy "skills before marketplace" flow - isn't accidentally back.""" + Regression guard: the rendered prompt must not contain a numbered + Skills step or the bulk-copy shell loop into ~/.claude/skills/agnes/. + """ from app.web.setup_instructions import resolve_lines - joined = "\n".join(resolve_lines("agnes.whl", plugin_install_names=["foo"], server_host="h")) - flattened = " ".join(joined.split()) - - # (a) The prompt must instruct the assistant to wait — and must NOT - # contain the obsolete "you can continue in parallel" hint. - assert "Wait for the user's answer" in joined - assert "don't depend on the answer" not in flattened - assert "do not depend on the answer" not in flattened - - # (b) Skills comes after marketplace in the rendered line order. - market_idx = joined.index("Register the Agnes Claude Code marketplace") - skills_idx = joined.index("Skills (ask the user") - assert market_idx < skills_idx + for kwargs in ( + {}, + {"plugin_install_names": ["foo"], "server_host": "h"}, + ): + joined = "\n".join(resolve_lines("agnes.whl", **kwargs)) + assert "Skills (ask the user" not in joined + assert "8) Skills" not in joined + assert "9) Skills" not in joined + assert "~/.claude/skills/agnes/" not in joined + assert "for s in $(agnes skills list" not in joined + assert "Wait for the user's answer" not in joined -def test_no_plugins_layout_keeps_diagnose_before_skills(): - """Always-on layout (Fix B + Fix C in 2026-05-10 init-report response): +def test_no_plugins_layout_diagnose_before_confirm(): + """Always-on layout (post-skills-removal): install → init → catalog → preflight → marketplace → mcp_servers → - diagnose → skills → confirm, regardless of plugin grants. Step - numbers: 7 diagnose, 8 skills, 9 confirm.""" + diagnose → confirm. Step numbers: 7 diagnose, 8 confirm.""" from app.web.setup_instructions import resolve_lines joined = "\n".join(resolve_lines("agnes.whl")) assert "7) Run diagnostics:" in joined - assert "8) Skills" in joined - assert "9) Confirm:" in joined + assert "8) Confirm:" in joined diag_idx = joined.index("7) Run diagnostics:") - skills_idx = joined.index("8) Skills") - confirm_idx = joined.index("9) Confirm:") - assert diag_idx < skills_idx < confirm_idx + confirm_idx = joined.index("8) Confirm:") + assert diag_idx < confirm_idx def test_unified_flow_uses_only_agnes_verbs(): diff --git a/tests/test_setup_page_unified.py b/tests/test_setup_page_unified.py index a6f861d..88e6305 100644 --- a/tests/test_setup_page_unified.py +++ b/tests/test_setup_page_unified.py @@ -37,10 +37,10 @@ def test_setup_page_renders_unified_layout(client): - `agnes init` is mandatory (subsumes the old admin-only `agnes auth import-token` + `agnes auth whoami` pair). - - Marketplace block is always emitted (Fix B in 2026-05-10 - init-report response): anonymous visitors with no plugin grants - still get the marketplace registration step so the SessionStart - hook is pre-wired. Confirm = step 8. + - Marketplace block is always emitted: anonymous visitors with no + plugin grants still get the marketplace registration step so + future stack changes land cleanly. Confirm = step 8 in the + post-skills-removal layout. """ resp = client.get("/setup", follow_redirects=True) assert resp.status_code == 200 @@ -49,9 +49,9 @@ def test_setup_page_renders_unified_layout(client): assert "agnes init" in text # Legacy admin-only login verbs are gone from the rendered prompt. assert "agnes auth import-token" not in text - # Always-on layout (preflight + marketplace + MCP block all unconditional): - # Confirm = step 9. - assert "9) Confirm:" in text + # Always-on layout (preflight + marketplace + MCP all unconditional, + # skills removed): Confirm = step 8. + assert "8) Confirm:" in text def test_setup_page_ignores_role_query_param(client): @@ -72,14 +72,15 @@ def test_setup_page_ignores_role_query_param(client): def test_setup_page_renders_marketplace_for_user_with_grants(client, monkeypatch): - """When the caller has plugin grants in `resource_grants`, the - unified flow inserts the marketplace + plugins block (step 5) and - Confirm shifts to step 8. + """When the caller has a non-empty served stack, the marketplace block + renders the "install your current stack" copy variant. Confirm stays + at step 8 in the post-skills-removal layout (preflight + marketplace + + MCP all always-on regardless of stack contents). Stub `marketplace_filter.resolve_user_marketplace` to return a plugin so we don't have to seed the full marketplace plumbing in - this test — we're verifying the layout switch, not the RBAC - resolver itself (covered by `test_marketplace_filter`). + this test — we're verifying the layout, not the RBAC resolver + itself (covered by `test_marketplace_filter`). Post-Model B (v28+): the setup page reads from `resolve_user_marketplace` (which gates on explicit subscriptions) @@ -109,11 +110,11 @@ def test_setup_page_renders_marketplace_for_user_with_grants(client, monkeypatch # Marketplace block marker. The per-plugin install lines moved inside # `agnes refresh-marketplace --bootstrap`, so we check the section # header + the one-liner instead of `claude plugin install @agnes`. - assert "Register the Agnes Claude Code marketplace" in text + # Non-empty stack → "install your current stack" header variant. + assert "Register the Agnes Claude Code marketplace and install your current stack" in text assert "agnes refresh-marketplace --bootstrap" in text - # Layout shift: Confirm is now step 9 (preflight + marketplace + MCP all - # always-on per Fix B + Fix C in 2026-05-10 init-report response). - assert "9) Confirm:" in text + # Post-skills-removal layout: Confirm is step 8. + assert "8) Confirm:" in text # Pre-flight is in the rendered prompt at step 4. assert "Make sure git and claude are installed" in text # Atlassian MCP registration is at step 6. diff --git a/tests/test_statusline.py b/tests/test_statusline.py new file mode 100644 index 0000000..d48cd5e --- /dev/null +++ b/tests/test_statusline.py @@ -0,0 +1,55 @@ +"""Tests for `agnes statusline` — Claude Code statusLine helper.""" + +import json + +from typer.testing import CliRunner + +from cli.commands.statusline import statusline_app +from cli.lib.private_list import add_private + +runner = CliRunner() + + +def test_statusline_emits_private_indicator(tmp_path, monkeypatch): + monkeypatch.setenv("AGNES_LOCAL_DIR", str(tmp_path)) + add_private(tmp_path, "abc-123") + payload = json.dumps({"session_id": "abc-123"}) + result = runner.invoke(statusline_app, [], input=payload) + assert result.exit_code == 0 + assert "agnes-private" in result.output + + +def test_statusline_silent_for_non_private(tmp_path, monkeypatch): + monkeypatch.setenv("AGNES_LOCAL_DIR", str(tmp_path)) + payload = json.dumps({"session_id": "abc-123"}) + result = runner.invoke(statusline_app, [], input=payload) + assert result.exit_code == 0 + assert "agnes-private" not in result.output + + +def test_statusline_silent_on_malformed_stdin(tmp_path, monkeypatch): + monkeypatch.setenv("AGNES_LOCAL_DIR", str(tmp_path)) + result = runner.invoke(statusline_app, [], input="not json {") + assert result.exit_code == 0 + assert result.output.strip() == "" + + +def test_statusline_silent_on_missing_session_id(tmp_path, monkeypatch): + monkeypatch.setenv("AGNES_LOCAL_DIR", str(tmp_path)) + result = runner.invoke(statusline_app, [], input='{"foo": "bar"}') + assert result.exit_code == 0 + assert result.output.strip() == "" + + +def test_statusline_silent_on_empty_stdin(tmp_path, monkeypatch): + monkeypatch.setenv("AGNES_LOCAL_DIR", str(tmp_path)) + result = runner.invoke(statusline_app, [], input="") + assert result.exit_code == 0 + assert result.output.strip() == "" + + +def test_statusline_silent_when_payload_not_object(tmp_path, monkeypatch): + monkeypatch.setenv("AGNES_LOCAL_DIR", str(tmp_path)) + result = runner.invoke(statusline_app, [], input='["array"]') + assert result.exit_code == 0 + assert result.output.strip() == "" diff --git a/uv.lock b/uv.lock index 67e6cee..1025a2d 100644 --- a/uv.lock +++ b/uv.lock @@ -21,7 +21,7 @@ wheels = [ [[package]] name = "agnes-the-ai-analyst" -version = "0.47.4" +version = "0.48.0" source = { editable = "." } dependencies = [ { name = "a2wsgi" }, @@ -32,6 +32,7 @@ dependencies = [ { name = "duckdb" }, { name = "dulwich" }, { name = "fastapi" }, + { name = "filelock" }, { name = "google-api-python-client" }, { name = "google-cloud-bigquery" }, { name = "google-cloud-bigquery-storage" }, @@ -97,6 +98,7 @@ requires-dist = [ { name = "faker", marker = "extra == 'dev'", specifier = ">=24.0.0" }, { name = "fastapi", specifier = ">=0.115.0" }, { name = "fastapi-debug-toolbar", marker = "extra == 'dev'", specifier = ">=0.6.3" }, + { name = "filelock", specifier = ">=3.13,<4" }, { name = "google-api-python-client", specifier = ">=2.0.0" }, { name = "google-cloud-bigquery", specifier = ">=3.0.0" }, { name = "google-cloud-bigquery-storage", specifier = ">=2.0.0" }, @@ -712,6 +714,15 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/a7/09/b01c8c60d608e93faa257a9a3fc9a9739c70c9ac926edfbbe92baebbf33b/fastapi_debug_toolbar-0.6.3-py3-none-any.whl", hash = "sha256:077b5ffeb10426c49387ef090cf307a316d5c71c69bbffe270d9898d2554429e", size = 42904, upload-time = "2024-05-04T17:03:11.585Z" }, ] +[[package]] +name = "filelock" +version = "3.29.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/b5/fe/997687a931ab51049acce6fa1f23e8f01216374ea81374ddee763c493db5/filelock-3.29.0.tar.gz", hash = "sha256:69974355e960702e789734cb4871f884ea6fe50bd8404051a3530bc07809cf90", size = 57571, upload-time = "2026-04-19T15:39:10.068Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/81/47/dd9a212ef6e343a6857485ffe25bba537304f1913bdbed446a23f7f592e1/filelock-3.29.0-py3-none-any.whl", hash = "sha256:96f5f6344709aa1572bbf631c640e4ebeeb519e08da902c39a001882f30ac258", size = 39812, upload-time = "2026-04-19T15:39:08.752Z" }, +] + [[package]] name = "fonttools" version = "4.62.1"