* Capture session paths via SessionStart hook + lock parallel pushes

Replace the encoding-based scan of ~/.claude/projects/<encoded-cwd>/ with a queue file populated by a new `agnes capture-session` SessionStart hook. The hook reads the documented `transcript_path` field from Claude Code's hook stdin JSON, sidestepping the cwd-to-folder encoding (which is an internal implementation detail and varies by Claude Code version).

- New `agnes capture-session` subcommand appends transcript_path to <workspace>/.claude/agnes-sessions.txt. Silent on all malformed input so a hook chain failure doesn't clutter Claude Code startup.
- `agnes push` now consumes the queue: atomic snapshot rename guards against hooks writing during the push window, successful uploads land in agnes-sessions-uploaded.txt (TSV: timestamp + path), failed paths are requeued.
- Cross-platform single-instance lock via the filelock package (fcntl on POSIX, msvcrt on Windows). Concurrent SessionEnd hooks (common when the user closes several sessions at once) silent-exit on the losing side instead of all racing the upload.
- Recovery: pre-existing snapshot files from a crashed push are picked up and processed before the live queue.
- The SessionStart `agnes push` self-heal entry is dropped: it became redundant once the queue persists across runs (orphans from headless / crashed sessions ship out on the next interactive SessionEnd push). Existing workspaces auto-migrate via the marker-based replace logic.
- Legacy encoding scan stays available behind `--legacy-scan` for one-off backfills of sessions predating the queue.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Add /agnes-private + statusLine indicator for private sessions

Users handling sensitive data inside Claude Code can now opt a session out of the Agnes upload pipeline, either proactively (right after session start) or reactively (mid-session).

The `/agnes-private` slash command runs `agnes mark-private` deterministically via `!`-prefix direct bash, with no AI in the loop. A workspace-installed statusLine surfaces a `🔒 agnes-private` indicator in Claude Code's status bar so the user sees the state at a glance.

Authoritative source of "do not upload" is a separate file `<workspace>/.claude/agnes-sessions-private.txt` (one session_id per line). Both `capture-session` (queue writer) and `push` (queue reader) consult the list. This makes the slash-command / SessionStart-hook race impossible by construction: whichever runs first, the session is correctly filtered out.

- `agnes mark-private` reads `CLAUDE_CODE_SESSION_ID` from env (set by Claude Code in every bash subprocess it spawns; stable documented API) and appends to the private list.
- `agnes statusline` reads the session JSON Claude Code pipes on stdin, checks the private list, and emits the indicator or nothing. Optimized for the high call frequency of statusLine renders.
- `capture-session` extracts session_id from hook stdin and skips queue write when the ID is already on the private list (race protection).
- `push` filters snapshot entries by the private list and appends to a per-workspace audit log `agnes-sessions-private-skipped.txt`.
- Queue format migrated from `<path>` to `<session_id>\t<path>`; legacy one-column lines still parse (empty session_id, still upload, can't be marked private retroactively; fine, they pre-date the feature).
- `install_claude_hooks` writes a workspace statusLine unless the user already has a custom one (warn + preserve). Idempotent re-init.
- `install_claude_commands` ships `agnes-private.md` alongside `update-agnes-plugins.md`. Per-template fallback so a missing template doesn't get clobbered with the wrong content.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix setup-prompt + CLAUDE.md marketplace copy + drop skills step

Three issues against the post-PR-#240 / post-PR-#237 state:

1. Setup prompt's marketplace block trailer (both has-stack and empty-stack variants) claimed the SessionStart hook keeps the marketplace clone in sync via `agnes refresh-marketplace --quiet` on every session and that admin grants land automatically. Both are false since PR #237 (0.47.x) moved the install/update path out of the hook into the `/update-agnes-plugins` slash command. The hook is `--check`-only: detects server-side changes, prompts the user to run the slash command, which does the full reconcile interactively with output visible in the transcript.

2. The empty-stack variant framed composition as "admin grants only", missing the actual three-source served stack:

       (admin RBAC ∩ /marketplace subscriptions)
         ∪ system-mandatory plugins (admin-pinned, auto-applied)
         ∪ Flea market installs (skills/agents bundled, plugins standalone)

   Updated copy spells out all three sources so analysts know where their stack picks live, and what the SessionStart hook actually does on change detection.

3. CLAUDE.md template's "Agnes Marketplace" section conflated eligibility (`resolve_allowed_plugins`: what's listed) with served stack (`resolve_user_marketplace`: what actually reaches Claude Code). The two are different: a user can be RBAC-eligible for a plugin without having subscribed to it on /marketplace. Rewrote the section to distinguish the eligibility set from the served stack and to describe the `--check`-only hook accurately.

Plus: deleted the setup prompt's interactive Skills step (final step before Confirm). The named-opinion question ("do you want me to bulk-copy every skill into ~/.claude/skills/agnes/ or pull on-demand via `agnes skills show <name>`?") had no obvious right answer for new users at the tail end of a wall of technical steps. On-demand lookup is the one-size-fits-all default; `agnes skills list/show` remain discoverable and the CLAUDE.md template references specific skills inline (e.g. agnes-data-querying in the BigQuery section) where they're relevant.

Layout: Confirm shifts from step 9 to step 8. Tests updated, full setup/marketplace/welcome surface green (115 passed). Remaining full-suite failures are pre-existing (BQ/Keboola fixtures, Windows charmap collection error in test_v26_keboola_e2e), verified against a clean stash, unrelated to this diff.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix session-queue race + snapshot PID-reuse data loss

Two blocker fixes from the PR #242 review:

1. Concurrent SessionStart hooks could corrupt the queue file on Windows. Python's `open(path, "a")` is not atomic there (the CRT does not pass FILE_APPEND_DATA to CreateFile), so concurrent appenders (user opening several Claude Code windows simultaneously) could interleave bytes mid-line. The malformed lines then silently fail the parser and the entries are dropped.

   Fix: wrap append_to_queue, requeue_failed, and snapshot_queue in a short-lived FileLock on a dedicated `agnes-queue.lock`. Separate from `agnes-push.lock` so capture-session hooks don't block on the push command. New test_append_concurrent_threads_no_corruption reproduces the race with 4 threads x 50 appends.

2. Snapshot filenames embedded only the PID (`agnes-sessions.snapshot.<PID>.txt`). After a crashed push left a snapshot on disk and the OS recycled the PID for a new push, `os.rename` would atomically overwrite the recovery snapshot, silently losing every entry in it.

   Fix: append a uuid8 hex tail (`agnes-sessions.snapshot.<PID>.<uuid8>.txt`). find_recovery_snapshots already globs the prefix so it picks up both old and new format. New test_snapshot_filename_is_unique_per_call asserts two consecutive snapshots under the same PID don't collide.

Targeted tests green (47/47 in session_queue/capture_session/cli_push). Full suite failures unchanged from baseline (pre-existing BQ/Keboola fixture issues per CLAUDE.md).

* Auto-refresh workspace hooks + bash-wrap all hook entries (Windows)

Fixes from PR #242 second review (ZdenekSrotyr):

1. `uv.lock` regenerated to include `filelock 3.29.0` (declared in pyproject.toml but missing from the lock file; CI's lockfile-consistency check would fail, and `uv pip install` on a clean cache would silently miss the dep).

2. `agnes self-upgrade` now auto-refreshes the workspace Claude Code hooks via the new `cli.lib.hooks.maybe_refresh_claude_hooks`. Closes the silent-stop migration gap: a v0.48 workspace would auto-upgrade the CLI from its existing SessionStart self-upgrade entry but never pick up the new `agnes capture-session` SessionStart hook, leaving the queue empty and `agnes push` uploading nothing.

   The refresh fires on both the "info is None" fast path (CLI already current; catches the second SessionStart after a prior upgrade) and the install-success path. Guarded by `workspace_has_agnes_hooks` so it never writes `.claude/settings.json` into directories that aren't Agnes workspaces (e.g. `agnes self-upgrade` invoked from `~/`). Errors are surfaced on stderr but never flip the upgrade exit code.

3. All Agnes-managed hooks are now wrapped in `bash -c "..."`. The self-upgrade+pull chained SessionStart entry was the only one still shipping unwrapped. Claude Code on Windows runs hook commands directly without a shell, so the `;` chain + `2>/dev/null` + `|| true` shell syntax silently no-op'd on native Windows installs without Git Bash on PATH. Workspaces still on the old form auto-upgrade via the refresh path above.

Tests: +12 in test_lib_hooks.py (guard semantics, v0.48→v0.49 migration end-to-end, third-party-hook preservation, bash-wrap invariant). +5 in test_self_upgrade.py (refresh fires on info=None, fires on install success, skipped on failure, skipped on --check-only, refresh failure never flips exit code). 130 targeted tests green. The 2 pre-existing Windows path-separator failures in `test_smoke_test_detects_version_mismatch[uv|pip]` are unrelated (path mismatch `\fake\uv\bin\agnes` vs `/fake/uv/bin/agnes` in test asserts, pre-PR baseline).

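The bash-wrap invariant from item 3 above can be illustrated with a minimal sketch. `bash_wrap` is a hypothetical helper, not the function in `cli/lib/hooks.py`; it assumes the shell line contains no `$` or backticks (which bash would still expand inside double quotes), and it uses `shlex.split` only as a stand-in for POSIX-style tokenization of the hook command.

```python
import shlex


def bash_wrap(shell_line: str) -> str:
    """Wrap a shell line as `bash -c "..."` (hypothetical helper).

    Only backslashes and double quotes are escaped; `$` and backticks are
    assumed absent from the input.
    """
    escaped = shell_line.replace("\\", "\\\\").replace('"', '\\"')
    return f'bash -c "{escaped}"'


# The chained SessionStart entry described in the commit message.
chain = (
    "agnes self-upgrade --quiet 2>/dev/null || true; "
    "agnes pull --quiet 2>/dev/null || true"
)
wrapped = bash_wrap(chain)

# Tokenizing the wrapped form yields three argv items: the whole chain
# travels as one argument, so `;`, `2>/dev/null` and `|| true` are
# interpreted by bash rather than by whatever launches the hook.
argv = shlex.split(wrapped)
```

With this shape, a test such as `cmd.startswith("bash -c ")` is enough to pin the invariant for every managed entry.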
* CHANGELOG: document PR-242 main features

Closes ZdenekSrotyr #4: the [Unreleased] block was missing entries for the PR's primary surface; only the post-merge fix bullets and the unrelated setup-prompt copy change were captured.

Adds:

- ### Added: 6 bullets covering the session capture queue + new `agnes capture-session` subcommand, `/agnes-private` slash + `agnes mark-private`, `agnes statusline` + statusLine wiring, `--legacy-scan` opt-in fallback, single-instance push lock, and the new `filelock` runtime dep.
- ### Changed: BREAKING bullet on the SessionStart / SessionEnd hook wire format change (capture-session as first SessionStart entry, push self-heal removed, SessionEnd push detached via nohup, all entries bash-wrapped). Folds the prior standalone bash-wrap bullet into this consolidated entry; Z's review flagged the layout shift as BREAKING, and grouping the related sub-changes makes the migration story readable in one place.
- Operator migration is auto-handled by `maybe_refresh_claude_hooks` invoked from `agnes self-upgrade` (separate Changed entry below). No `agnes init` re-run required. Pre-queue session jsonls on upgrading workspaces still need a one-off `agnes push --legacy-scan`, which is flagged in the BREAKING bullet.

No code change; doc only.

* Drop permanent 4xx uploads instead of requeueing forever

Closes ZdenekSrotyr #5. Previously the push retry path requeued any non-200 response except the literal "file not found on disk", so 401 (token expired), 403 (RBAC denial), 413 (payload too large), 400 (server-side validation) cycled through every push run forever. The queue grew without bound and each run re-bombarded the server with the same deterministically-failing upload.

Now 4xx (except 408 Request Timeout + 429 Too Many Requests, which the HTTP spec marks as transient) is dropped and audit-logged to `<workspace>/.claude/agnes-sessions-failed.txt`:

    <iso_ts>\t<session_id>\t<status>\t<transcript_path>

5xx and network errors continue to requeue. Those reflect server / transport state that can change between runs, so retry is the right behavior.

The audit log piggybacks on the push single-instance lock (agnes-push.lock). Push is the only writer to this file, same as the existing `mark_uploaded` and `mark_private_skipped` paths, so no separate filelock is needed.

`agnes push --json` surfaces a new `dropped_permanent` counter; non-quiet stdout mentions the audit-log path so operators tailing the output have a pointer to the forensic trail.

Tests: +7 in test_cli_push.py (401/400/403/413 → drop; 408/429 → requeue; 500/502/503 → requeue; network exception → requeue; --json `dropped_permanent` counter; stdout audit-log pointer). +1 in test_session_queue.py (mark_failed_permanent TSV format). 127/129 targeted tests green. The 2 pre-existing Windows path-separator failures in `test_smoke_test_detects_version_mismatch[uv|pip]` are unrelated (path mismatch `\fake\uv\bin\agnes` vs `/fake/uv/bin/agnes` in test asserts, pre-PR baseline).

* Catch OSError in push lock acquisition

Closes ZdenekSrotyr #8. `acquire_or_skip` in `cli/lib/push_lock.py` previously caught only `filelock.Timeout`. Any `OSError` from `FileLock.acquire` (read-only filesystem, permission denied on `.claude/`, disk full, hardware I/O failure) propagated as an unhandled traceback. Two visible failure modes:

- SessionEnd hook: `|| true` in the wrapper swallowed the error, so daily pushes silently never ran. Operator had no signal.
- Manual `agnes push`: ugly Python traceback dumped to the terminal instead of a clean exit.

Now `OSError` is treated the same as `Timeout`: yield `None`, caller returns cleanly with rc=0. The operator's environment in these scenarios has bigger problems than missing session uploads, so we swallow rather than retry-loop or surface a noisy warning.

Test: `test_push_silent_exit_when_filelock_raises_oserror` patches the `FileLock` used inside `push_lock` to raise OSError on acquire, verifies push exits 0 with no traceback and the queue is preserved for the next attempt.

* Address remaining S2 items from PR-242 review

Four items from ZdenekSrotyr's S2 list:

S2.10: `_install_statusline` truthy check (cli/lib/hooks.py): replace `if existing:` with explicit `if existing is None or existing == "":`. Documents and tests the behavior for both edge cases (explicit-null and empty-string `statusLine`); both are treated as "not configured" rather than "explicit user opt-out", so we install ours. Two new tests in test_lib_hooks.py pin the contract.

S2.6: onboarding docs for /agnes-private. New "Private sessions" subsection in `config/claude_md_template.txt` (next to Data Sync) covering the slash command, statusbar indicator, and audit-log location. One-line tip in `app/web/setup_instructions.py` so the feature is discoverable at onboarding.

S2.9: e2e privacy test (tests/test_e2e_privacy.py). Wires capture_session → mark_private → push against a recording fake api_post and asserts zero session uploads for the marked one. Three cases: mark-before-capture (queue write skipped), mark-after-capture (push-side filter catches it + audit-logs), control (unmarked sessions upload normally).

David #8: `--legacy-scan` help text now documents the private-list gap (legacy entries carry empty session_id, so the filter is not consulted). The practical impact is bounded, since pre-queue sessions cannot have been marked private (the private list is a queue-era feature), but the disclaimer in the help text means an operator running a backfill is not surprised.

68 targeted tests green (3 new e2e + 2 new truthy edge tests + existing). 2 pre-existing Windows path-separator failures in test_smoke_test_detects_version_mismatch[uv|pip] unchanged.

Remaining S2 items (statusline mkdir push-back, capture-session silent-fail follow-up) handled in PR comment + follow-up issue respectively.

* Address remaining S2 follow-ups (David #8, S2.7, David #11)

Three items left over from Mina's bbf63472 batch. That commit addressed S2.6/S2.9/S2.10 + documented David #8 in help text but deferred the actual implementations of S2.7, David #11, and the real David #8 fix to follow-ups. This commit closes them.

David #8: `agnes push --legacy-scan` now consults the private list. Claude Code names jsonls `<session-id>.jsonl`, so the file stem IS the session id; the legacy-scan path can apply the same private filter the queue path uses. Both the dry-run and live-upload code paths fixed. Help text updated (no longer warns the filter is bypassed). Two new tests in test_cli_push.py cover the upload-skip path + the dry-run `would_skip_private` segregation.

S2.7: `statusline`/`is_private` no longer mkdir-pollutes arbitrary workdirs. Split `_claude_dir` into `_claude_dir_writable` (used only from `add_private`) and `_claude_dir_readonly` (no mkdir). The read-only public helpers (`private_list_path`, `read_all_private`, `is_private`) compose the no-mkdir variant by default; `add_private` opts in via `writable=True`. Added a process-local mtime-keyed cache around `read_all_private` so in-process callers (push doing one stat per upload candidate, future `agnes diagnose`) don't re-parse the file on every check. Cache eviction on `add_private` so a sub-second write+read sequence doesn't see stale data even on coarse-mtime filesystems. Two new tests pin the no-mkdir contract + the in-same-second add+read consistency.

David #11: `agnes capture-session` writes a breadcrumb log on every invocation. New `<workspace>/.claude/agnes-capture-session.log` TSV: `<iso_ts>\t<outcome>\t<detail>` where outcome covers every silent-exit path (`ok`, `private_skip`, `empty_stdin`, `bad_json`, `not_object`, `no_transcript_path`, `stdin_read_error`, `write_error`). Gives operators a signal to detect "hook fires but queue stays empty"; without it, an upstream Claude Code stdin-contract change is invisible because the hook always exits 0. Log rolls at 256 KiB so it doesn't grow unbounded on long-lived workspaces. Best-effort: a breadcrumb-write failure is itself swallowed so the hook contract stays "exit 0 always". Skipped in non-Agnes workdirs (no `.claude/` exists) so opening Claude Code in `~/` doesn't pollute it. Five new tests in test_capture_session.py cover the success / bad_json / no_transcript_path / private_skip / no-pollute paths.

115 targeted tests green (test_cli_push, test_capture_session, test_private_list, test_session_queue, test_e2e_privacy, test_lib_hooks, test_statusline, test_mark_private).

---------

Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>
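
The retry policy from the "Drop permanent 4xx uploads" entry above can be summarized in a small sketch. `classify_upload` and `TRANSIENT_4XX` are hypothetical names, not the identifiers used in the push code; the sketch assumes a status of `None` stands for a network error and that only a literal 200 counts as success, as the commit message implies.

```python
from typing import Optional

# 4xx codes treated as transient per the commit message (hypothetical name).
TRANSIENT_4XX = frozenset({408, 429})


def classify_upload(status: Optional[int]) -> str:
    """Map an upload outcome to 'uploaded', 'drop', or 'requeue'.

    `status` is the HTTP status code, or None when the request failed
    before a response arrived (assumed convention for network errors).
    """
    if status is None:
        return "requeue"  # transport state may differ on the next run
    if status == 200:
        return "uploaded"
    if 400 <= status < 500 and status not in TRANSIENT_4XX:
        return "drop"  # deterministic failure: audit-log it, do not retry
    return "requeue"  # 408, 429, 5xx and anything else unexpected


outcomes = {code: classify_upload(code) for code in (200, 401, 413, 429, 503)}
```

Keeping the decision in one pure function makes the table of status codes easy to test without a fake server.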
589 lines
25 KiB
Python
"""Tests for cli/lib/hooks.py:install_claude_hooks."""

import json
from pathlib import Path

from cli.lib.hooks import (
    install_claude_hooks,
    maybe_refresh_claude_hooks,
    workspace_has_agnes_hooks,
)


def _read_settings(workspace: Path) -> dict:
    return json.loads((workspace / ".claude" / "settings.json").read_text())


def _commands_for(cfg: dict, event: str) -> list[str]:
    """Flatten the per-event command list. Each entry has a list of hooks,
    each hook has a `command` field. We treat each entry as one command for
    assertion purposes (matches the install_claude_hooks contract: one
    entry per command)."""
    return [
        entry["hooks"][0]["command"]
        for entry in cfg["hooks"].get(event, [])
        if entry.get("hooks")
    ]


def test_install_creates_settings_file(tmp_path):
    install_claude_hooks(tmp_path)
    cfg = _read_settings(tmp_path)
    starts = _commands_for(cfg, "SessionStart")
    # SessionStart has three entries: (1) capture-session as the very first
    # so the hook stdin (transcript_path) is appended to the queue before
    # any other hook runs; (2) chained self-upgrade ; pull, where self-upgrade
    # runs first so a wire-protocol bump lands before pull tries to use
    # the new CLI; (3) refresh-marketplace as a separate entry so a
    # failure (e.g. fresh workspace with no clone) doesn't suppress the
    # data pull above.
    #
    # `agnes push` is NOT in SessionStart: the queue mechanism handles
    # orphans on the next SessionEnd, so the old self-heal entry was
    # redundant + would re-upload the just-starting (empty) session.
    assert len(starts) == 3
    capture = next((c for c in starts if "agnes capture-session" in c), None)
    assert capture is not None, "Expected SessionStart capture-session entry"
    assert capture.startswith("bash -c "), (
        f"capture-session hook must be wrapped in bash -c for Windows; got: {capture!r}"
    )
    assert not any("agnes push" in c for c in starts), (
        f"agnes push must NOT be in SessionStart; got: {starts!r}"
    )
    chain = next(
        (c for c in starts if "agnes self-upgrade" in c and "agnes pull" in c),
        None,
    )
    assert chain is not None, (
        "Expected one SessionStart entry chaining self-upgrade and pull"
    )
    assert "agnes self-upgrade --quiet" in chain
    assert "agnes pull --quiet" in chain
    # The refresh-marketplace command is wrapped in `bash -c "..."` so the
    # `2>/dev/null || true` shell syntax is interpreted on Windows, where
    # Claude Code runs hook commands directly without invoking a shell.
    refresh = next((c for c in starts if "agnes refresh-marketplace" in c), None)
    assert refresh is not None
    assert refresh.startswith("bash -c "), (
        f"refresh-marketplace hook must be wrapped in bash -c for Windows; got: {refresh!r}"
    )
    # Hook is now a detector: `--check` only. Plugin install/update
    # happens in the `/update-agnes-plugins` slash command instead.
    # Pinning the flag here prevents an accidental regression to the old
    # `--quiet` form (which performed a full reconcile silently).
    assert "--check" in refresh, (
        f"refresh-marketplace hook must use --check (detector mode); got: {refresh!r}"
    )
    assert "--quiet" not in refresh, (
        f"refresh-marketplace hook must NOT use --quiet (removed flag); got: {refresh!r}"
    )
    ends = _commands_for(cfg, "SessionEnd")
    assert len(ends) == 1
    assert "agnes push --quiet" in ends[0]


def test_all_installed_hooks_are_bash_wrapped(tmp_path):
    """Pin the invariant: every Agnes-managed SessionStart / SessionEnd
    entry must be wrapped in `bash -c "..."`. Claude Code on Windows
    runs hook commands directly (no shell), so any unwrapped entry that
    relies on shell syntax (`;` chains, `2>/dev/null` redirection,
    `|| true` short-circuit) fails silently on Windows. The previous
    self-upgrade+pull chain shipped unwrapped; this test catches a
    regression to that state."""
    install_claude_hooks(tmp_path)
    cfg = _read_settings(tmp_path)
    starts = _commands_for(cfg, "SessionStart")
    ends = _commands_for(cfg, "SessionEnd")
    for cmd in starts + ends:
        assert cmd.startswith("bash -c "), (
            f"Hook must be wrapped in bash -c for Windows; got: {cmd!r}"
        )


def test_install_idempotent(tmp_path):
    install_claude_hooks(tmp_path)
    install_claude_hooks(tmp_path)
    cfg = _read_settings(tmp_path)
    # Three SessionStart entries (capture-session + chained self-upgrade/pull
    # + refresh-marketplace), one SessionEnd entry (push). Re-install must
    # NOT duplicate them.
    assert len(cfg["hooks"]["SessionStart"]) == 3
    assert len(cfg["hooks"]["SessionEnd"]) == 1


def test_install_replaces_old_da_sync_entries(tmp_path):
    """Hook from a pre-rewrite workspace gets replaced cleanly: legacy
    `da sync` entries are removed, the current agnes hooks land in their place."""
    settings_path = tmp_path / ".claude" / "settings.json"
    settings_path.parent.mkdir(parents=True)
    settings_path.write_text(json.dumps({
        "hooks": {
            "SessionStart": [{"hooks": [{"type": "command", "command": "da sync --quiet"}]}],
            "SessionEnd": [{"hooks": [{"type": "command", "command": "da sync --upload-only --quiet"}]}],
        }
    }))
    install_claude_hooks(tmp_path)
    cfg = _read_settings(tmp_path)
    starts = _commands_for(cfg, "SessionStart")
    assert len(starts) == 3
    assert any("agnes capture-session" in c for c in starts)
    assert any("agnes pull" in c for c in starts)
    assert any("agnes refresh-marketplace" in c for c in starts)
    # `agnes push` lives only in SessionEnd now.
    assert not any("agnes push" in c for c in starts)
    # Legacy command must be gone from SessionStart.
    assert not any("da sync" in c for c in starts)


def test_install_replaces_prior_single_pull_entry(tmp_path):
    """Workspaces bootstrapped by a CLI version that only installed a
    single SessionStart entry (`agnes pull`, no refresh-marketplace) must
    upgrade to the three-entry layout on the next install, not end up
    stacking the new entries on top of the old one."""
    settings_path = tmp_path / ".claude" / "settings.json"
    settings_path.parent.mkdir(parents=True)
    settings_path.write_text(json.dumps({
        "hooks": {
            "SessionStart": [
                {"hooks": [{"type": "command", "command": "agnes pull --quiet 2>/dev/null || true"}]},
            ],
        }
    }))
    install_claude_hooks(tmp_path)
    cfg = _read_settings(tmp_path)
    starts = _commands_for(cfg, "SessionStart")
    assert len(starts) == 3
    assert any("agnes capture-session" in c for c in starts)
    assert any("agnes pull" in c for c in starts)
    assert any("agnes refresh-marketplace" in c for c in starts)
    assert not any("agnes push" in c for c in starts)


def test_install_replaces_v0_43_chained_self_upgrade_pull_entry(tmp_path):
    """Workspaces bootstrapped on v0.43.0 had a single SessionStart entry
    chaining `agnes self-upgrade; agnes pull` in one shell line. Upgrading
    those workspaces to v0.44.0+ must replace that entry and install the
    current three-entry layout, not stack the new entries on top of the
    v0.43 chained one (which would re-run self-upgrade twice on every
    session and leave the old format around forever).
    """
    settings_path = tmp_path / ".claude" / "settings.json"
    settings_path.parent.mkdir(parents=True)
    settings_path.write_text(json.dumps({
        "hooks": {
            "SessionStart": [
                {"hooks": [{"type": "command", "command": (
                    "agnes self-upgrade --quiet 2>/dev/null || true; "
                    "agnes pull --quiet 2>/dev/null || true"
                )}]},
            ],
            "SessionEnd": [
                {"hooks": [{"type": "command", "command": "agnes push --quiet 2>/dev/null || true"}]},
            ],
        }
    }))
    install_claude_hooks(tmp_path)
    cfg = _read_settings(tmp_path)
    starts = _commands_for(cfg, "SessionStart")
    # Exactly three entries: the v0.43 chained line was replaced, not stacked.
    assert len(starts) == 3, starts
    chain = next(
        (c for c in starts if "agnes self-upgrade" in c and "agnes pull" in c),
        None,
    )
    assert chain is not None
    assert any("agnes capture-session" in c for c in starts)
    assert any("agnes refresh-marketplace" in c for c in starts)
    assert not any("agnes push" in c for c in starts)
    # SessionEnd still holds a single push entry.
    ends = _commands_for(cfg, "SessionEnd")
    assert len(ends) == 1
    assert "agnes push --quiet" in ends[0]


def test_install_replaces_old_quiet_refresh_with_check(tmp_path):
    """A workspace bootstrapped before the slash-command split has the old
    `--quiet` form in its refresh-marketplace SessionStart entry. The next
    `agnes init` must replace that entry with the new `--check` form, NOT
    stack the new entry alongside the old one (which would re-run the
    full reconcile every session, exactly the behaviour we just moved
    behind the slash command).
    """
    settings_path = tmp_path / ".claude" / "settings.json"
    settings_path.parent.mkdir(parents=True)
    settings_path.write_text(json.dumps({
        "hooks": {
            "SessionStart": [
                {"hooks": [{"type": "command", "command": (
                    "agnes self-upgrade --quiet 2>/dev/null || true; "
                    "agnes pull --quiet 2>/dev/null || true"
                )}]},
                {"hooks": [{"type": "command", "command": (
                    'bash -c "agnes refresh-marketplace --quiet 2>/dev/null || true"'
                )}]},
                {"hooks": [{"type": "command", "command": (
                    'bash -c "agnes push --quiet 2>/dev/null || true"'
                )}]},
            ],
            "SessionEnd": [
                {"hooks": [{"type": "command", "command": (
                    'bash -c "( nohup agnes push --quiet </dev/null '
                    '>/dev/null 2>&1 & ) ; true"'
                )}]},
            ],
        }
    }))
    install_claude_hooks(tmp_path)
    cfg = _read_settings(tmp_path)
    starts = _commands_for(cfg, "SessionStart")
    # Exactly one refresh-marketplace entry remains (no stacking).
    refresh_entries = [c for c in starts if "agnes refresh-marketplace" in c]
    assert len(refresh_entries) == 1, refresh_entries
    refresh = refresh_entries[0]
    assert "--check" in refresh, (
        f"old --quiet entry must have been rewritten to --check; got: {refresh!r}"
    )
    assert "--quiet" not in refresh, (
        f"old --quiet form must be gone after re-init; got: {refresh!r}"
    )


def test_install_preserves_third_party_hooks(tmp_path):
    settings_path = tmp_path / ".claude" / "settings.json"
    settings_path.parent.mkdir(parents=True)
    settings_path.write_text(json.dumps({
        "hooks": {
            "SessionStart": [{"hooks": [{"type": "command", "command": "echo hi from another tool"}]}],
            "PreToolUse": [{"hooks": [{"type": "command", "command": "echo pre"}]}],
        }
    }))
    install_claude_hooks(tmp_path)
    cfg = _read_settings(tmp_path)
    starts = _commands_for(cfg, "SessionStart")
    # Third-party entry stays + all three agnes entries get added.
    assert len(starts) == 4
    assert any("echo hi from another tool" in c for c in starts)
    assert any("agnes capture-session" in c for c in starts)
    assert any("agnes pull" in c for c in starts)
    assert any("agnes refresh-marketplace" in c for c in starts)
    assert not any("agnes push" in c for c in starts)
    # Other event types untouched.
    assert cfg["hooks"]["PreToolUse"][0]["hooks"][0]["command"] == "echo pre"


def test_install_handles_missing_settings_file(tmp_path):
    install_claude_hooks(tmp_path)
    assert (tmp_path / ".claude" / "settings.json").exists()


def test_install_handles_invalid_json(tmp_path, capsys):
    settings_path = tmp_path / ".claude" / "settings.json"
    settings_path.parent.mkdir(parents=True)
    settings_path.write_text("not valid json {")
    install_claude_hooks(tmp_path)
    captured = capsys.readouterr()
    assert "not valid JSON" in captured.err or "warning" in captured.err.lower()


def test_install_chains_self_upgrade_then_pull_in_one_entry(tmp_path):
    install_claude_hooks(tmp_path)
    cfg = _read_settings(tmp_path)
    starts = _commands_for(cfg, "SessionStart")
    # SessionStart has three entries: capture-session, the chain
    # (self-upgrade + pull), and the standalone refresh-marketplace.
    # This test pins the chain invariant (order, both `|| true`-guarded)
    # independent of the other entries being present.
    chain = next(
        (c for c in starts if "agnes self-upgrade" in c and "agnes pull" in c),
        None,
    )
    assert chain is not None, starts
    assert "agnes self-upgrade --quiet" in chain
    assert "agnes pull --quiet" in chain
    # Order is encoded in the shell: self-upgrade must appear first
    assert chain.index("agnes self-upgrade") < chain.index("agnes pull")
    # Both segments carry || true so neither failure aborts the line
    assert chain.count("|| true") >= 2


def test_install_idempotent_chained_entry(tmp_path):
    install_claude_hooks(tmp_path)
    install_claude_hooks(tmp_path)
    cfg = _read_settings(tmp_path)
    # Three SessionStart entries (capture-session, chained self-upgrade+pull,
    # refresh-marketplace); re-install must not duplicate any of them.
    assert len(cfg["hooks"]["SessionStart"]) == 3
    assert len(cfg["hooks"]["SessionEnd"]) == 1


def test_session_end_push_is_detached(tmp_path):
    """Regression test for the headless-mode SIGTERM bug.

    Claude Code in `-p` (headless) mode SIGTERMs SessionEnd hook
    subprocesses ~1s after launch, regardless of whether the hook is
    still working. `agnes push` for a typical workspace (10 session
    JSONLs) takes 5-30s, so a synchronous form gets killed mid-first-
    upload and most files never reach the server. The hook MUST run
    detached so the upload child survives the hook subprocess being
    torn down.

    This test pins the wrapper shape — `bash -c "( nohup ... & ) ; true"` —
    so a future refactor that re-introduces the synchronous form fails
    loudly here instead of silently regressing in production.
    """
    install_claude_hooks(tmp_path)
    cfg = _read_settings(tmp_path)
    ends = _commands_for(cfg, "SessionEnd")
    assert len(ends) == 1
    cmd = ends[0]
    assert "agnes push" in cmd, f"SessionEnd must still call agnes push; got: {cmd!r}"
    # Detachment markers — every one of these is load-bearing:
    #   - `nohup` ignores SIGHUP if the controlling terminal disappears
    #   - `&` backgrounds the child inside the subshell
    #   - `</dev/null` decouples stdin so the parent doesn't wait on a pipe
    #   - `>/dev/null 2>&1` decouples stdout/stderr likewise
    assert "nohup" in cmd, f"SessionEnd push must use nohup for detachment; got: {cmd!r}"
    assert "&" in cmd, f"SessionEnd push must background with &; got: {cmd!r}"
    assert "</dev/null" in cmd, (
        f"SessionEnd push must redirect stdin from /dev/null; got: {cmd!r}"
    )
    assert ">/dev/null 2>&1" in cmd, (
        f"SessionEnd push must redirect stdout/stderr to /dev/null; got: {cmd!r}"
    )
    # `bash -c` wrapping is required because Claude Code on Windows runs
    # hook commands directly (no shell), so the subshell + redirection
    # syntax wouldn't parse otherwise.
    assert cmd.startswith("bash -c "), (
        f"SessionEnd push must be wrapped in bash -c for Windows; got: {cmd!r}"
    )


def test_install_writes_statusline_when_absent(tmp_path):
    """Greenfield install: no prior statusLine → we write ours."""
    install_claude_hooks(tmp_path)
    cfg = _read_settings(tmp_path)
    assert "statusLine" in cfg
    assert cfg["statusLine"]["type"] == "command"
    assert "agnes statusline" in cfg["statusLine"]["command"]


def test_install_preserves_existing_user_statusline(tmp_path, capsys):
    """User has their own statusLine — we leave it alone and warn on stderr.
    Customizing the status bar is a personal preference; agnes shouldn't
    clobber it. Operators who want the private indicator alongside their
    own content can compose `agnes statusline` into their command."""
    settings_path = tmp_path / ".claude" / "settings.json"
    settings_path.parent.mkdir(parents=True)
    user_statusline = {"type": "command", "command": "my-custom-status"}
    settings_path.write_text(json.dumps({"statusLine": user_statusline}))

    install_claude_hooks(tmp_path)
    cfg = _read_settings(tmp_path)
    # User's statusLine intact.
    assert cfg["statusLine"] == user_statusline
    # Warning surfaced.
    captured = capsys.readouterr()
    assert "statusLine" in captured.err


def test_install_idempotent_when_statusline_already_ours(tmp_path):
    """Re-running install when our statusLine is already in place is a no-op,
    NOT a warning (idempotent re-init shouldn't spam the user)."""
    install_claude_hooks(tmp_path)
    install_claude_hooks(tmp_path)
    cfg = _read_settings(tmp_path)
    assert "agnes statusline" in cfg["statusLine"]["command"]


def test_install_treats_explicit_null_statusline_as_unconfigured(tmp_path, capsys):
    """`"statusLine": null` (legal JSON, unusual) is treated as
    "no statusLine configured" rather than "user explicitly opted out".
    The previous truthy check (`if existing:`) silently took this path
    without distinguishing it from absent-key; the new check makes
    the behavior explicit and tested. No warning — the user didn't
    configure anything actionable."""
    settings_path = tmp_path / ".claude" / "settings.json"
    settings_path.parent.mkdir(parents=True)
    settings_path.write_text(json.dumps({"statusLine": None}))

    install_claude_hooks(tmp_path)

    cfg = _read_settings(tmp_path)
    assert isinstance(cfg["statusLine"], dict)
    assert "agnes statusline" in cfg["statusLine"]["command"]
    captured = capsys.readouterr()
    assert "preserved" not in captured.err  # no spurious warning


def test_install_treats_empty_statusline_as_unconfigured(tmp_path, capsys):
    """Empty string is the falsy sibling of None — same treatment.
    Documents the boundary so future changes can't accidentally treat
    `""` as a non-empty truthy command."""
    settings_path = tmp_path / ".claude" / "settings.json"
    settings_path.parent.mkdir(parents=True)
    settings_path.write_text(json.dumps({"statusLine": ""}))

    install_claude_hooks(tmp_path)

    cfg = _read_settings(tmp_path)
    assert isinstance(cfg["statusLine"], dict)
    assert "agnes statusline" in cfg["statusLine"]["command"]
    captured = capsys.readouterr()
    assert "preserved" not in captured.err


def test_install_replaces_old_synchronous_session_end_push(tmp_path):
    """A workspace bootstrapped before the detachment fix has the old
    synchronous `agnes push --quiet 2>/dev/null || true` SessionEnd entry.
    On the next `agnes init`, that entry must be matched by the
    `agnes push` marker and replaced with the new detached form — not
    stacked alongside it."""
    settings_path = tmp_path / ".claude" / "settings.json"
    settings_path.parent.mkdir(parents=True)
    settings_path.write_text(json.dumps({
        "hooks": {
            "SessionEnd": [
                {"hooks": [{"type": "command", "command": "agnes push --quiet 2>/dev/null || true"}]},
            ],
        }
    }))
    install_claude_hooks(tmp_path)
    cfg = _read_settings(tmp_path)
    ends = _commands_for(cfg, "SessionEnd")
    assert len(ends) == 1, ends
    assert "nohup" in ends[0], (
        f"Old synchronous push entry must have been replaced with the detached form; got: {ends!r}"
    )


# ---------------------------------------------------------------------------
# workspace_has_agnes_hooks / maybe_refresh_claude_hooks
# ---------------------------------------------------------------------------


def test_workspace_has_agnes_hooks_false_for_missing_settings(tmp_path):
    """Fresh dir without `.claude/settings.json` is not an Agnes workspace."""
    assert workspace_has_agnes_hooks(tmp_path) is False


def test_workspace_has_agnes_hooks_false_for_empty_settings(tmp_path):
    """`.claude/settings.json` exists but is empty `{}` — not an Agnes workspace."""
    sp = tmp_path / ".claude" / "settings.json"
    sp.parent.mkdir(parents=True)
    sp.write_text(json.dumps({}), encoding="utf-8")
    assert workspace_has_agnes_hooks(tmp_path) is False


def test_workspace_has_agnes_hooks_false_for_invalid_json(tmp_path):
    sp = tmp_path / ".claude" / "settings.json"
    sp.parent.mkdir(parents=True)
    sp.write_text("not json", encoding="utf-8")
    assert workspace_has_agnes_hooks(tmp_path) is False


def test_workspace_has_agnes_hooks_false_for_third_party_only_hook(tmp_path):
    """A settings.json with only third-party hooks (no agnes marker) is
    NOT an Agnes workspace — important so `agnes self-upgrade` from such
    a dir doesn't auto-install Agnes hooks behind the user's back."""
    sp = tmp_path / ".claude" / "settings.json"
    sp.parent.mkdir(parents=True)
    sp.write_text(json.dumps({
        "hooks": {
            "SessionStart": [
                {"hooks": [{"type": "command", "command": "echo hello"}]},
            ],
        }
    }), encoding="utf-8")
    assert workspace_has_agnes_hooks(tmp_path) is False


def test_workspace_has_agnes_hooks_true_for_agnes_hook(tmp_path):
    install_claude_hooks(tmp_path)
    assert workspace_has_agnes_hooks(tmp_path) is True


def test_workspace_has_agnes_hooks_true_for_just_statusline(tmp_path):
    """statusLine alone (no hook entries) still signals an Agnes workspace."""
    sp = tmp_path / ".claude" / "settings.json"
    sp.parent.mkdir(parents=True)
    sp.write_text(json.dumps({
        "statusLine": {"type": "command", "command": "agnes statusline"},
    }), encoding="utf-8")
    assert workspace_has_agnes_hooks(tmp_path) is True


def test_maybe_refresh_noop_in_non_agnes_directory(tmp_path):
    """Critical safety: `agnes self-upgrade` invoked from a non-Agnes dir
    (e.g. ~/) must NOT create `.claude/settings.json` there."""
    refreshed = maybe_refresh_claude_hooks(tmp_path)
    assert refreshed is False
    assert not (tmp_path / ".claude").exists()


def test_maybe_refresh_writes_new_layout_into_v048_workspace(tmp_path):
    """Simulate a v0.48 workspace (pre-PR-242 hook layout — chained
    self-upgrade+pull and old refresh-marketplace --quiet, no
    capture-session, no statusLine) and assert that
    maybe_refresh_claude_hooks brings it up to the current layout."""
    sp = tmp_path / ".claude" / "settings.json"
    sp.parent.mkdir(parents=True)
    sp.write_text(json.dumps({
        "hooks": {
            "SessionStart": [
                {"hooks": [{"type": "command",
                            "command": "agnes self-upgrade --quiet 2>/dev/null || true; "
                                       "agnes pull --quiet 2>/dev/null || true"}]},
                {"hooks": [{"type": "command",
                            "command": 'bash -c "agnes refresh-marketplace --quiet 2>/dev/null || true"'}]},
            ],
            "SessionEnd": [
                {"hooks": [{"type": "command",
                            "command": "agnes push --quiet 2>/dev/null || true"}]},
            ],
        }
    }), encoding="utf-8")

    refreshed = maybe_refresh_claude_hooks(tmp_path)
    assert refreshed is True

    cfg = _read_settings(tmp_path)
    starts = _commands_for(cfg, "SessionStart")
    # New layout: capture-session is now the very first SessionStart entry.
    assert any("agnes capture-session" in c for c in starts), (
        f"v0.48→v0.49 migration must install capture-session; got: {starts!r}"
    )
    # And refresh-marketplace must be on --check, not --quiet.
    refresh = next((c for c in starts if "agnes refresh-marketplace" in c), None)
    assert refresh is not None and "--check" in refresh, refresh

    ends = _commands_for(cfg, "SessionEnd")
    # SessionEnd push is now detached via nohup (replaces the old
    # synchronous form).
    assert any("nohup" in c for c in ends), (
        f"v0.48→v0.49 migration must detach SessionEnd push; got: {ends!r}"
    )
    # statusLine for the agnes-private indicator must also be installed.
    assert cfg.get("statusLine", {}).get("command", "").startswith("agnes statusline")


def test_maybe_refresh_preserves_third_party_hooks(tmp_path):
    """Refresh must keep third-party hooks intact (same guarantee as
    install_claude_hooks — the migration path uses the same machinery)."""
    sp = tmp_path / ".claude" / "settings.json"
    sp.parent.mkdir(parents=True)
    sp.write_text(json.dumps({
        "hooks": {
            "SessionStart": [
                {"hooks": [{"type": "command", "command": "agnes self-upgrade --quiet || true"}]},
                {"hooks": [{"type": "command", "command": "echo hi from another tool"}]},
            ],
        }
    }), encoding="utf-8")
    refreshed = maybe_refresh_claude_hooks(tmp_path)
    assert refreshed is True
    cfg = _read_settings(tmp_path)
    starts = _commands_for(cfg, "SessionStart")
    assert "echo hi from another tool" in starts, (
        f"Third-party hook must survive refresh; got: {starts!r}"
    )