Four review iterations resolved: - PATH-shadow-safe smoke test (uv tool dir --bin + ~/.local/bin fallback) - Recursion sentinel for in-flight self-upgrade - sys.executable + --no-deps pip fallback (NOT system python3, NOT --user) - Smoke + rollback with rc capture and bootstrap recovery - Single chained SessionStart entry (shell ; for ordering, no Claude Code semantics dependency) - AGNES_NO_UPDATE_CHECK bypass for explicit self-upgrade - _get_shared_client() left unhooked (mid-stream sys.exit unsafe; Caddy proxies parquets anyway) Targets release 0.40.0.
18 KiB
CLI Auto-Upgrade — Server-Pinned Version
Status: spec / design. Convert to an implementation plan in
docs/superpowers/plans/once reviewed.
Goal: Keep an analyst's locally-installed agnes CLI in sync with the server it talks to. The server is the single source of truth for "what version should be running"; the CLI never asks PyPI, only the server.
Why now: today an analyst installs once via uv tool install $SERVER/cli/wheel/<name> and drifts arbitrarily. The CLI already prints a warning when out of date but never upgrades itself, and there's no hard-stop when a wire-protocol break ships — drifted clients fail with cryptic errors instead of being told to upgrade.
Non-goal: distributing the CLI through PyPI, GitHub releases, or any out-of-band channel. The wheel lives next to the server (/app/dist/*.whl) and is served by app/api/cli_artifacts.py.
What already exists
The first half of this design is already shipped, just incomplete:
GET /cli/latest(app/api/cli_artifacts.py:42) →{version, wheel_filename, download_url_path}. Public, no auth.GET /cli/wheel/{name}+/cli/download+/cli/install.shfor distribution.cli/update_check.py— polls/cli/lateston every CLI invocation fromcli/main.py:99-104, caches result for 24h (positive) / 5min (negative), prints a stderr warning with a copy-pasteuv tool install --force <url>command. Opt-out:AGNES_NO_UPDATE_CHECK=1.cli/client.py:216 get_client()— the sharedhttpx.Clientfactory. Single chokepoint for response-header inspection.- Hook installer at
cli/lib/hooks.py:install_claude_hookswrites:SessionStart→agnes pull --quiet 2>/dev/null || trueSessionEnd→agnes push --quiet 2>/dev/null || true
What's missing:
- The CLI prints a copy-paste command but never executes the upgrade.
- No
min_versionfloor — drift is unbounded; a wire break gives a cryptic 500 instead of a clear "you're too old, upgrade". - No SessionStart hook for proactive upgrade — analyst must notice the warning, copy, paste, run.
- The server-side comment on
/cli/latest(app/api/cli_artifacts.py:47) and the docstring incli/update_check.pystill reference the olddabinary name; cleanup while we're in there.
Design
Two layers, complementary, with different latencies and failure modes.
Layer A — proactive auto-upgrade (SessionStart hook + new CLI command)
agnes init writes a single SessionStart hook entry that chains self-upgrade and pull with ; so ordering is guaranteed by the shell, not by undocumented Claude Code hook-execution semantics:
SessionStart → agnes self-upgrade --quiet 2>/dev/null || true; agnes pull --quiet 2>/dev/null || true
SessionEnd → agnes push --quiet 2>/dev/null || true
The ; runs both unconditionally; each || true keeps a single failure from aborting the line. We lose nothing the design relied on (the "upgrade fail does not block pull" property is preserved by the second || true), and we gain an ordering guarantee that holds across every Claude Code version.
agnes self-upgrade [--quiet] [--check-only] [--force]:
- Set
AGNES_SELF_UPGRADE_IN_PROGRESS=1inos.environfor the duration of the call. Layer B's header check reads this sentinel and skips the hard-stop while we're upgrading — without this, a later refactor that hasself-upgradecallingget_client()(e.g. for auth) would loop: hit< min, exit 2 with "Run: agnes self-upgrade" — insideagnes self-upgrade. Sentinel propagates to subprocesses via the explicitenv=we pass to the smoke test. - If
--force, invalidate theupdate_check.jsoncache before probing, so we always pick up the server's currentdownload_url. - Reuse
cli.update_check.check(server_url)— same/cli/latestcall, same cache, same version comparison. No second polling path. - If
info is None(disabled / no server / unknown local version) or(not force and not info.is_outdated())→ exit 0. --check-only→ printformat_outdated_notice(info), exit 1 if outdated, 0 if current.- Otherwise: snapshot
prior_url = _read_last_known_good()(the URL of the version we last successfully smoke-tested into; may beNoneon first upgrade — best-effort rollback only). Then reinstall:uvavailable (shutil.which("uv")) →uv tool install --force "<download_url>"- else → download wheel to
mktemp -d(curl), then[sys.executable, "-m", "pip", "install", "--force-reinstall", "--no-deps", <wheel>]. Crucially usessys.executable(the running CLI's interpreter) rather thanpython3(PATH-resolved system Python), and does not pass--user— both would land the wheel outside the uv-tool venv that owns theagnesbinary, silently no-op'ing the upgrade.
- Smoke-test the new binary before declaring success — but not via
shutil.which("agnes"). PATH may shadow the just-installed binary with a stale/usr/local/bin/agnesfrom an oldpip install --useror Homebrew shim, in which case--versionwould print the old version and report success. Instead, locate the binary deterministically:- uv path → call
uv tool dir --bin(one subprocess; uv's--binflag returns the directory containing entrypoint shims, working transparently across POSIX/Windows). Look foragnesthenagnes.exein that directory. - pip path →
<sys.executable parent>/agnes(POSIX) or<sys.executable parent>/agnes.exe(Windows) — the sibling of the running interpreter, which is the venv pip just rewrote. Thensubprocess.run([str(binary), "--version"], env={**os.environ, "AGNES_NO_UPDATE_CHECK": "1", "AGNES_SELF_UPGRADE_IN_PROGRESS": "1"}, timeout=10, capture_output=True). Smoke passes when returncode is 0 and the trailing token of stdout parses to apackaging.version.Versionequal toinfo.latest— equality onVersion()(not substring), so0.40.0does not falsely match0.40.10and PEP 440 local segments are handled.
- uv path → call
- On smoke fail: if
prior_urlis set and ≠info.download_url, attempt a single rollback install ofprior_urlvia the same uv/pip path. Capture the rollback's return code — if it's non-zero, the CLI is in a broken state, surface this on stderr alongside the bootstrap-recovery command. Ifprior_urlisNone(first-ever upgrade) or rollback also fails, stderr printsRun: curl -fsSL <server>/cli/install.sh | bash— the canonical bootstrap path that doesn't depend on local state. Either wayraise typer.Exit(1). - On smoke pass:
_record_last_known_good(info.download_url)(writes~/.config/agnes/last_known_good.json— separate fromupdate_check.json, updated only after a verified-good install) then_invalidate_update_cache(). Convention; no correctness consequence either way. --quietsuppresses progress output; stderr always passes through on install / smoke / rollback failures —--quietis for routine success runs (the SessionStart hook), not a gag on errors.--force+ offline.--forceinvalidates the cache before probing/cli/latest. If the probe fails (network down),--forceraisestyper.Exit(1)withcannot reach <server>/cli/lateston stderr — explicit destructive intent deserves explicit feedback. Without--force, an offline probe is silent (the implicit warning loop's contract).--check-onlyis read-only intent — exit 0 on transport errors. Even with--force, when the probe is unreachable under--check-only, the command exits 0 silently rather than surfacing the error:--check-onlyshould never produce a non-zero exit unless the CLI is known outdated. (--forcesemantics still apply to the actual install path; pairing--check-only --forceis well-defined: it invalidates the cache, fresh-probes, prints status, never installs.)AGNES_NO_UPDATE_CHECK=1silences the implicit warning loop only. Explicitagnes self-upgradecallscheck(server_url, bypass_disabled=True)so the env var does not turn a user-typed upgrade command into a silent no-op.
Platform support: smoke test branches on sys.platform == "win32" for the .exe suffix; the rest of the flow is platform-neutral via uv. Windows is supported on a best-effort basis (analyst laptops are predominantly macOS/Linux).
Honors the existing AGNES_NO_UPDATE_CHECK=1 opt-out — same flag, same intent. No new opt-out env var.
Latency: runs once at session start, blocks pull by ~3-10s on upgrade (install + ~1s smoke test), ~0.2s when in-sync (one cached HTTP roundtrip + early-out).
Failure modes: offline / server down → || true → session continues on old version. Install succeeds but new wheel is broken → smoke test catches it, attempts rollback, prints recovery instructions. Layer B catches drift on the next API call.
Layer B — reactive verification (response headers)
Every /api/* response includes two headers (FastAPI middleware):
X-Agnes-Latest-Version: 0.40.0—APP_VERSION, same value the install script bakes in.X-Agnes-Min-Version: 0.0.0— oldest CLI version the server still accepts. Lives in a single Python constant. Bumped manually when a wire-protocol break ships. Ships at0.0.0on day one so rollout doesn't accidentally lock anyone out — first deliberate gate is the first time this gets bumped.
The shared HTTP client (cli/client.py:216) inspects these on every response:
| Local CLI version | Behavior |
|---|---|
>= latest |
nothing |
>= min and < latest |
nothing — Layer A's startup poll already prints the warning; no need to nag again on every API call |
< min |
print error: agnes <local> is incompatible with server <latest> (min required: <min>). Run: agnes self-upgrade and sys.exit(2). Operation is not performed. |
Recursion barrier: _check_version_headers short-circuits (returns silently, no enforcement) when os.environ.get("AGNES_SELF_UPGRADE_IN_PROGRESS") == "1". Set by Layer A's command for the duration of the upgrade so the in-flight agnes self-upgrade cannot be locked out from itself by a < min response on any internal /api/* call. The sentinel is process-local and propagates to the smoke-test subprocess via explicit env=.
The CLI also sends User-Agent: agnes/<version> (<platform>) so the server can audit drift in access logs.
Day-one floor. MIN_COMPAT_CLI_VERSION = "0.0.0" — no enforcement. The constant + middleware + CLI inspection are an opt-in mechanism for the future. When a wire break ships, the engineer bumps the constant in the same PR and adds a **BREAKING** CHANGELOG bullet — same review discipline as every other behavior change. No standalone CI gate, no doc, no PR-template checkbox: those would be theater that catches nothing real (an engineer can check a box without bumping a constant). The mechanism stays free-to-use; the policy is one constant change away when someone needs it.
How the two layers compose
| Scenario | Layer A | Layer B | Outcome |
|---|---|---|---|
| Happy path | upgrade silent (already current) | headers OK | no output |
| Drift caught at session start | upgrades to latest | headers OK after upgrade | brief "installed: 0.40.0" line if not --quiet |
| Hook failed (offline at session start), online now | no-op | < latest ⇒ silent (warning still printed by update_check from main callback) |
analyst sees one warning, runs agnes self-upgrade manually |
Server shipped a wire break, analyst is < min |
hook would have caught it, but maybe the analyst skipped Claude Code | hard-stop with remediation | exit 2, clear message |
| Headless / CI / ad-hoc terminal (no Claude Code) | hook never runs | warning + hard-stop still apply | covered |
Server-side changes
app/version.py (new — single source of truth)
"""Single source of truth for app + CLI compat versions."""
import importlib.metadata
APP_VERSION = importlib.metadata.version("agnes-the-ai-analyst")
# Bump when shipping a wire-protocol break. Older CLIs are blocked at the
# response-header layer with exit 2 + remediation message. Day-one value
# of 0.0.0 means no enforcement — set the floor the first time a deliberate
# break ships.
MIN_COMPAT_CLI_VERSION = "0.0.0"
app/main.py — middleware
@app.middleware("http")
async def add_version_headers(request, call_next):
response = await call_next(request)
if request.url.path.startswith("/api/"):
response.headers["X-Agnes-Latest-Version"] = APP_VERSION
response.headers["X-Agnes-Min-Version"] = MIN_COMPAT_CLI_VERSION
return response
Applied only to /api/ so marketplace / wheel / web UI responses stay clean. Verify CORS expose_headers includes these (or *).
app/api/cli_artifacts.py — fix stale da reference
Drive-by: line 47 still says "Consumed by da CLI's auto-update check". Update to agnes. No behavior change.
/cli/latest itself stays as-is — pure metadata about the wheel on disk. min_version is a server-policy concern (per-request), not wheel metadata, so it lives on the headers and not in this payload.
CLI-side changes
cli/commands/self_upgrade.py (new)
Logic per Layer A above. ~80 lines including the install subprocess call. Reuses:
cli.update_check.check()for the version probe (identical to whatcli/main.py:102already calls)cli.update_check.format_outdated_notice()for--check-onlyoutputcli.config.get_server_url()for the server URLshutil.which("uv")to choose install pathsubprocess.runwithcheck=Trueto surface install failures
Wire into cli/main.py near the existing typer registrations.
cli/client.py:get_client() — header inspection
Wrap the returned httpx.Client so every response goes through one hook. Cleanest is httpx.Client(event_hooks={"response": [_check_version_headers]}):
def _check_version_headers(response: httpx.Response) -> None:
latest = response.headers.get("X-Agnes-Latest-Version")
minv = response.headers.get("X-Agnes-Min-Version")
if not latest or not minv:
return # talking to an older server; no enforcement
local = _installed_version() # reuse from update_check
if local == "unknown":
return # dev install / editable; never block
if _version_lt(local, minv): # reuse update_check._version_lt
sys.stderr.write(
f"error: agnes {local} is incompatible with server {latest}"
f" (min required: {minv}). Run: agnes self-upgrade\n"
)
sys.exit(2)
Only the hard-stop is enforced here — drift warnings are already handled by update_check in the root callback, no point doubling them on every API call.
_version_lt and _installed_version move from cli/update_check.py into cli/_version_compat.py (or stay in update_check.py and client.py imports them) — pick whichever keeps imports simple. Both files need them.
User-Agent: extend get_client() to set headers={"User-Agent": f"agnes/{_installed_version()} ({platform.system().lower()})"} (merge with caller-supplied headers).
cli/lib/hooks.py:install_claude_hooks — chain self-upgrade ahead of pull
_OUR_COMMAND_MARKERS = ("agnes self-upgrade", "agnes pull", "agnes push", "da sync")
_replace_or_add(
"SessionStart",
"agnes self-upgrade --quiet 2>/dev/null || true; "
"agnes pull --quiet 2>/dev/null || true",
)
_replace_or_add("SessionEnd", "agnes push --quiet 2>/dev/null || true")
Single chained SessionStart entry. Shell ; guarantees ordering (no reliance on Claude Code's undocumented multi-hook semantics); each || true ensures one segment's failure does not abort the line. _OUR_COMMAND_MARKERS is extended so re-running agnes init recognises the chained line on substring match and replaces rather than duplicates.
Drive-by cleanup
cli/update_check.py docstring (lines 1-9) still references da four times. Update to agnes. No behavior change.
Tests
Server
- New:
tests/test_version_headers_middleware.py—/api/sync/trigger(or any cheap/api/*) returns both headers;/web/*and/cli/*do not. - Existing
/cli/latesttests already cover the wheel metadata path.
CLI
tests/test_self_upgrade.py— mockupdate_check.check(), mocksubprocess.run, assert correct command shape (uv vs pip path), assert--check-onlyexits 1 when outdated and 0 when current, assert--forceskips theis_outdated()short-circuit, assert success path invalidates theupdate_check.jsoncache.tests/test_client_version_check.py— fake response withmin > local⇒SystemExit(2). Fake response withlatest > local >= min⇒ no stderr, no exit. Localunknown⇒ no enforcement. Missing headers (old server) ⇒ no enforcement.tests/test_lib_hooks.py— assert the chained command is the sole SessionStart entry, thatself-upgradeprecedespull, that both segments end in|| true, and that re-runninginstall_claude_hooksstays idempotent (length stays at 1).
Migration / rollout
- Additive — no breaking change. Old CLIs (no header check, no self-upgrade command) keep working; old servers (no headers) make the new CLI silent (no enforcement, just the existing warning loop).
- Ship in one PR. CHANGELOG entry under
### Added: "CLI now auto-upgrades from the server at session start (agnes self-upgrade) and hard-stops on incompatible-version mismatch via response headers." - After merge, manually bump
MIN_COMPAT_CLI_VERSIONin the next PR that ships a wire-protocol break — that's the first time the hard-stop actually fires.
Self-review
- Spec coverage: both layers (A/B), both directions (check + enforce), reuse of
update_checkto avoid two polling paths, hook idempotency, drive-byda → agnescleanup. ✓ - Resolved during review: A (
cli/client.py:216+cli/main.py:99-104), B (MIN_COMPAT_CLI_VERSION = "0.0.0"on day one), D (reuseAGNES_NO_UPDATE_CHECK, no new opt-out flag). - No placeholders: every component has a concrete file path and existing-symbol reference.
- Type/name consistency:
APP_VERSION,MIN_COMPAT_CLI_VERSION,X-Agnes-Latest-Version,X-Agnes-Min-Version,agnes self-upgrade, reusedupdate_check.check()/format_outdated_notice()/_version_lt()/_installed_version()— consistent throughout. - Spec, not plan: no per-step TDD breakdown. Convert to a plan once reviewed.