agnes-the-ai-analyst/docs/archive/superpowers/specs/2026-05-06-cli-auto-upgrade-spec.md
ZdenekSrotyr a48524509a
docs: consolidate and de-clutter the documentation tree (#306)
CLAUDE.md rewritten (708 -> ~320 lines): four overlapping release
sections collapsed to one, stale v1->v35 schema history dropped (it
lives in CHANGELOG), marketplace endpoint internals and verbose
process sections moved out or tightened.

New focused docs:
- docs/RELEASING.md - release process, deploy workflows, CI quirks
  (RELEASE_TEMPLATE.md folded in as an appendix)
- docs/marketplace.md - marketplace ingestion + re-serving internals
- docs/README.md - documentation index by audience, linked from
  README.md and CLAUDE.md

Archived under docs/archive/: docs/superpowers/ (52 historical
planning artifacts), HACKATHON.md, pd-ps-comments.md,
security-audit-2026-04.md, future/NOTIFICATIONS.md.

Removed the docs/auto-install.md stub. Fixed dangling links in
connectors/jira/README.md and dev_docs/README.md, repointed
code/doc references to archived paths.
2026-05-14 18:54:22 +00:00

241 lines
18 KiB
Markdown

# CLI Auto-Upgrade — Server-Pinned Version
> **Status:** spec / design. Convert to an implementation plan in `docs/superpowers/plans/` once reviewed.
**Goal:** Keep an analyst's locally-installed `agnes` CLI in sync with the server it talks to. The server is the single source of truth for "what version should be running"; the CLI never asks PyPI, only the server.
**Why now:** today an analyst installs once via `uv tool install $SERVER/cli/wheel/<name>` and drifts arbitrarily. The CLI already prints a *warning* when out of date but never upgrades itself, and there's no hard-stop when a wire-protocol break ships — drifted clients fail with cryptic errors instead of being told to upgrade.
**Non-goal:** distributing the CLI through PyPI, GitHub releases, or any out-of-band channel. The wheel lives next to the server (`/app/dist/*.whl`) and is served by `app/api/cli_artifacts.py`.
---
## What already exists
The first half of this design **is already shipped**, just incomplete:
- **`GET /cli/latest`** (`app/api/cli_artifacts.py:42`) → `{version, wheel_filename, download_url_path}`. Public, no auth.
- **`GET /cli/wheel/{name}`** + `/cli/download` + `/cli/install.sh` for distribution.
- **`cli/update_check.py`** — polls `/cli/latest` on every CLI invocation from `cli/main.py:99-104`, caches result for 24h (positive) / 5min (negative), prints a stderr warning with a copy-paste `uv tool install --force <url>` command. Opt-out: `AGNES_NO_UPDATE_CHECK=1`.
- **`cli/client.py:216 get_client()`** — the shared `httpx.Client` factory. Single chokepoint for response-header inspection.
- **Hook installer** at `cli/lib/hooks.py:install_claude_hooks` writes:
- `SessionStart``agnes pull --quiet 2>/dev/null || true`
- `SessionEnd``agnes push --quiet 2>/dev/null || true`
What's missing:
1. The CLI prints a copy-paste command but never **executes** the upgrade.
2. No `min_version` floor — drift is unbounded; a wire break gives a cryptic 500 instead of a clear "you're too old, upgrade".
3. No SessionStart hook for proactive upgrade — analyst must notice the warning, copy, paste, run.
4. The server-side comment on `/cli/latest` (`app/api/cli_artifacts.py:47`) and the docstring in `cli/update_check.py` still reference the old `da` binary name; cleanup while we're in there.
---
## Design
Two layers, complementary, with different latencies and failure modes.
### Layer A — proactive auto-upgrade (SessionStart hook + new CLI command)
`agnes init` writes a **single** SessionStart hook entry that chains self-upgrade and pull with `;` so ordering is guaranteed by the shell, not by undocumented Claude Code hook-execution semantics:
```
SessionStart → agnes self-upgrade --quiet 2>/dev/null || true; agnes pull --quiet 2>/dev/null || true
SessionEnd → agnes push --quiet 2>/dev/null || true
```
The `;` runs both unconditionally; each `|| true` keeps a single failure from aborting the line. We lose nothing the design relied on (the *"upgrade fail does not block pull"* property is preserved by the second `|| true`), and we gain an ordering guarantee that holds across every Claude Code version.
`agnes self-upgrade [--quiet] [--check-only] [--force]`:
1. Set `AGNES_SELF_UPGRADE_IN_PROGRESS=1` in `os.environ` for the duration of the call. Layer B's header check reads this sentinel and *skips* the hard-stop while we're upgrading — without this, a later refactor that has `self-upgrade` calling `get_client()` (e.g. for auth) would loop: hit `< min`, exit 2 with *"Run: agnes self-upgrade"* — inside `agnes self-upgrade`. Sentinel propagates to subprocesses via the explicit `env=` we pass to the smoke test.
2. If `--force`, **invalidate** the `update_check.json` cache *before* probing, so we always pick up the server's current `download_url`.
3. Reuse `cli.update_check.check(server_url)` — same `/cli/latest` call, same cache, same version comparison. No second polling path.
4. If `info is None` (disabled / no server / unknown local version) or `(not force and not info.is_outdated())` → exit 0.
5. `--check-only` → print `format_outdated_notice(info)`, exit 1 if outdated, 0 if current.
6. Otherwise: snapshot `prior_url = _read_last_known_good()` (the URL of the version we last successfully smoke-tested into; may be `None` on first upgrade — best-effort rollback only). Then reinstall:
- `uv` available (`shutil.which("uv")`) → `uv tool install --force "<download_url>"`
- else → download wheel to `mktemp -d` (curl), then `[sys.executable, "-m", "pip", "install", "--force-reinstall", "--no-deps", <wheel>]`. **Crucially** uses `sys.executable` (the running CLI's interpreter) rather than `python3` (PATH-resolved system Python), and **does not** pass `--user` — both would land the wheel outside the uv-tool venv that owns the `agnes` binary, silently no-op'ing the upgrade.
7. **Smoke-test the new binary** before declaring success — but not via `shutil.which("agnes")`. PATH may shadow the just-installed binary with a stale `/usr/local/bin/agnes` from an old `pip install --user` or Homebrew shim, in which case `--version` would print the *old* version and report success. Instead, locate the binary deterministically:
- **uv path** → call `uv tool dir --bin` (one subprocess; uv's `--bin` flag returns the directory containing entrypoint shims, working transparently across POSIX/Windows). Look for `agnes` then `agnes.exe` in that directory.
- **pip path** → `<sys.executable parent>/agnes` (POSIX) or `<sys.executable parent>/agnes.exe` (Windows) — the sibling of the running interpreter, which is the venv pip just rewrote.
Then `subprocess.run([str(binary), "--version"], env={**os.environ, "AGNES_NO_UPDATE_CHECK": "1", "AGNES_SELF_UPGRADE_IN_PROGRESS": "1"}, timeout=10, capture_output=True)`. Smoke passes when returncode is 0 **and** the trailing token of stdout parses to a `packaging.version.Version` equal to `info.latest` — equality on `Version()` (not substring), so `0.40.0` does not falsely match `0.40.10` and PEP 440 local segments are handled.
8. On smoke fail: if `prior_url` is set and ≠ `info.download_url`, attempt a single rollback install of `prior_url` via the same uv/pip path. **Capture the rollback's return code** — if it's non-zero, the CLI is in a broken state, surface this on stderr alongside the bootstrap-recovery command. If `prior_url` is `None` (first-ever upgrade) or rollback also fails, stderr prints `Run: curl -fsSL <server>/cli/install.sh | bash` — the canonical bootstrap path that doesn't depend on local state. Either way `raise typer.Exit(1)`.
9. On smoke pass: `_record_last_known_good(info.download_url)` (writes `~/.config/agnes/last_known_good.json` — separate from `update_check.json`, updated only after a verified-good install) then `_invalidate_update_cache()`. Convention; no correctness consequence either way.
10. `--quiet` suppresses progress output; **stderr always passes through on install / smoke / rollback failures**`--quiet` is for routine success runs (the SessionStart hook), not a gag on errors.
11. **`--force` + offline.** `--force` invalidates the cache before probing `/cli/latest`. If the probe fails (network down), `--force` raises `typer.Exit(1)` with `cannot reach <server>/cli/latest` on stderr — explicit destructive intent deserves explicit feedback. Without `--force`, an offline probe is silent (the implicit warning loop's contract).
12. **`--check-only` is read-only intent — exit 0 on transport errors.** Even with `--force`, when the probe is unreachable under `--check-only`, the command exits 0 silently rather than surfacing the error: `--check-only` should never produce a non-zero exit unless the CLI is *known* outdated. (`--force` semantics still apply to the actual install path; pairing `--check-only --force` is well-defined: it invalidates the cache, fresh-probes, prints status, never installs.)
13. **`AGNES_NO_UPDATE_CHECK=1`** silences the implicit warning loop only. Explicit `agnes self-upgrade` calls `check(server_url, bypass_disabled=True)` so the env var does not turn a user-typed upgrade command into a silent no-op.
**Platform support:** smoke test branches on `sys.platform == "win32"` for the `.exe` suffix; the rest of the flow is platform-neutral via uv. Windows is supported on a best-effort basis (analyst laptops are predominantly macOS/Linux).
Honors the existing `AGNES_NO_UPDATE_CHECK=1` opt-out — same flag, same intent. No new opt-out env var.
**Latency:** runs once at session start, blocks pull by ~3-10s on upgrade (install + ~1s smoke test), ~0.2s when in-sync (one cached HTTP roundtrip + early-out).
**Failure modes:** offline / server down → `|| true` → session continues on old version. Install succeeds but new wheel is broken → smoke test catches it, attempts rollback, prints recovery instructions. Layer B catches drift on the next API call.
### Layer B — reactive verification (response headers)
Every `/api/*` response includes two headers (FastAPI middleware):
- `X-Agnes-Latest-Version: 0.40.0``APP_VERSION`, same value the install script bakes in.
- `X-Agnes-Min-Version: 0.0.0` — oldest CLI version the server still accepts. Lives in a single Python constant. Bumped manually when a wire-protocol break ships. **Ships at `0.0.0` on day one** so rollout doesn't accidentally lock anyone out — first deliberate gate is the first time this gets bumped.
The shared HTTP client (`cli/client.py:216`) inspects these on every response:
| Local CLI version | Behavior |
|---|---|
| `>= latest` | nothing |
| `>= min` and `< latest` | nothing — Layer A's startup poll already prints the warning; no need to nag again on every API call |
| `< min` | print `error: agnes <local> is incompatible with server <latest> (min required: <min>). Run: agnes self-upgrade` and `sys.exit(2)`. **Operation is not performed.** |
**Recursion barrier:** `_check_version_headers` short-circuits (returns silently, no enforcement) when `os.environ.get("AGNES_SELF_UPGRADE_IN_PROGRESS") == "1"`. Set by Layer A's command for the duration of the upgrade so the in-flight `agnes self-upgrade` cannot be locked out from itself by a `< min` response on any internal `/api/*` call. The sentinel is process-local and propagates to the smoke-test subprocess via explicit `env=`.
The CLI also sends `User-Agent: agnes/<version> (<platform>)` so the server can audit drift in access logs.
**Day-one floor.** `MIN_COMPAT_CLI_VERSION = "0.0.0"` — no enforcement. The constant + middleware + CLI inspection are an opt-in mechanism for the future. When a wire break ships, the engineer bumps the constant in the same PR and adds a `**BREAKING**` CHANGELOG bullet — same review discipline as every other behavior change. No standalone CI gate, no doc, no PR-template checkbox: those would be theater that catches nothing real (an engineer can check a box without bumping a constant). The mechanism stays free-to-use; the policy is one constant change away when someone needs it.
### How the two layers compose
| Scenario | Layer A | Layer B | Outcome |
|---|---|---|---|
| Happy path | upgrade silent (already current) | headers OK | no output |
| Drift caught at session start | upgrades to latest | headers OK after upgrade | brief "installed: 0.40.0" line if not `--quiet` |
| Hook failed (offline at session start), online now | no-op | `< latest` ⇒ silent (warning still printed by `update_check` from main callback) | analyst sees one warning, runs `agnes self-upgrade` manually |
| Server shipped a wire break, analyst is `< min` | hook would have caught it, but maybe the analyst skipped Claude Code | hard-stop with remediation | exit 2, clear message |
| Headless / CI / ad-hoc terminal (no Claude Code) | hook never runs | warning + hard-stop still apply | covered |
---
## Server-side changes
### `app/version.py` (new — single source of truth)
```python
"""Single source of truth for app + CLI compat versions."""
import importlib.metadata
APP_VERSION = importlib.metadata.version("agnes-the-ai-analyst")
# Bump when shipping a wire-protocol break. Older CLIs are blocked at the
# response-header layer with exit 2 + remediation message. Day-one value
# of 0.0.0 means no enforcement — set the floor the first time a deliberate
# break ships.
MIN_COMPAT_CLI_VERSION = "0.0.0"
```
### `app/main.py` — middleware
```python
@app.middleware("http")
async def add_version_headers(request, call_next):
response = await call_next(request)
if request.url.path.startswith("/api/"):
response.headers["X-Agnes-Latest-Version"] = APP_VERSION
response.headers["X-Agnes-Min-Version"] = MIN_COMPAT_CLI_VERSION
return response
```
Applied only to `/api/` so marketplace / wheel / web UI responses stay clean. Verify CORS `expose_headers` includes these (or `*`).
### `app/api/cli_artifacts.py` — fix stale `da` reference
Drive-by: line 47 still says *"Consumed by `da` CLI's auto-update check"*. Update to `agnes`. No behavior change.
`/cli/latest` itself stays as-is — pure metadata about the wheel on disk. `min_version` is a server-policy concern (per-request), not wheel metadata, so it lives on the headers and not in this payload.
---
## CLI-side changes
### `cli/commands/self_upgrade.py` (new)
Logic per Layer A above. ~80 lines including the install subprocess call. Reuses:
- `cli.update_check.check()` for the version probe (identical to what `cli/main.py:102` already calls)
- `cli.update_check.format_outdated_notice()` for `--check-only` output
- `cli.config.get_server_url()` for the server URL
- `shutil.which("uv")` to choose install path
- `subprocess.run` with `check=True` to surface install failures
Wire into `cli/main.py` near the existing typer registrations.
### `cli/client.py:get_client()` — header inspection
Wrap the returned `httpx.Client` so every response goes through one hook. Cleanest is `httpx.Client(event_hooks={"response": [_check_version_headers]})`:
```python
def _check_version_headers(response: httpx.Response) -> None:
latest = response.headers.get("X-Agnes-Latest-Version")
minv = response.headers.get("X-Agnes-Min-Version")
if not latest or not minv:
return # talking to an older server; no enforcement
local = _installed_version() # reuse from update_check
if local == "unknown":
return # dev install / editable; never block
if _version_lt(local, minv): # reuse update_check._version_lt
sys.stderr.write(
f"error: agnes {local} is incompatible with server {latest}"
f" (min required: {minv}). Run: agnes self-upgrade\n"
)
sys.exit(2)
```
Only the hard-stop is enforced here — drift warnings are already handled by `update_check` in the root callback, no point doubling them on every API call.
`_version_lt` and `_installed_version` move from `cli/update_check.py` into `cli/_version_compat.py` (or stay in `update_check.py` and `client.py` imports them) — pick whichever keeps imports simple. Both files need them.
User-Agent: extend `get_client()` to set `headers={"User-Agent": f"agnes/{_installed_version()} ({platform.system().lower()})"}` (merge with caller-supplied headers).
### `cli/lib/hooks.py:install_claude_hooks` — chain self-upgrade ahead of pull
```python
_OUR_COMMAND_MARKERS = ("agnes self-upgrade", "agnes pull", "agnes push", "da sync")
_replace_or_add(
"SessionStart",
"agnes self-upgrade --quiet 2>/dev/null || true; "
"agnes pull --quiet 2>/dev/null || true",
)
_replace_or_add("SessionEnd", "agnes push --quiet 2>/dev/null || true")
```
Single chained SessionStart entry. Shell `;` guarantees ordering (no reliance on Claude Code's undocumented multi-hook semantics); each `|| true` ensures one segment's failure does not abort the line. `_OUR_COMMAND_MARKERS` is extended so re-running `agnes init` recognises the chained line on substring match and replaces rather than duplicates.
### Drive-by cleanup
`cli/update_check.py` docstring (lines 1-9) still references `da` four times. Update to `agnes`. No behavior change.
---
## Tests
### Server
- New: `tests/test_version_headers_middleware.py``/api/sync/trigger` (or any cheap `/api/*`) returns both headers; `/web/*` and `/cli/*` do not.
- Existing `/cli/latest` tests already cover the wheel metadata path.
### CLI
- `tests/test_self_upgrade.py` — mock `update_check.check()`, mock `subprocess.run`, assert correct command shape (uv vs pip path), assert `--check-only` exits 1 when outdated and 0 when current, assert `--force` skips the `is_outdated()` short-circuit, assert success path invalidates the `update_check.json` cache.
- `tests/test_client_version_check.py` — fake response with `min > local``SystemExit(2)`. Fake response with `latest > local >= min` ⇒ no stderr, no exit. Local `unknown` ⇒ no enforcement. Missing headers (old server) ⇒ no enforcement.
- `tests/test_lib_hooks.py` — assert the chained command is the sole SessionStart entry, that `self-upgrade` precedes `pull`, that both segments end in `|| true`, and that re-running `install_claude_hooks` stays idempotent (length stays at 1).
---
## Migration / rollout
- Additive — no breaking change. Old CLIs (no header check, no self-upgrade command) keep working; old servers (no headers) make the new CLI silent (no enforcement, just the existing warning loop).
- Ship in one PR. CHANGELOG entry under `### Added`: "CLI now auto-upgrades from the server at session start (`agnes self-upgrade`) and hard-stops on incompatible-version mismatch via response headers."
- After merge, manually bump `MIN_COMPAT_CLI_VERSION` in the next PR that ships a wire-protocol break — that's the first time the hard-stop actually fires.
---
## Self-review
- **Spec coverage:** both layers (A/B), both directions (check + enforce), reuse of `update_check` to avoid two polling paths, hook idempotency, drive-by `da → agnes` cleanup. ✓
- **Resolved during review:** A (`cli/client.py:216` + `cli/main.py:99-104`), B (`MIN_COMPAT_CLI_VERSION = "0.0.0"` on day one), D (reuse `AGNES_NO_UPDATE_CHECK`, no new opt-out flag).
- **No placeholders:** every component has a concrete file path and existing-symbol reference.
- **Type/name consistency:** `APP_VERSION`, `MIN_COMPAT_CLI_VERSION`, `X-Agnes-Latest-Version`, `X-Agnes-Min-Version`, `agnes self-upgrade`, reused `update_check.check()` / `format_outdated_notice()` / `_version_lt()` / `_installed_version()` — consistent throughout.
- Spec, not plan: no per-step TDD breakdown. Convert to a plan once reviewed.