diff --git a/docs/superpowers/plans/2026-05-06-cli-auto-upgrade.md b/docs/superpowers/plans/2026-05-06-cli-auto-upgrade.md new file mode 100644 index 0000000..cddd3a9 --- /dev/null +++ b/docs/superpowers/plans/2026-05-06-cli-auto-upgrade.md @@ -0,0 +1,1546 @@ +# CLI Auto-Upgrade Implementation Plan + +> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. + +**Goal:** Make the `agnes` CLI auto-upgrade from the server it talks to. Two layers: (A) `agnes self-upgrade` invoked from a SessionStart hook for proactive upgrade; (B) an `X-Agnes-Min-Version` response header for a hard stop on incompatible drift. + +**Architecture:** The server already serves `/cli/latest` (wheel metadata) and `/cli/wheel/` (wheel bytes). The CLI already polls `/cli/latest` from `cli/update_check.py` and warns on drift. This plan adds: a server-side `MIN_COMPAT_CLI_VERSION` constant + middleware that stamps `X-Agnes-Latest-Version` / `X-Agnes-Min-Version` on every `/api/*` response; a CLI `agnes self-upgrade` command that reuses `update_check.check()` and shells out to `uv tool install --force` (pip fallback); response-header inspection in `cli/client.py:get_client()` that hard-stops with `sys.exit(2)` on `local < min`; and a single chained `SessionStart` hook entry that runs `agnes self-upgrade --quiet` ahead of `agnes pull`. + +**Tech Stack:** Python 3.12 / FastAPI / httpx / typer / uv / pytest. No new dependencies. + +**Spec:** `docs/superpowers/specs/2026-05-06-cli-auto-upgrade-spec.md` — read this first if context is unclear. + +--- + +## File Structure + +**New files:** +- `app/version.py` — `APP_VERSION` (deduped from `app/main.py:_app_version`) + `MIN_COMPAT_CLI_VERSION` constants. Single source of truth.
+- `cli/commands/self_upgrade.py` — `agnes self-upgrade` typer command, including smoke test (deterministic install path, not PATH-resolved), last-known-good record, rollback with rc capture, recursion sentinel, and explicit `--force` offline error. +- `tests/test_version_headers_middleware.py` — server middleware integration test. +- `tests/test_client_version_check.py` — header-inspection hard-stop test, including the `AGNES_SELF_UPGRADE_IN_PROGRESS` sentinel barrier. +- `tests/test_self_upgrade.py` — command behavior, subprocess shape, smoke-test rollback (with rc capture), `--force` offline failure, `AGNES_NO_UPDATE_CHECK` bypass for explicit upgrades, sentinel propagation. + +**Modified files:** +- `app/main.py` — delete `_app_version()`, import `APP_VERSION` from `app/version.py`, register version-headers middleware. +- `app/api/cli_artifacts.py` — drive-by docstring fix (`da` → `agnes`). +- `cli/client.py` — `get_client()` adds `event_hooks` for response inspection + `User-Agent` header. `_check_version_headers` short-circuits on `AGNES_SELF_UPGRADE_IN_PROGRESS=1`. +- `cli/main.py` — register `self_upgrade_app` typer. +- `cli/update_check.py` — drive-by docstring fix (`da` → `agnes`); add `bypass_disabled=False` keyword-only kwarg to `check()` so explicit `agnes self-upgrade` invocations can override `AGNES_NO_UPDATE_CHECK`; ensure `_version_lt` and `_installed_version` are importable from `cli/client.py` and `cli/commands/self_upgrade.py`. +- `cli/lib/hooks.py` — single chained SessionStart entry (`agnes self-upgrade ... || true; agnes pull ... || true`); extend `_OUR_COMMAND_MARKERS` with `agnes self-upgrade`. +- `tests/test_lib_hooks.py` — assert chained command + ordering + idempotency. +- `tests/test_app_version.py` — rewrite to target `app.version` (since `app.main._app_version` is deleted). +- `CHANGELOG.md` — `### Added` entry under `## [Unreleased]`. 
+- `pyproject.toml` — bump `[project].version` from `0.39.0` to `0.40.0` in the release-cut commit (Task 7). + +**Files this plan does NOT touch (by design):** +- `~/.config/agnes/last_known_good.json` — written at runtime by `_record_last_known_good` after the smoke test passes; a separate file from `update_check.json`. (Convention: record it before invalidating the update cache; the order has no correctness consequence.) +- `docs/CLI_COMPAT.md`, `.github/pull_request_template.md` — an earlier draft proposed these as enforcement scaffolding; dropped because a doc + checkbox catches nothing real (an engineer can check the box without bumping the constant). Layer B's mechanism stays opt-in for the day someone needs it and relies on the same review discipline as every other behavior change. + +--- + +## Task 1: Server-side version constants + middleware + +**Files:** +- Create: `app/version.py` +- Modify: `app/main.py` (top-level import + middleware registration; delete `_app_version()` and pass `APP_VERSION` from `app.version` to `FastAPI(version=...)`) +- Modify: `tests/test_app_version.py` (rewrite to target `app.version`) +- Create: `tests/test_version_headers_middleware.py` + +- [ ] **Step 1.1: Write the failing middleware test** + +Create `tests/test_version_headers_middleware.py`: + +```python +"""Verify /api/* responses carry X-Agnes-Latest-Version + X-Agnes-Min-Version.""" + +from fastapi.testclient import TestClient + + +def test_api_response_carries_version_headers(): + from app.main import app + from app.version import APP_VERSION, MIN_COMPAT_CLI_VERSION + client = TestClient(app) + # /api/version is unauthenticated and cheap. + resp = client.get("/api/version") + assert resp.status_code == 200 + # Headers must equal the constants in app.version, not just be parseable. + # When MIN_COMPAT_CLI_VERSION is deliberately bumped in a future PR, this + # test is updated in the same PR — the review-discipline guardrail.
+ assert resp.headers["X-Agnes-Latest-Version"] == APP_VERSION + assert resp.headers["X-Agnes-Min-Version"] == MIN_COMPAT_CLI_VERSION + # Day-one floor pin: drop or update this assertion when the floor moves. + assert resp.headers["X-Agnes-Min-Version"] == "0.0.0" + + +def test_non_api_response_does_not_carry_version_headers(): + from app.main import app + client = TestClient(app) + # /cli/latest is under /cli, not /api — should NOT carry the headers. + resp = client.get("/cli/latest") + assert resp.status_code == 200 + assert "X-Agnes-Latest-Version" not in resp.headers + assert "X-Agnes-Min-Version" not in resp.headers +``` + +- [ ] **Step 1.2: Run test, verify it fails** + +```bash +pytest tests/test_version_headers_middleware.py -v +``` +Expected: FAIL. `test_api_response_carries_version_headers` errors with `ModuleNotFoundError: No module named 'app.version'`; `test_non_api_response_does_not_carry_version_headers` already passes, since nothing stamps the headers yet. + +- [ ] **Step 1.3: Create `app/version.py`** + +```python +"""Single source of truth for app + CLI compat versions. + +`APP_VERSION` is read from package metadata so it tracks `pyproject.toml` +without a manual literal to keep in sync. + +`MIN_COMPAT_CLI_VERSION` is the oldest CLI version the server still accepts +on `/api/*`. Bumped manually when shipping a wire-protocol break. Day-one +value of "0.0.0" means no enforcement — set the floor the first time a +deliberate break ships. +""" + +from importlib.metadata import PackageNotFoundError +from importlib.metadata import version as _pkg_version + + +def _read_app_version() -> str: + try: + return _pkg_version("agnes-the-ai-analyst") + except PackageNotFoundError: + return "0.0.0+dev" + + +APP_VERSION = _read_app_version() +MIN_COMPAT_CLI_VERSION = "0.0.0" +``` + +- [ ] **Step 1.4: Replace `_app_version()` with `APP_VERSION` import + register middleware** + +Two changes in `app/main.py`: + +(a) **Dedupe.** Both `_app_version()` (line 40) and `app/version.py:APP_VERSION` read from `importlib.metadata.version("agnes-the-ai-analyst")` — keeping both invites drift.
Delete the `_app_version()` helper, import `APP_VERSION` at module top: + +```python +# At module top, alongside other app.* imports: +from app.version import APP_VERSION, MIN_COMPAT_CLI_VERSION + +# Delete the entire `_app_version()` function (line 40 onwards). + +# Replace line 186: +- version=_app_version(), ++ version=APP_VERSION, +``` + +(b) **Middleware.** After the `app = FastAPI(...)` instantiation block, add: + +```python +@app.middleware("http") +async def _add_version_headers(request, call_next): + response = await call_next(request) + if request.url.path.startswith("/api/"): + response.headers["X-Agnes-Latest-Version"] = APP_VERSION + response.headers["X-Agnes-Min-Version"] = MIN_COMPAT_CLI_VERSION + return response +``` + +(c) **Update `tests/test_app_version.py`** — the existing tests patch `app.main._pkg_version` and `app.main._app_version`, both of which no longer exist. Rewrite to target `app.version` AND keep the end-to-end pin that the FastAPI app object surfaces the constant: + +```python +"""Pin that APP_VERSION reads from package metadata, not a hardcoded literal, +and that the FastAPI app's `version=` field surfaces it end-to-end.""" + +import importlib +from unittest.mock import patch + + +def test_app_version_reads_package_metadata(): + with patch("app.version._pkg_version", return_value="9.9.9") as mock_pkg_ver: + import app.version + importlib.reload(app.version) + assert app.version.APP_VERSION == "9.9.9" + mock_pkg_ver.assert_called_once_with("agnes-the-ai-analyst") + + +def test_app_version_falls_back_when_package_missing(): + from importlib.metadata import PackageNotFoundError + with patch("app.version._pkg_version", side_effect=PackageNotFoundError): + import app.version + importlib.reload(app.version) + assert app.version.APP_VERSION == "0.0.0+dev" + + +def test_fastapi_app_version_matches_app_version_constant(): + """End-to-end: FastAPI's app.version (consumed by /openapi.json and + /docs) must equal app.version.APP_VERSION. 
+ Guards the wiring at + `app/main.py:186 version=APP_VERSION` against an accidental literal.""" + import app.version + import app.main + + # Reload both so we read post-patch values consistently. + with patch("app.version._pkg_version", return_value="7.7.7"): + importlib.reload(app.version) + importlib.reload(app.main) + assert app.main.app.version == "7.7.7" + assert app.main.app.version == app.version.APP_VERSION +``` + +The reload trick: `APP_VERSION` is set once at module import time; reloading under a patch reruns `_read_app_version()`. The third test reloads `app.main` after `app.version` so that its `from app.version import APP_VERSION` line picks up the new constant value. + +- [ ] **Step 1.5: Run test, verify it passes** + +```bash +pytest tests/test_version_headers_middleware.py -v +``` +Expected: PASS — both tests. + +- [ ] **Step 1.6: Run the version-related app tests to catch regressions** + +```bash +pytest tests/test_app_version.py tests/test_version_headers_middleware.py -v +``` +Expected: PASS. All three rewritten `tests/test_app_version.py` tests pass against `app.version` (the old `_app_version()` tests no longer apply, since Step 1.4 deleted that helper), plus both middleware tests. + +- [ ] **Step 1.7: Commit** + +```bash +git add app/version.py app/main.py tests/test_version_headers_middleware.py tests/test_app_version.py +git commit -m "feat(server): expose APP_VERSION + MIN_COMPAT_CLI_VERSION on /api/* response headers + +Adds X-Agnes-Latest-Version and X-Agnes-Min-Version headers to every +/api/* response. CLI consumes these to hard-stop on incompatible drift. +MIN_COMPAT_CLI_VERSION ships at 0.0.0 — no enforcement until a deliberate +wire-protocol break bumps it. + +Also dedupes app version logic: app/main.py:_app_version() helper deleted, +replaced by app/version.py:APP_VERSION as the single source of truth. +test_app_version.py rewritten to target app.version."
+``` + +--- + +## Task 2: CLI response-header version check + +**Files:** +- Modify: `cli/update_check.py` (export helpers — `_version_lt` and `_installed_version` must be reusable; rename to public if needed, or just import the underscore-prefixed names) +- Modify: `cli/client.py:get_client()` — add `event_hooks={"response": [_check_version_headers]}` and `User-Agent` +- Create: `tests/test_client_version_check.py` + +- [ ] **Step 2.1: Write the failing hard-stop test** + +Create `tests/test_client_version_check.py`: + +```python +"""Verify cli/client.py:get_client() hard-stops on min_version mismatch.""" + +from unittest.mock import patch + +import httpx +import pytest + + +def _fake_response(headers: dict) -> httpx.Response: + return httpx.Response(status_code=200, headers=headers, content=b"{}", request=httpx.Request("GET", "http://x/")) + + +def test_local_below_min_exits_with_code_2(): + from cli.client import _check_version_headers + with patch("cli.client._installed_version", return_value="0.30.0"): + resp = _fake_response({ + "X-Agnes-Latest-Version": "0.40.0", + "X-Agnes-Min-Version": "0.35.0", + }) + with pytest.raises(SystemExit) as exc: + _check_version_headers(resp) + assert exc.value.code == 2 + + +def test_local_at_or_above_min_does_not_exit(): + from cli.client import _check_version_headers + with patch("cli.client._installed_version", return_value="0.40.0"): + resp = _fake_response({ + "X-Agnes-Latest-Version": "0.40.0", + "X-Agnes-Min-Version": "0.35.0", + }) + _check_version_headers(resp) # must not raise + + +def test_missing_headers_no_enforcement(): + """Older server without middleware → no headers → no-op.""" + from cli.client import _check_version_headers + with patch("cli.client._installed_version", return_value="0.10.0"): + resp = _fake_response({}) # empty headers + _check_version_headers(resp) # must not raise + + +def test_unknown_local_version_no_enforcement(): + """Source-checkout / editable install → never block.""" + from 
cli.client import _check_version_headers + with patch("cli.client._installed_version", return_value="unknown"): + resp = _fake_response({ + "X-Agnes-Latest-Version": "0.40.0", + "X-Agnes-Min-Version": "0.35.0", + }) + _check_version_headers(resp) # must not raise + + +def test_self_upgrade_in_progress_disables_enforcement(monkeypatch): + """Recursion barrier: while self-upgrade runs, no /api/* call may + block on min-version drift. Otherwise an in-flight upgrade could + sys.exit(2) with 'Run: agnes self-upgrade' from inside itself.""" + from cli.client import _check_version_headers + monkeypatch.setenv("AGNES_SELF_UPGRADE_IN_PROGRESS", "1") + with patch("cli.client._installed_version", return_value="0.10.0"): + resp = _fake_response({ + "X-Agnes-Latest-Version": "0.40.0", + "X-Agnes-Min-Version": "0.35.0", + }) + _check_version_headers(resp) # must not raise +``` + +- [ ] **Step 2.2: Run test, verify it fails** + +```bash +pytest tests/test_client_version_check.py -v +``` +Expected: FAIL — `cli.client._check_version_headers` does not exist. + +- [ ] **Step 2.3: Implement `_check_version_headers` in `cli/client.py`** + +At the top of `cli/client.py`, near other imports, add: + +```python +import os +import sys + +from cli.update_check import _installed_version, _version_lt +``` + +Then before `get_client()`, define: + +```python +def _check_version_headers(response: "httpx.Response") -> None: + """Hard-stop the CLI when the server reports we're below min_version. + + Drift warnings (`local < latest`) are already printed by the + update_check root callback in cli/main.py — no need to nag again on + every API call. This hook only enforces the hard floor. + """ + # Recursion barrier: `agnes self-upgrade` sets this for the duration + # of the upgrade. Without it, a /api/* call inside the install flow + # could exit 2 with "Run: agnes self-upgrade" — inside agnes + # self-upgrade. 
The sentinel is process-local and propagates to + # subprocesses via the explicit env= passed to the smoke test. + if os.environ.get("AGNES_SELF_UPGRADE_IN_PROGRESS") == "1": + return + latest = response.headers.get("X-Agnes-Latest-Version") + minv = response.headers.get("X-Agnes-Min-Version") + if not latest or not minv: + return + local = _installed_version() + if local == "unknown": + return + if _version_lt(local, minv): + sys.stderr.write( + f"error: agnes {local} is incompatible with server {latest} " + f"(min required: {minv}). Run: agnes self-upgrade\n" + ) + sys.exit(2) +``` + +**Patch only `get_client()` — leave `_get_shared_client()` alone.** Post-rebase, `cli/client.py` has both `get_client()` (line 216, one-shot metadata calls) and `_get_shared_client()` (line 252, persistent HTTP/2 client used by `stream_download` for parquet bytes via chunked range requests). + +The hook is wired ONLY on `get_client()`: + +- httpx fires response event hooks **as soon as headers arrive**, before `iter_bytes()` consumes the body. On `_get_shared_client()`, `_check_version_headers` would run inside the `with client.stream(...) as response:` context of `_download_chunk` (`cli/client.py:452`) and `_download_single_stream` (`cli/client.py:595`). A `sys.exit(2)` from the hook kills the process mid-stream: `ThreadPoolExecutor` with N parallel chunk-writer threads, open `..partN` file handles, no `.tmp → final` rename. Half-written part files left on disk (the existing PID-reaper cleans those eventually, but the abrupt exit is ungraceful). +- In production, parquet downloads typically go through a Caddy `file_server` (PR #182) anyway, so FastAPI middleware doesn't stamp headers on the streaming responses. Skipping the hook on `_get_shared_client()` matches that production reality. In dev / non-Caddy deployments, parquet streaming bypasses the hard-stop — accepted gap. The next metadata call (which runs through `get_client()`) catches drift. 
+- All `/api/*` metadata calls (catalog, schema, snapshot create, sync trigger, auth, store, etc.) go through `get_client()`, where the hook fires safely on a fresh single-response client. + +Modify `get_client()` to wire the hook and a User-Agent. Locate the `httpx.Client(...)` constructor call and pass: + +```python +# At module top, alongside the other imports: +import platform + +# In get_client(), the constructor call becomes: +return httpx.Client( + base_url=server_url, + timeout=timeout, + headers={**headers, "User-Agent": f"agnes/{_installed_version()} ({platform.system().lower()})"}, + event_hooks={"response": [_check_version_headers]}, +) +``` + +`headers` already contains `Authorization` from the existing implementation; we merge in `User-Agent`. **Do not** modify `_get_shared_client()` — the streaming-response semantics make `sys.exit(2)` from a response event hook unsafe (see the rationale above). + +- [ ] **Step 2.4: Run test, verify it passes** + +```bash +pytest tests/test_client_version_check.py -v +``` +Expected: PASS — all five tests. + +- [ ] **Step 2.5: Run the existing CLI test suite to catch regressions** + +```bash +pytest tests/test_cli_update_check.py tests/test_client_version_check.py -v +``` +Expected: PASS — no regressions in update_check. + +- [ ] **Step 2.6: Commit** + +```bash +git add cli/client.py tests/test_client_version_check.py +git commit -m "feat(cli): hard-stop on incompatible-version response header + +Every get_client() response (all /api/* metadata calls) is inspected via +httpx event_hooks; streaming downloads through _get_shared_client() are +deliberately not hooked. When the server reports X-Agnes-Min-Version > +local, the CLI prints a remediation message and exits 2. Latest-version +drift continues to be handled by the update_check warning loop, so there +is no double warning on every API call." +``` + +--- + +## Task 3: `agnes self-upgrade` command + +**Files:** +- Modify: `cli/update_check.py` — add `bypass_disabled` kwarg to `check()`.
+- Create: `cli/commands/self_upgrade.py` +- Modify: `cli/main.py` — register the command +- Create: `tests/test_self_upgrade.py` + +- [ ] **Step 3.0: Extend `check()` with `bypass_disabled` kwarg** + +`AGNES_NO_UPDATE_CHECK=1` was designed to silence the implicit warning loop that runs in the root callback. An explicit `agnes self-upgrade` is a user-typed command and should not become a silent no-op when that env var happens to be set. Thread a keyword-only kwarg through: + +In `cli/update_check.py`, modify the signature and the disabled-check: + +```python +def check(server_url: Optional[str], *, bypass_disabled: bool = False) -> Optional[UpdateInfo]: + """...""" + if not bypass_disabled and is_disabled(): + return None + if not server_url: + return None + # ... rest unchanged +``` + +Existing callers (the root callback at `cli/main.py:102`) keep their default-false behavior; `self-upgrade` will pass `bypass_disabled=True`. Add a test in `tests/test_cli_update_check.py`: + +```python +def test_check_bypass_disabled_overrides_env(monkeypatch): + monkeypatch.setenv("AGNES_NO_UPDATE_CHECK", "1") + with patch("cli.update_check._fetch_latest", return_value={ + "version": "9.9.9", "wheel_filename": "x.whl", + "download_url_path": "/cli/wheel/x.whl", + }): + # Default: env var wins, returns None. + assert check("http://server.test") is None + # Bypass: env var ignored. + info = check("http://server.test", bypass_disabled=True) + assert info is not None and info.latest == "9.9.9" +``` + +Run the existing tests to catch regressions: + +```bash +pytest tests/test_cli_update_check.py -v +``` +Expected: PASS — old tests still green, new test passes. + +Commit at end of task; the kwarg is shipped together with `self-upgrade`. 
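A minimal, self-contained sketch of the gate ordering in Step 3.0, under stated assumptions: `is_disabled()` is assumed to treat only the literal `AGNES_NO_UPDATE_CHECK=1` as "off", the real `_fetch_latest` network probe is replaced by a hypothetical injected `fetch_latest` callable, and the function returns a bare version string instead of `UpdateInfo` to stay short. It shows that the bypass skips only the env-var gate (a missing server URL still short-circuits) and that the keyword-only `*` rejects an accidental positional boolean.

```python
import os
from typing import Callable, Optional


def is_disabled() -> bool:
    # Assumption: only the literal value "1" acts as the kill switch.
    return os.environ.get("AGNES_NO_UPDATE_CHECK") == "1"


def check(
    server_url: Optional[str],
    *,
    bypass_disabled: bool = False,
    fetch_latest: Callable[[str], Optional[str]] = lambda url: None,
) -> Optional[str]:
    """Return the server's latest CLI version, or None when the check is skipped.

    `fetch_latest` is a hypothetical stand-in for the real network probe.
    """
    # Implicit callers (the root callback) honour the env var; an explicit
    # `agnes self-upgrade` passes bypass_disabled=True to ignore it.
    if not bypass_disabled and is_disabled():
        return None
    # No server configured: nothing to probe, with or without the bypass.
    if not server_url:
        return None
    return fetch_latest(server_url)
```

Calling `check("http://server.test", True)` raises `TypeError`; that is the reason to make the flag keyword-only, since a bare `True` at a call site would not say which gate it disables.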
+ +- [ ] **Step 3.1: Write the failing tests** + +Create `tests/test_self_upgrade.py`: + +```python +"""Tests for `agnes self-upgrade` — install path, smoke test, rollback +(with rc capture), recursion barrier, --force offline failure, AGNES_NO_UPDATE_CHECK +bypass for explicit upgrades, --quiet stderr behavior, version-mismatch +smoke detection.""" + +import os +import sys +from unittest.mock import patch, MagicMock + +import pytest +from typer.testing import CliRunner + +from cli.main import app +from cli.update_check import UpdateInfo + +# Capture stderr separately so tests can assert on result.stderr. Click < 8.2 +# needs CliRunner(mix_stderr=False) (invoke() has no mix_stderr kwarg); Click +# 8.2 removed the option and always exposes result.stderr separately. +try: + runner = CliRunner(mix_stderr=False) +except TypeError: # Click >= 8.2 + runner = CliRunner() + + +@pytest.fixture(autouse=True) +def _ensure_no_sentinel_leak(monkeypatch): + """Don't rely on test order (plugins such as pytest-randomly reorder + tests); explicitly clear the recursion sentinel before every test so a + leaked value from a prior test doesn't produce a false-positive + 'cleared on exit' assertion.""" + monkeypatch.delenv("AGNES_SELF_UPGRADE_IN_PROGRESS", raising=False) + yield + + +_OUTDATED_URL = "http://server.test/cli/wheel/agnes-0.40.0-py3-none-any.whl" +_PRIOR_URL = "http://server.test/cli/wheel/agnes-0.35.0-py3-none-any.whl" + + +def _outdated_info(): + return UpdateInfo(installed="0.30.0", latest="0.40.0", download_url=_OUTDATED_URL) + + +def _current_info(): + return UpdateInfo(installed="0.40.0", latest="0.40.0", download_url=None) + + +def _smoke_pass(): + return (True, "agnes 0.40.0") + + +def _smoke_fail(): + return (False, "exit 1: ImportError: cannot import name 'foo'") + + +def test_check_only_when_outdated_exits_1(): + with patch("cli.commands.self_upgrade.check", return_value=_outdated_info()): + result = runner.invoke(app, ["self-upgrade", "--check-only"]) + assert result.exit_code == 1 + assert "out of date" in result.output + + +def test_check_only_when_current_exits_0(): + with patch("cli.commands.self_upgrade.check", return_value=_current_info()): + result = runner.invoke(app, ["self-upgrade", "--check-only"]) + assert result.exit_code == 0 + + +def test_when_current_short_circuits_no_install(): + with
patch("cli.commands.self_upgrade.check", return_value=_current_info()), \ + patch("cli.commands.self_upgrade.subprocess.run") as mock_run: + result = runner.invoke(app, ["self-upgrade"]) + assert result.exit_code == 0 + mock_run.assert_not_called() + + +def test_uv_path_when_uv_available(): + with patch("cli.commands.self_upgrade.check", return_value=_outdated_info()), \ + patch("cli.commands.self_upgrade.shutil.which", return_value="/usr/local/bin/uv"), \ + patch("cli.commands.self_upgrade.subprocess.run") as mock_run, \ + patch("cli.commands.self_upgrade._smoke_test_new_binary", return_value=_smoke_pass()), \ + patch("cli.commands.self_upgrade._read_last_known_good", return_value=None), \ + patch("cli.commands.self_upgrade._record_last_known_good"), \ + patch("cli.commands.self_upgrade._invalidate_update_cache"): + mock_run.return_value = MagicMock(returncode=0) + result = runner.invoke(app, ["self-upgrade"]) + assert result.exit_code == 0 + args = mock_run.call_args_list[0].args[0] + assert args[:3] == ["uv", "tool", "install"] + assert "--force" in args + assert _OUTDATED_URL in args + + +def test_pip_fallback_uses_sys_executable_not_user(): + """pip path must target the running interpreter's venv, never --user.""" + with patch("cli.commands.self_upgrade.check", return_value=_outdated_info()), \ + patch("cli.commands.self_upgrade.shutil.which", return_value=None), \ + patch("cli.commands.self_upgrade.subprocess.run") as mock_run, \ + patch("cli.commands.self_upgrade._smoke_test_new_binary", return_value=_smoke_pass()), \ + patch("cli.commands.self_upgrade._read_last_known_good", return_value=None), \ + patch("cli.commands.self_upgrade._record_last_known_good"), \ + patch("cli.commands.self_upgrade._invalidate_update_cache"): + mock_run.return_value = MagicMock(returncode=0) + result = runner.invoke(app, ["self-upgrade"]) + assert result.exit_code == 0 + cmds = [c.args[0] for c in mock_run.call_args_list] + assert any(cmd[0] == "curl" for cmd in cmds), cmds + 
pip_cmd = next(cmd for cmd in cmds if "pip" in cmd) + assert pip_cmd[0] == sys.executable, pip_cmd + assert "--force-reinstall" in pip_cmd + assert "--user" not in pip_cmd # would land outside the venv + + +def test_force_invalidates_cache_before_check(): + """--force must drop the cached download_url before probing /cli/latest, + so we get the SERVER's current wheel, not whatever was cached 24h ago.""" + fresh_current_with_url = UpdateInfo(installed="0.40.0", latest="0.40.0", + download_url=_OUTDATED_URL) + with patch("cli.commands.self_upgrade._invalidate_update_cache") as mock_invalidate, \ + patch("cli.commands.self_upgrade.check", return_value=fresh_current_with_url) as mock_check, \ + patch("cli.commands.self_upgrade.shutil.which", return_value="/usr/local/bin/uv"), \ + patch("cli.commands.self_upgrade.subprocess.run") as mock_run, \ + patch("cli.commands.self_upgrade._smoke_test_new_binary", return_value=_smoke_pass()), \ + patch("cli.commands.self_upgrade._read_last_known_good", return_value=None), \ + patch("cli.commands.self_upgrade._record_last_known_good"): + mock_run.return_value = MagicMock(returncode=0) + result = runner.invoke(app, ["self-upgrade", "--force"]) + assert result.exit_code == 0 + # invalidate called twice: once before check (forced fresh probe), + # once after smoke pass (next invocation re-probes the new wheel). + assert mock_invalidate.call_count == 2 + mock_check.assert_called_once() + + +def test_force_offline_exits_1_with_stderr(): + """--force + server unreachable: exit 1 with explicit stderr. 
+ Without --force, an offline check is silent; with --force it is not.""" + with patch("cli.commands.self_upgrade.check", return_value=None), \ + patch("cli.commands.self_upgrade.get_server_url", + return_value="http://server.test"), \ + patch("cli.commands.self_upgrade._invalidate_update_cache"): + result = runner.invoke(app, ["self-upgrade", "--force"]) + assert result.exit_code == 1 + assert "cannot reach" in result.stderr + assert "server.test" in result.stderr + + +def test_offline_without_force_is_silent(): + """No --force, server unreachable: exit 0 silently. The implicit warning + loop is already covered by update_check.""" + with patch("cli.commands.self_upgrade.check", return_value=None), \ + patch("cli.commands.self_upgrade._invalidate_update_cache"): + result = runner.invoke(app, ["self-upgrade"]) + assert result.exit_code == 0 + assert result.stderr == "" + + +def test_self_upgrade_passes_bypass_disabled_to_check(): + """AGNES_NO_UPDATE_CHECK silences the implicit warning loop, but + explicit `agnes self-upgrade` must NOT be a silent no-op when set. + Verify the callback passes bypass_disabled=True to check().""" + with patch("cli.commands.self_upgrade.check", return_value=_current_info()) as mock_check: + result = runner.invoke(app, ["self-upgrade", "--check-only"]) + assert result.exit_code == 0 + # check() was called with bypass_disabled=True (keyword-only, so it is in call_args.kwargs).
+ kwargs = mock_check.call_args.kwargs + assert kwargs.get("bypass_disabled") is True + + +def test_quiet_does_not_suppress_install_failure_stderr(): + """--quiet suppresses progress but install/smoke failures always surface.""" + with patch("cli.commands.self_upgrade.check", return_value=_outdated_info()), \ + patch("cli.commands.self_upgrade.shutil.which", return_value="/usr/local/bin/uv"), \ + patch("cli.commands.self_upgrade.subprocess.run") as mock_run, \ + patch("cli.commands.self_upgrade._read_last_known_good", return_value=None): + mock_run.return_value = MagicMock(returncode=42) + result = runner.invoke(app, ["self-upgrade", "--quiet"]) + assert result.exit_code == 1 + assert "install failed" in result.stderr + + +def test_smoke_fail_triggers_rollback_when_prior_url_known(): + """Broken new wheel: smoke fails, rollback to last-known-good URL, exit 1.""" + with patch("cli.commands.self_upgrade.check", return_value=_outdated_info()), \ + patch("cli.commands.self_upgrade.shutil.which", return_value="/usr/local/bin/uv"), \ + patch("cli.commands.self_upgrade.subprocess.run") as mock_run, \ + patch("cli.commands.self_upgrade._smoke_test_new_binary", return_value=_smoke_fail()), \ + patch("cli.commands.self_upgrade._read_last_known_good", return_value=_PRIOR_URL), \ + patch("cli.commands.self_upgrade._record_last_known_good") as mock_record: + mock_run.return_value = MagicMock(returncode=0) + result = runner.invoke(app, ["self-upgrade"]) + assert result.exit_code == 1 + # Two install calls: forward to new, rollback to prior + urls_installed = [ + arg for c in mock_run.call_args_list + for arg in c.args[0] if isinstance(arg, str) and arg.startswith("http") + ] + assert _OUTDATED_URL in urls_installed + assert _PRIOR_URL in urls_installed + # Last-known-good is NOT updated on a failed upgrade + mock_record.assert_not_called() + assert "smoke test" in result.stderr + + +def test_smoke_fail_with_rollback_failure_surfaces_rc(): +
"""Forward install ok, smoke fail, rollback ALSO fails: + stderr must surface the rollback rc + bootstrap recovery command.""" + # First call: forward install (rc=0). Second call: rollback (rc=99). + install_results = [MagicMock(returncode=0), MagicMock(returncode=99)] + with patch("cli.commands.self_upgrade.check", return_value=_outdated_info()), \ + patch("cli.commands.self_upgrade.shutil.which", return_value="/usr/local/bin/uv"), \ + patch("cli.commands.self_upgrade.subprocess.run", side_effect=install_results), \ + patch("cli.commands.self_upgrade._smoke_test_new_binary", return_value=_smoke_fail()), \ + patch("cli.commands.self_upgrade._read_last_known_good", return_value=_PRIOR_URL), \ + patch("cli.commands.self_upgrade.get_server_url", + return_value="http://server.test"): + result = runner.invoke(app, ["self-upgrade"]) + assert result.exit_code == 1 + assert "rollback ALSO failed" in result.stderr + assert "rc=99" in result.stderr + assert "/cli/install.sh" in result.stderr # bootstrap recovery + + +def test_smoke_fail_no_prior_url_prints_install_sh_recovery(): + """First-ever upgrade with no rollback target: stderr points at the + canonical bootstrap path with a fully-formed curl command.""" + with patch("cli.commands.self_upgrade.check", return_value=_outdated_info()), \ + patch("cli.commands.self_upgrade.shutil.which", return_value="/usr/local/bin/uv"), \ + patch("cli.commands.self_upgrade.subprocess.run") as mock_run, \ + patch("cli.commands.self_upgrade._smoke_test_new_binary", return_value=_smoke_fail()), \ + patch("cli.commands.self_upgrade._read_last_known_good", return_value=None), \ + patch("cli.commands.self_upgrade.get_server_url", + return_value="http://server.test"): + mock_run.return_value = MagicMock(returncode=0) + result = runner.invoke(app, ["self-upgrade"]) + assert result.exit_code == 1 + assert "/cli/install.sh" in result.stderr + assert "server.test" in result.stderr # actual server URL, not a placeholder +
+def test_smoke_pass_records_last_known_good_then_invalidates_cache(): + """Convention: record before invalidate. No correctness consequence either + way; this test pins the convention so swapping order shows up in review.""" + call_order = [] + with patch("cli.commands.self_upgrade.check", return_value=_outdated_info()), \ + patch("cli.commands.self_upgrade.shutil.which", return_value="/usr/local/bin/uv"), \ + patch("cli.commands.self_upgrade.subprocess.run") as mock_run, \ + patch("cli.commands.self_upgrade._smoke_test_new_binary", return_value=_smoke_pass()), \ + patch("cli.commands.self_upgrade._read_last_known_good", return_value=None), \ + patch("cli.commands.self_upgrade._record_last_known_good", + side_effect=lambda url: call_order.append(("record", url))), \ + patch("cli.commands.self_upgrade._invalidate_update_cache", + side_effect=lambda: call_order.append(("invalidate", None))): + mock_run.return_value = MagicMock(returncode=0) + result = runner.invoke(app, ["self-upgrade"]) + assert result.exit_code == 0 + record_idx = next(i for i, c in enumerate(call_order) if c[0] == "record") + invalidate_idx = next(i for i, c in enumerate(call_order) if c[0] == "invalidate") + assert record_idx < invalidate_idx, call_order + assert call_order[record_idx] == ("record", _OUTDATED_URL) + + +def test_self_upgrade_propagates_sentinel_to_smoke_subprocess(): + """During the upgrade, AGNES_SELF_UPGRADE_IN_PROGRESS=1 must be in + os.environ. The smoke test subprocess inherits via env={**os.environ, ...}. + Cleared in finally on callback exit. 
The test fakes _smoke_test_new_binary + to capture the env it would build, asserting both the sentinel propagation + and the cleanup.""" + captured_envs = [] + + def _fake_smoke(method, expected_version): + env = {**os.environ, "AGNES_NO_UPDATE_CHECK": "1", + "AGNES_SELF_UPGRADE_IN_PROGRESS": "1"} + captured_envs.append(env) + return _smoke_pass() + + with patch("cli.commands.self_upgrade.check", return_value=_outdated_info()), \ + patch("cli.commands.self_upgrade.shutil.which", return_value="/usr/local/bin/uv"), \ + patch("cli.commands.self_upgrade.subprocess.run", + return_value=MagicMock(returncode=0)), \ + patch("cli.commands.self_upgrade._smoke_test_new_binary", side_effect=_fake_smoke), \ + patch("cli.commands.self_upgrade._read_last_known_good", return_value=None), \ + patch("cli.commands.self_upgrade._record_last_known_good"), \ + patch("cli.commands.self_upgrade._invalidate_update_cache"): + result = runner.invoke(app, ["self-upgrade"]) + assert result.exit_code == 0 + assert captured_envs and captured_envs[0]["AGNES_SELF_UPGRADE_IN_PROGRESS"] == "1" + # Cleared in finally + assert os.environ.get("AGNES_SELF_UPGRADE_IN_PROGRESS") is None + + +@pytest.mark.parametrize("install_method,patch_target", [ + ("uv", "_uv_tool_bin_path"), + ("pip", "_pip_bin_path"), +]) +def test_smoke_test_detects_version_mismatch(install_method, patch_target): + """The smoke test must exec the binary at the install-resolved path + (NOT shutil.which) and compare its --version output via + packaging.version.Version equality. A stale PATH-shadow returning the + old version must FAIL the smoke. 
Parametrized over both uv and pip + install paths so neither branch becomes silently broken.""" + from pathlib import Path + from cli.commands import self_upgrade as su + + fake_bin = f"/fake/{install_method}/bin/agnes" + with patch.object(su, patch_target, return_value=Path(fake_bin)), \ + patch.object(su.subprocess, "run") as mock_run: + mock_run.return_value = MagicMock(returncode=0, stdout="agnes 0.30.0\n", stderr="") + ok, detail = su._smoke_test_new_binary(install_method, expected_version="0.40.0") + assert ok is False + assert "version mismatch" in detail + assert "0.40.0" in detail and "0.30.0" in detail + # Must have execed the install-path binary, not "agnes" via PATH + assert mock_run.call_args.args[0][0] == fake_bin + + +def test_smoke_test_compares_pep440_versions_not_substrings(): + """The smoke test must compare via packaging.version.Version, not a + substring check: the exact expected version passes, but '0.40.0' must + NOT match '0.40.10'. Note that Version equality includes any local + segment, so a '0.40.0+local.dev' build would NOT match '0.40.0'.""" + from pathlib import Path + from cli.commands import self_upgrade as su + + with patch.object(su, "_uv_tool_bin_path", return_value=Path("/fake/agnes")), \ + patch.object(su.subprocess, "run") as mock_run: + # Binary reports exactly the version the server expects. + mock_run.return_value = MagicMock(returncode=0, stdout="agnes 0.40.0\n", stderr="") + ok, _ = su._smoke_test_new_binary("uv", expected_version="0.40.0") + assert ok is True + # Reverse: substring "0.40.0" inside "0.40.10" must NOT pass. + mock_run.return_value = MagicMock(returncode=0, stdout="agnes 0.40.10\n", stderr="") + ok, detail = su._smoke_test_new_binary("uv", expected_version="0.40.0") + assert ok is False + assert "version mismatch" in detail +``` + +- [ ] **Step 3.2: Run tests, verify they fail** + +```bash +pytest tests/test_self_upgrade.py -v +``` +Expected: FAIL — `cli.commands.self_upgrade` module does not exist.
+ +- [ ] **Step 3.3: Create `cli/commands/self_upgrade.py`** + +```python +"""`agnes self-upgrade` — pull the wheel from the server, reinstall, smoke-test, +roll back on failure. + +Flow: + 1. Set AGNES_SELF_UPGRADE_IN_PROGRESS=1 (recursion barrier — see cli/client.py). + 2. If --force, invalidate update_check cache so we get fresh /cli/latest. + 3. Probe via update_check.check(..., bypass_disabled=True) — explicit user + intent overrides AGNES_NO_UPDATE_CHECK (which is for the implicit warning + loop only). + 4. --force + offline ⇒ exit 1 with "cannot reach ". Without --force, + offline is silent. + 5. If nothing to do (current, no download_url) → exit 0. + 6. Snapshot _read_last_known_good() — URL of the last verified-good install. + 7. Install via uv (preferred) or pip (sys.executable, no --user, --no-deps). + 8. Smoke-test the binary at the deterministic install path (NOT shutil.which, + which can resolve a stale PATH shadow). Verify its --version output equals + info.latest (PEP 440 Version comparison). Failure → rollback (capturing rc) → exit 1. + 9. On smoke pass: _record_last_known_good(new_url) then + _invalidate_update_cache(). Convention; no correctness consequence either way. + 10. Sentinel restored to its prior value (normally: removed) in finally. +""" + +from __future__ import annotations + +import json +import os +import shutil +import subprocess +import sys +import tempfile +from pathlib import Path +from typing import Optional, Union + +import typer + +from cli.config import _config_dir, get_server_url +from cli.update_check import UpdateInfo, check, format_outdated_notice + +self_upgrade_app = typer.Typer( + name="self-upgrade", + help="Reinstall the CLI from the server's currently-shipped wheel.", + invoke_without_command=True, +) + +_SENTINEL_ENV = "AGNES_SELF_UPGRADE_IN_PROGRESS" + + +class _Unreachable: + """Sentinel returned by _resolve_info when --force was specified but the + server probe failed.
Distinguishes 'explicitly requested an upgrade and + we couldn't reach the server' (exit 1, stderr) from 'no upgrade needed' + (exit 0, silent).""" + + +_UNREACHABLE = _Unreachable() + + +def _invalidate_update_cache() -> None: + """Drop update_check.json so the next CLI invocation re-probes /cli/latest.""" + (_config_dir() / "update_check.json").unlink(missing_ok=True) + + +def _last_known_good_path() -> Path: + return _config_dir() / "last_known_good.json" + + +def _read_last_known_good() -> Optional[str]: + """URL of the last wheel that passed the smoke test on this machine. + None on the first-ever upgrade — first-run failure falls back to the bootstrap + install.sh recovery message rather than a rollback.""" + p = _last_known_good_path() + if not p.exists(): + return None + try: + return json.loads(p.read_text(encoding="utf-8")).get("download_url") + except (OSError, json.JSONDecodeError): + return None + + +def _record_last_known_good(download_url: str) -> None: + p = _last_known_good_path() + try: + p.parent.mkdir(parents=True, exist_ok=True) + p.write_text(json.dumps({"download_url": download_url}), encoding="utf-8") + except OSError: + pass # best-effort — failure to record must not break the flow + + +def _uv_tool_bin_path() -> Optional[Path]: + """Locate the agnes shim uv installed. + + Tries `uv tool dir --bin` first (uv >= 0.5 prints the entrypoint shim + directory directly). On older uv where `--bin` is rejected, falls back + to uv's documented default install location (`~/.local/bin/` on POSIX, + `%APPDATA%\\uv\\tools\\bin\\` on Windows). Without this fallback, the smoke + test would fail here and silently roll back an otherwise-good install for + every analyst on older uv.
+ """ + bin_dir: Optional[Path] = None + try: + out = subprocess.run( + ["uv", "tool", "dir", "--bin"], capture_output=True, text=True, timeout=5, + ) + if out.returncode == 0: + bin_dir = Path(out.stdout.strip()) + except (OSError, subprocess.TimeoutExpired): + bin_dir = None + + if bin_dir is None: + # Fallback: uv's documented default install location. + if sys.platform == "win32": + appdata = os.environ.get("APPDATA") + if appdata: + bin_dir = Path(appdata) / "uv" / "tools" / "bin" + else: + bin_dir = Path.home() / ".local" / "bin" + + if bin_dir is None or not bin_dir.exists(): + return None + + # uv emits `agnes.exe` on Windows and `agnes` on POSIX; check both. + for name in ("agnes.exe", "agnes"): + candidate = bin_dir / name + if candidate.exists(): + return candidate + return None + + +def _pip_bin_path() -> Optional[Path]: + """`/bin/agnes` (POSIX) or `\\Scripts\\agnes.exe` (Windows).""" + parent = Path(sys.executable).parent + name = "agnes.exe" if sys.platform == "win32" else "agnes" + candidate = parent / name + return candidate if candidate.exists() else None + + +def _install_with_uv(download_url: str, *, quiet: bool) -> int: + out = subprocess.DEVNULL if quiet else None + return subprocess.run( + ["uv", "tool", "install", "--force", download_url], stdout=out + ).returncode + + +def _install_with_pip(download_url: str, *, quiet: bool) -> int: + """Install into the SAME interpreter that's running this command. + + sys.executable resolves to the venv (uv-tool venv, user-pip --user venv, + or system) that owns the live `agnes` binary. Using `python3` instead + would PATH-resolve to system python on macOS analyst machines, landing + the wheel outside the agnes venv and silently no-op'ing the upgrade. + --user is wrong here: inside a uv-tool venv it targets ~/.local outside + the venv. Drop it. 
+ """ + out = subprocess.DEVNULL if quiet else None + with tempfile.TemporaryDirectory(prefix="agnes_cli.") as td: + wheel_path = Path(td) / "agnes.whl" + rc = subprocess.run( + ["curl", "-fsSL", "-o", str(wheel_path), download_url], stdout=out + ).returncode + if rc != 0: + return rc + return subprocess.run( + [sys.executable, "-m", "pip", "install", + "--force-reinstall", "--no-deps", str(wheel_path)], + stdout=out, + ).returncode + + +def _smoke_test_new_binary(install_method: str, expected_version: str) -> tuple[bool, str]: + """Exec `/agnes --version` from a fresh subprocess, confirm + it boots AND reports the expected version. + + Resolves the binary at the install-method-specific path (uv tool dir / + sys.executable parent) rather than via PATH — defends against a stale + shadow ahead of the freshly-installed binary in $PATH. Suppresses the + new binary's own update check + propagates the recursion sentinel so + the smoke run can't trigger a nested self-upgrade. + """ + binary = _uv_tool_bin_path() if install_method == "uv" else _pip_bin_path() + if binary is None: + return False, f"agnes binary not found at expected {install_method} install path" + try: + env = {**os.environ, "AGNES_NO_UPDATE_CHECK": "1", _SENTINEL_ENV: "1"} + out = subprocess.run( + [str(binary), "--version"], + capture_output=True, text=True, timeout=10, env=env, + ) + if out.returncode != 0: + return False, f"exit {out.returncode}: {out.stderr.strip()[:200]}" + # `agnes --version` prints `agnes ` — extract and compare + # via packaging.version.Version (PEP 440-aware) to avoid substring + # false-positives like "0.40.0" matching "0.40.10". 
+ from packaging.version import InvalidVersion, Version + tokens = out.stdout.strip().split() + actual_str = tokens[-1] if tokens else "" + try: + if Version(actual_str) != Version(expected_version): + return False, ( + f"version mismatch: expected {expected_version}, " + f"got {actual_str}" + ) + except InvalidVersion: + return False, f"unparseable version output: {out.stdout.strip()[:80]}" + return True, out.stdout.strip() + except (subprocess.TimeoutExpired, OSError) as e: + return False, f"{type(e).__name__}: {e}" + + +def _resolve_info(force: bool) -> Union[UpdateInfo, _Unreachable, None]: + """Returns: + UpdateInfo — install this wheel + _UNREACHABLE — --force specified, server probe failed + None — nothing to do (current, or offline without --force) + """ + if force: + _invalidate_update_cache() + # bypass_disabled=True so an explicit `agnes self-upgrade` is not silenced + # by AGNES_NO_UPDATE_CHECK (which exists for the implicit warning loop). + info = check(get_server_url(), bypass_disabled=True) + if info is None: + return _UNREACHABLE if force else None + if not info.download_url: + return None + if not force and not info.is_outdated(): + return None + return info + + +def _do_install_with_smoke_and_rollback( + info: UpdateInfo, *, quiet: bool +) -> int: + """Returns the exit code typer should use (0 success, 1 failure).""" + prior_url = _read_last_known_good() # may be None on first upgrade + + if shutil.which("uv"): + rc = _install_with_uv(info.download_url, quiet=quiet) + method = "uv" + else: + rc = _install_with_pip(info.download_url, quiet=quiet) + method = "pip" + + if rc != 0: + sys.stderr.write(f"agnes self-upgrade: install failed with exit {rc}\n") + return 1 + + ok, detail = _smoke_test_new_binary(method, expected_version=info.latest) + if not ok: + sys.stderr.write( + f"agnes self-upgrade: new binary failed smoke test ({detail}).\n" + ) + server = get_server_url().rstrip("/") + bootstrap_recovery = f" Manual recovery: curl -fsSL 
{server}/cli/install.sh | bash\n" + if prior_url and prior_url != info.download_url: + sys.stderr.write(f" rolling back to {prior_url}\n") + rb_rc = ( + _install_with_uv(prior_url, quiet=True) + if method == "uv" + else _install_with_pip(prior_url, quiet=True) + ) + if rb_rc != 0: + sys.stderr.write( + f" rollback ALSO failed (rc={rb_rc}); CLI is in a broken state.\n" + ) + sys.stderr.write(bootstrap_recovery) + else: + sys.stderr.write( + " no prior wheel URL on record; rollback skipped.\n" + ) + sys.stderr.write(bootstrap_recovery) + return 1 + + # Convention: record then invalidate. No correctness consequence either way. + _record_last_known_good(info.download_url) + _invalidate_update_cache() + if not quiet: + typer.echo(f"agnes self-upgrade: installed {info.latest}", err=True) + return 0 + + +@self_upgrade_app.callback() +def self_upgrade( + quiet: bool = typer.Option(False, "--quiet", help="Suppress progress output. Failures still surface on stderr."), + check_only: bool = typer.Option(False, "--check-only", help="Print status, don't install. Exit 1 if outdated."), + force: bool = typer.Option(False, "--force", help="Reinstall the server's current wheel even when already on the latest version."), +) -> None: + # Defensively snapshot any prior value so we restore (rather than + # destroy) it in finally — we own the namespace but a wrapper could + # legitimately set it for its own bookkeeping. + prior_sentinel = os.environ.get(_SENTINEL_ENV) + os.environ[_SENTINEL_ENV] = "1" + try: + info = _resolve_info(force) + + # --check-only is read-only intent — never exit non-zero on + # transport errors. If unreachable, treat as "can't tell, current" + # and exit 0 silently. (Without --check-only, --force + offline + # is exit 1, which is the destructive-intent contract.) 
+ if check_only: + if isinstance(info, _Unreachable) or info is None or not info.is_outdated(): + raise typer.Exit(0) + typer.echo(format_outdated_notice(info), err=True) + raise typer.Exit(1) + + if isinstance(info, _Unreachable): + sys.stderr.write( + f"agnes self-upgrade: cannot reach {get_server_url()}/cli/latest\n" + ) + raise typer.Exit(1) + + if info is None: + raise typer.Exit(0) # nothing to do, silent + + rc = _do_install_with_smoke_and_rollback(info, quiet=quiet) + raise typer.Exit(rc) + finally: + if prior_sentinel is None: + os.environ.pop(_SENTINEL_ENV, None) + else: + os.environ[_SENTINEL_ENV] = prior_sentinel +``` + +- [ ] **Step 3.4: Register in `cli/main.py`** + +After the existing `from cli.commands.X import Y_app` block, add: + +```python +from cli.commands.self_upgrade import self_upgrade_app +``` + +In the `app.add_typer(...)` block (around line 109-127), add: + +```python +app.add_typer(self_upgrade_app, name="self-upgrade") +``` + +Place it near `app.add_typer(setup_app, name="setup")` for grouping. + +- [ ] **Step 3.5: Run tests, verify they pass** + +```bash +pytest tests/test_self_upgrade.py -v +``` +Expected: PASS — all seven tests. + +- [ ] **Step 3.6: Smoke-test the command shape locally** + +```bash +agnes self-upgrade --help +``` +Expected: typer help text with `--quiet`, `--check-only`, `--force` flags. + +- [ ] **Step 3.7: Commit** + +```bash +git add cli/update_check.py cli/commands/self_upgrade.py cli/main.py \ + tests/test_self_upgrade.py tests/test_cli_update_check.py +git commit -m "feat(cli): add agnes self-upgrade with smoke test + rollback + +Reuses cli.update_check.check() for the version probe — extended with +bypass_disabled=True so explicit user-typed self-upgrade is not silenced +by AGNES_NO_UPDATE_CHECK (which is for the implicit warning loop). 
+ +Install path: uv tool install --force when uv is on PATH; otherwise +curl + pip via sys.executable (NOT system python3, NOT --user — both +would land outside the agnes venv and silently no-op the upgrade). + +Smoke test execs the binary at the install-resolved path (the agnes shim +in uv tool dir --bin, falling back to uv's default bin directory on older +uv, or sys.executable's sibling agnes for pip) — never via shutil.which, +which can resolve a stale shadow on PATH and produce a false-positive +smoke pass on the OLD version. Smoke also asserts that the --version +output equals info.latest (PEP 440 Version comparison). + +On smoke fail: roll back to last_known_good.json (written only after a +previous run's smoke passed). Rollback rc is captured and surfaced on +stderr if it also fails. First-ever upgrade or unrecoverable rollback +prints the canonical bootstrap recovery: curl -fsSL /cli/install.sh | bash. + +AGNES_SELF_UPGRADE_IN_PROGRESS=1 is set for the duration of the run +and propagated to the smoke-test subprocess. Layer B's _check_version_headers +honors the sentinel and skips the < min hard-stop, so an in-flight +upgrade can never sys.exit(2) itself. + +--force invalidates the update_check cache BEFORE probing. --force + +offline = exit 1 with explicit stderr (without --force, offline is silent). +--quiet suppresses progress output but never gags failure stderr." +``` + +--- + +## Task 4: SessionStart hook (single chained entry) + +**Why one entry, not two:** Claude Code's hook execution semantics for multiple SessionStart entries (parallel? sequential? bounded?) are not documented in this repo and are not relied upon. Chain in a single entry with `;` so the shell guarantees ordering: self-upgrade first, pull second, regardless of host. Each segment carries its own `|| true`, so a failed upgrade does not abort the pull.
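The ordering and failure-isolation claims above can be checked without Claude Code at all. A minimal sketch using Python's `subprocess` with `shell=True`, assuming a POSIX `/bin/sh`; `false` and `echo` are placeholders standing in for a failing `agnes self-upgrade` and a working `agnes pull`, not the real commands:

```python
import subprocess


def run_chain(command: str) -> subprocess.CompletedProcess:
    # shell=True hands the string to /bin/sh, which applies `;` and `||`.
    return subprocess.run(command, shell=True, capture_output=True, text=True)


# Same shape as the SessionStart line. `false` stands in for a failing
# `agnes self-upgrade`; `echo pulled` stands in for `agnes pull`.
result = run_chain("false || true; echo pulled || true")
assert result.stdout == "pulled\n" and result.returncode == 0

# `;` runs segments strictly left to right, so the upgrade always goes first.
order = run_chain("echo upgrade || true; echo pull || true")
assert order.stdout.splitlines() == ["upgrade", "pull"]

# Contrast: `&&` would make the pull depend on the upgrade succeeding.
skipped = run_chain("false && echo pulled")
assert skipped.stdout == "" and skipped.returncode == 1
```

The contrast case is the design choice in miniature: `&&` would couple the pull to a successful upgrade, while `;` plus a per-segment `|| true` keeps the two independent and ordered.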
+ +**Files:** +- Modify: `cli/lib/hooks.py` +- Modify: `tests/test_lib_hooks.py` + +- [ ] **Step 4.1: Write the failing hook-installer test** + +Append to `tests/test_lib_hooks.py`: + +```python +def test_install_chains_self_upgrade_then_pull_in_one_entry(tmp_path): + install_claude_hooks(tmp_path) + cfg = _read_settings(tmp_path) + session_start = cfg["hooks"]["SessionStart"] + assert len(session_start) == 1, session_start + cmd = session_start[0]["hooks"][0]["command"] + assert "agnes self-upgrade --quiet" in cmd + assert "agnes pull --quiet" in cmd + # Order is encoded in the shell — self-upgrade must appear first + assert cmd.index("agnes self-upgrade") < cmd.index("agnes pull") + # Both segments carry || true so neither failure aborts the line + assert cmd.count("|| true") >= 2 + + +def test_install_idempotent_chained_entry(tmp_path): + install_claude_hooks(tmp_path) + install_claude_hooks(tmp_path) + cfg = _read_settings(tmp_path) + assert len(cfg["hooks"]["SessionStart"]) == 1 + assert len(cfg["hooks"]["SessionEnd"]) == 1 +``` + +The existing `test_install_creates_settings_file` (around line 14) currently asserts `[0]` is the lone pull entry. Update it to assert the chained command: + +```python +def test_install_creates_settings_file(tmp_path): + install_claude_hooks(tmp_path) + cfg = _read_settings(tmp_path) + cmd = cfg["hooks"]["SessionStart"][0]["hooks"][0]["command"] + assert "agnes self-upgrade --quiet" in cmd + assert "agnes pull --quiet" in cmd + assert "agnes push --quiet" in cfg["hooks"]["SessionEnd"][0]["hooks"][0]["command"] +``` + +The existing `test_install_idempotent` already asserts `len(SessionStart) == 1` — leave as-is, that's still correct under the chained-entry design. + +- [ ] **Step 4.2: Run tests, verify they fail** + +```bash +pytest tests/test_lib_hooks.py -v +``` +Expected: FAIL — chained-entry tests fail (the lone pull command does not contain `self-upgrade`). 
+ +- [ ] **Step 4.3: Modify `cli/lib/hooks.py`** + +Update `_OUR_COMMAND_MARKERS` (line 27) to include `self-upgrade` so the substring match still recognises our line for idempotent replacement: + +```python +_OUR_COMMAND_MARKERS = ("agnes self-upgrade", "agnes pull", "agnes push", "da sync") +``` + +Replace the SessionStart registration (around line 63) with a single chained command: + +```python + _replace_or_add( + "SessionStart", + "agnes self-upgrade --quiet 2>/dev/null || true; " + "agnes pull --quiet 2>/dev/null || true", + ) + _replace_or_add("SessionEnd", "agnes push --quiet 2>/dev/null || true") +``` + +The `;` runs the second command unconditionally; each `|| true` prevents either failure from aborting the line. Idempotency: re-running `install_claude_hooks` matches the existing entry on either `agnes self-upgrade` or `agnes pull` (both substrings present), drops it, and re-appends — net length stays at 1. + +- [ ] **Step 4.4: Run tests, verify they pass** + +```bash +pytest tests/test_lib_hooks.py -v +``` +Expected: PASS — all hook tests including the new chained-entry assertions and idempotency. + +- [ ] **Step 4.5: Commit** + +```bash +git add cli/lib/hooks.py tests/test_lib_hooks.py +git commit -m "feat(cli): install SessionStart hook chaining self-upgrade then pull + +Single hook entry: 'agnes self-upgrade --quiet ... || true; agnes pull +--quiet ... || true'. Shell semicolon guarantees ordering across every +Claude Code version (no reliance on undocumented multi-hook execution +semantics); each segment's || true preserves the original property +that an upgrade failure does not abort the pull." 
+``` + +--- + +## Task 5: Drive-by `da` → `agnes` cleanup + CHANGELOG + +**Files:** +- Modify: `app/api/cli_artifacts.py` +- Modify: `cli/update_check.py` +- Modify: `CHANGELOG.md` + +- [ ] **Step 5.1: Fix `da` references** + +In `app/api/cli_artifacts.py:47`, replace: + +``` + Consumed by `da` CLI's auto-update check so it can warn when a newer +``` + +with: + +``` + Consumed by `agnes` CLI's auto-update check so it can warn when a newer +``` + +In `cli/update_check.py:1-9`, replace the four `da` occurrences in the docstring with `agnes`: + +```python +"""Auto-check for a newer CLI version on the configured server. + +Runs in the root typer callback before subcommand dispatch. Failure is +silent — we never block a working `agnes` command on a best-effort version +probe. Result is cached in `$AGNES_CONFIG_DIR/update_check.json` for 24h so +we don't hammer the server on every invocation. + +Disable with `AGNES_NO_UPDATE_CHECK=1`. +""" +``` + +Also fix the `da` reference in the negative-cache comment around line 26: + +```python +_NEGATIVE_CACHE_TTL_SECONDS = 5 * 60 # 5min on a failed probe, to avoid +# re-probing 3s of silence (drop-packet networks: corporate firewall, VPN) +# on every `agnes` invocation. +``` + +- [ ] **Step 5.2: Add CHANGELOG entry** + +Open `CHANGELOG.md`. After rebasing on `origin/main`, the file's structure at the top is: + +``` +line 11: ## [Unreleased] +line 12: (blank) +line 13: ## [0.39.0] — 2026-05-06 +line 15: ### Performance +... +``` + +The `## [Unreleased]` block is empty. Insert `### Added` and the three bullets directly between line 11 and line 13: + +```markdown +## [Unreleased] + +### Added + +- CLI auto-upgrade: ... +- Server: ... +- CLI: ... + +## [0.39.0] — 2026-05-06 +``` + +```markdown +- CLI auto-upgrade: `agnes self-upgrade` reinstalls the CLI from the server's currently-shipped wheel via `uv tool install --force`, falling back to `pip install --force-reinstall --no-deps` via `sys.executable` when uv is not on PATH. 
After install, the new binary is smoke-tested at the install-resolved path (`uv tool dir --bin` for uv, `/agnes` for pip) — never via PATH lookup, to avoid stale-shadow false positives. Smoke failure triggers automatic rollback to the previously verified-good wheel (recorded in `~/.config/agnes/last_known_good.json`); rollback's exit code is captured and surfaced on stderr if it also fails. First-ever upgrade or unrecoverable rollback prints the canonical bootstrap recovery: `curl -fsSL /cli/install.sh | bash`. The new command is wired into the SessionStart hook installed by `agnes init` as a chained shell entry (`agnes self-upgrade … || true; agnes pull … || true`) so an upgrade failure does not block the pull. +- Server: `/api/*` responses now carry `X-Agnes-Latest-Version` and `X-Agnes-Min-Version` headers. CLIs older than `X-Agnes-Min-Version` exit with **code 2** and a remediation message instead of failing on a wire-protocol mismatch. Day-one floor is `0.0.0` (no enforcement) — bump `MIN_COMPAT_CLI_VERSION` in `app/version.py` in the same PR that ships a deliberate wire break. +- CLI: `cli/update_check.py:check()` accepts a keyword-only `bypass_disabled=True` so explicit `agnes self-upgrade` invocations probe `/cli/latest` even when `AGNES_NO_UPDATE_CHECK=1` is set (which silences the implicit warning loop only). +``` + +- [ ] **Step 5.3: Run the full affected test surface** + +```bash +pytest tests/test_app_version.py tests/test_version_headers_middleware.py \ + tests/test_cli_update_check.py tests/test_client_version_check.py \ + tests/test_self_upgrade.py tests/test_lib_hooks.py \ + tests/test_cli_init.py -v +``` +Expected: PASS — full green. + +- [ ] **Step 5.4: Commit** + +```bash +git add app/api/cli_artifacts.py cli/update_check.py CHANGELOG.md +git commit -m "chore: rename stale 'da' references to 'agnes' + CHANGELOG + +Drive-by docstring/comment cleanup in cli_artifacts.py and update_check.py. 
+CHANGELOG entry for the auto-upgrade feature shipped in this branch." +``` + +--- + +## Task 6: Manual verification + +- [ ] **Step 6.1: Local smoke test — version mismatch hard-stop** + +Start the server locally: + +```bash +cd /path/to/agnes +uvicorn app.main:app --reload & +SERVER_PID=$! +``` + +Force a min-version mismatch by patching `app/version.py`: + +```bash +sed -i.bak 's/MIN_COMPAT_CLI_VERSION = "0.0.0"/MIN_COMPAT_CLI_VERSION = "99.99.99"/' app/version.py +``` + +Wait for the reload, then hit any `/api/*` endpoint with the CLI: + +```bash +agnes status +``` + +Expected: stderr `error: agnes is incompatible with server (min required: 99.99.99). Run: agnes self-upgrade`, exit code 2. + +Restore: + +```bash +mv app/version.py.bak app/version.py +kill $SERVER_PID +``` + +- [ ] **Step 6.2: Local smoke test — `agnes self-upgrade --check-only`** + +```bash +agnes self-upgrade --check-only +``` + +Expected: exit 0 (current) or exit 1 with `[update] agnes ... out of date ...` on stderr (depends on what version is on disk vs. served). + +- [ ] **Step 6.3: Verify hook installation** + +In a clean tmp workspace: + +```bash +mkdir -p /tmp/agnes-hook-smoke && cd /tmp/agnes-hook-smoke +agnes init +cat .claude/settings.json | jq '.hooks.SessionStart' +``` + +Expected: a single entry whose command chains `agnes self-upgrade --quiet ... || true; agnes pull --quiet ... || true`, with self-upgrade first. + +Re-run: + +```bash +agnes init +cat .claude/settings.json | jq '.hooks.SessionStart | length' +``` + +Expected: `1` (not `2`): the installer is idempotent. + +- [ ] **Step 6.4: Open the PR** + +```bash +git push -u origin zs/cli-auto-upgrade-spec +gh pr create --title "feat: server-pinned CLI auto-upgrade" --body "$(cat <<'EOF' +## Summary +- `agnes self-upgrade` reinstalls the CLI from `/cli/wheel/` (uv tool install --force, falling back to pip via sys.executable). Reuses cli.update_check.check() — single polling path, single cache.
+- SessionStart hook runs the upgrade ahead of `agnes pull`, so analyst CLIs stay current with the server they connect to. +- /api/* responses carry X-Agnes-Latest-Version / X-Agnes-Min-Version headers. CLIs below min exit 2 with a remediation message instead of failing on a wire-protocol mismatch. +- Drive-by: stale `da` references renamed to `agnes` in cli_artifacts.py and update_check.py docstrings. + +## Spec / plan +- Spec: `docs/superpowers/specs/2026-05-06-cli-auto-upgrade-spec.md` +- Plan: `docs/superpowers/plans/2026-05-06-cli-auto-upgrade.md` + +## Test plan +- [x] `pytest tests/test_version_headers_middleware.py` — middleware applied to /api/*, not /web/* +- [x] `pytest tests/test_client_version_check.py` — hard-stop on min mismatch +- [x] `pytest tests/test_self_upgrade.py` — uv path, pip fallback, --check-only, --force, --quiet, smoke-test rollback +- [x] `pytest tests/test_lib_hooks.py` — chained SessionStart entry + idempotency +- [ ] Manual: spoof `MIN_COMPAT_CLI_VERSION="99.99.99"` server-side, verify CLI exits 2 +- [ ] Manual: fresh `agnes init` workspace shows one chained SessionStart entry, self-upgrade before pull +EOF +)" +``` + +--- + +## Task 7: Release-cut (last commits on this PR) + +**Why now:** per CLAUDE.md changelog discipline + project convention, the version bump and `[Unreleased]` rename land on the same PR as the user-visible behavior change. This task converts the in-flight CHANGELOG entry into a versioned release. + +**Files:** +- Modify: `CHANGELOG.md` — rename topmost `## [Unreleased]` to `## [0.40.0] — 2026-05-06`, then add a fresh empty `## [Unreleased]` heading above it for the next PR. +- Modify: `pyproject.toml` — bump `[project].version` from `0.39.0` to `0.40.0` (additive feature → minor bump). + +- [ ] **Step 7.1: Rename `## [Unreleased]` → `## [0.40.0] — 2026-05-06`** + +In `CHANGELOG.md`, locate the topmost `## [Unreleased]` heading. Rename it to `## [0.40.0] — 2026-05-06`.
Above it, insert a new empty `## [Unreleased]` block so the next PR has somewhere to land: + +```markdown +## [Unreleased] + +## [0.40.0] — 2026-05-06 + +### Added +- CLI auto-upgrade: ... (existing entries from Task 5) +- Server: `/api/*` responses now carry ... (existing entries from Task 5) +``` + +- [ ] **Step 7.2: Bump `pyproject.toml` version** + +```bash +sed -i.bak 's/^version = "0.39.0"/version = "0.40.0"/' pyproject.toml && rm pyproject.toml.bak +``` + +Verify: + +```bash +grep '^version = ' pyproject.toml +``` +Expected output: `version = "0.40.0"` + +- [ ] **Step 7.3: Commit** + +```bash +git add CHANGELOG.md pyproject.toml +git commit -m "release: 0.40.0 — server-pinned CLI auto-upgrade + +See CHANGELOG.md for the full entry." +``` + +- [ ] **Step 7.4: Tag + GitHub Release (after PR merge)** + +After the PR merges to `main`, capture the merge SHA explicitly so a concurrent unrelated merge between this PR's merge and the operator running tag commands does not push our tag onto the wrong commit: + +```bash +PR_NUM= +MERGE_SHA=$(gh pr view "$PR_NUM" --json mergeCommit -q .mergeCommit.oid) +git fetch origin +git tag v0.40.0 "$MERGE_SHA" +git push origin v0.40.0 +``` + +Then create a GitHub Release for `v0.40.0`. Mirror the prose structure of the most recent prior release on the same repo (`gh release view v0.39.0` for the latest format) — typically an intro paragraph, the CHANGELOG section verbatim, and any operator-facing notes (e.g. *"this release introduces SessionStart hook behavior; expect a one-time `agnes self-upgrade` install on the first session per analyst"*). + +```bash +gh release create v0.40.0 --target "$MERGE_SHA" --title "v0.40.0 — server-pinned CLI auto-upgrade" --notes "$(...)" +``` + +(Per user memory: a git tag without a GitHub Release is incomplete.) + +--- + +## Self-Review Checklist (run before declaring complete) + +- [ ] Spec coverage: every section of the spec maps to a task above. 
✓ +- [ ] Placeholder scan: no "TBD" / "fill in later" / "similar to Task N" without inline code. +- [ ] Type/name consistency: `APP_VERSION`, `MIN_COMPAT_CLI_VERSION`, `X-Agnes-Latest-Version`, `X-Agnes-Min-Version`, `_check_version_headers`, `self_upgrade_app`, `_invalidate_update_cache`, `_install_with_uv`, `_install_with_pip`, `_smoke_test_new_binary`, `_uv_tool_bin_path`, `_pip_bin_path`, `_Unreachable`, `_UNREACHABLE`, `_read_last_known_good`, `_record_last_known_good`, `bypass_disabled` — used identically across tasks. +- [ ] CHANGELOG entry exists under `## [Unreleased]` (Task 5), then renamed to `## [0.40.0] — 2026-05-06` (Task 7). +- [ ] CLAUDE.md "OSS — no customer-specific content" rule respected: no Keboola/Groupon/FoundryAI tokens in code or PR body. +- [ ] Each task ends with a real commit. No squash-everything-at-end. +- [ ] Layer B is shipped at `MIN_COMPAT_CLI_VERSION = "0.0.0"` — no enforcement on day one. The bump-when-needed policy is review-time discipline, not a CI gate (rejected during spec iteration as theater). diff --git a/docs/superpowers/specs/2026-05-06-cli-auto-upgrade-spec.md b/docs/superpowers/specs/2026-05-06-cli-auto-upgrade-spec.md new file mode 100644 index 0000000..15865cf --- /dev/null +++ b/docs/superpowers/specs/2026-05-06-cli-auto-upgrade-spec.md @@ -0,0 +1,241 @@ +# CLI Auto-Upgrade — Server-Pinned Version + +> **Status:** spec / design. Convert to an implementation plan in `docs/superpowers/plans/` once reviewed. + +**Goal:** Keep an analyst's locally-installed `agnes` CLI in sync with the server it talks to. The server is the single source of truth for "what version should be running"; the CLI never asks PyPI, only the server. + +**Why now:** today an analyst installs once via `uv tool install $SERVER/cli/wheel/` and drifts arbitrarily. 
The CLI already prints a *warning* when out of date but never upgrades itself, and there's no hard-stop when a wire-protocol break ships — drifted clients fail with cryptic errors instead of being told to upgrade.

**Non-goal:** distributing the CLI through PyPI, GitHub releases, or any out-of-band channel. The wheel lives next to the server (`/app/dist/*.whl`) and is served by `app/api/cli_artifacts.py`.

---

## What already exists

The first half of this design **is already shipped**, just incomplete:

- **`GET /cli/latest`** (`app/api/cli_artifacts.py:42`) → `{version, wheel_filename, download_url_path}`. Public, no auth.
- **`GET /cli/wheel/{name}`** + `/cli/download` + `/cli/install.sh` for distribution.
- **`cli/update_check.py`** — polls `/cli/latest` on every CLI invocation from `cli/main.py:99-104`, caches result for 24h (positive) / 5min (negative), prints a stderr warning with a copy-paste `uv tool install --force ` command. Opt-out: `AGNES_NO_UPDATE_CHECK=1`.
- **`cli/client.py:216 get_client()`** — the shared `httpx.Client` factory. Single chokepoint for response-header inspection.
- **Hook installer** at `cli/lib/hooks.py:install_claude_hooks` writes:
  - `SessionStart` → `agnes pull --quiet 2>/dev/null || true`
  - `SessionEnd` → `agnes push --quiet 2>/dev/null || true`

What's missing:

1. The CLI prints a copy-paste command but never **executes** the upgrade.
2. No `min_version` floor — drift is unbounded; a wire break gives a cryptic 500 instead of a clear "you're too old, upgrade".
3. No SessionStart hook for proactive upgrade — analyst must notice the warning, copy, paste, run.
4. The server-side comment on `/cli/latest` (`app/api/cli_artifacts.py:47`) and the docstring in `cli/update_check.py` still reference the old `da` binary name; cleanup while we're in there.

---

## Design

Two layers, complementary, with different latencies and failure modes.
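
Layer A leans on the existing `cli/update_check.py` cache described under *What already exists*: 24h for a successful `/cli/latest` probe, 5min for a failed one. Below is a minimal sketch of that TTL policy. It assumes each cache entry records the probe time and the latest version, with `None` marking a failed probe. `CacheEntry`, `is_fresh`, and `load_entry` are hypothetical names, not the actual contents of `update_check.py`.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Hypothetical cache layout; the real update_check.json schema may differ.
POSITIVE_TTL_S = 24 * 60 * 60  # successful probe: reuse for 24h
NEGATIVE_TTL_S = 5 * 60        # failed probe (offline, 5xx): retry after 5min


@dataclass
class CacheEntry:
    checked_at: float       # epoch seconds when /cli/latest was probed
    latest: Optional[str]   # None records a failed probe


def is_fresh(entry: CacheEntry, now: float) -> bool:
    """True if the cached result can be reused without a new HTTP call."""
    ttl = POSITIVE_TTL_S if entry.latest is not None else NEGATIVE_TTL_S
    return now - entry.checked_at < ttl


def load_entry(raw: str) -> Optional[CacheEntry]:
    """Parse the cache file body; anything malformed counts as a cache miss."""
    try:
        data = json.loads(raw)
        return CacheEntry(checked_at=float(data["checked_at"]), latest=data.get("latest"))
    except (ValueError, KeyError, TypeError):
        return None
```

In this model, the `--force` invalidation in Layer A step 2 amounts to deleting the cache file. The next `check()` then finds no entry and always re-probes the server.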

### Layer A — proactive auto-upgrade (SessionStart hook + new CLI command)

`agnes init` writes a **single** SessionStart hook entry that chains self-upgrade and pull with `;`. The shell then guarantees the ordering, so we don't depend on undocumented Claude Code hook-execution semantics:

```
SessionStart → agnes self-upgrade --quiet 2>/dev/null || true; agnes pull --quiet 2>/dev/null || true
SessionEnd → agnes push --quiet 2>/dev/null || true
```

The `;` runs `agnes pull` regardless of how `agnes self-upgrade` exits. Each `|| true` forces a zero exit status for its segment, so the line as a whole always exits 0. We lose nothing the design relied on: the *"upgrade fail does not block pull"* property is preserved by the `;`. We also gain an ordering guarantee that holds across every Claude Code version.

`agnes self-upgrade [--quiet] [--check-only] [--force]`:

1. Set `AGNES_SELF_UPGRADE_IN_PROGRESS=1` in `os.environ` for the duration of the call. Layer B's header check reads this sentinel and *skips* the hard-stop while we're upgrading. Without it, a later refactor that has `self-upgrade` calling `get_client()` (e.g. for auth) would lock itself out: it would hit `< min` and exit 2 with *"Run: agnes self-upgrade"* from inside `agnes self-upgrade`. The sentinel propagates to subprocesses via the explicit `env=` we pass to the smoke test.
2. If `--force`, **invalidate** the `update_check.json` cache *before* probing, so we always pick up the server's current `download_url`.
3. Reuse `cli.update_check.check(server_url)` — same `/cli/latest` call, same cache, same version comparison. No second polling path.
4. If `info is None` (disabled / no server / unknown local version) or `(not force and not info.is_outdated())` → exit 0.
5. `--check-only` → print `format_outdated_notice(info)`, exit 1 if outdated, 0 if current.
6. Otherwise: snapshot `prior_url = _read_last_known_good()` (the URL of the version we last successfully smoke-tested into; may be `None` on first upgrade, so rollback is best-effort only).
Then reinstall:
   - `uv` available (`shutil.which("uv")`) → `uv tool install --force ""`
   - else → download wheel to `mktemp -d` (curl), then `[sys.executable, "-m", "pip", "install", "--force-reinstall", "--no-deps", ]`. **Crucially** uses `sys.executable` (the running CLI's interpreter) rather than `python3` (PATH-resolved system Python), and **does not** pass `--user` — both would land the wheel outside the uv-tool venv that owns the `agnes` binary, silently no-op'ing the upgrade.
7. **Smoke-test the new binary** before declaring success — but not via `shutil.which("agnes")`. PATH may shadow the just-installed binary with a stale `/usr/local/bin/agnes` from an old `pip install --user` or Homebrew shim, in which case `--version` would print the *old* version and report success. Instead, locate the binary deterministically:
   - **uv path** → call `uv tool dir --bin` (one subprocess; uv's `--bin` flag returns the directory containing entrypoint shims, working transparently across POSIX/Windows). Look for `agnes` then `agnes.exe` in that directory.
   - **pip path** → `/agnes` (POSIX) or `/agnes.exe` (Windows) — the sibling of the running interpreter, which is the venv pip just rewrote.
   Then `subprocess.run([str(binary), "--version"], env={**os.environ, "AGNES_NO_UPDATE_CHECK": "1", "AGNES_SELF_UPGRADE_IN_PROGRESS": "1"}, timeout=10, capture_output=True)`. Smoke passes when returncode is 0 **and** the trailing token of stdout parses to a `packaging.version.Version` equal to `info.latest` — equality on `Version()` (not substring), so `0.40.0` does not falsely match `0.40.10` and PEP 440 local segments are handled.
8. On smoke fail: if `prior_url` is set and ≠ `info.download_url`, attempt a single rollback install of `prior_url` via the same uv/pip path. **Capture the rollback's return code** — if it's non-zero, the CLI is in a broken state, surface this on stderr alongside the bootstrap-recovery command.
   If `prior_url` is `None` (first-ever upgrade) or rollback also fails, stderr prints `Run: curl -fsSL /cli/install.sh | bash` — the canonical bootstrap path that doesn't depend on local state. Either way `raise typer.Exit(1)`.
9. On smoke pass: `_record_last_known_good(info.download_url)` (writes `~/.config/agnes/last_known_good.json` — separate from `update_check.json`, updated only after a verified-good install) then `_invalidate_update_cache()`. Convention; no correctness consequence either way.
10. `--quiet` suppresses progress output; **stderr always passes through on install / smoke / rollback failures** — `--quiet` is for routine success runs (the SessionStart hook), not a gag on errors.
11. **`--force` + offline.** `--force` invalidates the cache before probing `/cli/latest`. If the probe fails (network down), `--force` raises `typer.Exit(1)` with `cannot reach /cli/latest` on stderr — explicit destructive intent deserves explicit feedback. Without `--force`, an offline probe is silent (the implicit warning loop's contract).
12. **`--check-only` is read-only intent — exit 0 on transport errors.** Even with `--force`, when the probe is unreachable under `--check-only`, the command exits 0 silently rather than surfacing the error: `--check-only` should never produce a non-zero exit unless the CLI is *known* outdated. (`--force` semantics still apply to the actual install path; pairing `--check-only --force` is well-defined: it invalidates the cache, fresh-probes, prints status, never installs.)
13. **`AGNES_NO_UPDATE_CHECK=1`** silences the implicit warning loop only. Explicit `agnes self-upgrade` calls `check(server_url, bypass_disabled=True)` so the env var does not turn a user-typed upgrade command into a silent no-op.

**Platform support:** smoke test branches on `sys.platform == "win32"` for the `.exe` suffix; the rest of the flow is platform-neutral via uv.
Windows is supported on a best-effort basis (analyst laptops are predominantly macOS/Linux).

Reuses the existing `AGNES_NO_UPDATE_CHECK=1` flag rather than adding a new opt-out env var. Per item 13, that flag only silences the implicit warning loop. `agnes self-upgrade` itself, including the SessionStart hook invocation, still runs when the flag is set.

**Latency:** runs once at session start. It delays pull by ~3-10s when an upgrade is installed (install + ~1s smoke test) and by ~0.2s when already in sync (at most one HTTP roundtrip, none on a cache hit, then an early exit).

**Failure modes:** offline / server down → `|| true` → session continues on the old version. Install succeeds but the new wheel is broken → the smoke test catches it, attempts a rollback, and prints recovery instructions. If the old version is below the server's minimum, Layer B stops it on the next API call.

### Layer B — reactive verification (response headers)

Every `/api/*` response includes two headers (FastAPI middleware):

- `X-Agnes-Latest-Version: 0.40.0` — `APP_VERSION`, same value the install script bakes in.
- `X-Agnes-Min-Version: 0.0.0` — oldest CLI version the server still accepts. Lives in a single Python constant. Bumped manually when a wire-protocol break ships. **Ships at `0.0.0` on day one** so rollout doesn't accidentally lock anyone out — first deliberate gate is the first time this gets bumped.

The shared HTTP client (`cli/client.py:216`) inspects these on every response:

| Local CLI version | Behavior |
|---|---|
| `>= latest` | nothing |
| `>= min` and `< latest` | nothing: the existing `update_check` poll in the root callback already prints the warning, so there is no need to nag again on every API call |
| `< min` | print `error: agnes is incompatible with server (min required: ). Run: agnes self-upgrade` and `sys.exit(2)`. **Operation is not performed.** |

**Recursion barrier:** `_check_version_headers` short-circuits (returns silently, no enforcement) when `os.environ.get("AGNES_SELF_UPGRADE_IN_PROGRESS") == "1"`. Layer A's command sets the sentinel for the duration of the upgrade. A `< min` response on any internal `/api/*` call therefore cannot lock the in-flight `agnes self-upgrade` out from itself.
The sentinel is process-local and propagates to the smoke-test subprocess via explicit `env=`.

The CLI also sends `User-Agent: agnes/ ()` so the server can audit drift in access logs.

**Day-one floor.** `MIN_COMPAT_CLI_VERSION = "0.0.0"` — no enforcement. The constant + middleware + CLI inspection are an opt-in mechanism for the future. When a wire break ships, the engineer bumps the constant in the same PR and adds a `**BREAKING**` CHANGELOG bullet — same review discipline as every other behavior change. No standalone CI gate, no doc, no PR-template checkbox: those would be theater that catches nothing real (an engineer can check a box without bumping a constant). The mechanism stays free-to-use; the policy is one constant change away when someone needs it.

### How the two layers compose

| Scenario | Layer A | Layer B | Outcome |
|---|---|---|---|
| Happy path | upgrade silent (already current) | headers OK | no output |
| Drift caught at session start | upgrades to latest | headers OK after upgrade | brief "installed: 0.40.0" line if not `--quiet` |
| Hook failed (offline at session start), online now | no-op | `< latest` ⇒ silent (warning still printed by `update_check` from main callback) | analyst sees one warning, runs `agnes self-upgrade` manually |
| Server shipped a wire break, analyst is `< min` | hook would have caught it, but maybe the analyst skipped Claude Code | hard-stop with remediation | exit 2, clear message |
| Headless / CI / ad-hoc terminal (no Claude Code) | hook never runs | warning + hard-stop still apply | covered |

---

## Server-side changes

### `app/version.py` (new — single source of truth)

```python
"""Single source of truth for app + CLI compat versions."""
import importlib.metadata

APP_VERSION = importlib.metadata.version("agnes-the-ai-analyst")

# Bump when shipping a wire-protocol break. Older CLIs are blocked at the
# response-header layer with exit 2 + remediation message.
# Day-one value of 0.0.0 means no enforcement. Set the floor the first time
# a deliberate break ships.
MIN_COMPAT_CLI_VERSION = "0.0.0"
```

### `app/main.py` — middleware

```python
from fastapi import Request

from app.version import APP_VERSION, MIN_COMPAT_CLI_VERSION


# `app` is the FastAPI instance already defined in app/main.py.
@app.middleware("http")
async def add_version_headers(request: Request, call_next):
    response = await call_next(request)
    if request.url.path.startswith("/api/"):
        response.headers["X-Agnes-Latest-Version"] = APP_VERSION
        response.headers["X-Agnes-Min-Version"] = MIN_COMPAT_CLI_VERSION
    return response
```

Applied only to `/api/` so marketplace / wheel / web UI responses stay clean. Verify CORS `expose_headers` includes these (or `*`). This only matters for browser clients; httpx does not enforce CORS, so the CLI reads the headers either way.

### `app/api/cli_artifacts.py` — fix stale `da` reference

Drive-by: line 47 still says *"Consumed by `da` CLI's auto-update check"*. Update to `agnes`. No behavior change.

`/cli/latest` itself stays as-is: it is pure metadata about the wheel on disk. `min_version` is a server-policy concern (per-request), not wheel metadata, so it lives on the headers and not in this payload.

---

## CLI-side changes

### `cli/commands/self_upgrade.py` (new)

Logic per Layer A above. ~80 lines including the install subprocess call. Reuses:

- `cli.update_check.check()` for the version probe (identical to what `cli/main.py:102` already calls)
- `cli.update_check.format_outdated_notice()` for `--check-only` output
- `cli.config.get_server_url()` for the server URL
- `shutil.which("uv")` to choose install path
- `subprocess.run` with `check=True` to surface install failures

Wire into `cli/main.py` near the existing typer registrations.

### `cli/client.py:get_client()` — header inspection

Wrap the returned `httpx.Client` so every response goes through one hook. Cleanest is `httpx.Client(event_hooks={"response": [_check_version_headers]})`:
Cleanest is `httpx.Client(event_hooks={"response": [_check_version_headers]})`: + +```python +def _check_version_headers(response: httpx.Response) -> None: + latest = response.headers.get("X-Agnes-Latest-Version") + minv = response.headers.get("X-Agnes-Min-Version") + if not latest or not minv: + return # talking to an older server; no enforcement + local = _installed_version() # reuse from update_check + if local == "unknown": + return # dev install / editable; never block + if _version_lt(local, minv): # reuse update_check._version_lt + sys.stderr.write( + f"error: agnes {local} is incompatible with server {latest}" + f" (min required: {minv}). Run: agnes self-upgrade\n" + ) + sys.exit(2) +``` + +Only the hard-stop is enforced here — drift warnings are already handled by `update_check` in the root callback, no point doubling them on every API call. + +`_version_lt` and `_installed_version` move from `cli/update_check.py` into `cli/_version_compat.py` (or stay in `update_check.py` and `client.py` imports them) — pick whichever keeps imports simple. Both files need them. + +User-Agent: extend `get_client()` to set `headers={"User-Agent": f"agnes/{_installed_version()} ({platform.system().lower()})"}` (merge with caller-supplied headers). + +### `cli/lib/hooks.py:install_claude_hooks` — chain self-upgrade ahead of pull + +```python +_OUR_COMMAND_MARKERS = ("agnes self-upgrade", "agnes pull", "agnes push", "da sync") + +_replace_or_add( + "SessionStart", + "agnes self-upgrade --quiet 2>/dev/null || true; " + "agnes pull --quiet 2>/dev/null || true", +) +_replace_or_add("SessionEnd", "agnes push --quiet 2>/dev/null || true") +``` + +Single chained SessionStart entry. Shell `;` guarantees ordering (no reliance on Claude Code's undocumented multi-hook semantics); each `|| true` ensures one segment's failure does not abort the line. 
`_OUR_COMMAND_MARKERS` is extended so re-running `agnes init` recognises the chained line on substring match and replaces rather than duplicates.

### Drive-by cleanup

`cli/update_check.py` docstring (lines 1-9) still references `da` four times. Update to `agnes`. No behavior change.

---

## Tests

### Server

- New: `tests/test_version_headers_middleware.py` — `/api/sync/trigger` (or any cheap `/api/*`) returns both headers; `/web/*` and `/cli/*` do not.
- Existing `/cli/latest` tests already cover the wheel metadata path.

### CLI

- `tests/test_self_upgrade.py` — mock `update_check.check()`, mock `subprocess.run`, assert correct command shape (uv vs pip path), assert `--check-only` exits 1 when outdated and 0 when current, assert `--force` skips the `is_outdated()` short-circuit, assert success path invalidates the `update_check.json` cache.
- `tests/test_client_version_check.py` — fake response with `min > local` ⇒ `SystemExit(2)`. Fake response with `latest > local >= min` ⇒ no stderr, no exit. Local `unknown` ⇒ no enforcement. Missing headers (old server) ⇒ no enforcement. `min > local` with `AGNES_SELF_UPGRADE_IN_PROGRESS=1` ⇒ no enforcement (recursion barrier).
- `tests/test_lib_hooks.py` — assert the chained command is the sole SessionStart entry, that `self-upgrade` precedes `pull`, that both segments end in `|| true`, and that re-running `install_claude_hooks` stays idempotent (length stays at 1).

---

## Migration / rollout

- Additive — no breaking change. Old CLIs (no header check, no self-upgrade command) keep working; old servers (no headers) make the new CLI silent (no enforcement, just the existing warning loop).
- Ship in one PR. CHANGELOG entry under `### Added`: "CLI now auto-upgrades from the server at session start (`agnes self-upgrade`) and hard-stops on incompatible-version mismatch via response headers."
- After merge, manually bump `MIN_COMPAT_CLI_VERSION` in the next PR that ships a wire-protocol break — that's the first time the hard-stop actually fires.
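
The Layer B matrix, plus the recursion barrier, reduces to a small pure function. That makes the cases listed for `tests/test_client_version_check.py` easy to exercise without an HTTP server. Below is a minimal sketch that assumes plain dotted numeric versions. `should_block`, `version_lt`, and `_parse` are hypothetical stand-ins for the real `_check_version_headers` and `update_check._version_lt`. The real hook would write the returned message to stderr and call `sys.exit(2)`.

```python
from typing import Mapping, Optional


def _parse(v: str) -> tuple[int, ...]:
    """Hypothetical stand-in parser: dotted integers only, no PEP 440 suffixes."""
    parts = [int(p) for p in v.split(".")]
    while len(parts) > 1 and parts[-1] == 0:  # treat 0.40 and 0.40.0 as equal
        parts.pop()
    return tuple(parts)


def version_lt(a: str, b: str) -> bool:
    return _parse(a) < _parse(b)


def should_block(
    headers: Mapping[str, str], local: str, env: Mapping[str, str]
) -> Optional[str]:
    """Return the error to print before exit 2, or None to let the call proceed."""
    if env.get("AGNES_SELF_UPGRADE_IN_PROGRESS") == "1":
        return None  # recursion barrier: never block the upgrader itself
    latest = headers.get("X-Agnes-Latest-Version")
    minv = headers.get("X-Agnes-Min-Version")
    if not latest or not minv:
        return None  # older server without the headers: no enforcement
    if local == "unknown":
        return None  # dev / editable install: never block
    if version_lt(local, minv):
        return (
            f"error: agnes {local} is incompatible with server {latest}"
            f" (min required: {minv}). Run: agnes self-upgrade"
        )
    return None  # >= min: drift warnings belong to update_check, not this hook
```

With `MIN_COMPAT_CLI_VERSION = "0.0.0"`, `version_lt(local, "0.0.0")` is false for every released version. The day-one floor therefore never blocks anyone.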

---

## Self-review

- **Spec coverage:** both layers (A/B), both directions (check + enforce), reuse of `update_check` to avoid two polling paths, hook idempotency, drive-by `da → agnes` cleanup. ✓
- **Resolved during review:** A (`cli/client.py:216` + `cli/main.py:99-104`), B (`MIN_COMPAT_CLI_VERSION = "0.0.0"` on day one), D (reuse `AGNES_NO_UPDATE_CHECK`, no new opt-out flag).
- **No placeholders:** every component has a concrete file path and existing-symbol reference.
- **Type/name consistency:** `APP_VERSION`, `MIN_COMPAT_CLI_VERSION`, `X-Agnes-Latest-Version`, `X-Agnes-Min-Version`, `agnes self-upgrade`, reused `update_check.check()` / `format_outdated_notice()` / `_version_lt()` / `_installed_version()` — consistent throughout.
- Spec, not plan: no per-step TDD breakdown. Convert to a plan once reviewed.
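
Layer A steps 7 and 8 depend on the smoke test accepting only the exact new version, read from the binary pip just rewrote. Below is a stdlib-only sketch of that acceptance rule and the pip-path binary lookup. It assumes `agnes --version` prints the version as its last whitespace-separated token. `smoke_passes`, `pip_bin_path`, and `_release` are hypothetical names. The spec compares with `packaging.version.Version`; the tuple comparison here is a deliberately simpler substitute that only understands plain dotted releases, so it rejects `0.40.0rc1` instead of parsing it. It still shows the same exact-match rule: `0.40.10` never satisfies `0.40.0`.

```python
import sys
from pathlib import Path
from typing import Optional


def pip_bin_path(executable: str = sys.executable, platform: str = sys.platform) -> Path:
    """pip path: the `agnes` entrypoint sits next to the running interpreter."""
    name = "agnes.exe" if platform == "win32" else "agnes"
    return Path(executable).parent / name


def _release(v: str) -> Optional[tuple[int, ...]]:
    """Parse a plain dotted release; None if any segment is not an integer."""
    try:
        parts = [int(p) for p in v.split(".")]
    except ValueError:
        return None
    while len(parts) > 1 and parts[-1] == 0:  # 0.40 == 0.40.0, as with Version()
        parts.pop()
    return tuple(parts)


def smoke_passes(returncode: int, stdout: str, expected: str) -> bool:
    """Accept only rc == 0 and a trailing stdout token equal to `expected`."""
    if returncode != 0:
        return False
    tokens = stdout.split()
    if not tokens:
        return False
    got = _release(tokens[-1])
    return got is not None and got == _release(expected)
```

Comparing parsed versions rather than substrings is what keeps `"0.40.0" in "agnes 0.40.10"` (which is true) from being counted as a pass.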